Closed
Bug 1374164
Opened 7 years ago
Closed 7 years ago
Intermittent Main app process exited normally | application crashed [@ mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log]
Categories
(Core :: Graphics, defect, P1)
Core
Graphics
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | fixed |
firefox57 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: jesup)
References
Details
(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed:product])
Crash Data
Attachments
(2 files, 2 obsolete files)
(deleted),
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jib
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Comment 1•7 years ago
|
||
[task 2017-06-18T19:35:58.600574Z] 19:35:58 INFO - GECKO(2930) | Assertion failure: [GFX1]: Texture deallocated too late during shutdown, at /home/worker/workspace/build/src/gfx/2d/Logging.h:519
[task 2017-06-18T19:36:31.027046Z] 19:36:31 INFO - GECKO(2930) | #01: mozilla::layers::DeallocateTextureClient [gfx/layers/client/TextureClient.cpp:326]
[task 2017-06-18T19:36:31.028023Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.028093Z] 19:36:31 INFO - GECKO(2930) | #02: mozilla::layers::TextureClient::Destroy [gfx/layers/client/TextureClient.cpp:423]
[task 2017-06-18T19:36:31.028707Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.029412Z] 19:36:31 INFO - GECKO(2930) | #03: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:601]
[task 2017-06-18T19:36:31.029959Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.030512Z] 19:36:31 INFO - GECKO(2930) | #04: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:605]
[task 2017-06-18T19:36:31.031038Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.031589Z] 19:36:31 INFO - GECKO(2930) | #05: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:138]
[task 2017-06-18T19:36:31.032138Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.032742Z] 19:36:31 INFO - GECKO(2930) | #06: mozilla::layers::TextureClientHolder::Release [memory/mozalloc/mozalloc.h:218]
[task 2017-06-18T19:36:31.032827Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.033400Z] 19:36:31 INFO - GECKO(2930) | #07: mozilla::layers::TextureClientRecycleAllocator::Destroy [gcc/include/c++/4.9.4/bits/stl_stack.h:218]
[task 2017-06-18T19:36:31.033943Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.034497Z] 19:36:31 INFO - GECKO(2930) | #08: mozilla::layers::CompositableClient::Destroy [gfx/layers/client/CompositableClient.cpp:85]
[task 2017-06-18T19:36:31.034669Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.035345Z] 19:36:31 INFO - GECKO(2930) | #09: mozilla::layers::CompositableClient::~CompositableClient [mfbt/RefPtr.h:77]
[task 2017-06-18T19:36:31.036881Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.037591Z] 19:36:31 INFO - GECKO(2930) | #10: mozilla::layers::ImageClientSingle::~ImageClientSingle [gfx/layers/client/ImageClient.h:87]
[task 2017-06-18T19:36:31.038136Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.039653Z] 19:36:31 INFO - GECKO(2930) | #11: mozilla::layers::CompositableClient::Release [gfx/layers/client/CompositableClient.h:75]
[task 2017-06-18T19:36:31.040193Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.041120Z] 19:36:31 INFO - GECKO(2930) | #12: mozilla::layers::ImageContainer::~ImageContainer [gfx/layers/ImageContainer.cpp:183]
[task 2017-06-18T19:36:31.041697Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.042368Z] 19:36:31 INFO - GECKO(2930) | #13: RefPtr<mozilla::layers::ImageContainer>::~RefPtr [memory/mozalloc/mozalloc.h:218]
[task 2017-06-18T19:36:31.042905Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.043777Z] 19:36:31 INFO - GECKO(2930) | #14: mozilla::MediaEngineCameraVideoSource::~MediaEngineCameraVideoSource [dom/media/webrtc/MediaEngineCameraVideoSource.h:75]
[task 2017-06-18T19:36:31.044408Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.045265Z] 19:36:31 INFO - GECKO(2930) | #15: mozilla::MediaEngineRemoteVideoSource::~MediaEngineRemoteVideoSource [dom/media/webrtc/MediaEngineRemoteVideoSource.h:98]
[task 2017-06-18T19:36:31.045883Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.046789Z] 19:36:31 INFO - GECKO(2930) | #16: mozilla::MediaEngineRemoteVideoSource::Release [dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:28]
[task 2017-06-18T19:36:31.047399Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.048459Z] 19:36:31 INFO - GECKO(2930) | #17: mozilla::MediaDevice::~MediaDevice [xpcom/string/nsTString.h:20]
[task 2017-06-18T19:36:31.049133Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.050080Z] 19:36:31 INFO - GECKO(2930) | #18: mozilla::VideoDevice::~VideoDevice [dom/media/MediaManager.h:112]
[task 2017-06-18T19:36:31.050801Z] 19:36:31 INFO -
[task 2017-06-18T19:36:31.051666Z] 19:36:31 INFO - GECKO(2930) | #19: mozilla::MediaDevice::Release [dom/media/MediaManager.cpp:704]
[task 2017-06-18T19:36:31.052364Z] 19:36:31 INFO -
Comment 2•7 years ago
|
||
Blake, it seems like the resource that held on media side was deallocated after shutdown. Can you have someone to check it?
Flags: needinfo?(bwu)
Comment 3•7 years ago
|
||
Ni Randell since this is related to WebRTC.
Flags: needinfo?(bwu) → needinfo?(rjesup)
Assignee | ||
Comment 4•7 years ago
|
||
Sigh....
Something is holding a MediaDevice from some object in a CC situation it appears, and the "final-cc" is after we shut down gfx (which causes a lot of problems, btw).
#18: mozilla::VideoDevice::~VideoDevice [dom/media/MediaManager.h:112]
#19: mozilla::MediaDevice::Release [dom/media/MediaManager.cpp:704]
<called from CC code in final shutdown - nsCycleCollector::Shutdown>
Given I don't think we can know/stop people from doing this, we'll need to neuter all video sources on Shutdown().
Flags: needinfo?(rjesup)
Assignee | ||
Comment 5•7 years ago
|
||
Speculative fix; we know somehow mImage isn't null at final-cc here.
Attachment #8879633 -
Flags: review?(padenot)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee | ||
Updated•7 years ago
|
Rank: 15
Priority: -- → P1
Updated•7 years ago
|
Attachment #8879633 -
Flags: review?(padenot) → review+
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 10•7 years ago
|
||
this is linux debug in mochitest-media-e10s tests. In the last week this has doubled in the number of failures and we should be taking a look at this.
There is a patch with r+, can we just land this?
Flags: needinfo?(padenot)
Whiteboard: [stockwell needswork]
Comment 11•7 years ago
|
||
FWIW I seem able to provoke this same assert quite reliably with my local patch set for bug 1088621:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=08d27ef79fc105be8711d300cc5e738dd41ccbd6&selectedJob=111305606
I then added a fix that seemed to help avoid late refs messing up GC for that particular case:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=23b79e7e8b76b8b17d87f59cfea2f5427c7be3bc&selectedJob=111396466
I'll try to rebase those and attach them to a new bug and link it here, in case it's helpful.
Comment 12•7 years ago
|
||
clearing NI, per comment 11, jib will handle this.
Flags: needinfo?(padenot)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Comment 20•7 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #12)
> clearing NI, per comment 11, jib will handle this.
Much thanks for jib for work on bug 1382095, but failures continue, and may have actually increased in frequency since that change.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 25•7 years ago
|
||
:jesup - Are you working on this? Making progress? This is one of our most frequent failures.
Flags: needinfo?(rjesup)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 27•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #25)
> :jesup - Are you working on this? Making progress? This is one of our most
> frequent failures.
It's a speculative fix.... and bandaid on something that shouldn't be happening. I'd like to at least put out a warning when we hit the "bad" case. We still should never be getting to this point.
Pushing a Try before landing
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 30•7 years ago
|
||
jib: I just added the 'Unused << NS_WARN_IF...', because I don't want to totally hide this (but want to resolve the orange since it causes pain)
Attachment #8896962 -
Flags: review?(jib)
Assignee | ||
Updated•7 years ago
|
Attachment #8879633 -
Attachment is obsolete: true
Updated•7 years ago
|
Attachment #8896962 -
Flags: review?(jib) → review+
Comment 31•7 years ago
|
||
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f44f13e8021
guarantee mImage is released by video sources before final-cc shutdown r=padenot,jib
Assignee | ||
Comment 32•7 years ago
|
||
Pushed the wallpaper patch... This doesn't really fix the bug, but demotes it to a warning.
Leave-open
Assignee: rjesup → nobody
Severity: critical → normal
Status: ASSIGNED → NEW
Rank: 15 → 18
Flags: needinfo?(rjesup)
Keywords: leave-open
Whiteboard: [stockwell needswork]
Comment 33•7 years ago
|
||
Thanks! That will clear up a lot of orange on treeherder.
Comment hidden (Intermittent Failures Robot) |
Comment 35•7 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 37•7 years ago
|
||
This is still happening. Is there anything preventing an actual investigation of this? It seems to happen frequently enough on the linux64-qr M-e10s(mda2) job that it should be debuggable via try pushes even if you can't reproduce locally.
Comment 38•7 years ago
|
||
Yeah, it looks like the wallpaper patch had no effect on failure frequency. This remains the #1 most frequent test failure.
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 40•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] (pto Aug 19 - Aug 27) from comment #38)
> Yeah, it looks like the wallpaper patch had no effect on failure frequency.
> This remains the #1 most frequent test failure.
AHA! this is failing on mImageContainer not being nulled, not mImage! Easy fix.
Assignee | ||
Comment 41•7 years ago
|
||
Attachment #8898281 -
Flags: review?(jib)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Comment 42•7 years ago
|
||
Comment on attachment 8898281 [details] [diff] [review]
Clear mImageContainer as well as mImage when stopping a gUM stream
Review of attachment 8898281 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/media/webrtc/MediaEngineCameraVideoSource.h
@@ -71,5 @@
> - // somehow.) (Bug 1374164)
> -
> - Unused << NS_WARN_IF(mImage);
> -
> - mImage = nullptr;
I see you're backing off of clearing mImage here. Note that this is a different mImage from that in MediaEngineDefault.h. See bug 1391278.
I think we should still clear mImage and mImageContainer here. This is a hard requirement of the gfx code, and I don't know that we can guarantee that MediaEngine references are not held too long in all cases, or that stop will always have been called (may be the case).
Can we do both, MOZ_ASSERT AND clear?
Attachment #8898281 -
Flags: review?(jib) → review-
Comment 43•7 years ago
|
||
Comment on attachment 8898281 [details] [diff] [review]
Clear mImageContainer as well as mImage when stopping a gUM stream
From #media, we'd "rather let it be caught by the gfx check than hide it by clearing it", so switching to r+
Attachment #8898281 -
Flags: review- → review+
Comment 44•7 years ago
|
||
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9e36f27b698
Clear mImageContainer as well as mImage when stopping a gUM stream r=jib
Comment 45•7 years ago
|
||
Backed out for failing e.g. mda's test_getUserMedia_mediaElementCapture_tracks.html and browser-chrome's browser_devices_get_user_media_unprompted_access.js:
https://hg.mozilla.org/integration/mozilla-inbound/rev/806d980dba310e653175973bd7f7e1c31b73c2d1
A push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=05358f0058a7a711dce865ea74af6ff451686657&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log mda: https://treeherder.mozilla.org/logviewer.html#?job_id=123953587&repo=mozilla-inbound
GECKO(3163) | Assertion failure: mRawPtr != nullptr (You can't dereference a NULL RefPtr with operator->().), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:315
GECKO(3163) | Assertion failure: mDestroyed, at /home/worker/workspace/build/src/gfx/layers/composite/ImageLayerComposite.cpp:44
TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_getUserMedia_mediaElementCapture_tracks.html | application terminated with exit code 11
Failure log bc: https://treeherder.mozilla.org/logviewer.html#?job_id=123953681&repo=mozilla-inbound
GECKO(1734) | Assertion failure: mRawPtr != nullptr (You can't dereference a NULL RefPtr with operator->().), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:315
TEST-UNEXPECTED-FAIL | browser/base/content/test/webrtc/browser_devices_get_user_media_unprompted_access.js | Test timed out -
Flags: needinfo?(rjesup)
Comment 46•7 years ago
|
||
Backout by kwierso@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/6613ef541bb1
Backed out changeset e9e36f27b698 for failing e.g. mda's test_getUserMedia_mediaElementCapture_tracks.html and browser-chrome's browser_devices_get_user_media_unprompted_access.js. r=backout
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 48•7 years ago
|
||
Also removes hiding of mImage/mImageContainer/mMonitor by
MediaEngineDefault, general cleanup of MediaEngineDefault shutdown, and
fixing locking around access to protected members.
Attachment #8898834 -
Flags: review?(jib)
Comment 49•7 years ago
|
||
Comment on attachment 8898834 [details] [diff] [review]
Clear mImageContainer as well as mImage when stopping a gUM stream
Review of attachment 8898834 [details] [diff] [review]:
-----------------------------------------------------------------
Lgtm. One question on whether we need to lock twice.
::: dom/media/webrtc/MediaEngineRemoteVideoSource.cpp
@@ +435,2 @@
> // implicitly releases last image
> mImage = image.forget();
Just wondering if there are ways to avoid locking twice here. Since borh image and mImage are refptrs, couldn't we set mImage ahead of CopyData instead under the first lock? I may not understand the guarantees here, but wanted to point that out, as a potential optimization.
Attachment #8898834 -
Flags: review?(jib) → review+
Assignee | ||
Comment 50•7 years ago
|
||
> ::: dom/media/webrtc/MediaEngineRemoteVideoSource.cpp
> @@ +435,2 @@
> > // implicitly releases last image
> > mImage = image.forget();
>
> Just wondering if there are ways to avoid locking twice here. Since borh
> image and mImage are refptrs, couldn't we set mImage ahead of CopyData
> instead under the first lock? I may not understand the guarantees here, but
> wanted to point that out, as a potential optimization.
I'm locking twice to avoid holding it on a large memcpy/etc. mImage needs a lock to access/modify, since it's touched from multiple threads. If we copy data outside the lock, after setting mImage, then the MSG may come along and grab mImage and send to the compositor before we've finished the copy.
Flags: needinfo?(rjesup)
Assignee | ||
Comment 51•7 years ago
|
||
Comment 52•7 years ago
|
||
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d6210358c49
Clear mImageContainer as well as mImage when stopping a gUM stream r=jib
Comment 54•7 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Comment hidden (Intermittent Failures Robot) |
Comment 57•7 years ago
|
||
:jesup, this has so many failures in the last week, almost all on linux64-qr, but we still see issues on regular linux- can you look into this some more, or should we file a new bug?
Flags: needinfo?(rjesup)
Whiteboard: [stockwell fixed:product] → [stockwell needswork]
Comment 58•7 years ago
|
||
imeantbrasstacks |
Seems fixed to me, the last few on basstacks are on builds without the fix.
Comment 59•7 years ago
|
||
oh, thanks for pointing out my oversight! After the 18th (when the fix was merged everywhere) we just have a few spots on try/beta. sorry for the confusion
Flags: needinfo?(rjesup)
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Updated•7 years ago
|
Attachment #8898281 -
Attachment is obsolete: true
Comment 60•7 years ago
|
||
This grafts cleanly to Beta and OrangeFactor seems to indicate that the latest fix worked. Please nominate these patches for Beta approval when you get a chance.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → unaffected
status-firefox56:
--- → affected
status-firefox57:
--- → fixed
status-firefox-esr52:
--- → unaffected
Flags: needinfo?(rjesup)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Assignee | ||
Comment 61•7 years ago
|
||
Comment on attachment 8898834 [details] [diff] [review]
Clear mImageContainer as well as mImage when stopping a gUM stream
Approval Request Comment
[Feature/Bug causing the regression]: Timing changes especially with QuantumRender
[User impact if declined]: Oranges in automation, possible shutdown crashes
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: simple changes it add locking, and avoid shadowing the monitor
[String changes made/needed]: none
Flags: needinfo?(rjesup)
Attachment #8898834 -
Flags: approval-mozilla-beta?
Comment on attachment 8898834 [details] [diff] [review]
Clear mImageContainer as well as mImage when stopping a gUM stream
Possible crash fix, should help with intermittent orange issues in tests.
Please uplift for tomorrow's beta 6 build.
Attachment #8898834 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 63•7 years ago
|
||
uplift |
Comment 64•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #61)
> [Is this code covered by automated tests?]: yes
>
> [Has the fix been verified in Nightly?]: yes
>
> [Needs manual test from QE? If yes, steps to reproduce]: no
Setting qe-verify- based on Randell's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•