1127270 - shutdownhang in mozilla::layers::CompositorParent::ShutDown()

Reporter

Description

•

10 years ago

We see constant shutdown hangs for our Mozmill tests in mozilla::layers::CompositorParent::ShutDown(). It seems to happen mostly on Windows, especially XP. I'm currently working on reducing the tests in question, but it's a bit tricky. Here is the crash report of the shutdown hang: bp-50fa6da5-01ab-4ed6-ba0b-6fca22150129. Teodor, can you please help me and check older release/beta builds of Firefox? It would be good to know when this started.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 1

•

10 years ago

[Tracking Requested - why for this release]: I see over 2000 crashes with this signature in the last 7 days across supported versions. All of them report this problem on shutdown.

URL: https://crash-stats.mozilla.com/repor...

status-firefox36: --- → affected

status-firefox37: --- → affected

status-firefox38: --- → affected

tracking-firefox36: --- → ?

Whiteboard: [qa-automation-blocked] → [qa-automation-blocked][mozmill]

Teodor Druta

Comment 2

•

10 years ago

I think I found the regressor for this crash:
https://hg.mozilla.org/releases/mozilla-beta/rev/521859f9eae2
https://bugzilla.mozilla.org/show_bug.cgi?id=1119941
36.0b1 build2

Teodor Druta

Comment 3

•

10 years ago

Here's the pushlog between 36.0b1 build1 and build2: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=1b26127c3323&tochange=521859f9eae2

Sylvestre Ledru [:Sylvestre]

Comment 4

•

10 years ago

Important crash, tracking!

tracking-firefox36: ? → +

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 5

•

10 years ago

Given that this shutdown hang and crash might be related to Flash protected mode, let's also CC Benjamin.

Robert Kaiser

Comment 6

•

10 years ago

(In reply to Teodor Druta from comment #2)
> I think I found the regressor for this crash

I don't think that's the right one, unless you have a reproducible test case and actually tested builds and backed out patches one by one.

The shutdownhang|... signatures replaced the "RunWatchdog" signatures of bug 1103833 when bug 1104317 was solved on the crash-stats server side. In turn, the "RunWatchdog" signatures came into being when bug 1038342 was fixed by killing processes that hang for more than 60 seconds on shutdown.

So, all in all, earlier versions would hang there for a long time while versions starting with 36 crash. Those crashes were reported as "RunWatchdog" before, and since January 21, when the fix for bug 1104317 was pushed live in Socorro production, they are reported with "shutdownhang" signatures that actually give better insight into what was hanging.
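The watchdog behaviour described above (a process that hangs for more than 60 seconds on shutdown gets killed, which turns a silent hang into a crash report) can be pictured with a minimal sketch. This is only a model under assumptions: the class ShutdownWatchdog, its Tick() method, and the helper WouldBeKilled() are hypothetical names made up for illustration and are not the code behind the RunWatchdog signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a shutdown watchdog. The class and method names are
// illustrative only and are not taken from the Firefox source.
class ShutdownWatchdog {
 public:
  explicit ShutdownWatchdog(std::uint32_t aTimeoutSeconds)
      : mTimeoutSeconds(aTimeoutSeconds) {
    assert(aTimeoutSeconds > 0);
  }

  // Called once per elapsed second of shutdown. Returns true once shutdown
  // has taken longer than the timeout, i.e. when the process should be
  // killed so that a crash report is produced instead of an endless hang.
  bool Tick() {
    ++mElapsedSeconds;
    return mElapsedSeconds > mTimeoutSeconds;
  }

 private:
  const std::uint32_t mTimeoutSeconds;
  std::uint32_t mElapsedSeconds = 0;
};

// Simulates a shutdown that needs aShutdownSeconds to finish and reports
// whether the watchdog would have killed the process before it completed.
bool WouldBeKilled(std::uint32_t aShutdownSeconds,
                   std::uint32_t aTimeoutSeconds) {
  ShutdownWatchdog watchdog(aTimeoutSeconds);
  for (std::uint32_t second = 0; second < aShutdownSeconds; ++second) {
    if (watchdog.Tick()) {
      return true;
    }
  }
  return false;
}
```

In this model, WouldBeKilled(60, 60) is false while WouldBeKilled(61, 60) is true, which matches the point made above: builds without such a watchdog would simply keep hanging, while builds with it report a crash.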

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 7

•

10 years ago

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> (In reply to Teodor Druta from comment #2)
> > I think I found the regressor for this crash
>
> I don't think that's the right one, unless you have a reproducible test case
> and actually tested builds and backing out patches one by one.

Please read my comment 0. It clearly states that we have reproducible steps to trigger this hang. And we know that we didn't crash formerly. None of the bugs you are referring to here have any impact on the hang problem.

Benjamin Smedberg

Comment 8

•

10 years ago

The thing that landed between build1 and build2 was a backout, and Flash protected mode is relatively unlikely to be related to this. I don't trust the regression range from comment 2-3. nical, can you tell from the crash report which is the compositor thread and/or why it's failing to shut down and hanging?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 9

•

10 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #8)
> nical, can you tell from the crash report which is the compositor thread
> and/or why it's failing to shut down and hanging?

I don't know which is the compositor thread. CompositorParent::ShutDown waits (spins the event loop) until the Compositor thread is destroyed. That is triggered by the CompositorThreadHolder being destroyed, which means both the global sCompositorThreadHolder variable and CompositorParent's mCompositorThreadHolder must be null. The global variable was just set to null in the stack, so it looks like we haven't nulled out the CompositorParent's mCompositorThreadHolder variable.

This should have happened in CompositorParent::DeferredDestroy, which is scheduled on the main thread after the compositor thread is done cleaning its stuff up (in CompositorParent::RecvStop, which runs in the Compositor thread, triggered by CompositorChild::SendStop() on the main thread, which is called by CompositorChild::Destroy, which in turn is called by nsBaseWidget::DestroyCompositor).

What a happy mess :)

First thing I'd look at is whether we lose the reference to the CompositorChild in nsBaseWidget without calling DestroyCompositor. Then, whether something could have prevented any of the functions I mentioned above from being called.
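The dependency described in this comment can be sketched in a few lines. This is a simplified model, not the Gecko code: std::shared_ptr stands in for the reference counting, and ThreadHolderModel, ParentModel, ShutDownModel and SimulateShutdown are hypothetical names. The spin limit is also an assumption of the sketch, added so that a would-be hang is reported as a return value instead of actually hanging.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Stand-in for CompositorThreadHolder: destroying it represents tearing down
// the compositor thread.
struct ThreadHolderModel {
  explicit ThreadHolderModel(bool& aThreadDestroyed)
      : mThreadDestroyed(aThreadDestroyed) {}
  ~ThreadHolderModel() { mThreadDestroyed = true; }
  bool& mThreadDestroyed;
};

// Stand-in for CompositorParent, which keeps its own reference to the holder.
struct ParentModel {
  std::shared_ptr<ThreadHolderModel> mHolder;
  void DeferredDestroy() { mHolder.reset(); }
};

using EventQueue = std::queue<std::function<void()>>;

// Models ShutDown(): drop the global reference, then spin the main-thread
// event loop until the thread is gone. Returns false if it would hang.
bool ShutDownModel(std::shared_ptr<ThreadHolderModel>& aGlobalHolder,
                   const bool& aThreadDestroyed, EventQueue& aMainThreadEvents,
                   int aMaxSpins) {
  assert(aMaxSpins > 0);
  aGlobalHolder.reset();
  for (int spin = 0; spin < aMaxSpins && !aThreadDestroyed; ++spin) {
    if (!aMainThreadEvents.empty()) {
      std::function<void()> event = std::move(aMainThreadEvents.front());
      aMainThreadEvents.pop();
      event();
    }
  }
  return aThreadDestroyed;
}

// If the destroy chain ran, DeferredDestroy is queued on the main thread;
// if it never ran, nothing will ever release the parent's reference.
bool SimulateShutdown(bool aDestroyCompositorCalled) {
  bool threadDestroyed = false;
  auto globalHolder = std::make_shared<ThreadHolderModel>(threadDestroyed);
  ParentModel parent{globalHolder};
  EventQueue mainThreadEvents;
  if (aDestroyCompositorCalled) {
    mainThreadEvents.push([&parent] { parent.DeferredDestroy(); });
  }
  return ShutDownModel(globalHolder, threadDestroyed, mainThreadEvents, 1000);
}
```

SimulateShutdown(true) completes because the queued DeferredDestroy drops the last reference, while SimulateShutdown(false) models the suspected failure: the event loop keeps spinning and the holder is never destroyed.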

Flags: needinfo?(nical.bugzilla)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 10

•

10 years ago

Nicolas, would a full minidump be helpful for you?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 11

•

10 years ago

(In reply to Henrik Skupin (:whimboo) from comment #10)
> Nicolas, would a full minidump be helpful for you?

I don't have time to work on this unless I sacrifice other bugs, so unless Bas or Milan want to bump the priority, assume I am not going to fix this in the short term (sorry).

Flags: needinfo?(nical.bugzilla)

Sylvestre Ledru [:Sylvestre]

Comment 12

•

10 years ago

Milan or Bas, could you find someone else to work on this? Thanks. FYI, beta 6 gtb is today...

Flags: needinfo?(milan)

Flags: needinfo?(bas)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 13

•

10 years ago

Safe to assume this won't get resolved by beta 6. We don't even seem to know what it is and where it is, and we don't have a regression range we trust.

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

10 years ago

Flags: needinfo?(bas)

Robert Kaiser

Updated

•

10 years ago

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 15

•

10 years ago

(In reply to Milan Sreckovic [:milan] from comment #13)
> Safe to assume this won't get resolved by beta 6. We don't even seem to
> know what it is and where it is, and we don't have a regression range we
> trust.

Milan, can you please have a look at my comment 10? We could provide a full minidump here if that is of any kind of help. If not, someone from our side would have to spend some more time reducing the Mozmill test even further. Please let me know if the minidump path would work.

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 16

•

10 years ago

Need to clear up other beta bugs first, this is not likely to get looked at in the next couple of days.

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

10 years ago

Whiteboard: [qa-automation-blocked][mozmill] → [qa-automation-blocked][mozmill][gfx-noted]

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 17

•

10 years ago

The number of crashes has dropped with the last beta release, so I'm going to remove our blocking whiteboard entry for now.

Whiteboard: [qa-automation-blocked][mozmill][gfx-noted] → [mozmill][gfx-noted]

Sylvestre Ledru [:Sylvestre]

Comment 18

•

10 years ago

Since it decreased, I am going to mark it as wontfix for 36. It is not tracked for 37. Don't hesitate to submit for tracking if it spikes.

status-firefox36: affected → wontfix

Robert Kaiser

Updated

•

10 years ago

Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown()] → , bool) | mozilla::layers::CompositorParent::ShutDown()] [@ shutdownhang | ntdll.dll@0x3c6bc]

Bryan Price

Comment 19

•

10 years ago

This build hasn't crashed (on shutdown!) for me yet. And I've done it a few times.

Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150215030238 CSet: e0cb32a0b1aa

Which is a welcome change. My last crash today was the restart for this build. But I've shut down three times now, and no crash.

Jim Mathies [:jimm]

Updated

•

10 years ago

Jim Mathies [:jimm]

Comment 20

•

10 years ago

sorry, wrong bug.

Sylvestre Ledru [:Sylvestre]

Comment 21

•

10 years ago

Tracking for the current release as it is #3.

status-firefox39: --- → affected

tracking-firefox38: --- → +

tracking-firefox39: --- → +

Robert Kaiser

Updated

•

10 years ago

Keywords: topcrash-win

Tracy Walker [:tracy]

Comment 22

•

10 years ago

[Tracking Requested - why for this release]: Combined signatures puts this in top 5 on Nightly (Fx40)

status-firefox40: --- → affected

tracking-firefox40: --- → ?

Jim Mathies [:jimm]

Updated

•

10 years ago

Blocks: 1121145

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

status-firefox37: affected → wontfix

Sylvestre Ledru [:Sylvestre]

Comment 23

•

10 years ago

Tracking as it is still one of the most important issues.

tracking-firefox40: ? → +

David Weir (satdav)

Comment 24

•

10 years ago

Hello, I am marking this as major since it's a known crash and happens a lot. PS: I am on Windows 7 and crashed yesterday.

Severity: critical → major

OS: Windows XP → All

Version: 36 Branch → Trunk

David Weir (satdav)

Comment 25

•

10 years ago

Can we also get this into the release notes so people are aware?

status-firefox38.0.5: --- → affected

Flags: needinfo?(milan)

David Weir (satdav)

Comment 26

•

10 years ago

Sorry, I accidentally cleared the needinfo for Milan but added it back.

Flags: needinfo?(milan)

Robert Kaiser

Comment 27

•

10 years ago

Please don't downgrade to major. Also, we know this happens quite a bit, otherwise it would not have a topcrash flag and be marked tracking for a number of releases, so there is no need to flag more than that. The real issue is that we need to find out what's really going on there. I think the only way you can really get this fixed faster is to provide us with a scenario that can reliably reproduce the issue. So far we haven't heard of any such case.

Severity: major → critical

Sylvestre Ledru [:Sylvestre]

Comment 28

•

10 years ago

As with the other releases, it is too late for 38.

status-firefox38: affected → wontfix

Tracy Walker [:tracy]

Updated

•

10 years ago

Robert Kaiser

Comment 29

•

10 years ago

Tracy, I don't think the nsThread::Shutdown you added to the whiteboard is the same thing.

Tracy Walker [:tracy]

Comment 30

•

10 years ago

heh, didn't mean to add it to whiteboard. I compared stack signature with another in this bug and they were identical.

Crash Signature: , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t cons ] → , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t cons ] [@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEve…

Andrew McCreight [:mccr8]

Comment 31

•

10 years ago

This hang is happening 100% of the time for me when I exit the browser with a page open (so there's a content process). This is with a DMD* Linux debug build, running over VNC, with an OSX client. I haven't tried with a non-DMD build yet. * https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD

Andrew McCreight [:mccr8]

Comment 32

•

10 years ago

It looks like my hang is some kind of fallout from my own patches, so I don't know how useful my being able to reproduce it is, but here's the stack on the compositor thread in case it is useful:

#0 0x00007f70c0cf98bf in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0x00007f70c0cf98bf in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1 0x00007f70ba16c191 in ConditionVariable::Wait (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/condition_variable_posix.cc:40
#2 0x00007f70ba18474e in base::WaitableEvent::TimedWait (this=0x7f70a924e1d8, max_time=...) at /home/amccreight/mc/ipc/chromium/src/base/waitable_event_posix.cc:195
#3 0x00007f70ba18491b in base::WaitableEvent::Wait (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/waitable_event_posix.cc:201
#4 0x00007f70ba17444f in base::MessagePumpDefault::Run (this=0x7f70a924e1c0, delegate=0x7f70a5acdd48) at /home/amccreight/mc/ipc/chromium/src/base/message_pump_default.cc:60
#5 0x00007f70ba1735d4 in RunHandler (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/message_loop.cc:226
#6 MessageLoop::Run (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/message_loop.cc:200
#7 0x00007f70ba17e519 in base::Thread::ThreadMain (this=0x7f70a924e160) at /home/amccreight/mc/ipc/chromium/src/base/thread.cc:170
#8 0x00007f70ba17e76f in ThreadFunc (closure=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/platform_thread_posix.cc:39

That does not look useful but you never know.

Andrew McCreight [:mccr8]

Comment 33

•

10 years ago

Bill and I looked at this a little bit, but we weren't able to figure much out. Something is going wrong in the sequence of messages back and forth between the parent and child process, so the child doesn't shut down, but the ContentParent either gets far enough or doesn't notice the failure so it removes the xpcom-shutdown observer before shutdown, and thus never kills the nonresponsive child.

My steps to reproduce are something like:
1. Make a debug build, and also add ac_add_options --enable-dmd
2. Start the browser with DMD on like this: ./mach run --dmd --mode=live --sample-below=1
3. Open a random webpage (I used http://news.ycombinator.com/ but maybe it doesn't matter). Let it load at least a little bit.
4. Exit.

That hangs around 95% of the time for me.

Andrew McCreight [:mccr8]

Comment 34

•

10 years ago

The DMD changes there shouldn't affect anything except the performance, making it a good amount slower, so presumably there's some kind of race condition.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 35

•

10 years ago

Lee, if you follow instructions in comment 33, can you reproduce?

Flags: needinfo?(milan) → needinfo?(lsalzman)

Lee Salzman [:lsalzman]

Comment 36

•

10 years ago

(In reply to Milan Sreckovic [:milan] from comment #35)
> Lee, if you follow instructions in comment 33, can you reproduce?

I am having trouble reproducing this following those instructions. I'm not seeing any hangs with DMD enabled.

Lee Salzman [:lsalzman]

Updated

•

10 years ago

Flags: needinfo?(lsalzman)

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 37

•

9 years ago

This could still squeak into 39 but we are heading into beta 4 now. It looks like people have had a crack at fixing it several times and Andrew has a good possible way to reproduce a related crash. Milan I realize there may be other higher priority issues; we should come back to this and not let it drop though. I'll keep tracking this for 39 for the moment.

Flags: needinfo?(milan)

Andrew McCreight [:mccr8]

Comment 38

•

9 years ago

It kind of feels like the issue I'm seeing is e10s-specific, and thus unrelated to whatever is happening on release. But it is hard to know.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 39

•

9 years ago

Agreed. I'll assign to :nical just in case he can get to it; we do want a strong 39.

Assignee: nobody → nical.bugzilla

Flags: needinfo?(milan)

alex_mayorga

Comment 40

•

9 years ago

bp-feaa716f-75c4-45a2-b146-7b5d72150619
19/06/2015 11:17 a.m.

Crashing Thread
Frame Module Signature Source
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp
1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
3 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 msvcr120.dll _threadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:354
5 kernel32.dll BaseThreadInitThunk
6 ntdll.dll RtlUserThreadStart
7 kernel32.dll BasepReportFault
8 kernel32.dll BasepReportFault

status-firefox41: --- → affected

Milan Sreckovic [:milan] (needinfo for best results)

Comment 41

•

9 years ago

(In reply to alex_mayorga from comment #40)
> bp-feaa716f-75c4-45a2-b146-7b5d72150619
> 19/06/2015 11:17 a.m.
>
> Crashing Thread
> Frame Module Signature Source
> 0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*)
> toolkit/components/terminator/nsTerminator.cpp
> 1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
> 2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
> 3 msvcr120.dll _callthreadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:376
> 4 msvcr120.dll _threadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:354
> 5 kernel32.dll BaseThreadInitThunk
> 6 ntdll.dll RtlUserThreadStart
> 7 kernel32.dll BasepReportFault
> 8 kernel32.dll BasepReportFault

Alex, you had this as a start up crash? Weird, the crash report itself is showing it as a > hour session. Does the start up crash persist? Does safe mode work?

Arthur K. (he/him)

Comment 42

•

9 years ago

(In reply to alex_mayorga from comment #40)
> bp-feaa716f-75c4-45a2-b146-7b5d72150619
> 19/06/2015 11:17 a.m.
>
> Crashing Thread
> Frame Module Signature Source
> 0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*)
> toolkit/components/terminator/nsTerminator.cpp
> 1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
> 2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
> 3 msvcr120.dll _callthreadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:376
> 4 msvcr120.dll _threadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:354
> 5 kernel32.dll BaseThreadInitThunk
> 6 ntdll.dll RtlUserThreadStart
> 7 kernel32.dll BasepReportFault
> 8 kernel32.dll BasepReportFault

Seeing as how the crash might be related to your Intel VGA driver (8.15.10.2696, from 2013), could you try updating to the current version (15.33.36.64.4226, June 2015 via https://goo.gl/np2lji) and see if it still crashes?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 43

•

9 years ago

Alex, did this just start happening for you with the nightly? Because if it did, and you have time, before you update the driver, it would be beyond awesome if you could run mozregression (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/Existing_Tools#MozRegression) to help us find out exactly when it started happening.

alex_mayorga

Comment 44

•

9 years ago

¡Hola Milan!

It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5

bp-31608b8c-69d0-49ac-8085-4c6002150625

I'm not entirely sure of STR though...

Is this the tab crash that happens when I shut down the computer without closing Nightly first?

Flags: needinfo?(milan)

Arthur K. (he/him)

Comment 45

•

9 years ago

(In reply to alex_mayorga from comment #44)
> ¡Hola Milan!
>
> It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
>
> bp-31608b8c-69d0-49ac-8085-4c6002150625
>
> I'm not entirely sure of STR though...
>
> Is this the tab crash that happens when I shutdown the computer without
> closing Nightly first?

Did you try updating your Intel VGA driver as I mentioned in comment 42?

alex_mayorga

Comment 46

•

9 years ago

(In reply to Arthur K. from comment #45)
> (In reply to alex_mayorga from comment #44)
> > ¡Hola Milan!
> >
> > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> >
> > bp-31608b8c-69d0-49ac-8085-4c6002150625
> >
> > I'm not entirely sure of STR though...
> >
> > Is this the tab crash that happens when I shutdown the computer without
> > closing Nightly first?
>
> Did you try updating your Intel VGA driver as I mentioned in comment 42?

¡Hola Arthur!

I just tried updating with win64_153336.exe and got the following very uninformative message:

"Error
This computer does not meet the minimum requirements for installing the software.
<OK>"

=(

Arthur K. (he/him)

Comment 47

•

9 years ago

(In reply to alex_mayorga from comment #46)
> (In reply to Arthur K. from comment #45)
> > (In reply to alex_mayorga from comment #44)
> > > ¡Hola Milan!
> > >
> > > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> > >
> > > bp-31608b8c-69d0-49ac-8085-4c6002150625
> > >
> > > I'm not entirely sure of STR though...
> > >
> > > Is this the tab crash that happens when I shutdown the computer without
> > > closing Nightly first?
> >
> > Did you try updating your Intel VGA driver as I mentioned in comment 42?
>
> ¡Hola Arthur!
>
> I just tried updating with win64_153336.exe and got the following very
> uninformative message:
>
> "Error
> This computer does not meet the minimum requirements for installing the
> software.
> <OK>"
>
> =(

Hmm, based on your crash report DeviceID and what your old driver said, it should have been right. Can you please grab GPU-Z 0.8.4 and tell me what it says in the Name area? Also, what does it say in Display Adapter under Device Manager?

Arthur K. (he/him)

Comment 48

•

9 years ago

(In reply to alex_mayorga from comment #46)
> (In reply to Arthur K. from comment #45)
> > (In reply to alex_mayorga from comment #44)
> > > ¡Hola Milan!
> > >
> > > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> > >
> > > bp-31608b8c-69d0-49ac-8085-4c6002150625
> > >
> > > I'm not entirely sure of STR though...
> > >
> > > Is this the tab crash that happens when I shutdown the computer without
> > > closing Nightly first?
> >
> > Did you try updating your Intel VGA driver as I mentioned in comment 42?
>
> ¡Hola Arthur!
>
> I just tried updating with win64_153336.exe and got the following very
> uninformative message:
>
> "Error
> This computer does not meet the minimum requirements for installing the
> software.
> <OK>"
>
> =(

Well, originally I thought this was an HD4000 but it seems to be an HD3000. Try these drivers please: https://goo.gl/2SCvaR

alex_mayorga

Comment 49

•

9 years ago

¡Hola Arthur!

win64_152824.exe did work.

I disobeyed the installer and left Nightly running during the update. This resulted in the following crash:

Report ID Date Submitted
bp-4388b86f-bd1b-4444-a05c-be5682150625 25/06/2015 04:04 p.m.

That is seemingly https://bugzilla.mozilla.org/show_bug.cgi?id=1133623

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 50

•

9 years ago

Wontfixing for 39. This recent activity appears to be on 41.

status-firefox39: affected → wontfix

Jonatan Svensson Glad

Comment 51

•

9 years ago

I pressed the "Update Nightly" in the "hamburger-menu". This caused this crash: https://crash-stats.mozilla.com/report/index/7ea1fd3e-9f3f-41ad-b7e4-e24842150710

Jonatan Svensson Glad

Comment 52

•

9 years ago

[Tracking Requested - why for this release]: See above comment, it happened for 42.0a1. Note I had e10s on.

tracking-firefox42: --- → ?

Jonatan Svensson Glad

Updated

•

9 years ago

status-firefox42: --- → ?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 53

•

9 years ago

The original report talked about mozmill tests - did we ever run into a problem with the debug build?

Flags: needinfo?(milan) → needinfo?(hskupin)

Sylvestre Ledru [:Sylvestre]

Comment 54

•

9 years ago

Tracking for 42 as it is a top crash... but I am unhappy that we have been tracking it since 36... Wontfix for 40 as I don't think we will have a fix in time for this release...

status-firefox38.0.5: affected → wontfix

status-firefox40: affected → wontfix

status-firefox42: ? → affected

tracking-firefox41: --- → +

tracking-firefox42: ? → +

Milan Sreckovic [:milan] (needinfo for best results)

Comment 55

•

9 years ago

(In reply to Sylvestre Ledru [:sylvestre] PTO => July 10th from comment #54)
> Tracking for 42 as it is a top crash... but I am unhappy that we have been
> tracking it since 36...
> Wontfix for 40 as I don't think we will have a fix in time for this
> release...

Agreed, but we don't know how to fix it.

avada

Comment 56

•

9 years ago

To me it changes a lot. It stopped happening maybe yesterday, but was hanging/crashing for several weeks before. It didn't happen before that for a long time. But it also happened for a while even before that.

Jim Mathies [:jimm]

Comment 57

•

9 years ago

Jim Mathies [:jimm]

Comment 58

•

9 years ago

Those were from the browser process list.

Bas Schouten (:bas.schouten)

Assignee

Comment 59

•

9 years ago

Nical, could there be like an image bridge that's still holding onto the CompositorThreadHolder?

Flags: needinfo?(nical.bugzilla)

Bas Schouten (:bas.schouten)

Assignee

Comment 60

•

9 years ago

I managed to reproduce this problem locally with fairly high reliability. I added a printf that shows me the address of the sCompositorThreadHolder before we null it out. Here's where it gets interesting: when we're in the hung state, here's the data on sCompositorThreadHolder:

- (CompositorThreadHolder*)0x10998660  0x10998660 {mRefCnt={mValue={...} } mHelperForMainThreadDestruction={...} mCompositorThread=0x109d8520 {...} }  mozilla::layers::CompositorThreadHolder *
  - mRefCnt  {mValue={...} }  mozilla::ThreadSafeAutoRefCnt
    - mValue  {...}  mozilla::Atomic<unsigned int,2,void>
      - mozilla::detail::AtomicBaseIncDec<unsigned int,2>  {...}  mozilla::detail::AtomicBaseIncDec<unsigned int,2>
        - mozilla::detail::AtomicBase<unsigned int,2>  {mValue={...} }  mozilla::detail::AtomicBase<unsigned int,2>
          - mValue  {...}  std::atomic<unsigned int>
            - std::atomic_uint  {_My_val=2 }  std::atomic_uint
                _My_val  2  unsigned long
    mHelperForMainThreadDestruction  {...}  mozilla::layers::HelperForMainThreadDestruction
  + mCompositorThread  0x109d8520 {startup_data_=0x004fbc30 {options={message_loop_type=??? stack_size=??? transient_hang_timeout=...} ...} ...}  base::Thread * const

In other words, there are 2 references to the CompositorThreadHolder lying around somewhere and not being cleaned up. From there on, this hang occurring is no surprise.
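The kind of check described in this comment (print the holder before the global reference is dropped, then see how many references are still alive) can be sketched as follows. This is only an illustration under assumptions: std::shared_ptr and std::weak_ptr stand in for Gecko's reference counting, and HolderModel, ReleaseGlobalAndCountRemaining and RemainingAfterRelease are hypothetical names, not functions from the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Stand-in for the reference-counted CompositorThreadHolder.
struct HolderModel {};

// Logs the holder's address and reference count, drops the global
// reference, and returns how many references are still alive afterwards.
// Anything above zero means the holder (and so the thread) is not destroyed.
long ReleaseGlobalAndCountRemaining(
    std::shared_ptr<HolderModel>& aGlobalHolder) {
  assert(aGlobalHolder != nullptr);
  std::printf("holder at %p, refcount before release: %ld\n",
              static_cast<void*>(aGlobalHolder.get()),
              aGlobalHolder.use_count());
  // A weak_ptr observes the count without keeping the object alive.
  std::weak_ptr<HolderModel> observer = aGlobalHolder;
  aGlobalHolder.reset();
  return observer.use_count();
}

// Simulates shutdown with aLeakedReferences extra owners that were never
// cleaned up (for example by objects that were not shut down properly).
long RemainingAfterRelease(std::size_t aLeakedReferences) {
  auto globalHolder = std::make_shared<HolderModel>();
  std::vector<std::shared_ptr<HolderModel>> leaks(aLeakedReferences,
                                                  globalHolder);
  return ReleaseGlobalAndCountRemaining(globalHolder);
}
```

With two leaked owners, RemainingAfterRelease(2) returns 2, which corresponds to the _My_val=2 seen in the dump above; with none it returns 0 and the holder is destroyed as soon as the global reference is released.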

Bas Schouten (:bas.schouten)

Assignee

Comment 61

•

9 years ago

I've confirmed that at this point the ImageBridgeParent singleton is properly destroyed.

Nicolas Silva [:nical]

Comment 62

•

9 years ago

Talked about it with Bas on skype.

Flags: needinfo?(nical.bugzilla)

Bas Schouten (:bas.schouten)

Assignee

Comment 63

•

9 years ago

So I've concluded this is due to an ImageBridgeParent still being alive. Here's the stack that creates the offending ImageBridge:

> xul.dll!mozilla::layers::ImageBridgeParent::ImageBridgeParent(MessageLoop * aLoop, IPC::Channel * aTransport, unsigned long aChildProcessId) Line 74 C++
xul.dll!mozilla::layers::ImageBridgeParent::Create(IPC::Channel * aTransport, unsigned long aChildProcessId) Line 196 C++
xul.dll!mozilla::dom::ContentParent::AllocPImageBridgeParent(IPC::Channel * aTransport, unsigned long aOtherProcess) Line 3163 C++
xul.dll!mozilla::dom::PContentParent::OnMessageReceived(const IPC::Message & msg__) Line 5657 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(const IPC::Message & aMsg) Line 1373 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchMessageW(const IPC::Message & aMsg) Line 1294 C++
xul.dll!mozilla::ipc::MessageChannel::OnMaybeDequeueOne() Line 1266 C++
xul.dll!DispatchToMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void)>(mozilla::ipc::MessageChannel * obj, bool (void) * method, const Tuple0 & arg) Line 388 C++
xul.dll!RunnableMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void),Tuple0>::Run() Line 310 C++
xul.dll!mozilla::ipc::MessageChannel::RefCountedTask::Run() Line 456 C++
xul.dll!mozilla::ipc::MessageChannel::DequeueTask::Run() Line 473 C++
xul.dll!MessageLoop::RunTask(Task * task) Line 365 C++

This ImageBridge is essentially created as a child of a content parent being created.

Bas Schouten (:bas.schouten)

Assignee

Comment 64

•

9 years ago

We've concluded this image bridge is the result of a content process that's created for background thumbnail generation. It appears that the ContentParent for this process and an image bridge are not being shut down properly. There are a couple of message sending errors in the console. It's possible this has something to do with those messages being dropped because we're shutting down, but this is hard to know for sure.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 65

•

9 years ago

Note that some of these hangs would have been crashes prior to bug 1175521. That bug only wallpapered over the crash, but see https://bugzilla.mozilla.org/show_bug.cgi?id=1175521#c7 in particular for perhaps something that can help in this bug.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 67

•

9 years ago

(In reply to Milan Sreckovic [:milan] from comment #53)
> The original report talked about mozmill tests - did we ever run into a
> problem with the debug build?

We are no longer running Mozmill tests. They have been partly replaced with the new Marionette tests, and the coverage is still low. So we haven't seen this particular problem yet. Sorry.

Flags: needinfo?(hskupin)

u279076

Comment 68

•

9 years ago

I vote we close this bug as incomplete and reopen if the issue returns. Any objections?

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 69

•

9 years ago

This is reproducible by Bas. So I don't see why we should close it.

u279076

Comment 70

•

9 years ago

(In reply to Henrik Skupin (:whimboo) from comment #69)
> This is reproducible by Bas. So I don't see why we should close it.

My understanding was that this was only reproducible under Mozmill and if we're no longer running Mozmill then the crash is basically irrelevant. If we have a way to reproduce it outside Mozmill then I agree that we should continue to investigate. However that is not clear to me in reading this bug report.

Robert Kaiser

Comment 71

•

9 years ago

(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #70)
> My understanding was that this was only reproducible under Mozmill and if
> we're no longer running Mozmill then the crash is basically irrelevant. If
> we have a way to reproduce it outside Mozmill then I agree that we should
> continue to investigate. However that is not clear to me in reading this bug
> report.

The signatures in this bug appear quite a bit in crash data from "the wild", so this is surely not irrelevant. I don't know which paths we found for reproducing "in-house", though.

David Weir (satdav)

Updated

•

9 years ago

status-firefox43: --- → ?

tracking-e10s: --- → ?

Jim Mathies [:jimm]

Updated

•

9 years ago

tracking-e10s: ? → -

Bas Schouten (:bas.schouten)

Assignee

Comment 72

•

9 years ago

(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #70)
> (In reply to Henrik Skupin (:whimboo) from comment #69)
> > This is reproducible by Bas. So I don't see why we should close it.
>
> My understanding was that this was only reproducible under Mozmill and if
> we're no longer running Mozmill then the crash is basically irrelevant. If
> we have a way to reproduce it outside Mozmill then I agree that we should
> continue to investigate. However that is not clear to me in reading this bug
> report.

No, I can reproduce this simply by making my content process die fairly early in its creation.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 73

•

9 years ago

Bas, since you can reproduce this, can you look for a fix, or work with :nical on it?

Flags: needinfo?(bas)

(Away)

Comment 74

•

9 years ago

This is currently the #1 crash on aurora 42 so it's definitely happening in the wild.

alex_mayorga

Comment 75

•

9 years ago

Hi! This just bit me today on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0 ID:20150902030229 CSet: fb720c90eb49590ba55bf52a8a4826ffff9f528b bp-4ec5b87f-e076-4260-9349-60e452150902 02/09/2015 12:50 p.m. It crashes while restarting to update Nightly. It was particularly bad because about:sessionrestore was wiped clean, so there was data loss =(

status-firefox43: ? → affected

u279076

Updated

•

9 years ago

Keywords: dataloss

Tracy Walker [:tracy]

Comment 76

•

9 years ago

Combined signatures put this at the #1 crash on Nightly (Fx43).

Milan Sreckovic [:milan] (needinfo for best results)

Comment 77

•

9 years ago

The one from comment 75 is a startup crash? That escalated quickly.

Bas Schouten (:bas.schouten)

Assignee

Comment 78

•

9 years ago

Hrm, I stopped being able to reproduce this on nightly.

Flags: needinfo?(bas)

mkdante381

Comment 79

•

9 years ago

Crash report: https://crash-stats.mozilla.com/report/index/25c50b8e-454f-43a6-ae10-645822150903

mkdante381

Comment 80

•

9 years ago

FF 41 beta 7, x86 build. Add to crash signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown() ] My crash report: https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5-b10a22150905 My bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1201639

Arthur K. (he/him)

Comment 81

•

9 years ago

(In reply to mkdante381 from comment #80) > FF 41 beta7 x86build > Add to crash signature: > [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | > mozilla::ReentrantMonitor::Wait(unsigned int) | > nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, > bool) | mozilla::MediaShutdownManager::Shutdown() ] > > My Crash Report > https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5- > b10a22150905 > > My bug report: > https://bugzilla.mozilla.org/show_bug.cgi?id=1201639 So your crash report shows you're using "AdapterDriverVersion: 15.201.1151.0" which is a beta (15.8) driver. How does it behave using the stable 15.7.1 driver?

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Updated

•

9 years ago

Blocks: 1202375

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 82

•

9 years ago

As mentioned earlier on this bug, the Mozmill tests are dead. But interestingly we now hit the same crash on Windows machines with our Firefox UI Update tests. See bug 1202375 for details. The tests are run in VMs with default software installed. There is no specific graphics driver present besides the one Windows comes with. Crash report: d5e84fc1-60bb-425b-9e21-5e48a2150907 Those crashes happen multiple times a day on different boxes, and I think they might be somewhat reproducible.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Updated

•

9 years ago

Whiteboard: [mozmill][gfx-noted] → [firefox-ui-tests][gfx-noted]

(Away)

Comment 83

•

9 years ago

Bas, is there anything you can do here? Maybe work with Henrik to figure out a repro?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 84

•

9 years ago

Bas, assigning to you, I need :nical to look at something else for the next week or so.

Assignee: nical.bugzilla → bas

Flags: needinfo?(bas)

Bas Schouten (:bas.schouten)

Assignee

Comment 85

•

9 years ago

Most likely this is still the same issue as it was before when I -could- reproduce it, i.e. an ImageBridgeParent not being cleaned up the way it should after a content process crash. I can't reproduce this anymore but it would still at the very least be helpful to know if in the cases where we are seeing this a content process crash has occurred.

Flags: needinfo?(bas)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 86

•

9 years ago

Too late to fix this in 41.

status-firefox41: affected → wontfix

Bas Schouten (:bas.schouten)

Assignee

Comment 87

•

9 years ago

I started being able to reproduce this again. This happens for me currently -most- of the time when the content process crashes early in creation. It seems that on a 'successful' content process crash we get notified of a channel error and our PImageBridgeParent actor and its subtree get successfully destroyed. On an 'unsuccessful' content process crash (i.e. which triggers this bug for me), we never get OnChannelError called on the image bridge parent and as a result it and its subtree just leak.

Bas Schouten (:bas.schouten)

Assignee

Comment 88

•

9 years ago

So, this occurs when things die before the channel ever gets connected. I'm going to suggest a patch that will fix this shutdown hang, but it's very important to realize that in the current situation, when that happens (i.e. a channel never gets connected because a content process dies early), we leak any actors whose channels have not been connected yet. CC'ing Brad to make sure the e10s folks are aware of this happening.

Flags: needinfo?(blassey.bugs)

Bas Schouten (:bas.schouten)

Assignee

Comment 89

•

9 years ago

Attached patch Only acquire a hold on the compositor thread once the channel is connected (deleted)

Attachment #8659871 - Flags: review?(nical.bugzilla)

Bas Schouten (:bas.schouten)

Assignee

Updated

•

9 years ago

Status: NEW → ASSIGNED

Nicolas Silva [:nical]

Updated

•

9 years ago

Attachment #8659871 - Flags: review?(nical.bugzilla) → review+

Pulsebot

Comment 90

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f0fbe3de27cb

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 91

•

9 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/1fd3662ece10 for b2g debug mochitest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=13962777&repo=mozilla-inbound

Flags: needinfo?(bas)

Bas Schouten (:bas.schouten)

Assignee

Comment 92

•

9 years ago

Well that's odd... that should not really be possible... hmmmmm.

Flags: needinfo?(bas)

Bas Schouten (:bas.schouten)

Assignee

Comment 93

•

9 years ago

So.. the try run of this is clear (https://hg.mozilla.org/try/rev/c7c5b82af460) and I looked at a lot of code and can't find out what could possibly cause this. So I can only conclude it might be related to clobbering or something, but it seems odd... I'm going to push this again and will stick around to see what happens. Very sorry to the sheriff if I break things again :).

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

tracking-e10s: - → ?

Flags: needinfo?(blassey.bugs)

Pulsebot

Comment 94

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5be65754c0d0

Carsten Book [:Tomcat]

Comment 95

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/5be65754c0d0

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox43: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla43

Arthur K. (he/him)

Comment 96

•

9 years ago

(In reply to mkdante381 from comment #80) > FF 41 beta7 x86build > Add to crash signature: > [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | > mozilla::ReentrantMonitor::Wait(unsigned int) | > nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, > bool) | mozilla::MediaShutdownManager::Shutdown() ] > > My Crash Report > https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5- > b10a22150905 > > My bug report: > https://bugzilla.mozilla.org/show_bug.cgi?id=1201639 Catalyst 15.9 was released for Linux, so I would guess it'll be released for Windows soon as well. If you're still on the 15.8 beta, it might help in your case if the problem is driver related.

Bas Schouten (:bas.schouten)

Assignee

Comment 97

•

9 years ago

Comment on attachment 8659871 [details] [diff] [review]
Only acquire a hold on the compositor thread once the channel is connected

Approval Request Comment
[Feature/regressing bug #]: OMTC
[User impact if declined]: Shutdown hangs if child process crashes early
[Describe test coverage new/current, TreeHerder]: Nightly
[Risks and why]: Low, merely delaying
[String/UUID change made/needed]: None

Attachment #8659871 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Comment 98

•

9 years ago

Comment on attachment 8659871 [details] [diff] [review] Only acquire a hold on the compositor thread once the channel is connected Fix a shutdown hang, taking it!

Attachment #8659871 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

mkdante381

Comment 99

•

9 years ago

This bug is now fixed for me when I am on a site with an HTML5 video and shut down Firefox from the Australis hamburger menu. The problem remains mainly with Adobe Flash. Steps: 1. Go to YouTube (you must force Flash on YouTube) or another site with Flash content (it does not have to be video). 2. Open a random video or Flash page. 3. Pause the video. 4. Shut down Firefox. Firefox sometimes crashes with a signature starting with "[@ shutdownhang |"; earlier this also happened on sites with only HTML5 video. This is a problem with "plugin-container.exe". On my computer, with an AMD R9 270X and the Catalyst 15.8 beta, Flash is unstable. Sometimes the Flash process is suspended and Firefox then hangs, so I have to kill the "plugin-container.exe" process. After shutting down Firefox from the Australis menu, Firefox sometimes does not kill the "plugin-container.exe" process and then crashes. There is no problem playing HTML5 video now, although earlier the problem also occurred when shutting down Firefox on a site with HTML5 video. I now use this Greasemonkey script: https://greasyfork.org/pl/scripts/5433-force-flash-wmode and I added a preference to Firefox: new > string, Preference name: plugins.force.wmode, Value: direct. Flash is now more stable. The problem is mainly with acceleration in Adobe Flash.

Arthur K. (he/him)

Comment 100

•

9 years ago

(In reply to mkdante381 from comment #99) > Now This bug is fixed, when I am on site with HTML5 movie and Shutdown > Firefox from Hamburger Australis Menu. Problem is still mainly with Adobe > Flash > > GO to Youtube(You must force Flash on YT) or other site with Flash, or site > with content Flash(no movies). > Go to random movie or site with Flash > Pause movie > Shutdown FF > Sometime Firefox crash with signature "[@ shutdownhang |", erlier also on > site with only HTML5 movies > > This is problem with "plugin-container.exe". On my computer with AMD R9 270X > and Catalyst 15.8beta is problem with Adobe Flash. Flash is unstable. > Sometimes Flash process is suspended. Then Firefox is hanging itself. I must > kill process "plugin-container.exe". After shutdown Firefox from Australis > Menu, Firefox sometime not kill process "plugin-coantainer.exe" and FF > crash. No problem with play HTML5 movies, but this problem was earlier, when > shutdown FF on site with html 5 movies. > > Now I use script for Greasemonkey: > https://greasyfork.org/pl/scripts/5433-force-flash-wmode and I added > preference to Firefox: > new > string > Preference name: plugins.force.wmode > Value: direct > > Now Flash is more stable. Problem is mainly with acceleration Adobe Flash So, I will state again, can you repro with the stable 15.7 Catalyst driver? Maybe the 15.8 Catalyst beta driver has a problem.

mkdante381

Comment 101

•

9 years ago

No, with the latest Catalyst 15.7.1 it is even worse. The 15.8 beta fixes the bug "[424127] The Firefox browser may crash while opening multiple tabs (2 or more)" (source: http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx) but does not fix Flash acceleration.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 102

•

9 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/82828c3a72c5

status-firefox42: affected → fixed

Arthur K. (he/him)

Comment 103

•

9 years ago

(In reply to mkdante381 from comment #101) > Nope with latest Catalyst 15.7.1 is even worse. 15.8beta fix bug "[424127] > The Firefox browser may crash while opening multiple tabs (2 or more)" > source: > http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta. > aspx but not fix acceleration flash I don't see them yet released on AMD's site but Catalyst 15.8 seems to have been gleaned by the folks at Station Drivers (http://goo.gl/qRK54c). Give them a try.

Dragana Damjanovic [:dragana]

Updated

•

9 years ago

Blocks: 1207979

Dragana Damjanovic [:dragana]

Updated

•

9 years ago

Blocks: 1208019

Dragana Damjanovic [:dragana]

Updated

•

9 years ago

No longer blocks: 1207979

Dragana Damjanovic [:dragana]

Updated

•

9 years ago

Depends on: 1207979

mkdante381

Comment 104

•

9 years ago

(In reply to Arthur K. from comment #103) > (In reply to mkdante381 from comment #101) > > Nope with latest Catalyst 15.7.1 is even worse. 15.8beta fix bug "[424127] > > The Firefox browser may crash while opening multiple tabs (2 or more)" > > source: > > http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta. > > aspx but not fix acceleration flash > > I don't see them yet released on AMD's site but Catalyst 15.8 seems to have > been gleaned by the folks at Station Drivers (http://goo.gl/qRK54c). Give > them a try. I use latest beta... http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx