Bug 1127270: shutdownhang in mozilla::layers::CompositorParent::ShutDown()
Status: Closed (opened 10 years ago, closed 9 years ago)
Categories: Core :: Graphics: Layers (defect)
People: Reporter: whimboo, Assigned: bas.schouten
References: Depends on 1 open bug
Details: 4 keywords, Whiteboard: [firefox-ui-tests][gfx-noted]
Attachments: 1 file (deleted), patch; nical: review+, Sylvestre: approval-mozilla-aurora+
We see constant shutdown hangs in mozilla::layers::CompositorParent::ShutDown() during our Mozmill tests. They seem to happen mostly on Windows, especially XP. I'm currently working on reducing the tests in question, but it's a bit tricky.
Here is the crash report for the shutdown hang:
bp-50fa6da5-01ab-4ed6-ba0b-6fca22150129.
Teodor, can you please help me and check older release/beta builds of Firefox? It would be good to know when this started.
Reporter | ||
Comment 1•10 years ago
[Tracking Requested - why for this release]:
I see over 2000 crashes with this signature in the last 7 days across supported versions. All of them report this problem on shutdown.
status-firefox36:
--- → affected
status-firefox37:
--- → affected
status-firefox38:
--- → affected
tracking-firefox36:
--- → ?
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked][mozmill]
Comment 2•10 years ago
I think I found the regressor for this crash:
https://hg.mozilla.org/releases/mozilla-beta/rev/521859f9eae2
https://bugzilla.mozilla.org/show_bug.cgi?id=1119941
36.0b1 build2
Comment 3•10 years ago
Here's the pushlog between 36.0b1 build1 and build2:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=1b26127c3323&tochange=521859f9eae2
Reporter | ||
Comment 5•10 years ago
Given that this shutdown hang and crash might be related to Flash protected mode, let's also CC Benjamin.
Comment 6•10 years ago
(In reply to Teodor Druta from comment #2)
> I think I found the regressor for this crash
I don't think that's the right one, unless you have a reproducible test case and actually tested builds and backing out patches one by one.
The shutdownhang|... signatures replaced the "RunWatchdog" signatures of bug 1103833 when bug 1104317 was solved on the crash-stats server side. In turn, the "RunWatchdog" signatures came into being when bug 1038342 was fixed by killing processes that hang for more than 60 seconds on shutdown.
So, all in all, earlier versions would hang there for a long time, while versions starting with 36 crash. Those crashes were previously reported as "RunWatchdog"; since January 21, when the fix for bug 1104317 was pushed live in Socorro production, they have been reported with a "shutdownhang" signature, which actually gives better insight into what was hanging.
Reporter | ||
Comment 7•10 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> (In reply to Teodor Druta from comment #2)
> > I think I found the regressor for this crash
>
> I don't think that's the right one, unless you have a reproducible test case
> and actually tested builds and backing out patches one by one.
Please read my comment 0. It clearly states that we have reproducible steps to trigger this hang, and we know that we didn't crash before. None of the bugs you are referring to has any impact on the hang problem.
Comment 8•10 years ago
The thing that landed between build1 and build2 was a backout, and Flash protected mode is relatively unlikely to be related to this. I don't trust the regression range from comment 2-3.
nical, can you tell from the crash report which is the compositor thread and/or why it's failing to shut down and hanging?
Flags: needinfo?(nical.bugzilla)
Comment 9•10 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #8)
> nical, can you tell from the crash report which is the compositor thread
> and/or why it's failing to shut down and hanging?
I don't know which is the compositor thread. CompositorParent::ShutDown waits (spinning the event loop) until the compositor thread is destroyed. That destruction is triggered when the CompositorThreadHolder is destroyed, which requires both the global sCompositorThreadHolder variable and CompositorParent's mCompositorThreadHolder to be null. The global variable was just set to null in the stack, so it looks like we haven't nulled out CompositorParent's mCompositorThreadHolder. This should have happened in CompositorParent::DeferredDestroy, which is scheduled on the main thread after the compositor thread is done cleaning up. That cleanup happens in CompositorParent::RecvStop, which runs on the compositor thread and is triggered by CompositorChild::SendStop() on the main thread; SendStop() is called by CompositorChild::Destroy, which in turn is called by nsBaseWidget::DestroyCompositor.
What a happy mess :)
The first thing I'd look at is whether we lose the reference to the CompositorChild in nsBaseWidget without calling DestroyCompositor. After that, I'd check whether something could have prevented any of the functions mentioned above from being called.
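The dependency chain described in this comment can be modelled in a few lines. The following is a minimal, single-threaded sketch and not Gecko code: `ThreadHolder`, `Model`, and the `maxSpins` bound are hypothetical names introduced for illustration, `std::shared_ptr` stands in for Gecko's reference counting, and a plain queue stands in for the main-thread event loop. It only shows that the wait in ShutDown can finish once the deferred task has dropped the second reference, and not before.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for CompositorThreadHolder: destroying the last
// reference is what "destroys the compositor thread" in this model.
struct ThreadHolder {
    explicit ThreadHolder(bool* destroyed) : destroyed_(destroyed) {}
    ~ThreadHolder() { *destroyed_ = true; }
    bool* destroyed_;
};

struct Model {
    bool threadDestroyed = false;
    std::shared_ptr<ThreadHolder> globalHolder;  // plays sCompositorThreadHolder
    std::shared_ptr<ThreadHolder> parentHolder;  // plays the CompositorParent reference
    std::queue<std::function<void()>> mainLoop;  // pending main-thread events

    Model() {
        globalHolder = std::make_shared<ThreadHolder>(&threadDestroyed);
        parentHolder = globalHolder;
    }

    // Widget teardown. In the real chain this is Destroy -> SendStop ->
    // RecvStop -> DeferredDestroy; here we only queue the final step.
    void DestroyCompositor() {
        mainLoop.push([this] { parentHolder.reset(); });
    }

    // Drops the global reference, then spins the event loop until the holder
    // is gone. Returns false if it gives up after maxSpins (the "hang").
    bool ShutDown(int maxSpins) {
        globalHolder.reset();
        for (int i = 0; i < maxSpins && !threadDestroyed; ++i) {
            if (mainLoop.empty()) continue;  // nothing to run, keep waiting
            auto task = std::move(mainLoop.front());
            mainLoop.pop();
            task();
        }
        return threadDestroyed;
    }
};
```

If `DestroyCompositor()` is never called, nothing ever clears `parentHolder`, and the loop in `ShutDown` runs to its bound; the real code has no such bound, which matches a hang.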
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 10•10 years ago
Nicolas, would a full minidump be helpful for you?
Flags: needinfo?(nical.bugzilla)
Comment 11•10 years ago
(In reply to Henrik Skupin (:whimboo) from comment #10)
> Nicolas, would a full minidump be helpful for you?
I don't have time to work on this unless I sacrifice other bugs, so unless Bas or Milan want to bump the priority, assume I am not going to fix this in the short term (sorry).
Flags: needinfo?(nical.bugzilla)
Comment 12•10 years ago
Milan or Bas, could you find someone else to work on this? thanks
FYI, beta 6 gtb is today...
Flags: needinfo?(milan)
Flags: needinfo?(bas)
Comment 13•10 years ago
Safe to assume this won't get resolved by beta 6. We don't even seem to know what it is and where it is, and we don't have a regression range we trust.
Flags: needinfo?(milan)
Updated•10 years ago
Flags: needinfo?(bas)
Updated•10 years ago
Summary: Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown() → shutdownhang in mozilla::layers::CompositorParent::ShutDown()
Reporter | ||
Comment 15•10 years ago
(In reply to Milan Sreckovic [:milan] from comment #13)
> Safe to assume this won't get resolved by beta 6. We don't even seem to
> know what it is and where it is, and we don't have a regression range we
> trust.
Milan, can you please have a look at my comment 10? We could provide a full minidump here if that would be of any help. If not, one of us would have to spend some more time reducing the Mozmill test even further. Please let me know if the minidump path would work.
Flags: needinfo?(milan)
Comment 16•10 years ago
Need to clear up other beta bugs first, this is not likely to get looked at in the next couple of days.
Updated•10 years ago
Whiteboard: [qa-automation-blocked][mozmill] → [qa-automation-blocked][mozmill][gfx-noted]
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown()] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait |…
Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown()] → , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, b…
Reporter | ||
Comment 17•10 years ago
The number of crashes has dropped with the last beta release, so I'm going to remove our blocking whiteboard entry for now.
Whiteboard: [qa-automation-blocked][mozmill][gfx-noted] → [mozmill][gfx-noted]
Comment 18•10 years ago
Since it decreased, I am going to mark it as wontfix for 36. It is not tracked for 37. Don't hesitate to submit for tracking if it spikes.
Updated•10 years ago
Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown()] → , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | ntdll.dll@0x3c6bc]
Comment 19•10 years ago
This build hasn't crashed (on shutdown!) for me yet. And I've done it a few times.
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150215030238 CSet: e0cb32a0b1aa
Which is a welcome change. My last crash today was the restart for this build. But I've shutdown three times now, and no crash.
Updated•10 years ago
Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers… → , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::laye…
Comment 20•10 years ago
sorry, wrong bug.
Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom:… → , bool) | mozilla::layers::CompositorParent::ShutDown()]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::Shut…
Comment 21•10 years ago
Tracking for the current release as it is #3.
Updated•10 years ago
Keywords: topcrash-win
Comment 22•10 years ago
[Tracking Requested - why for this release]:
Combined signatures put this in the top 5 on Nightly (Fx40).
Crash Signature: , bool) | mozilla::layers::CompositorParent::ShutDown() ]
[@ shutdownhang | ntdll.dll@0x3c6bc] → , bool) | mozilla::layers::CompositorParent::ShutDown() ]
[@ shutdownhang | ntdll.dll@0x3c6bc]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNext…
status-firefox40:
--- → affected
tracking-firefox40:
--- → ?
Updated•10 years ago
Comment 24•10 years ago
Hello, I am marking this as major because it's a known crash and happens a lot.
PS: I am on Windows 7 and crashed yesterday.
Severity: critical → major
OS: Windows XP → All
Version: 36 Branch → Trunk
Comment 25•10 years ago
Can we also get this into the release notes so people are aware of it?
status-firefox38.0.5:
--- → affected
Flags: needinfo?(milan)
Comment 26•10 years ago
Sorry, I accidentally cleared the needinfo for Milan; I have added it back.
Flags: needinfo?(milan)
Comment 27•10 years ago
Please don't downgrade to major. Also, we know this happens quite a bit; otherwise it would not have a topcrash flag and be marked as tracking for a number of releases, so there is no need to flag it more than that. The real issue is that we need to find out what's really going on there. I think the only way you can really get this fixed faster is to provide us with a scenario that reliably reproduces the issue. So far we haven't heard of any such case.
Severity: major → critical
Comment 28•10 years ago
As with other releases, it is too late for 38.
Updated•10 years ago
QA Whiteboard: [@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | nsThread::Shutdown() ]
Comment 29•10 years ago
Tracy, I don't think the nsThread::Shutdown you added to the whiteboard is the same thing.
Comment 30•10 years ago
heh, didn't mean to add it to whiteboard.
I compared stack signature with another in this bug and they were identical.
Crash Signature: , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t cons ] → , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t cons ]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEve…
QA Whiteboard: [@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | nsThread::Shutdown() ]
Comment 31•10 years ago
This hang is happening 100% of the time for me when I exit the browser with a page open (so there's a content process). This is with a DMD* Linux debug build, running over VNC, with an OSX client. I haven't tried with a non-DMD build yet.
* https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD
Comment 32•10 years ago
It looks like my hang is some kind of fallout from my own patches, so I don't know how useful my being able to reproduce it is, but here's the stack on the compositor thread in case it is useful:
#0 0x00007f70c0cf98bf in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0x00007f70c0cf98bf in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1 0x00007f70ba16c191 in ConditionVariable::Wait (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/condition_variable_posix.cc:40
#2 0x00007f70ba18474e in base::WaitableEvent::TimedWait (this=0x7f70a924e1d8, max_time=...) at /home/amccreight/mc/ipc/chromium/src/base/waitable_event_posix.cc:195
#3 0x00007f70ba18491b in base::WaitableEvent::Wait (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/waitable_event_posix.cc:201
#4 0x00007f70ba17444f in base::MessagePumpDefault::Run (this=0x7f70a924e1c0, delegate=0x7f70a5acdd48) at /home/amccreight/mc/ipc/chromium/src/base/message_pump_default.cc:60
#5 0x00007f70ba1735d4 in RunHandler (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/message_loop.cc:226
#6 MessageLoop::Run (this=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/message_loop.cc:200
#7 0x00007f70ba17e519 in base::Thread::ThreadMain (this=0x7f70a924e160) at /home/amccreight/mc/ipc/chromium/src/base/thread.cc:170
#8 0x00007f70ba17e76f in ThreadFunc (closure=0x7f70a5acdbbc) at /home/amccreight/mc/ipc/chromium/src/base/platform_thread_posix.cc:39
That does not look useful but you never know.
Comment 33•10 years ago
Bill and I looked at this a little bit, but we weren't able to figure much out. Something is going wrong in the sequence of messages back and forth between the parent and child process, so the child doesn't shut down, but the ContentParent either gets far enough or doesn't notice the failure so it removes the xpcom-shutdown observer before shutdown, and thus never kills the nonresponsive child.
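The failure mode described above, where the shutdown observer is removed too early and the stuck child is therefore never killed, can be illustrated with a small sketch. This is an assumption-laden model, not the ContentParent implementation: `ObserverService`, `ChildProcess`, `ParentModel`, and `BelieveChildIsGone` are hypothetical names, and only the "xpcom-shutdown" topic string comes from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical topic-based observer service.
class ObserverService {
    struct Entry {
        int id;
        std::string topic;
        std::function<void()> cb;
    };
    std::vector<Entry> entries_;
    int nextId_ = 0;

public:
    int Add(const std::string& topic, std::function<void()> cb) {
        entries_.push_back({nextId_, topic, std::move(cb)});
        return nextId_++;
    }
    void Remove(int id) {
        entries_.erase(std::remove_if(entries_.begin(), entries_.end(),
                                      [id](const Entry& e) { return e.id == id; }),
                       entries_.end());
    }
    void Notify(const std::string& topic) {
        auto snapshot = entries_;  // tolerate removal during notification
        for (auto& e : snapshot) {
            if (e.topic == topic) e.cb();
        }
    }
};

struct ChildProcess {
    bool alive = true;  // a nonresponsive child stays alive until killed
};

// Parent-side object that is supposed to kill a stuck child at shutdown.
struct ParentModel {
    ObserverService& obs;
    ChildProcess& child;
    int id;

    ParentModel(ObserverService& o, ChildProcess& c) : obs(o), child(c) {
        id = obs.Add("xpcom-shutdown", [this] { child.alive = false; });
    }

    // Models the parent wrongly concluding that teardown has completed and
    // unregistering before the shutdown notification arrives.
    void BelieveChildIsGone() { obs.Remove(id); }
};
```

In this model, a parent that unregisters early leaves `child.alive` set after `Notify("xpcom-shutdown")`, which corresponds to the child process outliving shutdown.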
My steps to reproduce are something like:
1. Make a debug build, and also add ac_add_options --enable-dmd
2. Start the browser with DMD on like this: ./mach run --dmd --mode=live --sample-below=1
3. Open a random webpage (I used http://news.ycombinator.com/ but maybe it doesn't matter). Let it load at least a little bit.
4. Exit.
That hangs around 95% of the time for me.
Comment 34•10 years ago
The DMD changes there shouldn't affect anything except the performance, making it a good amount slower, so presumably there's some kind of race condition.
Comment 35•10 years ago
Lee, if you follow instructions in comment 33, can you reproduce?
Flags: needinfo?(milan) → needinfo?(lsalzman)
Comment 36•10 years ago
(In reply to Milan Sreckovic [:milan] from comment #35)
> Lee, if you follow instructions in comment 33, can you reproduce?
I am having trouble reproducing this following those instructions. I'm not seeing any hangs with DMD enabled.
Updated•10 years ago
Flags: needinfo?(lsalzman)
This could still squeak into 39 but we are heading into beta 4 now.
It looks like people have had a crack at fixing it several times and Andrew has a good possible way to reproduce a related crash.
Milan, I realize there may be other higher-priority issues, but we should come back to this and not let it drop. I'll keep tracking this for 39 for the moment.
Flags: needinfo?(milan)
Comment 38•9 years ago
It kind of feels like the issue I'm seeing is e10s-specific, and thus unrelated to whatever is happening on release. But it is hard to know.
Agreed. I'll assign to :nical just in case he can get to it; we do want a strong 39.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(milan)
Comment 40•9 years ago
bp-feaa716f-75c4-45a2-b146-7b5d72150619
19/06/2015 11:17 a.m.
Crashing Thread
Frame Module Signature Source
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp
1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
3 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 msvcr120.dll _threadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:354
5 kernel32.dll BaseThreadInitThunk
6 ntdll.dll RtlUserThreadStart
7 kernel32.dll BasepReportFault
8 kernel32.dll BasepReportFault
status-firefox41:
--- → affected
(In reply to alex_mayorga from comment #40)
> bp-feaa716f-75c4-45a2-b146-7b5d72150619
> 19/06/2015 11:17 a.m.
>
> Crashing Thread
> Frame Module Signature Source
> 0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*)
> toolkit/components/terminator/nsTerminator.cpp
> 1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
> 2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
> 3 msvcr120.dll _callthreadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:376
> 4 msvcr120.dll _threadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:354
> 5 kernel32.dll BaseThreadInitThunk
> 6 ntdll.dll RtlUserThreadStart
> 7 kernel32.dll BasepReportFault
> 8 kernel32.dll BasepReportFault
Alex, did you have this as a startup crash? Weird, the crash report itself shows a session of more than an hour. Does the startup crash persist? Does safe mode work?
Comment 42•9 years ago
(In reply to alex_mayorga from comment #40)
> bp-feaa716f-75c4-45a2-b146-7b5d72150619
> 19/06/2015 11:17 a.m.
>
> Crashing Thread
> Frame Module Signature Source
> 0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*)
> toolkit/components/terminator/nsTerminator.cpp
> 1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
> 2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
> 3 msvcr120.dll _callthreadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:376
> 4 msvcr120.dll _threadstartex
> f:\dd\vctools\crt\crtw32\startup\threadex.c:354
> 5 kernel32.dll BaseThreadInitThunk
> 6 ntdll.dll RtlUserThreadStart
> 7 kernel32.dll BasepReportFault
> 8 kernel32.dll BasepReportFault
Seeing as how the crash might be related to your Intel VGA driver (8.15.10.2696, from 2013), could you try updating to the current version (15.33.36.64.4226, June 2015 via https://goo.gl/np2lji) and see if it still crashes?
Alex, did this just start happening for you with the nightly? Because if it did, and you have time, before you update the driver, it would be beyond awesome if you could run mozregression (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/Existing_Tools#MozRegression) to help us find out exactly when it started happening.
Comment 44•9 years ago
¡Hola Milan!
It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
bp-31608b8c-69d0-49ac-8085-4c6002150625
I'm not entirely sure of STR though...
Is this the tab crash that happens when I shutdown the computer without closing Nightly first?
Flags: needinfo?(milan)
Comment 45•9 years ago
(In reply to alex_mayorga from comment #44)
> ¡Hola Milan!
>
> It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
>
> bp-31608b8c-69d0-49ac-8085-4c6002150625
>
> I'm not entirely sure of STR though...
>
> Is this the tab crash that happens when I shutdown the computer without
> closing Nightly first?
Did you try updating your Intel VGA driver as I mentioned in comment 42?
Comment 46•9 years ago
(In reply to Arthur K. from comment #45)
> (In reply to alex_mayorga from comment #44)
> > ¡Hola Milan!
> >
> > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> >
> > bp-31608b8c-69d0-49ac-8085-4c6002150625
> >
> > I'm not entirely sure of STR though...
> >
> > Is this the tab crash that happens when I shutdown the computer without
> > closing Nightly first?
>
> Did you try updating your Intel VGA driver as I mentioned in comment 42?
¡Hola Arthur!
I just tried updating with win64_153336.exe and got the following very uninformative message:
"Error
This computer does not meet the minimum requirements for installing the software.
<OK>"
=(
Comment 47•9 years ago
(In reply to alex_mayorga from comment #46)
> (In reply to Arthur K. from comment #45)
> > (In reply to alex_mayorga from comment #44)
> > > ¡Hola Milan!
> > >
> > > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> > >
> > > bp-31608b8c-69d0-49ac-8085-4c6002150625
> > >
> > > I'm not entirely sure of STR though...
> > >
> > > Is this the tab crash that happens when I shutdown the computer without
> > > closing Nightly first?
> >
> > Did you try updating your Intel VGA driver as I mentioned in comment 42?
>
> ¡Hola Arthur!
>
> I just tried updating with win64_153336.exe and got the following very
> uninformative message:
>
> "Error
> This computer does not meet the minimum requirements for installing the
> software.
> <OK>"
>
> =(
Hmm, based on your crash report DeviceID and what your old driver said, it should have been right. Can you please grab GPU-Z 0.8.4 and tell me what it says in the Name area? Also, what does it say in Display Adapter under Device Manager?
Comment 48•9 years ago
(In reply to alex_mayorga from comment #46)
> (In reply to Arthur K. from comment #45)
> > (In reply to alex_mayorga from comment #44)
> > > ¡Hola Milan!
> > >
> > > It seems to still be a thing on Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> > > rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150625030202 CSet: 0b2f5e8b7be5
> > >
> > > bp-31608b8c-69d0-49ac-8085-4c6002150625
> > >
> > > I'm not entirely sure of STR though...
> > >
> > > Is this the tab crash that happens when I shutdown the computer without
> > > closing Nightly first?
> >
> > Did you try updating your Intel VGA driver as I mentioned in comment 42?
>
> ¡Hola Arthur!
>
> I just tried updating with win64_153336.exe and got the following very
> uninformative message:
>
> "Error
> This computer does not meet the minimum requirements for installing the
> software.
> <OK>"
>
> =(
Well, originally I thought this was an HD4000 but it seems to be an HD3000. Try these drivers please: https://goo.gl/2SCvaR
Comment 49•9 years ago
¡Hola Arthur!
win64_152824.exe did work
I disobeyed the installer and left Nightly running during the update.
This resulted in the following crash:
Report ID Date Submitted
bp-4388b86f-bd1b-4444-a05c-be5682150625
25/06/2015 04:04 p.m.
That is seemingly https://bugzilla.mozilla.org/show_bug.cgi?id=1133623
Wontfixing for 39. This recent activity appears to be on 41.
Comment 51•9 years ago
I pressed the "Update Nightly" in the "hamburger-menu". This caused this crash:
https://crash-stats.mozilla.com/report/index/7ea1fd3e-9f3f-41ad-b7e4-e24842150710
Comment 52•9 years ago
[Tracking Requested - why for this release]: See above comment, it happened for 42.0a1.
Note I had e10s on.
tracking-firefox42:
--- → ?
Updated•9 years ago
status-firefox42:
--- → ?
The original report talked about mozmill tests - did we ever run into a problem with the debug build?
Flags: needinfo?(milan) → needinfo?(hskupin)
Comment 54•9 years ago
Tracking for 42 as it is a top crash... but I am unhappy that we have been tracking it since 36...
Wontfix for 40 as I don't think we will have a fix in time for this release...
(In reply to Sylvestre Ledru [:sylvestre] PTO => July 10th from comment #54)
> Tracking for 42 as it is a top crash... but I am unhappy that we have been
> tracking it since 36...
> Wontfix for 40 as I don't think we will have a fix in time for this
> release...
Agreed, but we don't know how to fix it.
Comment 56•9 years ago
For me it varies a lot. It stopped happening maybe yesterday, but it had been hanging/crashing for several weeks before that. Before those weeks it didn't happen for a long time, and before that it also happened for a while.
Comment 57•9 years ago
Current signatures on aurora:
#3 shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown()
#6 shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown()
#12 shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::layers::CompositorParent::ShutDown()
Comment 58•9 years ago
Those were from the browser process list.
Assignee | ||
Comment 59•9 years ago
Nical, could there be something like an image bridge that's still holding onto the CompositorThreadHolder?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 60•9 years ago
I managed to reproduce this problem locally with fairly high reliability. I added a printf that shows me the address of sCompositorThreadHolder before we null it out, and this is where it gets interesting. When we're in the hung state, here's the data on sCompositorThreadHolder:
- (CompositorThreadHolder*)0x10998660 0x10998660 {mRefCnt={mValue={...} } mHelperForMainThreadDestruction={...} mCompositorThread=0x109d8520 {...} } mozilla::layers::CompositorThreadHolder *
- mRefCnt {mValue={...} } mozilla::ThreadSafeAutoRefCnt
- mValue {...} mozilla::Atomic<unsigned int,2,void>
- mozilla::detail::AtomicBaseIncDec<unsigned int,2> {...} mozilla::detail::AtomicBaseIncDec<unsigned int,2>
- mozilla::detail::AtomicBase<unsigned int,2> {mValue={...} } mozilla::detail::AtomicBase<unsigned int,2>
- mValue {...} std::atomic<unsigned int>
- std::atomic_uint {_My_val=2 } std::atomic_uint
_My_val 2 unsigned long
mHelperForMainThreadDestruction {...} mozilla::layers::HelperForMainThreadDestruction
+ mCompositorThread 0x109d8520 {startup_data_=0x004fbc30 {options={message_loop_type=??? stack_size=??? transient_hang_timeout=...} ...} ...} base::Thread * const
In other words, there are two references to the CompositorThreadHolder lying around somewhere that are not being cleaned up. From there on, it is no surprise that this hang occurs.
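One way to find out who still holds such references is to record an owner label on every AddRef and Release. The sketch below is a hypothetical debugging aid and not something taken from Gecko: `TrackedHolder`, its methods, and the owner labels are all made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical refcounted object that remembers which owners hold references.
class TrackedHolder {
public:
    void AddRef(const std::string& owner) { ++owners_[owner]; }

    void Release(const std::string& owner) {
        auto it = owners_.find(owner);
        if (it == owners_.end()) return;  // unbalanced release, ignored in this sketch
        if (--it->second == 0) owners_.erase(it);
    }

    // Total outstanding references across all owners.
    int RefCount() const {
        int total = 0;
        for (const auto& kv : owners_) total += kv.second;
        return total;
    }

    // Owners that still hold at least one reference, in sorted order.
    std::vector<std::string> LiveOwners() const {
        std::vector<std::string> names;
        for (const auto& kv : owners_) names.push_back(kv.first);
        return names;
    }

private:
    std::map<std::string, int> owners_;
};
```

With a scheme like this, inspecting `LiveOwners()` at the point where the hang is detected would name the remaining holders directly, instead of showing only a bare count of 2.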
Assignee | ||
Comment 61•9 years ago
I've confirmed that at this point the ImageBridgeParent singleton is properly destroyed.
Assignee | ||
Comment 63•9 years ago
So I've concluded this is due to an ImageBridgeParent still being alive.
Here's the stack that creates the offending ImageBridge:
> xul.dll!mozilla::layers::ImageBridgeParent::ImageBridgeParent(MessageLoop * aLoop, IPC::Channel * aTransport, unsigned long aChildProcessId) Line 74 C++
xul.dll!mozilla::layers::ImageBridgeParent::Create(IPC::Channel * aTransport, unsigned long aChildProcessId) Line 196 C++
xul.dll!mozilla::dom::ContentParent::AllocPImageBridgeParent(IPC::Channel * aTransport, unsigned long aOtherProcess) Line 3163 C++
xul.dll!mozilla::dom::PContentParent::OnMessageReceived(const IPC::Message & msg__) Line 5657 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(const IPC::Message & aMsg) Line 1373 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchMessageW(const IPC::Message & aMsg) Line 1294 C++
xul.dll!mozilla::ipc::MessageChannel::OnMaybeDequeueOne() Line 1266 C++
xul.dll!DispatchToMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void)>(mozilla::ipc::MessageChannel * obj, bool (void) * method, const Tuple0 & arg) Line 388 C++
xul.dll!RunnableMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void),Tuple0>::Run() Line 310 C++
xul.dll!mozilla::ipc::MessageChannel::RefCountedTask::Run() Line 456 C++
xul.dll!mozilla::ipc::MessageChannel::DequeueTask::Run() Line 473 C++
xul.dll!MessageLoop::RunTask(Task * task) Line 365 C++
This ImageBridge is essentially created as a child of a content parent being created.
Assignee | ||
Comment 64•9 years ago
We've concluded this image bridge is the result of a content process that's created for background thumbnail generation. It appears that the ContentParent for this process and an image bridge are not being shut down properly. There are a couple of message-sending errors in the console. It's possible this has something to do with those messages being dropped because we're shutting down, but that is hard to know for sure.
Note that some of these hangs would have been crashes prior to bug 1175521. That bug only wallpapered over the crash, but see https://bugzilla.mozilla.org/show_bug.cgi?id=1175521#c7 in particular for perhaps something that can help in this bug.
Reporter | ||
Comment 67•9 years ago
(In reply to Milan Sreckovic [:milan] from comment #53)
> The original report talked about mozmill tests - did we ever run into a
> problem with the debug build?
We are no longer running Mozmill tests. They have been partly replaced with the new Marionette tests, and the coverage is still low. So we haven't seen this particular problem yet. Sorry.
Flags: needinfo?(hskupin)
Comment 68•9 years ago
I vote we close this bug as incomplete and reopen if the issue returns. Any objections?
Reporter | ||
Comment 69•9 years ago
This is reproducible by Bas. So I don't see why we should close it.
Comment 70•9 years ago
(In reply to Henrik Skupin (:whimboo) from comment #69)
> This is reproducible by Bas. So I don't see why we should close it.
My understanding was that this was only reproducible under Mozmill and if we're no longer running Mozmill then the crash is basically irrelevant. If we have a way to reproduce it outside Mozmill then I agree that we should continue to investigate. However that is not clear to me in reading this bug report.
Comment 71•9 years ago
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #70)
> My understanding was that this was only reproducible under Mozmill and if
> we're no longer running Mozmill then the crash is basically irrelevant. If
> we have a way to reproduce it outside Mozmill then I agree that we should
> continue to investigate. However that is not clear to me in reading this bug
> report.
The signatures in this bug appear quite a bit in crash data from "the wild", so this is surely not irrelevant. I don't know which paths we found for reproducing "in-house", though.
Updated•9 years ago
status-firefox43:
--- → ?
tracking-e10s:
--- → ?
Updated•9 years ago
Assignee | ||
Comment 72•9 years ago
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #70)
> (In reply to Henrik Skupin (:whimboo) from comment #69)
> > This is reproducible by Bas. So I don't see why we should close it.
>
> My understanding was that this was only reproducible under Mozmill and if
> we're no longer running Mozmill then the crash is basically irrelevant. If
> we have a way to reproduce it outside Mozmill then I agree that we should
> continue to investigate. However that is not clear to me in reading this bug
> report.
No, I can reproduce this simply by making my content process die fairly early in its creation.
Bas, since you can reproduce this, can you look for a fix, or work with :nical on it?
Flags: needinfo?(bas)
Comment 74•9 years ago
|
||
This is currently the #1 crash on aurora 42 so it's definitely happening in the wild.
Comment 75•9 years ago
|
||
Hello!
This just bit me today on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0 ID:20150902030229 CSet: fb720c90eb49590ba55bf52a8a4826ffff9f528b
bp-4ec5b87f-e076-4260-9349-60e452150902
02/09/2015 12:50 p.m.
Crashes while restarting to update Nightly.
It was particularly bad, as about:sessionrestore was wiped clean, so there was data loss =(
Comment 76•9 years ago
|
||
Combined signatures put this at the #1 crash on Nightly (Fx43).
The one from comment 75 is a startup crash? That escalated quickly.
Assignee | ||
Comment 78•9 years ago
|
||
Hrm, I stopped being able to reproduce this on nightly.
Flags: needinfo?(bas)
Comment 79•9 years ago
|
||
Comment 80•9 years ago
|
||
FF 41 beta7 x86build
Add to crash signature:
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown() ]
My Crash Report
https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5-b10a22150905
My bug report:
https://bugzilla.mozilla.org/show_bug.cgi?id=1201639
Comment 81•9 years ago
|
||
(In reply to mkdante381 from comment #80)
> FF 41 beta7 x86build
> Add to crash signature:
> [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait |
> mozilla::ReentrantMonitor::Wait(unsigned int) |
> nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*,
> bool) | mozilla::MediaShutdownManager::Shutdown() ]
>
> My Crash Report
> https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5-b10a22150905
>
> My bug report:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1201639
So your crash report shows you're using "AdapterDriverVersion: 15.201.1151.0" which is a beta (15.8) driver. How does it behave using the stable 15.7.1 driver?
Reporter | ||
Comment 82•9 years ago
|
||
As mentioned earlier in this bug, the Mozmill tests are dead. Interestingly, though, we now hit the same crash on Windows machines with our Firefox UI Update tests. See bug 1202375 for details. The tests run in VMs with default software installed. There is no specific graphics driver present besides the one Windows comes with.
Crash report: d5e84fc1-60bb-425b-9e21-5e48a2150907
Those crashes happen multiple times a day on different boxes, and I think they might be somewhat reproducible.
Reporter | ||
Updated•9 years ago
|
Whiteboard: [mozmill][gfx-noted] → [firefox-ui-tests][gfx-noted]
Comment 83•9 years ago
|
||
Bas, is there anything you can do here? Maybe work with Henrik to figure out a repro?
Bas, assigning to you, I need :nical to look at something else for the next week or so.
Assignee: nical.bugzilla → bas
Flags: needinfo?(bas)
Assignee | ||
Comment 85•9 years ago
|
||
Most likely this is still the same issue as it was before when I -could- reproduce it, i.e. an ImageBridgeParent not being cleaned up the way it should after a content process crash.
I can't reproduce this anymore, but it would at the very least be helpful to know whether a content process crash has occurred in the cases where we are seeing this.
Flags: needinfo?(bas)
Too late to fix this in 41.
Assignee | ||
Comment 87•9 years ago
|
||
I started being able to reproduce this again. This happens for me currently -most- of the time when the content process crashes early in creation.
It seems that on a 'successful' content process crash we get notified of a channel error and our PImageBridgeParent actor and its subtree get successfully destroyed.
On an 'unsuccessful' content process crash (i.e. which triggers this bug for me), we never get OnChannelError called on the image bridge parent and as a result it and its subtree just leak.
Assignee | ||
Comment 88•9 years ago
|
||
So, this occurs when things die before the channel ever gets connected. I'm going to suggest a patch that will fix this shutdown hang, but it's very important to realize that in the current situation, when that happens (i.e. a channel never gets connected because a content process dies early), we leak any actors whose channels have not been connected yet. CC'ing Brad to make sure the e10s folks are aware of this happening.
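The failure mode described here can be illustrated with a small standalone C++ model. This is only a sketch under assumptions, not the attached patch and not Gecko code: `CompositorThreadModel`, `BridgeParentModel`, `HoldPolicy` and both helper functions are hypothetical names made up for illustration. The model assumes that shutdown can only complete once every hold on the compositor thread has been released, and that an error callback is only delivered for a channel that was actually connected.

```cpp
#include <cassert>

// Toy model only: none of these types are Gecko classes.
// The compositor thread may finish shutting down only when no holds remain.
struct CompositorThreadModel {
  int holds = 0;
  void AcquireHold() { ++holds; }
  void ReleaseHold() {
    assert(holds > 0);
    --holds;
  }
  bool CanFinishShutdown() const { return holds == 0; }
};

// When the (hypothetical) bridge parent takes its hold on the thread.
enum class HoldPolicy { AtConstruction, AtChannelConnect };

class BridgeParentModel {
 public:
  BridgeParentModel(CompositorThreadModel& thread, HoldPolicy policy)
      : mThread(thread), mPolicy(policy) {
    if (mPolicy == HoldPolicy::AtConstruction) {
      Acquire();
    }
  }

  // Called once the IPC channel to the child is actually connected.
  void OnChannelConnected() {
    if (mPolicy == HoldPolicy::AtChannelConnect) {
      Acquire();
    }
  }

  // Called when a connected channel reports an error (the child died).
  void OnChannelError() { Release(); }

 private:
  void Acquire() {
    if (!mHolding) {
      mThread.AcquireHold();
      mHolding = true;
    }
  }
  void Release() {
    if (mHolding) {
      mThread.ReleaseHold();
      mHolding = false;
    }
  }

  CompositorThreadModel& mThread;
  HoldPolicy mPolicy;
  bool mHolding = false;
};

// Child dies before the channel connects: no connect, no error callback.
bool ShutdownCompletesAfterEarlyCrash(HoldPolicy policy) {
  CompositorThreadModel thread;
  BridgeParentModel bridge(thread, policy);
  return thread.CanFinishShutdown();
}

// Child dies after the channel connected: the error callback is delivered.
bool ShutdownCompletesAfterLateCrash(HoldPolicy policy) {
  CompositorThreadModel thread;
  BridgeParentModel bridge(thread, policy);
  bridge.OnChannelConnected();
  bridge.OnChannelError();
  return thread.CanFinishShutdown();
}
```

In this model, `ShutdownCompletesAfterEarlyCrash(HoldPolicy::AtConstruction)` returns false because the hold taken in the constructor is never released, while the `AtChannelConnect` policy lets shutdown complete in both scenarios. Note that the model says nothing about the leaked actors themselves, which remain a separate concern.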
Flags: needinfo?(blassey.bugs)
Assignee | ||
Comment 89•9 years ago
|
||
Attachment #8659871 -
Flags: review?(nical.bugzilla)
Assignee | ||
Updated•9 years ago
|
Status: NEW → ASSIGNED
Updated•9 years ago
|
Attachment #8659871 -
Flags: review?(nical.bugzilla) → review+
Comment 90•9 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/1fd3662ece10 for b2g debug mochitest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=13962777&repo=mozilla-inbound
Flags: needinfo?(bas)
Assignee | ||
Comment 92•9 years ago
|
||
Well that's odd... that should not really be possible... hmmmmm.
Flags: needinfo?(bas)
Assignee | ||
Comment 93•9 years ago
|
||
So.. the try run of this is clear (https://hg.mozilla.org/try/rev/c7c5b82af460) and I looked at a lot of code and can't find out what could possibly cause this. So I can only conclude it might be related to clobbering or something, but it seems odd... I'm going to push this again and will stick around to see what happens. Very sorry to the sheriff if I break things again :).
Updated•9 years ago
|
Flags: needinfo?(blassey.bugs)
Comment 94•9 years ago
|
||
Comment 95•9 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Comment 96•9 years ago
|
||
(In reply to mkdante381 from comment #80)
> FF 41 beta7 x86build
> Add to crash signature:
> [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait |
> mozilla::ReentrantMonitor::Wait(unsigned int) |
> nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*,
> bool) | mozilla::MediaShutdownManager::Shutdown() ]
>
> My Crash Report
> https://crash-stats.mozilla.com/report/index/ba869ad3-7266-48d6-96d5-b10a22150905
>
> My bug report:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1201639
Catalyst 15.9 was released for Linux, so I would guess it'll be released for Windows soon as well. If you're still on the 15.8 beta, it might help in your case if this is driver related.
Assignee | ||
Comment 97•9 years ago
|
||
Comment on attachment 8659871 [details] [diff] [review]
Only acquire a hold on the compositor thread once the channel is connected
Approval Request Comment
[Feature/regressing bug #]: OMTC
[User impact if declined]: Shutdown hangs if child process crashes early
[Describe test coverage new/current, TreeHerder]: Nightly
[Risks and why]: Low, merely delaying
[String/UUID change made/needed]: None
Attachment #8659871 -
Flags: approval-mozilla-aurora?
Comment 98•9 years ago
|
||
Comment on attachment 8659871 [details] [diff] [review]
Only acquire a hold on the compositor thread once the channel is connected
Fix a shutdown hang, taking it!
Attachment #8659871 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 99•9 years ago
|
||
This bug is now fixed for me when I am on a site with an HTML5 video and shut down Firefox from the Australis hamburger menu. The problem remains mainly with Adobe Flash.
Go to YouTube (you must force Flash on YT) or another site with Flash, or a site with Flash content (no videos).
Go to a random video or site with Flash.
Pause the video.
Shut down FF.
Sometimes Firefox crashes with the signature "[@ shutdownhang |"; earlier this also happened on sites with only HTML5 videos.
This is a problem with "plugin-container.exe". On my computer, with an AMD R9 270X and Catalyst 15.8 beta, Adobe Flash is unstable. Sometimes the Flash process is suspended and then Firefox hangs, so I must kill the "plugin-container.exe" process. After shutting down Firefox from the Australis menu, Firefox sometimes does not kill the "plugin-container.exe" process, and FF crashes. There is no problem playing HTML5 videos now, but earlier this problem also occurred when shutting down FF on a site with HTML5 videos.
I now use a Greasemonkey script, https://greasyfork.org/pl/scripts/5433-force-flash-wmode, and I added a preference to Firefox:
new > string
Preference name: plugins.force.wmode
Value: direct
Now Flash is more stable. The problem is mainly with Adobe Flash acceleration.
Comment 100•9 years ago
|
||
(In reply to mkdante381 from comment #99)
> This bug is now fixed for me when I am on a site with an HTML5 video and
> shut down Firefox from the Australis hamburger menu. The problem remains
> mainly with Adobe Flash.
>
> Go to YouTube (you must force Flash on YT) or another site with Flash, or a
> site with Flash content (no videos).
> Go to a random video or site with Flash.
> Pause the video.
> Shut down FF.
> Sometimes Firefox crashes with the signature "[@ shutdownhang |"; earlier
> this also happened on sites with only HTML5 videos.
>
> This is a problem with "plugin-container.exe". On my computer, with an AMD
> R9 270X and Catalyst 15.8 beta, Adobe Flash is unstable. Sometimes the Flash
> process is suspended and then Firefox hangs, so I must kill the
> "plugin-container.exe" process. After shutting down Firefox from the
> Australis menu, Firefox sometimes does not kill the "plugin-container.exe"
> process, and FF crashes. There is no problem playing HTML5 videos now, but
> earlier this problem also occurred when shutting down FF on a site with
> HTML5 videos.
>
> I now use a Greasemonkey script,
> https://greasyfork.org/pl/scripts/5433-force-flash-wmode, and I added a
> preference to Firefox:
> new > string
> Preference name: plugins.force.wmode
> Value: direct
>
> Now Flash is more stable. The problem is mainly with Adobe Flash
> acceleration.
So, I will state again, can you repro with the stable 15.7 Catalyst driver? Maybe the 15.8 Catalyst beta driver has a problem.
Comment 101•9 years ago
|
||
Nope, with the latest Catalyst 15.7.1 it is even worse. The 15.8 beta fixes bug "[424127] The Firefox browser may crash while opening multiple tabs (2 or more)" (source: http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx) but does not fix Flash acceleration.
Comment 103•9 years ago
|
||
(In reply to mkdante381 from comment #101)
> Nope, with the latest Catalyst 15.7.1 it is even worse. The 15.8 beta fixes
> bug "[424127] The Firefox browser may crash while opening multiple tabs (2
> or more)" (source:
> http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx)
> but does not fix Flash acceleration.
I don't see them yet released on AMD's site but Catalyst 15.8 seems to have been gleaned by the folks at Station Drivers (http://goo.gl/qRK54c). Give them a try.
Comment 104•9 years ago
|
||
(In reply to Arthur K. from comment #103)
> (In reply to mkdante381 from comment #101)
> > Nope, with the latest Catalyst 15.7.1 it is even worse. The 15.8 beta fixes
> > bug "[424127] The Firefox browser may crash while opening multiple tabs (2
> > or more)" (source:
> > http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx)
> > but does not fix Flash acceleration.
>
> I don't see them yet released on AMD's site but Catalyst 15.8 seems to have
> been gleaned by the folks at Station Drivers (http://goo.gl/qRK54c). Give
> them a try.
I use latest beta...
http://support.amd.com/en-us/kb-articles/Pages/latest-catalyst-windows-beta.aspx