Closed
Bug 1138520
Opened 10 years ago
Closed 9 years ago
shutdownhang in mozilla::dom::ContentParent::Observe
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
People
(Reporter: kairo, Assigned: billm)
References
Details
(Keywords: crash, topcrash-win)
Crash Data
Attachments
(2 files)
(deleted),
patch
|
jchen
:
review+
billm
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jchen
:
review+
lizzard
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-0112a5f7-4dad-492c-985b-1b17c2150226.
=============================================================
Top frames - Thread 0:
Frame Module Signature Source
0 ntdll.dll KiFastSystemCallRet
1 ntdll.dll NtWaitForSingleObject
2 kernelbase.dll WaitForSingleObjectEx
3 kernel32.dll WaitForSingleObjectExImplementation
4 kernel32.dll WaitForSingleObject
5 nss3.dll PR_Wait nsprpub/pr/src/threads/prmon.c
6 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
7 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp
8 xul.dll mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t const*) dom/ipc/ContentParent.cpp
9 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverList.cpp
10 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverService.cpp
11 xul.dll nsXREDirProvider::DoShutdown() toolkit/xre/nsXREDirProvider.cpp
12 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup() toolkit/xre/nsAppRunner.cpp
13 xul.dll ScopedXPCOMStartup::`scalar deleting destructor'(unsigned int)
14 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp
15 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp
[...]
More reports and stats at https://crash-stats.mozilla.com/report/list?signature=shutdownhang+%7C+WaitForSingleObjectEx+%7C+WaitForSingleObject+%7C+PR_Wait+%7C+nsThread%3A%3AProcessNextEvent%28bool%2C+bool%2A%29+%7C+NS_ProcessNextEvent%28nsIThread%2A%2C+bool%29+%7C+mozilla%3A%3Adom%3A%3AContentParent%3A%3AObserve%28nsISupports%2A%2C+char+const%2A%2C+wchar_t+const%2A%29
This is the #11 crash in Firefox 37.0b1 and the second-most common shutdownhang in that build (after bug 1124880).
The reports come from all Windows versions with a bit more emphasis on XP than usual, I'm at a loss to find any data that could help narrow this down, unfortunately.
Comment 1•10 years ago
|
||
[Tracking Requested - why for this release]:
Odd that this is a content process but not e10s, so it's likely to be the thumbnailing content process.
blame of http://hg.mozilla.org/releases/mozilla-beta/annotate/276fd03ae5ac/dom/ipc/ContentParent.cpp#l2776 has this on billm: we're in the while (mIPCOpen) loop for a long time/forever.
Assignee | ||
Comment 2•10 years ago
|
||
Is there any way we could get stacks for the content processes as well? It's possible that the code here is busted, but more likely the content process is failing to shut down.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 3•10 years ago
|
||
Although we have a 5 second timer that's supposed to kill content processes at shutdown, so something clearly isn't going right. I assume that this shutdown hang detector triggers after more than 5 seconds?
Flags: needinfo?(benjamin)
Comment 5•10 years ago
|
||
We can't get content process stacks from socorro because we're not even sure that the process still exists. You'd have to write explicit code to try and collect the content process minidump.
Assignee | ||
Comment 6•10 years ago
|
||
I guess the real problem is that the timer isn't doing its job. I'll look into that first.
Just our of general interest, where is the code for the shutdown hang detection?
Comment 7•10 years ago
|
||
[Tracking Requested - why for this release]: Tracking this for 37 and up since it's a high volume crash. I haven't actually checked crash-stats to see if 38 and 39 are affected but it seems likely.
Bill would you mind assigning yourself to the bug if you're taking it? Release management is trying to make sure all bugs in beta (and maybe aurora too) have assignees. Thanks!
Flags: needinfo?(wmccloskey)
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → wmccloskey
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 10•10 years ago
|
||
We might want to back bug 1103036 out on beta.
Comment 11•10 years ago
|
||
That may break some of the child process leak logging stuff I landed, but working around that will just be a matter of backing out a few mochitest configuration changes.
Comment 12•10 years ago
|
||
Bill - Are we going to bug out bug 1103036 along with the mochitest config changes that mccr8 mentions above? Can you make that happen this week?
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 13•10 years ago
|
||
The patch has been backed out on beta.
Flags: needinfo?(wmccloskey)
Updated•10 years ago
|
Comment 14•10 years ago
|
||
I forgot to mark this as a topcrash so I'm doing that now.
For Firefox 37 this crash is #4 and dropping since the backout.
For Firefox 38 this crash is at #275 and dropping.
For Firefox 39 this crash is at #32 and stable.
Keywords: topcrash-win
Reporter | ||
Updated•10 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t const*)] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wchar_t const*)]
[@ shutdownhang |…
Reporter | ||
Updated•10 years ago
|
Crash Signature: , bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wc... ] → , bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wc... ]
Reporter | ||
Updated•10 years ago
|
Crash Signature: , wchar_t const*)]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Ob… → , wchar_t const*) ]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::O…
Reporter | ||
Updated•10 years ago
|
Crash Signature: , bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wc... ] → , bool) | mozilla::dom::ContentParent::Observe(nsISupports*, char const*, wc... ]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_Proce…
Comment 15•10 years ago
|
||
Bill, can we backout bug 1103036 for 38 too? Thanks
Flags: needinfo?(wmccloskey)
Reporter | ||
Comment 16•10 years ago
|
||
IMHO, if we can back out for 38 without problems, we should see to do that on 39 as well if possible, esp. as 39 will have an extremely short beta cycle.
Comment 17•10 years ago
|
||
Looks like the backout on Firefox 37 worked as I'm seeing no crashes reported on that branch with Firefox 37.0b7 or later and this doesn't register at all in the top-300 crashes on Socorro for Fx37.
Reporter | ||
Comment 18•10 years ago
|
||
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #17)
> Looks like the backout on Firefox 37 worked as I'm seeing no crashes
> reported on that branch with Firefox 37.0b7 or later and this doesn't
> register at all in the top-300 crashes on Socorro for Fx37.
Right. It's still a top issue on 38 (and probably on 39 once the async plugin init fallout has been cleaned up).
Assignee | ||
Comment 19•10 years ago
|
||
This is just a patch to try to understand why we're hanging. I'll back out on beta once my try push finishes.
Attachment #8592008 -
Flags: review?(nchen)
Comment 20•10 years ago
|
||
Comment on attachment 8592008 [details] [diff] [review]
investigative patch
LGTM!
Attachment #8592008 -
Flags: review?(nchen) → review+
Assignee | ||
Comment 21•10 years ago
|
||
Backed out again on beta:
remote: https://hg.mozilla.org/releases/mozilla-beta/rev/56f805ac34ce
remote: https://hg.mozilla.org/releases/mozilla-beta/rev/8a5486269821
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 22•10 years ago
|
||
I had to add a few extra conditions so that this would pass try:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c13eef766b75
Keywords: leave-open
Assignee | ||
Updated•10 years ago
|
Attachment #8592008 -
Flags: checkin+
Comment 23•10 years ago
|
||
This broke ASAN builds because they have disabled the crash reporter. I pushed a follow-up to fix that:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b35bfa6c9c0e
Assignee | ||
Comment 24•10 years ago
|
||
Thanks!
Updated•10 years ago
|
Crash Signature: , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, cha ] → , bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::dom::ContentParent::Observe(nsISupports*, cha ]
[@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_P…
Comment 25•10 years ago
|
||
Assignee | ||
Comment 26•9 years ago
|
||
The crash reports suggest that we're calling KillHard but it's not causing ActorDestroy to be called. That's surprising. There may be some issue with the Windows IPC code that's causing this. However, I'd rather fix this right away than investigate further. I'm pretty sure this patch should eliminate the hangs.
Attachment #8594078 -
Flags: review?(nchen)
Comment 27•9 years ago
|
||
Comment on attachment 8594078 [details] [diff] [review]
possible fix
Review of attachment 8594078 [details] [diff] [review]:
-----------------------------------------------------------------
Good idea!
Attachment #8594078 -
Flags: review?(nchen) → review+
Assignee | ||
Comment 28•9 years ago
|
||
Comment 29•9 years ago
|
||
Assignee | ||
Comment 30•9 years ago
|
||
Comment on attachment 8594078 [details] [diff] [review]
possible fix
This patch seems to have fixed the problem. I don't see any new reports since 4/21.
Approval Request Comment
[Feature/regressing bug #]: bug 1103036
[User impact if declined]: Shutdown hangs
[Describe test coverage new/current, TreeHerder]: On m-c for a while.
[Risks and why]: Patch is very low risk.
[String/UUID change made/needed]: None.
Attachment #8594078 -
Flags: approval-mozilla-aurora?
status-firefox40:
--- → fixed
Comment on attachment 8594078 [details] [diff] [review]
possible fix
Approved for uplift to aurora; it will be great if this helps eliminate some shutdown hangs.
Attachment #8594078 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 32•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 33•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•