Closed
Bug 1323084
Opened 8 years ago
Closed 7 years ago
Crash in shutdownhang | PR_MD_WAIT_CV | mozilla::CondVar::Wait | nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent
Categories
(Core :: Networking: HTTP, defect)
Tracking
RESOLVED
WORKSFORME
People
(Reporter: n.nethercote, Unassigned)
Details
(Keywords: crash, Whiteboard: [necko-next])
Crash Data
This bug was filed from the Socorro interface and is
report bp-d4fc7a65-cd4e-4db2-95d1-8cc152161210.
=============================================================
First occurrence of this was in Nightly 20161119030204. Since then it has occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no sign of it on Aurora, which makes sense because the last merge date was November 14.
This is the regression window for that Nightly:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=28e2a6dde76ab6ad4464a3662df1bd57af04398a&tochange=f09e137ead39230eaa94f47988ccce2cfcda4195
But this crash is infrequent enough that it doesn't happen every day, so it's possible the true cause landed a day or two before that.
Honza, any ideas? The stack trace for the hanging thread contains network shutdown stuff -- though it varies a bit from crash to crash -- so it looks like it's related to networking.
Flags: needinfo?(honzab.moz)
Updated•8 years ago
Flags: needinfo?(honzab.moz) → needinfo?(mcmanus)
Comment 1•8 years ago
This is a pr_poll shutdown hang, which is certainly not new (though it is usually not on Win 10). Maybe the signature is different, or something else about it is different.
Dragana is the expert.
Flags: needinfo?(mcmanus) → needinfo?(dd.mozilla)
Comment 2•8 years ago
We still do not have an easy way to analyze these shutdown hangs. The hangs here have many different causes. I took a look at 20 crashes:
https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05-d93c22161209 and https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7-012212161212 and https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf-0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the socket thread shutdown has not started yet. In the first and the third crash, there is a thread that is in mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting for. In the second crash:
8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398
9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267
10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun dom/workers/WorkerPrivate.cpp:1394
11 xul.dll mozilla::dom::workers::WorkerRunnable::Run()
https://crash-stats.mozilla.com/report/index/025116c3-a3d0-4f58-8142-7ef942161210 and https://crash-stats.mozilla.com/report/index/fa032da1-a037-4107-9f44-dce8c2161210 and https://crash-stats.mozilla.com/report/index/ebc28ddb-f9bf-4965-a9a3-af2442161212 - hangs in mozilla::dom::XMLHttpRequestMainThread::SendInternal(mozilla::dom::XMLHttpRequestMainThread::RequestBodyBase const*) dom/xhr/XMLHttpRequestMainThread.cpp:3011, but I am not sure what it is waiting for - maybe for the request to finish, for a timer to expire, etc.
https://crash-stats.mozilla.com/report/index/186dcd92-8c38-4f54-b4df-7837a2161210 - js runs on the main thread
https://crash-stats.mozilla.com/report/index/e5083587-2a6e-48d7-931e-463962161210 and https://crash-stats.mozilla.com/report/index/d4fc7a65-cd4e-4db2-95d1-8cc152161210 and https://crash-stats.mozilla.com/report/index/47dc423a-34ae-44b9-a936-104162161212 and https://crash-stats.mozilla.com/report/index/4ed8dcf3-d5ba-4102-a193-a5e1e2161212 - our normal poll hangs
https://crash-stats.mozilla.com/report/index/fae718bb-0f61-4a77-b0fc-6c24a2161210 - This one is waiting on the socket thread to shut down, and the socket thread is doing:
Module Signature Source
0 ntdll.dll NtWaitForSingleObject
1 KERNELBASE.dll WaitForSingleObjectEx
2 nss3.dll PR_MD_WAIT_CV nsprpub/pr/src/md/windows/w95cv.c:248
3 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:172
4 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:525
5 nss3.dll PR_Wait nsprpub/pr/src/threads/prmon.c:298
6 xul.dll mozilla::ReentrantMonitor::Wait(unsigned int) obj-firefox/dist/include/mozilla/ReentrantMonitor.h:91
7 xul.dll nsThread::Init() xpcom/threads/nsThread.cpp:660
8 xul.dll nsThreadManager::NewThread(unsigned int, unsigned int, nsIThread**) xpcom/threads/nsThreadManager.cpp:260
9 xul.dll nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:107
10 xul.dll nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:275
11 xul.dll mozilla::net::nsStreamTransportService::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) netwerk/base/nsStreamTransportService.cpp:490
12 xul.dll nsIEventTarget::Dispatch(nsIRunnable*, unsigned int) obj-firefox/dist/include/nsIEventTarget.h:37
13 xul.dll nsAStreamCopier::PostContinuationEvent_Locked() xpcom/io/nsStreamUtils.cpp:476
14 xul.dll nsAStreamCopier::PostContinuationEvent() xpcom/io/nsStreamUtils.cpp:467
15 xul.dll nsAStreamCopier::OnOutputStreamReady(nsIAsyncOutputStream*) xpcom/io/nsStreamUtils.cpp:435
16 xul.dll mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult) netwerk/base/nsSocketTransport2.cpp:552
17 xul.dll mozilla::net::nsSocketTransport::OnSocketEvent(unsigned int, nsresult, nsISupports*) netwerk/base/nsSocketTransport2.cpp:1845
18 xul.dll mozilla::net::nsSocketEvent::Run() netwerk/base/nsSocketTransport2.cpp:85
19 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1213
20 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:939
https://crash-stats.mozilla.com/report/index/806797aa-be7f-4758-84f2-de4e82161211 and https://crash-stats.mozilla.com/report/index/aa60fdd0-7eb3-4522-96f8-0e8a62161211 - the socket thread is already shut down; we are waiting for the main thread to shut down, and I am not sure what the main thread is waiting for.
https://crash-stats.mozilla.com/report/index/1ccd079f-dcf5-4272-8f03-1de1e2161212 - not sure where it hangs
https://crash-stats.mozilla.com/report/index/769c61f2-b5d3-4878-96f9-2d5802161212 - hangs trying to shut down a thread pool in:
10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983
11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328
12 xul.dll mozilla::net::nsStreamTransportService::Observe(nsISupports*, char const*, char16_t const*) netwerk/base/nsStreamTransportService.cpp:556
https://crash-stats.mozilla.com/report/index/4241b3dc-fbaa-4611-9c21-248d32161212 - the socket thread is waiting for:
10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983
11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328
12 xul.dll mozilla::psm::StopSSLServerCertVerificationThreads() security/manager/ssl/SSLServerCertVerification.cpp:207
https://crash-stats.mozilla.com/report/index/a94cf788-8027-49dc-868b-6dab32161212 - I am not sure, but it is not the socket thread; that is already closed.
https://crash-stats.mozilla.com/report/index/b5fe58f4-71bf-4a23-8420-173312161212 and https://crash-stats.mozilla.com/report/index/0cce1bcb-ce6d-493a-9039-817f32161212 - hang in SocketClose
Flags: needinfo?(dd.mozilla)
Comment 3•8 years ago
6 crashes in comment #2 are known shutdown hangs caused by a hang in poll and SocketClose. The others are different. I assume that the number of the other crashes rose because of a regression.
Comment 4•8 years ago
I have only looked at Windows 10 crashes.
Dragana, seems like you've got this. Feel free to re-triage as appropriate.
Whiteboard: [necko-next]
Comment 6•8 years ago
Ben, can you look at the workers crash? I am not sure you are the right person.
Nathan, did something change in thread pool shutdown? We are hanging in it from a couple of different code paths.
Thanks!
Flags: needinfo?(nfroyd)
Flags: needinfo?(bkelly)
Comment 7•8 years ago
(In reply to Dragana Damjanovic [:dragana] from comment #6)
> Nathan, did something change in the thread pool shutdown, we are hanging in
> it from a couple of different code paths?
We didn't change anything in thread pool shutdown recently AFAIK, but...
(In reply to Nicholas Nethercote [:njn] from comment #0)
> First occurrence of this was in Nightly 20161119030204. Since then it has
> occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no
> sign of it on Aurora, which makes sense because the last merge date was
> November 14.
...we did change some code around the internal event queues used by threads in bug 1306591, which landed on 2016-11-08. Crashes showing up a week and a half later seem kind of unlikely to be correlated with that, but it's possible, I suppose? We also changed some internal nsThread bits around for requestIdle callback (bug 1198381, landed 2016-10-30) and various followup bugs, though I don't think any of the followup bugs were that significant.
I'm not sure if that makes things any clearer.
Flags: needinfo?(nfroyd)
Comment 8•8 years ago
(In reply to Dragana Damjanovic [:dragana] from comment #2)
> https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05-
> d93c22161209 and
> https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7-
> 012212161212 and
> https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf-
> 0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the
> socketThread shutdown have not started yet. In the first and the third
> crash, there is a thread that is in
> mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*)
> dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting
> for. In the second crash:
> 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason)
> js/src/jsgc.cpp:6398
> 9 xul.dll
> mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*,
> bool, bool) dom/workers/WorkerPrivate.cpp:6267
> 10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun
> dom/workers/WorkerPrivate.cpp:1394
> 11 xul.dll mozilla::dom::workers::WorkerRunnable::Run()
>
Andrea, do you have any thoughts on the worker specifics in comment 2? It seems we can get into a case where the worker thread is stuck waiting for control runnables, but since the browser is shutting down we never end up sending it the control runnable.
Flags: needinfo?(bkelly) → needinfo?(amarchesini)
Comment 9•8 years ago
Comment 10•8 years ago
Aurora has what appears to be a very similar version of this?
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=shutdownhang%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20nsEventQueue%3A%3AGetEvent%20%7C%20nsThread%3A%3AnsChainedEventQueue%3A%3AGetEvent
status-firefox52: --- → affected
status-firefox53: --- → affected
Comment 11•8 years ago
It seems that the worker is waiting for something. The runnable is blocked, and the worker cannot be released.
3 xul.dll js::ConditionVariable::wait(js::LockGuard<js::Mutex>&) js/src/threading/windows/ConditionVariable.cpp:364
4 xul.dll js::GCHelperState::waitBackgroundSweepEnd() js/src/jsgc.cpp:3356
5 xul.dll js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason, js::AutoLockForExclusiveAccess&) js/src/jsgc.cpp:5927
6 xul.dll js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) js/src/jsgc.cpp:6183
7 xul.dll js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) js/src/jsgc.cpp:6337
8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398
9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267
Flags: needinfo?(amarchesini)
Comment 12•8 years ago
The similar signature mentioned in comment 10 was filed as bug 1329966.
Comment 13•8 years ago
Too late for firefox 52, mass-wontfix.
Comment 14•7 years ago
The crash no longer happens after build ID 20170228030203. Shall we close it for the same reason as bug 1329966?
Flags: needinfo?(madperson)
Comment 15•7 years ago
Probably yes - those shifting shutdownhang signatures are really getting confusing. The crash in comment #0 seems to be the same one as discussed in bug 1158189.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(madperson)
Resolution: --- → WORKSFORME