Closed
Bug 1143866
Opened 10 years ago
Closed 7 years ago
crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::DOMStorageDBThread::Shutdown()
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox36 | --- | affected |
firefox37 | --- | affected |
firefox38 | --- | affected |
firefox39 | --- | affected |
firefox42 | --- | affected |
firefox43 | --- | affected |
firefox44 | --- | affected |
firefox45 | --- | affected |
firefox47 | --- | affected |
firefox48 | --- | affected |
firefox49 | --- | affected |
firefox-esr38 | --- | affected |
firefox-esr45 | --- | affected |
firefox50 | --- | affected |
firefox51 | --- | affected |
People
(Reporter: u279076, Unassigned)
References
()
Details
(Keywords: crash, steps-wanted, topcrash-win)
Crash Data
Attachments
(2 files)
This bug was filed from the Socorro interface and is
report bp-593574cb-90c4-430d-a5a2-2220c2150309.
=============================================================
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp
1 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
3 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 msvcr120.dll msvcr120.dll@0x2c000
5 kernel32.dll BaseThreadInitThunk
6 ntdll.dll __RtlUserThreadStart
7 ntdll.dll _RtlUserThreadStart
=============================================================
More reports: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=shutdownhang+|+WaitForSingleObjectEx+|+WaitForSingleObject+|+PR_WaitCondVar+|+PR_JoinThread+|+mozilla%3A%3Adom%3A%3ADOMStorageDBThread%3A%3AShutdown%28%29#tab-reports
While there have been over 1600 reports of this crash over the last week, I do not believe this qualifies it for topcrash status.
Correlations:
* Top platform is Windows 7 @ 51%
* Top product is Firefox 36.0.1 @ 77%
* Top 10 GPUs are all Intel
* Top URLs are facebook, youtube, gmail, and twitter
* Comments: several people mention this occurs when restoring from a sleep/hibernate state
Florin, do you think you have time to see if you can shake out steps to reproduce?
Flags: needinfo?(florin.mezei)
Maybe Honza has some ideas.
Flags: needinfo?(honzab.moz)
We figured out earlier today that the "crashing thread" stack for shutdownhangs in Socorro does not match that of the reported signature. To get this you need to drill down to the other threads in the report. I have filed a Socorro bug to rectify this.
For clarity, here is the stack for the matching signature in this bug:
0 ntdll.dll NtWaitForSingleObject
1 kernelbase.dll WaitForSingleObjectEx
2 kernel32.dll WaitForSingleObjectExImplementation
3 kernel32.dll WaitForSingleObject
4 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c
5 nss3.dll PR_JoinThread nsprpub/pr/src/threads/combined/pruthr.c
6 xul.dll mozilla::dom::DOMStorageDBThread::Shutdown() dom/storage/DOMStorageDBThread.cpp
7 xul.dll mozilla::dom::DOMStorageCache::StopDatabase() dom/storage/DOMStorageCache.cpp
8 xul.dll mozilla::dom::DOMStorageObserver::Observe(nsISupports*, char const*, wchar_t const*) dom/storage/DOMStorageObserver.cpp
9 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverList.cpp
10 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverService.cpp
11 xul.dll nsXREDirProvider::DoShutdown() toolkit/xre/nsXREDirProvider.cpp
12 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup() toolkit/xre/nsAppRunner.cpp
13 xul.dll ScopedXPCOMStartup::`scalar deleting destructor'(unsigned int)
14 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp
15 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp
16 firefox.exe do_main browser/app/nsBrowserApp.cpp
17 firefox.exe NS_internal_main(int, char**) browser/app/nsBrowserApp.cpp
18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp
19 firefox.exe __tmainCRTStartup f:/dd/vctools/crt/crtw32/startup/crt0.c
20 kernel32.dll BaseThreadInitThunk
21 ntdll.dll __RtlUserThreadStart
22 ntdll.dll _RtlUserThreadStart
Comment 3•10 years ago
|
||
Hmm.. it actually "hangs" on the DOMStorage IO thread. I see two cases:
1. mozilla::storage::Connection::Close -> NtClose
2. mozilla::storage::Connection::Close -> arena_dalloc_large
I've encountered two other bugs (1124880 and 1131330) that hang in those two functions as well.
We either have a global problem here or it's just another individual case we have to handle separately.
I'm on both of those other bugs already, so I may find some conclusion there.
Flags: needinfo?(honzab.moz)
Comment 4•10 years ago
|
||
We tried to reproduce the issue today on two separate machines (Windows 7 x64 and x86) using info from comment 0, but we could not get the crash. Also we have not encountered this during our testing last week. At this point it seems highly unlikely that we could find steps to reproduce this.
Flags: needinfo?(florin.mezei)
Comment 5•10 years ago
|
||
I don't know when, but in the last weeks there was a patch checked in that fix a lot of the [OOM | small] crashes. So for me - I had this signature often before - it is nearly impossible to crash FF with this signature again.
Can somebody figure out when this patch was checked in and if the [shutdownhang] and [WaitForSingleObjectEx] crashes rose after that date ???
If so, I guess that it is (nearly) the same problem as described from me here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1131330#c37
In short words:
FF now crashes no more on complex JS with a [OOM | small] (e.g. by Facebook, Google, Youtube, Twitter) because FF have now a patch that prevent FF from crashing with this signature, but now FF is not able to handle the cases that have FF crashed before.
As it normally only happens on Windows and there the memory goes very often by this pages "crazy", I guess it is a Garbage Collection problem (in different variations).
Solution should be IMHO to analyze mem-reports again, find the problems, fix GC and look if the crash signatures go down again.
Comment 6•10 years ago
|
||
Some "new" signature I had in last time:
[@ shutdownhang | ReleaseData(void*, unsigned int) ]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_InvokeByIndex ]
[@ shutdownhang | _VEC_memset ]
[@ hang | WaitForMultipleObjectsEx | RealMsgWaitForMultipleObjectsEx | MsgWaitForMultipleObjects | F_1152915508___________________________________ ]
[@ shutdownhang | RtlEnterCriticalSection | PR_Lock | mozilla::net::CacheFile::~CacheFile()]
[OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xrealloc | mozilla::CycleCollectedJSRuntime::DeferredFinalize(nsISupports*)]
[@ js::gc::AllocateNonObject<JSFatInlineString, int>(js::ExclusiveContext*)]
I attach two mem-reports from the last days that were auto-created by FF but didn't send out because FF was no more crashing.
Btw.: FF hangs during creating this reports but don't send them out anymore because FF crash no more. Think there should be something changed again ...
The second archive is incomplete because it seems I killed the task before FF was able to finish the report.
But I don't think that FF would ever finished the report because the file in the archive would have - when complete - over 3GB !!!
Comment 7•10 years ago
|
||
Comment 8•10 years ago
|
||
Comment 9•10 years ago
|
||
Some more infos:
- I never use sleep/hibernate.
- Plz try to open in Facebook e.g. a group, where at the right side the advertisements and suggestions reload all the time. Then let it run for some time (some hours).
Comment 10•10 years ago
|
||
Tobias, unless you are sure your reports belong to this *specific* shutdownhang in DOMStorageDBThread shutdown, please put your comments on a separate bug report instead.
Also, there are no good reasons I know of to believe that OOM|small would have anything to do with any shutdownhangs. The former means we were out of (virtual) memory, the latter means Firefox was already being (intentionally) closed and could not finish that in more than a minute of time.
Comment 11•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #10)
> Tobias, unless you are sure your reports belong to this *specific*
> shutdownhang in DOMStorageDBThread shutdown, please put your comments on a
> separate bug report instead.
>
> Also, there are no good reasons I know of to believe that OOM|small would
> have anything to do with any shutdownhangs. The former means we were out of
> (virtual) memory, the latter means Firefox was already being (intentionally)
> closed and could not finish that in more than a minute of time.
Plz read again:
(In reply to Tobias B. Besemer from comment #5)
> I don't know when, but in the last weeks there was a patch checked in that
> fix a lot of the [OOM | small] crashes. So for me - I had this signature
> often before - it is nearly impossible to crash FF with this signature again.
>
> Can somebody figure out when this patch was checked in and if the
> [shutdownhang] and [WaitForSingleObjectEx] crashes rose after that date ???
>
> If so, I guess that it is (nearly) the same problem as described from me
> here:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1131330#c37
>
> In short words:
> FF now crashes no more on complex JS with a [OOM | small] (e.g. by Facebook,
> Google, Youtube, Twitter) because FF have now a patch that prevent FF from
> crashing with this signature, but now FF is not able to handle the cases
> that have FF crashed before.
> As it normally only happens on Windows and there the memory goes very often
> by this pages "crazy", I guess it is a Garbage Collection problem (in
> different variations).
> Solution should be IMHO to analyze mem-reports again, find the problems, fix
> GC and look if the crash signatures go down again.
Maybe you should read: https://bugzilla.mozilla.org/show_bug.cgi?id=1131330#c37
Comment 12•10 years ago
|
||
Robert, I comment on this bug because of "See Also: 1131330 Remove" and "Keywords: steps-wanted".
Bug 1131330 was also "steps-wanted" and you can find them here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1131330#c37
I was able to reproduce it because I had an idea what should be the reason for the crash and so I guess what is the reason of the crash. ;-)
Comment 13•10 years ago
|
||
They still do not have anything to do with each other. You are speculating.
Comment 14•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> They still do not have anything to do with each other. You are speculating.
OK, I speculating! ;-)
But the crash seems to be gone in the last days, that's fine, or? :-)
https://crash-stats.mozilla.com/report/list?signature=shutdownhang%20|%20WaitForSingleObjectEx%20|%20WaitForSingleObject%20|%20PR_WaitCondVar%20|%20PR_JoinThread%20|%20mozilla%3A%3Adom%3A%3ADOMStorageDBThread%3A%3AShutdown%28%29#tab-table
Updated•10 years ago
|
OS: Windows NT → Windows
Updated•9 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::DOMStorageDBThread::Shutdown()] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::DOMStorageDBThread::Shutdown()]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::…
Comment 15•9 years ago
|
||
Zero hits in the last 28 days for this signature:
https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=shutdownhang+|+WaitForSingleObjectEx+|+WaitForSingleObject+|+PR_WaitCondVar+|+PR_JoinThread+|+mozilla%3A%3Adom%3A%3ADOMStorageDBThread%3A%3AShutdown%28%29
-> WFM
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Comment 16•9 years ago
|
||
*arg* Signature have changed ... :-/
https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=shutdownhang+|+WaitForSingleObjectEx+|+WaitForSingleObject+|+PR_WaitCondVar+|+PR_JoinThread+|+mozilla%3A%3Adom%3A%3ADOMStorageDBThread%3A%3AShutdown
Last 28 days:
Operating System Percentage Number Of Crashes
Windows 7 59.66% 2181
Windows XP 14.28% 522
Windows 8.1 13.48% 493
Windows Vista 6.32% 231
Windows 10 4.18% 153
Windows 8 2.02% 74
Windows Unknown 0.06% 2
18 comments, just one useful:
part of the problem is that time warner wifi service is slower that the old telephone dialup service. they ae the only game in town so I am stuck with them
Submitted: 2015-11-29T13:57:43+00:00
https://crash-stats.mozilla.com/report/index/d2fbe824-e635-4a7b-9175-f37332151129
Last report on FF44.0a2:
https://crash-stats.mozilla.com/report/index/85e33bee-3ef1-46b2-8979-854cc2151202
Crashing Thread
Frame Module Signature Source
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*)
toolkit/components/terminator/nsTerminator.cpp
1 nss3.dll _PR_NativeRunThread
nsprpub/pr/src/threads/combined/pruthr.c
2 nss3.dll pr_root
nsprpub/pr/src/md/windows/w95thred.c
3 msvcr120.dll _callthreadstartex
f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 msvcr120.dll msvcr120.dll@0x2c000
5 kernel32.dll BaseThreadStart
Seems the Crashing Thread is always the same.
Status: RESOLVED → REOPENED
status-firefox42:
--- → affected
status-firefox43:
--- → affected
status-firefox44:
--- → affected
status-firefox45:
--- → affected
status-firefox-esr38:
--- → affected
Keywords: topcrash-win
Resolution: WORKSFORME → ---
Updated•9 years ago
|
Component: DOM → Hardware Abstraction Layer (HAL)
Updated•9 years ago
|
Component: Hardware Abstraction Layer (HAL) → DOM
It's been a year since you were looking at those other bugs, and this is still here ...
Flags: needinfo?(honzab.moz)
Comment 18•9 years ago
|
||
I've spend a lot of time on this. We all did in necko, also asking around etc. No idea where from hangs in ntclose() and other IO operations are coming. Could be just a slow hard drive and just a coincidence the watchdog catches it in NTClose (where it may hang for few hundred ms). I also start thinking that _VEC_memset is just hanging because of system swaps, so again - blocked on io.
True is that we should not crash. But we are crashing artificially... I start thinking that the shutdown watchdog is a bad thing to do by only one measure. Apparently slower systems should have the delay longer.
To sum, no, I personally don't have any new data here. Sorry.
Flags: needinfo?(honzab.moz)
Comment 19•9 years ago
|
||
BTW, the hangs are inside mozstorage. You should target your questions to someone who maintains that code, like mak77.
Comment 20•9 years ago
|
||
Dropping in as a mozStorage peer... the NtClose thing seems like this is a system issue, but indeed there seem to be a lot of other indications in other stacks that indicate I/O issues, like the mozStorage stack being in a call to FlushFileBuffers, etc. There seem to be two major possibilities:
* Anti-virus issues. Brief searching brought up http://stackoverflow.com/questions/24169830/how-can-i-work-out-which-process-thread-owns-the-resource-that-my-program-is-han where someone who seems to know their windows kernels investigation stuff identified Symantec anti-virus as a problem with NtClose.
* I/O backlog (which can also be compounded with anti-virus). It probably also doesn't help that closes on Windows can be excessively slow, as :gps identified at http://gregoryszorc.com/blog/2015/10/22/append-i/o-performance-on-windows/
Anti-virus wise, does Socorro know whether there is an active anti-virus on the system and what it is?
I/O backlog wise... could the watchdog maybe do something crazy like join on a thread that issues one or more writes to the profile directory followed by a FlushFileBuffer invocation? The idea would be that the watchdog wouldn't kill the process if I/O is hopelessly backlogged while also grabbing a measurement point about the I/O status if it does force a crash. It would be very useful to know that it took 5 seconds to issue a write and a flush when the forced crash happens, for example. (Or even if there's just a number that indicates how long the crash-dump process took from start to finish if we have that?)
Comment 21•9 years ago
|
||
(In reply to Andrew Sutherland [:asuth] from comment #20)
> Anti-virus wise, does Socorro know whether there is an active anti-virus on
> the system and what it is?
Only if they have loaded a library into our process or installed and LSP.
Comment 22•8 years ago
|
||
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::DOMStorageDBThread::Shutdown':
- nightly (version 50): 3 crashes from 2016-06-06.
- aurora (version 49): 41 crashes from 2016-06-07.
- beta (version 48): 1774 crashes from 2016-06-06.
- release (version 47): 11676 crashes from 2016-05-31.
- esr (version 45): 749 crashes from 2016-04-07.
Crash volume on the last weeks:
Week N-1 Week N-2 Week N-3 Week N-4 Week N-5 Week N-6 Week N-7
- nightly 0 0 0 0 2 1 0
- aurora 8 4 8 6 6 7 1
- beta 224 239 207 274 345 327 83
- release 1814 1918 1724 1706 1764 1725 490
- esr 81 94 71 64 111 88 64
Affected platform: Windows
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → affected
Comment 23•8 years ago
|
||
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | PR_JoinThread | mozilla::dom::DOMStorageDBThread::Shutdown':
- nightly (version 51): 2 crashes from 2016-08-01.
- aurora (version 50): 6 crashes from 2016-08-01.
- beta (version 49): 389 crashes from 2016-08-02.
- release (version 48): 477 crashes from 2016-07-25.
- esr (version 45): 967 crashes from 2016-05-02.
Crash volume on the last weeks (Week N is from 08-22 to 08-28):
W. N-1 W. N-2 W. N-3
- nightly 0 1 1
- aurora 0 5 1
- beta 103 222 64
- release 107 275 95
- esr 47 89 83
Affected platform: Windows
Crash rank on the last 7 days:
Browser Content Plugin
- nightly
- aurora
- beta
- release
- esr
status-firefox51:
--- → affected
Comment 24•8 years ago
|
||
Fixed?
Updated•7 years ago
|
Status: REOPENED → RESOLVED
Closed: 9 years ago → 7 years ago
Resolution: --- → WORKSFORME
Assignee | ||
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•