Closed
Bug 1359507
Opened 8 years ago
Closed 8 years ago
Mozilla Firefox Nightly 55.0a1 (2017-04-21) (64-bit) goes into unresponsive state and hangs after trying to add attachment on Bugzilla
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
VERIFIED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr45 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | + | verified |
People
(Reporter: Virtual, Assigned: away)
References
(Blocks 1 open bug)
Details
(Keywords: hang, nightly-community, regression)
Attachments
(2 files)
[Tracking Requested - why for this release]: Regression
STR:
1. Try to add attachment on Bugzilla
2. Enjoy the hang
Reporter | ||
Updated•8 years ago
|
Keywords: nightly-community
Reporter | ||
Updated•8 years ago
|
status-firefox-esr45:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Reporter | ||
Updated•8 years ago
|
Has STR: --- → yes
Reporter | ||
Updated•8 years ago
|
Reporter | ||
Comment 1•8 years ago
|
||
"Speedy" Regression window (mozilla-central)
Good:
https://ftp.mozilla.org/pub/firefox/nightly/2017/04/2017-04-20-03-03-46-mozilla-central/
Bad:
https://ftp.mozilla.org/pub/firefox/nightly/2017/04/2017-04-21-03-02-41-mozilla-central/
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=27311156637f9b5d4504373967e01c4241902ae7&tochange=dd530a59750adcaa0d48fa4f69b0cdb52715852a
Summary: Firefox goes into unresponsive state and hangs after trying to add attachment on Bugzilla → Mozilla Firefox Nightly 55.0a1 (2017-04-21) goes into unresponsive state and hangs after trying to add attachment on Bugzilla
Reporter | ||
Comment 2•8 years ago
|
||
"Speedy" Regression window (mozilla-inbound-win64)
Good:
https://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64/1492696708/
Bad:
https://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64/1492751968/
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2ddbf12eaee021e197c46071da5387cdae2af99e&tochange=f577512b1a299c7dd3325ac30538e021eeee6c90
Reporter | ||
Comment 3•8 years ago
|
||
str |
STR:
1. Start Firefox
2. Open this URL - https://bugzilla.mozilla.org/enter_bug.cgi?product=Core
3. Press "Add an attachment" button
4. Enjoy the hang
Regression window (mozilla-inbound-win64)
Good:
https://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64/1492696708/
Bad:
https://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64/1492726528/
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2ddbf12eaee021e197c46071da5387cdae2af99e&tochange=81de9d1439b0e352729142f6aa2914674073da03
Not reproducible on win32 builds, so maybe bug #1354611 is the cause.
Flags: needinfo?(dmajor)
It doesn't hang for me. Does this happen on a fresh profile? Can you grab some stacks with WinDbg? https://developer.mozilla.org/en-US/docs/Mozilla/How_to_get_a_stacktrace_with_WinDbg
(the "~*k" step is enough)
Flags: needinfo?(dmajor) → needinfo?(Virtual)
Comment 6•8 years ago
|
||
(In reply to Mike Taylor [:miketaylr] from comment #5)
> Jim, can you reproduce?
Not anymore. The hang was in system code while loading a dll. I think it had something to do with ieframe. I was able to reproduce this over the weekend in a local build but I just checked again and can't now. Haven't seen it in nightly. I did a pull and a clobber yesterday so that might be playing a part here. If it shows back up I'll grab the stack and post but honestly the stack was unactionable for the most part.
Flags: needinfo?(jmathies)
Reporter | ||
Comment 7•8 years ago
|
||
(In reply to David Major [:dmajor] from comment #4)
> It doesn't hang for me.
Maybe it's system dependent,
as it hangs on Windows 7 64-bit with Firefox 64-bit,
Firefox 32-bit is unaffected on Windows 7 64-bit,
and it's not reproducible on Windows 10 64-bit with Firefox 64-bit.
(In reply to David Major [:dmajor] from comment #4)
> Does this happen on a fresh profile?
Yes. Tested on Mozilla Firefox Nightly 55.0a1 (2017-04-21) (64-bit) [Portable] with clean new fresh profile without any addons (extensions, plugins, themes, etc.).
(In reply to David Major [:dmajor] from comment #4)
> Can you grab some stacks with WinDbg?
> https://developer.mozilla.org/en-US/docs/Mozilla/How_to_get_a_stacktrace_with_WinDbg
> (the "~*k" step is enough)
Unfortunately I'm not permitted to install any additional and external software on that system.
Maybe "Create dump file" will do the job?
Flags: needinfo?(Virtual) → needinfo?(dmajor)
Summary: Mozilla Firefox Nightly 55.0a1 (2017-04-21) goes into unresponsive state and hangs after trying to add attachment on Bugzilla → Mozilla Firefox Nightly 55.0a1 (2017-04-21) (64-bit) goes into unresponsive state and hangs after trying to add attachment on Bugzilla
Comment 8•8 years ago
|
||
I caught this in windbg, looks like there's a deadlock occurring occasionally in patched_LdrLoadDll.
Thanks Jim. It's definitely a mutual-wait:
Thread 0:
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe8
ntdll!RtlEnterCriticalSection+0xd1 <<<<<<<< waiting on stackwalk lock
mozglue!mozilla::SHA1Sum::SHA1Sum+0x159c
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x18c
ieframe!CRWLock::CRWLock+0x1f
ieframe!CProcessElevationPolicy::CProcessElevationPolicy+0x39
ieframe!_xmm+0xa12
msvcrt!initterm+0x1f
ieframe!DllMainCRTStartup+0x10c
ntdll!LdrpRunInitializeRoutines+0x1fe <<<<<<< has the loader lock
ntdll!LdrGetProcedureAddressEx+0x2aa
ntdll!LdrGetProcedureAddress+0x11
KERNELBASE!GetProcAddress+0x5c
Thread 25:
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe8
ntdll!RtlEnterCriticalSection+0xd1 <<<<<<< waiting on loader lock
ntdll!LdrpLoadDll+0x2c7
ntdll!LdrLoadDll+0xed
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1a4 <<<<<<< has the stackwalk lock
Assignee | ||
Comment 10•8 years ago
|
||
> [Tracking Requested - why for this release]: Regression
I will add that this is a new hang and is relatively easy to hit (on 64-bit Win7, at least).
We shouldn't let this escape the nightly channel. I'll back out if I can't fix it quickly.
Assignee | ||
Comment 11•8 years ago
|
||
Thinking about this a bit, nobody should ever have to wait on the stackwalk lock. The intent of the stackwalk lock is to signal "It is not safe to call SuspendThread on me at this time". This ought to be an immediate and infallible operation, and it shouldn't require mutual exclusion. We should even be able to say "don't suspend me" while the profiler is in the middle of tracing another thread!
I'm going to try switching the innards of AcquireStackWalkWorkaroundLock to increment an atomic counter of functions that don't want stackwalking. The profiler can then give up if the counter is nonzero. (In theory this could even be a counter per thread, but I don't want to go there initially.) I discussed something similar with Markus a few weeks ago but ended up not needing it for reasons I don't remember.
Reporter | ||
Updated•8 years ago
|
Has Regression Range: --- → yes
Keywords: regressionwindow-wanted
Reporter | ||
Updated•8 years ago
|
Status: NEW → ASSIGNED
Component: Untriaged → DOM: Content Processes
Product: Firefox → Core
Tracking for 55 since it's a recent regression. Sounds like David has a handle on it already.
Assignee | ||
Comment 13•8 years ago
|
||
This patch replaces {Acquire,Release}StackWalkWorkaroundLock with a global counter of suppressions.
Please closely review my use of Atomic in particular.
Other notes:
- I based the name on JS's AutoSuppressGC.
- dllexport'ing all of struct AutoSuppressStackWalking is a bit odd-looking, but I like that it forces clients to always have their ++ and -- paired up.
- The #ifdef _M_AMD64 was really getting out of hand, so I decided to make this entire family of APIs available only on Win64. Most callers (including the ones yet to land) are already in Win64-only code anyway.
Attachment #8863901 -
Flags: review?(nfroyd)
Attachment #8863901 -
Flags: review?(mstange)
Comment 14•8 years ago
|
||
Comment on attachment 8863901 [details] [diff] [review]
Counter
Review of attachment 8863901 [details] [diff] [review]:
-----------------------------------------------------------------
I can't see anything wrong with this, but I'm not an atomics expert. I'm hoping Nathan can review that part.
Attachment #8863901 -
Flags: review?(mstange) → review+
Comment 15•8 years ago
|
||
Comment on attachment 8863901 [details] [diff] [review]
Counter
Review of attachment 8863901 [details] [diff] [review]:
-----------------------------------------------------------------
::: mozglue/misc/StackWalk.cpp
@@ +389,5 @@
> #endif
>
> #ifdef _WIN64
> + // Avoid deadlock; see comment at the definition of stackWalkSupppressions.
> + if (sStackWalkSuppressions) {
Maybe add another comment to explain how different threads can affect sStackWalkSuppressions between this check and the actual stack walking. Especially: The thread whose stack we'll be walking is currently suspended. If sStackWalkSuppressions has been zero at any point while that thread was suspended, then we know that the suspended thread for sure isn't holding the "lock". Other threads which are not suspended may have incremented sStackWalkSuppressions in the meantime since the check above, but that doesn't matter because we can't deadlock with those.
Comment 16•8 years ago
|
||
Comment on attachment 8863901 [details] [diff] [review]
Counter
Review of attachment 8863901 [details] [diff] [review]:
-----------------------------------------------------------------
You know, when efaust did the first patch for the win64 profiler hang, I convinced him that atomics wouldn't work. Apparently I didn't understand the profiler well enough.
r=me, but Markus is right: this needs a large comment somewhere explaining why atomics are OK and mutexes are not. Especially why it's OK to only check the atomic variable once before starting stack unwinding, and why we don't have to care about other threads incrementing that variable while we're doing stackwalking.
Attachment #8863901 -
Flags: review?(nfroyd) → review+
Comment 17•8 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab0c15933c74
Replace the stack walk workaround lock with an atomic counter of suppressions. r=mstange,froydnj
Comment 18•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Reporter | ||
Comment 19•8 years ago
|
||
Looks fixed in Mozilla Firefox Nightly 55.0a1 (2017-05-04) (64-bit).
Thank you very much! \o/
Status: RESOLVED → VERIFIED
Reporter | ||
Updated•7 years ago
|
QA Contact: Virtual
You need to log in
before you can comment on or make changes to this bug.
Description
•