Closed Bug 1542485 Opened 6 years ago Closed 2 years ago

Crash in [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe]

Categories

(Core :: Storage: Quota Manager, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1588498
Tracking Status
firefox-esr60 --- unaffected
firefox67 + wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox81 --- affected
firefox82 --- affected
firefox83 --- affected
firefox84 --- affected

People

(Reporter: janv, Unassigned)

References

Details

(Keywords: crash, nightly-community, top50)

This bug is for crash report bp-44477099-72f5-4a39-8359-a8c4b0190406.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll struct already_AddRefed<nsIRunnable> mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:149
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1108
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:482
7 xul.dll static bool mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, `lambda at z:/task_1553508105/build/src/dom/quota/ActorsParent.cpp:2552:5'> xpcom/threads/nsThreadUtils.h:348
8 xul.dll nsresult mozilla::dom::quota::QuotaManager::Observer::Observe dom/quota/ActorsParent.cpp:2552
9 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:66

Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe]
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | kernelbase.…
Assignee: nobody → jvarga
Status: NEW → ASSIGNED

This is a new bug for the crash that has been tracked in bug 1487194.
There were several changes in signatures, so the old bug now has too many signatures which makes it a bit hard to understand.
This new bug is mostly useful for FF 68 and even FF 67.
The old one is good for FF 66.

Keywords: topcrash
Priority: -- → P1

Jan, looking at the crash graph here makes me think something got fixed in nightly that hasn't (yet?) made it to beta?

Flags: needinfo?(jvarga)

Andrew, this graph is a little bit misleading. It's a bit more complicated, but basically starting with FF 68, these shutdown hangs are tracked in bug 1541776 and bug 1541928.
Anyway, bug 1542541 may have helped with reducing shutdown hangs and we should see after today's new beta.
Note, that beta still uses the old crash signatures (as in this bug).
Sorry for the confusion.

Flags: needinfo?(jvarga)

Thanks for clarifying - it is indeed a complicated situation but I'm confident we're on the right track to see major improvements here soon!

The volume of crashes didn't significantly go down after the patch in bug 1542541 which landed in beta 12, jan is there anything safe we could do to redice these crashes in the 67 timeframe (release and dot releases?) Thanks

Flags: needinfo?(jvarga)

Yeah, I should have more time for that now. I'm going to add more debugging info to crash annotations as the first thing.

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #6)

Yeah, I should have more time for that now. I'm going to add more debugging info to crash annotations as the first thing.

Any news on this front? Is there something we could do to mitigate the crashes in a RC2 this week?

Flags: needinfo?(jvarga)

I'm afraid that this can't be easily fixed/mitigated. It's a long standing issue, we still don't know exactly why is this happening. I'm going to add more info to crash annotations as I mentioned earlier. I was just too busy with last (hopefully) major LSNG bug 1534222.
When we have more info in crash annotations, we need to wait several days to collect and process data from crash-stats.
It's possible that we will have to add even more info to crash annotations. Once we know what the problem is, it's likely that the fix will be risky.
Btw, this bug tracks the same issue as bug 1487194. The only difference is that crash signatures changed since FF 66. So this is not a new issue, we already ship with it.
See also bug 1487194 comment 43.

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #8)

I'm afraid that this can't be easily fixed/mitigated. It's a long standing issue, we still don't know exactly why is this happening. I'm going to add more info to crash annotations as I mentioned earlier. I was just too busy with last (hopefully) major LSNG bug 1534222.
When we have more info in crash annotations, we need to wait several days to collect and process data from crash-stats.
It's possible that we will have to add even more info to crash annotations. Once we know what the problem is, it's likely that the fix will be risky.
Btw, this bug tracks the same issue as bug 1487194. The only difference is that crash signatures changed since FF 66. So this is not a new issue, we already ship with it.
See also bug 1487194 comment 43.

Not a new regression so we are not going to block 67 on it, thanks!

Thanks for the details, Jan. Bug 1542478 is where we're going to add more info to IDB shutdown hang crash annotations.

Crash Signature: kernelbase.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → kernelbase.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | mozilla::dom:…

This is the #1 Top Crasher for FF67 (Release).

Top Crashers for Firefox 67.0.1 (Release) - 7 days ago
#1 10.78% -2.81% shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe 8842 8842 0 0 7928 0 2019-03-08

Signature report for shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe
Showing results from 7 days ago
11,844 Results

Windows 10 8855 74.8%
Windows 7 1902 16.1%
Windows 8.1 940 7.9%
Windows 8 147 1.2%

Firefox 68.0b7 29 0.2% 29
Firefox 68.0b8 18 0.2% 15
Firefox 68.0b6 14 0.1% 10
Firefox 68.0b9 14 0.1% 13
Firefox 68.0b4 7 0.1% 5
Firefox 68.0b5 6 0.1% 8
Firefox 68.0a1 4 0.0% 3
Firefox 68.0b3 4 0.0% 3
Firefox 68.0b1 1 0.0% 1
Firefox 67.0.1 8837 74.6% 7215
Firefox 67.0.2 1545 13.0% 959
Firefox 67.0 1241 10.5% 1395
Firefox 67.0rc2 23 0.2% 23
Firefox 67.0b15 12 0.1% 9

Architecture
amd64 8978 75.8%
x86 2866 24.2%

Signature report for shutdownhang | mozilla::dom::quota::QuotaManager::Observer::Observe
Showing results from 7 days ago
11 Results

Firefox 69.0a1 8 72.7% 8
Thunderbird 67.0b0 3 27.3% 3

OS: Windows 10 → All
Hardware: Unspecified → All

It's hard to compare this problem between ESR, FF 67 and FF 68 given the signature changed multiple times.
It used to be mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe
then it was changed to mozilla::dom::quota::QuotaManager::Observer::Observe
and then we created a separate bug for IndexedDB, bug 1541776.

The good news is that number of shutdown hangs caused by quota manager or quota clients (IndexedDB or LocalStorage) have been greatly reduced in current FF 69 nightly. Fixes have been uplifted to FF 68 beta and we see improvements there too.
There's still some work though.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Jan, is this alias accurate?

Alias: qm-shutdown-hangs
Flags: needinfo?(jvarga)

Yeah, it's ok.

Flags: needinfo?(jvarga)
Depends on: 1584323
Priority: P1 → P2
Assignee: jvarga → nobody
Status: ASSIGNED → NEW
Crash Signature: mozilla::dom::quota::QuotaManager::Observer::Observe] → mozilla::dom::quota::QuotaManager::Observer::Observe] {@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe]
Crash Signature: mozilla::dom::quota::QuotaManager::Observer::Observe] {@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → mozilla::dom::quota::QuotaManager::Observer::Observe] {@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
Crash Signature: mozilla::dom::quota::QuotaManager::Observer::Observe] {@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] → mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
Crash Signature: mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] → mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | <name omitted> | <…
Crash Signature: mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | <name omitted> | → mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | <name omitted> |

Quote from bug 1633342 comment 4:

The cause is similar for each of those components: during the shutdown, many components spin the event loop when they receive notification X ("xpcom-shutdown" for instance, but we have a few others). They do because they want to receive IPC calls, or because they want to wait until a subcomponent/object is released. Often this spinning is not terminated, and because of this, other components do not receive the same notification X.

For instance, https://bugzilla.mozilla.org/show_bug.cgi?id=1435962#c0 describes what happens for mozilla::net::nsHttpConnectionMgr::Shutdown().

The fix is to remove the spinning of the event loop in every component that does it during the shutdown because that can trigger race conditions.

Looking at the observe function I find:

MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() { return mShutdownComplete; }));

We should probably find a different way than spinning the event loop directly here on the main thread while waiting for ourselves to shut down, probably through a shutdown blocker?

Flags: needinfo?(sgiesecke)
Flags: needinfo?(jvarga)

(In reply to Junior [:junior] from comment #18)

This is around 1/2 of mozilla::TaskController::GetRunnableForMTTask crashes, which is #4 top-crash in beta.

Here are some examples
https://crash-stats.mozilla.org/report/index/9f6b7760-93e0-4faa-a51d-c05d70200805
https://crash-stats.mozilla.org/report/index/da9b07bf-7e42-477a-b142-8727f0200805

I filed Bug 1658729 to differentiate the signatures.

Depends on: 1658729
Flags: needinfo?(sgiesecke)

I randomly picked a different signature and filed bug 1658871 for it, which seems completely independent from this.

Crash Signature: <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] → <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __psynch_…
Crash Signature: __psynch_cvwait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] → __psynch_cvwait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | mozilla::TaskController::GetRunnableForMTTask | mozilla::Sp…
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]

Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.

(In reply to alex_mayorga from comment #21)

¡Hola y'all!

Crashed like https://crash-stats.mozilla.org/report/index/d44b9584-b17e-4065-a375-36e800201017#tab-details on Nightly.

Updating flags per https://crash-stats.mozilla.org/signature/?product=Firefox&signature=shutdownhang%20%7C%20mozilla%3A%3ATaskController%3A%3AGetRunnableForMTTask%20%7C%20mozilla%3A%3ASpinEventLoopUntil%3CT%3E%20%7C%20mozilla%3A%3Adom%3A%3Aquota%3A%3AQuotaManager%3A%3AObserver%3A%3AObserve FWIW.

Please ni? me if you need anything from the profile or device.

¡Gracias!
Alex

Hi Alex, thanks for your offer to help!

I see, that the "QuotaManager IO" thread is doing something on the filesystem while our shutdown hang timeout fires.

What did you do before closing Firefox? Are you able to reproduce this?

Flags: needinfo?(alex_mayorga)

There is one crash detected on Nightly 84.0a1 too based on the crash stats, updating flag for FX 84 status.

Keywords: regression

¡Hola Jens!

Unfortunately, I don’t have much in regards to steps to reproduction.

All I can say is that this laptop has been experiencing near 100% CPU consumption that had lead to Firefox hanging and crashing way more often than normal.

I’ll continue to be in the look for exact steps so not clearing this ni? but wanted to give context.

¡Gracias!
Alex

Picking another random crash signature, it seems we have three threads involved in the blocking situation:

At some point in time, the Quota Manager IO thread started an expensive or stalled disk operation deleting files, causing gNormalOriginOps to be non-empty.

Then:

Deleting files might indicate some interference with other modules' or extensions' shutdown behavior? At least I see in this particular crash's metadata, that there are several ad-blocker and privacy extensions installed.

Alex, I see in your crash the same situation between those three threads. And in both crashes' metadata I see the uBlock extension (and the Firefox screenshot extension, which might save huge temporary data, too?). Do you remember some particular task you did during the lifetime of the browser before shutting down (like taking dozens of screenshots, opening complex sites in private browsing mode, ...) ?

Jens, that's very likely bug 1588512. I already analyzed the crash reports related to shutdown hangs some time ago. I created a meta bug 1588498 for it with findings filed as dependent bugs. We should just assign priorities to the dependent bugs and fix them incrementally as time and resources permit.

There are basically two major problems:

  1. Timeouts caused by dead lock situations during shutdown (this was mitigated quite a bit some time ago)
  2. Timeouts caused by doing a lot of I/O during shutdown

The problem you described (bug 1588512) is probably caused by clearing too many origin directories during shutdown.
If we want to prioritize it, we should investigate if the user has some settings that trigger clearing of all data when FF quits, that should confirm the assumption.

Flags: needinfo?(jvarga)
Depends on: 1588512

¡Hola y'all!

Not seen this crash in the most recent months here so clearing my n? for now.

I still have access to the system when https://crash-stats.mozilla.org/report/index/d44b9584-b17e-4065-a375-36e800201017#tab-details occurred so please let me know if there's anything that you'd need me to try or collect.

¡Gracias!
Alex

Flags: needinfo?(alex_mayorga)

Looking at the signatures:

[@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | __pthread_cond_wait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]

Active until 84
[@ shutdownhang | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]


Older versions only:
[@ shutdownhang | mozilla::dom::quota::QuotaManager::Observer::Observe ]

Inactive:
[@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | ntdll.dll | kernelbase.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | __pthread_cond_wait | <name omitted> | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | __psynch_cvwait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ]
[@ shutdownhang | __psynch_cvwait | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] 
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | kernelbase.… → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaM…
QA Whiteboard: qa-not-actionable

Apart from bug 1788986, we might also want to replace the SpinEventLoopUntil in the observer with an async shutdown blocker?

Flags: needinfo?(jvarga)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
Severity: critical → S2
Depends on: 1820823

Looking at the crashes collected here they seem to be exactly the same as on bug 1588498 but instead reporting the signature based on the IPDL Background thread here we see a signature based on the main thread.

Can we make them converge based on IPDL background ? Thanks

Flags: needinfo?(jvarga) → needinfo?(gsvelto)

That'd be tricky. I see that there's several crashes here that were captured by WER at an awkward time. Take this crash for example. It seems like we had hit an assertion that would generate a crash report like those in bug 1588498, and the exception handler thread shows that we were launching the crash reporter client. In other words when this minidump was captured we had already captured a minidump that'd end up in bug 1588498. However shutdown had already started so when the watchdog thread timed out it tried to kill Firefox again. Because we had already hit the exception handler, this new exception was captured by WER instead which generated another minidump which ended up under this signature.

I wonder what the right fix should be there since this is going to be a common problem. We might want to put an atomic somewhere that gets set when we enter the exception handler and which the nsTerminator thread will check before trying to kill Firefox. We already have something like that for child process crashes. Maybe we should put something like that in the main process but not within the exception handler (as we don't have it anymore at this stage of shutdown).

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #34)

That'd be tricky. I see that there's several crashes here that were captured by WER at an awkward time. Take this crash for example. It seems like we had hit an assertion that would generate a crash report like those in bug 1588498, and the exception handler thread shows that we were launching the crash reporter client. In other words when this minidump was captured we had already captured a minidump that'd end up in bug 1588498. However shutdown had already started so when the watchdog thread timed out it tried to kill Firefox again. Because we had already hit the exception handler, this new exception was captured by WER instead which generated another minidump which ended up under this signature.

Does this mean we would see the same hang twice?

I wonder what the right fix should be there since this is going to be a common problem. We might want to put an atomic somewhere that gets set when we enter the exception handler and which the nsTerminator thread will check before trying to kill Firefox. We already have something like that for child process crashes. Maybe we should put something like that in the main process but not within the exception handler (as we don't have it anymore at this stage of shutdown).

Yeah, that sounds like a path forward, thanks.

In the meantime I will just copy over the signatures...

Depends on: 1822498
No longer depends on: 1822498
Alias: qm-shutdown-hangs
Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: qm-shutdown-hangs
Resolution: --- → DUPLICATE
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe ] [@ shutdownhang | __pthread_cond_wait | mozilla::TaskController::GetRunnableForMTTask | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaM…
No longer depends on: 1820823
You need to log in before you can comment on or make changes to this bug.