Closed Bug 1645865 Opened 4 years ago Closed 2 years ago

MOZ_CRASH("Unhandlable OOM while clearing document dependent slots.") [@ nsGlobalWindowInner::InitDocumentDependentState]

Categories

(Core :: DOM: Navigation, defect, P3)

defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr102 --- fixed
firefox79 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- fixed

People

(Reporter: tsmith, Assigned: jjaschke)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [tbird crash])

Crash Data

Attachments

(2 files)

Attached file testcase.html (deleted) —
==50528==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f4152632b4d bp 0x7ffddf94b2d0 sp 0x7ffddf94b1e0 T0)
==50528==The signal is caused by a WRITE memory access.
==50528==Hint: address points to the zero page.
    #0 0x7f4152632b4c in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:80:9
    #1 0x7f4152632b4c in nsGlobalWindowInner::InitDocumentDependentState(JSContext*) /gecko/dom/base/nsGlobalWindowInner.cpp:1618:1
    #2 0x7f415267da6c in nsGlobalWindowOuter::SetNewDocument(mozilla::dom::Document*, nsISupports*, bool, mozilla::dom::WindowGlobalChild*) /gecko/dom/base/nsGlobalWindowOuter.cpp:2381:23
    #3 0x7f4157462e3b in nsDocumentViewer::InitInternal(nsIWidget*, nsISupports*, mozilla::dom::WindowGlobalChild*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, bool, bool, bool) /gecko/layout/base/nsDocumentViewer.cpp:961:22
    #4 0x7f415746229a in nsDocumentViewer::Init(nsIWidget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::dom::WindowGlobalChild*) /gecko/layout/base/nsDocumentViewer.cpp:750:10
    #5 0x7f4159fe7cd5 in nsDocShell::SetupNewViewer(nsIContentViewer*, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:7741:7
    #6 0x7f4159fe67d8 in nsDocShell::Embed(nsIContentViewer*, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:5332:17
    #7 0x7f4159ff40ac in nsDocShell::CreateAboutBlankContentViewer(nsIPrincipal*, nsIPrincipal*, nsIContentSecurityPolicy*, nsIURI*, bool, bool, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:6350:14
    #8 0x7f4159fb3a66 in nsDocShell::EnsureContentViewer() /gecko/docshell/base/nsDocShell.cpp:6177:17
    #9 0x7f4159fce247 in GetDocument /gecko/docshell/base/nsDocShell.cpp:2948:3
    #10 0x7f4159fce247 in non-virtual thunk to nsDocShell::GetDocument() /gecko/docshell/base/nsDocShell.cpp
    #11 0x7f41526b058d in nsPIDOMWindowOuter::MaybeCreateDoc() /gecko/dom/base/nsGlobalWindowOuter.cpp:7625:45
    #12 0x7f415a1130c6 in GetDoc /builds/worker/workspace/obj-build/dist/include/nsPIDOMWindow.h:812:7
    #13 0x7f415a1130c6 in mozilla::a11y::DocManager::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) /gecko/accessible/base/DocManager.cpp:220:43
    #14 0x7f41512875a0 in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, nsresult) /gecko/uriloader/base/nsDocLoader.cpp:1377:3
    #15 0x7f4151285dfc in nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, nsresult) /gecko/uriloader/base/nsDocLoader.cpp:1340:14
    #16 0x7f41512845f2 in nsDocLoader::OnStartRequest(nsIRequest*) /gecko/uriloader/base/nsDocLoader.cpp
    #17 0x7f4151284b8c in non-virtual thunk to nsDocLoader::OnStartRequest(nsIRequest*) /gecko/uriloader/base/nsDocLoader.cpp
    #18 0x7f414eb34b2e in mozilla::net::nsLoadGroup::AddRequest(nsIRequest*, nsISupports*) /gecko/netwerk/base/nsLoadGroup.cpp:484:22
    #19 0x7f414eaf2feb in nsBaseChannel::AsyncOpen(nsIStreamListener*) /gecko/netwerk/base/nsBaseChannel.cpp:723:31
    #20 0x7f41512954d8 in nsURILoader::OpenURI(nsIChannel*, unsigned int, nsIInterfaceRequestor*) /gecko/uriloader/base/nsURILoader.cpp:730:17
    #21 0x7f415a0116d1 in nsDocShell::OpenInitializedChannel(nsIChannel*, nsIURILoader*, unsigned int) /gecko/docshell/base/nsDocShell.cpp:9795:20
    #22 0x7f415a00abf5 in nsDocShell::DoURILoad(nsDocShellLoadState*, nsIDocShell**, nsIRequest**) /gecko/docshell/base/nsDocShell.cpp:9610:8
    #23 0x7f4159f823cb in nsDocShell::InternalLoad(nsDocShellLoadState*, nsIDocShell**, nsIRequest**) /gecko/docshell/base/nsDocShell.cpp:8807:8
    #24 0x7f4159fb57d1 in nsDocShell::LoadURI(nsDocShellLoadState*, bool) /gecko/docshell/base/nsDocShell.cpp:808:10
    #25 0x7f4152b0e364 in nsFrameLoader::ReallyStartLoadingInternal() /gecko/dom/base/nsFrameLoader.cpp:684:23
    #26 0x7f4152b0d76e in nsFrameLoader::ReallyStartLoading() /gecko/dom/base/nsFrameLoader.cpp:555:17
    #27 0x7f4152864276 in mozilla::dom::Document::MaybeInitializeFinalizeFrameLoaders() /gecko/dom/base/Document.cpp:8503:13
    #28 0x7f41529235e4 in applyImpl<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1185:12
    #29 0x7f41529235e4 in apply<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1191:12
    #30 0x7f41529235e4 in mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1237:13
    #31 0x7f415259e703 in nsContentUtils::RemoveScriptBlocker() /gecko/dom/base/nsContentUtils.cpp:5344:15
    #32 0x7f4152854792 in mozilla::dom::Document::EndUpdate() /gecko/dom/base/Document.cpp:7090:3
    #33 0x7f415250e3b6 in mozAutoDocUpdate::~mozAutoDocUpdate() /gecko/dom/base/mozAutoDocUpdate.h:34:18
    #34 0x7f4152b4d9bf in nsINode::ReplaceOrInsertBefore(bool, nsINode*, nsINode*, mozilla::ErrorResult&) /gecko/dom/base/nsINode.cpp:2698:1
    #35 0x7f415328d2e0 in InsertBefore /gecko/dom/base/nsINode.h:1971:12
    #36 0x7f415328d2e0 in AppendChild /gecko/dom/base/nsINode.h:1974:12
    #37 0x7f415328d2e0 in mozilla::dom::Node_Binding::appendChild(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) /builds/worker/workspace/obj-build/dom/bindings/NodeBinding.cpp:989:60
    #38 0x7f40c506e552  (<unknown module>)
Flags: in-testsuite?

A Pernosco session is available here: https://pernos.co/debug/RaWTyMDDNECMEIZiQ44pPA/index.html

Crash Signature: [@ nsGlobalWindowInner::InitDocumentDependentState]
Summary: Crash in [@ nsGlobalWindowInner::InitDocumentDependentState] → MOZ_CRASH("Unhandlable OOM while clearing document dependent slots.") [@ nsGlobalWindowInner::InitDocumentDependentState]
Severity: -- → S3
Priority: -- → P3
Whiteboard: [tbird crash]

Nika is most familiar with this part of the code, so I'll defer to Nika to review these Thunderbird crashes.

Flags: needinfo?(nkochar) → needinfo?(nika)

Very odd that I just crashed firefox bp-15215314-3c3e-4f0a-86e2-dfb550210219. I was AFK, so I have no idea why it crashed. The tab was https://mcafee-total-protection-mtp10us.en.softonic.com/windows?utm_source=wakeup and I had restarted Firefox 10 hours before, so not open very long. My average crash rate is <1 per month, so I don't have a crashy system

(In reply to Tyson Smith [:tsmith] from comment #1)

A Pernosco session is available here: https://pernos.co/debug/RaWTyMDDNECMEIZiQ44pPA/index.html

The crash recorded in this pernosco trace is an over-recursion error which happened to occur within InitDocumentDependentState. Looking at the pernosco trace, it appears as though the JS code is repeatedly calling o.parentNode.removeNode(o) from within a DOMNodeRemoved mutation event listener until it fails to execute the callbacks, perhaps due to the recursion limit, and then calling window.open() which eventually tries to create a new document, and ends up crashing as it cannot create the Document object prototype with this deep of a native stack.

It's unlikely that the code in the pernosco trace is exactly the same as the code from crashes in the wild, but it may be the case that the crashes are caused by overrecursion errors in the wild being caught.

It may help diagnose issues like this in the future to include information about the depth of the main thread native stack in crash metadata, and note when the stack is close to tripping js overrecursion protection.

Flags: needinfo?(nika)

Fairly high crash rate; ~1100/week. Nika's analysis is likely correct, and I suspect as she indicates that this scenario or one very like it is happening in the wild (including to me on a nightly)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

Fairly high crash rate; ~1100/week. Nika's analysis is likely correct, and I suspect as she indicates that this scenario or one very like it is happening in the wild (including to me on a nightly)

FWIW, crash ping telemetry shows this to be the #7 top crash in content in 92.

Severity: S3 → S2

The crash volume is still concerning.
Andreas, can you please put this into your queue? I think we'd revisit this a bit and figure out a plan for it.

Flags: needinfo?(afarre)

Note that the crash from the testcase looks to be caused by a stack overflow, but there might actually be other crashes here with the same top-level signature but with a different cause mixed in.

With the testcase and Nika's finding we can see that one reason for this could be overrecursion. But looking at bunch of callstacks on crash-stats it doesn't seem to be the whole story. So this bug probably captures way to many OOM crashes to be just one thing. And overrecursion is possibly something we can deal with, but in the face of OOM it is a bit hard to do something else than crashing.

So I wonder if this should really be an S3 as it was to begin with. Otherwise we probably need a better way to distinguish what's actually causing the OOM.

Tentatively setting this to S3 again, that ok Hsin-Yi?

Severity: S2 → S3
Flags: needinfo?(afarre) → needinfo?(htsai)

(In reply to Andreas Farre [:farre] [Out until 2022-10-16] from comment #11)

With the testcase and Nika's finding we can see that one reason for this could be overrecursion. But looking at bunch of callstacks on crash-stats it doesn't seem to be the whole story. So this bug probably captures way to many OOM crashes to be just one thing. And overrecursion is possibly something we can deal with, but in the face of OOM it is a bit hard to do something else than crashing.

So I wonder if this should really be an S3 as it was to begin with. Otherwise we probably need a better way to distinguish what's actually causing the OOM.

Tentatively setting this to S3 again, that ok Hsin-Yi?

No objection, thanks for taking another look!

Flags: needinfo?(htsai)

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 10 content process crashes on beta
  • Top 10 content process crashes on release

:hsivonen, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(hsivonen)

Nika, we check for the recursion limit a couple of stack frames earlier: https://searchfox.org/mozilla-central/rev/6f9fdc5c3b164a46004c98a5baaf55b05e2ad329/dom/base/nsGlobalWindowOuter.cpp#2118

Could we extend the check to take some kind of "this close or closer to the limit" parameter and then size the parameter to cover the depth that we need for this stuff to succeed?

Flags: needinfo?(hsivonen) → needinfo?(nika)

That might work as a potential approach, with some allowance for how conservative we need to be. From the pernosco trace in comment 1, it looks like we'd need at least ~20k bytes of leeway (probably more) to avoid the assertion check which failed. The current check appears to only check for ~8k bytes of leeway on top of JS::StackForUntrustedScript (https://searchfox.org/mozilla-central/rev/b1e5f2c7c96be36974262551978d54f457db2cae/js/public/friend/StackLimits.h#309).

Flags: needinfo?(nika)

Thanks. Considering the top crash nature (according to the bot) and the existence of a path forward, let's triage this as S2.

Severity: S3 → S2

Let's see if we have tests that recurse so close to the limit that a larger gap between the point of check and the end of the stack would make them fail:
https://treeherder.mozilla.org/jobs?repo=try&revision=5f62602828246cda34f4b9c62217442ac97eced6

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #19)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

This might be caused by whatever made the testcase not trigger the stack limit for me locally.

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

I don't think we really need bug 1593704, bug 1405521, this one, and who knows how many others for the same basic issue. This one at least has a patch that is in progress, at least.

Hsin-Yi, do you think there's somebody who could work on this patch while Henri is out? The basic idea looks simple, and the patch looked mostly finished. It would be good if we could fail in a gentler way. This issue seems fairly common.

Flags: needinfo?(htsai)

Thank you for bringing it up, Andrew. Jan is willing to take a look at where Henri left and resume the patch landing.

Flags: needinfo?(htsai)
Assignee: hsivonen → jjaschke
Attachment #9297774 - Attachment description: Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. → Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell
Pushed by jjaschke@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8348b4bbb0e7 Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

The patch landed in nightly and beta is affected.
:jjaschke, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox112 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjaschke)

Crashes mostly happen on older versions (102 esr and such). I don't think an uplift is necessary.

Flags: needinfo?(jjaschke)

Given the crash rate on ESR, I think this might be worth an uplift to that branch still. What do you think, Jan?

Flags: needinfo?(jjaschke)

Fine by me. (I'm on PTO, would you mind taking care of that?)

Flags: needinfo?(jjaschke) → needinfo?(ryanvm)

Comment on attachment 9297774 [details]
Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: crashing fairly often on ESR
  • User impact if declined: some crashes
  • Fix Landed on Version: 113
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The main risk is that we will fail earlier in some cases that would have been fine, causing some pages to fail to load or whatever.
Attachment #9297774 - Flags: approval-mozilla-esr102?
Flags: needinfo?(ryanvm)

(In reply to Andrew McCreight [:mccr8] from comment #31)

  • Why is the change risky/not risky? (and alternatives if risky): The main risk is that we will fail earlier in some cases that would have been fine, causing some pages to fail to load or whatever.

This patch will shift crashes like the one in the bug, where we randomly fail in a weird way like in a crash, to a more controlled failure where some JS throws an exception. So the more controlled failure is probably better, but we're going to do it a bit more often, I'd expect. I don't think we've seen any reports of issues, though, but it would probably be subtle.

Comment on attachment 9297774 [details]
Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell

Approved for 102.11esr.

Attachment #9297774 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: