Crash in nsGlobalWindow::ClearDocumentDependentSlots: MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)
Categories
(Core :: DOM: Core & HTML, defect, P3)
People
(Reporter: n.nethercote, Unassigned)
Details
(Keywords: crash, topcrash)
This bug was filed from the Socorro interface and is report bp-0901faec-eb60-4ad0-b1ce-0a9e10171003.
=============================================================
We are hitting this:

> MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)

But in the linked crash report the system has huge amounts of virtual memory, page file and physical memory. The code looks like this:

> if (!WindowBinding::ClearCachedDocumentValue(aCx, this) ||
>     !WindowBinding::ClearCachedPerformanceValue(aCx, this)) {
>   MOZ_CRASH("Unhandlable OOM while clearing document dependent slots.");
> }

It assumes that failure of ClearCachedDocumentValue() and ClearCachedPerformanceValue() indicates OOM, but they will also fail if `this->GetWrapper()` returns null. bz, is that possible?
Comment 1•7 years ago
> but they will also fail if `this->GetWrapper()` returns null.

No, they won't; they'll no-op. The generated code is:

> obj = aObject->GetWrapper();
> if (!obj) {
>   return true;
> }

precisely because no wrapper is not a failure condition: it just means that there is nothing to clear.

A false return means that either get_document or get_performance returned false. Looking at get_document, we're in the case when we just cleared the slot, so we'll make it to the self->GetDocument() call. Once we do, we will return false only in the following cases:

1) GetOrCreateDOMReflector(cx, result, args.rval()) returns false.
2) Wrapping the value the getter returned into the window's compartment returns false.
3) Wrapping the value the getter returned into the caller's compartment (which is also the window compartment in this case) returns false.

That's it. Looking at get_performance, it can also return false in only those three cases.

Wrapping into compartments only fails if JS_WrapValue returns false, which only happens if JSCompartment::wrap() returns false, which can happen in the following ways:

A) CheckSystemRecursionLimit() (inside getNonWrapperObjectForCurrentCompartment) fails.
B) The prewrap callback outputs null. This should never happen for webidl objects, afaict.
C) JSCompartment::getOrCreateWrapper fails. I believe this can only happen on "OOM".

As for GetOrCreateDOMReflector, it returns false if:

D) CouldBeDOMBinding() returns false (should never happen for document or performance).
E) The actual WrapObject() call returns false.
F) JS_WrapValue returns false, see above.

WrapObject() _can_ fail in somewhat interesting ways for documents. See <http://searchfox.org/mozilla-central/source/dom/base/nsINode.cpp#2953-2958>. But that shouldn't happen in this case, since we're just setting the document up, so all the script handling object state should be fine.
The other thing both WrapObjects call is the relevant binding's Wrap(), which can fail when globals are missing or protos can't be instantiated, but fundamentally none of those should be happening here. And of course it can fail on "OOM".

Now the big caveat: "OOM" for our purposes is "out of SpiderMonkey heap", not "out of memory". It's totally possible to hit "OOM" without actually being out of memory for malloc() purposes... I don't know offhand what the cap is on the size of SpiderMonkey's heap, especially because it looks like we have separate heaps for gcthing allocations and JS_malloc allocations or something.

So for this specific crash, my best guess is that we either hit the SpiderMonkey memory cap or CheckSystemRecursionLimit() failed. Hard to tell about the latter, because the stack is truncated at the first jit frame.
Comment 2•7 years ago
> No, they won't; they'll no-op. The generated code is:
>
> obj = aObject->GetWrapper();
> if (!obj) {
> return true;
> }
>
> precisely because no wrapper is not a failure condition: it just means that
> there is nothing to clear.
Yes, my bad.
Thank you for the detailed analysis.
jonco, does SM have a heap limit?
Comment 3•7 years ago
(In reply to Nicholas Nethercote [:njn] from comment #2)
> jonco, does SM have a heap limit?

Yes, the GC parameter JSGC_MAX_BYTES is used to set this. It is set to 0xffffffff in XPConnect.
Comment 4•7 years ago
Fwiw, bug 1197540 has a testcase that can trigger this assert. That testcase definitely looks like an infinite recursion case to me, so it might be running into CheckSystemRecursionLimit() failure... I'll try catching it in rr.
Comment 5•7 years ago
426 crashes in the last week in 57.
There are around 1000 crashes per week on release 57 versions. From the duplicate bug 1422313, sounds like this is a shift in signature rather than a new crash.
Comment 8•6 years ago
bp-9d2936eb-65eb-4e92-bd7d-540380180425 with build 20180410100115 @ nsGlobalWindowInner::ClearDocumentDependentSlots
Comment 9•6 years ago
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #4)
> Fwiw, [Mac] bug 1197540 has a testcase that can trigger this assert. That
> testcase definitely looks like an infinite recursion case to me, so it might
> be running into CheckSystemRecursionLimit() failure... I'll try catching it
> in rr.

The Mac crashes for that bug report dropped to near zero around mid-March, from whatever shipped after 58.0.2.
https://crash-stats.mozilla.com/signature/?_sort=user_comments&_sort=-date&signature=nsGlobalWindow%3A%3AInnerSetNewDocument&date=%3E%3D2017-12-11T04%3A18%3A16.000Z&date=%3C2018-06-11T05%3A18%3A16.000Z#graphs
Comment 10•6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #9)
> Crashes for that bug report (bug 1197540) dropped to near zero around
> mid-March, from whatever shipped after 58.0.2.

However, the crash rate for this (Windows) bug has held fairly steady. That is actually not surprising, because the majority of crashes here are on 52.x esr.

I had another crash: bp-3819de05-b8f6-41a0-9297-1b2980180611
Comment 11•5 years ago
While trying to get an rr trace for bug 1593704 I kept triggering this issue. I created a Pernosco session which can be found here: https://pernos.co/debug/HeNG0Imk-tsJryLVWyqD2g/index.html
Comment 13•4 years ago
(In reply to Tyson Smith [:tsmith] from comment #11)
> While trying to get an rr trace for bug 1593704 I kept triggering this
> issue. I created a Pernosco session which can be found here:
> https://pernos.co/debug/HeNG0Imk-tsJryLVWyqD2g/index.html

I happened to resurrect the Pernosco trace. It seems that GetOrCreateDOMReflector returns false (I added an entry to the notebook there). Due to the massive inlining it is less clear (to me) whether this can really only be caused by OOM.
Comment 14•4 years ago
Here is a Pernosco session created with a -O0 build; hopefully this is more helpful: https://pernos.co/debug/wo7vFFam6FDy7kYhiopX5g/index.html
Comment 15•4 years ago
Thanks a lot! That makes it easier. It seems we are definitely not seeing an OOM here.
The low-level analysis is that on this call stack we rely on GetWrapperMaybeDead to always give us a living wrapper, which is not the case. I was not yet able to check what might cause the wrapper to be "dead and in the process of being finalized", which the comment points out as a possible cause for it being nullptr.
Olli, does this help to understand this better?
Comment 16•4 years ago
(In reply to Jens Stutte [:jstutte] from comment #15)
> I was not yet able to check, what might cause the wrapper to be "dead and
> in the process of being finalized."

This means that the GC has determined that the wrapper is dead, but the wrapper has not been destroyed yet (and the pointer to it has not been set to null).
Comment 17•4 years ago
The stack shows an interesting cycle of:
XMLHttpRequest_Binding::open
...
XMLHttpRequestMainThread::FireReadystatechangeEvent
...
js::RunScript
...
XMLHttpRequest_Binding::send
...
XMLHttpRequestMainThread::ResumeEventDispatching
EventTarget::DispatchEvent
...
js::RunScript
...
XMLHttpRequest_Binding::open
over and over again. So sync XHR that triggers sync XHR etc.
Eventually we are way down into that stack, in danger of hitting JS engine stack-overflow checks, and processing events under a sync XHR. We land in nsDocumentOpenInfo::OnStartRequest and go from there. We try to create a wrapper for the document, try to create its proto, try to define properties on it, hit the over-recursion check in CallJSAddPropertyOp and fail it, fail to add the property, and bubble up the stack failing things.
I added some notes to the Pernosco session for these bits.
Comment 18•2 years ago
We had a report on webcompat regarding this website:
https://www.ensonhaber.com/
https://crash-stats.mozilla.org/report/index/5d90800a-fe28-4785-833e-dd8b60220302#tab-bugzilla
https://github.com/webcompat/web-bugs/issues/100455
If it helps.
Comment 19•2 years ago
(In reply to Karl Dubost 💡 :karlcow from comment #18)
> We had a report on webcompat with regards to this website
This crash is just a symptom of running out of memory. What is more interesting is what is causing the browser to use a lot of memory. You'll want a new bug for that.
Comment 20•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Comment 21•2 years ago
Given comment 19, I'd assume this is not worth S2?
Comment 22•2 years ago
Comment 19 was mostly relevant to a specific site that was maybe showing this issue.
It is a reasonably common crash, so it might qualify as S2, but we have no real plan of action here. Boris landed a ton of instrumentation in 2018 to try to figure out why this is happening, but he didn't post any kind of conclusion in these bugs as far as I can see, so I guess nothing useful resulted from it.
Although looking at this now, it is possible that bug 1543537 helped here. Weird null derefs in documents are a possible symptom of that issue. I fixed it in 107, and the volume in 107 beta seems to be a lot lower than 106 beta (I think comment 20 was made when 106 was in beta). Maybe we can wait a few weeks and see if the volume on 107 continues to be low, then we could remove the top crash and mark it S3.
Comment 23•2 years ago
This became frequent with 109.0a1 20221129084032: 25-70 crashes per Nightly build. There are no crash reports for the latest Nightly (20221030214707?) so far. The push log lists nothing obvious.
Comment 24•2 years ago
The bug is linked to topcrash signatures, which match the following criteria:
- Top 5 desktop browser crashes on Mac on beta
- Top 5 desktop browser crashes on Mac on release
- Top 20 desktop browser crashes on release (startup)
- Top 20 desktop browser crashes on beta
- Top 10 desktop browser crashes on nightly
- Top 10 content process crashes on beta
- Top 10 content process crashes on release
- Top 5 desktop browser crashes on Linux on beta
- Top 5 desktop browser crashes on Linux on release
- Top 5 desktop browser crashes on Windows on release (startup)
For more information, please visit auto_nag documentation.
Comment 25•2 years ago
Hello Andrew, we got new reports and this is a current topcrash-startup. Would you please take another look? Thanks.
Comment 26•2 years ago
The set of patches I'm seeing for the build that started crashing a lot is this. That includes bug 1219128, which has already had multiple OOM issues in automation associated with it. I think we should back that patch out if the memory issues can't be resolved very quickly.
Comment 27•2 years ago
Hmm I guess it got backed out immediately, but still got marked fixed somehow, so I guess that can't be to blame?
Comment 28•2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #26)
> The set of patches I'm seeing for the build that started crashing a lot is
> this. That includes bug 1219128, which has already had multiple OOM issues
> in automation associated with it already. I think we should back that patch
> out if the memory issues can't be resolved very quickly.

The OOM issues related to this patch were only triggered by the test suite that is configured to be greedy at finding OOM issues by running a loop simulating OOM, and the associated backout was caused by failing to annotate the affected test cases as being instrumented that way. None of these OOMs were caused by the system running out of memory.

Otherwise, bug 1219128 only changes how the Object and Function classes are registered on the GlobalObject, by allocating them eagerly; they would most likely be present anyway unless the global is never used.
Comment 29•2 years ago
I filed a new bug for this recent spike as it seems to involve a bunch of unrelated signatures.
Comment 30•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 31•2 years ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit auto_nag documentation.
Comment 32•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 33•1 year ago
This is a symptom of an OOM. There has been extensive investigation that hasn't found anything.
Comment 34•1 year ago
I don't know if it helps, but I recently stumbled over this a few times. A website that reliably produced this error for me is https://www.welt.de/
It's a news website. If you let it "idle" for a while, the site will notify you that there have been news updates. At first I thought that this triggered the crash, but it doesn't. If you just don't touch the website and let it sit for a while longer, most of the time it takes around an hour until the tab crashes (though sometimes it doesn't happen at all). What's pretty interesting to me is that before it crashes I get spammed with "save file to" dialogs for a lot of empty .html files (I don't even know why that happens). After you save them all or cancel the downloads, you then realize that the tab has crashed. It once also gave me a memory read exception with the exact same behaviour.
Comment 35•1 year ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 36•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 37•1 year ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 38•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 39•1 year ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 40•11 months ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.