Crash in [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | JS::CallbackTracer::onEdge]
Categories
(Core :: JavaScript: GC, defect)
People
(Reporter: pascalc, Unassigned)
Details
(Keywords: crash, regression)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/3541ecd1-2435-4e7a-9844-292a60230301
MOZ_CRASH Reason: [unhandlable oom] Failed to allocate new chunk during GC
Top 10 frames of crashing thread:
0 xul.dll MOZ_Crash mfbt/Assertions.h:261
0 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/vm/JSContext.cpp:1300
1 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/vm/JSContext.cpp:1316
2 xul.dll JS::CallbackTracer::onEdge js/public/TracingAPI.h:245
2 xul.dll js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge js/public/TracingAPI.h:219
2 xul.dll js::gc::TraceEdgeInternal js/src/gc/Tracer.h:106
2 xul.dll js::TraceManuallyBarrieredEdge js/src/gc/Tracer.h:248
2 xul.dll js::BaseShape::traceChildren js/src/gc/TraceMethods-inl.h:306
2 xul.dll js::gc::TraceCycleCollectorChildren js/src/gc/Tracer.cpp:99
2 xul.dll JS_TraceShapeCycleCollectorChildren js/src/jsfriendapi.cpp:185
Comment 1•2 years ago (Reporter)
There is significant volume for this crasher in 110.
Comment 2•2 years ago (Reporter)
Too late for a fix in 110.
Comment 3•2 years ago
This looks like a cycle collector OOM. There's likely some inlined frame between onEdge and crash where the allocation is happening.
Comment 4•2 years ago
(In reply to Jon Coppeard (:jonco) from comment #3)
> This looks like a cycle collector OOM. There's likely some inlined frame between onEdge and crash where the allocation is happening.
Which is weird, because we unroll inlined frames now. And in fact, many of the above frames are inlined.
But this is not a cycle collector OOM. It looks like it from the above stack, but the full crash shows that this is happening during a minor GC. The crash is in js::gc::AllocateCellInGC while tracing the global from BaseShape, oddly enough. But that's probably just on the stack because most of the tracing doesn't use the C stack?
I assume the callback would be TraversalTracer::onChild. Is something it traces getting tenured and allocating a tenured cell? There must be a bunch more missing inlined frames.
Comment 5•2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #4)
It shouldn't be possible for us to call into TraversalTracer like this. Nursery collection uses a TenuringTracer, and that isn't even a CallbackTracer. But all the stacks in the crash reports show this happening.
Comment 6•2 years ago
:jonco, is it possible to get a fix out for this issue for 111?
Comment 7•2 years ago
It would also be nice to have a severity set on this bug, if possible. Thanks!
Comment 8•2 years ago
(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #6)
I don't know what's going on with this stack, because it shouldn't be possible.
However, assuming this is an OOM while collecting the nursery, it's another signature for bug 1472062 and not a new issue.
Comment 9•2 years ago
The severity field is not set for this bug.
:willyelm, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 10•2 years ago
:sfink can you verify if this is just another signature for bug 1472062 as :jonco suggested in comment 8?
Comment 11•2 years ago
(In reply to Bryan Thrall [:bthrall] from comment #10)
> :sfink can you verify if this is just another signature for bug 1472062 as :jonco suggested in comment 8?
I'm really not sure, but that seems the most likely.
The stack appears to be corrupt. If so, then which part of the stack should we trust? You could argue that the leafmost frames are the most trustworthy because they're closest to the actual program counter, and so the stackwalker hasn't had a chance to get lost yet. In that case, this would be a duplicate of the cycle collector OOM bug. On the other hand, if the stackwalker got lost, how did it manage to produce a whole series of valid-looking frames later? It's reading those off the stack; it can't just hallucinate them. So maybe the older frames are valid, and it's an OOM during nursery collection (bug 1472062).
Oh! The non-inlined frames are:
0 xul.dll js::AutoEnterOOMUnsafeRegion::crash(char const*) js/src/vm/JSContext.cpp:1300 context
1 xul.dll js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) js/src/vm/JSContext.cpp:1316 cfi
2 xul.dll js::TenuringTracer::collectToObjectFixedPoint() js/src/gc/Tenuring.cpp:873 cfi
3 xul.dll js::Nursery::collect(JS::GCOptions, JS::GCReason) js/src/gc/Nursery.cpp:1115 cfi
.
.
.
That stack would be absolutely consistent with the nursery OOM hypothesis. The problem with the stack could be purely in its reporting of inlined function "frames". In fact, this is looking like the most plausible interpretation to me. Without the inline frames, this would have already been assigned to bug 1472062.
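To make the comparison between the two stacks concrete, here is a minimal sketch of how the inlined rows could be dropped from a "top frames" listing. It assumes, based only on how frames 0 and 1 appear in this report, that rows sharing a frame number belong to one physical frame and that the real (non-inlined) function is the last row with that number. `ReportedFrame` and `physicalFramesOnly` are hypothetical names for illustration and are not part of crash-stats or the stackwalker.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of a "top frames" listing: the frame number from the first column
// and the symbolized function name. Hypothetical type, for illustration only.
struct ReportedFrame {
    int index;
    std::string function;
};

// Assumption: rows with the same index describe one physical frame, with the
// inlined callees listed first and the real function listed last.
std::vector<ReportedFrame> physicalFramesOnly(const std::vector<ReportedFrame>& rows) {
    std::vector<ReportedFrame> out;
    for (const ReportedFrame& row : rows) {
        // The listing is expected to be ordered leaf-first by frame number.
        assert(out.empty() || out.back().index <= row.index);
        if (!out.empty() && out.back().index == row.index) {
            out.back() = row;  // a later row with the same number replaces the inlined one
        } else {
            out.push_back(row);
        }
    }
    return out;
}
```

Under that assumption, the ten rows in the original description all collapse into frames 0, 1 and 2. Because the listing is cut off at ten rows, the last row shown for frame 2 is still one of the suspect inlined entries, not the js::TenuringTracer::collectToObjectFixedPoint frame listed above.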
(The remaining possibility is that this stack is completely valid, and the problem is that the actual tracer object, or at least its vtable, is getting swapped via stack corruption or something. That seems much less likely than a problem in reporting inline frames.)
I could imagine the problem being that some of those innermost inline frames are for identical function bodies that got folded together by the linker before being inlined during LTO? Can that happen?
I'm not sure how to update the signature for this.
Comment 12•2 years ago
Because bug 1825078 indicates that the inline frames are from bad debug info, it seems like the consensus here is to assign this to bug 1472062.
:jonco, do you have any reservations about marking this as a duplicate of bug 1472062 at this point?