Closed Bug 1819532 Opened 2 years ago Closed 2 years ago

Crash in [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | JS::CallbackTracer::onEdge]

Categories

(Core :: JavaScript: GC, defect)

Unspecified
Windows 11
defect

Tracking


RESOLVED DUPLICATE of bug 1472062
Tracking Status
firefox110 --- wontfix
firefox111 --- wontfix

People

(Reporter: pascalc, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/3541ecd1-2435-4e7a-9844-292a60230301

MOZ_CRASH Reason: [unhandlable oom] Failed to allocate new chunk during GC

Top 10 frames of crashing thread:

0  xul.dll  MOZ_Crash  mfbt/Assertions.h:261
0  xul.dll  js::AutoEnterOOMUnsafeRegion::crash  js/src/vm/JSContext.cpp:1300
1  xul.dll  js::AutoEnterOOMUnsafeRegion::crash  js/src/vm/JSContext.cpp:1316
2  xul.dll  JS::CallbackTracer::onEdge  js/public/TracingAPI.h:245
2  xul.dll  js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge  js/public/TracingAPI.h:219
2  xul.dll  js::gc::TraceEdgeInternal  js/src/gc/Tracer.h:106
2  xul.dll  js::TraceManuallyBarrieredEdge  js/src/gc/Tracer.h:248
2  xul.dll  js::BaseShape::traceChildren  js/src/gc/TraceMethods-inl.h:306
2  xul.dll  js::gc::TraceCycleCollectorChildren  js/src/gc/Tracer.cpp:99
2  xul.dll  JS_TraceShapeCycleCollectorChildren  js/src/jsfriendapi.cpp:185

There is significant volume for this crasher in 110.

Too late for a fix in 110.

This looks like a cycle collector OOM. There's likely some inlined frame between onEdge and crash where the allocation is happening.

Component: JavaScript Engine → XPCOM

(In reply to Jon Coppeard (:jonco) from comment #3)

> This looks like a cycle collector OOM. There's likely some inlined frame between onEdge and crash where the allocation is happening.

Which is weird, because we unroll inlined frames now. And in fact, many of the above frames are inlined.
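
As a rough illustration of which entries in the top-10 listing are inlined, the sketch below groups them by frame number. It assumes that entries sharing a number belong to one physical frame, with the inlined calls unrolled into separate lines; that reading of the listing is an assumption, and `group_by_index` and `summarize` are made-up helper names, not crash-stats tooling.

```python
# Frames copied from the "Top 10 frames" listing in this bug,
# as (frame number, function name) pairs.
TOP_FRAMES = [
    (0, "MOZ_Crash"),
    (0, "js::AutoEnterOOMUnsafeRegion::crash"),
    (1, "js::AutoEnterOOMUnsafeRegion::crash"),
    (2, "JS::CallbackTracer::onEdge"),
    (2, "js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge"),
    (2, "js::gc::TraceEdgeInternal"),
    (2, "js::TraceManuallyBarrieredEdge"),
    (2, "js::BaseShape::traceChildren"),
    (2, "js::gc::TraceCycleCollectorChildren"),
    (2, "JS_TraceShapeCycleCollectorChildren"),
]


def group_by_index(frames):
    """Collect function names under the frame number they were listed with."""
    groups = {}
    for index, function in frames:
        groups.setdefault(index, []).append(function)
    return groups


def summarize(frames):
    """Map each frame number to how many listing entries share it."""
    return {index: len(names) for index, names in group_by_index(frames).items()}


if __name__ == "__main__":
    for index, names in group_by_index(TOP_FRAMES).items():
        # Assumption: more than one entry under a number means inlined calls.
        note = "contains inlined entries" if len(names) > 1 else "single entry"
        print(f"frame {index}: {len(names)} listed ({note})")
```

Under that assumption, seven of the ten listed entries sit under frame 2, and the listing is cut off before the function that physically owns that frame.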

But this is not a cycle collector OOM. It looks like it from the above stack, but the full crash shows that this is happening during a minor GC. The crash is in js::gc::AllocateCellInGC while tracing the global from BaseShape, oddly enough. But that's probably just on the stack because most of the tracing doesn't use the C stack?

I assume the callback would be TraversalTracer::onChild. Is something it traces getting tenured and allocating a tenured cell? There must be a bunch more missing inlined frames.

Component: XPCOM → JavaScript: GC

(In reply to Steve Fink [:sfink] [:s:] from comment #4)
It shouldn't be possible for us to call into TraversalTracer like this. Nursery collection uses a TenuringTracer, and that isn't even a CallbackTracer. But all the stacks in the crash reports show this happening.

:jonco, is it possible to get a fix out for this issue for 111?

Flags: needinfo?(jcoppeard)

It would also be nice to have a severity set on this bug, if possible. Thanks!

(In reply to Jared Hirsch [:jhirsch] (he/him) (Needinfo please) from comment #6)
I don't know what's going on with this stack, because it shouldn't be possible.

However, assuming this is an OOM while collecting the nursery, it's another signature for bug 1472062 and not a new issue.

Flags: needinfo?(jcoppeard)

The severity field is not set for this bug.
:willyelm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(wmedina)
Severity: -- → S3
Flags: needinfo?(wmedina)

:sfink can you verify if this is just another signature for bug 1472062 as :jonco suggested in comment 8?

Flags: needinfo?(sfink)

(In reply to Bryan Thrall [:bthrall] from comment #10)

> :sfink can you verify if this is just another signature for bug 1472062 as :jonco suggested in comment 8?

I'm really not sure, but that seems the most likely.

The stack appears to be corrupt. If so, then which part of the stack should we trust? You could argue that the leafmost frames are the most trustworthy because they're closest to the actual program counter, so the stackwalker hasn't had a chance to get lost yet. In that case, this would be a duplicate of the cycle collector OOM bug. On the other hand, if the stackwalker got lost, how did it manage to produce a whole series of valid-looking frames later? It's reading those off of the stack; it can't just hallucinate them. So maybe the older frames are valid, and it's an OOM during nursery collection (bug 1472062).

Oh! The non-inlined frames are:

0 	xul.dll 	js::AutoEnterOOMUnsafeRegion::crash(char const*) 	js/src/vm/JSContext.cpp:1300 	context
1 	xul.dll 	js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) 	js/src/vm/JSContext.cpp:1316 	cfi
2 	xul.dll 	js::TenuringTracer::collectToObjectFixedPoint() 	js/src/gc/Tenuring.cpp:873 	cfi
3 	xul.dll 	js::Nursery::collect(JS::GCOptions, JS::GCReason) 	js/src/gc/Nursery.cpp:1115 	cfi
.
.
.

That stack would be absolutely consistent with the nursery OOM hypothesis. The problem with the stack could be purely in its reporting of inlined function "frames". In fact, this is looking like the most plausible interpretation to me. Without the inline frames, this would have already been assigned to bug 1472062.
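
To make the effect on the signature concrete, here is a minimal sketch of a signature builder run over both versions of the stack. It is a simplified model, not Socorro's actual signature generation: the `PREFIX_FUNCTIONS` and `IGNORED_FUNCTIONS` lists and the `build_signature` helper are hypothetical, argument lists are dropped from the function names, and the `OOM | large` prefix is left out.

```python
# Hypothetical: functions too generic to end a signature on, so the
# builder keeps going to the next frame after appending them.
PREFIX_FUNCTIONS = {"js::AutoEnterOOMUnsafeRegion::crash"}

# Hypothetical: functions skipped entirely when building a signature.
IGNORED_FUNCTIONS = {"MOZ_Crash"}

# Leading entries of the stack as reported, with inlined entries unrolled.
WITH_INLINES = [
    "MOZ_Crash",
    "js::AutoEnterOOMUnsafeRegion::crash",
    "js::AutoEnterOOMUnsafeRegion::crash",
    "JS::CallbackTracer::onEdge",
    "js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge",
]

# The non-inlined frames quoted in this comment.
PHYSICAL_ONLY = [
    "js::AutoEnterOOMUnsafeRegion::crash",
    "js::AutoEnterOOMUnsafeRegion::crash",
    "js::TenuringTracer::collectToObjectFixedPoint",
    "js::Nursery::collect",
]


def build_signature(functions):
    """Join frames with ' | ', stopping at the first non-prefix function."""
    parts = []
    for function in functions:
        if function in IGNORED_FUNCTIONS:
            continue
        parts.append(function)
        if function not in PREFIX_FUNCTIONS:
            break
    return " | ".join(parts)


if __name__ == "__main__":
    print(build_signature(WITH_INLINES))
    print(build_signature(PHYSICAL_ONLY))
```

In this model the unrolled inline entries produce a signature ending in JS::CallbackTracer::onEdge, as in this bug's title, while the non-inlined frames alone produce one ending in js::TenuringTracer::collectToObjectFixedPoint.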

(The remaining possibility is that this stack is completely valid, and the problem is that the actual tracer object, or at least its vtable, is getting swapped via stack corruption or something. That seems much less likely than a problem in reporting inline frames.)

I could imagine the problem being that some of those innermost inline frames are for identical function bodies that got folded together by the linker before being inlined during LTO? Can that happen?

I'm not sure how to update the signature for this.

Flags: needinfo?(sfink)

Because bug 1825078 indicates that the inline frames are from bad debug info, it seems like the consensus here is to assign this to bug 1472062.

:jonco, do you have any reservations about marking this as a duplicate of bug 1472062 at this point?

Flags: needinfo?(jcoppeard)
Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: 1472062
Flags: needinfo?(jcoppeard)
Resolution: --- → DUPLICATE