Incorrect inlined frames reported
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
People
(Reporter: sfink, Unassigned)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
Bug 1819532 seems to show incorrect inlined frames for a crash (well, a bucket full of crashes). If only the non-inlined frames are considered, the stack seems totally plausible:
0 xul.dll js::AutoEnterOOMUnsafeRegion::crash(char const*) js/src/vm/JSContext.cpp:1300 context
1 xul.dll js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) js/src/vm/JSContext.cpp:1316 cfi
2 xul.dll js::TenuringTracer::collectToObjectFixedPoint() js/src/gc/Tenuring.cpp:873 cfi
3 xul.dll js::Nursery::collect(JS::GCOptions, JS::GCReason) js/src/gc/Nursery.cpp:1115 cfi
.
.
.
but with inlined frames, it has the nursery GC calling the cycle collector's tracer, which can't happen (see bug 1819532 comment 5).
I don't know if these sorts of things are expected or known, but I thought I'd file this bug just in case.
Reporter | ||
Comment 1•2 years ago
|
||
Idle, baseless speculation: what happens when you have two functions with identical bodies that later get inlined during LTO? (Is that the right term? PGO?) Is it possible for the functions to be merged under one of the two bodies, and then the code address range gets attributed to the wrong function? I'm bringing LTO into this because it was my impression that the linker is the thing that can discard duplicates.
It's really not a very plausible explanation for this stack, because all of
2 xul.dll JS::CallbackTracer::onEdge(JSObject**, char const*) js/public/TracingAPI.h:245 inlined
2 xul.dll js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge(JSObject**, char const*) js/public/TracingAPI.h:219 inlined
2 xul.dll js::gc::TraceEdgeInternal(JSTracer*, JSObject**, char const*) js/src/gc/Tracer.h:106 inlined
2 xul.dll js::TraceManuallyBarrieredEdge(JSTracer*, JSObject**, char const*) js/src/gc/Tracer.h:248 inlined
2 xul.dll js::BaseShape::traceChildren(JSTracer*) js/src/gc/TraceMethods-inl.h:306 inlined
2 xul.dll js::gc::TraceCycleCollectorChildren(JS::CallbackTracer*, js::Shape*) js/src/gc/Tracer.cpp:99 inlined
2 xul.dll JS_TraceShapeCycleCollectorChildren(JS::CallbackTracer*, JS::GCCellPtr) js/src/jsfriendapi.cpp:185 inlined
2 xul.dll TraversalTracer::onChild(JS::GCCellPtr, char const*) xpcom/base/CycleCollectedJSRuntime.cpp:442 inlined
2 xul.dll JS::CallbackTracer::onEdge(js::BaseScript**, char const*) js/public/TracingAPI.h:245 inlined
would need to be hallucinated, which seems highly unlikely. I will note that it does make sense for all of those frames to be inlined together in actual cycle collector code, so it's not like it's pulling them out of thin air or something.
Comment 2•2 years ago
|
||
:mstange, since you are the author of the regressor, bug 1398533, could you take a look? Also, could you set the severity field?
For more information, please visit auto_nag documentation.
Comment 3•2 years ago
|
||
What's probably happening is that the compiler saw two calls from js::TenuringTracer::collectToObjectFixedPoint()
to js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*)
after all inlining was done. Then it combined those two calls into a single call instruction, and just picked one of the valid inline stacks arbitrarily. And now that single call instruction can be hit in two different cases, but the inline stack will only be valid for one of those cases.
So yeah, I think this is just a case of the debug info being lossy under compiler optimizations - you can't always unambiguously map an instruction back to its original inline stack.
Comment 4•2 years ago
|
||
I will say though that GC code has some crazy deep inlining going on. I wonder if less inlining would sometimes be better / faster. And it might make cases like these less likely to be hit.
Reporter | ||
Comment 5•2 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #3)
What's probably happening is that the compiler saw two calls from
js::TenuringTracer::collectToObjectFixedPoint()
tojs::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*)
after all inlining was done. Then it combined those two calls into a single call instruction, and just picked one of the valid inline stacks arbitrarily. And now that single call instruction can be hit in two different cases, but the inline stack will only be valid for one of those cases.
I don't follow. js::TenuringTracer::collectToObjectFixedPoint()
may have multiple calls to js::AutoEnterOOMUnsafeRegion::crash
once everything is inlined, but none of those call sites would ever be from TraversalTracer::onChild
or anything on down.
TraversalTracer::onChild
could have a call to js::AutoEnterOOMUnsafeRegion::crash
(after inlining), and js::TenuringTracer::collectToObjectFixedPoint()
certainly does. But I don't see how those call sites could be merged; they'd be in separate code regions.
It makes sense that multiple calls within one big inlined code region might be collapsed together, and the stack shown for the wrong one. I just don't see any way to get a single code region that contains both TenuringTracer
and TraversalTracer
methods. Unless there's some serious code compression optimization going on, where they're devirtualized and distinguished with a runtime check or... something?
Comment 6•2 years ago
|
||
Comment 7•2 years ago
|
||
The call instruction which calls js::AutoEnterOOMUnsafeRegion::crash
in this crash is at 1d9fb69 (181d9fb69).
Comment 8•2 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
I don't follow.
js::TenuringTracer::collectToObjectFixedPoint()
may have multiple calls tojs::AutoEnterOOMUnsafeRegion::crash
once everything is inlined, but none of those call sites would ever be fromTraversalTracer::onChild
or anything on down.
Ah, got it.
So it seems like the compiler generated bad debug info. It gives the faulty inline stack for this address range: 0x1d9fb11..0x1d9fbd0. This range contains the following code:
loc_181d9fb11:
0000000181d9fb11 mov rax, rdi ; CODE XREF=sub_181d994a0+7035
0000000181d9fb14 and rax, 0xfffffffffffff000
0000000181d9fb1a movzx eax, byte [rax+4]
0000000181d9fb1e lea rcx, qword [double_185bf76f0+2800] ; 0x185bf81e0
0000000181d9fb25 mov eax, dword [rcx+rax*4]
0000000181d9fb28 jmp loc_181d9b021
loc_181d9fb2d:
0000000181d9fb2d mov rcx, r10 ; CODE XREF=sub_181d994a0+7044
0000000181d9fb30 mov rdx, rdi
0000000181d9fb33 call sub_181a3e7a0 ; sub_181a3e7a0
0000000181d9fb38 jmp loc_181d9b039
loc_181d9fb3d:
0000000181d9fb3d lea r8, qword [aBaseshapegloba] ; "baseshape_global", CODE XREF=sub_181d994a0+7961
0000000181d9fb44 mov rcx, r10
0000000181d9fb47 mov rdx, rdi
0000000181d9fb4a call qword [qword_186648008] ; qword, sub_182060ea0,qword_186648008
0000000181d9fb50 jmp loc_181d9a933
loc_181d9fb55:
0000000181d9fb55 lea r8, qword [aFailedToAlloca_1860dafc7] ; "Failed to allocate new chunk during GC", CODE XREF=sub_181d994a0+4888, sub_181d994a0+4940, sub_181d994a0+6762, sub_181d994a0+7114, sub_181d9bb80+3731, sub_181d9bb80+3765, sub_181d9bb80+5005, sub_181d9d6d0+2204, sub_181d9d6d0+2239, sub_181d9d6d0+4715, sub_181d9d6d0+4930
0000000181d9fb5c lea rcx, qword [rsp+272]
0000000181d9fb64 mov edx, 0x100000
0000000181d9fb69 call j_sub_1820a7f80+526646
loc_181d9fb6e:
0000000181d9fb6e lea rax, qword [aMozreleaseasse_18616bb1a] ; "MOZ_RELEASE_ASSERT(isSome())", CODE XREF=sub_181d994a0+4248, sub_181d9bb80+3477, sub_181d9d6d0+3291
0000000181d9fb75 mov rcx, qword [imp_gMozCrashReason] ; imp_gMozCrashReason
0000000181d9fb7c mov qword [rcx], rax
0000000181d9fb7f int3
; endp
0000000181d9fb80 db 0xb9 ; '.'
0000000181d9fb81 db 0x0f ; '.'
0000000181d9fb82 db 0x03 ; '.'
0000000181d9fb83 db 0x00 ; '.'
0000000181d9fb84 db 0x00 ; '.'
0000000181d9fb85 db 0xe8 ; '.'
0000000181d9fb86 db 0x27 ; '''
0000000181d9fb87 db 0xef ; '.'
0000000181d9fb88 db 0x34 ; '4'
0000000181d9fb89 db 0x00 ; '.'
loc_181d9fb8a:
0000000181d9fb8a lea rax, qword [aMozcrashinvali_18614f411] ; "MOZ_CRASH(invalid scalar type)", CODE XREF=sub_181d9d6d0+6465, sub_181d9d6d0+6479, sub_181d9d6d0+8398, sub_181d9d6d0+8412, sub_181d9d6d0+8765, sub_181d9d6d0+8779
0000000181d9fb91 mov rcx, qword [imp_gMozCrashReason] ; imp_gMozCrashReason
0000000181d9fb98 mov qword [rcx], rax
0000000181d9fb9b int3
; endp
0000000181d9fb9c db 0xb9 ; '.'
0000000181d9fb9d db 0x57 ; 'W'
0000000181d9fb9e db 0x00 ; '.'
0000000181d9fb9f db 0x00 ; '.'
0000000181d9fba0 db 0x00 ; '.'
0000000181d9fba1 db 0xe8 ; '.'
0000000181d9fba2 db 0x0b ; '.'
0000000181d9fba3 db 0xef ; '.'
0000000181d9fba4 db 0x34 ; '4'
0000000181d9fba5 db 0x00 ; '.'
loc_181d9fba6:
0000000181d9fba6 mov edx, 0x23 ; argument #2 for method ?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z, CODE XREF=sub_181d98c20+949, sub_181d994a0+8888, sub_181d9bb80+5514
0000000181d9fbab mov rcx, rsi ; argument #1 for method ?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z
0000000181d9fbae call qword [imp_?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z] ; void __cdecl mozilla::detail::InvalidArrayIndex_CRASH(unsigned __int64,unsigned __int64), imp_?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z,?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z
loc_181d9fbb4:
0000000181d9fbb4 lea rax, qword [aMozcrashinvali_18614f411] ; "MOZ_CRASH(invalid scalar type)", CODE XREF=sub_181d9d6d0+6352, sub_181d9d6d0+6366, sub_181d9d6d0+8284, sub_181d9d6d0+8298, sub_181d9d6d0+8652, sub_181d9d6d0+8666
0000000181d9fbbb mov rcx, qword [imp_gMozCrashReason] ; imp_gMozCrashReason
0000000181d9fbc2 mov qword [rcx], rax
0000000181d9fbc5 int3
; endp
0000000181d9fbc6 db 0xb9 ; '.'
0000000181d9fbc7 db 0x57 ; 'W'
0000000181d9fbc8 db 0x00 ; '.'
0000000181d9fbc9 db 0x00 ; '.'
0000000181d9fbca db 0x00 ; '.'
0000000181d9fbcb db 0xe8 ; '.'
0000000181d9fbcc db 0xe1 ; '.'
0000000181d9fbcd db 0xee ; '.'
0000000181d9fbce db 0x34 ; '4'
0000000181d9fbcf db 0x00 ; '.'
The inline stack in question probably correctly applies to some of the basic blocks in there (e.g. 0x1d9fb11..0x1d9fb2d and 0x1d9fb2d..0x1d9fb3d), but not to the whole thing.
So I guess what must have happened is that the compiler created this shared error handling block, and then screwed up its debug info completely by merging it with the debug info of something unrelated.
Comment 9•2 years ago
|
||
Comment 10•2 years ago
|
||
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Description
•