Open Bug 1825078 Opened 2 years ago Updated 2 years ago

Incorrect inlined frames reported

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

People

(Reporter: sfink, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Bug 1819532 seems to show incorrect inlined frames for a crash (well, a bucket full of crashes). If only the non-inlined frames are considered, the stack seems totally plausible:

0 	xul.dll 	js::AutoEnterOOMUnsafeRegion::crash(char const*) 	js/src/vm/JSContext.cpp:1300 	context
1 	xul.dll 	js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) 	js/src/vm/JSContext.cpp:1316 	cfi
2 	xul.dll 	js::TenuringTracer::collectToObjectFixedPoint() 	js/src/gc/Tenuring.cpp:873 	cfi
3 	xul.dll 	js::Nursery::collect(JS::GCOptions, JS::GCReason) 	js/src/gc/Nursery.cpp:1115 	cfi
.
.
.

but with inlined frames, it has the nursery GC calling the cycle collector's tracer, which can't happen (see bug 1819532 comment 5).

I don't know if these sorts of things are expected or known, but I thought I'd file this bug just in case.

Idle, baseless speculation: what happens when you have two functions with identical bodies that later get inlined during LTO? (Is that the right term? PGO?) Is it possible for the functions to be merged under one of the two bodies, and then the code address range gets attributed to the wrong function? I'm bringing LTO into this because it was my impression that the linker is the thing that can discard duplicates.

It's really not a very plausible explanation for this stack, because all of

2 	xul.dll 	JS::CallbackTracer::onEdge(JSObject**, char const*) 	js/public/TracingAPI.h:245 	inlined
2 	xul.dll 	js::GenericTracerImpl<JS::CallbackTracer>::onObjectEdge(JSObject**, char const*) 	js/public/TracingAPI.h:219 	inlined
2 	xul.dll 	js::gc::TraceEdgeInternal(JSTracer*, JSObject**, char const*) 	js/src/gc/Tracer.h:106 	inlined
2 	xul.dll 	js::TraceManuallyBarrieredEdge(JSTracer*, JSObject**, char const*) 	js/src/gc/Tracer.h:248 	inlined
2 	xul.dll 	js::BaseShape::traceChildren(JSTracer*) 	js/src/gc/TraceMethods-inl.h:306 	inlined
2 	xul.dll 	js::gc::TraceCycleCollectorChildren(JS::CallbackTracer*, js::Shape*) 	js/src/gc/Tracer.cpp:99 	inlined
2 	xul.dll 	JS_TraceShapeCycleCollectorChildren(JS::CallbackTracer*, JS::GCCellPtr) 	js/src/jsfriendapi.cpp:185 	inlined
2 	xul.dll 	TraversalTracer::onChild(JS::GCCellPtr, char const*) 	xpcom/base/CycleCollectedJSRuntime.cpp:442 	inlined
2 	xul.dll 	JS::CallbackTracer::onEdge(js::BaseScript**, char const*) 	js/public/TracingAPI.h:245 	inlined

would need to be hallucinated, which seems highly unlikely. I will note that it does make sense for all of those frames to be inlined together in actual cycle collector code, so it's not like it's pulling them out of thin air or something.

:mstange, since you are the author of the regressor, bug 1398533, could you take a look? Also, could you set the severity field?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mstange.moz)

What's probably happening is that the compiler saw two calls from js::TenuringTracer::collectToObjectFixedPoint() to js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) after all inlining was done. Then it combined those two calls into a single call instruction, and just picked one of the valid inline stacks arbitrarily. And now that single call instruction can be hit in two different cases, but the inline stack will only be valid for one of those cases.

So yeah, I think this is just a case of the debug info being lossy under compiler optimizations - you can't always unambiguously map an instruction back to its original inline stack.

Flags: needinfo?(mstange.moz)

I will say though that GC code has some crazy deep inlining going on. I wonder if less inlining would sometimes be better / faster. And it might make cases like these less likely to be hit.

(In reply to Markus Stange [:mstange] from comment #3)

What's probably happening is that the compiler saw two calls from js::TenuringTracer::collectToObjectFixedPoint() to js::AutoEnterOOMUnsafeRegion::crash(unsigned long long, char const*) after all inlining was done. Then it combined those two calls into a single call instruction, and just picked one of the valid inline stacks arbitrarily. And now that single call instruction can be hit in two different cases, but the inline stack will only be valid for one of those cases.

I don't follow. js::TenuringTracer::collectToObjectFixedPoint() may have multiple calls to js::AutoEnterOOMUnsafeRegion::crash once everything is inlined, but none of those call sites would ever be from TraversalTracer::onChild or anything on down.

TraversalTracer::onChild could have a call to js::AutoEnterOOMUnsafeRegion::crash (after inlining), and js::TenuringTracer::collectToObjectFixedPoint() certainly does. But I don't see how those call sites could be merged; they'd be in separate code regions.

It makes sense that multiple calls within one big inlined code region might be collapsed together, and the stack shown for the wrong one. I just don't see any way to get a single code region that contains both TenuringTracer and TraversalTracer methods. Unless there's some serious code compression optimization going on, where they're devirtualized and distinguished with a runtime check or... something?

The call instruction which calls js::AutoEnterOOMUnsafeRegion::crash in this crash is at 1d9fb69 (181d9fb69).

(In reply to Steve Fink [:sfink] [:s:] from comment #5)

I don't follow. js::TenuringTracer::collectToObjectFixedPoint() may have multiple calls to js::AutoEnterOOMUnsafeRegion::crash once everything is inlined, but none of those call sites would ever be from TraversalTracer::onChild or anything on down.

Ah, got it.

So it seems like the compiler generated bad debug info. It gives the faulty inline stack for this address range: 0x1d9fb11..0x1d9fbd0. This range contains the following code:

                     loc_181d9fb11:
0000000181d9fb11         mov        rax, rdi                                    ; CODE XREF=sub_181d994a0+7035
0000000181d9fb14         and        rax, 0xfffffffffffff000
0000000181d9fb1a         movzx      eax, byte [rax+4]
0000000181d9fb1e         lea        rcx, qword [double_185bf76f0+2800]          ; 0x185bf81e0
0000000181d9fb25         mov        eax, dword [rcx+rax*4]
0000000181d9fb28         jmp        loc_181d9b021

                     loc_181d9fb2d:
0000000181d9fb2d         mov        rcx, r10                                    ; CODE XREF=sub_181d994a0+7044
0000000181d9fb30         mov        rdx, rdi
0000000181d9fb33         call       sub_181a3e7a0                               ; sub_181a3e7a0
0000000181d9fb38         jmp        loc_181d9b039

                     loc_181d9fb3d:
0000000181d9fb3d         lea        r8, qword [aBaseshapegloba]                 ; "baseshape_global", CODE XREF=sub_181d994a0+7961
0000000181d9fb44         mov        rcx, r10
0000000181d9fb47         mov        rdx, rdi
0000000181d9fb4a         call       qword [qword_186648008]                     ; qword, sub_182060ea0,qword_186648008
0000000181d9fb50         jmp        loc_181d9a933

                     loc_181d9fb55:
0000000181d9fb55         lea        r8, qword [aFailedToAlloca_1860dafc7]       ; "Failed to allocate new chunk during GC", CODE XREF=sub_181d994a0+4888, sub_181d994a0+4940, sub_181d994a0+6762, sub_181d994a0+7114, sub_181d9bb80+3731, sub_181d9bb80+3765, sub_181d9bb80+5005, sub_181d9d6d0+2204, sub_181d9d6d0+2239, sub_181d9d6d0+4715, sub_181d9d6d0+4930
0000000181d9fb5c         lea        rcx, qword [rsp+272]
0000000181d9fb64         mov        edx, 0x100000
0000000181d9fb69         call       j_sub_1820a7f80+526646

                     loc_181d9fb6e:
0000000181d9fb6e         lea        rax, qword [aMozreleaseasse_18616bb1a]      ; "MOZ_RELEASE_ASSERT(isSome())", CODE XREF=sub_181d994a0+4248, sub_181d9bb80+3477, sub_181d9d6d0+3291
0000000181d9fb75         mov        rcx, qword [imp_gMozCrashReason]            ; imp_gMozCrashReason
0000000181d9fb7c         mov        qword [rcx], rax
0000000181d9fb7f         int3
                        ; endp
0000000181d9fb80         db  0xb9 ; '.'
0000000181d9fb81         db  0x0f ; '.'
0000000181d9fb82         db  0x03 ; '.'
0000000181d9fb83         db  0x00 ; '.'
0000000181d9fb84         db  0x00 ; '.'
0000000181d9fb85         db  0xe8 ; '.'
0000000181d9fb86         db  0x27 ; '''
0000000181d9fb87         db  0xef ; '.'
0000000181d9fb88         db  0x34 ; '4'
0000000181d9fb89         db  0x00 ; '.'

                     loc_181d9fb8a:
0000000181d9fb8a         lea        rax, qword [aMozcrashinvali_18614f411]      ; "MOZ_CRASH(invalid scalar type)", CODE XREF=sub_181d9d6d0+6465, sub_181d9d6d0+6479, sub_181d9d6d0+8398, sub_181d9d6d0+8412, sub_181d9d6d0+8765, sub_181d9d6d0+8779
0000000181d9fb91         mov        rcx, qword [imp_gMozCrashReason]            ; imp_gMozCrashReason
0000000181d9fb98         mov        qword [rcx], rax
0000000181d9fb9b         int3
                        ; endp
0000000181d9fb9c         db  0xb9 ; '.'
0000000181d9fb9d         db  0x57 ; 'W'
0000000181d9fb9e         db  0x00 ; '.'
0000000181d9fb9f         db  0x00 ; '.'
0000000181d9fba0         db  0x00 ; '.'
0000000181d9fba1         db  0xe8 ; '.'
0000000181d9fba2         db  0x0b ; '.'
0000000181d9fba3         db  0xef ; '.'
0000000181d9fba4         db  0x34 ; '4'
0000000181d9fba5         db  0x00 ; '.'

                     loc_181d9fba6:
0000000181d9fba6         mov        edx, 0x23                                   ; argument #2 for method ?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z, CODE XREF=sub_181d98c20+949, sub_181d994a0+8888, sub_181d9bb80+5514
0000000181d9fbab         mov        rcx, rsi                                    ; argument #1 for method ?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z
0000000181d9fbae         call       qword [imp_?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z] ; void __cdecl mozilla::detail::InvalidArrayIndex_CRASH(unsigned __int64,unsigned __int64), imp_?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z,?InvalidArrayIndex_CRASH@detail@mozilla@@YAX_K0@Z

                     loc_181d9fbb4:
0000000181d9fbb4         lea        rax, qword [aMozcrashinvali_18614f411]      ; "MOZ_CRASH(invalid scalar type)", CODE XREF=sub_181d9d6d0+6352, sub_181d9d6d0+6366, sub_181d9d6d0+8284, sub_181d9d6d0+8298, sub_181d9d6d0+8652, sub_181d9d6d0+8666
0000000181d9fbbb         mov        rcx, qword [imp_gMozCrashReason]            ; imp_gMozCrashReason
0000000181d9fbc2         mov        qword [rcx], rax
0000000181d9fbc5         int3
                        ; endp
0000000181d9fbc6         db  0xb9 ; '.'
0000000181d9fbc7         db  0x57 ; 'W'
0000000181d9fbc8         db  0x00 ; '.'
0000000181d9fbc9         db  0x00 ; '.'
0000000181d9fbca         db  0x00 ; '.'
0000000181d9fbcb         db  0xe8 ; '.'
0000000181d9fbcc         db  0xe1 ; '.'
0000000181d9fbcd         db  0xee ; '.'
0000000181d9fbce         db  0x34 ; '4'
0000000181d9fbcf         db  0x00 ; '.'

The inline stack in question probably correctly applies to some of the basic blocks in there (e.g. 0x1d9fb11..0x1d9fb2d and 0x1d9fb2d..0x1d9fb3d), but not to the whole thing.

So I guess what must have happened is that the compiler created this shared error handling block, and then screwed up its debug info completely by merging it with the debug info of something unrelated.

The raw pdb file is here (rename to xul.pdb.cab and uncompress), the raw dll is here (rename to xul.dll.cab and uncompress).

I'm not sure what the next step here is. Ideally we'd file a bug on LLVM but it seems tricky to make a reduced testcase.

The severity field is not set for this bug.
:gsvelto, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)
Severity: -- → S3
Flags: needinfo?(gsvelto)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: