Closed Bug 1434402 Opened 7 years ago Closed 2 years ago

Incomplete C++ stacks when Ion JavaScript frames are in the stack (can't unwind the native stack through Jit code)

Categories

(Core :: Gecko Profiler, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
Tracking Status
firefox60 --- affected

People

(Reporter: mstange, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

Sometimes we fail to unwind a complete C++ stack because we encounter a JavaScript JIT frame on the stack for which the unwinder doesn't know how to find the parent frame. This leads to profiles like this one: https://perfht.ml/2noUdGE Here it looks like we have many different invocations of the JS function "getSymbols", even though it's just one invocation. But we have different stack frames leading up to getSymbols, because some samples have complete C++ stacks and other samples do not - for those other samples we only have pseudo stack frames and JS stack frames. This can lead to a lot of confusion. In order to fix this, we'll probably need different fixes for the different platforms and unwinders we support: - macOS: On macOS we use frame pointer unwinding. So emitting frame pointers for JIT code (bug 1426134) should get us most of the way. We only have 64-bit macOS builds. - Windows 32-bit: On Windows 32-bit builds we also use frame pointer unwinding, but bug 1426134 would probably not help us because it would only emit frame pointers in 64-bit builds. - Windows 64-bit: On Windows 64-bit we use RtlVirtualUnwind to unwind the C++ call stack. This uses stack unwinding information that's stored in a side table somewhere. We would need to create such unwinding information for every piece of JIT code we compile. We do not use frame pointers for unwinding on Windows 64-bit because the compiler does not support emitting frame pointers for C++ functions on that platform. - Linux: On Linux we use LuL for unwinding, which consults both tables with unwinding information, and the frame pointer. So frame pointers in JIT code would probably solve this in 64-bit builds. We probably don't care about profiling Linux 32-bit builds. - Android/ARM: I don't know how we unwind on Android/ARM.
(In reply to Markus Stange [:mstange] from comment #0) > - Windows 32-bit: > On Windows 32-bit builds we also use frame pointer unwinding, but bug > 1426134 would probably not help us because it would only emit frame pointers > in 64-bit builds. To achieve the overall simplifications (in profiling, etc) in bug 1426134, we would probably want to just take the perf hit by preserving fp, now that it's 32.7% of our users (and probably not what we're primarily being benchmarked on). (This is what we did in wasm and it wasn't terrible; other engines also preserve fp.) > - Windows 64-bit: > On Windows 64-bit we use RtlVirtualUnwind to unwind the C++ call stack. > This uses stack unwinding information that's stored in a side table > somewhere. We would need to create such unwinding information for every > piece of JIT code we compile. > We do not use frame pointers for unwinding on Windows 64-bit because the > compiler does not support emitting frame pointers for C++ functions on that > platform. This would take extra work but if one preserves frame pointers in JIT code it's not hard to write the RtlVirtualUnwind callbacks to unwind JIT code; that's what Chakra is doing, apparently.
For what it's worth, here's a profile where the stack unwinding failed: https://perfht.ml/2I5vxie Running: <script type="text/javascript"> function a() { b(); } function b() { c(); h(); } function c() { d(); f(); } function d() { e(); } function e() { for (let i = 0; i < 1e7; i += Math.random()) {} } function f() { for (let i = 0; i < 1e7; i += Math.random()) {} g(); } function g() { for (let i = 0; i < 1e7; i += Math.random()) {} } function h() { f(); } a(); </script>
Priority: -- → P1
Attached file baseline-script.html (deleted) —
I'm also getting failures on Linux when everything is baseline. I added a `with ({}) {}` statement to every function to try and de-optimize them. I didn't get any failures on Mac, but I did on Linux. The stacks all appear to be baseline, with a couple of interpreter samples. macOS (no issue): https://perfht.ml/2Ttxecv Linux 64: https://perfht.ml/2ToHjHI I attached the code I ran.

Seeing this in https://perfht.ml/2SV3W7J and I promised Florian I'd ping him about a bug (which I was going to file, but then Markus pointed to this one). :-)

Flags: needinfo?(florian)
Depends on: 1635987

Jan, you helped me with JIT questions while I was working on bug 1635987 (use JIT info to recover stack walking when it gets lost), could you please have a look at this bug here (the original problem & proposed solution(s)) and give some thoughts as to how possible and/or difficult it would be to solve as comment 0 suggests?

For more context:
I'm working on bug 1635987, and in sub-bug 1707489 (which implements the stack-walking and profiler bits on Win64), :glandium wrote in his review:

This seems like a lot of convoluted hoops to go through to avoid adding frame pointers to JIT frames. Is there anything that prevents us from doing that instead?

I may be responsible for the convoluted hoops, so don't worry about that part.
But what do you think about "adding frame pointers to JIT frames"? (My own reading of comment 0 above, if correct, is that it's not that simple? You may read my initial reply there.)
Thanks.

Flags: needinfo?(jdemooij)

(In reply to Gerald Squelart [:gerald] (he/him) from comment #5)

But what do you think about "adding frame pointers to JIT frames"? (My own reading of comment 0 above, if correct, is that it's not that simple? You may read my initial reply there.)

Preserving frame pointers in JIT code (bug 1426134) is hard, especially on 32-bit x86 where we're already often running out of registers without a frame pointer. Using frame pointers for JIT frames is something we want to do eventually, but it's a lot of work and not really a priority for us right now.

Flags: needinfo?(jdemooij)
Flags: needinfo?(florian)

I just got another occurrence of this issue: https://share.firefox.dev/3qfp4q7

Severity: normal → S3

I think we can call this finished how that Bug 1426134 is landed. Unwinding through the JITs and all its machinery seems to be working pretty well these days.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: