Closed Bug 1531183 Opened 6 years ago Closed 6 years ago

Bad remote symbol unwinding in Mali OpenGL driver

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bholley, Unassigned)

Details

I can get a profile with good libXUL symbols for a local build on an HD8 tablet, but the samples on the Renderer thread that land in the GL driver are wacky: http://bit.ly/2VvLktR

Jed, see the private link to the library pulled from the phone. It would greatly help graphics performance work if we could get these symbols working reliably.

Flags: needinfo?(jld)

There's a large range that maps to EXIDX_CANTUNWIND:

(gdb) x/3xw 0x15d90e4
0x15d90e4:	0x7ed72e44	0x00000001	0x7f1456a4
(gdb) p/x 0x15d90e4 + *(long*)0x15d90e4 - (1<<31)
$23 = 0x34bf28
(gdb) p/x 0x15d90ec + *(long*)0x15d90ec - (1<<31)
$24 = 0x71e790

Judging by the symbol table, that range includes the entire OpenGL implementation. There is also no DWARF unwind info (contrast bug 1529108).

Theoretically it would be possible to implement something like Breakpad's stack scanning, which sweeps stack memory looking for plausible return addresses, but in practice this often produces false positives from old frames that already returned and were only partially overwritten.

It might also be possible to, as a special case for dealing with GL libraries, store enough information to restart stack unwinding for our code before calling into the library. (Didn't we handle unwinding across jitcode like that at one point?) I think we have a wrapper layer around GL calls to deal with library loading, in which case there would at least be an existing place to put code to do that, but someone would have to write it.

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #3)

There's a large range that maps to EXIDX_CANTUNWIND:

(gdb) x/3xw 0x15d90e4
0x15d90e4:	0x7ed72e44	0x00000001	0x7f1456a4
(gdb) p/x 0x15d90e4 + *(long*)0x15d90e4 - (1<<31)
$23 = 0x34bf28
(gdb) p/x 0x15d90ec + *(long*)0x15d90ec - (1<<31)
$24 = 0x71e790

Judging by the symbol table, that range includes the entire OpenGL implementation. There is also no DWARF unwind info (contrast bug 1529108).

Ah yeah, that seems tricky then. I guess it's a driver-by-driver thing? I wonder how likely this is to reoccur on other drivers (it obviously works for Adrenos, per bug 1529108).

It might also be possible to, as a special case for dealing with GL libraries, store enough information to restart stack unwinding for our code before calling into the library. (Didn't we handle unwinding across jitcode like that at one point?) I think we have a wrapper layer around GL calls to deal with library loading, in which case there would at least be an existing place to put code to do that, but someone would have to write it.

That's an interesting idea - we have auto-generated bindings via gleam, and it would be fairly straightforward to put this code in if the overhead were low. Is there something existing to crib from that already works correctly with the stackwalker?

Flags: needinfo?(jld)

Markus pointed me to the implementation of the JIT trick [1]. Unfortunately, this relies on the JIT trampoline having stashed all the general-purpose registers in memory, and so that they can be restored (ARM unwinding can depend on arbitrary GPRs). We already do that for the JIT so the only added overhead is during unwinding, but it's probably not tenable to add that for every GL call.

I think a more tractable solution might be to interface more directly with the profiler: time every GL call, and if it's above some threshold (say, 1ms), add a marker to the timeline. I'll give this a try when I get a minute.

[1] https://searchfox.org/mozilla-central/rev/00f3836a87b844b5e4bc82f698c559b9966e4be2/tools/profiler/core/platform.cpp#1342-1345

Flags: needinfo?(jld)

(In reply to Bobby Holley (:bholley) from comment #5)

I think a more tractable solution might be to interface more directly with the profiler: time every GL call, and if it's above some threshold (say, 1ms), add a marker to the timeline. I'll give this a try when I get a minute.

That landed in bug 1532810. WONTFIXing this one, since we don't have another realistic path forward.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.