Closed Bug 650239 Opened 14 years ago Closed 8 years ago

Investigate doing client-side stack walking in Breakpad

Categories

(Toolkit :: Crash Reporting, defect)

Platform: All, Linux
Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED

People

(Reporter: ted, Unassigned)

Details

(Whiteboard: [fce-active-legacy])

On platforms like x86-64 and ARM, we often wind up with crummy crash report stacks because we don't have symbols for system libraries, and these architectures by default don't have a frame pointer, so the stack walker has to scan the stack and guess at return addresses. Humorously, the ABI for these platforms requires binaries to have unwind information present, so walking the stack client-side is pretty much always possible.
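To illustrate why stack scanning is only a guess, here is a toy sketch. It is not Breakpad's actual scanner; the module map, the addresses, and the names `in_code` and `scan_stack` are all made up for illustration. It treats every aligned stack word that points into loaded code as a possible return address, so a stale function pointer left on the stack is reported as a frame.

```python
import struct

# Hypothetical module map: (start, end, name) for loaded code ranges.
MODULES = [
    (0x400000, 0x410000, "app"),
    (0x7F0000000000, 0x7F0000100000, "libsystem.so"),
]


def in_code(addr):
    """Return True if addr falls inside any loaded module's code range."""
    return any(start <= addr < end for start, end, _name in MODULES)


def scan_stack(stack_bytes, word_size=8):
    """Collect every aligned stack word that looks like a code address."""
    fmt = "<Q" if word_size == 8 else "<I"
    candidates = []
    for offset in range(0, len(stack_bytes) - word_size + 1, word_size):
        (word,) = struct.unpack_from(fmt, stack_bytes, offset)
        if in_code(word):
            candidates.append(word)
    return candidates


# Toy stack contents, lowest address first:
#   a real return address, a plain local, a stale function pointer,
#   and another real return address.
stack = struct.pack("<4Q", 0x400123, 0xDEADBEEF, 0x7F0000000ABC, 0x400456)
candidates = scan_stack(stack)
print([hex(a) for a in candidates])
```

The scanner has no way to tell the stale pointer apart from the two real return addresses, which is the kind of bogus frame that unwind information would avoid.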

I don't think we'll ever solve the problem of having all the system symbols available, so it would be interesting to explore alternatives. Jim, Mike and I had a few separate conversations on IRC. First, I wondered whether it'd be possible to simply include all the unwind information from loaded libraries in the minidump. Mike did some calculations and found that it'd be several MB of extra data, even if we compressed it, so that sounds unreasonable. He wondered whether we could walk the stack client-side to figure out which libraries were necessary. Jim pointed out that if we were going to do that, we might as well instead just walk the stack and include the frame addresses in the minidump, since that'd be a tiny amount of information.
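For a rough sense of how small the "just include the frame addresses" option is, here is a sketch of one possible encoding. The layout (a 32-bit count followed by 64-bit little-endian addresses) and the names `pack_frames` and `unpack_frames` are assumptions for illustration, not an existing minidump stream format.

```python
import struct


def pack_frames(frame_addresses):
    """Serialize frame addresses as a 32-bit count plus 64-bit little-endian values."""
    count = len(frame_addresses)
    return struct.pack(f"<I{count}Q", count, *frame_addresses)


def unpack_frames(blob):
    """Inverse of pack_frames: read the count, then that many addresses."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}Q", blob, 4))


# A 64-frame stack with made-up addresses.
frames = [0x7F0000000000 + 0x40 * i for i in range(64)]
blob = pack_frames(frames)
print(len(blob), "bytes for", len(frames), "frames")
```

Under this encoding each frame costs 8 bytes, so even a deep stack stays well under a kilobyte, compared with the several MB estimated for shipping all of the unwind data.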

I looked at libunwind, and it's MIT licensed and has an API for using ptrace to unwind out-of-process, but Mike pointed out that including a whole bunch of code for something that Breakpad already supports seems silly. He makes a good point, and I think the Breakpad stack walker could be fairly easily adapted to run client-side using the unwind data directly.

As a nice extra, we could also include export symbols where available for functions that show up on the stack, so we could get function names. It wouldn't be much extra data if we only included things actually on the stack.
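A minimal sketch of the export-symbol idea, assuming each library's exports are available as a sorted list of (address, name) pairs. The table contents and the names `nearest_export` and `symbols_for_stack` are hypothetical. Only addresses that actually appear on the stack are looked up, so the extra data is one name per frame at most.

```python
import bisect

# Hypothetical export table for one library, sorted by address.
EXPORTS = [(0x1000, "malloc"), (0x1400, "free"), (0x2000, "memcpy")]
EXPORT_ADDRS = [addr for addr, _name in EXPORTS]


def nearest_export(addr):
    """Return (name, offset) for the closest export at or below addr, or None."""
    index = bisect.bisect_right(EXPORT_ADDRS, addr) - 1
    if index < 0:
        return None
    base, name = EXPORTS[index]
    return name, addr - base


def symbols_for_stack(frame_addresses):
    """Look up export names only for addresses that appear on the stack."""
    return {addr: nearest_export(addr) for addr in frame_addresses}


resolved = symbols_for_stack([0x1410, 0x2008, 0x0FFF])
for addr, hit in resolved.items():
    print(hex(addr), hit)
```

One caveat: the nearest export can be the wrong name when non-exported functions sit between two exports, so names obtained this way are hints rather than exact symbols.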

Doing this using Breakpad also makes it possible to extend this beyond just Linux: it would be straightforward to make it work on OS X and even Win64.
Conveniently, we've got a copy of libunwind in the Mozilla tree nowadays (for the profiler):
http://mxr.mozilla.org/mozilla-central/source/tools/profiler/libunwind/
FWIW, ARM stack walking using libunwind is still not perfect, although it should work in many cases now. I haven't looked at using libunwind on other architectures.
I think it'd be most beneficial on ARM and x86-64, where we should have unwind info in the binaries. (Unwinding from system libs sucks on x86 too, but this wouldn't be as helpful there.)
We do not have unwind info enabled for non-profiling builds on ARM, as far as I know (see http://mxr.mozilla.org/mozilla-central/source/build/autoconf/frameptr.m4#7).
It's required in some cases by the ABI, but we could probably flip that on by default without any issues.
Whiteboard: [fct-active]
Whiteboard: [fct-active] → [fce-active]
Client-side stack walking has been working fine for a few months now, so I think we can close this too to wrap up :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [fce-active] → [fce-active-legacy]