Rewrite the minidump-analyzer in Rust
Categories
(Toolkit :: Crash Reporting, task)
Tracking
()
People
(Reporter: gsvelto, Assigned: afranchuk)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files)
As per title. The current minidump-analyzer takes the JSON annotations file as generated in a Firefox crash report plus the minidump and creates a stack trace that will be added to the JSON annotations. While doing so it also retrieves module signatures and adds them to the annotations.
We only ever implemented reading unwinding information on x86-64/Windows in bug 1333126, for all the other platforms we only used frame pointers or stack scanning.
A rewrite could be done in several steps:
- Build a version that only uses stack scanning & frame pointers but which is otherwise compatible with the existing implementation
- Add unwinding support for Windows (by using the symbolic-debuginfo crate to read the unwinding information from PE files)
- Add unwinding support macOS (likewise but for Mach-O ones)
- Add unwinding support for Linux/Android. This is trickier as I believe that symbolic-debuginfo has no way of surfacing this data - yet - so we might have to gimli/goblin directly to parse the data
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 1•2 years ago
|
||
Once the stackwalking functionality is split out of the processor we can move forward, as we don't want to audit/vendor the huge dependencies of minidump-processor
.
Assignee | ||
Comment 2•2 years ago
|
||
This needs the minidump-walk-stack
crate to be vendored. It builds when minidump
and
minidump-walk-stack
are changed to paths in the split-processor-crate
branch of rust-minidump
(see https://github.com/rust-minidump/rust-minidump/pull/825).
Assignee | ||
Comment 3•1 years ago
|
||
I'm now just waiting on a new release of rust-minidump
. Unfortunately the way vendoring works, I can't make progress or do it ahead of time without the crates being in crates.io
(I tried directing them to both git urls and local paths as a means of working on it prior to publishing, but that seems to cause the packages to not be pulled in by vendoring properly).
Reporter | ||
Comment 4•1 years ago
|
||
There's a bit more work that needs to be done. Steven Fackler did some testing and noticed that the logic that finds symbols has some issues. We also don't demangle symbols when using local debug information. Neither of these issues would preclude us from using the existing code for minidump-analyze because we won't be symbolicating the stacks on the user machine.
However the symbol information also contains the inlining information and with Steven's patch applied I get incoherent stacks. The latter needs to be fixed before we roll out the new minidump-analyzer as we need consistent stacks with what we get on Socorro.
Reporter | ||
Comment 5•1 year ago
|
||
I've just published version 0.17.0 of the rust-minidump
crate including the new minidump-unwind
crate which we need here.
Assignee | ||
Comment 6•1 year ago
|
||
This is almost there. When vendoring dependencies, some files hit the size limit. We've reached out to gimli developers wrt the fixtures in that crate (which don't need to be vendored). Some of the other large files are necessary so we'll have to accept them.
:glandium brought up slight concern with vendoring symbolic since we hope to replace it with wholesym in the future (i.e. symbolic will live on in the repository forever). I researched a bit and came to the conclusion that it will be a non-trivial effort to update wholesym for our needs. We will want to expose CFI and probably add some crate features to be able to turn off networked stuff (which brings along unnecessary crates for our purposes). This is somewhat straightforward for macos and linux as macho-unwind-info
and gimli
provide CFI support. Windows (PE) will be a little more effort. But at this time I think it makes more sense to go ahead with using symbolic. It's entirely possible we'll use it for other things in the future, so I don't feel too bad about vendoring it in the repository. In total the source of the crates are about 1.1MB.
Updated•1 year ago
|
Assignee | ||
Comment 7•1 year ago
|
||
Depends on D175252
Assignee | ||
Comment 8•1 year ago
|
||
Due to gsvelto's discovery that the new minidump-analyzer uses an unacceptable amount of memory (with a large portion being from internal symbolic
data structures), we need to rework the approach here. We cannot use symbolic
to enable stack walking.
Instead, I am planning to use framehop
. As of now, minidump-analyzer
only needs the return addresses (no symbolication). It would be possible to use framehop
alone with minidump
to read the stack/registers, without using minidump-unwind
at all. However, framehop
doesn't do stack scanning, so this would not work for platforms without CFI. So we should likely use framehop
with minidump-unwind
, and ensure that will work without requiring symbolication.
framehop
also doesn't support PE unwind info, so that will need to be added.
gsvelto, how does this approach sound to you?
Reporter | ||
Comment 9•1 year ago
|
||
(In reply to Alex Franchuk from comment #8)
Due to gsvelto's discovery that the new minidump-analyzer uses an unacceptable amount of memory (with a large portion being from internal
symbolic
data structures), we need to rework the approach here. We cannot usesymbolic
to enable stack walking.Instead, I am planning to use
framehop
. As of now,minidump-analyzer
only needs the return addresses (no symbolication). It would be possible to useframehop
alone withminidump
to read the stack/registers, without usingminidump-unwind
at all. However,framehop
doesn't do stack scanning, so this would not work for platforms without CFI. So we should likely useframehop
withminidump-unwind
, and ensure that will work without requiring symbolication.
framehop
also doesn't support PE unwind info, so that will need to be added.gsvelto, how does this approach sound to you?
This SGTM but before adding functionality to framehop
I think it would be better to start with a minimum viable implementation on Linux - where it should already work well. We'll use that to validate if the approach works and then jump into adding whatever we need for the other platforms.
Assignee | ||
Comment 10•1 year ago
|
||
Excellent idea, I will try Linux first.
Assignee | ||
Comment 11•1 year ago
|
||
I've confirmed that this works using framehop
(and wholesym
for completeness, which actually was able to get some symbols that symbolic
couldn't). framehop
sometimes performed functionally better than symbolic
(getting a few extra frames, usually at the beginning). As far as performance, in debug builds framehop
was about 10x faster than symbolic
, and had very low memory usage.
There's one drawback right now, which is that framehop
doesn't restore all registers for each unwound frame. This can be added in the future; I'm fine with adding a note to the documentation of the minidump-unwind
DebugInfoSymbolProvider
to that effect.
I'll move forward with adding the other platforms we need to framehop
.
EDIT
I just did a release build comparison.
framehop + wholesym
: 146MB peak memory usage, ~170ms usr time
symbolic
: 322MB peak memory usage, 1.44s usr time
Debug builds of the framehop
solution run faster than release builds of symbolic
:)
Note that the memory usage is just for the simple case of an about:crashcontent
minidump. This only loads libc.so
and libxul.so
(on linux). However the framehop
solution works differently than the old way which lazily loaded modules: it loads all modules up-front, so it shouldn't use much more memory than that for any firefox minidumps.
Description
•