Open Bug 1743983 Opened 3 years ago Updated 1 year ago

Rewrite the minidump-analyzer in Rust

Categories

(Toolkit :: Crash Reporting, task)

task

Tracking

()

People

(Reporter: gsvelto, Assigned: afranchuk)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

As per title. The current minidump-analyzer takes the JSON annotations file as generated in a Firefox crash report plus the minidump and creates a stack trace that will be added to the JSON annotations. While doing so it also retrieves module signatures and adds them to the annotations.

We only ever implemented reading unwinding information on x86-64/Windows in bug 1333126, for all the other platforms we only used frame pointers or stack scanning.

A rewrite could be done in several steps:

  • Build a version that only uses stack scanning & frame pointers but which is otherwise compatible with the existing implementation
  • Add unwinding support for Windows (by using the symbolic-debuginfo crate to read the unwinding information from PE files)
  • Add unwinding support macOS (likewise but for Mach-O ones)
  • Add unwinding support for Linux/Android. This is trickier as I believe that symbolic-debuginfo has no way of surfacing this data - yet - so we might have to gimli/goblin directly to parse the data
Assignee: nobody → afranchuk

Once the stackwalking functionality is split out of the processor we can move forward, as we don't want to audit/vendor the huge dependencies of minidump-processor.

This needs the minidump-walk-stack crate to be vendored. It builds when minidump and
minidump-walk-stack are changed to paths in the split-processor-crate branch of rust-minidump
(see https://github.com/rust-minidump/rust-minidump/pull/825).

I'm now just waiting on a new release of rust-minidump. Unfortunately the way vendoring works, I can't make progress or do it ahead of time without the crates being in crates.io (I tried directing them to both git urls and local paths as a means of working on it prior to publishing, but that seems to cause the packages to not be pulled in by vendoring properly).

There's a bit more work that needs to be done. Steven Fackler did some testing and noticed that the logic that finds symbols has some issues. We also don't demangle symbols when using local debug information. Neither of these issues would preclude us from using the existing code for minidump-analyze because we won't be symbolicating the stacks on the user machine.

However the symbol information also contains the inlining information and with Steven's patch applied I get incoherent stacks. The latter needs to be fixed before we roll out the new minidump-analyzer as we need consistent stacks with what we get on Socorro.

I've just published version 0.17.0 of the rust-minidump crate including the new minidump-unwind crate which we need here.

This is almost there. When vendoring dependencies, some files hit the size limit. We've reached out to gimli developers wrt the fixtures in that crate (which don't need to be vendored). Some of the other large files are necessary so we'll have to accept them.

:glandium brought up slight concern with vendoring symbolic since we hope to replace it with wholesym in the future (i.e. symbolic will live on in the repository forever). I researched a bit and came to the conclusion that it will be a non-trivial effort to update wholesym for our needs. We will want to expose CFI and probably add some crate features to be able to turn off networked stuff (which brings along unnecessary crates for our purposes). This is somewhat straightforward for macos and linux as macho-unwind-info and gimli provide CFI support. Windows (PE) will be a little more effort. But at this time I think it makes more sense to go ahead with using symbolic. It's entirely possible we'll use it for other things in the future, so I don't feel too bad about vendoring it in the repository. In total the source of the crates are about 1.1MB.

Attachment #9328160 - Attachment description: WIP: Bug 1743983 - Rewrite the minidump-analyzer in Rust r=gsvelto → Bug 1743983 - Rewrite the minidump-analyzer in Rust r=gsvelto
Blocks: syn-2

Depends on D175252

Due to gsvelto's discovery that the new minidump-analyzer uses an unacceptable amount of memory (with a large portion being from internal symbolic data structures), we need to rework the approach here. We cannot use symbolic to enable stack walking.

Instead, I am planning to use framehop. As of now, minidump-analyzer only needs the return addresses (no symbolication). It would be possible to use framehop alone with minidump to read the stack/registers, without using minidump-unwind at all. However, framehop doesn't do stack scanning, so this would not work for platforms without CFI. So we should likely use framehop with minidump-unwind, and ensure that will work without requiring symbolication.

framehop also doesn't support PE unwind info, so that will need to be added.

gsvelto, how does this approach sound to you?

Flags: needinfo?(gsvelto)

(In reply to Alex Franchuk from comment #8)

Due to gsvelto's discovery that the new minidump-analyzer uses an unacceptable amount of memory (with a large portion being from internal symbolic data structures), we need to rework the approach here. We cannot use symbolic to enable stack walking.

Instead, I am planning to use framehop. As of now, minidump-analyzer only needs the return addresses (no symbolication). It would be possible to use framehop alone with minidump to read the stack/registers, without using minidump-unwind at all. However, framehop doesn't do stack scanning, so this would not work for platforms without CFI. So we should likely use framehop with minidump-unwind, and ensure that will work without requiring symbolication.

framehop also doesn't support PE unwind info, so that will need to be added.

gsvelto, how does this approach sound to you?

This SGTM but before adding functionality to framehop I think it would be better to start with a minimum viable implementation on Linux - where it should already work well. We'll use that to validate if the approach works and then jump into adding whatever we need for the other platforms.

Flags: needinfo?(gsvelto)

Excellent idea, I will try Linux first.

I've confirmed that this works using framehop (and wholesym for completeness, which actually was able to get some symbols that symbolic couldn't). framehop sometimes performed functionally better than symbolic (getting a few extra frames, usually at the beginning). As far as performance, in debug builds framehop was about 10x faster than symbolic, and had very low memory usage.

There's one drawback right now, which is that framehop doesn't restore all registers for each unwound frame. This can be added in the future; I'm fine with adding a note to the documentation of the minidump-unwind DebugInfoSymbolProvider to that effect.

I'll move forward with adding the other platforms we need to framehop.

EDIT
I just did a release build comparison.
framehop + wholesym: 146MB peak memory usage, ~170ms usr time
symbolic: 322MB peak memory usage, 1.44s usr time

Debug builds of the framehop solution run faster than release builds of symbolic :)

Note that the memory usage is just for the simple case of an about:crashcontent minidump. This only loads libc.so and libxul.so (on linux). However the framehop solution works differently than the old way which lazily loaded modules: it loads all modules up-front, so it shouldn't use much more memory than that for any firefox minidumps.

No longer blocks: syn-2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: