Closed
Bug 1474871
Opened 6 years ago
Closed 6 years ago
Rust crash signatures are broken: Symbols aren't getting demangled
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox61 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | --- | fixed |
People
(Reporter: jan, Assigned: glandium)
References
Details
(Keywords: nightly-community, regression)
Attachments
(1 file)
Nightly 63 x64 20180710222524 de_DE @ Debian Testing (KDE, Xorg, GTX 1060 using https://packages.debian.org/buster/nvidia-driver (390.67-2))
Please correct the bug title if needed as I have no insight in the magic happening here.
(Jan Andre Ikenmeyer [:darkspirit] from bug 1423311 comment 9)
> (In reply to Jesse Schwartzentruber (:truber) from bug 1423311 comment 0)
> > Created attachment 8934650 [details]
> > testcase.html
>
> Crash reason changed:
> bp-7f92be4a-9163-4583-a990-3ab210180706
> > Moz2D replay problem
>
> And it looks like rust crash signatures are broken. Now there's one signature for multiple crash reasons:
> https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozalloc_abort%20%7C%20abort%20%7C%20_ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE%20%7C%20__rust_start_panic%20%7C%20_ZN3std9panicking20rust_panic_with_hook17h608586f043d70222E&date=%3E%3D2018-06-29T15%3A24%3A56.000Z&date=%3C2018-07-06T15%3A24%3A56.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
(Kartikaya Gupta (email:kats@mozilla.com) from bug 1423311 comment 10)
> Looks like the symbols aren't getting demangled, and so the step that's supposed to filter out the rust panic boilerplate stack frames out of the signature isn't getting triggered.
Comment 1•6 years ago
|
||
This could be a regression from bug 1447116, which merged to central 10 days ago, or bug 1309172, which merged to central 15 days ago.
Reporter | ||
Comment 2•6 years ago
|
||
RUST_BACKTRACE=1 mozregression --launch 2018-06-25 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:38.62 INFO: thread 'WRRenderBackend#1' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335:21
> 0:39.17 INFO: stack backtrace:
> 0:39.17 INFO: 0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:39.17 INFO: 1: std::sys_common::backtrace::print
> 0:39.17 INFO: 2: std::panicking::default_hook::{{closure}}
> 0:39.17 INFO: 3: std::panicking::default_hook
> 0:39.17 INFO: 4: std::panicking::rust_panic_with_hook
> 0:39.17 INFO: 5: std::panicking::begin_panic
> 0:39.17 INFO: 6: std::panicking::begin_panic_fmt
> 0:39.17 INFO: 7: rust_begin_unwind
> 0:39.17 INFO: 8: core::panicking::panic_fmt
> 0:39.17 INFO: 9: core::panicking::panic
> 0:39.17 INFO: 10: <unknown>
> 0:39.17 INFO: Redirecting call to abort() to mozalloc_abort
RUST_BACKTRACE=1 mozregression --launch 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:42.76 INFO: thread 'WRWorker#3' panicked at 'Moz2D replay problem', gfx/webrender_bindings/src/moz2d_renderer.rs:453:21
> 0:42.96 INFO: stack backtrace:
> 0:42.96 INFO: 0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:42.96 INFO: 1: std::sys_common::backtrace::print
> 0:42.96 INFO: 2: std::panicking::default_hook::{{closure}}
> 0:42.96 INFO: 3: std::panicking::default_hook
> 0:42.96 INFO: 4: std::panicking::rust_panic_with_hook
> 0:42.96 INFO: 5: <unknown>
> 0:42.96 INFO: Redirecting call to abort() to mozalloc_abort
I assume the now missing core::panicking::panic_fmt is the relevant difference? (I have really no idea what I'm talking about.)
RUST_BACKTRACE=1 mozregression --good 2018-06-25 --bad 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 12:05.55 INFO: Last good revision: 27f60c02583dca519ae203bcf34ac8133483f0e7
> 12:05.55 INFO: First bad revision: ba2b15dbe7d87b170a8f89bcdef6c739583b322c
> 12:05.55 INFO: Pushlog:
> https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=27f60c02583dca519ae203bcf34ac8133483f0e7&tochange=ba2b15dbe7d87b170a8f89bcdef6c739583b322c
> ba2b15dbe7d8 Martin Robinson — Bug 1470125 - Update WR bindings for changes in WR PR 2848. r=Gankro
> 720f0dd717a1 Kartikaya Gupta — Bug 1470125 - Update bindings for changes in WR PR 2849. r=Gankro
> afe2313f83ed Kartikaya Gupta — Bug 1470125 - Update webrender to commit cdfaaeb5f74e416f39af1081c9a676c752d23896. r=Gankro
> WR @ 866a83f86ae129052c38d009da5587dfed90c144
RUST_BACKTRACE=1 mozregression --repo try --launch a66923a8f067835f563a0a80ff4aea48275a3acd -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
good (has panic_fmt; 'called `Option::unwrap()` on a `None` value')
> WR @ 27443178dae2596536224b34445fb480c3cbc079
RUST_BACKTRACE=1 mozregression --repo try --launch 92f0c1a966c975f4822d2fd01570c339a2e8e022 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
bad (no panic_fmt; 'Moz2D replay problem')
"Regression range": https://github.com/servo/webrender/compare/866a83f86ae129052c38d009da5587dfed90c144...27443178dae2596536224b34445fb480c3cbc079
There was no longer a core::panicking::panic_fmt when the crash reason changed.
I don't know if this finding is related to this bug. Please mark this comment as offtopic if it is not relevant.
It would be easier if I knew how to enable the crash reporter within mozregression or how to get the crash signature as it would be displayed on Socorro.
Assignee | ||
Comment 3•6 years ago
|
||
rust 1.28 changed how panic is hooked up at the lower level, and the panic_fmt function doesn't exist anymore. If soccoro relies on that, well, that's now obviously broken. Symbols not being demangled is ... unexpected. For instance, _ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE demangles perfectly fine to panic_abort::__rust_start_panic::abort with c++filt, so I don't know why those would not be demangled... except if the DWARF changed in a way that breakpad doesn't like, kind of like bug 1464537.
Assignee | ||
Comment 4•6 years ago
|
||
The good news is that the demangling issue only affects linux builds (I checked symbols for android, windows and mac, they're fine)
Assignee | ||
Comment 5•6 years ago
|
||
It's a regression from bug 1309172.
Assignee: nobody → mh+mozilla
Blocks: 1309172
Assignee | ||
Comment 6•6 years ago
|
||
Hahaha the new version of breakpad added this: https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/toolkit/crashreporter/google-breakpad/src/common/language.cc#159-166
and the fallback is to use the mangled version, instead of trying to C++-demangle it, like it used to.
Assignee | ||
Comment 7•6 years ago
|
||
Ironically, the change was from Ted: https://chromium-review.googlesource.com/c/402308/
Assignee | ||
Comment 8•6 years ago
|
||
rust-demangle-capi fails to build. It's using rusty-cheddar to generate a header, which depends on syntex_syntax, which fails to build:
0:23.23 error: to use a constant of type `codemap::Span` in a pattern, `codemap::Span` must be annotated with `#[derive(PartialEq, Eq)]`
0:23.23 --> third_party/rust/syntex_syntax/src/errors/emitter.rs:100:18
0:23.23 |
0:23.23 100 | Some(COMMAND_LINE_SP) => self.emit_(FileLine(COMMAND_LINE_SP), msg, code, lvl),
0:23.23 | ^^^^^^^^^^^^^^^
rusty-cheddar is deprecated in favor of cbindgen, OTOH, that seems overkill to generate a header for two functions.
Assignee | ||
Comment 9•6 years ago
|
||
Note, it bugs me that the problem doesn't happen on mac and android. Does that mean the dwarf info is not annotated as being rust there?
Reporter | ||
Comment 10•6 years ago
|
||
Don't know if related: WebRender crash signatures from Windows and Mac often, but not always, looked a bit different than from Linux. Examples: bug 1455597, bug 1455743.
Assignee | ||
Comment 11•6 years ago
|
||
Actually, mac was affected to some degree. I guess Android probably too.
Assignee | ||
Comment 12•6 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #3)
> rust 1.28 changed how panic is hooked up at the lower level, and the
> panic_fmt function doesn't exist anymore.
Actually, it does, but it's mangled: _ZN4core9panicking9panic_fmt17h545cb08e3b473cc5E
So there might actually be a separate problem in that unwinding can't unwind through it... if that's the case, that's bad.
Assignee | ||
Comment 13•6 years ago
|
||
Crash reports like
https://crash-stats.mozilla.com/report/index/743446b3-396d-4adc-8628-442030180711
suggest we do unwind properly, but there are probably no frames left with panic_fmt for optimization reasons.
Comment hidden (mozreview-request) |
Comment 16•6 years ago
|
||
mozreview-review |
Comment on attachment 8991516 [details]
Bug 1474871 - Link dump_syms against rustc-demangle.
https://reviewboard.mozilla.org/r/256416/#review263372
Thanks for finishing this! When I originally implemented that upstream I didn't get this part done becuase we didn't have host Rust compilation wired up, and we needed to update Breakpad to pick up those changes as well. Your reworking of the -capi crate I wrote is totally sensible, doing all that work just to generate two C function declarations doesn't make much sense.
Attachment #8991516 -
Flags: review?(ted) → review+
Comment 17•6 years ago
|
||
Pushed by tmielczarek@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7bf77a18b13e
Link dump_syms against rustc-demangle. r=ted
Comment 18•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Reporter | ||
Comment 19•6 years ago
|
||
Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
Could the crash signature be made as meaningful as in the past again? (e.g. bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)
Reporter | ||
Comment 20•6 years ago
|
||
Or is it correct like it is now?
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #19)
> Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
> Could the crash signature be made as meaningful as in the past again? (e.g.
> bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)
File a Soccoro bug for some filtering on the panic symbols, but I'll note that the signature *does* contain the relevant bit (<T>::execute), although the specific rayon type is masked.
Assignee | ||
Comment 22•6 years ago
|
||
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #20)
> Or is it correct like it is now?
It doesn't seem to be the same crash at all, so you can't really compare them.
Reporter | ||
Comment 23•6 years ago
|
||
Okay, thank you!
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
status-firefox61:
--- → unaffected
status-firefox62:
--- → unaffected
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → unaffected
You need to log in
before you can comment on or make changes to this bug.
Description
•