Closed
Bug 637243
Opened 14 years ago
Closed 14 years ago
Android crash stacks are completely busted
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
mozilla2.0
Tracking | Status | |
---|---|---|
blocking2.0 | --- | final+ |
People
(Reporter: jdm, Assigned: glandium)
References
Details
Attachments
(1 file)
(deleted), patch | ted: review+
Case in point: 02f20a8d-630c-49a6-a908-fe6242110227
I triggered a content crash with the latest crashme, but the stack does not reflect that at all.
Reporter | ||
Comment 1•14 years ago
|
||
Reporter | ||
Comment 2•14 years ago
|
||
I recall the stacks for 4.0b5pre being either completely or mostly bogus, so I'm doing some searching to try to narrow the window for when this started happening. I'm seeing bizarro stacks being reported at least as far back as Feb 2.
Keywords: regressionwindow-wanted
Reporter | ||
Comment 3•14 years ago
|
||
Ok, so it looks like we have little to no crash information for the period between 1/26 and 1/28, which is when we switched from 4.0b4pre to 4.0b5pre (and stopped collecting data until crash-stats was updated) and, coincidentally, the closest regression window that I can find. Every stack I look at for 4.0b4pre leading up to that switch looks fine; in 4.0b5pre, there's a single nsScriptSecurityManager::doGetObjectPrincipal crash which has an intelligible stack and comes from build 20110127162904. I think we should investigate what landed after that build went out.
Reporter | ||
Comment 4•14 years ago
|
||
m-c pushlog for the range of the 1/27 to 1/28 nightlies: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b5314bc1a926&tochange=993b69aa088a
Reporter | ||
Comment 5•14 years ago
|
||
Likely suspect: bug 628233 in which elfhack is enabled on Android. Sigh.
Blocks: 628233
Reporter | ||
Updated•14 years ago
|
Summary: Crash stacks are completely busted → Android crash stacks are completely busted
Assignee | ||
Comment 6•14 years ago
|
||
I took a look at random Firefox 4.0b12 crash reports for Linux, and they seem to be busted too :( (if someone could take a closer look to validate...)
It could be a problem with either the crash symbols creation or the dumping process, because with gdb and standard debugging symbols, I get proper stacks.
blocking2.0: --- → ?
Assignee | ||
Comment 7•14 years ago
|
||
(In reply to comment #6)
> It could be a problem with either the crash symbols creation or the dumping
> process, because with gdb and standard debugging symbols, I get proper stacks.
Obviously not crash symbols creation, since it takes symbols from files *before* elfhack.
Assignee | ||
Comment 8•14 years ago
|
||
(In reply to comment #7)
> Obviously not crash symbols creation, since it takes symbols from files
> *before* elfhack.
However, note that .dbg files are taken out of non-elfhack'ed binaries and thus don't correspond to elfhack'ed binaries. This shouldn't be a problem, though, as they are normally not used for crash reports. The .sym files are fine, since the .text addresses are the same between elfhack'ed and non-elfhack'ed binaries.
Comment 9•14 years ago
|
||
If the function at the top of the stack is sensible, but the stack after that is screwed, then it's possible that whatever elfhack is doing is screwing up the CFI data present in the .sym files. We parse the DWARF CFI and use that info to find caller frames. jimb wrote all that code, but he has a pretty good writeup of it here:
http://code.google.com/p/google-breakpad/wiki/SymbolFiles#STACK_CFI_records
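As a rough illustration of how CFI rules recover a caller frame, here is a minimal sketch. It assumes a simplified form of the rule syntax described on that wiki page (`name: postfix-expression` pairs, with `.cfa` and `.ra` as pseudo-registers and `^` as a memory dereference). The function names, register values and memory contents are hypothetical, literals are treated as decimal, and this is not Breakpad's actual implementation.

```python
from typing import Dict, List


def eval_postfix(expr: str, regs: Dict[str, int], memory: Dict[int, int]) -> int:
    """Evaluate a postfix expression supporting +, - and ^ (dereference)."""
    stack: List[int] = []
    for tok in expr.split():
        if tok in ("+", "-"):
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if tok == "+" else a - b)
        elif tok == "^":
            # Replace the address on top of the stack with the word stored there.
            stack.append(memory[stack.pop()])
        elif tok in regs:
            stack.append(regs[tok])
        else:
            stack.append(int(tok))  # assumption: literals are decimal
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result


def parse_rules(rules: str) -> Dict[str, str]:
    """Split 'name: expr name: expr' into {name: expr}."""
    parsed: Dict[str, List[str]] = {}
    name = None
    for tok in rules.split():
        if tok.endswith(":"):
            name = tok[:-1]
            parsed[name] = []
        else:
            parsed[name].append(tok)
    return {k: " ".join(v) for k, v in parsed.items()}


def caller_registers(rules: str, regs: Dict[str, int],
                     memory: Dict[int, int]) -> Dict[str, int]:
    """Compute .cfa first, then evaluate every other rule with .cfa in scope."""
    parsed = parse_rules(rules)
    cfa = eval_postfix(parsed[".cfa"], regs, memory)
    scope = dict(regs)
    scope[".cfa"] = cfa
    result = {".cfa": cfa}
    for name, expr in parsed.items():
        if name != ".cfa":
            result[name] = eval_postfix(expr, scope, memory)
    return result


# Hypothetical x86 state at function entry: $esp points at the return address.
frame = caller_registers(
    ".cfa: $esp 4 + .ra: .cfa 4 - ^",
    regs={"$esp": 0x1000},
    memory={0x1000: 0xDEADBEEF},
)
print(hex(frame[".cfa"]), hex(frame[".ra"]))
```

If the rules applied to a frame do not match the code that actually ran, `.cfa` and `.ra` come out wrong, and every frame after the first is walked from a bad starting point. That would be consistent with a stack whose top function is sensible while the rest is not.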
Comment 10•14 years ago
|
||
bz kept pointing out bad stacks over the past few days: bug 635901, bug 636052, etc.
Comment 11•14 years ago
|
||
This is essential. Are we pretty sure that elfhack caused this? If so, we should just disable elfhack for this release and revisit later.
And boy howdy, we should probably come up with some kind of unit test to make sure stackwalking works sanely.
ted/glandium, which one of you wants to take this?
blocking2.0: ? → final+
Comment 12•14 years ago
|
||
glandium is on this, but I agree that we should just disable for this release.
A stackwalking unittest would be great, although probably a PITA to set up just because of the need to build all the Breakpad processor code.
Assignee: nobody → mh+mozilla
Assignee | ||
Comment 13•14 years ago
|
||
The culprit is the minidump writer, which only prints out one mapping for what is not data in libxul.so, while there are now two with elfhack.
Let's disable elfhack for now (which is what the patch does), and we'll see later to fix minidump writer, and to fix the .dbg files generation, too.
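To make the failure mode concrete, here is a minimal sketch of address-to-module lookup. It assumes a simplified model in which the stack walker resolves each address against the list of mappings recorded in the dump, and in which the writer keeps only the first non-data mapping per module. The `Mapping` type, `find_module` and all addresses are hypothetical and do not reflect the real minidump writer code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mapping:
    start: int  # first address covered by the mapping
    size: int   # length of the mapping in bytes
    name: str   # module the mapping belongs to


def find_module(mappings: List[Mapping], address: int) -> Optional[str]:
    """Return the module whose recorded mapping contains address, if any."""
    for m in mappings:
        if m.start <= address < m.start + m.size:
            return m.name
    return None


# Hypothetical layout: the non-data part of libxul.so is loaded as two mappings.
actual = [
    Mapping(0x80000000, 0x1000, "libxul.so"),
    Mapping(0x80001000, 0x900000, "libxul.so"),
]

# Assumption: the writer records only the first mapping for the module.
recorded = actual[:1]

return_address = 0x80123456  # falls inside the second mapping
print(find_module(actual, return_address))    # resolved against the full layout
print(find_module(recorded, return_address))  # not resolved from the dump
```

In this model, any return address in the unrecorded mapping cannot be attributed to libxul.so, so no symbols or CFI can be looked up for it.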
Attachment #515610 -
Flags: review?(ted.mielczarek)
Comment 14•14 years ago
|
||
Comment on attachment 515610 [details] [diff] [review]
Disable elfhack by default
Please land this ASAP.
Attachment #515610 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 15•14 years ago
|
||
Keywords: regressionwindow-wanted
Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0
Comment 16•14 years ago
|
||
(In reply to comment #13)
> we'll see
> later to fix minidump writer, and to fix the .dbg files generation, too.
Is there a bug on file for that part?
Assignee | ||
Comment 17•14 years ago
|
||
(In reply to comment #16)
> (In reply to comment #13)
> > we'll see
> > later to fix minidump writer, and to fix the .dbg files generation, too.
>
> Is there a bug on file for that part?
bug 637316 and bug 637341.
Updated•13 years ago
|
tracking-fennec: ? → ---