Closed
Bug 1273048
Opened 9 years ago
Closed 8 years ago
x86 Android crashes in mozilla::PseudoElementForStyleContext
Categories
(Core :: Layout, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
Tracking | Status | |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: n.nethercote, Assigned: dbaron)
References
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
xidorn
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-eb3eccff-dc1a-4034-af8e-8a9732160511.
=============================================================
This crash signature first appeared in Fennec Nightly 20160511030221, and has occurred 5 times since, across three installations.
I don't see any obvious changes in and around this code recently.
One interesting thing is that all 5 crashes have occurred on x86/Android devices.
dbaron, any ideas?
Flags: needinfo?(dbaron)
Assignee | ||
Comment 1•9 years ago
|
||
The crash address for the crashes seems to be build-specific, and not near either any of the registers, nor in any of the mapped modules.
The CPU for *all* of the Android-x86 crashes is:
GenuineIntel family 6 model 55 stepping 8 | 4
(This CPU is the second most common Android-x86 CPU in crash-stats, responsible for 12.66% of crash reports:
https://crash-stats.mozilla.com/search/?product=FennecAndroid&cpu_arch=x86&_facets=cpu_info&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-cpu_info
The distribution on nightly is quite different, where it's 3rd and 13.08%:
https://crash-stats.mozilla.com/search/?product=FennecAndroid&cpu_arch=x86&release_channel=nightly&date=%3E2016-01-01&_facets=cpu_info&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-cpu_info .)
Flags: needinfo?(dbaron)
Comment 2•8 years ago
|
||
Bug 1269976 is about pseudo elements and landed on m-c 2016-05-10, fwiw.
Comment 3•8 years ago
|
||
My guess is that the compiler somehow removes the sanity check
> if (aPseudoType >= CSSPseudoElementType::Count) {
above the crashing line.
The patch in bug 1269976 moves kPseudoElementFlags from a local const array to a scoped one, which may change where the array lives in memory, and the new location may have less valid space after...
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #2)
> Bug 1269976 is about pseudo elements and landed on m-c 2016-05-10, fwiw.
Seems pretty likely. Given the nightly crash data, I think there's a >75% chance or so that the regression is in the one-day range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=043082cb7bd8490c60815f67fbd1f33323ad7663&tochange=674a552743785c28c75866969aad513bd8eaf6ae
(especially since the May 9 and May 10 builds were built from the same changeset).
Blocks: 1269976
Assignee | ||
Updated•8 years ago
|
Summary: Crash in mozilla::PseudoElementForStyleContext → x86 Android crashes in mozilla::PseudoElementForStyleContext
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #1)
> The CPU for *all* of the Android-x86 crashes is:
> GenuineIntel family 6 model 55 stepping 8 | 4
This is no longer true.
This is also now 17% of x86 Android crashes for 49.0a1:
https://crash-stats.mozilla.com/search/?product=FennecAndroid&version=49.0a1&cpu_arch=x86&date=%3E2016-04-01&_facets=signature
Comment 6•8 years ago
|
||
Hmmm, I have no idea how to disasm the xul library in x86 Android build... I heard that libxul.so there is compressed by szip, which I don't find any tool to decompress. And I don't know how to get the proper symbols either...
Comment 7•8 years ago
|
||
Per IRC discussion between dbaron and glandium, to decompress the .so file, you need to:
> mach configure --enable-project=mobile/android
> mach build pre-export
> mach build -C . mozglue/linker/host
> $objdir/dist/host/bin/szip -d libxul.so
Assignee | ||
Comment 8•8 years ago
|
||
I debugged bp-1b4efb55-ecc2-42e4-98c7-27a872160601 a bit
In order to objdump the libxul.so from the build, I needed to compile szip with:
unset MOZCONFIG; MOZ_LINKER=1 ./mach configure --with-system-zlib ; ./mach build pre-export ; ./mach build -C . mozglue/linker/host
Then I could determine that the function we want is:
017eb8ad <mozilla::PseudoElementForStyleContext(nsIFrame*, mozilla::CSSPseudoEle
mentType)>:
17eb8ad: 80 fa 16 cmp dl,0x16
17eb8b0: 77 3a ja 17eb8ec <mozilla::PseudoElementFo
rStyleContext(nsIFrame*, mozilla::CSSPseudoElementType)+0x3f>
17eb8b2: 55 push ebp
17eb8b3: 89 e5 mov ebp,esp
17eb8b5: 56 push esi
17eb8b6: 89 c6 mov esi,eax
17eb8b8: 53 push ebx
17eb8b9: e8 b2 5a b5 fe call 341370 <__x86.get_pc_thunk.bx>
17eb8be: 81 c3 6a c6 17 02 add ebx,0x217c66a
17eb8c4: f6 84 93 98 1d 33 ff test BYTE PTR [ebx+edx*4-0xcce268],0x4
17eb8cb: 04
17eb8cc: 75 14 jne 17eb8e2 <mozilla::PseudoElementForStyleContext(nsIFrame*, mozilla::CSSPseudoElementType)+0x35>
17eb8ce: 83 ec 0c sub esp,0xc
17eb8d1: 52 push edx
17eb8d2: e8 0b 3b f9 ff call 177f3e2 <nsCSSPseudoElements::PseudoElementSupportsUserActionState(mozilla::CSSPseudoElementType)>
17eb8d7: 83 c4 10 add esp,0x10
17eb8da: 84 c0 test al,al
17eb8dc: 75 04 jne 17eb8e2 <mozilla::PseudoElementForStyleContext(nsIFrame*, mozilla::CSSPseudoElementType)+0x35>
17eb8de: 31 c0 xor eax,eax
17eb8e0: eb 03 jmp 17eb8e5 <mozilla::PseudoElementForStyleContext(nsIFrame*, mozilla::CSSPseudoElementType)+0x38>
17eb8e2: 8b 46 14 mov eax,DWORD PTR [esi+0x14]
17eb8e5: 8d 65 f8 lea esp,[ebp-0x8]
17eb8e8: 5b pop ebx
17eb8e9: 5e pop esi
17eb8ea: 5d pop ebp
17eb8eb: c3 ret
17eb8ec: 31 c0 xor eax,eax
17eb8ee: c3 ret
17eb8ef: 90 nop
The module_offset in the raw dump in the crash report is 0x17eb8c4,
so our EIP (0x2c6a48c4) means 0x17eb8c4 above.
EBX is 0x2e820f28 and EDX is 0x211fa509.
Note that the instruction at the very start compares only the low bytes
(cmp dl,0x16) when deciding whether to jump to the end.
It's not clear to me whether aType is supposed to be passed in DL (with the
other 24 bits ignored) or in EDX (consider all 32 bits). Either way, things
seem wrong.
Looking at the calling function, we have (from the crash report's raw
dump), module_offset: 0x17ff1ac (which should be the return address...
but apparently it's one byte less than the return address!?), and the
calling code there looks like:
17ff19f: 8a 95 5c ff ff ff mov dl,BYTE PTR [ebp-0xa4]
17ff1a5: 8b 45 0c mov eax,DWORD PTR [ebp+0xc]
17ff1a8: e8 00 c7 fe ff call 17eb8ad <mozilla::PseudoElementForStyleContext(nsIFrame*, mozilla::CSSPseudoElementType)>
17ff1ad: 52 push edx
This makes me think it's a miscompilation that the above code is using
a test instruction that uses all of EDX:
17eb8c4: f6 84 93 98 1d 33 ff test BYTE PTR [ebx+edx*4-0xcce268],0x4
The Android NDK's objdump and my system objdump on Ubuntu 16.04 at least
agree on that instruction. In AT&T syntax, it's:
17eb8c4: f6 84 93 98 1d 33 ff testb $0x4,-0xcce268(%ebx,%edx,4)
Comment 9•8 years ago
|
||
Looking nsCSSPseudoElements::PseudoElementSupportsUserActionState() which is called a bit later, it seems the function there only uses one byte. I suspect this is what was compiled before I move that function into the header, and thus it didn't crash before bug 1269976:
0177f3e2 <_ZN19nsCSSPseudoElements36PseudoElementSupportsUserActionStateEN7mozilla20CSSPseudoElementTypeE>:
177f3e2: e8 c0 6d bc fe call 3461a7 <__x86.get_pc_thunk.cx>
177f3e7: 81 c1 41 8b 1e 02 add $0x21e8b41,%ecx
177f3ed: 55 push %ebp
177f3ee: 89 e5 mov %esp,%ebp
177f3f0: 0f b6 45 08 movzbl 0x8(%ebp),%eax
177f3f4: 5d pop %ebp
177f3f5: f6 84 81 98 1d 33 ff testb $0x8,-0xcce268(%ecx,%eax,4)
177f3fc: 08
177f3fd: 0f 95 c0 setne %al
177f400: c3 ret
177f401: 90 nop
Assignee | ||
Comment 10•8 years ago
|
||
00:50:42 <jandem> dbaron: that sounds like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64037, it comes up from time to time in spidermonkey :/
00:50:55 <jandem> i checked the enum there has uint8_t storage
Comment 11•8 years ago
|
||
I mean, before I move nsCSSPseudoElements::PseudoElementSupportsStyleAttribute into the header.
So the solution would probably be: move nsCSSPseudoElements::PseudoElementSupportsStyleAttribute back to the source file, so that the compiler wouldn't mis-optimize it when inlining.
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 12•8 years ago
|
||
01:58:51 <xidorn> dbaron: I have no idea. if upgrading ndk is non-trivial, I'd suggest we move nsCSSPseudoElements::PseudoElementSupportsStyleAttribute back to the .cpp file, or mark it never inline for x86 android build
02:01:01 → adalucinet and KaiRo joined ⇐ reed and AdrianSV quit ↔ mgoodwin_OoO and sankha nipped out • mtabara|away → mtabara, fscholz|off → fscholz
02:10:56 <glandium> dbaron: it's supposed to be fixed in rev 218721 of the 4.9 branch, and NDK r11 has a newer revision than that
02:11:47 <glandium> dbaron: also, we don't have to land a NDK upgrade to validate that it does indeed fix it
02:11:57 <glandium> that can be checked with a try build
Also, we could perhaps use __attribute__((noinline)), #ifdef'd based on the bad GCC versions, based on https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes . (There's a good bit of __attribute__((noinline)) usage in the tree already; see https://mxr.mozilla.org/mozilla-central/search?string=__%28%28noinline%29%29&tree=mozilla-central
Assignee | ||
Comment 13•8 years ago
|
||
MozReview-Commit-ID: 4VjAra5B6GM
Attachment #8759868 -
Flags: review?(nfroyd)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
Assignee | ||
Comment 14•8 years ago
|
||
MozReview-Commit-ID: IXYYn3mLQBf
Attachment #8759870 -
Flags: review?(bugzilla)
Assignee | ||
Comment 15•8 years ago
|
||
I tested locally that the MOZ_GCC_VERSION_AT_LEAST() and
MOZ_GCC_VERSION_AT_MOST() expressions do sensible things when I fiddle
with the numbers to make them more or less than my local gcc version.
(I tested this for all 4 expressions.)
I don't know for sure that this will fix the crashes we're seeing, but
it seems like it should undo the change that triggered it, so I think
it's worth trying.
MozReview-Commit-ID: IXYYn3mLQBf
Attachment #8759871 -
Flags: review?(bugzilla)
Updated•8 years ago
|
Attachment #8759868 -
Flags: review?(nfroyd) → review+
Assignee | ||
Updated•8 years ago
|
Attachment #8759870 -
Attachment is obsolete: true
Attachment #8759870 -
Flags: review?(bugzilla)
Comment 16•8 years ago
|
||
Comment on attachment 8759871 [details] [diff] [review]
Add __attribute__((noinline)) to work around compiler bug on Android/x86
Review of attachment 8759871 [details] [diff] [review]:
-----------------------------------------------------------------
Looks fine if it works.
::: layout/style/nsCSSPseudoElements.h
@@ +109,5 @@
> +
> + // Work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64037 ,
> + // which is a general gcc bug that we seem to have hit only on Android/x86.
> +#if defined(ANDROID) && defined(__i386__) && defined(__GNUC__) && !defined(__clang__)
> +#if (MOZ_GCC_VERSION_AT_LEAST(4,8,0) && MOZ_GCC_VERSION_AT_MOST(4,8,4)) || (MOZ_GCC_VERSION_AT_LEAST(4,9,0) && MOZ_GCC_VERSION_AT_MOST(4,9,2))
This line, as well as the line above, looks too long. Could you break both of them into two lines?
Attachment #8759871 -
Flags: review?(bugzilla) → review+
Assignee | ||
Comment 17•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/efd20b79bbc525828b9a687bd14feb43c93c14d1
Bug 1273048 - Add MOZ_GCC_VERSION_AT_MOST macro. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/4c3d7660fd3730f2bf51fd32f21f98063ea860c9
Bug 1273048 - Add __attribute__((noinline)) to work around compiler bug on Android/x86. r=xidorn
Comment 18•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/efd20b79bbc5
https://hg.mozilla.org/mozilla-central/rev/4c3d7660fd37
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee | ||
Comment 19•8 years ago
|
||
It looks like the fix worked, since the last crash in:
https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-05-01&signature=mozilla%3A%3APseudoElementForStyleContext#aggregations
is still from the 2016-06-05 build.
Comment 20•7 years ago
|
||
This crash is still live in FF 52 ESR/i386, at least on RHEL6/CentOS6:
https://bugs.centos.org/view.php?id=13253&nbn=1
https://bugzilla.redhat.com/show_bug.cgi?id=1451055
Comment 21•7 years ago
|
||
(In reply to Martin Stránský from comment #20)
> This crash is still live in FF 52 ESR/i386, at least on RHEL6/CentOS6:
>
> https://bugs.centos.org/view.php?id=13253&nbn=1
> https://bugzilla.redhat.com/show_bug.cgi?id=1451055
Same signature doesn't mean it is the same bug, but it could be.
This crash was from a compiler bug, and the workaround was only added for Android because that's the only place we use -Os I suppose. It is possible that the compiler RHEL6 / CentOS6 uses still have this bug, and your build configuration triggers it.
You need to log in
before you can comment on or make changes to this bug.
Description
•