Closed Bug 672787 Opened 13 years ago Closed 13 years ago

Aurora build crashes [@ _moz_pixman_image_composite32] at start-up (07/20)

Categories

Product/Component: Core :: Graphics
Type: defect
Platform: ARM, Android
Priority: Not set
Severity: blocker

Tracking

Status: RESOLVED FIXED
Milestone: mozilla8
Tracking Status:
firefox7: tracking +, wontfix
firefox8: tracking ---, fixed

People

(Reporter: xti, Assigned: jchen)

Attachments

(2 files)

Attached file crash logcat (deleted) —
Build ID: Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720 Firefox/7.0a2 Fennec/7.0a2
Device: Motorola Droid 2
OS: Android 2.2

Steps to reproduce:
Case 1: If an outdated Aurora build is installed, update it from about:firefox. After the new build is installed, tap the Open button.
Case 2: Go to http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-aurora-android/ and tap fennec-7.0a2.multi.eabi-arm.apk. After the app is installed, tap the Open button.

Expected result: The Aurora build opens normally.
Actual result: The Aurora build crashes every time it is opened, and a Mozilla Crash dialog is displayed.

Note: I cannot get the crash report from about:crashes because the Aurora build doesn't open at all.
I was able to get a crash report after installing the 20110719 build over it: https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720
This issue doesn't occur on:
Build ID: Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110719 Firefox/7.0a2 Fennec/7.0a2
Build config: http://hg.mozilla.org/releases/mozilla-aurora/rev/4d2a4e9e9730

But it occurs on:
Build ID: Mozilla/5.0 (Android;Linux armv7l;rv:7.0a2)Gecko/20110720 Firefox/7.0a2 Fennec/7.0a2

A possible regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-07-19&enddate=2011-07-20+03%3A00
https://crash-stats.mozilla.com/report/index/bp-502110f9-b010-4911-86f8-edc292110720

0 libxul.so libxul.so@0x9df494
1 libxul.so _moz_pixman_image_composite32 gfx/cairo/libpixman/src/pixman.c:371
2 libxul.so _clip_and_composite_boxes gfx/cairo/cairo/src/cairo-image-surface.c:3002
3 libxul.so _cairo_image_surface_paint gfx/cairo/cairo/src/cairo-image-surface.c:3304
4 libxul.so _cairo_surface_paint gfx/cairo/cairo/src/cairo-surface.c:2100
5 libxul.so _cairo_gstate_paint gfx/cairo/cairo/src/cairo-gstate.c:1049
6 libxul.so _moz_cairo_paint gfx/cairo/cairo/src/cairo.c:2238
7 libxul.so _moz_cairo_paint_with_alpha gfx/cairo/cairo/src/cairo.c:2267
8 libxul.so gfxContext::Paint gfx/thebes/gfxContext.cpp:772
9 libxul.so gfxPlatform::OptimizeImage gfx/thebes/gfxPlatform.cpp:414
10 libxul.so imgFrame::Optimize nsAutoPtr.h:954
11 libxul.so mozilla::imagelib::RasterImage::DecodingComplete modules/libpr0n/src/RasterImage.cpp:1111
12 libxul.so mozilla::imagelib::Decoder::PostDecodeDone nsCOMPtr.h:800
13 libxul.so mozilla::imagelib::nsPNGDecoder::end_callback modules/libpr0n/decoders/nsPNGDecoder.cpp:863
14 libxul.so MOZ_PNG_push_have_end modules/libimg/png/pngpread.c:1908
15 libxul.so MOZ_PNG_push_read_chunk modules/libimg/png/pngpread.c:364
16 libxul.so MOZ_PNG_proc_some_data modules/libimg/png/pngpread.c:65
17 libxul.so MOZ_PNG_process_data modules/libimg/png/pngpread.c:39
18 libxul.so mozilla::imagelib::nsPNGDecoder::WriteInternal modules/libpr0n/decoders/nsPNGDecoder.cpp:354
19 libxul.so mozilla::imagelib::Decoder::Write modules/libpr0n/src/Decoder.cpp:104
20 libxul.so mozilla::imagelib::RasterImage::WriteToDecoder modules/libpr0n/src/RasterImage.cpp:2277
AFAIK, you need to look at the pushlog for Aurora instead, which is here: http://hg.mozilla.org/releases/mozilla-aurora/
The problem is that I don't see anything there that could trigger this crash.
Crash Signature: [@ _moz_pixman_image_composite32]
Summary: Aurora build crashes at start-up (07/20) → Aurora build crashes [@ _moz_pixman_image_composite32] at start-up (07/20)
Today's Aurora nightly starts up fine (no crash) on my Xoom.
I guess this is basically related to/the same as bug 623161.
Btw, I can reproduce this crash on start-up, using the LG Optimus Black.
I don't crash on a Nexus One.
I'm in the Mountain View office with the crashing Aurora browser on the phone. If someone wants to investigate, he can grab my phone (I'm in the QA area).
I mentioned this to Naoki in case it is useful: there is a corresponding signature on the Firefox side with a fairly low crash volume: https://crash-stats.mozilla.com/report/list?signature=_moz_pixman_image_composite32
OK, this looks more like a Cairo bug to me, then. Moving it to Core -> Graphics.
Component: General → Graphics
Product: Fennec → Core
QA Contact: general → thebes
Version: Firefox 7 → Trunk
This crashes (info pulled from application.ini):
Version=7.0a2
BuildID=20110720042444
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=579cbf7a9add

This runs:
Version=7.0a2
BuildID=20110719042859
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=4d2a4e9e9730

So these are the changesets that landed in that span:

changeset: 72687:579cbf7a9add
user: Simon Montagu <smontagu@smontagu.org>
date: Mon Jul 11 06:40:51 2011 +0300
summary: Don't resolve bidi paragraph in preformatted text until we really get to the end of the line. Bug 670226, r=roc, a=asa

changeset: 72686:433cd269be19
user: Simon Montagu <smontagu@smontagu.org>
date: Mon Jul 11 06:40:51 2011 +0300
summary: Tests for bug 670226

changeset: 72685:ef4909389600
user: Simon Montagu <smontagu@smontagu.org>
date: Fri Jul 08 10:51:26 2011 +0300
summary: Make sure that bidi continuation chains don't go beyond the end of the paragraph. Bug 668941, r=roc, a=asa

changeset: 72684:9a3234ac5c1c
user: Myk Melez <myk@mozilla.org>
date: Tue Jul 19 20:55:10 2011 -0700
summary: update revision of Add-on SDK tests to latest tip; a=test-only

changeset: 72683:82f49f622e9d
user: Luke Wagner <luke@mozilla.com>
date: Mon Jul 18 17:37:19 2011 -0700
summary: Bug 672026 - Ensure that there is an object principals finder during early startup (r=mrbkap,a=asa)
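The range above comes from comparing the SourceStamp values in the two builds' application.ini files. A minimal Python sketch of that step, assuming an `[App]` section holding the keys quoted above, and assuming the hg pushlog accepts `fromchange`/`tochange` query parameters (the bug itself only used `startdate`/`enddate`). The helper names are hypothetical, and the INI strings are reduced excerpts built from the values in this comment.

```python
import configparser


def read_build_info(ini_text: str) -> dict:
    """Extract the fields used for regression hunting from application.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    app = parser["App"]  # assumed section name; keys are looked up case-insensitively
    return {
        "build_id": app["BuildID"],
        "repo": app["SourceRepository"],
        "rev": app["SourceStamp"],
    }


def pushlog_url(good: dict, bad: dict) -> str:
    """Build a pushlog URL (assumed fromchange/tochange parameters) between two builds."""
    if good["repo"] != bad["repo"]:
        raise ValueError("builds come from different repositories")
    return f"{good['repo']}/pushloghtml?fromchange={good['rev']}&tochange={bad['rev']}"


# Excerpts using the values quoted from the two application.ini files.
GOOD_INI = """\
[App]
Version=7.0a2
BuildID=20110719042859
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=4d2a4e9e9730
"""

BAD_INI = """\
[App]
Version=7.0a2
BuildID=20110720042444
SourceRepository=http://hg.mozilla.org/releases/mozilla-aurora
SourceStamp=579cbf7a9add
"""

if __name__ == "__main__":
    good, bad = read_build_info(GOOD_INI), read_build_info(BAD_INI)
    print(pushlog_url(good, bad))
```

Taking the revisions from application.ini rather than from dates avoids guessing which pushes a nightly actually contained.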
(In reply to comment #13) > Talos regression in bug 672026 correction bug 654049
Depends on: 654049
No longer depends on: 672026
I helped dougt trigger the following jobs (from http://build.mozilla.org/builds/running.html):
mozilla-aurora 9a3234ac5c1c Android mozilla-aurora build
mozilla-aurora 82f49f622e9d Android mozilla-aurora build
mozilla-aurora ef4909389600 Android mozilla-aurora build
mozilla-aurora 433cd269be19 Android mozilla-aurora build
579cbf7a9add Android mozilla-aurora build
433cd269be19 Android mozilla-aurora build
He would not have had the means to trigger those 3 csets that were on the same push. The builds should show up in http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android/
How exactly is bug 654049 involved in this?
Depends on: 654049
No longer depends on: 654049
I did some debugging, and it seems like it's not really a JS bug, but rather some strange linker magic.

Functions from pixman_arm_neon_asm.o are supposed to be at least 4-byte aligned, which is the case before bug 672026:

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e57b8 l F .text 00000000 .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e79f8 l F .text 00000000 .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc3d8 l F .text 00000000 .hidden pixman_composite_scanline_add_asm_neon
> 009eb17c l F .text 00000000 .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6158 l F .text 00000000 .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8948 l F .text 00000000 .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e16c8 l F .text 00000000 .hidden pixman_composite_add_n_8_8_asm_neon
> 009df054 l F .text 00000000 .hidden pixman_composite_src_n_0565_asm_neon

But after bug 672026, everything from pixman_arm_neon_asm.o is now offset by 2 bytes (address in the first column):

arm-linux-androideabi-objdump -t dist/lib/libxul.so | grep 'pixman.\+neon'
> 009e5882 l F .text 00000000 .hidden pixman_composite_src_0888_0565_rev_asm_neon
> 009e7ac2 l F .text 00000000 .hidden pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon
> 009dc4a2 l F .text 00000000 .hidden pixman_composite_scanline_add_asm_neon
> 009eb246 l F .text 00000000 .hidden pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
> 009e6222 l F .text 00000000 .hidden pixman_composite_over_0565_8_0565_asm_neon
> 009e8a12 l F .text 00000000 .hidden pixman_scaled_nearest_scanline_0565_8888_SRC_asm_neon
> 009e1792 l F .text 00000000 .hidden pixman_composite_add_n_8_8_asm_neon
> 009df11e l F .text 00000000 .hidden pixman_composite_src_n_0565_asm_neon

The strange thing is that this only happens to pixman_arm_neon_asm.o.

Now, when we call these functions, the blx instruction implies 4-byte alignment:

> 009da4dc <neon_composite_src_8888_8888+0x3c>:
> 9da4dc: 9000 str r0, [sp, #0]
> 9da4de: 980b ldr r0, [sp, #44]
> 9da4e0: f004 efa8 blx 9df434 <pixman_composite_src_8888_8888_asm_neon+0x2>
> 9da4e4: b003 add sp, #12
> 9da4e6: bd00 pop {pc}

So our nice ARM instructions:

> 009df432 <pixman_composite_src_8888_8888_asm_neon>:
> 9df432: e92d5ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
> 9df436: e59d4028 ldr r4, [sp, #40]
> 9df43a: e3a0a000 mov sl, #0 ; 0x0
> 9df43e: e59d502c ldr r5, [sp, #44]
> 9df442: e1a06002 mov r6, r2
> 9df446: e1a0b004 mov fp, r4
> 9df44a: e1a0c006 mov ip, r6
> 9df44e: e1a0e007 mov lr, r7

turn into gibberish due to the 2-byte offset:

> 009df434 <pixman_composite_src_8888_8888_asm_neon+0x2>:
> 9df434: 4028e92d eormi lr, r8, sp, lsr #18
> 9df438: a000e59d mulge r0, sp, r5
> 9df43c: 502ce3a0 eorpl lr, ip, r0, lsr #7
> 9df440: 6002e59d mulvs r2, sp, r5
> 9df444: b004e1a0 andlt lr, r4, r0, lsr #3
> 9df448: c006e1a0 andgt lr, r6, r0, lsr #3
> 9df44c: e007e1a0 and lr, r7, r0, lsr #3
> 9df450: 9201e1a0 andls lr, r1, #40 ; 0x28

Sooner or later, we crash. This only happens to that bit of NEON assembly, and our Tegra test boards don't have NEON, so this was not caught in testing. Also, this doesn't happen with NDK5, so that's one more reason to switch :) I will try to find out whether NDK5 avoids this linker bug because it was fixed or because the conditions for the bug aren't met under NDK5.
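To check the arithmetic behind "blx implies 4-byte alignment", the sketch below decodes the `f004 efa8` halfwords from the disassembly above. It follows the ARM Architecture Reference Manual's description of BLX (immediate), encoding T2: the offset's two low bits are always zero, and it is added to the Thumb PC (instruction address + 4) aligned down to 4. The function names are hypothetical; the sketch only illustrates the encoding and does not model how the linker chose the offset.

```python
def decode_thumb2_blx(hw1: int, hw2: int) -> int:
    """Decode the signed byte offset (imm32) of a Thumb-2 `BLX <label>`
    (encoding T2) from its two 16-bit halfwords."""
    if hw1 >> 11 != 0b11110:
        raise ValueError("first halfword is not a 32-bit BL/BLX prefix")
    if hw2 >> 14 != 0b11 or (hw2 >> 12) & 1 != 0:
        raise ValueError("second halfword is not a BLX <imm> suffix")
    if hw2 & 1:
        raise ValueError("H bit must be 0 for BLX")
    s = (hw1 >> 10) & 1
    imm10h = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm10l = (hw2 >> 1) & 0x3FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    # imm32 = SignExtend(S:I1:I2:imm10H:imm10L:'00'): the low two bits are
    # always zero, so the offset is a multiple of 4.
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10h << 12) | (imm10l << 2)
    return imm - (1 << 25) if s else imm


def blx_target(insn_addr: int, imm32: int) -> int:
    """In Thumb state the PC reads as the instruction address + 4; BLX to
    ARM state aligns that value down to 4 before adding the offset."""
    return ((insn_addr + 4) & ~3) + imm32


if __name__ == "__main__":
    # "9da4e0: f004 efa8 blx 9df434 <pixman_composite_src_8888_8888_asm_neon+0x2>"
    imm = decode_thumb2_blx(0xF004, 0xEFA8)
    target = blx_target(0x9DA4E0, imm)
    func_start = 0x9DF432  # pixman_composite_src_8888_8888_asm_neon
    print(f"offset {imm:#x}, target {target:#x}, "
          f"{target - func_start} bytes past the function start")
    # Every BLX target is Align(PC, 4) plus a multiple of 4, so an ARM
    # function starting at an address that is 2 mod 4 cannot be hit exactly.
    print("function start reachable by BLX:", func_start % 4 == 0)
```

Run as-is, it prints `offset 0x4f50, target 0x9df434, 2 bytes past the function start`, matching objdump's `blx 9df434`, and reports that 0x9df432 is not reachable by any BLX.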
Yes, this looks like the same issue as bug 666931 and bug 623161
In the future please don't trigger nightlies when regression hunting. If you need clean builds, use https://build.mozilla.org/clobberer/ to clobber the builder, and use normal opt builds. Triggering multiple nightlies in parallel has unknown behaviour, and seems to cause us to temporarily strand users (bug 673501). Thanks!
I asked Timothy B. Terriberry on IRC, and he provided more explanation of the problem and a link to this bug in the binutils bug tracker: http://sourceware.org/bugzilla/show_bug.cgi?id=12931
For now, the workaround (also applied to WebM earlier) is to explicitly set the alignment for code sections, and the following patch should do it for pixman: http://lists.freedesktop.org/archives/pixman/2011-July/001347.html
Please confirm whether it really helps to resolve this bug. If it does, then it makes sense to do a complete review of all the ARM assembly code in Mozilla to see whether such workarounds should also be applied elsewhere.
> Please confirm whether it really helps to resolve this bug. And if it does, then it makes sense to do a complete review of all the arm assembly code in Mozilla to see if such workarounds should be also applied somewhere else.

Yes, this does fix the bug. Thank you for identifying the issue. I agree a complete review will be very helpful, before another innocent person gets bitten by this bug again :)
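The review suggested above could start with an automated check of the linked library. A minimal Python sketch, assuming the input is `objdump -t` output like the listings earlier in this bug, and assuming every symbol matched by the name pattern (here the bug's `pixman.\+neon` grep, written as a Python regex) is ARM-state code that must be 4-byte aligned; Thumb code can legitimately start on a 2-byte boundary, so the name filter matters. `misaligned_symbols` is a hypothetical helper, and the sample lines are copied from the before/after listings.

```python
import re
from typing import Iterable, List, Tuple


def misaligned_symbols(objdump_lines: Iterable[str], name_pattern: str,
                       alignment: int = 4) -> List[Tuple[int, str]]:
    """Return (address, name) for every symbol whose name matches
    `name_pattern` and whose address is not a multiple of `alignment`."""
    pattern = re.compile(name_pattern)
    bad = []
    for line in objdump_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            addr = int(fields[0], 16)
        except ValueError:
            continue  # header line such as "SYMBOL TABLE:"
        name = fields[-1]  # the symbol name is the last column
        if pattern.search(name) and addr % alignment != 0:
            bad.append((addr, name))
    return bad


# Excerpts of the `objdump -t` listings quoted in this bug.
BEFORE = """\
009e57b8 l     F .text  00000000 .hidden pixman_composite_src_0888_0565_rev_asm_neon
009dc3d8 l     F .text  00000000 .hidden pixman_composite_scanline_add_asm_neon
"""

AFTER = """\
009e5882 l     F .text  00000000 .hidden pixman_composite_src_0888_0565_rev_asm_neon
009dc4a2 l     F .text  00000000 .hidden pixman_composite_scanline_add_asm_neon
"""

if __name__ == "__main__":
    for label, listing in (("before", BEFORE), ("after", AFTER)):
        hits = misaligned_symbols(listing.splitlines(), r"pixman.+neon")
        print(f"{label}: {len(hits)} misaligned")
        for addr, name in hits:
            print(f"  {addr:08x} {name} ({addr % 4} bytes past a 4-byte boundary)")
```

In practice the input would be the output of `arm-linux-androideabi-objdump -t dist/lib/libxul.so`, as used above; a non-empty result for ARM-state symbols would flag this class of problem before it reaches devices.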
Has a bug been filed to get the pixman alignment fix into the mozilla codebase?
Crash Signature: [@ _moz_pixman_image_composite32] → [@ libxul.so@0x9df494] [@ _moz_pixman_image_composite32]
So, what do we need to do here for Firefox 7? Nothing? This is an existing problem? Do we have the workaround mentioned in comment 21 in mozilla-central or mozilla-beta? We must get some action on this today, preferably a resolution if it needs it.
(In reply to Christian Legnitto [:LegNeato] from comment #25)
> So, what do we need to do here for Firefox 7?

I would suggest cherry-picking http://cgit.freedesktop.org/pixman/commit/?id=b8d6babc91459a9f854695b56f0265298a3c6427 and applying it to Mozilla's copy of pixman. While you are at it, there is also bug 667284, which has a simple fix available and would also be nice to have applied.
Attached patch Fix (deleted) — Splinter Review
Here's the patch for Mozilla. It's not in mozilla-central or anywhere else, but it would be very good to have.
Attachment #560443 - Flags: review?(siarhei.siamashka)
Comment on attachment 560443 Fix
Also nominating for Aurora and Beta, since the crash was originally from Fennec 7. The patch has virtually no risk; it specifies one alignment attribute for three assembly files. Without it, any build can potentially contain this crash, and the Tegra test boards cannot always catch it.
Attachment #560443 - Flags: approval-mozilla-beta?
Attachment #560443 - Flags: approval-mozilla-aurora?
There were 2 crashes on Android in the past week and ~45 on Windows. That is not a high enough volume (though startup crashes are underreported). Denying approval for beta.
Attachment #560443 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 560443 Fix
Review of attachment 560443:
r+ from me
Comment on attachment 560443 Fix
a=jst per today's driver meeting (this was reviewed, the flag just didn't get set, and given the nature of this change we're OK approving it before it has landed in mozilla-central).
Attachment #560443 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 560443 Fix
Making the r+ official.
Attachment #560443 - Flags: review?(siarhei.siamashka) → review+
Assignee: nobody → jimnchen+bmo
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED