Closed Bug 810471 Opened 12 years ago Closed 9 years ago

Crash on startup during various tests

Categories

(Firefox for Android Graveyard :: General, defect)

x86
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: gbrown, Assigned: gbrown)

References

(Depends on 1 open bug)

Details

Various intermittent test failure bugs all appear to have one thing in common: Fennec crashes on startup. I believe this includes reftests, mochitests, robocop, and talos. Logcat often shows: E/GeckoLibLoad( 1536): Load nss done shortly before the crash. Stack traces are often not available, or show little more that libc and system library calls. Consider: Bug 722166 - Android tests intermittently just don't start Bug 796914 - Intermittent "talosError: 'stack found after process termination (org.mozilla.fennec: terminated by testAgent.killProcess, plugin-container: terminated by testAgent.killProcess, crashreporter: terminated by testAgent.killProcess)'" Bug 747787 - Intermittent Tegra crash in libdvm.so during tests Bug 686245 - Intermittent Android "talosError: 'browser non-zero return code (1) [browser_output.txt]'" and possibly also: Bug 810405 - Browser crashes not reported as test failures Bug 690176 - Intermittent Android "talosError: 'initialization timed out'" These bugs have thousands of tbpl failure logs. Many of these bugs contain logs that are not startup crashes, but I think startup crashes are a significant cause of the failures in most of these bugs. The absence of good stacks is a significant barrier to resolution.
(In reply to Geoff Brown [:gbrown] from comment #0) > Logcat often shows: > > E/GeckoLibLoad( 1536): Load nss done > > shortly before the crash. E/GeckoConsole(15253): Could not read chrome manifest 'file:///data/data/org.mozilla.fennec/chrome.manifest'. is just as common, and occurs after "Load nss done".
(In reply to Geoff Brown [:gbrown] from comment #1) > E/GeckoConsole(15253): Could not read chrome manifest > 'file:///data/data/org.mozilla.fennec/chrome.manifest'. > > is just as common, and occurs after "Load nss done". Neither are something wrong happening.
(In reply to Mike Hommey [:glandium] from comment #2) > (In reply to Geoff Brown [:gbrown] from comment #1) > > E/GeckoConsole(15253): Could not read chrome manifest > > 'file:///data/data/org.mozilla.fennec/chrome.manifest'. > > > > is just as common, and occurs after "Load nss done". > > Neither are something wrong happening. We should make them INFOs rather than ERRORs and/or suppress them if that's the case.
(In reply to Ed Morley [:edmorley UTC+0] from comment #3) > (In reply to Mike Hommey [:glandium] from comment #2) > > (In reply to Geoff Brown [:gbrown] from comment #1) > > > E/GeckoConsole(15253): Could not read chrome manifest > > > 'file:///data/data/org.mozilla.fennec/chrome.manifest'. > > > > > > is just as common, and occurs after "Load nss done". > > > > Neither are something wrong happening. > > We should make them INFOs rather than ERRORs and/or suppress them if that's > the case. The "Could not read chrome manifest" one is from the component manager, which sends it to the error console. You'll see similar messages on desktop builds, in the error console. There is no severity distinction for the component manager messages. For the others, there's a tendency to use ANDROID_LOG_ERROR for calls to __android_log_print in APKOpen.cpp which i'm not sure where they come from. The latter could be filed as a separate bug. The former would need to add severity to the component manager logging. File another bug for that if you want.
(In reply to Mike Hommey [:glandium] from comment #2) > (In reply to Geoff Brown [:gbrown] from comment #1) > > E/GeckoConsole(15253): Could not read chrome manifest > > 'file:///data/data/org.mozilla.fennec/chrome.manifest'. > > > > is just as common, and occurs after "Load nss done". > > Neither are something wrong happening. I know they are normal -- I just wanted to note that these are often the most-recent log messages before the crash. Since I cannot reproduce the crash with any regularity, I have been adding more logging to try runs. Preliminary results suggest -- but only suggest -- that the crash is in loadGeckoLibsNative.
(In reply to Geoff Brown [:gbrown] from comment #5) > Preliminary results suggest -- but only suggest -- that the crash is in > loadGeckoLibsNative. Logging shows that many of the crashes occur while loadGeckoLibsNative is executing...but sometimes the crash occurs after loadGeckoLibsNative completes, and occasionally before it starts. I'm thinking: - maybe the crash is on a different thread - maybe there are multiple causes - maybe the logging is not reliable
Blocks: 816584
Blocks: 817583
On a tip from glandium on an unrelated matter, I thought I would try re-creating the crash with additional logging enabled with: DEFINES += -DMOZ_DEBUG_LINKER in mozglue/linker/Makefile.in Here are 2 such crashes with the extra logging: https://tbpl.mozilla.org/php/getParsedLog.php?id=17566044&tree=Try&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=17561794&tree=Try&full=1 Here's a sample: 12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk - GetFile libxul.so 12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk - Scan directory entries in search for libxul.so 12-03 13:05:48.236 E/GeckoLinker( 7522): Decompression buffer of size 37564622 in ashmem "libxul.so", mapped @0x5274d000 12-03 13:05:48.236 E/GeckoLinker( 7522): CustomElf::Load("/data/app/org.mozilla.fennec-1.apk!/libxul.so", 3) = ... 12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Warning: program header type #1879048193 not handled 12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x00000000 (filesz: 0x0006c850, memsz: 0x0006c850, offset: 0x00000000, flags: r-x) 12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x002880f4 (filesz: 0x010a1f14, memsz: 0x010a1f14, offset: 0x0006d0f4, flags: r-x) 12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x0132b008 (filesz: 0x001a2cb0, memsz: 0x0021315c, offset: 0x0110f008, flags: rw-) 12-03 13:05:48.236 E/GeckoLinker( 7522): PT_DYNAMIC @0x01474c40 (filesz: 0x00000150, memsz: 0x00000150, offset: 0x01258c40, flags: rw-) 12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Warning: program header type #1685382480 not handled 12-03 13:05:48.236 E/GeckoLinker( 7522): PT_GNU_STACK @0x00000000 (filesz: 0x00000000, memsz: 0x00000000, offset: 0x00000000, flags: rw-) 12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Loading segment @0x54b22000 r-x 12-03 13:05:48.256 E/GeckoLinker( 7522): cacheflush(0x5274d000, 0x527b9850) 12-03 13:05:48.256 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Loading segment @0x54daa000 r-x 12-03 13:05:48.316 E/GeckoLinker( 7522): Caught segmentation fault @0x270000 12-03 13:05:48.316 E/GeckoLinker( 7522): Redispatching to registered handler @0xb0003515 12-03 13:05:48.416 I/DEBUG ( 937): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 12-03 13:05:48.416 I/DEBUG ( 937): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys' 12-03 13:05:48.416 I/DEBUG ( 937): pid: 7522, tid: 7522 >>> org.mozilla.fennec <<< 12-03 13:05:48.416 I/DEBUG ( 937): signal 11 (SIGSEGV), fault addr 00270000 12-03 13:05:48.416 I/DEBUG ( 937): r0 00310b00 r1 0026fffc r2 000019e5 r3 00fb709a 12-03 13:05:48.416 I/DEBUG ( 937): r4 9a164904 r5 f7f865dd r6 2b36a00b r7 73f43a94 12-03 13:05:48.416 I/DEBUG ( 937): r8 566f90b6 r9 08e40b93 10 eda26da8 fp fb709aed 12-03 13:05:48.416 I/DEBUG ( 937): ip 9a164904 sp bef04cc0 lr 00000018 pc afd0f388 cpsr 20000010 12-03 13:05:48.416 I/DEBUG ( 937): d0 4481a00044564000 d1 0000009d000000b4 12-03 13:05:48.416 I/DEBUG ( 937): d2 0000000000000000 d3 0000000000000000 12-03 13:05:48.416 I/DEBUG ( 937): d4 0000000000000000 d5 3ff000003f800000 12-03 13:05:48.416 I/DEBUG ( 937): d6 4456400040800000 d7 431d000040800000 12-03 13:05:48.416 I/DEBUG ( 937): d8 0000040d00000359 d9 0000000000000004 12-03 13:05:48.416 I/DEBUG ( 937): d10 0000000000000000 d11 0000000000000000 12-03 13:05:48.416 I/DEBUG ( 937): d12 0000000000000000 d13 0000000000000000 12-03 13:05:48.416 I/DEBUG ( 937): d14 0000000000000000 d15 0000000000000000 12-03 13:05:48.416 I/DEBUG ( 937): scr 80000012 12-03 13:05:48.416 I/DEBUG ( 937): 12-03 13:05:48.466 I/DEBUG ( 937): #00 pc 0000f388 /system/lib/libc.so 12-03 13:05:48.466 I/DEBUG ( 937): #01 pc 0008fabc /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #02 pc 000402b0 /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #03 pc 000d8640 /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #04 pc 000dd2ce /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #05 pc 000d785c /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #06 pc 000d7aac /system/lib/libskia.so 12-03 13:05:48.466 I/DEBUG ( 937): #07 pc 00040d44 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #08 pc 0003d7a8 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #09 pc 00049b2c /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #10 pc 000423d4 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #11 pc 00042448 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #12 pc 000852cc /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #13 pc 0004b568 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #14 pc 0006d65c /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): 12-03 13:05:48.476 I/DEBUG ( 937): code around pc: 12-03 13:05:48.476 I/DEBUG ( 937): afd0f368 e1a0a82a e18aa80b e8a007f8 e1a0382b 12-03 13:05:48.476 I/DEBUG ( 937): afd0f378 2affffe8 ea000030 e491c004 e1a0400c 12-03 13:05:48.476 I/DEBUG ( 937): afd0f388 e8b10fe0 f5d1f040 e2522020 2491c004 12-03 13:05:48.476 I/DEBUG ( 937): afd0f398 e1833c04 e1a04424 e1844c05 e1a05425 12-03 13:05:48.476 I/DEBUG ( 937): afd0f3a8 e1855c06 e1a06426 e1866c07 e1a07427 12-03 13:05:48.476 I/DEBUG ( 937): 12-03 13:05:48.476 I/DEBUG ( 937): code around lr: 12-03 13:05:48.476 I/DEBUG ( 937): 12-03 13:05:48.476 I/DEBUG ( 937): stack: 12-03 13:05:48.476 I/DEBUG ( 937): bef04c80 00312568 12-03 13:05:48.476 I/DEBUG ( 937): bef04c84 00312800 12-03 13:05:48.476 I/DEBUG ( 937): bef04c88 00000001 12-03 13:05:48.476 I/DEBUG ( 937): bef04c8c 00002530 12-03 13:05:48.476 I/DEBUG ( 937): bef04c90 000000c5 12-03 13:05:48.476 I/DEBUG ( 937): bef04c94 002f50a4 12-03 13:05:48.476 I/DEBUG ( 937): bef04c98 000000eb 12-03 13:05:48.476 I/DEBUG ( 937): bef04c9c 000002d1 12-03 13:05:48.476 I/DEBUG ( 937): bef04ca0 af710f0c /system/lib/libz.so 12-03 13:05:48.476 I/DEBUG ( 937): bef04ca4 000001ff 12-03 13:05:48.476 I/DEBUG ( 937): bef04ca8 00000008 12-03 13:05:48.476 I/DEBUG ( 937): bef04cac afd4a9b4 12-03 13:05:48.476 I/DEBUG ( 937): bef04cb0 00000208 12-03 13:05:48.476 I/DEBUG ( 937): bef04cb4 002f2d90 12-03 13:05:48.476 I/DEBUG ( 937): bef04cb8 00000001 12-03 13:05:48.476 I/DEBUG ( 937): bef04cbc af712158 /system/lib/libz.so 12-03 13:05:48.476 I/DEBUG ( 937): #00 bef04cc0 00002000 12-03 13:05:48.476 I/DEBUG ( 937): bef04cc4 00002000 12-03 13:05:48.476 I/DEBUG ( 937): bef04cc8 00314a48 12-03 13:05:48.476 I/DEBUG ( 937): bef04ccc 002f2d90 12-03 13:05:48.476 I/DEBUG ( 937): bef04cd0 00000001 12-03 13:05:48.476 I/DEBUG ( 937): bef04cd4 0000005f 12-03 13:05:48.476 I/DEBUG ( 937): bef04cd8 00000000 12-03 13:05:48.476 I/DEBUG ( 937): bef04cdc 00310508 12-03 13:05:48.476 I/DEBUG ( 937): bef04ce0 00249c28 12-03 13:05:48.476 I/DEBUG ( 937): bef04ce4 ab18fac0 /system/lib/libskia.so 12-03 13:05:48.476 I/DEBUG ( 937): #01 bef04ce8 002f2d90 12-03 13:05:48.476 I/DEBUG ( 937): bef04cec 00002000 12-03 13:05:48.476 I/DEBUG ( 937): bef04cf0 00002000 12-03 13:05:48.476 I/DEBUG ( 937): bef04cf4 ab1402b4 /system/lib/libskia.so 12-03 13:05:48.916 E/GeckoLinker( 7522): cacheflush(0x527ba000, 0x5385c008) 12-03 13:05:49.106 I/DEBUG ( 937): debuggerd committing suicide to free the zombie! 12-03 13:05:49.106 I/DEBUG ( 7541): debuggerd: Feb 2 2011 10:46:35 12-03 13:05:49.116 I/BootReceiver( 1020): Copying /data/tombstones/tombstone_09 to DropBox (SYSTEM_TOMBSTONE) 12-03 13:05:49.116 I/ActivityManager( 1020): Process org.mozilla.fennec (pid 7522) has died. 12-03 13:05:49.126 D/Zygote ( 939): Process 7522 terminated by signal (11) Is this helpful? Where can we go from here?
Flags: needinfo?(mh+mozilla)
Blocks: 818107
Blocks: 818235
(In reply to Geoff Brown [:gbrown] from comment #5) > Preliminary results suggest -- but only suggest -- that the crash is in > loadGeckoLibsNative. Comment 7 suggests otherwise.
Flags: needinfo?(mh+mozilla)
Blocks: 749785
There is also bug 757405, which has a SIGSEGV blamed on libskia in reftests. In that bug, the crash happens later -- typically 5 to 10 seconds after libs are loaded.
Blocks: 823535
And in bug 823535, we see the same symptoms on a pandaboard.
Blocks: 781107
Blocks: 825643
Depends on: 824760
Depends on: 803158
Blocks: 829371
These unexplained crash-on-startup failures are now quite rare. Reviews of recent logs reported in bugs 796914, 722166, 686245, 816584, and 781107 show that the most frequent problems are: Bug 761987 - shutdown crashes in nsCertificate_Destroy Bug 829419 - crashes in nsDeleteDir::PostTimer Bug 830557 - java.lang.NullPointerException in IInputMethodManager There is a fairly high (temporary?) incidence of unexplained crashes on mozilla-beta: see https://bugzilla.mozilla.org/show_bug.cgi?id=686245#c1344. The remaining unexplained crash on startup logs that I can find from the last week are: https://bugzilla.mozilla.org/show_bug.cgi?id=796914#c623 (Jan 12) https://bugzilla.mozilla.org/show_bug.cgi?id=781107#c233 (Jan 10) https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c49 (mozilla-aurora, Jan 7) https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c50 (mozilla-aurora, Jan 7) https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c51 (mozilla-aurora, Jan 7) (There are additional examples on Jan 6 and earlier; eg https://bugzilla.mozilla.org/show_bug.cgi?id=686245#c1291).
Bug 833766 contains a crash report with symbols that would look similar to our unexplained crash reports, if it did not have symbols. In that case, the top frame is /data/data/org.mozilla.fennec_morbo/lib/libmozglue.so (Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash) ... worth investigating.
Blocks: 822296
Bug 837821 describes a possible cause for the remaining, unexplained startup crashes, and the test failures that initiated that bug look very similar to those we have been chasing in this bug.
Depends on: 837821
Depends on: 830557, 829419, 823452, 826385
Depends on: 838636
Assignee: nobody → gbrown
We still see some crashes on test startup, but they occur much less frequently than when this bug was opened. Reporting uncaught java exceptions was particularly helpful. Dependent bugs 803158 and 824760 are still good ideas that we should pursue when priorities allow.
Assignee: gbrown → nobody
We did good work here. There are far fewer startup crashes now and this bug has outlived its usefulness.
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.