Closed
Bug 810471
Opened 12 years ago
Closed 9 years ago
Crash on startup during various tests
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: gbrown, Assigned: gbrown)
References
(Depends on 1 open bug)
Details
Various intermittent test failure bugs all appear to have one thing in common: Fennec crashes on startup. I believe this includes reftests, mochitests, robocop, and talos.
Logcat often shows:
E/GeckoLibLoad( 1536): Load nss done
shortly before the crash.
Stack traces are often not available, or show little more that libc and system library calls.
Consider:
Bug 722166 - Android tests intermittently just don't start
Bug 796914 - Intermittent "talosError: 'stack found after process termination (org.mozilla.fennec: terminated by testAgent.killProcess, plugin-container: terminated by testAgent.killProcess, crashreporter: terminated by testAgent.killProcess)'"
Bug 747787 - Intermittent Tegra crash in libdvm.so during tests
Bug 686245 - Intermittent Android "talosError: 'browser non-zero return code (1) [browser_output.txt]'"
and possibly also:
Bug 810405 - Browser crashes not reported as test failures
Bug 690176 - Intermittent Android "talosError: 'initialization timed out'"
These bugs have thousands of tbpl failure logs. Many of these bugs contain logs that are not startup crashes, but I think startup crashes are a significant cause of the failures in most of these bugs.
The absence of good stacks is a significant barrier to resolution.
Assignee | ||
Comment 1•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #0)
> Logcat often shows:
>
> E/GeckoLibLoad( 1536): Load nss done
>
> shortly before the crash.
E/GeckoConsole(15253): Could not read chrome manifest 'file:///data/data/org.mozilla.fennec/chrome.manifest'.
is just as common, and occurs after "Load nss done".
Comment 2•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #1)
> E/GeckoConsole(15253): Could not read chrome manifest
> 'file:///data/data/org.mozilla.fennec/chrome.manifest'.
>
> is just as common, and occurs after "Load nss done".
Neither are something wrong happening.
Comment 3•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #2)
> (In reply to Geoff Brown [:gbrown] from comment #1)
> > E/GeckoConsole(15253): Could not read chrome manifest
> > 'file:///data/data/org.mozilla.fennec/chrome.manifest'.
> >
> > is just as common, and occurs after "Load nss done".
>
> Neither are something wrong happening.
We should make them INFOs rather than ERRORs and/or suppress them if that's the case.
Comment 4•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> (In reply to Mike Hommey [:glandium] from comment #2)
> > (In reply to Geoff Brown [:gbrown] from comment #1)
> > > E/GeckoConsole(15253): Could not read chrome manifest
> > > 'file:///data/data/org.mozilla.fennec/chrome.manifest'.
> > >
> > > is just as common, and occurs after "Load nss done".
> >
> > Neither are something wrong happening.
>
> We should make them INFOs rather than ERRORs and/or suppress them if that's
> the case.
The "Could not read chrome manifest" one is from the component manager, which sends it to the error console. You'll see similar messages on desktop builds, in the error console. There is no severity distinction for the component manager messages.
For the others, there's a tendency to use ANDROID_LOG_ERROR for calls to __android_log_print in APKOpen.cpp which i'm not sure where they come from.
The latter could be filed as a separate bug. The former would need to add severity to the component manager logging. File another bug for that if you want.
Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #2)
> (In reply to Geoff Brown [:gbrown] from comment #1)
> > E/GeckoConsole(15253): Could not read chrome manifest
> > 'file:///data/data/org.mozilla.fennec/chrome.manifest'.
> >
> > is just as common, and occurs after "Load nss done".
>
> Neither are something wrong happening.
I know they are normal -- I just wanted to note that these are often the most-recent log messages before the crash. Since I cannot reproduce the crash with any regularity, I have been adding more logging to try runs. Preliminary results suggest -- but only suggest -- that the crash is in loadGeckoLibsNative.
Assignee | ||
Updated•12 years ago
|
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #5)
> Preliminary results suggest -- but only suggest -- that the crash is in
> loadGeckoLibsNative.
Logging shows that many of the crashes occur while loadGeckoLibsNative is executing...but sometimes the crash occurs after loadGeckoLibsNative completes, and occasionally before it starts. I'm thinking:
- maybe the crash is on a different thread
- maybe there are multiple causes
- maybe the logging is not reliable
Assignee | ||
Comment 7•12 years ago
|
||
On a tip from glandium on an unrelated matter, I thought I would try re-creating the crash with additional logging enabled with:
DEFINES += -DMOZ_DEBUG_LINKER
in mozglue/linker/Makefile.in
Here are 2 such crashes with the extra logging:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17566044&tree=Try&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=17561794&tree=Try&full=1
Here's a sample:
12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk - GetFile libxul.so
12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk - Scan directory entries in search for libxul.so
12-03 13:05:48.236 E/GeckoLinker( 7522): Decompression buffer of size 37564622 in ashmem "libxul.so", mapped @0x5274d000
12-03 13:05:48.236 E/GeckoLinker( 7522): CustomElf::Load("/data/app/org.mozilla.fennec-1.apk!/libxul.so", 3) = ...
12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Warning: program header type #1879048193 not handled
12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x00000000 (filesz: 0x0006c850, memsz: 0x0006c850, offset: 0x00000000, flags: r-x)
12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x002880f4 (filesz: 0x010a1f14, memsz: 0x010a1f14, offset: 0x0006d0f4, flags: r-x)
12-03 13:05:48.236 E/GeckoLinker( 7522): PT_LOAD @0x0132b008 (filesz: 0x001a2cb0, memsz: 0x0021315c, offset: 0x0110f008, flags: rw-)
12-03 13:05:48.236 E/GeckoLinker( 7522): PT_DYNAMIC @0x01474c40 (filesz: 0x00000150, memsz: 0x00000150, offset: 0x01258c40, flags: rw-)
12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Warning: program header type #1685382480 not handled
12-03 13:05:48.236 E/GeckoLinker( 7522): PT_GNU_STACK @0x00000000 (filesz: 0x00000000, memsz: 0x00000000, offset: 0x00000000, flags: rw-)
12-03 13:05:48.236 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Loading segment @0x54b22000 r-x
12-03 13:05:48.256 E/GeckoLinker( 7522): cacheflush(0x5274d000, 0x527b9850)
12-03 13:05:48.256 E/GeckoLinker( 7522): /data/app/org.mozilla.fennec-1.apk!/libxul.so: Loading segment @0x54daa000 r-x
12-03 13:05:48.316 E/GeckoLinker( 7522): Caught segmentation fault @0x270000
12-03 13:05:48.316 E/GeckoLinker( 7522): Redispatching to registered handler @0xb0003515
12-03 13:05:48.416 I/DEBUG ( 937): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-03 13:05:48.416 I/DEBUG ( 937): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys'
12-03 13:05:48.416 I/DEBUG ( 937): pid: 7522, tid: 7522 >>> org.mozilla.fennec <<<
12-03 13:05:48.416 I/DEBUG ( 937): signal 11 (SIGSEGV), fault addr 00270000
12-03 13:05:48.416 I/DEBUG ( 937): r0 00310b00 r1 0026fffc r2 000019e5 r3 00fb709a
12-03 13:05:48.416 I/DEBUG ( 937): r4 9a164904 r5 f7f865dd r6 2b36a00b r7 73f43a94
12-03 13:05:48.416 I/DEBUG ( 937): r8 566f90b6 r9 08e40b93 10 eda26da8 fp fb709aed
12-03 13:05:48.416 I/DEBUG ( 937): ip 9a164904 sp bef04cc0 lr 00000018 pc afd0f388 cpsr 20000010
12-03 13:05:48.416 I/DEBUG ( 937): d0 4481a00044564000 d1 0000009d000000b4
12-03 13:05:48.416 I/DEBUG ( 937): d2 0000000000000000 d3 0000000000000000
12-03 13:05:48.416 I/DEBUG ( 937): d4 0000000000000000 d5 3ff000003f800000
12-03 13:05:48.416 I/DEBUG ( 937): d6 4456400040800000 d7 431d000040800000
12-03 13:05:48.416 I/DEBUG ( 937): d8 0000040d00000359 d9 0000000000000004
12-03 13:05:48.416 I/DEBUG ( 937): d10 0000000000000000 d11 0000000000000000
12-03 13:05:48.416 I/DEBUG ( 937): d12 0000000000000000 d13 0000000000000000
12-03 13:05:48.416 I/DEBUG ( 937): d14 0000000000000000 d15 0000000000000000
12-03 13:05:48.416 I/DEBUG ( 937): scr 80000012
12-03 13:05:48.416 I/DEBUG ( 937):
12-03 13:05:48.466 I/DEBUG ( 937): #00 pc 0000f388 /system/lib/libc.so
12-03 13:05:48.466 I/DEBUG ( 937): #01 pc 0008fabc /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #02 pc 000402b0 /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #03 pc 000d8640 /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #04 pc 000dd2ce /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #05 pc 000d785c /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #06 pc 000d7aac /system/lib/libskia.so
12-03 13:05:48.466 I/DEBUG ( 937): #07 pc 00040d44 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #08 pc 0003d7a8 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #09 pc 00049b2c /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #10 pc 000423d4 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #11 pc 00042448 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #12 pc 000852cc /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #13 pc 0004b568 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #14 pc 0006d65c /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937):
12-03 13:05:48.476 I/DEBUG ( 937): code around pc:
12-03 13:05:48.476 I/DEBUG ( 937): afd0f368 e1a0a82a e18aa80b e8a007f8 e1a0382b
12-03 13:05:48.476 I/DEBUG ( 937): afd0f378 2affffe8 ea000030 e491c004 e1a0400c
12-03 13:05:48.476 I/DEBUG ( 937): afd0f388 e8b10fe0 f5d1f040 e2522020 2491c004
12-03 13:05:48.476 I/DEBUG ( 937): afd0f398 e1833c04 e1a04424 e1844c05 e1a05425
12-03 13:05:48.476 I/DEBUG ( 937): afd0f3a8 e1855c06 e1a06426 e1866c07 e1a07427
12-03 13:05:48.476 I/DEBUG ( 937):
12-03 13:05:48.476 I/DEBUG ( 937): code around lr:
12-03 13:05:48.476 I/DEBUG ( 937):
12-03 13:05:48.476 I/DEBUG ( 937): stack:
12-03 13:05:48.476 I/DEBUG ( 937): bef04c80 00312568
12-03 13:05:48.476 I/DEBUG ( 937): bef04c84 00312800
12-03 13:05:48.476 I/DEBUG ( 937): bef04c88 00000001
12-03 13:05:48.476 I/DEBUG ( 937): bef04c8c 00002530
12-03 13:05:48.476 I/DEBUG ( 937): bef04c90 000000c5
12-03 13:05:48.476 I/DEBUG ( 937): bef04c94 002f50a4
12-03 13:05:48.476 I/DEBUG ( 937): bef04c98 000000eb
12-03 13:05:48.476 I/DEBUG ( 937): bef04c9c 000002d1
12-03 13:05:48.476 I/DEBUG ( 937): bef04ca0 af710f0c /system/lib/libz.so
12-03 13:05:48.476 I/DEBUG ( 937): bef04ca4 000001ff
12-03 13:05:48.476 I/DEBUG ( 937): bef04ca8 00000008
12-03 13:05:48.476 I/DEBUG ( 937): bef04cac afd4a9b4
12-03 13:05:48.476 I/DEBUG ( 937): bef04cb0 00000208
12-03 13:05:48.476 I/DEBUG ( 937): bef04cb4 002f2d90
12-03 13:05:48.476 I/DEBUG ( 937): bef04cb8 00000001
12-03 13:05:48.476 I/DEBUG ( 937): bef04cbc af712158 /system/lib/libz.so
12-03 13:05:48.476 I/DEBUG ( 937): #00 bef04cc0 00002000
12-03 13:05:48.476 I/DEBUG ( 937): bef04cc4 00002000
12-03 13:05:48.476 I/DEBUG ( 937): bef04cc8 00314a48
12-03 13:05:48.476 I/DEBUG ( 937): bef04ccc 002f2d90
12-03 13:05:48.476 I/DEBUG ( 937): bef04cd0 00000001
12-03 13:05:48.476 I/DEBUG ( 937): bef04cd4 0000005f
12-03 13:05:48.476 I/DEBUG ( 937): bef04cd8 00000000
12-03 13:05:48.476 I/DEBUG ( 937): bef04cdc 00310508
12-03 13:05:48.476 I/DEBUG ( 937): bef04ce0 00249c28
12-03 13:05:48.476 I/DEBUG ( 937): bef04ce4 ab18fac0 /system/lib/libskia.so
12-03 13:05:48.476 I/DEBUG ( 937): #01 bef04ce8 002f2d90
12-03 13:05:48.476 I/DEBUG ( 937): bef04cec 00002000
12-03 13:05:48.476 I/DEBUG ( 937): bef04cf0 00002000
12-03 13:05:48.476 I/DEBUG ( 937): bef04cf4 ab1402b4 /system/lib/libskia.so
12-03 13:05:48.916 E/GeckoLinker( 7522): cacheflush(0x527ba000, 0x5385c008)
12-03 13:05:49.106 I/DEBUG ( 937): debuggerd committing suicide to free the zombie!
12-03 13:05:49.106 I/DEBUG ( 7541): debuggerd: Feb 2 2011 10:46:35
12-03 13:05:49.116 I/BootReceiver( 1020): Copying /data/tombstones/tombstone_09 to DropBox (SYSTEM_TOMBSTONE)
12-03 13:05:49.116 I/ActivityManager( 1020): Process org.mozilla.fennec (pid 7522) has died.
12-03 13:05:49.126 D/Zygote ( 939): Process 7522 terminated by signal (11)
Is this helpful? Where can we go from here?
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 8•12 years ago
|
||
This log looks a little different:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17572488&tree=Try&full=1
Comment 10•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #5)
> Preliminary results suggest -- but only suggest -- that the crash is in
> loadGeckoLibsNative.
Comment 7 suggests otherwise.
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 11•12 years ago
|
||
There is also bug 757405, which has a SIGSEGV blamed on libskia in reftests. In that bug, the crash happens later -- typically 5 to 10 seconds after libs are loaded.
Assignee | ||
Comment 12•12 years ago
|
||
And in bug 823535, we see the same symptoms on a pandaboard.
Assignee | ||
Comment 13•12 years ago
|
||
These unexplained crash-on-startup failures are now quite rare. Reviews of recent logs reported in bugs 796914, 722166, 686245, 816584, and 781107 show that the most frequent problems are:
Bug 761987 - shutdown crashes in nsCertificate_Destroy
Bug 829419 - crashes in nsDeleteDir::PostTimer
Bug 830557 - java.lang.NullPointerException in IInputMethodManager
There is a fairly high (temporary?) incidence of unexplained crashes on mozilla-beta: see https://bugzilla.mozilla.org/show_bug.cgi?id=686245#c1344.
The remaining unexplained crash on startup logs that I can find from the last week are:
https://bugzilla.mozilla.org/show_bug.cgi?id=796914#c623 (Jan 12)
https://bugzilla.mozilla.org/show_bug.cgi?id=781107#c233 (Jan 10)
https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c49 (mozilla-aurora, Jan 7)
https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c50 (mozilla-aurora, Jan 7)
https://bugzilla.mozilla.org/show_bug.cgi?id=816584#c51 (mozilla-aurora, Jan 7)
(There are additional examples on Jan 6 and earlier; eg https://bugzilla.mozilla.org/show_bug.cgi?id=686245#c1291).
Assignee | ||
Comment 14•12 years ago
|
||
Bug 833766 contains a crash report with symbols that would look similar to our unexplained crash reports, if it did not have symbols. In that case, the top frame is /data/data/org.mozilla.fennec_morbo/lib/libmozglue.so (Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash) ... worth investigating.
Assignee | ||
Comment 15•12 years ago
|
||
Bug 837821 describes a possible cause for the remaining, unexplained startup crashes, and the test failures that initiated that bug look very similar to those we have been chasing in this bug.
Depends on: 837821
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → gbrown
Assignee | ||
Comment 16•12 years ago
|
||
We still see some crashes on test startup, but they occur much less frequently than when this bug was opened. Reporting uncaught java exceptions was particularly helpful. Dependent bugs 803158 and 824760 are still good ideas that we should pursue when priorities allow.
Assignee: gbrown → nobody
Assignee | ||
Comment 17•9 years ago
|
||
We did good work here. There are far fewer startup crashes now and this bug has outlived its usefulness.
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•