Symbols aren't resolved during crash on try
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
People
(Reporter: jya, Unassigned)
References
Details
Example:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228647927&repo=try&lineNumber=1988
show:
[task 2019-02-15T13:20:20.612Z] 13:20:20 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ libxul.so + 0x2c56e65]
[task 2019-02-15T13:20:20.614Z] 13:20:20 INFO - Crash dump filename: /tmp/tmpncP6lQ.mozrunner/minidumps/3ca19c50-e663-9714-4391-7a7b93beea6d.dmp
[task 2019-02-15T13:20:20.615Z] 13:20:20 INFO - Operating system: Linux
[task 2019-02-15T13:20:20.616Z] 13:20:20 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-02-15T13:20:20.617Z] 13:20:20 INFO - CPU: amd64
[task 2019-02-15T13:20:20.618Z] 13:20:20 INFO - family 6 model 62 stepping 4
[task 2019-02-15T13:20:20.619Z] 13:20:20 INFO - 2 CPUs
[task 2019-02-15T13:20:20.620Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.622Z] 13:20:20 INFO - GPU: UNKNOWN
[task 2019-02-15T13:20:20.623Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.624Z] 13:20:20 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO - Crash address: 0x0
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO - Process uptime: not available
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - Thread 26 (crashed)
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - 0 libxul.so + 0x2c56e65
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - rax = 0x00007f79339caf99 rdx = 0x0000000000000000
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rcx = 0x00005560ebfc6980 rbx = 0x00007f791c08c840
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rsi = 0x00007f793f12b770 rdi = 0x00007f793f12a540
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rbp = 0x00007f791d73a530 rsp = 0x00007f791d73a420
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r8 = 0x00007f793f12b770 r9 = 0x00007f791d73b700
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000000
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r12 = 0x00007f791c08c860 r13 = 0x00007f791d73a4d0
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - r14 = 0x00007f791ae4ae80 r15 = 0x00007f791c08c820
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - rip = 0x00007f792f537e65
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - Found by: given as instruction pointer in context
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - 1 libxul.so + 0x2c55b3c
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - rbp = 0x00007f791d73a580 rsp = 0x00007f791d73a540
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - rip = 0x00007f792f536b3c
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - Found by: previous frame's frame pointer
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - 2 libxul.so + 0x2c6bd02
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - rbp = 0x00007f791d73a5f0 rsp = 0x00007f791d73a590
OR this windows:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228632791&repo=try&lineNumber=3150
Makes crash investigation rather hard.
Updated•6 years ago
|
Comment 1•6 years ago
|
||
I see two problems here that have the same cause: first of all the assertion stack is not being symbolicated, this is done by fix_stack_using_bpsyms.py. Then the stack extracted from the minidump is not being symbolicated either, since this is done by a separate tool my guess is that we don't have symbols for this build or the symbols are corrupt in a way that makes both tools ignore them.
Comment 2•6 years ago
|
||
The Linux build [1] doesn't have symbols for libxul.so. That's odd because if I dump them manually on my machine everything works fine and the build log on taskcluster doesn't show any errors while dumping symbols. Could it be some kind of automation error? The Windows build [2] on the other hand has all the symbols it needs but somehow minidump_stackwalk can't compute the debug_file ID for xul.dll and so can't find them.
[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228647927&repo=try&lineNumber=1988
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228632791&repo=try&lineNumber=3150
Comment 3•6 years ago
|
||
The Windows issue is also rather odd. The CodeView entry for the xul.dll library is empty in the minidump so minidump_stackwalk can't generate the debug_file ID and thus find the symbols. Ted, any clues as to what might be causing this? I've never seen it happen before.
Comment 4•6 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #3)
The Windows issue is also rather odd. The CodeView entry for the xul.dll library is empty in the minidump so minidump_stackwalk can't generate the debug_file ID and thus find the symbols. Ted, any clues as to what might be causing this? I've never seen it happen before.
We've seen this a number of times in near-OOM situations where the process can't mmap xul.dll so it can't get the debug id while writing the minidump. We have code that tries to reserve memory and free it before dump writing to solve this, but maybe it's not working (or xul.dll has grown much larger?):
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/toolkit/crashreporter/nsExceptionHandler.cpp#392
Comment 5•6 years ago
|
||
From the linux64-debug build log:
[task 2019-02-15T13:03:52.143Z] 13:03:52 INFO - Beginning work for file: /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52 INFO - Processing file: /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/host/bin/dump_syms /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52 INFO - PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"subtests": [{"alertChangeType": "absolute", "name": "num_static_constructors", "value": 0, "alertThreshold": 3}], "name": "compiler_metrics"}]}
[task 2019-02-15T13:03:52.144Z] 13:03:52 INFO - Finished processing /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so in 18.76s
Interesting that it doesn't show any errors. ~18 seconds seems a bit too quick for this to have actually done work. Perhaps there's an error in dump_syms that's causing things to fail but isn't resulting in it exiting with an error status? For a try push you could try removing the bit in symbolstore.py that pipes stderr to /dev/null to see if anything interesting shows up:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/toolkit/crashreporter/tools/symbolstore.py#534
Comment 6•6 years ago
|
||
I'm assuming this is also the explanation for why stack walking doesn't have symbols (cf. https://taskcluster-artifacts.net/TIqRZNdgRbGZt9SyqBv8KA/0/public/logs/live_backing.log from my trypush on Tuesday) ? Is there a workaround?
Comment 7•6 years ago
|
||
The Linux issue could also be bug 1510698 since bug 1520163 hasn't been fixed yet.
Updated•6 years ago
|
Updated•2 years ago
|
Description
•