Open Bug 1528668 Opened 6 years ago Updated 2 years ago

Symbols aren't resolved during crash on try

Categories

(Firefox Build System :: General, defect)

defect

Tracking

(Not tracked)

People

(Reporter: jya, Unassigned)

References

Details

Example:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228647927&repo=try&lineNumber=1988

show:
[task 2019-02-15T13:20:20.612Z] 13:20:20 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ libxul.so + 0x2c56e65]
[task 2019-02-15T13:20:20.614Z] 13:20:20 INFO - Crash dump filename: /tmp/tmpncP6lQ.mozrunner/minidumps/3ca19c50-e663-9714-4391-7a7b93beea6d.dmp
[task 2019-02-15T13:20:20.615Z] 13:20:20 INFO - Operating system: Linux
[task 2019-02-15T13:20:20.616Z] 13:20:20 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-02-15T13:20:20.617Z] 13:20:20 INFO - CPU: amd64
[task 2019-02-15T13:20:20.618Z] 13:20:20 INFO - family 6 model 62 stepping 4
[task 2019-02-15T13:20:20.619Z] 13:20:20 INFO - 2 CPUs
[task 2019-02-15T13:20:20.620Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.622Z] 13:20:20 INFO - GPU: UNKNOWN
[task 2019-02-15T13:20:20.623Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.624Z] 13:20:20 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO - Crash address: 0x0
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO - Process uptime: not available
[task 2019-02-15T13:20:20.626Z] 13:20:20 INFO -
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - Thread 26 (crashed)
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - 0 libxul.so + 0x2c56e65
[task 2019-02-15T13:20:20.628Z] 13:20:20 INFO - rax = 0x00007f79339caf99 rdx = 0x0000000000000000
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rcx = 0x00005560ebfc6980 rbx = 0x00007f791c08c840
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rsi = 0x00007f793f12b770 rdi = 0x00007f793f12a540
[task 2019-02-15T13:20:20.629Z] 13:20:20 INFO - rbp = 0x00007f791d73a530 rsp = 0x00007f791d73a420
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r8 = 0x00007f793f12b770 r9 = 0x00007f791d73b700
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000000
[task 2019-02-15T13:20:20.630Z] 13:20:20 INFO - r12 = 0x00007f791c08c860 r13 = 0x00007f791d73a4d0
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - r14 = 0x00007f791ae4ae80 r15 = 0x00007f791c08c820
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - rip = 0x00007f792f537e65
[task 2019-02-15T13:20:20.632Z] 13:20:20 INFO - Found by: given as instruction pointer in context
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - 1 libxul.so + 0x2c55b3c
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - rbp = 0x00007f791d73a580 rsp = 0x00007f791d73a540
[task 2019-02-15T13:20:20.633Z] 13:20:20 INFO - rip = 0x00007f792f536b3c
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - Found by: previous frame's frame pointer
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - 2 libxul.so + 0x2c6bd02
[task 2019-02-15T13:20:20.634Z] 13:20:20 INFO - rbp = 0x00007f791d73a5f0 rsp = 0x00007f791d73a590

OR this windows:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228632791&repo=try&lineNumber=3150

Makes crash investigation rather hard.

Blocks: 1517210
Flags: needinfo?(gsvelto)

I see two problems here that have the same cause: first of all the assertion stack is not being symbolicated, this is done by fix_stack_using_bpsyms.py. Then the stack extracted from the minidump is not being symbolicated either, since this is done by a separate tool my guess is that we don't have symbols for this build or the symbols are corrupt in a way that makes both tools ignore them.

The Linux build [1] doesn't have symbols for libxul.so. That's odd because if I dump them manually on my machine everything works fine and the build log on taskcluster doesn't show any errors while dumping symbols. Could it be some kind of automation error? The Windows build [2] on the other hand has all the symbols it needs but somehow minidump_stackwalk can't compute the debug_file ID for xul.dll and so can't find them.

[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228647927&repo=try&lineNumber=1988
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228632791&repo=try&lineNumber=3150

The Windows issue is also rather odd. The CodeView entry for the xul.dll library is empty in the minidump so minidump_stackwalk can't generate the debug_file ID and thus find the symbols. Ted, any clues as to what might be causing this? I've never seen it happen before.

Flags: needinfo?(gsvelto) → needinfo?(ted)

(In reply to Gabriele Svelto [:gsvelto] from comment #3)

The Windows issue is also rather odd. The CodeView entry for the xul.dll library is empty in the minidump so minidump_stackwalk can't generate the debug_file ID and thus find the symbols. Ted, any clues as to what might be causing this? I've never seen it happen before.

We've seen this a number of times in near-OOM situations where the process can't mmap xul.dll so it can't get the debug id while writing the minidump. We have code that tries to reserve memory and free it before dump writing to solve this, but maybe it's not working (or xul.dll has grown much larger?):
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/toolkit/crashreporter/nsExceptionHandler.cpp#392

Flags: needinfo?(ted)

From the linux64-debug build log:

[task 2019-02-15T13:03:52.143Z] 13:03:52     INFO -  Beginning work for file: /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52     INFO -  Processing file: /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52     INFO -  /builds/worker/workspace/build/src/obj-firefox/dist/host/bin/dump_syms /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so
[task 2019-02-15T13:03:52.143Z] 13:03:52     INFO -  PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"subtests": [{"alertChangeType": "absolute", "name": "num_static_constructors", "value": 0, "alertThreshold": 3}], "name": "compiler_metrics"}]}
[task 2019-02-15T13:03:52.144Z] 13:03:52     INFO -  Finished processing /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so in 18.76s

Interesting that it doesn't show any errors. ~18 seconds seems a bit too quick for this to have actually done work. Perhaps there's an error in dump_syms that's causing things to fail but isn't resulting in it exiting with an error status? For a try push you could try removing the bit in symbolstore.py that pipes stderr to /dev/null to see if anything interesting shows up:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/toolkit/crashreporter/tools/symbolstore.py#534

Blocks: 1470369

I'm assuming this is also the explanation for why stack walking doesn't have symbols (cf. https://taskcluster-artifacts.net/TIqRZNdgRbGZt9SyqBv8KA/0/public/logs/live_backing.log from my trypush on Tuesday) ? Is there a workaround?

The Linux issue could also be bug 1510698 since bug 1520163 hasn't been fixed yet.

Component: Try → General
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.