Crashes in PGO 'run' jobs upload no minidumps, nor stacks in the logs
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(firefox72 fixed)
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: Gijs, Assigned: mshal)
References
Details
Attachments
(4 files)
Cf:
The log in the latter case indicates that the profile setup run quit with code 11, and some of the other log output suggests a crash of sorts, but no minidump artifacts are uploaded and no stack is printed in the log.
It'd be nice to make these things available so it'd be possible to know what was causing this issue. Not entirely sure this is the best component, please move as appropriate...
It's quite possible for these to be "special" crashes (ie not reproduced elsewhere), especially in the initial "profile setup" run, because the browser is only up a very short time, so if we shut down in the middle of a job that runs either on startup or shortly afterwards, that could cause crashes in ways that don't easily reproduce elsewhere...
Assignee | ||
Comment 1•5 years ago
|
||
This seems like it should be straightforward to add any crash artifacts to the task, but I'm having difficulty testing it. I tried using about:crashparent instead of about:blank in build/pgo/index.html, but Firefox never returns in the runner.wait() call from profileserver.py. Is there another way I can test this? And where would you expect the minidump artifacts to be?
Reporter | ||
Comment 2•5 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #1)
This seems like it should be straightforward to add any crash artifacts to the task, but I'm having difficulty testing it. I tried using about:crashparent instead of about:blank in build/pgo/index.html, but Firefox never returns in the runner.wait() call from profileserver.py. Is there another way I can test this? And where would you expect the minidump artifacts to be?
I don't know (to both of these), sorry. A bit surprised about:crashparent didn't work... maybe it's to do with how we load it? It looks like loading it from the commandline doesn't work... Hopefully Gabriele knows.
Comment 3•5 years ago
|
||
PGO runs have the crash reporter enabled but disable reporting crashes at runtime:
In practice this means that minidumps should be generated but they won't end up in the usual folder ($HOME/.mozilla/firefox/Crash Reports/pending/) but they will be left under the user profiled instead ($<userprofile>/minidumps/).
These lines suggest that a minidump was generated, so it's possible that the task doesn't find it because it's not looking for it in the right place.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Since we may or may not have minidump files, it is not practical to
specifically name each file. We can just upload all files in the
directory instead.
Assignee | ||
Comment 5•5 years ago
|
||
The minidumps are stored in the temporary profile directory used by
profileserver.py. Before the temporary directory goes away, we copy the
files to the artifacts directory. Note that simply moving the directory
doesn't necessarily work on Windows if this process hasn't shutdown
cleanly and something still has a file handle open.
Depends on D52393
Assignee | ||
Comment 6•5 years ago
|
||
Gijs, does this have all the data you're looking for? https://treeherder.mozilla.org/#/jobs?repo=try&revision=104a4dca74e0e1257f9a6a60476a78cc2d458058
The run tasks were forced to crash, and a .dmp and .extra files get uploaded as artifacts.
Reporter | ||
Comment 7•5 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #6)
Gijs, does this have all the data you're looking for? https://treeherder.mozilla.org/#/jobs?repo=try&revision=104a4dca74e0e1257f9a6a60476a78cc2d458058
The run tasks were forced to crash, and a .dmp and .extra files get uploaded as artifacts.
Uploading the files is definitely an improvement. However, when there are crashes during test runs, normally the stack also gets dumped into the test logs - would it be possible to do the same here?
Assignee | ||
Comment 8•5 years ago
|
||
We already store the stdout+stderr from the Firefox process in the logs (profile-run-1.log and profile-run-2.log) - those are also dumped to the live_backing.log if the process quits with rc != 0. However, I don't see the stack trace in stdout/stderr when I forcefully crash the process. Is there an environment variable needed to trigger that? Or is there something in the test runner that opens up the minidump and produces a stack trace?
Reporter | ||
Comment 9•5 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #8)
We already store the stdout+stderr from the Firefox process in the logs (profile-run-1.log and profile-run-2.log) - those are also dumped to the live_backing.log if the process quits with rc != 0. However, I don't see the stack trace in stdout/stderr when I forcefully crash the process. Is there an environment variable needed to trigger that? Or is there something in the test runner that opens up the minidump and produces a stack trace?
I don't know, I'm afraid. However, I found a random browser-chrome intermittent that crashes through my browser history, https://bugzilla.mozilla.org/show_bug.cgi?id=1582953, and looked at a recent log, https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271255289&repo=mozilla-central&lineNumber=11414 , and based on some of the strings there, it seems like the output comes from a python library we have in the tree, called mozcrash
- https://searchfox.org/mozilla-central/rev/2a10f4812f3f7c7645d253a4fe52f26bf58e20e8/testing/mozbase/mozcrash/mozcrash/mozcrash.py#98-116 . It seems based on its docs that we should be able to import it and tell it where to find the dump files, and it can take care of the rest, or something? https://searchfox.org/mozilla-central/rev/2a10f4812f3f7c7645d253a4fe52f26bf58e20e8/testing/mozbase/mozcrash/mozcrash/mozcrash.py#50-82 may be more helpful than the docs. It looks like :ahal, :gbrown, :whimboo and :jmaher may know more about this code.
Comment 10•5 years ago
|
||
Yes, mozcrash is the way to go! Test harnesses like mochitest use mozcrash to report stacks when they find minidumps:
Comment 11•5 years ago
|
||
As Geoff already mentioned we should clearly make use of mozcrash if it's not the case yet. It does not only print the stack in the logs but also copies the minidump and extra file to the upload folder. Which means doing it manually (one of the attached patches) is not necessary.
Assignee | ||
Comment 12•5 years ago
|
||
Thanks for the info gbrown/whimboo! I've hooked up mozcrash now. Gijs, does this look reasonable? https://treeherder.mozilla.org/#/jobs?repo=try&revision=7aa429b775383f6363a04d0b914e29dd5093e854
The top of the crash shows up in the treeherder preview, with the full trace in the log, and the minidump files are still uploaded as artifacts.
Reporter | ||
Comment 13•5 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #12)
Thanks for the info gbrown/whimboo! I've hooked up mozcrash now. Gijs, does this look reasonable? https://treeherder.mozilla.org/#/jobs?repo=try&revision=7aa429b775383f6363a04d0b914e29dd5093e854
The top of the crash shows up in the treeherder preview, with the full trace in the log, and the minidump files are still uploaded as artifacts.
Sort of, but the symbols don't seem to work? Is that expected?
Comment 14•5 years ago
|
||
I see minidump_stackwalk is being called with a symbols directory of /builds/worker/fetches/crashreporter-symbols. I also see:
[fetches 2019-11-15T23:30:58.318Z] Extracting /builds/worker/fetches/target.crashreporter-symbols-full.zip to /builds/worker/fetches
[fetches 2019-11-15T23:30:58.320Z] Archive: /builds/worker/fetches/target.crashreporter-symbols-full.zip
[fetches 2019-11-15T23:30:58.321Z] extracting: certutil/10F78F36AEBE340841159DE3986DBADB0/certutil.dbg.gz
[fetches 2019-11-15T23:30:58.322Z] inflating: certutil/10F78F36AEBE340841159DE3986DBADB0/certutil.sym
which looks like it might not be extracting to .../crashreporter-symbols??
Assignee | ||
Comment 15•5 years ago
|
||
Yeah I just noticed that as well. Locally I had unzipped it into a crashreporter-symbols directory. I'll do another push...
Assignee | ||
Comment 16•5 years ago
|
||
I think this looks better? https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8e60a357a6f923f36c8439d2851274490703146
eg:
[task 2019-11-16T01:19:59.773Z] 0 libmozsqlite3.so!openDatabase [sqlite3.c:f8e60a357a6f923f36c8439d2851274490703146 : 159235 + 0x0]
[task 2019-11-16T01:19:59.773Z] eip = 0xf6541c0c esp = 0xcd690aa0 ebp = 0xcd690b08 ebx = 0xf672f000
[task 2019-11-16T01:19:59.773Z] esi = 0xcc87700c edi = 0x00040006 eax = 0x00000037 ecx = 0x00000037
[task 2019-11-16T01:19:59.773Z] edx = 0xf756a870 efl = 0x00010286
[task 2019-11-16T01:19:59.773Z] Found by: given as instruction pointer in context
[task 2019-11-16T01:19:59.773Z] 1 libmozsqlite3.so!sqlite3_open_v2 [sqlite3.c:f8e60a357a6f923f36c8439d2851274490703146 : 159618 + 0xa]
[task 2019-11-16T01:19:59.773Z] eip = 0xf6543055 esp = 0xcd690b10 ebp = 0xcd690b28 ebx = 0xf672f000
[task 2019-11-16T01:19:59.773Z] esi = 0xf1152cab edi = 0xcc876c00
[task 2019-11-16T01:19:59.773Z] Found by: call frame info
Reporter | ||
Comment 17•5 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #16)
I think this looks better? https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8e60a357a6f923f36c8439d2851274490703146
eg:
[task 2019-11-16T01:19:59.773Z] 0 libmozsqlite3.so!openDatabase [sqlite3.c:f8e60a357a6f923f36c8439d2851274490703146 : 159235 + 0x0]
[task 2019-11-16T01:19:59.773Z] eip = 0xf6541c0c esp = 0xcd690aa0 ebp = 0xcd690b08 ebx = 0xf672f000
[task 2019-11-16T01:19:59.773Z] esi = 0xcc87700c edi = 0x00040006 eax = 0x00000037 ecx = 0x00000037
[task 2019-11-16T01:19:59.773Z] edx = 0xf756a870 efl = 0x00010286
[task 2019-11-16T01:19:59.773Z] Found by: given as instruction pointer in context
[task 2019-11-16T01:19:59.773Z] 1 libmozsqlite3.so!sqlite3_open_v2 [sqlite3.c:f8e60a357a6f923f36c8439d2851274490703146 : 159618 + 0xa]
[task 2019-11-16T01:19:59.773Z] eip = 0xf6543055 esp = 0xcd690b10 ebp = 0xcd690b28 ebx = 0xf672f000
[task 2019-11-16T01:19:59.773Z] esi = 0xf1152cab edi = 0xcc876c00
[task 2019-11-16T01:19:59.773Z] Found by: call frame info
Perfect, thank you!
Comment 18•5 years ago
|
||
Where are those extra empty lines are coming from between each of the lines in the mozcrash output?
Assignee | ||
Comment 19•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #18)
Where are those extra empty lines are coming from between each of the lines in the mozcrash output?
It looks like minidump_stackwalk produces '0d 0a' line-endings. The output from mozcrash combines that output with other info (containing '0a' line endings), and the print(output) changes '0a' to '0d 0a', so the part that contains the minidump_stackwalk output becomes '0d 0d 0a'. Does this also show up in crashdumps from Windows tests? Or is the encoding set somewhere to avoid that problem?
Quick testing shows that adding universal_newlines=True to the minidump_stackwalk subprocess call fixes the issue, but I'm not sure if that would have other side effects if there's a workaround somewhere else for test crashes.
Comment 20•5 years ago
|
||
Interesting. Here an example from a crash in mochitest media which doesn't show those lines:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276795702&repo=autoland&lineNumber=2049
Would you mind updating the patches here on the bug so that we can see under which conditions the try build was running? The current versions don't make use of mozcrash at all. Thanks.
Assignee | ||
Comment 21•5 years ago
|
||
minidump_stackwalk is used to process any crash reports generated by the
PGO run task.
Depends on D52393
Assignee | ||
Comment 22•5 years ago
|
||
To get an accurate backtrace if the run task crashes, we need the
symbols from the instrumented build.
Depends on D53716
Comment 23•5 years ago
|
||
I cannot actually see something right now. To no longer hold-up these patches from being landed please file a follow-up bug where this could be investigated.
Assignee | ||
Comment 24•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #23)
I cannot actually see something right now. To no longer hold-up these patches from being landed please file a follow-up bug where this could be investigated.
Good idea, I filed bug 1598393 as a followup.
Comment 25•5 years ago
|
||
Comment 26•5 years ago
|
||
Comment 27•5 years ago
|
||
Comment 28•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ed1d3a106e3e
https://hg.mozilla.org/mozilla-central/rev/307146c7af6a
https://hg.mozilla.org/mozilla-central/rev/ee2997888f54
https://hg.mozilla.org/mozilla-central/rev/3fc9e18fc545
https://hg.mozilla.org/mozilla-central/rev/f494e587ee3a
https://hg.mozilla.org/mozilla-central/rev/33ea535c231b
Assignee | ||
Comment 29•5 years ago
|
||
Ahh, apparently this logic turns off full crashreporter syms on autoland, which caused the bustage there while my try push was green: https://searchfox.org/mozilla-central/rev/3483fb259b4edbe4594cfcc3911db97d5441b67d/taskcluster/taskgraph/transforms/build.py#160
Are ok with using the non-full crashreporter symbols for these backtraces?
Thanks :dvarga for the quick fixes!
Comment 30•5 years ago
|
||
minidump_stackwalk doesn't care about full crash symbols. Which means the enable-full-crashsymbols attribute can also be removed (it was not)
Assignee | ||
Updated•5 years ago
|
Description
•