Ensure errorsummary.json has failure annotations (crash, timeout, etc.)
Categories
(Testing :: XPCShell Harness, enhancement)
Tracking
(Not tracked)
People
(Reporter: jmaher, Assigned: jmaher)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file)
This will help us make better decisions. Currently we detect crashes, timeouts, and leaks, both at the test level and at the harness level. We can add metadata to errorsummary.json so that these failures are parsable.
Comment 1•3 years ago
So it does look like we should be including crash logs in the errorsummary:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/formatters/errorsummary.py#123
It's possible the xpcshell harness isn't logging crashes properly (using the crash action in mozlog).
Assignee
Comment 2•3 years ago
For a timeout within a test case (TEST-UNEXPECTED-TIMEOUT), we have:
{"test": "netwerk/test/unit/test_http3_early_hint_listener.js", "subtest": null, "group": "netwerk/test/unit/xpcshell.ini", "status": "TIMEOUT", "expected": "PASS", "message": "Test timed out", "stack": null, "known_intermittent": [], "action": "test_result", "line": 2294}
{"level": "ERROR", "message": "TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)", "action": "log", "line": 2425}
- We can detect this with status != expected && status == 'TIMEOUT'.
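A minimal sketch of that check, assuming the errorsummary file holds one JSON object per line, as in the lines quoted above. The function name `unexpected_timeouts` is hypothetical and is not part of mozlog or mozci:

```python
import json


def unexpected_timeouts(lines):
    """Return the tests whose result was an unexpected TIMEOUT."""
    timed_out = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        # Only per-test results carry a status/expected pair for a test.
        if entry.get("action") != "test_result":
            continue
        status = entry.get("status")
        if status == "TIMEOUT" and status != entry.get("expected"):
            timed_out.append(entry["test"])
    return timed_out


# Sample lines modelled on the two entries quoted above (message shortened).
timeout_sample = [
    json.dumps({
        "test": "netwerk/test/unit/test_http3_early_hint_listener.js",
        "subtest": None,
        "group": "netwerk/test/unit/xpcshell.ini",
        "status": "TIMEOUT",
        "expected": "PASS",
        "message": "Test timed out",
        "action": "test_result",
        "line": 2294,
    }),
    json.dumps({
        "level": "ERROR",
        "message": "TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run.",
        "action": "log",
        "line": 2425,
    }),
]
print(unexpected_timeouts(timeout_sample))
```

The ERROR-level log line is skipped on purpose: it has no status field, so it has to be classified separately from per-test results.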
For a crash on a specific test case, we have:
{"test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js", "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()", "stackwalk_stdout": "Operating system: Mac OS X\n 10.15.7 19H524\nCPU: amd64\n family 6 model 158 stepping 10\n 12 CPUs\n\nCrash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS\nCrash address: 0x0\nMac Crash Info:\n\nProcess uptime: 10 seconds\n\nThread 0 MainThread (crashed)\n 0 XUL!mozilla::dom::IDBTransaction::~IDBTransaction() [IDBTransaction.cpp:c06bbb0ddc24d3d1605e5f67c1b875aad60e26c5 : 135 + 0x29]\n rax = 0x000000011b0ed8d3
...
0.241.100.2\n0x7fff73544000 - 0x7fff7354cfff libsystem_platform.dylib 0.220.100.1\n0x7fff7354d000 - 0x7fff73557fff libsystem_pthread.dylib 0.416.100.3\n0x7fff73558000 - 0x7fff7355cfff libsystem_sandbox.dylib 0.1217.141.2\n0x7fff7355d000 - 0x7fff7355ffff libsystem_secinit.dylib 0.62.100.2\n0x7fff73560000 - 0x7fff73567fff libsystem_symptoms.dylib 0.1.0.0\n0x7fff73568000 - 0x7fff7357efff libsystem_trace.dylib 0.1147.120.1\n0x7fff73580000 - 0x7fff73585fff libunwind.dylib 0.35.4.0\n0x7fff73586000 - 0x7fff735bbfff libxpc.dylib 0.1738.140.2\n\nUnloaded modules:\n", "stackwalk_stderr": null, "action": "crash", "line": 1102}
- This has the field signature and action == "crash".
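A rough sketch of how a consumer could pick up these crash entries, again assuming one JSON object per line. The helper name `crash_signatures` and the `"<no test>"` placeholder for crashes without a test field are assumptions made for illustration:

```python
import json


def crash_signatures(lines):
    """Map each crashing test to the crash signatures recorded for it."""
    crashes = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if entry.get("action") != "crash":
            continue
        # Assumption: a crash outside any test may lack the "test" field.
        test = entry.get("test") or "<no test>"
        crashes.setdefault(test, []).append(entry.get("signature"))
    return crashes


# Sample modelled on the crash entry above, with the stack output left out.
crash_sample = [
    json.dumps({
        "test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js",
        "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()",
        "stackwalk_stderr": None,
        "action": "crash",
        "line": 1102,
    }),
]
print(crash_signatures(crash_sample))
```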
There are cases when tests don't run or failures occur outside of tests:
- Taskcluster error, timed out (live.log)
** Early in the test cycle, all tests started failing as TEST-TIMEOUT.
- Error while setting up (live.log)
- Harness timeout while running tests (errorsummary.json, live.log)
** This has an existing error, but overall the harness timeout is not reflected in errorsummary.json.
Assignee
Comment 3•3 years ago
Given that for specific test cases we already output discoverable items to errorsummary.json, I think there isn't anything to do there.
:ahal, am I missing some of the obvious stuff here?
Comment 4•3 years ago
Yeah, looks like it's working to me. Might want to verify with Marco, as he mentioned there was data missing. Possible that the harness misses some cases but catches others? It's also possible we might need to update things on the mozci side.
But either way, it looks like errorsummary.py is good.
Marco, did you have an example of a case where data appeared to be missing?
Comment 5•3 years ago
We discussed this on Matrix and yes, it looks like all the data we need is there.
The only problem we noticed, which is related to this but a separate problem, is that in the case of crashes we report a group result with status "OK".
For example, in https://firefoxci.taskcluster-artifacts.net/ZhptEmYYQDCMHTLHtliH4Q/0/public/test_info/xpcshell_errorsummary.log we have {"group": "dom/indexedDB/test/unit/xpcshell-child-process.ini:dom/indexedDB/test/unit/xpcshell-shared.ini", "status": "OK", "duration": 31772, "action": "group_result", "line": 3559} even though xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js is crashing.
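To illustrate the mismatch, here is a sketch that flags groups reported as "OK" while a crash entry exists for a test in the same directory. Matching by directory is a heuristic assumed for this example (the crash entry shown earlier carries no group field), and `ok_groups_with_crashes` is a hypothetical name:

```python
import json
import posixpath


def ok_groups_with_crashes(lines):
    """Return groups with status OK that share a directory with a crashed test."""
    entries = [json.loads(raw) for raw in lines if raw.strip()]

    crash_dirs = set()
    for entry in entries:
        if entry.get("action") == "crash" and entry.get("test"):
            # Drop any "manifest.ini:" prefix, then keep the test's directory.
            path = entry["test"].rsplit(":", 1)[-1]
            crash_dirs.add(posixpath.dirname(path))

    suspicious = []
    for entry in entries:
        if entry.get("action") == "group_result" and entry.get("status") == "OK":
            # A group name may join several manifests with ":".
            group_dirs = {posixpath.dirname(m) for m in entry["group"].split(":")}
            if group_dirs & crash_dirs:
                suspicious.append(entry["group"])
    return suspicious


group_sample = [
    json.dumps({
        "test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js",
        "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()",
        "action": "crash",
        "line": 1102,
    }),
    json.dumps({
        "group": "dom/indexedDB/test/unit/xpcshell-child-process.ini:"
                 "dom/indexedDB/test/unit/xpcshell-shared.ini",
        "status": "OK",
        "duration": 31772,
        "action": "group_result",
        "line": 3559,
    }),
]
print(ok_groups_with_crashes(group_sample))
```

Because several manifests can live in one directory, this check can over-report; it is only meant to surface candidates for a closer look.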
Assignee
Comment 7•3 years ago
One case we can handle: when we get a crash, we need to mark the group status as something other than OK in errorsummary.json.
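The idea can be sketched as follows. This is not the mozlog errorsummary formatter; `GroupStatusTracker` and its methods are hypothetical, and "ERROR" is an assumed choice for the non-OK status:

```python
class GroupStatusTracker:
    """Hypothetical tracker that downgrades a group's status when a test crashes."""

    def __init__(self):
        self.test_to_group = {}
        self.group_status = {}

    def test_start(self, test, group):
        # Remember which group each test belongs to; groups start out as OK.
        self.test_to_group[test] = group
        self.group_status.setdefault(group, "OK")

    def crash(self, test):
        group = self.test_to_group.get(test)
        if group is not None:
            # Assumed non-OK value; the source only requires status != OK.
            self.group_status[group] = "ERROR"

    def group_results(self):
        return [
            {"group": group, "status": status, "action": "group_result"}
            for group, status in self.group_status.items()
        ]


tracker = GroupStatusTracker()
tracker.test_start("dom/indexedDB/test/unit/test_clear.js",
                   "dom/indexedDB/test/unit/xpcshell-child-process.ini")
tracker.test_start("netwerk/test/unit/test_http3_early_hint_listener.js",
                   "netwerk/test/unit/xpcshell.ini")
tracker.crash("dom/indexedDB/test/unit/test_clear.js")
print(tracker.group_results())
```

A crash for a test that was never registered is ignored here; a real harness would need to decide how to report crashes that cannot be tied to a group.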
Comment 9•3 years ago
Backed out changeset e6ec6c4fe419 (bug 1758984) for causing python3 unit test failures in test_mochitest_integration
Backout link: https://hg.mozilla.org/integration/autoland/rev/aba142c47ba9e18a419b8dd0372eb8654e567e0f
Assignee
Comment 10•3 years ago
I had the fix in another commit locally; I moved it over and tested on try:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=6f7114b4095fa7582e7e3ee3546202f2ea8c16e4
All looks good now.
Comment 13•2 years ago
The leave-open keyword is there and there is no activity for 6 months.
:jmaher, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.
Assignee
Comment 15•1 year ago
I have fixed many edge cases recently. I would like to close this and open specific bugs for edge cases when we find them.