Closed Bug 1758984 Opened 3 years ago Closed 1 year ago

ensure errorsummary.json has failure annotations (crash, timeout, etc.)

Categories

(Testing :: XPCShell Harness, enhancement)

Default
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: jmaher)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Having failure annotations in errorsummary.json will help us make better decisions. Currently we detect crashes, timeouts, and leaks, both at the test and the harness level. We can add metadata to errorsummary.json so these are parsable.

So it does look like we should be including crash logs in the errorsummary:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/formatters/errorsummary.py#123

It's possible the xpcshell harness isn't logging crashes properly (using the crash action in mozlog).

for a timeout within a test case (TEST-UNEXPECTED-TIMEOUT), we have:

{"test": "netwerk/test/unit/test_http3_early_hint_listener.js", "subtest": null, "group": "netwerk/test/unit/xpcshell.ini", "status": "TIMEOUT", "expected": "PASS", "message": "Test timed out", "stack": null, "known_intermittent": [], "action": "test_result", "line": 2294}
{"level": "ERROR", "message": "TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)", "action": "log", "line": 2425}
  • we can detect status!=expected && status=='TIMEOUT'
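A minimal sketch of that check, assuming errorsummary.json is read as one JSON object per line. The function name `unexpected_timeouts` and the sample list are hypothetical, the log message in the sample is shortened, and `known_intermittent` is ignored here:

```python
import json

# Sample lines modeled on the entries quoted above (the log message is shortened).
SAMPLE_LINES = [
    '{"test": "netwerk/test/unit/test_http3_early_hint_listener.js", "subtest": null, '
    '"group": "netwerk/test/unit/xpcshell.ini", "status": "TIMEOUT", "expected": "PASS", '
    '"message": "Test timed out", "stack": null, "known_intermittent": [], '
    '"action": "test_result", "line": 2294}',
    '{"level": "ERROR", "message": "TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), '
    'so stopped run.", "action": "log", "line": 2425}',
]


def unexpected_timeouts(lines):
    """Return the names of tests whose result is an unexpected TIMEOUT."""
    found = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        entry = json.loads(raw)
        if entry.get("action") != "test_result":
            continue  # only test_result entries carry status/expected
        status = entry.get("status")
        if status == "TIMEOUT" and status != entry.get("expected"):
            found.append(entry["test"])
    return found


print(unexpected_timeouts(SAMPLE_LINES))
```

The SIGINT log line is skipped because its action is "log", so only the test_result entry is considered.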

for a crash on a specific testcase, we have:

{"test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js", "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()", "stackwalk_stdout": "Operating system: Mac OS X\n                  10.15.7 19H524\nCPU: amd64\n     family 6 model 158 stepping 10\n     12 CPUs\n\nCrash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS\nCrash address: 0x0\nMac Crash Info:\n\nProcess uptime: 10 seconds\n\nThread 0 MainThread (crashed)\n 0  XUL!mozilla::dom::IDBTransaction::~IDBTransaction() [IDBTransaction.cpp:c06bbb0ddc24d3d1605e5f67c1b875aad60e26c5 : 135 + 0x29]\n    rax = 0x000000011b0ed8d3
...
0.241.100.2\n0x7fff73544000 - 0x7fff7354cfff  libsystem_platform.dylib  0.220.100.1\n0x7fff7354d000 - 0x7fff73557fff  libsystem_pthread.dylib  0.416.100.3\n0x7fff73558000 - 0x7fff7355cfff  libsystem_sandbox.dylib  0.1217.141.2\n0x7fff7355d000 - 0x7fff7355ffff  libsystem_secinit.dylib  0.62.100.2\n0x7fff73560000 - 0x7fff73567fff  libsystem_symptoms.dylib  0.1.0.0\n0x7fff73568000 - 0x7fff7357efff  libsystem_trace.dylib  0.1147.120.1\n0x7fff73580000 - 0x7fff73585fff  libunwind.dylib  0.35.4.0\n0x7fff73586000 - 0x7fff735bbfff  libxpc.dylib  0.1738.140.2\n\nUnloaded modules:\n", "stackwalk_stderr": null, "action": "crash", "line": 1102}
  • this has the field signature and action=="crash"
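A rough sketch of picking out crash entries by those two fields, again assuming one JSON object per line. `crashes_by_test` is a hypothetical helper, and the sample is a shortened stand-in for the entry above with the stackwalk output left out:

```python
import json


def crashes_by_test(lines):
    """Map each crashing test to the list of crash signatures logged for it."""
    crashes = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        entry = json.loads(raw)
        if entry.get("action") == "crash":
            crashes.setdefault(entry.get("test"), []).append(entry.get("signature"))
    return crashes


# Shortened stand-in for the crash entry above; stackwalk fields are omitted.
SAMPLE_CRASH = json.dumps({
    "test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js",
    "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()",
    "action": "crash",
    "line": 1102,
})

print(crashes_by_test([SAMPLE_CRASH]))
```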

there are cases when tests don't run or failures occur outside of tests...

  • taskcluster error timed out (live.log)
    ** early in the test cycle, all tests started failing as TEST-TIMEOUT
  • error while setting up (live.log)
  • harness timeout while running tests (errorsummary.json, live.log)
    ** this has an existing error, but the overall harness timeout is not reflected in errorsummary.json.

given that for specific test cases we output discoverable items to errorsummary.json, I think there isn't anything to do there.

:ahal, am I missing some of the obvious stuff here?

Flags: needinfo?(ahal)

Yeah looks like it's working to me. Might want to verify with Marco as he mentioned there was data missing. Possible that the harness misses some cases but catches others? It's also possible we might need to update things on the mozci side.

But either way looks like errorsummary.py is good.

Marco did you have an example of a case where data appeared to be missing?

Flags: needinfo?(ahal) → needinfo?(mcastelluccio)

We discussed this on Matrix and yes, it looks like all the data we need is there.

The only problem we noticed, which is related to this but a separate problem, is that in the case of crashes we report a group result with status "OK".
For example in https://firefoxci.taskcluster-artifacts.net/ZhptEmYYQDCMHTLHtliH4Q/0/public/test_info/xpcshell_errorsummary.log we have {"group": "dom/indexedDB/test/unit/xpcshell-child-process.ini:dom/indexedDB/test/unit/xpcshell-shared.ini", "status": "OK", "duration": 31772, "action": "group_result", "line": 3559} even though xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js is crashing.
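One coarse way to spot this mismatch, sketched under the assumption that the errorsummary log is one JSON object per line: flag any log that contains a crash entry while every group_result is "OK". The helper name is hypothetical, and it does not try to map a crashing test to its group, since the crash entry quoted earlier shows no group field:

```python
import json


def crash_hidden_by_ok_groups(lines):
    """Return True if the log has a crash entry but every group_result is OK."""
    entries = [json.loads(raw) for raw in lines if raw.strip()]
    has_crash = any(e.get("action") == "crash" for e in entries)
    group_statuses = [
        e.get("status") for e in entries if e.get("action") == "group_result"
    ]
    return has_crash and bool(group_statuses) and all(
        status == "OK" for status in group_statuses
    )


# Shortened crash entry plus the group_result quoted above.
SAMPLE_LOG = [
    json.dumps({
        "test": "xpcshell-child-process.ini:dom/indexedDB/test/unit/test_clear.js",
        "signature": "@ mozilla::dom::IDBTransaction::~IDBTransaction()",
        "action": "crash",
        "line": 1102,
    }),
    json.dumps({
        "group": "dom/indexedDB/test/unit/xpcshell-child-process.ini:"
                 "dom/indexedDB/test/unit/xpcshell-shared.ini",
        "status": "OK",
        "duration": 31772,
        "action": "group_result",
        "line": 3559,
    }),
]

print(crash_hidden_by_ok_groups(SAMPLE_LOG))
```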

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mcastelluccio)
Resolution: --- → WORKSFORME
Assignee: nobody → jmaher

one case we can handle: when we get a crash, we need to mark the group status as something other than OK in errorsummary.json

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e6ec6c4fe419 add status to mozlog crash so group summary is not marked as OK. r=gbrown
Keywords: leave-open

Backed out changeset e6ec6c4fe419 (bug 1758984) for causing python3 unit test failures in test_mochitest_integration

Backout link: https://hg.mozilla.org/integration/autoland/rev/aba142c47ba9e18a419b8dd0372eb8654e567e0f

Flags: needinfo?(jmaher)

had the fix in another commit locally, moved it over and tested on try:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=6f7114b4095fa7582e7e3ee3546202f2ea8c16e4

all looks good now

Flags: needinfo?(jmaher)
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/188adcf352b9 add status to mozlog crash so group summary is not marked as OK. r=gbrown
Blocks: 1759836

The leave-open keyword is there and there is no activity for 6 months.
:jmaher, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jmaher)

more edge cases to solve

Flags: needinfo?(jmaher)
Blocks: 1614642

I have fixed many edge cases recently. I would like to close this and open specific bugs for edge cases when we find them

Status: REOPENED → RESOLVED
Closed: 3 years ago → 1 year ago
Resolution: --- → FIXED