Closed Bug 1640758 Opened 4 years ago Closed 4 years ago

test-verify tasks often have bad errorsummary.log and raw.log information

Categories

(Testing :: General, defect, P1)

defect

Tracking

(firefox79 fixed)

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: marco, Assigned: gbrown)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

E.g. in GHFio16nR1auPd8oTMS6jw browser/extensions/doh-rollout/test/browser/browser_dirtyEnable.js failed, but the errorsummary.log is:

{"action": "test_groups", "line": 5, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 141, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 147, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 236, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 239, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 330, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 333, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 426, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 429, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 519, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 522, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 616, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 622, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 760, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 763, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 777, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}

Similarly in TsTTjrsmRES41BGmliS7Sg, netwerk/test/unit/test_trr_case_sensitivity.js failed, but the errorsummary.log file is:

{"action": "test_groups", "line": 7, "groups": ["netwerk/test/unit/xpcshell.ini"]}
{"action": "test_groups", "line": 38, "groups": ["netwerk/test/unit/xpcshell.ini"]}

The raw log is also wrong, which means data about test-verify is also bogus in ActiveData.

Summary: test-verify tasks often have bad errorsummary.log information → test-verify tasks often have bad errorsummary.log and raw.log information

test-verify runs the same test multiple times, so repeated groups are not unexpected, but I don't understand why many successful mochitests show ~13 runs in the errorsummary.log, where such test-verify tasks would run the test 30 (10+5+10+5) times.

:gbrown I'm not so concerned about the repeated groups, but about the fact that failures might not be reported (in the two cases above, they weren't).

Any chance you could prioritize this? It'd be nice to be able to use test-verify data to train the scheduler!

Flags: needinfo?(gbrown)

Let's see what I can do....

Assignee: nobody → gbrown
Severity: -- → S3
Flags: needinfo?(gbrown)
Priority: -- → P1

(In reply to Geoff Brown [:gbrown] from comment #2)

test-verify runs the same test multiple times, so repeated groups are not unexpected, but I don't understand why many successful mochitests show ~13 runs in the errorsummary.log, where such test-verify tasks would run the test 30 (10+5+10+5) times.

This is actually expected. For a single mochitest TV run, there are 13 group entries in errorsummary.log, corresponding to 13 suite_start logging events: 1 suite_start for step 1 (repeat 10 times in 1 browser session), 5 for step 2 (5 browser sessions), 1 for step 3, 5 for step 4, and finally 1 extra suite_start for the mochitest summary logging (a harmless kludge).

Now, why would failures not be reflected in the errorsummary...

I don't see anything specific to test-verification which would make the errorsummary differ from a corresponding result from a mochitest/reftest/xpcshell-test.

If I force a test failure in a TV task, the errorsummary correctly records the failure(s):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=152b6eae8d650f6427a3aef543d7db47c527c15f

But, a TV task that verifies more than one test in the same suite will run the test harness multiple time, and each time will produce a new errorsummary (and raw) log -- and each time will use the same log file name(s): Log files from the first test will be over-written by the next.

Component: Task Configuration → General
Product: Firefox Build System → Testing
Blocks: test-verify

:marco -- The issue here is that a test-verify task sometimes runs a test harness multiple times, and each run over-writes the logs from the previous run (oops!). Would it be okay to have multiple errorsummary (and raw) logs for test-verify tasks? It would look something like this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d003f6a6238fe8da96e17f95e1349731223e0b53

Flags: needinfo?(mcastelluccio)

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(cdawson)
Flags: needinfo?(ahal)

I don't see problems with it (other than the irony of the fact that we've been trying to eliminate multiple errorsummaries by running one suite per task). But overwriting errorsummaries is objectively worse.. so no issues here. Maybe in the long term we could have suite specific TV tasks (not sure if this actually solves a problem or not, just seems like it would make things more intuitive).

One comment is that the number makes it look like it corresponds to a specific chunk. I'd vote we call it mochitest-plain_run1_errorsummary.log or something like that.

Flags: needinfo?(ahal)

(In reply to Marco Castelluccio [:marco] from comment #10)

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Also TV already emits more than one errorsummary, so not sure this would break anything more than it already does.

(In reply to Marco Castelluccio [:marco] from comment #10)

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Treeherder would handle this fine. It parses any number of logs that end with _errorsummary.log. Thanks for checking. :)

Flags: needinfo?(cdawson)

In all mozharness scripts running test-verify (or per-test coverage), use distinct
file names for errorsummary and raw logs. This usually means that the command needs
to be built inside the per-test loop.

(In reply to Geoff Brown [:gbrown] from comment #9)

:marco -- The issue here is that a test-verify task sometimes runs a test harness multiple times, and each run over-writes the logs from the previous run (oops!). Would it be okay to have multiple errorsummary (and raw) logs for test-verify tasks? It would look something like this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d003f6a6238fe8da96e17f95e1349731223e0b53

Just noticed though that there is a problem. In test-android-em-7.0-x86_64/debug-geckoview-test-verify-e10s (SPVn270YSmu0Hh8lBPL96Q), there is a xpcshell failure, but the xpcshell_errorsummary.log doesn't report it.

(In reply to Marco Castelluccio [:marco] from comment #15)
Yes, that try push did not have the android changes included in the final patch -- so expected and nothing to worry about.

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/db1e66399e7e Use multiple errorsummary and raw logs for test-verify; r=jmaher
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: