1640758 - test-verify tasks often have bad errorsummary.log and raw.log information

Reporter

Description

•

5 years ago

E.g. in GHFio16nR1auPd8oTMS6jw browser/extensions/doh-rollout/test/browser/browser_dirtyEnable.js failed, but the errorsummary.log is:

{"action": "test_groups", "line": 5, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 141, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 147, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 236, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 239, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 330, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 333, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 426, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 429, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 519, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 522, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 616, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 622, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 760, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}
{"action": "test_groups", "line": 763, "groups": ["browser/extensions/doh-rollout/test/browser/browser.ini"]}
{"status": "OK", "action": "group_result", "line": 777, "group": "browser/extensions/doh-rollout/test/browser/browser.ini"}

Similarly in TsTTjrsmRES41BGmliS7Sg, netwerk/test/unit/test_trr_case_sensitivity.js failed, but the errorsummary.log file is:

{"action": "test_groups", "line": 7, "groups": ["netwerk/test/unit/xpcshell.ini"]}
{"action": "test_groups", "line": 38, "groups": ["netwerk/test/unit/xpcshell.ini"]}

Marco Castelluccio [:marco]

Reporter

Comment 1

•

5 years ago

The raw log is also wrong, which means data about test-verify is also bogus in ActiveData.

Summary: test-verify tasks often have bad errorsummary.log information → test-verify tasks often have bad errorsummary.log and raw.log information

Geoff Brown [:gbrown]

Assignee

Comment 2

•

5 years ago

test-verify runs the same test multiple times, so repeated groups are not unexpected, but I don't understand why many successful mochitests show ~13 runs in the errorsummary.log, where such test-verify tasks would run the test 30 (10+5+10+5) times.

Marco Castelluccio [:marco]

Reporter

Comment 3

•

5 years ago

:gbrown I'm not so concerned about the repeated groups, but about the fact that failures might not be reported (in the two cases above, they weren't).

Marco Castelluccio [:marco]

Reporter

Comment 4

•

5 years ago

Any chance you could prioritize this? It'd be nice to be able to use test-verify data to train the scheduler!

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 5

•

5 years ago

Let's see what I can do....

Assignee: nobody → gbrown

Severity: -- → S3

Flags: needinfo?(gbrown)

Priority: -- → P1

Geoff Brown [:gbrown]

Assignee

Comment 6

•

5 years ago

(In reply to Geoff Brown [:gbrown] from comment #2)

test-verify runs the same test multiple times, so repeated groups are not unexpected, but I don't understand why many successful mochitests show ~13 runs in the errorsummary.log, where such test-verify tasks would run the test 30 (10+5+10+5) times.

This is actually expected. For a single mochitest TV run, there are 13 group entries in errorsummary.log, corresponding to 13 suite_start logging events: 1 suite_start for step 1 (repeat 10 times in 1 browser session), 5 for step 2 (5 browser sessions), 1 for step 3, 5 for step 4, and finally 1 extra suite_start for the mochitest summary logging (a harmless kludge).

Now, why would failures not be reflected in the errorsummary...

Geoff Brown [:gbrown]

Assignee

Comment 7

•

5 years ago

I don't see anything specific to test-verification which would make the errorsummary differ from a corresponding result from a mochitest/reftest/xpcshell-test.

If I force a test failure in a TV task, the errorsummary correctly records the failure(s):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=152b6eae8d650f6427a3aef543d7db47c527c15f

Geoff Brown [:gbrown]

Assignee

Comment 8

•

5 years ago

But, a TV task that verifies more than one test in the same suite will run the test harness multiple time, and each time will produce a new errorsummary (and raw) log -- and each time will use the same log file name(s): Log files from the first test will be over-written by the next.

Geoff Brown [:gbrown]

Assignee

Updated

•

5 years ago

Component: Task Configuration → General

Product: Firefox Build System → Testing

Geoff Brown [:gbrown]

Assignee

Updated

•

5 years ago

Blocks: test-verify

Geoff Brown [:gbrown]

Assignee

Comment 9

•

5 years ago

:marco -- The issue here is that a test-verify task sometimes runs a test harness multiple times, and each run over-writes the logs from the previous run (oops!). Would it be okay to have multiple errorsummary (and raw) logs for test-verify tasks? It would look something like this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d003f6a6238fe8da96e17f95e1349731223e0b53

Flags: needinfo?(mcastelluccio)

Marco Castelluccio [:marco]

Reporter

Comment 10

•

5 years ago

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(cdawson)

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Comment 11

•

5 years ago

I don't see problems with it (other than the irony of the fact that we've been trying to eliminate multiple errorsummaries by running one suite per task). But overwriting errorsummaries is objectively worse.. so no issues here. Maybe in the long term we could have suite specific TV tasks (not sure if this actually solves a problem or not, just seems like it would make things more intuitive).

One comment is that the number makes it look like it corresponds to a specific chunk. I'd vote we call it mochitest-plain_run1_errorsummary.log or something like that.

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Comment 12

•

5 years ago

(In reply to Marco Castelluccio [:marco] from comment #10)

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Also TV already emits more than one errorsummary, so not sure this would break anything more than it already does.

Cameron Dawson [:camd]

Comment 13

•

5 years ago

(In reply to Marco Castelluccio [:marco] from comment #10)

It would work for me, I don't know if other tools might be depending on the existence of a single errorsummary.log file though (maybe Treeherder?).

Treeherder would handle this fine. It parses any number of logs that end with _errorsummary.log. Thanks for checking. :)

Flags: needinfo?(cdawson)

Geoff Brown [:gbrown]

Assignee

Comment 14

•

5 years ago

Attached file Bug 1640758 - Use multiple errorsummary and raw logs for test-verify; r= (deleted) — Details

In all mozharness scripts running test-verify (or per-test coverage), use distinct
file names for errorsummary and raw logs. This usually means that the command needs
to be built inside the per-test loop.

Marco Castelluccio [:marco]

Reporter

Comment 15

•

5 years ago

(In reply to Geoff Brown [:gbrown] from comment #9)

:marco -- The issue here is that a test-verify task sometimes runs a test harness multiple times, and each run over-writes the logs from the previous run (oops!). Would it be okay to have multiple errorsummary (and raw) logs for test-verify tasks? It would look something like this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d003f6a6238fe8da96e17f95e1349731223e0b53

Just noticed though that there is a problem. In test-android-em-7.0-x86_64/debug-geckoview-test-verify-e10s (SPVn270YSmu0Hh8lBPL96Q), there is a xpcshell failure, but the xpcshell_errorsummary.log doesn't report it.

Geoff Brown [:gbrown]

Assignee

Comment 16

•

5 years ago

(In reply to Marco Castelluccio [:marco] from comment #15)
Yes, that try push did not have the android changes included in the final patch -- so expected and nothing to worry about.

Pulsebot

Comment 17

•

5 years ago

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/db1e66399e7e Use multiple errorsummary and raw logs for test-verify; r=jmaher

Bogdan Tara[:bogdan_tara | bogdant]

Comment 18

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/db1e66399e7e

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79