Open Bug 1431125 Opened 7 years ago Updated 2 years ago

Test verification of long-running tests may exceed task timeout

Categories

(Testing :: General, defect, P3)

Tracking

(Not tracked)

People

(Reporter: gbrown, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Consider https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=156829580&lineNumber=6259

TEST-OK | browser/tools/mozscreenshots/primaryUI/browser_primaryUI.js | took 682544ms
...
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
Blocks: 1411358
Priority: -- → P2
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=162381172&lineNumber=4696

[task 2018-02-15T12:05:45.624Z] 12:05:45 INFO - :::
[task 2018-02-15T12:05:45.625Z] 12:05:45 INFO - ::: Running test verification step "1. Run each test 10 times, sequentially."...
[task 2018-02-15T12:05:45.625Z] 12:05:45 INFO - :::
[task 2018-02-15T12:05:45.625Z] 12:05:45 INFO - Running tests sequentially.
[task 2018-02-15T12:05:45.625Z] 12:05:45 INFO - SUITE-START | Running 1 tests
[task 2018-02-15T12:05:45.830Z] 12:05:45 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:09:27.473Z] 12:09:27 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 221644ms
[task 2018-02-15T12:09:28.116Z] 12:09:28 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:13:06.912Z] 12:13:06 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 218796ms
[task 2018-02-15T12:13:07.555Z] 12:13:07 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:16:44.402Z] 12:16:44 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216846ms
[task 2018-02-15T12:16:45.097Z] 12:16:45 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:20:20.093Z] 12:20:20 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 214996ms
[task 2018-02-15T12:20:20.738Z] 12:20:20 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:23:57.083Z] 12:23:57 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216344ms
[task 2018-02-15T12:23:57.729Z] 12:23:57 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:27:49.732Z] 12:27:49 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 232002ms
[task 2018-02-15T12:27:50.431Z] 12:27:50 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:31:24.381Z] 12:31:24 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 213950ms
[task 2018-02-15T12:31:25.080Z] 12:31:25 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:35:01.834Z] 12:35:01 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216754ms
[task 2018-02-15T12:35:02.535Z] 12:35:02 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:38:53.939Z] 12:38:53 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 231405ms
[task 2018-02-15T12:38:54.587Z] 12:38:54 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
"Teach" every harness's repeat loop about the max run time? It might be better to run the test once, then skip (fail?) verification if that single test run takes longer than ~1 minute.
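To make the "run once, then decide" idea concrete, here is a minimal sketch in Python. It is not taken from any harness; `verify_with_probe`, `SINGLE_RUN_LIMIT_S`, and the `run_test` callable are hypothetical names used only for illustration, and the 60-second limit simply mirrors the "~1 minute" figure above.

```python
import time
from typing import Callable

# Hypothetical threshold, mirroring the "~1 minute" suggestion in this bug.
SINGLE_RUN_LIMIT_S = 60.0


def verify_with_probe(
    run_test: Callable[[], bool],
    repeats: int = 10,
    single_run_limit_s: float = SINGLE_RUN_LIMIT_S,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run the test once as a probe; repeat it only if the probe was fast.

    Returns "pass", "fail", or "skip" (probe passed but was too slow).
    """
    start = clock()
    passed = run_test()
    elapsed = clock() - start

    if not passed:
        return "fail"
    if elapsed > single_run_limit_s:
        # A single iteration is already too long to repeat safely.
        return "skip"

    # The probe counts as the first of the requested repeats.
    for _ in range(repeats - 1):
        if not run_test():
            return "fail"
    return "pass"
```

Whether a slow probe should report "skip" or "fail" is the open question raised above; the sketch picks "skip" only to have something to return.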
The test-verify logic tries to predict when verification will take longer than an hour and stops verification prematurely in those cases. But long-running tests - when a single test iteration takes more than a minute or so - can still be problematic. I don't see an easy way of fixing that. For now, I'd like to avoid the task timeouts by increasing the max-run-time significantly - to 3 hours.
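For illustration only, a repeat loop that gives up when the next iteration is unlikely to fit in the remaining time might look like the sketch below. This is an assumption about the general shape of such logic, not the actual test-verify code; `run_repeats_within_budget`, `budget_s`, and `run_test` are hypothetical names.

```python
import time
from typing import Callable, Tuple


def run_repeats_within_budget(
    run_test: Callable[[], bool],
    repeats: int,
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Repeat a test until done, failed, or out of time budget.

    Before each iteration, the longest iteration seen so far is used as the
    estimate for the next one; if it exceeds the time remaining, stop early.
    Returns (passing iterations completed, "pass" | "fail" | "gave-up").
    """
    start = clock()
    longest = 0.0
    completed = 0

    for _ in range(repeats):
        remaining = budget_s - (clock() - start)
        if longest > remaining:
            # Predicted to overrun the budget: stop verification early.
            return completed, "gave-up"

        iteration_start = clock()
        if not run_test():
            return completed, "fail"
        longest = max(longest, clock() - iteration_start)
        completed += 1

    return completed, "pass"
```

With iterations of roughly 220 seconds, as in the test_TelemetrySend.js log above, ten repeats need about 2200 seconds, so the loop only helps if the budget reflects the time actually left in the task rather than the full max-run-time.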
Attachment #8954829 - Flags: review?(jmaher)
Keywords: leave-open
Comment on attachment 8954829
increase tc max-run-time for test-verify

Review of attachment 8954829:
-----------------------------------------------------------------

++ for 3 hour jobs! That is actually scary, but until we have a better chunking strategy, this makes a lot of sense.
Attachment #8954829 - Flags: review?(jmaher) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c72e09c45e93
Increase max-run-time of test-verify and test-verify-wpt; r=jmaher
Is there a bug on file for chunking TV?
No. I don't see chunking as a reasonable strategy for TV. The strategy is, if verification is taking too long, give up. https://developer.mozilla.org/en-US/docs/Mozilla/QA/Test_Verification
Assignee: gbrown → nobody
Priority: P2 → P3
The leave-open keyword is there and there is no activity for 6 months. :gbrown, maybe it's time to close this bug?
Flags: needinfo?(gbrown)
I hope to finish this off in 2019.
Flags: needinfo?(gbrown)

The leave-open keyword is there and there is no activity for 6 months.
:gbrown, maybe it's time to close this bug?

Flags: needinfo?(gbrown)
Keywords: leave-open
Flags: needinfo?(gbrown)
Severity: normal → S3