Retriggers use build for previous push if backfill requested before
Categories
(Firefox Build System :: Task Configuration, defect)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
Details
(Keywords: intermittent-failure, regression)
Filed by: dvarga [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=269458109&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/HL4JRIPwRHKZJGeYAtqMkA/runs/0/artifacts/public/logs/live_backing.log
[task 2019-10-02T15:58:02.467Z] 15:58:02 INFO - Included file 'Z:\task_1570028969\build\tests\mochitest\tests\toolkit\mozapps\extensions\test\mochitest\mochitest.ini' does not exist
[task 2019-10-02T15:58:02.467Z] 15:58:02 INFO - Included file 'Z:\task_1570028969\build\tests\mochitest\tests\toolkit\xre\test\mochitest.ini' does not exist
[task 2019-10-02T15:58:02.467Z] 15:58:02 INFO - Included file 'Z:\task_1570028969\build\tests\mochitest\tests\uriloader\exthandler\tests\mochitest\mochitest.ini' does not exist
[task 2019-10-02T15:58:02.467Z] 15:58:02 INFO - Included file 'Z:\task_1570028969\build\tests\mochitest\tests\widget\tests\mochitest.ini' does not exist
[task 2019-10-02T15:58:02.467Z] 15:58:02 ERROR - No tests were found for flavor 'plain' and the following manifest filters:
[task 2019-10-02T15:58:02.467Z] 15:58:02 ERROR - skip_if, run_if, fail_if, remove_imptest_failure_expectations, subsuite(name=None), chunk_by_dir(1, 5, 4)
[task 2019-10-02T15:58:02.468Z] 15:58:02 ERROR -
[task 2019-10-02T15:58:02.468Z] 15:58:02 ERROR - Make sure the test paths (if any) are spelt correctly and the corresponding
[task 2019-10-02T15:58:02.468Z] 15:58:02 ERROR - --flavor and --subsuite are being used. See `mach mochitest --help` for a
[task 2019-10-02T15:58:02.468Z] 15:58:02 ERROR - list of valid flavors.
[task 2019-10-02T15:58:02.468Z] 15:58:02 ERROR -
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - SUITE-START | Running 0 tests
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 0 INFO TEST-START | Shutdown
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 1 INFO Passed: 0
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 2 INFO Failed: 0
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 3 INFO Todo: 0
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 4 INFO Mode: e10s
[task 2019-10-02T15:58:02.468Z] 15:58:02 INFO - 5 INFO SimpleTest FINISHED
[task 2019-10-02T15:58:02.469Z] 15:58:02 INFO - Buffered messages finished
[task 2019-10-02T15:58:02.469Z] 15:58:02 INFO - SUITE-END | took 0s
[task 2019-10-02T15:58:02.502Z] 15:58:02 ERROR - Return code: 1
[task 2019-10-02T15:58:02.502Z] 15:58:02 ERROR - No checks run.
[task 2019-10-02T15:58:02.502Z] 15:58:02 INFO - TinderboxPrint: mochitest-mochitest-plain-chunked<br/><em class="testfail">T-FAIL</em>
[task 2019-10-02T15:58:02.502Z] 15:58:02 ERROR - # TBPL FAILURE #
[task 2019-10-02T15:58:02.503Z] 15:58:02 WARNING - setting return code to 2
[task 2019-10-02T15:58:02.503Z] 15:58:02 ERROR - The mochitest suite: mochitest-plain-chunked ran with return status: FAILURE
[task 2019-10-02T15:58:02.503Z] 15:58:02 INFO - Running post-action listener: _package_coverage_data
[task 2019-10-02T15:58:02.503Z] 15:58:02 INFO - Running post-action listener: _resource_record_post_action
[task 2019-10-02T15:58:02.503Z] 15:58:02 INFO - Running post-action listener: process_java_coverage_data
[task 2019-10-02T15:58:02.503Z] 15:58:02 INFO - [mozharness: 2019-10-02 15:58:02.503000Z] Finished run-tests step (success)
[task 2019-10-02T15:58:02.503Z] 15:58:02 INFO - Running post-run listener: _resource_record_post_run
Comment 2•5 years ago
These failures are all from one push and one platform (Windows 10 asan). Reruns for one of those failures were green: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=269458109&revision=b3743b6fb2f3408201cb491fa4f3c8d69222a1a6
Andrew, have you seen anything like this before, and is there a difference in the logic that would explain why the retriggers passed?
Comment 3•5 years ago
I haven't seen this before and I'm not sure how it would be possible :/.
Were there any issues with the build task?
Comment 4•5 years ago
Failed job:
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269458109&repo=autoland&lineNumber=123
Installer url: https://queue.taskcluster.net/v1/task/Y0aSN_dMTBiW3q8PNVGTgA/artifacts/public/build/target.zip
Task: https://tools.taskcluster.net/groups/HylFB9IrRtWk6laSKtuNcA/tasks/Y0aSN_dMTBiW3q8PNVGTgA/details
Belongs to push: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=269458109&revision=b3743b6fb2f3408201cb491fa4f3c8d69222a1a6&searchStr=windows%2Casan&group_state=expanded
Green retrigger:
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269479874&repo=autoland&lineNumber=123
Installer url: https://queue.taskcluster.net/v1/task/eqsZsOBvTISMlKgdka0o5g/artifacts/public/build/target.zip
Task: https://tools.taskcluster.net/groups/aD3gRmtUQ1e2RHpKOewrgw/tasks/eqsZsOBvTISMlKgdka0o5g/details
Belongs to push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=308db09b03b19f9576c5f0d25a9117f85c4dcea4
The installer_urls should be the same.
Cam, could this be from recent work related to custom actions/retriggers?
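The comparison above can be checked mechanically. A minimal sketch, assuming the installer URLs keep the `/v1/task/<taskId>/artifacts/...` shape shown in this comment; the helper name `task_id_from_artifact_url` is made up for illustration and is not part of any Taskcluster or Treeherder API.

```python
from urllib.parse import urlparse


def task_id_from_artifact_url(url: str) -> str:
    """Return the task ID from a URL shaped like /v1/task/<taskId>/artifacts/...

    Hypothetical helper: it only parses the URL shape quoted in this bug.
    """
    parts = urlparse(url).path.strip("/").split("/")
    # Expected layout: ["v1", "task", "<taskId>", "artifacts", ...]
    if len(parts) < 3 or parts[1] != "task":
        raise ValueError(f"unexpected artifact URL: {url}")
    return parts[2]


# Installer URLs copied from the failed job and the green retrigger.
failed_job = "https://queue.taskcluster.net/v1/task/Y0aSN_dMTBiW3q8PNVGTgA/artifacts/public/build/target.zip"
green_retrigger = "https://queue.taskcluster.net/v1/task/eqsZsOBvTISMlKgdka0o5g/artifacts/public/build/target.zip"

# Two jobs on the same push should resolve to the same build task.
same_build = task_id_from_artifact_url(failed_job) == task_id_from_artifact_url(green_retrigger)
print("same build task:", same_build)
```

For the two URLs in this comment the build task IDs differ, which is the mismatch being reported.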
Comment 6•5 years ago
Chatted with Aryx in Zoom. This may be related to doing a Retrigger All (pinned jobs) with jobs from multiple pushes. I will investigate and do some testing on my end in Treeherder. That said, in the case we tried, we could not reproduce the problem: the jobs used the correct installer_urls.
Comment 7•5 years ago
Cameron and I took a look at it. The retrigger (https://tools.taskcluster.net/groups/HylFB9IrRtWk6laSKtuNcA/tasks/YMUtQw-BS8unY8tZS0k3cw/details) has the Gecko decision task (https://tools.taskcluster.net/tasks/HylFB9IrRtWk6laSKtuNcA) for the same push, and the action task (https://tools.taskcluster.net/groups/HylFB9IrRtWk6laSKtuNcA/tasks/PZpzJ2qNRZeUnfLqRtDRKA/details) references that decision task. But the build dependency, the Windows asan opt build (https://tools.taskcluster.net/tasks/eqsZsOBvTISMlKgdka0o5g), belongs to the previous push. In the action task, the correct Gecko decision task is referenced: HylFB9IrRtWk6laSKtuNcA
There is also a line:
No label-to-taskid.json found for UNFTUAyoRxO6lTw9Xxt-xw: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/UNFTUAyoRxO6lTw9Xxt-xw/artifacts/public/label-to-taskid.json
Dustin, any idea where the switch to a build from a different push could originate from?
Comment 8•5 years ago
This sounds like something else I did some digging on 2-3 weeks ago, but I can't find that bug now. It came down to some unintuitive interplay between action tasks and the records kept in files like label-to-taskid.json and full-task-graph.json. To dig into this, I'd suggest mapping out all of those files and also looking at what existed when.
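As a starting point for mapping out those files, here is a rough sketch that lists where each record would live for a set of task IDs. The URL shape for label-to-taskid.json is taken from the 404 line quoted in comment 7; placing full-task-graph.json under the same `public/` prefix is an assumption, and `record_urls` is a hypothetical helper name.

```python
# Queue root as it appears in the URLs quoted in this bug.
QUEUE_ROOT = "https://queue.taskcluster.net/v1"

# Record files named in this comment. Only the label-to-taskid.json path is
# confirmed by the thread; the full-task-graph.json path is an assumption.
RECORD_FILES = ("label-to-taskid.json", "full-task-graph.json")


def record_urls(task_ids):
    """Map each decision/action task ID to the artifact URLs worth checking."""
    return {
        task_id: [
            f"{QUEUE_ROOT}/task/{task_id}/artifacts/public/{name}"
            for name in RECORD_FILES
        ]
        for task_id in task_ids
    }


# Task IDs mentioned earlier in the bug: the decision task and the task whose
# label-to-taskid.json lookup returned 404.
urls = record_urls(["HylFB9IrRtWk6laSKtuNcA", "UNFTUAyoRxO6lTw9Xxt-xw"])
for task_id, links in urls.items():
    for link in links:
        print(task_id, link)
```

Fetching each URL and noting which ones exist, along with the creation time of the owning task, would give the "what existed when" picture.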
Comment 10•5 years ago
Steps to reproduce:
1. Find a push on a tree for which you are allowed to trigger and backfill.
2. Select a test task, e.g. a mochitest M one.
3. From the "..." menu at the bottom left, use "Backfill".
4. Wait for the action task for the backfill (AC(Bk)) to finish.
5. Retrigger the same task for which you backfilled.
When I do a retrigger > backfill > retrigger, the retrigger jobs use the same installer urls (I waited until the action tasks had finished before calling the next one; please correct me if more wait time is needed).
FWIW, Treeherder calls retrigger-multiple, and breakpoints in https://github.com/mozilla/treeherder/blob/master/ui/models/job.js point to the correct decision task.
https://searchfox.org/mozilla-central/rev/1fe0cf575841dbf3b7e159e88ba03260cd1354c0/taskcluster/taskgraph/actions/util.py#66 indicates gecko.v2.autoland.pushlog-id.99293.actions does not exist (only .decision). 99293 is the push id submitted for retriggers of https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedJob=273799497&resultStatus=pending%2Crunning%2Csuccess%2Csuperseded%2Cusercancel%2Cretry%2Ctestfailed%2Cbusted%2Cexception&searchStr=browser-chrome&revision=55724db5349e429c044d3493eb13bfc94c620ecf
Tom, any idea what causes this unexpected behavior?
Comment 11•5 years ago
Increasing severity as this affects all tasks on the platform. For example, the Linux x64 debug tests of https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&tochange=304f062595a5d8adf3f7f5932b48f305213e64dc&fromchange=d8796ee34018b922213c371407c701e679cc95c2 used the build from before the backout for the backout push.
Comment 12•5 years ago
The issue here is that the backfill job is an action on one push, but creates jobs on other pushes. Thus, those jobs can't be found by other actions looking at the other pushes, but can be found on the original push. The backfill action needs to be split into two parts: one that runs on the original push and triggers actions on the other pushes, and those triggered actions, which create the appropriate tasks.
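To make the failure mode concrete, here is a toy model, not the actual taskgraph code. It assumes that a later action on a push resolves labels by merging the decision task's label-to-taskid mapping with the mappings recorded by earlier actions on that same push, with later records winning. The labels, task IDs, and the `resolve_labels` helper are all invented for illustration.

```python
from typing import Dict, List

LabelToTaskId = Dict[str, str]


def resolve_labels(decision: LabelToTaskId, actions: List[LabelToTaskId]) -> LabelToTaskId:
    """Merge the decision mapping with each action's mapping, later entries winning.

    Simplified stand-in for how a retrigger might look up existing tasks;
    the real lookup logic is an assumption here.
    """
    merged = dict(decision)
    for mapping in actions:
        merged.update(mapping)
    return merged


# Hypothetical records stored under push N.
decision_push_n = {"build-win64-asan/opt": "BUILD_FOR_PUSH_N"}

# A backfill requested on push N creates tasks for the previous push, yet its
# record sits with push N's actions (assumed to include the build label).
backfill_action_on_push_n = {
    "build-win64-asan/opt": "BUILD_FOR_PUSH_N_MINUS_1",
    "test-win64-asan/opt-mochitest-4": "TEST_FOR_PUSH_N_MINUS_1",
}

# A later retrigger on push N consults everything recorded under push N.
resolved = resolve_labels(decision_push_n, [backfill_action_on_push_n])
print(resolved["build-win64-asan/opt"])
```

Under these assumptions the retrigger on push N picks up the previous push's build. Recording each created task under an action on the push it belongs to, as proposed above, would keep the two pushes' mappings separate.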
Comment 13•5 years ago
It might make sense to see if we can handle the failure case of Bug 1617107 better when we address this, too.
Comment 14•4 years ago
Is this now fixed? (Since we now schedule an intermediary action for every push)
Comment 15•4 years ago
I believe so; please reopen if it is seen happening again.