Closed
Bug 1482344
Opened 6 years ago
Closed 6 years ago
raptor fails to run fetch benchmarks after moving to hardware
Categories
(Testing :: Raptor, enhancement)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla63
firefox63: fixed
People
(Reporter: jmaher, Assigned: ahal)
References
Details
Attachments
(2 obsolete files)
The raptor tests unity3d and wasm-misc (the latter run locally) work fine when they are run on virtual machines, but they fail when run on physical hardware. Looking at logs from before and after the move, we used to fetch the task artifact and put the data in the benchmarks directory; now only benchmarks from third_party/webkit/PerformanceTests/ seem to appear in our benchmarks directory while running tests.
Reporter
Comment 1•6 years ago
on a virtual machine, I see this in the log:
[taskcluster 2018-08-09 13:48:23.520Z] === Task Starting ===
[setup 2018-08-09T13:48:23.986Z] run-task started in /builds/worker
[cache 2018-08-09T13:48:23.989Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[cache 2018-08-09T13:48:23.989Z] cache /builds/worker/workspace exists; requirements: gid=1000 uid=1000 version=1
[volume 2018-08-09T13:48:23.990Z] changing ownership of volume /builds/worker/.cache to 1000:1000
[volume 2018-08-09T13:48:23.990Z] volume /builds/worker/checkouts is a cache
[volume 2018-08-09T13:48:23.990Z] changing ownership of volume /builds/worker/tooltool-cache to 1000:1000
[volume 2018-08-09T13:48:23.990Z] volume /builds/worker/workspace is a cache
[setup 2018-08-09T13:48:23.991Z] running as worker:worker
[fetches 2018-08-09T13:48:23.991Z] fetching artifacts
Downloading https://queue.taskcluster.net/v1/task/XGuKvVIKTqi2FDJc_lWG-w/artifacts/public/wasm-misc.zip to /builds/worker/fetches/wasm-misc.zip.tmp
Downloading https://queue.taskcluster.net/v1/task/XGuKvVIKTqi2FDJc_lWG-w/artifacts/public/wasm-misc.zip
https://queue.taskcluster.net/v1/task/XGuKvVIKTqi2FDJc_lWG-w/artifacts/public/wasm-misc.zip resolved to 4433793 bytes with sha256 0ba273b748b872117a4b230c776bbd73550398da164025a735c28a16c0224397 in 0.619s
Renaming to /builds/worker/fetches/wasm-misc.zip
Extracting /builds/worker/fetches/wasm-misc.zip to /builds/worker/fetches using ['unzip', '/builds/worker/fetches/wasm-misc.zip']
Archive: /builds/worker/fetches/wasm-misc.zip
creating: wasm-misc/
...
/builds/worker/fetches/wasm-misc.zip extracted in 0.136s
Removing /builds/worker/fetches/wasm-misc.zip
[fetches 2018-08-09T13:48:24.867Z] finished fetching artifacts
[task 2018-08-09T13:48:24.867Z] executing ['/builds/worker/bin/test-linux.sh', '--installer-url=https://queue.taskcluster.net/v1/task/cMNgzfDCRJSd6A9blGVoBw/artifacts/public/build/target.tar.bz2', '--test-packages-url=https://queue.taskcluster.net/v1/task/cMNgzfDCRJSd6A9blGVoBw/artifacts/public/build/target.test_packages.json', '--test=raptor-wasm-misc', '--branch-name', 'try', '--download-symbols=ondemand']
On hardware we don't run test-linux.sh; is it possible that we have different features in docker-worker vs the <whatever>-worker that we are using on hardware?
Flags: needinfo?(wcosta)
Flags: needinfo?(ahal)
Reporter
Comment 2•6 years ago
I see :ahal recently added fetch_artifacts support in run-task:
https://searchfox.org/mozilla-central/source/taskcluster/scripts/run-task#742
This looks like it is supported in both docker-worker and native-engine, but I found that native-engine (i.e. hardware) doesn't have MOZ_FETCHES defined in its environment variables.
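For a quick sanity check of what this comment describes, a minimal Python snippet (illustrative only; MOZ_FETCHES_DIR is an assumed companion variable, only MOZ_FETCHES is mentioned in this bug) could be run inside the task to show whether the fetch variables are defined on a given worker:

```python
import os

# Print the fetch-related variables the task actually sees. Per comment 2,
# docker-worker tasks have MOZ_FETCHES set while the native-engine (hardware)
# workers did not.
for var in ("MOZ_FETCHES", "MOZ_FETCHES_DIR"):
    print("{} = {}".format(var, os.environ.get(var, "<not set>")))
```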
Reporter
Comment 3•6 years ago
It appears that editing and retriggering the task to add the MOZ_FETCH* env vars doesn't solve this problem.
Comment 4•6 years ago
(In reply to Joel Maher ( :jmaher ) (UTC+2) from comment #1)
> ...
> On hardware we don't run test-linux.sh; is it possible that we have different features in docker-worker vs the <whatever>-worker that we are using on hardware?
It doesn't seem to be related to worker setup. Do you have a link to the failing task?
Flags: needinfo?(wcosta)
Reporter
Comment 5•6 years ago
Here is a link to a failing log on hardware:
https://taskcluster-artifacts.net/ArrpzEcgSemMQym6zN8C8w/0/public/logs/live_backing.log
and a passing log on vm:
https://taskcluster-artifacts.net/Z3o00pR-RIKpuchBMvAkCw/0/public/logs/live_backing.log
Comment 6•6 years ago
I noticed that on packet it searches for the home directory at /home/cltbld; shouldn't it be /builds/worker?
Flags: needinfo?(jmaher)
Assignee
Comment 7•6 years ago
I think it's the other way around: those native-engine workers run from /home/cltbld. Joel, I think you need to add the 'workdir' key to raptor.yml, similar to what I needed for the jsshell-bench tasks:
https://searchfox.org/mozilla-central/source/taskcluster/ci/source-test/jsshell.yml#19
Note those jsshell tasks are currently the only things using both run-task and a native-engine worker, so there are still edge cases that haven't been smoothed over.
Flags: needinfo?(ahal)
Assignee
Comment 8•6 years ago
Oh, but because raptor.yml is a "test" kind (and not a "source-test" kind like jsshell), you'll need to figure out how to propagate this value from raptor.yml up to here:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/__init__.py#199
There may also very well be other problems. These are the first "test" tasks to use native-engine + fetches.
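As a rough illustration of the propagation this comment describes, here is a minimal sketch assuming a transform helper that fills in a per-worker default working directory; the function, argument, and dictionary names are hypothetical and not the actual taskgraph code:

```python
# Hypothetical sketch, not the real transforms/job/__init__.py logic: give a
# job an explicit workdir, falling back to a per-worker default when the kind
# (e.g. raptor.yml) does not set one.
WORKDIR_DEFAULTS = {
    "docker-worker": "/builds/worker",
    # Per comment 7, native-engine (hardware) workers run from /home/cltbld.
    "native-engine": "/home/cltbld",
}

def resolve_workdir(job, worker_implementation):
    run = job.setdefault("run", {})
    run.setdefault("workdir", WORKDIR_DEFAULTS.get(worker_implementation, "/builds/worker"))
    return run["workdir"]
```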
Comment 9•6 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> I think it's the other way around: those native-engine workers run from /home/cltbld. ...
Oops, my bad; I am so biased toward packet.net that I assumed the task was running there.
Reporter
Updated•6 years ago
Flags: needinfo?(jmaher)
Assignee
Comment 10•6 years ago
This is happening because the 'native-engine' implementation in mozharness_test.py is overwriting the worker's env instead of updating it:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/mozharness_test.py#340
Though the workdir also needed to be set as per comment 7.
Assignee: nobody → ahal
Status: NEW → ASSIGNED
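A minimal sketch of the overwrite-versus-update difference described in comment 10, using a plain dict to stand in for the task's worker payload (the variable names and values are illustrative, not the actual transform code):

```python
# The worker payload already carries the fetch variables.
worker = {"env": {"MOZ_FETCHES": "[...]", "MOZ_FETCHES_DIR": "fetches"}}

# Overwriting the env replaces everything set earlier, silently dropping
# MOZ_FETCHES -- the bug described above:
worker["env"] = {"MOZHARNESS_SCRIPT": "raptor_script.py"}
assert "MOZ_FETCHES" not in worker["env"]

# Updating merges the new variables into the existing ones instead:
worker = {"env": {"MOZ_FETCHES": "[...]", "MOZ_FETCHES_DIR": "fetches"}}
worker["env"].update({"MOZHARNESS_SCRIPT": "raptor_script.py"})
assert "MOZ_FETCHES" in worker["env"]
```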
Assignee
Comment 11•6 years ago
Turns out that still wasn't enough because the native-engine workers don't use `run-task` (mozharness_test.py could use some TLC), which means MOZ_FETCHES aren't downloaded automatically.
There are two options:
A) Try to mount the run-task and fetch-content scripts on these workers and modify mozharness_test.py to always use run-task.
B) Download the fetches in mozharness (there is precedent here from the code-coverage tasks)
Option A is more aligned with the future we want to see, so I'll give that a brief shot. If I can't get it to work for any reason, I'll fall back to option B.
Assignee
Comment 12•6 years ago
We need to grab fetches from several places in mozharness, so this creates a dedicated mixin that can be used from anywhere. If the 'fetch-content' script is detected it will be used; otherwise we download the fetches manually.
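A rough sketch of what such a mixin might look like, assuming MOZ_FETCHES is a JSON list with "task" and "artifact" fields and that fetch-content takes a "task-artifacts" subcommand; those details, along with the class and method names, are assumptions for illustration rather than mozharness's or fetch-content's actual interfaces:

```python
import json
import os
import shutil
import subprocess
import urllib.request

# Artifact URL pattern as seen in the log in comment 1.
ARTIFACT_URL = "https://queue.taskcluster.net/v1/task/{task}/artifacts/{artifact}"

class FetchesMixin(object):
    """Illustrative mixin: prefer the fetch-content script when it is present
    on the worker, otherwise download each fetch by hand."""

    def download_fetches(self, dest_dir):
        os.makedirs(dest_dir, exist_ok=True)
        if shutil.which("fetch-content"):
            # Assumed invocation; the real script's CLI may differ.
            subprocess.check_call(["fetch-content", "task-artifacts"], cwd=dest_dir)
            return
        for fetch in json.loads(os.environ.get("MOZ_FETCHES", "[]")):
            url = ARTIFACT_URL.format(task=fetch["task"], artifact=fetch["artifact"])
            target = os.path.join(dest_dir, os.path.basename(fetch["artifact"]))
            urllib.request.urlretrieve(url, target)
```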
Assignee
Comment 13•6 years ago
This unbreaks some tier 3 raptor tasks. There are a few fixes rolled together here:
1) Stop overwriting the 'env' in mozharness_test.py's 'native-engine' implementation
2) Set the workdir to /home/cltbld (which makes sure the fetches are downloaded there)
3) Download the fetches via mozharness in the 'raptor' script (since they don't use run-task anymore)
Depends on D3651
Reporter
Comment 14•6 years ago
Comment on attachment 9002065 [details]
Bug 1482344 - [raptor] Fix fetch tasks for native-engine mozharness_test based tasks, r=jmaher
Joel Maher ( :jmaher ) (UTC+2) has approved the revision.
Attachment #9002065 - Flags: review+
Comment 15•6 years ago
Comment on attachment 9002064 [details]
Bug 1482344 - [mozharness] Refactor codecoverage fetch downloading into a standalone mixin, r=marco
Tudor-Gabriel Vijiala [:tvijiala] has approved the revision.
Attachment #9002064 - Flags: review+
Comment 16•6 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95e338482796
[mozharness] Refactor codecoverage fetch downloading into a standalone mixin, r=tvijiala
Comment 17•6 years ago
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa6f46eaec1b
[raptor] Fix fetch tasks for native-engine mozharness_test based tasks, r=jmaher
Comment 18•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/95e338482796
https://hg.mozilla.org/mozilla-central/rev/aa6f46eaec1b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox63: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment 19•1 year ago
Comment on attachment 9002064 [details]
Bug 1482344 - [mozharness] Refactor codecoverage fetch downloading into a standalone mixin, r=marco
Revision D3651 was moved to bug 1607000. Setting attachment 9002064 [details] to obsolete.
Attachment #9002064 - Attachment is obsolete: true
Comment 20•1 year ago
Comment on attachment 9002065 [details]
Bug 1482344 - [raptor] Fix fetch tasks for native-engine mozharness_test based tasks, r=jmaher
Revision D3652 was moved to bug 1607000. Setting attachment 9002065 [details] to obsolete.
Attachment #9002065 - Attachment is obsolete: true