Stop running all the QR tests as virtual-with-gpu
Categories
(Testing :: General, enhancement)
Tracking
(firefox70 fixed)
| | Tracking | Status |
|---|---|---|
| firefox70 | --- | fixed |
People
(Reporter: bholley, Assigned: jrmuizel)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 1 obsolete file)
(deleted), text/x-phabricator-request
I was digging through our cost numbers for automation and realized that our Windows 10 QuantumRender tests all run on VMs with virtualized GPUs. These VMs cost about $1.10 per hour, as opposed to ~$0.32 per hour for regular VMs, so regular VMs cost roughly 70% less.
For non-QR Windows automation, we still run some tests with virtualized GPUs: the mochitest-gpu suite, the webgl tests, reftests, and a handful of other things. This accounts for about 10% of total CPU time running Windows tests, and seems like a good cost trade-off for rendering-heavy suites. But I don't think it's really justifiable to run everything with GPUs given what it costs. We should still be able to test the WebRender code paths by forcing it on and running against WARP.
If we align the QR tests with the non-QR ones, we can gain a 70% cost reduction on 90% of our QR tests. That's huge.
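As a rough sanity check on these numbers, here is a minimal Python sketch of the arithmetic. The hourly rates come from the description; reading the 90% figure as the share of QR VM-hours that could move to regular VMs, and assuming cost scales linearly with VM-hours, are assumptions, as are the function and variable names.

```python
# Hourly rates quoted in the description (USD per VM-hour).
GPU_VM_RATE = 1.10
REGULAR_VM_RATE = 0.32


def per_hour_saving(gpu_rate: float, regular_rate: float) -> float:
    """Fractional saving from running one hour on a regular VM instead of a GPU VM."""
    return 1.0 - regular_rate / gpu_rate


def blended_saving(gpu_rate: float, regular_rate: float, movable_share: float) -> float:
    """Overall fractional saving when only `movable_share` of the hours move off GPU VMs.

    Assumption: cost scales linearly with VM-hours, and `movable_share` is the
    fraction of current GPU VM-hours that can run on regular VMs instead.
    """
    return movable_share * per_hour_saving(gpu_rate, regular_rate)


if __name__ == "__main__":
    print(f"per-hour saving: {per_hour_saving(GPU_VM_RATE, REGULAR_VM_RATE):.1%}")
    print(f"blended saving:  {blended_saving(GPU_VM_RATE, REGULAR_VM_RATE, 0.90):.1%}")
```

Under those assumptions the per-hour saving comes out near 71%, and the overall bill for the QR Windows tests would drop by a bit under two thirds.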
Assignee
Comment 3•5 years ago
I don't think WebRender is getting enabled on WARP. I'll do some investigation into why.
Comment 4•5 years ago
I had done a try push to fix many of the test differences:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8e60caca25de21e532e55fbce2267a2275635d3
Given :jrmuizel's comment, I will hold off on finishing that.
Assignee
Comment 5•5 years ago
Here's a try push that might work: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fac40c56d3e0f0843ab977a8bd2467c3a0e9dea3
Assignee
Comment 6•5 years ago
It looks like all the mochitests pass. There's some WARP related things that need fixing for browserchrome.
Reporter
Comment 7•5 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
> It looks like all the mochitests pass. There's some WARP related things that need fixing for browserchrome.
Handing this bug off to Jeff, since he's probably the right person to get it over the line.
Assignee
Comment 9•5 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1273771f32e5bce0903466cffe9f3d6a1100b40f is an attempt with all of the dependent bugs fixed.
Assignee
Comment 10•5 years ago
This has an xpcshell test failure in the debug build and an "unexpected PWebRenderBridge::Msg_GetSnapshot sync IPC before first paint" bc1 failure.
Assignee
Comment 14•5 years ago
And a version that keeps web-platform-reftests running on GPUs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4ce54bfb618a55ae21aec48a06eded541a06e98
Assignee
Comment 16•5 years ago
This uses the layers.d3d11.enable-blacklist pref to allow running WebRender on WARP.
Comment 19•5 years ago
Backed out changeset 10a056f52e49 (bug 1571969) for cpp failure. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262037121&repo=autoland&lineNumber=1335
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6e01b87795cf68cdbb0d757c23d47959ab52ba60
Backout:
https://hg.mozilla.org/integration/autoland/rev/f3cbdced5732b2bb1ae9cb9492697adc1da4645b
Comment 20•5 years ago
Also caused perma failures on Windows 10 x64 Talos: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Cpending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&searchStr=windows%2C10%2Cx64%2Cquantumrender%2Cshippable%2Copt%2Ctalos%2Cperformance%2Ctests%2Ctest-windows10-64-shippable-qr%2Fopt-talos-xperf-e10s%2Ct%28x%29&revision=819d45761f2f047021814b400b8a225475f08695&selectedJob=262056788
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262056788&repo=autoland&lineNumber=1667
Assignee
Comment 22•5 years ago
I didn't see this in time and requeued a landing. It will need to be backed out again.
Assignee
Comment 24•5 years ago
xperf talos fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=736e1a62718a7243381f2f0de52bd7b72d8a97ab
Assignee
Comment 25•5 years ago
I typoed system32. Here's a new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f957505d4737ca3e875ea1f46d6565dcfd8cf1df
Comment 27•5 years ago
Backed out for raptor failures on tests.py
Backout link: https://hg.mozilla.org/integration/autoland/rev/7b37e3f241fbdac1229b2c44a8363f1337a939a8
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262152624&repo=autoland&lineNumber=93
Reporter
Comment 30•5 years ago
Awesome, thanks Jeff!
For posterity, can you explain why we needed to do anything with Talos and Raptor here? I'd think we would be running those on hardware.
Comment 31•5 years ago
:bholley, I know :jrmuizel is on PTO this week. In this line:
https://hg.mozilla.org/mozilla-central/rev/a5710687f9b4#l4.17
we skip harnesses that do not support --setpref, so those harnesses will not end up running with WARP and will run as they did previously. For Talos and Raptor we shouldn't change anything; in this case Raptor is running as before, but Talos has the new pref set. I assume that should be fixed.
To be honest, I am not sure what if any regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some) we should see them posted in this bug. Dave, can you see what changes came up with this change to win10-qr talos results?
:ahal, can you add in Talos to the list referenced earlier in my comment?
Reporter
Comment 32•5 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #31)
> To be honest, I am not sure what if any regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some) we should see them posted in this bug. Dave, can you see what changes came up with this change to win10-qr talos results?
I'm still confused. Per comment 30, I was under the impression that Talos and Raptor always ran on physical hardware, so I would expect the VM configuration changes in this bug to have no effect on those tests. Am I mistaken somehow?
Comment 33•5 years ago
Looks like talos does support --setpref (and raptor doesn't). So is there still something to fix? I have zero understanding of talos configurations so would like to clarify before submitting patches blindly.
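To make the distinction in this thread concrete, here is a minimal Python sketch of the kind of check being discussed: add the pref only for harnesses that accept --setpref. It is not the code from the linked changeset. The set contents, function name, and harness names are illustrative assumptions based only on the comments here (talos accepts --setpref, raptor does not).

```python
from typing import List

# Pref named in comment 16; setting it to false allows WebRender to run on WARP.
WARP_PREF = "layers.d3d11.enable-blacklist=false"

# Hypothetical list, for illustration only: harnesses assumed NOT to accept --setpref.
HARNESSES_WITHOUT_SETPREF = {"raptor"}


def extra_args_for(harness: str, is_qr: bool) -> List[str]:
    """Return the extra command-line arguments for one test harness.

    Only QuantumRender (QR) jobs get the pref, and only when the harness
    is not in the "does not support --setpref" set.
    """
    if not is_qr:
        return []
    if harness in HARNESSES_WITHOUT_SETPREF:
        # Harness cannot take the pref, so it keeps running as it did before.
        return []
    return [f"--setpref={WARP_PREF}"]


if __name__ == "__main__":
    for name in ("mochitest", "talos", "raptor"):
        print(name, extra_args_for(name, is_qr=True))
```

With this shape, whether talos should get the pref becomes a one-line change to the set rather than a change to the logic.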
Comment 34•5 years ago
I verified both raptor and talos for win10 and win10-qr are running on hardware. There is one exception: talos-xperf runs on VMs, as it measures file I/O operations and not timing. For those runs we had to add exceptions to the whitelist of known .dll files accessed.
I guess the question is: does --setpref=layers.d3d11.enable-blacklist=false affect what we run on hardware (i.e., does it use the WARP backend)?
Assignee
Comment 35•5 years ago
layers.d3d11.enable-blacklist=false only allows running with WARP. If we're running on hardware with an actual GPU that GPU should be used.
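The behavior described here can be summarized as a small decision table. The Python sketch below is a toy model of that description only, not Gecko's actual adapter-selection code; the function name and return strings are made up for illustration, and it assumes the hardware GPU is not itself blocked for some other reason.

```python
def webrender_backend(has_hardware_gpu: bool, d3d11_blacklist_enabled: bool) -> str:
    """Toy model of where WebRender ends up running, per the comment above."""
    if has_hardware_gpu:
        # Assumption: a usable hardware GPU wins regardless of the pref.
        return "hardware-gpu"
    if not d3d11_blacklist_enabled:
        # With the blacklist disabled, WebRender is allowed to run on WARP.
        return "warp"
    # Otherwise WARP stays blocked and WebRender is not enabled (see comment 3).
    return "webrender-disabled"


if __name__ == "__main__":
    for gpu in (True, False):
        for blacklist in (True, False):
            print(gpu, blacklist, webrender_backend(gpu, blacklist))
```

In this model the pref only matters on machines without a hardware GPU, which is why the Talos and Raptor jobs running on physical hardware would be unaffected.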
Comment 36•5 years ago
oh, then probably nothing to do, thanks for the confirmation Jeff (and stay on PTO!)