Closed Bug 1571969 Opened 5 years ago Closed 5 years ago

Stop running all the QR tests as virtual-with-gpu

Categories: Testing :: General, enhancement
Version: Version 3
Type: enhancement
Priority: Not set
Severity: normal
Tracking: firefox70 fixed
Status: RESOLVED FIXED
Milestone: mozilla70
Tracking Flags: firefox70 (tracking: ---, status: fixed)
People: Reporter: bholley, Assigned: jrmuizel
References: Blocks 1 open bug
Attachments: 1 file, 1 obsolete file

Bug 1571969 - Stop running all the QR tests as virtual-with-gpu. 5 years ago, Bobby Holley (:bholley) (deleted), text/x-phabricator-request
Bug 1571969. Stop running all the QR tests as virtual-with-gpu. 5 years ago, Jeff Muizelaar [:jrmuizel] (deleted), text/x-phabricator-request

Bobby Holley (:bholley)

Reporter

Description

•

5 years ago

I was digging through our cost numbers for automation and realized that our Windows 10 QuantumRender tests all run on VMs with virtualized GPUs. These VMs cost about $1.10 per hour, as opposed to ~$0.32 per hour for regular VMs. So regular VMs cost 70% less.

For non-QR Windows automation, we still run some tests with virtualized GPUs: the mochitest-gpu suite, the webgl tests, reftests, and a handful of other things. This accounts for about 10% of total CPU time running Windows tests, and seems like a good cost trade-off for rendering-heavy suites. But I don't think it's really justifiable to run everything with GPUs given what it costs. We should still be able to test the WebRender code paths by forcing it on and running against WARP.

If we align the QR tests with the non-QR ones, we can gain a 70% cost reduction on 90% of our QR tests. That's huge.
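As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes cost scales linearly with VM hours and that the QR tests would split 90/10 the way the non-QR ones do; the function names are illustrative only.

```python
# Hourly prices quoted in the description above.
GPU_VM_COST = 1.10      # USD/hour, VM with a virtualized GPU
REGULAR_VM_COST = 0.32  # USD/hour, regular VM


def per_hour_saving(gpu_cost: float, regular_cost: float) -> float:
    """Fractional saving from moving one hour of work off a GPU VM."""
    return 1.0 - regular_cost / gpu_cost


def blended_saving(per_hour: float, movable_share: float) -> float:
    """Overall saving when only `movable_share` of the hours can move."""
    return per_hour * movable_share


saving = per_hour_saving(GPU_VM_COST, REGULAR_VM_COST)
overall = blended_saving(saving, 0.90)  # assumption: 90% of QR hours move
print(f"per-hour saving: {saving:.1%}")  # about 70.9%
print(f"overall saving: {overall:.1%}")  # about 63.8%
```

Under those assumptions the total Windows QR bill would drop by a bit under two thirds.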

Bobby Holley (:bholley)

Reporter

Comment 1

•

5 years ago

Attached file Bug 1571969 - Stop running all the QR tests as virtual-with-gpu. (obsolete) (deleted) — Details

Bobby Holley (:bholley)

Reporter

Comment 2

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f8524c0d66e4b53c7fd21ef8205c8482bbe516e

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 3

•

5 years ago

I don't think WebRender is getting enabled on WARP. I'll do some investigation into why.

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

5 years ago

I had done a try push to fix many of the test differences:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8e60caca25de21e532e55fbce2267a2275635d3

Given :jrmuizel's comment, I will hold off on finishing that.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 5

•

5 years ago

Here's a try push that might work: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fac40c56d3e0f0843ab977a8bd2467c3a0e9dea3

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 6

•

5 years ago

It looks like all the mochitests pass. There are some WARP-related things that need fixing for browserchrome.

Bobby Holley (:bholley)

Reporter

Comment 7

•

5 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
> It looks like all the mochitests pass. There are some WARP-related things that need fixing for browserchrome.

Handing this bug off to Jeff, since he's probably the right person to get it over the line.

Assignee: bobbyholley → jmuizelaar

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 8

•

5 years ago

Bug 1573616 should fix the bc5 failures.

Depends on: 1573616

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

5 years ago

Depends on: 1573645

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

5 years ago

Depends on: 1573681

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

5 years ago

Depends on: 1573682

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 9

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1273771f32e5bce0903466cffe9f3d6a1100b40f is an attempt with all of the dependent bugs fixed.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 10

•

5 years ago

This has an xpcshell test failure in the debug build and an "unexpected PWebRenderBridge::Msg_GetSnapshot sync IPC before first paint" failure in bc1.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

5 years ago

Blocks: 1573872

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 11

•

5 years ago

A better attempt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=32ed075a339375c956fb55ee0c9c308c87edcd49

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 12

•

5 years ago

And one that works: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b7a111317954b350d989cddbaff8c715eabf4c8&selectedJob=261649553

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 13

•

5 years ago

Latest attempt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ecc13f408913ddf599ba96e778b3fb9b5906ffe8

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 14

•

5 years ago

And a version that keeps web-platform-reftests running on gpus https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4ce54bfb618a55ae21aec48a06eded541a06e98

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

5 years ago

Depends on: 1574281

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

5 years ago

Depends on: 1574327

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 15

•

5 years ago

And one that builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=793476ed5b89555081a14de737eeb79cbc30acfa

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 16

•

5 years ago

Attached file Bug 1571969. Stop running all the QR tests as virtual-with-gpu. (deleted) — Details

This uses the layers.d3d11.enable-blacklist pref to allow running WebRender on WARP.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 17

•

5 years ago

The final result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ad3de31a36f4bd93c9d0eaedcf85bff6dc0c775

Comment 18

•

5 years ago

Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/10a056f52e49 Stop running all the QR tests as virtual-with-gpu. r=jmaher

Dorel Luca [:dluca]

Comment 19

•

5 years ago

Backed out changeset 10a056f52e49 (bug 1571969) for cpp failure. CLOSED TREE

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262037121&repo=autoland&lineNumber=1335

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6e01b87795cf68cdbb0d757c23d47959ab52ba60

Backout:
https://hg.mozilla.org/integration/autoland/rev/f3cbdced5732b2bb1ae9cb9492697adc1da4645b

Flags: needinfo?(jmuizelaar)

Comment 20

•

5 years ago

Also caused perma failures on Windows 10 x64 Talos: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Cpending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&searchStr=windows%2C10%2Cx64%2Cquantumrender%2Cshippable%2Copt%2Ctalos%2Cperformance%2Ctests%2Ctest-windows10-64-shippable-qr%2Fopt-talos-xperf-e10s%2Ct%28x%29&revision=819d45761f2f047021814b400b8a225475f08695&selectedJob=262056788

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262056788&repo=autoland&lineNumber=1667

Comment 21

•

5 years ago

Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4f8a94072fa6 Stop running all the QR tests as virtual-with-gpu. r=jmaher

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 22

•

5 years ago

I didn't see this in time and requeued a landing. It will need to be backed out again.

Comment 23

•

5 years ago

Backout: https://hg.mozilla.org/integration/autoland/rev/3366086e0e70fc1980d05fe485e6e94496b1e727

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 24

•

5 years ago

xperf talos fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=736e1a62718a7243381f2f0de52bd7b72d8a97ab

Flags: needinfo?(jmuizelaar)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 25

•

5 years ago

I typoed system32. Here's a new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f957505d4737ca3e875ea1f46d6565dcfd8cf1df

Comment 26

•

5 years ago

Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/67be750311a1 Stop running all the QR tests as virtual-with-gpu. r=jmaher

Narcis Beleuzu [:NarcisB]

Comment 27

•

5 years ago

Backed out for raptor failures on tests.py

Backout link: https://hg.mozilla.org/integration/autoland/rev/7b37e3f241fbdac1229b2c44a8363f1337a939a8
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262152624&repo=autoland&lineNumber=93

Flags: needinfo?(jmuizelaar)

Comment 28

•

5 years ago

Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a5710687f9b4 Stop running all the QR tests as virtual-with-gpu. r=jmaher

Noemi Erli[:noemi_erli]

Comment 29

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a5710687f9b4

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox70: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla70

Bobby Holley (:bholley)

Reporter

Comment 30

•

5 years ago

Awesome, thanks Jeff!

For posterity, can you explain why we needed to do anything with Talos and Raptor here? I'd think we would be running those on hardware.

Joel Maher ( :jmaher ) (UTC -8)

Comment 31

•

5 years ago

:bholley, I know :jrmuizel is on PTO this week. In this line:
https://hg.mozilla.org/mozilla-central/rev/a5710687f9b4#l4.17

we skip harnesses that do not support --setpref, so those harnesses will not end up running with WARP and will keep running as they did previously. For Talos and Raptor, we shouldn't do anything. In this case Raptor is running as before, but Talos has the new pref set. I assume that should be fixed.
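To illustrate the skip described above, here is a minimal sketch. It is not the code from the linked changeset; the set name, the function name and the command shape are hypothetical, and only the pref and the --setpref flag come from this bug.

```python
# Hypothetical list of harnesses that cannot take --setpref. Raptor is
# included only because this bug says it does not support the flag.
HARNESSES_WITHOUT_SETPREF = {"raptor"}

# Pref named in comment 16 as the switch that lets WebRender run on WARP.
WARP_PREF = "layers.d3d11.enable-blacklist=false"


def add_warp_pref(suite: str, command: list[str]) -> list[str]:
    """Return a copy of `command`, with the pref appended when supported."""
    if suite in HARNESSES_WITHOUT_SETPREF:
        # Harness cannot take the pref, so it keeps running as before.
        return list(command)
    return [*command, f"--setpref={WARP_PREF}"]


print(add_warp_pref("mochitest", ["run-tests"]))
print(add_warp_pref("raptor", ["run-tests"]))
```

With a skip list like this, any harness that accepts --setpref and is not listed (Talos, in this bug) picks up the pref.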

To be honest, I am not sure what, if any, regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some), we should see them posted in this bug. Dave, can you see what changes came up with this change to win10-qr talos results?

:ahal, can you add in Talos to the list referenced earlier in my comment?

Flags: needinfo?(dave.hunt)

Flags: needinfo?(ahal)

Bobby Holley (:bholley)

Reporter

Comment 32

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #31)
> To be honest, I am not sure what, if any, regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some), we should see them posted in this bug. Dave, can you see what changes came up with this change to win10-qr talos results?

I'm still confused. Per comment 30, I was under the impression that Talos and Raptor always ran on physical hardware, so I would expect that the VM configuration changes in this bug to have no effect on those tests. Am I mistaken somehow?

Andrew Halberstadt [:ahal]

Comment 33

•

5 years ago

Looks like talos does support --setpref (and raptor doesn't). So is there still something to fix? I have zero understanding of talos configurations so would like to clarify before submitting patches blindly.

Flags: needinfo?(ahal) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 34

•

5 years ago

I verified that both raptor and talos for win10 and win10-qr are running on hardware. There is one exception: talos-xperf runs on VMs, as it measures file I/O operations and not timing. We saw that for those runs we had to add exceptions to the whitelist of known DLLs accessed.

I guess the question is: does --setpref=layers.d3d11.enable-blacklist=false affect what we run on hardware (i.e. does it use the WARP backend)?

Flags: needinfo?(jmaher)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 35

•

5 years ago

layers.d3d11.enable-blacklist=false only allows running with WARP. If we're running on hardware with an actual GPU that GPU should be used.
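A toy model of that behaviour, as described in this bug, might look like the sketch below. It is not Gecko's actual device-selection code: the function and the returned labels are made up for illustration, and real blocklist entries for specific hardware GPUs are ignored.

```python
def effective_renderer(has_hardware_gpu: bool, d3d11_blacklist_enabled: bool) -> str:
    """Toy model: which device WebRender would end up on, per this bug."""
    if has_hardware_gpu:
        # A real GPU is preferred no matter how the pref is set.
        return "hardware-gpu"
    if not d3d11_blacklist_enabled:
        # Pref set to false: WARP becomes an acceptable device.
        return "warp"
    # No GPU and the pref left on: WebRender does not get enabled (comment 3).
    return "webrender-unavailable"


print(effective_renderer(True, False))   # hardware-gpu
print(effective_renderer(False, False))  # warp
print(effective_renderer(False, True))   # webrender-unavailable
```

In this model, setting the pref on the hardware Talos and Raptor machines changes nothing.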

Flags: needinfo?(jmuizelaar)

Joel Maher ( :jmaher ) (UTC -8)

Comment 36

•

5 years ago

oh, then probably nothing to do, thanks for the confirmation Jeff (and stay on PTO!)

Alexandru Ionescu (needinfo me) [:alexandrui]

Updated

•

5 years ago

Regressions: 1575534

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

5 years ago

Flags: needinfo?(dave.hunt)

Phabricator Automation

Updated

•

5 years ago

Attachment #9083536 - Attachment is obsolete: true
