Closed Bug 1831263 (webgpu-v1-cts-windows) Opened 1 year ago Closed 1 year ago

Webgpu CI jobs are not running in configs that can do anything but fail

Categories

(Core :: Graphics: WebGPU, defect, P1)

Tracking

RESOLVED FIXED
116 Branch
Tracking Status
firefox116 --- fixed

People

(Reporter: jgilbert, Assigned: ErichDonGubler)

References

(Blocks 6 open bugs, Regressed 1 open bug)

Details

Attachments

(7 files, 4 obsolete files)

(deleted), text/x-python
(deleted), text/x-phabricator-request
(deleted), text/x-phabricator-request
(deleted), text/x-phabricator-request
(deleted), text/x-phabricator-request
(deleted), text/x-phabricator-request
(deleted), text/x-phabricator-request

Quis custodiet ipsos custodes? (Who watches the watchmen?)

This is the most recent try run from bug :
https://treeherder.mozilla.org/jobs?repo=try&revision=adefaafc0b5d68929d5b0a8d3369fd523a154b59
Despite the extensive changes in that bug, apparently all's well. That seemed too suspicious to me.

Here's that same run, but asking for a GPU:
https://treeherder.mozilla.org/jobs?repo=try&revision=fd60d0fa06e61a00633f0fca75faba6a474e7d65
Linux still isn't getting a GPU, so it continues to successfully expect failure on all tests.
Windows is getting a GPU now, though, and we finally start to see daylight:

[task 2023-05-04T08:06:39.943Z] 08:06:39 INFO - TEST-START | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:default:*
[task 2023-05-04T08:06:39.971Z] 08:06:39 INFO - Setting pref dom.webgpu.enabled to true
[task 2023-05-04T08:06:41.745Z] 08:06:41 INFO -
[task 2023-05-04T08:06:41.745Z] 08:06:41 INFO - TEST-UNEXPECTED-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:default:* | : - expected FAIL
[task 2023-05-04T08:06:41.745Z] 08:06:41 INFO - TEST-INFO | expected FAIL
[task 2023-05-04T08:06:41.746Z] 08:06:41 INFO - TEST-OK | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:default:* | took 1803ms
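Spotting unexpected passes like this by eye does not scale across a full job log. Here is a minimal sketch of tallying the status markers, assuming the plain-text "TEST-STATUS | test path | ..." layout shown above. STATUS_RE and summarize are hypothetical names, not part of any Mozilla tooling.

```python
import re
from collections import Counter

# Matches log lines such as
# "... INFO - TEST-UNEXPECTED-PASS | /_mozilla/webgpu/... | : - expected FAIL"
# Group 1 is the status marker, group 2 is the test path (trailing spaces trimmed).
STATUS_RE = re.compile(
    r"\b(TEST-(?:UNEXPECTED-[A-Z]+|OK|PASS|FAIL|START))\s*\|\s*([^|]+?)\s*(?:\||$)"
)


def summarize(lines):
    """Count status markers and collect (status, test) pairs for unexpected results.

    `lines` can be any iterable of strings, e.g. an open log file.
    Lines without a recognised marker (including TEST-INFO) are ignored.
    """
    counts = Counter()
    unexpected = []
    for line in lines:
        m = STATUS_RE.search(line)
        if not m:
            continue
        status, test = m.group(1), m.group(2)
        counts[status] += 1
        if status.startswith("TEST-UNEXPECTED-"):
            unexpected.append((status, test))
    return counts, unexpected
```

Passing it an open log file (which iterates line by line) should surface every TEST-UNEXPECTED-* line in a job. It deliberately does not interpret subtests or the structured wptreport.json artifacts.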

Also, chunking probably doesn't work right for this job yet: I asked for 4 chunks and I'm still getting 2.

Similarly, mochitest-webgpu Linux jobs look like:

[task 2023-05-04T04:07:56.305Z] 04:07:56 INFO - TEST-START | dom/webgpu/mochitest/test_device_creation.html
[task 2023-05-04T04:07:56.490Z] 04:07:56 INFO - GECKO(1976) | Validation error without device target: No suitable adapter found
[task 2023-05-04T04:07:56.525Z] 04:07:56 INFO - GECKO(1976) | MEMORY STAT | vsize 2570MB | residentFast 146MB | heapAllocated 8MB
[task 2023-05-04T04:07:56.681Z] 04:07:56 INFO - TEST-OK | dom/webgpu/mochitest/test_device_creation.html | took 375ms

mochitest-webgpu does not appear to be running on Windows right now.

Assignee: nobody → jgilbert
Status: NEW → ASSIGNED
Attachment #9331544 - Attachment description: Bug 1831263 - Ask for gpu for wpt-webgpu jobs, and run mochi-webgpu on Windows. → Bug 1831263 - Ask for gpu for W-webgpu jobs but drop Linux; run M-webgpu on Windows.

Kelsey and Erich are re-evaluating what our test expectations should be, so that landing this doesn't produce massive oranges from unexpected passes.

Attached file wpt-re-mark.py (deleted) —

This is the script I wrote to automate re-marking tests, since wpt-update doesn't seem to be working, and Erich's previous experience was that it was slow anyway.
Feed it wptreport.json files one at a time, generally taken from the artifacts of a W(webgpuN) job, and it will re-mark tests based on the unexpected results in each report.
It generally tries to preserve existing comments and line orders, with some narrow exceptions.
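The script itself isn't reproduced here. As a rough illustration of its core step only, here is a sketch that pulls unexpected results out of a parsed wptreport.json. It assumes the report has a top-level "results" list whose entries carry "test", "status", an optional "expected", and a "subtests" list. The function names are hypothetical, and unlike the real script it ignores comments, line order, and intermittents.

```python
import json


def load_report(path):
    """Parse one wptreport.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def unexpected_results(report):
    """Yield (test, subtest_or_None, actual, expected) for each unexpected result.

    Assumption: an entry counts as unexpected when it has an "expected" key
    that differs from its "status".
    """
    for result in report.get("results", []):
        test = result["test"]
        if "expected" in result and result["expected"] != result["status"]:
            yield (test, None, result["status"], result["expected"])
        for sub in result.get("subtests", []):
            if "expected" in sub and sub["expected"] != sub["status"]:
                yield (test, sub["name"], sub["status"], sub["expected"])


def new_expectations(report):
    """Map (test, subtest) -> observed status: what a naive re-mark would write back."""
    return {(t, s): actual for t, s, actual, _ in unexpected_results(report)}
```

Writing the observed status back as the new expectation is the simplest policy. It is only safe for results that are stable across runs, which is why per-report, one-at-a-time review still matters.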

It has a framework for later amending expectations with conditions, e.g. if os == "linux": FAIL (or PASS), but consider that aspect a WIP.
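The conditional part amounts to writing expectation metadata with an if branch plus a default. Here is a hypothetical sketch of rendering that. It assumes the indentation-based WPT expectation metadata syntax: a [section] per test, then either key: value, or a key: block of "if condition: value" lines followed by a bare default. Verify the exact syntax against the wptrunner documentation before relying on it.

```python
def render_expectation(key, conditions, default, indent=2):
    """Render one metadata entry, e.g. 'expected', for a test section.

    conditions: list of (condition_expression, value) pairs,
                e.g. [('os == "linux"', "FAIL")]
    default:    value used when no condition matches; may be None only
                when there are conditions.
    """
    pad = " " * indent
    if not conditions:
        if default is None:
            raise ValueError("an unconditional entry needs a default value")
        return f"{pad}{key}: {default}\n"
    lines = [f"{pad}{key}:"]
    for cond, value in conditions:
        lines.append(f"{pad}  if {cond}: {value}")
    if default is not None:
        lines.append(f"{pad}  {default}")
    return "\n".join(lines) + "\n"


def render_section(name, body, indent=0):
    """Wrap rendered entries in a '[name]' section header."""
    return f"{' ' * indent}[{name}]\n{body}"
```

For example, render_section("cts.https.html?q=...", render_expectation("expected", [('os == "linux"', "FAIL")], "PASS")) would produce a Linux-only FAIL with PASS everywhere else. Keeping conditions as ordered pairs preserves their order, since the first matching condition wins.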

If this pans out, this will land somewhere more permanent, possibly in-tree.

Attachment #9332314 - Attachment mime type: application/octet-stream → text/x-python

Erich will try to mark these failures in a one-off way, while the finished marking automation will likely land separately.

Assignee: jgilbert → egubler
Summary: Webgpu CI jobs are not running in configs that can do anything but fail → Narrow WebGPU CTS testing in CI to Windows (for now)

I think we should keep the old title.
"Webgpu CI jobs are not running in configs that can do anything but fail" is the defect/bug, "Narrow WebGPU CTS testing in CI to Windows (for now)" is the solution/patch.

Summary: Narrow WebGPU CTS testing in CI to Windows (for now) → Webgpu CI jobs are not running in configs that can do anything but fail
Blocks: 1836359
Depends on: 1836410

Co-Authored-By: Kelsey Gilbert <jgilbert@mozilla.com>

Depends on D179815

Blocks: 1836479
Attachment #9337222 - Attachment description: Bug 1831263: test(webgpu): mark 32-bit, `asan`, `tsan` tasks as `UNCOMMON_TRY_TASK_LABELS` r=#webgpu-reviewers → Bug 1831263: test(webgpu): simplify when tests runs r=#webgpu-reviewers
Attachment #9331544 - Attachment is obsolete: true
Blocks: 1836520

Seeing if I can land the first patches, which only need #webgpu-reviewers approval, with this Try build. 🤞🏻 EDIT: It's green, ship it!

Alias: webgpu-v1-cts-windows
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/09f78a71f86a
fix(webgpu): track forgotten strong refs. in shader compilation r=webgpu-reviewers,jgilbert
https://hg.mozilla.org/integration/autoland/rev/485b73c565ed
fix(webgpu): remove strong ref. cycle b/w `Compilation{Info,Message}` r=webgpu-reviewers,jgilbert
https://hg.mozilla.org/integration/autoland/rev/d64d0a03298f
refactor(webgpu): align `Device::InitSwapChain`'s `IntSize` arg. name b/w header and impl. r=webgpu-reviewers,jgilbert

Since D179816 has approval, I'm going to try reordering it to be earlier in the patch stack, so it can land. Checking that everything is still green in CI with this Try push. EDIT: It's green! Landing.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

This only partially landed.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1ceebfa927ec
test(webgpu): simplify when tests runs r=taskgraph-reviewers,bhearsum
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/efcb29ce73d6
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
https://hg.mozilla.org/integration/autoland/rev/dbc987c2d0cd
build(ci): chunk WebGPU WPT tests more r=webgpu-reviewers,jgilbert

Backed out changeset 1ceebfa927ec for causing TypeError related webgpu failures

Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/714383968fdb
test(webgpu): simplify when tests runs r=taskgraph-reviewers,bhearsum

Investigating backouts. I'm very surprised that the last single-revision push got backed out, given the green Try push I noted in comment 14.

Flags: needinfo?(egubler)
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/50babf946f47
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
https://hg.mozilla.org/integration/autoland/rev/d777b0086154
build(ci): chunk WebGPU WPT tests more r=webgpu-reviewers,jgilbert
Flags: needinfo?(egubler)
Flags: needinfo?(egubler)
Status: REOPENED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

This still has parts that haven't landed.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Currently investigating why web-platform-tests jobs generated by Taskcluster's decision task include WebGPU tests. One can observe this with Try pushes like this one, which, like the backout pushes above, fail because /_mozilla/webgpu tests are still being included (despite bug 1829715 intending to exclude them, CC :jgilbert). This is the single biggest source of CI failures in current pushes.

Looks like we were never actually filtering out WebGPU tests from WPT. This Try push's wpt5 job on Linux 18.04 x64 WebRender debug doesn't include any of the unlanded or backed-out changes, yet it still contains /_mozilla/webgpu test runs. 😭

Sooo...now we get to figure out how to actually do that filtering correctly, since we're explicitly accepting that the virtualization: virtual environment specified for the web-platform-tests jobs is broken for WebGPU, and that we need virtual-with-gpu instead.

Asked :jgraham about this in the Firefox CI channel on Matrix.

Talked with :jgraham. The current plan is to expose an inverse filtering operation based on WPT tags, i.e., an --exclude-tag=… option or something similar. I'm going to try writing the PR myself, but if that doesn't work out as I expect, I can either get help or hand it off to :jgraham on his next work day.
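The intended semantics of an inverse tag filter fit in a few lines. This is a hypothetical model, not wptrunner's implementation: WptTest and filter_by_tags are illustrative names, and the existing include-by-tag behaviour is modelled as "keep tests carrying any requested tag".

```python
from dataclasses import dataclass, field


@dataclass
class WptTest:
    """Hypothetical stand-in for a test entry: its path and its set of tags."""
    path: str
    tags: set = field(default_factory=set)


def filter_by_tags(tests, include_tags=None, exclude_tags=None):
    """Select tests by tag, applying exclusion after inclusion.

    - With include_tags, a test must carry at least one of them.
    - Any test carrying an excluded tag is dropped, even if it was included.
    - With neither argument, every test is kept.
    """
    include_tags = set(include_tags or ())
    exclude_tags = set(exclude_tags or ())
    selected = []
    for test in tests:
        if include_tags and not (test.tags & include_tags):
            continue
        if test.tags & exclude_tags:
            continue
        selected.append(test)
    return selected
```

Applying exclusion last means a test tagged webgpu stays out even when another of its tags was explicitly requested, which is the property the generic web-platform-tests jobs need here.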

Attached file WIP: Bug 1831263: feat(wpt): add `--exclude-tag` (obsolete) (deleted) —
Depends on: 1838694
No longer depends on: 1838694
Depends on: 1838703
No longer depends on: 1838703
Depends on: 1838742
Depends on: 1838739
Depends on: 1838694
No longer depends on: 1838739
Depends on: 1838739

Comment on attachment 9339159 [details]
WIP: Bug 1831263: feat(wpt): add --exclude-tag

Revision D180993 was moved to bug 1838742. Setting attachment 9339159 [details] to obsolete.

Attachment #9339159 - Attachment is obsolete: true

Comment on attachment 9339160 [details]
WIP: Bug 1831263: test(wpt): use --exclude-tag=webgpu instead of --exclude=… for web-platform-tests r?#webgpu-reviewers

Revision D180994 was moved to bug 1838742. Setting attachment 9339160 [details] to obsolete.

Attachment #9339160 - Attachment is obsolete: true
No longer depends on: 1838694
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3b0228cca008
build(ci): chunk WebGPU WPT tests more r=webgpu-reviewers,jgilbert
Status: REOPENED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

LOL, shoulda used leave-open.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #9339592 - Attachment is obsolete: true
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa9ee0e3e4e9
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
Blocks: 1708025
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2747753fb5e
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
Flags: needinfo?(egubler)
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/179a87036c2b
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
Flags: needinfo?(egubler)

Backed out for causing webgpu failures

Flags: needinfo?(egubler)
Flags: needinfo?(egubler)
Pushed by egubler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/54f3fddcb9bf
build(ci): restrict WebGPU to known working envs. (read: Windows) r=webgpu-reviewers,jgilbert
Regressions: 1839768
Status: REOPENED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Regressions: 1838695
Depends on: 1837557