Open Bug 1838695 Opened 1 year ago Updated 1 year ago

Frequent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug

Categories

(Core :: Graphics: WebGPU, defect, P2)

defect

Tracking


ASSIGNED
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- unaffected
firefox114 --- unaffected
firefox115 --- unaffected
firefox116 --- affected
firefox117 --- affected

People

(Reporter: intermittent-bug-filer, Assigned: nical)

References

(Depends on 2 open bugs, Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, intermittent-testcase, regression, Whiteboard: [stockwell disable-recommended][stockwell needswork:owner])

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=419445188&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DXq8di7DR9yjfjxQDhmuPw/runs/0/artifacts/public/logs/live_backing.log


TEST-START | /_mozilla/webgpu/chunked/14/cts.https.html?q=webgpu:api,validation,encoding,cmds,clearBuffer:buffer,device_mismatch:*
[task 2023-06-15T16:38:41.639Z] 16:38:41     INFO - Closing window add187e0-e897-483c-a6b6-55407ccb2da5
[task 2023-06-15T16:39:06.634Z] 16:39:06     INFO - Got timeout in harness
[task 2023-06-15T16:39:06.645Z] 16:39:06     INFO - TEST-UNEXPECTED-TIMEOUT | /_mozilla/webgpu/chunked/14/cts.https.html?q=webgpu:api,validation,encoding,cmds,clearBuffer:buffer,device_mismatch:* | TestRunner hit external timeout (this may indicate a hang)
[task 2023-06-15T16:39:06.645Z] 16:39:06     INFO - TEST-INFO took 25024ms
[task 2023-06-15T16:40:17.988Z] 16:40:17     INFO - Browser exited with return code 572
[task 2023-06-15T16:40:28.002Z] 16:40:28  WARNING - Forcibly terminating runner process
[task 2023-06-15T16:40:28.050Z] 16:40:28     INFO - Application command: Z:\task_168684417078243\build\application\firefox\firefox.exe -marionette about:blank --wait-for-browser -profile C:\Users\task_168684417078243\AppData\Local\Temp\tmp03432srs
[task 2023-06-15T16:40:28.069Z] 16:40:28     INFO - PID 2140 | 1686847105357	Marionette	INFO	Marionette enabled
[task 2023-06-15T16:40:28.070Z] 16:40:28     INFO - PID 2140 | 1686847105441	Marionette	INFO	Listening on port 56440
[task 2023-06-15T16:40:28.071Z] 16:40:28     INFO - PID 2140 | JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
[task 2023-06-15T16:40:28.072Z] 16:40:28     INFO - PID 2140 | console.error: (new Error("Polling for changes failed: Unexpected content-type \"text/plain;charset=US-ASCII\".", "resource://services-settings/remote-settings.sys.mjs", 324))
[task 2023-06-15T16:40:28.073Z] 16:40:28     INFO - PID 2140 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-06-15T16:40:28.073Z] 16:40:28     INFO - PID 2140 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-06-15T16:40:28.074Z] 16:40:28     INFO - PID 2140 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-06-15T16:40:28.075Z] 16:40:28     INFO - PID 2140 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-06-15T16:40:28.076Z] 16:40:28     INFO - Starting runner
[task 2023-06-15T16:40:29.116Z] 16:40:29     INFO - Installing extension from Z:\task_168684417078243\build\tests\extensions\specialpowers@mozilla.org.xpi
[task 2023-06-15T16:40:29.352Z] 16:40:29     INFO - TEST-START | /_mozilla/webgpu/chunked/14/cts.https.html?q=webgpu:api,validation,debugMarker:push_pop_call_count_unbalance,command_encoder:*
Summary: Intermittent mozilla/tests/webgpu/chunked/14/cts.https.html | single tracking bug → Intermittent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug
Duplicate of this bug: 1839963
Duplicate of this bug: 1839962
Severity: S4 → --
Flags: needinfo?(egubler)
Keywords: regression
Priority: P5 → --
Regressed by: 1839768
Summary: Intermittent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug → Frequent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug
Regressed by: webgpu-v1-cts-windows
No longer regressed by: 1839768

Set release status flags based on info from the regressing bug 1839768

:CosmimS: I intend to look at this as part of my immediate work pipeline in bug 1836479. 👍🏻

Flags: needinfo?(egubler)
Assignee: nobody → egubler
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1

Set release status flags based on info from the regressing bug 1831263

In the past week there were 115 occurrences of this failure on:

  • windows11-32-2009-qr opt and debug
  • windows11-32-2009-shippable-qr opt
  • windows11-64-2009-qr opt and debug
  • windows11-64-2009-shippable-qr opt
Recent failure log:
[task 2023-07-07T10:45:45.586Z] 10:45:45     INFO - Closing window 80e89224-c3b2-418a-9709-ea2310c5af01
[task 2023-07-07T10:45:46.525Z] 10:45:46     INFO - PID 8728 | [GFX1-[]G: CompositorBridgFXeChil1-]: Cod mpositorBrreceidives geCh[GF[IPCiGldX[F GcFl1o[-XG]Xs11F-:X-e  1 wi]tr:]Ch  reaecC-]s::oe oomp oCCmopnossimipiotoves Ismo=PAiCtporBr crobntoBrilrismaidgodtgleSorBCohsuet drhwithi BrreowreaCslndi odngeh
[task 2023-07-07T10:45:46.526Z] 10:45:46     INFO - PID 8728 | ildire drCchgi=eldi erCevchAieeld irbvneosre ceecs eIImPiiCvPvC  ees cIslac PloIlPCCS houtsd oesccw n
[task 2023-07-07T10:45:46.527Z] 10:45:46     INFO - PID 8728 | lloeswit h ree woitwsaithsohn=e wit Area sonrbenasoh rno==reAabmaslShonn=AuotAbrnombalnrodmarlShSomahuutwlSnh
[task 2023-07-07T10:45:46.527Z] 10:45:46     INFO - PID 8728 | uttddownow
[task 2023-07-07T10:45:46.528Z] 10:45:46     INFO - PID 8728 | n
[task 2023-07-07T10:45:46.528Z] 10:45:46     INFO - PID 8728 | down
[task 2023-07-07T10:45:46.551Z] 10:45:46     INFO - PID 8728 | [GFX1-]: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO - 
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO - TEST-UNEXPECTED-FAIL | /_mozilla/webgpu/chunked/12/cts.https.html?q=webgpu:api,validation,createBindGroup:buffer,usage:* | :type="uniform" - assert_unreached: 
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO -   - EXCEPTION: WebGPU device failed to initialize with AbortError "Internal communication error!"; not retrying
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO -     assert@https://web-platform.test:8443/_mozilla/webgpu/common/util/util.js:37:11
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO -     acquire@https://web-platform.test:8443/_mozilla/webgpu/webgpu/util/device_pool.js:36:11
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO -     
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO -  Reached unreachable code
[task 2023-07-07T10:45:46.555Z] 10:45:46     INFO - wpt_fn@https://web-platform.test:8443/_mozilla/webgpu/common/runtime/wpt.js:65:25
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Erich, is there an ETA to resolve these frequent test failures?

We're looking into this. These intermittents significantly reduce the value of the CTS to us, so it's a priority. But we do not have a diagnosis, so we don't have an ETA.

Since the intermittents aren't isolated to any particular WebGPU CTS tests, as far as we know, it's difficult to identify some subset of tests we could disable to quiet things down. If we still don't have a diagnosis after a few days, we can talk about broader measures to keep this from wasting people's time until we can get it fixed.

Depends on: 1843021
Depends on: 1843250

A significant number of the failures associated with this bug are caused by this. I'm hoping the number of failures here goes down significantly once it is fixed. 🤞🏻

Update

There have been 207 total failures within the last 7 days:

  • 96 failures on Windows 11 x86 22H2 WebRender opt
  • 63 failures on Windows 11 x86 22H2 WebRender Shippable opt
  • 36 failures on Windows 11 x64 22H2 WebRender opt
  • 12 failures on Windows 11 x64 22H2 WebRender Shippable opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=424073238&repo=autoland&lineNumber=1758

Most of the intermittent failures are on 32-bit builds, by a wide margin (96 + 63 = 159 of the 207 failures are on Windows 11 x86). I think tackling those should be the next priority.

Priority: P1 → P2
Assignee: egubler → nical.bugzilla

This failure has become a permafailure; there have been 269 failures in the last 7 days.
Nicolas, could you please take a look?

Flags: needinfo?(nical.bugzilla)
Summary: Frequent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug → Perma mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug

:NarcisB: Could you elaborate on which signatures turned perma? This bug aggregates numerous signatures, and part of our work has been to isolate them into separate, well-understood bugs.

:ErichDonGubler, I did a few backfills & retriggers and all of them failed

This seems to be permafailing only with this failure line: TEST-UNEXPECTED-FAIL | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxBufferSize" - assert_unreached:
Here are some examples showing that this is not permafailing on all WebGPU jobs or on other signatures:

Snippet from permafailing job:

[task 2023-08-07T21:20:44.729Z] 21:20:44     INFO - TEST-START | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:*
[task 2023-08-07T21:20:44.734Z] 21:20:44     INFO - Closing window 512136e6-ed72-4bd1-8c7d-4fe09779f824
[task 2023-08-07T21:20:51.636Z] 21:20:51     INFO - .......
[task 2023-08-07T21:20:51.636Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxSampledTexturesPerShaderStage" 
[task 2023-08-07T21:20:51.636Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxSamplersPerShaderStage" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxStorageBuffersPerShaderStage" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxStorageTexturesPerShaderStage" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxUniformBuffersPerShaderStage" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxUniformBufferBindingSize" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxStorageBufferBindingSize" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="minUniformBufferOffsetAlignment" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="minStorageBufferOffsetAlignment" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-PASS | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxVertexBuffers" 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO - TEST-UNEXPECTED-FAIL | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | :limit="maxBufferSize" - assert_unreached: 
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -   - EXCEPTION: requestDevice: Request for limit 'maxBufferSize' must be <= supported 0, was 268435455.
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -     subcase: mul=1;add=-1
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -     @https://web-platform.test:8443/_mozilla/webgpu/webgpu/api/operation/adapter/requestDevice.spec.js:263:36
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -   - INFO: subcase: mul=1;add=-1
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -     OK
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -   - EXCEPTION: requestDevice: Request for limit 'maxBufferSize' must be <= supported 0, was 268435356.
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -     subcase: mul=1;add=-100
[task 2023-08-07T21:20:51.637Z] 21:20:51     INFO -     @https://web-platform.test:8443/_mozilla/webgpu/webgpu/api/operation/adapter/requestDevice.spec.js:263:36
[task 2023-08-07T21:20:51.638Z] 21:20:51     INFO -   - INFO: subcase: mul=1;add=-100
[task 2023-08-07T21:20:51.638Z] 21:20:51     INFO -     OK
[task 2023-08-07T21:20:51.638Z] 21:20:51     INFO -  Reached unreachable code
[task 2023-08-07T21:20:51.638Z] 21:20:51     INFO - wpt_fn@https://web-platform.test:8443/_mozilla/webgpu/common/runtime/wpt.js:65:25
[task 2023-08-07T21:20:51.643Z] 21:20:51     INFO - ...........
[task 2023-08-07T21:20:51.643Z] 21:20:51     INFO - TEST-OK | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,adapter,requestDevice:limit,worse_than_default:* | took 6909ms
[task 2023-08-07T21:20:51.645Z] 21:20:51     INFO - PID 2172 | 1691443251640	Marionette	INFO	Stopped listening on port 53094
[task 2023-08-07T21:20:52.327Z] 21:20:52     INFO - PID 2172 | console.error: ({})
[task 2023-08-07T21:20:52.636Z] 21:20:52     INFO - Browser exited with return code 0
[task 2023-08-07T21:20:52.637Z] 21:20:52     INFO - Closing logging queue
[task 2023-08-07T21:20:52.637Z] 21:20:52     INFO - queue closed
[task 2023-08-07T21:20:52.841Z] 21:20:52     INFO - Application command: Z:\task_169144156238687\build\application\firefox\firefox.exe -marionette about:blank --wait-for-browser -profile C:\Users\task_169144156238687\AppData\Local\Temp\tmpxza982he
[task 2023-08-07T21:20:52.852Z] 21:20:52     INFO - PID 3652 | 1691443213094	Marionette	INFO	Marionette enabled
[task 2023-08-07T21:20:52.853Z] 21:20:52     INFO - PID 3652 | 1691443213190	Marionette	INFO	Listening on port 53095
[task 2023-08-07T21:20:52.853Z] 21:20:52     INFO - PID 3652 | JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
[task 2023-08-07T21:20:52.854Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-08-07T21:20:52.855Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-08-07T21:20:52.855Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-08-07T21:20:52.856Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-08-07T21:20:52.857Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 399))
[task 2023-08-07T21:20:52.857Z] 21:20:52     INFO - PID 3652 | console.error: (new Error("Polling for changes failed: Unexpected content-type \"text/plain;charset=US-ASCII\".", "resource://services-settings/remote-settings.sys.mjs", 321))
[task 2023-08-07T21:20:52.857Z] 21:20:52     INFO - Starting runner
[task 2023-08-07T21:20:54.217Z] 21:20:54     INFO - Installing extension from Z:\task_169144156238687\build\tests\extensions\specialpowers@mozilla.org.xpi
[task 2023-08-07T21:20:54.473Z] 21:20:54     INFO - TEST-START | /_mozilla/webgpu/chunked/1/cts.https.html?q=webgpu:api,operation,buffers,map:mapAsync,write:*
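The two EXCEPTION lines in the snippet can be reproduced with simple arithmetic. The following is a minimal TypeScript sketch, assuming (the log does not confirm this) that each subcase requests `default * mul + add` and that the WebGPU default for `maxBufferSize` is 268435456 (256 MiB). `requestedLimit` and `checkLimit` are hypothetical helpers, not CTS or Firefox functions. The sketch shows that both requested values are below the default, which an adapter's supported limits are expected to meet or exceed, so they are rejected only because the adapter reported a supported `maxBufferSize` of 0.

```typescript
// Assumption: WebGPU default for maxBufferSize (256 MiB).
const DEFAULT_MAX_BUFFER_SIZE = 268435456;

// Hypothetical helper: the value a "worse_than_default" subcase would request,
// assuming it is computed as default * mul + add (the log prints "mul=1;add=-1").
function requestedLimit(defaultValue: number, mul: number, add: number): number {
  return defaultValue * mul + add;
}

// Hypothetical helper mirroring the "must be <= supported" check seen in the log.
// Returns an error message if the request exceeds the supported value, else null.
function checkLimit(name: string, requested: number, supported: number): string | null {
  if (requested > supported) {
    return `Request for limit '${name}' must be <= supported ${supported}, was ${requested}.`;
  }
  return null;
}

// In the failing runs, the adapter reported 0 as its supported maxBufferSize.
const supportedByAdapter = 0;

for (const add of [-1, -100]) {
  const requested = requestedLimit(DEFAULT_MAX_BUFFER_SIZE, 1, add);
  console.log(checkLimit("maxBufferSize", requested, supportedByAdapter));
}
```

Running it prints the same two messages as the EXCEPTION lines in the log; with a supported value of at least 268435456, both checks return `null`, which points at the adapter's reported limit rather than at the test itself.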

Hi Erich! You may want to take a look at the details I've mentioned above to get a clearer picture.
Also, should we file a separate bug for the permafailure?

Flags: needinfo?(egubler)
Flags: needinfo?(nical.bugzilla)

Filed Bug 1847771 to make the permafailure easier to track and to avoid skewing the orange factor on this bug. I've managed to find what caused it.

Summary: Perma mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug → Frequent mozilla/tests/webgpu/chunked/<number>/cts.https.html | single tracking bug
Flags: needinfo?(egubler)

Update

There have been 164 failures within the last 7 days:

  • 46 failures on Windows 11 x86 22H2 WebRender opt/debug
  • 33 failures on Windows 11 x86 22H2 WebRender Shippable opt
  • 50 failures on Windows 11 x64 22H2 WebRender debug
  • 35 failures on Windows 11 x64 22H2 WebRender Shippable opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=427155783&repo=mozilla-central&lineNumber=7277

Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended][stockwell needswork:owner]