Permafailing tier 2 dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Gathering did not complete within 2000 ms with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential"
Categories
(Core :: WebRTC: Audio/Video, defect, P5)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox-esr78 | --- | unaffected |
| firefox89 | --- | unaffected |
| firefox90 | --- | fixed |
| firefox91 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: bwc)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(2 files)
Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=341046795&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d1SecatzS0uUD_Kchb4ihg/runs/0/artifacts/public/logs/live_backing.log
[task 2021-05-28T09:51:15.028Z] 09:51:15 INFO - ICE Server:
[task 2021-05-28T09:51:15.034Z] 09:51:15 INFO - ICE Server: Unhandled Error
[task 2021-05-28T09:51:15.040Z] 09:51:15 INFO - ICE Server: Traceback (most recent call last):
[task 2021-05-28T09:51:15.046Z] 09:51:15 INFO - ICE Server: File "iceserver\iceserver.py", line 999, in <module>
[task 2021-05-28T09:51:15.052Z] 09:51:15 INFO - ICE Server: reactor.run()
[task 2021-05-28T09:51:15.060Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\base.py", line 1283, in run
[task 2021-05-28T09:51:15.066Z] 09:51:15 INFO - ICE Server: self.mainLoop()
[task 2021-05-28T09:51:15.075Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\base.py", line 1295, in mainLoop
[task 2021-05-28T09:51:15.082Z] 09:51:15 INFO - ICE Server: self.doIteration(t)
[task 2021-05-28T09:51:15.090Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\selectreactor.py", line 143, in doSelect
[task 2021-05-28T09:51:15.097Z] 09:51:15 INFO - ICE Server: _logrun(selectable, _drdw, selectable, method)
[task 2021-05-28T09:51:15.103Z] 09:51:15 INFO - ICE Server: --- <exception caught here> ---
[task 2021-05-28T09:51:15.111Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
[task 2021-05-28T09:51:15.117Z] 09:51:15 INFO - ICE Server: return callWithContext({"system": lp}, func, *args, **kw)
[task 2021-05-28T09:51:15.125Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
[task 2021-05-28T09:51:15.132Z] 09:51:15 INFO - ICE Server: return context.call({ILogContext: newCtx}, func, *args, **kw)
[task 2021-05-28T09:51:15.140Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
[task 2021-05-28T09:51:15.147Z] 09:51:15 INFO - ICE Server: return self.currentContext().callWithContext(ctx, func, *args, **kw)
[task 2021-05-28T09:51:15.155Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
[task 2021-05-28T09:51:15.161Z] 09:51:15 INFO - ICE Server: return func(*args,**kw)
[task 2021-05-28T09:51:15.169Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\selectreactor.py", line 154, in _doReadOrWrite
[task 2021-05-28T09:51:15.176Z] 09:51:15 INFO - ICE Server: self._disconnectSelectable(selectable, why, method=="doRead")
[task 2021-05-28T09:51:15.185Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\posixbase.py", line 252, in _disconnectSelectable
[task 2021-05-28T09:51:15.190Z] 09:51:15 INFO - Buffered messages logged at 09:51:14
[task 2021-05-28T09:51:15.196Z] 09:51:15 INFO - ICE Server: selectable.readConnectionLost(f)
[task 2021-05-28T09:51:15.204Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\tcp.py", line 307, in readConnectionLost
[task 2021-05-28T09:51:15.210Z] 09:51:15 INFO - ICE Server: self.connectionLost(reason)
[task 2021-05-28T09:51:15.218Z] 09:51:15 INFO - ICE Server: File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\tcp.py", line 327, in connectionLost
[task 2021-05-28T09:51:15.224Z] 09:51:15 INFO - ICE Server: protocol.connectionLost(reason)
[task 2021-05-28T09:51:15.231Z] 09:51:15 INFO - ICE Server: builtins.AttributeError: 'TcpStunRedirectHandler' object has no attribute 'connectionLost'
[task 2021-05-28T09:51:15.237Z] 09:51:15 INFO - ICE Server:
[task 2021-05-28T09:51:15.242Z] 09:51:15 INFO - Buffered messages finished
[task 2021-05-28T09:51:15.255Z] 09:51:15 INFO - TEST-UNEXPECTED-FAIL | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Gathering did not complete within 2000 ms with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}]
[task 2021-05-28T09:51:15.256Z] 09:51:15 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:417:16
[task 2021-05-28T09:51:15.257Z] 09:51:15 INFO - gatherWithTimeout@dom/media/webrtc/tests/mochitests/iceTestUtils.js:51:7
[task 2021-05-28T09:51:15.262Z] 09:51:15 INFO - candidates: []
[task 2021-05-28T09:51:15.268Z] 09:51:15 INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-05-28T09:51:15.282Z] 09:51:15 INFO - TEST-UNEXPECTED-FAIL | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Should have two relay candidates with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}] - got +0, expected 2
[task 2021-05-28T09:51:15.283Z] 09:51:15 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-05-28T09:51:15.284Z] 09:51:15 INFO - checkRelayTcp@dom/media/webrtc/tests/mochitests/iceTestUtils.js:164:5
[task 2021-05-28T09:51:15.292Z] 09:51:15 INFO - GECKO(7004) | [Child 14792: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2778: GetStats: Found no pipelines matching selector.
[task 2021-05-28T09:51:15.302Z] 09:51:15 INFO - TEST-PASS | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | No UDP relay candidates should be present with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}]
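For context on the failing assertion: the test races ICE gathering against a timer, and the "Gathering did not complete within 2000 ms" message fires when the timer wins. Below is a minimal sketch of that pattern; it is an illustration only, and the actual gatherWithTimeout helper in dom/media/webrtc/tests/mochitests/iceTestUtils.js may be structured differently.

```js
// Sketch only: not the actual iceTestUtils.js code. Gathers ICE candidates
// for the given iceServers and resolves false if gathering does not finish
// within timeoutMs (2000 ms in the failure above).
async function gatherWithTimeoutSketch(iceServers, timeoutMs = 2000) {
  const pc = new RTCPeerConnection({ iceServers });
  const candidates = [];

  const gatheringDone = new Promise(resolve => {
    pc.onicecandidate = e => {
      if (e.candidate) {
        candidates.push(e.candidate);
      } else {
        // A null candidate signals end-of-gathering.
        resolve(true);
      }
    };
  });

  const timedOut = new Promise(resolve =>
    setTimeout(() => resolve(false), timeoutMs));

  // Kick off gathering by creating a local description.
  pc.createDataChannel("probe");
  await pc.setLocalDescription(await pc.createOffer());

  const completed = await Promise.race([gatheringDone, timedOut]);
  pc.close();
  return { completed, candidates };
}
```

The "Should have two relay candidates" check that follows in the log then inspects the gathered candidates for relay-type entries.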
Comment 1 • 3 years ago
Caused by Bug 857668.
Byron, could you help us out with this one?
Thank you.
Comment 2 (Assignee) • 3 years ago
At first blush, it looks like the machine this is running on is simply too slow to run the test case in the time allotted. I'll look into it today.
Comment 3 (Assignee) • 3 years ago
Comment 4 (Assignee) • 3 years ago
Comment 5 (Assignee) • 3 years ago
Comment hidden (Intermittent Failures Robot)
Comment 7 • 3 years ago
I feared 2000 ms was too aggressive.
Comment 8 • 3 years ago
Set release status flags based on info from the regressing bug 857668
Comment 9 (Assignee) • 3 years ago
Every time I try to run test-windows10-aarch64/opt-mochitest-media-gli-e10s on try, it gets an exception. Trying the base revision to see if somehow the fixes I'm trying are the cause.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5add3afa72214cef193a9f690df6732e76c7cb6a
Edit: mochitest job has been queued for 1413 minutes. What's going on here?
Comment 10 (Assignee) • 3 years ago
I can't fix this if I can't run our test suite at all on try.
Comment 11 • 3 years ago
I think the Windows aarch64 platform is very limited, and it looks like in this case there weren't any workers available to claim the task before the task deadline was exceeded. Actually, it looks like this pool is labeled "experimental" and consists of only a single laptop (one of the two workers listed there is quarantined)?
Is this pool try-only or something? I'm not really sure how a tier-2 task could be running on a pool with a single machine... But yeah, basically there's no capacity in this pool.
Comment 12 (Assignee) • 3 years ago
So, I guess the thing to do is disable this test entirely on Windows aarch64 until we have adequate resources for try. Is there a tracking bug I can hook things off of?
Comment 13 (Assignee) • 3 years ago
I guess I could also extend the timeout and see if that helps on m-c, before taking the step of disabling entirely.
Comment 14 (Assignee) • 3 years ago
Comment 15 (Assignee) • 3 years ago
Comment 16 (Assignee) • 3 years ago
Ok. Let's land the patch in comment 15 and wait to see how that affects the failure rate. We might consider extending the timeout some more if it continues to occur, but if we extend it too much, our ability to detect regressions will be compromised. We'll leave this bug open until we determine that the problem is fixed, or until we decide that we aren't willing to relax timeouts any further and disable the test.
Comment 17 • 3 years ago
I'm not aware of any tracking bug for this pool. With so few machines in this pool, it's likely we need to be very intentional about what we run on them, so if you don't see any particular value running these tests with Windows aarch64, then removing these tasks might be the best way forward.
Comment 18 (Assignee) • 3 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #17)
> I'm not aware of any tracking bug for this pool. With so few machines in this pool, it's likely we need to be very intentional about what we run on them, so if you don't see any particular value running these tests with Windows aarch64, then removing these tasks might be the best way forward.
I'm just looking for a bug to make a dependency of "re-enable this test on aarch64". Is there a "promote aarch64 to tier 1" bug anywhere?
Comment hidden (Intermittent Failures Robot)
Comment 20 • 3 years ago
Comment 21 • 3 years ago
bugherder
Comment hidden (Intermittent Failures Robot)
Comment 23 (Assignee) • 3 years ago
Relaxing the timeout does not seem to have stopped the failures, but they seem to be happening in later, more complicated subtests (the ones that do both UDP and TCP). Maybe the thing to do is selectively relax the timeouts on aarch64?
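One way to do that selectively is sketched below; the isSlowPlatform flag and the multiplier are assumptions for illustration, not the actual test code. The idea is to scale the base timeout on slow hardware rather than raising it everywhere, so fast platforms keep the tight bound that catches real regressions.

```js
// Hypothetical sketch: scale the gather timeout on slow platforms only.
const BASE_GATHER_TIMEOUT_MS = 2000;

function gatherTimeoutMs(isSlowPlatform) {
  // e.g. the Windows aarch64 pool would pass isSlowPlatform = true.
  return isSlowPlatform ? 4 * BASE_GATHER_TIMEOUT_MS : BASE_GATHER_TIMEOUT_MS;
}

// Usage with the gather sketch shown earlier:
//   await gatherWithTimeoutSketch(iceServers, gatherTimeoutMs(true));
```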
Comment 24 (Assignee) • 3 years ago
Comment 25 • 3 years ago
Comment 26 • 3 years ago
bugherder
Comment hidden (Intermittent Failures Robot)
Comment 28 (Assignee) • 3 years ago
Comment on attachment 9225197 [details]
Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib
Beta/Release Uplift Approval Request
- User impact if declined: Frequent oranges.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Just a test change, and a simple one at that.
- String changes made/needed: None
Comment 29 (Assignee) • 3 years ago
Seems to only be happening on m-b (mozilla-beta) now; this platform is just really slow.
Comment 30 • 3 years ago
Does this bug need to stay open at this point, or should we close it as fixed?
Comment 31 • 3 years ago
Comment on attachment 9225197 [details]
Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib
This can land a=test-only
Comment 32 (Assignee) • 3 years ago
Let's leave this open another week to see if the oranges are fully gone. Leaving ni on self.
Comment 33 • 3 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/01a9bd0a89c8
https://hg.mozilla.org/releases/mozilla-beta/rev/86e66cc94c21
Comment 34 (Assignee) • 3 years ago
Yep, this looks fixed to me.