1713300 - Permafailing tier 2 dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Gathering did not complete within 2000 ms with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential"

Reporter

Description

•

3 years ago

treeherder

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=341046795&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d1SecatzS0uUD_Kchb4ihg/runs/0/artifacts/public/logs/live_backing.log

[task 2021-05-28T09:51:15.028Z] 09:51:15     INFO - ICE Server: 
[task 2021-05-28T09:51:15.034Z] 09:51:15     INFO - ICE Server: Unhandled Error
[task 2021-05-28T09:51:15.040Z] 09:51:15     INFO - ICE Server: Traceback (most recent call last):
[task 2021-05-28T09:51:15.046Z] 09:51:15     INFO - ICE Server:   File "iceserver\iceserver.py", line 999, in <module>
[task 2021-05-28T09:51:15.052Z] 09:51:15     INFO - ICE Server:     reactor.run()
[task 2021-05-28T09:51:15.060Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\base.py", line 1283, in run
[task 2021-05-28T09:51:15.066Z] 09:51:15     INFO - ICE Server:     self.mainLoop()
[task 2021-05-28T09:51:15.075Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\base.py", line 1295, in mainLoop
[task 2021-05-28T09:51:15.082Z] 09:51:15     INFO - ICE Server:     self.doIteration(t)
[task 2021-05-28T09:51:15.090Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\selectreactor.py", line 143, in doSelect
[task 2021-05-28T09:51:15.097Z] 09:51:15     INFO - ICE Server:     _logrun(selectable, _drdw, selectable, method)
[task 2021-05-28T09:51:15.103Z] 09:51:15     INFO - ICE Server: --- <exception caught here> ---
[task 2021-05-28T09:51:15.111Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
[task 2021-05-28T09:51:15.117Z] 09:51:15     INFO - ICE Server:     return callWithContext({"system": lp}, func, *args, **kw)
[task 2021-05-28T09:51:15.125Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
[task 2021-05-28T09:51:15.132Z] 09:51:15     INFO - ICE Server:     return context.call({ILogContext: newCtx}, func, *args, **kw)
[task 2021-05-28T09:51:15.140Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
[task 2021-05-28T09:51:15.147Z] 09:51:15     INFO - ICE Server:     return self.currentContext().callWithContext(ctx, func, *args, **kw)
[task 2021-05-28T09:51:15.155Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
[task 2021-05-28T09:51:15.161Z] 09:51:15     INFO - ICE Server:     return func(*args,**kw)
[task 2021-05-28T09:51:15.169Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\selectreactor.py", line 154, in _doReadOrWrite
[task 2021-05-28T09:51:15.176Z] 09:51:15     INFO - ICE Server:     self._disconnectSelectable(selectable, why, method=="doRead")
[task 2021-05-28T09:51:15.185Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\posixbase.py", line 252, in _disconnectSelectable
[task 2021-05-28T09:51:15.190Z] 09:51:15     INFO - Buffered messages logged at 09:51:14
[task 2021-05-28T09:51:15.196Z] 09:51:15     INFO - ICE Server:     selectable.readConnectionLost(f)
[task 2021-05-28T09:51:15.204Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\tcp.py", line 307, in readConnectionLost
[task 2021-05-28T09:51:15.210Z] 09:51:15     INFO - ICE Server:     self.connectionLost(reason)
[task 2021-05-28T09:51:15.218Z] 09:51:15     INFO - ICE Server:   File "C:\tasks\task_1622189839\build\venv\lib\site-packages\twisted\internet\tcp.py", line 327, in connectionLost
[task 2021-05-28T09:51:15.224Z] 09:51:15     INFO - ICE Server:     protocol.connectionLost(reason)
[task 2021-05-28T09:51:15.231Z] 09:51:15     INFO - ICE Server: builtins.AttributeError: 'TcpStunRedirectHandler' object has no attribute 'connectionLost'
[task 2021-05-28T09:51:15.237Z] 09:51:15     INFO - ICE Server: 
[task 2021-05-28T09:51:15.242Z] 09:51:15     INFO - Buffered messages finished
[task 2021-05-28T09:51:15.255Z] 09:51:15     INFO - TEST-UNEXPECTED-FAIL | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Gathering did not complete within 2000 ms with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}] 
[task 2021-05-28T09:51:15.256Z] 09:51:15     INFO -     SimpleTest.ok@SimpleTest/SimpleTest.js:417:16
[task 2021-05-28T09:51:15.257Z] 09:51:15     INFO -     gatherWithTimeout@dom/media/webrtc/tests/mochitests/iceTestUtils.js:51:7
[task 2021-05-28T09:51:15.262Z] 09:51:15     INFO - candidates: []
[task 2021-05-28T09:51:15.268Z] 09:51:15     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-05-28T09:51:15.282Z] 09:51:15     INFO - TEST-UNEXPECTED-FAIL | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | Should have two relay candidates with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}] - got +0, expected 2
[task 2021-05-28T09:51:15.283Z] 09:51:15     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-05-28T09:51:15.284Z] 09:51:15     INFO -     checkRelayTcp@dom/media/webrtc/tests/mochitests/iceTestUtils.js:164:5
[task 2021-05-28T09:51:15.292Z] 09:51:15     INFO - GECKO(7004) | [Child 14792: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2778: GetStats: Found no pipelines matching selector.
[task 2021-05-28T09:51:15.302Z] 09:51:15     INFO - TEST-PASS | dom/media/webrtc/tests/mochitests/test_peerConnection_gatherWithStun300.html | No UDP relay candidates should be present with redirect rule: => iceServers: [{"urls":["turn:10.7.204.44:3479?transport=tcp"],"username":"foo","credential":"bar"}]

Natalia Csoregi [:nataliaCs]

Comment 1

•

3 years ago

Caused by Bug 857668.
Byron, could you help us out with this one?
Thank you.

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Assignee

Comment 2

•

3 years ago

At first blush, it looks like the machine this is running on is simply too slow to run the test case in the time allotted. I'll look into it today.

Assignee: nobody → docfaraday

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Assignee

Comment 3

•

3 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=26de5e0c06f8404db3154386c1742937161cac11

Byron Campen [:bwc]

Assignee

Comment 4

•

3 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=253c0f5ae7f1b588ab23b8045f0148e5df8b1154

Byron Campen [:bwc]

Assignee

Comment 5

•

3 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c80902d2f0873d4e83aba337ff1f198597608fa6

Comment hidden (Intermittent Failures Robot)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 7

•

3 years ago

I feared 2000 ms was too aggressive.

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

3 years ago

Set release status flags based on info from the regressing bug 857668

status-firefox89: --- → unaffected

status-firefox90: --- → affected

status-firefox91: --- → affected

status-firefox-esr78: --- → unaffected

Byron Campen [:bwc]

Assignee

Comment 9

•

3 years ago

Every time I try to run test-windows10-aarch64/opt-mochitest-media-gli-e10s on try, it gets an exception. Trying the base revision to see if somehow the fixes I'm trying are the cause.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5add3afa72214cef193a9f690df6732e76c7cb6a

Edit: mochitest job has been queued for 1413 minutes. What's going on here?

Byron Campen [:bwc]

Assignee

Updated

•

3 years ago

Flags: needinfo?(ahal)

Byron Campen [:bwc]

Assignee

Comment 10

•

3 years ago

I can't fix this if I can't run our test suite at all on try.

Keywords: stalled

Andrew Halberstadt [:ahal]

Comment 11

•

3 years ago

I think that Windows aarch64 platform is very limited and it looks like in this case there weren't any workers available to claim the task before the task deadline was exceeded. Actually looks like this pool is labeled "experimental" and consists of only a single laptop (one of the two workers listed there is quarantined)?

Is this pool try only or something? I'm not really sure how a tier-2 task could be running on a pool with a single machine... But yeah, basically there's no capacity in this pool.

Flags: needinfo?(ahal)

Byron Campen [:bwc]

Assignee

Comment 12

•

3 years ago

So, I guess the thing to do is disable this test entirely on Windows aarch64 until we have adequate resources for try. Is there a tracking bug I can hook things off of?

Flags: needinfo?(ahal)

Byron Campen [:bwc]

Assignee

Comment 13

•

3 years ago

I guess I could also extend the timeout and see if that helps on m-c, before taking the step of disabling entirely.

Byron Campen [:bwc]

Assignee

Updated

•

3 years ago

Keywords: stalled → leave-open

Byron Campen [:bwc]

Assignee

Comment 14

•

3 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b57a78a8f98bb5cdda2b7873e542ab47a3c5c761

Byron Campen [:bwc]

Assignee

Comment 15

•

3 years ago

Attached file Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib (deleted) — Details

Byron Campen [:bwc]

Assignee

Comment 16

•

3 years ago

Ok. Let's land the patch in comment 15, and wait to see how that effects the failure rate. We might consider extending the timeout some more if it continues to occur, but too much and our ability to detect regressions will be compromised. We'll leave this bug open until we determine that the problem is fixed, or we determine that we aren't willing to relax timeouts any further and disable the test.

Andrew Halberstadt [:ahal]

Comment 17

•

3 years ago

I'm not aware of any tracking bug for this pool. With so few machines in this pool, it's likely we need to be very intentional about what we run on them, so if you don't see any particular value running these tests with Windows aarch64, then removing these tasks might be the best way forward.

Flags: needinfo?(ahal)

Byron Campen [:bwc]

Assignee

Comment 18

•

3 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #17)

I'm not aware of any tracking bug for this pool. With so few machines in this pool, it's likely we need to be very intentional about what we run on them, so if you don't see any particular value running these tests with Windows aarch64, then removing these tasks might be the best way forward.

I'm just looking for a bug to make a dependency of "re-enable this test on aarch64". Is there a "promote aarch64 to tier 1" bug anywhere?

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 20

•

3 years ago

Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/23f53e272191 Extend this timeout to give Windows aarch64 more time. r=jib

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 21

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/23f53e272191

Comment hidden (Intermittent Failures Robot)

Byron Campen [:bwc]

Assignee

Comment 23

•

3 years ago

Relaxing the timeout does not seem to have stopped the failures, but they seem to be happening in later, more complicated subtests (the ones that do both UDP and TCP). Maybe the thing to do is selectively relax the timeouts on aarch64?

Byron Campen [:bwc]

Assignee

Comment 24

•

3 years ago

Attached file Bug 1713300: Extend this timeout even more to give Windows aarch64 more time. r?jib (deleted) — Details

Pulsebot

Comment 25

•

3 years ago

Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/72de1a61b749 Extend this timeout even more to give Windows aarch64 more time. r=jib

Cristian Brindusan [:cbrindusan]

Comment 26

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/72de1a61b749

Comment hidden (Intermittent Failures Robot)

Byron Campen [:bwc]

Assignee

Comment 28

•

3 years ago

Comment on attachment 9225197 [details]
Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib

Beta/Release Uplift Approval Request

User impact if declined: Frequent oranges.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Just a test change, and a simple one at that.
String changes made/needed: None

Attachment #9225197 - Flags: approval-mozilla-beta?

Byron Campen [:bwc]

Assignee

Updated

•

3 years ago

Attachment #9226916 - Flags: approval-mozilla-beta?

Byron Campen [:bwc]

Assignee

Comment 29

•

3 years ago

Seems to only be happening on m-b now. This platform is just really slow it seems.

Julien Cristau [:jcristau]

Comment 30

•

3 years ago

Does this bug need to stay open at this point, or should we close it as fixed?

Flags: needinfo?(docfaraday)

Whiteboard: [checkin-needed-beta]

Julien Cristau [:jcristau]

Comment 31

•

3 years ago

Comment on attachment 9225197 [details]
Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib

This can land a=test-only

Attachment #9225197 - Flags: approval-mozilla-beta?

Julien Cristau [:jcristau]

Updated

•

3 years ago

Attachment #9226916 - Flags: approval-mozilla-beta?

Byron Campen [:bwc]

Assignee

Comment 32

•

3 years ago

Let's leave this open another week to see if the oranges are fully gone. Leaving ni on self.

Julien Cristau [:jcristau]

Comment 33

•

3 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/01a9bd0a89c8
https://hg.mozilla.org/releases/mozilla-beta/rev/86e66cc94c21

status-firefox90: affected → fixed

Whiteboard: [checkin-needed-beta]

Byron Campen [:bwc]

Assignee

Comment 34

•

3 years ago

Yep, this looks fixed to me.

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox91: affected → fixed

Flags: needinfo?(docfaraday)

Keywords: leave-open

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

BMO Automation

Updated

•

3 years ago

Has Regression Range: --- → yes

Bug 1713300: Extend this timeout to give Windows aarch64 more time. r?jib 3 years ago Byron Campen [:bwc] (deleted), text/x-phabricator-request		Details
Bug 1713300: Extend this timeout even more to give Windows aarch64 more time. r?jib 3 years ago Byron Campen [:bwc] (deleted), text/x-phabricator-request		Details