Closed
Bug 1037618
Opened 10 years ago
Closed 7 years ago
Intermittent ice_unittest | test failed with return code 1
Categories
(Core :: WebRTC, defect, P4)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox33 | --- | disabled |
firefox34 | --- | disabled |
firefox35 | --- | disabled |
firefox40 | --- | wontfix |
firefox41 | --- | affected |
firefox42 | --- | affected |
firefox43 | --- | affected |
firefox-esr31 | --- | disabled |
firefox-esr38 | --- | affected |
b2g-v1.4 | --- | disabled |
b2g-v2.0 | --- | disabled |
b2g-v2.1 | --- | disabled |
b2g-v2.2 | --- | disabled |
b2g-master | --- | affected |
backlog | webrtc/webaudio+ |
People
(Reporter: RyanVM, Assigned: drno)
References
(Depends on 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(4 files, 1 obsolete file)
https://tbpl.mozilla.org/php/getParsedLog.php?id=43622829&tree=Fx-Team
Rev4 MacOSX Snow Leopard 10.6 fx-team opt test cppunit on 2014-07-11 10:05:37 PDT for push ed9ec8fed9a4
slave: t-snow-r4-0025
10:08:29 INFO - CANDIDATES:
10:08:29 INFO - Stream P1
10:08:29 INFO - candidate:0 1 UDP 2128609535 10.26.56.189 59147 typ host
10:08:29 INFO - candidate:1 1 UDP 1692467199 63.245.214.82 48826 typ srflx raddr 10.26.56.189 rport 59147
10:08:30 INFO - [ OK ] IceGatherTest.TestGatherDNSStunServerIpAddress (1088 ms)
10:08:30 INFO - [ RUN ] IceGatherTest.TestGatherDNSStunServerHostname
10:08:30 INFO - Candidate initialized: candidate:0 1 UDP 2128609535 10.26.56.189 50841 typ host
10:08:37 INFO - /builds/slave/fx-team-osx64-0000000000000000/build/media/mtransport/test/ice_unittest.cpp:844: Failure
10:08:37 INFO - Value of: res
10:08:37 INFO - Actual: false
10:08:37 INFO - Expected: true
10:08:37 INFO - ICE(P1): No STUN servers specified
10:08:37 INFO - ICE(P1): No TURN servers specified
10:08:38 INFO - [ FAILED ] IceGatherTest.TestGatherDNSStunServerHostname (8008 ms)
10:08:38 INFO - [ RUN ] IceGatherTest.TestGatherDNSStunBogusHostname
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 11•10 years ago
|
||
These seem to be network-related; for some reason the STUN server that these tests are hitting is being unreliable.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 60•10 years ago
|
||
Looks like our stun server is being flaky again.
Reporter | ||
Comment 61•10 years ago
|
||
This appears to have gone off the rails across all trees today. Disabled until someone can figure out what's wrong with this test on OSX and ensure that it's written in a way that meets visibility standards.
https://hg.mozilla.org/mozilla-central/rev/d08c31df0c1a
https://hg.mozilla.org/releases/mozilla-aurora/rev/9199063d2e16
https://hg.mozilla.org/releases/mozilla-beta/rev/0225b61c4f71
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/f62cff5d1744
https://hg.mozilla.org/releases/mozilla-esr31/rev/1a78f56a046d
https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/767991019ae3
status-b2g-v1.4:
--- → disabled
status-b2g-v2.0:
--- → disabled
status-b2g-v2.1:
--- → disabled
status-b2g-v2.2:
--- → disabled
status-firefox33:
--- → disabled
status-firefox34:
--- → disabled
status-firefox35:
--- → disabled
status-firefox-esr31:
--- → disabled
Whiteboard: [test disabled on OSX][leave open]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 71•10 years ago
|
||
I think having these test being disabled on Mac OSX is pretty bad. The solution is not pretty, but hopefully we at least get the tests back this way.
Attachment #8499814 -
Flags: review?(docfaraday)
Comment 72•10 years ago
|
||
Comment on attachment 8499814 [details] [diff] [review]
bug_1037618_fix_ice_unitests_for_osx.patch
Review of attachment 8499814 [details] [diff] [review]:
-----------------------------------------------------------------
*sadface*
Maybe we should instead just use the test STUN server for all of this?
Comment 73•10 years ago
|
||
Hmm, looking at some of the more recent failures, I see evidence that EXPECT_TRUE_WAIT is not waiting for the full duration. We should be waiting 7 seconds for gathering to complete, but we do not seem to be waiting even 1. Maybe there's some gtest bug on OS X?
Comment 74•10 years ago
|
||
Ok, that EXPECT_TRUE_WAIT macro was something we wrote, based on PRIntervalTime. I do not see any obvious flaws in the implementation of the macro, have not evaluated the timer implementation stuff.
Comment 75•10 years ago
|
||
Tried Nils' patch, still seeing problems on 10.6, even with the test stun server. Something is rotten on those 10.6 boxes.
Comment 76•10 years ago
|
||
As far as I can tell, those 10.6 boxes have no network connectivity to the outside world at all.
Comment 78•10 years ago
|
||
Comment on attachment 8499814 [details] [diff] [review]
bug_1037618_fix_ice_unitests_for_osx.patch
Review of attachment 8499814 [details] [diff] [review]:
-----------------------------------------------------------------
Since this doesn't seem to help us any.
Attachment #8499814 -
Flags: review?(docfaraday)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
Rank: 35
Flags: firefox-backlog+
Priority: -- → P3
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
backlog: --- → webRTC+
Flags: firefox-backlog+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 99•9 years ago
|
||
Nils,
All of the recent tests here seem to be in the TCP tests.
Flags: needinfo?(drno)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 103•9 years ago
|
||
So for the test failure where TestSendReceiveTcp failed the problem seems to be that in these cases the TCP active connection gets advertised in the ICE candidate as coming from port 9. As we are obviously not sending from port 9 but some ephemeral number the receiving side marks that candidate as peer reflexive. Which then makes our tests unhappy, because they expect a host-to-host connection happening here.
Flags: needinfo?(drno)
Assignee | ||
Comment 104•9 years ago
|
||
I see two options for fixing the issue described in comment #103:
1) Open up the tests to accept peer reflexive, at least when TCP is used/expected
2) Teach nICEr that port 9 is actually never a port to expect traffic from, so don't change the candidate type if we notice the port mis-match.
I'm not sure if option 2) is better or could cause collateral damage. Ekr, bwc any thought/opinions on that?
Flags: needinfo?(ekr)
Flags: needinfo?(docfaraday)
Assignee | ||
Comment 105•9 years ago
|
||
Further more I see failures in TestGatherDNSStunServerHostnameTcp and TestGatherDNSStunServerHostnameTcp which I don't have answers for yet why they are failing.
In both cases the 'tcptype so' seem to be missing, but no log message why. I'm going to read some code and see if I can come up with an explanation for that.
Comment 107•9 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #106)
> I think for now, let's do #1
Agreed. That candidate is definitely a peer reflexive according to the definition, and is also the candidate we expect to see in that test.
Flags: needinfo?(docfaraday)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 109•9 years ago
|
||
Bug 1037618: relax candidate verification for TCP. r?bwc
Attachment #8646176 -
Flags: review?(docfaraday)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 112•9 years ago
|
||
Comment on attachment 8646176 [details]
MozReview Request: Bug 1037618: relax candidate verification for TCP. r=bwc
https://reviewboard.mozilla.org/r/15653/#review13997
::: media/mtransport/test/ice_unittest.cpp:705
(Diff revision 1)
> + if (expected_local_transport_==kNrIceTransportTcp &&
> + expected_remote_type_==NrIceCandidate::ICE_HOST) {
Spaces around ==
::: media/mtransport/test/ice_unittest.cpp:704
(Diff revision 1)
> + /* Special case for ICE TCP using port 9 for its active candidates */
This comment is probably too cryptic. Maybe something like "Active remote ICE TCP candidates will be discovered as peer reflexive, since they are trickled with port 9 as a placeholder. So, allow peer reflexive here."
Attachment #8646176 -
Flags: review?(docfaraday) → review+
Assignee | ||
Comment 113•9 years ago
|
||
Comment on attachment 8646176 [details]
MozReview Request: Bug 1037618: relax candidate verification for TCP. r=bwc
Bug 1037618: relax candidate verification for TCP. r=bwc
Attachment #8646176 -
Attachment description: MozReview Request: Bug 1037618: relax candidate verification for TCP. r?bwc → MozReview Request: Bug 1037618: relax candidate verification for TCP. r=bwc
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → drno
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 117•9 years ago
|
||
Try run is green for attachment 8646176 [details]
Keywords: checkin-needed,
leave-open
Comment 118•9 years ago
|
||
Keywords: checkin-needed
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 120•9 years ago
|
||
Assignee | ||
Comment 121•9 years ago
|
||
(In reply to Treeherder Robot from comment #119)
> log:
> https://treeherder.mozilla.org/logviewer.html#?repo=fx-team&job_id=4257884
> repository: fx-team
> start_time: 2015-08-14T11:54:10
> who: rvandermeulen[at]mozilla[dot]com
> machine: t-yosemite-r5-0017
> buildname: Rev5 MacOSX Yosemite 10.10 fx-team opt test cppunit
> revision: a303d8119dc8
>
> TEST-UNEXPECTED-FAIL | ice_unittest | test failed with return code 1
> Return code: 1
Ohoh looks like we are "loosing" candidate on the fly :-0
12:02:30 INFO - Candidate initialized: candidate:0 1 UDP 2130444543 10.26.56.83 61619 typ host
12:02:30 INFO - Candidate initialized: candidate:1 1 TCP 2105442559 10.26.56.83 56425 typ host tcptype passive
12:02:30 INFO - Candidate initialized: candidate:1 1 TCP 2105458943 10.26.56.83 9 typ host tcptype active
12:02:30 INFO - Candidate initialized: candidate:1 1 TCP 2105426175 10.26.56.83 64468 typ host tcptype so
12:02:30 INFO - Gathering complete for P1
12:02:30 INFO - CANDIDATES:
12:02:30 INFO - Stream P1
12:02:30 INFO - candidate:0 1 UDP 2130444543 10.26.56.83 61619 typ host
12:02:30 INFO - candidate:1 1 TCP 2105458943 10.26.56.83 9 typ host tcptype active
12:02:30 INFO - candidate:1 1 TCP 2105426175 10.26.56.83 64468 typ host tcptype so
12:02:30 INFO - Returning candidate: candidate:0 1 UDP 2130444543 10.26.56.83 61619 typ host
12:02:30 INFO - Returning candidate: candidate:1 1 TCP 2105458943 10.26.56.83 9 typ host tcptype active
12:02:30 INFO - Returning candidate: candidate:1 1 TCP 2105426175 10.26.56.83 64468 typ host tcptype so
12:02:30 INFO - /builds/slave/fx-team-m64-000000000000000000/build/src/media/mtransport/test/ice_unittest.cpp:1761: Failure
12:02:30 INFO - Value of: StreamHasMatchingCandidate(0, "tcptype passive")
12:02:30 INFO - Actual: false
12:02:30 INFO - Expected: true
Assignee | ||
Comment 122•9 years ago
|
||
Assignee | ||
Comment 123•9 years ago
|
||
Theory of what might be going on with missing or loosing candidates: for the tcpype active candidate we currently allocate a port number, but do not actually create a socket with/for that port number (sockets for active connection get allocated once a connection to a destination gets requested). Internally the port number is a random ephemeral number, and the number 9 we see in the test output only result from over-writing the port number before handing it off. What could be a problem here is that deduping might kill new candidates, e.g. tcptype so, or even removed previously created candidates like the missing tcptype passive.
I'll look into improving our code base in that area, probably in a separate bug.
Assignee | ||
Comment 124•9 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #123)
> I'll look into improving our code base in that area, probably in a separate
> bug.
Created bug 1195396 to follow up on this.
Depends on: 1195396
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Updated•9 years ago
|
status-b2g-master:
--- → affected
status-firefox40:
--- → wontfix
status-firefox41:
--- → affected
status-firefox42:
--- → affected
status-firefox43:
--- → affected
status-firefox-esr38:
--- → affected
Whiteboard: [test disabled on OSX][leave open]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 151•9 years ago
|
||
It looks like a DNS resolution is occasionally taking a long enough time to cause the test to time out. We could try extending the timeout and see if that helps. It could be argued that this test should not run in CI, since a DNS lookup relies on the external network.
Comment 152•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/31459/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/31459/
Comment 153•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #151)
> It looks like a DNS resolution is occasionally taking a long enough time to
> cause the test to time out. We could try extending the timeout and see if
> that helps. It could be argued that this test should not run in CI, since a
> DNS lookup relies on the external network.
Why would it be looking up an external network address? The test machines have no access to an external network, so any access to DNS should be local. If it's timing out, that implies the entire system stalled for a while, it was chewed up by higher-priority threads, or something is wrong with DNS (local server).
Flags: needinfo?(docfaraday)
Assignee | ||
Comment 154•9 years ago
|
||
Comment on attachment 8709504 [details]
MozReview Request: Bug 1037618: Extend DNS timeout for bogus STUN server tests.
https://reviewboard.mozilla.org/r/31459/#review28213
On a machine with working DNS resolution trying to resolve anything at .invalid resolve really fast. So I guess that means on machines were this takes long DNS resolution isn't working at all. So I'm wondering if a check for empty |g_stun_server_address| to disable these tests is more appropriate?
I'll leave the decision up to you.
Attachment #8709504 -
Flags: review+
Comment 155•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #153)
> (In reply to Byron Campen [:bwc] from comment #151)
> > It looks like a DNS resolution is occasionally taking a long enough time to
> > cause the test to time out. We could try extending the timeout and see if
> > that helps. It could be argued that this test should not run in CI, since a
> > DNS lookup relies on the external network.
>
> Why would it be looking up an external network address? The test machines
> have no access to an external network, so any access to DNS should be local.
> If it's timing out, that implies the entire system stalled for a while, it
> was chewed up by higher-priority threads, or something is wrong with DNS
> (local server).
We're using a bogus FQDN (a .invalid address) in a test to make sure the code-path works. But if that doesn't try to hit the wire, it may be some other problem...
Flags: needinfo?(docfaraday)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 159•9 years ago
|
||
Six of these seem to be due to failure to form a TCP connection in a reasonable amount of time; a similar problem as seen in bug 1201209 recently.
http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-macosx64/1456350708/fx-team_yosemite_r7_test-cppunit-bm106-tests1-macosx-build452.txt.gz
http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-macosx64/1456440145/fx-team_yosemite_r7_test-cppunit-bm107-tests1-macosx-build366.txt.gz
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-macosx64/1456590806/mozilla-inbound_snowleopard_test-cppunit-bm108-tests1-macosx-build2266.txt.gz
https://public-artifacts.taskcluster.net/FVDnFk4_Sf-DqktJ1MojIA/0/public/logs/live_backing.log
https://public-artifacts.taskcluster.net/Zj2ME_8UT06G3LsKjWvHsg/0/public/logs/live_backing.log
https://public-artifacts.taskcluster.net/ImdA3S28SoSL_hrZ-NLKOg/0/public/logs/live_backing.log
Two seem to be caused by TestStunServer::SetDropInitialPackets not working as expected.
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1456393484/mozilla-inbound_ubuntu64-asan_vm_test-cppunit-bm120-tests1-linux64-build59.txt.gz
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64-debug/1456242790/mozilla-central_ubuntu64_vm-debug_test-cppunit-bm67-tests1-linux64-build1.txt.gz
Two have two test-cases that look up an invalid FQDN, which both fail.
http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-linux64/1456520126/fx-team_ubuntu64_vm_test-cppunit-bm120-tests1-linux64-build5.txt.gz
http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-linux64/1456525679/fx-team_ubuntu64_vm_test-cppunit-bm116-tests1-linux64-build11.txt.gz
Two are due to a bug in the test code where duplicate candidates will cause some checking code to be sad.
http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-macosx64/1456325925/fx-team_snowleopard_test-cppunit-bm107-tests1-macosx-build1085.txt.gz
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-macosx64/1456241709/mozilla-inbound_snowleopard_test-cppunit-bm108-tests1-macosx-build2133.txt.gz
Comment 160•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #159)
>
> Two have two test-cases that look up an invalid FQDN, which both fail.
>
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-linux64/
> 1456520126/fx-team_ubuntu64_vm_test-cppunit-bm120-tests1-linux64-build5.txt.
> gz
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/fx-team-linux64/
> 1456525679/fx-team_ubuntu64_vm_test-cppunit-bm116-tests1-linux64-build11.txt.
> gz
>
Both of these failures occurred on linux64, less than a minute apart, in exactly the same way (DNS resolution for .invalid seems to hang). I suspect this is infra-related.
Comment 161•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/37445/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37445/
Attachment #8725365 -
Flags: review?(drno)
Updated•9 years ago
|
Attachment #8709504 -
Attachment is obsolete: true
Comment 162•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #159)
>
> Two seem to be caused by TestStunServer::SetDropInitialPackets not working
> as expected.
>
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-
> linux64-asan/1456393484/mozilla-inbound_ubuntu64-asan_vm_test-cppunit-bm120-
> tests1-linux64-build59.txt.gz
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-
> linux64-debug/1456242790/mozilla-central_ubuntu64_vm-debug_test-cppunit-bm67-
> tests1-linux64-build1.txt.gz
This is happening because a UDP port is being reused from an earlier test, and the test STUN server still has a packet count for that address which is large enough to exceed the packet drop count. We need to clear out these counts before running these tests.
Assignee | ||
Comment 163•9 years ago
|
||
Comment on attachment 8725365 [details]
MozReview Request: Bug 1037618 - Part 1: Add some logging to highlight TCP connection failures. r?drno
https://reviewboard.mozilla.org/r/37445/#review34215
::: media/mtransport/third_party/nICEr/src/net/nr_socket_multi_tcp.c:591
(Diff revision 1)
> + r_log(LOG_ICE,LOG_WARNING,"%s:%d %s failed to accept new TCP connection: %d",__FILE__,__LINE__,__FUNCTION__,_status);
Might be helpful to include |remote_addr.as_string| here as well if it is set already.
Attachment #8725365 -
Flags: review?(drno) → review+
Assignee | ||
Comment 164•9 years ago
|
||
Byron FYI on the TCP connection failures: I found the other day that not surprisingly all of the nICEr is not taking into account that establishing connections can take time. So in case of HTTP proxies connect() gets called, pretty soon followed by a call to write(). The write() returns R_WOULDBLOCK, which then results in the socket getting armed for write status. The OS then immediately returns that the socket in fact is writeable. Which then causes another call to write(). And then we go into something like a busy loop, until the proxy code either connects or gets failure.
I'm not sure if we have similar problems in general with TCP connections. But as we don't have a callback for when connections get established it seems plausible that also our TCP code in general might run into corner cases like this, where a busy loop on a small try machine might stall the test indefinitely.
Comment 165•9 years ago
|
||
https://reviewboard.mozilla.org/r/37445/#review34215
> Might be helpful to include |remote_addr.as_string| here as well if it is set already.
I thought about that, we'd either need to zero out |remote_addr| up top, or have another variable that tracks whether it has been filled in. As it stands, if this fails I'm primarily interested in the error, not so much who is the lucky winner.
Comment 166•9 years ago
|
||
Comment 167•9 years ago
|
||
bugherder |
Comment 168•7 years ago
|
||
Mass change P3->P4 to align with new Mozilla triage process.
Priority: P3 → P4
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Comment 169•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•