Closed
Bug 1117979
Opened 10 years ago
Closed 8 years ago
Intermittent test_location_error.js | xpcshell return code: 0 | run_test/< - [run_test/< : 39] 1 == 0
Categories
(Firefox :: General, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox35 | --- | unaffected |
firefox36 | --- | fixed |
firefox37 | --- | fixed |
firefox38 | --- | fixed |
firefox-esr31 | --- | unaffected |
People
(Reporter: RyanVM, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
Gavin
:
review+
|
Details | Diff | Splinter Review |
Two failures within minutes of each other on two different trees (spanning different Gecko revisions no less) = external dependencies maybe?
11:59:59 INFO - TEST-START | toolkit/components/search/tests/xpcshell/test_location_error.js
12:00:04 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/search/tests/xpcshell/test_location_error.js | xpcshell return code: 0
12:00:04 INFO - TEST-INFO took 4735ms
12:00:04 INFO - >>>>>>>
12:00:04 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
12:00:04 INFO - TEST-PASS | toolkit/components/search/tests/xpcshell/test_location_error.js | run_test - [run_test : 8] true == true
12:00:04 INFO - PROCESS | 5452 | *** Search: SearchService.init
12:00:04 INFO - PROCESS | 5452 | *** Search: metadata init: starting
12:00:04 INFO - (xpcshell/head.js) | test pending (2)
12:00:04 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
12:00:04 INFO - running event loop
...
12:00:04 INFO - TEST-PASS | toolkit/components/search/tests/xpcshell/test_location_error.js | run_test/< - [run_test/< : 34] 0 == 0
12:00:04 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/search/tests/xpcshell/test_location_error.js | run_test/< - [run_test/< : 39] 1 == 0
12:00:04 INFO - C:/slave/test/build/tests/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_location_error.js:run_test/<:39
12:00:04 INFO - resource://gre/components/nsSearchService.js:onSuccess:3982
12:00:04 INFO - resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:Handler.prototype.process:870
12:00:04 INFO - resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:this.PromiseWalker.walkerLoop:749
12:00:04 INFO - resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:this.PromiseWalker.scheduleWalkerLoop/<:691
12:00:04 INFO - C:\slave\test\build\tests\xpcshell\head.js:_do_main:184
12:00:04 INFO - C:\slave\test\build\tests\xpcshell\head.js:_execute_test:476
12:00:04 INFO - -e:null:1
12:00:04 INFO - null:null:0
12:00:04 INFO - exiting test
12:00:04 INFO - <<<<<<<
Flags: needinfo?(mhammond)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 3•10 years ago
|
||
A fairly trivial test-only fix.
This test is supposed to see a DNS error but instead is seeing a timeout. I speculate that this could happen due to the DNS request taking longer than ~2s to fail, causing the symptoms we see in the failure. The solution is (hopefully!) to increase the geoip timeout for this test.
Gavin, note that I also changed the server name - the test originally had:
// from server-locations.txt, we choose a URL without a cert.
let url = "https://nocert.example.com:443";
with the intention being that the test would see a certificate error. It turns out that the locations in server-locations.txt are *not* running for xpcshell tests, so what was actually happening is a DNS failure (the test infra doesn't complain about external DNS request, only about external socket connections).
So I changed it to:
// We configure for a server that's going to fail a DNS lookup.
let url = "https://xpcshell-test-does-not-exist.mozilla.org";
which I think it somewhat clearer and doesn't change the semantics. Note I've manually verified a certificate error *does* cause failure in the way we expect, but I haven't yet worked out how to sanely have an xpcshell test connect to a https server (at all, let alone one with a bad cert - httpd.js isn't any help with https)
Assignee: nobody → mhammond
Status: NEW → ASSIGNED
Flags: needinfo?(mhammond)
Attachment #8545703 -
Flags: review?(gavin.sharp)
Reporter | ||
Comment 4•10 years ago
|
||
(In reply to Mark Hammond [:markh] from comment #3)
> (the test infra doesn't complain about external DNS request, only
> about external socket connections).
I wonder how often this burns us in other tests.
Flags: needinfo?(nfroyd)
Comment 5•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #4)
> (In reply to Mark Hammond [:markh] from comment #3)
> > (the test infra doesn't complain about external DNS request, only
> > about external socket connections).
>
> I wonder how often this burns us in other tests.
No idea; then again, it's not clear how much win we really got out of disabling non-local network connections in the first place.
Bug 995417 comment 4 initially discouraged blocking things at the DNS level, maybe it's worth revisiting that assumption.
Flags: needinfo?(nfroyd)
Comment 6•10 years ago
|
||
Filed bug 1119246 for discussing DNS blocking.
Comment 7•10 years ago
|
||
Comment on attachment 8545703 [details] [diff] [review]
0001-Bug-1117979-fix-orange-by-increasing-the-geoip-timeo.patch
This seems like a fine bandaid, but doesn't really help if the DNS request ends up taking 21 seconds.
Wouldn't an invalid URI be a more reliable way to test the same error case?
Attachment #8545703 -
Flags: review?(gavin.sharp)
Comment 8•10 years ago
|
||
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #7)
> Comment on attachment 8545703 [details] [diff] [review]
> 0001-Bug-1117979-fix-orange-by-increasing-the-geoip-timeo.patch
>
> This seems like a fine bandaid, but doesn't really help if the DNS request
> ends up taking 21 seconds.
>
> Wouldn't an invalid URI be a more reliable way to test the same error case?
Fair enough - I felt it was nice to actually hit the network stack during this test, but I agree avoiding this can-o-worms is more worthwhile.
Attachment #8545703 -
Attachment is obsolete: true
Attachment #8546316 -
Flags: review?(gavin.sharp)
Updated•10 years ago
|
Attachment #8546316 -
Flags: review?(gavin.sharp) → review+
Comment 9•10 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Backed out in https://hg.mozilla.org/integration/fx-team/rev/8fbee3ef525f for xpcshell orange:
https://treeherder.mozilla.org/logviewer.html#?job_id=1649460&repo=fx-team
Flags: needinfo?(mhammond)
Comment 12•10 years ago
|
||
Comment on attachment 8546316 [details] [diff] [review]
0001-Bug-1117979-fix-orange-by-not-relying-on-DNS-lookup-.patch
Review of attachment 8546316 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/search/tests/xpcshell/test_location_error.js
@@ +20,5 @@
> removeCacheFile();
> });
>
> + // An unknown URL scheme will cause our error handler to be hit.
> + let url = "unknown-scheme://something";
So this line is causing an exception in the XHR code rather than onerror(). Gavin, do you have other ideas on how to cause onerror() without relying on dns failure or other parts of the network stack?
Attachment #8546316 -
Flags: feedback?(gavin.sharp)
Comment 13•10 years ago
|
||
I suppose you could avoid blocking on DNS by making a request to 127.0.0.1?
Updated•10 years ago
|
Attachment #8546316 -
Flags: feedback?(gavin.sharp)
Comment 14•10 years ago
|
||
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #13)
> I suppose you could avoid blocking on DNS by making a request to 127.0.0.1?
The only problem I have with that is that due to xpcshell tests running concurrently, there's no way I know of to ensure the port I choose isn't going to actually be in-use via httpd.js being used by another test (or indeed, by anything else on the machine).
But I wonder if I can use httpd.js (which selects an unused port) and return an invalid content-length or something...
Flags: needinfo?(mhammond)
Updated•10 years ago
|
Flags: firefox-backlog+
Updated•10 years ago
|
Iteration: --- → 38.1 - 26 Jan
Flags: qe-verify?
Updated•10 years ago
|
Points: --- → 2
Flags: needinfo?(mhammond)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 20•10 years ago
|
||
I thought I'd already uploaded this :(
The change is:
- // this will cause an "unknown host" error, but not report an external
- // network connection in the tests (note that the hosts listed in
- // server-locations.txt are *not* loaded for xpcshell tests...)
- let url = "https://nocert.example.com:443";
+ // using a port > 2^32 causes an error to be reported.
+ let url = "http://localhost:111111111";
+
which causes our onerror handler to be called.
Attachment #8546316 -
Attachment is obsolete: true
Attachment #8552781 -
Flags: review?(gavin.sharp)
Updated•10 years ago
|
Attachment #8552781 -
Flags: review?(gavin.sharp) → review+
Comment 21•10 years ago
|
||
Comment 22•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 38
Reporter | ||
Comment 23•10 years ago
|
||
Updated•10 years ago
|
Flags: qe-verify? → qe-verify-
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 77•9 years ago
|
||
Mark, this one has been continuing to happen fairly consistently. Do you think it's worth re-opening the bug for further investigation?
Flags: needinfo?(markh)
Comment 78•9 years ago
|
||
(In reply to Nigel Babu [:nigelb] from comment #77)
> Mark, this one has been continuing to happen fairly consistently. Do you
> think it's worth re-opening the bug for further investigation?
Yeah, it's pretty difficult to argue the bug truly is fixed :(
Status: RESOLVED → REOPENED
Flags: needinfo?(markh)
Resolution: FIXED → ---
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 83•9 years ago
|
||
Florian, do you think you could have a look at this?
Assignee: markh → nobody
Flags: needinfo?(florian)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 85•9 years ago
|
||
I'm not sure if what this test still needs is a domain name that will fail locally - but if so, you should be able to use anything now that ends in .onion.. (facebook.onion e.g. :))
thanks to 1228457
Comment 86•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment 87•8 years ago
|
||
I see no OrangeFactor Robot comment here, so I assume this means the failure hasn't occurred in a long while.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Flags: needinfo?(florian)
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•