Closed Bug 1241297 Opened 9 years ago Closed 9 years ago
Wpt5 fails on TC but not on Buildbot
Categories
(Testing :: General, defect)
Tracking
(e10s+, firefox46 fixed, firefox47 fixed)
RESOLVED FIXED
Target Milestone: mozilla47
People
(Reporter: armenzg, Assigned: armenzg)
References
(Blocks 1 open bug)
Attachments
(1 file)
patch (deleted), jmaher: review+
The wpt-3 e10s job always times out on TC/docker regardless of how long you give it (currently timing out at 120 minutes) [1][2]. It seems that if I bumped the limit beyond 120 minutes it might finish in time.
It seems that Buildbot has that same chunk running, but hidden [3].
jgraham, do you want us to try chunking this job further, or try it on an m3.xlarge instance?
Eventually we want to disable the Buildbot jobs and use the ones running on TaskCluster.
[1] https://public-artifacts.taskcluster.net/FVFCzYC3SCWrdYZ-D1suhA/1/public/logs/live_backing.log
[2] https://treeherder.mozilla.org/#/jobs?repo=try&author=armenzg@mozilla.com&filter-searchStr=web-platform-tests%203%29&selectedJob=15573412
[3] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=linux%20x64%20debug%20web-platform-tests-e10s&group_state=expanded
    https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=linux%20x64%20debug%20web-platform-tests-e10s&group_state=expanded&exclusion_profile=false
[4]
16:41:01 INFO - PROCESS | 1207 | JavaScript error: executormarionette.py, line 33: Error: Permission denied to access property "timeout"
16:41:05 INFO - TEST-UNEXPECTED-TIMEOUT | /html/dom/reflection-embedded.html | expected OK
[5]
17:56:16 CRITICAL - Loading initial page http://web-platform.test:8000/testharness_runner.html failed. Ensure that the there are no other programs bound to this port and that your firewall rules or network setup does not prevent access.
17:56:16 CRITICAL - Traceback (most recent call last):
17:56:16 CRITICAL - File "/home/worker/workspace/build/tests/web-platform/harness/wptrunner/executors/executormarionette.py", line 124, in load_runner
17:56:16 CRITICAL - self.marionette.navigate(url)
17:56:16 CRITICAL - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1505, in navigate
17:56:16 CRITICAL - self._send_message("get", {"url": url})
17:56:16 CRITICAL - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/decorators.py", line 36, in _
17:56:16 CRITICAL - return func(*args, **kwargs)
17:56:16 CRITICAL - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 748, in _send_message
17:56:16 CRITICAL - self._handle_error(err)
17:56:16 CRITICAL - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 809, in _handle_error
17:56:16 CRITICAL - raise errors.lookup(error)(message, stacktrace=stacktrace)
17:56:16 CRITICAL - UnknownException: UnknownException: Error loading page
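To check whether this chunk really needs more than 120 minutes, one option is to run the same chunk against a local build and time it. A minimal sketch, assuming mach web-platform-tests forwards wptrunner's --total-chunks/--this-chunk options (the exact flag handling here is an assumption):

    # Hypothetical local reproduction of the failing chunk; assumes a local
    # debug build and that mach web-platform-tests accepts wptrunner's
    # --total-chunks / --this-chunk options.
    import subprocess

    subprocess.check_call([
        "./mach", "web-platform-tests",
        "--total-chunks", "8",   # Buildbot's debug chunking
        "--this-chunk", "3",     # the chunk that hits the 120 minute limit
    ])

Timing that run would show whether the chunk itself is slow or whether the slowdown is specific to the TC/docker environment.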
Comment 1•9 years ago
This changed in the last 2 weeks from <50 minutes to >90 minutes; we should identify the root cause of that.
Comment 2•9 years ago
There are generally unexplained problems with W3 on Linux; see https://bugzilla.mozilla.org/show_bug.cgi?id=1238435#c26. Let's revisit after that is fixed.
Depends on: 1238435
Comment 3•9 years ago
W3 runtime seems to have become much worse, looking at this range:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=Linux%20x64%20debug%20W3C%20Web%20Platform%20Tests%20W3C%20Web%20Platform%20Tests%20W%283%29&group_state=expanded&fromchange=c33f30666b37&tochange=6020a4cb41a7
Judging by the running jobs, I suspect our culprit lies in this set of changes:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=359f86fecbc2
and we have new tests which showed up there:
https://hg.mozilla.org/mozilla-central/rev/31a86d5e5ffa
Working on filling in the gaps on inbound to prove that:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-str=web&filter-searchStr=Linux%20x64%20debug%20W3C%20Web%20Platform%20Tests%20W3C%20Web%20Platform%20Tests%20W%283%29&tochange=31a86d5e5ffa&fromchange=cd6b93ff2af7
If we added a handful of tests and our runtime went from ~45 to ~80 minutes, that is suspect: are these tests necessary, are there bugs, can we optimize, is this across all platforms, etc.?
At the very least we should chunk more, which we can do in TaskCluster land; I'm not sure about available builders in Buildbot land (linux64 is close to full).
Comment 4•9 years ago
Adding bkelly and yury as they landed/reviewed the new wpt tests. I know there are other issues with the stability of the wpt(3) tests; here I care about runtime :)
Comment 5•9 years ago
jmaher: Yeah, those tests are already implicated. But the mechanism is a mystery because they don't actually run in W3. Which suggests chunking changes. But why that would cause the browser to be unable to load pages I don't know. Possibly something is crashing the web server or similar, but I need to actually reproduce the issue to be sure.
Updated•9 years ago
Blocks: e10s-tests
tracking-e10s: --- → +
Comment 6•9 years ago
(In reply to James Graham [:jgraham] from comment #5)
> jmaher: Yeah, those tests are already implicated. But the mechanism is a
> mystery because they don't actually run in W3. Which suggests chunking
> changes. But why that would cause the browser to be unable to load pages I
> don't know. Possibly something is crashing the web server or similar, but I
> need to actually reproduce the issue to be sure.
As noted at https://bugzilla.mozilla.org/show_bug.cgi?id=1238435#c11, the reflection wpt tests are crashing on e10s. The rechunking of the tests moved them (the reflection tests) into W3 for linux64 after the new tests were added. It's hard to tell in which chunk the reflection tests were located before, or in which chunks they are on other platforms (e.g. see raw logs of W2 on Mac OS X: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=20210378). But the reflection tests definitely produce crashes on linux64 e10s due to DNS host name resolution.
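Given that the failure in comment 0 is the initial load of http://web-platform.test:8000 and the crashes point at host name resolution, a quick sanity check inside the test environment is whether the wpt domains actually resolve. A rough sketch; the domain list below is only a sample, the authoritative set comes from the wpt server configuration and the /etc/hosts entries the harness expects:

    # Illustrative check that the wpt hostnames resolve (they are normally
    # mapped to 127.0.0.1 via /etc/hosts). The domain list is a sample, not
    # the full set required by the wpt server configuration.
    import socket

    for host in ("web-platform.test",
                 "www.web-platform.test",
                 "www1.web-platform.test",
                 "www2.web-platform.test"):
        try:
            print(host, "->", socket.gethostbyname(host))
        except socket.error as exc:
            print(host, "failed to resolve:", exc)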
Comment 7•9 years ago
The runtime difference is present in both e10s and non-e10s.
Assignee
Comment 8•9 years ago
Is disabling the test(s) causing the crash a possibility?
At least we would get results from other tests until the crash is ironed out.
Comment 9•9 years ago
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #8)
> Is disabling the test(s) causing the crash a possibility?
> At least we would get results from other tests until the crash is ironed out.
This solution was r- at https://bugzilla.mozilla.org/show_bug.cgi?id=1238435#c16
Comment 10•9 years ago
We now run an entirely different set of tests in chunk 3 (keep in mind this is debug chunk 3, where we have 8 chunks). I think this bug is not useful for anything other than splitting wpt into more chunks; let the crashing/timeouts be solved in bug 1238435.
Assignee
Comment 11•9 years ago
Has anything changed?
It seems that going from 8 to 12 chunks has cleared this:
https://treeherder.mozilla.org/#/jobs?repo=try&author=armenzg@mozilla.com&filter-searchStr=web-platform-tests&group_state=expanded
It seems that the lengthy tests are now running across chunks 4, 5 & 6 (all between 40 & 50 minutes).
Comment 12•9 years ago
If chunking exposed this bug in the first place, it's not too surprising if rechunking hides it again.
I certainly don't object to increasing the number of chunks in general; do you want to do that on Buildbot and TC?
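To make the rechunking point concrete: which tests end up together depends on the total chunk count, so adding tests or changing the number of chunks can separate (or reunite) a pair of tests that misbehave when they share a chunk. A toy illustration of that effect (this is not wptrunner's actual chunking algorithm):

    # Toy illustration only: the same directories land in different chunks as
    # the chunk count changes, so problem tests can move into or out of the
    # same chunk. Not the real wptrunner chunking logic.
    import zlib

    def chunk_of(test_dir, total_chunks):
        return zlib.crc32(test_dir.encode()) % total_chunks + 1

    dirs = ["/html/dom", "/service-workers", "/websockets", "/webstorage"]
    for total in (8, 12):
        assignment = {d: chunk_of(d, total) for d in dirs}
        print(total, "chunks:", assignment)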
Assignee
Comment 13•9 years ago
I don't know if we will be able to do it for Buildbot, as the number of builders is already close to the limit.
We can increase the TC chunking and at least have that visible.
Comment 14•9 years ago
Good luck with working around it by changing your chunk numbers: the bug 1242153 wpt update moved things around enough to get whatever two tests it is that don't like being in the same chunk into your chunk 4, both e10s and non-e10s, so I added them to the exclusion hiding Buildbot's e10s-3.
Assignee
Updated•9 years ago
Assignee: nobody → armenzg
Summary: wpt-3 e10s always takes as long as the max runtime allows it to → Bump timeout for wpt tests (wpt4 times out)
Assignee
Comment 15•9 years ago
Assignee
Comment 16•9 years ago
Assignee
Comment 17•9 years ago
Assignee
Comment 18•9 years ago
Going back from 12 chunks to 8 chunks and increasing the timeout does not necessarily improve matters (AFAIK):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=788f6950a3bd
I will have to see whether the tests run on Buildbot or not and compare them with the TC jobs.
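One rough way to do that comparison is to pull wall-clock duration out of the two raw logs, using the HH:MM:SS prefixes visible in the excerpts above. A sketch; the file names are placeholders and the run is assumed not to cross midnight:

    # Sketch: compare wall-clock duration of two raw logs via their first and
    # last "HH:MM:SS" timestamps. File names are placeholders.
    import re
    from datetime import datetime, timedelta

    def log_duration(path):
        stamps = []
        with open(path) as f:
            for line in f:
                m = re.match(r"(\d{2}:\d{2}:\d{2})", line)
                if m:
                    stamps.append(datetime.strptime(m.group(1), "%H:%M:%S"))
        return stamps[-1] - stamps[0] if stamps else timedelta(0)

    print("buildbot w5:", log_duration("buildbot_w5_raw.log"))
    print("taskcluster w5:", log_duration("tc_w5_raw.log"))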
Assignee
Comment 19•9 years ago
Just to update this, we're still having this issue for wpt5 e10s:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba896a9c4e2&filter-searchStr=Linux%20x64%20debug%20Desktop%20web-platform-tests%20[TC]%20Linux64%20web-platform-tests%205%20tc-W%285%29&exclusion_profile=false
Assignee
Comment 20•9 years ago
Assignee
Updated•9 years ago
Summary: Bump timeout for wpt tests (wpt4 times out) → Wpt5 fails on TC but not on Buildbot
Assignee
Comment 21•9 years ago
Having the same chunking will hopefully make comparing Buildbot and TaskCluster easier (and it might fix the problem).
Attachment #8715849 - Flags: review?(jmaher)
Comment 22•9 years ago
Comment on attachment 8715849 [details] [diff] [review]
wpt test jobs from 12 to 8 chunks to match Buildbot
Review of attachment 8715849 [details] [diff] [review]:
-----------------------------------------------------------------
I would like to go back to 12 chunks as soon as we are on taskcluster only.
Attachment #8715849 - Flags: review?(jmaher) → review+
Assignee
Comment 23•9 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/6391ae6db0ff732c8affd9143e8d080c22d1c4c6
Bug 1241297 - Bump timeout for TC Linux64 wpt tests and go from 12 chunks to 8 chunks. DONTBUILD. r=jmaher
Comment 24•9 years ago
bugherder
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee
Comment 25•9 years ago
Going back to 8 chunks has cleared the issue.
(In reply to Joel Maher (:jmaher) from comment #22)
> I would like to go back to 12 chunks as soon as we are on taskcluster only.
Someone will have to work on bug 1238435 before we can go to any different chunking.
Comment 26•9 years ago
bugherder uplift
status-firefox46: --- → fixed