Closed
Bug 960072
Opened 11 years ago
Closed 10 years ago
Fix gaia-integration tests
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: daleharvey)
The gaia-integration tests have just been hidden on all trees, since they do not meet the requirements for being shown in the default view:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy
Notably:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#7.29_Low_intermittent_failure_rate
* For stats see: http://brasstacks.mozilla.com/orangefactor/?display=OrangeFactor&test=gaia-integration&tree=trunk
* Main bugs are bug 953212, bug 920153, bug 953309.
And:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#6.29_Outputs_failures_in_a_TBPL-starrable_format
* Typical failures give the following output, which requires opening the log to ascertain the true cause:
{
command timed out: 1200 seconds without output, attempting to kill
}
{
make: *** [node_modules/.bin/mozilla-download] Error 34
Return code: 512
Tests exited with return code 512: harness failures
# TBPL FAILURE #
}
To see the jobs, the &showall=1 param must be used, e.g.:
https://tbpl.mozilla.org/?tree=B2g-Inbound&jobname=gaia-integration&showall=1
We should either fix these tests, or disable them to save resources - a la bug 784681.
Reporter
Updated•11 years ago
Reporter
Updated•11 years ago
Summary: Fix or disable gaia-integration tests → Fix or disable gaia-integration tests (currently running but hidden)
Reporter
Comment 1•11 years ago
I should add that these are also currently perma-red, but they had been starred as one of the intermittents, since the failure messages are all pretty similar unless the full log is opened.
https://tbpl.mozilla.org/php/getParsedLog.php?id=33017441&tree=B2g-Inbound
Failure summary:
{
make: *** [node_modules/.bin/mozilla-download] Error 1
Return code: 512
Tests exited with return code 512: harness failures
# TBPL FAILURE #
}
From full log:
{
00:13:19 INFO - Calling ['make', 'test-integration', 'NPM_REGISTRY=http://npm-mirror.pub.build.mozilla.org', 'REPORTER=mocha-tbpl-reporter'] with output_timeout 330
00:13:19 INFO - npm install --registry http://npm-mirror.pub.build.mozilla.org
00:13:31 INFO - npm ERR! Error: shasum check failed for /home/cltbld/tmp/npm-2259-YYIhRDkV/1389773611198-0.06632106122560799/tmp.tgz
00:13:31 INFO - npm ERR! Expected: 4db64844d80b615b888ca129d12f8accd1e27286
00:13:31 INFO - npm ERR! Actual: 6633a07cf7b1233a40366ffd16c90170190f139a
00:13:31 INFO - npm ERR! at /usr/lib/node_modules/npm/node_modules/sha/index.js:38:8
00:13:31 INFO - npm ERR! at ReadStream.<anonymous> (/usr/lib/node_modules/npm/node_modules/sha/index.js:85:7)
00:13:31 INFO - npm ERR! at ReadStream.EventEmitter.emit (events.js:117:20)
00:13:31 INFO - npm ERR! at _stream_readable.js:920:16
00:13:31 INFO - npm ERR! at process._tickCallback (node.js:415:13)
00:13:31 INFO - npm ERR! If you need help, you may report this log at:
00:13:31 INFO - npm ERR! <http://github.com/isaacs/npm/issues>
00:13:31 INFO - npm ERR! or email it to:
00:13:31 INFO - npm ERR! <npm-@googlegroups.com>
00:13:31 INFO - npm ERR! System Linux 3.2.0-23-generic-pae
00:13:31 INFO - npm ERR! command "/usr/bin/node" "/usr/bin/npm" "install" "--registry" "http://npm-mirror.pub.build.mozilla.org"
00:13:31 INFO - npm ERR! cwd /builds/slave/test/gaia
00:13:31 INFO - npm ERR! node -v v0.10.21
00:13:31 INFO - npm ERR! npm -v 1.3.11
00:13:32 INFO - (node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
00:13:32 INFO - Trace
00:13:32 INFO - at Socket.EventEmitter.addListener (events.js:160:15)
00:13:32 INFO - at Socket.Readable.on (_stream_readable.js:689:33)
00:13:32 INFO - at Socket.EventEmitter.once (events.js:179:8)
00:13:32 INFO - at Request.onResponse (/usr/lib/node_modules/npm/node_modules/request/request.js:625:25)
00:13:32 INFO - at ClientRequest.g (events.js:175:14)
00:13:32 INFO - at ClientRequest.EventEmitter.emit (events.js:95:17)
00:13:32 INFO - at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
00:13:32 INFO - at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
00:13:32 INFO - at Socket.socketOnData [as ondata] (http.js:1583:20)
00:13:32 INFO - at TCP.onread (net.js:525:27)
...
...
}
Updated•11 years ago
Assignee: nobody → gaye
Comment 2•11 years ago
So yesterday and today (after some tooling work, disabling, and patching), Gi was actually quite stable. We still don't have gaia try (see bug 960201), but I think the lower intermittent rate warrants discussing whether to make these visible again.
Thoughts?
Flags: needinfo?(ryanvm)
Flags: needinfo?(jgriffin)
Flags: needinfo?(emorley)
Comment 3•11 years ago
Just looking at today's runs, there are still a lot of failures... it looks like we need to do some more work before unhiding these.
Flags: needinfo?(jgriffin)
Comment 4•11 years ago
Hey Jonathan,
Every time I've looked in the past week (since around last Thursday) we've had about 10 greens in a row (save for the hg issues I brought up with you and Aki). What percent green do we need and over how many observations?
Flags: needinfo?(jgriffin)
Comment 5•11 years ago
I guess there are a lot of dependent bugs. I will share these with Gaia folks and also look through them myself.
Reporter
Comment 6•11 years ago
(In reply to Gareth Aye [:gaye] from comment #4)
> Hey Jonathan,
>
> Every time I've looked in the past week (since around last Thursday) we've
> had about 10 greens in a row (save for the hg issues I brought up with you
> and Aki). What percent green do we need and over how many observations?
For guidance on failure rate, see:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#7.29_Low_intermittent_failure_rate
The other requirements on that page will also need to be met, per comment 0.
Flags: needinfo?(ryanvm)
Flags: needinfo?(jgriffin)
Flags: needinfo?(emorley)
Comment 7•11 years ago
Just to quantify this a bit, on the last 25 runs on b2g-inbound, there were 9 failures:
1 instance of bug 961438
7 instances of bug 953212
1 instance of bug 920153
I'm going to add some help for bug 920153, but bug 953212 is a bigger problem. It is actually conflating several different problems...sometimes this error happens during hg clone, sometimes during npm install, and sometimes during test execution. This absolutely needs to be fixed before we can unhide the tests, because it makes them very difficult to sheriff.
The changes I'll make to hg clone in bug 920153 will help with the instances of bug 953212 that occur during hg clone. So, we'll still need two other things:
1 - an update to the mozharness script to implement more specific error reporting when 'npm install' times out (or, even better, figure out why it's timing out and fix it)
2 - better handling when the harness itself times out. Currently there's no timeout handling in the harness itself, so the timeouts get handled by mozprocess. It would be much better for the harness to be able to monitor test execution and handle test timeouts more intelligently, but at a minimum, we need better reporting... when a test times out, we should output an error which indicates which test timed out, rather than a generic (and thus unsheriffable) string. Potentially we could do this in the mozharness script by clever log parsing (a rough sketch is at the end of this comment), but ideally it's something that should be baked into the harness.
I'm not sure if problem #1 is distinct from bug 953309...it's possible they have the same underlying cause. It may make sense to investigate and solve that problem, and see if #1 goes away, and if not, to add some smarter failure handling to the mozharness script.
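To illustrate the log-parsing fallback in point 2, here's a minimal standalone sketch in Python (mozharness being Python). It assumes the harness emits TBPL-style TEST-START markers; the marker format, the TEST-UNEXPECTED-TIMEOUT string, and the function name are illustrative assumptions, not the actual mozharness code:
{
import re

# Assumed marker formats -- not necessarily what mocha-tbpl-reporter emits.
TEST_START = re.compile(r"TEST-START \| (?P<name>\S+)")
TIMEOUT = re.compile(r"command timed out: \d+ seconds without output")

def annotate_timeouts(log_lines):
    """Yield log lines, rewriting a bare timeout message into an error
    that names the last test seen before the hang."""
    last_test = None
    for line in log_lines:
        start = TEST_START.search(line)
        if start:
            last_test = start.group("name")
        if TIMEOUT.search(line) and last_test:
            yield ("TEST-UNEXPECTED-TIMEOUT | %s | no output after this test started"
                   % last_test)
        else:
            yield line

# Example usage against a saved log file:
# with open("gaia-integration.log") as f:
#     for line in annotate_timeouts(f):
#         print(line)
}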
Comment 8•11 years ago
Turns out there is internal timeout handling (though not internal hang handling, that being what results in the 330 seconds without output failures). The error message for a timeout is "test description "before each" hook". The error message for failing to find an iframe the test is looking for is "test description "before each" hook". Probably the error message for 1 != 2 is "test description "before each" hook".
I very strongly feel that this test suite isn't even close to acceptable, and it should be shut off everywhere except Cedar while it's turned into something that is close to acceptable.
Comment 9•11 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> 1 - an update to the mozharness script to implement more specific error
> reporting when 'npm install' times out (or even better, figure why it's
> timing out and fix it)
Would you mind creating a patch for the reporting?
I can look into improving timeout management in marionette-js-runner. Mocha does this, but I know that the framework that hooks up mocha, the marionette client, and gecko doesn't always handle things gracefully.
Flags: needinfo?(jgriffin)
Comment 10•11 years ago
(In reply to Gareth Aye [:gaye] from comment #9)
> (In reply to Jonathan Griffin (:jgriffin) from comment #7)
> > 1 - an update to the mozharness script to implement more specific error
> > reporting when 'npm install' times out (or even better, figure why it's
> > timing out and fix it)
>
> Would you mind creating a patch for the reporting?
>
> I can look into improving timeout management in marionette-js-runner. Mocha
> does this, but I know that the framework that hooks up mocha, the marionette
> client, and gecko doesn't always handle things gracefully.
Yes, I'll make a patch to improve the reporting. I've also separately made a patch to dump npm-debug.log when 'npm install' fails, which will hopefully help us figure out why it fails so often - http://hg.mozilla.org/build/mozharness/rev/35223f92c123.
Just eyeballing the failures, roughly a third of them occur because 'npm install' causes something to clone a repo directly from github, and github hangs up on us. See e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=33629939&tree=Mozilla-Inbound#error0 :
07:05:34 INFO - npm ERR! git clone https://github.com/dominictarr/crypto-browserify.git Cloning into bare repository '/home/cltbld/.npm/_git-remotes/https-github-com-dominictarr-crypto-browserify-git-a9d1415f'...
07:05:34 INFO - npm ERR! git clone https://github.com/dominictarr/crypto-browserify.git
07:05:34 INFO - npm ERR! git clone https://github.com/dominictarr/crypto-browserify.git error: The requested URL returned error: 403 while accessing https://github.com/dominictarr/crypto-browserify.git/info/refs
07:05:34 INFO - npm ERR! git clone https://github.com/dominictarr/crypto-browserify.git fatal: HTTP request failed
Is there any way we can prevent us from needing to clone github repos?
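For reference, here is a rough standalone sketch of the npm-debug.log dump mentioned at the top of this comment (not the actual mozharness patch linked above; the helper name is made up, and it assumes npm leaves npm-debug.log in the working directory on failure, as in the logs above):
{
import os
import subprocess

# Mirror used on the test slaves, per the logs above.
REGISTRY = "http://npm-mirror.pub.build.mozilla.org"

def npm_install(cwd):
    """Run 'npm install' against the mirror and, on failure, dump
    npm-debug.log so the real error ends up in the test log."""
    result = subprocess.run(["npm", "install", "--registry", REGISTRY], cwd=cwd)
    if result.returncode != 0:
        debug_log = os.path.join(cwd, "npm-debug.log")  # written by npm on failure
        if os.path.exists(debug_log):
            with open(debug_log) as f:
                print(f.read())
        raise RuntimeError("npm install failed with code %d" % result.returncode)

# Example usage:
# npm_install("/builds/slave/test/gaia")
}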
Flags: needinfo?(jgriffin)
Comment 11•11 years ago
> Is there any way we can prevent us from needing to clone github repos?
We should either make npm-mirror fetch them or disallow github dependencies in gaia. I can do the former for now, I think...
Reporter
Comment 12•11 years ago
gaia-integration tests on {m-c, inbound, fx-team, b2g-inbound, try} are now hidden on linux64 (the linux32 and OSX variants were already hidden), due to bug 1004453.
Comment 13•11 years ago
And we've now landed perma-orange on b2g-inbound, and merged it around to every other tree.
Reporter
Comment 14•11 years ago
I've filed bug 1017607 for switching these off on trunk for !cedar, given that they are perma-failing, hidden, and bug 1004453 isn't fixed.
Summary: Fix or disable gaia-integration tests (currently running but hidden) → Fix gaia-integration tests (currently running but hidden)
Updated•10 years ago
Comment 15•10 years ago
I've updated the dependencies here to only include infrastructure fixes. The tests now all block a separate bug which we should fix, but for the purposes of un-hiding, we can mass-disable them. (Most of them have already been disabled).
Updated•10 years ago
Updated•10 years ago
Assignee
Comment 16•10 years ago
Hey Gareth, I'm gonna be working on this till it's unhidden, so stealing it for now if that's cool.
Assignee: gaye → dale
Assignee
Comment 17•10 years ago
With the 2 blocking tests disabled (they both have patches), https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=a313fc16de92 is looking pretty green right now.
Assignee
Comment 18•10 years ago
James,
Most of the errors currently produce reports in the form of some type of socket error. I can see unhandled errors being passed through the client a bunch, and there are a few ways to clean that up; I will hopefully get it to the point where tests failing like that fail with an obvious "marionette couldn't send commands to b2g" or something.
However, that doesn't actually fix the tests. Is b2g crashing when this happens? Is there a way to get some visibility into what is happening with the b2g process when the socket dies?
Flags: needinfo?(jlal)
Assignee
Comment 19•10 years ago
Clearing needinfo since we are looking into it @ https://bugzilla.mozilla.org/show_bug.cgi?id=1093799
Assignee
Updated•10 years ago
Flags: needinfo?(jlal)
Assignee
Comment 20•10 years ago
No longer blocking since the test has been disabled.
No longer depends on: 1091484
Reporter
Comment 21•10 years ago
Jobs unhidden on Treeherder on all repos apart from b2g32 and b2g30, in bug 1037001.
Summary: Fix gaia-integration tests (currently running but hidden) → Fix gaia-integration tests
Assignee
Comment 22•10 years ago
This was our meta bug to track enabling Gij; the remaining bugs can be tracked separately.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED