Closed
Bug 974564
Opened 11 years ago
Closed 11 years ago
Gaia tree needs to be reopened
Categories
(Firefox OS Graveyard :: Gaia, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gaye, Assigned: gaye)
Attachments
(1 file)
On Tuesday, Feb 18, 2014, I closed the tree because we started seeing (1) b2g die during the Marionette JS email integration tests and (2) Travis die (no output for 10 minutes) during |npm install|.
Issue (1) was first observed in this build: https://travis-ci.org/mozilla-b2g/gaia/builds/18901708. The build output was:
> email next previous
>
>◦ should not move down when down tapped if greyed out:
>
>No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
The intermittent error is not specific to this test, though. Other tests failed the same way, for example:
> email notifications, foreground
>
>◦ should have bulk message notification in the different account:
>
>No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
I tried running pull requests on Travis that each reverted one of the following Gaia commits:
- 9bb31f0da81f92f0048207ea9cf51628bb1183a5
- 19304fe63b26630f95c6f3a7da49349b433d476e
- b9859b9c2199e1c5b7e4eda220e556a7ff43cd21
However, none of them fixed the issue (the intermittent error still popped up on each pull request). This makes me think the regression may have been introduced by a Gecko patch, but further investigation and bisecting are necessary.
Issue (2) popped up yesterday morning while I was trying to debug issue (1). We are still seeing the following with some frequency:
>...
>npm http 200 https://registry.npmjs.org/q/-/q-0.9.7.tgz
>
>npm http 200 https://registry.npmjs.org/proxyquire/-/proxyquire-0.4.1.tgz
>
>No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
Both Travis and npm have reported minor outages in the past 24 hours on https://twitter.com/traviscistatus and https://twitter.com/npmjs, but they may not be aware of how severe and widespread the issues are.
This bug will track the work to resolve issues (1) and (2) and then, with some luck :), reopen the tree.
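Travis ends a job after 10 minutes with no output, so a stalled |npm install| leaves no diagnostics at all. One possible mitigation, sketched here under stated assumptions and not necessarily what the Gaia build actually does, is to cap each attempt with GNU coreutils `timeout` at less than 10 minutes and retry a few times, printing a line to stderr between attempts. `retry_with_timeout` is a hypothetical helper; the attempt count and time limit are illustrative.

```shell
# Hypothetical helper: run CMD at most ATTEMPTS times, killing any single
# attempt that runs longer than SECS seconds (GNU coreutils `timeout`).
# Returns 0 as soon as one attempt succeeds, otherwise the exit status of
# the last attempt (`timeout` itself exits with 124 when it kills CMD).
# Note: `timeout` execs CMD, so CMD must be a program, not a shell function.
# Usage: retry_with_timeout ATTEMPTS SECS CMD [ARGS...]
retry_with_timeout() {
  local attempts=$1 secs=$2 i=1 status=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    status=0
    timeout "$secs" "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Printing something also resets Travis's no-output timer.
    echo "attempt $i/$attempts failed (exit $status)" >&2
    i=$((i + 1))
  done
  return "$status"
}
```

For example, `retry_with_timeout 3 480 npm install` would give npm three attempts of at most 8 minutes each, so each stalled attempt is killed (and reported) before Travis's 10-minute silence limit. The trade-off is that a deterministic hang now costs several timeouts before the job finally fails.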
Assignee
Updated•11 years ago
Summary: Tree needs to be reopened → Gaia tree needs to be reopened
Assignee
Comment 1•11 years ago
I have tweeted at both Travis and npmjs to ask them for guidance: https://twitter.com/garethaye/status/436227129089724416.
Comment 2•11 years ago
I joined #travis on IRC and started asking some questions. I'm also running a job with the npm log level set to verbose, which will hopefully help us narrow down the problem.
Assignee
Comment 3•11 years ago
https://travis-ci.org/mozilla-b2g/gaia/builds/19214866 contains 30 builds run against b2g Aurora. Notably, none of them fail for any reason other than the Travis/npm problems. That is statistically significant evidence that we have a Gecko regression. We are looking into bisecting now.
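To make the significance claim concrete: if a bad build fails each run independently with probability p, the chance that all n runs pass anyway is (1 - p)^n. The sketch below just evaluates that formula with awk; `p_all_pass` is a hypothetical helper, and the failure rates used with it are placeholders, not rates measured from these builds.

```shell
# Probability that N independent runs all pass when each run fails with
# probability P, i.e. (1 - P)^N, printed with 4 decimal places.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
# Usage: p_all_pass P N
p_all_pass() {
  LC_ALL=C awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - p) ^ n }'
}
```

With a hypothetical 10% per-run failure rate, `p_all_pass 0.1 30` prints 0.0424, so 30 clean runs would happen by chance only about 4% of the time; at 20% it prints 0.0012. The argument only holds if runs are independent and the Aurora runs differ from the failing ones only in the Gecko build.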
Comment 4•11 years ago
Now I am really confused. This build started going green like crazy: https://travis-ci.org/mozilla-b2g/gaia/builds/19226368
Unless a fix for Gecko just landed, something weird is happening.
Comment 5•11 years ago
(In reply to Kevin Grandon :kgrandon from comment #4)
> Now I am really confused. This build started going green like crazy:
> https://travis-ci.org/mozilla-b2g/gaia/builds/19226368
>
> Unless a fix for gecko just landed something weird is happening.
I'm not sure what you mean. There are 3 failed jobs: 1) https://travis-ci.org/mozilla-b2g/gaia/jobs/19226378 2) https://travis-ci.org/mozilla-b2g/gaia/jobs/19226381 3) https://travis-ci.org/mozilla-b2g/gaia/jobs/19226390
Comment 6•11 years ago
Yup, but we don't care about those (I am disabling those tests and filing bugs). The ones we care about are the nasty grey ones that popped up later. So it seems there is something funky going on with Gecko after all =/
Updated•11 years ago
Comment 7•11 years ago
I don't see what points to a Gecko issue.
All grey builds I've seen just stop after (or during) the npm install process.
=> https://travis-ci.org/mozilla-b2g/gaia/pull_requests
Firefox or b2g is not even launched.
Comment 8•11 years ago
(In reply to Julien Wajsberg [:julienw] from comment #7)
> I don't see what points to a gecko issue.
>
> All grey builds I've seen just stop after (or during) the npm install
> process.
> => https://travis-ci.org/mozilla-b2g/gaia/pull_requests
>
> firefox or b2g is not even launched.
We've managed to handle the npm problems with this: https://github.com/mozilla-b2g/gaia/pull/15319
(It is currently under review, and we may want to add more reviewers.) Tonight we have been basing our investigation on that patch.
Here is an example of the timeout we think is a gecko regression: https://travis-ci.org/mozilla-b2g/gaia/jobs/19230769
Comment 9•11 years ago
First failing job on Travis: https://travis-ci.org/mozilla-b2g/gaia/jobs/18901712
Previous (passing) job on Travis: https://travis-ci.org/mozilla-b2g/gaia/jobs/18897782
The failing job uses this b2g desktop build: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-linux64_gecko/1392409311/b2g-30.0a1.multi.linux-x86_64.tar.bz2
while the passing one uses: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-linux64_gecko/1392386116/b2g-30.0a1.multi.linux-x86_64.tar.bz2
So unless the failure is very intermittent, we can expect the failing b2g desktop build to already be "bad".
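The numeric directory names in those tinderbox-builds URLs look like Unix epoch seconds (an assumption based on their magnitude, not on documentation). Decoding them with GNU `date` dates the two b2g desktop builds and so bounds the window in which the suspect Gecko change would have landed. `build_time` and `hours_between` are hypothetical helpers.

```shell
# Hypothetical helper: print an epoch-seconds value (here, assumed to be a
# tinderbox build directory name) as a UTC timestamp.
# Requires GNU date (`-d @N`); on BSD/macOS use `date -u -r N` instead.
# Usage: build_time EPOCH_SECONDS
build_time() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Hypothetical helper: hours between two epoch timestamps, 1 decimal place.
# Usage: hours_between EARLIER LATER
hours_between() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / 3600 }'
}
```

Under that assumption, `build_time 1392386116` prints 2014-02-14 13:55:16 UTC for the passing build, `build_time 1392409311` prints 2014-02-14 20:21:51 UTC for the failing one, and `hours_between 1392386116 1392409311` prints 6.4, so the two builds are roughly 6.4 hours apart.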
Comment 10•11 years ago
failing b2g desktop hg changeset: https://hg.mozilla.org/mozilla-central/rev/33b3248b4aa0
passing b2g desktop hg changeset: https://hg.mozilla.org/mozilla-central/rev/eac89fb04bb9
(again, the 'passing' changeset does not necessarily mean it's good)
Comment 11•11 years ago
That said, I can see more failures like this on TBPL from before this changeset.
Also, bug 953212 is the bug the automation team opened to track this issue; it clearly shows an increase in this failure since last week or the week before.
Assignee
Comment 12•11 years ago
It should be mentioned that the intermittent errors are only showing up in email app test cases.
Assignee
Comment 13•11 years ago
I've pushed this patch, https://github.com/mozilla-b2g/gaia/pull/16486, to add some debugging info to the test that fails most frequently. The builds are showing up here: https://travis-ci.org/mozilla-b2g/gaia/builds/19260139. Job 30050.7, for instance, shows the regression.
Comment 14•11 years ago
Trying to reproduce locally:
while xvfb-run -a make test-integration TEST_FILES=apps/email/test/marionette/next_previous_test.js ; do sleep .1 ; done
in 4 different local clones.
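The loop above reruns the test until the first failure and then stops, which is what you want when waiting for a single reproduction. A slightly more informative variant, sketched below as a hypothetical helper (`run_until_fail` is not part of Gaia's tooling), caps the number of runs and reports which run failed, giving a rough idea of how often the failure reproduces.

```shell
# Hypothetical helper: run CMD up to MAX_RUNS times, pausing 0.1 s between
# runs like the loop above. Prints the 1-based number of the first failing
# run and returns 0; returns 1 (printing nothing) if every run passes.
# CMD's own stdout is redirected to stderr so that stdout carries only the
# run number.
# Usage: run_until_fail MAX_RUNS CMD [ARGS...]
run_until_fail() {
  local max=$1 run=1
  shift
  while [ "$run" -le "$max" ]; do
    if ! "$@" >&2; then
      echo "$run"
      return 0
    fi
    run=$((run + 1))
    sleep 0.1
  done
  return 1
}
```

For example: `run_until_fail 200 xvfb-run -a make test-integration TEST_FILES=apps/email/test/marionette/next_previous_test.js`.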
Comment 15•11 years ago
I got one:
1) email next previous "before each" hook:
ScriptTimeout: (28) timed out
Remote Stack:
<none>
at Error.MarionetteError (/home/julien/travail/git/gaia-clone-1/node_modules/marionette-client/lib/marionette/error.js:67:13)
at Object.Client._handleCallback (/home/julien/travail/git/gaia-clone-1/node_modules/marionette-client/lib/marionette/client.js:474:19)
at /home/julien/travail/git/gaia-clone-1/node_modules/marionette-client/lib/marionette/client.js:508:21
at TcpSync.send (/home/julien/travail/git/gaia-clone-1/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:94:10)
which is _not_ what we have in Travis, if I'm not mistaken.
Comment 16•11 years ago
I reproduced the same ScriptTimeout in the "before each" hook several times, and it's always at:
at Object.Email.launch (/home/julien/travail/git/gaia/apps/email/test/marionette/lib/email.js:358:17)
But I reproduced the issue we have in Travis only once. Now I don't know how to hook into it...
Comment 17•11 years ago
I could see the crash reporter in my process list, so I know that when B2G hangs it has really crashed. But no crash dump yet...
Comment 18•11 years ago
Hint: running "make test-integration DEBUG=0" prints the marionette-js commands as they are executed.
Now, I ran this in a loop on my local X server (as opposed to Xvfb) and... my computer restarted. Yay.
Comment 19•11 years ago
(In reply to Gareth Aye [:gaye] from comment #12)
> It should be mentioned that the intermittent errors are only showing up in
> email app test cases.
I have actually seen this happen in other applications as well (the dialer, I think). Unfortunately, I don't have a Travis link handy, but it does happen elsewhere.
Assignee
Comment 20•11 years ago
We have narrowed the issue down to a b2g crash, triggered (with low reproducibility) by sending emails in the marionette-js test suite. Now that we know what's going on (even though we don't know why or which patch introduced the bug), it's time to reopen the tree.
Attachment #8379242 -
Flags: review?(bugmail)
Comment 21•11 years ago
It makes me sad that we are disabling these tests. Let's make sure we prioritize tracking this down ASAP. Should the productivity team own this work?
Comment 22•11 years ago
Comment on attachment 8379242
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/16494
r=asuth on disabling all e-mail tests for now, until we can isolate the Gecko bug and get it fixed and/or work around it in the e-mail backend.
Attachment #8379242 -
Flags: review?(bugmail) → review+
Comment 23•11 years ago
(In reply to Kevin Grandon :kgrandon from comment #21)
> Makes me sad that we are disabling these tests. Let's make sure we
> prioritize tracking this down asap. Should the productivity team own this
> work?
I don't think it would make sense for any other functional team to own it unless there's a team/person who owns keeping testing in general healthy. (I think all y'all have just been drafted by your natural awesomeness!) Intermittent crashes involving e-mail frequently turn out to be GC hazards related to workers, so it's something we do want to track down in general anyway.
Assignee
Updated•11 years ago
Assignee: nobody → gaye
Assignee
Comment 24•11 years ago
Tree is open again!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 25•11 years ago
Thank you, Gareth!
Comment 26•11 years ago
Gareth filed bug 975588 to turn the e-mail tests back on.