Closed
Bug 1342963
Opened 8 years ago
Closed 8 years ago
Severe performance regression in Android jsreftest jobs
Categories
(Firefox for Android Graveyard :: Testing, defect, P1)
Firefox for Android Graveyard
Testing
Tracking
(firefox54 fixed)
RESOLVED
FIXED
Firefox 54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
dustin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dustin
:
review+
|
Details | Diff | Splinter Review |
On mozilla-central, until Feb 23, each chunk of the Android opt jsreftests completed in 25 to 35 minutes; since then, each chunk is completing in about 110 minutes.
There is a similar regression in Android debug jsreftests.
Assignee | ||
Comment 1•8 years ago
|
||
On central, the first slow revision is https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=c02dd6a7e9c193b488271eb53e3ea039042c9ed6&filter-searchStr=android+jsreftest
Assignee | ||
Updated•8 years ago
|
Priority: -- → P1
Assignee | ||
Comment 6•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=85faf95c2400d9931cb0b98f6c089bf80c24d4f3&filter-searchStr=android+jsreftest and all later Android jsreftest jobs run much slower than earlier pushes.
At least part of the issue is the addition of previously skipped tests. Earlier runs of Android opt J1 have something like:
REFTEST SUITE-START | Running 3326 tests
REFTEST INFO | Running chunk 1 out of 10 chunks. tests 1-355/355
Later runs have:
REFTEST SUITE-START | Running 26420 tests
REFTEST INFO | Running chunk 1 out of 10 chunks. tests 1-2809/2809
I see bug 977849 also increased Android jsreftest chunks in an earlier push...but just barely enough.
If we really want to run all these tests, we'll need to use more chunks to avoid the intermittent timeouts we are seeing (duplicate bugs and bug 1204281).
Also, we usually try to keep test job run-times under 30 minutes. We'll need at least 30 Android opt chunks (20 more chunks) to meet that goal, and perhaps 90 chunks (55 more chunks) for Android debug.
:shu -- do you really want to run all those tests on Android?
:dustin -- is it okay to add that many chunks?
Blocks: 977849
Flags: needinfo?(shu)
Flags: needinfo?(dustin)
Summary: Severe performance regression in Android jsreftests → Severe performance regression in Android jsreftest jobs
Comment 7•8 years ago
|
||
So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours, at US$0.04/hr that's about $1.50 per push just for compute, plus more to store all those logs. I think -- Greg's better at these calculations than I am!
Flags: needinfo?(dustin) → needinfo?(garndt)
Comment 8•8 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #6)
> :shu -- do you really want to run all those tests on Android?
Yes, but not all the time. The previous plan was to aggressively SETA (I still don't know what that's an abbreviation for, but the thing where we don't run the tests on every push) them. They should also only be run when js/src is touched by the push. Can we make those things happen and up the chunk count and timeout times at the same time to cut down on costs and intermittents?
Flags: needinfo?(shu)
Comment 9•8 years ago
|
||
I think the win here is for when js/src is touched. When SETA sees new jobs (or even chunks), it will always run those for 2 weeks to ensure they are stable and we get some basic data before skipping it. SETA will determine which jobs have found regressions in the past (high value) and only run high value jobs for pushes 1-4, then on push 5, it will run every possibly test; repeat.
:dustin, do you see any faults in selecting running tests when certain files are changed. I assume this is using the 'when' clause which I have bugged you about in the past many times. One fault I see is that on try server these will only ever be run if/when we match the files- that could be confusing if there is a failure caused by something else.
Flags: needinfo?(dustin)
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #8)
> (In reply to Geoff Brown [:gbrown] from comment #6)
> > :shu -- do you really want to run all those tests on Android?
>
> Yes, but not all the time. The previous plan was to aggressively SETA (I
> still don't know what that's an abbreviation for, but the thing where we
> don't run the tests on every push) them. They should also only be run when
> js/src is touched by the push. Can we make those things happen and up the
> chunk count and timeout times at the same time to cut down on costs and
> intermittents?
Thanks :shu. Only running when js/src is touched would be an interesting optimization. Would you want to see that only for Android, or desktop as well? Would you like to see that apply to all jsreftests, or just the test262 ones?
Comment 11•8 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours,
> at US$0.04/hr that's about $1.50 per push just for compute, plus more to
> store all those logs. I think -- Greg's better at these calculations than I
> am!
Correct, we average $0.04/hr for that worker type.
Looking at the last 7 days, I recorded 1234 pushes resulting in automation for 'android-4-3-armv7-api15 debug' or 'android-4-3-armv7-api15 opt'. Assuming $1.50/push [1] without optimization by SETA or using the 'when' that's $1851 per week in additional automation. SETA and running tests only when files changes would definitely help.
Is it possible to run tests only when file changes on certain branches, but always run them on try? Try in this case was about a third of our pushes that contained those platforms (451 out of 1234)
[1] Assumed that the additional 75 chunks will run 30 minutes each and never be retried.
Flags: needinfo?(garndt)
Comment 12•8 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #10)
> (In reply to Shu-yu Guo [:shu] from comment #8)
> > (In reply to Geoff Brown [:gbrown] from comment #6)
> > > :shu -- do you really want to run all those tests on Android?
> >
> > Yes, but not all the time. The previous plan was to aggressively SETA (I
> > still don't know what that's an abbreviation for, but the thing where we
> > don't run the tests on every push) them. They should also only be run when
> > js/src is touched by the push. Can we make those things happen and up the
> > chunk count and timeout times at the same time to cut down on costs and
> > intermittents?
>
> Thanks :shu. Only running when js/src is touched would be an interesting
> optimization. Would you want to see that only for Android, or desktop as
> well? Would you like to see that apply to all jsreftests, or just the
> test262 ones?
It's fine (and preferred, even) for all jsreftests, on all platforms, to be only run when js/src is touched.
Comment 13•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #9)
> :dustin, do you see any faults in selecting running tests when certain files
> are changed. I assume this is using the 'when' clause which I have bugged
> you about in the past many times. One fault I see is that on try server
> these will only ever be run if/when we match the files- that could be
> confusing if there is a failure caused by something else.
Yeah, that's always a risk -- if that "something else" is intermittent, then that's one thing (and something SETA should help with). If that "something else" is a source-code change that wasn't captured by the `when` clause, then that highlights the difficulty of fine-tuning `when` clauses to actually catch every possible related change.
Still, it seems like it's probably worth the risk here!
Flags: needinfo?(dustin)
Comment 14•8 years ago
|
||
I have an example of this working:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=839fa01da7aaa1ff23db2b73778cb2bddf6d28eb&group_state=expanded
There are some issues with the 'when' clause, but this is doable without too much hacking.
Comment 15•8 years ago
|
||
the problem with the 'when' clause is that we automatically append a bunch of taskcluster related file patterns to all 'when' clauses, for example:
['js/src/**']
becomes:
['js/src/**', u'taskcluster/ci\\test/**', u'taskcluster/taskgraph/**', u'taskcluster/docker/desktop1604-test/**']
while I can see some value in ensuring that all tasks are run properly, we do change taskcluster/ci/test/* often, that will cause us to run the tests very often.
Is there a way for 'tests' that we can avoid adding in these?
Flags: needinfo?(dustin)
Comment 16•8 years ago
|
||
On the other hand, a change to taskcluster/ci/test/* does have the capacity to affect these jobs, so *not* running them when that directory changes has the capacity to introduce bustage that doesn't show up on a push.
The idea is this:
When anything changes that will cause a task to fail, run the task
At the extreme, is the halting problem - in the case described here, it would involve looking at the diff to taskcluster/ci/test/* and figuring out whether those changes could impact this particular test. We would have even more difficulty with taskcluster/taskgraph/** since it's all Python.
So, we need to make an approximation. If we under-estimate - fail to run tasks when they would fail - we get hidden bustage and waste sheriff and developer energy chasing it down. If we over-estimate - run tasks when a detailed analysis can prove they will not fail - then we waste machine resources. The former is far worse than the latter, so I am a strong advocate of over-estimating, which is what is going on here.
If you see a means of doing a finer analysis that does not cross into under-estimation, I'm open to that.
Note that :gps has done a bunch of thinking on this topic.
Flags: needinfo?(dustin)
Assignee | ||
Comment 17•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6ba9e62d8fa45f1cd26617eb3272f22335e2c75
Increase Android opt jsreftest from 10 to 40 chunks.
Increase Android debug jsreftest from 35 to 100 chunks.
This returns run times for most jobs to the 30 - 45 minute range.
I intend not to check-in until the run-only-on-change work is ready.
Attachment #8842487 -
Flags: review?(dustin)
Comment 18•8 years ago
|
||
:gbrown, do you have any problems with this being run more often than just js/src/* but much less frequent then ALL ?
Flags: needinfo?(gbrown)
Comment 21•8 years ago
|
||
Comment on attachment 8842515 [details] [diff] [review]
add 'when' clause to tests and js/src/** for jsreftests
Review of attachment 8842515 [details] [diff] [review]:
-----------------------------------------------------------------
::: taskcluster/ci/test/tests.yml
@@ +319,5 @@
> extra-options:
> - --reftest-suite=jsreftest
> + when:
> + files-changed:
> + - "js/src/**"
Please add in js/public/**
cf http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#30
Comment 22•8 years ago
|
||
One could argue for other things like mfbt/** as well. Or the full set at http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#45
But "one" != "me", because I think of the Android variant of these as low-value, high-cost tests.
Comment 23•8 years ago
|
||
thanks :sfink, updated
Attachment #8842515 -
Attachment is obsolete: true
Attachment #8842515 -
Flags: review?(dustin)
Attachment #8842542 -
Flags: review?(dustin)
Comment 24•8 years ago
|
||
this would account for jsreftests on all platforms, should we do mfbt/** ?
Updated•8 years ago
|
Attachment #8842542 -
Flags: review?(dustin) → review+
Comment 25•8 years ago
|
||
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/52a406607871
only run jsreftests when js/src/* changes. r=dustin
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Updated•8 years ago
|
Attachment #8842487 -
Flags: review?(dustin) → review+
Comment 26•8 years ago
|
||
bugherder |
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Comment 27•8 years ago
|
||
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae59bdcd2d4
Add many more test chunks for Android jsreftests; r=dustin
Comment 28•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•