Closed Bug 1342963 Opened 8 years ago Closed 8 years ago

Severe performance regression in Android jsreftest jobs

Tracking

(firefox54 fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 54

Tracking Flags:

Tracking

Status

firefox54

---

fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(2 files, 1 obsolete file)

increase android jsreftest chunks 8 years ago Geoff Brown [:gbrown] (deleted), patch	dustin : review+	Details \| Diff \| Splinter Review
add 'when' clause to tests and js/src/** for jsreftests 8 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch		Details \| Diff \| Splinter Review
add 'when' clause to tests and js/src/ and js/public/ for jsreftests 8 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch	dustin : review+	Details \| Diff \| Splinter Review

Geoff Brown [:gbrown]

Assignee

Description

•

8 years ago

On mozilla-central, until Feb 23, each chunk of the Android opt jsreftests completed in 25 to 35 minutes; since then, each chunk is completing in about 110 minutes. There is a similar regression in Android debug jsreftests.

Geoff Brown [:gbrown]

Assignee

Comment 1

•

8 years ago

On central, the first slow revision is https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=c02dd6a7e9c193b488271eb53e3ea039042c9ed6&filter-searchStr=android+jsreftest

Geoff Brown [:gbrown]

Assignee

Updated

•

8 years ago

Priority: -- → P1

Geoff Brown [:gbrown]

Assignee

Comment 6

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=85faf95c2400d9931cb0b98f6c089bf80c24d4f3&filter-searchStr=android+jsreftest and all later Android jsreftest jobs run much slower than earlier pushes. At least part of the issue is the addition of previously skipped tests. Earlier runs of Android opt J1 have something like: REFTEST SUITE-START | Running 3326 tests REFTEST INFO | Running chunk 1 out of 10 chunks. tests 1-355/355 Later runs have: REFTEST SUITE-START | Running 26420 tests REFTEST INFO | Running chunk 1 out of 10 chunks. tests 1-2809/2809 I see bug 977849 also increased Android jsreftest chunks in an earlier push...but just barely enough. If we really want to run all these tests, we'll need to use more chunks to avoid the intermittent timeouts we are seeing (duplicate bugs and bug 1204281). Also, we usually try to keep test job run-times under 30 minutes. We'll need at least 30 Android opt chunks (20 more chunks) to meet that goal, and perhaps 90 chunks (55 more chunks) for Android debug. :shu -- do you really want to run all those tests on Android? :dustin -- is it okay to add that many chunks?

Blocks: 977849

Flags: needinfo?(shu)

Flags: needinfo?(dustin)

Summary: Severe performance regression in Android jsreftests → Severe performance regression in Android jsreftest jobs

Dustin J. Mitchell [:dustin] (he/him)

Comment 7

•

8 years ago

So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours, at US$0.04/hr that's about $1.50 per push just for compute, plus more to store all those logs. I think -- Greg's better at these calculations than I am!

Flags: needinfo?(dustin) → needinfo?(garndt)

Shu-yu Guo [:shu]

Comment 8

•

8 years ago

(In reply to Geoff Brown [:gbrown] from comment #6) > :shu -- do you really want to run all those tests on Android? Yes, but not all the time. The previous plan was to aggressively SETA (I still don't know what that's an abbreviation for, but the thing where we don't run the tests on every push) them. They should also only be run when js/src is touched by the push. Can we make those things happen and up the chunk count and timeout times at the same time to cut down on costs and intermittents?

Flags: needinfo?(shu)

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

8 years ago

I think the win here is for when js/src is touched. When SETA sees new jobs (or even chunks), it will always run those for 2 weeks to ensure they are stable and we get some basic data before skipping it. SETA will determine which jobs have found regressions in the past (high value) and only run high value jobs for pushes 1-4, then on push 5, it will run every possibly test; repeat. :dustin, do you see any faults in selecting running tests when certain files are changed. I assume this is using the 'when' clause which I have bugged you about in the past many times. One fault I see is that on try server these will only ever be run if/when we match the files- that could be confusing if there is a failure caused by something else.

Flags: needinfo?(dustin)

Geoff Brown [:gbrown]

Assignee

Comment 10

•

8 years ago

(In reply to Shu-yu Guo [:shu] from comment #8) > (In reply to Geoff Brown [:gbrown] from comment #6) > > :shu -- do you really want to run all those tests on Android? > > Yes, but not all the time. The previous plan was to aggressively SETA (I > still don't know what that's an abbreviation for, but the thing where we > don't run the tests on every push) them. They should also only be run when > js/src is touched by the push. Can we make those things happen and up the > chunk count and timeout times at the same time to cut down on costs and > intermittents? Thanks :shu. Only running when js/src is touched would be an interesting optimization. Would you want to see that only for Android, or desktop as well? Would you like to see that apply to all jsreftests, or just the test262 ones?

Greg Arndt [:garndt]

Comment 11

•

8 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #7) > So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours, > at US$0.04/hr that's about $1.50 per push just for compute, plus more to > store all those logs. I think -- Greg's better at these calculations than I > am! Correct, we average $0.04/hr for that worker type. Looking at the last 7 days, I recorded 1234 pushes resulting in automation for 'android-4-3-armv7-api15 debug' or 'android-4-3-armv7-api15 opt'. Assuming $1.50/push [1] without optimization by SETA or using the 'when' that's $1851 per week in additional automation. SETA and running tests only when files changes would definitely help. Is it possible to run tests only when file changes on certain branches, but always run them on try? Try in this case was about a third of our pushes that contained those platforms (451 out of 1234) [1] Assumed that the additional 75 chunks will run 30 minutes each and never be retried.

Flags: needinfo?(garndt)

Shu-yu Guo [:shu]

Comment 12

•

8 years ago

(In reply to Geoff Brown [:gbrown] from comment #10) > (In reply to Shu-yu Guo [:shu] from comment #8) > > (In reply to Geoff Brown [:gbrown] from comment #6) > > > :shu -- do you really want to run all those tests on Android? > > > > Yes, but not all the time. The previous plan was to aggressively SETA (I > > still don't know what that's an abbreviation for, but the thing where we > > don't run the tests on every push) them. They should also only be run when > > js/src is touched by the push. Can we make those things happen and up the > > chunk count and timeout times at the same time to cut down on costs and > > intermittents? > > Thanks :shu. Only running when js/src is touched would be an interesting > optimization. Would you want to see that only for Android, or desktop as > well? Would you like to see that apply to all jsreftests, or just the > test262 ones? It's fine (and preferred, even) for all jsreftests, on all platforms, to be only run when js/src is touched.

Dustin J. Mitchell [:dustin] (he/him)

Comment 13

•

8 years ago

(In reply to Joel Maher ( :jmaher) from comment #9) > :dustin, do you see any faults in selecting running tests when certain files > are changed. I assume this is using the 'when' clause which I have bugged > you about in the past many times. One fault I see is that on try server > these will only ever be run if/when we match the files- that could be > confusing if there is a failure caused by something else. Yeah, that's always a risk -- if that "something else" is intermittent, then that's one thing (and something SETA should help with). If that "something else" is a source-code change that wasn't captured by the `when` clause, then that highlights the difficulty of fine-tuning `when` clauses to actually catch every possible related change. Still, it seems like it's probably worth the risk here!

Flags: needinfo?(dustin)

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

•

8 years ago

I have an example of this working: https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=839fa01da7aaa1ff23db2b73778cb2bddf6d28eb&group_state=expanded There are some issues with the 'when' clause, but this is doable without too much hacking.

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

•

8 years ago

the problem with the 'when' clause is that we automatically append a bunch of taskcluster related file patterns to all 'when' clauses, for example: ['js/src/**'] becomes: ['js/src/**', u'taskcluster/ci\\test/**', u'taskcluster/taskgraph/**', u'taskcluster/docker/desktop1604-test/**'] while I can see some value in ensuring that all tasks are run properly, we do change taskcluster/ci/test/* often, that will cause us to run the tests very often. Is there a way for 'tests' that we can avoid adding in these?

Flags: needinfo?(dustin)

Dustin J. Mitchell [:dustin] (he/him)

Comment 16

•

8 years ago

On the other hand, a change to taskcluster/ci/test/* does have the capacity to affect these jobs, so *not* running them when that directory changes has the capacity to introduce bustage that doesn't show up on a push. The idea is this: When anything changes that will cause a task to fail, run the task At the extreme, is the halting problem - in the case described here, it would involve looking at the diff to taskcluster/ci/test/* and figuring out whether those changes could impact this particular test. We would have even more difficulty with taskcluster/taskgraph/** since it's all Python. So, we need to make an approximation. If we under-estimate - fail to run tasks when they would fail - we get hidden bustage and waste sheriff and developer energy chasing it down. If we over-estimate - run tasks when a detailed analysis can prove they will not fail - then we waste machine resources. The former is far worse than the latter, so I am a strong advocate of over-estimating, which is what is going on here. If you see a means of doing a finer analysis that does not cross into under-estimation, I'm open to that. Note that :gps has done a bunch of thinking on this topic.

Flags: needinfo?(dustin)

Geoff Brown [:gbrown]

Assignee

Comment 17

•

8 years ago

Attached patch increase android jsreftest chunks (deleted) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6ba9e62d8fa45f1cd26617eb3272f22335e2c75 Increase Android opt jsreftest from 10 to 40 chunks. Increase Android debug jsreftest from 35 to 100 chunks. This returns run times for most jobs to the 30 - 45 minute range. I intend not to check-in until the run-only-on-change work is ready.

Attachment #8842487 - Flags: review?(dustin)

Joel Maher ( :jmaher ) (UTC -8)

Comment 18

•

8 years ago

:gbrown, do you have any problems with this being run more often than just js/src/* but much less frequent then ALL ?

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 19

•

8 years ago

No, I think that's fine.

Flags: needinfo?(gbrown)

Joel Maher ( :jmaher ) (UTC -8)

Comment 20

•

8 years ago

Attached patch add 'when' clause to tests and js/src/** for jsreftests (obsolete) (deleted) — Details — Splinter Review

ok, lets make it happen

Attachment #8842515 - Flags: review?(dustin)

Steve Fink [:sfink] [:s:]

Comment 21

•

8 years ago

Comment on attachment 8842515 [details] [diff] [review] add 'when' clause to tests and js/src/** for jsreftests Review of attachment 8842515 [details] [diff] [review]: ----------------------------------------------------------------- ::: taskcluster/ci/test/tests.yml @@ +319,5 @@ > extra-options: > - --reftest-suite=jsreftest > + when: > + files-changed: > + - "js/src/**" Please add in js/public/** cf http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#30

Steve Fink [:sfink] [:s:]

Comment 22

•

8 years ago

One could argue for other things like mfbt/** as well. Or the full set at http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#45 But "one" != "me", because I think of the Android variant of these as low-value, high-cost tests.

Joel Maher ( :jmaher ) (UTC -8)

Comment 23

•

8 years ago

Attached patch add 'when' clause to tests and js/src/** and js/public/** for jsreftests (deleted) — Details — Splinter Review

thanks :sfink, updated

Attachment #8842515 - Attachment is obsolete: true

Attachment #8842515 - Flags: review?(dustin)

Attachment #8842542 - Flags: review?(dustin)

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

8 years ago

this would account for jsreftests on all platforms, should we do mfbt/** ?

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

8 years ago

Attachment #8842542 - Flags: review?(dustin) → review+

Pulsebot

Comment 25

•

8 years ago

Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/52a406607871 only run jsreftests when js/src/* changes. r=dustin

Geoff Brown [:gbrown]

Assignee

Updated

•

8 years ago

Keywords: leave-open

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

8 years ago

Attachment #8842487 - Flags: review?(dustin) → review+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 26

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/52a406607871

Geoff Brown [:gbrown]

Assignee

Updated

•

8 years ago

Keywords: leave-open

Pulsebot

Comment 27

•

8 years ago

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae59bdcd2d4 Add many more test chunks for Android jsreftests; r=dustin

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 28

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5ae59bdcd2d4

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox54: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 54

Geoff Brown [:gbrown]

Assignee

Updated

•

7 years ago

Blocks: 1411358

Geoff Brown [:gbrown]

Assignee

Updated

•

7 years ago

No longer blocks: 1411358

BMO Automation

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

You need to log in before you can comment on or make changes to this bug.