Closed Bug 1445975 Opened 7 years ago Closed 6 years ago

support running talos benchmarks on spidermonkey shell builds

Categories

(Testing :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(firefox62 fixed)

Status: RESOLVED FIXED
Target Milestone: mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: jmaher, Assigned: ahal)


Attachments

(3 files)

Currently we run benchmarks in a browser. This is great, but it adds a lot of noise and overhead and doesn't give us specific data for the JS engine. If you look at the spidermonkey builds, they run jstests and jittests via the js[.exe] binary after doing the shell build. This binary should be able to run the benchmarks. Let's use ARES-6 as the first option: https://searchfox.org/mozilla-central/source/third_party/webkit/PerformanceTests/ARES-6/cli.js

We want to:
* create a new job that depends on a spidermonkey job (https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=fcb11e93adf57210167de0b27b15433e9c3f45e4&filter-searchStr=spidermonkey%20opt&group_state=expanded)
* this would most likely be a single job that runs all the benchmarks
* need to use taskcluster to depend on the spidermonkey build
* set up the filesystem and dependencies so everything runs smoothly
* collect results and format them properly in a PERFHERDER_DATA json blob so it will upload properly to perfherder.

I can help out with the last point; just getting a test to run in a unique job via the js binary will be the majority of this bug.
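For the last point, a minimal sketch of emitting a PERFHERDER_DATA line, assuming the harness has already parsed per-subtest numbers out of the shell's output. The function name `perfherder_line`, the framework name in the example, and the choice to let the caller supply the suite-level value are assumptions for illustration. The field names (`framework`, `suites`, `subtests`, `lowerIsBetter`) follow the general shape of Perfherder's performance data, but the current schema should be checked before relying on them.

```python
import json


def perfherder_line(framework, suite_name, suite_value, subtests, lower_is_better=True):
    """Return one log line carrying a PERFHERDER_DATA JSON blob.

    `subtests` maps subtest name -> measured value. The suite-level value is
    supplied by the caller; how it is aggregated is a separate decision.
    """
    data = {
        "framework": {"name": framework},
        "suites": [
            {
                "name": suite_name,
                "value": suite_value,
                "lowerIsBetter": lower_is_better,
                # Sort subtests so the blob is stable from run to run.
                "subtests": [
                    {"name": name, "value": value, "lowerIsBetter": lower_is_better}
                    for name, value in sorted(subtests.items())
                ],
            }
        ],
    }
    return "PERFHERDER_DATA: " + json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical numbers, not real ARES-6 results.
    print(perfherder_line("jsshell-bench", "ares6", 120.0, {"Air": 100.0, "Basic": 140.0}))
```

The harness would print this line to the task log, where the Perfherder ingestion step looks for the PERFHERDER_DATA marker.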
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
A couple of follow-up questions:
1) Should these run wherever SM builds are produced? Or are there particular configurations that should be included/excluded?
2) Is this going to be a new micro framework, or somehow built into Talos? (If the former, should the bug move to Testing::General?)
3) Who are the stakeholders outside of our team, and can we cc them on this bug?

I'd love it if we could run these out of the source directory, though that would limit it to Linux only for the time being. However, gps recently took over the bug to get source-test tasks working on Windows, so that shouldn't be far behind. It also looks like we don't build SM on macosx or android, so we wouldn't be running these tests there anyway. If getting Windows working ASAP is important, then I'll just go the regular mozharness route.
I am not sure I understand what a source-test task is; maybe that would be useful, but I suspect it wouldn't be.
1) I would say let's do all of the linux options for js.exe builds. We have a variety of them currently available; I assume once we get one running tests, it is sort of a cut/paste for the others.
2) This doesn't need to be built into talos. It does need to report to perfherder, though.
3) I added some folks to this as a cc; ideally more can be added by anyone as we find more stakeholders.

I don't see a history of windows running on AWFY jsshell, although I can see value in getting that working.
Component: Talos → General
The scope here is probably a bit bigger than we initially thought. I'm assuming we'll need a mach command to run these tests locally, so I think that'll be step one. Once we have the mach command, the tasks can go ahead and run that. At this point we're going to be implementing a mini-harness similar to |mach python-test|. A source-test task is just a task that runs via a mach command in topsrcdir instead of using mozharness + test packages, so I think it won't be too hard to set that up.
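The core of such a mini-harness is "run the shell binary on the benchmark's cli.js from the benchmark directory and capture its output". A minimal sketch under that assumption: the function name `run_shell_benchmark` is hypothetical, and it uses the standard library's `subprocess` in place of the `mozprocess.ProcessHandler` that the actual patch imports.

```python
import subprocess


def run_shell_benchmark(shell, script, cwd, extra_args=()):
    """Run `<shell> [extra_args...] <script>` from `cwd`.

    Returns (returncode, output_lines). stderr is merged into stdout so a
    failing benchmark's error message lands in the same log as its results.
    """
    cmd = [shell, *extra_args, script]
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.splitlines()
```

A call might look like `run_shell_benchmark("/path/to/js", "cli.js", "/path/to/ARES-6")` (hypothetical paths); the returned lines would then be parsed into per-subtest values before being reported to perfherder.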
Thanks for cc'ing us on this bug.

> It also looks like we don't build SM on macosx or android, so we wouldn't be running these tests there anyway.

We actually do: for instance, OS X 10.10 opt runs a collection of tests called "Jit", which runs tests on the JS shell, so the shell had to be built in the first place. Regular builds include the shell in their artifacts, in a file named target.jsshell.zip.

> I don't see a history of windows running on AWFY jsshell, although I can see value in getting that working.

Indeed we don't. In the past, it's been valuable to see the performance across different OSes, and that's why we had "CompareOS", which differs from what is proposed here in that it runs in browsers. Low-level performance discrepancies, such as lock contention and other OS-dependent behavior, can be understood better when seeing results on different OSes. But it doesn't sound too high priority, or even absolutely needed, for an MVP; it's definitely in the "nice to have / maybe / someday" category.

> The scope here is a probably a bit bigger than we initially thought. I'm assuming we'll need a mach command to run these tests locally, so I think that'll be step one. Once we have the mach command, the tasks can go ahead and run that. At this point we're going to be implementing a mini-harness similar to |mach python-test|.

Note that the Python code in AWFY, messy as it is, could probably be reduced and reused to build such a command, since it's all Python too (as mach is, as far as I know). See for instance https://github.com/mozilla/arewefastyet/blob/master/slave/benchmarks_shell.py ; for ARES6 specifically, there is a file containing shims to run it with SpiderMonkey: https://github.com/mozilla/arewefastyet/blob/master/benchmarks/ares6/cli.js
Thanks, that's all really useful! So maybe to start at least, it would be better to depend on the regular builds (rather than the SM ones). I think this would be a little bit easier, with more prior art. Also there are a lot of SM builds and I don't know which one(s) to use :p. I'll take a look at benchmarks_shell.py too.
I decided to go the mach/srcdir route for running these benchmarks to avoid mozharness. I filed bug 1461980 to improve support for depending on build artifacts in these kinds of tasks (the patch is nearly ready to put up for review). I'm also nearly done in this bug: I have a basic mach command working and will start setting up the tasks on top of bug 1461980 (which implements most of the hard parts).

P.S. gps is actively working on getting these tasks running on Windows and OSX in bug 1436037, so I don't anticipate it'll take much longer before we can run source-test tasks there as well.
Depends on: 1461980
I have Ares6 running on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2877a94de7870c8c1c52afe3f1c31799a0f55bb7&selectedJob=179755325

I just need to log the PERFHERDER_DATA results. Locally I have it formatted like this: https://pastebin.mozilla.org/9086136

Does that look about right? Any changes I should make? Perfherder options I should add? Also, I'm not actually sure whether lowerIsBetter is correct for Ares6, but I couldn't figure out where that was defined in AWFY.
Flags: needinfo?(jmaher)
Flags: needinfo?(bbouvier)
Nice work! Quickly comparing with https://arewefastyet.com/#machine=29&view=breakdown&suite=ares6, I find that the ratios between the different variants of the Air benchmark are comparable, so the results seem plausible to me. (The machine is probably a bit slower than AWFY's; just make sure it isn't busy doing other work at the same time, otherwise the results will be flaky over time.)

AWFY always renders charts so that lower is better; it may invert the Y axis to do so. So lowerIsBetter is true if and only if AWFY shows "execution time (ms)" on the Y axis; if it shows "score", the Y axis has been inverted (that happens e.g. for the Speedometer final score, shown as speedometer-misc-score, which is the main score we're actually interested in).

The lowerIsBetter value is defined in AWFY's database and named "direction" in one of the tables. I think there's a general "direction" value per test, with the possibility to override the direction for specific subtests: e.g. the Speedometer final score is inverted, but every other Speedometer test measures times, so those are not inverted. (You'll probably need something like this for Speedometer in perfherder too.)

Hope it helps.
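The per-test "direction" with per-subtest overrides described above could be mirrored on the harness side roughly as follows. This is a sketch only: the function name, the tuple-keyed table layout and the subtest names are hypothetical and do not reflect AWFY's actual database schema.

```python
def lower_is_better(directions, test, subtest=None, default=True):
    """Resolve the 'direction' of a measurement.

    `directions` maps a test name to a bool, and optionally a
    (test, subtest) tuple to a bool that overrides the test-wide value.
    Unknown tests fall back to `default` (times in ms: lower is better).
    """
    if subtest is not None and (test, subtest) in directions:
        return directions[(test, subtest)]
    return directions.get(test, default)


# Hypothetical table: Speedometer subtests measure times, but its
# final score is inverted (higher is better).
DIRECTIONS = {
    "speedometer": True,
    ("speedometer", "score"): False,
}
```

Keeping the override keyed by (test, subtest) means only the exceptions need to be listed; everything else inherits the test-wide value.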
Flags: needinfo?(bbouvier)
It's great that this is close to being up and running. I will be happy to turn off the browser version. This should run on hardware, not a random VM; you can do that by using the same machine type we use for talos: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml#12

We should look at OS coverage: does this work in general on linux/osx/windows? I think that is a reasonable improvement we can offer above and beyond AWFY (assuming the test works there).

We should make sure the framework magically uploads and is recognized in perfherder; I believe that is a hardcoded list. One thing to consider is the 'unit' field, which awsy uses: https://searchfox.org/mozilla-central/source/testing/awsy/awsy/process_perf_data.py#74. That could be 'score' in this case, so we understand it is a benchmark.
Flags: needinfo?(jmaher)
Thanks for all the feedback.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #9)
> We should look at OS coverage- does this work in general on
> linux/osx/windows? I think that is a reasonable improvement we can offer
> above and beyond AWFY (assuming the test works there).

Getting this running on Windows/OSX is blocked on bug 1436037. I'll ping gps and see if he is close. I'll file a follow-up, depending on that bug, for running these tests there.

> We should make sure the framework magically uploads and is recognized in
> perfherder- I believe that is a hardcoded list.

So you're saying there's some configuration needed on the perfherder side of things to get these jobs recognized? Should I file a bug for this?

(In reply to Benjamin Bouvier [:bbouvier] from comment #8)
> The lowerIsBetter value is defined in AWFY's database and named "direction"
> in one of the tables. I think there's a general value of "direction" per
> test, and the possibility to overwrite the direction for specific sub tests,
> e.g. Speedometer final score is inverted but every other Speedometer test
> measures times, so not inverted. (You'll probably need something like this
> too for Speedometer in perfherder)

Looks like perfherder can already set this value per subtest, so it shouldn't be too hard. I think I'll defer implementing a generalized mechanism for this until we integrate speedometer, though, just to keep things simpler for now.
Blocks: 1464043
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #9)
> this is great to be close to up and running. I will be happy to turn off
> the browser version. This should run on hardware, not a random VM machine-
> you can do that by using the same machine type we do for talos:
> https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml#12

The 'virtualization' key is only available to 'test' tasks (which this new one isn't). However, it's only used to determine the worker-type, which is easy enough to specify manually. Plus it only has an impact on Windows, so there's no need to worry about this yet. I'll make a note in bug 1464043 to make sure we use the hardware pool for Windows.
We still need to use hardware for linux; the run you show is on a VM. Please make sure that we are using the hardware pool; I believe it is releng-hardware/gecko-t-linux-talos.
Ah you're right, it's just hardcoded instead of using the 'virtualization' key.
So it turns out that changing the workerType to the hardware pool causes all sorts of problems in taskcluster. I hit and solved a few of them, but now I'm stuck on this: https://taskcluster-artifacts.net/BObadzklSmyTCco4rkberw/0/public/logs/live_backing.log

The crux of the issue is that the hardware pool Talos uses runs a taskcluster worker called 'native-engine' instead of the normal 'docker-worker'. I guess that in 'docker-worker' commands are always run as the root user, but not in 'native-engine'.

I'd like to propose we do the following:
1) Land what I have for now (running the tasks on AWS).
2) Work with the taskcluster team and gps (his work in bug 1436037 might give us a path forward) to fix the issue and get this running on hardware like it's supposed to.
3) If for whatever reason this effort stalls out, move it over to mozharness.
I agree this is fine; we can continue getting things running without being blocked.
Comment on attachment 8980554 [details]
Bug 1445975 - Port shims for running ARES-6 in a js shell from AWFY,

https://reviewboard.mozilla.org/r/246708/#review252842

::: third_party/webkit/PerformanceTests/ARES-6/glue.js
(Diff revision 1)
> driver.addBenchmark(MLBenchmarkRunner);
> driver.readyTrigger();
> -
> -if (location.search == '?gecko') {
> -    driver.start(6);
> -}

This will break the existing runs we do in the browser (look for the motionmark 'mm' job, which runs on mozilla-central and try). As a note, we will be turning off ARES-6 in the browser and only using the shell in the near future.
Attachment #8980554 - Flags: review?(jmaher) → review-
Comment on attachment 8980555 [details]
Bug 1445975 - Add a basic mach command for running jsshell benchmarks,

https://reviewboard.mozilla.org/r/246710/#review252844

::: testing/jsshell/benchmark.py:20
(Diff revision 1)
> +from mozbuild.base import MozbuildObject, BuildEnvironmentNotFoundException
> +from mozprocess import ProcessHandler
> +
> +here = os.path.abspath(os.path.dirname(__file__))
> +build = MozbuildObject.from_environment(cwd=here)
> +BENCHMARK_PATH = os.path.join(build.topsrcdir, 'third_party', 'webkit', 'PerformanceTests')

This might be too much hardcoding for future benchmarks. I think third_party is ok, or at least make it easy to override.
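One way to make the benchmark location easy to override, sketched under assumptions: the helper name `benchmark_path` and the environment variable `JSSHELL_BENCHMARK_ROOT` are hypothetical (not an existing mach option), and `topsrcdir` is passed in rather than taken from `MozbuildObject` so the sketch stays self-contained.

```python
import os

# Hypothetical environment variable; not an existing mach option.
BENCHMARK_ROOT_ENV = "JSSHELL_BENCHMARK_ROOT"


def benchmark_path(topsrcdir, name, override=None, env=None):
    """Locate a benchmark directory.

    Precedence: explicit `override` argument, then the environment variable,
    then the in-tree default under third_party/webkit/PerformanceTests.
    """
    env = os.environ if env is None else env
    root = (
        override
        or env.get(BENCHMARK_ROOT_ENV)
        or os.path.join(topsrcdir, "third_party", "webkit", "PerformanceTests")
    )
    return os.path.join(root, name)
```

The in-tree path stays the default, so existing callers are unaffected, while a future benchmark living elsewhere in third_party (or out of tree) only needs an argument or environment variable.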
Attachment #8980555 - Flags: review?(jmaher) → review+
Comment on attachment 8980556 [details]
Bug 1445975 - Add jsshell bench-ares6 task,

https://reviewboard.mozilla.org/r/246712/#review252846

I cannot see where we specify which branches this runs on. This should run on mozilla-central and try.

::: taskcluster/ci/source-test/jsshell.yml:2
(Diff revision 1)
> +job-defaults:
> +    platform: linux64/opt

Can we think of the future, where we have different spidermonkey build types and platforms, and make this part of the test description?

::: taskcluster/ci/source-test/jsshell.yml:20
(Diff revision 1)
> +    run:
> +        using: run-task
> +        use-artifacts:
> +            build:
> +                - target.jsshell.zip
> +    when:

I don't think a when clause makes sense here; we want to get this data on all m-c runs.
Attachment #8980556 - Flags: review?(jmaher) → review-
Comment on attachment 8980556 [details]
Bug 1445975 - Add jsshell bench-ares6 task,

https://reviewboard.mozilla.org/r/246712/#review252846

Good catch, I forgot about this.

> can we think of the future where we have different spidermonkey type builds and platforms and make this in the test description.

The transforms already put the platform in the description/label. Or did you mean something else?

> I don't think a when clause makes sense here- we want to get this data on all m-c runs.

Good catch! I did this out of habit.
Comment on attachment 8980556 [details]
Bug 1445975 - Add jsshell bench-ares6 task,

https://reviewboard.mozilla.org/r/246712/#review252846

> The transforms already put the platform in the description/label. Or did you mean something else?

I assumed that meant this only runs on linux64/opt; I would prefer listing other known future options as None/Null/[] for now.
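The `by-platform` keys in jsshell.yml are patterns matched against the task's platform, so listing future platforms explicitly as None amounts to "known, but not wired up yet". The following is a simplified sketch of that lookup idea only, not taskgraph's actual keyed-by resolver (which has additional rules, for example around ambiguous matches); the function name and the config values other than the linux64 worker type are hypothetical.

```python
import re


def resolve_by_platform(value, platform):
    """Resolve a {'by-platform': {pattern: value}} entry for one platform.

    Patterns are regular expressions that must match the whole platform
    string; the literal key 'default' is used when nothing else matches.
    Plain (non-keyed) values are returned unchanged.
    """
    if not isinstance(value, dict) or "by-platform" not in value:
        return value
    alternatives = value["by-platform"]
    for pattern, result in alternatives.items():
        if pattern != "default" and re.fullmatch(pattern, platform):
            return result
    return alternatives.get("default")


# Hypothetical config: only linux64 is wired up today; other known
# platforms are listed explicitly as None until they are supported.
WORKER_TYPE = {
    "by-platform": {
        "linux64.*": "releng-hardware/gecko-t-linux-talos",
        "windows.*": None,
        "macosx.*": None,
    }
}
```

With this layout, adding Windows or OSX later is a one-line change from None to a real worker type, and the placeholders document the intended coverage in the meantime.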
Comment on attachment 8980554 [details]
Bug 1445975 - Port shims for running ARES-6 in a js shell from AWFY,

https://reviewboard.mozilla.org/r/246708/#review252944

Thanks.
Attachment #8980554 - Flags: review?(jmaher) → review+
Comment on attachment 8980556 [details]
Bug 1445975 - Add jsshell bench-ares6 task,

https://reviewboard.mozilla.org/r/246712/#review252960

::: taskcluster/ci/source-test/jsshell.yml:10
(Diff revision 2)
> +        by-platform:
> +            linux64.*: aws-provisioner-v1/gecko-t-linux-xlarge
> +    worker:
> +        by-platform:
> +            linux64.*:
> +                docker-image: {in-tree: "desktop1604-test"}

I thought this would be on hardware, as per our irc conversation.
Attachment #8980556 - Flags: review?(jmaher) → review-
Comment on attachment 8980556 [details]
Bug 1445975 - Add jsshell bench-ares6 task,

https://reviewboard.mozilla.org/r/246712/#review252960

> I thought this would be on hardware as per our irc conversation.

I made a mistake in my patch (forgot to change the workerType back) and thought it was working when it wasn't.
Attachment #8980556 - Flags: review- → review?(jmaher)
Attachment #8980556 - Flags: review?(jmaher) → review+
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/891e8d850c4c
Port shims for running ARES-6 in a js shell from AWFY, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/9538c21ccf18
Add a basic mach command for running jsshell benchmarks, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/eba75ea19102
Add jsshell bench-ares6 task, r=jmaher
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Blocks: 1464840