Closed Bug 1311814 Opened 8 years ago Closed 8 years ago

get linux talos test jobs running in taskcluster

Categories

(Testing :: Talos, defect)

49 Branch
defect
Not set
normal

Tracking

(firefox53 fixed)

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: rwood, Assigned: rwood)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 3 obsolete files)

Get the talos tests running via taskcluster on macosx hardware, reporting to treeherder. This involves using :wcosta's new macosx taskcluster worker [1], and defining the talos tests and platform etc. in the taskcluster desktop-tests setup [2].

[1] https://github.com/walac/gecko-dev/blob/macosx-intree/taskcluster/taskgraph/transforms/tests/make_task_description.py#L246
[2] https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/desktop-test
Depends on: 1274980
Comment on attachment 8809051 [details] Bug 1311814 - Talos TC configs for macosx Hi Wander, I'm not sure where to go from here... if you have some time to have a look that'd be great, thanks! (I'm not sure how to hook up the tests to only run with the talos '-t' switch, and also why is it only running the talos-chrome and talos-other jobs, and a different one for e10s?).
Attachment #8809051 - Flags: feedback?(wcosta)
Comment on attachment 8809051 [details] Bug 1311814 - Talos TC configs for macosx

https://reviewboard.mozilla.org/r/91714/#review91910

::: taskcluster/ci/desktop-test/test-platforms.yml:73 (Diff revision 3)
> macosx64/debug:
> build-platform: macosx64/debug
> test-set: macosx64-tests
> -# macosx64/opt:
> -# build-platform: macosx64/opt
> -# test-set: macosx64-tests
> +macosx64/opt:
> + build-platform: macosx64/opt
> + test-set: macosx64-tests

unittests should only run on debug builds for now. Could you please put talos under a different test set?

::: taskcluster/ci/desktop-test/tests.yml:597 (Diff revision 3)
> - --reftest-suite=reftest-no-accel
>
> +talos-chrome:
> + description: "Talos chrome"
> + suite: talos
> + unittest-try-name: chromez

talos uses [talos-try-name](http://gecko.readthedocs.io/en/latest/taskcluster/taskcluster/attributes.html#talos-try-name)
Comment on attachment 8809051 [details] Bug 1311814 - Talos TC configs for macosx There were a couple of comments I made in mr. Strangely, not all talos tests were scheduled, and those which were scheduled were not picked up by any mac machines. I can't understand why. Other than that, it looks good. Dustin might have something to comment about.
Flags: needinfo?(dustin)
Attachment #8809051 - Flags: feedback?(wcosta) → feedback-
Comment on attachment 8809051 [details] Bug 1311814 - Talos TC configs for macosx

https://reviewboard.mozilla.org/r/91714/#review91954

This looks great! Just in terms of scheduling, though: we need to run macosx64 tests and talos via Buildbot Bridge to start with, and probably not running anywhere by default (to conserve capacity). Then when we throw the switch and macosx64 opt goes to tier 1 (probably months from now), we will start doing tests and talos everywhere by default, but still via BBB. Only at that point will we start installing taskcluster-worker on batches of macs and transitioning to running tests and talos *without* BBB.

So, it's awesome to get ahead of things -- write and run test declarations for talos jobs run via taskcluster and identify any major issues that come up while there's still plenty of time to solve them. But it's probably not worth getting too invested in it now, since we will not have the capacity to run it even at tier 2 until 2017. Running these jobs via BBB should be the short-term focus.

::: taskcluster/ci/desktop-test/test-platforms.yml:73 (Diff revision 3)
> macosx64/debug:
> build-platform: macosx64/debug
> test-set: macosx64-tests
> -# macosx64/opt:
> -# build-platform: macosx64/opt
> -# test-set: macosx64-tests
> +macosx64/opt:
> + build-platform: macosx64/opt
> + test-set: macosx64-tests

You're right about "for now" -- let's get talos "greened up" in debug, even if the performance numbers are meaningless. We could consider adding a "test-sets" (plural) option here, so that both opt and debug can specify "macosx64-tests":

    test-set: macosx64-tests

and opt can add "macosx64-talos":

    test-sets: [macosx64-tests, macosx64-talos]

::: testing/mozharness/mozharness/mozilla/testing/talos.py:142 (Diff revision 3)
> + [["--e10s"], {
> + "action": "store_true",
> + "dest": "e10s",
> + "default": False,
> + "help": "Run tests with e10s enabled"
> + }],

This sort of surprises me -- do we not currently run talos with e10s?
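A minimal sketch of how the proposed plural "test-sets" option could be resolved next to the existing singular "test-set" key. This is only an illustration of the idea in the comment above: the helper name `resolve_test_sets` and the sample data are hypothetical, and nothing here is taken from the actual taskgraph loader.

```python
def resolve_test_sets(platform_cfg, test_sets):
    """Return the de-duplicated list of test names for one platform entry.

    Accepts either the existing singular `test-set` key or the proposed
    plural `test-sets` key (hypothetical, not an existing taskgraph option).
    """
    if "test-set" in platform_cfg and "test-sets" in platform_cfg:
        raise ValueError("specify only one of 'test-set' or 'test-sets'")
    if "test-sets" in platform_cfg:
        names = list(platform_cfg["test-sets"])
    else:
        names = [platform_cfg["test-set"]]

    resolved = []
    for set_name in names:
        for test in test_sets[set_name]:
            # Keep first-seen order and skip tests already pulled in by an earlier set.
            if test not in resolved:
                resolved.append(test)
    return resolved


# Illustrative data only; the real YAML files list many more suites.
TEST_SETS = {
    "macosx64-tests": ["mochitest", "xpcshell"],
    "macosx64-talos": ["talos-chrome", "talos-g1"],
}
PLATFORMS = {
    "macosx64/debug": {
        "build-platform": "macosx64/debug",
        "test-set": "macosx64-tests",
    },
    "macosx64/opt": {
        "build-platform": "macosx64/opt",
        "test-sets": ["macosx64-tests", "macosx64-talos"],
    },
}
```

With this shape, debug keeps its single set and opt unions the unit-test set with the talos set, without either set having to repeat the other's entries.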
Comment on attachment 8809051 [details] Bug 1311814 - Talos TC configs for macosx

https://reviewboard.mozilla.org/r/91714/#review91958

::: taskcluster/ci/desktop-test/test-sets.yml:124 (Diff revision 3)
> # - xpcshell
> + - talos-chrome
> + - talos-dromaeojs
> + - talos-g1
> + - talos-g2
> + - talos-g3

g3 is linux64 only

::: testing/mozharness/mozharness/mozilla/testing/talos.py:142 (Diff revision 3)
> + [["--e10s"], {
> + "action": "store_true",
> + "dest": "e10s",
> + "default": False,
> + "help": "Run tests with e10s enabled"
> + }],

We run talos in both e10s and non-e10s for all tests on all platforms. On buildbot we have a talos job 'tp5o' and 'tp5o-e10s', and we infer from the job name whether to enable or disable e10s. For taskcluster we will need to have the flag enabled. I would like to default to e10s and make --disable-e10s the way to do things if possible. But if this confuses buildbot, let's not create a massive headache.
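One way the "default to e10s, opt out with --disable-e10s" idea could look, sketched with the standard library's `argparse` rather than mozharness's own `config_options` list (which the quoted patch uses). The function name `build_parser` is hypothetical, and this is not the option handling that actually landed.

```python
import argparse


def build_parser():
    """Build a parser where e10s is on unless --disable-e10s is passed."""
    parser = argparse.ArgumentParser(description="Sketch of talos e10s flags")
    # Both flags write to the same destination; the default is True (e10s on).
    parser.add_argument(
        "--e10s",
        dest="e10s",
        action="store_true",
        default=True,
        help="Run tests with e10s enabled (the default in this sketch)",
    )
    parser.add_argument(
        "--disable-e10s",
        dest="e10s",
        action="store_false",
        default=True,
        help="Run tests with e10s disabled",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("e10s enabled:", args.e10s)
```

Keeping `--e10s` as an accepted no-op flag means callers that already pass it (for example a buildbot-derived command line) would not break, which is the compatibility concern raised above.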
Regarding e10s, that makes sense -- we'll need to incorporate the e10s-ness into the buildername too, then :)
Flags: needinfo?(dustin)
Depends on: 1316077
Summary: get talos tests running via taskcluster on macosx hardware → get talos test jobs running in taskcluster
Attachment #8809051 - Attachment is obsolete: true
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); :wcosta, hey, I'm not sure how I can get the talos tests to run in their own group (not part of 'all tests'/unit tests). If I create a separate group in 'test-sets.yml', e.g. 'talos-linux64-tests', and list them there, then they won't run even using the -t flag. They shouldn't be in 'all-tests', correct? We don't want them to run with the unit tests. Feedback appreciated, thanks!
Attachment #8813705 - Flags: feedback?(wcosta)
I think all you need is to add them to all-tests-opt [1] and the magic will happen.

[1] https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/desktop-test/test-sets.yml#41
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); When I updated the review request it removed the feedback flag; :wcosta please see comment 52 thanks!
Attachment #8813705 - Flags: feedback?(wcosta)
Attachment #8813705 - Flags: feedback?(wcosta)
(In reply to Wander Lairson Costa [:wcosta] from comment #54)
> I think all you need is to add them to all-tests-opt [1] and the magic will
> happen.
>
> https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/desktop-test/test-sets.yml#41

Thanks Wander!
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); I separated out the e10s vs non-e10s because with just the one test listed in tests.yml (and e10s true by default) when you specify try syntax with just the test name i.e. '-t chromez' it would run both the 'chromez' and 'chromez-e10s'. So I split them out, so now only the e10s tests are run if you specify '-t <name>-e10s' or if you run all the talos tests then e10s are also included.
Attachment #8813705 - Flags: review?(wcosta)
So currently on try (via buildbot) if you run a specific talos test i.e. '-t chromez' it will just run 'chromez' and not the e10s equivalent ('chromez-e10s'). In order to run on e10s you specify in the try syntax '-t chromez-e10s' as works in the current try-chooser page. I'm assuming that's the same behaviour we want here when running talos via TC - if you run all talos tests ('-t all') that will run e10s and non-e10s, but when running specific talos tests run on e10s/non-e10s as specified in the try syntax suite name. Just wanted to verify that's correct? That's how this patch works also. Here's a set of try runs of this patch that demonstrates: https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood@mozilla.com&fromchange=7cbc3925f09a5ac91e3d68560fb5f57c7d1465c7&tochange=faf5ef72c06a1bc38219f65c778069e4eef1b1ac
Flags: needinfo?(jmaher)
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only);

https://reviewboard.mozilla.org/r/95104/#review95360

::: taskcluster/ci/desktop-test/tests.yml:660 (Diff revision 5)
> + - talos/linux_config.py
> + - remove_executables.py
> + extra-options:
> + - --suite=chromez
> +
> +talos-chrome-e10s:

You shouldn't need to list e10s separately like this -- the e10s transform should be enough to distinguish them. Can you go into more detail as to why that didn't work? I don't really understand what you're saying in comment 60. The `e10s` parameter defaults to `both`, and the e10s transform creates a task with the base name, and a task with an `-e10s` suffix. Likely that transform is not editing `talos-try-name`?
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only);

https://reviewboard.mozilla.org/r/95104/#review95360

> You shouldn't need to list e10s separately like this -- the e10s transform should be enough to distinguish them. Can you go into more detail as to why that didn't work? I don't really understand what you're saying in comment 60. The `e10s` parameter defaults to `both`, and the e10s transform creates a task with the base name, and a task with an `-e10s` suffix. Likely that transform is not editing `talos-try-name`?

Without having e10s jobs listed separately, when specifying a single talos suite name in the try syntax i.e. '-t chromez', that actually runs 'chromez' and 'chromez-e10s' even when you've only specified the non-e10s version. On try chooser it gives you the option to choose 'chromez' or 'chromez-e10s' so I split them up to support that behaviour (see comment 61 also, thanks)
Yeah, that's a bug in your code, likely related to `talos-try-name`. Please fix it instead of working around it :)
The behavior of running both e10s and non-e10s when specifying a single test follows the unit tests' behavior; is there any harm in following this for talos? I think this is a good policy to avoid regressions on e10s, but the limited amount of real hardware might be a showstopper for this.
rwood, thanks for asking. I think for now we should mirror the existing functionality. I wouldn't complain if we decided to just support automatic -e10s job addition. The main use case I am thinking of is when we are bisecting and doing --rebuild 5 on 10 pushes: if there are no e10s regressions and we can just specify the jobs we care about, then we really do get a win.

A couple other things (unrelated to this patch):
1) clicking on the talos jobs from tc-T yields "artifact not found"; that is the same reason there is no performance tab showing in the summary
2) we still get buildbot jobs posted in treeherder under t() for tc-T (likewise for the e10s variant), which is confusing.
Flags: needinfo?(jmaher)
(In reply to Dustin J. Mitchell [:dustin] from comment #64)
> Yeah, that's a bug in your code, likely related to `talos-try-name`. Please
> fix it instead of working around it :)

I don't understand what code you're referring to here.
Sorry to be unclear -- I'm referring to https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests/desktop_test.py#66

    @transforms.add
    def split_e10s(config, tests):
        for test in tests:
            ...

which should automatically create both an e10s and non-e10s version of each task, when given `e10s: both`. That should work for Talos, too. My guess about talos-try-name might be incorrect, since I don't see mention of unittest-try-name in the function body.
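For readers unfamiliar with the transform, here is a simplified, standalone sketch of the splitting behaviour described above. It is not the real `split_e10s` from `desktop_test.py` (that one takes a `config` argument and is registered with `@transforms.add`); the key names `test-name`, `talos-try-name` and `attributes` are used here as assumptions for illustration.

```python
import copy


def split_e10s(tests):
    """Yield a non-e10s and/or an e10s copy of each test description.

    `e10s` may be True, False, or "both" (treated as the default here).
    """
    for test in tests:
        mode = test.get("e10s", "both")
        if mode not in (True, False, "both"):
            raise ValueError("unexpected e10s value: %r" % (mode,))

        if mode in (False, "both"):
            plain = copy.deepcopy(test)
            plain["e10s"] = False
            plain.setdefault("attributes", {})["e10s"] = False
            yield plain

        if mode in (True, "both"):
            variant = copy.deepcopy(test)
            variant["e10s"] = True
            variant.setdefault("attributes", {})["e10s"] = True
            # Suffix both the task name and the try name so that try syntax
            # can tell the two variants apart.
            variant["test-name"] += "-e10s"
            if "talos-try-name" in variant:
                variant["talos-try-name"] += "-e10s"
            yield variant
```

Suffixing the try name as well as the task name is the piece that lets a single "talos-chrome" stanza produce two tasks that can still be selected independently.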
Thanks :dustin. Sorry if I'm super confused but here's my understanding: Yes the build-bot-bridge/transform work that :wcosta did does work that way, i.e. when running a single talos test on try via the TC desktop-tests it does automatically add a corresponding '-e10s' task/test job also, as you noted above. The issue is that currently, on buildbot, when running a single talos test on try the behaviour is to just run that single test (and NOT automatically add the -e10s version). i.e.: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c93fbbd0238b3358e274da7b38e540876affffa0 So in order to replicate that current behaviour, in my patch to add the talos TC configs, I had to separate the e10s and non-e10s jobs. So I guess as the guys mentioned in comment 65 and comment 66, the decision is when moving talos into TC do we switch and let it automatically add the '-e10s' talos jobs each time, or not. I'm not sure how else to mirror the existing buildbot talos behaviour, in TC, except by having both non-e10s and e10s configs in tests.yml.
What I'm trying to get at is, there's nothing sacred about tests.yml -- the transforms can and should change that content arbitrarily. For the tests, we have avoided writing every test twice (or, worse, 40 times for a test with 20 chunks!) by using transforms, and I would like to do the same for talos. So regardless of the expected try behavior, it is possible to write the "talos-chrome" stanza exactly once in tests.yml and still produce the correct set of e10s/non-e10s tasks and select the expected tasks based on try syntax. It's just a matter of modifying `split_e10s` so that it produces tasks with the appropriate attributes, and then modifying `try_option_syntax.py` to handle talos the way buildbot does (which I now understand differs from how unittests are handled).
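The selection half of that suggestion could look roughly like the sketch below: match the `-t` names exactly against each task's try name, so `-t chromez` does not drag in `chromez-e10s`. The function name `select_talos_tasks`, the task dictionaries and the labels are all hypothetical; the real logic lives in `try_option_syntax.py` and is considerably more involved.

```python
def select_talos_tasks(tasks, requested):
    """Return labels of talos tasks selected by a list of '-t' names.

    Mirrors the buildbot behaviour described in this bug: 'all' selects
    every talos task (e10s and non-e10s); otherwise only exact try-name
    matches are selected.
    """
    requested = set(requested)
    if "all" in requested:
        return [task["label"] for task in tasks]
    return [task["label"] for task in tasks if task["talos-try-name"] in requested]


# Illustrative task list, as it might look after an e10s split.
TASKS = [
    {"label": "talos-chrome", "talos-try-name": "chromez"},
    {"label": "talos-chrome-e10s", "talos-try-name": "chromez-e10s"},
    {"label": "talos-tp5o", "talos-try-name": "tp5o"},
    {"label": "talos-tp5o-e10s", "talos-try-name": "tp5o-e10s"},
]
```

Because matching is exact, bisection pushes that only care about one variant pay for one job per push instead of two.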
(In reply to Dustin J. Mitchell [:dustin] from comment #70) Ah, great thanks Dustin!
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); https://reviewboard.mozilla.org/r/95104/#review97956
Attachment #8813705 - Flags: review?(wcosta) → review+
(In reply to Robert Wood [:rwood] from comment #75)
> Thanks Wander. Try runs from earlier today looks good so I'm going to land:
>
> https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood@mozilla.com&fromchange=bfdeca30cf91be70a5cbec5a84a1db72d8ee6208&tochange=042a10ab749550bf990985b7dbc7b34c33a20817

laaaaaaannnnnd it \o/
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Backed out in https://hg.mozilla.org/mozilla-central/rev/272a12b8d16a - that wound up scheduling talos jobs on the stylo repo, where absolutely no talos builders exist in buildbot's worldview because the entire stylo repo doesn't exist in buildbot's worldview, and also scheduling every talos job on ash, where (for reasons, I presume) only the talos-g4 suite exists. As a result, everything scheduled on stylo is pending, and will be pending forever until someone finds a way to cancel them, and all the non-g4 jobs on ash are forever pending (though they can be cancelled through self-serve, since self-serve knows the repo exists). Also, not backout-worthy but bizarre: on mozilla-central all of the talos suites were being scheduled, but on autoland and mozilla-inbound, https://public-artifacts.taskcluster.net/GSvyTjZeQzayV-7EPGSM5w/0/public/logs/live_backing.log, everything except talos-chrome was being optimized away, for no clear reason.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla53 → ---
Blocks: 1322921
We shouldn't have scheduled anything, anywhere! We don't have the capacity to double-run Talos on any platform. Instead, we need to green this up in try, then we will slowly switch from Talos-via-BBB to this, after the platform has gone tier-1.
This is odd, how we are optimizing away all of the talos jobs except talos-chrome. This looks like a SETA thing, although SETA ignores talos when it calculates jobs to skip, and in the database we have all talos jobs set as priority 1 (run all the time). One explanation might be that we have a name mixup of talos-chrome vs talos-chromez. That might explain why we don't optimize talos-chrome away during the decision task, but it is then scheduled as talos-chromez in buildbot. Still, I am unclear why we optimize any talos jobs away. One possible solution here is for the decision task to include the data it downloads from SETA as an artifact, so we can at least inspect what SETA is doing to influence which tasks run or get optimized away.
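The artifact idea could be as small as dumping whatever SETA returned to a JSON file in the decision task's artifact directory. A minimal sketch follows; the function name `write_seta_artifact`, the file name `seta-data.json` and the default directory are assumptions for illustration, not existing decision-task behaviour.

```python
import json
import os


def write_seta_artifact(seta_data, artifact_dir="artifacts"):
    """Write the raw SETA response to a JSON file and return its path.

    `seta_data` is whatever JSON-serializable structure was downloaded;
    this sketch does not interpret it.
    """
    os.makedirs(artifact_dir, exist_ok=True)
    path = os.path.join(artifact_dir, "seta-data.json")
    with open(path, "w") as fh:
        # Sorted keys and indentation make successive pushes easy to diff.
        json.dump(seta_data, fh, indent=2, sort_keys=True)
    return path
```

Having the file next to the other decision-task artifacts would make a name mixup like talos-chrome vs talos-chromez visible by inspection.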
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); Updated original patch to restrict the jobs to run on try only (talos on linux64 via BBB). Ran 'mach taskgraph target' with parameters.yml from autoland and zero talos jobs showed up. Looks good on try itself: https://treeherder.mozilla.org/#/jobs?repo=try&revision=be4e76ebc70a01a67df92746347575a4a56cc88c
Attachment #8813705 - Flags: review+ → review?
Attachment #8813705 - Flags: review? → review?(wcosta)
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); https://reviewboard.mozilla.org/r/95104/#review98704
Attachment #8813705 - Flags: review?(wcosta) → review+
linux64 talos tests are scheduled by Taskcluster through buildbot-bridge on try branch.
Comment on attachment 8818444 [details] [diff] [review] Disable linux64 talos tests on try branch. r=Callek I actually have no idea how I can test this before landing :/
Attachment #8818444 - Flags: review?(bugspam.Callek)
Comment on attachment 8818444 [details] [diff] [review] Disable linux64 talos tests on try branch. r=Callek

Review of attachment 8818444 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, here's my understanding of this all first:
* We are going to be scheduling these via taskcluster over Buildbot Bridge
* Buildbot bridge does need the jobs defined in buildbot in order to actually be able to run them
* We want to ONLY schedule them on try via taskcluster not buildbot (why? until this code is ready to enable them via tc-scheduling on central I'm not sure we want that)

---

As for testing: The easiest way is to use a github branch, and create a PR (for auto travis support) [or enable travis on your fork]
* Travis runs tox
* Tox creates allthethings.json https://dxr.mozilla.org/build-central/source/buildbot-configs/tox_env.sh#27
* Running locally allthethings.json can be compared to help you identify if what you're changing is what you want

::: mozilla-tests/config.py
@@ +2798,5 @@
>
> +# disable linux64 on Try
> +# Try is scheduled through buildbot-bridge
> +all_but_try = list((set(BRANCHES.iteritem()) - set(["try"])))
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64': 'ubuntu64_hw'}, branch_exclusions=all_but_try)

Offhand this will delete both the scheduling magic AND the associated builders, which means you won't get jobs run with BBB this way.
Attachment #8818444 - Flags: review?(bugspam.Callek) → review-
Attachment #8818444 - Attachment is obsolete: true
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); After discussion on IRC, changed to 'run_on_projects: []'
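A simplified model of what 'run_on_projects: []' is meant to achieve here, under the assumption that an empty list keeps a task out of every branch's default target set while an explicit try request can still select it. The function names, the `["all"]` fallback and the sample tasks are illustrative only and do not reproduce taskgraph's actual target-task code.

```python
def default_target_labels(tasks, project):
    """Labels of tasks that run by default on `project`.

    Assumption: a missing `run_on_projects` behaves like ["all"], and an
    empty list means "no project by default".
    """
    selected = []
    for task in tasks:
        projects = task.get("run_on_projects", ["all"])
        if "all" in projects or project in projects:
            selected.append(task["label"])
    return selected


def try_target_labels(tasks, requested):
    """Labels of tasks explicitly requested on try, ignoring run_on_projects."""
    requested = set(requested)
    return [task["label"] for task in tasks if task["label"] in requested]


# Illustrative tasks only.
TASKS = [
    {"label": "mochitest", "run_on_projects": ["all"]},
    {"label": "talos-chrome", "run_on_projects": []},
]
```

Under this model, running the default target selection with autoland parameters yields no talos tasks, which matches the 'mach taskgraph target' check reported earlier in this bug.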
Attachment #8813705 - Flags: review+ → review?(wcosta)
Comment on attachment 8813705 [details] Bug 1311814 - TC linux64 talos configs (restricted to try only); https://reviewboard.mozilla.org/r/95104/#review98910
Attachment #8813705 - Flags: review?(wcosta) → review+
Summary: get talos test jobs running in taskcluster → get linux talos test jobs running in taskcluster
Depends on: 1324911
Some notes on remaining work:
1. Tests should be greened up on Try with this patch landed
2. From there, tests can be enabled on trunk branches, BB scheduling disabled
2a. TC tasks should be hidden from TH (or perhaps we can mark them as tier 3 so they are not shown by default)
2b. Retriggers/backfills will be handled through the buildbot jobs

After this, we can start looking into migrating tasks from BBB to taskcluster-worker

Some unknowns:
1. pain during merge day
2. backfilling/retrigger support if done through buildbot
3. SETA interference between TC and BBB jobs
Pushed by rwood@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b04e33a4c492 TC linux64 talos configs (restricted to try only);r=wcosta
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Attachment #8821275 - Attachment is obsolete: true
Attachment #8821275 - Flags: review?(dustin)
Comment on attachment 8821275 [details] Bug 1311814: Disable treeherder reporting for talos jobs. https://reviewboard.mozilla.org/r/100572/#review101130 I'd rather have this linked to the buildbot-bridge payload function -- that way, when we start transitioning jobs away from buildbot bridge, we won't have to remember to update this logic too. I think that just means removing both task.extra.treeherder and the treeherder task.routes in `build_buildbot_bridge_payload`.
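A rough sketch of the kind of stripping suggested above, written as a standalone helper rather than as a change to `build_buildbot_bridge_payload` itself. The helper name `strip_treeherder` is hypothetical, and the assumption that treeherder routes can be recognised by a `tc-treeherder` prefix should be checked against the routes the task actually carries.

```python
import copy

# Assumption: treeherder reporting routes start with this prefix.
TREEHERDER_ROUTE_PREFIX = "tc-treeherder"


def strip_treeherder(task):
    """Return a copy of a task definition without treeherder reporting info."""
    task = copy.deepcopy(task)
    # Drop the treeherder block from task.extra, if present.
    task.get("extra", {}).pop("treeherder", None)
    # Drop any routes that would report the task to treeherder.
    task["routes"] = [
        route
        for route in task.get("routes", [])
        if not route.startswith(TREEHERDER_ROUTE_PREFIX)
    ]
    return task
```

Calling something like this only on the buildbot-bridge path keeps the behaviour tied to BBB, so tasks that later move to taskcluster-worker would report to treeherder again without extra changes.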
Comment on attachment 8821275 [details] Bug 1311814: Disable treeherder reporting for talos jobs. https://reviewboard.mozilla.org/r/100572/#review101130 Wouldn't this demand a change to mozilla-taskcluster too?
Comment on attachment 8821275 [details] Bug 1311814: Disable treeherder reporting for talos jobs. https://reviewboard.mozilla.org/r/100572/#review101130 I don't think so -- once it has scheduled the decision task, it doesn't much care what happens. Maybe I'm failing to see something, though..?