Dynamically select which tests run during taskgraph generation
Categories
(Firefox Build System :: Task Configuration, enhancement)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [manifest-scheduling])
Attachments
(18 files, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
Bug 1633866 - [taskgraph] Always use chunk number as symbol if it would otherwise be empty, r?jmaher
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Currently how we schedule tests in ./mach try auto
goes something like this:
- Define how many chunks N should exist for each configuration/suite combo in the
.yml
files. - Create N tasks for each configuration/suite.
- Take all tests and distribute them somewhat evenly across those N tasks.
- Later on, determine which tests are important and run only the tasks that contain those tests.
This scheme isn't ideal because we determine which tests run in which tasks right at the outset, before we know whether the tests are relevant to the push or not. So when we schedule the tasks containing "important" tests, we also run all the other tests that just happen to be chunked in that task as well.
A much better process would look something like:
- Determine which tests are important.
- Compute how many chunks N it would take to run those tests in a reasonable amount of time.
- Distribute only the important tests evenly across those chunks.
- Don't optimize further (based on test containment at least)
This way we only run the important tests. There is no collateral damage.
This will be a pretty large departure and break many workflows and assumptions we have in our tooling. Including but not limited to:
- Push Health
- Backfills
- Sheriff / developer treeherder workflows
- Cost analysis data
- Task selection with
./mach try
To start, this will likely only be enabled when using ./mach try auto
. Though eventually this mode will make its way to autoland itself.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
We use the term 'tests' to refer to 'tasks' in the tests.py transforms. Imo,
this makes things very hard to follow as the term 'test' is also used for all
kinds of other contexts in that file. Let's just call them what they are:
tasks.
I decided to land this as part of this series as I will be adding further uses
of the word 'test' later on.
Assignee | ||
Comment 3•5 years ago
|
||
With dynamic-test-selection, we'll also need to query the bugbug service from
the transforms. Let's move the querying logic to a utility file to share it
more easily.
Assignee | ||
Comment 4•5 years ago
|
||
Comment 5•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 8•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
Small improvement with respect to the single responsibility principle. The
taskgraph diff is identical.
Assignee | ||
Comment 10•5 years ago
|
||
This is instead of tests and will make it easier to re-use in the taskgraph.
This commit is a straight refactor and results in zero differences in the
taskgraph.
Depends on D74448
Assignee | ||
Comment 11•5 years ago
|
||
In the 'chunk_by_runtime' algorithm a 'get_manifest' helper function is used to
determine what "manifest" a test belongs to. The logic is basically, if
'ancestor-manifest' exists then use that. Otherwise use 'manifest_relpath'.
We need to do this because in some cases a "shared" manifest can be included
multiple times from parent manifests, each with a different configuration.
However, when we calculate the "skipped" manifests in chunking.py, we were
simply using 'manifest_relpath' and ignoring 'ancestor-manifest'. I believe
this meant we were mis-reporting which manifests were skipped in the task logs.
It possibly even meant we were double-scheduling some tests (i.e, if the
'skip-if' was in the ancestor, not the shared one). I'm not sure if the double
scheduling was actually happening in practice, but it's certainly theoretically
possible.
Afaict, this only affected a handful of xpcshell manifests on Windows and
Android.
Depends on D74449
Assignee | ||
Comment 12•5 years ago
|
||
This sets things up to be a little bit easier and cleaner to modify going
forward.
Depends on D74451
Comment 13•5 years ago
|
||
Comment 14•5 years ago
|
||
bugherder |
Comment 15•5 years ago
|
||
bugherder |
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 16•4 years ago
|
||
When performing diffs, identical parameters files need to be used. This makes
it possible to wget the exact same parameters used by a previous |mach
taskgraph| run.
Assignee | ||
Comment 17•4 years ago
|
||
This moves the creation of WPT groups into the TestResolver, where all kinds of
other consumers will be able to access them.
Comment 18•4 years ago
|
||
Comment 19•4 years ago
|
||
bugherder |
Comment 20•4 years ago
|
||
Comment on attachment 9150346 [details]
Bug 1633866 - [taskgraph] Move WPT group assignment into the TestResolver, r?egao
Revision D76085 was moved to bug 1634551. Setting attachment 9150346 [details] to obsolete.
Updated•4 years ago
|
Comment 21•4 years ago
|
||
Comment on attachment 9150346 [details]
Bug 1633866 - [taskgraph] Move WPT group assignment into the TestResolver, r?egao
Revision D76085 was moved to bug 1634551. Setting attachment 9150346 [details] to obsolete.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 22•4 years ago
|
||
Comment 23•4 years ago
|
||
bugherder |
Assignee | ||
Comment 24•4 years ago
|
||
This allows tasks to opt into dynamic chunking. This means rather than define
the chunks ahead of time, in-tree manifest runtime data is used to guess how
many chunks are needed for a reasonable average task runtime (currently 20
min).
Only suites that are chunked in the taskgraph, and therefore whose manifests
are known ahead of time, can opt into this feature.
Assignee | ||
Comment 25•4 years ago
|
||
Currently test manifests are loaded by instantiating a TestResolver and
traversing moz.build files to find manifests.
With 'manifest-scheduling', we'll want to grab the manifests directly from the
bugbug service instead. Initially we'll want to enable manifest-scheduling with
|mach try auto|, but not on autoland yet.
This patch will allow |mach try auto| to set the parameter that causes bugbug
to be used (see future commits in this bug).
Depends on D76497
Assignee | ||
Comment 26•4 years ago
|
||
Depends on D76522
Assignee | ||
Comment 27•4 years ago
|
||
Depends on D76523
Comment 28•4 years ago
|
||
Comment 29•4 years ago
|
||
bugherder |
Comment 30•4 years ago
|
||
Assignee | ||
Comment 31•4 years ago
|
||
This change is a straight refactor and results in identical full taskgraphs.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 32•4 years ago
|
||
bugherder |
Assignee | ||
Comment 33•4 years ago
|
||
Here's my latest progress:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06a60a11960d2d645c25c9d25f46742a1bb713ad
Combined with the fixes to stop scheduling shippable builds, I'd say this is looking really good!
A couple issues to resolve:
- Why is the mochitest-plain symbol "unknown" ?
- Why did the Linux xpcshell tasks result in only one manifest per task? (Notably the Android ones didn't)
- Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.
Assignee | ||
Comment 34•4 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #33)
- Why is the mochitest-plain symbol "unknown" ?
This one is easy. The default symbols for mochitest-plain are just numbers and nothing else. When we split chunks, the numbers are appended to the existing symbol here:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#1495
But since we only have one chunk we never enter that if statement and we likely assign the symbol an null value, which treeherder then defaults to "unknown".
Assignee | ||
Comment 35•4 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #33)
- Why did the Linux xpcshell tasks result in only one manifest per task? (Notably the Android ones didn't)
The algorithm is working as expected. This was happening because there is one manifest that takes 100+ minutes apparently (5x the desired length). So this causes the computed number of dynamic chunks to be artificially high. For example, assume we have three manifests:
foo -> 98 minutes
bar -> 1 minute
baz -> 1 minute
And let's say we want average chunk lengths of ~20 minutes. The calculation would currently say we need 5 chunks ((98 + 1 + 1) / 20
). Even if set the max chunks to the # of manifests, it would still return 3. Clearly this only needs 2 chunks to run.
I think if we truncated all runtimes that exceed the desired chunk duration, it would balance things out nicer.
- Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.
I think it's ok to not block on this for |mach try auto|. Though we'll likely need better runtime data before we make it live on autoland.
Assignee | ||
Comment 36•4 years ago
|
||
In my try push, truncating task runtimes would have reduced xpcshell chunks from 13 -> 7. So fix seems to work.
Assignee | ||
Comment 37•4 years ago
|
||
Depends on D76523
Assignee | ||
Comment 38•4 years ago
|
||
For example, assume we have three manifests:
foo -> 98 minutes
bar -> 1 minute
baz -> 1 minute
And let's say we want average chunk lengths of ~20 minutes. The calculation
would currently say we need 5 chunks ((98 + 1 + 1) / 20). Even if set the max
chunks to the # of manifests, it would still return 3. Clearly this only needs
at most 2 chunks to run.
By truncating the runtime, we'd have:
foo -> 20 minutes (truncated)
bar -> 1 minute
baz -> 1 minute
So we'd actually only assign one chunk (round((20 + 1 + 1) / 20)). Despite
having a single 100 minute chunk, this outcome is desirable as we now avoid the
overhead of a second chunk that only runs 2 minutes worth of tests.
Depends on D77871
Updated•4 years ago
|
Assignee | ||
Comment 39•4 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #35)
- Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.
I think it's ok to not block on this for |mach try auto|. Though we'll likely need better runtime data before we make it live on autoland.
I might have spoke too soon. The runtime data for xpcshell seems particularly bad. It's scheduling ~4 minute tasks and the full set of manifests take 20+ chunks. I'll need to either fix the data (preferable) or create some kind of max-chunk mechanism. Though the latter solution will still result in very short chunks when we're under the max, so it's a last resort.
Assignee | ||
Comment 40•4 years ago
|
||
A better "last resort" might be to artificially scale all xpcshell runtimes down. But I'll definitely spend a lot of time investigating why the runtime data doesn't seem to match reality.
Assignee | ||
Comment 41•4 years ago
|
||
I think the xpcshell issue is due to a combination of problems:
- A bug in the
writeruntimes
script, which is adding up the tests of included manifests multiple times. For example, ifxpcshell-common.ini
is included by three different parent manifests, then writeruntimes will collate each instance up into the ancestor manifest here:
https://searchfox.org/mozilla-central/source/testing/runtimes/writeruntimes#187
So though those tests are counted 3 times under the same manifest.
- Insufficient mozinfo guessing. I see some values missing (processor, bits, toolkit) that are commonly used to skip entire manifests in xpcshell.
Assignee | ||
Comment 42•4 years ago
|
||
Nvm, xpcshell chunks are way off because they run in parallel ><
It looks like they run on 8 cores in CI (at least for Linux). I'll try dividing the total computed runtime by that.
Assignee | ||
Comment 43•4 years ago
|
||
This is needed because xpcshell runs in parallel. Therefore we need to know the number
of threads are used (which varies by platform) in order to accurately determine how
many chunks are needed.
This allows us to specify a platform / suite specific multiplier to compensate via trial
and error.
Assignee | ||
Comment 44•4 years ago
|
||
Ensure runtimes are up to date prior to enabling manifest-scheduling
anywhere.
Depends on D78153
Assignee | ||
Comment 45•4 years ago
|
||
Adds a few properties to the mozinfo guess that were previously missing.
Depends on D78155
Comment 46•4 years ago
|
||
Comment 47•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/acace34eb88d
https://hg.mozilla.org/mozilla-central/rev/69790da5ec2c
https://hg.mozilla.org/mozilla-central/rev/b39e4d67af92
https://hg.mozilla.org/mozilla-central/rev/fd095da6e40d
https://hg.mozilla.org/mozilla-central/rev/828a49b4b520
https://hg.mozilla.org/mozilla-central/rev/2dcae3ff17c1
https://hg.mozilla.org/mozilla-central/rev/941d4125e9c3
https://hg.mozilla.org/mozilla-central/rev/275d082f51b3
Assignee | ||
Comment 48•4 years ago
|
||
I'm going to call this fixed. I filed bug 1643689 to track enabling manifest-scheduling on autoland.
Description
•