do not schedule builds/tests for pushes that will not benefit
Categories
(Testing :: General, task, P2)
Tracking
(firefox76 fixed)
People
(Reporter: jmaher, Assigned: gbrown)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Whiteboard: [ci-costs-2020:done])
Attachments
(2 files)
There is a concept of DONTBUILD, and we need to set up the taskcluster job scheduler for this, as I find we run the default set of tasks for many pushes that don't need them.
- backouts - don't run anything
- doc-only changes (.md files) - only run doclint (although that should be at reviewbot)
- test manifest changes (disabling a test) - figure out a task to ensure the manifest parses correctly
- editing a test case - only run that test case [the job that contains it] on a single platform (linux64/debug)
- test harness changes - only run a single job of that test harness and/or its unittests
- taskcluster changes - only run the jobs that changed from the previous run
- other ideas?!?
Comment 1•5 years ago
Looks like doc changes already have this set up:
https://searchfox.org/mozilla-central/source/taskcluster/moz.build#17
So we just need to make sure all our docs dirs have this metadata applied. Maybe we should have a global rule for all docs:
    with Files('**/docs/**'):
        SCHEDULES.exclusive = ['docs']
This will catch new docs folders, though it assumes that every folder called docs is meant for Sphinx.
Comment 2•5 years ago (Reporter)
I looked at the last 10000 commits on autoland (sorted by bug) and found:
5890 unique bugs landed
702 backouts
1670 bugs had only test related changes (75 of those were backouts)
In total, 39% of commits would be an upper bound that we could create rules for; realistically, maybe 30%.
suggested rules:
- backout: DONTBUILD (or decision task only)
- wpt: run only changed tests (test-verify) - ideally on limited platforms like win10 pgo and linux64 pgo, run linters
- other test: run linters, sanity test on linux64 for each harness (mochitest/tests/harness_sanity, reftest-sanity, xpcshell-?)
This will interact oddly with some rules if we override SETA and it is time to run the full set of tests, but we hit a backout. As we have a ~30% chance of a non-browser test change, there is a good chance we will run into this often. Possibly the sheriffs should be involved here, to be aware of this and provide input.
Using the above rules, I think we could see a 25-28% reduction in load. Given that backfills/retriggers are needed and there are some edge cases we might not want to deal with, I will call this a 20% savings in our overall load, or 175K annual savings.
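As a sanity check on the arithmetic, the upper bound follows directly from the counts above (a quick sketch; the 30% "reality" figure is a judgment call, not computed):

```python
# Counts from the last 10000 autoland commits (comment 2).
unique_bugs = 5890
backouts = 702
test_only_bugs = 1670      # bugs with only test-related changes
overlap = 75               # test-only bugs that were also backouts

# Upper bound on bugs that scheduling rules could cover,
# avoiding double-counting the overlap.
coverable = backouts + test_only_bugs - overlap
print(f"{coverable} of {unique_bugs} bugs, "
      f"upper bound {coverable / unique_bugs:.0%}")
```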
Comment 3•5 years ago (Assignee)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)
> suggested rules:
> - backout: DONTBUILD (or decision task only)
That seems controversial.
:aryx - Do you think it would be okay if no builds or tests ran on backout pushes?
Comment 4•5 years ago (Assignee)
Some possible issues:
- If the backout introduced a failure (an inaccurate/incomplete/typo-ed backout), failures would not appear until (and might be blamed on!) the next push.
- The trees are closed, sheriffs push a backout, no tests run on the backout: Need to re-open the trees and land another push to determine if the backout resolved the failures that originally closed the tree. That seems inconvenient.
- If the backout was successful, we expect test results to be (more or less) the same as they were before the backed out patch originally landed; but to verify that, folks now need to compare the push after backout with the push before the backed out patch -- which may have a dozen or so pushes between them. (Visually inconvenient/confusing on treeherder?)
Comment 5•5 years ago (Reporter)
Good points, :gbrown. Is there a way to take a list of failures from the affected previous commits and run just those jobs (or a sample of those jobs)?
One thought I have is that Treeherder can determine what is marked as "fixed by commit" and then do an 'add new jobs' magic trick (this would need to be added to Treeherder). Of course, that isn't ideal, as it takes additional sheriff time and steps to complete the backout and verify it. Currently taskcluster queries Treeherder to get SETA data; maybe it can also query Treeherder to get a list of jobs to run as part of verification. If that fails, run the builds and a single job to ensure the browser runs.
I would be curious to hear what :aryx thinks about this.
Comment 6•5 years ago
When we first discussed the backout handling, the initial idea was to only run the failing jobs that caused the patch to be backed out in the first place. This should alleviate (or, actually, fix) the issues outlined in comment 4.
The implementation would be a bit more complex, as you would need to 1) parse the backout's commit message to figure out which patches were backed out; 2) find the pushes corresponding to those patches; 3) get the failures from those pushes; 4) schedule them in the backout push.
You would also miss cases like this:
- Patch A is pushed, all scheduled jobs are successful;
- Patch B is pushed, a job is failing but it's clearly Patch A's fault;
- A sheriff backs out Patch A.
In this case we'd need to actually schedule the failures from the second push, rather than the first.
One easy way to fix this is to 1) parse the backout's commit message to figure out which patches were backed out; 2) find all the pushes between the culprit patches' pushes and the backout push; 3) get the failures from those pushes; 4) schedule them in the backout push.
More expensive, but complete.
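That "complete" variant could be sketched roughly like this (all names and data structures here are hypothetical; a real implementation would query hg and Treeherder rather than in-memory lists):

```python
import re

def jobs_to_reschedule(backout_msg, pushes):
    """Return the failed jobs to re-run on a backout push.

    `pushes` is ordered oldest to newest, with the backout push last.
    Each push is a dict with 'changesets' (short hashes) and 'failed_jobs'.
    """
    # 1) Parse the backed-out changesets from the commit message.
    backed_out = set(re.findall(r"changeset ([0-9a-f]{12})", backout_msg))
    # 2) Find the earliest push containing a culprit changeset.
    start = next(i for i, p in enumerate(pushes)
                 if backed_out & set(p["changesets"]))
    # 3) Collect failures from every push from the culprit onward...
    jobs = set()
    for push in pushes[start:-1]:  # ...excluding the backout push itself.
        jobs |= set(push["failed_jobs"])
    # 4) The caller would then schedule `jobs` on the backout push.
    return jobs

# The scenario above: Patch A is green, Patch B surfaces A's failure,
# and the sheriff backs out A; the failure on B's push is still picked up.
pushes = [
    {"changesets": ["aaaaaaaaaaaa"], "failed_jobs": []},               # Patch A
    {"changesets": ["bbbbbbbbbbbb"], "failed_jobs": ["mochitest-1"]},  # Patch B
    {"changesets": ["cccccccccccc"], "failed_jobs": []},               # backout
]
print(jobs_to_reschedule("Backed out changeset aaaaaaaaaaaa for failures", pushes))
```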
Comment 7•5 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)
> I looked at the last 10000 commits on autoland (sorted by bug) and found:
> 5890 unique bugs landed
> 702 backouts
> 1670 bugs had only test related changes (75 of those were backouts)
804 bugs which got fixed in the last 10000 commits on autoland were part of the wpt-sync and represent 10 pushes (that omits follow-up commits, Treeherder's DB doesn't have information about files modified).
> suggested rules:
> - backout: DONTBUILD (or decision task only)
- If the backout lands directly after the push which gets backed out and the failing tasks are all build jobs, the backout's tasks should be canceled.
  - If only one platform is failing (e.g. Windows packaging, Android), the complete builds for Linux etc. are already sunk costs, and keeping the running tasks running lets them act as a Try push (that the build bustage didn't get caught earlier indicates a missing Try push, or one with few or the wrong tests).
- If the backout push is not directly after the backed-out push and has the full job set, it shall not be canceled, as it will provide coverage for the previous pushes.
- If the backout push is not directly after the backed-out push, and the backout was about build bustage, broken test suites, or mass failures, it shall not be canceled, because it provides test coverage.
- If the backout push is not directly after the backed-out push, the backout was about limited test failures, and the backout push doesn't have the full job set, it can be canceled/use DONTBUILD, as it won't provide new test coverage.
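These rules could be condensed into a small decision function (a sketch only; the parameter names are hypothetical, and the inputs would come from Treeherder in practice):

```python
def can_cancel_backout_tasks(directly_after: bool,
                             failures_all_builds: bool,
                             has_full_job_set: bool,
                             severe_breakage: bool) -> bool:
    """Condensed form of the rules above: True means the backout push's
    tasks can be canceled (or the push could use DONTBUILD)."""
    if directly_after:
        # Pure build bustage needs no re-run; anything else keeps
        # running so the surviving tasks act as a free Try push.
        return failures_all_builds
    if has_full_job_set:
        return False  # provides coverage for the intervening pushes
    if severe_breakage:
        return False  # build bustage / broken suites / mass failures
    return True       # limited test failures, partial job set

print(can_cancel_backout_tasks(True, True, False, False))   # True
print(can_cancel_backout_tasks(False, False, True, False))  # False
```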
Regarding the request to verify that a backout fixed an issue, this should be done for builds and would be welcome for tests (in the current situation, they don't always run on the backout push).
Comment 8•5 years ago (Assignee)
Another possible savings: consider https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2d123a680904f034dbf2d9319b929acc65f8d3a2
Only testing/web-platform files were modified and there is a SCHEDULES.exclusive for that case:
https://searchfox.org/mozilla-central/rev/42c2ecdc429115c32e6bcb78bf087a228a051044/testing/web-platform/moz.build#18
web-platform tests ran on that push, and no mochitest/reftest/etc - great! - but:
- spidermonkey tasks ran on most platforms;
- many build tasks ran, even ones where no wpt tests ran.
Why? Can we do better?
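For reference, the rule behind that behavior is a moz.build fragment along these lines (a sketch based on the linked file; the exact component name may differ):

```python
# testing/web-platform/moz.build (sketch)
with Files("**"):
    SCHEDULES.exclusive = ["web-platform-tests"]
```

SCHEDULES.exclusive is meant to restrict scheduling to the listed components when only matching files change, which makes the extra spidermonkey and build tasks on that push surprising.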
Comment 9•5 years ago (Assignee)
Another possible savings: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=89119ca63f83d6e661d136dd9e2fc5638ce320a3
Only a browser-chrome test was modified, but various test suites ran. Could we add a SCHEDULES.exclusive rule for files matching test/browser*.js -> browser-chrome? (Did we already try that once?)
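One hypothetical shape for such a rule (the pattern and component name below are invented for illustration, not existing mappings):

```python
# Sketch for a moz.build near the affected tests.
with Files("**/test/browser_*.js"):
    SCHEDULES.exclusive = ["mochitest-browser-chrome"]
```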
Comment 10•5 years ago (Assignee)
(In reply to Geoff Brown [:gbrown] from comment #9)
> (Did we already try that once?)
I was thinking of bug 1522113, but that was specifically for test-verify.
Comment 11•5 years ago (Assignee)
Add some SCHEDULES rules so that, when a push only modifies files known to be associated
with a particular test suite, only that test suite is run.
Comment 12•5 years ago
Comment 13•5 years ago
bugherder
Comment 14•5 years ago (Assignee)
Comment 15•5 years ago
Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660
Comment 16•5 years ago (Assignee)
(In reply to Calixte Denizet (:calixte) from comment #15)
> Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
> https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660
I don't think so? It wouldn't be possible to detect the cause of the backout reliably; we'd need to check for some magic phrase in the check-in message, like "for linting failures".
But maybe more importantly, I don't think a linting failure is necessarily a good reason to reduce testing. Even in the referenced example, it looks like something is still broken after the backout. Was that because the backout was faulty/incomplete? Or was there another push that broke tests before the tree was closed? Isn't it best to run tests and verify the state of the tree after the backout?
Comment 17•5 years ago (Assignee)
Use finer granularity for reftest/mochitest SCHEDULES.exclusive categories, so that reftest-plain does not run when only crashtests are modified, and vice versa; similarly, break up mochitest into mochitest/browser-chrome/chrome/a11y.
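The finer-grained split described here might look roughly like this in moz.build terms (illustrative patterns and names only; the actual patch defines the authoritative categories):

```python
# Sketch: separate exclusive categories per suite flavor.
with Files("**/crashtests/**"):
    SCHEDULES.exclusive = ["crashtest"]
with Files("**/reftests/**"):
    SCHEDULES.exclusive = ["reftest"]
with Files("**/browser_*.js"):
    SCHEDULES.exclusive = ["mochitest-browser-chrome"]
```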
Comment 18•5 years ago (Assignee)
There may be an additional opportunity for devtools once bug 1564431 is complete.
Comment 19•5 years ago (Reporter)
:gbrown, is there a need to review/land your patch?
Comment 20•5 years ago (Assignee)
My patch needs changes to proceed -- see review comments. I hope to get back to it soon.
Comment 21•5 years ago
NOTE: some of these might be baked into the ML model as hardcoded rules.
Comment 22•5 years ago (Reporter)
:gbrown, I know you are working on some other bugs, do you have an ETA when you think you will pick this bug up again?
Comment 23•5 years ago (Assignee)
Next week!
Comment 24•5 years ago
Comment 25•5 years ago (Assignee)
That's all the ideas I have for now. Let's pursue any other ideas in new bugs.
Comment 26•5 years ago
bugherder
Comment 27•5 years ago (Reporter)
Looking at the original list:
- backouts (not run anything)
- doc only changes - .md files (only run doclint - although that should be at reviewbot)
- test manifest changes (disable a test) - figure out a task to ensure it parses correctly
- editing a test case - only run that test case [job that contains it] on a single platform (linux64/debug)
- test harness - only run a single job of that test harness and/or unittests
- taskcluster change - only run the jobs that changed from the previous run
- other ideas?!?
I see we focused on:
- test manifest changes (disable a test) - figure out a task to ensure it parses correctly
What I am concerned with is that the changes look for *.ini files and run the appropriate subsuite. There are some extra rules for files changed in a directory, which would pick up many harness or other tooling changes.
If we already run test-verify for a test that is added or edited, then we need to deal with the test-disabled and harness/tooling cases. This bug handles a lot of that, but it really seems as though we are running a lot of tests for the disable-a-test case. The changes we have are perfect for harness/tooling changes (in place of a comprehensive test-harness test).
Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can be parsed correctly? I am happy to roll this into a new bug. Unfortunately I wasn't able to measure any cost reductions/increases as a result of this - I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok).
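A manifest-parse lint along those lines could be very small. The sketch below uses Python's standard configparser as a stand-in; real Mozilla manifests are read with manifestparser and have extended syntax, so this is illustrative only:

```python
import configparser

def check_manifest(text, name="<manifest>"):
    """Return None if the ini-style manifest parses, else an error string."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    try:
        parser.read_string(text, source=name)
    except configparser.Error as e:
        return f"{name}: {e}"
    return None

good = "[test_foo.html]\nskip-if = os == 'win'\n"
bad = "skip-if = true\n"  # option appears before any [section] header
print(check_manifest(good))  # None
print(check_manifest(bad))   # an error message
```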
Comment 28•5 years ago (Assignee)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #27)
> Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can be parsed correctly?
I don't think we can reasonably distinguish test disabling from enabling, or from other manifest changes. I think it all depends on whether people can adapt to and live with the testing differences. Consider:
- A test is enabled and starts failing on the next push - will sheriffs notice what has actually happened, or will they back out the first failing push?
- A test is disabled on the wrong platform, but no one will see that the test is still failing until it runs on the next non-SETA-optimized push.
> I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok)
Is that because the tasks chosen by SCHEDULES rules will not be SETA-optimized? I suppose now a browser.ini change will trigger (bc1 bc2 bc3 bc4 bc5) where SETA might have run (bc3 X2 R7).
Comment 29•5 years ago (Reporter)
Yeah, SETA would cherry-pick tasks and the configs to run them on, whereas now we run bc1-5 on 20 configs.