Closed Bug 1598746 Opened 5 years ago Closed 5 years ago

do not schedule builds/tests for pushes that will not benefit

Categories

(Testing :: General, task, P2)

Version: Version 3
Type: task

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla76
Tracking Status: firefox76 --- fixed

People

(Reporter: jmaher, Assigned: gbrown)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(2 files)

There is a concept of DONTBUILD, and we need to teach the taskcluster scheduler about it, as I find we run the default set of tasks for many pushes that don't need them (see the example after the list below).

  • backouts (don't run anything)
  • doc-only changes (.md files): only run doclint, although that should arguably live in reviewbot
  • test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly
  • editing a test case: only run that test case [the job that contains it] on a single platform (linux64/debug)
  • test harness changes: only run a single job of that test harness and/or its unit tests
  • taskcluster changes: only run the jobs whose definitions changed from the previous run
  • other ideas?!?
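For context, DONTBUILD is already a recognized commit-message annotation: when the tip commit of a push carries it, CI schedules essentially nothing for that push. An illustrative commit message (bug number and reviewer are made up):

Bug 999999 - Fix a typo in the scheduling docs; r=reviewer DONTBUILD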

Looks like doc changes already have this set up:
https://searchfox.org/mozilla-central/source/taskcluster/moz.build#17

So we just need to make sure all our docs dirs have this metadata applied. Maybe we should have a global rule for all docs:

# moz.build: any change under a docs/ directory schedules only 'docs' tasks.
with Files('**/docs/**'):
    SCHEDULES.exclusive = ['docs']

This will catch new docs folders, though it assumes that every folder called docs is meant for Sphinx.

Assignee: nobody → gbrown

I looked at the last 10000 commits on autoland (sorted by bug) and found:
5890 unique bugs landed
702 backouts
1670 bugs with only test-related changes (75 of those were backouts)

In total, 39% of commits is an upper bound on what we could write rules for; realistically it is probably closer to 30%.
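For the record, here is a rough sketch of how such an audit could be reproduced; the classification heuristics (the backout regex and the test-path patterns) are illustrative guesses, not the exact ones used:

# Classify the last 10000 autoland commits by description and files touched.
# Run inside a local autoland clone.
import re
import subprocess

log = subprocess.run(
    ["hg", "log", "-l", "10000", "-T", "{desc|firstline}\t{files}\n"],
    capture_output=True, text=True, check=True,
).stdout

backouts = test_only = 0
for line in log.splitlines():
    desc, _, files = line.partition("\t")
    if re.match(r"(?i)back(ed)?[ -]?out", desc):
        backouts += 1
    paths = files.split()
    if paths and all(re.search(r"(^testing/|/tests?/)", p) for p in paths):
        test_only += 1
print(f"backouts: {backouts}, test-only commits: {test_only}")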

suggested rules:

  • backout: DONTBUILD (or decision task only)
  • wpt: run only changed tests (test-verify) - ideally on limited platforms like win10 pgo and linux64 pgo, run linters
  • other test: run linters, sanity test on linux64 for each harness (mochitest/tests/harness_sanity, reftest-sanity, xpcshell-?)

This will interact oddly with SETA: if it is time to override SETA and run the full set of tests, but the push is a backout, the rules conflict. Since roughly 30% of pushes are non-browser test changes, there is a good chance we will often run into this. The sheriffs should probably be involved here, both to be aware of it and to provide input.

Using the above rules, I think we could see a 25-28% reduction in load. Given that backfills/retriggers are still needed, and there are some edge cases we might not want to deal with, I will call this a 20% savings in our overall load, or 175K in annual savings.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)

suggested rules:

  • backout: DONTBUILD (or decision task only)

That seems controversial.

:aryx - Do you think it would be okay if no builds or tests ran on backout pushes?

Flags: needinfo?(aryx.bugmail)

Some possible issues:

  1. If the backout introduced a failure (an inaccurate/incomplete/typo-ed backout), failures would not appear until (and might be blamed on!) the next push.
  2. The trees are closed, sheriffs push a backout, no tests run on the backout: Need to re-open the trees and land another push to determine if the backout resolved the failures that originally closed the tree. That seems inconvenient.
  3. If the backout was successful, we expect test results to be (more or less) the same as they were before the backed out patch originally landed; but to verify that, folks now need to compare the push after backout with the push before the backed out patch -- which may have a dozen or so pushes between them. (Visually inconvenient/confusing on treeherder?)

Good points, :gbrown. Is there a way to take the list of failures from the affected previous commits and run just those jobs (or a sample of those jobs)?

One thought I have is that treeherder can determine what is marked as "fixed by commit" and then do an 'add new jobs' magic trick (this would need to be added to treeherder). Of course that isn't ideal, as it takes additional sheriff time and steps to complete the backout and verify it. Currently taskcluster queries treeherder to get SETA data; maybe it could also query treeherder for a list of jobs to run as part of verification. If that fails, run the builds and a single job to ensure the browser runs.

I would be curious to hear what :aryx thinks about this.

When we first discussed the backout stuff, the initial idea was to only run the failing jobs that caused the patch to be backed out in the first place. This should alleviate (or, actually, fix) the issues outlined in comment 4.

The implementation would be a bit more complex, as you would need to 1) parse the backout's commit message to figure out which patches were backed out; 2) find the pushes corresponding to those patches; 3) get the failures from those pushes; 4) schedule them on the backout push.
You would also miss cases like this:

  1. Patch A is pushed, all scheduled jobs are successful;
  2. Patch B is pushed, a job is failing but it's clearly Patch A's fault;
  3. A sheriff backs out Patch A.
    In this case we'd need to actually schedule the failures from the second push, rather than the first.

One way to fix this: 1) parse the backout's commit message to figure out which patches were backed out; 2) find all the pushes between the culprit patches' pushes and the backout push; 3) get the failures from those pushes; 4) schedule them on the backout push.
More expensive, but complete.
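To make the shape of this concrete, here is a minimal sketch of the complete variant. Every helper and push attribute below is hypothetical, standing in for hg/Treeherder queries that would have to be written:

import re

# Hypothetical helpers -- none of these exist as named:
def find_push(rev): ...          # map a backed-out revision to its push
def pushes_between(a, b): ...    # all pushes from a up to (not including) b
def failed_tasks(push): ...      # failed task labels on a push
def schedule(tasks, push): ...   # add the given tasks to a push

BACKOUT_RE = re.compile(r"backed out changeset ([0-9a-f]{12})", re.I)

def tasks_for_backout(backout_push):
    # 1) parse the backout commit messages for the backed-out revisions
    revs = [rev for msg in backout_push.commit_messages
            for rev in BACKOUT_RE.findall(msg)]
    if not revs:
        return None  # not a recognizable backout; schedule normally

    # 2) take every push from the earliest culprit up to the backout,
    #    so misblamed failures (the Patch A/B case above) are caught
    earliest = min((find_push(r) for r in revs), key=lambda p: p.id)
    affected = pushes_between(earliest, backout_push)

    # 3) collect the failures from those pushes, and
    # 4) schedule them on the backout push
    to_run = {t for push in affected for t in failed_tasks(push)}
    schedule(to_run, backout_push)
    return to_run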

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)

I looked at the last 10000 commits on autoland (sorted by bug) and found:
5890 unique bugs landed
702 backouts
1670 bugs with only test-related changes (75 of those were backouts)

804 of the bugs fixed in the last 10000 commits on autoland were part of the wpt-sync and represent 10 pushes (this omits follow-up commits; Treeherder's DB doesn't have information about which files were modified).

suggested rules:

  • backout: DONTBUILD (or decision task only)
  1. If the backout lands directly after the push which gets backed out and the failing tasks are all build jobs, the tasks on the backout push should be canceled.
  2. If only one platform is failing (e.g. Windows packaging, Android), the completed builds for Linux etc. are already sunk costs, and keeping the running tasks running lets them act as a Try push (that the build bustage didn't get caught earlier indicates a missing Try push, or one with few or the wrong tests).
  3. If the backout push is not directly after the backed-out push and has the full job set, it shall not be canceled, as it will provide coverage for the previous pushes.
  4. If the backout push is not directly after the backed-out push, and the backed-out push was about build bustage, broke test suites, or caused mass failures, it shall not be canceled, because it provides test coverage.
  5. If the backout push is not directly after the backed-out push, the backed-out push was about limited test failures, and the backout push doesn't have the full job set, it can be canceled / use DONTBUILD, as it won't provide new test coverage. (A sketch of these rules as code follows below.)
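Sketching those five rules as code may make the policy easier to scan; all the push/task attributes below are hypothetical:

def should_cancel_backout_tasks(backout, backed_out):
    """Rules 1-5 above as a decision function (sketch only)."""
    directly_after = backout.previous_push == backed_out

    if directly_after:
        # Rule 1: pure build bustage right behind the culprit -> cancel.
        if all(t.is_build for t in backed_out.failed_tasks):
            return True
        # Rule 2: only one platform busted; finished builds are sunk
        # costs, so keep the remaining tasks as a de facto Try push.
        return False

    # Rules 3/4: a later backout with the full job set, or one covering
    # bustage / broken suites / mass failures, provides coverage for
    # the intervening pushes -> keep it running.
    if backout.has_full_job_set or backed_out.caused_mass_failures:
        return False

    # Rule 5: a later backout of limited test failures without the full
    # job set adds no new coverage -> cancel / DONTBUILD.
    return True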

Regarding the request to verify that a backout fixed an issue: this should be done for builds, and would be welcome for tests (currently, they don't always run on the backout push).

Flags: needinfo?(aryx.bugmail)
Priority: -- → P2

Another possible savings: consider https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2d123a680904f034dbf2d9319b929acc65f8d3a2

Only testing/web-platform files were modified and there is a SCHEDULES.exclusive for that case:
https://searchfox.org/mozilla-central/rev/42c2ecdc429115c32e6bcb78bf087a228a051044/testing/web-platform/moz.build#18

web-platform tests ran on that push, and no mochitest/reftest/etc - great! - but:

  • spidermonkey tasks ran on most platforms;
  • many build tasks ran, even ones where no wpt tests ran.
    Why? Can we do better?

Another possible savings: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=89119ca63f83d6e661d136dd9e2fc5638ce320a3

Only a browser-chrome test was modified, but various test suites ran. Could we add a .exclusive for files test/browser*.js -> browser-chrome? (Did we already try that once?)
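If we did, the rule might look something like this in a moz.build file; the pattern and the component name are illustrative (the valid component names are defined centrally in mozbuild's schedules module):

# Hypothetical: schedule only browser-chrome when only its tests change.
with Files('**/test*/browser_*.js'):
    SCHEDULES.exclusive = ['mochitest-browser-chrome']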

(In reply to Geoff Brown [:gbrown] from comment #9)

(Did we already try that once?)

I was thinking of bug 1522113, but that was specifically for test-verify.

Add some SCHEDULES rules so that, when a push only modifies files known to be associated
with a particular test suite, only that test suite is run.

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3bdb8c990a49 Add some SCHEDULES optimizations for changes to test files; r=ahal
Keywords: leave-open

Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660

(In reply to Calixte Denizet (:calixte) from comment #15)

Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660

I don't think so? It wouldn't be possible to detect the cause of the backout reliably; we'd need to check for some magic phrase in the check-in message, like "for linting failures".

But maybe more importantly, I don't think a linting failure is necessarily a good reason to reduce testing. Even in the referenced example, it looks like something is still broken after the backout. Was that because the backout was faulty/incomplete? Or was there another push that broke tests before the tree was closed? Isn't it best to run tests and verify the state of the tree after the backout?

Use finer granularity for reftest/mochitest SCHEDULES.exclusive categories,
so that reftest-plain does not run when only crashtests are modified, and
vice versa; similarly, break up mochitest into mochitest/browser-chrome/chrome/
a11y.
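A sketch of what that finer split could look like; patterns and component names here are illustrative, not the exact landed patch:

# Hypothetical: crashtest-only changes stop scheduling reftest-plain,
# and vice versa.
with Files('**/crashtests/**'):
    SCHEDULES.exclusive = ['crashtest']
with Files('**/reftests/**'):
    SCHEDULES.exclusive = ['reftest']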

There may be an additional opportunity for devtools once bug 1564431 is complete.

:gbrown, is there a need to review/land your patch?

Flags: needinfo?(gbrown)
Whiteboard: [ci-costs-2020:todo]

My patch needs changes to proceed -- see review comments. I hope to get back to it soon.

Flags: needinfo?(gbrown)

NOTE: some of these might be baked into the ML model as hardcoded rules.

:gbrown, I know you are working on some other bugs, do you have an ETA when you think you will pick this bug up again?

Next week!

Attachment #9121195 - Attachment description: Bug 1598746 - Adjust SCHEDULES.exclusive categories; r= → Bug 1598746 - Adjust some mochitest and reftest SCHEDULES.exclusive entries; r=
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb0786104e9e Adjust some mochitest and reftest SCHEDULES.exclusive entries; r=ahal

That's all the ideas I have for now. Let's pursue any other ideas in new bugs.

Keywords: leave-open
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]

Looking at the original list:

  • backouts (don't run anything)
  • doc-only changes (.md files): only run doclint, although that should arguably live in reviewbot
  • test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly
  • editing a test case: only run that test case [the job that contains it] on a single platform (linux64/debug)
  • test harness changes: only run a single job of that test harness and/or its unit tests
  • taskcluster changes: only run the jobs whose definitions changed from the previous run
  • other ideas?!?

I see we focused on:
test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly

What concerns me is that the changes look for *.ini files and run the appropriate subsuite. There are also some extra rules for files changed in a directory, which would pick up many harness or other tooling changes.

If we already run test-verify for a test that is added or edited, then we still need to deal with the test-disabled and harness/tooling cases. This bug handles a lot of that, but it really seems as though we are running a lot of tests for the disable-a-test case. The changes we have are perfect for harness/tooling changes (in place of a comprehensive test harness test suite).

Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can still be parsed correctly? I am happy to roll this into a new bug. Unfortunately I wasn't able to measure any cost reductions/increases as a result of this; I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok).
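For what it's worth, such a lint job could be tiny; here is a sketch using the in-tree manifestparser package (treat the exact constructor arguments as an assumption):

# Hypothetical lint task: ensure changed test manifests still parse.
import sys
from manifestparser import TestManifest  # in-tree package

def lint_manifests(paths):
    ok = True
    for path in paths:
        try:
            # strict=True should turn parse problems into exceptions
            # (exact keyword behavior is an assumption).
            TestManifest(manifests=(path,), strict=True)
        except Exception as e:
            print(f"{path}: {e}", file=sys.stderr)
            ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if lint_manifests(sys.argv[1:]) else 1)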

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #27)

Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can still be parsed correctly?

I don't think we can reasonably distinguish test disabling from enabling, or from other manifest changes. I think it all depends on whether people can adapt to and live with the testing differences. Consider:

  • test is enabled and starts failing on the next push -- will sheriffs notice what has actually happened or will they back out the first failing push?
  • test is disabled on the wrong platform, but no one will see that the test is still failing until it runs on the next non-SETA-optimized push.

I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok).

Is that because the tasks chosen by SCHEDULES rules will not be SETA-optimized? I suppose now a browser.ini change will trigger (bc1 bc2 bc3 bc4 bc5) where SETA might have run (bc3 X2 R7).

Yeah, SETA would cherry-pick tasks and the configs to run them on, whereas now we run bc1-5 on 20 configs.
