Closed Bug 1598746 Opened 5 years ago Closed 5 years ago

do not schedule builds/tests for pushes that will not benefit

Categories

(Testing :: General, task, P2)

Version: Version 3
Type: task

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla76
Tracking Status: firefox76 --- fixed

People

(Reporter: jmaher, Assigned: gbrown)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(2 files)

There is a concept of DONTBUILD, and we need to teach the taskcluster scheduler about it, as I find we run the default set of tasks for many pushes that don't need them (see the example after the list below).

  • backouts (don't run anything)
  • doc-only changes (.md files): only run doclint, although that should arguably live in reviewbot
  • test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly
  • editing a test case: only run that test case [the job that contains it] on a single platform (linux64/debug)
  • test harness changes: only run a single job of that test harness and/or its unit tests
  • taskcluster changes: only run the jobs whose definitions changed from the previous run
  • other ideas?!?
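For context, DONTBUILD is already a recognized commit-message annotation: when the tip commit of a push carries it, CI schedules essentially nothing for that push. An illustrative commit message (bug number and reviewer are made up):

Bug 999999 - Fix a typo in the scheduling docs; r=reviewer DONTBUILD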

Looks like doc changes already have this set up:
https://searchfox.org/mozilla-central/source/taskcluster/moz.build#17

So we just need to make sure all our docs dirs have this metadata applied. Maybe we should have a global rule for all docs:

# moz.build: any change under a docs/ directory schedules only 'docs' tasks.
with Files('**/docs/**'):
    SCHEDULES.exclusive = ['docs']

This will catch new docs folders, though it assumes that every folder called docs is meant for Sphinx.

Assignee: nobody → gbrown

I looked at the last 10000 commits on autoland (sorted by bug) and found:
5890 unique bugs landed
702 backouts
1670 bugs with only test-related changes (75 of those were backouts)

In total, 39% of commits is an upper bound on what we could write rules for; realistically it is probably closer to 30%.
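For the record, here is a rough sketch of how such an audit could be reproduced; the classification heuristics (the backout regex and the test-path patterns) are illustrative guesses, not the exact ones used:

# Classify the last 10000 autoland commits by description and files touched.
# Run inside a local autoland clone.
import re
import subprocess

log = subprocess.run(
    ["hg", "log", "-l", "10000", "-T", "{desc|firstline}\t{files}\n"],
    capture_output=True, text=True, check=True,
).stdout

backouts = test_only = 0
for line in log.splitlines():
    desc, _, files = line.partition("\t")
    if re.match(r"(?i)back(ed)?[ -]?out", desc):
        backouts += 1
    paths = files.split()
    if paths and all(re.search(r"(^testing/|/tests?/)", p) for p in paths):
        test_only += 1
print(f"backouts: {backouts}, test-only commits: {test_only}")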

suggested rules:

  • backout: DONTBUILD (or decision task only)
  • wpt: run only changed tests (test-verify) - ideally on limited platforms like win10 pgo and linux64 pgo, run linters
  • other test: run linters, sanity test on linux64 for each harness (mochitest/tests/harness_sanity, reftest-sanity, xpcshell-?)

This will interact oddly with SETA: if it is time to override SETA and run the full set of tests, but the push is a backout, the rules conflict. Since roughly 30% of pushes are non-browser test changes, there is a good chance we will often run into this. The sheriffs should probably be involved here, both to be aware of it and to provide input.

Using the above rules, I think we could see a 25-28% reduction in load. Given that backfills/retriggers are still needed, and there are some edge cases we might not want to deal with, I will call this a 20% savings in our overall load, or 175K in annual savings.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)

suggested rules:

  • backout: DONTBUILD (or decision task only)

That seems controversial.

:aryx - Do you think it would be okay if no builds or tests ran on backout pushes?

Flags: needinfo?(aryx.bugmail)

Some possible issues:

  1. If the backout introduced a failure (an inaccurate/incomplete/typo-ed backout), failures would not appear until (and might be blamed on!) the next push.
  2. The trees are closed, sheriffs push a backout, no tests run on the backout: Need to re-open the trees and land another push to determine if the backout resolved the failures that originally closed the tree. That seems inconvenient.
  3. If the backout was successful, we expect test results to be (more or less) the same as they were before the backed out patch originally landed; but to verify that, folks now need to compare the push after backout with the push before the backed out patch -- which may have a dozen or so pushes between them. (Visually inconvenient/confusing on treeherder?)

Good points, :gbrown. Is there a way to take the list of failures from the affected previous commits and run just those jobs (or a sample of those jobs)?

One thought I have is that treeherder can determine what is marked as "fixed by commit" and then do an 'add new jobs' magic trick (this would need to be added to treeherder). Of course that isn't ideal, as it takes additional sheriff time and steps to complete the backout and verify it. Currently taskcluster queries treeherder to get SETA data; maybe it could also query treeherder for a list of jobs to run as part of verification. If that fails, run the builds and a single job to ensure the browser runs.

I would be curious to hear what :aryx thinks about this.

When we first discussed the backout stuff, the initial idea was to only run the failing jobs that caused the patch to be backed out in the first place. This should alleviate (or, actually, fix) the issues outlined in comment 4.

The implementation would be a bit more complex, as you would need to 1) parse the backout's commit message to figure out which patches were backed out; 2) find the pushes corresponding to those patches; 3) get the failures from those pushes; 4) schedule them on the backout push.
You would also miss cases like this:

  1. Patch A is pushed, all scheduled jobs are successful;
  2. Patch B is pushed, a job is failing but it's clearly Patch A's fault;
  3. A sheriff backs out Patch A.
    In this case we'd need to actually schedule the failures from the second push, rather than the first.

One way to fix this: 1) parse the backout's commit message to figure out which patches were backed out; 2) find all the pushes between the culprit patches' pushes and the backout push; 3) get the failures from those pushes; 4) schedule them on the backout push.
More expensive, but complete.
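To make the shape of this concrete, here is a minimal sketch of the complete variant. Every helper and push attribute below is hypothetical, standing in for hg/Treeherder queries that would have to be written:

import re

# Hypothetical helpers -- none of these exist as named:
def find_push(rev): ...          # map a backed-out revision to its push
def pushes_between(a, b): ...    # all pushes from a up to (not including) b
def failed_tasks(push): ...      # failed task labels on a push
def schedule(tasks, push): ...   # add the given tasks to a push

BACKOUT_RE = re.compile(r"backed out changeset ([0-9a-f]{12})", re.I)

def tasks_for_backout(backout_push):
    # 1) parse the backout commit messages for the backed-out revisions
    revs = [rev for msg in backout_push.commit_messages
            for rev in BACKOUT_RE.findall(msg)]
    if not revs:
        return None  # not a recognizable backout; schedule normally

    # 2) take every push from the earliest culprit up to the backout,
    #    so misblamed failures (the Patch A/B case above) are caught
    earliest = min((find_push(r) for r in revs), key=lambda p: p.id)
    affected = pushes_between(earliest, backout_push)

    # 3) collect the failures from those pushes, and
    # 4) schedule them on the backout push
    to_run = {t for push in affected for t in failed_tasks(push)}
    schedule(to_run, backout_push)
    return to_run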

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)

I looked at the last 10000 commits on autoland (sorted by bug) and found:
5890 unique bugs landed
702 backouts
1670 bugs with only test-related changes (75 of those were backouts)

804 of the bugs fixed in the last 10000 commits on autoland were part of the wpt-sync and represent 10 pushes (this omits follow-up commits; Treeherder's DB doesn't have information about which files were modified).

suggested rules:

  • backout: DONTBUILD (or decision task only)
  1. If the backout lands directly after the push which gets backed out and the failing tasks are all build jobs, the tasks on the backout push should be canceled.
  2. If only one platform is failing (e.g. Windows packaging, Android), the completed builds for Linux etc. are already sunk costs, and keeping the running tasks running lets them act as a Try push (that the build bustage didn't get caught earlier indicates a missing Try push, or one with few or the wrong tests).
  3. If the backout push is not directly after the backed-out push and has the full job set, it shall not be canceled, as it will provide coverage for the previous pushes.
  4. If the backout push is not directly after the backed-out push, and the backed-out push was about build bustage, broke test suites, or caused mass failures, it shall not be canceled, because it provides test coverage.
  5. If the backout push is not directly after the backed-out push, the backed-out push was about limited test failures, and the backout push doesn't have the full job set, it can be canceled / use DONTBUILD, as it won't provide new test coverage. (A sketch of these rules as code follows below.)
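Sketching those five rules as code may make the policy easier to scan; all the push/task attributes below are hypothetical:

def should_cancel_backout_tasks(backout, backed_out):
    """Rules 1-5 above as a decision function (sketch only)."""
    directly_after = backout.previous_push == backed_out

    if directly_after:
        # Rule 1: pure build bustage right behind the culprit -> cancel.
        if all(t.is_build for t in backed_out.failed_tasks):
            return True
        # Rule 2: only one platform busted; finished builds are sunk
        # costs, so keep the remaining tasks as a de facto Try push.
        return False

    # Rules 3/4: a later backout with the full job set, or one covering
    # bustage / broken suites / mass failures, provides coverage for
    # the intervening pushes -> keep it running.
    if backout.has_full_job_set or backed_out.caused_mass_failures:
        return False

    # Rule 5: a later backout of limited test failures without the full
    # job set adds no new coverage -> cancel / DONTBUILD.
    return True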

Regarding the request to verify that a backout fixed an issue: this should be done for builds, and would be welcome for tests (currently, they don't always run on the backout push).

Flags: needinfo?(aryx.bugmail)
Priority: -- → P2

Another possible savings: consider https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2d123a680904f034dbf2d9319b929acc65f8d3a2

Only testing/web-platform files were modified and there is a SCHEDULES.exclusive for that case:
https://searchfox.org/mozilla-central/rev/42c2ecdc429115c32e6bcb78bf087a228a051044/testing/web-platform/moz.build#18

web-platform tests ran on that push, and no mochitest/reftest/etc - great! - but:

  • spidermonkey tasks ran on most platforms;
  • many build tasks ran, even ones where no wpt tests ran.
    Why? Can we do better?

Another possible savings: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=89119ca63f83d6e661d136dd9e2fc5638ce320a3

Only a browser-chrome test was modified, but various test suites ran. Could we add a .exclusive for files test/browser*.js -> browser-chrome? (Did we already try that once?)
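If we did, the rule might look something like this in a moz.build file; the pattern and the component name are illustrative (the valid component names are defined centrally in mozbuild's schedules module):

# Hypothetical: schedule only browser-chrome when only its tests change.
with Files('**/test*/browser_*.js'):
    SCHEDULES.exclusive = ['mochitest-browser-chrome']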

(In reply to Geoff Brown [:gbrown] from comment #9)

(Did we already try that once?)

I was thinking of bug 1522113, but that was specifically for test-verify.

Add some SCHEDULES rules so that, when a push only modifies files known to be associated
with a particular test suite, only that test suite is run.

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3bdb8c990a49 Add some SCHEDULES optimizations for changes to test files; r=ahal
Keywords: leave-open

Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660

(In reply to Calixte Denizet (:calixte) from comment #15)

Would it be possible to avoid running tasks for a backout caused by linting issues? For example:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=15dfb02919db9c2b212a2f72eaafca0a19eb4660

I don't think so? It wouldn't be possible to detect the cause of the backout reliably; we'd need to check for some magic phrase in the check-in message, like "for linting failures".

But maybe more importantly, I don't think a linting failure is necessarily a good reason to reduce testing. Even in the referenced example, it looks like something is still broken after the backout. Was that because the backout was faulty/incomplete? Or was there another push that broke tests before the tree was closed? Isn't it best to run tests and verify the state of the tree after the backout?

Use finer granularity for reftest/mochitest SCHEDULES.exclusive categories,
so that reftest-plain does not run when only crashtests are modified, and
vice versa; similarly, break up mochitest into mochitest/browser-chrome/chrome/
a11y.
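A sketch of what that finer split could look like; patterns and component names here are illustrative, not the exact landed patch:

# Hypothetical: crashtest-only changes stop scheduling reftest-plain,
# and vice versa.
with Files('**/crashtests/**'):
    SCHEDULES.exclusive = ['crashtest']
with Files('**/reftests/**'):
    SCHEDULES.exclusive = ['reftest']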

There may be an additional opportunity for devtools once bug 1564431 is complete.

:gbrown, is there a need to review/land your patch?

Flags: needinfo?(gbrown)
Whiteboard: [ci-costs-2020:todo]

My patch needs changes to proceed -- see review comments. I hope to get back to it soon.

Flags: needinfo?(gbrown)

NOTE: some of these might be baked into the ML model as hardcoded rules.

:gbrown, I know you are working on some other bugs, do you have an ETA when you think you will pick this bug up again?

Next week!

Attachment #9121195 - Attachment description: Bug 1598746 - Adjust SCHEDULES.exclusive categories; r= → Bug 1598746 - Adjust some mochitest and reftest SCHEDULES.exclusive entries; r=
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb0786104e9e Adjust some mochitest and reftest SCHEDULES.exclusive entries; r=ahal

That's all the ideas I have for now. Let's pursue any other ideas in new bugs.

Keywords: leave-open
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]

Looking at the original list:

  • backouts (don't run anything)
  • doc-only changes (.md files): only run doclint, although that should arguably live in reviewbot
  • test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly
  • editing a test case: only run that test case [the job that contains it] on a single platform (linux64/debug)
  • test harness changes: only run a single job of that test harness and/or its unit tests
  • taskcluster changes: only run the jobs whose definitions changed from the previous run
  • other ideas?!?

I see we focused on:
test manifest changes (e.g. disabling a test): figure out a task that ensures the manifest still parses correctly

What concerns me is that the changes look for *.ini files and run the appropriate subsuite. There are also some extra rules for files changed in a directory, which would pick up many harness or other tooling changes.

If we already run test-verify for a test that is added or edited, then we still need to deal with the test-disabled and harness/tooling cases. This bug handles a lot of that, but it really seems as though we are running a lot of tests for the disable-a-test case. The changes we have are perfect for harness/tooling changes (in place of a comprehensive test harness test suite).

Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can still be parsed correctly? I am happy to roll this into a new bug. Unfortunately I wasn't able to measure any cost reductions/increases as a result of this; I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok).
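For what it's worth, such a lint job could be tiny; here is a sketch using the in-tree manifestparser package (treat the exact constructor arguments as an assumption):

# Hypothetical lint task: ensure changed test manifests still parse.
import sys
from manifestparser import TestManifest  # in-tree package

def lint_manifests(paths):
    ok = True
    for path in paths:
        try:
            # strict=True should turn parse problems into exceptions
            # (exact keyword behavior is an assumption).
            TestManifest(manifests=(path,), strict=True)
        except Exception as e:
            print(f"{path}: {e}", file=sys.stderr)
            ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if lint_manifests(sys.argv[1:]) else 1)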

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #27)

Could we have a simpler rule for test disabling? One where we just run a manifest lint job to ensure all the manifests can still be parsed correctly?

I don't think we can reasonably distinguish test disabling from enabling, or from other manifest changes. I think it all depends on whether people can adapt to and live with the testing differences. Consider:

  • test is enabled and starts failing on the next push -- will sheriffs notice what has actually happened or will they back out the first failing push?
  • test is disabled on the wrong platform, but no one will see that the test is still failing until it runs on the next non-SETA-optimized push.

I think in some cases we spent more and in a few cases we saved, but overall it was a wash (which is ok).

Is that because the tasks chosen by SCHEDULES rules will not be SETA-optimized? I suppose now a browser.ini change will trigger (bc1 bc2 bc3 bc4 bc5) where SETA might have run (bc3 X2 R7).

Yeah, SETA would cherry-pick tasks and the configs to run them on, whereas now we run bc1-5 on 20 configs.
