Closed Bug 1522111 Opened 6 years ago Closed 6 years ago

disable linux64/windows7/windows10 opt builds and tests on mozilla-inbound and autoland branches

Categories

(Testing :: General, defect, P3)


Tracking

(firefox67 fixed)

RESOLVED FIXED
mozilla67

People

(Reporter: jmaher, Assigned: Callek)

Attachments

(8 files, 1 obsolete file)


A small amount of work remains to complete this:

  1. adjust taskcluster to run opt for l64/w7/w10 only on mozilla-central and try, and mark those jobs as tier-2
  2. fix SETA so that the existing 'opt' jobs it would have run are run as 'pgo' instead
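
For illustration, the rule in item 1 could be sketched roughly as below. This is not the actual taskgraph code; the function names, platform strings and the default tier are made up for the example.

```python
# Hypothetical sketch of "opt only on mozilla-central and try, as tier-2".
# Platform and project names are illustrative, not real taskgraph values.
PGO_PLATFORMS = {"linux64", "windows7", "windows10"}
OPT_PROJECTS = {"mozilla-central", "try"}


def runs_on_project(platform, build_type, project):
    """Return True if a job of this platform/build type should run on project."""
    if build_type == "opt" and platform in PGO_PLATFORMS:
        # opt is redundant where pgo exists, so keep it off integration branches
        return project in OPT_PROJECTS
    return True


def tier_for(platform, build_type, default_tier=1):
    """Opt jobs on platforms that also have pgo are demoted to tier 2."""
    if build_type == "opt" and platform in PGO_PLATFORMS:
        return 2
    return default_tier
```

With these assumptions, `runs_on_project("linux64", "opt", "autoland")` is false while debug and pgo jobs are unaffected.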

I am not sure whether |./mach try| needs any adjustments. I would like to push for artifact builds to be pgo, but that is unrelated.

I also do not see manifests differentiating between opt and pgo; test skipping and expectations only distinguish debug from non-debug.

:ahal, can you think of other pieces to fix in order to make this a reality?

Flags: needinfo?(ahal)
Priority: -- → P3

Before doing this officially we should track some data:

  1. total hours (per day, per push), summed across all branches
  2. total jobs (per day, per push), summed across all branches
  3. total intermittents (per day), summed across all branches

Ideally we can look at 2 weeks before and after the change to get a good data point for what we are saving. This is good to advertise as a result of the work, and it is also useful for predicting what we might see from future changes (more job deletions or additions).

:catlee, are these metrics useful? Do we have this data available from your point of view?
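
As a rough idea of what metrics 1 and 2 would look like once job records are available, here is a minimal sketch. The record shape (the 'day', 'push' and 'hours' keys) is an assumption for the example and does not correspond to any existing data source; intermittents (metric 3) are not covered.

```python
from collections import defaultdict


def summarize(jobs):
    """Aggregate job records into per-day totals and per-push averages.

    Each record is a dict with hypothetical keys: 'day', 'push', 'hours'.
    """
    days = defaultdict(lambda: {"hours": 0.0, "jobs": 0, "pushes": set()})
    for job in jobs:
        entry = days[job["day"]]
        entry["hours"] += job["hours"]
        entry["jobs"] += 1
        entry["pushes"].add(job["push"])

    summary = {}
    for day, entry in days.items():
        pushes = len(entry["pushes"])  # always >= 1 for a day that has jobs
        summary[day] = {
            "hours": entry["hours"],
            "jobs": entry["jobs"],
            "hours_per_push": entry["hours"] / pushes,
            "jobs_per_push": entry["jobs"] / pushes,
        }
    return summary
```

Running this over the days before and after the change gives the two sets of numbers to compare.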

Flags: needinfo?(catlee)

++ for tracking. I suggest a shorter window, at most 7 days before and after, to reduce the effects of other factors.

No, I can't think of anything else. There won't be any changes to make in |mach try|, though I guess I could de-emphasize 'opt' in the |mach try chooser| UI. As I haven't advertised the chooser much, I don't think it's a big deal (currently it is responsible for 2% of pushes).

Flags: needinfo?(ahal)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)

We have data for 1) and 2) available for specific branches, but I'm not sure if we track that data across all branches. We don't currently track 3), but I think you have a better handle on intermittents than I do :)

Simon/Nick, any thoughts on this?

Flags: needinfo?(sfraser)
Flags: needinfo?(nthomas)
Flags: needinfo?(catlee)

(In reply to Chris AtLee [:catlee] from comment #4)

> (In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)
>
> > before doing this officially we should track some data:
> >
> >   1. total hours (per day, per push) in total for all branches
> >   2. total jobs (per day, per push) in total for all branches

Is a job a task, or a group of tasks to do a particular CI run/build/release?

> We have data for 1) and 2) available for specific branches, I'm not sure if we track that data across all branches. We don't track 3) currently, but I think you have a better handle on intermittents than I do :)
>
> Simon/Nick any thoughts on this?

We can also go into recent history for 1 and 2, to populate a new data source. Where would people best like to visualise the information?

Flags: needinfo?(sfraser)

disable opt builds when pgo exists for autoland/inbound and adjust seta to run those opt jobs on pgo.

Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8d7c099bbe0d disable opt builds/tests when we have pgo builds/tests for integration branches. r=ahal

I landed this and opt builds are still running on autoland :(
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=8d7c099bbe0dbb73c8b9eb3a356ffad3cdd0e723

:tomprince, do you have ideas for how to stop the opt builds from running on autoland/inbound?

Flags: needinfo?(mozilla)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Assignee: nobody → jmaher

The problem is that some tests are configured to run on autoland/inbound, and they depend on the builds, which forces the builds to run.
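
To make that concrete, here is a simplified sketch of how a dependency pulls a build back into the graph. It is not the real taskgraph implementation; the function name and the shape of the `dependencies` mapping are assumptions for the example.

```python
def with_dependencies(targets, dependencies):
    """Return the target tasks plus everything they transitively depend on.

    `dependencies` maps a task label to the labels it depends on
    (a hypothetical, simplified view of the task graph).
    """
    needed = set()
    stack = list(targets)
    while stack:
        label = stack.pop()
        if label in needed:
            continue
        needed.add(label)
        # A selected test drags in its build even if the build was not targeted.
        stack.extend(dependencies.get(label, ()))
    return needed
```

So removing the opt build from the target set is not enough as long as any targeted task still lists it as a dependency.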

I made an attempt to solve this for linux64: https://gist.github.com/catlee/0adc285e7a56eb0405c7d32e33a8a0a3

It appears to work according to |mach taskgraph target -p project=autoland|, but I'm not sure if it's correct for other branches.

It looks like the builds are getting pulled into the graph by various dependencies:
(generated via mach taskgraph target-graph --fast -p project=autoland -J | jq '.[]|select([.["dependencies"]|.[]]|contains(["build-signing-linux64/opt"]))|.label')

  • "l10n-linux64/opt"
  • "source-test-jsshell-bench-ares6-sm"
  • "source-test-jsshell-bench-octane-sm"
  • "source-test-jsshell-bench-sixspeed-sm"
  • "source-test-jsshell-bench-sunspider-sm"
  • "source-test-jsshell-bench-web-tooling-sm"
  • "source-test-python-mochitest-harness-linux64/opt"
  • "source-test-python-reftest-harness-linux64/opt"
  • "test-linux64-qr/opt-talos-chrome-e10s"
  • "test-linux64-qr/opt-talos-damp-e10s"
  • "test-linux64-qr/opt-talos-dromaeojs-e10s"
  • "test-linux64-qr/opt-talos-g1-e10s"
  • "test-linux64-qr/opt-talos-g3-e10s"
  • "test-linux64-qr/opt-talos-g4-e10s"
  • "test-linux64-qr/opt-talos-g5-e10s"
  • "test-linux64-qr/opt-talos-other-e10s"
  • "test-linux64-qr/opt-talos-perf-reftest-e10s"
  • "test-linux64-qr/opt-talos-perf-reftest-singletons-e10s"
  • "test-linux64-qr/opt-talos-svgr-e10s"
  • "test-linux64-qr/opt-talos-tp5o-e10s"
  • "test-linux64-qr/opt-talos-tp6-stylo-threads-e10s"
  • "test-linux64-qr/opt-talos-tps-e10s"
  • "test-linux64/opt-browser-screenshots-e10s"
  • "test-linux64/opt-talos-bcv-e10s"
  • "test-linux64/opt-talos-chrome-e10s"
  • "test-linux64/opt-talos-damp-e10s"
  • "test-linux64/opt-talos-dromaeojs-e10s"
  • "test-linux64/opt-talos-g1-e10s"
  • "test-linux64/opt-talos-g3-e10s"
  • "test-linux64/opt-talos-g4-e10s"
  • "test-linux64/opt-talos-g5-e10s"
  • "test-linux64/opt-talos-other-e10s"
  • "test-linux64/opt-talos-perf-reftest-e10s"
  • "test-linux64/opt-talos-perf-reftest-singletons-e10s"
  • "test-linux64/opt-talos-svgr-e10s"
  • "test-linux64/opt-talos-tp5o-e10s"
  • "test-linux64/opt-talos-tp6-stylo-threads-e10s"
  • "test-linux64/opt-talos-tps-e10s"
  • "test-linux64/opt-test-verify-e10s-1"
  • "test-linux64/opt-test-verify-e10s-2"
  • "test-linux64/opt-test-verify-e10s-3"
  • "test-linux64/opt-test-verify-gpu-e10s-1"
  • "test-linux64/opt-test-verify-gpu-e10s-2"
  • "test-linux64/opt-test-verify-gpu-e10s-3"
  • "test-linux64/opt-test-verify-wpt-e10s-1"
  • "test-linux64/opt-test-verify-wpt-e10s-2"
  • "test-linux64/opt-test-verify-wpt-e10s-3"

(and presumably similarly for other platforms)
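
For anyone who prefers Python over jq, roughly the same query can be sketched as below. It assumes the JSON output is an object keyed by label, where each task has a "label" and a "dependencies" entry (either a mapping or a list); that shape is inferred from the jq filter above, not checked against taskgraph. Unlike jq's contains, which does substring matching on strings, this compares labels exactly.

```python
import json


def dependents_of(graph, dep_label):
    """Return sorted labels of tasks that directly depend on dep_label."""
    found = []
    for task in graph.values():
        deps = task.get("dependencies", {})
        # Accept both {"edge-name": "label"} mappings and plain lists.
        values = deps.values() if isinstance(deps, dict) else deps
        if dep_label in values:
            found.append(task["label"])
    return sorted(found)


def dependents_from_json(text, dep_label):
    """Same as dependents_of, starting from the JSON text of the graph."""
    return dependents_of(json.loads(text), dep_label)
```

Swapping in the windows build-signing labels would produce the corresponding lists for the other platforms.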

Blocks: 1515432
No longer blocks: 1515432

:gbrown, I will not get to work on this bug this week. If you have cycles, I think what :catlee and :tomprince added will do the trick. I know you expressed an interest in this.

We need to keep in mind that the build mozilla-beta runs is labeled opt when it is really pgo.

Status: RESOLVED → REOPENED
Flags: needinfo?(nthomas)
Flags: needinfo?(mozilla)
Flags: needinfo?(gbrown)
Resolution: FIXED → ---

I'll see what I can do...

Assignee: jmaher → gbrown
Flags: needinfo?(gbrown)

:jmaher I think this disabled Raptor tests on Quantum Render platforms (QR tests seem to run only on opt builds).
I noticed this while verifying this alert.

Flags: needinfo?(jmaher)

good point :igoldan, I see that clearly when looking at a before/after:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&searchStr=raptor&tochange=63348118ef1d564a659f793c0ec9afe5d7f1cc8b&fromchange=d8cebb3b46cfd216ab60e58588e585f28750a5f3

Luckily we are still running on mozilla-central, but I think possibly we should back this out until we get a patch that does the right thing.

To that point, I see we do not run *-qr-pgo tests; should we be doing that? Let's see what :kats says.

Flags: needinfo?(jmaher) → needinfo?(kats)

It also appears that this patch broke SETA: in the hack to move opt jobs to pgo, we never returned the new label, which caused SETA to do nothing for the last week (hence the backlog and tree closures). Thanks Aryx for catching this.
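
The kind of mistake described here can be sketched as follows. This is a reconstruction for illustration only, not the code that landed; the function name, the platform list and the "-pgo/opt" label scheme are all assumptions.

```python
def opt_to_pgo(label):
    """Map an opt test label to its pgo counterpart (naming is assumed)."""
    for platform in ("linux64", "windows7-32", "windows10-64"):
        opt = "test-%s/opt" % platform
        if label.startswith(opt):
            label = "test-%s-pgo/opt" % platform + label[len(opt):]
            break
    # Returning the rewritten label is the step the comment says was missed;
    # a Python function with no return statement hands back None instead.
    return label
```

If the return is missing, every rewritten label comes back as None, so a lookup against the SETA data would presumably never match.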

This got backed out because it broke SETA optimization and the full set of jobs ran for every push on autoland and inbound:

https://hg.mozilla.org/mozilla-central/rev/22ca3a5f976fd0f11c96cd3f5d6b91e55fb9b06d

If we're turning off opt, then yeah we should move the QR tests to the pgo build instead. Let me know if you need me to write a patch for any part of that.

Flags: needinfo?(kats)
Attachment #9043790 - Attachment description: Bug 1522111 - (WIP) Make -qr tests depend on -pgo where applicable, and set the run-on-projects for various tests to be central/try when opt → Bug 1522111 - Make -qr tests depend on -pgo where applicable, leaving old -qr sets in place. r=gbrown

This avoids opt being pulled in even when l10n is optimized out

Depends on D19838

Assignee: gbrown → bugspam.Callek
Pushed by jwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6fbfc8bc2388 disable opt builds/tests when we have pgo builds/tests for integration branches. r=ahal
https://hg.mozilla.org/integration/autoland/rev/cc97d772a8db Make -qr tests depend on -pgo where applicable, leaving old -qr sets in place. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/c10783ea070b fix SETA to account for new tests, so we still filter them out even if SETA data doesn't include them. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/14e9ed41b8be Make l10n kind depend on -pgo where available instead of opt. r=tomprince
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aa6103c8ef0f Followup, ensure test sets align between opt and pgo. r=jmaher
Depends on: 1528351

adjust manifest for webrender tests that are now passing/failing as a result of running on PGO

Depends on: 1528371

Backed out 5 changesets (Bug 1522111) for breaking windows opt wpts

Backout push: https://hg.mozilla.org/integration/autoland/rev/6a145b0bf8e4a9e1a227b2327f6fbbbe948fbae8

Attachment #9043790 - Attachment description: Bug 1522111 - Make -qr tests depend on -pgo where applicable, leaving old -qr sets in place. r=gbrown → Bug 1522111 - Make -qr tests depend on -pgo where applicable, leaving old -qr sets in place. r=gbrown r=jmaher
Pushed by jwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d3c12d6fd14 disable opt builds/tests when we have pgo builds/tests for integration branches. r=ahal
https://hg.mozilla.org/integration/autoland/rev/85456690910d Make -qr tests depend on -pgo where applicable, leaving old -qr sets in place. r=gbrown,jmaher,kats
https://hg.mozilla.org/integration/autoland/rev/96f074966007 fix SETA to account for new tests, so we still filter them out even if SETA data doesn't include them. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/176bccfac2c7 Make l10n kind depend on -pgo where available instead of opt. r=tomprince
https://hg.mozilla.org/integration/autoland/rev/d3e2e32d61ea Followup, ensure test sets align between opt and pgo. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/e2c779112d08 Followup, Use better SETA transformation which reads high value list and applies opt high value tests to heuristics to prevent them from being optimized out on pgo. r=jmaher
Pushed by jwood@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f4bd1f216081 Followup, improve SETA algorithm a bit more by treating opt low value as low value for pgo as well. Unless there is a high value task to override. r=jmaher
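
Going only by the two followup commit messages, the SETA heuristic could be sketched along these lines. The function names and the "-pgo/opt" label rewrite are assumptions for the example; the landed code may well differ.

```python
def to_pgo(label):
    """Hypothetical label rewrite; the real naming scheme may differ."""
    return label.replace("/opt", "-pgo/opt", 1)


def pgo_low_value(opt_low_value, opt_high_value):
    """Derive the pgo low-value set from the opt SETA lists.

    An opt low-value task is treated as low value on pgo too, unless the
    matching task is in the high-value list, which overrides it.
    """
    high = {to_pgo(label) for label in opt_high_value}
    low = {to_pgo(label) for label in opt_low_value}
    return low - high
```

Tasks left in the returned set are the ones that may be optimized out on pgo; high-value tasks always stay in.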
Attachment #9044268 - Attachment is obsolete: true