1390700 - Use sparse checkouts for decision task

Assignee

Description

•

7 years ago

Mercurial 4.3 has experimental support for sparse checkouts. These are working directories that contain a subset of all the files in a revision. Use of sparse checkouts means `hg update` operations only need to touch a subset of files, which means they should be much faster. Widespread use of sparse checkouts has the potential to drastically decrease time spent in VCS operations. It can take minutes to create >100k files in the working directory. And for most operations, only a subset of files in the working directory are relevant, so creating extra files is just wasteful. Sparse checkouts should unblock us from using VCS checkouts for running tests. We could never run WPT from checkouts (bug 1286900) because of perf issues creating the checkout. Sparse checkouts will enable us to only manifest the files we need, which should make using version control from tests feasible. In this bug, I want to get the decision task using sparse checkouts. Why the decision task? It should be pretty well isolated. It uses Linux (which is easy for in-tree hacking). And it's in the critical path for running things, so any speed improvement has an immediate benefit. Off the top of my head, this will involve: * New Docker image with Mercurial 4.3 (decision task is special and image needs manual upload) * In-tree definition of sparse profiles for Mercurial (not sure where to put these yet) * Robustcheckout awareness for sparse checkout, how to widen/narrow checkout as appropriate * run-task awareness for which sparse profile to use * New TC caches for sparse-aware repos (repos with sparse enabled have a new requirement that will lock out older Mercurial clients) * Mechanism in taskgraph YAML for specifying sparse profile to use

Dustin J. Mitchell [:dustin] (he/him)

Comment 1

•

7 years ago

What would that profile look like? The decision task is very close to using moz.build, which I think would need the entire tree..

Gregory Szorc [:gps]

Assignee

Comment 2

•

7 years ago

The Mercurial sparse profiles essentially define patterns of files to include or exclude. It can support things like **/moz.build if we need all moz.build files, for example. With Mercurial 4.3, run `hg help -e sparse` for more info. And stating it for the record, sparse checkouts are only a medium term solution for scaling. Infinite scalability means working directories backed by a virtual filesystem that realizes file content on first access. That way, you can access any file at any time without having to worry about tracking it in a profile. However, that world effectively reverts distributed version control to something more centralized and/or requires virtual filesystem support, which we can't easily get in TaskCluster now. Plus Mercurial's virtual filesystem working directory support is still manifesting. There's also a non-complete clone involved in there somewhere.

Gregory Szorc [:gps]

Assignee

Comment 3

•

7 years ago

I have a working proof-of-concept at https://treeherder.mozilla.org/#/jobs?repo=try&revision=952e0746f3e2e70be0dc0202cdda561711b0a071. Example sparse profile is https://hg.mozilla.org/try/file/401b0507166257b9ee8f18603144a6d58c1c3b76/build/sparse-profiles/taskgraph. That manifested 3553 files in the working directory instead of 234137, saving a few dozen seconds on fresh checkout. The patches are super hacky. Don't judge me on their quality :) There are a few "random" files being opened by taskgraph that I was somewhat surprised about. Also, with sparse checkouts, we'll have to be extra cautious of anything doing a filesystem walk or VCS query to discover files. e.g. if we assemble a manifest of **/moz.build files and sparse checkouts are in play, the manifest content varies depending on the sparse checkout configuration. This could come into play for things like computing the Docker image build context archive hashes (which taskgraph does).

Gregory Szorc [:gps]

Assignee

Updated

•

7 years ago

Depends on: 1391394

Gregory Szorc [:gps]

Assignee

Updated

•

7 years ago

Depends on: 1392886

Gregory Szorc [:gps]

Assignee

Updated

•

7 years ago

Assignee: nobody → gps

Status: NEW → ASSIGNED

Gregory Szorc [:gps]

Assignee

Comment 4

•

7 years ago

I'm almost ready to submit reviews for this. I punted most of the prep work to bug 1392886. https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c01f4e1fae19e9e01f350171fad89f292cc185b has some tasks on it. Experimental sparse profiles for flake8, wptlint, and sphinx docs. https://public-artifacts.taskcluster.net/Bv8T1qveSvadVYP1eWXe_w/0/public/logs/live_backing.log shows us going from a wptlint checkout to a flake8 checkout. It takes 15s to perform ~70k file deletes+adds. (Anything involving WPT is about the worst case because of its absurd file count.) https://public-artifacts.taskcluster.net/aLvwHYqAQ0aQ6GONLknG0A/0/public/logs/live_backing.log is more interesting. That appears to be a flake8 checkout transitioning to a docs checkout. That takes ~4s to perform ~1600 file modifications. The real win from these tasks is not doing a full checkout. We're seeing the initial checkout after clone complete in 2-3s instead of upwards of a minute. Since TC's scheduling doesn't have cache affinity, checkouts tend to be more common than you'd think and minimizing the time spent doing checkouts translates to nice wall-time wins.

(not currently active) Ted Mielczarek

Comment 5

•

7 years ago

This would probably be a nice win for my recently-added upload-generated-sources tasks as well: https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/upload-generated-sources/kind.yml They're doing a full checkout but they only really need some Python bits.

Mike Hommey [:glandium]

Comment 6

•

7 years ago

Most toolchain jobs have the same property: they do a full checkout for a very small number of files they need.

Comment hidden (mozreview-request)

Dustin J. Mitchell [:dustin] (he/him)

Comment 10

•

7 years ago

mozreview-review

Comment on attachment 8900410 [details] Bug 1390700 - Sparse checkout profiles for mach and taskgraph; https://reviewboard.mozilla.org/r/171752/#review177000 Once concern I have is that we won't see problems with these profiles locally -- not until a try push or a landing. So probably being generous is good (e.g., including python/ rather than specific subpackages)

Attachment #8900410 - Flags: review?(dustin) → review+

Dustin J. Mitchell [:dustin] (he/him)

Comment 11

•

7 years ago

mozreview-review

Comment on attachment 8900411 [details] Bug 1390700 - Support sparse checkouts in run-task; https://reviewboard.mozilla.org/r/171754/#review177006 This commit should probably be before the previous.. ::: taskcluster/docs/caches.rst:60 (Diff revision 1) > This cache name pattern is managed by ``run-task`` and must only be > used with ``run-task``. > > +``level-{{level}}-checkouts-sparse-{{hash}}`` > + This is like the above but is used when the checkout is sparse (contains > + a subset of files). Isn't the cache requirements file enough to handle this difference?

Attachment #8900411 - Flags: review?(dustin) → review+

Gregory Szorc [:gps]

Assignee

Comment 12

•

7 years ago

mozreview-review-reply

Comment on attachment 8900411 [details] Bug 1390700 - Support sparse checkouts in run-task; https://reviewboard.mozilla.org/r/171754/#review177006 > Isn't the cache requirements file enough to handle this difference? The cache requirements file locks out incompatible tasks. It is a check on us not mixing and matching different cache "flavors" when we shouldn't be. We arguably should have a cache requirement for sparse checkouts. I may do that as a follow-up...

Gregory Szorc [:gps]

Assignee

Comment 13

•

7 years ago

mozreview-review-reply

Comment on attachment 8900410 [details] Bug 1390700 - Sparse checkout profiles for mach and taskgraph; https://reviewboard.mozilla.org/r/171752/#review177000 Yes, this is a big danger with sparse checkouts. Another is wildcards not finding files because they aren't in the sparse profile. The latter we can mitigate with a mozpack Finder that is sparse aware. I may have a go at that... As for not uncovering problems locally, I would like end-users to start using sparse checkouts. So we'll see some coverage once that occurs. But that's a ways off. There's no way around it: sparse checkouts introduce a whole new class of bugs due to "missing" files :/

Mike Hommey [:glandium]

Comment 14

•

7 years ago

build/sparse-profiles/taskgraph has been bitrotted by bug 1391744, and rebased as-is, I'd expect the decision task to be busted.

Comment hidden (mozreview-request)

Aki Sasaki (not active)

Comment 20

•

7 years ago

mozreview-review

Comment on attachment 8900797 [details] Bug 1390700 - Add hash for taskcluster/decision:0.1.10; https://reviewboard.mozilla.org/r/172236/#review177508 versions in comments ++

Attachment #8900797 - Flags: review?(aki) → review+

Gregory Szorc [:gps]

Assignee

Comment 21

•

7 years ago

https://hg.mozilla.org/build/puppet/rev/805e2d27fc1c819498b39e364faaf308f22bdcfb

Dustin J. Mitchell [:dustin] (he/him)

Comment 22

•

7 years ago

mozreview-review

Comment on attachment 8900412 [details] Bug 1390700 - Use sparse checkouts for decision task; https://reviewboard.mozilla.org/r/171756/#review177596

Attachment #8900412 - Flags: review?(dustin) → review+

Dustin J. Mitchell [:dustin] (he/him)

Comment 23

•

7 years ago

mozreview-review

Comment on attachment 8900795 [details] Bug 1390700 - Fix some error messages in run-task; https://reviewboard.mozilla.org/r/172232/#review177600

Attachment #8900795 - Flags: review?(dustin) → review+

Pulsebot

Comment 24

•

7 years ago

Pushed by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ade7021e45b5 Fix some error messages in run-task; r=dustin https://hg.mozilla.org/integration/autoland/rev/7f873ef16fba Support sparse checkouts in run-task; r=dustin https://hg.mozilla.org/integration/autoland/rev/d243323c8686 Sparse checkout profiles for mach and taskgraph; r=dustin https://hg.mozilla.org/integration/autoland/rev/9923fffd4f64 Use sparse checkouts for decision task; r=dustin https://hg.mozilla.org/integration/autoland/rev/0516a2ab2945 Documentation for sparse checkouts; r=me

(not currently active) Ted Mielczarek

Comment 25

•

7 years ago

mozreview-review

Comment on attachment 8900410 [details] Bug 1390700 - Sparse checkout profiles for mach and taskgraph; https://reviewboard.mozilla.org/r/171752/#review177682 This looks great! ::: build/sparse-profiles/mach:8 (Diff revision 2) > +path:build/autoconf/config.guess > +# Used for bootstrapping the mach driver. > +path:build/mach_bootstrap.py > +path:build/virtualenv_packages.txt > +path:mach > +# Various dependencies. There is room to trim fat, especially in I don't know that I'd bother trying to trim this down--it seems like it's more likely to cause us headaches down the road of implementing commands assuming the full virtualenv is there and then failing.

Attachment #8900410 - Flags: review?(ted) → review+

Gregory Szorc [:gps]

Assignee

Comment 26

•

7 years ago

Had to backout the change to .taskcluster.yml because of this weird failure on some *Windows* builds: 2017-08-25T04:27:54 INFO - Found a OMYU_kr8R3qr7j9FD12T2w fuzzy match with eM27zVtYTXCuQppyOjoT-w ... 2017-08-25T04:27:54 INFO - Verifying the signing:build XQdrW-FuRKex5m9yKKbQZw task definition is part of the signing:decision L4bFr_PJRqavWHhGGogO4w task graph... 2017-08-25T04:27:54 INFO - Found XQdrW-FuRKex5m9yKKbQZw in the graph; it's a match 2017-08-25T04:27:54 INFO - Verifying signing:decision L4bFr_PJRqavWHhGGogO4w command... 2017-08-25T04:27:54 CRITICAL - signing:decision L4bFr_PJRqavWHhGGogO4w illegal option --sparse-profile=build/sparse-profiles/taskgraph in the command! 2017-08-25T04:27:54 CRITICAL - Chain of Trust verification error! Traceback (most recent call last): File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1311, in verify_chain_of_trust task_count = await verify_task_types(chain) File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1098, in verify_task_types await valid_task_types[task_type](chain, obj) File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 943, in verify_decision_task verify_firefox_decision_command(link) File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 908, in verify_firefox_decision_command raise_on_errors(errors) File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 219, in raise_on_errors raise CoTError("\n".join(errors)) I have no clue how to fix this. aki?

Flags: needinfo?(aki)

Keywords: leave-open

Pulsebot

Comment 27

•

7 years ago

Backout by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/874e11b5b24e Backed out changeset 9923fffd4f64 for somehow breaking chain of trust on some Windows builds

Aki Sasaki (not active)

Comment 28

•

7 years ago

Tl;dr: we probably need an `if item.startswith('--sparse-profile='): continue` here [1], followed by a new scriptworker release. Longer answer: This [2] is the ugliness I currently use to eyeball a decision task to make sure it *looks* like a valid decision task. Combined with known workerTypes and allowlisted docker images, it restricts the damage someone could do should they have the desire + scopes to create a malicious release graph complete with signing and update-submitting. Bug 1328719 should let us get rid of this function, and instead rebuild the decision task definition using json-e. It will likely allow us to get rid of the docker image allowlists as well. I haven't been able to prioritize time for it, but since bug 1393277 would make a lot of releasetask migration work easier and more elegant, I may be able to carve out time for cot v2. [1] https://github.com/mozilla-releng/scriptworker/blob/b353ce36ff38cdb52d95a3de5d0967bcc8e448f0/scriptworker/cot/verify.py#L889-L890 [2] https://github.com/mozilla-releng/scriptworker/blob/b353ce36ff38cdb52d95a3de5d0967bcc8e448f0/scriptworker/cot/verify.py#L836-L908

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 29

•

7 years ago

(In reply to Aki Sasaki [:aki] from comment #28) > Bug 1328719 should let us get rid of this function * Bug 1328719 should allow us to get rid of this function for *on-push* decision tasks. Release task graphs still require us to be able to parameterize certain things like buildnumber; until we have parameterized hooks, we may have to support submitting and cot-verifying non-json-e decision task definitions.

Aki Sasaki (not active)

Comment 30

•

7 years ago

Removing the option from the payload and baking it into the decision image would improve verifiability, as would simplifying the entire commandline.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 31

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ade7021e45b5 https://hg.mozilla.org/mozilla-central/rev/7f873ef16fba https://hg.mozilla.org/mozilla-central/rev/d243323c8686 https://hg.mozilla.org/mozilla-central/rev/0516a2ab2945

Aki Sasaki (not active)

Comment 32

•

7 years ago

Pushed out scriptworker 5.0.1: https://hg.mozilla.org/build/puppet/rev/9f9c4770ded2 . This should take ~1hr to propagate.

Pulsebot

Comment 33

•

7 years ago

Pushed by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bad2c7bdb5c7 Use sparse checkouts for decision task; r=dustin

Gregory Szorc [:gps]

Assignee

Updated

•

7 years ago

Keywords: leave-open

Phil Ringnalda (:philor)

Comment 34

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bad2c7bdb5c7

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Gregory Szorc [:gps]

Assignee

Updated

•

7 years ago

Blocks: 1395752

BMO Automation

Updated

•

7 years ago

Product: TaskCluster → Firefox Build System

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Updated

•

5 years ago

Regressions: 1560077

Bug 1390700 - Sparse checkout profiles for mach and taskgraph; 7 years ago Gregory Szorc [:gps] (deleted), text/x-review-board-request	dustin : review+ ted : review+	Details
Bug 1390700 - Support sparse checkouts in run-task; 7 years ago Gregory Szorc [:gps] (deleted), text/x-review-board-request	dustin : review+	Details
Bug 1390700 - Use sparse checkouts for decision task; 7 years ago Gregory Szorc [:gps] (deleted), text/x-review-board-request	dustin : review+	Details
Bug 1390700 - Fix some error messages in run-task; 7 years ago Gregory Szorc [:gps] (deleted), text/x-review-board-request	dustin : review+	Details
Bug 1390700 - Add hash for taskcluster/decision:0.1.10; 7 years ago Gregory Szorc [:gps] (deleted), text/x-review-board-request	mozilla : review+	Details