Closed
Bug 1390700
Opened 7 years ago
Closed 7 years ago
Use sparse checkouts for decision task
Categories
(Firefox Build System :: Task Configuration, task)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla57
People
(Reporter: gps, Assigned: gps)
References
(Blocks 2 open bugs)
Details
Attachments
(5 files)
(deleted),
text/x-review-board-request
|
dustin
:
review+
ted
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
dustin
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
dustin
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
dustin
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
mozilla
:
review+
|
Details |
Mercurial 4.3 has experimental support for sparse checkouts. These are working directories that contain a subset of all the files in a revision.
Use of sparse checkouts means `hg update` operations only need to touch a subset of files, which means they should be much faster.
Widespread use of sparse checkouts has the potential to drastically decrease time spent in VCS operations. It can take minutes to create >100k files in the working directory. And for most operations, only a subset of files in the working directory are relevant, so creating extra files is just wasteful.
Sparse checkouts should unblock us from using VCS checkouts for running tests. We could never run WPT from checkouts (bug 1286900) because of perf issues creating the checkout. Sparse checkouts will enable us to only manifest the files we need, which should make using version control from tests feasible.
In this bug, I want to get the decision task using sparse checkouts. Why the decision task? It should be pretty well isolated. It uses Linux (which is easy for in-tree hacking). And it's in the critical path for running things, so any speed improvement has an immediate benefit.
Off the top of my head, this will involve:
* New Docker image with Mercurial 4.3 (decision task is special and image needs manual upload)
* In-tree definition of sparse profiles for Mercurial (not sure where to put these yet)
* Robustcheckout awareness for sparse checkout, how to widen/narrow checkout as appropriate
* run-task awareness for which sparse profile to use
* New TC caches for sparse-aware repos (repos with sparse enabled have a new requirement that will lock out older Mercurial clients)
* Mechanism in taskgraph YAML for specifying sparse profile to use
Comment 1•7 years ago
|
||
What would that profile look like? The decision task is very close to using moz.build, which I think would need the entire tree..
Assignee | ||
Comment 2•7 years ago
|
||
The Mercurial sparse profiles essentially define patterns of files to include or exclude. It can support things like **/moz.build if we need all moz.build files, for example. With Mercurial 4.3, run `hg help -e sparse` for more info.
And stating it for the record, sparse checkouts are only a medium term solution for scaling. Infinite scalability means working directories backed by a virtual filesystem that realizes file content on first access. That way, you can access any file at any time without having to worry about tracking it in a profile. However, that world effectively reverts distributed version control to something more centralized and/or requires virtual filesystem support, which we can't easily get in TaskCluster now. Plus Mercurial's virtual filesystem working directory support is still manifesting. There's also a non-complete clone involved in there somewhere.
Assignee | ||
Comment 3•7 years ago
|
||
I have a working proof-of-concept at https://treeherder.mozilla.org/#/jobs?repo=try&revision=952e0746f3e2e70be0dc0202cdda561711b0a071.
Example sparse profile is https://hg.mozilla.org/try/file/401b0507166257b9ee8f18603144a6d58c1c3b76/build/sparse-profiles/taskgraph. That manifested 3553 files in the working directory instead of 234137, saving a few dozen seconds on fresh checkout.
The patches are super hacky. Don't judge me on their quality :)
There are a few "random" files being opened by taskgraph that I was somewhat surprised about.
Also, with sparse checkouts, we'll have to be extra cautious of anything doing a filesystem walk or VCS query to discover files. e.g. if we assemble a manifest of **/moz.build files and sparse checkouts are in play, the manifest content varies depending on the sparse checkout configuration. This could come into play for things like computing the Docker image build context archive hashes (which taskgraph does).
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → gps
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•7 years ago
|
||
I'm almost ready to submit reviews for this. I punted most of the prep work to bug 1392886.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c01f4e1fae19e9e01f350171fad89f292cc185b has some tasks on it. Experimental sparse profiles for flake8, wptlint, and sphinx docs.
https://public-artifacts.taskcluster.net/Bv8T1qveSvadVYP1eWXe_w/0/public/logs/live_backing.log shows us going from a wptlint checkout to a flake8 checkout. It takes 15s to perform ~70k file deletes+adds. (Anything involving WPT is about the worst case because of its absurd file count.)
https://public-artifacts.taskcluster.net/aLvwHYqAQ0aQ6GONLknG0A/0/public/logs/live_backing.log is more interesting. That appears to be a flake8 checkout transitioning to a docs checkout. That takes ~4s to perform ~1600 file modifications.
The real win from these tasks is not doing a full checkout. We're seeing the initial checkout after clone complete in 2-3s instead of upwards of a minute. Since TC's scheduling doesn't have cache affinity, checkouts tend to be more common than you'd think and minimizing the time spent doing checkouts translates to nice wall-time wins.
Comment 5•7 years ago
|
||
This would probably be a nice win for my recently-added upload-generated-sources tasks as well:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/upload-generated-sources/kind.yml
They're doing a full checkout but they only really need some Python bits.
Comment 6•7 years ago
|
||
Most toolchain jobs have the same property: they do a full checkout for a very small number of files they need.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8900410 [details]
Bug 1390700 - Sparse checkout profiles for mach and taskgraph;
https://reviewboard.mozilla.org/r/171752/#review177000
Once concern I have is that we won't see problems with these profiles locally -- not until a try push or a landing. So probably being generous is good (e.g., including python/ rather than specific subpackages)
Attachment #8900410 -
Flags: review?(dustin) → review+
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8900411 [details]
Bug 1390700 - Support sparse checkouts in run-task;
https://reviewboard.mozilla.org/r/171754/#review177006
This commit should probably be before the previous..
::: taskcluster/docs/caches.rst:60
(Diff revision 1)
> This cache name pattern is managed by ``run-task`` and must only be
> used with ``run-task``.
>
> +``level-{{level}}-checkouts-sparse-{{hash}}``
> + This is like the above but is used when the checkout is sparse (contains
> + a subset of files).
Isn't the cache requirements file enough to handle this difference?
Attachment #8900411 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 12•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900411 [details]
Bug 1390700 - Support sparse checkouts in run-task;
https://reviewboard.mozilla.org/r/171754/#review177006
> Isn't the cache requirements file enough to handle this difference?
The cache requirements file locks out incompatible tasks. It is a check on us not mixing and matching different cache "flavors" when we shouldn't be.
We arguably should have a cache requirement for sparse checkouts. I may do that as a follow-up...
Assignee | ||
Comment 13•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900410 [details]
Bug 1390700 - Sparse checkout profiles for mach and taskgraph;
https://reviewboard.mozilla.org/r/171752/#review177000
Yes, this is a big danger with sparse checkouts. Another is wildcards not finding files because they aren't in the sparse profile.
The latter we can mitigate with a mozpack Finder that is sparse aware. I may have a go at that...
As for not uncovering problems locally, I would like end-users to start using sparse checkouts. So we'll see some coverage once that occurs. But that's a ways off.
There's no way around it: sparse checkouts introduce a whole new class of bugs due to "missing" files :/
Comment 14•7 years ago
|
||
build/sparse-profiles/taskgraph has been bitrotted by bug 1391744, and rebased as-is, I'd expect the decision task to be busted.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 20•7 years ago
|
||
mozreview-review |
Comment on attachment 8900797 [details]
Bug 1390700 - Add hash for taskcluster/decision:0.1.10;
https://reviewboard.mozilla.org/r/172236/#review177508
versions in comments ++
Attachment #8900797 -
Flags: review?(aki) → review+
Assignee | ||
Comment 21•7 years ago
|
||
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8900412 [details]
Bug 1390700 - Use sparse checkouts for decision task;
https://reviewboard.mozilla.org/r/171756/#review177596
Attachment #8900412 -
Flags: review?(dustin) → review+
Comment 23•7 years ago
|
||
mozreview-review |
Comment on attachment 8900795 [details]
Bug 1390700 - Fix some error messages in run-task;
https://reviewboard.mozilla.org/r/172232/#review177600
Attachment #8900795 -
Flags: review?(dustin) → review+
Comment 24•7 years ago
|
||
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ade7021e45b5
Fix some error messages in run-task; r=dustin
https://hg.mozilla.org/integration/autoland/rev/7f873ef16fba
Support sparse checkouts in run-task; r=dustin
https://hg.mozilla.org/integration/autoland/rev/d243323c8686
Sparse checkout profiles for mach and taskgraph; r=dustin
https://hg.mozilla.org/integration/autoland/rev/9923fffd4f64
Use sparse checkouts for decision task; r=dustin
https://hg.mozilla.org/integration/autoland/rev/0516a2ab2945
Documentation for sparse checkouts; r=me
Comment 25•7 years ago
|
||
mozreview-review |
Comment on attachment 8900410 [details]
Bug 1390700 - Sparse checkout profiles for mach and taskgraph;
https://reviewboard.mozilla.org/r/171752/#review177682
This looks great!
::: build/sparse-profiles/mach:8
(Diff revision 2)
> +path:build/autoconf/config.guess
> +# Used for bootstrapping the mach driver.
> +path:build/mach_bootstrap.py
> +path:build/virtualenv_packages.txt
> +path:mach
> +# Various dependencies. There is room to trim fat, especially in
I don't know that I'd bother trying to trim this down--it seems like it's more likely to cause us headaches down the road of implementing commands assuming the full virtualenv is there and then failing.
Attachment #8900410 -
Flags: review?(ted) → review+
Assignee | ||
Comment 26•7 years ago
|
||
Had to backout the change to .taskcluster.yml because of this weird failure on some *Windows* builds:
2017-08-25T04:27:54 INFO - Found a OMYU_kr8R3qr7j9FD12T2w fuzzy match with eM27zVtYTXCuQppyOjoT-w ...
2017-08-25T04:27:54 INFO - Verifying the signing:build XQdrW-FuRKex5m9yKKbQZw task definition is part of the signing:decision L4bFr_PJRqavWHhGGogO4w task graph...
2017-08-25T04:27:54 INFO - Found XQdrW-FuRKex5m9yKKbQZw in the graph; it's a match
2017-08-25T04:27:54 INFO - Verifying signing:decision L4bFr_PJRqavWHhGGogO4w command...
2017-08-25T04:27:54 CRITICAL - signing:decision L4bFr_PJRqavWHhGGogO4w illegal option --sparse-profile=build/sparse-profiles/taskgraph in the command!
2017-08-25T04:27:54 CRITICAL - Chain of Trust verification error!
Traceback (most recent call last):
File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1311, in verify_chain_of_trust
task_count = await verify_task_types(chain)
File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1098, in verify_task_types
await valid_task_types[task_type](chain, obj)
File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 943, in verify_decision_task
verify_firefox_decision_command(link)
File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 908, in verify_firefox_decision_command
raise_on_errors(errors)
File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 219, in raise_on_errors
raise CoTError("\n".join(errors))
I have no clue how to fix this. aki?
Flags: needinfo?(aki)
Keywords: leave-open
Comment 27•7 years ago
|
||
Backout by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/874e11b5b24e
Backed out changeset 9923fffd4f64 for somehow breaking chain of trust on some Windows builds
Comment 28•7 years ago
|
||
Tl;dr: we probably need an `if item.startswith('--sparse-profile='): continue` here [1], followed by a new scriptworker release.
Longer answer:
This [2] is the ugliness I currently use to eyeball a decision task to make sure it *looks* like a valid decision task. Combined with known workerTypes and allowlisted docker images, it restricts the damage someone could do should they have the desire + scopes to create a malicious release graph complete with signing and update-submitting.
Bug 1328719 should let us get rid of this function, and instead rebuild the decision task definition using json-e. It will likely allow us to get rid of the docker image allowlists as well. I haven't been able to prioritize time for it, but since bug 1393277 would make a lot of releasetask migration work easier and more elegant, I may be able to carve out time for cot v2.
[1] https://github.com/mozilla-releng/scriptworker/blob/b353ce36ff38cdb52d95a3de5d0967bcc8e448f0/scriptworker/cot/verify.py#L889-L890
[2] https://github.com/mozilla-releng/scriptworker/blob/b353ce36ff38cdb52d95a3de5d0967bcc8e448f0/scriptworker/cot/verify.py#L836-L908
Flags: needinfo?(aki)
Comment 29•7 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #28)
> Bug 1328719 should let us get rid of this function
* Bug 1328719 should allow us to get rid of this function for *on-push* decision tasks. Release task graphs still require us to be able to parameterize certain things like buildnumber; until we have parameterized hooks, we may have to support submitting and cot-verifying non-json-e decision task definitions.
Comment 30•7 years ago
|
||
Removing the option from the payload and baking it into the decision image would improve verifiability, as would simplifying the entire commandline.
Comment 31•7 years ago
|
||
bugherder |
Comment 32•7 years ago
|
||
Pushed out scriptworker 5.0.1: https://hg.mozilla.org/build/puppet/rev/9f9c4770ded2 . This should take ~1hr to propagate.
Comment 33•7 years ago
|
||
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bad2c7bdb5c7
Use sparse checkouts for decision task; r=dustin
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Comment 34•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•7 years ago
|
Product: TaskCluster → Firefox Build System
Regressions: 1560077
You need to log in
before you can comment on or make changes to this bug.
Description
•