Closed
Bug 1420449
Opened 7 years ago
Closed 7 years ago
Optimize task graph generation
Categories
(Firefox Build System :: Task Configuration, task)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: catlee)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 2 obsolete files)
Generating task graphs locally can be pretty slow. It takes about 20s on my machine to run `mach taskgraph target`. This makes the modify/test cycle pretty slow when working on taskgraph changes.
I did some digging, and found a few places we may be able to speed this up, especially for local development:
1) We spend a lot of time hashing directory contents for the toolchain tasks. Most users working locally probably don't need accurate hashes for toolchain tasks until they're ready to test out their changes. I've attached a simple patch that disables directory hashing based on an environment variable. This brings taskgraph generation time down to about 13s on my system.
2) We also spend a lot of time doing task schema validation. I wonder if this could be parallelized at all? It takes about 7s to generate and validate all the 'test' kind tasks.
Comment 1•7 years ago
|
||
I would rather have this be a command-line argument rather than an env var. Maybe something like -F / --fast?
I like the idea of skipping validation. I don't think it can be parallelized because of the GIL, but it could easily be skipped without an impact to correctness (the user would just need to be aware that they may not hear about schema errors in any modifications they have made).
As for skipping the hashing, I'd prefer that it disable index searching entirely for the relevant tasks, rather than return a static hash. There will likely be a lot of cases where a --fast taskgraph is accidentally used to look up something like a docker image or toolchain build, and with a static hash it would be trivial to trojan developers' systems in that case.
Since this is a local-use-only option, I think it would be fine for --fast to set some global variable (maybe just `taskgraph.fast`?). I don't think it should be a parameter, since it would never be used in automation.
/cc ahal since he is interested in fast local taskgraph generation
Comment 2•7 years ago
|
||
Yes, thanks for looking into this!
These improvements might make bug 1412121 unnecessary, I'll make this block so we can compare again after these changes land.
Blocks: 1412121
Assignee | ||
Comment 3•7 years ago
|
||
I played around a little bit with bypassing or parallelizing the schema validation and ran into a few issues:
1) skipping schema validation causes errors in the task generation. The schema validation seems to make modifications to the task definitions. I didn't dig into it, but perhaps it's applying defaults?
2) I couldn't figure out how to use multiprocessing or concurrent.futures from mach, I kept hitting errors like
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Comment hidden (mozreview-request) |
Comment 5•7 years ago
|
||
> 1) skipping schema validation causes errors in the task generation. The schema validation seems to make modifications to the task definitions. I didn't dig into it, but perhaps it's applying defaults?
This is definitely the case. Perhaps it would be possible to run voluptuous in a mode that just applies changes and doesn't make other modifications? I guess voluptuous might be expressive enough that the defaults might depend on checking other things, even if taskgraph doesn't use that flexibility.
Comment 6•7 years ago
|
||
It's definitely possible -- it returns the modified data, rather than modifying it in place.
I think we only use voluptuous lightly to apply defaults, and we also have lots of transforms that apply defaults. I suspect that using transforms exclusively would be clearer for readers anyway.
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8933366 [details]
Bug 1420449: Add mach taskgraph -F/--fast
https://reviewboard.mozilla.org/r/204294/#review210152
Thanks, this looks good! I have a couple comments.
Dustin, does using `graph_config` for this value make sense to you?
::: taskcluster/taskgraph/__init__.py:11
(Diff revision 1)
> +# Enable fast task generation for local debugging?
> +fast = False
I think this would be better off living in the `graph_config` instead of an attribute on the top level module:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/generator.py#90
You'd have to modify `load_graph_config` to accept arbitrary kwargs that updates/overwrites whatever is in `config.yml`.
Then this value can be accessed later on via `config.graph_config['fast']`
::: taskcluster/taskgraph/transforms/job/toolchain.py:159
(Diff revision 1)
> + if not taskgraph.fast:
> - name = taskdesc['label'].replace('{}-'.format(config.kind), '', 1)
> + name = taskdesc['label'].replace('{}-'.format(config.kind), '', 1)
> - add_optimization(
> + add_optimization(
> - config, taskdesc,
> + config, taskdesc,
> - cache_type=CACHE_TYPE,
> + cache_type=CACHE_TYPE,
> - cache_name=name,
> + cache_name=name,
> - digest_data=get_digest_data(config, run, taskdesc),
> + digest_data=get_digest_data(config, run, taskdesc),
> - )
> + )
I'm not at all familiar with toolchain tasks. But does skipping this `add_optimization` call make the whole function a no-op? Could we do an early return near the top instead?
Attachment #8933366 -
Flags: review?(ahalberstadt)
Comment 8•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8933366 [details]
Bug 1420449: Add mach taskgraph -F/--fast
https://reviewboard.mozilla.org/r/204294/#review210152
> I think this would be better off living in the `graph_config` instead of an attribute on the top level module:
> https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/generator.py#90
>
> You'd have to modify `load_graph_config` to accept arbitrary kwargs that updates/overwrites whatever is in `config.yml`.
>
> Then this value can be accessed later on via `config.graph_config['fast']`
To clarify, the new way of setting this would be `tgg = TaskGraphGenerator(fast=True)`
Comment 9•7 years ago
|
||
mozreview-review |
Comment on attachment 8933366 [details]
Bug 1420449: Add mach taskgraph -F/--fast
https://reviewboard.mozilla.org/r/204294/#review210794
I don't feel too strongly about the config issue. I'll leave it open for now in case there's some more bikeshedding to be had. But if dustin prefers this approach to the `graph_config` one, feel free to close it and land.
Attachment #8933366 -
Flags: review+
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8933366 [details]
Bug 1420449: Add mach taskgraph -F/--fast
https://reviewboard.mozilla.org/r/204294/#review211152
::: taskcluster/taskgraph/__init__.py:11
(Diff revision 1)
> +# Enable fast task generation for local debugging?
> +fast = False
The graph config is for bigger, sweeping changes in the graph generation that differ across trust domains (so TB has its own taskcluste/ci/config.yml). So I don't think that's a great place.
This particular piece of configuration is a runtime option that's specifically about the generation, and doesn't affect the resulting graph in "important" ways.
I suspect we'll want access to this value from all manner of weird places, such as in the schema validation function, too. Rather than find a dynamic value that is available everywhere we want it, I think this global makes that easy, and also suggests that this is an internal feature.
My only request would be to expand this comment a little: what does it mean (how much detail can we drop in the interest of speed?), how is it set, how is it accessed?
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Comment 12•7 years ago
|
||
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/642f885eb109
Add mach taskgraph -F/--fast r=ahal
Comment 13•7 years ago
|
||
bugherder |
Assignee | ||
Updated•7 years ago
|
Attachment #8931719 -
Attachment is obsolete: true
Assignee | ||
Comment 14•7 years ago
|
||
This produces equivalent output for me with or without --fast as compared to a task graph generated without this patch.
I don't like the repetition of the treeherder environment defaults. Any ideas for how to make that better?
Attachment #8935631 -
Flags: feedback?(mozilla)
Attachment #8935631 -
Flags: feedback?(dustin)
Attachment #8935631 -
Flags: feedback?(ahalberstadt)
Comment 15•7 years ago
|
||
Comment on attachment 8935631 [details] [diff] [review]
noschema.diff
Review of attachment 8935631 [details] [diff] [review]:
-----------------------------------------------------------------
I'm slightly unhappy at moving the defaults away from the schema (although I know I'm the one that suggested it). The schemas are the only documentation for the config formats that exist, so moving the defaults away makes them harder to reference. I wonder how hard it would be to extract the defaults from the schema, (probably once at module scope would be possible) and then apply them like is done here.
> I don't like the repetition of the treeherder environment defaults. Any ideas for how to make that better?
Yep, get rid of that setting, since it isn't useful anymore: https://reviewboard.mozilla.org/r/206516/
Attachment #8935631 -
Flags: feedback?(mozilla) → feedback-
Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Tom Prince [:tomprince] from comment #15)
> Comment on attachment 8935631 [details] [diff] [review]
> noschema.diff
>
> Review of attachment 8935631 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I'm slightly unhappy at moving the defaults away from the schema (although I
> know I'm the one that suggested it). The schemas are the only documentation
> for the config formats that exist, so moving the defaults away makes them
> harder to reference. I wonder how hard it would be to extract the defaults
> from the schema, (probably once at module scope would be possible) and then
> apply them like is done here.
I think we'll end up re-implementing most of the schema validation code to do this so that we can correctly apply defaults to schemas that accept more than one form (e.g. schemas with Any()). Perhaps we can duplicate the defaults and verify that they match when doing schema verification?
> > I don't like the repetition of the treeherder environment defaults. Any ideas for how to make that better?
>
> Yep, get rid of that setting, since it isn't useful anymore:
> https://reviewboard.mozilla.org/r/206516/
\o/
Comment 17•7 years ago
|
||
Comment on attachment 8935631 [details] [diff] [review]
noschema.diff
Review of attachment 8935631 [details] [diff] [review]:
-----------------------------------------------------------------
I like this. I think in the fullness of time we can move some of this to a model where defaults are applied at the time the parameter is consulted (so, use .get instead of []), but for the moment this is a good intermediate step.
::: taskcluster/taskgraph/transforms/source_test.py
@@ +74,1 @@
> yield validate_schema(source_test_description_schema, job,
We should also change this so that it validates but then returns the original (which, IIRC, voluptuous does not modify). Then if someone later adds a default, it will have no effect.
::: taskcluster/taskgraph/transforms/task.py
@@ +1091,5 @@
> + worker.setdefault('max-run-time', 600)
> + elif worker['implementation'] == 'beetmover-cdns':
> + worker.setdefault('max-run-time', 600)
> + elif worker['implementation'] == 'push-apk':
> + worker.setdefault('commit', False)
could all of these move to the implementation functions?
Attachment #8935631 -
Flags: feedback?(dustin) → feedback+
Comment 18•7 years ago
|
||
Comment on attachment 8935631 [details] [diff] [review]
noschema.diff
Review of attachment 8935631 [details] [diff] [review]:
-----------------------------------------------------------------
I spent a bit of time trying to recursively extract defaults out of a Schema. Top level defaults and recursively getting defaults out of dict values isn't too hard. But it gets quite complicated when you have to also consider Any/All directives. It's do-able, but seems more complicated than it's worth. Let me know and I can attach what I have so far.
::: taskcluster/taskgraph/util/schema.py
@@ +20,5 @@
> Validate that object satisfies schema. If not, generate a useful exception
> beginning with msg_prefix.
> """
> + if taskgraph.fast:
> + return obj
Should we also deepcopy this?
Attachment #8935631 -
Flags: feedback?(ahalberstadt) → feedback+
Assignee | ||
Comment 19•7 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #17)
> Comment on attachment 8935631 [details] [diff] [review]
> noschema.diff
>
> Review of attachment 8935631 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I like this. I think in the fullness of time we can move some of this to a
> model where defaults are applied at the time the parameter is consulted (so,
> use .get instead of []), but for the moment this is a good intermediate step.
>
> ::: taskcluster/taskgraph/transforms/source_test.py
> @@ +74,1 @@
> > yield validate_schema(source_test_description_schema, job,
>
> We should also change this so that it validates but then returns the
> original (which, IIRC, voluptuous does not modify). Then if someone later
> adds a default, it will have no effect.
I've changed it so that validate_schema returns None, forcing all callers to yield the original job object.
I don't think we need to worry about mutable defaults in this case. The set_defaults transforms will create new list objects per task.
> ::: taskcluster/taskgraph/transforms/task.py
> @@ +1091,5 @@
> > + worker.setdefault('max-run-time', 600)
> > + elif worker['implementation'] == 'beetmover-cdns':
> > + worker.setdefault('max-run-time', 600)
> > + elif worker['implementation'] == 'push-apk':
> > + worker.setdefault('commit', False)
>
> could all of these move to the implementation functions?
Not easily. The validate transform occurs before the build_task transform, which is where the worker specific payload builder function is called. Any ideas for how to refactor that?
Assignee | ||
Comment 20•7 years ago
|
||
How do we keep this from breaking in the future?
Comment 21•7 years ago
|
||
I'm happy with comment 19 -- no ideas for how to refactor that. My comment was from curiosity.
As to keeping this from breaking -- we could potentially run with --fast in a sub-task, if that would help, perhaps even verifying that task-graph.json matches that from the decision task?
Assignee: nobody → catlee
Assignee | ||
Comment 22•7 years ago
|
||
Attachment #8935631 -
Attachment is obsolete: true
Attachment #8938500 -
Flags: review?(dustin)
Attachment #8938500 -
Flags: review?(ahalberstadt)
Updated•7 years ago
|
Attachment #8938500 -
Flags: review?(dustin) → review+
Comment 23•7 years ago
|
||
Comment on attachment 8938500 [details] [diff] [review]
Skip schema validation with --fast
Review of attachment 8938500 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks!
Attachment #8938500 -
Flags: review?(ahalberstadt) → review+
Comment 24•7 years ago
|
||
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a47accb11c5
Skip schema validation with --fast r=dustin,ahal
Comment 25•7 years ago
|
||
Backed out changeset 7a47accb11c5 (bug 1420449) for build bustage r=backout on a CLOSED TREE
Failure push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=7a47accb11c59245d427149ab0fe251c2977d782&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=153872740&repo=mozilla-inbound&lineNumber=21213
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/5950f275b19245bd46edb9e27107e4a5cf90d8e9
Flags: needinfo?(catlee)
Comment 26•7 years ago
|
||
FWIW, this bounced because of bug 1421100. See https://hg.mozilla.org/integration/mozilla-inbound/rev/c1e20de19b6f
Also note that I landed bug 1419638 on autoland, that adds another use of validate_schema that this bug would need to patch.
Comment 27•7 years ago
|
||
(In reply to Mike Hommey [back on Jan 9] [:glandium] from comment #26)
> FWIW, this bounced because of bug 1421100. See
> https://hg.mozilla.org/integration/mozilla-inbound/rev/c1e20de19b6f
>
> Also note that I landed bug 1419638 on autoland, that adds another use of
> validate_schema that this bug would need to patch.
... although I must say, I don't understand why you're not keeping the current behavior for validate_schema, avoiding all callers to change.
Assignee | ||
Comment 28•7 years ago
|
||
(In reply to Mike Hommey [back on Jan 9] [:glandium] from comment #27)
> (In reply to Mike Hommey [back on Jan 9] [:glandium] from comment #26)
> > FWIW, this bounced because of bug 1421100. See
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/c1e20de19b6f
> >
> > Also note that I landed bug 1419638 on autoland, that adds another use of
> > validate_schema that this bug would need to patch.
Thanks for the heads up! I'll incorporate that in my patch.
> ... although I must say, I don't understand why you're not keeping the
> current behavior for validate_schema, avoiding all callers to change.
To try and prevent people from relying on side-effects of the schema validation.
Flags: needinfo?(catlee)
Comment 29•7 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #28)
> To try and prevent people from relying on side-effects of the schema
> validation.
You don't need to change the function signature for that. You can still make it return the data it's passed in. That would avoid all those changes to callers.
Comment 30•7 years ago
|
||
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c24c4cefe70
Skip schema validation with --fast r=dustin,ahal
Comment 31•7 years ago
|
||
bugherder |
Comment 32•7 years ago
|
||
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/comm-central/rev/a7badbfcf7b3
Include optional graph config setting, now that defaults aren't applied; rs=bustage-fix CLOSED TREE
Comment 33•7 years ago
|
||
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/90928695e8b1
Fixup after bug 1427312. r=me
Comment 34•7 years ago
|
||
bugherder |
Assignee | ||
Comment 35•7 years ago
|
||
I think this is good enough for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 36•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Updated•7 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•