Closed Bug 1397552 Opened 7 years ago Closed 7 years ago

define a way to specify a previous graph's builds as this graph's dependencies

Categories

(Release Engineering :: Release Automation: Other, enhancement)

enhancement
Not set
normal

Tracking

(firefox58 fixed)

RESOLVED FIXED
Tracking Status
firefox58 --- fixed

People

(Reporter: mozilla, Assigned: mozilla)

References

Details

Attachments

(2 files, 3 obsolete files)

For release promotion, we build the release builds on-push; then at promotion time, we need to generate another graph that depends on those artifacts for l10n, beetmover, balrog, etc. Currently we depend upon the build graph's artifacts by pointing at the indexes. This works, but a) means we use a different url than the in-tree on-push and nightly graphs, and b) these are less cot-verifiable than task artifacts. I've been trying to figure out how to graft a previously finished graph's tasks at the top of this graph. We already do so, to a degree, with docker-image tasks.
Attached patch depend-on-previous-graph.patch (obsolete) (deleted) — Splinter Review
Dustin, This patch uses `optimize_task_graph`s `existing_tasks` kwarg to specify the task labels to replace with `task_id`s from the previously completed graph. I'm guessing that, if this approach looks feasible, we'll add two parameters (`previous_graph_id` and `previous_graph_kinds`); we'll download the previous graph's `label-to-taskid.json` and populate `existing_tasks` if those are set. Supporting more than one previous graph might be nice, but I'm not sure how to do that atm. What do you think?
Attachment #8905315 - Flags: feedback?(dustin)
Blocks: 1397762
Comment on attachment 8905315 [details] [diff] [review] depend-on-previous-graph.patch Review of attachment 8905315 [details] [diff] [review]: ----------------------------------------------------------------- I like this - this is what existing_tasks is for, in general (we use it in actions, for example). How will these properties be set, eventually? Is this a hook or an action or the like?
Attachment #8905315 - Flags: feedback?(dustin) → feedback+
Initially we're thinking a decision task template in releasetasks, which isn't ideal. Fragile and relies on pattern matching in CoT verification rather than re-generation. Eventually I'd like these to be action tasks.
(In reply to Aki Sasaki [:aki] from comment #1) > I'm guessing that, if this approach looks feasible, we'll add two parameters > (`previous_graph_id` and `previous_graph_kinds`); we'll download the > previous graph's `label-to-taskid.json` and populate `existing_tasks` if > those are set. Supporting more than one previous graph might be nice, but > I'm not sure how to do that atm. Hm, maybe an ordered `previous_graph_ids`, and `previous_graph_kinds`; we overlay each label-to-taskid over each other until we have a final dict. So if the repackage and repackage-signing for a subset of platforms+locales happened in a retrigger graph, we could potentially overlay the retrigger graph's label-to-taskid's over the top of the original's.
We've vaguely suggested similar ideas for action tasks. Several action tasks work by re-constituting the full-task-graph, picking a small target task graph, then optimizing it with the original label-to-taskid mapping, thereby pulling in any new dependencies that weren't built in the original decision task. But that process misses information about other action tasks that might already have run on the same graph. Having a generic way to get the combined decision+actions results in an action task implementation would be pretty cool.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=58bc934fc9fb355831aef80a07b76ceb15a9776b&selectedJob=129850176 is a win64 try build graph I pushed via `./mach try fuzzy` . I hacked .taskcluster.yml to add the --previous-graph-ids and --previous-graph-kinds arguments to the decision task, then pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=325a10ad8b5a265283c4fdf56dccf3f9c3849883&selectedJob=129851254 to trigger win64 signing via `./mach try fuzzy`. https://tools.taskcluster.net/groups/dgiU7OKISWKDU_O7M4wx4w/tasks/Y35ZoVw8QQ2MXh6qAqSK6A/details , the signing task from the 2nd graph, depends on the build task from the 1st graph. And if I'm eyeballing this correctly, it'll pass CoT verification and sign. Cleaning up the patch for review.
Thinking we may want to parameters['previous_graph_ids'] = ','.join(parameters['previous_graph_ids']).split(',') so we can do something like --previous-graph-ids="graphid1,graphid2"
(In reply to Aki Sasaki [:aki] from comment #9) > https://tools.taskcluster.net/groups/dgiU7OKISWKDU_O7M4wx4w/tasks/ > Y35ZoVw8QQ2MXh6qAqSK6A/details , the signing task from the 2nd graph, > depends on the build task from the 1st graph. And if I'm eyeballing this > correctly, it'll pass CoT verification and sign. Green \o/
Comment on attachment 8906090 [details] bug 1397552 - specify a previous graph's tasks as this graph's dependencies. https://reviewboard.mozilla.org/r/177822/#review183348 I see the second decision task, https://tools.taskcluster.net/groups/dgiU7OKISWKDU_O7M4wx4w/tasks/dgiU7OKISWKDU_O7M4wx4w/details, contains these new arguments, and from previous comments it seems this would be created by releasetasks. I feel like this should be an action task, not a decision task -- action tasks are designed to take extra args like this whereas these args feel like a weird fit for a decision task. And action tasks are the way to add new tasks for a particular "push" after-the-fact. I think I still don't understand the context enough. What would be different between decision task #1 and decision task #2 that would have #2 producing all of these additional jobs? Is it a "different" graph (e.g., taskcluster/releasetasks instead of taskcluster/ci) or more like a re-run of the same graph with different parameters? Do the results appear as a new "push" on treeherder, or are they intermingled with the earlier "push"?
(In reply to Dustin J. Mitchell [:dustin] from comment #13) > I see the second decision task, > https://tools.taskcluster.net/groups/dgiU7OKISWKDU_O7M4wx4w/tasks/ > dgiU7OKISWKDU_O7M4wx4w/details, contains these new arguments, and from > previous comments it seems this would be created by releasetasks. > > I feel like this should be an action task, not a decision task -- action > tasks are designed to take extra args like this whereas these args feel like > a weird fit for a decision task. And action tasks are the way to add new > tasks for a particular "push" after-the-fact. +1, I think it belongs in an action task too. I do believe that for the first phases of releasetasks-in-tree, we'll be using templatized decision tasks. > I think I still don't understand the context enough. What would be > different between decision task #1 and decision task #2 that would have #2 > producing all of these additional jobs? At first, the target_task_method. Beta does nightlies on push for desktop, with signing and repackaging, but no l10n or beetmover/balrog. The 2nd graph will introduce those tasks that haven't run yet, but still have them depend on the nightlies-on-push. The 2nd graph will also add buildbot bridge tasks that currently exist in releasetasks, e.g. tagging, bouncer, etc. > Is it a "different" graph (e.g., > taskcluster/releasetasks instead of taskcluster/ci) or more like a re-run of > the same graph with different parameters? The latter. Right now the thinking is, all the tasks will be in the same full graph, but will be weeded out in the target_task_method. > Do the results appear as a new > "push" on treeherder, or are they intermingled with the earlier "push"? On treeherder, they'll be shown on the original push. In taskcluster, they will be in a separate graph.
(In reply to Aki Sasaki [:aki] from comment #14) > +1, I think it belongs in an action task too. I do believe that for the > first phases of releasetasks-in-tree, we'll be using templatized decision > tasks. Why not a templatized action task? > At first, the target_task_method. > Beta does nightlies on push for desktop, with signing and repackaging, but > no l10n or beetmover/balrog. The 2nd graph will introduce those tasks that > haven't run yet, but still have them depend on the nightlies-on-push. The > 2nd graph will also add buildbot bridge tasks that currently exist in > releasetasks, e.g. tagging, bouncer, etc. OK, thanks, that helps a lot :) > The latter. Right now the thinking is, all the tasks will be in the same > full graph, but will be weeded out in the target_task_method. Gotcha. So basically what you need is to schedule some jobs that were in the original full-task-graph.json, but not in task-graph.json. > On treeherder, they'll be shown on the original push. In taskcluster, they > will be in a separate graph. OK. So I think this will make the most sense implemented in-tree as an action task. When that is triggered from releasetasks, it can just generate a task definition that is hard-coded to match what's in actions.json. When there's time to make it a little more fancy, releasetasks can download and parameterize actions.json. Or some other UI (ship-it?) can trigger that action (https://docs.taskcluster.net/manual/using/actions/ui) With this arrangement, there's no need to regenerate the task graph - you can just download it from the decision task. Then you specify the new tasks you want (similar to a target task method), add any additional desired tasks, and call create_tasks with the result. That function will assign dependencies correctly, schedule any intermediate dependencies, and perform optimization again (optimization may be useful if for example the new tasks require a docker image). Most of what you need is in http://searchfox.org/mozilla-central/source/taskcluster/taskgraph/actions/run_missing_tests.py Sorry for the back-and-forth here. I think the missing piece was the idea that releasetasks can easily create an action task instead of a decision task (just a mildly different template).
Comment on attachment 8906090 [details] bug 1397552 - specify a previous graph's tasks as this graph's dependencies. https://reviewboard.mozilla.org/r/177822/#review183794
Attachment #8906090 - Flags: review?(dustin) → review-
I've started working on the action task, and hit an issue: (In reply to Dustin J. Mitchell [:dustin] from comment #15) > (In reply to Aki Sasaki [:aki] from comment #14) > > +1, I think it belongs in an action task too. I do believe that for the > > first phases of releasetasks-in-tree, we'll be using templatized decision > > tasks. > > Why not a templatized action task? Looking at the code, we currently specify the beetmover payload [1] based upon the target task method of the decision task [2]. During a standard nightly decision task, the beetmover payload will specify no release config. During a given candidates_fennec decision task, we'll generate the release config with the build_number specified by os.environ['BUILD_NUMBER']. I realize this is hacky. I also don't know how I would properly use an action task to generate variable payloads like this. When we do the beta builds on-push, we need to be able to run N relpromo graphs against the build graph, and each relpromo graph needs to upload to a different candidates/buildN directory. Should we specify this similarly to a taskref, and expand it during optimization? We will likely have the same problem for setting an arbitrary rollout percentage for pushapk, or pointing at an out-of-tree revision for external repos. Either we need to re-run the decision task to generate a new full graph with the variables populated, or we need to keep the payload templatized to some degree, so we can fill in the blanks at action task time. Right now I'm thinking the easiest solution to unblock us is the templatized decision task, and we can add the templatized payloads in full-task-graph.json at a more leisurely pace, since we know we want to go to actions for most everything at some point. If you know of a way I can sanely modify the payloads of full-task-graph.json at action task time, then I can continue down that route. [1] https://hg.mozilla.org/mozilla-central/file/tip/taskcluster/taskgraph/transforms/task.py#l907 [2] https://hg.mozilla.org/mozilla-central/file/tip/taskcluster/taskgraph/util/scriptworker.py#l414 > > At first, the target_task_method. > > Beta does nightlies on push for desktop, with signing and repackaging, but > > no l10n or beetmover/balrog. The 2nd graph will introduce those tasks that > > haven't run yet, but still have them depend on the nightlies-on-push. The > > 2nd graph will also add buildbot bridge tasks that currently exist in > > releasetasks, e.g. tagging, bouncer, etc. > > OK, thanks, that helps a lot :) > > > The latter. Right now the thinking is, all the tasks will be in the same > > full graph, but will be weeded out in the target_task_method. > > Gotcha. So basically what you need is to schedule some jobs that were in > the original full-task-graph.json, but not in task-graph.json. Some jobs that were in the original full-task-graph.json, but may need slightly altered task definitions as mentioned above.
Flags: needinfo?(dustin)
Attached patch action_relpromo_wip.patch (obsolete) (deleted) — Splinter Review
Current action relpromo wip.
(In reply to Aki Sasaki [:aki] from comment #17) > We will likely have the same problem for setting an arbitrary rollout > percentage for pushapk, or pointing at an out-of-tree revision for external > repos. Either we need to re-run the decision task to generate a new full > graph with the variables populated, or we need to keep the payload > templatized to some degree, so we can fill in the blanks at action task time. We probably also want to pass down l10n changesets from ship-it. This may change the number of l10n chunks, which affects the number of tasks in the l10n kind, as well as all downstream tasks. If we go this route instead of using an in-tree l10n bumper, we'll definitely need to rerun the decision task.
Nothing's ever simple for releases, is it :) I suspect that we will have other action tasks that want to re-run the task-graph generation with some different parameters. So, how about this compromise: we keep the parameters, but don't allow them to be passed as command-line options to tasks, and instead make them available only to Python code (in this case, an action task). Then the action task can re-run the entire generation process: parameters = get_decision_parameters(..) parameters['previous_..'] = .. # set up previous_* tgg = TaskGraphGenerator( root_dir=options['root'], parameters=parameters) create_tasks(tgg.morphed_task_graph, tgg.label_to_taskid, parameters) I think I'd prefer to have parameters['existing_tasks'] (the action task doing the work of getting the artifact and filtering it by kind), but I can live with previous_ for now. It's code, we can refactor later if it works better that way. I think this is basically gluing attachment 8906090 [details] and attachment 8909952 [details] [diff] [review] together, so not too much work?
Flags: needinfo?(dustin)
Current wip, untested: https://github.com/escapewindow/gecko-dev/compare/inbound...escapewindow:relpromo-action?expand=1 - switched to parameters['existing_tasks'] - unsure if I can get parameters from a templatized action task, or if I should add in a call to `get_artifact` to download parameters.yml from the first decision task in the list - I believe this will work for both fennec and desktop, so I removed the `desktop` from the action name - unsure if I should go with camelCase or snake_case for input json keys. The only action with multi-word keys is mochitest_retrigger, which wlach wrote. Next steps: test, get feedback?
I think I need to insert a dummy existing_tasks: {} in ./mach try fuzzy somewhere.
I was able to ./mach try fuzzy -p <parameters_file> where the parameters file already had `existing_tasks: {}`. That gave me https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab7f3d581e6505b509a28833c5db6db4b2edb551 which has the rel promo action: https://public-artifacts.taskcluster.net/Imt0Sn9yQ-y-XP_jlqdCCQ/0/public/actions.json Next step is to create a templatized action, probably through task creator.
Testing: DECISION=QF2uKKTHTW-xEV7yCvddcQ ./mach taskgraph test-action-callback \ -p /src/gecko/params/akitry.yml --task-id $DECISION --task-group-id $DECISION \ --input /src/gecko/params/input.yml release_promotion_action works! And then I rebased to inbound and things are broken like in https://bugzilla.mozilla.org/show_bug.cgi?id=1383880#c213 . https://treeherder.mozilla.org/#/jobs?repo=try&revision=630a98efeac42510db1df277dcff8f2d3cc5b89d&selectedJob=132613436 is the latest try push; I need to clean up a bit and figure out how we're going to be calling this. We probably want to download the parameters.yml from the decision task, and then run `./mach taskgraph action-callback`, so we don't necessarily need auto-parameters.yml-downloading in the action.
(In reply to Aki Sasaki [:aki] from comment #22) > I think I need to insert a dummy existing_tasks: {} in ./mach try fuzzy > somewhere. Once the patch gets to central, decision tasks' parameters.yml should include that explicitly. Andrew's working on a way to prevent parameter changes from causing errors (probably by providing defaults).
Current status: - tried the action on `date` branch. - it ran! yay - it triggered all the tasks, instead of building off the previous graph, boo. I suspect this is because it's getting the parameters from the on-push decision task, rather than the nightly decision task I pointed at. - it didn't upload the decision task artifacts, boo. - trying the command from comment 24 against the `date` nightly decision task, but hitting mach bootstrap bustage. - pinged #build - trying a reclone in case I horked my m-u clone. I think I'm close...
(In reply to Aki Sasaki [:aki] from comment #26) > - trying the command from comment 24 against the `date` nightly decision > task, but hitting mach bootstrap bustage. > - pinged #build > - trying a reclone in case I horked my m-u clone. Reclone fixed it. Weird.
Successful release promotion action task: https://tools.taskcluster.net/groups/btW09DPRRFiUExy6pxaZhg/tasks/X9TK6g_aRCq6Lvc8XRjPOQ/details It's going to fail chain of trust because it's not uploading its decision artifacts. Debugging that is next.
(In reply to Aki Sasaki [:aki] from comment #28) > It's going to fail chain of trust because it's not uploading its decision > artifacts. Debugging that is next. I bstack's first patch, https://reviewboard.mozilla.org/r/181890/diff/4#index_header , fixes this. Landed and retrying.
Attached patch relpro action (deleted) — Splinter Review
This patch: - adds the `do_not_optimize` and `existing_tasks` parameters. These are set to `[]` and `{}` by default; they can be overridden in an action. - adds the `release_promotion` action. This action may change over time as we work on migrating our releasetasks in-tree. - adds `find_hg_revision_pushlog_id` and `find_existing_tasks_from_previous_kinds` to actions.util - allows for passing a `parameters` kwarg into `taskgraph_decision`, rather than requiring argparse Comments, thoughts? I'm open to [partially?] landing early, or leaving on the new `maple` project branch until this is more fully baked.
Attachment #8905315 - Attachment is obsolete: true
Attachment #8906090 - Attachment is obsolete: true
Attachment #8909952 - Attachment is obsolete: true
Attachment #8914527 - Flags: feedback?(dustin)
Attachment #8914527 - Flags: feedback?(bstack)
Attachment #8914527 - Flags: feedback?(ahalberstadt)
Comment on attachment 8914527 [details] [diff] [review] relpro action Review of attachment 8914527 [details] [diff] [review]: ----------------------------------------------------------------- This lgtm, though I won't have many strong opinions on implementation details. After this lands I'll have to spend a bit more time thinking about how this might be applied to other kinds of tasks. ::: taskcluster/taskgraph/parameters.py @@ +19,5 @@ > PARAMETER_NAMES = set([ > 'base_repository', > 'build_date', > + 'do_not_optimize', > + 'existing_tasks', More of an fyi, but when you rebase you'll need to provide default values for these. The easiest way to test if the default values work is: ./mach try fuzzy -q foo --no-push
Attachment #8914527 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 8914527 [details] [diff] [review] relpro action Review of attachment 8914527 [details] [diff] [review]: ----------------------------------------------------------------- To me the actions-task specific stuff makes sense. Dustin will have a better handle on the changes to decision tasks in general. My only question is whether this will set task.extra.parent properly since we're using taskgraph_decision rather than create_tasks like in the rest of the action tasks. Do we even need to set parent for this use-case?
Attachment #8914527 - Flags: feedback?(bstack) → feedback+
(In reply to Brian Stack [:bstack] from comment #33) > To me the actions-task specific stuff makes sense. Dustin will have a better > handle on the changes to decision tasks in general. > > My only question is whether this will set task.extra.parent properly since > we're using taskgraph_decision rather than create_tasks like in the rest of > the action tasks. Do we even need to set parent for this use-case? Thanks! I believe that: - the action task itself will have task.extra.parent defined, pointing at the original on-push decision task. - the taskgroup that the action task spawns will have task.taskGroupId pointing at the action task, so even if we don't have task.extra.parent defined, we're good. I need to doublecheck all that on maple since things have changed a bit.
(In reply to Andrew Halberstadt [:ahal] from comment #32) > This lgtm, though I won't have many strong opinions on implementation > details. After this lands I'll have to spend a bit more time thinking about > how this might be applied to other kinds of tasks. Thanks! And yeah, we've brainstormed a lot of potential solutions where being able to build a graph off another graph is useful. > ::: taskcluster/taskgraph/parameters.py > @@ +19,5 @@ > > PARAMETER_NAMES = set([ > > 'base_repository', > > 'build_date', > > + 'do_not_optimize', > > + 'existing_tasks', > > More of an fyi, but when you rebase you'll need to provide default values > for these. The easiest way to test if the default values work is: > ./mach try fuzzy -q foo --no-push Awesome, I'll try a merge / rebase soon, probably once maple is all set up.
Comment on attachment 8914527 [details] [diff] [review] relpro action Review of attachment 8914527 [details] [diff] [review]: ----------------------------------------------------------------- I think this is looking good! ::: taskcluster/docs/parameters.rst @@ +25,5 @@ > ``base_repository`` > The repository from which to do an initial clone, utilizing any available > caching. > > +``do_not_optimize`` Maybe move these both to an "Optimization" section in the doc -- they aren't really about the push :) ::: taskcluster/taskgraph/actions/release_promotion.py @@ +43,5 @@ > + 'do_not_optimize': { > + 'type': 'array', > + 'description': ('An list of labels to avoid optimizing out ' > + 'of the graph (to force a rerun of, say, ' > + 'funsize docker-image tasks).'), It occurs to me, you might later want this to be a set of regular expressions or the like, or kind names, or something like that. That should be pretty easy to implement in the action task, though. @@ +112,5 @@ > + logger.info(pprint.pformat(parameters['existing_tasks'])) > + > + # make parameters read-only > + parameters = Parameters(parameters) > + # hardcode until we have a better way of passing this down. probably this should just default to None in argparse, with the default applied in taskgraph_decision when `not options.get('root')` ::: taskcluster/taskgraph/generator.py @@ +267,2 @@ > if not self.parameters.get('optimize_target_tasks', True): > do_not_optimize = target_task_set.graph.nodes What if both `optimize_target_tasks` and `do_not_optimize` are set? This should probably be a set union.
Attachment #8914527 - Flags: feedback?(dustin) → feedback+
Fixed on maple. I have these represented on my git dev branch: https://github.com/escapewindow/gecko-dev/commit/958a0e0cd6fafef2c000ebfa7ad9cffb7c6bae31 and followup https://github.com/escapewindow/gecko-dev/commit/5b81fcecd6f0d053824e37089a07f3559c84d29f We can either resolve this bug, and open a new one to land on m-c, or keep this open to land on m-c, but represent it as 2 different trello cards. I'm leaning towards the latter but am open.
:bstack - I'm hoping you're ok reviewing this based on the above feedback? Feel free to pull in other reviewers as needed. I'd like to land the initial implementation; we'll be morphing this over time as we migrate more of releasetasks in-tree.
Comment on attachment 8921663 [details] bug 1397552 - add a release promotion action. https://reviewboard.mozilla.org/r/192658/#review197904 I think this looks great! I see this incorporates feedback from previous reviews, so I think we're good. ::: taskcluster/taskgraph/actions/release_promotion.py:82 (Diff revision 1) > + }, > + } > + } > +) > +def release_promotion_action(parameters, input, task_group_id, task_id, task): > + # build_number, previous_graph_kinds, target_tasks_method are required We can make these required properties in the input schema to help guide the UI side of things.
Attachment #8921663 - Flags: review?(bstack) → review+
(In reply to Brian Stack [:bstack] from comment #40) > Comment on attachment 8921663 [details] > bug 1397552 - add a release promotion action. > > https://reviewboard.mozilla.org/r/192658/#review197904 > > I think this looks great! I see this incorporates feedback from previous > reviews, so I think we're good. Thanks! > ::: taskcluster/taskgraph/actions/release_promotion.py:82 > (Diff revision 1) > > + }, > > + } > > + } > > +) > > +def release_promotion_action(parameters, input, task_group_id, task_id, task): > > + # build_number, previous_graph_kinds, target_tasks_method are required > > We can make these required properties in the input schema to help guide the > UI side of things. Will do.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: