Bug 1318253 - Use dummy tasks to handle task.dependencies maxItems limit
Opened 8 years ago
Closed 7 years ago
Categories: Release Engineering :: Release Automation: Other, defect, P5
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: rail, Assigned: mozilla)
Attachments (1 file): patch (deleted), rail: feedback+
In bug 1316340 I tried to use task.dependencies instead of the graph-level "requires". The maximum number of dependencies per task is 100, and some of our tasks have more than that. We need to use dummy tasks to reduce this amount.
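For illustration, a minimal sketch (Python, the language of the taskgraph transforms) of the fan-in idea: dependency lists over the limit are split across dummy tasks, and the real task depends only on the dummies. The MAX_DEPS constant matches the queue limit; the label scheme, function name, and example labels are assumptions for the sketch, not the actual transform.

# Minimal sketch: split an oversized dependency list across dummy tasks so no
# task declares more than MAX_DEPS dependencies. Labels are illustrative.
MAX_DEPS = 100  # task.dependencies maxItems limit in the queue's createTask schema


def chunk_dependencies(label, dep_labels, max_deps=MAX_DEPS):
    """Yield (task_label, dependencies) pairs.

    If dep_labels fits under the limit, the original task keeps them all;
    otherwise each chunk of max_deps labels goes onto a dummy task, and the
    original task depends only on the dummies.
    """
    if len(dep_labels) <= max_deps:
        yield label, list(dep_labels)
        return
    dummy_labels = []
    for i in range(0, len(dep_labels), max_deps):
        dummy_label = "{}-dummy-{}".format(label, i // max_deps + 1)
        dummy_labels.append(dummy_label)
        yield dummy_label, dep_labels[i:i + max_deps]
    # The real task now only depends on the (few) dummy tasks.
    yield label, dummy_labels


# Example: 250 upstream tasks collapse into 3 dummy tasks plus the real one.
upstreams = ["beetmover-{}".format(n) for n in range(250)]
for task_label, task_deps in chunk_dependencies("checksums", upstreams):
    print(task_label, len(task_deps))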
Reporter
Updated•8 years ago
Priority: -- → P5
Comment 1•8 years ago
The in-tree part of this would be pretty easy, especially after some stuff I'm doing in bug 1333255.
The question is, how do we make a dummy task that resolves immediately once it becomes pending? I'd like some feedback from Jonas here.
The obvious option is a worker implementation that just runs claimWork and resolves the tasks immediately.
But I think this will be a common requirement, so it would be a worker we would want to provide to all of our users, and it hardly seems worth running a whole service just for that purpose.
Another solution would be to have the queue know of a particular provisionerId/workerType for which it short-circuits pending -> resolved.
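For reference, a rough sketch of what that first option could look like with the taskcluster Python client. The provisionerId, workerType, worker IDs, and rootUrl are placeholders, and a real worker would also need credentials, reclaiming, and error handling.

# Rough sketch of a claim-and-resolve-immediately worker. All identifiers
# (provisionerId, workerType, workerGroup, workerId, rootUrl) are placeholders;
# reclaiming, error handling, and per-claim temporary credentials are omitted.
import taskcluster


def run_dummy_worker():
    queue = taskcluster.Queue({"rootUrl": "https://tc.example.com"})
    while True:
        claimed = queue.claimWork("dummy-provisioner", "dummy-worker", {
            "tasks": 32,
            "workerGroup": "dummy",
            "workerId": "dummy-1",
        })
        for claim in claimed.get("tasks", []):
            task_id = claim["status"]["taskId"]
            run_id = claim["runId"]
            # Nothing to execute: report the task completed right away.
            queue.reportCompleted(task_id, run_id)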
Flags: needinfo?(jopsen)
Assignee
Comment 2•8 years ago
In http://escapewindow.dreamwidth.org/243472.html#dummy_jobs, I brainstormed that we may want to have:
* noop tasks
* notification tasks (irc? slack? email? pulse? other? might be handled by routes already?)
* timer tasks - mark as STATUS at datetime
These are small enough features that they could all fit in one type of task, if we want all of the above.
Comment 3•8 years ago
> The question is, how do we make a dummy task that resolves immediately once it becomes pending?
From what I sent earlier today:
Running the decision-task image on gecko-decision with command: ['echo', 'dummy-task'] takes 3 seconds, sometimes 5 seconds.
Unless we have on the order of >10k tasks per day, it's not worth it to build a special-cased workerType.
Replacing a dummy task template with an empty payload and using a special-cased workerType for it would be easy to do later. So let's reconsider that if/when we have >10k dummy tasks per day.
10k dummy tasks at 5s each would take ~14 compute hours.
I think we can find better places to spend our time making things faster/cheaper. We can re-evaluate this solution later, as changing the template for dummy tasks should be easy.
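For context, the task being timed above is roughly this shape (a docker-worker payload with command ['echo', 'dummy-task']); the provisionerId, image tag, and maxRunTime are placeholders rather than the exact values used.

# Approximate shape of the dummy task timed above: a docker-worker task on the
# decision workerType that just echoes and exits. provisionerId, image tag, and
# maxRunTime are placeholders.
dummy_task = {
    "provisionerId": "aws-provisioner-v1",  # placeholder
    "workerType": "gecko-decision",
    "payload": {
        "image": "taskcluster/decision:latest",  # placeholder tag
        "command": ["echo", "dummy-task"],
        "maxRunTime": 600,
    },
}

# Back-of-the-envelope cost check: 10,000 dummy tasks/day at ~5 s each is
# 10000 * 5 / 3600, or roughly 14 compute hours, matching the figure above.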
Flags: needinfo?(jopsen)
Comment 4•8 years ago
The "graph morph" support I'm adding in bug 1333255 may help with all of those.
Assignee
Comment 5•7 years ago
f? to land on maple.
We'll need to add a post-balrog-dummy kind once we have balrog tasks in publish_firefox, I think.
Attachment #8928737 - Flags: feedback?(rail)
Assignee
Comment 6•7 years ago
Avoiding a post-target_tasks_method morph, because that may negatively affect CoT verification.
Reporter
Comment 7•7 years ago
Comment on attachment 8928737 [details] [diff] [review]
reverse_chunk.patch
Review of attachment 8928737 [details] [diff] [review]:
-----------------------------------------------------------------
This should work imo. Another approach would be adding something generic at the end of taskgraph generation (morph?) to scan all tasks, identify the problematic ones and add dummy deps to all of them, not only to the beetmover task. What do you think?
::: taskcluster/taskgraph/transforms/reverse_chunk_deps.py
@@ +11,5 @@
> +from taskgraph.transforms.base import TransformSequence
> +import taskgraph.transforms.release_deps as release_deps
> +
> +transforms = TransformSequence()
> +MAX_DEPS = 100
Can you add a pointer to the docs https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#createTask or the schema, to make it easier to understand where this limit comes from?
@@ +35,5 @@
> + deps = {}
> + count += 1
> + if deps:
> + yield yield_job(job, deps, count)
> + count += 1
Shouldn't you reset the counter here? It looks like you are switching to the next job after this, but keeping `count` global.
Attachment #8928737 - Flags: feedback?(rail) → feedback+
Assignee
Comment 8•7 years ago
Thanks!
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #7)
> Comment on attachment 8928737 [details] [diff] [review]
> reverse_chunk.patch
>
> Review of attachment 8928737 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This should work imo. Another approach would be adding something generic at
> the end of taskgraph generation (morph?) to scan all tasks, identify the
> problematic ones and add dummy deps to all of them, not only to the
> beetmover task. What do you think?
I explicitly avoided this solution, to prevent CoT bustage.
CoT checks against task definitions at the target-graph level. Morphing happens afterwards, so we may end up with altered task definitions, which CoT would reject.
> ::: taskcluster/taskgraph/transforms/reverse_chunk_deps.py
> @@ +11,5 @@
> > +from taskgraph.transforms.base import TransformSequence
> > +import taskgraph.transforms.release_deps as release_deps
> > +
> > +transforms = TransformSequence()
> > +MAX_DEPS = 100
>
> Can you add a pointer to the the docs
> https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/
> api#createTask or the schema to make it easier to understand where this
> limit comes from.
Sure!
> @@ +35,5 @@
> > + deps = {}
> > + count += 1
> > + if deps:
> > + yield yield_job(job, deps, count)
> > + count += 1
>
> Shouldn't you reset the counter here? It looks like you are switching to the
> next job after this, but keeping `count` global.
I kept it since I was hitting dup label issues. I can probably reset it at this point, now that I adjusted my transform workflow... I'll try that.
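For illustration, a simplified stand-in for the chunking loop under discussion, with the counter reset per job as suggested in comment 7. yield_job and the job/dependency structures here are assumptions, not the actual reverse_chunk_deps.py code.

# Simplified stand-in for the chunking discussed above, with `count` reset for
# every job so dummy labels restart at 1 per job.
MAX_DEPS = 100  # task.dependencies maxItems limit (see the createTask schema)


def yield_job(orig_job, deps, count):
    job = dict(orig_job)
    job["dependent-tasks"] = deps
    job["label"] = "{}-{}".format(orig_job["label"], count)
    return job


def add_deps(jobs):
    for job in jobs:
        count = 1  # reset per job, per the review comment
        deps = {}
        for label, dep in job.get("dependent-tasks", {}).items():
            deps[label] = dep
            if len(deps) == MAX_DEPS:
                yield yield_job(job, deps, count)
                deps = {}
                count += 1
        if deps:
            yield yield_job(job, deps, count)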
Assignee
Comment 9•7 years ago
https://hg.mozilla.org/projects/maple/rev/946a722551ce
We'll probably open a new bug to get everything reviewed+landed on m-c; marking this bug as fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee
Comment 10•7 years ago
(In reply to Aki Sasaki [:aki] from comment #8)
> Thanks!
>
> (In reply to Rail Aliiev [:rail] ⌚️ET from comment #7)
> > Comment on attachment 8928737 [details] [diff] [review]
> > reverse_chunk.patch
> >
> > Review of attachment 8928737 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > This should work imo. Another approach would be adding something generic at
> > the end of taskgraph generation (morph?) to scan all tasks, identify the
> > problematic ones and add dummy deps to all of them, not only to the
> > beetmover task. What do you think?
>
> I explicitly avoided this solution, to prevent CoT bustage.
> CoT checks against task definitions at the target-graph level. Morphing
> happens afterwards, so we may end up with altered task definitions, which
> CoT would reject.
I'm wrong here. task-graph.json is written after morphing, so we won't end up with modified task definitions. However, scriptworker tasks with morphed deps may still barf, since we expect upstreamArtifacts to come from taskIds in task.dependencies. This probably isn't a huge concern... we may end up ignoring task.dependencies in jsone-verified-cot, and only ensure that the task definition hasn't changed.
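A rough sketch of the constraint being described (every upstreamArtifacts taskId must appear in task.dependencies); scriptworker's real chain-of-trust verification is considerably more involved, so this only shows the shape of the check.

# Rough sketch of the constraint above: every taskId referenced in
# payload.upstreamArtifacts should appear in task.dependencies.
def upstream_artifacts_in_dependencies(task_def):
    deps = set(task_def.get("dependencies", []))
    upstream_ids = {
        artifact["taskId"]
        for artifact in task_def.get("payload", {}).get("upstreamArtifacts", [])
    }
    missing = upstream_ids - deps
    if missing:
        raise ValueError(
            "upstreamArtifacts reference taskIds not in task.dependencies: "
            "{}".format(sorted(missing)))
    return True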
Comment 11•7 years ago
Are we going to need a separate dummy kind for each different phase? I see you've set this one to the "ship" phase, but we also have kinds that depend on beetmover-repackage in the promote (update verify) and publish (final verify) phases.
Flags: needinfo?(aki)
Assignee
Comment 13•7 years ago
(In reply to Ben Hearsum (:bhearsum) from comment #11)
> Are we going to need a separate dummy kind for each different phase? I see
> you've set this one to the "ship" phase, but we also have kinds that depend
> on beetmover-repackage in the promote (update verify) and publish (final
> verify) phases.
And yeah, if we have multiple phases to support, we probably need dummy tasks for each phase. I'm not sure it requires a new kind, but it definitely requires a new job type.