Closed Bug 1318253 Opened 8 years ago Closed 7 years ago

Use dummy tasks to handle task.dependencies maxItems limit

Categories

(Release Engineering :: Release Automation: Other, defect, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: mozilla)

References

Details

Attachments

(1 file)

In bug 1316340 I tried to use task.dependencies instead of the graph-level "requires". The maximum number of dependencies we can use is 100 per task, and some of our tasks have more than that. We need to use dummy tasks so we can reduce this number.
Priority: -- → P5
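A minimal sketch of the dummy-task idea, for illustration only (the helper names below are hypothetical, not the eventual in-tree code): split an oversized dependency list into chunks of at most 100, give each chunk to a dummy task, and have the real task depend only on the dummy tasks.

# Hedged sketch only; names are illustrative, not the in-tree transform.
MAX_DEPS = 100  # Taskcluster's limit on task.dependencies entries per task


def chunk_dependencies(dependencies, max_deps=MAX_DEPS):
    """Yield consecutive chunks of at most max_deps dependencies."""
    for i in range(0, len(dependencies), max_deps):
        yield dependencies[i:i + max_deps]


def add_dummy_tasks(label, dependencies, max_deps=MAX_DEPS):
    """Split an oversized dependency list across dummy tasks.

    Returns (dummy_tasks, new_dependencies): each dummy task depends on one
    chunk of the original dependencies, and the real task depends only on
    the dummy tasks.
    """
    if len(dependencies) <= max_deps:
        return {}, list(dependencies)
    dummy_tasks = {}
    for n, chunk in enumerate(chunk_dependencies(dependencies, max_deps), 1):
        dummy_tasks['{}-dummy-{}'.format(label, n)] = chunk
    return dummy_tasks, sorted(dummy_tasks)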
The in-tree part of this would be pretty easy, especially after some stuff I'm doing in bug 1333255. The question is, how do we make a dummy task that resolves immediately once it becomes pending? I'd like some feedback from Jonas here. The obvious option is a worker implementation that just runs claimWork and resolves the tasks immediately. But I think this will be a common requirement, so it would be a worker we'd want to provide to all of our users, and it hardly seems worth running a whole service just for that purpose. Another solution would be to have the queue know of a particular provisionerId/workerType for which it short-circuits pending -> resolved.
Flags: needinfo?(jopsen)
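For context, a rough sketch of the "trivial worker" option described above, assuming the taskcluster Python client (the rootUrl, worker identifiers, and batch size are placeholders, and error handling is omitted; this is a sketch, not a service we run):

# Rough sketch of a "resolve immediately" worker; rootUrl, credentials, and
# worker identifiers are placeholders, and error handling is omitted.
import taskcluster


def run_noop_worker(provisioner_id, worker_type, worker_group, worker_id):
    queue = taskcluster.Queue({'rootUrl': 'https://taskcluster.example.com'})
    while True:
        # claimWork long-polls for pending tasks on this workerType.
        claimed = queue.claimWork(provisioner_id, worker_type, {
            'workerGroup': worker_group,
            'workerId': worker_id,
            'tasks': 32,
        })
        for claim in claimed.get('tasks', []):
            task_id = claim['status']['taskId']
            run_id = claim['runId']
            # Resolve the task without doing any work.
            queue.reportCompleted(task_id, run_id)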
In http://escapewindow.dreamwidth.org/243472.html#dummy_jobs , I brainstormed that we may want to have:

* noop tasks
* notification tasks (irc? slack? email? pulse? other? might be handled by routes already?)
* timer tasks - mark as STATUS at datetime

These are small enough features that they could all fit in one type of task, if we want all of the above.
> The question is, how do we make a dummy task that resolves immediately once it becomes pending?

From what I sent earlier today: running the decision-task image on gecko-decision with command: ['echo', 'dummy-task'] takes 3 seconds, sometimes 5 seconds.

Unless we have on the order of >10k tasks per day, it's not worth it to build a special-cased workerType. Replacing a dummy task template with an empty payload and using a special-cased workerType for it would be easy to do later, so let's reconsider that if/when we have >10k dummy tasks per day.

10k dummy tasks at 5s each would take ~14 compute hours. I think we can find better places to spend our time making things faster/cheaper. We can re-evaluate this solution later, as changing the template for dummy tasks should be easy.
Flags: needinfo?(jopsen)
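For reference, a docker-worker payload along those lines might look roughly like the following (the image tag and maxRunTime are placeholders, not the actual gecko-decision configuration):

# Sketch only: placeholder image and maxRunTime, not the real configuration.
dummy_task_payload = {
    'image': 'example/decision-image:latest',  # placeholder image tag
    'command': ['echo', 'dummy-task'],
    'maxRunTime': 600,  # placeholder timeout in seconds
}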
The "graph morph" support I'm adding in bug 1333255 may help with all of those.
Assignee: nobody → aki
Blocks: 1397773
Attached patch reverse_chunk.patch (deleted) — Splinter Review
f? to land on maple. We'll need to add a post-balrog-dummy kind once we have balrog tasks in publish_firefox, I think.
Attachment #8928737 - Flags: feedback?(rail)
Avoiding a post-target_tasks_method morph, because that may negatively affect CoT verification.
Comment on attachment 8928737 [details] [diff] [review]
reverse_chunk.patch

Review of attachment 8928737 [details] [diff] [review]:
-----------------------------------------------------------------

This should work imo. Another approach would be adding something generic at the end of taskgraph generation (morph?) to scan all tasks, identify the problematic ones, and add dummy deps to all of them, not only to the beetmover task. What do you think?

::: taskcluster/taskgraph/transforms/reverse_chunk_deps.py
@@ +11,5 @@
> +from taskgraph.transforms.base import TransformSequence
> +import taskgraph.transforms.release_deps as release_deps
> +
> +transforms = TransformSequence()
> +MAX_DEPS = 100

Can you add a pointer to the docs https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#createTask or the schema, to make it easier to understand where this limit comes from?

@@ +35,5 @@
> +            deps = {}
> +            count += 1
> +        if deps:
> +            yield yield_job(job, deps, count)
> +            count += 1

Shouldn't you reset the counter here? It looks like you are switching to the next job after this, but keeping `count` global.
Attachment #8928737 - Flags: feedback?(rail) → feedback+
Thanks!

(In reply to Rail Aliiev [:rail] ⌚️ET from comment #7)
> Comment on attachment 8928737 [details] [diff] [review]
> reverse_chunk.patch
> 
> Review of attachment 8928737 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This should work imo. Another approach would be adding something generic at
> the end of taskgraph generation (morph?) to scan all tasks, identify the
> problematic ones and add dummy deps to all of them, not only to the
> beetmover task. What do you think?

I explicitly avoided this solution, to prevent CoT bustage. CoT checks against task definitions at the target-graph level. Morphing happens afterwards, so we may end up with altered task definitions, which CoT would reject.

> ::: taskcluster/taskgraph/transforms/reverse_chunk_deps.py
> @@ +11,5 @@
> > +from taskgraph.transforms.base import TransformSequence
> > +import taskgraph.transforms.release_deps as release_deps
> > +
> > +transforms = TransformSequence()
> > +MAX_DEPS = 100
> 
> Can you add a pointer to the docs
> https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#createTask
> or the schema to make it easier to understand where this limit comes from.

Sure!

> @@ +35,5 @@
> > +            deps = {}
> > +            count += 1
> > +        if deps:
> > +            yield yield_job(job, deps, count)
> > +            count += 1
> 
> Shouldn't you reset the counter here? It looks like you are switching to the
> next job after this, but keeping `count` global.

I kept it since I was hitting dup label issues. I can probably reset it at this point, now that I adjusted my transform workflow... I'll try that.
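For illustration, a simplified sketch of the transform shape under discussion, with the counter reset per job as suggested in comment #7 (the helper names approximate the quoted hunks and are not the actual reverse_chunk_deps.py patch):

# Simplified sketch approximating the quoted hunks; not the actual patch.
MAX_DEPS = 100  # createTask schema limit on task.dependencies


def yield_job(orig_job, deps, count):
    """Build one dummy job carrying a chunk of the original dependencies."""
    job = dict(orig_job)
    job['dependent-tasks'] = deps
    job['name'] = '{}-{}'.format(orig_job['name'], count)
    return job


def add_dummy_dependencies(config, jobs):
    for job in jobs:
        count = 1  # reset per job, rather than keeping a global counter
        deps = {}
        for label, dep in sorted(job.get('dependent-tasks', {}).items()):
            deps[label] = dep
            if len(deps) == MAX_DEPS:
                yield yield_job(job, deps, count)
                deps = {}
                count += 1
        if deps:
            yield yield_job(job, deps, count)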
https://hg.mozilla.org/projects/maple/rev/946a722551ce. We'll probably open a new bug to get everything reviewed+landed on m-c; marking this bug as fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Aki Sasaki [:aki] from comment #8)
> Thanks!
> 
> (In reply to Rail Aliiev [:rail] ⌚️ET from comment #7)
> > Comment on attachment 8928737 [details] [diff] [review]
> > reverse_chunk.patch
> > 
> > Review of attachment 8928737 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > This should work imo. Another approach would be adding something generic at
> > the end of taskgraph generation (morph?) to scan all tasks, identify the
> > problematic ones and add dummy deps to all of them, not only to the
> > beetmover task. What do you think?
> 
> I explicitly avoided this solution, to prevent CoT bustage.
> CoT checks against task definitions at the target-graph level. Morphing
> happens afterwards, so we may end up with altered task definitions, which
> CoT would reject.

I'm wrong here: task-graph.json is written after morphing, so we won't end up with modified task definitions. However, scriptworker tasks with morphed deps may still barf, since we expect upstreamArtifacts to come from taskIds in task.dependencies. This probably isn't a huge concern... we may end up ignoring task.dependencies in jsone-verified-cot, and only ensuring that the task definition hasn't changed.
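As a rough sketch of the scriptworker-side expectation mentioned above (not scriptworker's actual verification code): every taskId referenced in payload.upstreamArtifacts should also appear in task.dependencies.

# Rough sketch of the expectation described above; not scriptworker's
# actual verification code.
def upstream_artifacts_in_dependencies(task_def):
    """True if every upstreamArtifacts taskId is listed in task.dependencies."""
    deps = set(task_def.get('dependencies', []))
    upstream = task_def.get('payload', {}).get('upstreamArtifacts', [])
    return all(entry['taskId'] in deps for entry in upstream)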
Are we going to need a separate dummy kind for each different phase? I see you've set this one to the "ship" phase, but we also have kinds that depend on beetmover-repackage in the promote (update verify) and publish (final verify) phases.
Flags: needinfo?(aki)
Ah, we should probably move these to 'push'.
Flags: needinfo?(aki)
(In reply to Ben Hearsum (:bhearsum) from comment #11)
> Are we going to need a separate dummy kind for each different phase? I see
> you've set this one to the "ship" phase, but we also have kinds that depend
> on beetmover-repackage in the promote (update verify) and publish (final
> verify) phases.

And yeah, if we have multiple phases to support, we probably need some for each phase. I'm not sure it requires a new kind, but a new job type definitely.