Closed Bug 1660506 Opened 4 years ago Closed 4 years ago

Ensure backstop pushes use a time interval from the last backstop (rather than the last push)

Categories

(Firefox Build System :: Task Configuration, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

Sheriffs have found that on weekends we can sometimes go 20+ hours without a full backstop. This means there are fewer merge candidates for sheriffs [1].

This is happening because backstops run either every 20th push or if the last push happened longer than 4 hours ago. During weekends, the volume of pushes is very low. This means that if pushes come in, let's say once every 3:59hrs, then we'll never trigger the time requirement even though we could theoretically go just under 80 hours without a backtop.

To fix this, the backstop optimizer should use the "time since the last backstop" rather than the "time since the last push".

[1] Independent of this bug, I wonder if there should be some policy change here. If less than 20 commits are queued up on autoland, should we even bother merging? Do we need to spin up new nightly builds even when practically nothing has changed? Could we simply not merge on weekends, and get rid of the time requirement entirely? I suspect there are valid reasons things are the way they are that I haven't considered, but figured I'd ask the questions anyway.

This patch cleans up some of the backstop strategy names. Specifically:

  1. Rename 'full-backstop' -> 'backstop'. The old 'backstop' algorithm was
    unused anyway, so there is no conflict. It is also just defined directly in
    the decorator rather than using 'Alias'.

So now rather than 'full-backstop' and 'optimized-backstop', it's just
'backstop' and 'optimized-backstop'.

  1. Remove 'backstop-X-hours-Y-minutes' strategies, and replace them with
    the corresponding 'push-interval-X' strategy.

This means we lose the time component in the 'optimized-backstop'. But it isn't
a problem, because we shouldn't be using a time component there at all anyway
(we should just use it with the 'backstop').

In the past, the 'backstop' optimization was applied to tasks by default across
all projects, even though it only really made sense on autoland. To choose what
would happen on non-autoland branches, we invented this 'remove_on_projects'
concept.

These days, we only apply the backstop optimization in the first place for
autoland. So 'remove_on_projects' is no longer necessary.

Depends on D88149

We want to be able to retroactively tell whether a push was a backstop or not.
This patch stores whether or not a push was a "backstop" directly in the
parameters. The optimization strategy now simply returns 'not
params["backstop"]'.

For simplicity, I'm not counting the 'optimized-backstop' as a backstop. It's
unclear if we'll want to be able to detect these types of the pushes in the
future or not, but we can cross that bridge when we get there.

Depends on D88150

These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".

There are three approaches I can think of with various pros and cons:

  1. Download the parameters.yml from previous decision tasks until we find one that has backstop=True.
    Pro: Easiest solution to implement
    Con: Could make 20 new requests per taskgraph invocation

  2. Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
    Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
    Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.

  3. Upload / overwrite the latest backstop push to some static location.
    Pro: The middle ground between the other two.
    Con: I'm not sure where this static place should be.

I think we should avoid 1). Ideally I'd like to implement 2), but I don't know if there are other downsides there I'm not considering. If we went with 3), does anyone have any suggestions around where we can store this info?

What do you all think?

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(bhearsum)
Flags: needinfo?(aki)

(In reply to Andrew Halberstadt [:ahal] from comment #4)

These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".

There are three approaches I can think of with various pros and cons:

  1. Download the parameters.yml from previous decision tasks until we find one that has backstop=True.
    Pro: Easiest solution to implement
    Con: Could make 20 new requests per taskgraph invocation

Are there other cons here? It feels like this isn't terribly bad timewise if we parallelize them.

  1. Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
    Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
    Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.

I can't speak to whether it's suitable in ci-admin, but I agree it's a somewhat hidden place for it.

  1. Upload / overwrite the latest backstop push to some static location.
    Pro: The middle ground between the other two.
    Con: I'm not sure where this static place should be.

I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.

Flags: needinfo?(bhearsum)

we would need a backstop and optimized-backstop index, so 2 indexes.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)

we would need a backstop and optimized-backstop index, so 2 indexes.

I've been thinking about this a little, and I'm not sure we need to care about optimized-backstop as this series will remove the time component from optimized-backstops entirely. I think as long as full-backstops run at least once ever N hours (i.e once this bug is fixed), we don't really need to schedule the optimized ones on a time basis anymore. The reasoning here is that if pushes are hitting the time intervals, then it means there are very few pushes coming in and we'd be getting into scenarios where every 2nd or 3rd push is some kind of backstop.

I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.

Yeah, I thought about that too actually, not sure why I dismissed it. Something like decision-task-routes and then all it does is re-expose some of the artifacts. Maybe I'll poke around at this a little.

(In reply to Andrew Halberstadt [:ahal] from comment #4)

  1. Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
    Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
    Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.

The morph step in taskgraph generation creates index tasks; these update indexes with taskIds that are not their own. This is at least partially to get around having a maximum of 10 routes per task. If this is possible, we may be able to add an index task to the graph to index the decision task with the backstop index, and only run it if the decision task has backstop=True. If that's possible, then we have solution (2) solved, which is the most useful solution for consumers, without the con of out-of-tree index generation.

Flags: needinfo?(aki)

I like Aki's idea, but I wouldn't be against implementing the easier solution (i.e. solution number 1) to fix this bug in the short term and work on a nicer solution in a follow-up.

Flags: needinfo?(mcastelluccio)

Yeah, let me poke around for a bit. If it looks like it's not too hard I'd prefer to just do it that way at first. But if it will be a big undertaking I'll just stick to option 1 for now.

Keywords: leave-open
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7883543ded4d Simplify the backstop optimizations, r=marco https://hg.mozilla.org/integration/autoland/rev/7c48fc6f04e3 Drop the 'remove_on_projects' feature from the Backstop optimization, r=marco https://hg.mozilla.org/integration/autoland/rev/3ab251f67b71 Move backstop determination to the decision task and store it in a parameter, r=marco,taskgraph-reviewers,jmaher

I found myself needing to test a push with 'backstop=True' on try. Though, I
couldn't set it because the call to 'is_backstop' happens after the
'try_task_config' parameters are added in. This adds a simple check to handle
that case.

Sometimes we want to conditionally add index routes to the decision task
based on a parameter. E.g, a route that tracks all the backstop pushes.

This adds a new index task specifically for the decision task, which allows
to defined new index routes within the decision task itself.

Depends on D88910

Attachment #9173223 - Attachment description: Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop → Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop, r?marco
Regressions: 1662427
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b62215be1a5a [taskgraph] Don't overwrite 'backstop' parameter if set from try_task_config.json, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/c72da776e55e [taskgraph] Add an index task for the decision task, r=aki https://hg.mozilla.org/integration/autoland/rev/6706e78ac203 [taskgraph] Ensure backstop determination uses time since last backstop, r=marco

Should be fixed now. I'll move the patch that got backed out (wasn't technically needed for this bug anyway), over to bug 1656465.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Keywords: leave-open
Resolution: --- → FIXED

Comment on attachment 9171924 [details]
Bug 1660506 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco

Revision D88150 was moved to bug 1656465. Setting attachment 9171924 [details] to obsolete.

Attachment #9171924 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: