Closed Bug 1660506 Opened 4 years ago Closed 4 years ago

Ensure backstop pushes use a time interval from the last backstop (rather than the last push)

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

Bug 1660506 - Simplify the backstop optimizations, r?marco 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details
Bug 1660506 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details
Bug 1660506 - Move backstop determination to the decision task and store it in a parameter, r?#taskgraph-reviewers!,marco! 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details
Bug 1660506 - [taskgraph] Don't overwrite 'backstop' parameter if set from try_task_config.json, r?#taskgraph-reviewers 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details
Bug 1660506 - [taskgraph] Add an index task for the decision task, r?aki 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details
Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop, r?marco 4 years ago Andrew Halberstadt [:ahal] (deleted), text/x-phabricator-request		Details

Andrew Halberstadt [:ahal]

Assignee

Description

•

4 years ago

Sheriffs have found that on weekends we can sometimes go 20+ hours without a full backstop. This means there are fewer merge candidates for sheriffs [1].

This is happening because backstops run either every 20th push or if the last push happened longer than 4 hours ago. During weekends, the volume of pushes is very low. This means that if pushes come in, let's say once every 3:59hrs, then we'll never trigger the time requirement even though we could theoretically go just under 80 hours without a backtop.

To fix this, the backstop optimizer should use the "time since the last backstop" rather than the "time since the last push".

[1] Independent of this bug, I wonder if there should be some policy change here. If less than 20 commits are queued up on autoland, should we even bother merging? Do we need to spin up new nightly builds even when practically nothing has changed? Could we simply not merge on weekends, and get rid of the time requirement entirely? I suspect there are valid reasons things are the way they are that I haven't considered, but figured I'd ask the questions anyway.

Andrew Halberstadt [:ahal]

Assignee

Comment 1

•

4 years ago

Attached file Bug 1660506 - Simplify the backstop optimizations, r?marco (deleted) — Details

This patch cleans up some of the backstop strategy names. Specifically:

Rename 'full-backstop' -> 'backstop'. The old 'backstop' algorithm was
unused anyway, so there is no conflict. It is also just defined directly in
the decorator rather than using 'Alias'.

So now rather than 'full-backstop' and 'optimized-backstop', it's just
'backstop' and 'optimized-backstop'.

Remove 'backstop-X-hours-Y-minutes' strategies, and replace them with
the corresponding 'push-interval-X' strategy.

This means we lose the time component in the 'optimized-backstop'. But it isn't
a problem, because we shouldn't be using a time component there at all anyway
(we should just use it with the 'backstop').

Andrew Halberstadt [:ahal]

Assignee

Comment 2

•

4 years ago

Attached file Bug 1660506 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco (obsolete) (deleted) — Details

In the past, the 'backstop' optimization was applied to tasks by default across
all projects, even though it only really made sense on autoland. To choose what
would happen on non-autoland branches, we invented this 'remove_on_projects'
concept.

These days, we only apply the backstop optimization in the first place for
autoland. So 'remove_on_projects' is no longer necessary.

Depends on D88149

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

4 years ago

Attached file Bug 1660506 - Move backstop determination to the decision task and store it in a parameter, r?#taskgraph-reviewers!,marco! (deleted) — Details

We want to be able to retroactively tell whether a push was a backstop or not.
This patch stores whether or not a push was a "backstop" directly in the
parameters. The optimization strategy now simply returns 'not
params["backstop"]'.

For simplicity, I'm not counting the 'optimized-backstop' as a backstop. It's
unclear if we'll want to be able to detect these types of the pushes in the
future or not, but we can cross that bridge when we get there.

Depends on D88150

Andrew Halberstadt [:ahal]

Assignee

Comment 4

•

4 years ago

These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".

There are three approaches I can think of with various pros and cons:

Download the parameters.yml from previous decision tasks until we find one that has backstop=True.
Pro: Easiest solution to implement
Con: Could make 20 new requests per taskgraph invocation
Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.
Upload / overwrite the latest backstop push to some static location.
Pro: The middle ground between the other two.
Con: I'm not sure where this static place should be.

I think we should avoid 1). Ideally I'd like to implement 2), but I don't know if there are other downsides there I'm not considering. If we went with 3), does anyone have any suggestions around where we can store this info?

What do you all think?

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(bhearsum)

Flags: needinfo?(aki)

bhearsum@mozilla.com (:bhearsum)

Comment 5

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #4)

These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".

There are three approaches I can think of with various pros and cons:

Download the parameters.yml from previous decision tasks until we find one that has backstop=True.
Pro: Easiest solution to implement
Con: Could make 20 new requests per taskgraph invocation

Are there other cons here? It feels like this isn't terribly bad timewise if we parallelize them.

Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.

I can't speak to whether it's suitable in ci-admin, but I agree it's a somewhat hidden place for it.

Upload / overwrite the latest backstop push to some static location.
Pro: The middle ground between the other two.
Con: I'm not sure where this static place should be.

I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.

Flags: needinfo?(bhearsum)

Joel Maher ( :jmaher ) (UTC -8)

Comment 6

•

4 years ago

we would need a backstop and optimized-backstop index, so 2 indexes.

Andrew Halberstadt [:ahal]

Assignee

Comment 7

•

4 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)

we would need a backstop and optimized-backstop index, so 2 indexes.

I've been thinking about this a little, and I'm not sure we need to care about optimized-backstop as this series will remove the time component from optimized-backstops entirely. I think as long as full-backstops run at least once ever N hours (i.e once this bug is fixed), we don't really need to schedule the optimized ones on a time basis anymore. The reasoning here is that if pushes are hitting the time intervals, then it means there are very few pushes coming in and we'd be getting into scenarios where every 2nd or 3rd push is some kind of backstop.

I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.

Yeah, I thought about that too actually, not sure why I dismissed it. Something like decision-task-routes and then all it does is re-expose some of the artifacts. Maybe I'll poke around at this a little.

Aki Sasaki (not active)

Comment 8

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #4)

Create a new route only pointing to backstop pushes. Since routes are defined in .taskcluster.yml, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable in ci-admin.

The morph step in taskgraph generation creates index tasks; these update indexes with taskIds that are not their own. This is at least partially to get around having a maximum of 10 routes per task. If this is possible, we may be able to add an index task to the graph to index the decision task with the backstop index, and only run it if the decision task has backstop=True. If that's possible, then we have solution (2) solved, which is the most useful solution for consumers, without the con of out-of-tree index generation.

Flags: needinfo?(aki)

Marco Castelluccio [:marco]

Comment 9

•

4 years ago

I like Aki's idea, but I wouldn't be against implementing the easier solution (i.e. solution number 1) to fix this bug in the short term and work on a nicer solution in a follow-up.

Flags: needinfo?(mcastelluccio)

Andrew Halberstadt [:ahal]

Assignee

Comment 10

•

4 years ago

Yeah, let me poke around for a bit. If it looks like it's not too hard I'd prefer to just do it that way at first. But if it will be a big undertaking I'll just stick to option 1 for now.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Keywords: leave-open

Pulsebot

Comment 11

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7883543ded4d Simplify the backstop optimizations, r=marco https://hg.mozilla.org/integration/autoland/rev/7c48fc6f04e3 Drop the 'remove_on_projects' feature from the Backstop optimization, r=marco https://hg.mozilla.org/integration/autoland/rev/3ab251f67b71 Move backstop determination to the decision task and store it in a parameter, r=marco,taskgraph-reviewers,jmaher

Cosmin Sabou [:CosminS]

Comment 12

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7883543ded4d
https://hg.mozilla.org/mozilla-central/rev/7c48fc6f04e3
https://hg.mozilla.org/mozilla-central/rev/3ab251f67b71

Andrew Halberstadt [:ahal]

Assignee

Comment 13

•

4 years ago

Attached file Bug 1660506 - [taskgraph] Don't overwrite 'backstop' parameter if set from try_task_config.json, r?#taskgraph-reviewers (deleted) — Details

I found myself needing to test a push with 'backstop=True' on try. Though, I
couldn't set it because the call to 'is_backstop' happens after the
'try_task_config' parameters are added in. This adds a simple check to handle
that case.

Andrew Halberstadt [:ahal]

Assignee

Comment 14

•

4 years ago

Attached file Bug 1660506 - [taskgraph] Add an index task for the decision task, r?aki (deleted) — Details

Sometimes we want to conditionally add index routes to the decision task
based on a parameter. E.g, a route that tracks all the backstop pushes.

This adds a new index task specifically for the decision task, which allows
to defined new index routes within the decision task itself.

Depends on D88910

Andrew Halberstadt [:ahal]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop, r?marco (deleted) — Details

Depends on D88911

Phabricator Automation

Updated

•

4 years ago

Attachment #9173223 - Attachment description: Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop → Bug 1660506 - [taskgraph] Ensure backstop determination uses time since last backstop, r?marco

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Regressions: 1662427

Pulsebot

Comment 16

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b62215be1a5a [taskgraph] Don't overwrite 'backstop' parameter if set from try_task_config.json, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/c72da776e55e [taskgraph] Add an index task for the decision task, r=aki https://hg.mozilla.org/integration/autoland/rev/6706e78ac203 [taskgraph] Ensure backstop determination uses time since last backstop, r=marco

Atila Butkovits

Comment 17

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b62215be1a5a
https://hg.mozilla.org/mozilla-central/rev/c72da776e55e
https://hg.mozilla.org/mozilla-central/rev/6706e78ac203

Andrew Halberstadt [:ahal]

Assignee

Comment 18

•

4 years ago

Should be fixed now. I'll move the patch that got backed out (wasn't technically needed for this bug anyway), over to bug 1656465.

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Keywords: leave-open

Resolution: --- → FIXED

Phabricator Automation

Comment 19

•

4 years ago

Comment on attachment 9171924 [details]
Bug 1660506 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco

Revision D88150 was moved to bug 1656465. Setting attachment 9171924 [details] to obsolete.

Attachment #9171924 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.