Ensure backstop pushes use a time interval from the last backstop (rather than the last push)
Categories
(Firefox Build System :: Task Configuration, task, P1)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 1 open bug)
Details
Attachments
(5 files, 1 obsolete file)
Sheriffs have found that on weekends we can sometimes go 20+ hours without a full backstop. This means there are fewer merge candidates for sheriffs [1].
This is happening because backstops run either every 20th push or if the last push happened longer than 4 hours ago. During weekends, the volume of pushes is very low. This means that if pushes come in, let's say once every 3:59hrs, then we'll never trigger the time requirement even though we could theoretically go just under 80 hours without a backtop.
To fix this, the backstop optimizer should use the "time since the last backstop" rather than the "time since the last push".
[1] Independent of this bug, I wonder if there should be some policy change here. If less than 20 commits are queued up on autoland, should we even bother merging? Do we need to spin up new nightly builds even when practically nothing has changed? Could we simply not merge on weekends, and get rid of the time requirement entirely? I suspect there are valid reasons things are the way they are that I haven't considered, but figured I'd ask the questions anyway.
Assignee | ||
Comment 1•4 years ago
|
||
This patch cleans up some of the backstop strategy names. Specifically:
- Rename 'full-backstop' -> 'backstop'. The old 'backstop' algorithm was
unused anyway, so there is no conflict. It is also just defined directly in
the decorator rather than using 'Alias'.
So now rather than 'full-backstop' and 'optimized-backstop', it's just
'backstop' and 'optimized-backstop'.
- Remove 'backstop-X-hours-Y-minutes' strategies, and replace them with
the corresponding 'push-interval-X' strategy.
This means we lose the time component in the 'optimized-backstop'. But it isn't
a problem, because we shouldn't be using a time component there at all anyway
(we should just use it with the 'backstop').
Assignee | ||
Comment 2•4 years ago
|
||
In the past, the 'backstop' optimization was applied to tasks by default across
all projects, even though it only really made sense on autoland. To choose what
would happen on non-autoland branches, we invented this 'remove_on_projects'
concept.
These days, we only apply the backstop optimization in the first place for
autoland. So 'remove_on_projects' is no longer necessary.
Depends on D88149
Assignee | ||
Comment 3•4 years ago
|
||
We want to be able to retroactively tell whether a push was a backstop or not.
This patch stores whether or not a push was a "backstop" directly in the
parameters. The optimization strategy now simply returns 'not
params["backstop"]'.
For simplicity, I'm not counting the 'optimized-backstop' as a backstop. It's
unclear if we'll want to be able to detect these types of the pushes in the
future or not, but we can cross that bridge when we get there.
Depends on D88150
Assignee | ||
Comment 4•4 years ago
|
||
These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".
There are three approaches I can think of with various pros and cons:
-
Download the
parameters.yml
from previous decision tasks until we find one that hasbackstop=True
.
Pro: Easiest solution to implement
Con: Could make 20 new requests per taskgraph invocation -
Create a new route only pointing to backstop pushes. Since routes are defined in
.taskcluster.yml
, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable inci-admin
. -
Upload / overwrite the latest backstop push to some static location.
Pro: The middle ground between the other two.
Con: I'm not sure where this static place should be.
I think we should avoid 1). Ideally I'd like to implement 2), but I don't know if there are other downsides there I'm not considering. If we went with 3), does anyone have any suggestions around where we can store this info?
What do you all think?
Comment 5•4 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #4)
These patches lay the groundwork, but don't actually fix this bug yet. They move the backstop determination to a parameter, but the next piece of the puzzle is "how do we find the previous backstop?".
There are three approaches I can think of with various pros and cons:
- Download the
parameters.yml
from previous decision tasks until we find one that hasbackstop=True
.
Pro: Easiest solution to implement
Con: Could make 20 new requests per taskgraph invocation
Are there other cons here? It feels like this isn't terribly bad timewise if we parallelize them.
- Create a new route only pointing to backstop pushes. Since routes are defined in
.taskcluster.yml
, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable inci-admin
.
I can't speak to whether it's suitable in ci-admin, but I agree it's a somewhat hidden place for it.
- Upload / overwrite the latest backstop push to some static location.
Pro: The middle ground between the other two.
Con: I'm not sure where this static place should be.
I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.
Comment 6•4 years ago
|
||
we would need a backstop and optimized-backstop index, so 2 indexes.
Assignee | ||
Comment 7•4 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)
we would need a backstop and optimized-backstop index, so 2 indexes.
I've been thinking about this a little, and I'm not sure we need to care about optimized-backstop as this series will remove the time component from optimized-backstops entirely. I think as long as full-backstops run at least once ever N hours (i.e once this bug is fixed), we don't really need to schedule the optimized ones on a time basis anymore. The reasoning here is that if pushes are hitting the time intervals, then it means there are very few pushes coming in and we'd be getting into scenarios where every 2nd or 3rd push is some kind of backstop.
I wonder if we can update an index for backstop pushes only? This would presumably done in a task created by the decision task, so the logic could live in tree, and we wouldn't add a new external dependency.
Yeah, I thought about that too actually, not sure why I dismissed it. Something like decision-task-routes
and then all it does is re-expose some of the artifacts. Maybe I'll poke around at this a little.
Comment 8•4 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #4)
- Create a new route only pointing to backstop pushes. Since routes are defined in
.taskcluster.yml
, we'd have to move the backstop determination here. Unless there's some way to add a route to the decision task after the fact?
Pro: Would be most useful solution for consumers (e.g sheriffs could use this route to find merge candidates)
Con: Backfill determination would live out-of-tree. It might be out of sync with in tree optimizations. Unsure if this logic s suitable inci-admin
.
The morph
step in taskgraph generation creates index tasks; these update indexes with taskIds that are not their own. This is at least partially to get around having a maximum of 10 routes per task. If this is possible, we may be able to add an index task to the graph to index the decision task with the backstop
index, and only run it if the decision task has backstop=True
. If that's possible, then we have solution (2) solved, which is the most useful solution for consumers, without the con of out-of-tree index generation.
Comment 9•4 years ago
|
||
I like Aki's idea, but I wouldn't be against implementing the easier solution (i.e. solution number 1) to fix this bug in the short term and work on a nicer solution in a follow-up.
Assignee | ||
Comment 10•4 years ago
|
||
Yeah, let me poke around for a bit. If it looks like it's not too hard I'd prefer to just do it that way at first. But if it will be a big undertaking I'll just stick to option 1 for now.
Assignee | ||
Updated•4 years ago
|
Comment 11•4 years ago
|
||
Comment 12•4 years ago
|
||
bugherder |
Assignee | ||
Comment 13•4 years ago
|
||
I found myself needing to test a push with 'backstop=True' on try. Though, I
couldn't set it because the call to 'is_backstop' happens after the
'try_task_config' parameters are added in. This adds a simple check to handle
that case.
Assignee | ||
Comment 14•4 years ago
|
||
Sometimes we want to conditionally add index routes to the decision task
based on a parameter. E.g, a route that tracks all the backstop pushes.
This adds a new index task specifically for the decision task, which allows
to defined new index routes within the decision task itself.
Depends on D88910
Assignee | ||
Comment 15•4 years ago
|
||
Depends on D88911
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Comment 17•4 years ago
|
||
bugherder |
Assignee | ||
Comment 18•4 years ago
|
||
Should be fixed now. I'll move the patch that got backed out (wasn't technically needed for this bug anyway), over to bug 1656465.
Comment 19•4 years ago
|
||
Comment on attachment 9171924 [details]
Bug 1660506 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco
Revision D88150 was moved to bug 1656465. Setting attachment 9171924 [details] to obsolete.
Description
•