Closed Bug 1286022 Opened 8 years ago Closed 8 years ago

Allow replication to pull obsolete changesets

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(1 file, 1 obsolete file)

As a first step towards rebasing autoland to mozilla-central, I want to implement glandium's suggestion in bug 1283669 to change how backouts are handled. Currently, a backout is performed by creating a local changeset that reverts another changeset and that changeset is pushed. The backout/revert changeset forever lives in history. The new proposal is to "prune" the bad changeset from history and rebase everything that followed on top of it. For reasons I'll explain in code comments and commit messages, we'll perform these backouts remotely on the hg.mozilla.org server rather than local (hint: it involves obsolescence and pushlog preservation). Sheriffs will invoke said command via a SSH connection, either a `ssh hg.mozilla.org autolandbackout <rev>` command or via `hg autolandbackout <rev>`. I'm tentatively going with the former because it is easier to implement this as a pash command than a Mercurial wire protocol command. Hence why the bug component is hg.mozilla.org. My plan is to obsolete the backed out changeset and write obsolescence markers mapping old changesets to new changesets. Old pushlog entries (referencing obsolete changesets) will be retained in the database. However, they may render as empty on hg.mozilla.org because the changesets are hidden (this may require pushlog extension changes). For rewritten changesets, we will copy pushlog entries to preserve the original pushlog metadata. Automation will pick these up as new pushes and build them (again). This will give us explicit automation coverage of every push. This is better than what we have today because bustage from a previous push may prevent your automation from running. It will generate a little extra automation load, however. Open question: how should we notify people that a backout has occurred? Currently, the backout commit message references a bug and pulsebot notifies the bug. In addition, a sheriff may set a needinfo in the bug. glandium: I'm interested in your general thoughts, especially around pulsebot behavior.
Flags: needinfo?(mh+mozilla)
This is far from complete. Needs lots of tests and some polish. Review commit: https://reviewboard.mozilla.org/r/63532/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/63532/
Comment on attachment 8769831 [details] hgserver: extension to perform backouts via purging (bug 1286022) Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63532/diff/1-2/
I'm not sure about the pushlog. On the pulsebot side of things, there should probably be some indication in the pushlog json for the new pushes that they are obsoleting previous pushes, so that pulsebot can add a note in the bugzilla comments and/or irc on the repeated landing.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8769831 [details] hgserver: extension to perform backouts via purging (bug 1286022) Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63532/diff/2-3/
Attachment #8769831 - Attachment description: hgmoremoteadmin: command to perform remote backout (bug 1286022) → hgserver: extension to perform backouts via purging (bug 1286022)
I am curious how we would see this in treeherder? My concerns are with performance numbers and being able to associate a revision with a backout instead of something else. If we did a remote backout, would we get a new revision in the tree? I don't think we would, so any changes from the backout would look like the next changeset to run the tests is the root cause.
The history before a backout would be: A - B - C - D The history after a backout of B would be: A - C' - D' with test results for any of A, C', D' (assuming they are all push heads). Treeherder/perfherder might need to be more aware of the history DAG, though, because they might be keeping state for B, C and D.
so if we retriggered tests on B, C, D- I assume we would use the original builds, or when we rebase, C' will get new builds and tests? If we land E after the backout: A - C' - D' - E and E causes test or performance regressions we will need to be aware of this and probably have to rebuild/test everything for C' and D' to determine where the root cause is? That seems sort of overkill, especially if there are tree closures and we need to do a series of pushes to try to determine the root cause. Possibly I am overlooking something.
> probably have to rebuild/test everything for C' and D' to determine where the root cause is? as I said, C' and D' would have already been built and tested. Not because they were when they were C and D, but because they were when they landed rebased. The only thing you need to care about is that test results for B, C and D don't appear after the rebase happens.
At the moment Treeherder fetches the json pushlog using `&startID=NNN` style requests. If the `lastpushid` returned as part of that response is lower than the last seen `lastpushid`, Treeherder presumes the repo was reset, and makes a bare request (ie with no `startID` param) the next time (which defaults to the last 10 pushes). In this case, previous revisions are kept in Treeherder, and any new are just added on. So long as any solution here played nicely with that, we should be all good. (Though Treeherder would still display the old push and build/tests results too; whether that's useful or not for sheriffs/perferder/... is TBD). Though we're also hopefully switching to Pulse Hg ingestion soon too (bug 1065567). (CCing Cameron for Hg pulse part and Will for any potential perferder impact - comment 7 / comment 8).
It would be good to find a way to request all running jobs to be canceled for C and D to save resources. Buildapi has an API and TC listens to an exchange. I can teach pulse_actions to listen to the same exchange and make the request on your behalf (since it has LDAP credentials to make the request). Please file a bug if this works for you.
(In reply to Ed Morley [:emorley] from comment #9) > At the moment Treeherder fetches the json pushlog using `&startID=NNN` style > requests. If the `lastpushid` returned as part of that response is lower > than the last seen `lastpushid`, Treeherder presumes the repo was reset, and > makes a bare request (ie with no `startID` param) the next time (which > defaults to the last 10 pushes). In this case, previous revisions are kept > in Treeherder, and any new are just added on. We won't be deleting push IDs and therefore push IDs won't be decreasing. However, pushlog may advertise changesets that are hidden. This would cause `hg pull -r <hidden rev>` to fail. Or pushlog may start advertising empty pushes (because the changesets in them are now hidden). I should be writing tests today to flush out the exact behavior. Regarding Treeherder/Perfherder, we probably want a mechanism to hide results from now-hidden changesets. This is most important for Perfherder: if we have results from obsoleted changesets intermingled with non-obsoleted changesets, we could see some jagged data and it could throw off change detection. We'll probably want to cancel jobs for changesets that are obsoleted. I view this as a follow-up item because I see it as a performance/load optimization. If we put metadata in Pulse that says changesets were rewritten, we could set up a consumer that listens for these events and cancels things. Anyone could do that. It could even live under the Treeherder umbrella.
(In reply to Gregory Szorc [:gps] from comment #11) > (In reply to Ed Morley [:emorley] from comment #9) > > At the moment Treeherder fetches the json pushlog using `&startID=NNN` style > > requests. If the `lastpushid` returned as part of that response is lower > > than the last seen `lastpushid`, Treeherder presumes the repo was reset, and > > makes a bare request (ie with no `startID` param) the next time (which > > defaults to the last 10 pushes). In this case, previous revisions are kept > > in Treeherder, and any new are just added on. > > We won't be deleting push IDs and therefore push IDs won't be decreasing. > > However, pushlog may advertise changesets that are hidden. This would cause > `hg pull -r <hidden rev>` to fail. Or pushlog may start advertising empty > pushes (because the changesets in them are now hidden). I should be writing > tests today to flush out the exact behavior. > > Regarding Treeherder/Perfherder, we probably want a mechanism to hide > results from now-hidden changesets. This is most important for Perfherder: > if we have results from obsoleted changesets intermingled with non-obsoleted > changesets, we could see some jagged data and it could throw off change > detection. What is the timeline for this work? I can certainly see how we could make this use-case work for Perfherder, but the changes will be non-trivial. It would be a lot easier to handle these sorts of cases once bug 1178641 is fixed.
I'd like to deploy commit rewriting this week. Of course, deploying it doesn't mean we use it: we can try it once or twice and if there are problems it can be disabled until the blockers are worked out.
Depends on: 1286426
For the autoland repo, we want obsolescence markers to be produced on the server and replicated but we don't want random users to be able to push markers. This extension provides a mechanism to only allow obsolescence marker exchange to be enabled for certain users (notably the user running hgweb and the replication user). Review commit: https://reviewboard.mozilla.org/r/63802/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/63802/
Comment on attachment 8769831 [details] hgserver: extension to perform backouts via purging (bug 1286022) Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63532/diff/3-4/
Comment on attachment 8769831 [details] hgserver: extension to perform backouts via purging (bug 1286022) This still needs a bit of work, notably around pushlog and pulse notification. But I'd like to get your feedback.
Attachment #8769831 - Flags: feedback?(mh+mozilla)
Comment on attachment 8770346 [details] obsolescencehacks: extension to enable obsolescence exchange under conditions (bug 1286022); Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63802/diff/1-2/
Attachment #8770346 - Attachment description: obsolescencehacks: extension to enable obsolescence exchange under conditions → obsolescencehacks: extension to enable obsolescence exchange under conditions (bug 1286022);
Attachment #8770346 - Flags: review?(mh+mozilla)
Attachment #8769831 - Flags: feedback?(mh+mozilla)
Comment on attachment 8769831 [details] hgserver: extension to perform backouts via purging (bug 1286022) Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63532/diff/4-5/
Depends on: 1287648
Attachment #8770346 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8770346 [details] obsolescencehacks: extension to enable obsolescence exchange under conditions (bug 1286022); https://reviewboard.mozilla.org/r/63802/#review62118 ::: hgext/obsolescencehacks/__init__.py:45 (Diff revision 2) > + features = set(ui.configlist('experimental', 'evolution')) > + # Nothing to do if already enabled. > + if 'all' in features or obsolete.exchangeopt in features: > + return You could do this check before getting the user name.
Depends on: 1288181
I'm going to land and deploy the obsolescencehacks extension to prod. Will clone this bug to track "remote backouts." I really wish MozReview supported partial review request series so you could keep bugs open and keep writing commits against them...
Comment on attachment 8770346 [details] obsolescencehacks: extension to enable obsolescence exchange under conditions (bug 1286022); Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63802/diff/2-3/
Attachment #8769831 - Attachment is obsolete: true
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/c2ef44c5eb15 obsolescencehacks: extension to enable obsolescence exchange under conditions ; r=glandium
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1288845
Summary: Allow "remote backouts" on the autoland repo → Allow replication to pull obsolete changesets
Blocks: 1365318
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: