Closed Bug 1499047 Opened 6 years ago Closed 6 years ago

Tracking bug for 2018-12-03 migration work

Categories

(Release Engineering :: Release Requests, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Unassigned)

+++ This bug was initially created as a clone of Bug #1489406 +++
+++ This bug was initially created as a clone of Bug #1480479 +++

Filing this in advance to start chaining deps for releasing 64.0.
Depends on: 1499440
To help avoid some of the merge day timeouts we've been hitting, we're currently thinking:

- the week before the initial merge day, ping #vcs / file a bug to push the head of beta to mozilla-release, and the head of central to mozilla-beta. These may need to be named branches to bypass the single head hook.
- we probably want to kill the resulting push's builds, or push with an empty DONTBUILD commit.
- in theory, that means the merge day push will contain many weeks' fewer commits, speeding up the hooks.

We may want to verify that the merge day script deals well with this scenario... Callek, Simon, Aki, or some other volunteer should probably test this against a user repo beforehand:

- populate a user repo with beta
- push the head of central to it as a new branch
- run the merge day script, dry run, against central + user-repo-beta
- verify the push to user-repo-beta does the right thing (only push the new commits)

Pretty sure this will work. This isn't our target end state, but this could get us to smoother merge days at relatively low cost until we can implement a long term solution.
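One possible way to check the "only push the new commits" step of the user-repo test is to list what `hg outgoing` would send and compare it with the changesets the merge is expected to create. The following is a minimal sketch, not the merge day script itself: the helper names, the local clone path, and the destination URL are all hypothetical, and it only builds the command and parses its output rather than running anything.

```python
import re
from typing import Iterable, List

# A full Mercurial changeset ID is 40 lowercase hex characters.
NODE_RE = re.compile(r"^[0-9a-f]{40}$")


def outgoing_command(local_repo: str, rev: str, dest_url: str) -> List[str]:
    """Build an `hg outgoing` call listing what a push of `rev` would send.

    `local_repo` and `dest_url` are placeholders (assumptions); point them at
    the local clone and the user repo used for the rehearsal.
    """
    return [
        "hg", "-R", local_repo,
        "outgoing", "-r", rev,
        "--template", "{node}\\n",  # one full changeset ID per line
        dest_url,
    ]


def parse_outgoing(output: str) -> List[str]:
    """Keep only changeset IDs, skipping hg's status lines."""
    stripped = (line.strip() for line in output.splitlines())
    return [line for line in stripped if NODE_RE.match(line)]


def push_is_minimal(outgoing: Iterable[str], expected_new: Iterable[str]) -> bool:
    """True when the push would send exactly the expected new changesets."""
    return set(outgoing) == set(expected_new)
```

To use it, the command could be run with `subprocess.run(cmd, capture_output=True, text=True)` and its stdout passed to `parse_outgoing`. Note that `hg outgoing` exits with status 1 when there is nothing to push, so a non-zero exit code is not necessarily an error here.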
gps, sheehan: do you have any concerns with this approach? We could set up a meeting with releng to discuss as needed, to ensure we are on the same page.
Flags: needinfo?(sheehan)
Flags: needinfo?(gps)
I believe all known issues related to large pushes failing have been fixed and there should no longer be an issue with large pushes failing going forward. For reference, the two issues/fixes were:

* Kafka connection timeouts on the server when sending replication messages (bug 1415233)
* SSH connection timeouts due to channel inactivity (bug 1499204)

There's still an issue where some hooks may take a few minutes to run. But this is a perf/optimization issue and not a fundamental reliability issue that jeopardizes releases.

From a release pipeline robustness intersecting with VCS perspective, I think all is now well and no special process or follow-up is needed. I welcome being proved wrong. At which point we'll fix hg.mozilla.org to support large pushes, as large pushes should "just work."

Of course, we may want to pursue non-VCS related changes to improve robustness. But that may be outside the scope of this bug?
Flags: needinfo?(sheehan)
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #3)
> I believe all known issues related to large pushes failing have been fixed
> and there should no longer be an issue with large pushes failing going
> forward.

Okay, great to see the biggest issues with the last few merges have been resolved.

I think the proposal that aki suggests in comment 1 is good but I understand that it involves VCS. If things "just work" that's fine so long as VCS are willing to continue supporting and ensuring that hg.m.o can do large pushes.

One thing that has been challenging in the past is testing this, particularly the push: afaik, even if we were to push to a staging repo, the hooks and configuration would be different from those on, say, m-r or m-b. gps, do you have any thoughts on this?

Also, can you confirm that thanks to bug 1415233, we don't need to ask to disable vcsreplicator hooks[0] anymore? I believe that was supposed to be resolved last cycle but we got bit by it again.

[0] https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#disable-migration-blocking-hgmo-hooks
Flags: needinfo?(gps)
> I believe that was supposed to be resolved last cycle but we got bit by it again.

IIRC the hook wasn't disabled this cycle
I just resolved bug 1415233 because that issue was fixed several weeks ago. With that issue out of the way, we uncovered a separate issue with the SSH channel timing out. Bug 1499204 landed a permanent fix for that. With those issues out of the way, I'm quite confident that large pushes should "just work." If they don't, I consider it a P1 bug against hg.mozilla.org.

Regarding the hooks and configuration being different, it is a long-standing issue that people don't have visibility into the repo-specific hgrc modifications that are made on the server. I think we should bite the bullet and vendor hgrc snippets into version-control-tools so there is a) visibility, b) the ability for others to change things, and c) no need for people to edit files directly on the server. I filed bug 1504811 to track this.

That leaves us with intermittent `hg robustcheckout` failures. Please keep filing bugs and chaining them to the tracker for any issues you encounter in release automation.
Flags: needinfo?(gps)
@gps - so just to be explicit, can we remove this step of asking vcs to remove the ftl and vcsreplicator hooks in our runbook: https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#disable-migration-blocking-hgmo-hooks
Flags: needinfo?(gps)
(In reply to Jordan Lund (:jlund) from comment #7)
> @gps - so just to be explicit, can we remove this step of asking vcs to
> remove the ftl and vcsreplicator hooks in our runbook:
> https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#disable-migration-blocking-hgmo-hooks

The vcsreplicator hook for sure. The FTL hook, I'm not sure. I consider it a bug if we need to disable hooks to allow legitimate pushes to go through. So if the FTL hook is problematic, we should fix the hook.

I /think/ the last merge day went off without a hitch. So maybe things are good?
Flags: needinfo?(gps)
Summary: Tracking bug for 2018-12-11 migration work → Tracking bug for 2018-12-03 migration work
(In reply to Gregory Szorc [:gps] from comment #8)
> (In reply to Jordan Lund (:jlund) from comment #7)
> > @gps - so just to be explicit, can we remove this step of asking vcs to
> > remove the ftl and vcsreplicator hooks in our runbook:
> > https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#disable-migration-blocking-hgmo-hooks
>
> The vcsreplicator hook for sure. The FTL hook, I'm not sure. I consider it a
> bug if we need to disable hooks to allow legitimate pushes to go through. So
> if the FTL hook is problematic, we should fix the hook.
>
> I /think/ the last merge day went off without a hitch. So maybe things are
> good?

I'm going to do Monday's merge with the expectation that we will not disable any hooks. Should any need to be disabled, I'll both n-i here and ping in #vcs to get that done (failing a response by evening, I may seek out greg or connor in person).
beta->release, central->beta, esr version bump and l10n-bumper are done.

I think it's safe to close this bug. Feel free to reopen if I missed anything.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 1520195
Blocks: 1525081