Closed
Bug 1389544
Opened 7 years ago
Closed 7 years ago
[Postmortem][releaseduty] 20% of all Firefox BETA populations upgraded to 56.0b2 for 30mins
Categories
(Release Engineering :: Release Automation: Other, enhancement)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: mtabara, Unassigned)
References
Details
(Whiteboard: [releaseduty])
I'll follow up in a bit with a description of what happened. It may be worth having a post-mortem on this.
Updated (Reporter) • 7 years ago
Summary: [Postmortem][releaseduty] 20% of all populations upgraded to 56.0b2 for 30mins → [Postmortem][releaseduty] 20% of all Firefox BETA populations upgraded to 56.0b2 for 30mins
Comment 1 (Reporter) • 7 years ago
Sorry for the delay in following up with more information here.
To describe what happened on Friday night:
1. What were we supposed to do?
We were supposed to ship 56.0b2 to the existing 56.0b1 population only, keeping everyone else on 55.0-build3.
2. What actually happened?
For ~30 minutes, 20% of the entire eligible beta population had background updates turned on to upgrade to 56.0b2.
3. Why did this happen?
A misunderstanding on my side, plus delays in fixing the problem because of the signoffs required in Balrog.
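To make the gap between points 1 and 2 concrete, here is a minimal toy sketch of rule matching. It is not Balrog's actual code or API: the `Rule` class, `pick_rule`, the priority numbers, exact-match version handling, and every background rate other than the 20% reported above are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical, simplified update rule (not Balrog's real schema)."""
    priority: int                  # assumption: higher priority is evaluated first
    mapping: str                   # release the matching client is pointed at
    background_rate: int           # percent of background update requests served
    version: Optional[str] = None  # exact client version to match; None matches any


def pick_rule(rules, client_version):
    """Return the highest-priority rule whose version condition matches."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.version is None or rule.version == client_version:
            return rule
    return None


# What was live for ~30 minutes: a single catch-all rule at 20%.
actual = [Rule(priority=90, mapping="56.0b2", background_rate=20)]

# What was intended: only 56.0b1 clients move to 56.0b2, everyone else stays put.
# The 100% rates here are placeholders, not values taken from this bug.
intended = [
    Rule(priority=100, mapping="56.0b2", background_rate=100, version="56.0b1"),
    Rule(priority=90, mapping="55.0-build3", background_rate=100),
]

for label, rules in (("actual", actual), ("intended", intended)):
    for client in ("56.0b1", "55.0"):
        rule = pick_rule(rules, client)
        print(label, client, "->", rule.mapping, f"({rule.background_rate}%)")
```

With the catch-all rule alone, a 55.0 client is matched to 56.0b2; with the two intended rules, only the 56.0b1 client is.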
===
tl;dr steps
* beta QE signoff arrives and RelMan asks us to push to the beta channel, mentioning "< 56 users should stay on 55.0 rc"
* I let automation schedule the rules change, amend it to what I think RelMan meant, and shorten the wait so that we ship at time X
* QE and RelEng sign off on the rules; RelMan's signoff is still pending
* RelMan asks for clarification, as the scheduled rules in Balrog don't reflect what they asked for
* RelEng and RelMan go back and forth in #releaseduty asking for clarifications
* the fog clears and I schedule another rule to reflect RelMan's intent at time Y (Y = X + 1.5h)
* before I manage to amend the automation rule as well, RelMan signs it off and it goes into effect
* at this point, the default rule is in effect, with 20% of the entire eligible beta population having background updates turned on to upgrade to 56.0b2. The correction rule is only scheduled to take effect in ~1h and still lacks RelMan + QE signoff.
* I amend the timing and the initial rule so the fix goes into effect as soon as possible, but it still needs RelMan/QE signoff.
* another RelMan shows up, but we still have no QE to sign off. :bhearsum suggests temporarily granting the QE role to somebody else to resolve the emergency
* ~20 mins of going back and forth to explain the current situation and to determine whether or not it is a good idea to use the temp-grant-QE-role hack.
* RelMan and RelEng are on the same page; another RelEnger is temporarily granted the QE role and signs off
* changes go live, we amend the existing rule and schedule the auxiliary one to reflect RelMan's initial intent
* we're all good.
* the second RelEnger's temporary QE role is removed
* file postmortem bug.
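The signoff bottleneck in the steps above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Balrog's implementation: `ScheduledChange`, `REQUIRED_ROLES`, and the rule that all three roles must sign off are hypothetical stand-ins for whatever signoff policy was actually configured for the beta channel.

```python
from dataclasses import dataclass, field

# Roles named in this bug; the "all three required" policy is an assumption.
REQUIRED_ROLES = {"releng", "relman", "qe"}


@dataclass
class ScheduledChange:
    """Hypothetical scheduled rule change that is gated on time and signoffs."""
    description: str
    when: int                                  # earliest enactment time, in minutes
    signoffs: set = field(default_factory=set)

    def sign_off(self, role):
        if role not in REQUIRED_ROLES:
            raise ValueError(f"unknown role: {role}")
        self.signoffs.add(role)

    def missing_signoffs(self):
        return REQUIRED_ROLES - self.signoffs

    def can_enact(self, now):
        # Both conditions must hold: the time has arrived and nobody is missing.
        return now >= self.when and not self.missing_signoffs()


# The correction is rescheduled to "as soon as possible" (when=0) ...
correction = ScheduledChange("keep < 56 users on 55.0", when=0)
correction.sign_off("releng")
correction.sign_off("relman")

# ... but with no QE around it stays blocked, however early it is scheduled.
blocked_before = correction.can_enact(now=10)
missing_before = correction.missing_signoffs()

# A second RelEnger holding a temporary QE role signs off, unblocking the change.
correction.sign_off("qe")
live_after = correction.can_enact(now=10)

print(blocked_before, missing_before, live_after)
```

In this model, moving the scheduled time earlier does nothing on its own; the change only becomes enactable once the last missing role has signed off, which matches why the temporary QE grant was needed.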
More details, actors involved, timings and initial conclusions will be addressed in the postmortem document.
I'll try to find a time slot that fits both the RelEng and RelMan calendars.
Assignee: nobody → mtabara
Status: NEW → ASSIGNED
Depends on: 1392811
Depends on: 1392814
Depends on: 1392817
Comment 2 (Reporter) • 7 years ago
Post-mortem done, so there's no need to keep this bug assigned to me. We're still tracking the action items here.
Assignee: mtabara → nobody
Status: ASSIGNED → NEW
Updated (Reporter) • 7 years ago
Whiteboard: [releaseduty]
Comment 3 (Reporter) • 7 years ago
I think we can close this umbrella bug. There are three action-item bugs (filed as dependencies) on RelMan's plate and one on ours, but I guess they can all be handled separately at this point.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME