Closed Bug 1369379 Opened 7 years ago Closed 5 years ago

consider formally integrating test channels with scheduled changes

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), enhancement, P3)

Tracking

(Not tracked)

RESOLVED MOVED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [lang=python])

For most live channels (eg: release) in Balrog we have one or more "test" channels. These are generally kept in sync with the live channel, except when we're testing soon-to-be-live changes. The most common case for this is shipping a new release, where the process is:
1) Submit new release to Balrog
2) Update test channel rule(s) to point to it
3) Verify updates on the test channels
4) Update live channel rule(s) to point at it

This process works well, but keeping the test channels in sync with the live channels is generally a manual human process, and it's difficult to be certain that they actually are in sync.

One way we could make this better is to formalize the idea of "test channels" in Balrog by hooking them into the Scheduled Changes system. Instead of having an entirely separate set of rules for them, we would consider the test channel rules to be the current live channel rules, plus whatever Scheduled Changes are in the queue for that live channel. This would have a few benefits:
* Higher confidence that we've tested what we push live
* Making it much less likely to push *anything* live without testing
* Formalizing the signoff of update testing
* Greatly reducing the number of rules in Balrog

I haven't thought in great detail about exactly how we'd implement it, but I have some random braindump thoughts:
* Probably need some sort of mapping and UI to manage live <-> test channel(s) mappings (eg: release maps to release-localtest and release-cdntest)
* Need the ability to combine the rules table and scheduled rule changes into one set of queryable rules
* Need to decide if we should take scheduled release changes into account after finding the mapping (probably "yes")

This idea needs a lot more thought and discussion, as there are probably lots more edge cases and missing details to work out before this is viable.
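To make "live channel rules plus whatever Scheduled Changes are in the queue" a bit more concrete, here is a minimal sketch of how the two could be combined into one set of rules. It is not Balrog's actual code or schema: the function name `effective_test_rules` and the dict keys (`rule_id`, `sc_id`, `change_type`, `fields`, `priority`, `mapping`) are assumptions made up for illustration, and it ignores scheduled release changes entirely.

```python
def effective_test_rules(live_rules, scheduled_changes):
    """Return the rules a test channel would see.

    Hypothetical data shapes (not Balrog's real schema):
      live_rules:        list of dicts, each with "rule_id" and "priority"
      scheduled_changes: list of dicts with "sc_id", "change_type"
                         ("insert", "update" or "delete"), "rule_id"
                         (unused for inserts) and "fields"
    """
    # Copy each rule so the live rules are never modified.
    rules = {rule["rule_id"]: dict(rule) for rule in live_rules}

    for sc in scheduled_changes:
        kind = sc["change_type"]
        if kind == "insert":
            # A scheduled new rule has no rule_id yet, so key it by its sc_id.
            rules[("scheduled", sc["sc_id"])] = dict(sc["fields"])
        elif kind == "update":
            rules[sc["rule_id"]].update(sc["fields"])
        elif kind == "delete":
            del rules[sc["rule_id"]]
        else:
            raise ValueError("unknown change_type: %r" % (kind,))

    # Highest priority first, assuming that is the order rules are matched in.
    return sorted(rules.values(), key=lambda r: r["priority"], reverse=True)


# Made-up example: one live rule and one pending change to its mapping.
live = [
    {"rule_id": 1, "channel": "release", "priority": 90,
     "mapping": "Firefox-54.0-build1"},
]
pending = [
    {"sc_id": 7, "change_type": "update", "rule_id": 1,
     "fields": {"mapping": "Firefox-55.0-build1"}},
]
for rule in effective_test_rules(live, pending):
    print(rule["rule_id"], rule["mapping"])
```

Because the overlay works on copies, the live rules stay untouched, which is the property we'd want if test channels are only ever a view of "live + queue".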
Priority: -- → P3
I did an extremely rough prototype for this today which is able to update rules to their scheduled versions before evaluating them. It doesn't take into account new rules, scheduled release changes, or anything else like that - but for the one use case it covers, it works. The code is available at https://github.com/mozilla/balrog/compare/master...mozbhearsum:testchannel?expand=1. It took me less than an hour to throw together, which gives me some confidence that this isn't quite as big of a project as I'd anticipated. There are a lot of edge cases to flesh out and handle, but I think this could probably be knocked out in a matter of a few weeks.

Some additional thoughts:
* Release Promotion will need to be updated to not do anything with test channel rules, and to ensure we're scheduling changes to live channel rules prior to update testing.
* Do we need to (should we?) support staging multiple things at once for any given live channel? If so, how will that work - we can only have one active scheduled change for any given rule at a time.
* Test channel history will become more opaque (you need to rewind rules+scheduled changes to find it). Is that OK? Do we need to block this on UI improvements?

> * Probably need some sort of mapping and UI to manage live <-> test
> channel(s) mappings (eg: release maps to release-localtest and
> release-cdntest)

I'm not sure if this is necessary. My prototype accepts anything like "release-cdntest" as a test channel, which might be good enough. If we have use cases for a test channel that isn't composed of primary channel + scheduled changes, we could use a different channel name for that.

> This idea needs a lot more thought and discussion as there's probably lots
> more edge cases and missing details before this viable.

Still need this. I think it would be a good idea to put together a formal plan to circulate for feedback and help find edge cases before we decide to move forward with this.
We caught a lot of issues up front when I did this for Multiple Signoffs, and changing the way test channels work could be very impactful, so we need to make sure we do it right.
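For reference, the "accepts anything like release-cdntest" idea can be sketched as a simple name-based lookup instead of a stored mapping. This is not the prototype's code: `TEST_SUFFIXES` and `live_channel_for` are hypothetical names, and the suffix list is an assumption based only on the two test channel names mentioned in this bug.

```python
# Assumed test channel suffixes, taken from the examples in this bug
# (release-localtest, release-cdntest). Hypothetical, not Balrog config.
TEST_SUFFIXES = ("-localtest", "-cdntest")


def live_channel_for(channel):
    """Return the live channel a test channel is derived from, or None.

    A channel counts as a test channel only if it ends with a known
    suffix and has a non-empty live channel name in front of it.
    """
    for suffix in TEST_SUFFIXES:
        if channel.endswith(suffix) and len(channel) > len(suffix):
            return channel[:-len(suffix)]
    return None


for name in ("release-cdntest", "beta-localtest", "release"):
    print(name, "->", live_channel_for(name))
```

A channel that should not be composed of live rules + scheduled changes would simply use a name that doesn't end in one of these suffixes, so it falls through to normal rule matching.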
Blocks: 1392706
Mentor: bhearsum
Assignee: nobody → bhearsum
Priority: P3 → P1
I think we still want to do this, but there is no timetable at the moment.
Assignee: bhearsum → nobody
Priority: P1 → P3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → MOVED
Product: Release Engineering → Release Engineering Graveyard