Closed Bug 1277916 Opened 8 years ago Closed 7 years ago

Recipes sign-off

Categories

(Firefox :: Normandy Server, defect, P2)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: jvehent, Unassigned)

Details

This is a follow-up on bug 1248666 which was closed after it was decided to manage actions in git. For context: while actions are now treated as code and peer-reviewed, the recipes that launch those actions are still managed in the admin panel of normandy. I've asked the release managers (lmandel, sledu, rkothari) for their thought on real-time data/code pushes to firefox clients, and they requested two things: 1. changes that go out to beta and release channel should get a sign-off from release management. 2. testing of recipes should not send anything to users in the release channel I think (1) should be implemented in the admin panel directly by requiring a release manager to log in and sign off on a recipe before it is sent to either beta or release channels. This effectively implements a two-man rule on the recipes release process. (2) could prove challenging to implement due to the way filtering currently works in normandy recipes. Would it be possible to use distribution buckets per channel, the same way firefox clients point to the right update for their channel?
Hey Julien, I think there may be some misunderstanding here.
Oops. Hit send by accident. I'm going to chat with pdol and osmose to make sure I full understand. Apologies for the bug spam.
I disagree with the suggestion that recipe changes made within the Normandy admin should get sign-off from release management. I think we should stick with a 2-person approval from any 2 people with edit rights in the Normandy admin (effectively, any two people from Strategy and Insights). Coordination with RelMan making changes slower is an issue, but mainly I think RelMan already has an opportunity for sign-off in the system add-on (and possibly in action code if we're really concerned) that isn't greatly improved by adding sign-off at the recipe level. To recap, there's three levels of functionality in SHIELD: - The system add-on, providing privileged functionality like showing a survey prompt or a SHIELD study consent page. - Actions, which are JavaScript files run in a sandbox by the system add-on. Action code will be signed and verified before running. - Recipes, which contain configuration for the actions. This is stuff like survey text, a link to a SHIELD study add-on, etc. Each level has a different standard of review; the system add-on requires RelMan review before shipping either as a live update or via the trains, actions go through code review and deployment alongside the service, and recipes go through peer review via the admin. For example, one action idea that's raised a bunch of eyebrows is installing an add-on directly. If RelMan was (rightfully, because this is just a bit terrifying) concerned about that ability, they could require that we whitelist add-ons installable that way within the system add-on itself. That way, no matter what trickery or compromise happens in the action or recipe, we're confident that un-approved add-ons are not installed, and we can use system add-on updates to update that whitelist quickly if something goes wrong. But Strategy and Insights can still control the recipes and what we want to send out quickly. Put another way, approval at the system add-on level allows RelMan to shape what actions and recipes can do without requiring approval at a recipe level. Almost everything that we intend to put in recipe configs is stuff like survey and notification text, and stuff that has a large effect on security or stability or quality can be reasonably limited by the system add-on. Requiring RelMan approval for things like the language of survey variations that change _very_ often will have a big negative effect on Strategy and Insights' ability to use SHIELD effectively. Ritu: What are your thoughts? Is approval at the system add-on level good enough? Are there any alternatives besides recipe-level review we can consider?
Flags: needinfo?(rkothari)
(In reply to Julien Vehent [:ulfr] from comment #0) > 2. testing of recipes should not send anything to users in the release > channel > > (2) could prove challenging to implement due to the way filtering currently > works in normandy recipes. Would it be possible to use distribution buckets > per channel, the same way firefox clients point to the right update for > their channel? Oh, and for this, we currently use the staging and dev servers for testing new actions and changes to action, and bug 1269536 covers the ability to test recipes without sending anything to users. So we might be good here?
(In reply to Michael Kelly [:mkelly,:Osmose] from comment #3) > > Put another way, approval at the system add-on level allows RelMan to shape > what actions and recipes can do without requiring approval at a recipe > level. Almost everything that we intend to put in recipe configs is stuff > like survey and notification text, and stuff that has a large effect on > security or stability or quality can be reasonably limited by the system > add-on. Requiring RelMan approval for things like the language of survey > variations that change _very_ often will have a big negative effect on > Strategy and Insights' ability to use SHIELD effectively. > > Ritu: What are your thoughts? Is approval at the system add-on level good > enough? Are there any alternatives besides recipe-level review we can > consider? Hi Michael, Julien and I had a chat last week. The general rule is if we are making changes to Firefox on the release channel which may impact the quality and/or stability of the browser, it needs to be reviewed by release management. We have a good process in place now for system add-on live updates and it has worked well for Hello and Pocket so far. Can you provide an example of a recipe change that might be too granular for relman review but can be handled via a system add-on approval instead? We want to work on a light weight process that makes sense for maintaining Firefox stability on release channel and not become a burden on the Devs. Thanks!
Flags: needinfo?(rkothari)
(In reply to Ritu Kothari (:ritu) from comment #5) > Can you provide an example of a recipe change that might be too granular for > relman review but can be handled via a system add-on approval instead? We > want to work on a light weight process that makes sense for maintaining > Firefox stability on release channel and not become a burden on the Devs. > Thanks! Sure! So an example of something too granular for relman review would be a new weekly survey that S&I wants to put out. They decide they want to survey users with sync enabled to figure out if they're aware that they are using sync. This involves creating a new recipe, but the recipe only contains the text for the heartbeat question and a link to the post-heartbeat survey. Changing out these parameters is very low-risk and it doesn't really make sense for RelMan to approve this, especially since S&I is performing these surveys on a weekly basis. A more sensitive config that RelMan would be concerned about would be something like a SHIELD study. In this example, S&I wants to send out a study that moves tabs to the bottom of the window and measures how that affects user sentiment. SHIELD studies are implemented as add-ons that disable themselves after a week, and the recipe config here would contain a link to the add-on as well as the text for the study consent page; users have to give consent before installing the study add-on. The add-on link is more sensitive than just text, but the add-on is required to be signed by AMO. If there was still concern around this, then when we're updating the system add-on to support SHIELD studies, RelMan can review and help us design limits around studies, such as verifying a specific RelMan-controlled signature is on the add-on, or some other limit that helps ensure we can ship config changes quickly while still having RelMan review to maintain stability. Does that better illustrate the kind of review process I'm suggesting?
Flags: needinfo?(rkothari)
Purely from a security perspective, the concern is protecting against an account theft and a simple two-man rule solves the issue. I'm happy to let you decide who these two persons should be. I'll let Ritu handle the RelMan sign-off aspect of it.
Osmose has done a great job of outlining the various types of recipes above. I've got just a few things to add. Sample size is also something to keep in mind. We usually only sample 1k-3k users for surveys, shield studies, etc. Shield Studies may manipulate the UI as Osmose mentioned above, which may make it a special case. I'm open to discuss that. We have talked about doing things like trying to convince a large portion of the Beta population to move back to Release where they want to be and then recruit fresh users from Release. In this case the sample would be large (several million in total) and we will of course be filing bugs and sending emails to the appropriate stakeholders whenever we do something of that size. I think that is really the use case where we need to set up a process. When we do something out of the norm (greater than a few k users), how do we appropriately communicate and sign off?
We recently landed some docs that explain the concepts and threat model behind Normandy in more detail than Comment 3: http://normandy.readthedocs.io/en/latest/dev/concepts.html ritu: Do you still think Relman sign-off on recipe changes (vs Relman sign-off on system add-on changes to constrain recipe changes) is something you want?
No longer blocks: 1248648
Osmose, Matt, Cory and I have agreed upon the following sign-off process for recipe changes: "As we add new actions (an action roughly being a thing that recipes can do, like a SHIELD Study or a multivariate test or setting a preference), we'll send an email to r-d ML about the scope of the study, start and end dates and the Firefox version that study will be run against." This should be enough to keep all stakeholders connected and on the same page.
Flags: needinfo?(rkothari)
A small correction to comment 10, we have decided to email r-d ML *only* for shield studies and not for other recipe actions. We are assuming your team is doing some basic testing/QA sign-offs before pushing the recipes and don't add any new risks to the release population. We can always revisit this proposal if we face unexpected problems.
Marking as invalid as per Comment 11, since there was nothing to do here.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
Product: Shield → Firefox
You need to log in before you can comment on or make changes to this bug.