Closed Bug 1251259 Opened 9 years ago Closed 9 years ago

Compare engagement ratio of e10s and non-e10s in beta and release

Categories

(Cloud Services :: Metrics: Product Metrics, defect, P1)

Points:
3

Tracking

(e10s+)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: rvitillo)

References

Details

As we start rolling e10s out to beta and release, we should be comparing the engagement ratio of the two groups to see whether there are obvious differences. See bug 1249845 for details about the population-splitting plan.

The base deliverable here is a daily-updated dashboard with the engagement ratio for the following groups over time:

* beta users, e10s and non-e10s
* release users, e10s and non-e10s

As an extended deliverable, we will likely want to slice-and-dice this in a few different ways:

* ratios for new profiles in the release channel, grouped by week after they started using Firefox
* ratios for existing profiles in the beta and release channels, separately, grouped by week after we turned on e10s

See bug 1240849 for the definition of engagement ratio.
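For reference, the headline formula behind the dashboard is the conventional DAU-over-MAU ratio. A minimal sketch in Python; the precise definitions of "active" (and any smoothing) are specified in bug 1240849, not here:

```python
def engagement_ratio(dau: float, mau: float) -> float:
    """Engagement ratio: daily active users over monthly active users.

    Only the headline formula; the exact activity definitions and any
    smoothing live in bug 1240849.
    """
    return dau / mau if mau else 0.0
```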
1) Finalize the precise definition of how we're going to calculate the ratio for the two subpopulations of the study. **DAVID**, could you look at the ticket and comment by end-of-day Monday?

2) Create the pipeline streams necessary to measure this. IIUC Katie and/or Roberto are going to handle this, with the result being a rollup table and a beautifully-formatted CSV.

3) Get the CSV updated on a daily basis on a web page somewhere. **HAMILTON**, can you take this on?
The computation of the ER is described here: https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19. I recommend using the type 2 activity definition for both DAU and MAU (rather than type 1 for DAU as suggested in that description). Essentially this means that, if you switched groups on a given day (e.g. you disabled e10s), you get counted in the ER for both groups on that day. I think this will make for greater consistency and simpler computation.

The segment for a client's subsession is determined by:

- channel: the profile is in the "release" group if environment.settings.update.channel starts with "release", and similarly for "beta"
- e10s status: the profile is in the "e10s" group if environment.settings.e10sEnabled is true, and otherwise the "non-e10s" group
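A minimal sketch of the segment assignment just described, assuming `ping` is a dict-shaped telemetry ping with the field paths named above. (Note that comment 3 below argues that keying off e10sEnabled alone gives a biased comparison.)

```python
def channel_group(ping):
    """Channel segment per comment 2: prefix match on the update channel."""
    channel = ping["environment"]["settings"]["update"]["channel"]
    if channel.startswith("release"):
        return "release"
    if channel.startswith("beta"):
        return "beta"
    return None  # other channels are out of scope for this dashboard


def e10s_group(ping):
    """e10s segment per comment 2, keyed off the e10sEnabled flag."""
    enabled = ping["environment"]["settings"].get("e10sEnabled", False)
    return "e10s" if enabled else "non-e10s"
```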
No problem. I'll be waiting for kparlante & co. to get me the CSV for this. The putting-up-a-dashboard part should take no time at all on my end.
Roberto, please take this. Felipe, this is the bug we discussed that's blocked on having something in the environment. NOTE: dzeber is incorrect in comment 2. environment.settings.e10sEnabled will give us a biased comparison because of the splitting rules. We need to compare people in the two statistically divided populations from bug 1249845.
Assignee: nobody → rvitillo
Flags: needinfo?(felipc)
Ok, so what I propose is having a string property in the environment, called "e10sCohort", with four possible values:

- "control": the user has not been selected to use e10s yet
- "test": the user has been selected to use e10s (but might not actually be using it, if some blocking rule applied; e10sEnabled will tell the final state)
- "opted-in": the user manually opted in to e10s
- "unknown": the system add-on for some reason hasn't set things up yet

How does that look? I believe this gives an unbiased distribution to compare the test/control groups.
Flags: needinfo?(felipc)
Is it also possible to opt out (via hidden prefs)? If so, that should be an option here too. Otherwise that sounds fine to me, and you'll need to coordinate with Georg on actually getting this added to the docs and schema.
FWIW, the final list of possible cohorts, from bug 1249845. I made it very detailed so we can understand how the rollout is working; test/control remain the statistically comparable groups.

- "unsupportedChannel": any channel other than beta/release
- "pastStartup": the system add-on code ran too late in the startup process and wasn't able to configure e10s properly
- "optedIn": users who opted in through the opt-in or force-enable pref
- "optedOut": users who opted out through the force-disable pref
- "test": random() < threshold for this channel
- "control": random() > threshold for this channel
- "unknown": something went wrong and the add-on code didn't run at all; shouldn't happen in the wild
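A sketch of what the assignment rules above amount to. The `threshold` and the two pref flags are stand-ins for the real system add-on's inputs, the failure cohorts ("pastStartup", "unknown") are omitted since they mark add-on failures rather than assignments, and a real implementation would persist the draw so a profile's cohort is stable across sessions:

```python
import random

def assign_cohort(channel, opted_in, opted_out, threshold):
    """Cohort assignment following the list in comment 7 (illustrative only)."""
    if channel not in ("beta", "release"):
        return "unsupportedChannel"
    if opted_in:
        return "optedIn"
    if opted_out:
        return "optedOut"
    # The statistically comparable groups: a single random draw against
    # the per-channel rollout threshold.
    return "test" if random.random() < threshold else "control"
```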
Following up on this:

- How do we know when a profile first enters the "experiment"? Is it the first subsession for which we see the "e10sCohort" variable?
- On the first launch of the system add-on, if you already have e10s enabled, are you "optedIn" or do you get randomly assigned to test/control?
- For profiles already assigned to test/control, what happens when a user manually changes their e10s setting? If they disable it, does their cohort change to optedOut? Or do we check whether they're using e10s with "e10sEnabled"?
Flags: needinfo?(felipc)
Flags: needinfo?(cpeterson)
Priority: -- → P1
(In reply to Dave Zeber [:dzeber] from comment #8)
> - How do we know when a profile first enters the "experiment"? Is it the
> first subsession for which we see the "e10sCohort" variable?

All profiles running builds with bug 1249845 will have "e10sCohort" defined. Hopefully it will be present in the first beta build of 46, so you can basically treat "is beta 46" as the entry point.

> - On the first launch of the system add-on, if you already have e10s
> enabled, are you "optedIn" or do you get randomly assigned to test/control?

If they already had e10s enabled, they will be part of "optedIn".

> - For profiles already assigned to test/control, what happens when a user
> manually changes their e10s setting? If they disable it, does their cohort
> change to optedOut? Or do we check whether they're using e10s with
> "e10sEnabled"?

If they manually disable it, they will be marked as "optedOut". But note that you still need to correlate the data of "test" users with the "e10sEnabled" flag, because a "test" user might still not get e10s enabled due to add-ons, RTL locales, or accessibility. The system add-on makes no attempt to identify that; that is the responsibility of the in-tree code. In this case (cohort = "test", e10sEnabled = false), you can see the detailed reason by looking at the E10S_STATUS telemetry probe.

Also note that I removed the "pastStartup" cohort, as that should no longer be a problem with the new patch in bug 1249845.
Flags: needinfo?(felipc)
Flags: needinfo?(cpeterson)
Makes sense. So it looks like the updated computation for the top-level dashboard should be:

1) Ignore pings for which e10sCohort is missing or is in ("unsupportedChannel", "unknown").
2) Segment by channel:
   - "release" if environment.settings.update.channel starts with "release"
   - "beta" if environment.settings.update.channel starts with "beta"
3) Segment by e10s group:
   - "test" if e10sCohort == "test" and environment.settings.e10sEnabled == true
   - "control" if e10sCohort == "control"
   - "other" otherwise (optionally, break this group down further)
4) Compute DAU for profiles in each (channel, e10s group) segment using the type 1 activity definition (a profile's most recent subsession on the day determines its segment).
5) Compute MAU for those segments (using the type 2 activity definition).
6) Compute the ER. Details in https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19. It was decided to use the type 1 activity computation for DAU in https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c28, so I'm noting that explicitly here.
We should not use "starts with" rules for release channels. It should be an exact match, because partner build information is not carried in the release channel; it's in a separate environment block.
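Combining comment 10 with the exact-match correction above, a sketch of the per-ping segment assignment. Field paths are the ones named in this thread; this is an illustration, not the pipeline's actual code:

```python
def dashboard_segment(ping):
    """Return a (channel, e10s_group) pair per comments 10 and 11, or None to skip.

    Uses exact channel matches per comment 11 instead of prefix matches.
    """
    settings = ping["environment"]["settings"]
    cohort = settings.get("e10sCohort")
    if cohort in (None, "unsupportedChannel", "unknown"):
        return None  # step 1: ignore these pings

    channel = settings["update"]["channel"]
    if channel not in ("release", "beta"):
        return None  # step 2, with the exact-match rule from comment 11

    # Step 3: e10s group.
    if cohort == "test" and settings.get("e10sEnabled"):
        group = "test"
    elif cohort == "control":
        group = "control"
    else:
        group = "other"
    return (channel, group)
```

DAU, MAU, and the ER would then be computed per (channel, group) segment as described in comment 10.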
(In reply to Dave Zeber [:dzeber] from comment #10)
> 3) Segment by e10s group:
>    - "test" if e10sCohort == "test" and environment.settings.e10sEnabled == true
>    - "control" if e10sCohort == "control"

Doing this is very risky. For simplicity's sake, say the population splits 20% a11y, 40% with add-ons, 40% without. With the above, a large share of the control group comes from populations that don't match the test group.

A better choice is either to use the full cohorts, or to restrict both groups to profiles with no a11y and no add-ons. (The restrict method would then need updating once the rollout is expanded to include add-ons.)
Depends on: 1253609
Points: --- → 3
(In reply to Jonathan Howard from comment #12)
> Doing this is very risky. For simplicity's sake, say the population splits
> 20% a11y, 40% with add-ons, 40% without. With the above, a large share of
> the control group comes from populations that don't match the test group.
>
> A better choice is either to use the full cohorts, or to restrict both
> groups to profiles with no a11y and no add-ons.

Yes, if we expect a significant proportion of the test group to end up being disqualified, we should take this into account.

How about we identify the set of conditions that would prevent e10s from getting enabled, and add a flag (e.g. "disqualified") to the list in comment 7 to identify these? This would restrict test/control to "valid" profiles.
(In reply to Dave Zeber [:dzeber] from comment #13)
> How about we identify the set of conditions that would prevent e10s from
> getting enabled, and add a flag (e.g. "disqualified") to the list in
> comment 7 to identify these? This would restrict test/control to "valid"
> profiles.

Is it safe to say that any such info you'll need for splitting up the population will be included in the e10sCohort field? I'm planning to launch a job to backfill our derived dataset with the e10s data and want to make sure I include all the fields we'll need for later analysis.
Flags: needinfo?(dzeber)
(In reply to Dave Zeber [:dzeber] from comment #13)
> How about we identify the set of conditions that would prevent e10s from
> getting enabled, and add a flag (e.g. "disqualified") to the list in
> comment 7 to identify these? This would restrict test/control to "valid"
> profiles.

Ok, I think this is a good idea. I'll file a bug to do that.
(In reply to Mark Reid [:mreid] from comment #14)
> Is it safe to say that any such info you'll need for splitting up the
> population will be included in the e10sCohort field?

As far as I can tell, we need:

- environment.settings.update.channel
- e10sCohort
- environment.settings.e10sEnabled (IIUC this should be superseded by e10sCohort, but better to keep it anyway)

together with dates of activity for each clientID.

Felipe, is this enough for us to distinguish test users (that are actually using e10s) from control, or is there anything else we need to take into account?
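For concreteness, the per-client fields listed above amount to a record shaped roughly like this. This is a hypothetical sketch of a derived-dataset row, not the dataset's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class E10sActivityRow:
    client_id: str       # clientID
    activity_date: date  # a date on which the client was active
    channel: str         # environment.settings.update.channel
    e10s_cohort: str     # e10sCohort
    e10s_enabled: bool   # environment.settings.e10sEnabled (kept for verification)
```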
Flags: needinfo?(dzeber) → needinfo?(felipc)
Yeah, that's enough. Hopefully with bug 1255013, checking e10sEnabled won't be necessary, but it's indeed better to keep it, at least for verification.
Flags: needinfo?(felipc)
hmm so, update.channel only tells you that it's beta, so maybe you also want to make sure it's at least version 46. (or do this based on the presence of e10sCohort, which landed in 46)
(In reply to :Felipe Gomes (needinfo me!) from comment #18)
> hmm so, update.channel only tells you that it's beta, so maybe you also want
> to make sure it's at least version 46.
>
> (or do this based on the presence of e10sCohort, which landed in 46)

I was thinking we should use the presence of e10sCohort, since we need that field to be present either way.
Depends on: 1253644
Hamilton, the "client_count" Parquet dataset contains the requested data (see bug 1253644). You can build a dashboard for it using re:dash; I created a simple one for the Beta channel based on the definition of ER [1]. It uses only e10sEnabled, since e10sCohort only made it into the latest Beta, which just went live.

[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count
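For anyone reproducing this outside re:dash, a sketch of the kind of aggregation behind such a dashboard, written against a hypothetical per-client activity table. The real dashboard queries the pre-aggregated client_count dataset through Presto; its schema is not reproduced here, and the column names below are assumptions for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical per-client activity rows; column names are illustrative
# stand-ins, not the client_count dataset's real schema.
activity = spark.createDataFrame(
    [("a1", "2016-04-01", "beta", "test"),
     ("a2", "2016-04-01", "beta", "control")],
    ["client_id", "activity_date", "channel", "e10s_cohort"],
)

dau = (activity
       .where(F.col("channel") == "beta")
       .groupBy("activity_date", "e10s_cohort")
       .agg(F.countDistinct("client_id").alias("dau")))
dau.show()
```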
Assignee: rvitillo → nobody
I think this bug needs to be moved to a different component, as this component is primarily used to triage product metrics requests. This appears to be a quality measurements need.
Flags: needinfo?(benjamin)
This is a product metric required to assess the rollout of e10s, requested by Jeff. I really don't care which component it lives in.
Flags: needinfo?(benjamin)
Roberto, do we have DAU/MAU data from the Beta 46 experiment? What are the next actions to analyze the engagement ratios of our e10s and non-e10s beta cohorts?
Flags: needinfo?(rvitillo)
Chris, as per comment 21, the data, which includes the Beta 46 experiment, is available from re:dash. Hamilton owns the next step, per comment 1.
Flags: needinfo?(rvitillo)
@Roberto: thanks! @Hamilton: are you still blocked waiting for the daily CSV data?
Flags: needinfo?(hulmer)
Assignee: nobody → hulmer
:cpeterson I can get the CSV data to build another dashboard easily, but may need assistance with a process to get it into S3 daily. I'll put something together to display the data along the lines of our other summary dashboards, but first, a few questions:

- re:dash already plots this data in a way similar to how I might plot it. Is there a need to put this up on a dashboard that only requires LDAP if the e10s ERs are already graphed on re:dash? [1]
- It appears that the e10s numbers cycle between ~0.50 and 0.02, whereas the non-e10s numbers seem fairly stable. This doesn't seem like it will lead to good analysis, assuming this data makes sense. Is there something I'm missing there? [2]

[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count
[2] https://sql.telemetry.mozilla.org/queries/82#133
Flags: needinfo?(rvitillo)
Flags: needinfo?(hulmer)
Flags: needinfo?(cpeterson)
Oh, scratch that first question - obviously we are not comparing this to release, so that's the task. But my other question about the cyclical nature of the ER in e10s holds.
Right now, we're shipping e10s as temporary experiments. At some point we're going to start shipping it as a permanent progressive rollout, which is where this data will be more useful.

One thing I'm concerned about here is that we don't seem to be comparing apples to apples for e10s and non-e10s. The progressive rollout procedure is:

* divide the population in half: 50% get no e10s and 50% get maybe-e10s
* for the maybe-e10s population, exclude people with add-ons, people who have used a11y, and people in RTL locales
* turn on e10s for the rest of the maybe-e10s population

What we really care about comparing is users in the no-e10s group who match the maybe-e10s criteria (no add-ons in particular). We know that people with add-ons have different engagement in general and would skew any direct comparison. Roberto and Felipe, does the data we have account for this?
Flags: needinfo?(felipc)
Yeah, the data allows that, by directly comparing the "test" and "control" groups (assuming these groups come from the "e10sCohort" telemetry environment data). Every eligible user gets drawn in the 50% dice roll, but only users who match the maybe-e10s criteria are tagged with test or control. Users who don't are tagged with "disqualified", in both groups.

Chutten ran an analysis of how the first rollout test went, and we got the expected distribution (see bug 1261387 comment 5):

- 22.3% in test
- 22.1% in control
- 55.6% in disqualified/optedIn/optedOut
Flags: needinfo?(felipc)
(In reply to Hamilton from comment #27)
> - It appears that the e10s numbers cycle between ~0.50 and 0.02, whereas the
> non-e10s numbers seem fairly stable. This doesn't seem like it will lead to
> good analysis, assuming this data makes sense. Is there something I'm
> missing there? [2]

e10s is disabled by default in the Beta channel, but we ran two e10s experiments during Beta 46, which accounts for the fluctuations. As Benjamin points out, we plan to continuously run 50/50% maybe-e10s/no-e10s cohorts in the Beta channel (hopefully starting with Beta 47). That should give us more stable DAU/MAU data.
Flags: needinfo?(cpeterson)
(In reply to Hamilton from comment #27)
> I can get the CSV data to build another dashboard easily, but may need
> assistance with a process to get it into S3 daily. [...]
> - re:dash already plots this data in a way similar to how I might plot it.
> Is there a need to put this up on a dashboard that only requires LDAP if
> the e10s ERs are already graphed on re:dash? [1]

There is no need to pull a CSV file out of Presto to build a dashboard, as you can use the plotting facilities provided by re:dash directly. What we are currently missing is a plot of the ER faceted by e10sCohort.
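As an illustration of that missing piece, here is the faceting computation on a tiny hypothetical daily rollup (one row per date and cohort, with DAU and MAU). The numbers are made up; the real version would be a re:dash query over client_count:

```python
import pandas as pd

# Hypothetical daily rollup; values are invented for illustration.
rollup = pd.DataFrame({
    "date":   ["2016-05-01", "2016-05-01", "2016-05-02", "2016-05-02"],
    "cohort": ["test", "control", "test", "control"],
    "dau":    [120000, 118000, 121500, 117800],
    "mau":    [400000, 395000, 401000, 395500],
})
rollup["er"] = rollup["dau"] / rollup["mau"]

# One column per cohort: the "ER faceted by e10sCohort" view.
er_by_cohort = rollup.pivot(index="date", columns="cohort", values="er")
print(er_by_cohort)
```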
Flags: needinfo?(rvitillo)
Hamilton, what are the next steps to create the ER dashboard for the e10sCohort users? We plan to have a fixed percentage of Beta users in the e10sCohort instead of running short experiments, and we would like a dashboard for watching the incoming ER data.
Flags: needinfo?(hulmer)
jjensen and I chatted a bit about this last week, and I let this thread slip - apologies. It seems the best approach here would be to make these graphs in re:dash. We agreed that projects like this (things that can be spec'd out and thrown up on re:dash easily by engineers, and consumed by them) would best be executed by rvitillo's team or those who are working with the data accessed by re:dash (cc'd here). I'm happy to work on this as well, but I think it'd be much more efficient if I weren't involved in this project.
Flags: needinfo?(hulmer)
OK, I will take this then. Here is a first version of the dashboard based on the e10sCohort flag [1]. I am going to be OOO tomorrow and at a work week next week; I will try to follow up ASAP.

[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count-beta-e10scontrol-
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #35)
> OK, I will take this then. Here is a first version of the dashboard based
> on the e10sCohort flag [1].
>
> [1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count-beta-e10scontrol-

Hey, this e10sCohort test/control graph is also really useful to me for checking that the distribution is working properly. Is it possible to add another graph that displays all possible values for this field (i.e. one that also includes the optin/out, disqualified-*, etc.)?
(In reply to :Felipe Gomes (needinfo me!) from comment #36)
> Is it possible to add another graph that displays all possible values for
> this field (i.e. one that also includes the optin/out, disqualified-*,
> etc.)?

Done.
As per RyanVM, assigned to Roberto. Can this be closed at this point?
Assignee: hulmer → rvitillo
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 1277328