1251259 - Compare engagement ratio of e10s and non-e10s in beta and release

Reporter

Description

•

9 years ago

As we start rolling e10s out to beta and release, we should be comparing the engagement ratio of the two groups to see whether there are obvious differences. See bug 1249845 for details about the population-splitting plan.
The base deliverable here is a daily updated dashboard with the engagement ratio for the following groups over time:
* beta users, e10s and non-e10s
* release users, e10s and non-e10s
As an extended deliverable, we will likely want to slice-and-dice this in a few different ways:
* ratios for new profiles in the release channel, grouped by week after they started using Firefox
* ratios for existing profiles in the beta and release channels, separately, grouped by week after we turned on e10s
See bug 1240849 for the definition of engagement ratio.

Jim Mathies [:jimm]

Updated

•

9 years ago

tracking-e10s: ? → m9+

John Jensen

Comment 1

•

9 years ago

1) Finalize the precise definition of how we're going to calculate the ratio for the two subpopulations of the study. **DAVID**, could you look at the ticket and comment by end-of-day Monday?
2) Create the pipeline streams necessary to measure this. IIUC Katie and/or Roberto are going to handle this, with the result being a rollup table and a beautifully-formatted CSV.
3) Get the CSV updated on a daily basis on a web page somewhere. **HAMILTON**, can you take this on?

Dave Zeber [:dzeber]

Comment 2

•

9 years ago

The computation of the ER is described here: https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19. I recommend using the type 2 active for both DAU and MAU (rather than type 1 for DAU as suggested in that description). Essentially this means that, if you switched groups on a given day (eg you disabled e10s), you get counted in the ER for both groups on that day. I think this will make for greater consistency and simpler computation.
The segment for a client's subsession is determined by:
- channel: Profile is in "release" group if environment.settings.update.channel starts with "release", and similarly for "beta"
- e10s status: Profile is in "e10s" group if environment.settings.e10sEnabled is true, otherwise, "non-e10s" group.
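For illustration, here is a minimal sketch of the "type 2" counting described above, in which a client that switched groups during a day is counted in both groups for that day. The tuple layout `(client_id, date, group)` and the function name `dau_type2` are assumptions made for this example; they are not part of the telemetry pipeline.

```python
from collections import defaultdict

def dau_type2(subsessions):
    """Count daily active clients per (date, group).

    A client is counted once in every group it appeared in on a given day,
    so a client that switched groups that day is counted in both.
    `subsessions` is an iterable of (client_id, date, group) tuples
    (a hypothetical layout used only for illustration).
    """
    clients = defaultdict(set)
    for client_id, date, group in subsessions:
        clients[(date, group)].add(client_id)
    return {key: len(ids) for key, ids in clients.items()}

example = [
    ("a", "2016-03-01", "e10s"),
    ("a", "2016-03-01", "non-e10s"),  # client "a" disabled e10s mid-day
    ("b", "2016-03-01", "e10s"),
    ("b", "2016-03-01", "e10s"),      # repeat subsession, counted once
]
counts = dau_type2(example)
```

Using a set per (date, group) key keeps repeat subsessions from inflating the count while still letting one client appear under two groups.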

Hamilton

Comment 3

•

9 years ago

No problem. I'll be waiting for kparlante & co. to get me the CSV for this. The putting-up-a-dashboard part should take no time at all on my end.

Benjamin Smedberg

Reporter

Comment 4

•

9 years ago

Roberto, please take this. Felipe, this is the bug we discussed that's blocked on having something in the environment. NOTE: dzeber is incorrect in comment 2. environment.settings.e10sEnabled will give us a biased comparison because of the splitting rules. We need to compare people in the two statistically divided populations from bug 1249845.

Assignee: nobody → rvitillo

Flags: needinfo?(felipc)

:Felipe Gomes (needinfo for replies!)

Comment 5

•

9 years ago

Ok, so what I propose is having a string property in the environment, called "e10sCohort", with four possible values:
- "control": user has not been selected to use e10s yet
- "test": user has been selected to use e10s (but might not actually be using it, if some blocking rule applied; e10sEnabled will tell the final state)
- "opted-in": user manually opted in to e10s
- "unknown": if the system add-on for some reason hasn't set things up yet
How does that look? I believe this gives an unbiased distribution to compare test/control groups.

Flags: needinfo?(felipc)

Benjamin Smedberg

Reporter

Comment 6

•

9 years ago

Is it also possible to opt out (via hidden prefs)? If so, that should be an option here also. Otherwise that sounds fine to me, and you'll need to coordinate with Georg on actually getting this added to the docs and schema.

:Felipe Gomes (needinfo for replies!)

Comment 7

•

9 years ago

FWIW, final list of possible cohorts, from bug 1249845. I made it very detailed so we can understand how rollout is working. test/control remain the statistically significant groups.
- "unsupportedChannel" for any channel other than beta/release
- "pastStartup" in case the system add-on code ran too late in the startup process and wasn't able to configure e10s properly
- "optedIn" users who opted in through the opt-in or force-enable pref
- "optedOut" users who opted out through the force-disable pref
- "test" random() < threshold for this channel
- "control" random() > threshold for this channel
- "unknown" in case something goes wrong and the add-on code didn't run at all. Shouldn't happen in the wild.
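The cohort list above can be sketched roughly as follows. This is an illustration only: the order of the checks, the parameter names, and the function name `assign_cohort` are assumptions, and the real logic lives in the system add-on from bug 1249845. The "pastStartup" and "unknown" values describe failures of the add-on itself, so they are not modelled here.

```python
import random

def assign_cohort(channel, threshold, opted_in=False, opted_out=False,
                  rng=random.random):
    """Return an e10sCohort label for a profile (illustrative sketch).

    `threshold` is the per-channel fraction of users placed in "test".
    `rng` is injectable so the random draw can be controlled in tests.
    """
    if channel not in ("beta", "release"):
        return "unsupportedChannel"
    if opted_in:
        return "optedIn"
    if opted_out:
        return "optedOut"
    # The comment lists "control" as random() > threshold; treating
    # everything that is not below the threshold as control is an
    # assumption made here to cover the equality case.
    return "test" if rng() < threshold else "control"
```

For example, `assign_cohort("beta", 0.5)` returns either "test" or "control" with roughly equal probability, while any channel other than beta or release yields "unsupportedChannel".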

Dave Zeber [:dzeber]

Comment 8

•

9 years ago

Following up on this:
- How do we know when a profile first enters the "experiment"? Is it the first subsession for which we see the "e10sCohort" variable?
- On the first launch of the system addon, if you already have e10s enabled, are you "optedIn" or do you get randomly assigned to test/control?
- For profiles already assigned to test/control, what happens when a user manually changes their e10s setting? If they disable it, does their cohort change to optedOut? Or do we check whether they're using e10s with "e10sEnabled"?

Flags: needinfo?(felipc)

Thomas Huelbert

Updated

•

9 years ago

Flags: needinfo?(cpeterson)

Priority: -- → P1

:Felipe Gomes (needinfo for replies!)

Comment 9

•

9 years ago

(In reply to Dave Zeber [:dzeber] from comment #8)
> Following up on this:
>
> - How do we know when a profile first enters the "experiment"? Is it the
> first subsession for which we see the "e10sCohort" variable?
All profiles running builds with bug 1249845 will have "e10sCohort" defined. Hopefully it will be present on the first beta build of 46, so you can basically consider "is beta 46" for that.
> - On the first launch of the system addon, if you already have e10s enabled,
> are you "optedIn" or do you get randomly assigned to test/control?
If they already had e10s enabled, they will be part of "optedIn".
> - For profiles already assigned to test/control, what happens when a user
> manually changes their e10s setting? If they disable it, does their cohort
> change to optedOut? Or do we check whether they're using e10s with
> "e10sEnabled"?
If they manually disable it, they will be marked as "optedOut". But note that you still need to correlate the data of "test" users with the "e10sEnabled" flag, because a "test" user might still not get e10s enabled due to add-ons, RTL locales, or accessibility. The system add-on makes no attempt to identify that, as that is the responsibility of the in-tree code. In this case (cohort = "test", e10sEnabled = false), you will be able to see the detailed reason by looking at the E10S_STATUS telemetry probe.
Also note that I removed the "pastStartup" cohort, as that should no longer be a problem with the new patch in bug 1249845.

Flags: needinfo?(felipc)

Thomas Huelbert

Updated

•

9 years ago

Flags: needinfo?(cpeterson)

Dave Zeber [:dzeber]

Comment 10

•

9 years ago

Makes sense. So it looks like the updated computation for the top-level dashboard should be:
1) Ignore pings for which e10sCohort is missing or is in ("unsupportedChannel", "unknown").
2) Segment channel:
- "release" if environment.settings.update.channel starts with "release"
- "beta" if environment.settings.update.channel starts with "beta"
3) Segment e10s group:
- "test" if e10sCohort == "test" and environment.settings.e10sEnabled == true
- "control" if e10sCohort == "control"
- "other" otherwise (optionally, break this group down further)
4) Compute DAU for profiles in each (channel, e10s group) segment using the type 1 activity (a profile's most recent subsession on the day belongs to the segment).
5) Compute MAU for those segments (using the type 2 activity).
6) Compute ER. Details in https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19.
It was decided to use the type 1 activity computation for DAU in https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c28, so I'm mentioning that explicitly here.
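Steps 1) to 3) above can be sketched as a small classification function. This is only an illustration: the flat dictionary with keys `channel`, `e10sCohort` and `e10sEnabled` is a hypothetical stand-in for the real ping structure, and the function names are made up for this example. The DAU, MAU and ER steps are not shown, since their exact definitions live in bug 1240849.

```python
def channel_segment(channel):
    """Step 2: map an update channel string to "release", "beta" or None."""
    if channel.startswith("release"):
        return "release"
    if channel.startswith("beta"):
        return "beta"
    return None

def e10s_group(cohort, e10s_enabled):
    """Step 3: map cohort and e10sEnabled to "test", "control" or "other"."""
    if cohort == "test" and e10s_enabled:
        return "test"
    if cohort == "control":
        return "control"
    return "other"

def segment(ping):
    """Return (channel, e10s group) for a ping, or None if it is ignored.

    `ping` is a hypothetical flat dict, not the real telemetry layout.
    """
    cohort = ping.get("e10sCohort")
    # Step 1: drop pings with a missing or unusable cohort.
    if cohort is None or cohort in ("unsupportedChannel", "unknown"):
        return None
    channel = channel_segment(ping.get("channel", ""))
    if channel is None:
        return None
    return (channel, e10s_group(cohort, ping.get("e10sEnabled", False)))
```

Note that a "test" profile with e10sEnabled false lands in "other" under this rule, which is why the cohort and the flag have to be read together.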

Benjamin Smedberg

Reporter

Comment 11

•

9 years ago

We should not use "starts with" rules for release channels. It should be an exact match, because the partner build is not in the release channel; it's in the separate environment block.
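A minimal sketch of the exact-match rule, for comparison with a prefix test. The set name, the function name, and the channel string "release-cck-example" are all hypothetical and used only to show the difference.

```python
SUPPORTED_CHANNELS = {"release", "beta"}

def channel_segment_exact(channel):
    """Return the channel only when it matches exactly, otherwise None."""
    return channel if channel in SUPPORTED_CHANNELS else None

# A prefix test would accept this made-up variant; an exact match does not.
variant = "release-cck-example"
prefix_accepts = variant.startswith("release")
exact_accepts = channel_segment_exact(variant) is not None
```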

Jonathan Howard

Comment 12

•

9 years ago

(In reply to Dave Zeber [:dzeber] from comment #10)
> 3) Segment e10s group:
> - "test" if e10sCohort == "test" and environment.settings.e10sEnabled ==
> true,
> - "control" if e10sCohort == "control"
Doing this is very risky. For simplicity's sake, say the population splits 20% a11y, 40% with-addons, 40% without. With the above, control is heavily influenced by a population that does not match the test group.
A better choice is either the full cohort or restricting both to no a11y and without-addons. (The restrict method would then need updating once expanded to include addons.)
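The concern can be made concrete with a small calculation. The 20/40/40 split is the one given above; the per-subgroup engagement values are invented purely for illustration and are not measurements. The point is only that a filtered test group and an unfiltered control group can differ even if e10s itself has no effect.

```python
# Population split from the comment above.
shares = {"a11y": 0.20, "with_addons": 0.40, "without_addons": 0.40}

# Hypothetical engagement per subgroup, assumed identical with or
# without e10s (i.e. e10s has no effect in this toy example).
engagement = {"a11y": 0.30, "with_addons": 0.50, "without_addons": 0.40}

def blended(groups):
    """Share-weighted mean engagement over the given subgroups."""
    total = sum(shares[g] for g in groups)
    return sum(shares[g] * engagement[g] for g in groups) / total

# "test" filtered by e10sEnabled keeps only profiles without add-ons or a11y.
test_er = blended(["without_addons"])
# An unfiltered "control" still contains all three subgroups.
control_full = blended(shares)
# Restricting control the same way removes the spurious gap.
control_restricted = blended(["without_addons"])
```

With these made-up numbers the unfiltered control comes out near 0.42 against roughly 0.40 for test, a gap produced entirely by population mix.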

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Depends on: 1253609

Rob Miller [:rmiller]

Updated

•

9 years ago

Points: --- → 3

Dave Zeber [:dzeber]

Comment 13

•

9 years ago

(In reply to Jonathan Howard from comment #12)
> Doing this is very risky. For simplicity sake say population splits 20%
> a11y, 40% with-addons, 40% without. With above a high influence over control
> is from population not matching the test.
>
> Better choice is either full cohort or restrict both to no a11y and
> without-addons. (restrict method then would need updating once expended to
> include addons.)
Yes, if we expect a significant proportion of the test group to end up being disqualified, we should take this into account.
How about we identify the set of conditions that would prevent e10s from getting enabled, and add a flag (eg. "disqualified") to the list in Comment 7 to identify these? This would restrict test/control to "valid" profiles.

Mark Reid [:mreid]

Comment 14

•

9 years ago

(In reply to Dave Zeber [:dzeber] from comment #13)
> (In reply to Jonathan Howard from comment #12)
> > Doing this is very risky. For simplicity sake say population splits 20%
> > a11y, 40% with-addons, 40% without. With above a high influence over control
> > is from population not matching the test.
> >
> > Better choice is either full cohort or restrict both to no a11y and
> > without-addons. (restrict method then would need updating once expended to
> > include addons.)
>
> Yes, if we expect a significant proportion of the test group to end up being
> disqualified, we should take this into account.
>
> How about we identify the set of conditions that would prevent e10s from
> getting enabled, and add a flag (eg. "disqualified") to the list in Comment
> 7 to identify these? This would restrict test/control to "valid" profiles.
Is it safe to say that any such info you'll need for splitting up the population will be included in the e10sCohort field? I'm planning to launch a job to backfill our derived dataset with the e10s data and want to make sure I include all the fields we'll need for later analysis.

Flags: needinfo?(dzeber)

:Felipe Gomes (needinfo for replies!)

Comment 15

•

9 years ago

(In reply to Dave Zeber [:dzeber] from comment #13)
> (In reply to Jonathan Howard from comment #12)
> > Doing this is very risky. For simplicity sake say population splits 20%
> > a11y, 40% with-addons, 40% without. With above a high influence over control
> > is from population not matching the test.
> >
> > Better choice is either full cohort or restrict both to no a11y and
> > without-addons. (restrict method then would need updating once expended to
> > include addons.)
>
> Yes, if we expect a significant proportion of the test group to end up being
> disqualified, we should take this into account.
>
> How about we identify the set of conditions that would prevent e10s from
> getting enabled, and add a flag (eg. "disqualified") to the list in Comment
> 7 to identify these? This would restrict test/control to "valid" profiles.
Ok, I think this is a good idea. I'll file a bug to do that.

:Felipe Gomes (needinfo for replies!)

Updated

•

9 years ago

Depends on: 1255013

Dave Zeber [:dzeber]

Comment 16

•

9 years ago

(In reply to Mark Reid [:mreid] from comment #14)
> Is it safe to say that any such info you'll need for splitting up the
> population will be included in the e10sCohort field?
As far as I can tell, we need:
- environment.settings.update.channel
- e10sCohort
- environment.settings.e10sEnabled (IIUC this should be superseded by e10sCohort, but better keep it anyway)
together with dates of activity for each clientID.
Felipe, is this enough for us to distinguish test users (that are actually using e10s) from control, or is there anything else we need to take into account?

Flags: needinfo?(dzeber) → needinfo?(felipc)

:Felipe Gomes (needinfo for replies!)

Comment 17

•

9 years ago

Yeah, that's enough. Hopefully with bug 1255013 checking e10sEnabled wouldn't be necessary, but it's indeed better to keep it, at least for verification.

Flags: needinfo?(felipc)

:Felipe Gomes (needinfo for replies!)

Comment 18

•

9 years ago

hmm so, update.channel only tells you that it's beta, so maybe you also want to make sure it's at least version 46. (or do this based on the presence of e10sCohort, which landed in 46)

Dave Zeber [:dzeber]

Comment 19

•

9 years ago

(In reply to :Felipe Gomes (needinfo me!) from comment #18)
> hmm so, update.channel only tells you that it's beta, so maybe you also want
> to make sure it's at least version 46.
>
> (or do this based on the presence of e10sCohort, which landed in 46)
I was thinking we should use the presence of e10sCohort, since we need that field to be present either way.

Brad Lassey [:blassey] (use needinfo?)

Comment 20

•

9 years ago

not an m9

tracking-e10s: m9+ → +

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Depends on: 1253644

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 21

•

9 years ago

Hamilton, the "client_count" Parquet dataset contains the requested data (see bug 1253644). You can build a dashboard for it using re:dash; I created a simple one for the Beta channel based on the definition of ER [1] that uses only e10sEnabled (since e10sCohort made it only to the latest Beta, which just went live).
[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Assignee: rvitillo → nobody

rweiss@mozilla.com

Comment 22

•

9 years ago

I think this bug needs to be moved to a different component, as this component is primarily used to triage product metrics requests. This appears to be a quality measurements need.

Flags: needinfo?(benjamin)

Benjamin Smedberg

Reporter

Comment 23

•

9 years ago

This is a product metric required to assess the rollout of e10s, requested by Jeff. I really don't care which component it lives in.

Flags: needinfo?(benjamin)

Chris Peterson [:cpeterson]

Comment 24

•

9 years ago

Roberto, do we have DAU/MAU data from the Beta 46 experiment? What are the next actions to analyze the engagement ratios of our e10s and non-e10s beta cohorts?

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 25

•

9 years ago

Chris, as per Comment 21 the data, which includes the Beta 46 experiment, is available from re:dash. Hamilton owns the next step according to Comment 1.

Flags: needinfo?(rvitillo)

Chris Peterson [:cpeterson]

Comment 26

•

9 years ago

@ Roberto: thanks! @ Hamilton: are you still blocked waiting for the daily CSV data?

Flags: needinfo?(hulmer)

Chris Peterson [:cpeterson]

Updated

•

9 years ago

Assignee: nobody → hulmer

Hamilton

Comment 27

•

9 years ago

:cpeterson I can get the CSV data to build another dashboard easily, but may need assistance on a process to get it into s3 daily. I'll put something together to display the data along the lines of our other summary dashboards, but first, a few questions:
- re:dash already plots this data in a way similar to how I might plot it. Is there a need to put this up on a dashboard that only requires ldap if the e10s ERs are already graphed on re:dash? [1]
- It appears that the e10s numbers cycle between ~ .50 and .02, whereas the non-e10s numbers seem fairly stable. This doesn't seem like it is going to lead to good analysis, assuming these data make sense. Is there something I'm missing there? [2]
[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count
[2] https://sql.telemetry.mozilla.org/queries/82#133

Flags: needinfo?(rvitillo)

Flags: needinfo?(hulmer)

Flags: needinfo?(cpeterson)

Hamilton

Comment 28

•

9 years ago

Oh, scratch that first Q - obviously we are not comparing this to release, so that's the task. But my other Q about the cyclical nature of the ER in E10s holds.

Benjamin Smedberg

Reporter

Comment 29

•

9 years ago

Right now, we're shipping e10s as temporary experiments. Starting at some point we're going to be shipping it as a permanent progressive rollout, which is where this data will be more useful.
One thing I'm concerned about here is that we don't seem to be comparing apples to apples for e10s and non-e10s. The progressive rollout procedure is:
* divide the population in half: 50% have no-e10s and 50% maybe-e10s
* for the "maybe e10s" population, exclude people with addons, people who have used a11y, and people in RTL locales
* turn on e10s for the rest of the maybe-e10s population
What we really care about comparing is users in the no-e10s group who match the maybe-e10s criteria (no addons in particular). We know that people with addons have different engagement in general and would skew any direct comparison. Roberto and Felipe, does the data we have account for this?

Flags: needinfo?(felipc)

:Felipe Gomes (needinfo for replies!)

Comment 30

•

9 years ago

Yeah, the data allows that by directly comparing the "test" and "control" groups (assuming these groups are coming from the "e10sCohort" telemetry environment data). Every eligible user gets drawn on the 50% dice roll, but only users who match the maybe-e10s criteria are tagged with test or control. Users who don't are tagged with "disqualified", in both groups.
Chutten ran an analysis of how the first rollout test went, and we got the expected distribution (see bug 1261387 comment 5):
- 22.3% in test
- 22.1% in control
- 55.6% in disqualified/optedIn/optedOut

Flags: needinfo?(felipc)

Chris Peterson [:cpeterson]

Comment 31

•

9 years ago

(In reply to Hamilton from comment #27)
> - It appears that the e10s numbers cycle between ~ .50 and .02, where as the
> non-e10s numbers seem fairly stable. This doesn't seem like it is going to
> lead to good analysis, assuming these data makes sense. Is there something
> I'm missing there? [2]
e10s is disabled by default in the Beta channel, but we ran two e10s experiments during Beta 46, which accounts for the fluctuations. As Benjamin points out, we plan to continuously run 50/50 maybe-e10s/no-e10s cohorts in the Beta channel (hopefully starting with Beta 47). That should give us more stable DAU/MAU data.

Flags: needinfo?(cpeterson)

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 32

•

9 years ago

(In reply to Hamilton from comment #27)
> I can get the CSV data to build another dashboard easily, but may need
> assistance on a process to get it into s3 daily. I'll put something together
> to display the data along the lines of our other summary dashboards, but
> first, a few questions:
>
> - re:dash already plots this data in a way similar to how I might plot it.
> Is there a need to put this up on a dashboard that only requires ldap if the
> e10s ERs are already graphed on re:dash? [1]
There is no need to pull a CSV file out of Presto to build a dashboard, as you can directly use the plotting facilities provided by re:dash. What we are currently missing is a plot of the ER faceted by e10sCohort.

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Flags: needinfo?(rvitillo)

Chris Peterson [:cpeterson]

Comment 33

•

9 years ago

Hamilton, what are the next steps to create the ER dashboard for the e10sCohort users? We plan to have a fixed percentage of Beta users in the e10sCohort instead of running short experiments. We would like to have a dashboard that can watch incoming ER data.

Flags: needinfo?(hulmer)

Hamilton

Comment 34

•

9 years ago

jjensen and I chatted a bit about this last week, and I let this thread slip - apologies. It seems the best approach here would be to make these graphs in re:dash. We agreed that projects like this (things that can be spec'd out and thrown up on re:dash easily by engineers, and consumed by them) would best be executed by rvitillo's team or those who are working with the data accessed by re:dash (cc'd here). Am happy to work on this as well, but I think it'd be much more efficient if I wasn't involved on this project.

Flags: needinfo?(hulmer)

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 35

•

9 years ago

OK, I will take this then. Here is a first version of the dashboard based on the e10sControl flag [1]. I am going to be OOO tomorrow and at a work-week next week; I will try to follow up asap.
[1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count-beta-e10scontrol-

:Felipe Gomes (needinfo for replies!)

Comment 36

•

9 years ago

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #35)
> OK, I will take this then. Here is a first version of the dashboard based on
> the e10sControl flag [1]. I am going to be OOO tomorrow and at a work-week
> next week; I will try to follow up asap.
>
> [1] https://sql.telemetry.mozilla.org/dashboard/e10s-client-count-beta-e10scontrol-
Hey, this e10sCohort test/control graph is also really useful to me to check that the distribution is working properly. Is it possible to add another graph that displays all possible values for this field (i.e., also includes the optin/out, disqualified-*, etc.)?

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 37

•

9 years ago

(In reply to :Felipe Gomes (needinfo me!) from comment #36)
> Hey, this e10sCohort test/control graph is also really useful to me to check
> that the distribution is working properly. Is it possible to add another
> graph that displays all possible values for this field (i.e., also includes
> the optin/out, disqualified-*, etc.)?
Done.

Thomas Huelbert

Comment 38

•

9 years ago

as per RyanVM, assigned to Roberto - Can this be closed at this point?

Assignee: hulmer → rvitillo

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Chris Peterson [:cpeterson]

Updated

•

8 years ago

Depends on: 1277328