Closed Bug 1292360 Opened 8 years ago Closed 8 years ago

Include stub attribution code with the main Telemetry ping

Categories

(Toolkit :: Telemetry, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: ckprice, Assigned: mhowell)

References

Details

(Whiteboard: [measurement:client])

Attachments

(1 file)

This bug is opened to incorporate the stub attribution code accessible via the module in bug 1277627 into the main Telemetry ping. The data in this ping will be used to build the dashboards designed in bug1259614comment4. We're having a meeting to confirm the assignee and specifications of this bug.
Priority: -- → P3
Whiteboard: [measurement:client]
This data will eventually be viewed in the attribution dashboards in Tableau[0]. Ref: http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/main-ping.html Could this go under `// The following properties...` as `attributionCode` attributionCode : { "source": "sample_source", "medium": "sample_medium", "campaign": "sample_campaign", "content": "sample_content" } This could also just be the raw string, but would like to confirm that it can be parsed downstream if not here. [0] https://bugzilla.mozilla.org/show_bug.cgi?id=1259614#c4
To provide a little more detail, each of the four data points Cory mentioned is a short string (almost certainly < 50 characters). All four are fixed and will not change during the life of an installation, except that they should expire and no longer be sent after 13 weeks from the time the installation is created. I'm not sure how hard it is to add a section to the main ping like that; it doesn't seem hard at all from the client perspective, but I don't have any idea how the rest of the pipeline would take it. Other options include adding a similar section to the environment data (which may or may not be any easier, I would not know), and just adding four new scalar probes (not ideal, probably a last resort).
Flags: needinfo?(gfritzsche)
I think this best fits into the environment data: http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/environment.html Putting it into the "settings" section as an optional section "settings.attribution" (similar to "settings.update") would fit. Benjamin, does that sound good to you? Code-wise, this would be implemented in TelemetryEnvironment.jsm. The settings section is updated here: https://dxr.mozilla.org/mozilla-central/rev/65520f4cf4cc57b29aea6d1a0d608d2d30ecb8f5/toolkit/components/telemetry/TelemetryEnvironment.jsm#1128 But given that loading the attribution code is async, you probably want to add a `_updateAttribution()` call here: https://dxr.mozilla.org/mozilla-central/rev/65520f4cf4cc57b29aea6d1a0d608d2d30ecb8f5/toolkit/components/telemetry/TelemetryEnvironment.jsm#761 ... keep in mind to use `limitStringToLength()` for values. Test updates would go into test_TelemetryEnvironment.js, updating this: https://dxr.mozilla.org/mozilla-central/rev/65520f4cf4cc57b29aea6d1a0d608d2d30ecb8f5/toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js#340 ... and possibly adding some explicit testing. On the data processing side... is this data settled on going into Tableau or re:dash? If it is Tableau, is there an existing data source that you plan on extending with the attribution data?
Flags: needinfo?(gfritzsche) → needinfo?(benjamin)
(In reply to Georg Fritzsche [:gfritzsche] from comment #3) > On the data processing side... is this data settled on going into Tableau or > re:dash? > If it is Tableau, is there an existing data source that you plan on > extending with the attribution data? NI :cmore to weigh in on this.
Flags: needinfo?(chrismore.bugzilla)
Yes, environment.settings is fine.
Flags: needinfo?(benjamin)
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73018/#review70784 ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1190 (Diff revision 1) > + * Update the cached attribution data object. > + * @returns Promise<> resolved when the I/O is complete. > + */ > + _updateAttribution: Task.async(function* () { > + let data = yield AttributionCode.getAttrDataAsync(); > + this._currentEnvironment.settings.attribution = data; You mentioned limitStringToLength() but I could only find a few specific places (the add-ons data) where it's used, not for other strings, so I left it out. ::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:349 (Diff revision 1) > e10sCohort: "string", > telemetryEnabled: "boolean", > locale: "string", > update: "object", > userPrefs: "object", > + attribution: "object", I'm testing only for the presence of the attribution object; the test for the AttributionData module should verify that object's content, because I'm just copying it directly out of that module for telemetry purposes.
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73018/#review70996 ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1149 (Diff revision 1) > channel: updateChannel, > enabled: Preferences.get(PREF_UPDATE_ENABLED, true), > autoDownload: Preferences.get(PREF_UPDATE_AUTODOWNLOAD, true), > }, > userPrefs: this._getPrefData(), > + attribution: {}, // filled in later by _updateAttribution() Am i right in understanding that most Firefox clients won't have data for `attribution`? If so, we should make it an optional property (i.e. only add it when we have attribution data). ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1150 (Diff revision 1) > enabled: Preferences.get(PREF_UPDATE_ENABLED, true), > autoDownload: Preferences.get(PREF_UPDATE_AUTODOWNLOAD, true), > }, > userPrefs: this._getPrefData(), > + attribution: {}, // filled in later by _updateAttribution() > }; We need to update the documentation too in toolkit/components/telemetry/docs/data/environment.rst. Please add a paragraph describing the data and when it may not be available etc. ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1190 (Diff revision 1) > + * Update the cached attribution data object. > + * @returns Promise<> resolved when the I/O is complete. > + */ > + _updateAttribution: Task.async(function* () { > + let data = yield AttributionCode.getAttrDataAsync(); > + this._currentEnvironment.settings.attribution = data; I don't see any bounding limits on the parsed attribution data. Please make sure that: * the amount of properties in `data` is limited * the individual key & value lengths are limited
(In reply to Cory Price [:ckprice] from comment #4) > (In reply to Georg Fritzsche [:gfritzsche] from comment #3) > > On the data processing side... is this data settled on going into Tableau or > > re:dash? > > If it is Tableau, is there an existing data source that you plan on > > extending with the attribution data? > NI :cmore to weigh in on this. Given all of this new data, we should just make this a new data source. In bug 1226379, Mark Reid created a new datasource for us for cohort retention. If we add this data and all of the combinations available to Mark's file, it will make the file exponentially bigger.
Flags: needinfo?(chrismore.bugzilla)
Assignee: nobody → mhowell
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73020/#review71450 ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1191 (Diff revisions 1 - 2) > * Update the cached attribution data object. > * @returns Promise<> resolved when the I/O is complete. > */ > _updateAttribution: Task.async(function* () { > let data = yield AttributionCode.getAttrDataAsync(); > + if (Object.keys(data).length > 0) { We should also check for the maximum number of keys in the object. ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1194 (Diff revisions 1 - 2) > _updateAttribution: Task.async(function* () { > let data = yield AttributionCode.getAttrDataAsync(); > + if (Object.keys(data).length > 0) { > + let failed = false; > + for (let key in data) { > + if (data[key].length > MAX_ATTRIBUTION_STRING_LENGTH) { Are you sure we should drop the data completely? We can just use `limitStringToLength()` here. This way, we see at least truncated data coming in if something went wrong. ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1195 (Diff revisions 1 - 2) > + failed = true; > + break; If you think we should just drop the data, just `return` here. ::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:779 (Diff revisions 1 - 2) > > // Spoof the the hotfixVersion > Preferences.set("extensions.hotfix.lastVersion", APP_HOTFIX_VERSION); > > + // Create the attribution data file, so that settings.attribution will exist. > + if (gIsWindows) { Let's move this into a function `spoofAttributionData()`. Do we need to clear the `AttributionCode` cache here in case some other code already triggered loading the file? ::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1520 (Diff revisions 1 - 2) > + // Destroy the attribution data file we created. > + if (gIsWindows) { If any test before this fails, we will not remove that file, which means we could break other tests running on the same machine & build. You need to use `do_register_cleanup()` in `run_test()` to remove the file. ::: toolkit/components/telemetry/docs/data/environment.rst:64 (Diff revision 2) > // in this block > "pref.name.value": value // some prefs send the value > "pref.name.url": "<user-set>" // For some privacy-sensitive prefs > // only the fact that the value has been changed is recorded > }, > + attribution: { // only present if the installation has attribution data `// optional, only ...` ::: toolkit/components/telemetry/docs/data/environment.rst:65 (Diff revision 2) > "pref.name.value": value // some prefs send the value > "pref.name.url": "<user-set>" // For some privacy-sensitive prefs > // only the fact that the value has been changed is recorded > }, > + attribution: { // only present if the installation has attribution data > + // all of these values are absent when not defined `// all of these values are optional. ::: toolkit/components/telemetry/docs/data/environment.rst:331 (Diff revision 2) > +attribution > +~~~~~~~~~~~ > + > +This object contains the attribution data for the product installation. > + > +Attribution data is used to link installations of Firefox with the source that the user arrived at the Firefox download page from. It would indicate, for instance, when a user executed a web search for Firefox and arrived at the download page from there, directly navigated to the site, clinked on a link from a particular social media campaign, etc. s/clinked/clicked/
Attachment #8783053 - Flags: review?(gfritzsche)
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73020/#review71450 > We should also check for the maximum number of keys in the object. The attribution module checks that the set of keys only includes ones that it knows about before it returns the object, so that constrains the number of keys. > Are you sure we should drop the data completely? > We can just use `limitStringToLength()` here. > This way, we see at least truncated data coming in if something went wrong. I was afraid that might result in misleading data if the truncation got really unlucky, and I thought no data at all would be preferable to that. But that's extremely unlikely.
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73018/#review71704 This looks good with the below fixed. You still need to request data collection review: https://wiki.mozilla.org/Firefox/Data_Collection We also need to update the schema for the main ping with this: https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/telemetry/main.schema.json https://github.com/mozilla-services/mozilla-pipeline-schemas/tree/master/validation/telemetry ::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:278 (Diff revisions 2 - 3) > + let file = Services.dirsvc.get("LocalAppData", Ci.nsIFile); > + file.append("mozilla"); > + file.append(AppConstants.MOZ_APP_NAME); > + file.append("postSigningData"); This is exactly the same as in the function below. Lets move this to a utility function, say `getAttributionFile()`.
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; I somehow cleared the review flag in RB, now i can't figure out how to set r+. Setting r+ here and moving on.
Attachment #8783053 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #15) > Comment on attachment 8783053 [details] > Bug 1292360 - Add attribution data to the telemetry environment block; > > I somehow cleared the review flag in RB, now i can't figure out how to set > r+. > Setting r+ here and moving on. Don't worry about it; autoland doesn't work without an r+ in reviewboard, but I can land it old-school instead. ;) I'll make that last change and also do the data collection review request and put in a pull request to mozilla-pipeline-schemas before I land this. Thanks for the review and all the advice! Much appreciated.
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; Requesting formal data collection approval for this change.
Attachment #8783053 - Flags: feedback?(benjamin)
Just to keep the bug up-to-date: this is currently blocked on a discussion that's happening in email about the "source" field (the definition of that field, and minimizing risk around it).
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; https://reviewboard.mozilla.org/r/73020/#review80328 ::: toolkit/components/telemetry/docs/data/environment.rst:66 (Diff revision 3) > "pref.name.url": "<user-set>" // For some privacy-sensitive prefs > // only the fact that the value has been changed is recorded > }, > + attribution: { // optional, only present if the installation has attribution data > + // all of these values are optional. > + source: <string>, // domain of the referrer to the product download page After a meeting with cmore, please revise the description here to say: "When install happens via a known partner, the source records the referring partner domain." data-review=me with that change
Attachment #8783053 - Flags: feedback?(benjamin) → feedback+
Thanks, bsmedberg; I know that review turned into kind of a big deal, but it seems to have worked out. gfritzsche, now that this patch has data collection approval, I've sent a pull request to mozilla-services/mozilla-pipeline-schemas. I have a hunch it might be a good idea to wait on landing this patch until that PR is merged, so I'll do that.
Flags: needinfo?(gfritzsche)
r+ on the schema change.
Thanks, Mark. Let's finally land this thing.
Flags: needinfo?(gfritzsche)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0bf376ad9b85c1eb5ec3707428151390788976a1 Bug 1292360 - Add attribution data to the telemetry environment block; r=gfritzsche, data-review=bsmedberg
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
NI :cmore for an uplift request.
Flags: needinfo?(chrismore.bugzilla)
(In reply to Cory Price [:ckprice] from comment #25) > NI :cmore for an uplift request. Thanks for the ping, Cory. Yes, I would like to request for the uplift to the earliest train without introduction extreme risk. Why? Because 2017 is going to start with heavy investments in acquisition and retention strategies and tactics. We are going to need insights on which are working out, so we can quickly evaluate winners/losers and focus on the tactics that are leading to higher usage of Firefox. Attribution will give us a clearer picture of our acquisition funnel and right now, nearly all acquisition channel's success metrics are done at a level of vanity. Telemetry is a much better measurement how newly acquired users are turning into active users. If we don't get this uplifted and on the trains asap, we'll be into 2017 trying to grow Firefox's user base without very good indications of success. Cory: What else do I need to do? Thanks
Flags: needinfo?(chrismore.bugzilla)
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; Approval Request Comment [Feature/regressing bug #]: This supports the Stub Attribution project -- bug 1259607. Bug 1277627 was uplifted in support. [User impact if declined]: See comment 26 for more reasons. [Describe test coverage new/current, TreeHerder]: Test coverage found here: https://wiki.mozilla.org/Firefox/Stub_Attribution/Test_Plan [String/UUID change made/needed]: N/A
Attachment #8783053 - Flags: approval-mozilla-beta?
Attachment #8783053 - Flags: approval-mozilla-aurora?
Cory, could you fill also this section? "[Risks and why]:" Thanks
Flags: needinfo?(cprice)
(In reply to Sylvestre Ledru [:sylvestre] from comment #28) > Cory, could you fill also this section? > "[Risks and why]:" > Thanks [Risks and why]: Medium: uses a new module, and affects Telemetry. But the code is relatively simple, and I forgot to mention that a unit test is added in this patch.
Flags: needinfo?(cprice)
Comment on attachment 8783053 [details] Bug 1292360 - Add attribution data to the telemetry environment block; Sub attribution is planned for Fx50, has stabilized on Nightly for a week, Aurora51+, Beta50+
Attachment #8783053 - Flags: approval-mozilla-beta?
Attachment #8783053 - Flags: approval-mozilla-beta+
Attachment #8783053 - Flags: approval-mozilla-aurora?
Attachment #8783053 - Flags: approval-mozilla-aurora+
Question for anyone here: Will the attribution code (source/medium/campaign/content) be saved somewhere within Firefox's preferences that is available to view in about:config or about:telemetry?
(In reply to Chris More [:cmore] from comment #33) > Question for anyone here: Will the attribution code > (source/medium/campaign/content) be saved somewhere within Firefox's > preferences that is available to view in about:config or about:telemetry? It should be listed in about:telemetry under Environment Data -> settings.
(In reply to Matt Howell [:mhowell] from comment #34) > (In reply to Chris More [:cmore] from comment #33) > > Question for anyone here: Will the attribution code > > (source/medium/campaign/content) be saved somewhere within Firefox's > > preferences that is available to view in about:config or about:telemetry? > It should be listed in about:telemetry under Environment Data -> settings. do you have the exact name of the preference key name? is it attribution_code or attributioncode or something different?
(In reply to Chris More [:cmore] from comment #35) > do you have the exact name of the preference key name? is it > attribution_code or attributioncode or something different? There will be up to four keys listed there: attribution.source, attribution.medium, attribution.campaign, and attribution.content. The telemetry documentation [0] briefly explains what each of those mean. [0] https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/environment.html
(In reply to Matt Howell [:mhowell] from comment #36) > (In reply to Chris More [:cmore] from comment #35) > > do you have the exact name of the preference key name? is it > > attribution_code or attributioncode or something different? > > There will be up to four keys listed there: attribution.source, > attribution.medium, attribution.campaign, and attribution.content. The > telemetry documentation [0] briefly explains what each of those mean. > > [0] > https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/ > telemetry/data/environment.html That's exactly what I needed. Thank you!
(In reply to Matt Howell [:mhowell] from comment #36) > (In reply to Chris More [:cmore] from comment #35) > > do you have the exact name of the preference key name? is it > > attribution_code or attributioncode or something different? > > There will be up to four keys listed there: attribution.source, > attribution.medium, attribution.campaign, and attribution.content. The > telemetry documentation [0] briefly explains what each of those mean. > > [0] > https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/ > telemetry/data/environment.html matt: one more question regarding these 4 keys. Will you also be able to find them within about:config and if someone wanted to clear out the values, can they? Maybe another way to say it, for the 4 keys, how about would someone modify them if they were inclined to do so?
Flags: needinfo?(mhowell)
They're not prefs, because they're associated with an installation and not a profile. They're stored in URL-encoded format in a file called "postSigningData" located in "C:\Users\[username]\AppData\Local\Mozilla\Firefox", which can be edited or just deleted by the user, it's not protected in any way.
Flags: needinfo?(mhowell)
(In reply to Matt Howell [:mhowell] from comment #39) > They're not prefs, because they're associated with an installation and not a > profile. They're stored in URL-encoded format in a file called > "postSigningData" located in > "C:\Users\[username]\AppData\Local\Mozilla\Firefox", which can be edited or > just deleted by the user, it's not protected in any way. Ah ok, but does really editing it or deleting really change anything since it was already sent during the activation ping? It seems like it would only just change what is on stored on the machine, but not the fact of what attribution cohort they exist in.
(In reply to Chris More [:cmore] from comment #40) > Ah ok, but does really editing it or deleting really change anything since > it was already sent during the activation ping? It seems like it would only > just change what is on stored on the machine, but not the fact of what > attribution cohort they exist in. It would affect future telemetry pings, beginning after the browser is restarted, but I think you're right that that wouldn't really accomplish much. The only way to prevent even the activation ping (or, since we don't actually have that yet, the initial main ping) would be to delete the file after installation but before starting the browser for the first time.
matt: do you have any idea why we are seeing in telemetry that most profiles that have attribution don't have it in their early telemetry pings, but it appears to show up later? Isn't attribution implemented that it happens immediately on install of Firefox? Can you think of any reasons why telemetry would be sending pings without any attribution codes and then later it shows up?
Flags: needinfo?(mhowell)
Not really being a telemetry client expert, I can only come up with one theory: because getting the attribution data requires disk I/O, it's done asynchronously, some time after most of the environment block is initialized, so maybe that means that pings being generated very early on startup don't have access to that information yet. But I'm basically guessing, :gfritzsche's team would be who to talk to for a real answer.
Flags: needinfo?(mhowell)
gfritzsche: any comment on comment 42 or comment 43? The analysis from Saptarshi/Su is showing that most users don't have attribution on their first payload, but then after a session or two, it starts sending. I don't understand why it wouldn't be sent immediately. Thanks
Flags: needinfo?(gfritzsche)
We do wait for the info from `AttributionCode.getAttrDataAsync()`: https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/components/telemetry/TelemetryEnvironment.jsm#870 https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/components/telemetry/TelemetryEnvironment.jsm#1424 That initializing and waiting happens very early in the session and should finish before the first ping. It is probably fine from the Telemetry side. Have we had success reproducing this locally with fresh profiles? That would make it easy to investigate.
Flags: needinfo?(gfritzsche) → needinfo?(mhowell)
I don't seem to be able to reproduce the problem on either my main profile or a fresh one. I'm not 100% certain because I'm having a surprisingly hard time decoding the content of the requests being sent, but I can see the attribution data filled in at the correct place in about:telemetry by the time the first pings go out shortly after startup. So unless something is disconnected between about:telemetry and what actually gets sent out, then I'm not seeing this happening at all.
Flags: needinfo?(mhowell)
Have you checked the actual archived pings or just the "current" data? You can access them through the ping selector button: https://imgur.com/a/Ky4Re
Flags: needinfo?(mhowell)
But lets move the investigation to a new follow-up bug? That's probably easier to track.
Oh, I wasn't aware of that archived pings feature, that's really neat. Using that I do see the issue; as soon as I can get about:telemetry open in a new profile (before there are any archived pings), the current ping view shows the attribution data, but once the first batch of archived pings appears (which I'm forcing by restarting the browser), they are missing the attribution object. I am at a complete loss and do not understand how this is possible even in theory. I don't see anything being done differently from, e.g., the profile creation date, and that's always there. I don't think I'm the right person to be investigating this (but then I also didn't think I was the right person to write this code in the first place...).
Flags: needinfo?(mhowell)
Depends on: 1400870
(In reply to Matt Howell [:mhowell] from comment #49) > Oh, I wasn't aware of that archived pings feature, that's really neat. Using > that I do see the issue; as soon as I can get about:telemetry open in a new > profile (before there are any archived pings), the current ping view shows > the attribution data, but once the first batch of archived pings appears > (which I'm forcing by restarting the browser), they are missing the > attribution object. > > I am at a complete loss and do not understand how this is possible even in > theory. I don't see anything being done differently from, e.g., the profile > creation date, and that's always there. I don't think I'm the right person > to be investigating this (but then I also didn't think I was the right > person to write this code in the first place...). Hey Matt, fear thou not! I filed bug 1400870 and I will be taking a look at this in the next couple of days. Thanks for showing off how to reproduce the problem!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: