Closed Bug 1821189 Opened 2 years ago Closed 2 years ago

Put provenance attribution in `installation.first_seen` event

Categories

(Firefox :: Installer, task, P2)

Unspecified
Windows
task

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox112 --- fixed
firefox113 --- fixed

People

(Reporter: bytesized, Assigned: bytesized)

References

Details

(Whiteboard: [fidedi-ope])

Attachments

(1 file)

Filing this as a followup bug to do the work described by this comment:

The [installation.first_seen] event will be on the "event" ping and the Scalars will be on the "new-profile" ping. You should be able to link the two by client_id (since there should only ever be one "new-profile" ping per client_id). There's no guarantee you'd get both -- I can think of circumstances where you'd end up with only one or the other -- but in the case of folks who stick around and aren't obviously robots with fifteen second initial sessions? I think you should have a decent chance of being able to link.

But by all means dump the scalar info into the event extras to save yourself the hassle.

Remember, so long as we're not talking event-per-vsync sorts of things, optimize not for the size of the dataset: optimize for the length of time it takes to analyze the data.

Or, put another way: data that is too hard to use will not be used. We don't want to collect data that won't be used. So make it easy.

Blocks: 1814968
Whiteboard: [fidedi-ope]

It's not actually clear to me that we can do this. Events appear to be limited to 10 keys in their extras object. There already appear to be 10 keys in the extras object for installation.first_seen:

  1. version
  2. build_id
  3. admin_user
  4. profdir_existed
  5. from_msi
  6. silent
  7. default_path
  8. install_existed
  9. other_inst
  10. other_msix_inst
Flags: needinfo?(nalexander)

(In reply to Robin Steuber (they/them) [:bytesized] from comment #1)

It's not actually clear to me that we can do this. Events appear to be limited to 10 keys in their extras object. There already appear to be 10 keys in the extras object for installation.first_seen:

  1. version
  2. build_id
  3. admin_user
  4. profdir_existed
  5. from_msi
  6. silent
  7. default_path
  8. install_existed
  9. other_inst
  10. other_msix_inst

Oh dear. TIL. chutten, is there anything to be done here? Can we have another level of nesting to avoid these limits some way?

Flags: needinfo?(nalexander) → needinfo?(chutten)
Priority: -- → P2

Nesting isn't supported because these string -> string maps are interpreted as that structure on ingestion.

(And I don't recommend encoding your own second-level nesting in the event extra values either. No one should write parsers in SQL).

With present technology I'd recommend having a second event with a similar name (maybe with a suffix like _ext) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).

Otherwise or as well, we could look into raising the event extras limit (might just be a matter of raising the limits). Changes to Legacy Telemetry are still possible, after all. But we'd need to consider the ramifications. Glean has the same limit, after all, so we may not have yet had a proper think about the impact of large numbers of event extras. (e.g., should we change the event record trigger for flushing storage and submitting the ping?)

Flags: needinfo?(chutten)

(In reply to Chris H-C :chutten from comment #3)

With present technology I'd recommend having a second event with a similar name (maybe with a suffix like _ext) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).

@nalexander does this sound like an improvement over the current situation? Should I get started on this?

Flags: needinfo?(nalexander)

(In reply to Robin Steuber (they/them) [:bytesized] from comment #4)

(In reply to Chris H-C :chutten from comment #3)

With present technology I'd recommend having a second event with a similar name (maybe with a suffix like _ext) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).

@nalexander does this sound like an improvement over the current situation? Should I get started on this?

Argh -- a partially typed comment that I missed finishing. Anyway:

Thanks, :chutten.

:bytesized: yes, I think we'd better to do this, since I'm not having a lot of luck joining installation.first_seen events to new-profile pings in either direction.

Earlier I witnessed the reverse, lots (>50%) of new_profile pings without installation.first_seen events, over in https://bugzilla.mozilla.org/show_bug.cgi?id=1811374. (That is expected, prompting that ticket.) See https://sql.telemetry.mozilla.org/queries/90885/source?p_range=2022-12-01--2022-12-28.

But now I witness lots of installation.first_seen events without corresponding new_profile pings, perhaps >60%! See https://sql.telemetry.mozilla.org/queries/90886/source?p_range=2022-11-01--2022-11-29.

So yes, we'd better get an additional event, in the hopes that we'll get two events and be able to actually have coherent data from the single system. Thanks!

Flags: needinfo?(nalexander)
Assignee: nobody → bytesized

We can't add the provenance data to the installation.first_seen extra data because it is already at its maximum number of keys. So instead we will add the installation.first_seen_prov_ext event which will be sent at the same time as installation.first_seen and will contain provenance attribution data in its extras object.

I confirmed with :chutten that since this telemetry sends precisely the same values that are sent by the code added in Bug 1815023, no additional data review is needed for this telemetry change.

Pushed by rsteuber@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c4f8098f81bb Associate provenance attribution with `installation.first_seen` event r=nalexander
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

Comment on attachment 9322883 [details]
Bug 1821189 - Associate provenance attribution with installation.first_seen event r=nalexander!

Beta/Release Uplift Approval Request

  • User impact if declined: This is a telemetry-only change. So there would be no impact to users, but it will negatively impact our ability to analyze what sort of attribution data is available in the Zone Identifier Alternate Data Stream on Windows.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Code changes are telemetry-only. And they are wrapped in try/catch blocks. So even if there is something wrong with the code, there should be no user-facing impact.
  • String changes made/needed: None
  • Is Android affected?: No
Attachment #9322883 - Flags: approval-mozilla-beta?

Comment on attachment 9322883 [details]
Bug 1821189 - Associate provenance attribution with installation.first_seen event r=nalexander!

Approved for 112.0b3

Attachment #9322883 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: