Bug 1376493
Opened 7 years ago
Closed 7 years ago
Aggregate String Scalars as Simple Counts
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bugzilla, Assigned: frank)
In bug 1323069 I added a new scalar probe, A11Y_INSTANTIATORS. This probe contains string values, so it is not displayed on t.m.o (telemetry.mozilla.org).
Ideally we would like to be able to generate a histogram from that probe where each unique, non-empty string receives its own bucket.
Comment 2•7 years ago
Note this will help us understand risk as we ship Windows e10s a11y support...
Comment 3•7 years ago (Assignee)
Hi David, apologies for the delay. Chutten and I discussed this and came up with a solution for displaying string scalars. Basically, we'll end up with a single number per string: the total number of instances of that string across all pings. It will be displayed the way keyed scalars are now, but with a single number per key instead of a distribution. I'll make this bug track those changes.
Assignee: nobody → fbertsch
Points: --- → 2
Component: Datasets: General → Datasets: Telemetry Aggregates
Flags: needinfo?(fbertsch)
Priority: -- → P1
Summary: Need to be able to aggregate A11Y_INSTANTIATORS scalar → Aggregate String Scalars as Simple Counts
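The aggregation proposed in comment 3 can be sketched roughly as follows. This is only an illustration, not the python_mozaggregator implementation: the flat ping layout (a dict mapping scalar name to value) and the function name `count_string_scalar` are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_string_scalar(pings: Iterable[Mapping[str, object]],
                        scalar_name: str) -> Counter:
    """Count how many pings reported each non-empty string value.

    Assumption: each ping is a flat mapping of scalar name -> value.
    Real telemetry pings are nested; this layout is hypothetical.
    """
    counts: Counter = Counter()
    for ping in pings:
        value = ping.get(scalar_name)
        # Skip pings where the scalar is missing, empty, or not a string.
        if isinstance(value, str) and value:
            counts[value] += 1
    return counts
```

The result is one number per distinct string, which is the "single number instead of a distribution" shape described above.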
Comment 4•7 years ago (Assignee)
Benjamin: We are running into a question on the PR [0]: do we need to limit the strings to just those that occur in more than 1% (or some other percentage) of incoming pings? 1% is the threshold used for the hardware report. If we apply this to string scalars, do we also need to do the same for keyed histograms?
[0] https://github.com/mozilla/python_mozaggregator/pull/49
Flags: needinfo?(benjamin)
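For illustration, the threshold raised in comment 4 could look something like the sketch below. It assumes per-string counts where each ping contributes at most once per string, so a count divided by the total number of pings is the share of pings containing that string. The function name `filter_by_ping_share` and the `min_share` parameter are hypothetical, and the 1% default is only the figure mentioned for the hardware report.

```python
from typing import Dict, Mapping


def filter_by_ping_share(counts: Mapping[str, int],
                         total_pings: int,
                         min_share: float = 0.01) -> Dict[str, int]:
    """Keep only strings seen in more than `min_share` of all pings.

    Assumption: counts[s] is the number of pings that reported string s.
    """
    if total_pings <= 0:
        return {}
    return {
        value: n
        for value, n in counts.items()
        if n / total_pings > min_share
    }
```

Rare strings are the ones most likely to identify an individual client, which is why a cut-off of this kind came up in the discussion.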
Comment 5•7 years ago
In order to reduce risk, I do not think we should display this data by default/automatically on telemetry.mozilla.org. The identification risk of strings is naturally higher.
That doesn't mean that we can't ever do it: but I'd prefer that teams explicitly review incoming data to ensure that it's the data they expect before making it public.
So to start out, I recommend analyzing this using a dataset (is this data included in main-summary?) in STMO, or using an ATMO query. Once you've reviewed the results, it's ok to publish using the STMO publishing facility, and that's less risky than automatically publishing aggregates.
Flags: needinfo?(benjamin)
Comment 6•7 years ago (Assignee)
> In order to reduce risk, I do not think we should display this data by
> default/automatically on telemetry.mozilla.org. The identification risk of
> strings is naturally higher.
In that case, I can create a whitelist of string scalars to aggregate. We should also update the documentation to mention that if teams want their string scalars aggregated, they need to file a bug so we can add them.
Benjamin, would you want teams to request a data review from a data steward before asking to make a string scalar public?
Flags: needinfo?(benjamin)
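A minimal sketch of the whitelist idea from comment 6, assuming the aggregator sees string scalars as a name-to-value mapping. The set `REVIEWED_STRING_SCALARS`, its single placeholder entry, and the function `select_whitelisted` are all hypothetical and not part of python_mozaggregator.

```python
from typing import AbstractSet, Dict, Mapping

# Hypothetical whitelist: only scalars that have passed data review
# would be listed here. The entry below is a placeholder name.
REVIEWED_STRING_SCALARS = frozenset({"EXAMPLE_REVIEWED_SCALAR"})


def select_whitelisted(
    string_scalars: Mapping[str, str],
    whitelist: AbstractSet[str] = REVIEWED_STRING_SCALARS,
) -> Dict[str, str]:
    """Return only the string scalars that are explicitly whitelisted."""
    return {
        name: value
        for name, value in string_scalars.items()
        if name in whitelist
    }
```

Anything not on the list is dropped before aggregation, so a new string scalar stays private until someone files a bug and it is reviewed.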
Comment 7•7 years ago
Yes. I think the risk profile of that is enough that we'd like to review those.
Flags: needinfo?(benjamin)
Comment 8•7 years ago (Assignee)
Here is a query listing all the A11Y_INSTANTIATORS strings, with their counts [0]. David, is it acceptable to make these public? Benjamin, requesting a data-review. Please let us know if you need more information.
[0] https://sql.telemetry.mozilla.org/queries/5484
Flags: needinfo?(dbolter)
Flags: needinfo?(benjamin)
Comment 9•7 years ago
Thank you! I don't have a need to make this public.
Sorting by count, high to low, is pretty interesting!
Flags: needinfo?(surkov.alexander)
Flags: needinfo?(jmathies)
Flags: needinfo?(dbolter)
Flags: needinfo?(aklotz)
Comment 10•7 years ago (Assignee)
> Thank you! I don't have a need to make this public.
What I mean is, if/when we add this to the aggregator, all of these will be public on TMO. I want to make sure these aren't surprising to you, and all of the values are expected.
Or do you mean you don't need them on TMO any longer?
Comment 11•7 years ago
I'd prefer that David's team publish a one-off or curated report with this data rather than making it public by default. So I guess that counts as data-review denied?
Flags: needinfo?(benjamin)
Comment 12•7 years ago
I've imported the dataset from comment 8 into a Google Sheet and shared it with some folks. We can add notes/findings to the rows there and decide what we want to do next...
Updated•7 years ago
Flags: needinfo?(jmathies)
Comment 13•7 years ago
For now, let's not fix this in the aggregator.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(surkov.alexander)
Flags: needinfo?(aklotz)
Resolution: --- → WONTFIX
Updated•2 years ago
Component: Datasets: Telemetry Aggregates → General