Closed Bug 1328678 Opened 8 years ago Closed 3 years ago

Aggregator should have more buckets for count histograms

Categories

(Data Platform and Tools :: General, defect, P3)

Points: 3

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: frank, Unassigned)

For some count histograms, most counts are beyond 10000 [0], but the aggregator's bucket list stops at 10000, so every count of 10000 or more lands in that one bucket [1]. My idea would be linear buckets for the first 25 or so, then calculated exponential buckets for the rest, tacking on new buckets when we need them. This would allow differentiation far beyond 10000, while still keeping precision for low counts. The big issue would be backfill. We could either do an actual backfill on a few recent count histograms (such as the one mentioned), or include some sort of note on TMO to let people know the difference.

[0] https://mzl.la/2jamLUb
[1] https://github.com/mozilla/python_mozaggregator/blob/master/mozaggregator/aggregator.py#L16
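To make the proposal concrete, here is a minimal sketch of "linear for the first 25 or so, then exponential, tacking on new buckets when we need them". It is not the aggregator's actual code: the names `LINEAR_COUNT`, `GROWTH`, `next_bound`, `extend_bounds` and `bucket_for` are hypothetical, and the 8% growth factor is an assumption picked only for illustration.

```python
import bisect

LINEAR_COUNT = 25  # assumed: buckets 0..24 each hold exactly one count value
GROWTH = 1.08      # assumed: each later bucket starts roughly 8% above the previous one


def next_bound(last, growth=GROWTH):
    """Lower bound of the bucket following the one that starts at `last`."""
    # max(...) guarantees the bounds keep increasing even when rounding stalls.
    return max(last + 1, int(last * growth))


def extend_bounds(bounds, value, growth=GROWTH):
    """Append exponential bounds in place until the last bucket's range contains `value`."""
    while next_bound(bounds[-1], growth) <= value:
        bounds.append(next_bound(bounds[-1], growth))
    return bounds


def bucket_for(bounds, value):
    """Return the lower bound of the bucket that `value` falls into."""
    # Counts are non-negative; clamp so a stray negative lands in bucket 0.
    return bounds[bisect.bisect_right(bounds, max(value, 0)) - 1]


# Start with the linear part only, then grow on demand when a large count arrives.
bounds = list(range(LINEAR_COUNT))
extend_bounds(bounds, 250000)
print(len(bounds), bucket_for(bounds, 250000))
```

Because the bounds are only ever appended to, existing low buckets keep their meaning. Data that was already aggregated into the old top bucket cannot be split after the fact, which is the backfill problem mentioned above.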
Blocks: 1297867
Variations of this issue show up time and time again; we should consider the use of histograms with dynamic range (e.g. [1][2]) to solve this class of problems.

[1] https://github.com/HdrHistogram/HdrHistogram
[2] https://github.com/vitillo/lua_tdigest
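As a rough illustration of what "dynamic range" buys, the sketch below buckets a value by keeping only its top few significant bits, so the relative error stays bounded however large the count gets. This is a simplified stand-in for the general idea, not the HdrHistogram or t-digest API; `dynamic_bucket` and `sub_bits` are hypothetical names, and values are assumed to be non-negative integers.

```python
from collections import Counter


def dynamic_bucket(value, sub_bits=5):
    """Round `value` down so that only its top (sub_bits + 1) bits are kept.

    Values below 2 ** (sub_bits + 1) are stored exactly; above that, the
    bucket width doubles with each power of two, so the relative error
    stays at or below 1 / 2 ** sub_bits.
    """
    if value < (1 << (sub_bits + 1)):
        return value
    shift = value.bit_length() - (sub_bits + 1)
    return (value >> shift) << shift


# No fixed upper bound is needed: buckets appear as values are seen.
histogram = Counter(dynamic_bucket(v) for v in [3, 3, 70, 10000, 10050, 1000000])
print(sorted(histogram.items()))
```

With `sub_bits=5`, counts up to 63 keep full precision and larger counts are off by at most about 3%, with no hard ceiling like the current 10000 bucket.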
Points: --- → 3
Priority: -- → P3
Any action here? This skews one of the Quantum engagement metrics. If we collect data and then make it less useful in aggregation, we can stop collecting the data and save the bandwidth/storage.
(In reply to :Harald Kirschner :digitarald from comment #2)
> Any action here? This skews one of the Quantum engagement metrics. If we collect
> data and then make it less useful in aggregation, we can stop collecting the
> data and save the bandwidth/storage.

If this is blocking Quantum work we can certainly move it up the priority queue. Question: How and why are you using aggregate data for engagement measures? Are you using the data to create a dashboard somewhere? Or is this just for viewing in TMO?
Flags: needinfo?(hkirschner)
We are planning to use scroll engagement as a proxy for improved performance in pref-flipping experiments.
Flags: needinfo?(hkirschner)
Is this usage then predicated on bug 1336989? Do you also need to see experiments and branches?
Flags: needinfo?(hkirschner)
If "experiments" means the pref-flipping experiment pipeline, then yes.
Flags: needinfo?(hkirschner)
No longer blocks: 1255755
Component: Metrics: Pipeline → Datasets: Telemetry Aggregates
Product: Cloud Services → Data Platform and Tools
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Component: Datasets: Telemetry Aggregates → General