Closed
Bug 727184
Opened 13 years ago
Closed 12 years ago
Increase granularity of telemetry uptime measurement
Categories
(Mozilla Metrics :: Data/Backend Reports, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 778809
Backlogged - BZ
People
(Reporter: justin.lebar+bug, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: Telemetry -- needs PM project priority)
We track uptime in telemetry, and even report it in the telemetry front-end (as SIMPLE_MEASURES_UPTIME).
But "uptime" currently is a binary proposition: It's either "less than 1 day" or "more than 1 day". This makes investigations such as bug 726375 difficult.
I'm not wed to the specific set of buckets, but if we did 24h / 2^7 (11.25m) and doubled from there, all but the first two buckets would be a whole number of minutes, and all after the first seven would be a whole number of days. So 11.25m, 22.5m, 45m, etc. I doubt anyone has uptime this high, but "greater than 32 days" might be a good final bucket.
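For illustration, the doubling scheme described above can be sketched in Python. This is only a rough sketch: the function name `doubling_bucket_edges` and the `max_days` parameter are assumptions made for the example and do not come from any Telemetry code.

```python
from fractions import Fraction

MINUTES_PER_DAY = 24 * 60


def doubling_bucket_edges(max_days=32):
    """Return bucket edges in minutes, starting at 24h / 2^7 and doubling.

    Hypothetical helper for illustration only. Fractions are used so that
    the first two edges (11.25 and 22.5 minutes) stay exact.
    """
    edge = Fraction(MINUTES_PER_DAY, 2 ** 7)  # 1440 / 128 = 11.25 minutes
    limit = max_days * MINUTES_PER_DAY
    edges = []
    while edge <= limit:
        edges.append(edge)
        edge *= 2
    return edges


edges = doubling_bucket_edges()
for e in edges:
    # Print each edge in minutes and in days.
    print(float(e), "min =", float(e / MINUTES_PER_DAY), "days")
```

Under these assumptions the eighth edge is exactly one day and the last edge is 32 days, so anything above it would fall into the proposed "greater than 32 days" bucket.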
Comment 1•13 years ago
Uptime is in minutes. Not sure what you are proposing. Once bug 707320 is done we should have more long uptimes sent in. Feel free to help with that bug btw :)
Reporter
Comment 2•13 years ago
Uptime is sent in minutes. But it is bucketed into "less than 1 day" and "more than one day" in the telemetry database. See the SIMPLE_MEASURES_UPTIME histogram.
I'm proposing that there be more buckets, and that we re-bucket existing data.
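A minimal sketch of what re-bucketing could look like, assuming the raw per-ping minute values are still available to the backend (the thread does not confirm this). The helper names, the finer edge list, and the sample uptimes are all made up for illustration, and bucket edges are treated as inclusive lower bounds.

```python
from bisect import bisect_right
from collections import Counter


def bucket_index(uptime_minutes, edges):
    """Index of the bucket whose lower edge is the largest one <= uptime."""
    return max(bisect_right(edges, uptime_minutes) - 1, 0)


def rebucket(uptimes, edges):
    """Count how many uptimes fall into each bucket defined by `edges`."""
    counts = Counter(bucket_index(u, edges) for u in uptimes)
    return [counts.get(i, 0) for i in range(len(edges))]


# The two-bucket scheme described in this bug: under / over one day.
CURRENT_EDGES = [0, 1440]
# Hypothetical finer edges, chosen only to show the effect.
FINER_EDGES = [0, 60, 240, 1440, 2880]

sample = [3, 45, 75, 500, 1440, 4000]  # made-up uptimes in minutes

print(rebucket(sample, CURRENT_EDGES))
print(rebucket(sample, FINER_EDGES))
```

The same raw values produce both histograms, which is why keeping the minute-level data makes it possible to change the bucket layout later.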
Comment 3•13 years ago
Definitely makes sense to have more buckets. Can you guys think about what level it should be broken down to and post that in this bug and then we'll plan out the change?
Comment 4•13 years ago
Hit enter by mistake. What I meant to say is whether the other Telemetry people agree with jlebar's proposal before we start implementing it.
Comment 5•13 years ago
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #4)
> Hit enter by mistake. What I meant to say is whether the other Telemetry
> people agree with jlebar's proposal before we start implementing it.
I don't think there are other telemetry people who care about uptime distribution. Let's just do it.
Reporter
Updated•13 years ago
Comment 6•13 years ago
As per the figure at the bottom of http://people.mozilla.org/~sguha/cyccollector.uptime.html we suggest [ 0, 5, 15, 30, 60, 90, … every 60 up to 1441 …, 2880 ] (2880 = 2 days) as buckets. This implies every hour would be a bucket, with the exception of the first 90 minutes, leaving around 30 buckets for display purposes.
Reporter
Comment 7•13 years ago
(In reply to Saptarshi Guha from comment #6)
> As per the figure at the bottom of
> http://people.mozilla.org/~sguha/cyccollector.uptime.html we suggest [ 0, 5,
> 15, 30, 60,90, … every 60 up to 1441 … , 2880 ] (2880 = 2 days) as buckets.
> This implies every hour would be a bucket
> with the exception of the first 90 minutes leaving around 30 buckets for
> display purposes.
I'm not wed to a specific set of buckets, but I'd like the max bucket to be bigger than 2 days. Doing one every 60 minutes up to 1 day, then skipping all the way up to 2 days seems less than ideal, but whatever.
Comment 8•13 years ago
Oh yes, we need a bucket for everything above 2 days (otherwise we won't have a bucket for those cases).
If you look at the graph, there is only a handful of points above 2 days. If you look at the bucket widths (the intervals in the panels at the top of the page) for the last set of 3 horizontal graphs, [254 (~4 hrs), 1479 (~1 day)] contains 5% of the data and [1479, a very large number] contains another 5%.
Reporter
Comment 9•13 years ago
> If you look at the graph, there is only a handful of points above 2 days
Even for people on the release channel?
The problem we have now is that we have too few buckets. I'd rather create some new ones we don't need than aggressively aggregate buckets together and come to regret it in the future.
Comment 10•13 years ago
Can't comment for release channel - didn't separate out for channels. However more buckets is always good, so +1 from here.
Comment 11•13 years ago
Please be aware that the number of buckets you choose will directly impact the number of aggregations we do. Try to fight the urge of doing too many.
Reporter
Comment 12•13 years ago
(In reply to Pedro Alves from comment #11)
> Please be aware that the number of buckets you choose will directly impact
> the number of aggregations we do. Try to fight the urge of doing too many.
Could you please elaborate on this? What is the tradeoff, exactly?
Comment 13•13 years ago
This will act as a new dimension, so we can filter on this on the front end. When we aggregate the docs from hbase to ES, we aggregate on those dimensions. Every different combination will result in more docs, and this is no exception. Being a type of count that will surely have a somewhat linear distribution, it will result in N more documents (N being the number of buckets).
I'm currently on vacation, and a bit slower to answer. Daniel should be able to help here too.
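One way to read the tradeoff described above is that the number of aggregate documents is bounded by the number of distinct combinations of dimension values. The sketch below is a rough back-of-the-envelope model only: the dimension names and cardinalities are invented, and it says nothing about how the real hbase-to-ES aggregation is implemented.

```python
from math import prod


def max_aggregate_docs(dimension_cardinalities):
    """Upper bound on aggregate docs: one per combination of dimension values."""
    return prod(dimension_cardinalities.values())


# Made-up dimensions and cardinalities, for illustration only.
dims = {"channel": 4, "os": 3, "uptime": 2}
before = max_aggregate_docs(dims)

# Replace the two-bucket uptime dimension with a hypothetical 30-bucket one.
dims["uptime"] = 30
after = max_aggregate_docs(dims)

print(before, after, after // before)
```

In this model the bound grows in proportion to the number of uptime buckets, but only if uptime is used as a filterable dimension at all.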
Reporter
Comment 14•13 years ago
(In reply to Pedro Alves from comment #13)
> This will act as a new dimension, so we can filter on this on the front end.
I don't think it needs to.
The uptime number applies to the telemetry ping as a whole. Filtering by "this ping was sent when Firefox had been running for between 60 and 90 minutes" is not particularly interesting to me, particularly because that doesn't mean that the data in the ping is from when Firefox had been running for 60-90m.
I'm OK keeping the current filtering as "less/more than one day" for the moment, since the ping is sent only once a day. Anyway, the ping from "less than one day" can now contain data from uptime of greater than one day, since we landed bug 707320.
Comment 15•13 years ago
Correct me, doesn't 60<uptime<90 mean that the measurements were collected for a period of time between 60 to 90 minutes? If not, what does uptime mean?
Reporter
Comment 16•13 years ago
(In reply to Saptarshi Guha from comment #15)
> Correct me, doesn't 60<uptime<90 mean that the measurements were collected
> for a period of time between 60 to 90 minutes? If not, what does uptime mean?
|uptime| tells us how long Firefox had been running when the telemetry ping was sent.
Before bug 707320, all reported histograms were collected before |uptime|. After bug 707320, AIUI all bets are off.
Updated•12 years ago
Blocks: daily_beta_tracking
Comment 17•12 years ago
Marking: in group of > 33 asks for Telemetry that need PM priority before triage/scheduling.
Status: NEW → ASSIGNED
Whiteboard: Telemetry -- needs PM project priority
Comment 19•12 years ago
I think having bug 778809 fixed will result in what is wanted here
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE