Closed Bug 723846 Opened 13 years ago Closed 13 years ago

make boolean histograms more useful

Categories

(Toolkit :: Telemetry, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

Attachments

(3 files, 7 obsolete files)

+++ This bug was initially created as a clone of Bug #723802 +++

Is it necessary to call accumulate with a false value for boolean telemetry when we just want to know how often something happens?  From what I can tell it should be enough to just report the interesting (true) case and then compare that to the number of telemetry reports for the time period.  This is what Telemetry::A11Y_INSTANTIATED currently does but it leads to useless bar graphs showing 100% of reporters with it on without showing the total number of reports that didn't specify a value.  It seems like boolean telemetry probes should have a default value.  The advantage of only reporting in the true case is that it has no perf. impact.
I think we should do this on the serverside instead.
> The advantage of only reporting in the true case is that it has no perf. impact.

Have you measured a perf impact of accumulating into a histogram?
(In reply to Taras Glek (:taras) from comment #1)
> I think we should do this on the serverside instead.

That's possible and why I didn't file a bug for it in the first place.  I wanted to know if that's something that will get fixed on the server.  The problem is that the server side currently doesn't know for sure if a boolean will be used just for a true or false value or whether it's reporting both.  I guess they can just check if 100% of the reports are for one side and then assume it should be compared to the total reports?  This only works if you are looking only at data for a browser that reports that telemetry probe, otherwise it'll skew the data.

(In reply to Justin Lebar [:jlebar] from comment #2)
> > The advantage of only reporting in the true case is that it has no perf. impact.
> 
> Have you measured a perf impact of accumulating into a histogram?

That's a quote from me btw.

Note that I wasn't saying there was a perf impact of accumulating, I was just saying that not accumulating has zero perf impact compared to not using telemetry.
(In reply to Matthew N. [:MattN] from comment #3)
> (In reply to Taras Glek (:taras) from comment #1)
> > I think we should do this on the serverside instead.
> 
> That's possible and why I didn't file a bug for it in the first place.  I
> wanted to know if that's something that will get fixed on the server.

File a bug for the server side or the client side, doesn't matter much.  If it's not a good idea, somebody will say so.

> The problem is that the server side currently doesn't know for sure if a boolean
> will be used just for a true or false value or whether it's reporting both.

This is a problem, yes.  We could probably just use the heuristic you proposed for the boolean histograms we have now, though.
> The problem is that the server side currently doesn't know for sure if a boolean will be used just 
> for a true or false value or whether it's reporting both.  I guess they can just check if 100% of 
> the reports are for one side and then assume it should be compared to the total reports?

I don't think you should rely on this.  Take a jaunt through the Telemetry dashboard and you'll see a bunch of bogus reporter names.  I don't think it's unlikely that we have bogus values as well.  Just one bogus value out of potentially millions of pings could throw your report off.
Blocks: 732310
Attached patch part 1 - add FlagHistogram to ipc/ (obsolete) (deleted) — Splinter Review
Here's a proposed solution: let's have a boolean-esque histogram that actually starts with a 0 value and only lets people accumulate a single positive value, kind of like a toggleable flag.  Hence the name FlagHistogram.

I realize that it might be more conceptually elegant to have a separate entity for this sort of thing, a TelemetryFlag or somesuch.  But this is expedient from a code point of view, an existing client's point of view, and from a metrics point of view (histogram dashboard already built, etc.)
Attachment #602356 - Flags: review?(taras.mozilla)
Attached patch part 2 - add FlagHistogram support to telemetry (obsolete) (deleted) — Splinter Review
Happily, little code is needed on the telemetry side of things.
Attachment #602357 - Flags: review?(taras.mozilla)
Attached patch part 3 - tests for FlagHistogram (obsolete) (deleted) — Splinter Review
...and tests to make sure everything works correctly.
Attachment #602359 - Flags: review?(taras.mozilla)
Comment on attachment 602356 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

that's a neat hack. This better get documented well
Attachment #602356 - Flags: review?(taras.mozilla) → review+
Comment on attachment 602356 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

+  Histogram::Accumulate(1, 1, zero_index);
I'm confused why you don't use 0 as a default value
Comment on attachment 602357 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

 * HISTOGRAM_FLAG - For storing single 0/1 values

That's confusing. You should say that this histogram can only contain a single value, ie count == 1.

Also this is kind of weird. This histogram gets instantiated the first time you assign or read from it...so then why have the default value?

I thought the point of these was to exist in every ping, which will allow you to look at stuff like marketshare of l10n.
Comment on attachment 602359 [details] [diff] [review]
part 3 - tests for FlagHistogram

+  // Should only switch counts once.
+  h.add(3);

Should this not throw an error?
Comment on attachment 602357 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

I'm pretty sure in order to work as intended you have to request all  HISTOGRAM_FLAG histograms to be created the first time histogramSnapshots is called.
Attachment #602357 - Flags: review?(taras.mozilla) → review-
(In reply to Taras Glek (:taras) from comment #11)
> +  Histogram::Accumulate(1, 1, zero_index);
> I'm confused why you don't use 0 as a default value

I am using 0 as a default value; I compute the index/bucket 0 goes in--that's zero_index--and stick a 1 there.

The inverse might be useful: default to 1 and get flipped to 0.  But I suppose you can have a FOO_DISABLED histogram to capture that semantic.

(In reply to Taras Glek (:taras) from comment #13)
> +  // Should only switch counts once.
> +  h.add(3);
> 
> Should this not throw an error?

I am ambivalent about this.  Doing that would require modifying all the Histogram subclasses to have their Add methods (and Accumulate methods, and SampleSet::Accumulate) return a status code.  I can see the value in doing this and am happy to do it if need be.

(In reply to Taras Glek (:taras) from comment #14)
> I'm pretty sure in order to work as intended you have to request all 
> HISTOGRAM_FLAG histograms to be created the first time histogramSnapshots is
> called.

Ooo, good point.  I'll do that and add a test.

I wonder how badly that would screw with the metrics side of things though; before you could assume that the histogram data was increasing only.  Now we might have these crazy histograms with one set of values one ping and a totally different (non-superset) set next ping.
(In reply to Nathan Froyd (:froydnj) from comment #15)

> I wonder how badly that would screw with the metrics side of things though;
> before you could assume that the histogram data was increasing only.  Now we
> might have these crazy histograms with one set of values one ping and a
> totally different (non-superset) set next ping.

I'm not sure what you mean.
Attached patch part 2 - add FlagHistogram support to telemetry (obsolete) (deleted) — Splinter Review
Updated patch, creating flag histograms automagically from addons and "main" telemetry, and fixing a couple of thinkos.
Attachment #602357 - Attachment is obsolete: true
Attachment #602932 - Flags: review?(taras.mozilla)
Attached patch part 3 - tests for FlagHistogram (obsolete) (deleted) — Splinter Review
Updated tests for addons.
Assignee: nobody → nfroyd
Attachment #602359 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #602359 - Flags: review?(taras.mozilla)
Attachment #602933 - Flags: review?(taras.mozilla)
Comment on attachment 602932 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

+  printf("Getting histogram %s with type %u\n", name, histogramType);

^ I doubt you meant to put that in.

+  printf("Got histogram %s with type %u!\n", name, histogramType);
   return NS_OK;

+      nsresult rv = GetHistogramByEnumId(Telemetry::ID(i), &h);
+      if (NS_FAILED(rv)) {
+        return rv;
+      }

This is a bit scary. Having one ill-specified flag histogram will bust our entire telemetry reporting. An assert would work better.

I'll trust you on addon histogram stuff(that never fails, right?).
Attachment #602932 - Flags: review?(taras.mozilla) → review-
(In reply to Taras Glek (:taras) from comment #19)
> +  printf("Getting histogram %s with type %u\n", name, histogramType);
> 
> ^ I doubt you meant to put that in.

Oops, indeed.

> +      nsresult rv = GetHistogramByEnumId(Telemetry::ID(i), &h);
> +      if (NS_FAILED(rv)) {
> +        return rv;
> +      }
> 
> This is a bit scary. Having one ill-specified flag histogram will bust our
> entire telemetry reporting. An assert would work better.

GetHistogramByEnumId doesn't do any real checking of flag histogram parameters; we can make the static asserts for boolean histograms do the same checks for flag histograms, though.

So maybe there's no point in checking the return value here.

> I'll trust you on addon histogram stuff(that never fails, right?).

The addon histogram stuff at least has tests.  ^_^
Attached patch part 2 - add FlagHistogram support to telemetry (obsolete) (deleted) — Splinter Review
Debugging code removed.  We now only complain about finding flag histograms in debug mode with an assert.
Attachment #602932 - Attachment is obsolete: true
Attachment #603398 - Flags: review?(taras.mozilla)
Comment on attachment 602933 [details] [diff] [review]
part 3 - tests for FlagHistogram

You need a test in test_TelemetryPing to see that a FLAG histogram is being registered with a default 0 value
Attachment #602933 - Flags: review?(taras.mozilla) → review+
Attachment #603398 - Flags: review?(taras.mozilla) → review+
Attached patch part 1 - add FlagHistogram to ipc/ (obsolete) (deleted) — Splinter Review
There was a bug in FlagHistogram; the bucket ranges are not initialized in the constructor, but rather by explicit calls in FactoryGet.  So the initial bucketing of 0 was totally bogus.  I am unsure of how the nsITelemetry tests didn't catch this...
Attachment #602356 - Attachment is obsolete: true
Attachment #603454 - Flags: review+
Add HISTOGRAM_FLAG macro and check for HISTOGRAM_FLAG in the static checking bits.
Attachment #603398 - Attachment is obsolete: true
Attachment #603455 - Flags: review+
Add TelemetryPing test to make sure we automagically instantiate flag histograms.  Also add a little extra checking for sane sums to the nsITelemetry tests.
Attachment #602933 - Attachment is obsolete: true
Attachment #603456 - Flags: review+
Whiteboard: [autoland-try:-b do -p all -u all -t none]
Whiteboard: [autoland-try:-b do -p all -u all -t none] → [autoland-in-queue]
Autoland Patchset:
	Patches: 603454, 603455, 603456
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=edb3d7d29932
Try run started, revision edb3d7d29932. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=edb3d7d29932
Try showed that Windows and Android builds fail on histogram.cc because MOZ_ASSERT is not defined there (possibly Mac too; those builds aren't done yet).  Curious...

Using the moral equivalent, DCHECK_EQ instead.
Attachment #603454 - Attachment is obsolete: true
Attachment #603530 - Flags: review+
Try run for edb3d7d29932 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=edb3d7d29932
Results (out of 63 total builds):
    success: 47
    warnings: 7
    failure: 9
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-edb3d7d29932
 Timed out after 06 hours without completing.
Whiteboard: [autoland-in-queue]
Whiteboard: [autoland-try:-b do -p all -u all -t none:603530,603455,603456]
Whiteboard: [autoland-try:-b do -p all -u all -t none:603530,603455,603456] → [autoland-in-queue]
Autoland Patchset:
	Patches: 603530, 603455, 603456
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=ac8c0c20a522
Try run started, revision ac8c0c20a522. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=ac8c0c20a522
Try run looks good, flagging for checkin.
Keywords: checkin-needed
Try run for ac8c0c20a522 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ac8c0c20a522
Results (out of 217 total builds):
    success: 172
    warnings: 42
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-ac8c0c20a522
Whiteboard: [autoland-in-queue]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: