Closed
Bug 1178923
Opened 9 years ago
Closed 9 years ago
[Metrics] Determine which telemetry metrics should be sent regardless of HUD setting status
Categories
(Firefox OS Graveyard :: Developer Tools, defect)
Tracking
(firefox43 fixed)
RESOLVED
FIXED
FxOS-S6 (04Sep)
Tracking | Status | |
---|---|---|
firefox43 | --- | fixed |
People
(Reporter: rnicoletti, Assigned: rnicoletti)
References
Details
Attachments
(1 file, 6 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review |
Currently, the hud backend (hud.js) sends the following telemetry information without regard to their status in the developer HUD settings panel:
* security errors
* reflow duration (although reflow count is only sent when the 'reflow' setting is enabled)
* custom metrics (console API-based)
There was no conscious decision resulting in the above metrics being sent without regard to the setting status; it is essentially because the hud.js code is cleaner to send the above without regard to checking their developer HUD status.
We need to make a conscious decision regarding which metrics will be sent without regard to their developer HUD status.
Assignee | ||
Comment 1•9 years ago
|
||
time-to-load (cold and warm start) is also sent regardless of developer HUD setting status
Assignee | ||
Updated•9 years ago
|
Blocks: nga-telemetry
Assignee | ||
Comment 2•9 years ago
|
||
While thinking about how to implement this it occurred to me that our initial idea of sending telemetry data for HUD metrics that are enabled may not be what we want to do. I'm thinking it would be better to have a core set of 'engineering' telemetry metrics that we will always record when HUD Telemetry is enabled and that we won't send any other HUD metrics regardless of whether they are enabled. For example, I'm thinking we won't ever send the HUD uss memory data because I don't think it's very useful compared to recording uss memory at the various performance marks, which will be a core metric according to me. The core metrics I propose are:
* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security warnings
* app-startup at time of performance marks
* app-memory at time of performance marks
I'm finding performance is reasonable with all of these enabled in the HUD. My other reason for wanting to record these regardless of their HUD status is it seems to me we want to collect as much data as possible and ideally we would collect the data without putting a burden on the developers to explicitly enable settings.
What do you think?
Flags: needinfo?(thills)
Comment 3•9 years ago
|
||
Hi Russ,
I agree this makes sense to just always collect a core set of engineering metrics. I think it's a configuration mess to do it any other way.
In terms of USS, I think that having it at the startup performance mark should be sufficient at least for a first round. I believe it will be very difficult and potentially useless data to collect just massive amounts of USS data. We can revisit later if this turns out to be wrong, but I think USS at the startup is a good place to start.
-tamara
Flags: needinfo?(thills)
Assignee | ||
Comment 4•9 years ago
|
||
After attempting a POC for the implementation suggested in comment 2, it's apparent that recording and sending telemetry data for existing hud metrics that are not enabled is in conflict with the hud backend design and architecture; it's not feasible. Therefore, the only telemetry metrics that will always be recorded and sent are:
* app-startup at time of performance marks
* app-memory at time of performance marks
In addition, as mentioned in the original description of this bug, the following metrics are currently always being recorded and sent regardless of whether they are enabled:
* security category
* reflow duration
Part of addressing this bug will be to only send these telemetry metrics when they are enabled in the developer HUD settings.
To summarize, the following telemetry metrics will be sent only when they are enabled in the developer HUD settings:
* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security category
These telemetry metrics will always be sent (they cannot be disabled in the developer HUD settings):
* app-startup at time based on performance marks
* app-memory at time of performance marks
Assignee | ||
Comment 5•9 years ago
|
||
Update to comment 4:
Telemetry metrics sent when they are enabled in the developer HUD settings:
* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security category
Telemetry metrics always sent (they cannot be disabled in the developer HUD settings):
* app-startup at time based on performance marks
* app-memory at time of performance marks
* custom metrics (via the AdvancedTelemetryHelper)
Assignee | ||
Comment 6•9 years ago
|
||
Hi Jan,
This patch corrects the problem of security category and reflow duration telemetry metrics not respecting HUD settings.
Attachment #8650012 -
Flags: review?(janx)
Comment 7•9 years ago
|
||
Comment on attachment 8650012 [details] [diff] [review]
respect hud settings
Review of attachment 8650012 [details] [diff] [review]:
-----------------------------------------------------------------
The patch looks correct, but I have a few nits.
Also, I'm not sure there is any value in respecting HUD settings for these two metrics (security category and reflow duration). Tracking their value has virtually no performance impact, because the console listener is active anyway, and the HUD settings are merely there to hide the widgets from the UI, in order not to overload it. I don't see any problems with sending these values to telemetry even if they're not shown in the HUD.
::: b2g/chrome/content/devtools/hud.js
@@ +412,5 @@
> // Telemetry sends the security error category not the
> // count of security errors.
> + if (this._watching[metric.name]) {
> + target._sendTelemetryEvent({
> + name: 'security_category',
Nit: I'm not sure this telemetry event name needs to be changed. Also, I believe Tamara's recent patch assumes there will never be "_" characters in metric names. Please leave it as is, or remove the "_".
@@ +470,5 @@
>
> + // Telemetry also records reflow duration when recording
> + // reflows.
> + if (this._watching[metric.name]) {
> + target._sendTelemetryEvent({name: 'reflow_duration',
Nit: Same comment as above, please remove the "_".
Attachment #8650012 -
Flags: review?(janx) → feedback+
Comment 8•9 years ago
|
||
(In reply to Jan Keromnes [:janx] from comment #7)
>
> ::: b2g/chrome/content/devtools/hud.js
> @@ +412,5 @@
> > // Telemetry sends the security error category not the
> > // count of security errors.
> > + if (this._watching[metric.name]) {
> > + target._sendTelemetryEvent({
> > + name: 'security_category',
>
> Nit: I'm not sure this telemetry event name needs to be changed. Also, I
> believe Tamara's recent patch assumes there will never be "_" characters in
> metric names. Please leave it as is, or remove the "_".
>
> @@ +470,5 @@
> >
> > + // Telemetry also records reflow duration when recording
> > + // reflows.
> > + if (this._watching[metric.name]) {
> > + target._sendTelemetryEvent({name: 'reflow_duration',
>
> Nit: Same comment as above, please remove the "_".
Jan and Russ,
I believe we need the '-' changed to an '_'. For the custom histograms, that is handled in the Gaia patch. In the hud keyed histograms which are predefined in histograms.json, it's tied to the histogram name which is DEVTOOLS_HUD_ + metric.name.toUpperCase(). The style of the histograms.json does not include any '-' so to match up with that we have all underscores and uppercase.
Thanks,
tamara
Assignee | ||
Comment 9•9 years ago
|
||
Hi Jan,
I agree it's a good idea to always send warnings, errors, security, reflow information regardless of their HUD setting status. This patch accomplishes that.
Attachment #8650012 -
Attachment is obsolete: true
Attachment #8651116 -
Flags: feedback?(janx)
Assignee | ||
Comment 10•9 years ago
|
||
Jan, I am working on a simpler patch. The patch I submitted for feedback has logic to only send the HUD event when the consoleWatcher metric is enabled and to always send the telemetry event. I feel this is not the correct approach. I believe the correct approach is to send the HUD event and the telemetry event if either the metric is enabled or if telemetry is enabled. The developer can use the 'hide hud widgets' if they don't want to see the widgets.
I like this simpler apporoach. However, there is a slightly tricky aspect of it where the widget should only be cleared when the metric is not enabled and telemetry is also disabled. As I see it, this requires the consoleWatcher to be aware of when telemetry is disabled. I don't want to add a settings listener to the consoleWatcher; I'm thinking about the best approach.
Comment 11•9 years ago
|
||
Hi Russ, so sorry for the review lag, I got distracted by something else today. I'll get to it first thing tomorrow! (European time)
Assignee | ||
Comment 12•9 years ago
|
||
Hi Jan,
This patch is simpler than the previous one. It considers console watcher metrics to be `watched` if they are enabled in the HUD settings or if telemetry is enabled. This applies to the logic for sending the metrics and for clearing them.
Attachment #8651116 -
Attachment is obsolete: true
Attachment #8651116 -
Flags: feedback?(janx)
Attachment #8651973 -
Flags: feedback?(janx)
Comment 13•9 years ago
|
||
Comment on attachment 8651973 [details] [diff] [review]
send console watcher metrics by default
Review of attachment 8651973 [details] [diff] [review]:
-----------------------------------------------------------------
Hi Russ, a few nits here, but I'm concerned by this approach: won't it cause the graphical widgets to stay visible even if the user "unwatches" it from the settings (and telemetry is enabled)?
Instead, I'd prefer an approach where we don't interfere with the un/watched mechanism, by calling `target._logHistogram()` directly (before checking for `watched`) and setting `metric.skipTelemetry = true` to avoid dual reporting. Would that work for you?
::: b2g/chrome/content/devtools/hud.js
@@ +74,5 @@
> },
>
> + addOnTelemetryStatusChangedCallback(callback) {
> + this._onTelemetryStatusChangedCallbacks.push(callback);
> + },
Nit: Technically, `SettingsListener.observe()` already acts as a collection of callbacks. It's probably not necessary to create a new one.
Instead, the setting name 'debug.performance_data.advanced_telemetry' could be a constant at the top of this file, and all interested watchers could register their own observer function as needed.
@@ +444,5 @@
> target.clear({name: metric});
> }
> });
> }
> +
Nit: Trailing space.
@@ +453,5 @@
> },
>
> + onTelemetryStatusChanged(enabled) {
> + // If telemetry is disabled, remove any widgets that are disabled.
> + if (!enabled) {
Nit: Bail-out style is preferred: `if (enabled) { return; }`.
@@ +456,5 @@
> + // If telemetry is disabled, remove any widgets that are disabled.
> + if (!enabled) {
> + let watching = this._watching;
> + for (let metric in watching) {
> + if (!watching[metric]) {
Nit: Bail-out style is preferred: `if (watching[metric]) { continue; }`.
@@ +459,5 @@
> + for (let metric in watching) {
> + if (!watching[metric]) {
> + for (let target of this._targets.values()) {
> + target.clear({name: metric});
> + }
Nit: Trailing space.
Attachment #8651973 -
Flags: feedback?(janx) → feedback-
Assignee | ||
Comment 14•9 years ago
|
||
I've taken a slightly different approach than you suggested because before calling `_logHistogram` we need to bump the value of the metric. I've consolidated the calls to `bump` into a single function.
Attachment #8651973 -
Attachment is obsolete: true
Attachment #8652499 -
Flags: review?(janx)
Comment 15•9 years ago
|
||
Comment on attachment 8652499 [details] [diff] [review]
send console watcher metrics by default (updated)
Review of attachment 8652499 [details] [diff] [review]:
-----------------------------------------------------------------
A few suggestions to tweak your approach, but before addressing them, please make sure that Telemetry really cares about cumulative counts; otherwise please use the more direct `_logHistogram` approach.
Incrementing cumulative metric counts behind Target's back makes the consoleWatcher do strange things, and I'm wondering if such a count really makes sense for Telemetry? E.g. is it not sufficient to know:
> app:X had an error
> app:X had a warning
> app:Y had an error
Does Telemetry really need to know:
> app:X had its 1st error since launch
> app:X had its 4th warning since launch
> app:Y had its 27th error since launch
These numbers can grow quite high over time, e.g. if the user repeats an action triggering an error, while not being very informative: An app could have 1 critical error that prevents it from being used normally, or it could have 300 errors because the user tapped a problematic button many times.
---
If you do need cumulative counts, please proceed with the nits below. If you do not, please just call `_logHistogram()` for the type of console event and skip telemetry.
::: b2g/chrome/content/devtools/hud.js
@@ +262,5 @@
> + this.bumpVal(metric);
> + this.update(metric, message);
> + },
> +
> + bumpVal(metric) {
I don't really like the idea of adding a "dry bump" method to the Target API, because `update()` is supposed to be the main entry point with all the safety checks, and `bump()` is supposed to be just an alias.
I thought about making `bumpVal()` more obviously internal (e.g. by prefixing it with "_"), but actually the consoleWatcher needs to bypass the Target API to just increment a value, and such a method would be a detour. Please increment the value in the consoleWatcher directly, and access `target.metrics` to update it, leaving the Target API unchanged.
@@ +569,5 @@
> + * the telemetry metric.
> + * If the metric is not being watched, the function
> + * increments the value of the metric and records the telemetry
> + * metric.
> + */
Nit: Trailing spaces.
@@ +575,5 @@
> + // If this metric is being watched, bump the value
> + // and send the HUD event.
> + if (this._watching[metric.name]) {
> + target.bump(metric, output);
> + } else if (developerHUD._telemetry) {
Nit: Please use bail-out style:
if (this._watching[metric.name]) {
target.bump(metric, output);
return;
}
if (!developerHUD._telemetry) {
return;
}
// rest of code
@@ +581,5 @@
> + // bump the value and record the telemetry metric.
> + // Although `_logHistogram` respects whether telemetry is enabled,
> + // the additional check is here so that the metric will not be
> + // updated -- see `bumpVal` -- when it is not being recorded.
> + target.bumpVal(metric);
Nit: As mentioned above, please increment the `metric.value` here by accessing the `target.metrics` Map.
Attachment #8652499 -
Flags: review?(janx) → feedback-
Assignee | ||
Comment 16•9 years ago
|
||
Jan, thanks for the feedback. You've encouraged me to create the most concise patch of all. I thought about creating a function for the new logic but I figured it was ok inline.
Attachment #8653205 -
Flags: review?(janx)
Comment 17•9 years ago
|
||
Comment on attachment 8653205 [details] [diff] [review]
send console watcher metrics by default (updated)
Review of attachment 8653205 [details] [diff] [review]:
-----------------------------------------------------------------
Hi Russ, that's a lot more concise indeed.
Technically, you could get rid off one more `if`, by moving the `if(developerHUD._telemetry){}` block inside the `if(!this._watching[metric.name]){}` (persisting the incremented value unconditionally and letting watched metrics report their telemetry normally), but actually I don't think it's a good idea to use different code paths to report telemetry depending on the metric being watched or not.
With the two micro-nits, I'm all out of things to complain about. Thanks for addressing all my concerns!
::: b2g/chrome/content/devtools/hud.js
@@ +550,5 @@
> return;
> }
>
> + if (developerHUD._telemetry) {
> + // Always record telemetry for these metrics
Nit: Please add full stop or ":".
@@ +556,5 @@
> + let value = target.metrics.get(metric.name);
> + metric.value = (value || 0) + 1;
> + target._logHistogram(metric);
> +
> + // Telemetry has already been recorded
Nit: Trailing space, full stop.
Attachment #8653205 -
Flags: review?(janx) → review+
Updated•9 years ago
|
Attachment #8652499 -
Attachment is obsolete: true
Assignee | ||
Comment 18•9 years ago
|
||
Assignee | ||
Comment 19•9 years ago
|
||
Comment on attachment 8653662 [details] [diff] [review]
send console watcher metrics by default
Latest patch addresses last review comments and is rebased against b2g-inbound. Will attach try link when gaia-try tree is open.
Assignee | ||
Updated•9 years ago
|
Attachment #8653662 -
Attachment is obsolete: true
Assignee | ||
Comment 20•9 years ago
|
||
Rebased patch.
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39a473ca5845
Assignee | ||
Updated•9 years ago
|
Attachment #8653205 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 21•9 years ago
|
||
Keywords: checkin-needed
Comment 22•9 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox43:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S6 (04Sep)
You need to log in
before you can comment on or make changes to this bug.
Description
•