Closed Bug 1172688 Opened 9 years ago Closed 9 years ago

Add telemetry for when gethash calls timeout

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: francois, Assigned: dimi)

References

Details

(Whiteboard: tpe-seceng)

Attachments

(1 file, 2 obsolete files)

It would be useful to know how often these completions timeout when trying to reach Google's servers.
Component: DOM: Security → Safe Browsing
Product: Core → Toolkit
This is probably going to be similar to bug 1150921.
Whiteboard: tpe-seceng
I would like to work on this bug.
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Hi François, Do we measure this by how many success gethash request with a timeout or the time between each timeout ?
Flags: needinfo?(francois)
I don't think we have a need for the actual time it took. We just want to make sure that requests don't time out very often. So I would suggest: # timeouts / (# timeouts + # completed requests) which means we also need to add telemetry for the requests that succeeded.
Flags: needinfo?(francois)
Attached patch Add telemetry for gethash (obsolete) (deleted) — Splinter Review
Attachment #8732696 - Flags: review?(francois)
Comment on attachment 8732696 [details] [diff] [review] Add telemetry for gethash Review of attachment 8732696 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js @@ +553,5 @@ > // Notify the RequestBackoff once a response is received. > this._completer.finishRequest(this.gethashUrl, httpStatus); > > if (success) { > + Services.telemetry.getHistogramById("URLCLASSIFIER_COMPLETE_TIMEOUT").add(0); What happens when the request doesn't timeout but still fails? For example, if it returns a 503 instead of a 200. I would say that as far as the telemetry goes, we should consider that a non-timeout event and mark it as such. We're only interested in seeing how often the gethash endpoint fails to answer at all.
Attachment #8732696 - Flags: review?(francois)
Attached patch Add telemetry for gethash v2 (obsolete) (deleted) — Splinter Review
Modification of this patch: - Record all gethash requests in telemetry instead of just record success requests
Attachment #8732696 - Attachment is obsolete: true
Attachment #8733220 - Flags: review?(francois)
Comment on attachment 8733220 [details] [diff] [review] Add telemetry for gethash v2 Review of attachment 8733220 [details] [diff] [review]: ----------------------------------------------------------------- That looks fine to me. I think Benjamin needs to review all new telemetry probes.
Attachment #8733220 - Flags: review?(francois) → review?(benjamin)
Comment on attachment 8733220 [details] [diff] [review] Add telemetry for gethash v2 If this is an exploratory measurement, it should expire (typically in 6 months). If this is an operational alerting metric, I think we should be collecting from the release population, and you should talk a bit about how you plan on monitoring this with either daily or realtime alerting.
Attachment #8733220 - Flags: review?(benjamin) → feedback-
(In reply to Benjamin Smedberg [:bsmedberg] from comment #9) > If this is an exploratory measurement, it should expire (typically in 6 > months). > > If this is an operational alerting metric, I think we should be collecting > from the release population, and you should talk a bit about how you plan on > monitoring this with either daily or realtime alerting. The question we're trying to answer is: Is the 10s timeout reasonable and (especially) does this happen often in practice? Is that what you'd consider an exploratory measurement? If so, can we run it for 1 year instead of just 6 months?
Flags: needinfo?(benjamin)
Please run it for 6 months first and if you still need more data you can extend it. You could instead add a metric that measures the total time a gethash call takes, which would give you a distribution and not just a boolean.
Flags: needinfo?(benjamin)
Dimi, I think we should go with the 6 months that Benjamin suggested. https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1237198&attachment=8735634 suggests that all we need to do is add "expires_in_version" with whatever release will come 6 months after Fx 48.
Attachment #8733220 - Attachment is obsolete: true
Comment on attachment 8737773 [details] MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg https://reviewboard.mozilla.org/r/44105/#review40815
Attachment #8737773 - Flags: review?(francois) → review+
(In reply to Dimi Lee[:dimi][:dlee] from comment #13) > Created attachment 8737773 [details] > MozReview Request: Bug 1172688 - Add telemetry for when gethash calls > timeout. r?francois, bsmedberg > > Review commit: https://reviewboard.mozilla.org/r/44105/diff/#index_header > See other reviews: https://reviewboard.mozilla.org/r/44105/ Forget to say, based on comment 4, i still use boolean for now instead of metric suggested in comment 11
I think the boolean is fine for this initial exploration.
Attachment #8737773 - Flags: review?(benjamin) → review+
Comment on attachment 8737773 [details] MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg https://reviewboard.mozilla.org/r/44105/#review41015 data-review=me with that change or something equivalent ::: toolkit/components/telemetry/Histograms.json:3364 (Diff revision 1) > + "URLCLASSIFIER_COMPLETE_TIMEOUT": { > + "alert_emails": ["gcp@mozilla.com", "francois@mozilla.com"], > + "expires_in_version": "52", > + "kind": "boolean", > + "bug_numbers": [1172688], > + "description": "True if gethash lookup timeout" Suggest more detail: "this metric is recorded every time a gethash lookup is performed. `true` is recorded if the lookup times out."
Comment on attachment 8737773 [details] MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44105/diff/1-2/
Attachment #8737773 - Attachment description: MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r?francois, bsmedberg → MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: