Closed Bug 1702937 Opened 4 years ago Closed 4 years ago

Narrow down the zero_byte_load probe to tailor results for YSOD, part 5

Categories

(Core :: Networking: JAR, task, P2)

task

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox88 --- fixed
firefox89 --- fixed

People

(Reporter: zbraniecki, Assigned: kershaw)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

I analyzed the data from all three channels over the last few weeks and built the fifth phase of filters to adjust the collected data for further analysis:

  1. Set the probe pref to false on release by default

We collected enough data from release for the initial analysis. If we want to collect more, we'll run a real "experiment" by flipping the pref to "on" for a short window, rather than keeping a pref that is "on" by default and flipping it to "off".

I'd suggest we keep the probe on for nightly/beta, since the volume there is smaller and we can use it to monitor for trends. Those channels are not representative of release, but until we have a solid hypothesis that we want to validate on release, I think this should work.

  2. Filter out the svg/css/png + NS_BINDING_ABORTED combination.

While this may be a manifestation of the bug, I would be surprised if it was a unique one, and we have many other types with NS_BINDING_ABORTED at lower volume.
Out of 335m events a day on Release, 50% are of the svg type, and almost all of those are NS_BINDING_ABORTED.
The data we have collected so far is enough to reason about the problem if file names or types become part of a hypothesis, and we won't learn much more from collecting that type for now.

  3. Filter css/NS_ERROR_CORRUPTED_CONTENT coming from outside of omni.ja.

We see a high volume (3m) from the jid1-TMndP6cdKgxLcQ@jetpack.xpi!/res/styles/* path with NS_ERROR_CORRUPTED_CONTENT. I reported it separately as bug 1702936 and will let WebExtensions and the authors of the extension explore that problem further.

  4. Filter aboutNetError.xhtml

It is responsible for 75m of the 77m XHTML events a day, with NS_ERROR_FAILURE dominating (73m/day) and NS_BINDING_ABORTED following (2m/day).

We likely have the same problem on less popular files with a lower volume of errors. If anything, since this is an error page, it is more likely that the causation is reversed: the page is opened as a result of the system getting into a broken state, rather than the other way around.

I'm a bit concerned about filtering this one out, because all the other filters are on secondary resources like CSS/PNG/SVG, while this one is on a UI document. That may impact our ability to analyze the effect of XHTML network errors on retention, so if we ever decide to run a study like that, we may want to unfilter this one.

It may be worth leaving a comment in the source stating that aboutNetError.xhtml is causing ~90% of the XHTML error events, with NS_ERROR_FAILURE dominating, and we filter it out to minimize the volume, but when analyzing XHTML errors a researcher has to take this into account.

  5. Filter other/*.ico/NS_BINDING_ABORTED

This is responsible for 90% of the "other" category and, similarly to svg/css/png, is a secondary resource file, which shouldn't cause that much damage and likely has the same underlying cause as document errors with the same error.

Together those 5 filters should remove ~90% of the volume while only really cutting out secondary resources with NS_BINDING_ABORTED and extensions with corrupted resources and aboutNetError.xhtml errors.
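As a rough illustration of how the five filters above could combine, here is a minimal Python sketch. It is not the actual Gecko patch: the names `ZeroByteLoadEvent`, `default_probe_pref`, and `should_record`, the channel strings, and the field layout are all hypothetical, and the sketch assumes aboutNetError.xhtml is filtered regardless of error code and that "outside of omni.ja" can be approximated by a substring check on the path.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is an illustration, not the Gecko code.


@dataclass(frozen=True)
class ZeroByteLoadEvent:
    path: str       # resource path of the failed load
    file_type: str  # "svg", "css", "png", "xhtml", "other", ...
    status: str     # error name, e.g. "NS_BINDING_ABORTED"


def default_probe_pref(channel: str) -> bool:
    # Filter 1: probe stays on by default for nightly/beta, off for release.
    return channel in ("nightly", "beta")


def should_record(event: ZeroByteLoadEvent, probe_enabled: bool) -> bool:
    if not probe_enabled:
        return False
    aborted = event.status == "NS_BINDING_ABORTED"
    # Filter 2: secondary resources (svg/css/png) with NS_BINDING_ABORTED.
    if event.file_type in ("svg", "css", "png") and aborted:
        return False
    # Filter 3: corrupted CSS coming from outside of omni.ja
    # (assumption: approximated here by a substring check).
    if (
        event.file_type == "css"
        and event.status == "NS_ERROR_CORRUPTED_CONTENT"
        and "omni.ja" not in event.path
    ):
        return False
    # Filter 4: aboutNetError.xhtml (assumption: any error code).
    if event.path.endswith("aboutNetError.xhtml"):
        return False
    # Filter 5: *.ico in the "other" category with NS_BINDING_ABORTED.
    if event.file_type == "other" and event.path.endswith(".ico") and aborted:
        return False
    return True
```

Passing the pref value in separately keeps the "flip it on for a short window on release" experiment possible without touching the filter logic.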

:kershaw - does this look reasonable to you?

Flags: needinfo?(kershaw)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #1)

> :kershaw - does this look reasonable to you?

Looks good to me. Thanks!

Assignee: nobody → kershaw
Flags: needinfo?(kershaw)
Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/391150f67aff Narrow down the zero_byte_load probe, r=zbraniecki
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch

Comment on attachment 9213852 [details]
Bug 1702937 - Narrow down the zero_byte_load probe, r=zbraniecki

Beta/Release Uplift Approval Request

  • User impact if declined: This bug is about fine-tuning the filters we use to collect the zero_byte_load event. Taking this patch to beta makes sure we have the same filters on nightly and beta.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: N/A
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch only changes how we send the telemetry probe.
  • String changes made/needed: N/A
Attachment #9213852 - Flags: approval-mozilla-beta?

Comment on attachment 9213852 [details]
Bug 1702937 - Narrow down the zero_byte_load probe, r=zbraniecki

Mostly a no-op for Beta at this point since we're already out of early beta and nearing RC week, just turning this telemetry probe off by default. Approved for 88.0b9.

Attachment #9213852 - Flags: approval-mozilla-beta? → approval-mozilla-beta+