Closed Bug 1191757 Opened 9 years ago Closed 9 years ago

Follow-up on investigative Telemetry client probes

Categories

(Toolkit :: Telemetry, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Iteration: 42.3 - Aug 10
Tracking Status: firefox42 --- affected

People

(Reporter: gfritzsche, Assigned: gfritzsche)

Details

(Whiteboard: [unifiedTelemetry] [data-validation])

We landed some investigative Telemetry client probes to track client-side misbehavior.

We need to follow up on some basic checks with them.
* bug 1190302 - no data yet for TELEMETRY_SESSIONDATA_FAILED_*
* bug 1186955 - no data yet for TELEMETRY_PING_SIZE_EXCEEDED_*, TELEMETRY_DISCARDED_*_SIZE_MB
* bug 1186492 - oddly, there is no data yet for TELEMETRY_PING_EVICTED_FOR_SERVER_ERRORS; I expected to see some from oversized pings before bug 1186955 landed

We do have data for bug 1187340, bug 1168835:
* pending ping load failures: http://bit.ly/1EambFu
* pending ping parse failures: http://bit.ly/1IPaytb
* pending ping sizes: http://bit.ly/1DwtUmi
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> TELEMETRY_PING_EVICTED_FOR_SERVER_ERRORS, i expected to see some before bug
> 1186955 landed due to oversized pings

Local testing shows that the server correctly returns 4xx errors (411 for a 4MB payload) and that the client handles them correctly, so we may simply not have received any submissions yet, or none from before bug 1186955 landed.
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> We do have data for bug 1187340, bug 1168835:
> * pending ping load failures: http://bit.ly/1EambFu
> * pending ping parse failures: http://bit.ly/1IPaytb
> * pending ping sizes: http://bit.ly/1DwtUmi

There is some interesting data here:
* We have a lot of parse failures; I'm not sure yet what to make of them.
  However, we can track this in the analysis to see whether it explains the missing pings.
* We have very few actual disk load failures, but 2 of the 3 submissions with load failures have very high failure counts.
  This seems to point to us repeatedly trying to load pings from disk after failures in the ping send task.
  I think we have to prioritize bug 1189425 to avoid clients getting stuck.
* The pending ping size distribution shows that we have a long tail of pretty big pings.
  88.86% are <1MB and a further 10.56% are 1MB<=size<2MB. That means we will get a lot of pings evicted
  soon, as bug 1186955 introduces a 1MB ping size limit.
  We can't uplift that bug in this form and probably have to consider temporarily increasing that limit to 2MB.
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> * The pending ping size distribution shows that we have a long tail of
> pretty big pings.
>   88.86% are <1MB, a further 10.56% are 1MB<=size<2MB. That means we will
> get a lot of pings evicted
>   soon, as bug 1186955 introduces a 1MB ping size limit.
>   We can't uplift that bug in this form and probably have to consider
> increasing that limit to 2MB temporarily.
>   as bug 1186955 introduced a 1MB limit.

Worth noting that this only measures the sizes of persisted pings on disk at startup, so this may be biased.
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Worth noting that this only measures the sizes of persisted pings on disk at
> startup, so this may be biased.

This is actually the size of the whole pending ping directory, so it does not by itself indicate brokenness.

The individual ping size is in TELEMETRY_DISCARDED_*_SIZE_MB and those have no data yet.
(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> The individual ping size is in TELEMETRY_DISCARDED_*_SIZE_MB and those have
> no data yet.

Is this measuring the compressed size (as stored on disk) or the raw uncompressed payload size?
(In reply to Mark Reid [:mreid] from comment #6)
> Is this measuring the compressed size (as stored on disk) or the raw
> uncompressed payload size?

It depends on the probe:
* TELEMETRY_DISCARDED_PENDING_PINGS_SIZE_MB - on-disk size of the pending ping
* TELEMETRY_DISCARDED_ARCHIVED_PINGS_SIZE_MB - on-disk size of the archived ping
* TELEMETRY_DISCARDED_SEND_PINGS_SIZE_MB - ping size after serialization, before compression and sending to the server
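To illustrate why the compressed-vs-uncompressed distinction matters for the send probe, here is a minimal Python sketch. It is not the Firefox implementation: the names PING_SIZE_LIMIT_BYTES, serialized_size_bytes, compressed_size_bytes and should_discard are hypothetical, the use of gzip as the compression step is an assumption, and the check against the serialized size only mirrors what TELEMETRY_DISCARDED_SEND_PINGS_SIZE_MB is described as measuring above. The 1MB limit is the one bug 1186955 introduced.

```python
import gzip
import json

MB = 1024 * 1024
# Hypothetical constant: the 1MB limit introduced by bug 1186955.
PING_SIZE_LIMIT_BYTES = 1 * MB


def serialized_size_bytes(ping: dict) -> int:
    """Size of the ping after JSON serialization, before any compression."""
    return len(json.dumps(ping).encode("utf-8"))


def compressed_size_bytes(ping: dict) -> int:
    """Size after gzip-compressing the serialized ping (gzip is an assumption here)."""
    return len(gzip.compress(json.dumps(ping).encode("utf-8")))


def should_discard(ping: dict, limit: int = PING_SIZE_LIMIT_BYTES) -> bool:
    """Assumed policy: compare the uncompressed serialized size against the limit."""
    return serialized_size_bytes(ping) > limit


# A highly repetitive 2MB payload exceeds the limit when serialized,
# even though it compresses to a tiny fraction of that size.
big_ping = {"payload": "x" * (2 * MB)}
print(serialized_size_bytes(big_ping), compressed_size_bytes(big_ping), should_discard(big_ping))
```

Under these assumptions, a ping can be discarded for its serialized size even though its compressed size on the wire would be far below the limit.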
Blocks: 1122482
No longer blocks: 1120356
Blocks: 1201045
The measurements here seem OK, except for the pending ping parse failures. That issue is tracked in bug 1201045.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED