Closed Bug 1191257 Opened 9 years ago Closed 6 years ago

Investigate high volume of Telemetry pings with subsessionLength 0

Categories

(Toolkit :: Telemetry, defect, P4)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox42 --- affected

People

(Reporter: gfritzsche, Unassigned)

References

Details

(Whiteboard: [unifiedTelemetry] [measurement:client])

In bug 1189164 i see a lot of pings with subsessionLength 0:
https://kibana.shared.us-west-2.prod.mozaws.net/#/dashboard/temp/AU76r2croEkfX-6KQ_Co

We should investigate:
* what the volume of this problem is
* check that bug 1188416 didnt make things worse
* run down why daily pings are affected so much
Summary: High volume of Telemetry pings with subsessionLength 0 → Investigate high volume of Telemetry pings with subsessionLength 0
Of the first 500 pings in the kibana query 498 are 0.
Sorting descending by subsession we get 500 positive subsessionLength.

How can we check what the volume of subsessionLength<=0 is?
Flags: needinfo?(kparlante)
The first few clients we inspected who had a subsessionLength for a "daily" ping seem to create two daily pings at the same time, e.g.:

> Timestamp 	        creationTimestamp 	reason 	submissionDate 	profileSubsessionCounter appBuildId 	subsessionLength 
> 2015-08-05T05:57:51Z 	2015-08-05T06:04:48Z 	daily 	20150805 	21                       20150803030207 	0
> 2015-08-05T05:56:53Z 	2015-08-05T06:04:48Z 	daily 	20150805 	20                       20150803030207 	100693
We inspected things here a bit. We found one example where there were 4 daily pings generated right after each other, which are definitely not just duplicates (profileSubsessionCounter increments monotonically, correct subsession chaining, ...).

Inspecting the code we found some potential issues:
* TelemetrySession retries to generate daily pings if ping sending or saving fails - this is wrong, it should just hand off the ping to the other modules and let TelemetrySend etc. take care of retries
* time checks and timeout calculations should not generate close-to-zero scheduler intervals, but hard to be certain - we need to check a bit more
* _sendDailyPing() waits for aborted-session saves and will retry if that fails
* race potential with the following steps:
  * _schedulerTick tries to send a daily ping, waiting on the async operations
  * _schedulerTick may be triggered again, say from observer
  * _lastDailyPingTime is not updated yet, triggering another daily ping

* action points:
  (1) we need to stop retrying daily pings
  (2) we need to immediately update the _lastDailyPingTime in _schedulerTickLogic
  (3) send a few "active" or "idle" notifications directly after each other - should reproduce
  (4)  directly test schedulers _rescheduleTimeout, _sentDailyPingToday,  _isDailyPingDue with some border values to look for misbehavior

Actions/findings unrelated to this bug:
  (5) investigation bug: we may trigger ping generations etc. when we receive the "idle" notification instead of only rescheduling timers - do we have to?
  (6) improvement bug: We're not updating _lastSessionCheckpointTime when sending daily pings. We may be writing a new aborted-session ping a few seconds after a daily ping (and the relative aborted-session ping) was generated
Depends on: 1191312
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
>   (3) send a few "active" or "idle" notifications directly after each other
> - should reproduce

This trivially reproduced the behavior, but doesn't explain all the samples we see.
Filed bug 1191312 for action points (1) and (2), bug for 1191318 (5) and bug 1191315 for (6).

_rescheduleTimeout, _sentDailyPingToday and _isDailyPingDue seem to behave correctly in the following cases:

- 30 minutes before midnight
- Midnight
- 15 minutes after midnight
- midday
Depends on: 1191410
(In reply to Alessio Placitelli [:Dexter] from comment #5)
> Filed bug 1191312 for action points (1) and (2), bug for 1191318 (5) and bug
> 1191315 for (6).
> 
> _rescheduleTimeout, _sentDailyPingToday and _isDailyPingDue seem to behave
> correctly in the following cases:
> 
> - 30 minutes before midnight
> - Midnight
> - 15 minutes after midnight
> - midday

What about "really close to midnight" on either side?
The needinfo moved to bug 1191410.
Flags: needinfo?(kparlante)
Georg, could you try to manually check if the sleep and idle/active user notifications causes multiple daily pings on your Mac? (see bug 1191312)
Flags: needinfo?(gfritzsche)
Blocks: 1122482
No longer blocks: 1120356
Flags: needinfo?(gfritzsche)
Priority: -- → P3
Whiteboard: [unifiedTelemetry] [data-validation] → [unifiedTelemetry]
Whiteboard: [unifiedTelemetry] → [unifiedTelemetry] [measurement:client]
Priority: P3 → P4
This has been around for a while and people have worked with the data.
We'll close this bug; any new investigations can happen in new bugs.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.