Closed Bug 1570305 Opened 5 years ago Closed 5 years ago

Figure out what to do about the last batch of GV Streaming accumulations

Categories

(Toolkit :: Telemetry, task, P2)

RESOLVED WORKSFORME

People

(Reporter: chutten, Assigned: chutten)

References

Details

In GV Streaming we batch accumulations and only send the batch when an accumulation arrives after the batch duration has elapsed.

But what if that accumulation never comes? We could have up to toolkit.telemetry.geckoview.batchDurationMS - 1 (4999ms) worth of data that just sits in memory and gets lost.
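
To make the failure mode concrete, here's a minimal standalone sketch of the batching behaviour (this is not the actual Gecko code; the class, method, and histogram names are illustrative). The only place the batch is ever sent is inside Accumulate(), so if no further accumulation arrives, the pending data never leaves memory:

```cpp
// Illustrative model only, not the real GeckoView Streaming implementation.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

class StreamingBatch {
 public:
  explicit StreamingBatch(std::chrono::milliseconds aBatchDuration)
      : mBatchDuration(aBatchDuration) {}

  // Called for every histogram accumulation.
  void Accumulate(const std::string& aHistogram, uint32_t aSample) {
    if (mBatch.empty()) {
      mBatchStart = Clock::now();
    }
    mBatch.emplace_back(aHistogram, aSample);

    // The batch is only ever sent from here: if no further accumulation
    // arrives, whatever is in mBatch just sits in memory until the process
    // goes away.
    if (Clock::now() - mBatchStart >= mBatchDuration) {
      SendBatch();
    }
  }

 private:
  void SendBatch() {
    std::printf("sending %zu accumulations\n", mBatch.size());
    mBatch.clear();
  }

  std::chrono::milliseconds mBatchDuration;
  Clock::time_point mBatchStart{};
  std::vector<std::pair<std::string, uint32_t>> mBatch;
};

int main() {
  StreamingBatch batch(std::chrono::milliseconds(5000));
  // Starts a batch, but nothing is sent unless another accumulation lands
  // at least 5s later.
  batch.Accumulate("GC_MS", 42);
  return 0;
}
```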

I see three obvious solutions, each with its flaws:

  1. What the design says: A just-in-case timer set to, let's say, 2 * batchDurationMS (10s) that will flush the batch (sketched below).
    • It won't help us if the reason we've stopped accumulating is that we've shut down.
  2. Decrease the default value of batchDurationMS to, say, 500ms.
    • Decreases the maximum possible window of lost data, but still loses data.
  3. Flush the batch on shutdown (probably hooking into [ShutdownTelemetry()](https://searchfox.org/mozilla-central/rev/38c88cbf4be87dfa0636d15a5d3599a8ea0d1a72/toolkit/components/telemetry/core/Telemetry.cpp#1234))
    • We can't depend on a tidy shutdown on mobile.

Maybe some combination of all three?
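
For concreteness, here's a rough standalone sketch of what a combination of options 1 and 3 might look like (again, not the real implementation; all names are made up): a watchdog that flushes any batch left pending for longer than 2 * batchDurationMS, plus an explicit Flush() that a shutdown path like ShutdownTelemetry() could call.

```cpp
// Illustrative sketch only, not the real GeckoView Streaming code.
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class BatchFlusher {
 public:
  explicit BatchFlusher(std::chrono::milliseconds aBatchDuration)
      : mBatchDuration(aBatchDuration),
        mWatchdog([this] { WatchdogLoop(); }) {}

  ~BatchFlusher() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCondVar.notify_one();
    mWatchdog.join();
  }

  // Called for every histogram accumulation.
  void Accumulate(const std::string& aHistogram, uint32_t aSample) {
    std::lock_guard<std::mutex> lock(mMutex);
    mBatch.emplace_back(aHistogram, aSample);
  }

  // Option 3: a shutdown path (e.g. ShutdownTelemetry()) could call this so
  // a tidy shutdown never leaves data behind.
  void Flush() {
    std::lock_guard<std::mutex> lock(mMutex);
    FlushLocked();
  }

 private:
  void FlushLocked() {
    if (!mBatch.empty()) {
      std::printf("flushing %zu accumulations\n", mBatch.size());
      mBatch.clear();
    }
  }

  // Option 1: a just-in-case timer. Every 2 * batchDurationMS it sends
  // whatever has accumulated, even if no new sample arrived to trigger it.
  void WatchdogLoop() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (!mShutdown) {
      mCondVar.wait_for(lock, 2 * mBatchDuration);
      FlushLocked();
    }
  }

  std::chrono::milliseconds mBatchDuration;
  std::mutex mMutex;
  std::condition_variable mCondVar;
  bool mShutdown = false;
  std::vector<std::pair<std::string, uint32_t>> mBatch;
  std::thread mWatchdog;  // declared last so it starts after everything above
};

int main() {
  BatchFlusher flusher(std::chrono::milliseconds(5000));
  flusher.Accumulate("GC_MS", 42);
  flusher.Flush();  // e.g. from the shutdown path
  return 0;
}
```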

Blocks: 1570307

Another option we could consider is
4. Do nothing. We always end up losing the last bits of data, so let's update the design doc and in-tree docs to be clear that this is expected and acceptable.

I don't like the sound of that, so I think I'm going to start with a combination of 1 and 3, plus a sprinkle of documentation to be clear about this aspect of the mechanism.

Alessio, you're the project owner of this. Does this sound acceptable to you?

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Flags: needinfo?(alessio.placitelli)
Priority: -- → P2

(In reply to Chris H-C :chutten from comment #1)

> Another option we could consider is
> 4. Do nothing. We always end up losing the last bits of data, so let's update the design doc and in-tree docs to be clear that this is expected and acceptable.
>
> I don't like the sound of that, so I think I'm going to start with a combination of 1 and 3, plus a sprinkle of documentation to be clear about this aspect of the mechanism.
>
> Alessio, you're the project owner of this. Does this sound acceptable to you?

I think (4) is fine: I don't think we get any shutdown notification at all, in most cases, on Android. The Glean SDK (and all the app devs I've talked to) already live with the fact that data can be lost. It might be worth checking with Agi and Sebastian Kaspari.

Flags: needinfo?(alessio.placitelli)

(In reply to Alessio Placitelli [:Dexter] from comment #2)

> I think (4) is fine: I don't think we get any shutdown notification at all, in most cases, on Android. The Glean SDK (and all the app devs I've talked to) already live with the fact that data can be lost. It might be worth checking with Agi and Sebastian Kaspari.

ni? away!

Summary: at most 5s of Gecko Telemetry might be "lost", because it'll be the last 5s of data before the process is wiped out from under us. That seems like an acceptable margin (and considering non-streaming GV Telemetry's 60s persistence timer, I agree).

To the two of you: is this fine, at least for now?

Flags: needinfo?(s.kaspari)
Flags: needinfo?(agi)

I don't think losing 5s of data is a concern. If we want to make this slightly better, we could make GeckoView flush telemetry when it goes to the background (being backgrounded greatly increases the chance of being killed OOM). There is no way on Android to know when your app is killed (Android uses SIGKILL or equivalent); this is by design.

Flags: needinfo?(agi)
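
A minimal sketch of that suggestion, assuming some way of routing an "application went to background" signal from the embedding layer into the streaming telemetry code (the hook name and plumbing here are hypothetical):

```cpp
// Hypothetical hook, illustrative only: the embedder (GeckoView) would call
// this when the app moves to the background, since a backgrounded Android
// app can be killed at any time with no further notification.
#include <cstdio>

// Stand-in for sending whatever accumulations are currently batched.
void FlushPendingAccumulations() {
  std::printf("flushing pending accumulations before a possible kill\n");
}

void OnApplicationBackground() {
  // Best effort: get the pending batch out now rather than losing it if the
  // OS reclaims the process while we're backgrounded.
  FlushPendingAccumulations();
}

int main() {
  OnApplicationBackground();  // simulate the app being backgrounded
  return 0;
}
```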

I can't really answer that. That's up to the Gecko(View) team(s). But with the previous timer being 60s this seems very reasonable to me. :)

Flags: needinfo?(s.kaspari)

I'd say comment #4 covers the GeckoView POV, so I think we're good here.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME