Closed Bug 1599877 Opened 5 years ago Closed 5 years ago

Make the 'baseline' ping resistant to force-close

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

Unspecified
Android
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: travis_, Assigned: Dexter)

References

Details

(Whiteboard: [telemetry:glean-rs:m11])

Attachments

(2 files, 1 obsolete file)

In testing a theory around why we sometimes only get metrics pings and no baseline pings from clients, I discovered that a user can open the task-switcher menu and force-close an app quickly enough that the ON_STOP lifecycle event handler is not called. This means that baseline and events pings are neither collected nor submitted in this case.

I tested this on the glean-sample-app, which does very little on shutdown, and it is still affected. I imagine that for apps like Fenix, which do significantly more work on shutdown, the problem is even easier to reproduce.

I tried to find statistics on what percentage of mobile users close their apps in this way, but I couldn't find anything concrete. There are, however, plenty of articles explaining why this behavior is bad, and I know several mobile users who do this all the time. So, no concrete numbers, but my instinct tells me that this is a statistically significant population.

Steps to Reproduce:

  1. Open Glean-enabled Android app
  2. Click the square OS icon at the bottom of the device to launch the task switcher
  3. Immediately swipe away the app to force-close it.

The app will send the baseline ping if you pause long enough between steps 2 and 3, but I was able to reproduce this consistently without having to rush, even in the emulator.

As far as my experimentation goes, this affects multiple versions of Android and Glean.

At the moment I'm not sure whether iOS suffers from the same problem, but I plan to investigate that as well.

Ok, so this will result in two things:

  1. Glean sending fewer baseline pings; potentially none, if the app is force-closed without ever going to the background;
  2. Glean sending more metrics pings, at least one more per device, if Glean-enabled products are not used for two days in a row, because of this.

I think that we can easily address (1) by sending the baseline ping, at least once, right after the application starts.
Addressing (2) is a bit more problematic. However, we could add one extra bit of information to the metrics ping to understand when collection happens, like the reason field in the main Desktop ping. Maybe a string metric (well, an EnumMetricType would be ideal in this case :P) with the following values:

  • overdue -> to cover this case: "the ping was NOT collected the current calendar day, and we're later than the due time; collect the ping immediately";
  • current_day -> to cover this case: "the ping was NOT collected the current calendar day, but we still have some time to the due time; schedule for sending the current calendar day."

This could help us get a sense of how many pings we get overdue at startup and how many are sent as scheduled by the WorkManager.
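As an illustration, a minimal Kotlin sketch of the startup decision I have in mind; the names (MetricsPingReason, metricsPingStartupDecision, dueHour) are made up for illustration and are not the actual Glean SDK API:

```kotlin
import java.util.Calendar

// Illustrative sketch only: these names are not the actual Glean SDK API.
enum class MetricsPingReason { OVERDUE, CURRENT_DAY }

/**
 * Decide what to do with the 'metrics' ping at startup:
 * - OVERDUE: not collected today and we're past the due time -> collect now;
 * - CURRENT_DAY: not collected today but still before the due time ->
 *   schedule collection for later today;
 * - null: already collected today, nothing to do at startup.
 */
fun metricsPingStartupDecision(
    alreadyCollectedToday: Boolean,
    now: Calendar = Calendar.getInstance(),
    dueHour: Int = 4 // assumed due time, purely for illustration
): MetricsPingReason? {
    if (alreadyCollectedToday) {
        return null
    }
    return if (now.get(Calendar.HOUR_OF_DAY) >= dueHour) {
        MetricsPingReason.OVERDUE
    } else {
        MetricsPingReason.CURRENT_DAY
    }
}
```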

Given the new knowledge, we might also consider reworking the metrics ping scheduler... once again.

@chutten, @janerik what do you think of the above?

Flags: needinfo?(jrediger)
Flags: needinfo?(chutten)

Lemme break this down into pieces.

First piece: send the "baseline" ping in two cases: unreliably on backgrounding (ON_STOP), and reliably on application start.

Why?

What is the point of the "baseline" ping again? To provide some measure of user activity, and the length of time of foreground sessions? For users who quickly close we're missing out on the final fg session of an application session. Will this application start variant contain that final fg session's duration? How important is it that we get "baseline" pings? At all? In a timely fashion?

I guess these might be questions we need to ask of Data Science. They're questions of analysis doctrine, I guess... but whatever characteristics the pings have will dictate that doctrine, so I'm starting the questions here.

Second piece: add a reason field to "metrics" pings.

I may have written about this before, so it might not surprise you to learn I'm broadly in favour. It should help us diagnose "metrics"-ping-only clients. I might bikeshed a bit about the reason values themselves (If there are three scheduling cases in the code, there should be three reasons), but I'm in favour of us having it.

Flags: needinfo?(chutten)

All for upping the priority on adding a reason field to our pings.
For the other question, I defer to what :chutten said.

Flags: needinfo?(jrediger)

(In reply to Chris H-C :chutten from comment #2)

Lemme break this down into pieces.

First piece: send the "baseline" ping in two cases: unreliably on backgrounding (ON_STOP), and reliably on application start.

Why?

What is the point of the "baseline" ping again? To provide some measure of user activity, and the length of time of foreground sessions? For users who quickly close we're missing out on the final fg session of an application session. Will this application start variant contain that final fg session's duration? How important is it that we get "baseline" pings? At all? In a timely fashion?

Very important, since this is the baseline set of things that are needed to "know" the application population.
You raise a very good point about the foreground duration. Of course, if we also sent the ping at startup, that ping would not contain the foreground duration. But, as a counterpoint, given what Travis has found, the foreground duration would be lost on force-close regardless of what we do.

So it's really a matter of getting some signal from first sessions that are force-closed (intentionally giving up on the foreground duration) versus no signal at all.

I guess these might be questions we need to ask of Data Science. They're questions of analysis doctrine, I guess... but whatever characteristics the pings have will dictate that doctrine, so I'm starting the questions here.

Yes, it's definitely something we need to discuss with them.

Second piece: add a reason field to "metrics" pings.

I may have written about this before, so it might not surprise you to learn I'm broadly in favour. It should help us diagnose "metrics"-ping-only clients. I might bikeshed a bit about the reason values themselves (If there are three scheduling cases in the code, there should be three reasons), but I'm in favour of us having it.

Ok, let's take this second part to bug 1557048, which I'm now prioritizing.

(In reply to Travis Long [:travis_] from comment #0)

In testing a theory around why we sometimes only get metrics pings and no baseline pings from clients, I discovered that a user can open the task-switcher menu and force close an app quickly enough that the ON_STOP lifecycle event handler is not called.
[...]

Steps to Reproduce:

  1. Open Glean-enabled Android app
  2. Click the square OS icon at the bottom of the device to launch the task switcher
  3. Immediately swipe away the app to force-close it.

The app will send the baseline ping if you pause between steps 2 and 3, given enough time, but I was consistently able to reproduce this without having to rush, even in the emulator.

To the best of my experimentation, this affects multiple versions of Android and Glean.

Sebastian, have you ever seen something like that? Is the above behaviour expected?

Flags: needinfo?(s.kaspari)

(In reply to Travis Long [:travis_] from comment #0)

In testing a theory around why we sometimes only get metrics pings and no baseline pings from clients, I discovered that a user can open the task-switcher menu and force close an app quickly enough that the ON_STOP lifecycle event handler is not called. This means that baseline and events pings are neither collected or submitted in this case.

Just to clarify: events will be sent next time the application starts, so they won't be lost.

(In reply to Alessio Placitelli [:Dexter] from comment #5)

Sebastian, have you ever seen something like that? Is the above behaviour expected?

I assume this is from the onStop event of a ProcessLifecycleOwner? I think that is in the realm of possibilities, especially if it only happens sometimes, as mentioned in the bug description:

  • If you look at the activity lifecycle (https://developer.android.com/reference/android/app/Activity#ActivityLifecycle) then you can see that onStop() is the first method that is killable. This means that the process can get killed AFTER that method returns. So this is good. However:
  • ProcessLifecycleOwner can be seen as a composite of all of your Activities. That is great when you want to track foreground/background in general because you do not care which activity goes to the background - you only want to know whether "the whole app is in the background or not". Now activities can go to the background/foreground for different reasons. To work around that, ProcessLifecycleOwner dispatches its background events with a delay, because e.g. "This delay is long enough to guarantee that ProcessLifecycleOwner won't send any events if activities are destroyed and recreated due to a configuration change.". See: https://developer.android.com/reference/androidx/lifecycle/ProcessLifecycleOwner?hl=en

So my assumption is that sometimes you see situations where onStop() is getting called for the activities (because that should be guaranteed) and the process gets killed just before the delay of the process lifecycle owner has passed - which is what would trigger your code. You could try confirming that by logging the events from the activities and seeing if they show up (while the ones from your lifecycle callback do not).
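A minimal Kotlin sketch of that logging experiment (the class name and log tag are made up, and it assumes the androidx lifecycle-process dependency is available) could look like this:

```kotlin
import android.app.Activity
import android.app.Application
import android.os.Bundle
import android.util.Log
import androidx.lifecycle.Lifecycle
import androidx.lifecycle.LifecycleEventObserver
import androidx.lifecycle.ProcessLifecycleOwner

// Log per-activity onStop() alongside the (delayed) ProcessLifecycleOwner
// ON_STOP, to see which one is missing when the app is force-closed from
// the task switcher.
class LifecycleDebugApplication : Application() {
    override fun onCreate() {
        super.onCreate()

        // Per-activity callbacks: onStop() should be delivered before the
        // process becomes killable.
        registerActivityLifecycleCallbacks(object : ActivityLifecycleCallbacks {
            override fun onActivityStopped(activity: Activity) {
                Log.d("LifecycleDebug", "Activity onStop: ${activity.localClassName}")
            }
            override fun onActivityCreated(activity: Activity, savedInstanceState: Bundle?) {}
            override fun onActivityStarted(activity: Activity) {}
            override fun onActivityResumed(activity: Activity) {}
            override fun onActivityPaused(activity: Activity) {}
            override fun onActivitySaveInstanceState(activity: Activity, outState: Bundle) {}
            override fun onActivityDestroyed(activity: Activity) {}
        })

        // Process-wide ON_STOP is dispatched with a delay and may never fire
        // if the process is killed first.
        ProcessLifecycleOwner.get().lifecycle.addObserver(
            LifecycleEventObserver { _, event ->
                if (event == Lifecycle.Event.ON_STOP) {
                    Log.d("LifecycleDebug", "ProcessLifecycleOwner ON_STOP")
                }
            }
        )
    }
}
```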

Flags: needinfo?(s.kaspari)
Depends on: 1600340
Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m11]

The metrics ping scheduler is being redesigned. I'm making this bug about fixing the 'baseline' ping. The 'baseline' ping is useful for measuring the population: it is a base signal that should be sent out of the box by all products using Glean, even if no custom metric is defined.

As discussed in comment 4, we don't get any 'baseline' ping if the application starts and is force-closed before it ever goes to the background. One potential way to address this is to send a baseline ping when the application starts, if we detect that it was force-closed.

The force-close detection mechanism currently works like this: when the application starts, it checks whether a value (the "dirty value") was stored to disk. If it was, then the app was force-closed. If it wasn't set, we set the value to true. When we receive the ON_STOP message, we clear the stored "dirty value" by setting it to false. If for some reason we fail to receive ON_STOP (due to force-closing or a crash, for example), then the next time we start we'll detect this.
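As an illustration, a minimal Kotlin sketch of this mechanism; SharedPreferences and the names used here are purely illustrative, the actual SDK implementation may persist the flag differently:

```kotlin
import android.content.Context
import android.content.SharedPreferences

// Illustrative sketch of the "dirty value" idea; not the real SDK code.
class DirtyFlag(context: Context) {
    private val prefs: SharedPreferences =
        context.getSharedPreferences("glean_dirty_flag", Context.MODE_PRIVATE)

    /**
     * Called at startup. Returns true if the previous session ended without
     * a clean ON_STOP (i.e. a force-close or crash), then marks the current
     * session as dirty until ON_STOP clears it.
     */
    fun checkAndSetOnStartup(): Boolean {
        val wasDirty = prefs.getBoolean("dirty", false)
        prefs.edit().putBoolean("dirty", true).apply()
        return wasDirty
    }

    /** Called from the ON_STOP handler to record a clean shutdown. */
    fun clearOnStop() {
        prefs.edit().putBoolean("dirty", false).apply()
    }
}

// Hypothetical usage at startup:
// if (dirtyFlag.checkAndSetOnStartup()) {
//     // The previous session was force-closed or crashed: submit a
//     // 'baseline' ping right away.
// }
```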

Emily, Frank what do you think about this?

Flags: needinfo?(fbertsch)
Flags: needinfo?(ethompson)
Summary: Android ON_STOP lifecycle event can be missed if the application is force-closed → Make the 'baseline' ping resistent to force-closed

Hey Alessio, thanks for dividing this up into two separate bugs. I'll also cc the same crew on this one +sguha, marissa and corey

Flags: needinfo?(sguha)
Flags: needinfo?(mgorlick)
Flags: needinfo?(ethompson)
Flags: needinfo?(cdowhygelund)

(In reply to Emily Thompson from comment #9)

Hey Alessio, thanks for dividing this up into two separate bugs. I'll also cc the same crew on this one +sguha, marissa and corey

Cheers, thank you!

I have observed a significant fraction of the experiment population missing baseline pings from the Fenix TP mode experiment. Using the metrics pings for enrollment yields 33% more profiles than the baseline: 75K versus 52K. There are 30K profiles out of the 75K that are only found in the metrics table: they have no baseline ping. Therefore, they have no uri count, nor duration data over the two week experiment window.

There is also a small fraction of profiles that only have baseline pings and no metrics pings: ~7K.

Flags: needinfo?(cdowhygelund)

(In reply to Corey Dow-Hygelund [:ccd] from comment #11)

I have observed a significant fraction of the experiment population missing baseline pings from the Fenix TP mode experiment. Using the metrics pings for enrollment yields 33% more profiles than the baseline: 75K versus 52K. There are 30K profiles out of the 75K that are only found in the metrics table: they have no baseline ping. Therefore, they have no uri count, nor duration data over the two week experiment window.

Hey Corey, yes, we outlined related findings in our baseline incident doc.

tl;dr - due to a bug in a library used by Fenix the 'baseline' ping was not being sent by about 30% of the Fenix Nightly population. The same bug also resulted in more metrics pings being sent (that's the reason why we're improving the scheduling in bug 1602824).

This is now resolved.

There is also a small fraction of profiles that only have baseline pings and no metrics pings: ~7K.

There are a few reasons why this could happen: for example, a client closing Fenix and never opening it again before a metrics ping is ever collected.

Flags: needinfo?(mgorlick)
Assignee: nobody → alessio.placitelli
Priority: P3 → P1

We settled on "Proposal A" from the document in comment 13.

Flags: needinfo?(sguha)
Flags: needinfo?(fbertsch)
Attached file GitHub Pull Request (deleted) —
Attached file baseline-reason-codes-data-request.md (obsolete) (deleted) —
Attachment #9124520 - Flags: data-review?(chutten)

Chris, WIP documentation is available as part of the PR, here: https://github.com/mozilla/glean/pull/687/files#diff-edd6ebe73fb8a58e7f181371a6356740

Comment on attachment 9124520 [details]
baseline-reason-codes-data-request.md

If the collection is permanent, an individual must be identified in Q6 to be responsible for it over time.
Attachment #9124520 - Flags: data-review?(chutten) → data-review-

(In reply to Chris H-C :chutten from comment #19)

Comment on attachment 9124520 [details]
baseline-reason-codes-data-request.md

If the collection is permanent, an individual must be identified in Q6 to be
responsible for it over time.

Glean does not send anything on its own; products using Glean send the data and, as such, people from those products will be the individuals responsible for looking at it.

I'm not sure who the individual looking at the data should be for each general metric we add on the SDK side.

Flags: needinfo?(chutten)

It's the individual responsible for the collection, not necessarily for its monitoring. If I'm looking at it and have questions about why it is a certain way, or if certain behaviours are buggy, or if there's a question about its applicability to certain policies... who do I ask?

Flags: needinfo?(chutten) → needinfo?(alessio.placitelli)
Comment on attachment 9124520 [details]
baseline-reason-codes-data-request.md
- removed -
Attachment #9124520 - Attachment is obsolete: true
Flags: needinfo?(alessio.placitelli)
Attachment #9124520 - Flags: data-review-
Attached file baseline-reason-codes-data-request.md (deleted) —
Attachment #9124761 - Flags: data-review?(chutten)

(In reply to Chris H-C :chutten from comment #21)

It's the individual responsible for the collection, not necessarily for its monitoring. If I'm looking at it and have questions about why it is a certain way, or if certain behaviours are buggy, or if there's a question about its applicability to certain policies... who do I ask?

All right, clear, thanks!

Comment on attachment 9124761 [details]
baseline-reason-codes-data-request.md

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is documented in the [Book of Glean](https://mozilla.github.io/glean/book/user/pings/baseline.html).

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is part of the Glean SDK so can be controlled by the embedded application's settings.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Alessio Placitelli is responsible for this collection. For monitoring over time, individual product owners will be responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---

Result: datareview+
Attachment #9124761 - Flags: data-review?(chutten) → data-review+
Blocks: 1614386
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Hi, verified as fixed on Migration Build #9 using a OnePlus A 2005 (Android 6.0.1); the baseline pings are generated with the correct reason codes.

✔️ Ping - 090b4205-5c54-483e-8f3f-b7f8421416a7 | Put the app in background having the migration screen displayed ("reason": "background")
✔️ Ping - 120e0aa2-529e-4047-aef2-fe39bde3061d | Put the app in background after successfully performing the migration ("reason": "background")
✔️ Ping - 99103064-6b6b-41db-ae2d-22d473d13a7e | Force crash Fenix using about:crashcontent and re-open Fenix ("reason": "background")
✔️ Ping - ebe05160-f71d-49e2-86b1-204230d48be8 | Force crash Fenix using about:crashparent and re-open Fenix ("reason": "dirty_startup")
✔️ Ping - 545068fa-7cbd-41ea-a44c-dd87abfceff | "Kill" Fenix from the Task manager and re-open it ("reason": "dirty_startup")

Glean dashboard

I've exclusively checked that after Fenix is "killed" from the task manager and re-opened, a baseline ping with "reason": "dirty_startup" is generated and submitted.
In this case as well, everything seems to be working properly; I managed to generate 6 consecutive baseline pings with "reason": "dirty_startup".

✔️ Ping - d4d89710-467f-4f17-b7ea-6c04637ac995
✔️ Ping - 75ee1652-d276-4223-b7f0-f8a5b64d82a6
✔️ Ping - 8536e61b-3b20-47b9-830a-a762cfc03f20
✔️ Ping - 8cad76bd-40c9-4553-ad6b-a0590da71ff5
✔️ Ping - 6de81125-1ac2-4c4c-9e82-5ded140667eb
✔️ Ping - 18da8730-f6a3-4f1f-bbd5-9f48fe57a797

Glean dashboard

Status: RESOLVED → VERIFIED

Looking at the newly landed reason field for the baseline ping, we found that the overall volume of pings coming from newer Fenix builds matches the known trends.
We see a small increase in the number of baseline pings being sent, since we now send reason: dirty_startup whenever the product using the SDK (e.g. Fenix) crashed or was force-closed. Preliminary data shows that this matches our expected volume of ~20% of clients. These clients send at least one ‘baseline’ ping with ‘dirty_startup’ per day (they might be reporting other pings as well).

How many new clients are we seeing with SDK >= 24.2.0 that did not exist in earlier versions?

This query on incoming data shows that, currently, we only see 61 clients that were not present with earlier versions of the SDK and only sent baseline pings with reason: dirty_startup. I thought this number was surprisingly low, but then I thought again about the ~19% figure we disclosed about users who 'force-close': that number reflects users who have force-closed at least ONCE. If a user force-closes one session, it doesn't necessarily mean they will force-close every single session.

This preliminary data indicates that the size of the population that we were not able to see without a ‘dirty_startup’ ping is minimal.

Do we see clients that have both dirty-startup and background reason?

Or, put differently, are clients sending reason: dirty_startup exclusively sending that ping and no other ping/reasons?

The answer seems to be no. Preliminary data shows that clients exclusively sending baseline pings with reason: dirty_startup are a minority, less than ~0.1%.
It's fairly common to see clients force-closing, then properly going to the background, then force-closing again, and so on.
The sequence of reasons varies greatly between users, which indicates completely different behaviours and usage patterns.

Does Fenix have these improvements?

Fenix Nightly and Beta builds used for the Fennec to Fenix migration have these improvements, as this query shows. The improvements will reach other channels as the changes propagate to Release and/or Fennec builds are migrated.
As of today (27th of February, 2020) Firefox Preview does not have these improvements yet. Data coming from this product can show up to twice the number of clients with the ‘metrics’ ping compared to those reported with the ‘baseline’ ping.
