Closed Bug 1716847 Opened 3 years ago Closed 2 years ago

Glean errors for baseline.duration on Firefox iOS are unusually high

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: travis_, Assigned: travis_)

Attachments

(1 file)

[mozilla/glean] Bug 1716847 - Fix for Glean internal metric `baseline.duration` InvalidState errors (#2368), attached 2 years ago by BMO Automation (deleted), text/x-github-pull-request

Travis Long [:travis_]

Assignee

Description

3 years ago

After forking our Fenix error query to create one for Firefox iOS (while looking into something for Nimbus), I noticed that the glean.baseline.duration metric was reporting 120k-140k errors per day, affecting 6-7% of clients.

Jan-Erik Rediger [:janerik]

Updated

3 years ago

Whiteboard: [telemetry:glean-rs:m?]

Chris H-C :chutten

Updated

2 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1780035

Chris H-C :chutten

Comment 1

2 years ago

This is probably not related to bug 1780035 after all. We assumed a shared root cause (overflowing preinit queues), but increasing the preinit queue size did not reduce these errors.

See Also: https://bugzilla.mozilla.org/show_bug.cgi?id=1780035 →

Jan-Erik Rediger [:janerik]

Updated

2 years ago

Duplicate of this bug: 1806448

Jan-Erik Rediger [:janerik]

Updated

2 years ago

Priority: P3 → P2

Jan-Erik Rediger [:janerik]

Comment 3

2 years ago

Errors went from ~100k per day to ~400k around 2022-12-16 and kept rising. A new ramp-up started on 2022-12-22, and the count has leveled off at 3.4 million per day since 2022-12-29.

https://mozilla.cloud.looker.com/explore/firefox_ios/metrics?qid=VeWCiLhEC04siMCnXMyAqm&origin_space=746&toggle=fil,vis

Travis Long [:travis_]

Assignee

Updated

2 years ago

Assignee: nobody → tlong

Travis Long [:travis_]

Assignee

Comment 4

2 years ago

Before I forget, here is the latest progress:

This now appears to be easily reproducible for me locally, and I've narrowed it down to baseline.duration generating an InvalidState error because start was called on an already started timespan. I believe this may be a race condition between this line:

https://github.com/mozilla/glean/blob/0591aecadb762ac93e70bc85b8605f7a1ea409f0/glean-core/ios/Glean/Scheduler/GleanLifecycleObserver.swift#L28

And this line:

https://github.com/mozilla/glean/blob/0591aecadb762ac93e70bc85b8605f7a1ea409f0/glean-core/ios/Glean/Scheduler/GleanLifecycleObserver.swift#L35

This race condition exists because we expected the lifecycle observer to be created after the willEnterForeground event had already occurred, so for the first foreground on app launch we explicitly performed all of the foreground handling when creating the observer. However, there has always been a handful of these errors, and now there are many, which indicates that this wasn't working quite as we expected. Something changed recently that makes the init happen before willEnterForeground more consistently, so we now call handleForegroundEvent twice when the app launches.

To fix this, I think it is still important to call handleForegroundEvent in the observer's init to cover the case where the observer gets initialized after the event has occurred, but we should add a check so that handleForegroundEvent is never called twice without a call to handleBackgroundEvent in between. This may mean yet another flag, but I'm looking at better ways to handle this, so I'm happy to entertain any ideas or counterproposals about what might be going on here.
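A minimal Swift sketch of the proposed check, assuming all calls arrive on a single thread. Everything in it is hypothetical: `TimespanSketch` stands in for the `baseline.duration` timespan metric, `LifecycleObserverSketch` stands in for GleanLifecycleObserver, and the `isInForeground` flag is one possible shape for the "yet another flag" idea, not the code that landed in #2368. The real observer reacts to UIApplication lifecycle notifications, which are not modeled here; the simulation just calls the handlers in the launch order described above.

```swift
import Foundation

// Hypothetical stand-in for the `baseline.duration` timespan metric (not Glean's API).
// In this sketch, starting an already running timespan, or stopping one that
// is not running, counts as an InvalidState error instead of recording a value.
final class TimespanSketch {
    private var startTime: Date?
    private(set) var invalidStateErrors = 0
    private(set) var recorded: [TimeInterval] = []

    func start() {
        guard startTime == nil else {
            invalidStateErrors += 1 // start called on an already started timespan
            return
        }
        startTime = Date()
    }

    func stop() {
        guard let begin = startTime else {
            invalidStateErrors += 1 // stop called on a timespan that isn't running
            return
        }
        recorded.append(Date().timeIntervalSince(begin))
        startTime = nil
    }
}

// Hypothetical observer modeling the launch sequence described in Comment 4.
final class LifecycleObserverSketch {
    private let duration: TimespanSketch
    private let guardDuplicates: Bool
    // True after a foreground event until the next background event.
    private var isInForeground = false

    init(duration: TimespanSketch, guardDuplicates: Bool) {
        self.duration = duration
        self.guardDuplicates = guardDuplicates
        // Keep the explicit call so an observer created after
        // willEnterForeground still starts the timespan.
        handleForegroundEvent()
    }

    func handleForegroundEvent() {
        // With the guard, a second foreground without an intervening
        // background is ignored instead of calling start() again.
        if guardDuplicates && isInForeground { return }
        isInForeground = true
        duration.start()
    }

    func handleBackgroundEvent() {
        // Symmetric guard: ignore a background without a prior foreground.
        if guardDuplicates && !isInForeground { return }
        isInForeground = false
        duration.stop()
    }
}

// Simulate the launch order that now happens more consistently:
// the observer's init runs first, then willEnterForeground arrives.
for guardDuplicates in [false, true] {
    let duration = TimespanSketch()
    let observer = LifecycleObserverSketch(duration: duration, guardDuplicates: guardDuplicates)
    observer.handleForegroundEvent() // willEnterForeground
    observer.handleBackgroundEvent() // background event
    print("guard: \(guardDuplicates), errors: \(duration.invalidStateErrors), recorded: \(duration.recorded.count)")
}
```

Without the guard, the duplicate foreground call produces one InvalidState error per launch ("guard: false, errors: 1, recorded: 1"); with it, the timespan is still recorded once and no error is reported ("guard: true, errors: 0, recorded: 1"). If the init and the notification handlers could run on different threads, the flag would also need synchronization.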

Travis Long [:travis_]

Assignee

Updated

2 years ago

Priority: P2 → P1

BMO Automation

Comment 5

2 years ago

Attached file [mozilla/glean] Bug 1716847 - Fix for Glean internal metric `baseline.duration` InvalidState errors (#2368) (deleted)

Travis Long [:travis_]

Assignee

Comment 6

2 years ago

Waiting for this to be released to see whether it reduces the InvalidState errors we are seeing before closing this.

Travis Long [:travis_]

Assignee

Comment 7

2 years ago

It looks like this, combined with the other recent iOS HttpUploader updates, has done what we hoped and reduced the errors we were seeing. It is currently only in beta, but I don't think we need to wait for v112 to reach release before calling it good; the beta data is clear enough to call this fixed.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

Security

(public)
