High ANR rate in 68.0b5 for the arm32 apk
Categories
(Firefox for Android Graveyard :: General, defect, P1)
Tracking
(firefox67 unaffected, firefox67.0.1 unaffected, firefox68+ verified, firefox69+ verified)
Version | Tracking | Status |
---|---|---|
firefox67 | --- | unaffected |
firefox67.0.1 | --- | unaffected |
firefox68 | + | verified |
firefox69 | + | verified |
People
(Reporter: marcia, Assigned: petru)
References
Details
(Keywords: regression, reproducible)
Attachments
(2 files)
- (deleted), text/plain
- (deleted), text/x-phabricator-request (jcristau: approval-mozilla-beta+)
While reviewing the GPC I noticed that there was a notification that the ANR rate was above the accepted threshold. Given the fact that we are on the cusp of going to ESR mode, I thought it was important to file and track this issue.
The top issues in the cluster are:
- Input dispatching timed out (org.mozilla.firefox_beta/org.mozilla.gecko.BrowserApp, Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 12660.8ms.)
- Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50200010 (has extras) }
Reporter
Comment 2•5 years ago
It looks as if 2015630993 (68.0) is the version that has the high ANR rate, not 2015630995 (68.0), so we may be OK here, assuming the second one is the latest version.
Comment 3•5 years ago
Both of those are 68.0b5: 2015630993 is the version code for the arm32 apk and 2015630995 is the arm64 one (while 2015630997 is x86_32 and 2015630999 is x86_64).
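For anyone reading Play console reports, the mapping from comment 3 can be written down as a small lookup. This is only a sketch: the class and method names are hypothetical, and the four version codes are the ones quoted above for 68.0b5, not a general rule for other builds.

```java
import java.util.Map;

// Hypothetical helper: maps the 68.0b5 version codes quoted in comment 3
// to the ABI of the apk they belong to.
final class VersionCodeAbi {
    static final Map<Integer, String> ABI_BY_VERSION_CODE = Map.of(
            2015630993, "arm32",
            2015630995, "arm64",
            2015630997, "x86_32",
            2015630999, "x86_64");

    // Returns the ABI name, or "unknown" for a code not listed in the comment.
    static String abiFor(int versionCode) {
        return ABI_BY_VERSION_CODE.getOrDefault(versionCode, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(abiFor(2015630993)); // prints "arm32"
    }
}
```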
Reporter
Comment 4•5 years ago
Petru or Andrei - If you are able to see info in GPS, any ideas on what might be going on with the arm32 apk?
Assignee
Comment 5•5 years ago
Hello,
Tried to investigate this but we don't currently have access to beta's developer console.
Filed bug 1556439 for that.
Comment 6•5 years ago
Hi!
I can reproduce this issue on Beta 68.0b7 with Motorola Nexus 6 (Android 7.1.1) by clearing data and restarting Fennec.
Reporter
Comment 7•5 years ago
I checked the Play console and this issue is still present. Petru - Have you been able to take a look since you now have access? Thanks.
Assignee
Comment 9•5 years ago
On it.
Assignee
Comment 10•5 years ago
Classic deadlock situation, possible because getDatabaseHelperForProfile(..)
would lock on [PerProfileDatabase] and then try to lock on [GeckoProfile], while at
the same time it would be possible for another thread which already had the
[GeckoProfile] lock to call this method and so try to acquire the
[PerProfileDatabase] lock.
The simplest solution to resolve this, and the one I went with, is to ensure that
one of those threads will not need both locks. It turns out that the
getDatabaseHelperForProfile method can easily be refactored to use only the
GeckoProfile lock, a change which would not significantly increase the block of
code synchronized with the same key.
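The single-lock idea described in comment 10 can be sketched roughly as follows. This is not the actual patch: SingleLockSketch, PROFILE_LOCK, profileDir and the string "helper" values are hypothetical stand-ins, and the sketch assumes the GeckoProfile lock behaves like one shared Java monitor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Fennec code: the helper cache and the profile
// lookup are both guarded by one monitor, so no thread ever needs two locks.
final class SingleLockSketch {
    // Assumed stand-in for what the thread calls the GeckoProfile lock.
    static final Object PROFILE_LOCK = new Object();

    private final Map<String, String> helpers = new HashMap<>();

    // Reads profile data; takes only the profile lock.
    static String profileDir(String profileName) {
        synchronized (PROFILE_LOCK) {
            return "/profiles/" + profileName;
        }
    }

    // Cache lookup under the same lock. Java monitors are reentrant, so the
    // nested profileDir(..) call cannot block on a lock this thread holds.
    String getHelperForProfile(String profileName) {
        synchronized (PROFILE_LOCK) {
            String helper = helpers.get(profileName);
            if (helper == null) {
                helper = "helper@" + profileDir(profileName);
                helpers.put(profileName, helper);
            }
            return helper;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleLockSketch cache = new SingleLockSketch();
        // One caller already holds the profile lock, the other does not.
        Thread a = new Thread(() -> {
            synchronized (PROFILE_LOCK) {
                cache.getHelperForProfile("default");
            }
        });
        Thread b = new Thread(() -> cache.getHelperForProfile("default"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(cache.getHelperForProfile("default"));
    }
}
```

With only one lock involved, a caller that already holds the profile lock simply re-enters it, and a caller that does not hold it waits for a single monitor, so the circular wait cannot form.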
Assignee
Comment 11•5 years ago
Turns out that in both cases "main" was trying to queuePersistAllTabs(), and for this it had to wait for the GeckoProfile lock, which was already part of a deadlock between:
- "GeckoBackgroundThread", which checked for distributions data, then acquired the GeckoProfile lock, continued to addDefaultBookmarks(..), so acquiring a new LocalBrowserDb lock, and then tried to get the PerProfileDatabases lock to hold until reading from the database.
- a background thread which at the same time queried TopSites from the database, for this acquiring a PerProfileDatabases lock, just to ultimately try to acquire the GeckoProfile lock while trying to read a locally stored json.
"GeckoBackgroundThread" had a GeckoProfile lock and needed the PerProfileDatabases lock.
"A background thread" had a PerProfileDatabases lock and needed the GeckoProfile lock.
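The lock-order inversion summarized in comment 11 can be reproduced in isolation with a small sketch. Everything here is hypothetical (class, method and lock names), and it uses ReentrantLock.tryLock with a timeout instead of synchronized blocks so that the demo reports the blocked state and terminates rather than hanging like the real ANR.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo of two threads taking the same two locks in opposite order.
final class LockOrderInversionDemo {
    // Returns how many threads failed to obtain their second lock.
    static int runOnce() {
        ReentrantLock profileLock = new ReentrantLock();   // stand-in for the GeckoProfile lock
        ReentrantLock databasesLock = new ReentrantLock(); // stand-in for the PerProfileDatabases lock
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread t1 = new Thread(() -> acquireInOrder(profileLock, databasesLock, barrier, blocked));
        Thread t2 = new Thread(() -> acquireInOrder(databasesLock, profileLock, barrier, blocked));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                               CyclicBarrier barrier, AtomicInteger blocked) {
        first.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // the other thread still holds it
            }
            barrier.await(); // keep the first lock until both attempts are done
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + runOnce());
    }
}
```

Because each thread keeps its first lock until both attempts have finished, both tryLock calls time out. With plain synchronized blocks there is no timeout, so the same ordering would leave both threads, and anything waiting behind them such as "main", blocked indefinitely.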
Comment 12•5 years ago
We'll want to uplift this deadlock fix to Fennec 68, though we might need to postpone the uplift until after 68 ESR has been branched.
Comment 13•5 years ago
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2cada2586c93
Resolve deadlock by using just one lock, not two; r=VladBaicu
Comment 14•5 years ago
bugherder
Comment 15•5 years ago
[Tracking Requested - why for this release]:
We will want to uplift this Fennec crash fix to the ESR 68 branch for the Fennec 68.1 release.
Assignee
Comment 16•5 years ago
I don't think it's a recent regression, just a fluke that affected some users, but in bug 1554660 I think Eliza and Mira could reproduce this same issue fairly often.
Could you please test to see if that problem is resolved?
Comment 17•5 years ago
(In reply to Chris Peterson [:cpeterson] from comment #15)
[Tracking Requested - why for this release]:
We will want to uplift this Fennec crash fix to the ESR 68 branch for the Fennec 68.1 release.
To be honest I'd almost consider it for 68.0... Petru do you think this is safe enough?
Assignee
Comment 18•5 years ago
The provided solution was simple enough, without changing any logic, so I don't see a risk of regressions.
The ANRs are pretty bad, so the sooner we can resolve them, the better experience our users will have.
I was thinking of waiting for validation from QA on bug 1554660, which I think had this same cause. With that we'll have even more reasons and confidence to uplift this.
Comment 19•5 years ago
Eliza, you said you could reproduce this in comment 6, can you check again with the latest 69 apk off mozilla-central?
Comment 20•5 years ago
Hi!
I tested this with the latest 69 apk from mozilla-central, Nightly 69.0a1 (2019-06-26) with Motorola Nexus 6 (Android 7.1.1), Sony Xperia Z5 Premium (Android 7.1.1), Motorola Moto G6 (Android 8) and I could not reproduce the issue.
Due to my findings I will mark this as verified on Firefox 69.
Thanks!
Assignee
Comment 21•5 years ago
Comment on attachment 9073521 [details]
Bug 1556083 - Resolve deadlock by using just one lock, not two; r?VladBaicu
Beta/Release Uplift Approval Request
- User impact if declined: Potential "Application Not Responding"
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: Clean start of the app
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Not risky as it is a very small change, verified by QA.
- String changes made/needed:
Comment 22•5 years ago
Comment on attachment 9073521 [details]
Bug 1556083 - Resolve deadlock by using just one lock, not two; r?VladBaicu
fix for increased ANR rate, approved for beta68
Comment 23•5 years ago
bugherder uplift
Comment 24•5 years ago
Hello, I can confirm that the issue is not reproducible on Beta 68.0b14 using Motorola Nexus 6 (Android 7.1.1) and Motorola Moto G6 (Android 8). Due to my findings, I will mark this as verified.
Thanks!
Reporter
Comment 25•5 years ago
Commenting here since this was brought up in the Channel meeting. Since this issue, it doesn't appear that we have had another warning about a high ANR rate. I do still see some instances of "Input dispatching timed out" errors in current production, but we seem to have those in every production release.