org.webrtc.SurfaceEglRenderer crash in [@ android.view.WindowManager$BadTokenException: at android.view.ViewRootImpl.setView(ViewRootImpl.java)]
Categories
(Fenix :: Browser Engine, defect, P1)
Tracking
(firefox106 wontfix, firefox107 wontfix, firefox108 verified, firefox109+ verified)
People
(Reporter: cpeterson, Assigned: jonalmeida)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [geckoview:m109][geckoview:m110][fxdroid])
Crash Data
Attachments
(2 files)
+++ This bug was initially created as a clone of Bug #1536505 +++
[Tracking Requested - why for this release]:
This top crash is a regression in Fenix 106, probably related to WebRTC bug 1766646.
https://crash-stats.mozilla.org/report/index/ca003154-1b81-46f0-b1c2-0959c0221102
android.view.WindowManager$BadTokenException: Unable to add window -- token null is not valid; is your activity running?
at android.view.ViewRootImpl.setView(ViewRootImpl.java:958)
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:381)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:100)
at org.webrtc.SurfaceEglRenderer$$ExternalSyntheticLambda0.run(Unknown Source:15)
at android.os.Handler.handleCallback(Handler.java:789)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:164)
at android.app.ActivityThread.main(ActivityThread.java:6944)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:327)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1374)
Some notes from bug 1536505 comment 23 and 24:
the 106 release included a libwebrtc update which brought in over a year's worth of improvements authored by Google. Some of this work was performance related. So it seems reasonable to guess here that if we have a race related to prompt display early in startup in geckoview, improvements to the lib would tickle that issue more often.
The only possibility we can come up with here is some sort of camera permissions prompt issue on a restart of the browser. (Uptimes are mostly 0 seconds.) Testing that scenario however indicates the prompts work correctly. We don't have urls in these reports, so we don't know what service they might be associated with.
I think it would be useful to get the geckoview teams thoughts on the error here -
android.view.WindowManager$BadTokenException: Unable to add window -- token null is not valid; is your activity running? at android.view.ViewRootImpl.setView(ViewRootImpl.java:1002)
What might cause a null window token early in startup on a prompt?
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Set release status flags based on info from the regressing bug 1766646
:mjf, since you are the author of the regressor, bug 1766646, could you take a look?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 2•2 years ago
|
||
Clearing the needinfo for mjf because we need an Android engineer to investigate first.
Updated•2 years ago
|
Comment 3•2 years ago
|
||
The bug is marked as tracked for firefox108 (nightly). We have limited time to fix this, the soft freeze is today. However, the bug still isn't assigned.
:cpeterson, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 4•2 years ago
|
||
Tracking for 107, not new in the 107 release and 107 is in RC
Keeping it on the radar for a potential uplift and 107 dot release ride-along.
Reporter | ||
Comment 5•2 years ago
|
||
This crash is reported on all Android OS versions that Fenix supports (API 21-33), so it's not related to recent Google changes to Android permissions.
Comment 6•2 years ago
|
||
Set release status flags based on info from the regressing bug 1766646
Comment 7•2 years ago
|
||
Chris, has an android engineer investigated this?
Comment 8•2 years ago
|
||
I talked with Bryan Clark the engineering manager for the foundation team. Overall the crash rate for 106 was not adversely affected by this crash, this does not appear to be critical enough to interrupt sprint plans. A member of his team will look at this with a time box approach. If they are not able to provide a fix we will evaluate this for the 110 development cycle. Getting community STR would be another thing that could simplify finding a fix.
Comment 9•2 years ago
|
||
This is the #4 overall topcrash for Fenix 107.1.0. IMO this needs higher priority.
Comment 10•2 years ago
|
||
I don't understand why this was split out from bug 1536505. Web Conferencing Team feels this is caused by some sort of race in prompts vs. something that changed in webrtc.
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Reporter | ||
Comment 11•2 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #10)
I don't understand why this was split out from bug 1536505. Web Conferencing Team feels this is caused by some sort of race in prompts vs. something that changed in webrtc.
That bug was an old, low-volume crash signature. Kevin recommended that we spin off the investigation into this specific crash spike into a new bug. If the fix for this crash spike fixes all the crashes with this signature, then we can close that bug, too.
This bug is in the Android Foundation team's current sprint.
Comment 13•2 years ago
|
||
Removing from 107 tracking. Sounds like this may take some time to diagnose.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
[Tracking Requested - why for this release]:
moving tracking forward.
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 16•2 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #15)
Any progress here?
Nothing conclusive so far.
The breadcrumbs indicate that we are trying to attach a view after it has been deatached (onDetach). From looking at the stack traces, I see other reports that show a trace from different entry points:
Link: https://crash-stats.mozilla.org/report/index/89ea700f-7272-4c2e-ac44-b88270221128
android.view.WindowManager$BadTokenException: Unable to add window -- token android.os.BinderProxy@fdb47ef is not valid; is your activity running?
at android.view.ViewRootImpl.setView(ViewRootImpl.java:1147)
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:471)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:95)
at android.app.Dialog.show(Dialog.java:507)
at androidx.fragment.app.DialogFragment.onStart(DialogFragment.java:11)
at org.mozilla.fenix.search.SearchDialogFragment.onStart(SearchDialogFragment.kt:1)
at androidx.fragment.app.FragmentStateManager.start(FragmentStateManager.java:32)
at androidx.fragment.app.FragmentStateManager.moveToExpectedState(FragmentStateManager.java:55)
at androidx.fragment.app.FragmentManager.executeOpsTogether(FragmentManager.java:1178)
at androidx.fragment.app.FragmentManager.removeRedundantOperationsAndExecute(FragmentManager.java:92)
at androidx.fragment.app.FragmentManager.execPendingActions(FragmentManager.java:74)
at androidx.fragment.app.FragmentManager$5.run(FragmentManager.java:4)
at android.os.Handler.handleCallback(Handler.java:883)
at android.os.Handler.dispatchMessage(Handler.java:100)
at android.os.Looper.loop(Looper.java:237)
at android.app.ActivityThread.main(ActivityThread.java:8167)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:496)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1100)
Link: https://crash-stats.mozilla.org/report/index/936df649-b64f-4af9-afaa-cecd70221128
android.view.WindowManager$BadTokenException: Unable to add window -- token null is not valid; is your activity running?
at android.view.ViewRootImpl.setView(ViewRootImpl.java:598)
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:341)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:85)
at org.mozilla.fenix.compose.cfr.CFRPopup$$ExternalSyntheticLambda0.run(R8$$SyntheticClass:56)
at android.os.Handler.handleCallback(Handler.java:743)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:150)
at android.app.ActivityThread.main(ActivityThread.java:5546)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:792)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:682)
This leads me to believe that it's more a timing issue with how we decide to add a view in some general sense and not specific to the org.webrtc.SurfaceEglRenderer
.
I'm now trying to see what might have landed in Fenix that might be trying to show Android Views after our Fragment was detached.
Assignee | ||
Comment 17•2 years ago
|
||
I looked into the history apply_observation
breadcrumbs, however they don't seem to be related - history events are being recorded here while browsing.
In some of the traces, the signature is from org.mozilla.fenix.compose.cfr.CFRPopup
which is the non-upstreamed version. The traces seems to be happening specifically at BrowserToolbarCFRPresenter here.
This change was added in fenix#26226 which landed in 106 as well, so looks highly suspicious as the source of where the increase happens.
Assignee | ||
Comment 18•2 years ago
|
||
Current hypothesis: What I believe is happening, is that we are observing page loading until 99% completion until we decide to show a CFR popup here - this is happening off the main thread. When we get to the point to show this to the user here, the method CFRPopup#show
then posts to the UI handler at the back of the queue on the main thread to add the popup view to the anchor view on the WindowManager. So there are two chained async tasks that are happening when the anchor view may have already been removed from the window before we actually to get to our addView
call (which triggers the ViewRootImpl.setView
).
What I'm trying to do now is reproduce this bug and find a solution for it.
Assignee | ||
Comment 19•2 years ago
|
||
A speculative fix has been put up for review. When it lands, we will observe the crash rates in nightly before considering an uplift.
Assignee | ||
Comment 20•2 years ago
|
||
(In reply to Jonathan Almeida [:jonalmeida] from comment #19)
Created attachment 9306798 [details]
Bug 1799996 - Speculative fix checks CFRPopup.anchor is attached to window #28063A speculative fix has been put up for review. When it lands, we will observe the crash rates in nightly before considering an uplift.
Landed in https://github.com/mozilla-mobile/fenix/commit/7a8b0a91c0657340c6a5d191099c22644c49c80a
Assignee | ||
Updated•2 years ago
|
Comment 21•2 years ago
|
||
No crashes on Nightly since this landed when we were previously seeing pretty consistent crash volume in every build. Looks like the fix worked! BTW, the fix does cherry-pick cleanly to v108 as well. IMO, we should definitely keep this on the radar for eventual dot release inclusion should the opportunity arise.
Assignee | ||
Comment 22•2 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)
No crashes on Nightly since this landed when we were previously seeing pretty consistent crash volume in every build. Looks like the fix worked! BTW, the fix does cherry-pick cleanly to v108 as well. IMO, we should definitely keep this on the radar for eventual dot release inclusion should the opportunity arise.
Thanks for checking - I can prep this for the 108 release branch in that case.
Comment 23•2 years ago
|
||
Assignee | ||
Comment 24•2 years ago
|
||
Uplifted, but the fix has not been released yet (no scheduled patch releases): https://github.com/mozilla-mobile/fenix/commit/c84d0805ec7d149c7f8fc7b7ee5a322ba685b976
Assignee | ||
Updated•2 years ago
|
Comment 25•2 years ago
|
||
We couldn't reproduce this issue on both Focus and Fenix with the latest builds ( RC108.1.1 build2 and 109.0b3 ).
Devices used:
- Huawei P40 Lite (Android 10).
- Lenovo Tab M10 (Android 10).
- Oppo Find X5 (Android 12).
- Samsung Galaxy S22 Ultra (Android 13).
- Google Pixel 6 (Android 13).
Marking this ticket as verified.
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Description
•