Navigation from extension page to another extension page results in a unusable tab on Fenix (and RefBrow)
Categories
(GeckoView :: Extensions, defect, P1)
Tracking
(firefox106 wontfix, firefox107 affected, firefox108 affected, firefox109 affected, firefox110 affected, firefox111 affected)
People
(Reporter: robwu, Assigned: boek)
References
(Depends on 1 open bug)
Details
(Whiteboard: [geckoview:m109][geckoview:m111][geckoview:m112][geckoview:m113][bugz:In Progress][addons-jira])
Attachments
(3 files)
STR:
- Open uBlock Origin's panel (install if needed)
- Click on Dashboard.
- Click on the eye icon, e.g. at uBlock Filters. This will trigger a navigation to a page within the extension a
moz-extension:-
URL
4 .Observe that the tab is created but cannot be interacted with, and still renders whatever what was previously displayed (e.g. the displayed tab from step 3).- THIS IS A BUG, reported at https://github.com/uBlockOrigin/uBlock-issues/issues/2271
- Reloading the page does not fix the issue. That looks very similar to this long-standing but not easily reproducible issue: https://github.com/mozilla-mobile/fenix/issues/24563
- Attach the debugger to that new tab via
about:debugging
on desktop. - Switch to Inspect element, and discover that suddenly the bug disappears - content is painted again, and can be interacted with!
- THIS IS A STRANGE OBSERVATION that may potentially shed some light on the underlying issue.
- More info: hitting the "side effect free" evaluation mode of the devtools restores the connection to the hidden tab.
$('body')
is not side-effect free, so the devtools doesn't run it.document.body
is side-effect free, and immediately triggers a preview + highlights the page. That results in the page being painted.
Versions reproduced on:
org.mozilla.reference.browser
: Firefox Beta 106.0b3 (Build #2015907027), e80324116+; AC: 106.0.2, 50dd304103, GV: 106.0-20220929195234org.mozilla.firefox_beta
: Reference Browser 1.0.2231 (Build #42140610 GV: 105.0a1-20220731065953)
Does NOT happen on:
org.mozilla.fenix
: Firefox Nightly 106.0a1 (Build #2015901675), b43243685+; AC: 106.0.20220901143315, f54031ee15; GV: 106.0a1-20220901095452- In the JS console, I can see that
WebExtensionPolicy.useRemoteWebExtensions
is true. Akaextensions.webextensions.remote
=true. - When I flip the preference to
false
, the bug also reproduces on Nightly.
- In the JS console, I can see that
Because of the last observation, I am marking it as a dependency of bug 1535365. However, I still believe that it's worth investigating this bug, to get a better understanding of why this bug occurs, in the hope that it may also fix this issue with the same symptoms, but more difficult to reproduce: https://github.com/mozilla-mobile/fenix/issues/24563
Reporter | ||
Comment 1•2 years ago
|
||
Arturo, you mentioned being able to reproduce this and saw some interesting logs. Could you attach the logs that you deem relevant, along with any relevant analysis that you can come up with?
Comment 2•2 years ago
|
||
Since this bug affects both Fenix and RB, it sounds like a GV (or AC) bug.
In the JS console, I can see that WebExtensionPolicy.useRemoteWebExtensions is true. Aka extensions.webextensions.remote=true.
Is the extension process enabled in Fenix or GV Nightly?? Did we leave it enabled in Nightly to continue testing it while we wait for MV3?
Since Android MV3 won't be ready until next year, I think we should disable the extension process in all channels. We want Nightly users to test the configuration that we're shipping.
Reporter | ||
Comment 3•2 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #2)
Since this bug affects both Fenix and RB, it sounds like a GV (or AC) bug.
In the JS console, I can see that WebExtensionPolicy.useRemoteWebExtensions is true. Aka extensions.webextensions.remote=true.
Is the extension process enabled in Fenix or GV Nightly?? Did we leave it enabled in Nightly to continue testing it while we wait for MV3?
It's off by default on GV. The pref might have been a leftover from a previous debugging session.
I mentioned the pref and all details/observations, with the hope that the ability to observe different behaviors can help with pinpointing the issue and fix.
Comment 4•2 years ago
|
||
(In reply to Rob Wu [:robwu] from comment #1)
Arturo, you mentioned being able to reproduce this and saw some interesting logs. Could you attach the logs that you deem relevant, along with any relevant analysis that you can come up with?
Sure, after opening a new tab and switch a couple of times between the new the old tab, the old tab became responsive again.
Logs:
Not recording metadata (canAddUri=false) for: moz-extension://d50c45a6-02ef-41a1-8686-7e458d541b14/asset-viewer.html?url=ublock-filters
2022-10-24 17:04:37.853 26451-26795 lla.fenix.debug org.mozilla.fenix.debug I Explicit concurrent copying GC freed 231988(8713KB) AllocSpace objects, 17(1168KB) LOS objects, 75% free, 11MB/45MB, paused 61us,52us total 71.819ms
2022-10-24 17:04:37.866 26451-26462 StrictMode org.mozilla.fenix.debug D StrictMode policy violation: android.os.strictmode.LeakedClosableViolation: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
at android.os.StrictMode$AndroidCloseGuardReporter.report(StrictMode.java:1986)
at dalvik.system.CloseGuard.warnIfOpen(CloseGuard.java:336)
at android.view.Surface.finalize(Surface.java:310)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:319)
at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:306)
at java.lang.Daemons$Daemon.run(Daemons.java:140)
at java.lang.Thread.run(Thread.java:1012)
Caused by: java.lang.Throwable: Explicit termination method 'Surface.release' not called
at dalvik.system.CloseGuard.openWithCallSite(CloseGuard.java:288)
at dalvik.system.CloseGuard.open(CloseGuard.java:257)
at android.view.Surface.setNativeObjectLocked(Surface.java:760)
at android.view.Surface.updateNativeObject(Surface.java:599)
at android.view.Surface.copyFrom(Surface.java:626)
at android.view.Surface.<init>(Surface.java:268)
at org.mozilla.gecko.gfx.SurfaceControlManager.getChildSurface(SurfaceControlManager.java:76)
at org.mozilla.geckoview.GeckoSession$Compositor.syncResumeResizeCompositor(Native Method)
at org.mozilla.geckoview.GeckoSession.onSurfaceChanged(GeckoSession.java:6086)
at org.mozilla.geckoview.GeckoDisplay.surfaceChanged(GeckoDisplay.java:149)
at org.mozilla.geckoview.GeckoView$Display.acquire(GeckoView.java:121)
at org.mozilla.geckoview.GeckoView.setSession(GeckoView.java:525)
at mozilla.components.browser.engine.gecko.GeckoEngineView.render(GeckoEngineView.kt:116)
at mozilla.components.feature.session.engine.EngineViewPresenter.renderTab(EngineViewPresenter.kt:89)
at mozilla.components.feature.session.engine.EngineViewPresenter.onTabToRender(EngineViewPresenter.kt:62)
at mozilla.components.feature.session.engine.EngineViewPresenter.access$onTabToRender(EngineViewPresenter.kt:24)
at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$3.emit(EngineViewPresenter.kt:46)
at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$3.emit(EngineViewPresenter.kt:46)
at mozilla.components.support.ktx.kotlinx.coroutines.flow.FlowKt$ifAnyChanged$$inlined$filter$1$2.emit(Emitters.kt:223)
at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$map$1$2.emit(Emitters.kt:223)
at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:62)
at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)
at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Unknown Source:14)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7898)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
``` at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)
Comment 5•2 years ago
|
||
Tagging jnicol, as part of the stack we have gecko.gfx
specifically org.mozilla.gecko.gfx.SurfaceControlManager.getChildSurface(SurfaceControlManager.java:76)
.
Do you have some context around what could be happening?
Thanks in advance!
Comment 6•2 years ago
|
||
That stack trace means we are not explicitly releasing the Surface we created using SurfaceControlManager. I should fix that, but it should be harmless. If I disable our usage of SurfaceControlManager altogether I can still reproduce this bug, so I don't think it's relevant. I imagine it's just showing up at the logcat at this time because we set our remaining reference to the surface to null in SyncPauseCompositor
, which gets called for the tab we are switching away from just before we switch to the new tab.
Comment 7•2 years ago
|
||
When adding logging to verify the above, I can see that SyncResumeResizeCompositor()
never gets called for the tab we are switching to. That's because mCompositorReady
is false here. This means we are never receiving the COMPOSITOR_CONTROLLER_OPEN
message from gecko. This happens automatically after gecko creates a compositor, which happens automatically whenever nsWindow::GetWindowRenderer()
is called. But I can see that never happens for this webextension window.
Is there anything special about this type of window that would cause gecko not to paint to it as usual?
Comment 8•2 years ago
|
||
I can also reproduce this, with the exception of the behavior described when the debugger is attached; for me, it did not load the page corectly, after attaching.
I can also confirm that having useRemoteWebExtensions
fixes the problem
I cannot see anything wrong with the tab load, but the content is not painted.
One additional thing I can add to all the observations made here is that when activating Talkback, it reads the correct content as if the page was actually loaded correctly.
Reporter | ||
Comment 9•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #7)
Is there anything special about this type of window that would cause gecko not to paint to it as usual?
The "window" here is a regular content window. The only special thing about it is that it is loaded in the main process instead of a content process.
More observations:
- x Reloading the tab does NOT fix the issue.
- x Copying the URL and pasting does NOT fix the issue.
- x Typing a different URL (e.g. dashboard.html instead of asset-viewer.html) does NOT fix the issue.
- v Navigating elsewhere (to a website) and then navigating back does "fix" the issue.
(In reply to Mihai Adrian Carare from comment #8)
I can also reproduce this, with the exception of the behavior described when the debugger is attached; for me, it did not load the page corectly, after attaching.
It is not just attaching the debugger, but also activating the inspector or triggering the "side effect free" evaluation, e.g. typing document.body
in the console.
One additional thing I can add to all the observations made here is that when activating Talkback, it reads the correct content as if the page was actually loaded correctly.
I can confirm - enabling Talkback in Android OS > Settings > Accessibility, followed by running the STR in Firefox triggers the bug, and after tapping inside the content area, Talkback will select the are and read the tapped content from the non-painted tab whenever I tap. The tab is not painted.
Comment 10•2 years ago
|
||
TalkBack working correctly makes sense, if the problem is just that nothing is triggering a paint but the page is still loaded.
Likewise that plausibly explains why the devtools inspector can fix it, because presumably that does trigger a paint.
It does sound plausible that loading a window from the main process behaves differently than content, and for some reason we don't paint. My next step would be to use the debugger to find out where nsWindow::GetWindowRenderer
gets called from for "normal" tabs, and see where that differs for this tab.
Comment 11•2 years ago
|
||
I am being affected by the same bug in Firefox Nightly for Android.
The issue is happening since a couple of weeks (so already several builds ago) and when entering NoScript or Dark Reader options.
Comment 12•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #10)
My next step would be to use the debugger to find out where
nsWindow::GetWindowRenderer
gets called from for "normal" tabs, and see where that differs for this tab.
Jamie, are you working on this bug? Or was your comment a debugging suggestion for whoever will be working on this bug?
(In reply to Jamie Nicol [:jnicol] from comment #6)
That stack trace means we are not explicitly releasing the Surface we created using SurfaceControlManager. I should fix that, but it should be harmless.
Should I file a new bug for that Surface LeakedClosableViolation warning?
Comment 13•2 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #12)
(In reply to Jamie Nicol [:jnicol] from comment #10)
My next step would be to use the debugger to find out where
nsWindow::GetWindowRenderer
gets called from for "normal" tabs, and see where that differs for this tab.Jamie, are you working on this bug? Or was your comment a debugging suggestion for whoever will be working on this bug?
No I'm not working on this, that was a suggestion.
That stack trace means we are not explicitly releasing the Surface we created using SurfaceControlManager. I should fix that, but it should be harmless.
Should I file a new bug for that Surface LeakedClosableViolation warning?
Yes that would be great, thanks.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #13)
That stack trace means we are not explicitly releasing the Surface we created using SurfaceControlManager. I should fix that, but it should be harmless.
Should I file a new bug for that Surface LeakedClosableViolation warning?
I filed bug 1800541.
Assignee | ||
Comment 15•2 years ago
|
||
We were able to reproduce this issue inside the GeckoView sample application, this should rule out Android Components. Going to spend some time today trying to debug nsWindow::GetWindowRenderer
to see if we can track down what is different about these tabs.
Comment 16•2 years ago
|
||
(In reply to Jeff Boek [:boek] from comment #15)
Going to spend some time today trying to debug
nsWindow::GetWindowRenderer
to see if we can track down what is different about these tabs.
Tentatively assigning this bug to Jeff since he is investigating it.
Assignee | ||
Comment 17•2 years ago
|
||
Assignee | ||
Comment 18•2 years ago
|
||
Assignee | ||
Comment 19•2 years ago
|
||
Here is what I've found so far
When accessing a tab pointed to something like https://mozilla.org
I get what looks like a normal looking order of calls:
(Sorry for the gist links, attachments on Bugzilla were failing to upload for me)
https://gist.github.com/boek/f9ed619bae4b10c0e37fc5c47eea54f7
When the uBlock origin dashboard page is open, it seems to be looping with this stack infinitely. I'm not sure if this is relevant but it seemed odd:
https://gist.github.com/boek/fd26e5cb38f73a1ea150559afb376ee7
When opening the broken tab or when you switch tabs and navigate back nsWindow::GetWindowRenderer
doesn't ever seem to be called.
This is my first time debugging native code so it is possible I may not have something setup correctly.
Comment 20•2 years ago
|
||
When accessing a tab pointed to something like https://mozilla.org I get what looks like a normal looking order of calls:
(Sorry for the gist links, attachments on Bugzilla were failing to upload for me)
https://gist.github.com/boek/f9ed619bae4b10c0e37fc5c47eea54f7
We can see here the paint is called from the refresh driver tick here. So I think the interesting place to break the debugger and get the callstack is actually when mViewManagerFlushIsPending
gets set true here. It would be interesting to see whether that occurs for the broken extension tab or not.
When the uBlock origin dashboard page is open, it seems to be looping with this stack infinitely. I'm not sure if this is relevant but it seemed odd:
https://gist.github.com/boek/fd26e5cb38f73a1ea150559afb376ee7
I think this just means the page has an animated SVG or something like that on it.
Assignee | ||
Comment 21•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #20)
We can see here the paint is called from the refresh driver tick here. So I think the interesting place to break the debugger and get the callstack is actually when
mViewManagerFlushIsPending
gets set true here. It would be interesting to see whether that occurs for the broken extension tab or not.
I see, interesting. it looks like when switching from mozilla.org to the broken tab it does get set to true once: https://gist.github.com/boek/cbb4dc1094dcec721c565306818253fc
Switching from the broken tab back to mozilla.org: it is also only called once but the trace is significantly different: https://gist.github.com/boek/d33681986daae74af4f301ca2113f807
I'll keep poking around to see if I can see anything else
When the uBlock origin dashboard page is open, it seems to be looping with this stack infinitely. I'm not sure if this is relevant but it seemed odd:
https://gist.github.com/boek/fd26e5cb38f73a1ea150559afb376ee7I think this just means the page has an animated SVG or something like that on it.
Oooh, good to know. I tried to reproduce on a few pages with just SVGs but I wasn't getting the same behavior so I wasn't sure why it was happening.
Assignee | ||
Comment 22•2 years ago
|
||
So I think I found out why this is broken, but what I am not sure about is how.
When creating a new working tab it will eventually create a layout manager
https://searchfox.org/mozilla-central/rev/c1180ea13e73eb985a49b15c0d90e977a1aa919c/widget/android/nsWindow.cpp#2386
Here is the stack trace:
https://gist.github.com/boek/24ba936c44829286261de5ba393e40ad
When creating a new broken tab the furthest up the stack that it reaches is
mozilla::dom::binding_detail::GenericGetter<…>(JSContext *, unsigned int, JS::Value *) BindingUtils.cpp:3169
where then it repeatedly bounces around the frames marked with ✅, never reaching the frame above
mozilla::dom::FrameLoader_Binding::get_loadContext(JSContext *, JS::Handle<…>, void *, JSJitGetterCallArgs) FrameLoaderBinding.cpp:338
Updated•2 years ago
|
Comment 23•2 years ago
|
||
Following on from Jeff's good work in comment 22 finding the stacktrace for when the compositor is created in the working case: I'm not sure if there's a clever way to follow a callstack from C++ to javascript. But naively we can see the places in javascript which could call FrameLoader_Binding::get_loadContext()
are quite few: https://searchfox.org/mozilla-central/search?q=symbol:%23loadContext&redirect=false
And with some dump()
s we can narrow it down to here.
With the devtools debugger I can see the javascript callstack is:
construct (chrome://global/content/elements/browser-custom-element.js#983)
connectedCallback (chrome://global/content/elements/browser-custom-element.js#324)
init (chrome://geckoview/content/geckoview.js#88)
startup (chrome://geckoview/content/geckoview.js#546)
<anonymous> (chrome://geckoview/content/geckoview.xhtml#12)
In geckoview.xhtml we can see this comes from a "DOMContentLoaded" event. Which matches the dom::Document::DispatchContentLoadedEvents()
call in the native stacktrace just before it goes into javascript code. Neat!
Interestingly, when opening the broken tab, browser-custom-element.js' construct()
function does in fact get called! But with a slightly different stacktrace:
construct (chrome://global/content/elements/browser-custom-element.js#983)
connectedCallback (chrome://global/content/elements/browser-custom-element.js#324)
init (chrome://geckoview/content/geckoview.js#89)
startup (chrome://geckoview/content/geckoview.js#548)
handleNewSession (resource://gre/modules/GeckoViewNavigation.jsm#379)
createContentWindow (resource://gre/modules/GeckoViewNavigation.jsm#414)
And crucially, this.isRemoteBrowser
is false
here, meaning we never call loadContext
(which eventually leads to calling GetWindowRenderer()
/ CreateCompositor()
).
I guess (though haven't verified) that the reason this worked when extensions.webextensions.remote
was true
is that isRemoteBrowser
would therefore be true.
One last thing to note is that when loading the other ublock pages in the steps to reproduce this bug, (eg the panel and the dashboard), when construct()
is initially called the browser remoteness is true
, meaning we create the compositor successfully, and then immediately afterwards finishChangeRemoteness()
is called to make it false. But with the broken page, it is false from the beginning.
This is getting really far outside my sphere of knowledge now.. perhaps somebody else might know:
a) Why we don't call this code (which leads to creating the compositor) when isRemoteBrowser
is false.
b) Why for the broken page isRemoteBrowser
is initially false, whereas for other pages it starts off as true and is later switched to false.
Comment 24•2 years ago
|
||
The reason isRemoteBrowser
is false for the broken page (which unlike the working pages is opened via GeckoViewNavigation.jsm) is because of this line.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 25•2 years ago
|
||
Spent some more time debugging this today trying to invalidate paint explicitly by both changing the opacity
and toggling the visibility in SetActive
. This did seem to remove the last frame from drawn from the previous tab, but it didn't repaint the new tab.
Assignee | ||
Comment 26•2 years ago
|
||
Spent some more time today trying to
Comment 27•2 years ago
|
||
Post sprint-planning: The question mark is for sprint planning to consider if we want to add it to the sprint or not. Removing the ? because we have agreed to continue working on this.
Comment 28•2 years ago
|
||
We'll need to fix this bug for Android Addons GA.
Updated•2 years ago
|
Description
•