Closed Bug 1588581 Opened 5 years ago Closed 4 years ago

Black Flash When Navigating Away From GV in Browser Fragment to other Fragment

Categories

(Core :: Graphics, defect, P1)

Unspecified
Android
defect

Tracking

()

RESOLVED MOVED
Tracking Status
firefox71 --- affected
firefox72 --- affected

People

(Reporter: ekager, Assigned: snorp)

References

Details

Reproducible in Fenix, RB, AC samples-browser

STR:

  1. (?) Have EngineView hosted in fragment layout
  2. Navigate to other fragment from Browser Fragment

Expected:
No black flash

Actual:
Black flash when navigating away

See slowed down gif of Fenix flash and more Fenix context

I think this may have to do with how the SurfaceView lifecycle is handled in fragments vs in activities maybe?

P1
snorp is investigating. He can reproduce in Fenix, but not in other apps.

Assignee: nobody → snorp
Priority: -- → P1

"Other apps" as in RB and samples-browser too? I see it in those as well when clicking into the tabs fragment but it's fast.

OS: All → Android
Whiteboard: [geckoview:m1911]

OK, I think I've figured out what's going on here. Currently, when the GeckView loses the Surface, we pause the compositor but also call GeckoSession.setActive(false). This in turn sets the DocShell inactive. I think this is causing us to also throw away layers. When the Surface comes back later, we synchronously resume the compositor, but it doesn't have anything to draw, resulting in a flash of blank content. I verified that removing the setActive(false) calls in GeckoView mitigated the problem.

Jeff, am I on the right track here? I didn't print the layer tree or anything. This is with traditional layers, not WR. If my guess is correct, do you think we could hold on to the layer tree for a while? Maybe only evict if we need to due to memory pressure?

Flags: needinfo?(jmuizelaar)

What your saying sounds plausable. I'm ni'd Jamie and Kris perhaps they can provide more useful input.

Flags: needinfo?(ktaeleman)
Flags: needinfo?(jnicol)

Yes, that sounds like another issue I'm looking into with WR enabled where when in about:support (or other parent process pages), switching away and back to the app results in a black screen.
I have to force a paint when PresShell becomes active again, else we end up with a black page until we scroll.

Bug 1581868 - Black page sometimes when restoring geckoview_example/fenix with webrender enabled

I'll take a closer look at this too in that context too as even with the repaint I still see a black flash when switching back to the app.

Flags: needinfo?(jnicol)
Flags: needinfo?(jmuizelaar)

Moving to the graphics component as this sounds

Component: General → Graphics
Product: GeckoView → Core
No longer blocks: wr-android-mvp
Flags: needinfo?(ktaeleman) → needinfo?(jnicol)

(In reply to Kris Taeleman (:kris/kris) from comment #8)

https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/gfx/webrender_bindings/src/bindings.rs#1400
-> Looks like this was encountered before

Well it's not like white is much better :)

Maybe I'm misunderstanding, but this bug seems to be about a black flash when navigating away from the browser, to the settings page or tab switcher. Comment 4 seems to be about resuming, ie switching back to the browser, away from the tab drawer or settings page.

I'm not convinced the black flash when navigating away is graphics' fault. We set everything inactive in GeckoView.Display.onSurfaceDestroyed(), which I presume happens when the SurfaceView is destroyed by navigating away from the browser fragment. No rendering can occur after this point. I'm not sure what we could do, when we don't have a surface to render in to. Isn't the problem that there is a delay between the SurfaceView being destroyed and the TabsFragment being created and shown? So the android UI displays black during this time.

FWIW I can no longer reproduce the behaviour in the screen recording on Fenix. However, I can easily still do so in Samples Browser. In the original fenix fix, as well as taking a screenshot and setting swipeRefresh?.background to display it, also set engineView.asView().visibility = View.GONE. Because the screenshot code wasn't yet implemented in webrender, this was removed, but we continued to set the visibility to GONE. I guess this means we get a white flash rather than a black flash, which is a big improvement, but better still would be if the SurfaceView wasn't destroyed until the next fragment is ready?

Flags: needinfo?(jnicol)
Whiteboard: [geckoview:m1911]

There was a small bandaid added on the Fenix side that switched the flash color to be black or white, to match the Light/Dark theme on Fenix.

Jamie, do you think the problem is on the Fenix side? If so, we can investigate that further, but the repro steps have changed slightly (and I've updated the comment 0 to clear up confusion).

Flags: needinfo?(jnicol)

I think it might be, yes. I'm certainly not 100% sure, as the interaction between Fenix/Android, geckoview, and graphics code here is complicated and I definitely don't understand it fully. But it seems to me that graphics code can't do anything after GeckoSession.onSurfaceDestroyed() (as we no longer have a target to render in to), and that the problem is that is being called too early, before the tab switcher / settings menu fragment is visible.

Perhaps bug 1594860 will fix this longer-term, however, by not using a SurfaceView.

Flags: needinfo?(jnicol)

It looks like https://bugzilla.mozilla.org/show_bug.cgi?id=1547737 should have fixed this. Unfortunately, we are now running into this bug on webrender devices:

When a captureThumbnail is requested while the engine is still doing an initial page load (engineView is still completely white) the captureThumbnail never goes inside its block. For example, in this code snippet, "captureThumbnail" never gets printed

This leads to the home button press being ignored in Fenix. A demo GIF of this behavior can be seen here

 engineView.captureThumbnail { bitmap ->
            Log.d("Sawyer", "captureThumbnail")
        }

Any ideas what's going on here? Should I reopen https://bugzilla.mozilla.org/show_bug.cgi?id=1547737 or is this a different issue?

Flags: needinfo?(snorp)
Flags: needinfo?(jnicol)
Flags: needinfo?(cpeterson)

That sounds like a bug in the webrender implementation of RecvRequestScreenPixels. I believe it works by reading the pixels from the next composite, and I guess because the page is loading, there isn't a composite scheduled.

Please file a new bug about that.

I really don't think we want to reintroduce Fenix bug #1901 as a workaround for this flash, however. Reading pixels from the GPU is unavoidably slow. Delaying opening a new screen until that completes will be laggy. Do we understand why this flash occurs in the first place? Why is the surface destroyed before the new fragment is visible?

Flags: needinfo?(jnicol)

You can see more of my investigation here: https://github.com/mozilla-mobile/fenix/issues/1901#issuecomment-575263206. Basically the destroying of the SurfaceView is handled by the Navigation Library (a fragment gets destroyed while the transition is happening). We cannot force the fragment or view to stick around longer as its not our code (you can see I tried to do this in one of those comments).

I agree screenshotting is not an ideal solution, but it seems to be the only one that actually works until we move away from SurfaceView which is known by Google to have this bug.

Ah, great work figuring that out. This is an unfortunate situation :(

Okay, I can fix the RecvRequestScreenPixels during page load issue. That needs fixed in any case.

Flags: needinfo?(cpeterson)

Do you have any updates on this, Jamie? :)

Flags: needinfo?(jnicol)

Are you sure you only see this behaviour with webrender enabled? I can reproduce with webrender enabled or disabled (after applying your pull-request to my local fenix build).

Adding some logging to geckoview and graphics code, not even the geckoview functions are being called while the page is being loaded, so the request is being lost between Fenix and GeckoView. Looking at the android-components source I would guess that it is because of this - firstContentfulPaint will be false during page load.

I'll try to build fenix with a local a-c with that statement removed, and see if it works.

Flags: needinfo?(jnicol)

Ah, that seems like a good lead! Sorry for taking us down the wrong path here 😅

Flags: needinfo?(snorp)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → MOVED
You need to log in before you can comment on or make changes to this bug.