Closed Bug 1581868 Opened 5 years ago Closed 5 years ago

Black page sometimes when restoring geckoview_example/fenix with webrender enabled

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox73 --- fixed
firefox74 --- fixed

People

(Reporter: jnicol, Assigned: jnicol)

References

Details

Attachments

(2 files)

My initial guess is that this is somewhat related to bug 1580153 - that we are calling NotifyFirstPaint() too early, so are uncovering the GeckoView before there is any content to display.

The priority flag is not set for this bug.
:jbonisteel, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jbonisteel)
Priority: -- → P3
Flags: needinfo?(jbonisteel)

Hey Jamie - do you have reliable STR for this?

Flags: needinfo?(jnicol)

Yes: open about:support, minimise app, restore app.

It 100% reproduces on about: pages. I'm not sure if it ever reproduces elsewhere.

Flags: needinfo?(jnicol)

This occurs on other pages as well, but it does not reproduce as a black page. Instead it feels like a screenshot or a previous framebuffer is blended with the new page.
Example:
https://news.ycombinator.com/item?id=21135424
go to ~ mid page, switch away from app, switch back and immediately scroll up.

This is no longer reproducible, so I'm assuming this was unrelated to this issue.

Been looking into this and my current lead is that in case of about:page vs regular pages ClearCachedResources is not being called. Due to this, AsyncImagePipelineManager::SetWillGenerateFrameAllRenderRoots() will not be called, causing AsyncImagePipelineManager::WillGenerateFrame to stay false for the default render root. This in turn causes WebRenderBridgeParent::MaybeGenerateFrame to early out and skip rendering.

In case of regular pages running in a content process, ClearCachedResources is called by BrowserHost::SetDocShellIsActive triggered by GeckoViewContent.jsm GeckoView:SetActive.

I'm still in the process of investigating where the corresponding code for pages running in the parent process should live.

Pushed by ktaeleman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/01679b21503d Black page sometimes when restoring geckoview_example/fenix with webrender enabled. r=jnicol
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

The fix seems to have stopped working and original issue is back :/

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: wr-73-android
No longer blocks: wr-android-mvp

Been looking in to this. I agree with the assessment that in WebRenderBridgeParent::MaybeGenerateFrame(), mAsyncImageManager->GetAndResetWillGenerateFrame() is returning false for about: pages but true for other pages. I'm unsure why your patch doesn't force it to be true for subsequent composites. But in any case it won't happen in time for the initial composite, which is forced by the resumption of the app. (Because the PresShell paint will occur after that.) I think we probably want to generate the frame on that initial composite.

For the non-about: pages case, ClearCachedResources seems to be called while minimising the app, so we should probably do it then for about: pages too (haven't yet figured out why we don't).

But maybe we shouldn't be relying on AsyncImagePipelineManager for this. I'm guessing it's intended to tell us when we need to generate a frame because of async images (ie videos) requiring a new frame. So abusing it here might be hacky. Maybe we should detect when a composition is due to an app-resumption, and explicitly force generate a frame in that case. Not sure.

I'll keep looking in to this.

Actually, looking at where else SetWillGenerateFrame is called from, it seems like it's not just for async-images, but just generally whenever we need to generate a new frame. So using it should be fine. So I'll investigate why we're not clearing cached resources when minimising the app for about:pages.

Okay, so like I said in comment 10, we definitely want to force generate a frame for the composite which happens as part of the resume, rather than waiting for the next paint. Calling ScheduleGenerateFrameAllRenderRoots() in WebRenderBridgeParent::Resume() would do so, however, even that is not enough. We do indeed generate a frame, but webrender may decide to skip rendering it because it thinks nothing has changed. What we need is ScheduleForcedGenerateFrame(), which first sends an invalidate frame message to webrender, before ensuring we generate a new frame, then webrender will decide to definitely render it because of the invalidation message. This also happens to fix bug 1606179.

The next problem is that we now hit a race condition to do with the android surface size. Here is the java callback which fires when we resume. mCompositor.syncResumeResizeCompositor() synchronously calls the code discussed above which schedules a new frame, however this frame is then built asynchronously and will eventually be rendered on the render thread. The onWindowBoundsChanged() call on the other hand will asynchronously call in to the widget code to set the new widget size. In the render thread when we go to render the frame, we read the widget size and render the frame at that size. So there is a race condition between the widget size being updated and the frame being rendered. We are more likely to hit this now that we schedule a frame immediately on resume, however the problem was pre-existing.

This race condition was solved in layers by ignoring the widget's size variable, and instead passing the new size from the CompositorBridgeParent to the CompositorOGL. In webrender, the data path between CompositorBridgeParent and RendererOGL is trickier, so for now it will be easier to use ANativeWindow_getWidth/Height() to query the size (again, just ignoring the widget's size variable).

As Sotaro pointed out in Slack, long term we won't be able to use ANativeWindow_getWidth/Height(), as bug 1594860 will make us stop rendering in to an Android Surface. However, we can cross that bridge when we come to it.

Patch incoming

On webrender on android, parent-process pages (eg about:support) were not being
rendered immediately after minimising then resuming the app, resulting in a
black screen. The problem was that webrender believed the previous frame was
still valid, and therefore that it did not need to render a new
one. Content-process pages were unnaffected because we clear cached resources
when the app is minimised, so we accidentally rendered a new frame on
resumption.

To fix this we always force a new frame to be rendered immediately on
resumption. This uncovers a race condition which causes us to sometimes render
frames at the wrong size when the window size has changed (for example when
showing or hiding the keyboard), so that is also fixed.

This also fixes a bug affecting fenix, where when opening a page in a new tab
the portion of the screen where the keyboard used to be would remain black until
the page had loaded. This no longer occurs because we force a composite as soon
as the keyboard is hidden.

Additionally, this patch reverts the original attempt at fixing this
bug, as it is not necessary.

Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/467a138c74bc Make webrender schedule a frame immediately on resume. r=sotaro
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla73 → mozilla74

status-firefox73: affected → wontfix

In comment #12 it was said:

This also happens to fix bug 1606179.

And bug 1606179 is the fenix issue https://github.com/mozilla-mobile/fenix/issues/7338. There was said:

this is the last major issues that WR team is aware of, and are actively working on it. Source of issue has been found by Jamie, and so it's in-progress.

This will have to get into 73 beta.

Flags: needinfo?(ryanvm)

Yeah, we should definitely uplift this to beta. I was planning to let it sit on nightly for a few more days to get some more testing first, then will request an uplift.

Thanks for the info.

Flags: needinfo?(ryanvm)
Blocks: 1606179

Comment on attachment 9119799 [details]
Bug 1581868 - Make webrender schedule a frame immediately on resume. r?sotaro

Beta/Release Uplift Approval Request

  • User impact if declined: * Black page sometimes when restoring Fenix, until next scroll
  • Ugly black gap where keyboard used to be when loading a new page in Fenix
  • Page sometimes "jumping around" when showing/hiding the keyboard in Fenix
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small change, just forces an extra render (using existing code paths) on window resize. Only affects webrender on android.
  • String changes made/needed:
Attachment #9119799 - Flags: approval-mozilla-beta?

Comment on attachment 9119799 [details]
Bug 1581868 - Make webrender schedule a frame immediately on resume. r?sotaro

Needed for Fenix WebRender testing. Approved for GV73.

Attachment #9119799 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hi, verified the issue with Google Pixel 2 (Android 9) and Samsung Galaxy S9 (Android 8.0.0).

Google Pixel 2 (Android 9) - webrender enabled - default setting and with setting gfx.webrender.all set to "true" and Samsung Galaxy S9 (Android 8.0.0) with gfx.webrender.all set to true
Verified through:

  • geckview_example.apk from Comment21 with GV: 73.0b5 build ID: 20200114190416
  • Firefox Preview Nightly 74.0a1 200115 (Build #20150607) GV: 74.0a1-20200114094410
    Verified with the about pages, minimised the app and restored app:
  • about:support, about:about, about:buildconfig, about:config, about:plugins, about:memory, about:webrtc

Notes:

  • With Samsung Galaxy S9 (Android 8.0.0) I can partially reproduce the issue on both mentioned builds, the black screens stays only for a couple of seconds with the about:support page - please check video.
    Can it be somehow improved?
Flags: needinfo?(jnicol)

Hi Diana, thank you for testing.

Do you only see the black screen for a second on the Galaxy S9, but not on the Pixel 2?

Flags: needinfo?(jnicol)

Hi, yes, on Galaxy i can see the black screen for a little. On Pixel 2 the scheme is immediate.

Flags: needinfo?(jnicol)

Great. Would you be able to file a follow up bug about that please? And could you attach the about:support from the Galaxy to that bug. Thanks!

Flags: needinfo?(jnicol)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: