Closed Bug 1656211 Opened 4 years ago Closed 4 years ago

Popup windows from toolbar draws incorrectly

Categories

(Core :: Graphics: WebRender, defect)

Desktop
Linux
defect

Tracking

()

VERIFIED FIXED
84 Branch
Tracking Status
firefox-esr78 --- disabled
firefox78 --- disabled
firefox79 --- disabled
firefox80 --- disabled
firefox81 --- disabled
firefox82 --- disabled
firefox83 --- disabled
firefox84 --- verified

People

(Reporter: aosmond, Assigned: aosmond)

References

(Regression)

Details

(Keywords: regression)

Attachments

(6 files, 1 obsolete file)

This happens to me on Ubuntu 20.04 with my AMD Juniper card, but seemingly not on Ubuntu 18.04 with my Haswell GPU. When the GPU process was disabled, this started happening. I'll see if I can isolate this further.

STR:

  1. Explicitly disable GPU process if on older builds.
  2. Go to a random website. I used https://soylentnews.org/.
  3. Click on the Firefox Account button in the toolbar. Almost always happens right away, but I click a few times just in case I don't see it.
  4. Observe issue.
Flags: needinfo?(aosmond)

If this only happens with WR this might have the same cause as bug 1650246

mozregression --good 2020-01-01 --bad 2020-07-30 --pref gfx.webrender.all:true layers.gpu-process.enabled:false

It aborted before it could finish but I got:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8f68705097b4bf88cd61b43b14401cde98ac75b6&tochange=855249e545c361516a65bcba8f5bc6b423e2d131

This is the GLX appearance of bug 1650246. So I assume disabling the GPU process only made it more likely to occur.

(Jan Andre Ikenmeyer [:darkspirit] from bug 1650246 comment 0)

With GLX, the top-left quarter of these menus is sometimes cut off (transparent), but with EGL they seem to be incorrectly sized until you hover them.

If it makes it easier to reproduce, I'll take it. I can make it happen very easily, although it is not my dev machine, which makes it slightly less convenient for manually rebuilding.... I am currently eyeing bug 1612440 as a candidate.

Of course, if it is just a race as nical speculates in the other bug, that might not be the most useful regression bug ever :).

I'm not quite sure if they are one and the same..... if I run with LIBGL_ALWAYS_SOFTWARE=1, I see the symptoms observed in that bug, partial drawing of the popup, and at the wrong origin, which fits with nical's assessment.

In my case, it looks like it is pulling something random out of the texture cache to fill in the upper left hand gap. It often looks different when I reproduce.

Gnome XWayland, Debian Testing, 2560x1440@60Hz, Intel HD Graphics 630 (KBL GT2)
Make sure that no input is focused, no caret is blinking, no text is highlighted and close the "Welcome to Nightly" message. Then click like crazy on the account menu button.
mozregression --good 2020-05-04 --bad 2020-05-07 --pref gfx.webrender.all:true layers.gpu-process.enabled:false

9:34.75 INFO: Last good revision: 74d50028caec9d5856a173c98a6172700f1ccc29
9:34.75 INFO: First bad revision: 6a43f985307516ec1ae2d413a39cd6d813560b8b
9:34.75 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=74d50028caec9d5856a173c98a6172700f1ccc29&tochange=6a43f985307516ec1ae2d413a39cd6d813560b8b

6a43f985307516ec1ae2d413a39cd6d813560b8b sotaro — Bug 1574746 - Remove AllowWebRenderForThisWindow() r=nical

This is just the commit which enabled WebRender for search results, autoscroll icon, toolbar menus like the account menu (oop webext panels used WebRender already before).

(In reply to Andrew Osmond [:aosmond] from comment #6)

In my case, it looks like it is pulling something random out of the texture cache to fill in the upper left hand gap. It often looks different when I reproduce.

If I open the account panel while something is moving or blinking, the "transparent" top-left quarter can be partly red and looks squished(?). It's more like a corrupted screenshot of the area behind it. Your screenshot has these red nuances, too.

That bug seems much more relevant. Let's see if I can reproduce that. Thanks darkspirit! I don't need to click like crazy but maybe I'm just lucky :). It almost always appears on the first click, and if not the first, then the second.

"Enabling WebRender for this widget type" must not be the cause. We have to find different STR to get to the bottom of it. ^^

bug 1650246 has an older sister that I saw on KDE long before: bug 1502519

Oh. I found out that it can also occur with the bookmark, library and main menus.
Attached screenshot, build from May:
mozregression --launch 2020-05-04 --pref gfx.webrender.all:true layers.gpu-process.enabled:false -a https://keithclark.co.uk/labs/css-fps/nojs/
While testing, the library menu could either
a) be fine,
b) be transparent with red corruption as in attached screenshot, or
c) cause all rendering to hang for a second, then show up, but be totally transparent on the left - almost like attached screenshot, but without the red glitch.

But in previous comment, the library menu is apparently Basic, not WebRender. gfx.webrender.debug.new-frame-indicator does not show up.
Edit: No it's OpenGL, layers.acceleration.draw-fps does show up.

mozregression --launch 2020-05-04 --pref gfx.webrender.all:true layers.gpu-process.enabled:false -a https://keithclark.co.uk/labs/css-fps/nojs/
While in one session it can be easier to reproduce, in others it's impossible.
Now I have successfully corrupted the About Nightly window (a WebRender window) on Gnome XWayland by repeatedly maximizing/unmaximizing it and changing the window size, which is bug 1502519, and shown that it's not restricted to KDE. These red glitches look similar to the ones in the library and account menus.
Comment 11 to comment 13 may be offtopic, I don't know, I just wanted to mention it.

Comment 13 is still reproducible with latest Nightly: Updated bug 1502519 with screencast.

No longer blocks: wr-80
Blocks: wr-81

@aosmond: reminder :)

No longer blocks: wr-81

Also happens with extension popup windows. Second video from me: https://youtu.be/_I1MsTzthiU

We are deferring Linux MVP another release; I haven't been able to prioritize this work but we can't ship without it fixed.

Blocks: gfx-83
No longer blocks: gfx-82

This one looks WR-specific; I can't reproduce it on the GL backend. It's also reproducible in debug builds, so it does not look like a race condition or some timing issue. I can reproduce it on the GLX backend only; the EGL backend is OK.

I used qapitrace to trace the textures/buffers used here. It looks like a deeper problem with popup rendering via WebRender. What we see on the screens here is just the backbuffer (perhaps ATI does not clear it).

The issue is that the popup is drawn in two or more steps: part of the popup is drawn in the first phase, then we swap buffers and draw the rest of the popup. Sometimes a part of the popup is missing and/or we see backbuffer content from the previous step.

This is not a problem with source textures - source texture with the popup content is created correctly.

The corruption happens at draw_frame/end_frame time, when the popup texture is rendered to the backbuffer. For instance, draw_frame/pass5/framebuffer/Composite clears and updates only part of the back buffer.

I'm afraid that's all I can help with here, as the Rust stuff is beyond my scope.
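The sequence described above can be illustrated with a toy model. This is only a sketch and not WebRender code: it assumes a two-buffer swap chain that flips on every swap, and the names `render_popup`, `steps` and `STALE` are hypothetical.

```python
# Toy model of a two-buffer swap chain; not WebRender code.
STALE = "?"  # stands for undefined or leftover buffer content


def render_popup(popup, steps, full_redraw):
    """Draw `popup` (a list of pixels) over several frames.

    `steps` lists, per frame, the pixel indices that became dirty.
    Returns what is on screen after each swap.
    """
    front = [STALE] * len(popup)
    back = [STALE] * len(popup)
    shown = []
    drawn_so_far = set()
    for dirty in steps:
        drawn_so_far.update(dirty)
        # A full redraw repaints everything known so far; a partial
        # redraw touches only the pixels dirtied in this frame.
        targets = drawn_so_far if full_redraw else dirty
        for i in targets:
            back[i] = popup[i]
        front, back = back, front  # swap buffers
        shown.append("".join(front))
    return shown
```

With `popup = list("ABCD")` and `steps = [[0, 1], [2, 3]]`, the partial path ends on `"??CD"` (the first half shows leftover content), while the full-redraw path ends on `"ABCD"`.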

Attached image: backbuffer content after first frame (deleted)

Backbuffer content after first frame of popup rendering on radeon drivers, taken from apitrace.

Attachment #9179825 - Attachment description: text.png → backbuffer content after first frame

(In reply to Martin Stránský [:stransky] from comment #22)

This one looks WR-specific; I can't reproduce it on the GL backend. It's also reproducible in debug builds, so it does not look like a race condition or some timing issue. I can reproduce it on the GLX backend only; the EGL backend is OK.

No longer blocks: gfx-83

Given we see it most often with GLX, I am exploring using GLX_EXT_buffer_age to decide whether we need to do a full render or not. It is possible there are multiple related problems since we already use EGL_EXT_buffer_age today... for the problem I reproduce regularly, I only see it with GLX.

Attached file: Bug 1656211 - Fix partial present paths with GLX. (obsolete) (deleted)

When swapping buffers with a GLX context, the back buffer may no longer
be valid. We can determine this by checking the buffer age via the
GLX_EXT_buffer_age extension, which is similar to the EGL_EXT_buffer_age
we use with EGL. If it is 0, then we must redraw the entire frame.
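
As a rough illustration of the buffer age check described above, here is a minimal Python sketch. It is not the patch itself: `damage_to_repaint` and its parameters are hypothetical, rects are plain `(x, y, w, h)` tuples, and the buffer age is assumed to follow the usual extension semantics (0 means undefined contents, n means the buffer holds the frame from n swaps ago).

```python
def damage_to_repaint(buffer_age, current_damage, previous_damage, full_rect):
    """Return the list of rects that must be repainted into the back buffer.

    previous_damage[0] is the damage of the previous frame,
    previous_damage[1] the frame before that, and so on.
    """
    if buffer_age <= 0:
        # Contents are undefined: redraw the entire frame.
        return [full_rect]
    missed = buffer_age - 1  # frames this buffer did not see
    if missed > len(previous_damage):
        # Not enough history to catch up: fall back to a full redraw.
        return [full_rect]
    rects = list(current_damage)
    for frame_damage in previous_damage[:missed]:
        rects.extend(frame_damage)
    return rects
```

An age of 1 needs only the current damage; an age of 2 or more also has to replay the damage of the frames the buffer missed.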

Assignee: nobody → aosmond
Status: NEW → ASSIGNED

Upon further examination and discussions with jnicol/kvark, I believe:

  1. The EGL implementation on Linux would also suffer from problems if the buffer age was 2+ (might explain why we see it less often there).
  2. Since partial present is disabled on Linux, it is supposed to be always doing a full redraw, but clearly it isn't.

So I have a few bugs to fix, which involves:

  1. Splitting out partial present support such that we can let WR know what areas it needs to redraw, even if the compositor itself doesn't support a set of damage rects when it does the swap
  2. Ensuring that if we don't support the above with buffer age, then we always do a full redraw.
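
The split in the two items above could look roughly like the following Python sketch. The names `CompositorCaps` and `plan_frame` are hypothetical and not taken from the Gecko or WebRender sources; `dirty_rects` is assumed to already account for the buffer age.

```python
from dataclasses import dataclass


@dataclass
class CompositorCaps:
    has_buffer_age: bool        # can we tell how old the back buffer is?
    can_swap_with_damage: bool  # does the swap call accept damage rects?


def plan_frame(caps, dirty_rects, full_rect):
    """Return (rects WR should redraw, rects for the swap call or None)."""
    if not caps.has_buffer_age:
        # Back buffer contents are unknown: always do a full redraw.
        return [full_rect], None
    # WR can redraw only the dirty areas even when the swap itself
    # cannot take a set of damage rects.
    swap_rects = list(dirty_rects) if caps.can_swap_with_damage else None
    return list(dirty_rects), swap_rects
```

Here `None` for the swap rects stands for a plain swap of the whole buffer.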
Flags: needinfo?(aosmond)
Attachment #9180723 - Attachment description: Bug 1656211 - Use GLX_EXT_buffer_age with GLXContext if supported. → Bug 1656211 - Fix partial present paths with GLX.

Similar to bug 1280653, it appears that GLX invalidates the back buffer
while we are drawing. The only indication we get of this are resize and
configure events from X. We suppress the configure event for popups for
various reasons, so this patch explicitly generates a forced recomposite
of the frame. It does it immediately so that most of the time it should
beat the presentation of the buffer and avoid displaying of the bad
frame to the user; popups generally are not complicated and should have
plenty of budget to perform the second composite.
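
The idea in this description can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not the GTK widget code: `ToyCompositor`, `ToyWindow` and `on_x_configure` are hypothetical names.

```python
class ToyCompositor:
    def __init__(self):
        self.log = []

    def composite(self, forced=False):
        self.log.append("forced composite" if forced else "composite")


class ToyWindow:
    def __init__(self, compositor, is_popup):
        self.compositor = compositor
        self.is_popup = is_popup
        self.delivered_configures = 0

    def on_x_configure(self):
        if self.is_popup:
            # The configure event is not forwarded for popups, so
            # recomposite right away to replace a frame that may have
            # been drawn into an invalidated back buffer.
            self.compositor.composite(forced=True)
        else:
            # Other windows get the configure event as usual.
            self.delivered_configures += 1
```

In this model a popup reacts to a configuration change with an immediate forced composite, while a regular window just receives the event.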

URL: 1280653

Gnome XWayland, Debian Testing, Intel HD Graphics 630 (KBL GT2), Mesa 20.1.9

If WebRender is enabled, I can reproduce this bug (top left quarter of the library menu is transparent) by clicking on the library button, then hovering the main menu button:
mozregression --launch 2020-10-19 --pref gfx.webrender.all:true -a about:blank
The bug does not occur if the GPU process is enabled.

With try build from comment 30 I am still able to reproduce it with WebRender (top left quarter is transparent) and OpenGL (top left quarter is colorfully corrupted after playing a video on YouTube):
mozregression --repo try --launch dd9b61a3d09f8f3dbaf7a885b2815e1696332d6e --pref gfx.webrender.force-disabled:true layers.acceleration.force-enabled:true -a about:blank

Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/99dcd4e9ee56 Force recompositing frames on GTK when popup window configuration changes. r=nical
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

I just tested the autoland build (changeset 99dcd4e9ee56) with mozregression and I still see this bug with the default settings (WebRender enabled and GPU process disabled), e.g. when I click on the library button. Am I missing something here?

Regressions: 1672565

I'm investigating the reports of this still happening. Managed to reproduce. It seems timing-related: if the forced composite comes in a bit later, it seems to avoid the problem.

Attachment #9180723 - Attachment is obsolete: true
Flags: qe-verify+

With Firefox 85 build 20201118215158 and EGL the top left corner transparency issue doesn't happen, but this issue still happens as before: https://youtu.be/jghvl0R31LQ

We are tracking this issue now in bug 1676164, please check if this is what you can observe.

I could reproduce this issue using old Nightly from 2020-07-30 using Ubuntu 18.04 with AMD HD 5450 gpu and gfx.webrender.all true and layers.gpu-process.enabled false. I also verified that using beta Firefox 84.0b3 this does not reproduce, although there still is a glimpse of bad rendering but only for a fraction of a second if clicking fast enough on toolbar tools but it recovers very fast. Is this acceptable in your opinion?

Flags: needinfo?(aosmond)

(In reply to Bogdan Maris [:bogdan_maris], Release Desktop QA from comment #40)

I could reproduce this issue using old Nightly from 2020-07-30 using Ubuntu 18.04 with AMD HD 5450 gpu and gfx.webrender.all true and layers.gpu-process.enabled false. I also verified that using beta Firefox 84.0b3 this does not reproduce, although there still is a glimpse of bad rendering but only for a fraction of a second if clicking fast enough on toolbar tools but it recovers very fast. Is this acceptable in your opinion?

Unfortunately yes. Ideally we would get rid of that flash too, but as this is a race condition with X (see bug 1280653 for a more in-depth analysis with the OpenGL compositor), it might not be trivial to do so.

Flags: needinfo?(aosmond)

(In reply to Andrew Osmond [:aosmond] from comment #41)

Unfortunately yes. Ideally we would get rid of that flash too, but as this is a race condition with X (see bug 1280653 for a more in-depth analysis with the OpenGL compositor), it might not be trivial to do so.

Got it, I'm going to close this bug as verified for now.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Has Regression Range: --- → yes