Closed Bug 1712969 Opened 3 years ago Closed 3 years ago

Partial present is glitchy

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- wontfix
firefox88 --- unaffected
firefox89 --- unaffected
firefox90 --- wontfix
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- fixed

People

(Reporter: heftig, Assigned: jnicol)

References

(Blocks 3 open bugs, Regression, )

Details

(Keywords: correctness, regression)

Attachments

(2 files)

I'm seeing glitches accountable to poor partial present.

Firefox Nightly, GNOME 40, Arch Linux
Dell XPS 13 7390 2-in-1 (3840x2400, scale 2)
MOZ_ENABLE_WAYLAND=1

To reproduce:

  1. Open https://c.eev.ee/doom-text-generator/
  2. If the "Scale" slider is not visible with the page scrolled to the top,
    make the window larger.
  3. Drag the "Scale" slider right and left repeatedly.
  4. As the text changes in size, observe glitches showing text where the
    empty background should be.

mozregression results:

2021-04-19: meh: frequently presents entire window, but shows no glitches

Last meh revision: 30af8f80e2754783f1981485dd911a2e341d9afd
First good revision: f8eb1926bfdbb630b4aa17e064c3472c4a32709e
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=30af8f80e2754783f1981485dd911a2e341d9afd&tochange=f8eb1926bfdbb630b4aa17e064c3472c4a32709e
Fixed by bug 1377445

2021-04-20: good
2021-04-21: good

Last good revision: 6c2c039872b3c49c22323e13473c84bab33ebb7c
First bad revision: 1859720a3225cf4d9560ccc240555f1e05dc3a76
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6c2c039872b3c49c22323e13473c84bab33ebb7c&tochange=1859720a3225cf4d9560ccc240555f1e05dc3a76
Regressed by bug 1706452

2021-04-22: bad: glitches as described above

The two "good" nightlies were only good because basic layers were being used.

I've rerun the regression using LD_PRELOAD=/usr/lib/libgtk-3.so mozregression --good 2021-04-19 --bad 2021-04-22 -a https://c.eev.ee/doom-text-generator/ to work around avoid bug 1706452.

Last good revision: 42c3b61f3db2df3cc5941a9e71875f06e35042f0
First bad revision: 2a384027ee9bcee63121a68d42c8db6644c8d8aa
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=42c3b61f3db2df3cc5941a9e71875f06e35042f0&tochange=2a384027ee9bcee63121a68d42c8db6644c8d8aa
Regressed by bug 1706488

Component: Widget: Gtk → Graphics: WebRender
Regressed by: 1706488
No longer regressed by: 1706452

Glenn, any idea why this could be? We share our partial present support with Android, so it might reproduce there as well.

Flags: needinfo?(gwatson)
Attached video bug.mp4 (deleted) —

It does reproduce on Android. Here's a video of Nightly on my Pixel 3 XL.

No specific ideas, it will need to be debugged with some logging of the dirty rects being supplied by WR.

Flags: needinfo?(gwatson)

This also affects Windows if you disable gfx.webrender.compositor. So I don't think it's to do with EGL partial present specifically.

On Mac the canvas gets drawn with a black background if you disable the OS compositor, so it doesn't reproduce there, but it's buggy in a different way.

Set release status flags based on info from the regressing bug 1706488

Severity: -- → S4
Priority: -- → P3
Depends on: wr-correctness, wr-non-os-compositor
No longer depends on: wayland
Keywords: correctness
OS: Linux → All
Hardware: Desktop → All
Summary: [wayland] Partial present is glitchy → Partial present is glitchy
Has Regression Range: --- → yes
Has STR: --- → yes

I've also seen rapid flashing with YouTube (when video playback stops and the throbber appears) and Google Meet (when the dimensions of the shared window change) that look like the same bug.

Does setting gfx.webrender.allow-partial-present-buffer-age to false prevent that flashing?

Flags: needinfo?(jan.steffens)

That does seem to help. Although it also causes the Doom Text Generator (from comment #0) to have a black canvas background that's not intended. PS: Never mind, I can't reproduce the black background.

Flags: needinfo?(jan.steffens)

(In reply to Jamie Nicol [:jnicol] from comment #8)

Does setting gfx.webrender.allow-partial-present-buffer-age to false prevent that flashing?

I'm able to reproduce comment 0 on EGL/Xwayland.
No, disabling gfx.webrender.allow-partial-present-buffer-age and restarting Nightly doesn't help.
The glitch goes away when scrolling down.
The canvas also had a black background once, then I pressed Ctrl+F5 and it was white again.

It did not flicker for me. There were just artifacts.

Default configuration: So far this bug has been shipped on Android and Wayland (Ubuntu, Fedora).
Now this bug is also default on X11 and Xwayland (bug 1695933).

Please have a look at duplicate bug 1731485 which has another small testcase.

  • GLX can be fixed with gfx.webrender.allow-partial-present-buffer-age:false or with gfx.webrender.max-partial-present-rects:0.
  • EGL can only be fixed with gfx.webrender.max-partial-present-rects:0.
Flags: needinfo?(jnicol)

(In reply to Jan Alexander Steffens [:heftig] from comment #9)

That does seem to help. Although it also causes the Doom Text Generator (from comment #0) to have a black canvas background that's not intended. PS: Never mind, I can't reproduce the black background.

I can reproduce the black background when Fission is enabled by Ctrl+Clicking the Reload button to open a copy of the tab.

mozregression --launch 2021-11-30 -a https://c.eev.ee/doom-text-generator/ --pref fission.autostart:true

This is independent of gfx.webrender.allow-partial-present-buffer-age, so it also reproduces with the following:

mozregression --launch 2021-11-30 -a https://c.eev.ee/doom-text-generator/ --pref fission.autostart:true gfx.webrender.allow-partial-present-buffer-age:false

PS: This is now bug 1743656.

Fedora 35 - Gnome 41.2 wayland - AMDGPU (mesa) - Webrender enabled (by distro package) - else default settings

HTML5 players (Youtube and other streaming sites) show a heavy flicker on every ~10th seek. The issue looks like this: https://streamable.com/item3i

Setting gfx.webrender.max-partial-present-rects:0 fixes the issue completly.

Not fixed by bug 1747857.

(In reply to Darkspirit from comment #14)

Please have a look at duplicate bug 1731485 which has another small testcase.

  • GLX can be fixed with gfx.webrender.allow-partial-present-buffer-age:false or with gfx.webrender.max-partial-present-rects:0.
  • EGL can only be fixed with gfx.webrender.max-partial-present-rects:0.

Thanks for that link. that test case is indeed a bit simpler.

What's happening here in either test case is that there is a canvas, which since bug 1706488 gets promoted to a compositor surface. For partial present (both buffer age and max-partial-present-rects/eglSwapBuffersWithDamage), we calculate the current frame's dirty rect by unioning the dirty rect of each tile, for all tiles (including promoted compositor surfaces as well as picture cache tiles). For compositor surfaces, the dirty rect is simply set to the full area of the tile, meaning a) we always render it in composite_simple(), and b) the OS compositor will composite it because we told it the area is dirty in eglSwapBuffersWithDamage.

However, in these test cases the problem is that the size of the canvas changes, meaning the size and therefore dirty rect of the compositor surface changes. When we calculate_dirty_rects() for the new frame, the rect only includes the area of the new canvas size, and not the area which the canvas used to occupy. This means we do indeed render the new area of the canvas correctly, but (when using buffer age) we do not render the area the canvas used to occupy, leaving stale content on the framebuffer. And (when using max-partial-present-rects) nor do we tell the OS compositor that the area has been updated, so it may skip compositing that area too. Hence why the bug reproduces with either buffer age, max-partial-present-rects, or both

It's quite trivial to fix this simply for compositor surfaces. We just need to keep track of the union of compositor surface rects for each frame. And then union the combined tiles dirty rect with the previous frame's compositor surface rects to calculate the dirty rect for the new frame. (Which will be a nop if the compositor surfaces have not moved.) This is easy because we always treat the entire area of the compositor surface as dirty.

However, this has me worried that it could go beyond just compositor surfaces.. Is it possible that picture cache slices/tiles can move, and therefore won't be included in the dirty rect calculation? I think this would possibly be more difficult to account for, as we track dirty subregions of picture cache tiles. Do you know of any defence we have against that situation, Glenn?

In any case, I guess we should land the fix for compositor surfaces first.

Flags: needinfo?(jnicol) → needinfo?(gwatson)

I'm not sure I completely follow the question, but I don't know the buffer-age code or requirements very well.

I know that we do skip partial present if any tiles have been added or removed, and this is a hack / workaround to deal with not knowing the previous state of that tile so we just skip partial in those cases (this is relatively rare compared to number of frames we draw where the tile config hasn't changed). There may be other cases where we set dirty_rects_are_valid to be false that are relevant here.

In any case, I would not be at all surprised if there are some similar issues with tiles as compositor surfaces. I guess, like you said, maybe we fix the compositor surface case first and then consider the content tile case and see if we can reproduce it.

Would it be helpful (and maybe get rid of the hack I mentioned above) if we had the complete state of tiles (position / size / dirty region) from last frame stored / made available? And then we could perhaps do better diffs to find the exact dirty rect? Or is that orthogonal to the problem here?

Flags: needinfo?(gwatson)

(In reply to Glenn Watson [:gw] from comment #20)

I know that we do skip partial present if any tiles have been added or removed, and this is a hack / workaround to deal with not knowing the previous state of that tile so we just skip partial in those cases

Interesting. hopefully this prevents the thing I was worried about happening from happening. Perhaps this hack can be extended to deal with the compositor surface case as well, or else I can just land the fix I already have. Will look in to it a bit.

In any case, I would not be at all surprised if there are some similar issues with tiles as compositor surfaces. I guess, like you said, maybe we fix the compositor surface case first and then consider the content tile case and see if we can reproduce it.

Sounds good.

Would it be helpful (and maybe get rid of the hack I mentioned above) if we had the complete state of tiles (position / size / dirty region) from last frame stored / made available? And then we could perhaps do better diffs to find the exact dirty rect? Or is that orthogonal to the problem here?

Yeah, sounds like that could potentially be used to fix this bug and remove the hack above. But it wouldn't be a prerequisite, we can fix this in the meantime without. Thanks Glenn!

When using webrender's draw compositor with partial present enabled,
we treat the entire area of all promoted compositor surfaces as
dirty. However, if a compositor surface is reduced in size this is
buggy - the new area of the surface will be treated as dirty, but
the additional area it used to occupy will not be. This means that
area may not be updated properly when compositing, leaving stale
content behind.

To fix this, keep track of the ItemUids of primitives that were
promoted to compositor surfaces for each sub slice. If the Uid
changes then set dirty_rects_are_valid to false, ensuring we do a
full-composite on the next frame.

This is perhaps overly conservative in some respects, but it solves
this correctness bug whilst at the same time still allowing partial
present for the common case where there are no changes to the number
or position of compositor surface and just their contents needs to be
updated.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Attachment #9267168 - Attachment description: Bug 1712969 - Don't use partial present if compositor surface prims change. r?gw → Bug 1712969 - Perform full composite if external surfaces are moved or shrunk r?gw
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0f88bce6d4bb Perform full composite if external surfaces are moved or shrunk r=gw
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: