Open Bug 1617498 (WR-linux-wayland-compositing) Opened 4 years ago Updated 10 months ago

[meta] WR Wayland Compositing

Categories: Core :: Graphics: WebRender (enhancement, P3)
Hardware: Desktop
OS: Linux
Type: enhancement
Status: ASSIGNED
People: Reporter: gw; Assigned: rmader
References: Depends on 17 open bugs; Blocks 2 open bugs
Keywords: meta

WebRender has a trait that can be implemented by Gecko which allows all rendering to occur in native compositor surfaces [1].

On Windows, we render directly into DirectComposition surfaces, while on Mac we render directly into CoreAnimation surfaces. It would be great if we could also do this on Linux, when supported by the underlying windowing system.

The advantage is that WebRender no longer composites the set of picture cache slices into a single buffer before handing to the OS. Instead, the OS compositor is able to composite the picture cache slices directly. This can result in significant performance and battery improvements. We're also able to support compositing video directly to a native compositor surface, which can provide further performance and power savings (this work is being tracked in [2]).

I don't believe this is feasible on X11, since there's no way that I'm aware of to draw into surface tiles with the GPU, and composite them with a single atomic transaction (if there is a way, please let me know!).

However, I believe that Wayland supports everything we need, as long as the wp_viewporter [3] or a similar extension is supported. WebRender needs this in order to support clipping of the Wayland subsurfaces that the picture cache tiles would be rasterized into. It appears that this extension is available in GNOME [4] and also KWin / Plasma [5].
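To make the clipping requirement concrete, here is a minimal Python sketch (not WebRender's actual code) of the geometry a Wayland backend would need for one picture cache tile: given the tile's rectangle and a clip rectangle in the parent surface's coordinates, it computes the arguments for wp_viewport.set_source (crop, sent as wl_fixed values), wp_viewport.set_destination (size) and wl_subsurface.set_position. The `Rect` type and the `clip_tile` and `to_wl_fixed` helpers are hypothetical, and the sketch assumes a buffer scale of 1 and no buffer transform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical helper types for illustration; not part of WebRender or any Wayland binding.
Fixed4 = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in parent-surface-local pixels."""
    x: int
    y: int
    w: int
    h: int

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        x0 = max(self.x, other.x)
        y0 = max(self.y, other.y)
        x1 = min(self.x + self.w, other.x + other.w)
        y1 = min(self.y + self.h, other.y + other.h)
        if x1 <= x0 or y1 <= y0:
            return None  # nothing visible; the tile subsurface could be hidden
        return Rect(x0, y0, x1 - x0, y1 - y0)


def to_wl_fixed(value: float) -> int:
    """Encode a number as wl_fixed_t (signed 24.8 fixed point)."""
    return int(round(value * 256))


def clip_tile(tile: Rect, clip: Rect) -> Optional[Tuple[Fixed4, Tuple[int, int], Tuple[int, int]]]:
    """Compute viewport/subsurface state for a tile clipped by `clip`.

    Returns (source, destination, position):
      source      -> wp_viewport.set_source(x, y, width, height), as wl_fixed
      destination -> wp_viewport.set_destination(width, height)
      position    -> wl_subsurface.set_position(x, y)
    Assumes buffer scale 1 and no buffer transform, so buffer pixels and
    surface-local units coincide.
    """
    visible = tile.intersect(clip)
    if visible is None:
        return None
    # Crop: the visible part, relative to the tile's own buffer origin.
    src = (visible.x - tile.x, visible.y - tile.y, visible.w, visible.h)
    source = (to_wl_fixed(src[0]), to_wl_fixed(src[1]),
              to_wl_fixed(src[2]), to_wl_fixed(src[3]))
    # No scaling: the destination size equals the cropped size.
    destination = (visible.w, visible.h)
    # Move the subsurface so the cropped content stays where it was on screen.
    position = (visible.x, visible.y)
    return source, destination, position


# A 256x256 tile at the origin, clipped by a scroll frame starting at y = 100.
print(clip_tile(Rect(0, 0, 256, 256), Rect(0, 100, 1024, 600)))
```

In principle, a change that only moves the clip (e.g. scrolling the clip edge across a tile) then needs new viewport and position state but no re-rasterized buffer.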

[1] https://searchfox.org/mozilla-central/rev/a37fc61f172b432e7ae0b6b4c4a12cac2a787a0f/gfx/wr/webrender/src/composite.rs#451

[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1579235

[3] https://cgit.freedesktop.org/wayland/wayland-protocols/tree/stable/viewporter/viewporter.xml

[4] https://gitlab.gnome.org/GNOME/mutter/issues/132

[5] https://phabricator.kde.org/D26171

CCing a few people that might be interested in this work.

That can be done on Wayland by rendering to dmabuf, as is already implemented for WebGL (Bug 1586696). Cross-process fence synchronization is also available (Bug 1614568).

Regarding viewporter support in GNOME [4] and KWin / Plasma [5]: Weston also supports it well.

Author of the GNOME viewporter implementation here. I wouldn't be surprised if you run into bugs in Mutter when using subsurfaces in such an advanced way (we don't have any clients doing that yet). It's great to see this, and I'll be following this bug closely. Feel free to ping me any time.

Great, thanks Robert! We shouldn't need any cross-process synchronization for this case, I think - all surface allocation and rasterization occurs inside the GPU process.

Do we have a ticket for the GPU process on Wayland?

I believe GPU process is enabled on Linux now by default on nightly? I'm not sure if that's different when using Wayland?

Even without a dedicated GPU process, WR still runs in a single process as far as allocation and rasterization are concerned.

(In reply to Glenn Watson [:gw] from comment #6)

I believe GPU process is enabled on Linux now by default on nightly? I'm not sure if that's different when using Wayland?

Wayland does not use the GPU process. It's disabled because Wayland can't share plain surfaces/windows across processes; it can only share the underlying GPU memory (via dmabuf), which can be mapped to an EGLImage/framebuffer in different processes.

Priority: -- → P3
OS: Unspecified → Linux
Hardware: Unspecified → Desktop

Side note: the upcoming Sway version will have viewport support, too.

Sway 1.5 with viewporter support is out.

Using wl-viewports would apparently allow us to scale videos more efficiently. YUV conversion in the compositor is not mandatory in Wayland - the Mutter tracking bug for that is here: https://gitlab.gnome.org/GNOME/mutter/-/issues/1366 (hopefully available around 3.40 if everything works out).

Yes - there are patches in progress for WR to make use of native OS compositor transforms where available to scale videos efficiently in the compositor / hardware (see https://phabricator.services.mozilla.com/D84328). We can make use of the viewport scaling functionality in wayland to achieve the same efficiency savings here as with DirectComposition and CoreAnimation.
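As a rough illustration of what viewport scaling means for video, the following Python sketch computes the wp_viewport.set_destination size and wl_subsurface.set_position needed to letterbox a decoded frame into a target rectangle, leaving the actual scaling to the compositor (or a hardware plane). `fit_video` is a hypothetical helper, not code from the linked patches, and it assumes buffer scale 1, no buffer transform and that the whole frame is shown (source rectangle left unset).

```python
from fractions import Fraction
from typing import Tuple


def fit_video(frame_w: int, frame_h: int,
              box_x: int, box_y: int, box_w: int, box_h: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Letterbox a frame_w x frame_h video buffer into a target box.

    Returns (destination, position):
      destination -> wp_viewport.set_destination(width, height)
      position    -> wl_subsurface.set_position(x, y)
    The source rectangle is left unset, so the whole buffer is used, and the
    compositor (or a hardware plane) does the actual scaling.
    Hypothetical helper; assumes buffer scale 1 and no buffer transform.
    """
    # Largest uniform scale factor that still fits inside the box (exact arithmetic).
    scale = min(Fraction(box_w, frame_w), Fraction(box_h, frame_h))
    dst_w = max(1, int(frame_w * scale))  # set_destination needs positive sizes
    dst_h = max(1, int(frame_h * scale))
    # Center the scaled video inside the box (letterboxing / pillarboxing).
    pos_x = box_x + (box_w - dst_w) // 2
    pos_y = box_y + (box_h - dst_h) // 2
    return (dst_w, dst_h), (pos_x, pos_y)


# A 1280x720 frame shown in a 1920x1200 area: scaled by 1.5 and centered vertically.
print(fit_video(1280, 720, 0, 0, 1920, 1200))
```

The client keeps submitting the frame at its native resolution; only the small viewport state changes when the target size changes.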

Depends on: 1668805
Assignee: nobody → robert.mader
Status: NEW → ASSIGNED
Alias: WR-linux-wayland-compositing
Summary: Implement WebRender native compositor trait for Wayland → [meta] WR Windows Compositing
Summary: [meta] WR Windows Compositing → [meta] WR Wayland Compositing
Depends on: 1695500
Depends on: 1697673
Depends on: 1699754
Depends on: 1699985

Status update: the example compositor now works quite well and can be tested (see bug 1695500). So far Weston is the only compositor able to run it properly - compositor bugs are tracked in bug 1699754.

My main takeaways from implementing the Wayland backend for the example compositor are:
1: Wayland seems to offer everything needed to map the features used on other platforms
2: We may want to use Wayland APIs directly instead of using the EGL-Wayland platform in order to have more control over buffers etc.

The second point is something for later, once the basic functionality is in place. However, it may make sense to create a small library for it, so it can be reused by other projects that want to do similar compositor integration.

Depends on: 1700151
Depends on: 1700684
Depends on: 1707202
Depends on: 1711214
Depends on: 1711224
Depends on: 1711244
Depends on: 1711461
Depends on: 1712472
Depends on: 1713202
Depends on: 1714326
Depends on: 1714771
Depends on: 1716006

A little status update: after the latest round of patches, things seem to run quite stably for me. So I think this is now dogfoodable, and if you run a recent GNOME (40.1/3.38.5) or KDE (5.22), you're invited to give it a try. Simply enable gfx.webrender.compositor.force-enabled on the latest Nightly (of course, you also need to run with MOZ_ENABLE_WAYLAND=1).

Depends on: 1716044
Depends on: 1716108

I did some (not very scientific) performance profiling on my Thinkpad T460p (Skylake). What immediately stands out is that GPU utilization is heavily reduced when, e.g., scrolling a static page. I tested this with intel_gpu_top: both the reported utilization and the frequencies drop by about 30%, while RC6 time increases by about 10%. This is on a Full HD screen; on 4K I'd expect even bigger differences. Reducing GPU overhead is the central idea behind this effort, so it's nice to see that it works out.

CPU-wise, FF seems to consume about the same, but GNOME Shell consumes about twice as much CPU time as usual (still far less than FF). It is somewhat expected that we trade GPU time for CPU time to some extent. However, I think there's quite a bit of optimization potential, both in how FF uses the Wayland protocol and in GNOME Shell's implementation.

Power-consumption-wise, I haven't spotted a significant difference on my machine yet. Apparently the lower GPU frequency is offset by the extra CPU time, or other factors keep the package (I have an integrated Intel GPU) from powering down. This finding is a bit disappointing, as saving energy is ultimately the main goal of the whole effort.

Note that I only looked for very obvious and easy-to-spot differences, nothing below a safe 10% change. Other hardware may also be affected differently, and this was only for HW-WR, not SW-WR.

Robert, I have a 4K display running off Intel UHD 620 graphics (Whiskey Lake). Do you know of a good (scientific) profiling utility for GNOME/Fedora so I could do some testing? Perhaps there's a way of logging intel_gpu_top output to a file.

I see in this blog that macOS has a tool to show the area being repainted. Are you aware of such a tool on Linux/Wayland?

Depends on: 1717902

Hi Vincent. I created bug 1717902 for discussions and findings around performance and profiling; let's continue there.

Depends on: 1718569
Depends on: 1718570
Depends on: 1720375
Depends on: 1718688

After bug 1718570 landed I now consider the compositor backend to be on feature parity with the default one. To my knowledge, there's no broken feature (I previously worried about e.g. screenshots, but they work) - and in many situations the compositor backend is already much faster. So while there is outstanding performance work and potentially some bugs will get discovered, we are getting closer to the point where we can enable compositor integration by default - at least for a subset of users using recent versions of their compositors.

@rmader sorry for asking in such a random place, but on my system (Arch Linux, GNOME Wayland, the 2021-07-11 Nightly, AMD GPU), with the compositor enabled I sometimes get rectangular parts of the window flickering with portions from another tab. I don't get along very well with the Bugzilla search, so if that's a known issue, can you please point me to it? Otherwise I'll try to update and file a bug.

(In reply to Laurențiu Nicola from comment #18)

@rmader sorry for asking in such a random place, but on my system (Arch Linux, GNOME Wayland, the 2021-07-11 Nightly, AMD GPU), with the compositor enabled I sometimes get rectangular parts of the window flickering with portions from another tab. I don't get along very well with the Bugzilla search, so if that's a known issue, can you please point me to it? Otherwise I'll try to update and file a bug.

No worries, this probably affected all users until bug 1718570 landed - so thanks for asking.
Despite its title about partial damage (thus better performance), its main achievement was actually to give much better guarantees about correctness. So if you update nightly to the latest version, my expectation would be that what you describe should not happen any more - buffer content should now always be correct (minus Webrender, system compositor or driver bugs of course). If you still see such issues please file a new bug blocking this one.

Depends on: 1720850
Depends on: 1720874
No longer depends on: 1720874
Depends on: 1721036
Depends on: 1721298
Depends on: 1723012
Depends on: 1723940

Hello Robert, what's the status of this feature? Should it be enabled by default, or do we need to test it somehow?
It may be possible to run the test suite with the compositor enabled to compare results; for instance, I use this locally:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest dom/base/test --setpref widget.wayland.test-workarounds.enabled=true --enable-webrender

or, for the longer version:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest dom --setpref widget.wayland.test-workarounds.enabled=true --enable-webrender

You can use --setpref to enable the feature.

Flags: needinfo?(robert.mader)
Depends on: 1725371

(In reply to Martin Stránský [:stransky] (ni? me) from comment #20)

Hello Robert, what's the status of this feature? Should it be enabled by default, or do we need to test it somehow?

I think it's quite close to ready on the FF side, but it uncovered a lot of bugs in compositors (some of them are listed in bug 1699754), and it will still take some time until most or all of them are fixed and have reached users. The good thing is that this will also benefit other applications that try to do similar things. Opened bug 1725372 to track things.

Flags: needinfo?(robert.mader)
Depends on: 1726807
Depends on: 1726954
Depends on: 1725368
Depends on: 1727936
Depends on: 1729233
Depends on: 1729613
Depends on: 1731450
Depends on: 1732051
Depends on: 1735494
Depends on: 1735560
Depends on: 1736205
Depends on: 1737821
Depends on: 1741081
Depends on: 1742990
Depends on: 1743631

On a Gemini Lake (Linux 5.16 and latest mesa git-master) system with Plasma/KWin 5.23.90 and 5.23 Wayland, this seems to be counter-productive:
With gfx.webrender.compositor and gfx.webrender.compositor.force-enabled both set to false, SoC power consumption while watching YouTube 720p 60fps VP9 via VA-API is ~4.4W. With both set to true, it's ~5.2W (double-checked, with playback long enough to rule out additional load from buffering etc.). Also, scrolling light websites stutters more with it enabled.
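For scale, the two measurements above amount to roughly an 18% increase in SoC power with the compositor enabled:

$$\frac{5.2\,\mathrm{W} - 4.4\,\mathrm{W}}{4.4\,\mathrm{W}} = \frac{0.8}{4.4} \approx 0.18$$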

Rather vital information I forgot to mention: the Firefox version used was 97.0b3.

Interesting, thanks for sharing! Note: I opened bug 1717902 for performance measurements as this is now a meta bug. For me it would be great to know where that energy is spent: on the CPU or GPU (this backend generally trades less GPU time for slightly more CPU time).

I'd expect video playback to be slightly better (usually one less copy, as long as scanout doesn't kick in, which is more likely when using the default EGL backend; see bug 1743631); however, real differences should only show up once bug 1711461 is implemented. As for scrolling: this is where I'd expect this backend to be much better. However, as it moves a lot of work into the Wayland compositor, performance also depends on the compositor being optimized for this use case. AFAIK this is the first and still the only client to do this to such an extent, so I don't expect Wayland compositor devs to care that much (apart from GNOME, where I'm a dev myself).

Depends on: 1750373

CPU load and CPU core power consumption seem to be unchanged. However, intel_gpu_top reports roughly twice as high GPU load with WR compositor enabled vs. disabled and higher GPU power consumption accordingly.

I can give Sway (latest git-master) a try. I could also give GNOME a try, though, slightly off-topic, it slows down that particular low-end device too much, and there are continuous frame drops during playback with mpv etc. I suspect some latency reduction is active that works too aggressively by default for such a slow GPU. Just a shot in the dark, but that's also the case with KWin's latency reduction (which can be set to a less aggressive value in the UI). Might be worth a bug report (I can file one if you think it would help). Sway also has latency reduction, but it's disabled by default. Yet I found even the values it suggests as safe too aggressive, also with a faster dedicated GPU (frame drops in games with high GPU load).

intel_gpu_top reports roughly twice as high GPU load with WR compositor enabled vs. disabled and higher GPU power consumption accordingly.

To me that sounds like missing optimizations regarding opaque regions and subsurfaces in Kwin. Things should look quite different on Gnome and, more importantly, in theory (on a perfect compositor).
Regarding low end devices: I also test this on an old Thinkpad T400 and get quite good results. It was also reported that this improves performance on e.g. the Pinephone. That was on Gnome (which has dynamic latency reduction based on measurements) and Weston (which like Gnome should have proper optimizations for subsurfaces in place) though. Kwin and Sway are the compositors I know least about.

Anyway, let's continue any performance-related conversation either in bug 1717902 or in a new bug for compositor-specific issues (such as "Higher GPU utilization on KWin" / "Performance on KWin"). From your report, the latter sounds like a good idea.

Depends on: 1750443
Depends on: 1750457

I think that bug 1747481 should block this bug. For me, it occurs so often that firefox is unusable with the wayland compositor force enabled, but never occurs without it and therefore I thought it was clearly related. Sorry if this is not as clear as it seems to me.

Depends on: 1747481
Depends on: 1752469

For all interested parties: it may turn out that the approach here is a dead end with regard to the future development of Wayland. Most importantly, offloading composition to Wayland compositors may turn out not to be efficient in an HDR world. Doing composition within Firefox and relying on direct scanout by the Wayland compositor may be a better approach, so the work here will stay experimental for the foreseeable future. See https://gitlab.freedesktop.org/pq/color-and-hdr/-/issues/6 for more information.

Depends on: 1752678
Depends on: 1761927
Depends on: 1767795
Depends on: 1770404
Depends on: 1775002
Depends on: 1786064
Depends on: 1791156
Severity: normal → S3
Depends on: 1826789
Depends on: 1828323