Closed Bug 1773596 Opened 2 years ago Closed 2 years ago

Investigate switching away from DirectComposition VirtualSurfaces

Categories

(Core :: Graphics: WebRender, task)

Tracking

RESOLVED FIXED
109 Branch
Tracking Status
firefox109 --- fixed

People

(Reporter: jrmuizel, Assigned: ahale)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 1 obsolete file)

Edge used them but Chrome doesn't. We have some evidence to suggest that using them causes increased power usage during video playback. They're also not as convenient to use as regular DirectComposition surfaces.

Previously we switched to VirtualSurfaces because we saw increased GPU usage without them. We should try to understand more thoroughly what's going on, to see if we can get away without them.

Blocks: video-perf
Assignee: nobody → sotaro.ikeda.g

Sorry, there was a problem with the detection of inactive users. I'm reverting the change.

Assignee: nobody → sotaro.ikeda.g
Blocks: 1782834

I'll be looking into this to learn more about DComp.

Assignee: sotaro.ikeda.g → ahale

At a very high level, we want a code path that no longer calls IDCompositionDevice::CreateVirtualSurface and works with IDCompositionDevice::CreateSurface instead.

The main documentation portal for DirectComposition is https://learn.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal

Specifically, there is some information in https://learn.microsoft.com/en-us/windows/win32/directcomp/composition-surface about the difference between a DC virtual surface compared to a regular surface.

WR talks to native compositors via the Compositor trait - see https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/composite.rs#1080. In bindings.rs we implement the Rust trait - https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/src/bindings.rs#1274. This calls through to various unsafe C functions that forward the calls on to a platform-specific compositor implementation written in C++ - see https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/RenderCompositor.cpp#61

The Gecko C++ implementation of the DirectComposition interface can be found in https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp and https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.h

We currently use the virtual surface API exclusively. The intent of this bug is to:
(1) Add conditional (either via #ifdef or runtime selection) support for using the non-virtual regular surface API
(2) Profile both performance and power usage of these two modes in a number of scenarios, and see if there is a noticeable difference
(3) If there's no noticeable difference, or a benefit to using non-virtual surfaces, enable that, which will allow us to simplify a number of subtle complexities in the current WR compositor and picture cache code.
(4) Scenarios we'll want to test are: simple pages, complex pages with lots of scrolling, video playback in regular and full-screen where compositor surfaces are involved, webgl/canvas pages that create compositor surfaces.
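As a rough illustration of the runtime-selection variant of step (1), here is a minimal, portable sketch. Every type and function name in it (SurfaceMode, TileSurfaceBackend, MakeBackend and so on) is hypothetical and not part of Gecko or DirectComposition; the only real names are the two IDCompositionDevice methods, which appear purely as strings. A real backend would call into DirectComposition where this one just reports which factory call it would use.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical names throughout; none of these are Gecko or DirectComposition APIs.
enum class SurfaceMode { Virtual, Regular };

// Minimal interface the rest of the layer tree would program against.
class TileSurfaceBackend {
 public:
  virtual ~TileSurfaceBackend() = default;
  // Names the DirectComposition factory call a real backend would make.
  virtual std::string FactoryCall() const = 0;
  // True when many tiles share one large surface instead of owning their own.
  virtual bool SharesOneSurfacePerLayer() const = 0;
};

class VirtualBackend final : public TileSurfaceBackend {
 public:
  std::string FactoryCall() const override {
    return "IDCompositionDevice::CreateVirtualSurface";
  }
  bool SharesOneSurfacePerLayer() const override { return true; }
};

class RegularBackend final : public TileSurfaceBackend {
 public:
  std::string FactoryCall() const override {
    return "IDCompositionDevice::CreateSurface";
  }
  bool SharesOneSurfacePerLayer() const override { return false; }
};

// Runtime selection: the mode would come from a pref read once at startup
// (assumption; an #ifdef would instead pick one class at compile time).
std::unique_ptr<TileSurfaceBackend> MakeBackend(SurfaceMode mode) {
  switch (mode) {
    case SurfaceMode::Virtual:
      return std::make_unique<VirtualBackend>();
    case SurfaceMode::Regular:
      return std::make_unique<RegularBackend>();
  }
  assert(false && "unhandled SurfaceMode");
  return nullptr;
}
```

Keeping both backends behind one interface is what makes the profiling in step (2) cheap: the same build can be measured in both modes by flipping the selection.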

It should be possible to do this work only by changing the C++ Gecko parts (DCLayerTree and possibly callers of that), as the interface is designed to expose enough information about surfaces and tiles that compositor implementations can work without a true Surface abstraction (for example, on Mac, we create a CALayer per tile, and there's no OS-level concept of the overall Surface - referring to the code in https://searchfox.org/mozilla-central/source/gfx/layers/NativeLayerCA.mm and callers will show how such an implementation works).
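To make the "no true Surface abstraction" point concrete, here is a small sketch of a compositor backend that tracks only tiles, keyed by the surface id and tile coordinates it is given. All names here (TileKey, NativeTile, TileRegistry) are hypothetical and chosen for illustration; this is not how NativeLayerCA or DCLayerTree is actually written, just one way such bookkeeping could look.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical types for illustration; not the Gecko or WebRender definitions.
struct TileKey {
  std::uint64_t surfaceId;  // logical surface the tile belongs to
  std::int32_t x;           // tile column within that surface
  std::int32_t y;           // tile row within that surface
  bool operator<(const TileKey& o) const {
    return std::tie(surfaceId, x, y) < std::tie(o.surfaceId, o.x, o.y);
  }
};

// Stand-in for one native object per tile (a CALayer, a DC surface, ...).
struct NativeTile {
  int widthPx;
  int heightPx;
  bool isOpaque;
};

class TileRegistry {
 public:
  // Allocates one native object per tile; returns false if the tile exists.
  bool CreateTile(const TileKey& key, int tileSizePx, bool isOpaque) {
    assert(tileSizePx > 0);
    return mTiles.emplace(key, NativeTile{tileSizePx, tileSizePx, isOpaque})
        .second;
  }

  bool DestroyTile(const TileKey& key) { return mTiles.erase(key) == 1; }

  // Destroying a logical surface just drops every tile carrying its id;
  // returns how many tiles were removed.
  std::size_t DestroySurface(std::uint64_t surfaceId) {
    std::size_t removed = 0;
    for (auto it = mTiles.begin(); it != mTiles.end();) {
      if (it->first.surfaceId == surfaceId) {
        it = mTiles.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t TileCount() const { return mTiles.size(); }

 private:
  std::map<TileKey, NativeTile> mTiles;
};
```

The logical surface exists here only as a field in the key, which is the property that lets a backend ignore OS-level surface grouping entirely.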

There are a few potential simplifications / optimizations inside WR that may flow from this if it works out:

  • It's not possible to change the blend mode of a DC surface after creation. However, WR picture cache tiles can change from opaque <-> alpha depending on how content is changing. At the moment we have to create an alpha and opaque virtual surface, and manage when a tile changes which surface it belongs to, which adds some complexity. If we instead have individual surfaces, we can simplify some of the logic in WR that deals with this.

  • The API for updating and rendering a virtual surface doesn't work well with readback of content during rendering (for effects such as backdrop-filter). If we switch to individual tiles, we may be able to remove a surface allocation and copy for tiles that are affected by backdrop-filter effects. This may also open up some similar optimizations for mix-blend-mode effects.

  • In future, we intend to split the compositor trait to separate the concepts of native surface alloc / bind / update / unbind / free from the concepts of compositing that visual tree of surfaces. This will allow us to do things like have DC-allocated surfaces, that can be composited by DC itself, or by WR (e.g. when profiler screenshots are active). This has the potential to make screenshot grabbing much simpler and faster (we currently have to tear down the entire compositor and rasterize those tiles to WR allocated surfaces to allow the WR composite step). If surfaces are individual tiles rather than virtual surfaces, this change may be simpler to implement.
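For the first bullet above, a sketch of what the opaque <-> alpha transition could reduce to with one surface per tile: since the blend mode is fixed at creation, the tile swaps in a new surface of the other kind, and nothing has to move between an opaque and an alpha virtual surface. FixedBlendSurface and Tile are hypothetical stand-ins, not DirectComposition or WebRender types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a native surface whose blend mode is fixed
// at creation time.
class FixedBlendSurface {
 public:
  explicit FixedBlendSurface(bool isOpaque) : mIsOpaque(isOpaque) {}
  bool IsOpaque() const { return mIsOpaque; }

 private:
  const bool mIsOpaque;
};

// One tile owning exactly one surface (the non-virtual model).
class Tile {
 public:
  explicit Tile(bool isOpaque)
      : mSurface(std::make_unique<FixedBlendSurface>(isOpaque)) {}

  // Swaps in a new surface only when opacity actually changes.
  // Returns true if a reallocation happened; the caller would then need
  // to redraw the tile's contents into the new surface.
  bool SetOpaque(bool isOpaque) {
    assert(mSurface);
    if (mSurface->IsOpaque() == isOpaque) {
      return false;
    }
    mSurface = std::make_unique<FixedBlendSurface>(isOpaque);
    ++mReallocations;
    return true;
  }

  bool IsOpaque() const { return mSurface->IsOpaque(); }
  int Reallocations() const { return mReallocations; }

 private:
  std::unique_ptr<FixedBlendSurface> mSurface;
  int mReallocations = 0;
};
```

The cost is a reallocation and redraw on each transition, so whether this is a net win depends on how often tiles flip opacity in practice, which this sketch does not measure.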

Slightly tangential to this bug, I did some data gathering on advice from :egubler, who suggested using Sysinternals handle64.exe to look at the number of DirectComposition objects we're wrangling in normal usage. The bare minimum for a browser window is 7 DxgkCompositionObjects when viewing a basic web page with no iframes or similar constructs that could create more DCSurfaces, in a window smaller than our tile size of 1024x1024. Using a window that is 4 tiles in size raises this to 10 (so that's +3 logical surfaces, which makes sense). Viewing a Matrix chat raises this to 12 (iframes, presumably?). In more typical usage for me, there are at least 58 DxgkCompositionObjects for 5 windows, so I think we're basically using a minimum of 7 composition objects per window. These are numbers for clean 'just restarted' windows; in normal usage, closing tabs doesn't always reduce the number of DxgkCompositionObjects. For example, with the 58 objects, I closed 4 windows and all but one tab in the remaining window, and it was still at 18 objects. Restarting the browser dropped it to 12. However, I did not wait long enough for idle tabs to be shut down, so it may have reclaimed the 6 objects had I waited a little longer.
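The arithmetic behind those counts can be written down as a quick sanity check. The tile size and the per-window minimum of 7 come from the measurements above; the "6 + tiles" estimate is only an assumption fitted to the two data points reported (1 tile gives 7 objects, 4 tiles give 10) and is not a documented property of DirectComposition. The function names are made up for this sketch.

```cpp
#include <cassert>

// Tile size quoted in the measurements above.
constexpr int kTileSizePx = 1024;

constexpr int CeilDiv(int a, int b) { return (a + b - 1) / b; }

// Number of tiles needed to cover a window of the given size.
int TilesForWindow(int widthPx, int heightPx) {
  assert(widthPx > 0 && heightPx > 0);
  return CeilDiv(widthPx, kTileSizePx) * CeilDiv(heightPx, kTileSizePx);
}

// ASSUMPTION: linear fit to the two observations (1 tile -> 7, 4 tiles -> 10).
int EstimatedObjectsForWindow(int tiles) {
  assert(tiles > 0);
  return 6 + tiles;
}

// Lower bound for several single-tile windows, at 7 objects each.
int MinimumObjectsForWindows(int windows) {
  assert(windows >= 0);
  return 7 * windows;
}
```

With 5 windows the lower bound is 35, which is consistent with the 58 objects observed: the remainder would be extra tiles, iframes, and objects not yet reclaimed.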

This bug was originally created to reduce the GPU/power usage of Firefox during video playback, where Firefox uses more power than Chrome.

We then suspected that virtual surface usage might be increasing power/GPU usage, which would mean reducing GPU usage by removing virtual surfaces. But with the latest D161239, Intel Power Gadget still showed the GPU in use during video playback, whereas Chrome shows 0% GT Utilization.

Before progressing D161239, it seems we need to make clear what makes the difference. Removing VirtualSurfaces is not in itself this bug's target; the original target was reducing power/GPU usage by removing virtual surfaces.


Chromium uses one Surface/SwapChain for content rendering, except for overlays. There is a bug to split it into UI and WebContent:
https://bugs.chromium.org/p/chromium/issues/detail?id=1132392

I'd like to disambiguate the goal of this bug. To my understanding it was an exploration of the GPU usage of virtual versus non-virtual surfaces for web content surrounding a video surface, but comment #8 makes it clear that the GPU usage of video playback is the core goal.

D161239 changes all video surfaces to non-virtual (regardless of the pref), but its primary aim is reworking the DCLayerTree code to support non-virtual surfaces for web content again, since the web content seemed to be the focus in the initial description of the bug (and refactoring this has some benefit to WebRender architecturally).

Comment #8 seems to change the focus of this bug to solely the behavior of the video surface, but I'm not sure that is actually the source of the GPU power usage - it may be the fact we are using dcomp for web content at all, whereas Chrome only uses dcomp for the video overlay and the window itself is a regular window (this means a lot fewer surfaces that dcomp is juggling internally).

At this point, there are several experiments to explore:
1a. Web content as virtual dcomp surfaces, video as virtual dcomp surfaces (current approach in Firefox).
1b. Web content as virtual dcomp surfaces, video as non-virtual dcomp surfaces (D161239 makes this change).
1c. Web content as non-virtual dcomp surfaces, video as non-virtual dcomp surfaces (D161239 conditionally makes this change based on a pref).
2. Web content without dcomp, video as non-virtual dcomp surfaces (the way Chrome does it).

We haven't explored option 2 yet. It would be a very different change, but it is probably worth trying.

I believe D161239 addresses the question that this bug represents, as option 2 would be a very different direction than what was described. But if the bug is purely the goal of reducing power usage then it needs to be blocked by bugs for each possible solution (and D161239 would belong to one of those).

Slight edit - I've verified that there is no use of virtual or non-virtual dcomp surfaces for video and WebGL content, so these are unaffected by D161239.

Revised list of experiments to try:

  • All dcomp - Web content as virtual dcomp surfaces (current approach in Firefox).
    • This is the baseline as we've been shipping it this way for several years.
  • All dcomp - Web content as non-virtual dcomp surfaces (D161239 conditionally makes this change based on a pref).
    • This is the new option in D161239.
  • All dcomp - Web content as SwapChain dcomp surfaces (approach not explored yet).
    • Not explored yet. This was discussed a bit as a possibility, but it's not clear if it would reduce GPU usage.
  • No dcomp - video and WebGL content rendered into web page (Firefox has this option, and I assume it has been benchmarked against option 1a already?).
    • This is already an option in Firefox, it seems suboptimal for video playback efficiency.
  • Partial dcomp - Web content (browser chrome and content iframe) not using dcomp, but using SwapChain dcomp surfaces for video and WebGL (this is how Chrome approaches the problem).
    • This is a variant of the classic way that Firefox worked before dcomp, where a child window is used for each overlay.

Since D161239 did not demonstrate meaningful differences in GPU usage, the most promising next experiment is the Partial dcomp approach, which may be more difficult to mock up.

Attachment #9301920 - Attachment description: WIP: Bug 1773596 - Reimplement non-virtual surface rendering → Bug 1773596 - Reimplement non-virtual surface rendering
Attachment #9301920 - Attachment description: Bug 1773596 - Reimplement non-virtual surface rendering → Bug 1773596 - Reimplement non-virtual surface rendering r?gw
Attachment #9306340 - Attachment is obsolete: true
Pushed by ahale@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d6276a0fc51 Reimplement non-virtual surface rendering r=jgilbert,gfx-reviewers,gw
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch