1773596 - Investigate switching away from DirectComposition VirtualSurfaces

Reporter

Description

2 years ago

Edge used them but Chrome doesn't. We have some evidence to suggest that using them causes increased power usage during video playback. They're also not as convenient to use as regular DirectComposition surfaces.

Previously we switched to VirtualSurfaces because we saw increased GPU usage without them. We should try to understand more thoroughly what's going on, to see if we can get away without them.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

2 years ago

Blocks: video-perf

Sotaro Ikeda [:sotaro]

Updated

2 years ago

Assignee: nobody → sotaro.ikeda.g

Comment hidden (off-topic)

Suhaib Mujahid [:suhaib]

Comment 2

2 years ago

Sorry, there was a problem with the detection of inactive users. I'm reverting the change.

Assignee: nobody → sotaro.ikeda.g

Glenn Watson [:gw]

Updated

2 years ago

Blocks: 1782834

Ashley Hale [:ahale]

Assignee

Comment 3

2 years ago

I'll be looking into this to learn more about DComp.

Ashley Hale [:ahale]

Assignee

Updated

2 years ago

Assignee: sotaro.ikeda.g → ahale

Glenn Watson [:gw]

Comment 4

2 years ago

At a very high level, we effectively want a code path that no longer calls IDCompositionDevice::CreateVirtualSurface and works with IDCompositionDevice::CreateSurface instead.

The main documentation portal for DirectComposition is https://learn.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal

Specifically, there is some information in https://learn.microsoft.com/en-us/windows/win32/directcomp/composition-surface about the difference between a DC virtual surface compared to a regular surface.

WR talks to native compositors via the Compositor trait - see https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/composite.rs#1080. In bindings.rs we implement the Rust trait - https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/src/bindings.rs#1274. This calls through to various unsafe C functions that forward the calls to a platform-specific compositor implementation written in C++ - see https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/RenderCompositor.cpp#61

The Gecko C++ implementation of the DirectComposition interface can be found in https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp and https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.h

We currently use the virtual surface API exclusively. The intent of this bug is to:
(1) Add conditional (either via #ifdef or runtime selection) support for using the non-virtual regular surface API
(2) Profile both performance and power usage of these two modes in a number of scenarios, and see if there is a noticeable difference
(3) If there's no noticeable difference, or a benefit to using non-virtual surfaces, enable that, which will allow us to simplify a number of subtle complexities in the current WR compositor and picture cache code.
(4) Scenarios we'll want to test are: simple pages, complex pages with lots of scrolling, video playback in regular and full-screen where compositor surfaces are involved, webgl/canvas pages that create compositor surfaces.

It should be possible to do this work only by changing the C++ Gecko parts (DCLayerTree and possibly callers of that), as the interface is designed to expose enough information about surfaces and tiles that compositor implementations can work without a true Surface abstraction (for example, on Mac, we create a CALayer per tile, and there's no OS-level concept of the overall Surface - referring to the code in https://searchfox.org/mozilla-central/source/gfx/layers/NativeLayerCA.mm and callers will show how such an implementation works).
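As a rough illustration of item (1) above, runtime selection between the two surface strategies could be modelled like this. It is only a sketch: `SurfaceMode`, `TileBackend`, `VirtualBackend`, `RegularBackend` and `MakeBackend` are hypothetical names that do not exist in DCLayerTree, no real DirectComposition calls are made, and the "one virtual surface per WR surface, one regular surface per tile" mapping is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these types exist in DCLayerTree.
enum class SurfaceMode { Virtual, Regular };

class TileBackend {
 public:
  virtual ~TileBackend() = default;
  // Registers a tile at grid position (x, y) within one WR surface.
  virtual void AddTile(int x, int y) = 0;
  // Number of native surfaces this backend would have requested from the OS.
  virtual std::size_t NativeSurfaceCount() const = 0;
};

// Models the current path: a single virtual surface backs every tile.
// A real implementation would call IDCompositionDevice::CreateVirtualSurface.
class VirtualBackend final : public TileBackend {
 public:
  void AddTile(int x, int y) override {
    assert(x >= 0 && y >= 0);
    mTiles.emplace_back(x, y);
  }
  std::size_t NativeSurfaceCount() const override {
    return mTiles.empty() ? 0 : 1;
  }

 private:
  std::vector<std::pair<int, int>> mTiles;
};

// Models the proposed path: each tile owns its own regular surface.
// A real implementation would call IDCompositionDevice::CreateSurface.
class RegularBackend final : public TileBackend {
 public:
  void AddTile(int x, int y) override {
    assert(x >= 0 && y >= 0);
    mTiles.emplace_back(x, y);
  }
  std::size_t NativeSurfaceCount() const override { return mTiles.size(); }

 private:
  std::vector<std::pair<int, int>> mTiles;
};

// Runtime selection, standing in for a pref check (assumption).
inline std::unique_ptr<TileBackend> MakeBackend(SurfaceMode mode) {
  if (mode == SurfaceMode::Virtual) {
    return std::make_unique<VirtualBackend>();
  }
  return std::make_unique<RegularBackend>();
}
```

Keeping both strategies behind one interface is what would let the profiling in item (2) compare them without touching the callers.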

Glenn Watson [:gw]

Comment 5

2 years ago

There are a few potential simplifications / optimizations inside WR that may flow from this if it works out:

- It's not possible to change the blend mode of a DC surface after creation. However, WR picture cache tiles can change from opaque <-> alpha depending on how content is changing. At the moment we have to create an alpha and an opaque virtual surface, and manage when a tile changes which surface it belongs to, which adds some complexity. If we instead have individual surfaces, we can simplify some of the logic in WR that deals with this.
- The API for updating and rendering a virtual surface doesn't work well with readback of content during rendering (for effects such as backdrop-filter). If we switch to individual tiles, we may be able to remove a surface allocation and copy for tiles that are affected by backdrop-filter effects. This may also open up some similar optimizations for mix-blend-mode effects.
- In future, we intend to split the compositor trait to separate the concepts of native surface alloc / bind / update / unbind / free from the concepts of compositing that visual tree of surfaces. This will allow us to do things like have DC-allocated surfaces that can be composited by DC itself, or by WR (e.g. when profiler screenshots are active). This has the potential to make screenshot grabbing much simpler and faster (we currently have to tear down the entire compositor and rasterize those tiles to WR-allocated surfaces to allow the WR composite step). If surfaces are individual tiles rather than virtual surfaces, this change may be simpler to implement.
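To make the first point (opaque <-> alpha tiles) more concrete, here is a small model of the two kinds of bookkeeping. It is a sketch under assumptions, not WR or DCLayerTree code: `SharedSurfaces`, `PerTileSurfaces` and `TileId` are hypothetical names, and "recreate the surface when the blend mode changes" is one assumed way of handling per-tile surfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

using TileId = int;  // hypothetical identifier for a picture cache tile

// Virtual-surface model: one shared surface per blend mode. A tile whose
// opacity changes has to migrate from one shared surface to the other.
class SharedSurfaces {
 public:
  void SetOpaque(TileId id, bool opaque) {
    (opaque ? mAlphaTiles : mOpaqueTiles).erase(id);
    (opaque ? mOpaqueTiles : mAlphaTiles).insert(id);
  }
  bool IsOpaque(TileId id) const { return mOpaqueTiles.count(id) != 0; }
  std::size_t OpaqueCount() const { return mOpaqueTiles.size(); }
  std::size_t AlphaCount() const { return mAlphaTiles.size(); }

 private:
  std::set<TileId> mOpaqueTiles;
  std::set<TileId> mAlphaTiles;
};

// Per-tile model: every tile owns one surface. The blend mode is fixed at
// creation, so a change replaces that tile's surface and nothing else.
class PerTileSurfaces {
 public:
  void SetOpaque(TileId id, bool opaque) {
    auto it = mTiles.find(id);
    if (it != mTiles.end() && it->second == opaque) {
      return;  // blend mode unchanged, keep the existing surface
    }
    mTiles[id] = opaque;  // stands in for creating a replacement surface
    ++mSurfacesCreated;
  }
  bool IsOpaque(TileId id) const {
    auto it = mTiles.find(id);
    assert(it != mTiles.end());
    return it->second;
  }
  int SurfacesCreated() const { return mSurfacesCreated; }

 private:
  std::map<TileId, bool> mTiles;  // tile -> opaque flag of its surface
  int mSurfacesCreated = 0;
};
```

In the shared model the tile's membership in two containers has to stay consistent; in the per-tile model the state lives on the tile itself, which is the simplification the comment is hoping for.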

Ashley Hale [:ahale]

Assignee

Comment 6

2 years ago

Slightly tangential to this bug, I did some data gathering on the advice of :egubler, who suggested using Sysinternals handle64.exe to look into the number of DirectComposition objects we're wrangling in normal usage. The bare minimum for a browser window is 7 DxgkCompositionObjects when viewing a basic web page with no iframes or similar constructs that could create more DCSurfaces, in a window smaller than our tile size of 1024x1024. Using a window that is 4 tiles in size raises this to 10 (so that's +3 logical surfaces, which makes sense). Viewing a Matrix chat raises this to 12 (iframes, presumably?). In more typical usage for me, there are at least 58 DxgkCompositionObjects for 5 windows, so I think we're basically using a minimum of 7 composition objects per window.

These are numbers for clean 'just restarted' windows. In normal usage, closing tabs doesn't always reduce the number of DxgkCompositionObjects. For example, with the 58 objects, I closed 4 windows and all but one tab in the remaining window, and it was still at 18 objects; restarting the browser dropped it to 12. However, I did not wait long enough for idle tabs to be shut down, so it may have reclaimed the 6 objects if I had waited a little longer.
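The arithmetic behind these counts can be written down as a small sketch. The 1024 tile size and the baseline of 7 objects come from the measurements above; the "one extra object per tile beyond the first" rule is only an extrapolation from the two data points (7 objects for 1 tile, 10 for 4 tiles), and all function names are hypothetical.

```cpp
#include <cassert>

const int kTileSize = 1024;            // tile size quoted in the comment
const int kBaseObjectsPerWindow = 7;   // observed minimum for one window

// Number of tiles needed to cover `pixels` along one axis (rounds up).
inline int TilesAlong(int pixels) {
  assert(pixels > 0);
  return (pixels + kTileSize - 1) / kTileSize;
}

// Number of tiles needed to cover a window of the given size.
inline int TilesForWindow(int width, int height) {
  return TilesAlong(width) * TilesAlong(height);
}

// Assumption: each tile beyond the first adds one composition object.
inline int EstimatedObjects(int tiles) {
  assert(tiles >= 1);
  return kBaseObjectsPerWindow + (tiles - 1);
}

// Lower bound on objects for a number of freshly opened windows.
inline int MinimumObjects(int windows) {
  return windows * kBaseObjectsPerWindow;
}
```

Under these assumptions, 5 windows need at least 35 objects, which is consistent with the 58 observed in typical usage once extra tiles and iframes are counted.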

Ashley Hale [:ahale]

Assignee

Comment 7

2 years ago

Attached file Bug 1773596 - Reimplement non-virtual surface rendering r?gw (deleted) — Details

Sotaro Ikeda [:sotaro]

Comment 8

2 years ago

This bug was originally created to reduce the GPU/power usage of Firefox during video playback. Firefox uses more power than Chrome, as the following measurements show.

https://docs.google.com/spreadsheets/d/1RDdjEdTl0tatvG-_vXP-cxRlpt98NDVL9WJCcd3iJ24/edit#gid=1954559447

We then suspected that virtual surface usage might increase power/GPU usage, and that removing virtual surfaces would reduce it. But with the latest D161239, Intel Power Gadget showed that the GPU was still in use during video playback, whereas Chrome uses 0% GT Utilization.

Before progressing D161239, it seems necessary to make clear what causes the difference. Removing VirtualSurfaces is not this bug's target in itself; the original target was reducing power/GPU usage by removing virtual surfaces.

Chromium uses one Surface/SwapChain for content rendering, except for overlays. There is a bug to split it into UI and WebContent:
https://bugs.chromium.org/p/chromium/issues/detail?id=1132392

Ashley Hale [:ahale]

Assignee

Comment 9

2 years ago

I'd like to disambiguate the goal of this bug. To my understanding, it was an exploration of the GPU usage of virtual vs. non-virtual surfaces for the web content surrounding a video surface, but comment #8 makes it clear that the GPU usage of video playback is the core goal.

D161239 changes all video surfaces to non-virtual (regardless of the pref), but its primary aim is reworking the DCLayerTree code to support non-virtual surfaces for web content again, since the web content seemed to be the focus in the initial description of the bug (and refactoring this has some benefit to WebRender architecturally).

Comment #8 seems to change the focus of this bug to solely the behavior of the video surface, but I'm not sure that is actually the source of the GPU power usage - it may be the fact we are using dcomp for web content at all, whereas Chrome only uses dcomp for the video overlay and the window itself is a regular window (this means a lot fewer surfaces that dcomp is juggling internally).

At this point, there are several experiments to explore:
1a. Web content as virtual dcomp surfaces, video as virtual dcomp surfaces (current approach in Firefox).
1b. Web content as virtual dcomp surfaces, video as non-virtual dcomp surfaces (D161239 makes this change).
1c. Web content as non-virtual dcomp surfaces, video as non-virtual dcomp surfaces (D161239 conditionally makes this change based on a pref).
2. Web content without dcomp, video as non-virtual dcomp surfaces (the way Chrome does it).

We haven't explored option 2 yet, it would be a very different change, but probably worth it to try.

I believe D161239 addresses the question that this bug represents, as option 2 would be a very different direction than what was described. But if the goal of this bug is purely to reduce power usage, then it needs to be blocked by bugs for each possible solution (and D161239 would belong to one of those).

Ashley Hale [:ahale]

Assignee

Comment 10

2 years ago

Slight edit - I've verified that there is no use of virtual or non-virtual dcomp surfaces for video and WebGL content, so these are unaffected by D161239.

Revised list of experiments to try:

All dcomp - Web content as virtual dcomp surfaces (current approach in Firefox).
  - This is the baseline as we've been shipping it this way for several years.
All dcomp - Web content as non-virtual dcomp surfaces (D161239 conditionally makes this change based on a pref).
  - This is the new option in D161239.
All dcomp - Web content as SwapChain dcomp surfaces (approach not explored yet).
  - Not explored yet. This was discussed a bit as a possibility, but it's not clear if it would reduce GPU usage.
No dcomp - video and WebGL content rendered into web page (Firefox has this option, and I assume it has been benchmarked against option 1a already?).
  - This is already an option in Firefox, it seems suboptimal for video playback efficiency.
Partial dcomp - Web content (browser chrome and content iframe) not using dcomp, but using SwapChain dcomp surfaces for video and WebGL (this is how Chrome approaches the problem).
  - This is a variant of the classic way that Firefox worked before dcomp, where a child window is used for each overlay.

Since D161239 did not demonstrate meaningful differences in GPU usage, the most promising next experiment is the Partial dcomp approach, which may be more difficult to mock up.
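For tracking purposes, the revised list of experiments above could be encoded as a small table. This is only an illustrative sketch: `Status`, `Experiment`, `Experiments` and `UnexploredNames` are hypothetical names, the short labels are my own, and the status values are my reading of the list rather than anything recorded in the tree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical status labels derived from the wording of the list.
enum class Status { Shipping, BehindPref, ExistingOption, Unexplored };

struct Experiment {
  std::string name;        // short label, made up for this sketch
  std::string webContent;  // how web content is presented
  Status status;
};

inline std::vector<Experiment> Experiments() {
  return {
      {"All dcomp (virtual)", "virtual dcomp surfaces", Status::Shipping},
      {"All dcomp (non-virtual)", "non-virtual dcomp surfaces",
       Status::BehindPref},
      {"All dcomp (SwapChain)", "SwapChain dcomp surfaces",
       Status::Unexplored},
      {"No dcomp", "video and WebGL rendered into the page",
       Status::ExistingOption},
      {"Partial dcomp", "no dcomp for web content, SwapChain for overlays",
       Status::Unexplored},
  };
}

// Returns the names of the experiments that have not been tried yet.
inline std::vector<std::string> UnexploredNames() {
  const std::vector<Experiment> all = Experiments();
  assert(all.size() == 5);  // the list above has five entries
  std::vector<std::string> names;
  for (const Experiment& e : all) {
    if (e.status == Status::Unexplored) {
      names.push_back(e.name);
    }
  }
  return names;
}
```

With this reading, the two remaining candidates are the SwapChain variant and the Partial dcomp approach.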

Phabricator Automation

Updated

2 years ago

Attachment #9301920 - Attachment description: WIP: Bug 1773596 - Reimplement non-virtual surface rendering → Bug 1773596 - Reimplement non-virtual surface rendering

Ashley Hale [:ahale]

Assignee

Comment 11

2 years ago

Attached file Bug 1773596 - Reimplement non-virtual surface rendering r?gw (obsolete) (deleted) — Details

Phabricator Automation

Updated

2 years ago

Attachment #9301920 - Attachment description: Bug 1773596 - Reimplement non-virtual surface rendering → Bug 1773596 - Reimplement non-virtual surface rendering r?gw

Phabricator Automation

Updated

2 years ago

Attachment #9306340 - Attachment is obsolete: true

Pulsebot

Comment 12

2 years ago

Pushed by ahale@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d6276a0fc51 Reimplement non-virtual surface rendering r=jgilbert,gfx-reviewers,gw

Marian-Vasile Laza

Comment 13

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3d6276a0fc51

Status: NEW → RESOLVED

Closed: 2 years ago

status-firefox109: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 109 Branch

Bug 1773596 - Reimplement non-virtual surface rendering r?gw 2 years ago Ashley Hale [:ahale] (deleted), text/x-phabricator-request		Details
Bug 1773596 - Reimplement non-virtual surface rendering r?gw 2 years ago Ashley Hale [:ahale] (deleted), text/x-phabricator-request		Details