Open Bug 1478125 Opened 6 years ago Updated 2 years ago

Texture upload performance could be better on Windows

Categories

(Core :: Graphics: WebRender, defect, P3)

63 Branch
defect

Tracking

()

Tracking Status
firefox63 --- affected

People

(Reporter: jrmuizel, Assigned: jrmuizel)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

I'm seeing an appreciable amount of time being spent in ANGLE during texture upload.

https://perfht.ml/2v3zmvx - a profile from the bottom of https://creativecluster.lu
Assignee: nobody → kvark
Priority: -- → P3
It looks like this might just be a necessary copy. There may not be anything we can about this without switching to gfx-rs which gives us better control over upload (i.e. we can upload on a different thread)
Depends on: 1474664
In an ideal world, the blob image renderer should be given the space to fill (as opposed to owning that space), and it's going to come from a mapped staging resource owned by the renderer. This would allow us to avoid an extra copy.
Added a link to upstream issue. I've been looking at it more from the perspective of Gecko, and I came to the following conclusion:
  1. Angle logic appears to be correct. It keeps a staging texture, maps it, copies it, and has a heuristic on when it's best to keep it around instead of re-creating.
  2. Gecko appears to be wrong: it constantly re-creates the blob image every frame. This forces Angle to create a staging resource each frame, and also makes the `Map` call slow, since the associated memory pages need to be initialized to zeroes.

Andrew, is this expected for a blob image to be re-created all the time? I was under impression that Gecko now re-uses the same image for blobs.
Flags: needinfo?(aosmond)
I believe on this site the blob images are changing in size which explains why we're recreating them. On digitalocean.com I don't think they should be changing in size.
Flags: needinfo?(aosmond)
Jeff, on "digitalocean.com" I'm only see it entering the createStagingTexture() when the size changes, and it does appear to be changing when scrolling. So the reason for uploads being slow is the same.

Speaking of which, the logic of maintaining the staging area could be much more efficient. There is no requirement to have the staging texture with the same size. It only needs to have the same pixel format. So the most efficient approach would be to have a pool of staging textures of fixed size per format (e.g. 2Kx2K). When the user wants to update a sub-rect, the implementation would pick a rect from one of the pooled staging textures, map it, fill it, and then issue a copy from it.

Of course this is more complicated than Angle wants it to be, so I wonder if we can force this behavior by changing the logic on our side.
I think the proper solution for us, one that doesn't involve dancing around Angle-D3D interop, would be to make sure the image resources (especially blobs) live as long as possible. Make them more persistent, to keep the associated data alive across Gecko - WR - Angle line.
So for large images (where upload performance matters) we should be tiling the blob images. This means that all/most of the images should be 256 byte aligned and not changing in size. I haven't confirmed that digitalocean.com is using tiled blob images but if it's not it sounds like we should make it do so.
Priority: P3 → P4
Jeff -- How bad is this now?  How often will a user (likely) hit this?
Flags: needinfo?(jmuizelaar)
I'm worried that this isn't working great still, but don't have any concrete instances right now. Let's see where we are after blob-re-coord and go from there.
Depends on: blob-recoord
Flags: needinfo?(jmuizelaar)

This should be better now that we're tiling unconditionally. I'll check again.

Flags: needinfo?(jmuizelaar)

This is still not great: https://bit.ly/2RtNzvi. To upload 16 MB we're spending >10ms. My suspicion is that this could be caused by us creating and deleting the blob image every frame.

Assignee: dmalyshau → nobody
Flags: needinfo?(jmuizelaar)

I wanted to check if regular invalidations go fast.

Flags: needinfo?(jmuizelaar)
Assignee: nobody → jmuizelaar
Flags: needinfo?(jmuizelaar)
Blocks: wr-67
No longer blocks: stage-wr-trains
Priority: P4 → P3
Blocks: wr-68
No longer blocks: wr-67
Blocks: wr-70
No longer blocks: wr-68
Blocks: wr-71
No longer blocks: wr-70

Can we take a look at whether this is still an issue?

Flags: needinfo?(nical.bugzilla)

I looked at the test case on an intel Windows laptop and the renderer thread spends 50% of its time on texture uploads (more than two thirds of the total time spent in render()) just like in Jeff's initial report, so it seems that the issue hasn't changed.
I see the same on Linux as well (nvidia and intel).

Flags: needinfo?(nical.bugzilla)
No longer blocks: wr-71
Blocks: texture-upload-perf
No longer blocks: wr-perf
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.