Closed Bug 1663387 Opened 4 years ago Closed 2 years ago

Poor scrolling performance on westfield.co.nz (very large blob copy)

Categories

(Core :: Graphics, defect)

80 Branch
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: ali.nz2005, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(3 files, 1 obsolete file)

Attached image Untitled-1.jpg (deleted) —

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0

Steps to reproduce:

Visited this page, clicked middle mouse button and moved mouse downwards.
https://www.westfield.co.nz/newmarket

about:support details:
https://pastebin.com/uYhhVwSP

System specs:
i5 2400
8 GB RAM
GTX 970 - 442.59
Win 10 19041.488

Actual results:

Very jerky scrolling with large frame drops; the menu bar resizes very slowly.

Expected results:

Smooth scrolling / menu resize.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Can reproduce both with and without WebRender on MBP. Seems like an older bug, present since at least FF70.

WR: https://share.firefox.dev/2E4eqNO
Non-WR: https://share.firefox.dev/2ZDncK5

Component: Graphics: WebRender → Graphics
Blocks: gfx-triage
Severity: -- → S3
Keywords: perf

@Nical: This looks blob-related on WR and Skia-related on non-WR. Could you take a look at what we can do here?

Flags: needinfo?(nical.bugzilla)

Wow, 300+ ms copying bytes (the blob) from a shmem into a Rust Vec, and about twice as much writing into the shmem on the content side.

The blob image is very large. It looks like we serialize some big source surfaces into the recording.
The slowness appears to be coming from a blob mask.

Blocks: blob-perf
Summary: Poor scrolling performance on westfield.co.nz → Poor scrolling performance on westfield.co.nz (very large blob copy)

(In reply to Nicolas Silva [:nical] from comment #4)

> Wow, 300+ ms copying bytes (the blob) from a shmem into a Rust Vec, and about twice as much writing into the shmem on the content side.
>
> The blob image is very large. It looks like we serialize some big source surfaces into the recording.
> The slowness appears to be coming from a blob mask.

I had a look at what's going on and it looks like this is caused by stacked SVG masks. I suspect we hit some raster fallback path in that case and then we're just recording a huge raster image in the blob.
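To give a feel for why a raster image recorded into a blob is expensive to move around, here is a rough sketch. It assumes an uncompressed surface at 4 bytes per pixel; the function names and the dimensions in the usage note are hypothetical and are not taken from Gecko or from the profile.

```rust
/// Size in bytes of an uncompressed raster surface, assuming 4 bytes per
/// pixel (e.g. BGRA8). The pixel format is an assumption for illustration.
fn raster_bytes(width: usize, height: usize) -> usize {
    width * height * 4
}

/// Stand-in for copying a blob out of shared memory into an owned Vec.
/// The copy touches every byte, so its cost grows linearly with blob size.
fn copy_blob(shmem: &[u8]) -> Vec<u8> {
    shmem.to_vec()
}
```

For example, `raster_bytes(4096, 4096)` is 64 MiB, and that many bytes would be written once on the content side and copied again on the receiving side for every recording that embeds the surface.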

Attached file Partially reduced version (obsolete) (deleted) —
Assignee: nobody → jmuizelaar
No longer blocks: gfx-triage

Not sure why this causes the long GPU times yet.

The large GPU times were already there. They come from a very large clip task.

I investigated the large GPU times. I turned off picture caching to make it easy to get a capture. I was able to replay the capture in GPA and RenderDoc.

GPA allows graphing draw calls by their execution time and this showed two clip_image draw calls taking the bulk of the time.

It wasn't immediately obvious why these draw calls should be so much worse than the others. The memory bandwidth used on these calls was lower than on other full-screen draws. Eventually I noticed that the number of rasterized pixels was very high (34 million), and it turns out that for the first clip_image draw call we were rasterizing 12 full-screen quads, and for the second, 3 additional full-screen quads. The vast majority of these rasterized pixels end up hitting the discard branch of the clip_image shader, which avoids the memory bandwidth cost. However, we still get bottlenecked on pixel shader dispatch/execution.

I also discovered that conditional 'discard' causes ANGLE to turn off optimization in the D3D shader compiler. While not great, this probably isn't that bad because our shaders are already preoptimized by glslopt.

It looks like the 12 instances are from the 12 tiles of the image. I'm guessing we draw each tile over the entire area of the mask and this causes the sad. This sounds somewhat related to bug 1616326 comment 11.
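The arithmetic behind that guess can be sketched as follows. This is only an illustration: the function names are made up, and the 1920x1080 size in the usage note is an assumed resolution, since the thread does not state the actual one.

```rust
/// Pixels rasterized when every one of `quads` quads covers the whole
/// mask area (each tile drawn over the entire mask).
fn pixels_full_area_per_quad(mask_w: u64, mask_h: u64, quads: u64) -> u64 {
    mask_w * mask_h * quads
}

/// Pixels rasterized if each tile were drawn only over its own part of
/// the mask: the tiles partition the mask, so the total is one mask area.
fn pixels_tile_bounded(mask_w: u64, mask_h: u64) -> u64 {
    mask_w * mask_h
}
```

With 12 + 3 = 15 quads over an assumed 1920x1080 area, `pixels_full_area_per_quad(1920, 1080, 15)` gives 31,104,000 pixels, the same order of magnitude as the roughly 34 million reported, against 2,073,600 from `pixels_tile_bounded(1920, 1080)`.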

Glenn, can you confirm that this could be what's happening? If it is how can we fix it?

Flags: needinfo?(gwatson)

The tiled clip rendering path is a known bad path in WR right now, so this analysis sounds right. I'll look into it this week and see if we can apply a better bounding rect to clip tiles (this should be a reasonable short-term fix; we can look at removing the discard altogether as a longer-term improvement).
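A minimal sketch of the bounding-rect idea, for the axis-aligned case only. This is not WebRender's code: `Rect`, `tile_draw_rects`, and the tile size are hypothetical names and values chosen for illustration.

```rust
/// Axis-aligned rectangle in device pixels (hypothetical type, not WR's).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl Rect {
    /// Overlap of two rects, or None if they do not overlap.
    fn intersection(self, other: Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 {
            Some(r)
        } else {
            None
        }
    }

    fn area(self) -> i64 {
        i64::from(self.x1 - self.x0) * i64::from(self.y1 - self.y0)
    }
}

/// Split `mask` into a grid of `tile_size` tiles and return, for each tile,
/// the rect it should be drawn over: the tile's own bounds clamped to the
/// mask, rather than the whole mask area.
fn tile_draw_rects(mask: Rect, tile_size: i32) -> Vec<Rect> {
    assert!(tile_size > 0, "tile_size must be positive");
    let mut rects = Vec::new();
    let mut y = mask.y0;
    while y < mask.y1 {
        let mut x = mask.x0;
        while x < mask.x1 {
            let tile = Rect { x0: x, y0: y, x1: x + tile_size, y1: y + tile_size };
            if let Some(r) = tile.intersection(mask) {
                rects.push(r);
            }
            x += tile_size;
        }
        y += tile_size;
    }
    rects
}
```

For a 2048x1536 mask and 512-pixel tiles this yields 12 draw rects whose areas sum to exactly one mask area, instead of 12 quads that each cover the full mask.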

Assignee: jmuizelaar → gwatson
Flags: needinfo?(gwatson)
Attachment #9178004 - Attachment is patch: true
Attachment #9178004 - Attachment mime type: application/octet-stream → text/plain

Added an optimization for tiled clip masks that are axis-aligned (the common case) in bug #1667707.

On my 5700 machine, this drops the GPU time on this page at 4k from 4.5ms down to 0.9ms.

Depends on: 1667707
Flags: needinfo?(nical.bugzilla)
Attached file A more reduced version (deleted) —

The conditions for the CreateSamplingRestrictedDrawable are narrower than I thought.

Attachment #9176973 - Attachment is obsolete: true

You need to have two mask images and one needs to be positioned.

I took another look at this. There are still some very long times spent in the "WR DL" category, which I suspect are the blob draw/copy mentioned above. Apart from that, the WR side looks quite reasonable now; there's nothing else obvious in the profile.

Nical, Jeff, would this be a good test candidate for some of the proposed work related to accelerating SVG masks?

Assignee: gwatson → nobody
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jmuizelaar)
Depends on: 1746662
Flags: needinfo?(jmuizelaar)

Either the page changed or the removal of CreateSamplingRestrictedDrawable removed the worst of the blob rendering (or both). There's still some blob activity but nothing too intense.

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(nical.bugzilla)
Resolution: --- → FIXED