Poor scrolling performance on westfield.co.nz (very large blob copy)
Categories: Core :: Graphics, defect
People: Reporter: ali.nz2005, Assignee: Unassigned
References: Blocks 1 open bug
Details: Keywords: perf
Attachments: 3 files, 1 obsolete file (all deleted): image/jpeg, patch, text/html
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0
Steps to reproduce:
Visited this page, clicked the middle mouse button (autoscroll) and moved the mouse downwards.
https://www.westfield.co.nz/newmarket
about:support details:
https://pastebin.com/uYhhVwSP
System specs:
CPU: Intel Core i5-2400
RAM: 8 GB
GPU: NVIDIA GeForce GTX 970 (driver 442.59)
OS: Windows 10 (build 19041.488)
Actual results:
Very jerky scrolling with large frame drops; the menu bar resizes very slowly.
Expected results:
Smooth scrolling and menu resizing.
Comment 1•4 years ago
Comment 2•4 years ago
I can reproduce this both with and without WebRender on a MacBook Pro. It seems to be an older bug, present since at least Firefox 70.
WR: https://share.firefox.dev/2E4eqNO
Non-WR: https://share.firefox.dev/2ZDncK5
Comment 3•4 years ago
@Nical: This looks blob-related on WR and Skia-related on non-WR. Could you take a look at what we can do here?
Comment 4•4 years ago
Wow: 300+ ms spent copying bytes (the blob) from a shmem into a Rust Vec, and about twice as much writing into the shmem on the content side.
The blob image is very large. It looks like we serialize some big source surfaces into the recording.
The slowness appears to be coming from a blob mask.
Comment 5•4 years ago
(In reply to Nicolas Silva [:nical] from comment #4)
> The slowness appears to be coming from a blob mask
I had a look at what's going on, and it looks like this is caused by stacked SVG masks. I suspect we hit a raster fallback path in that case, so we end up recording a huge raster image in the blob.
Comment 6•4 years ago
Comment 7•4 years ago
I'm not sure yet why this causes the long GPU times.
Comment 8•4 years ago
The large GPU times were already there; they come from a very large clip task.
Comment 9•4 years ago
I investigated the large GPU times. I turned off picture caching to make it easy to get a capture. I was able to replay the capture in GPA and RenderDoc.
GPA can graph draw calls by their execution time, and this showed two clip_image draw calls taking the bulk of the time.
It wasn't immediately obvious why these draw calls should be so much worse than the others: the memory bandwidth used by these calls was lower than that of other full-screen draws. Eventually I noticed that the number of rasterized pixels was very high (34 million). It turns out that for the first clip_image draw call we were rasterizing 12 full-screen quads, and for the second, 3 additional full-screen quads. The vast majority of these rasterized pixels end up hitting the discard branch of the clip_image shader, which avoids the memory bandwidth cost. However, we still get bottlenecked on pixel shader dispatch/execution.
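A back-of-the-envelope check on these figures, assuming the 34 million rasterized pixels cover both clip_image draw calls (12 + 3 = 15 full-screen quads) and that every quad is the same size:

```latex
\frac{34 \times 10^{6}\ \text{pixels}}{12 + 3\ \text{quads}} \approx 2.27 \times 10^{6}\ \text{pixels per full-screen quad}
```

That is in the same range as a 1920×1200 target (2,304,000 pixels), though the capture resolution isn't stated in the comment.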
I also discovered that a conditional 'discard' causes ANGLE to turn off optimization in the D3D shader compiler. While not great, this probably isn't that bad, because our shaders are already pre-optimized by glslopt.
Comment 10•4 years ago
It looks like the 12 instances come from the 12 tiles of the image. I'm guessing we draw each tile over the entire area of the mask, and that's what causes the slowdown. This sounds somewhat related to bug 1616326 comment 11.
Glenn, can you confirm that this could be what's happening? If it is, how can we fix it?
Comment 11•4 years ago
The tiled clip rendering path is a known-bad path in WR right now, so this analysis sounds right. I'll look into it this week and see if we can apply a better bounding rect to clip tiles. That should be a reasonable short-term fix; as a longer-term improvement, we can look at removing the discard altogether.
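Comments 10 and 11 describe each of the 12 tiles being drawn over the entire mask area, with a tighter per-tile bounding rect as the proposed short-term fix. The sketch below counts rasterized pixels for both approaches in the axis-aligned case. It is a minimal illustration under assumed numbers (a 1920×1200 mask split into 12 tiles of 480×400) and hypothetical names (`Rect`, `tile_rects`, `naive_pixels`, `bounded_pixels`); it is not WebRender's code or the actual patch.

```rust
// Illustrative sketch only: `Rect`, `tile_rects`, `naive_pixels` and
// `bounded_pixels` are hypothetical names, not WebRender's actual API.

/// Axis-aligned rectangle in device pixels (half-open: [x0, x1) x [y0, y1)).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i64,
    y0: i64,
    x1: i64,
    y1: i64,
}

impl Rect {
    /// Number of pixels covered; zero for empty rects.
    fn area(&self) -> i64 {
        (self.x1 - self.x0).max(0) * (self.y1 - self.y0).max(0)
    }

    /// Intersection of two rects, or `None` if they don't overlap.
    fn intersect(&self, other: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// Split `image` into a grid of `tile_w` x `tile_h` tiles (edge tiles are clamped).
fn tile_rects(image: &Rect, tile_w: i64, tile_h: i64) -> Vec<Rect> {
    let mut tiles = Vec::new();
    let mut y = image.y0;
    while y < image.y1 {
        let mut x = image.x0;
        while x < image.x1 {
            tiles.push(Rect {
                x0: x,
                y0: y,
                x1: (x + tile_w).min(image.x1),
                y1: (y + tile_h).min(image.y1),
            });
            x += tile_w;
        }
        y += tile_h;
    }
    tiles
}

/// Naive path: every tile is drawn as a quad covering the whole mask area,
/// so most fragments fall outside the tile and hit the shader's discard.
fn naive_pixels(tiles: &[Rect], mask_area: &Rect) -> i64 {
    tiles.len() as i64 * mask_area.area()
}

/// Bounded path (axis-aligned case): each tile's quad is shrunk to the
/// intersection of the tile and the mask area; non-overlapping tiles are skipped.
fn bounded_pixels(tiles: &[Rect], mask_area: &Rect) -> i64 {
    tiles
        .iter()
        .filter_map(|t| t.intersect(mask_area))
        .map(|r| r.area())
        .sum()
}

fn main() {
    // Hypothetical numbers: a 1920x1200 mask split into 4 x 3 = 12 tiles,
    // with the mask covering the full screen.
    let mask_area = Rect { x0: 0, y0: 0, x1: 1920, y1: 1200 };
    let tiles = tile_rects(&mask_area, 480, 400);
    println!(
        "{} tiles: naive = {} px, bounded = {} px",
        tiles.len(),
        naive_pixels(&tiles, &mask_area),
        bounded_pixels(&tiles, &mask_area)
    );
    // prints "12 tiles: naive = 27648000 px, bounded = 2304000 px"
}
```

When the tiles partition the mask, the bounded path rasterizes each pixel once, while the naive path rasterizes it once per tile; that per-tile overdraw is the pattern of 12 full-screen quads seen in the first clip_image draw call.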
Comment 12•4 years ago
Added an optimization for tiled clip masks that are axis-aligned (the common case) in bug #1667707.
On my 5700 machine, this drops the GPU time on this page at 4K from 4.5 ms down to 0.9 ms.
Comment 13•4 years ago
The conditions for CreateSamplingRestrictedDrawable are narrower than I thought.
Comment 14•4 years ago
You need to have two mask images and one needs to be positioned.
Comment 15•3 years ago
I took another look at this. There are still some very long times spent in the "WR DL" category, which I suspect is the blob draw/copy mentioned above. Apart from that, the WR side looks quite reasonable now; there's nothing else obvious in the profile.
Nical, Jeff, would this be a good test candidate for some of the proposed work related to accelerating SVG masks?
Comment 16•2 years ago
Either the page changed or the removal of CreateSamplingRestrictedDrawable removed the worst of the blob rendering (or both). There's still some blob activity but nothing too intense.