Closed Bug 1663387 Opened 4 years ago Closed 2 years ago

Poor scrolling performance on westfield.co.nz (very large blob copy)

Status:

RESOLVED FIXED

People

(Reporter: ali.nz2005, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(3 files, 1 obsolete file)

Untitled-1.jpg, 4 years ago, Ali (deleted), image/jpeg
Partially reduced version, 4 years ago, Jeff Muizelaar [:jrmuizel] (deleted), text/html
This patch fixes the long recording times but causes large gpu times, 4 years ago, Jeff Muizelaar [:jrmuizel] (deleted), patch
A more reduced version, 4 years ago, Jeff Muizelaar [:jrmuizel] (deleted), text/html

Ali

Reporter

Description

•

4 years ago

Attached image: Untitled-1.jpg (deleted)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0

Steps to reproduce:

Visited this page, clicked middle mouse button and moved mouse downwards.
https://www.westfield.co.nz/newmarket

about:support details:
https://pastebin.com/uYhhVwSP

System specs:
CPU: i5 2400
RAM: 8 GB
GPU: GTX 970, driver 442.59
OS: Windows 10 build 19041.488

Actual results:

Very jerky scrolling with large frame drops; the menu bar resizes very slowly.

Expected results:

Smooth scrolling / menu resize.

Alice0775 White

Updated

•

4 years ago

Component: Untriaged → Graphics: WebRender

Product: Firefox → Core

Mayank Bansal

Comment 1

•

4 years ago

https://share.firefox.dev/3icV5Ja

Miko Mynttinen

Comment 2

•

4 years ago

I can reproduce this both with and without WebRender on a MacBook Pro. It seems to be an older bug, present since at least Firefox 70.

WR: https://share.firefox.dev/2E4eqNO
Non-WR: https://share.firefox.dev/2ZDncK5

Component: Graphics: WebRender → Graphics

Miko Mynttinen

Updated

•

4 years ago

Blocks: gfx-triage

Severity: -- → S3

Keywords: perf

Kris Taeleman (:ktaeleman)

Comment 3

•

4 years ago

@Nical: This looks blob-related on WR and Skia-related on non-WR. Could you take a look at what we can do here?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 4

•

4 years ago

Wow, 300+ ms spent copying bytes (the blob) from a shmem into a Rust Vec, and about twice as much writing into the shmem on the content side.

The blob image is very large. It looks like we serialize some big source surfaces into the recording.
The slowness appears to be coming from a blob mask.

Blocks: blob-perf

Summary: Poor scrolling performance on westfield.co.nz → Poor scrolling performance on westfield.co.nz (very large blob copy)

Jeff Muizelaar [:jrmuizel]

Comment 5

•

4 years ago

(In reply to Nicolas Silva [:nical] from comment #4)

Wow, 300+ ms spent copying bytes (the blob) from a shmem into a Rust Vec, and about twice as much writing into the shmem on the content side.

The blob image is very large. It looks like we serialize some big source surfaces into the recording.
The slowness appears to be coming from a blob mask.

I had a look at what's going on, and it looks like this is caused by stacked SVG masks. I suspect we hit some raster fallback path in that case and then we're just recording a huge raster image in the blob.

Jeff Muizelaar [:jrmuizel]

Comment 6

•

4 years ago

Attached file: Partially reduced version (obsolete) (deleted)

Jeff Muizelaar [:jrmuizel]

Updated

•

4 years ago

Assignee: nobody → jmuizelaar

Jeff Muizelaar [:jrmuizel]

Updated

•

4 years ago

No longer blocks: gfx-triage

Jeff Muizelaar [:jrmuizel]

Comment 7

•

4 years ago

Attached patch: This patch fixes the long recording times but causes large gpu times (deleted)

Not sure why this causes the long gpu times yet.

Jeff Muizelaar [:jrmuizel]

Comment 8

•

4 years ago

The large GPU times were already there. They come from a very large clip task.

Jeff Muizelaar [:jrmuizel]

Comment 9

•

4 years ago

I investigated the large GPU times. I turned off picture caching to make it easy to get a capture. I was able to replay the capture in GPA and RenderDoc.

GPA allows graphing draw calls by their execution time, and this showed two clip_image draw calls taking the bulk of the time.

It wasn't immediately obvious why these draw calls should be so much worse than the others. The memory bandwidth used on these calls was lower than on other fullscreen draws. Eventually I noticed that the number of rasterized pixels was very high (34 million), and it turns out that for the first clip_image draw call we were rasterizing 12 full-screen quads, and for the second, 3 additional full-screen quads. The vast majority of these rasterized pixels end up hitting the discard branch of the clip_image shader, which avoids the memory bandwidth cost. However, we still get bottlenecked on pixel shader dispatch/execution.

I also discovered that conditional 'discard' causes ANGLE to turn off optimization in the D3D shader compiler. While not great, this probably isn't that bad because our shaders are already preoptimized by glslopt.
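As a rough sanity check on the figures above, the arithmetic can be sketched as follows. This assumes the 34 million figure covers both clip_image draw calls together, which the comment does not state explicitly, and the helper name is made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in comment 9.
# Assumption: the ~34 million rasterized pixels cover BOTH clip_image
# draw calls combined; the comment does not say so explicitly.
RASTERIZED_PIXELS = 34_000_000  # "very high (34 million)"
QUADS_FIRST_CALL = 12           # full-screen quads in the first draw call
QUADS_SECOND_CALL = 3           # additional full-screen quads in the second


def pixels_per_quad(total_pixels: int, quad_count: int) -> float:
    """Average number of rasterized pixels attributable to one quad."""
    return total_pixels / quad_count


total_quads = QUADS_FIRST_CALL + QUADS_SECOND_CALL
per_quad = pixels_per_quad(RASTERIZED_PIXELS, total_quads)
print(f"{total_quads} quads, about {per_quad / 1e6:.2f} million pixels each")
```

Under that assumption each quad covers a bit over two million pixels, which is the order of magnitude you would expect if every quad spans a whole screen.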

Jeff Muizelaar [:jrmuizel]

Comment 10

•

4 years ago

It looks like the 12 instances are from the 12 tiles of the image. I'm guessing we draw each tile over the entire area of the mask, and that is what causes the slowdown. This sounds somewhat related to bug 1616326 comment 11.

Glenn, can you confirm that this could be what's happening? If it is, how can we fix it?

Flags: needinfo?(gwatson)

Glenn Watson [:gw]

Comment 11

•

4 years ago

The tiled clip rendering path is a known bad path in WR right now, so this analysis sounds right. I'll look into it this week and see if we can apply a better bounding rect to clip tiles. This should be a reasonable short-term fix; we can look at removing the discard altogether as a longer-term improvement.
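To illustrate why a tighter bounding rect helps, here is a minimal sketch that compares the rasterized pixel count when every tile is drawn over the whole mask against the count when each tile is bounded to its own rect. This is not WebRender code: `Rect`, `split_into_tiles` and `rasterized_pixels` are hypothetical helpers, and the 1920x1080 mask and 4x3 tile grid are assumed values chosen only to give 12 tiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in device pixels (hypothetical helper)."""
    x: int
    y: int
    w: int
    h: int

    def area(self) -> int:
        return max(self.w, 0) * max(self.h, 0)

    def intersect(self, other: "Rect") -> "Rect":
        x0 = max(self.x, other.x)
        y0 = max(self.y, other.y)
        x1 = min(self.x + self.w, other.x + other.w)
        y1 = min(self.y + self.h, other.y + other.h)
        return Rect(x0, y0, max(x1 - x0, 0), max(y1 - y0, 0))


def split_into_tiles(mask: Rect, cols: int, rows: int) -> list:
    """Split the mask into a cols x rows grid, clamping edge tiles to the mask."""
    tile_w = -(-mask.w // cols)  # ceiling division
    tile_h = -(-mask.h // rows)
    tiles = []
    for row in range(rows):
        for col in range(cols):
            tile = Rect(mask.x + col * tile_w, mask.y + row * tile_h, tile_w, tile_h)
            tiles.append(tile.intersect(mask))
    return tiles


def rasterized_pixels(mask: Rect, tiles: list, bound_to_tile: bool) -> int:
    """Pixels the rasterizer visits, including ones the shader later discards."""
    if bound_to_tile:
        # Each tile's quad covers only that tile's own rect.
        return sum(tile.area() for tile in tiles)
    # Each tile's quad covers the entire mask area.
    return len(tiles) * mask.area()


mask = Rect(0, 0, 1920, 1080)                   # assumed mask size
tiles = split_into_tiles(mask, cols=4, rows=3)  # assumed 4x3 grid, 12 tiles
unbounded = rasterized_pixels(mask, tiles, bound_to_tile=False)
bounded = rasterized_pixels(mask, tiles, bound_to_tile=True)
print(f"unbounded: {unbounded} px, bounded: {bounded} px")
```

In this toy model the unbounded path rasterizes one full mask area per tile, while the bounded path rasterizes the mask area once in total, so the shader work scales with the tile count only in the unbounded case.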

Assignee: jmuizelaar → gwatson

Flags: needinfo?(gwatson)

Matt Woodrow (:mattwoodrow)

Updated

•

4 years ago

Attachment #9178004 - Attachment is patch: true

Attachment #9178004 - Attachment mime type: application/octet-stream → text/plain

Glenn Watson [:gw]

Comment 12

•

4 years ago

Added an optimization for tiled clip masks that are axis-aligned (the common case) in bug #1667707.

On my 5700 machine, this drops the GPU time on this page at 4k from 4.5ms down to 0.9ms.

Depends on: 1667707

Nicolas Silva [:nical]

Updated

•

4 years ago

Flags: needinfo?(nical.bugzilla)

Jeff Muizelaar [:jrmuizel]

Comment 13

•

4 years ago

Attached file: A more reduced version (deleted)

The conditions for hitting the CreateSamplingRestrictedDrawable path are narrower than I thought.

Attachment #9176973 - Attachment is obsolete: true

Jeff Muizelaar [:jrmuizel]

Comment 14

•

4 years ago

To trigger it, you need two mask images, and one of them needs to be positioned.

Glenn Watson [:gw]

Comment 15

•

3 years ago

I took another look at this. There are still some very long times spent in the "WR DL" category, which I suspect is the blob draw/copy mentioned above. Apart from that, the WR side looks quite reasonable now; there's nothing else obvious in the profile.

Nical, Jeff, would this be a good test candidate for some of the proposed work related to accelerating SVG masks?

Assignee: gwatson → nobody

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(jmuizelaar)

Jeff Muizelaar [:jrmuizel]

Updated

•

3 years ago

Depends on: 1746662

Flags: needinfo?(jmuizelaar)

Nicolas Silva [:nical]

Comment 16

•

2 years ago

Either the page changed or the removal of CreateSamplingRestrictedDrawable removed the worst of the blob rendering (or both). There's still some blob activity but nothing too intense.

Status: UNCONFIRMED → RESOLVED

Closed: 2 years ago

Flags: needinfo?(nical.bugzilla)

Resolution: --- → FIXED
