Closed Bug 1521015 Opened 6 years ago Closed 4 years ago

https://neon-bindings.com/ performance is not great

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
All
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: sirak2010, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: perf)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Config gfx.webrender.all;true on Nighly Firefox

https://neon-bindings.com/

Actual results:

Very Slow response and lags when scroll

Expected results:

Should work as smooth as chrome

Assignee: moz_en-gb → nobody
Component: en-GB / English (United Kingdom) → Untriaged
Product: Mozilla Localizations → Firefox
QA Contact: moz_en-gb
Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

It might be related to picture caching. It is re-enabled by Bug 1518405.

sirak2010, can you test by disabling picture caching? It could be disabled by pref gfx.webrender.picture-caching:false.

Flags: needinfo?(sirak2010)

I have tried with gfx.webrender.picture-caching:false and the issue is still there, actually its very responsive when i revert back to gfx.webrender.all;false

Flags: needinfo?(sirak2010)

Debian Testing, MBP
The upper part of the page is noticeable harder to scroll. It's fast after removing the blinking/shining "NEON" logo in the center.

"GET STARTED" at the bottom is bad as well. Scrolling is fast without WebRender, but there is huge white checkerboarding at the top and bottom of the page.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
OS: Unspecified → All

This problem at the top is caused by a bunch of text shadow. Thoughts Dan or Glenn?

Flags: needinfo?(gwatson)
Flags: needinfo?(dglastonbury)
Priority: -- → P3

Wow, this is a very odd GPU profile.

There are ~14 targets allocated for drawing clip masks. We'll need to find out what's causing those clip masks to be generated to fix this issue.

It runs smoothly for me with picture caching is enabled and the cache is valid, but it's invalidating and redrawing those huge clip masks very often.

Flags: needinfo?(gwatson)
Assignee: nobody → gwatson

This also seems to have a large number of color targets, most of which don't seem to get anything rendered, which suggests something going wrong during pass assignment of render tasks.

I'll have a look at tracing the pass assignment logic.

Flags: needinfo?(dglastonbury)

There seem to be multiple unexpected things happening here, but one in particular is that the bounding rect of the NEON text run supplied by Gecko is 1130 x 797.

This is much bigger than the actual bounds of those glyphs, which in combination with a text shadow with 1000px blur radius is resulting in extremely large blur targets.

Still investigating to find why the bounds are that large, and why so many clip masks are being generated.

I worked out what's happening with the apparently empty passes, I think. We have no GPU markers for downscale render tasks, so they are not appearing in the GPU profile. I'll fix that now.

It turns out that the text runs in shadows provided by gecko already have a blur inflation applied, so we are ending up drawing text shadows with 9x the original bounding rect, rather than 3x. I'll work on a patch to skip blur inflation for text shadows in WR. That should improve this page significantly, although we will also need to fix the extra generation of redundant clip masks.

I confirmed that fixing the text over inflation brings the GPU time down from ~60ms to ~25ms on my machine. The vast majority of the remaining time is in the aforementioned clip masks.

A patch to fix the over inflated text shadows is https://bugzilla.mozilla.org/show_bug.cgi?id=1522395.

Depends on: 1522395

https://bugzilla.mozilla.org/show_bug.cgi?id=1522758 improves this page significantly.

It's still quite slow at times, especially the blur at the bottom. It might be good enough to ship now with these two patches, but it's still not good.

I'll do another profile once these two patches land and see what the next biggest issue is.

Depends on: 1522758

Interestingly, although the blurs are not super fast when picture caching tiles are invalidated, they don't seem to be the main cause of the noticeable slowness now.

There seems to be (at least on my machine) a GPU stall of ~100ms at times, that seems unrelated to the blur itself. It seems to occur when the blur targets either become visible or invisible.

I suspect we might be causing a stall during render target array resizing or allocation - will investigate further.

My analysis above was wrong - the GPU time spike occurs when invoking the blur shader on a small target with very large blur radius.

The patch in https://bugzilla.mozilla.org/show_bug.cgi?id=1523228 addresses this.

Although it's still not great, performance wise, I think it's shippable once this and the patches above land in m-c. There are lots of potential wins to be had once we have time - specifically:

  1. Reduce the size of the area we need to blur.
  2. Optimize the blur shader to reduce number of texture fetches.
  3. Reduce the number of redundant clips that get drawn for multiple text shadows.
Depends on: 1523228

Although this is still not great when the neon sign is animating, I think it's probably shippable with the patches currently in m-c (may not have hit nightly yet).

It's vastly better than it was in the original bug report. Jeff, Matt, could you take a look when you have a moment and see if you think it's shippable now, or if it needs more work?

I have ideas on how to fix the remaining issues - it's likely to be somewhere from 2-5 days work to implement all of those ideas.

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmuizelaar)

Yeah. I think it's reasonable to move this to P4

Flags: needinfo?(jmuizelaar)
Priority: P3 → P4
Flags: needinfo?(matt.woodrow)
Assignee: gwatson → nobody
Summary: very unreponsive on https://neon-bindings.com/ Firefox Nightly 66.0a1 (2019-01-17) (64-bit, with gfx.webrender.all;true → https://neon-bindings.com/ performance is not great
Blocks: wr-67
No longer blocks: stage-wr-trains
Priority: P4 → P3

I'm thinking of adding a shadow stack composite op which would let us handle N shadows on the same primitive without redoing all of the intermediate work N times when possible.

I think that something that would only detect consecutive shadow items would be enough to cover the most common use case, for example: PushShadow, PushShadow, PushShadow, TextRun, PushShadow, PushShadow, Image, PopAllShadows would generate a single shadow stack for the text run and two shadow stacks for the image (whereas a smarter scheme could generate only one).

This seems like a good idea.

FWIW, we should not need to detect consecutive items. The text shadows are already grouped in layout we explicitly PushShadow for each member of the group.

Depends on: 1533735
Blocks: wr-68
No longer blocks: wr-67
Blocks: wr-70
No longer blocks: wr-68
Blocks: wr-perf
No longer blocks: wr-70

This looks fine now.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.