Closed Bug 1455921 Opened 7 years ago Closed 6 years ago

Chromium CI checkerboards while scrolling. Janks with WR enabled

Categories

(Core :: Graphics: WebRender, defect, P4)

Tracking

()

RESOLVED FIXED

People

(Reporter: mayankleoboy1, Assigned: gw)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(1 file)

Attached file aboutsupport.txt (deleted)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180421220102

Steps to reproduce:

1. Create new profile.
2. Go to https://ci.chromium.org/p/chromium/g/main/console?limit=500
3. Scroll with mouse, or scroll with the scroll-bar


Actual results:

Page will checkerboard heavily without WR: https://perfht.ml/2vxNErQ
Page will jank with WR enabled: https://perfht.ml/2vxWxBZ


Expected results:

not so, hopefully
Component: Graphics → Graphics: WebRender
Bug 1457466 might help turn some of the WR jank into checkerboarding.
Depends on: 1457466
Here is another profile of the same page with WebRender enabled, but also sampling a number of WebRender-y threads:

https://perfht.ml/2ItbfQv
jrmuizel suggested I P1 this.
Priority: P2 → P1
Depends on: 1462716
Glenn, any ideas for what can be done to make this run better?
Assignee: nobody → gwatson
Flags: needinfo?(gwatson)
I'll need to do a detailed profile to see where the bottleneck is (CPU/GPU) to know how we can fix this one.
Flags: needinfo?(gwatson)
I took a quick profile of the WR side of this page today. Wow!

The GPU time is ~9ms. This is bad, but not enough to cause it to drop below 60fps.

The CPU compositor (GL) time is ~10ms. This is bad - the cause of this is that we end up with ~700 draw calls, and ~146,000 vertices per frame. It's not clear to me from a visual inspection where these are all coming from. The GPU cache size is also much larger than on most pages.

The CPU backend time and display list processing time are the main issues here, ~27ms per frame. These are caused by the extremely high primitive count, which is ~42,000 in this page. For reference, most normal web pages have 500 - 1000 primitives in the display list.
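
As a rough sanity check on the numbers above, the following Python sketch compares each reported per-frame time against a 60 fps frame budget and works out how far the primitive count is from a typical page. It assumes the stages are pipelined, so each one is compared with the budget on its own rather than summed; that reading is inferred from the remark that ~9ms of GPU time alone does not drop the page below 60fps. The stage labels and the helpers `over_budget` and `primitive_ratio` are illustrative names, not WebRender identifiers.

```python
# Per-frame timings quoted in the comment above, in milliseconds.
# The dictionary keys are illustrative labels, not WebRender identifiers.
STAGE_MS = {
    "gpu": 9.0,
    "cpu_compositor": 10.0,
    "cpu_backend_and_display_list": 27.0,
}

FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 fps


def over_budget(stage_ms, budget_ms=FRAME_BUDGET_MS):
    """Return the stages whose per-frame time exceeds the budget.

    Assumption: stages overlap (pipelining), so each is checked on its own.
    """
    return {name: ms for name, ms in stage_ms.items() if ms > budget_ms}


def primitive_ratio(page_prims, typical_low=500, typical_high=1000):
    """Return (min, max) factor by which the page exceeds a typical display list."""
    return page_prims / typical_high, page_prims / typical_low


if __name__ == "__main__":
    for name, ms in over_budget(STAGE_MS).items():
        print(f"{name}: {ms:.0f} ms > {FRAME_BUDGET_MS:.1f} ms budget")
    low, high = primitive_ratio(42_000)
    print(f"primitive count is {low:.0f}x to {high:.0f}x a typical page")
```

Under that assumption only the backend / display list stage exceeds the budget, which is consistent with treating the primitive count as the main thing to fix.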

So, the bad news is that a primitive count that high is not easy to solve on the WR side, without significant changes (e.g. incremental display lists). The good news is that once we solve the primitive count problem, all the other issues with draw call count, gpu cache size, vertex count etc will also disappear.

The next step is to inspect the Gecko display list, and see where all the primitives are coming from. Visually, it doesn't seem like there should be anywhere near that many primitives, and the primitive count is mostly static even when scrolling. Perhaps there is some kind of culling bug in Gecko where the display list contains every single element on the page, or something simple like that? I'm not sure who would be best to look at the display list - maybe Markus?
Flags: needinfo?(mstange)
> grep "item: Clip(" scene-1-0.ron | wc -l 
> 5214

> grep "item: ClipChain" scene-1-0.ron | wc -l
> 10179

> grep "item: Rectangle" scene-1-0.ron | wc -l
> 26975

> grep "item: Border" scene-1-0.ron | wc -l
> 13565

> grep "item: Image" scene-1-0.ron | wc -l
> 110

> grep "item: PushStacking" scene-1-0.ron | wc -l
> 63
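
The same tally can be done in one pass instead of one grep per item kind. This is a minimal Python sketch, assuming the capture lines look like `item: <Kind>(...`, which is inferred only from the grep patterns above and not from the capture format itself; `count_item_kinds` is a hypothetical helper name. Unlike the prefix greps, it groups by the full identifier after `item:`, so the numbers can differ wherever one grep pattern matched several kinds.

```python
import re
import sys
from collections import Counter

# Assumed line shape, inferred from the grep patterns above: "item: <Kind>..."
ITEM_RE = re.compile(r"item:\s*([A-Za-z_]\w*)")


def count_item_kinds(lines):
    """Count matching lines per display item kind (one count per line, like grep | wc -l)."""
    counts = Counter()
    for line in lines:
        match = ITEM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_items.py scene-1-0.ron
    with open(sys.argv[1], encoding="utf-8") as capture:
        for kind, n in count_item_kinds(capture).most_common():
            print(f"{n:8d}  {kind}")
```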

If you look at the page using the dev tools, you will see that even though the "long vertical rectangles" look like individual rectangles, they are actually made of plenty of smaller rectangles (one per row). Behind the green things there are white and grey stripes that are also made of a gazillion of small 18x22-ish pixels spanning the entire page. The HTML of the page is 12 MB.

I think that it's actually just the page that contains a brutal number of rectangles, and that Gecko isn't doing something bad this time.
> of small 18x22-ish pixels

*of small 18x22-ish pixel rectangles
Summary: Page checkerboards while scrolling. Janks with WR enabled → Chromium CI checkerboards while scrolling. Janks with WR enabled
Bug 1485420 might help reduce some of the clip stuff.
Depends on: 1485420
Thanks nical. I don't think there are any quick fixes here, we'll just have to get better at dealing with lots of elements.

In the profile, the scene building part looks easier to optimize than the frame building part, because it spends a lot of time in Vec buffer reallocation. So maybe picking a better initial capacity could already help there. But optimizing scene building won't help with scrolling.
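
To illustrate why a better initial capacity could help, here is a small Python model of a growable buffer. It assumes a plain capacity-doubling policy, which is a common strategy but is not taken from the WebRender source or from the profile; `simulate_growth` is a hypothetical helper, and 42,000 is the primitive count reported earlier in this bug.

```python
def simulate_growth(n_items, initial_capacity):
    """Model pushing n_items into a buffer that doubles when full.

    Returns (reallocations, elements_copied). This is a toy model of a
    growable array, not a measurement of any real allocator.
    """
    capacity = initial_capacity
    length = 0
    reallocations = 0
    copied = 0
    for _ in range(n_items):
        if length == capacity:
            # Buffer is full: grow it and copy the existing elements over.
            capacity = max(1, capacity * 2)
            reallocations += 1
            copied += length
        length += 1
    return reallocations, copied


if __name__ == "__main__":
    for start in (1, 1024, 42_000):
        reallocs, copies = simulate_growth(42_000, start)
        print(f"initial capacity {start}: {reallocs} reallocations, {copies} copies")
```

In this model, reserving the full size up front removes every reallocation. As the comment notes, that would only affect scene building, not scrolling.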

As for frame building, most of the time seems to be spent inside PrimitiveStore::prepare_prim_for_render itself, or in functions that have been inlined into it. It might be valuable to use a profiler that can resolve inlined functions, like perf on linux.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> Bug 1485420 might help reduce some of the clip stuff.

I think a lot of these clips have rounded corners, and bug 1485420 won't help with those.
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #10)
> I don't think there are any quick fixes here

"Caching scrolled layers" is something that has been brought up before, and it would help here, but I don't think anybody has fleshed out the specifics of how it would be implemented.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Markus Stange [:mstange] from comment #11) 
> "Caching scrolled layers" is something that has been brought up before, and
> it would help here, but I don't think anybody has fleshed out the specifics
> of how it would be implemented.

Bug 1480317 has a rough description of how this could work.

(In reply to Markus Stange [:mstange] from comment #10)
> I think a lot of these clips have rounded corners, and bug 1485420 won't
> help with those.

Maybe adding a RoundedRect display item would be a good idea. Chrome has this concept instead of just having a Rect with a rounded clip applied. As we discussed yesterday, this is probably a useful concept for non-WebRender too.
We can't release this to the field, but we can let this ride to beta.
Priority: P1 → P2
I checked this again with current Gecko + WR. As several people above have already noted, this is basically caused by the extreme primitive count.

The GPU time is actually quite reasonable on my machine - 68 draw calls, 84k vertices and ~9ms GPU time, given the content.

The slowness in WR comes from the serialization / deserialization / processing of 45,518 primitives per frame.

I don't think there's much that is actionable in WR for this, for now.

Although there are various CPU optimizations we can make to the WR CPU code, we need to somehow transfer less data from Gecko -> WR for good performance here.

Given that, should we un-assign this for now or perhaps move to someone who can look at the DL side of things in Gecko?
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(matt.woodrow)
In this profile https://perfht.ml/2N5Cbsb it doesn't look like serialization is as big a problem as other places in WebRender. E.g. I see ~16ms in serialization vs 110ms in RenderBackend. Do your profiles look different?
Flags: needinfo?(jmuizelaar) → needinfo?(gwatson)
What I see in the WR on-screen profiler is:

CPU backend time:
 mean = 24ms
 max = 43ms

DisplayList IPC:
 mean = 15ms
 max = 146ms

The mean time in display list IPC is relatively low, but that's because we only receive a display list once every X frames. When we do receive one, it tends to take ~50 - 100ms per DL.

Either way, both backend and display list time seem to need work to handle this case.

(It just occurred to me reading this back that maybe the difference is that DL IPC time profile counter might be including other stuff [sync blobs?] which might explain why I'm seeing such a high number here, compared to what you see in a sampler profile).
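
The relationship between the mean and the per-DL cost can be made concrete with a little arithmetic. This Python sketch assumes that frames which receive no display list contribute 0 ms to the mean, so that mean = per-DL cost / X; that is one reading of the profiler counter, not something confirmed in this bug. `frames_per_display_list` is a hypothetical helper.

```python
def frames_per_display_list(mean_ms, per_dl_ms):
    """Estimate X such that one display list arrives every X frames.

    Assumption: frames without a display list contribute 0 ms to the mean,
    so mean_ms = per_dl_ms / X.
    """
    if mean_ms <= 0:
        raise ValueError("mean_ms must be positive")
    return per_dl_ms / mean_ms


if __name__ == "__main__":
    mean_ms = 15.0  # mean DisplayList IPC time from the on-screen profiler
    for per_dl_ms in (50.0, 100.0):  # per-DL cost range quoted above
        x = frames_per_display_list(mean_ms, per_dl_ms)
        print(f"{per_dl_ms:.0f} ms per DL -> one DL every ~{x:.1f} frames")
```

Under that assumption, the quoted figures would correspond to a display list arriving roughly every 3 to 7 frames.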
Flags: needinfo?(gwatson)
We have code in Gecko that detects when we have a clip (rounded or otherwise) that wraps just a background color, and we optimize the clip away and draw the color with the intersection of the fill area and the clip.

I think we could do something similar here, if we had a RoundedRect display item.

That should reduce the complexity of the clips sent across to WR, and hopefully make things a bit faster.
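
For the non-rounded case, the optimization described above amounts to a rectangle intersection. The following Python sketch shows only that idea; it is not Gecko's code, the `Rect` type and both function names are hypothetical, and it deliberately ignores rounded corners, which is the case the proposed RoundedRect display item would be needed for.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its top-left and bottom-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    if x0 >= x1 or y0 >= y1:
        return None
    return Rect(x0, y0, x1, y1)


def fold_clip_into_fill(fill: Rect, clip: Rect) -> Optional[Rect]:
    """Replace 'rectangular clip wrapping a solid fill' with one smaller fill.

    Returns None when nothing would be visible, so the item can be dropped.
    """
    return intersect(fill, clip)
```

For example, `fold_clip_into_fill(Rect(0, 0, 10, 10), Rect(5, 5, 20, 20))` yields a single fill covering `Rect(5, 5, 10, 10)`, with no separate clip item left to send.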
Flags: needinfo?(matt.woodrow)
I tested this with Matt's recent CPU optimizations, and this is much better. It scrolls at ~30fps on my machine. It's still not great, but I think this is quite shippable now, given the crazy primitive counts.
With the latest Nightly: http://bit.ly/2DBGuHg
Priority: P2 → P4
Depends on: 1505942
New profile: https://perfht.ml/2yZSv4q

This scrolls smoothly on my machine at 60 fps now, with picture caching enabled by default.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

I am not so sure this is fixed. Here is my profile with WR + picture caching. The first half of the profile is scrolling with the scroll bar; the rest is scrolling with the touchpad. There was checkerboarding.

https://perfht.ml/2FATP2J

I am reopening this bug. But feel free to close it again if you think that this page is sufficiently improved.

Status: RESOLVED → REOPENED
Flags: needinfo?(gwatson)
Resolution: FIXED → ---

I think there are two parts to this bug: (1) the checkerboarding while scrolling (this is related to scene building, and perhaps blob rasterization time), and (2) the jank while scrolling with WR.

(2) should be (largely) fixed by the picture caching. (1) is still not great - but I'm not sure if it's better / worse than non-WR.

It's probably fine to leave it open for now, until we spend some more time on (1), but this is a fairly low priority for now.

Flags: needinfo?(gwatson)
Assignee: gwatson → nobody

This bug was about the jank, and that's fixed, so let's keep this closed. If the checkerboarding is really bad (i.e. worse with WR than without WR) then we can investigate that separately in a new bug. But getting rid of checkerboarding entirely is not really feasible.

Assignee: nobody → gwatson
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED