Closed Bug 1422057 Opened 7 years ago Closed 7 years ago

Avoid duplicating work in ScrollingLayersHelper::BeginItem for adjacent items with equivalent clip chains

Categories

(Core :: Graphics: WebRender, enhancement, P2)

Other Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Whiteboard: [wr-mvp] [gfx-noted])

Attachments

(7 files)

The profile in bug 1422039 still shows around 14% of time spent in ScrollingLayersHelper::BeginItem. Looking over the code I found another quick fix that we can make that should help trim that down.

We have an early-exit codepath [1] that checks if the display item we are processing has the same ASR and clipchain as the last item. If so, we don't need to do any fiddling with the clip chain on the WR stack and so we can early-exit. However the check uses == on the clip chains, and it would be better to use DisplayItemClipChain::Equal instead as that will cover more cases.

[1] https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/gfx/layers/wr/ScrollingLayersHelper.cpp#72
If we can do a better job of collapsing these "equivalent" clip chain items earlier on (during gecko display list construction) that would be even better. It seems to happen quite a lot.
For the record, I checked to see how often this gets hit - after some time browsing on Facebook my one-line change affected the result of around 22% of calls to AreSameInputs. On things like MazeSolver it goes way up (90%+) and other simpler sites it seems to be in the neighbourhood of 30%. Seems like it should be a decent win.
Reftest failures! That was unexpected. I'll investigate.
Status: NEW → ASSIGNED
Priority: P3 → P1
Whiteboard: [gfx-noted][wr-mvp][triage] → [wr-mvp] [gfx-noted]
So this exposed an existing problem. Simplified example:

Mask                asr(A) clipChain(        C  [root])
  LayerEventRegions asr(A) clipChain(D  <A>, C' [root])
  BackgroundColor   asr(A) clipChain(D' <A>, C  [root])

C and C' are ::Equal but not ==. Same for D and D'.

What happens right now is that the Mask item does it's normal pushing of stuff, and then also pushes an out-of-band clip. The EventRegions item should end up defining D as a child of the out-of-band clip, but when it does the lookup of C' in the out-of-band lookup table, it's not == to C, and so it fails. So it ends up defining D as a child of A instead of the out-of-band clip. That's the problem in the current code, which is papered over by the fact that the clips that we push for event regions items don't really matter, so the fact that they're wrong doesn't visibly affect anything.

When we get to the BackgroundColor item, it does find the out-of-band clip because it has C in its clipchain rather than C', and so it generates things fine. With the one-line fix, however, the BackgroundColor ends up with the same clip state as the LayerEventRegions item, which is wrong.

I feel like there's been enough problems caused by this clipchain item equivalencies that it's worth just deduplicating them up front and then not having to worry about it in the rest of the code, so I'll try that.
Just to be clear, the above example makes it clear that any attempt at deduplication purely in the WR code will fail. That's because if we first encounter the clip chain "C", and then encounter the clip chain "D", we cannot properly deduplicate D without also deduplicating its ancestors. So what we would want to end up with is D -> C instead of D -> C' and that means we actually need to allocate a new DisplayItemClipChain instance. Doing that in the WR code after the gecko display list is fully built is madness, so the better approach is to do the deduplication even earlier, while we are building the DisplayItemClipChain items as part of the gecko display list build.

I wrote some patches that I believe accomplish this, try push is at https://treeherder.mozilla.org/#/jobs?repo=try&revision=9196ee685940c3d11361424d1efcb0e3400e00db

In my local checks it seemed like some of the async-scrolling reftests were failing with WR enabled but everything else seemed to be ok. Try push to see if there's any other problems. I haven't investigated the async-scrolling reftest failures yet.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> In my local checks it seemed like some of the async-scrolling reftests were
> failing with WR enabled but everything else seemed to be ok.

I investigated one of these failures (fixed-pos-scrolled-clip-1.html) and the problem is that without my patches the Background item under the Fixed item has an equivalent clip chain to one previously encountered. Without my patches this is treated as a fresh clip and we create it in WR as a child of the clipchain on the Fixed item, which works. With the patches it becomes deduplicated and so we just use the clip we created in WR already, which doesn't have the correct ancestry. Again fixing servo/webrender#1964 is the correct solution here, so that we don't need to fake the ancestry in WR to try and simulate the intersection of multiple clip chains (one from the Fixed item and one from the Background item).
Turns out that in this case the clipchain of the Background item was a subchain of the clipchain for the Fixed item, so I wrote some code to just extend the Background item's clip chain to the fuller chain from the Fixed item. That seems to fix the failing reftests. It doesn't address the root issue, but it should at least avoid regressing stuff. New try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=31bb68472ed3de93dff2206e07db7f40f8a9a946

After all these workarounds and whatnot I'm not sure if the performance is going to be any better, but I'll run some numbers and see.
So the try push showed some more failures. The problem seems to be that if we define a clip inside a stacking context with a transform, that clip bakes in that transform and so if we later try to reuse it in a different place with a different transform, we can't.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=407bd07a6d52e5c564c7fa175f8a8791f45d8738 has that fixed as well. Again, not sure what the perf impact will be like but now that all the tests are passing I'll do some perf runs and see.
Ok, I got some perf numbers and it seems to be a decent win to apply the patches:

Test                                 |  Without patches     |   With patches
-------------------------------------+----------------------+--------------------
MazeSolver (3 runs)                  |  12, 12, 12          |   11, 11, 12
------- MazeSolver profile ----------+----------------------+--------------------
Fraction of EndTransaction time spent| 1029/2523ms          | 199/2671ms
in ScrollingLayersHelper::BeginItem  |  = ~40.8%            | = ~7.45%
------- MazeSolver profile ----------+----------------------+--------------------
Fraction of BuildDisplayList time    | 420/5596ms           | 148/6560ms
spent in AllocateDisplayItemClipChain| = ~7.5%              | = ~2.26%
-------- Facebook profile -----------+----------------------+--------------------
Fraction of EndTransaction time spent| 87/677ms             | 86/704ms
in ScrollingLayersHelper::BeginItem  |  = ~12.85%           | = ~12.22%
-------- Facebook profile -----------+----------------------+--------------------
Fraction of BuildDisplayList time    | 31/1125ms            | 45/1313ms
spent in AllocateDisplayItemClipChain| = ~2.76%             | = ~3.43%
-------------------------------------+----------------------+--------------------
MotionMark Animometer scores         |                      |
Subtests:                            |   69.96              |  204.13
                                     |  474.52              |  498.67
                                     |  284.63              |  407.45
                                     | 1502.96              | 1641.36
                                     | 1775.36              | 1729.23
                                     |   71.70              |   75.10
                                     |   68.49              |   77.09
                                     |    3.00              |   19.20
                                     |   49.51              |   55.54
Overall:                             |  138.69              |  207.94

I'll clean up the patches and put them up for review.
Comment on attachment 8934651 [details]
Bug 1422057 - Adjust some logging-related things to be more useful.

https://reviewboard.mozilla.org/r/205528/#review212474
Attachment #8934651 - Flags: review?(mstange) → review+
Comment on attachment 8934653 [details]
Bug 1422057 - Extract a local variable.

https://reviewboard.mozilla.org/r/205532/#review212476
Attachment #8934653 - Flags: review?(mstange) → review+
Comment on attachment 8934655 [details]
Bug 1422057 - Add hash function and boilerplate for deduplicating DisplayItemClipChain via std::set.

https://reviewboard.mozilla.org/r/205536/#review212480

::: layout/painting/DisplayItemClipChain.cpp:46
(Diff revision 2)
> +    hash = AddToHash(hash, rect.x, rect.y, rect.width, rect.height);
> +  }

What about the rounded rects in aClip->mClip?
Comment on attachment 8934656 [details]
Bug 1422057 - Deduplicate DisplayItemClipChain instances on creation.

https://reviewboard.mozilla.org/r/205538/#review212482
Attachment #8934656 - Flags: review?(mstange) → review+
Comment on attachment 8934657 [details]
Bug 1422057 - Remove now-unnecessary ::Equals checks.

https://reviewboard.mozilla.org/r/205540/#review212484
Attachment #8934657 - Flags: review?(mstange) → review+
Comment on attachment 8934656 [details]
Bug 1422057 - Deduplicate DisplayItemClipChain instances on creation.

https://reviewboard.mozilla.org/r/205538/#review212486

::: layout/painting/nsDisplayList.cpp:1619
(Diff revision 2)
> +    // An equivalent clip chain item was already created, so let's return that
> +    // instead. Destroy the one we just created.

Maybe add a comment here:

Note that this can cause clip chains from different coordinate systems to collapse into the same clip chain object, because clip chains do not keep track of the reference frame that they were created in.
(In reply to Markus Stange [:mstange] from comment #29)
> 
> What about the rounded rects in aClip->mClip?

I intentionally left them out to keep the hash function fast. Most of the clips will not have rounded rects so adding them to the hash will mostly be a waste. Note that std::set will still do the Equals check on things that hash to the same value, so clips which differ only in their rounded rects will still be separate at that step.

(In reply to Markus Stange [:mstange] from comment #32)
> Maybe add a comment here:
> 
> Note that this can cause clip chains from different coordinate systems to
> collapse into the same clip chain object, because clip chains do not keep
> track of the reference frame that they were created in.

Sounds good, will do.
Comment on attachment 8934652 [details]
Bug 1422057 - Avoid caching clips across stacking contexts with non-identity transforms.

https://reviewboard.mozilla.org/r/205530/#review212488

Doesn't this introduce a slow path for all clips inside any kind of interesting transform? This seems suboptimal. Can we cache based on (referenceFrame, clipChain) instead?
Comment on attachment 8934655 [details]
Bug 1422057 - Add hash function and boilerplate for deduplicating DisplayItemClipChain via std::set.

https://reviewboard.mozilla.org/r/205536/#review212492

::: layout/painting/DisplayItemClipChain.cpp:46
(Diff revision 2)
> +    hash = AddToHash(hash, rect.x, rect.y, rect.width, rect.height);
> +  }

Maybe just add a comment here that says that the Equals method will also compare any rounded rects but that including those in the hash is not worth it.
Attachment #8934655 - Flags: review?(mstange) → review+
Comment on attachment 8934654 [details]
Bug 1422057 - Extend the clipchain of a display item to the ancestor's clipchain if it is a strict superset.

https://reviewboard.mozilla.org/r/205534/#review212478

::: gfx/layers/wr/ScrollingLayersHelper.cpp:476
(Diff revision 2)
> +  // nsDisplayTransform with clip chain A -> B -> nullptr
> +  //   nsDisplayBackgroundColor which clip chain B -> nullptr

When does this case occur?

::: gfx/layers/wr/ScrollingLayersHelper.cpp:504
(Diff revision 2)
> +  MOZ_ASSERT(clipDepth > 0);
> +  while (--clipDepth > 0) {
> +    const DisplayItemClipChain* enclosingClip = mItemClipStack[clipDepth - 1].mChain;
> +    if (!enclosingClip) {
> +      // This is a special case; if an item has a nullptr clipchain it basically
> +      // inherits the clipchain from it's ancestor, so let's skip to that.

typo: it's -> its
Comment on attachment 8934654 [details]
Bug 1422057 - Extend the clipchain of a display item to the ancestor's clipchain if it is a strict superset.

https://reviewboard.mozilla.org/r/205534/#review212498

::: gfx/layers/wr/ScrollingLayersHelper.cpp:484
(Diff revision 2)
> +  // should be intersected for the child display item; because one clip chain
> +  // is a subset of the other the intersection comes out to be clip chain from
> +  // the parent.

Is this true? The "B" clip of the nsDisplayBackgroundColor is with respect to a different reference frame, so its shape might not be the same as the "B" clip of the nsDisplayTransform.
(In reply to Markus Stange [:mstange] from comment #34)
> Doesn't this introduce a slow path for all clips inside any kind of
> interesting transform? This seems suboptimal. Can we cache based on
> (referenceFrame, clipChain) instead?

Yeah that's a good point. I originally thought that this wouldn't be any slower than what we had before but I failed to consider the case with multiple uses of the same DisplayItemClipChain object inside a single transformed stacking context. And yeah, I guess caching like you suggest should help here. I'll work on that.

(In reply to Markus Stange [:mstange] from comment #35)
> Maybe just add a comment here that says that the Equals method will also
> compare any rounded rects but that including those in the hash is not worth
> it.

Will do

(In reply to Markus Stange [:mstange] from comment #36)
> > +  // nsDisplayTransform with clip chain A -> B -> nullptr
> > +  //   nsDisplayBackgroundColor which clip chain B -> nullptr
> 
> When does this case occur?

I ran into this case in fixed-pos-scrolled-clip-1.html (comment 7). I guess it was a Fixed item, not a Transform item. That probably also answers comment 37. I can fix the comment to reflect that (also will fix the which -> with typo).

> ::: gfx/layers/wr/ScrollingLayersHelper.cpp:504
> > +      // This is a special case; if an item has a nullptr clipchain it basically
> > +      // inherits the clipchain from it's ancestor, so let's skip to that.
> 
> typo: it's -> its

Will fix
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #38)
> (In reply to Markus Stange [:mstange] from comment #34)
> > Doesn't this introduce a slow path for all clips inside any kind of
> > interesting transform? This seems suboptimal. Can we cache based on
> > (referenceFrame, clipChain) instead?
> 
> Yeah that's a good point. I originally thought that this wouldn't be any
> slower than what we had before but I failed to consider the case with
> multiple uses of the same DisplayItemClipChain object inside a single
> transformed stacking context. And yeah, I guess caching like you suggest
> should help here. I'll work on that.
> 

So I solved this by turning the clipcache in ScrollingLayersHelper into a stack of clipcaches, and I push a new one onto the cache anytime we enter a new reference frame and pop it off when we exit the reference frame. I believe this should be equivalent to using (referenceFrame, clipChain) as the cache key, if we make the simplifying assumption that we'll never push the same reference frame at multiple points in the display list. I don't know how often that happens, but making this assumption led to a pretty clean implementation so I went with it.

(In reply to Markus Stange [:mstange] from comment #37)
> > +  // should be intersected for the child display item; because one clip chain
> > +  // is a subset of the other the intersection comes out to be clip chain from
> > +  // the parent.
> 
> Is this true? The "B" clip of the nsDisplayBackgroundColor is with respect
> to a different reference frame, so its shape might not be the same as the
> "B" clip of the nsDisplayTransform.

I augmented this code using the above-mentioned clipcache stack so that we never "extend" one clipchain into a clipchain that has a different reference frame.

Try push with these changes and all the other comments addressed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9123b80dd2a63c265eda46d698ef2192c6df3b5
Comment on attachment 8934652 [details]
Bug 1422057 - Avoid caching clips across stacking contexts with non-identity transforms.

https://reviewboard.mozilla.org/r/205530/#review215804

This is a very nice solution with a very clear comment to go along with it.
Attachment #8934652 - Flags: review?(mstange) → review+
Comment on attachment 8934654 [details]
Bug 1422057 - Extend the clipchain of a display item to the ancestor's clipchain if it is a strict superset.

https://reviewboard.mozilla.org/r/205534/#review215808
Attachment #8934654 - Flags: review?(mstange) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4dcb02cef0b7
Adjust some logging-related things to be more useful. r=mstange
https://hg.mozilla.org/integration/autoland/rev/c12c951b5ecc
Avoid caching clips across stacking contexts with non-identity transforms. r=mstange
https://hg.mozilla.org/integration/autoland/rev/78ca58c296fa
Extract a local variable. r=mstange
https://hg.mozilla.org/integration/autoland/rev/b65a65089ffb
Extend the clipchain of a display item to the ancestor's clipchain if it is a strict superset. r=mstange
https://hg.mozilla.org/integration/autoland/rev/e448dd0f4aba
Add hash function and boilerplate for deduplicating DisplayItemClipChain via std::set. r=mstange
https://hg.mozilla.org/integration/autoland/rev/b8931ab9dac9
Deduplicate DisplayItemClipChain instances on creation. r=mstange
https://hg.mozilla.org/integration/autoland/rev/92ff8af42f6f
Remove now-unnecessary ::Equals checks. r=mstange
Depends on: 1430830
Blocks: 1408792
Depends on: 1431326
Depends on: 1432742
No longer blocks: 1422039
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: