Closed Bug 1676554 Opened 4 years ago Closed 4 years ago

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: jrmuizel, Assigned: jimb)

References

(Blocks 3 open bugs)

Details

Attachments

(2 files)

https://share.firefox.dev/3eUrvas

Still lots of time spent in cs_clip_rectangle_frag::run even though this profile includes the work from bug 1675955

Blocks: sw-wr-perf
Severity: -- → S3
Priority: -- → P3

Had a call with jimb and jrmuizel about this one.

There seems to be lots of unexplained things going on here that need further investigation.

  1. The slow path is being invoked in this case as the radii are not uniform - the left corners have 8x8 radii, while the right corners are 0.

This is probably rare enough that it's OK to leave as-is, I think, assuming we solve the other issues below.

  1. There is no segmentation being used on when drawing the clip mask for the canvas image.

This results in a very large mask allocation and slow path. It's not totally clear why this is happening - there are a number of conditions that affect whether WR chooses to segment a primitive. We would need to debug trace this and see what is occurring. Although this deserves investigation, if we can solve (3) below, there won't be any clip mask at all drawn in this case, so it's lower priority unless we see it occurring elsewhere.

  1. There is no stacking context being pushed for the outer div with the border-radii clip.

As I understand it, in most cases when we have an outer div with radii, Gecko will push/pop an enclosing WR stacking context display item with the clip attached to this. If this occurs, WR will render the individual unclipped content items to a surface, then apply that clip (with segmentation) once when drawing the stacking context surface into the parent.

If we can make this happen in this case, it would mean:

  • Fewer clips and clip-chains in the display list (there seem to be ones added on many individual items right now).
  • No clip masks applied to all the content items.
  • No clip item would be specified on the Iframe display element for the canvas items.

This would result in (1) and (2) no longer being relevant, and should be the optimal way to draw this content, I think.

Next steps:

  • jimb will attempt to reduce the test case. Then we can try to better understand when / why Gecko chooses to push a stacking context with the outer clip, and why that's not occurring in this case.
Attached file Reduced test case (deleted) —

That exhibits most of the nice weird behaviors. There are no (HTML) iframes here.

It seems like the slow path cs_rectangle_clip shader is only getting applied when the canvas is partially scrolled out of view. When it's entirely contained within the scrolled element, it doesn't get clipped.

No - it's not when it's scrolled out of view. If it's partially scrolled off the bottom, cs_rectangle_clip_slow isn't being applied. It's when the y range of the canvas overlays the y range of the containing element's rounded top left order.

Do the rounded clips get applied to each item individually as we were seeing in the real world case? If so, it'd be interesting to find out why Gecko isn't creating a stacking context for that clip - I would have expected it to.

Yes - there are several rounded clips incorporated into several clip chains in the example, but the HitTest, Texts, and Iframe of the scrolling area each cite one of the rounded clip chains directly. The only thing they're nested inside is the PushStackingContext-PushReferenceFrame-PushStackingContext triple that encloses almost the entire display list. That triple is only clipped to the viewport.

(In reply to Glenn Watson [:gw] from comment #5)

Do the rounded clips get applied to each item individually as we were seeing in the real world case? If so, it'd be interesting to find out why Gecko isn't creating a stacking context for that clip - I would have expected it to.

There are only a few nsDisplayList subclasses that create DisplayItem::StackingContexts. Aside from ApplyAsyncImageForPipeline (which seems like it should apply to our canvas?), it's only uses of StackingContextHelper that create them.

(In reply to Jim Blandy :jimb from comment #7)

Aside from ApplyAsyncImageForPipeline (which seems like it should apply to our canvas?)

It does, of course - but it builds the inner pipeline for the async image, so it has nothing to do with the clip we're concerned about.

Performance here is substantially improved if we make the approximations in ClipItemKind::get_clip_result more accurate.

That function's job is to decide whether a primitive falls entirely inside a clip, entirely outside it, or has regions of both. If the primitive is known to fall entirely inside the clip, WebRender doesn't bother applying the clip to it. This also saves WR the trouble of rendering the clip mask itself - which is what the cs_rectangle_clip shader is spending a lot of time on when displaying these pages.

Currently, ClipItemKind::get_clip_result handles rounded rectangle clip regions by computing an 'inner rect', a rectangle inset from the rounded rectangle on each side by the relevant rounding radii. This is a correct conservative approximation, but it means that any primitive that lies flush with one of the flat sides of the clip is considered to only be partially within the clip, and thus needs to have the clip mask applied - even though simple rectangle intersection would serve.

If, instead of an 'inner rect', we approximate the rounded rectangle by a rectangle with rectangular chunks taken out of each corner, we are able to recognize more primitives as being fully within the clip, and apply the clip mask less often.

The code the patch affects is already marked:

// TODO(gw): extract_inner_rect_safe is overly
//           conservative for this code!

Under software WebRender, performance is substantially improved if we make the approximations in ClipItemKind::get_clip_result more accurate.

That function's job is to decide whether a primitive falls entirely inside a
clip, entirely outside it, or has regions of both. If the primitive is known to
fall entirely inside the clip, WebRender doesn't bother applying the clip to it.
This also saves WR the trouble of rendering the clip mask itself - which is what
the cs_rectangle_clip shader is spending a lot of time on when displaying these
pages.

Before this change, ClipItemKind::get_clip_result handles rounded rectangle clip
regions by computing an 'inner rect', a rectangle inset from the rounded
rectangle on each side by the relevant rounding radii. This is a correct
conservative approximation, but it means that any primitive that lies flush with
one of the flat sides of the clip is considered to only be partially within the
clip, and thus needs to have the clip mask applied - even though simple
rectangle intersection would serve.

With this change, instead of an 'inner rect', we approximate the rounded
rectangle by a rectangle with rectangular chunks taken out of each corner. This
lets us recognize more primitives as being fully within the clip, and apply the
clip mask less often.

Assignee: nobody → jimb
Status: NEW → ASSIGNED

This patch should improve hardware WebRender's performance as well. It's just that the unnecessary clip mask rendering is more apparent under SWGL.

Pushed by jblandy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e682a665334 Decide whether rectangles fall within rounded rectangles more accurately. r=gw
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
QA Whiteboard: [qa-86b-p2]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: