Closed Bug 1779952 Opened 2 years ago Closed 2 years ago

Refactor and optimize internal representation of clip state

Categories

(Core :: Graphics: WebRender, task)

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: gw, Assigned: gw)

References

(Depends on 1 open bug, Regressed 4 open bugs)

Details

Attachments

(1 file)

No description provided.

This patch refactors how clip chains are internally represented and used
during scene and frame building. The intent is to make clip processing
during frame building more efficient and consistent. Additionally, this
work enables follow ups to cache the result of clip-chain builds between
frame and scene builds.

These changes will significantly reduce the cost of the visibility pass
for the common case when not much content has changed. In this patch,
the public API for clipping remains (mostly) the same, in order to allow
landing and stabilising this work without major changes to Gecko. However,
a longer term goal is to make the public WR clip API more closely match
the internal representation, to reduce work done during scene building.

Clips on a primitive can be categorized into two buckets. The first are
local clips that are specific to the primitive and move with it. These
could essentially be considered part of the definition of the primitive
itself. The second are a hierarchy of clips that apply to one or more
items, and may move independently of the primitive(s) they clip. These
clips are things like scroll regions, stacking context clips, iframe
clip regions etc. On (real world) pages, the clip hierarchy is typically
quite shallow, with a small number of clips that are shared by a large
number of primitives.
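
The two buckets can be pictured with a small data model. This is only a sketch under assumed names: `Rect`, `SharedClipId`, `Primitive` and `distinct_shared_clips` are hypothetical and are not WebRender's actual types. It shows a primitive carrying its own local clip plus a reference into a shared hierarchy, and counts how few distinct shared clips a batch of primitives refers to.

```rust
use std::collections::HashSet;

/// Axis-aligned rectangle (hypothetical helper type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

/// Identifier of a clip in the shared hierarchy (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SharedClipId(pub u32);

/// A primitive carries both buckets of clip state.
#[derive(Debug, Clone)]
pub struct Primitive {
    /// Bucket 1: the local clip, effectively part of the primitive's
    /// definition, which moves with the primitive.
    pub local_clip: Rect,
    /// Bucket 2: the innermost clip in the shared hierarchy (scroll
    /// regions, stacking context clips, iframe clips and so on).
    pub shared_clip: SharedClipId,
}

/// Count how many distinct shared clips a set of primitives refers to.
pub fn distinct_shared_clips(prims: &[Primitive]) -> usize {
    prims
        .iter()
        .map(|p| p.shared_clip)
        .collect::<HashSet<_>>()
        .len()
}
```

With one hundred primitives split across two scroll regions, `distinct_shared_clips` returns 2, which is the "small number of clips shared by a large number of primitives" shape described above.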

Finding clips that are shared between primitives is both required (for
things such as determining which picture cache slice a primitive can
be assigned to, while applying the shared clips during composition), and
also a potential optimization (processing shared clips only once and
caching this clip state for similar primitives).

The public clip-chain API has two complexities that make the above
difficult and time consuming for WR to determine. First, it was possible
to express a clipping hierarchy both via the legacy clip parenting path
(via ClipId definitions) and also via clip-chains (the parent
field of a ClipChain). Second, clip-chains themselves can define
an arbitrary number and ordering of clips. In addition, clips can
implicitly apply to primitives via parent stacking contexts and iframes,
but must sometimes be removed (when an intermediate surface is created)
for performance reasons.

The new internal representation provided by this patch introduces a
ClipTree structure which is built during scene building by accumulating
the set of clips that apply to a primitive from all explicit and implicit
sources, and grafting this on to the existing clip-tree structure.
This provides WR a simple way to determine which clips are shared between
primitives (by checking ancestry) and reduces the size of the internal
representation (by sharing clips where possible rather than duplicating).
Interning is still used to identify parts of the clip-tree that define
the same clipping state.
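
The ancestry check can be sketched as follows. This is a minimal illustration and not the code from the patch: the field layout, `ClipHandle`, `add_chain` and `shared_root` are all assumed names, and the `HashMap` lookup only stands in for the role interning plays in identifying identical clip state.

```rust
use std::collections::HashMap;

/// Index of a node in the tree (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ClipNodeId(pub usize);

/// Handle to an interned clip definition (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ClipHandle(pub u32);

pub struct ClipTreeNode {
    pub parent: Option<ClipNodeId>,
    /// `None` only for the root, which applies no clip.
    pub clip: Option<ClipHandle>,
}

pub struct ClipTree {
    nodes: Vec<ClipTreeNode>,
    /// Maps (parent, clip) to an existing node so identical chains are shared.
    lookup: HashMap<(ClipNodeId, ClipHandle), ClipNodeId>,
}

impl ClipTree {
    pub const ROOT: ClipNodeId = ClipNodeId(0);

    pub fn new() -> Self {
        ClipTree {
            nodes: vec![ClipTreeNode { parent: None, clip: None }],
            lookup: HashMap::new(),
        }
    }

    /// Graft a list of clips (outermost first) beneath `parent`, reusing
    /// existing nodes where the same clip already hangs off the same parent.
    pub fn add_chain(&mut self, mut parent: ClipNodeId, clips: &[ClipHandle]) -> ClipNodeId {
        for &clip in clips {
            let existing = self.lookup.get(&(parent, clip)).copied();
            parent = match existing {
                Some(id) => id,
                None => {
                    let id = ClipNodeId(self.nodes.len());
                    self.nodes.push(ClipTreeNode { parent: Some(parent), clip: Some(clip) });
                    self.lookup.insert((parent, clip), id);
                    id
                }
            };
        }
        parent
    }

    /// Nodes from `id` up to the root, inclusive.
    fn ancestors(&self, id: ClipNodeId) -> Vec<ClipNodeId> {
        let mut out = vec![id];
        let mut cur = id;
        while let Some(p) = self.nodes[cur.0].parent {
            out.push(p);
            cur = p;
        }
        out
    }

    /// Deepest node that is an ancestor of (or equal to) both `a` and `b`.
    /// Every clip from this node up to the root is shared by both.
    pub fn shared_root(&self, a: ClipNodeId, b: ClipNodeId) -> ClipNodeId {
        let a_chain = self.ancestors(a);
        self.ancestors(b)
            .into_iter()
            .find(|n| a_chain.contains(n))
            .unwrap_or(Self::ROOT)
    }
}
```

Two primitives whose chains both begin with the same scroll clip end up under a common node, so `shared_root` returns that node, and adding an identical chain twice yields the same leaf id instead of a duplicate.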

Specific changes in this patch:

  • Remove legacy ClipId style parenting support (in conjunction with
    previous patches)
  • Remove the public API ability to specify the clip on a primitive via
    ClipId (it must now be a clip-chain)
  • Remove combined_local_clip_rect from PrimitiveInstance, reducing
    the size of the structure significantly
  • Introduce ClipTree used during frame building, which is created by
    ClipTreeBuilder during scene building
  • Separate out per-primitive clip concept (ClipTreeLeaf) from clipping
    hierarchy (ClipTreeNode). (In future, more elements will be moved to
    the ClipTreeLeaf and the state of each ClipTreeNode will be cached.)
  • Simplify the logic to disable / remove clips during frame building that
    are applied by parent surface(s)
  • Port hit-testing to be based on ClipTree, which is simpler, faster and
    also resolves some edge-case correctness bugs
  • Use a simpler and faster method to find shared clips during picture
    cache slice assignment of primitives
  • Update wrench to use the public clip-chain API definition changes
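
To illustrate the leaf / node split and why a tree-based hit test can be simple, here is a rough sketch. It is not the patch's implementation: the struct fields and the `hit_test` function are hypothetical, and it assumes every clip is a plain rectangle in a single coordinate space, which ignores the spatial transforms and complex clip shapes a real renderer has to handle.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl Rect {
    /// True if `p` lies inside the rectangle (max edges are exclusive).
    pub fn contains(&self, p: Point) -> bool {
        p.x >= self.x0 && p.x < self.x1 && p.y >= self.y0 && p.y < self.y1
    }
}

/// One entry in the shared clipping hierarchy (fields are assumptions).
pub struct ClipTreeNode {
    pub parent: Option<usize>,
    pub clip_rect: Rect,
}

/// Per-primitive clip state: the local clip plus a link into the hierarchy.
pub struct ClipTreeLeaf {
    pub node: usize,
    pub local_clip_rect: Rect,
}

/// A point hits the primitive only if it passes the leaf's local clip
/// and every clip on the path from the leaf's node up to the root.
pub fn hit_test(nodes: &[ClipTreeNode], leaf: &ClipTreeLeaf, p: Point) -> bool {
    if !leaf.local_clip_rect.contains(p) {
        return false;
    }
    let mut cur = Some(leaf.node);
    while let Some(idx) = cur {
        let node = &nodes[idx];
        if !node.clip_rect.contains(p) {
            return false;
        }
        cur = node.parent;
    }
    true
}
```

Because the hierarchy is typically shallow, the walk from leaf to root visits only a handful of nodes per primitive.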

This patch already introduces some real-world optimizations (for example,
displaylist_mutate becomes 6% faster overall), but mostly sets things
up for follow-up patches to be able to cache clip-state between frames,
which should result in much larger wins.

Assignee: nobody → gwatson
Status: NEW → ASSIGNED
Blocks: 1780390
Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7016bef75f07
Refactor and optimize internal representation of clip state r=nical
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Regressions: 1781786
Regressions: 1781987
Depends on: 1782001
Regressions: 1782001

== Change summary for alert #34969 (as of Sat, 30 Jul 2022 00:13:51 GMT) ==

Improvements:

Ratio  Test                Platform                    Options                       Absolute values (old vs new)
7%     displaylist_mutate  macosx1015-64-shippable-qr  e10s fission stylo webrender  1,857.45 -> 1,727.27
6%     displaylist_mutate  macosx1015-64-shippable-qr  e10s fission stylo webrender  1,854.26 -> 1,739.20

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=34969

Regressions: 1782317
Regressions: 1792197
Regressions: 1799262
Regressions: 1802119
Regressions: 1826983