Refactor and optimize internal representation of clip state
Categories
(Core :: Graphics: WebRender, task)
Tracking
Flag | Tracking | Status |
---|---|---|
firefox105 | --- | fixed |
People
(Reporter: gw, Assigned: gw)
References
(Depends on 1 open bug, Regressed 4 open bugs)
Details
Attachments
(1 file)
(deleted) text/x-phabricator-request
Comment 1•2 years ago
This patch refactors how clip chains are internally represented and used
during scene and frame building. The intent is to make clip processing
during frame building more efficient and consistent. Additionally, this
work enables follow-ups to cache the result of clip-chain builds between
frame and scene builds.
These changes will significantly reduce the cost of the visibility pass
for the common case when not much content has changed. In this patch,
the public API for clipping remains (mostly) the same, in order to allow
landing and stabilising this work without major changes to Gecko. However,
a longer term goal is to make the public WR clip API more closely match
the internal representation, to reduce work done during scene building.
Clips on a primitive can be categorized into two buckets. The first are
local clips that are specific to the primitive and move with it. These
could essentially be considered part of the definition of the primitive
itself. The second are a hierarchy of clips that apply to one or more
items, and may move independently of the primitive(s) they clip. These
clips are things like scroll regions, stacking context clips, iframe
clip regions, etc. On real-world pages, the clip hierarchy is typically
quite shallow, with a small number of clips that are shared by a large
number of primitives.
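To make the two buckets concrete, here is a minimal Rust sketch. None of these types are WebRender's: `Rect`, `SharedClip`, `Primitive` and `effective_clip` are hypothetical, and the sketch assumes every clip is an axis-aligned rectangle in a single coordinate space. The local clip lives on the primitive itself, while shared clips are referenced by index, so many primitives can point at the same shallow chain.

```rust
/// Hypothetical axis-aligned rectangle (not WebRender's LayoutRect).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl Rect {
    /// Intersect two rects; returns None if they do not overlap.
    pub fn intersect(&self, other: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// A clip in the shared hierarchy (scroll region, stacking context clip, iframe clip...).
pub struct SharedClip {
    pub rect: Rect,
    pub parent: Option<usize>, // index of the parent shared clip; None = root
}

/// A primitive: its local clip moves with it; shared clips are referenced by index.
pub struct Primitive {
    pub local_clip: Rect,
    pub shared_clip: Option<usize>,
}

/// Effective clip = local clip intersected with every ancestor in the shared hierarchy.
/// Assumption: all clips share one coordinate space (real clips may not).
pub fn effective_clip(prim: &Primitive, shared: &[SharedClip]) -> Option<Rect> {
    let mut result = prim.local_clip;
    let mut cur = prim.shared_clip;
    while let Some(i) = cur {
        result = result.intersect(&shared[i].rect)?;
        cur = shared[i].parent;
    }
    Some(result)
}
```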
Finding clips that are shared between primitives is both required (for
things such as determining which picture cache slice a primitive can
be assigned to, while applying the shared clips during composition), and
also a potential optimization (processing shared clips only once and
caching this clip state for similar primitives).
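The optimization half of this can be sketched as follows: resolve each shared clip's accumulated rectangle once, and every primitive that references it then needs only one further intersection. This is a hypothetical model, not WebRender code; it assumes axis-aligned rectangles in one coordinate space and that parents are stored before their children. `resolve_shared` and `clip_primitive` are illustrative names.

```rust
/// Hypothetical axis-aligned rectangle.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect { pub x0: f32, pub y0: f32, pub x1: f32, pub y1: f32 }

impl Rect {
    /// Intersect two rects; returns None if they do not overlap.
    pub fn intersect(&self, o: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(o.x0),
            y0: self.y0.max(o.y0),
            x1: self.x1.min(o.x1),
            y1: self.y1.min(o.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// Hypothetical shared clip with an optional parent index.
pub struct SharedClip { pub rect: Rect, pub parent: Option<usize> }

/// Resolve each shared clip's accumulated rect exactly once.
/// Assumption: a parent's index is always lower than its children's indices,
/// which holds if nodes are appended while the hierarchy is built.
pub fn resolve_shared(shared: &[SharedClip]) -> Vec<Option<Rect>> {
    let mut resolved: Vec<Option<Rect>> = Vec::with_capacity(shared.len());
    for node in shared {
        let acc = match node.parent {
            None => Some(node.rect),
            // Reuse the parent's cached result instead of re-walking the chain.
            Some(p) => resolved[p].and_then(|pr| pr.intersect(&node.rect)),
        };
        resolved.push(acc);
    }
    resolved
}

/// Per-primitive work is a single intersection against the cached result.
pub fn clip_primitive(local: &Rect, shared_clip: Option<usize>, resolved: &[Option<Rect>]) -> Option<Rect> {
    match shared_clip {
        None => Some(*local),
        Some(i) => resolved[i].and_then(|r| r.intersect(local)),
    }
}
```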
The public clip-chain API has two complexities that make the above
difficult and time-consuming for WR to determine. First, it was possible to
express a clipping hierarchy both via the legacy clip parenting path
(via `ClipId` definitions) and also via clip-chains (the `parent`
field of a `ClipChain`). Second, clip-chains themselves can define
an arbitrary number and ordering of clips. Clips can also implicitly
apply to primitives via parent stacking contexts and iframes, but must
sometimes be removed (when an intermediate surface is created) for
performance reasons.
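One way to picture the cost of the clip-chain complexity is to flatten a chain and its `parent` chains into one root-first list of clips, dropping clips that appear at more than one level. This is a hypothetical model: `ClipChain` here is a simplified stand-in with `u32` clip ids, not the real public API type, and `flatten_chain` is an illustrative name. It assumes chains are acyclic and ignores legacy `ClipId` parenting and implicit stacking-context or iframe clips.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a clip-chain: a list of clip ids plus an optional parent chain.
pub struct ClipChain {
    pub clips: Vec<u32>,
    pub parent: Option<usize>, // index of the parent chain
}

/// Flatten a chain and all its ancestors into one root-first list of clip ids,
/// keeping only the first occurrence of a clip listed at several levels.
/// Assumption: chains are acyclic (no cycle detection here).
pub fn flatten_chain(chains: &[ClipChain], start: usize) -> Vec<u32> {
    // Collect chain indices from `start` up to the root.
    let mut path = Vec::new();
    let mut cur = Some(start);
    while let Some(i) = cur {
        path.push(i);
        cur = chains[i].parent;
    }
    // Walk root-first so outer (ancestor) clips precede inner ones.
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &i in path.iter().rev() {
        for &clip in &chains[i].clips {
            if seen.insert(clip) {
                out.push(clip);
            }
        }
    }
    out
}
```

Each primitive would need this walk (plus deduplication) repeated unless the result is shared, which is part of what the clip-tree representation addresses.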
The new internal representation provided by this patch introduces a
`ClipTree` structure, which is built during scene building by accumulating
the set of clips that apply to a primitive from all explicit and implicit
sources, and grafting this onto the existing clip-tree structure.
This provides WR a simple way to determine which clips are shared between
primitives (by checking ancestry) and reduces the size of the internal
representation (by sharing clips where possible rather than duplicating them).
Interning is still used to identify parts of the clip-tree that define
the same clipping state.
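The grafting step and the ancestry check can be sketched as a prefix tree over interned clip handles. This is a minimal illustration, assuming each primitive's clips arrive as a root-first list of `u32` handles where equal handles mean equal clip state. `ClipTreeBuilder` and `ClipTreeNode` reuse names from this patch, but their fields and the methods `insert`, `is_ancestor` and `common_ancestor` are hypothetical.

```rust
/// Hypothetical, simplified clip-tree node keyed by an interned clip handle.
pub struct ClipTreeNode {
    pub handle: u32,
    pub parent: Option<usize>,
    pub children: Vec<usize>,
}

/// Sketch of a builder that grafts each primitive's clip list onto one shared tree.
pub struct ClipTreeBuilder {
    pub nodes: Vec<ClipTreeNode>,
}

impl ClipTreeBuilder {
    pub const ROOT: usize = 0;

    pub fn new() -> Self {
        // Node 0 is the root and represents "no clips"; its handle is unused.
        ClipTreeBuilder {
            nodes: vec![ClipTreeNode { handle: u32::MAX, parent: None, children: Vec::new() }],
        }
    }

    /// Graft `clips` (root-first) onto the tree, reusing existing nodes for any
    /// matching prefix, and return the node for the innermost clip.
    pub fn insert(&mut self, clips: &[u32]) -> usize {
        let mut cur = Self::ROOT;
        for &handle in clips {
            let existing = self.nodes[cur]
                .children
                .iter()
                .copied()
                .find(|&c| self.nodes[c].handle == handle);
            cur = match existing {
                Some(c) => c, // shared with a previously inserted primitive
                None => {
                    let idx = self.nodes.len();
                    self.nodes.push(ClipTreeNode { handle, parent: Some(cur), children: Vec::new() });
                    self.nodes[cur].children.push(idx);
                    idx
                }
            };
        }
        cur
    }

    /// True if `a` is `b` or an ancestor of `b`, i.e. every clip at `a` also applies at `b`.
    pub fn is_ancestor(&self, a: usize, b: usize) -> bool {
        let mut cur = Some(b);
        while let Some(n) = cur {
            if n == a {
                return true;
            }
            cur = self.nodes[n].parent;
        }
        false
    }

    /// Deepest node that is an ancestor of both `a` and `b`: the clips they share.
    pub fn common_ancestor(&self, a: usize, b: usize) -> usize {
        let mut cur = a;
        while !self.is_ancestor(cur, b) {
            // The root is an ancestor of every node, so this loop terminates.
            cur = self.nodes[cur].parent.expect("root is an ancestor of all nodes");
        }
        cur
    }
}
```

In this model, primitives inserted with `[10, 20, 30]` and `[10, 20, 40]` share the nodes for `10` and `20`, and `common_ancestor` returns the node for `20`. The ancestor walk is quadratic in depth, which is only reasonable because real clip hierarchies tend to be shallow.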
Specific changes in this patch:
- Remove legacy `ClipId` style parenting support (in conjunction with
  previous patches)
- Remove the public API ability to specify the clip on a primitive via
  `ClipId` (it must now be a clip-chain)
- Remove `combined_local_clip_rect` from `PrimitiveInstance`, reducing
  the size of the structure significantly
- Introduce `ClipTree`, used during frame building, which is created by
  `ClipTreeBuilder` during scene building
- Separate out the per-primitive clip concept (`ClipTreeLeaf`) from the
  clipping hierarchy (`ClipTreeNode`). In future, more elements will be moved
  to the `ClipTreeLeaf` and the state of each `ClipTreeNode` will be cached
- Simplify the logic to disable / remove clips during frame building that
  are applied by parent surface(s)
- Port hit-testing to be based on `ClipTree`, which is simpler, faster and
  also resolves some edge-case correctness bugs
- Use a simpler and faster method to find shared clips during picture
  cache slice assignment of primitives
- Update wrench to use the changed public clip-chain API definitions
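Separating per-primitive state (`ClipTreeLeaf`) from the shared hierarchy (`ClipTreeNode`) also makes hit-testing a simple upward walk. The sketch below is hypothetical: it assumes every clip is an axis-aligned rectangle in one coordinate space, whereas WebRender also has to deal with transforms and non-rectangular clips. The fields and `hit_test` are illustrative, not the patch's actual code.

```rust
/// Hypothetical axis-aligned rectangle.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect { pub x0: f32, pub y0: f32, pub x1: f32, pub y1: f32 }

impl Rect {
    /// Half-open containment test.
    pub fn contains(&self, x: f32, y: f32) -> bool {
        x >= self.x0 && x < self.x1 && y >= self.y0 && y < self.y1
    }
}

/// Shared clip in the hierarchy (hypothetical fields).
pub struct ClipTreeNode { pub rect: Rect, pub parent: Option<usize> }

/// Per-primitive clip state: the local clip plus a link into the shared hierarchy.
pub struct ClipTreeLeaf { pub local_clip: Rect, pub node: Option<usize> }

pub struct ClipTree { pub nodes: Vec<ClipTreeNode>, pub leaves: Vec<ClipTreeLeaf> }

impl ClipTree {
    /// A point hits a primitive only if it is inside the leaf's local clip and
    /// inside every shared clip on the path from the leaf's node to the root.
    pub fn hit_test(&self, leaf: usize, x: f32, y: f32) -> bool {
        let leaf = &self.leaves[leaf];
        if !leaf.local_clip.contains(x, y) {
            return false;
        }
        let mut cur = leaf.node;
        while let Some(i) = cur {
            let node = &self.nodes[i];
            if !node.rect.contains(x, y) {
                return false;
            }
            cur = node.parent;
        }
        true
    }
}
```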
This patch already introduces some real-world optimizations (for example,
`displaylist_mutate` becomes 6% faster overall), but mostly sets things
up for follow-up patches to be able to cache clip state between frames,
which should result in much larger wins.
Comment 3•2 years ago
bugherder
Comment 4•2 years ago
== Change summary for alert #34969 (as of Sat, 30 Jul 2022 00:13:51 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
7% | displaylist_mutate | macosx1015-64-shippable-qr | e10s fission stylo webrender | 1,857.45 -> 1,727.27 |
6% | displaylist_mutate | macosx1015-64-shippable-qr | e10s fission stylo webrender | 1,854.26 -> 1,739.20 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=34969
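The Ratio column is consistent with the relative reduction in this lower-is-better measurement, (old - new) / old. The check below assumes that formula; Perfherder's exact definition and rounding are not stated in this alert, and `improvement_percent` is a hypothetical helper.

```rust
/// Relative improvement, in percent, for a lower-is-better metric.
/// Assumption: Ratio = (old - new) / old, expressed as a percentage.
pub fn improvement_percent(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}
```

Applied to the two rows above, this gives roughly 7.0% and 6.2%, matching the reported 7% and 6%.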