Very slow plane splitting and poor batching on https://themaninblue.com/experiment/slashGlobe/
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
People
(Reporter: gregp, Assigned: nical)
References
(Blocks 3 open bugs)
Details
Attachments
(6 files)
Steps to reproduce:
- Navigate to https://themaninblue.com/experiment/slashGlobe/
- Move cursor around the globe
Actual results: Lag
Expected results: Less lag
Comment 1•2 years ago
|
||
Looks like we are spending half the RenderBackend time inserting into the BSP tree for plane splitting. gw, is that normal or excessive?
Comment 2•2 years ago
|
||
No, that's not normal - seems like a bug worthy of investigation when someone has time.
Assignee | ||
Comment 3•2 years ago
|
||
This is the plane-splitting code allocating memory at each insertion. I had a similar profile on the test case of bug 1493359.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 4•2 years ago
|
||
I have a work in progress rewrite of plane_split
's BSP tree which yields pretty large speedups for the plane splitter (~3x) on this test case and the one in bug 1493359, which leads to ~25% improvement to frame building time on my laptop for these test cases. It won't get a solid 60fps on mid-range laptop but plane splitting goes down to about 10% of the profile and further improvements will probably have to be found elsewhere.
Assignee | ||
Comment 5•2 years ago
|
||
New plane_split version with much faster BSP tree.
The main source of churn is plane_split not being generic over the unit tag anymore.
Assignee | ||
Comment 6•2 years ago
|
||
Currently splane splitters are stored in the built scene and reallocated with every new scene.
This patch moves the responsibility of storing/recycling the plane splitters to the frame builder so that they can be reused in more cases. The scene builder only needs to track splitter indices, it doesn't use the splitter objects themselves.
Alone this patch does not make a large difference because the current version of the plane_split crate reallocates everything each frame. The next version of plane_split does a much better job of recycling allocations, and applying this patch on top of it makes a large difference.
Reporter | ||
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
Depends on D160362
Assignee | ||
Comment 8•2 years ago
|
||
Depends on D161584
Comment 10•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/296dca6ce0fa
https://hg.mozilla.org/mozilla-central/rev/ebdd5938e8ab
https://hg.mozilla.org/mozilla-central/rev/b18532ba7066
https://hg.mozilla.org/mozilla-central/rev/8d17749400ba
Assignee | ||
Comment 11•2 years ago
|
||
Reopening as the changes bring a large perf boost on the plane-splitting part of the profile but we are still choking on thousands of draw calls with this test case.
Updated•2 years ago
|
Assignee | ||
Comment 12•2 years ago
|
||
The large number of draw calls mostly comes from many split composite instances (the 3d transformed thingies), sampling a relatively low number of different textures but alternating between them frequently. The input texture that is changing is the child picture task of the primitive.
Assignee | ||
Comment 13•2 years ago
|
||
The render task graph is not a pretty sight. There is a very large number of child pictures depending on an even larger number of clip tasks. The main issue for batching is the number of child pictures though (second column) and how they end up interleaved in the drawing order.
Hopefully there's something we can do about nudging the packing of the child tasks so that they better match the batching order.
Description
•