Open Bug 1843930 Opened 1 year ago Updated 1 year ago

Painting is more than 2x slower than Chrome on the Editor-CodeMirror subtest of speedometer3

Categories

(Core :: Web Painting, defect, P2)

defect

Tracking

()

People

(Reporter: mstange, Assigned: tnikkel)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [sp3])

https://speedometer-preview.netlify.app/?suite=Editor-CodeMirror

According to the comparison report, on Windows, Firefox spends 2.17x as much time painting during the measured time compared to Chrome. Eliminating the difference would improve our total time on this subtest by 2.6%.

Firefox: https://share.firefox.dev/3KmYW6z
Chrome: https://share.firefox.dev/3JZTvde

I noticed a few things in the profiles:

  • Retained Display List partial update appears to fail, because we spend just as much time doing the full rebuild as we spend in the partial update rebuild.
  • We hit fallback painting (WebRenderCommandBuilder::PushItemAsImage), probably for the Windows scrollbar arrows.
  • There's a lot of time spent in RetainedDisplayListBuilder::ComputeRebuildRegion. I think this page would probably be faster to paint with RDL turned off entirely.

Hmm, I don't get PushItemAsImage in my profiles at all. I tested on macos and on Windows 10 (there are scrollbars with arrows visible). The only thing that shows up as a blob with gfx.webrender.debug.blob.paint-flashing on is the svg inside the buttons at the top.

Maybe you can share some more info about your setup so I can try to emulate that to reproduce?

Flags: needinfo?(mstange.moz)

I think the profile is hitting this

https://searchfox.org/mozilla-central/rev/7a4c08f2c3a895c9dc064734ada320f920250c1f/widget/Theme.cpp#1317

but I don't see how I would not be hitting that. Maybe it's just so fast on my machine there are no samples in it?

That's possible. It is a rather small part of the profile. The profile was taken with ?iterationCount=100 and a profiling interval of 0.25ms.

Flags: needinfo?(mstange.moz)

Seems like the major thing that makes ComputeRebuildRegion take so long is calling nsLayoutUtils::GetTransformToAncestor. It looks like we call it 3 separate times for the same frame. Two of them in the displayport getter, and a third when we transform the rect in ProcessFrameInternal. Calling it once and sharing it would be a good optimization.

The long partial display list updates look to be when the syntax highlighting it turned on. We get ~300 modified frames, which is just below the limit of a partial update (500). They end up making the dirty rect be everything but the toolbar of buttons at the top, so we are rebuilding almost the whole thing anyway, so all the other retained stuff (merging, preprocessing etc) is just pure overhead.

It makes me think that a good heuristic to add might be to bail out of a partial update if the root dirty rect becomes most of the whole page. The root dirty rect doesn't tell the whole story because we can store an override dirty rect on every frame with a display port and every stacking context. So the root dirty rect could be large, but we immediately enter a stacking context or displayport with a very small rect. But I'm not sure if that consideration is important.

And the scrollbar parts that we use fallback for, they aren't blobs, they are images because nsDisplayThemedBackground::MustPaintOnContentSide returns true. But I think since we are using the non-native theme we are just drawing using simple DrawTarget commands and not asking the OS for bitmaps, at least in the common cases, that we can just use blobs, which should be a small win.

Assignee: nobody → tnikkel
Severity: -- → S3
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.