Slow Frames detected on singlesprout home page
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
People
(Reporter: yoasif, Unassigned)
References
(Blocks 4 open bugs, )
Details
(Keywords: nightly-community)
Attachments
(4 files)
Seeing the red animation after enabling gfx.webrender.debug.slow-frame-indicator on https://www.singlesprout.com/
Profile: https://perfht.ml/2VPrlHn
Slow Frame #01 Frame 2729(https://www.singlesprout.com/) CONTENT_FRAME_TIME 711 - Transaction start 0.250244, main-thread time 1.360601, full paint time 8.513312, Skipped composites 0, Composite start 16.124971, Resource upload time 0.000434, GPU cache upload time 0.253842, Render time 101.557626, Composite time 102.260539
Slow Frame #02 Frame 2732(https://www.singlesprout.com/) CONTENT_FRAME_TIME 662 - Transaction start 0.199985, main-thread time 1.040204, full paint time 1.750260, Skipped composites 0, Composite start 60.498300, Resource upload time 2.838174, GPU cache upload time 1.248226, Render time 47.848765, Composite time 49.632026
Slow Frame #03 Frame 2730(https://www.singlesprout.com/) CONTENT_FRAME_TIME 645 - Transaction start 0.291483, main-thread time 1.364013, full paint time 2.188527, Skipped composites 0, Composite start 16.129933, Resource upload time 0.001761, GPU cache upload time 0.058995, Render time 5.150866, Composite time 91.181365
Slow Frame #04 Frame 315(https://www.singlesprout.com/) CONTENT_FRAME_TIME 616 - Transaction start 0.336957, main-thread time 2.412339, full paint time 8.948310, Skipped composites 0, Composite start 16.040572, Resource upload time 1.621313, GPU cache upload time 0.114735, Render time 84.312407, Composite time 86.308967
Slow Frame #05 Frame 316(https://www.singlesprout.com/) CONTENT_FRAME_TIME 559 - Transaction start 0.614784, main-thread time 3.238029, full paint time 5.423443, Skipped composites 0, Composite start 12.906398, Resource upload time 0.001895, GPU cache upload time 0.324688, Render time 6.633799, Composite time 76.681015
Slow Frame #06 Frame 2772(https://www.singlesprout.com/) CONTENT_FRAME_TIME 506 - Transaction start 0.243645, main-thread time 1.299267, full paint time 1.991710, Skipped composites 0, Composite start 16.103180, Resource upload time 0.663972, GPU cache upload time 0.268590, Render time 67.415635, Composite time 68.084906
Slow Frame #07 Frame 2731(https://www.singlesprout.com/) CONTENT_FRAME_TIME 471 - Transaction start 0.381023, main-thread time 2.520607, full paint time 4.443342, Skipped composites 0, Composite start 16.042296, Resource upload time 0.000620, GPU cache upload time 0.102412, Render time 1.800384, Composite time 62.339946
Slow Frame #08 Frame 2934(https://www.singlesprout.com/) CONTENT_FRAME_TIME 466 - Transaction start 0.465604, main-thread time 7.433910, full paint time 11.099232, Skipped composites 0, Composite start 15.917207, Resource upload time 2.745110, GPU cache upload time 0.385973, Render time 4.642055, Composite time 61.459521
Slow Frame #09 Frame 2774(https://www.singlesprout.com/) CONTENT_FRAME_TIME 437 - Transaction start 0.962038, main-thread time 2.663309, full paint time 4.660299, Skipped composites 0, Composite start 16.068381, Resource upload time 0.000477, GPU cache upload time 0.060649, Render time 55.797323, Composite time 56.514158
Slow Frame #10 Frame 2773(https://www.singlesprout.com/) CONTENT_FRAME_TIME 430 - Transaction start 0.190110, main-thread time 0.870163, full paint time 1.613620, Skipped composites 0, Composite start 16.065874, Resource upload time 0.002077, GPU cache upload time 0.273269, Render time 3.445983, Composite time 55.526719
Reporter | ||
Updated•6 years ago
|
Comment 1•6 years ago
|
||
I can reproduce this by scrolling down the list of clients, and mousing over the client names (such that they animate their size).
Lots of time spent on the Renderer thread, just blocking waiting for the previous frame to complete.
Enabling picture caching helps a lot, though it seems like it's possible to temporary break it sometimes and get ~1second of slowness, and then it goes back to being fast (stands out in the gecko profiler).
WR profiler shows a lot of time in B_Image (and also target init).
Any ideas Glenn?
Comment 2•6 years ago
|
||
I tried with the latest picture caching pages - it's still very janky when scrolling. There seems to be a very large number of color targets (7+) that get allocated and used, and a lot of mix-blend-mode, which is currently quite slow. This will need further investigation, both to see why it's invalidating with picture caching, and why it's so slow to rasterize.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 3•6 years ago
|
||
Notes for myself:
-
The mix-blend code that isolates stacking contexts doesn't take into account iframes / content rects, so is doing an extra blit and confusing the invalidation logic.
-
There is a world clip on one of the background images that is tripping up the invalidation logic - it appears to be 1px larger in the y-axis than I would expect. Might be a Gecko bug, might be something that can be handled in WR.
-
The mix-blend-mode is slow here because there are two large, chained mix-blend-modes. We might need to consider implementing the options previously discussed to make mix-blend-mode faster than the current implementation.
-
Although there are 7 reported color targets, at least 2 of them are only storage for the mix-blend-mode readback regions. However, we still clear these targets on the GPU, which is redundant and costs ~2-3 ms GPU time.
Comment 4•6 years ago
|
||
There are a number of invalidation issues here, which will need separate patches.
The first patch is https://bugzilla.mozilla.org/show_bug.cgi?id=1521689.
Comment 5•6 years ago
|
||
A second patch is up - this fixes interning text runs on this page.
https://bugzilla.mozilla.org/show_bug.cgi?id=1522028
There are still pending patches to resolve all the invalidations on this page.
Comment 6•6 years ago
|
||
WIP patch for Matt to take a look at.
Updated•6 years ago
|
Comment 7•6 years ago
|
||
Matt, do you want to take this bug? Once your mix-blend-mode patch lands, we'll probably want to either close this or unassign and drop priority for mvp?
Comment 8•6 years ago
|
||
This is not great on integrated GPUs and still needs some work, but it seems to run well on discrete GPUs that I've tested.
Jeff, could you test / verify if it's OK to drop the priority on this one?
Comment 9•6 years ago
|
||
Bug 1524117 never landed. Presumably it helps with perf on this. Do you want to fix the wrench test and reland that and then we can drop the priority of this one?
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Now that bug 1524117 has landed this seems pretty good.
Updated•6 years ago
|
Comment 11•6 years ago
|
||
Fixing our mix-blend performance is fairly important.
Comment 12•6 years ago
|
||
This is a new implementation of mix-blend compositing that is meant to be more idiomatic to WR and efficient.
Previously, mix-blend mode was composed in the following way:
- parent stacking context was forced to isolate
- source picture is also isolated
- when rendering the isolated context, the framebuffer is read upon reaching the source. Both the readback and the source are placed in the RT cache.
- a mix-blend draw call is issued to read from those cache segments and blend on top of the backdrop
The new implementation works by using the picture cutting (intruduced for preserve-3D contexts earlier) and some bits of magic:
- backdrop stacking context is isolated with a special composition mode that prevents it from actually rendeing unless the suorce stacking context is invisible.
- source stacking context is isolated with mix-blend composition mode that has a pointer to the backdrop picture
- the instance of the backdrop picture is placed as a peer of the source picture (not a child)
- if the backdrop is invisible, the source is drawn as a simple blit
- otherwise, it's a draw call that reads from the isolated backdrop and source textures
Note the differences:
- parent stacking context is not isolated, but backdrop is
- no framebuffer readback is involved
- the source and backdrop pictures are rendered in parallel in a pass, improving the batching
- we don't blend onto the backdrop while reading from the backdrop copy at the same time
- the depth of the render pass tree is reduced: previously the parent and the source were isolated, now the source and the backdrop, which are siblings
Comment 13•6 years ago
|
||
Checked a GPU capture of the new implementation on Singesprout.com versus the old one: slightly reduced draw call count (36 versus 48), missing 3 framebuffer blits, otherwise seems identical. Therefore, I don't expect a major perf win here, but doing this re-implementation has a value anyway: it fits the picture/surface/task structure much better.
Comment 14•6 years ago
|
||
Comment 15•6 years ago
|
||
Backed out changeset b5b2ecf454b0 (bug 1519718) for causing webrender-windows bustage
push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&selectedJob=229988320&revision=b5b2ecf454b0d34845098e13a186532469021ad0
backout: https://hg.mozilla.org/integration/autoland/rev/1d6e7352316cdc28bdae8adeb1cf330e8a3c7693
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Log was landed my mistake in https://phabricator.services.mozilla.com/D20608
Comment 18•6 years ago
|
||
Figured out the issues on Windows and updated before landing. The new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=14d3ae3115c7614a3711ed1529f9ab559436c4cc
It shows a few unexpected passes:
- blend-gradient-background-color.html
- background-blending-gradient-color.html
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4fefbf627bfc
https://hg.mozilla.org/mozilla-central/rev/4f11ca736067
Comment 21•6 years ago
|
||
Backed out for perma fails on blend-gradient-background-color.html.
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=230144319&repo=mozilla-central&lineNumber=57544
Backout: https://hg.mozilla.org/mozilla-central/rev/1cacb0b1fe7b01d1c8d0c1ad083ad0c210f6d0f9
Updated•6 years ago
|
Comment 22•6 years ago
|
||
Razvan, these are unexpected passes. I suppose all I need is to mark them as passing with this PR?
Comment 23•6 years ago
|
||
Comment 24•6 years ago
|
||
Confirmed with kats, rebased, and landed again.
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=34d18dab28b8422ee0b1f9e37a5f63e84e0196c1
Comment 25•6 years ago
|
||
bugherder |
Comment 26•6 years ago
|
||
Comment 28•6 years ago
|
||
bugherder |
Comment 29•6 years ago
|
||
Comment 30•6 years ago
|
||
Updated•6 years ago
|
Comment 31•6 years ago
|
||
https://phabricator.services.mozilla.com/D20608
Caused problems in Servo https://github.com/servo/servo/pull/22973 and MotionMark - https://bugzilla.mozilla.org/show_bug.cgi?id=1519718
Comment 32•6 years ago
|
||
Updated•6 years ago
|
Comment 33•6 years ago
|
||
bugherder |
Comment 34•6 years ago
|
||
The original change was reverted.
Updated•6 years ago
|
Updated•5 years ago
|
Comment 35•5 years ago
|
||
What do we want to do here long term?
Comment 36•5 years ago
|
||
Well, the fix was rolled back, and there isn't a good plan B yet. I took a few captures today and I'm no longer seeing mix-blend mode used at all. The pages have a lot of transparency, but that's it.
Some notes from GPU captures:
- yuv shader is heavy
- use
textureLevel()
instead oftexture()
? - is it needed for grayscale images? (two of the planes are solid)
- use
- can we detect/make it opaque? - many images are drawn with blending enabled unnecessary.
Will look more...
Comment 37•5 years ago
|
||
WR times seem heavy-ish but not extraordinary: 3.5ms CPU rendering, 3.7ms CPU backend, 3.7ms GPU time, 1.5ms DL.
I'd say the plan is to investigate the items I posted above.
Comment 38•5 years ago
|
||
On average the frames are not too bad but some frames take very long. I see some with 20ms frame builds, 30ms renderer 40ms GPU.
The content main thread is 100% busy while scrolling, half of the time spent in JS inside scroll handlers, a lot of time inside a function called Gecko_SnapshotAttrHasSubstring
(7.5%), and 30% in display list building.
When the render thread is busy, it spends a bit more than half of its time blobking during swap buffers, the rest in wr_renderer_render
.
The render backend time spends an unusually high amount of time in webrender::picture::TileNode::update_dirty_rects
(50%!) half of which is in pthread_mutex
code according to the profiler, maybe some lock contention caused by memory allocations?
Comment 39•5 years ago
|
||
On the GPU side things don't look terribly suspicious. Rendering 10 picture cache tiles one after another just amounts to a lot of GPU time, in addition to the 3 passes necessary to render text shadows.
There's a fairly large semi-transparent png in the brackground (happily chomping on memory bandwidth), which we can't do much about.
Comment 40•5 years ago
|
||
Are we invalidating most of the tiles every frame? This is the case where bug 1585716 would help.
Updated•5 years ago
|
Updated•4 years ago
|
Comment 41•2 years ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Comment 42•2 years ago
|
||
This pages seems to run well now.
Description
•