Compositor CSS animations keep being sampled in fully occluded windows
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
People
(Reporter: florian, Assigned: bradwerth)
References
(Blocks 2 open bugs)
Details
(Keywords: perf:resource-use)
Attachments
(4 files, 3 obsolete files)
Steps to reproduce (on a platform where detecting window occlusion is supported, I think that's Mac and Windows) :
- load a page with a CSS animation that can run on the compositor, eg.
data:text/html,<head><style>body{animation: fadein 1s infinite;} @keyframes fadein{from{opacity: 0;}}</style><body>Text
- cover the window entirely with another window.
- see that the CPU usage barely drops (I checked that using about:processes in another window).
When the same tab with CSS animation is put in the background by opening another tab in the same window, the CPU use drops significantly.
After doing some debugging, I noticed that removing the nsIRemoteTab.preserveLayers call at https://searchfox.org/mozilla-central/rev/c7183440fe8ad391e2ab2e990229dd85fac318d3/browser/base/content/tabbrowser.js#5636-5639 makes the animations stop when the window is occluded.
When the window is occluded, the refresh driver ticks are throttled in the content process, only the compositor/renderer threads remain active at 60Hz.
Example profile where I made the animated tab inactive first by occluding the window (the IPCs related to sending vsync notifications to the child process stop, but the renderer and compositor activity continues) and another time by switching to an about:blank tab (this time the compositor and renderer thread activity stopped): https://share.firefox.dev/3kVgGIY
Is it a bug of the PreserveLayers code that in addition to keeping the layers it keeps the animations active, or should the front-end not call preserveLayers when a window is fully occluded?
Comment 1•3 years ago
|
||
I'm confused, how's that call the relevant one? Shouldn't we have a tab switcher, in which case the event is handled here? https://searchfox.org/mozilla-central/rev/c7183440fe8ad391e2ab2e990229dd85fac318d3/browser/modules/AsyncTabSwitcher.jsm#852
Reporter | ||
Comment 2•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
I'm confused, how's that call the relevant one? Shouldn't we have a tab switcher, in which case the event is handled here?
My understanding is that an AsyncTabSwitcher object is created whenever a tab switch is initiated, and it self detroys before sending the TabSwitchDone event, typically 300ms after the new tab became visible. (This behavior is visible in this part of the profile: https://share.firefox.dev/3LYRQUx)
Comment 3•3 years ago
|
||
Huh, that's weird... But yeah I think preserveLayers()
should be avoided, that keeps compositing. But I don't know why it's there to begin with.
Comment 4•3 years ago
|
||
I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?
Updated•3 years ago
|
Reporter | ||
Comment 5•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?
Bug 1580117 landed Mac-specific code to pause the compositor in minimized windows.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)
Huh, that's weird... But yeah I think
preserveLayers()
should be avoided, that keeps compositing. But I don't know why it's there to begin with.
When I remove the preserveLayers()
, the content area becomes blank while the window is occluded, this is visible in the screenshots of this profile: https://share.firefox.dev/3N2YGZ7
That's likely what bug 1098131 worked around by adding the preserveLayers call. Could we instead suppress the paints as soon as the window is occluded?
Reporter | ||
Comment 6•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?
I tested on Windows, here's a profile: https://share.firefox.dev/3PbA4zd We keep compositing when the window is minimized, but the minimized window is resized to a 1px height, so the screenshots look empty during that time.
Reporter | ||
Comment 7•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #5)
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
I guess it comes from bug 1098131... Does minimizing show the same effect, actually? Or does something else stop the compositor?
Bug 1580117 landed Mac-specific code to pause the compositor in minimized windows.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)
Huh, that's weird... But yeah I think
preserveLayers()
should be avoided, that keeps compositing. But I don't know why it's there to begin with.When I remove the
preserveLayers()
, the content area becomes blank while the window is occluded, this is visible in the screenshots of this profile: https://share.firefox.dev/3N2YGZ7That's likely what bug 1098131 worked around by adding the preserveLayers call. Could we instead suppress the paints as soon as the window is occluded?
Looks like the needinfo chain broke here, re-adding.
Reporter | ||
Comment 8•2 years ago
|
||
This wastes enough CPU / power that I don't think the severity could be less than S2. Also nominating for performance triage.
Comment 9•2 years ago
|
||
I'm not so familiar with the compositor side of things, but presumably? We have compositor-pausing machinery on various platforms, IIRC.
preserveLayers()
keeping compositing animations should affect Android as well (they always preserveLayers()
iirc). Jamie, do you know if/how Android pauses the compositor of background tabs?
Comment 11•2 years ago
|
||
To pause the compositor the java layer calls in to SyncPauseCompositor()
which will result in CompositorBridgeParent::PauseComposition()
being called on the compositor thread. This will then call through to RendererThread::Pause()
on the render thread. RenderThread::UpdateAndRender()
then checks whether it is paused here and skips rendering if so.
However, this only skips rendering frames, but not generating the frames (which I think would be when we sample compositor animations). I'm afraid I don't know whether we have a mechanism for stopping the rest of the work the compositor thread (and other internal webrender threads) perform. I think the ClearCachedResources()
command should take care of clearing compositor animations, but not sure under what circumstances it is called. Hope that helps!
Comment 12•2 years ago
|
||
I wonder which platforms are affected here - on Wayland the WaylandVsyncSource
stops sending NotifyVsync()
events in this case which IIUC would avoid generating the frame.
Reporter | ||
Comment 13•2 years ago
|
||
(In reply to Robert Mader [:rmader] from comment #12)
I wonder which platforms are affected here - on Wayland the
WaylandVsyncSource
stops sendingNotifyVsync()
events in this case which IIUC would avoid generating the frame.
I don't think I tested Linux. From what I remember, on Mac we keep compositing when a window is occluded but stop compositing when the window is minimized. On Windows we keep compositing for CSS animations both when the window is occluded and when it is minimized.
Comment 14•2 years ago
|
||
The Performance Priority Calculator has determined this bug's performance priority to be P2. If you'd like to request re-triage, you can reset the Performance flag to "?" or needinfo the triage sheriff.
Platforms: [x] Windows [x] macOS [x] Linux [x] Android
[x] Causes severe resource usage
[x] Bug affects multiple sites
Assignee | ||
Comment 16•2 years ago
|
||
There doesn't appear to be a way to check the non-occluded region, so any
part of the bounding rectangle of the window will resume the compositor.
Assignee | ||
Comment 17•2 years ago
|
||
macOS fixup seems easy; I'll try to create fixes for the other platforms also.
Reporter | ||
Comment 18•2 years ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #17)
macOS fixup seems easy; I'll try to create fixes for the other platforms also.
Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?
Assignee | ||
Comment 19•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #18)
Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?
I'll attempt that instead.
Assignee | ||
Comment 20•2 years ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #19)
(In reply to Florian Quèze [:florian] from comment #18)
Would it be possible to do this in cross platform code, eg. https://searchfox.org/mozilla-central/rev/08362489086b10de96e7a199b267ea5504c01583/xpfe/appshell/AppWindow.cpp#2992 ?
I'll attempt that instead.
I'm not sure this can be done purely in AppWindow
. It seems to me that each widget window class has its own widget that hosts the CompositorBridgeChild
. For example, for macOS it's the last child of the window's content view. That's specific enough that each widget window class should really manipulate it directly, rather than trying to replicate that logic within AppWindow
. I'll build out the patches for each widget window class and if somebody can find a more centralized way to do it, maybe that will come out in the review process.
Assignee | ||
Comment 21•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #11)
To pause the compositor the java layer calls in to
SyncPauseCompositor()
which will result inCompositorBridgeParent::PauseComposition()
being called on the compositor thread.
Getting the call through to CompositorBridgeParent::PauseComposition()
is sufficient to reduce the CanvasThread
traffic on other platforms, and that appears to be the majority of the CPU work noted in comment 0. Maybe there's more CPU usage to target, but we can definitely get this piece. Meanwhile, GeckoView doesn't seem to have any existing concept of occlusion. Can you sketch a model of how to enable occlusion detection on Android?
Assignee | ||
Comment 22•2 years ago
|
||
Depends on D169123
Assignee | ||
Comment 23•2 years ago
|
||
Depends on D169260
Comment 24•2 years ago
|
||
Just making sure I'm on the right page.. Occlusion in this case refers to the OS window being occluded because another window is in front of it (or it is minimised)? I don't think windows being in front of eachother is really a thing on Android. Perhaps on some samsung tablets you can have multiple windows overlapping eachother, but I'm not sure that's exposed via an API or whether we care enough anyway. For the equivalent of a window being minimised, SyncPauseCompositor()
will be called when an app is no longer visible, and SyncResumeResizeCompositor()
will be called when it becomes visible again.
Comment 25•2 years ago
|
||
Getting the call through to CompositorBridgeParent::PauseComposition() is sufficient to reduce the CanvasThread traffic on other platforms
By this do you mean it does not reduce the CanvasThread traffic on Android, or just that you haven't tested?
Assignee | ||
Comment 26•2 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #25)
Getting the call through to CompositorBridgeParent::PauseComposition() is sufficient to reduce the CanvasThread traffic on other platforms
By this do you mean it does not reduce the CanvasThread traffic on Android, or just that you haven't tested?
Thanks for the info. I meant that I haven't tested it, so I don't know. Also, I misspoke: the traffic is on the Render thread. Really, I'm just responding to the note in comment 14 that this bug affects all platforms. It's possible that that is also a misstatement and this is not an issue on Android.
I have a new version of the patch that pauses/resumes the compositor when CanonicalBrowsingContext::RecomputeAppWindowVisibility
decides the window is visible or not. So that should be a cross-platform solution, though I'm not sure under what conditions that function is hit on Android.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 27•2 years ago
|
||
Assignee | ||
Comment 28•2 years ago
|
||
Depends on D169344
Assignee | ||
Comment 29•2 years ago
|
||
Depends on D169345
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 30•2 years ago
|
||
Comment 31•2 years ago
|
||
Backed out for causing reftest failures in 1478035.html
- Backout link
- Push with failures
- Failure Log
- Failure line: REFTEST TEST-UNEXPECTED-FAIL | gfx/tests/crashtests/1478035.html | load failed: timed out waiting for reftest-wait to be removed
Assignee | ||
Comment 32•2 years ago
|
||
This failure of the 1478035.html test is super weird. The crash log shows that the test is hitting JS errors in the anti-tracking cookie PurgeTrackerService and in the remote settings Utils. I can't explain why the code changes would cause those JS errors. The code probably doesn't cause these errors. I see there's a resolved incomplete intermittent orange Bug 1584357. I can't see the logs from that Bug to determine if they are the same logs here. Cristian, is this just another expression of Bug 1584357, which needs to be re-opened so that this code can land?
Comment 33•2 years ago
|
||
Hi, yes we have a suggestion for that but the suggestion is for an Intermittent failure and I have 2 sets of backfills for that and both look permanent and both seem to stop at your push, can you please take another look?
Backfill1: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&collapsedPushes=1155668&fromchange=25203291c831be25baa38f36f8155ee14d8c5359&searchStr=windows%2C10%2Cx86%2C2004%2Cwebrender%2Copt%2Creftests%2Ctest-windows10-32-2004-qr%2Fopt-crashtest%2Cc&tochange=97a258c20ec57449539ed763fdeeaea5fdb73f96&selectedTaskRun=H5PBJWoqSe63O_ABSkixtw.0
Backfill2: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&collapsedPushes=1155668&searchStr=windows%2C10%2Cx64%2C2004%2Cwebrender%2Copt%2Creftests%2Ctest-windows10-64-2004-qr%2Fopt-crashtest%2Cc&tochange=97a258c20ec57449539ed763fdeeaea5fdb73f96&fromchange=25203291c831be25baa38f36f8155ee14d8c5359
Assignee | ||
Comment 34•2 years ago
|
||
Yes, local testing has shown that there's some kind of interference with the crashtests that I don't understand yet. I'll try to sort it out and make the patches safe to land.
Assignee | ||
Comment 35•2 years ago
|
||
I'm still not sure why manipulating the compositor causes the cookie purge tracker to crash. I'm adding local logging to the purge tracker to see what the change is in origin attributes that is causing this. Ultimately, the purge tracker should be wrapping this call in a try-catch block and if I can figure out the right way to do that, I'll open a new Bug on that issue and mark it as a blocker.
Assignee | ||
Comment 36•2 years ago
|
||
I've figured this out but I'm not sure how to fix it yet. This timeout is happening because the crashtests are allowing the sending of the idle-daily
event, which triggers the PurgeTrackerService
, which then fails. None of these tests should be sending the idle-daily
event, which is disabled by setting the idle.lastDailyNotification
pref to -1, which is done for most of the test harnesses but not for crashtests. I'm trying to figure out how to fix this, which obviously should be a separate otherwise-unrelated Bug, but which will lead to timeouts in crashtests until it is fixed.
Assignee | ||
Comment 37•2 years ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #35)
Ultimately, the purge tracker should be wrapping this call in a try-catch block and if I can figure out the right way to do that, I'll open a new Bug on that issue and mark it as a blocker.
I have filed Bug 1818782 on this issue. It shouldn't be a blocker because the real fix will be ensuring that the idle-daily
event is never sent and the PurgeTrackerService
is never activated during crashtests. Still working on that Bug.
Assignee | ||
Comment 38•2 years ago
|
||
It looks like preventing the idle-daily
event is not sufficient to solve this Bug. Local testing with the patch for Bug 1818787 applied shows that the crashtest still times out, and a different idle event is triggering after over 3 minutes. So the fix will require something to ensure that a requestAnimationFrame
is always successful and won't fail silently if the compositor is paused.
Assignee | ||
Comment 39•2 years ago
|
||
This crashtest relies upon a requestAnimationFrame firing even when the window is
occluded, which is something that we don't yet handle.
Assignee | ||
Comment 40•2 years ago
|
||
There are a total of 10 crashtests that timeout on requestAnimationFrame
with this patch applied. I'll update the final part of the patch to set the pref on each of those tests individually, but that's a lot of tests. I'm removing Bug 1819154 as a dependency, because if it was fixed first, this wouldn't be necessary.
Updated•2 years ago
|
Comment 41•2 years ago
|
||
Assignee | ||
Comment 42•2 years ago
|
||
Ugh, I screwed up the landing request because Part 4 wasn't parented correctly. It can land independently from parts 1 through 3, which should then work with part 4 already in place.
Comment 43•2 years ago
|
||
Comment 44•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/425fd934f385
https://hg.mozilla.org/mozilla-central/rev/21c21d6a428a
https://hg.mozilla.org/mozilla-central/rev/52f7a952958b
https://hg.mozilla.org/mozilla-central/rev/1d55841947c3
Comment 46•2 years ago
|
||
bug 1828587 backed this out of beta and release for 113.0b7 and 112.0.2 dot release
Backout link for beta: https://hg.mozilla.org/releases/mozilla-beta/rev/e6bc65cb318a
Backout link for release: https://hg.mozilla.org/releases/mozilla-release/rev/6b8f889688ed
Comment 47•2 years ago
|
||
Since it's still in Nightly I think we should keep this closed, but feel free to reopen if you disagree.
Updated•2 years ago
|
Comment 48•1 years ago
|
||
We backed the patch out in 114 beta in https://hg.mozilla.org/releases/mozilla-beta/rev/a77cb72b0b19
Updated•1 years ago
|
Updated•1 years ago
|
Updated•1 year ago
|
Description
•