Open Bug 1839036 Opened 1 year ago Updated 1 year ago

High CPU use when browser is idle

Categories

(Core :: JavaScript: GC, defect, P3)

Firefox 113
defect

Tracking

()

UNCONFIRMED
Performance Impact low

People

(Reporter: curlypaul924, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf:resource-use)

Attachments

(5 files)

Attached image Screenshot_20230617_160054.png (deleted)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0

Steps to reproduce:

  1. Use the browser for a while (many tabs open)
  2. Firefox eventually uses nearly 500% cpu (8-core machine)
  3. Using about:processes kill all processes (note: none of the processes killed were showing as using CPU)

Actual results:

After killing all processes, Firefox was still using nearly 500% CPU (visible in top and in about:processes).

Expected results:

With no active processes, Firefox should be using close to 0% CPU.

Profile: https://share.firefox.dev/3JjvYUw

The renderer is spending 100% of time in mozilla::wr::RenderThread::HandleFrameOneDocInner.

The main thread is spending its time in g_main_context_poll.

Attached output from perf top for the busiest threads. A lot of time is spent in futex_wait/futex_wait. There is also a lot of time spent scheduling the threads (as if they are waking very briefly and then yielding the cpu). The high cost of the linux scheduler for threads with this work pattern is something that afaict isn't captured by the firefox profiler.
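For anyone who wants to cross-check which threads are busy without perf, here is a minimal sketch that ranks threads by CPU ticks read from /proc/<pid>/task/<tid>/stat. It is not what produced the attached output; the function names (parse_thread_stat, snapshot, busiest_threads) are made up for illustration, and it assumes the Linux procfs stat layout, where utime and stime are fields 14 and 15.

```python
import os


def parse_thread_stat(stat_line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting the whole line on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    fields = stat_line[close_paren + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return name, utime + stime


def snapshot(pid):
    """Read every thread of `pid` into {tid: (name, ticks)} (Linux only)."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                result[int(tid)] = parse_thread_stat(f.read())
        except OSError:
            pass  # the thread exited between listdir and open
    return result


def busiest_threads(before, after, top_n=5):
    """Rank threads by ticks consumed between two snapshots, busiest first."""
    deltas = []
    for tid, (name, ticks) in after.items():
        if tid in before:
            deltas.append((ticks - before[tid][1], tid, name))
    deltas.sort(reverse=True)
    return deltas[:top_n]
```

Taking snapshot(pid) twice a few seconds apart and passing both results to busiest_threads gives a rough per-thread ranking. It only reports ticks charged to each thread, so it complements perf top rather than replacing it.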

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core
Flags: needinfo?(curlypaul924)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #8)

A possible dupe of Bug 1826291.

Hmm, interesting thought. On the surface the conditions are different (I did not have any active background windows), but the effect is similar.

I agree that aggressively swapping buffers for a window that is actively drawing but not visible is not ideal. In this case all the background windows are idle/inactive. There should be zero work done for windows that are inactive, as there is no active content to render.

Can you try Wayland backend?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries

Thanks.

Will the Wayland backend work under X11? I am not running Wayland.

Flags: needinfo?(curlypaul924)

(In reply to Paul Brannan from comment #9)

Will the Wayland backend work under X11? I am not running Wayland.

No, Wayland needs a different environment. You need a Wayland compositor running (Sway, for instance, or Mutter in Wayland mode), see:
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_different_Wayland_compositor

I realized the profiler's screenshots obscure the work the browser is doing. Here is an updated profile with screenshots disabled: https://share.firefox.dev/3Nl39Ie

The profile above has two tabs showing content (one from reddit, one from HN), and extensions are enabled. I tried to get another profile with extensions disabled and showing all threads: https://share.firefox.dev/3NEFx2O

But the second profile seems to have hit a different bug. The renderer and compositor threads are now idle, and there are multiple pool-firefox threads visible in top (afaik I've never seen this before).

It's possible this bug is related to bug#1581169 as right-clicking and saving images is part of the process I used to reproduce the bug.

Performance Impact: --- → ?

(In reply to Paul Brannan from comment #11)

I realized the profiler's screenshots obscure the work the browser is doing. Here is an updated profile[...]

Thanks. So based on the profiles from comment 11, there seems to be quite a lot of garbage collection happening in the parent process, which is showing up as a fair amount of jank. Let's classify under JS:Garbage Collection, assuming comment 11 is representative of the same original issue here.

That wouldn't correspond to your original report of 500% CPU usage (I'd expect at most 100% from that single process), but it is a user-visible perf issue and it looks like more than I would expect.

Iain, could you take a look at the comment 11 profiles (particularly the second one with extensions disabled) and see if you have any theories about what's going on?

Component: Widget: Gtk → JavaScript: GC

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

[x] Causes severe resource usage

Performance Impact: ? → low

The second profile in comment 11 looks to me like the parent process is otherwise idle and we are taking advantage of the opportunity to do a major GC. I'm a little surprised that we're managing to spend 7 mostly uninterrupted seconds running a GC without finishing, but maybe I'm underestimating the scope of the heap. The first second is spent marking, and then we keep sweeping until the end of the profile.

Looking at the first profile, I note that the parent process is doing some non-GC work, but it's still the case that every half-second we're running a GCSlice with a 100ms budget. We're already sweeping when the profile starts, and we're still sweeping when it ends.
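To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every slice uses its full 100ms budget and that slices start exactly every 500ms, neither of which the profile necessarily shows; the helper names are hypothetical.

```python
def gc_duty_cycle(budget_ms, interval_ms):
    """Fraction of main-thread time spent in GC slices, assuming full budgets."""
    return budget_ms / interval_ms


def slices_in_window(window_s, interval_ms):
    """Number of slices that start within a window of window_s seconds."""
    return int(window_s * 1000 // interval_ms)


# Figures taken from the comment above: 100ms budget, one slice per half-second.
duty = gc_duty_cycle(100, 500)
slices = slices_in_window(7, 500)
gc_time_ms = slices * 100
print(duty, slices, gc_time_ms)
```

Under those assumptions the parent process would spend about a fifth of its main-thread time in GC slices, so a sweep that is still unfinished after several seconds has already consumed well over a second of slice time.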

Jon, is it concerning/noteworthy if the parent process spends 6-10+ seconds sweeping during a single major GC?

Flags: needinfo?(jcoppeard)

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit BugBot documentation.

Priority: P3 → --

I tried to reproduce the bug today in a controlled experiment (fresh profile, no extensions) but with no success. I visited multiple websites and saved over 500 images in a single directory using right-click and save as. The browser became very slow when saving files, same as bug#1581169, but CPU usage remained low after the file was saved. So it appears this is a different bug from that one at least. I did get a profile from saving an image and will add it to that bug report.

I will continue trying to reproduce the bug and see if there's a particular website that triggers it. For now I don't know, just that after browsing for a while firefox gets sluggish, and when I look at about:processes to see if there's a busy process, there is no obvious culprit.

(In reply to Iain Ireland [:iain] from comment #14)

The second profile in comment 11 looks to me like the parent process is otherwise idle and we are taking advantage of the opportunity to do a major GC. I'm a little surprised that we're managing to spend 7 mostly uninterrupted seconds running a GC without finishing, but maybe I'm underestimating the scope of the heap. The first second is spent marking, and then we keep sweeping until the end of the profile.

This is unusual to say the least. Telemetry shows the 95th percentile of sweep time in the parent process is 34 milliseconds. That suggests some kind of memory leak in the parent process.

Looking at the flame graph, most of the time is spent tracking cycle collector gray roots, i.e. C++ objects.

(In reply to Paul Brannan from comment #16)
If you reproduce this again, can you measure the memory use with about:memory and post the results for the parent process?

Flags: needinfo?(jcoppeard)
Blocks: jsperf
Severity: -- → S3
Priority: -- → P3