Open Bug 1689164 Opened 4 years ago Updated 2 years ago

[meta] Pre-allocated content processes are not always idle but consume a bit of CPU

Categories

(Core :: DOM: Content Processes, defect, P3)

Tracking

Fission Milestone Future

People

(Reporter: whimboo, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta, power, Whiteboard: fission-perf)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:86.0) Gecko/20100101 Firefox/86.0 ID:20210122212755

With Fission turned on in Firefox Nightly, I have noticed that the 3 pre-allocated content processes are not always idle. They consume a bit of CPU all the time (at most 0.1%), which I think should not happen.

There was some discussion on Matrix between Nika and Florian, and it looks like Nika figured it out, so I will let her reply here.

Flags: needinfo?(nika)
Fission Milestone: --- → ?

There are a bunch of different issues that contribute to this. We should track it as a meta bug so they can be fixed individually.

Keywords: meta
Summary: Pre-allocated content processes are not always idle but consume a bit of CPU → [meta] Pre-allocated content processes are not always idle but consume a bit of CPU

With help from Florian and others, we first checked which threads were waking up, then ran a quick profile to find the specific messages being sent back and forth from the content process. Florian's profile is here: https://share.firefox.dev/3aaI7Zy. A couple of cases in it lead to the content process being woken up:

The preallocated content process periodically receives messages broadcast by the parent process. The main culprits are:

  1. GetMemoryUniqueSetSize (bug 1689182) - this is called by memory telemetry on a regular interval to collect the unique set information for each content process, and is implemented by pinging each process and asking it to compute its own unique set size. In bug 1652813 support was added to get this information from the parent process without waking content processes, so we should switch the memory collection telemetry over to using this approach.
  2. DataStoragePut (bug 1689191) - this is called by mozilla::DataStorage on a regular basis, and broadcasts a message to all content processes to update some state related to our SSL implementation. I don't know enough about this data to know why it changes so frequently, or whether we could avoid broadcasting it to every process. This message only fires a couple of times in the profile Florian captured, but it was firing in bursts every second or so when I recorded my live profile, which leads me to believe the rate of updates is related to the browser being in use.
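
To illustrate the difference between the two collection strategies described in item 1, here is a minimal sketch as a toy Python model. It is not Gecko code: `ContentProcess`, `handle_message`, `collect_by_broadcast` and `collect_from_parent` are hypothetical names, and the model simply assumes the parent can read each process's unique set size without involving the child.

```python
from dataclasses import dataclass


@dataclass
class ContentProcess:
    """Toy stand-in for a content process; not a Gecko type."""
    pid: int
    unique_set_size: int  # bytes; assumed to be readable from the parent side
    wakeups: int = 0

    def handle_message(self, name: str) -> int:
        # Any IPC message, however cheap to answer, wakes the process.
        self.wakeups += 1
        return self.unique_set_size


def collect_by_broadcast(processes):
    """Ping every process and let it report its own unique set size."""
    return {p.pid: p.handle_message("GetMemoryUniqueSetSize") for p in processes}


def collect_from_parent(processes):
    """Read the value on the parent side without sending any message."""
    return {p.pid: p.unique_set_size for p in processes}
```

Both functions return the same data in this model, but only the broadcast variant increments `wakeups` on every process, including idle pre-allocated ones.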

In addition, the preallocated content process appears to wake itself with a timer on a somewhat regular basis to send AccumulateChildKeyedHistograms and RecordDiscardedData messages to the parent process. These appear to fire in pairs every 2 seconds or so. I was initially worried that this could be caused by a loop in which sending telemetry data over IPC itself collects telemetry; however, the telemetry code explicitly waits to disarm the timer until after the IPC data has been sent in order to avoid looping (https://searchfox.org/mozilla-central/rev/b9384b091e901b3283ce24b6610e80699d79fd06/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp#301-302). The most likely explanation is that receiving the other IPC messages mentioned here causes the timer to start firing again, although this should be verified once those issues are fixed.
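
The disarm-after-send ordering described above can be sketched with a toy Python model. This is not the TelemetryIPCAccumulator code: the class, its fields, and the assumption that sending a batch records exactly one extra sample are all hypothetical, chosen only to show why the ordering matters.

```python
class IPCAccumulator:
    """Toy model of a batching accumulator; all names are hypothetical."""

    def __init__(self, disarm_after_send: bool):
        self.disarm_after_send = disarm_after_send
        self.pending = []
        self.timer_armed = False
        self.timer_fires = 0

    def accumulate(self, sample: str) -> None:
        self.pending.append(sample)
        # Arm the batch timer only if one is not already pending.
        if not self.timer_armed:
            self.timer_armed = True

    def _send_batch(self) -> None:
        self.pending.clear()
        # Assumption: sending is itself instrumented and records a sample.
        self.accumulate("ipc_message_size")

    def on_timer(self) -> None:
        self.timer_fires += 1
        if not self.disarm_after_send:
            # Disarming too early: the send below re-arms the timer.
            self.timer_armed = False
        self._send_batch()
        if self.disarm_after_send:
            # Samples recorded while sending stay queued for a later batch.
            self.timer_armed = False


def run(acc: IPCAccumulator, max_ticks: int = 10) -> int:
    """Record one sample, then fire the timer for as long as it stays armed."""
    acc.accumulate("initial_sample")
    ticks = 0
    while acc.timer_armed and ticks < max_ticks:
        acc.on_timer()
        ticks += 1
    return acc.timer_fires
```

In this model `run(IPCAccumulator(disarm_after_send=True))` fires the timer once and stops, while the early-disarm variant keeps firing until `max_ticks` is reached. Any later `accumulate()` call re-arms the timer in either variant, which is consistent with the hypothesis that incoming broadcast messages are what restart it.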

Flags: needinfo?(nika)
Depends on: 1689446

M8 unless we find this is a bigger problem than we currently believe.

Severity: -- → S3
Fission Milestone: ? → M8
Priority: -- → P3
Whiteboard: fission-perf

Randell, what are your thoughts on this with Nika's findings in comment 2?

Flags: needinfo?(rjesup)

This is almost solely an issue with power use. This gets more important on mobile devices, but can have a small impact on laptops. These are relatively cheap operations on a preallocated process, so the power impact isn't high, especially if the browser is otherwise active. Disabling these until the process is allocated to something is very feasible, but adds some small complexity and also increases the amount of overhead when allocating a process from the preallocation cache, which may cause a small regression for some page loads.

M8 or even MVP seems reasonable.

Flags: needinfo?(rjesup)

Not a user-perceivable performance impact so pushing it to MVP.

Fission Milestone: M8 → MVP

I can still see PContent::Msg_FlushFOGData observer notifications being dispatched from these pre-allocated processes. Here is a profile from a recent Nightly build that shows them: https://share.firefox.dev/3AsTxnM

It looks like this is coming from Glean, so I assume we should try to stop these notifications?

Chris, could you please have a look at my last comment? Thanks!

Flags: needinfo?(chutten)

Yup, this sure appears to be from FOG, the layer integrating the Glean SDK into Firefox Desktop.

FOG will, after 5s of idle, ask content processes to hand up any data they have kicking around. We use ContentParent::GetAll to get a list of all the content processes that might be harbouring unsent data and ask them all to flush. Most have nothing to send at the moment, but soon any might.

Is there a way to identify processes as being pre-allocated? We can exclude them from the iteration easily enough. We've been deliberately vague in the documentation about how we schedule these flushes, so we have flexibility here (though we will want to make a specific note that telemetry accumulated in pre-allocated content processes will be specifically excluded).
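
As a minimal sketch of what excluding pre-allocated processes from the flush iteration could look like, here is a toy Python model. It is not FOG or Gecko code: the `is_preallocated` flag is a hypothetical stand-in for whatever mechanism (if any) identifies such processes, and `flush_targets` and `request_flush` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ContentProcess:
    """Toy stand-in for a content process; not a Gecko type."""
    pid: int
    is_preallocated: bool  # hypothetical flag, assumed for illustration
    flush_requests: int = 0


def flush_targets(all_processes: Iterable[ContentProcess]) -> List[ContentProcess]:
    """Keep only processes that have been handed out to real content."""
    return [p for p in all_processes if not p.is_preallocated]


def request_flush(all_processes: Iterable[ContentProcess]) -> int:
    """Ask each non-pre-allocated process to flush; return how many were asked."""
    targets = flush_targets(all_processes)
    for process in targets:
        process.flush_requests += 1  # models sending one flush message
    return len(targets)
```

The trade-off noted above still applies to this sketch: anything accumulated in a process while it is still pre-allocated is not collected by these flushes.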

We're also interested in learning more about how to best do efficient and non-intrusive opportunistic IPC flushes, like in bug 1641989. So if the current approach is completely wrong and should be revisited, we'll simply need to prioritize the work. We're not attached to it the way it is : )

Flags: needinfo?(chutten)

Chris, thank you for the quick reply! I just noticed that there is one dependency open for this meta, which is bug 1689446. I wonder if the Telemetry part should block that other bug (which in turn could be a meta bug?).

Or be blocked by. It seems as though bug 1689446 is thinking about rethinking GetAll in a way that FOG could then use...

Moving this meta bug from Fission MVP to Future. The one remaining blocking bug 1689446 doesn't need to block Fission MVP.

Fission Milestone: MVP → Future
Depends on: 1736868
No longer depends on: 1736868
Depends on: 1736868
Depends on: 1817297
Depends on: 1822062