Avoid running the garbage collector in content processes that are shutting down
Categories
(Core :: DOM: Content Processes, enhancement, P2)
Tracking
()
People
(Reporter: gsvelto, Unassigned)
References
(Blocks 2 open bugs)
Details
Crash Data
While looking at various content child shutdown kills in bug 1279293 I noticed that quite a few of them were running garbage-collection cycles. Since content process shutdown has to be as fast as possible I was wondering if it's possible to reduce the number of GCs.
Aborting current iterations via JS::AbortIncrementalGC()
might also be a good idea.
Content process shutdown can be detected by listening to the "content-child-will-shutdown" notification.
Comment 1•5 years ago
|
||
Could you give some examples of these crashes? I'm curious what kinds of GCs are being run.
Comment 2•5 years ago
|
||
So I guess https://searchfox.org/mozilla-central/rev/df94cd5ba431234bc220ac081def0801fe44b89e/dom/base/nsJSEnvironment.cpp#2663
could observe also content-child-will-shutdown and abort incremental GC in https://searchfox.org/mozilla-central/rev/df94cd5ba431234bc220ac081def0801fe44b89e/dom/base/nsJSEnvironment.cpp#382
(Though I don't quite understand why, since I assume there can be other GCs during shutdown, and they can be slow if there is lots of garbage to collect.)
Comment 3•5 years ago
|
||
Yeah, I'm wondering how many of these GCs are long running GCs that happen to be in progress (or maybe somebody closes a tab that is hanging because of a GC) compared to how many might be happening because we do some weird GC during shutdown for some reason. There's a lot of the latter, but I don't know how much of an issue they cause.
Reporter | ||
Comment 4•5 years ago
|
||
Here's two signatures where we're doing a GC during shutdown:
IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop
IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask
Generally speaking I was just looking for ways to minimize GC activity during shutdown knowing that we're going to kill the process anyway if shutdown takes too long.
It could also help with responsiveness on machines that don't have much memory: over 20% of the shutdown kill reports have less than 512MiB of free physical memory so the process that's being shut down might have been partially paged out. The less memory we touch in that process the less likely we page-in stuff that we know we won't use anymore (and possibly page-out stuff that we still need in other processes).
Comment 5•5 years ago
|
||
The stacks for the first signature look like regular GC work happening off an idle callback. The second one looks interesting because we're waiting for the GC helper threads to come back. I think we've had issues with that taking a while in the past.
It looks like Jon added the mechanism to interrupt GCs in bug 1149752. He might have some ideas about how to improve this.
Comment 6•5 years ago
|
||
Should this bug move to the JavaScript: GC component? Does the GC know when a process is shutting down? Or is this bug about adding such a notification?
Priority P2 because IPCError_ShutDownKill meta bug is P2.
Comment 7•5 years ago
|
||
Maybe we could make nsJSEnvironment listen for content-child-will-shutdown and kill the GC timers? I think that will keep us from running the idle task stuff. Maybe we could even stop running any idle tasks once we get that. If we're getting ready to shut down, arguably we're never really idle!
Updated•5 years ago
|
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Comment 8•5 years ago
|
||
CCGraphBuilder::NoteJSChild is the cycle collector, not the garbage collector. I'm not sure if we want to bundle them together.
Reporter | ||
Comment 9•5 years ago
|
||
Sorry, I assumed the two would always run together.
Comment 10•5 years ago
|
||
On the main thread, running one will typically schedule running the other, so they are linked, but they don't usually run one right after the other. (On workers, they do run together.) The changes I made in nsJSEnvironment to not schedule stuff when we're shutting down will affect both GCs and CCs, but other work, like interrupting a GC in progress, will be specific to one or the other.
Reporter | ||
Comment 11•4 years ago
|
||
Added another signature, maybe it would be worth duplicating bug 1619213 against this because it also seems to track GC-related operations during shutdown.
Comment 12•4 years ago
|
||
Yes, it looks like the same thing (though the recent crashes with signatures in that bug didn't look GC-related...)
Updated•4 years ago
|
Reporter | ||
Comment 13•4 years ago
|
||
Added a signature.
Reporter | ||
Comment 14•4 years ago
|
||
Added another signature.
Comment 15•4 years ago
|
||
Did this get sufficiently better since Bug 1619455 and Bug 1620623 were resolved? Or is it worth while raising a flag with the JS engine itself to tell it not to bother GCing? Since it has internal triggers based on how much has been allocated.
Comment 16•4 years ago
|
||
I looked at some of the residual crashes with these signatures before, and if I recall correctly there were no IPC shutdown state annotations, so I think the content process was too busy to even notice that it was told to shut down (which would happen when we run ContentChild::RecvShutdown()).
For instance, in this one I checked just now it looks like it is busy running a GC in ContentChild::RecvFlushMemory bp-f6486beb-bcd1-42aa-8b56-2faeb0210101
Reporter | ||
Comment 17•4 years ago
|
||
I second Andrew's comment, the vast majority of the remaining crashes have no IPCShutdownState annotation so the affected content process didn't even notice it was being shutdown. I think we can mark it as resolved, the remaining volume is very small anyway.
Updated•4 years ago
|
Comment 18•4 years ago
|
||
I could imagine a world where we use the background hang monitor to tell a child process to shut down even when the main thread is hanging, by interrupting the GC or JS or something, but that's a bunch of different work I think.
Description
•