Lots of wake-ups triggered by garbage collection in a background tab
Categories
(Core :: JavaScript: GC, defect, P3)
People
(Reporter: florian, Unassigned)
References
(Depends on 3 open bugs, Blocks 1 open bug)
Attachments
(1 file: deleted, application/x-gzip)
@metasieben in #joy-of-profiling wrote:
here another >sitting in the background for days< profile, this time twitch taking apparently 2GB, maybe useful? https://share.firefox.dev/3aFWitv
I'm surprised that we have so many GCSlice markers of only a few microseconds each at the beginning. Is there something we can do about it?
It's also unfortunate that the machine appears to be under heavy load (out of 599ms of Awake markers, we only get 40ms of actual CPU time), but we kept this background process at the "User Interactive" Quality of Service. I guess that's bug 1394714.
Comment 1•2 years ago
Forwarding the needinfo to Steve, as Paul is no longer working on GCs.
Comment 2•2 years ago
Thanks Florian,
It looks like maybe it starts to do a CC, then bails out of a GC, but whatever told it that it wants to do a CC keeps retriggering. I'll have to spend some time reading the code to understand.
It might also be helpful to have markers about the management of idle time.
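For illustration, such a marker could be as simple as an RAII object timing each idle-time decision. A minimal sketch, where RecordMarker and AutoIdleMarker are invented names rather than the real Gecko profiler API:

#include <chrono>
#include <cstdio>

// Hypothetical stand-in for a profiler backend's marker hook; the real
// Gecko profiler API differs.
void RecordMarker(const char* name, double durationMs) {
  std::printf("marker %s: %.3f ms\n", name, durationMs);
}

// RAII helper: emits a marker covering the scope's lifetime.
class AutoIdleMarker {
 public:
  explicit AutoIdleMarker(const char* name)
      : name_(name), start_(std::chrono::steady_clock::now()) {}
  ~AutoIdleMarker() {
    auto end = std::chrono::steady_clock::now();
    RecordMarker(name_,
                 std::chrono::duration<double, std::milli>(end - start_).count());
  }
 private:
  const char* name_;
  std::chrono::steady_clock::time_point start_;
};

// Usage: { AutoIdleMarker m("IdleSliceBudget"); /* decide/consume budget */ }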
Updated•2 years ago
Reporter
Comment 3•2 years ago
I just noticed that there's a TaskController thread stuck in js::gc::BackgroundUnmarkTask::run (a 13s Runnable) attempting to use CPU there, and that the very short repeated GCSlice markers on the main thread stop when the TaskController runnable has finished. Should the main-thread code more explicitly wait for that runnable to finish, and avoid triggering EnsureCCRunner::CCRunnerFired timers until it is done?
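One shape that could take, purely as a sketch (none of these names are the real EnsureCCRunner or TaskController APIs), is gating the timer callback on a completion flag owned by the background task:

#include <atomic>

// Hypothetical sketch; names are invented, not Gecko/SpiderMonkey APIs.
struct BackgroundUnmarkState {
  std::atomic<bool> running{false};  // set at task start, cleared on completion
};

// Called when the CC runner timer fires. Returns false to ask the caller to
// reschedule instead of doing a useless wake-up while unmarking is in flight.
bool MaybeFireCCRunner(const BackgroundUnmarkState& unmark) {
  if (unmark.running.load(std::memory_order_acquire)) {
    return false;  // still unmarking: skip this firing
  }
  // ... run a CC runner step here ...
  return true;
}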
Comment 4•2 years ago
Some more info about the profile:
It was generated via about:processes after switching to the backgrounded Twitch page.
STR (using a Mac mini M1 with 8GB RAM):
- open a vod https://www.twitch.tv/videos/1486195733
- play the video
- pause the video
- switch to another tab
- leave the browser running for a few days
Sadly the STR aren't more specific, and the issue is probably not easily reproducible.
Comment 5•2 years ago
This is interesting. We're waiting for the unmarking background task to finish, so idle time GC slices are returning immediately. Normally this task doesn't take so long to complete.
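In other words, each scheduled slice presumably reduces to something like this sketch (invented names, not the actual slice code), which would explain the stream of microsecond-long GCSlice markers:

#include <atomic>

// Illustrative only; gUnmarkRunning stands in for however the real code
// tracks the background unmark task.
std::atomic<bool> gUnmarkRunning{true};

void RunGCSlice() {
  if (gUnmarkRunning.load(std::memory_order_acquire)) {
    return;  // bail immediately: shows up as a microseconds-long GCSlice
  }
  // ... do real incremental GC work within the idle budget ...
}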
Comment 6•2 years ago
The background unmarking task is effectively a memset(ptr, 0, size) loop over various memory regions: it looks up some pointers and writes zeroes to them. You can see in the profile that the compiler inlines it all. It's usually not even much memory (each region is fairly small, since it's a bitmap). And yet the profile shows 2 full seconds spent just doing this unmarking, which makes me think that either (1) the machine is swapping heavily, or (2) the thread is deprioritized behind other things happening so that it isn't actually running (it's schedulable, so it doesn't show up as idle). The CPU graph shows mostly around 15% utilization, peaking at about 30%.
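As a rough model of what that loop does (all types, fields, and sizes below are invented; the real Arena/Chunk layout differs):

#include <cstddef>
#include <cstring>

// Invented model of background unmarking: each arena's mark bits live in
// the owning chunk's bitmap, not in the arena itself.
struct Chunk {
  unsigned char markBits[32 * 1024];  // per-chunk mark bitmap (size invented)
};

struct Arena {
  Arena* next;        // intrusive list link
  Chunk* chunk;       // owning chunk
  size_t bitsOffset;  // where this arena's bits start within chunk->markBits
  size_t bitsLength;  // bytes of bitmap belonging to this arena
};

void UnmarkArenaList(Arena* head) {
  for (Arena* a = head; a; a = a->next) {  // list walk touches each arena header
    // Zero this arena's slice of the chunk's mark bitmap.
    std::memset(a->chunk->markBits + a->bitsOffset, 0, a->bitsLength);
  }
}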
Comment 7•2 years ago
@mstange tracked down the code that is being executed:
0000000001ac44c4 stur xzr, [x13, #-0xa0]
0000000001ac44c8 add x13, x20, #0x38
0000000001ac44cc ldar w13, [x13]
0000000001ac44d0 cbnz w13, loc_1ac44dc
0000000001ac44d4 ldr x12, [x12, #0x10] ; <--- this is where all the samples spend their time
0000000001ac44d8 cbnz x12, loc_1ac4498
loc_1ac44dc:
0000000001ac44dc add x10, x10, #0x1
0000000001ac44e0 cmp x10, #0x23
The instruction ldr x12, [x12, #0x10] looks very much like the call to arena.next(), since the next field of js::gc::Arena is at offset 0x10. That gave me pause initially, because I thought it was already accessing the Arena and so any paging would happen on an earlier access. But actually, the mark bits that are being cleared here are stored in the chunk, so this very well could be the first access of this address.
So it looks like page faults of some sort. If it were thread scheduling, I would expect the sampled addresses to be more spread out. (But I could be full of crap, too.)
Comment 8•2 years ago
If you get it into this state again, could you collect a report from about:memory?
jrmuizel suggested that a profile from Instruments with kernel stacks on would confirm if the unmark thread is going so slowly because of swapping. I don't have an OSX machine so I don't know how to do that, but that would be useful if you happen to know how.
Anything else that could tell whether it's swapping would help confirm it too: free on Linux would give a hint, or top or htop or whatever. Or the machine just being miserably slow at everything and spinning the fans fast enough to take off.
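For what it's worth, one crude in-process check of the swapping hypothesis is mincore(2), which reports which pages are resident; a sketch (Linux-flavored, assuming a page-aligned address; macOS declares the vector as char*, so a cast is needed there):

#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// Diagnostic sketch: count how many pages of [addr, addr + len) are
// resident in RAM. addr must be page-aligned.
size_t CountResidentPages(void* addr, size_t len) {
  size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  size_t npages = (len + page - 1) / page;
  std::vector<unsigned char> vec(npages);
  if (mincore(addr, len, vec.data()) != 0) {
    std::perror("mincore");
    return 0;
  }
  size_t resident = 0;
  for (unsigned char v : vec) {
    resident += (v & 1);  // low bit set => page is resident
  }
  return resident;
}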
Comment 9•2 years ago
I filed bug 1773944 last week, which has a memory report attached, since this bug seemed to be more directed at the GC wakeups.
I'm unsure how related the memory report in the other bug is, though. The symptoms look similar: a background twitch.tv tab using lots of memory (1GB) while paused.
As for creating a profile with Instruments, I will try to create one next time.
Comment 10•2 years ago
FYI, macOS goes to extreme lengths to prevent applications from going OOM; I've seen it use a swap file twice the size of the machine's memory. That's to say, don't discount swapping as the cause of the slowness, because macOS is very happy to do it.
Comment 11•2 years ago
sfink, do you think it's worth pausing the CC and GC timers while background unmarking is happening? Doing this could be awkward, especially re-enabling the timers once the thread has finished. And in most cases unmarking is fast, so it's possibly not worth it. Maybe it's easier to back off these timers so they execute less often until unmarking is done.
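The backoff variant could look roughly like this sketch (invented names; real Gecko timer plumbing is different):

#include <algorithm>
#include <atomic>
#include <cstdint>

// Illustrative backoff for a periodic GC/CC timer while background
// unmarking is in progress.
std::atomic<bool> gUnmarkRunning{false};

uint32_t NextTimerDelayMs(uint32_t currentDelayMs) {
  constexpr uint32_t kBaseDelayMs = 100;  // invented normal cadence
  constexpr uint32_t kMaxDelayMs = 5000;  // invented cap
  if (gUnmarkRunning.load(std::memory_order_acquire)) {
    // Double the delay each time we would fire uselessly, up to the cap.
    return std::min(currentDelayMs * 2, kMaxDelayMs);
  }
  return kBaseDelayMs;  // unmarking done: restore the normal cadence
}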
Comment 12•2 years ago
(In reply to Paul Bone [:pbone] from comment #11)
sfink, do you think it's worth pausing the CC and GC timers while background unmarking is happening? Doing this could be awkward, especially re-enabling the timers once the thread has finished. And in most cases unmarking is fast, so it's possibly not worth it. Maybe it's easier to back off these timers so they execute less often until unmarking is done.
It's a good idea, but I guess I kind of feel like I'd want more evidence that it would do some good. If the cause turns out to be that the machine was swapping, then I doubt the do-nothing GC slices are adding any additional wakeups or other overhead. Holding off the slices would make it so the GC wouldn't be blamed, but unless there's something else going on it wouldn't improve actual performance or battery usage. Right now, the report is consistent with the hypothesis that the machine was swapping, everything was painfully slow, and yet it managed to gather a profile and the scheduled slices are the obvious thing that stands out. It would be really nice to have swap activity (or some other correlated metric) in the profiles...
I guess we kind of do; as florian said: "the machine appears to be under heavy load (out of 599ms of Awake markers, we only get 40ms of actual CPU time)". I think whatever is causing that is the true culprit here. It's probably some variant of a memory leak. Twitch should not be using 2GB when it's backgrounded. It could be a site leak, or it could be something like unread messages piling up. I'm looking for anything in bug 1749366 that could be related.
Comment hidden (offtopic)
Comment hidden (offtopic)
Comment hidden (offtopic)