Open Bug 1772806 Opened 2 years ago Updated 2 years ago

Lots of wake-ups triggered by garbage collection in a background tab

Categories

(Core :: JavaScript: GC, defect, P3)

People

(Reporter: florian, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Attachments

(1 file)

@metasieben in #joy-of-profiling wrote:

Here's another "sitting in the background for days" profile, this time with Twitch apparently taking 2GB. Maybe useful? https://share.firefox.dev/3aFWitv

I'm surprised that we have so many GCSlice markers of only a few microseconds at the beginning. Is there something we can do about it?

It's also unfortunate that the machine appears to be under heavy load (out of 599ms of Awake markers, we only get 40ms of actual CPU time), but we kept this background process at the "User Interactive" Quality of Service. I guess that's bug 1394714.
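
(For reference, macOS has a public pthread API for demoting a thread's QoS; a minimal, purely illustrative sketch follows. It is not the fix tracked in bug 1394714, which is about process-level QoS, and the chosen class is just one plausible option.)

#include <pthread/qos.h>

// Sketch only: demote the calling thread to a background QoS class so
// the macOS scheduler stops treating its work as user-interactive.
void DemoteCurrentThreadQoS() {
  // QOS_CLASS_BACKGROUND marks the work as not user-visible;
  // 0 is the default relative priority within the class.
  pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
}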

Flags: needinfo?(pbone)

Forwarding the needinfo to Steve, as Paul is no longer working on GCs.

Severity: -- → S4
Flags: needinfo?(pbone) → needinfo?(sphink)
Priority: -- → P3

Thanks Florian,

It looks like maybe it starts to do a CC, then bails out of a GC, but whatever decided that a CC is wanted keeps retriggering it. I'll have to spend some time reading the code to understand.

It might also be helpful to have markers about the management of idle time.

Depends on: 1394714, 1773198

I just noticed that there's a TaskController thread stuck in js::gc::BackgroundUnmarkTask::run (a 13s Runnable), attempting to use CPU there, and that the very short repeated GCSlice markers on the main thread stop when the TaskController runnable has finished. Should the main-thread code more explicitly wait for that runnable to finish, and avoid triggering EnsureCCRunner::CCRunnerFired timers until it is done?
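
Something like this sketch, with hypothetical helper names (SpiderMonkey doesn't currently expose such a query; CCRunnerFired is the real entry point, but the signature here is simplified):

// Sketch of the idea: skip the slice while the background unmark task
// is still running, instead of taking an idle callback that returns
// immediately. Both helpers below are hypothetical.
bool BackgroundUnmarkTaskStillRunning();  // hypothetical query
void RescheduleCCRunner();                // hypothetical timer re-arm

bool CCRunnerFired() {
  if (BackgroundUnmarkTaskStillRunning()) {
    // Avoid a wasted wakeup now; try again once unmarking is done.
    RescheduleCCRunner();
    return false;
  }
  // ... normal cycle-collection scheduling work ...
  return true;
}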

Some more info about the profile:
It was generated via about:processes after switching to the backgrounded Twitch page.

STR (using a Mac mini M1 with 8GB RAM):

  1. open a vod https://www.twitch.tv/videos/1486195733
  2. play the video
  3. pause the video
  4. switch to another tab
  5. leave the browser running for a few days

Sadly the STR aren't more specific, and the problem is probably not easily reproducible.

This is interesting. We're waiting for the unmarking background task to finish, so idle time GC slices are returning immediately. Normally this task doesn't take so long to complete.

The background unmarking task is effectively a memset(ptr, 0, size) loop over various memory regions: it looks up some pointers and writes zeroes to them. You can see in the profile that the compiler inlines it all. It's usually not even much memory (each region is fairly small, since it's a bitmap). And yet the profile shows 2 full seconds spent just doing this unmarking, which makes me think that either (1) the machine is swapping heavily, or (2) the thread is deprioritized behind other work so that it's not running (it's schedulable, so it doesn't show up as idle). The CPU graph shows mostly around 15% utilization, peaking at about 30%.
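
To make that concrete, here's a simplified sketch of what the loop amounts to, with stand-in types and sizes (the real code is js::gc::BackgroundUnmarkTask::run; the bitmap actually lives in the chunk, so markBits here stands in for that lookup):

#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMarkBitmapBytes = 32;  // stand-in; the real bitmaps are also small

struct Arena {
  Arena* next;        // in the real js::gc::Arena this field is at offset 0x10
  uint8_t* markBits;  // stand-in for the lookup of the chunk-resident bitmap
};

// Walk the arena list and zero each arena's mark bits. A healthy run is
// dominated by the pointer chase (arena->next), not the small memsets.
void UnmarkArenas(Arena* head) {
  for (Arena* arena = head; arena; arena = arena->next) {
    std::memset(arena->markBits, 0, kMarkBitmapBytes);
  }
}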

Flags: needinfo?(sphink)

@mstange tracked down the code that is being executed:


0000000001ac44c4         stur       xzr, [x13, #-0xa0]
0000000001ac44c8         add        x13, x20, #0x38
0000000001ac44cc         ldar       w13, [x13]
0000000001ac44d0         cbnz       w13, loc_1ac44dc

0000000001ac44d4         ldr        x12, [x12, #0x10] ; <---  this is where all the samples spend their time
0000000001ac44d8         cbnz       x12, loc_1ac4498

                     loc_1ac44dc:
0000000001ac44dc         add        x10, x10, #0x1
0000000001ac44e0         cmp        x10, #0x23

ldr x12, [x12, #0x10] looks very much like the call to arena.next(), since the next field of js::gc::Arena is at offset 0x10. That gave me pause initially, because I thought it was already accessing the Arena and so any paging would happen on an earlier access. But actually, the mark bits being cleared here are stored in the chunk, so this could very well be the first access of this address.
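
As an illustration of that mapping, a hypothetical mirror of the inferred layout (the first two fields are stand-ins; only the offset of next matters):

#include <cstddef>
#include <cstdint>

// If next sits at offset 0x10, then ldr x12, [x12, #0x10] is exactly
// the load performed by arena = arena->next.
struct ArenaLayout {
  uintptr_t header0;  // stand-ins for whatever occupies the first 16 bytes
  uintptr_t header1;
  ArenaLayout* next;  // expected at offset 0x10
};
static_assert(offsetof(ArenaLayout, next) == 0x10,
              "the #0x10 displacement matches the next-field load");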

So it looks like page faults of some sort. If it were thread scheduling, I would expect the sampled addresses to be more spread out. (But I could be full of crap, too.)

If you get it into this state again, could you collect a report from about:memory?

jrmuizel suggested that a profile from Instruments with kernel stacks enabled would confirm whether the unmark thread is going so slowly because of swapping. I don't have a macOS machine, so I don't know how to do that, but it would be useful if you happen to know how.

Anything else that could tell whether it's swapping would confirm it too. free on Linux would give a hint, or top or htop or whatever. Or the machine just being miserably slow at everything and spinning the fans fast enough to take off.
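
On macOS specifically, running "sysctl vm.swapusage" in a terminal reports swap usage directly; a minimal sketch of the same query from code (vm.swapusage and xsw_usage are the documented macOS sysctl interface):

#include <sys/sysctl.h>
#include <cstdio>

// Read macOS swap usage via the vm.swapusage sysctl: the same data
// that "sysctl vm.swapusage" prints in a terminal.
int main() {
  xsw_usage xsu;
  size_t len = sizeof(xsu);
  if (sysctlbyname("vm.swapusage", &xsu, &len, nullptr, 0) == 0) {
    std::printf("swap used: %llu MiB of %llu MiB total\n",
                xsu.xsu_used >> 20, xsu.xsu_total >> 20);
  }
  return 0;
}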

Flags: needinfo?(mozilla)

I filed bug 1773944 last week, which has a memory report attached, since this bug seemed to be more directed at the GC wakeups.

I'm unsure how related the memory report in the other bug is, though. The symptoms look similar: a background twitch.tv tab using lots of memory (1GB) while being paused.

As for creating a profile with Instruments, I will try to create one the next time this happens.

Flags: needinfo?(mozilla)

FYI, macOS goes to extreme lengths to prevent applications from going OOM; I've seen it use a swap file twice the size of the machine's memory. That is to say, don't discount swapping as the cause of the slowness, because macOS is very happy to do it.

sfink, do you think it's worth pausing the CC and GC timers while background unmarking is happening? Doing this could be awkward, especially re-enabling the timers once the thread has finished, and in most cases unmarking is fast, so it's possibly not worth it. Maybe it's easier to back off these timers so they execute less often until unmarking is done.
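
Roughly what backing off could look like, with hypothetical names (none of this matches the real scheduling code, and the base and cap values are made up):

#include <algorithm>
#include <cstdint>

// Hypothetical hooks, standing in for whatever the real scheduler exposes.
bool IsBackgroundUnmarkRunning();
void ScheduleCCSlice(uint32_t delayMs);
void RunCCSlice();

constexpr uint32_t kBaseCCDelayMs = 100;   // assumed base period
constexpr uint32_t kMaxCCDelayMs = 5000;   // assumed cap

static uint32_t sCCDelayMs = kBaseCCDelayMs;

// Back off rather than pause: each firing that finds unmarking still
// running doubles the next delay up to a cap, and the delay resets once
// the task has finished. This avoids an explicit "re-enable" step.
void MaybeRunCCSlice() {
  if (IsBackgroundUnmarkRunning()) {
    sCCDelayMs = std::min(sCCDelayMs * 2, kMaxCCDelayMs);
    ScheduleCCSlice(sCCDelayMs);
    return;
  }
  sCCDelayMs = kBaseCCDelayMs;
  RunCCSlice();
}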

Flags: needinfo?(sphink)

(In reply to Paul Bone [:pbone] from comment #11)

sfink, do you think it's worth pausing the CC and GC timers while background unmarking is happening? Doing this could be awkward, especially re-enabling the timers once the thread has finished, and in most cases unmarking is fast, so it's possibly not worth it. Maybe it's easier to back off these timers so they execute less often until unmarking is done.

It's a good idea, but I guess I kind of feel like I'd want more evidence that it would do some good. If the cause turns out to be that the machine was swapping, then I doubt the do-nothing GC slices are adding any additional wakeups or other overhead. Staving off the slices would keep the GC from being blamed, but unless there's something else going on it wouldn't improve actual performance or battery usage. Right now, the report is consistent with the hypothesis that the machine was swapping, everything was painfully slow, and yet it managed to gather a profile in which the scheduled slices are the obvious thing that stands out. It would be really nice to have swap activity (or some other correlated metric) in the profiles...

I guess we kind of do; as florian said: "the machine appears to be under heavy load (out of 599ms of Awake markers, we only get 40ms of actual CPU time)". I think whatever is causing that is the true culprit here. It's probably some variant of a memory leak; Twitch should not be using 2GB when it's backgrounded. It could be a site leak, or it could be something like unread messages piling up. I'm looking for anything in bug 1749366 that could be related.

Flags: needinfo?(sphink)
Depends on: 1801575