Open Bug 865959 Opened 11 years ago Updated 2 years ago

[meta] need more aggressive GC on reload for pages that use lots of memory

Categories

(Core :: DOM: Navigation, defect)

defect

Tracking

()

People

(Reporter: vlad, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [games:p?])

Attachments

(1 file, 2 obsolete files)

Attached file big-alloc.html (deleted) —
The attached testcase allocates a 1GB arraybuffer.  You should be able to keep reloading it practically forever, but right now we still have the objects from the previous page in memory, so future allocations fail. (32-bit build)

We should be looking at the memory in use by a page, and if it's over some threshold, we should aggressively reclaim on reload or navigation away (e.g. we shouldn't keep those pages around in bfcache but blow them away right away).
Whiteboard: [MemShrink]
Duplicating the proposal in bug 865960 comment 3:
 1. on large ArrayBuffer allocations (and other places with the same large-allocation problem), if we fail, we call some "large allocation failure handler" that is a callback into the DOM
 2. from this handler, we drop everything in the BF Cache in the whole browser
 3. after calling the large allocation failure handler, we retry the allocation.  If this fails, then print an error message (or, even better, attempt to throw, so that the page can give the user the advice to close other tabs)

Here I say "BF Cache", but I don't really understand what I'm talking about and I'm told reload is a separate problem.  I'm hoping that the same callback could deal with reload as well.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
This is relevant to the post-fix state of bug 902922.  http://p3d.in/3axnM is the test case.

BTW, we now have code that can measure with reasonable accuracy how much memory a page (tab) is using (thanks to bug 918207).  On my fast Linux desktop, measuring the JS memory consumption of a gmail tab takes about 8ms (once bug 930876 lands), and measuring the non-JS memory consumption takes about 5ms.  So, fairly fast.
> BTW, we now have code that can measure with reasonable accuracy how much
> memory a page (tab) is using

Just to clarify:  the measurement of JS memory consumption is very comprehensive;  the measurement of everything else is moderately comprehensive (e.g. much of DOM, layout, style stuff is covered, but images aren't measured per-tab (bug 921300 is open for that), and lots of gfx stuff isn't measured).
Whiteboard: [MemShrink:P2] → [MemShrink:P2] [games:p1]
Maybe hooking something into WindowDestroyedEvent::Run would be the way to go?  That's where we do NukeCrossCompartmentWrappers.  Maybe we could have some kind of nuke-giant-array-buffers.
You could try printing something in that method to see if the timing is in line with what you are thinking of.
Let me try to throw something together...
Assignee: nobody → continuation
Attached patch nuke array buffers, WIP (obsolete) (deleted) — Splinter Review
Here's a rough patch that seems to more or less handle the reload case.  I spam clicked reload for ten or so seconds on the test case, then quickly measured memory and it was 1gig.  In contrast, without that patch, there was an extra gig or two of heap-unclassified hanging around.

Basically, when we're notified that a window is destroyed, right in the place where we nuke cross compartment wrappers, we iterate over all objects in the zone (this should actually be the compartment, now that I think about it).  If we find any array buffers, we call stealContents on them, and then call js_free on the contents, freeing the memory.  This seems to take about a second, but it looks like that's all the js_free call.
Let me know if this helps the scenario you are thinking of.  Comments 0 and 1 have a ton of different scenarios, so I'm not sure what you are really after.
> we iterate over all objects
> in the zone (this should actually be the compartment, now that I think about
> it).

It might be easier to just create a new IterateCompartmentCells() function.
Yeah, I was going to write a less hacky patch with a new iterator if this is the right approach.  I think you still have to iterate over the entire zone, but filtering the compartment out in the iterator rather than the callback seems like it would be more efficient.
Blocks: gecko-games
Luke sent me this via email:

"Here's my current ideal:
  - whenever a page becomes inactive (or whatever the term is that means "not the current window of any tab"), if that page is flagged as containing a live large ArrayBuffer (which we could maintain with a counter on the zone that gets inc'd on alloc and dec'd on finalize), we'd immediately nuke it and schedule a GC+CC "real soon".
  - when certain large allocations are about to fail (viz., the ArrayBuffer's js_calloc), we synchronously: (1) nuke all non-active pages (evict the entire bf cache), (2) GC+CC."
For the second one, it sounds like maybe we want to hook up these certain large allocations to the browser's memory pressure notification system.  If we are doing this for large allocations, then we probably still have a decent amount of non-contiguous free memory, so I'd expect we can actually do something without falling over.  Then we'd want to make sure things like flushing the bfcache and synchronous GC/CC are hooked up to the memory pressure notification.  Does that sound right?

For the first one, I'm still not entirely clear on it.  It sounds like maybe you want pages with large ArrayBuffers to not ever be put in the bfcache?  I don't know if there's some mechanism to flag a window as uncachable.  You'd get worse navigation back-and-forth for these pages, but are they are stressing the limit of how much memory the browser can even hold, you are probably better off doing that.  Then, I think we'd want some kind of ArrayBuffer nuking mechanism like I added in the patch here to so we don't have to wait 5-15 seconds for GC/CC to cycle.
Are you really concerned with the general case of "pages that use lots of memory" or is dealing with "pages that have really large array buffers" sufficient?  What is the memory usage breakdown typically like for array buffer vs. images vs. sound vs. whatever for the games you are concerned about?
I'm trying to scope this to be more specific than the "make GC better" of the original summary, but more general than the initial example test case of "rapidly reload a page that just has a 1gig ArrayBuffer".
(In reply to Andrew McCreight [:mccr8] from comment #12)
> For the second one, it sounds like maybe we want to hook up these certain
...
> pressure notification.  Does that sound right?

Yes.  What does 'hooking up to the memory pressure notification' entail, exactly?

> For the first one, I'm still not entirely clear on it.  It sounds like maybe
> you want pages with large ArrayBuffers to not ever be put in the bfcache?  I
> don't know if there's some mechanism to flag a window as uncachable.

Yes, that is a good way to say it.  It sounds like there is already a predicate that evaluates whether a window is cacheable (bz said if you have an onUnload handler you are not cacheable).


> You'd get worse navigation back-and-forth for these pages, but are they are
> stressing the limit of how much memory the browser can even hold, you are
> probably better off doing that.

Agreed

> Then, I think we'd want some kind of
> ArrayBuffer nuking mechanism like I added in the patch here to so we don't
> have to wait 5-15 seconds for GC/CC to cycle.

Ah, good idea!  So maybe the Zone keeps a list of its contained huge buffers and we have NukeCrossCompartmentWrappers also neuter all buffers in this list.  That way we get the synchronous free() call w/o the jank of a sync GC+CC.

(Just to confirm: if you are uncacheable, you get immediately Nuke'd?)

(In reply to Andrew McCreight [:mccr8] from comment #13)
> Are you really concerned with the general case of "pages that use lots of
> memory" or is dealing with "pages that have really large array buffers"
> sufficient?

The latter would be sufficient; although the former may be an interesting longer term goal.

> What is the memory usage breakdown typically like for array
> buffer vs. images vs. sound vs. whatever for the games you are concerned
> about?

There are a couple huge array buffers (>100MB: one for the heap, one for the VFS, several held by the AudioBuffer) that take the lion's share of memory, but I've also probably <200MB combined other WebGL, JS GC heap, JIT code and other resources that would be useful to clear "soon".  I think synchronously neutering the large ABs would get us most of the way, though.

(In reply to Andrew McCreight [:mccr8] from comment #14)
> I'm trying to scope this to be more specific than the "make GC better" of
> the original summary, but more general than the initial example test case of
> "rapidly reload a page that just has a 1gig ArrayBuffer".

Yes, much appreciated!  As stated above, it's not *quite* as simple as a single 1GB AB; more like several >50MB ABs that total more than 512MB.  It'd also be reasonable to only do this eviction (to preserve the bfcache benefits) on 32-bit systems.
Depends on: 936128
Depends on: 936218
Depends on: 936236
> (Just to confirm: if you are uncacheable, you get immediately Nuke'd?)

I'd expect so, but I haven't tested it.
If it proves useful, we could add more things to WindowDestroyedEvent::Run.  For instance, I get the impression that dumping the JIT code is no big deal.  tn also points out we could probably throw out image data for the current tab, under the assumption that mostly big images will only be used by a single tab.
This is kind of a meta bug.  I think I've filed everything mentioned in this bug prior to comment 17, and that should be enough to go on for now.
Keywords: meta
We usually don't put MemShrink tags on meta bugs so I'm removing it.
Whiteboard: [MemShrink:P2] [games:p1] → [games:p1]
Depends on: 965916
No longer depends on: 936218
Attached patch notify-on-failure (WIP) (obsolete) (deleted) — Splinter Review
This unfinished patch synchronously sends the low-memory notification and it seems to fix the testcase on 32-bit Windows.

I'll probably hoist the logic from AllocateArrayBufferContents into some JSRuntime::{malloc,calloc,realloc}Reentrant functions that we can use later.  Also, Andrew, do you know if "heap-minimize" is the right "topic" for the notification?
Assignee: continuation → luke
Attachment #827756 - Attachment is obsolete: true
Status: NEW → ASSIGNED
I'm not really sure.  about:memory does "heap-minimize", but then other place I see "memory-pressure" being used (like the code I added in bug 965916).  I don't know what the relationship between those is.  Maybe njn does?
I DXR'd for "heap-minimize" and got this: http://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsIMemory.idl#17

So "memory-pressure" is the topic, and "heap-minimize" is one of several sub-topics. The sub-topics are used only in a few places, it looks like. I can't tell you anything more than that.
Oops, yes, I meant "sub-topic".  "heap-minimize" it is, then.
Comment on attachment 8374498 [details] [diff] [review]
notify-on-failure (WIP)

This should have been in bug 936236.
Attachment #8374498 - Attachment is obsolete: true
So unfortunately, attachment 742147 [details] still OOMs (on 32-bit Windows; not sure why it worked before but not now).  There is an improvement, though: the second reload always succeeds.  My first thought was background sweeping (hence bug 936236 comment 28), but that doesn't fix it.

Digging in, I can see that, on the reloads that fail, the big array buffer is never finalized (synchronously or even if I wait a while).  On reloads that succeed, the finalization happens before the NotifyObservers call returns.  In both fail/success cases, I can confirm the GC/CC/GC tri happen synchronously.  So it seems like something is temporarily keeping the window reachable on the refreshes that fail.  BF Cache is suspect, but (1) according to bz, refresh doesn't put the previous page in the cache, (2) manually adding window.unload listener doesn't fix either.

Andrew, perhaps you could help see how the should-be-dead window is staying alive?
> Andrew, perhaps you could help see how the should-be-dead window is staying alive?
Sure, I can look later today, or maybe early next week.

If you want to look at it before I get to it, what you do is:
1. Run the browser with XPCOM_CC_LOG_ALL=1
2. Run find_roots.py <log name> nsGlobalWindow on the CC log where you think the window should be cleaned up already (from https://github.com/amccreight/heapgraph ), which you can figure out by grepping for nsGlobalWindow, and seeing the last one where the window in question shows up.
Any chance you could give a look?  (If not, I'll take a stab, but I think you are much more experienced interpreting the output.)
I started looking at it, but I got distracted by something.  I'll try to pick it up again...
Whiteboard: [games:p1] → [games:p?]

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: mail → nobody
Status: ASSIGNED → NEW
Summary: need more aggressive GC on reload for pages that use lots of memory → [meta] need more aggressive GC on reload for pages that use lots of memory
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: