Closed Bug 746253 Opened 12 years ago Closed 12 years ago

OOM when creating a lot of garbage without pauses with IGC on

Categories: Core :: JavaScript Engine (defect)
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla15
People: (Reporter: azakai, Assigned: till)
References: Blocks 1 open bug
Attachments: 3 files

STR:

1. Unpack attached archive
2. python -m SimpleHTTPServer 8888
3. Browse to localhost:8888/client.html

In Firefox, memory usage rises dramatically until we run out of it, and the failure is silent (the only way to tell an OOM happened is to watch memory usage on your machine). In Chrome, memory usage rises very little, and loading proceeds to a later point where a "Shader compilation halt" exception is thrown (this is expected).
Attachment #615804 - Attachment mime type: text/plain → application/octet-stream
(The expected exception can be seen in the web console.)
This only happens when javascript.options.mem.gc_incremental is set to true. So it looks like when IGC is on, garbage accumulates until we eventually OOM because it is not collected quickly enough.

There are 2 bugs here, though. Aside from garbage not being collected quickly enough, there is the question of why garbage is accumulating at all. The code itself is Emscripten-compiled and does not generate any significant amount of garbage. So perhaps this is TI-generated stuff? (Note that all the unnecessary GCing probably explains why it takes much longer to load in Firefox than in Chrome.)
What does about:memory look like when the memory usage is rising?  In other words, what is it that is using more and more memory?
Whiteboard: [MemShrink]
The problem is that the memory increase happens during a continuous span of processing without returning to the browser event loop, so the UI hangs during that time. As a result, I can't open about:memory until the OOM happens and memory goes back down.

Actually, come to think of it, that might be connected to the OOM, if IGC only runs when code stops running.
Blocks: gecko-games
Turns out there was a bug in the JS here that did generate garbage, so GCing is expected in this testcase. Still, there remains the bug shown here that when garbage is created quickly before returning to the browser event loop, we can OOM with IGC on.
Blocks: 735099
Whiteboard: [MemShrink]
Summary: BananaBread OOM → OOM when creating a lot of garbage without pauses with IGC on
I'll take a look at this soon.
Assignee: general → wmccloskey
Attached patch patch (deleted) — Splinter Review
This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When one of these triggers fired, we would start an incremental GC. However, since the event loop wasn't spinning, we wouldn't GC again until the next 100MB of malloc bytes were used. There are checks in place for gcBytes growing too quickly, but not for gcMallocBytes. This fixes that problem.
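The safety-valve idea described above can be sketched as a small model. This is illustrative Python, not SpiderMonkey code; the class and attribute names (MallocMeter, bytes_since_gc_start, and so on) are hypothetical stand-ins for the gcMallocBytes accounting the comment refers to.

```python
# Model of the safety valve: while an incremental GC is in progress,
# malloc'd bytes -- not just GC-heap bytes -- count toward a
# "growing too fast, finish the GC now" check.

class MallocMeter:
    def __init__(self, max_malloc_bytes):
        self.max_malloc_bytes = max_malloc_bytes  # e.g. the ~100MB trigger
        self.bytes_since_gc_start = 0
        self.incremental_gc_running = False

    def note_malloc(self, nbytes):
        """Return True when the ongoing incremental GC should be finished
        immediately instead of waiting for the next slice trigger."""
        self.bytes_since_gc_start += nbytes
        return (self.incremental_gc_running
                and self.bytes_since_gc_start >= self.max_malloc_bytes)
```

Without the incremental_gc_running check this would just be the ordinary allocation trigger; the point of the patch is that the same counter also forces an in-progress incremental GC to finish.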
Attachment #620548 - Flags: review?(igor)
This problem indicates that there is value in having gcMallocBytes tracked in the runtime, too, as opposed to what I was saying in bug 723350. Maybe the solution for that would be to set a flag on the runtime as soon as at least one compartment has triggered a TOO_MUCH_MALLOC GC?

I'll test with the attached test-case and a modified version of the patch from comment 7 and report back.
(In reply to Bill McCloskey (:billm) from comment #7)
> Created attachment 620548 [details] [diff] [review]
> patch
> 
> This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When
> one of these triggered, we would start incremental GC. However, since the
> event loop wasn't spinning, we wouldn't GC until the next 100MB of malloc
> bytes were used. There are checks in place if gcBytes grows too quickly, but
> not gcMallocBytes. This fixes that problem.

But what exactly is the problem then? Bad incremental GC scheduling, so its slices don't run as needed?
This patch builds on Bug 723350 and is strictly a reduced version of billm's patch that leaves out the runtime trigger.

Testing with azakai's test-case shows that it effectively removes the unbounded memory growth and leads to non-incremental GCs that only collect the affected compartment instead of being global.
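The behavior this patch produces can be summarized as a tiny decision function. This is a hypothetical model of the patch's effect as described in the comment above, not code from the patch itself.

```python
# A per-compartment TOO_MUCH_MALLOC trigger now runs a non-incremental GC
# scoped to that compartment, instead of starting a global incremental GC
# whose slices may never get scheduled.

def on_too_much_malloc(compartment_local):
    if compartment_local:
        return ("non-incremental", "compartment")
    return ("incremental", "global")
```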
Attachment #620649 - Flags: review?(igor)
(In reply to Igor Bukanov from comment #9)
> But what is exactly the problem then? Bad incremental GC scheduling so its
> slice does not run as needed?

Yes, it's bad incremental scheduling. Once an incremental GC starts, there are three reasons why we'll do more slices:

1. There's a 100ms timer in nsJSEnvironment.
2. If we draw a frame, we do another slice.
3. If gcBytes grows too large, we'll immediately finish the GC.

Since we never run the event loop in this code, #1 and #2 will not trigger. And since the demo allocates a lot of malloc bytes but almost no JS objects, we won't hit #3.

We will do a slice every time the malloc trigger hits, which is every 100MB of malloc allocation. But this is not frequent enough to stop us from OOMing.

Trigger #3 is a "safety valve" to ensure that we GC frequently enough. The problem is that it wasn't tracking malloc bytes. This patch makes it so it does.

I'm fine with Till's patch since it fixes the problem as well.
Comment on attachment 620649 [details] [diff] [review]
Use non-incremental GCs for compartmental TOO_MUCH_MALLOC GCs

Review of attachment 620649 [details] [diff] [review]:
-----------------------------------------------------------------

As a spot fix I like this better.
Attachment #620649 - Flags: review?(igor) → review+
(In reply to Bill McCloskey (:billm) from comment #11)
> 1. There's a 100ms timer in nsJSEnvironment.
> 2. If we draw a frame, we do another slice.
> 3. If gcBytes grows too large, we'll immediately finish the GC.
> 
> Since we never run the event loop in this code, #1 and #2 will not trigger.
> And since the demo allocates a lot of malloc bytes but almost no JS objects,
> we won't hit #3.

We have all that wonderful operation callback infrastructure, including the extra thread that reliably triggers the callback to stop a long-running script or do other things. We should use that to trigger incremental GC scheduling reliably.
Thanks for the review, Igor!
Keywords: checkin-needed
(In reply to Igor Bukanov from comment #13)
> We have all that wonderful operation callback infrastructure, including
> the extra thread that reliably triggers the callback to stop a
> long-running script or do other things. We should use that to trigger
> incremental GC scheduling reliably.

Yeah, I guess it would be nice to use that to trigger the incremental slices.

However, I still don't see this patch as a spot fix. I've tried to design the incremental GC heuristics so that we always start the GC before any allocation triggers hit. If we do hit an allocation trigger (like gcTriggerBytes or gcMaxMallocBytes) then I think we should always do a non-incremental GC right away.

It would probably make sense to start an incremental GC from MaybeGC if gcMallocBytes == gcMaxMallocBytes/2, or something like that. I'm not sure how useful that would be in practice, though.
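Bill's suggestion amounts to a two-tier threshold in MaybeGC. The sketch below is a hypothetical model of that heuristic, with illustrative return values; the half-threshold cutoff is the one proposed above, not something the patch implements.

```python
# Start an incremental GC once malloc bytes pass half the trigger, so the
# hard (non-incremental) trigger is rarely reached.

def maybe_gc(gc_malloc_bytes, gc_max_malloc_bytes):
    if gc_malloc_bytes >= gc_max_malloc_bytes:
        return "non-incremental"    # allocation trigger hit: GC right away
    if gc_malloc_bytes >= gc_max_malloc_bytes // 2:
        return "start-incremental"  # get ahead of the hard trigger
    return "none"
```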
Assuming that attachment 620649 [details] [diff] [review] is the patch intended for checkin, it does not apply cleanly to inbound. Please rebase and re-request.
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd39bb6ae2f2
Assignee: wmccloskey → tschneidereit+bmo
Target Milestone: --- → mozilla15
https://hg.mozilla.org/mozilla-central/rev/bd39bb6ae2f2
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED