Closed Bug 746253 Opened 12 years ago Closed 12 years ago

OOM when creating a lot of garbage without pauses with IGC on

Categories: Core :: JavaScript Engine (defect)
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla15
People: (Reporter: azakai, Assigned: till)
References: Blocks 1 open bug
Attachments: 3 files

STR:

1. Unpack attached archive
2. python -m SimpleHTTPServer 8888
3. Browse to localhost:8888/client.html

In Firefox, memory usage rises dramatically until we run out of it, and the failure is silent (the only way to tell an OOM happened is to watch memory usage on your machine). In Chrome, memory usage rises very little, and loading proceeds to a later point where a "Shader compilation halt" exception is thrown (this is expected).
Attachment #615804 - Attachment mime type: text/plain → application/octet-stream
(The expected exception can be seen in the web console.)
This only happens when javascript.options.mem.gc_incremental is set to true. So it looks like when IGC is on, garbage accumulates until we eventually OOM because it is not collected quickly enough.

There are 2 bugs here, though. Aside from garbage not being collected quickly enough, there is the question of why garbage is accumulating at all. The code itself is Emscripten-compiled and does not generate any significant amount of garbage. So perhaps this is TI-generated stuff? (Note that all the unnecessary GCing probably explains why it takes much longer to load in Firefox than in Chrome.)
What does about:memory look like when the memory usage is rising?  In other words, what is it that is using more and more memory?
Whiteboard: [MemShrink]
The problem is that the memory increase happens during a continuous span of processing without returning to the browser event loop, so the UI hangs during that time. As a result, I can't open about:memory until the OOM happens and memory goes back down.

Actually, come to think of it, that might be connected to the OOM, if IGC only runs when code stops running.
Blocks: gecko-games
Turns out there was a bug in the JS here that did generate garbage, so GCing is expected in this testcase. Still, there remains the bug shown here that when garbage is created quickly before returning to the browser event loop, we can OOM with IGC on.
Blocks: 735099
Whiteboard: [MemShrink]
Summary: BananaBread OOM → OOM when creating a lot of garbage without pauses with IGC on
I'll take a look at this soon.
Assignee: general → wmccloskey
Attached patch patch (deleted) — Splinter Review
This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When one of these triggers fired, we would start an incremental GC. However, since the event loop wasn't spinning, we wouldn't GC again until the next 100MB of malloc bytes were used. There are checks in place for gcBytes growing too quickly, but not for gcMallocBytes. This fixes that problem.
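The safety-valve idea described above can be sketched as a small model. This is illustrative Python, not SpiderMonkey code; the class and attribute names (MallocMeter, bytes_since_gc_start, and so on) are hypothetical stand-ins for the gcMallocBytes accounting the comment refers to.

```python
# Model of the safety valve: while an incremental GC is in progress,
# malloc'd bytes -- not just GC-heap bytes -- count toward a
# "growing too fast, finish the GC now" check.

class MallocMeter:
    def __init__(self, max_malloc_bytes):
        self.max_malloc_bytes = max_malloc_bytes  # e.g. the ~100MB trigger
        self.bytes_since_gc_start = 0
        self.incremental_gc_running = False

    def note_malloc(self, nbytes):
        """Return True when the ongoing incremental GC should be finished
        immediately instead of waiting for the next slice trigger."""
        self.bytes_since_gc_start += nbytes
        return (self.incremental_gc_running
                and self.bytes_since_gc_start >= self.max_malloc_bytes)
```

Without the incremental_gc_running check this would just be the ordinary allocation trigger; the point of the patch is that the same counter also forces an in-progress incremental GC to finish.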
Attachment #620548 - Flags: review?(igor)
This problem indicates that there is value in having gcMallocBytes tracked in the runtime, too, as opposed to what I was saying in bug 723350. Maybe the solution for that would be to set a flag on the runtime as soon as at least one compartment has triggered a TOO_MUCH_MALLOC GC?

I'll test with the attached test-case and a modified version of the patch from comment 7 and report back.
(In reply to Bill McCloskey (:billm) from comment #7)
> Created attachment 620548 [details] [diff] [review]
> patch
> 
> This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When
> one of these triggered, we would start incremental GC. However, since the
> event loop wasn't spinning, we wouldn't GC until the next 100MB of malloc
> bytes were used. There are checks in place if gcBytes grows too quickly, but
> not gcMallocBytes. This fixes that problem.

But what exactly is the problem then? Bad incremental GC scheduling, so its slices don't run as needed?
This patch builds on Bug 723350 and is strictly a reduced version of billm's patch that leaves out the runtime trigger.

Testing with azakai's test-case shows that it effectively removes the unbounded memory growth and leads to non-incremental GCs that only collect the affected compartment instead of being global.
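The behavior this patch produces can be summarized as a tiny decision function. This is a hypothetical model of the patch's effect as described in the comment above, not code from the patch itself.

```python
# A per-compartment TOO_MUCH_MALLOC trigger now runs a non-incremental GC
# scoped to that compartment, instead of starting a global incremental GC
# whose slices may never get scheduled.

def on_too_much_malloc(compartment_local):
    if compartment_local:
        return ("non-incremental", "compartment")
    return ("incremental", "global")
```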
Attachment #620649 - Flags: review?(igor)
(In reply to Igor Bukanov from comment #9)
> But what is exactly the problem then? Bad incremental GC scheduling so its
> slice does not run as needed?

Yes, it's bad incremental scheduling. Once an incremental GC starts, there are three reasons why we'll do more slices:

1. There's a 100ms timer in nsJSEnvironment.
2. If we draw a frame, we do another slice.
3. If gcBytes grows too large, we'll immediately finish the GC.

Since we never run the event loop in this code, #1 and #2 will not trigger. And since the demo allocates a lot of malloc bytes but almost no JS objects, we won't hit #3.

We will do a slice every time the malloc trigger hits, which is every 100MB of malloc allocation. But this is not frequent enough to stop us from OOMing.

Trigger #3 is a "safety valve" to ensure that we GC frequently enough. The problem is that it wasn't tracking malloc bytes. This patch makes it so it does.

I'm fine with Till's patch since it fixes the problem as well.
Comment on attachment 620649 [details] [diff] [review]
Use non-incremental GCs for compartmental TOO_MUCH_MALLOC GCs

Review of attachment 620649 [details] [diff] [review]:
-----------------------------------------------------------------

As a spot fix I like this better.
Attachment #620649 - Flags: review?(igor) → review+
(In reply to Bill McCloskey (:billm) from comment #11)
> 1. There's a 100ms timer in nsJSEnvironment.
> 2. If we draw a frame, we do another slice.
> 3. If gcBytes grows too large, we'll immediately finish the GC.
> 
> Since we never run the event loop in this code, #1 and #2 will not trigger.
> And since the demo allocates a lot of malloc bytes but almost no JS objects,
> we won't hit #3.

We have all that wonderful operation callback infrastructure, including the extra thread that reliably triggers the callback to stop a long-running script or do other things. We should use that to trigger incremental GC scheduling reliably.
Thanks for the review, Igor!
Keywords: checkin-needed
(In reply to Igor Bukanov from comment #13)
> We have all that wonderful operation callback infrastructure, including
> the extra thread that reliably triggers the callback to stop a
> long-running script or do other things. We should use that to trigger
> incremental GC scheduling reliably.

Yeah, I guess it would be nice to use that to trigger the incremental slices.

However, I still don't see this patch as a spot fix. I've tried to design the incremental GC heuristics so that we always start the GC before any allocation triggers hit. If we do hit an allocation trigger (like gcTriggerBytes or gcMaxMallocBytes) then I think we should always do a non-incremental GC right away.

It would probably make sense to start an incremental GC from MaybeGC if gcMallocBytes == gcMaxMallocBytes/2, or something like that. I'm not sure how useful that would be in practice, though.
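Bill's suggestion amounts to a two-tier threshold in MaybeGC. The sketch below is a hypothetical model of that heuristic, with illustrative return values; the half-threshold cutoff is the one proposed above, not something the patch implements.

```python
# Start an incremental GC once malloc bytes pass half the trigger, so the
# hard (non-incremental) trigger is rarely reached.

def maybe_gc(gc_malloc_bytes, gc_max_malloc_bytes):
    if gc_malloc_bytes >= gc_max_malloc_bytes:
        return "non-incremental"    # allocation trigger hit: GC right away
    if gc_malloc_bytes >= gc_max_malloc_bytes // 2:
        return "start-incremental"  # get ahead of the hard trigger
    return "none"
```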
Assuming that attachment 620649 [details] [diff] [review] is the patch intended for checkin, it does not apply cleanly to inbound. Please rebase and re-request.
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd39bb6ae2f2
Assignee: wmccloskey → tschneidereit+bmo
Target Milestone: --- → mozilla15
https://hg.mozilla.org/mozilla-central/rev/bd39bb6ae2f2
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED