Closed Bug 657115 Opened 14 years ago Closed 7 years ago

Extensive memory usage during ROME demo

Categories

(Core :: Graphics: CanvasWebGL, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: WonderCsabo, Unassigned)

References


Details

(Whiteboard: [MemShrink:P2][see comment 54])

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110512 Firefox/6.0a1

Memory usage during the interactive film is extremely high, far worse than in Google Chrome 12.

Reproducible: Always

Steps to Reproduce:
1. Open the webpage
2. Play the demo
3. Check the memory usage
Version: unspecified → Trunk
Here's the link to the demo: http://www.ro.me/
I already provided it in the URL section. ;)
I have also noticed that after I closed the tab with that demo, Firefox was doing something strange:
Memory usage rose up to 2GB, then went down to 1.5GB, and this behavior kept looping with some "rest time" in between.
Related to Bug 656761 and thus Bug 652931?
Ah wait, Bug 652931 landed before the Nightly build you tested ...
Confirmed for Firefox 4.0.1 on OS X 10.6.7. 

I ran the demo on my MacBook Pro with 4 GB RAM. I can't give any concrete numbers about the RAM usage though, as after some time the whole OS became unresponsive to the point that I had real trouble killing the Firefox process. The demo runs fine with Chrome 11, where it uses about 700-800 MB of RAM according to its internal task manager.
Confirmed on latest nightly
Confirmed on Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110515 Firefox/6.0a1 #1305448436
Blocking #640452?
I have some observations here (NoScript forums):

"Nightly 6.0a1 + NS 2.1.0.3: memory leak"
http://forums.informaction.com/viewtopic.php?f=10&t=6467

(Note that my methodology was not correct, nevertheless ...) 

Other "ROME" Bugs:

Bug 657118 - Several problems with ROME demo's tech page
Bug 657151 - Long GC pause time caused by JSGC_FINALIZE_END callback
I tried with a fresh profile when I tested the memory usage. It's not a NoScript problem.
bjacob, any thoughts?
We are leaking here. STR is to run the demo until it finishes (run it in a background tab if it otherwise crashes due to OOM), then see that after closing the tab, even GC+GC does not clear the memory.

Longer version:

After preloading (that is, right before pressing BEGIN for the demo to begin), about:memory shows 650MB in js gc heap and 530MB in heap-unclassified (video stuff?). Resident is 1.3GB and vsize is 2GB.

During the demo js gc rises to 750MB or higher. vsize rises to over 2.2GB, at which point the demo usually errors out with out-of-memory on my 4GB laptop.

If I run the demo in another tab, it uses less memory (less frames since we throttle timers, rendering, etc.). In this case my laptop survives until the end of the demo.

*However*, it is left in a very bad state: even after closing the ro.me tab, memory usage only decreases a tiny bit (to 1.4GB resident, 2.2GB virtual). Every few seconds the browser locks up entirely for a few seconds; breaking in with a debugger shows this is in cycle collection. But nothing gets cleaned up, so there is a leak here.

It seems likely that we were leaking all through the demo then, which can explain the excessive memory usage during the demo.

njn, a side issue that I hope you can clarify: Why is virtual memory usage so much higher than resident here? I suspect that running out of virtual memory is why it OOMs on my 4GB laptop (since there is 1GB of physical memory left, + swap).
Blocks: mlk-fx7
Status: UNCONFIRMED → NEW
Ever confirmed: true
Sorry, I've still not finished the about:memory support for WebGL patch.
Just a naive question, if we were leaking video memory, could that explain the discrepancy between virtual and resident pointed out in comment 13?
Attached file output during successful CC after preload (deleted)
If I just preload, and do not actually start the demo, I mostly OOM during CC (memory rises before it goes down, I guess). However, I once succeeded in running CC to completion, and memory usage went all the way down (note that this doesn't mean there isn't a leak later - this was just preloading, not running the actual demo as before).

I got the attached output during the successful CC.

1. "ASSERTION: Wrong element being closed" appears a lot. No idea if important or not.

2. WebGL reports a lot of 'Outstanding Textures' and other items, in a very lengthy output (I shortened it here, [MANY MANY NUMBERS] etc. were very very long lists of numerical indexes).
Also, 57 (!) threads exit as we do that CC after preload.
After the demo preloads, we have 50 threads doing media stuff, half nsBuiltinDecoderStateMachine::DecodeLoop and half nsBuiltinDecoderStateMachine::AdvanceFrame (see stacks below). There are also a few audio threads (nsBuiltinDecoderStateMachine::AudioLoop and nsOggReader::DecodeAudioData). All these threads stay active until the page is closed and CC is done.

kinetik, is this amount of threads expected?

I recall hearing that Linux threads take a lot of virtual memory for their stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then these 50 threads can explain 400MB of virtual memory - about half of the difference between virtual and resident memory.

Stacks for the two types of threads (each appearing 25 times):

=====
#2  0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295)
    at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417
#3  0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614
#4  0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295)
    at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346
#5  0xb664f2dc in mozilla::ReentrantMonitorAutoEnter::Wait (this=0x734fa580) at ../../dist/include/mozilla/ReentrantMonitor.h:224
#6  nsBuiltinDecoderStateMachine::DecodeLoop (this=0x734fa580)
    at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:380
#7  0xb665490e in nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run (this=0xace2d5c0)
    at ../../dist/include/nsThreadUtils.h:341
#8  0xb7376398 in nsThread::ProcessNextEvent (this=0x75aff700, mayWait=1, result=0x5e6ff23c)
    at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
====
#2  0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295)
    at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417
#3  0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614
#4  0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295)
    at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346
#5  0xb664f9d8 in nsBuiltinDecoderStateMachine::AdvanceFrame (this=0x734fa580)
    at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1544
#6  0xb6653215 in nsBuiltinDecoderStateMachine::Run (this=0x734fa580)
    at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1141
#7  0xb7376398 in nsThread::ProcessNextEvent (this=0x75afe880, mayWait=1, result=0x5f0ff23c)
    at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
====

Side comment: Using gdb and breaking on NS_NewThread and seeing each thread as it is created will consistently hang X for me. Much fun.
(In reply to comment #17)
> kinetik, is this amount of threads expected?

For each playing element there will be at least a decode thread (DecodeLoop), a state machine thread (AdvanceFrame), and an audio thread (AudioLoop).  Behind the scenes, there's also another audio thread created by ALSA on Linux for every stream we open.  OS X and Windows tend to have only a single audio thread for all open streams.

> I recall hearing that Linux threads take a lot of virtual memory for their
> stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then
> these 50 threads can explain 400MB of virtual memory - about half of the
> difference between virtual and resident memory.

The default stack size is 8MB:
~% ulimit -Sa
-s: stack size (kbytes)        8192

Last I checked (mainly looking for calls to pthread_attr_setstacksize), I couldn't see anywhere that we change that at runtime.
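For illustration, here is a minimal pthreads sketch of what capping a thread's stack would look like (this is not the actual nsThread code, and the 256 KB figure is just an illustrative value):

#include <pthread.h>
#include <cstdio>

static void* DecodeLoopThread(void*) {
  // ... decoding work would go here ...
  return nullptr;
}

int main() {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  // Reserve 256 KB of address space for the stack instead of the 8 MB
  // default reported by `ulimit -s`. With ~50 media threads that is the
  // difference between ~12 MB and ~400 MB of vsize for stacks alone.
  pthread_attr_setstacksize(&attr, 256 * 1024);

  pthread_t thread;
  if (pthread_create(&thread, &attr, DecodeLoopThread, nullptr) != 0) {
    perror("pthread_create");
    return 1;
  }
  pthread_attr_destroy(&attr);
  pthread_join(thread, nullptr);
  return 0;
}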

> Side comment: Using gdb and breaking on NS_NewThread and seeing each thread
> as it is created will consistently hang X for me. Much fun.

Export GDK_DEBUG=nograbs in your environment before running the browser.
(In reply to comment #17)
> After the demo preloads, we have 50 threads doing media stuff

That sounds reminiscent of bug 592833. Mark as blocking this one?
azakai, it's entirely typical for the vsize to be a lot bigger than the resident;  take a look at about:memory on some more typical workloads.  So don't focus on that, vsize isn't a very useful metric.

This is a good bug report, though, sounds like a big problem.
> > Side comment: Using gdb and breaking on NS_NewThread and seeing each thread
> > as it is created will consistently hang X for me. Much fun.
> 
> Export GDK_DEBUG=nograbs in your environment before running the browser.

Thanks, I did not know that!
This demo suffers from bug 656120 as well. Closing the tab is not always enough for memory to be reclaimed, but forcing a GC is (except when the leak mentioned earlier in this bug happens).
Depends on: 656120
The leak I saw, where GC takes a very long time and doesn't succeed, is apparently due to the cycle collector itself OOMing, which prevents it from working properly:

WARNING: out of memory: file /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp, line 210

#0  nsDeque::Push (this=0xb2dfe038, aItem=0x3bdac0f4) at /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp:211
#1  0xb739d176 in GraphWalker<scanVisitor>::DoWalk (this=0xb2dfe06c, aQueue=...)
    at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1271
#2  0xb73993ec in GraphWalker<scanVisitor>::WalkFromRoots (this=0xb3f85800)
    at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1254
#3  nsCycleCollector::ScanRoots (this=0xb3f85800) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1850
#4  0xb739b584 in nsCycleCollector::BeginCollection (this=0xb3f85800, aForceGC=0, aListener=0x0)
    at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:2651
#5  0xb739dd6a in nsCycleCollectorRunner::Run (this=0xb3fe0d80) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:3327
#6  0xb7376398 in nsThread::ProcessNextEvent (this=0xb3f06e00, mayWait=1, result=0xb2dfe23c)
    at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
#7  0xb72e2c8f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1) at /home/alon/Dev/m-c/xpcom/build/nsThreadUtils.cpp:245
#8  0xb7376e50 in nsThread::ThreadFunc (arg=0xb3f06e00) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:273
#9  0xb539567c in _pt_root (arg=0xb3f117f0) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:187
#10 0xb7fb996e in start_thread (arg=0xb2dfeb70) at pthread_create.c:300
#11 0xb519aa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Perhaps we should abort when OOMing in the cycle collector? It might work out in other cases though, so I guess not.
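For illustration, here is a deliberately simplified, hypothetical sketch of that failure mode (the real cycle collector does not use std::deque or exceptions; the point is only that an allocation failure while growing the work queue leaves the graph scan incomplete, so objects that should be identified as garbage stay alive):

#include <deque>
#include <new>
#include <vector>

struct Node {
  std::vector<Node*> children;
  bool scanned = false;
};

// Returns false if the scan could not be completed. In that case the caller
// must treat the results as unreliable and keep everything alive, which
// matches the "collection runs but frees nothing" behavior described above.
bool ScanRoots(const std::vector<Node*>& roots) {
  std::deque<Node*> queue(roots.begin(), roots.end());
  while (!queue.empty()) {
    Node* node = queue.front();
    queue.pop_front();
    if (node->scanned) {
      continue;
    }
    node->scanned = true;
    for (Node* child : node->children) {
      try {
        queue.push_back(child);  // the analogue of the failing nsDeque::Push
      } catch (const std::bad_alloc&) {
        return false;  // incomplete traversal: do not collect anything
      }
    }
  }
  return true;
}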

So in summary, the main issues in this bug are

1. Many threads for media elements (bug 592833). A contributing factor to memory issues.
2. GC does not always run, memory which could be reclaimed is not reclaimed (bug 656120).
3. In general, higher JS memory usage than Chrome, which hits us in some cases, and I suspect does here.
Whiteboard: [MemShrink:P1]
Oh, another thing worth noting about vsize is that you'll see 64MB of address space allocated per open audio stream due to bug 617852.
Assignee: nobody → jmuizelaar
Assignee: jmuizelaar → nobody
OS: Windows 7 → All
Hardware: x86 → All
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
azakai, now that bug 656120 is fixed can you try the demo again and see if things have improved?
Overall memory use seems the same, perhaps slightly better: I max out at 2.0 GB instead of 2.2. It still OOMs sometimes. There are more GC pauses, which I guess is to be expected. They are now very noticeable in some parts of the demo.

The leak at the end seems better or maybe I was just lucky. But closing the tab led to a several-minute-long GC which eventually did work, and memory went almost all the way back down (to 125MB resident, I guess higher than originally due to fragmentation etc.).

So it seems like there is a change (less memory, more GC pauses), but GC not being done was just one of the issues here.
Yeah the problems during this demo can't be solved with an "idle" GC :)
I looked a little bit deeper into the GC performance here.
For the 18 GCs we perform during this demo, 3 are coming from the new GC trigger.

As you mentioned, the working set for this demo is huge. About 370MB remains reachable after each GC. We spend 95% of our pause time in marking.
An average GC takes about 450 - 500ms which is definitely too long. Our max pause time is 600ms.
So until we have incremental and generational GC in place, the parallel marking patch should be a nice improvement here.

It's a Chrome experiment, and with their current Canary version (where they fixed the typed-array bug that caused bad GC performance in FOTN) I also see 2 GCs that take about 1.5s and one GC that takes 700ms almost at the end of the demo. But they keep their heap much smaller and perform many short GCs.
Gregor, did you see the minutes-long GC that happens after the demo is done? Any idea why that one is so bad?

STR: Run the demo to completion. When it's all finished, close the tab and wait. The browser will start to GC soon after, and it completely locks up and is unresponsive for several minutes until GC finally concludes.
(In reply to comment #29)
> Gregor, did you see the minutes-long GC that happens after the demo is done?
> Any idea why that one is so bad?
> 
> STR: Run the demo to completion. When it's all finished, close the tab and
> wait. The browser will start to GC soon after, and it completely locks up
> and is unresponsive for several minutes until GC finally concludes.

Yes, once I close the animation I get an 800ms pause caused by the JSGC_FINALIZE_END callback. I am not an expert on this bottleneck. Maybe Andrew knows more.
From reading the comments, it looks like that is where a wrapper for a JS object held by XPCOM will be released (and other JS bric-a-brac XPCOM cares about).  I guess if it sets up a huge hierarchy of things that hook into XPCOM then when you tear it down there's a lot to collect.
Could that be the WebGL context, with all its associated WebGL objects such as textures? If you look at the terminal during that pause, on a debug build, do you see stuff about WebGL?
I don't have a debug build handy now, but I definitely saw a lot of WebGL debug output before if I CC'd after preload (which isn't the same, but maybe close enough). So it makes a lot of sense I think that lots of WebGL objects are a part of that huge CC at the end.
OK. Note that unreferenced WebGL objects being GC'd all together surprisingly late is bug 617453 but that's 100 miles above my understanding of things.

Upon WebGL context destruction, I wonder if just "leaking" WebGL textures/buffers, instead of destroying them one by one, would make that faster. They would be destroyed anyway when the context gets destroyed.
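For illustration, a hypothetical sketch of that idea in plain OpenGL terms (this is not Mozilla's actual WebGL teardown code; the helper and its names are made up):

#include <GL/gl.h>
#include <vector>

// If the underlying GL context is about to be destroyed anyway, skip the
// per-object glDeleteTextures call: destroying the context releases all of
// its objects in one go.
void ReleaseTextures(std::vector<GLuint>& textures, bool contextDying) {
  if (!contextDying && !textures.empty()) {
    glDeleteTextures(static_cast<GLsizei>(textures.size()), textures.data());
  }
  textures.clear();  // drop our bookkeeping either way
}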
In bug 638549 I've made patches that add WebGL accounting in about:memory.

Try builds are ready at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-c2df6455c534/

Can you please try this demo in these builds, go to about:memory, and check the webgl entries there, at the bottom under the 'Other Measurements' category? They only appear if webgl objects are in memory; if you don't see webgl there, that means no webgl objects are currently in memory.
Nice, the WebGL info in about:memory looks like it will be very useful!

Here is some data:

Preload

2,070.06 MB -- vsize
1,284.00 MB -- heap-committed
1,240.23 MB -- resident
1,212.10 MB -- heap-used
  642.00 MB -- js-gc-heap
   96.10 MB -- webgl-buffer-memory
   71.90 MB -- heap-unused
   52.22 MB -- webgl-texture-memory
    2.30 MB -- heap-dirty
    0.60 MB -- canvas-2d-pixel-bytes
    0.04 MB -- gfx-surface-image
    764,691 -- soft-page-faults
      6,602 -- webgl-buffer-count
         35 -- webgl-texture-count
         27 -- hard-page-faults
          4 -- webgl-context-count

During

2,082.72 MB -- vsize
1,401.00 MB -- heap-committed
1,356.87 MB -- resident
1,328.33 MB -- heap-used
  755.00 MB -- js-gc-heap
   97.27 MB -- webgl-buffer-memory
   72.67 MB -- heap-unused
   68.81 MB -- webgl-texture-memory
    2.64 MB -- heap-dirty
    0.60 MB -- canvas-2d-pixel-bytes
    0.04 MB -- gfx-surface-image
  1,833,404 -- soft-page-faults
      6,890 -- webgl-buffer-count
         55 -- webgl-texture-count
         40 -- hard-page-faults
          4 -- webgl-context-count

After demo, before closing tab

2,066.62 MB -- vsize
1,441.00 MB -- heap-committed
1,387.55 MB -- resident
1,365.83 MB -- heap-used
  780.00 MB -- js-gc-heap
   99.49 MB -- webgl-buffer-memory
   76.77 MB -- webgl-texture-memory
   75.17 MB -- heap-unused
    3.16 MB -- heap-dirty
    0.60 MB -- canvas-2d-pixel-bytes
    0.04 MB -- gfx-surface-image
  2,105,187 -- soft-page-faults
      7,113 -- webgl-buffer-count
         65 -- webgl-texture-count
         40 -- hard-page-faults
          4 -- webgl-context-count

After closing tab and massive multi-minute GC

619.40 MB -- vsize
297.00 MB -- heap-committed
248.78 MB -- heap-unused
127.62 MB -- resident
 48.21 MB -- heap-used
 21.00 MB -- js-gc-heap
  2.46 MB -- heap-dirty
  0.04 MB -- gfx-surface-image
  0.00 MB -- canvas-2d-pixel-bytes
2,220,593 -- soft-page-faults
       40 -- hard-page-faults

Looks like there is WebGL stuff in memory that is cleaned up by the final GC.
Thanks. I don't really see something abnormal here. WebGL textures and buffers using 175M of memory in a very large demo like this sounds reasonable and doesn't explain the 2G memory usage so far. I'll update this bug as WebGL accounting in about:memory becomes more complete.
>     764,691 -- soft-page-faults
>       6,602 -- webgl-buffer-count
>          35 -- webgl-texture-count
>          27 -- hard-page-faults
>           4 -- webgl-context-count

This is unrelated to this bug... sorting the "counts" events by size doesn't seem useful, alphabetical order would be better.  In which case, hard-page-faults and soft-page-faults would be better renamed as page-faults-hard and page-faults-soft.
Filed bug 668893 on the ordering.
New tryserver builds, with expanded WebGL about:memory coverage, will soon be available at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-087d304b9757
It seems GC is working; it cleans everything up after the page is closed (see http://pastebin.com/wDTgN8SX). The memory usage is still horrible during the demo, though.
Attached file about:memory while running ROME (deleted)
Tried again today, it's using 2G of heap out of which only 8M fall in WebGL heap categories. It's using 150M of (non-heap) GL memory for WebGL objects.

May I suggest that the next thing that gets instrumented with about:memory reporters is the media stuff. There's a ton of video playing in this demo.
Depends on: 672755
Filed bug 674343 about the fact that we spend 10% of our time in the GC during this demo.
You could try seeing if the builds in this directory have lower heap usage in the ro.me demo:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cpearce@mozilla.com-2e106a76a194/

These builds change the amount of memory reserved for the video decoder's thread stacks to be smaller, in particular drastically smaller on Linux.
> May I suggest that the next thing that needs to get instrumented with
> about:memory reporters, is the media stuff. There's a ton of video playing
> in this demo.

That's bug 672755.
I ran this demo just now, with a recentish but (I think) pre-objShrink build.  I looked at memory usage at a few random points.  The heap-unclassified story at least looks a lot better than in comment 13:

opening video part: 1.2gigs total, 10% heap unclassified (133mb)
city part: 1.9gigs total, 9% heap unclassified (167mb)
desert part: 1.5gigs total, 10.36% heap unclassified (152mb)
Attached file about:memory while running ROME (deleted)
I guess I should have copied the part at the bottom, too, but I forgot. 83% of memory usage is JS, so it would be worth seeing what it looks like post-ObjShrink.
> I guess I should have copied the part at the bottom, too, but I forgot.  83%
> of memory usage of JS, so it would be worth seeing what it looks like
> post-ObjShrink.

Definitely -- you've got 985MB of non-function objects in the ro.me compartment, so that should more than halve (the exact change depends on whether this is a 32-bit or 64-bit build).
On nightly, in the desert part, it was about 1.2gigs of memory, with 12% (141mb) heap-unclassified.  Memory went down to 80mb after I closed it down.

GC pause times were about 280 to 350ms.  They were almost all global GCs, except a few compartmental ones near the start.
azakai, do you think we've done enough to close this bug?  If the GC/CC pauses are still too big, it seems like that should be spun off as a separate bug.
Note that important changes about WebGL objects referencing each other landed earlier this week, so I recommend using a nightly build no older than 2 days old.
I updated the Nightly I used before I ran the demo, so it should be pretty new.
(In reply to Nicholas Nethercote [:njn] from comment #51)
> azakai, do you think we've done enough to close this bug?  If the GC/CC
> pauses are still too big, seems like that should be spun of as a separate
> bug.

I don't think we should close this. We still take 2-3X more memory than Chrome. Some numbers from latest nightly:

1.1GB to preload. Chrome takes 614MB.
1.6GB maximum during the city scene, and huge lag/nonresponsiveness. Chrome takes 634MB there and is smooth.
1.3GB in the desert scene. Chrome takes 672MB.
On the plus side, all memory is reclaimed once the tab is closed, after 10-15 seconds. However, during that there is a huge multisecond GC where we reach 1.6GB.
Whiteboard: [MemShrink:P2] → [MemShrink:P2][see comment 54]
I repeated the test for this bug. azakai's numbers are still valid; Fx uses much more memory than Chrome.
I can confirm those numbers for Firefox Aurora 16.0a2 (2012-08-14) / Windows 7 64 bit / 8GB RAM / NVIDIA 8800GT / Intel Core 2 Quad Q6600 2.40 GHz. Firefox is also choppy with pauses in the city - Chrome runs smoothly. GC pauses?
Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Aurora 16.0a2 (2012-08-14)the numbers are:

1.18GB to preload. Chrome takes 880MB (3 processes)
1.72GB maximum during the city scene. Chrome takes 934MB
1.5GB in the desert scene. Chrome takes 1007MB

Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause.
The GC output after closing the window from the run in comment 57 was:

CC(T+560.6) duration: 7ms, suspected: 313, visited: 1218 RCed and 4535 GCed, collected: 290 RCed and 0 GCed (290 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, removed: 1097

GC(T+563.1) Total Time: 354.6ms, Compartments Collected: 161, Total Compartments: 161, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 316.6ms, Allocated: 685MB, +Chunks: 0, -Chunks: 0
    Slice: 0, Pause: 37.5 (When: 0.0ms, Reason: PAGE_HIDE): Mark: 31.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms
    Slice: 2, Pause: 316.6 (When: 239.2ms, Reason: INTER_SLICE_GC): Mark: 303.6ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms
    Totals: Mark: 335.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms

CC(T+569.3) duration: 224ms, suspected: 334, visited: 10578 RCed and 3896 merged GCed, collected: 9212 RCed and 1008 GCed (10220 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 1 ms, total: 3 ms, removed: 418

GC(T+573.9) Total Time: 372.7ms, Compartments Collected: 160, Total Compartments: 160, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 338.8ms, Allocated: 684MB, +Chunks: 0, -Chunks: 1
    Slice: 0, Pause: 33.4 (When: 0.0ms, Reason: CC_WAITING): Mark: 32.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms
    Slice: 2, Pause: 338.8 (When: 234.1ms, Reason: INTER_SLICE_GC): Mark: 2.6ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms
    Totals: Mark: 35.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms

CC(T+574.0) duration: 7ms, suspected: 264, visited: 1304 RCed and 2888 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 108
During the city scene in the run from comment 57, the GC debug output follows. There are a couple of non-incremental GCs, and the max pauses are still around 100ms. The experience is still slower and more stuttery than Chrome.

GC(T+159.1) Total Time: 481.0ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 126.3ms, Nonincremental Reason: allocation trigger, Allocated: 828MB, +Chunks: 231, -Chunks: 0
    Slice: 0, Pause: 39.8 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.4ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms
    Slice: 32, Pause: 126.3 (When: 2521.6ms, Reason: ALLOC_TRIGGER): Mark: 90.4ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms
    Totals: Mark: 440.2ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms

CC(T+160.7) duration: 8ms, suspected: 51, visited: 573 RCed and 3463 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 4 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 460

GC(T+165.6) Total Time: 521.6ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 112.9ms, Nonincremental Reason: allocation trigger, Allocated: 946MB, +Chunks: 135, -Chunks: 0
    Slice: 0, Pause: 40.5 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.8ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms
    Slice: 36, Pause: 12.5 (When: 2869.7ms, Reason: REFRESH_FRAME): Mark: 12.4ms, Mark Delayed: 7.4ms
    Slice: 37, Pause: 112.9 (When: 2916.6ms, Reason: ALLOC_TRIGGER): Mark: 58.9ms, Mark Delayed: 18.1ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms
    Totals: Mark: 461.7ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms, Mark Delayed: 26.9ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms

GC(T+172.5) Total Time: 534.1ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 64.4ms, Allocated: 968MB, +Chunks: 16, -Chunks: 0
    Slice: 0, Pause: 41.2 (When: 0.0ms, Reason: MAYBEGC): Mark: 39.6ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms
    Slice: 36, Pause: 12.1 (When: 2494.4ms, Reason: REFRESH_FRAME): Mark: 12.0ms, Mark Delayed: 6.2ms
    Slice: 38, Pause: 13.6 (When: 2625.9ms, Reason: REFRESH_FRAME): Mark: 13.4ms, Mark Delayed: 7.7ms
    Slice: 43, Pause: 64.4 (When: 2955.6ms, Reason: REFRESH_FRAME): Mark: 2.1ms, Mark Delayed: 0.5ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms
    Totals: Mark: 465.3ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms, Mark Delayed: 24.9ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms

CC(T+172.8) duration: 6ms, suspected: 367, visited: 884 RCed and 3471 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 75

GC(T+178.8) Total Time: 475.4ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 74.0ms, Allocated: 958MB, +Chunks: 0, -Chunks: 0
    Slice: 0, Pause: 47.4 (When: 0.0ms, Reason: MAYBEGC): Mark: 45.5ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms
    Slice: 36, Pause: 74.0 (When: 1325.5ms, Reason: REFRESH_FRAME): Mark: 1.1ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms
    Totals: Mark: 396.7ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms, Mark Delayed: 0.4ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms
Thanks for the information!  GC pauses are a slightly separate issue from memory usage, but I guess we can leave it here for now.  Some of these problems should be fixed in Nightly.

Comment 58: This is the teardown, and it is clearly dominated by a ton of C++ <-> JS references.
First GC is dominated by a 303ms Mark Gray.
Second GC is dominated by a 277ms End Callback, which should be fixed by bug 743112.

Comment 59:
First GC: Main problem is a 90ms Mark phase, I guess due to falling behind?
Second GC: about 60ms of Mark again, plus 50ms of Sweep Object. The latter should be fixed by bug 729760.
Third GC: Main problem is Sweep Object, which as before should be fixed in Nightly.
Fourth GC: Mark Roots takes 35ms in the first slice. Last slice is Sweep Object again.
Updated Mac stats for Firefox Aurora 17.0a2 (2012-09-03) :) On a Sandy Bridge MacBook Pro 13" (OS X 10.7.4, 4GB RAM), the numbers are (from MemChaser now):

1.21GB to preload.
1.61GB maximum during the city scene. That's down about 100MB.

On a clean Firefox start the GC situation (from MemChaser this time) is much nicer, although 58.7ms is still a very long time to add to a frame of graphics:

Resident 1606MB: IGC: 58.7ms (6.0s) CC: 3.5ms (12.1s)

Running with a long-open Firefox was showing some really nasty GC spikes: 

Resident 1710MB: IGC: 20291.3ms 99.8s CC: 2969ms (129.8s)

There also seems to be flickering in the rendering, where some frames are black. I'll file that as a separate bug.
I've reported the flickering: bug 788026
There is a possibility that bug 795186 might help here. It is worth retrying with tomorrow's nightly build.
Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Nightly 19.0a1 (2012-10-25) the numbers are:

1.10GB to preload. 
1.67GB maximum during the city scene

This is maybe a bit lower than before. Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause, dropping to 298MB. 

The memory stats are reported with MemChaser. There's also a really big pause after preload, when you hit "Begin" and ro.me is not in the Firefox cache. The first time I got a black screen for about 10 seconds, with no audio, then the audio/video kicked in that far into the song. The second time (after clearing the last hour of cache) I got the first frame of video and playing audio, but then the video didn't update for about 10s. MemChaser didn't show long/red GC/CC pauses. This doesn't happen if you load ro.me a second time without clearing the cache.
I remeasured with Firefox nightly 58 on Linux vs Chrome beta 62 using atsy's stats module. On the Firefox side I had DMD enabled.

> from atsy import stats
> f = stats.ProcessStats(lambda x: 'obj-x86_64-pc-linux-gnu-clang' in x, lambda x: 'firefox -content' not in x)
> c = stats.ProcessStats(lambda x: 'chrome-beta' in x, lambda x: 'chrome --type' not in x)

Firefox + DMD:

> >>> f.print_stats()
> [24936] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
>   * RSS - 382935040
>     USS - 250073088
> [25071] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
>     RSS - 179822592
>   * USS - 73367552
> [25132] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
>     RSS - 1799127040
>   * USS - 1658810368
> 
> Total: 2,115,112,960 bytes

Chrome:

> >>> c.print_stats()
> [26578] - /opt/google/chrome-beta/chrome
>   * RSS - 165359616
>     USS - 80101376
> [26588] - /opt/google/chrome-beta/chrome
>     RSS - 54571008
>   * USS - 8699904
> [26589] - /opt/google/chrome-beta/nacl_helper
>   * RSS - 7880704
>     USS - 2338816
> [26592] - /opt/google/chrome-beta/chrome
>     RSS - 12619776
>   * USS - 114688
> [26646] - /opt/google/chrome-beta/chrome
>     RSS - 414932992
>   * USS - 298663936
> [26723] - /opt/google/chrome-beta/chrome
>     RSS - 23814144
>   * USS - 3039232
> [26923] - /opt/google/chrome-beta/chrome
>     RSS - 1087041536
>   * USS - 1004920832
> 
> Total: 1,488,678,912 bytes

If we subtract the overhead for DMD (458 MiB) and about:memory (10 MiB), that leaves us with:

> Fx - 1,624,379,392 bytes
> Cr - 1,488,678,912 bytes
>      ===================
>        135,700,480 bytes

So we're talking about a 130 MiB difference on a Chrome-specific experiment; at this point I think we can just close this. If someone feels strongly, feel free to reopen, but this isn't more than MemShrink:P3.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME