Closed Bug 654028 Opened 14 years ago Closed 13 years ago

70MB of collectible garbage not cleaned up

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


VERIFIED FIXED
Tracking Status
firefox5 --- unaffected
firefox6 --- affected
firefox7 --- fixed

People

(Reporter: azakai, Unassigned)

References

Details

(Keywords: regression, verified-aurora, verified-beta, Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120][qa!])

Attachments

(2 files, 1 obsolete file)

STR:
1. Open Firefox with a new profile, no addons or plugins. I tested on Ubuntu 10.10.
2. Visit http://linux.slashdot.org/story/11/04/30/230259/Firefox-On-Linux-Gets-Faster-Builds-mdash-To-Be-Fast-As-Windows
3. Click 'Post a comment'.
4. Type some stuff, for example just fill a line with one character, then another line with another.
5. Memory usage rises, from 40MB to even 100MB or more, quite quickly, in a way that is obviously tied to the typing. The CPU is also maxed out and things get sluggish (after a few lines of typing).

For comparison, on Chrome I get 30% CPU, which also seems abnormal, but memory usage does not rise (it starts at 50MB and stays in the 50s).

A user reported this on Slashdot, and was even kind enough to make a screencast of it: http://www.youtube.com/watch?v=64pX97-THXc

As a Slashdot user myself I must say Slashdot seems sluggish on Firefox; this issue might be a big part of that. (There are also other issues, like bug 648278 - apparently Slashdot doesn't test on Firefox at all?)
I will do some profiling with valgrind to try to find out what is going on here.
Assignee: nobody → azakai
For some reason, we do *not* do a GC while typing in Slashdot comment fields. The memory increase here is caused by GC-able stuff, and it doesn't get cleaned up. This doesn't seem to happen on a test webpage I created with a textarea. I will investigate the Slashdot page some more.
Some more observations:
1. Just scrolling up and down is enough to increase memory by a few MB every few seconds.
2. Doing something like control-shift-J to open the JS error console will trigger a GC. This does not clear away all the memory, but a few seconds later a second set of GCs runs, which finally clears things back to normal. (I am looking at a printf that shows up each time js_GC() is called.)
Attached file reduced testcase (deleted) —
Reduced testcase. The attachment is an HTML file that uses onscroll to create lots of collectible garbage. We do not do any GC at all until 70MB of garbage has been created (you can change N in the testcase to set how fast garbage is created). At that point memory does not increase further, due to GC, but that 70MB is *not* freed no matter how long you scroll and/or wait. However, if you open the JS error console (control-shift-J), then 5-10 seconds later all of that 70MB is cleaned up. I guess opening that window triggers a different form of GC?
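The attachment itself is deleted, but a minimal sketch of a testcase matching the description above might look like the following (the allocation pattern and the value of N are assumptions for illustration, not the original attachment's code):

    <!DOCTYPE html>
    <html>
    <body style="height: 100000px">
    <script>
    // Hypothetical reconstruction: on every scroll event, allocate a large
    // number of short-lived objects so the GC heap fills with garbage quickly.
    var N = 50000; // tune N to control how fast garbage is created (assumed value)
    window.onscroll = function () {
      var junk = [];
      for (var i = 0; i < N; i++) {
        // Each iteration creates several garbage objects and strings.
        junk.push({ index: i, text: "garbage " + i, data: [i, i + 1, i + 2] });
      }
      // junk goes out of scope here, so everything allocated above is
      // immediately collectible garbage.
    };
    </script>
    </body>
    </html>

Scrolling such a page with the arrow keys grows the js/gc-heap by tens of MB in well under a second, which matches the STR in comment 5 below.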
Assignee: azakai → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Summary: Extreme memory and CPU usage when typing a Slashdot comment → 70MB of collectible garbage not cleaned up
Blocks: mlk-fx5+
Forgot to write proper STR for the testcase:
1. Load page.
2. Scroll down by pressing down arrow. Over 70MB of garbage should be created in less than a second.
3. Wait however long you want, and watch that memory usage remains the same, that is, the 70MB is never cleaned up.
Optionally, also do:
4. Press control-shift-J to open the JS console.
5. Wait 10 seconds.
6. See the 70MB of garbage cleaned up.
OS: Linux → All
Hardware: x86 → All
I can confirm the reduced testcase and steps from comment 5, but only in Nightly. Aurora and 4.0.1 work as expected.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110503 Firefox/6.0a1

about:memory says that it's js/gc-heap that contains the memory. Manually triggering a GC does the same as opening the JS Console (it takes a few seconds before memory drops).
> but only in Nightly. Aurora and 4.0.1 work as expected.

Hmm. Can we figure out a regression range here?
Last good nightly: 2011-04-26
First bad nightly: 2011-04-27
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c5e8cc100248&tochange=c833fb1623ca
Tracemonkey range for the tm to m-c merge in the above changesets:
Last good nightly: 2011-04-13
First bad nightly: 2011-04-14
Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=cd5910633dd8&tochange=dc6ce44bedd4
Looks likely to be a background finalization problem. Hugh, thanks for hunting those ranges down! Gregor, can you take a look?
(In reply to comment #10)
> Looks likely to be a background finalization problem. Hugh, thanks for
> hunting those ranges down!
>
> Gregor, can you take a look?

I tried the reduced testcase but after navigating away the JS heap size went back to normal again. I also tried the slashdot page and saw the incredible increase during typing. Once I navigate away the JS heap size decreases again (with a delay of about 40sec). I didn't see the CPU intensive part though. It stays between 5 and 10% for me.

Alon, you mention that slashdot is sluggish on your system. What system do you have? (CPU, RAM, OS)

We try to perform GC when we detect idle time but it seems that doesn't work for low-end systems. The background finalization patch moves a huge amount of work to a second thread that is executed in parallel to the main thread after the GC. Maybe this affected the idle time detection on low-end systems.
Gregor, this bug is about the fact that GC doesn't run while the page is just there; navigating away will trigger GC.
(In reply to comment #11)
> (In reply to comment #10)
> > Looks likely to be a background finalization problem. Hugh, thanks for
> > hunting those ranges down!
> >
> > Gregor, can you take a look?
>
> I tried the reduced testcase but after navigating away the JS heap size went
> back to normal again.

As bz said, the issue here is that, without navigating away, the 70MB is never released (but it used to be).

> I didn't see the CPU intensive part though. It stays between 5 and
> 10% for me.
>
> Alon you mention that slashdot is sluggish on your system. What system do
> you have? (CPU, RAM, OS)

The sluggishness is relative, I guess; I originally tested on a slower 2GHz Lenovo laptop, but I also see the main issue here (the 70MB not being reclaimed) on a powerful MacBook, and other people reproduce it as well elsewhere.

(Side issue: Comparing browsers on the same machine, we use significantly more CPU (2x) and RAM (up to 70MB) than Chrome on Slashdot.)
Attached patch patch (obsolete) (deleted) — Splinter Review
OK, I see what the problem is. The background finalization patch increased the memory footprint, and the trigger mechanism in MaybeGC might be too loose. We can fix this by changing the trigger for the GC in MaybeGC. This patch still has to be tested, but it solves the problem for me.
The patch does not fix the problem for me with the STR from comment 5.
(In reply to comment #15)
> The patch does not fix the problem for me with the STR from comment 5.

Does it never work or just sometimes? What happens if you keep scrolling with about 10sec pauses in between? For me it releases it maybe 8 out of 10 times after scrolling.
I tried 3 times, with greater than 10 second pauses, and it didn't work in any of those. Maybe I was just unlucky.
Tightening the GC could help but isn't the allocation of 70 MB for the input of a few hundred bytes of text the real problem? The 70MB shouldn't be allocated in the first place. I'm not sure how much memory is needed but intuition tells me that 70MB is way too much, probably by a factor of 1000.
(In reply to comment #18)
> Tightening the GC could help but isn't the allocation of 70 MB for the input
> of a few hundred bytes of text the real problem?

The text is not related to the 70MB. That 70MB is generated by the JavaScript in the testcase page, which purposefully creates many many MB of garbage JS objects. This simulates what Slashdot does: every time you press a key or scroll on Slashdot, lots and lots of JS garbage is created (I don't know why they wrote it that way).

That many MB of garbage are created on Slashdot and in the testcase page is expected (see the JavaScript that creates the garbage in each case). The bug, however, is that we do not clean it up eventually (unless you do something like browse to somewhere else).
(In reply to comment #19)
> (In reply to comment #18)
> > Tightening the GC could help but isn't the allocation of 70 MB for the input
> > of a few hundred bytes of text the real problem?
>
> The text is not related to the 70MB. That 70MB is generated by the
> JavaScript in the testcase page, which purposefully creates many many MB of
> garbage JS objects.

What's interesting though is that the testcase only allocates an extra 3MB in Chrome.
Chrome runs GC much more often than we do; it almost certainly runs it multiple times while the testcase runs.
We also perform a few GCs during the testcase but at the end we stop executing and if you don't move the mouse any more we don't even get MaybeGC calls. So it's random how full the JS heap is after you stop scrolling.
Attached patch patch (deleted) — Splinter Review
This patch runs a GC every 5 seconds if we have any calls to MaybeGC. Note that we need 2 GCs in order to give back the memory to the OS because of background finalization. It should solve the reduced-testcase issue, but the Slashdot situation does not get better. Even after full GCs the memory doesn't go down. This seems to be a leak.
Attachment #531133 - Attachment is obsolete: true
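The patch itself isn't quoted here, but a minimal sketch of the scheduling idea it describes might look like the following (written in JavaScript rather than the engine's C++; the names maybeGC, runFullGC, and the exact bookkeeping are assumptions, not SpiderMonkey's actual API):

    // Toy model of "run a GC every 5 seconds if we have any calls to MaybeGC".
    var gcRequested = false;

    function maybeGC() {
      // The real MaybeGC also checks heap-size triggers; here we only record
      // that allocation activity happened.
      gcRequested = true;
    }

    function runFullGC() {
      // Stand-in for the engine's full GC. Per the comment above, two GCs are
      // needed before memory is returned to the OS, because of background
      // finalization.
    }

    setInterval(function () {
      if (gcRequested) {
        gcRequested = false;
        runFullGC();
      }
    }, 5000); // 5-second periodic check, as described in the comment above

    maybeGC(); // the engine would call this from its allocation paths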
Our GC heuristics depend on the size of the GC heap after the last GC. The background finalization broke that: at the end of js_GC() we now call setGCLastBytes with a value that does not account for any arenas finalized in the background. I guess as a simple hack we could try to move setGCLastBytes to the end of the background finalization.
(In reply to comment #24)
> I guess as a simple hack we could try to move setGCLastBytes at the end of
> the background finalization.

Or at least try to call it again accounting for the newly allocated GC arenas.
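To make the heuristic in the two comments above concrete, here is a toy model (plain JavaScript, with an assumed growth factor; SpiderMonkey's real constants and function names may differ) of why recording the last-GC heap size before background finalization finishes loosens the next trigger:

    // Assumed heuristic: the next GC is triggered when the heap grows to
    // GROWTH_FACTOR times its size at the end of the previous GC.
    var GROWTH_FACTOR = 3; // illustrative value, not necessarily SpiderMonkey's

    function computeTrigger(lastBytes) {
      return lastBytes * GROWTH_FACTOR;
    }

    var MB = 1024 * 1024;

    // If setGCLastBytes runs before background finalization has returned the
    // garbage arenas, lastBytes still includes the garbage...
    var lastBytesTooEarly = 90 * MB;          // heap size with garbage still counted
    var lastBytesAfterFinalization = 20 * MB; // live data only

    console.log(computeTrigger(lastBytesTooEarly) / MB);          // 270 MB trigger
    console.log(computeTrigger(lastBytesAfterFinalization) / MB); //  60 MB trigger
    // ...so the next GC fires only once the heap is far larger than it should
    // be, which matches the "70MB never cleaned up" symptom.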
The original problem here is that the 70MB allocated during typing on Slashdot is not reclaimed. It doesn't even get reclaimed if we perform full GCs! So let's focus on that issue here.

The reduced testcase shows a different (well known) problem: we don't perform GCs during idle time. I filed bug 656120 for that.
(In reply to comment #26)
> The original problem here is that the 70MB during typing in SlashDot are not
> reclaimed. They don't even get reclaimed if we perform full GCs!
> So lets focus on this issue here.
>
> The reduced testcase shows a different (well known) problem: We don't
> perform GCs during idle time. I filed bug 656120 for that.

Gregor, in what situation are you seeing that even full GCs do not reclaim all the memory?

In my testing, both on Slashdot and in the testcase, memory is reclaimed if something like the JS error console is opened and closed, which triggers a GC.

For example, load the Slashdot page linked to in comment #0, scroll up and down until memory goes from 95 to 130, wait a few minutes (or longer), see that memory stays up. Then press control-shift-J to open the JS error console, and close it, and see that 5-10 seconds later memory goes down to 95.

Or, instead of scrolling, click to add a comment and hold down a key (so that the key repeats and lots of text is entered), again wait until memory goes from 95 to 130, and continue as in the last paragraph, with the same results.

Does memory not get reclaimed in these STRs for you?
(In reply to comment #27)
> Gregor, in what situation are you seeing that even full GCs do not reclaim
> all the memory?
>
> In my testing, both on Slashdot and in the testcase, memory is reclaimed if
> something like the JS error console is opened and closed, which triggers a
> GC.
>
> For example, load the Slashdot page linked to in comment #0, scroll up and
> down until memory goes from 95 to 130, wait a few minutes (or longer), see
> that memory stays up. Then press control-shift-J to open the JS error
> console, and close it, and see that 5-10 seconds later memory goes down to
> 95.
>
> Or, instead of scrolling, click to add a comment and hold down a key (so
> that the key repeats and lots of text is entered), again wait until memory
> goes from 95 to 130, and continue as in the last paragraph, with the same
> results.
>
> Does memory not get reclaimed in these STRs for you?

Yes you are right. The memory goes down again after some time. I guess I didn't wait long enough.

I just tested on my netbook and after a few lines I almost couldn't type any more. With my usual typing speed I was always one word ahead of what I saw on the screen. Very annoying!
> I just tested on my netbook and after a few lines I almost couldn't type any
> more. With my usual typing speed I was always one word ahead of what I saw
> on the screen. Very annoying!

Yes, the overall Firefox user experience on Slashdot is not good. I filed an evangelism bug for that, bug 656347.
(In reply to comment #28)
> Yes you are right. The memory goes down again after some time. I guess I
> didn't wait long enough.

I'm about to land bug 654041, which adds buttons to about:memory to force GC and CC. Should be very helpful to avoid this not-waiting-long-enough problem :)
No longer blocks: mlk-fx5+
Depends on: 656120
Whiteboard: [MemShrink:P2]
Bug 656120 just landed. Can we confirm whether it helped with the reduced test case?
I confirmed that this bug is now fixed; memory is reclaimed properly (after a short pause). Marking resolved. Great stuff, Gregor!

We still don't have good performance on Slashdot, due to the large amounts of garbage it creates all the time (scrolling, keypresses, etc.). However, at least that garbage is now reclaimed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
So this bug is resolved fixed because we got the fix for Firefox 7 over in Bug 656120. But this is a regression in Firefox 6 and we don't seem to have a fix there. Is that correct? Are we going to just live with it for 6?
Whiteboard: [MemShrink:P2] → [MemShrink:P2] [Fixed in 7 by Bug 656120]
(In reply to comment #33)
> So this bug is resolved fixed because we got the fix for Firefox 7 over in
> Bug 656120. But this is a regression in Firefox 6 and we don't seem to have
> a fix there. Is that correct? Are we going to just live with it for 6?

I don't see what else we can do. Even if a spot-fix could be devised, if the patch from bug 656120 was deemed too risky for FF6, why would a spot-fix be deemed acceptable, esp. since we are much later in the cycle?
That's great information, njn. I'm not making any judgements here, just looking for data so we can make the best use of our time in the big group triage meeting tomorrow. Thanks for the follow-up.
Not tracking for Firefox 6 - too late for that train.
qa+: QA verification needed on Firefox 7 using the attached testcase and the steps in comment 5
Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120] → [MemShrink:P2] [Fixed in 7 by Bug 656120][qa+]
Verified on Win7/64-bit with Firefox 7RC, Aurora and Nightly:

Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0) Gecko/20100101 Firefox/7.0
Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a2) Gecko/20110914 Firefox/8.0a2
Build identifier: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:9.0a1) Gecko/20110922 Firefox/9.0a1

In the reduced testcase page, when scrolling the page up/down, memory usage increases by about 70 MB (on Aurora and 7RC) or 100MB (on Nightly), but decreases back within a few seconds after the scrolling is stopped.
LINUX:
Mozilla/5.0 (X11; Linux x86_64; rv:9.0a1) Gecko/20110922 Firefox/9.0a1
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a2) Gecko/20110916 Firefox/8.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

MAC:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a1) Gecko/20110922 Firefox/9.0a1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0

Same results as in comment 38, with Firefox 7RC on Ubuntu and Mac OS X 10.6.
Status: RESOLVED → VERIFIED
Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120][qa+] → [MemShrink:P2] [Fixed in 7 by Bug 656120][qa!]
Verified as fixed also on Win XP: Mozilla/5.0 (Windows NT 5.1; rv:7.0) Gecko/20100101 Firefox/7.0
http://www.cs.umd.edu/~mwh/papers/expositor-submitted.pdf claims that there is an unfixed underlying bug that caused this bug, a race on gcChunksWaitingToExpire.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #41)
> http://www.cs.umd.edu/~mwh/papers/expositor-submitted.pdf claims that there
> is an unfixed underlying bug that caused this bug, a race on
> gcChunksWaitingToExpire.

gcChunksWaitingToExpire and the mentioned timer-based code path don't exist any more. But they are right, bug 631733 in combination with bug 650326 caused some problems.
