Closed Bug 654028 Opened 14 years ago Closed 13 years ago

70MB of collectible garbage not cleaned up

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


VERIFIED FIXED
Tracking Status
firefox5 --- unaffected
firefox6 --- affected
firefox7 --- fixed

People

(Reporter: azakai, Unassigned)

References

Details

(Keywords: regression, verified-aurora, verified-beta, Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120][qa!])

Attachments

(2 files, 1 obsolete file)

STR:
1. Open Firefox with a new profile, no addons or plugins. I tested on Ubuntu 10.10.
2. Visit http://linux.slashdot.org/story/11/04/30/230259/Firefox-On-Linux-Gets-Faster-Builds-mdash-To-Be-Fast-As-Windows
3. Click 'Post a comment'.
4. Type some stuff, for example just fill a line with one character, then another line with another.
5. Memory usage rises, from 40MB to even 100MB or more, quite quickly, in a way that is obviously tied to the typing. The CPU is also maxed out and things get sluggish (after a few lines of typing).

For comparison, on Chrome I get 30% CPU, which also seems abnormal, but memory usage does not rise (it starts at 50MB and stays in the 50s).

A user reported this on Slashdot, and was even kind enough to make a screencast of it: http://www.youtube.com/watch?v=64pX97-THXc

As a Slashdot user myself I must say Slashdot seems sluggish on Firefox; this issue might be a big part of that. (There are also other issues, like bug 648278 - apparently Slashdot doesn't test on Firefox at all?)
I will do some profiling with valgrind to try to find out what is going on here.
Assignee: nobody → azakai
For some reason, we do *not* do a GC while typing in Slashdot comment fields. The memory increase here is caused by GC-able stuff, and it doesn't get cleaned up. This doesn't seem to happen on a test webpage I created with a textarea. I will investigate the Slashdot page some more.
Some more observations:
1. Just scrolling up and down is enough to increase memory by a few MB every few seconds.
2. Doing something like control-shift-J to open the JS error console will trigger a GC. This does not clear away all the memory, but a few seconds later a second set of GCs runs, which finally clears things back to normal. (I am looking at a printf that shows up each time js_GC() is called.)
Attached file reduced testcase (deleted) —
Reduced testcase. The attachment is an HTML file that uses onscroll to create lots of collectible garbage. We do not do any GC at all until 70MB of garbage has been created (you can change N in the testcase to set how fast garbage is created). At that point memory does not increase further, due to GC, but that 70MB is *not* freed no matter how long you scroll and/or wait. However, if you open the JS error console (control-shift-J), then 5-10 seconds later all of that 70MB is cleaned up. I guess opening that window triggers a different form of GC?
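The attachment itself is deleted, but a minimal sketch of a testcase matching the description above might look like the following (the allocation pattern and the value of N are assumptions for illustration, not the original attachment's code):

    <!DOCTYPE html>
    <html>
    <body style="height: 100000px">
    <script>
    // Hypothetical reconstruction: on every scroll event, allocate a large
    // number of short-lived objects so the GC heap fills with garbage quickly.
    var N = 50000; // tune N to control how fast garbage is created (assumed value)
    window.onscroll = function () {
      var junk = [];
      for (var i = 0; i < N; i++) {
        // Each iteration creates several garbage objects and strings.
        junk.push({ index: i, text: "garbage " + i, data: [i, i + 1, i + 2] });
      }
      // junk goes out of scope here, so everything allocated above is
      // immediately collectible garbage.
    };
    </script>
    </body>
    </html>

Scrolling such a page with the arrow keys grows the js/gc-heap by tens of MB in well under a second, which matches the STR in comment 5 below.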
Assignee: azakai → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Summary: Extreme memory and CPU usage when typing a Slashdot comment → 70MB of collectible garbage not cleaned up
Blocks: mlk-fx5+
Forgot to write proper STR for the testcase:
1. Load page.
2. Scroll down by pressing down arrow. Over 70MB of garbage should be created in less than a second.
3. Wait however long you want, and watch that memory usage remains the same, that is, the 70MB is never cleaned up.
Optionally, also do:
4. Press control-shift-J to open the JS console.
5. Wait 10 seconds.
6. See the 70MB of garbage cleaned up.
OS: Linux → All
Hardware: x86 → All
I can confirm the reduced testcase and steps from comment 5, but only in Nightly. Aurora and 4.0.1 work as expected.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110503 Firefox/6.0a1

about:memory says that it's js/gc-heap that contains the memory. Manually triggering a GC does the same as opening the JS Console (it takes a few seconds before memory drops).
> but only in Nightly. Aurora and 4.0.1 work as expected.

Hmm. Can we figure out a regression range here?
Last good nightly: 2011-04-26
First bad nightly: 2011-04-27
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c5e8cc100248&tochange=c833fb1623ca
Tracemonkey range for the tm to m-c merge in the above changesets:
Last good nightly: 2011-04-13
First bad nightly: 2011-04-14
Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=cd5910633dd8&tochange=dc6ce44bedd4
Looks likely to be a background finalization problem. Hugh, thanks for hunting those ranges down! Gregor, can you take a look?
(In reply to comment #10)
> Looks likely to be a background finalization problem. Hugh, thanks for
> hunting those ranges down!
>
> Gregor, can you take a look?

I tried the reduced testcase but after navigating away the JS heap size went back to normal again. I also tried the slashdot page and saw the incredible increase during typing. Once I navigate away the JS heap size decreases again (with a delay of about 40sec). I didn't see the CPU intensive part though. It stays between 5 and 10% for me.

Alon, you mention that slashdot is sluggish on your system. What system do you have? (CPU, RAM, OS)

We try to perform GC when we detect idle time but it seems that doesn't work for low-end systems. The background finalization patch moves a huge amount of work to a second thread that is executed in parallel to the main thread after the GC. Maybe this affected the idle time detection on low-end systems.
Gregor, this bug is about the fact that GC doesn't run while the page is just there; navigating away will trigger GC.
(In reply to comment #11)
> (In reply to comment #10)
> > Looks likely to be a background finalization problem. Hugh, thanks for
> > hunting those ranges down!
> >
> > Gregor, can you take a look?
>
> I tried the reduced testcase but after navigating away the JS heap size went
> back to normal again.

As bz said, the issue here is that, without navigating away, the 70MB is never released (but it used to be).

> I didn't see the CPU intensive part though. It stays between 5 and
> 10% for me.
>
> Alon you mention that slashdot is sluggish on your system. What system do
> you have? (CPU, RAM, OS)

The sluggishness is relative, I guess; I originally tested on a slower 2GHz Lenovo laptop, but I also see the main issue here (the 70MB not being reclaimed) on a powerful MacBook, and other people reproduce it as well elsewhere.

(Side issue: Comparing browsers on the same machine, we use significantly more CPU (2x) and RAM (up to 70MB) than Chrome on Slashdot.)
Attached patch patch (obsolete) (deleted) — Splinter Review
OK, I see what the problem is. The background finalization patch increased the memory footprint, and the trigger mechanism in MaybeGC might be too loose. We can fix this by changing the trigger for the GC in MaybeGC. This patch still has to be tested, but it solves the problem for me.
The patch does not fix the problem for me with the STR from comment 5.
(In reply to comment #15)
> The patch does not fix the problem for me with the STR from comment 5.

Does it never work or just sometimes? What happens if you keep scrolling with about 10sec pauses in between? For me it releases it maybe 8 out of 10 times after scrolling.
I tried 3 times, with greater than 10 second pauses, and it didn't work in any of those. Maybe I was just unlucky.
Tightening the GC could help but isn't the allocation of 70 MB for the input of a few hundred bytes of text the real problem? The 70MB shouldn't be allocated in the first place. I'm not sure how much memory is needed but intuition tells me that 70MB is way too much, probably by a factor of 1000.
(In reply to comment #18)
> Tightening the GC could help but isn't the allocation of 70 MB for the input
> of a few hundred bytes of text the real problem?

The text is not related to the 70MB. That 70MB is generated by the JavaScript in the testcase page, which purposefully creates many many MB of garbage JS objects. This simulates what Slashdot does: every time you press a key or scroll on Slashdot, lots and lots of JS garbage is created (I don't know why they wrote it that way).

That many MB of garbage are created on Slashdot and in the testcase page is expected (see the JavaScript that creates the garbage in each case). The bug, however, is that we do not clean it up eventually (unless you do something like browse to somewhere else).
(In reply to comment #19)
> (In reply to comment #18)
> > Tightening the GC could help but isn't the allocation of 70 MB for the input
> > of a few hundred bytes of text the real problem?
>
> The text is not related to the 70MB. That 70MB is generated by the
> JavaScript in the testcase page, which purposefully creates many many MB of
> garbage JS objects.

What's interesting though is that the testcase only allocates an extra 3MB in Chrome.
Chrome runs GC much more often than we do; it almost certainly runs it multiple times while the testcase runs.
We also perform a few GCs during the testcase but at the end we stop executing and if you don't move the mouse any more we don't even get MaybeGC calls. So it's random how full the JS heap is after you stop scrolling.
Attached patch patch (deleted) — Splinter Review
This patch runs a GC every 5 seconds if we have any calls to MaybeGC. Note that we need 2 GCs in order to give back the memory to the OS because of background finalization. It should solve the reduced-testcase issue, but the Slashdot situation does not get better. Even after full GCs the memory doesn't go down. This seems to be a leak.
Attachment #531133 - Attachment is obsolete: true
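The patch itself isn't quoted here, but a minimal sketch of the scheduling idea it describes might look like the following (written in JavaScript rather than the engine's C++; the names maybeGC, runFullGC, and the exact bookkeeping are assumptions, not SpiderMonkey's actual API):

    // Toy model of "run a GC every 5 seconds if we have any calls to MaybeGC".
    var gcRequested = false;

    function maybeGC() {
      // The real MaybeGC also checks heap-size triggers; here we only record
      // that allocation activity happened.
      gcRequested = true;
    }

    function runFullGC() {
      // Stand-in for the engine's full GC. Per the comment above, two GCs are
      // needed before memory is returned to the OS, because of background
      // finalization.
    }

    setInterval(function () {
      if (gcRequested) {
        gcRequested = false;
        runFullGC();
      }
    }, 5000); // 5-second periodic check, as described in the comment above

    maybeGC(); // the engine would call this from its allocation paths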
Our GC heuristics depend on the size of the GC heap after the last GC. The background finalization broke that: at the end of js_GC() we now call setGCLastBytes with a value that does not account for any arenas finalized in the background. I guess as a simple hack we could try to move setGCLastBytes to the end of the background finalization.
(In reply to comment #24)
> I guess as a simple hack we could try to move setGCLastBytes at the end of
> the background finalization.

Or at least try to call it again accounting for the newly allocated GC arenas.
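To make the heuristic in the two comments above concrete, here is a toy model (plain JavaScript, with an assumed growth factor; SpiderMonkey's real constants and function names may differ) of why recording the last-GC heap size before background finalization finishes loosens the next trigger:

    // Assumed heuristic: the next GC is triggered when the heap grows to
    // GROWTH_FACTOR times its size at the end of the previous GC.
    var GROWTH_FACTOR = 3; // illustrative value, not necessarily SpiderMonkey's

    function computeTrigger(lastBytes) {
      return lastBytes * GROWTH_FACTOR;
    }

    var MB = 1024 * 1024;

    // If setGCLastBytes runs before background finalization has returned the
    // garbage arenas, lastBytes still includes the garbage...
    var lastBytesTooEarly = 90 * MB;          // heap size with garbage still counted
    var lastBytesAfterFinalization = 20 * MB; // live data only

    console.log(computeTrigger(lastBytesTooEarly) / MB);          // 270 MB trigger
    console.log(computeTrigger(lastBytesAfterFinalization) / MB); //  60 MB trigger
    // ...so the next GC fires only once the heap is far larger than it should
    // be, which matches the "70MB never cleaned up" symptom.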
The original problem here is that the 70MB allocated during typing on Slashdot is not reclaimed. It doesn't even get reclaimed if we perform full GCs! So let's focus on that issue here.

The reduced testcase shows a different (well known) problem: we don't perform GCs during idle time. I filed bug 656120 for that.
(In reply to comment #26)
> The original problem here is that the 70MB during typing in SlashDot are not
> reclaimed. They don't even get reclaimed if we perform full GCs!
> So lets focus on this issue here.
>
> The reduced testcase shows a different (well known) problem: We don't
> perform GCs during idle time. I filed bug 656120 for that.

Gregor, in what situation are you seeing that even full GCs do not reclaim all the memory?

In my testing, both on Slashdot and in the testcase, memory is reclaimed if something like the JS error console is opened and closed, which triggers a GC.

For example, load the Slashdot page linked to in comment #0, scroll up and down until memory goes from 95 to 130, wait a few minutes (or longer), see that memory stays up. Then press control-shift-J to open the JS error console, and close it, and see that 5-10 seconds later memory goes down to 95.

Or, instead of scrolling, click to add a comment and hold down a key (so that the key repeats and lots of text is entered), again wait until memory goes from 95 to 130, and continue as in the last paragraph, with the same results.

Does memory not get reclaimed in these STRs for you?
(In reply to comment #27)
> Gregor, in what situation are you seeing that even full GCs do not reclaim
> all the memory?
>
> In my testing, both on Slashdot and in the testcase, memory is reclaimed if
> something like the JS error console is opened and closed, which triggers a
> GC.
>
> For example, load the Slashdot page linked to in comment #0, scroll up and
> down until memory goes from 95 to 130, wait a few minutes (or longer), see
> that memory stays up. Then press control-shift-J to open the JS error
> console, and close it, and see that 5-10 seconds later memory goes down to
> 95.
>
> Or, instead of scrolling, click to add a comment and hold down a key (so
> that the key repeats and lots of text is entered), again wait until memory
> goes from 95 to 130, and continue as in the last paragraph, with the same
> results.
>
> Does memory not get reclaimed in these STRs for you?

Yes you are right. The memory goes down again after some time. I guess I didn't wait long enough.

I just tested on my netbook and after a few lines I almost couldn't type any more. With my usual typing speed I was always one word ahead of what I saw on the screen. Very annoying!
> I just tested on my netbook and after a few lines I almost couldn't type any
> more. With my usual typing speed I was always one word ahead of what I saw
> on the screen. Very annoying!

Yes, the overall Firefox user experience on Slashdot is not good. I filed an evangelism bug for that, bug 656347.
(In reply to comment #28)
> Yes you are right. The memory goes down again after some time. I guess I
> didn't wait long enough.

I'm about to land bug 654041, which adds buttons to about:memory to force GC and CC. Should be very helpful to avoid this not-waiting-long-enough problem :)
No longer blocks: mlk-fx5+
Depends on: 656120
Whiteboard: [MemShrink:P2]
Bug 656120 just landed. Can we confirm whether it helped with the reduced test case?
I confirmed that this bug is now fixed; memory is reclaimed properly (after a short pause). Marking resolved. Great stuff, Gregor!

We still don't have good performance on Slashdot, due to the large amounts of garbage it creates all the time (scrolling, keypresses, etc.). However, at least that garbage is now reclaimed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
So this bug is resolved fixed because we got the fix for Firefox 7 over in Bug 656120. But this is a regression in Firefox 6 and we don't seem to have a fix there. Is that correct? Are we going to just live with it for 6?
Whiteboard: [MemShrink:P2] → [MemShrink:P2] [Fixed in 7 by Bug 656120]
(In reply to comment #33)
> So this bug is resolved fixed because we got the fix for Firefox 7 over in
> Bug 656120. But this is a regression in Firefox 6 and we don't seem to have
> a fix there. Is that correct? Are we going to just live with it for 6?

I don't see what else we can do. Even if a spot-fix could be devised, if the patch from bug 656120 was deemed too risky for FF6, why would a spot-fix be deemed acceptable, esp. since we are much later in the cycle?
That's great information, njn. I'm not making any judgements here, just looking for data so we can make the best use of our time in the big group triage meeting tomorrow. Thanks for the follow-up.
Not tracking for Firefox 6 - too late for that train.
qa+: QA verification needed on Firefox 7 using the attached testcase and the steps in comment 5
Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120] → [MemShrink:P2] [Fixed in 7 by Bug 656120][qa+]
Verified on Win7/64-bit with Firefox 7RC, Aurora and Nightly:

Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0) Gecko/20100101 Firefox/7.0
Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a2) Gecko/20110914 Firefox/8.0a2
Build identifier: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:9.0a1) Gecko/20110922 Firefox/9.0a1

In the reduced testcase page, when scrolling the page up/down, memory usage increases by about 70 MB (on Aurora and 7RC) or 100MB (on Nightly), but decreases back within a few seconds after the scrolling is stopped.
LINUX:
Mozilla/5.0 (X11; Linux x86_64; rv:9.0a1) Gecko/20110922 Firefox/9.0a1
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a2) Gecko/20110916 Firefox/8.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

MAC:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a1) Gecko/20110922 Firefox/9.0a1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0

Same results as in comment 38, with Firefox 7RC on Ubuntu and Mac OS X 10.6.
Status: RESOLVED → VERIFIED
Whiteboard: [MemShrink:P2] [Fixed in 7 by Bug 656120][qa+] → [MemShrink:P2] [Fixed in 7 by Bug 656120][qa!]
Verified as fixed also on Win XP: Mozilla/5.0 (Windows NT 5.1; rv:7.0) Gecko/20100101 Firefox/7.0
http://www.cs.umd.edu/~mwh/papers/expositor-submitted.pdf claims that there is an unfixed underlying bug that caused this bug, a race on gcChunksWaitingToExpire.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #41)
> http://www.cs.umd.edu/~mwh/papers/expositor-submitted.pdf claims that there
> is an unfixed underlying bug that caused this bug, a race on
> gcChunksWaitingToExpire.

gcChunksWaitingToExpire and the mentioned timer-based code path don't exist any more. But they are right, bug 631733 in combination with bug 650326 caused some problems.
