Closed Bug 403481 Opened 17 years ago Closed 17 years ago

minefield memory usage keeps growing as png images are dynamically loaded (maps.google.com)

Categories

(Core :: Graphics, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: nirvn.asia, Assigned: vlad)

References

()

Details

(Keywords: memory-leak, regression)

Attachments

(3 files, 4 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110404 Minefield/3.0a9pre Build Identifier: Memory usage of latest firefox minefield grows infinitely as it loads pngs. The problem can easily be seen by navigating maps.google.com for a couple of minutes. It seems a loaded image will leak some part of itself as memory consumption keeps growing on maps.google.com even if you happen to request map tile image files you had previously charged. Furthermore, memory isn't released after closing page. Reproducible: Always Steps to Reproduce: 1. Go to maps.google.com 2. Spend five minutes exploring a part of the world Actual Results: Memory usage grows indefinitely Expected Results: Memory usage is not reaching +200meg in five minutes This was confirmed also being a problem in safe mode with all extensions disabled, under a new profile.
Evidence A
Just to note WFM on windows...stays around 56 mb
I can confirm this issue. The memory allocation reach 315MB+ since the nov 11 build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Just as a data point, using RAMBack doesn't appear to recover any memory.
Attached file simple testcase (obsolete) (deleted) —
This just loads a single image, and then in a setInterval resets the img.src = src + ? + random number. Memory usage will creep up while viewing this page.
(In reply to comment #3) > I can confirm this issue. The memory allocation reach 315MB+ since the nov 11 > build. > Sorry, It's with the Nov 7 build that this problem started.
backing out Bug 399630 doesn't seem to help.
Blocks: 386585
(In reply to comment #8) > backing out Bug 399630 doesn't seem to help. But whoever debugs this, better to re-check, that is a small patch anyway.
Attached file diffbloatdump.pl --use-address (obsolete) (deleted) —
I ran my testcase with trace-malloc. I dumped a bloat log right after loading it, and then let it run for a little bit and dumped another log. This attachment is diffbloatdump between the two.
Attached file diffbloatdump.pl --use-address --allocation-count (obsolete) (deleted) —
And this is diffbloatdump --allocation-count on the same logs.
Attachment #288377 - Attachment is obsolete: true
Attachment #288378 - Attachment is obsolete: true
So the bulk of the stuff there is: * sqlite stuff from the URL classifier * image-related stuff from the GTK drag code My initial guess is that the URL classifier was still loading its db or something like that, and the leak here is related to the GTK drag code. But I could be wrong. Taking another diff after doing the steps to see the leak a few times longer than you did it for this log should help show which it is.
Hrm; why was this specific memory issue/leak was not discovered by one of the nice automated tests and leak detector mechanism that were setup in the past few months? Anyone? It seems this could have gone unnoticed for still quite some time, making it pretty hard to find the regression window.
Because we don't run google maps as a testcase? New leak testing is being worked on, that's not the issue here. I re-ran using google maps, I guess my testcase isn't actually representative: 644289 malloc 785075 realloc (/home/luser/obj-firefox-debug/tools/trace-malloc/lib/../../../ ../mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:1182) 774892 PR_Realloc (/home/luser/obj-firefox-debug/nsprpub/pr/src/malloc/../.. /../../../mozilla/nsprpub/pr/src/malloc/prmem.c:483) 742156 NS_Realloc_P (/home/luser/obj-firefox-debug/xpcom/base/../../../moz illa/xpcom/base/nsMemoryImpl.cpp:292) 742156 nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (/home/ luser/obj-firefox-debug/xpcom/build/nsTArray.cpp:104) 742556 char* nsTArray<char>::AppendElements<char>(char const*, unsigne d int) (/home/luser/obj-firefox-debug/modules/libpr0n/src/../../../dist/include/ xpcom/nsTArray.h:501) 742556 imgContainer::AddRestoreData(char const*, unsigned int) (/hom e/luser/obj-firefox-debug/modules/libpr0n/src/../../../../mozilla/modules/libpr0 n/src/imgContainer.cpp:586) 742556 ReadDataOut (/home/luser/obj-firefox-debug/modules/libpr0n/ decoders/png/../../../../../mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cp p:309)
Attachment #288349 - Attachment is obsolete: true
wfm on Windows XP. Taking bug.
Status: NEW → ASSIGNED
Version: unspecified → Trunk
Assignee: nobody → glennrp
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Has anyone verified that backing out the patch from bug #386585 stops the leakage?
ted, this is not google maps specific, leak has to do with loading images in general (seems to be png specific). I'm simply surprised it was not caught as part of other leak test that has images in it. Btw, comment was meant to help, not to criticize :)
Flags: blocking1.9?
Has the libpng update (bug 403239) healed the fox?
There is nothing in that update having anything to do with memory allocation. What is more interesting is whether backing out bug #386585 heals it.
Moving to blocking P2
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Notice that the original report says "Gecko/2007110404 Minefield/3.0a9pre". 20071104 was before libpng-1.2.22 was checked in.
Oh, comment 6 says Nov 7. Someone needs to find the right regression range.
I've been reviewing checkins back to October 10, and the only one that looks suspicious is bug #296818. I didn't see anything else apparently involving PNG and memory allocations.
glenn, I didn't report the bug using a faulty build. I had to downgrade to November 4th nightly, not affected by this bug, as memory usage was hurting my poor 512meg ram operating system. Sorry for any confusion.
Confirming memory usage is normal using the hourly build ( Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110614 Minefield/3.0a9pre) right before libpng 1.2.22 was applied to trunk. Remains to test whether following hourly with patch applied is problematic.
Hrm, memory usage is normal using the hourly build (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110620 Minefield/3.0a9pre) which, according to the hourly-archive page, has libpng 1.2.22 patch applied. Smaug/anyone, have you tried a build without patch from bug 399630 applied?
(In reply to comment #20) > Has the libpng update (bug 403239) healed the fox? > Has anyone tested this? I originally thought this update had nothing to do with memory allocation, but it does remove use of snprintf() that was introduced in bug #386585.
Memory usage seems normal using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111504
Mathiew, can we mark this FIXED now?
(In reply to comment #31) > Mathiew, can we mark this FIXED now? > This bug is not resolved with the latest build Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111620 Minefield/3.0b2pre ID:2007111620
oops, it's not added in the trunk yet, my bad
glenn, nope, memory usage is still out of control using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111800 Minefield/3.0b2pre
libpng might not be the guilty one. download more hourly builds to try and find the patch which led to leak, I came down to this: The regression happened between 2007-11-06 20:55 and 2007-11-07 12:52 http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A55&maxdate=2007-11-07+12%3A52&cvsroot=%2Fcvsroot
There's an hourly archive at: - http://hourly-archive.localgho.st/ Which has the last 14 days' builds. So if you are real quick, then you could get a more accurate range. But soon the builds from the 6th are going to fall off the end..
I keep getting hit by this now on windows...even though I said WFM the other day. I was using the hybrid mode (the satellite view with road names mode), panning around and firefoxs memory consumption just skyrocketed even after stopping panning...firefox soon became unresponsive and killed through task manager.
OS: Linux → Windows XP
OS: Windows XP → All
Keywords: mlk, regression
steve, yep, am using this page. Doing my best but Cambodian internet connections are not reliable nor very fast. Hope range we have now can be enough for coders to figure out which patch is guilty.
The change to nsWindow.cpp near the end of the patch for bug #380464, which is in the range looks a little suspicious. if (mThebesSurface && !mThebesSurface->CairoStatus()) { [some code removed] else { mThebesSurface = nsnull; } becomes if (mThebesSurface && mThebesSurface->CairoStatus() != 0) mThebesSurface = nsnull; which seems to reverse the conditions for setting mThebesSurface to nsnull.
(In reply to comment #39) Oh, never mind, they are the same. The new version is easier to read, too.
Ive just finished testing the hourly builds and as far as i can tell the regression starts in the 2007-11-06 22:57 build. Can someone confirm?
kritenks: There was a problem with Firefox downloading the large phishing-sites database when it was started with a fresh or outdated profile, that had similar symptoms (see bug #404645 and dupes, and Asa's comment about the Firefox 3 beta1 release on Mozillazine and slashdot). This probably confused some of our range testing. The range for the phishing bug seems to be from 19 Nov through 21 Nov, and affected earlier minefield nightly builds run during that period. Kurt: Re: comment #37. Please see if the problem has gone away on Windows. It is working now on Windows for me, but locked up two days ago, so I suppose I was seeing the phishing database bug.
(In reply to comment #42) > Kurt: Re: comment #37. Please see if the problem has gone away on Windows. > It is working now on Windows for me, but locked up two days ago, so I suppose I > was seeing the phishing database bug. > Must have been that bug because this WFM once again. Sorry about that I just remembered reading this bug a while ago and I was trying to find something with google maps as soon as I started firefox up and figured I finally hit this bug. Changing back to linux.
OS: All → Linux
so based on kritenks hourly test and mine, regression happened between 2007-11-06 20:50 to 22:57: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A50&maxdate=2007-11-06+22%3A57&cvsroot=%2Fcvsroot that leaves us with 5 bugs: bug 398880, getElementsByClassName('') leaks an nsVoidArray Bug 313984, ASSERTION: Either two people are trying to submit or the previous submit bug 394751, hang with long lines of text and page break characters bug 396315, correctly fetch cmaps for all fonts bug 399369, disable font fallback when a character is in the PUA ... all of them unfortunately seems unrelated to png image memory leak problem.
(In reply to comment #44) > so based on kritenks hourly test and mine, regression happened between > 2007-11-06 20:50 to 22:57: Well that regression range in incorrect. According to the tinderbox pages there was no Linux build that started at 2007-11-06 20:50. There was a build at 20:43 and the next one started at 20:55. I am doing CVS pulls by time from the source tree from just before and just after the libpng update to verify it as the cause.
I found the real culprit. This is a regression caused by the ceck-in for bug 380464. This was verified via back-out.
Assignee: glennrp → nobody
Blocks: 380464
No longer blocks: 386585
Status: ASSIGNED → NEW
Component: ImageLib → GFX: Thebes
QA Contact: imagelib → thebes
Vlad can you take a look since this is caused by bug 380464?
Assignee: nobody → vladimir
I've been seeing a pixmap leak in minefield with the maps at www.openstreetmap.org. After a few hours I've seen xrestop report 4GB+ of pixmaps in use. The commit for bug 380464 looks like it removes a gdk_pixmap_unref(pixmap) call. Perhaps I'm missing something, but how does it get freed now?
Resolving INVALID and removing the block of 1.9, per comments 42, 46, and 48.
Assignee: vladimir → glennrp
Flags: blocking1.9+
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
How is this INVALID? There's a clear bug with a known regression range. It may not be related to your libpng update, but that doesn't make it any less of a bug.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee: glennrp → nobody
Status: REOPENED → NEW
Assignee: nobody → vladimir
Flags: blocking1.9?
OK. But shouldn't the flags show that this depends on bug #380464, instead of the other way around, and reopen bug #380464?
Status: NEW → ASSIGNED
This bug is already marked as blocking 380464. It's been that way since 2007-11-23 09:54:55 PST. I'll leave it to Vlad to decide how he wants to handle this otherwise.
And no, the standard procedure on Bugzilla is that regressions from bugs are marked as blocking the original bug, not depends on. There's long discussion in the newsgroups about the reasoning behind that.
Just to clarify. 1. Bugs tat cause regressions are only re-opened if it is decided to clar the regression by backing out the patch. 2. The dependency filed has to do with fixing a bug. as in this bug cannot be fixed until the bug it depends on is fixed. SO, if you switch it form fixing to breaking you have to reverse the state. It is all logical if not confusing. I have filed bug 405215 that suggest modifying the filed labels to clarify the way we use them.
(restoring blocking)
Flags: blocking1.9? → blocking1.9+
The attached patch seems to make the pixmap usage stable on my machine. While browsing the map it goes up to around 25MB and then drops back to 8MB about a minute after the tab is closed. NB: The patch does not address the !xrenderFormat case. I'm not an X, GTK or Mozilla coder so I can't claim that it is the right way to fix the problem.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007112504 Minefield/3.0b2pre Browsing http://maps.google.co.uk/ in hybrid mode for about 5 minutes caused System Monitor to report: Process Name: firefox-bin Virtual Memory: 245.8 MiB Resident Memory: 89.0 MiB Writable Memory: 72.3 MiB Shared Memory: 15.1 MiB X Server Memory: 1.7 GiB Closing all tabs and waiting for a few minutes returns little or no memory. Test using hybrid mode for quickest reproduction.
Attached patch put back accidentally removed code (obsolete) (deleted) — Splinter Review
Too much code got removed in 380464; put this rather important chunk back.
Attachment #290241 - Flags: superreview?(pavlov)
Attachment #290241 - Flags: review?(pavlov)
Attachment #290241 - Flags: superreview?(pavlov)
Attachment #290241 - Flags: superreview+
Attachment #290241 - Flags: review?(pavlov)
Attachment #290241 - Flags: review+
Attachment 290241 [details] [diff] fixes the massive leak I was seeing (while using Google Maps) in the X server's address space due to our leaking pixmaps (as shown by xrestop).
Attachment 290421 [details] [diff] works for me too. There is still a possibility of a leak though: if gdk_pixmap_new() succeeds and the gfxXlibSurface() call fails.
Attached patch more bulletproofing (deleted) — Splinter Review
Bulletproof some more and free the pixmap in all error cases.
Attachment #290241 - Attachment is obsolete: true
Attachment #290287 - Flags: review?(pavlov)
Attachment #290287 - Flags: review?(pavlov) → review+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: