Closed Bug 403481 Opened 17 years ago Closed 17 years ago

minefield memory usage keeps growing as png images are dynamically loaded (maps.google.com)

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: nirvn.asia, Assigned: vlad)

References

(
URL
)

Details

(Keywords: memory-leak, regression)

Attachments

(3 files, 4 obsolete files)

system monitor showing minefield memory usage 17 years ago Mathieu Pellerin (deleted), image/png		Details
simple testcase 17 years ago (not currently active) Ted Mielczarek (deleted), text/html		Details
diffbloatdump.pl --use-address 17 years ago (not currently active) Ted Mielczarek (deleted), text/plain		Details
diffbloatdump.pl --use-address --allocation-count 17 years ago (not currently active) Ted Mielczarek (deleted), text/plain		Details
Use alternate gfxXlibSurface call which manages the pixmap itself 17 years ago Jon Burgess (deleted), patch		Details \| Diff \| Splinter Review
put back accidentally removed code 17 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) (deleted), patch	pavlov : review+ pavlov : superreview+	Details \| Diff \| Splinter Review
more bulletproofing 17 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) (deleted), patch	pavlov : review+	Details \| Diff \| Splinter Review

Mathieu Pellerin

Reporter

Description

•

17 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110404 Minefield/3.0a9pre Build Identifier: Memory usage of latest firefox minefield grows infinitely as it loads pngs. The problem can easily be seen by navigating maps.google.com for a couple of minutes. It seems a loaded image will leak some part of itself as memory consumption keeps growing on maps.google.com even if you happen to request map tile image files you had previously charged. Furthermore, memory isn't released after closing page. Reproducible: Always Steps to Reproduce: 1. Go to maps.google.com 2. Spend five minutes exploring a part of the world Actual Results: Memory usage grows indefinitely Expected Results: Memory usage is not reaching +200meg in five minutes This was confirmed also being a problem in safe mode with all extensions disabled, under a new profile.

Mathieu Pellerin

Reporter

Comment 1

•

17 years ago

Attached image system monitor showing minefield memory usage (deleted) — Details

Evidence A

u88484

Comment 2

•

17 years ago

Just to note WFM on windows...stays around 56 mb

Denis Saintilma

Comment 3

•

17 years ago

I can confirm this issue. The memory allocation reach 315MB+ since the nov 11 build.

(not currently active) Ted Mielczarek

Updated

•

17 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

(not currently active) Ted Mielczarek

Comment 4

•

17 years ago

Just as a data point, using RAMBack doesn't appear to recover any memory.

(not currently active) Ted Mielczarek

Comment 5

•

17 years ago

Attached file simple testcase (obsolete) (deleted) — Details

This just loads a single image, and then in a setInterval resets the img.src = src + ? + random number. Memory usage will creep up while viewing this page.

Denis Saintilma

Comment 6

•

17 years ago

(In reply to comment #3) > I can confirm this issue. The memory allocation reach 315MB+ since the nov 11 > build. > Sorry, It's with the Nov 7 build that this problem started.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

17 years ago

So either Bug 386585 or Bug 399630?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

17 years ago

backing out Bug 399630 doesn't seem to help.

Blocks: 386585

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

17 years ago

(In reply to comment #8) > backing out Bug 399630 doesn't seem to help. But whoever debugs this, better to re-check, that is a small patch anyway.

(not currently active) Ted Mielczarek

Comment 10

•

17 years ago

Attached file diffbloatdump.pl --use-address (obsolete) (deleted) — Details

I ran my testcase with trace-malloc. I dumped a bloat log right after loading it, and then let it run for a little bit and dumped another log. This attachment is diffbloatdump between the two.

(not currently active) Ted Mielczarek

Comment 11

•

17 years ago

Attached file diffbloatdump.pl --use-address --allocation-count (obsolete) (deleted) — Details

And this is diffbloatdump --allocation-count on the same logs.

(not currently active) Ted Mielczarek

Updated

•

17 years ago

Attachment #288377 - Attachment is obsolete: true

(not currently active) Ted Mielczarek

Updated

•

17 years ago

Attachment #288378 - Attachment is obsolete: true

(not currently active) Ted Mielczarek

Comment 12

•

17 years ago

Ok, I re-ran with --depth=20 and filtered them through fix-linux-stack: http://people.mozilla.com/~tmielczarek/allocations.1-2.diff.log http://people.mozilla.com/~tmielczarek/allocations.1-2.count.diff.log Each of those is ~3.5Mb.

David Baron :dbaron:

Comment 13

•

17 years ago

So the bulk of the stuff there is: * sqlite stuff from the URL classifier * image-related stuff from the GTK drag code My initial guess is that the URL classifier was still loading its db or something like that, and the leak here is related to the GTK drag code. But I could be wrong. Taking another diff after doing the steps to see the leak a few times longer than you did it for this log should help show which it is.

Mathieu Pellerin

Reporter

Comment 14

•

17 years ago

Hrm; why was this specific memory issue/leak was not discovered by one of the nice automated tests and leak detector mechanism that were setup in the past few months? Anyone? It seems this could have gone unnoticed for still quite some time, making it pretty hard to find the regression window.

(not currently active) Ted Mielczarek

Comment 15

•

17 years ago

Because we don't run google maps as a testcase? New leak testing is being worked on, that's not the issue here. I re-ran using google maps, I guess my testcase isn't actually representative: 644289 malloc 785075 realloc (/home/luser/obj-firefox-debug/tools/trace-malloc/lib/../../../ ../mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:1182) 774892 PR_Realloc (/home/luser/obj-firefox-debug/nsprpub/pr/src/malloc/../.. /../../../mozilla/nsprpub/pr/src/malloc/prmem.c:483) 742156 NS_Realloc_P (/home/luser/obj-firefox-debug/xpcom/base/../../../moz illa/xpcom/base/nsMemoryImpl.cpp:292) 742156 nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (/home/ luser/obj-firefox-debug/xpcom/build/nsTArray.cpp:104) 742556 char* nsTArray<char>::AppendElements<char>(char const*, unsigne d int) (/home/luser/obj-firefox-debug/modules/libpr0n/src/../../../dist/include/ xpcom/nsTArray.h:501) 742556 imgContainer::AddRestoreData(char const*, unsigned int) (/hom e/luser/obj-firefox-debug/modules/libpr0n/src/../../../../mozilla/modules/libpr0 n/src/imgContainer.cpp:586) 742556 ReadDataOut (/home/luser/obj-firefox-debug/modules/libpr0n/ decoders/png/../../../../../mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cp p:309)

(not currently active) Ted Mielczarek

Updated

•

17 years ago

Attachment #288349 - Attachment is obsolete: true

(not currently active) Ted Mielczarek

Comment 16

•

17 years ago

Full log at: http://people.mozilla.com/~tmielczarek/gmaps-allocations.diff.log ~3.5Mb

Glenn Randers-Pehrson

Comment 17

•

17 years ago

wfm on Windows XP. Taking bug.

Status: NEW → ASSIGNED

Version: unspecified → Trunk

Glenn Randers-Pehrson

Updated

•

17 years ago

Assignee: nobody → glennrp

Status: ASSIGNED → NEW

Glenn Randers-Pehrson

Updated

•

17 years ago

Status: NEW → ASSIGNED

Glenn Randers-Pehrson

Comment 18

•

17 years ago

Has anyone verified that backing out the patch from bug #386585 stops the leakage?

Mathieu Pellerin

Reporter

Comment 19

•

17 years ago

ted, this is not google maps specific, leak has to do with loading images in general (seems to be png specific). I'm simply surprised it was not caught as part of other leak test that has images in it. Btw, comment was meant to help, not to criticize :)

Michael Ventnor

Updated

•

17 years ago

Flags: blocking1.9?

Mathieu Pellerin

Reporter

Comment 20

•

17 years ago

Has the libpng update (bug 403239) healed the fox?

Glenn Randers-Pehrson

Comment 21

•

17 years ago

There is nothing in that update having anything to do with memory allocation. What is more interesting is whether backing out bug #386585 heals it.

Mike Schroepfer

Comment 22

•

17 years ago

Moving to blocking P2

Flags: blocking1.9? → blocking1.9+

Priority: -- → P2

Glenn Randers-Pehrson

Comment 23

•

17 years ago

Notice that the original report says "Gecko/2007110404 Minefield/3.0a9pre". 20071104 was before libpng-1.2.22 was checked in.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 24

•

17 years ago

Oh, comment 6 says Nov 7. Someone needs to find the right regression range.

Glenn Randers-Pehrson

Comment 25

•

17 years ago

I've been reviewing checkins back to October 10, and the only one that looks suspicious is bug #296818. I didn't see anything else apparently involving PNG and memory allocations.

Mathieu Pellerin

Reporter

Comment 26

•

17 years ago

glenn, I didn't report the bug using a faulty build. I had to downgrade to November 4th nightly, not affected by this bug, as memory usage was hurting my poor 512meg ram operating system. Sorry for any confusion.

Mathieu Pellerin

Reporter

Comment 27

•

17 years ago

Confirming memory usage is normal using the hourly build ( Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110614 Minefield/3.0a9pre) right before libpng 1.2.22 was applied to trunk. Remains to test whether following hourly with patch applied is problematic.

Mathieu Pellerin

Reporter

Comment 28

•

17 years ago

Hrm, memory usage is normal using the hourly build (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110620 Minefield/3.0a9pre) which, according to the hourly-archive page, has libpng 1.2.22 patch applied. Smaug/anyone, have you tried a build without patch from bug 399630 applied?

Glenn Randers-Pehrson

Comment 29

•

17 years ago

(In reply to comment #20) > Has the libpng update (bug 403239) healed the fox? > Has anyone tested this? I originally thought this update had nothing to do with memory allocation, but it does remove use of snprintf() that was introduced in bug #386585.

James Schwinabart [:mutantmonkey] (he/him)

Comment 30

•

17 years ago

Memory usage seems normal using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111504

Glenn Randers-Pehrson

Comment 31

•

17 years ago

Mathiew, can we mark this FIXED now?

Denis Saintilma

Comment 32

•

17 years ago

(In reply to comment #31) > Mathiew, can we mark this FIXED now? > This bug is not resolved with the latest build Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111620 Minefield/3.0b2pre ID:2007111620

Denis Saintilma

Comment 33

•

17 years ago

oops, it's not added in the trunk yet, my bad

Mathieu Pellerin

Reporter

Comment 34

•

17 years ago

glenn, nope, memory usage is still out of control using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111800 Minefield/3.0b2pre

Mathieu Pellerin

Reporter

Comment 35

•

17 years ago

libpng might not be the guilty one. download more hourly builds to try and find the patch which led to leak, I came down to this: The regression happened between 2007-11-06 20:55 and 2007-11-07 12:52 http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A55&maxdate=2007-11-07+12%3A52&cvsroot=%2Fcvsroot

Steve England [:stevee]

Comment 36

•

17 years ago

There's an hourly archive at: - http://hourly-archive.localgho.st/ Which has the last 14 days' builds. So if you are real quick, then you could get a more accurate range. But soon the builds from the 6th are going to fall off the end..

u88484

Comment 37

•

17 years ago

I keep getting hit by this now on windows...even though I said WFM the other day. I was using the hybrid mode (the satellite view with road names mode), panning around and firefoxs memory consumption just skyrocketed even after stopping panning...firefox soon became unresponsive and killed through task manager.

OS: Linux → Windows XP

Reed Loden [:reed]

Updated

•

17 years ago

OS: Windows XP → All

Jesse Ruderman

Updated

•

17 years ago

Keywords: mlk, regression

Mathieu Pellerin

Reporter

Comment 38

•

17 years ago

steve, yep, am using this page. Doing my best but Cambodian internet connections are not reliable nor very fast. Hope range we have now can be enough for coders to figure out which patch is guilty.

Glenn Randers-Pehrson

Comment 39

•

17 years ago

The change to nsWindow.cpp near the end of the patch for bug #380464, which is in the range looks a little suspicious. if (mThebesSurface && !mThebesSurface->CairoStatus()) { [some code removed] else { mThebesSurface = nsnull; } becomes if (mThebesSurface && mThebesSurface->CairoStatus() != 0) mThebesSurface = nsnull; which seems to reverse the conditions for setting mThebesSurface to nsnull.

Glenn Randers-Pehrson

Comment 40

•

17 years ago

(In reply to comment #39) Oh, never mind, they are the same. The new version is easier to read, too.

Harsh86

Comment 41

•

17 years ago

Ive just finished testing the hourly builds and as far as i can tell the regression starts in the 2007-11-06 22:57 build. Can someone confirm?

Glenn Randers-Pehrson

Comment 42

•

17 years ago

kritenks: There was a problem with Firefox downloading the large phishing-sites database when it was started with a fresh or outdated profile, that had similar symptoms (see bug #404645 and dupes, and Asa's comment about the Firefox 3 beta1 release on Mozillazine and slashdot). This probably confused some of our range testing. The range for the phishing bug seems to be from 19 Nov through 21 Nov, and affected earlier minefield nightly builds run during that period. Kurt: Re: comment #37. Please see if the problem has gone away on Windows. It is working now on Windows for me, but locked up two days ago, so I suppose I was seeing the phishing database bug.

u88484

Comment 43

•

17 years ago

(In reply to comment #42) > Kurt: Re: comment #37. Please see if the problem has gone away on Windows. > It is working now on Windows for me, but locked up two days ago, so I suppose I > was seeing the phishing database bug. > Must have been that bug because this WFM once again. Sorry about that I just remembered reading this bug a while ago and I was trying to find something with google maps as soon as I started firefox up and figured I finally hit this bug. Changing back to linux.

OS: All → Linux

Mathieu Pellerin

Reporter

Comment 44

•

17 years ago

so based on kritenks hourly test and mine, regression happened between 2007-11-06 20:50 to 22:57: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A50&maxdate=2007-11-06+22%3A57&cvsroot=%2Fcvsroot that leaves us with 5 bugs: bug 398880, getElementsByClassName('') leaks an nsVoidArray Bug 313984, ASSERTION: Either two people are trying to submit or the previous submit bug 394751, hang with long lines of text and page break characters bug 396315, correctly fetch cmaps for all fonts bug 399369, disable font fallback when a character is in the PUA ... all of them unfortunately seems unrelated to png image memory leak problem.

Bill Gianopoulos [:WG9s]

Comment 45

•

17 years ago

(In reply to comment #44) > so based on kritenks hourly test and mine, regression happened between > 2007-11-06 20:50 to 22:57: Well that regression range in incorrect. According to the tinderbox pages there was no Linux build that started at 2007-11-06 20:50. There was a build at 20:43 and the next one started at 20:55. I am doing CVS pulls by time from the source tree from just before and just after the libpng update to verify it as the cause.

Bill Gianopoulos [:WG9s]

Comment 46

•

17 years ago

I found the real culprit. This is a regression caused by the ceck-in for bug 380464. This was verified via back-out.

Assignee: glennrp → nobody

Blocks: 380464
No longer blocks: 386585

Status: ASSIGNED → NEW

Component: ImageLib → GFX: Thebes

QA Contact: imagelib → thebes

Mike Schroepfer

Comment 47

•

17 years ago

Vlad can you take a look since this is caused by bug 380464?

Assignee: nobody → vladimir

Jon Burgess

Comment 48

•

17 years ago

I've been seeing a pixmap leak in minefield with the maps at www.openstreetmap.org. After a few hours I've seen xrestop report 4GB+ of pixmaps in use. The commit for bug 380464 looks like it removes a gdk_pixmap_unref(pixmap) call. Perhaps I'm missing something, but how does it get freed now?

Glenn Randers-Pehrson

Comment 49

•

17 years ago

Resolving INVALID and removing the block of 1.9, per comments 42, 46, and 48.

Assignee: vladimir → glennrp

Flags: blocking1.9+

Glenn Randers-Pehrson

Updated

•

17 years ago

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → INVALID

Ryan VanderMeulen [:RyanVM]

Comment 50

•

17 years ago

How is this INVALID? There's a clear bug with a known regression range. It may not be related to your libpng update, but that doesn't make it any less of a bug.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Ryan VanderMeulen [:RyanVM]

Updated

•

17 years ago

Assignee: glennrp → nobody

Status: REOPENED → NEW

Ryan VanderMeulen [:RyanVM]

Updated

•

17 years ago

Assignee: nobody → vladimir

Flags: blocking1.9?

Glenn Randers-Pehrson

Comment 51

•

17 years ago

OK. But shouldn't the flags show that this depends on bug #380464, instead of the other way around, and reopen bug #380464?

Status: NEW → ASSIGNED

Ryan VanderMeulen [:RyanVM]

Comment 52

•

17 years ago

This bug is already marked as blocking 380464. It's been that way since 2007-11-23 09:54:55 PST. I'll leave it to Vlad to decide how he wants to handle this otherwise.

Ryan VanderMeulen [:RyanVM]

Comment 53

•

17 years ago

And no, the standard procedure on Bugzilla is that regressions from bugs are marked as blocking the original bug, not depends on. There's long discussion in the newsgroups about the reasoning behind that.

Bill Gianopoulos [:WG9s]

Comment 54

•

17 years ago

Just to clarify. 1. Bugs tat cause regressions are only re-opened if it is decided to clar the regression by backing out the patch. 2. The dependency filed has to do with fixing a bug. as in this bug cannot be fixed until the bug it depends on is fixed. SO, if you switch it form fixing to breaking you have to reverse the state. It is all logical if not confusing. I have filed bug 405215 that suggest modifying the filed labels to clarify the way we use them.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 55

•

17 years ago

(restoring blocking)

Flags: blocking1.9? → blocking1.9+

Jon Burgess

Comment 56

•

17 years ago

Attached patch Use alternate gfxXlibSurface call which manages the pixmap itself (deleted) — Details — Splinter Review

The attached patch seems to make the pixmap usage stable on my machine. While browsing the map it goes up to around 25MB and then drops back to 8MB about a minute after the tab is closed. NB: The patch does not address the !xrenderFormat case. I'm not an X, GTK or Mozilla coder so I can't claim that it is the right way to fix the problem.

Daniel C

Comment 57

•

17 years ago

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007112504 Minefield/3.0b2pre Browsing http://maps.google.co.uk/ in hybrid mode for about 5 minutes caused System Monitor to report: Process Name: firefox-bin Virtual Memory: 245.8 MiB Resident Memory: 89.0 MiB Writable Memory: 72.3 MiB Shared Memory: 15.1 MiB X Server Memory: 1.7 GiB Closing all tabs and waiting for a few minutes returns little or no memory. Test using hybrid mode for quickest reproduction.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 58

•

17 years ago

Attached patch put back accidentally removed code (obsolete) (deleted) — Details — Splinter Review

Too much code got removed in 380464; put this rather important chunk back.

Attachment #290241 - Flags: superreview?(pavlov)

Attachment #290241 - Flags: review?(pavlov)

Stuart Parmenter

Updated

•

17 years ago

Attachment #290241 - Flags: superreview?(pavlov)

Attachment #290241 - Flags: superreview+

Attachment #290241 - Flags: review?(pavlov)

Attachment #290241 - Flags: review+

David Baron :dbaron:

Comment 59

•

17 years ago

Attachment 290241 [details] [diff] fixes the massive leak I was seeing (while using Google Maps) in the X server's address space due to our leaking pixmaps (as shown by xrestop).

Jon Burgess

Comment 60

•

17 years ago

Attachment 290421 [details] [diff] works for me too. There is still a possibility of a leak though: if gdk_pixmap_new() succeeds and the gfxXlibSurface() call fails.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 61

•

17 years ago

Attached patch more bulletproofing (deleted) — Details — Splinter Review

Bulletproof some more and free the pixmap in all error cases.

Attachment #290241 - Attachment is obsolete: true

Attachment #290287 - Flags: review?(pavlov)

Stuart Parmenter

Updated

•

17 years ago

Attachment #290287 - Flags: review?(pavlov) → review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 62

•

17 years ago

Fix checked in.

Status: ASSIGNED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.