Closed Bug 665832 Opened 13 years ago Closed 13 years ago

Consider a custom allocation scheme for decoded image data

Categories: Core :: Graphics: ImageLib
Type: defect; Priority: Not set; Severity: normal
Status: RESOLVED WONTFIX
People: Reporter: khuey; Assignee: Unassigned

Details

Allocations for decoded image data are substantially different from other allocations in the browser, which leads me to believe that they may benefit from a custom allocation scheme. In particular:

1) These allocations can easily grow to extremely large sizes (www.theatlantic.com/infocus).
2) There is a well-defined and well-understood lifetime associated with decoded images.
3) They are typically (if not always?) destroyed together (either at tab close, or when the discard timer fires).

glandium, njn, jesup, and I discussed a setup on IRC that we think might offer some benefits over the current heap allocation setup:

- Allocate decoded image data in pools that are associated with each 'tab' (whether that means window/document/etc. we didn't really flesh out).
- Give "large" images their own pages; lump a bunch of "small" images together. What the cutoffs for large and small are here needs some measurement. The idea would be to give images their own pages unless the overhead (for an N-byte image: the smallest number of pages that can hold N, times the size of a page, minus N) is too high.
- Explicitly map in pages (VirtualAlloc/mmap) and unmap them when done to ensure that memory can be reclaimed by the OS (and thus isn't counted against the browser by users). There's some evidence to suggest that this isn't currently happening well.
- Blow away the entire pool when the discard timer fires/the tab closes/whatever.

The known unknowns:

- Some images are currently shared between tabs. We would have to handle these somehow. The idea we had was to share "large" images and live with duplicating "small" ones. This likely forces the "small"/"large" boundary to be fairly low.
- Some pathological behaviors might be possible depending on how fancy we get. A naive implementation where we never free anything until tab close could be DoSed fairly quickly. Only freeing "large" images would be pretty straightforward and might alleviate most of the problem here. If we have to track and free "small" images too, that will add some overhead.
- None of our existing allocator code is really set up to do this. The PresShell arena might be the closest thing we have.

Thoughts/comments/questions/reasons why this would never work welcome.
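To make the proposal above concrete, here is a minimal sketch of the per-tab pool idea. This is illustrative only, not Gecko code: ImagePool, kSmallImageMax, the 64 KiB cutoff, and the 4 KiB page size are all hypothetical, and the shared-page path for small images is elided. It assumes POSIX mmap/munmap; on Windows, VirtualAlloc/VirtualFree would play the same role.

// Illustrative sketch only -- not actual Gecko code. ImagePool and
// kSmallImageMax are hypothetical names; the shared-page path for small
// images is omitted. Assumes POSIX mmap/munmap with MAP_ANONYMOUS.
#include <sys/mman.h>
#include <cstddef>
#include <vector>

class ImagePool {
 public:
  // Hypothetical cutoff; comment 0 says the real value needs measurement.
  static const size_t kSmallImageMax = 64 * 1024;

  void* Allocate(size_t aBytes) {
    if (aBytes <= kSmallImageMax) {
      return AllocateSmall(aBytes);  // pack small images into shared pages
    }
    // "Large" images get their own mapping, so freeing them returns the
    // pages directly to the OS.
    size_t len = RoundToPages(aBytes);
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      return nullptr;
    }
    mMappings.push_back(Mapping{p, len});
    return p;
  }

  // Called when the discard timer fires or the tab closes: unmap everything
  // in one sweep, handing the memory back to the OS.
  void DiscardAll() {
    for (const Mapping& m : mMappings) {
      munmap(m.addr, m.len);
    }
    mMappings.clear();
  }

 private:
  struct Mapping { void* addr; size_t len; };

  // Per-image overhead is RoundToPages(N) - N, as described in comment 0.
  static size_t RoundToPages(size_t n) {
    const size_t kPage = 4096;  // assume 4 KiB pages for illustration
    return (n + kPage - 1) & ~(kPage - 1);
  }

  void* AllocateSmall(size_t) { return nullptr; /* omitted in this sketch */ }

  std::vector<Mapping> mMappings;
};

The property the scheme is after lives in DiscardAll(): because each large image sits in its own mapping, discarding a tab's pool returns those pages directly to the OS rather than leaving partially used pages in the heap.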
Would the ordering into pools at all reflect the visible ordering on web pages? There's been some talk of discarding images that would take more than a page of scrolling to become visible (bug 660577 and related), and I was wondering how compatible this scheme would be with that.
I'm not an expert in this code, but I was poking around yesterday in both imagelib and the allocator. I'm skeptical this would be a useful optimization.

Presumably when you say "image" here, you mean "decoded image"?

> - Give "large" images their own pages,

jemalloc already does this, for "large" = 1M (512x512 px, since 512*512*4 = 1M). See huge_alloc() in jemalloc.c.

> - Explicitly map in pages (VirtualAlloc/mmap) and unmap them when done to
> ensure that memory can be reclaimed by the OS (and thus isn't counted against
> the browser by users). There's some evidence to suggest that this isn't
> currently happening well.

See bug 664642 comment 18 -- I think the decoded image data on Linux isn't stored in the process, which is why I wasn't seeing an RSS decrease when we discarded.

The question that needs to be answered is: Why might this be a useful change?

- It might be useful if small images were fragmenting the heap. But I suspect that, in terms of number of allocations, images are a very small fraction of overall malloc usage.
- It might be useful if freeing all the images on discard was expensive. But a page with lots of images might have 500 of them? I don't think 500 frees when a timer fires is necessarily something to worry about. And if a page has 500 images, we'll surely create many times that many objects on the heap.
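As a sanity check on the 512x512 figure above, a throwaway program (not from the bug) confirming that a 512x512 RGBA frame lands exactly at the 1 MiB "huge" threshold:

// Arithmetic behind the 512x512 figure in the comment above: a 512x512 RGBA
// frame at 4 bytes per pixel is exactly 1 MiB, the "huge" cutoff mentioned,
// so such frames already get their own mapping under jemalloc.
#include <cstdio>
#include <cstddef>

int main() {
  const std::size_t kHugeThreshold = 1u << 20;    // 1 MiB
  const std::size_t kFrameBytes = 512 * 512 * 4;  // width * height * RGBA
  std::printf("frame = %zu bytes, threshold = %zu bytes, own mapping: %s\n",
              kFrameBytes, kHugeThreshold,
              kFrameBytes >= kHugeThreshold ? "yes" : "no");
  return 0;
}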
This bug was referring to decoded images, yes.

The Linux issue of where it's stored: do we have verification of that? Or is it X-specific (more likely than Linux-specific)? And of course this isn't a Linux-specific bug; if it only helps other platforms that's still good. Dropping decoded image data is especially important on something like fennec.

Images are a pretty fair portion of malloc usage, I believe, and a fairly well-distinguished set (due to the background-tab harvesting). And the issue here is to increase the chances of freeing the memory back to the OS for inactive tabs, especially image-heavy ones. (pron...) And as mentioned, this all is even more important for memory-tight devices like mobile - but it matters for all of them.

I typically run out of address space in my browser within 1-5 days (WinXP, 32-bit). This certainly isn't all or even mostly images, but they're part of the equation.

We should investigate and see what the results are; I think there's a good chance it will be a win, perhaps a significant win. But we need to see.
(In reply to comment #3)
> The Linux issue of where it's stored: do we have verification of that? Or
> is it X-specific (more likely than Linux-specific)? And of course this
> isn't a Linux-specific bug; if it only helps other platforms that's still
> good. Dropping decoded image data is especially important on something like
> fennec.

On X, when we "Optimize" an image (directly after it's finished decoding), we upload it to X using gfxXlibSurface and discard our gfxImageSurface. See

http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#393
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatformGtk.cpp#170

and

http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/src/imgFrame.cpp#339
> Images are a pretty fair portion of malloc usage

It's critical to distinguish between "decoded images use a large fraction of the heap" and "allocations for decoded images are responsible for a large number of calls to malloc". I suspect writing a custom allocator is only interesting if the latter is true, and I don't think it is. But please prove me wrong!

> And the issue here is to increase the chances of freeing the memory back to the
> OS for inactive tabs, especially image-heavy ones. (pron...)

These kinds of images are all "huge" (greater than 1M) in jemalloc parlance. Each huge allocation is done via a call to mmap. Thus heap fragmentation is not an issue; when you munmap, you free the whole thing.
Taras, can we get telemetry here on decoded image sizes?
My primary interest here is in returning the memory we've allocated for images to the OS when appropriate. If jemalloc is doing a good job and the problems on Linux are due to other things, I'm less interested.
(In reply to comment #6)
> Taras, can we get telemetry here on decoded image sizes?

Sure. So far this bug sounds like a problem in search of a solution. I think you'd want to instrument how much memory is allocated vs. how quickly RSS increases (or something) to prove that this is even an issue.
Err, that wasn't meant to say solution in search of a problem.
(In reply to comment #9)
> Err, that wasn't meant to say solution in search of a problem.

Sorry for the spam. s/wasn't/was/
We have a proposed idea on how to increase the amount of memory given back by background tabs, which is related to complaints about overall memory use and the apparent failure to give memory back to the OS (though this is a somewhat soft complaint, and part of it might be the X-only issue referenced above).

If there is an issue, and there may be, especially on low-memory (non-VM) systems like Android/fennec, then this solution may well help reduce allocated-but-unused memory caused by background tabs (and possibly even foreground). We need to be careful not to say "it's not a problem on a system with multiple GB of memory and TB of disk, therefore we can ignore it."

More data (and maybe some quick test implementations to gather data) will help a lot, and will be useful for other purposes as well when it comes to tuning.
> part of it might be the X-only issue referenced above

The only known issue with X at this point is a bug in how about:memory reports how much memory we're using. As far as I know, when we discard an image on X, we free it in the X server. I'm not aware of any leaks or objects we keep alive longer than we intend to.

> this solution may well help reduce allocated-but-unused memory caused by
> background tabs (and possibly even foreground)

By "allocated-but-unused" memory, do you mean "unallocated-and-unusable" memory (i.e., "fragmentation")? I'm not sure how using a custom allocator will reduce memory which is allocated but which we're not currently using.

I don't mean to sound like a downer here. There are definitely real issues with memory usage and images. I just think it's really unlikely that the problem or solution is the allocator. Decoded images tend to be large in size and few in number -- this is the best case for jemalloc (or any sane allocator).

If you wanted to show that the allocator was a problem here, you could do that by showing that decoded images are fragmenting the heap, or that it's expensive to free all of a page's images when we discard. I don't know what information we'd get from telemetry on how large decoded images are, except to identify how many users look at pr0n.
(In reply to comment #12)
> > part of it might be the X-only issue referenced above
>
> The only known issue with X at this point is a bug in how about:memory
> reports how much memory we're using. As far as I know, when we discard an
> image on X, we free it in the X server. I'm not aware of any leaks or
> objects we keep alive longer than we intend to.

Good - I assume that means we discard decoded image data from the X server when a background tab is idle (since it's not being stored in our process memory).

> > this solution may well help reduce allocated-but-unused memory caused by
> > background tabs (and possibly even foreground)
>
> By "allocated-but-unused" memory, do you mean "unallocated-and-unusable"
> memory (i.e., "fragmentation")? I'm not sure how using a custom allocator
> will reduce memory which is allocated but which we're not currently using.

No, I meant memory that's in allocations from the OS (arenas) that aren't empty (and so can't be freed back to the OS) but isn't currently in use. My assumption has been that something is minimizing or blocking fairly full release of the memory back to the OS when background tabs are harvested (given the complaints and the focus on freeing memory, plus the issues you were reporting about Linux, which appear to have a different, non-problematic explanation).

> I don't mean to sound like a downer here. There are definitely real issues
> with memory usage and images. I just think it's really unlikely that the
> problem or solution is the allocator. Decoded images tend to be large in
> size and few in number -- this is the best case for jemalloc (or any sane
> allocator).

So what are the issues you see? What percentage of memory freed on background-tab harvesting is being returned to the OS? (We could be more aggressive about freeing decoded images, and smarter about which images to hold onto depending on likely tab-switching and/or scrolling behavior.)

> If you wanted to show that the allocator was a problem here, you could do
> that by showing that decoded images are fragmenting the heap, or that it's
> expensive to free all of a page's images when we discard. I don't know what
> information we'd get from telemetry on how large decoded images are, except
> to identify how many users look at pr0n.

Sounds like I need to generate some numbers myself - I was hoping someone had a way to get them without tons of work (and my numbers will be slanted to my browsing behavior). If I find time to make a patch to generate the numbers, I'll upload it here.
(In reply to comment #13)
> > By "allocated-but-unused" memory, do you mean "unallocated-and-unusable"
> > memory (i.e., "fragmentation")? I'm not sure how using a custom allocator
> > will reduce memory which is allocated but which we're not currently using.
>
> No, I meant memory that's in allocations from the OS (arenas) that aren't
> empty (and so can't be freed back to the OS) but isn't currently in use.

Either we're actually talking about the same thing but using different words, or I still don't understand.

When I say "allocated" I mean "given by malloc to the main Firefox code," and when I say "unallocated," I mean "inside Firefox's virtual address space, but not currently given out by malloc." When the operating system has given us memory but it's unusable because it's stuck between two chunks allocated by malloc, we call that "external fragmentation". On the other hand, when malloc rounds up its allocations (say I ask for 15 bytes, but malloc reserves a chunk of size 16), we call that "internal fragmentation".

I think that here you're suggesting that a type of external fragmentation is a problem: malloc allocates many things on a page, then we free all but one of those things, and now we can't give the page back to the operating system. (The memory is "fragmented" in that all the active pieces are scattered across many more pages than they should be.) Is that what you mean?

> What percentage of memory freed on background-tab harvesting is being
> returned to the OS?

On decoded image discard, almost all the memory goes back to the OS, at least when the images are > 1M decoded, since they're "huge" jemalloc allocations and there's no external fragmentation. If you're not on Linux, it's easy to test in bug 664642. If you're talking about non-image heap objects, I have no idea.

> > I don't mean to sound like a downer here. There are definitely real issues
> > with memory usage and images. I just think it's really unlikely that the
> > problem or solution is the allocator. Decoded images tend to be large in
> > size and few in number -- this is the best case for jemalloc (or any sane
> > allocator).
>
> So what are the issues you see?

The two issues I've observed are that we don't discard on background tabs as quickly as we should, and that on a page with many images, we don't discard anything, even though we probably should.

> (We could be more aggressive about freeing decoded images, and smarter about
> which images to hold onto depending on likely tab-switching and/or scrolling
> behavior.)

In bug 664290, they set the discard timer to 10s.

> Sounds like I need to generate some numbers myself - I was hoping someone
> had a way to get them without tons of work (and my numbers will be slanted
> to my browsing behavior). If I find time to make a patch to generate the
> numbers, I'll upload it here.

That sounds good. I look forward to seeing what you come up with!
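To make the two fragmentation definitions above concrete, a small demonstration program (a sketch, not from the bug). malloc_usable_size() is provided by both jemalloc and glibc; the exact sizes printed depend on the allocator's size classes, and observing the external-fragmentation effect requires watching RSS separately.

// Demonstrates the two kinds of fragmentation defined in the comment above.
// malloc_usable_size() is declared in <malloc.h> on Linux; exact numbers
// depend on the allocator's size classes.
#include <cstdio>
#include <cstdlib>
#include <malloc.h>

int main() {
  // Internal fragmentation: ask for 15 bytes; the allocator reserves a full
  // size class (e.g. 16), and the slack is wasted inside the block.
  void* small = std::malloc(15);
  std::printf("requested 15 bytes, allocator reserved %zu\n",
              malloc_usable_size(small));
  std::free(small);

  // External fragmentation: allocate many small blocks, then free all but
  // every 128th one. The ~32 survivors can pin their underlying pages, so
  // the allocator may be unable to return those pages to the OS even though
  // most of the space on them is free. (Seeing the effect requires watching
  // the process's RSS, which is omitted here.)
  const int kCount = 4096;
  void* blocks[kCount];
  for (int i = 0; i < kCount; ++i) {
    blocks[i] = std::malloc(64);
  }
  for (int i = 0; i < kCount; ++i) {
    if (i % 128 != 0) {
      std::free(blocks[i]);
    }
  }
  return 0;
}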
(In reply to comment #14)
> On decoded image discard, almost all the memory goes back to the OS, at
> least when the images are > 1M decoded, since they're "huge" jemalloc
> allocations and there's no external fragmentation.

A 640 x 480 image, with 4 bytes per pixel (RGBA), is 1.2MB. Most images on the web are much smaller than that.
(In reply to comment #15)
> A 640 x 480 image, with 4 bytes per pixel (RGBA), is 1.2MB. Most images on
> the web are much smaller than that.

Yes, of course. But they don't take up much memory, so even if they do become fragmented, you don't lose so much.

(Consider the absolute worst case, where you have 1024 small images, each on a page with only one other allocation. We free all the images, but the other allocation keeps the page alive. In that case, we waste less than 4k * 1024 = 4M of memory [less than, because the other allocation takes up some space]. And surely jemalloc is much better than this.)
> We free all the images, but the other allocation keeps the page alive.

I should say: We free all the images, but the other allocation keeps *each* page alive.
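Spelling out the worst-case bound from the two comments above (an illustrative calculation; the 4 KiB page size and the 16-byte survivor size are assumptions, not from the bug):

// Worst case sketched above: 1024 pages, each pinned by one small surviving
// allocation after all the images on it are freed. Assumes 4 KiB pages and
// a hypothetical 16-byte survivor per page.
#include <cstdio>
#include <cstddef>

int main() {
  const std::size_t kPageSize = 4096;
  const std::size_t kPinnedPages = 1024;
  const std::size_t kSurvivorBytes = 16;
  const std::size_t kLiveBytes = kPinnedPages * kSurvivorBytes;
  // Waste per page is (kPageSize - kSurvivorBytes), so the total waste is
  // strictly less than the kPinnedPages * kPageSize = 4 MiB upper bound.
  const std::size_t kUpperBound = kPinnedPages * kPageSize;
  std::printf("%zu live bytes can pin up to %zu bytes (%zu MiB) of pages\n",
              kLiveBytes, kUpperBound, kUpperBound >> 20);
  return 0;
}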
That's a version of what I'm concerned about. And 4M is a lot to waste on a mobile device - and I wonder if we may be wasting more than your worst case. But it's not intuitive how much memory might be wasted in what conditions, which is why I'm looking for data.

This bug was opened by Kyle after we were chatting about your Linux stuff and I was blue-skying about what might be going on and about the lifetime/size differences between images and other allocations, especially given the background-tab behavior. And given the lifetime differences, that might come into play. It helps that jemalloc segregates "large" from smaller, but you still could get enough non-image "large" allocations (hashtables, encoded image data (especially), etc.) to cause fragmentation. Is that happening? Dunno, and I don't think any of us know.

I'm constantly shocked by how much memory we're using with lots of tabs open - even when most of them aren't loaded. Which is part of the reason for MemShrink, of course. I just want to make sure we're not missing something hidden by the allocation system, since our "real" footprint is how much we've allocated from the system (and how much of it is resident, on non-mobile systems).
I was supportive of this bug when it was discussed on IRC, but having read the comments I now think it's not that likely to help. In other words, jlebar has convinced me.

Randall, 4MB is arguably a lot, but it's a drop in the ocean compared to our total memory usage and other savings that I reckon can be made more easily.

So, in the absence of measurements indicating that external fragmentation is a problem, should we WONTFIX this? Or we could leave it open, but it sounds like no-one will work on it either way.
(In reply to comment #19)
> I was supportive of this bug when it was discussed on IRC, but having read
> the comments I now think it's not that likely to help. In other words,
> jlebar has convinced me.

I completely agree with this paragraph. :-)
I have no problem with closing this - it was just an idea that people grabbed onto; though I still want more info (in general) on how we deal with decoded image data, especially when it comes to non-VM systems like fennec. I suspect our bigger problem is probably significantly GC-related, perhaps helped along by details of JS and the JIT.

What I'd take from this bug is that we need better, more detailed ways to look at memory usage and its causes. The suggestions and ideas by Steve Fink at http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7573f7ceeb3cc99a# detail a number of good paths to look at.

Info on decoded images still may help in tuning jemalloc - there's nothing magical about the size breaks there; a different break between small and large, or between large and huge, might help some. Not likely to make a major difference, but perhaps it would reduce fragmentation. I still have some suspicions about blocks getting mixtures of objects with different/conflicting lifetimes or usage patterns, causing higher working-set sizes than should be needed, for example. But those are just suspicions; I have no data. There's certainly plenty of good theory out there on GCs, but that's another bug. :-)

(I've been down this path (or parts of it) long ago, when we were trying to get 15-30 Mozilla 1.0-ish instances running on a P3-class laptop-chipset blade with 256MB under FreeBSD, long before jemalloc.) That was also before tabs...
Don't think we're going to do this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX