Closed
Bug 854799
Opened 12 years ago
Closed 12 years ago
need the ability to discard image memory immediately
Categories
(Core :: Graphics: ImageLib, defect)
People
(Reporter: djf, Assigned: khuey)
References
Details
(Whiteboard: [MemShrink:P1])
Attachments
(2 files, 3 obsolete files)
patch (deleted) | justin.lebar+bug: review+ | Details | Diff | Splinter Review
patch (deleted) | Details | Diff | Splinter Review
The FirefoxOS gallery app has to handle very large images on a very memory constrained device. This is most difficult when scanning for new images. If images have a sufficiently large EXIF preview, the gallery just uses that and all is well.
But for large images without previews, the app must decode the full-size image in order to create a preview image and a thumbnail image. Decoding a 5mp image takes 20mb of memory. The app needs to be able to scan these new images quickly.
But there is no way to force gecko to free image memory immediately. If the app scans and decodes preview-less images as quickly as it can, it crashes with an OOM, even though it is using a single off-screen image element for all the image decoding.
In order to prevent the OOM crashes, the app inserts an artificial delay between images. For 2mp images without previews, I had a 400ms delay. For 5mp images, I've had to wait more than 3 seconds between images. I'm actually working on a patch (in bug 847060) that squares the number of megapixels, multiplies that by 150ms and pauses for that long before moving on to the next image. It is a horrible heuristic, and won't always be good enough to avoid OOMs on all hardware in all conditions.
I've tried reducing the image.mem.min_discard_timeout_ms pref from 10000 to 200, but it doesn't really help. I still have to wait more than 1500 ms between images.
So I really need a way to force image memory to be freed, or at least to find out when the image memory has been freed.
Possibly this should be considered a DOM bug instead of an imagelib bug. Feel free to reassign it as appropriate.
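The stopgap described above (bug 847060) can be sketched as a tiny pure function. This is only an illustration of the heuristic as stated in this report (square the megapixel count, multiply by 150 ms); the function name is hypothetical, not the actual Gaia code.

```javascript
// Hypothetical sketch of the gallery's workaround heuristic from bug 847060:
// after decoding each preview-less image, pause for (megapixels^2) * 150 ms
// so Gecko's discard timer has a chance to free the decoded image data.
function scanDelayMs(megapixels) {
  return megapixels * megapixels * 150;
}

scanDelayMs(5); // a 5 MP image forces a pause of over 3 seconds
```

As the report says, this is a horrible heuristic: the right pause depends on hardware and system load, which is exactly why an explicit discard API is needed.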
Comment 1•12 years ago
> But there is no way to force gecko to free image memory immediately.
Setting src to "" really should work. (I know you've tried it.) Can you please provide a testcase?
Updated•12 years ago
Whiteboard: [MemShrink]
Reporter
Comment 2•12 years ago
The link is to a github pull request that adds a test case to the UITests app.
To test: pull the PR into a local branch and push to your phone.
Launch the browser to use up some memory and have a realistic load on the phone
Launch the UITests app
Scroll to the bottom and tap on Big Image Decode test
Click on the Start button and watch.
You'll see a counter appear. Each time it increments the app has successfully decoded another 8mp jpeg image. It may increment a few times, but you'll probably see an OOM pretty soon. If not, then attempting to switch to the browser app will probably cause the OOM.
The first line of test_apps/uitest/js/bigimage.js is |var pause = 0|. If you change that value to 1000, you may find that the OOM will stop happening. If not, an even higher value may work. For the gallery app, I need to use over 3000, but in this test case it doesn't seem as bad.
I think that without a sufficiently long pause, the test case is allocating memory for a second image before the memory for the first image has been released. So if the system does not have enough memory for two 8mp images to be decoded at the same time we get an OOM even though we only need to have one image decoded at a time.
There is a similar test online at http://djf.net/bigimage.html I can't reproduce the OOM when running that test in the browser app, though.
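The memory math behind the OOM in this test case is straightforward. Decoded images are held as bitmaps at roughly 4 bytes per pixel (consistent with "a 5mp image takes 20mb" in the description), so two 8 MP decodes alive at once need about 64 MB. The function below is an illustrative back-of-the-envelope sketch, not Gecko code.

```javascript
// Decoded image data is roughly 4 bytes per pixel (RGBA), so an 8-megapixel
// JPEG expands to ~32 MB once decoded. If the next decode starts before the
// previous one is discarded, peak usage doubles to ~64 MB.
const BYTES_PER_PIXEL = 4;

function decodedBytes(megapixels) {
  return megapixels * 1e6 * BYTES_PER_PIXEL;
}

decodedBytes(8);     // ~32 MB for one decoded 8 MP image
decodedBytes(8) * 2; // ~64 MB if two decodes overlap
```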
Attachment #729727 - Flags: feedback?(justin.lebar+bug)
Reporter
Comment 3•12 years ago
Justin: I asked for your feedback on that attachment just as a way to bring the test case to your attention.
Comment 4•12 years ago
These "delay" tactics would probably need to be re-examined once we get to 22 with the multi-threaded image decoding.
Updated•12 years ago
Attachment #729727 - Flags: feedback?(justin.lebar+bug)
Updated•12 years ago
Whiteboard: [MemShrink] → [MemShrink:P1]
Updated•12 years ago
Assignee: nobody → khuey
Assignee
Comment 5•12 years ago
So this is fun. img.src = "" works, unless img is not actually in the document ...
Assignee
Comment 6•12 years ago
Attachment #732616 - Flags: review?(justin.lebar+bug)
Assignee
Comment 7•12 years ago
If we're still taking b2g patches this could be a nice win ...
Comment 8•12 years ago
Comment on attachment 732616 [details] [diff] [review]
Patch
r=me, but is the comment right? You're talking about images which were "never in the document", but that does not seem to be a necessary condition to hit this bug.
Attachment #732616 - Flags: review?(justin.lebar+bug) → review+
Assignee
Comment 9•12 years ago
Yeah I should just say not in the document.
Reporter
Comment 10•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7)
> If we're still taking b2g patches this could be a nice win ...
It would be *such* a nice win! Nominating for tef.
Bug 847060 was tef+ and is the one that introduced the 3 second delay between preview-less images when scanning. If we can uplift this fix, we can remove that delay, which would be really nice for gallery startup.
blocking-b2g: --- → tef?
Comment 11•12 years ago
I don't think khuey meant to close this?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 12•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #11)
> I don't think khuey meant to close this?
Uh, no. I'm not going to pretend to know what happened there.
Status: REOPENED → ASSIGNED
Updated•12 years ago
tracking-b2g18: ? → ---
Updated•12 years ago
blocking-b2g: tef? → tef+
Comment 13•12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment 14•12 years ago
Could this have caused Tp5 regressions? It's in the ranges for the ones from today....
Comment 15•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #14)
> Could this have caused Tp5 regressions? It's in the ranges for the ones
> from today....
Holding off on uplifting this until this is answered.
Assignee
Comment 16•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #14)
> Could this have caused Tp5 regressions? It's in the ranges for the ones
> from today....
Anything is possible but I think it's pretty unlikely.
Comment 17•12 years ago
It's possible that a site sets some img.src = '' and then inserts say another image with the old src into the DOM, and we have to re-decode it.
It's not clear to me how we fix that without breaking the b2g use-case. But you could certainly detect it on tp5 with judiciously-placed printfs.
Assignee
Comment 18•12 years ago
That's possible. Bug 839103 is in the regression range too though and already claims to regress Dromaeo (comment 30) so I'm going to blame froydnj ;-)
Assignee
Comment 19•12 years ago
Plus if we land this on b2g and see a regression there we'll know it's this.
Comment 20•12 years ago
Bug 839103 is in some of the mails, but only one or two. But yes, it's the obvious other candidate.
This patch makes us sync-discard on ~HTMLImageElement. How expensive is discarding? Could it be the action of discarding all the images during page unload that matters? Or is it possible we were keeping the images decoded until Tp looped back to the page again?
Assignee
Comment 21•12 years ago
Justin can correct me if I'm wrong, but discarding appears to be pretty cheap. Re-decoding, of course, is not. It may be possible that we were keeping images around until Tp looped, depending on how long that loop takes ...
Comment 22•12 years ago
I'd be very surprised if discarding itself was slowing us down.
Comment 23•12 years ago
As a side note, I wonder whether ~HTMLImageElement should in fact force-discard or whether it should be limited to src="". Especially in cases when there are multiple images pointing to the same src...
Assignee
Comment 24•12 years ago
I'm pretty sure that RequestDiscard doesn't actually discard if other things have the image locked.
Comment 25•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> (In reply to Boris Zbarsky (:bz) from comment #14)
> > Could this have caused Tp5 regressions? It's in the ranges for the ones
> > from today....
>
> Anything is possible but I think it's pretty unlikely.
This bug or one of the others from the same push (Bug 849654, Bug 855804) is pretty clearly the culprit for this Windows PGO Tp5 regression:
http://graphs.mozilla.org/graph.html#tests=[[255,131,31]]&sel=1364931977000,1365104777000&displayrange=7&datatype=running
Assignee
Comment 26•12 years ago
(In reply to Matt Brubeck (:mbrubeck) from comment #25)
> This bug or one of the others from the same push (Bug 849654, Bug 855804) is
> pretty clearly the culprit for this Windows PGO Tp5 regression:
>
> http://graphs.mozilla.org/graph.html#tests=[[255,131,31]]&sel=1364931977000,
> 1365104777000&displayrange=7&datatype=running
That's a non-PGO build for x64 Windows 8, which is like triple unsupported. I really wish graphserver would show me the other data series though.
Comment 27•12 years ago
You can use the links here to view before/after numbers and graphs for other builds:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=tp5&fromchange=d30f993f121f&tochange=0d688b7c4411
For example, this Windows 7 non-PGO graph shows a clear regression where the only pushes in range are yours and mine (and mine was NPOTB on desktop):
http://graphs.mozilla.org/graph.html#tests=[[275,131,12]]&sel=none&displayrange=7&datatype=running
Windows 7 PGO had a similar-sized (5-6%) regression at the same time, though the range is larger because of fewer PGO runs:
http://graphs.mozilla.org/graph.html#tests=[[255,63,12]]&sel=none&displayrange=7&datatype=running
Assignee
Comment 28•12 years ago
Ok so I guess the next question is whether or not comment 20's conjecture is true.
Comment 29•12 years ago
Marking the status flags accordingly so this can be uplifted to the appropriate branches
status-b2g18: --- → affected
status-b2g18-v1.0.1: --- → affected
Comment 30•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> Ok so I guess the next question is whether or not comment 20's conjecture is
> true.
Um, maybe this is a naive question but did you do a tryserver push with/without the patch? Was there a difference in the perf numbers?
The Win7 opt tp numbers show a 5% regression between 798d734dc32a and 6bcb665562b0, so I pushed some tryserver builds to try and narrow down the exact culprit.
changeset: 127495:6bcb665562b0
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:57:11 2013 -0700
summary: Bug 857186: Make virtualenv paths relative. r=gps
changeset: 127494:b436cad50c33
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:54:35 2013 -0700
summary: Bug 849654: Kill nsRefPtrHashtableMT. r=bsmedberg
changeset: 127493:0caf5937f8bc
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:52:25 2013 -0700
summary: Bug 854799: Make image.src='' discard the image immediately even if the image is not in the document. r=jlebar
https://tbpl.mozilla.org/?tree=Try&rev=a4d42bd6dc9a
changeset: 127492:ef873e1fb7e9
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:49:17 2013 -0700
summary: Bug 855804: Add hashtable helpers for cycle collection. r=bz
https://tbpl.mozilla.org/?tree=Try&rev=02f68ce1b761
changeset: 127491:798d734dc32a
user: Jan de Mooij <jdemooij@mozilla.com>
date: Wed Apr 03 18:09:26 2013 +0200
summary: No bug - Fix merge conflict on a CLOSED TREE. r=red
https://tbpl.mozilla.org/?tree=Try&rev=bda444ebb97d
Comment 31•12 years ago
This patch is the cause of the 5% regression in tp5 that took place on 4/3:
without patch (798d734dc32a): https://tbpl.mozilla.org/?tree=Try&rev=bda444ebb97d
winxp opt: 347.48 win7 opt: 362.18 win8 opt: 250.23
with patch (0caf5937f8bc): https://tbpl.mozilla.org/?tree=Try&rev=a4d42bd6dc9a
winxp opt: 370.55 win7 opt: 381.64 win8 opt: 264.79
http://graphs.mozilla.org/graph.html#tests=[[255,131,12]]&sel=1364997395719.561,1365027600597.6099&displayrange=7&datatype=running
note: the changeset in between changesets 798d734dc32a and 0caf5937f8bc has similar tp results to 798d734dc32a.
I think we should back this out until the performance regression is resolved.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 32•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #20)
> This patch makes us sync-discard on ~HTMLImageElement. How expensive is
> discarding? Could it be the action of discarding all the images during page
> unload that matters? Or is it possible we were keeping the images decoded
> until Tp looped back to the page again?
The current Tp test suite loads on the same page repeatedly rather than cycling through the other pages before loading the same page again. Which makes it very sensitive to whether something is cached or already decoded or not.
Assignee
Comment 33•12 years ago
(In reply to John Daggett (:jtd) from comment #32)
> (In reply to Boris Zbarsky (:bz) from comment #20)
>
> > This patch makes us sync-discard on ~HTMLImageElement. How expensive is
> > discarding? Could it be the action of discarding all the images during page
> > unload that matters? Or is it possible we were keeping the images decoded
> > until Tp looped back to the page again?
>
> The current Tp test suite loads on the same page repeatedly rather than
> cycling through the other pages before loading the same page again. Which
> makes it very sensitive to whether something is cached or already decoded or
> not.
Mmm. This is just an artifact of the way Talos measures then.
Comment 34•12 years ago
> The current Tp test suite loads on the same page repeatedly rather than
> cycling through the other pages before loading the same page again. Which
> makes it very sensitive to whether something is cached or already decoded
> or not.
Given this, it sounds to me pretty unlikely we'd want to do anything differently here. What do you think?
In any case, this bug should stay FIXED unless it's actually backed out.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 35•12 years ago
roc, wondering if you think we should take the 5% tp regression here or not?
Flags: needinfo?(roc)
I wouldn't have thought so. Reloading the current page (or part of the current page) is common.
Shouldn't this bug be prevented by having a memory pressure signal which flushes the image cache?
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36)
> Shouldn't this bug be prevented by having a memory pressure signal which
> flushes the image cache?
Flushes decoded image data, of course.
Another approach would be to have a way to decode an image directly to a thumbnail. Not b2g18 material of course, but it could be a lot more efficient than decoding to a full-size image.
Comment 39•12 years ago
Attachment #734473 - Flags: review?(roc)
Attachment #734473 - Flags: review?(roc) → review+
Comment 40•12 years ago
Backout changeset 0caf5937f8bc pushed to mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/6286b4578f14
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 41•12 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> Another approach would be to have a way to decode an image directly to a
> thumbnail. Not b2g18 material of course, but it could be a lot more
> efficient than decoding to a full-size image.
That's the topic of bug 854795, which I'd love to have your thoughts on. I'm told that our jpeg library has the ability to do this, so we'd just need to hook it up somehow.
Reporter
Comment 42•12 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36)
> I wouldn't have thought so. Reloading the current page (or part of the
> current page) is common.
>
> Shouldn't this bug be prevented by having a memory pressure signal which
> flushes the image cache?
On FirefoxOS devices, the images I'm concerned with require something like 15% of available memory to decode. And during the scan process, I need to decode one right after the other. Because I'm using memory in such big chunks, I'm not convinced that a memory pressure model will do the right thing.
Could Kyle's patch land if it was behind a pref? Or if it was modified so it only discarded large images immediately? (Perhaps where the definition of large was a pref?)
If I'm understanding the regression correctly, this patch is causing images to be re-decoded when a document is reloaded. Is there any way the patch could be modified so that it only applies to images that have never been in the document? (See comments 8&9). That would do the right thing for the offscreen images I use, but wouldn't affect reloading.
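The two narrowings proposed in this comment (discard only on an explicit src="" set, and optionally only for images never inserted into the document) can be sketched as a pure decision function. This is purely illustrative, not Gecko code; the property and option names are hypothetical.

```javascript
// Illustrative sketch of the discard policies discussed in this thread:
// fast-discard only when the page explicitly sets img.src = "", and
// optionally only for images that were never in the document (comment 42),
// so reloading a page full of visible images is unaffected.
function shouldDiscardImmediately(img, opts) {
  if (!img.srcClearedExplicitly) {
    return false; // only an explicit img.src = "" triggers the fast path
  }
  if (opts.onlyNeverInDocument && img.wasEverInDocument) {
    return false; // spare images that were actually shown in the page
  }
  return true;
}

// An off-screen gallery image cleared with img.src = "" is discarded:
shouldDiscardImmediately(
  { srcClearedExplicitly: true, wasEverInDocument: false },
  { onlyNeverInDocument: true }
); // → true
```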
(In reply to David Flanagan [:djf] from comment #42)
> Could Kyle's patch land if it was behind a pref? Or if it was modified so
> it only discarded large images immediately? (Perhaps where the definition
> of large was a pref?)
That sounds like a good idea. Megapixel images are relatively rare on Web sites AFAIK.
> If I'm understanding the regression correctly, this patch is causing images
> to be re-decoded when a document is reloaded. Is there any way the patch
> could be modified so that it only applies to images that have never been in
> the document? (See comments 8&9). That would do the right thing for the
> offscreen images I use, but wouldn't affect reloading.
That might affect real sites that construct the page programmatically.
Comment 44•12 years ago
What if we modified the patch to _only_ discard on actual explicit src="" sets (as in, sets to the empty string)? Would that be sufficient for b2g for the moment until we can do something smarter?
Assignee
Comment 45•12 years ago
I guess this patch does effectively revert bug 791731.
*sigh*
Assignee
Comment 46•12 years ago
Like every other problem in nsImageLoadingContent, we can solve this with more flags.
Attachment #729727 - Attachment is obsolete: true
Attachment #732616 - Attachment is obsolete: true
Attachment #734473 - Attachment is obsolete: true
Attachment #734571 - Flags: review?(justin.lebar+bug)
Comment 47•12 years ago
> Shouldn't this bug be prevented by having a memory pressure signal which flushes the image cache?
On most B2G devices, we have a memory-pressure signal which flushes the decoded image cache. It is, however, an emergency measure that we can't rely on.
The problem is that large malloc()'s don't return null; they cause the kernel to kill some process (perhaps ours). So we have to know if the malloc() is going to fail before we call it.
We can choose when we get a low-memory notification, but we only get one notify threshold, so that doesn't help us much with this sort of thing.
We could instead try to read the amount of free memory on the system before making a big allocation, but that doesn't take into account the fact that this process may or may not be the one to be killed when we run out of memory. It also doesn't take into account race conditions in reading the value, and it doesn't protect us from getting close to the limit and then some other non-imagelib module (or another process altogether) OOMing us.
Comment 48•12 years ago
Not to be contrarian, but I think that the old patch's behavior was almost correct.
Although refreshing the page may be a common operation compared to the universe of all possible operations one can take in a browser, it's much less common than navigating to a new page.
Given a choice, we should optimize for navigating between pages, not for refreshing pages. Therefore, I think we should throw out all images immediately when we navigate to a new page. Keeping them around for 20s is just asking for excessive memory usage. I am very happy to take a hit on TP5 to get this behavior; the problem is the benchmark, not our code.
> I guess this patch does effectively revert bug 791731.
Of course we don't want to regress bug 791731. Is there a way to tell the difference between "document goes away" and "image loading content goes away, but document stays alive"?
Reporter
Comment 49•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #44)
> What if we modified the patch to _only_ discard on actual explicit src=""
> sets (as in, sets to the empty string)? Would that be sufficient for b2g
> for the moment until we can do something smarter?
That would more than suffice for the Gallery app.
Assignee
Comment 50•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #48)
> Not to be contrarian, but I think that the old patch's behavior was almost
> correct.
>
> Although refreshing the page may be a common operation compared to the
> universe of all possible operations one can take in a browser, it's much
> less common than navigating to a new page.
>
> Given a choice, we should optimize for navigating between pages, not for
> refreshing pages. Therefore, I think we should throw out all images
> immediately when we navigate to a new page. Keeping them around for 20s is
> just asking for excessive memory usage. I am very happy to take a hit on
> TP5 to get this behavior; the problem is the benchmark, not our code.
FWIW, I agree.
> > I guess this patch does effectively revert bug 791731.
>
> Of course we don't want to regress bug 791731. Is there a way to tell the
> difference between "document goes away" and "image loading content goes
> away, but document stays alive"?
Not easily. Since cycle collection doesn't order dtors the best we could do, even with silly things like weakrefs, is "image loading content goes away, but document hasn't gone away yet".
(In reply to David Flanagan [:djf] from comment #49)
> (In reply to Boris Zbarsky (:bz) from comment #44)
> > What if we modified the patch to _only_ discard on actual explicit src=""
> > sets (as in, sets to the empty string)? Would that be sufficient for b2g
> > for the moment until we can do something smarter?
>
> That would more than suffice for the Gallery app.
This is what the latest patch does, more or less.
Comment 51•12 years ago
> Since cycle collection doesn't order dtors the best we could do, even with silly things like
> weakrefs, is "image loading content goes away, but document hasn't gone away yet".
Perhaps we could throw away the images when the document gets detached from its docshell?
But that's getting outside the scope of this bug.
Assignee
Comment 52•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #51)
> Perhaps we could throw away the images when the document gets detached from
> its docshell?
>
> But that's getting outside the scope of this bug.
Well that doesn't solve the refresh problem does it? Won't the previous document get detached before the new copy gets attached?
But yes, we're creeping far away from this bug.
Comment 53•12 years ago
Comment on attachment 734571 [details] [diff] [review]
Patch
> + void ClearCurrentRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
> + void ClearPendingRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
Flags that default to 0 are fine, but defaulting flags to REQUEST_DISCARD is asking for trouble, I think.
Attachment #734571 - Flags: review?(justin.lebar+bug) → review+
Comment 54•12 years ago
> Well that doesn't solve the refresh problem does it?
Indeed, but per comment 48 I don't think that's something we should worry about.
Assignee
Comment 55•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #53)
> Comment on attachment 734571 [details] [diff] [review]
> Patch
>
> > + void ClearCurrentRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
> > + void ClearPendingRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
>
> Flags that default to 0 are fine, but defaulting flags to REQUEST_DISCARD is
> asking for trouble, I think.
It makes sense though because that's what all the current callers want. I could invert the sense of the flags here ... flags suck :-/
Comment 56•12 years ago
> It makes sense though because that's what all the current callers want.
Explicit is better than implicit?
I suppose you could invert the direction of these flags, but then we'd have a mismatch between this flag and the nsIDocument::REQUEST_DISCARD one.
> flags suck :-/
No argument here... :)
Comment 57•12 years ago
So the thing is.... when navigating between pages we want to not redecode the images the pages share, right?
Comment 58•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #57)
> So the thing is.... when navigating between pages we want to not redecode
> the images the pages share, right?
If we want to avoid regressing TP, yes. But it seems to me that this became the current behavior by accident.
Comment 59•12 years ago
It seems like if we want to avoid regressing common browsing behavior, we want to avoid such redecodes...
Comment 60•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #59)
> It seems like if we want to avoid regressing common browsing behavior, we
> want to avoid such redecodes...
Well, it's a trade-off. Right now when I leave a page with lots of images, my memory usage doesn't go down for 20s or so, and that's bad too.
Perhaps we can have a hack where we throw away decoded images when we navigate between two pages of different origins, but we keep them otherwise. Might that be satisfactory?
Comment 61•12 years ago
Or perhaps not discarding until the onload of the new page?
Or both?
Comment 62•12 years ago
> Or perhaps not discarding until the onload of the new page?
The problem is, by that point the images on the new page have finished loading, and we might have resultantly OOMed the content process on B2G.
Moving images into discardable memory on B2G would help, if that's feasible. So would lowering the limit on how much unlocked decoded data we keep around; that would help avoid pathological cases on desktop, although it doesn't help much for B2G.
Comment 63•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Justin Lebar [:jlebar] from comment #60)
> Well, it's a trade-off. Right now when I leave a page with lots of images,
> my memory usage doesn't go down for 20s or so, and that's bad too.
It's not bad if you're on desktop and have gobs of memory. In that case, hanging onto decoded images is definitely the right thing to do.
Waiting until the docshell's new page has finished onload sounds good. But as you say, it's still OOM-risky if the new page ends up not sharing images with the old page. I think making the right decisions will sometimes require us to be aware of memory pressure and there's no way around that. It sounds like we need better machinery for that.
Comment 65•12 years ago
> It's not bad if you're on desktop and have gobs of memory. In that case, hanging onto decoded images
> is definitely the right thing to do.
If you're on desktop /and/ you have gobs of memory, it probably doesn't matter much what we do, because you probably have a fast, many-core machine, and with multithreaded image decoding, we can likely re-decode the images very quickly.
> I think making the right decisions will sometimes require us to be aware of memory pressure and
> there's no way around that.
I've written extensively about the challenges with this approach, but unfortunately I can't find my most recent mail. So to summarize:
If we change our behavior based on the amount of free memory on the machine -- e.g. if we drop unnecessary images and bfcache when memory gets low -- then we're allowing the other processes the user is running on the machine to change our behavior. In essence we become the nicest process on the machine, trying never to push memory into the page file.
This might be a good strategy, and it might not be. You can imagine machines that are chronically low on memory where this strategy would essentially mean that we disable bfcache for the user, or where we end up re-decoding lots of images. Both of these would be especially painful because presumably the machine is slow and also loaded with processes sucking up the CPU. Maybe letting the OS page something out would be better than taking this approach. (Or maybe not; we are not in a position to judge.)
Separately, low-memory notifications are difficult because you have to fire them early enough that you have time to react (e.g. a GC takes a while), but not so early that you end up dropping things unnecessarily. And you can't react synchronously to memory pressure; you have to wait for the main thread event loop to finish what it's doing. Otherwise e.g. any call to malloc() can trigger a gc, and that is craziness.
I've experimented with this stuff on Windows -- we have the ability to intercept VirtualAlloc calls, read the amount of free memory on the system, and trigger a memory-pressure notification at the next possible moment which triggers a gc/cc and drops images and bfcache -- but I wasn't able to overcome these challenges, so this code remains mostly unused.
Assignee
Comment 66•12 years ago
Comment 67•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 68•12 years ago
This has some conflicts on b2g18 that I'm not comfortable resolving. Please post a branch-specific patch.
Assignee
Comment 70•12 years ago
Flags: needinfo?(khuey)
Assignee
Updated•12 years ago
Keywords: checkin-needed
Comment 71•12 years ago
https://hg.mozilla.org/releases/mozilla-b2g18/rev/05d9bc1a7aac
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/ada03bf95314
status-b2g18-v1.0.0: --- → wontfix
status-firefox21: --- → wontfix
status-firefox22: --- → wontfix
status-firefox23: --- → fixed
Keywords: checkin-needed
Reporter
Comment 72•12 years ago
This patch does not solve the issue I'm having with the Gallery app. If I wait 2-3 seconds after processing each 5mp image, I generally do not crash. With this patch I was not supposed to need to wait. But when I remove the timeouts, I get an OOM. I've tested this with nightly unagi builds for b2g18-1.0.1, b2g18.
With a 3000 ms timeout I don't crash (at least not right away). Without it I do.
Kyle: any idea what might be happening here? Does your patch include some kind of pref that I need to set in order to enable it?
Flags: needinfo?(khuey)
Reporter
Comment 73•12 years ago
It doesn't look to me like there are any prefs to set... Just double checking: setting img.src = '' is still the official way to release the image memory, right?
Comment 74•12 years ago
Come find me (I'm on IRC) and let's debug this f2f.
Reporter
Comment 75•12 years ago
Justin: I'm not at the Madrid work week, so I can't work on this with you f2f.
As a test case, see my pull request here: https://github.com/mozilla-b2g/gaia/pull/9216 It removes the setTimeout() idle time after scanning a large image and inserts a bunch of console.log() statements.
Combine that with the 5 images attached to bug 847060. (Make a couple of copies of each one and put them on your sdcard.) You should see the gallery OOM while trying to scan them. (You can always adb shell rm -r /data/local/indexedDB/*gallery* to clear the gallery data and force the scan to restart from the beginning.)
Comment 76•12 years ago
> Justin: I'm not at the Madrid work week, so I can't work on this with you f2f.
So I discovered! Sorry to hear that.
khuey should work with you on this.
Assignee
Comment 77•12 years ago
Ok so I still see some weirdness here. On my Windows desktop running the test case with this patch still leaves a large buildup of memory somewhere outside the heap. So something is mmapping a bunch of memory or something. I'll dig in more tonight.
Flags: needinfo?(khuey)
Assignee
Comment 78•12 years ago
Alright what I'm seeing appears to be a windows specific problem. So unfortunately I can't immediately help with comment 75 :-(
Comment 79•12 years ago
We should file a separate bug on the Windows issue, which looks really scary.
Can we get you a b2g device? Or otherwise, can you steal someone's Mac? If not, we need to ask for someone else's help here. I just feel bad pulling someone off other B2G work when you're free for the picking...
Comment 80•12 years ago
Reopening as this did not fix the b2g issue reported in bug 861965.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 81•12 years ago
Kyle, what do you think?
Assignee
Comment 82•12 years ago
I haven't been able to steal another machine yet.
Comment 83•12 years ago
I think we are hanging on to the image in the canvas image cache. This patch is correct, it just didn't fix all of it. I will file a blocking bug.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED