Closed
Bug 854799
Opened 12 years ago
Closed 12 years ago
need the ability to discard image memory immediately
Categories
(Core :: Graphics: ImageLib, defect)
People
(Reporter: djf, Assigned: khuey)
References
Details
(Whiteboard: [MemShrink:P1])
Attachments
(2 files, 3 obsolete files)
patch (deleted) | justin.lebar+bug: review+ | Details | Diff | Splinter Review
patch (deleted) | Details | Diff | Splinter Review
The FirefoxOS gallery app has to handle very large images on a very memory constrained device. This is most difficult when scanning for new images. If images have a sufficiently large EXIF preview, the gallery just uses that and all is well.
But for large images without previews, the app must decode the full-size image in order to create a preview image and a thumbnail image. Decoding a 5mp image takes 20mb of memory. The app needs to be able to scan these new images quickly.
But there is no way to force gecko to free image memory immediately. If the app scans and decodes preview-less images as quickly as it can, it crashes with an OOM, even though it is using a single off-screen image element for all the image decoding.
In order to prevent the OOM crashes, the app inserts an artificial delay between images. For 2mp images without previews, I had a 400ms delay. For 5mp images, I've had to wait more than 3 seconds between images. I'm actually working on a patch (in bug 847060) that squares the number of megapixels, multiplies that by 150ms and pauses for that long before moving on to the next image. It is a horrible heuristic, and won't always be good enough to avoid OOMs on all hardware in all conditions.
I've tried reducing the image.mem.min_discard_timeout_ms pref from 10000 to 200, but it doesn't really help. I still have to wait more than 1500 ms between images.
So I really need a way to force image memory to be freed, or at least to find out when the image memory has been freed.
Possibly this should be considered a DOM bug instead of an imagelib bug. Feel free to reassign it as appropriate.
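The stopgap described above (bug 847060) can be sketched as a tiny pure function. This is only an illustration of the heuristic as stated in this report (square the megapixel count, multiply by 150 ms); the function name is hypothetical, not the actual Gaia code.

```javascript
// Hypothetical sketch of the gallery's workaround heuristic from bug 847060:
// after decoding each preview-less image, pause for (megapixels^2) * 150 ms
// so Gecko's discard timer has a chance to free the decoded image data.
function scanDelayMs(megapixels) {
  return megapixels * megapixels * 150;
}

scanDelayMs(5); // a 5 MP image forces a pause of over 3 seconds
```

As the report says, this is a horrible heuristic: the right pause depends on hardware and system load, which is exactly why an explicit discard API is needed.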
Comment 1•12 years ago
> But there is no way to force gecko to free image memory immediately.
Setting src to "" really should work. (I know you've tried it.) Can you please provide a testcase?
Updated•12 years ago
Whiteboard: [MemShrink]
Reporter
Comment 2•12 years ago
The link is to a github pull request that adds a test case to the UITests app.
To test: pull the PR into a local branch and push to your phone.
Launch the browser to use up some memory and have a realistic load on the phone
Launch the UITests app
Scroll to the bottom and tap on Big Image Decode test
Click on the Start button and watch.
You'll see a counter appear. Each time it increments the app has successfully decoded another 8mp jpeg image. It may increment a few times, but you'll probably see an OOM pretty soon. If not, then attempting to switch to the browser app will probably cause the OOM.
The first line of test_apps/uitest/js/bigimage.js is |var pause = 0|. If you change that value to 1000, you may find that the OOM will stop happening. If not, an even higher value may work. For the gallery app, I need to use over 3000, but in this test case it doesn't seem as bad.
I think that without a sufficiently long pause, the test case is allocating memory for a second image before the memory for the first image has been released. So if the system does not have enough memory for two 8mp images to be decoded at the same time we get an OOM even though we only need to have one image decoded at a time.
There is a similar test online at http://djf.net/bigimage.html I can't reproduce the OOM when running that test in the browser app, though.
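The memory math behind the OOM in this test case is straightforward. Decoded images are held as bitmaps at roughly 4 bytes per pixel (consistent with "a 5mp image takes 20mb" in the description), so two 8 MP decodes alive at once need about 64 MB. The function below is an illustrative back-of-the-envelope sketch, not Gecko code.

```javascript
// Decoded image data is roughly 4 bytes per pixel (RGBA), so an 8-megapixel
// JPEG expands to ~32 MB once decoded. If the next decode starts before the
// previous one is discarded, peak usage doubles to ~64 MB.
const BYTES_PER_PIXEL = 4;

function decodedBytes(megapixels) {
  return megapixels * 1e6 * BYTES_PER_PIXEL;
}

decodedBytes(8);     // ~32 MB for one decoded 8 MP image
decodedBytes(8) * 2; // ~64 MB if two decodes overlap
```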
Attachment #729727 - Flags: feedback?(justin.lebar+bug)
Reporter
Comment 3•12 years ago
Justin: I asked for your feedback on that attachment just as a way to bring the test case to your attention.
Comment 4•12 years ago
These "delay" tactics would probably need to be re-examined once we get to 22 with the multi-threaded image decoding.
Updated•12 years ago
Attachment #729727 - Flags: feedback?(justin.lebar+bug)
Updated•12 years ago
Whiteboard: [MemShrink] → [MemShrink:P1]
Updated•12 years ago
Assignee: nobody → khuey
Assignee
Comment 5•12 years ago
So this is fun. img.src = "" works, unless img is not actually in the document ...
Assignee
Comment 6•12 years ago
Attachment #732616 - Flags: review?(justin.lebar+bug)
Assignee
Comment 7•12 years ago
If we're still taking b2g patches this could be a nice win ...
Comment 8•12 years ago
Comment on attachment 732616 [details] [diff] [review]
Patch
r=me, but is the comment right? You're talking about images which were "never in the document", but that does not seem to be a necessary condition to hit this bug.
Attachment #732616 - Flags: review?(justin.lebar+bug) → review+
Assignee
Comment 9•12 years ago
Yeah I should just say not in the document.
Reporter
Comment 10•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7)
> If we're still taking b2g patches this could be a nice win ...
It would be *such* a nice win! Nominating for tef.
Bug 847060 was tef+ and is the one that introduced the 3 second delay between preview-less images when scanning. If we can uplift this fix, we can remove that delay, which would be really nice for gallery startup.
blocking-b2g: --- → tef?
Comment 11•12 years ago
I don't think khuey meant to close this?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 12•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #11)
> I don't think khuey meant to close this?
Uh, no. I'm not going to pretend to know what happened there.
Status: REOPENED → ASSIGNED
Updated•12 years ago
tracking-b2g18: ? → ---
Updated•12 years ago
blocking-b2g: tef? → tef+
Comment 13•12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment 14•12 years ago
Could this have caused Tp5 regressions? It's in the ranges for the ones from today....
Comment 15•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #14)
> Could this have caused Tp5 regressions? It's in the ranges for the ones
> from today....
Holding off on uplifting this until this is answered.
Assignee
Comment 16•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #14)
> Could this have caused Tp5 regressions? It's in the ranges for the ones
> from today....
Anything is possible but I think it's pretty unlikely.
Comment 17•12 years ago
It's possible that a site sets some img.src = '' and then inserts say another image with the old src into the DOM, and we have to re-decode it.
It's not clear to me how we fix that without breaking the b2g use-case. But you could certainly detect it on tp5 with judiciously-placed printfs.
Assignee
Comment 18•12 years ago
That's possible. Bug 839103 is in the regression range too though and already claims to regress Dromaeo (comment 30) so I'm going to blame froydnj ;-)
Assignee
Comment 19•12 years ago
Plus if we land this on b2g and see a regression there we'll know it's this.
Comment 20•12 years ago
Bug 839103 is in some of the mails, but only one or two. But yes, it's the obvious other candidate.
This patch makes us sync-discard on ~HTMLImageElement. How expensive is discarding? Could it be the action of discarding all the images during page unload that matters? Or is it possible we were keeping the images decoded until Tp looped back to the page again?
Assignee
Comment 21•12 years ago
Justin can correct me if I'm wrong, but discarding appears to be pretty cheap. Re-decoding, of course, is not. It may be possible that we were keeping images around until Tp looped, depending on how long that loop takes ...
Comment 22•12 years ago
I'd be very surprised if discarding itself was slowing us down.
Comment 23•12 years ago
As a side note, I wonder whether ~HTMLImageElement should in fact force-discard or whether it should be limited to src="". Especially in cases when there are multiple images pointing to the same src...
Assignee
Comment 24•12 years ago
I'm pretty sure that RequestDiscard doesn't actually discard if other things have the image locked.
Comment 25•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> (In reply to Boris Zbarsky (:bz) from comment #14)
> > Could this have caused Tp5 regressions? It's in the ranges for the ones
> > from today....
>
> Anything is possible but I think it's pretty unlikely.
This bug or one of the others from the same push (Bug 849654, Bug 855804) is pretty clearly the culprit for this Windows PGO Tp5 regression:
http://graphs.mozilla.org/graph.html#tests=[[255,131,31]]&sel=1364931977000,1365104777000&displayrange=7&datatype=running
Assignee
Comment 26•12 years ago
(In reply to Matt Brubeck (:mbrubeck) from comment #25)
> This bug or one of the others from the same push (Bug 849654, Bug 855804) is
> pretty clearly the culprit for this Windows PGO Tp5 regression:
>
> http://graphs.mozilla.org/graph.html#tests=[[255,131,31]]&sel=1364931977000,
> 1365104777000&displayrange=7&datatype=running
That's a non-PGO build for x64 Windows 8, which is like triple unsupported. I really wish graphserver would show me the other data series though.
Comment 27•12 years ago
You can use the links here to view before/after numbers and graphs for other builds:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=tp5&fromchange=d30f993f121f&tochange=0d688b7c4411
For example, this Windows 7 non-PGO graph shows a clear regression where the only pushes in range are yours and mine (and mine was NPOTB on desktop):
http://graphs.mozilla.org/graph.html#tests=[[275,131,12]]&sel=none&displayrange=7&datatype=running
Windows 7 PGO had a similar-sized (5-6%) regression at the same time, though the range is larger because of fewer PGO runs:
http://graphs.mozilla.org/graph.html#tests=[[255,63,12]]&sel=none&displayrange=7&datatype=running
Assignee
Comment 28•12 years ago
Ok so I guess the next question is whether or not comment 20's conjecture is true.
Comment 29•12 years ago
Marking the status flags accordingly so this can be uplifted to the appropriate branches
status-b2g18: --- → affected
status-b2g18-v1.0.1: --- → affected
Comment 30•12 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> Ok so I guess the next question is whether or not comment 20's conjecture is
> true.
Um, maybe this is a naive question but did you do a tryserver push with/without the patch? Was there a difference in the perf numbers?
The Win7 opt tp numbers show a 5% regression between 798d734dc32a and 6bcb665562b0, so I pushed some tryserver builds to try and narrow down the exact culprit.
changeset: 127495:6bcb665562b0
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:57:11 2013 -0700
summary: Bug 857186: Make virtualenv paths relative. r=gps
changeset: 127494:b436cad50c33
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:54:35 2013 -0700
summary: Bug 849654: Kill nsRefPtrHashtableMT. r=bsmedberg
changeset: 127493:0caf5937f8bc
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:52:25 2013 -0700
summary: Bug 854799: Make image.src='' discard the image immediately even if the image is not in the document. r=jlebar
https://tbpl.mozilla.org/?tree=Try&rev=a4d42bd6dc9a
changeset: 127492:ef873e1fb7e9
user: Kyle Huey <khuey@kylehuey.com>
date: Wed Apr 03 09:49:17 2013 -0700
summary: Bug 855804: Add hashtable helpers for cycle collection. r=bz
https://tbpl.mozilla.org/?tree=Try&rev=02f68ce1b761
changeset: 127491:798d734dc32a
user: Jan de Mooij <jdemooij@mozilla.com>
date: Wed Apr 03 18:09:26 2013 +0200
summary: No bug - Fix merge conflict on a CLOSED TREE. r=red
https://tbpl.mozilla.org/?tree=Try&rev=bda444ebb97d
Comment 31•12 years ago
This patch is the cause of the 5% regression in tp5 that took place on 4/3:
without patch (798d734dc32a): https://tbpl.mozilla.org/?tree=Try&rev=bda444ebb97d
winxp opt: 347.48 win7 opt: 362.18 win8 opt: 250.23
with patch (0caf5937f8bc): https://tbpl.mozilla.org/?tree=Try&rev=a4d42bd6dc9a
winxp opt: 370.55 win7 opt: 381.64 win8 opt: 264.79
http://graphs.mozilla.org/graph.html#tests=[[255,131,12]]&sel=1364997395719.561,1365027600597.6099&displayrange=7&datatype=running
note: the changeset in between changesets 798d734dc32a and 0caf5937f8bc has similar tp results to 798d734dc32a.
I think we should back this out until the performance regression is resolved.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 32•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #20)
> This patch makes us sync-discard on ~HTMLImageElement. How expensive is
> discarding? Could it be the action of discarding all the images during page
> unload that matters? Or is it possible we were keeping the images decoded
> until Tp looped back to the page again?
The current Tp test suite loads on the same page repeatedly rather than cycling through the other pages before loading the same page again. Which makes it very sensitive to whether something is cached or already decoded or not.
Assignee
Comment 33•12 years ago
(In reply to John Daggett (:jtd) from comment #32)
> (In reply to Boris Zbarsky (:bz) from comment #20)
>
> > This patch makes us sync-discard on ~HTMLImageElement. How expensive is
> > discarding? Could it be the action of discarding all the images during page
> > unload that matters? Or is it possible we were keeping the images decoded
> > until Tp looped back to the page again?
>
> The current Tp test suite loads on the same page repeatedly rather than
> cycling through the other pages before loading the same page again. Which
> makes it very sensitive to whether something is cached or already decoded or
> not.
Mmm. This is just an artifact of the way Talos measures then.
Comment 34•12 years ago
> The current Tp test suite loads on the same page repeatedly rather than
> cycling through the other pages before loading the same page again. Which
> makes it very sensitive to whether something is cached or already decoded
> or not.
Given this, it sounds to me pretty unlikely we'd want to do anything differently here. What do you think?
In any case, this bug should stay FIXED unless it's actually backed out.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 35•12 years ago
roc, wondering if you think we should take the 5% tp regression here or not?
Flags: needinfo?(roc)
I wouldn't have thought so. Reloading the current page (or part of the current page) is common.
Shouldn't this bug be prevented by having a memory pressure signal which flushes the image cache?
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36)
> Shouldn't this bug be prevented by having a memory pressure signal which
> flushes the image cache?
Flushes decoded image data, of course.
Another approach would be to have a way to decode an image directly to a thumbnail. Not b2g18 material of course, but it could be a lot more efficient than decoding to a full-size image.
Comment 39•12 years ago
Attachment #734473 - Flags: review?(roc)
Attachment #734473 - Flags: review?(roc) → review+
Comment 40•12 years ago
Backout changeset 0caf5937f8bc pushed to mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/6286b4578f14
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 41•12 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> Another approach would be to have a way to decode an image directly to a
> thumbnail. Not b2g18 material of course, but it could be a lot more
> efficient than decoding to a full-size image.
That's the topic of bug 854795, which I'd love to have your thoughts on. I'm told that our jpeg library has the ability to do this, so we'd just need to hook it up somehow.
Reporter
Comment 42•12 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36)
> I wouldn't have thought so. Reloading the current page (or part of the
> current page) is common.
>
> Shouldn't this bug be prevented by having a memory pressure signal which
> flushes the image cache?
On FirefoxOS devices, the images I'm concerned with require something like 15% of available memory to decode. And during the scan process, I need to decode one right after the other. Because I'm using memory in such big chunks, I'm not convinced that a memory pressure model will do the right thing.
Could Kyle's patch land if it was behind a pref? Or if it was modified so it only discarded large images immediately? (Perhaps where the definition of large was a pref?)
If I'm understanding the regression correctly, this patch is causing images to be re-decoded when a document is reloaded. Is there any way the patch could be modified so that it only applies to images that have never been in the document? (See comments 8&9). That would do the right thing for the offscreen images I use, but wouldn't affect reloading.
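The two narrowings proposed in this comment (discard only on an explicit src="" set, and optionally only for images never inserted into the document) can be sketched as a pure decision function. This is purely illustrative, not Gecko code; the property and option names are hypothetical.

```javascript
// Illustrative sketch of the discard policies discussed in this thread:
// fast-discard only when the page explicitly sets img.src = "", and
// optionally only for images that were never in the document (comment 42),
// so reloading a page full of visible images is unaffected.
function shouldDiscardImmediately(img, opts) {
  if (!img.srcClearedExplicitly) {
    return false; // only an explicit img.src = "" triggers the fast path
  }
  if (opts.onlyNeverInDocument && img.wasEverInDocument) {
    return false; // spare images that were actually shown in the page
  }
  return true;
}

// An off-screen gallery image cleared with img.src = "" is discarded:
shouldDiscardImmediately(
  { srcClearedExplicitly: true, wasEverInDocument: false },
  { onlyNeverInDocument: true }
); // → true
```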
(In reply to David Flanagan [:djf] from comment #42)
> Could Kyle's patch land if it was behind a pref? Or if it was modified so
> it only discarded large images immediately? (Perhaps where the definition
> of large was a pref?)
That sounds like a good idea. Megapixel images are relatively rare on Web sites AFAIK.
> If I'm understanding the regression correctly, this patch is causing images
> to be re-decoded when a document is reloaded. Is there any way the patch
> could be modified so that it only applies to images that have never been in
> the document? (See comments 8&9). That would do the right thing for the
> offscreen images I use, but wouldn't affect reloading.
That might affect real sites that construct the page programmatically.
Comment 44•12 years ago
What if we modified the patch to _only_ discard on actual explicit src="" sets (as in, sets to the empty string)? Would that be sufficient for b2g for the moment until we can do something smarter?
Assignee
Comment 45•12 years ago
I guess this patch does effectively revert bug 791731.
*sigh*
Assignee
Comment 46•12 years ago
Like every other problem in nsImageLoadingContent, we can solve this with more flags.
Attachment #729727 - Attachment is obsolete: true
Attachment #732616 - Attachment is obsolete: true
Attachment #734473 - Attachment is obsolete: true
Attachment #734571 - Flags: review?(justin.lebar+bug)
Comment 47•12 years ago
> Shouldn't this bug be prevented by having a memory pressure signal which flushes the image cache?
On most B2G devices, we have a memory-pressure signal which flushes the decoded image cache. It is, however, an emergency measure that we can't rely on.
The problem is that large malloc()'s don't return null; they cause the kernel to kill some process (perhaps ours). So we have to know if the malloc() is going to fail before we call it.
We can choose when we get a low-memory notification, but we only get one notify threshold, so that doesn't help us much with this sort of thing.
We could instead try to read the amount of free memory on the system before making a big allocation, but that doesn't take into account the fact that this process may or may not be the one to be killed when we run out of memory. It also doesn't take into account race conditions in reading the value, and it doesn't protect us from getting close to the limit and then some other non-imagelib module (or another process altogether) OOMing us.
Comment 48•12 years ago
Not to be contrarian, but I think that the old patch's behavior was almost correct.
Although refreshing the page may be a common operation compared to the universe of all possible operations one can take in a browser, it's much less common than navigating to a new page.
Given a choice, we should optimize for navigating between pages, not for refreshing pages. Therefore, I think we should throw out all images immediately when we navigate to a new page. Keeping them around for 20s is just asking for excessive memory usage. I am very happy to take a hit on TP5 to get this behavior; the problem is the benchmark, not our code.
> I guess this patch does effectively revert bug 791731.
Of course we don't want to regress bug 791731. Is there a way to tell the difference between "document goes away" and "image loading content goes away, but document stays alive"?
Reporter
Comment 49•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #44)
> What if we modified the patch to _only_ discard on actual explicit src=""
> sets (as in, sets to the empty string)? Would that be sufficient for b2g
> for the moment until we can do something smarter?
That would more than suffice for the Gallery app.
Assignee
Comment 50•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #48)
> Not to be contrarian, but I think that the old patch's behavior was almost
> correct.
>
> Although refreshing the page may be a common operation compared to the
> universe of all possible operations one can take in a browser, it's much
> less common than navigating to a new page.
>
> Given a choice, we should optimize for navigating between pages, not for
> refreshing pages. Therefore, I think we should throw out all images
> immediately when we navigate to a new page. Keeping them around for 20s is
> just asking for excessive memory usage. I am very happy to take a hit on
> TP5 to get this behavior; the problem is the benchmark, not our code.
FWIW, I agree.
> > I guess this patch does effectively revert bug 791731.
>
> Of course we don't want to regress bug 791731. Is there a way to tell the
> difference between "document goes away" and "image loading content goes
> away, but document stays alive"?
Not easily. Since cycle collection doesn't order dtors the best we could do, even with silly things like weakrefs, is "image loading content goes away, but document hasn't gone away yet".
(In reply to David Flanagan [:djf] from comment #49)
> (In reply to Boris Zbarsky (:bz) from comment #44)
> > What if we modified the patch to _only_ discard on actual explicit src=""
> > sets (as in, sets to the empty string)? Would that be sufficient for b2g
> > for the moment until we can do something smarter?
>
> That would more than suffice for the Gallery app.
This is what the latest patch does, more or less.
Comment 51•12 years ago
> Since cycle collection doesn't order dtors the best we could do, even with silly things like
> weakrefs, is "image loading content goes away, but document hasn't gone away yet".
Perhaps we could throw away the images when the document gets detached from its docshell?
But that's getting outside the scope of this bug.
Assignee
Comment 52•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #51)
> Perhaps we could throw away the images when the document gets detached from
> its docshell?
>
> But that's getting outside the scope of this bug.
Well that doesn't solve the refresh problem does it? Won't the previous document get detached before the new copy gets attached?
But yes, we're creeping far away from this bug.
Comment 53•12 years ago
Comment on attachment 734571 [details] [diff] [review]
Patch
> + void ClearCurrentRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
> + void ClearPendingRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
Flags that default to 0 are fine, but defaulting flags to REQUEST_DISCARD is asking for trouble, I think.
Attachment #734571 - Flags: review?(justin.lebar+bug) → review+
Comment 54•12 years ago
> Well that doesn't solve the refresh problem does it?
Indeed, but per comment 48 I don't think that's something we should worry about.
Assignee
Comment 55•12 years ago
(In reply to Justin Lebar [:jlebar] from comment #53)
> Comment on attachment 734571 [details] [diff] [review]
> Patch
>
> > + void ClearCurrentRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
> > + void ClearPendingRequest(nsresult aReason, uint32_t aFlags = REQUEST_DISCARD);
>
> Flags that default to 0 are fine, but defaulting flags to REQUEST_DISCARD is
> asking for trouble, I think.
It makes sense though because that's what all the current callers want. I could invert the sense of the flags here ... flags suck :-/
Comment 56•12 years ago
> It makes sense though because that's what all the current callers want.
Explicit is better than implicit?
I suppose you could invert the direction of these flags, but then we'd have a mismatch between this flag and the nsIDocument::REQUEST_DISCARD one.
> flags suck :-/
No argument here... :)
Comment 57•12 years ago
So the thing is.... when navigating between pages we want to not redecode the images the pages share, right?
Comment 58•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #57)
> So the thing is.... when navigating between pages we want to not redecode
> the images the pages share, right?
If we want to avoid regressing TP, yes. But it seems to me that this became the current behavior by accident.
Comment 59•12 years ago
It seems like if we want to avoid regressing common browsing behavior, we want to avoid such redecodes...
Comment 60•12 years ago
(In reply to Boris Zbarsky (:bz) from comment #59)
> It seems like if we want to avoid regressing common browsing behavior, we
> want to avoid such redecodes...
Well, it's a trade-off. Right now when I leave a page with lots of images, my memory usage doesn't go down for 20s or so, and that's bad too.
Perhaps we can have a hack where we throw away decoded images when we navigate between two pages of different origins, but we keep them otherwise. Might that be satisfactory?
Comment 61•12 years ago
Or perhaps not discarding until the onload of the new page?
Or both?
Comment 62•12 years ago
> Or perhaps not discarding until the onload of the new page?
The problem is, by that point the images on the new page have finished loading, and we might have resultantly OOMed the content process on B2G.
Moving images into discardable memory on B2G would help, if that's feasible. So would lowering the limit on how much unlocked decoded data we keep around; that would help avoid pathological cases on desktop, although it doesn't help much for B2G.
Comment 63•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Justin Lebar [:jlebar] from comment #60)
> Well, it's a trade-off. Right now when I leave a page with lots of images,
> my memory usage doesn't go down for 20s or so, and that's bad too.
It's not bad if you're on desktop and have gobs of memory. In that case, hanging onto decoded images is definitely the right thing to do.
Waiting until the docshell's new page has finished onload sounds good. But as you say, it's still OOM-risky if the new page ends up not sharing images with the old page. I think making the right decisions will sometimes require us to be aware of memory pressure and there's no way around that. It sounds like we need better machinery for that.
Comment 65•12 years ago
> It's not bad if you're on desktop and have gobs of memory. In that case, hanging onto decoded images
> is definitely the right thing to do.
If you're on desktop /and/ you have gobs of memory, it probably doesn't matter much what we do, because you probably have a fast, many-core machine, and with multithreaded image decoding, we can likely re-decode the images very quickly.
> I think making the right decisions will sometimes require us to be aware of memory pressure and
> there's no way around that.
I've written extensively about the challenges with this approach, but unfortunately I can't find my most recent mail. So to summarize:
If we change our behavior based on the amount of free memory on the machine -- e.g. if we drop unnecessary images and bfcache when memory gets low -- then we're allowing the other processes the user is running on the machine to change our behavior. In essence we become the nicest process on the machine, trying never to push memory into the page file.
This might be a good strategy, and it might not be. You can imagine machines that are chronically low on memory where this strategy would essentially mean that we disable bfcache for the user, or where we end up re-decoding lots of images. Both of these would be especially painful because presumably the machine is slow and also loaded with processes sucking up the CPU. Maybe letting the OS page something out would be better than taking this approach. (Or maybe not; we are not in a position to judge.)
Separately, low-memory notifications are difficult because you have to fire them early enough that you have time to react (e.g. a GC takes a while), but not so early that you end up dropping things unnecessarily. And you can't react synchronously to memory pressure; you have to wait for the main thread event loop to finish what it's doing. Otherwise e.g. any call to malloc() can trigger a gc, and that is craziness.
I've experimented with this stuff on Windows -- we have the ability to intercept VirtualAlloc calls, read the amount of free memory on the system, and trigger a memory-pressure notification at the next possible moment which triggers a gc/cc and drops images and bfcache -- but I wasn't able to overcome these challenges, so this code remains mostly unused.
Assignee
Comment 66•12 years ago
Comment 67•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 68•12 years ago
This has some conflicts on b2g18 that I'm not comfortable resolving. Please post a branch-specific patch.
Assignee
Comment 70•12 years ago
Flags: needinfo?(khuey)
Assignee
Updated•12 years ago
Keywords: checkin-needed
Comment 71•12 years ago
https://hg.mozilla.org/releases/mozilla-b2g18/rev/05d9bc1a7aac
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/ada03bf95314
status-b2g18-v1.0.0: --- → wontfix
status-firefox21: --- → wontfix
status-firefox22: --- → wontfix
status-firefox23: --- → fixed
Keywords: checkin-needed
Reporter
Comment 72•12 years ago
This patch does not solve the issue I'm having with the Gallery app. If I wait 2-3 seconds after processing each 5mp image, I generally do not crash. With this patch I was not supposed to need to wait. But when I remove the timeouts, I get an OOM. I've tested this with nightly unagi builds for b2g18-1.0.1, b2g18.
With a 3000 ms timeout I don't crash (at least not right away). Without it I do.
Kyle: any idea what might be happening here? Does your patch include some kind of pref that I need to set in order to enable it?
Flags: needinfo?(khuey)
Reporter
Comment 73•12 years ago
It doesn't look to me like there are any prefs to set... Just double checking: setting img.src = '' is still the official way to release the image memory, right?
Comment 74•12 years ago
Come find me (I'm on IRC) and let's debug this f2f.
Reporter
Comment 75•12 years ago
Justin: I'm not at the Madrid work week, so I can't work on this with you f2f.
As a test case, see my pull request here: https://github.com/mozilla-b2g/gaia/pull/9216 It removes the setTimeout() idle time after scanning a large image and inserts a bunch of console.log() statements.
Combine that with the 5 images attached to bug 847060. (Make a couple of copies of each one and put them on your sdcard.) You should see the gallery OOM while trying to scan them. (You can always adb shell rm -r /data/local/indexedDB/*gallery* to clear the gallery data and force the scan to restart from the beginning.)
Comment 76•12 years ago
> Justin: I'm not at the Madrid work week, so I can't work on this with you f2f.
So I discovered! Sorry to hear that.
khuey should work with you on this.
Assignee
Comment 77•12 years ago
Ok so I still see some weirdness here. On my Windows desktop running the test case with this patch still leaves a large buildup of memory somewhere outside the heap. So something is mmapping a bunch of memory or something. I'll dig in more tonight.
Flags: needinfo?(khuey)
Assignee
Comment 78•12 years ago
Alright what I'm seeing appears to be a windows specific problem. So unfortunately I can't immediately help with comment 75 :-(
Comment 79•12 years ago
We should file a separate bug on the Windows issue, which looks really scary.
Can we get you a b2g device? Or otherwise, can you steal someone's Mac? If not, we need to ask for someone else's help here. I just feel bad pulling someone off other B2G work when you're free for the picking...
Comment 80•12 years ago
Reopening as this did not fix the b2g issue reported in bug 861965.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 81•12 years ago
Kyle, what do you think?
Assignee
Comment 82•12 years ago
I haven't been able to steal another machine yet.
Comment 83•12 years ago
I think we are hanging on to the image in the canvas image cache. This patch is correct, it just didn't fix all of it. I will file a blocking bug.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED