Closed Bug 963917 Opened 11 years ago Closed 11 years ago

Gallery app high memory usage during 1st launch

Categories

(Firefox OS Graveyard :: Gaia::Gallery, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.4 S1 (14feb)

People

(Reporter: tkundu, Assigned: djf)


Details

(Keywords: memory-footprint, perf, Whiteboard: [caf priority: p2][c=memory p= s= u=1.3] [CR 611568][ETA: n/a])

Attachments

(11 files, 7 obsolete files)

Gallery app takes a varying amount of memory during 1st launch

Steps to reproduce:
1) Flash boot, system and userdata images to a 256MB configuration device. Please make sure that you flash the userdata partition; this forces the gallery app to scan and load all images from the sdcard. You will not see the varying memory usage without this step.

2) Put 1000 images and 100 videos on the sdcard.
3) Put the sdcard in the device and start the gallery app.
4) Run |reset && while true; do adb shell b2g-procrank ; done| in a bash shell.
5) You will see the Gallery app's memory usage vary a lot while it takes > 1 min to scan all images the 1st time.
6) Keep scrolling the Gallery app while it is scanning images.

You can see the Gallery app crash or get killed at some point.


This varying memory usage happens only during the 1st launch of the gallery app, i.e. only when it scans the sdcard for new images.

Please note that the Gallery app's memory usage is lower in the above scenario if you use a smaller number of pics/videos.
blocking-b2g: --- → 1.3?
Please note that the above test case was performed with APZ enabled for all apps.
Duplicate of bug 854783?  Also, I was assuming that all of gallery is already disabling SkiaGL (see bug 939962), but it may be worth setting:
pref("gfx.canvas.azure.accelerated", false);
just to see if it makes a difference in this crash.

I'm also assuming these are 1000 random images, not actually taken by the phone?  I would expect those to take more memory in gallery, as we may not have the thumbnails embedded.  Is that the case?
ni Tapas for answering Milan's questions
Flags: needinfo?(tkundu)
Attached image IMG_0002.jpg (deleted) —
(In reply to Milan Sreckovic [:milan] from comment #2)
> Duplicate of bug 854783?  Also, I was assuming that all of gallery is
> already disabling SkiaGL (see bug 939962), but it may be worth setting:
> pref("gfx.canvas.azure.accelerated", false);
> just to see if it makes a difference in this crash.

I will try and let you know in a few hours.

> I'm also assuming these are random 1000 images, not actually taken by the
> phone?  I would expect those to take more memory in gallery, as we may not
> have the thumbnails embedded.  Is that the case?

I used |cd gaia && make reference-workload-heavy| to produce content on the device. This means that I have >1000 copies of this image and >100 videos generated by this command. I attached that image here.
(In reply to Milan Sreckovic [:milan] from comment #2)
> Duplicate of bug 854783?  Also, I was assuming that all of gallery is
> already disabling SkiaGL (see bug 939962), but it may be worth setting:
> pref("gfx.canvas.azure.accelerated", false);
> just to see if it makes a difference in this crash.
> 
Setting this pref prevents the crash. But my guess is that performance will be slower without acceleration. Can we get a better workaround for this?
Flags: needinfo?(tkundu)
If we set pref("gfx.canvas.azure.accelerated", false); and put 2500 pictures & 100 videos on the card, then the crash happens again. I also observed that gallery is using >80MB of USS during the 1st launch (i.e. when it scans all images on an sdcard holding >2500 images).

Shouldn't we minimize the memory usage of the gallery app during first launch? Memory usage should not increase with the number of images or videos, right? Please advise.
Flags: needinfo?(milan)
This suggests that bug 939962 is only covering the edit image aspect of the gallery, not the first display as well; SkiaGL is meant to be disabled internally already, and setting the other preference shouldn't really make a difference (unless there is some other canvas used somewhere, or we're just running into the SkiaGL caching.)  David, would you expect this as well?  Is there another place where gallery uses canvas that should be forced out of SkiaGL?

To answer the "shouldn't we minimize memory usage..." - sure, but we need bug 854783 for that, and that's a long term project and not appropriate for 1.3.

Now, other than the graphics memory usage for image decoding and display, is there other memory overuse that is happening here?
Depends on: 854795
Flags: needinfo?(milan) → needinfo?(dflanagan)
Keywords: perf
Whiteboard: [c=memory p= s= u=]
Keywords: footprint
Inder,

Please help us understand what the upper limit on the number of pictures is.

This is not a regression, so we need to understand what the spec requires.
Flags: needinfo?(ikumar)
(In reply to Preeti Raghunath(:Preeti) from comment #8)
>
> This is not a regression issue so need understanding on spec.

Gallery app should not let itself get killed by the kernel regardless of the SD card contents.
But that aside, 1000 images and 100 videos on sdcard seems like a fairly normal use case.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #9)
> (In reply to Preeti Raghunath(:Preeti) from comment #8)
> >
> > This is not a regression issue so need understanding on spec.
> 
> Gallery app should not let itself get killed by the kernel regardless of the
> SD card contents.
> But that aside, 1000 images and 100 videos on sdcard seems like a fairly
> normal use case.

I don't think so. 100 videos in the gallery app can only happen if a user decides to record 100 videos and then launches the gallery app for the first time. That's not going to happen in any end-user use case.
Is this a regression?  If not, it's way too late for 1.3.
(In reply to Jason Smith [:jsmith] from comment #10)
> I don't think so. 100 videos in the gallery app can only happen if a user
> decides to record 100 videos & then launch the gallery app for a first
> launch. That's absolutely not possible to happen in any end-user use case.

SD cards are removable.
:tk, can you please try your same SD card on a QRD7x27a loaded with the latest v1.3 and v1.2 and report the results here.
Flags: needinfo?(tkundu)
Please make sure that the images in your reference workload all contain EXIF preview images that are at least as large as the screen (measured in device pixels).  That is the fast case, and we try to enforce it for the camera.  

Until we have a fix for bug 854795, scanning images without big EXIF previews is just not going to work well, no matter what.
Flags: needinfo?(dflanagan)
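For clarity, the fast-path decision described above amounts to something like this (an illustrative sketch only, not the actual gallery code):

  // Use the embedded EXIF preview only if it covers the screen in device
  // pixels; otherwise Gallery must decode and downscale the full image.
  function canUseExifPreview(preview, screen) {
    var dpr = window.devicePixelRatio;
    return preview &&
           preview.width >= screen.width * dpr &&
           preview.height >= screen.height * dpr;
  }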
After reading the comments, it seems like this problem only arises when there is a large number of photos with large EXIF preview images.  If that's true, then my question is, how large are the EXIF previews created by the phone when taking photos?  If the phone doesn't create images that cause this bug, I don't think this bug should block 1.3.

Am I missing something?
Flags: needinfo?(mvines)
Potentially.  Let's see what we get from comment 13.  Also :tk, please clarify the origin of the images on the SD card.
Flags: needinfo?(mvines)
(In reply to Dave Huseby [:huseby] from comment #15)
> After reading the comments, it seems like this problem only arises when
> there is a large number of photos with large EXIF preview images.  If that's
> true, then my question is, how large are the EXIF previews created by the
> phone when taking photos?  If the phone doesn't create images that cause
> this bug, I don't think this bug should block 1.3.
> 
> Am I missing something?

It's more about large images without EXIF previews: we have to decode the full image, then scale it down to display it.
(In reply to Tapas Kumar Kundu from comment #0)
> Please note that Gallery app memory usage gets reduced in above scenerio if
> you use less number of pics/videos .

Is this explainable by the gallery app decoding images to generate a preview?  I would expect memory usage during import to be somewhat independent of total number of images.  I assume we are not trying to decode them all at once.
Michael provided the info. Clearing ni.
Flags: needinfo?(ikumar)
This contains b2g-procrank usage logs for the Gallery app while it scans all images on the sdcard. Please note that the sdcard has 2000 images and 100 videos. All images are simply copies of the attached image [1]
https://bugzilla.mozilla.org/attachment.cgi?id=8366621&action=edit

From the dmesg log, I can see the following during the first Gallery launch:

<6>[   40.509489] send sigkill to 1003 (Homescreen), adj 534, size 4588
<6>[   50.540774] send sigkill to 2143 ((Preallocated a), adj 667, size 3607
<6>[   75.946431] send sigkill to 1123 (Gallery), adj 134, size 16731
Attachment #8365503 - Attachment is obsolete: true
Memory usage for comment 20. It uses the same sdcard as mentioned in comment 20. Sorry for re-uploading it. You can see Gallery is using 86068KB of USS while scanning all images on the FFOS 1.3 JB port.
Attachment #8367577 - Attachment is obsolete: true
Flags: needinfo?(tkundu)
This contains the gallery app's memory usage for ICS Strawberry FFOS 1.3.

It uses the same content as comment 20. It is running on a 320x480 resolution device. It is not killed, and memory usage is also significantly lower (<55MB) than the memory usage in comment 20.

Please note that comment 20 is for the FFOS 1.3 JB port running on an 800x480 resolution device.
Memory usage for the gallery app running on an ICS Strawberry FFOS 1.2 320x480 resolution device. Highest USS memory usage is <55MB for the gallery app scanning the same content.
Attached video VID_0003.3gp (deleted) —
In summary,

FFOS 1.2 ICS Strawberry and FFOS 1.3 ICS Strawberry run on a 320x480 device, and gallery memory usage is <55MB. We also didn't observe any crash there.

But the FFOS 1.3 JB port runs on an 800x480 device, and gallery app memory usage is >80MB at some point. We always observed a crash in this use case.

Please note that the crash happened only when the gallery app scans for images on the sdcard. I used 2000 copies of the same image:
https://bugzilla.mozilla.org/attachment.cgi?id=8366621&action=edit and 100 copies of the same video (attached here).
Flags: needinfo?(dhuseby)
Thanks for the deep analysis in comment 24 - that confirms that this is a regression in memory when moving over to JB for 1.3. Agreed on the blocking decision now.

Removing dependency also - the problem is likely JB-specific, so the root cause is likely different than what the dependency is implying.
blocking-b2g: 1.3? → 1.3+
No longer depends on: 854795
Keywords: regression
JB on 1.3 was only done on the Nexus 4.  I imagine that has more memory and doesn't run into this problem.  I wouldn't call that a regression, as we never had this working on the JellyBean phone we're talking about (which one are we talking about, by the way?), but I suppose the fact that it works on ICS is what would have us call it a regression.
(In reply to Milan Sreckovic [:milan] from comment #26)
> JB on 1.3 was only done on Nexus 4.  I imagine that has more memory and
> doesn't run into this problem.  I wouldn't call that a regression, as we
> never had this work on a JellyBean phone we're talking about (which one are
> we talking about, by the way), but I suppose the fact that it works on ICS
> is what would have us call it a regression.

Fair enough, let's pull the keyword.
Keywords: regression
Please see comment 14.  The EXIF preview in the attached image is not very large, and I don't think it will cover a 480x800 screen.

Therefore, the gallery app will decode the full-size image 1000 times, using much more memory.

If the exif preview is big enough to cover the 320x480 screen, then the gallery app will just use the exif preview which uses much less memory.

Because of this screen-size dependency, it is not fair to use the same reference workload for the two phones.  We ensure in the Camera app that photos produced on the phone have large enough EXIF previews to take the fast path in Gallery.

However, see also bug 942199, where we may have to relax the "big enough" preview-to-screen-size comparison for some hardware.
Mike: flagging you on this because I see now that Tapas is using the standard reference workloads for photos. As described above, I think that might not be a fair test for large-screen devices.
Flags: needinfo?(mlee)
Comments 5 and 6 indicate that this isn't just about decoding large images, but about the number of images.  If it doesn't crash after 100 duplicate images, I would expect the gallery to have reached a steady state.  But those comments describe a crash that happens somewhere between 1000 and 2500 images, so I wonder if we have to look for a memory leak somewhere.

Gallery used to display all of its thumbnails, all of the time.  Then we introduced shared/js/visibility_monitor.js so that thumbnails would not be loaded when not visible, so that we wouldn't run out of memory when the number of thumbnails got large.

It might be worth investigating whether the visibility monitor is working as it should during the thumbnail scanning process.

The patch that added date headers to the thumbnail list changed the version of the visibility monitor we're using.  It would be interesting to try backing that patch out to see if this can still be reproduced. That was bug 925179.
Looks like bug 925179 backs out mostly cleanly, with only a trivial manual fix to index.html to resolve by hand.

I did this:

git checkout -b revert925179 v1.3
git revert -m1 14ed05fcf07b43ec8bed744c3b2e3e2977ebb40f

I haven't actually tried to reproduce the crash yet myself, so I haven't investigated whether backing this out prevents the crash.

Note that I am not suggesting we back out that patch to fix this bug.  But if backing it out fixes it, then that is a huge clue about what we need to do to fix it.
Flags: needinfo?(tkundu)
David, this is seriously useful information and triage, thanks!
It would also be very helpful to know whether scrolling while scanning is actually necessary to reproduce the crash.  If we crash on 2000 images without any scrolling, that should simplify the diagnosis somewhat.
Whiteboard: [c=memory p= s= u=] → [c=memory p= s= u=1.3]
Running on master, I can reproduce this crash with 1000 copies of the test image on the sdcard.  It appears to have crashed after ~700 or so.

I instrumented the onscreen and offscreen callbacks (that are supposed to be called by the visibility monitor).  The onscreen callback was called 659 times before crashing, and the offscreen callback was only called once.

This probably means that we were keeping 659 thumbnails in memory. At 120x120 px, that is 120x120x4x659 = ~36.2MB.

The thumbnails would be bigger on a 480x800 device, so the problem would be worse there.

The visibility monitor was supposed to be tuned so that it kept 300 thumbnails in memory, so I think something is going wrong.
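The instrumentation itself was just counters wrapped around the two callbacks, roughly like this (a sketch with simplified names, not the exact code):

  // Count visibility monitor callbacks to check that offscreen thumbnails
  // are actually being released while a scan is in progress.
  var onscreenCount = 0, offscreenCount = 0;
  function thumbnailOnscreen(thumbnail) {
    onscreenCount += 1;
    console.log('onscreen:', onscreenCount, 'offscreen:', offscreenCount);
    // ...then set the thumbnail's background image as usual
  }
  function thumbnailOffscreen(thumbnail) {
    offscreenCount += 1;
    // ...then clear the thumbnail's background image as usual
  }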
Interesting.  I would have thought unloading images would be less of an issue as well given that we discard images that are off screen.  See bug 689623 now.

Also, late in v1.2 we enabled the pref for bug 847223 to avoid decoding images that are offscreen:

  pref("layout.imagevisibility.enabled", true);

I wonder if that pref could be interacting badly with the mutation code or something.
Maybe gallery is described by bug 689623 comment 165?
Ben,

And independently, I just saw bug 943248 which also appears to set that pref. This is the first I've heard about it, so maybe we don't need visibility monitor anymore.

And yes, bug 689623 comment 165 seems to describe Gallery. But that comment is really old and the STR on this bug involve scrolling, so it might not be relevant.
(In reply to David Flanagan [:djf] from comment #31)
> Looks like bug 925179 backs out mostly cleanly with only a trivial manual
> fix to the index.html to resolve by hand.  I used this line:
> 

I reverted that change, but we still see a crash during startup on the 256MB 480x800 device (running the FFOS 1.3 JB port).

>> It would also be very helpful to know whether scrolling while scanning is actually necessary to reproduce the crash.
Not necessary. It will automatically crash at some point during scanning.
Flags: needinfo?(tkundu)
djf: I am assigning this to you since you have been actively investigating this today (please feel free to ping punam to help with additional testing as you suggested, I will also talk to her about this one in the morning).

Thanks
Hema
Assignee: nobody → dflanagan
(In reply to Tapas Kumar Kundu from comment #38)
> (In reply to David Flanagan [:djf] from comment #31)
> > Looks like bug 925179 backs out mostly cleanly with only a trivial manual
> > fix to the index.html to resolve by hand.  I used this line:
> > 
> 
> I reverted that change but we still see crash during startup for 256MB
> 480x800 device (running FFOS 1.3 JB port)
> 
> >> It would also be very helpful to know whether scrolling while scanning is actually necessary to reproduce the crash.
> Not necessary. It will automatically crash at some point during scanning.

Thanks for this information, Tapas.  How did you test with bug 925179 reverted? When I try reverting that patch, Gallery ends up hanging with IndexedDB errors, so I haven't been able to test that yet.
Flags: needinfo?(tkundu)
Clearing the needinfo flag for Tapas. I flashed a nightly from before bug 925179 landed and still get the crash trying to scan 1000 copies of the test image on my hamachi.
Flags: needinfo?(tkundu)
Following up on comment 34, running master on a hamachi, with 900 copies of the test image.

Yesterday Gallery crashed on the first run, with the visibility monitor calling the onscreen callback 659 times and offscreen only once. When I ran it again, it finished scanning.

Today, I ran it again with all of the images already scanned. This time it loads and displays all of the thumbnails, and the onscreen callback is only called ~300 times, which is the expected number. So there is something about the scanning process.

I tried the scanning test again today with today's master build.  900 photos. This time the scan completed without an OOM.  But the onscreen callback was called 905 times and the offscreen callback only once.  I could see a steady increase in the max memory usage as scanning continued, so I think we did indeed have all of the decoded thumbnails in memory at the same time.

Next, I'm going to try the test again with all of the visibility monitor code removed, since per the discussion in comments 35 to 37, it shouldn't actually be needed anymore.
Running the scan again without the visibility monitor worked with 900 photos.  Memory usage dropped when the screen blanked and then spiked when I turned the screen back on again.  Trying again with the screen set to stay on always.  Memory usage sometimes seems alarmingly high (spikes in the 160's) but it hasn't crashed yet.

Next, I'm going to try working with different images that have suitable EXIF previews. That will reduce the variation in memory usage, since we'll never have to decode the full image.  It should also make scanning much faster, so it will be easier to try with larger numbers of photos.  If the underlying cause is that we just have too many decoded thumbnails in memory, we ought to be able to trigger the bug even with photos with previews.
Now I've put 2187 copies of a photo from my hamachi camera on the sdcard and am running gallery without the visibility monitor.  It scans much more quickly, and does not OOM.

Once the photos are scanned, I can scroll quickly through them without causing an OOM.  Memory usage does go up as I scroll, and the frame rate is not particularly good, but we had that problem with Gallery before.

Next, I'll try this same test with the faster scanning photos but with the visibility monitor running.
My 2000+ images also scanned fine with the visibility monitor running. And this time, the onscreen callback was only called 215 times.

Memory usage while scanning seemed about the same as without it.  Scrolling performance after scanning is worse with the visibility monitor installed, as expected.

One thing that is different this time is that I have the screen always on. I wonder if the visibility monitor was malfunctioning when the screen went blank.  I'll try that next.
When the screen blanks while we're scanning, we OOM.
This is on master, and with the visibility monitor running.
Nice find!  Do the mutation events stop coming or something?
Same thing without the visibility monitor: we OOM if we keep scanning while the screen is blank.
(In reply to Ben Kelly [:bkelly] from comment #47)
> Nice find!  Do the mutation events stop coming or something?

I'm wondering if the LMK just kills us more easily when the screen blanks.  Are we still the foreground app in that case?  I've been testing by manually tapping the sleep button. 

Maybe I have to hold a screen lock while scanning?
Next I'll try this on 1.3. It could be that I'm just seeing some new buggy behavior on master.
Here are my findings testing on Hamachi 1.3 with 1000 big-EXIF-preview images. The image used was taken using the Hamachi camera, and the 1000 replicated images were copied using APPS="gallery" make reference-workload-x-heavy.

No crash was observed while scanning; time taken to scan and load images was 4 mins for 1000 images, and memory usage stayed around 150000 KB.

On Hamachi 1.3, a similar scan and load of 1000 copies of the test image attached to the bug (https://bug963917.bugzilla.mozilla.org/attachment.cgi?id=8366621) took 31 mins, with memory usage varying between 160000-180000 KB.
Same result in the latest 1.3 nightly.  If I use the default 'blank screen after 1 minute' setting, then the gallery app will die during scanning.

If I do Settings->Display->Screen timeout->Never, then I can scan 2187 photos without an OOM.

Tapas: do you find the same behavior?  If you force the screen to stay on, does the OOM go away?
(In reply to David Flanagan [:djf] from comment #49)
> I'm wondering if the LMK just kills us more easily when the screen blanks. 
> Are we still the foreground app in that case?  I've been testing by manually
> tapping the sleep button. 
> 
> Maybe I have to hold a screen lock while scanning?

Yea, I think you are the background process at that point and will OOM more easily.  Can you see what b2g-info says in terms of memory usage?  Is it roughly the same whether you let it sleep or not?
Calling navigator.requestWakeLock('screen') in the onscanstart callback, and then unlocking it in onscanend, allows the scan to complete.  And a minute after the scan completes, the screen blanks successfully.
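In code form, the fix is roughly this (a sketch; 'photodb' is the gallery's MediaDB instance, and the actual patch is the attachment below):

  // Hold a screen wake lock for the duration of the media scan so the
  // screen timeout can't put the app in the background mid-scan.
  var scanWakeLock = null;
  photodb.onscanstart = function() {
    scanWakeLock = navigator.requestWakeLock('screen');
  };
  photodb.onscanend = function() {
    if (scanWakeLock) {
      scanWakeLock.unlock();
      scanWakeLock = null;
    }
  };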
Attached file link to patch on github (obsolete) (deleted) —
Tapas,

Does this patch solve the problem for you?
Attachment #8368974 - Flags: review?(tkundu)
Note that even with this patch, if I push the power button to manually blank the screen the app will go to the background and OOM.  But the screen lock at least keeps it awake by default so that the scan can complete.

I suppose we could argue about whether an OOM while sleeping is a bug or not. I'm not sure that it is.  Something to discuss, perhaps, but not something to block on.

Alive and Tim: can either of you comment on what happens to the foreground app when the phone goes to sleep?  Does it become a background app and easier for the LMK to kill, or is there a bug here? Is there an optimization we can make here so that if an app has a screen lock and the user sleeps the phone we keep that app as the foreground app or something?  Basically, are there followup bugs we should file here?
Flags: needinfo?(timdream)
Flags: needinfo?(alive)
I wonder if a cpu lock would work here instead of a screen lock... I don't actually know what a cpu lock is supposed to do.
With this patch applied, I just scanned 3600 photos (each one with an EXIF preview of large enough size for the 320x480 hamachi screen).  When scanning completed, the screen blanked correctly, but it did not blank during scanning.
Changing from screen lock to cpu lock does not fix the bug.
(In reply to David Flanagan [:djf] from comment #56)
> Note that even with this patch, if I push the power button to manually blank
> the screen the app will go to the background and OOM.  But the screen lock
> at least keeps it awake by default so that the scan can complete.
> 
> I suppose we could argue about whether an OOM while sleeping is a bug or
> not. I'm not sure that it is.  Something to discuss, perhaps, but not
> something to block on.
> 
> Alive and Tim: can either of you comment on what happens to the foreground
> app when the phone goes to sleep?  Does it become a background app and
> easier for the LMK to kill, or is there a bug here? Is there an optimization
> we can make here so that if an app has a screen lock and the user sleeps the
> phone we keep that app as the foreground app or something?  Basically, are
> there followup bugs we should file here?

Quick answer: when the screen is locked, all apps are in the background.
An app is not able to distinguish between being in the background and the screen being off.

Because page visibility is inheritable, and all apps are embedded under the system app, when the screen is turned off the visibility of the system app's iframe is changed to false, and thus all apps, including the foreground app, get a visibilitychange event.

I don't think we should keep apps in the foreground when the screen is off; that would result in power drain if the app doesn't know the screen is off and keeps doing work.
A screen lock does prevent the screen from being automatically turned off by the screen manager in the system app, and maybe we could ignore the power button while the foreground app holds a screen lock.
Flags: needinfo?(alive)
Tested on Hamachi v1.3 with the default 'blank screen after 1 minute' screen timeout setting: a Gallery crash was seen after 4 mins of scanning 1800 big-EXIF-preview images.

With the setting changed so the screen never blanks, it scanned 1800 images in 7 mins, and max memory usage reached around 158000 KB.
Setting needinfo for Tapas in case he is not monitoring his review requests.

Tapas: does the attached patch solve the bug you were seeing?
Flags: needinfo?(tkundu)
(In reply to David Flanagan [:djf] from comment #55)
> Created attachment 8368974 [details]
> link to patch on github
> 
> Tapas,
> 
> Does this patch solve the problem for you?

No. Didn't work :(
Flags: needinfo?(tkundu)
Did the patch prevent the screen from blanking while you ran the test? Did it allow your test to run longer than it did before?

If you switch to a more realistic workload (with images that have bigger EXIF previews), then maybe this patch in conjunction with a patch for bug 942199 will be enough.

Other than that, I don't know what to suggest. If we got a fix for bug 854795 and could uplift it to 1.3 that would make a big difference, but I don't think there is any chance of that for 1.3.

You could reduce the size of the photos that the Camera app is taking.  Though I'm guessing that an OOM for a completely unrealistic workload is easier to live with than compromising the photo size of the device.
Flags: needinfo?(tkundu)
I'm tempted to unassign myself from this bug, since I won't be able to figure out why my 256MB device (hamachi) can scan the photos and the QC reference device cannot.

I suppose we should try one more time with exactly the workload that Tapas is using, however, before I give up completely.

Punam: if you have time, would you try reproducing this bug with the 1000 photos and 100 videos workload that the reporter is using?  I think the make reference-workload targets don't put enough photos and videos on the phone, so you'll have to use more.  Please use the default reference workload image even though it has a too-small preview and takes a really long time to scan.
Assignee: dflanagan → nobody
Flags: needinfo?(pdahiya)
Tested on below environment

Device: Hamachi
OS Version 1.3
Platform 28.0a2
Build Id: 20140131004001

Steps

1. Applied the attached patch on the latest gaia v1.3 and installed it on the device using make reset-gaia

2. Loaded 1000 images on the device by changing the makeReferenceWorkload.sh script to push 1000 default images: https://github.com/mozilla-b2g/gaia/blob/master/test_media/reference-workload/makeReferenceWorkload.sh#L62

3. Ran APPS="gallery" make reference-workload-x-heavy

4. Loaded 100 copies of the attached video https://bug963917.bugzilla.mozilla.org/attachment.cgi?id=8367619 onto the sdcard

5. Open gallery app

The gallery app took 22 mins to load 1000 images, and the screen didn't time out while scanning. The screen successfully timed out and turned black after scanning finished. No crash observed.
Flags: needinfo?(pdahiya)
Attached file gaia and gecko commit hash (deleted) —
(In reply to David Flanagan [:djf] from comment #64)
> Did the patch prevent the screen from blanking while you ran the test? Did
> it allow your test to run longer than it did before?
> 
It crashed after 10 mins, and I tested with 2000 copies of the sample image (which I attached to this bug). I am not sure whether this patch prevented 'screen blanking' for me or not. I changed the screen timeout to NEVER in Settings > Display > Screen timeout, and I also applied your patch [1] on top of my repo. I still saw a crash exactly 10 mins in, and my display was not blank even after gallery crashed.

I attached my gaia/gecko repo's last two commit hashes. Can you please tell me if I am missing some other changes in gaia/gecko?

It would be helpful if you could point me at a way to debug whether gecko is activating the screen timeout after 10 mins or not.

I am assuming that I am hitting some other bug with the screen timeout, as it is working for others in the 256MB configuration. I am re-checking with the latest gaia/gecko, and I will update soon with your patch on top of the latest gaia/gecko on FFOS 1.3.

[1] https://bugzilla.mozilla.org/attachment.cgi?id=8368974&action=edit
Flags: needinfo?(dflanagan)
Do you have calendar or clock alarms that might be triggering another app launch after 10 minutes?  That shouldn't kill the foreground app, but maybe that is triggering some bug.
(In reply to Punam Dahiya from comment #66)
> Tested on below environment
> 
> Device: Hamachi
> OS Version 1.3
> Platform 28.0a2
> Build Id: 20140131004001
> 
Did you test it on the JB port of FFOS 1.3? I guess that it is ICS Strawberry-based FFOS 1.3. Please also confirm the total memory available on this device. Thanks a lot for the help.
Flags: needinfo?(pdahiya)
Tapas,

Yes, the Hamachi is an ICS device. It has 256MB of memory, but only a 320x480 screen, which means that the thumbnail images and the preview images being generated are smaller than they would be on your device.

The only JB device that I know of is the Nexus 4, and although it has the larger screen size, it obviously has much more memory than your device.

I don't know if anyone at mozilla has a device that can replicate your test.

Punam tried with 1000 images (Thanks, Punam!).  I'll try with the 2000 that you tried.  I'm eager to hear what you experience with the latest version of Mozilla's 1.3 branch.

If the crash is always at the 10 minute mark that does seem like an important clue. I don't know anything about b2g process management, however, so don't know what to make of that.
Flags: needinfo?(dflanagan)
(In reply to David Flanagan [:djf] from comment #70)
> Tapas,
> 
> Yes, the Hamachi is an ICS device. It has 256mb of memory, but only has a
> 320x480 screen, which means that the thumbnail images and the preview images
> that are being generated are smaller than they would be on your device. 
> 
> The only JB device that I know of is the nexus 4, and although it has the
> larger screen size obviously it has much more memory than your device.  
> 
> I don't know if anyone at mozilla has a device that can replicate your test.

I already confirmed in comment 24 that an ICS device running FFOS 1.3 has no issue even with 2000 images.
Please note that this issue is present only on the FFOS 1.3 JB device with the 256MB memory configuration.

>> Punam tried with 1000 images (Thanks, Punam!).  I'll try with the 2000 that you tried. 
Looks like it will not help much, as you don't have a proper device to test this use case.

>> If the crash is always at the 10 minute mark that does seem like an important clue. I don't know anything about b2g process management, however, so don't know what to make of that.

Do you want us to re-adjust the LMK values for this gallery app problem? I guess that we should not allow the gallery app to consume ~80MB of memory during loading. If you look at comment 20, you can see that this is a memory usage regression of the gallery app on the 800x480 JB 256MB device. It would be great if we can find a solution for the gallery app's memory usage instead of adjusting LMK or screen timeout settings.

Please advise.
Flags: needinfo?(tkundu)
Flags: needinfo?(pdahiya)
(In reply to David Flanagan [:djf] from comment #64)
> Did the patch prevent the screen from blanking while you ran the test? Did
> it allow your test to run longer than it did before?
> 
> If you switch to a more realistic workload (with images that have bigger
> EXIF previews), then maybe this patch in conjunction with a patch for bug
> 942199 will be enough.
> 
> Other than that, I don't know what to suggest. If we got a fix for bug
> 854795 and could uplift it to 1.3 that would make a big difference, but I
> don't think there is any chance of that for 1.3.
>
It seems to me that we are not trying to solve the memory usage regression (comment 24) for the gallery app on the 800x480 device running FFOS 1.3 (JB port) now.
(In reply to Tapas Kumar Kundu from comment #71)
> (In reply to David Flanagan [:djf] from comment #70)
> > Tapas,
> > 
> > Yes, the Hamachi is an ICS device. It has 256mb of memory, but only has a
> > 320x480 screen, which means that the thumbnail images and the preview images
> > that are being generated are smaller than they would be on your device. 
> > 
> > The only JB device that I know of is the nexus 4, and although it has the
> > larger screen size obviously it has much more memory than your device.  
> > 
> > I don't know if anyone at mozilla has a device that can replicate your test.
> 
> I already confirmed that ICS decice running FFOS 1.3 has no issue even with
> 2000 images in #comment 24.
> Please know that this issue is present only in FFOS 1.3 JB device with 256
> Mem configuration. 
> 

I guess I didn't read it carefully enough, and since I was able to observe an OOM when the screen blanked, I thought maybe I was seeing the same bug as you.

Do you actually know that this is related to ICS vs JB, or could it be related to the screen size?

> >> Punam tried with 1000 images (Thanks, Punam!).  I'll try with the 2000 that you tried. 
> Looks like it will not help much as you don't have proper device to test
> this use case. 
> 
> >> If the crash is always at the 10 minute mark that does seem like an important clue. I don't know anything about b2g process management, however, so don't know what to make of that.
> 
> Do you want us to re-adjust LMK values for this galley app problem ? 

I know nothing at all about the LMK, so have no opinion on what to do with it.

In addition to tweaking the LMK parameters, I think there are other knobs that can be adjusted that have to do with how gecko manages image memory. I don't understand those, either.

> I guess
> that we should not allow gallery app to consume ~80MB memory during loading.

If the memory is free, I don't care if the app uses it. If gecko wants to hang on to decoded images, that should be fine as long as it releases them on memory pressure....


> If you look into #comment 20 then you can see that this is memory usage
> regression of gallery app for 800x480 JB 256MB device. It would be great if
> we can find a solution for memory usage of galley app instead of adjusting
> LMK or screen timeout settings. 
> 

There is no regression in the gallery app itself. Gallery's metadata parsing code is careful to serialize the parsing process so that only one image is being decoded at a time. We had all kinds of memory related bugs in previous releases, and I really do not think that there are any further memory savings to be achieved at the JavaScript level.  (Until we have bug 854795 and can downsample and decode at the same time.)

> please suggest

It seems clear at this point that this is not a Gaia regression. I don't know if it is gecko or gonk, though, and really have no clue.

I'll try following Milan's suggestion in comment #7 and will attach a patch, but I'm not particularly hopeful about that since I don't think that has anything to do with ICS vs JB. I don't know what SkiaGL even is, but I'm told that setting this new flag might reduce some memory usage, so maybe it will help.

After that I'm going to unassign myself, since I'm out of ideas and it seems clear that we won't solve this bug with JavaScript.
Attached file link to patch on github (obsolete) (deleted) —
Tapas,

This patch implements Milan's suggestion from early in the history of this bug. Please give this a try and see if it makes any difference.
Looks like I actually unassigned myself from this bug up at comment 65.

For a regression caused by the switch from ICS to JB, I don't even have a guess who the new assignee should be. Milan, is this something someone on your team could work on?
Flags: needinfo?(milan)
Sounds good - let's find out if your patch makes a difference.  Otherwise, I imagine that the memory increase isn't so much from the switch from ICS to JB (although I suppose it could be), but more because of the size of the screen, and needing a larger image than we have in the thumbnail.  At that point, we need to decode the full-size image (at least until bug 854795 gets fixed), which is larger than before.
NI Tapas to test David's patch from comment 74
Flags: needinfo?(tkundu)
(In reply to Hema Koka [:hema] from comment #77)
> NI Tapas to test David's patch from comment 74

Here is my observation:
1) With 2000 copies of the sample image (I attached that image to this bug), I can see the device scanning for 20 mins and then gallery crashes. This is observed with the patch from comment 74. You can see b2g-procrank and kernel logs from when it crashes.

2) But if I don't use the patch from comment 74, then gallery crashes within ~5 min of scanning. So I can say that this patch (comment 74) improves memory usage a lot. But we still need to save more memory if we want to allow end users to use their existing memory card (or a memory card from another smartphone) with FFOS.

3) If we change device orientation, then we see gallery crash during scanning. It happens almost immediately. Please advise on this problem.

4) If we don't use the attached sample image and only use pictures captured by the camera, then we don't see any crash during the initial scan of 2000 images. I think this is explained in comment 14. But if we open a picture during scanning (in this use case), then I see a crash immediately. Do we have a bug for it?

5) I don't think this is an ICS/JB issue. It seems to me that it is related to the additional memory usage mentioned in comment 14, and it is present only on the 800x480 resolution device with 256MB memory; it can be reproduced very easily with 2000 copies of the same image (attached to this bug).
Flags: needinfo?(tkundu)
Comment on attachment 8369887 [details]
link to patch on github

It saves memory, but we are still crashing after 20 mins for 2000 images. We need to save more memory during scanning. But it is a big improvement. Please see comment 78 for more details.
Attachment #8369887 - Flags: feedback+
Comment on attachment 8368974 [details]
link to patch on github

It didn't help on JB-based FFOS 1.3. I can see that gallery scans fine even if I force the screen to blank by pressing the power button on the device.
Attachment #8368974 - Flags: review?(tkundu) → review-
Whiteboard: [c=memory p= s= u=1.3] → [c=memory p= s= u=1.3] [CR 611568]
Comment 60 is correct. If you think visibility state and memory priority should be decoupled for occasions like this, we should ask Gecko developers to fix that in another bug.
Flags: needinfo?(timdream)
(In reply to Tapas Kumar Kundu from comment #80)
> Comment on attachment 8368974 [details]
> link to patch on github
> 
> It didn't help in JB based FFOS 1.3 . I can see that gallery is scanning
> fine even if I force blanking screen by pressing power button on device.

Screen lock or CPU lock has nothing to do with visibility state. The user is always allowed to press home to hide the app and press the hardware sleep/power key to turn off the screen.

A screen lock simply stops the system app from automatically turning off the screen, e.g. for a video/monitoring app.
A CPU lock simply stops the phone from suspending JavaScript execution when the screen is asleep.

Maybe we do need to fix comment 81 in this bug.....
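To spell out the difference (illustrative only):

  // A screen lock only blocks the automatic screen timeout:
  var screenLock = navigator.requestWakeLock('screen');
  // A cpu lock keeps JavaScript running while the screen is off;
  // neither lock keeps the app in the foreground:
  var cpuLock = navigator.requestWakeLock('cpu');
  // Both must be released when no longer needed:
  screenLock.unlock();
  cpuLock.unlock();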
I used about:memory to get the memory usage. I found that we have a very different usage for "canvas-2d-pixels."
We need to find out why the "canvas-2d-pixels" is so different. Maybe the visibility monitor is not working for the first launch.


1. First launch with 400 256x256 png images:
 60.73 MB ── canvas-2d-pixels
  2.92 MB ── gfx-surface-image
  0.00 MB ── gfx-textures
        0 ── ghost-windows
  0.00 MB ── gralloc
 16.57 MB ── heap-allocated
 18.54 MB ── heap-committed
   11.91% ── heap-overhead-ratio
      400 ── host-object-urls
  0.00 MB ── imagelib-surface-cache
  0.29 MB ── js-main-runtime-temporary-peak
        0 ── page-faults-hard
  336,203 ── page-faults-soft
 42.88 MB ── resident
 23.89 MB ── resident-unique
118.36 MB ── vsize

2. Second launch with the same 400 256x256 png images:
  2.76 MB ── gfx-surface-image
  0.00 MB ── gfx-textures
        0 ── ghost-windows
  0.00 MB ── gralloc
 15.64 MB ── heap-allocated
 16.36 MB ── heap-committed
    4.61% ── heap-overhead-ratio
    2,004 ── host-object-urls
  0.00 MB ── imagelib-surface-cache
  0.29 MB ── js-main-runtime-temporary-peak
        2 ── page-faults-hard
   14,350 ── page-faults-soft
 37.73 MB ── resident
 22.17 MB ── resident-unique
117.26 MB ── vsize
Sorry, ignore the "Maybe the visibility monitor is not working for the first launch" in comment 83.

Maybe we have some problem releasing the canvas 2d buffer.
So it seems to me there is no back-pressure in the scanning process.  It reads files, generates thumbnails/previews (which creates canvases), and then saves them to disk as blobs.  The canvases are not discarded until the canvas.toBlob() callback is fired.

This seems problematic to me because it's probably faster to read files off disk than it is to re-encode images and save them to disk.  This means we could be building up a large number of canvases in flight, waiting for things to flush.

David, Gabriele, what do you think about adding a callback from the metadata parser infrastructure through to MediaDB so that it can throttle when to process the next newly scanned file?  We could then limit the number of outstanding canvases to a known quantity.
Flags: needinfo?(gsvelto)
Flags: needinfo?(dflanagan)
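A minimal sketch of the throttling idea from comment 85 (hypothetical helpers; MediaDB has no such callback today):

  // Cap the number of thumbnail/preview canvases in flight: the scanner
  // must acquire a slot before parsing the next file, and slots are only
  // released once the generated blobs have been flushed to disk.
  var MAX_IN_FLIGHT = 2;
  var inFlight = 0;
  var waiting = [];

  function acquireSlot(next) {
    if (inFlight < MAX_IN_FLIGHT) {
      inFlight++;
      next();
    } else {
      waiting.push(next);
    }
  }

  // Call this from the toBlob()/save completion callback.
  function releaseSlot() {
    inFlight--;
    if (waiting.length > 0) {
      inFlight++;
      waiting.shift()();
    }
  }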
Oops.  I thought Gabriele was working on this bug, but I was thinking of bug 917193.  Sorry for the bug spam.
Flags: needinfo?(gsvelto)
Jerry is looking at this, so assigning to him.  I don't know how long this will take because I don't know what success looks like.  We were failing with 1000+100, and that was deemed a reasonable case.  We now have a patch that lets us go beyond that, but we fail at 2500+100.  Is there another target short of "never OOM"?
Assignee: nobody → hshih
Flags: needinfo?(milan)
Whiteboard: [c=memory p= s= u=1.3] [CR 611568] → [c=memory p= s= u=1.3] [CR 611568][ETA: n/a]
(In reply to Ben Kelly [:bkelly] from comment #85)
> So it seems to me there is no back-pressure in the scanning process.  It
> reads files, generates thumbnails/previews which creates canvases, and then
> saves them to disk as a blob.  The canvases are not discarded until the
> canvas.toBlob() callback is fired.
> 
> This seems problematic to me because its probably faster to read files off
> disk than it is to re-encode images and save to disk.  This means we could
> be building up a large amount of canvases in flight waiting for things to
> flush.
> 
> David, Gabriele, what do you think about adding a callback from the metadata
> parser infrastructure through to MediaDB so that it can throttle when to
> process the next newly scanned file?  We could then limit the number of
> outstanding canvases to a known quantity.

Keep in mind that I've been through all this before. Gallery's memory usage has been optimized and optimized and optimized already in the 1.0 and 1.1 release cycles.

MediaDB has lots of code to ensure that metadata parsing, although done asynchronously, is strictly serialized. There is only one image being processed at a time, and nothing happens until toBlob() is done.  We do not wait until the DB writes and file saves are complete, however.  If thumbnails are showing up on the screen while the scan is in progress, that indicates that the DB writes are happening. I suppose it is possible that the writes of the generated preview images to /sdcard/.gallery/previews/ are backing up, though that seems unlikely.  Probably easy enough to check that by adding an onsuccess callback with some console logging in the save() function at apps/gallery/js/MetadataParser.js:371

Though I don't know that it will do any of us any good without the QRD device to try it out on.

Does anyone at Mozilla have the ability to work on this bug and have access to a device that can reproduce the crash?
(In reply to Milan Sreckovic [:milan] from comment #87)
> Jerry is looking at this, so assigning to him.  I don't know how long this
> will take because I don't know what success looks like.  We were failing
> with 1000+100, and that was deemed a reasonable case.  We now have a patch
> that lets us go beyond that, but we fail at 2500+100.  Is there another
> target short of "never OOM"?

I agree that at some point we may have to just say "good enough; we don't have to block" anymore.  But there is a mystery memory leak here.  If memory usage increases linearly with the number of photos that need to be scanned, something is going wrong.  And it is something that does not seem to happen on the Hamachi, which can scan 2000 previewless images (it takes over 40 minutes, but there is no crash).
Flags: needinfo?(dflanagan)
An interesting experiment with a hamachi might be to modify the metadata parser code to force it to create 400x800 preview images and 200x200 thumbnails (instead of 320x480 and 120x120). This would simulate the image sizes being used on the QRD device.

If we had a QRD device, I would like to know what happens if we didn't generate the preview image. Note, though, that because of bug 942199 photos from that device do not have a big enough preview, so we'd have to use a manually created test image, or just turn off the preview generation code in MetadataParser.js.  This might help us narrow down the leak.  If we crash with only thumbnails being created, then we know that the leak probably has something to do with those thumbnails.

Jerry's finding in comment #83 does seem like an interesting lead.  Jerry, are you testing this on the QRD device? In terms of freeing up canvases, there are two calls to toBlob() in MetadataParser.js.  In one, the callback explicitly sets the canvas size to 0 once the blob is available.  In the other it does not, and we rely on ordinary garbage collection to discard that 200x200 canvas.  We could try patching that by changing MetadataParser.js:127 to be:

  canvas.toBlob(function(b) { canvas.width = canvas.height = 0; callback(b); }, 'image/jpeg');
(In reply to David Flanagan [:djf] from comment #90)
> If we had a QRD device,

QRD8x28s are available.  See bug 961246.  We could probably force the RAM down to simulate a 256MB config.  Not an exact match but would be closer than a Hamachi.
Attached patch gallery.patch (obsolete) (deleted) — Splinter Review
Release the canvas buffer explicitly. The "canvas-2d-pixels" becomes zero.
I think the gallery can load more pictures.

--
If we want to handle 5000 or even 10000 pictures, I think we should consider the number of DOM elements. Currently, if one picture is out of the visibility window, we still have a DOM element for that picture; we just set its backgroundImage to null. The DOM element might consume some memory.


First launch with 4000 256x256 png files:
  0.00 MB ── canvas-2d-pixels
  2.92 MB ── gfx-surface-image
  0.00 MB ── gfx-textures
        0 ── ghost-windows
  0.00 MB ── gralloc
 50.46 MB ── heap-allocated
 52.59 MB ── heap-committed
    4.22% ── heap-overhead-ratio
    4,000 ── host-object-urls
  0.00 MB ── imagelib-surface-cache
  0.29 MB ── js-main-runtime-temporary-peak
        0 ── page-faults-hard
2,707,857 ── page-faults-soft
 78.34 MB ── resident
 59.12 MB ── resident-unique
144.42 MB ── vsize

Second launch time
 2.76 MB ── gfx-surface-image
 0.00 MB ── gfx-textures
       0 ── ghost-windows
 0.00 MB ── gralloc
13.00 MB ── heap-allocated
13.63 MB ── heap-committed
   4.79% ── heap-overhead-ratio
     945 ── host-object-urls
 0.00 MB ── imagelib-surface-cache
 0.29 MB ── js-main-runtime-temporary-peak
       2 ── page-faults-hard
  13,378 ── page-faults-soft
35.21 MB ── resident
19.82 MB ── resident-unique
92.37 MB ── vsize
Attachment #8371256 - Flags: feedback?(tkundu)
Attachment #8371256 - Flags: feedback?(dflanagan)
My test device is a helix; it has 384MB of memory.
So far I don't get an OOM when I launch gallery.
Comment on attachment 8371256 [details] [diff] [review]
gallery.patch

Jerry,

Thanks for trying this approach. Since you observe a difference in the about:memory results, I'm hoping that Tapas will see the difference, too.  Unfortunately he's the only one I know of who can actually test our patches for us.  Setting needinfo for Tapas in case he doesn't follow the feedback? flag as closely.

Is there a reason that you didn't leave in the willReadFrequently flags?  I tried that at Milan's suggestion, and Tapas seemed to think that it made some difference.  But I don't really understand what it does, so I'm not sure whether it is necessary here or not. In any case, you'll want to modify your patch to uncomment those lines or to remove the commented lines.
Attachment #8371256 - Flags: feedback?(dflanagan) → feedback+
Flags: needinfo?(tkundu)
(In reply to David Flanagan [:djf] from comment #94)
> Is there a reason that you didn't leave in the willReadFrequently flags?  I
> tried that at Milan's suggestion, and Tapas seemed to think that it made
> some difference.  But I don't really understand what it does, so I'm not
> sure whether it is necessary here or not. In any case, you'll want to modify
> your patch to uncomment those lines or to remove the commented lines.

I just wanted to test the worst case. We should use willReadFrequently for the final release.

(In reply to Jerry Shih[:jerry] from comment #92)
> If we want to handle 5000 even 10000 picture, I think we should consider the
> DOM element number. Currently, If one picture is out of the visibility
> window, we still have a DOM element for this picture. We just set the
> backgroundImage to null. The DOM element might consume some memory.

Is it easy for the gallery to remove/attach the preview DOM node, rather than only setting the backgroundImage to null?
Flags: needinfo?(dflanagan)
(In reply to Jerry Shih[:jerry] from comment #95)
> (In reply to David Flanagan [:djf] from comment #94)
> > Is there a reason that you didn't leave in the willReadFrequently flags?  I
> > tried that at Milan's suggestion, and Tapas seemed to think that it made
> > some difference.  But I don't really understand what it does, so I'm not
> > sure whether it is necessary here or not. In any case, you'll want to modify
> > your patch to uncomment those lines or to remove the commented lines.
> 
> I just want to test the worse case. We should use willReadFrequently for
> final release.

Great.

> 
> (In reply to Jerry Shih[:jerry] from comment #92)
> > If we want to handle 5000 even 10000 picture, I think we should consider the
> > DOM element number. Currently, If one picture is out of the visibility
> > window, we still have a DOM element for this picture. We just set the
> > backgroundImage to null. The DOM element might consume some memory.
> 
> Does the gallery easy to remove/attach the preview DOM node? Not only set
> the backgroundImage to null

That would be too big a change for a last minute 1.3 blocker.

I've actually filed a 1.4 bug to try to remove visibility_monitor entirely, because I don't think we need it for image visibility any more.  

If we also want to add and remove dom elements, we should consider some other infinite scrolling solution, I think. I suspect that trying to do that with visibility monitor would cause problems with too many layouts.  Also, the way we now segment thumbnails by month complicates matters.
Flags: needinfo?(dflanagan)
(In reply to David Flanagan [:djf] from comment #94)
> But I don't really understand what it does, so I'm not
> sure whether it is necessary here or not.

Setting |willReadFrequently| to true forces the canvas to be backed by a software-only surface as opposed to one accelerated via SkiaGL. This uses less memory as we don't allocate a GL context and the associated backing buffer in graphics memory.
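Concretely, the flag is just a context attribute passed to getContext() (sketch):

  // Ask for a software-backed 2d context so this canvas never allocates
  // a SkiaGL context or a graphics-memory backing buffer.
  var ctx = canvas.getContext('2d', { willReadFrequently: true });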
Comment on attachment 8371256 [details] [diff] [review]
gallery.patch

This patch does not apply cleanly on FFOS 1.3 gaia/gecko. It seems that you have [1] in your repo, but it has not come to FFOS 1.3 gaia/gecko yet.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=958681 

I tried this patch by resolving a minor conflict manually. I found that:

1) Gallery is killed after 10 mins with this patch
2) Gallery is killed within 5 mins without this patch

This patch looks nice to me, as it is saving some memory. But I am interested to know what is causing memory growth over time for 2000 copies of the same image in gallery. Is there any memory leak issue with gallery?
Attachment #8371256 - Flags: review+
Attachment #8371256 - Flags: feedback?(tkundu)
Attachment #8371256 - Flags: feedback+
Flags: needinfo?(tkundu)
Memory growth of the gallery app with the patch from comment 98. Looks like either gallery is leaking some memory, or it is using more memory over time to scan images (2000 copies of the same image, which I have attached to this bug).

Can anyone please explain why we are seeing memory growth over time in gallery?
Tapas,

Would you test again with all three patches applied (or at least the last two)?  This most recent patch did not include the previous patch, and both improved things for you. So it would be good to see how they do together.

As for the memory growth, if anyone could explain why were seeing that we would know how to fix the bug!

Your attachments of b2g-procrank output are helpful, but the fact that gallery's memory goes up and down so much makes the output hard to interpret.  I'd rather have data from runs that don't crash, taken after scanning has finished and memory usage has stabilized.

Here's some data I'd be interested in seeing:

Put 100 photos on your device, let gallery scan them to completion (assuming no crash with this number). Wait a minute without scrolling, to let memory stabilize, and then use procrank to see the final memory usage.

Then repeat with the same photos so no scanning is required.

Then erase the db (or reset gaia) and repeat with 200, 300, 400, 500, 600, etc.

For small numbers of photos, I'd expect to see a linear relationship between photos and memory usage, because more photos means more decoded thumbnails.  Each 200x200 thumbnail requires 160,000 bytes to hold the decoded image, so adding 100 more thumbnails should make memory use go up by 16,000,000 bytes.

At some point, however, gecko and/or the visibility monitor code in Gallery should stop decoding thumbnail images when they are too far offscreen. So as long as we don't scroll, we should see the total memory size flatten out and no longer increase very much as the # of photos goes up.

If we see this pattern for the runs where the scanning is already complete, but do not see the memory usage tapering off for the runs where scanning is being done, then we know that there is something wrong with the scanning process.

From some preliminary testing I did earlier in this bug, it seems as if the visibility monitor code is failing during the scan and is allowing too many images to decode, so that is something we could investigate. 

(Ah, here's another idea: try commenting out the body of thumbnailOnscreen() at lines 754 and 755 of gallery.js. This will mean that thumbnails will never be displayed and you'll just see a bunch of gray boxes.  If you still get an OOM during scanning, then we know that the bug is not related to the thumbnails.)
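For reference, that callback looks roughly like the following (reconstructed from memory, so treat the exact shape as an assumption); commenting out the body leaves only the gray placeholder boxes:

  function thumbnailOnscreen(thumbnail) {
    // Comment out these two lines so thumbnails are never displayed:
    if (thumbnail.dataset.backgroundImage)
      thumbnail.style.backgroundImage = thumbnail.dataset.backgroundImage;
  }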

Unfortunately, Tapas, you are the only person I know of who can do any testing on this bug.
Flags: needinfo?(tkundu)
Hi David,

If we have a lot of pictures, the gallery loads all the thumbnails into memory.
https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/gallery.js#L446
After enumerating, we have all the thumbnails in memory.
I think that is why memory usage increases with more and more pictures.

Can we remove the thumbnail from the fileinfo object? If the thumbnail is in the visible window, we can load the thumbnail blob on demand.
Flags: needinfo?(dflanagan)
(In reply to Jerry Shih[:jerry] from comment #83)
> I use about:memory to get the memory usage. I found that we have a very
> different usage for "canvas-2d-pixels."
> We need to find why the "canvas-2d-pixels" is so different. Maybe the
> visibility monitor is not working for the first launch.
> 
> 
> 1. first time launch for size(256x256) x 400pic png image.
>  60.73 MB ── canvas-2d-pixels
>   2.92 MB ── gfx-surface-image
>   0.00 MB ── gfx-textures

David, the "canvas-2d-pixels" memory above is created by too many local canvas objects[1] during thumbnail creation.
Since the width/height of the thumbnail canvas are always the same[2], could we use a single canvas object to generate all the thumbnails (see the sketch after the links below)? I think it would help reduce the total memory usage during first launch, and we wouldn't need to wait for GC to clean up the unused canvas objects.

[1] https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/MetadataParser.js#L44
[2] https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/MetadataParser.js#L45
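For illustration, a sketch of the reuse idea (a hypothetical rewrite, not the current MetadataParser.js code; THUMBNAIL_WIDTH and THUMBNAIL_HEIGHT stand in for whatever fixed thumbnail size the app uses):

  // One module-level canvas, allocated once and reused for every thumbnail.
  var sharedCanvas = document.createElement('canvas');
  sharedCanvas.width = THUMBNAIL_WIDTH;
  sharedCanvas.height = THUMBNAIL_HEIGHT;
  var sharedContext =
    sharedCanvas.getContext('2d', { willReadFrequently: true });

  function createThumbnailFromImage(image, callback) {
    sharedContext.drawImage(image, 0, 0, THUMBNAIL_WIDTH, THUMBNAIL_HEIGHT);
    sharedCanvas.toBlob(callback, 'image/jpeg');
  }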
Hi David,

I saw some interesting memory usage in about:memory.
We see the "memory-file-data/large/file" at 1st launch, but we don't see this at 2nd launch.
It costs a lot.

1st launch
29.70 MB (100.0%) -- explicit
├──15.63 MB (52.62%) -- dom
│  ├──15.63 MB (52.62%) ── memory-file-data/large/file
│  └───0.00 MB (00.00%) ── event-listener-managers-hash


This "memory-file-data" is a blob object in gecko, and it was generated after calling toBlob() in createThumbnailFromElement().
https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/MetadataParser.js#L134

Is this blob object freed after saving to MediaDB? There are too many callbacks and closures; it's hard for me to find the right callback function. :(
Hi jerry shih/David Flanagan,

Can we chat on IRC? I want to debug this issue together on my device.

Please let me know when you are free. We are seeing the same issue with <100 pictures at a higher resolution (not attached to this bug).

I think we should debug together so we can reach a conclusion quickly.
Flags: needinfo?(tkundu)
Flags: needinfo?(hshih)
(In reply to Jerry Shih[:jerry] from comment #101)
> Hi David,
> 
> If we have a lot of picture, the gallery will load all thumbnail into memory.
> https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/gallery.
> js#L446
> After enumerating, we have all thumbnail in memory.
> I think it is the reason why the memory usage will increase with more and
> more picture.
> 
> Can we remove the thumbnail from fileinfo object? If the thumbnail is in the
> visible window, we load the thumbnail blob object on demand.

That code is fine for this bug. This is only about the first run case where we start with no images in the db and are scanning them all.  When a blob is stored in indexeddb, it is stored in an external file, and when queried, the Blob object is just a reference to that file.  So enumerating all the images does not actually (unless something has changed in gecko) read the encoded images into memory.
Flags: needinfo?(dflanagan)
(In reply to peter chang[:pchang][:peter] from comment #102)

> David, about memory of "canvas-2d-pixels" above are created by too many
> local canvas objects[1] for thumbnail creation.
> Since the width/height of the thumbnail canvas are the same[2], could we use
> only one canvas object to generate all the thumbnail. I think it will help
> to reduce the total memory usage during first launch and we don't need to
> wait GC to cleanup the unused canvas objects.
> 

Peter: reusing that canvas is an interesting idea.  Note, however, that Jerry's patch attached to this bug solves the same problem by explicitly setting the canvas size to 0 after use.  Reusing the canvas might be more efficient, but presumably won't save us any more memory than we've already saved with Jerry's patch, so I don't think that will help us.
(In reply to Jerry Shih[:jerry] from comment #103)
> Hi David,
> 
> I saw some interesting memory usage in about:memory.
> We see the "memory-file-data/large/file" at 1st launch, but we don't see
> this at 2nd launch.
> It costs a lot.
> 
> 1st launch
> 29.70 MB (100.0%) -- explicit
> ├──15.63 MB (52.62%) -- dom
> │  ├──15.63 MB (52.62%) ── memory-file-data/large/file
> │  └───0.00 MB (00.00%) ── event-listener-managers-hash
> 
> 
> This "memory-file-data" is a blob object in gecko, and it was generated
> after calling toBlob() in createThumbnailFromElement().
> https://github.com/mozilla-b2g/gaia/blob/master/apps/gallery/js/
> MetadataParser.js#L134
> 
> Does this blob object be freed after saving to mediaDB? There is too many
> callback and closure. It's hard to me to find the callback function. :(

I don't know of any way to explicitly free a blob, do you? If we are somehow retaining a reference to these blobs and not allowing them to be garbage collected, at least they'll be jpeg-encoded images, which means they're much, much smaller than the decoded images.

Jerry: how many images did you scan before you saw that 15.6mb of large files? Does the number go up linearly with the number of images scanned?  That will help us know if this is a memory leak or whether you just did an about:memory dump before gc had been performed.
(In reply to Tapas Kumar Kundu from comment #104)
> Hi jerry shih/David Flanagan,
> 
> can we have chat in IRC ? I want to debug this issue together on my device. 
> 
> Please let me know when you are free. We are seeing same issue <100 pic for
> higher resolution image (not attached in this bugid) . 
> 
> I think that we should debug together to conclude quickly.

Tapas: my IRC handle is djf, and I'm usually in #gaia. I'm on Pacific time. Jerry is in Taipei, so it will be difficult to find a time that works for both him and me. You can feel free to ping me, but I'm at home sick today and will probably be useless for debugging.

I'll spend some time re-reading the code that Jerry points to above to see if I can identify a memory leak that would cause the memory-file-data/large/file.

Maybe I'll modify the code so that it creates 480x800 previews even though my hamachi screen size is 320x480.  If I can reproduce your crash that way, then we'll know that this has more to do with the screen size than with the ICS/JB distinction.
I've modified gallery to always create 200x200 thumbnails and 480x800 previews, and was then able to process 1000 images without OOM.  (With the display set to never blank.) So maybe this does have more to do with the version of android than with the screen size.  My test did not include either of the memory saving patches attached to this bug.  I did it with the latest 1.3 nightly for my hamachi device.

So we can keep trying to chase memory leaks, but I fear that we won't be able to get to the bottom of this without a 256mb JB device to test on.  And I wonder if we need someone who knows about memory management at the OS level, rather than continuing to chase this at the Gaia level.
After killing and restarting the app, it scanned some more; it had run to completion without crashing previously, but somehow missed some images on the first scan. The second scan was short, but I think it did process some images. Killing it a second time and starting it a third time, it launched and did not scan any new images.

After the first run (when it scanned for 20+ minutes) the stable-state memory usage (reported by b2g-ps) was 135M.  On the third run, where it didn't scan at all, the stable-state memory usage was 91M.

So maybe I'll go back to trying to figure out if and why the visibility monitor is failing during the scanning.
Going back to Jerry's comments #101 and #103...

When we enumerate the database (and no scanning is involved) the thumbnail blobs are files on disk.

But when we scan, we're using in-memory files and we're holding on to those.  That might account for the large difference in memory usage for the scanning case vs the not scanning case.

When the gallery is notified of a new file, maybe it should query the mediadb to get the record out of the db so that the blob is on disk rather than in memory.  If that solves this problem, I should possibly modify mediadb to have an option to do that automatically.
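A rough sketch of that re-query idea (getFileInfo matches the name the eventual patch adds to MediaDB, but treat the exact signature, and the addThumbnail helper, as assumptions):

  // On a new-file notification, go back to the db so the thumbnail blob
  // we keep is backed by the on-disk file rather than by memory.
  photodb.getFileInfo(fileinfo.name, function(dbrecord) {
    // dbrecord.metadata.thumbnail is now a file-backed blob
    addThumbnail(dbrecord);
  });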
While scanning photos on my 1.3 ICS hamachi, I notice that every 20 to 30 seconds, all of the thumbnails go blank and revert to a gray background color.  If I watch the b2g-ps output every second, this happens when VSIZE gets up to about 220M, and after the blank, VSIZE drops down to 150M or so.

It seems clear that this is some sort of garbage collection going on. I wonder if on the QRD device the app is getting killed before that gc pass can complete.  Tapas: do you ever see those repeated flashes of gray screen while testing on your device?

Meanwhile, on this test run, I'm not seeing errors with the visibility monitor.  The onscreen callback was called for the first ~150 thumbnails, and then it stopped getting called.  (It is getting called twice for each thumbnail, which seems like an error, but harmless.)  My previous tests included photos in many different directories. This time I'm using a single directory, and scanning seems to be happening from most recent to least recent, so all thumbnails are being added at the bottom.  In the previous case we were probably doing insertions at the top, so onscreen had to be called more often. (There was a bug where offscreen was not being called, I suppose.)
Scanning has just completed without any OOM.  Of the 1000 photos on the sdcard, only 969 actually got scanned, which is concerning.

Memory use when scanning was complete was stable at 127mb. That is lower than the 135mb I saw before, but much higher than I'd expect for the case where no scanning took place.  But if each thumbnail blob was 7kb and there were 1000 of them, then I'd expect my patch, which uses file-backed thumbnail blobs instead of memory-backed blobs, to save only about 7mb.
Running the app again, I find that some or all of the remaining files are scanned, 157 thumbnails get their onscreen callback called (about as expected), and memory usage settles at 85mb.  But if I scroll even once, then memory goes up to ~120M and stays at that level.

The thumbnail blobs turn out to be 12kb each, so I would expect to see more memory savings if my patch to force file-backed blobs had worked.
This patch is my attempt to force the gallery to discard the memory-backed thumbnail blob it is passed during the scanning process and instead explicitly query the media db to get a file-backed blob.  With the test image we're using and 1000 200x200 thumbnails, this should result in saving something like 12mb.

The patch does slow down scanning. On my hamachi it now takes 1.4s for each image.  (It won't be this bad in practice assuming that EXIF preview images are big enough.)
Tapas,

The patch attached above seems to resolve the "big file" memory issue that Jerry noticed.  With this patch applied, I can scan 1000 images (creating big previews and thumbnails for them) without an OOM. And when I'm done, the gallery memory usage is stable at 103mb instead of at 127 or 135mb as I'd seen before. Looks like I'm getting substantially better memory savings from it than the 12mb I expected.

(Unfortunately, I think the tradeoff is significantly worse scanning time. There is probably a batching solution that could help with that, though.)

Please check whether the patch above makes any difference for you.

The patch also includes my code to force 200x200 thumbnails and 480x800 preview images.  Please confirm that your device is 480x800 and that devicePixelRatio is 1.  (That is, that 1 css pixel == 1 device pixel). If your devicePixelRatio is > 1 then your device uses even bigger images than I realized.

If you want to continue investigating the possibility of memory leaks in Gallery, let's first try to narrow things down.  We need to find out whether it is the thumbnail images or the preview images that are causing problems.  If you modify the "is it big enough" test in the previewsuccess function in MetadataParser.js you can force Gallery to just accept the embedded EXIF preview image as big enough and not try to generate its own.  This will make scanning much, much quicker.  It would be interesting to know if you crash in that case. 

Next, I'd try commenting out the thumbnailOnscreen callback in gallery.js.  This will prevent any thumbnails from being decoded and displayed.  If you still crash with that, then we know that it is not the memory consumed by decoding the thumbnails that is at fault.  But if you don't crash, then maybe you can fix things by just tuning the parameters passed to the VisibilityMonitor.

But again, the fact that the scan always completes without OOM on my 256mb hamachi, even when I simulate a big screen, makes me think that the problem is probably not in Gallery (although we've found room for improvement in Gallery) but in the lower levels of the OS. That is an area that I know nothing about and cannot help you with.
Flags: needinfo?(tkundu)
Jerry,

Could you also try my most recent patch to see if it solves the "big file" memory issue you identified?  I tested it but just looked at the b2g-ps output to assess memory impact.  Your analysis of an about:memory dump will be more authoritative.
Thanks, David!
The large memory-file-data entry is gone after the scanning process;
I no longer find it in the explicit memory section.
But the "vsize" is still huge. I will check how we collect this value.

---
1st
19.34 MB (100.0%) -- explicit
├───4.60 MB (23.78%) ++ window-objects/top
├───4.22 MB (21.81%) ++ images
├───3.01 MB (15.54%) ++ js-non-window
├───2.89 MB (14.92%) ── heap-unclassified
├───2.19 MB (11.33%) ++ heap-overhead
├───0.87 MB (04.48%) ── xpti-working-set
├───0.30 MB (01.58%) ++ (8 tiny)
├───0.29 MB (01.48%) ── freetype
├───0.26 MB (01.36%) ++ xpconnect
├───0.26 MB (01.35%) ── layout/style-sheet-cache
├───0.25 MB (01.30%) ++ xpcom
└───0.21 MB (01.07%) ── atom-tables

  0.00 MB ── canvas-2d-pixels
  2.92 MB ── gfx-surface-image
  0.00 MB ── gfx-textures
        0 ── ghost-windows
  0.00 MB ── gralloc
 14.92 MB ── heap-allocated
 17.12 MB ── heap-committed
   14.68% ── heap-overhead-ratio
    2,000 ── host-object-urls
  0.00 MB ── imagelib-surface-cache
  0.29 MB ── js-main-runtime-temporary-peak
        0 ── page-faults-hard
  239,903 ── page-faults-soft
 39.23 MB ── resident
 22.44 MB ── resident-unique
162.10 MB ── vsize


2nd
19.23 MB (100.0%) -- explicit
├───4.43 MB (23.06%) ++ window-objects/top
├───4.05 MB (21.04%) ++ images
├───2.94 MB (15.30%) ++ js-non-window
├───2.77 MB (14.39%) ── heap-unclassified
├───2.62 MB (13.63%) ++ heap-overhead
├───0.87 MB (04.51%) ── xpti-working-set
├───0.29 MB (01.50%) ++ (8 tiny)
├───0.28 MB (01.48%) ── freetype
├───0.26 MB (01.36%) ++ xpconnect
├───0.26 MB (01.36%) ── layout/style-sheet-cache
├───0.25 MB (01.30%) ++ xpcom
└───0.21 MB (01.08%) ── atom-tables

  2.76 MB ── gfx-surface-image
  0.00 MB ── gfx-textures
        0 ── ghost-windows
  0.00 MB ── gralloc
 14.00 MB ── heap-allocated
 16.62 MB ── heap-committed
   18.72% ── heap-overhead-ratio
    2,000 ── host-object-urls
  0.00 MB ── imagelib-surface-cache
  0.29 MB ── js-main-runtime-temporary-peak
        2 ── page-faults-hard
   19,999 ── page-faults-soft
 39.10 MB ── resident
 22.59 MB ── resident-unique
104.36 MB ── vsize
Here are another interesting parts.

9.92 MB (100.0%) -- blob-urls
└──9.92 MB (100.0%) ++ owner(app://gallery.gaiamobile.org/index.html)

    2,000 ── host-object-urls

1.
I decreased the visibility window size, but the blob-urls size did not change.
Maybe we should call revokeObjectURL() to purge blob objects that are no longer used (see the sketch below).
I will check when the blob-urls are created.
2.
We have 2,000 URL objects, the same as the total number of pictures.
If we only created URL objects for the pictures within the visibility window, maybe we could save some memory.
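For illustration, a sketch of what on-demand URLs might look like (hypothetical callbacks with an assumed per-element url property; not the current gallery.js code):

  function showThumbnail(thumbnail, blob) {
    thumbnail.url = URL.createObjectURL(blob);
    thumbnail.style.backgroundImage = 'url(' + thumbnail.url + ')';
  }

  function hideThumbnail(thumbnail) {
    thumbnail.style.backgroundImage = null;
    URL.revokeObjectURL(thumbnail.url); // releases the blob reference
    thumbnail.url = null;
  }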
Hi Tapas and David,
You can try the about:memory tool to dump the memory usage.
It's easy to use.

In the B2G/tools folder you will find get_about_memory.py.
Just run get_about_memory.py; it will connect to the device and pull the memory report back.

You can also run:
./get_about_memory.py -m
to make it trigger a GC before dumping the memory report.
(In reply to Tapas Kumar Kundu from comment #104)
> Hi jerry shih/David Flanagan,
> 
> can we have chat in IRC ? I want to debug this issue together on my device. 
> 
> Please let me know when you are free. We are seeing same issue <100 pic for
> higher resolution image (not attached in this bugid) . 
> 
> I think that we should debug together to conclude quickly.

My nickname is Jerry or Jerry_ in IRC. I also join the #gaia.
Taipei is GMT+8, so it's not easy to find a time that works for both David and me.

I suggest finding the maximum number of high-resolution images that doesn't trigger an OOM,
then using get_about_memory.py to dump the memory report.
Maybe there is a different problem in the high-resolution image case.
Flags: needinfo?(hshih)
I don't have time to read this whole bug, but if you need help figuring out what is going on on the Gecko side, I am in Taipei this week.  It looks like Jerry is on to the cause in comment 119, though.  If I can help, please come find me, or have Jerry come find me.
(In reply to David Flanagan [:djf] from comment #112)
> It seems clear that this is some sort of garbage collection going on.

That's probably caused by the memory pressure monitor. When we detect that available memory drops below a certain threshold we dispatch 'memory-pressure' events to all applications and all subsystems. This not only runs the GC/CC but also flushes caches and other temporary structures. The flash might be caused by this mechanism purging the image cache.

Note that this mechanism is device-dependent; the unagi devices didn't have it, for example, which led to more frequent OOMs when a single application took lots of memory for itself. AFAIK all other devices should have it, but if you want to verify, check with adb whether the following path exists:

/sys/module/lowmemorykiller/parameters/notify_trigger
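For example, something like this from a host shell (a hedged example; exact behavior varies by kernel):

  adb shell cat /sys/module/lowmemorykiller/parameters/notify_trigger

It should print the trigger value if the mechanism is present, and an error like "No such file or directory" otherwise.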
Gabriele: thanks for this useful information.  Tapas: does  your device have the notify_trigger patch? (And do you see the gray flash during scanning when the thumbnails disappear and then come back?)

Kyle: thank you for your offer to help. I'd guess that Jerry would love some help with this.

Jerry: Yes, one blob url per thumbnail, held for the lifetime of the app. I was assured (by bent, probably) that blob urls were inexpensive.  So that requirement is pretty fundamental to the current architecture. Not something we can change in 1.3. In any case, I think we're spending too much time at the Gaia level looking for memory savings.  We've found some good ones. But remember that the app runs fine on every other device we can test it on. Tapas doesn't see the crash on ICS-based devices. And I don't see the crash even when I try to simulate a 480x800 device on my hamachi. I suspect that we now need to be looking at OS-level memory management stuff.  I have no idea how to do that, which is why I unassigned myself from this bug.

In comment #71, Tapas asks whether we should adjust the LMK parameters.  I don't know anything about that, but maybe we should. Or we should at least find out if they are different on Tapas's device than they are on other devices.  Gabriele: do you know about the LMK? If so (and if Jerry does not), then perhaps you are a better person to work on this bug than Jerry is.

The tricky thing about this bug is that the crash only occurs on a qualcomm reference device that no one at mozilla has.  There are apparently some similar QRD devices available at mozilla, but they have more memory than the one that crashes.

The device that crashes has 256mb of memory, a 480x800 screen, and runs JB.  I've tried making the gallery app think it has a 480x800 screen so it uses up more memory, but I cannot reproduce the OOM on my 256mb hamachi.  (Note that the hamachi will OOM if the screen goes to sleep during the scanning, but that is not the issue here.)

In any case, I think the ball is in QC's court right now. We need Tapas to try out the latest patch (Tapas: Jerry's combination patch in comment #118 is probably a good one to try). And we want to know whether his device has the notify_trigger patch and whether he sees the thumbnails going blank and then coming back during the scanning process when memory gets low.

It does not make sense to have 4 Mozilla engineers (Jerry, Kyle, Gabriele and myself) spending time on a bug when none of us has access to the device that exhibits the crash.  But when we hear again from Tapas, perhaps we can figure out which of us is best qualified to continue working with him on it.
Flags: needinfo?(khuey)
Flags: needinfo?(hshih)
Flags: needinfo?(gsvelto)
Using a blob URL will keep the blob alive until it is revoked.  It's cheaper than a data: URI certainly, but it's not cheap.  As long as your blobs are all backed by the disk it's not a big deal (each blob object takes up a few tens of bytes), but if they're in memory I would expect that to be a pretty big problem.
Sorry for the delay. It looks like Jerry's combination patch in comment #118 is a big improvement. We also have NUWA enabled in the latest FFOS 1.3, which saves more memory now. I am still running scripts to collect logs; I will update here soon. Thanks a lot for your work.
Flags: needinfo?(tkundu)
Flags: needinfo?(tkundu)
(In reply to David Flanagan [:djf] from comment #106)
> (In reply to peter chang[:pchang][:peter] from comment #102)
> 
> > David, about memory of "canvas-2d-pixels" above are created by too many
> > local canvas objects[1] for thumbnail creation.
> > Since the width/height of the thumbnail canvas are the same[2], could we use
> > only one canvas object to generate all the thumbnail. I think it will help
> > to reduce the total memory usage during first launch and we don't need to
> > wait GC to cleanup the unused canvas objects.
> > 
> 
> Peter: reusing that canvas is an interesting idea.  Note, however, that
> Jerry's patch attached to this bug solves the same problem by explicitly
> setting the canvas size to 0 after use.  Reusing the cavans might be more
> efficient, but presumably won't save us any more memory than we've already
> saved with Jerry's patch, so I don't think that will help us.

David, I agree that reusing the canvas has memory usage similar to zeroing the canvas size. But I think reusing the canvas could also save the buffer allocations inside gecko, which involve some IPC calls.
(In reply to Jerry Shih[:jerry] from comment #119)
> Here are another interesting parts.
> 
> 9.92 MB (100.0%) -- blob-urls
> └──9.92 MB (100.0%) ++ owner(app://gallery.gaiamobile.org/index.html)
> 
>     2,000 ── host-object-urls
> 

I'm sure that we load all the thumbnails into these blob-urls (both on 1st and 2nd launch).
We can unroll the blob-urls item to see the details.

If I have 2000 images, I can see 2000 blob-urls entries in the memory report. The total blob-urls size becomes large when we have a lot of thumbnails.

9.92 MB (100.0%) -- blob-urls
└──9.92 MB (100.0%) ++ owner(app://gallery.gaiamobile.org/index.html)
   ├──0.00 MB (00.05%) ── blob:0027984b-4209-4198-9cc6-b1aa3954c2d5   #1
   ├──0.00 MB (00.05%) ── blob:0070362d-3c69-4762-91a9-0c1d2424db60
   ...............
   ├──0.00 MB (00.05%) ── blob:03f49825-b5ee-4e6f-9b9b-f686e2ea0373   #2000



(In reply to David Flanagan [:djf] from comment #124)
> Jerry: Yes, one blob url per thumbnail, held for the lifetime of the app. I
> was assured (by bent, probably) that blob urls were inexpensive.  So that
> requirement is pretty fundamental to the current architecture. Not something
> we can change in 1.3. In any case, I think we're spending too much time at
> the Gaia level looking for memory savings.  We've found some good ones. But
> remember that the app runs find on every other device we can test it on.
> Tapas doesn't see the crash on ICS-based devices. And I don't see the crash
> even when I try to simulate a 480x800 device on my hamachi. I suspect that
> we now need to be looking at OS-level memory management stuff.  I have no
> idea how to do that, which is why I unassigned myself from this bug. 

I have no idea about the differences in OS-level memory management between JB and ICS-based platforms.
I can just help find a better way for gaia to improve its memory usage.
Flags: needinfo?(hshih) → needinfo?(dflanagan)
Comment on attachment 8372737 [details] [diff] [review]
mix willReadFrequently, resize canvas and file-base blob

Review of attachment 8372737 [details] [diff] [review]:
-----------------------------------------------------------------

This patch helps to load 2000 images and 100 videos properly. I can also put the gallery app in the background during scanning.
It looks like it saves a good amount of memory. Please also note that NUWA has landed in the latest FFOS 1.3 build, which I tested; NUWA saves some additional memory as well.
But I noticed that the gallery app shows a blank screen if I scroll. The blank screen appears only in the 2000-image case; I tested with 100 images and scrolling was fine.
Please let me know if you are still interested in memory reports/procrank logs for this use case.
Attachment #8372737 - Flags: review+
Attachment #8372737 - Flags: feedback+
Flags: needinfo?(tkundu)
Tapas: What about larger image sizes? You said earlier that larger images caused an almost immediate crash. You've been testing with 2mp images, but I'm assuming that your camera app will be configured to produce 5mp images. Do those work with this patch applied?

Also, as you've pointed out, this bug suggests significant memory management differences between ICS and JB builds.  The Gallery app is the most memory hungry of the core Gaia apps, and in the past it has pushed us to find and fix memory management bugs.  I'm reluctant to close this bug based only on gaia level fixes, because I think we've only masked an underlying issue.  Both Jerry and I have worked on this bug only at the Gaia level. Do you understand what is going on at the LMK and notify_trigger level? Or do you have a colleague who can investigate this bug at that level?  Even if we land this patch and close this bug, I suggest you open a new one to get to the bottom of the underlying issue.

Jerry: before we land your combination patch, we should measure the performance impact on scanning time. The piece I did to ensure that scanning produces file-backed blobs instead of memory-backed blobs probably slows down the first scan time enough to be considered a regression. I may be able to optimize that with some batching. Tapas gave an unsolicited r+ to the patch, but please don't land until we know more about performance.
Flags: needinfo?(dflanagan)
Flags: needinfo?(tkundu)
Flags: needinfo?(hshih)
Comment on attachment 8372737 [details] [diff] [review]
mix willReadFrequently, resize canvas and file-base blob

Clearing the r+ on this patch because we need to check on performance and clean up the code before landing it.
Attachment #8372737 - Flags: review+ → review-
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #125)
> Using a blob URL will keep the blob alive until it is revoked.  It's cheaper
> than a data: URI certainly, but it's not cheap.  As long as your blobs are
> all backed by the disk it's not a big deal (each blob object takes up a few
> tens of bytes), but if they're in memory I would expect that to be a pretty
> big problem.

Kyle: I've patched the bug that was causing them to be in-memory blobs. But Jerry's memory dumps are still showing that 2000 blob urls take up 10mb of memory.  See comment #119.

That is not something we're going to fix here in a 1.3 blocker, but I wonder if it is something worth filing a followup bug about. To either reduce memory usage in Gecko or modify the app so that it doesn't hold so many blob urls.
With no patches applied, my hamachi running 1.3, can scan 200 of the sample photos in 207 seconds. 1.035 seconds per photo.

When I apply the willReadFrequently patch, I can scan 200 photos in 182 seconds or 0.906 seconds per photo.
That's a pleasant surprise: willReadFrequently forces use of a software canvas instead of hardware, so I assumed it saved memory at the expense of time, but I guess it saves us time in this case, too.

Applying the next patch to set the canvas size to 0 after calling toBlob() makes basically no change in time: .905 seconds per photo for 200 photos.

I'll create a rollup patch with all of these fixes applied.  But I still want to investigate whether we need to keep the screen on during scanning, or whether there is some other way to prevent the app from going to the background.

Modifying that patch so that it uses a single shared canvas reduces the time to .897s per photo. This comes at the cost of a fixed size canvas that is never freed.  On the target 480x800 device, I think this will be 160kb of memory (for a 200x200 thumbnail).  That's < 1% improvement in time, so I'm going to go with the simpler patch that allocates and discards a fresh canvas for each thumbnail.

When I add the patch that ensures that we get a file-backed thumbnail blob instead of a memory backed blob, the time per photo goes up only to 0.922s.  This is much, much better than I expected, and is faster than what we started with. 

Tweaking the mediadb batch timeout to 1500ms gets the scan time down to about .88s per photo, but this allows a full 15 photos per batch, which makes the scan seem very unresponsive.  A 500ms timeout gives us batches of 1, and 1000ms often gives us batches that are too big.  If I force batches of 3 (as sketched below), I get an average time of .882s per photo for 201 photos.  That saves 40ms per photo and seems worth doing.
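For illustration, this is the kind of knob being tuned (the option names follow MediaDB's batching options, but treat them as assumptions and check mediadb.js for the real ones):

  // Hold db notifications briefly and cap the batch size so the UI gets
  // small, frequent updates instead of big, laggy ones.
  var photodb = new MediaDB('pictures', metadataParserWrapper, {
    batchHoldTime: 350, // ms to collect notifications into one batch
    batchSize: 3        // flush a batch once it reaches 3 files
  });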
(In reply to David Flanagan [:djf] from comment #130)
> Tapas: What about with larger image sizes? You said earlier that larger
> images caused an almost immediate crash. You've been testing with 2mp
> images, but I'm assuming that your camera app will be configured to produce
> 5mp images. Do those work this this patch applied?

Yes it works fine.

> 
> Also, as you've pointed out, this bug suggests significant memory management
> differences between ICS and JB builds. 
Yes, JB has ZRAM, KSM, etc.

> Do you
> understand what is going on at the LMK and notify_trigger level? 

It does not seem to me that it's a good idea to allow an app to consume >100MB of memory on a 256MB device. I also don't like the idea of tweaking LMK values for only one app ('gallery'). Still, I will look into it and update here asap.

> 
> Jerry: before we land your combination patch, we should measure the
> performance impact on scanning time.

Please also check why we are seeing a blank screen during scrolling with 2000 images with this patch. I can give you access to my device if you want. Please let me know.

Thanks a lot.
(In reply to Tapas Kumar Kundu from comment #134)

> 
> > Do you
> > understand what is going on at the LMK and notify_trigger level? 
> 
> It does not seem to me that its a good idea to allow an app to consume
> >100MB memory on 256MB device. 

Wait, are you saying that your device is configured to kill an app that exceeds 100MB? If so, I assume that is the underlying issue right there.  Is the notify trigger stuff set up to send memory pressure notifications when this artificial 100mb limit is neared?  Maybe that is what was going wrong: you're setting a memory limit that is defeating the memory pressure notifications?

> I also don't like the idea of tweaking LMK
> values for only one app 'gallery'. Still I will look in it and update asap
> here.

Do we have gallery-specific LMK tweaks on ICS builds? I honestly know nothing at all about this.

> 
> Please also check why we are seeing blank screen during scroll for 2000
> images with this patch. 

You would have seen that without this patch had you been able to scan the images. Our visibility monitor code does not set the backgroundImage of thumbnails that are far off screen. Then it sets backgroundImage as they get closer to scrolling onscreen.  It is a time/memory tradeoff.  If you keep more thumbnails swapped in, it takes more memory but you see fewer blank squares.  You've got a bigger screen on this device so your device has to work harder to load and decode the thumbnails when they're needed.  

There are parameters you can tweak if you want.  Look for the visibility monitor in gallery.js. You can change one of the parameters to swap images in more responsively, at the cost of jankier scrolling; see the sketch below.  Currently we're tuned for 320x480 devices, so you'll possibly want to adjust those values.
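For illustration only, the setup being tuned looks something like this (the function and parameter names are reconstructed from memory and should be treated as assumptions; check gallery.js and the shared visibility monitor code for the real ones):

  // A larger margin keeps more offscreen thumbnails decoded: more memory,
  // fewer gray squares. A smaller margin saves memory but shows more
  // blanks during fast scrolling.
  var visibilityMonitor = monitorChildVisibility(
    thumbnails,          // container of the thumbnail elements
    360,                 // margin in pixels above/below the visible area
    thumbnailOnscreen,   // decode and display a thumbnail
    thumbnailOffscreen); // release its decoded image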

But this is completely unrelated to this bug. You can file a new bug for it if you want, but I don't think it is something we can block on, because there's really nothing we can do: it is a fundamental memory vs. time tradeoff.
Attached file patch for the 1.3 branch (deleted) —
Jerry,

This patch is a cleaned up version of the patch you attached. Note that most of the changes in gallery.js are just an indentation level change.

If you are not comfortable reviewing the mediadb.js changes, please ask dkuo if he can do that part.

Note that with these patches applied, I was no longer seeing the issue of the gallery being killed when the screen had blanked, so I have not included the patch to force the screen to stay on.

I'll produce a separate version of the patch for the master branch.
Assignee: hshih → dflanagan
Attachment #8368974 - Attachment is obsolete: true
Attachment #8369887 - Attachment is obsolete: true
Attachment #8371256 - Attachment is obsolete: true
Attachment #8372692 - Attachment is obsolete: true
Attachment #8372737 - Attachment is obsolete: true
Attachment #8373686 - Flags: review?(hshih)
Attached file patch for master (deleted) —
Jerry: this is the same patch as the other one, but with a couple of merge conflicts manually resolved.
Attachment #8373696 - Flags: review?(hshih)
>> Is the notify trigger stuff set up to send memory pressure notifications when this artificial 100mb limit is neared?  Maybe that is what was going wrong

It is not the case. LMK sees only available memory, and it will kill any app based on priority. LMK does not seem to be an issue for any other apps. I will file a separate bug if I see a problem with this later.

>>Look for visibility monitor in gallery.js. You can change one of the parameters to swap images in more responsively in a tradeoff for jankier scrolling.  Currently we're tuned for 320x480 devices, so possibly you'll want to adjust those values.

Thanks for helping. I will file a separate bug if needed.

(In reply to David Flanagan [:djf] from comment #130)
> Jerry: before we land your combination patch, we should measure the
> performance impact on scanning time. The piece I did to ensure that scanning
> produces file-backed blobs instead of memory-backed blobs probably slows
> down the first scan time enough to be considered a regression. I may be able
> to optimize that with some batching.

@mvikram: please suggest
Flags: needinfo?(tkundu) → needinfo?(mvikram)
Tapas, please run a test to determine if the first time scanning of the images has regressed. Let's decide on next steps after.
Flags: needinfo?(mvikram)
Hi all,

At Jerry's request, I changed the code to revoke the URLs for memory testing. A WIP can be found here (without David's patch):

https://github.com/huchengtw-moz/gaia/commit/e8e60896bb4f9ad055a9875a934305daca9d985d

The following report was generated with both this WIP and David's patch applied: the blob-urls count becomes 187 and the blob-urls memory size drops. That means the blob's buffer is freed when we revoke the URL, and the buffer is loaded back into memory when we recreate the URL and set it as the background image.

Note: this WIP patch breaks some of the gallery app's unit tests and regression tests.

0.68 MB (100.0%) -- blob-urls
└──0.68 MB (100.0%) -- owner(app://gallery.gaiamobile.org/index.html)
   ├──0.00 MB (00.55%) ── blob:02c41822-f958-44bb-adfb-84b792909deb (1)
   ├──0.00 MB (00.55%) ── blob:03a03b34-322b-4bcc-88d1-c16385535c75 (2)
   ├──0.00 MB (00.55%) ── blob:03f8edae-8894-4aad-8afe-5ea395b7a62c (3)
   ...
   ├──0.00 MB (00.55%) ── blob:09a9f20f-1efe-4f81-b728-b4d53bef5284 (187)


 0.00 MB ── canvas-2d-pixels
 0.00 MB ── gfx-surface-image
 0.00 MB ── gfx-textures
       0 ── ghost-windows
 0.00 MB ── gralloc
 9.21 MB ── heap-allocated
10.34 MB ── heap-committed
  12.25% ── heap-overhead-ratio
     187 ── host-object-urls ######
 0.00 MB ── imagelib-surface-cache
 0.29 MB ── js-main-runtime-temporary-peak
       1 ── page-faults-hard
 539,809 ── page-faults-soft
31.93 MB ── resident
15.21 MB ── resident-unique
96.86 MB ── vsize
Tapas: see comment 133. I did the timing tests, and it turns out that this patch actually improves scanning speed as well as reducing memory usage.
(In reply to David Flanagan [:djf] from comment #133)
> When I apply the willReadFrequently patch, I can scan 200 photos in 182
> seconds or 0.906 seconds per photo.
> That's a pleasant surprise: willReadFrequently forces use of a software
> canvas instead of hardware, so I assumed it saved memory at the expense of
> time, but I guess it saves us time in this case, too.

With SkiaGL, read operations may spend a lot of time loading data back from the GPU (from a GL texture).
The GPU is good for rendering, but not necessarily good for reading data back.
Flags: needinfo?(hshih)
Comment on attachment 8373686 [details]
patch for the 1.3 branch

Hi Dominic,
Could you review this patch?
I'm not a js expert.
Thanks!
Attachment #8373686 - Flags: review?(hshih) → review?(dkuo)
Attachment #8373696 - Flags: review?(hshih) → review?(dkuo)
John and Jerry,

Please file a separate bug for investigation of the blob urls.  I feel that fixing that is too risky for a 1.3 patch.

John: what do you mean when you say "blob's buffer"?  These are file-backed blobs, not memory-backed blobs, and creating a URL that references them should not be loading them into memory.  Do you have any idea why each blob URL takes 3.4kb?  Is that actually the size of the thumbnail blobs? Gecko engineers tell me that creating a blob URL is cheap and that referencing a file backed blob with a blob url will not bring it into memory.  If so, why are they so expensive here?  I wonder if there is a gecko bug here.

Also, before we go too far down the road of eliminating the blob urls, I also want to consider the possibility of eliminating the visibility monitor and converting the thumbnails to <img> elements, because I've been told that gecko will now do the image decoding and freeing based on visibility for us.  But if we did that, then we would have to have blob urls for each thumbnail.  So I don't want to eliminate them right away before investigating why they are not cheaper.

Please don't reply here, though. Let's start a new bug for this.
John: I'd be happy with your review, too, if you're free and Dominic is not, especially since you've already started looking at this.
Move the blob-url object problem to bug 970801
(In reply to Tapas Kumar Kundu from comment #134)
> It does not seem to me that its a good idea to allow an app to consume
> >100MB memory on 256MB device.

Actually we put a great deal of work into the v1-train timeframe ensuring that a large, foreground application could stay alive on 256MB devices, and I don't really see any downsides to this. So I don't think we should adjust LMK parameters in this case, and certainly not to catch memory regressions: I'd be far more keen on adding additional testing (maybe marionette-based?) that reproduces known high-memory-usage scenarios and automatically extracts memory profiles to ensure we're not regressing.
Flags: needinfo?(gsvelto)
Target Milestone: --- → 1.4 S1 (14feb)
(In reply to Jerry Shih[:jerry] from comment #143)
> Comment on attachment 8373686 [details]
> patch for the 1.3 branch
> 
> Hi Dominic,
> Could you review this patch?
> I'm not a js expert.
> Thanks!

No problem. I just finished reading all the comments, so now I know what's going on here; I will continue to review and test the patches later.
Comment on attachment 8373696 [details]
patch for master

David,

I was not involved in this bug thread at the beginning, but hopefully I am on the right track and reviewing your patch correctly. In the patch I see four major changes:

1. The |willReadFrequently| flag of canvas.getContext() is set to true, which, as Gabriele explained in comment 97, saves memory, and per your test results it saves time as well.
2. Set the width and height of the canvas to 0, then drop the context and canvas, to force the memory to be released after we have created the blobs. This is what Jerry found in comment 83 and fixed in comment 92.
3. Add getFileInfo() to MediaDB so that MediaDB can return file-backed blobs instead of memory-backed blobs when adding the thumbnail to the DOM tree. This is because you found that memory-backed blobs cost too much memory; we should use file-backed blobs from the disk.
4. Adjust the batch size and timeout to save time parsing each image. This is because the flow of parsing images changed a little, so gallery has to adapt by tweaking the batch size and timeout.

My configuration is different from everyone else's: it's an inari, but that should be similar to a hamachi. After applying David's patch, I did see the memory usage drop (using b2g-procrank) at first launch, though I didn't get a crash either before or after applying the patch, probably because I only tested a few times.

So if my understanding is correct then this patch looks good to me! David, please correct me if I am wrong with the description above, thanks. And everyone rocks!
Attachment #8373696 - Flags: review?(dkuo) → review+
Comment on attachment 8373686 [details]
patch for the 1.3 branch

The patch for 1.3 is also r+ because it is the same patch with the merge conflicts resolved.
Attachment #8373686 - Flags: review?(dkuo) → review+
Tapas, please try out this patch and verify if it solves the issue on our end.
Flags: needinfo?(tkundu)
Landed on master: https://github.com/mozilla-b2g/gaia/commit/fdc7a7e422470492e2ea107f8d48651bb5d4ebf2
Landed on 1.3: https://github.com/mozilla-b2g/gaia/commit/b1a249ba6e9db3c1d0f91530e3c3a91ce9ee091f

These are solid memory wins for the gallery app, so I'm going to resolve this bug.  Tapas, if you find that this does not resolve the issue for you, let's open a new bug.
Status: NEW → UNCONFIRMED
Ever confirmed: false
Set the status wrong. Marking resolved now.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
We almost certainly want to be sure this gets on the 1.3T branch since it is memory-related. I don't know who handles those uplifts. I would guess that the 1.3 patch would uplift cleanly if anyone wants it.
Flags: needinfo?(dhuseby)
Flags: needinfo?(tkundu)
(In reply to Inder from comment #151)
> Tapas, please try out this patch and verify if it solves the issue on our
> end.

works as expected
Joe, see David's comment 154 re: 1.3T inclusion.
Flags: needinfo?(mlee) → needinfo?(jcheng)
Thanks
1.3T? to get this in Tarako triage
blocking-b2g: 1.3+ → 1.3T?
Flags: needinfo?(jcheng)
(In reply to Joe Cheng [:jcheng] from comment #157)
> Thanks
> 1.3T? to get this in Tarako triage

This is a CS blocker - this stays as a 1.3+. You need to request a tree manager to uplift this, not flip the blocking flag.
blocking-b2g: 1.3T? → 1.3+
No worries, we merge 1.3 to 1.3t daily.
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap+
Whiteboard: [c=memory p= s= u=1.3] [CR 611568][ETA: n/a] → [caf priority: p2][c=memory p= s= u=1.3] [CR 611568][ETA: n/a]