Closed Bug 847060 Opened 12 years ago Closed 12 years ago

[Buri][Gallery]When try to open some big size photo, gallery is force closed

Categories

(Firefox OS Graveyard :: Gaia::Gallery, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:tef+, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

VERIFIED FIXED
blocking-b2g tef+
Tracking Status
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: sync-1, Assigned: djf)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][triaged])

Attachments

(8 files, 2 obsolete files)

Firefox v1.0.1, build identifier: 20130228114642 +++ This bug was initially created as a clone of Bug #413789 +++ DEFECT DESCRIPTION: (1)Go to Gallery, there are a lot of pictures(hundreds) (2)Scanning and Thunbmails display are all OK (3)But when try to open some big size photos, the Gallery APP will quit(Force closed) --- KO And when back to homescreen, you will find the homescreen was killed before and relaunched. REPRODUCING PROCEDURES: EXPECTED BEHAVIOUR: Maybe it is due to Lack of RAM. But, at least, if the full size photo can not be displayed, please just display some error information to end user , such as "Size is too large to open..." , but APP force close is not acceptable. ASSOCIATE SPECIFICATION: TEST PLAN REFERENCE: TOOLS AND PLATFORMS USED: USER IMPACT: REPRODUCING RATE: 100% For FT PR, Please list reference mobile's behavior: ++++++++++ end of initial bug #413789 description ++++++++++
partner will provide sample photos
Created an attachment (id=356845) The photo sample will cause Gallery force close
Created an attachment (id=356845) The photo sample will cause Gallery force close
Here in my mobile, if you try to open this photo from thumbnail view, the Gallery will quit 100%. But I am not sure , if only this picture alone can reproduce the problem.As I have nearly 600 pictures on my SD card, the Gallery seems always in scanning mode.
blocking-b2g: --- → tef?
Michael, qawanted should have a comment attached saying what's wanted. Can you clarify please?
Flags: needinfo?(mvines)
I am not able to reproduce a crash using: Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/a56593f84577 Gaia 69cfb9f7d00e5ee7886e40645240686091b6b57e BuildID 20130311095652 Version 18.0 I enabled USB mass storage and copied the picture over to my SD card, which doesn't have quite as many pictures as the reporter is mentioning in Comment 4. If the app crashes consistently, is there a crash report in the b2g/mozilla directory?
Sorry, qawanted to attempt to duplicate this issue. Looks like Marcia is not having any luck. Any hints sync-1@bugzilla.tld?
Flags: needinfo?(mvines) → needinfo?(sync-1)
Yes , here the gallery will crash consistently, But I am not sure how the crash report looks like in the b2g/mozilla directory? I didnt
Flags: needinfo?(sync-1)
Yes , here the gallery will crash consistently, But I am not sure how the crash report looks like in the b2g/mozilla directory? I didn't find anything looks like a crash report in that directory. Shall I activate any menu to enable crash report? And also , I can not find any tombstone dumps. And further information, now on my mobile, the attached picture will not cause gallery crash 100%, but, when you scrolling the picuture one by one, just after switching 3-4 pictures, gallery will crash 100%. And at that time, the RAM occuppied by Gallery already raise up to above 70M, which already kill homescreen and pre-lauched app. So it is not really related to one dedicated picture, it may be related to the whole pictures on my SD card, who make the Gallery occupy too much RAM, then it is killed by low memory killer.
You can check for minidumps by looking for .dmp files in /data/b2g/mozilla/Crash Reports/pending/ on device, but if the LMK is killing the gallery app then there will be no .dmp file (instead check |dmesg| for LMK activity). Would it be possible to debug your SD card contents a little so try to see exactly what on the SD card is causing this issue?
Attached file 5 photos to reproduce the Gallery crashing (obsolete) (deleted) —
Please just copy the 5 photos to your SD card. And try to use Gallery, click any picture to open detail picture view, then try to switch photos by sliding , just need to switch 3 times, the Gallery will be killed. And also it is very strange, seems it is killed by OOM, but not LMK. But with my SD card, sometimes, Gallery will be killded by LMK, sometimes, by OOM. Wish you can reproduce it in your side.
Thanks! qawanted on ^^^ pretty please.
I'll take this one. Will get you an answer presently.
QA Contact: jhammink → gmealer
Can someone clarify if this is specific to Buri? If so, we don't have a build that's really valid for testing yet.
Phil has a device, so I've asked him to try to dup reproduce here
Flags: needinfo?(pgravel)
OK. I'll keep qawanted live and monitored until I see it's handled. Thanks!
(tef+ cause Phil did duplicate this today on the Buri) Phil, please do post your findings here as soon as possible
blocking-b2g: tef? → tef+
OK, cool. Looks handled, removing qawanted. Replace if you need something else.
Keywords: qawanted
Since this is blocked on Phil, assigning to him so that it's not assigned to 'nobody' - when it's ready for action you can re-assign or put back to 'nobody' so it turns up on our triage plate for bugs needing resources.
Assignee: nobody → pgravel
This seems to be an issue with large photos rather than number of pictures stored on the card. I have been able to easily "crash" the gallery app by simply adding 10 copies of the picture provided and very quickly scrolling through them (quickly enough that the image doesn't have time to decode and show on screen). Usually by the time I get to the 5 or 6th picture, we get a low-memory memory-pressure event and gecko ends up killing the gallery app. It may be worth investigating how the gallery app handles low-memory scenarios. Additionally it may be a good idea to look at doing partial jpeg decoding to reduce the amount of memory used by larger pictures. Note: this not limited to buri v1, easily reproducible on other devices with v1 and up.
Assignee: pgravel → nobody
Flags: needinfo?(pgravel)
Assignee: nobody → dflanagan
(In reply to buri.blff from comment #11) > Created attachment 723891 [details] > 5 photos to reproduce the Gallery crashing > I can't open the attachment. My unrar utility says it isn't a valid rar archive. Zip or tar or 5 individual attachments would be easier.
The gallery app refuses to display images if they are larger than 5 megapixels or if the file is larger than 3megabytes to avoid this kind of problem. The first attached image is just under 5 megapixels, so about the largest image we can display. If it doesn't have an EXIF preview image, then we have to decode the entire thing, which takes 20mb of memory. Gallery keeps 3 images decode at a time, so three copies of this image would require 60mb. That shouldn't be enough to crash yet, though. I think this is a gecko regresssion. Something has probably changed in the way gecko releases image memory when images are no longer used. I'll investigate a bit, but won't be able to do much unless I can reproduce it on my unagi device. Also, I'll probably need assistance from the gekco team. So cc'ing jlebar. Justin: has anything changed in the way we release image memory?
Keywords: regression
Blocks: 848916
I can reproduce this on unagi by pushing a few copies of the first attached image to the sdcard and then panning back and forth between them in the gallery. After 6 or 8 times, the gallery crashes. I really do think this is a regression in gecko. But if we can't get it sorted out there, we may have to further restrict the size of images that we'll display in the gallery. Or maybe create preview images on the sdcard for images that do not have previews of their own? (That, however, would be a complicated change to make at the last minute like this.)
Attached file 5 photos to reproduce the Gallery crashing in ZIP (obsolete) (deleted) —
It is strange, seems the bugzilla has problem. We always found the RAR files uploaded will be corrupted then, while in fact, my original rar fire is OK. Try ZIP format this time.
Attached image photo1 (deleted) —
Zip is not working either. :(
Attached image photo2 (deleted) —
Attached image photo3 (deleted) —
Attached image photo4 (deleted) —
Attached image photo5 (deleted) —
> Justin: has anything changed in the way we release image memory? What branch are you testing on? It may be easiest to do |git log image| and look through yourself; that would be a pretty good approximation. There's an assumption here that apps always have at least 60mb of memory available to them. That is not necessarily true (e.g. bug 850893).
Justin, I wonder if this is a consequence of turning image locking on to fix bug 807143? I've only just now read through that bug and found out how it was resolved, so maybe the gallery just hasn't been exercised in this way (extra large images without previews) since that change happened. I'll have to experiment with the setting and see if. Though I imagine it will be too risky to change the setting for v1.0.1 even if it would fix this bug.
> I wonder if this is a consequence of turning image locking on to fix bug 807143? That change was made four months ago, so I kind of doubt it's the cause of the "regression" here. > Though I imagine it will be too risky to change the setting for v1.0.1 even if it would > fix this bug. Yes. Image-locking disabled is not a supported configuration.
Keywords: crash
I can reproduce the crash with the 5 attached photos. I have to pan back and forth among them for a while, but I get the crash.
I get the crash even if I change the image locking pref.
And I get the crash even if I change image.mem.min_discard_timeout_ms from 10,000 to 1,000.
Each of the 5 attached images are just about 5mp, which means they require 20mb of image memory when decoded. (They have embedded preview images, but too small for us to use, so we're decoding the entire thing). Using adb shell b2g-ps, I see that the gallery app starts off using 70mb. When I tap the first thumbnail, it displays the first image, and pre-loads the second image (to be ready in case the user swipes). That requires 40mb, and, as expected, memory usage goes up to 110mb. If I swipe sideways, the second image is displayed, the 1st image is maintained off screen, and the 3rd image is loaded too, and memory usage goes up to 130mb. That's all as expected. Now if I swipe again, the 3rd image is displayed, the 2nd and 4th are held offscreen, and the 1st image is released. Memory usage goes up to 150mb because gecko does not release the memory for the first image right away. After about 30 seconds I see memory go back down to 130mb. If I swipe one or two more times before This is all as expected, except that in the past, the memory for images we were no longer displaying was released within a couple of seconds, not 30 seconds.
If I install a unagi nightly build based on m-c instead of b2g-18, then there is no 30 second delay. Memory usage goes up to 130mb when there are three images decoded, but when I swipe sideways to bring in a new image, the memory usage stays at 130mb. Or, if it spikes upward, the spike lasts just a second or so. With m-c it is much, much harder to crash the gallery app. I can still do it, but only if I zoom in on one of the photos. Just swiping among the 5 5mp images will not cause a crash. (I don't know why zooming should affect anything... nothing is being decoded.) But anyway, there is some difference between m-c and b2g-18 that we need to figure out.
Needinfo jlebar Justin, would you read over comments 33 to 37 and see if it rings any bells for you? Can you think of anything that has changed between m-c and b2g18 that might cause this issue?
Flags: needinfo?(justin.lebar+bug)
Asking Joe for info too. Joe: same question that I asked Justin above... why isn't image memory being freed up in b2g18 the way it is in m-c?
Flags: needinfo?(justin.lebar+bug) → needinfo?(joe)
A lot has changed with regard to how gecko handles images in mozilla-central; Firefox on desktop, for example, now discards images that aren't being displayed. I suspect that you're seeing this effect. > in the past, the memory for images we were no longer displaying was released within a couple > of seconds, not 30 seconds. Is it possible for you to find a regression range for this?
Flags: needinfo?(joe)
As noted in comment 37, I can still crash the app even with m-c, so clearing up this gecko issue won't be enough to close this bug. The main issue is that the images causing this crash are not from the phone's own camera, and they do not include EXIF preview images that are big enough for us to use. One way to handle this would be to make the gallery app create previews for images that do not have them. This would be done during metadata parsing while we're creating the thumbanil image. Unlike thumbnails, however, the preview image would be saved to the sdcard. That might be a bigger change than we want for a tef+ bug, however. So I'm going to look into modifying the shared/js/media/media_frame.js class so that if it is invoked without a preview image, it creates one for itself and as soon as it can replaces the fullsize image with the preview image. So when loading a 5mp image, we take the 20mb hit to decode it, but then can quickly free up most of that memory by switching to a screen-size image
That's bug 661304, btw.
> > > in the past, the memory for images we were no longer displaying was released within a couple > > of seconds, not 30 seconds. > > Is it possible for you to find a regression range for this? But I can set a flag, at least :-) QA: if you've got time, I'd love help finding out if there is an old nightly build unagi that does not crash when swiping back and forth between these five images. I'm pretty sure (but not 100% certain) that the app used to be able to handle this.
Also: has the app changed?
Clearing the regression and regressionwindow-wanted flags. I can reproduce the crash going back as far as December. So maybe I just never stress tested the app with images this size before. Joe: yes the app has changed a lot over the months. But now that I'm not convinced this is a regression that may not matter. I've got lots of code in there to try to be very careful about not keeping images around when they're not needed. So the 30 second wait to free up memory is frustrating! I wonder if I've actually got a JS memory leak of some sort happening and the 30 seconds is waiting for the JS to run a garbage collection or cycle collection pass or something... Though I don't know why that would matter, though. I'm not relying on the GC to free up an img element. I'm explicitly setting the src attribute to null which should, I assume, make the image memory discardable. I should investigate whether removing and recreating the img elements (instead of just changing their src) makes a difference. Joe or Justin: I can't find the gecko code that actually unlocks and discards an image when its src attribute is changed... any pointers?
> I can't find the gecko code that actually unlocks and discards an image > when its src attribute is changed. http://hg.mozilla.org/mozilla-central/file/1d6fe70c79c5/content/base/src/nsImageLoadingContent.cpp#l723
(In reply to David Flanagan [:djf] from comment #45) > I should investigate whether removing and recreating the img elements > (instead of just changing their src) makes a difference. No difference. It still takes 30 seconds for the memory to be freed.
Trying to release image memory with img.src = '' instead of img.src = null; makes no difference either.
Releasing image memory with img.removeAttribute('src') makes no difference.
I'm measuring memory consumption using adb shell b2g-ps. I wonder if gecko is freeing image memory right away, but the linux process isn't returning the memory to the system right away, and that is where the 30 second delay comes from? I wonder who would be able to tell me about how that works...
If they're bigger than 1mb and we're free()'ing them, they should get returned to the OS immediately. See huge_dalloc in memory/mozjemalloc/jemalloc.c (I did that from memory, so hopefully those are the right names). I dunno what's going on here. Maybe an imagelib person can help.
Does the 30 seconds change if you change image.mem.min_discard_timeout_ms?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #52) > Does the 30 seconds change if you change image.mem.min_discard_timeout_ms? At one point I changed that pref from 10,000 to 1,000 and still got the crash, but I didn't actually check the memory and time it. I'll give that a try. I've been editing that pref directly in omni.ja and just pushing the modified file back to the phone. That should work, after a reboot, right?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #52) > Does the 30 seconds change if you change image.mem.min_discard_timeout_ms? I changed it from 10s to 1s and it didn't make a difference.
Whiteboard: triaged
Attached file link to test case pull request (deleted) —
The attachment adds a test to the UITests app with buttons to decode and release a 5mp image. I find that image memory is sometimes released immediately. And sometimes after about 10 seconds. I can't duplicate the 30 second delay. This test does not simulate the CSS transforms that the gallery app uses, however.
Attachment #723891 - Attachment is obsolete: true
Attachment #725751 - Attachment is obsolete: true
Attached file link to patch on github (deleted) —
The attachment is a work-in-progress patch that creates a preview image during metadata parsing so that we can display images later without having to decode the full-size image (unless the user zooms in). What this is doing is moving the danger of OOM from image display time to metadata parsing time. In order to avoid crashing during metadata parsing, the code actually inserts 2500ms of idle time after decoding and creating the preview for a 5mp image. This will make scanning absurdly slow when users push large images without embedded previews (or previews that are not as big as the screen size) to the device. Images from the camera have big embedded previews, so they won't be affected by this.
The attached patch needs more testing and code cleanup before landing. But I think this is the best fix I'm going to be able to come up with in gaia. Anyone who wants to tackle the underlying gecko issues, please jump in!
With more testing, I've found that I need to wait 3 seconds after decoding each 5mp before resuming the scan/metadata parsing process. Updated patch coming. Currently gallery rejects any image of more than 5 megapixels. I wonder if I should modify that to allow larger images if they have a big preview but only allow images up to 3mp if they don't have a big enough preview image... (The "big enough preview" test is based on the screen size, so performance will be different on larger screens) I wish there was a way to tell gecko to decode a large image into a smaller image, downsampling as it decodes, so that I could decode a 5mp image into a screen-sized preview and hardly use any memory at all. I've tried setting the image element width and height before setting the src attribute, hoping that I'd achieve this, but it does not seem to work.
Comment on attachment 728539 [details] link to patch on github Dominic, This patch tries to avoid OOM when displaying large images by always creating a preview for large images that do not have embedded preview. The main changes are in MetadataParser.js.
Attachment #728539 - Attachment description: link to work in progress patch → link to patch on github
Attachment #728539 - Flags: review?(dkuo)
Reducing the image.mem.min_discard_timeout_ms from 10000 to 200 doesn't have a huge impact on the amount of time I've got to wait after creating a preview for a large image. If I wait 1500ms, I'll still have OOMs if there are a bunch of preview-less 5mp images on the sdcard. Changing the preference does not seem to actually reduce the time required to discard the images.
I've filed bug 854783 as a metabug for all of the gecko support we need to really fix this bug completely. Until we get the gecko support, the gallery will always be vulnerable to OOM crashes.
Sorry guys I missed this one... I will start reviewing the patch.
Comment on attachment 728539 [details] link to patch on github David, This patch looks good, and I have also tested with the attached images, gallery does not OOM again when we try display the previews then show the original images in fullscreen view. Although the modified parser takes a while to parse the new images which do not have previews but needs previews, I think it's worthy because at least we don't get OOM when we try to diaplay images later. The OOM problems kill gallery again and again but I think you did a really great job on fighting with it! I also hope in the future maybe we can have new api(such as downsampling...) on image element then we don't have to do so many "back-end" like works.
Attachment #728539 - Flags: review?(dkuo) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Landed in v1.0.1: https://github.com/mozilla-b2g/gaia/commit/381103c6cdbc491591b752fe343b662307f72957 Looks like I set the status flags in the wrong order. But they're set correctly now.
David, this patch added some gjslint errors. Would you please fix them (in all branches) ? That's only 2 missing semicolons. I think you can push these changes directly as a small follow-up to this bug, no need to file a new bug and request a review for that. Thanks
Created a test case covering this issue: https://moztrap.mozilla.org/manage/cases/?filter-id=6933
Flags: in-moztrap+
I'm reopening this because bug 854799 has just been fixed and has been granted tef+ status. With that fix in gecko, I can remove the horrible 3-second timeout hack that I landed here. (It is easier to reopen this bug and reuse its tef+ status than it is to create a new bug and get it marked +.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
As long you uplift the change yourself... BTW if you pushed the lint fixes in comment 67 would you put the commit hashes here too ?
Verify result, not totally OK. The previous problem is OK, when open the picture or slide between these pictures, there is no crash. But, because the scanning is very slow, it will cost long time to scanning. And while scanning, open one 5M pixel picture, then try to zoom it, enlarge it, then the Gallery will crash. --- NOK And if the scanning is finished, try to zoom, it can enlarge to the biggest size. No crash.
(In reply to sync-1 from comment #71) > But, because the scanning is very slow, it will cost long time to scanning. > And while scanning, open one 5M pixel picture, then try to zoom it, enlarge > it, then the Gallery will crash. --- NOK It's probably best to open a new bug for this new issue and close this bug.
I agree on sync-1@bugzilla.tld. This patch introduces *hugh* performance regressions in scanning a SD card. With my 4MP test photo's scanning the SD card takes 5 seconds per photo. I have also managed to get an out of memory exception while flicking through the photo's on my Unagi while scanning. Overall it seems more stable though than before.
We are closing this one as having fixed what it was meant to. Bug 854795 deals with decoding the image into a thumbnail size, and please add the number for the SD card performance issue into the notes so that we can track it.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Severity: normal → critical
Whiteboard: triaged → [b2g-crash][triaged]
Tested in ikura device with: Gecko http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b44e2c40cc1 Gaia 3e7dda88950ae09166109783adea8181eb857d21 BuildID 20130414230203 Version 18.0 Working fine.
Status: RESOLVED → VERIFIED
No longer blocks: 854795
I wish you hadn't closed this Milan. In order to fix the performance regression caused by this bug, I was waiting on bug 854799 to land and be uplifted. Now that it has, I can fix the regression. But I no longer have an open tef+ blocker to work on it. If I don't have an open high-priority bug in bugzilla assigned to me, I forget to do things! Now I have to file a new bug and get it through triage.
Okay bug 861965 is where I'll address the metadata scanning performance regression introduced by the fix to this bug.
Depends on: 861965
I've seen way too many bugs become incomprehensible because we force multiple separate landings into the same bug, and thus I think that the fact that our system so strongly disincentives multiple bugs is a huge failing. I've said as much in more than one long e-mail thread to release drivers, but they have not changed their tune. We need to continue beating this drum, because I think that's the only way things will change. Anyway, I've tef+'ed your bug.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: