Bug 851626
Opened 12 years ago
Closed 11 years ago
[B2G][Camera][Gallery] Crash when switching repeatedly between Gallery and Camera apps
Categories
(Firefox OS Graveyard :: Gaia::Camera, defect, P2)
Tracking
(firefox26 affected, firefox27 affected, b2g18 unaffected)
RESOLVED
WONTFIX
1.2 C3(Oct25)
Flag | Tracking | Status
---|---|---
firefox26 | --- | affected
firefox27 | --- | affected
b2g18 | --- | unaffected
People
(Reporter: jcouassi, Unassigned)
Details
(Whiteboard: [mozilla-triage][MemShrink:P2] [TD-59414])
Attachments
(14 files)
(deleted), text/plain
(deleted), text/plain
(deleted), text/x-log
(deleted), application/x-zip-compressed
(deleted), application/x-zip-compressed
(deleted), application/x-zip-compressed
(deleted), patch
(deleted), application/octet-stream
(deleted), text/x-python
(deleted), application/x-zip-compressed
(deleted), application/x-zip-compressed
(deleted), application/x-zip-compressed
(deleted), application/x-xz
(deleted), text/plain
Description:
Switching repeatedly between the Camera and Gallery apps eventually makes the Camera app crash.
Repro Steps:
1. Update to Unagi Build ID: 20130314114915
2. Launch the Gallery application from the homescreen.
3. Tap the Camera icon (from inside Gallery).
4. Tap the Gallery icon (from inside Camera).
5. Repeat steps 3 and 4 four or five times.
6. Observe the result.
Expected:
The device switches back and forth between the two applications correctly.
Actual:
After switching back and forth at a fast pace, a notification appears saying that Camera has crashed.
Repro frequency:
5/5 100%
Environmental Variables:
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/8e9dd87b4f3b
Gaia: 69dbcd84085f10bec0c0189b926ffb535b14dcfe
Notes:
Also checked on a master build; the issue reproduces there too.
Log attached to bug.
Crash report: https://crash-stats.mozilla.com/report/index/fb45a703-39cd-48b1-8c6c-fa5532130315
Signature android::GonkCameraHardware::PullParameters
UUID fb45a703-39cd-48b1-8c6c-fa5532130315
Date Processed 2013-03-15 18:39:05
Process Type content
Uptime 6
Install Age 20.4 hours since version was first installed.
Install Time 2013-03-14 22:02:27
Product B2G
Version 18.0
Build ID 20130314114915
Release Channel nightly
OS Android
OS Version 0.0.0 Linux 3.0.8-perf #1 PREEMPT Wed Dec 5 04:47:49 PST 2012 armv7l toro/full_unagi/unagi:4.0.4.0.4.0.4/OPENMASTER/eng.cltbld.20130306.101604:user/test-keys
Build Architecture arm
Crash Reason SIGSEGV
Crash Address 0x18
App Notes
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+
Processor Notes sp-processor02.phx1.mozilla.com_2251:2008; this crash has been processed more than once; WARNING: JSON file missing Add-ons; exploitablity tool: ERROR: unable to analyze dump
EMCheckCompatibility False
Device toro unagi1
Android API Version 15(AOSP)
Android CPU ABI armeabi-v7a
Related Bugs
850845 NEW --- Camera - crash when trying to open a second camera instance
Crashing Thread
Frame Module Signature Source
0 libxul.so android::GonkCameraHardware::PullParameters GonkCameraHwMgr.cpp:281
1 libxul.so mozilla::nsGonkCameraControl::PullParametersImpl GonkCameraControl.cpp:855
2 libxul.so mozilla::nsGonkCameraControl::Init GonkCameraControl.cpp:233
3 libxul.so InitGonkCameraControl::Run GonkCameraControl.cpp:179
4 libxul.so nsThread::ProcessNextEvent nsThread.cpp:620
5 libxul.so NS_ProcessNextEvent_P nsThreadUtils.cpp:237
6 libxul.so nsThread::ThreadFunc nsThread.cpp:258
7 libnspr4.so _pt_root ptthread.c:191
8 libc.so __thread_entry pthread.c:217
9 libc.so pthread_create pthread.c:357
Comment 2•12 years ago
This looks a lot like the crash in bug 850845, for which there is a patch pending (and that will land as soon as m-i is reopened).
I'm a little concerned that we can reach this point, though; even with the aforementioned fix, the camera will fail to start even if it doesn't crash.
Comment 3•12 years ago
Unable to reproduce on unagi with:
- gecko: inbound-src:c45d34db0d69
- gaia: c4d153b9f2f079400ce0eac73ea04137098230a0
Repeated STR steps 3 and 4 20+ times. Will try on b2g18 branch.
Comment 4•12 years ago
Unable to reproduce on unagi with:
- gecko: b2g18:a827df06cffb
- gaia: c4d153b9f2f079400ce0eac73ea04137098230a0
I'm running a DEBUG build, however, which may slow things down enough to hide a race condition. Will try with a non-DEBUG build.
Comment 5•12 years ago
Unable to reproduce with non-DEBUG build.
Comment 6•12 years ago
Issue repros on
Unagi Build ID: 20130404070202
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/da523063aa7b
Gaia: a845be046c5d3cb077e3c78f963ca5c079e7ab3d
Once you switch back and forth between Gallery and Camera around 7 or 8 times, the Camera/Gallery app crashes and you are taken back to the homescreen.
Comment 7•12 years ago
(In reply to Jeni from comment #6)
> Issue repros on
>
> Unagi Build ID: 20130404070202
I am unable to reproduce this issue using this specific build, even after 60+ switches (30+ complete cycles) between the Camera app and the Gallery.
Jeni, can you try repeating this test with a fresh, empty memory card? (Please _don't_ erase the memory card you're currently using; if this does turn out to be a crash caused by a specific image, we'll need to examine your images to determine the cause.)
Comment 8•12 years ago
(In reply to Mike Habicher [:mikeh] from comment #7)
>
> I am unable to reproduce this issue using this specific build, even after
> 60+ switches (30+ completely cycles) between the camera app and the gallery.
>
> Jeni, can you try repeating this test with a fresh, empty memory card?
> (Please _don't_ erase the memory card you're currently using--if this does
> turn out to be a crash due to a specfic image, we'll need to examine your
> images to determine the cause.)
Unable to reproduce the issue with only 37 pictures (39.3 MB) on the card. With the SD card from the device that has 2.3 GB of pictures, the issue does reproduce.
Comment 9•12 years ago
Thanks, Jeni!
Sounds like an out-of-memory issue. djf?
Flags: needinfo?(dflanagan)
Comment 10•12 years ago
I've more-or-less given up on OOMs with gallery. People keep testing it by putting big honking 5 megapixel images that don't have good EXIF previews on their SD cards. Gecko can't handle it. See bug 854783. In general, Gallery can do a good job with photos from the Camera app. But given the current limitations of gecko, it cannot handle large images gracefully. If we get a fix for bug 854799, that will go a long way to fixing the problem.
Jeni, is the gallery app scanning photos when this crash occurs (crawling ants animations at the top of the screen)? If so, and if the photos are not photos from the camera app, then it is probably eating up lots and lots of memory, which means that other apps get killed to free up more memory. (And then, if we're unlucky, the gallery app gets killed too). And if this is the case, then this probably has nothing to do with switching back and forth between apps. If gallery is trying to scan a bunch of big images, other apps are going to be killed to make room. This is basically normal.
So, if scanning is happening when this occurs, then I recommend closing this bug as a dupe of bug 854783.
Flags: needinfo?(dflanagan) → needinfo?(jcouassi)
Comment 11•12 years ago
On the other hand, I don't usually see an "app has crashed" notification with OOMs, so if there is actually a notification and a real crash report, then maybe something else is going on. Mike, can you tell if there is a real crash here? Could the app being killed because of memory pressure be causing a crash somehow?
Flags: needinfo?(mhabicher)
Comment 12•12 years ago
djf, to determine if an app is killed due to OoM, you need to look in the kernel logs: 'adb shell dmesg'. I don't remember the exact strings, but they're pretty obvious.
Flags: needinfo?(mhabicher)
Comment 13•12 years ago (Reporter)
(In reply to David Flanagan [:djf] from comment #10)
> Jeni, is the gallery app scanning photos when this crash occurs (crawling
> ants animations at the top of the screen)? If so, and if the photos are not
> photos from the camera app, then it is probably eating up lots and lots of
> memory, which means that other apps get killed to free up more memory. (And
> then, if we're unlucky, the gallery app gets killed too). And if this is
> the case, then this probably has nothing to do with switching back and forth
> between apps. If gallery is trying to scan a bunch of big images, other apps
> are going to be killed to make room. This is basically normal.
I am seeing the animation at the top of the screen on the main device I have been testing with. I have tested this issue both with and without applications running in the background, and it occurs in both cases. I also removed the pictures I had added and took a batch of new pictures (383.7 MB), and it still crashed.
Flags: needinfo?(jcouassi)
Comment 14•12 years ago
I wrote an automated test for this today and will run it overnight, to try to reproduce the camera crash.
Comment 15•12 years ago
I ran the automated test several times on my own engineering builds on unagi (over 1000 iterations total) and it passed (no crashes reproduced). There was just one photo on the microSD card and no extra apps running in the background.
Comment 16•12 years ago (Reporter)
Issue repros in
Inari Build ID: 20130503070205
Kernel Date: Feb 21
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/3f3489356bbc
Gaia: 3e232bce289c9e156d92553e752616cba284bc8f
And in
Unagi Build ID: 20130503070204
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/8becaf2a0bc7
Gaia: b0aca0dd1e2955e11190ede725e1fb9ee596438b
Once you switch back and forth between Gallery and Camera around 7 or 8 times, the Camera/Gallery app crashes and either takes you back to the homescreen or freezes in Gallery with no buttons shown and only one or two pictures visible.
Comment 17•11 years ago
Note: This crash is reproducible via the gaia-ui gallery_camera endurance test on Inari with b2g 18 v1.0.1.
Comment 18•11 years ago
rwood, when this crash happens, do you see any OoM errors in the kernel logs? 'adb shell dmesg'.
Updated•11 years ago
blocking-b2g: --- → leo?
Comment 19•11 years ago
Recommend not blocking because this is a stress test and not a normal user scenario.
Whiteboard: [mozilla-triage]
Updated•11 years ago
Summary: [B2G][Camera][Gallery]Camera crashes when switch repeatedly between Gallery and Camera mode → [B2G][Camera][Gallery] Crash when switching repeatedly between Gallery and Camera apps
Comment 21•11 years ago
Trying to reproduce this in Inari manually (as per comment 16) with:
- gecko: b2g18:78de618c071a
- gaia: v1.0.1:42b5b9f6d6c045039e1bd88cd32d5f850e3d3750
Unable to do so. 'adb shell b2g-ps' shows fluctuations in VSIZE and RSS, but nothing indicating an obvious resource leak.
Updated•11 years ago
Assignee: nobody → mhabicher
Comment 22•11 years ago
Initial:
20130612160407 Gaia Endurance Test: gallery_camera
20130612160407 Checkpoint after iteration 10 of 100:
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
b2g root 1524 1 193640 76696 ffffffff 400f06ec R /system/b2g/b2g
Homescreen app_1610 1610 1524 73320 27832 ffffffff 400f3330 S /system/b2g/plugin-container
Usage app_1622 1622 1524 65516 24900 ffffffff 400e3330 S /system/b2g/plugin-container
Gallery app_1636 1636 1524 74588 28144 ffffffff 40038330 S /system/b2g/plugin-container
Camera app_1655 1655 1524 72384 24872 ffffffff 4001c330 S /system/b2g/plugin-container
Final:
20130612170959 Checkpoint after iteration 100 of 100:
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
b2g root 1524 1 228024 113376 ffffffff 4100cdce R /system/b2g/b2g
Homescreen app_1610 1610 1524 73320 17224 ffffffff 400f3330 S /system/b2g/plugin-container
Gallery app_1636 1636 1524 89980 25892 ffffffff 40038330 S /system/b2g/plugin-container
This test loads the SD card with a single image of the Firefox logo; the image file is a 355 KiB JPG.
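The growth of the b2g parent process between the two checkpoints can be read straight off the RSS column. A minimal sketch, assuming the b2g-ps column layout shown above with VSIZE and RSS in KB; `parse_b2g_ps` is a hypothetical helper, not part of the B2G tooling.

```python
# Sketch: compare two b2g-ps checkpoints from the gallery_camera endurance test.
# Assumes the column layout shown above, with VSIZE and RSS reported in KB.

def parse_b2g_ps(text):
    """Return {application: (vsize_kb, rss_kb)} for each process row."""
    procs = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "APPLICATION":
            continue  # skip blank lines and the header row
        procs[fields[0]] = (int(fields[4]), int(fields[5]))
    return procs

INITIAL = """\
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
b2g root 1524 1 193640 76696 ffffffff 400f06ec R /system/b2g/b2g
Homescreen app_1610 1610 1524 73320 27832 ffffffff 400f3330 S /system/b2g/plugin-container
Usage app_1622 1622 1524 65516 24900 ffffffff 400e3330 S /system/b2g/plugin-container
Gallery app_1636 1636 1524 74588 28144 ffffffff 40038330 S /system/b2g/plugin-container
Camera app_1655 1655 1524 72384 24872 ffffffff 4001c330 S /system/b2g/plugin-container
"""
FINAL = """\
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
b2g root 1524 1 228024 113376 ffffffff 4100cdce R /system/b2g/b2g
Homescreen app_1610 1610 1524 73320 17224 ffffffff 400f3330 S /system/b2g/plugin-container
Gallery app_1636 1636 1524 89980 25892 ffffffff 40038330 S /system/b2g/plugin-container
"""

initial, final = parse_b2g_ps(INITIAL), parse_b2g_ps(FINAL)
rss_growth_kb = final["b2g"][1] - initial["b2g"][1]
gone = sorted(set(initial) - set(final))
print(f"b2g RSS grew by {rss_growth_kb} KB (~{rss_growth_kb / 1024:.1f} MiB)")
print("no longer running at the final checkpoint:", gone)
```

On these two checkpoints the parent process grows by 36680 KB (about 35.8 MiB), and Camera and Usage are no longer listed at the end.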
Comment 23•11 years ago
Wow, it gets better--it seems that with a DEBUG build running on Inari with:
- gecko: b2g18:7609f4d7e9b0
- gaia: v1.0.1:ed3b9e7ed0e083cd1c587c160d6a63440b29fad8
...very early on in the iterations, switching to the Gallery app causes the camera to die; and vice-versa. Is there something wrong with the LMK? There also appear to be, at times, two preallocated processes.
Kernel messages:
# adb shell dmesg | grep kswapd0
<4>[06-12 22:12:09.080] [25: kswapd0]select 999 (Camera), adj 6, size 7082, to kill
<4>[06-12 22:12:09.080] [25: kswapd0]send sigkill to 999 (Camera), adj 6, size 7082
<4>[06-12 22:14:56.494] [25: kswapd0]select 1139 (Camera), adj 6, size 7232, to kill
<4>[06-12 22:14:56.494] [25: kswapd0]send sigkill to 1139 (Camera), adj 6, size 7232
<4>[06-12 22:16:22.528] [25: kswapd0]select 1209 (Camera), adj 6, size 6661, to kill
<4>[06-12 22:16:22.528] [25: kswapd0]send sigkill to 1209 (Camera), adj 6, size 6661
<4>[06-12 22:17:49.873] [25: kswapd0]select 1281 (Camera), adj 6, size 6727, to kill
<4>[06-12 22:17:49.873] [25: kswapd0]send sigkill to 1281 (Camera), adj 6, size 6727
<4>[06-12 22:18:45.797] [25: kswapd0]select 1316 (Gallery), adj 6, size 5413, to kill
<4>[06-12 22:18:45.797] [25: kswapd0]send sigkill to 1316 (Gallery), adj 6, size 5413
<4>[06-12 22:20:44.814] [25: kswapd0]select 1420 (Camera), adj 6, size 7544, to kill
<4>[06-12 22:20:44.814] [25: kswapd0]send sigkill to 1420 (Camera), adj 6, size 7544
<4>[06-12 22:21:37.495] [25: kswapd0]select 1456 (Gallery), adj 6, size 7992, to kill
<4>[06-12 22:21:37.495] [25: kswapd0]send sigkill to 1456 (Gallery), adj 6, size 7992
<4>[06-12 22:23:07.092] [25: kswapd0]select 1695 (Gallery), adj 6, size 8276, to kill
<4>[06-12 22:23:07.092] [25: kswapd0]send sigkill to 1695 (Gallery), adj 6, size 8276
<4>[06-12 22:23:37.832] [25: kswapd0]select 1874 (Camera), adj 6, size 7950, to kill
<4>[06-12 22:23:37.832] [25: kswapd0]send sigkill to 1874 (Camera), adj 6, size 7950
<4>[06-12 22:24:33.216] [25: kswapd0]select 1910 (Gallery), adj 6, size 7891, to kill
<4>[06-12 22:24:33.216] [25: kswapd0]send sigkill to 1910 (Gallery), adj 6, size 7891
<4>[06-12 22:25:06.499] [25: kswapd0]select 1943 (Camera), adj 6, size 7515, to kill
<4>[06-12 22:25:06.499] [25: kswapd0]send sigkill to 1943 (Camera), adj 6, size 7515
<4>[06-12 22:26:01.713] [25: kswapd0]select 1980 (Gallery), adj 6, size 7085, to kill
<4>[06-12 22:26:01.713] [25: kswapd0]send sigkill to 1980 (Gallery), adj 6, size 7085
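To tell low-memory-killer kills apart from real crashes (the dmesg check suggested in comment 12), the `send sigkill` lines can be tallied per app. A minimal sketch, assuming the log format shown above; we also assume the `size` field is a count of 4 KiB pages, which is how we understand Android's lowmemorykiller driver to report it. `lmk_kills` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches low-memory-killer lines such as:
# <4>[06-12 22:12:09.080] [25: kswapd0]send sigkill to 999 (Camera), adj 6, size 7082
KILL_RE = re.compile(r"send sigkill to (\d+) \(([^)]*)\), adj (-?\d+), size (\d+)")

def lmk_kills(dmesg_text):
    """Return (pid, name, oom_adj, size_pages) for every 'send sigkill' line."""
    kills = []
    for line in dmesg_text.splitlines():
        m = KILL_RE.search(line)
        if m:
            pid, name, adj, size = m.groups()
            kills.append((int(pid), name, int(adj), int(size)))
    return kills

SAMPLE = """\
<4>[06-12 22:12:09.080] [25: kswapd0]select 999 (Camera), adj 6, size 7082, to kill
<4>[06-12 22:12:09.080] [25: kswapd0]send sigkill to 999 (Camera), adj 6, size 7082
<4>[06-12 22:18:45.797] [25: kswapd0]select 1316 (Gallery), adj 6, size 5413, to kill
<4>[06-12 22:18:45.797] [25: kswapd0]send sigkill to 1316 (Gallery), adj 6, size 5413
"""

kills = lmk_kills(SAMPLE)
per_app = Counter(name for _, name, _, _ in kills)
for pid, name, adj, pages in kills:
    # Assumption: size is in 4 KiB pages, so 7082 pages is roughly 27.7 MiB.
    print(f"{name} (pid {pid}, adj {adj}): ~{pages * 4 / 1024:.1f} MiB")
print(dict(per_app))
```

The `select ... to kill` lines are deliberately not matched, so each kill is counted once.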
jlebar: have we regressed on memory recently?
Flags: needinfo?(justin.lebar+bug)
Comment 24•11 years ago
> jlebar: have we regressed on memory recently?
Not to my knowledge.
What's happening in comment 22 is that the main process grows in size from 77mb to 113mb. Now we have less space for apps.
We need a get_about_memory.py dump after the main process is using 110+mb of RAM.
Flags: needinfo?(justin.lebar+bug)
Comment 25•11 years ago
This is the b2g-ps output from a freshly-rebooted device (ignore comment 22, it's from a previous run); it goes with the following grepped (truncated, it seems) kernel log entries:
<4>[06-12 22:28:05.123] [25: kswapd0]select 2104 (Camera), adj 6, size 6781, to kill
<4>[06-12 22:28:05.123] [25: kswapd0]send sigkill to 2104 (Camera), adj 6, size 6781
<4>[06-12 22:30:38.673] [25: kswapd0]select 2211 (Gallery), adj 6, size 6446, to kill
<4>[06-12 22:30:38.673] [25: kswapd0]send sigkill to 2211 (Gallery), adj 6, size 6446
<4>[06-12 22:31:12.916] [25: kswapd0]select 2246 (Camera), adj 6, size 6787, to kill
<4>[06-12 22:31:12.916] [25: kswapd0]send sigkill to 2246 (Camera), adj 6, size 6787
<4>[06-12 22:32:11.163] [25: kswapd0]select 2282 (Gallery), adj 6, size 7686, to kill
<4>[06-12 22:32:11.163] [25: kswapd0]send sigkill to 2282 (Gallery), adj 6, size 7686
<4>[06-12 22:32:44.906] [25: kswapd0]select 2316 (Camera), adj 6, size 7281, to kill
<4>[06-12 22:32:44.906] [25: kswapd0]send sigkill to 2316 (Camera), adj 6, size 7281
<4>[06-12 22:33:41.661] [25: kswapd0]select 2352 (Gallery), adj 6, size 7249, to kill
<4>[06-12 22:33:41.661] [25: kswapd0]send sigkill to 2352 (Gallery), adj 6, size 7249
<4>[06-12 22:35:51.598] [25: kswapd0]select 2454 (Camera), adj 6, size 6886, to kill
<4>[06-12 22:35:51.598] [25: kswapd0]send sigkill to 2454 (Camera), adj 6, size 6886
<4>[06-12 22:36:59.354] [25: kswapd0]select 2490 (Gallery), adj 6, size 5106, to kill
<4>[06-12 22:36:59.354] [25: kswapd0]send sigkill to 2490 (Gallery), adj 6, size 5106
<4>[06-12 22:37:32.256] [25: kswapd0]select 2524 (Camera), adj 6, size 6209, to kill
<4>[06-12 22:37:32.256] [25: kswapd0]send sigkill to 2524 (Camera), adj 6, size 6209
<4>[06-12 22:38:34.257] [25: kswapd0]select 2560 (Gallery), adj 6, size 7304, to kill
<4>[06-12 22:38:34.257] [25: kswapd0]send sigkill to 2560 (Gallery), adj 6, size 7304
<4>[06-12 22:39:11.353] [25: kswapd0]select 2595 (Camera), adj 6, size 7020, to kill
<4>[06-12 22:39:11.353] [25: kswapd0]send sigkill to 2595 (Camera), adj 6, size 7020
<4>[06-12 22:40:14.094] [25: kswapd0]select 2631 (Gallery), adj 6, size 6957, to kill
<4>[06-12 22:40:14.094] [25: kswapd0]send sigkill to 2631 (Gallery), adj 6, size 6957
<4>[06-12 22:40:51.671] [25: kswapd0]select 2664 (Camera), adj 6, size 6672, to kill
<4>[06-12 22:40:51.671] [25: kswapd0]send sigkill to 2664 (Camera), adj 6, size 6672
<4>[06-12 22:41:57.185] [25: kswapd0]select 2699 (Gallery), adj 6, size 5624, to kill
<4>[06-12 22:41:57.185] [25: kswapd0]send sigkill to 2699 (Gallery), adj 6, size 5624
<4>[06-12 22:43:43.338] [25: kswapd0]select 2773 (Gallery), adj 6, size 5509, to kill
<4>[06-12 22:43:43.338] [25: kswapd0]send sigkill to 2773 (Gallery), adj 6, size 5509
<4>[06-12 22:44:30.795] [25: kswapd0]select 2807 (Camera), adj 6, size 5510, to kill
<4>[06-12 22:44:30.795] [25: kswapd0]send sigkill to 2807 (Camera), adj 6, size 5510
<4>[06-12 22:46:11.973] [25: kswapd0]select 2879 (Camera), adj 6, size 6700, to kill
<4>[06-12 22:46:11.973] [25: kswapd0]send sigkill to 2879 (Camera), adj 6, size 6700
<4>[06-12 22:47:54.423] [25: kswapd0]select 2951 (Camera), adj 6, size 6515, to kill
<4>[06-12 22:47:54.423] [25: kswapd0]send sigkill to 2951 (Camera), adj 6, size 6515
I forgot to build with DMD enabled, but will get that data next.
In the attachment, you can see that even initially, with b2g.VSIZE=195908 and .RSS=93516, after a cycle of:
a. open/switch to Camera app
b. switch to Gallery app
c. grab b2g-ps
...the Camera is killed.
Attachment #761754 - Flags: feedback?(justin.lebar+bug)
Comment 26•11 years ago
fwiw you don't need to build with DMD enabled to do get_about_memory.py. We only need DMD if the result of get_about_memory.py shows high heap-unclassified.
Comment 27•11 years ago
The 110+mb main process memory usage shouldn't be happening.
But aside from that, everything looks like it's working properly.
At cjones's insistence, the preallocated process runs with the same priority as other bg processes. And the homescreen app runs with higher priority.
So after those two apps, we don't have a lot of space left.
It would be a bit interesting to see what the output of b2g-info is, because it will show you how much memory is actually free on the system. b2g-info isn't merged into mainline yet, but you can get it with something like
$ cd root/b2g/checkout
$ git remote add https://github.com/jlebar/B2G jlebar
$ git fetch jlebar
$ git checkout b2g-info
At this point you can do either:
$ ./build.sh && ./flash.sh
or
$ ./build.sh b2g-info
$ adb remount
$ adb push out/target/product/<XXX>/system/bin/b2g-info /system/bin
Comment 28•11 years ago
DMD report for the b2g parent process shows:
------------------------------------------------------------------
Unreported stack trace records
------------------------------------------------------------------
Unreported: 13 blocks in stack trace record 1 of 713
7,987,200 bytes (7,987,200 requested / 0 slop)
14.16% of the heap (14.16% cumulative); 31.49% of unreported (31.49% cumulative)
Allocated at
malloc /home/mikeh/dev/mozilla/m-c/b2g18/memory/build/replace_malloc.c:152 (0x400f142c libmozglue.so+0x442c)
yyalloc /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/gfx/angle/glslang_lex.cpp:2930 (0x413bd540 libxul.so+0x1273540)
gfxImageSurface /home/mikeh/dev/mozilla/m-c/b2g18/gfx/thebes/gfxImageSurface.cpp:111 (0x411f630a libxul.so+0x10ac30a)
nsRefPtr<gfxASurface>::assign_with_AddRef(gfxASurface*) /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsAutoPtr.h:844 (0x41216310 libxul.so+0x10cc310)
nsRefPtr<gfxASurface> /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsAutoPtr.h:903 (0x41209920 libxul.so+0x10bf920)
nsRefPtr<gfxASurface>::assign_assuming_AddRef(gfxASurface*) /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsAutoPtr.h:859 (0x40546c60 libxul.so+0x3fcc60)
mozilla::image::RasterImage::DecodingComplete() /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsError.h:1065 (0x4054073a libxul.so+0x3f673a)
mozilla::image::Decoder::PostDecodeDone() /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsCOMPtr.h:762 (0x4053aabc libxul.so+0x3f0abc)
mozilla::image::nsJPEGDecoder::NotifyDone() /home/mikeh/dev/mozilla/m-c/b2g18/image/decoders/nsJPEGDecoder.cpp:533 (0x40cb3680 libxul.so+0x40d680)
mozilla::image::term_source(jpeg_decompress_struct*) /home/mikeh/dev/mozilla/m-c/b2g18/image/decoders/nsJPEGDecoder.cpp:851 (0x40cb36b4 libxul.so+0x40d6b4)
jpeg_finish_decompress /home/mikeh/dev/mozilla/m-c/b2g18/media/libjpeg/jdapimin.c:393 (0x41af735e libxul.so+0x125135e)
mozilla::image::nsJPEGDecoder::WriteInternal(char const*, unsigned int) /home/mikeh/dev/mozilla/m-c/b2g18/image/decoders/nsJPEGDecoder.cpp:502 (0x40cb42d4 libxul.so+0x40e2d4)
mozilla::image::Decoder::Write(char const*, unsigned int) /home/mikeh/dev/mozilla/m-c/b2g18/image/src/Decoder.cpp:81 (0x4053a978 libxul.so+0x3f0978)
mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int) /home/mikeh/dev/mozilla/m-c/b2g18/image/src/RasterImage.cpp:2501 (0x4053ff28 libxul.so+0x3f5f28)
mozilla::image::RasterImage::DecodeSomeData(unsigned int) /home/mikeh/dev/mozilla/m-c/b2g18/image/src/RasterImage.cpp:3098 (0x40540006 libxul.so+0x3f6006)
mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug-dmd/dist/include/nsError.h:1065 (0x40540b06 libxul.so+0x3f6b06)
mozilla::image::RasterImage::DecodeWorker::Run() /home/mikeh/dev/mozilla/m-c/b2g18/image/src/RasterImage.cpp:3335 (0x40c9cd16 libxul.so+0x3f6d16)
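The DMD record above reports 13 blocks totalling 7,987,200 bytes, or 614,400 bytes per block, which is exactly the size of a 320×480 surface at 4 bytes per pixel. The check below assumes a 320×480 display (our assumption for Unagi/Inari-class devices) and a 32-bit surface format; it only shows that the numbers are consistent with full-screen decoded images, not which images they are.

```python
# Sketch: is each unreported block the size of one full-screen image surface?
# Assumptions: a 320x480 display and a 32-bit (4 bytes per pixel) surface format.
TOTAL_BYTES = 7_987_200  # from the DMD record above
BLOCKS = 13

per_block, remainder = divmod(TOTAL_BYTES, BLOCKS)
width, height, bytes_per_pixel = 320, 480, 4
frame_bytes = width * height * bytes_per_pixel

print(per_block, remainder)      # bytes per block, leftover bytes
print(per_block == frame_bytes)  # does one block hold exactly one 320x480 frame?
```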
Comment 29•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #27)
>
> $ cd root/b2g/checkout
> $ git remote add https://github.com/jlebar/B2G jlebar
I can't get past this step:
22:10:14 ➜ btg024 git:(master) ✗ git remote add https://github.com/jlebar/B2G jlebar
fatal: 'https://github.com/jlebar/B2G' is not a valid remote name
22:10:29 ➜ btg024 git:(master) ✗ git remote add https://github.com/jlebar/B2G.git jlebar
fatal: 'https://github.com/jlebar/B2G.git' is not a valid remote name
(I tried the second in case the .git was missing, but it didn't make a difference.)
Comment 30•11 years ago
Sorry, git remote add jlebar https://github.com/jlebar/B2G.git
Comment 31•11 years ago
Okay, that worked--next:
# git checkout b2g-info
error: pathspec 'b2g-info' did not match any file(s) known to git.
Comment 32•11 years ago
I don't know why, but some versions of git make you do "checkout jlebar/b2g-info", while others work with just "b2g-info".
Sorry; I didn't mean for this to be complex!
Comment 33•11 years ago
No luck with that either:
# git checkout jlebar/b2g-info
error: pathspec 'jlebar/b2g-info' did not match any file(s) known to git.
Comment 34•11 years ago
# adb shell b2g-info
| megabytes |
NAME PID NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 444 0 116.5 118.6 121.3 230.9 0 root
(Preallocated a 587 1 10.5 11.1 12.4 68.5 2 app_587
Homescreen 8272 18 14.3 16.4 19.1 73.7 4 app_8272
System memory info:
Total 176.6 MB
Used - cache 157.1 MB
B2G procs (PSS) 146.0 MB
Non-B2G procs 11.1 MB
Free + cache 19.5 MB
Free 7.5 MB
Cache 12.1 MB
Low-memory killer parameters:
notify_trigger 10240 KB
oom_adj min_free
6 20480 KB
4 8192 KB
3 7168 KB
2 6144 KB
1 5120 KB
0 4096 KB
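The low-memory-killer table above reads as follows: once memory drops below a level's min_free, processes at that oom_adj or higher become eligible for killing. Below is a simplified model, assuming the Android lowmemorykiller behaviour as we understand it: a level applies only when both free memory and file cache are below min_free, and the victim is the eligible process with the highest oom_adj, with ties broken by size. `min_killable_adj` and `pick_victim` are hypothetical names, not kernel functions.

```python
# Simplified model of the Android low-memory killer's decision, using the
# thresholds printed by b2g-info above. Assumption: like the kernel driver,
# a level applies only when BOTH free memory and file cache are below min_free.
LMK_LEVELS = [(0, 4096), (1, 5120), (2, 6144), (3, 7168), (4, 8192), (6, 20480)]  # (oom_adj, min_free KB)

def min_killable_adj(free_kb, cache_kb):
    """Return the lowest oom_adj that may be killed, or None if memory is fine."""
    for adj, min_free_kb in sorted(LMK_LEVELS, key=lambda level: level[1]):
        if free_kb < min_free_kb and cache_kb < min_free_kb:
            return adj
    return None

def pick_victim(procs, free_kb, cache_kb):
    """procs: (name, oom_adj, rss_kb) tuples. Prefer highest oom_adj, then largest RSS."""
    min_adj = min_killable_adj(free_kb, cache_kb)
    if min_adj is None:
        return None
    eligible = [p for p in procs if p[1] >= min_adj]
    return max(eligible, key=lambda p: (p[1], p[2]), default=None)

# Free 7.5 MB and cache 12.1 MB, as in the b2g-info output above.
free_kb, cache_kb = int(7.5 * 1024), int(12.1 * 1024)
procs = [
    ("b2g", 0, 124211),              # RSS values converted from the MB figures above
    ("(Preallocated a", 2, 12698),
    ("Homescreen", 4, 19558),
    ("Camera", 6, 28328),            # hypothetical background app, not in this output
]
print(min_killable_adj(free_kb, cache_kb))        # 6
print(pick_victim(procs, free_kb, cache_kb))      # ('Camera', 6, 28328)
print(pick_victim(procs[:3], free_kb, cache_kb))  # None: nothing at oom_adj >= 6
```

With these numbers only oom_adj 6 processes are eligible, which matches the `adj 6` kills in the dmesg output; in this model a preallocated process at oom_adj 2 is shielded from that level.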
Flags: needinfo?(justin.lebar+bug)
Comment 35•11 years ago
Second run:
# adb shell b2g-info
| megabytes |
NAME PID NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 444 0 115.8 117.9 120.7 231.9 0 root
(Preallocated a 587 1 10.5 11.1 12.4 68.5 2 app_587
Homescreen 8272 18 14.3 16.4 19.1 73.7 4 app_8272
System memory info:
Total 176.6 MB
Used - cache 156.5 MB
B2G procs (PSS) 145.3 MB
Non-B2G procs 11.2 MB
Free + cache 20.1 MB
Free 8.0 MB
Cache 12.1 MB
Low-memory killer parameters:
notify_trigger 10240 KB
oom_adj min_free
6 20480 KB
4 8192 KB
3 7168 KB
2 6144 KB
1 5120 KB
0 4096 KB
Comment 36•11 years ago
Hm, it's weird that the preallocated process has oom_adj 2. Maybe it's in the process of turning into some other process.
You can see here that the system only has 20mb free, including the buffer cache. That's not a lot of space.
So I see two bugs here:
1) The main process is using 115+mb of RAM. That's very bad.
2) The preallocated app process has oom_adj 2. It should be 6, I think. This could be bad, or it might not be a big deal.
To reproduce this, all I need to do is reboot the phone and then follow the steps in comment 0?
Updated•11 years ago
Flags: needinfo?(justin.lebar+bug)
Whiteboard: [mozilla-triage] → [mozilla-triage][MemShrink]
Comment 37•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #36)
>
> To reproduce this, all I need to do is reboot the phone and then following
> the steps in comment 0?
I do it with the endurance gaiatest. rwood can help you get this going, or if he's too busy, I can probably muddle you through the setup process. :)
Comment 38•11 years ago
(You can do it manually, but in my case, the device got into the wedged state logged above after 63 iterations.)
Updated•11 years ago
Attachment #761754 - Attachment is obsolete: true
Attachment #761754 - Flags: feedback?(justin.lebar+bug)
Updated•11 years ago
Attachment #761754 - Attachment is obsolete: false
Updated•11 years ago
Attachment #761698 - Attachment mime type: text/x-log → text/plain
Comment 39•11 years ago
Comment on attachment 761836
get_about_memory.py output, including DMD report
One thing that sticks out at me in this attachment is
2.56 MB (02.40%) ── huge/string(length=9114, "...") [131]
That means that we have 131 copies of a length-9114 data URI. Someone (maybe Gaia, maybe Gecko) is probably leaking that.
A 9114-char data URI is not very big, so it's likely not a full image that we're leaking. So one way to approach this bug is to try to figure out what that string is.
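The 2.56 MB figure is consistent with 131 copies of the string if one assumes 2-byte characters, a 2-byte terminator, and each buffer rounded up to whole 4 KiB pages. A quick check under those assumptions (the rounding rule is our assumption about the allocator, not something stated in this bug; `estimated_report_bytes` is a hypothetical helper):

```python
import math

PAGE_BYTES = 4096  # assumed rounding granularity for large heap allocations

def estimated_report_bytes(length, copies, bytes_per_char=2):
    """Estimate the about:memory total for `copies` strings of `length` chars.

    Assumes UTF-16 storage plus a 2-byte terminator, with each buffer
    rounded up to a whole number of 4 KiB pages.
    """
    raw = (length + 1) * bytes_per_char
    per_copy = math.ceil(raw / PAGE_BYTES) * PAGE_BYTES
    return per_copy * copies

print(f"{estimated_report_bytes(9114, 131) / 2**20:.2f} MiB")  # 2.56 MiB
```

Under the same assumptions, 598 copies of the 9114-character string come to about 11.68 MiB, matching the figure quoted for that entry in the report excerpt at the end of this bug.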
Comment 40•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #39)
>
> A 9114-char data URI is not very big, so it's likely not a full image that
> we're leaking. So one way to approach this bug is to try to figure out what
> that string is.
Looks like the first 8 characters decode to <137>PNG<13><10>. Not so useful. If we logged a bit more, perhaps we could compare it against images included in the build.
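Eight base64 characters decode to six bytes, which is why only `\x89PNG\r\n` (the start of the PNG signature) is visible. With the whole string logged, the image's dimensions could be read from its IHDR chunk and compared against the images in the build. A minimal sketch, assuming a standard `data:image/png;base64,...` URI; `inspect_png_data_uri` is a hypothetical helper.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def inspect_png_data_uri(uri):
    """Decode a base64 PNG data URI and return (width, height) from its IHDR chunk."""
    header, _, payload = uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    data = base64.b64decode(payload)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    # After the 8-byte signature: chunk length, chunk type, then IHDR width/height.
    _length, chunk_type, width, height = struct.unpack(">I4sII", data[8:24])
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    return width, height

# The first 8 base64 characters of any PNG data URI:
print(base64.b64decode("iVBORw0K"))  # b'\x89PNG\r\n'
```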
Comment 41•11 years ago
I've been trying to debug this locally, but the camera app segfaults with a null-pointer exception on trunk. So it's somewhat slow-going...
Comment 42•11 years ago
> If we logged a bit more, perhaps we could compare it against images included in the build.
Indeed, we can and should log the whole thing.
Bug 801780 is where we added our current long-string logging to about:memory. Bug 852010 is open for dumping the entire contents of long strings.
I'm juggling a lot of things at the moment; if you're interested in helping with bug 852010, that would probably help us move forward here. At the very least, we'd be able to see what image is being leaked (simply by opening the data URI).
Another thing to figure out here is the following: gfxImageSurface has a memory reporter, which is invoked under some circumstances. But DMD is seeing dark matter in some gfxImageSurface objects here, which means that the memory reporter is not being run for some gfxImageSurfaces when we do a DMD dump. Why is that?
If we could figure out why the memory reporters for these gfxImageSurface objects are not being run, that might help us understand why they're building up as they are.
We have bug 820248 open on a similar problem, which may or may not be related.
I'm happy to keep working on this, but it would be a big help to me if you could step in, so let me know.
Comment 43•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #41)
>
> I've been trying to debug this locally, but the camera app segfaults with a
> null-pointer exception on trunk. So it's somewhat slow-going...
It's probably bug 882328; there's a patch pending there.
Comment 44•11 years ago
Decoded image attachments pending. :)
Comment 45•11 years ago
Comment 46•11 years ago
Further to comment 45, occurrences of the different images in the memory-report:
# grep -c length=9114 memory-reports
160
# grep -c length=9117 memory-reports
2
# grep -c length=44737 memory-reports
2
(Although the data for each image appears twice in the memory-report, one occurrence uses the "length=X" notation, while the other uses "length-X" notation; so the above greps are unique.)
So the Marketplace app icon appears in the memory-report 160 times (or 162, considering that the 9117-byte images are also Marketplace app icons).
The 44737-byte image is a Homescreen wallpaper (although, interestingly, _not_ the Homescreen wallpaper I have selected).
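The per-length counts above can also be gathered in one pass instead of one grep per length. A sketch, assuming the memory report contains entries written like `string(length=9114, "...")`; like the greps, it ignores the `length-X` form. `grep -c` counts matching lines while this counts matches, which agree only when each entry sits on its own line. The sample text and `count_string_lengths` are illustrative.

```python
import re
from collections import Counter

# Matches entries such as: huge/string(length=9114, "...")
# The "length-X" spelling is deliberately not matched, mirroring the greps above.
LENGTH_RE = re.compile(r"string\(length=(\d+)")

def count_string_lengths(report_text):
    """Return a Counter mapping string length -> number of report entries."""
    return Counter(int(n) for n in LENGTH_RE.findall(report_text))

SAMPLE = '''\
huge/string(length=9114, "...")
huge/string(length=9114, "...")
huge/string(length=44737, "...")
string(length-9114, "...")
'''

for length, n in count_string_lengths(SAMPLE).most_common():
    print(length, n)
```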
Comment 47•11 years ago
Cristian, it looks like the Homescreen might be leaking icon resources when switching tasks.
Flags: needinfo?(crdlc)
Comment 48•11 years ago
Hi,
My main concern here is whether this bug reproduces when ev.me is loaded. In our homescreen implementation, we revoke all icon resources after loading them:
https://github.com/mozilla-b2g/gaia/blob/master/apps/homescreen/js/page.js#L233
https://github.com/mozilla-b2g/gaia/blob/master/apps/homescreen/js/page.js#L240
https://github.com/mozilla-b2g/gaia/blob/master/apps/homescreen/js/page.js#L351
https://github.com/mozilla-b2g/gaia/blob/master/apps/homescreen/js/page.js#L530
What do you mean by "switching tasks"? I'm not sure that it is the Homescreen wallpaper; the homescreen doesn't define the wallpaper. As far as I know, it is defined in the system layer.
Could I take a look at a specific part of the homescreen?
Thanks a lot
Flags: needinfo?(crdlc) → needinfo?(mhabicher)
Comment 49•11 years ago
This memory leak isn't in the homescreen app; it's in the main process.
Comment 50•11 years ago
After killing the Homescreen app and rerunning get_about_memory.py, some of the images in the memory-report have been freed:
# grep -c length=9114 memory-reports
156
# grep -c length=9117 memory-reports
0
# grep -c length=44737 memory-reports
2
Comment 51•11 years ago
What if you run get_about_memory.py --minimize?
Comment 52•11 years ago
(If the images are garbage -- and I suspect they're not -- get_about_memory.py --minimize will dump them. If --minimize doesn't dump them, then they're definitely leaked somehow.)
Comment 53•11 years ago
(In reply to Cristian Rodriguez de la Cruz (:crdlc) from comment #48)
>
> My main concern here is to know if this bug is reproducible when ev.me
> is loaded or not. Surfing on our home implementation, we revoke all icon
> resources after loading:
The test that shows this problem is as follows:
1. restart b2g process
2. open Camera app, wait 30 seconds
3. switch to Gallery app, wait 30 seconds
4. switch to Camera app, wait 30 seconds
5. go to step 3
Somewhere between 60 and 90 iterations, the phone runs out of memory and the Homescreen fails to load.
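The loop above, sketched in Python under stated assumptions: `restart_b2g`, `switch_to` and `checkpoint` are hypothetical callables standing in for whatever actually drives the phone (the real endurance test uses gaiatest), and `sleep` is injectable so the sequence can be checked without waiting.

```python
import time

def run_switch_test(restart_b2g, switch_to, iterations, dwell_s=30.0,
                    checkpoint=None, checkpoint_every=10, sleep=time.sleep):
    """Alternate Gallery <-> Camera, waiting `dwell_s` seconds after each switch."""
    restart_b2g()                      # step 1: restart the b2g process
    switch_to("Camera")                # step 2: open Camera and wait
    sleep(dwell_s)
    for i in range(1, iterations + 1):  # step 5: "go to step 3"
        switch_to("Gallery")           # step 3
        sleep(dwell_s)
        switch_to("Camera")            # step 4
        sleep(dwell_s)
        if checkpoint and i % checkpoint_every == 0:
            checkpoint(i)              # hypothetical hook, e.g. record `adb shell b2g-ps`
```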
> What do you mean: switching tasks? I don't know that is the Homescreen
> wallpaper.. home doesn't define wallpaper, it is defined on system layer as
> far as I know.
The about:memory report of the b2g parent process shows that it is holding onto several strings that contain "
Hmm, I see that the icon for the Marketplace is defined in $B2G/gaia/external-apps/marketplace.firefox.com/update.webapp:
"icons": {
"64": "..."
},
According to |grep -rn -A 4 \"icons\" *| in $B2G/gaia/{apps, external-apps, external-dogfood-apps}, the Marketplace is the _only_ app that has its icon so-encoded.
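A rough Python equivalent of that `grep`, assuming the manifests are valid JSON files named `manifest.webapp` or `update.webapp` (the two names mentioned in this bug); it reports each app whose `icons` map holds an inline `data:` URI rather than a path.

```python
import json
import os

# Manifest file names mentioned in this bug; other names are not scanned.
MANIFEST_NAMES = {"manifest.webapp", "update.webapp"}

def find_data_uri_icons(roots):
    """Yield (manifest_path, icon_size, uri_length) for every icon given as a data: URI."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name not in MANIFEST_NAMES:
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8") as f:
                        manifest = json.load(f)
                except (OSError, ValueError):
                    continue  # unreadable or not valid JSON
                icons = manifest.get("icons") if isinstance(manifest, dict) else None
                if not isinstance(icons, dict):
                    continue
                for size, uri in icons.items():
                    if isinstance(uri, str) and uri.startswith("data:"):
                        yield path, size, len(uri)
```

For example, from $B2G/gaia: `list(find_data_uri_icons(["apps", "external-apps", "external-dogfood-apps"]))`.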
I don't suppose you know off-hand how app icons get loaded?
Flags: needinfo?(mhabicher) → needinfo?(crdlc)
Comment 54•11 years ago
I've just added traces to the Homescreen where we create object URLs and set the src attributes for apps, and I don't see anything special about the Marketplace app; it behaves the same as the rest of the apps. After flashing the device, the Homescreen loads the default icon (the rocket), tries to load the icon defined at build time (application-data), and finally, when the mozApps API returns the app info, loads the correct icon if it is different (in theory the same one loaded previously). In other words, I don't see any way that the Homescreen's code could create 160 strings that contain the "..."
Comment 60•11 years ago
> │ │ │ │ │ │ ├───11.68 MB (03.69%) ── string(length=9114, "...") [598]
> │ │ │ │ │ │ ├────9.34 MB (02.95%) ── string(length=7074, "...") [598]
> │ │ │ │ │ │ ├────9.34 MB (02.95%) ── string(length=7286, "...") [598]
> │ │ │ │ │ │ ├────9.34 MB (02.95%) ── string(length=8054, "...") [598]
> │ │ │ │ │ │ ├────4.67 MB (01.48%) ── string(length=2182, "...") [598]
> │ │ │ │ │ │ ├────4.67 MB (01.48%) ── string(length=2522, "...") [598]
> │ │ │ │ │ │ ├────4.67 MB (01.48%) ── string(length=3062, "...") [598]
> │ │ │ │ │ │ ├────4.67 MB (01.48%) ── string(length=3926, "...") [598]
> │ │ │ │ │ │ └────0.01 MB (00.00%) ── string(length=2499, "bssid // frequency // signal leve...")
Ouch, that's horrible. Like you said, these appear to be different strings, as they all have different lengths.
Comment 61•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #60)
>
> Ouch, that's horrible. Like you said, these appear to be different strings,
> as they all have different lengths.
If it's anything like what I saw in comment 45, the largest one is probably one of the wallpapers.
Unfortunately, the amount of string we log by default is just enough to show the basic PNG header. This patch increases the amount of logged string so we can at least identify what's leaking.
Comment 62•11 years ago
> If it's anything like what I saw in comment 45, the largest one is probably one of the wallpapers.
Indeed, but what's different here is that we have many copies of the big one, whereas earlier we had only a few.
Comment 63•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #62)
>
> Indeed, but what's different here is that we have many copies of the big
> one, whereas earlier we had only a few.
I was thinking about that; my best guess (for now) is that leo has more memory to leak into, so their tests can run longer, allowing it to accumulate more copies.
Comment 64•11 years ago
That sounds plausible to me!
Comment 65•11 years ago
(In reply to Mike Habicher [:mikeh] from comment #63)
> I was thinking about that; my best guess (for now) is that leo has more
> memory to leak into, so their tests can run longer, allowing it to
> accumulate more copies.
Confirmed, they have more memory and since the test is run via marionette it can run for hours (IIRC they took a memory dump every 2 hours).
(In reply to Mike Habicher [:mikeh] from comment #61)
> Unfortunately, the amount of string we log by default is just enough to show
> the basic PNG header. This patch increases the amount of logged string so we
> can at least identify what's leaking.
I'll ask them if they can re-run their tests with your patch applied so we get better visibility.
Comment 66•11 years ago
This memory report was taken after increasing the logged string length to 8k (Attachment #767906).
Comment 67•11 years ago
Well, one of these is a Twitter logo...
Have you tried get_about_memory.py --minimize? Can we check that that doesn't make these strings go away?
Comment 68•11 years ago
Okay, I got tired of doing this by hand, so here's a script. Just stick it somewhere on your path, cd into the folder with the 'memory-reports' file, and run it. It will spit out one file for each unique image in the report.
Running it against the log in comment 66, I see:
- five icons that look like the sun
- three Twitter icons
- two Facebook icons
- two Wikipedia icons
- a Marketplace icon
- an icon that looks like the top of a gold frame, or something
- and a JPEG that doesn't contain enough non-header data to decode
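The script described above isn't reproduced here. As a hypothetical sketch of the same idea (not the attachment), assuming the report contains the images as unescaped `data:image/<type>;base64,...` strings, one can decode each distinct payload and write it to its own file. Payloads cut off by the report are trimmed to whole 4-character base64 groups so they still decode, which is why an entry like the truncated JPEG above would only yield an incomplete file.

```python
import base64
import binascii
import hashlib
import re

# base64 payload of a data: image URI; stops at the first non-base64 character.
DATA_URI_RE = re.compile(r"data:image/([a-z]+);base64,([A-Za-z0-9+/]+=*)")

def extract_unique_images(text):
    """Return {sha1_hex: (image_type, raw_bytes)} for each distinct data: image payload."""
    images = {}
    for match in DATA_URI_RE.finditer(text):
        image_type, payload = match.group(1), match.group(2)
        if len(payload) % 4:
            # Report strings can be cut off mid-payload; keep whole 4-char groups only.
            payload = payload.rstrip("=")
            payload = payload[: len(payload) - len(payload) % 4]
        if not payload:
            continue
        try:
            raw = base64.b64decode(payload)
        except binascii.Error:
            continue
        images.setdefault(hashlib.sha1(raw).hexdigest(), (image_type, raw))
    return images

def write_images(images, prefix="image"):
    """Write each unique image to <prefix>-<n>.<type> and return the file names."""
    names = []
    for n, (image_type, raw) in enumerate(images.values()):
        name = f"{prefix}-{n}.{image_type}"
        with open(name, "wb") as f:
            f.write(raw)
        names.append(name)
    return names
```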
I wonder if, like the Marketplace icon, all of these are defined in their application manifests as "~~~~~~"
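Comment 69•11 years ago
All apps that were added at the operator's request have their icons set in the manifest like "~~~~~~".
Is this any clue for this issue?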
Comment 70•11 years ago
(In reply to jongsoo.oh from comment #69)
> All apps that added for request by operator is set to manifest like as
> "~~~~~~"
>
> This one is any clue for this issue?
Yes, it's a very important clue. So we have to check what the system app is doing with the application manifest, as Mike suggested in comment 56. Considering that switching apps seems to cause the issue, it might be happening somewhere in the window manager; a quick look over the code shows that it's using the app's icon, see here:
https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L1271
... and here:
https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L446
Though from a superficial look that code seems to be running only at application startup and thus shouldn't be leaking the icon.
Comment 71•11 years ago
gsvelto, I see this happening on the v1.0.1 branch, and the code looks different. Can you suggest some places to look in here?
https://github.com/mozilla-b2g/gaia/blob/v1.0.1/apps/system/js/window_manager.js
I was thinking of commenting out the equivalent of the lines you mention in comment 70 and seeing if the leak still occurs.
Comment 72•11 years ago
If we ran this test with principal merging disabled, that will probably tell us which JSM/JS component the leak is coming from.
The pref to flip is jsloader.reuseGlobal. Set it to false, e.g. in b2g/app/b2g.js. Then get a new memory report; instead of putting the strings under Compartment([System Principal]), hopefully they'll be under something else.
> So we have to check what the system app is doing with the application manifest
The leak is probably in something from Gecko; that's why it's under System Principal and not under the system app in about:memory.
Alternatively we can just look at whatever touches manifests...
Comment 73•11 years ago
Actually, we hardcoded jsloader.reuseGlobal to true in B2G. You need to set it to false in mozJSComponentLoader.cpp. Search for "reuseGlobal" and look right below that.
Comment 74•11 years ago
|
Comment 75•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #74)
>
> Also, if you test this again, please check whether get_about_memory.py
> --minimize gets rid of the strings, after you observe that they're there.
I've done that with reports in the past, and it didn't make any difference--at least, not to the data:image strings.
Comment 76•11 years ago
Comment 77•11 years ago
jlebar, here is the latest set of memory reports. I don't see anything obvious in them, but hopefully you can make some sense of them. They were obtained with:
diff --git a/b2g/app/b2g.js b/b2g/app/b2g.js
--- a/b2g/app/b2g.js
+++ b/b2g/app/b2g.js
@@ -678,17 +678,17 @@ pref("network.activity.blipIntervalMilli
// By default we want the NetworkManager service to manage Gecko's offline
// status for us according to the state of Wifi/cellular data connections.
// In some environments, such as the emulator or hardware with other network
// connectivity, this is not desireable, however, in which case this pref
// can be flipped to false.
pref("network.gonk.manage-offline-status", true);
-pref("jsloader.reuseGlobal", true);
+pref("jsloader.reuseGlobal", false);
// Enable font inflation for browser tab content.
pref("font.size.inflation.minTwips", 120);
// And disable it for lingering master-process UI.
pref("font.size.inflation.disabledInMasterProcess", true);
// Enable freeing dirty pages when minimizing memory; this reduces memory
// consumption when applications are sent to the background.
diff --git a/js/xpconnect/loader/mozJSComponentLoader.cpp b/js/xpconnect/loader/mozJSComponentLoader.cpp
--- a/js/xpconnect/loader/mozJSComponentLoader.cpp
+++ b/js/xpconnect/loader/mozJSComponentLoader.cpp
@@ -450,17 +450,17 @@ mozJSComponentLoader::ReallyInit()
{
nsresult rv;
mReuseLoaderGlobal = Preferences::GetBool("jsloader.reuseGlobal");
// XXXkhuey B2G child processes have some sort of preferences race that
// results in getting the wrong value.
#ifdef MOZ_B2G
- mReuseLoaderGlobal = true;
+ // mReuseLoaderGlobal = true;
#endif
/*
* Get the JSRuntime from the runtime svc, if possible.
* We keep a reference around, because it's a Bad Thing if the runtime
* service gets shut down before we're done. Bad!
*/
Attachment #769115 -
Flags: feedback?(justin.lebar+bug)
Comment 78•11 years ago
(In reply to Mike Habicher [:mikeh] from comment #75)
> (In reply to Justin Lebar [:jlebar] from comment #74)
> >
> > Also, if you test this again, please check whether get_about_memory.py
> > --minimize gets rid of the strings, after you observe that they're there.
>
> I've done that with reports in the past, and it didn't make any
> difference--at least, not to the data:image strings.
Okay, great. Thanks!
Comment 79•11 years ago
> ├──28.82 MB (36.70%) -- js-non-window
> │ ├──21.95 MB (27.96%) -- compartments
> │ │ ├──20.43 MB (26.02%) -- non-window-global
> │ │ │ ├──11.19 MB (14.25%) ++ (111 tiny)
> │ │ │ ├───6.24 MB (07.95%) -- compartment([System Principal], resource://gre/modules/DOMRequestHelper.jsm)
> │ │ │ │ ├──2.40 MB (03.06%) -- gc-heap
> │ │ │ │ │ ├──1.30 MB (01.65%) ++ (5 tiny)
> │ │ │ │ │ └──1.10 MB (01.41%) ── unused-gc-things
> │ │ │ │ ├──2.25 MB (02.86%) -- string-chars
> │ │ │ │ │ ├──1.29 MB (01.64%) ── non-huge
> │ │ │ │ │ └──0.96 MB (01.22%) ── huge/string(length=9114, ") (***)
> │ │ │ │ ├──1.43 MB (01.82%) -- objects-extra
> │ │ │ │ │ ├──1.42 MB (01.81%) ── slots
> │ │ │ │ │ └──0.01 MB (00.01%) ── elements
> │ │ │ │ └──0.16 MB (00.21%) ++ (3 tiny)
> │ │ │ ├───2.02 MB (02.57%) -- compartment([System Principal], jar:file:///system/b2g/omni.ja!/components/Webapps.js)
> │ │ │ │ ├──0.70 MB (00.89%) ++ gc-heap
> │ │ │ │ ├──0.57 MB (00.73%) ── string-chars/non-huge (***)
> │ │ │ │ ├──0.47 MB (00.59%) ── objects-extra/slots
> │ │ │ │ ├──0.25 MB (00.32%) ── cross-compartment-wrappers
> │ │ │ │ ├──0.02 MB (00.02%) ── script-data
> │ │ │ │ └──0.02 MB (00.02%) ── other-sundries
> │ │ │ └───0.98 MB (01.25%) ++ compartment([System Principal], chrome://browser/content/shell.xul)
> │ │ └───1.52 MB (01.94%) -- no-global/compartment(atoms)
> │ │ ├──0.98 MB (01.24%) -- string-chars
> │ │ │ ├──0.89 MB (01.14%) ── non-huge
> │ │ │ └──0.09 MB (00.11%) ── huge/string(length=44737, ")
This shows 0.96 MB coming from, presumably, multiple copies of a length-9114 string in the DOMRequestHelper.jsm compartment. That's probably the culprit.
It also shows a lot of non-huge strings in Webapps.js, which may be relevant.
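A back-of-the-envelope check on "multiple copies": the report quoted earlier lists 598 copies of the length-9114 string totalling 11.68 MB, about 20 KiB per copy. That matches two bytes per character rounded up to whole 4 KiB pages, but the rounding is an inference from the numbers, not something the reports state. Treating about:memory's MB as MiB and applying that assumption, the 0.96 MB here works out to roughly 49 copies:

```python
import math

MIB = 1024 * 1024
PAGE = 4096  # assumption: huge string buffers are rounded up to whole 4 KiB pages

def bytes_per_copy(total_mib, copies):
    """Average size of one copy, from an about:memory total and its [N] copy count."""
    return total_mib * MIB / copies

def estimated_size(length, bytes_per_char=2):
    """Assumed allocation size for a string of `length` two-byte characters."""
    return math.ceil(length * bytes_per_char / PAGE) * PAGE

def estimated_copies(total_mib, length):
    """Approximate number of copies behind an about:memory string total."""
    return round(total_mib * MIB / estimated_size(length))

print(round(bytes_per_copy(11.68, 598)))  # 20481, i.e. about 20 KiB per copy
print(estimated_size(9114))               # 20480
print(estimated_copies(0.96, 9114))       # 49
```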
Comment 80•11 years ago
Maybe we're leaking DOMRequests from somewhere.
Comment 81•11 years ago
I may have figured this out. Let me give you a patch to look at.
Comment 82•11 years ago
Sorry, jlebar: this has your patch applied and the test still OOMs. memory-reports attached.
Comment 83•11 years ago
Can you point me to instructions for running this workload locally?
Comment 84•11 years ago
This leak causes stability problems on the leo device.
The image data contained in manifests like the one below might be causing a memory leak:
"~~~~~~"
When we remove the apps mentioned above, leo becomes more stable.
Comment 85•11 years ago
Have we established that the leak itself is caused by switching from an app to another? I did some quick tests last week but - strangely enough - couldn't reproduce the issue. IMHO the first thing we should nail down is why (and where) we're reading all the app manifests; we'll probably be able to drill down to the leak from there.
:jeffhwang confirmed that in their tests they had multiple apps installed with the icon being specified as a data URL in the manifest and all of those were leaked. So we're obviously touching all the manifests and even if there wasn't an actual leak I wouldn't understand why we're doing it.
Comment 86•11 years ago
> Sorry, jlebar: this has your patch applied and the tests still OoMs. memory-reports attached.
Were you testing on b2g18? I discovered that, although the patch applies there, it has no effect on that branch.
Comment 87•11 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #85)
> IMHO the first thing we should nail down is
> why (and where) we're reading all the app manifests; we'll probably be able
> to drill down to the leak from there.
I agree with you that it doesn't need to touch all the manifest files.
But it seems to be doing that while we run Marionette tests.
At the beginning of testing, we used manifest files that had inline icons in them, and we got a crash on every device.
Afterwards, we removed the inline icons from the manifests and tested again, and the result is remarkable: we have only had one crashed device out of 22 so far.
> we'll probably be able to drill down to the leak from there.
Gabriele, if that is what's happening, have you found out why Marionette reads all the manifests so many times?
Flags: needinfo?(gsvelto)
Comment 88•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #86)
>
> Were you testing on b2g18? I discovered that, although the patch applies
> there, it has no effect on that branch.
Yes, I was testing on b2g18. I can retest against m-c, assuming it's stable enough to run the stress test.
If you want to run the test yourself:
# git clone https://github.com/rwood-moz/gaia-ui-tests.git gaiastress
# cd gaiastress
# git pull origin gaiastress
# cd gaiatest
# cp testvars_template.json bug851626.json
-- edit bug851626.json to add |"acknowledged_risks": true,| to the top of the JSON object
# adb forward tcp:2828 tcp:2828
# gaiatest --type b2g --address localhost:2828 --testvars bug851626.json --restart --iterations=100 --checkpoint=10 tests/endurance/test_endurance_gallery_camera.py
--iterations: the number of times to repeat the gallery<-->camera switch test
--checkpoint: log the output of |adb shell b2g-ps| every this-number of iterations
(Those steps are reconstructed from my CLI history--ping me on IRC if you run into any issues, and I'll do my best to help you sort them out.)
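The manual JSON edit in those steps can also be scripted. A minimal Python sketch, assuming `testvars_template.json` holds a single JSON object; it copies the template and puts `"acknowledged_risks": true` first, replacing any existing value for that key.

```python
import json

def make_testvars(template_path, output_path):
    """Copy a gaiatest testvars template, adding "acknowledged_risks": true at the top."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    testvars = {"acknowledged_risks": True}
    testvars.update((k, v) for k, v in template.items() if k != "acknowledged_risks")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(testvars, f, indent=2)
        f.write("\n")
    return testvars
```

Usage, from the gaiatest directory: `make_testvars("testvars_template.json", "bug851626.json")`.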
Comment 89•11 years ago
BTW, gaiatest will warn you of this (and give you 30s to cancel) but I'll call it out here: THE STEPS IN COMMENT 88 WILL RESET THE DATA ON YOUR DEVICE, INCLUDING PICTURES ON THE uSD CARD.
Comment 90•11 years ago
Master is totally unusable, so I guess I'll try to backport this patch to b2g18. :-/
Comment 91•11 years ago
Okay, I got my device to work on master.
I can reproduce the leak using marionette, but doing the same thing manually, I can't. So maybe this is another marionette leak.
Comment 92•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #91)
> I can reproduce the leak using marionette, but doing the same thing
> manually, I can't. So maybe this is another marionette leak.
I agree, maybe this is another Marionette leak.
When I run it with Marionette, I can see the icon resources defined in manifest.webapp increasing.
But when I do it manually, I don't see the same thing.
Comment 93•11 years ago
Before the Marionette test:
│ │ │ │ │ ├──1.03 MB (02.83%) -- huge
│ │ │ │ │ │ ├──0.19 MB (00.52%) ── string(length=23510, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.43%) ── string(length=18514, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.43%) ── string(length=20438, "...") [4]
│ │ │ │ │ │ ├──0.10 MB (00.27%) ── string(length=9114, "...") [5]
│ │ │ │ │ │ ├──0.09 MB (00.26%) ── string(length=10914, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.26%) ── string(length=7074, "...") [6]
│ │ │ │ │ │ ├──0.08 MB (00.21%) ── string(length=8054, "...") [5]
│ │ │ │ │ │ ├──0.05 MB (00.13%) ── string(length=4654, "...") [4]
│ │ │ │ │ │ ├──0.04 MB (00.11%) ── string(length=2522, "...") [5]
│ │ │ │ │ │ ├──0.04 MB (00.11%) ── string(length=3062, "...") [5]
│ │ │ │ │ │ └──0.04 MB (00.11%) ── string(length=3926, "...") [5]
After testing with Marionette:
│ │ │ │ │ ├──1.55 MB (03.72%) -- huge
│ │ │ │ │ │ ├──0.23 MB (00.56%) ── string(length=9114, "...") [12]
│ │ │ │ │ │ ├──0.20 MB (00.49%) ── string(length=7074, "...") [13]
│ │ │ │ │ │ ├──0.19 MB (00.45%) ── string(length=23510, "...") [4]
│ │ │ │ │ │ ├──0.19 MB (00.45%) ── string(length=8054, "...") [12]
│ │ │ │ │ │ ├──0.16 MB (00.38%) ── string(length=18514, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.38%) ── string(length=20438, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=10914, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=2522, "...") [12]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=3062, "...") [12]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=3926, "...") [12]
│ │ │ │ │ │ └──0.05 MB (00.11%) ── string(length=4654, "...") [4]
Comment 94•11 years ago
Running the gallery-camera stress test on an m-c/master build, the test actually completed; though the b2g parent process had ballooned to the point where only it and the Gallery app could fit in memory at the same time.
After 100 iterations, I see 205 copies of the Marketplace app icon, ~2 per iteration.
(jlebar, this is without your DOMRequest fixes--I'll run that test overnight.)
I know next to nothing about how marionette works, but I wonder how it could be very specifically leaking data: URI icons.
Comment 95•11 years ago
Marionette might be causing the icons in manifests like "~~~~~~" to leak.
Is Marionette handled by the QA team?
Flags: needinfo?(tchung)
Comment 96•11 years ago
The original issue was reported/found by a manual test (without marionette) though, correct?
Comment 97•11 years ago
(In reply to Rob Wood [:rwood] from comment #96)
> The original issue was reported/found by a manual test (without marionette)
> though, correct?
It's possible that the original issue manifests as a different leak than what we're seeing with marionette.
Comment 98•11 years ago
If you do
$ grep data:image/png gc-edges.762.1372815764.log | cut -f 4 -d ' ' | sort | uniq -c
you'll see that there are 28 copies each of two unique long png strings.
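For anyone without a Unix shell handy, a Python sketch that mimics that pipeline. It keeps `cut`'s 1-indexed, single-space field splitting, so "field 4" means the same thing; what that field actually contains depends on the GC log format.

```python
from collections import Counter

def count_field_values(lines, needle="data:image/png", field=4):
    """Mimic `grep NEEDLE | cut -f FIELD -d ' ' | sort | uniq -c` on an iterable of lines."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        parts = line.rstrip("\n").split(" ")
        if len(parts) == 1:
            key = parts[0]          # cut prints lines without the delimiter unchanged
        elif len(parts) >= field:
            key = parts[field - 1]  # cut fields are 1-indexed
        else:
            key = ""                # delimiter present but too few fields
        counts[key] += 1
    return counts

# Usage (file name from the comment above):
#   with open("gc-edges.762.1372815764.log", encoding="utf-8", errors="replace") as f:
#       for value, n in sorted(count_field_values(f).items()):
#           print(n, value[:60])
```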
Comment 99•11 years ago
According to these GC logs, what's happening here is that we're leaking WebappsApplication objects.
These objects each keep a ref to the app's manifest. The manifest keeps a ref to the icon. The icon string is not deduplicated. Therefore we leak an icon string for each WebappsApplication object we hold alive.
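A loose Python analogy of that retention chain (not Gecko code; `FakeWebappsApplication` is a hypothetical stand-in): each object parses its own copy of the manifest, so every object kept alive pins a separate copy of the inline icon. Deduplicating the string, here with `sys.intern`, makes the retained objects share one copy, although the objects themselves would still be leaked.

```python
import json
import sys

# Hypothetical manifest with an inline icon of roughly the size seen in this bug.
MANIFEST_TEXT = json.dumps({
    "name": "Marketplace",
    "icons": {"64": "data:image/png;base64," + "A" * 9000},
})

class FakeWebappsApplication:
    """Hypothetical stand-in for an object that holds its own parsed manifest."""

    def __init__(self, manifest_text, dedupe=False):
        self.manifest = json.loads(manifest_text)  # fresh dict and strings per object
        if dedupe:
            icons = self.manifest["icons"]
            for size in list(icons):
                icons[size] = sys.intern(icons[size])  # all objects share one copy

def distinct_icon_copies(apps):
    """Count distinct icon string objects kept alive by `apps`."""
    return len({id(app.manifest["icons"]["64"]) for app in apps})

retained = [FakeWebappsApplication(MANIFEST_TEXT) for _ in range(28)]
shared = [FakeWebappsApplication(MANIFEST_TEXT, dedupe=True) for _ in range(28)]
print(distinct_icon_copies(retained))  # 28: one icon copy per retained object
print(distinct_icon_copies(shared))    # 1
```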
Comment 100•11 years ago
Per comment 96, please check whether the leak is reproducible when performing the steps manually, to help narrow down the issue.
Flags: needinfo?(tchung)
Comment 101•11 years ago
I did, comment 91.
See the dependent bugs here; we have a decent idea of what's going on.
Flags: needinfo?(gsvelto)
Comment 102•11 years ago
I think FFOS might have two different issues.
One is the Marionette leak, in which the inline icons are duplicated.
The other is a possible leak in the B2G process (bug 889261).
We have to separate them.
Comment 103•11 years ago
Indeed. You did the right thing by filing a separate bug; it's important to have one bug for each issue so we don't conflate them.
Updated•11 years ago
Whiteboard: [mozilla-triage][MemShrink:P2] → [mozilla-triage][MemShrink:P2] [TD-59414]
Target Milestone: --- → 1.1 QE4 (15jul)
Comment 104•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #97)
> (In reply to Rob Wood [:rwood] from comment #96)
> > The original issue was reported/found by a manual test (without marionette)
> > though, correct?
>
> It's possible that the original issue manifests as a different leak than
> what we're seeing with marionette.
We knew that Bug 886217 is the same issue as Bug 851626.
But Bug 886217 is a Marionette issue, so I reopened Bug 886217.
Let's handle the icon duplication issue in Marionette in Bug 886217.
Updated•11 years ago
Attachment #769115 -
Flags: feedback?(justin.lebar+bug)
Comment 105•11 years ago
(In reply to Mike Habicher [:mikeh] PTO until Aug 5 from comment #94)
> Running the gallery-camera stress test on an m-c/master build, the test
> actually completed; though the b2g parent process had ballooned to the point
> where only it and the Gallery app could fit in memory at the same time.
>
> After 100 iterations, I see 205 copies of the Marketplace app icon, ~2 per
> iteration.
>
> (jlebar, this is without your DOMRequest fixes--I'll run that test
> overnight.)
>
> I know next to nothing about how marionette works, but I wonder how it could
> be very specifically leaking data: URI icons.
Is there a copy of this stress test that I could use to see if this is potentially Marionette-related?
Comment 106•11 years ago
Hey Jonathan, Mike is referring to the gallery_camera endurance test:
https://github.com/rwood-moz/gaia-ui-tests/blob/gaiastress/gaiatest/tests/endurance/test_endurance_gallery_camera.py
Comment 107•11 years ago
perfect, thanks
Comment 108•11 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #105)
> (In reply to Mike Habicher [:mikeh] PTO until Aug 5 from comment #94)
> > Running the gallery-camera stress test on an m-c/master build, the test
> > actually completed; though the b2g parent process had ballooned to the point
> > where only it and the Gallery app could fit in memory at the same time.
> >
> >
> > I know next to nothing about how marionette works, but I wonder how it could
> > be very specifically leaking data: URI icons.
>
> Is there a copy of this stress test that I could use to see if this is
> potentially Marionette-related?
rwood pointed the test out to me. This is a pretty simple test, so I'm going to construct an orangutan version of it, which should help us determine whether the problem is related to Marionette or not. If it is, we then have to figure out whether any of the Gaia APIs that Marionette calls are involved.
Comment 109•11 years ago
I ran this test on mozilla-b2g18/v1-train on an inari (I don't have a leo). After 60 iterations, the gc-edges file shows hundreds of copies of the Marketplace icon and the icon for the HostStubTest app.
I doubt this has anything to do with core Marionette, but it may involve the gaiatest atoms. I will write a version of the test which doesn't use them and see if this persists.
Comment 110•11 years ago
I made a simplified version of this test using pure Marionette, without gaiatest, and found that it does not leak application icons. So the problem isn't in Marionette per se, but either in gaiatest or in the Gaia APIs it uses. I'll narrow it down further.
Comment 111•11 years ago
Pure Marionette version of camera/gallery test, without gaiatest
Comment 112•11 years ago
I've narrowed this down to one of the WebAPIs that gaiatest uses. Adding or removing this line of code (without using the return value anywhere) is enough to trigger or resolve the icon leak:
let appsReq = navigator.mozApps.mgmt.getAll();
I'll make as simple a test as I can manage to reproduce this, then file a separate bug.
Comment 113•11 years ago
> Adding or removing this line of code (without using the return value anywhere) is enough
> to trigger or resolve the icon leak:
Yeesh, that's really bad, if you don't have to use the return value.
Please cc Fabrice on the new bug and mark it as [MemShrink].
Comment 114•11 years ago
(In reply to Justin Lebar [:jlebar] from comment #113)
> > Adding or removing this line of code (without using the return value anywhere) is enough
> > to trigger or resolve the icon leak:
>
> Yeesh, that's really bad, if you don't have to use the return value.
>
> Please cc Fabrice on the new bug and mark it as [MemShrink].
Also - file the bug in Core --> DOM: Apps specifically.
Comment 115•11 years ago
This isn't a 1.01 regression and has been stagnating for a while. Do we really need to block on this?
blocking-b2g: leo+ → leo?
Comment 116•11 years ago
I'm hopeful that bug 900221 will fix this. But that shouldn't have much bearing either way on the blocking status.
Comment 118•11 years ago
Rob, Jonathan: are we still seeing this issue with the endurance tests?
Flags: needinfo?(rwood)
Flags: needinfo?(jgriffin)
Comment 119•11 years ago
Unfortunately I was able to reproduce this crash three times today by running the gallery_camera endurance test on Inari with the latest master build. Each time b2g crashed before the 45th iteration of switching between the gallery and camera.
Flags: needinfo?(rwood)
Flags: needinfo?(jgriffin)
Comment 120•11 years ago
Okay, I wanted to see if this was reproducible manually, and it is--kind of.
After approximately 110 screen taps (or about 55 full camera-gallery cycles) the screen on the test Inari I borrowed from rwood went black--it looks like the backlight turned off as well.
When I plugged in a USB cable to pull the logcat, I ran 'adb shell b2g-ps' which reported:
APPLICATION      USER      PID   PPID  VSIZE   RSS    WCHAN     PC         NAME
b2g              root      111   1     173924  67024  ffffffff  4001b4e0 S /system/b2g/b2g
Usage            app_335   335   111   66312   20792  ffffffff  401384e0 S /system/b2g/plugin-container
Homescreen       app_343   343   111   68416   24296  ffffffff  4005c4e0 S /system/b2g/plugin-container
Camera           app_410   410   111   88596   32316  ffffffff  40113abc R /system/b2g/plugin-container
Gallery          app_435   435   111   78064   26908  ffffffff  401094e0 S /system/b2g/plugin-container
(Preallocated a  root      452   111   63168   17184  ffffffff  4001b4e0 S /system/b2g/plugin-container
...all of the processes still active! None had crashed.
After some time, I noticed that the button backlight turned on. Pressing the power button once turned the button backlight off; pressing it again caused the lockscreen to come up properly!
Unlocking the device took me back into the camera; pressing the gallery button caused the screen to go black again, as above. Again, I was able to unlock the device, this time landing in the gallery. Hitting the camera button caused the screen to go black and _this_ time the b2g parent process crashed.
Updated•11 years ago
blocking-b2g: - → koi?
Comment 121•11 years ago
I'm going to remove the dependency on bug 897684, since I can reproduce this issue manually.
Status: NEW → ASSIGNED
No longer depends on: 897684
Comment 122•11 years ago
With the following b2g18 build, I am unable to observe any memory leaks after 150 manually-triggered Camera<-->Gallery cycles:
- gecko: b2g18:3655fe17b75b
- gaia: 763757e133a4fa8b0cb49f35a8e6b6700c0bf345
==> BASELINE:
14:00:58 ➜ gaia adb shell b2g-ps
APPLICATION      USER      PID   PPID  VSIZE   RSS    WCHAN     PC         NAME
b2g              root      965   1     212352  65400  ffffffff  4007a430 S /system/b2g/b2g
Homescreen       app_1033  1033  965   74708   28728  ffffffff  40102430 S /system/b2g/plugin-container
Camera           app_1080  1080  965   69828   25000  ffffffff  400bd430 S /system/b2g/plugin-container
Gallery          app_1113  1113  965   89748   29300  ffffffff  400c8430 S /system/b2g/plugin-container
(Preallocated a  root      1140  965   63316   21308  ffffffff  40060430 S /system/b2g/plugin-container
==> AFTER 50 APP CYCLES (or 100 app switches):
14:36:48 ➜ btg030_hamachi-b2g18 git:(master) ✗ adb shell b2g-ps
APPLICATION      USER      PID   PPID  VSIZE   RSS    WCHAN     PC         NAME
b2g              root      139   1     187332  59028  ffffffff  400c5430 S /system/b2g/b2g
Usage            app_365   365   139   66452   25348  ffffffff  4005d430 S /system/b2g/plugin-container
Homescreen       app_435   435   139   71640   29536  ffffffff  400bb430 S /system/b2g/plugin-container
Camera           app_502   502   139   77708   26452  ffffffff  40076430 S /system/b2g/plugin-container
Gallery          app_600   600   139   69584   26700  ffffffff  400ea430 S /system/b2g/plugin-container
(Preallocated a  root      701   139   63312   21608  ffffffff  40055430 S /system/b2g/plugin-container
==> AFTER 150 APP CYCLES (or 300 app switches):
14:56:45 ➜ btg030_hamachi-b2g18 git:(master) ✗ adb shell b2g-ps
APPLICATION      USER      PID   PPID  VSIZE   RSS    WCHAN     PC         NAME
b2g              root      139   1     187524  57740  ffffffff  400c5430 S /system/b2g/b2g
Usage            app_365   365   139   66452   23092  ffffffff  4005d430 S /system/b2g/plugin-container
Homescreen       app_435   435   139   71640   26768  ffffffff  400bb430 S /system/b2g/plugin-container
Camera           app_502   502   139   78776   25680  ffffffff  40076430 S /system/b2g/plugin-container
Gallery          app_600   600   139   69584   25140  ffffffff  400ea430 S /system/b2g/plugin-container
(Preallocated a  root      701   139   63312   19232  ffffffff  40055430 S /system/b2g/plugin-container
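To compare snapshots like these without eyeballing them, here is a minimal Python sketch that parses `b2g-ps` output into a per-application RSS map and diffs two snapshots. RSS is kept in whatever unit b2g-ps prints (presumably KiB). The sketch assumes the column layout shown above, where application names may contain spaces, as in "(Preallocated a".

```python
def parse_b2g_ps(output):
    """Map application name -> RSS column from `adb shell b2g-ps` text."""
    rss = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows end with 9 fixed columns: USER PID PPID VSIZE RSS WCHAN PC STATE NAME;
        # shorter lines (the header, shell prompts) are skipped.
        if len(parts) < 10 or parts[0] == "APPLICATION":
            continue
        try:
            value = int(parts[-5])
        except ValueError:
            continue
        rss[" ".join(parts[:-9])] = value
    return rss

def rss_delta(before, after):
    """RSS change per application present in both snapshots."""
    return {app: after[app] - before[app] for app in before if app in after}
```

For example, the b2g row goes from 65400 at baseline to 57740 after 150 cycles, a delta of -7660.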
Comment 123•11 years ago
If 26 is affected, 27 is likely affected as well.
Comment 124•11 years ago
Based on previous discussions, moving this to koi+
blocking-b2g: koi? → koi+
Comment 125•11 years ago
Hema, any progress on this bug? It hasn't been commented on since 9/24.
Flags: needinfo?(hkoka)
Comment 126•11 years ago
Hoping to get some help from the perf team on this bug (mikeh is busy with the camera latency bugs).
Flags: needinfo?(mlee)
Comment 127•11 years ago
Kyle, is anyone on the MemShrink team able to help with this issue?
Flags: needinfo?(mlee) → needinfo?(khuey)
Priority: -- → P2
Target Milestone: 1.1 QE4 (15jul) → 1.2 C3(Oct25)
Updated•11 years ago
Flags: needinfo?(hkoka)
Updated•11 years ago
Status: ASSIGNED → NEW
Comment 129•11 years ago
(In reply to Hema Koka [:hema] from comment #124)
> Based on previous discussions, moving this to koi+
Where did this discussion happen? I don't see why the rationale from comment 115 no longer applies.
Flags: needinfo?(hkoka)
Comment 130•11 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #129)
> (In reply to Hema Koka [:hema] from comment #124)
> > Based on previous discussions, moving this to koi+
>
> Where did this discussion happen? I don't see why the rationale from
> comment 115 no longer applies.
Agreed. Moving to koi?, as we need to cut back on what we're blocking on for the release at this point anyway. We've shipped two releases with this bug already.
blocking-b2g: koi+ → koi?
Comment 131•11 years ago
Hema Koka deleted the linked story in Pivotal Tracker
Comment 132•11 years ago
Moving it out of koi? -- If we start seeing this frequently, we can renominate it back (from comment 122)
blocking-b2g: koi? → ---
Comment 133•11 years ago
MemShrink doesn't have bandwidth to look into non-koi+ bugs ourselves right now. If the situation changes here we can reevaluate.
Flags: needinfo?(khuey)
Updated•11 years ago
Flags: needinfo?(hkoka)
Comment 134•11 years ago
mikeh: does this still reproduce, or can we close this?
Flags: needinfo?(mhabicher)
Comment 135•11 years ago
Rob, do we still see this endurance issue?
Flags: needinfo?(mhabicher) → needinfo?(rwood)
Comment 136•11 years ago
This test is now obsolete. In 1.4 and master/2.0 you can no longer switch back to the gallery from within the camera by a single button press. In 1.4/2.0 you need to take a photo, view the preview, click a menu, and then choose to switch to gallery from the preview.
I will close this bug as wontfix. If/when I update the endurance test for 2.0 and this issue is seen again, I will open a new bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(rwood)
Resolution: --- → WONTFIX