Closed
Bug 802446
Opened 12 years ago
Closed 12 years ago
B2G memshrink brainstorming bug
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: justin.lebar+bug, Unassigned)
References
(Depends on 2 open bugs)
Details
Attachments
(3 files)
In today's MemShrink meeting we decided we wanted a place to dissect memory dumps and find opportunities for improvement. That's this bug.
I'll post some about:memory dumps from the device, but feel free to ask for specific use cases or provide your own.
To generate an about:memory dump, do the following:
* Update your B2G checkout (git fetch and merge; ./repo sync isn't sufficient)
* Run ./get-about-memory.py
* gunzip merged-reports.gz
* Open nightly on your desktop and load the file into about:memory (see button at the bottom of about:memory)
* Copy-paste the text into a file and attach it here.
Comment 1 • 12 years ago (Reporter)
== DUMP 1 ==
I loaded a few apps but didn't interact with them much. I loaded mozilla.org into the browser.
Comment 2 • 12 years ago (Reporter)
Comment 3 • 12 years ago (Reporter)
You can load this file into about:memory on your machine.
Comment 4 • 12 years ago
One idea I had was that we could take that hugetastical list of compartments, post it on dev.platform, and see if people have ideas of things we could get rid of. It would reach a broad audience. But I don't know how many people are going to dig through that list, so maybe such a scattershot approach won't be effective.
Comment 5 • 12 years ago
jlebar, can you mail dev.platform and point them to this bug?
Comment 6 • 12 years ago (Reporter)
(In reply to Nicholas Nethercote [:njn] from comment #5)
> jlebar, can you mail dev.platform and point them to this bug?
Can we get a few days' analysis here first? We're nominally the experts in understanding what these numbers mean.
Comment 7 • 12 years ago
Initial thoughts:
- Shared libraries are big. Fortunately the PSS numbers are substantially lower than the RSS numbers.
- JS dominates among the Gecko stuff. DOM and layout hardly matter in comparison.
- 1 MiB of xpti-working-set per process is terrible. Bug 799658 is open for that.
- heap-unclassified continues to be annoyingly high. Recent reporter fixes (esp. bug 799796) should help a bit.
Comment 8 • 12 years ago (Reporter)
Another idea I had was to basically grep through the processes' private address spaces to see whether there's a lot of other memory we might be able to share. But the trick would be identifying the owner of a page once we've found a candidate for sharing.
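A starting point for the measurement side, assuming Linux's /proc filesystem (a sketch of my own, not existing B2G tooling): sum a process's Private_Dirty from smaps to see how much candidate memory there is, before worrying about who owns each page.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
  // Accept a pid on the command line; default to our own process.
  std::string pid = argc > 1 ? argv[1] : "self";
  std::ifstream smaps("/proc/" + pid + "/smaps");
  std::string line;
  long totalKb = 0;
  while (std::getline(smaps, line)) {
    // smaps reports one "Private_Dirty: N kB" line per mapping.
    if (line.compare(0, 14, "Private_Dirty:") == 0) {
      std::istringstream fields(line.substr(14));
      long kb = 0;
      fields >> kb;
      totalKb += kb;
    }
  }
  std::cout << "Private_Dirty total: " << totalKb << " kB\n";
}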
Comment 9 • 12 years ago
I find that analysis-temporary accounts for a large share of memory usage. I ran some simple tests: cutting the default chunk size of LifoAlloc (LIFO_ALLOC_PRIMARY_CHUNK_SIZE) from 128K to 32K saves a lot of memory (5+%). If we freed analysis-temporary more aggressively, it would save more; 8~10%, I'd guess.
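To see why the chunk size matters, here is a toy sketch (my own illustration, not the real SpiderMonkey LifoAlloc API): a LIFO arena commits whole chunks, so the unused tail of the newest chunk is pure overhead, and that tail shrinks with the chunk size.

#include <cstddef>
#include <cstdio>
#include <vector>

struct LifoArena {
  std::size_t chunkSize;          // e.g. 128 KiB vs. 32 KiB
  std::vector<char*> chunks;      // every chunk is fully committed
  std::size_t usedInLast = 0;     // bytes handed out from the newest chunk

  explicit LifoArena(std::size_t cs) : chunkSize(cs) {}
  ~LifoArena() { for (char* c : chunks) delete[] c; }

  void* alloc(std::size_t n) {
    if (chunks.empty() || usedInLast + n > chunkSize) {
      chunks.push_back(new char[chunkSize]);  // commit a whole new chunk
      usedInLast = 0;
    }
    void* p = chunks.back() + usedInLast;
    usedInLast += n;
    return p;
  }
  // Committed-but-unused bytes: the slack in the final chunk.
  std::size_t slack() const { return chunks.empty() ? 0 : chunkSize - usedInLast; }
};

int main() {
  for (std::size_t cs : {128 * 1024u, 32 * 1024u}) {
    LifoArena a(cs);
    for (int i = 0; i < 100; ++i) a.alloc(300);   // ~30KB of small requests
    std::printf("chunk=%zuK committed=%zuK slack=%zuK\n",
                cs / 1024, a.chunks.size() * cs / 1024, a.slack() / 1024);
  }
}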
Comment 10 • 12 years ago
Does the nsEffectiveTLDService need to be running in all processes? Judging from DMDV's output it is, and on 64-bit builds it's slightly more than 128 KiB per process.
Comment 11 • 12 years ago
The chrome process spends a lot of space on huge strings. Most of them hold data: URIs, totaling roughly 7 MB. The following compartments use data: URIs:
- BrowserElementParent.js (1.6MB)
- contentSecurityPolicy.js (1.41MB)
- CSPUtils.js (1.1MB)
- system app (2.81MB)
Nearly all of them are image data.
Comment 12 • 12 years ago (Reporter)
> The chrome process spends a lot of space on huge strings.
> Nearly all of them are image data.
Some of these at least are screenshots, which we're tracking in bug 798002 and dependencies. But CSPUtils.js using screenshots sounds unlikely to me, so I dunno what that is.
It would be relatively easy to get a dump of all large strings and their associated compartments. If you still see a lot of huge strings after we fix bug 802647, let me know and I'll work on this.
> Does the nsEffectiveTLDService need to be running in all processes?
We ought to be able to proxy those calls to the parent process; I can't imagine we make many calls into it.
I also have to imagine that we could compress its data structures. (I say this without having ever looked at this code, but just in general... :)
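On the compression point, one possible direction (a sketch under my own assumptions, not how nsEffectiveTLDService actually stores its data): keep the suffix list as one sorted, NUL-separated string pool plus an offset table, both constant, so every process shares a single read-only copy.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical three-entry sample; the real list has thousands of suffixes.
static const char kPool[] = "co.uk\0com\0org\0";     // sorted, NUL-separated
static const std::uint32_t kOffsets[] = {0, 6, 10};  // start of each entry

bool isKnownSuffix(const std::string& s) {
  // Binary search over the offset table, comparing into the shared pool.
  const std::uint32_t* end = std::end(kOffsets);
  const std::uint32_t* it = std::lower_bound(
      std::begin(kOffsets), end, s,
      [](std::uint32_t off, const std::string& key) {
        return std::strcmp(kPool + off, key.c_str()) < 0;
      });
  return it != end && std::strcmp(kPool + *it, s.c_str()) == 0;
}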
Comment 13 • 12 years ago
I'm seeing this a lot:
1 block(s) in record 1 of 12897
262,144 bytes (262,112 requested / 32 slop)
1.61% of the heap (1.61% cumulative unreported)
malloc (vg_replace_malloc.c:270)
moz_xmalloc (mozalloc.cpp:54)
operator new[](unsigned long) (mozalloc.h:200)
std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
IPC::Channel::ChannelImpl::ProcessIncomingMessages() (ipc_channel_posix.cc:496)
IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (ipc_channel_posix.cc:747)
base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (message_pump_libevent.cc:213)
event_process_active (event.c:385)
event_base_loop (event.c:522)
base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_pump_libevent.cc:331)
MessageLoop::RunInternal() (message_loop.cc:215)
MessageLoop::RunHandler() (message_loop.cc:208)
MessageLoop::Run() (message_loop.cc:182)
base::Thread::ThreadMain() (thread.cc:156)
ThreadFunc(void*) (platform_thread_posix.cc:39)
start_thread (pthread_create.c:308)
clone (clone.S:112)
This looks like an IPC buffer. Is it expected that it's (often) this big? Unfortunately it uses std::string which means we can't really measure it in memory reports.
Comment 14 • 12 years ago
I'm also seeing lots of variants on this within a single process:
Unreported: 28 block(s) in record 3 of 12897
114,688 bytes (60,256 requested / 54,432 slop)
0.70% of the heap (3.15% cumulative unreported)
at 0x402C2AF: malloc (vg_replace_malloc.c:270)
by 0x418E03B: moz_xmalloc (mozalloc.cpp:54)
by 0x54C1611: operator new[](unsigned long) (mozalloc.h:200)
by 0x57ABE9E: nsJAR::nsJAR() (nsJAR.cpp:92)
by 0x57B00E8: nsZipReaderCache::GetZip(nsIFile*, nsIZipReader**) (nsJAR.cpp:1092)
by 0x57B409E: nsJARChannel::CreateJarInput(nsIZipReaderCache*) (nsJARChannel.cpp:276)
by 0x57B4845: nsJARChannel::EnsureJarInput(bool) (nsJARChannel.cpp:357)
by 0x57B55FA: nsJARChannel::AsyncOpen(nsIStreamListener*, nsISupports*) (nsJARChannel.cpp:702)
by 0x582F077: imgLoader::LoadImage(nsIURI*, nsIURI*, nsIURI*, nsIPrincipal*, nsILoadGroup*, imgINotificationObserver*, nsISupports*, unsigned int, nsISupports*, imgIRequest*, nsIChannelPolicy*, imgIRequest**) (imgLoader.cpp:1716)
by 0x5C5959D: nsContentUtils::LoadImage(nsIURI*, nsIDocument*, nsIPrincipal*, nsIURI*, imgINotificationObserver*, int, imgIRequest**) (nsContentUtils.cpp:2764)
by 0x5CF992A: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:664)
by 0x5CF9475: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
by 0x5EDED9A: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
by 0x5E94D7E: nsGenericHTMLElement::SetAttr(int, nsIAtom*, nsAString_internal const&, bool) (nsGenericHTMLElement.h:245)
by 0x5E9DC7C: nsGenericHTMLElement::SetAttrHelper(nsIAtom*, nsAString_internal const&) (nsGenericHTMLElement.cpp:2871)
by 0x5EDE3E5: nsHTMLImageElement::SetSrc(nsAString_internal const&) (nsHTMLImageElement.cpp:114)
by 0x66ED8A2: nsIDOMHTMLImageElement_SetSrc(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (dom_quickstubs.cpp:13179)
by 0x7BD13A3: js::CallJSPropertyOpSetter(JSContext*, int (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>), JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (jscntxtinlines.h:450)
by 0x7BD26D3: js::Shape::set(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, bool, JS::MutableHandle<JS::Value>) (jsscopeinlines.h:333)
by 0x7BE624D: js_NativeSet(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, js::Shape*, bool, bool, JS::Value*) (jsobj.cpp:4284)
Lots of images are being loaded from JARs, and the unzipping requires memory(?) I don't see ones like this on desktop.
Comment 15 • 12 years ago
(In reply to Justin Lebar [:jlebar] from comment #12)
> > The chrome process spends a lot of space on huge strings.
> > Nearly all of them are image data.
>
> Some of these at least are screenshots, which we're tracking in bug 798002
It seems to be working for me. The huge strings have dropped dramatically. Only the system app still uses data: URIs for images.
Comment 16 • 12 years ago
(In reply to Thinker Li [:sinker] from comment #15)
> > Some of these at least are screenshots, which we're tracking in bug 798002
> It seems to be working for me. The huge strings have dropped dramatically.
> Only the system app still uses data: URIs for images.
The default background is stored as a data: URI in a setting. This is probably the one you're seeing.
Comment 17 • 12 years ago
heap-dirty is 2.2~3.5MB for every process. I tried to reduce it by lowering opt_dirty_max from its default of 1024 to 256; heap-dirty then dropped dramatically, to 0.5~0.8MB. I also measured boot time on the otoro, and I can't tell any difference before and after the change (25s for both).
opt_dirty_max can be set to 256 by adding a line of |export MALLOC_OPTIONS="ff"| to b2g.sh (each "f" halves the limit, so "ff" takes 1024 to 256).
Comment 18 • 12 years ago
(In reply to Thinker Li [:sinker] from comment #17)
> heap-dirty is 2.2~3.5MB for every process. I tried to reduce it by
> lowering opt_dirty_max from its default of 1024 to 256; heap-dirty then
> dropped dramatically, to 0.5~0.8MB. I also measured boot time on the
> otoro, and I can't tell any difference before and after the change (25s
> for both).
>
> opt_dirty_max can be set to 256 by adding a line of |export
> MALLOC_OPTIONS="ff"| to b2g.sh.
We're trying to tackle this issue in bug 805855, I'm currently working on a patch that will reduce opt_dirty_max as you suggest as well as clear it completely when apps are sent to the background.
Comment 19 • 12 years ago
(In reply to Nicholas Nethercote [:njn] from comment #14)
> Lots of images are being loaded from JARs, and the unzipping requires
> memory(?)
Yes, that is to be expected: nsJAR creates an instance of nsZipArchive, which in turn uses zlib for decompression. The comment here states that this requires 9520 + 32768 bytes per decompression:
http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsZipArchive.cpp#73
BTW, this seems inconsistent with zlib's own documentation, which states 11520 + 32768 bytes (see the Memory Footprint section):
http://www.zlib.net/zlib_tech.html
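For a quick sanity check of those two figures (a sketch only; the exact inflate state size varies across zlib versions):

#include <cstdio>

int main() {
  const int windowBits = 15;          // zlib default: 32 KiB sliding window
  const int window = 1 << windowBits; // 32768 bytes
  // Per-inflate state: 9520 bytes per the nsZipArchive.cpp comment,
  // 11520 bytes per zlib_tech.html; the 2000-byte gap is the discrepancy.
  for (int state : {9520, 11520})
    std::printf("state=%d -> footprint=%d bytes (~%d KiB)\n",
                state, state + window, (state + window) / 1024);
  return 0;
}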
Comment 20 • 12 years ago (Reporter)
(In reply to Fabrice Desré [:fabrice] from comment #16)
> The default background is stored as a data: URI in a setting. This is
> probably the one you're seeing.
I filed bug 806374.
> This looks like an IPC buffer. Is it expected that it's (often) this big? Unfortunately
> it uses std::string which means we can't really measure it in memory reports.
We should be able to use a custom allocator? I filed bug 806377.
> Lots of images are being loaded from JARs, and the unzipping requires memory(?)
Compressing images in JARs sounds pretty dumb. I wonder whether, if we stored the images in the jar with zero compression, we'd still spin up the gzip instances.
I filed bug 806379 for the dark matter and bug 806383 for reducing the memory usage here somehow.
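Following up the custom-allocator suggestion: a minimal sketch, assuming a global byte counter that a memory reporter could read. The names here (gIpcBufferBytes, CountingAllocator, IpcString) are hypothetical, not existing Gecko or Chromium IPC APIs.

#include <atomic>
#include <cstddef>
#include <string>

// Live heap bytes held by IPC strings; a memory reporter would read this.
std::atomic<std::size_t> gIpcBufferBytes{0};

template <class T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <class U> CountingAllocator(const CountingAllocator<U>&) {}

  T* allocate(std::size_t n) {
    gIpcBufferBytes += n * sizeof(T);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) {
    gIpcBufferBytes -= n * sizeof(T);
    ::operator delete(p);
  }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Drop-in replacement for the std::string the channel uses; its buffer
// allocations now show up in gIpcBufferBytes.
using IpcString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;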
Comment 21 • 12 years ago
The per-process overhead is non-trivial. Some examples (warning, 64-bit build which overstates things somewhat):
├─────415,464 B (02.27%) -- layout
│ ├──365,320 B (01.99%) ── style-sheet-cache
│ └───50,144 B (00.27%) ── style-sheet-service
├─────407,232 B (02.22%) -- xpcom
│ ├──231,296 B (01.26%) ── component-manager
│ ├──135,264 B (00.74%) ── effective-TLD-service
│ └───40,672 B (00.22%) ── category-manager
├─────350,096 B (01.91%) ── atom-tables
├─────171,576 B (00.94%) ── xpconnect
├─────165,760 B (00.91%) ── script-namespace-manager
├─────165,264 B (00.90%) ── preferences
├──────36,864 B (00.20%) ── cycle-collector/collector-object
├──────21,712 B (00.12%) ── telemetry
This is from the clock app, where (presumably) a lot of this stuff isn't exactly necessary.
Comment 22 • 12 years ago
(In reply to Nicholas Nethercote [:njn] from comment #21)
> The per-process overhead is non-trivial. Some examples (warning, 64-bit
> build which overstates things somewhat):
>
> ├─────415,464 B (02.27%) -- layout
> │ ├──365,320 B (01.99%) ── style-sheet-cache
I dug into this some more. Here are the sizes of each of the seven sheets within the cache:
mFormsSheet: 66888
mFullScreenOverrideSheet: 752
mQuirkSheet: 48472
mScrollbarsSheet: 21152
mUASheet: 222504
mUserChromeSheet: 0
mUserContentSheet: 0
The UASheet is easily the biggest. I wonder if it can be made smaller?
Comment 23 • 12 years ago
(I forgot to mention that the style-sheet-cache numbers are the same for every process.)
Comment 24 • 12 years ago
> I wonder if it can be made smaller?
I wonder how much of the space is ua.css itself vs html.css and xul.css (which it imports). I'll bet money xul.css is the main reason this is taking so much space. :(
Comment 25 • 12 years ago (Reporter)
> I'll bet money xul.css is the main reason this is taking so much space. :(
We don't have any xul in B2G content processes, and we have very little xul in the B2G main process. Could we coalesce these files and then remove the unnecessary bits, or do you think that's a losing game?
Comment 26 • 12 years ago
> We don't have any xul in B2G content processes,
No scrollbars? No video controls?
I think getting data on whether my hunch is right would be good. If it is, we might be able to come up with a smaller xul.css for b2g, possibly.
Comment 27 • 12 years ago
We could also try to disable the system/user chunk separation for content processes.
During app startup we allocate 1MB for the chrome and 1MB for content JS heap (4MB if we consider the alignment code) but maybe we never allocate more than 1MB JS objects in the first place for common apps.
Comment 28 • 12 years ago (Reporter)
> During app startup we allocate 1MB for the chrome and 1MB for content JS heap (4MB if we consider
> the alignment code) but maybe we never allocate more than 1MB JS objects in the first place for
> common apps.
We should be careful not to conflate virtual memory usage and RSS. We allocate up to 4MB of virtual memory for these chunks, but much of that will not be committed.
In fact, if different compartments can't share arenas (pages, in the JS engine), I don't see how merging the chunks would make a difference in RSS.
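To make the reserve-vs-commit distinction concrete, a small Linux-only sketch (illustrative, not B2G code): mapping 4MB of anonymous memory grows virtual size immediately, but pages only count toward RSS once they are touched.

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
  const std::size_t size = 4 * 1024 * 1024;
  // Reserve 4MB of address space; no physical pages are committed yet.
  char* p = static_cast<char*>(mmap(nullptr, size, PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
  if (p == MAP_FAILED) return 1;
  // Touch one byte per page in the first 64KB: RSS grows by ~64KB, not 4MB.
  for (std::size_t i = 0; i < 64 * 1024; i += 4096) p[i] = 1;
  std::printf("mapped %zu bytes, dirtied 64 KiB\n", size);
  munmap(p, size);
  return 0;
}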
Comment 29 • 12 years ago
There are 4 coefficient tables computed at runtime:
Num: Value Size Type Bind Vis Ndx Name
194068: 012f3170 65536 OBJECT LOCAL DEFAULT 24 jpeg_nbits_table
180068: 012e0be4 65536 OBJECT LOCAL DEFAULT 24 _ZL17sPremultiplyTable
180066: 012d0be4 65536 OBJECT LOCAL DEFAULT 24 _ZL19sUnpremultiplyTable
14744: 012bb15c 41984 OBJECT LOCAL DEFAULT 24 _ZL18gUnicodeToGBKTable
http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#25
http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvcn/nsGBKConvUtil.cpp#18
They can easily be converted to constants, saving 233KB of .bss at the expense of ELF size. Is it worth it?
The following two dynamically allocated tables are actually redundant with the above, although they live in quite different source trees.
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3558
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3362
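A sketch of the constants conversion, using a hypothetical premultiply formula for illustration (C++14 constexpr here for brevity; a checked-in, pre-generated static const array achieves the same thing). Moving the table from runtime-filled .bss into .rodata means its pages are never dirtied, so every process shares one copy.

#include <cstdint>

struct PremultiplyTable {
  std::uint8_t v[256 * 256];   // 65536 bytes, matching the symbol above
};

// Computed at compile time, so the table lands in read-only data (.rodata)
// instead of being filled into writable .bss at startup.
constexpr PremultiplyTable makePremultiplyTable() {
  PremultiplyTable t{};
  for (int a = 0; a < 256; ++a)
    for (int x = 0; x < 256; ++x)
      t.v[a * 256 + x] = static_cast<std::uint8_t>((a * x + 127) / 255);
  return t;
}

constexpr PremultiplyTable sPremultiplyTable = makePremultiplyTable();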
Comment 30 • 12 years ago (Reporter)
> They can be easily converted to constants to save 233KB .bss at the expense of elf size.
> Is it worth it?
Probably, yes! Let's figure out the details in a new bug?
Comment 31 • 12 years ago
I think this bug has served its purpose. Current B2G memory consumption excitement is over in bug 837187. Come join the party.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME