Closed
Bug 1254777
Opened 9 years ago
Closed 4 years ago
[meta] Reduce size of writable static data
Categories
(Core :: General, defect)
Core
General
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox48 | --- | affected |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: meta, Whiteboard: [overhead:meta])
Attachments
(2 files, 6 obsolete files)
Writable static data (e.g. non-const static variables, vtables, etc.) can take up significant memory, and it is duplicated in every process. This is a meta-bug for improving this.
How I'm finding these (on Linux), thanks to glandium's help:
- `readelf -W -l libxul.so` gives the sections; from the LOAD RW section I can
see the relevant segments.
- `objdump -t -j .dynamic -j .got -j .got.plt -j .data -j .jcr -j .tm_clone_table -j .fini_array -j .init_array -j .kPStaticModules -j .data.rel.ro.local -j .data.rel.ro -j .bss libxul.so` gives all the writable data symbols. (The segments were obtained from the previous command.)
- `sort --key=5,6 -r -n` sorts the objdump output so the largest symbols are shown first.
Assignee | ||
Comment 1•9 years ago
|
||
Assignee | ||
Comment 2•9 years ago
|
||
Here's the same data but with symbol names demangled (using objdump's -C flag).
Attachment #8728168 -
Attachment is obsolete: true
Assignee | ||
Comment 3•9 years ago
|
||
This one actually is demangled.
Assignee | ||
Updated•9 years ago
|
Attachment #8728170 -
Attachment is obsolete: true
Comment 4•9 years ago
|
||
Those ff_{sin,cos} symbols come from libav here:
https://dxr.mozilla.org/mozilla-central/rev/4657041c6f77b36b1fb9647c28f53f4f51757360/media/libav/libavcodec/fft_template.c#37
https://dxr.mozilla.org/mozilla-central/rev/4657041c6f77b36b1fb9647c28f53f4f51757360/media/libav/libavcodec/rdft.c#31
They are literally sine and cosine lookup tables, but they're readable because they're calculated at runtime. Upstream libav has host programs that can generate these tables at build-time, which makes them const:
https://github.com/libav/libav/blob/master/libavcodec/cos_tablegen.c
Integrating that into our build system is probably a PITA because we don't have a good way to wire up a HOST_PROGRAM to GENERATED_FILES, but that program is also pretty simple and we could almost certainly rewrite it in Python.
Updated•9 years ago
|
Blocks: e10s-multi
Comment 5•9 years ago
|
||
Is the first column size? Because it doesn't seem to be sorted correctly. 000000000028b850 is followed by 00000000050753d0, for instance. Sorry if I'm confused about something here.
I looked at sAttributes_specs. It is static const, and is marked as .data.rel.ro.local. According to a Stack Overflow answer I found [1], it sounds like this is because it contains function pointers, which can be relocated. Is this something that we can or should change?
[1] http://stackoverflow.com/questions/7029734/what-is-the-data-rel-ro-used-for
Comment 6•9 years ago
|
||
The fifth column is size; the first column is the address of the symbol within the library.
Comment 7•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #5)
> I looked at sAttributes_specs. It is static const, and is marked as
> .data.rel.ro.local. According to a Stack Overflow answer I found [1], it
> sounds like this is because it contains function pointers, which can be
> relocated. Is this something that we can or should change?
We have to hand off function pointers to the JS engine at some point. Bug 786819 comment 4 lays out a scheme for having fewer function pointers, though.
There's a bug I filed on how to lay out some of the datstructures so we stop having so many nullptrs everwhere. I can't find it right now, though. :(
Assignee | ||
Updated•9 years ago
|
Whiteboard: [MemShrink]
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> Those ff_{sin,cos} symbols come from libav here:
One complication: the latest attachment shows symbols from all the .so files in $OBJDIR/dist/bin/ on a Linux64 build. But some of them are dynamically loaded on demand. E.g. avcodec_options is in libmozavcodec.so which is dynamically loaded when we play a VP9 video.
ff_{sin,cos} are in liblgpllibs.so, I'm not sure how that one gets loaded.
Assignee | ||
Comment 9•9 years ago
|
||
Here's a description of the different segment types.
* .text: read-only code.
* .rodata: non-zero read-only data.
* .bss: initially-all-zero writable data.
* .data (no pointers): non-zero writable data. Often it's actually read-only in
practice, in which case |const| can be added, changing it to .rodata, which is
better than .data because it allows it to be shared between processes.
* .data (with pointers): like ".data (no pointers)", but relocatable. If |const|
is added it becomes .data.rel.ro, which is better than .data because (a) the
compiler might be able to optimize it, and (b) the data cannot be modified
accidentally/maliciously (if the platform supports this).
* .data.rel.ro, .data.rel.ro.local: read-only data that contains pointers and so
is relocatable.
There are a few other kinds, but they don't take up much space.
> -----------------------------------------------------------------------------
> Type Uses disk Uses address Uses physical Shared between
> space? space? memory? processes?
> -----------------------------------------------------------------------------
> .text Y Y if touched Y
> .rodata Y Y if touched Y
> .bss - Y if touched -
> .data(no ptrs) Y Y probably[1] -
> .data(w/ptrs) Y Y always[2] -
> .data.rel.ro* Y Y always -
> -----------------------------------------------------------------------------
[1]: because it probably shares pages with ".data(w/ptrs)" symbols
[2]: because applying relocations requires touching the memory
Assignee | ||
Comment 10•9 years ago
|
||
Here's the script I'm using to get these results. I've tweaked it to:
- include all symbols except for .text ones (.rodata ones are interesting, even if they can be shared)
- cut out the boring stuff at the start of each line
Just run it in a directory containing .so files, such as $OBJDIR/dist/bin/, and it'll put the results in .so.syms files, and a summary of everything in all.syms.
Linux only!
Assignee | ||
Comment 11•9 years ago
|
||
Updated version, obtained with the attached script.
Attachment #8728341 -
Attachment is obsolete: true
Comment 12•9 years ago
|
||
> .data.rel.ro.local 0000000000012310 kSTSPreloadList
We ought to be able to significantly reduce the size of this table by applying the same techniques the effective tld service uses, which will also make it shareable across processes. Will eliminate a number of relocations, too.
Comment 13•9 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #8)
> ff_{sin,cos} are in liblgpllibs.so, I'm not sure how that one gets loaded.
It's just linked directly to libxul:
$ LD_LIBRARY_PATH=`pwd` ldd libxul.so | grep liblgpl
liblgpllibs.so => /home/luser/firefox/liblgpllibs.so (0x00007f1a7c41a000)
(built as a standalone lib because it contains LGPL code)
Assignee | ||
Comment 14•9 years ago
|
||
I filed an upstream bug report (https://ssl.icu-project.org/trac/ticket/12390) to shrink the silly gDigitCount lookup table in ICU.
Assignee | ||
Comment 15•9 years ago
|
||
A fair chunk of the static data is from DOM bindings:
- sAttributes_specs/sAttributes_ids/sAttributes (and similar ones for methods, constants, etc)
- InterfaceObjectClass and PrototypeClass (which incorporate js::Class)
- sNativeProperties (which is usually mostly nullptrs)
- There is lots of duplicated data across worker and non-worker code, e.g. |namespace URLBinding| and |namespace
URLBinding_workers|
- sNativePropertyHooks
Bug 1011826 is about shrinking Prefable, which shrinks sAttributes/sMethods/sConstants/etc.
Depends on: 1011826
Assignee | ||
Comment 16•9 years ago
|
||
froydnj: as a follow-up to our conversation today I looked more closely at vtables. There are 12,276 of them. They take up 2,564,936 bytes! That's roughly half of our static data.
I looked for frequently-occurring patterns in the names, which suggest possibly
fruitful inheritance hierarchies. Here's a list...
- Foo:cycleCollection 770
- Ends with "Event" 607
- Ends with "Impl" 566
- Ends with "Accessible 365
- Ends with "Parent" 353
- Ends with "Child" 320
- nsRunnableMethodImpl<T> 285
- Ends with "Element" 278
- Ends with "Callback" 269
- PDocAccessible::{Msg,Reply}_* 259
- Ends with "Runnable" 245
- PContent::* 233
- Ends with "Frame" 199
- Ends with "Listener" 174
- Ends with "Stream" 165
- MozPromise<T> 152
- Ends with "Channel" 149
- Ends with "Request" 143
- Ends with "Handler" 141
- Runnable{Function,Method} 126
- Ends with "Effect" 102
- Foo::Compiler 100
- Ends with "Command" 74
- Ends with "Layer" 70
- Ends with "Type" 63
- Ends with "Document" 53
- PointerClearer<T> 53
- Ends with "Controller" 50
Assignee | ||
Comment 17•9 years ago
|
||
(needinfo'ing to ensure you see comment 16)
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 18•9 years ago
|
||
I also looked more carefully at the DOM bindings structs. bz, you'll be interested in this.
- DOMIfaceAndProtoJSClass sInterfaceObjectClass: 748 x 240 B = 179,520 B (.d.r.ro.l)
- DOMIfaceAndProtoJSClass sPrototypeClass: 794 x 240 B = 190,560 B (.d.r.ro.l)
- DOMJSClass sClass: 773 x 264 B = 204,072 B (.d.r.ro.l)
- These sizes are *after* bug 1259124's patches are applied, which saved
over 200,000 B overall.
- The same trick from that bug should be applicable a couple more times,
saving that much again and more.
- s{Attributes,Methods,...}_specs: 414,888 (.data.rel.ro.local)
- Depends on JSPropertySpec; shrinking that looks tricky
- s{Attributes,Methods,...}_ids: 81,152 (.bss)
- Each one is only a word, so shrinking seems difficult
- s{Attributes,Methods,...}: 49,388 (.data)
- Bug 1011826 shrank this by 2/3 already, hard to do more
- s{Attributes,Methods,...}_disablers: 11,720 (.data)
- Bug 1011826 introduced these
- sNativeProperties: 739 x 176 B == 130,064 B total (.data.rel.ro.local)
- An inefficient structure. Making it variable-length should reduce it to
1/3 its current size or less, saving 90--100 KB.
- *_[gs]etterinfo: 16 x 6796 (in total) = 108,736 (.data.rel.ro.local)
- Looks hard to shrink
- sNativePropertyHooks: 792 x 48 B = 38,016 B (.data.rel.ro.local)
- Looks hard to shrink
That gives a total of 1,408,116 bytes, and that's after cutting about 300 KB in bug
1259124 (pending) and bug 1011826 (landed). The savings identified above should give
us another 300 KB or more. I'd love to hear any other ideas.
Flags: needinfo?(bzbarsky)
Assignee | ||
Comment 19•9 years ago
|
||
Looks like I missed all the 'sChrome' variants of the DOM bindings stuff, so some of the items above will be under-measured.
Comment 20•9 years ago
|
||
>- DOMIfaceAndProtoJSClass sInterfaceObjectClass: 748 x 240 B = 179,520 B (.d.r.ro.l)
>- DOMIfaceAndProtoJSClass sPrototypeClass: 794 x 240 B = 190,560 B (.d.r.ro.l)
What's particularly annoying is that the JSClass parts of these fall into one of a small number of buckets. But we can't easily share those, because the JSClass* is what the JS engine is pointing to, so they need to be distinct pointers for cases when our non-JSClass bits differ.
> - These sizes are *after* bug 1259124's patches are applied, which saved
> over 200,000 B overall.
I assume you mean bug 1259194? That sort of approach does seem likely to be fruitful if we can figure out which things are typically the same across all this stuff.
>- s{Attributes,Methods,...}_ids: 81,152 (.bss)
> - Each one is only a word, so shrinking seems difficult
So... I _think_ this is only used for Xray code, and only used for finding "the index of the thing with the given jsid, so we can use that as an index into our JSPropertySpec/JSFunctionSpec array".
If we had a way to walk a JSPropertySpec[] or JSFunctionSpec[] and do fast-ish compares of a given jsid to the const char* name in there , I think we could nix the *_ids arrays completely. We'd need to be a little careful about symbol IDs, but I think that should not be too bad.
One other thing we could do if it's worth it: the mPrototypeID in DOMIfaceAndProtoJSClass is the same as mPrototypeID in NativePropertyHooks as far as I can tell. So we could nix it from DOMIfaceAndProtoJSClass and have an extra pointer-chase at the consumers, if they're not too hot (and I expect they're not). We could also move mDepth into the NativePropertyHooks so we don't have two copies of it lying around in the common case (one in the interface object class, one in the prototype object class). Again, not sure whether this helps in practice given how DOMIfaceAndProtoJSClass packs.
Updated•9 years ago
|
Flags: needinfo?(bzbarsky)
Comment 21•9 years ago
|
||
A non-exhaustive checking of these:
(In reply to Nicholas Nethercote [:njn] from comment #16)
> - Foo:cycleCollection 770
These are likely pretty tight; I think glandium spent some time optimizing these.
> - Ends with "Event" 607
These don't seem to fit nicely into one category: there's mozilla::dom::Event subclasses, which I'm not sure we can do a whole lot about, things that are sort of like nsIRunnable, but more specialized, and WebIDL events (?), which are also similarly intractable.
> - Ends with "Impl" 566
A fair number of these seem to be *JSImpl things from the WebIDL bindings, which are not easily minimized. Some of these also hail from webrtc:: as well.
> - Ends with "Accessible 365
I do not get nearly that many:
froydnj@thor:/opt/build/froydnj/build-debug$ readelf -sW toolkit/library/libxul.so |c++filt|egrep -c 'vtable.*Accessible$'
93
Am I doing something wrong?
> - Ends with "Parent" 353
> - Ends with "Child" 320
I mentioned in our conversion that IPDL/IPC code sets up its class hierarchies a little oddly; we could probably eliminate some of these, or at least trim them down, but it would take some work.
> - nsRunnableMethodImpl<T> 285
Not much to be done here, I think.
> - Ends with "Element" 278
> - Ends with "Callback" 269
> - PDocAccessible::{Msg,Reply}_* 259
I filed bug 1259428 for this and similar things. I think we can get a nice win here.
> - Ends with "Channel" 149
I'm not seeing this many, and the ones that I do appear to be mostly nsI* classes. We don't care much about those vtables, as --gc-sections (not turned on for local builds, AIUI) will take care of those.
> - Ends with "Request" 143
> - Ends with "Handler" 141
> - Runnable{Function,Method} 126
Not a lot we can do here. It would be nice if the C++ side of things didn't need full nsIRunnable objects, but could instead get by with objects having just a Run() method (I know the graphics folks mentioned this when I asked why they implemented a separate thread pool instead of reusing nsThreadPool...), but I think that's intractable.
Flags: needinfo?(nfroyd)
Comment 22•9 years ago
|
||
> and WebIDL events (?), which are also similarly intractable.
Wait. Shouldn't WebIDL events just have the same vtable a dom::Event? I guess not in the cases when they need to do their own cycle collection... I just checked and for Web IDL events we have 32 that have something to cycle collect and 42 that don't have anything to cycle collect, if I'm measuring right. Also, I guess virtual destructors could cause problems here. :(
Assignee | ||
Comment 23•9 years ago
|
||
> > - Ends with "Accessible 365
>
> I do not get nearly that many:
>
> froydnj@thor:/opt/build/froydnj/build-debug$ readelf -sW
> toolkit/library/libxul.so |c++filt|egrep -c 'vtable.*Accessible$'
> 93
>
> Am I doing something wrong?
For all my searches I did "Foo\>" rather than "Foo$". For a lot of the Accessible ones I think there was a template parameter like "Accessible<T>".
Comment 24•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #21)
> A non-exhaustive checking of these:
>
> (In reply to Nicholas Nethercote [:njn] from comment #16)
> > - Foo:cycleCollection 770
>
> These are likely pretty tight; I think glandium spent some time optimizing
> these.
I optimized the macros. I didn't look from the angle of static data usage.
Assignee | ||
Comment 25•9 years ago
|
||
> - sNativeProperties: 739 x 176 B == 130,064 B total (.data.rel.ro.local)
> - An inefficient structure. Making it variable-length should reduce it to
> 1/3 its current size or less, saving 90--100 KB.
I filed bug 1260653 for this. The draft patch saves 110 KB.
Assignee | ||
Comment 26•9 years ago
|
||
> >- s{Attributes,Methods,...}_ids: 81,152 (.bss)
> > - Each one is only a word, so shrinking seems difficult
>
> So... I _think_ this is only used for Xray code, and only used for finding
> "the index of the thing with the given jsid, so we can use that as an index
> into our JSPropertySpec/JSFunctionSpec array".
>
> If we had a way to walk a JSPropertySpec[] or JSFunctionSpec[] and do
> fast-ish compares of a given jsid to the const char* name in there , I think
> we could nix the *_ids arrays completely. We'd need to be a little careful
> about symbol IDs, but I think that should not be too bad.
I filed bug 1261726 for this one, with extra information from our IRC discussion. I doubt I will get to this one any time soon, bz, so please take it if you want it.
Assignee | ||
Comment 27•9 years ago
|
||
Another one worth noting, even though it's a read-only table and thus shared...
In media/libjpeg/, jpeg_nbits_table[] is 64 KiB, and it is just a table implementation of the CLZ intrinsic. The comment USE_CLZ_INTRINSIC documents this, and why the intrinsic is only used on ARM by default (some x86 implementations are slow?). This option is GCC/clang-only, there's nothing supporting use of intrinsics for MSVC.
Assignee | ||
Comment 28•9 years ago
|
||
Another read-only one:
There's an option SMALL_TABLE which reduces the prime_tab table from 51.1 KiB to 1 KiB. I don't know how hot this table is and thus whether it would hurt performance. Also, it's in freebl, which is in NSS, so setting that constant may be difficult.
Comment 29•9 years ago
|
||
I think `codesighs` used to report metrics like this, but it rotted due to toolchain changes over time. Would it be useful to generate this data as part of the build, and report the sum to perfherder so we could graph it over time?
Comment 30•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] (Vacation May 4th-11th) from comment #29)
> I think `codesighs` used to report metrics like this, but it rotted due to
> toolchain changes over time. Would it be useful to generate this data as
> part of the build, and report the sum to perfherder so we could graph it
> over time?
Yes! How difficult is that to do?
Assignee | ||
Comment 31•9 years ago
|
||
Just running |size| on libxul.so on Linux and parsing the output should be pretty easy.
Assignee | ||
Comment 32•8 years ago
|
||
Here's a slightly tweaked version of the script.
Assignee | ||
Updated•8 years ago
|
Attachment #8728737 -
Attachment is obsolete: true
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Assignee | ||
Updated•7 years ago
|
Attachment #8921707 -
Attachment is obsolete: true
Attachment #8921707 -
Flags: feedback?(nfroyd)
Assignee | ||
Comment 35•7 years ago
|
||
This version of the script has better comments about how to use it.
Attachment #8769526 -
Attachment is obsolete: true
Updated•7 years ago
|
Blocks: memshrink-content
Comment 36•7 years ago
|
||
It may be worth revisiting whether VS2017 has fixed the "const data goes in a writable section" bug and/or whether new instances of it have popped up.
Depends on: 1334254
Assignee | ||
Updated•7 years ago
|
Attachment #8956657 -
Attachment mime type: application/x-shellscript → text/plain
Updated•6 years ago
|
Summary: Reduce size of writable static data → [meta] Reduce size of writable static data
Updated•6 years ago
|
Whiteboard: [overhead:meta]
Assignee | ||
Comment 37•4 years ago
|
||
This bug depends on a single open bug, so I think it has served its purpose.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•