Closed
Bug 492185
Opened 15 years ago
Closed 4 years ago
Find classes that are wasting memory due to alignment
Categories
(Developer Infrastructure :: Source Code Analysis, defect)
Developer Infrastructure
Source Code Analysis
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: Swatinem, Unassigned)
References
Details
(Keywords: memory-footprint, Whiteboard: [MemShrink:P3])
Attachments
(1 file, 10 obsolete files)
(deleted),
text/x-python
|
Details |
I have been hacking up a dehydra script that is calculating a classes "real" sizeof and comparing it to the final sizeof to determine if that class is wasting memory due to alignment.
The script works quite well as far as I can tell.
Reporter | ||
Comment 1•15 years ago
|
||
This is a filtered list of badly aligned classes. I will skim through it and try to fix some of the classes.
Comment 2•15 years ago
|
||
This awesome stuff. Who would've thought that getting this number is as simple as subtracting sizes of members from the struct size :) Whenever I thought about this issue I forgot that dehydra provides size_of.
/**
* Size of a pointer, assume as 8, TODO: how to get the *real* pointer size?
*/
var pointerSize = 8;
I presume this is 64bit
if(type.isPointer)
{
size.real = pointerSize;
I think you can do size.real = type.precision/8
For lazy bums like me, csv files are easier to open.
As I said before, awesome!
David, I believe you wanted to see some static analysis results on accessibility, this spreadsheet looks like a goldmine.
Blocks: static_analyses
Reporter | ||
Comment 3•15 years ago
|
||
> size.real = pointerSize;
> I think you can do size.real = type.precision/8
Yes indeed. I use pointerSize mainly for adding the size of the vtable pointer to classes.
Will see what else I can do there.
I wonder how much freedom I have in reordering members. I suppose I can't change anything in NSPR/NSS/frozen linkage due to ABI compatibility. Other classes should be fine though?
A quick look over some classes also revealed more optimization opportunities by changing away from PRBool (4byte) to PRPackedBool (1byte) or even straight to bool, as that is now used in some places in the codebase already.
Comment 4•15 years ago
|
||
> I wonder how much freedom I have in reordering members. I suppose I can't
> change anything in NSPR/NSS/frozen linkage due to ABI compatibility. Other
> classes should be fine though?
Think so.
>
> A quick look over some classes also revealed more optimization opportunities by
> changing away from PRBool (4byte) to PRPackedBool (1byte) or even straight to
> bool, as that is now used in some places in the codebase already.
PRPackedBool is probably easier since you don't have to add casts everywhere to switch to bool. I'm also not sure if all modules allow bool usage.
While you are at it could you do a check similar to the one mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=376434
Basically scan for structs that have both signed and unsigned bitfields and warn because MSVC packs them wrong.
Comment 5•15 years ago
|
||
(In reply to comment #2)
> David, I believe you wanted to see some static analysis results on
> accessibility, this spreadsheet looks like a goldmine.
Wow, crazy! Great work. What happens next?
Comment 6•15 years ago
|
||
(In reply to comment #5)
> (In reply to comment #2)
> > David, I believe you wanted to see some static analysis results on
> > accessibility, this spreadsheet looks like a goldmine.
>
> Wow, crazy! Great work. What happens next?
Would be nice if you could go through accessibility ones to verify the results are correct and maybe pick a couple that are frequently allocated as candidates to fixing.
Comment 7•15 years ago
|
||
Arpad: excellent work -- any clues from your tool how JSRuntime loses 92 bytes to internal fragmentation?
/be
Reporter | ||
Comment 8•15 years ago
|
||
(In reply to comment #7)
> Arpad: excellent work -- any clues from your tool how JSRuntime loses 92 bytes
> to internal fragmentation?
>
> /be
My tool currently doesn't print a list of members, but that would be easy to add.
Looking at the code, I can see some possible problems:
> uint32 gcMallocBytes;
> JSGCArenaInfo *gcUntracedArenaStackTop;
> JSBool rngInitialized;
> int64 rngMultiplier;
> uint32 requestCount;
> JSThread *gcThread;
And quite a few more... I'm on x86_64, and every time a 4byte member is followed by a 8byte member (or pointer) its wasting 4 bytes.
Reporter | ||
Comment 9•15 years ago
|
||
Started a thread on dev.platform: http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/52bc09157428dadb#
Reporter | ||
Comment 10•15 years ago
|
||
Attachment #376545 -
Attachment is obsolete: true
Comment 11•15 years ago
|
||
On 32-bit x86, misaligned doubles are allowed but slow down the program considerably. It might be useful to run this script with pointerSize=4 to find those as well.
Reporter | ||
Comment 12•15 years ago
|
||
This list includes a possible size if using C++ bools (which are 1byte as are PRPackedBools) and a list of direct members with their sizes (does not include inherited members from parent classes).
Attachment #376546 -
Attachment is obsolete: true
Reporter | ||
Comment 13•15 years ago
|
||
Attachment #376620 -
Attachment is obsolete: true
Reporter | ||
Comment 14•15 years ago
|
||
The script had some issues with inheritance and vtables, probably still has, I'm not quite sure.
Attachment #376682 -
Attachment is obsolete: true
Reporter | ||
Updated•15 years ago
|
Attachment #376728 -
Attachment mime type: application/octet-stream → text/plain
Comment 15•15 years ago
|
||
You might find this document helpful, if you haven't seen it already:
http://www.codesourcery.com/public/cxx-abi/abi.html#layout
It lays out things like the rules for where the vtable pointer appears and under what circumstances a derived class's members can be packed into the base class's tail padding. (Ignore mention of Itanium; GCC uses this ABI on all architectures nowadays, modulo things like pointer size and minimum required alignment.)
Reporter | ||
Comment 16•15 years ago
|
||
Still has problems with inheritance and vtable pointers though :(
Attachment #376683 -
Attachment is obsolete: true
Comment 17•15 years ago
|
||
Quick comments on looking at the list:
1) A number of these are services or one-per-document objects. Savings from
those will be small no matter what. It might not be worth making the
changes if it decreases readability by separating related members.
2) You might want to get an NSS person to look at those SEC* things near the
top of the list (as well as the other ones on it).
3) A bunch of per-element accessibility stuff here that could really use
fixing, if the numbers are right.
4) The frame and element classes in that list should be a high priority.
5) I'm not sure where some of these differences are coming from. For example,
The one for nsHTMLFormElement seems way too big; I would have expected
about 2 bytes of difference there. The class does have 5 vtable pointers
in addition to the ones inherited from the superclass; is that where the
other 40 bytes come from?
Minor output nit: can you include a percentage, or better yet just stick the results in a google spreadsheet and link to them?
Tying these results to trace-refcnt output from a Tp run would also be interesting, as far as looking at the maximum number of bytes saved during runtime.
Reporter | ||
Comment 19•15 years ago
|
||
This time I tried a different approach. Instead of taking the compiled sizeof() and messing around with vtable pointers, I rather look at the members and see how many GAPs are between them. I then calculate how much one can win by reordering the members or using 1byte bools.
This time, there aren't that many wins to be made. More wins could be made by cutting down the inheritance tree, consolidating Interfaces and remove unnecessary ones.
Here is the spreadsheet: http://spreadsheets.google.com/ccc?key=rVIrgklQt9Y2H1hnS87L1HA
Reporter | ||
Updated•15 years ago
|
Attachment #376728 -
Attachment is obsolete: true
Comment 20•15 years ago
|
||
The desire for a cheap, always-on memory profiler that puts allocations in coarse, non-hierarchical buckets (JS, DOM, HTML, XUL, CSS, ...) came up again, Ben Galbraith is leading the charge.
This goal reminded me of SGI's Power Bloat Vision(tm) tool, a graphical line-chart or (default UI) bar with color-coded segments jumping around as different kernel identified resource users of memory changed their shares. PBV really helped users zero in on bloat in their apps, as well as in Motif/X, the SGI graphical desktop, and Irix ;-).
Thinking about how to do it super-cheaply, I thought of this bug. We don't want to add malloc interception by default, but we use jemalloc on Windows and Linux, and may on Mac. We can hack jemalloc. Problem is we don't want to change malloc, new, etc. signatures to carry "bucket ids", nor do we wish to walk the stack on every allocation, even by one frame.
But suppose each bucket (subsystem) happened to allocate unique sizes? Then we could statically analyze the sizes and produce a mapping to burn into the build.
Ok, I'm sure different buckets have their own structs or classes that happen to have the same size. Still, we could find such types with static analysis, and even generate a patch to farble the size of one or the other to differ.
If this wastes space, we could do something trickier, like farble the size passed to malloc in a way that did not waste space (e.g., subtract from 0 to 3 to help distinguish 4 buckets' uses of the same size).
What say you static analysis folks? We need your northern-style kung fu!
/be
Comment 21•15 years ago
|
||
I will file a new bug if needed, I didn't mean to hijack this one -- sorry. Want to get a quick sanity check, figured this bug could use some action ;-).
/be
Reporter | ||
Comment 22•14 years ago
|
||
Attachment #376742 -
Attachment is obsolete: true
Reporter | ||
Comment 23•14 years ago
|
||
I have been running this again after a long time, there are quite a few offenders left, see the bug url field.
Assuming we fixed everything and every structure was tightly packed, what's the total net savings? (Both absolute and %)
Comment 25•14 years ago
|
||
(In reply to comment #24)
> Assuming we fixed everything and every structure was tightly packed, what's the
> total net savings? (Both absolute and %)
Savings would require instrumenting the build to figure out how often these datastructures are used.
Comment 26•14 years ago
|
||
Can we get the list sorted by win size or win%? I suspect that most of these classes are very infrequently used, but a few seem like they could be big wins. Some suggestions, if you want to do tryserver runs on a limited set:
nsHtml5StackNode (not actually stack-allocated, despite its name)
nsStyleVisibility
mozilla::dom::Link (there can be thousands of these per page, might improve Tp times)
nsHtml5TreeOperation
nsHtml5ElementName
nsHTMLElement (!)
For all of the IPDL-generated classes (P*), you should probably just file a bug on the code generator to emit the members in a better order.
It also looks like you're listing typedefs of classes, which adds a lot of duplicate data.
Reporter | ||
Comment 27•14 years ago
|
||
(In reply to comment #26)
> Can we get the list sorted by win size or win%?
"usingbool" means wins if PRBools are changed to PRPackedBools (or C++ bools) and is thus a better sorting column, especially for bool heavy Classes like nsHtml5StackNode. In extreme cases, using a bitfield may yield even better wins. What is the policy regarding such changes?
> It also looks like you're listing typedefs of classes, which adds a lot of
> duplicate data.
Yes, the script is far from being perfect :(
Comment 28•14 years ago
|
||
We're not going to switch to real bool until after FF4. Can we fix the easy wins I mentioned now?
Comment 29•14 years ago
|
||
Just FYI, Julian has a decent list of distributions of allocated objects from bug 551477. I'm gonna try cross-referencing that with this.
Reporter | ||
Comment 30•14 years ago
|
||
(In reply to comment #28)
> We're not going to switch to real bool until after FF4. Can we fix the easy
> wins I mentioned now?
mozilla::dom::Link does already use bool, and according to dehydra, they are 1 bit types instead of 1 byte, so there are possibly more wins to have. Will update my script accordingly.
The easy cases you mentioned are included in the patch, except for nsHTMLElement, which would require a rewrite of all the static initialization.
Attachment #467717 -
Flags: review?
Reporter | ||
Comment 31•14 years ago
|
||
(In reply to comment #30)
> mozilla::dom::Link does already use bool, and according to dehydra, they are 1
> bit types instead of 1 byte, so there are possibly more wins to have. Will
> update my script accordingly.
Dehydra reports a precision of 1bit for C++ bools, but GCC does not automatically fold adjacent bools into bitfields, would have to be done manually. I updated the table to assume bitfield packing for bools, also I hope to have fixed the duplicate typedefs.
Comment 32•14 years ago
|
||
You cannot and should not fold bools into bitfields because you can take their address. In some cases it can also be a performance hit because it involves a full read-write.
Updated•13 years ago
|
Whiteboard: [MemShrink:P3]
Updated•13 years ago
|
OS: Linux → All
Hardware: x86_64 → All
Comment 33•13 years ago
|
||
Comment on attachment 467717 [details] [diff] [review]
fix classes wasting memory
I'm going through unassigned review -- and am going to take the liberty of r-ing this based on comment 32.
Please update patch and request review again if this is still an issue.
Attachment #467717 -
Flags: review? → review-
Reporter | ||
Comment 34•12 years ago
|
||
Last time I tried, like half a year ago, neither my original dehydra script worked with the latest gcc/dehydra, nor pahole (http://lwn.net/Articles/206805/).
If someone else wants to take over, feel free to do so.
Reporter | ||
Comment 35•11 years ago
|
||
So apparently compilers have had `-Wpadded` warnings for some time. GCCs warnings are not that useful, those from clang are more so, they look like this:
(this is from `mach warnings-list`)
```
js/src/vm/Runtime.h:693:25 [-Wpadded] padding struct 'JSRuntime' with 7 bytes to align 'operationCallback'
js/src/vm/Runtime.h:764:12 [-Wpadded] padding struct 'JSRuntime' with 6 bytes to align 'numExclusiveThreads'
js/src/vm/Runtime.h:814:11 [-Wpadded] padding struct 'JSRuntime' with 4 bytes to align 'ownerThread_'
js/src/vm/Runtime.h:958:25 [-Wpadded] padding struct 'JSRuntime' with 3 bytes to align 'gcChunkSet'
js/src/vm/Runtime.h:983:25 [-Wpadded] padding struct 'JSRuntime' with 4 bytes to align 'gcMarker'
[…]
```
So I just wrote a small python script (attached) to sum up all those numbers, they then look like this:
```
144.0 JSRuntime js/src/vm/Runtime.h
92.0 icu_52::DateFormatSymbols intl/icu/source/i18n/unicode/dtfmtsym.h
58.625 mozilla::net::nsHttpHandler netwerk/protocol/http/nsHttpHandler.h
52.0 webrtc::acm1::AudioCodingModuleImpl media/webrtc/trunk/webrtc/modules/audio_coding/main/source/audio_coding_module_impl.h
48.0 mozilla::net::WebSocketChannel netwerk/protocol/websocket/WebSocketChannel.h
47.0 fsmdef_dcb_t media/webrtc/signaling/src/sipcc/core/gsm/h/fsm.h
42.0 nsFtpState netwerk/protocol/ftp/nsFtpConnectionThread.h
42.0 mozilla::WebGLContext content/canvas/src/WebGLContext.h
42.0 mozilla::net::BackgroundFileSaver netwerk/base/src/BackgroundFileSaver.h
40.0 AffixMgr extensions/spellcheck/hunspell/src/affixmgr.hxx
```
I have no idea how clang handles unions in this case though, so this might have some double counting and false positives. Haven’t done any research on that yet.
Attachment #466989 -
Attachment is obsolete: true
Attachment #466990 -
Attachment is obsolete: true
Comment 36•11 years ago
|
||
Arpad: nice! I suggest that you post the full list to dev-platform to get feedback on which of these objects are likely to matter. For example, I know that the padding in JSRuntime isn't important, because we have 1+N JSRuntime objects live at any one time, where N is the number of workers, which is typically 3. But I don't know about any of those others.
Reporter | ||
Updated•11 years ago
|
Reporter | ||
Updated•11 years ago
|
Attachment #467717 -
Attachment is obsolete: true
Updated•7 years ago
|
Product: Core → Firefox Build System
Comment 37•4 years ago
|
||
We already have this in place with one of our static-analysis tools.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Updated•2 years ago
|
Product: Firefox Build System → Developer Infrastructure
You need to log in
before you can comment on or make changes to this bug.
Description
•