Closed Bug 688979 Opened 13 years ago Closed 7 years ago

Add trace-malloc-like functionality for jemalloc

Categories: Core :: Memory Allocator, defect
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED DUPLICATE of bug 1094552

People: Reporter: justin.lebar+bug; Assignee: Unassigned

References: Blocks 3 open bugs

Whiteboard: [MemShrink:P2]

Attachments: 2 files, 1 obsolete file
I've been thinking about how we can get more information about how and why the heap is fragmented. I think what would be helpful is a log which contains:

- for each malloc, the requested malloc size, the block's malloc_usable_size, the block's address, and a stack trace, and
- for each free, the free'd address.

We could parse this log to profile the heap and find dark matter, which is nice. But we could also use it to understand sources of heap fragmentation. Since we know the allocations' addresses, we can look at a page with few live allocations and ask "who allocated the objects which used to live on this page?"

trace-malloc is almost what we want, but doesn't quite get us there because:

- its output format is impenetrable,
- it doesn't contain malloc_usable_size (and adding that would break all consumers, although I guess we could put it behind a flag),
- it calls into libc's allocator, not jemalloc, and
- it collects a lot of additional information, thus perturbing jemalloc.

The only real trick here, afaict, is figuring out how to call NS_StackWalk from either within jemalloc or from a wrapper.
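To make that concrete, here is a rough sketch (not from any attachment on this bug) of how such a log could be replayed; the two-letter record format and the helper name live_blocks_per_page are invented for illustration.

# Hypothetical log format, invented for illustration only:
#   M <requested_size> <usable_size> <address> <pc> <pc> ...
#   F <address>
# Replaying it tells us which blocks are live and which pages they pin.

PAGE_SIZE = 4096

def live_blocks_per_page(log_lines):
    live = {}  # address -> (usable_size, allocation stack)
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == 'M':
            live[int(fields[3], 16)] = (int(fields[2]), tuple(fields[4:]))
        elif fields[0] == 'F':
            live.pop(int(fields[1], 16), None)

    pages = {}  # page base -> [(address, usable_size, stack), ...]
    for addr, (usable, stack) in live.items():
        first, last = addr // PAGE_SIZE, (addr + usable - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            pages.setdefault(page * PAGE_SIZE, []).append((addr, usable, stack))
    return pages

Pages whose list holds only one or two live blocks are exactly the ones where the "who allocated the objects which used to live on this page?" question pays off.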
Whiteboard: [MemShrink]
Assuming you had this information, what would you do with it that would help lessen fragmentation?
Presumably, callsites which are causing fragmentation allocate lots of small, short-lived chunks interspersed with some longer-lived chunks. If we could identify those sites, we could either allocate the small chunks from an arena, as part of larger allocations, or perhaps on the stack. This is really a generalization of the nsTArray --> nsAutoTArray work in bug 688532, except that we'd be able to focus on the callsites which are actually causing fragmentation, instead of (or, in addition to) trying to reduce the number of overall calls to malloc.
Assignee: nobody → justin.lebar+bug
I think I read somewhere that allocating stack traces are good predictors of a block's lifetime (which is what you're after, right?)
(In reply to Julian Seward from comment #3)
> I think I read somewhere that allocating stack traces are good
> predictors of a block's lifetime (which is what you're after, right?)

I guess I'm interested in more than just "how long do the allocations from a callsite live?" A bunch of small, long-lived allocations made all in a row isn't so bad if they are all free'd around the same time. So long-lived allocations aren't necessarily the problem, unless the distribution of the chunks' lifetimes has a thick tail. But also, a callsite which makes exclusively short-lived allocations could cause fragmentation by spreading the intervening long-lived allocations out onto more pages.
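Measured against the hypothetical log format sketched in comment 0, the "thick tail" question could be answered per allocation site along these lines; lifetimes are counted in allocation events because the proposed log carries no timestamps, and the helper below is illustrative rather than part of any patch here.

from collections import defaultdict

def lifetime_distribution_per_site(log_lines):
    # Lifetime of each freed block, measured in allocation events, keyed by
    # the stack that allocated it.  Blocks never freed in the log are
    # dropped here; a real analysis should report them separately.
    births = {}                    # address -> (birth event, stack)
    lifetimes = defaultdict(list)  # stack -> [lifetime, ...]
    event = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == 'M':
            event += 1
            births[int(fields[3], 16)] = (event, tuple(fields[4:]))
        elif fields[0] == 'F':
            born = births.pop(int(fields[1], 16), None)
            if born is not None:
                birth_event, stack = born
                lifetimes[stack].append(event - birth_event)
    return lifetimes

A site whose distribution is mostly tiny lifetimes with a long tail of huge ones is the interspersed short/long pattern described above.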
Depends on: 688999
Whiteboard: [MemShrink] → [MemShrink:P2]
For my reference, changes to jemalloc.c don't get propagated correctly unless you apply attachment 529650 [details] [diff] [review].
Target Milestone: --- → mozilla9
Version: unspecified → Trunk
Attached patch WIP v1 (obsolete) (deleted) — Splinter Review
This prints out backtraces which I think may be right. The backtraces are just a list of PCs. To translate a PC into a file and line number, you need to use the data from /proc/maps (included in the dumps generated by this patch) to figure out which solib the PC belongs to, calculate the offset into the solib, and then run addr2line.
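For reference, a rough Python sketch of that translation; it assumes the dump carries a verbatim copy of /proc/<pid>/maps and that, as is typical for solibs, the text segment's virtual address matches its file offset, so the computed offset can be fed straight to addr2line (a robust tool would read the ELF program headers instead).

import subprocess

def parse_maps(maps_text):
    # (start, end, file offset, path) for each executable, file-backed mapping.
    mappings = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or 'x' not in fields[1]:
            continue
        start, end = (int(x, 16) for x in fields[0].split('-'))
        mappings.append((start, end, int(fields[2], 16), fields[5]))
    return mappings

def symbolize(pc, mappings):
    # Find the mapping containing pc, compute the solib-relative offset,
    # and ask addr2line for the function and file:line.
    for start, end, file_offset, path in mappings:
        if start <= pc < end:
            offset = pc - start + file_offset
            out = subprocess.run(['addr2line', '-C', '-f', '-e', path, hex(offset)],
                                 capture_output=True, text=True)
            return out.stdout.strip()
    return None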
Target Milestone: mozilla9 → ---
(In reply to Justin Lebar [:jlebar] from comment #6)
> Created attachment 563797 [details] [diff] [review]
> WIP v1
>
> This prints out backtraces which I think may be right.
>
> The backtraces are just a list of PCs. To translate a PC into a file and
> line number, you need to use the data from /proc/maps (included in the dumps
> generated by this patch) to figure out which solib the PC belongs to,
> calculate the offset into the solib, and then run addr2line.

Note that this (using data from /proc/maps) won't work on Android.
That's a shame. Why is that, and how do I get around it?
Because we don't map files for our libs. What can work instead is to get struct r_debug during malloc_init. Once you get that, you can find the right library by walking the struct link_map list. See the simple_linker_init part of https://bug687446.bugzilla.mozilla.org/attachment.cgi?id=560887; that will get you struct r_debug. I can assist if necessary; I've been implementing that in the linker and breakpad.
Though, now that I think of it, if you want line numbers you need actual files; since the debug info is not mapped, libunwind won't find the necessary info anyway...
> Because we don't map files for our libs.

Ah. Let me see if this is even useful on desktop Linux, and then we can figure out how to get this to work on Android.
Attached patch WIP v2 (deleted) — Splinter Review
Now with a python script which, miraculously, seems to translate the offsets properly.
Attachment #563797 - Attachment is obsolete: true
(In reply to Justin Lebar [:jlebar] from comment #12)
> Created attachment 563849 [details] [diff] [review]
> WIP v2
>
> Now with a python script which, miraculously, seems to translate the offsets
> properly.

Speaking of a script that translates offsets, I seem to remember we have one in the tree already. Or maybe it was in the automation scripts.
There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only translates "lib+addr".
(In reply to Justin Lebar [:jlebar] from comment #14)
> There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only
> translates "lib+addr".

Well, you have the libs, you have their base addresses, you have the PCs... you could output lib+addr :)
Well, yeah. But piping to fix-linux-stack.pl is about as hard as piping to addr2line. :)
Blocks: 691174
Blocks: 691176
Blocks: 691189
Blocks: 691192
I've been thinking about figuring out how to assign "blame" for fragmentation. The intuitive thing to do would be to look at the heap, find pages with just a few live objects, and blame those objects for fragmentation. But I think this is wrong. Those objects have to live *somewhere*, and it's not their fault that they live on a mostly-empty page. So we need to look at dead objects, not live objects. Probably the simplest heuristic is to blame the most-recently dead object at each address on each page which has at least one live allocation, but I'm not sure that's right, because it ignores the allocator's bucketing of allocations by size and whatnot...
I guess the correct definition of "how bad is this allocation site?" is "how many fewer pages would be live if we hadn't made any allocations at that site?". Our goal is to approximate this tractably.
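One crude approximation, reusing the hypothetical live_blocks_per_page sketch from comment 0: under the (deliberately wrong) assumption that the surviving blocks keep their current addresses, a site gets credit for a page only if every live block on that page came from it.

from collections import Counter

def pages_saved_per_site(pages):
    # pages: page base -> [(address, usable_size, stack), ...]
    # Counts, per allocation site, the live pages that would be returned to
    # the OS if that site had made no allocations and nothing else moved.
    saved = Counter()
    for blocks in pages.values():
        sites = {stack for _addr, _usable, stack in blocks}
        if len(sites) == 1:
            saved[next(iter(sites))] += 1
    return saved

This undercounts shared blame and ignores the allocator's size-class bucketing, but it is at least tractable and directly answers the fixed-layout version of the question.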
Blocks: 746009
Attached patch WIP v3 (deleted) — Splinter Review
I have no idea what these changes to rules.mk are for. But anyway, this works well enough. Linux only.
This is now simple to do with replace-malloc. It's what we rely on for the new DMD. In any case, I'm not looking at this anymore.
Assignee: justin.lebar+bug → nobody
DMD's cumulative heap profiling covers this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE