Closed
Bug 549749
Opened 15 years ago
Closed 6 years ago
Icegrind: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: taras.mozilla, Assigned: taras.mozilla)
References
(Blocks 1 open bug, )
Details
Attachments
(2 files, 5 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
application/octet-stream
|
Details |
Notes from irc:
need to write an instrumentation function which
a) looks at the IRStmt_IMarks (case Ist_IMark). when it sees new addresses for the first time, then print something out via one of the various routines in the public header of m_debuginfo (pub_tool_debuginfo.h) will convert it into a name
b) hands every data load/store (Ist_Store, Iex_Load) to an runtime helper function, a la ptrcheck, memcheck, helgrind, lackey that maintains a big bitmap.
The output of the plugin(with postprocessing) will be a set of linker scripts that will produce cold-startup-friendly binaries.
Comment 1•15 years ago
|
||
The data accesses are given by Lackey's --trace-mem option. That code is guarded by the 'clo_trace_mem' variable so is easy to extract from Lackey.
As for the function calls, that's a bit trickier. Well, doing it imperfectly is pretty easy. Doing it really accurately is harder, at least it is if you want to identify calls *and* returns, because of things like setjmp/longjmp and tail recursion. Callgrind has some code that tries to do it accurately.
Comment 2•15 years ago
|
||
Taras, if you give me a reasonably precise spec of what the output should look like, I can hack it up for you early next week.
Assignee | ||
Comment 3•15 years ago
|
||
Nicholas,
I appreciate your offer to help. I followed Julian's instructions and ended up with this patch. It captures data accesses as desired, but can't seem to catch any functions. Not sure if I'm doing something wrong there(going to look at callgrind next).
Spec:
I would like to obtain a chronological log which consists of functions(.text) and the .data that they access the first time they are accessed(ie modelling page-ins during application startup). This should then be easy to turn into a linker script to reorder the binary such that it is optimized for sequential reading during cold startup.
Assignee: nobody → tglek
Assignee | ||
Comment 4•15 years ago
|
||
Note, my problem turned out to be a weird testcase. Everything is working great now, moving on to make this useful.
Comment 5•15 years ago
|
||
Comment on attachment 430136 [details] [diff] [review]
lackey hack
> static VG_REGPARM(2) void trace_instr(Addr addr, SizeT size)
> {
>- VG_(printf)("I %08lx,%lu\n", addr, size);
>+ char fnname[200];
>+ //VG_(printf)("I %08lx,%lu\n", addr, size);
>+ if (remember_addr(addr) &&
>+ VG_(get_fnname_if_entry)(addr, fnname, sizeof(fnname))) {
>+ VG_(printf)(".text.%s\n", fnname);
>+ }
>+
> }
The function-detection part could be done at instrumentation-time (ie. in lk_instrument()) rather than run-time if you want it to be more efficient, but maybe that's not important for your needs.
> static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> {
>- VG_(printf)(" M %08lx,%lu\n", addr, size);
>+ //VG_(printf)(" M %08lx,%lu\n", addr, size);
> }
This function is for instructions that modify a memory location, eg. 'addl (%eax), 1'. You probably want to consider that along with more normal loads.
Other than that, it looks ok and I think it should do what you want it to.
Assignee | ||
Comment 6•15 years ago
|
||
(In reply to comment #5)
> The function-detection part could be done at instrumentation-time (ie. in
> lk_instrument()) rather than run-time if you want it to be more efficient, but
> maybe that's not important for your needs.
Yeah, this was just a prototype. I've switched to doing function-detection like you/Julian suggested.
> > static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> > {
> >- VG_(printf)(" M %08lx,%lu\n", addr, size);
> >+ //VG_(printf)(" M %08lx,%lu\n", addr, size);
> > }
>
> This function is for instructions that modify a memory location, eg. 'addl
> (%eax), 1'. You probably want to consider that along with more normal loads.
Thanks for the headsup
> Other than that, it looks ok and I think it should do what you want it to.
Unfortunately I ran into another snag. Is Valgrind supposed to work well on Fedora 12 yet? I'm using trunk from yesterday and it seems to fail to start firefox completely. I noticed that my tool gets through a significant portion of startup before firefox exits.
../vg-in-place --tool=none ~/builds/minefield.release/dist/bin/firefox-bin -P crash -no-remote
leads to
../vg-in-place: line 28: 5816 Killed VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"
If f12 isn't working well, what's a recent distribution that works well?
Comment 7•15 years ago
|
||
I don't know of any Fedora 12 issues, but Julian might.
Have you read this?
https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind
You'll need --smc-check=all, unless you built Firefox with --enable-valgrind.
Assignee | ||
Comment 8•15 years ago
|
||
(In reply to comment #7)
> I don't know of any Fedora 12 issues, but Julian might.
There aren't any, I had my environment paths screwed up. Going to post a working prototype soon.
Assignee | ||
Comment 9•15 years ago
|
||
Now valgrind outputs the linker scripts.
This was underwhelming. Got the exact same speedup as I did with function-only reorder, ie barely 150ms.
The linker isn't moving the data sections as I expected. Will have to investigate more.
Attachment #430136 -
Attachment is obsolete: true
Assignee | ||
Comment 10•15 years ago
|
||
Note to self:
Just as my head was about to explode on detecting the types of data sections being used and grouping them in linker scripts, Ian Taylor pointed out the simplified interface being proposed for gold: http://sourceware.org/ml/binutils/2010-03/msg00050.html
Assignee | ||
Comment 11•15 years ago
|
||
This is a valgrind plugin that outputs a chronological log of function/data-sections that are accessed by the program.
The sections are produced by (http://hg.mozilla.org/users/tglek_mozilla.com/startup/file/6453ad2a7906/elflog.cpp) g++ elflog.cpp -g -O3 -o elflog -lelf && time ./elflog --contents ~/builds/minefield.release/dist/bin/libxul.so > ~/builds/minefield.release/dist/bin/libxul.so.sections
The plugin looks for .sections section-metadata every time a new file is mmap()ed. The section names live in an OSet. When a section is accessed, section name is printed to the log file and the section is removed from the OSet. The resulting section list is then fed into gnu gold to produce binaries with correct order.
This produces a libxul.so that has 2.5-3x less page faults on startup, 30% faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of libxul.
Posting the file in the hopes that V authors will help me strip the remaining lackey leftovers and package this up into an out-of-tree V plugin.
Attachment #430471 -
Attachment is obsolete: true
Comment 12•15 years ago
|
||
> This produces a libxul.so that has 2.5-3x less page faults on startup, 30%
> faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of
> libxul.
Nice! Glad you've made such good progress.
> Posting the file in the hopes that V authors will help me strip the remaining
> lackey leftovers and package this up into an out-of-tree V plugin.
Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the best idea of what's being used and what isn't. Or maybe you need help putting into a separate tool? http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how, including build system stuff, I last updated it for the 3.5.0 release so it should be accurate.
Unfortunately there isn't any good way to package out-of-tree Valgrind plug-ins. The two options are (a) a patch, or (b) get it included in the source distribution. The former shouldn't be too painful since most of the changes will be additions in separate files, AFAICT. The latter would require adding docs, preferably some tests, and a commitment to maintain it.
Assignee | ||
Comment 13•15 years ago
|
||
>
> > Posting the file in the hopes that V authors will help me strip the remaining
> > lackey leftovers and package this up into an out-of-tree V plugin.
>
> Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> best idea of what's being used and what isn't. Or maybe you need help putting
> into a separate tool?
Both. My main problem with ripping out code is I do not understand the setup code that
sets up trace_load/modify/etc callbacks. Those are the only parts of lackey that I need.
> http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how,
> including build system stuff, I last updated it for the 3.5.0 release so it
> should be accurate.
Those docs are really good. Got my tool building. It's too bad out-of-tree plugins aren't supported, I'll distribute this as a V patch for now.
>
> Unfortunately there isn't any good way to package out-of-tree Valgrind
> plug-ins. The two options are (a) a patch, or (b) get it included in the
> source distribution. The former shouldn't be too painful since most of the
> changes will be additions in separate files, AFAICT. The latter would require
> adding docs, preferably some tests, and a commitment to maintain it.
I'm ok with maintaining and all the requirements that come with it if we manage to get this deployed for our nightly builds.
Comment 14•15 years ago
|
||
(In reply to comment #13)
> > Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> > best idea of what's being used and what isn't. Or maybe you need help putting
> > into a separate tool?
>
> Both. My main problem with ripping out code is I do not understand the setup
> code that
> sets up trace_load/modify/etc callbacks. Those are the only parts of lackey
> that I need.
If you put the latest version up, I'll have a poke at it. Shouldn't
be difficult.
---
ATM this is just for Linux, right? Do you want this stuff to work for
MacOSX and Win32 too?
Assignee | ||
Comment 15•15 years ago
|
||
This is now feature-complete. I hooked into valgrind's mmap notifications and life became easy. I would appreciate any suggestions for improvement.
This code makes a mess of mixing Addr,Word,64bit<>32bit conversions, sorry about that, will clean that up before submitting this for inclusion in valgrind.
Attachment #437169 -
Attachment is obsolete: true
Assignee | ||
Updated•15 years ago
|
Summary: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup → Icegrind: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup
Comment 16•15 years ago
|
||
Comment on attachment 437664 [details] [diff] [review]
icegrind
I had a quick look over the code...
>+/*--------------------------------------------------------------------*/
>+/*--- A tool to produce a chronological log of functions/data access */
>+/*--------------------------------------------------------------------*/
>+
>+/*
>+ This file is part of Lackey, an example Valgrind tool that does
>+ some simple program measurement and tracing.
You should update the tool description.
>+static void lk_post_clo_init(void)
>+{
>+
>+}
You should change the "lk_" prefixes to "ice_" to match the filename.
>+ case Ist_IMark:
>+
>+ /* Needed to be able to check for inverted condition in Ist_Exit */
>+ inspect_addr( st->Ist.IMark.addr, st->Ist.IMark.len);
>+ break;
You realise this call will occur at instrumentation-time, not at run-time,
right? Hopefully you do, but it's easy to get confused between the two.
>Index: coregrind/m_tooliface.c
>===================================================================
>--- coregrind/m_tooliface.c (revision 11100)
>+++ coregrind/m_tooliface.c (working copy)
>@@ -330,6 +330,14 @@
> VG_(tdict).tool_final_IR_tidy_pass = final_tidy;
> }
>
>+void VG_(needs_mmap_notify)(
>+ void (*track_mmap)(struct NSegment const * seg),
>+ void (*track_munmap)(Addr a, SizeT len)
>+ )
>+{
>+ VG_(tdict).track_mmap = track_mmap;
>+ VG_(tdict).track_munmap = track_munmap;
>+}
> /*--------------------------------------------------------------------*/
> /* Tracked events. Digit 'n' on DEFn is the REGPARMness. */
>
Do you know about the existing track_new_mem_map and track_die_mem_munmap
events? This is just replicating them.
Assignee | ||
Comment 18•14 years ago
|
||
Uploading my working copy, this isn't ready for review.
Attachment #437664 -
Attachment is obsolete: true
Assignee | ||
Comment 19•14 years ago
|
||
mmap()s sections, switched ist_mark: case over to record functions at runtime. This results in a 4-5x slowdown, but hopefully a more realistic log.
TODO: mmap() a binary sections file to avoid building datastructures for section descriptors at runtime.
Attachment #448782 -
Attachment is obsolete: true
Assignee | ||
Comment 20•14 years ago
|
||
This also handles nested mmaps
Comment 21•14 years ago
|
||
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint
>+ while((iterator_mmap = VG_(OSetGen_Next)(mmaps)) != NULL) {
>+ if (iterator_mmap->addr < a->addr && iterator_mmap->end > a->end) {
>+ if (0)
>+ VG_(printf)("nested ");
>+ if (!iterator_mmap->sub_mmaps) {
>+ iterator_mmap->sub_mmaps = VG_(OSetGen_Create)(0, NULL, VG_(malloc), "sub_mmaps", VG_(free));
>+ }
>+ dest_mmaps = iterator_mmap->sub_mmaps;
>+ break;
Wouldn't that be better done with find_mmap?
Comment 22•14 years ago
|
||
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint
>+static void track_mmap(NSegment const * seg) {
(...)
>+ VG_(OSetGen_Insert)(mmaps, a);
This should be dest_mmaps instead of mmaps, otherwise nested mmaps don't work. Note, however, that if the new mmap starts in a previous one, but ends after its end, that won't work either. Maybe the best would be to actually cut the original mmap (which is afaiu what happens with real mmaps anyway)
Comment 23•14 years ago
|
||
To make icegrind properly work with latest valgrind trunk, you need to add a few things to Makefile.am. Here is the new version.
Comment 24•14 years ago
|
||
(In reply to comment #16)
> Do you know about the existing track_new_mem_map and track_die_mem_munmap
> events? This is just replicating them.
The downside of this is that it doesn't track the original mmap() for the executable binary (though libraries loaded by ld.so are still tracked)
Comment 25•14 years ago
|
||
(In reply to comment #24)
> (In reply to comment #16)
> > Do you know about the existing track_new_mem_map and track_die_mem_munmap
> > events? This is just replicating them.
>
> The downside of this is that it doesn't track the original mmap() for the
> executable binary (though libraries loaded by ld.so are still tracked)
track_new_mem_startup should get the original binary, IIRC.
Comment 26•6 years ago
|
||
Seems that there isn't a strong interest on that anymore. Closing.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•