Closed Bug 554421 Opened 15 years ago Closed 14 years ago

Replace or augment unix wrapper script with a madvise()ing binary for a massive win

Categories

(Toolkit :: Startup and Profile System, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED DUPLICATE of bug 632404

People

(Reporter: taras.mozilla, Unassigned)

References

Details

(Whiteboard: [ts])

Attachments

(2 files, 3 obsolete files)

Attached file proof of concept (obsolete) (deleted) —
By default, the dynamic linker + demand paging is an excellent way to kill our startup perf. The dynamic linker causes random (and frequently backwards) IO through the binary, which is extremely likely to cause page faults and disk IO. This overhead can be measured in seconds on my system. By default Linux reads 200K worth of pages per fault, which means a lot of seeks/reads in a >10MB binary.

It turns out this sort of stupidity could be avoided if one could call madvise(MADV_WILLNEED) from the dynamic linker right after it mmap()s the relevant areas. On my system MADV_WILLNEED cranks up the readahead by 100x to 2MB, which massively reduces faulting. Note that madvise() has zero overhead if there are no page faults, so it adds essentially no startup overhead during warm startup. Since I can't figure out how to make the dynamic linker do madvise(), I have to hack around it.

hack a) Call madvise() from a function that runs before main() in firefox-bin. This is easy to implement and could be deployed tomorrow. In my testcase it reduced libxul.so reads from 127 to 62. The downside is that by the time my function is called, the dynamic linker has already gone and done the worst of the seeky backwards IO.

hack b) I wasn't sure if this was going to work, but it did. Write a basic C program that open()s libxul, maps the two rw/rx segments and madvise()s them. Then it forks (in order to keep the memory maps) and starts firefox, which then presumably shares the mappings. My script can't measure faults in this situation, but I'm estimating it's 10x better than a). It shaves 1 second off my startup time, down to 1.6 seconds.

proper-solution c) Modify the dynamic linker so one could specify madvise()s via the ld command line or environment variables. From what I know about libc/ld release cycles and the difficulty of getting stuff landed, this sounds hard.
Whiteboard: [ts]
Attachment #434313 - Attachment mime type: text/x-csrc → text/plain
Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s libxul.so, at which point you could madvise() ?
(In reply to comment #1)
> Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s
> libxul.so, at which point you could madvise() ?

Yes. This would work better for the current libxul xulrunner builds. It won't help with static builds in bug 525013.
(In reply to comment #2) > Yes. This would work better for the current libxul xulrunner builds. It wont > help with static builds in bug 525013. How about mixing both worlds ? Why not make the static builds really a big fat libxul and use a loader/stub that would do the loading and madvise() ? (and also get rid of the shell wrapper scripts)
(In reply to comment #3)
> (In reply to comment #2)
> > Yes. This would work better for the current libxul xulrunner builds. It won't
> > help with static builds in bug 525013.
>
> How about mixing both worlds ? Why not make the static builds really a big fat
> libxul and use a loader/stub that would do the loading and madvise() ? (and
> also get rid of the shell wrapper scripts)

Yeah, I could live with that on Linux. Sounds like a reasonable compromise. Got a patch? :)
d) use a different dynamic linker than ld.so, e.g. a stub dynamic linker that does the madvise() and then just forwards everything else to ld.so. But that doesn't sound simple.

b) sounds fine with me.

How would "static builds a big fat libxul" be any different from our current situation? Just linking in JS, which we could/should probably do anyway?
(In reply to comment #5)
> d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that
> does the madvise and then just forwards everything else to ld.so. But that
> doesn't sound simple

Yeah, shipping a drop-in linker with Firefox sounds a little scary.

> b) sounds fine with me
>
> How would "static builds a big fat libxul" be any different from our current
> situation? Just linking in JS, which we could/should probably do anyway?

Yes. Link in everything we can into libxul.
Attached patch approach a (obsolete) (deleted) — Splinter Review
I was misinterpreting my logs. It turns out madvise() immediately triggers readahead. This makes linking everything into libxul.so, madvise()ing it, then dlopen()ing it a robust way to read our binaries in quickly.
Attached patch libxul preloader (obsolete) (deleted) — Splinter Review
Here is another approach (dirty hack) for preloading the binary. This relies on the GNU linker's function layout and GCC's initialization order. It places a marker function at the front of the binary and a startup function at the end. GCC runs initializers in reverse order, so the end function runs before any other initializer. I also use function/variable positions to compute the approximate size of the .text and .data sections. This loads Firefox into memory (with patches from bug 561842) with ~50 page faults, vs. nearly 250 for a stock nightly.
Attachment #434313 - Attachment is obsolete: true
Attachment #434721 - Attachment is obsolete: true
Attached patch libxul preloader (deleted) — Splinter Review
Correct patch
Attachment #447252 - Attachment is obsolete: true
Comment on attachment 447253 [details] [diff] [review]
libxul preloader

This is the best patch I've ever seen. Please never ever commit it to the tree.
(In reply to comment #11)
> (From update of attachment 447253 [details] [diff] [review])
> This is the best patch I've ever seen. Please never ever commit it to the tree.

Whatever, but this is massively less invasive and complicated than the other stuff Taras had been looking into.
Actually, I take it back: it could go in fine if it was a good win, I guess. I don't really care about the performance of XULRunner-stubbed builds, since people who distribute that way are basically deciding to make startup slow anyway. (And they tend to correlate with people who want to link against a bazillion system libraries, rather than use the in-tree ones that we can get into a big libxul anyway.)
(In reply to comment #13)
> Actually, I take it back: it could go in fine if it was a good win, I guess. I
> don't really care about the performance of XULRunner-stubbed builds, since
> people who distribute that way are basically deciding to make startup slow
> anyway. (And they tend to correlate with people who want to link against a
> bazillion system libraries, rather than use the in-tree ones that we can get
> into a big libxul anyway.)

I'm not sure why this would negatively affect xulrunner stubs. I would also point out that using system libraries instead of [even statically] linking our own is a performance win, as they have a higher chance of being cached already.
Have you considered using the build system to get the appropriate libxul offsets and compile them into the executable in an approach like comment 1 / attachment 434313 [details] with mmap, madvise, dlopen?
You could use linker scripts to set symbols with the proper values to get the offsets. Anyways, I don't think we need this yet.
This turned out to be super-easy. This version of the hack uses dlopen() (as suggested above) to work around ld.so stupidity.
Blocks: 627591
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE