Closed Bug 554421 Opened 15 years ago Closed 14 years ago

Replace or augment unix wrapper script with a madvise()ing binary for a massive win

Categories

(Toolkit :: Startup and Profile System, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED DUPLICATE of bug 632404

People

(Reporter: taras.mozilla, Unassigned)

References

Details

(Whiteboard: [ts])

Attachments

(2 files, 3 obsolete files)

Attached file proof of concept (obsolete) (deleted) —
By default, the dynamic linker + demand paging is an excellent way to kill our startup perf. The dynamic linker causes random (and frequently backwards) IO through the binary, which is extremely likely to cause page faults and disk IO. This overhead can be measured in seconds on my system. By default Linux reads 200K worth of pages per fault, which means a lot of seeks/reads in a >10MB binary.

It turns out this sort of stupidity could be avoided if one could call madvise(MADV_WILLNEED) from the dynamic linker right after it mmap()s the relevant areas. On my system MADV_WILLNEED cranks up the readahead by 100x to 2MB, which massively reduces faulting. Note that madvise() has zero overhead if there are no page faults, so it adds essentially no startup overhead during warm startup. Since I can't figure out how to make the dynamic linker do madvise(), I have to hack around it.

hack a) Call madvise() from a function that runs before main() in firefox-bin. This is easy to implement and could be deployed tomorrow. In my testcase it reduced libxul.so reads from 127 to 62. The downside is that by the time my function is called, the dynamic linker has already gone and done the worst of the seeky backwards IO.

hack b) I wasn't sure if this was going to work, but it did. Write a basic C program that open()s libxul, maps the two rw/rx segments and madvise()s them. Then it forks (in order to keep the memory maps) and starts firefox, which then presumably shares the mappings. My script can't measure faults in this situation, but I'm estimating it's 10x better than a). It shaves 1 second off my startup time, down to 1.6 seconds.

proper-solution c) Modify the dynamic linker so one could specify madvise()s via the ld command line or environment variables. From what I know about libc/ld release cycles and the difficulty of getting stuff landed, this sounds hard.
Whiteboard: [ts]
Attachment #434313 - Attachment mime type: text/x-csrc → text/plain
Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s libxul.so, at which point you could madvise() ?
(In reply to comment #1)
> Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s
> libxul.so, at which point you could madvise() ?

Yes. This would work better for the current libxul xulrunner builds. It won't help with static builds in bug 525013.
(In reply to comment #2) > Yes. This would work better for the current libxul xulrunner builds. It wont > help with static builds in bug 525013. How about mixing both worlds ? Why not make the static builds really a big fat libxul and use a loader/stub that would do the loading and madvise() ? (and also get rid of the shell wrapper scripts)
(In reply to comment #3)
> (In reply to comment #2)
> > Yes. This would work better for the current libxul xulrunner builds. It won't
> > help with static builds in bug 525013.
>
> How about mixing both worlds ? Why not make the static builds really a big fat
> libxul and use a loader/stub that would do the loading and madvise() ? (and
> also get rid of the shell wrapper scripts)

Yeah, I could live with that on Linux. Sounds like a reasonable compromise. Got a patch? :)
d) use a different dynamic linker than ld.so, e.g. a stub dynamic linker that does the madvise() and then just forwards everything else to ld.so. But that doesn't sound simple.

b) sounds fine with me.

How would "static builds a big fat libxul" be any different from our current situation? Just linking in JS, which we could/should probably do anyway?
(In reply to comment #5)
> d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that
> does the madvise and then just forwards everything else to ld.so. But that
> doesn't sound simple

Yeah, shipping a drop-in linker with Firefox sounds a little scary.

> b) sounds fine with me
>
> How would "static builds a big fat libxul" be any different from our current
> situation? Just linking in JS, which we could/should probably do anyway?

Yes. Link in everything we can into libxul.
Attached patch approach a (obsolete) (deleted) — Splinter Review
I was misinterpreting my logs. It turns out madvise() immediately triggers readahead. This makes linking everything into libxul.so, madvise()ing it, then dlopen()ing it a robust way to read our binaries in quickly.
Attached patch libxul preloader (obsolete) (deleted) — Splinter Review
Here is another approach (dirty hack) for preloading the binary. This relies on the GNU linker's function layout and GCC's initialization order. It places a marker function at the front of the binary and a startup function at the end. GCC runs initializers in reverse order, so the end function runs before any other initializer. I also use function/variable positions to compute the approximate size of the .text and .data sections. This loads Firefox into memory (with patches from bug 561842) with ~50 page faults, vs. nearly 250 for a stock nightly.
Attachment #434313 - Attachment is obsolete: true
Attachment #434721 - Attachment is obsolete: true
Attached patch libxul preloader (deleted) — Splinter Review
Correct patch
Attachment #447252 - Attachment is obsolete: true
Comment on attachment 447253 [details] [diff] [review]
libxul preloader

This is the best patch I've ever seen. Please never ever commit it to the tree.
(In reply to comment #11)
> (From update of attachment 447253 [details] [diff] [review])
> This is the best patch I've ever seen. Please never ever commit it to the tree.

Whatever, but this is massively less invasive and complicated than the other stuff Taras had been looking into.
Actually, I take it back: it could go in fine if it was a good win, I guess. I don't really care about the performance of XULRunner-stubbed builds, since people who distribute that way are basically deciding to make startup slow anyway. (And they tend to correlate with people who want to link against a bazillion system libraries, rather than use the in-tree ones that we can get into a big libxul anyway.)
(In reply to comment #13)
> Actually, I take it back: it could go in fine if it was a good win, I guess. I
> don't really care about the performance of XULRunner-stubbed builds, since
> people who distribute that way are basically deciding to make startup slow
> anyway. (And they tend to correlate with people who want to link against a
> bazillion system libraries, rather than use the in-tree ones that we can get
> into a big libxul anyway.)

I'm not sure why this would negatively affect xulrunner stubs. I would also point out that using system libraries instead of [even statically] linking our own is a performance win, as they have a higher chance of being cached already.
Have you considered using the build system to get the appropriate libxul offsets and compile them into the executable in an approach like comment 1 / attachment 434313 [details] with mmap, madvise, dlopen?
You could use linker scripts to set symbols with the proper values to get the offsets. Anyways, I don't think we need this yet.
This turned out to be super-easy. This version of the hack uses dlopen() (as suggested above) to work around ld.so stupidity.
Blocks: 627591
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE