Closed Bug 201224 Opened 22 years ago Closed 15 years ago

Using mmap to memory-map the JAR file, to reduce the number of seeks in BuildFileList and in subsequent entry copying

Categories

(Core :: Networking: JAR, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED DUPLICATE of bug 504864
Target Milestone: Future

People

(Reporter: alfredkayser, Unassigned)

Details

(Whiteboard: [Perf testing on unix platforms wanted])

Attachments

(4 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030406
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030406

From bug 118455:

So we can have both: mmap'ing the jar file and using memory operations to seek and read, while still having a (per-item) compressed jar file. Although a compressed jar file is smaller than a jar file with compressed items (littlemozilla: 253KB versus 304KB), the uncompressed version is about 500KB, and even with mmap that is more costly in footprint than 304KB mmapped. The small reduction in the Mozilla download size is not worth the footprint impact.

In summary, mmapping the jar files as they currently are has the least impact, as only the jar-handling code needs to be changed. Actually, as far as I can see, only nsZipArchive.cpp needs to change.

The current zip directory reading is quite bad on I/O: lots of seeks (from the end, backwards), lots of reads into small buffers, even shifting buffers around, etc. With mmapping, the whole directory can be parsed directly from (virtual) memory, which also makes the directory parsing much simpler.

------- Additional Comment #15 From Roland Mainz (not reading bugmail) 2003-03-31 05:49 -------

Alfred Kayser wrote:
> Actually the current zip directory reading is quite bad on IO,
[snip]
> With mmapping the whole directory can be parsed, directly from
> (virtual) memory, making the directory parsing also much simpler.

Can you open a separate bug for that and post the bug id here, please?

So, here it is: implementing mmap in nsZipArchive.cpp would remove the need to seek randomly through the file, and would allow entries to be copied directly from the mmap'ed memory. As mmap is read-only, the mapped memory can then be shared with other applications (GRE?).

Reproducible: Always
Steps to Reproduce:
Actual Results: Lots of seeks and mallocs saved.
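For reference, a minimal sketch of what parsing the directory "directly from (virtual) memory" could look like: locate the ZIP end-of-central-directory (EOCD) record by scanning backwards through the mapped buffer instead of seeking backwards through the file. FindEOCD and the helpers are hypothetical names; the field offsets follow the standard ZIP format.

// Sketch only: parse the ZIP "end of central directory" (EOCD) record
// straight out of a memory-mapped buffer.
#include <cstdint>
#include <cstddef>

static uint16_t ReadU16(const uint8_t* p) { return p[0] | (p[1] << 8); }
static uint32_t ReadU32(const uint8_t* p) {
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Scan backwards from the end of the mapping for the EOCD signature
// (0x06054b50); the record may be followed by a comment of up to 64KB.
// On success, returns the offset and entry count of the central directory.
bool FindEOCD(const uint8_t* map, size_t len,
              uint32_t* cdOffset, uint16_t* cdEntries) {
  if (len < 22)                       // EOCD record is at least 22 bytes
    return false;
  size_t stop = (len > 22 + 0xFFFF) ? len - 22 - 0xFFFF : 0;
  for (size_t i = len - 22; ; --i) {
    if (ReadU32(map + i) == 0x06054b50) {
      *cdEntries = ReadU16(map + i + 10);  // total number of entries
      *cdOffset  = ReadU32(map + i + 16);  // offset of central directory
      return true;
    }
    if (i == stop)
      return false;
  }
}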
Over to alecf to check on this.
Assignee: jaggernaut → alecf
Status: UNCONFIRMED → NEW
Ever confirmed: true
cc'ing darin who has had some experience in this area. I'll iterate some of the reasoning against this later..
one point: current mallocs are done by necko. we used to malloc a bunch in libjar. alecf removed most of those, and unfortunately it didn't help Ts or Txul at all. so, i doubt this is an area that needs to be optimized further.
Some "pro"-|mmap()| arguments are listed in bug 118455 comment 22. AFAIK some platforms will get a huge benefit from using |mmap()| etc. etc.
well, I await anyone's patches...
Component: XP Apps → Networking: File
Target Milestone: --- → Future
Attached file A first and incomplete attempt to mmap the zipfile (obsolete) (deleted) —
A first and incomplete attempt to mmap the zipfile. Changes:
* ZipItem members changed to Get functions, so the structure now only contains the hash-link and a pointer to the 'central' structure in the mmapped zipfile.
* The filename is also not copied, but used directly from the mmap. This saves a lot of bytes (from about 80 to 8 bytes per entry).
* Changed BuildFileList to use mmap, but not complete yet.
* Adapted nsJAR* to use the changed ZipItem.
* Optimized GetModTime a bit (removed dosdate/time).
* Removed the DATA flag trick, as it is not needed for mmap.
* Renamed parameter from ZipEntry to aName (when it is a filename).
* Functions IsSymLink and GetMode are now only enabled for XP_UNIX.
* SeekToItem renamed to GetOffset.

More mmapping still to be done:
* Define functions to wrap the OS versions of mmap (or do they exist already somewhere?).
* Change the remainder of nsZipArchive.cpp to mmap.
* Remove the special arena stuff (as all allocations can be done in one step).
* Look at zlib and copytobuffer for further optimizations.

When all my 'patching' is done, can someone else try to compile it, and test it for any speed improvements?
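To illustrate the "Get functions" idea, here is a minimal sketch (hypothetical names, not the actual patch; offsets per the standard ZIP central-directory file header) of a ZipItem that copies nothing and decodes its fields on demand from the mapping:

// Sketch: a ZipItem that copies nothing - just a hash-link plus a
// pointer into the mmapped central directory record.
#include <cstdint>

static uint16_t GetU16(const uint8_t* p) { return p[0] | (p[1] << 8); }
static uint32_t GetU32(const uint8_t* p) {
  return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

class ZipItem {
  ZipItem*       mNext;     // hash-link
  const uint8_t* mCentral;  // -> record in the mmapped central directory
public:
  ZipItem(const uint8_t* central, ZipItem* next)
    : mNext(next), mCentral(central) {}

  uint16_t GetCompression() const { return GetU16(mCentral + 10); }
  uint32_t GetCRC32()       const { return GetU32(mCentral + 16); }
  uint32_t GetSize()        const { return GetU32(mCentral + 20); }
  uint32_t GetRealSize()    const { return GetU32(mCentral + 24); }
  uint32_t GetOffset()      const { return GetU32(mCentral + 42); }

  // The filename is used directly from the mmap (it follows the
  // 46-byte fixed header) and is NOT null-terminated.
  const char* GetName()    const { return (const char*)(mCentral + 46); }
  uint16_t    GetNameLen() const { return GetU16(mCentral + 28); }
};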
uh, what format is this patch in? "application/octet-stream"? is this a zip file? a gzip'ed patch? since I can't see it I'll just say "make sure you're using NSPR mmap'ing functions so that this can be cross platform!"
Attachment #120807 - Attachment mime type: application/octet-stream → application/zip
Attached file Newer version (obsolete) (deleted) —
This is almost complete, except for compilation and testing. Changes so far:
* Use the NSPR mmap functions (hopefully in the right way).
* Changed more parameter names from ZipEntry to name.
* Cleanup of the ExtractTo functions.
* Now only 'Inflate' needs a (stack) buffer.
* Read now only allocates a buffer of Realsize for Inflate items.
* Read copies directly from the mmap into the buffer provided by the caller.

Next steps:
* Try compilation (that's right: so far this is just a 'paper' exercise).
* Convert the hashtable: as the size is known before allocation, we can allocate a hashtable just big enough for the entries (removing the linking). Memory now: 256*4 (table) + 8*entries; after: 1.3*4*entries.
* For standalone, fix up zipstub.h to define the new PR_*mmap* calls.

If someone else volunteers to take over from here, be my guest ;-)
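For reference, a minimal sketch of mapping a whole jar file read-only with the NSPR memory-mapping API (PR_CreateFileMap/PR_MemMap). Error handling is abbreviated (a half-built mapping is not cleaned up), and the MappedZip type and function names are hypothetical, not the actual patch:

// Sketch only: map a whole jar file read-only via NSPR.
#include "prio.h"

struct MappedZip {
  PRFileMap* fileMap;
  void*      base;
  PRUint32   len;
};

static PRStatus MapZipFile(const char* path, MappedZip* out) {
  out->fileMap = nullptr;
  out->base = nullptr;

  PRFileDesc* fd = PR_Open(path, PR_RDONLY, 0);
  if (!fd)
    return PR_FAILURE;

  PRFileInfo64 info;
  if (PR_GetOpenFileInfo64(fd, &info) == PR_SUCCESS) {
    out->len = (PRUint32)info.size;
    out->fileMap = PR_CreateFileMap(fd, info.size, PR_PROT_READONLY);
    if (out->fileMap)
      out->base = PR_MemMap(out->fileMap, 0, out->len);
  }
  // Once the view exists the descriptor can be closed; the mapping
  // stays valid until PR_MemUnmap/PR_CloseFileMap.
  PR_Close(fd);
  return out->base ? PR_SUCCESS : PR_FAILURE;
}

static void UnmapZipFile(MappedZip* zip) {
  PR_MemUnmap(zip->base, zip->len);
  PR_CloseFileMap(zip->fileMap);
}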
Attachment #120807 - Attachment is obsolete: true
please provide a patch file, it'll be easier to review (use 'cvs diff -u > patch.txt'). Your files aren't current with the trunk (for instance, they don't include the XP_PC removal).
Be patient, I'm working on it. It compiles and works OK now. It seems much faster, but it requires some real performance testing. Things still to do:
* Test standalone.
* Update to the latest tree (implement 'deflate' on demand, using alecf's code).
Note: when the file is mmapped, the file handle is closed.
Note: further memory reduction is achieved using a simpler hashtable.
oh man, you haven't updated to the inflate-on-demand code? you're going to have to rewrite this almost completely - I totally changed the structure of that file. I would update ASAP and again, PLEASE post patches, not ZIP files!
I've already done that part, and actually it was not so difficult, using your code and the easier model of reading from memory instead of from file. I'm now compiling and testing it, but I had some trouble with my tree, so it will take a couple of hours... Patch coming as soon as it compiles and runs (against a fresh tree).
Attached patch Patchfile (to source tree of 2003-04-22) (obsolete) (deleted) — Splinter Review
Real patchfile. This one compiles and runs OK. One problem left (but see bug 198487): CRC checking doesn't work (for InputStream). Things changed:
* Flat hashlist, allocated during BuildFileList, with an improved hash function.
* Incremental deflate/copy for InputStream, without any buffers!
* No file descriptor ownership issues anymore. Reads can be performed in parallel on the same jar file.
* Structure size changes:
  ZipReadState: from 4096+50+16=4162 to 50+16=66
  ZipItem: from 32+namelen to 4
  ZipArchive: from 12+256*4+n*ZipItem to 12+1.5*n*ZipItem (for n=500: 29536 to 3012)
* More code reduction: test using ExtractToFile(0).
* The jar file is closed right after BuildFileList (so bug 153446 is now obsolete!).

Next step: performance testing, stress testing, ...
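The flat hashlist could look roughly like this: a sketch assuming open addressing with linear probing and a table sized at ~1.5x the entry count (as the 12+1.5*n*ZipItem figure suggests). All names are hypothetical, not the actual patch:

// Sketch: a flat, exactly-sized hash table built once in BuildFileList.
// Open addressing with linear probing replaces the per-entry hash links,
// shrinking ZipItem to a single pointer.
#include <cstdint>

struct ZipItem {               // 4 bytes (on 32-bit): pointer into the mmap
  const uint8_t* central;
};

static uint32_t HashName(const char* s, uint32_t len) {
  uint32_t h = 0;
  while (len--)
    h = h * 37 + (uint8_t)*s++;  // simple multiplicative hash
  return h;
}

struct FlatTable {
  ZipItem* slots;                // empty slots have central == nullptr
  uint32_t capacity;

  // The entry count is known up front (from the EOCD record), so the
  // table is sized once at ~1.5x and never needs to grow or rehash.
  explicit FlatTable(uint32_t entries)
    : slots(new ZipItem[entries + entries / 2 + 1]()),
      capacity(entries + entries / 2 + 1) {}
  ~FlatTable() { delete[] slots; }

  void Insert(const uint8_t* central, const char* name, uint32_t len) {
    uint32_t i = HashName(name, len) % capacity;
    while (slots[i].central)     // linear probe to the next free slot
      i = (i + 1) % capacity;
    slots[i].central = central;
  }
};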
Attachment #120938 - Attachment is obsolete: true
Attached patch Patchfile from: cvs diff -uw (obsolete) (deleted) — Splinter Review
Removed a printf statement. Note: total codesize reduction about 4K. Memory is more difficult to measure.
Alec, The patch is complete (IMHO). Can you arrange for the required reviews? Thanks in advance, Alfred
alfred - please submit appropriate review requests via the "Edit" link next to the patch. myself and darin@netscape.com or dougt@netscape.com are probably the appropriate reviewers here. However, I will not be able to get to this until next week. What you've described sounds great though!
Attachment #121281 - Flags: superreview?(darin)
Attachment #121281 - Flags: review?(alecf)
Attachment #121281 - Flags: superreview?(darin)
Attachment #121281 - Flags: superreview?(alecf)
Attachment #121281 - Flags: review?(darin)
Attachment #121281 - Flags: review?(alecf)
Alfred: thanks for the patch... i'm really interested in checking this out, but can you comment on to what extent this improves performance? what kind of measurements have been done?

A simple test is to load an HTML document like this:

<html><body onLoad="window.close();"></body></html>

Load it from the filesystem, and be sure to enable this pref:

pref("dom.allow_scripts_to_close_windows", true);

then, you can get some idea of how this helps by running

bash$ time ./mozilla file:///tmp/close.html

several times.
Times:

OLD: jar50.dll size=98.364
0.01user 0.02system 0:04.42elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.01system 0:03.99elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.02system 0:03.95elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.02user 0.01system 0:03.96elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:03.96elapsed 1%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps

NEW: jar50.dll size=94.268
0.01user 0.03system 0:04.88elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.02system 0:04.73elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:04.69elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.02user 0.00system 0:04.81elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:04.81elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k 0inputs+0outputs (425major+0minor)pagefaults 0swaps

So, while it is smaller in code size, the average elapsed time (in my debug build, on a PIII 1GHz, 512MB ThinkPad) has *increased* from about 3.95 to about 4.8 seconds. Which is disappointing. I'll do some more testing and tuning, hoping to improve it. Maybe mapping just pieces of the file, instead of the whole file, will improve things? I'll try that and test whether it improves. If not, we should close this thread...
Ok, I give up. With mmap used only for the central directory, and the remainder of the code unchanged, it still gives worse numbers than before. So it seems that mmap doesn't help at all. Explanations:
* The (standard) file I/O already does a smart job.
* Memory is not really saved, as mmapping still costs virtual memory.
* The incremental reading already saves some buffer copying, and the rest of necko is tuned to do incremental reading.

So, the mmap option has been tested, but with a negative result. Resolving as WONTFIX.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
Alfred Kayser wrote:
> Ok, I give up.
;-(( Just curious: on which platform did you do your tests? AIX? Linux?
Tested on Win2000. I'm not able to test on Linux (or AIX, or Mac, or OS/2...)
Comment on attachment 121281 [details] [diff] [review]
Patchfile from: cvs diff -uw

Review requests removed, due to WONTFIX.
Attachment #121281 - Flags: superreview?(alecf)
Attachment #121281 - Flags: review?(darin)
Alfred Kayser wrote:
> Tested on Win2000.
Ouch. Win32 isn't really that good at mmap() & co. ... ;-/
> I'm not able to test on Linux (or AIX, or Mac, or OS/2...)
Please consider reopening the bug so that we can test it on Linux/Solaris/AIX - I guess the results on these platforms should be much better... :)
Reopened per request. Requesting others to do some performance testing on Linux, AIX, MacOSX, etc. Maybe the patch is still useful for those platforms. Note: http://bugzilla.mozilla.org/attachment.cgi?id=121281&action=view is still a valid patch (it compiles and runs), so it is ready for testing.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
As darin once explained to me (and having heard him out, I believe him), mmap() is generally important when you need random access and want to be able to seek quickly around inside a file, reading different parts of it. When you just want to suck in large streams of data, and read them only once, there is no advantage to mmap() - the thing is, the data needs to be read into memory from disk at SOME point, so whether mmap() makes the kernel read it into some newly-dedicated page of memory, or read() reads the data into an existing buffer, the same thing happens behind the scenes. When you're reading it just once and throwing it away, you get a similar set of actions:

mmap()
<memcopies which may cause page faults, which read data into memory>
munmap()

or

open()
read() read() read()
close()

The results are almost the same. About all you're saving is the system calls.
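A minimal sketch of the two equivalent one-pass patterns being compared (POSIX calls; error handling omitted, and dest is assumed large enough to hold the whole file):

// Sketch: the two one-pass access patterns.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstring>

// Pattern 1: mmap, then copy out. Each touched page faults, and the
// kernel reads it from disk at that moment.
void ConsumeViaMmap(const char* path, char* dest) {
  int fd = open(path, O_RDONLY);
  struct stat st;
  fstat(fd, &st);
  void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  memcpy(dest, base, st.st_size);   // page faults pull the data in
  munmap(base, st.st_size);
  close(fd);
}

// Pattern 2: plain read(). The kernel copies into our buffer directly;
// behind the scenes the same disk reads happen.
void ConsumeViaRead(const char* path, char* dest) {
  int fd = open(path, O_RDONLY);
  ssize_t n;
  char* p = dest;
  while ((n = read(fd, p, 65536)) > 0)  // read in 64K chunks
    p += n;
  close(fd);
}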
Some further testing proves that MemMap in Win2000 is really bad. Just simulating mmap with malloc and fread is faster, even for the full file, than memory-mapping the file (in Win2000). Using mmap only for the directory is still bad, but... when simulating mmap (with malloc and fread) in Win2000, the new situation is a little bit faster than the original.

Timings:
old situation:      4.13 (avg of 5 runs)
new3/mio situation: 5.83 (avg of 5 runs)
new3/fio situation: 4.04 (avg of 5 runs)
(so about a 3 to 4% speedup for new3/fio?)

Memory usage (InMemKB TotalKB):
old      35256 73360: old situation
new/fio  42400 80620: simulated memmap (malloc/read) for the full jar
new/mio  35940 80712: real memmap for the full jar

Just the directory in 'mmap':
new3/mio 35256 73360: real memmap for the directory
new3/fio 35232 73352: simulated memmap for the directory

So the last configuration (simulated memmap for the directory) is the fastest and smallest!! Now, can others test on Linux, AIX, MacOSX? I can supply patches for the different situations on request.
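For reference, a minimal sketch of the winning 'new3/fio' configuration as described: read just the central directory into a heap buffer with malloc/fread, then parse it in place with the same pointer-based code a real mapping would use. The function name and the EOCD-derived parameters are hypothetical:

// Sketch: the "simulated mmap" variant for the central directory only.
#include <cstdio>
#include <cstdlib>
#include <cstdint>

// dirOffset/dirSize come from the EOCD record (see the earlier sketch).
uint8_t* LoadCentralDirectory(const char* path,
                              uint32_t dirOffset, uint32_t dirSize) {
  FILE* f = fopen(path, "rb");
  if (!f)
    return nullptr;
  uint8_t* buf = (uint8_t*)malloc(dirSize);
  if (buf && (fseek(f, (long)dirOffset, SEEK_SET) != 0 ||
              fread(buf, 1, dirSize, f) != dirSize)) {
    free(buf);
    buf = nullptr;
  }
  fclose(f);
  return buf;   // caller parses in place and free()s when done
}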
Whiteboard: [Perf testing on unix platforms wanted]
-> whiteboxqa
QA Contact: paw → ashishbhatt
Assignee: alecf → nobody
Status: REOPENED → NEW
QA Contact: ashishbhatt → networking.file
Attachment #121279 - Attachment is obsolete: true
Attachment #121281 - Attachment is obsolete: true
Bug 214672 has implemented all of the other performance improvements from the patches in this bug. With those out of the way, it seems to me that mmap will not be useful on Windows platforms, but could possibly be useful on Unix platforms (Linux/MacOSX/etc.); however, I don't have ready access to these.
Component: Networking: File → Networking: JAR
QA Contact: networking.file → networking.jar
Status: NEW → RESOLVED
Closed: 22 years ago → 15 years ago
Resolution: --- → DUPLICATE