Closed
Bug 201224
Opened 22 years ago
Closed 15 years ago
Using mmap to memory-map the JAR file, to reduce the number of seeks in BuildFileList and subsequent entry copying
Categories
(Core :: Networking: JAR, defect)
Tracking
RESOLVED DUPLICATE of bug 504864
Target Milestone: Future
People
(Reporter: alfredkayser, Unassigned)
Details
(Whiteboard: [Perf testing on unix platforms wanted])
Attachments
(4 obsolete files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030406
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030406
From bug 118455:
So we can have both: mmap'ing the jar file, and using memory operations to seek
and read, while still having a compressed (per item) jar file.
Although a compressed jar file is smaller than a jar file with compressed items
(littlemozilla: 253KB versus 304KB), the uncompressed version is about 500K,
and even with mmap that is more costly in footprint than 304KB mmap'ed.
The small reduction in Mozilla's download size is not worth the footprint impact.
In summary, mmap'ing the jar files as they currently are has the least impact, as
only the jar-handling code needs to be changed.
As far as I can see, only nsZipArchive.cpp needs to be changed.
Actually the current zip directory reading is quite bad on IO:
lots of seeks (from the end, backwards), lots of reads into small buffers,
even shifting buffers around, etc...
With mmap'ing, the whole directory can be parsed directly from
(virtual) memory, which also makes the directory parsing much simpler.
------- Additional Comment #15 From Roland Mainz (not reading bugmail)
2003-03-31 05:49 -------
Alfred Kayser wrote:
> Actually the current zip directory reading is quite bad on IO,
[snip]
> With mmapping the whole directory can be parsed, directly from
> (virtual) memory, making the directory parsing also much simpler.
Can you open a separate bug for that and post the bug id here, please?
So, here it is:
Implementing mmap in nsZipArchive.cpp would remove the need to seek randomly
through the file, and would make it possible to copy entries directly from the
mmap'ed memory. As the mapping is read-only, the memory can then be shared with
other apps (GRE?).
Reproducible: Always
Steps to Reproduce:
Actual Results:
Lots of seeks and mallocs saved.
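For illustration, a minimal sketch of the idea (not the attached patch), using
plain POSIX mmap: the signatures and field offsets come from the ZIP format,
while the function name and error handling are hypothetical.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define EOCD_SIG 0x06054b50u  /* end-of-central-directory signature */
    #define CDIR_SIG 0x02014b50u  /* central-directory file header signature */

    /* ZIP fields are little-endian. */
    static uint16_t get16(const uint8_t *p) { return p[0] | (p[1] << 8); }
    static uint32_t get32(const uint8_t *p) {
        return p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    int ListZip(const char *path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;
        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size < 22) { close(fd); return -1; }
        const uint8_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                       /* the mapping stays valid after close */
        if (base == MAP_FAILED) return -1;

        /* One backwards scan for the EOCD record replaces the seek/read dance. */
        const uint8_t *p = base + st.st_size - 22;
        while (p > base && get32(p) != EOCD_SIG) p--;
        if (get32(p) != EOCD_SIG) { munmap((void *)base, st.st_size); return -1; }

        uint16_t count = get16(p + 10);            /* total entry count */
        const uint8_t *cd = base + get32(p + 16);  /* central directory offset */

        /* The whole directory is now parsed directly from (virtual) memory. */
        for (uint16_t i = 0; i < count && get32(cd) == CDIR_SIG; i++) {
            uint16_t namelen = get16(cd + 28);
            printf("%.*s\n", (int)namelen, (const char *)(cd + 46));
            cd += 46 + namelen + get16(cd + 30) + get16(cd + 32);
        }
        munmap((void *)base, st.st_size);
        return 0;
    }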
Comment 1•22 years ago
Over to alecf to check on this.
Assignee: jaggernaut → alecf
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•22 years ago
cc'ing darin who has had some experience in this area.
I'll lay out some of the reasoning against this later...
Comment 3•22 years ago
one point:
current mallocs are done by necko. we used to malloc a bunch in libjar. alecf
removed most of those, and unfortunately it didn't help Ts or Txul at all. so,
i doubt this is an area that needs to be optimized further.
Comment 4•22 years ago
Some "pro"-|mmap()| arguments are listed in bug 118455 comment 22.
AFAIK some platforms will get a huge benefit from using |mmap()| etc. etc.
Comment 5•22 years ago
well, I await anyone's patches...
Component: XP Apps → Networking: File
Target Milestone: --- → Future
Reporter
Comment 6•22 years ago
A first and incomplete attempt to mmap the zipfile:
Changes:
* ZipItem members changed into Get functions, so that the structure now
only contains the hash-link and a pointer to the 'central' structure in the
mmap'ed zipfile (see the sketch after this comment).
* Also the filename is not copied, but used directly from the mmap.
This saves a lot of bytes (from about 80 to 8 bytes per entry).
* Changed BuildFileList to use mmap, but not complete yet.
* Adapted nsJAR* to use the changed ZipItem.
* Optimized GetModTime a bit. (removed dosdate/time)
* Removed DATA flag trick, as it is not needed for mmap.
* Renamed parameter from ZipEntry to aName (when it is a filename)
* Functions IsSymLink and GetMode now only enabled for XP_UNIX.
* SeekToItem renamed to GetOffset
More mmap'ing still to be done:
* define functions to wrap the OS versions of mmap (or do they exist already
somewhere?)
* change remainder of nsZipArchive.cpp to mmap.
* remove the arena special stuff (as all allocs can be done in one step).
* look at zlib and copytobuffer for further optimizations...
When all my 'patching' is done, can someone else try to compile it, and test it
for any speed improvements?
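For illustration, a rough sketch of the slimmed-down ZipItem described in the
first bullet (the names and layout are mine, not the patch's): the struct keeps
only the hash link and one pointer into the mapped central directory, which is
the "about 80 to 8 bytes per entry" saving, and the former members become
on-demand accessors. The field offsets are those of the ZIP central-directory
file header.

    #include <stdint.h>

    static uint16_t Get16(const uint8_t *p) { return p[0] | (p[1] << 8); }
    static uint32_t Get32(const uint8_t *p) {
        return p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    /* Only the hash link and one pointer remain: 8 bytes per entry on a
       32-bit build, instead of copied fields plus a copied name. */
    typedef struct ZipItem {
        struct ZipItem *next;    /* hash-chain link */
        const uint8_t *central;  /* -> this entry's central record in the mmap */
    } ZipItem;

    /* The former data members become on-demand reads from the mapping.
       Note the name in the central record is not NUL-terminated; callers
       must pair GetName with GetNameLen. */
    static const char *GetName(const ZipItem *it)     { return (const char *)(it->central + 46); }
    static uint16_t GetNameLen(const ZipItem *it)     { return Get16(it->central + 28); }
    static uint16_t GetCompression(const ZipItem *it) { return Get16(it->central + 10); }
    static uint32_t GetCRC32(const ZipItem *it)       { return Get32(it->central + 16); }
    static uint32_t GetSize(const ZipItem *it)        { return Get32(it->central + 20); }  /* compressed */
    static uint32_t GetRealSize(const ZipItem *it)    { return Get32(it->central + 24); }  /* uncompressed */
    static uint32_t GetOffset(const ZipItem *it)      { return Get32(it->central + 42); }  /* local header */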
Comment 7•22 years ago
uh, what format is this patch in? "application/octet-stream"? is this a zip
file? a gzip'ed patch?
since I can't see it I'll just say "make sure you're using the NSPR mmap'ing
functions so that this can be cross-platform!"
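For reference, a minimal sketch of the NSPR file-mapping calls alecf means
(PR_CreateFileMap / PR_MemMap / PR_MemUnmap / PR_CloseFileMap, declared in
prio.h); the surrounding function is illustrative and error handling is trimmed.

    #include "prio.h"

    int UseMappedJar(const char *path) {
        PRFileDesc *fd = PR_Open(path, PR_RDONLY, 0);
        if (!fd) return -1;

        PRFileInfo64 info;
        if (PR_GetOpenFileInfo64(fd, &info) != PR_SUCCESS) { PR_Close(fd); return -1; }

        /* PR_PROT_READONLY gives a read-only mapping (PROT_READ on unix,
           a read-only CreateFileMapping on Windows). */
        PRFileMap *fmap = PR_CreateFileMap(fd, info.size, PR_PROT_READONLY);
        if (!fmap) { PR_Close(fd); return -1; }

        void *base = PR_MemMap(fmap, 0, (PRUint32)info.size);
        if (base) {
            /* ... parse the archive directly from 'base' ... */
            PR_MemUnmap(base, (PRUint32)info.size);
        }
        PR_CloseFileMap(fmap);
        PR_Close(fd);
        return base ? 0 : -1;
    }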
Updated•22 years ago
Attachment #120807 -
Attachment mime type: application/octet-stream → application/zip
Reporter
Comment 8•22 years ago
This is almost complete, except for compilation and testing.
Changes so far:
* Use PR mmap functions (hopefully in the right way)
* Changed more param-names from ZipEntry to name
* Cleanup of ExtractTo functions
* Now only 'Inflate' needs a (stack)buffer.
* Read now only allocates buffer with Realsize for Inflate items.
* Read directly copies from mmap to buffer provided by caller.
Next steps:
* Try compilation (that's right: this is so far just a 'paper' exercise).
* Convert the hashtable: as the size is known before allocation, we can allocate
a hashtable just big enough for the entries (removing the linking); see the
sketch after this comment. Memory now: 256*4 (table) + 8*entries; after: 1.3*4*entries...
* For standalone, fixup zipstub.h to define the new PR_*mmap* calls.
If someone else volunteers to take over from here, be my guest ;-)
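A minimal sketch of the pre-sized flat table proposed above: one allocation at
roughly 1.3x the (already known) entry count, with linear probing instead of
per-entry links. The names and the hash function are illustrative, not taken
from the patch.

    #include <stdlib.h>
    #include <string.h>

    typedef struct ZipItem { const char *name; /* points into the mmap */ } ZipItem;

    static unsigned HashName(const char *s) {
        unsigned h = 0;
        while (*s) h = h * 37 + (unsigned char)*s++;
        return h;
    }

    typedef struct {
        ZipItem **slots;
        unsigned size;  /* ~1.3 * entry count, so the table never fills up */
    } FlatTable;

    FlatTable *NewTable(unsigned entries) {
        FlatTable *t = malloc(sizeof *t);
        t->size = entries + entries / 3 + 1;
        t->slots = calloc(t->size, sizeof *t->slots);
        return t;
    }

    void Insert(FlatTable *t, ZipItem *item) {
        unsigned i = HashName(item->name) % t->size;
        while (t->slots[i]) i = (i + 1) % t->size;  /* linear probing */
        t->slots[i] = item;
    }

    ZipItem *Lookup(FlatTable *t, const char *name) {
        unsigned i = HashName(name) % t->size;
        while (t->slots[i]) {
            if (strcmp(t->slots[i]->name, name) == 0) return t->slots[i];
            i = (i + 1) % t->size;
        }
        return NULL;  /* an empty slot on the probe path means "not present" */
    }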
Attachment #120807 -
Attachment is obsolete: true
Comment 9•22 years ago
please provide a patch file, it'll be easier to review
(use 'cvs diff -u > patch.txt')
your files aren't current with the trunk (for instance they do not include the
XP_PC removal)
Reporter
Comment 10•22 years ago
Be patient, I'm working on it. It compiles and works OK now.
It seems much faster, but it requires some real performance testing.
Things still to do:
* test standalone
* update to latest tree (implement 'inflate' on demand, using alecf's code).
Note: once the file is mmap'ed, the handle is closed.
Note: further memory reduction achieved using a simpler hashtable.
Comment 11•22 years ago
oh man, you haven't updated to the inflate-on-demand code? you're going to have
to rewrite this almost completely - I totally changed the structure of that
file. I would update ASAP
and again, PLEASE post patches, not ZIP files!
Reporter
Comment 12•22 years ago
I've already done that part, and actually it was not so difficult,
using your code and the easier model of reading from memory instead of
from file.
I'm now compiling and testing it, but had some troubles with my tree,
so it will take a couple of hours...
Patch coming as soon as it compiles and runs (against a fresh tree).
Reporter
Comment 13•22 years ago
Real patchfile. This one compiles and runs OK.
One problem left (but see bug 198487): CRC checking doesn't work (for
InputStream).
Things changed:
* Flat hashlist, allocated during BuildFileList, with improved hash function
* Incremental inflate/copy for InputStream, without any buffers (see the sketch
after this comment)!
* No filedesc ownership issues anymore. Reads can be performed in parallel on
the same jar-file.
* Structure size changes:
  ZipReadState: from 4096+50+16=4162 to 50+16=66
  ZipItem: from 32+namelen to 4
  ZipArchive: from 12+256*4+n*ZipItem to 12+1.5*n*ZipItem (for n=500: 29536 to 3012)
* More code reduction: Test using ExtractToFile(0)
* The jar file is closed right after BuildItemList (so bug 153446 is now
obsolete!)
Next step: Performance testing, stress testing, ...
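A sketch of the buffer-less incremental path mentioned above, assuming zlib:
the compressed entry is already addressable through the mapping, so z_stream
can consume it in place and write straight into the caller's buffer. ZIP
entries store raw deflate data, hence the negative windowBits; the function
names and signatures here are illustrative, not the patch's.

    #include <zlib.h>
    #include <string.h>

    /* Set up a stream whose input is the compressed entry inside the mmap. */
    int OpenEntryStream(z_stream *zs, const unsigned char *comp, unsigned compLen) {
        memset(zs, 0, sizeof *zs);
        if (inflateInit2(zs, -MAX_WBITS) != Z_OK)  /* raw deflate: no zlib header */
            return -1;
        zs->next_in = (Bytef *)comp;  /* read the input in place, no copy */
        zs->avail_in = compLen;
        return 0;
    }

    /* One Read() step of the InputStream idea: the z_stream is kept alive
       between calls, and each call inflates directly into the caller's buffer. */
    int ReadChunk(z_stream *zs, unsigned char *buf, unsigned len, unsigned *nread) {
        zs->next_out = buf;
        zs->avail_out = len;
        int rv = inflate(zs, Z_NO_FLUSH);
        *nread = len - zs->avail_out;
        return (rv == Z_OK || rv == Z_STREAM_END) ? 0 : -1;
    }

    void CloseEntryStream(z_stream *zs) { inflateEnd(zs); }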
Attachment #120938 -
Attachment is obsolete: true
Reporter
Comment 14•22 years ago
Removed a printf statement.
Note: total codesize reduction about 4K. Memory is more difficult to measure.
Reporter
Comment 15•22 years ago
Alec,
The patch is complete (IMHO).
Can you arrange for the required reviews?
Thanks in advance, Alfred
Comment 16•22 years ago
alfred - please submit appropriate review requests via the "Edit" link next to
the patch. myself and darin@netscape.com or dougt@netscape.com are probably the
appropriate reviewers here. However, I will not be able to get to this until
next week. What you've described sounds great though!
Updated•22 years ago
Attachment #121281 -
Flags: superreview?(darin)
Attachment #121281 -
Flags: review?(alecf)
Reporter
Updated•22 years ago
Attachment #121281 -
Flags: superreview?(darin)
Attachment #121281 -
Flags: superreview?(alecf)
Attachment #121281 -
Flags: review?(darin)
Attachment #121281 -
Flags: review?(alecf)
Comment 17•22 years ago
Alfred:
thanks for the patch... i'm really interested in checking this out, but can you
comment on to what extent this improves performance? what kind of measurements
have been done?
a simple test is to load an HTML document like this:
<html><body onLoad="window.close();"></body></html>
load it from the filesystem, and be sure to enable this pref:
pref("dom.allow_scripts_to_close_windows", true);
then, you can get some idea of how this helps by running
bash$ time ./mozilla file:///tmp/close.html
several times.
Reporter
Comment 18•22 years ago
times:
OLD:
jar50.dll size=98.364
0.01user 0.02system 0:04.42elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.01system 0:03.99elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.02system 0:03.95elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.02user 0.01system 0:03.96elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:03.96elapsed 1%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
NEW:
jar50.dll size=94.268
0.01user 0.03system 0:04.88elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.02system 0:04.73elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:04.69elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.02user 0.00system 0:04.81elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
0.01user 0.03system 0:04.81elapsed 0%CPU (0avgtext+0avgdata 6752maxresident)k
0inputs+0outputs (425major+0minor)pagefaults 0swaps
So, while it is smaller in codesize, the average elapsed time
(in my debug-build version on a PIII, 1GHz, 512MB Thinkpad) has
*increased* from about 3.95s to about 4.8s, which is disappointing.
I'll do some more testing and tuning, hoping to improve it.
Maybe mapping just pieces of the file instead of the whole file will improve things?
I'll try that and test whether it improves. If not, we should close this thread...
Reporter
Comment 19•22 years ago
Ok, I give up.
With mmap used only for the central directory, and
the remainder of the code unchanged, it still gives
worse numbers than before. So it seems that mmap doesn't help at all.
Explanations:
* the (standard) file I/O already does a smart job.
* memory is not really saved, as mmap'ing still costs virtual memory.
* the incremental reading already saves some buffer copying, and
the rest of necko is tuned to do incremental reading.
So, the mmap option has been tested, but with a negative result.
Resolving as WONTFIX.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
Comment 20•22 years ago
Alfred Kayser wrote:
> Ok, I give up.
;-((
Just curious:
On which platform did you do your tests? AIX? Linux?
Reporter
Comment 21•22 years ago
Tested on Win2000. I'm not able to test on Linux (or AIX, or Mac, or OS/2...)
Reporter
Comment 22•22 years ago
Comment on attachment 121281 [details] [diff] [review]
Patchfile from: cvs diff -uw
Review requests removed, due to WONTFIX
Attachment #121281 -
Flags: superreview?(alecf)
Attachment #121281 -
Flags: review?(darin)
Comment 23•22 years ago
Alfred Kayser wrote:
> Tested on Win2000.
Ouch.
Win32 isn't really that good on mmap() & co. ... ;-/
> I'm not able to test on Linux (or AIX, or Mac, or OS/2...)
Please consider reopening the bug so that we can test it on Linux/Solaris/AIX - I
guess the results on those platforms should be much better... :)
Reporter
Comment 24•22 years ago
Reopened per request.
Requesting others to do some performance testing on Linux, AIX, Mac OS X, etc...
Maybe the patch is still useful for those platforms.
Note: http://bugzilla.mozilla.org/attachment.cgi?id=121281&action=view
is still a valid patch (it compiles and runs), so it is ready for testing.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 25•22 years ago
as darin has explained once to me (and having heard him out, I believe him),
mmap() is generally important when you need random access and want to be able to
seek quickly around inside a file, reading different parts of it. When you just
want to suck in large streams of data, and read them only once, there is no
advantage to mmap() - the thing is, the data needs to be read into memory from
disk at SOME point, so whether mmap() makes the kernel read it into some
newly-dedicated page of memory, or read() reads the data into an existing
buffer, the same thing happens behind the scenes... when you're reading it just
once and throwing it away, you get a similar set of actions:
mmap()
<memcopies, which may cause page faults, which read data into memory>
munmap()
or
open()
read()
read()
read()
close()
the results are almost the same. About all you're saving is the system calls.
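To make the comparison concrete, a minimal POSIX sketch of the two patterns
(purely illustrative): both functions move the same bytes off disk exactly
once, via page faults on first touch under mmap, or via explicit copies into a
reused buffer under read().

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Sequential one-pass consumption via mmap: pages fault in on first touch. */
    unsigned long SumMmap(const char *path) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) return 0;
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        unsigned long sum = 0;
        if (p != MAP_FAILED) {
            for (off_t i = 0; i < st.st_size; i++) sum += p[i];
            munmap(p, st.st_size);
        }
        close(fd);
        return sum;
    }

    /* The same pass via read(): explicit copies into one reused buffer. */
    unsigned long SumRead(const char *path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return 0;
        unsigned char buf[65536];
        unsigned long sum = 0;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            for (ssize_t i = 0; i < n; i++) sum += buf[i];
        close(fd);
        return sum;
    }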
Reporter
Comment 26•22 years ago
Some further testing proves that MemMap in Win2000 is really bad.
Just simulating mmap with malloc and fread is faster,
even for the full file, than MemMapping the file (in Win2000).
Using mmap only for the directory is still bad, but...
when simulating mmap (with malloc and fread) in Win2000, the new situation
is a little bit faster than the original:
Timings: old situation      4.13 (avg of 5 runs)
         new3/mio situation 5.83 (avg of 5 runs)
         new3/fio situation 4.04 (avg of 5 runs)
(so about a 2% speedup for new3/fio?)
Memory usage:
          InMemKB  TotalKB
old         35256    73360  old situation
new/fio     42400    80620  simulated memmap (malloc/read) for full jar
new/mio     35940    80712  real memmap for full jar
Just the directory in 'mmap':
new3/mio    35256    73360  real memmap for directory
new3/fio    35232    73352  simulated memmap for directory
So the last config (simulated memmap for the directory) is the fastest and smallest!
Now, can others test on Linux, AIX, Mac OS X?
I can supply patches for the different situations on request.
Updated•18 years ago
Assignee: alecf → nobody
Status: REOPENED → NEW
QA Contact: ashshbhatt → networking.file
Reporter
Updated•18 years ago
Attachment #121279 -
Attachment is obsolete: true
Reporter
Updated•18 years ago
Attachment #121281 -
Attachment is obsolete: true
Reporter
Comment 28•18 years ago
Bug 214672 has implemented all of the other performance improvements of the patches included in this bug. With those out of the way, it seems to me that mmap will not be useful for Windows platforms, but could possibly be useful for Unix platforms (Linux/Mac OS X/etc.); however, I don't have ready access to these.
Reporter
Updated•18 years ago
Component: Networking: File → Networking: JAR
Updated•18 years ago
QA Contact: networking.file → networking.jar
Reporter
Updated•15 years ago
Status: NEW → RESOLVED
Closed: 22 years ago → 15 years ago
Resolution: --- → DUPLICATE