Bug 594172 - ZipWriter write()s too often
Status: RESOLVED FIXED (opened 14 years ago, closed 14 years ago)
Categories: Core :: Networking: JAR, defect
Tracking: blocking2.0 betaN+
Target Milestone: mozilla2.0b7
People: (Reporter: taras.mozilla, Assigned: taras.mozilla)
Whiteboard: [ts]
Attachments: (1 file, 3 obsolete files)
  patch (deleted) - taras.mozilla: review+
The combination of a tiny output buffer and seeking back to rewrite headers seems to result in up to 1000x more write() calls than necessary.
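(For scale: the buffered stream created in nsZipWriter::Open is only 0x800 bytes, i.e. 2 KiB, so a zip of a couple of megabytes already takes on the order of a thousand write() calls before counting the seek-back header rewrites; the patch below raises that buffer to 3 MiB.)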
Updated • 14 years ago (Assignee)
Whiteboard: [ts]
Comment 1 • 14 years ago (Assignee)
Comment on attachment 472805 [details] [diff] [review]
Cutting out of bonus writes
Bumping the buffer up doesn't reduce writes due to excessive seeking while writing. Turns out the seeking is to write the header, which seems to happen on Close() anyway. With this change zips only need one write.
Attachment #472805 - Flags: review?(dtownsend)
Comment 2 • 14 years ago
Comment on attachment 472805 [details] [diff] [review]
Cutting out of bonus writes
(In reply to comment #1)
> Bumping the buffer up doesn't reduce writes due to excessive seeking while
> writing. Turns out the seeking is to write the header, which seems to happen on
> Close() anyway. With this change zips only need one write.
This isn't true: zip files are meant to contain two headers for every entry. One is the local file header that appears immediately before the file data (and is written in nsZipDataStream::CompleteEntry); the other is in the central directory, which is written in nsZipWriter::Close. They are redundant, and most tools seem to just use the central directory, which is why this change passes tests, but it does not leave a correct zip file. You can see the problem by running zip -FF on a generated zip: it will complain about CRC and file size mismatches, and the "fixed" zip will be full of 0-length entries.
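(For reference, a rough C++ sketch of those two redundant records, with the field layout taken from the PKWARE APPNOTE; the struct names are illustrative and are not the ones the zipwriter code uses.)

#include <cstdint>

#pragma pack(push, 1)
// Local file header: sits immediately before each entry's data; the writer
// seeks back to fill in the CRC/sizes once the data has been compressed.
struct ZipLocalFileHeader {
    uint32_t signature;         // 0x04034b50
    uint16_t versionNeeded;
    uint16_t flags;             // general purpose bit flag
    uint16_t method;            // 0 = stored, 8 = deflate
    uint16_t modTime;
    uint16_t modDate;
    uint32_t crc32;             // only known after the data has been seen
    uint32_t compressedSize;    // ditto
    uint32_t uncompressedSize;  // ditto
    uint16_t nameLength;
    uint16_t extraLength;
    // followed by: file name, extra field, then the entry data
};

// Central directory record: one per entry, all written at the end of the
// archive when the writer is closed. It duplicates the fields above and adds
// the offset of the matching local header.
struct ZipCentralDirRecord {
    uint32_t signature;         // 0x02014b50
    uint16_t versionMadeBy;
    uint16_t versionNeeded;
    uint16_t flags;
    uint16_t method;
    uint16_t modTime;
    uint16_t modDate;
    uint32_t crc32;
    uint32_t compressedSize;
    uint32_t uncompressedSize;
    uint16_t nameLength;
    uint16_t extraLength;
    uint16_t commentLength;
    uint16_t diskNumberStart;
    uint16_t internalAttrs;
    uint32_t externalAttrs;
    uint32_t localHeaderOffset; // where the corresponding local header lives
    // followed by: file name, extra field, comment
};
#pragma pack(pop)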
Attachment #472805 - Flags: review?(dtownsend) → review-
Comment 3 • 14 years ago
I'm not sure you can avoid the seeking here.
You could perhaps punt it all to the end and go back and write all file headers and then the central directory in Close, but that loses a certain amount of fault tolerance (currently if the app crashes or something before nsZipWriter::Close is called the zip file should be repairable with common zip tools). It is still the same amount of seeking but maybe removing the seeking while writing the main file data gives you a win anyway.
The only way to truly solve this is to write the file header before the file data, but that requires knowing the compressed size of the data and the crc value in advance.
Comment 4 • 14 years ago (Assignee)
(In reply to comment #3)
> I'm not sure you can avoid the seeking here.
>
> You could perhaps punt it all to the end and go back and write all file headers
> and then the central directory in Close, but that loses a certain amount of
> fault tolerance (currently if the app crashes or something before
> nsZipWriter::Close is called the zip file should be repairable with common zip
> tools). It is still the same amount of seeking but maybe removing the seeking
> while writing the main file data gives you a win anyway.
That's reasonable.
>
> The only way to truly solve this is to write the file header before the file
> data, but that requires knowing the compressed size of the data and the crc
> value in advance.
you can also buffer new zip items before writing them out.
I wonder if we should just buffer the whole zip file in memory.
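(A minimal sketch of that buffering idea, not the actual zipwriter code: accumulate the entry in memory, compute the CRC and sizes up front, and emit header plus data in one sequential pass. It reuses the hypothetical ZipLocalFileHeader layout sketched above, stores the data uncompressed for brevity, and assumes a little-endian host.)

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Standard CRC-32 (the polynomial zip uses), computed over the entry data.
static uint32_t Crc32(const std::vector<uint8_t>& data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (uint8_t byte : data) {
        crc ^= byte;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Buffer the whole entry first so the local header can be written with the
// final CRC and sizes, letting header, name and data go out in one sequential
// pass with no seek-back. Uses the "stored" (uncompressed) method for brevity.
static void WriteStoredEntry(FILE* out, const std::string& name,
                             const std::vector<uint8_t>& data) {
    ZipLocalFileHeader hdr = {};
    hdr.signature        = 0x04034b50u;
    hdr.versionNeeded    = 20;
    hdr.method           = 0;                         // stored
    hdr.crc32            = Crc32(data);               // known up front now
    hdr.compressedSize   = static_cast<uint32_t>(data.size());
    hdr.uncompressedSize = static_cast<uint32_t>(data.size());
    hdr.nameLength       = static_cast<uint16_t>(name.size());

    fwrite(&hdr, sizeof(hdr), 1, out);
    fwrite(name.data(), 1, name.size(), out);
    fwrite(data.data(), 1, data.size(), out);
}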
Comment 5 • 14 years ago
(In reply to comment #4)
> (In reply to comment #3)
> > The only way to truly solve this is to write the file header before the file
> > data, but that requires knowing the compressed size of the data and the crc
> > value in advance.
>
> you can also buffer new zip items before writing them out.
>
> I wonder if we should just buffer the whole zip file in memory.
That is certainly a choice that works for small enough files, but the zip writer was originally designed to allow large zip files to be created. I guess a way to indicate that you want faster operation for smaller zip files might work.
Comment 6 • 14 years ago (Assignee)
So I just confirmed my theory: having lots of write() calls causes Windows to fragment the file a lot. The above patch gets rid of the fragmentation nicely.
So given your requirements, updating the offsets after the file is written is the best option.
Comment 7 • 14 years ago (Assignee)
This passes tests. The file is still more fragmented than it needs to be on updates.
I wonder if we should add an API to leave some space between the last file and the central directory. Either that, or support opening zip files from a file handle so that the user can call mozilla::fallocate on it (bug 592520). This would also mean not doing SetEOF in that case.
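(Roughly what the fallocate route could look like, sketched with plain POSIX posix_fallocate as a stand-in for mozilla::fallocate; the function name and the idea of sizing the reservation up front are assumptions, not anything the zipwriter currently exposes.)

#include <fcntl.h>
#include <sys/types.h>

// Reserve contiguous space up front so later appends (new entries plus the
// rewritten central directory) land in the preallocated region instead of
// forcing fresh fragments; the writer would then skip SetEOF so the
// reservation is not thrown away.
static bool PreallocateZip(int fd, off_t expectedSize) {
    return posix_fallocate(fd, 0, expectedSize) == 0;
}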
Assignee: nobody → tglek
Attachment #472805 - Attachment is obsolete: true
Attachment #473156 - Flags: review?(dtownsend)
Updated • 14 years ago (Assignee)
Attachment #473156 - Flags: review?(dtownsend)
Comment 8 • 14 years ago (Assignee)
Attachment #473156 - Attachment is obsolete: true
Attachment #473183 - Flags: review?(dtownsend)
Comment 9 • 14 years ago
FWIW, zip is designed to work with streaming. There is a mode to write the zip length information after the data so no seeking is required.
Comment 10 • 14 years ago
(In reply to comment #9)
> FWIW, zip is designed to work with streaming. There is a mode to write the zip
> length information after the data so no seeking is required.
Can you point to details on this? The main zip spec is pretty explicit about file headers coming before their file data, and file headers normally carry no offset information.
Comment 11 • 14 years ago
From the spec: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
C. Data descriptor:
      crc-32                4 bytes
      compressed size       4 bytes
      uncompressed size     4 bytes

      This descriptor exists only if bit 3 of the general purpose bit flag is set (see below). It is byte aligned and immediately follows the last byte of compressed data. This descriptor is used only when it was not possible to seek in the output .ZIP file, e.g., when the output .ZIP file was standard output or a non-seekable device.
Basically, this optional post-data descriptor contains information which can only be calculated after the fact while streaming.
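(A rough sketch of that streaming mode, using plain stdio rather than the actual zipwriter streams and assuming a little-endian host: the local header goes out with bit 3 (0x0008) set in the general purpose flag and zeroed CRC/size fields, the data is streamed, and then this descriptor follows it.)

#include <cstdint>
#include <cstdio>

// Written immediately after the compressed data when bit 3 of the general
// purpose flag is set, carrying the values that are only known afterwards.
static void WriteDataDescriptor(FILE* out, uint32_t crc,
                                uint32_t compressedSize,
                                uint32_t uncompressedSize) {
    const uint32_t kDescriptorSignature = 0x08074b50u; // commonly used, optional
    fwrite(&kDescriptorSignature, 4, 1, out);
    fwrite(&crc, 4, 1, out);
    fwrite(&compressedSize, 4, 1, out);
    fwrite(&uncompressedSize, 4, 1, out);
}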
Comment 12 • 14 years ago
Oh yeah. For some reason I recall that there were problems with that, but we could try it.
Comment 13 • 14 years ago
Comment on attachment 473183 [details] [diff] [review]
Now updates work with zip -FF
>diff --git a/modules/libjar/zipwriter/src/nsZipWriter.cpp b/modules/libjar/zipwriter/src/nsZipWriter.cpp
>--- a/modules/libjar/zipwriter/src/nsZipWriter.cpp
>+++ b/modules/libjar/zipwriter/src/nsZipWriter.cpp
>@@ -301,7 +301,7 @@ NS_IMETHODIMP nsZipWriter::Open(nsIFile
> return rv;
> }
>
>- rv = NS_NewBufferedOutputStream(getter_AddRefs(mStream), stream, 0x800);
>+ rv = NS_NewBufferedOutputStream(getter_AddRefs(mStream), stream, 3*1024*1024);
Spaces around the *s please.
Attachment #473183 - Flags: review?(dtownsend) → review+
Comment 14 • 14 years ago (Assignee)
Requesting approval to land because this fixes a significant startup performance problem.
Attachment #473183 - Attachment is obsolete: true
Attachment #474758 - Flags: review+
Attachment #474758 - Flags: approval2.0?
Comment 15 • 14 years ago
What's the cost/benefit of taking this in terms of risk of regression? We're getting close to code freeze, and I'd rather ratchet down regression risk until we re-open for the next beta. Opinions?
Comment 16 • 14 years ago
This will help startup perf significantly, and is fairly safe. I think we should take it, and block on it since it's related to a startup cache regression.
blocking2.0: --- → betaN+
Updated • 14 years ago (Assignee)
Keywords: checkin-needed
Comment 17 • 14 years ago
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b7
Comment 18 • 14 years ago
Comment on attachment 474758 [details] [diff] [review]
write out in bigger chunks.
Canceling unnecessary approval request, since this is blocking+
Attachment #474758 - Flags: approval2.0?
Updated • 14 years ago
Version: unspecified → Trunk
Comment 19 • 14 years ago
This may have caused bug 610040.