Closed Bug 35956 Opened 25 years ago Closed 23 years ago

File extension not changed but gzipped files expanded when saving

Categories

(Core :: Networking: HTTP, defect, P2)

x86
All
defect

Tracking


VERIFIED FIXED
mozilla0.9.2

People

(Reporter: myk, Assigned: darin.moz)

References


Details

(Keywords: regression, Whiteboard: [PDTP2] relnote-user)

Attachments

(5 files)

Overview Description: Mozilla now automatically decompresses a file in gzip format (.gz extension) when downloading it, but it doesn't change the file extension. This is misleading: I think the file is still a gzipped file and unsuccessfully try to gunzip it when I should be untarring it. It probably also causes problems in graphical file managers that base a file's type on its extension.

Steps to Reproduce:
1) go to the url and save the file when mozilla asks you what to do with it.
2) after the file has finished downloading, browse to its location in your filesystem

Actual Results: file is named cervisia-0.6.0.tar.gz

Expected Results: file is named cervisia-0.6.0.tar

Build Date & Platform Bug Found: Linux 2000-04-14-09

Additional Builds and Platforms Tested On: none
Shouldn't it also ask before decompressing? If you click on a link to a .gz file, you probably want to get a .gz file. Somewhat related is bug 31519, "Save as: should add extension to match content type".
->law
Assignee: gagan → law
Target Milestone: --- → M18
*** Bug 39964 has been marked as a duplicate of this bug. ***
I really think the behaviour we want here is to keep the .gz extension, but save the file as gzipped data. We only want to uncompress if we're going to view the file internally (how are we detecting this anyway, *.txt.gz?). It's wrong to have mozilla silently gunzipping all downloads. It does nothing but waste drive space, as most applications do this at runtime (vi, less, whatever else you'd be viewing gzipped ASCII with...), and for a .gz of non-ASCII data it's even more useless. Since the test case involves two linked files, I put it at: http://turbogeek.org/mozilla/gzip.html
Move to M21 target milestone.
Target Milestone: M18 → M21
What Mr. Dolan said, except twice. Mozilla has no business decompressing files just because I download them.
*** Bug 42019 has been marked as a duplicate of this bug. ***
Upping severity. This is freaking huge, folks. If I try to download a mozilla build with mozilla, instead of 4.7, after it downloads, I get to wait while mozilla's sloooooow decompression process sucks 100% CPU for longer than it actually took to download. Then, to make matters worse, mozilla leaks a chunk the size of the uncompressed version of the file (25-30M, in the case of a mozilla nightly), occasionally crashing the whole thing, as it tries to swap in chrome or something. Marking minor->major, perf, crash, mlk. Please reconsider targeting. This really should be an M16 blocker.
Severity: minor → major
Keywords: crash, mlk, perf
It looks like 39241 may have been another report of this crash. Looks like someone reported it while downloading quakeforge (fairly large), then it ended up as WORKSFORME, as the testcase was a much smaller file, not causing enough of a memory leak. Upping severity again, as this confirms my crash reporting.
Severity: major → critical
I'm reassigning this to the Networking component. I've recently been dealing with some more bugs really similar to this one. If there is a way to open an input stream in such a way as to avoid the decompressing, then please fill me in and I can fix it in nsStreamXferOp.cpp.
Assignee: law → gagan
Actually, I just went back and re-opened Bug 39241. It is a reliable crasher with a testcase. The bug "went away", without an explicit fix, so I marked it WORKSFORME. We need to get stack traces to determine where this is crashing, so we can sort out what is causing the problem.
nsbeta2 radar
Keywords: nsbeta2
Putting on [nsbeta2+] radar for beta2 fix.
Whiteboard: [nsbeta2+]
Bug 33808 is one of the similar bugs.
After talking to mscott, I think this bug is best resolved in the URI loader area. However, for now I am adding a call in nsIHTTPChannel to allow controlling the conversion inside of HTTP. law: after I finish adding that, you'd need to QI the channel to nsIHTTPChannel and then set doConversion to false, before you start reading anything off of it.
Status: NEW → ASSIGNED
*** Bug 33808 has been marked as a duplicate of this bug. ***
I just finished implementing this. The correct call is SetApplyConversion(PR_FALSE). After I check it in tonight I will reassign this back to you law.
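For reference, a minimal sketch of the consumer-side call being described (this assumes the nsIHTTPChannel/SetApplyConversion API named above; the real change is in law's patch further down):

  // Sketch: disable the automatic Content-Encoding conversion on an HTTP
  // channel before reading, so the saved data stays in its encoded form.
  nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface(aChannel);
  if (httpChannel) {
      // We are saving, not displaying, so keep the gzip encoding intact.
      httpChannel->SetApplyConversion(PR_FALSE);
  }
  // ...then open/read the stream from aChannel as before.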
Adding mostfreq as a mostfreq bug was marked a dupe of this one. Gerv
Keywords: mostfreq
I would like to vote YES, decompress data if a viewer (external) for the decompressed version is defined. Or at least ask the user. I make use of this feature of netscape 4 all the time: compressing all my files and letting netscape do the right thing when I view them - it decompresses them and then opens their defined viewer. If it's a save-as, then it should not decompress. So, if it asks "save it, or view it?": if I pick save, it's saved unchanged; if I pick view, it's decompressed before the viewer gets it. I guess some people would want the viewer to get the compressed data - perhaps a configuration in the viewer definition?
I don't think anyone is arguing to not un-gzip stuff if we're going to view it. Since the gzip code has to be in mozilla for HTTP compression, we might as well support viewing normal .gz files. Rewording summary.
Summary: file extension not changed when gzipped (.gz) files expanded on download → File extension not changed but gzipped files expanded when saving
checked in. law take it away...
Assignee: gagan → law
Status: ASSIGNED → NEW
I've got this patch; Gagan is reviewing...

Index: nsStreamTransfer.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpfe/components/xfer/src/nsStreamTransfer.cpp,v
retrieving revision 1.19
diff -u -r1.19 nsStreamTransfer.cpp
--- nsStreamTransfer.cpp	2000/06/09 00:49:40	1.19
+++ nsStreamTransfer.cpp	2000/06/24 00:57:53
@@ -131,6 +131,13 @@
          NS_SUCCEEDED( outputFile->IsValid( &isValid ) ) && isValid ) {
+            // Try to get HTTP channel.
+            nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface( aChannel );
+            if ( httpChannel ) {
+                // Turn off content encoding conversions.
+                httpChannel->SetApplyConversion( PR_FALSE );
+            }
+
             // Construct stream transfer operation to be given to dialog.
             nsStreamXferOp *p= new nsStreamXferOp( aChannel, outputFile );
Status: NEW → ASSIGNED
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified: Linux 2000062808
Status: RESOLVED → VERIFIED
I'm seeing this bug again. Both with the URL above and other tar.gz files. WinNT buildID 2000081504
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
This has regressed as of at least Linux 2000-08-16-05. When downloading the given URL, the file is expanded but the name is not changed. Also reported on WinNT -> all/all. No crash this time, changing severity to normal.
Severity: critical → normal
Keywords: regression
OS: Linux → All
*** Bug 49279 has been marked as a duplicate of this bug. ***
*** Bug 49279 has been marked as a duplicate of this bug. ***
nav triage team: [nsbeta3+] regression
Whiteboard: [nsbeta2+] → [nsbeta2+][nsbeta3+]
Marking P1.
Priority: P3 → P1
Setting to [nsbeta2-] since that is outta here. [nsbeta3+] is already indicated.
Whiteboard: [nsbeta2+][nsbeta3+] → [nsbeta2-][nsbeta3+]
There is an extremely important detail that, if provided, would greatly help me to figure out the source of the problem: When this fails, is it after seeing the (newer) "Downloading" (aka "super helper app") dialog? If yes, does it behave the same way if you right-click on the link and choose "Save link as...?" My theory is that the answers to those two questions are "yes" and "no." This would be caused by the fact that the SetApplyConversion( PR_FALSE ) never happens when you go through that other dialog. This might get messy. Scott, we might have to reload the URL in this case to get the bits without the conversion.
PDT downgrading to P2 and leaving [PDTP2] in status whiteboard
Priority: P1 → P2
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+][PDTP2]
Unable to reproduce this bug on PC/Linux build 2000090111 with the given URL. I see an "Unknown File Type" dialog (application/x-gzip), click "Save File..." and then "Ok". As a result, cervisia-0.6.0.tar.gz is saved, and an integrity test "tar ztf ..." does not complain, whereas "tar tf ..." shows an error. Thus the saved file is still gzipped (as it should).
I suspect we might see different behavior depending on *how* you download. If you click on a link directly and see the "Downloading" dialog, that might result in different behavior than if you right-click on the link and choose "Save link as..." (which produces the "unknown content" dialog and then the download progress dialog). Typing the url in the location bar seems to produce the latter behavior. Do *both* of those techniques produce the same behavior?
Blocks: 50326
For the URL specified it never decompresses the file. I've tried all three methods (left click, Save Link As, typing it in the URL bar). All three methods work the same in build 2000090321. It doesn't decompress the file. So I think it's WORKSFORME, or not?
If they all work the same then that's a different problem, perhaps. The data should be uncompressed as it is loaded into the browser (in case it is html). Maybe Necko isn't uncompressing gzip data now?
I think there are two bugs here. After testing this out (using the bug's URL) I see that if you right-click and do "Save link as...", the file is decompressed (but of course should not be). This despite (I believe), calling SetTransferEncoding(PR_FALSE) on the channel. This appears to be a networking bug. If I click on the link and then choose "Save to disk" on the Downloading dialog, it is also decompressed. This will still be broken when the networking glitch is fixed, I believe. That's the second bug. We need to issue the SetTransferEncoding(PR_FALSE) when we decide to save to disk (or to open a helper app, for that matter). One other detail: This may be due to relatively recent changes that fixed the downloading code to use the cached version of the file. If the version in the cache is decompressed, then that might account for what we're seeing. I'm going to investigate a bit more and then (probably) reassign to Networking.
Turning off cache got Save link as... working properly. Seems to be a cache interaction. Basically, the data is in the cache decompressed and when we ask for it again, we get the decompressed data, even though we've specified SetAutoEncoding(PR_FALSE). I'm reassigning to the Networking component so that that can be fixed. I think there's still the bug in the new Downloading code whereby the data will be decompressed if you just click on the link. Note that this is somewhat tricky to detect because if the data in the cache is compressed, then when *that* code loads it, it gets the compressed data (and thus appears to be working properly). The right thing is for Networking to straighten out the cache business and then reassign this to mscott to get the helper app service to turn off decompressing when it starts saving to the temporary file.
Assignee: law → gagan
Status: REOPENED → NEW
->neeti
Assignee: gagan → neeti
Not holding PR3 for this, so marking nsbeta3-. Seems serious enough to nominate for rtm, though.
Keywords: rtm
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] → [nsbeta2-][nsbeta3-][PDTP2]
*** Bug 54704 has been marked as a duplicate of this bug. ***
approving for rtm. gordon can you help neeti here?
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info]
*** Bug 56439 has been marked as a duplicate of this bug. ***
*** Bug 56449 has been marked as a duplicate of this bug. ***
*** Bug 56846 has been marked as a duplicate of this bug. ***
*** Bug 56856 has been marked as a duplicate of this bug. ***
The cache is not doing anything special. The flag (SetApplyConversion) is not set for Save As /Right click cases. This needs to be set for necko to not automatically apply the content conversion. Assigning to mscott for the cases law describes. mscott: give me a call if you need help.
Assignee: neeti → mscott
Keywords: relnoteRTM
Here's the fix for disabling conversion for content that's getting dispatched via the exthandler. Please ignore the first 3 lines of this patch which are part of another bug. All we care about here are the lines in OnStartRequest which disable conversion:

  nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface( aChannel );
  if ( httpChannel ) {
    // Turn off content encoding conversions.
    httpChannel->SetApplyConversion( PR_FALSE );
  }

http doesn't require this flag until the consumer starts to read out data, so setting it in the OnStart call has the desired effect. If there are still cases via the save as and right click cases where this flag isn't getting set then those would go back to law. gagan, can I get r=gagan on this change from ya?
r=gagan
Scott, the PDT is going to want the exact fix attached without the extraneous lines. Might as well just do that now...
Turns out this problem is worse than we thought. My fix works great for windows. However it doesn't work on linux. On top of that, nothing works on linux! Let me rephrase that. Without my changes, go to http://www.turbogeek.com/mozilla/gzip.html.

1) Save Link As for both of the example links and save them to a local file.
2) On windows, both of these files are still gzipped. They aren't uncompressed. This is GOOD as this is what we want.
3) On Linux, both of these files are uncompressed!!

I've verified that the flag from Bill's change is getting set on the http channel before he opens the channel and starts reading from it. http is choking on linux for some reason and still decompressing it.

So what did I fix? Well, I fixed it for the case where you click on a url and it causes the helper app dialog to come up. When you saved the content, we were always uncompressing the data. So I believe my patch is a requirement as it makes things work great on windows. But we need to figure out why http is giving us uncompressed data on linux for all scenarios when we set the convert flag to false. Back to gagan for that one =). hot potato hot potato.
Assignee: mscott → gagan
*** Bug 57410 has been marked as a duplicate of this bug. ***
->darin
Assignee: gagan → darin
Status: NEW → ASSIGNED
*** Bug 57249 has been marked as a duplicate of this bug. ***
The behavior of Netscape 4.75 under Linux is also to uncompress .gz files. Except that it "correctly" renames the files. An interesting thing, however, is that N4.75 does not uncompress (or rename) .tar.gz files.
Using Netscape 6 [Linux build 2000102309] and starting with an empty cache, I see the following behavior:

1) Left click on a .gz http link. NS tries to display the file (even if it is binary). This is how 4.X works as well.
2) Now right click on the same .gz http link and select Save Link As. Then in NS 6 you get a dialog with the full name of the file (including the .gz) but the file that is saved will not be compressed (BUG).
3) Next, right click on a different .gz http link and select Save Link As. This time the file is not uncompressed (or renamed). This is what we expect.

So, from this sequence of events, it is clear that the problem is related to the cache. Whether it is the fault of the cache or not is unclear. As far as what is going on, my guess would be that the cache is storing the uncompressed data, since that is what was needed for display. However, it is associating the uncompressed data with the URL to the compressed data. Thus, when we later request the URL to the compressed data, the cache simply gives us the uncompressed data instead. We of course have no knowledge of this, and therefore we do not save the data with the correct name.

It looks to me like we are not using the cache correctly. There should be two entries in the cache, one for the compressed data and one for the uncompressed data. I have to investigate this further since I'm not too familiar with how we pass data to the cache. If what I'm saying is true, we should be able to see the same behavior under Windows. Hmmmm...
Ok, under WinNT [build 20001023], I find the exact same behavior that I just described. The next thing to investigate is how entries are added to the cache. Do people agree that there should be two cache entries? One for the original compressed data and one for the uncompressed data? But, then what would the URL be for the uncompressed data? Perhaps we should only cache the compressed one? How does NS 4.X handle this? Ans: Under Windows NT, it doesn't... NS 4.7 has a similar problem. In this case, it doesn't seem to matter if I first follow a link to a .gz file (not a .tar.gz though) and then try to save the .gz file, the result is always an uncompressed file with a .gz extension. The Linux version of 4.X gets this right, however, as I previously noted.
The issue we should address is that Mozilla has no business uncompressing any files. Mozilla should only decompress content if it has been encoded for transfer (e.g. a gzipped HTTP response body), but it should never uncompress files simply because they are compressed. For example, if I download the Linux kernel source code, I do not want mozilla to uncompress it. The compressed archive is 17.7 megabytes. The uncompressed archive is 562% of that size at 99.5 megabytes. I do not want this file to take up six times more room than it needs to. Mozilla should leave it compressed on the disk. If Mozilla automatically decompresses it, it wastes my time and my CPU time having to recompress it.

The second issue is consistency. Mozilla decompresses gzip files, but what about bzip, zip, lha, arc, shar, ice, et al.? After we unzip the file, why aren't we untarring it? Do we also handle cpio? Consistent handling of compressed files is important.

The bottom line is that Mozilla should only automatically handle compression which is meant to be invisible to the user. The only example that I am aware of is Content-Encoding and friends from the HTTP 1.1 spec.
I agree, but in fact, mozilla does not have a problem with .tar.gz files. It does not try to decompress them. The problem is with .gz files.
> Perhaps we should only cache the compressed one? Agreed. > The issue we should address is that Mozilla has no business uncompressing any > files. If Mozilla can *display* the file and intends to do so, uncompressing it is OK. It should only never uncompress it when saved on disk.
Hey Darin, I think there's more than just the cache causing a problem here. If I click on a link that isn't in the cache, then on linux I still see the content get uncompressed. On windows, the content is properly handled. To see this, apply my patch to the exthandler to disable conversion. Now visit www.mozilla.org and click on a linux nightly tarball. On windows, you'll see that we don't unzip the content, but on linux we still do. The linux tarball isn't in my cache.
I had a discussion with Gagan on this... and, irrespective of how we are currently doing things, the "correct" thing to do (following the SPEC) is to give the file to the user in the format corresponding to the Content-Type HTTP header. I'm going to say up front that this is not what the user would expect in many cases. For example, the server at turbogeek.org reports the Content-Type of both gzip-test.gz and gziped-ascii.txt.gz as text/plain. And, it specifies the Content-Encoding as gzip. According to the SPEC this tells the browser that the content is only gzip compressed for the purposes of getting the actual data to the user, but that the user ultimately wants the data in the format specified by Content-Type. Correct me if I'm wrong, but this is how I interpret the SPEC.

Now the way Apache, for example, handles .gz files is that it tries to guess the format of the compressed content. If you have a .tar.gz file, it will report the Content-Type as application/x-tar, and if it doesn't recognize a contained extension (e.g. whatever.gz) then it will just report the Content-Type as text/plain, which oftentimes is not correct. In both of these cases, it will report the Content-Encoding as gzip.

What all of this means, of course, is that if the server is not reporting the content type as application/gzip then we should decompress. Ultimately, I think the user should be given a choice: if the server is providing compressed data, and the user wishes to save that data to a file, we should ask the user if they want the data in the compressed form or the uncompressed form. The current behavior of mozilla and netscape 4.X does not follow the SPEC in this regard. It is inconsistent at best, and so we have to decide what behavior to actually implement.
What do other servers, e.g. MS IIS, do by default (many people have only ftp access to their webserver, so the default matters a lot)? What happens for sea.gz? .tar.bz2? .txt.bz2, sea.bz2? Do we recognize bz2 at all? Does Apache, MS IIS?
Is apache aware of the problem, then? The closest bug I found by searching http://bugs.apache.org/ for "gz" was http://bugs.apache.org/index.cgi/full/3892.
Indeed, it seems that the Apache Group have done this on purpose. Quoting from httpd.conf.dist:

#
# AddEncoding allows you to have certain browsers (Mosaic/X 2.1+) uncompress
# information on the fly. Note: Not all browsers support this.
# Despite the name similarity, the following Add* directives have nothing
# to do with the FancyIndexing customization directives above.
#
AddEncoding x-compress Z
AddEncoding x-gzip gz tgz

I'm contacting the Apache people by mail.
> AddEncoding x-gzip gz tgz

Please note the "tgz": ".tgz" is short for ".tar.gz" (in order to stay within the DOS 8.3 scheme). -> We will also uncompress tarballs. Yes, this is an Apache bug, but relevant to our decision.
Please disregard my last comments. I misunderstood you. If Apache always adds the encoding header, even for .tar.gz and .tgz, we must ignore it when saving to a file (this includes the cache), or at least ask the user.

> According to the SPEC this tells the browser that the content is only gzip
> compressed for the purposes of getting the actual data to the user, but that
> the user ultimately wants the data in the format specified by Content-Type.
> Correct me if I'm wrong, but this is how I interpret the SPEC.

Which spec and which sentence do you interpret this way? I checked RFC 1945, 10.3 and RFC 2616, 14.11, and I see nothing that suggests this. If I interpret it correctly, I suggest to just ignore it and always save the file compressed, without bothering the user by asking.
BTW: The relevant Apache bug reports are <http://bugs.apache.org/index.cgi/full/2364> and <http://bugs.apache.org/index.cgi/full/1439> (please note that the latter predates the former, it is just an example of potential harm).
RFC 2616 says that the Content-Encoding should not be undone until display or other presentation of data. If Mozilla is going to save the data to disk, it must leave the content encoding in place, which means not unzipping zipped files. If it intends to display the file, then it must unzip the file first. So, two example scenarios to clarify all this posting:

1) Content-Type: application/x-tar
   Content-Encoding: x-gzip
   Mozilla must save the file to disk without changing the name or unzipping the data.

2) Content-Type: text/plain (or other displayable format)
   Content-Encoding: x-gzip
   Mozilla should unzip the file and display it.
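A minimal sketch of that rule as code (illustrative only; the enum and function are hypothetical helpers, not Necko interfaces):

  // Decide whether a Content-Encoding should be decoded, given what the user
  // is doing with the data. Names are hypothetical, for illustration only.
  enum Disposition { DISPLAY, SAVE_TO_DISK };

  bool ShouldDecodeContentEncoding(const char *contentEncoding,
                                   Disposition disposition)
  {
      if (!contentEncoding || !*contentEncoding)
          return false;       // nothing to decode
      if (disposition == SAVE_TO_DISK)
          return false;       // keep the entity as sent (scenario 1 above)
      return true;            // displaying: decode before rendering (scenario 2)
  }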
Marking relnote-user in case we don't come up with a fix for this. Gerv
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user
The SPEC I was previously referring to is RFC 2616. Take a look at Section 3.5 "Content Codings"... the first paragraph in particular. Clearly, decoding is necessary when we view content. The question (which the HTTP SPEC does not answer) is: what do we do when the user asks to save the content to a file? And, in what format should we store the content in the cache?

As far as question 2 is concerned, storing the content in the decoded form would make sense from the point of view of efficiency -- we don't want to be decoding the content every time we wish to display it! On the other hand, the encoded form of the content may be smaller (in the case of compression) and therefore would help us conserve disk space. This would probably be a benefit to embedded device implementers. So, perhaps this should be a preference?!?

Now back to question 1. One very common use of the Save Link As option is for downloading a file from an HTTP server. In this way, the HTTP server is being used like an FTP server. And, in this case, the user almost always expects the data to be in the original format... usually compressed. We should not decompress such content. But this is not the only way that the Content-Encoding header is used. This header was intended to be used by the server when it needs (or wants) to encode the content for transmission or for whatever reason. We don't have any way of knowing what the intent of this encoding is.

The way we've attempted to solve this problem so far is to give the user the content in the "raw" encoded form when they click Save Link As (or some equivalent). However, there are some bugs in the way we do this now. And, moreover, if the content is already in the cache (in decoded form), then what should we give the user when they ask for the content to be saved to a file? Should we re-encode the data? Should we re-fetch the content? Or should we give them the decoded data, and somehow guess the correct filename as Netscape 4 tries to do? Also, if the content is not in the cache and we then save the content undecoded, should we cache that? If we do, then if the user later asks to display the content, we will have to remember to decode at that time.

BTW, this is currently a problem. Clear your cache, go to turbogeek.org/mozilla and download [right click -> Save Link As] one of the test files (e.g. gzip-test.gz). Then left click on the same file, and notice that the displayed content is binary, which means the data is not being decoded!!

The cache is already slated for an overhaul in the very near future. I think it should incorporate knowledge of the encoding and possibly be able to provide the data in either format on the fly?!?
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user → [nsbeta2-][nsbeta3-][PDTP2][rtm need info]
mscott: with your patch (which appears to be checked into the trunk... i haven't looked for it on the branch) I do not see a difference in the behavior on Linux versus Windows. I am using trunk pulls from yesterday (10-23-2000). Perhaps I could swing by your cube and have you to show me the difference?
darin, to see the difference go to www.mozilla.org and click on the linux nightly tarball. When the helper app dialog comes up, select save to disk. On windows, the file is correctly saved still compressed. On linux, the file is uncompressed but still has a .gz extension. Is that what you were trying? That's what my patch fixed for windows.
Darin, IMO, the cache should not hold decompressed data. It is a network cache, supposed to reduce redundant network fetches, not save processing time. Decompressing is so fast that it might even be faster than reading the decompressed file from disk (but I have no data supporting this). In any case, a cache hit for compressed data (in contrast to stylesheets etc.) seems to be unlikely, so keeping the decompressed file just to save some processing seconds seems like a waste of cache space to me (especially for the mem cache).
I agree with your arguments that the cache should not hold decoded data. It's unfortunate that this is not the current behavior. At the moment, the stream conversion (decoding) is happening as the data arrives, unless the HTTP channel has the ApplyConversion flag set to FALSE. The converted stream is passed on to the channel's listener (eg. the parser). The cache intercepts this stream, so it never sees the encoded data. Clearly, then, we need to re-think how data is put into the cache. This is probably a major change.
mscott: as strange as this may sound, when I go to www.mozilla.org and grab a nightly build (like mozilla-i686-pc-linux-gnu-sea.tar.gz) by left-clicking the link and selecting save in the dialog, I _do_ get a gzip'd tar file. I tested this using a CVS pull from around 4 pm today (10/24). With which version of the code are you seeing the discrepancy between Linux and Windows?
Most of the discussion here is around HTTP; FTP also shows the same behaviour. (Should this be entered as a new bug, like 57619?) If I download http://ftp.mozilla.org/pub/mozilla/nightly/2000-10-25-08-Mtrunk/mozilla-i686-pc-linux-gnu-sea.tar.gz (with 2000102508 in Linux) I get a compressed version of the .tar.gz. If I download ftp://ftp.mozilla.org/pub/mozilla/nightly/2000-10-25-08-Mtrunk/mozilla-i686-pc-linux-gnu-sea.tar.gz (the same file with a different protocol) I get an uncompressed version.
*** Bug 57625 has been marked as a duplicate of this bug. ***
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user
This is not something we can easily fix for RTM. Moving the target to Future, and marking rtm- in the status whiteboard.
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user → [nsbeta2-][nsbeta3-][PDTP2][rtm-] relnote-user
Target Milestone: M21 → Future
Note that when Content-Encoding is set to 'gzip', Mozilla will save files in compressed format, regardless. I have a proxy which compresses html content (sets Content-Encoding to gzip), and *all* files are saved as gzipped when I hit save-as (but mozilla does not append a .gz extension). It appears that with this build (2000110308), .tar.gz files are handled correctly, but other types are not.

From my reading of rfc2616, section 7.2.1: "Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource." Since this is an ambiguous situation, it is bending the rules a bit, but the client (mozilla) must examine the URI to determine if the file was gzipped.

The way to unambiguously fix this would be to use 'Transfer-Encoding: gzip' when the server has compressed the data, and only use 'Content-Encoding' when the data was compressed to begin with. But this would require modification of lots of servers. (And will probably be incompatible with HTTP/1.0.) I think I will modify my proxy to use Transfer-Encoding instead (since Transfer-Encoding *must* be removed by the client). Will mozilla handle this properly?

"Future" my ass. (not to be rude...) This is a bug, and it needs to be fixed.
I think most people would agree, that the preferred action taken by the browser should be to give the file in a format that is consistent with the URL. Based on the content-type/content-encoding alone, you do not really know what the user expects when they ask to save the content to disk. However, if you inspect the extension on the URL, you can do a mime-type lookup and then figure out what to do (most of the time). This bug is "futured" b/c it depends on cache architecture changes (which are coming). Currently, we are not putting content in the cache in a consistent way, so it is difficult (if not impossible) to properly fix this problem right now. Right now, it is possible depending on how you acquire content (either through left-clicking a link or right-clicking and saving the link) to end up with content stored in encoded form in the cache as well as content stored in decoded form. I really believe that this needs to be resolved first.
My suggestion to unambiguously fix the Content-Type/Content-Encoding mess by using Transfer-Encoding won't work because neither Netscape 4.7x nor Mozilla support it. (But it's defined by rfc2616!) Should I file a separate bug for 'Transfer-Encoding: gzip' not working?
If transfer-encoding is not working properly then YES that should be filed under a different bug.
Why ever _decompress_ or do _anything_ else (e.g. CRLF conversion) when saving to disk! Neither on clicking nor on "Save as...". Leave _any_ changing of the saved file to the helper applications, or at least to what's declared in the application preferences, since one never knows what will be done with it. Why try to be smarter if the result is worse? Not being able to download without the file being changed means not being able to download at all. So I consider this bug "major".
One important reason why we want to "touch" the network data before caching it is, in the case of HTTP, to parse transfer encodings and group headers together in the event that headers appear at the end of the data stream. We really don't want to have to do this kind of parsing every time we fetch a server response from the cache.
Component: Networking → Networking: HTTP
Blocks: 61688
*** Bug 65573 has been marked as a duplicate of this bug. ***
For *content* vs *transfer* encoding in HTTP/1.1 see my comments in bug 68414.
To clarify: When we get an entity with "Content-Encoding: gzip" it should keep this encoding when saved to disk. RFC 2616 says this is a property of the entity. Also, see W3C-CUAP 3.1 (http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-save-filenames and bug 68420), which even more clearly tells us what to do. However, there could be an option in the filepicker to save uncompressed.
The option to "save uncompressed" is definitely an enhancement worthy of a different bug report. For this bug, I agree, content saved to disk should not be decoded. In order for this to work, we have to make sure that cached content is also written in encoded form. For now, we are blocked waiting for the new cache to land.
In my opinion, the Right Thing for the cache to do is to store exactly what was received from the server. Then, on "save as", the file should be copied from the cache as-is. Only in this way will the final saved file be identical to the original file from the server. The "display" section of Mozilla should be treated just like any other helper app and given the original (possibly compressed) file. That helper app should uncompress (if necessary) and parse the file on the fly and display it. But there's no reason to store this uncompressed (and parsed and otherwise fiddled-with) text anywhere.

It's definitely much faster to decompress a file from the RAM cache than it is to load an uncompressed file from the disk cache. (Sometimes all the compressed data fits in RAM, but all the compressed + all the uncompressed data overflows RAM and spills onto disk.)

Darin Fisher may be right that re-parsing the group headers every time a file is loaded from cache is too slow. Would it be possible to store a short summary of pre-parsed information and meta-information about a file somewhere else, such that the original file contents are still unchanged?
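As a toy illustration of that proposal (hypothetical types, not the actual Necko cache interfaces), a cache record could keep the raw bytes untouched and let each consumer decide whether decoding is needed:

  #include <string>
  #include <vector>

  // Sketch: cache exactly the bytes received, plus the headers needed to
  // interpret them. "Save As" copies rawBody as-is; display decodes first.
  struct CacheRecord {
      std::string contentType;              // e.g. "text/html"
      std::string contentEncoding;          // e.g. "gzip", or empty
      std::vector<unsigned char> rawBody;   // exactly what the server sent
  };

  // Save As: hand out the raw bytes unchanged.
  const std::vector<unsigned char>& BytesForSave(const CacheRecord& rec)
  {
      return rec.rawBody;
  }

  // Display: when this is true, the caller runs an inflate step before rendering.
  bool NeedsDecodeBeforeDisplay(const CacheRecord& rec)
  {
      return !rec.contentEncoding.empty();
  }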
The new cache implementation addresses this concern.
removing stale/old keywords. adding dependency on bug for new cache design.
Depends on: 68705
Keywords: crash, mlk, nsbeta2, rtm
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm-] relnote-user → [PDTP2] relnote-user
Ok, doing "save to disk" now leaves the .gz extension and it is a .gz file, so this is good. However, if you choose "Open With..." and then a gz-handling program, odd things happen (at least for me, on Windows, with PowerArchiver2000 - appreciate reports on other platforms.) Is it passing it the tar version? Gerv
this bug hasn't been fixed yet. it will require a little bit of HTTP reworking.
Keywords: nsbeta1
Target Milestone: Future → mozilla0.9.1
my changes for bug 76866 will fix this bug as well.
Depends on: 76866
*** Bug 80053 has been marked as a duplicate of this bug. ***
the necessary http changes were checked in with the http branch landing. however, i'm not sure that the problem is completely fixed.
Keywords: qawanted
No, this is definitely not fixed yet. Build 2001051616, x86/Linux.
That's what we should do:

                                        View           Helper         Save As
================================================================================
Content-Type: application/octet-stream  ---            compressed     compressed
Content-Encoding: gzip                  uncompressed   uncompressed   compressed
Transfer-Encoding: gzip                 uncompressed   uncompressed   uncompressed

Transfer-Coding is not working yet (bug 68517). For all other cases I can verify correct behavior with 2001-05-16-04, Win NT (for helper applications see bug 69306).

One issue is left: the file extension is not changed in the default file name of the save-as dialog if it does not match the content coding. Is that covered by bug 31519, should we keep this bug open, or should I file a new one?
-> 0.9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
*** Bug 82127 has been marked as a duplicate of this bug. ***
*** Bug 82308 has been marked as a duplicate of this bug. ***
*** Bug 82319 has been marked as a duplicate of this bug. ***
*** Bug 82698 has been marked as a duplicate of this bug. ***
*** Bug 83154 has been marked as a duplicate of this bug. ***
This has started happening to gzip files downloaded from ftp.mozilla.org within the past few days.
*** Bug 83188 has been marked as a duplicate of this bug. ***
Happens on all .tar.gz and .tgz files too. At least now I know this and I don't have to erase the files and get them with another app <g>
Keywords: perf, qawanted, relnoteRTMpatch
good fix. r=gagan
this fix is not really correct. it breaks down for servers which send text/html with a content-encoding of gzip. try saving the toplevel page at http://sourceforge.net/. you'll see that the saved page is gzip encoded. i think what we really need to do here is respect the content-encoding header except in cases where the content-type is application/x-gzip (and related variants). sourceforge.net is basically broken. it should not be sending a Content-Encoding header in this case, since it does not intend for the browser to decode the data. if we added this application/x-gzip hack we'd be consistent with the behavior of NS4x.
the question then is are there other content-types which should be treated in a similar manner?
Attached patch alternative solution (deleted) — Splinter Review
> this fix is not really correct. it breaks down for servers which send text/html > with a content-encoding of gzip. try saving the toplevel page at > http://sourceforge.net/. you'll see that the saved page is gzip encoded. That's exactly what we should do. Quoting RFC 2616 once again: The content-coding is a characteristic of the entity identified by the Request-URI. Typically, the entity-body is stored with this encoding and is only decoded before rendering or analogous usage.
but then the filename of the saved page should have a .gz appended to it. fwiw: nav4x saves the page in text/html format, not application/x-gzip.
> but then the filename of the saved page should have a .gz appended to it. Yes (strictly speaking: we should use the proper system naming convention for the content coding, see bug 68420).
OK.. while i agree that this solution could be sufficient for text/html, we'd then need a way to distinguish:

  Content-Type: text/html
  Content-Encoding: gzip

from:

  Content-Type: application/x-gzip
  Content-Encoding: gzip

in terms of whether or not we should gunzip the data. if we adhere to the spec then we should decode the data in both cases (or at least assume that the data is not actually of the type given by the Content-Type header but actually an encoded form of that), and we should choose a filename extension that matches the content-type. unfortunately, this doesn't work for the second case, since the content sent by the server is not actually twice gzip'd! ...the server is lying!

we can alternatively assume that when saving to disk, the Content-Encoding header should be ignored. this is nice, because in the second case it means that we'd be OK... we would have conveniently solved the twice gzip'd problem. but, what about the first case. what would happen there? well, we'd probably want to (as i've already said) adjust the saved file extension to take into account the fact that the content is gzip'd. but, how do we know that the content is gzip'd? because of the Content-Encoding header right? but, we're ignoring the Content-Encoding header, aren't we? this is where i get stuck.

i think that no matter what we have to assume that servers will not double gzip content. otherwise, i'm not sure how we're going to solve this problem. this is just another example of us having to jump through hoops to support a commonly accepted (and consistently implemented) violation of the spec.
darin, what if you have a URL .../foo.html.gz giving HTTP

  Content-Type: text/html
  Content-Encoding: gzip

? You don't want to save that as foo.html.gz.gz, do you? You want to save it as foo.html.gz, no matter if the URL was ../foo.html or ../foo.html.gz, right? So, can we avoid the check if the filename already has an acceptable extension (Note: .tar.gz means the same as .tgz!)? If we do extension guessing, and considering that we can't (yet) do anything sensible with application/x-gzip other than saving, do we have to special-case the "violation" (Content-Type: application/x-gzip, Content-Encoding: gzip) at all?
FYI: It would be fine with me if you
- didn't decompress Content-Encoding: gzip when saving
- decompressed Transfer-Encoding: gzip when saving
- proposed the "filename portion" (part after the last slash; "index.html", if null) of the URL as the filename for the local disk.

This would mean that the URL .../foo.html giving HTTP

  Content-Type: text/html
  Content-Encoding: gzip

would be stored gzipped as foo.html, but that's a fault of the web site, no?

(BTW: What happens on Windows if I save a normal foo.html? Is foo.html or foo.htm proposed as the filename? Doesn't the former break Windows extensions?)
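A rough sketch of the filename side of this discussion (purely illustrative; the helper names and extension list are assumptions, not existing Mozilla code):

  #include <string>

  // Sketch: if the saved bytes keep their gzip Content-Encoding, make sure the
  // suggested name says so, without double-appending ".gz".
  static bool EndsWith(const std::string& name, const std::string& suffix)
  {
      return name.size() >= suffix.size() &&
             name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
  }

  std::string SuggestSaveName(const std::string& urlFileName, bool keptGzipEncoding)
  {
      if (!keptGzipEncoding)
          return urlFileName;          // data was decoded; the URL name is fine as-is
      if (EndsWith(urlFileName, ".gz") || EndsWith(urlFileName, ".tgz"))
          return urlFileName;          // already advertises gzip (.tgz == .tar.gz)
      return urlFileName + ".gz";      // saved bytes are gzip'd; say so in the name
  }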
Darin wrote:
> this is just another example of us having to jump through hoops to support a
> commonly accepted (and consistently implemented) violation of the spec.

It seems there is no solution that will work right for all situations, so, failing that, we might as well implement it right (RFC-style), and get Apache to fix the Content-Type/encoding ambiguities before the next release. It means we'd screw up saves for older (broken) Apache servers, but hell, Netscape 4.x screws up gzip saves as well, so we're at 4.x parity for broken servers, and correct for proper servers.

If we don't do it right now, more browsers and web servers that serve up ambiguous Content-Type/-Encodings will be released, and it'll never be corrected. And if we don't do it right now, we're stuck with the problem of trying to guess what the server meant, and eventually we're gonna guess wrong, and force new web servers to remain broken for broken-browser compat.

I say do it RFC style, relnote the user, and contact Apache (and any other server whose default config sends twice-gzip'd headers while only gzipping once).
We are the Moz-cops, upholding the RFCs!! WooHoo!!
The new attachment "alternative solution" openly admits to breaking the RFC and it doesn't solve the problem properly. The 06/01/01 solution with Gagan's r= on it sounds fine and hopefully can be checked in soon so that we can verify and get out of here. If people are downloading gzip compressed HTML files and want them to be named foo.html.gz that's an issue for a different bug (someone mentioned the bug # already) PLEASE don't try to solve all the world's trouble here in #35956
my previous patch (the one gagan r='d) simply ignored the Content-Encoding header when saving to disk, but as i described in my previous patch this makes it impossible to get the file extension right, unless we explicitly encode (someplace) the fact that apache doesn't really double gzip such content. moreover, my previous patch breaks necko convention by not calling OnStartRequest for the stream converter. this is only a minor detail, of course, and for the gzip stream converter it fortunately has benign side effects... but http's not supposed to know that, right? ;-)
...but as i described in my previous _comments_ this makes...
.tar.gz is usually the same as .tgz unless .tgz means a slackware package on linux which is VERY different
What the hell are you talking about? A slackware package is just a gzipped tar file.
"my previous patch (the one gagan r='d) simply ignored the Content-Encoding header when saving to disk," Good. Fix the Necko nit you mentioned, get approval, check in fix, verify and kill this bug. I really don't think the 06/01/01 patch prevents bug 68420 from being fixed, Darin can you explain why you believe that to be true? Maybe my understanding of Mozilla's architecture is too weak to see the problem.
my point is that this is not a bug with mozilla, it is a bug with apache. all we can do is work around apache's bug. that is the intent of my latter patch. i'm going to add an additional check that only enables the workaround logic if the server is apache.
Attached patch merging both solutions (deleted) — Splinter Review
r=gagan
*** Bug 84899 has been marked as a duplicate of this bug. ***
+    const char *encoding = mResponseHead->PeekHeader(nsHttp::Content_Encoding);
+    if (encoding && PL_strcasestr(encoding, "gzip") && (
+        !PL_strcmp(mResponseHead->ContentType(), APPLICATION_GZIP) ||
+        !PL_strcmp(mResponseHead->ContentType(), APPLICATION_GZIP2))) {
+        // clear the Content-Encoding header
+        mResponseHead->SetHeader(nsHttp::Content_Encoding, nsnull);

I think I got apache to spit out an encoding of x-gzip at one point (although that was without sending any accept-encoding headers). You should probably check for that as well.
an encoding of x-gzip would be picked up by this patch as well. note the call to PL_strcasestr.
Oops. Of course, we'll now match against not-gzip :)
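For illustration, a token-based check along these lines (a sketch only, not the patch that landed; it uses POSIX strncasecmp rather than the NSPR string helpers):

  #include <cctype>
  #include <cstddef>
  #include <strings.h>   // strncasecmp (POSIX)

  // Match "gzip" / "x-gzip" as whole tokens in a comma-separated
  // Content-Encoding value, so a value like "not-gzip" does not match.
  static bool HasGzipToken(const char* encoding)
  {
      if (!encoding)
          return false;
      const char* p = encoding;
      while (*p) {
          while (*p == ',' || isspace((unsigned char)*p)) ++p;   // skip separators
          const char* start = p;
          while (*p && *p != ',' && !isspace((unsigned char)*p)) ++p;
          size_t len = (size_t)(p - start);
          if ((len == 4 && strncasecmp(start, "gzip", 4) == 0) ||
              (len == 6 && strncasecmp(start, "x-gzip", 6) == 0))
              return true;
      }
      return false;
  }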
Not sure if the following situation is relevant to this bug, but... I've tried to download something from this link: http://mylookandfeel.l2fprod.com/portal.php3?action=plaf&id=skinlf#resources and I'm offered to save a *.php file. I don't think that's right... it should let me save the *.gz file that is behind there (it worked well on IE when I copied the link). (milos.kleint@czech.sun.com)
remove the printf in nsHttpChannel::SetApplyConversion and it is good to go.
*** Bug 85476 has been marked as a duplicate of this bug. ***
dougt: yikes! thanks for catching the printf.
Whiteboard: [PDTP2] relnote-user → [PDTP2] relnote-user, r=gagan, sr=dougt, a=?
Has anyone contacted Apache to get a fix in there?
Blocks: 83989
a= asa@mozilla.org for checkin to the trunk. (on behalf of drivers)
fix checked in!! hooray!! hooray!!
Whiteboard: [PDTP2] relnote-user, r=gagan, sr=dougt, a=? → [PDTP2] relnote-user, r=gagan, sr=dougt, a=asa
marking FIXED
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
http://mylookandfeel.l2fprod.com/portal.php3?action=plaf&id=skinlf#resources still tries to save the .zip files as .php... separate issue, or was the patch supposed to take care of that one too?
yes that's a separate bug which you should be able to find in bugzilla (it might even be mostfreq), basically we don't honor the suggested file name field, which pairs nicely w/ the fact that we don't provide normal filename fields :)
Keywords: patch
Whiteboard: [PDTP2] relnote-user, r=gagan, sr=dougt, a=asa → [PDTP2] relnote-user
*** Bug 85854 has been marked as a duplicate of this bug. ***
This fix seems to have broken some (all?) pages that are gz encoded. For example, go to http://www.mutt.org/ and click on the FAQ link (it's near the top). In 4.77 this link brings up the page, but in Mozilla I now get a gzip file displayed in the browser.
The mutt faq page WFM on 2001061309/Linux.
I was using 2001061308/Linux (Navigator only). Installing 2001061408 has only made things worse... it now segfaults whenever I click on the FAQ link: /usr/local/mozilla/run-mozilla.sh: line 72: 15501 Segmentation fault $prog ${1+"$@"}
WFM linux 2001061308, but I don't think this page is a "normal" encoding setup. Upon requesting "GET /muttfaq/faq", it 302's you to http://www.fefe.de:80/muttfaq/faq.html.gz which is sent with:

  Content-Type: text/html
  Content-Encoding: gzip

Doing a Save As... on the page saves it with the name faq.html.gz, and with the data gzipped. Doing a "Save Link As..." on the link from mutt.org saves the uncompressed version to a file named 'faq' (no .html).
Travis, the bug you mentioned (with x-gzip encoding) is known as bug 85887 and was fixed recently.
*** Bug 87016 has been marked as a duplicate of this bug. ***
Hi I got caught by this one too!
*** Bug 87781 has been marked as a duplicate of this bug. ***
Gzipped files from citeseer still expanded: http://citeseer.nj.nec.com/rd/44385488%2C319362%2C1%2C0.25%2CDownload/http%253A%252F%252Fciteseer.nj.nec.com/cache/papers/cs/14081/http%253AzSzzSzwww.brics.dkzSz%257EmiszSzmacro.ps.gz/brabrand00growing.ps.gz The URL above is a redirect to a gzipped ps file which is auto-expanded when mozilla downloads it. It is saved by default as .ps.gz which is incorrect. Build id: 2001062608
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
robin: what platform are you noticing this on? i just tried with the linux 6-26/08 and didn't have any problems. the saved file was compressed.
robin: make sure you clear your cache... there was a bug that just got fixed which made it possible for the uncompressed content to be written to the cache, which if later saved to disk would also be uncompressed. this was fixed, however. marking FIXED... please reopen if after clearing your cache you still see the problem. thx!
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
*** Bug 88059 has been marked as a duplicate of this bug. ***
*** Bug 88619 has been marked as a duplicate of this bug. ***
No longer blocks: 68420
I filed bug 90490 for the remaining issue (no .gz extension added).
*** Bug 90711 has been marked as a duplicate of this bug. ***
People are still filing dupes. Should this be re-opened?
benc: I don't think this made 0.9.2 - are there any reports with current nightlies? I'd have to check that though.
benc: Some bugs are filed against old builds (bug 87016, bug 88059). Bug 85854 is filed against build 2001061308, the fix has been checked in 9 hours before. Bug 87781 and bug 90711 have no build ID. Bug 88619 has probably the wrong ID (that of a newer build downloaded with an old build).
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010701. The most annoying bug in the world has been squashed. Congratulations FT
WFM on WindowsME, 2001072618 trunk installer build. Isn't this bug ripe for fixed/verified?
verified: Linux rh6 2001080106 Win NT4 2001080103 Mac os9 2001080108
Status: RESOLVED → VERIFIED
*** Bug 95242 has been marked as a duplicate of this bug. ***
I'm seeing this again on build linux gcc3.0 2001122021 while downloading files from http://ftp.mozilla.org/pub/mozilla/nightly/latest/ using Save As, not when clicking on the links and waiting for the download dialog to pop up! The link in this bug works fine, however?!?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
confirmed... but, the problem you are reporting is a different bug. i could reproduce it by right clicking and pressing "Save link as" ...that's not what this bug report is about. please see bug 116445. marking FIXED.
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This bug is still present in build 2002010208. I tried three cases with the mozilla installer on http://www.mozilla.org/ (i686-pc-linux):

click: downloaded and saved as .tar.gz, file is a gzipped tar.
  file save as dialog box showed: Files of type: *.gz (*.gz)

shift+click: downloaded and saved as .tar.gz, file is a tar (not gzipped).
  file save as dialog box showed: Files of type: All Files (*.*)(*.*)

right click + save link as: downloaded and saved as i386 Linux, file is a tar (not gzipped).
  file save as dialog box showed: Files of type: All Files (*.*)(*.*)
Philip, that's bug 116445.
v fixed. new issues/regressions should be filed as new bugs.
Status: RESOLVED → VERIFIED