Closed Bug 128199 Opened 23 years ago Closed 23 years ago

[FIX]We should not decode when saving files with well-known extension

Categories

(Core Graveyard :: File Handling, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.0

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

()

Details

Attachments

(1 file, 4 obsolete files)

We now "ban" decoding of files on save if the files have a type like application/tar. We should also consider banning decoding of files on save when the url has the "gz" or "zip" extension. This is usually a good indicator that the file is in fact not to be decoded...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.0
URL demonstrates the problem. Go there and do a "save link as" on the link described in the second comment.
Attached patch work-in-progress patch (obsolete) (deleted) — Splinter Review
Attached patch Fully working patch (obsolete) (deleted) — Splinter Review
This does what it can to make sure that extensions are right in the default suggested filename and does not decode .zip or .gz files. Does not regress any of bugs 121001, 35956, 120033, 116445, 128177
Attachment #71844 - Attachment is obsolete: true
bz: great work! i'll try to take a look tomorrow.
Please indulge me and explain what's going on here it terms a layman such as myself can understand. My take on this is that: - There's a link to some content, which ultimately has an arbitrary content type. - The server indicates that the file has this "ultimate" content type. - The server stores this content in some compressed format, a fact which they advertise via the Content-Encoding http header. - Normally, that header tells the user agent to un-encode the data to get the ultimate content type. - We will now infer from the file extension that this content should be stored on the client's computer un-encoded, so we will ignore the Content-Encoding header. That sounds plausible. The alternative (what we do currently), is to save the file with the wrong extension, essentially (e.g., in this case save text/xml with extension .gz). My one concern (and it may not really be a problem, I just couldn't figure out from the patch and I'm too busy/lazy to figure it out) is this: What if the file is being downloaded to be opened with a helper application that was selected based on the actual content type. For example, imagine application/x- msword instead of text/xml in the case cited). Would this patch result in us passing encoded data to MS Word?
Explanation: 1) There's a link to some content, which ultimately has an arbitrary content type. 2) The server indicates that the file has this "ultimate" content type. 3) The server stores this content in some compressed format, a fact which they advertise via the Content-Encoding http header. 4) "Normally, that header tells the user agent to un-encode the data to get the ultimate content type." This is where problems start. That's not actually what the header says. The header says that the data should be decoded for viewing but should be kept encoded otherwise because the encoding is an inherent part of the data. Transfer-Encoding is the header that says "decode this once you get it", but no browsers or servers support it. So servers use content-encoding to mean either content-encoding or transfer-encoding and hence we have to guess which they meant. Proceeding: 5) We will now infer from the file extension that this content should be stored on the client's computer un-encoded, so we will ignore the Content-Encoding header. Correct. Your concern is a valid one.... Indeed, if someone has a url ending in .gz or that is served as application/msword and content-encoding: gzip and the user selects "open with application" then we will indeed feed gzipped data to Word. To be truthful, I see no way to solve this problem, since we have to decide whether to decode _before_ we know whether we're using a helper. I could fix that for the "never ask" case, but the general problem will remain until stop streaming the data to file before we know what we're doing with it. The only type I can think of that people would send off to helpers that has a .gz or .zip extension with any sort of frequency is postscript.... So this should be a fairly rare problem; certainly much rarer than .gz files of types other than application/tar that we will therefore gunzip silently.
> "url ending in .gz or that is served as application/msword and content-encoding: > gzip" That should have an "and", not an "or". Booleans-R-us.
Comment on attachment 71996 [details] [diff] [review] Fully working patch r=law OK, we'll have to let that one situation slide, then. In the very unlikely event that this gets to be a "real problem"(*), then perhaps we can add code at the point we decide to hand off to the helper, and un-encode it then somehow. (*) A problem that some pointy-haired person puts nsbeta1+ on (no offense to any pointy-haired people :-).
Attachment #71996 - Flags: review+
here is just comment about well-known filetypes, I like the idea of using helper application to list these like the windows os/file types display.. if that helps, I believe this is something the registry tracks, but maybe would be useful as advanced > UI to list their association that mozilla has and its decoding how it is handled.
Summary: We should not decode when saving files with well-known extension → [FIX]We should not decode when saving files with well-known extension
Comment on attachment 71996 [details] [diff] [review] Fully working patch >Index: xpfe/communicator/resources/content/contentAreaUtils.js >+ const helperAppService = >+ Components.classes["@mozilla.org/uriloader/external-helper-app-service;1"].getService(Components.interfaces.nsIExternalHelperAppService); how about a newline to break up this long line? >Index: netwerk/protocol/http/public/nsIHttpChannel.idl >+ /** >+ * This attribute holds the MIME type corresponding to the outermost >+ * Content-Encoding on the channel. If you want the actual >+ * Content-Encoding value, use getResponseHeader("Content-Encoding") >+ */ >+ readonly attribute string contentEncoding; yuck.. i hadn't thought about multiple content encodings :( nsISimpleEnumerator?? also, could you move this up just below the charset attribute. thx! >Index: netwerk/protocol/http/src/nsHttpChannel.cpp >+ NS_WARNING("Unknown encoding type"); what about a "Content-Encoding: identity" response header?
Comment on attachment 71996 [details] [diff] [review] Fully working patch sr=darin bz will address the enumerator based solution for moz 1.0, but this patch would be good to have for moz 0.9.9
Attachment #71996 - Flags: superreview+
Comment on attachment 71996 [details] [diff] [review] Fully working patch a=asa (on behalf of drivers) for checkin to the 0.9.9 branch and the 1.0 trunk
Attachment #71996 - Flags: approval+
Comment on attachment 71996 [details] [diff] [review] Fully working patch Checked in on 0.9.9 branch. Will be making a similar patch for the trunk using nsISimpleEnumerator
Attachment #71996 - Flags: needs-work+
Attached patch Patch for trunk using nsISimpleEnumerator (obsolete) (deleted) — Splinter Review
Attachment #71996 - Attachment is obsolete: true
I need some background info to digest this... The Content-Encoding: header has the form Content-Encoding: mime-type1,mime-type2,mime-type3; or something like that? And that's why contentEncodings needs to return an enumerator (to examine each of these strings in turn). And why is it that the code in contentAreaUtils seems to just examine the first of these? Maybe a couple of comments would help those of us more networking-challenged code readers.
Attached patch Same patch with better comments (obsolete) (deleted) — Splinter Review
Added the following comment in contentAreaUtils.js: + // There may be content-encodings on the channel. Multiple content + // encodings are allowed, eg "Content-Encoding: gzip, uuencode". This + // header would mean that the content was first gzipped and then + // uuencoded. The encoding enumerator returns MIME types + // corresponding to each encoding starting from the end, so the first + // thing it returns corresponds to the outermost encoding. Does that help clear things up?
Attachment #73064 - Attachment is obsolete: true
Yep; I sort of figured that out on my own but I think it will be helpful for the next person. So now I wonder about the case you cite: gzip, uuencode. If we do SetApplyConversion(PR_FALSE) in that case, won't it block the uuencode-ing, also? Won't that be a problem?
It will. Unfortunately, there is no more fine-grained API to control conversions... Also, I have yet to run into a case where multiple encodings are actually used. I don't think we do the right thing with them elsewhere in the code at the moment...
Comment on attachment 73231 [details] [diff] [review] Same patch with better comments r=law This patch sounds good, then. I just wanted to make sure I passed the "code reviewer test" by spotting this flaw. Just don't make me look bad by having some misaligned curly braces in there.
Attachment #73231 - Flags: review+
One thing that's nice about emacs -- the curly brace issue is hard to trigger... :)
Comment on attachment 73231 [details] [diff] [review] Same patch with better comments nits: in the first sentence of your comment for nsIHttpChannel::contentEncodings, "one" should be "on" also, it should be documented that changing the content-encoding header on the channel while holding a reference to the content encodings enumerator has undefined behavior, or some such disclaimer like that ;-) no reason to prefix ContentEncodingsImpl w/ nsHttpChannel:: ..all of our compilers will find it. btw: how about calling this class nsContentEncodings instead of ContentEncodingsImpl. that would be more in line with necko convention. thx. i don't see anything setting mReady to false, which means that PrepareForNext would only be called once, so why not call it from the constructor? or should it be called more often than that? also, looks like you would NS_WARNING if identity was one of the content encodings, or am i missing something? have you tested this? a simple CGI script would do.
> i don't see anything setting mReady to false, That's a bug... I should be doing that. I thought I'd put that code in, but apparently I had not... That's needed to actually return more than one type. I have tested on "real-world" sites (all the bugs from comment 3). I'll make sure I write a script to output a header that will let me test the enumerator's handling of multiple encodings before I post another patch. > also, looks like you would NS_WARNING if identity was one of the content > encodings, or am i missing something? That's handled in PrepareForNext(), which just skips over "identity" encodings.
Changes: * Spelling in nsIHttpChannel.idl fixed, disclaimer added * ContentEncodingsImpl renamed to nsContentEncodings * set mReady to PR_FALSE at the end of GetNext() * remove "nsHttpChannel::" from uses of the class Not changes: * If I take "nsHttpChannel::" out of the function definitions gcc gets confused and fails to build. So I left them there. Testing: Tested on the following header: "Content-encoding: gzip, x-gzip, aa, compress, identity, deflate, identity" (with spaces exactly like that) The enumerator returned, in order: 'application/zip', 'application/x-compress', unknown encoding warning and exception, 'application/x-gzip', 'application/x-gzip'
Attachment #73231 - Attachment is obsolete: true
Attachment #73577 - Flags: superreview+
Comment on attachment 73577 [details] [diff] [review] Patch fixing Darin's nits (and not-so-nits) * Modifying the Content-Encoding header on the channel will case s/case/cause/ sr=darin
nice work btw :-)
*** Bug 130666 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → tever
Comment on attachment 73577 [details] [diff] [review] Patch fixing Darin's nits (and not-so-nits) a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73577 - Flags: approval+
checked in on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: