Closed Bug 128199 Opened 23 years ago Closed 23 years ago

[FIX]We should not decode when saving files with well-known extension

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla1.0

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

(
URL
)

Details

Attachments

(1 file, 4 obsolete files)

work-in-progress patch 23 years ago Boris Zbarsky [:bzbarsky] (deleted), patch		Details \| Diff \| Splinter Review
Fully working patch 23 years ago Boris Zbarsky [:bzbarsky] (deleted), patch	law : review+ darin.moz : superreview+ asa : approval+	Details \| Diff \| Splinter Review
Patch for trunk using nsISimpleEnumerator 23 years ago Boris Zbarsky [:bzbarsky] (deleted), patch		Details \| Diff \| Splinter Review
Same patch with better comments 23 years ago Boris Zbarsky [:bzbarsky] (deleted), patch	law : review+	Details \| Diff \| Splinter Review
Patch fixing Darin's nits (and not-so-nits) 23 years ago Boris Zbarsky [:bzbarsky] (deleted), patch	darin.moz : superreview+ asa : approval+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Description

•

23 years ago

We now "ban" decoding of files on save if the files have a type like application/tar. We should also consider banning decoding of files on save when the url has the "gz" or "zip" extension. This is usually a good indicator that the file is in fact not to be decoded...

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Priority: -- → P1

Target Milestone: --- → mozilla1.0

Boris Zbarsky [:bzbarsky]

Assignee

Comment 1

•

23 years ago

URL demonstrates the problem. Go there and do a "save link as" on the link described in the second comment.

URL: http://bugzilla.ximian.com/show_bug.c...

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

23 years ago

Attached patch work-in-progress patch (obsolete) (deleted) — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Comment 3

•

23 years ago

Attached patch Fully working patch (obsolete) (deleted) — Details — Splinter Review

This does what it can to make sure that extensions are right in the default suggested filename and does not decode .zip or .gz files. Does not regress any of bugs 121001, 35956, 120033, 116445, 128177

Attachment #71844 - Attachment is obsolete: true

Darin Fisher

Comment 4

•

23 years ago

bz: great work! i'll try to take a look tomorrow.

Bill Law

Comment 5

•

23 years ago

Please indulge me and explain what's going on here it terms a layman such as myself can understand. My take on this is that: - There's a link to some content, which ultimately has an arbitrary content type. - The server indicates that the file has this "ultimate" content type. - The server stores this content in some compressed format, a fact which they advertise via the Content-Encoding http header. - Normally, that header tells the user agent to un-encode the data to get the ultimate content type. - We will now infer from the file extension that this content should be stored on the client's computer un-encoded, so we will ignore the Content-Encoding header. That sounds plausible. The alternative (what we do currently), is to save the file with the wrong extension, essentially (e.g., in this case save text/xml with extension .gz). My one concern (and it may not really be a problem, I just couldn't figure out from the patch and I'm too busy/lazy to figure it out) is this: What if the file is being downloaded to be opened with a helper application that was selected based on the actual content type. For example, imagine application/x- msword instead of text/xml in the case cited). Would this patch result in us passing encoded data to MS Word?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

23 years ago

Explanation: 1) There's a link to some content, which ultimately has an arbitrary content type. 2) The server indicates that the file has this "ultimate" content type. 3) The server stores this content in some compressed format, a fact which they advertise via the Content-Encoding http header. 4) "Normally, that header tells the user agent to un-encode the data to get the ultimate content type." This is where problems start. That's not actually what the header says. The header says that the data should be decoded for viewing but should be kept encoded otherwise because the encoding is an inherent part of the data. Transfer-Encoding is the header that says "decode this once you get it", but no browsers or servers support it. So servers use content-encoding to mean either content-encoding or transfer-encoding and hence we have to guess which they meant. Proceeding: 5) We will now infer from the file extension that this content should be stored on the client's computer un-encoded, so we will ignore the Content-Encoding header. Correct. Your concern is a valid one.... Indeed, if someone has a url ending in .gz or that is served as application/msword and content-encoding: gzip and the user selects "open with application" then we will indeed feed gzipped data to Word. To be truthful, I see no way to solve this problem, since we have to decide whether to decode _before_ we know whether we're using a helper. I could fix that for the "never ask" case, but the general problem will remain until stop streaming the data to file before we know what we're doing with it. The only type I can think of that people would send off to helpers that has a .gz or .zip extension with any sort of frequency is postscript.... So this should be a fairly rare problem; certainly much rarer than .gz files of types other than application/tar that we will therefore gunzip silently.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

23 years ago

> "url ending in .gz or that is served as application/msword and content-encoding: > gzip" That should have an "and", not an "or". Booleans-R-us.

Bill Law

Comment 8

•

23 years ago

Comment on attachment 71996 [details] [diff] [review] Fully working patch r=law OK, we'll have to let that one situation slide, then. In the very unlikely event that this gets to be a "real problem"(*), then perhaps we can add code at the point we decide to hand off to the helper, and un-encode it then somehow. (*) A problem that some pointy-haired person puts nsbeta1+ on (no offense to any pointy-haired people :-).

Attachment #71996 - Flags: review+

[not reading bugmail]

Comment 9

•

23 years ago

here is just comment about well-known filetypes, I like the idea of using helper application to list these like the windows os/file types display.. if that helps, I believe this is something the registry tracks, but maybe would be useful as advanced > UI to list their association that mozilla has and its decoding how it is handled.

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

23 years ago

Summary: We should not decode when saving files with well-known extension → [FIX]We should not decode when saving files with well-known extension

Darin Fisher

Comment 10

•

23 years ago

Comment on attachment 71996 [details] [diff] [review] Fully working patch >Index: xpfe/communicator/resources/content/contentAreaUtils.js >+ const helperAppService = >+ Components.classes["@mozilla.org/uriloader/external-helper-app-service;1"].getService(Components.interfaces.nsIExternalHelperAppService); how about a newline to break up this long line? >Index: netwerk/protocol/http/public/nsIHttpChannel.idl >+ /** >+ * This attribute holds the MIME type corresponding to the outermost >+ * Content-Encoding on the channel. If you want the actual >+ * Content-Encoding value, use getResponseHeader("Content-Encoding") >+ */ >+ readonly attribute string contentEncoding; yuck.. i hadn't thought about multiple content encodings :( nsISimpleEnumerator?? also, could you move this up just below the charset attribute. thx! >Index: netwerk/protocol/http/src/nsHttpChannel.cpp >+ NS_WARNING("Unknown encoding type"); what about a "Content-Encoding: identity" response header?

Darin Fisher

Comment 11

•

23 years ago

Comment on attachment 71996 [details] [diff] [review] Fully working patch sr=darin bz will address the enumerator based solution for moz 1.0, but this patch would be good to have for moz 0.9.9

Attachment #71996 - Flags: superreview+

Asa Dotzler [:asa]

Comment 12

•

23 years ago

Comment on attachment 71996 [details] [diff] [review] Fully working patch a=asa (on behalf of drivers) for checkin to the 0.9.9 branch and the 1.0 trunk

Attachment #71996 - Flags: approval+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

23 years ago

Comment on attachment 71996 [details] [diff] [review] Fully working patch Checked in on 0.9.9 branch. Will be making a similar patch for the trunk using nsISimpleEnumerator

Attachment #71996 - Flags: needs-work+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 14

•

23 years ago

Attached patch Patch for trunk using nsISimpleEnumerator (obsolete) (deleted) — Details — Splinter Review

Attachment #71996 - Attachment is obsolete: true

Bill Law

Comment 15

•

23 years ago

I need some background info to digest this... The Content-Encoding: header has the form Content-Encoding: mime-type1,mime-type2,mime-type3; or something like that? And that's why contentEncodings needs to return an enumerator (to examine each of these strings in turn). And why is it that the code in contentAreaUtils seems to just examine the first of these? Maybe a couple of comments would help those of us more networking-challenged code readers.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 16

•

23 years ago

Attached patch Same patch with better comments (obsolete) (deleted) — Details — Splinter Review

Added the following comment in contentAreaUtils.js: + // There may be content-encodings on the channel. Multiple content + // encodings are allowed, eg "Content-Encoding: gzip, uuencode". This + // header would mean that the content was first gzipped and then + // uuencoded. The encoding enumerator returns MIME types + // corresponding to each encoding starting from the end, so the first + // thing it returns corresponds to the outermost encoding. Does that help clear things up?

Attachment #73064 - Attachment is obsolete: true

Bill Law

Comment 17

•

23 years ago

Yep; I sort of figured that out on my own but I think it will be helpful for the next person. So now I wonder about the case you cite: gzip, uuencode. If we do SetApplyConversion(PR_FALSE) in that case, won't it block the uuencode-ing, also? Won't that be a problem?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 18

•

23 years ago

It will. Unfortunately, there is no more fine-grained API to control conversions... Also, I have yet to run into a case where multiple encodings are actually used. I don't think we do the right thing with them elsewhere in the code at the moment...

Bill Law

Comment 19

•

23 years ago

Comment on attachment 73231 [details] [diff] [review] Same patch with better comments r=law This patch sounds good, then. I just wanted to make sure I passed the "code reviewer test" by spotting this flaw. Just don't make me look bad by having some misaligned curly braces in there.

Attachment #73231 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 20

•

23 years ago

One thing that's nice about emacs -- the curly brace issue is hard to trigger... :)

Darin Fisher

Comment 21

•

23 years ago

Comment on attachment 73231 [details] [diff] [review] Same patch with better comments nits: in the first sentence of your comment for nsIHttpChannel::contentEncodings, "one" should be "on" also, it should be documented that changing the content-encoding header on the channel while holding a reference to the content encodings enumerator has undefined behavior, or some such disclaimer like that ;-) no reason to prefix ContentEncodingsImpl w/ nsHttpChannel:: ..all of our compilers will find it. btw: how about calling this class nsContentEncodings instead of ContentEncodingsImpl. that would be more in line with necko convention. thx. i don't see anything setting mReady to false, which means that PrepareForNext would only be called once, so why not call it from the constructor? or should it be called more often than that? also, looks like you would NS_WARNING if identity was one of the content encodings, or am i missing something? have you tested this? a simple CGI script would do.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 22

•

23 years ago

> i don't see anything setting mReady to false, That's a bug... I should be doing that. I thought I'd put that code in, but apparently I had not... That's needed to actually return more than one type. I have tested on "real-world" sites (all the bugs from comment 3). I'll make sure I write a script to output a header that will let me test the enumerator's handling of multiple encodings before I post another patch. > also, looks like you would NS_WARNING if identity was one of the content > encodings, or am i missing something? That's handled in PrepareForNext(), which just skips over "identity" encodings.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 23

•

23 years ago

Attached patch Patch fixing Darin's nits (and not-so-nits) (deleted) — Details — Splinter Review

Changes: * Spelling in nsIHttpChannel.idl fixed, disclaimer added * ContentEncodingsImpl renamed to nsContentEncodings * set mReady to PR_FALSE at the end of GetNext() * remove "nsHttpChannel::" from uses of the class Not changes: * If I take "nsHttpChannel::" out of the function definitions gcc gets confused and fails to build. So I left them there. Testing: Tested on the following header: "Content-encoding: gzip, x-gzip, aa, compress, identity, deflate, identity" (with spaces exactly like that) The enumerator returned, in order: 'application/zip', 'application/x-compress', unknown encoding warning and exception, 'application/x-gzip', 'application/x-gzip'

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

23 years ago

Attachment #73231 - Attachment is obsolete: true

Darin Fisher

Updated

•

23 years ago

Attachment #73577 - Flags: superreview+

Darin Fisher

Comment 24

•

23 years ago

Comment on attachment 73577 [details] [diff] [review] Patch fixing Darin's nits (and not-so-nits) * Modifying the Content-Encoding header on the channel will case s/case/cause/ sr=darin

Darin Fisher

Comment 25

•

23 years ago

nice work btw :-)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 26

•

23 years ago

*** Bug 130666 has been marked as a duplicate of this bug. ***

sairuh (rarely reading bugmail)

Updated

•

23 years ago

QA Contact: sairuh → tever

Asa Dotzler [:asa]

Comment 27

•

23 years ago

Comment on attachment 73577 [details] [diff] [review] Patch fixing Darin's nits (and not-so-nits) a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #73577 - Flags: approval+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 28

•

23 years ago

checked in on trunk

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Core → Core Graveyard

You need to log in before you can comment on or make changes to this bug.