Closed
Bug 128199
Opened 23 years ago
Closed 23 years ago
[FIX]We should not decode when saving files with well-known extension
Categories
(Core Graveyard :: File Handling, defect, P1)
Core Graveyard
File Handling
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.0
People
(Reporter: bzbarsky, Assigned: bzbarsky)
References
()
Details
Attachments
(1 file, 4 obsolete files)
(deleted), patch
darin.moz: superreview+
asa: approval+
Details | Diff | Splinter Review
We now "ban" decoding of files on save if the files have a type like
application/tar.
We should also consider banning decoding of files on save when the url has the
"gz" or "zip" extension. This is usually a good indicator that the file is in
fact not to be decoded...
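The proposed heuristic can be sketched as follows. This is an illustrative Python sketch, not the actual patch; the function name `should_decode_on_save` and the extension set are hypothetical.

```python
# Illustrative sketch (not the actual patch): skip decoding on save when the
# URL extension already implies a compressed container.

# Hypothetical set of extensions that signal "the encoding is the file".
ENCODED_EXTENSIONS = {"gz", "zip"}

def should_decode_on_save(url, content_encoding):
    """Return True if the body should be decoded before saving to disk."""
    if not content_encoding:
        return False  # nothing was encoded, so there is nothing to decode
    ext = url.rsplit(".", 1)[-1].lower() if "." in url else ""
    # A well-known compressed extension suggests the server used
    # Content-Encoding to describe the file itself, so keep it encoded.
    return ext not in ENCODED_EXTENSIONS
```

So a `text/xml` response with `Content-Encoding: gzip` from a URL ending in `.gz` would be saved as-is, while the same response from a `.xml` URL would be decoded first.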
Assignee
Updated•23 years ago
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Assignee
Comment 1•23 years ago
URL demonstrates the problem. Go there and do a "save link as" on the link
described in the second comment.
Assignee
Comment 2•23 years ago
Assignee
Comment 3•23 years ago
This does what it can to make sure that extensions are right in the default
suggested filename and does not decode .zip or .gz files. Does not regress any
of bugs 121001, 35956, 120033, 116445, 128177
Attachment #71844 -
Attachment is obsolete: true
Comment 4•23 years ago
bz: great work! i'll try to take a look tomorrow.
Comment 5•23 years ago
Please indulge me and explain what's going on here in terms a layman such as
myself can understand.
My take on this is that:
- There's a link to some content, which ultimately has an arbitrary content
type.
- The server indicates that the file has this "ultimate" content type.
- The server stores this content in some compressed format, a fact which they
advertise via the Content-Encoding http header.
- Normally, that header tells the user agent to un-encode the data to get the
ultimate content type.
- We will now infer from the file extension that this content should be stored
on the client's computer un-encoded, so we will ignore the Content-Encoding
header.
That sounds plausible. The alternative (what we do currently), is to save the
file with the wrong extension, essentially (e.g., in this case save text/xml
with extension .gz).
My one concern (and it may not really be a problem, I just couldn't figure out
from the patch and I'm too busy/lazy to figure it out) is this: What if the
file is being downloaded to be opened with a helper application that was
selected based on the actual content type (for example, imagine
application/x-msword instead of text/xml in the case cited).
Would this patch result in us passing encoded data to MS Word?
Assignee
Comment 6•23 years ago
Explanation:
1) There's a link to some content, which ultimately has an arbitrary content
type.
2) The server indicates that the file has this "ultimate" content type.
3) The server stores this content in some compressed format, a fact which they
advertise via the Content-Encoding http header.
4) "Normally, that header tells the user agent to un-encode the data to get the
ultimate content type."
This is where problems start. That's not actually what the header says. The
header says that the data should be decoded for viewing but should be kept
encoded otherwise because the encoding is an inherent part of the data.
Transfer-Encoding is the header that says "decode this once you get it", but no
browsers or servers support it. So servers use content-encoding to mean either
content-encoding or transfer-encoding and hence we have to guess which they meant.
Proceeding:
5) We will now infer from the file extension that this content should be stored
on the client's computer un-encoded, so we will ignore the Content-Encoding
header.
Correct.
Your concern is a valid one.... Indeed, if someone has a url ending in .gz or
that is served as application/msword and content-encoding: gzip and the user
selects "open with application" then we will indeed feed gzipped data to Word.
To be truthful, I see no way to solve this problem, since we have to decide
whether to decode _before_ we know whether we're using a helper. I could fix
that for the "never ask" case, but the general problem will remain until we stop
streaming the data to file before we know what we're doing with it.
The only type I can think of that people would send off to helpers that has a
.gz or .zip extension with any sort of frequency is postscript.... So this
should be a fairly rare problem; certainly much rarer than .gz files of types
other than application/tar that we will therefore gunzip silently.
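The guess described in points 1-5 can be sketched like this; a minimal illustration under an assumed name (`treat_encoding_as_content`), not code from the patch.

```python
# Illustrative sketch of the guess described above: decide whether the
# server's Content-Encoding was really meant as a transfer encoding
# (decode before saving) or as part of the content (keep the bytes as sent).

def treat_encoding_as_content(content_type, url):
    """Return True if Content-Encoding should be kept when saving."""
    # Types that inherently describe an encoded archive.
    archive_types = {"application/tar", "application/x-tar"}
    if content_type in archive_types:
        return True   # the type already says "this is an encoded archive"
    if url.lower().endswith((".gz", ".zip")):
        return True   # the extension says the same thing
    return False      # assume transport-only encoding: decode before saving
```

Either signal alone is enough to keep the encoding, which is exactly why the helper-app case above is unsolvable at this point: the decision is made before we know whether a helper will consume the data.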
Assignee
Comment 7•23 years ago
> "url ending in .gz or that is served as application/msword and
> content-encoding: gzip"
That should have an "and", not an "or". Booleans-R-us.
Comment 8•23 years ago
Comment on attachment 71996 [details] [diff] [review]
Fully working patch
r=law
OK, we'll have to let that one situation slide, then. In the very unlikely
event that this gets to be a "real problem"(*), then perhaps we can add code at
the point we decide to hand off to the helper, and un-encode it then somehow.
(*) A problem that some pointy-haired person puts nsbeta1+ on (no offense to
any pointy-haired people :-).
Attachment #71996 -
Flags: review+
Comment 9•23 years ago
Here is just a comment about well-known filetypes: I like the idea of using the
helper-application preferences to list these, like the Windows OS file-types
display. I believe this is something the registry tracks; it might also be
useful as an advanced UI that lists the associations Mozilla has and how their
decoding is handled.
Assignee
Updated•23 years ago
Summary: We should not decode when saving files with well-known extension → [FIX]We should not decode when saving files with well-known extension
Comment 10•23 years ago
Comment on attachment 71996 [details] [diff] [review]
Fully working patch
>Index: xpfe/communicator/resources/content/contentAreaUtils.js
>+ const helperAppService =
>+ Components.classes["@mozilla.org/uriloader/external-helper-app-service;1"].getService(Components.interfaces.nsIExternalHelperAppService);
how about a newline to break up this long line?
>Index: netwerk/protocol/http/public/nsIHttpChannel.idl
>+ /**
>+ * This attribute holds the MIME type corresponding to the outermost
>+ * Content-Encoding on the channel. If you want the actual
>+ * Content-Encoding value, use getResponseHeader("Content-Encoding")
>+ */
>+ readonly attribute string contentEncoding;
yuck.. i hadn't thought about multiple content encodings :(
nsISimpleEnumerator??
also, could you move this up just below the charset attribute. thx!
>Index: netwerk/protocol/http/src/nsHttpChannel.cpp
>+ NS_WARNING("Unknown encoding type");
what about a "Content-Encoding: identity" response header?
Comment 11•23 years ago
Comment on attachment 71996 [details] [diff] [review]
Fully working patch
sr=darin
bz will address the enumerator based solution for moz 1.0, but this patch would
be good to have for moz 0.9.9
Attachment #71996 -
Flags: superreview+
Comment 12•23 years ago
Comment on attachment 71996 [details] [diff] [review]
Fully working patch
a=asa (on behalf of drivers) for checkin to the 0.9.9 branch and the 1.0 trunk
Attachment #71996 -
Flags: approval+
Assignee
Comment 13•23 years ago
Comment on attachment 71996 [details] [diff] [review]
Fully working patch
Checked in on 0.9.9 branch. Will be making a similar patch for the trunk using
nsISimpleEnumerator
Attachment #71996 -
Flags: needs-work+
Assignee
Comment 14•23 years ago
Attachment #71996 -
Attachment is obsolete: true
Comment 15•23 years ago
I need some background info to digest this...
The Content-Encoding: header has the form
Content-Encoding: mime-type1,mime-type2,mime-type3;
or something like that? And that's why contentEncodings needs to return an
enumerator (to examine each of these strings in turn).
And why is it that the code in contentAreaUtils seems to just examine the first
of these?
Maybe a couple of comments would help those of us more networking-challenged
code readers.
Assignee
Comment 16•23 years ago
Added the following comment in contentAreaUtils.js:
+ // There may be content-encodings on the channel. Multiple content
+ // encodings are allowed, eg "Content-Encoding: gzip, uuencode". This
+ // header would mean that the content was first gzipped and then
+ // uuencoded. The encoding enumerator returns MIME types
+ // corresponding to each encoding starting from the end, so the first
+ // thing it returns corresponds to the outermost encoding.
Does that help clear things up?
Attachment #73064 -
Attachment is obsolete: true
Comment 17•23 years ago
Yep; I sort of figured that out on my own but I think it will be helpful for the
next person.
So now I wonder about the case you cite: gzip, uuencode. If we do
SetApplyConversion(PR_FALSE) in that case, won't it block the uuencode-ing,
also? Won't that be a problem?
Assignee
Comment 18•23 years ago
It will. Unfortunately, there is no more fine-grained API to control conversions...
Also, I have yet to run into a case where multiple encodings are actually used.
I don't think we do the right thing with them elsewhere in the code at the
moment...
Comment 19•23 years ago
Comment on attachment 73231 [details] [diff] [review]
Same patch with better comments
r=law
This patch sounds good, then. I just wanted to make sure I passed the "code
reviewer test" by spotting this flaw. Just don't make me look bad by having
some misaligned curly braces in there.
Attachment #73231 -
Flags: review+
Assignee
Comment 20•23 years ago
One thing that's nice about emacs -- the curly brace issue is hard to trigger...
:)
Comment 21•23 years ago
Comment on attachment 73231 [details] [diff] [review]
Same patch with better comments
nits:
in the first sentence of your comment for nsIHttpChannel::contentEncodings,
"one" should be "on"
also, it should be documented that changing the content-encoding header on the
channel while holding a reference to the content encodings enumerator has
undefined behavior, or some such disclaimer like that ;-)
no reason to prefix ContentEncodingsImpl w/ nsHttpChannel:: ..all of our
compilers will find it.
btw: how about calling this class nsContentEncodings instead of
ContentEncodingsImpl. that would be more in line with necko convention. thx.
i don't see anything setting mReady to false, which means that PrepareForNext
would only be called once, so why not call it from the constructor? or should
it be called more often than that?
also, looks like you would NS_WARNING if identity was one of the content
encodings, or am i missing something?
have you tested this? a simple CGI script would do.
Assignee
Comment 22•23 years ago
> i don't see anything setting mReady to false,
That's a bug... I should be doing that. I thought I'd put that code in, but
apparently I had not... That's needed to actually return more than one type.
I have tested on "real-world" sites (all the bugs from comment 3). I'll make
sure I write a script to output a header that will let me test the enumerator's
handling of multiple encodings before I post another patch.
> also, looks like you would NS_WARNING if identity was one of the content
> encodings, or am i missing something?
That's handled in PrepareForNext(), which just skips over "identity" encodings.
Assignee
Comment 23•23 years ago
Changes:
* Spelling in nsIHttpChannel.idl fixed, disclaimer added
* ContentEncodingsImpl renamed to nsContentEncodings
* set mReady to PR_FALSE at the end of GetNext()
* remove "nsHttpChannel::" from uses of the class
Not changes:
* If I take "nsHttpChannel::" out of the function definitions gcc gets
confused and fails to build. So I left them there.
Testing:
Tested on the following header:
"Content-encoding: gzip, x-gzip, aa, compress, identity, deflate,
identity"
(with spaces exactly like that)
The enumerator returned, in order: 'application/zip',
'application/x-compress', unknown encoding warning and exception,
'application/x-gzip', 'application/x-gzip'
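The reported behavior can be reproduced with a sketch like the following. The token-to-MIME mapping is inferred from the test results above, not copied from nsHttpChannel.cpp, and unlike the real enumerator (which warns and keeps going) this sketch stops at the first unknown encoding.

```python
# Sketch reproducing the enumerator behavior reported above.
# Mapping inferred from the reported output, not taken from the patch.
ENCODING_TO_MIME = {
    "gzip": "application/x-gzip",
    "x-gzip": "application/x-gzip",
    "compress": "application/x-compress",
    "x-compress": "application/x-compress",
    "deflate": "application/zip",
}

def enumerate_encodings(header_value):
    """Yield MIME types outermost-first; unknown encodings raise."""
    for token in reversed([t.strip().lower() for t in header_value.split(",")]):
        if token == "identity":
            continue  # "identity" means no encoding, so skip it silently
        if token not in ENCODING_TO_MIME:
            raise ValueError("Unknown encoding type: " + token)
        yield ENCODING_TO_MIME[token]
```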
Assignee
Updated•23 years ago
Attachment #73231 -
Attachment is obsolete: true
Updated•23 years ago
Attachment #73577 -
Flags: superreview+
Comment 24•23 years ago
Comment on attachment 73577 [details] [diff] [review]
Patch fixing Darin's nits (and not-so-nits)
* Modifying the Content-Encoding header on the channel will case
s/case/cause/
sr=darin
Comment 25•23 years ago
nice work btw :-)
Assignee
Comment 26•23 years ago
*** Bug 130666 has been marked as a duplicate of this bug. ***
Updated•23 years ago
QA Contact: sairuh → tever
Comment 27•23 years ago
Comment on attachment 73577 [details] [diff] [review]
Patch fixing Darin's nits (and not-so-nits)
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73577 -
Flags: approval+
Assignee
Comment 28•23 years ago
checked in on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•8 years ago
Product: Core → Core Graveyard