Closed Bug 610054 Opened 14 years ago Closed 13 years ago

Clean up RFC 2231 MIME param handling, and add support for RFC 5987 subset

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: jduell.mcbugs, Assigned: jduell.mcbugs)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 5 obsolete files)

This patch fixes a number of bugs in our MIME parameter parsing, especially related to continuations. See http://greenbytes.de/tech/tc2231/#attfncontlz Also, RFC 5987 removes continuations from HTTP (but they remain valid for RFC 2231 apps like email): http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03 so I've implemented a enum that changes the parser to ignore them. I'm not sure of the best way to actually hook it up to the browser, though, so for now it's tested in xpcshell (via a GetParameterRFC5987 function) but otherwise unused. Note that I've also kept the RFC 2047 decoding in the new function: do we still want that? Any advice on how (and/or whether) this code should get hooked in would be welcome. There are still other parsing errors from http://greenbytes.de/tech/tc2231/ that are not fixed by this patch--those can be dealt with in a followup bug. This patch is on top of my one for bug 588781.
Attachment #488587 - Flags: review?(bzbarsky)
Assignee: nobody → jduell.mcbugs
Jason, is this something we're trying to do for 2.0? Just trying to prioritize reviews.... As for RFC 2047, the question is whether that's still being used, right? I assume that it's not compatible with what RFC 5987 defines?
(In reply to comment #0) > Note that I've also kept the RFC 2047 decoding in the new function: > do we still want that? Any advice on how (and/or whether) this code should > get hooked in would be welcome. From my reading of the specs, RFC 2047 encoding does not apply to header field parameters, see <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.1>. In HTTP C-D it was only implemented in Mozilla, and also made it into Chrome because of this code being written by the same author. It's not supported in other UAs (see <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>). So I doubt a lot it's used by anybody, and I'd recommend to get rid of it. For MIME, the specification argument is the same, but I have no data on who's implementing this, so the risk of removing this might be higher.
(In reply to comment #1) > As for RFC 2047, the question is whether that's still being used, right? I > assume that it's not compatible with what RFC 5987 defines? Clarification: the situation hasn't changed with 5987. It never was applicable to *parameter* encoding, according to the specs.
Cc-ing a couple of Thunderbird developers so they're aware of this. > is this something we're trying to do for 2.0? Nope--I just got on a roll looking over the spec and figured I'd code up the bugfixes I ran into while I had it paged into my head. > As for RFC 2047, the question is whether that's still being used, right? Right. It was never spec-supported, but was apparently being used circa 2003. From nsIMIMEHeaderParam.idl: * Note that a lot of MUAs and HTTP servers put RFC 2047-encoded parameters * in mail headers and HTTP headers. Unfortunately, this includes Mozilla * as of 2003-05-30. Even more standard-ignorant MUAs, web servers and * application servers put 'raw 8bit characters'. This will try to cope * with all these cases as gracefully as possible. Additionally, it * returns the language tag if the parameter is encoded per RFC 2231 and * includes lang. We should look at whether this is still the case (perhaps thunderbird is still using 2047?), and nuke the 2047 (and raw 8 bits?) handling if possible. Re: plugging in this code: Julian points out that RFC 5987 only applies to HTTP header parameters, so 2231 mode should still be used when Content-Disposition appears in the *payload* of an HTTP message, such as for multipart/whatever." And we'd obviously still need it for Thunderbird. So perhaps the answer is to have two functions in the IDL, as I've done, and figure out where we want to convert calls into the 5987 variant.
Status: NEW → ASSIGNED
(In reply to comment #4) > > As for RFC 2047, the question is whether that's still being used, right? > > Right. It was never spec-supported, but was apparently being used circa 2003. > From nsIMIMEHeaderParam.idl: > > * Note that a lot of MUAs and HTTP servers put RFC 2047-encoded parameters > * in mail headers and HTTP headers. Unfortunately, this includes Mozilla > * as of 2003-05-30. Even more standard-ignorant MUAs, web servers and > * application servers put 'raw 8bit characters'. This will try to cope > * with all these cases as gracefully as possible. Additionally, it > * returns the language tag if the parameter is encoded per RFC 2231 and > * includes lang. As far as I can tell, historically *only* Mozilla/FF supported RFC 2047 in HTTP header params (incorrectly). See <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>. Chrome just inherited this years later, no other UA is doing that.
Jason, I've got a new test case: <http://greenbytes.de/tech/tc2231/#emptydisposition> where FF is the only UA to fail. Do we need a separate bug? (I currently don't have time to test with the proposed change).
Thunderbird's behavior does not matter for this issue. For emission : The last time I checked (a few years ago), Thunderbird by default used RFC 2231 for C-D, but it breaks the interoperability because MS Outlook and Outlook Express did not understand RFC 2231 (at least until 2 yrs ago). So, there's a config param (about:config) by which a user can select to use RFC 2047 for C-D header. The latest version of MS Outlook / Outlook Express (the latter is free, but sometimes, it's better in terms of I18N than non-free MS Outlook) might support RFC 2231, but I haven't tried them recently. For receiving: TB has to support RFC2047 in C-D because there are MUAs and web mail services that produce RFC 2047 for C-D in email messages. As for Firefox, note that multiple Google products (including gmail) emits RFC 2047. So, it can't be simply removed overnight. We have to find out what Yahoo, Hotmail/Live and various country-specific web mail services do in HTTP before deciding what to do. I haven't followed what TB does this days
See also http://blog.gmane.org/gmane.org.w3c.internationalization.general/month=20031101 It's a 7-yr old mail thread, but some people voiced the support for using RFC 2047 in HTTP C-D header. So, it's likely that there are some servers (other than Google's) that still emit RFC 2047 in HTTP C-D header.
The previous patch didn't apply anymore; I *believe* I have resolved the merge conflict properly, but Jason should double-check.
(In reply to comment #0) > so I've implemented a enum that changes the parser to ignore them. I'm not > sure of the best way to actually hook it up to the browser, though, so for now > it's tested in xpcshell (via a GetParameterRFC5987 function) but otherwise > unused. Note that I've also kept the RFC 2047 decoding in the new function: > do we still want that? Any advice on how (and/or whether) this code should > get hooked in would be welcome. I tested this with the revised patch I just attached, and I can confirm that FF will still process continuations, and the fixes Jason added indeed work as described.
Outlook Express/Windows Mail/Windows Live Mail/Office Outlook have never been supported RFC2231 encoding. We can't remove RFC2047 support at least for mail.
(In reply to comment #11) > Outlook Express/Windows Mail/Windows Live Mail/Office Outlook have never been > supported RFC2231 encoding. We can't remove RFC2047 support at least for mail. And the change as proposed doesn't do that; neither for email nor for HTTP. That being said, I think we *should* disable it for HTTP (in the nightly channel) in order to find out what sites "break". Those sites should be fixed to use something which is properly specified and interoperable. (yes, looking at you, GMail!)
I don't think seriously breaking gmail in nightlies is an option, fwiw. We'll need to get them to change first. I'm sorry about the slow review here; I need to do a bunch more reviewing....
(In reply to comment #13) > I don't think seriously breaking gmail in nightlies is an option, fwiw. We'll > need to get them to change first. > ... Just to be clear: this patch doesn't break GMail, disabling the RFC 2047 encoding just happens to become more easy once the path is in.
Yes, that was clear from comment 12. I just think that the followup plan in that comment is not realistic as presented. ;)
Can we get this committed before FF6 Aurora branches? (I'd like to do some more work on C-D, but would like unnecessary conflicts because of other pending changes)
Attachment #526485 - Flags: review?(bzbarsky)
Attachment #488587 - Attachment is obsolete: true
Attachment #488587 - Flags: review?(bzbarsky)
The logic in this patch isn't currently hooked up to anything, so we could theoretically land it. Perhaps we should first add a way to hook it up to HTTP header parsing and provide an about:config pref to turn it on, and see what happens (first manually, and if things look promising, then try it on by default in the nightlies). bz: we call GetParameter in a bunch of places: do have a idea of which we'd want to switch to the getParameter5987() function? nsHTMLMediaElement::GetCanPlay nsFeedSniffer::HasAttachmentDisposition() nsContentTypeParser::GetType() nsScriptLoader::ProcessScriptElement() XULContentSinkImpl::OpenScript() nsDocumentOpenInfo::DispatchContent() nsExternalHelperAppService::GetFilenameAndExtensionFromChannel()
The support for the RFC 5987 variant ("param*") should only be activated for parameters that are defined to support it - which hopefully is the case for all fields that require I18N. For now, this should be "filename" in Content-Disposition and "title" in Link (which currently doesn't do it -- see http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987 -- but it's on my list :-).
We can experiment with switching C-D filename parsing fairly easily. I suggest we wait until bug 589292 lands first, as it touches the same code.
Comment on attachment 526485 [details] [diff] [review] updates Jason's patch to apply to the current version Please update the "decode RFC 2231 ..." comments in DoGetParameter. Should DoParameterInternal be DoGetParameterInternal? Either way, I guess. Using a char for nextContinuation seems like it will fail if someone sends a header with 256 continuations. Why not use an int with -1 as the failure signal? caseCStart is really caseC_or_DStart or something, right? Is it worth naming accordingly? I don't understand the handling of nextContinuation == '!'. How does that cause us to skip future continuations, in cases of >9 continuations? The rest looks fine, but the nextContinuation thing needs fixing. Sorry about the terrible review lag here. :(
Attachment #526485 - Flags: review?(bzbarsky) → review-
Attached patch proposed patch brought up-to-date with trunk (obsolete) (deleted) — Splinter Review
(this doesn't address any of Boris' comments though)
Attachment #526485 - Attachment is obsolete: true
Attached patch proposed patch brought up-to-date with trunk (obsolete) (deleted) — Splinter Review
this should address some of Boris' comments; see separate comment in bugzilla
Attachment #559771 - Attachment is obsolete: true
Attachment #559792 - Flags: review?(bzbarsky)
(In reply to Boris Zbarsky (:bz) from comment #20) > Comment on attachment 526485 [details] [diff] [review] > updates Jason's patch to apply to the current version > > Please update the "decode RFC 2231 ..." comments in DoGetParameter. I augmented one of the comments in .h. Note sure about the rest; do you want to *replace* mentions of 2231 by something else. > Should DoParameterInternal be DoGetParameterInternal? Either way, I guess. I don't feel strongly about this; except that we don't want the names to get too long, right? > Using a char for nextContinuation seems like it will fail if someone sends a > header with 256 continuations. Why not use an int with -1 as the failure > signal? Done. This also means that gaps are now detected past segment 9 (test cases enhanced). > caseCStart is really caseC_or_DStart or something, right? Is it worth > naming accordingly? Not sure. > I don't understand the handling of nextContinuation == '!'. How does that > cause us to skip future continuations, in cases of >9 continuations? Gone by switching to int. > The rest looks fine, but the nextContinuation thing needs fixing. Sorry > about the terrible review lag here. :(
Blocks: 686429
OS: Linux → All
Hardware: x86_64 → All
Attached patch proposed patch brought up-to-date with trunk (obsolete) (deleted) — Splinter Review
Attachment #559792 - Attachment is obsolete: true
Attachment #564915 - Flags: review?(bzbarsky)
Attachment #559792 - Flags: review?(bzbarsky)
Comment on attachment 564915 [details] [diff] [review] proposed patch brought up-to-date with trunk > do you want to *replace* mentions of 2231 by something else. I meant the comments in the .cpp. Like this one: 89 // get parameter (decode RFC 2231 if it's RFC 2231-encoded and 90 // return charset.) which are not quite true anymore; the behavior now depends on the new flag this patch adds. > Is it worth naming accordingly? I think it is. Let's call it caseCorDStart. Now for the patch: >+ // no trailing zeros allowed except for ... position 0 That should be "leading zeros", not trailing ones, right? r=me with those nits picked, and another apology for lag. :(
Attachment #564915 - Flags: review?(bzbarsky) → review+
Target Milestone: --- → mozilla10
Attached patch proposed patch (deleted) — Splinter Review
Review comments hopefully addressed.
Attachment #564915 - Attachment is obsolete: true
Attachment #565628 - Flags: review?
Attachment #565628 - Flags: review? → review?(bzbarsky)
Comment on attachment 565628 [details] [diff] [review] proposed patch r=me
Attachment #565628 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #25) > ... > r=me with those nits picked, and another apology for lag. :( Thanks a lot. I'll proceed with 686429 (the cleanup of the tests).
In my queue with a few other bits that are being sent to try first and then onto inbound :-) https://tbpl.mozilla.org/?tree=Try&rev=c1528769b893
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Unclear to me what doc update is needed for this. C-D is documented here https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition. If needed, please re-add ddn with an explanation of what kind of docs we should add here.
Keywords: dev-doc-needed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: