Closed Bug 693806 Opened 13 years ago Closed 13 years ago

RFC2231/5987 encoding: charset information should be treated as authoritative

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: julian.reschke, Assigned: julian.reschke)

References

(Blocks 1 open bug, )

Details

Attachments

(1 file, 3 obsolete files)

Test case <http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad>: Content-Disposition: attachment; filename*=iso-8859-1''foo-%c3%a4-%e2%82%ac.html After percent-unescaping, Firefox apparently decides that this is more likely to be UTF-8. Other UAs either ignore the parameter (good, as it contains a non-ISO-8859-1 octet) or treat it as in a superset-encoding of ISO-8859-1 (where %82 represents a valid code point). Optimally, FF would align with IE9's and Konqueror's strict behavior.
Assignee: nobody → julian.reschke
Attached patch Proposed patch, incl test cases (obsolete) (deleted) — Splinter Review
This patch is big considering the change of behavior, so below a few explanations. Functional changes: - DecodeParameter() always skips the check, we trust the charset as obtained from the 2231/5987 format - Before returning data from GetParameterInternal(), if we do have a charset we always check that the octet sequence decodes according to the character encoding, otherwise we ignore the parameter To get there, I made a few more changes: - rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use - to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values) - renamed "needUnquote" to "needPercentDecoding" (to avoid confusion with quoted-string unquoting) - made the code that copies fragments into new strings more readable by calculating the length only once I hope I haven't added a memory leak, and I also think I fixed one what would occur when both caseCorD and caseB are present (in this order), and both specify the charset
Attachment #567559 - Flags: review?(bzbarsky)
Comment on attachment 567559 [details] [diff] [review] Proposed patch, incl test cases Refactoring the code, and trying to fix bug 384571 first.
Attachment #567559 - Flags: review?(bzbarsky)
Attached patch Proposed patch, incl test case (obsolete) (deleted) — Splinter Review
To be applied on top of the patch for bug 384571.
Attachment #567559 - Attachment is obsolete: true
Attachment #568455 - Flags: review?(bzbarsky)
Attached patch Proposed patch, incl test case (obsolete) (deleted) — Splinter Review
Updated patch (applies on top of bug 384571)
Attachment #568455 - Attachment is obsolete: true
Attachment #568455 - Flags: review?(bzbarsky)
Attachment #575733 - Flags: review?(bzbarsky)
Comment on attachment 575733 [details] [diff] [review] Proposed patch, incl test case (patch needs to be updated)
Attachment #575733 - Flags: review?(bzbarsky)
Attached patch Proposed patch, incl test case (deleted) — Splinter Review
Attachment #575733 - Attachment is obsolete: true
Attachment #607983 - Flags: review?(jduell.mcbugs)
Comment on attachment 607983 [details] [diff] [review] Proposed patch, incl test case Review of attachment 607983 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Only one question: ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp @@ -721,2 @@ > return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset, > - IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult); Any reason why you've scrapped the 7bit optimization?
(In reply to Jason Duell (:jduell) from comment #7) > Comment on attachment 607983 [details] [diff] [review] > Proposed patch, incl test case > > Review of attachment 607983 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good. Only one question: > > ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp > @@ -721,2 @@ > > return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset, > > - IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult); > > Any reason why you've scrapped the 7bit optimization? It's not an optimization; it's a heuristic that will cause the wrong thing to happen when the name encoding is declared to be ISO-8859-1, but it *looks* as if it was UTF-8. (see http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad) This bug is about removing this heuristic (that is, to treat the declared character encoding to be authoritative, not just advisory)
Comment on attachment 607983 [details] [diff] [review] Proposed patch, incl test case OK. I'm going to hold off on landing this until the next code fork (April 24th), since we've closed mozilla-central to all but approved patched till then.
Attachment #607983 - Flags: review?(jduell.mcbugs) → review+
(In reply to Jason Duell (:jduell) from comment #9) > OK. I'm going to hold off on landing this until the next code fork (April > 24th), since we've closed mozilla-central to all but approved patched till > then. Makes perfect sense. Thank you!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: