Closed
Bug 693806
Opened 13 years ago
Closed 13 years ago
RFC2231/5987 encoding: charset information should be treated as authoritative
Categories
(Core :: Networking, defect)
Core
Networking
Tracking
()
RESOLVED
FIXED
mozilla15
People
(Reporter: julian.reschke, Assigned: julian.reschke)
References
(Blocks 1 open bug, )
Details
Attachments
(1 file, 3 obsolete files)
(deleted),
patch
|
jduell.mcbugs
:
review+
|
Details | Diff | Splinter Review |
Test case <http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad>:
Content-Disposition: attachment; filename*=iso-8859-1''foo-%c3%a4-%e2%82%ac.html
After percent-unescaping, Firefox apparently decides that this is more likely to be UTF-8. Other UAs either ignore the parameter (good, as it contains a non-ISO-8859-1 octet) or treat it as in a superset-encoding of ISO-8859-1 (where %82 represents a valid code point).
Optimally, FF would align with IE9's and Konqueror's strict behavior.
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → julian.reschke
Assignee | ||
Comment 1•13 years ago
|
||
This patch is big considering the change of behavior, so below a few explanations.
Functional changes:
- DecodeParameter() always skips the check, we trust the charset as obtained from the 2231/5987 format
- Before returning data from GetParameterInternal(), if we do have a charset we always check that the octet sequence decodes according to the character encoding, otherwise we ignore the parameter
To get there, I made a few more changes:
- rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use
- to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values)
- renamed "needUnquote" to "needPercentDecoding" (to avoid confusion with quoted-string unquoting)
- made the code that copies fragments into new strings more readable by calculating the length only once
I hope I haven't added a memory leak, and I also think I fixed one what would occur when both caseCorD and caseB are present (in this order), and both specify the charset
Attachment #567559 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 2•13 years ago
|
||
Comment on attachment 567559 [details] [diff] [review]
Proposed patch, incl test cases
Refactoring the code, and trying to fix bug 384571 first.
Attachment #567559 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 3•13 years ago
|
||
To be applied on top of the patch for bug 384571.
Attachment #567559 -
Attachment is obsolete: true
Attachment #568455 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 4•13 years ago
|
||
Updated patch (applies on top of bug 384571)
Attachment #568455 -
Attachment is obsolete: true
Attachment #568455 -
Flags: review?(bzbarsky)
Attachment #575733 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 5•13 years ago
|
||
Comment on attachment 575733 [details] [diff] [review]
Proposed patch, incl test case
(patch needs to be updated)
Attachment #575733 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 6•13 years ago
|
||
Attachment #575733 -
Attachment is obsolete: true
Attachment #607983 -
Flags: review?(jduell.mcbugs)
Comment 7•13 years ago
|
||
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case
Review of attachment 607983 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good. Only one question:
::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
@@ -721,2 @@
> return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> - IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);
Any reason why you've scrapped the 7bit optimization?
Assignee | ||
Comment 8•13 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #7)
> Comment on attachment 607983 [details] [diff] [review]
> Proposed patch, incl test case
>
> Review of attachment 607983 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Looks good. Only one question:
>
> ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
> @@ -721,2 @@
> > return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> > - IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);
>
> Any reason why you've scrapped the 7bit optimization?
It's not an optimization; it's a heuristic that will cause the wrong thing to happen when the name encoding is declared to be ISO-8859-1, but it *looks* as if it was UTF-8.
(see http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad)
This bug is about removing this heuristic (that is, to treat the declared character encoding to be authoritative, not just advisory)
Comment 9•13 years ago
|
||
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case
OK. I'm going to hold off on landing this until the next code fork (April 24th), since we've closed mozilla-central to all but approved patched till then.
Attachment #607983 -
Flags: review?(jduell.mcbugs) → review+
Assignee | ||
Comment 10•13 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #9)
> OK. I'm going to hold off on landing this until the next code fork (April
> 24th), since we've closed mozilla-central to all but approved patched till
> then.
Makes perfect sense. Thank you!
Comment 11•13 years ago
|
||
Comment 12•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
You need to log in
before you can comment on or make changes to this bug.
Description
•