Closed Bug 414064 Opened 17 years ago Closed 14 years ago

Remove support for UTF-7 (and others) per HTML5 spec

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5
Tracking Status
firefox5 --- fixed
blocking2.0 --- -
status1.9.2 --- wontfix
status1.9.1 --- wontfix

People

(Reporter: dveditz, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete, html5, Whiteboard: [sg:want P3])

Attachments

(2 files, 8 obsolete files)

Section 8.2.2.2 of the WHAT-WG HTML5 spec says User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings. Support for UTF-32 is not recommended. This encoding is rarely used, and frequently misimplemented. http://www.whatwg.org/specs/web-apps/current-work/#character0 I don't know about UTF-32, but as far as I can tell UTF-7 is only used to demo XSS holes and isn't used for live web content. Even the authors of UTF-7 agree: http://mail.apps.ietf.org/ietf/charsets/msg01746.html Some bugs seem to imply we may need utf-7 or a variant for IMAP folder names. Don't know if that's still a requirement or if there's a way to preserve it for mail protocols while killing its use for the web and mail _content_.
We still need IMAP Mod utf-7 for IMAP folder names (and keywords), and will for the forseeable future, afaik. We use the intl converters to convert back and forth from unicode to imap mod utf7.
Could we change the registered name of the converter to some arbitrary internal name so that it works when we call it manually for IMAP but doesn't recognize "utf-7" in content? OTOH, I think I remember Neil saying when we discussed this on IRC a little while ago that he still comes across UTF-7 in mail content.
IMAP uses MUTF7, "x-imap4-modified-utf7" - see http://mxr.mozilla.org/mailnews/search?string=mutf7
Within the IMAP community, the fact that MUTF7 exists is generally believed to be a design error, but it still exists. There is a draft proposal that might deprecate it, but it's both controversial and not done: http://tools.ietf.org/html/draft-ietf-eai-imap-utf8 So despite the need for MUTF7 for the time being in IMAP mailbox names, removal of support for the regular UTF-7 charset seems like a good idea.
Just wanted to poke this bug to see if anyone has given further thought on how we can remove UTF-7 globally. We continue to see a steady trickle of XSS bugs that leverage UTF-7, so it would be great to shut it off once and for all.
Whiteboard: [sg:want P3]
Blocks: 530647
Keywords: html5
Attached patch Remove UTF7 and move BasicUTF7 (obsolete) (deleted) — Splinter Review
Attachment #432349 - Flags: review?(smontagu)
Attached patch Merge BasicUTF7 and MUTF7 (obsolete) (deleted) — Splinter Review
Attachment #432350 - Flags: review?(smontagu)
So are we sure that there are no mail clients still around that send UTF-7 mails?
(In reply to comment #8) MS Exchange 2000 uses UTF-7 for Delivery status notification messages http://support.microsoft.com/kb/819831/en-us/ http://www.microsoft.com/exchange/2007/support/lifecycle/2000.mspx
(In reply to comment #9) End of lifecycle of MS Exchange 2000 is January 1, 2011.
So, if we theoretically removed our UTF-7 support today, what issues could/would occur?
Webkit has been already removed UTF-7 support. I filed a security bug around UTF-7 decoder of Webkit. It is fixed by removal of UTF-7 support.
(In reply to comment #12) > I filed a security bug around UTF-7 decoder of Webkit. Oops, it is NOT correct description. I thought it may be security-related, and filed it as Security-sensitive. But, No security attack method using it was found.
(In reply to comment #13) > (In reply to comment #12) > > I filed a security bug around UTF-7 decoder of Webkit. > Oops, it is NOT correct description. > I thought it may be security-related, and filed it as Security-sensitive. > But, No security attack method using it was found. And main security-related reason was "UTF-7 is commonly used for XSS".
Comment on attachment 432349 [details] [diff] [review] Remove UTF7 and move BasicUTF7 Canceling r? for now; comments still welcome though.
Attachment #432349 - Flags: review?(smontagu)
Attachment #432350 - Flags: review?(smontagu)
The way we handled this at Opera was disabling UTF-7 just for web content. UTF-32 support was nuked completely though.
I'm going to investigate moving the UTF-7 converters into mailnews/imap or somewhere.
Status: NEW → ASSIGNED
Attached patch Remove utf-7 (intl part) (obsolete) (deleted) — Splinter Review
This patch breaks tests in mailnews/base, which is a Good Thing, because it means that there is already testing in place for the follow-up patch to reimplement UTF-7 in mailnews.
Attachment #462910 - Flags: review?(VYV03354)
Attached patch Remove utf-7 (parser part) (obsolete) (deleted) — Splinter Review
Attachment #462911 - Flags: review?(hsivonen)
Attached patch Remove utf-7 (parser part) (obsolete) (deleted) — Splinter Review
The right patch this time
Attachment #462911 - Attachment is obsolete: true
Attachment #462911 - Flags: review?(hsivonen)
Comment on attachment 462910 [details] [diff] [review] Remove utf-7 (intl part) Removing review request -- we may still need the alias handling.
Attachment #462910 - Flags: review?(VYV03354)
Depends on: 587475
Filed bug 587475 with a patch reimplementing UTF-7 decoding and encoding in mailnews using character sinks.
Be real nice if we could get this into 2.0, then we could stop worrying about these UTF-7 smuggled XSS issues.
blocking2.0: --- → ?
Attached patch Remove utf-7 (intl part) v.2 (obsolete) (deleted) — Splinter Review
Tested with the patch in bug 587475.
Attachment #462910 - Attachment is obsolete: true
Attachment #466601 - Flags: review?(VYV03354)
Attachment #462912 - Flags: review?(hsivonen)
Attached patch tests (obsolete) (deleted) — Splinter Review
Blocks: 412431
Unfortunately View Source of a UTF-7 message is broken with this approach. I haven't tested viewing an .eml file in browser, but I expect it will be broken also.
Sorry, the previous comment belongs in bug 587475
Attachment #466601 - Flags: review?(VYV03354) → review+
Comment on attachment 462912 [details] [diff] [review] Remove utf-7 (parser part) If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML message that doesn't specify the encoding on the MIME layer but has <meta http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> inside the message? Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7" from return a decoder when the HTML parser runs in mailnews? In any case, please add a reftest that loads an HTML document that has <meta http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with <meta http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and Windows-1252.
blocking2.0: ? → -
(In reply to comment #29) > Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7" > from return a decoder when the HTML parser runs in mailnews? The new UTF-7 decoder in mailnews is a character sink, not an nsIUnicodeDecoder, so GetUnicodeDecoderRaw is not going to return a decoder in this scenario. > If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML > message that doesn't specify the encoding on the MIME layer but has <meta > http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta > http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> > inside the message? "UTF-7" is special cased in MimeInlineText_convert_and_parse_line() in the patch for bug 587475 so that it will be decoded, and I have tested that this works both in text and HTML messages. "x-imap4-modified-utf7" will not be decoded. > In any case, please add a reftest that loads an HTML document that has <meta > http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with > <meta http-equiv="Content-Type" content="text/html; > charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and > Windows-1252. I don't mind doing this, but do you think it adds significant value beyond the xpcshell test in attachment 466602 [details] [diff] [review]?
Comment on attachment 462912 [details] [diff] [review] Remove utf-7 (parser part) (In reply to comment #30) > (In reply to comment #29) > > > Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7" > > from return a decoder when the HTML parser runs in mailnews? > > The new UTF-7 decoder in mailnews is a character sink, not an > nsIUnicodeDecoder, so GetUnicodeDecoderRaw is not going to return a decoder in > this scenario. Great. > > If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML > > message that doesn't specify the encoding on the MIME layer but has <meta > > http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta > > http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> > > inside the message? > > "UTF-7" is special cased in MimeInlineText_convert_and_parse_line() in the > patch for bug 587475 so that it will be decoded, and I have tested that this > works both in text and HTML messages. "x-imap4-modified-utf7" will not be > decoded. Code inspection of the patch suggests to me that internal encoding declaration (<meta>) saying UTF-7 wouldn't get decoded as UTF-7, but I'm OK with that. > > In any case, please add a reftest that loads an HTML document that has <meta > > http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with > > <meta http-equiv="Content-Type" content="text/html; > > charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and > > Windows-1252. > > I don't mind doing this, but do you think it adds significant value beyond the > xpcshell test in attachment 466602 [details] [diff] [review]? Since there's not going to be an nsIUnicodeDecoder for UTF-7 in any configuration, I think it's not of significant value to add more tests. r=me.
Attachment #462912 - Flags: review?(hsivonen) → review+
Nominating for 2.0 (and I think should be backported to active branches as well).
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: - → ?
We'll want this on branches after we figure out what we're doing on trunk and that it's not breaking things we care about. Not blocking a particular release.
blocking1.9.1: ? → needed
blocking1.9.2: ? → needed
(but I do hope it blocks Firefox 4)
blocking2.0: ? → betaN+
Blocks: html
Attached patch Flag utf-7 as XSS-vulnerable (obsolete) (deleted) — Splinter Review
Patch using the mechanism from bug 601429 to mark the charsets as vulnerable, which will make GetUnicodeDecoder return an error. This still depends on bug 587475, but that will be a lot simpler to fix now too.
Attachment #432349 - Attachment is obsolete: true
Attachment #432350 - Attachment is obsolete: true
Attachment #462912 - Attachment is obsolete: true
Attachment #466601 - Attachment is obsolete: true
It might make sense to reorder the .notForBrowser block so that x-imap4-modified-utf7 is next to utf-7, I was about to post asking why x-imap4-modified-utf7 doesn't get .notForBrowser. Also, some comments explaining the difference between .notForBrowser and .isXSSVulnerable would be nice; the former list has a lot of things that I don't immediately see what the problem is (us-ascii, for instance) and am left wondering whether others of those ought to be tagged xss-vulnerable.
.notForBrowser is *supposed* to mean "we need to be able to decode this if it appears in content, but we don't want to expose it in the UI", but you may well be right that some of the existing entries should become .isXSSVulnerable
blocking2.0: betaN+ → -
Attachment #491486 - Attachment is obsolete: true
Attachment #500009 - Flags: review?(VYV03354)
Attached patch Tests updated (deleted) — Splinter Review
Attachment #466602 - Attachment is obsolete: true
Attachment #500009 - Flags: review?(VYV03354) → review+
Attachment #500009 - Flags: approval2.0?
Comment on attachment 500009 [details] [diff] [review] Flag utf-7 as XSS-vulnerable v.2 I believe this has missed the boat and we shouldn't introduce any new risk at this point. This should wait for the next 3-month cycle.
Attachment #500009 - Flags: approval2.0? → approval2.0-
Depends on: post2.0
Flags: in-testsuite+
Whiteboard: [sg:want P3] → [sg:want P3][fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Keywords: checkin-needed
Version: unspecified → Trunk
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [sg:want P3][fixed-in-cedar] → [sg:want P3]
I've added dev-doc-needed as at least that page must be upgraded: https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko
Keywords: dev-doc-needed
Documented: https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko Also mentioned on Firefox 5 for developers; and while I was at it, I overhauled that page a bit and added some links to it. It's also tagged so it will show up when people go looking for things to do.
blocking1.9.1: needed → ---
blocking1.9.2: needed → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: