Closed Bug 814900 Opened 12 years ago Closed 12 years ago

Byte order mark/BOM in big-endian XML-files causes fatal error in Firefox 19 & 20

Categories

(Core :: XML, defect)

19 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox19 + verified
firefox20 --- fixed

People

(Reporter: xn--mlform-iua, Assigned: emk)

References

Details

(Keywords: regression)

Attachments

(6 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50 Steps to reproduce: Open a UTF-16-encoded, big-endian XML file (served as application/xhtml+xml) and which includes a BOM. Actual results: Firefox 19 and 20 emits yellow screen of death (aka "fatal error") Expected results: There should be no fatal error as this is the normal XML format (when UTF-16 is used).
Attachment #684883 - Attachment description: bomtest.xhtml → An 16-bit, big-endian XHTML file with the BOM.
Attachment #684883 - Attachment mime type: application/octet-stream → application/xhtml+xml
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
Component: Untriaged → XML
Product: Firefox → Core
This bug might be an effect of bug 716579. And on bug 809934 ?
Last good nightly: 2012-11-08 First bad nightly: 2012-11-09 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochan ge=90cea19e27e2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Bug 782412 fixed this.
Depends on: 782412
Workaround in case bug 782412 takes too long to review or it is too danger for branches.
Attachment #684919 - Flags: review?(hsivonen)
Attached patch mochitest (obsolete) (deleted) — Splinter Review
Fail without the patch, pass with the patch.
Attachment #684928 - Flags: review?(hsivonen)
(In reply to Masatoshi Kimura [:emk] from comment #5) QUESTION: Does the patch also pass for e.g. big-endian documents WITHOUT a BOM? In Firefox 16/17/18, such documents are parsed OK. But not in Firefox 19/20. (For Firefox 17/18 - and IE/Opera — then even little-endian without BOM are parsed.) NOTE: This bug is about documents with a BOM. But according Unicode, UTF-16 documents without a BOM are legally UTF-16, and should thus not cause a fatal error in XML. (XML has a MUST rule to use the BOM, but it does not have a corresponding fatal error when the BOM is omitted, as long as the document otherwise is "legal" UTF-16.) (The reason I did not file this question initially is because of the BOM versus no-BOM differences.)
Per XML spec, http://www.w3.org/TR/xml/#dt-error > error > [Definition: A violation of the rules of this specification; results are > undefined. Unless otherwise specified, failure to observe a prescription of > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, > SHALL and SHALL NOT is an error. Conforming software MAY detect and report > an error and MAY recover from it.] We don't have to recover for UTF-16 entities without the BOM even if the error is non-fatal. Chrome didn't recover from the error.
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder Whoa. I didn't realize our UTF-16LE and UTF-16BE were non-sniffing.
Attachment #684919 - Flags: review?(hsivonen) → review+
FWIW, I expect this bug (and fix) to apply to CSS and JS in addition to XML. The CSS and JS code, too, expects the label "UTF-16" to resolve to a decoder that sniffs the endianness and then discards the BOM.
(In reply to Masatoshi Kimura [:emk] from comment #8) First: Please note that what I wrote was only meant to apply to big-endian UTF-16 without BOM. For UTF-16 little-endian without BOM, then I am not sure that it can be considered "legal UTF-16". After all, Unicode only lists big-endian without BOM plus little and big endian WITH BOM under the label UTF-16. If it cannot be considered "legal UTF-16", then it ought to be fatal error to send little-endian without BOM. > Per XML spec, > http://www.w3.org/TR/xml/#dt-error > > error > > [Definition: A violation of the rules of this specification; results are > > undefined. Unless otherwise specified, failure to observe a prescription of > > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, > > SHALL and SHALL NOT is an error. Conforming software MAY detect and report > > an error and MAY recover from it.] > > We don't have to recover for UTF-16 entities without the BOM even if the > error is non-fatal. Chrome didn't recover from the error. I see. Good point. Thanks! So, for big-endian UTF-16 without BOM, then it is "extra" to recover from lack of BOM. But see what I said about little-endian without BOM, above.
(In reply to Leif Halvard Silli from comment #11) I tested patched Nightly for all four patterns. - UTF-16, bug endian with BOM: success - UTF-16, little endian with BOM: success - UTF-16, bug endian without BOM: error (allowed as I explained in comment #8) - UTF-16LE (without BOM): error (because it is not "legal") Do you consider this result is OK?
Attached file UTF-16, big endian without BOM (deleted) —
Attached file UTF-16, little endian with BOM (deleted) —
Attached file UTF-16LE (without BOM) (deleted) —
I tested with attached files.
(In reply to Masatoshi Kimura [:emk] from comment #12) > Do you consider this result is OK? Seem I can't claim that you break any spec.
Attached patch mochitest v2 (deleted) — Splinter Review
- Added test cases for XML files encoded in utf-16, little endian, with or without BOM. big endian with BOM file has not been added because the spec doesn't require any specific behavior. - Added test cases for CSS and JS files.
Attachment #684928 - Attachment is obsolete: true
Attachment #685612 - Flags: review?(hsivonen)
(In reply to Masatoshi Kimura [:emk] from comment #18) > - Added test cases for XML files encoded in utf-16, little endian, with or > without BOM. For little-endian without BOM, the test checks that it *doesn't* work, right? > big endian with BOM file has not been added because the spec doesn't > require any specific behavior. You may already have thought about this, but what about files that, via HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* want to support that? Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. Thus, it is not required that Firefox, for XML, supports the labels "UTF-16LE" and "UTF-16BE" via HTTP. It seems to me that it might very well be the correct choice to *not* support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting seems like the correct choice is that it, otherwise, would be possible to serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs - that seems like an unnecessary difference. But as the UTF-16LE/UTF-16BE labels have been supported up until now, you should make a deliberate choice.
(In reply to Leif Halvard Silli from comment #19) > (In reply to Masatoshi Kimura [:emk] from comment #18) > > > - Added test cases for XML files encoded in utf-16, little endian, with or > > without BOM. > > For little-endian without BOM, the test checks that it *doesn't* work, right? Right. > > big endian with BOM file has not been added because the spec doesn't > > require any specific behavior. > > You may already have thought about this, but what about files that, via > HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* > want to support that? > > Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. > Thus, it is not required that Firefox, for XML, supports the labels > "UTF-16LE" and "UTF-16BE" via HTTP. > > It seems to me that it might very well be the correct choice to *not* > support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting > seems like the correct choice is that it, otherwise, would be possible to > serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs > - that seems like an unnecessary difference. > > But as the UTF-16LE/UTF-16BE labels have been supported up until now, you > should make a deliberate choice. HTTP can provide a charset information from the transport layer. file URLs can't. It's the reason of difference. FWIW, we really want to drop support for UTF-16BE/LE if it doesn't break the existing contents.
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder This patch is no longer needed and bitrotted because of bug 782412.
Attachment #684919 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Depends on: 817643
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 801402 User impact if declined: users will not be able to open Web pages encoded in UTF-16 big endian. Testing completed (on m-c, etc.): tested locally, not applicable to m-c. Risk to taking this patch (and alternatives if risky): very low. only one of this patch or attachment 688972 [details] [diff] [review] needed. String or UUID changes made by this patch: none.
Attachment #684919 - Attachment is obsolete: false
Attachment #684919 - Flags: approval-mozilla-aurora?
Triage drive-by: waiting on responses to questions in bug 782412 comment 17
Attachment #684919 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Reproduced the issue for the 2012-11-15 Nightly using the testcase from comment 0 No error found for Firefox 19.0 Beta 4 while using the same testcase Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 Build ID: 20130130080006
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: