Closed
Bug 814900
Opened 12 years ago
Closed 12 years ago
Byte order mark/BOM in big-endian XML-files causes fatal error in Firefox 19 & 20
Categories
(Core :: XML, defect)
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: xn--mlform-iua, Assigned: emk)
References
Details
(Keywords: regression)
Attachments
(6 files, 1 obsolete file)
(deleted),
application/xhtml+xml
|
Details | |
(deleted),
patch
|
hsivonen
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
(deleted),
application/xhtml+xml
|
Details | |
(deleted),
application/xhtml+xml
|
Details | |
(deleted),
application/xhtml+xml
|
Details | |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50
Steps to reproduce:
Open a UTF-16-encoded, big-endian XML file (served as application/xhtml+xml) and which includes a BOM.
Actual results:
Firefox 19 and 20 emits yellow screen of death (aka "fatal error")
Expected results:
There should be no fatal error as this is the normal XML format (when UTF-16 is used).
Reporter | ||
Updated•12 years ago
|
Attachment #684883 -
Attachment description: bomtest.xhtml → An 16-bit, big-endian XHTML file with the BOM.
Attachment #684883 -
Attachment mime type: application/octet-stream → application/xhtml+xml
Reporter | ||
Updated•12 years ago
|
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
Reporter | ||
Updated•12 years ago
|
Component: Untriaged → XML
Product: Firefox → Core
Reporter | ||
Comment 1•12 years ago
|
||
This bug might be an effect of bug 716579. And on bug 809934 ?
Comment 2•12 years ago
|
||
Last good nightly: 2012-11-08
First bad nightly: 2012-11-09
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochan
ge=90cea19e27e2
Assignee | ||
Comment 4•12 years ago
|
||
Workaround in case bug 782412 takes too long to review or it is too danger for branches.
Attachment #684919 -
Flags: review?(hsivonen)
Assignee | ||
Comment 5•12 years ago
|
||
Fail without the patch, pass with the patch.
Attachment #684928 -
Flags: review?(hsivonen)
Assignee | ||
Comment 6•12 years ago
|
||
Reporter | ||
Comment 7•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #5)
QUESTION: Does the patch also pass for e.g. big-endian documents WITHOUT a BOM? In Firefox 16/17/18, such documents are parsed OK. But not in Firefox 19/20. (For Firefox 17/18 - and IE/Opera — then even little-endian without BOM are parsed.)
NOTE: This bug is about documents with a BOM. But according Unicode, UTF-16 documents without a BOM are legally UTF-16, and should thus not cause a fatal error in XML. (XML has a MUST rule to use the BOM, but it does not have a corresponding fatal error when the BOM is omitted, as long as the document otherwise is "legal" UTF-16.)
(The reason I did not file this question initially is because of the BOM versus no-BOM differences.)
Assignee | ||
Comment 8•12 years ago
|
||
Per XML spec,
http://www.w3.org/TR/xml/#dt-error
> error
> [Definition: A violation of the rules of this specification; results are
> undefined. Unless otherwise specified, failure to observe a prescription of
> this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT,
> SHALL and SHALL NOT is an error. Conforming software MAY detect and report
> an error and MAY recover from it.]
We don't have to recover for UTF-16 entities without the BOM even if the error is non-fatal. Chrome didn't recover from the error.
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder
Whoa. I didn't realize our UTF-16LE and UTF-16BE were non-sniffing.
Attachment #684919 -
Flags: review?(hsivonen) → review+
FWIW, I expect this bug (and fix) to apply to CSS and JS in addition to XML. The CSS and JS code, too, expects the label "UTF-16" to resolve to a decoder that sniffs the endianness and then discards the BOM.
Attachment #684928 -
Flags: review?(hsivonen) → review+
Blocks: 801402
Reporter | ||
Comment 11•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #8)
First: Please note that what I wrote was only meant to apply to big-endian UTF-16 without BOM. For UTF-16 little-endian without BOM, then I am not sure that it can be considered "legal UTF-16". After all, Unicode only lists big-endian without BOM plus little and big endian WITH BOM under the label UTF-16. If it cannot be considered "legal UTF-16", then it ought to be fatal error to send little-endian without BOM.
> Per XML spec,
> http://www.w3.org/TR/xml/#dt-error
> > error
> > [Definition: A violation of the rules of this specification; results are
> > undefined. Unless otherwise specified, failure to observe a prescription of
> > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT,
> > SHALL and SHALL NOT is an error. Conforming software MAY detect and report
> > an error and MAY recover from it.]
>
> We don't have to recover for UTF-16 entities without the BOM even if the
> error is non-fatal. Chrome didn't recover from the error.
I see. Good point. Thanks! So, for big-endian UTF-16 without BOM, then it is "extra" to recover from lack of BOM. But see what I said about little-endian without BOM, above.
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Leif Halvard Silli from comment #11)
I tested patched Nightly for all four patterns.
- UTF-16, bug endian with BOM: success
- UTF-16, little endian with BOM: success
- UTF-16, bug endian without BOM: error (allowed as I explained in comment #8)
- UTF-16LE (without BOM): error (because it is not "legal")
Do you consider this result is OK?
Assignee | ||
Comment 13•12 years ago
|
||
Assignee | ||
Comment 14•12 years ago
|
||
Assignee | ||
Comment 15•12 years ago
|
||
Assignee | ||
Comment 16•12 years ago
|
||
I tested with attached files.
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #12)
> Do you consider this result is OK?
Seem I can't claim that you break any spec.
Assignee | ||
Comment 18•12 years ago
|
||
- Added test cases for XML files encoded in utf-16, little endian, with or without BOM.
big endian with BOM file has not been added because the spec doesn't require any specific behavior.
- Added test cases for CSS and JS files.
Attachment #684928 -
Attachment is obsolete: true
Attachment #685612 -
Flags: review?(hsivonen)
Attachment #685612 -
Flags: review?(hsivonen) → review+
Reporter | ||
Comment 19•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #18)
> - Added test cases for XML files encoded in utf-16, little endian, with or
> without BOM.
For little-endian without BOM, the test checks that it *doesn't* work, right?
> big endian with BOM file has not been added because the spec doesn't
> require any specific behavior.
You may already have thought about this, but what about files that, via HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* want to support that?
Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. Thus, it is not required that Firefox, for XML, supports the labels "UTF-16LE" and "UTF-16BE" via HTTP.
It seems to me that it might very well be the correct choice to *not* support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting seems like the correct choice is that it, otherwise, would be possible to serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs - that seems like an unnecessary difference.
But as the UTF-16LE/UTF-16BE labels have been supported up until now, you should make a deliberate choice.
Assignee | ||
Comment 20•12 years ago
|
||
(In reply to Leif Halvard Silli from comment #19)
> (In reply to Masatoshi Kimura [:emk] from comment #18)
>
> > - Added test cases for XML files encoded in utf-16, little endian, with or
> > without BOM.
>
> For little-endian without BOM, the test checks that it *doesn't* work, right?
Right.
> > big endian with BOM file has not been added because the spec doesn't
> > require any specific behavior.
>
> You may already have thought about this, but what about files that, via
> HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not*
> want to support that?
>
> Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files.
> Thus, it is not required that Firefox, for XML, supports the labels
> "UTF-16LE" and "UTF-16BE" via HTTP.
>
> It seems to me that it might very well be the correct choice to *not*
> support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting
> seems like the correct choice is that it, otherwise, would be possible to
> serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs
> - that seems like an unnecessary difference.
>
> But as the UTF-16LE/UTF-16BE labels have been supported up until now, you
> should make a deliberate choice.
HTTP can provide a charset information from the transport layer. file URLs can't. It's the reason of difference.
FWIW, we really want to drop support for UTF-16BE/LE if it doesn't break the existing contents.
Assignee | ||
Comment 21•12 years ago
|
||
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder
This patch is no longer needed and bitrotted because of bug 782412.
Attachment #684919 -
Attachment is obsolete: true
Assignee | ||
Comment 22•12 years ago
|
||
Keywords: checkin-needed
Comment 23•12 years ago
|
||
Comment 24•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Assignee | ||
Comment 25•12 years ago
|
||
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 801402
User impact if declined: users will not be able to open Web pages encoded in UTF-16 big endian.
Testing completed (on m-c, etc.): tested locally, not applicable to m-c.
Risk to taking this patch (and alternatives if risky): very low. only one of this patch or attachment 688972 [details] [diff] [review] needed.
String or UUID changes made by this patch: none.
Attachment #684919 -
Attachment is obsolete: false
Attachment #684919 -
Flags: approval-mozilla-aurora?
Comment 26•12 years ago
|
||
Triage drive-by: waiting on responses to questions in bug 782412 comment 17
Updated•12 years ago
|
tracking-firefox19:
--- → +
Updated•12 years ago
|
Attachment #684919 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Comment 27•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/93ecec9fb54c
https://hg.mozilla.org/releases/mozilla-aurora/rev/8cceb0a1ec30
status-firefox19:
--- → fixed
status-firefox20:
--- → fixed
Updated•12 years ago
|
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Comment 28•12 years ago
|
||
Reproduced the issue for the 2012-11-15 Nightly using the testcase from comment 0
No error found for Firefox 19.0 Beta 4 while using the same testcase
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006
You need to log in
before you can comment on or make changes to this bug.
Description
•