Closed
Bug 562590
Opened 15 years ago
Closed 7 years ago
Bad byte right before EOF is not turned into the REPLACEMENT CHARACTER
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: hsivonen, Assigned: hsivonen)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [good first bug])
Attachments
(1 file)
Steps to reproduce (both the old parser and the HTML5 parser have this bug):
1) Load data:text/html;charset=utf-8,x%c2
2) Compare with data:text/html;charset=utf-8,x%c2x
Actual results:
A bad byte is turned into the REPLACEMENT CHARACTER only if there's another byte after it.
Expected results:
Expected a bad byte to turn into the REPLACEMENT CHARACTER even when followed immediately by the EOF.
Comment 1•11 years ago
|
||
Henri, Anne - any updated thoughts here?
Over on Chromium we're trying to decide how to tackle this. bug: http://crbug.com277035 patch: http://crrev.com/23532016
Chromium flushes the decoder at the end of fetches (page load, XHR) *but* the UTF-16 decoder is buggy and the flush ends up as a no-op. So... for UTF-16, Chromium and Gecko happen to be bug-compatible here. :(
If there's no intent to fix this issue, Chromium could disable flushing at the end of fetches to become compatible with Gecko; that would merit an addition to the Encoding and/or Fetch specs. I'd prefer that we correctly flush in all cases, though, but that's a bigger change for Gecko.
(There's no impact on the Encoding script API here, so far as I can tell; the behavior there can be independent of what's done for fetching.)
Comment 2•11 years ago
|
||
Sorry, typo in the bug URL: http://crbug.com/277035
Comment 3•11 years ago
|
||
Joshua, I think we want to do the right thing here. Emitting U+FFFD if there's an incomplete sequence when you hit EOF. I believe this is what the combination of Encoding, Fetch, and HTML already defines.
I take it that by flushing you mean that Chromium already does the correct thing?
Comment 4•11 years ago
|
||
(In reply to Anne (:annevk) from comment #3)
> I take it that by flushing you mean that Chromium already does the correct
> thing?
Yes (mostly). By flushing I meant ensuring the EOF propagates from "fetch" through "encoding", emitting U+FFFD. From discussion (email?) I was led to believe that Gecko is missing the plumbing there. Chromium does propagate the EOF, it just has a bug in the UTF-16 decoder that causes the it to be ignored, leading to "bug compatible' behavior with Gecko.
Comment 5•11 years ago
|
||
I recommend fixing the bug in Chromium and shaming Gecko (or better, provide us with a patch :-)).
Assignee | ||
Comment 6•9 years ago
|
||
The HTML parser should remember if the last call to nsIUnicodeDecoder::Convert returned NS_OK_UDEC_MOREINPUT or NS_OK. At end of input from network, if the remembered result is NS_OK_UDEC_MOREINPUT, append one REPLACEMENT CHARACTER to output.
Bug 912470 will update the API doc to actually say this.
Assignee | ||
Updated•9 years ago
|
Whiteboard: [good first bug]
Assignee | ||
Comment 7•7 years ago
|
||
I intend to actually fix this now that bug 1261841 put in place nicer infra.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8880350 -
Flags: review?(VYV03354)
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8880350 [details]
Bug 562590 - Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER.
https://reviewboard.mozilla.org/r/151716/#review159298
Attachment #8880350 -
Flags: review?(VYV03354) → review+
Comment 12•7 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.
hg error in cmd: hg rebase -s 678da39b4708 -d 1bc50cd63349: rebasing 405139:678da39b4708 "Bug 562590 - Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. r=emk" (tip)
merging testing/web-platform/meta/MANIFEST.json
warning: conflicts while merging testing/web-platform/meta/MANIFEST.json! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment hidden (mozreview-request) |
Comment 14•7 years ago
|
||
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f0a1809a7ee1
Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. r=emk
Comment 15•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in
before you can comment on or make changes to this bug.
Description
•