Closed
Bug 782412
Opened 12 years ago
Closed 12 years ago
Improve UTF-16 decoders
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: bsurender, Assigned: emk)
References
(Blocks 1 open bug)
Details
Attachments
(6 files)
(deleted),
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
akeybl
:
approval-mozilla-aurora-
|
Details | Diff | Splinter Review |
The current encoder for UTF-16 code units does not support checking, validating and handling of unpaired surrogates in the incoming UTF-16 code units.
Updated•12 years ago
|
Component: DOM → Internationalization
Component: DOM → Internationalization
Comment 1•12 years ago
|
||
The decoder (nsUCS2BEToUnicode.cpp) should be able to handle arbitrary Web content that might have unpaired surrogates in it, but shouldn't the encoders (in general) be able to assume they're being given sane content?
The API Bonnie is implementing takes arbitrary stuff from JS and feeds it to the encoder so we have to do sanitization somewhere, either in the encoder or at a previous stage (and looping through the text twice seems undesirable).
Comment 3•12 years ago
|
||
Is this really the only encoder with problems (of the ones you'll ever use, if it's clearly commented explaining what needs to be done to expand the set)?
Assignee | ||
Comment 4•12 years ago
|
||
Decoders should have implemented some error handling for bug 174351.
Assignee | ||
Comment 5•12 years ago
|
||
Renamed misleading filenames. Also, these files contains not only BE converter but also LE and BOM-sniffing converters.
Assignee | ||
Comment 6•12 years ago
|
||
This is required to support TextDecoder's "stream" option correctly.
Attachment #680388 -
Flags: review?(smontagu)
Assignee | ||
Comment 7•12 years ago
|
||
Attachment #680389 -
Flags: review?(smontagu)
Assignee | ||
Comment 8•12 years ago
|
||
This is needed for TextDecoder's "fatal" option.
Attachment #680390 -
Flags: review?(smontagu)
Assignee | ||
Comment 9•12 years ago
|
||
In theory, BOM detection is not a part of decoders. But this will greatly simplify the implement of TextDecoder. And part 3 will make it easier to add an option to disable BOM detection.
Attachment #680391 -
Flags: review?(hsivonen)
Assignee | ||
Comment 10•12 years ago
|
||
Green on try:
https://tbpl.mozilla.org/?tree=Try&rev=20f04d91e80c
Assignee | ||
Updated•12 years ago
|
Attachment #680388 -
Attachment is patch: true
(In reply to Masatoshi Kimura [:emk] from comment #9)
> Created attachment 680391 [details] [diff] [review]
> Part 5: Remove a workaround from dom/encoding
>
> In theory, BOM detection is not a part of decoders. But this will greatly
> simplify the implement of TextDecoder. And part 3 will make it easier to add
> an option to disable BOM detection.
I’m a bit confused about the purpose of this bug vs. the patches vs. other potential patches.
This bug says it’s about encoders. Why do the patches change decoders?
How exactly does decoder BOM handling change here? E.g. nsScriptLoader relies on the decoders to swallow the BOM.
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #11)
> I’m a bit confused about the purpose of this bug vs. the patches vs. other
> potential patches.
>
> This bug says it’s about encoders. Why do the patches change decoders?
Oh, I misread the bug title. I've morphed the bug because I already attached patches. I'll file a new bug if encoders needs to be fixed.
> How exactly does decoder BOM handling change here? E.g. nsScriptLoader
> relies on the decoders to swallow the BOM.
UTF-16BE decoder and UTF-16LE decoder will continue to swallow the BOM. This patch will only fix their poor behaviors on some edge cases.
For example, If only one byte was fed to the decoder, it returned NS_ERROR_ILLEGAL_INPUT. I added the own BOM handling to TextDecoder to avoid the bug. Without the workaround, TextDecoder("utf-16").decode(Uint8Array([0xFF]),{stream:true}) would throw the exception wrongly.
With this patch, UTF-16 decoders will work saner so that the workaround is no longer needed.
I didn't touch the BOM-sniffing decoder because it will not be used from the DOM after bug 801402 and eventually it will be removed.
Summary: Current nsIUnicodeEncoder, nsBasicEncoder, encoder implementation for UTF-16 does not support unpaired surrogates checking. → Improve UTF-16 decoders
Updated•12 years ago
|
Attachment #680387 -
Flags: review?(smontagu) → review+
Updated•12 years ago
|
Attachment #680390 -
Flags: review?(smontagu) → review+
Comment on attachment 680391 [details] [diff] [review]
Part 5: Remove a workaround from dom/encoding
r+ on the assumption that the other patches work as described.
Attachment #680391 -
Flags: review?(hsivonen) → review+
Updated•12 years ago
|
Attachment #680388 -
Flags: review?(smontagu) → review+
Updated•12 years ago
|
Attachment #680389 -
Flags: review?(smontagu) → review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 14•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9efe7e7b135
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a77b9740c42
https://hg.mozilla.org/integration/mozilla-inbound/rev/c7cc475cbc41
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d5d486cc044
https://hg.mozilla.org/integration/mozilla-inbound/rev/b960d3ce65bc
Flags: in-testsuite+
Keywords: checkin-needed
Comment 15•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d9efe7e7b135
https://hg.mozilla.org/mozilla-central/rev/3a77b9740c42
https://hg.mozilla.org/mozilla-central/rev/c7cc475cbc41
https://hg.mozilla.org/mozilla-central/rev/6d5d486cc044
https://hg.mozilla.org/mozilla-central/rev/b960d3ce65bc
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Assignee | ||
Comment 16•12 years ago
|
||
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 801402
User impact if declined: users will not be able to open Web pages encoded in UTF-16 big endian.
Testing completed (on m-c, etc.): baked on m-c for one weeks.
Risk to taking this patch (and alternatives if risky): moderately. If this is too risky, take attachment 684919 [details] [diff] [review] from bug 814900 instead.
String or UUID changes made by this patch: none.
Attachment #688972 -
Flags: approval-mozilla-aurora?
Comment 17•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #16)
> Created attachment 688972 [details] [diff] [review]
> Folded patch for aurora
>
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): bug 801402
> User impact if declined: users will not be able to open Web pages encoded in
> UTF-16 big endian.
> Testing completed (on m-c, etc.): baked on m-c for one weeks.
> Risk to taking this patch (and alternatives if risky): moderately. If this
> is too risky, take attachment 684919 [details] [diff] [review] from bug
> 814900 instead.
> String or UUID changes made by this patch: none.
What kind of risk are we looking at here? What sort of fallout would we see, could we potentially test for and catch early? Anything we can point QA to in order to be more sure of taking this fix over the one in but 814900? What is the difference between the two patches?
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Lukas Blakk [:lsblakk] from comment #17)
> What kind of risk are we looking at here? What sort of fallout would we
> see, could we potentially test for and catch early? Anything we can point
> QA to in order to be more sure of taking this fix over the one in but
> 814900? What is the difference between the two patches?
This patch rewritten UTF-16 decoder substantially. There might be a undiscovered problem in the new code.
Second patch just changes back the canonical name of UTF-16 to UTF-16 (rather than UTF-16) which is the same setting as Firefox 18 or earlier.
Assignee | ||
Comment 19•12 years ago
|
||
But I didn't find any specific concern at the moment.
Comment 20•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #18)
> (In reply to Lukas Blakk [:lsblakk] from comment #17)
> > What kind of risk are we looking at here? What sort of fallout would we
> > see, could we potentially test for and catch early? Anything we can point
> > QA to in order to be more sure of taking this fix over the one in but
> > 814900? What is the difference between the two patches?
>
> This patch rewritten UTF-16 decoder substantially. There might be a
> undiscovered problem in the new code.
Sounds like we should back out bug 801402 instead of taking this patch, since it wasn't ready to ride the trains.
Updated•12 years ago
|
Attachment #688972 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment 21•12 years ago
|
||
(In reply to Alex Keybl [:akeybl] from comment #20)
> (In reply to Masatoshi Kimura [:emk] from comment #18)
> > (In reply to Lukas Blakk [:lsblakk] from comment #17)
> > > What kind of risk are we looking at here? What sort of fallout would we
> > > see, could we potentially test for and catch early? Anything we can point
> > > QA to in order to be more sure of taking this fix over the one in but
> > > 814900? What is the difference between the two patches?
> >
> > This patch rewritten UTF-16 decoder substantially. There might be a
> > undiscovered problem in the new code.
>
> Sounds like we should back out bug 801402 instead of taking this patch,
> since it wasn't ready to ride the trains.
Nevermind, we'll take bug 814900.
You need to log in
before you can comment on or make changes to this bug.
Description
•