Closed
Bug 638379
Opened 14 years ago
Closed 12 years ago
make nsIUnicodeDecoder::SetInputErrorBehavior reliable and stop using nsIUnicodeDecoder::Reset for error handling
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: dbaron, Assigned: emk)
References
Details
Attachments
(3 files, 1 obsolete file)
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smontagu
:
review+
|
Details | Diff | Splinter Review |
We currently use nsIUnicodeDecoder::Reset for error handling, though we've added nsIUnicodeDecoder::SetInputErrorBehavior that's implemented in some decoders. If it were implemented reliably in all decoders, we could avoid the Reset() pattern. This Reset() pattern is bad (see bug 335944) because it confuses resetting a decoder for an entirely new stream with this error handling behavior. This means that, for example, we might redo BOM detection in the middle of a stream after error handling.
Do we actually want the decoders to do BOM detection at all? Which callers benefit from it? For markup parsers, I'd be happier to have BOM sniffing be entirely the parser's responsibility. See also bug 634541.
Assignee | ||
Comment 2•12 years ago
|
||
I wrote a workaround in bug 784367.
Assignee | ||
Comment 4•12 years ago
|
||
Assignee: smontagu → VYV03354
Attachment #687025 -
Flags: review?(smontagu)
Assignee | ||
Comment 5•12 years ago
|
||
Henri, asking you for review because you reviewed bug 663057.
Attachment #687026 -
Flags: review?(hsivonen)
Assignee | ||
Comment 6•12 years ago
|
||
Attachment #687027 -
Flags: review?(hsivonen)
Assignee | ||
Updated•12 years ago
|
Attachment #687026 -
Attachment is patch: true
Assignee | ||
Comment 7•12 years ago
|
||
Comment 8•12 years ago
|
||
Comment on attachment 687025 [details] [diff] [review]
Part 1: Implement kOnError_Recover to the UTF-8 decoder
Review of attachment 687025 [details] [diff] [review]:
-----------------------------------------------------------------
::: intl/uconv/src/nsScriptableUConv.cpp
@@ +260,5 @@
>
> + if (NS_FAILED(rv) || !ccm) {
> + return rv;
> + }
> + // get charset atom due to getting unicode converter
Please remove this dead comment while you're here.
@@ +283,5 @@
> + // Some callers depend on the UTF-8 decoder throws.
> + nsAutoCString charset;
> + ccm->GetCharsetAlias(mCharset.get(), charset);
> + if (charset.EqualsLiteral("UTF-8")) {
> + mDecoder->SetInputErrorBehavior(nsIUnicodeDecoder::kOnError_Signal);
I don't really understand this bit. Why only UTF-8? And if it's "some callers", shouldn't the callers be setting the behaviour?
I also don't like the call to GetCharsetAlias when GetUnicodeDecoder has already done that. You can leave it for now if there's no choice, but maybe we should have an API on the decoder/encoder to return its canonical charset name. Does that make sense?
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Simon Montagu from comment #8)
> Please remove this dead comment while you're here.
Will do.
> @@ +283,5 @@
> I don't really understand this bit. Why only UTF-8?
Because only UTF-8 used to ignore kOnError_Recover.
> And if it's "some
> callers", shouldn't the callers be setting the behaviour?
Fixing in-tree callers would be easy.
But "some callers" include third-party add-ons because nsIScriptableUnicodeConverter is scriptable. So it's nearly impossible to audit and fix all callers.
Moreover, nsIScriptableUnicodeConverter does not provide a way to set the behavior. I want to avoid as far as possible changing this legacy interface. Eventually it should be replaced by TextEncoder and TextDecoder.
> I also don't like the call to GetCharsetAlias when GetUnicodeDecoder has
> already done that. You can leave it for now if there's no choice, but maybe
> we should have an API on the decoder/encoder to return its canonical charset
> name. Does that make sense?
Another option is exposing nsCharsetAlias::GetPreferredInternal to nsIScriptableUnicodeConverter and use it with GetUnicodeDecoderRaw.
Assignee | ||
Comment 10•12 years ago
|
||
- Removed a bogus comment.
- Elaborated a comment a bit more.
- nsScriptableUnicodeConverter is now a friend of nsCharsetAlias.
Try result:
https://tbpl.mozilla.org/?tree=Try&rev=d02d6a13249c
Attachment #687025 -
Attachment is obsolete: true
Attachment #687025 -
Flags: review?(smontagu)
Attachment #689963 -
Flags: review?(smontagu)
Updated•12 years ago
|
Attachment #689963 -
Flags: review?(smontagu) → review+
Comment on attachment 687027 [details] [diff] [review]
Part 3: Remove workaround for unreliable inputErrorBehavior
r+ on the assumption that the default behavior across all decoders really is to produce the REPLACEMENT CHARACTER in the decoder and never to return NS_ERROR_ILLEGAL_INPUT.
Attachment #687027 -
Flags: review?(hsivonen) → review+
Comment on attachment 687026 [details] [diff] [review]
Part 2: Fix nsMIMEHeaderParamImpl
Why is this change needed?
Blocks: 819868
Filed bug 819868 as a follow-up.
Also filed bug 819869.
Blocks: 819869
Assignee | ||
Comment 15•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #12)
> Comment on attachment 687026 [details] [diff] [review]
> Part 2: Fix nsMIMEHeaderParamImpl
>
> Why is this change needed?
netwerk/test/unit/test_MIME_params.js failed without the change. It seems that the function expects that the conversion will fail if the input is invalid as UTF-8.
Unlike nsIScriptableUnicodeConverter, ConvertStringToUTF8 already had aAllowSubstitution parameter. So I changed the caller instead of simulating the old quirks in nsUTF8ConverterService.
Comment on attachment 687026 [details] [diff] [review]
Part 2: Fix nsMIMEHeaderParamImpl
OK. Thanks for the explanation.
Attachment #687026 -
Flags: review?(hsivonen) → review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 17•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/78b0678417f9
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2d8e0807127
https://hg.mozilla.org/integration/mozilla-inbound/rev/85211b40ba37
Keywords: checkin-needed
Comment 18•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/78b0678417f9
https://hg.mozilla.org/mozilla-central/rev/f2d8e0807127
https://hg.mozilla.org/mozilla-central/rev/85211b40ba37
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
You need to log in
before you can comment on or make changes to this bug.
Description
•