Closed Bug 638379 Opened 14 years ago Closed 12 years ago

make nsIUnicodeDecoder::SetInputErrorBehavior reliable and stop using nsIUnicodeDecoder::Reset for error handling

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla20

People

(Reporter: dbaron, Assigned: emk)

References

Details

Attachments

(3 files, 1 obsolete file)

Part 1: Implement kOnError_Recover to the UTF-8 decoder 12 years ago Masatoshi Kimura [:emk] (deleted), patch		Details \| Diff \| Splinter Review
Part 2: Fix nsMIMEHeaderParamImpl 12 years ago Masatoshi Kimura [:emk] (deleted), patch	hsivonen : review+	Details \| Diff \| Splinter Review
Part 3: Remove workaround for unreliable inputErrorBehavior 12 years ago Masatoshi Kimura [:emk] (deleted), patch	hsivonen : review+	Details \| Diff \| Splinter Review
Part 1: Implement kOnError_Recover to the UTF-8 decoder, v2 12 years ago Masatoshi Kimura [:emk] (deleted), patch	smontagu : review+	Details \| Diff \| Splinter Review

David Baron :dbaron:

Reporter

Description

•

14 years ago

We currently use nsIUnicodeDecoder::Reset for error handling, though we've added nsIUnicodeDecoder::SetInputErrorBehavior that's implemented in some decoders. If it were implemented reliably in all decoders, we could avoid the Reset() pattern. This Reset() pattern is bad (see bug 335944) because it confuses resetting a decoder for an entirely new stream with this error handling behavior. This means that, for example, we might redo BOM detection in the middle of a stream after error handling.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 1

•

14 years ago

Do we actually want the decoders to do BOM detection at all? Which callers benefit from it? For markup parsers, I'd be happier to have BOM sniffing be entirely the parser's responsibility. See also bug 634541.

Masatoshi Kimura [:emk]

Assignee

Comment 2

•

12 years ago

I wrote a workaround in bug 784367.

Masatoshi Kimura [:emk]

Assignee

Comment 4

•

12 years ago

Attached patch Part 1: Implement kOnError_Recover to the UTF-8 decoder (obsolete) (deleted) — Details — Splinter Review

Assignee: smontagu → VYV03354

Attachment #687025 - Flags: review?(smontagu)

Masatoshi Kimura [:emk]

Assignee

Comment 5

•

12 years ago

Attached patch Part 2: Fix nsMIMEHeaderParamImpl (deleted) — Details — Splinter Review

Henri, asking you for review because you reviewed bug 663057.

Attachment #687026 - Flags: review?(hsivonen)

Masatoshi Kimura [:emk]

Assignee

Comment 6

•

12 years ago

Attached patch Part 3: Remove workaround for unreliable inputErrorBehavior (deleted) — Details — Splinter Review

Attachment #687027 - Flags: review?(hsivonen)

Masatoshi Kimura [:emk]

Assignee

Updated

•

12 years ago

Attachment #687026 - Attachment is patch: true

Masatoshi Kimura [:emk]

Assignee

Comment 7

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=f3424ec4326f

Simon Montagu :smontagu

Comment 8

•

12 years ago

Comment on attachment 687025 [details] [diff] [review] Part 1: Implement kOnError_Recover to the UTF-8 decoder Review of attachment 687025 [details] [diff] [review]: ----------------------------------------------------------------- ::: intl/uconv/src/nsScriptableUConv.cpp @@ +260,5 @@ > > + if (NS_FAILED(rv) || !ccm) { > + return rv; > + } > + // get charset atom due to getting unicode converter Please remove this dead comment while you're here. @@ +283,5 @@ > + // Some callers depend on the UTF-8 decoder throws. > + nsAutoCString charset; > + ccm->GetCharsetAlias(mCharset.get(), charset); > + if (charset.EqualsLiteral("UTF-8")) { > + mDecoder->SetInputErrorBehavior(nsIUnicodeDecoder::kOnError_Signal); I don't really understand this bit. Why only UTF-8? And if it's "some callers", shouldn't the callers be setting the behaviour? I also don't like the call to GetCharsetAlias when GetUnicodeDecoder has already done that. You can leave it for now if there's no choice, but maybe we should have an API on the decoder/encoder to return its canonical charset name. Does that make sense?

Masatoshi Kimura [:emk]

Assignee

Comment 9

•

12 years ago

(In reply to Simon Montagu from comment #8) > Please remove this dead comment while you're here. Will do. > @@ +283,5 @@ > I don't really understand this bit. Why only UTF-8? Because only UTF-8 used to ignore kOnError_Recover. > And if it's "some > callers", shouldn't the callers be setting the behaviour? Fixing in-tree callers would be easy. But "some callers" include third-party add-ons because nsIScriptableUnicodeConverter is scriptable. So it's nearly impossible to audit and fix all callers. Moreover, nsIScriptableUnicodeConverter does not provide a way to set the behavior. I want to avoid as far as possible changing this legacy interface. Eventually it should be replaced by TextEncoder and TextDecoder. > I also don't like the call to GetCharsetAlias when GetUnicodeDecoder has > already done that. You can leave it for now if there's no choice, but maybe > we should have an API on the decoder/encoder to return its canonical charset > name. Does that make sense? Another option is exposing nsCharsetAlias::GetPreferredInternal to nsIScriptableUnicodeConverter and use it with GetUnicodeDecoderRaw.

Masatoshi Kimura [:emk]

Assignee

Comment 10

•

12 years ago

Attached patch Part 1: Implement kOnError_Recover to the UTF-8 decoder, v2 (deleted) — Details — Splinter Review

- Removed a bogus comment. - Elaborated a comment a bit more. - nsScriptableUnicodeConverter is now a friend of nsCharsetAlias. Try result: https://tbpl.mozilla.org/?tree=Try&rev=d02d6a13249c

Attachment #687025 - Attachment is obsolete: true

Attachment #687025 - Flags: review?(smontagu)

Attachment #689963 - Flags: review?(smontagu)

Simon Montagu :smontagu

Updated

•

12 years ago

Attachment #689963 - Flags: review?(smontagu) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 11

•

12 years ago

Comment on attachment 687027 [details] [diff] [review] Part 3: Remove workaround for unreliable inputErrorBehavior r+ on the assumption that the default behavior across all decoders really is to produce the REPLACEMENT CHARACTER in the decoder and never to return NS_ERROR_ILLEGAL_INPUT.

Attachment #687027 - Flags: review?(hsivonen) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 12

•

12 years ago

Comment on attachment 687026 [details] [diff] [review] Part 2: Fix nsMIMEHeaderParamImpl Why is this change needed?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Updated

•

12 years ago

Blocks: 819868

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 13

•

12 years ago

Filed bug 819868 as a follow-up.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 14

•

12 years ago

Also filed bug 819869.

Blocks: 819869

Masatoshi Kimura [:emk]

Assignee

Comment 15

•

12 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #12) > Comment on attachment 687026 [details] [diff] [review] > Part 2: Fix nsMIMEHeaderParamImpl > > Why is this change needed? netwerk/test/unit/test_MIME_params.js failed without the change. It seems that the function expects that the conversion will fail if the input is invalid as UTF-8. Unlike nsIScriptableUnicodeConverter, ConvertStringToUTF8 already had aAllowSubstitution parameter. So I changed the caller instead of simulating the old quirks in nsUTF8ConverterService.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 16

•

12 years ago

Comment on attachment 687026 [details] [diff] [review] Part 2: Fix nsMIMEHeaderParamImpl OK. Thanks for the explanation.

Attachment #687026 - Flags: review?(hsivonen) → review+

Masatoshi Kimura [:emk]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 17

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/78b0678417f9 https://hg.mozilla.org/integration/mozilla-inbound/rev/f2d8e0807127 https://hg.mozilla.org/integration/mozilla-inbound/rev/85211b40ba37

Keywords: checkin-needed

Ed Morley [:emorley]

Comment 18

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/78b0678417f9 https://hg.mozilla.org/mozilla-central/rev/f2d8e0807127 https://hg.mozilla.org/mozilla-central/rev/85211b40ba37

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla20

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

12 years ago

Depends on: 820791

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

12 years ago

Depends on: 820959

Alice0775 White

Updated

•

12 years ago

Depends on: 858433

Alice0775 White

Updated

•

12 years ago

Depends on: 859172

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 843434

You need to log in before you can comment on or make changes to this bug.