Closed Bug 943268 Opened 11 years ago Closed 11 years ago

Remove nsCharsetAlias and nsCharsetConverterManager

Status:

RESOLVED FIXED

Milestone:

mozilla32

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

(Keywords: addon-compat, dev-doc-needed)

Attachments

(1 file, 12 obsolete files)

WIP, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
m-c patch, no test changes yet, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
m-c patch, no test changes yet, zap dead defines, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
c-c WIP that doesn't work, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
More m-c WIP, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
m-c patch, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch, emk: review-
WIP, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
c-c WIP that still doesn't work, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
Remove nsCharsetConverterManager and nsCharsetAlias, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
Remove nsCharsetConverterManager and nsCharsetAlias, v2, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
Remove nsCharsetConverterManager and nsCharsetAlias, accommodate Linux 32 debug, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch
Remove nsCharsetConverterManager and nsCharsetAlias addressing review comments, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch, emk: review+
Drop "Find" from FindEncodingForLabel*, 11 years ago, Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11), (deleted) patch

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Description

•

11 years ago

Most uses of nsCharsetAlias and nsCharsetConverterManager in Firefox have been replaced by the less COMtaminated and more Encoding Standard-compliant mozilla::dom::EncodingUtils. Once the remaining uses go away, nsCharsetAlias and nsCharsetConverterManager should move to comm-central (or be removed if comm-central introduces an email-oriented analog of mozilla::dom::EncodingUtils or drops support for non-Encoding Standard encodings and moves to using mozilla::dom::EncodingUtils).

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Depends on: 943270

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Depends on: 943272

Mats Palmgren (inactive)

Updated

•

11 years ago

Severity: normal → enhancement

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

No longer depends on: 943272

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 1

•

11 years ago

Attached patch WIP (obsolete) (deleted)

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 2

•

11 years ago

Attached patch m-c patch, no test changes yet (obsolete) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=0e0e1b02fe35

Attachment #8406823 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 3

•

11 years ago

Attached patch m-c patch, no test changes yet, zap dead defines (obsolete) (deleted)

Attachment #8407417 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 4

•

11 years ago

Attached patch c-c WIP that doesn't work (obsolete) (deleted)

props2arrays isn't working. It's not exactly clear to me what I'm doing wrong. Also, jar manifests are missing pending example from bug 943252.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Depends on: 943252
No longer depends on: 809347

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 5

•

11 years ago

Attached patch More m-c WIP (obsolete) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=155086ad089b

Attachment #8407443 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 6

•

11 years ago

Attached patch m-c patch (obsolete) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=4f130d9e6d3e

Attachment #8410224 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 7

•

11 years ago

Comment on attachment 8410886 [details] [diff] [review]
m-c patch

Notes:
* This removes the ability to use x-euc-tw as an nsIPlatformCharset.
* This removes the ability to have aliases for "internal" (i.e. non-Encoding Standard) encodings in nsIScriptableUConv. (I.e. in the internal mode, only Gecko-canonical names work.)
* Extension compat: nsIScriptableUConv (in the non-internal mode), nsIConverterInputStream and nsIConverterOutputStream now only accept Encoding Standard labels.
* However, for compatibility, nsIConverterInputStream and nsIConverterOutputStream give the old meaning to the label UTF-16.
* UTF-16 and ISO-8859-1 remain as special Gecko-canonical encoding names when used without Encoding Standard label resolution. (Getting rid of these special cases is out of scope for this bug, IMO.)
* Language group mappings are now only supported for Encoding Standard encodings. Thunderbird will have to find another way (e.g. synthetic lang attribute) to get the same layout font selection effects for non-Encoding Standard encodings.
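To make the stream-related notes concrete, here is a minimal sketch of how a label might be resolved for nsIConverterInputStream and nsIConverterOutputStream under the rules listed above. It is not the patch's code: `ResolveStreamLabel`, `NormalizeLabel` and the `kStreamLabels` table are hypothetical stand-ins written against `std::string`, and the table is a tiny illustrative subset of Encoding Standard labels. The sketch assumes the UTF-16 compatibility case is an exact, case-sensitive match (the hunk reviewed later in this bug uses `EqualsLiteral("UTF-16")`) and that every other label is trimmed of ASCII whitespace and lowercased before lookup.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical label table, NOT Gecko's: lowercase label -> encoding name.
static const std::unordered_map<std::string, std::string> kStreamLabels = {
    {"utf-8", "UTF-8"},
    {"utf8", "UTF-8"},
    {"latin1", "windows-1252"},
    {"iso-8859-1", "windows-1252"},
    {"utf-16", "UTF-16LE"},
};

// Trim ASCII whitespace and ASCII-lowercase the label before lookup.
std::string NormalizeLabel(const std::string& label) {
  const char* kWhitespace = " \t\n\f\r";
  const std::string::size_type first = label.find_first_not_of(kWhitespace);
  if (first == std::string::npos) {
    return std::string();
  }
  const std::string::size_type last = label.find_last_not_of(kWhitespace);
  std::string out = label.substr(first, last - first + 1);
  for (char& c : out) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
  return out;
}

// Returns true and fills |encoding| if the label is accepted.
bool ResolveStreamLabel(const std::string& label, std::string& encoding) {
  if (label == "UTF-16") {
    // Compatibility case: the exact label keeps its old meaning.
    encoding = label;
    return true;
  }
  const auto it = kStreamLabels.find(NormalizeLabel(label));
  if (it == kStreamLabels.end()) {
    return false;  // Not a label in this table, e.g. x-euc-tw.
  }
  encoding = it->second;
  return true;
}

// Usage sketch.
void DemoStreamLabels() {
  std::string encoding;
  assert(ResolveStreamLabel("UTF-16", encoding) && encoding == "UTF-16");
  assert(ResolveStreamLabel("utf-16", encoding) && encoding == "UTF-16LE");
  assert(!ResolveStreamLabel("x-euc-tw", encoding));
}
```

The point of the sketch is only that the compatibility check has to run before normal label resolution; otherwise the lowercase table entry would win.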

Attachment #8410886 - Attachment description: Fix more tests → m-c patch

Attachment #8410886 - Flags: review?(VYV03354)

Masatoshi Kimura [:emk]

Comment 8

•

11 years ago

Comment on attachment 8410886 [details] [diff] [review]
m-c patch

Review of attachment 8410886 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsDocumentEncoder.cpp
@@ +1178,5 @@
>      return NS_ERROR_NOT_INITIALIZED;
>
> +  nsAutoCString encoding;
> +  if (!EncodingUtils::FindEncodingForLabel(mCharset, encoding) ||
> +      encoding.EqualsLiteral("replacement")) {

Could you add a method or an optional parameter to indicate that the caller wants the replacement encoding rather than adding a compare everywhere?
Most callers will not want the replacement encoding.

::: intl/uconv/idl/nsIScriptableUConv.idl
@@ +78,2 @@
>     */
>    attribute boolean isInternal;

Is this attribute still needed at all?

::: intl/uconv/src/nsConverterInputStream.cpp
@@ +39,5 @@
> +    // Compat with old test cases. Unclear if any extensions really care.
> +    encoding.Assign(label);
> +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding)) {
> +    // Weird API design, but retaining for compat
> +    encoding.AssignLiteral("ISO-8859-1");

The old code used to fall back only if aCharset was null. But it would return early in that case. So I think the old fallback was already dead.

::: intl/uconv/src/nsConverterOutputStream.cpp
@@ +44,5 @@
> +    encoding.Assign(label);
> +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding) ||
> +             encoding.EqualsLiteral("replacement")) {
> +    // Weird API design, but retaining for compat
> +    encoding.AssignLiteral("ISO-8859-1");

The old code didn't have this (dead) fallback for the encoder.

::: intl/uconv/src/nsTextToSubURI.cpp
@@ +24,5 @@
>
>  NS_IMETHODIMP nsTextToSubURI::ConvertAndEscape(
>    const char *charset, const char16_t *text, char **_retval)
>  {
> +  if(nullptr == _retval) {

nit: if (!_retval) {

::: intl/uconv/tests/test_long_doc.html
@@ +28,2 @@
>
> +var decoders = [

Could you put the decoder name list in a common header instead of copy & pasting everywhere?

::: intl/uconv/ucvcn/nsISO2022CNToUnicode.cpp
@@ +13,5 @@
>    nsresult rv;
>
>    if(!mGB2312_Decoder) {
>      // creating a delegate converter (GB2312)
> +    mGB2312_Decoder = EncodingUtils::DecoderForEncoding(NS_LITERAL_CSTRING("GB2312"));

This will always fail. GB2312 is not a canonical name.

@@ +29,5 @@
>    nsresult rv;
>
>    if(!mEUCTW_Decoder) {
>      // creating a delegate converter (x-euc-tw)
> +    mEUCTW_Decoder = EncodingUtils::DecoderForEncoding(NS_LITERAL_CSTRING("x-euc-tw"));

This will always fail. x-euc-tw is no longer supported as you said.

Why don't you just remove the ISO-2022-CN decoder especially when instantiating the decoder will always fail?
Moreover, the decoder has been broken for a long time (bug 470523), but nobody cares.

::: intl/uconv/ucvja/nsJapaneseToUnicode.cpp
@@ +763,5 @@
>          } else {
>            if (!mGB2312Decoder) {
>              // creating a delegate converter (GB2312)
> +            mGB2312Decoder =
> +              EncodingUtils::DecoderForEncoding(NS_LITERAL_CSTRING("GB2312"));

"gbk" until we decide to remove this in bug 996599.

::: intl/unicharutil/src/nsSaveAsCharset.cpp
@@ +327,5 @@
>  nsresult nsSaveAsCharset::SetupUnicodeEncoder(const char* charset)
>  {
>    NS_ENSURE_ARG(charset);
> +  nsDependentCString encoding(charset);
> +  mEncoder = EncodingUtils::EncoderForEncoding(encoding);

|charset| can have any random value because nsSaveAsCharset::Init is scriptable.

::: js/xpconnect/src/XPCLocale.cpp
@@ +192,5 @@
>        if (NS_SUCCEEDED(rv)) {
>          nsAutoCString charset;
>          rv = platformCharset->GetDefaultCharsetForLocale(localeStr, charset);
>          if (NS_SUCCEEDED(rv)) {
> +          mDecoder = EncodingUtils::DecoderForEncoding(charset);

Are you sure GetDefaultCharsetForLocale will always return Gecko-canonical name on UNIX?

::: layout/base/tests/test_bug399284.html
@@ -73,5 @@
>
> -function lastTest(frame)
> -{
> -  testFontSize(frame);
> -  SimpleTest.finish();

Isn't SimpleTest.finish(); needed?

::: netwerk/base/src/nsStandardURL.cpp
@@ +211,5 @@
>  bool nsStandardURL::
>  nsSegmentEncoder::InitUnicodeEncoder()
>  {
>      NS_ASSERTION(!mEncoder, "Don't call this if we have an encoder already!");
> +    mEncoder = EncodingUtils::EncoderForEncoding(mCharset);

If I read the code correctly, mCharset will be taken from mOriginCharset and mOriginCharset may be taken from the scriptable source.

::: netwerk/test/unit/test_gre_resources.js
@@ -21,5 @@
> -  }
> -}
> -
> -function run_test() {
> -  for each(let file in ["charsetData.properties"])

I think we should change the file to another file under resource://gre-resources/ rather than removing the entire test.

Attachment #8410886 - Flags: review?(VYV03354) → review-

Masatoshi Kimura [:emk]

Comment 9

•

11 years ago

> This will always fail. GB2312 is not a canonical name.

Oh, I had mistakenly assumed that DecoderForEncoding only supports Encoding Standard encodings. Please disregard this comment (and a few following comments).

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 10

•

11 years ago

(In reply to Masatoshi Kimura [:emk] from comment #8)
> ::: content/base/src/nsDocumentEncoder.cpp
> @@ +1178,5 @@
> >      return NS_ERROR_NOT_INITIALIZED;
> >
> > +  nsAutoCString encoding;
> > +  if (!EncodingUtils::FindEncodingForLabel(mCharset, encoding) ||
> > +      encoding.EqualsLiteral("replacement")) {
>
> Could you add a method or an optional parameter to indicate that the caller
> wants the replacement encoding rather than adding a compare everywhere?
> Most callers will not want the replacement encoding.

OK.

> ::: intl/uconv/idl/nsIScriptableUConv.idl
> @@ +78,2 @@
> >     */
> >    attribute boolean isInternal;
>
> Is this attribute still needed at all?

It is still used for unit tests.

> ::: intl/uconv/src/nsConverterInputStream.cpp
> @@ +39,5 @@
> > +    // Compat with old test cases. Unclear if any extensions really care.
> > +    encoding.Assign(label);
> > +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding)) {
> > +    // Weird API design, but retaining for compat
> > +    encoding.AssignLiteral("ISO-8859-1");
>
> The old code used to fall back only if aCharset was null. But it would return
> early in that case. So I think the old fallback was already dead.

Good point. Yet, a unit test assumed that bogus input should result in decoding as ISO-8859-1 (dunno by what mechanism now!), so I think this code needs to stay, but I'll push to try and see what happens if I remove this.

> ::: intl/uconv/src/nsConverterOutputStream.cpp
> @@ +44,5 @@
> > +    encoding.Assign(label);
> > +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding) ||
> > +             encoding.EqualsLiteral("replacement")) {
> > +    // Weird API design, but retaining for compat
> > +    encoding.AssignLiteral("ISO-8859-1");
>
> The old code didn't have this (dead) fallback for the encoder.

I'll see what happens if I push to try without this.

> ::: intl/uconv/src/nsTextToSubURI.cpp
> @@ +24,5 @@
> >
> >  NS_IMETHODIMP nsTextToSubURI::ConvertAndEscape(
> >    const char *charset, const char16_t *text, char **_retval)
> >  {
> > +  if(nullptr == _retval) {
>
> nit: if (!_retval) {

OK.

> ::: intl/uconv/tests/test_long_doc.html
> @@ +28,2 @@
> >
> > +var decoders = [
>
> Could you put the decoder name list in a common header instead of copy &
> pasting everywhere?

I'd rather not. The list is not actually the same everywhere, since sometimes UTF-16 is on the list and sometimes not. Also, I don't know how to include code cleanly into xpcshell tests.

> Why don't you just remove the ISO-2022-CN decoder especially when
> instantiating the decoder will always fail?

I'll remove the decoders in a later patch. For canonical names, your comment 9 is correct.

> ::: intl/unicharutil/src/nsSaveAsCharset.cpp
> @@ +327,5 @@
> >  nsresult nsSaveAsCharset::SetupUnicodeEncoder(const char* charset)
> >  {
> >    NS_ENSURE_ARG(charset);
> > +  nsDependentCString encoding(charset);
> > +  mEncoder = EncodingUtils::EncoderForEncoding(encoding);
>
> |charset| can have any random value because nsSaveAsCharset::Init is
> scriptable.

:-( I audited all the in-tree callers. Re-resolving labels would break form submission when a form is inserted into a replacement-encoded document using a script from the frame parent.

I suggest we just break extensions that pass a non-canonical name.

> ::: js/xpconnect/src/XPCLocale.cpp
> @@ +192,5 @@
> >        if (NS_SUCCEEDED(rv)) {
> >          nsAutoCString charset;
> >          rv = platformCharset->GetDefaultCharsetForLocale(localeStr, charset);
> >          if (NS_SUCCEEDED(rv)) {
> > +          mDecoder = EncodingUtils::DecoderForEncoding(charset);
>
> Are you sure GetDefaultCharsetForLocale will always return Gecko-canonical
> name on UNIX?

Yes. See VerifyCharset() in nsUNIXCharset.cpp.

> ::: layout/base/tests/test_bug399284.html
> @@ -73,5 @@
> >
> > -function lastTest(frame)
> > -{
> > -  testFontSize(frame);
> > -  SimpleTest.finish();
>
> Isn't SimpleTest.finish(); needed?

It indeed is not needed and is actively harmful. I have no idea how this test managed not to fail before.

> ::: netwerk/base/src/nsStandardURL.cpp
> @@ +211,5 @@
> >  bool nsStandardURL::
> >  nsSegmentEncoder::InitUnicodeEncoder()
> >  {
> >      NS_ASSERTION(!mEncoder, "Don't call this if we have an encoder already!");
> > +    mEncoder = EncodingUtils::EncoderForEncoding(mCharset);
>
> If I read the code correctly, mCharset will be taken from mOriginCharset and
> mOriginCharset may be taken from the scriptable source.

:-( I will investigate.

> ::: netwerk/test/unit/test_gre_resources.js
> @@ -21,5 @@
> > -  }
> > -}
> > -
> > -function run_test() {
> > -  for each(let file in ["charsetData.properties"])
>
> I think we should change the file to another file under
> resource://gre-resources/ rather than removing the entire test.

OK.

Thanks.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 11

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #10)
> > ::: intl/unicharutil/src/nsSaveAsCharset.cpp
> > @@ +327,5 @@
> > >  nsresult nsSaveAsCharset::SetupUnicodeEncoder(const char* charset)
> > >  {
> > >    NS_ENSURE_ARG(charset);
> > > +  nsDependentCString encoding(charset);
> > > +  mEncoder = EncodingUtils::EncoderForEncoding(encoding);
> >
> > |charset| can have any random value because nsSaveAsCharset::Init is
> > scriptable.
>
> :-( I audited all the in-tree callers. Re-resolving labels would break form
> submission when a form is inserted into a replacement-encoded document using
> a script from the frame parent.
>
> I suggest we just break extensions that pass a non-canonical name.

Never mind. I'll just check for "replacement" first.
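The reasoning behind "check for replacement first" can be illustrated with a small sketch. It assumes, as the quoted comment implies, that canonical names passed by in-tree callers survive being resolved again as labels, with the single exception of the name replacement, which is assumed not to be a label for itself here. The thread does not say what the code does once it has detected replacement, so the sketch only reports that case and leaves the policy to the caller. `ResolveScriptableCharset`, `CharsetKind` and the `kSaveAsLabels` table are hypothetical names, not Gecko code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Outcome of classifying a scriptable |charset| argument.
enum class CharsetKind { Replacement, Known, Unknown };

struct ResolvedCharset {
  CharsetKind kind;
  std::string encoding;  // Canonical name; empty unless kind == Known.
};

// Hypothetical label table (lowercase labels, illustrative subset).
// "replacement" is deliberately absent, matching the assumption above.
static const std::unordered_map<std::string, std::string> kSaveAsLabels = {
    {"utf-8", "UTF-8"},
    {"windows-1252", "windows-1252"},
    {"latin1", "windows-1252"},
    {"shift_jis", "Shift_JIS"},
};

static std::string AsciiLower(std::string text) {
  for (char& c : text) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
  return text;
}

ResolvedCharset ResolveScriptableCharset(const std::string& charset) {
  // Handle the one canonical name that label resolution would reject.
  if (charset == "replacement") {
    return {CharsetKind::Replacement, std::string()};
  }
  // Everything else is treated as an untrusted label and re-resolved.
  const auto it = kSaveAsLabels.find(AsciiLower(charset));
  if (it == kSaveAsLabels.end()) {
    return {CharsetKind::Unknown, std::string()};
  }
  return {CharsetKind::Known, it->second};
}

// Usage sketch.
void DemoScriptableCharset() {
  assert(ResolveScriptableCharset("Shift_JIS").encoding == "Shift_JIS");
  assert(ResolveScriptableCharset("replacement").kind ==
         CharsetKind::Replacement);
  assert(ResolveScriptableCharset("x-bogus").kind == CharsetKind::Unknown);
}
```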

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 12

•

11 years ago

Attached patch WIP (obsolete) (deleted)

Attachment #8410886 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 13

•

11 years ago

Attached patch c-c WIP that still doesn't work (obsolete) (deleted)

Any ideas why the Makefile.in fails to cause charsetalias.properties.h to be generated?

Attachment #8407444 - Attachment is obsolete: true

Flags: needinfo?(neil)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 14

•

11 years ago

Adding /mozilla right after $(topsrcdir) for props2arrays.py doesn't help.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 15

•

11 years ago

charsetData.properties removal notes:
* Move knowledge of multibyteness to nsMsgI18N.cpp: https://mxr.mozilla.org/comm-central/search?string=isMultibyte
* Remove .notForOutgoing: https://mxr.mozilla.org/comm-central/search?string=notForOutgoing
* The m-c patch moves lang groups elsewhere. Probably not worthwhile to get the lang group right for mail-only encodings like ISO-2022-CN.
* Hard-code list of "internal" encodings. Soon, charsetTitles.properties will only be used for the compose window title: https://mxr.mozilla.org/comm-central/source/mail/components/compose/content/MsgComposeCommands.js#2441

Joshua Cranmer [:jcranmer]

Comment 16

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #15)
> charsetData.properties removal notes:
>
> * Move knowledge of multibyteness to nsMsgI18N.cpp:
> https://mxr.mozilla.org/comm-central/search?string=isMultibyte

The biggest use of that knowledge is the selection for RFC 2047 encoding, which is being killed off by bug 790855 anyways. The other uses of IsMultibyte look vaguely wrong ("oh, we don't have to worry about emitting charset because it's base64 encoded!").

> * Remove .notForOutgoing:
> https://mxr.mozilla.org/comm-central/search?string=notForOutgoing

With the removal of nsCharsetMenu, this is basically unused?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 17

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=75f0862b27f6

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 18

•

11 years ago

Attached patch Remove nsCharsetConverterManager and nsCharsetAlias (obsolete) (deleted)

Attachment #8411697 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 19

•

11 years ago

Let's open another bug for the c-c reaction to make it clearer what needs landing where and what the bug status is.

Summary: Move nsCharsetAlias and nsCharsetConverterManager to comm-central → Remove nsCharsetAlias and nsCharsetConverterManager

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 20

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=79a41f650d9e

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Attachment #8411717 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 21

•

11 years ago

Attached patch Remove nsCharsetConverterManager and nsCharsetAlias, v2 (obsolete) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=e17b6548e919

Attachment #8417330 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 22

•

11 years ago

Attached patch Remove nsCharsetConverterManager and nsCharsetAlias, accommodate Linux 32 debug (obsolete) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=a80517f87757

Attachment #8417385 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Attachment #8417967 - Flags: review?(VYV03354)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Flags: needinfo?(neil)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Blocks: 1006498

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 23

•

11 years ago

Note that potentially odd-looking #include additions are there to make things work when we withdraw code from UNIFIED_SOURCES so following sources can't rely on previous #includes.

neil@parkwaycc.co.uk

Comment 24

•

11 years ago

(In reply to Henri Sivonen from comment #23)
> Note that potentially odd-looking #include additions are there to make
> things work when we withdraw code from UNIFIED_SOURCES so following sources
> can't rely on previous #includes.

How is the original code able to build in unified disabled mode?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 25

•

11 years ago

(In reply to neil@parkwaycc.co.uk from comment #24)
> (In reply to Henri Sivonen from comment #23)
> > Note that potentially odd-looking #include additions are there to make
> > things work when we withdraw code from UNIFIED_SOURCES so following sources
> > can't rely on previous #includes.
>
> How is the original code able to build in unified disabled mode?

I don't know. I just kept adding #includes until stuff worked again.

Masatoshi Kimura [:emk]

Comment 26

•

11 years ago

Comment on attachment 8417967 [details] [diff] [review]
Remove nsCharsetConverterManager and nsCharsetAlias, accommodate Linux 32 debug

Review of attachment 8417967 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsDocumentEncoder.cpp
@@ +1195,5 @@
>    if (!mDocument)
>      return NS_ERROR_NOT_INITIALIZED;
>
> +  nsAutoCString encoding;
> +  if (!EncodingUtils::FindEncodingForLabelNoReplacement(mCharset, encoding)) {

I prefer to make this function name shorter because fewer callers care about the replacement encoding.
How about renaming the current FindEncodingForLabel to FindEncodingForLabelWithReplacement and renaming FindEncodingForLabelNoReplacement to FindEncodingForLabel?

::: intl/uconv/src/nsConverterInputStream.cpp
@@ +37,5 @@
> +  nsAutoCString encoding;
> +  if (label.EqualsLiteral("UTF-16")) {
> +    // Compat with old test cases. Unclear if any extensions really care.
> +    encoding.Assign(label);
> +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding)) {

This change will shrink the supported encodings anyway, so I think FindEncodingForLabelNoReplacement (if not renamed) would be better.

::: intl/uconv/src/nsTextToSubURI.cpp
@@ +24,5 @@
>
>  NS_IMETHODIMP nsTextToSubURI::ConvertAndEscape(
>    const char *charset, const char16_t *text, char **_retval)
>  {
> +  if(!_retval) {

Uber nit: space between "if" and "(". (I know some existing code didn't follow the convention, but this should be consistent with other added code here.)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 27

•

11 years ago

(In reply to Masatoshi Kimura [:emk] from comment #26)
> I prefer to make this function name shorter because fewer callers care about
> the replacement encoding.
> How about renaming the current FindEncodingForLabel to
> FindEncodingForLabelWithReplacement and renaming
> FindEncodingForLabelNoReplacement to FindEncodingForLabel?

I think flipping the names around so that the version that never returns "replacement" is the one without modifiers in the name would be bad, because this code should reflect the spec in an obvious way: "get an encoding" in the spec can return "replacement", and treating a returned "replacement" as if the label had been unknown is treated as a modified case by the spec.

Also, we should make sure that people who aren't familiar with the issues pick the one that can return "replacement", since that's less likely to lead to a security bug (though that's mainly a problem in the HTML parser and hopefully people patching the HTML parser are familiar with the issues...).

If we need to make these names shorter, I'd prefer to drop "Find" from the start to make them: EncodingForLabel() and EncodingForLabelNoReplacement().

> ::: intl/uconv/src/nsConverterInputStream.cpp
> @@ +37,5 @@
> > +  nsAutoCString encoding;
> > +  if (label.EqualsLiteral("UTF-16")) {
> > +    // Compat with old test cases. Unclear if any extensions really care.
> > +    encoding.Assign(label);
> > +  } else if (!EncodingUtils::FindEncodingForLabel(label, encoding)) {
>
> This change will shrink the supported encodings anyway, so I think
> FindEncodingForLabelNoReplacement (if not renamed) would be better.

OK.

> ::: intl/uconv/src/nsTextToSubURI.cpp
> @@ +24,5 @@
> >
> >  NS_IMETHODIMP nsTextToSubURI::ConvertAndEscape(
> >    const char *charset, const char16_t *text, char **_retval)
> >  {
> > +  if(!_retval) {
>
> Uber nit: space between "if" and "(". (I know some existing code didn't
> follow the convention, but this should be consistent with other added code
> here.)

OK.
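The naming argument in this comment is easier to follow with the two lookups side by side. The following is a rough sketch only: it borrows the proposed names `EncodingForLabel` and `EncodingForLabelNoReplacement` and the bool-plus-out-parameter shape visible in the reviewed hunks, but it operates on `std::string` with a hypothetical table (`kLabelToEncoding`) whose keys are assumed to be already trimmed and lowercased. It is not the EncodingUtils implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical label table, NOT Gecko's (illustrative subset).
static const std::unordered_map<std::string, std::string> kLabelToEncoding = {
    {"utf-8", "UTF-8"},
    {"latin1", "windows-1252"},
    {"iso-2022-kr", "replacement"},
    {"csiso2022kr", "replacement"},
};

// Plain lookup, mirroring the spec's "get an encoding": the result may be
// "replacement".
bool EncodingForLabel(const std::string& label, std::string& encoding) {
  const auto it = kLabelToEncoding.find(label);
  if (it == kLabelToEncoding.end()) {
    return false;
  }
  encoding = it->second;
  return true;
}

// Modified lookup: a label that maps to "replacement" counts as unknown.
bool EncodingForLabelNoReplacement(const std::string& label,
                                   std::string& encoding) {
  std::string candidate;
  if (!EncodingForLabel(label, candidate) || candidate == "replacement") {
    return false;  // |encoding| is left untouched on failure.
  }
  encoding = candidate;
  return true;
}

// Usage sketch.
void DemoReplacementLookup() {
  std::string encoding;
  assert(EncodingForLabel("iso-2022-kr", encoding) &&
         encoding == "replacement");
  assert(!EncodingForLabelNoReplacement("iso-2022-kr", encoding));
  assert(EncodingForLabelNoReplacement("latin1", encoding) &&
         encoding == "windows-1252");
}
```

Layering the restricted variant on top of the plain one keeps the unmodified name attached to the behavior the spec describes, which is the ordering argued for above.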

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 28

•

11 years ago

Attached patch Remove nsCharsetConverterManager and nsCharsetAlias addressing review comments (deleted)

This patch doesn't rename the methods yet.

Attachment #8417967 - Attachment is obsolete: true

Attachment #8417967 - Flags: review?(VYV03354)

Attachment #8418706 - Flags: review?(VYV03354)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 29

•

11 years ago

Attached patch Drop "Find" from FindEncodingForLabel* (obsolete) (deleted)

Attachment #8418707 - Flags: review?(VYV03354)

Masatoshi Kimura [:emk]

Comment 30

•

11 years ago

Comment on attachment 8418706 [details] [diff] [review]
Remove nsCharsetConverterManager and nsCharsetAlias addressing review comments

Let's land this part now. I'll consider the naming a little more.

Attachment #8418706 - Flags: review?(VYV03354) → review+

Masatoshi Kimura [:emk]

Updated

•

11 years ago

Keywords: leave-open

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 31

•

11 years ago

(In reply to Masatoshi Kimura [:emk] from comment #30)
> Let's land this part now. I'll consider the naming a little more.

Thank you! Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/15680e55195c

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Blocks: 1007581

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 32

•

11 years ago

(In reply to Masatoshi Kimura [:emk] from comment #30)
> I'll consider the naming a little more.

In order to make it clear what got fixed here, let's do that with a new bug number: bug 1007581.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Attachment #8418707 - Attachment is obsolete: true

Attachment #8418707 - Flags: review?(VYV03354)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

11 years ago

Keywords: leave-open

Ryan VanderMeulen [:RyanVM]

Comment 33

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/15680e55195c

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

11 years ago

Depends on: 1008077

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 34

•

11 years ago

Notes for the documentation team:

This patch doesn't change anything for Web developers. The encodings being taken away from add-on developers had already been taken away from Web developers.

This patch makes character encodings that are not part of the Encoding Standard (http://encoding.spec.whatwg.org/) unavailable to Firefox add-ons, except in the case where you set isInternal to true on nsIScriptableUnicodeConverter, which add-on developers really shouldn't be doing.

Additionally, for end users, this de-supports Unix systems whose locale encoding is EUC-TW somewhat more than previously. This probably isn't worth documenting, though, since it looks like our support for non-UTF-8 Unix systems is semi-broken already.

Keywords: addon-compat, dev-doc-needed

Joshua Cranmer [:jcranmer]

Comment 35

•

11 years ago

As this patch is currently designed, there is no mechanism by which Thunderbird can expose its non-Encoding spec based charsets to JavaScript.

I would like to see this patch backed out and not relanded until this mechanism is added.

Chris

Comment 36

•

11 years ago

This is also breaking a FFOS extension. Current code was using charsetConverterManager to get an encoder for Shift-JIS encoding.

The EncodingUtils.h used in this patch depends on internal headers that can't be used outside of internal gecko code.

Either this patch should be backed out or please provide a public mechanism to get an encoder for any publicly supported encoding.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 37

•

11 years ago

(In reply to Joshua Cranmer [:jcranmer] from comment #35)
> As this patch is currently designed, there is no mechanism by which
> Thunderbird can expose its non-Encoding spec based charsets to JavaScript.
>
> I would like to see this patch backed out and not relanded until this
> mechanism is added.

I deliberately left in such a mechanism: isInternal on nsIScriptableUConv.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 38

•

11 years ago

(In reply to Chris from comment #36)
> This is also breaking a FFOS extension. Current code was using
> charsetConverterManager to get an encoder for Shift-JIS encoding.
>
> The EncodingUtils.h used in this patch depends on internal headers that
> can't be used outside of internal gecko code.
>
> Either this patch should be backed out or please provide a public mechanism to
> get an encoder for any publicly supported encoding.

Is there a reason why any of the following don't work:
TextDecoder
nsIScriptableUConv
nsIConverterInputStream
?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 39

•

11 years ago

(In reply to Chris from comment #36)
> This is also breaking a FFOS extension.

Can you please provide a link to the code that broke so that I can better assist in fixing it?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 40

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #38)
> (In reply to Chris from comment #36)
> > This is also breaking a FFOS extension. Current code was using
> > charsetConverterManager to get an encoder for Shift-JIS encoding.
> >
> > The EncodingUtils.h used in this patch depends on internal headers that
> > can't be used outside of internal gecko code.
> >
> > Either this patch should be backed out or please provide a public mechanism to
> > get an encoder for any publicly supported encoding.
>
> Is there a reason why any of the following don't work:
> TextDecoder
> nsIScriptableUConv
> nsIConverterInputStream
> ?

Oops. I misread the part where you say *en*coder. Why do you need to encode anything other than UTF-8?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 41

•

11 years ago

Also: EncodingUtils::EncoderForEncoding hides XPCOM cruft. If you need XPCOM cruft, you can use do_CreateInstance with the contract id for the Shift_JIS encoder.

Joshua Cranmer [:jcranmer]

Comment 42

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #37)
> (In reply to Joshua Cranmer [:jcranmer] from comment #35)
> > As this patch is currently designed, there is no mechanism by which
> > Thunderbird can expose its non-Encoding spec based charsets to JavaScript.
> >
> > I would like to see this patch backed out and not relanded until this
> > mechanism is added.
>
> I deliberately left in such a mechanism: isInternal on nsIScriptableUConv.

That mechanism is *broken*. The code in question in the tests does this:

  this._encoder = Cc[""].createInstance();
  this._encoder.isInternal = true;
  this._encoder.charset = label;

And this results in an assertion failure: encoder (Tried to create encoder for uknown encoding.), at /src/trunk/comm-central/mozilla/dom/encoding/EncodingUtils.cpp.

Actually, looking at one of the tests in more detail, it asserts instead of throwing an error on an unknown charset.

Joshua Cranmer [:jcranmer]

Comment 43

•

11 years ago

It also appears unable to do any custom alias management. Crashing a program because a user-specified (potentially network-derived) charset was not found IS NOT A GOOD IDEA. Why are you not throwing an error instead?
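For comparison, the standard TextDecoder constructor reports an unsupported label by throwing a RangeError, which the caller can catch. A minimal sketch of that pattern, assuming a runtime with a global TextDecoder; `decoderForLabel` is a hypothetical wrapper, not an existing Gecko function:

```javascript
// Hypothetical wrapper: return a decoder for an untrusted label,
// or null when the label is not recognized, instead of aborting.
function decoderForLabel(label) {
  try {
    return new TextDecoder(label);
  } catch (e) {
    if (e instanceof RangeError) {
      return null; // unknown label: recoverable, let the caller fall back
    }
    throw e; // anything else is unexpected, so rethrow
  }
}

console.log(decoderForLabel("utf8") !== null);     // known label
console.log(decoderForLabel("no-such-charset"));   // unknown label
```

A caller handling mail or network input could then fall back to a default charset when the result is null.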

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 44

•

11 years ago

EncodingUtils has two *distinct* concepts that are represented as nsACStrings: labels and encodings. A label is a string you get from the network. An encoding is a symbol that denotes the concept of a particular encoding.

DecoderForEncoding and EncoderForEncoding take encodings. You must never treat a string obtained from the network as an encoding. DecoderForEncoding asserts, as you've noticed, if you break this rule.

You must always treat network-originating strings as labels. Use FindEncodingForLabel to go from a label to an encoding in a Web-oriented way. Use nsCharsetAlias to do the same in an email-oriented way.

When isInternal is true, nsIScriptableUConv takes an encoding instead of taking a label. This works as evidenced by tests in m-c.

There's a bug about making the encoding concept not be represented by an nsACString. That bug will probably never get fixed, though.
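The label/encoding split can be sketched in JavaScript as a two-step lookup. This is an illustration only, not the Gecko implementation: the function names mirror the C++ ones mentioned above, the table is a small hand-picked subset of Encoding Standard labels, and a thrown Error stands in for the assertion.

```javascript
// Abridged, illustrative label table (assumption: the real table is far larger).
const LABEL_TO_ENCODING = new Map([
  ["utf-8", "UTF-8"], ["utf8", "UTF-8"],
  ["latin1", "windows-1252"], ["iso-8859-1", "windows-1252"],
  ["shift_jis", "Shift_JIS"], ["sjis", "Shift_JIS"], ["x-sjis", "Shift_JIS"],
]);
const KNOWN_ENCODINGS = new Set(LABEL_TO_ENCODING.values());

// Step 1: untrusted label -> encoding name, or null if unrecognized.
function findEncodingForLabel(label) {
  // Strip ASCII whitespace and compare case-insensitively.
  const key = label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "").toLowerCase();
  return LABEL_TO_ENCODING.has(key) ? LABEL_TO_ENCODING.get(key) : null;
}

// Step 2: accepts only encoding names; passing a raw label is a caller bug.
function encoderForEncoding(encoding) {
  if (!KNOWN_ENCODINGS.has(encoding)) {
    throw new Error("Not an encoding name: " + encoding);
  }
  return { encoding }; // placeholder for a real encoder object
}

const encoding = findEncodingForLabel(" SJIS ");
if (encoding !== null) {
  console.log(encoderForEncoding(encoding).encoding);
}
```

The point of the split is that only the first step ever sees untrusted input, so the second step can treat an unknown name as a programming error.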

Joshua Cranmer [:jcranmer]

Comment 45

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #44)
> When isInternal is true, nsIScriptableUConv takes an encoding instead of
> taking a label. This works as evidenced by tests in m-c.

I can tell you right off the bat that all the uses of nsIScriptableUConv (particularly in extensions) I've looked at that set internal = true do not follow this rule. Even the tests that you ported to comm-central don't follow this rule. It is extremely hostile to add-ons to make this kind of major change.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 46

•

11 years ago

Why are extensions using an API with "internal" in its name? But OK. Let's make isInternal less hostile to extensions by making it accept Web labels as well as internal Gecko encodings as a follow-up.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 47

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #46)
> Let's
> make isInternal less hostile to extensions by making it accept Web labels as
> well as internal Gecko encodings as a follow-up.

Bug 1008832.

Depends on: 1008832

Chris

Comment 48

•

11 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #38)
> (In reply to Chris from comment #36)
> > This is also breaking a FFOS extension. Current code was using
> > charsetConverterManager to get an encoder for Shift-JIS encoding.
> >
> > The EncodingUtils.h used in this patch depends on internal headers that
> > can't be used outside of internal gecko code.
> >
> > Either this patch should be backed out or please provide a public mechanism to
> > get an encoder for any publicly supported encoding.
>
> Is there a reason why any of the following don't work:
> TextDecoder
> nsIScriptableUConv
> nsIConverterInputStream
> ?

Thanks for the feedback. I've reworked some of our extension code to leverage nsIScriptableUConv. We should be ok now, unless otherwise noted.

Simon Montagu :smontagu

Updated

•

9 years ago

Depends on: 1171006
