Closed Bug 200049 Opened 22 years ago Closed 22 years ago

[minimo] land uconv build changes.

Categories

(Core :: Internationalization, defect)

x86
Windows XP
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.4beta

People

(Reporter: dougt, Assigned: dougt)

References

Details

Attachments

(1 file)

land the changes that allow [optional] native unicode converters.
Severity: normal → major
Target Milestone: --- → mozilla1.4beta
Attached patch patch v.1 (deleted) — Splinter Review
This patch was on the minimo branch as was going to land. However I didn't land it until i figured out why accessing the charset manager from js was busted. Last night, I found where i did something really stoopid - i used the same module name (in the Makefile.in) for the native uncov as was used for uconv. This mistake caused the uconv typelib built in uconv/idl to be wacked by the typelib generated in uconv/native. (maybe i should move the native uconv idl into uconv/idl?) note that the bulk of the change is just reordering the module code to reduce the number of #ifdef's i needed to toggle this native uconv support.
Comment on attachment 119111 [details] [diff] [review] patch v.1 I did speak to ftang regarding these changes and he did not object.
Attachment #119111 - Flags: superreview?(alecf)
Attachment #119111 - Flags: review?(darin)
Comment on attachment 119111 [details] [diff] [review] patch v.1 >Index: src/Makefile.in >+ mozreg \ > xpcom_obsolete \ nit: use consistent indentation. >Index: src/nsCharsetConverterManager.cpp >+ mNativeUC = do_GetService(NS_NATIVE_UCONV_SERVICE_CONTRACT_ID); > } do we really want to create this service in all cases? why isn't this #ifdef'd? i know this will fail, but why bother calling GetService? do you have a configure.in option for this? or is that coming later? r=darin (i think it would still be good to get a review from someone on the intl team)
Attachment #119111 - Flags: review?(darin) → review+
Comment on attachment 119111 [details] [diff] [review] patch v.1 looks great! sr=alecf
Attachment #119111 - Flags: superreview?(alecf) → superreview+
darin - i fixed up your concerns regarding the patch -- we do have a configure option for this experimental native uconv support (--enable-native-uconv) so there is no reason not do disable this check completely when not using the native unicode converter. Checking in Makefile.in; /cvsroot/mozilla/intl/uconv/Makefile.in,v <-- Makefile.in new revision: 1.16; previous revision: 1.15 done Checking in native/Makefile.in; /cvsroot/mozilla/intl/uconv/native/Makefile.in,v <-- Makefile.in new revision: 1.3; previous revision: 1.2 done Checking in src/Makefile.in; /cvsroot/mozilla/intl/uconv/src/Makefile.in,v <-- Makefile.in new revision: 1.70; previous revision: 1.69 done Checking in src/nsCharsetConverterManager.cpp; /cvsroot/mozilla/intl/uconv/src/nsCharsetConverterManager.cpp,v <-- nsCharsetConverterManager.cpp new revision: 1.97; previous revision: 1.96 done Checking in src/nsCharsetConverterManager.h; /cvsroot/mozilla/intl/uconv/src/nsCharsetConverterManager.h,v <-- nsCharsetConverterManager.h new revision: 1.8; previous revision: 1.7 done Checking in src/nsUConvModule.cpp; /cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v <-- nsUConvModule.cpp new revision: 1.44; previous revision: 1.43 done Marking Fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
+ mNativeUC->GetNativeConverter("UCS-2", + NS_LossyConvertUCS2toASCII(*aDest).get(), + getter_AddRefs(encoder)); UCS-2 in the above should be UTF-16 (native) or equivalent name used by iconv(3) on the target platform (well, iconv(3) detected at compile time can be different from iconv(3) used at runtime....) What Mozilla uses internally(PRUnichar *) is not UCS-2 but UTF-16 (native). The same is true of a similar line for the decoder and a couple of comments that mention 'ucs-2'. In addition, I'm wondering how non-standardized codeset names used by different implemnetations of iconv(3) on different platforms are reconsiled with the standard charset names used by Mozilla. Am I missing something? If everyone uses Bruno Haible's libiconv or glibc, there'd be no problem. See http://www.ctan.org/tex-archive/macros/texinfo/texinfo/intl/config.charset for the list. (The list is maintained by Bruno Haible and there may be an updated list somewhere else. The above url happened to be at the top of google search.). We may need 'inverse charsetAlias lookup mechanism' to map Mozilla's internal charset/codeset name to that used by iconv(3).
Hi Jungshik, Thank you for your comment. I believe that the line that you mention is from nsICharsetConverterManager::GetUnicodeEncoder. This method is expect to return an interface that can "encode" some given charset into UCS-2. Or at least that is what the documentation of this interface suggests: http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsICharsetConverterManager.h#193 I should note that conversions seam to work fine. However, as mentioned that these changes are still experimental and there could be major bugs. wrt reconsiled the charset names used by Mozilla -- if iconv doesn't support the charset name we fail. This is by design. Please note that this is not intended to be a replacemewnt for uconv on the desktop. This is intended for small devices which can not afford duplicate unicode converters (one in iconv and one in mozilla).
Thank you Doug for your reply. As you found, UCS-2 is ubiquitious in Mozilla code, but most, if not all, of them refers to UTF-16. Fortunately, most occurrence of UCS-2 are in function names, comments and documents so that there's no impact on the actual code. Moreover, functions with UCS2 in their names(e.g. NS_ConvertUCS2toUTF8) correctly implement UTF-16. This is confusing and we need to do tree-wide clean-up sometime (there's a bug I filed for it). Now, your code used 'UCS-2' in the actual code and that breaks things if source/target includes characters beyond BMP(Baisc Multilingual Plane). Please, note that UCS-2 and UTF-16 are exactly the same as long as only characters in BMP ( Unicode codepoint below 0x10000) are involved. Characters beyond that cannot be represented in UCS-2 and iconv(3) would return an error when thrown those characters. (for instance, UTF-8 decoder would return an error if input includes non-BMP characters.) > wrt reconsiled the charset names used by Mozilla -- if iconv doesn't > support the charset name we fail. This is by design. I'm sorry I wasn't clear enough. What if iconv(3) does support a charset, but its name is different from that used by Mozilla? Is it still acceptable or do we want to do something more in that case? Perhaps, this is not so much an issue because most small embedded devices are likely to use libiconv by Bruno Haible.
Gotta love our documentation story :-/ and my ignorance on some of these intl issues. I appriecate your help here! I have created a new bug to track the wrong charset name being used in the native uconv decoder/encoder. See bug 201182. As far as worrying about charset aliasing -- I am not completely sure. It really depends on if the top intl site are busted. Maybe libiconv does a "good enough" job here? We really need to run these changes through a very tough test suite to determine this.
Hi Doug, pleased to help you here. I'll keep my eyes open for bug 201182. > As far as worrying about charset aliasing -- I am not completely sure. It > really depends on if the top intl site are busted. Maybe libiconv does > a "good enough" job here? We really need to run these changes through a very > tough test suite to determine this If it was not clear from my previous comment, there are two kinds of charset aliasing issues here. One is mapping what's used by html/xml documents(document authors/web masters) to what Mozilla uses internally. That's taken care of by nsICharsetAliases (or sth. like that) which I believe is used regardless of whether nativeuconv is turned on or not. Therefore, this type of aliasing is not our concern. The other is mapping Mozilla's internal charset name to that used by iconv(3). As you know by now, POSIX/Single Unix Spec. is silent on standardizing codeset names used by iconv(3) so that different implementations of iconv(3) use slightly different names. If most of our target small devices use libiconv by Bruno Haible, we have little to worry because it uses most, if not all, of Mozilla's internal names to mean more or less identical (not exactly the same) charsets. If this assumption doesn't hold and a wild variety of iconv(3) implementation are in use, we may want to add compile-time mechanism to map Mozilla's internal names to iconv's names. Because targets for this code are small devices, I think we're relieved of another potential problem, which is that run-time iconv can be diffrent from compile-time iconv. (hopefully, they don't have room for a second iconv implementation and something like LD_LIBRARY or LD_PRELOAD(?)) :-) )
Blocks: 213938
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: