Closed
Bug 200049
Opened 22 years ago
Closed 22 years ago
[minimo] land uconv build changes.
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.4beta
People
(Reporter: dougt, Assigned: dougt)
References
Details
Attachments
(1 file)
(deleted),
patch
|
darin.moz
:
review+
alecf
:
superreview+
|
Details | Diff | Splinter Review |
land the changes that allow [optional] native unicode converters.
Assignee | ||
Updated•22 years ago
|
Severity: normal → major
Target Milestone: --- → mozilla1.4beta
Assignee | ||
Comment 1•22 years ago
|
||
This patch was on the minimo branch as was going to land. However I didn't
land it until i figured out why accessing the charset manager from js was
busted. Last night, I found where i did something really stoopid - i used the
same module name (in the Makefile.in) for the native uncov as was used for
uconv. This mistake caused the uconv typelib built in uconv/idl to be wacked
by the typelib generated in uconv/native. (maybe i should move the native
uconv idl into uconv/idl?)
note that the bulk of the change is just reordering the module code to reduce
the number of #ifdef's i needed to toggle this native uconv support.
Assignee | ||
Comment 2•22 years ago
|
||
Comment on attachment 119111 [details] [diff] [review]
patch v.1
I did speak to ftang regarding these changes and he did not object.
Attachment #119111 -
Flags: superreview?(alecf)
Attachment #119111 -
Flags: review?(darin)
Comment 3•22 years ago
|
||
Comment on attachment 119111 [details] [diff] [review]
patch v.1
>Index: src/Makefile.in
>+ mozreg \
> xpcom_obsolete \
nit: use consistent indentation.
>Index: src/nsCharsetConverterManager.cpp
>+ mNativeUC = do_GetService(NS_NATIVE_UCONV_SERVICE_CONTRACT_ID);
> }
do we really want to create this service in all cases? why isn't this
#ifdef'd? i know this will fail, but why bother calling GetService?
do you have a configure.in option for this? or is that coming later?
r=darin (i think it would still be good to get a review from someone on
the intl team)
Attachment #119111 -
Flags: review?(darin) → review+
Comment 4•22 years ago
|
||
Comment on attachment 119111 [details] [diff] [review]
patch v.1
looks great!
sr=alecf
Attachment #119111 -
Flags: superreview?(alecf) → superreview+
Assignee | ||
Comment 5•22 years ago
|
||
darin - i fixed up your concerns regarding the patch -- we do have a configure
option for this experimental native uconv support (--enable-native-uconv) so
there is no reason not do disable this check completely when not using the
native unicode converter.
Checking in Makefile.in;
/cvsroot/mozilla/intl/uconv/Makefile.in,v <-- Makefile.in
new revision: 1.16; previous revision: 1.15
done
Checking in native/Makefile.in;
/cvsroot/mozilla/intl/uconv/native/Makefile.in,v <-- Makefile.in
new revision: 1.3; previous revision: 1.2
done
Checking in src/Makefile.in;
/cvsroot/mozilla/intl/uconv/src/Makefile.in,v <-- Makefile.in
new revision: 1.70; previous revision: 1.69
done
Checking in src/nsCharsetConverterManager.cpp;
/cvsroot/mozilla/intl/uconv/src/nsCharsetConverterManager.cpp,v <--
nsCharsetConverterManager.cpp
new revision: 1.97; previous revision: 1.96
done
Checking in src/nsCharsetConverterManager.h;
/cvsroot/mozilla/intl/uconv/src/nsCharsetConverterManager.h,v <--
nsCharsetConverterManager.h
new revision: 1.8; previous revision: 1.7
done
Checking in src/nsUConvModule.cpp;
/cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v <-- nsUConvModule.cpp
new revision: 1.44; previous revision: 1.43
done
Marking Fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 6•22 years ago
|
||
+ mNativeUC->GetNativeConverter("UCS-2",
+ NS_LossyConvertUCS2toASCII(*aDest).get(),
+ getter_AddRefs(encoder));
UCS-2 in the above should be UTF-16 (native) or equivalent name used
by iconv(3) on the target platform (well, iconv(3) detected at
compile time can be different from iconv(3) used at runtime....)
What Mozilla uses internally(PRUnichar *) is not UCS-2 but UTF-16 (native). The
same is true of a similar line for the decoder and a couple of comments
that mention 'ucs-2'.
In addition, I'm wondering how non-standardized codeset names used by
different implemnetations of iconv(3) on different platforms are
reconsiled with the standard charset names used by Mozilla. Am I missing
something? If everyone uses Bruno Haible's libiconv or glibc, there'd be no
problem.
See http://www.ctan.org/tex-archive/macros/texinfo/texinfo/intl/config.charset
for the list. (The list is maintained by Bruno Haible and there may
be an updated list somewhere else. The above url happened to be at the top
of google search.). We may need 'inverse charsetAlias lookup mechanism'
to map Mozilla's internal charset/codeset name to that used by
iconv(3).
Assignee | ||
Comment 7•22 years ago
|
||
Hi Jungshik,
Thank you for your comment. I believe that the line that you mention is from
nsICharsetConverterManager::GetUnicodeEncoder. This method is expect to return
an interface that can "encode" some given charset into UCS-2. Or at least that
is what the documentation of this interface suggests:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsICharsetConverterManager.h#193
I should note that conversions seam to work fine. However, as mentioned that
these changes are still experimental and there could be major bugs.
wrt reconsiled the charset names used by Mozilla -- if iconv doesn't support the
charset name we fail. This is by design.
Please note that this is not intended to be a replacemewnt for uconv on the
desktop. This is intended for small devices which can not afford duplicate
unicode converters (one in iconv and one in mozilla).
Comment 8•22 years ago
|
||
Thank you Doug for your reply.
As you found, UCS-2 is ubiquitious in Mozilla code, but most, if not all,
of them refers to UTF-16. Fortunately, most occurrence of UCS-2
are in function names, comments and documents so that there's no
impact on the actual code. Moreover, functions with UCS2 in their
names(e.g. NS_ConvertUCS2toUTF8) correctly implement UTF-16.
This is confusing and we need
to do tree-wide clean-up sometime (there's a bug I filed for it).
Now, your code used 'UCS-2' in the actual code and that breaks things
if source/target includes characters beyond BMP(Baisc Multilingual
Plane). Please, note that UCS-2 and UTF-16 are exactly the same
as long as only characters in BMP ( Unicode codepoint below 0x10000)
are involved. Characters beyond that cannot be represented in
UCS-2 and iconv(3) would return an error when thrown those characters.
(for instance, UTF-8 decoder would return an error if input
includes non-BMP characters.)
> wrt reconsiled the charset names used by Mozilla -- if iconv doesn't
> support the charset name we fail. This is by design.
I'm sorry I wasn't clear enough. What if iconv(3) does support
a charset, but its name is different from that used by Mozilla?
Is it still acceptable or do we want to do something more in that case?
Perhaps, this is not so much an issue because most small
embedded devices are likely to use libiconv by Bruno Haible.
Assignee | ||
Comment 9•22 years ago
|
||
Gotta love our documentation story :-/ and my ignorance on some of these intl
issues. I appriecate your help here! I have created a new bug to track the
wrong charset name being used in the native uconv decoder/encoder. See bug 201182.
As far as worrying about charset aliasing -- I am not completely sure. It
really depends on if the top intl site are busted. Maybe libiconv does a "good
enough" job here? We really need to run these changes through a very tough test
suite to determine this.
Comment 10•22 years ago
|
||
Hi Doug, pleased to help you here. I'll keep my eyes open
for bug 201182.
> As far as worrying about charset aliasing -- I am not completely sure. It
> really depends on if the top intl site are busted. Maybe libiconv does
> a "good enough" job here? We really need to run these changes through a very
> tough test suite to determine this
If it was not clear from my previous comment, there are two kinds
of charset aliasing issues here. One is mapping what's used by html/xml
documents(document authors/web masters) to what Mozilla uses internally.
That's taken care of by nsICharsetAliases (or sth. like that) which
I believe is used regardless of whether nativeuconv is turned on or not.
Therefore, this type of aliasing is not our concern. The other is mapping
Mozilla's internal charset name to that used by iconv(3). As you know by
now, POSIX/Single Unix Spec. is silent on standardizing codeset names
used by iconv(3) so that different implementations of iconv(3) use
slightly different names.
If most of our target small devices
use libiconv by Bruno Haible, we have little to worry because
it uses most, if not all, of Mozilla's internal names to mean
more or less identical (not exactly the same) charsets.
If this assumption doesn't hold and a wild variety
of iconv(3) implementation are in use, we may want to add
compile-time mechanism to map Mozilla's internal names to
iconv's names. Because targets for this code are small devices,
I think we're relieved of another potential problem,
which is that run-time iconv can be diffrent from compile-time iconv.
(hopefully, they don't have room for a second iconv implementation
and something like LD_LIBRARY or LD_PRELOAD(?)) :-) )
You need to log in
before you can comment on or make changes to this bug.
Description
•