Closed
Bug 127755
Opened 23 years ago
Closed 21 years ago
ISO-8859-11 (Latin/Thai) Support
Categories
(Core :: Internationalization, defect)
RESOLVED
FIXED
mozilla1.4final
People
(Reporter: arthit, Assigned: ftang)
References
(Blocks 1 open bug)
Details
(Keywords: intl)
Attachments
(5 files, 2 obsolete files)
4 × text/plain (deleted)
1 × patch (deleted): smontagu: review+, rbs: superreview+
At present (Mozilla 0.9.8 and the 20020224xx nightly build),
Mozilla supports only one Thai character encoding, TIS-620.
ISO-8859-11 is not yet supported.
ISO-8859-11
Information technology -- 8-bit single-byte coded graphic character sets --
Part 11: Latin/Thai alphabet
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=28263&ICS1=35&ICS2=40&ICS3=
Most of ISO-8859-11 is taken from TIS-620,
so you may use the TIS-620 standard as a cross-reference.
TIS-620
http://www.nectec.or.th/it-standards/std620/std620.htm
----
For testing,
a Thai-language web page that uses ISO-8859-11:
http://www.bababorbor.com/
Comment 1•23 years ago
-> i18n
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: doronr → ruixu
Updated•23 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee
Comment 3•23 years ago
Can you tell me what the difference between TIS-620 and ISO-8859-11 is?
Is the final version of ISO-8859-11 the same as what is specified at
http://www.nectec.or.th/it-standards/iso8859-11/ ?
Status: NEW → ASSIGNED
Comment 4•23 years ago
The document at
http://www.nectec.or.th/it-standards/iso8859-11/
is quite old; it is just a draft.
Comment 5•23 years ago
Does anyone here know where we can get ISO-8859-11-encoded fonts (PS
Type1, TrueType, etc.)?
Reporter
Comment 6•23 years ago
(taken from OpenOffice.org mailing list)
Theppitak <thep@links.nectec.or.th> has answered questions about
differences of TIS-620 vs ISO-8859-11
----
Q: about TIS-620 0xA0, should it be mapped to U+00A0 (NBSP)
or considered as UNASSIGNED ?
Actually, 0xA0 is _unassigned_ in TIS-620.
Although NBSP becomes well-known when TIS-620 is put in row with
encoding tables of ISO-8859 series, not every legacy system under
TIS-620 standards recognizes and handles this character. So, IMO,
it should be treated _unassigned_ as such.
But the distinction seems to become more relaxed as time goes by,
anyway. :)
Q: The differences among MS874, TIS-620, ISO-8859-11.
MS874 = TIS-620 + { NBSP, ellipsis, quote_left, quote_right,
doublequote_left, doublequote_right, bullet, en_dash, em_dash }
ISO-8859-11 = TIS-620 + { NBSP }
-Thep.
----
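The repertoire relations quoted above can be written out as nested byte-to-code-point tables. This is only an illustrative Python sketch, not Mozilla code: the table and function names are made up for the example, the Thai block uses the usual TIS-620 layout (bytes 0xA1-0xDA and 0xDF-0xFB map to U+0E01-U+0E3A and U+0E3F-U+0E5B), and the MS874 extras are exactly the nine characters listed in the quoted mail.

```python
# Illustrative tables for the high half (0x80-0xFF) only; names are hypothetical.

def thai_block():
    """TIS-620 high half: byte b maps to U+0E00 + (b - 0xA0)."""
    table = {}
    for byte in list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC)):
        table[byte] = 0x0E00 + (byte - 0xA0)
    return table

TIS_620 = thai_block()

# ISO-8859-11 = TIS-620 + { NBSP }
ISO_8859_11 = {**TIS_620, 0xA0: 0x00A0}

# MS874 = TIS-620 + { NBSP, ellipsis, quotes, bullet, dashes } as quoted above.
MS874 = {
    **ISO_8859_11,
    0x85: 0x2026,  # horizontal ellipsis
    0x91: 0x2018,  # left single quotation mark
    0x92: 0x2019,  # right single quotation mark
    0x93: 0x201C,  # left double quotation mark
    0x94: 0x201D,  # right double quotation mark
    0x95: 0x2022,  # bullet
    0x96: 0x2013,  # en dash
    0x97: 0x2014,  # em dash
}

def is_proper_subset(small, big):
    """True if every mapping in `small` is also in `big`, and `big` has more."""
    return set(small.items()) < set(big.items())

print(is_proper_subset(TIS_620, ISO_8859_11))
print(is_proper_subset(ISO_8859_11, MS874))
```

Because each table is built from the previous one, the subset chain TIS-620 < ISO-8859-11 < MS874 holds by construction; the check is only there to make the relation explicit.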
Reporter
Comment 7•23 years ago
Changed severity to MAJOR,
since we are about to migrate from TIS-620 to ISO-8859-11.
ftang:
I changed only the severity.
I understand your team's situation; the priority is still up to your team :)
Severity: normal → major
Comment 8•23 years ago
Frank,
Will this go to 1.0?
Comment 9•22 years ago
ToDo list (Unix/Linux only):
- We need an ISO-8859-11 converter
- We need to hook up entries for ISO-8859-11 in
mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp and
mozilla/gfx/src/xlib/nsFontMetricsXlib.cpp
Is this list complete?
Assignee
Comment 10•22 years ago
OK, so
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT
is MS874, right?
If I take out
0x80 0x20AC #EURO SIGN
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
then it becomes ISO-8859-11,
and if I then take out
0xA0 0x00A0 #NO-BREAK SPACE
then it becomes TIS-620?
Is that the case? You didn't mention the euro sign; I assume neither TIS-620
nor ISO-8859-11 has it.
Assignee
Comment 11•22 years ago
Assignee
Comment 12•22 years ago
Assignee
Comment 13•22 years ago
Assignee
Comment 14•22 years ago
Assignee
Comment 15•22 years ago
OK, I made the .uf and .ut files in the following way:
1. copy http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT
into intl/uconv/tools
2. in intl/uconv/tools, make umaptable
3. vi cp874.txt and remove the lines marked undefined
4. umaptable -uf < cp874.txt > cp874.uf
5. umaptable -ut < cp874.txt > cp874.ut
The results of steps 4 and 5 are identical to the current cp874.uf and cp874.ut if I
use cvs -b; the only difference is in the licensing comment section.
Now I copy cp874.txt to iso885911.txt and remove the entries I mentioned:
0x80 0x20AC #EURO SIGN
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
and run umaptable to generate iso885911.uf and .ut.
Then I copy iso885911.txt to tis620.txt, remove 0xa0, and run umaptable
to generate tis620.uf and .ut.
Roland or Masaki, if you really care to support this, you can take these .uf and
.ut files and copy the code we have for cp874 to make converters for them.
After that you need to change charsetData.properties and
charsetName.properties, and probably also navigator.properties.
alecf is changing how we create converters. He is moving away from one
class per converter to passing the table into the base constructor. I don't want to
generate anything that might conflict with his work until he finishes.
I considered subclassing the cp874 converter and adding if-code for those
characters. But after looking at the size of the tables it doesn't seem worthwhile, because
each of these tables is only 32 bytes. I don't think any if statement is less
than 32 bytes in machine code :)
Roland, do you want to take this?
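The table derivation described in this comment (start from the CP874 mapping, drop the Windows-only entries to get ISO-8859-11, then drop 0xA0 to get TIS-620) can be sketched in Python as below. This is an assumption-laden illustration: `parse_mapping`, `without`, and the short `SAMPLE` excerpt are invented for the example, and the sketch only filters the text mapping; it does not reproduce what umaptable does to produce the .uf/.ut files.

```python
# Bytes that exist only in the CP874 table, per the list in this comment.
CP874_ONLY = (0x80, 0x85, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97)

def parse_mapping(text):
    """Parse 'byte codepoint #comment' lines into {byte: codepoint}.

    Lines without a code point column (the 'undefined' entries) and
    pure comment lines are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table

def without(table, removed_bytes):
    """Return a copy of `table` with the given byte values removed."""
    return {b: cp for b, cp in table.items() if b not in removed_bytes}

# Hand-written excerpt in the same line format (not the full vendor file).
SAMPLE = """\
0x80 0x20AC #EURO SIGN
0x81        #UNDEFINED
0x85 0x2026 #HORIZONTAL ELLIPSIS
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x0E01 #THAI CHARACTER KO KAI
"""

cp874 = parse_mapping(SAMPLE)
iso885911 = without(cp874, CP874_ONLY)   # CP874 minus the Windows-only bytes
tis620 = without(iso885911, (0xA0,))     # ISO-8859-11 minus NBSP

print(sorted(hex(b) for b in iso885911))
print(sorted(hex(b) for b in tis620))
```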
Comment 16•22 years ago
A few questions to Arthit.
- What MIME charset names are used to tag web pages, emails,
and news articles in TIS 620, ISO 8859-11 and
CP874, respectively?
- Are there a lot of web pages and email messages/news articles that
are actually in CP874(Windows-874/IBM 874) but are tagged as
TIS-620 or ISO-8859-11?
- What MIME charset(actually used encoding) is expected in
html/xml/mailnews by Thai people and software
products for Thai? Is it TIS-620, ISO-8859-11 or CP874?
Perhaps, they don't care much because
TIS-620 is a subset of ISO-8859-11 , which is in turn
a subset of MS874/CP874.
- What MIME charset name has to be used to *tag*
documents with characters ONLY present in CP874?
(IANA charset registry doesn't have CP874/Windows-874/IBM 874
although it has ISO 8859-11 and TIS 620.)
- Which of them is used for Thai X11 fonts? I remember
that there are a couple of variants depending on how
non-spacing glyphs are treated (with zero-width or
negative-width). This is for gtk/x11 ports of gfx.
If the answer to the second question is yes, the situation
appears to be rather similar to ISO 8859-1 vs Windows-1252
and EUC-KR vs X-Windows-949, where the first of each pair is
a proper subset of the second. In both cases, Mozilla
treats *incoming* documents *tagged* as being in the smaller set
as if they were in the larger set. Why? Because there are numerous
documents mislabelled as ISO-8859-1 (EUC-KR) that are actually
in Windows-1252 (X-Windows-949).
As for *outgoing* documents (Mozilla-generated: composer and
mailnews),
Mozilla is strictly compliant with the standard definition of
those MIME charsets and warns users if characters outside
the repertoire of the currently selected MIME charset are
included in an outgoing document tagged as the smaller
set of each pair (e.g. characters between 0x80 and 0x9F included
in ISO-8859-1 email).
I'm pretty sure that the approach I described above for incoming documents
can be applied to TIS 620/ISO 8859-11/CP874, but I need to know
more about how to treat outgoing documents. Therefore, your
answers to the questions above would help me implement the two
converters you proposed.
Comment 17•22 years ago
ftp://ftp.nectec.or.th/pub/thailinux/cvs/docs/thaisupp/thaisupp.html
has details on Thai support under Linux. Do gtk/xlib ports
have to support all registry-encoding variants,
tis620-0, tis620-1, tis620-2, iso8859-11, tis620.2529-1, tis620.2533-0,
tis620.2533-1? Well, it seems like this issue already has been
taken care of so that we don't have to worry about
gtk/xlib ports of gfx here.
Reporter
Comment 18•22 years ago
Answers to the questions in comment #16:
> - What MIME charset names are used to tag web pages, emails,
> and news articles in TIS 620, ISO 8859-11 and
> CP874, respectively?
For Thai characters on the internet: just TIS-620 and ISO-8859-11.
----
> - Are there a lot of web pages and email messages/news articles that
> are actually in CP874(Windows-874/IBM 874) but are tagged as
> TIS-620 or ISO-8859-11?
Very few (if any) are tagged as ISO-8859-11 (since it has not been a
standard for long).
Even fewer are tagged as CP874 (I have never seen one).
Most of them are tagged as Windows-874 or TIS-620.
About characters:
say Windows-874 = TIS-620 + X.
All characters in X are available in other Unicode code pages or other
encodings (and an author who really wants to use them can use HTML entities instead).
In my own experience,
the number of pages tagged as Windows-874 is much larger than the number tagged as TIS-620.
Anyway, according to IANA,
we must use only TIS-620 or ISO-8859-11.
----
> - What MIME charset(actually used encoding) is expected in
> html/xml/mailnews by Thai people and software
> products for Thai? Is it TIS-620, ISO-8859-11 or CP874?
> Perhaps, they don't care much because
> TIS-620 is a subset of ISO-8859-11 , which is in turn
> a subset of MS874/CP874.
It is true that users don't really care much as long as everything works.
Anyway, TIS-620 (like other standards) is gaining some momentum now,
as users become educated about the importance of standards.
----
> - What MIME charset name has to be used to *tag*
> documents with characters ONLY present in CP874?
> (IANA charset registry doesn't have CP874/Windows-874/IBM 874
> although it has ISO 8859-11 and TIS 620.)
I don't know :(
----
> - Which of them is used for Thai X11 fonts? I remember
> that there are a couple of variants depending on how
> non-spacing glyphs are treated (with zero-width or
> negative-width). This is for gtk/x11 ports of gfx.
If you are talking about font/glyph tables:
TIS-620-0 for the TIS-620 direct map
TIS-620-1 for the MacThai map
TIS-620-2 for the Windows-874 map
----
Sorry for the very late response;
I was out of the office for almost a month.
Also, I will ask other Thai encoding experts to review my answers.
Please stay tuned, thanks :)
Comment 19•22 years ago
> - What MIME charset names are used to tag web pages, emails,
> and news articles in TIS 620, ISO 8859-11 and
> CP874, respectively?
"tis-620", "iso-8859-11", and "windows-874", respectively.
But you know, "windows-874" is not registered.
It's just used only by Microsoft products.
And applications in other OS's hardly recognize it currently.
For "iso-8859-11", the formal registration is not done yet,
as it's still relatively new.
( ref: http://www.iana.org/assignments/character-sets )
But the process should be obvious.
> - Are there a lot of web pages and email messages/news articles that
> are actually in CP874(Windows-874/IBM 874) but are tagged as
> TIS-620 or ISO-8859-11?
Few, because most web page editors that can (often without user's
awareness, by means of "auto-correction" or alike) produce characters
that only exist in CP874 are running on Windows itself, and
"windows-874" tagging is just automatic.
On the other hand, web authors who tag their web pages as "tis-620"
usually either know what they are doing, or the tools simply don't allow
entering characters outside the range.
So, wrongly tagging a CP874 page as "tis-620" is rare.
> - What MIME charset(actually used encoding) is expected in
> html/xml/mailnews by Thai people and software
> products for Thai? Is it TIS-620, ISO-8859-11 or CP874?
> Perhaps, they don't care much because
> TIS-620 is a subset of ISO-8859-11 , which is in turn
> a subset of MS874/CP874.
For Windows, both "windows-874" and "tis-620" are recognized.
For other OS's, "tis-620" is expected. CP874 is hardly
supported, either in XFree86 font server or in the core fonts themselves.
X-Term knows only TIS-620 and ISO-10646-1 for Thai, but not CP874.
Some mail/news readers, such as mutt and slrn, strictly reject
"windows-874" messages. So, we have many problems
reading CP874 messages, and TIS-620 is thus most expected.
For ISO-8859-11, its future is very close. I believe it will be
ready to adopt soon. (In fact, some implementations, such as
KDE/Qt, seem to know ISO-8859-11 even better than TIS-620!)
> - What MIME charset name has to be used to *tag*
> documents with characters ONLY present in CP874?
>
> (IANA charset registry doesn't have CP874/Windows-874/IBM 874
> although it has ISO 8859-11 and TIS 620.)
"Windows-874" may be the best we can get for CP874 pages.
But it's basically unsupported outside Windows.
> - Which of them is used for Thai X11 fonts? I remember
> that there are a couple of variants depending on how
> non-spacing glyphs are treated (with zero-width or
> negative-width). This is for gtk/x11 ports of gfx.
Basically, TIS-620 is used for X11 fonts. The other variations
are extended from TIS-620. But CP874 is not completely covered.
Recently, XFree86 4.3.0 has also included ISO-8859-11 core fonts.
Nonetheless, considering the trend that X11 core fonts will be
gradually abandoned and replaced by Xft, which may encourage
the use of Windows TrueType fonts, CP874 may be covered soon.
But completely dropping X11 core fonts is not so soon, is it?
> If the answer to the second question is yes, the situation
> appears to be rather similar to ISO 8859-1 vs Windows-1252
> and EUC-KR vs X-Windows-949 where the first in pairs is
> a proper subset of the the second in pairs. In both cases, Mozilla
> treats *incoming* documents *tagged* as in the smaller set of each pair
> as in the larger set of each pair. Why? Because there are numerous
> documents mislabelled as in ISO-8859-1(EUC-KR) that are actually
> in Windows-1252(X-Windows-949).
I think the situation may be different for Thai, but defensive
strategy is not harmful.
> As for *outgoing* (Mozilla generated documents : composer and
> mailnews) documents,
> Mozilla is strictly compliant to the standard definition of
> those MIME charsets and warns users that if characters outside
> the repertoire of a currently selected MIME charset are
> included in an outgoing document tagged in the smaller
> set of each pair. (i.e. characters bet. 0x80 - 0x9F are included
> in ISO-8859-1 email).
But it would be nice to do something more than "warn", e.g.,
offer an option to map the extra characters back into the smaller
charset. This is quite crucial for communicating with Thai
non-Windows users.
Comment 20•22 years ago
> ftp://ftp.nectec.or.th/pub/thailinux/cvs/docs/thaisupp/thaisupp.html
>
> has details on Thai support under Linux. Do gtk/xlib ports
> have to support all registry-encoding variants,
> tis620-0, tis620-1, tis620-2, iso8859-11, tis620.2529-1, tis620.2533-0,
> tis620.2533-1? Well, it seems like this issue already has been
> taken care of so that we don't have to worry about
> gtk/xlib ports of gfx here.
Well, but the pango code in GTK+2, which processes those font
registry-encodings, is not effective. I know it's overridden by
pangoLite when --enable-ctl is used, but the preferences dialog still
reads only the tis620-0 variant. Moreover, pangoLite does not handle
Thai Unicode fonts very well, and Xft support is quite
lacking. So Mozilla's support for registry-encoding variants is
not quite complete yet, IMO.
Let me find the related Bugzilla issue for this.
Comment 21•22 years ago
Thanks a lot, both of you, for detailed answers.
> So, wrongly tagging a CP874 page as "tis-620" is rare.
> I think the situation may be different for Thai, but defensive
> strategy is not harmful.
All right. So, for decoders (traditional encodings -> UTF-16), making
TIS-620 and ISO-8859-11 aliases of Windows-874 wouldn't break anything.
For outgoing documents (UTF-16 -> TIS-620/ISO-8859-11/Windows-874),
I'll make them strictly compliant with the standard, as is the case for other encodings.
> But it would be nice to do something more than "warn", e.g.,
> offer an option to map the extra characters back into the smaller
> charset. This is quite crucial for communicating with Thai
> non-Windows users.
I guess that Mozilla does a bit more than that. If you're composing
a web page or email in HTML, it'll turn characters not representable in the current
encoding into NCRs. When plain-text content is generated, let me see (these days
I rarely use legacy encodings...): a dialog box comes up warning users to change
the encoding. It might be nice to suggest a more extensive encoding rather than just
warn. If TIS-620/ISO-8859-11 is selected but a message
contains characters found only in Windows-874, it could suggest UTF-8 (preferred) or
Windows-874 (with a strong warning that it's not recognized everywhere).
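The two behaviours described here (falling back to NCRs in HTML, and suggesting a wider encoding for plain text) can be sketched with Python's standard codecs. This is only an illustration under stated assumptions: Python's built-in `iso8859_11` and `cp874` codecs stand in for Mozilla's converters, and `to_html_bytes` and `suggest_charsets` are hypothetical helper names, not anything in the Mozilla tree.

```python
def to_html_bytes(text, charset="iso8859_11"):
    """Encode for HTML output; unrepresentable characters become NCRs."""
    return text.encode(charset, errors="xmlcharrefreplace")

def suggest_charsets(text, selected="iso8859_11"):
    """Return [selected] if it can represent `text`; otherwise suggest
    wider encodings, UTF-8 first, then Windows-874 if it is sufficient."""
    try:
        text.encode(selected)
        return [selected]
    except UnicodeEncodeError:
        pass
    suggestions = ["utf-8"]
    try:
        text.encode("cp874")
        suggestions.append("cp874")
    except UnicodeEncodeError:
        pass
    return suggestions

# U+0E01 is in ISO-8859-11; U+2026 (ellipsis) is only in Windows-874.
print(to_html_bytes("\u0e01\u2026"))
print(suggest_charsets("\u0e01\u2026"))
```

A real mail composer would presumably attach the warning about Windows-874 not being recognized everywhere to the second suggestion; the sketch only orders the candidates.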
BTW, do you think it's a good idea to hide 'Windows-874' from users
in composer and mailnews editor? As I mentioned,
X-Windows-949 is to EUC-KR what Windows-874 is to TIS-620/ISO-8859-11.
Although Mozilla supports X-Windows-949(CP949/UHC), it's *hidden* from
users by default in composer/mailnews editor because it's not registered with
IANA. Another question: should we prefix 'Windows-874' with 'X-' to reflect
its non-standard status. What do you think of that? In that case, we definitely
have to hide it from composer/mailnews editor menu. Otherwise, some users
may use it only to find that MS products have no idea of what X-Windows-874
is....
Well, all these hassles can be neatly solved when everybody uses UTF-8 :-)
> Mozilla support for registry-encoding variants is not quite complete yet,
> Let me find related Bugzilla issue for this.
If there's a separate bug for that, let it be dealt with there.
Reporter
Comment 22•22 years ago
Re comment #21:
> BTW, do you think it's a good idea to hide 'Windows-874'
> from users in composer and mailnews editor?
Yes, it is a very good idea.
I proposed similar behavior for OpenOffice.org.
(OO.o can handle Windows-874 perfectly, as well as TIS-620,
but only TIS-620 is shown to the user in the UI.)
Comment 23•22 years ago
All right. I'll fix this soon.
> Xft support is quite lacking.
You may wish to take a look at bug 203052 and bug 176290.
Comment 24•22 years ago
In this patch, the ISO-8859-11/TIS-620 decoders are 'aliased' (at the 'class'
implementation level) to the CP874 decoder. In the opposite direction, all three
are separate, each adhering to its spec. The mail composition window, by default,
shows TIS-620 only (should it be ISO-8859-11 instead? I think TIS-620 is
safest), but die-hard users can customize it to use Windows-874. BTW, I'm not including
the *uf/*ut files (already uploaded by ftang).
One remaining question is how TIS-620 is 'perceived' under MacOS. In the
mapping between MacOS script IDs and Mozilla charsets, I just left it as is (the script
ID for Thai is mapped to Mozilla's TIS-620, which was named TIS-620 before but was
actually Windows-874, and is now the strict TIS-620). Do I have to change it to Windows-874?
(http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/maccharset.properties#56)
I guess not, if TIS-620/ISO-8859-11 is better for cross-platform operation.
The best might be to have x-MacThai, but do we want to?
Comment 25•22 years ago
Basically the same patch, but with changes for the gtk/xlib gfx ports added.
tis620-0, iso8859-11, and tis620-2 are treated separately in gtk/xlib.
tis620.2533-1 is still treated as tis620-0 (because we don't have
a MacThai converter).
When converting traditional encodings to Unicode, TIS-620, ISO 8859-11 and Windows-874
are synonymous (as we discussed). The other way around, they're differentiated
according to their specs. The mail composition window will 'expose'
TIS-620 by default, but end users, if they desire, can customize it
to list Windows-874 and ISO-8859-11 as well.
Attachment #122192 -
Attachment is obsolete: true
Comment 26•22 years ago
Comment on attachment 122564
a new patch with gtk/xlib changes added
Despite its length, this is a simple fix as ftang wrote last summer.
Attachment #122564 -
Flags: superreview?(rbs)
Attachment #122564 -
Flags: review?(ftang)
Comment 27•22 years ago
ftang, can you review this patch? This is a simple fix per your comment #15 last
August. Let me know if you're too busy and I'll ask Simon, Roy or Roland.
Target Milestone: --- → mozilla1.4final
Comment 28•21 years ago
Comment on attachment 122564
a new patch with gtk/xlib changes added
Simon, can you review?
Attachment #122564 -
Flags: review?(ftang) → review?(smontagu)
Comment 29•21 years ago
Since this bug was reported, we've added support for iso-8859-11 by aliasing to
TIS-620 at the properties file level (bug 146287). Can you summarize what more
we are gaining by adding all these new conversion classes?
Nits:
>Index: intl/uconv/src/charsetalias.properties
>...
>@@ -490,3 +489,6 @@
> x-gbk=x-gbk
> windows-936=windows-936
> ansi-1251=windows-1251
>+x-tscii=x-tscii
>+x-tamilttf=x-tamilttf
>+x-sun-unicode-india-0=x-sun-unicode-india-0
These belong to a different patch, right?
New files should have the MPL tri-license, not NPL.
Comment 30•21 years ago
Thanks for taking a look at it.
I summarized what this patch does in comment #24 and comment #25. The patch in
bug 146287 treats ISO-8859-11 as identical to Windows-874 (its name is TIS-620
in the current Mozilla code, but actually it's Windows-874) in *both* directions
(encoding and decoding) although ISO-8859-11 is different from Windows-874. My
patch differentiates them when 'emitting out' to the world (encoding from Unicode)
while treating them identically when decoding them (to Unicode). This distinction
cannot be made with the charsetalias.properties file and has to be done at the C++
level. The same kind of trick was used for a similar case (EUC-KR vs
Windows-949). The difference here is that we have three charsets TIS-620,
ISO-8859-11 and Windows-874, but the principle is the same.
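The asymmetry described in this comment (one shared, lenient decoder; three strict encoders) can be sketched as follows. This is a hedged Python illustration, not the C++ patch: the tables, the `STRICT` registry, and the `decode`/`encode` functions are invented names, and the Windows-874 table holds only the entries listed in comment 15 on top of the Thai block.

```python
def _thai_block():
    """TIS-620 high half: byte b maps to U+0E00 + (b - 0xA0)."""
    return {b: 0x0E00 + (b - 0xA0)
            for b in list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC))}

TIS_620 = _thai_block()
ISO_8859_11 = {**TIS_620, 0xA0: 0x00A0}
WINDOWS_874 = {**ISO_8859_11, 0x80: 0x20AC, 0x85: 0x2026, 0x91: 0x2018,
               0x92: 0x2019, 0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022,
               0x96: 0x2013, 0x97: 0x2014}

# Each label keeps its own table for encoding.
STRICT = {"tis-620": TIS_620, "iso-8859-11": ISO_8859_11,
          "windows-874": WINDOWS_874}

def decode(data, charset):
    """Lenient: all three labels decode with the Windows-874 superset."""
    if charset not in STRICT:
        raise LookupError(charset)
    return "".join(chr(b) if b < 0x80 else chr(WINDOWS_874.get(b, 0xFFFD))
                   for b in data)

def encode(text, charset):
    """Strict: only characters in the labelled charset's own table pass."""
    reverse = {cp: b for b, cp in STRICT[charset].items()}
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp in reverse:
            out.append(reverse[cp])
        else:
            raise ValueError(f"U+{cp:04X} is not in {charset}")
    return bytes(out)

# A page labelled tis-620 that actually contains a CP874 ellipsis still decodes.
print(decode(b"\xa1\x85", "tis-620"))
```

Decoding ignores the label's own table on purpose, so mislabelled input survives, while encoding refuses anything outside the labelled charset, which mirrors the "liberal in, strict out" split the comment describes.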
Sorry for the 'pollution' from other patches (that have been already committed
now). I'll change the license boilerplate.
Comment 31•21 years ago
OK, it looks like there is no leaner way to do this with our current
architecture, but if there is going to be more of this kind of aliasing it would
be nice if we could find a way to do it without these dummy classes.
Am I making a mountain out of a molehill here? How much footprint does each
converter add?
I tried to test the patch and saw some strange effects in the Character Coding
menu. Can you attach a new patch against current trunk?
Comment 32•21 years ago
Two other cases are ISO-8859-1 < Windows-1252 and GB2312 (EUC-CN) < GBK <
GB18030. The former is taken care of. As for the latter, I forgot (I have to
check) whether they're distinguished when converting to Unicode. IIRC, they're
not (i.e. they're handled the way I think they should be), but in a slightly different
way.
As for the footprint of a dummy class, it's ~100 bytes (or less) in an optimized
build on Windows. This is also from memory; I'm away from my development
environment. I'll give you the numbers and a patch against the current trunk
when I get back. BTW, can you tell me what the strange effects you found were?
Comment 33•21 years ago
The chief "strange effect" was that a long list of charsets appears in the
Character Coding | More menu, as well as the submenus below it, but I now see
that that happens in builds without the patch as well. I never saw this change
go in, and I wonder if it was deliberate or not.
Comment 34•21 years ago
Opened bug 209878 on the character coding menu problem.
Comment 35•21 years ago
Two dummy decoders and two encoders (non-dummy) increased the size of
libuconv.so on Linux (gcc 3.2 -O1 on ix86) by 876 bytes. The increase under
Windows should be ~500 bytes, so my memory served me well when I gave 100
bytes for a dummy decoder in the Win32 binary.
Attachment #122564 -
Attachment is obsolete: true
Comment 36•21 years ago
Comment on attachment 126273
the same patch with the license change and the 'pollution' removed
Asking Simon/Roger for r/sr.
Simon, let me get r if you're not worried about ~100 bytes per dummy decoder.
It's the leanest way possible at the moment.
BTW, it turns out that the GB2312 < GBK < GB18030 case is not handled the way
I believe it should be. That is, they're distinguished from each other when
converting to Unicode as well as when converting from Unicode. Not everybody
might agree with me on this issue. If everybody does, we can cut down the size
of libuconv a bit more. Anyway, I'll file a new bug on that with Yueheng Xu on
CC.
Attachment #126273 -
Flags: superreview?(rbs)
Attachment #126273 -
Flags: review?(smontagu)
Comment 37•21 years ago
Comment on attachment 126273
the same patch with the license change and the 'pollution' removed
Yes, I agree that this is the leanest way possible at the moment. r=smontagu
with one nit:
>--- intl/uconv/src/charsetData.properties 4 Jun 2003 06:19:31 -0000 1.34
>+++ intl/uconv/src/charsetData.properties 23 Jun 2003 03:40:43 -0000
>@@ -117,6 +117,8 @@
> koi8-u.LangGroup = x-cyrillic
> shift_jis.LangGroup = ja
> tis-620.LangGroup = th
>+windows-874.LangGroup = th
>+iso-8859-11.LangGroup = th
> tis620-2.LangGroup = th
This will be clearer if you insert the new lines after the two flavours of
tis620 instead of between them.
>--- gfx/src/gtk/nsFontMetricsGTK.cpp 11 Jun 2003 22:32:52 -0000 1.255
>+++ gfx/src/gtk/nsFontMetricsGTK.cpp 23 Jun 2003 03:40:52 -0000
>@@ -2176,8 +2183,8 @@
> printf("=== %s failed (%s)\n", aEntry->mInfo->mCharSet, __FILE__);
> }
> }
>- aEntry++;
> }
>+ aEntry++;
> }
>
This looks like pollution again. Please remove it before checking in. :-)
Attachment #126273 -
Flags: review?(smontagu) → review+
Comment 38•21 years ago
Simon, thanks for r. I'll reorder charsets in charsetData.prop file and get rid
of the pollution.
Updated•21 years ago
Attachment #122564 -
Flags: review?(smontagu)
Comment 39•21 years ago
Comment on attachment 122564
a new patch with gtk/xlib changes added
Sorry for the spam. Canceling the sr request for the obsolete patch.
Attachment #122564 -
Flags: superreview?(rbs)
Comment 40•21 years ago
Comment on attachment 126273
the same patch with the license change and the 'pollution' removed
sr=rbs
Attachment #126273 -
Flags: superreview?(rbs) → superreview+
Comment 41•21 years ago
Fix was checked in to the trunk. Thanks.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 42•21 years ago
tbox shows that code size increased by approx. 1.5k
Overall Change in Size
Total: +1508 (+1508/+0)
Code: +208 (+208/+0)
Data: +1300 (+1300/+0)
libuconv.so
Total: +1072 (+1072/+0)
Code: +208 (+208/+0)
Data: +864 (+864/+0)
+544 (+544/+0) R (DATA)
+544 (+544/+0) UNDEF:libuconv.so:R
+416 g_ufJohabJamoMapping
+72 g_ufMappingTable
+56 g_ufShiftTable
+320 (+320/+0) D (DATA)
+320 (+320/+0) UNDEF:libuconv.so:D
+224 components
+96 gConverterRegistryInfo
+208 (+208/+0) T (CODE)
+208 (+208/+0) UNDEF:libuconv.so:T
+60 nsUnicodeToISO885911Constructor(nsISupports *, nsID const &, void **)
+60 nsUnicodeToTIS620Constructor(nsISupports *, nsID const &, void **)
+44 nsISO885911ToUnicodeConstructor(nsISupports *, nsID const &, void **)
+44 nsTIS620ToUnicodeConstructor(nsISupports *, nsID const &, void **)
libgfx_gtk.so
Total: +256 (+256/+0)
Code: +0 (+0/+0)
Data: +256 (+256/+0)
+224 (+224/+0) D (DATA)
+224 (+224/+0) UNDEF:libgfx_gtk.so:D
+96 ISO885911
+96 TIS6202
+32 gCharSetMap
+32 (+32/+0) R (DATA)
+32 (+32/+0) UNDEF:libgfx_gtk.so:R
+32 kRegionCID
libgfxxprint.so
Total: +180 (+180/+0)
Code: +0 (+0/+0)
Data: +180 (+180/+0)
+160 (+160/+0) D (DATA)
+160 (+160/+0) UNDEF:libgfxxprint.so:D
+64 ISO885911
+64 TIS6202
+32 gConstCharSetMap
+20 (+20/+0) R (DATA)
+20 (+20/+0) UNDEF:libgfxxprint.so:R
+20 gConstNoneCharSetMap