Bug 75928: need GB18030 converters
Opened 24 years ago, closed 24 years ago
Categories: Core :: Internationalization, defect
Tracking: RESOLVED FIXED, Target Milestone: mozilla0.9.1
People: Reporter: bstell, Assignee: ftang
Whiteboard: depend on 80772. expect date 5/17
Attachments: 12 files, all deleted (8 patches, 4 text/plain)
GB18030 is a mandatory standard in China.
This bug tracks the GB18030 converter requirement from bug 72525.
Comment 1 (Assignee) • 24 years ago
Marking as mozilla0.9.1.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
Updated • 24 years ago
QA Contact: andreasb → ylong
Comments 2 to 11 (Assignee) • 24 years ago (no text)
Comment 12 • 24 years ago
Adding myself to the CC list.
Comment 13 (Assignee) • 24 years ago (no text)
Comment 14 (Assignee) • 24 years ago
OK, here is what I have done:
1. GB18030 is a 1-, 2- or 4-byte encoding.
2. The 1-byte part is the same as GBK.
3. The 2-byte part is slightly different from GBK.
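To make the 1-, 2- and 4-byte structure concrete, here is a small sketch that uses Python's built-in `gb18030` and `gbk` codecs. They are a modern stand-in, not the Mozilla converters discussed in this bug, and the helper name `encoded_length` is hypothetical. It prints the encoded bytes of a few sample characters and checks that the ASCII and common Han samples are byte-for-byte identical in GBK and GB18030.

```python
def encoded_length(ch: str, encoding: str = "gb18030") -> int:
    """Return how many bytes `ch` occupies in the given encoding."""
    return len(ch.encode(encoding))


# ASCII, a common Han character (U+4E2D), and two code points that only
# GB18030's 4-byte form can represent.
samples = ["A", "\u4e2d", "\u0080", "\U00010000"]

for ch in samples:
    data = ch.encode("gb18030")
    print(f"U+{ord(ch):04X} -> {data.hex()} ({len(data)} byte(s))")

# 1-byte and shared 2-byte sequences are identical in GBK and GB18030.
assert "A".encode("gbk") == "A".encode("gb18030")
assert "\u4e2d".encode("gbk") == "\u4e2d".encode("gb18030")
```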
So I first created a new Perl script, intl\uconv\tools\gengb18030tables.pl.
This Perl script takes two input files, CP936.txt and GB18030-, compares them, and separates the mappings into 4 parts:
a. common to GB18030 and CP936 (which is GBK)
b. unique to GB18030, in 2 bytes
c. unique to CP936, in 2 bytes
d. 4-byte GB18030
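The four-way split can be pictured as set operations on (byte sequence, Unicode code point) pairs. The sketch below is a hypothetical Python rendering of that idea, not the actual gengb18030tables.pl: the function name `partition_tables`, the dict-based input, and the group names are assumptions, and the toy tables stand in for mappings parsed from the real input files.

```python
from typing import Dict, Set, Tuple

# (encoded byte sequence, Unicode code point)
Pair = Tuple[bytes, int]


def partition_tables(cp936: Dict[bytes, int],
                     gb18030: Dict[bytes, int]) -> Dict[str, Set[Pair]]:
    """Split two byte-sequence -> code point tables into groups a-d."""
    cp_pairs = set(cp936.items())
    gb_pairs = set(gb18030.items())
    return {
        # a. identical mappings present in both tables
        "common": cp_pairs & gb_pairs,
        # b. 2-byte mappings found only in GB18030
        "gb18030_uniq2b": {p for p in gb_pairs - cp_pairs if len(p[0]) == 2},
        # c. 2-byte mappings found only in CP936 (GBK)
        "gbk_uniq2b": {p for p in cp_pairs - gb_pairs if len(p[0]) == 2},
        # d. all 4-byte GB18030 mappings (CP936 has none)
        "gb18030_4bytes": {p for p in gb_pairs if len(p[0]) == 4},
    }


# Toy tables for illustration only; real input would come from the mapping files.
cp936_toy = {b"A": 0x41, b"\xd6\xd0": 0x4E2D, b"\xfe\x50": 0xE815}
gb18030_toy = {b"A": 0x41, b"\xd6\xd0": 0x4E2D, b"\xfe\x50": 0x2E81,
               b"\x81\x30\x81\x30": 0x0080}

for name, pairs in partition_tables(cp936_toy, gb18030_toy).items():
    print(name, sorted((seq.hex(), hex(cp)) for seq, cp in pairs))
```

A byte sequence whose mapping differs between the two tables lands in both b and c, which is what lets each side keep its own small 2-byte table while sharing the common one.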
I also fixed cp936tocdx.pl, since it originally had some problems.
gengb18030tables.pl calls umaptable to generate gb18030uniq2b.uf, gb18030uniq2b.ut, gbkuniq2b.uf, gbkuniq2b.ut, gb180304bytes.uf and gb180304bytes.ut.
It also calls the updated cp936tocdx.pl to generate the correct cp936map.h.
The original ucvcn directory is very messy. I needed to clean it up and abstract some functions into a class, nsGBKConvUtil.{h,cpp}.
I also changed nsUnicodeToGBK and nsGBKToUnicode and added converters to support GB18030 (from/to Unicode), GB18030Font0 (from Unicode) and GB18030Font1 (to Unicode). The latter two are for Linux/Solaris font encodings. Since they share tables with the other converters, they won't have a big impact on size.
To support 4-byte GB18030, I needed to add a new class to uconv/src/ugen and uscan.c that transforms the 4-byte sequences 0x81-0xfe 0x30-0x39 0x81-0xfe 0x30-0x39 into a 16-bit space, so that our conversion tables are more compact.
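One common way to do this transformation is a mixed-radix linearization: bytes 1 and 3 each have 126 possible values and bytes 2 and 4 each have 10. The sketch below assumes that standard GB18030 scheme. It is Python rather than the C added to ugen/uscan.c, the function names are hypothetical, and I have not checked that the Mozilla class uses exactly this formula. Sequences whose first byte is 0x81-0x84 give indices 0-50399, which fit in 16 bits. The supplementary-plane range, starting at 0x90308130, maps to U+10000 onward by a constant offset, so it needs no table.

```python
# Number of possible values for each byte position in a 4-byte sequence.
B1_B3_SPAN = 0xFE - 0x81 + 1  # 126
B2_B4_SPAN = 0x39 - 0x30 + 1  # 10


def four_byte_to_index(seq: bytes) -> int:
    """Map a 4-byte GB18030 sequence to a linear index in 0..1587599."""
    if len(seq) != 4:
        raise ValueError("expected exactly 4 bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a GB18030 4-byte sequence")
    index = b1 - 0x81
    index = index * B2_B4_SPAN + (b2 - 0x30)
    index = index * B1_B3_SPAN + (b3 - 0x81)
    index = index * B2_B4_SPAN + (b4 - 0x30)
    return index


def index_to_four_byte(index: int) -> bytes:
    """Inverse of four_byte_to_index."""
    if not 0 <= index < (B1_B3_SPAN * B2_B4_SPAN) ** 2:
        raise ValueError("index out of range")
    index, d4 = divmod(index, B2_B4_SPAN)
    index, d3 = divmod(index, B1_B3_SPAN)
    d1, d2 = divmod(index, B2_B4_SPAN)
    return bytes([d1 + 0x81, d2 + 0x30, d3 + 0x81, d4 + 0x30])


# Supplementary planes begin at 0x90308130 and map linearly to U+10000 onward.
SUPPLEMENTARY_BASE = four_byte_to_index(b"\x90\x30\x81\x30")


def supplementary_to_code_point(seq: bytes) -> int:
    """Convert a supplementary-plane 4-byte sequence to its code point."""
    offset = four_byte_to_index(seq) - SUPPLEMENTARY_BASE
    if not 0 <= offset <= 0x10FFFF - 0x10000:
        raise ValueError("not a supplementary-plane sequence")
    return 0x10000 + offset


# Largest index reachable with first byte 0x84: still below 0x10000.
print(four_byte_to_index(b"\x84\x39\xfe\x39"))
```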
Updated (Assignee) • 24 years ago
Whiteboard: code ready need to be review
Comment 15 (Assignee) • 24 years ago
See also bug 75706.
Comment 16 (Assignee) • 24 years ago
bstell reviewed my code. Here are the comments:
1. Add comments about the 2-byte and 4-byte fallback converters in nsGBKToUnicode and nsUnicodeToGBK.
2. Add macros for the 2-byte and 4-byte sequence checking.
3. Rename UNDEF_UNICODE to UCS2_NO_MAPPING.
4. Change the GetMaxLength return value to NS_OK_UDEC_EXACTLENGTH for converters that have a fixed width.
5. Complete FillInfo for nsUnicodeToGBK, nsUnicodeToGB18030 and nsUnicodeToGB18030Font0.
6. Complete Try2ByteEncoder and Try4ByteEncoder for nsUnicodeToGBK.
7. Change 0xa000 to 0x9fff in nsGBKConvUtil.cpp.
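Review item 2 asks for macros for the 2-byte and 4-byte sequence checks. As an illustration of what such checks might test, assuming the standard GB18030 byte ranges, here they are as Python functions instead of C macros. The names are hypothetical and do not correspond to the macros actually added to the Mozilla code. `split_sequences` shows how the checks decide whether a lead byte starts a 2-byte or a 4-byte sequence.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    """First byte of any multi-byte sequence (also byte 3 of a 4-byte one)."""
    return 0x81 <= b <= 0xFE


def is_4byte_digit(b: int) -> bool:
    """Bytes 2 and 4 of a 4-byte sequence: ASCII '0'..'9'."""
    return 0x30 <= b <= 0x39


def is_2byte_trail(b: int) -> bool:
    """Second byte of a 2-byte (GBK-compatible) sequence."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE


def split_sequences(data: bytes) -> List[bytes]:
    """Split GB18030 bytes into 1-, 2- and 4-byte sequences."""
    out: List[bytes] = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            size = 1
        elif is_lead_byte(b) and i + 1 < n and is_2byte_trail(data[i + 1]):
            size = 2
        elif (is_lead_byte(b) and i + 3 < n and is_4byte_digit(data[i + 1])
              and is_lead_byte(data[i + 2]) and is_4byte_digit(data[i + 3])):
            size = 4
        else:
            raise ValueError(f"malformed GB18030 input at offset {i}")
        out.append(data[i:i + size])
        i += size
    return out


print(split_sequences("A\u4e2d\u0080".encode("gb18030")))
```

The ranges do not overlap, so a single look at the second byte is enough: 0x30-0x39 means a 4-byte sequence, and 0x40-0x7E or 0x80-0xFE means a 2-byte one.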
Comment 17 (Assignee) • 24 years ago (no text)
Comment 18 (Assignee) • 24 years ago
The last diff is the diff from the previous patch. Note that it uses - instead of + for what I added, because I did the directory diff in reverse against my backup copy.
Comment 19 (Assignee) • 24 years ago
Spinning the uconv/public and uconv/src part off into bug 79297.
Comment 20 (Assignee) • 24 years ago
The last comment is wrong. It should be:
spinning the uconv/public and uconv/src part off into bug 79273.
Comment 21 (Assignee) • 24 years ago
Spinning the Perl (.pl) part off into bug 79275.
Comment 22 (Assignee) • 24 years ago
Spinning the converter table change off into bug 79276.
Updated (Assignee) • 24 years ago
Comment 23 • 24 years ago
Frank, do you have a table for GB18030? If you have it, please attach it.
Updated (Assignee) • 24 years ago
Whiteboard: code ready need to be review
Updated (Assignee) • 24 years ago
Whiteboard: depend on 80772. expect date 5/17
Comment 24 (Assignee) • 24 years ago
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 25 • 24 years ago
Hi Frank,
I have a question about conversion. I'm now working on PS support in the gfx/ps module, and it needs to convert Unicode to native GB18030 code if the user wants to use GB18030 (GBK2K-based) PS fonts.
However, the following operation seems to fail (blank output is generated) in my environment. Can you check this?
I use Unicode 0x340f as an example:
1. ./nsconv -f gb18030 -t unicode gb18030_file > unicode_file
The conversion to Unicode is correct for me.
2. ./nsconv -f UTF-8 -t gb18030 unicode_file > gb18030_file_reverse
However, the Unicode to GB18030 conversion generates blank output.
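The two nsconv steps amount to a round-trip test for U+340F, a CJK Extension A character. Because it is outside GBK's 2-byte repertoire, GB18030 has to use a 4-byte sequence for it. For comparison, here is the same check using Python's built-in `gb18030` codec, which is a modern reference implementation and not nsconv or the Mozilla converters. The helper name `round_trips` is hypothetical.

```python
def round_trips(ch: str, encoding: str = "gb18030") -> bool:
    """Encode `ch`, decode the result, and report whether `ch` comes back.

    An encoder that emits nothing (the blank output reported above)
    fails this check, because the encoded result is empty.
    """
    encoded = ch.encode(encoding)
    return len(encoded) > 0 and encoded.decode(encoding) == ch


ch = "\u340f"
encoded = ch.encode("gb18030")
print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
print("round trip ok:", round_trips(ch))
```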
Comment 26 (Assignee) • 24 years ago
Spinning off the encoder issue katakai mentioned into bug 81200.
Comment 28 • 23 years ago
QA can verify this on Linux. We do not have fonts for Windows and Mac.
I am changing the QA contact to the reporter.
QA Contact: teruko → bstell