Closed Bug 75928 Opened 24 years ago Closed 24 years ago

need GB18030 converters

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.1

People

(Reporter: bstell, Assigned: ftang)

References

Details

(Whiteboard: depend on 80772. expect date 5/17)

Attachments

(12 files)

GB18030 is a mandatory standard in China. This bug tracks the GB18030 converter requirement in bug 72525.
Blocks: 72525
mark as moz0.9.1
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
QA Contact: andreasb → ylong
Attached patch: other diff in intl/uconv/ucvcn (deleted)
Attached patch: patch to cp936tocdx.pl (deleted)
Add myself to the CC list
OK, here is what I have done:
1. GB18030 is a 1-, 2-, or 4-byte encoding.
2. The 1-byte part is the same as GBK.
3. The 2-byte part is a little different from GBK.

So I first created a new Perl file, intl\uconv\tools\gengb18030tables.pl. This script takes two input files, CP936.txt and GB18030-, compares them, and separates them into 4 different parts:
a. common between GB18030 and CP936 (which is GBK)
b. GB18030 unique in 2 bytes
c. CP936 unique in 2 bytes
d. 4-byte GB18030

I also fixed cp936tocdx.pl, since it originally had some problems. gengb18030tables.pl calls umaptable to generate gb18030uniq2b.uf, gb18030uniq2b.ut, gbkuniq2b.uf, gbkuniq2b.ut, gb180304bytes.uf, and gb180304bytes.ut. It also calls the updated cp936tocdx.pl to generate the correct cp936map.h.

The original ucvcn directory was very messy. I needed to clean it up and abstract some functions into a class, nsGBKConvUtil.{h,cpp}. I also changed nsUnicodeToGBK and nsGBKToUnicode to add converters that support GB18030 (from/to Unicode), GB18030Font0 (from Unicode), and GB18030Font1 (to Unicode). The latter two are for Linux/Solaris font encoding. Since they share tables with the other converters, the size impact should be small.

To support 4-byte GB18030, I needed to add a new class to uconv/src/ugen and uscan.c to transform the 4-byte sequence 0x81-0xfe 0x30-0x39 0x81-0xfe 0x30-0x39 into a 16-bit space, so our conversion table will be more compact.
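The 4-byte transformation described above can be sketched with the usual GB18030 linearization, which treats the four bytes as digits in a mixed radix (126, 10, 126, 10). This is an illustrative Python sketch only: the function names are hypothetical, and whether uscan.c uses exactly this formula is an assumption, since the patch itself is not shown here.

```python
# Sketch only: the common GB18030 four-byte linearization.
# Function names are hypothetical; this is not the code from uscan.c.

def gb18030_4byte_to_linear(b1, b2, b3, b4):
    """Map a four-byte GB18030 sequence to a zero-based linear index."""
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a four-byte GB18030 sequence")
    # Bytes 1 and 3 each have 126 possible values; bytes 2 and 4 have 10.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def linear_to_gb18030_4byte(index):
    """Inverse mapping: linear index back to the four bytes."""
    if not 0 <= index < 126 * 10 * 126 * 10:
        raise ValueError("index out of range")
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return (0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4)
```

With this mapping, 0x81 0x30 0x81 0x30 becomes index 0 and 0x84 0x31 0xA4 0x39 becomes 39419, so sequences whose first byte is 0x81-0x84 fit in a 16-bit index. The full four-byte space (1,587,600 combinations) does not.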
Whiteboard: code ready need to be review
see also 75706
bstell reviewed my code. Here are the comments:
1. Add comments about the 2-byte and 4-byte fallback converters in nsGBKToUnicode and nsUnicodeToGBK.
2. Add macros for the 2-byte and 4-byte sequence checking.
3. Rename UNDEF_UNICODE to UCS2_NO_MAPPING.
4. Change the GetMaxLength return value to NS_OK_UDEC_EXACTLENGTH for those converters that have a fixed width.
5. Complete FillInfo for nsUnicodeToGBK, nsUnicodeToGB18030, and nsUnicodeToGB18030Font0.
6. Complete Try2ByteEncoder and Try4ByteEncoder for nsUnicodeToGBK.
7. Change 0xa000 to 0x9fff in nsGBKConvUtil.cpp.
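The sequence checks requested in item 2 might look roughly like the following. This is a Python sketch rather than the C macros in the patch, and the helper names are hypothetical. The byte ranges are the standard GB18030 ones: a 2-byte trail byte is 0x40-0x7E or 0x80-0xFE, while in a 4-byte sequence the second and fourth bytes are 0x30-0x39.

```python
# Sketch only: hypothetical helpers, not the macros from the actual patch.

def is_gb18030_2byte(b1, b2):
    """True if (b1, b2) is a structurally valid 2-byte sequence."""
    return 0x81 <= b1 <= 0xFE and (0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE)


def is_gb18030_4byte(b1, b2, b3, b4):
    """True if the four bytes are a structurally valid 4-byte sequence."""
    return (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39)


def sequence_length(data):
    """Length of the sequence at the start of data: 1, 2, or 4.

    Returns 0 if the input is empty, truncated, or malformed.
    """
    if not data:
        return 0
    if data[0] < 0x80:
        return 1
    if len(data) >= 2 and is_gb18030_2byte(data[0], data[1]):
        return 2
    if len(data) >= 4 and is_gb18030_4byte(*data[:4]):
        return 4
    return 0
```

The second byte alone distinguishes the two multi-byte forms, because 0x30-0x39 never overlaps the 2-byte trail range. These checks are structural only and do not say whether a sequence has a Unicode mapping.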
Attached patch: diff from above patches (deleted)
The last diff is a diff against the previous patch. Note that it uses - instead of + for what I added, because I ran the directory diff in reverse against my backup copy.
Spun off the uconv/public and uconv/src part into 79297.
The last comment is wrong. It should be: spun off the uconv/public and uconv/src part into 79273.
Spun off the Perl (.pl) part into 79275.
Spun off the converter table change into 79276.
Depends on: 79273, 79275, 79276
Blocks: 51421
Frank, do you have a table for GB18030? If you have it, please attach it.
Whiteboard: code ready need to be review
Blocks: 80725
Depends on: 80772
Whiteboard: depend on 80772. expect date 5/17
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Hi Frank, I have a question about conversion. I'm now working on PS support in the gfx/ps module, and it needs to convert Unicode to native gb18030 code if the user wants to use gb18030 (GBK2K based) PS fonts. However, the following operation seems to fail (blank code is generated) in my environment. Can you check this? I use Unicode 0x340f as an example.

1. ./nsconv -f -f gb18030 -t unicode gb18030_file > unicode_file
   The conversion to Unicode is correct for me.
2. ./nsconv -f UTF-8 -t gb18030 unicode_file > gb18030_file_reverse
   However, the Unicode to gb18030 conversion generates a blank.
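For comparison, the round trip being tested here can be checked independently of nsconv. The sketch below uses Python's built-in gb18030 codec, not Mozilla's converter, so it only shows what a working encoder and decoder pair should do for U+340F. The helper name is hypothetical.

```python
# Sketch only: uses Python's built-in "gb18030" codec, not Mozilla's uconv.

def round_trips(ch, encoding="gb18030"):
    """Encode ch, decode it back, and report whether it survived.

    Returns (ok, encoded_bytes).
    """
    encoded = ch.encode(encoding)
    return encoded.decode(encoding) == ch, encoded


ok, encoded = round_trips("\u340f")
print(ok, encoded.hex())
```

A correct encoder returns a non-empty byte sequence that decodes back to the same character. A blank result in the Unicode to gb18030 direction points at the encoder rather than the decoder.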
spin off the encoder issue katakai mentioned into 81200
Changed QA contact to teruko@netscape.com.
QA Contact: ylong → teruko
QA can verify this on Linux. We do not have fonts for Windows and Mac. I am changing the QA contact to the reporter.
QA Contact: teruko → bstell