Closed Bug 54135 Opened 24 years ago Closed 23 years ago

conversion (fromU/toU) problem- Sjis code x'81ca' becomes x'fa54'

Categories

(Core :: DOM: Editor, defect, P3)

x86
Windows 2000
defect

VERIFIED FIXED
mozilla0.9.6

People

(Reporter: hobbit_mak, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(20 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20000924
BuildID: 2000092408

If you edit a Shift JIS page and save it, the proper character code x'81ca' becomes x'fa54'.

Reproducible: Always

Steps to Reproduce:
1. Edit the page http://homepage1.nifty.com/hobbit/html/utf8.html
2. Save it to a local file.

Actual Results: x'81ca' (proper code) is changed to x'fa54' (Windows code).

Expected Results: x'81ca' is retained.

Maybe related to 35166. http://bugzilla.mozilla.org/show_bug.cgi?id=35166
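The report above can be illustrated with a short sketch. It assumes Python's built-in `cp932` codec as a stand-in for Microsoft's CP932 table (it is not Mozilla's converter), and it only decodes, so it says nothing about which code an encoder ought to choose. The helper name `decode_sjis_code` is made up for this example.

```python
import unicodedata

def decode_sjis_code(code: int) -> str:
    """Decode one double-byte Shift_JIS code (e.g. 0x81CA) with Python's cp932 codec."""
    return code.to_bytes(2, "big").decode("cp932")

# Both codes from the report decode to the same Unicode character,
# so a Unicode -> Shift_JIS converter has to pick one of them when saving.
for code in (0x81CA, 0xFA54):
    ch = decode_sjis_code(code)
    print(f"0x{code:04X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the converter picks the higher code on the way back, the saved file contains x'fa54' even though the page was loaded with x'81ca'.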
assigning to ftang for initial debug
Assignee: beppe → ftang
minor issue. mark it as assign
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → Future
x'fa54' (Windows code) cannot be displayed by Mozilla itself. (Build 2000112704)
It is reported that the Linux build also has this problem. http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=474
Also, sjis code 0x81E0 becomes 0x8790 and sjis code 0x81E6 becomes 0xFA5B.
Attached patch Patch for sjis.ut and sjis.uf (deleted) — Splinter Review
Patch above was verified on Windows 2000 environments.
Attached list is in utf-8 encoding.
remove Future from the target milestone.
Keywords: intl
Target Milestone: Future → ---
This problem is fixed in Build 2001011720.
Sorry, that test was done with modified modules. This problem is still reproduced on Build ID 2001012304.
Because http://bugzilla.mozilla.org/show_bug.cgi?id=44374 was fixed:
81BE becomes 879C.
81BF becomes 879B.
81DA becomes 8797.
81DB becomes 8796.
81DF becomes 8791.
81E3 becomes 8795.
81E7 becomes 8792.
The patch is also updated.
Summary: Sjis code x'81ca' becomes x'fa54' → conversion problem- Sjis code x'81ca' becomes x'fa54'
hobbit.makoto@nifty.ne.jp: How did you generate these patches? Did you change the source table and use the ufrom and uto tools to generate them? If so, can you give us the changes to the source table?
Summary: conversion problem- Sjis code x'81ca' becomes x'fa54' → conversion (fromU/toU) problem- Sjis code x'81ca' becomes x'fa54'
I could not find out how to use the tool, so I changed both the source shown in the comments and the object data.
Mozilla converts U+FFE2 to 7C7B (ISO-2022-JP). It must be 224C (ISO-2022-JP).
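For reference, the two JIS values in this comment can be related to Shift_JIS codes with the usual Shift_JIS to JIS row/cell arithmetic. The sketch below is that generic arithmetic only, not Mozilla's code, and `sjis_to_jis` is a name invented here. It shows that 224C corresponds to Shift_JIS 81CA, while 7C7B corresponds to Shift_JIS EEF9, which a later comment in this bug lists as another code for the same character. That the ISO-2022-JP output came from picking EEF9 is an inference, not something the comment states.

```python
def sjis_to_jis(code: int) -> int:
    """Convert a double-byte Shift_JIS code to its JIS row/cell form (as used in ISO-2022-JP)."""
    s1, s2 = code >> 8, code & 0xFF
    row_base = 0x70 if s1 < 0xA0 else 0xB0
    if s2 < 0x9F:
        # First (odd) row of the two rows packed into this lead byte.
        j1 = ((s1 - row_base) << 1) - 1
        j2 = s2 - (0x20 if s2 > 0x7F else 0x1F)
    else:
        # Second (even) row of the pair.
        j1 = (s1 - row_base) << 1
        j2 = s2 - 0x7E
    return (j1 << 8) | j2

print(hex(sjis_to_jis(0x81CA)))  # 0x224c
print(hex(sjis_to_jis(0xEEF9)))  # 0x7c7b
```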
How can I change the source table and use the ufrom and uto tools to generate it? I could not find these tools in the source tree.
The tools are at mozilla/intl/uconv/tools/umaptable.c. nhotta, can you help drive this? I am overloaded.
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
hobbit.makoto@nifty.ne.jp, could you summarize the current remaining problem?
Problem left in build 2001050804: ten characters are changed if you edit a Shift JIS source and save it as Shift JIS.
0x81be becomes 0x879c
0x81bf becomes 0x879b
0x81ca becomes 0xfa54
0x81da becomes 0x8797
0x81db becomes 0x8796
0x81df becomes 0x8791
0x81e0 becomes 0x8790
0x81e3 becomes 0x8795
0x81e6 becomes 0xfa5b
0x81e7 becomes 0x8792
The problem with iso-2022-jp was fixed. I could not download the latest source yet, so I could not use the tool yet.
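The ten pairs listed above can be checked for one shared property: in each pair, both codes decode to the same Unicode character. This is a minimal sketch, again assuming Python's `cp932` codec as a stand-in for CP932.TXT; `REPORTED_CHANGES`, `to_unicode` and `is_duplicate_pair` are names made up for the example.

```python
# (original code -> code after save) pairs as listed in the comment above.
REPORTED_CHANGES = {
    0x81BE: 0x879C, 0x81BF: 0x879B, 0x81CA: 0xFA54, 0x81DA: 0x8797,
    0x81DB: 0x8796, 0x81DF: 0x8791, 0x81E0: 0x8790, 0x81E3: 0x8795,
    0x81E6: 0xFA5B, 0x81E7: 0x8792,
}

def to_unicode(code: int) -> str:
    """Decode one double-byte code with Python's cp932 codec (assumed stand-in)."""
    return code.to_bytes(2, "big").decode("cp932")

def is_duplicate_pair(original: int, saved: int) -> bool:
    """True when both codes decode to the same Unicode character."""
    return to_unicode(original) == to_unicode(saved)

for original, saved in REPORTED_CHANGES.items():
    ch = to_unicode(original)
    same = is_duplicate_pair(original, saved)
    print(f"0x{original:04X} -> 0x{saved:04X}  U+{ord(ch):04X}  same character: {same}")
```

So the text is not corrupted at the Unicode level; only the choice among duplicate Shift_JIS codes differs.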
Attached file zipped mozilla/intl/uconv/tools/ (deleted) —
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
hobbit.makoto@nifty.ne.jp: Please try the attached file and update your patch, thanks.
I downloaded mozilla/intl/uconv/tools/, but I could not find out how you made sjis.ut and sjis.uf. I went to mozilla/intl/uconv/tools/, ran nmake on make.win, and got umaptable.exe. Maybe you made sjis.ut and sjis.uf with umaptable and the original conversion table, but I could not find where and how sjis.ut and sjis.uf are made.
Let me ask Frank and I will update.
To convert from sjis into unicode, I run /intl/uconv/tools/cp932tojdx.pl against http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT and it generates source/intl/uconv/ucvja/jis0208.ump. This is shared by the SJIS/EUC/ISO-2022-JP to unicode conversions.

To convert from unicode into ShiftJIS, I run intl/uconv/tools/jis0208fromcp932.pl against http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT. It generates a file, and I then pipe that file into umaptable -uf > 0208.uf to generate the jis0208.uf.
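The first step of either pipeline is reading the vendor mapping file. The sketch below is not cp932tojdx.pl or jis0208fromcp932.pl; it is a hypothetical Python reader for CP932.TXT-style lines (code, Unicode value, then a `#` comment), and the `SAMPLE` text is a hand-written excerpt in that layout rather than the real file.

```python
SAMPLE = """\
# Illustrative excerpt: <cp932 code> <Unicode> # <name>
0x81CA\t0xFFE2\t# FULLWIDTH NOT SIGN
0x81E6\t0x2235\t# BECAUSE
0x879A\t0x2235\t# BECAUSE
0xFA54\t0xFFE2\t# FULLWIDTH NOT SIGN
0x81AD\t      \t# UNDEFINED
"""

def parse_mapping(text: str) -> dict[int, int]:
    """Return {sjis_code: unicode_code_point} from CP932.TXT-style lines."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment, then expect exactly two hex fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # comment line, blank line, or undefined entry
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table

to_unicode = parse_mapping(SAMPLE)
print({hex(k): hex(v) for k, v in to_unicode.items()})
```

The to-Unicode direction is unambiguous because each Shift_JIS code appears once. The from-Unicode direction is where a policy is needed, since several codes can share one Unicode value.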
Attached patch diff list for cp932.txt (deleted) — Splinter Review
I got cp932.txt from unicode.org and made sjis.uf from it. But some characters are mapped to two sjis positions, so I commented out the sjis locations that are not proper in JIS X 0208 and 0212. I attached the diff list and sjis.uf, and confirmed that this sjis.uf solves the problems.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.1
Attached patch diff -u for sjis.uf (deleted) — Splinter Review
I put a diff for sjis.uf; it's very big. I expected something similar to the patch of 02/14/01 06:13. hobbit.makoto@nifty.ne.jp, do you have any idea why the diff is so large? What characters did you actually change? Please list the character codes of the changed characters.
I suppose that the original table is not derived from CP932.txt. I would also like to know the original table, but I could not find it.
I am going to ask Frank. Are the characters you changed the same as those listed in your comment of 2001-05-08 18:07?
No, the characters I changed from cp932.txt are listed in 05/15/01 07:30. None of the characters in 2001-05-08 18:07 were changed; they are the same as in cp932.txt.
It is strongly recommended to record which tool, table, or other resource each source file was created from, preferably in the source file itself. The lack of such a record may be the reason this bug is difficult to solve. In http://bugzilla.mozilla.org/show_bug.cgi?id=35166 you concluded that you use cp932 for Unicode to SJIS conversion.
Bug 67374 - sources and tools to build unicode converters not in tree.
Depends on: 67374
Whiteboard: ftang to provide a source file for the current sjis.uf
TM to 0.9.2 per PDT triage (it's OK to check it in by Friday or after 0.9.1 branch is made).
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Reassign to ftang.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
pdt+ based on 6/11 pdt meeting.
Whiteboard: ftang to provide a source file for the current sjis.uf → [PDT+]ftang to provide a source file for the current sjis.uf
I don't think we have time to address this problem by moz0.9.2. Push to moz0.9.3
Target Milestone: mozilla0.9.2 → mozilla0.9.3
remove PDT+
Whiteboard: [PDT+]ftang to provide a source file for the current sjis.uf → ftang to provide a source file for the current sjis.uf
mark as nsbranch
Keywords: nsBranch
Whiteboard: ftang to provide a source file for the current sjis.uf → no progress yet. ftang to provide a source file for the current sjis.uf
I read part of the program for Japanese-Unicode conversion, but I could not identify the sources or the way some of the mapping tables were generated. So I made a tool to generate jis0201.uf, jis0208.uf, jis0208.ump, jis0208ext.uf and sjis.uf from CP932.TXT and SHIFTJIS.TXT. The *.uf files are generated with 'umaptable'.

The diffs are so large because the original mapping policy for codes where SJIS:UCS2 = N:1 is to use the HIGHER SJIS code. That is not a good idea. They should be mapped to the LOWER SJIS code (without IBM ext codes: bug-82678). See http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP.

testpage: http://rh.vinelinux.org/~shom/sjis-cp932.html

----------

In addition, this tool can generate tables from APPLE_JAPANESE.TXT.
# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT

If it is possible to add "Shift_JIS (Macintosh)", some problems will be resolved:

1) SJIS in/out problems (bookmark import, saving mail draft, compose, etc.) on MacOS 8, 9
   SJIS 815C (U+2014) EM DASH
        8160 (U+301C) WAVE DASH
        8161 (U+2016) DOUBLE VERTICAL LINE
        817C (U+2212) MINUS SIGN
        8191 (U+00A2) CENT SIGN (questionable: U+FFE0?)
        8192 (U+00A3) POUND SIGN (questionable: U+FFE1?)
        81CA (U+00AC) NOT SIGN (questionable: U+FFE2?)
2) Apple extended ShiftJIS codes (SJIS 8540-886D, EB41-ED96)
   # partly, because APPLE defined some codes as Unicode Sequences.
   # mozilla cannot process Unicode Sequences.

testpage: http://rh.vinelinux.org/~shom/sjis-mac.html
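The "LOWER SJIS code" policy from this comment, as I read it, amounts to inverting the Shift_JIS to Unicode table and keeping the numerically lowest code whenever several codes share one Unicode value. The sketch below is not mkjpconv.pl; `build_from_unicode` and `TO_UNICODE` are hypothetical names, and the six sample entries are the two triples of duplicate codes that appear elsewhere in this bug.

```python
def build_from_unicode(to_unicode: dict[int, int]) -> dict[int, int]:
    """Invert {sjis: unicode}, mapping each Unicode value to its lowest SJIS code."""
    from_unicode = {}
    for sjis, ucs in to_unicode.items():
        if ucs not in from_unicode or sjis < from_unicode[ucs]:
            from_unicode[ucs] = sjis
    return from_unicode

# N:1 cases taken from the codes discussed in this bug.
TO_UNICODE = {
    0x81CA: 0xFFE2, 0xEEF9: 0xFFE2, 0xFA54: 0xFFE2,
    0x81E6: 0x2235, 0x879A: 0x2235, 0xFA5B: 0x2235,
}

for ucs, sjis in build_from_unicode(TO_UNICODE).items():
    print(f"U+{ucs:04X} -> 0x{sjis:04X}")
```

With the old "higher code wins" policy the same input would yield FA54 and FA5B, which matches the symptoms reported at the top of the bug.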
Attached file mkjpconv.pl (deleted) —
usage: mkjpconv.pl SHIFTJIS.TXT CP932.TXT
   (or mkjpconv.pl SHIFTJIS.TXT APPLE_JAPANESE.TXT)
APPLE_JAPANESE.TXT is generated (CR->LF) from APPLE/JAPANESE.TXT.
SHIFTJIS.TXT is: ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT
Missed 0.9.3.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Matsumoto san, could you attach the sjis.uf generated by your tool? I think the current problem is that it is hard to identify the modifications. For example, if we want to change the mapping for Shift_JIS 0x81ca, we want to identify that change in sjis.uf. Then we can make sure the change won't affect other characters.
Attached patch diffs from 0.9.2 to generated (deleted) — Splinter Review
There is a very large amount of diffs, but I can see all glyphs defined in SHIFTJIS.TXT on http://rh.vinelinux.org/~shom/sjis-cp932.html. I think the current mapping table has many (hidden) problems, especially with dual-mapped codes in CP932.TXT. Do you have a tool (or method) to generate the SJIS->UCS2, UCS2->SJIS, JIS->UCS2, UCS2->JIS mapping tables?
I made a tool to check all codes in CP932.TXT.

# to generate Shift JIS encoded HTML page
perl mksjistest.pl CP932.TXT > sjis-cp932.html
# to generate UTF-8 encoded HTML page
perl mksjistest.pl CP932.TXT UTF-8 > sjis-cp932-utf8.html

I modified sjis-cp932-utf8.html with 0.9.2 and with 0.9.2 + generated maps, and did 'Save As Charset' with Shift_JIS. (I'm using Linux, so please check on Windows.)

The diffs are:
SRC = SJIS, ORG = modified by 0.9.2, NEW = modified by newmap

SRC   ORG   NEW
------------------ JIS defined region
81BE  879C  81BE
81BF  879B  81BF
81CA  FA54  81CA
81DA  8797  81DA
81DB  8796  81DB
81DF  8791  81DF
81E0  8790  81E0
81E3  8795  81E3
81E6  FA5B  81E6
81E7  8792  81E7
------------------ NEC specific codes
8754  FA4A  8754
8755  FA4B  8755
 :     :     :
875D  FA53  875D
8782  FA59  8782
8784  FA5A  8784
878A  FA58  878A
8790  8790  81E0
8791  8791  81DF
8792  8792  81E7
8795  8795  81E3
8796  8796  81DB
8797  8797  81DA
879A  FA5B  81E6
879B  879B  81BF
879C  879C  81BE
------------------ NEC selected IBM ext region
ED40  FA5C  ED40
 :     :     :
EEF8  FA49  EEF8
EEF9  FA54  81CA
EEFA  FA55  EEFA
EEFB  FA56  EEFB
EEFC  FA57  EEFC
------------------ IBM ext region
FA40  FA40  EEFA
 :     :     :
FA49  FA49  EEF8
FA4A  FA4A  8754
 :     :     :
FA53  FA53  875D
FA54  FA54  81CA
FA55  FA55  EEFA
 :     :     :
FA57  FA57  EEFC
FA58  FA58  878A
FA59  FA59  8782
FA5B  FA5B  81E6
FA5C  FA5C  ED40
 :     :     :
FC4B  FC4B  EEEC
-------------------------

I think the new mapping policy is the same as OE. (I heard OE maps codes in the IBM ext region to the NEC selected region.)
Attached file mksjistest.pl (deleted) —
Attached file sjis-cp932.html (deleted) —
Attached file sjis-cp932-utf8.html (UTF-8 encoded) (deleted) —
roy yokoyama, can you help check in the changes? shoji-san, which diffs should we pick?
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
accepting for 0.9.4 milestone.
Status: NEW → ASSIGNED
Please use the *.uf and *.ump files in the next attachment (the old newmap.zip does not include jisx0208ext.uf, sorry) or create them with mkjpconv.pl (from SHIFTJIS.TXT and CP932.TXT). 'jisx0201gl.uf' is obsolete (not used in any source). If these are acceptable (I'll make testcases), add mkjpconv.pl to intl/uconv/tools.
# cp932tojdx.pl and jis0208fromcp932.pl will be obsolete.
I don't know where the source of jis0212.{uf,ump} is. I want to change mkjpconv.pl to make jis0212.{uf,ump}.
m0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Blocks: 99171
nsbranch- since Frank moved it to 0.9.5
Keywords: nsbranch → nsbranch-
shoji-san: what is the status of this bug? Are we waiting for ftang to provide the sjis.uf source as stated in the whiteboard? Note: I'd appreciate it if you could change the status of patches which are already obsolete. === cc'ing ftang
Please test the new maps on Windows, Mac and OS/2. testcases.zip has SJIS encoded texts to test.

1. display
   ALL chars in raw.txt must be shown.
   On Windows, ALL chars in rawext.txt and rawibmext.txt must be shown.

2. compose (round trip)
   1) edit raw{,ext,ibmext}.txt.html in composer
   2) save as with ShiftJIS
   3) rawdump.pl <saved html>
   "<ORG>:<NEW>:DIFF" lines are codes that did not round trip. New codes must be "SJIS lower" in http://bugzilla.mozilla.org/attachment.cgi?id=44509&action=view (see http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP)

3. mail
   1) compose new mail
   2) CUT & PASTE all chars in raw.txt
   3) send
   ALL chars in the mail with raw.txt must be shown.
   On Windows, ALL chars in the mail with raw{ext,ibmext}.txt must be shown.

------
If any problem occurs on Mac or OS/2, especially with the 9 chars in http://rh.vinelinux.org/~shom/sjisprob.html , it should not be corrected by changing the mapping tables.
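The round-trip check in step 2 can be approximated as follows. This is not rawdump.pl, whose exact behaviour and output are not shown in this bug; it is a hypothetical sketch that borrows only the "<ORG>:<NEW>:DIFF" notation from the comment. It assumes both inputs are Shift_JIS byte strings containing the same number of characters in the same order, and it treats 0x81-0x9F and 0xE0-0xFC as lead bytes.

```python
def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift_JIS bytes into single- and double-byte characters."""
    chars, i = [], 0
    while i < len(data):
        lead = data[i]
        is_lead = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC
        width = 2 if is_lead and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars

def round_trip_diffs(original: bytes, saved: bytes) -> list[str]:
    """List '<ORG>:<NEW>:DIFF' entries for characters whose bytes changed."""
    pairs = zip(split_sjis(original), split_sjis(saved))
    return [f"{o.hex().upper()}:{n.hex().upper()}:DIFF" for o, n in pairs if o != n]

before = b"\x81\xca\x81\x40A"   # 81CA, 8140, 'A'
after = b"\xfa\x54\x81\x40A"    # FA54, 8140, 'A'
print(round_trip_diffs(before, after))  # ['81CA:FA54:DIFF']
```

An empty list means every character kept its original code through the edit and save cycle.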
nhotta is back from sabbatical. assigning back to him.
Assignee: yokoyama → nhotta
Status: ASSIGNED → NEW
move to 0.9.6
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I think the tool has to be reviewed first. Frank, please review mkjpconv.pl included in the attachment of 08/08/01 03:17.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Whiteboard: no progress yet. ftang to provide a source file for the current sjis.uf
Status: NEW → ASSIGNED
Blocks: 104056
Viewing the following diff with 4.x, we can see that mozilla is generating codes which 4.x cannot show, so I added the 4xp keyword. diff between ...-sjis-0.9.2.html and ..--sjis-new.html http://bugzilla.mozilla.org/attachment.cgi?id=44534&action=view
Keywords: 4xp
Comment on attachment 45060 [details] newmap.zip (mkjpconv.pl, jis0208.uf, jis0208ext.uf, jis0201.uf, sjis.uf, IBMNEC.map ) rs=ftang.
Attachment #45060 - Flags: review+
Please check them in.
give back to nhotta for check in.
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
rs=blizzard
should someone from international QA be the qa_contact for this bug ?
Change QA contact to myself.
QA Contact: sujay → ylong
Checked in to the trunk. The tool still needs to be checked in. Frank, please review the tool. http://bugzilla.mozilla.org/attachment.cgi?id=51199&action=view
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Keywords: 4xp
The tool issue to be handled by bug 67374. Mark this as FIXED.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Changed QA contact to teruko@netscape.com.
QA Contact: ylong → teruko
No longer blocks: 104056
Verified as fixed in 2001-10-26 trunk build.
Status: RESOLVED → VERIFIED