Closed Bug 59679 Opened 24 years ago Closed 24 years ago

Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla0.9

People

(Reporter: tarahim, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(2 files)

Patch to call nsIUnicodeEncoder::Finish(). 24 years ago, nhottanscp (deleted), patch
New patch, added a line to set a length before calling Finish. 24 years ago, nhottanscp (deleted), patch

hirata masakazu

Reporter

Description

•

24 years ago

In HTML composer, the Japanese character "〜" (WAVE DASH; 2141 in JIS, 301C in Unicode) inside a blockquote is converted to three garbage characters when the file is saved as ISO-2022-JP. Build: 2000110808 MTrunk.

nhottanscp

Assignee

Comment 1

•

24 years ago

I can reproduce this. Outside a blockquote it is okay, so it does not seem to be a converter problem. It looks like the character was converted to an NCR (numeric character reference):
<blockquote>$B$"$$$($*(B<br> $B$F$9$H〜%F%9%H(B<br> </blockquote>

Status: NEW → ASSIGNED

Katsuhiko Momoi

Comment 2

•

24 years ago

On Mac, I get this problem whether it is in a blockquote or in normal HTML text. I think the problem has to do with our decision to map Shift_JIS 0x8160 to FF5E (following the MS conversion map), whereas Mac maps it to 301C in Unicode. We discussed this problem elsewhere and decided to go with FF5E, knowing that there would be an incompatibility problem on Mac.
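The mismatch described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the two one-entry tables and the function names `EncodeMsStyle` and `EncodeMacStyle` are hypothetical and are not Mozilla's converter tables. The code points are the ones named in this comment.

```cpp
#include <cassert>
#include <cstdint>

// Illustration-only data; assumptions, not Mozilla's converter tables.
constexpr std::uint16_t kSjisWave = 0x8160;   // Shift_JIS code under discussion
constexpr char32_t kFullwidthTilde = 0xFF5E;  // target of the MS-style map
constexpr char32_t kWaveDash = 0x301C;        // target of the Mac-style map

// Each encoder only knows the code point that its own decoder produces.
// Returns the Shift_JIS code, or -1 when the table has no entry.
int EncodeMsStyle(char32_t c) { return c == kFullwidthTilde ? kSjisWave : -1; }
int EncodeMacStyle(char32_t c) { return c == kWaveDash ? kSjisWave : -1; }

// Usage: text typed on a Mac carries U+301C, which the MS-style table
// cannot encode, so the caller would fall back to a character reference.
inline void Example() {
  assert(EncodeMacStyle(kWaveDash) == 0x8160);
  assert(EncodeMsStyle(kWaveDash) == -1);
}
```

With a table keyed on FF5E only, a 301C entered through the Mac input method has no mapping, which would be consistent with the character being written out as an NCR.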

Katsuhiko Momoi

Comment 3

•

24 years ago

〜 is 301C in hex.

Katsuhiko Momoi

Comment 4

•

24 years ago

Did we fix our Unicode <-> ISO-2022-JP table to be consistent with our Unicode <-> Shift_JIS table?

nhottanscp

Assignee

Comment 5

•

24 years ago

I tried again and can reproduce it without a blockquote, so it happens generally on Macintosh. I get the charset warning on Mac when sending a mail containing that character. I was not involved with the old issue; Frank probably changed something, so I am cc'ing cata. If we can find the bug number of the old problem, I might be able to find his check-in for it.

Katsuhiko Momoi

Comment 6

•

24 years ago

http://bugzilla.mozilla.org/show_bug.cgi?id=35166 This is the bug where we changed mapping.

nhottanscp

Assignee

Comment 7

•

24 years ago

Here is his check-in for that bug. It is not easy to understand. We had better wait until he comes back... http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=ftang%25netscape.com&whotype=match&sortby=Date&hours=2&date=explicit&mindate=08%2F16%2F2000+17%3A03%3A00&maxdate=08%2F16%2F2000+17%3A04%3A00&cvsroot=%2Fcvsroot

nhottanscp

Assignee

Comment 8

•

24 years ago

Is this working fine on Unix?

Comment 9

•

24 years ago

I don't see the problem on a Linux build. (110807 Ja build)

Teruko Kobayashi

Comment 10

•

24 years ago

This does not happen on Win95J.

Teruko Kobayashi

Updated

•

24 years ago

Keywords: intl

nhottanscp

Assignee

Comment 11

•

24 years ago

Now I can reassign to Frank.

Assignee: nhotta → ftang

Status: ASSIGNED → NEW

Summary: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP → Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP

Frank Tang

Comment 12

•

24 years ago

This looks similar to bug 63841.
> <blockquote>$B$"$$$($*(B<br> $B$F$9$H〜%F%9%H(B<br> </blockquote>
Notice that there is no ESC + "(B" before 〜 and no ESC + "$B" after it. Reassigning this back to nhotta to debug. It looks like a dup of 63841.

Assignee: ftang → nhotta

Frank Tang

Updated

•

24 years ago

Keywords: nsbeta1

Priority: P3 → P2

Target Milestone: --- → mozilla0.8

Frank Tang

Comment 13

•

24 years ago

Mark it as P2 nsbeta1.

nhottanscp

Assignee

Comment 14

•

24 years ago

There could be a problem in the encoder client (e.g. not calling Finish() in the unmapped-character error case), which may cause the incorrect escape sequences. The mapping problem is a separate issue and should be corrected in the converter. I will take a look at the client side first.

nhottanscp

Assignee

Comment 15

•

24 years ago

I can reproduce this on Windows 2000 when I input \u301C using "Character Map" utility.

Status: NEW → ASSIGNED

nhottanscp

Assignee

Comment 16

•

24 years ago

Somehow nsISaveAsCharset is not used any more; charset conversion is now done in layout (nsDocumentEncoder.cpp, rev=1.35). I think it does not call Finish() in case of a conversion error. Adding jst to cc; I have another bug, 65324, caused by that change.

nhottanscp

Assignee

Comment 17

•

24 years ago

Reassign to jst. nsIUnicodeEncoder::Finish() has to be called in the NS_ERROR_UENC_NOMAPPING case.

nsDocumentEncoder.cpp (rev 1.35, jst), lines 431-436:

    if (convert_rv == NS_ERROR_UENC_NOMAPPING) {
      nsCAutoString entString("&#");
      entString.AppendInt(unicodeBuf[unicodeLength - 1]);
      entString.Append(';');

      rv = aStream->Write(entString.GetBuffer(), entString.Length(), &written);
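The effect of the missing Finish() call can be illustrated with a small self-contained C++ sketch. Everything in it is an assumption made for illustration: `MockJisEncoder`, `Status` and `EncodeWithNcrFallback` are hypothetical names, the encoder is a toy that maps only ASCII and the katakana テ, and it is not the real nsIUnicodeEncoder interface. It only models the idea that a stateful ISO-2022-JP encoder stays in two-byte mode when it reports an unmapped character.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a stateful ISO-2022-JP encoder (not Mozilla's).
enum class Status { Ok, NoMapping };

class MockJisEncoder {
 public:
  // Encodes one character and appends the bytes to out. The toy table maps
  // only ASCII and U+30C6 (JIS 0x2546, bytes "%F").
  Status Convert(char32_t c, std::string& out) {
    if (c < 0x80) {
      LeaveJis(out);
      out += static_cast<char>(c);
      return Status::Ok;
    }
    if (c == 0x30C6) {
      if (!in_jis_) { out += "\x1b$B"; in_jis_ = true; }
      out += "%F";
      return Status::Ok;
    }
    return Status::NoMapping;  // state untouched: still in two-byte mode
  }
  // Flushes pending state: returns to ASCII if two-byte mode is active.
  void Finish(std::string& out) { LeaveJis(out); }

 private:
  void LeaveJis(std::string& out) {
    if (in_jis_) { out += "\x1b(B"; in_jis_ = false; }
  }
  bool in_jis_ = false;
};

// Writes an NCR for unmapped characters. call_finish selects whether the
// encoder is flushed first, which is the change proposed in this comment.
std::string EncodeWithNcrFallback(const std::u32string& text, bool call_finish) {
  MockJisEncoder enc;
  std::string out;
  for (char32_t c : text) {
    if (enc.Convert(c, out) == Status::NoMapping) {
      if (call_finish) enc.Finish(out);
      out += "&#" + std::to_string(static_cast<unsigned long>(c)) + ";";
    }
  }
  enc.Finish(out);
  return out;
}

// Usage: with the flush, the entity is bracketed by ESC ( B and ESC $ B.
inline void Example() {
  const std::u32string text = U"\u30C6\u301C\u30C6";
  assert(EncodeWithNcrFallback(text, true) ==
         "\x1b$B%F\x1b(B&#12316;\x1b$B%F\x1b(B");
}
```

Without the flush, the same input yields the entity bytes inside the two-byte run, so a decoder would read them as JIS byte pairs rather than as ASCII, which matches the missing escape sequences noted in comment 12.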

Assignee: nhotta → jst

Status: ASSIGNED → NEW

Johnny Stenback (:jst)

Comment 18

•

24 years ago

Reassigning to anthonyd.

Assignee: jst → anthonyd

nhottanscp

Assignee

Comment 19

•

24 years ago

Filed JIS encoder problem for \u301C as bug 65991.

rubydoo123

Comment 20

•

24 years ago

moving this to moz0.9

Target Milestone: mozilla0.8 → mozilla0.9

anthonyd

Comment 21

•

24 years ago

As near as I can tell (though I have no way to reproduce this), there is some character that isn't being converted correctly on the Mac. The solution, as stated in the bug, is to call nsIUnicodeEncoder::Finish(...), BUT this method has never been implemented at all. Can someone please explain to me what is supposed to be done here, and why implementing and then calling this method will fix this bug? anthonyd

nhottanscp

Assignee

Comment 22

•

24 years ago

Please look at the header file for information on nsIUnicodeEncoder::Finish. http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIUnicodeEncoder.h#128 See below for the implementation. http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvja/nsUCvJaSupport.cpp#543

nhottanscp

Assignee

Comment 23

•

24 years ago

Reassign to nhotta.

Assignee: anthonyd → nhotta

nhottanscp

Assignee

Comment 24

•

24 years ago

Attached patch: Patch to call nsIUnicodeEncoder::Finish(). (deleted)

nhottanscp

Assignee

Comment 25

•

24 years ago

jst, could you do a review for the patch?

Status: NEW → ASSIGNED

Johnny Stenback (:jst)

Comment 26

•

24 years ago

The one thing that concerns me about the patch is: char finish_buf[32]; I don't see any code that guarantees that we won't write past the bounds of this buffer. Is Finish guaranteed to never ever write more than 31 characters into the output buffer? If that's guaranteed to be ok, then r=jst.

nhottanscp

Assignee

Comment 27

•

24 years ago

I forgot to set a length before calling Finish(); I will attach a patch.

nhottanscp

Assignee

Comment 28

•

24 years ago

Attached patch: new patch, added a line to set a length before calling Finish. (deleted)

vidur (gone)

Comment 29

•

24 years ago

If we call Finish() to get the converter to write out an escape sequence for the character, do we still need to write out a numerical character entity for it as well? We're also presuming that the result of the call to Finish() relates to just the character that triggered the error. Is this a safe assumption? Could the call to Finish() actually require more space than the stack-based buffer and generate an NS_OK_UENC_MOREOUTPUT error code?

nhottanscp

Assignee

Comment 30

•

24 years ago

The entity needs to be written out after the escape sequence since the character was not mapped to the target charset. NS_ERROR_UENC_NOMAPPING is returned when a character could not be mapped from Unicode. In that case, we always need to call Finish(). Calling Finish() when it is not needed just causes extra escape sequences to be written out and does not corrupt the output. As for the buffer size, we could loop and increase the buffer, but the output is small (3 bytes for ISO-2022-JP), so in practice it is not a problem.
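For reference, the "loop and increase the buffer" alternative mentioned here might look like the following C++ sketch. The names are hypothetical (`MockEncoder`, `Status::MoreOutput`, `FinishGrowing`) and the stub is not the real encoder; it assumes a Finish() whose length argument is in/out (capacity on entry, bytes written on return) and which reports a "more output" status when the buffer is too small. It also shows why the length has to be set before each call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

enum class Status { Ok, MoreOutput };

// Hypothetical stub: pretends to be in two-byte mode and needs to emit the
// 3-byte sequence ESC ( B to get back to ASCII.
class MockEncoder {
 public:
  // *len is in/out: capacity of dest on entry, bytes written on return.
  Status Finish(char* dest, int* len) {
    static const char kToAscii[] = {0x1b, '(', 'B'};
    const int needed = pending_ ? 3 : 0;
    if (*len < needed) {
      *len = 0;
      return Status::MoreOutput;  // nothing written, state kept for a retry
    }
    if (needed > 0) std::memcpy(dest, kToAscii, static_cast<std::size_t>(needed));
    *len = needed;
    pending_ = false;
    return Status::Ok;
  }

 private:
  bool pending_ = true;
};

// Calls Finish() with a growing buffer until the output fits.
std::string FinishGrowing(MockEncoder& enc, int initial_capacity) {
  assert(initial_capacity > 0);
  std::vector<char> buf(static_cast<std::size_t>(initial_capacity));
  for (;;) {
    int len = static_cast<int>(buf.size());  // must be reset before every call
    if (enc.Finish(buf.data(), &len) == Status::Ok)
      return std::string(buf.data(), static_cast<std::size_t>(len));
    buf.resize(buf.size() * 2);
  }
}
```

Since the expected output is only 3 bytes for ISO-2022-JP, a fixed 32-byte stack buffer with the length set beforehand covers the practical case; the loop only matters if an encoder could ever need more.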

nhottanscp

Assignee

Comment 31

•

24 years ago

Vidur, did my comment make sense? Do you have other comments?

nhottanscp

Assignee

Updated

•

24 years ago

Blocks: 63841

nhottanscp

Assignee

Updated

•

24 years ago

Keywords: review

nhottanscp

Assignee

Comment 32

•

24 years ago

It was reviewed and checked in as a part of bug 65324. JIS encoder problem for \u301C is bug 65991.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Teruko Kobayashi

Comment 33

•

24 years ago

Verified as fixed in 3-01 build.

Status: RESOLVED → VERIFIED
