Closed Bug 59679 Opened 24 years ago Closed 24 years ago

Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP

Categories

(Core :: Internationalization, defect, P2)

PowerPC
Mac System 9.x
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: tarahim, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(2 files)

In HTML Composer, the Japanese character "〜" (0x2141 in JIS, U+301C in Unicode) inside a blockquote is converted to three garbage characters when the file is saved as ISO-2022-JP. Build: 2000110808 MTrunk.
I can reproduce this. Outside a blockquote it is fine, so it does not seem to be a converter problem. It looks like the character was converted to an NCR: <blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote>
Status: NEW → ASSIGNED
On Mac, I get this problem whether it is in a blockquote or in normal HTML text. I think the problem has to do with our decision to map Shift_JIS 0x8160 to FF5E (following the MS conversion map), whereas Mac maps it to 301C in Unicode. We discussed this problem elsewhere and decided to go with FF5E, knowing that there would be an incompatibility problem on Mac.
&#12316; is 301C in hex.
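The decimal-to-hex check above can be reproduced with a small helper. This is only an illustrative sketch: ParseDecimalNcr and FormatCodePoint are hypothetical names, not functions from the Mozilla tree.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: parse a decimal numeric character reference such as
// "&#12316;" and return the code point it names, or -1 if the text does not
// have that form.
long ParseDecimalNcr(const std::string& ncr) {
  // Shortest valid form is "&#0;" (4 chars); cap the length to avoid overflow.
  if (ncr.size() < 4 || ncr.size() > 10) return -1;
  if (ncr.compare(0, 2, "&#") != 0 || ncr.back() != ';') return -1;
  long value = 0;
  for (std::size_t i = 2; i + 1 < ncr.size(); ++i) {
    char c = ncr[i];
    if (c < '0' || c > '9') return -1;
    value = value * 10 + (c - '0');
  }
  return value;
}

// Hypothetical helper: format a code point in U+XXXX notation.
std::string FormatCodePoint(long cp) {
  assert(cp >= 0);
  char buf[16];
  std::snprintf(buf, sizeof buf, "U+%04lX", static_cast<unsigned long>(cp));
  return buf;
}
```

FormatCodePoint(ParseDecimalNcr("&#12316;")) yields "U+301C", which is the wave dash the Mac input produces.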
Did we fix our Unicode <-> ISO-2022-JP to be consistent with our Unicode <-> Shift_JIS table?
I tried again and I can reproduce it without a blockquote, so it happens generally on Macintosh. I get the charset warning on Mac when sending a mail with that character. I was not involved with the old issue; probably Frank changed something, so cc'ing cata. If we can find the bug number of the old problem, I might be able to find his check-in for it.
http://bugzilla.mozilla.org/show_bug.cgi?id=35166 This is the bug where we changed mapping.
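A rough sketch of how the mapping choice could produce an unmapped character on Mac. The one-entry tables below are hypothetical stand-ins, not the real uconv tables, and it is an assumption (raised as a question in this bug, not confirmed) that the Unicode to ISO-2022-JP table only knows FF5E. The "tolerant" table is shown purely for contrast and is not claimed to be the fix made in any bug.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical: which vendor convention decodes Shift_JIS 0x8160.
enum class Vendor { Microsoft, Mac };

// Shift_JIS 0x8160 decodes to U+FF5E under the MS map and to U+301C on Mac,
// as described in the comments above.
char32_t DecodeSjis8160(Vendor v) {
  return v == Vendor::Microsoft ? static_cast<char32_t>(0xFF5E)
                                : static_cast<char32_t>(0x301C);
}

// Toy Unicode -> JIS X 0208 lookup. Returns 0 when the code point is unmapped.
uint16_t EncodeToJis(const std::map<char32_t, uint16_t>& table, char32_t cp) {
  auto it = table.find(cp);
  return it == table.end() ? 0 : it->second;
}

// Assumed MS-style encoder table: only U+FF5E reaches JIS 0x2141.
const std::map<char32_t, uint16_t> kMsStyleTable = {{0xFF5E, 0x2141}};

// Illustrative alternative that accepts both code points.
const std::map<char32_t, uint16_t> kTolerantTable = {{0xFF5E, 0x2141},
                                                     {0x301C, 0x2141}};
```

With kMsStyleTable, the character a Mac produces (U+301C) has no mapping, which is the situation where the document encoder falls back to writing &#12316;.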
Is this working fine on Unix?
I don't see the problem on a linux build. (110807 Ja build)
This does not happen on Win95J.
Keywords: intl
Now I can reassign to Frank.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Summary: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP → Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP
This looks similar to bug 63841. > <blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote> Notice that there is no ESC + "(B" before &#12316; and no ESC + "$B" after it. Reassigning this back to nhotta to debug. It looks like a dup of 63841.
Assignee: ftang → nhotta
Keywords: nsbeta1
Priority: P3 → P2
Target Milestone: --- → mozilla0.8
Mark it as P2 nsbeta1.
There could be a problem in the encoder client (e.g. not calling Finish() in the unmapped-error case), which may cause the incorrect escape sequences. But the mapping problem is a separate issue and should be corrected in the converter. I will take a look at the client side first.
I can reproduce this on Windows 2000 when I input \u301C using "Character Map" utility.
Status: NEW → ASSIGNED
Somehow nsISaveAsCharset is not used any more; charset conversion is now done in layout (nsDocumentEncoder.cpp, rev=1.35). I think it does not call Finish() on a conversion error. Adding jst to cc; I have another bug, 65324, caused by that change.
Reassign to jst. nsIUnicodeEncoder::Finish() has to be called in case of NS_ERROR_UENC_NOMAPPING. From nsDocumentEncoder.cpp (rev 1.35, jst), starting at line 431:

    if (convert_rv == NS_ERROR_UENC_NOMAPPING) {
      nsCAutoString entString("&#");
      entString.AppendInt(unicodeBuf[unicodeLength - 1]);
      entString.Append(';');

      rv = aStream->Write(entString.GetBuffer(), entString.Length(), &written);
Assignee: nhotta → jst
Status: ASSIGNED → NEW
Reassigning to anthonyd.
Assignee: jst → anthonyd
Filed JIS encoder problem for \u301C as bug 65991.
moving this to moz0.9
Target Milestone: mozilla0.8 → mozilla0.9
As near as I can tell (though I have no way to reproduce this), there is some character that isn't being converted correctly on the Mac. The solution, as stated in the bug, is to call nsIUnicodeEncoder::Finish(...), BUT this method has never been implemented at all. Can someone please explain to me what is supposed to be done here, and why implementing and then calling this method will fix this bug? anthonyd
Please look at the header file for information on nsIUnicodeEncoder::Finish. http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIUnicodeEncoder.h#128 See below for the implementation. http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvja/nsUCvJaSupport.cpp#543
Reassign to nhotta.
Assignee: anthonyd → nhotta
jst, could you do a review for the patch?
Status: NEW → ASSIGNED
The one thing that concerns me about the patch is: char finish_buf[32]; I don't see any code that guarantees that we won't write past the bounds of this buffer. Is Finish guaranteed never to write more than 31 characters into the output buffer? If that's guaranteed to be ok, then r=jst.
I forgot to set a length before calling Finish(); I will attach a patch.
If we call Finish() to get the converter to write out an escape sequence for the character, do we still need to write out a numerical character entity for it as well? We're also presuming that the result of the call to Finish() relates to just the character that triggered the error. Is this a safe assumption? Could the call to Finish() actually require more space than the stack-based buffer and generate an NS_OK_UENC_MOREOUTPUT error code?
The entity needs to be written out after the escape sequence, since the character was not mapped to the target charset. NS_ERROR_UENC_NOMAPPING is returned when a character could not be mapped from Unicode. In that case we always need to call Finish(). Even when it is not needed, calling Finish() just causes extra escape sequences to be written out and does not corrupt the output. As for the buffer size, we could loop and grow the buffer, but the output is small (3 bytes for ISO-2022-JP), so in practice it is not a problem.
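To make the ordering concrete, here is a minimal, self-contained sketch of the pattern discussed above: on an unmapped character, flush the encoder state with Finish() (passing the buffer capacity in the length argument), then write the numeric entity. ToyIso2022JpEncoder and EncodeWithNcrFallback are hypothetical and only mimic the shape of the real interface; this is not the actual nsIUnicodeEncoder API or the checked-in patch. The toy encoder knows only ASCII and U+3042.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy encoder with ISO-2022-JP style shift state.
class ToyIso2022JpEncoder {
 public:
  enum Result { kOk, kNoMapping, kMoreOutput };

  // Encode one code point, appending bytes to out.
  Result Convert(char32_t cp, std::string& out) {
    if (cp < 0x80) {
      if (in_jis_) { out += "\x1b(B"; in_jis_ = false; }  // back to ASCII
      out += static_cast<char>(cp);
      return kOk;
    }
    if (cp == 0x3042) {                                   // JIS 0x2422
      if (!in_jis_) { out += "\x1b$B"; in_jis_ = true; }  // enter JIS X 0208
      out += "\x24\x22";
      return kOk;
    }
    return kNoMapping;  // shift state is left as it was
  }

  // On entry *length is the capacity of dest; on exit, the bytes written.
  Result Finish(char* dest, int* length) {
    assert(dest && length);
    if (!in_jis_) { *length = 0; return kOk; }
    if (*length < 3) { *length = 0; return kMoreOutput; }
    dest[0] = 0x1b; dest[1] = '('; dest[2] = 'B';
    *length = 3;
    in_jis_ = false;
    return kOk;
  }

 private:
  bool in_jis_ = false;
};

// Encode text, replacing unmapped characters with decimal NCRs.
std::string EncodeWithNcrFallback(const std::u32string& text) {
  ToyIso2022JpEncoder enc;
  std::string out;
  char finish_buf[32];
  for (char32_t cp : text) {
    if (enc.Convert(cp, out) == ToyIso2022JpEncoder::kNoMapping) {
      int len = static_cast<int>(sizeof finish_buf);  // set capacity first
      enc.Finish(finish_buf, &len);                   // return to ASCII
      out.append(finish_buf, len);
      out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
    }
  }
  int len = static_cast<int>(sizeof finish_buf);
  enc.Finish(finish_buf, &len);  // flush any pending shift at end of input
  out.append(finish_buf, len);
  return out;
}
```

For the input U+3042 U+301C U+3042 'a', the entity ends up bracketed by ESC ( B before it and ESC $ B after it, which is exactly what was missing in the saved file quoted earlier. Calling Finish() while already in ASCII writes nothing in this toy, matching the point that an unneeded call does not corrupt the output.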
Vidur, did my comment make sense, do you have other comments?
Blocks: 63841
Keywords: review
It was reviewed and checked in as a part of bug 65324. JIS encoder problem for \u301C is bug 65991.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verified as fixed in 3-01 build.
Status: RESOLVED → VERIFIED