Closed
Bug 59679
Opened 24 years ago
Closed 24 years ago
Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP
Categories
(Core :: Internationalization, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla0.9
People
(Reporter: tarahim, Assigned: nhottanscp)
References
Details
(Keywords: intl)
Attachments
(2 files)
(deleted), patch
(deleted), patch
In HTML composer, the Japanese character "〜" (2141 in JIS, 301C in Unicode) inside a
blockquote is converted to three garbage characters when the file is saved as
ISO-2022-JP.
2000110808 MTrunk.
Comment 1•24 years ago (Assignee)
I can reproduce this. Outside a blockquote it is okay, so it doesn't seem to be a
converter problem.
It looks like the character was converted to an NCR:
<blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote>
Status: NEW → ASSIGNED
Comment 2•24 years ago
On Mac, I get this problem whether the character is in a blockquote
or in normal HTML text.
I think the problem has to do with our decision to map
Shift_JIS 0x8160 to FF5E (following the MS conversion map), whereas Mac
maps it to 301C in Unicode. We discussed this problem elsewhere
and decided to go with FF5E, knowing that there would be an incompatibility
problem on Mac.
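The mismatch described in this comment can be sketched as follows. This is a minimal illustration, not the Mozilla converter code: the two tables are hypothetical and hold only the single entry under discussion, and `encodeToSjis` is an invented helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical one-entry tables; real conversion tables are far larger.
using SjisTable = std::map<uint16_t, char16_t>;

// MS-style map: Shift_JIS 0x8160 <-> U+FF5E (FULLWIDTH TILDE).
SjisTable msStyleTable() {
  SjisTable t;
  t[0x8160] = 0xFF5E;
  return t;
}

// Mac-style map: Shift_JIS 0x8160 <-> U+301C (WAVE DASH).
SjisTable macStyleTable() {
  SjisTable t;
  t[0x8160] = 0x301C;
  return t;
}

// Reverse lookup: returns true and stores the Shift_JIS code in *out when
// the table contains the Unicode character, false when it is unmapped.
bool encodeToSjis(const SjisTable& table, char16_t ch, uint16_t* out) {
  assert(out != nullptr);
  for (const auto& entry : table) {
    if (entry.second == ch) {
      *out = entry.first;
      return true;
    }
  }
  return false;
}
```

With these tables, U+301C typed on a Mac has no entry in the MS-style table, so an encoder built on that table reports it as unmapped, while U+FF5E still encodes to 0x8160.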
Comment 3•24 years ago
&#12316; is 301C in hex.
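As a quick check of that arithmetic, a decimal numeric character reference for U+301C can be built like this. The helper name `toDecimalNcr` is an assumption made for illustration and does not appear in the Mozilla source.

```cpp
#include <cassert>
#include <string>

// Build a decimal numeric character reference, e.g. "&#12316;" for 0x301C.
std::string toDecimalNcr(unsigned long codePoint) {
  assert(codePoint <= 0x10FFFF);  // must lie within the Unicode range
  return "&#" + std::to_string(codePoint) + ";";
}
```

Here 0x301C is 3 * 4096 + 1 * 16 + 12 = 12316, which is the number that shows up in the saved file.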
Comment 4•24 years ago
Did we fix our Unicode <-> ISO-2022-JP to be consistent
with our Unicode <-> Shift_JIS table?
Comment 5•24 years ago (Assignee)
I tried again and I can reproduce it without a blockquote, so it happens generally
on Macintosh. I also get the charset warning on Mac when sending a mail with
that character.
I was not involved with the old issue; Frank probably changed something. Cc'ing
cata.
If we can find the bug number of the old problem, I might be able to find
his check-in for it.
Comment 6•24 years ago
http://bugzilla.mozilla.org/show_bug.cgi?id=35166
This is the bug where we changed mapping.
Comment 7•24 years ago (Assignee)
Here is his check-in for that bug. It is not easy to understand.
We had better wait until he comes back...
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=ftang%25netscape.com&whotype=match&sortby=Date&hours=2&date=explicit&mindate=08%2F16%2F2000+17%3A03%3A00&maxdate=08%2F16%2F2000+17%3A04%3A00&cvsroot=%2Fcvsroot
Comment 8•24 years ago (Assignee)
Is this working fine on Unix?
Comment 10•24 years ago
This does not happen on Win95J.
Comment 11•24 years ago (Assignee)
Now I can reassign to Frank.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Summary: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP → Mac: A Japanese character in blockquote is ill-converted if Saved as ISO-2022-JP
Comment 12•24 years ago
This looks similar to bug 63841.
><blockquote>$B$"$$$($*(B<br> $B$F$9$H&#12316;%F%9%H(B<br> </blockquote>
Notice that there is no ESC + "(B" before &#12316; and no ESC
+ "$B" after it.
Reassigning this back to nhotta to debug.
It looks like a dup of 63841.
Assignee: ftang → nhotta
Updated•24 years ago
Comment 13•24 years ago
Mark it as P2 nsbeta1.
Comment 14•24 years ago (Assignee)
There could be a problem in the encoder client (e.g. not calling Finish() in the
unmapped-character error case), which may cause the incorrect escape sequences.
The mapping problem is a separate issue and should be corrected in the
converter.
I will take a look at the client side first.
Comment 15•24 years ago (Assignee)
I can reproduce this on Windows 2000 when I input \u301C
using "Character Map" utility.
Status: NEW → ASSIGNED
Comment 16•24 years ago (Assignee)
Somehow nsISaveAsCharset is not used any more; charset conversion is now done in
layout (nsDocumentEncoder.cpp, rev=1.35).
I think it does not call Finish() in case of a conversion error.
Adding jst to cc; I have another bug, 65324, caused by that change.
Comment 17•24 years ago (Assignee)
Reassigning to jst: nsIUnicodeEncoder::Finish() has to be called in case of
NS_ERROR_UENC_NOMAPPING.
nsDocumentEncoder.cpp (rev 1.35, jst, lines 431-436):
    if (convert_rv == NS_ERROR_UENC_NOMAPPING) {
      nsCAutoString entString("&#");
      entString.AppendInt(unicodeBuf[unicodeLength - 1]);
      entString.Append(';');

      rv = aStream->Write(entString.GetBuffer(),
                          entString.Length(), &written);
Assignee: nhotta → jst
Status: ASSIGNED → NEW
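To illustrate why the missing Finish() call corrupts the output, here is a minimal sketch of a stateful encoder and a caller that falls back to an NCR. Everything in it is an assumption made for illustration: `ToyIso2022JpEncoder` and `encodeWithNcrFallback` are invented names, the toy maps only ASCII and hiragana, and it is not the nsIUnicodeEncoder interface or the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stateful encoder: ASCII plus hiragana (U+3041..U+3093) only.
class ToyIso2022JpEncoder {
 public:
  enum Result { kOk, kNoMapping };

  // Encodes src from *pos onward, appending bytes to *out. On an unmappable
  // character it stops with *pos just past that character.
  Result Convert(const std::u16string& src, std::size_t* pos, std::string* out) {
    while (*pos < src.size()) {
      const char16_t ch = src[(*pos)++];
      if (ch < 0x80) {
        SwitchToAscii(out);
        out->push_back(static_cast<char>(ch));
      } else if (ch >= 0x3041 && ch <= 0x3093) {
        if (!inJis_) {
          out->append("\x1b$B");  // ESC $ B: switch to JIS X 0208
          inJis_ = true;
        }
        out->push_back('$');  // 0x24, the hiragana row
        out->push_back(static_cast<char>(0x21 + (ch - 0x3041)));
      } else {
        return kNoMapping;
      }
    }
    return kOk;
  }

  // Returns the stream to ASCII if it is currently in JIS X 0208 mode.
  void Finish(std::string* out) { SwitchToAscii(out); }

 private:
  void SwitchToAscii(std::string* out) {
    if (inJis_) {
      out->append("\x1b(B");  // ESC ( B: switch back to ASCII
      inJis_ = false;
    }
  }
  bool inJis_ = false;
};

// Caller that writes "&#NNNN;" for unmappable characters. When callFinish is
// false it reproduces the bug: the ASCII entity lands inside JIS mode.
std::string encodeWithNcrFallback(const std::u16string& src, bool callFinish) {
  ToyIso2022JpEncoder enc;
  std::string out;
  std::size_t pos = 0;
  while (enc.Convert(src, &pos, &out) == ToyIso2022JpEncoder::kNoMapping) {
    assert(pos > 0);
    if (callFinish) enc.Finish(&out);
    out += "&#" + std::to_string(static_cast<unsigned>(src[pos - 1])) + ";";
  }
  enc.Finish(&out);
  return out;
}
```

With callFinish set to false, the entity's ASCII bytes are emitted while the stream is still in JIS X 0208 mode and no ESC $ B follows them, which matches the missing escape sequences noted in comment 12.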
Comment 19•24 years ago (Assignee)
Filed JIS encoder problem for \u301C as bug 65991.
Comment 21•24 years ago
As near as I can tell (though I have no way to reproduce this), there is some
character that isn't being converted correctly on the Mac. The
solution, as stated in the bug, is to call:
nsIUnicodeEncoder::Finish(...)
BUT this method has never been implemented at all. Can someone please explain
to me what is supposed to be done here, and why implementing and then calling
this method will fix this bug?
anthonyd
Comment 22•24 years ago (Assignee)
Please look at the header file for information on nsIUnicodeEncoder::Finish.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/public/nsIUnicodeEncoder.h#128
See below for the implementation.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvja/nsUCvJaSupport.cpp#543
Comment 24•24 years ago (Assignee)
Comment 26•24 years ago
The one thing that concerns me about the patch is:
char finish_buf[32];
I don't see any code that guarantees that we won't write past the bounds of this
buffer. Is Finish guaranteed never to write more than 31 characters into
the output buffer?
If that's guaranteed to be ok, then r=jst.
Comment 27•24 years ago (Assignee)
I forgot to set a length before calling Finish(), I will attach a patch.
Comment 28•24 years ago (Assignee)
Comment 29•24 years ago
If we call Finish() to get the converter to write out an escape sequence for the
character, do we still need to write out a numerical character entity for it as
well? We're also presuming that the result of the call to Finish() relates to
just the character that triggered the error. Is this a safe assumption? Could
the call to Finish() actually require more space than the stack-based buffer and
generate an NS_OK_UENC_MOREOUTPUT error code?
Comment 30•24 years ago (Assignee)
The entity needs to be written out after the escape sequence since the character
was not mapped to the target charset.
NS_ERROR_UENC_NOMAPPING is returned when a character could not be mapped from
Unicode. In that case, we always need to call Finish(). Even calling Finish()
when it is not needed just causes extra escape sequences to be written out and
does not corrupt the output.
About the buffer size: we could loop and increase the buffer, but the output is
small (3 bytes for ISO-2022-JP), so in practice it is not a problem.
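For the buffer-size question, a defensive caller might look roughly like the sketch below. It is only an illustration under stated assumptions: `ToyFinisher`, `ToyResult` and `drainFinish` are invented names, the length parameter is modelled as capacity in and bytes written out, and the loop reuses one fixed-size buffer until nothing is pending rather than growing it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

enum ToyResult { kToyOk, kToyMoreOutput };

// Hypothetical stand-in for a stateful encoder's Finish(): it copies as many
// pending bytes as fit and reports whether more output remains.
class ToyFinisher {
 public:
  explicit ToyFinisher(const std::string& pending) : pending_(pending) {}

  // On entry *destLen is the capacity of dest; on return it holds the number
  // of bytes actually written.
  ToyResult Finish(char* dest, int* destLen) {
    assert(dest != nullptr && destLen != nullptr && *destLen >= 0);
    const std::size_t n =
        std::min(pending_.size(), static_cast<std::size_t>(*destLen));
    std::memcpy(dest, pending_.data(), n);
    pending_.erase(0, n);
    *destLen = static_cast<int>(n);
    return pending_.empty() ? kToyOk : kToyMoreOutput;
  }

 private:
  std::string pending_;
};

// Calls Finish() repeatedly, resetting the length before every call, until
// the encoder has nothing left to write.
std::string drainFinish(ToyFinisher* enc, int bufSize) {
  assert(enc != nullptr && bufSize > 0);
  std::string out;
  std::string buf(static_cast<std::size_t>(bufSize), '\0');
  ToyResult rv;
  do {
    int len = bufSize;  // capacity in, bytes written out
    rv = enc->Finish(&buf[0], &len);
    out.append(buf.data(), static_cast<std::size_t>(len));
  } while (rv == kToyMoreOutput);
  return out;
}
```

For the 3-byte ISO-2022-JP escape a 32-byte buffer is drained in one pass, so the loop only matters for an encoder that has more pending output than the buffer holds.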
Comment 31•24 years ago (Assignee)
Vidur, did my comment make sense, do you have other comments?
Comment 32•24 years ago (Assignee)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED