Closed
Bug 153855
Opened 23 years ago
Closed 19 years ago
Composer does not display a UTF-16 signature in the right way.
Categories
(MailNews Core :: Internationalization, defect, P1)
MailNews Core
Internationalization
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.8.1beta2
People
(Reporter: tjibbe, Assigned: masayuki)
References
Details
(Keywords: intl, verified1.8.1)
Attachments
(2 files, 4 obsolete files)
(deleted), text/plain | Details
(deleted), patch | jshin1987: review+, mscott: superreview+, beltzner: approval1.8.1+ | Details | Diff | Splinter Review
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1a) Gecko/20020611
BuildID: 20002061104
When using a UTF-8 encoded file as signature, composer only shows the first
three characters of the file, as if the signature file is not converted.
Reproducible: Always
Steps to Reproduce:
1. Write a signature file and save it as UTF-8.
2. Tell Mozilla to use this file for your signature.
3. Compose a message.
Actual Results: With my signature file, the following three lines are added in
the composer window:
--
ÿþM
Expected Results: It should display as:
--
Met vriendelijke groeten,
Tjibbe Steneker.
The character "ij" is only available in the Unicode charset.
Reporter | ||
Comment 1•23 years ago
|
||
I just attached the signature file I use.
Reporter | ||
Comment 2•23 years ago
|
||
As per bug 135762, please replace &307; in the original report with the U+0133
(LATIN SMALL LIGATURE IJ) character.
Comment 3•23 years ago
|
||
Comment 4•22 years ago
|
||
This happens in Linux, too. The signature's interpreted as ISO-8859-1 for some
reason. Even when I start the program with a UTF-8 locale.
This seems to be some old source code that has never been modified. =P
Comment 5•22 years ago
|
||
Dave Oftedal, bug 52248 appears to address the BSD (and presumably Linux/*nix)
sig encoding issue; there is some discussion there for workarounds by setting
the system locale to use UTF-8. (This is for plain-text sigs; HTML sigs are bug
138008.)
See also bug 180985, which appears to be about using UTF-8 (or other encoding)
for the filename of the sig.
I could not find a dupe specific to Windows for plain-text UTF-8 sigs, so,
confirming. Attachment 1 still fails, as described, even if the default
encoding is UTF-8, with 1.4-RC1.
Status: UNCONFIRMED → NEW
Component: Composition → Internationalization
Ever confirmed: true
Keywords: intl
Comment 6•20 years ago
|
||
It's still visible in Thunderbird version 0.9 (20041103)
Do we have a chance to get this into the aviary branch?
Flags: blocking-aviary1.0?
Updated•20 years ago
|
Product: MailNews → Core
Comment 7•20 years ago
|
||
Too late for the 1.0 train now, since there is no patch yet and this is not a
stopper.
Flags: blocking-aviary1.0? → blocking-aviary1.0-
Assignee | ||
Comment 8•19 years ago
|
||
This will be fixed by the latest patch for bug 201071.
Assignee: ducarroz → masayuki
Depends on: 201071
Flags: blocking-thunderbird2?
OS: Windows 98 → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.8.1alpha2
Assignee | ||
Updated•19 years ago
|
Status: NEW → ASSIGNED
Comment 9•19 years ago
|
||
I also have this problem with Thunderbird 1.5 (20051201).
I have a UTF-8 text file containing Simplified Chinese characters.
When I use UTF-8 as the default encoding for my outgoing mail, the
signature becomes "double-converted": it is converted again even
though it already has the right encoding. This behaviour makes it
impossible to add special characters to a signature.
I can also reproduce this with German umlauts from ISO-8859-1.
If I add them to my UTF-8 file, they are also "double-converted".
A possible fix would be to let people set the signature encoding
for each mail account. If the signature encoding differs from the
email encoding, the signature would be converted.
Updated•19 years ago
|
Flags: blocking-thunderbird2? → blocking-thunderbird2+
Assignee | ||
Comment 10•19 years ago
|
||
This is fixed on both trunk and 1.8 branch by bug 201071.
Now, we can use UTF-8 signature file on all platforms.
Comment 11•19 years ago
|
||
(In reply to comment #10)
> This is fixed on both trunk and 1.8 branch by bug 201071.
> Now, we can use UTF-8 signature file on all platforms.
Masayuki Nakano, are you sure this is fixed? Using today's Thunderbird trunk build (3a1-0418), Win2K, and the sig attached to this bug, I'm getting the same original symptom.
Assignee | ||
Comment 12•19 years ago
|
||
Mike:
the attached file is not UTF-8, that is encoded as UTF-16LE.
Comment 13•19 years ago
|
||
Because the signature file attached by the reporter is in UTF-16LE. This bug is not about UTF-8 signature files but about UTF-16 signature files. On non-Windows platforms, the chance is low that somebody makes a signature file in UTF-16, but on Windows, it's more likely.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Composer does not display a UTF-8 signature in the right way. → Composer does not display a UTF-16 signature in the right way.
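The symptom from the original report can be reproduced in isolation. The sketch below assumes two things the thread does not state outright: that the bytes were treated as ISO-8859-1 (as comment 4 suggests) and that the text was cut at the first 0x00 byte, as a C-string code path would do. `Latin1UntilNul` and `kSigStart` are hypothetical names, not Mozilla code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: interpret bytes as ISO-8859-1 and return them as
// UTF-8, stopping at the first NUL byte like a C-string based code path.
std::string Latin1UntilNul(const unsigned char* bytes, std::size_t len) {
  std::string out;
  for (std::size_t i = 0; i < len && bytes[i] != 0x00; ++i) {
    const unsigned char b = bytes[i];
    if (b < 0x80) {
      out.push_back(static_cast<char>(b));
    } else {
      // ISO-8859-1 bytes 0x80..0xFF map to U+0080..U+00FF (two UTF-8 bytes).
      out.push_back(static_cast<char>(0xC0 | (b >> 6)));
      out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
    }
  }
  return out;
}

// First bytes of a UTF-16LE file with a BOM whose text starts with "Met".
const unsigned char kSigStart[] = {0xFF, 0xFE, 0x4D, 0x00,
                                   0x65, 0x00, 0x74, 0x00};
```

Under those assumptions, the BOM bytes FF FE become "ÿþ", the low byte of "M" follows, and the high byte 0x00 of "M" ends the string, which matches the three characters in the report.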
Assignee | ||
Comment 14•19 years ago
|
||
Jungshik:
Do you have an idea of a way which checks whether the buffer is UTF-16?
Do we check only BOM?
Keywords: fixed1.8.1
Comment 15•19 years ago
|
||
(In reply to comment #14)
> Do you have an idea of a way which checks whether the buffer is UTF-16?
> Do we check only BOM?
That's tough with a plain text file _without_ any formatting. On Windows, checking for a BOM at the beginning and looking for embedded 0x00's (especially '0x0D 0x00 0x0A 0x00' == CRLF or '0x20 0x00' falling at a 'word' boundary) may work in most cases. An additional check may be to see whether the buffer can be round-tripped between UTF-16 and the default code page. None of these is very strong when tested by itself (the last one is especially weak), but combined they can be rather robust.
Anyway, I don't think it's a major bug, or even a 'normal' one.
Severity: major → minor
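The combined check suggested in comment 15 could look roughly like the sketch below. It covers only the BOM and embedded-0x00 hints for little-endian text, leaves out the code-page round trip, and the "at least two hints" threshold is an arbitrary assumption. `Utf16Hints`, `CollectUtf16Hints` and `LooksLikeUtf16` are hypothetical names and are not part of any patch on this bug.

```cpp
#include <cstddef>

// Hypothetical record of the hints listed in comment 15.
struct Utf16Hints {
  bool hasBom;      // FF FE or FE FF at the start of the buffer
  bool hasCrlfLE;   // 0D 00 0A 00 starting on an even offset
  bool hasSpaceLE;  // 20 00 starting on an even offset
};

Utf16Hints CollectUtf16Hints(const unsigned char* buf, std::size_t len) {
  Utf16Hints h = {false, false, false};
  if (len >= 2) {
    h.hasBom = (buf[0] == 0xFF && buf[1] == 0xFE) ||
               (buf[0] == 0xFE && buf[1] == 0xFF);
  }
  // Walk the buffer one 16-bit code unit at a time.
  for (std::size_t i = 0; i + 1 < len; i += 2) {
    if (buf[i] == 0x20 && buf[i + 1] == 0x00) {
      h.hasSpaceLE = true;
    }
    if (i + 3 < len && buf[i] == 0x0D && buf[i + 1] == 0x00 &&
        buf[i + 2] == 0x0A && buf[i + 3] == 0x00) {
      h.hasCrlfLE = true;
    }
  }
  return h;
}

// Assumed policy: no single hint is trusted, two or more are.
bool LooksLikeUtf16(const Utf16Hints& h) {
  const int count = (h.hasBom ? 1 : 0) + (h.hasCrlfLE ? 1 : 0) +
                    (h.hasSpaceLE ? 1 : 0);
  return count >= 2;
}
```

Requiring more than one hint reflects the remark that each test is weak on its own. A BOM-only check, as proposed in comment 16, is simpler and is what the reviewed patch ended up doing.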
Comment 16•19 years ago
|
||
I don't think we should do more than look for a BOM.
Assignee | ||
Comment 17•19 years ago
|
||
Ah... Currently, we don't support UTF-16 encoding even if the signature file is in HTML format... We need to fix this issue...
Assignee | ||
Comment 18•19 years ago
|
||
This patch only checks for a BOM. I think that is enough to support Windows, because Notepad always adds a BOM.
Attachment #219178 -
Flags: review?(jshin1987)
Assignee | ||
Updated•19 years ago
|
Status: REOPENED → ASSIGNED
Comment 19•19 years ago
|
||
Comment on attachment 219178 [details] [diff] [review]
Patch rv1.0
"UTF-16BE" and "UTF-16LE" means "There is no BOM". Although Mozilla uconv may perform the error recovering, it's invalid.
Can you use the "UTF-16" converter? It will auto-detect the endianness from BOM.
Assignee | ||
Comment 20•19 years ago
|
||
(In reply to comment #19)
> "UTF-16BE" and "UTF-16LE" means "There is no BOM". Although Mozilla uconv may
> perform the error recovering, it's invalid.
Really? The patch works fine.
Severity: minor → major
Assignee | ||
Comment 21•19 years ago
|
||
Comment 22•19 years ago
|
||
(In reply to comment #20)
> Really?
See RFC 2781.
http://www.ietf.org/rfc/rfc2781.txt
| Systems labelling UTF-16BE text MUST NOT prepend a BOM to the text.
| Systems labelling UTF-16LE text MUST NOT prepend a BOM to the text.
> The patch works fine.
It works thanks to the error recovery. It doesn't mean it's correct. Is tag soup correct if Mozilla (or MSIE, or whatever else) can parse it?
(In reply to comment #21)
> uconv removes the BOM if it's on its head.
See also Unicode Book 4.0.
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G28070
| In UTF-16BE, an initial byte sequence <FE FF> is interpreted as U+FEFF ZERO
WIDTH NO-BREAK SPACE.
| In UTF-16LE, an initial byte sequence <FF FE> is interpreted as U+FEFF ZERO
WIDTH NO-BREAK SPACE.
That is, we aren't supposed to remove it. So I said "it's invalid".
You should not rely on the current behavior. uconv may become more strict someday.
Severity: major → minor
Assignee | ||
Comment 23•19 years ago
|
||
Thank you, Kimura-san. I changed the point.
Attachment #219178 -
Attachment is obsolete: true
Attachment #219300 -
Flags: review?(jshin1987)
Attachment #219178 -
Flags: review?(jshin1987)
Comment 24•19 years ago
|
||
(In reply to comment #22)
> (In reply to comment #20)
> > Really?
> See RFC 2781.
> http://www.ietf.org/rfc/rfc2781.txt
That RFC is about labelled encoding of MIME parts. The file being read from Windows doesn't *have* a MIME label. The "LE" and "BE" designations when talking about files are for the purposes of the people discussing it; the OS
may have a standard way of doing it, but the file should have a BOM. (And I
see no point in keeping the BOM when inserting the sig into a message.)
Comment 25•19 years ago
|
||
(In reply to comment #24)
> That RFC is about labelled encoding of MIME parts. The file being read from
> Windows doesn't *have* a MIME label.
uconv is also used for parsing MIME data. Therefore the meaning of encoding names should be much the MIME's one.
> The "LE" and "BE" designations when
> talking about files are for the purposes of the people discussing it;
We should not label "UTF-16, with BOM, big endian" as "UTF-16BE", to avoid confusion, even if we are outside the MIME context.
Here are sample texts from RFC 2781:
| Text labelled with UTF-16BE, without a BOM:
| D8 08 DF 45 00 3D 00 52 00 61
| Text labelled with UTF-16LE, without a BOM:
| 08 D8 45 DF 3D 00 52 00 61 00
| Big-endian text labelled with UTF-16, with a BOM:
| FE FF D8 08 DF 45 00 3D 00 52 00 61
| Little-endian text labelled with UTF-16, with a BOM:
| FF FE 08 D8 45 DF 3D 00 52 00 61 00
Notice that UTF-16s with BOM are never called as UTF-16BE/UTF-16LE.
> the OS may have a standard way of doing it, but the file should have a BOM.
Then it should never be called as "UTF-16BE" or "UTF-16LE".
All you have to do is prepend a BOM to everything on the disk and call it "UTF-16".
> (And I see no point in keeping the BOM when inserting the sig into a message.)
It's critical about parsing XML document. uconv is not only for the signature parsing.
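To make the labelling distinction concrete, here is a minimal sketch of a decoder for text labelled plain "UTF-16", run against the two BOM-prefixed samples quoted above. It is not uconv. `DecodeLabelledUtf16` and the sample array names are hypothetical, and falling back to big-endian when no BOM is present follows RFC 2781's recommendation for the "UTF-16" label.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder for the "UTF-16" label: a leading BOM selects the
// byte order and is consumed; without a BOM, big-endian is assumed.
std::vector<std::uint16_t> DecodeLabelledUtf16(const unsigned char* buf,
                                               std::size_t len) {
  bool littleEndian = false;
  std::size_t pos = 0;
  if (len >= 2) {
    if (buf[0] == 0xFF && buf[1] == 0xFE) {
      littleEndian = true;
      pos = 2;
    } else if (buf[0] == 0xFE && buf[1] == 0xFF) {
      pos = 2;
    }
  }
  std::vector<std::uint16_t> units;
  for (; pos + 1 < len; pos += 2) {
    const std::uint16_t hi = littleEndian ? buf[pos + 1] : buf[pos];
    const std::uint16_t lo = littleEndian ? buf[pos] : buf[pos + 1];
    units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
  }
  return units;
}

// The two BOM-prefixed samples from RFC 2781, as quoted in this comment.
const unsigned char kBigEndianSample[] = {0xFE, 0xFF, 0xD8, 0x08, 0xDF, 0x45,
                                          0x00, 0x3D, 0x00, 0x52, 0x00, 0x61};
const unsigned char kLittleEndianSample[] = {0xFF, 0xFE, 0x08, 0xD8, 0x45, 0xDF,
                                             0x3D, 0x00, 0x52, 0x00, 0x61, 0x00};
```

Both samples decode to the same five code units (D808 DF45 003D 0052 0061) with no U+FEFF in the output. A converter for "UTF-16BE" or "UTF-16LE" would instead keep an initial FE FF or FF FE as U+FEFF, per the Unicode text quoted in comment 22.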
Comment 26•19 years ago
|
||
Correction:
> should be much
should match
Sorry for my poor English.
Assignee | ||
Comment 27•19 years ago
|
||
Comment on attachment 219300 [details] [diff] [review]
Patch rv1.1
I found a bug: the BOM is inserted into the body of the message. We should remove the BOM if the encoding is UTF-16 or UTF-8.
Attachment #219300 -
Flags: review?(jshin1987) → review-
Assignee | ||
Comment 28•19 years ago
|
||
removing BOM.
Attachment #219300 -
Attachment is obsolete: true
Attachment #219574 -
Flags: review?(jshin1987)
Assignee | ||
Comment 29•19 years ago
|
||
Sorry for the spam.
Attachment #219574 -
Attachment is obsolete: true
Attachment #219577 -
Flags: review?(jshin1987)
Attachment #219574 -
Flags: review?(jshin1987)
Comment 30•19 years ago
|
||
why doesn't the unicode conversion remove the BOM?
Assignee | ||
Comment 31•19 years ago
|
||
I know why the BOM isn't removed. It's only when the signature file is UTF-8. In this case, |CopyUTF8toUTF16| is used instead of unicode decoder.
Attachment #219577 -
Attachment is obsolete: true
Attachment #219604 -
Flags: review?(jshin1987)
Attachment #219577 -
Flags: review?(jshin1987)
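Since the UTF-8 path copies the bytes across without running a decoder, any BOM has to be dropped separately. A minimal sketch of that step on a raw byte string follows. `StripUtf8Bom` is a hypothetical helper, not the code in the attached patch, and it would be applied before a call such as |CopyUTF8toUTF16|.

```cpp
#include <string>

// Hypothetical helper: return the input without a leading UTF-8 BOM
// (EF BB BF). Input that does not start with a BOM is returned unchanged.
std::string StripUtf8Bom(const std::string& bytes) {
  if (bytes.size() >= 3 &&
      static_cast<unsigned char>(bytes[0]) == 0xEF &&
      static_cast<unsigned char>(bytes[1]) == 0xBB &&
      static_cast<unsigned char>(bytes[2]) == 0xBF) {
    return bytes.substr(3);
  }
  return bytes;
}
```

The casts to `unsigned char` matter where `char` is signed, because a plain `char` holding 0xEF would compare as a negative value there.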
Assignee | ||
Updated•19 years ago
|
Target Milestone: mozilla1.8.1alpha2 → mozilla1.8.1beta1
Assignee | ||
Comment 32•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
Simon:
Could you review this? Originally, this patch was to be reviewed by jshin, but he is still busy. We need this patch for Tb2.0, so I need a reviewer soon. Could you check this?
Attachment #219604 -
Flags: review?(jshin1987) → review?(smontagu)
Comment 33•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
I did review the patch, but I forgot to log in before pressing the submit button, although I thought I had (at home, Bugzilla asks me to log in for every single transaction).
>Index: mailnews/base/util/nsMsgI18N.cpp
>+ fSpec.GetFileSize() % 2 == 0 && fSpec.GetFileSize() >= 2 &&
>+ ((readBuf[0] == char(0xFE) && readBuf[1] == char(0xFF)) ||
>+ (readBuf[0] == char(0xFF) && readBuf[1] == char(0xFE)))) {
>+ sigEncoding.Assign("UTF-16");
>+ }
I'm not very happy about the above, but perhaps, it'd work almost all the time...
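For readers without the full patch, the quoted condition can be restated as a standalone function. This is a sketch of the same test (even size, at least two bytes, FE FF or FF FE at the start). `HasUtf16Bom` is a hypothetical name, and the plain buffer and size parameters stand in for `readBuf` and `fSpec.GetFileSize()`.

```cpp
#include <cstddef>

// Hypothetical standalone version of the quoted check: the size must be
// even and at least 2, and the first two bytes must be a UTF-16 BOM.
bool HasUtf16Bom(const char* readBuf, std::size_t fileSize) {
  if (fileSize < 2 || fileSize % 2 != 0) {
    return false;
  }
  // Compare as unsigned bytes. The patch casts the constants to char
  // instead, which has the same effect where char is signed.
  const unsigned char b0 = static_cast<unsigned char>(readBuf[0]);
  const unsigned char b1 = static_cast<unsigned char>(readBuf[1]);
  return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
}
```

The check can misfire on a file in a legacy single-byte encoding that has an even size and happens to begin with the bytes FF FE or FE FF, which may be the kind of case behind the reviewer's hesitation.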
Assignee | ||
Updated•19 years ago
|
Attachment #219604 -
Flags: review?(smontagu) → review?(jshin1987)
Assignee | ||
Comment 34•19 years ago
|
||
(In reply to comment #33)
> I'm not very happy about the above, but perhaps, it'd work almost all the
> time...
Do you have another idea?
Comment 35•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
r=jshin
Simon should be as good as me...
(In reply to comment #33)
>> I'm not very happy about the above, but perhaps, it'd work almost all the
>> time...
>Do you have another idea?
we can make a more complicated check, but I guess we can just get away with this simple-minded check given that UTF-16 is not likely to be used on platforms other than Windows and a similar method is used by Notepad/Wordpad on Windows.
Attachment #219604 -
Flags: review?(jshin1987) → review+
Assignee | ||
Comment 36•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
Scott:
Would you check this?
Attachment #219604 -
Flags: superreview?(mscott)
Comment 37•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
If jshin is happy with this approach, then so am I.
Attachment #219604 -
Flags: superreview?(mscott) → superreview+
Assignee | ||
Comment 38•19 years ago
|
||
checked-in. I'll request approval to 1.8 branch.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 40•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
Let's go to Tb2. This patch is needed by bug 201071, which is already checked in to the 1.8.1 branch. Of course, the risk is very low.
# What is the difference between approval-thunderbird2 and approval1.8.1? Do I need both approvals for check-in?
Attachment #219604 -
Flags: approval1.8.1?
Attachment #219604 -
Flags: approval-thunderbird2?
Comment 41•19 years ago
|
||
Comment on attachment 219604 [details] [diff] [review]
Patch rv1.3
a=drivers, land on the branch
Attachment #219604 -
Flags: approval1.8.1? → approval1.8.1+
Assignee | ||
Comment 43•19 years ago
|
||
-> v.1.8.1
Keywords: fixed1.8.1 → verified1.8.1
Target Milestone: mozilla1.8.1beta1 → mozilla1.8.1beta2
Assignee | ||
Updated•18 years ago
|
Attachment #219604 -
Flags: approval-thunderbird2?
Updated•16 years ago
|
Product: Core → MailNews Core