715823 - Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward inline, without converting data to the charset)

Reporter

Description

•

13 years ago

Attached file sample e-mail (deleted) — Details

User Agent: Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1 Build ID: 20111221202246 Steps to reproduce: My settings: Tested on Windows XP, TB 9;Ubuntu , TB 11.0a2 Preferences - Display - Formatting - Advanced Characters Encodings: incoming & outgoing mail - Central European (ISO-8859-2) View - Character encoding - Autodetect - Off Steps to reproduce: 1- Open sample e-mail 2 - Forward Actual results: Actual Results: Message is displayed in wrong encoding, see attachment. Expected results: Expected Results: Correct encoding.

raal

Reporter

Comment 1

•

13 years ago

Attached image printscreen, forwarded message with bad characters (deleted) — Details

raal

Reporter

Updated

•

13 years ago

OS: Linux → All

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Attachment #586351 - Attachment mime type: application/octet-stream → message/rfc822

Nikolay Shopik

Comment 2

•

13 years ago

Seems like TB choices wrong encoding when forwarding. I seen worst case scenarios with Cyrillic. Can you attach forwarded message too?

raal

Reporter

Comment 3

•

13 years ago

Attached file forwarded message (deleted) — Details

YUKI "Piro" Hiroshi

Comment 4

•

11 years ago

This is caused by attachments with different encoding. The testcase message has a body encoded in ISO-8859-2 and attachments encoded in UTF-8. I've confirmed with other cases with these combinations: 1. * Message body: ISO-2022-JP * Attachment: plaintext file in Shift_JIS 2. * Message body: ISO-2022-JP * Attachment: plaintext file in EUC-JP 3. * Message body: ISO-2022-JP * Attachment: plaintext file in UTF-8 The combination ISO-2022-JP message body and ISO-2022-JP attachment doesn't cause this problem. And I've found out why this problem happens. When Thunderbird generates a message body for forwarding, nsMsgCompose module guesses the character encoding of the original message from the cached information in the topmost window. nsMsgCompose::CreateMessage() http://hg.mozilla.org/comm-central/file/781c3bd778c5/mailnews/compose/src/nsMsgCompose.cpp#l1847 => GetTopmostMsgWindowCharacterSet() http://hg.mozilla.org/comm-central/file/781c3bd778c5/mailnews/compose/src/nsMsgCompose.cpp#l134 GetTopmostMsgWindowCharacterSet() reads the cached character encoding information via nsIMsgWindow#mailCharacterSet, however, it returns the character encoding of the last attachment, instead of original message body's one. As the result, nsMsgCompose::CreateMessage() applies wrong character encoding to the forwarded message body and broken body appears for me.

Status: UNCONFIRMED → NEW

Ever confirmed: true

YUKI "Piro" Hiroshi

Comment 5

•

11 years ago

Just a workaround, this addon solves this problem in most cases. However, of course it should be fixed by Thunderbird itself. https://github.com/clear-code/correct-encoding

Hiroyuki Ikezoe (:hiro)

Updated

•

11 years ago

Component: Message Compose Window → Composition

Product: Thunderbird → MailNews Core

Hardware: x86 → All

Version: 11 → Trunk

Hiroyuki Ikezoe (:hiro)

Comment 7

•

11 years ago

Attached patch bug715823.patch (obsolete) (deleted) — Details — Splinter Review

Fix for forwarding incline case. This patch does solve the wrong encoding in case of incline forward. But I am not 100% sure this patch does not cause any regression. So please test the binary with this patch. The binary can be downloaded at: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/hiikezoe@mozilla-japan.org-85309fa29a41/

Hiroyuki Ikezoe (:hiro)

Comment 8

•

11 years ago

Attached patch An xpcshell test (obsolete) (deleted) — Details — Splinter Review

Hiroyuki Ikezoe (:hiro)

Comment 9

•

11 years ago

For the record, try server results with the patches are: https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=85309fa29a41

Magnus Melin [:mkmelin]

Updated

•

11 years ago

QA Contact: hiikezoe

WADA:World Anti-bad-Duping Agency

Comment 10

•

11 years ago

bug 716983 looks closed as dup of this bug. Is bug 701818 dup too?

Hiroyuki Ikezoe (:hiro)

Comment 11

•

11 years ago

The patch I attached in comment 7 solves bug 701818 too. So we can close the bug as a duplicate one.

Hiroyuki Ikezoe (:hiro)

Updated

•

11 years ago

Assignee: nobody → hiikezoe

Status: NEW → ASSIGNED

YUKI "Piro" Hiroshi

Comment 13

•

11 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #7) > This patch does solve the wrong encoding in case of incline forward. But I > am not 100% sure this patch does not cause any regression. So please test > the binary with this patch. I've tested the build and confirmed this bug was away. However, by the patch, Thunderbird seems to ignore user-chosen character encoding for forwarded messages. If the original message has wrong encoding information in its Content-Type header, or if auto-detection is failed, you will have to choose different encoding from the "View" menu. Then the forwarded message should inherit the user-chosen encoding. After this patch, this behavior is also removed...

YUKI "Piro" Hiroshi

Comment 14

•

11 years ago

Attached file testcase with ISO-2022-JP body and Shift_JIS attachment (deleted) — Details

Steps to reproduce the regression: 1. Exit Thunderbird. 2. Put the file "test-folder" under your local folder account, for example: "%AppData%\Thunderbird\Profiles\xxx.default\Mail\Local Folders" 3. Start Thunderbird. Then a new "test-folder" folder appears under your local folder account. 4. Disable auto-detection. View => Character Encoding => Auto-Detect => (Off) 5. Choose the "test-folder" folder. 6. The folder contains just a message, so choose the message. 7. The message is loaded with wrong encoding. (ex. "Western (Windows-1252)") 8. Choose the character encoding "ISO-2022-JP" via: View => Character Encoding => More Encodings => East Asian => Japanese (ISO-2022-JP) Then the message body is shown correctly. 9. Forward the message as "Inline". Actual result on the test build: The forwarded body is broken like the step 7, it is shown with wrong encoding. Expected result (and the actual result on the currently released Thunderbird): The forwarded body is shown correctly, as an ISO-2022-JP encoded text.

WADA:World Anti-bad-Duping Agency

Comment 15

•

11 years ago

FYI. Overriding wrong charset in mail is done by; - Per folder "Default Character Encoding" with "[x] Aplly default to all messages..." (developer is trying to change name to "Override Character Encoding") - Per mail/Ad-hoc View/Character Encoding

Ludovic Hirlimann [:Usul]

Comment 16

•

11 years ago

Can we get reviews going ?

Hiroyuki Ikezoe (:hiro)

Comment 17

•

11 years ago

(In reply to YUKI "Piro" Hiroshi from comment #14) > Created attachment 8348527 [details] > testcase with ISO-2022-JP body and Shift_JIS attachment > > Steps to reproduce the regression: > > 1. Exit Thunderbird. > 2. Put the file "test-folder" under your local folder account, for example: > "%AppData%\Thunderbird\Profiles\xxx.default\Mail\Local Folders" > 3. Start Thunderbird. Then a new "test-folder" folder appears under > your local folder account. > 4. Disable auto-detection. > View => Character Encoding => Auto-Detect => (Off) > 5. Choose the "test-folder" folder. > 6. The folder contains just a message, so choose the message. > 7. The message is loaded with wrong encoding. (ex. "Western (Windows-1252)") > 8. Choose the character encoding "ISO-2022-JP" via: > View => Character Encoding => More Encodings => East Asian > => Japanese (ISO-2022-JP) > Then the message body is shown correctly. > 9. Forward the message as "Inline". > > Actual result on the test build: > The forwarded body is broken like the step 7, it is shown with wrong > encoding. > > Expected result (and the actual result on the currently released > Thunderbird): > The forwarded body is shown correctly, as an ISO-2022-JP encoded text. Thanks for the testing and investigations. Then, we need a new attribute in nsIMsgWindow.idl to preserve the charset of original message body or to represent a chosen charset by user. When I started to fix this issue, I tried to add such new attribute but I cloud not have a good name for it because there is already "mailCharacterSet", and "mailCharacterSet" should be preserved for addon compatibility. So I decided to take another solution without the new attribute in the patch. Anyway, as far as I understand, the new attribute is necessary to fix this issue now. "charset" or "characterSet" are very confusable to represent the original charset. I'd propose "forceMailCharacterSet" to represent the chosen charset by user. Piro, what do you think?

Flags: needinfo?(yuki)

Hiroyuki Ikezoe (:hiro)

Comment 18

•

11 years ago

(In reply to Ludovic Hirlimann [:Usul] (away dadying) from comment #16) > Can we get reviews going ? Not yet. The patch should be revised.

YUKI "Piro" Hiroshi

Comment 19

•

11 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #17) > Anyway, as far as I understand, the new attribute is necessary to fix this > issue now. "charset" or "characterSet" are very confusable to represent the > original charset. I'd propose "forceMailCharacterSet" to represent the > chosen charset by user. Piro, what do you think? Basically I think that "mailCharacterSet" should return the character encoding of the body, not the last part. The attribute was introduced by the bug 28869 (http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIMsgWindow.idl&branch=1.27&root=/cvsroot&subdir=mozilla/mailnews/base/public&command=DIFF_FRAMESET&rev1=1.12&rev2=1.13) and I couldn't find the reason why it should return the character encoding of the last part, not the body. The main reason is written as: > When libmime decide a main body charset to use, the charset name to be feed > backed to the user by putting a mark on the charset menu item. The menu cannot indicate all of used charsets for mixed charset messages - the UI seems to be designed just for messages with single charset. Because we can indicate only one charset for a message, we have to decide that what is the point of the "Character Encoding" menu. And, I think the menu should reflect the state of the body, not attachments. The current behavior of the mailCharacterSet attribute seems wrong for me. However, we should not modify existing behavior. Yes, it is true. So, my another plan is: a new attribute "mailBodyCharacterSet" of the interface nsIMsgWindow. It sounds better than "forceMailCharacterSet", for me.

Flags: needinfo?(yuki)

Hiroyuki Ikezoe (:hiro)

Comment 20

•

11 years ago

(In reply to YUKI "Piro" Hiroshi from comment #19) > (In reply to Hiroyuki Ikezoe (:hiro) from comment #17) > > Anyway, as far as I understand, the new attribute is necessary to fix this > > issue now. "charset" or "characterSet" are very confusable to represent the > > original charset. I'd propose "forceMailCharacterSet" to represent the > > chosen charset by user. Piro, what do you think? > > Basically I think that "mailCharacterSet" should return the character > encoding of the body, not the last part. Yes, indeed. > (http://bonsai.mozilla.org/cvsview2. > cgi?diff_mode=context&whitespace_mode=show&file=nsIMsgWindow.idl&branch=1. > 27&root=/cvsroot&subdir=mozilla/mailnews/base/ > public&command=DIFF_FRAMESET&rev1=1.12&rev2=1.13) and I couldn't find the > reason why it should return the character encoding of the last part, not the > body. IIRC, it is used to view source multipart message. And as you know, I am afraid of breaking addons. > > When libmime decide a main body charset to use, the charset name to be feed > > backed to the user by putting a mark on the charset menu item. > > The menu cannot indicate all of used charsets for mixed charset messages - > the UI seems to be designed just for messages with single charset. Because > we can indicate only one charset for a message, we have to decide that what > is the point of the "Character Encoding" menu. And, I think the menu should > reflect the state of the body, not attachments. The current behavior of the > mailCharacterSet attribute seems wrong for me. > > However, we should not modify existing behavior. Yes, it is true. So, my > another plan is: a new attribute "mailBodyCharacterSet" of the interface > nsIMsgWindow. It sounds better than "forceMailCharacterSet", for me. Thanks for the better name! The name is really really better one!

Hiroyuki Ikezoe (:hiro)

Comment 21

•

11 years ago

I had tried a build with revised patch sets on try server before: https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea but unfortunately there is a bug that try server build for Windows fails with mozilla-central patches. See bug 953226.

Ludovic Hirlimann [:Usul]

Updated

•

11 years ago

Depends on: 953226

WADA:World Anti-bad-Duping Agency

Updated

•

10 years ago

Summary: Forward message, wrong encoding → Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset)

Chris Cooper [:coop] (he/him)

Comment 22

•

10 years ago

As Callek says, I don't think there's anything for releng to do here.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → INCOMPLETE

Chris Cooper [:coop] (he/him)

Comment 23

•

10 years ago

Oops, meant to close the releng dep.

Status: RESOLVED → REOPENED

Resolution: INCOMPLETE → ---

ISHIKAWA, Chiaki

Comment 25

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #21) > I had tried a build with revised patch sets on try server before: > > https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea > > but unfortunately there is a bug that try server build for Windows fails > with mozilla-central patches. > See bug 953226. Thank you for working on this. Has the latest patch been tested in working binary? Is the patch attached in this entry the latest? (Oh, I see, I think I need to use the following patch? https://hg.mozilla.org/try-comm-central/rev/ba72c8bbdb86 ) BTW, in the duped bug 1075436, I uploaded the screen shots which may be useful to explain the situation in case 1 in comment 4 to non-Japanese users. > 1. * Message body: ISO-2022-JP > * Attachment: plaintext file in Shift_JIS https://bugzilla.mozilla.org/attachment.cgi?id=8498194 SCREEN: The problematic message received. Visible OK, but shown encoding (Shift_JIS) does not match the main body's (ISO-2022-JP). https://bugzilla.mozilla.org/attachment.cgi?id=8498198 SCREEN: When I tried to forward the message, the mail body text is garbled. and the message https://bugzilla.mozilla.org/attachment.cgi?id=8499388 message that triggers garbled mail body text when forwarded. ISO-2022-JP encoding, the attachment is Shift_JIS plain file. TIA

Hiroyuki Ikezoe (:hiro)

Comment 26

•

10 years ago

(In reply to ISHIKAWA, Chiaki from comment #25) > (In reply to Hiroyuki Ikezoe (:hiro) from comment #21) > > I had tried a build with revised patch sets on try server before: > > > > https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea > > > > but unfortunately there is a bug that try server build for Windows fails > > with mozilla-central patches. > > See bug 953226. > > Thank you for working on this. > > Has the latest patch been tested in working binary? > Is the patch attached in this entry the latest? > (Oh, I see, I think I need to use the following patch? > https://hg.mozilla.org/try-comm-central/rev/ba72c8bbdb86 > ) And this: https://hg.mozilla.org/try-comm-central/rev/294dd9f2cbf8 I am not sure those patches can be applied to the current trunk cleanly, but apparently the xpcshell test should be rewritten with Promise.

Hiroyuki Ikezoe (:hiro)

Comment 27

•

10 years ago

Attached patch Adapt to the latest trunk (obsolete) (deleted) — Details — Splinter Review

Attachment #8345019 - Attachment is obsolete: true

Hiroyuki Ikezoe (:hiro)

Comment 28

•

10 years ago

Attached patch The xpcshell test rewritten with Promise (obsolete) (deleted) — Details — Splinter Review

Attachment #8345021 - Attachment is obsolete: true

Hiroyuki Ikezoe (:hiro)

Comment 29

•

10 years ago

Please test the binary with the above patches. http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/hiikezoe@mozilla-japan.org-f7fb7b284413/ Try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-central&revision=f7fb7b284413

ISHIKAWA, Chiaki

Comment 30

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #29) > Please test the binary with the above patches. > > http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/ > hiikezoe@mozilla-japan.org-f7fb7b284413/ > > Try run: > https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm- > central&revision=f7fb7b284413 Hi, on my local build and local test the following test succeeded: TEST-PASS | /REF-OBJ-DIR/objdir-tb3/_tests/xpcshell/mailnews/compose/test/unit/test_detectAttachmentCharset.js | test passed (time: 1002.004ms) Also, sending an e-mail with main body UTF-8 and attaching a Japanese plain text file encoded in Shift-JIS resulted in an e-mail - that did not garble Japanese characters in mail body text inside the compose window when it is to be forwarded, but - if the attachment (Japanese plain text file in Shift-JIS encoding) is shown in line, then the display of the attachment part IS GARBLED UNFORTUNATELY. I am now trying to figure out how to set outcoing character set to be ISO-2022-JP to recreate the original questions in duped bug of Bug 1075436 Is the interface to do this gone? TIA

Magnus Melin [:mkmelin]

Comment 31

•

10 years ago

(In reply to ISHIKAWA, Chiaki from comment #30) > I am now trying to figure out how to set outcoing character set to be > ISO-2022-JP Compose window | Options | Character encoding

Hiroyuki Ikezoe (:hiro)

Comment 32

•

10 years ago

(In reply to ISHIKAWA, Chiaki from comment #30) > - if the attachment (Japanese plain text file in Shift-JIS encoding) is > shown in line, > then the display of the attachment part IS GARBLED UNFORTUNATELY. In the composition window? I am guessing it's the message pane. In case of the message pane attachment 8500254 [details] [diff] [review] tries to use the charset of message body instead of attachment charset. (if I am not wrong) Given that the charset of message body is ISO-2022-JP and attachment file is Shift-JIS, which charset should be used by default?

ISHIKAWA, Chiaki

Comment 33

•

10 years ago

Attached file Mail sent (and received) by C-C TB: mail body (ISO-2022-JP), and attachment is Shift_JISIS (deleted) — Details

Hi, thank you for the experimental patches. To be clear, what data is being produced by the patched C-C TB, I am uploading an e-mail with ISO-2022-JP main body, and a plain text Japanese attachment in Shift_JIS encoding. Note that - for some reason, the subject line is encoded in UTF-8 (!) This is different from the current released binary, I think. See that attachment that shows ISO-2022-JP encoding in subject line: https://bugzilla.mozilla.org/attachment.cgi?id=8499388 [ Created from the currently released windows version of TB ] from Bug 1075436 - Trying to forward a Japanese e-mail (ISO-2022-JP encoding for mail main body) with certain attachment files cause a garbled main text in mail composite window So here is what happens when I send a Japanese mail (in ISO-2022-JP charset encoding) with Japanese subject line and attaching a plain Japanese text file (encoded in Shift_JIS). The received e-mail has the MIME information as below: -------+------------------+-------------------+--------------- | Main Text | Attachment | Subject | charset | charset | charset | Specified as | Specified | encoding -------+------------------+-------------------+--------------- Current| ISO-2022-JP | SJIS | ISO-2022-JP Release| -------+------------------+-------------------+--------------- Patched| ISO-2022-JP | None | UTF-8 C-C | | | -------+------------------+-------------------+--------------- That said, yes, the garbling of attachment shown as "inline" occurs in the message window where the received message is displayed. (In composition window, the attachment seems to be always shown as a filename in the attachment file window, and so the content is not shown.) This garbling occurs when the main mail message is encoded in UTF-8 and in ISO-2022-JP (both cases). I think the garbling occurs because the lack of charset specification on the side of the sender (patched TB). I don't know if there is a clear solution here. I don't understand the nature of the patch, but it tried to fix the problem of using incorrect decoding of ISO-2022-JP mail text using the incorrect Shift_JIS decoder setup (that seems to be left by the charset specification of an attachment file that came with Shift_JIS charset specificatin. Is it possible to modify TB to use the incoming charset specification (done by the sender) to render the main text, AND also uses the charset specification for attachment [which was lacking in the patched C-C TB, but was Shift_JIS in the current release), AND still let the main text to be correctly rendered by making sure that the decoder is properly initialized for each part of multi-part mime data? If it is possible, by keeping the sender-side Shift_JIS encoding info for the attachment, the receiver side can probably show the Shift_JIS attachment intact. Although, I am talking of Shift_JIS vs ISO-2022-JP, etc., I think the same issue of garbled inlined attachment display would occur as in the original reporter's case. (Due to the previous issues experienced over the years with inline attachment display, I have disabled the inline display feature, but I noticed that some colleagues of mine uses Mac and its mailer handles inline attachment display rather well, and so some people with good experience of seeing inline attachment "just work" may complain if the currently proposed patch has this issue of garbled display of attachment. Well, we may need more feedback from the user, but I think programming-wise, making sure that the decoder is initialized properly for each part of multi-part is very important since the sender is NOT TB alone, and there may be other mailers which send attachment with charset encoding. (I don't know the behavior of every mailer in the wild, though. At least Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 does, and there are people out there who don't upgrade their programs for extended amount of time.) If TB can't handle the charset information right and leave a decoder for the main mail text in an incorrect state, we will still have this garbled mail message in composition window when the user tries to forward such e-mail. Correct? TIA PS: That the subject is encoded in UTF-8 even though I would like the e-mail to go out in ISO-2022-JP is a little surprise.

Hiroyuki Ikezoe (:hiro)

Comment 34

•

10 years ago

I am not sure you may know or not, the current trunk Thunderbird does not detect the charset of attachment files if intl.charset.detector is not set. See bug 844115 and bug 1074585. Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it. And I'd suggest you should test against the current trunk without patches for comparison.

ISHIKAWA, Chiaki

Comment 35

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #34) > I am not sure you may know or not, the current trunk Thunderbird does not > detect the charset of attachment files if intl.charset.detector is not set. > See bug 844115 and bug 1074585. > > Anyway if you just sent a message with attachments, not forwarding, we > should open a new bug for it. > > And I'd suggest you should test against the current trunk without patches > for comparison. 1. No, I did not know the bug 844115 and bug 1074585 ("did not understand" it was more lke it). >Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it. I am not sure what "it" refers to. The garbling of inline attachment (in Shift_JIS) display after I send it myself with the mail body in ISO-2022-JP and receive it (both in TB)? >And I'd suggest you should test against the current trunk without patches for comparison. Will do. TIA

ISHIKAWA, Chiaki

Comment 36

•

10 years ago

I insert "no. 2", and "no. 3" below to make it clear which part of the discussion is touched. (In reply to ISHIKAWA, Chiaki from comment #35) > (In reply to Hiroyuki Ikezoe (:hiro) from comment #34) No.1 : > > I am not sure you may know or not, the current trunk Thunderbird does not > > detect the charset of attachment files if intl.charset.detector is not set. > > See bug 844115 and bug 1074585. > > No. 2: > > Anyway if you just sent a message with attachments, not forwarding, we > > should open a new bug for it. > > No. 3: > > And I'd suggest you should test against the current trunk without patches > > for comparison. > > 1. No, I did not know the bug 844115 and bug 1074585 ("did not understand" > it was more lke it). > No. 2: > >Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it. > I am not sure what "it" refers to. > The garbling of inline attachment (in Shift_JIS) display after I send it > myself with the mail body > in ISO-2022-JP and receive it (both in TB)? > No. 3: > >And I'd suggest you should test against the current trunk without patches for comparison. > Will do. > > TIA Re the third point, I did, and I got a very confusing result. Unpatched C-C, that is without the patch proposed here, produced a binary of TB, as of the version fetched in the last couple of days,today, produced an exactly the same type of e-mail message: To recap, The received e-mail has the MIME information as below: --------+------------------+-------------------+--------------- | Main Text | Attachment | Subject | charset | charset | charset | Specified as | Specified | encoding --------+------------------+-------------------+--------------- Current | ISO-2022-JP | SJIS | ISO-2022-JP Release | --------+------------------+-------------------+--------------- Patched | ISO-2022-JP | None | UTF-8 C-C | | | patched with the patches proposed here. --------+------------------+-------------------+--------------- Unpachted| ISO-2002-JP | None | UTF-8 C-C | | | --------+------------------+-------------------+--------------- I had to re-check if this was correct and unless I made a screw-up somewhere, the above is what I obtained. So the message sent out thusly from TB, when received, and when the attachment is tried to viewed inline, the attachment part is garbled (I think lacking the charset information, TB tries to decode the attachment as ISO-2022-JP: maybe TB can be instructed to try detection, but that seems to be gone, right?. Oh, OK, if we set some preferendces, it will be correctly visible. I should re-try.)

Hiroyuki Ikezoe (:hiro)

Comment 37

•

10 years ago

What is your preference value of intl.charset.detector? You need to set it to 'ja_parallel_state_machine'. Even if you set the value surely, attachment charset was not detected, please open a new bug, we have to fix it.

ISHIKAWA, Chiaki

Comment 38

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #37) > What is your preference value of intl.charset.detector? It was default: empty. > You need to set it to 'ja_parallel_state_machine'. I did. > Even if you set the value surely, attachment charset was not detected, > please open a new bug, we have to fix it. Now, with the setting of the above variable, to ja_parallel_state_machine, TB can now correctly render the content of of Shift_JIS plain Japanese text when I toggle the view attachment inline to ON. Thank you for the info. However, I wonder why this is not set to ja_parallel_state_machine by default. In the next round of release, Mozilla community may have unnecessary load of answering/dealing with Japanese users who face garbled text display of mismatched decoding when some plain text attachment came (and its encoding is not the same as the main mail text body.) TIA

Hiroyuki Ikezoe (:hiro)

Comment 39

•

10 years ago

(In reply to ISHIKAWA, Chiaki from comment #38) > (In reply to Hiroyuki Ikezoe (:hiro) from comment #37) > > What is your preference value of intl.charset.detector? > > It was default: empty. > > > You need to set it to 'ja_parallel_state_machine'. > > I did. > > > Even if you set the value surely, attachment charset was not detected, > > please open a new bug, we have to fix it. > > Now, with the setting of the above variable, to ja_parallel_state_machine, > TB can now correctly render the content of > of Shift_JIS plain Japanese text when I toggle the > view attachment inline to ON. > Thank you for the info. > > However, I wonder why this is not set to ja_parallel_state_machine > by default. Because the value is not only for Japanese. The value is set in localization. http://hg.mozilla.org/l10n-central/ja/file/6ce31c3a1c68/toolkit/chrome/global/intl.properties#l49

ISHIKAWA, Chiaki

Comment 40

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #39) > (In reply to ISHIKAWA, Chiaki from comment #38) [...] > > > > However, I wonder why this is not set to ja_parallel_state_machine > > by default. > > Because the value is not only for Japanese. > > The value is set in localization. > http://hg.mozilla.org/l10n-central/ja/file/6ce31c3a1c68/toolkit/chrome/ > global/intl.properties#l49 Thank you for the pointer. 43 # LOCALIZATION NOTE (intl.charset.detector): 44 # This preference controls the initial setting for the character encoding 45 # detector. Valid values are ja_parallel_state_machine for Japanese, ruprob 46 # for Russian and ukprob for Ukrainian and the empty string to turn detection 47 # off. The value must be empty for locales other than Japanese, Russian and 48 # Ukrainian. 49 intl.charset.detector = ja_parallel_state_machine So it is Japanese, Russian and Ukrainian. Mozilla may have to think about the support cost (questions caused by possible garbled inline attachments) and want to either - No. 1: write something about setting intl.charset.detector to one of values to avoid the garbled inline display of attachment text files when one's e-mail has large number of Japanese, Russian and Ukrainian content. or (and?) - (hmm), or maybe rewrite the code to set this variable automatically to one of the proper values if the default display character set for incoming e-mail is set to one of the charaset used for Japanese, Russian and Ukrainian content. But this may be too fragile setup. Anyway, once the next version is out, I think there will be many questions from people who suddenly see garbled inline attachment display every so often, so release note should contain at least the caution in No. 1 item above. TIA I wonder

Hiroyuki Ikezoe (:hiro)

Comment 41

•

10 years ago

Please open a new bug about removal of universal charset detector. I know it will be a problem for some of users. but it's totally unrelated this issue. Thank you,

ISHIKAWA, Chiaki

Comment 42

•

10 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #41) > Please open a new bug about removal of universal charset detector. I know it > will be a problem for some of users. > but it's totally unrelated this issue. > Thank you, I will. Thank you for the suggestion. Typical Japanese, Russian and Ukrainian users will never understand that the "removal of universal charaset detector" in C-C TB means that they may get garbled inline display of attachment plain text files in the future. TIA

ISHIKAWA, Chiaki

Comment 43

•

10 years ago

Filed Bug 1081693 - (Call for Better Documentation) Please mention that Japanese, Russian and Ukrainian users of TB should set intl.charset.detector by hand in the next release document.

:aceman

Comment 44

•

10 years ago

Comment on attachment 8500254 [details] [diff] [review] Adapt to the latest trunk What do you think of this guys?

Attachment #8500254 - Flags: review?(mkmelin+mozilla)

Attachment #8500254 - Flags: review?(Pidgeot18)

ISHIKAWA, Chiaki

Comment 45

•

10 years ago

I think this should go in to help those who need to handle occasional multi-language attachments (after adaption to the latest source files.) This solved my issues with proper setting of int.charset.detector in bug 1081693. TOA

ISHIKAWA, Chiaki

Comment 46

•

10 years ago

> TOA Itchy finger did this. TIA

Magnus Melin [:mkmelin]

Comment 47

•

10 years ago

Comment on attachment 8500254 [details] [diff] [review] Adapt to the latest trunk Review of attachment 8500254 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/public/nsIMsgWindow.idl @@ +68,1 @@ > attribute ACString mailCharacterSet; nit: star on each documentation line. documentation is good, but if that's what it does, does anybody really want such a beast? (and if they do, rename it to something reasonable)

Joshua Cranmer [:jcranmer]

Comment 48

•

10 years ago

Comment on attachment 8500254 [details] [diff] [review] Adapt to the latest trunk Review of attachment 8500254 [details] [diff] [review]: ----------------------------------------------------------------- It seems clear, judging from a DXR search, that nearly every use of mailCharacterSet was replaced with the newly-added mailBodyCharacterSet. A few were missing, and in reading the history of this bug, it doesn't seem justified why these should be. The bug appears to be that the mail charset is replaced with the attachment charset if there is a textual attachment--which suggests that the bug is in the setting of this field from attachments anyways. Replacing the attribute but keeping the old one feels like a mistake, or at least something that requires particular justification for such an action, justification that is as of yet forthcoming. So either fix mailCharacterSet to only consider body parts instead of attachment parts in its computation, or justify why its current behavior is needed by at least some clients.

Attachment #8500254 - Flags: review?(mkmelin+mozilla)

Attachment #8500254 - Flags: review?(Pidgeot18)

Attachment #8500254 - Flags: review-

ISHIKAWA, Chiaki

Comment 49

•

9 years ago

(In reply to Joshua Cranmer [:jcranmer] from comment #48) > Comment on attachment 8500254 [details] [diff] [review] > Adapt to the latest trunk > > Review of attachment 8500254 [details] [diff] [review]: > ----------------------------------------------------------------- > > It seems clear, judging from a DXR search, that nearly every use of > mailCharacterSet was replaced with the newly-added mailBodyCharacterSet. A > few were missing, and in reading the history of this bug, it doesn't seem > justified why these should be. The bug appears to be that the mail charset > is replaced with the attachment charset if there is a textual > attachment--which suggests that the bug is in the setting of this field from > attachments anyways. Replacing the attribute but keeping the old one feels > like a mistake, or at least something that requires particular justification > for such an action, justification that is as of yet forthcoming. > > So either fix mailCharacterSet to only consider body parts instead of > attachment parts in its computation, or justify why its current behavior is > needed by at least some clients. Sorry, I am not a native English user, and maybe I missed something. But is is the case that the above comment suggests that aceman's patch may have left a few use of mailCharacterSet incorrectly instead of using mailBodyCharacterSet. If so, I hope the patch can be made more complete to fix some remaining problematic areas to handle the issue. I have hit upon the mangled character set issue when I tried to forward some e-mails recently, and realized that this bug is still there :-) I hope this bug will be eradicated soon.

PM

Comment 50

•

9 years ago

Attached patch Possible fix (obsolete) (deleted) — Details — Splinter Review

When opening a multipart message with different encoding, nsMsgWindow::SetMailCharacterSet is called for each part contening a charset. Function MimeMultipart_notify_emitter (file: mailnews/mime/src/mimemult.cpp) call SetMailCharacterSetToMsgWindow for each body part that contain a charset: if (ct && (obj->options->notify_nested_bodies || MimeObjectIsMessageBody(obj))) { char *cset = MimeHeaders_get_parameter(ct, "charset", NULL, NULL); MimeObjectIsMessageBody should return true only for body parts. This is not the case. In function MimeObjectIsMessageBodyNoClimb the is_body variable is never set to true. Initial value is set to false. So the value never change. The patch 'patch_715823.txt' set the initial value to true. With applying the patch, SetMailCharacterSet is only called for body parts. Transfering the sample email seem to be good (see sample_email_patch_715823.png).

PM

Comment 51

•

9 years ago

Attached image sample_email_patch_715823.png (deleted) — Details

Magnus Melin [:mkmelin]

Comment 52

•

9 years ago

https://developer.mozilla.org/en/docs/Getting_your_patch_in_the_tree#Getting_the_patch_reviewed Have you tested that the xpcshell test works? https://bugzilla.mozilla.org/attachment.cgi?id=8500255 (And if so, please include it in your patch.)

PM

Comment 53

•

9 years ago

(In reply to Magnus Melin from comment #52) [...] > Have you tested that the xpcshell test works? > https://bugzilla.mozilla.org/attachment.cgi?id=8500255 (And if so, please > include it in your patch.) Test fail: 0:02.21 TEST_STATUS: Thread-1 check_charset FAIL [check_charset : 95] "ISO-2022-JP" == "Shift_JIS" But with setting: charsetOverride: false, => test ok 0:01.99 TEST_STATUS: Thread-1 check_charset PASS [check_charset : 96] "ISO-2022-JP" == "ISO-2022-JP" The value of charsetOverride is it exact for the test?

Nikolay Shopik

Comment 54

•

9 years ago

Does bug 597369 sound similar? I found that disabling preview pane F8 helps as workaround (you have to restart TB before see actual result)

ISHIKAWA, Chiaki

Comment 55

•

9 years ago

(In reply to Nikolay Shopik from comment #54) > Does bug 597369 sound similar? I found that disabling preview pane F8 helps > as workaround (you have to restart TB before see actual result) That bug seems to be fixed (if the latest posting is correct) by a change in Java code. Here, it seems the problem lies in the c++ code somewhere according to the discussion posts above. So there may be a superficial similarity, but the root cause, i.e., why the incorrect code set is chosen seems very different.

jonaslindstr

Comment 57

•

9 years ago

(In reply to Nikolay Shopik from comment #54) > Does bug 597369 sound similar? I found that disabling preview pane F8 helps > as workaround (you have to restart TB before see actual result) Thanks for the workaround!

Jorg K (CEST = GMT+2)

Assignee

Comment 58

•

9 years ago

Magnus, can you explain what the issue is here. I imported the message from attachment 8502270 [details] into today's Daily (which I'm using in production). In the preview, the body looks good (UTF-8) and the text file viewed inline looks garbled. That's understandable. When I open the attachment, it's displayed fine in Notepad++ on Windows. Now I press forward. The message content looks fine, still in UTF-8, the attachment is present and can be opened. I also tried attachment 8348527 [details]. This test case is faulty, it's missing charset=ISO-2022-JP in the content type. After correcting that, it works fine. The summary says: Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset) I have tested two cases and can't reproduce the problem. Where is it?

Flags: needinfo?(mkmelin+mozilla)

Jorg K (CEST = GMT+2)

Assignee

Comment 59

•

9 years ago

Attached file ISO-2022-JP body and Shift_JIS attachment (corrected attachment 8348527) (deleted) — Details

I've corrected the test case from attachment 8348527 [details].

Jorg K (CEST = GMT+2)

Assignee

Updated

•

9 years ago

Attachment #8625732 - Attachment is obsolete: true

Jorg K (CEST = GMT+2)

Assignee

Updated

•

9 years ago

Attachment #8500254 - Attachment is obsolete: true

Jorg K (CEST = GMT+2)

Assignee

Comment 60

•

9 years ago

I can reproduce the problem with attachment 8348527 [details] in TB 38. I cannot reproduce the problem in a TB 46 (Earlybird back then) compiled 2016-03-05. This was most likely fixed by either bug 1239658 or bug 597369. I've backed out the fix to bug 1251783 locally, that didn't make a difference. I am closing this bug with target milestone TB 47 since this is where the other bugs landed. All have been uplifted to TB 46 and TB 45. I am also assigning myself, since I fixed the aforementioned bugs. Whoever wants this bug reopened must present a reproducible case for TB 45 or later.

Assignee: hiikezoe → mozilla

Status: REOPENED → RESOLVED

Closed: 10 years ago → 9 years ago

Flags: needinfo?(mkmelin+mozilla)

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 47.0

Jorg K (CEST = GMT+2)

Assignee

Updated

•

9 years ago

Attachment #8500255 - Attachment is obsolete: true

Jorg K (CEST = GMT+2)

Assignee

Updated

•

8 years ago

Attachment #8502270 - Attachment mime type: message/rfc822 → text/plain

Jorg K (CEST = GMT+2)

Assignee

Comment 61

•

8 years ago

I looked at attachment 8737925 [details] again. The message body is in ISO-2022-JP, the attachment says windows-1252. Forwarding maintains ISO-2022-JP, reply and "edit as new" switches to windows-1252. However, the message content isn't garbled. To be continued in bug 1026989.

WADA:World Anti-bad-Duping Agency

Updated

•

8 years ago

Blocks: 597821

Jorg K (CEST = GMT+2)

Assignee

Comment 62

•

8 years ago

I've looked at this yet again. The message in attachment 8737925 [details] forwards garbled with TB 38 and it forwards OK in TB 45. So something did get fixed and I closed the bug. That there are still aspects that don't work will be fixed in bug 1026989.

WADA:World Anti-bad-Duping Agency

Comment 63

•

8 years ago

(In reply to Jorg K (GMT+1) from comment #62) > it forwards OK in TB 45. But "pick up charset from last attachment" phenomenon is still observed when "Forward As/Attachment" in Thunderd 45.5.1 on Windows. I believe that main issue in all reports of bug 597821, bug 604628, bug 618465, bug 633217, bug 701818, bug 716983, were "pick up charset from last attachment" phenomenon instead of garbled display upon Forward(Inline), except this bug 715823. If main issue of this bug was "pick up charset from last attachment", why patch proposed in this bug, which was recently fortunately dig out by bug 1026989, were hidden in this bug for so long time? For aging? Anyway, Jorg K, please provide solution for "pick up charset from last attachment" issue in bug 1026989 instead of old this bug. I believe aging of the patch was sufficiently done by this bug :-)

Jorg K (CEST = GMT+2)

Assignee

Comment 64

•

8 years ago

(In reply to WADA from comment #63) > Anyway, Jorg K, please provide solution for "pick up charset from last > attachment" issue in bug 1026989 instead of old this bug. I will. In fact, that bug already has a working fix. It's just missing a test which I will add today. I've just revisited this bug to understand why I closed it: I closed it because the inline forward case got fixed between TB 38 and TB 45 and because I made a mistake and got confused by the poor test cases. In attachment 8502270 [details] and attachment 8348527 [details] the MIME part representing the attachment does NOT even have a charset specified. So naturally no charset can be picked up from those attachments. Only attachment 8737925 [details] has a valid test case.

WADA:World Anti-bad-Duping Agency

Updated

•

8 years ago

Depends on: 1026989

WADA:World Anti-bad-Duping Agency

Comment 65

•

8 years ago

Because problem in "Forward Inline" case was actually disappeared while this bug was being processed, including both "wrong charset from attachment" and "garbled display", problem reported by comment #0 was surely resolved by unknown patch in unknown bug as written in comment #62. Removing "edit as new" from bug summary to avoid further confusions, because it was not involved in original report of this bug. It was from comment added to this bug. Sorry for confusing bug summary change. Jorg K, it's never your mistake. It's confusing bug summary change.

Summary: Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset) → Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward inline, without converting data to the charset)

WADA:World Anti-bad-Duping Agency

Updated

•

8 years ago

Depends on: 1323377

WADA:World Anti-bad-Duping Agency

Updated

•

8 years ago

No longer depends on: 1323377

sample e-mail 13 years ago raal (deleted), message/rfc822		Details
printscreen, forwarded message with bad characters 13 years ago raal (deleted), image/png		Details
forwarded message 13 years ago raal (deleted), application/octet-stream		Details
bug715823.patch 11 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch		Details \| Diff \| Splinter Review
An xpcshell test 11 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch		Details \| Diff \| Splinter Review
testcase with ISO-2022-JP body and Shift_JIS attachment 11 years ago YUKI "Piro" Hiroshi (deleted), application/mbox		Details
Adapt to the latest trunk 10 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch	jcranmer : review-	Details \| Diff \| Splinter Review
The xpcshell test rewritten with Promise 10 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch		Details \| Diff \| Splinter Review
Mail sent (and received) by C-C TB: mail body (ISO-2022-JP), and attachment is Shift_JISIS 10 years ago ISHIKAWA, Chiaki (deleted), text/plain		Details
Possible fix 9 years ago PM (deleted), patch		Details \| Diff \| Splinter Review
sample_email_patch_715823.png 9 years ago PM (deleted), image/png		Details
ISO-2022-JP body and Shift_JIS attachment (corrected attachment 8348527) 9 years ago Jorg K (CEST = GMT+2) (deleted), text/plain		Details