Closed Bug 961983 Opened 11 years ago Closed 8 years ago

Reply to UTF-16LE encoded message doesn't work (reply to message with UTF-16LE encoded attachment covered in bug 1026989)

Categories

(Thunderbird :: Message Compose Window, defect)

45 Branch
x86
Linux
defect
Not set
normal

Tracking

(thunderbird_esr45? affected, thunderbird51 fixed, thunderbird52 fixed, thunderbird53 fixed)

RESOLVED FIXED
Thunderbird 53.0
Tracking Status
thunderbird_esr45 ? affected
thunderbird51 --- fixed
thunderbird52 --- fixed
thunderbird53 --- fixed

People

(Reporter: elacunza, Assigned: jorgk-bmo)

References

Details

(Keywords: reproducible, testcase)

Attachments

(3 files, 6 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0 (Beta/Release) Build ID: 20131206152524 Steps to reproduce: Reply to the attached message. I just write on top of the body "test reply". Actual results: The recipient receives a blank message. Expected results: Recipient should have received my "test reply" text.
This also happens on Windows, as we originally detected the problem in an Ubuntu<->Windows mail interchange, both sides having Thunderbird.
This happens on Thunderbird 45.4.0 on Ubuntu also.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 24 Branch → 45 Branch
This problem is produced by multiple reasons. A. The message body is lost after it is sent (or saved as a draft), if "UTF-16LE" is chosen for the text encoding. B. A reply message inherits the text encoding of the one of the original message (or its text attachment) automatically. The real problem is the "A". We can confirm that by these steps: 1. Open the Error Console via Ctrl-Shift-J. 2. File => New => Message 3. Fill the message body, like: ~~~ Hello. ~~~ 4. Copy this code and paste to the "Code" field in the error console, and hit the Enter key. Note, you must remove all linebreaks before pasting, because Thunderbird automatically replaces all linebreaks to "," and it may produce a syntax error. ~~~ var {Services}=Components.utils.import('resource://gre/modules/Services.jsm',{}); Services.wm.getMostRecentWindow('msgcompose').SetDocumentCharacterSet('UTF-16LE'); ~~~ 5. " - Unicode (UTF-16LE)" appears in the window title of the message editor. 6. Ctrl-S to save the message as a draft. 7. Go to the "Draft" folder and show the saved draft. Expected result: "Hello." appears as its body. Actual result: The message has a blank body. Moreover, if you edit the draft and save it again, then the updated message body will be lost again. The only one solution to solve this problem, you need to choose any available message encoding like the "Unicode" (means "UTF-8") from the "Options" menu of the message editor.
Summary: Replying a message with an UTF-16LE content-encoding sends corrupt messages → Message body encoded in UTF-16LE are lost when it is sent or saved
Attached file UTF-16LE-attachment-file.txt (obsolete) (deleted) —
Another testcase. Steps to reproduce: 1. Downlaod this attachment file "UTF-16LE-attachment-file.txt" as a file. 2. Create new message. 3. Attach the file "UTF-16LE-attachment-file.txt" to the message. 4. Send it to yourself. 5. After receive the mail, create new reply for it. 6. Fill the message body and save it as a draft. 7. Go to the "Draft" folder and show the message. Expected result: the draft has its message body. Actual result: the draft has a blank message body.
Component: Untriaged → Message Compose Window
Workaround addon. https://addons.mozilla.org/thunderbird/addon/edit-message-encoding-fallback/ License: MPL 2.0 This simply inserts following codes into Recipients2CompFields: ~~~ var charsetData = CharsetMenu.getData(); var allSupportedEncodings = charsetData.pinnedCharsets.map(function(aData) { return aData.value }) .concat(charsetData.otherCharsets.map(function(aData) { return aData.value })); if (gMsgCompose.compFields.characterSet && allSupportedEncodings.indexOf(gMsgCompose.compFields.characterSet) < 0) { SetDocumentCharacterSet('UTF-8'); } ~~~
The real problem is that the reply picks up the charset from an attachment. And if the attachment is UTF-16LE encoded, the message will get this too, and then it's downhill form there. This is bug 1026989. No doubt that your workaround will work, but the correct fix is not to pick up the charset from the attachment. The reporter originally reported that replying to a message that is UTF-16LE encoded also doesn't work, see attachment 8362846 [details]: Date: Tue, 21 Jan 2014 09:22:45 +0100 From: Eneko Lacunza <elacunza@binovo.es> User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Eneko Lacunza <elacunza@binovo.es> Subject: Re: IMAGEN EXCHANGE --> PROXMOX Content-Type: text/plain; charset=UTF-16LE; format=flowed Content-Transfer-Encoding: 7bit Firstly, UTF-16LE is not a valid encoding charset, and secondly, 16 bit sure as hell don't fit into 7 but ;-) However, we could implement something like you suggested, but maybe somewhere else where the charset is first read from the message so that the invalid charset doesn't even get into the comp fields. See also: bug 1297118.
Summary: Message body encoded in UTF-16LE are lost when it is sent or saved → Reply to UTF-16LE encoded message doesn't work (reply to message with UTF-16LE encoded attachment covered in bug 1026989)
Attached patch 961983-suppress-utf-16.patch (v1) (obsolete) (deleted) — Splinter Review
Please review whoever comes first ;-) This is really a non-issue. If someone ever receives an UTF-16 encoded message, which I doubt, it will look garbled (see test message). So they will set an appropriate charset before replying. If they don't or reply to an empty message, we will now stop short UFT-16 from getting anywhere. Note that the real problem here is in bug 1026989, where the charset is picked up from an attachment. I'm superseding attachment 8811675 [details] here since it belongs in the other bug.
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #8816968 - Flags: review?(mkmelin+mozilla)
Attachment #8816968 - Flags: review?(acelists)
Attached file Supposedly UTF-16 encoded non-empty test e-mail (obsolete) (deleted) —
Reviewers, you can use this to test ;-)
Attachment #8362846 - Attachment is obsolete: true
Attachment #8811675 - Attachment is obsolete: true
The original test message wasn't build up, it was a real message between me and a customer. I can go back to the root of the reply-chain to see where all started if that cane be of any use.
The original message you had attached in attachment 8362846 [details] had an empty body. I've just added a line of content. We don't need any more input from you, but thank you for the offer. Messages in UTF-16 will be forced to UTF-8 in a reply, regardless of whether the body of the original message was empty.
Can you please enlighten me why is utf-16 bad? Can we display such messages? Why can't we send them out? OK, I can understand we do not support the encoding internally for editing/composing. Is there a better way to check if a charset is supported/invalid than just harcoding the name?
UTF-16 is bad. We can't display such messages, we can't send them out. As you know, UTF-16 is used internally and apparently for attachment, but it's not a valid e-mail encoding. See bug 1028531 comment #1 for an authoritative answer. Also see: https://mzl.la/2h7osOh Also: https://en.wikipedia.org/wiki/Unicode_and_email#Unicode_support_in_message_bodies This is a first step if stopping UTF-16 before it goes any further. As I said in the patch, this is a crude fix to stop UTF-16. The special role it has warrants hard-coding this.
I see no mention about avoiding UTF-16 on the wiki page. Anyway, can't we do the filtering of invalid charsets inside nsMsgCompFields::SetCharacterSet() ? Or if the caller wants to know whether SetCharacterSet accepted or changed the charset, he can call SetCharacterSet() to see what charset was really stored.
(In reply to :aceman from comment #15) > Anyway, can't we do the filtering of invalid charsets inside > nsMsgCompFields::SetCharacterSet() ? I don't think so. The function and his brothers are purely for storing, not checking: NS_IMETHODIMP nsMsgCompFields::SetCharacterSet(const char *value) { return SetAsciiHeader(MSG_CHARACTER_SET_HEADER_ID, value); } I think I picked the right spot since then we call m_editor->SetDocumentCharacterSet(msgCharSet); a few lines later rv = childCV->SetForceCharacterSet(msgCharSet);
Comment on attachment 8816968 [details] [diff] [review] 961983-suppress-utf-16.patch (v1) Review of attachment 8816968 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/compose/src/nsMsgCompose.cpp @@ +1621,5 @@ > // Set the charset > + nsAutoCString msgCharSet(m_compFields->GetCharacterSet()); > + > + // Don't accept UTF-16 ever. > + if (Substring(msgCharSet, 0, 6).LowerCaseEqualsLiteral("utf-16")) { The Substring operation seems memory-unsafe if msgCharSet is shorter than 6. It's sad that c-c code can't count on encoding names being canonical. If it could, you could just check for the string starting with "UTF-16" (upper case).
Thanks for the comment! (In reply to Henri Sivonen (:hsivonen) from comment #17) > The Substring operation seems memory-unsafe if msgCharSet is shorter than 6. Really? That function doesn't check the length of it's input value? Well, in this case I can fix it. > It's sad that c-c code can't count on encoding names being canonical. If it > could, you could just check for the string starting with "UTF-16" (upper > case). One day we'll look into it in bug 1297118,
Comment on attachment 8816968 [details] [diff] [review] 961983-suppress-utf-16.patch (v1) I'll try a different solution inspired by Henri, that is, switching to canonical names at that stage (implementing a bit of bug 1297118). Stand by.
Attachment #8816968 - Flags: review?(mkmelin+mozilla)
Attachment #8816968 - Flags: review?(acelists)
Attached patch 961983-suppress-utf-16.patch (v2) (obsolete) (deleted) — Splinter Review
OK, how about this then? I noticed that UTF-7 and even x-mac-croatian seem to be valid canonical names, however, forcing the charset to those fails. There doesn't seem to be string function to test "starts with", so I'm using good old strncmp() which should be safe even if the input is shorter than six characters.
Attachment #8816968 - Attachment is obsolete: true
Attachment #8817379 - Flags: review?(mkmelin+mozilla)
Attachment #8817379 - Flags: feedback?(hsivonen)
Comment on attachment 8817379 [details] [diff] [review] 961983-suppress-utf-16.patch (v2) Review of attachment 8817379 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/compose/src/nsMsgCompose.cpp @@ +1627,5 @@ > + nsAutoCString msgCharSet; > + rv = ccm->GetCharsetAlias(m_compFields->GetCharacterSet(), msgCharSet); > + > + // Don't accept UTF-16 ever. > + if (NS_FAILED(rv) || strncmp(msgCharSet.get(), "UTF-16", 6) == 0) { I got a tip from Aceman: I could change this to: StringBeginsWith(msgCharSet, NS_LITERAL_CSTRING("UTF-16")) Any preference?
OK, let's go with the fancy string function ;-)
Attachment #8817379 - Attachment is obsolete: true
Attachment #8817379 - Flags: review?(mkmelin+mozilla)
Attachment #8817379 - Flags: feedback?(hsivonen)
Attachment #8817390 - Flags: review?(mkmelin+mozilla)
Attachment #8817390 - Flags: feedback?(hsivonen)
Comment on attachment 8817390 [details] [diff] [review] 961983-suppress-utf-16.patch (v2b). Review of attachment 8817390 [details] [diff] [review]: ----------------------------------------------------------------- While the code looks ok to me, I don't see a change in behavior compared to trunk, testing with the attached test case. Both display the content as some chinese(?) characters, and that's also included in the reply for both cases.
That's not the point. The point is that the title shows UTF-16 without the patch. So if you compose based on that, you'll mess up when you save or send, see comment #0. The test message is of course wrong, since you can't squeeze 16 bit into CTE 7bit. UTF-16 only makes sense for base64-encoded attachments which are the original cause of this bug. And that won't happen any more after bug 1026989 is fixed (only awaiting review on the test). BTW, the original test case here had an empty body. Let me see whether I can create a better test case.
I think I can't come up with a better test case, the Chinese characters are unavoidable. Not only does composing in UTF-16 not work, which I'm stopping here, also UTF-16 detection doesn't work in general, take your pick from this list: https://mzl.la/2h7osOh, maybe bug 244829. After bug 1026989 fixes picking up UTF-16 from the attachment, this bug here could be a WONTFIX, but as the reporter assures in comment #9, this was seen in the wild. I started looking at this bug here as a fix for bug 1026989, but it turned out that now I'll be fixing both.
Attached file Test e-mail in UTF-16 (deleted) —
OK, here's a decent test e-mail. Use current trunk to reply and send (at least to outbox), then try with the patch. 100% difference ;-)
Attachment #8816969 - Attachment is obsolete: true
Attachment #8817681 - Attachment mime type: message/rfc822 → text/plain
Would it be possible to add a simple test into the test file added in bug 1026989 ?
Sure. Reply to this test message here and get UTF-8. I can do that.
Attached patch 961983-utf-16-test.patch (v1) (obsolete) (deleted) — Splinter Review
Assuming that bug 1026989 will land first, here's a patch that adds a test to the new tests created over there.
Attachment #8817713 - Flags: review?(mkmelin+mozilla)
Attachment #8817713 - Flags: feedback?(acelists)
Attached patch 961983-utf-16-test.patch (v1b). (deleted) — Splinter Review
Refreshed for changes in bug 1026989. Try for both bugs here: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=82ea022b7dc80f4c91c7c2f654083a52a2108a26 Let's get the charsets right again!
Attachment #8817713 - Attachment is obsolete: true
Attachment #8817713 - Flags: review?(mkmelin+mozilla)
Attachment #8817713 - Flags: feedback?(acelists)
Attachment #8817790 - Flags: review?(mkmelin+mozilla)
Attachment #8817790 - Flags: feedback?(acelists)
Comment on attachment 8817681 [details] Test e-mail in UTF-16 Sorry but even with this I don't see any change from trunk. Both seem to work, and there's no change in title. I even sent it an received it (as utf-8) with trunk. I suppose that should happen too as we normally auto-upgrade to utf-8 if conversion fails. This was all tested on linux. Maybe a platform difference?
Comment on attachment 8817790 [details] [diff] [review] 961983-utf-16-test.patch (v1b). Review of attachment 8817790 [details] [diff] [review]: ----------------------------------------------------------------- ::: mail/test/mozmill/composition/body-utf16.eml @@ +1,1 @@ > +From: test <test@test.com> test.com is a real domain, not for testing. You can use example.com (which is real but for testing!)
(In reply to Magnus Melin from comment #31) > This was all tested on linux. Maybe a platform difference? It seems that way. I adjusted the test and now it works everywhere. https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=82ea022b7dc80f4c91c7c2f654083a52a2108a26 Check the parts for bug 1026989. Looks like bug 1026989 is ready to go, just received r+. I'll s/test.com/example.com/
Try storing the message in a folder before replying.
Comment on attachment 8817390 [details] [diff] [review] 961983-suppress-utf-16.patch (v2b). Review of attachment 8817390 [details] [diff] [review]: ----------------------------------------------------------------- Yep, replying from folder reproduced it. r=mkmelin ::: mailnews/compose/src/nsMsgCompose.cpp @@ +1626,5 @@ > + > + nsAutoCString msgCharSet; > + rv = ccm->GetCharsetAlias(m_compFields->GetCharacterSet(), msgCharSet); > + > + // Don't accept UTF-16 ever. Maybe also add the comment from https://bugzilla.mozilla.org/show_bug.cgi?id=1028531#c1 inline here as explanation.
Attachment #8817390 - Flags: review?(mkmelin+mozilla) → review+
Comment on attachment 8817790 [details] [diff] [review] 961983-utf-16-test.patch (v1b). Review of attachment 8817790 [details] [diff] [review]: ----------------------------------------------------------------- Not 100% proof as replying from file didn't reproduce on linux. But I suppose other platforms would catch a regression. You could add a comment to that affect, so it's easier to understand if we do get a windows only failure in the future. r=mkmelin
Attachment #8817790 - Flags: review?(mkmelin+mozilla) → review+
https://hg.mozilla.org/comm-central/rev/61ad407448cdc69faf2adb348e9559ef048b19fe https://hg.mozilla.org/comm-central/rev/b26d08edfa2ac520277d576cbe9fd5c3c0bdf53d Code landed with extra comment as requested by Magnus in comment #35. Test landed with extra comment as requested by Aceman in bug 1026989 where this test was first created. Test also landed with s/test.com/example.com/.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 53.0
Comment on attachment 8817390 [details] [diff] [review] 961983-suppress-utf-16.patch (v2b). [Approval Request Comment] Regression caused by (bug #): That seems to have been wrong since day one. User impact if declined: Certain messages get encoded as UTF-16 and since that doesn't work, the content is lost. Testing completed (on c-c, etc.): Manual and via test. Risk to taking this patch (and alternatives if risky): Not risky, just forcing the charset from UTF-16 to UTF-8.
Attachment #8817390 - Flags: feedback?(hsivonen)
Attachment #8817390 - Flags: approval-comm-esr45?
Attachment #8817390 - Flags: approval-comm-beta?
Attachment #8817390 - Flags: approval-comm-aurora+
Comment on attachment 8817790 [details] [diff] [review] 961983-utf-16-test.patch (v1b). [Approval Request Comment] See preceding comment. This is the test.
Attachment #8817790 - Flags: feedback?(acelists)
Attachment #8817790 - Flags: approval-comm-esr45?
Attachment #8817790 - Flags: approval-comm-beta?
Attachment #8817790 - Flags: approval-comm-aurora+
(In reply to Magnus Melin from comment #36) > Not 100% proof as replying from file didn't reproduce on linux. Is there a misunderstanding here? The test is 100% proof since we reply from a folder which fails on all platforms: https://hg.mozilla.org/comm-central/rev/7a69461597e94a4a25157f0a8e17d1ec53df90b9#l2.44
(In reply to Magnus Melin from comment #32) > test.com is a real domain, not for testing. You can use example.com (which > is real but for testing!) We had test.com in 11 files, I took the liberty to fix this: https://hg.mozilla.org/comm-central/rev/c6776de4cd739ada20d6cbf52e70ec2ffda1e5a2
Ah right, I see that now.
Attachment #8817390 - Flags: approval-comm-beta? → approval-comm-beta+
Attachment #8817790 - Flags: approval-comm-beta? → approval-comm-beta+
No longer blocks: 1323377
There are too many changes in nsMsgCompose::InitEditor to make this an easy uplift, so I am going to skip this for 45.7.0 If you want consideration for 45.8.0 please submit a patch with adaptation for esr45
Comment on attachment 8817390 [details] [diff] [review] 961983-suppress-utf-16.patch (v2b). Just to be clear, what I man by "too many changes" is not this patch, but other patches that have caused drift of the code.
Attachment #8817390 - Flags: approval-comm-esr45? → approval-comm-esr45-
Attachment #8817790 - Flags: approval-comm-esr45? → approval-comm-esr45-
You got the apply order wrong. You need to apply bug 1026989 first.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: