Closed
Bug 1505315
Opened 6 years ago
Closed 6 years ago
Encoding errors of characters ś and ą in subject, body and attachment names when sending message via "Sent to > Mail recipient" on Polish Windows (caused by interpreting MAPI data as ISO-8859-2 instead of windows-1250)
Categories
(MailNews Core :: Simple MAPI, defect)
Tracking
(thunderbird_esr60 64+ fixed, thunderbird64 fixed, thunderbird65 fixed)
RESOLVED
FIXED
Thunderbird 65.0
People
(Reporter: mkasprowicz, Assigned: jorgk-bmo)
References
Details
(Keywords: regression)
Attachments
(6 files, 1 obsolete file)
(deleted), image/jpeg
(deleted), text/plain
(deleted), patch (emk: review+, jorgk-bmo: approval-comm-beta+, jorgk-bmo: approval-comm-esr60+)
(deleted), text/plain
(deleted), application/vnd.ms-excel
(deleted), image/jpeg
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Steps to reproduce:
I updated the program to the latest version
Actual results:
no Polish characters in the title and attachments
Expected results:
There should be Polish characters. For example, the attachment name should contain Mątwicki.
Assignee
Comment 1•6 years ago
We need the full message as .eml file to be able to investigate, or at least all the message headers and also the MIME part headers for at least one attachment.
Looking at the subject in the picture, which has a [009C] character, there's most likely an encoding error on the sender's side. TB is pretty solid these days when it comes to encoding. That said, TB 60 and beyond may be a little stricter in what they allow.
Reporter
Comment 2•6 years ago
Reporter
Comment 3•6 years ago
I do not know if it's okay, but I've added the .eml file above.
Assignee
Updated•6 years ago
Attachment #9023221 -
Attachment mime type: message/rfc822 → text/plain
Reporter
Comment 4•6 years ago
I am a simple boy, I am asking for a simpler explanation because I do not understand?
Assignee
Comment 5•6 years ago
Thanks.
The subject contains various RFC 2047 encoded strings, the first one is
Subject: =?UTF-8?Q?Wysy=c5=82anie_wiadomo=c2=9cci_e-mail=3a_M=c5=a1twicki_M?=
The character that's not decoded properly is c2 9c. That's an encoding error:
https://www.utf8-zeichentabelle.de/unicode-utf8-table.pl:
U+009C c2 9c <control>
So that's a unicode control character with no textual representation, so the TB display is correct.
Looking at the attachment headers:
Content-Type: application/pdf;
name="=?UTF-8?Q?M=c5=a1twicki_M=2e_-_PPE_za_10-2018=2epdf?="
Content-Type: application/pdf;
name="=?UTF-8?Q?M=c5=a1twicki_M=2e_-_PIT-4_za_10-2018=2epdf?="
The character in question, c5 a1, *is*
U+0161 š c5 a1 LATIN SMALL LETTER S WITH CARON
https://www.utf8-zeichentabelle.de/unicode-utf8-table.pl?number=1024
which is what TB displays.
Now I see that you created the message with TB 60.3. So I don't know how it got mis-encoded. I've just extracted the attachment from the message, changed the š to a ą and sent it to you. For all I can tell, the ą is in the attachment name in the sent message.
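The header analysis above can be reproduced with a few lines of script. This is only a minimal sketch using Python's standard email.header module (Python is chosen here for illustration, it is not what was used in this bug). It decodes the first encoded word of the subject and lists the non-ASCII code points, so the stray U+009C and U+0161 become visible.

```python
from email.header import decode_header

# First RFC 2047 encoded word of the Subject header quoted above.
RAW_SUBJECT = "=?UTF-8?Q?Wysy=c5=82anie_wiadomo=c2=9cci_e-mail=3a_M=c5=a1twicki_M?="


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded words into a Python string."""
    parts = []
    for payload, charset in decode_header(raw):
        if isinstance(payload, bytes):
            # Encoded words come back as bytes plus their declared charset.
            parts.append(payload.decode(charset or "ascii"))
        else:
            parts.append(payload)
    return "".join(parts)


subject = decode_subject(RAW_SUBJECT)
# Collect every character outside ASCII as a U+XXXX label.
non_ascii = [f"U+{ord(ch):04X}" for ch in subject if ord(ch) > 0x7F]
print(non_ascii)
```

The decoding itself is valid UTF-8, which suggests the wrong characters were already in the text before the header was encoded.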
Reporter
Comment 6•6 years ago
I was curious, so I saved the attachment you sent and sent it back, and the problem continues.
Wysyłanie wiadomoci e-mail: Mštwicki M. - PPE za 10-2018
Wiadomoć jest gotowa do wysłania wraz z następujšcymi załšcznikami (plikami lub łšczami):
Mštwicki M. - PPE za 10-2018
What's interesting, some Polish characters show, for example "ł" "ć" "ę".
Will you look into my problem? It is not a big problem, but the wrong characters are annoying.
Assignee
Comment 7•6 years ago
Yes, I would like to understand the problem. By the looks of it, you didn't compose the e-mail by hand, right? Maybe you used the "Send to > Mail recipient" from the Windows desktop.
The subject was:
Wysyłanie wiadomo[009C]ci e-mail: Mštwicki M. - PPE za 10-2018, Mštwicki M. - komornik, Mštwicki M. - PIT-4 za 10-2018
but it should have been:
Wysyłanie wiadomości e-mail: Mątwicki M. - PPE za 10-2018, Mątwicki M. - komornik, Mątwicki M. - PIT-4 za 10-2018
(translated: Sending an e-mail: ...)
So somehow the ś got corrupted. Also the attachment names. I'll send the e-mail again to you so you can see that it works.
So the question is: How did you generate this e-mail? Looks like there is a bug in Windows or the Windows/Thunderbird interface.
Does it work if you create a new e-mail, paste the subject "Wysyłanie wiadomości e-mail: Mątwicki ..." and add the attachment(s) manually?
Assignee
Comment 8•6 years ago
(In reply to Jorg K (GMT+1) from comment #7)
> Maybe you used the "Send to > Mail
> recipient" from the Windows desktop.
Right, the message body is:
Wiadomo[009C]ć jest gotowa do wysłania wraz z następujšcymi załšcznikami (plikami lub łšczami):
The message is ready to send with the following attachments (files or links):
That's generated badly by Windows.
Assignee
Updated•6 years ago
Summary: no Polish characters in the title and attachments → Encoding errors of characters ś and ą in subject, body and attachment names when sending message via "Sent to > Mail recipient" on Polish Windows
Comment 10•6 years ago
Someone might also take a look at bug 689942 and close it out if it no longer exists.
Comment 11•6 years ago
I have the same problem
Since I updated Thunderbird to version 60.0 and later, there is a problem with creating a new email by right-clicking a file on the desktop and choosing Send to > Mail recipient.
The same happens when I use a different program that prepares a message and uses Thunderbird as the default client. It is not only my problem: many people on the Polish Mozilla forum have the same problem. I have 5 different computers, and the problem appears when I upgrade Thunderbird to version 60.
Actual results:
Thunderbird creates the new email normally, but the Polish letters in the subject are wrong. The same happens in the message body below the subject: the default text that gets added has wrong Polish letters.
I don't have any extra add-ons installed in Thunderbird. Resetting Thunderbird to its default settings does not help at all.
Composing a normal email works well. The wrong Polish letters appear only when I send a file the way I described.
Assignee
Comment 12•6 years ago
This will be hard to debug since I'd have to install a Polish language pack.
So there are only these two characters wrong?
U+0105 ą c4 85 comes out as U+0161 š c5 a1
U+015B ś c5 9b comes out as U+009C <control> c2 9c
Any other characters that are broken? Anything in Hungarian or Czech also broken?
Assignee
Comment 13•6 years ago
For the record, I had to compile a 32bit version although I usually use 64bit for development, but that would crash due to bug 393302.
Then I had to set
HKLM\SOFTWARE\Clients\Mail\Mozilla Thunderbird\DLLPath
to
C:\mozilla-source\comm-central\obj-i686-pc-mingw32\comm\mailnews\mapi\mapiDLL\mozMapi32.dll
and
HKLM\SOFTWARE\Classes\CLSID\{29F458BE-8866-11D5-A3DD-00B0D0F3BAA7}\LocalServer32 - Default
to
"C:\mozilla-source\comm-central\obj-i686-pc-mingw32\dist\bin\thunderbird.exe" /MAPIStartup
Sadly I don't get a compose window when using "Send to > Mail recipient", but the main window opens instead :-(
My plan is to produce a debug version that an affected Polish user can run and report back the results.
Reporter
Comment 14•6 years ago
I do not know if this will help, but I've checked how each of the Polish characters comes out.
the original file name: ąęćżźłóśń
title of the message: Wysyłanie wiadomoci e-mail: šęćżłóń
Assignee
Comment 15•6 years ago
So problems with ąźś.
Assignee
Comment 16•6 years ago
FRG, any idea how I can convince TB to open a compose window instead of the main window? See comment #13.
Flags: needinfo?(frgrahl)
Comment 17•6 years ago
I just installed TB 60.3 in a vm. Works of course with en-US. Did you set the registration paths for the dlls too?
https://dxr.mozilla.org/comm-esr60/source/mail/installer/windows/nsis/shared.nsh#369
Flags: needinfo?(frgrahl)
Assignee
Comment 18•6 years ago
Yes, I did, see comment #13. I've even done
regsvr32.exe /s C:\mozilla-source\comm-central\obj-i686-pc-mingw32\comm\mailnews\mapi\mapiDLL\mozMapi32.dll
now. Nothing helped.
But here comes the success story now. I did |mach package| and got myself
C:\mozilla-source\comm-central\obj-i686-pc-mingw32\dist\install\sea\thunderbird-65.0a1.en-US.win32.installer.exe
Installing that, everything works now. I can now add my debug and see what happens.
Comment 19•6 years ago
I meant MapiProxy_InUse.dll. Didn't see it in comment 13 but installing the build is probably cleaner anyway.
> Installing that, everything works now. I can now add my debug and see what happens.
You probably know this already, but it is really easy now to build an l10n version. Just add ac_add_options --with-l10n-base=d:/seamonkey/l10n/l10n-esr60 (your path, of course) and, after you did the en-US build, use mach build installers-pl -v for Polish.
Assignee
Comment 20•6 years ago
Well, while looking where to put the debug, I found the problem.
In https://searchfox.org/comm-central/source/mailnews/mapi/mapihook/src/msgMapiHook.cpp we encode the data we get passed in from Windows into unicode using nsMsgI18NFileSystemCharset() and then nsMsgI18NConvertToUnicode(platformCharSet, ...).
nsMsgI18NFileSystemCharset() got somewhat simplified and no longer returns what it returned before. It now simply returns the fallback encoding for that locale, see:
https://hg.mozilla.org/comm-central/rev/0b0cba8d70bd#l1.31
Looks like for Polish, that is ISO-8859-2, also called Latin-2:
https://searchfox.org/mozilla-central/rev/4e094f66ced333d69b24cd49273789e3a1173dfc/dom/encoding/localesfallbacks.properties#57
https://de.wikipedia.org/wiki/ISO_8859-2
However, the real Windows file system charset is windows-1250, https://en.wikipedia.org/wiki/Windows-1250.
Let's see: In windows-1250 our ą is 0xB9, and in ISO-8859-2 that is a š :-(. In windows-1250 the ś is 0x9C and in ISO-8859-2 that is a control character :-( - Exactly what we observed. ę is 0xEA in both encodings so that's why that works.
So the root cause of the problem is that Windows delivers the data as windows-1250 via the MAPI interface, we used to interpret it correctly, but now we interpret it as ISO-8859-2.
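The byte-level reasoning above is easy to check by hand. The following is a small sketch in Python (an illustration only, not code from Thunderbird) that encodes each Polish letter as windows-1250 and decodes it as ISO-8859-2, using the standard codec names cp1250 and iso8859_2. The helper name misread is made up for this example.

```python
POLISH = "ąćęłńóśźż"


def misread(text: str) -> str:
    """Encode as windows-1250 (what Windows delivers), decode as ISO-8859-2 (what was assumed)."""
    return text.encode("cp1250").decode("iso8859_2")


for ch in POLISH:
    seen = misread(ch)
    status = "ok" if seen == ch else f"broken -> U+{ord(seen):04X}"
    # Show the windows-1250 byte next to the outcome of the wrong decoding.
    print(f"{ch} 0x{ch.encode('cp1250')[0]:02X} {status}")

# Letters that do not survive the wrong decoding.
broken = [ch for ch in POLISH if misread(ch) != ch]
print(broken)
```

Only ą, ś and ź sit at different byte positions in the two encodings, which matches the three letters reported as broken in comment 15.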
Now that we know what broke it, how do we fix it?
The fix would be to implement MAPISENDMAILW instead of MAPISENDMAIL https://docs.microsoft.com/en-gb/windows/desktop/api/mapi/nc-mapi-mapisendmailw to send a unicode message.
A workaround might be to set the pref intl.charset.fallback.override to windows-1250.
Reporters, can you please try that?
Blocks: 1381762
Keywords: regression
Assignee
Updated•6 years ago
Summary: Encoding errors of characters ś and ą in subject, body and attachment names when sending message via "Sent to > Mail recipient" on Polish Windows → Encoding errors of characters ś and ą in subject, body and attachment names when sending message via "Sent to > Mail recipient" on Polish Windows (caused by interpreting MAPI data as ISO-8859-2 instead of windows-1250)
Assignee
Comment 21•6 years ago
Masatoshi-san, I need your help.
As you can see from comment #20, the problem was caused by dropping nsIPlatformCharset and dumbing down nsMsgI18NFileSystemCharset().
So here I'm trying to solve the problem by moving to MAPISendMailW which can supposedly handle "unicode".
The patch works insofar as "Send to > Mail recipient" will start a compose window and attach the selected file. So the HandleAttachments() function works.
However, the subject is a single "E" and the body a single "Y". The print shows:
=== E, 45
=== , 0
=== , 0
=== , 0
=== [02], 2
=== , 0
=== , 0
=== , 0
=== , 0
=== , 0
=== , 0
=== , 0
=== [02], 2
=== , 0
=== , 0
=== , 0
=== Y, 59
=== , 0
=== , 0
=== , 0
I don't know what Microsoft mean by "unicode". Since it's still passed as LPSTR I assumed that it's UTF-8, but it doesn't appear to be. Even interpreted as UTF-16 I don't get a better result.
I'm sure you have more experience with this.
Assignee: nobody → jorgk
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #9024216 -
Flags: feedback?(VYV03354)
Assignee
Comment 22•6 years ago
I forgot to say:
https://docs.microsoft.com/en-gb/windows/desktop/api/mapi/nc-mapi-mapisendmailw
That talks of Unicode and that has lpMapiMessageW instead of lpMapiMessage. But both structures appear to be the same. We model this structure here: https://searchfox.org/comm-central/rev/ed86e0f292198530a321ba1dd2ebc9b3b8a7f506/mailnews/mapi/mapihook/build/msgMapi.idl#35
Comment 23•6 years ago
"unicode" means UTF-16LE in Microsoft terms.
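To illustrate why UTF-16LE data looks truncated when it is handled as a narrow string, here is a hedged Python sketch. The subject text is a hypothetical example, and read_as_c_string is a made-up helper that mimics a char* reader stopping at the first NUL byte. It is not the code path Thunderbird uses.

```python
def read_as_c_string(buffer: bytes) -> str:
    """Mimic a narrow (char*) reader: stop at the first NUL byte."""
    end = buffer.find(b"\x00")
    narrow = buffer if end == -1 else buffer[:end]
    return narrow.decode("latin-1")


# Hypothetical subject, stored the way a wide-character API would pass it.
subject = "Emailing: example.txt"
wide = subject.encode("utf-16-le") + b"\x00\x00"

# Every ASCII character is followed by a zero byte in UTF-16LE,
# so a narrow reader sees only the first letter.
truncated = read_as_c_string(wide)
print(truncated)

# Decoding the same buffer as UTF-16LE recovers the full text.
recovered = wide.decode("utf-16-le").rstrip("\x00")
print(recovered)
```

A one-letter result of this kind would be consistent with the single-character subject and body described in comment 21, assuming those strings started with ASCII letters.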
Assignee
Updated•6 years ago
Component: Untriaged → Simple MAPI
Product: Thunderbird → MailNews Core
Assignee
Comment 24•6 years ago
Comment on attachment 9024216 [details] [diff] [review]
1505315-MAPI-unicode.patch - WIP
Sorry, lpMapiMessageW is different, see:
typedef struct MapiMessageW {
ULONG ulReserved;
PWSTR lpszSubject;
PWSTR lpszNoteText;
PWSTR lpszMessageType;
PWSTR lpszDateReceived;
PWSTR lpszConversationID;
FLAGS flFlags;
lpMapiRecipDescW lpOriginator;
ULONG nRecipCount;
lpMapiRecipDescW lpRecips;
ULONG nFileCount;
lpMapiFileDescW lpFiles;
} *lpMapiMessageW;
Somehow I could only find this in the Google cache.
Attachment #9024216 -
Flags: feedback?(VYV03354)
Assignee
Comment 25•6 years ago
Some more:
typedef struct MapiRecipDescW {
ULONG ulReserved;
ULONG ulRecipClass;
PWSTR lpszName;
PWSTR lpszAddress;
ULONG ulEIDSize;
PVOID lpEntryID;
} *lpMapiRecipDescW;
typedef struct MapiFileDescW {
ULONG ulReserved;
ULONG flFlags;
ULONG nPosition;
PWSTR lpszPathName;
PWSTR lpszFileName;
PVOID lpFileType;
} *lpMapiFileDescW;
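For experimenting with these layouts outside of C, the three structures can be mirrored with Python's ctypes module. This is only a sketch under stated assumptions: ULONG and FLAGS are both modelled as 32-bit unsigned integers, PWSTR as c_wchar_p, and the layout only matches the real Windows structures when run on Windows, where wchar_t is 16 bits wide.

```python
import ctypes

ULONG = ctypes.c_uint32   # assumption: Windows ULONG is 32 bits wide
PWSTR = ctypes.c_wchar_p  # pointer to a NUL-terminated wide string


class MapiRecipDescW(ctypes.Structure):
    _fields_ = [
        ("ulReserved", ULONG),
        ("ulRecipClass", ULONG),
        ("lpszName", PWSTR),
        ("lpszAddress", PWSTR),
        ("ulEIDSize", ULONG),
        ("lpEntryID", ctypes.c_void_p),
    ]


class MapiFileDescW(ctypes.Structure):
    _fields_ = [
        ("ulReserved", ULONG),
        ("flFlags", ULONG),
        ("nPosition", ULONG),
        ("lpszPathName", PWSTR),
        ("lpszFileName", PWSTR),
        ("lpFileType", ctypes.c_void_p),
    ]


class MapiMessageW(ctypes.Structure):
    _fields_ = [
        ("ulReserved", ULONG),
        ("lpszSubject", PWSTR),
        ("lpszNoteText", PWSTR),
        ("lpszMessageType", PWSTR),
        ("lpszDateReceived", PWSTR),
        ("lpszConversationID", PWSTR),
        ("flFlags", ULONG),  # assumption: FLAGS is an unsigned 32-bit typedef
        ("lpOriginator", ctypes.POINTER(MapiRecipDescW)),
        ("nRecipCount", ULONG),
        ("lpRecips", ctypes.POINTER(MapiRecipDescW)),
        ("nFileCount", ULONG),
        ("lpFiles", ctypes.POINTER(MapiFileDescW)),
    ]


# Wide-string fields carry Polish text without any code page involved.
msg = MapiMessageW(lpszSubject="Wysyłanie wiadomości", lpszNoteText="ąęćżźłóńś")
print(msg.lpszSubject, msg.nFileCount)
```

The point of the sketch is that every text field is a wide string, so no code page guess is needed on the receiving side.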
Comment 26•6 years ago
Adding MAPISendMailW is great, but you cannot remove MAPISendMail because this is an implementation of Microsoft Messaging Application Programming Interface (MAPI) and we have no control over callers. Some legacy callers might be hardcoding MAPISendMail.
Comment 27•6 years ago
Please use NS_CopyUnicodeToNative/NS_CopyNativeToUnicode (or use MultiByteToWideChar/WideCharToMultiByte directly) instead of depending on dumb FallbackEncoding.
Assignee
Comment 28•6 years ago
OK, so with that hint, the fix is very simple. NS_CopyNativeToUnicode() internally uses MultiByteToWideChar() and that will use the correct code page.
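As a rough analogy for what the fix does, the sketch below decodes bytes with the system's own encoding instead of a locale fallback table. It is Python, not the Mozilla code, and native_to_unicode is a made-up name. It assumes that locale.getpreferredencoding(False) reports the active code page on Windows (for example cp1250 on a Polish system), which can differ if Python's UTF-8 mode is enabled.

```python
import locale
from typing import Optional


def native_to_unicode(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode bytes using the system encoding unless one is given explicitly."""
    codec = encoding or locale.getpreferredencoding(False)
    return data.decode(codec, errors="replace")


# Bytes 0xB9 and 0x9C as delivered by a windows-1250 system.
# The encoding is passed explicitly here so the example behaves the same everywhere.
text = native_to_unicode(b"\xb9\x9c", "cp1250")
print(text)
```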
I wonder which other bugs we have now due to the dumbing down of nsMsgI18NFileSystemCharset(). There are a few call sites.
We could use NS_CopyNativeToUnicode() in some call sites, but sadly that doesn't return a status, so we won't notice if something can't be encoded, for example here:
https://dxr.mozilla.org/comm-central/rev/2a29ee0adb310b54a6a2df72034953fed8f2b043/comm/mailnews/base/src/nsMessenger.cpp#1854
This needs a follow-up bug to check all those call sites. Here for example
https://dxr.mozilla.org/comm-central/rev/2a29ee0adb310b54a6a2df72034953fed8f2b043/comm/mailnews/addrbook/src/nsAbManager.cpp#827
we could just use NS_CopyUnicodeToNative.
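The concern about a conversion that reports no status can be sketched as follows. This Python example (an illustration with a made-up helper name, to_native) returns the encoded bytes together with a flag that says whether anything was lost. Note that Python substitutes "?" for unencodable characters, which is an assumption of this sketch and may differ from what the Windows conversion functions produce.

```python
from typing import Tuple


def to_native(text: str, codec: str) -> Tuple[bytes, bool]:
    """Return the encoded bytes plus a flag saying whether the result is lossless."""
    try:
        return text.encode(codec), True
    except UnicodeEncodeError:
        # Fall back to a lossy conversion, but tell the caller about it.
        return text.encode(codec, errors="replace"), False


# A Western European code page cannot hold most Polish letters.
data, lossless = to_native("ąęćżźłóńś", "cp1252")
print(data, lossless)

# The Central European code page holds all of them.
data_pl, lossless_pl = to_native("ąęćżźłóńś", "cp1250")
print(lossless_pl)
```

With a status flag like this, a call site could warn the user instead of silently writing replacement characters.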
Attachment #9024216 -
Attachment is obsolete: true
Attachment #9024236 -
Flags: review?(VYV03354)
Updated•6 years ago
Attachment #9024236 -
Flags: review?(VYV03354) → review+
Comment 29•6 years ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e1449ad9e4d6
Use NS_CopyNativeToUnicode() in MAPI to respect Windows code page. r=emk
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee
Updated•6 years ago
Target Milestone: --- → Thunderbird 65.0
Assignee
Updated•6 years ago
Attachment #9024236 -
Flags: approval-comm-esr60+
Attachment #9024236 -
Flags: approval-comm-beta+
Assignee
Comment 30•6 years ago
TB 60.3.1/TB 60.4 ESR:
https://hg.mozilla.org/releases/comm-esr60/rev/489911791e8398d65602a30e4844a4a2f3f063a4
status-thunderbird64:
--- → affected
status-thunderbird65:
--- → fixed
status-thunderbird_esr60:
--- → fixed
tracking-thunderbird_esr60:
--- → 64+
Assignee
Comment 31•6 years ago
Reporters, an unofficial English build of TB 60.3.1 is now available here:
https://queue.taskcluster.net/v1/task/J_C7Dh-AQY6DqO3uVgOSnA/runs/0/artifacts/public/build/install/sea/target.installer.exe
Please try it.
Reporter
Comment 32•6 years ago
the original file name: ąęćżźłóńś
Subject: Wysyłanie wiadomości e-mail: ąęćżźłóńś
message: Wiadomość jest gotowa do wysłania wraz z następującymi załącznikami (plikami lub linkami):
ąęćżźłóńś
It seems that everything is fine. When can we expect an official update?
Assignee
Comment 33•6 years ago
I hope within the next five days, sadly I don't set release dates myself.
Assignee
Comment 34•6 years ago
I believe there is another issue which I will fix in bug 1506422. Please try this for me:
Send yourself a plaintext e-mail with only ą in it or save a draft. You can use Shift+Click "Write" if you're usually composing in HTML. Or you can use the message you produced above. Save the e-mail or draft as text file. Open that file in Notepad. I think you will see š.
We will save the file using ISO-8859-2 and Polish Windows will open the file assuming windows-1250 encoding.
Flags: needinfo?(mkasprowicz)
Reporter
Comment 35•6 years ago
I do not know if I understood correctly but I did:
1. I sent the file via Send to
2. I saved this message as a text file and opened it in Notepad
In the text file, I have this:
Subject:
Wysyłanie wiadomości e-mail: ąęćżźłóńś
From:
Marcin Kasprowicz <mkasprowicz@o2.pl>
Date:
11.11.2018, 19:57
To:
mkasprowicz@o2.pl
Wiadomość jest gotowa do wysłania wraz z następującymi załącznikami (plikami lub linkami):
ąęćżźłóńś
I did everything using the build you provided.
Flags: needinfo?(mkasprowicz)
Assignee
Comment 36•6 years ago
OK thanks, can you attach that text file here. I want to check which encoding it is.
Reporter
Comment 37•6 years ago
Assignee
Comment 38•6 years ago
Thanks, for some reason this got saved as UTF-8 and not ISO-8859-2. So bug 1506422 wasn't a problem here.
Assignee
Comment 39•6 years ago
I have another favour to ask: Address book export. Please do this:
Open the address book.
File > New > Address book. Call it whatever you want, like xxx.
Right-click on the new address book, New Contact. Call the person ąęćżźłóńś.
Export this address book: Tools > Export, choose "Comma Separated (System Charset)" - Not UTF-8.
Check the content of the file. If in doubt, attach it here. I actually get
ąęćżźłóńś
since the Polish characters can't be stored in my system charset.
You can of course delete that address book now. Thanks in advance.
Flags: needinfo?(mkasprowicz)
Reporter
Comment 40•6 years ago
Flags: needinfo?(mkasprowicz)
Reporter
Comment 41•6 years ago
Assignee
Comment 42•6 years ago
Thank you, pretty much what I got.
Assignee
Comment 44•6 years ago
TB 60.3.1 which contains the fix has now been released, Polish version here:
https://download.mozilla.org/?product=thunderbird-60.3.1-SSL&os=win&lang=pl
Assignee
Comment 45•6 years ago
Beta (TB 64 beta 3):
https://hg.mozilla.org/releases/comm-beta/rev/0efafe38b540