Closed Bug 584504 Opened 14 years ago Closed 14 years ago

Incorrect import of Outlook 2007 data

Categories

(Thunderbird :: Migration, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 207156

People

(Reporter: mike001, Unassigned)

Details

(Whiteboard: [gs])

Attachments

(5 files)

Attached file Correct message source (deleted) —
TB: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.2.7) Gecko/20100713 Thunderbird/3.1.1
OS: MS Windows XP Professional SP3 RUS
MS Office Outlook 2007 (12.0.6535.5005) SP2 MSO (12.0.6535.5002)

User says emails imported from Outlook (using the TB "Import" function) are imported, but the results are garbled. User says examination of the data shows TB did not import both parts of a MIME message, and the part it did import did not include the appropriate headers to allow it to be displayed properly in TB. I suspect this is due to what TB receives from Outlook via the SimpleMAPI calls, but I don't have Windows or Outlook to be able to test this.

STR (according to user):
1) Import from Outlook, where "charset=" of attachment is not the default charset for locale
2) View imported mail; it is not displayed properly.

Expected Results: Mail imported and able to be viewed OK.

Attachments (first added with submission, others will be added afterward):
1) "normal.txt" - Message source of an email that's OK in Outlook
2) "garbled.txt" - Message source of same email after "Import" to TB
3) "Outlook.7z" - Outlook PST file, "7zipped", with 2 emails:
   a) Test mail from "mail.ru"
   b) Test mail from "hotmail.com"

Of the test messages, user says: "They both look fine in Outlook. When imported into TB, the headers of both are fine (I mean the names of the messages in the list), but the contents of the one from hotmail is garbled. When look into the source I see that the one from mail.ru uses the windows-1251 that is default in the TB settings, that's why it looks OK."

The PST file was created "fresh", because the user's original PST file (from which the "normal" and "garbled" examples came) contained sensitive data.

This MAY be a duplicate of Bug 207156, but "normal.txt" doesn't appear to contain an attachment at all (unlike the examples in Bug 207156), and I can't "see" the imported email because I don't have a Windows machine to import it into.
Attached file "Garbled" message source (deleted) —
The companion to "normal.txt" (user says this is how TB "imported" the message).
Attached file 7zipped Outlook PST file (deleted) —
Outlook 2007 PST file, containing 2 emails (as per description)
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
The attachment shows a screen shot of the e-mail in Outlook and TB. I used a version of TB containing the fix to bug 547119. Clearly the fix doesn't address this problem. There are two messages, the first one sent by Михаил Каганский is plain text. Upon import it is totally garbled. The second one sent by Mike Kaganski is HTML. It also looks garbled, since TB displays the content as charset=windows-1252. However, if you switch the character set to KOI8-R on the "View > Character encoding" menu, the message looks good. I will investigate further.
Further analysis:
-----------------
First message:

Outlook headers:
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit

This is a plain text message, so the fix to bug 547119 does not apply here. The debugger shows that the MAPI interface returns "text/plain; charset=koi8-r" from the Content-Type header. This is correctly decoded and charset koi8-r is used. The converted text message has the following headers in TB:
Content-Type: text/plain; charset=koi8-r; format=flowed
Content-Transfer-Encoding: 8bit

However, despite the fact that everything appears correct, the content arrives garbled.
=====================================================
Second message (HTML):

Outlook headers DO NOT contain a charset. The default encoding is therefore windows-1252. Since the message is HTML, the fix to bug 547119 now tries to extract the charset from the HTML body. I ran the debugger on TB and this is the body of the message as returned by the MAPI API. I have no idea how the submitter of the bug obtained the body (as submitted in his attachment). There is absolutely NO charset in the HTML, therefore we have no chance to extract it and the fix to bug 547119 does not apply here.

mData = 0x0688b940 "<html> <head> <style><!-- .hmmessage P { margin:0px; padding:0px } body.hmmessage { font-size: 10pt; font-family:Tahoma } --></style> </head> <body class='hmmessage'> ÍâÞ - âÕáâÞÒÞÕ áÞÞÑéÕÝØÕ, ÝÐßØáÐÝÝÞÕ ÚØàØÛÛØæÕÙ </b...
================
Conclusion:

1st message: plain text, the fix from bug 547119 does not apply here. The message is imported garbled. I hope that is not due to the fix to bug 547119. Can someone please tell me whether the current version of TB imports this message correctly?

2nd message: HTML. The fix to bug 547119 doesn't apply since the message body does not contain a charset we could extract.
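The charset lookup order described in the analysis above (Content-Type header first, then a charset declared inside the HTML body, then a default) can be sketched roughly as follows. This is only an illustration in Python, not the Thunderbird import code; the function names, the regular expression and the "windows-1252" fallback argument are assumptions made for the example.

```python
import re
from email.message import Message

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Hypothetical pattern: finds charset=... inside a <meta> tag.
_META_CHARSET = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)

def charset_from_html(html):
    """Return the charset declared in an HTML <meta> tag, or None."""
    match = _META_CHARSET.search(html)
    return match.group(1).lower() if match else None

def pick_charset(content_type, body, default="windows-1252"):
    """Header first, then HTML body, then the default."""
    return charset_from_header(content_type) or charset_from_html(body) or default

# First message: the header carries the charset.
print(pick_charset("text/plain; charset=koi8-r", ""))
# Second message: no charset in the header or the HTML, so the default is used.
print(pick_charset("text/html", "<html><body class='hmmessage'>...</body></html>"))
```

With a body like the one shown by the debugger, neither lookup finds anything, which matches the observation that the second message ends up displayed with the default encoding.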
I ran the import with the current version of TB (not the debug version that contained the fix to bug 547119 used for the tests above) and I can confirm that the current version shows exactly the same behavior: 1) Text message garbled 2) HTML message appears garbled but can be viewed correctly when changing the encoding. IMHO nothing can be done about 2), but 1) should be investigated further. In any case, this IS NOT A DUPLICATE of bug 547119.
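Why case 2) is recoverable by switching the encoding can be shown with a small Python sketch: when bytes are only decoded with the wrong charset, nothing is lost, so decoding them again with the right one restores the text. The sample string and the function names are made up for illustration and are not taken from the attached messages.

```python
def misdecode(text, actual="koi8-r", assumed="windows-1252"):
    """Encode text in its real charset, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)

def redecode(garbled, assumed="windows-1252", actual="koi8-r"):
    """Undo misdecode: recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)

sample = "тест"  # illustrative Cyrillic text
garbled = misdecode(sample)
print(garbled)            # looks like Western European accented letters
print(redecode(garbled))  # the original text again
```

This corresponds to what the "View > Character encoding" menu does for the HTML message: the stored bytes stay the same, only their interpretation changes.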
This is a duplicate of Bug 207156. See there for further comments.
Given Jorg's analysis and Comment #7, marking as duplicate of 207156.
In XP in "Regional and Language Options" (Advanced) I switched the character set to be used for non-unicode applications to Russian. And guess what: The two messages imported just fine. Even the folder names were right. So this is really a total non-issue. Just configure the PC correctly before you do the import.
Excuse me, but I disagree. You say "this is really a total non-issue". By that you mean that your program processes the data the way _you_ expect. OK, I'm glad you are satisfied. But this IS an issue for a user! I'm a system administrator in a company where people who aren't geeks work, and they want to see their messages safe and displayed correctly. And yes, their PCs _are_ configured correctly. And the messages (those that don't contain the charset info and by lucky chance are in windows-1251) _are_ shown correctly. And those that are encoded in koi8-r (they are really common here) _aren't_. And it's frustrating. And even if I show them how to change the view (an extra operation an average user hates to do), this change isn't saved and needs to be done _every single time_ the message is viewed! And they would naturally say "Outlook showed them OK; throw away this stupid program!". And this _is_ the issue. If your program is unable to extract the information about the charset from another program; if it's unable to correctly guess it when this info is absent; if it cannot store the user-supplied info when the user bothered to provide it; if it relies on the user having configured the PC "correctly" (and thus naturally discards any mail that is outside his own language; suppose I have mail in Greek and Hebrew), then this _is_ the issue! Sorry for the outburst; I really like your program and want to replace Outlook; I just see an obstacle I cannot pass for now...
OK. I can understand your frustration very well. I tried to replace Outlook a while ago and hit a few walls in TB. After complaining for a while I decided to fix things myself. I only learned of this specific problem today. BTW, it's not my program, I'm an unpaid volunteer, and I had to learn how to build TB from the source code in my own time. Sure, if you have mixed-language e-mail, there is no way to configure the PC correctly. Anyway, as you rightly pointed out, the MAPI interface is non-Unicode. Therefore, the only way to get the data across is to change the way Windows is treating the data. Plain text messages are always affected (lost data) and HTML messages may or may not be affected (badly viewed data), depending on the quality of the HTML. If the HTML contains the charset information, then the fix mentioned earlier (extracting the charset from the HTML body) will do the trick. You might be interested to know that the character encoding can be set on a per-folder basis; this way it sticks.
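The difference between "lost data" and "badly viewed data" can be illustrated with a short Python sketch. It assumes, as a simplification, that the lossy step is a conversion of the text to a single-byte code page that cannot represent the characters; the function name and the sample string are hypothetical, and the actual behaviour of the MAPI interface may differ.

```python
def through_code_page(text, code_page):
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent are replaced with '?',
    which cannot be undone afterwards.
    """
    return text.encode(code_page, errors="replace").decode(code_page)

sample = "тест"  # illustrative Cyrillic text

# Western European code page: the Cyrillic letters are gone for good.
print(through_code_page(sample, "cp1252"))
# Cyrillic code page: the text survives unchanged.
print(through_code_page(sample, "cp1251"))
```

Under that assumption, this is consistent with the earlier observation that switching the Windows setting for non-Unicode applications to Russian made both messages import correctly, and with the point that no single setting can cover mail in several unrelated scripts.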
Well, I see. From MSDN documentation (http://msdn.microsoft.com/en-us/library/cc815901.aspx): "When clients or service providers that support Unicode make a method call that includes character strings as input or output parameters, they set the MAPI_UNICODE flag. Setting this flag indicates to the implementation that all incoming strings are Unicode strings. On output, setting this flag requests that all strings passed back from the implementation should be Unicode strings if possible". I should see if this has something to do with this. Could you point me to the code that is of interest here, so that I could start right away without searching?
I see that there aren't many functions that support the Unicode flag. However, I would be grateful if you could direct me to the code that handles the import from Outlook.
I certainly don't know any more than what Jorg added in https://bugzilla.mozilla.org/show_bug.cgi?id=207156#c36 -- http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/MapiMessage.cpp#461 I'd imagine all the functions (and headers) concerned with import from Outlook would be in the ..../import/outlook/src directory.