207156 - Characters outside the default code page become ?'s when importing Outlook mail

Reporter

Description

•

22 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030507
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030507

Trying to import Outlook 2002 mail into Mozilla. Some of my mail includes Unicode characters not in my system code page 1252, some not in any standard code page. Outlook displays this correctly. Mozilla replaces them with ?'s or a near equivalent in CP1252 (e.g. Turkish g breve > g). The same problem occurs with e-mail I send, not marked with any encoding but presumably stored as Unicode in Outlook (although in Mozilla it appears as "Content-type: text/plain; charset=windows-1252; format=flowed"), and with e-mail I receive which is marked "Content-Type: text/plain; charset=utf-8; format=flowed".

Reproducible: Always

Steps to Reproduce:
1. Import my Outlook data into Mozilla
2. Open the messages in Mozilla and/or look at the raw files which have the same problem

Actual Results:
Part of one message looks like: I can send Unicode Greek text in plain text e-mail, as here: ?? ???? ?? ? ?????.

Expected Results:
Greek characters should appear instead of the question marks. (Some of them are from the Unicode Extended Greek pages, not in standard Greek code pages.)

Classified as "critical" because I am losing data here, and there was no warning. Unfortunately if there is no fix or workaround for this one I will have to abandon Mozilla and continue to use Outlook, as I need access to old e-mails which are not just in CP1252. This may be related to bugs 88603 and 100867 - if the latter, it has not been fixed properly.

Peter Kirk

Reporter

Comment 1

•

22 years ago

Attached file Two e-mails illustrating the bug, in Outlook PST and from Mozilla (deleted) — Details

This zip file includes two copies of the same e-mail, one which I submitted to a list and one received back by me (only the headers are different), in an Outlook PST file with the Greek text visible in both, and as source saved from Mozilla showing the Greek replaced by question marks even in the received copy which is marked as utf-8.

Peter Kirk

Reporter

Comment 2

•

22 years ago

This problem may be more Outlook's than Mozilla's. I find that every way of saving or exporting these messages from Outlook, even importing them into Outlook Express 6, converts them according to CP1252 - which means any data not in CP1252 is converted to ?'s and so lost. The Unicode data is saved only in the .PST files. But it is accessible by copy and paste from an Outlook window and also if a message is forwarded. In the latter case Outlook automatically selects a suitable code page if one is available or uses UTF-8 if no one code page covers the whole text. This suggests that a better way to export Outlook messages to Mozilla might be to forward the messages to an internal pseudo-mail server within Mozilla.
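The forwarding behaviour described above (use a single code page if one covers the whole text, otherwise UTF-8) can be sketched roughly as follows. This is only an illustration: `pick_charset` is a hypothetical helper, and it considers just US-ASCII and ISO-8859-1 as candidates rather than the set of Windows code pages Outlook would actually try.

```cpp
#include <string>

// Hypothetical helper: choose the narrowest charset label that can
// represent every code point in the text, falling back to UTF-8.
// Only two single-byte candidates are considered in this sketch.
std::string pick_charset(const std::u32string& text) {
    bool ascii = true;
    bool latin1 = true;
    for (char32_t cp : text) {
        if (cp > 0x7F) ascii = false;   // needs more than 7 bits
        if (cp > 0xFF) latin1 = false;  // outside ISO-8859-1
    }
    if (ascii) return "us-ascii";
    if (latin1) return "iso-8859-1";
    return "utf-8";
}
```

With this sketch, plain English text is labelled us-ascii, French text with accents iso-8859-1, and the Greek text from this report utf-8.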

Peter Kirk

Reporter

Comment 3

•

22 years ago

A little more on this one. I find that the Unicode is preserved correctly and can be read by Mozilla if the problem e-mail is sent as an attachment to another e-mail and received by Mozilla. This method also gets around bug 127049 (and bug 183124 which seems to be a duplicate) as the headers are filled in as e-mail addresses - and is a cleaner solution than Gene Wood's because the message is not actually resent and so no additional headers are added. So my outline of a new import procedure, to fix both bugs, would be to attach all the messages in each Outlook folder to a dummy message and send that via a mail client to Mozilla, which can then unpack them back into a folder complete with Unicode and meaningful headers. It would probably be possible to set up a macro within Outlook 2002 to do what is necessary at that end. Can Mozilla handle the rest?

Peter Kirk

Reporter

•

21 years ago

Some further observations on this one: I discovered that in many of the messages which I imported from Outlook via the IMAP server the only attachment listed is winmail.dat, which is basically unreadable except within Outlook. From looking in Mozilla at several hundred messages which had attachments in Outlook, I discovered the following patterns:

Messages from the time when I was using Outlook 2000 mostly came across via IMAP into Mozilla correctly, with readable attachments. In just a few cases, mostly towards the end of this period, the attachments became unreadable in winmail.dat. I traced some of these to the sender of the message using Outlook 2002.

Messages dated from the day when I installed Office XP and Outlook 2002 (I found one message telling me the exact date, 14 January 2002) mostly have unreadable attachments, when imported via IMAP. There are some exceptions, many of them being when the message is HTML format rather than plain text.

This all suggests that the problem is largely with Outlook 2002. It is not Mozilla's problem, I realise, but I hope it is helpful that this is noted here, and perhaps mentioned in documentation at some time.

So for these messages with unreadable attachments I was forced to revert to the copies imported from Outlook by Mozilla. This is despite several significant problems with such imported messages, some of which probably are Mozilla issues:

1) Addresses in the To: etc. lines of these messages are separated by semicolons and spaces, not by commas and new lines - see bug 210600;
2) In many of these messages the text has become an attachment named Part 1.1. This would not be a problem if such attachments were displayed inline, but they are not, although I have "Display Attachments Inline" checked;
3) Attachment names are truncated to 8.3 filename format;
4) The Unicode problem which this bug started with.

Severity: critical → normal

Peter Kirk

Reporter

Comment 10

•

21 years ago

On comment #9 point 2, see my newly reported bug 210606.

Peter Kirk

Reporter

Comment 11

•

21 years ago

Re comment 9: I just found Fentun, at http://www.fentun.com/, which decodes winmail.dat attachments (and runs OK on Windows 2000). I set up Fentun as the default application for MIME type ms-tnef, and now I can read my imported attachments! Wonderful! Well, it would have been even more wonderful if it had allowed me to double click on the .gif file it found to view it with my default viewer, rather than require me to extract it to a file and view it from there. But you can't have everything.

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Gervase Markham [:gerv]

Comment 12

•

19 years ago

This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox: http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey: http://www.mozilla.org/projects/seamonkey/

Gervase Markham [:gerv]

Comment 13

•

19 years ago

This bug has been automatically resolved after a period of inactivity (see above comment). If anyone thinks this is incorrect, they should feel free to reopen it.

Status: UNCONFIRMED → RESOLVED

Closed: 19 years ago

Resolution: --- → EXPIRED

Atlanx

Comment 14

•

19 years ago

This bug should be reopened because of bug 330134, and there is still no fix for bug 207156. The only reason there are no other comments on this bug is that not many people have e-mails containing Unicode, and importing e-mail from another mail program mostly happens only once in the use of Mozilla/Thunderbird. Can someone deactivate auto-resolving for this bug and 330134?

Simon Montagu :smontagu

Updated

•

19 years ago

Status: RESOLVED → UNCONFIRMED

Resolution: EXPIRED → ---

Simon Montagu :smontagu

Comment 15

•

19 years ago

Confirming based on comments

Status: UNCONFIRMED → NEW

Ever confirmed: true

Simon Montagu :smontagu

Updated

•

19 years ago

Blocks: 330134

Jungshik Shin

Comment 16

•

•

18 years ago

possible related bugs: bug 217234 bug 272745 bug 359785 bug 276663 I am sure there are others. (does OE have same problem and cause? example bug 254118)

Severity: normal → critical

Keywords: dataloss

Atlanx

Comment 23

•

18 years ago

Can someone add "intl" to the keywords, like in bug 330134? And this is not only a Win2000 problem. It's still there in WinXP.

ovidiu

Comment 24

•

17 years ago

(In reply to comment #22)
> possible related bugs:
> bug 217234
> bug 272745
> bug 359785
> bug 276663

I do see these are dupes

> (does OE have same problem and cause? example bug 254118)

This is very probably related

But are these still on in current versions tb2.0.0.14 or 3?

utf16

Comment 25

•

16 years ago

This bug makes migration to Thunderbird completely impossible.

Nikolay Shopik

Comment 28

•

Comment 32

•

16 years ago

Can someone post the source of the e-mail? The samples have ???. Then I can try an import from OE and see if it is the same problem. Or maybe I should just make an e-mail formatted as Greek (Windows)?

Jorg K (CEST = GMT+2)

Comment 33

•

14 years ago

The attached PST contains two messages: One sent message (without any useful header information) and one received one. The received message has this header:

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

TB correctly recognizes the character set and in the imported e-mail we get this header:

MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed

However, in the plain text body of the message the Greek characters appear garbled, very much like the Russian characters in bug 547119.

Michael A. Pasek

Updated

•

14 years ago

URL: http://gsfn.us/t/19887

Whiteboard: [gs]

Jorg K (CEST = GMT+2)

Comment 35

•

14 years ago

I will further investigate this problem shortly.

Jorg K (CEST = GMT+2)

Comment 36

•

14 years ago

Take a look at http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/MapiMessage.cpp#461

If the first attempt to retrieve the message body failed, we retrieve the body in the else-clause at line 482. Debugging shows that the body is returned at line 482 as this (for example the e-mail containing Greek characters):

====
{mData=0x07ae5728 "Three points here: 1) This is a quite separate issue from the plain text vs. HTML format issue. I can send Unicode Greek text in plain text e-mail, as here: ?? ???? ?? ? ?????. Indeed you did just this in your e-mail to me where you quoted these Greek words from my e-mail (retained below). Now to read this, especially the extended polytonic Greek characters, you may have to
====

Outlook has already converted the characters in question into question marks. And there is nothing we can do to get the original information back.

Debugging on the Russian messages supplied in bug 584504 tells a different story. The plain text message attached there is returned in the first call at line 461. Outlook returns us the plain text message as HTML (with the correct data). Our heuristic at line 477 decides correctly that this is a plain text message, so we retrieve the body again at line 482 and again get ??? back.

Summary: There are two cases here:
In one case, Outlook never offers the message body as HTML. When retrieved at line 482, we get ???
In the other case, Outlook at first delivers the message body as HTML **with** the correct data. We decide to retrieve it again and then get the question marks.

So the best a fix could do is to convert some plain text messages as HTML in order to have the correct content. Opinions?

Michael A. Pasek

Comment 37

•

14 years ago

For any case where the body is returned as HTML -- even if Outlook did convert it from text/plain -- wouldn't using the Outlook HTML conversion be more likely to result in the proper display to the user? I note that the MXR comm-central repository contains your fix for Bug 250878 (applied Aug 2, 2010), whereas the user in the case of Bug 584504 would not have that fix (the "heuristic at line 477" would not be present in his case). I'm also slightly confused by the comments regarding that user's problem in the four bug reports; I think what you're saying is that the first message (plain text) could be properly imported (with the fix for Bug 250878?), but the HTML message is hopelessly lost, since it doesn't contain any charset data, either in the headers or HTML body. Is that an accurate statement?

Jorg K (CEST = GMT+2)

Comment 38

•

•

14 years ago

As I suspected, the Microsoft MAPI implementation just couldn't be non-Unicode-aware. At least some problems here could be resolved by just using the necessary Unicode flags/structures. E.g., the PR_BODY constant is defined in MAPITags.h as "PROP_TAG( PT_TSTRING, 0x1000)"; there also exist PR_BODY_W and PR_BODY_A versions. When you query the msg specifically for PR_BODY_W, you get the UTF16 string. In TB, it seems that the UNICODE preprocessor directive isn't in effect, so the generic PR_BODY translates to PR_BODY_A and thus the returned body text contains "?" in places where characters outside the current code page were. If PR_BODY_W were used then, again, a decision would have to be made about how to store the UTF16 text in the body: use utf8, or some other charset? But this would make it possible to fix that Greek message.
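To make the difference between the two retrieval paths concrete, here is a minimal sketch in standard C++ with no MAPI calls. `narrow_lossy` and `utf16_to_utf8` are hypothetical helpers, and ISO-8859-1 stands in for the system code page (an assumption; real CP1252 differs in the 0x80-0x9F range). The first function mimics what an ANSI body amounts to, the second what storing the wide body as UTF-8 would preserve.

```cpp
#include <cstddef>
#include <string>

// Narrow UTF-16 to a single-byte charset (ISO-8859-1 here, standing in
// for the system code page). Anything outside the charset becomes '?',
// and the original character cannot be recovered afterwards.
std::string narrow_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t u = in[i];
        bool pair = u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
                    in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (pair) {
            ++i;          // a surrogate pair is one character: one '?'
            out += '?';
        } else {
            out += (u <= 0xFF) ? static_cast<char>(u) : '?';
        }
    }
    return out;
}

// Encode UTF-16 as UTF-8; every valid code point survives.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute U+FFFD
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Passing a five-letter Greek word through `narrow_lossy` yields five question marks, which matches the symptom in this bug, while `utf16_to_utf8` keeps the text intact at the cost of a multi-byte encoding.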

Jorg K (CEST = GMT+2)

Comment 47

•

14 years ago

Join the team, Mike, and fix it ;-) UTF8 would be a suitable choice, right?

Mike Kaganski

Assignee

Comment 48

•

14 years ago

Thank you for the invitation. The fix is the thing I'm busy with right now. I hope it will be worthwhile.

Mike Kaganski

Assignee

•

14 years ago

Seems like I have failed to fulfill the requirements of the "How to Submit a Patch" (https://developer.mozilla.org/en/Getting_your_patch_in_the_tree) in that I didn't request the review. Now that I try to fulfill this, I am not sure who is the person who owns the module in question. Based on the patches of other bugs (Bug 250878, Bug 309932) that were reviewed by David :Bienvenu, I decided to ask David. Please tell me the correct address if I'm wrong.

Jorg K (CEST = GMT+2)

Comment 53

•

14 years ago

I think that's a good start ;-) He'll nominate someone else if he doesn't want to do it.

David :Bienvenu

Comment 54

•

14 years ago

Comment on attachment 472109
Totally reworked retrieval of the body, it is now retrieved in Unicode. Fixes to body type handling. Fixes to original charset guessing.

Switching review to Neil - I haven't been able to get testing of Outlook import working on my machine.

Attachment #472109 - Flags: review?(bienvenu) → review?(neil)

neil@parkwaycc.co.uk

•

14 years ago

Regarding the use of <locale>: I only use the static method std::locale::classic() that returns the always-existent object (classic C locale). The isspace, isalpha and isdigit functions return a value on any input value, so no exception is possible. I use the locale paradigm to make it clear which assumptions/tests are being done, without any dependencies on a user environment settings. I'm quite sure that the algorithms in this module should be as fast as possible (hopefully table-based).
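The locale usage described above might look roughly like the following. This is a sketch, not the code from the patch: `trim_classic` and `is_alnum_classic` are hypothetical names chosen for illustration. Both rely only on `std::locale::classic()` and the two-argument character classification templates from `<locale>`.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Trim leading and trailing whitespace using the classic "C" locale,
// so the result does not depend on the user's environment settings.
std::string trim_classic(const std::string& s) {
    const std::locale& c = std::locale::classic();
    std::size_t b = 0;
    std::size_t e = s.size();
    while (b < e && std::isspace(s[b], c)) ++b;
    while (e > b && std::isspace(s[e - 1], c)) --e;
    return s.substr(b, e - b);
}

// True for ASCII letters and digits only; bytes above 0x7F are never
// classified as alphabetic in the classic locale.
bool is_alnum_classic(char ch) {
    const std::locale& c = std::locale::classic();
    return std::isalpha(ch, c) || std::isdigit(ch, c);
}
```

Unlike the `<cctype>` functions, these overloads take a `char` directly, so bytes with the high bit set can be passed without a cast to `unsigned char`.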

Jan

Comment 59

•

14 years ago

Is a build of Thunderbird for Windows (nightly build etc.) that includes Mike's patch available somewhere? I would really need it. I have the same problem as Mike, and the trial version of Outlook that I installed in order to import my old mail into Thunderbird will expire soon...

Wayne Mery (:wsmwk)

Comment 60

•

14 years ago

bienvenu, or Mike, can you throw up a tryserver build? Also, is there progress on finding an additional reviewer beyond neil?

Removing my obsolete wanted-thunderbird3? and transferring to thunderbird3.2.

Bugs for you to examine as possible dups (I have not examined these closely):
Bug 270638 - Import kills 8-bit characters from subjects and addresses (has testcase)
Bug 357294 - incorrect import of outlook pst (has testcase)

Blocks: tb-enterprise, 157010

blocking-thunderbird3.2: --- → ?

Flags: wanted-thunderbird3?

Mike Kaganski

Assignee

Comment 61

•

14 years ago

(In reply to comment #60)
> bienvenu, or Mike, can you throw up a tryserver build.

Unfortunately, I don't know how to do it, and I suspect that some high access level is needed for this. I'm just a newbie.

> also, is there progress to finding an additional reviewer beyond neil?

See above :)

> Bug 270638 - Import kills 8-bit characters from subjects and addresses (has
> testcase)

This one isn't a dup (though I believe it may be fixed using a similar approach).

> Bug 357294 - incorrect import of outlook pst (has testcase)

After importing the testcase with my patched version, the message looks OK. Well, almost OK, because the message consists of very long lines (see Bug 593907) - one letter got broken in my test.

Jorg K (CEST = GMT+2)

Comment 62

•

14 years ago

I can confirm Mike's test result. In a version that contains this fix, the test message from bug 357294 is imported correctly. Bug 357294 can therefore be marked as a duplicate of this bug.

Mike Kaganski

Assignee

Comment 64

•

14 years ago

Attached patch Major revision of the mail import (obsolete) (deleted) — Details — Splinter Review

The import code has been rewritten to make it more structured and manageable. Roles and responsibilities of modules and classes have been reviewed. As a result, it became possible to fix numerous other bugs, most notably Bug 558653.
Created a workaround for a deficiency of nsMsgComposeAndSend::EnsureLineBreaks(), thus solving Bug 593907.
Prevented an analog of Bug 503690 from appearing in Outlook import (incomplete workaround for the poorly implemented nsMsgComposeAndSend::GetBodyFromEditor()).
Made it possible to import Outlook messages (msg) embedded as attachments, using the message/rfc822 attachment type.
Fixed a minor bug in converting RTF to HTML/plaintext.

Attachment #472109 - Attachment is obsolete: true

Attachment #488161 - Flags: review?(neil)

Attachment #472109 - Flags: review?(neil)

Jorg K (CEST = GMT+2)

Comment 65

•

14 years ago

The import from Microsoft Outlook has had many bugs. Mike Kaganski has totally reworked the import to fix all major bugs, i.e. this bug 207156 and bug 558653 (which are the parents of many duplicates). He has made a fantastic effort. I have worked with Mike over the last few months and I have tested his changes. This week I have tested Mike's latest changes on my Outlook data from eleven years (1999-2010). I imported 2 GB worth of data in one hit without any problem. I urge you to review and accept this patch as soon as possible. If ever TB wants to make way into Outlook territory, this patch is an absolute MUST HAVE.

Mike Kaganski

Assignee

•

14 years ago

Excuse me. I now uploaded it to Google Docs. Here is the link: https://docs.google.com/leaf?id=0B-kIIbVJbQ46NWRiNGY4NWUtYTRhZC00NTZlLThhMTctM2I5NzYzOGI3ZjVi&hl=en

Ludovic Hirlimann [:Usul]

Attachment #488995 - Flags: review?(neil)

Attachment #488161 - Flags: review?(neil)

Mark Banner (:standard8)

Comment 76

•

14 years ago

That's better, the build succeeded and all tests passed (although we've not got any tests covering Outlook import). The build can be found here: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/tryserver-builds/bugzilla@standard8.plus.com-f90ad2e714c9/tryserver-win32/

Giuliano Masseroni [:jooliaan]

Comment 77

•

14 years ago

Hello, I have tested the above-mentioned build with Outlook 2007 (Italian version) and I have been able to import mail, folders and address book fine. Only the Settings weren't imported, but I think this was because our company works with Exchange. I had to import them by choosing the related option one by one, since if I choose Import Everything, after I hit Next, I receive this dialog window back http://img26.imageshack.us/img26/9141/nofrom.png where after "from:" there are no clients available, and if I hit Next nothing happens.

Ciao, Giuliano

Vincent (caméléon)

Comment 78

•

14 years ago

I have just made a test with Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101109 Thunderbird/3.3a1pre. Import of all parameters from Outlook 2007 is fine, but:
- account settings (pop and smtp parameters) were not imported, and importing the settings alone later fails with the warning "An error occurred while importing settings (...)".
- all mail is marked unread although it has been read.

Jorg K (CEST = GMT+2)

Comment 79

•

14 years ago

Mike Kaganski's rework has to do with importing e-mail messages. It has nothing to do with importing settings. Imported mail used to show as unread and this behaviour hasn't changed. The behaviour that has changed is that
- HTML messages are now reliably imported as HTML
- plain text messages are imported as plain text
- messages using international characters are imported properly
- messages with embedded images are imported properly
- messages with long lines are now imported properly, so no <CR><LF> is inserted into a multi-byte character, which would destroy the original information.

I trust Mike himself can extend this list.
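The long-line point can be illustrated with a small sketch of breaking UTF-8 text only at character boundaries. This is not the code from the patch: `split_utf8` is a hypothetical helper, and it assumes the input is valid UTF-8 and that the caller inserts the line break between the returned chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Split UTF-8 text into chunks of at most max_bytes bytes without
// cutting through a multi-byte sequence.
std::vector<std::string> split_utf8(const std::string& text,
                                    std::size_t max_bytes) {
    if (max_bytes == 0) max_bytes = 1;  // guarantee forward progress
    std::vector<std::string> chunks;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t limit = std::min(pos + max_bytes, text.size());
        std::size_t end = limit;
        // Back up while the cut would land on a continuation byte.
        while (end > pos && end < text.size() && is_continuation(text[end])) {
            --end;
        }
        // A single character longer than max_bytes: cut at the limit
        // anyway rather than loop forever.
        if (end == pos) end = limit;
        chunks.push_back(text.substr(pos, end - pos));
        pos = end;
    }
    return chunks;
}
```

A byte-oriented splitter that cut every N bytes would, for a two-byte character straddling the limit, leave half of the character on each line, which is the kind of damage described in the last list item.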

Mike Kaganski

Assignee

Comment 80

•

14 years ago

Giuliano, caméléon, thank you for testing the patch!

(In reply to comment #77)
Well, I have never tried to import everything before. Now I have downloaded this tryserver build along with a normal nightly build (http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/thunderbird-3.3a1pre.en-US.win32.zip). When I tried to "Import everything" they both worked OK. I have only one Outlook profile, so I don't know if the problem you described is somehow related to the multi-profile setup. However, as Jorg mentioned, we haven't modified a single line outside the mail import. Everything in the import of addresses and settings is left as it was.

As for the fact that all the mail is marked unread - well, I'm not even sure if it's possible to mark any of the imported mail as read. If it is, please tell me how, and I will try to fix this (insignificant, in my opinion) inconvenience.

Jorg K (CEST = GMT+2)

Comment 81

•

14 years ago

Oops, one thing to be added to the list in comment #79:
- messages with attached messages now get imported properly

As for the imported messages being marked as unread: Go to "Unread Folders" and mark them all as read in one hit. Yes, it's inconvenient, and there is a bug about it, too - bug 219269.

Ludovic Hirlimann [:Usul]

•

14 years ago

(In reply to comment #82) That definitely looks like a bug. Please provide more information about this. Most useful would be information about the Outlook configuration plus a .pst file that contains such messages and is known to import without attachments on the user's machine.

Martijn (MozBrowser.nl)

Comment 86

•

14 years ago

Ludovic, I tried to import from Outlook 2007 SP2. I have not especially checked the import of attached email messages, but at first glance I did not see some of the attachments that were in Outlook. When I now look in other folders I do see some messages with all kinds of different attachments (the regular PDF and XLS, but also .lic and .config). So some of them are imported correctly. I did not see that before, because I checked another folder that has attachments in Outlook, but not in the import in Thunderbird. However, in another folder I do not see a DOCX attachment (= the OOXML format from recent Microsoft Office versions). I think that most of the attachments that I am missing are in the DOCX format.

Martijn (MozBrowser.nl)

Comment 87

•

14 years ago

I am less comfortable with attaching a PST file as this is my corporate email. I think it is best reproduced with sending a message with a DOCX file attached to it to a test mailbox (just to see whether docx attachments should be working fine or whether there's a reason these might fail).

Giuliano Masseroni [:jooliaan]

•

14 years ago

Attached patch Major revision of the mail import (obsolete) (deleted) — Details — Splinter Review

Fixed a bug where some attachments were mistakenly treated as embedded. Martijn, thank you very much for the bug report and the test case! I was wrong about the possible cause of it. Now I hope that this problem will be solved. The code that tells if an attachment is embedded or not has to be improved. I'm not familiar with this, so I simply check for embedded images (i.e. the "src" attribute of <img> tags). However, there may exist other cases of embedded resources (like css or script), or other attributes may be affected. If somebody has testcases of this, or has some documentation covering this, please send it to me so that I can improve this.

Attachment #488995 - Attachment is obsolete: true

Attachment #490035 - Flags: review?(neil)

Attachment #488995 - Flags: review?(neil)

Jorg K (CEST = GMT+2)

Comment 92

•

14 years ago

Yes, it happens in background images ...

<table width="100%" border="0" cellspacing="0" cellpadding="0"><tr valign="top">
<td style="background-image:url(cid:1__=45BBFC26DFED153A8f9e8a93d@slv.vic.gov.au);
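One way to catch references like the one above, regardless of which tag or attribute carries them, is to scan the HTML for the `cid:` scheme itself instead of looking only at `<img src>`. The following is a rough sketch under that assumption. `find_cid_refs` is a hypothetical helper, the set of terminating characters is a guess, the match is case-sensitive, and text such as "acid:" would produce a false positive.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect every Content-ID referenced through a "cid:" URL anywhere in
// the HTML source. A reference ends at whitespace, a quote, a bracket,
// a parenthesis or a semicolon.
std::vector<std::string> find_cid_refs(const std::string& html) {
    std::vector<std::string> ids;
    const std::string stop = " \t\r\n\"'()<>;";
    std::size_t pos = 0;
    while ((pos = html.find("cid:", pos)) != std::string::npos) {
        std::size_t start = pos + 4;  // skip the "cid:" prefix
        std::size_t end = html.find_first_of(stop, start);
        if (end == std::string::npos) end = html.size();
        if (end > start) ids.push_back(html.substr(start, end - start));
        pos = end;
    }
    return ids;
}
```

An attachment whose Content-ID appears in the returned list could then be treated as embedded, whether it is used in an `<img>` tag or in a CSS `background-image`.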

Mike Kaganski

Assignee

Comment 93

•

14 years ago

Attached patch Major revision of the mail import (obsolete) (deleted) — Details — Splinter Review

A complete reimplementation of the handling of embedded attachments. Now the import relies on the information that Outlook provides to decide whether an attachment is embedded or not. Further, the Content-Ids of those attachments are kept intact (previously they were replaced with TB-generated ones). I hope these changes will improve the reliability and quality of the embedded attachments import. But I must say that the logic to decide whether an attachment is embedded or not is based on some undocumented properties, and I need some feedback to see if it has the right to live.

Second, Bug 219269 is partially fixed. Namely, the read state is now retained on import. For this, I adopted the patch proposed by David Bienvenu for Bug 315069 (Attachment #362624). I hope that David will not frown upon it. Thank you, David.

(In reply to comment #92)
Jorg, this patch now creates the proper message structure for those testcases that you sent me. However, TB seems to ignore the embedded attachments in places other than <img>. You saw it yourself when you received such a message directly with TB. This should be filed as a separate bug, what do you think? By the way, it seems that Outlook does the same. At least, I didn't notice any difference between them.

Attachment #490035 - Attachment is obsolete: true

Attachment #490035 - Flags: review?(neil)

neil@parkwaycc.co.uk

Comment 94

•

14 years ago

Given that there is now a system in place for a number of volunteers to test builds including this patch, and that it is now three times its original size, I would just like to point out that I cannot provide much useful input at this point, except possibly to assist in the use of Mozilla-specific constructs. For instance,

bool CMapiMessage::GetTmpFile(nsILocalFile **tmp_file)
{
  nsCOMPtr<nsIFile> _tmp_file;
  ...
  _tmp_file->QueryInterface(nsILocalFile::GetIID(), reinterpret_cast<void**>(tmp_file));
}

[note: no return value!] becomes

bool CMapiMessage::GetTmpFile(nsILocalFile **aResult)
{
  nsCOMPtr<nsIFile> tmpFile;
  ...
  return NS_SUCCEEDED(CallQueryInterface(tmpFile, aResult));
}

although I do wonder why data->tmp_file isn't an nsIFile in the first place.

Mike Kaganski

Assignee

Comment 95

•

14 years ago

(In reply to comment #94) Hi Neil! Thank you for your input! Your assistance in this field would be most useful. I cannot even get close to understanding of the Mozilla-specific constructs, and I'm absolutely sure that I make mistakes often. Please help me with this. If you would also comment on the mistakes (I mean why it is conceptually wrong), I would be most grateful.

Jorg K (CEST = GMT+2)

Comment 96

•

14 years ago

I converted 2 GB of Outlook data using the newest version posted with comment #93. At first glance I found no problems. As a bonus, imported mail is now marked as read. Please get a tryserver build going, so others can test it, too.

Jorg K (CEST = GMT+2)

Comment 97

•

14 years ago

Oops, this is not ready for prime time use yet. Text messages containing the word "From" in the message body are now broken into two parts, since the import no longer inserts the ">" prefix. So the mailbox after the conversion looks like this:

From - Sat, 19 Sep 2009 05:20:31
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Date:Sat, 19 Sep 2009 05:24:36 +0000
From: "JA¶rg Knobloch"
Subject: How are you?
To: "David Morgan"
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit

Hello David,

From the "mailing list" e-mail I sent out the other day regarding the pictures of my latest trip, I received a read receipt from you. That was a good sign. Drop me a line.

Jörg.
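For reference, the usual mbox convention is that a body line beginning with "From " (with the trailing space) is prefixed with ">" so that it is not mistaken for the separator line of the next message. A minimal sketch of that escaping follows. `escape_from_lines` is a hypothetical name, not the function used in the import code, and it implements the simple variant that leaves lines already starting with ">From " untouched.

```cpp
#include <cstddef>
#include <string>

// Prefix every body line that begins with "From " with '>' so an mbox
// reader does not treat it as the start of a new message.
std::string escape_from_lines(const std::string& body) {
    std::string out;
    std::size_t pos = 0;
    while (pos < body.size()) {
        std::size_t nl = body.find('\n', pos);
        std::size_t end = (nl == std::string::npos) ? body.size() : nl + 1;
        if (body.compare(pos, 5, "From ") == 0) out += '>';
        out.append(body, pos, end - pos);  // copy the line unchanged
        pos = end;
    }
    return out;
}
```

Applied to the sample above, the line starting "From the "mailing list" e-mail" would be stored as ">From the ..." and the message would stay in one piece.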

Mike Kaganski

Assignee

Comment 98

•

14 years ago

Comment 104

•

14 years ago

Comment on attachment 491457
Major revision of the mail import

I didn't ask for review of the previous patch, since I made a lot of versions that turned out to require bug fixing. But now the last tryserver build with the last patch seems to have received a generally positive response, and I have no plans to add any new functionality to it. So now I think is the time to start the process of reviewing and the subsequent merge to the trunk. Neil, I ask you to review it (or assign someone for this task), since it's required to get things done. I understand that there are no tests for the module, but as this module is used for only one task that is performed once, maybe it's acceptable to rely on the users' testing? As for the improper use of the Mozilla-specific constructs, I would be thankful if you point to them so that they can be fixed.

Attachment #491457 - Flags: review?(neil)

Mike Kaganski

Assignee

Comment 105

•

14 years ago

Attached patch Major revision of the mail import (obsolete) (deleted) — Details — Splinter Review

Fixed a bug that was introduced by previous patches, that improperly escaped some plaintext attachments and caused such messages to be split to fragments ("ghost messages"). Thanks to Jörg for pointing out this bug.

Attachment #491457 - Attachment is obsolete: true

Attachment #497106 - Flags: review?(neil)

Attachment #491457 - Flags: review?(neil)

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Blocks: 618480

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Blocks: 219269

neil@parkwaycc.co.uk

Comment 106

•

14 years ago

Comment on attachment 497106 [details] [diff] [review] Major revision of the mail import

I didn't try to understand the code. This is just a style review. For instance, there are some lines of the form
if (test) return false;
these should be split onto two lines.

>+ m_headers.Assign(NS_ConvertASCIItoUTF16(pVal->Value.lpszA).get());
CopyASCIItoUTF16(pVal->Value.lpszA, m_headers);

>+ _tmp_file->QueryInterface(nsILocalFile::GetIID(), reinterpret_cast<void**>(tmp_file));
CallQueryInterface(_tmp_file, tmp_file);
Actually I notice that you used this earlier in the file.

>+ nsresult rv = NS_NewFileURI(getter_AddRefs(uri), aFile, /*Is this OK? I added it to optimize calls*/m_pIOService);
It might be necessary if you're running on a background thread. I'm not sure how threadsafe the IO service is.

>+ data->real_name = strdup(NS_ConvertUTF16toUTF8(fname).get());
...
>+ NS_Free( data->real_name);
Although NS_Free currently calls free and therefore this will work, it didn't before, and as far as I know there's no guarantee that this will continue to work in the future. So, please either
a) ensure that you free everything that you strdup, or
b) use methods that allocate using NS_Alloc, for instance
data->real_name = ToNewUTF8String(fname);

>+#define hackWiden2(t) L ## t
>+#define hackWiden(t) hackWiden2(t)
I think we have a macro for this, NS_LL(x)

>+ pOutlookEditor->QueryInterface( NS_GET_IID(nsIEditor), getter_AddRefs(pEditor) );
pEditor = do_QueryObject(pOutlookEditor);

>+ if (m_EmbeddedObjectList == nsnull) {
Could just write if (!m_EmbeddedObjectList) {

>+ NS_IF_ADDREF(m_EmbeddedObjectList);
m_EmbeddedObjectList is an nsCOMPtr which addrefs automatically. But GetEmbeddedObjects needs to NS_IF_ADDREF(*aNodeList).

>+ nsOutlookHTMLImageElement *image = new nsOutlookHTMLImageElement(this, uri, cid, name);
>+
>+ nsCOMPtr<nsIDOMHTMLImageElement> imageNode;
>+ image->QueryInterface( NS_GET_IID(nsIDOMHTMLImageElement), getter_AddRefs(imageNode) );
nsCOMPtr<nsIDOMHTMLImageElement> imageNode = new nsOutlookHTMLImageElement(this, uri, cid, name);

>+ nsCOMPtr<nsOutlookHTMLImageElement> node;
>+ nsresult rv = m_EmbeddedObjectList->QueryElementAt(embedIndex, NS_GET_IID(nsOutlookHTMLImageElement), getter_AddRefs(node));
nsOutlookHTMLImageElement doesn't have an IID. This only compiles because we don't check IIDs strictly. (The templated IID mechanism makes it hard to enforce this. The previous system was enforceable.) Now we know this is a closed system, but I'd prefer a static cast.
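The allocator-pairing point in this review (memory obtained one way should be released by the matching call) can be illustrated outside the Mozilla tree. The following is only a sketch in standard C++; `FreeDeleter`, `CStringPtr` and `DuplicateCString` are hypothetical names chosen for illustration and are not Mozilla APIs. It records the required deallocator in the pointer's type, so that a buffer from `malloc` can only ever be released with `free`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical deleter: releases memory with free(), the counterpart of malloc().
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Owning pointer whose type records which deallocator must be used.
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Copies `s` into a malloc-allocated, NUL-terminated buffer.
// Returns an empty pointer if the allocation fails.
CStringPtr DuplicateCString(const std::string& s) {
  char* buf = static_cast<char*>(std::malloc(s.size() + 1));
  if (!buf) {
    return CStringPtr();
  }
  std::memcpy(buf, s.c_str(), s.size() + 1);  // copies the terminating NUL too
  return CStringPtr(buf);
}
```

With this shape the buffer is released automatically when the `CStringPtr` goes out of scope, and mixing allocators requires an explicit `release()`, which is easy to spot in review.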

Attachment #497106 - Flags: review?(neil) → review-

Mike Kaganski

Assignee

Comment 107

•

14 years ago

Happy new year everyone! Thank you Neil! I'll make these changes as soon as I get to my development PC. I hope they will help me better understand the Mozilla coding style.

Mike Kaganski

Assignee

Comment 108

•

14 years ago

We need a PST file to test it. Either attach a PST file or individual messages in Outlook format (.MSG).

Vincent (caméléon)

Comment 121

•

14 years ago

Attached file import issue with "é" and "è" (obsolete) (deleted) — Details

***PLEASE DELETE THIS ATTACHMENT WHEN JOB IS DONE TO PROTECT PRIVACY ***

Original in Outlook 2007:
Venez découvrir nos produits nouveaux => Le débitmètre massique thermique FS10A de FCI

After Miramar alpha 3 import:
Venez dÃ©couvrir nos produits nouveaux => Le dÃ©bitmÃ¨tre massique thermique FS10A de FCI
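The "Ã©" and "Ã¨" sequences in this report have the shape that results when UTF-8 bytes are read as a single-byte Western encoding and then encoded as UTF-8 a second time. The thread does not confirm that this is the cause in the importer, so the following is only a sketch of that mechanism. `Latin1ToUtf8` and `DoubleEncodedSample` are hypothetical helper names. ISO-8859-1 is used because it agrees with windows-1252 for the bytes involved here (0xC3, 0xA8, 0xA9).

```cpp
#include <cassert>
#include <string>

// Interprets every byte of `bytes` as one ISO-8859-1 character and
// returns the UTF-8 encoding of the resulting text.
std::string Latin1ToUtf8(const std::string& bytes) {
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));  // ASCII is unchanged
    } else {
      // U+0080..U+00FF need two UTF-8 bytes: 110000xx 10xxxxxx
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// "é" in UTF-8 is the byte pair C3 A9. Treating those two bytes as two
// single-byte characters turns them into "Ã" (U+00C3) and "©" (U+00A9).
std::string DoubleEncodedSample() {
  const std::string utf8 = "d\xC3\xA9" "couvrir";  // "découvrir" as UTF-8
  return Latin1ToUtf8(utf8);                        // "dÃ©couvrir" as UTF-8
}
```

The same arithmetic maps "è" (C3 A8) to "Ã¨", which matches the second corrupted word in the report.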

Vincent (caméléon)

Comment 122

•

14 years ago

Attached file 3 messages showing different issues when importing from Outlook 2007 by Miramar alpha3 (deleted) — Details

3 messages showing different issues when importing from Outlook 2007 by Miramar alpha3:
- image not correctly imported
- body of email completely corrupted
- accent issue with "é" and "è"

*** PLEASE DELETE THE ZIP FILE WHEN JOB IS DONE FOR PRIVACY CONCERN ***

Attachment #529065 - Attachment is obsolete: true

Mike Kaganski

Assignee

Comment 123

•

14 years ago

(In reply to comment #119)
Hello Zacki! I'm afraid that Miramar Alpha 3 doesn't include the patches that were suggested here. For some reason this has been postponed indefinitely (at least that is my guess, judging by the recent changes to the state of this bug). I would advise you to check the latest tryserver build containing these patches (it is referenced in Comment #101), but it seems to be unavailable now. I am sure that it would import those messages properly. Your feedback is helpful in the sense that it may remind the developers that the state of import still needs improvement, so maybe they will pay due attention to this bug.

Zacki

Comment 124

•

14 years ago

OK. So I don't know where to find the patch, as the link in comment #101 doesn't work. By the way, do you still need the two examples in Outlook format?

Vincent (caméléon)

•

14 years ago

I think that when this patch finds its way into any official build of TB, they will mention it here somehow; probably they will change the status of the bug to "Fixed in version X". I hoped that this would have happened already, too, but you may check the code of the files (http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/) - it is still unchanged since August 2010, while the last patch here is dated January 2011. If you wish to test this patch, you can only try to request a tryserver build (last time it was Mark Banner who did it for me).

Vincent (caméléon)

Comment 129

•

14 years ago

Unfortunately I have no idea how I can try this patch or what a tryserver build is... Could you explain it step by step for average users like Zacki and me? If it is not too complicated, I believe Zacki will be happy to test this patch, as he has an urgent need to import from Outlook...

Mike Kaganski

Assignee

Comment 130

•

14 years ago

caméléon,

Regarding the tryserver build. You are echoing me :) This is explained above (starting at Comment #60). In essence, you need to ask someone who has the authority to do it (possibly via a personal mail), and if you are successful, this person compiles the patched version and posts a link here. The only other option is to compile it yourself, but this is rather complicated (though not impossible). There is plenty of information on this on the web; a starting point may be here: https://developer.mozilla.org/en/Simple_Thunderbird_build.

Regarding Comment #126. I suppose (and I think that Jörg will agree) that it's OK for everyone to write here about this problem until it is actually fixed in TB. One user opened this bug, some others (including Jörg and me) tried to contribute and came up with the fix, but the work is not finished until it is actually committed to the program. The developers should take it and commit it to the trunk (or choose to reinvent the wheel themselves). Until then, I think that anyone interested ought to raise this issue so that the developers don't forget that it will not disappear by a miracle. On the other hand, of course, you realize that the creators of the proposed patch will not support unpatched programs.

From the normal user's point of view it's not essential to know all these details. He needs his import to go smoothly and its result to be satisfactory. So please go and call to the devs: Hey, we need this! Turn your attention here and just make it work!

David :Bienvenu

•

14 years ago

Yes, it definitely must be done. The only thing I ask is that the notice be kept that I am the original author, so that if I later decide to use this lib in a project under a different (incompatible) license, no legal issue arises. Is this OK?

David :Bienvenu

Comment 135

•

14 years ago

yes, of course, I'll put you as the original author, thx!

David :Bienvenu

Comment 136

•

14 years ago

Attached patch update to trunk, fix whitespace issues (obsolete) (deleted) — Details — Splinter Review

this now builds on the trunk. I fixed a bunch of whitespace issues (tabs, newline issues, etc) and updated the copyright on the rtf files per Mike's last comment. I'm having a bit of a challenge testing the import code because I only have a trial version of Outlook at the moment. I'll try to do a fuller review later today, but the first thing that struck me were the commented out lines, e.g.,

+ // nsOutlookMail::SetDefaultContentType()
+ if (strnicmp(m_mimeContentType.get(), "multipart/", 10) == 0) {

+ //50229 ISO 2022 Traditional Chinese
+ //50930 EBCDIC Japanese (Katakana) Extended
+ //50931 EBCDIC US-Canada and Japanese
+ //50933 EBCDIC Korean Extended and Korean
+ //50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
+ //50936 EBCDIC Simplified Chinese
+ //50937 EBCDIC US-Canada and Traditional Chinese
+ //50939 EBCDIC Japanese (Latin) Extended and Japanese

+ m_msgFlags = CMapiApi::GetLongFromProp( pVal);
+// CMapiApi::MAPIFreeBuffer( pVal); // No need since GetLongFromProp() has a delVal with default of PR_TRUE
+ }
+ pVal = CMapiApi::GetMapiProperty(m_lpMsg, PR_LAST_VERB_EXECUTED);
+ if (pVal) {
+ m_msgLastVerb = CMapiApi::GetLongFromProp( pVal);
+// CMapiApi::MAPIFreeBuffer( pVal); // No need since GetLongFromProp() has a delVal with default of PR_TRUE

We generally don't allow commented out code in patches, because it makes the code hard to maintain. If appropriate, the commented out code should be replaced with comments, or just removed.

Also, #if 0 code - nsOutlookEditor::UpdateEmbeddedImageReference(). I assume this was just cloned from the Eudora editor code - should it just be removed?
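For the code page list quoted above, one possible way to keep the unmapped entries visible without commented-out lines is to store them as data with an empty charset. This is only a sketch and not what the patch does: `CodePageEntry`, `kCodePages` and `CharsetForCodePage` are hypothetical names, the table is a small excerpt, and the three charset names shown (for code pages 1252, 1253 and 65001) are the usual ones for those pages rather than values taken from the patch.

```cpp
#include <cassert>
#include <cstring>

// One row of a code page table. `charset` is nullptr when no charset
// name is known for the code page, so the gap stays visible in the data.
struct CodePageEntry {
  unsigned codePage;
  const char* charset;
  const char* description;
};

// Illustrative excerpt only; the real table in the patch is much longer.
static const CodePageEntry kCodePages[] = {
  {1252,  "windows-1252", "ANSI Latin 1; Western European (Windows)"},
  {1253,  "windows-1253", "ANSI Greek (Windows)"},
  {50229, nullptr,        "ISO 2022 Traditional Chinese"},
  {50930, nullptr,        "EBCDIC Japanese (Katakana) Extended"},
  {65001, "utf-8",        "Unicode (UTF-8)"},
};

// Returns the charset name for `codePage`, or nullptr if the code page
// is unknown or has no charset name in the table.
const char* CharsetForCodePage(unsigned codePage) {
  for (const CodePageEntry& entry : kCodePages) {
    if (entry.codePage == codePage) {
      return entry.charset;
    }
  }
  return nullptr;
}
```

A caller would fall back to a default charset whenever the lookup returns nullptr, and adding a newly identified charset name becomes a one-field change.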

Attachment #504319 - Attachment is obsolete: true

Attachment #529583 - Flags: review?(dbienvenu)

Mike Kaganski

Assignee

Comment 137

•

14 years ago

(In reply to comment #136)
> + // nsOutlookMail::SetDefaultContentType()
> + if (strnicmp(m_mimeContentType.get(), "multipart/", 10) == 0) {
= the next lines came from old nsOutlookMail::SetDefaultContentType()

> + //50229 ISO 2022 Traditional Chinese
> + //50930 EBCDIC Japanese (Katakana) Extended
> + //50931 EBCDIC US-Canada and Japanese
> + //50933 EBCDIC Korean Extended and Korean
> + //50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
> + //50936 EBCDIC Simplified Chinese
> + //50937 EBCDIC US-Canada and Traditional Chinese
> + //50939 EBCDIC Japanese (Latin) Extended and Japanese
As I noted above this block, this list is from a MS document; I left the lines that had no corresponding charset commented out to keep the table as close to the original as possible. Besides, it clearly shows that this table is incomplete, so if someone knows a charset name for those code pages, it is easily added here.

> + m_msgFlags = CMapiApi::GetLongFromProp( pVal);
> +// CMapiApi::MAPIFreeBuffer( pVal); // No need since GetLongFromProp() has a delVal with default of PR_TRUE
> + }
> + pVal = CMapiApi::GetMapiProperty(m_lpMsg, PR_LAST_VERB_EXECUTED);
> + if (pVal) {
> + m_msgLastVerb = CMapiApi::GetLongFromProp( pVal);
> +// CMapiApi::MAPIFreeBuffer( pVal); // No need since GetLongFromProp() has a delVal with default of PR_TRUE
Here I do something I believe to be OK, but as I'm not comfortable with the Mozilla API, I feel safer putting these comments so anyone looking for possible memory leaks will see a possible root of problems.

> #if 0 code - nsOutlookEditor::UpdateEmbeddedImageReference().
> I assume this was just cloned from the Eudora editor code - should it just be
> removed?
No, this isn't cloned from the Eudora editor; it's the first attempt to restore the original cids. It was later replaced with another one that I believe to be more precise. In any case, it may be removed. However, I'd like to explain what it is for.

When a multipart/related message contains embedded objects, they are referenced by their cids. As the editor interface allows for autogenerated cids only, not allowing me to specify a cid of my choice, the resulting code of the imported message differed from the original. Furthermore, these cids may appear in places other than img tags, e.g. in CSS or script sections or in other parts of the multipart/related message. At first I used the Eudora approach, which cares for img tags only, and tried to replace the autogenerated cids after generation of the message. But as it became clear that this approach is too narrow, I decided to drop it, and now my code tries to restore all the cids that may occur in the message so that the message is as close to the original as possible.
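The idea of restoring cids everywhere in the message, not only in img tags, can be sketched as a plain text substitution. This is an illustration under stated assumptions and not the code from the patch: `CidMapping` and `RestoreOriginalCids` are hypothetical names, references are assumed to have the form `cid:<value>`, and no generated cid is assumed to be a prefix of another one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One embedded object: the cid the editor generated for it and the cid
// it carried in the original message.
struct CidMapping {
  std::string generated;
  std::string original;
};

// Replaces every "cid:<generated>" reference in `text` with
// "cid:<original>", wherever it occurs (img tags, style or script
// sections, other parts). Assumes no generated cid is a prefix of another.
std::string RestoreOriginalCids(std::string text,
                                const std::vector<CidMapping>& mappings) {
  for (const CidMapping& m : mappings) {
    if (m.generated.empty()) {
      continue;  // nothing sensible to search for
    }
    const std::string from = "cid:" + m.generated;
    const std::string to = "cid:" + m.original;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
      text.replace(pos, from.size(), to);
      pos += to.size();  // continue after the inserted text
    }
  }
  return text;
}
```

Because the substitution works on the serialized message rather than on parsed img elements, a reference inside a style block is rewritten the same way as one in an img tag.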

David :Bienvenu

Comment 138

•

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Blocks: 583490

David :Bienvenu

Comment 146

•

14 years ago

•

14 years ago

(In reply to comment #151)
> 1. I tried to make it clear that this function is just a replacement for a
> deficient other (that may become corrected at some time, or its variant may
> appear, so that the only thing that will be needed in that case is to
> replace my function with that proper one).

Mike, thx for looking at the change. I left your comment in, so I think it's still clear that you're replacing some deficient functionality. But it's an excellent working assumption that the other function will not be fixed. Making the method private, as you say, makes it clear that it's a helper function. And if the other function is fixed, it's easy to switch back to it - we have excellent tools for tracking those kinds of changes.

David :Bienvenu

Comment 153

•

14 years ago

try server build here - The full log for this test run is available at http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-e4866a647ff0 My plan is to land this on the trunk after we branch for Miramar, so the changes would be in the release 6 weeks after Miramar.

Mike Kaganski

Assignee

Comment 154

•

14 years ago

OK, I've downloaded and tested this build against my test cases. This time I used Outlook 2010 for testing, and as far as I can tell, everything is fine. So now this importer is tested to work with Outlook 2003, 2007 and 2010. Hope to hear about other (older) versions, too.

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Assignee: smontagu → mikekaganski

David :Bienvenu

Comment 155

•

14 years ago

fixed on trunk, with one more tweak Mike e-mailed me - changeset http://hg.mozilla.org/comm-central/rev/90c3929c5b5d. This fix will be in the release after Miramar

Status: NEW → RESOLVED

Closed: 19 years ago → 14 years ago

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 3.4

David :Bienvenu

Comment 156

•

14 years ago

Attached patch stop using stdlib::locale (obsolete) (deleted) — Details — Splinter Review

this caused at least one person to have a build issue, and we don't tend to use the stdlib stuff much, and I think plain old isalpha is fine for rtf.

Philip Chee

Comment 157

•

14 years ago

Comment on attachment 533280 [details] [diff] [review] stop using stdlib::locale With this patch I don't get the link error and can compile successfully. The application starts up without error but I can't test the import functionality because I don't have Outlook installed.

Attachment #533280 - Flags: feedback+

David :Bienvenu

Comment 158

•

•

14 years ago

Attached patch use our own macro's (obsolete) (deleted) — Details — Splinter Review

Attachment #533280 - Attachment is obsolete: true

Attachment #533457 - Flags: review?(mikekaganski)

Attachment #533280 - Flags: review?(mikekaganski)

Mike Kaganski

Assignee

Comment 165

•

14 years ago

Please modify the macros this way:

#define IS_DIGIT(i) (((i) >= '0') && ((i) <= '9'))
#define IS_ALPHA(i) ((((i) >= 'A') && ((i) <= 'Z')) || (((i) >= 'a') && ((i) <= 'z')))

This will literally follow these statements from the MS RTF Spec:

"<letter> a..z | A..Z
<control name> <letter>+
<digit> 0..9
<parameter> '-'? <digit>+
<control word entity> '\' <control name><parameter>?"
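To show how the grammar quoted above drives the character checks, here is a small sketch of a control word reader. It is not the parser from the patch: `IsRtfLetter`, `IsRtfDigit`, `ControlWord` and `ParseControlWord` are hypothetical names, and the nine-digit limit on parameters is an assumption added here to avoid integer overflow. The checks are explicit ASCII range tests, so they do not depend on the C or C++ locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Locale-independent checks, following the quoted grammar.
inline bool IsRtfLetter(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
inline bool IsRtfDigit(char c) { return c >= '0' && c <= '9'; }

struct ControlWord {
  std::string name;        // <control name>
  bool hasParam = false;   // whether a <parameter> was present
  int param = 0;           // parameter value, valid if hasParam
  std::size_t length = 0;  // characters consumed, including the backslash
};

// Tries to read  '\' <letter>+ ('-'? <digit>+)?  starting at text[pos].
// Returns false if no control word starts there.
bool ParseControlWord(const std::string& text, std::size_t pos,
                      ControlWord& out) {
  if (pos >= text.size() || text[pos] != '\\') return false;
  std::size_t i = pos + 1;
  const std::size_t nameStart = i;
  while (i < text.size() && IsRtfLetter(text[i])) ++i;
  if (i == nameStart) return false;  // e.g. "\{" is not a control word

  ControlWord cw;
  cw.name = text.substr(nameStart, i - nameStart);

  // Optional parameter: a '-' only counts if digits follow it.
  std::size_t j = i;
  bool negative = false;
  if (j < text.size() && text[j] == '-') {
    negative = true;
    ++j;
  }
  const std::size_t digitsStart = j;
  int value = 0;
  while (j < text.size() && IsRtfDigit(text[j])) {
    if (j - digitsStart >= 9) return false;  // assumed limit, avoids overflow
    value = value * 10 + (text[j] - '0');
    ++j;
  }
  if (j > digitsStart) {
    cw.hasParam = true;
    cw.param = negative ? -value : value;
    i = j;
  }
  cw.length = i - pos;
  out = cw;
  return true;
}
```

For example, reading `\fs24` yields the name `fs` with parameter 24, while `\par` yields a name with no parameter.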

David :Bienvenu

Comment 167

•

14 years ago

Attached patch use explicit char checks - checked in (deleted) — Details — Splinter Review

this uses explicit character checks...

Attachment #533457 - Attachment is obsolete: true

Attachment #534752 - Flags: review?(mikekaganski)

Attachment #533457 - Flags: review?(mikekaganski)

Mike Kaganski

Assignee

•

12 years ago

This is marked resolved, but I am seeing similar errors in importing from Outlook 2003 using Thunderbird 17.0.2

Wayne Mery (:wsmwk)

Comment 177

•

12 years ago

(In reply to km from comment #176)
> This is marked resolved, but I am seeing similar errors in importing from
> Outlook 2003 using Thunderbird 17.0.2

Please file a separate follow-up bug so we can track that separately.

Two e-mails illustrating the bug, in Outlook PST and from Mozilla 22 years ago Peter Kirk (deleted), application/x-stuffit		Details
Totally reworked retrieval of the body, it is now retrieved in Unicode. Fixes to body type handling. Fixes to original charset guessing. 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Message with attached messages for test (comment #82) 14 years ago Jorg K (CEST = GMT+2) (deleted), application/zip		Details
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch	neil : review-	Details \| Diff \| Splinter Review
Major revision of the mail import 14 years ago Mike Kaganski (deleted), patch		Details \| Diff \| Splinter Review
Mail of Outlook 2007 not correctly imported on Miramer Alpha 3 (text only) 14 years ago Zacki (deleted), text/plain		Details
Mail of Outlook 2007 not correctly imported on Miramer Alpha 3 (images) 14 years ago Zacki (deleted), text/plain		Details
import issue with "é" and "è" 14 years ago Vincent (caméléon) (deleted), application/x-ole-storage		Details
3 messages showing different issues when importing from Outlook 2007 by Miramar alpha3 14 years ago Vincent (caméléon) (deleted), application/zip		Details
The three messages in question imported into TB 14 years ago Jorg K (CEST = GMT+2) (deleted), application/octet-stream		Details
update to trunk, fix whitespace issues 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
tweak the comments. 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
remove more #if 0 code 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
more cleanup 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
yet more cleanup 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
stop using stdlib::locale 14 years ago David :Bienvenu (deleted), patch	philip.chee : feedback+	Details \| Diff \| Splinter Review
use our own macro's 14 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
use explicit char checks - checked in 14 years ago David :Bienvenu (deleted), patch	standard8 : review+	Details \| Diff \| Splinter Review