454359 - confusing message with "quarantine option" disabled if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete.

Reporter

Description

•

16 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.15) Gecko/20080702 Ubuntu/8.04 (hardy) Firefox/2.0.0.15 Kazehakase/0.5.4 Build Identifier: confusable message if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete. If "quarantine option" is disabled, AppendMsgFromFile in nsPop3Sink::IncorporateComplete() fails only if memory deficit or writing data to Inbox so the error message "This message may contain a virus or there is not enough disk space. Skip this message?" is confusable because the error is not concern to temporary file(i.e. the message). Moreover, if user press OK button on the message dialog, the message will be never received even if the message does not have any virus codes. Reproducible: Always Steps to Reproduce: 1. 2. 3.

Hiroyuki Ikezoe (:hiro)

Reporter

Comment 1

•

16 years ago

Attached patch proposed patch (obsolete) (deleted) — Details — Splinter Review

Just return the original error code.

Attachment #337607 - Flags: superreview?(bienvenu)

Attachment #337607 - Flags: review?(bienvenu)

Hiroyuki Ikezoe (:hiro)

Reporter

Comment 2

•

16 years ago

(In reply to comment #0) > If "quarantine option" is disabled, AppendMsgFromFile in I meant If "quarantine option" is enabled.

Martijn Wargers (dead)

Updated

•

16 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

David :Bienvenu

Comment 3

•

16 years ago

I'm not sure that's true - a virus checker might prevent us from reading the temp file, in which case AppendMsgFromFile would fail.

Hiroyuki Ikezoe (:hiro)

Reporter

Comment 4

•

16 years ago

Ah, I see. AppendMsgFromFile should return error code if reading fails. I will revise the patch.

Hiroyuki Ikezoe (:hiro)

Reporter

Comment 5

•

16 years ago

Attached patch revised patch (deleted) — Details — Splinter Review

AppendMsgFromFile returns NS_ERROR_FAILURE if reading from file fails. Caller of AppendMsgFromFile (in this case, IncorporateComplete) returns NS_MSG_ERROR_WRITING_MAIL_FOLDER if the error code of AppendMsgFromFile is NS_MSG_ERROR_WRITING_MAIL_FOLDER otherwise call HandleTempDownloadFailed.

Attachment #337607 - Attachment is obsolete: true

Attachment #337787 - Flags: superreview?(bienvenu)

Attachment #337787 - Flags: review?(bienvenu)

Attachment #337607 - Flags: superreview?(bienvenu)

Attachment #337607 - Flags: review?(bienvenu)

Magnus Melin [:mkmelin]

Updated

•

16 years ago

Assignee: nobody → poincare

Component: General → Backend

Product: Thunderbird → MailNews Core

QA Contact: general → backend

David :Bienvenu

Comment 6

•

14 years ago

Comment on attachment 337787 [details] [diff] [review] revised patch sorry for the delay - we have to treat errors both writing to and reading from the temporary quarantine file as potentially caused by a virus checkers, and I believe this patch breaks that. Do you have a particular reproducible case of an error that you'd like to fix?

Attachment #337787 - Flags: superreview?(bienvenu)

Attachment #337787 - Flags: superreview-

Attachment #337787 - Flags: review?(bienvenu)

Attachment #337787 - Flags: review-

Wayne Mery (:wsmwk)

Updated

•

13 years ago

Assignee: hiikezoe → nobody

Component: Backend → Networking: POP

QA Contact: backend → networking.pop

Summary: confusable message if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete. → confusing message with "quarantine option" disabled if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete.

Wayne Mery (:wsmwk)

Updated

•

13 years ago

Assignee: nobody → hiikezoe

Wayne Mery (:wsmwk)

Comment 7

•

13 years ago

Hiroyuki Ikezoe seems to be gone

Whiteboard: [patchlove][has draft patch][needs new assignee]

Wayne Mery (:wsmwk)

Comment 8

•

10 years ago

ishikawa, might you be able to finish the patch?

Flags: needinfo?(ishikawa)

ISHIKAWA, Chiaki

Comment 10

•

10 years ago

(In reply to Wayne Mery (:wsmwk) from comment #8) > ishikawa, might you be able to finish the patch? Let me look at this.

Flags: needinfo?(ishikawa)

:aceman

Comment 11

•

10 years ago

The problem here is that it is not sure whether the proposed change is actually wanted. Bienvenu says all writes or reads can fail due to the antivirus checkers. Hiro wants to detect failures that are caused by other reasons. But it is not sure if that is possible. So yes, the message the user gets may be confusing but maybe we can't do better.

Assignee: hiikezoe → nobody

Wayne Mery (:wsmwk)

Comment 12

•

10 years ago

It does feel like we should do something different. I'm not awake enough ATM to suggest what.

ISHIKAWA, Chiaki

Comment 13

•

10 years ago

(In reply to ISHIKAWA, Chiaki from comment #10) > (In reply to Wayne Mery (:wsmwk) from comment #8) > > ishikawa, might you be able to finish the patch? > > Let me look at this. Many Japanese organizations including schools, government offices have a fiscal year that starts in April. That means March is the end of fiscal year and people are busy doing the end of the final report, accounting, etc. My office also was quite busy in March. Only now, with the message of Wayne, I had the chance to look at this. Now, today I look at this, and I realize I have been working on this part of code for some time to make detection of I/O errors better. I started to work on this because I noticed the lack of error code checking of so many I/O routines, read/write/flush/close, etc. Awful :-( I have discovered that the changes were necessary to detect the errors properly so that I can enable buffering without fear of possible regression errors that may corrupt message folder. Today, pop3 download does not use bufering and so downloading 1-2MB mime-encoded file is done by ~70 octets per line WITHOUT I/O buffering at all: each line invokes WRITE of such smallish octets, quite awful in terms of I/O performance. If the user's profile is on a remote server (as in a diskless corporate setting, it kills performance very badly. Obviously from a few reports of imap users do not seem to handle network error gracefully at all.) I have tried to - detect errors better: current read/write calls often fail to check errors well. Also, checking on the return value of Close() are missing in many places. Once we enable buffering, Close() is where the error from the final flush(), such as network error, file system overflow error, etc. are reported. - Once the errors are detected, what are the good error recover strategies? At least for a major issue of closing file streams related to the failure of return value of Close() I mention above, I have at least tried to sanitize major use of file streams, and figured that a few routines need to changed semantically. - There are other places where I am not entirely sure of what should be done. - The changes I made are not so significant in terms of code complexity, but numerous and scattered all over the place related to pop3 and imap and file handling of messages. Interestingly, I found that the checking of file stream being null or not is not quite correct for this case of saving a message to temporary file which is a major topic of Hiro's patch. This is nsPop3Sink.cpp 775 if (m_downloadingToTempFile) 776 { 777 // close file to give virus checkers a chance to do their thing... 778 m_outFileStream->Flush(); 779 m_outFileStream->Close(); 780 m_newMailParser->FinishHeader(); 781 // need to re-open the inbox file stream. 782 bool exists; 783 m_tmpDownloadFile->Exists(&exists); 784 if (!exists) 785 return HandleTempDownloadFailed(aMsgWindow); 786 787 nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream); 788 rv = MsgReopenFileStream(m_tmpDownloadFile, inboxInputStream); 789 NS_ENSURE_SUCCESS(rv, HandleTempDownloadFailed(aMsgWindow)); 790 if (m_outFileStream) 791 { 792 int64_t tmpDownloadFileSize; 793 uint32_t msgSize; 794 hdr->GetMessageSize(&msgSize); 795 // we need to clone because nsLocalFileUnix caches its stat result, 796 // so it doesn't realize the file has changed size. 797 nsCOMPtr <nsIFile> tmpClone; 798 rv = m_tmpDownloadFile->Clone(getter_AddRefs(tmpClone)); 799 NS_ENSURE_SUCCESS(rv, rv); 800 tmpClone->GetFileSize(&tmpDownloadFileSize); 801 802 if (msgSize > tmpDownloadFileSize) 803 rv = NS_MSG_ERROR_WRITING_MAIL_FOLDER; 804 else 805 rv = m_newMailParser->AppendMsgFromStream(inboxInputStream, hdr, 806 msgSize, m_folder); 807 if (NS_FAILED(rv)) 808 return HandleTempDownloadFailed(aMsgWindow); 809 810 After a download to a temporary file, TB closes the file (and this is the timing when anti-virus checker of the yesteryear may decide to delete the whole file: thus the necessity of having this option of saving to a temprary file. Otherwise, scuh anti-virus program removes WHOLE INBOX!). TB then re-opens the file now as read source using a different filestream. There is a bug. See line 790 above. http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsPop3Sink.cpp#790 TB checks for the null-ness of the original output filestream instead of the newly created input filestream which may be nullptr if anti-virus program deleted the temporary file. It took me a full couple of months of looking at the code and try to see what it was supposed to do. The logic dictates we should check for the inputstream, but the code is a little convoluted especially switching of writing to reading. It was not quite clear what TB wished to perform. But just last week, I was checking the affected code (yeah, I am doing the review of the changes I made to a few files before final patch proposal.) It became crystal clear what it was trying to do and why the check was incorrect. So there can be a few improvement in this area definitely. I have the patches ready. - the proper error checks where I think the check is needed for I/O errors during downloading. (I think the read error check in Hiro's is also mine, but in a slightly different form. I was not concerned with the appropriateness of the error code, but uniform error handling code/style, etc.) - |Close| were properly checked for return values. - SOME new error handlings were introduced. - Coupled with |Close| check, and new error handling, I found issues of where filestreams ought to be cleared and made an improvement. - But in many places where errors are detected now (where they were not before), I only inserted a DEBUG-eanbled message for debugging only and return error code. That is, I am not sure what the proper error recovery was. *BUT*, do note that it is already a great improvement and detect errors as such. Also, the overall framework of TB message handling seems to guarantee that, on the next restart of TB, the improper partial message is removed, etc. - I also inserted a line of code to enable buffering, and removed an offending Seek() which should not be there IMHO. The patches have been extensively tested with pop3 and buffering for pop3 download. (imap testing still not good.) I used my local mail profile and remote profile on a linux CIFS server, NFS, and even a CIFS server available on a small router: a USB-stick memory now holds a mail directory for test mail account. What I did was to simulate network outage by pulling the cable off (actually tested TB ran inside linux guest inside VirtualBox, and so toggling virtual network card did it.) I noticed a few corner cases and fixed them. Thus, I could test the network outage cases rather well, and as far as pop3 is concerned, I am ready with the patch. The patches and descriptions are as follows. Meta bug is Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. Step 1 is addressed by Bug 1122698. Step 2 is addressed by Bug 1134527. Step 3 is addressed by Bug 1134529. Bug 1122698 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-1) Bug 1134527 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-2) Bug 1134529 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-3) I am afraid that I have not uploaded the bug fix of the inboxInputStream issue mentioned above. Also, after these months of effors, I think it would be good idea to merge the changes into a single patch set. It really does not make much sense to separate them as I did any more. I need a few cleanups definitely, but the local patches have been tested and running locally (under linux 64-bit build.) But like I said the subtle changes are everywhere, and due to the lack of better understanding of error recovery strategy which are clearly missing today [after all, before my changes I/O errors were not caught properly so there were no needs for error recover before :-( ] So I was hesitant to push the changes into the tree until the release in May takes place. Error recovery strategy can wait. After all TB has worked without good error recover all the time. However, regression that was noticed before when buffering was enabled an issue. This has blocked my pushing the patch because I could not (or did not have the time to) create an xpctest or mozmill test to simulate a heavy download I/O cases that seemed to trigger a message corruption when buffering was enabled before. My manually simulated test cases send a few dozen e-mails with particular heard strings to a test account. TB downloads these n e-mails that are sorted into a few different holders by filters, AND while that is happening, I try to move/copy a dozen or so messages during the download from a folder to another folder that are involved in filtering. Enabling buffering caused a regression before when it was attempted, and I think such a test that seems to simulate the busted cases back then would be an important addition. (See some discussions in Bug 1116055 - Performance issue: Failure to use buffered write (comm-central thunderbird if someone is interested in this topic.) At least a simple non-disturbed performance of TB under such simulation has to be in the test case. (Disturbing the operation by simulated network I/O error is too much to ask today, I think.) BTW, why the error about incorrect checking of null-ness of wrong filestream (output stream before closing the temporary download vs input stream to read from that temporary file, which the AV program may delete completely.) is not noticed now is probably today's AV software becomes aware of fiasco of deleting the whole Inbox file of mail programs and careful not to do that, etc. I am not sure. TIA PS: In parallel to think of writing the test for heavy I/O that caused regression errors when buffering was enabled, or tyring to have someone write it :-) I was doing the code audit of my change. This is the final step. 1st round done: mailnews/local/src/nsPop3Sink.cpp half finished: mailnews/local/src/nsPop3Service.cpp mailnews/local/src/nsPop3Protocol.cpp mailnews/local/src/nsMovemailService.cpp mailnews/local/src/nsParseMailbox.cpp mailnews/local/src/nsLocalMailFolder.cpp mailnews/local/src/nsMailboxService.cpp IMAP ... You can see that maybe I would have to wait for the next round of release(?) Anyway, just last week I was looking at these files and was convinced that the inputstream and outputstream usage is incorrect in one if-statement and if it works for everybody today, it is because AV programs have wised up and not delete the temporary file but probably mark it and only when its content is executable do something clever about it when it is going to be executed, I suppose. PPS: Another reason I was not so sure of is the error recovery issues. There does not seem to exist an overall description of such strategy for TB. Ok, people may say, I have to make sure that I keep the database(s) intact and roll back them to the previous states after an error. But WHAT databases exist in TB? I found a neat summary in a post, but can't find it. Grrrr :-( Is there a good documentation?

Kent James (:rkent)

Comment 14

•

10 years ago

"Ok, people may say, I have to make sure that I keep the database(s) intact and roll back them to the previous states after an error. But WHAT databases exist in TB? I found a neat summary in a post, but can't find it. Grrrr :-( Is there a good documentation?" The main databases are: 1) the mork message summary file 2) panacea.dat for the folder information 3) gloda for global searching But in addition to these, there are myriad other local variables, caching schemes, etc. that cause issues. Plus, the variables often have conflated meanings. Example: size of database file is used for both information display to the user, as well as a flag of file validity. Update requirements for these two uses are very different. And don't get me started on the New flag conflation issues! It is all of this legacy stuff that is part of the "technical debt" that makes Thunderbird code so hard to work on.

ISHIKAWA, Chiaki

Comment 15

•

10 years ago

(In reply to Kent James (:rkent) from comment #14) > "Ok, people may say, I have to make sure that I keep the database(s) intact > and > roll back them to the previous states after an error. > But WHAT databases exist in TB? > I found a neat summary in a post, but can't find it. Grrrr :-( > Is there a good documentation?" > > The main databases are: Thank you for the pointers. > > 1) the mork message summary file > 2) panacea.dat for the folder information > 3) gloda for global searching > > But in addition to these, there are myriad other local variables, caching > schemes, etc. that cause issues. I noticed this caching scheme that is not quite well updated sometimes. Someone has written down a very summary about the database in a dozen lines or so for my purpose. I think it is by WADA. I wonder if he is reading this. I found it in a thread (the particular comment was posted some time ago) in a bugzilla entry to which I am a subscriber, and which had a new comment in the last 10 days or so. (Oh well, I wish computers are clever enough to find the comment with this type of vague description. The hitch is that WADAS's comment was in the thread and posted some time ago. It could be even a year ago or so.) > Plus, the variables often have conflated > meanings. Example: size of database file is used for both information > display to the user, as well as a flag of file validity. Update requirements > for these two uses are very different. And don't get me started on the New > flag conflation issues! Thank you for reminding me ! :-) > > It is all of this legacy stuff that is part of the "technical debt" that > makes Thunderbird code so hard to work on. Yes, I agree. The debt is huge. I really feel as I were trying to chip on the edge of a glacier when I try to change something in TB. I will try to merge some local patches for hardening the error handling (maybe not so much for making the error message user-friendlier as Hiro's original intention here was) into four big patches, and make sure they build and pass |make mozmill| and |make xpcshell-test| The patches would be in Bug 1122698 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-1) Bug 1134527 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-2) Bug 1134529 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-3) And then another patch for the removal of |Seek| that interferes with true buffering write and a few other error check for imap-related code. [I may decide to merge all of them after a few feedback cycles with reviewers. The idea of decomposition is to make sure that it is easy to see that I am changing the code for adding checks for error return of close/flush, etc., then adding checks for read/write, etc, and then adding a few error handling that became necessary like properly closing filestream to Inbox for pop3 download and cosmetic changes and leaving comment for future work. But once each real change is checked and verified, it may be easier to merge them and keep them as a big patch before eventual inclusion in the tree to avoid bitrot.] Oh yes, I need to have the automated test for the heavy I/O to simulate the frequent download with filtering to a multiple folders, and moving/copying of messages at the same time to make sure that the messages folders do not get corrupted. I have done manual testing, but I do need automated testing. Testing manually takes time and boring although it certainly makes me aware of some issues like "downloading can run only in main thread because a few databases are not thread-safe, thus GUI-operaton suffers a lot during downloading!", and a few other performance issues, which when someone is forced to work with slow operations is bound to become aware of. TIA PS: BTW, Hiro's patch was concerned with 0-byte read error. With a remote file system, a read may be interrupted and can return 0 octet read, but with |errno| being EINTR (or something) that indicates that we should re-try read since the error is temporary. This has not been addressed very well in my code even. This issue of short read/write should be addressed. OTOH, HOWEVER, after a couple of months testing, it looks that the ->Read(), ->Write used in message download code do not seem to pass the low-level error code as such. It seems the issue is either taken care of inside the particular I/O library they use and we don't have to face the issue (or testing is not complete. But causing it in real-world is rather difficult. We may need the LD_PRELOAD type of fake I/O routine to cause such short read/write intentionally to check the processing.). But I can certainly say that routines, which are defined under ./mozilla M-C portion of the tree, that are called from the download code do seem to suffer from the lack of EINTR-like error handling for retry. Well, at least the versions of functions defined for POSIX (linux, mac os x, solaris, maybe.) to move/copy files DO suffer from the lack of EINTR-like processing. I figured it out because there was a report that there were I/O errors when a user profile (with Mail directory) is in a remote CIFS mount and the user runs linux TB client. The user used imap. However, there were no such errors with Windows TB client with the same remote server. Now I realize the routines to move/copy a message file after download of a message is finished (move/copy may be invoked by filtering or after download is performed to a temporary file instead of Inbox directly) is implemented in OS-dependent library files. Basically a SINGLE Win32 system call is used for move/copy under Windows. Yes a single system call handles move a file to somewhere (even to a remote file system). It seems these WIN32 system calls handle short read/write, the network errors and retries under the hood. So no I/O errors are experienced by Windows TB user with CIFS-mounted user profile and Mail. Great. Whereas the routines for POSIX OS to move/copy files were handcoded AND THEY LACK EINTR processing. I know. I fixed the move file routine for error handling a year or so ago, and wondered aloud why EINTR was not handled in the code. It still is not, I think. So if excessive number of short read/write occurs in a corporate network. linux TB is toast in such environment. The problem of linux TB failing miserably with CIFS-mounted user profile and mail directory lies there. With 20/20 hindsight it may be crystal clear. But it took me a while to realize the true cause of the issue: I needed to fix the error handling issues in the download code first, and only then I came to realize the cause of the issue experienced by linux TB user with CIFS-mounted user profiel and mail directory. That windows users don't experience the issue was a clue in the first place, but even Windows users will suffer from the deficiency in the error handling framework in the message download code. So my work was not a complete waste, but actually very important to make TB more robust in the face of failures. Whew. All I want is a rock solid mail client (tm) as a mere mortal user. It is a long hard climb to reach the nirvana, eh?

Wayne Mery (:wsmwk)

Comment 16

•

9 years ago

so we should revisit this after Chiaki's patches are done for the issues blocking bug 1121842

Depends on: 1121842

Whiteboard: [patchlove][has draft patch][needs new assignee] → [patchlove]

ISHIKAWA, Chiaki

Comment 17

•

9 years ago

(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #16) > so we should revisit this after Chiaki's patches are done for the issues > blocking bug 1121842 Thank you for reminding. I am trying to build OS X and Windows versions on TryServer and making a small progress now.

Wayne Mery (:wsmwk)

Updated

•

7 years ago

Blocks: 944923, 1150346, 1416662

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Wayne Mery (:wsmwk)

Updated

•

2 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=504291

proposed patch 16 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch		Details \| Diff \| Splinter Review
revised patch 16 years ago Hiroyuki Ikezoe (:hiro) (deleted), patch	Bienvenu : review- Bienvenu : superreview-	Details \| Diff \| Splinter Review