Closed Bug 1116055 Opened 10 years ago Closed 8 years ago

Performance issue: Failure to use buffered write - Part 1: Collect file seek position information.

Categories

(MailNews Core :: Database, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 52.0

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Blocks 8 open bugs)

Details

(Keywords: perf)

Attachments

(4 files, 21 obsolete files)

(deleted), image/png
Details
(deleted), application/gzip
Details
(deleted), application/gzip
Details
(deleted), patch
aceman
: review+
Details | Diff | Splinter Review
I have spotted a very conspicuous offender of not using buffered output where it can in comm-central thunderbird. I think as long as a buffer of reasonable size, even if it is 4KB or 16KB, is used, I/O performance will be much better and reasonably good. (especially for a remote file system, or imap case.) It does not have to be 128KB or much larger something, I think, although it would help for a local disk write and read. However, here I report that I have found a routine is copying message data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY LINE. The culprit routine I found is NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) in the file comm-central/mailnews/local/src/nsLocalMailFolder.cpp And the problematic unbuffered write in question is http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111 2111 rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); This does not seem to use buffering at all. (Please check the excerpt of system call trace from strace under linux attached in the following.). I agree that, since this function seems to be in need of scanning the data for "From" at the beginning, and escaping "From" at the beginning of line, etc., it needs to SCAN the INPUT line by line, but there is NO REASON AT ALL TO USE UNBUFFERED *OUTPUT*! Repeat-by: Monitor and record the system calls issued by TB. I used strace under linux. strace -o /tmp/t.out TB-BINARY Start TB, and then copy a message with reasonably large binary attachment in one folder to another folder. I found the repetition of small write(fd, bug, len) with length parameter 74 MANY times. E.g.: Excerpt from a typical strace record. I only excerpted the interesting lines. There are many other system calls. The particular message I copied had an attachment of 1.2MB size. The repeated writes() have the first parameter (file descriptor) as 54. I searched backward in trace file and found that FD 54 is for the target folder. (folder-C) ... open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54 ... Extraction of the initial part of the writes to this file descriptor: (I think I created the message using HTML format mail. That is why you see a few lines of HTML main message body. But that is not of any concern. Concern is the long repetition of mime data that follows.) ... write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29 .... I omit clock_gettime() below ... write(54, "X-Mozilla-Keys: $label3 "..., 97) = 97 write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44 write(54, "X-Identity-Key: id1\r\n", 21) = 21 write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44 write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39 write(54, "From: ishikawa <ishikawa@example"..., 39) = 39 write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70 write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49 write(54, "MIME-Version: 1.0\r\n", 19) = 19 write(54, "To: nobody@example.com\r\n", 24) = 24 write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28 write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32 write(54, " boundary=\"------------000003080"..., 50) = 50 write(54, "\r\n", 2) = 2 write(54, "This is a multi-part message in "..., 46) = 46 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: text/html; charset"..., 45) = 45 write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33 write(54, "\r\n", 2) = 2 write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65 write(54, "<html>\r\n", 8) = 8 write(54, "<head>\r\n", 8) = 8 write(54, "</head>\r\n", 9) = 9 write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41 write(54, "<br>\r\n", 6) = 6 write(54, "<br>\r\n", 6) = 6 write(54, "</body>\r\n", 9) = 9 write(54, "</html>\r\n", 9) = 9 write(54, "\r\n", 2) = 2 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: application/octet-"..., 41) = 41 write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21 write(54, "Content-Transfer-Encoding: base6"..., 35) = 35 write(54, "Content-Disposition: attachment;"..., 34) = 34 write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25 write(54, "\r\n", 2) = 2 write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74 write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74 write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74 write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74 ... MANY lines of write with 74 octets.... write(54, "--------------000003080301040107"..., 42) = 42 write(54, "\r\n", 2) = 2 close(54) = 0 Observation: For this message with 1.2MB attachment, I counted 24016 write calls. (fgrep "write(54," /tmp/t.out | wc) Granted that the routine seems to be doing something about seeing "From", escaping "From" with ">From", etc., but it has *NO* reason to use UNBUFFERED WRITE here. It can buffer the output and then finally at the end of the loop it can flush or something. HOW TO FIX? I am not familiar with I/O routine inside mozilla code base. I am not sure where this file stream is opened [See the stack backtrace at the end.] and why it is not using buffered mode. rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); Is there a way to set the buffer-mode to m_fileStream inside CopyData (just in case there are multiple wayward callers)? Or can we simply replace this Write with a buffered output version, but what that function would be? [The stack trace is shown below for those who want to chase what is calling this function.] The above is for copying an existing message to another folder. (So it can happen offline.) When I look up CopyData in mxr database, I found another instance of CopyData (for imap) and that seems to have the same unbuffered output issue. (Or at least superficially the code looks the same. Unless "->Write" semantics is different (i.e., unbuffered vs buffered) we have the crawling I/O performance when we copy / download large message (when we write to a local folder) when imap is used, too! MXR: E.g. Other place(s) where CopyData was found: I searched the identifier CopyData using mxr: MXR is slightly buggy, I think: I only left mail/mailnews-related entries. Defined as a function in: mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations) line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This function, CopyData calls another function CopyDataToOutputStreamForAppend() and in it, we have 3303 { 3304 rv = outputStream->Write(start, 3305 end-start, 3306 &writeCount); 3307 rv = outputStream->Write(CRLF, 2, &writeCount); 3308 } We should make sure that these writes are buffered! (and flush at the end of the loop/return.) From what I read before about the slowness of I/O in a comment posted by IMAP user, probably these are NOT buffered, either. AND, we SHOULD CHECK the return value rv: there is NO ERROR checking here !? No wonder there are reports of IMAP users losing messages, etc. in the past? [Added comment: even if these UNBUFFERED writes failed, close will succeed! So checking here is very important! ] Defined as a function prototype in: mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg annotations) line 20 -- void copyData(in nsIInputStream aIStream, in long aLength); Defined as a variable in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize); line 2026 -- NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This is the function I found out about and report here. The questions I have are: Is there a way to set the buffer-mode to m_fileStream inside CopyDate (just in case there are other wayward callers)? Or can we simply replace this Write with a buffered output version? See Related bugs: Bug 558528 - Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize. Well, I have not used the aceman's patch in there yet since I wanted to find out where the offending code for calling write() repeatedly with smallish 74 octet length originated. I noticed this repeated write() of smallish size earlier this year when someone reported a serious issue when his mail store is on a remote file system (CIFS) and he seems to have used imap. I tried to re-create his problem locally by mounting Windows 7 file system from my local linux installation and in so doing noticed a few issues with CIFS code itself during a transient network error (which seems to have been fixed nicely), and a missing error code or two on TB's side, and in so doing noticed a trace of MANY write system calls with short (70+ octets) on wire. Such many small write calls over wire can kill performance easily, and the missing error handling was bad BTW, :aceman's patch there seems to handle many "read" side of the issue. The problem I am reporting is "write" side. Also, note that the above bug deals with buffer size when buffering is enabled. This bug is about a place where buffering is not used at all when it should. Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel I wanted to use the great profiler ( see https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G ). But it never seemed to work for me when I tried to use it against C-C TB until lately. But finally, thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems to work! (But I have not yet used that profiler for this particular performance issue because it has been only a day since I could get it working after a long fiasco with bug regarding loadinfo, Bug 1111304 - assertion failure loadinfo, this month. ) I wanted to get out the words fast so that something can be done during the holidays for this serious performance issue. But once I become familiar with the profiler and can offer performance figures using the profiler, I will. It looks the profiler could have saved me in my efforts to capture the traceback mentioned below. I see a traceback in the profile window somehow. This is the stack trace of C-C TB when the write with this small chunk is invoked. (gdb) where #0 write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315 #2 0x00007ffff04beccc in (anonymous namespace)::interposedWrite ( aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68 #3 0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114 #4 0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>, buf=<optimized out>, count=<optimized out>, result=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154 #5 0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData ( this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111 #6 0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable ( this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>, sourceOffset=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86 #7 0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse ( this=<optimized out>, inputStream=<optimized out>, sourceOffset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594 #8 0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState ( this=<optimized out>, url=<optimized out>, inputStream=<optimized out>, offset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679 #9 0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable ( this=<optimized out>, request=<optimized out>, ctxt=<optimized out>, inStr=<optimized out>, sourceOffset=<optimized out>, count=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353 #10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer ( this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609 #11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady ( this=<optimized out>, stream=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436 #12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88 #13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855 #14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265 #15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run ( this=<optimized out>, aDelegate=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99 #16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233 #17 0x00007ffff08853fc in RunHandler (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226 #18 MessageLoop::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200 #19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164 #20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281 #21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153 #22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229 #23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449 #24 0x000000000040378e in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195 #25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380 (gdb) info register Yeah, right. I had to set a breakpoint to "write" system call and checked the registers (this is under Debian GNU/Linux 64-bit) and when the passed length parameter is 74 in, I hit the breakpoint and voila. It took me a while to figure out gdb's esoteric command expression syntax, and I have to skip the major part of the execution without setting up the break point (because with the conditional breakpoint, the execution speed is much slower [maybe error message written to stderr causes one |write| call per character and thus slows down TB execution very much], and hitting control-C to revert back to gdb prompt when TB is finally waiting for user interaction after showing the three window panes, and THEN set the conditional breakpoint break write if ($rdx == 74) to gdb prompt and continue, then hit the breakpoint to obtain the above stack trace. The great profiler I mentioned above seems to record the repeated write nicely (since it is sampled many times, and at the sampled time, the profiler seems to record the stack trace, and so it would have been much easier for me to figure out which higher level routine invokes short writes) TIA
Summary: Performance issue: Failure to use buffered write comm-central thunderbird → Performance issue: Failure to use buffered write (comm-central thunderbird)
Attached patch 1st try: make m_fileStream buffered. (works) (obsolete) (deleted) — Splinter Review
Hi, I figured out that there is a function to attach buffering to existing filestream. I found out that the file stream was created in a function nsresult nsMsgLocalMailFolder::InitCopyMsgHdrAndFileStream() http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#1978 Since the code in question was created entirely by mozilla@davidbienvenu.org, it may be a good idea to put him in the requestee field instead. Please feel free to change this if doing so is better. The patch passed local |make mozmill| and make xpcshell-tests under Debian GNU/Linux 64-bit: I mean it does not seem to introduce any new issues. xpcshell-tests seems to have some issues with isUS pref, but this is a C-C wide issue (see https://tbpl.mozilla.org/?tree=Thunderbird-Trunk ) and does not have anything to do with this patch. TIA
Attachment #8542002 - Flags: review?(bwinton)
BTW, I have no idea if this fixes the unbuffered write issue for IMAP case. But at least, this solved the issue of many small writes in local copying. One bug down, many more to go after.
Oh failure to check the resulting code of Write, Close, Flush, etc. in this file and mailnews/imap/src/nsImapMailFolder.cpp should be done eventually. I will file a bug to take care of these two files. TIA
Assignee: nobody → ishikawa
Comment on attachment 8542002 [details] [diff] [review] 1st try: make m_fileStream buffered. (works) Review of attachment 8542002 [details] [diff] [review]: ----------------------------------------------------------------- I don't think bwinton will review c++ backend code.
Attachment #8542002 - Flags: review?(neil)
Attachment #8542002 - Flags: review?(irving)
Attachment #8542002 - Flags: review?(bwinton)
Component: Untriaged → Database
Keywords: perf
Product: Thunderbird → MailNews Core
There may well be other unbuffered output streams. A lot of the time this was caused by the conversion from nsFileSpec to nsIFile; nsFileSpec buffers its output streams by default but nsIFile does not, so all of the converted cases suddenly stopped buffering their output streams. Most of them were detected in previous bugs but it's not impossible that some have been simply missed.
Comment on attachment 8542002 [details] [diff] [review] 1st try: make m_fileStream buffered. (works) > rv = mCopyState->m_msgStore-> > GetNewMsgOutputStream(this, getter_AddRefs(mCopyState->m_newHdr), > &reusable, > getter_AddRefs(mCopyState->m_fileStream)); The maildir store already buffers its new message output stream; the (default) mbox store does not. I think the correct fix here is to make the mbox store return a buffered output stream.
Attachment #8542002 - Flags: review?(neil) → review-
(In reply to neil@parkwaycc.co.uk from comment #6) > Comment on attachment 8542002 [details] [diff] [review] > 1st try: make m_fileStream buffered. (works) ... > The maildir store already buffers its new message output stream; the > (default) mbox store does not. I think the correct fix here is to make the > mbox store return a buffered output stream. Neil, thank you new for the comment. Now I understand the past change over from nsFileSpec to nsIFile and I think there is at least one place where this buffering issue is missed. (Was it ver 23 time period? I found there were many users who complain(ed) that TB's performance dropped circa ver23.x ) I have no idea how to do the proposed conversion unfortunately. I tried a few things. A few attempts to convert a non-buffered stream within mailnews/local/src/nsMsgBrkMBoxStore.cpp obviously mutated a pointer in an unexpected way and I got failures. The following is the detail of experiments to rewrite. As a result, I prefer the simple bandage patch proposed initially here since it eliminates a serious performance loss due to large number of small |write| calls, especially for people who needs to use remote file servers without a big source code change at all. Please read the following why I think the proposed change to return buffered output stream from mbox/messagestore in the current source code is not easy. OTOH, I am not an expert of mozilla source code. Someone in the know may be able to fix the issue easily. I have no idea. Observation: trials and errors. Initially, I thought I may want to change |MsgGetFileStream| to return a buffered stream. From MXR: MsgGetFileStream mailnews/base/util/nsMsgUtils.cpp (View Hg log or Hg annotations) line 1421 -- nsresult MsgGetFileStream(nsIFile *file, nsIOutputStream **fileStream) Defined as a function prototype in: mailnews/base/util/nsMsgUtils.h (View Hg log or Hg annotations) line 186 -- NS_MSG_BASE nsresult MsgGetFileStream(nsIFile *file, nsIOutputStream **fileStream); mailnews/local/src/nsMsgBrkMBoxStore.cpp (View Hg log or Hg annotations) line 661 -- rv = MsgGetFileStream(mboxFile, aResult); (This is in GetNewMsgOutputStream()) line 876 -- rv = MsgGetFileStream(mboxFile, getter_AddRefs(outputStream)); (This is in GetOutputStream()) As it turned out, changing the stream in GetOutputStream() here using NS_BufferedOutputStream() resulted in the strange error I mentioned below. GetOutputStream is for handling the header part stored in array (NSIarray), I think. So this part is not modified to add buffering. We may lose performance slightly due to non-buffering, but I can't help it. mailnews/local/src/nsPop3Sink.cpp (View Hg log or Hg annotations) line 491 -- rv = MsgGetFileStream(m_tmpDownloadFile, getter_AddRefs(m_outFileStream)); The above call is in an if-clause, but else-clause calls GetNewMsgOutputStream, which calls MsgGetFileStream in turn, and so no change is necessary here (if MsgGetFileStream returns buffered output). Referenced (in 2 files total) in: mailnews/compose/src/nsMsgAttachmentHandler.cpp (View Hg log or Hg annotations) line 818 -- rv = MsgGetFileStream(mEncodedWorkingFile, getter_AddRefs(obj->fileStream)); The above does not seem to need extra processing. mailnews/local/src/nsMsgMaildirStore.cpp (View Hg log or Hg annotations) line 1310 -- return MsgGetFileStream(maildirFile, getter_AddRefs(aOutputStream)); The above does not seem to need extra processing. However, a simple change like the following produced compilation error: nsresult MsgGetFileStream(nsIFile *file, nsIOutputStream **fileStream) { nsMsgFileStream *newFileStream = new nsMsgFileStream; NS_ENSURE_TRUE(newFileStream, NS_ERROR_OUT_OF_MEMORY); nsresult rv = newFileStream->InitWithFile(file); if (NS_SUCCEEDED(rv)) rv = newFileStream->QueryInterface(NS_GET_IID(nsIOutputStream), (void **) fileStream); // make sure that the returned filestream is buffered. if (NS_SUCCEEDED(rv)) { *fileStream = NS_BufferOutputStream(*fileStream, 64 * 1024 ); } return rv; } Compilation error: /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgUtils.cpp: In function ‘nsresult MsgGetFileStream(nsIFile*, nsIOutputStream**)’: /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgUtils.cpp:1431:17: error: cannot convert ‘already_AddRefed<nsIOutputStream>’ to ‘nsIOutputStream*’ in assignment *fileStream = NS_BufferOutputStream(*fileStream, 64 * 1024 ); ^ I modified the code as below and it compiled. Does it look OK?: Note ".take()" at the end. I followed a few similar usages in the C-C tree. #if 1 // make sure that the returned filestream is buffered. if (NS_SUCCEEDED(rv)) { *fileStream = NS_BufferOutputStream(*fileStream, 64 * 1024 ).take(); } #endif |make mozmill| errors: However, then I get three new errors NS_NOITNERFACE: Test Failure: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsIMsgFolder.addKeywordsToMessages] [2026] WARNING: (debug)setting buffering to an existing stream inside GetOutputStream.: file /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMsgBrkMBoxStore.cpp, line 889 [2026] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004002: file /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMsgBrkMBoxStore.cpp, line 973 Test Failure: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsIMsgFolder.addKeywordsToMessages] TEST-UNEXPECTED-FAIL | /REF-COMM-CENTRAL/comm-central/mail/test/mozmill/folder-display/test-deletion-from-virtual-folders.js | test-deletion-from-virtual-folders.js::setupModule The error occured in the following function. NS_IMETHODIMP nsMsgBrkMBoxStore::ChangeKeywords(nsIArray *aHdrArray, const nsACString &aKeywords, bool aAdd) { NS_ENSURE_ARG_POINTER(aHdrArray); nsCOMPtr<nsIOutputStream> outputStream; nsCOMPtr<nsISeekableStream> seekableStream; int64_t restoreStreamPos; uint32_t messageCount; nsresult rv = aHdrArray->GetLength(&messageCount); NS_ENSURE_SUCCESS(rv, rv); if (!messageCount) return NS_ERROR_INVALID_ARG; ===> rv = GetOutputStream(aHdrArray, outputStream, seekableStream, restoreStreamPos); NS_ENSURE_SUCCESS(rv, rv); Obviously, trying to produce a buffered stream for nsIArray *nsHdrArray is not a good idea. So modifying the code at the level of |MsgGetFileStream| is not good. So I tried to move the modification level a little higher and nearer toward the application code. I decided to see if I can add buffering at the level of GetNewMsgOutputStream() as in below. --- a/mailnews/local/src/nsMsgBrkMBoxStore.cpp +++ b/mailnews/local/src/nsMsgBrkMBoxStore.cpp @@ -664,16 +664,22 @@ nsMsgBrkMBoxStore::GetNewMsgOutputStream printf("failed opening offline store for %s\n", URI.get()); NS_ENSURE_SUCCESS(rv, rv); seekable = do_QueryInterface(*aResult, &rv); NS_ENSURE_SUCCESS(rv, rv); rv = seekable->Seek(nsISeekableStream::NS_SEEK_END, 0); NS_ENSURE_SUCCESS(rv, rv); m_outputStreams.Put(URI, *aResult); } + + if (NS_SUCCEEDED(rv)) { + NS_WARNING("(debug)setting buffering to an existing stream inside GetNewMsgOutputStream."); + *aResult = NS_BufferOutputStream(*aResult, 64 * 1024 ).take(); + } + Problem with this is that there is no information about the passed aResult upon entry to GetNewMsgOutputStream, and it could be that *aResult has already been configured for buffering. But who knows? ---- With the above patch, then, I get the following failure: [32497] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80470002: file /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 755 ++DOCSHELL 0x38bf440 == 15 [pid = 32497] [id = 15] // We need to flush the output stream, in case mail filters move // the new message, which relies on all the data being flushed. rv = m_outFileStream->Flush(); // Make sure the message is written to the disk ==> NS_ENSURE_SUCCESS(rv, rv); NS_ASSERTION(m_newMailParser, "could not get m_newMailParser"); I have ABSOLUTELY NO IDEA why. I *think* this error did not show up using |make mozmill| (!). Strange. Anyway, because of the above I/O error, I could not get new mails downloaded with this patch. I suspect that I am mutating the passed pointer *aResult in an unexpected manner in |GetNewMsgOutputStream|. If I take out the above patch, TB again works. So I finally reverted to the original patch. I know that with the approach of the original patch, we have to check for all the occurrences of the call to Message store (mbox) |GetNewMsgOutputStream| to see if we need to buffer the outputstream immediately after the call to GetNewMsgOutputStream. Anyway, making mbox/message store to return a buffered output stream does not seem to be a simple straight-forward issue. In the meantime, some users do suffer from poor performance. I am inclined to take the easy patch for now for the next release, etc. Any ideas? CAVEAT EMPTOR: I am a mere user of TB and an occasional patch contributor, so I don't understand the code of TB very well. So I may be missing something. If we add buffering to a stream returned by |GetNewMsgOutputStream| in the bandage fix manner, the following mxr listing shows where we may need to pay attention, and now I realize why we want to make mbox/message store to return a buffered output: since nsMsgMaildirStore.cpp defines a GetNewMsgOutputStream() that creates a buffered output stream, the users of pluggable backend(s) may assume the stream is buffered, and if I add extra code to add buffering, this will eventually override the already buffered stream [returned by the version of GetNewMsgOtuputStream in nsMsgMaildirStore.cpp when we switch to MsgMailDir format.] Hmm... But as I explained above, the change seems to be not so easy. GetNewMsgOutputStream If you can't find what you're looking for, you can always perform a free-text search for it. Defined as a function in: mailnews/local/src/nsMsgBrkMBoxStore.cpp (View Hg log or Hg annotations) line 618, as member of class nsMsgBrkMBoxStore -- nsMsgBrkMBoxStore::GetNewMsgOutputStream(nsIMsgFolder *aFolder, mailnews/local/src/nsMsgMaildirStore.cpp (View Hg log or Hg annotations) line 600, as member of class nsMsgMaildirStore -- nsMsgMaildirStore::GetNewMsgOutputStream(nsIMsgFolder *aFolder, (this returns buffered output, so no modification is necessary.) Defined as a function prototype in: mailnews/base/public/nsIMsgPluggableStore.idl (View Hg log or Hg annotations) line 133 -- nsIOutputStream getNewMsgOutputStream(in nsIMsgFolder aFolder, mailnews/base/util/nsMsgDBFolder.cpp (View Hg log or Hg annotations) line 873 -- rv = offlineStore->GetNewMsgOutputStream(this, &aHdr, &reusable, mailnews/import/applemail/src/nsAppleMailImport.cpp (View Hg log or Hg annotations) line 555 -- rv = msgStore->GetNewMsgOutputStream(aDstFolder, getter_AddRefs(msgHdr), mailnews/import/eudora/src/nsEudoraMailbox.cpp (View Hg log or Hg annotations) line 299 -- rv = msgStore->GetNewMsgOutputStream(dstFolder, getter_AddRefs(msgHdr), &reusable, line 392 -- rv = msgStore->GetNewMsgOutputStream(dstFolder, getter_AddRefs(msgHdr), &reusable, mailnews/import/oexpress/nsOE5File.cpp (View Hg log or Hg annotations) line 313 -- rv = msgStore->GetNewMsgOutputStream(dstFolder, getter_AddRefs(msgHdr), &reusable, mailnews/import/oexpress/nsOEMailbox.cpp (View Hg log or Hg annotations) line 318 -- rv = m_msgStore->GetNewMsgOutputStream(m_dstFolder, getter_AddRefs(msgHdr), &reusable, mailnews/import/outlook/src/nsOutlookMail.cpp (View Hg log or Hg annotations) line 394 -- rv = msgStore->GetNewMsgOutputStream(dstFolder, getter_AddRefs(msgHdr), &reusable, mailnews/local/src/nsPop3Sink.cpp (View Hg log or Hg annotations) line 500 -- m_msgStore->GetNewMsgOutputStream(m_folder, getter_AddRefs(newHdr), Referenced (in 6 files total) in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1985 -- GetNewMsgOutputStream(this, getter_AddRefs(mCopyState->m_newHdr), line 3572 -- rv = msgStore->GetNewMsgOutputStream(this, getter_AddRefs(newHdr), mailnews/local/src/nsMovemailService.cpp (View Hg log or Hg annotations) line 504 -- rv = msgStore->GetNewMsgOutputStream(inbox, getter_AddRefs(newHdr), mailnews/local/src/nsParseMailbox.cpp (View Hg log or Hg annotations) line 2378 -- rv = store->GetNewMsgOutputStream(destFolder, &aHdr, &reusable, mailnews/local/src/nsMsgMaildirStore.cpp (View Hg log or Hg annotations) line 637 -- ("GetNewMsgOutputStream - tmp subfolder does not exist!!\n")); mailnews/import/test/unit/resources/TestMailImporter.js (View Hg log or Hg annotations) line 100 -- let outputStream = msgStore.getNewMsgOutputStream(destination, mailnews/local/test/unit/test_over4GBMailboxes.js (View Hg log or Hg annotations) line 114 -- let outputStream = plugStore.getNewMsgOutputStream(gInbox, newMsgHdr, reusable) TIA
(In reply to ISHIKAWA, Chiaki from comment #7) > Neil, thank you new for the comment. Now I understand the past change over > from > nsFileSpec to nsIFile and I think there is at least one place where this > buffering > issue is missed. (Was it ver 23 time period? I found there were many users > who complain(ed) that TB's performance dropped circa ver23.x ) Bug 33451 suggests around TB 2.
Please note that a side effect of the slowness, is that, because code for moving mail from IMAP account to remote CIFS mounted directory does not correctly handled various network IO error (timeout, eagain, ...), as soon as I transfer a mail with a > 3 MB attachment on a samba share on a filler (besides the fact that it takes ages when it execptionnaly works) I usually end up with corrupted target directory (after having TB itself unresponsives for minutes). I need to quit and to try recovering the target directory. Fater recovery I usually get several identical mails with 0 size. frequenetly it even causes data lost which makes this bug side effect critical. I agree the write error handling in case of networked IO is a separate issue but this rize the criticallity of this bug.
Comment on attachment 8542002 [details] [diff] [review] 1st try: make m_fileStream buffered. (works) Review of attachment 8542002 [details] [diff] [review]: ----------------------------------------------------------------- I'll echo Neil's r- on the current patch. We tried adding buffering for writes to local folders in bug 769346, and had to back it out because it caused bug 815012; please look through those bugs and their patches to see if we're going to have the same problem with your new patches.
Attachment #8542002 - Flags: review?(irving) → review-
I would like to bring to your attention that using the same TB version on windows (currently in a Linux VM but in the past on windows XP itself) do not exhibit the same problem although the IMAP server is the same and the network destination folder is the same (in order to share sorted/stored mails onLinux and Windows). Even in a VM with only 2GB memory allowed it is a lot faster, never fails and is the only way I have found to effectively classify manually my mails. I doubt it is only related to windows networking client code but more probably to the way the low level IO handling (read/write operation) is implemented on windows. It may help finding a correct way to fix it.
(In reply to Eric Valette from comment #11) > I would like to bring to your attention that using the same TB version on > windows (currently in a Linux VM but in the past on windows XP itself) do > not exhibit the same problem although the IMAP server is the same and the > network destination folder is the same (in order to share sorted/stored > mails onLinux and Windows). Even in a VM with only 2GB memory allowed it is > a lot faster, never fails and is the only way I have found to effectively > classify manually my mails. > > I doubt it is only related to windows networking client code but more > probably to the way the low level IO handling (read/write operation) is > implemented on windows. > > It may help finding a correct way to fix it. This seems a good hint to investigate the root cause. Thank you.
(In reply to :Irving Reid (No longer working on Firefox) from comment #10) > Comment on attachment 8542002 [details] [diff] [review] > 1st try: make m_fileStream buffered. (works) > > Review of attachment 8542002 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'll echo Neil's r- on the current patch. > > We tried adding buffering for writes to local folders in bug 769346, and had > to back it out because it caused bug 815012; please look through those bugs > and their patches to see if we're going to have the same problem with your > new patches. OK, for testing, to recreate the corrupt messages that were observed when buffering was enabled, I am trying to see if the following scenario is enough to re-create the problem [when buffering is enabled.] Scenario: Prerequisites: Create folder-C, folder-D, folder-E, and folder-F. Create message filters: One to filter message with a subject line containing "[folder-D]" to folder-D. Ditto for filtering a message with a subject that contains "folder-E" to folder-E. Similarly to folder-F. (Folder-C is left alone for manual operation later on.) What I will do: First I send a group of messages of various sizes, some of which are meant to be filtered into folder-D, folder-E and folder-F. Now start TB and downloading the group of messages, and letting filtering take place. While doing the downloading and filtering, do additional manual copy of messages from another folder to a few of the destination folders of filtering. I send enough e-mails so that I have the timing to do the additional copy/move while the download still continues. Background: Reading the reports of problems bug 815012, I think the mixing of downloading, filtering and extra copying/moving manually seems to do the trick. [Well, in some cases, simply filtering or simple manual moves resulted in corruption.] That is why I thought of the above scenario to test if the current TB has better locking, etc. so that even enabling buffering, we still have the messages intact [without getting corrupt]. What I did so far: So, without the patch to enable the buffering, I am using the following shell (bash) script to send e-mails of various sizes in a mixed order so that there will be enough POP3 download when TB starts up (and I can manually do the copying of some e-mails from the existing folders to a few of the destination eventually. But not yet there.) The sizes of message bodies created in the shell script (modulo CRLF vs LF issue: I don't know how exactly they are handled, but the intent of using messages of different size should be clear. I want to create a different buffering conditions during the copying and especially at the end just before closing.) bytes 0 : 1569 1 : 2592 2 : 5153 3 : 10272 4 : 20513 5 : 40992 6 : 132641 7 : 264736 8 : 528929 9 : 2098720 10 : 3147297 11 : 4195872 # !/usr/bin/bash # # # create a list of numbers for message size. # # these should not be exact multiple of 1K, etc. # sizeArray=() sizeArray=("${sizeArray[@]}" "$((1024 * 1 + 512 + 33 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 2 + 512 + 32 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 4 + 1024 + 33 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 8 + 2048 + 32 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 16 + 4096 + 33 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 32 + 8192 + 32 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 128 + 1024 + 512 + 33 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 256 + 2048 + 512 + 32 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 512 + 4096 + 512 + 33 ))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 1024 * 2 + 1024 + 512 + 32))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 1024 * 3 + 1024 + 512 + 33))" ) sizeArray=("${sizeArray[@]}" "$((1024 * 1024 * 4 + 1024 + 512 + 32))" ) ASIZE=${#sizeArray[@]} for i in $( seq 0 $(( ${ASIZE} - 1 )) ) do echo $i ":" ${sizeArray[$i]} done # creating the files. # do this only ONCE so that # we will use the same file for comparison purposes # for messages left in folders. # # files are left in /var/tmp # create_messages () { for f in C D E F do for i in $( seq 1 ${ASIZE} ) do FNAME=$(printf "%d%c" $i $f) echo $FNAME ti=$(( $i - 1 )) od -x /dev/urandom | head --bytes=${sizeArray[ ${ti}]} > /var/tmp/$FNAME done done } # On my PC, messages sent by ethe following two loops # took 5-7 seconds ore much longer for TB to download locally and filter. # So I decide to run the two loops three times. # Sending a set of messages # we may want to have a two different order: # whether the choice of C D E F should be in the outer loop or in the # inner loop. This should affect I/O operations in a different manner, and # help us to find dormant I/O coding bugs. acounter=1 astart=$( date +%s ) send_loop () { # folder is chosen in the outer loop for f in C D E F do for i in 1 2 3 4 5 6 7 8 9 10 11 12 do FNAME=/var/tmp/${i}${f} echo $FNAME is sent # subject line is like # [folder-C][1C] test 1 1410563909 mail -s "[folder-${f}][${i}${f}] test $acounter $astart" ishikawa < $FNAME acounter=$(( ${acounter} + 1)) done done # folder is chosen in the inner loop for i in 1 2 3 4 5 6 7 8 9 10 11 12 do for f in C D E F do FNAME=/var/tmp/${i}${f} echo $FNAME is sent mail -s "[folder-${f}][${i}${f}] test $acounter $astart" ishikawa < $FNAME acounter=$((${acounter} + 1)) done done } if [ -s /var/tmp/1C ] then echo test data for messages have been created. else create_messages fi # # six loops for sending messages. # TB crashed when I tried to remove messages from # filtered folders. # But this may have something to do with profiler being enabled. # # three loops and TB did not crash. # what about four loops? # seemed to work. # Deletion from Inbox may have had an issue since it had # left over messages from before (and I deleted 92 messages in total.) # But again, it was probably due to "profiling" enabled. # It seems then five loops will cause errors? # Ouch. # I hit a MOZ_ASSERT() during the deletion of messages created by # five loops. GC assertion error. Ouch. # six loops. One more MOZ_ASSERT() from GC. Am I getting lucky? send_loop send_loop send_loop send_loop send_loop send_loop Encountered Problems: But even before getting there [do the additional manual copy/move] I encountered a few bugs of C-C TB. [And remember this is WITHOUT the patch for enabling buggy buffering.] I was trying to tidy up the content of the folders after a few trials of sending/receiving the set of e-mails. I thought TB was robust enought, but it was not there yet. - It seems that batch deletion of selected messages (large enough to take 10-20 seconds or more) have some race issues, which manifest themselves when "profiling" is enabled and enough delay is introduced to moving of each message: I see the following warning line a few times in the tty console where TB was started: [10802] WARNING: Last delete did not complete: file /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsMsgDBView.cpp, line 3055 And if profiling is enabled, eventually TB crashes(!). I have a suspicion that if the mail store is in a mounted remote file system, and if the remote file system experiences transient network errors, large delay can be introduced during I/O operations. Even without profiling is enabled, when TB spews out "WARNING: Last delete did not complete:...", I suspect there can be file corruption. I wonder if a separate clone process/thread is created for EACH message copy during the run of many selected message being copied. (Or else, why do I see the "Last delete did not complete" message?) Do we not serialize the move/copy by a lock or something? Also, the timing delay caused by the DISPLAY of the first few lines when I select the messages might be playing a role here. When I select several dozen messages some of which are more than 1MB, the delay caused by this selection and display is large enough, and I notice the display is updated sometimes while the deletion proceeds (!) so that may also play a role in uncovering the underlying race issues. - Another surprise is that I could trigger MOZ_ASSERT() from GC in a few test trial of deleting a set of selected largish messages. I have seen GC errors maybe once in every three months on the average during |make mozmill| tests on my local PC for the last few years. It is there, but very rare. But I saw MOZ_ASSERT() twice in a day. It is either the GC has a new bug or the particular memory access pattern of deletion of a series of largish messages in TB may trip GC rather easily? I am going to report these to new bugzilla entries. (The m-c portion of c-c tree was updated on Dec 29. I have updated the source code in the last few hours to see if I still get MOZ_ASSERT() easily.) This was the head of the local M-C tree (under ./mozilla directory of C-C tree) *when I saw MOZ_ASSERT() hg log -R ./mozilla -r default -l 1 changeset: 221543:67872ce17918 tag: tip parent: 221502:62e3a9f26d21 parent: 221542:beba124659ca user: Wes Kocher <wkocher@mozilla.com> date: Mon Dec 29 17:28:04 2014 -0800 summary: Merge inbound to m-c a=merge CLOSED TREE I had a few local patches on top of this when I compiled C-C TB, but they did not touch GC at all. I only inserted a few dump statements to track down other issues related to TB. I think it will be a few days before I can tackle the message corruption issue when buffering is turned on. TIA
Now I have a theory: cloned thread to move a message to a folder is OK as long as the target folder is different, actually such use of thread can achieve parallelism, but if the target folder is the same, we need proper serialization and it seems that we don't have it now (or otherwise we won't see the warning line, or will we? >[10802] WARNING: Last delete did not complete: file /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsMsgDBView.cpp, line 3055 [OK, it may actually be the opposite: a locking for serialization is there, and usually the last delete finishes before the next move begins due to the speed of parallelism? I hope someone familiar with the code can explain what is going on with threads here. Is there a master document that describes the use of thread in TB? Last time I checked, I could cound more than 20 threads running in TB when I checked thread status at a breakpoint in gdb.] Anyway from the description, simple move of two messages from a folder to the other could result in a corruption (with the incorrect buffering.) two years ago. So I hope to catch the error/cause once I re-enabled buffering for local testing.
Current progress. I refreshed the C-C tree, and I don't see MOZ_ASSERT() from GC code so far at least. Great. After several testing, I can confirm that the current C-C does not suffer from message corruption in the test scenario. Scenario: After patching TB to enable buffering, I performed downloading and filtering messages to a few buffers, and even manually invoking copy/move of messages from another folder into filter destination folders while downloading was in operation. TB two years ago seemed to suffer from a corruption in a similar scenario. Proper locking, etc. seem to be in place in today's C-C tree and warns the user if the operation is not possible due to the target folder being locked for other operation. So I am fairly confident that the buffering proposed here is OK in the current C-C tree. (In my test, I downloaded 144 messages to three folders each by filtering and each folder will have about 120MB of data after this operation. It seems that the operation now feels faster.) I have run |make mozmill| and |make xpcshell-tests| and the patch did not introduce new bugs. I will probably post a cleaned up patch in a couple of days. Actually the attached patch is already fine. *HOWEVER*, after all this, I noticed that there is another culprit for line-by-line |write| in TB. Sigh. I am going to post a few bugzilla entries about the issues which I uncovered during debugging this extra buffering issue. 1. - one about the failure to report subtle errors in buffer I/O routine. 2. - one about the possible failure of locking of buffered write when multiple threads references the same file descriptor-like structure. Shouldn't we protect each buffered write via lock/semaphore if multiple threads share the same file descriptor-like structure? 3. - Related to this: Do seek-like low-level functions properly flush pending buffered output when the seek position is moved? (the flush operation ought to occur before the seek so that the pending write occurs at the expected operation.) ) If we have multiple threads sharing the same file descriptor-like structure, we need to carefully synchronize the seek/write operations. 4. Lack of the return value checks for Write(), Close(), etc. in Pop3 and Imap related files. Problems 2 and 3 may have something to do with the following corruption issue. I am using POP3 for mail fetch, and the POP3 code seems responsible for storing the fetched message (received using recvfrom with 32KB buffer) line-by-line by calling |write| for each line into Inbox. I tried enabling buffering in nsPop3Sink.cpp additionally, but then I encountered an inconsistency issue detected at low-level routines, and so I cannot add to Inbox any more (so no downloading.) At least the Inbox file is not messed up thanks to early exit after the check, I think. So enabling buffering in nsPop3Sink.cpp is difficult as of now. During debugging this issue in nsPop3Sink.cpp, I also noticed a subtle error reporting issue in buffering I/O code (problem 1 above). I think the problems in nsPop3Sink.cpp are - Aside from the downloading thread, it seems there is at least one extra thread that mucks around with next read/write position (via seek) WHILE the main download thread writes to Inbox. This is bad. There is a comment to this effect, and before writing there is a seek to make sure that we are at the end of Inbox. http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsPop3Sink.cpp#669 669 nsresult nsPop3Sink::WriteLineToMailbox(const nsACString& buffer) 670 { 671 if (!buffer.IsEmpty()) 672 { 673 uint32_t bufferLen = buffer.Length(); 674 if (m_newMailParser) 675 m_newMailParser->HandleLine(buffer.BeginReading(), bufferLen); 676 // The following (!m_outFileStream etc) was added to make sure that we don't write somewhere 677 // where for some reason or another we can't write to and lose the messages 678 // See bug 62480 679 NS_ENSURE_TRUE(m_outFileStream, NS_ERROR_OUT_OF_MEMORY); 680 681 // seek to the end in case someone else has seeked elsewhere in our stream. 682 nsCOMPtr <nsISeekableStream> seekableOutStream = do_QueryInterface(m_outFileStream); 683 seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); 684 uint32_t bytesWritten; 685 m_outFileStream->Write(buffer.BeginReading(), bufferLen, &bytesWritten); 686 NS_ENSURE_TRUE(bytesWritten == bufferLen, NS_ERROR_FAILURE); 687 } 688 return NS_OK; 689 } ) Having another thread that moves seek pointer is bad. (It is possible the file descriptor-like structure is for both READ and WRITE and READ-side thread is moving the seek pointer. If this were the case, maybe we can divide the File descriptor properly into two, one to access Inbox for read-only and the other for write-only. But we may still need some form of interlocking there. Pretty complex.) I really wonder what the other thread is. I am afraid that the seek (which inevitably needs a flush to occur before it to make sure no pending write is written to an incorrect place) causes the line-by-line |write| behavior. Actually, I was not so sure if today's implementation of seek properly flushes the pending input/output buffer, I added Flush() manually to where seek seems to be invoked. There are many places where seek is performed. That was when I realized that the mozilla code ought to do the flush automatically when seek-like operation is invoked for a buffered stream. Also, when buffering is enabled in nsPop3Sink.cpp, the mysterious other thread may have closed the file stream forcibly [without flush to occur? Obviously some locking issue.)] I saw a premature write error which was produced because the underlying stream was closed somehow before the buffered output could be finished??? [To obtain this error code, I needed to add proper checking into buffered write routine. The particular situation was not detected soonest in the existing code. The error was not reported fully!. ] The error I got was stream-was-already-closed from the incomplete write. TB tried to write 75 bytes but only 45 bytes were written.) Anyway, I will file another bugzilla to tackle this remaining line-by-line |write| issue in nsPop3Sink.cpp and reference other bugzilla entries of problems which I have described above. But for the buffering proposed in this bugzilla, the current C-C TB seems very solid, and so I will post a final patch (well actually request the review.) To be frank, I have no idea how to write the test for this operation (and will test harness allow the creation of all these message bodies that amount to 20MB in total, and sending messages with different headers in total of 360MB? For message integrity checking after the operation, I used external awk script by looking at the folder file after TB pauses and invoked "diff" for comparison. I have no idea if this is acceptable in test scenario. TIA BTW, when I add stricter check of return values of Write and Close in Pop3 and imap files, I notice that there are calls of Close() to already closed streams (in Pop3 case which I can test easily.). These look suspicious. I think a competing cloned task in nsPop3Sink.cpp is doing something funny to the file descriptor-like data. But again, I will discuss this in a bugzilla about nsPop3Sink.cpp. With imap, I notice that the created file stream seems to be already in buffer write mode and so probably there is no performance penalty and the corruption issue is already taken care of by now hopefully: at least I see no errors in |make mozill| and |make xpcshell-tests|. TIA PS: I really wish the buffering is enabled everywhere where applicable. I receive many messages with attachments (mainly PDFs) and the speedup for filtering and manual move, copy, saving of these messages and attachments will make TB user experience better.
(In reply to ISHIKAWA, Chiaki from comment #15) > I think the problems in nsPop3Sink.cpp are > - Aside from the downloading thread, it seems there is at least one extra > thread that mucks around with next read/write position (via seek) WHILE > the main download thread writes to Inbox. This is bad. My understanding is that nsPop3Protocol.cpp (unlike IMAP) runs on the main thread. You can still however experience unexpected code paths, either due to main-thread callbacks or to unexpected paths that subroutines take. > Having another thread that moves seek pointer is bad. ... > I really wonder what the other thread is. I spent a day or more investigating interactions of POP3 with low-level socket code in conjunction with bug 872357. I believe that there is a socket thread that obtains the actual data, which then passes it on to the main thread nsPop3Protocol code. You can see that by enabling logging on the socket code as well as the pop3 code, so you'll see the various calls as well as what thread they are running on. > To be frank, I have no idea how to > write the test for this operation (and will test harness allow the creation > of all these message bodies > that amount to 20MB in total, and sending messages with different headers in > total of 360MB? In bug 872357, the specific issue was breakdown of the testing harness in cases where we were trying to create 4GB! of messages, so yes you can create large messages with the testing harness. The buffer fix proposed there may be needed in some cases, though, at least with Windows. > > For message integrity checking after the operation, I used external awk > script by looking at the > folder file after TB pauses and invoked "diff" for comparison. > I have no idea if this is acceptable in test scenario. Probably not. xpcshell tests are javascript, you should be able to do something similar there. > > PS: I really wish the buffering is enabled everywhere where applicable. > I receive many messages with attachments (mainly PDFs) and > the speedup for filtering and manual move, copy, saving of these > messages and attachments will make TB user experience better. I think it is great that you are taking on the project of trying to enable buffering. The hard part is going to be trying to figure out if these changes cause regressions.
(In reply to Kent James (:rkent) from comment #16) > (In reply to ISHIKAWA, Chiaki from comment #15) Thank you for your comment and encouraging words. > > I think the problems in nsPop3Sink.cpp are > > - Aside from the downloading thread, it seems there is at least one extra > > thread that mucks around with next read/write position (via seek) WHILE > > the main download thread writes to Inbox. This is bad. > > My understanding is that nsPop3Protocol.cpp (unlike IMAP) runs on the main > thread. > > You can still however experience unexpected code paths, either due to > main-thread callbacks or to unexpected paths that subroutines take. > OK, so I need to look deeper. > > Having another thread that moves seek pointer is bad. > ... > > I really wonder what the other thread is. > > I spent a day or more investigating interactions of POP3 with low-level > socket code in conjunction with bug 872357. I believe that there is a socket > thread that obtains the actual data, which then passes it on to the main > thread nsPop3Protocol code. You can see that by enabling logging on the > socket code as well as the pop3 code, so you'll see the various calls as > well as what thread they are running on. > Great hints. I will turn on logging to figure out what is doing what around the time I see Flush(), Close() of already closed streams. [ ... ] > > > > For message integrity checking after the operation, I used external awk > > script by looking at the > > folder file after TB pauses and invoked "diff" for comparison. > > I have no idea if this is acceptable in test scenario. > > Probably not. xpcshell tests are javascript, you should be able to do > something similar there. I will try to keep this in mind. The comparison does not have to create a nice listing a la diff, just a simple |cmp| like YES/NO result is fine. > > > > PS: I really wish the buffering is enabled everywhere where applicable. > > I receive many messages with attachments (mainly PDFs) and > > the speedup for filtering and manual move, copy, saving of these > > messages and attachments will make TB user experience better. > > I think it is great that you are taking on the project of trying to enable > buffering. The hard part is going to be trying to figure out if these > changes cause regressions. Yes, I want to avoid regressions. So far, without enabling the buffering in nsPop3Sink.cpp, the patch proposed here seem to work fine. As for nsPop3Sink.cpp, I filed the following bugzilla entry first: Bug 1120046 RFC mozilla/netwerk/base/src/nsBufferedStreams.cpp: better error reporting and maybe adding thread-race lock My debugging suggests that the buffered output routine requires locking to be multi-thread safe. At least, it will not hurt to have MT-safe buffered I/O routines. Having such MT-safe buffered I/O routines may solve major issues of nsPop3Sink.cpp although there are still places where application-level locking may be in order. I see codes like |Seek| to a desired postion; |Write| something; in nsPop3Sink.jp [and presumably a few other files.] If somdehow, the execution is interrupted by a signal or something and a different task is invoked between |Seek| and |Write|, we may be in for a surprise. I will try to look for the probable cause in the next couple of days. TIA
(In reply to ISHIKAWA, Chiaki from comment #17) > (In reply to Kent James (:rkent) from comment #16) > > My debugging suggests that the buffered output routine requires locking to > be multi-thread safe. At least, it will not hurt to have MT-safe buffered > I/O routines. > Again, my understanding is that nPop3Protocol.cpp only runs on the main thread, so such locking should not be required. The socket code should not be doing any operations that force locking on its users. But I am not really a threaded C++ code expert.
Okay, as the resident threading expert for comm-central, let me interject: The only code in comm-central that runs off-main-thread (excluding JS workers, but I don't think we have any of those) is: * IMAP protocol (1 thread/connection) * LDAP protocol (1 thread/connection) * import code (except it really shouldn't--the database is NOT multithread-safe) Possibly the MAPI code may do some off-main-thread stuff as well, but that's complicated. Mozilla-central uses more threads for different reasons; the biggest impact on comm-central is that sockets and SSL communications each have their own threads. However, the guarantees (as I understand it) is that, when you're communicating over a socket, it can be safely read/written only on the thread to which the socket notification events are being delivered (i.e., main thread for everyone but IMAP and LDAP and the IMAP/LDAP threads for those two). As a basic rule of thumb, for best speed, avoid touching objects from multiple threads. Thread communication and synchronization is expensive--even if there is no contention in synchronization. This is particularly true on ARM, and somewhat less so on x86 (x86 has a fairly strong consistency model, but it doesn't solve remote cache issues). The ideal situation is only ever manipulating objects on one thread. If that's not possible, the next best is touching it on one thread, then switching it to another thread (what I've heard called as migratory data), so that synchronization is only really needed when handing off between threads. The worst is assuming that multiple threads are constantly touching the same data.
(In reply to Kent James (:rkent) from comment #18) > (In reply to ISHIKAWA, Chiaki from comment #17) > > (In reply to Kent James (:rkent) from comment #16) > > > > My debugging suggests that the buffered output routine requires locking to > > be multi-thread safe. At least, it will not hurt to have MT-safe buffered > > I/O routines. > > > > Again, my understanding is that nPop3Protocol.cpp only runs on the main > thread, so such locking should not be required. The socket code should not > be doing any operations that force locking on its users. > > But I am not really a threaded C++ code expert. Thank you for the comment. I hope that we don't need locking if possible. I filed a possible lead to the problem albeit an indirect one. Bug 1120136 - C-C: mailnews/local/src/nsPop3Sink.{h,cpp} variable m_inboxOutputStream is no longer used. I set you as a possible reviewer since that code seems to be put in by you. (In reply to Joshua Cranmer [:jcranmer] from comment #19) > Okay, as the resident threading expert for comm-central, let me interject: > > The only code in comm-central that runs off-main-thread (excluding JS > workers, but I don't think we have any of those) is: > * IMAP protocol (1 thread/connection) > * LDAP protocol (1 thread/connection) > * import code (except it really shouldn't--the database is NOT > multithread-safe) > Possibly the MAPI code may do some off-main-thread stuff as well, but that's > complicated. Thank you for the information. nsPop3Sink.cpp must be shooting itself in the foot on its own then :-) > > Mozilla-central uses more threads for different reasons; the biggest impact > on comm-central is that sockets and SSL communications each have their own > threads. However, the guarantees (as I understand it) is that, when you're > communicating over a socket, it can be safely read/written only on the > thread to which the socket notification events are being delivered (i.e., > main thread for everyone but IMAP and LDAP and the IMAP/LDAP threads for > those two). I think I need to time to understand this. > As a basic rule of thumb, for best speed, avoid touching objects from > multiple threads. Thread communication and synchronization is > expensive--even if there is no contention in synchronization. This is > particularly true on ARM, and somewhat less so on x86 (x86 has a fairly > strong consistency model, but it doesn't solve remote cache issues). The > ideal situation is only ever manipulating objects on one thread. So far so good. > If that's > not possible, the next best is touching it on one thread, then switching it > to another thread (what I've heard called as migratory data), so that > synchronization is only really needed when handing off between threads. I don't believe I need to this in TB. > The > worst is assuming that multiple threads are constantly touching the same > data. Right. Thank you again for the comment. Yes, locking for MT-safeness can be expensive: it depends on how often we have to call lock/unlock pair. (The situation in nsPop3Sink.cpp suggests that for mail main body, it is necessary for once or twice per line output, and for headers and meta-headers, maybe a few times per line.) [However, two comments suggest there should not be another rogue thread that changes the seek position, etc. Something is wrong in nsPop3Sink.cpp then. Maybe a careful analysis of Bug 1120136 might reveal the incorrect code.] Re the slowness of thread locking, I recall that an I/O intensive program's performance dropped by almost half when the MT-support came into glibc. Back then, the thread-safe version was linked by default and thus, fgetc(), which the program used now called lock/unlock for each single character it read (the program was a formatter and thus needed to process every single character carefully and used fgetc instead of reading a large chunk by fread into application's internal buffer and process each character there.), thus the significant drop in performance. That was on Sun3 machine (or was it ultra10?) a long time ago. In light of Bug 1120136 in nsPop3Sink.cpp, I hope by having a couple of stream variables (one of them is a reference to Inbox and will be open all the time), I can enable the buffering in nsPop3Sink.cpp (hopefully without locking, but we may still need locking even then.): actually, I think once the move to a message per file storage format is taken, then this problem in nsPop3Sink.cpp will be a non issue. Come to think of it, https://lists.mozilla.org/pipermail/dev-platform/2014-May/004762.html suggests that maybe I should forget chasing this bug in nsPop3Sink.cpp but push for proper use of |fdatasync| instead of using|fsync| everywhere at least under LINUX and other posix-compliant OS where |fdatasync| *works*. That might result in immediate speed gain as the data in the message suggests. But, unless |fdatasync| is available under Windows, that approach leaves Windows behind, and so I should probably try to chase the root cause of the delivery failure when buffering is enabled in nsPop3Sik.cpp after all. Well, finding an efficient cross-platform mail client (and bug-free almost) is difficult, isn't it? TIA
(In reply to Eric Valette from comment #9) > Please note that a side effect of the slowness, is that, because code for > moving mail from IMAP account to remote CIFS mounted directory does not > correctly handled various network IO error (timeout, eagain, ...), as soon > as I transfer a mail with a > 3 MB attachment on a samba share on a filler > (besides the fact that it takes ages when it execptionnaly works) I usually > end up with corrupted target directory (after having TB itself unresponsives > for minutes). I need to quit and to try recovering the target directory. > Fater recovery I usually get several identical mails with 0 size. > frequenetly it even causes data lost which makes this bug side effect > critical. > > I agree the write error handling in case of networked IO is a separate issue > but this rize the criticallity of this bug. Hi, as part of looking at nsPop3Sink.cpp, I am adding a lot of missing checks to nsPop3Sink.cpp. Mostly the checking of return values of Flush(), Close(), Write, and if the # of written bytes is equalt to requested # of bytes. (The final check is important when there is a transient network error for a write to remote file system: we can possibly retry. But how many times?..) Maybe once I am confident enough with nsPop3Sink.cpp error handling, I will pay attention to error handling in imap-related sources. At least, I can offer a framework of handling these errors in most crude manner and someone can pick up the error handling pattern and port it in the imap-related code. Oh one more thing, I have not been able to figure out why Windows code seems to work better with CIFS mount than linux version. I looked at Windows-native implementation of low-level mozilla I/O code, but I think I don't know the details of MS I/O function to understand the difference between Windows version and Posix-version (say linux) to figure out where the performance difference can be caused. It may be even that CIFS client code on the linux side may still need improvement. But I will keep this in mind. TIA
(In reply to ISHIKAWA, Chiaki from comment #13) > (In reply to :Irving Reid (No longer working on Firefox) from comment #10) > - It seems that batch deletion of selected messages (large enough to > take 10-20 seconds or more) have some race issues, which manifest > themselves when "profiling" is enabled and enough delay is > introduced to moving of each message: I see the following warning > line a few times in the tty console where TB was started: > > [10802] WARNING: Last delete did not complete: file > /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsMsgDBView.cpp, line > 3055 > > And if profiling is enabled, eventually TB crashes(!). > I have noticed a strange anomaly in the behavior of TB. After looking at the source code, I had a theory about the above "Last delete did not complete": it seems that TB may have an almost asynchronous operation by running filtering operation of move/copy in a cloned thread, and that if the e-mail sequence in the folder to be moved/copied is like the following, we may see such a warning. a large message (to be moved): takes time a small message (to be moved) : runs fast a large message (to be moved) a small message (to be moved) when the main thread tries to move/delete a small message, it may find the previous move/delete by asynchronous clone might not have finished (even on local PC's disk due to the large amount of data), thus the message. So I changed the order of e-mails sent by test script so that large, small, large, small, ..., large, small: the sending order is as follows. [I am using a slightly modified script in comment 13.] test data for messages have been created. /var/tmp/12C is sent <--- large /var/tmp/6C is sent <--- small /var/tmp/11C is sent <--- large (but smaller than 12c) /var/tmp/5C is sent <--- small /var/tmp/10C is sent <--- large (but smaller than 11C) /var/tmp/4C is sent /var/tmp/9C is sent /var/tmp/3C is sent /var/tmp/8C is sent /var/tmp/2C is sent /var/tmp/7C is sent /var/tmp/1C is sent /var/tmp/12D is sent /var/tmp/6D is sent /var/tmp/11D is sent /var/tmp/5D is sent ... /var/tmp/8E is sent /var/tmp/2E is sent /var/tmp/7E is sent /var/tmp/1E is sent /var/tmp/12C is sent <--- large /var/tmp/12D is sent /var/tmp/12E is sent /var/tmp/6C is sent <--- small /var/tmp/6D is sent /var/tmp/6E is sent /var/tmp/11C is sent <--- large (but smaller than 12C) /var/tmp/11D is sent /var/tmp/11E is sent /var/tmp/5C is sent /var/tmp/5D is sent /var/tmp/5E is sent /var/tmp/10C is sent ... /var/tmp/7C is sent /var/tmp/7D is sent /var/tmp/7E is sent /var/tmp/1C is sent /var/tmp/1D is sent /var/tmp/1E is sent ishikawa@ip030:/REF-COMM-CENTRAL/work-dir$ Then I received them to see if the message about last delete not finishing is seen. *A BIG SURRPRISE* The receiving is now *SLOOOOW*. Previously, the download proceeded smoothly. Yes, it takes time, but I saw messages coming down one by one. Now, for some reason, it takes almost forever. I have no idea why. - It may be that TB's code have internal contention issue when filtering is invoked and the messages size come in certain pattern. - Or, it may be that the local POP3 daemon may have some issue when the message file sizes are in certain order. I am not even sure if the downloading is performed in the order of sending... In any case this discovery is quite a surprise. The crawling speed of TB mentioned sometimes in bugzilla such as may be related to certain size patterns (well at least some of them). Yes, the speed is slow. I began typing this AFTER the downloading of messages from local POP3 server was initiated, thd downloading still continues. I see constant disk I/O at 8-10MB / sec range. Something is amiss... The counting of downloaded message does not increase as rapidly as with the previous ordering. The refresh does not occur as I move the windows around as frequently as I wish. Something is fishy in the code of nsPop3Sink.cpp. (Or in the POP3 server code. ) It is possible that extra scan of message file (due to unfinished delete maybe) occurs now, and O(n^2) n being the message scanning/reading of data occurs. I wonder if there is a concise document that describes what happens when a series of messages is downloaded and they are acted upon filtering. I find the comment too sparse to understand the operation sensibly. It will be a long time until I can get to the root cause of mail incorporation failure Half way through as I wrote the above, TB finished downloading, and I sent another fresh batch of 72 messages. Hmm, TB is still downloading !? And I see 8-10MB I/O. I will investigating which is causing these issues. Preliminary finding is that TB is issueing |fsync| for EVERY single line it writes. So lots of |fsync|, may be about 300 |fsync| calls a second? No wonder it feels slow. Please note that TB is NOT patched for the buffering nsPop3Sink.sh and so except for the patch proposed in this bugzilla, it is basically stock C-C tree version. [Oh well, I did add a few patches but not quote related to this issue.] I am afraid that today's released version will show similar slowness if the particular mail sequence it receives is like the artificially created one. This can almost be used as DoS tool... TIA
(In reply to ISHIKAWA, Chiaki from comment #22) > (In reply to ISHIKAWA, Chiaki from comment #13) > > (In reply to :Irving Reid (No longer working on Firefox) from comment #10) > > > - It seems that batch deletion of selected messages (large enough to > > take 10-20 seconds or more) have some race issues, which manifest > > themselves when "profiling" is enabled and enough delay is > > introduced to moving of each message: I see the following warning > > line a few times in the tty console where TB was started: > > > > [10802] WARNING: Last delete did not complete: file > > /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsMsgDBView.cpp, line > > 3055 > > > > And if profiling is enabled, eventually TB crashes(!). > > > > I have noticed a strange anomaly in the behavior of TB. > Forget about this. This could be a too-eager flush/close introduced during my patch. I am re-investigating the patch to see if there is extra flush somewhere. TIA
I think the original patch as it stands is correct as far as I can test it on local PC. As for the making the stream in nsPop3Sink.cpp, I filed a separate bugzilla Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. So I will follow the issue in nsPop3Sink.cpp in that bug. I will try to focus on the test issues in this bugzilal entry. - the xpcshell test or something for verifying/confirming that the patch does not cause regression with relatively large amount of pop3 download coupled with filtering some of them to a set of folders. My test scenario will be - I will create a test data files of various fixed sizes in advance. These will be sent as message body. Names: {C,D,E,F}x{1,2,3,4,5,6,7,8,9,10,11,12} They are named as C1, C2, C3, ..., C12, D1, D2, D3, ..., F12 - I need to create three folders (folder-C folder-D folder-E folder-F) in advance. - I need to create filtering rule in advance. that filters messages with subject line that contains "[folder-C]" -> move the message to folder-C (this filter is disabled) "[folder-D]" -> move the message to folder-D enabled "[folder-E]" -> move the message to folder-E enabled "[folder-F]" -> move the message to folder-F enabled - Disable TB's automatic download and download at the startup time in advance. - I need to send the test file as messages to a test pop3 server with a proper subject line. E.g. "folder-C][Cnn]" with the message body of {C1,C2, ...,C12} Cnn is substituted with C1, C2, ...C12. "folder-D][Dnn]" with the message body of {D1,D2, ...,D12} ditto for folder-E, and folder F. Send these in advance of receiving (download) operation by TB. - Start TB. - Start download (I am not sure how to do the following [1]) - But during dowload, I may want to introduce output pertuabation caused by manual filtering or manual move/copy of messages from Inbox. For example, select the disabled filter to move a message with "[folder-C]" in the subject to folder-C. Or I could MANUALLY copy them after selection to, say, folder-E to invoke the proper error check of locked folder preventing the manual move, or the manual move locking the destination folder preventing the move by filtering operation (both of which I observe on my local test and they work perfectly!). But the results are so timing dependent I am not sure what to do here. At least the move bu the filter from Inbox to folder-C checks the proper file I/O on Inbox. Anyway, stop TB after the download operation (and possible moving) finishes. - Check that each of folder-D, folder-E, folder-F contains the expected number of messages (in this case 12). - check that each folder contains exactly the expected message once and only once. For example, for folder-D, we should have messages with subject lines that contain D1, D2, D3, ..., D12 once and only once. - Check the text body of each message based on the subject line's "[D1]", "[D2]", ..., etc. against the original file for complete match. If the set of tests passes the buffering output is SUCCESS(!). Well that said, to be honest, maybe just running |make mozmill| and |make xpcshell-tests| may be enough with the patch enabled. What do people think? (Well, I need to find out the js code that I can re-use for all the operation above. Not a simple work, I am afraid...) TIA PS: come to think of it, I noticed the blocked filter operation does not have a good error reporting YET (See the work-in-progress patch in Bug 854172 - Add a missing check of the return value of MoveIncorporatedMessage, and the failure to log such failure. ) and so it is difficult to check right now that the proper filter locking is taking place in the official C-C tree yet.) I will see what I can do. [1] Note: for asynchronous perturbation by invoking the move by filter or manual move/copy of messages with "[Folder-C]" in subject line in Inbox to folder-D, say, to cause output contention on the same folder, we can certainly do this and see if the final folders contain uncorrupted messages that match the orginal data (ignoring the potential error caused by the locking of folders): we can just skip the check on the total number of messages downloaded, that the whole set of messages are put into the folder (because locking of the folder may fail to move the messages), etc.
Attached patch experiment 2 (obsolete) (deleted) — Splinter Review
Chiaki, can you try this patch in your tests?
Attachment #8570630 - Flags: feedback?(ishikawa)
(In reply to :aceman from comment #25) > Created attachment 8570630 [details] [diff] [review] > experiment 2 > > Chiaki, can you try this patch in your tests? I will do this. Sorry I have been busy with family matters for the last few weeks and could not do much programming. TIA
(In reply to :aceman from comment #25) > Created attachment 8570630 [details] [diff] [review] > experiment 2 > > Chiaki, can you try this patch in your tests? I have done this. The patch does not introduce problems in my test of |make mozmill| and local e-mail exchange and filtering. However, the buffering for appending to Inbox is not effective. This is because of the seek performed for each write of one line as noted in comment 15. Tracing system call shows a repetition of the calls: code snippet in comment 15 shows the lseek to the end presumbaly, and I am not sure where seek to the current position sneaks in. lseek(44, 0, SEEK_END) = 359092573 lseek(44, 0, SEEK_CUR) = 359092573 write(44, "02dad0 a0 ba 72 ad 68 11 38 58 d"..., 75) = 75 Anyway, buffering code has to flush the buffer whenever a lsee is done, I think. So basically, the effect of buffering is lost in appending downloaded message to Inbox when one uses POP3. I will investigate WHERE ON EARTH the file pointer is changed aside from the downloading thread. Maybe I will study the lead mentioned in the comment 16 by rkent. I have a hunch that if one wants to attach a tag to a message in Inbox, that will introduce a write operation to Inbox and this may be also an operation that may "move" the file pointer if the writing of tag information uses the same file descriptor. Does anyone know how the tag is written? If such extra writing operations can use a separate file descriptor, then seek just before the writing of a line when one downloads a message from POP3 server can be avoided. I wonder what imap code does. Does it suffer from the same unbuffered write operation? I don't use imap and so can't test. TIA TIA
Attachment #8570630 - Flags: feedback?(ishikawa) → feedback+
Then I do not understand why your patch would help with the seek problem and mine not. It just adds buffering at a different place. Have you actually measured, that the ton of seek calls are very slow?
(In reply to :aceman from comment #28) > Then I do not understand why your patch would help with the seek problem and > mine not. It just adds buffering at a different place. Have you actually > measured, that the ton of seek calls are very slow? Hi, I was not descriptive enough. I think both your patch and mine does help with many part of file operation, EXCEPT FOR for the downloading message to inbox. (ACTUALLY, I need to check the strace output to figure out that your patch indeed buffers the file operation such as moving ALREADY DOWNLOADED message to another folder due to filtering. I assumed so but I was going to check it with strace and post-run analysis.) The point is lseek() itself may not be THAT slow, but the isue is instead of a sequence of large write using a buffer write(4K), write(4K), write(4K), the extra seek seems to cause buffering code to give up buffering and we see lseek(), lseek(), write(75 or whatever a line's length), lseek, lseek, write, .... It is not that lseek() is slow (can be slow over the wire if we mount a remote file system), but rather the presence of stream->Seek() just before each stream->Write() seems to invalidate the proper buffering. So my patch does NOT solve this issue, either. Thus I want to figure out whether this ...->Seek() is absolutely necessary and want to eliminate it. And so I am curious where this lseek(44, 0, SEEK_CUR) = 359092573 with "SEEK_CUR" comes from when your patch is applied. I think the code snippet in comment 16 only mentions > 683 seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); and SEEK_CUR is not used there. I counted the use of "Seek(" in /mail /mailnews and there are about a few dozen calls. I wonder if one of them is the cause of this SEEK_CUR. I also wonder if one of them might be the reason for moving/altering the file position [with subsequent writing/reading] behind download code (POP3) and that this is the reason the code snippet in comment 15 at one point in time needed to incorporate ->Seek call. I believe that if I can remove the offending Seek, by making sure that the out of context "lseek" (with possible read/write afterward), which the download code needed to defend against by seek to the end of the file when each line of message being downloaded is written, uses a different file stream (to the same Inbox), we can make buffering code REALLY buffer the write to Inbox. Maybe I was not clear enough. But the above is my thinking now. But, sorry, I was SO FOCUSED on appending to Inbox only and so I failed to check that your code DOES buffer the file operation such as moving / copying a messasge to another buffer. I was only checking that logically there was no test failure, and I checked whether the appending to the Inbox was buffered (and it was not). I will check and report back. TIA
(In reply to ISHIKAWA, Chiaki from comment #29) > But, sorry, I was SO FOCUSED on appending to Inbox only and so I failed to > check > that your code DOES buffer the file operation such as moving / copying a > messasge to > another buffer. I was only checking that logically there was no test > failure, and > I checked whether the appending to the Inbox was buffered (and it was not). Great, then please use my patch (as I think it does what Neil requested) and add the Seek fix on top of that.
An interesting development. I thought hard about what action might cause writing to Inbox when a message is being downloaded from POP3 server into Inbox. Maybe adding a tag to a message into a folder. I checked the operation and traced the syscall during the operation. Here is one example: Excerpt: Adding a tag (label) to a message in folder-F: folder-F is opened and a X-Moilla-Status header line is read from the folder file and presumably a tag/label (below $label2) is added and written back to the relevant portion. open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-F", O_RDWR|O_CREAT, 0664) = 58 fcntl(58, F_GETFD) = 0 lseek(58, 80469144, SEEK_SET) = 80469144 lseek(58, 0, SEEK_CUR) = 80469144 read(58, "X-Mozilla-Status: 0001\nX-Mozilla"..., 4096) = 4096 lseek(58, 0, SEEK_CUR) = 80473240 lseek(58, 0, SEEK_CUR) = 80473240 lseek(58, 0, SEEK_CUR) = 80473240 lseek(58, 80469210, SEEK_SET) = 80469210 write(58, " $label2", 8) = 8 close(58) = 0 Ok, so if I add a tag/label to a message in Inbox, a write to Inbox happens! Does it interfere with download operation? Let me try. Excerpt: Adding a tag (label) to a message in Inbox open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", O_RDWR|O_CREAT, 0664) = 47 fcntl(47, F_GETFD) = 0 lseek(47, 38234113, SEEK_SET) = 38234113 lseek(47, 0, SEEK_CUR) = 38234113 read(47, "X-Mozilla-Status: 0001\nX-Mozilla"..., 4096) = 4096 lseek(47, 0, SEEK_CUR) = 38238209 lseek(47, 0, SEEK_CUR) = 38238209 lseek(47, 0, SEEK_CUR) = 38238209 lseek(47, 38234179, SEEK_SET) = 38234179 write(47, " $label4", 8) = 8 close(47) = 0 This time I used a different tag/label and so you see $label4 instead of $label2 above. Sure a write takes place to Inbox. However, obviously this operation opens a new file descriptor and is unlikely to cause a conflict with the file pointer that is being referenced by the pop3 download code to write a downloaded message into Inbox. What else can be writing in (or maybe reading from) Inbox in parallel to pop3 download code? Maybe GLODA index thingy? During my local test and recording of system calls, I see something like the following. I noticed the string "gloda" in the log. It seems to me that probably after a flurry of messages are moved to folder-E by the filter rule during download, gloda kicks in to read the message therein? (It could be just that mimemsg.js under /gloda/ subdirectory is being used for mime parsing. I am not sure) Anyway, it seems to open a file descriptor on its own (fd = 57). open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-E", O_RDONLY) = 57 fcntl(57, F_GETFD) = 0 lseek(57, 109742338, SEEK_SET) = 109742338 lseek(57, 0, SEEK_CUR) = 109742338 lstat("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/mimemsg.js", {st_mode=S_IFLNK|0777, st_size=67, ...}) = 0 lstat("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/mimemsg.js", {st_mode=S_IFLNK|0777, st_size=67, ...}) = 0 readlink("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/mimemsg.js", "/REF-COMM-CENTRAL/comm-central/m"..., 67) = 67 lstat("/REF-COMM-CENTRAL/comm-central/mailnews/db/gloda/modules/mimemsg.js", {st_mode=S_IFREG|0644, st_size=26426, ...}) = 0 stat("/REF-COMM-CENTRAL/comm-central/mailnews/db/gloda/modules/mimemsg.js", {st_mode=S_IFREG|0644, st_size=26426, ...}) = 0 lstat("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/utils.js", {st_mode=S_IFLNK|0777, st_size=65, ...}) = 0 lstat("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/utils.js", {st_mode=S_IFLNK|0777, st_size=65, ...}) = 0 readlink("/REF-OBJ-DIR/objdir-tb3/dist/bin/modules/gloda/utils.js", "/REF-COMM-CENTRAL/comm-central/m"..., 65) = 65 lstat("/REF-COMM-CENTRAL/comm-central/mailnews/db/gloda/modules/utils.js", {st_mode=S_IFREG|0644, st_size=6250, ...}) = 0 stat("/REF-COMM-CENTRAL/comm-central/mailnews/db/gloda/modules/utils.js", {st_mode=S_IFREG|0644, st_size=6250, ...}) = 0 futex(0x266b26c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x266b208, 13756) = 1 ... So after considering a little more, I got bold and decided to disable the |Seek| in front of |Write| in nsPop3Sink.cpp because I could not think of any other actions that might move the write pointer to Inbox used by the pop3 download code. (pop3 download is supposed to be single-thread and most other actions can not easily mess with the file position of the file descriptor which the download code uses since it is very likely that other actions open a file pointer for its own use.) This is how I disabled the |Seek| diff --git a/mailnews/local/src/nsPop3Sink.cpp b/mailnews/local/src/nsPop3Sink.cpp --- a/mailnews/local/src/nsPop3Sink.cpp +++ b/mailnews/local/src/nsPop3Sink.cpp @@ -673,19 +673,27 @@ nsresult nsPop3Sink::WriteLineToMailbox( uint32_t bufferLen = buffer.Length(); if (m_newMailParser) m_newMailParser->HandleLine(buffer.BeginReading(), bufferLen); // The following (!m_outFileStream etc) was added to make sure that we don't write somewhere // where for some reason or another we can't write to and lose the messages // See bug 62480 NS_ENSURE_TRUE(m_outFileStream, NS_ERROR_OUT_OF_MEMORY); +#if 0 + // CI: I have no idea why this was put in here. + // I thought I have seen a case where the seek() here was meaningful, but + // I am not so sure. [I was fixing low-level routine's error message when + // seek() was called to a novel place. But maybe I was wrong.] + // So for testing the hypothesis that this is not necessary, I am removing this for once. + // seek to the end in case someone else has seeked elsewhere in our stream. nsCOMPtr <nsISeekableStream> seekableOutStream = do_QueryInterface(m_outFileStream); seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); +#endif uint32_t bytesWritten; nsresult rv4 = m_outFileStream->Write(buffer.BeginReading(), bufferLen, &bytesWritten); #if DEBUG // During test, this Write failed. if (NS_FAILED(rv4)) { fprintf(stderr, "m_outFileStream->Write(buffer.BeginReading(), bufferLen, &bytesWritten); failed. rv4 = 0x%08x\n", rv4); } else if (bytesWritten != bufferLen) My initial brief local testing seems good. Downloading to Inbox from pop3 server now uses 4K buffering! (Compared with 75 octets write and accompanying seek, this is a winner in terms of speed when file system is remotely mounted via network!) To wit: Excerpt of incorporating messaging to Inbox via pop3 download write(1, "Incorporate message begin:", 26) = 26 write(1, "\n", 1) = 1 write(1, "uidl string: 0000473a533eeb2c\n", 30) = 30 access("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", F_OK) = 0 open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", O_WRONLY) = 3 fcntl(3, F_GETFD) = 0 write(3, "", 0) = 0 lseek(3, 0, SEEK_END) = 19274925 lseek(3, 0, SEEK_CUR) = 19274925 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=333, ...}) = 0 write(3, "From - Fri Mar 13 00:37:11 2015\n"..., 4096) = 4096 write(3, " e4 ea c6 28 85 30 c5 6a >...6."..., 4096) = 4096 write(3, "00630 ab 34 33 8f 1a af 6b f1 d2"..., 4096) = 4096 write(3, "e 09 7b >..i..MN.W...`N.{<\n0009"..., 4096) = 4096 write(3, " 9b c3 47 c3 6d aa 6b 55 bc 82 1"..., 4096) = 4096 write(3, "....},...<\n001070 35 30 dc a6 09"..., 4096) = 4096 write(3, "d 6f 9c e1 03 02 33 >....8....."..., 4096) = 4096 write(3, " 3a 8b 1b f5 e5 86 b8 13 89 01 c"..., 4096) = 4096 ... BTW, for moving message via filter rule, we certainly see the effect of buffering, too. In this case, reading from Inbox uses 10KB of buffering while writing to folder-F uses 4K buffering. (We might want to use a large buffering. Correct?) write(1, "GetDiskSpaceAvailable returned: "..., 49) = 49 open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", O_RDONLY) = 3 fcntl(3, F_GETFD) = 0 lseek(3, 15080243, SEEK_SET) = 15080243 stat("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-F", {st_mode=S_IFREG|0755, st_size=93915266, ...}) = 0 access("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-F", F_OK) = 0 open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-F", O_WRONLY) = 59 fcntl(59, F_GETFD) = 0 write(59, "", 0) = 0 lseek(59, 0, SEEK_END) = 93915266 lseek(59, 0, SEEK_CUR) = 93915266 read(3, "From - Fri Mar 13 00:36:12 2015\n"..., 10240) = 10240 write(59, "From - Fri Mar 13 00:36:12 2015\n"..., 4096) = 4096 write(59, " 07 98 10 fa 0b e8 45 ca >ZP.?M"..., 4096) = 4096 read(3, " 20 0e a8 04 b3 e0 72 ef fb 9b "..., 10240) = 10240 write(59, "00630 24 e2 40 50 1e 58 b2 24 05"..., 4096) = 4096 write(59, "6 ed ef >.[.....'rq.O....<\n0009"..., 4096) = 4096 read(3, "-.~MD...H<\n001070 db 30 a4 bb d3"..., 10240) = 10240 write(59, " 39 81 83 c9 1d 42 f2 64 b0 6f d"..., 4096) = 4096 write(59, "-.~MD...H<\n001070 db 30 a4 bb d3"..., 4096) = 4096 write(59, "5 a7 8f f1 01 c5 2a >.?._..|..U"..., 4096) = 4096 ... My local testing sends many largish messages to be filtered into folder-D, folder-E, folder-F. I will mix a manual copying of messages to folder-C by manual invocation of a filter (which is usually disabled.) I will repeat this locally created test and additionally run |make mozmill |and |make xpcshell-tests| with your buffering patch and the above deletion of seek to see if it uncovers any problem. TIA PS: Let me describe the current behavior before the deletion of |Seek| using excerpts from system call trace. The following is an excerpt from the log of test run of TB with your patch (BUT WITHOUT the deletion of |Seek| before |Write|.) The sequence of system calls when a download of message from a pop3 server begins ("Incorporate message being:") is shown below. Inbox is opened, but the writing to Inbox is done only one line at a time. There is no effective buffering BECAUSE lseek calls interfere with buffering. Excerpt: write(1, "Incorporate message begin:", 26) = 26 write(1, "\n", 1) = 1 write(1, "uidl string: 000046bc533eeb2c\n", 30) = 30 access("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", F_OK) = 0 open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/Inbox", O_WRONLY) = 48 fcntl(48, F_GETFD) = 0 write(48, "", 0) = 0 lseek(48, 0, SEEK_END) = 366649869 lseek(48, 0, SEEK_CUR) = 366649869 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=333, ...}) = 0 write(48, "", 0) = 0 lseek(48, 0, SEEK_END) = 366649869 lseek(48, 0, SEEK_CUR) = 366649869 write(48, "From - Thu Mar 12 22:44:13 2015\n", 32) = 32 lseek(48, 0, SEEK_END) = 366649901 lseek(48, 0, SEEK_CUR) = 366649901 write(48, "X-Account-Key: account1\n", 24) = 24 lseek(48, 0, SEEK_END) = 366649925 lseek(48, 0, SEEK_CUR) = 366649925 write(48, "X-UIDL: 000046bc533eeb2c\n", 25) = 25 lseek(48, 0, SEEK_END) = 366649950 lseek(48, 0, SEEK_CUR) = 366649950 write(48, "X-Mozilla-Status: 0000\n", 23) = 23 lseek(48, 0, SEEK_END) = 366649973 lseek(48, 0, SEEK_CUR) = 366649973 write(48, "X-Mozilla-Status2: 00000000\n", 28) = 28 lseek(48, 0, SEEK_END) = 366650001 lseek(48, 0, SEEK_CUR) = 366650001 write(48, "X-Mozilla-Keys: "..., 97) = 97 lseek(48, 0, SEEK_END) = 366650098 lseek(48, 0, SEEK_CUR) = 366650098 write(48, "Return-path: <ishikawa@ip030.exa"..., 42) = 42 lseek(48, 0, SEEK_END) = 366650140 lseek(48, 0, SEEK_CUR) = 366650140 write(48, "Envelope-to: ishikawa@ip030.exam"..., 40) = 40 lseek(48, 0, SEEK_END) = 366650180 lseek(48, 0, SEEK_CUR) = 366650180 write(48, "Delivery-date: Tue, 10 Mar 2015 "..., 47) = 47 lseek(48, 0, SEEK_END) = 366650227 lseek(48, 0, SEEK_CUR) = 366650227 write(48, "Received: from ishikawa by ip030"..., 68) = 68 lseek(48, 0, SEEK_END) = 366650295 lseek(48, 0, SEEK_CUR) = 366650295 write(48, "\t(envelope-from <ishikawa@ip030."..., 46) = 46 lseek(48, 0, SEEK_END) = 366650341 lseek(48, 0, SEEK_CUR) = 366650341 write(48, "\tid 1YVA62-0007CX-VC\n", 21) = 21 lseek(48, 0, SEEK_END) = 366650362 lseek(48, 0, SEEK_CUR) = 366650362 write(48, "\tfor ishikawa@ip030.example.com;"..., 65) = 65 lseek(48, 0, SEEK_END) = 366650427 lseek(48, 0, SEEK_CUR) = 366650427 write(48, "To: ishikawa@ip030.example.com\n", 31) = 31 lseek(48, 0, SEEK_END) = 366650458 lseek(48, 0, SEEK_CUR) = 366650458 write(48, "Subject: [folder-C][6C] test 4 1"..., 42) = 42 lseek(48, 0, SEEK_END) = 366650500 lseek(48, 0, SEEK_CUR) = 366650500 write(48, "Message-Id: <E1YVA62-0007CX-VC@i"..., 50) = 50 lseek(48, 0, SEEK_END) = 366650550 lseek(48, 0, SEEK_CUR) = 366650550 write(48, "From: ishikawa <ishikawa@ip030.e"..., 44) = 44 lseek(48, 0, SEEK_END) = 366650594 lseek(48, 0, SEEK_CUR) = 366650594 write(48, "Date: Tue, 10 Mar 2015 11:40:38 "..., 38) = 38 lseek(48, 0, SEEK_END) = 366650632 lseek(48, 0, SEEK_CUR) = 366650632 write(48, "\n", 1) = 1 lseek(48, 0, SEEK_END) = 366650633 lseek(48, 0, SEEK_CUR) = 366650633 write(48, "000000 2d be 8f 32 a1 9e e6 a6 8"..., 75) = 75 lseek(48, 0, SEEK_END) = 366650708 lseek(48, 0, SEEK_CUR) = 366650708 write(48, "000010 03 82 30 d8 ef 26 78 9c 5"..., 75) = 75 lseek(48, 0, SEEK_END) = 366650783 lseek(48, 0, SEEK_CUR) = 366650783 write(48, "000020 03 38 93 39 b7 06 38 b5 e"..., 75) = 75 lseek(48, 0, SEEK_END) = 366650858 lseek(48, 0, SEEK_CUR) = 366650858 write(48, "000030 6d f3 ff 3d 02 f7 95 d5 f"..., 75) = 75 lseek(48, 0, SEEK_END) = 366650933 lseek(48, 0, SEEK_CUR) = 366650933 ... MANY lines of 75 octets write for large messags ... ... The test message have large number of lines of 75 octets. So buffering effect is not visible due to interfering lseek().
(In reply to ISHIKAWA, Chiaki from comment #31) Great analysis, thanks for your experiments! > Here is one example: > > Excerpt: Adding a tag (label) to a message in folder-F: > folder-F is opened and a X-Moilla-Status header line is read > from the folder file and presumably a tag/label (below $label2) is added and > written back to the relevant portion. > > > open("/home/ishikawa/test-MAIL-dir/MAIL-DIR/folder-F", O_RDWR|O_CREAT, 0664) > = 58 > fcntl(58, F_GETFD) = 0 > lseek(58, 80469144, SEEK_SET) = 80469144 > lseek(58, 0, SEEK_CUR) = 80469144 > read(58, "X-Mozilla-Status: 0001\nX-Mozilla"..., 4096) = 4096 > lseek(58, 0, SEEK_CUR) = 80473240 > lseek(58, 0, SEEK_CUR) = 80473240 > lseek(58, 0, SEEK_CUR) = 80473240 > lseek(58, 80469210, SEEK_SET) = 80469210 > write(58, " $label2", 8) = 8 > close(58) = 0 > > > Ok, so if I add a tag/label to a message in Inbox, > a write to Inbox happens! Yes, it should do that. Writes the tag to the .msf database but also to mbox. > Does it interfere with download operation? > Let me try. > > This time I used a different tag/label and so you see $label4 instead > of $label2 above. > > Sure a write takes place to Inbox. However, obviously this operation > opens a new file descriptor and is unlikely to cause a conflict with > the file pointer that is being referenced by the pop3 download code to > write a downloaded message into Inbox. > > What else can be writing in (or maybe reading from) Inbox in parallel to > pop3 download code? So you think something can write to other messages while new ones are being downloaded? parallel operations are possible in some cases but it looks scary if it isn't handled properly. Is it possible to check current position in the file and only seek to end if we aren't already at the end? > BTW, for moving message via filter rule, we certainly see the effect > of buffering, too. In this case, reading from Inbox uses 10KB of > buffering while writing to folder-F uses 4K buffering. > (We might want to use a large buffering. Correct?) I have a bug somewhere where I consolidate these different sizes. You need not worry about it here. Please let nothing distract you from the seeking bug here :)
> So you think something can write to other messages while new ones are being > downloaded? parallel operations are possible in some cases but it looks > scary if it isn't handled properly. > Is it possible to check current position in the file and only seek to end if we aren't already at the end? lseek to get the current write pointer cannot be a local operation in general on a remote filesystem as unless you have opened the file in exclusive write mode as another app may work and write on the same file. So this is probably already a costly remote filesystem operation, of the same order of magnitude than the seek itself.
(In reply to Eric Valette from comment #33) > > So you think something can write to other messages while new ones are being > > downloaded? parallel operations are possible in some cases but it looks > > scary if it isn't handled properly. > > > Is it possible to check current position in the file and only seek to end if we aren't already at the end? > > lseek to get the current write pointer cannot be a local operation in > general on a remote filesystem as unless you have opened the file in > exclusive write mode as another app may work and write on the same file. So > this is probably already a costly remote filesystem operation, of the same > order of magnitude than the seek itself. I agree about this. Trying to find the file's read/write position may be an expensive operation itself :-( If we have to do it for each line, that may negate the whole issue of using buffering ... OTOH, tThe whole point of having this explicit seek() looks insane to me (!) I have absolutely no idea why on earth this seek() operation is put in here, i.e., what other operations use the same file descriptor to seek to a different position (!?) This design is insane. Whatever other operations may want to do with Inbox, they need to open a separate descriptor and use *THAT* for the seek operation. However, mxr is not a good friend here. This particular code has no "blame" associated with it. Than, I recalled, once someone showed me an old CVS-based system in relation to an issue that looks so arcane to me regarding folder compaction code, and recollecting something called Bonsai was there, I searched. http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=nsPop3Sink.cpp&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=all&mindate=&maxdate=&cvsroot=%2Fcvsroot Voila. At least, I could tell some early issues regarding the appearance and disappearance of seek and use of eof() checking to reduce the seek operations by checking that we are indeed at the eof() so that at least seek() is not explicity invoked, etc. I have not read the historical threads in the above URL in detail, but may decide to put in the eof check (if that is available with today's stream) or simply remove seek like I did experimentally (since at least once, the check for EOF was put in as an assert(): well checking we are at EOF is expensive, so I don't if leaving the test is desirable *IF* we can prove that no other operations are moving the file pointer (!!!) I wonder if someone recalls something about this, but Bonsai URL suggests this happened many years ago.) If someone has the time to read the history over the weekend and come up with a suggestion based on recollection of what happened, etc., I will appreciate it. I will be back on Sunday evening... Too bad, the current MXR does not have this bonsai history in it. It is a prehistorical stuff, but I think TB code has many decifiencies traced back to this era. As far as I can tell, someone in the early 2000's complained about the slow down on NFS, and we are doing the same regarding CIFS mount in 2015 :-) TIA
(In reply to ISHIKAWA, Chiaki from comment #34) > OTOH, tThe whole point of having this explicit seek() looks insane to me (!) > I have absolutely no idea why on earth this seek() operation is put in here, > i.e., > what other operations use the same file descriptor to seek to a different > position (!?) > This design is insane. Whatever other operations may want to do with Inbox, > they need to open a separate descriptor and use *THAT* for the seek > operation. I totally agree. And instead of finding costly and anyway unsecure means (operation are not atomic so that after checking/seeking it may be false) to try to detect that someone else is not writing to the same file, from the same app; why not on earth use file locking via Advisory record locking implemented by fcntl? or even Open file description locks? >As far as I can tell, someone in the early 2000's complained about the slow down on NFS, and >we are doing the same regarding CIFS mount in 2015 :-) For me this is more recent than that. It worked correctly with same setup up to 15 or something like that...
To put it differently using "seek to end of file then write" is WRONG BY DESIGN as between the seek and the write the thread may be descheduled (as a result of interrupt/ time slicing qunatum) while another thread does a write (if this is allowed and this is totally wrong for me...) So I would remove it anyway. The locking can also be done login before a write but this may involve redesign.
(In reply to ISHIKAWA, Chiaki from comment #34) > OTOH, tThe whole point of having this explicit seek() looks insane to me (!) > I have absolutely no idea why on earth this seek() operation is put in here, > i.e., > what other operations use the same file descriptor to seek to a different > position (!?) > This design is insane. Whatever other operations may want to do with Inbox, > they need to open a separate descriptor and use *THAT* for the seek > operation. I doubt that will be possible. If you open one descriptor/file handle/stream for WRITING to a file you usually can't open another one for WRITING to the same file. At most it could be a READONLY stream. This locking is usually enforced by the OS. I'd hope the mozilla streams have these locks in place. I have an idea: can we prepare a separate buffer of some size (be it the 4K, 10K, whatever) with several lines of data? Then seek to EOF once and write that whole buffer? That could be much easier to do than hunt what writes to the same stream behind your back. And even if you could find the culprit, it may not be so easy to remove it. Do we want to block the whole UI (e.g. tagging) while POP3 download is in progress?
(In reply to :aceman from comment #37) > I doubt that will be possible. If you open one descriptor/file handle/stream > for WRITING to a file you usually can't open another one for WRITING to the > same file. At most it could be a READONLY stream. This locking is usually > enforced by the OS. I'd hope the mozilla streams have these locks in place. Will would two thread need to write to the same file at the same time? That's insane!!!! You have no guranty il will ever work. > > I have an idea: > can we prepare a separate buffer of some size (be it the 4K, 10K, whatever) > with several lines of data? Then seek to EOF once and write that whole > buffer? That could be much easier to do than hunt what writes to the same > stream behind your back. You will never have any warabnty you write at the right location. > And even if you could find the culprit, it may not > be so easy to remove it. Do we want to block the whole UI (e.g. tagging) > while POP3 download is in progress?
(In reply to Eric Valette from comment #38) > > I have an idea: > > can we prepare a separate buffer of some size (be it the 4K, 10K, whatever) > > with several lines of data? Then seek to EOF once and write that whole > > buffer? That could be much easier to do than hunt what writes to the same > > stream behind your back. > You will never have any warabnty you write at the right location. If we have warranty today that if we seek before each write of a line, the write will be at the right location, then I think there is the same warranty if we seek before write of each buffer. It is transparent to the stream that the buffer will now contain several lines of data. There will just be less seeks and solve the problem Chiaki is chasing.
(In reply to :aceman from comment #39) > (In reply to Eric Valette from comment #38) > > > I have an idea: > > > can we prepare a separate buffer of some size (be it the 4K, 10K, whatever) > > > with several lines of data? Then seek to EOF once and write that whole > > > buffer? That could be much easier to do than hunt what writes to the same > > > stream behind your back. > > You will never have any warabnty you write at the right location. > If we have warranty today that if we seek before each write of a line, the > write will be at the right location, You do not seem to understand how multithreading application work: you have no waranty. Suppose that between the seek and the write, your thread is descheduled and another does a write on the file, it will write where you where supposed to write and you will write after that.
Yes, but in that regard my proposal is no worse than current state. It can even be better, less seeks and subsequent writes so less descheduling windows that you describe.
(In reply to :aceman from comment #41) > Yes, but in that regard my proposal is no worse than current state. It can > even be better, less seeks and subsequent writes so less descheduling > windows that you describe. Seek is already bad as explained above and "less windows" will just make the root cause of corruption harder to address later.
1. I know longer remember where I saw bienvenu talking about seeks (probably on context of compact), but I'm sure bienvenu has deep understand of this area and surely understands any issues that were known as of the time he stopped being active. So if it would help, I could reach out to him with questions, if you have specific ones you can formulate. 2. bugs which mention lseek, seek or unbuffered - http://mzl.la/1HRwQYD (only 3 have only unbuffered) - where bienvenue was a participant
Multithreaded access to the actual mbox file is not something that exists today. Only IMAP, import, and LDAP run off-main-thread, and I'm pretty sure IMAP proxies back to the main thread to save messages while import tries to prevent the entire application from doing anything while running the import thread (which is broken on so many levels, but that's not important). (In reply to Eric Valette from comment #35) > I totally agree. And instead of finding costly and anyway unsecure means > (operation are not atomic so that after checking/seeking it may be false) > to try to detect that someone else is not writing to the same file, from the > same app; why not on earth use file locking via Advisory record locking > implemented by fcntl? or even Open file description locks? File locking on NFS has historically been completely and totally broken. It may still be. On the other hand, we are presently trying to discourage the practice of supporting other applications mucking about our profile directory while TB Is running...
(In reply to Joshua Cranmer [:jcranmer] from comment #44) > Multithreaded access to the actual mbox file is not something that exists > today. Only IMAP, import, and LDAP run off-main-thread, and I'm pretty sure > IMAP proxies back to the main thread to save messages while import tries to > prevent the entire application from doing anything while running the import > thread (which is broken on so many levels, but that's not important). So whats the use of the seek and what is it supposed to protect from? I just say it is costly for networked operation and does no protect you if anything is running really in // (whether a thread or a process). I have no clue of what thunderbird threading architecture is but trying to protect the target file via seek and write is broken if anyone does anything in parallel on the same file and if nothing is done really in // on the target file, then I do not see what the purpose could be. > File locking on NFS has historically been completely and totally broken. It > may still be. On the other hand, we are presently trying to discourage the > practice of supporting other applications mucking about our profile > directory while TB Is running... Current code on the other hand is totally broken for my use case (imap for input storage on filer via CIFS). I lost data and trash my local folder each time I transfer Two 5Mb message in a single move. All other transfers including several GB tranfers from the same machine on the same destination is rather fast and never lose data.
Do not forget that the problems I see are specific to Linux IO implementation in my case. See comment#11
(In reply to Eric Valette from comment #46) > Do not forget that the problems I see are specific to Linux IO > implementation in my case. See comment#11 I will see if I can solve the issue related to CIFS implementation under linux. Just yesterday, I finally got an interesting error from low-level Read() during the network-outage of remote CIFS-mount operation agains a router that can make a FAT32-formatted USB stick in it accessible as remote CIFS mount. (It turns out the firmware seems to be based on linux 2.26.x series.) I could mount it by adding sec=ntlmv2 to mount.cifs command. As a result of the Read() error noticed after the longish time out occurs, I have added the following code below and if errno turned out to EINTR and friends, I may want to add repeat code to see if it improves the situation. diff --git a/netwerk/base/nsFileStreams.cpp b/netwerk/base/nsFileStreams.cpp --- a/netwerk/base/nsFileStreams.cpp +++ b/netwerk/base/nsFileStreams.cpp @@ -484,16 +484,28 @@ nsFileInputStream::Close() } return rv; } NS_IMETHODIMP nsFileInputStream::Read(char* aBuf, uint32_t aCount, uint32_t* _retval) { nsresult rv = nsFileStreamBase::Read(aBuf, aCount, _retval); + // + // When a error of remote CIFS mount for one's Mail directory is simulated + // the Read above failed. +#if 1 + int myerrno; + myerrno = errno; // we want to save this so error I/O does not change this. + if (NS_FAILED(rv)) { + fprintf(stderr,"nsFileStreamBase::Read failed errno=%d, *_retval\n", + myerrno, *_retval); + } +#endif + NS_ENSURE_SUCCESS(rv, rv); // Check if we're at the end of file and need to close if (mBehaviorFlags & CLOSE_ON_EOF && *_retval == 0) { Close(); } return NS_OK; I will eventually make "int myerrno" into a global variable so that it can be referenced in TB code safely. TIA
Re "deleting seek" issue, I would like to thank everybody for the valuable feedback. After a brief analysis, I am tempted for removing "seek" completely. See my reasoning below. Before doing so, has anyone noticed that, during download, TB is not responsive in terms of UI operation? To see if the manual marking of messages during download may mess with download code's file position, I tried to do exactly that but could not do so very well because UI doesn't respond quickly enough. I could never mark a message while the downloading is in operation. The download code seems to hog CPU. (I even inserted sched_yield() to be called after every 300KB write or so in the busy loop, but to no avail.) [Or could it be there is a lock that prevents me from doing this(?). But I did not see any dialog or something like that. BTW, I usually wait for the download without doing anything and thus I have never realized that TB is so unresponsive during downloading. Maybe this unresponsive is due to the original design.] Second: regarding the opening of File descriptor to a same file, I think for Inbox, TB uses "rw" flag and so someone can open it also. But such second simultaneous open may never happen: see my hypothesis and tentative analysis below. The following is the gist of my finding. After a brief analysis [which ends up as a long writing below.], which still needs to be completed, I am tempted to remove the seek code completely. Tentative conclusion: Filtering can not mess with |m_outFileStream| since it is executed after the file stream associated with it is closed. So I can at least rule out the possibility of "someone else" being the filter code. Compact uses a high-level locking and doesn't mess with Pop3 download. Great. I *think* manual marking(tag/label) of messages can be safely ruled from the analysis in item 3 below since it seems to create a file stream on its own. (Please correct my misunderstanding if there is a hole in my thinking there. I am confused about the changes of database and folder file itself.) This leaves us deleting/moving of messages, RENAME, as as potential actions that can move the file pointer position behind the back of pop3 download code. If anyone has a comment about possibly missing actions in the item-1 below, I will appreciate it. See below for more details. TIA CI Note: in the following, I am worried about the manual marking (tag/label) may interfere with the filter-induced marking. Well, come to think of it, filter-induced marking can only refer to the message just being downloaded, and manual marking would refer to only EXISTING messages and so the two never interferes with each other, I think. ======= HYPOTHESIS: For all possible actions on "Inbox", the operation opens a different/separate file descriptor (rw) to Inbox thus does not interfere with the file stream, |m_outFileStrtream| used by pop3 download code. COROLLARY: If so, then we can safely assume that POP3 "download" code does not have to seek (|m_outFileStream|) to the end before writing to Inbox within nsresult nsPop3Sink::WriteLineToMailbox(const nsACString& buffer) and thus we can increase the pop3 download throughput. That is, if no one else moves the file pointer associated with file descriptor, |m_outFileStream|, there is no need for // seek to the end in case someone else has seeked elsewhere in our stream. Note: in the case of downloading by ways of temporary file, which a user can specify by user preference, I don't think we can interfere with the download code from outside at all until the download finishes and the message is appended to Inbox. So we can ignore this operational mode. Proving the hypothesis in the current code base So I set out to answer the following questions to prove the hypothesis. CAVEAT: this analysis is for pop3 (local download) and not for imap. (I have no idea if imap code has an issue like this extra seek which degrades the download throughput considerably due to invalidated buffering.) Item 1: Possible operations to Inbox What can we do to Inbox? We need to enumerate them. (What *might* change file descriptor used by POP3 code.) List of Actions: Append ... pop3: this is download code in question itself. * DELETE a message (or multiple messages) ? COPY a message (or multiple messages) TO other folder. This does not modify Inbox, so I doubt if this could be the culprit. * COPY (FROM other folder: modification to Inbox) ... I think copy in this sense is like Append above. * MOVE (In a sense delete: copy + delete) MARK existing message manually - Add tag (label?) to existing message manually - Remove tag - Change tag ... I think these are all handled in a similar code sequence. COMPACT ... this is done with clearly a locking although the locking contention is not reported visibly. (It does not have to be modal notification. It can be a simply a message in the debug console or transient message that disappears after a while automatically...) RENAME (well, nobody wants to rename Inbox, I think, and it won't involve a file descriptor downloading uses, I think.) WHAT ELSE? <=== I appreciate if I have forgotten something here. Modification By Filter: Note: the above delete, copy, move, add tag, remove tag, etc. can be invoked by user-defined filter IMMEDIATELY after a message is downloaded and safely stored in Inbox, I think. [CHECKED: I was not sure if the filter is invoked for each message or after the series of messages are stored into Inbox. I found that filter is run in |incorporateComplete()|, and this is done for each message. As found below, filter operation is done when |m_outFileStream| is closed and so filter won't mess with the file position used by |WriteLineToMailbox|. Great. I have eliminated one possible cause of background change to the file position. BTW, to the uninitiated including myself, the following log printed after I hit "Get Message" button shows the order of operations of some important functions. Begin mail message delivery. GetDiskSpaceAvailable returned: 6650036224 bytes Incorporate message begin: uidl string: 00004cad533eeb2c GetDiskSpaceAvailable returned: 6646890496 bytes Incorporate message complete. Incorporate message begin: uidl string: 00004cae533eeb2c GetDiskSpaceAvailable returned: 6646890496 bytes Incorporate message complete. Incorporate message begin: uidl string: 00004caf533eeb2c GetDiskSpaceAvailable returned: 6646890496 bytes Incorporate message complete. Incorporate message begin: uidl string: 00004cb0533eeb2c GetDiskSpaceAvailable returned: 6646726656 bytes Incorporate message complete. Incorporate message begin: uidl string: 00004cb1533eeb2c Incorporate message complete. End mail message delivery. off-topic question/observation: I am disturbed to find that the extra message I printed for GetDiskSpaceAvailable "GetDiskSpaceAvailable returned: 6646726656 bytes" is not visible for some "Incorporate message begin:", "Incorporate message complete" pairs. Maybe there is a path that forgets to check the disk space? TB code leaves much room for improvement. Whenever I look into something, I notice another potential bug :-( Item 2: Check if the operations to Inbox, enumerated in item 1, can interfere the use of |m_outFileStream| by download code Well, I do it in a reverse manner. I study the life cycle of the file descriptor used by download code, specifically |m_outFileStream| used by |WriteLineToMailbox|, and see if other operations can interfere with it. 2-tentative conclusion from the life-time analysis. Filtering can not mess with |m_outFileStream| since it is executed after the file stream is closed. So I can at least rule out the possibility of "someone else" being the filter code. This leaves us manual marking(tag/label) of messages, deleting/moving of messages as possible culprit. 2-a: Creation of m_outFileStream: |m_outFileStream| is initialized in nsPop3Sink::IncorporateBegin(const char* uidlString, if download is to be temporary file rv = MsgGetFileStream(m_tmpDownloadFile, getter_AddRefs(m_outFileStream)); CI's comment: Another TODO/FIXME: we may want to create a buffered stream here. or rv = m_msgStore->GetNewMsgOutputStream(m_folder, getter_AddRefs(newHdr), &reusable, getter_AddRefs(m_outFileStream)); filter is disabled if (m_uidlDownload) if (m_uidlDownload) m_newMailParser->DisableFilters(); (But I am not sure what it means. Filtering *IS* done in |IncorporateComplete()|. 2-b Close() of m_outFileStream: |m_outFileStream| is closed in nsPop3Sink::EndMailDelivery(nsIPop3Protocol *protocol) if it is not closed earlier in the error processing path. if (m_outFileStream) { nsresult rv3 = m_outFileStream->Close(); m_outFileStream = 0; NS_WARN_IF_FALSE(NS_SUCCEEDED(rv3), "m_outFileStream->Close() failed;"); } As I read the code, I noticed that folder lock is released in nsPop3Sink::EndMailDelivery(nsIPop3Protocol *protocol) PR_LOG(POP3LOGMODULE, PR_LOG_MAX, ("Calling ReleaseFolderLock from EndMailDelivery")); nsresult rv = ReleaseFolderLock(); NS_ASSERTION(NS_SUCCEEDED(rv),"folder lock not released successfully"); (Oh well, |BeginMailDeliver| locks this folder. Hmm, may be we can ask all the operations on Inbox to honor the lock while? [At least compactifying code honors this. See discussion on COMPACT in item 3.] If so, we don't need the offending seek code. But, even if the lock is observed, which seems to be the case by compact code at least, I need to check if the low-level file descriptor is not used outside download code by duplication, etc., just in case.) Filter is then executed (AFTER the closing of m_outFileStream) in nsresult nsPop3Sink::EndMailDelivery(nsIPop3Protocol *protocol) bool filtersRun; m_folder->CallFilterPlugins(nullptr, &filtersRun); // ??? do we need msgWindow? int32_t numNewMessagesInFolder; // if filters have marked msgs read or deleted, the num new messages count (The following part is a little obscure to me: anyway, the file descriptor |m_outFileStream| used by the download code is closed already by this time.) filter is also then executed again toward the end of EndMailDelivery: // check if the folder open in this window is not the current folder, and if it has new // message, in which case we need to try to run the filter plugin. if (m_newMailParser) ... ... if (openFolder && openFolder != m_folder) { // only call filter plugins if folder is a local folder, because only // local folders get messages filtered into them synchronously by pop3. nsCOMPtr<nsIMsgLocalMailFolder> localFolder = do_QueryInterface(openFolder); if (localFolder) { bool hasNew, isLocked; (void) openFolder->GetHasNewMessages(&hasNew); if (hasNew) { // if the open folder is locked, we shouldn't run the spam filters // on it because someone is using the folder. see 218433. // Ideally, the filter plugin code would try to grab the folder lock // and hold onto it until done, but that's more difficult and I think // this will actually fix the problem. Another Close case: NS_IMETHODIMP nsPop3Sink::IncorporateComplete(nsIMsgWindow *aMsgWindow, int32_t aSize) This calls m_outFileStream->Flush() if the file is downloaded to a temporary file m_outFileStream is closed: if (m_downloadingToTempFile) { // close file to give virus checkers a chance to do their thing... nsresult rv3 = m_outFileStream->Close(); ... nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream); rv = MsgReopenFileStream(m_tmpDownloadFile, inboxInputStream); m_outFileStream is closed nsresult rv3 = m_outFileStream->Close(); Within IncorporateComplete(): When the download is done by way of a temporary file, the content is eventually copied by the following code: destination is m_folder. I checked that the stream to m_folder is opened within AppendMsgFromStream() afresh. rv = m_newMailParser->AppendMsgFromStream(inboxInputStream, hdr, msgSize, m_folder); Additional filter rule is called then in the final part of nsPop3Sink::IncorporateComplete(nsIMsgWindow *aMsgWindow, int32_t aSize) But by then, m_outFileStream has been closed already. // run any reply/forward filter after we've finished with the // temp quarantine file, and/or moved the message to another folder. m_newMailParser->ApplyForwardAndReplyFilter(aMsgWindow); So reading the above, I can already rule out filtering as possible action that moves the pointer position of |m_outFileStream| since this file stream is already closed when filtering is invoked. Item 3: Can other operations mess with m_outFileStream? Can the following operations mess with m_outFileStream? The life cycle analysis is moot on the following operations although I can at least rule out filtering-induced operations as the culprit of moving file position of |m_outFileStream|. So now I have to look in the code that implements the following operations to see if they *may* have a chance to interfere. COMPACT: No need to worry I think based on the following observation. It uses semaphore to lock the folder when compactifying proceeds. NS_IMETHODIMP nsFolderCompactState::Compact(nsIMsgFolder *folder, bool aOfflineStore, ... calls rv = localFolder->ParseFolder(m_window, this); and this subsequently calls, NS_IMETHODIMP nsMsgBrkMBoxStore::RebuildIndex(nsIMsgFolder *aFolder, which eventually invokes the following locking code: |BeginMailDeliver| also invokes. So these are mutually exclusive. 806 bool isLocked; 807 aFolder->GetLocked(&isLocked); 808 if (isLocked) 809 { 810 NS_ASSERTION(false, "Could not get folder lock"); 811 return NS_MSG_FOLDER_BUSY; 812 } cf. BeginMailDelivery also acquires a semaphore(?) I wonder what it is for. Oh I see, actually GetLocked() seems to only returns if the semaphore is obtained by someone else, and real locking is done by semaphore beneath. Hmm... so GetLocked() can have a race issue. Code snippet from BeginMailDelivery: nsCOMPtr <nsISupports> supports = do_QueryInterface(static_cast<nsIPop3Sink*>(this)); m_folder->GetLocked(&isLocked); if(!isLocked) { PR_LOG(POP3LOGMODULE, PR_LOG_MAX, ("BeginMailDelivery acquiring semaphore")); m_folder->AcquireSemaphore(supports); } MOVE still to be investigated. COPY to still to be investigated. COPY from DELETE still to be investigated. (manual) MARK No need to worry about. It seems to open its file stream. it seems that we need to update message database, message store (Inbox) in the case of Berkeley mbox style file (Inbox), and there seems to be a cache of headers (?) also in the memory. We are concerned with the update of Inbox (a folder) basically. But following the code often ends up with the changes to DB and cache. Very confusing. Anyway, I *think* marking eventually ends up in calling NS_IMETHODIMP nsMsgBrkMBoxStore::ChangeKeywords(nsIArray *aHdrArray, const nsACString &aKeywords, bool aAdd) http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgBrkMBoxStore.cpp#920 which seems top open its own stream: 935 rv = GetOutputStream(aHdrArray, outputStream, seekableStream, 936 restoreStreamPos); and then closes it. nsMsgBrkMBoxStore::GetOutputStream(nsIArray *aHdrArray, http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgBrkMBoxStore.cpp#828 calls either m_outputStreams.Get(URI, getter_AddRefs(outputStream))) or if fails to leave a usable stream in |outputStream|, it calls MsgGetFileStream(mboxFile, getter_AddRefs(outputStream)); to create a stream. (The latter will eventually call 48 nsresult nsMsgFileStream::InitWithFile(nsIFile *file) 49 { 50 return file->OpenNSPRFileDesc(PR_RDWR|PR_CREATE_FILE, 0664, &mFileDesc); 51 } (Note PR_RDWR|PR_CREATE_FILE flag. It is not exclusive write.) In either case, a new stream is created, thus it is unlikely that |m_outFileStream| operation is messed up with marking a message. I am a little disturbed, though, that manual marking may end up interfering with filter-induced marking since there does not seem to be a high-level lock operation on for manual marking. (Filtering code is invoked inside BeginMailDelivery and EndMailDelivery and thus the folder is locked during the operation. If manual marking observes the locking, the chance of interference is nil.) It is possible that I missed it during my browsing of the code. I have not finished the analysis completely, I think I have seen semaphore being mentioned for copy code and see AddKeyWordsMessages() in the code somehow. I wonder why. e.g. |AddKeywordsMessages()| is called within |OnCopyCompleted()| http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#1457 I am very unfamiliar with how copy/move/delete, etc. is invoked from UI, or for that matter how marking operation is invoked, also. RENAME other To be filled. [EOF] TIA
What about creating a temporary patch that will show a warning in the error console if the file pointer isn't at the end of the file at the place we call the seek? This patch would be checked in in nightlies. We would still seek and with a Tell() before it so it would be slow, but we may detect situations when the seek is needed. And this would only be for dev versions on TB. Make the error message contain request to report into this bug :) After som time we may know more about why the seek is there and can better decide whether to remove it.
(In reply to :aceman from comment #49) > What about creating a temporary patch that will show a warning in the error > console if the file pointer isn't at the end of the file at the place we > call the seek? This patch would be checked in in nightlies. We would still > seek and with a Tell() before it so it would be slow, but we may detect > situations when the seek is needed. And this would only be for dev versions > on TB. Make the error message contain request to report into this bug :) > After som time we may know more about why the seek is there and can better > decide whether to remove it. Great idea. Actually I have a local patch that include code does just that (#ifdef/endif#). If someone hits the problem of file position changed behind our back, we can be sure that the seek() is necessary :-( I will post a patch this evening (JST: it is in the morning now.) TIA
Actually, I had a little time to go out and so finished the patch and tested and ran |make mozmill|. What do you think?
Attachment #8597567 - Flags: review?(neil)
Attachment #8597567 - Flags: feedback?(acelists)
(In reply to ISHIKAWA, Chiaki from comment #51) > Created attachment 8597567 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. > I am afraid that I used NS_ERROR() and NS_WARNING(). They do not show up in error console, do they? I wonder what functions I should use from C++ side to print a message into error console. I suspect that I need to invoke JavaScript function via XPCOM. But I am not exactly sure how it is done TIA
Comment on attachment 8597567 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. >+#if 1 >+ // extra check: record the pre-seek postion >+ int64_t before_seek_pos; >+ nsresult rv1 = seekableOutStream->Tell(&before_seek_pos); >+ if (NS_FAILED(rv1)) >+ { >+ NS_ERROR("seekableOutStream->Tell(&before_seek_pos); failed."); >+ } >+#endif >+ >+ // xxx actually we ought to check the error return value. > seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); >+ >+#if 1 These needs to be #ifdef DEBUG >+ if ( before_seek_pos != after_seek_pos ) { >+ // Use NS_ERROR so that we can capture stack trace. >+ NS_WARNING("! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! "); >+ NS_WARNING("Call for Help"); >+ NS_WARNING("! ! ! ! ! Unexpected seek filepos change. ! ! ! ! !"); >+ NS_WARNING("Please report that you saw this message to the following bugzilla."); >+ NS_WARNING("https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); >+ NS_WARNING("! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! "); >+ NS_ERROR("(debug): WANT to find out where this occurs. before_seek_pos != after_seek_pos"); >+ >+ } This should probably replaced by NS_ASSERTION(before_seek_pos == after_seek_pos, "Stream position changed unexpectedly");
Attachment #8597567 - Flags: review?(neil) → feedback+
I don't think we want to use NS_ERROR or NS_ASSERTION as that will crash the process. Nobody would use such a build on real emails:) Ideally we want to show the message in the Error console. Neil, do you know how to call .reportError or similar from C++ code? Maybe like this: nsCOMPtr<nsIConsoleService> console(do_GetService(NS_CONSOLESERVICE_CONTRACTID)); if (console) { nsAutoString filename; nsCOMPtr<nsIScriptError> error(do_CreateInstance(NS_SCRIPTERROR_CONTRACTID)); error->Init(NS_LITERAL_STRING("Our message"), filename, EmptyString(), lineno, 0, nsIScriptError::warningFlag, "chrome javascript"); console->LogMessage(error); }
Comment on attachment 8597567 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. Review of attachment 8597567 [details] [diff] [review]: ----------------------------------------------------------------- Yes, this is exactly the message I had in mind :) Just make it shown in the error console so that our testers on nightly channels can see it in normal usage.
Attachment #8597567 - Flags: feedback?(acelists) → feedback+
There are different views on more extended use of error console from C++ code. I am certainly in the camp that important errors should show up there. I did a patch https://bugzilla.mozilla.org/attachment.cgi?id=8563884&action=diff that added a C++ interface to that, but it did not get landed because of controversy over the extended use of error console. But the code there for MsgLogToConsole is the correct way to add error console logging to C++ code.
(In reply to Kent James (:rkent) from comment #56) > There are different views on more extended use of error console from C++ > code. I am certainly in the camp that important errors should show up there. > > I did a patch > https://bugzilla.mozilla.org/attachment.cgi?id=8563884&action=diff that > added a C++ interface to that, but it did not get landed because of > controversy over the extended use of error console. But the code there for > MsgLogToConsole is the correct way to add error console logging to C++ code. Thank you for the pointer. I will shorten the message in something like a line or two, and try to put it into the error console. I think error console may be split into two for better usage. One is used for ERROR console as in today. The other one is used for showing informational logging continuously. It is hard to decide what message goes into which console window, but I think the type of ephemeral message we see when messages come in can be shown in the continous log window. [I assume that window is shown temporarily as new messages are written or at certain time interval.] But I digress. I will report back when I succeed in writing into error console. But isn't this code framework a little awkward? We create primitives in C++ so that that can be used by JS-side to create error console and API to write into them. Now when we want to write into error console, we need to go through hoops. It would be nice to have a C++ API that takes an ASCII string (or UTF8 string) and executes it as JS code snippet in TB's environment :-) [Like the top-level |eval| function in a Lisp environment.] TIA
"But isn't this code framework a little awkward? We create primitives in C++ so that that can be used by JS-side to create error console and API to write into them. Now when we want to write into error console, we need to go through hoops. It would be nice to have a C++ API that takes an ASCII string (or UTF8 string) and executes it as JS code snippet in TB's environment :-)" I don't understand this comment. XPCOM is the mechanism by which you can execute js in C++ and C++ in js, and that is how the snipped does that.
Knowing it may still be needed is one thing, removing the need for it should be the target as it slows down anyway and just shows concurrent threads do mess with the same file concurrently.
(In reply to Kent James (:rkent) from comment #58) > "But isn't this code framework a little awkward? We create primitives in C++ > so that that can be used by JS-side to create error console and API to write > into them. Now when we want to write into error console, we need to > go through hoops. It would be nice to have a C++ API that takes an ASCII > string (or UTF8 string) and executes it as JS code snippet in TB's > environment :-)" > > I don't understand this comment. XPCOM is the mechanism by which you can > execute js in C++ and C++ in js, and that is how the snipped does that. I fully understand. It is just that when we simply want a fixed JavaScript statements to be executed (without needing to reference a variable that is external of the fixed JavaSCript statements at all) a simple hypothetical function that takes a string and execute it as JS would be much easier to handle and concise: rv = EvalJS("dump('unexpected filepos change. Please report this to bugzilla 1116055');") rv = EvalJS ("Components.utils.reportError(' ibid ');"); or both. rv = EvalJS ("{ let mess='unexpected filepos change. Please report this to bugzilla 1116055'; dump (mess); Components.utils.reportError(mess); }"); This would be much more concise than invoking a dozen or so C++ source lines to simply print a line in Error Console and improves readability and maintainability IMHO, and can be used for many other things easily, but I digress. TIA
This is to be put in Nightly (not the final release) to check whether |Seek()| is necessary to fix strange unexpected file position change. Logically, and my local empirical testing produced no mismatch (but note that after refreshing the C-C tree the day before, I have to ADD one |Seek()| to the end in |IncorporateBegin| (I am not sure if this is new, or my mixup caused by erratic merge and recreation of hg patches.) Anyway the current patch fixes the issue and using rkent's logging code I can see the sanity check output very nicely! TIA
Attachment #8597567 - Attachment is obsolete: true
Attachment #8600042 - Flags: review?(neil)
Attachment #8600042 - Flags: feedback?(acelists)
Comment on attachment 8600042 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 2. Uses rkent's logging code to console) Review of attachment 8600042 [details] [diff] [review]: ----------------------------------------------------------------- Quite a lot of code for what we want. And so many ifdefs? ::: mailnews/local/src/nsPop3Sink.cpp @@ +52,5 @@ > +// log an error string to the error console > +// Code based on rkent's unlanded patch: > +// https://bugzilla.mozilla.org/attachment.cgi?id=8563884&action=diff > +// > +NS_MSG_BASE void MsgLogToConsole(const nsAString &aErrorText, You could put this in nsMsgUtils.cpp or similar file. @@ +109,5 @@ > +NS_MSG_BASE > +void MsgLogToConsole(const char* aText, const nsAString &aFilename, uint32_t aLine); > +// nsAString > +NS_MSG_BASE > +void MsgLogToConsole(const nsAString& aText, const nsAString &aFilename, uint32_t aLine); Why this duplicate declaration? @@ +787,3 @@ > // seek to the end in case someone else has seeked elsewhere in our stream. > nsCOMPtr <nsISeekableStream> seekableOutStream = do_QueryInterface(m_outFileStream); > +#if 1 What are these "#if 1" blocks? @@ +827,5 @@ > + { > + static int warned_once = 0; > + if (!warned_once) { > + warned_once = 1; > + MSG_LOG_TO_CONSOLE("(info) We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); What Seek() did we disable? This patch should not disable anything, just report when the existing seek actually moved the file pointer.
Attachment #8600042 - Flags: feedback?(acelists)
Attachment #8600042 - Flags: feedback?(acelists)
Blocks: 1121842
(In reply to :aceman from comment #62) > Comment on attachment 8600042 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. (Take 2. Uses rkent's logging code to console) > > Review of attachment 8600042 [details] [diff] [review]: > ----------------------------------------------------------------- > > Quite a lot of code for what we want. And so many ifdefs? > > ::: mailnews/local/src/nsPop3Sink.cpp > @@ +52,5 @@ > > +// log an error string to the error console > > +// Code based on rkent's unlanded patch: > > +// https://bugzilla.mozilla.org/attachment.cgi?id=8563884&action=diff > > +// > > +NS_MSG_BASE void MsgLogToConsole(const nsAString &aErrorText, > > You could put this in nsMsgUtils.cpp or similar file. Yes, I could, but given rkent did not get this landed, I think this patch may be short-lived: to test if a situation exists where |Seek()| is necessary inside |WriteLineToMailbox()|. But nsMsgUtils.cpp might be a good idea. > > @@ +109,5 @@ > > +NS_MSG_BASE > > +void MsgLogToConsole(const char* aText, const nsAString &aFilename, uint32_t aLine); > > +// nsAString > > +NS_MSG_BASE > > +void MsgLogToConsole(const nsAString& aText, const nsAString &aFilename, uint32_t aLine); > > Why this duplicate declaration? Well, actually, if I put the function for logging to Error Console, the above duped declarations need to go a corresponding .h file and this file needs to include that header file. This time, I forgot to remove these. Maybe I will #if 0/,#ndif the functions. > @@ +787,3 @@ > > // seek to the end in case someone else has seeked elsewhere in our stream. > > nsCOMPtr <nsISeekableStream> seekableOutStream = do_QueryInterface(m_outFileStream); > > +#if 1 > > What are these "#if 1" blocks? > There are two blocks that start with #if 1. I wanted to make it easy to enable/disable the checking of file pointer position before and after |Seek()|. Well instead of "#if 1", I should probably say #if !defined(DISABLE_SEEK_POS_CHECK) in both places so that I can disable the checking by #define DISABLE_SEEK_POS_CHECK in one shot. > @@ +827,5 @@ > > + { > > + static int warned_once = 0; > > + if (!warned_once) { > > + warned_once = 1; > > + MSG_LOG_TO_CONSOLE("(info) We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > What Seek() did we disable? This patch should not disable anything, just > report when the existing seek actually moved the file pointer. This part is inside /* #define DISABLE_SEEK */ #if ! defined(DISABLE_SEEK) ... #if 1 ... #endif Seek() <--- This is the Seek() to be disabled. ... #if 1 ... #endif #else | <==== The code snippet is here. Executed when DISABLE_SEEK is not defined. #endif So unless #define DISABLE_SEEK is done, the code does not disable |Seek()| in question. quetion.| TIA
This patch should be put into nightly during Beta period to see if there are combinations of user preferences, and other factors which *may* necessitates the strange |Seek()| for writing a single line to a local store during POP3 download. The code is a little longer than the single block of code that is in comment 54 by aceman. This is because I adopted a rather complete code by rkent that did not land before. I tweaked it further so that I can have different classes of messages: Error, Warning and Inforamtion. They are clearly distinguished in Error Console, and this patch has to be a good citizen to go with the classification. I addressed all the issues raised by aceman for the previous version of the patch (except for the length. Sorry it is long for the reason cited above.) All the #ifdefs are for easy switching of compilation mode during local testing. I am testing with/without the offending |Seek()| by modifying macro, etc. Also, surrounding the code block with #if,#endif makes it easy to identify which code should be eliminated easily. I will upload a screen to show that the distinction of error, warning, and information. I am using the logging code to print some other information regarding enabling buffering, etc. for "informational" purposes, too in other parts of nsPop3Sink.cpp (for fixing error handling, etc.) TIA
Attachment #8600042 - Attachment is obsolete: true
Attachment #8600042 - Flags: review?(neil)
Attachment #8600042 - Flags: feedback?(acelists)
Attachment #8600368 - Flags: review?(neil)
Attachment #8600368 - Flags: feedback?(acelists)
This is the screenshot of error console. An info message is printed by the patched code. We should think of using Error Console to communicate important information to the users. (In this sense, Error Console is a misnomer. Information Console or Information Window is more like it.) TIA
(In reply to aceman from comment #54) > I don't think we want to use NS_ERROR or NS_ASSERTION as that will crash the > process. You're thinking of MOZ_ASSERT. NS_ASSERTION will turn tests orange in debug builds, so that would have been useful for a try run too see whether there are any obvious issues.
Comment on attachment 8600368 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 3. Uses rkent's logging code to console) modified based on aceman's comment >+ if (!scriptError) { >+#if DEBUG >+ fprintf(stderr,"Failed to create scriptError.%s:%d\n", __FILE__, __LINE__); >+#endif >+ return; >+ } I think we already have something like this, something like if (NS_WARN_IF(!scriptError)) return; >+/** >+ Helper methods and macro for reporting an error to the console >+ */ I wouldn't add helper methods, I'd just do it all in the macros. >+// macro with filename and line number >+#define MSG_LOG_TO_CONSOLE(_text) MsgLogToConsole(_text, NS_LITERAL_STRING(__FILE__), __LINE__) >+#define MSG_LOG_INFO_TO_CONSOLE(_text) MsgLogToConsole4(_text, NS_LITERAL_STRING(__FILE__), __LINE__, nsIScriptError::infoFlag) >+#define MSG_LOG_WARN_TO_CONSOLE(_text) MsgLogToConsole4(_text, NS_LITERAL_STRING(__FILE__), __LINE__, nsIScriptError::warningFlag) You should use NS_LITERAL_STRING(_text) here. Also I'm not convinced about these names. How about this scheme: #define MSG_LOG_TO_CONSOLE(_text, _flag) MsgLogToConsole(NS_LITERAL_STRING(_text), NS_LITERAL_STRING(__FILE__), __LINE__, _flag) #define MSG_LOG_ERR_TO_CONSOLE(_text) MSG_LOG_TO_CONSOLE(_text, nsIScriptError::infoFlag) etc. >+#if DEBUG >+ int64_t first_pre_seek_pos; >+ nsresult rv3 = seekableOutStream->Tell(&first_pre_seek_pos); >+#endif >+ >+ // xxx handle error: such as network error for remote file system >+ seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); >+ >+#if DEBUG >+ int64_t first_post_seek_pos; >+ nsresult rv4 = seekableOutStream->Tell(&first_post_seek_pos); >+ if (NS_SUCCEEDED(rv3) && NS_SUCCEEDED(rv4)) { >+ fflush(stdout); >+ fprintf(stderr,"(debug) first_pre_seek_pos = 0x%08lx, first_pos_seek_pos=0x%08lx\n", >+ first_pre_seek_pos, first_post_seek_pos); >+ } >+#endif [Check whether the position changed?] >+#if ! defined(DISABLE_SEEK) #ifndef? >+ // DISABLING SEEK! Actually this is the bit that isn't disabling seek! >+ if (NS_FAILED(rv2)) >+ { >+ NS_ERROR("seekableOutStream->Tell(&before_seek_pos); failed."); >+ } Since NS_ERROR does an assertion anyway, might as well use NS_ASSERTION directly. >+ if( before_seek_pos != after_seek_pos ) { >+ since_last_seek_report = 0; >+ /* This merits error message status. */ >+ MSG_LOG_TO_CONSOLE("!!! HELP ME !!! Unexpected filepos change! Report this error to https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); If you want people to report, a more useful message might be in order, such as "WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1116055." >+#if DEBUG >+ fprintf(stderr,"before_seek_pos=0x%08lx, after_seek_pos=0x%08lx\n", >+ before_seek_pos, after_seek_pos); (You didn't fflush stdout this time?) >+ } else if (since_last_seek_report >= HOWOFTEN) { >+ since_last_seek_report = 0; >+ MSG_LOG_INFO_TO_CONSOLE("Periodic check in WriteLineToMailbox() during BETA testing Please ignore."); This could probably be improved too, such as HOWOFTEN " unnecessary seeks detected by bug 1116055 in WriteLineToMailbox." (this assumes that you initialised since_last_seek_report to 0 though) >+#else >+ /* this is defined(DISABLE_SEEK) branch */ (Put the comment on the #else line?) >+ static int warned_once = 0; >+ if (!warned_once) { >+ warned_once = 1; bool?
Attachment #8600368 - Flags: review?(neil) → feedback+
Hi, I uploaded a new patch based on Neil's comment. One great discovery. The additional |Seek()| I needed to perform is necessary only when I enable buffering (not in this patch, but other local tests). This explains why I have seen the |Seek()| was indeed necessary once per message. Basically, enabling buffering seems to reset the file's seek position to the beginning. Anyway, here are responses to Neils comment. (In reply to neil@parkwaycc.co.uk from comment #67) > Comment on attachment 8600368 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. (Take 3. Uses rkent's logging code to console) modified > based on aceman's comment > > >+ if (!scriptError) { > >+#if DEBUG > >+ fprintf(stderr,"Failed to create scriptError.%s:%d\n", __FILE__, __LINE__); > >+#endif > >+ return; > >+ } > I think we already have something like this, something like > if (NS_WARN_IF(!scriptError)) > return; Yes, you are right. Fixed here in another place just below. > >+/** > >+ Helper methods and macro for reporting an error to the console > >+ */ > I wouldn't add helper methods, I'd just do it all in the macros. I cleaned up the usage of the function and left only a function with "4" at the end of the name: this is to make the function name different from the original rkent version to avoid confusion. > >+// macro with filename and line number > >+#define MSG_LOG_TO_CONSOLE(_text) MsgLogToConsole(_text, NS_LITERAL_STRING(__FILE__), __LINE__) > >+#define MSG_LOG_INFO_TO_CONSOLE(_text) MsgLogToConsole4(_text, NS_LITERAL_STRING(__FILE__), __LINE__, nsIScriptError::infoFlag) > >+#define MSG_LOG_WARN_TO_CONSOLE(_text) MsgLogToConsole4(_text, NS_LITERAL_STRING(__FILE__), __LINE__, nsIScriptError::warningFlag) > You should use NS_LITERAL_STRING(_text) here. > Also I'm not convinced about these names. How about this scheme: > #define MSG_LOG_TO_CONSOLE(_text, _flag) > MsgLogToConsole(NS_LITERAL_STRING(_text), NS_LITERAL_STRING(__FILE__), > __LINE__, _flag) > #define MSG_LOG_ERR_TO_CONSOLE(_text) MSG_LOG_TO_CONSOLE(_text, > nsIScriptError::infoFlag) > etc. Thank you for the suggestion. I changed the macros accordingly and they look more organized. > >+#if DEBUG > >+ int64_t first_pre_seek_pos; > >+ nsresult rv3 = seekableOutStream->Tell(&first_pre_seek_pos); > >+#endif > >+ > >+ // xxx handle error: such as network error for remote file system > >+ seekableOutStream->Seek(nsISeekableStream::NS_SEEK_END, 0); > >+ > >+#if DEBUG > >+ int64_t first_post_seek_pos; > >+ nsresult rv4 = seekableOutStream->Tell(&first_post_seek_pos); > >+ if (NS_SUCCEEDED(rv3) && NS_SUCCEEDED(rv4)) { > >+ fflush(stdout); > >+ fprintf(stderr,"(debug) first_pre_seek_pos = 0x%08lx, first_pos_seek_pos=0x%08lx\n", > >+ first_pre_seek_pos, first_post_seek_pos); > >+ } > >+#endif > [Check whether the position changed?] Well, I thought this was unnecessary because the file position would differ always. Not so when the buffering is not enabled (and it is not eanbled in this patch.) I test buffering in other local tests. I figure now that file position is rewound to the beginning when buffering is enabled. One confusion is now cleared in my mind here. Yes, the informational message is printed if the seek position is different. > >+#if ! defined(DISABLE_SEEK) > #ifndef? > > >+ // DISABLING SEEK! > Actually this is the bit that isn't disabling seek! Sorry about the confusing comment. I took it out. > >+ if (NS_FAILED(rv2)) > >+ { > >+ NS_ERROR("seekableOutStream->Tell(&before_seek_pos); failed."); > >+ } > Since NS_ERROR does an assertion anyway, might as well use NS_ASSERTION > directly. Yes, I now have NS_ASSERTION (0, ...) Oh, maybe you menat NS_ASSERTION(NS_SUCCEEDED(rv2), "the error message") ? > >+ if( before_seek_pos != after_seek_pos ) { > >+ since_last_seek_report = 0; > >+ /* This merits error message status. */ > >+ MSG_LOG_TO_CONSOLE("!!! HELP ME !!! Unexpected filepos change! Report this error to https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > If you want people to report, a more useful message might be in order, such > as > "WriteLineToMailbox detected an unexpected file position change. If you can > reliably reproduce this message, please report the steps you used to bug > 1116055." Changed the message here. > >+ fprintf(stderr,"before_seek_pos=0x%08lx, after_seek_pos=0x%08lx\n", > >+ before_seek_pos, after_seek_pos); > (You didn't fflush stdout this time?) I added fflush(stdout). fflush(stdout) is not strictly necessary, but sometimes I find the buffering of stdout makes the log output intermixed in asynchronous manner with the output done to |stderr| and thus I often want to put fflush(stdout). > >+ } else if (since_last_seek_report >= HOWOFTEN) { > >+ since_last_seek_report = 0; > >+ MSG_LOG_INFO_TO_CONSOLE("Periodic check in WriteLineToMailbox() during BETA testing Please ignore."); > This could probably be improved too, such as > HOWOFTEN " unnecessary seeks detected by bug 1116055 in WriteLineToMailbox." > (this assumes that you initialised since_last_seek_report to 0 though) The intention here is not counting the number of unnecessary |Seek()| which is proportional to the number of lines (CRLF ending) of downloaded messages. But rather, I simply wanted to aler the user and developer that |Sync()| is called. (In the future patch, this |Seek()| is disabled.) > >+#else > >+ /* this is defined(DISABLE_SEEK) branch */ > (Put the comment on the #else line?) > Yes, I did so. > >+ static int warned_once = 0; > >+ if (!warned_once) { > >+ warned_once = 1; > bool? Used bool. I put you as a reviewer, but I am not sure if it was OK. TIA
After months of trying to fix I/O issues and local testing of C-C TB, now I am fairly confident that there is only one thread that tries to use the I/O stream and performs seek operation: that is obviously the main thread only. (Socket descriptor cannot be used to perform seek operation obviously.) And I now understand that one seek per one message becomes necessary in the current code when buffering is enabled, but that is an acceptable overhead. So I am going to start a cleanup and request for formal review. There is a whole series of patches that must be applied in a given order (including the latest ones related to short read patches [Bug 1170564 - [META] Failure to deal with short read]. I hope the result is a more robust and efficient (buffered I/O) TB. How do I request the review for such related series. Just start with the first one (actually this entry), and then after the first one, start requesting the second one? TIA
If they don't depend on each other, it's probably best to submit them one by one.
What I meant was, preferably "one issue per patch".
(In reply to Magnus Melin from comment #72) > What I meant was, preferably "one issue per patch". I wish the situation out there is a simple one: however, the I/O mess is very complex and I don't know whether my half-hearted attempt to create a series of patches makes sense or not. I will start doing this in a few days. [I have found another issue with short-read in M-C tree...] TIA
Blocks: 1174500
This is a slightly modified patch. I have moved the macros to nsMsgUtils.h as suggested before. (This is because I may find the macros useful to report otherwise unseen/unreported issues to error console so that users can take remedial actions.) This is going to be the first patch in the following series of patches I have created over the last few months. ======================================== ORDER ======================================== 1 A check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it 1116055 part-a This was considered a necessary step so that we can know that if any user experiences a strange seek position change. 2 A fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series) bug 1122698 3 A fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series) bug 1134527 4 A fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series) bug 1134529 5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) NEW BUG to be submitted. 6 A 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman NEW Bug to be submitted. (1116055 part-b to be split into separate bug.) I will create a separate bug for this since it will be necessary to apply this patch only after other patches landed. (This is because enabling the buffering requires proper error checking of |Close()| and |Flush()|. These are only taken care of AFTER 1-5 patches above are applied.) Maybe about a month after the first patch above is applied, I can enable this patch. 7 A add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read 11170606 This is theoretically independent of 1-6, but my local patch assumes the changes in 1-6 to be in place. (The patch may not apply cleanly.) There are some additional shor-read issues UNDER M-C portion of the source tree. They can be applied indepdent of this series. TIA
Attachment #8542002 - Attachment is obsolete: true
Attachment #8570630 - Attachment is obsolete: true
Attachment #8600730 - Attachment is obsolete: true
Attachment #8600730 - Flags: review?(neil)
Attachment #8625451 - Flags: review?
Attachment #8625451 - Flags: review? → review?(neil)
Blocks: 1176857
Blocks: 1134529
Blocks: 1134527
Blocks: 1122698
Comment on attachment 8625451 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 5. Uses rkent's logging code to error console) modified based comments of aceman and Neal I put the reviewer to neil. I wonder if he has the time, though. Just a quick summary of the content of the patch. - I created a set of function/macros to place an info/warn/error messasge in error console. The code was from :rkent. This was necessitated to have a user to report if a |Seek| operation, which has hindered the use of buffered output in Pop3 download for so many years, is every necessary on SOMONE's PC by checking the seek positions. I am sure it is not necessary any more, though. - The said |Seek()|, which was invoked for EVERY SINGLE message LINE of download will be removed now. However, there is one |Seek()| operation that becomes necessary once the buffering is enabled in the future patch ( Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) I think missing this one |Seek()| per message was the cause of message corruption in the previous attempt a couple of years ago now(?). But I may be wrong. The patches as a whole passed local |make mozmill| and |make xpcshell-tests|. I am trying to show that these work on Try Server, but I encounter some strange issues on Windows and MacOS X10.7. - The said function/macros are used to print a message that the one seek per message is necessary. Once Bug 1176857 is applied in the ending part of the patch series, these extra messages for beta testers will be removed before final release preparation (or some REAL ERROR messages may remain since I found there are places where error is not reported at all to users. VERY BAD. [But I have not inserted such error message YET. These are niceties that can wait.] To disable the |Seek()| in this file, other patches especially the one later that adds |Seek()| per message in |IncorporateBegin()| is essential. Anyway, having this patch landed for daily build helps us to see if there ever is a warped PC that needs the |Seek()| in this file that is going to be removed. I doubt there is such a PC in the whole world, but we may want to be sure. TIA PS: I have noticed that return values of |Seek()| and |Tell()| are not checked. I will add the check in a planned patch to pick up these left-over failures to check low-level I/O check after [7] add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read so that existing patches do not have to be modified heavily.
The patch has been modified to cope with some M-C header changes that broke compilation of a few C-C files and necessitated changes in C-C tree, too. I put Neil as reviewer before. Also, here is the tryserver run. https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=42b3eb889de9 xshell-tests these days are full of strange errors. Other than that, this patch does not add new errors. Please let me know how I should get started with the reviewing and landing of the series of the patches. TIA
Attachment #8625451 - Attachment is obsolete: true
Attachment #8625451 - Flags: review?(neil)
Attachment #8632789 - Flags: review?(neil)
BTW, I confirmed under linux that IMAP operation such as IMAP download does use buffered write after the series of patches are applied (this being the first one in series.) The check was done by strace.
(In reply to ISHIKAWA, Chiaki from comment #75) > Comment on attachment 8625451 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. (Take 5. Uses rkent's logging code to error console) > modified based comments of aceman and Neal I was just about to add r+ to this when you attached your new patch... now you're going to have to wait for me to update my tree.
(In reply to neil@parkwaycc.co.uk from comment #78) > (In reply to ISHIKAWA, Chiaki from comment #75) > > Comment on attachment 8625451 [details] [diff] [review] > > Check fileposition before and after a supposedly unnecessary seek() and > > report mismatch. (Take 5. Uses rkent's logging code to error console) > > modified based comments of aceman and Neal > > I was just about to add r+ to this when you attached your new patch... now > you're going to have to wait for me to update my tree. Sorry to hear that. But anyway, the series of patches of which this patch is at the beginning would fail to apply cleanly or won't compile unless the change introduced in this patch and other members of the series of the patches are updated to cope with the latest tree changes (mostly a header change in M-C part of the tree). I will wait for you to get ready with the new tree... TIA
When you initiate the next try-run, would it be possible to also do an opt build? I'd love to test the patch stack on a few real network shares...
(In reply to Ignaz Forster from comment #80) > When you initiate the next try-run, would it be possible to also do an opt > build? I'd love to test the patch stack on a few real network shares... That will be wonderful. My local testing is not as exhaustive as I wish it to be. Real-world network share testing would be great. I take that you want the whole series of patches applied in a tryserver job. Stay tuned.
But I think the "ifdef DEBUG" blocks will be removed in an opt build. Will the test produce any useful results in that case?
> But I think the "ifdef DEBUG" blocks will be removed in an opt build. Will > the test produce any useful results in that case? Probably, as parts of this patch set may (as far as I understand it) fix performance issues. For testing purposes I've installed the last try-run build, but that one is actually slower than a stock Thunderbird 24 - which I assume is due to the debug sections...(In reply to :aceman from comment #82) (In reply to ISHIKAWA, Chiaki from comment #81) > I take that you want the whole series of patches applied in a tryserver job. That would be great, as I would check if I can provoke any inconsistencies that way. Note that I will do that from a user's (or administrator's) perspective without verifing the source code changes :-) I still hope that will be useful.
(In reply to Ignaz Forster from comment #83) > > But I think the "ifdef DEBUG" blocks will be removed in an opt build. Will > > the test produce any useful results in that case? > Hi, aceman has a point here. I am trying to leave error messages in error console, and also to STDERR output in the hope of recording some real errors that can happen. > Probably, as parts of this patch set may (as far as I understand it) fix > performance issues. For testing purposes I've installed the last try-run > build, but that one is actually slower than a stock Thunderbird 24 - which I > assume is due to the debug sections...(In reply to :aceman from comment #82) Ugh... Slower than Tb 24. That sounds tough... But yes, it *could* be due to copious error messages in debug build. > (In reply to ISHIKAWA, Chiaki from comment #81) > > I take that you want the whole series of patches applied in a tryserver job. > > That would be great, as I would check if I can provoke any inconsistencies > that way. Note that I will do that from a user's (or administrator's) > perspective without verifing the source code changes :-) > I still hope that will be useful. Do you intend to test linux version or windows version? Linux version has a better chance of recording real errors to STDERR even in non-debug build. Also, maybe strace can be used to see whether the new version invokes too many system calls unnecessarily if the new version is still slower than the old version. (we can compare the syscall profile) But anyway, checking the performance against real network share as a user is something I have been wanting to hear about. Once Bug 1183729 is fixed, I can create a binary. Bug 1183729 is a blocker now. Thank you for your patience. TIA
I'm mainly interested in testing the Linux version, but I may be able to also get access to a few Windows clients. Of course I can test with both the debug and the opt build (and see if it makes a difference performance wise). My plan would be to test against our own file server (Hitachi) first and check for performance regressions. After that I will have to get in contact with different departments to get access to their file servers (those are usually Novell and NetApp servers, and some of them are very slow when used with Thunderbird) which may take several weeks. All of the shares are cifs shares. If you want me to collect any additional information apart from the output of Thunderbird, the type of file server, mount options and the time needed to so some operations like copying mails (e.g. strace output) I will see if I can get that information, too.
(In reply to Ignaz Forster from comment #85) > I'm mainly interested in testing the Linux version, but I may be able to > also get access to a few Windows clients. Of course I can test with both the > debug and the opt build (and see if it makes a difference performance wise). > > My plan would be to test against our own file server (Hitachi) first and > check for performance regressions. After that I will have to get in contact > with different departments to get access to their file servers (those are > usually Novell and NetApp servers, and some of them are very slow when used > with Thunderbird) which may take several weeks. All of the shares are cifs > shares. > > If you want me to collect any additional information apart from the output > of Thunderbird, the type of file server, mount options and the time needed > to so some operations like copying mails (e.g. strace output) I will see if > I can get that information, too. This sounds wonderful! I would like to see the linux test result mainly since from what I understand is windows version suffer less performance issues: however, the slow download would be for both versions if I am not mistaken. Let me give a few days time to complete the binary creation. I was off to a family business since the weekend is a long weekend (Monday is a holiday in Japan) and so let me take a couple of days to figure out the current compilation problem. TIA
Comment on attachment 8632789 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 6. Uses rkent's logging code to error console) modified based comments of aceman and Neal >+#define MSG_NS_INFO(_txt) do { MSG_LOCAL_INFO_TO_CONSOLE(_txt); fflush(stdout); fprintf(stderr,"(info) %s (%s:%d)\n", _txt, __FILE__, __LINE__); } while(0) >+ Nit: git apply warns about this new blank line at the end of the file
Attachment #8632789 - Flags: review?(neil) → review+
(In reply to neil@parkwaycc.co.uk from comment #87) > Comment on attachment 8632789 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. (Take 6. Uses rkent's logging code to error console) > modified based comments of aceman and Neal > > >+#define MSG_NS_INFO(_txt) do { MSG_LOCAL_INFO_TO_CONSOLE(_txt); fflush(stdout); fprintf(stderr,"(info) %s (%s:%d)\n", _txt, __FILE__, __LINE__); } while(0) > >+ > Nit: git apply warns about this new blank line at the end of the file Thank you for the review. I will remove the blank line and reupload the patch and THEN will put the checkin-needed keyword. (Sorry for the delayed response. Due to excessive heat in Japan in the last week, I have not been able to do much programming...)
(In reply to ISHIKAWA, Chiaki from comment #86) > Let me give a few days time to complete the binary creation. I was off to a > family business since the weekend is a long weekend (Monday is a holiday in > Japan) and so let me take a couple of days to figure out the current > compilation problem. Hi, I am a college of Ignaz. My computer actually shows a massive performance drop an a CIFS share. I would like to try out a current build of yours. Could you make an x86 build for me to test. That would be great.
I also would like to test a patch serie but not on a git/svn/whatever version. I would like to apply a patch on last stable 38.x.
I do not see anymore progress on this! And I'm very disappointed! At least packages patches that we can test on top of a stable release...
(In reply to Eric Valette from comment #91) > I do not see anymore progress on this! And I'm very disappointed! At least > packages patches that we can test on top of a stable release... Perhaps what you should be disappointed in is that we do not have the resources to do everything we need. Money could help make that happen. Users with big storage (including me with my modest storage), and everyone using a laptop or on a network also would have preferred to see this fixed already. However, a) right or wrong we made a concious decision a year ago that we would first put the developer effort into reliablity issues on the IO stack to better protect user data, b) doing a) takes a long time thus 7 months til the first draft patch, c) there is only one volunteer developer on this who has other major time commitments elsewhere (a big thanks to chiaki for taking and staying with such a long project!), d) who unfortunately month ago had serious hardware issues on his development environment, without which we might have seen more activity here. This bug may or may not get fixed in time for version 45, but if the IO patches do get in, then there is a fair posibility that the buffering issue may be introduced after 45 ships during the 45 life cycle. So you may need to be patient a while longer.
What is ask is just the release of sorted patches and a way to apply them myselves on existing versions. I know TB is dying, but as long as there is nothing in kmail to access exchange I still need it.
>right or wrong we made a concious decision a year ago that we would first put the developer effort into reliablity issues on the IO stack to better protect user data, Oh and BTW, this bug cause hang when transfering imap stuff to network backuped cifs storage. So not fixing it do make people loose their data... I've lost several mail, on now I start a VM to run the TB version to transfer/sort mail to backup storage...
I mean the Windows TB version that does not have the IO problem...
(In reply to Eric Valette from comment #93) > What is ask is just the release of sorted patches and a way to apply them > myselves on existing versions. I know TB is dying, but as long as there is > nothing in kmail to access exchange I still need it. Are you good at building Thunderbird and evaluating the technical aspects of these patches? Even if you are, unless the developer has the remaining buffer patches ready to post today (which would then need to be reviewed) I doubt we are going to prioritize them now, because time is now short for accomplishing all the related work that must go into version 45 in the next few weeks. Are you not in a position to not use CIFS? And did bug 624806 comment 14 not help you?
>Are you not in a position to not use CIFS? Of course not. > And did bug 624806 comment 14 not help you? I already had this flags and saw no improvement. And BTW I already make comments in this 3 years old bug if you look at the end...
Hi, I am sorry I could not make a good progress for a few reasons. 1. My PC where I develop TB patches had major hardware malfunction this summer (UPS problem, hard disk problem, hard disk enclosure problen, my stand-by PC's motherboard died, etc. I am finally back on track on this issue.) 2. A patch that I posted to take care of partial read caused TB on Mac OSX to get hung and was backed out. I wanted to fix THAT issue first, but unfortunately mozilla's compiler farm could not compile TB on MacOS X last week. Maybe I should try this again this weekend. Once the number 2 is fixed, I will restart working on these. In the mean time, I noticed a few bugs in the current release of TB under linux (my main working environment) and so I will post the issues to bugzilla. Thank you for your patience.
Blocks: 561272
(In reply to Max Harmathy from comment #89) > (In reply to ISHIKAWA, Chiaki from comment #86) > > Let me give a few days time to complete the binary creation. I was off to a > > family business since the weekend is a long weekend (Monday is a holiday in > > Japan) and so let me take a couple of days to figure out the current > > compilation problem. > > Hi, I am a college of Ignaz. > My computer actually shows a massive performance drop an a CIFS share. I > would like to try out a current build of yours. Could you make an x86 build > for me to test. That would be great. Once successful x86 DEBUG build is as follows: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=b22bd125725f I think the binary is available in http://archive.mozilla.org/pub/thunderbird/try-builds/ishikawa@yk.rim.or.jp-b22bd125725fe0480e526cd538df84e951b0a691/try-comm-central-win32-debug/ Are you using "Maildir" format for message storage. If so, you may want to create a new account and test the operation there. Although I suspect the errors I saw in "xpcshell-test" under windows on tryserver is the failure of ASYNC test framework under windows (and could be a ill-configured test that also has some issues under linux or OSX as well), I saw errors mostly with tests that were executed with "Maildir" message storage format. Be assured that my local test under linux and test on tryserver OSX has passed rather well (without any visible errors other than erros that are experienced by others who have nothing to do with my patches). It would be great if you can test to see if there is a peformance improvement, etc. However, DO NOTE that this is a debug version and may want to print out many debug output, which has been helpful in noticing I/O issues, and this printing alone may make the operation a little slower. However, use of 16K buffering, etc. would certainly make the operation smoother for CIFS operation. (Oops, this binary does not have the fix to use 16KB for buffering by detault.) TIA TIA
Attached patch Updated patch to address bitrot (obsolete) (deleted) — Splinter Review
This is an updated patch to address the bitrot issue since the previous patch was given r+. I think I need to get the r+ again, correct? Also, after a period of testing, I had to disable the printing to debug console for now. This is because XPCSHELL-TEST seems to grok the output to debug console during testing as an indication of error and fails such tests that trigger the output to debug console. So I had to disable it for now until I get to pass xpcshell-test with flying colors. Once I have patches in, I may want to tweak the code again to enable such debug console output based on user preference, etc. I will post the recent updates of all the patches and roadmap of the patches shortly and then request review. TIA
Here is the roadmap for this and other patches. Updated List of patches and rough road map. The following is the updated list of patches in 2016 (with exact bugzilla #, etc.) regarding the enabling of buffering to make message writing faster in C-C TB and related necessary error checking of low-level I/O routines. First and foremost, I don't claim my patch sets solve all the related I/O issues related to faster output operation with the use of buffering. It still lacks proper error recovery when network errors occur in some places. (The pritmitive case of failure to download message from remote mail server is more or less better handled than before. At least, there are more error checks.) At least it is a GOOD IMPROVEMENT over the current situation. Also, my patch makes it clear where proper error handling is missing. The current code does not check many errors which I could make happen by running C-C TB with the message store on a remote network file system and simulating network outages by pulling network cables. ======================================== PLANNED ORDER of landing patches ======================================== bug 1116055 -> bug 1242030 (consolidation of 1122698, 1134527, 1134529, 1174500) -> bug 1242042 -> bug 1176857 -> bug 1242046 -> bug 1242050 -> bug 1170606 An overview of patches in bugzilla entries. [1] Bug 1116055 (the patch in this bug needs updating due to bit rot.) check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it This was considered a necessary step so that we can know that if any user experiences a strange seek position change. The single most conspicuous culprit of slow down of I/O was a |Seek| was called at the writing of EACH LINE of message. It is hard to believe, but it is real. This kills any attempt to enable buffering. So the first thing to do was to investigate WHY this |Seek| was necessary. NOTE: The patch has already been given r+ before, and I have tested the operation of |Seek| and figured out the subtle issue of some functions to create buffered output stream reset the file's current position to the beginning without explicitly mentioning it. That, I think, was the cause of message corruption when buffering was tried a few years ago. It is fixed in my patch series below. However, a bit rot crept in due to other patches between the time the patch was given r+ and now. So I needed to updated it. I think I may need to ask a review of the updated version. [2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 The |Seek| before writing each line was introduced, I think, when function to enable buffering inadvertently resets file position to the beginning. This may be a natural behavior of such functions, but I think it was not documented well and people were caught by surprise. I suspect that the excessive use of |Seek| was an attempt to put a band-aid over this issue, and when the attempt was made to enable buffering, a subtle issue of rewinding the file position seemed to cause file corruption. (I experienced such corruption myself, but thanks to the patch in [1], I could learn the resetting of file position occurred, which was traced to a function to enable buffering, and so I could fix the issue.) I have merged the four patches from bug 1122698, 1134527, 1134529, 1174500 mentioned below, into one patch set in bug 1242030, consisting of small patches to files under a directory each. Each patch is smaller than the whole gigantic consolidated patch since it now only address changes under a particular directory or two. The reason for consolidation is that patch as a whole is much easier to understand this way. (Error checks are uniformly visible in the consolidated patch whereas in the former separate patches, low-level I/O functions whose return values were not checked in the earlier patch are checked later. This makes the reader of the earlier patch sometimes wonder why we don't check the error status of a particular low-level I/O function when in fact such a function is checked in a later patch. Consolidated patch avoids this issue. To enhance error processing, I had to change the semantics of a couple of functions: - |discardNewMessage| does not close aOutputStream. (Clarification/Rationale in Bug 1121842, 1122698) - |finishdNewMessage| does not close aOutputStream. (Clarification/Rationale in Bug 1121842, 1122698) This required the changes to files under mailnews/import directory, too. The first patch in the following list addresses this need. Consolidated patch set in bug Bug 1242030 are split into the following smaller patches. Nulling-filestream-after-close-under-IMPORT-directory.patch fix-close-MERGED-base-dir.patch fix-close-MERGED-imap-dir.patch fix-close-MERGED-local-dir.patch and another patch that is explained later: enable-buffering-1st-step.patch (explained later in a different bugzilla, Bug 1242042.) [3] (Now merged in Bug 1242030) Bug 1122698 fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series) [4] (Now merged in Bug 1242030) Bug 1134527 fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series) [5] (Now merged in Bug 1242030) Bug 1134529 fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series) [6] (Now merged in Bug 1242030, but then the essential part of bug 1174500 [was mistyped as 1117450 in a few bugzilla entries. :-( ] removing a |Seek| was spun off as a separate patch: Bug 1242046 ) removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) [7] (New in 2016 ) Bug 1242042 - Enabling buffering for file stream to write message for C-C TB A enable-buffering-1st-step.patch: Enabling buffering first step. I have extracted the code to enable buffering to be applied after the additional error checks in the preceding patches [1]-[6]. After a couple of months of testing of the final patches, I realize a certain buffering operations that were in the patch are actually unnecessary, and so commented them out. (We can enable them back again if my understanding of the code flow is incorrect. There are many combination of execution paths, and I may misunderstood the nature of certain paths. My gripe about slow downloading of messages and saving of large attachment *ARE* handled properly by buffering now.) I have created the separate patch to enable buffering since it will be necessary to apply this patch ONLY AFTER previous patches to check for primitive I/O errors have landed. (This is because enabling the buffering requires proper error checking of |Close()| and |Flush()|. |Close| was never checked properly in the existing code. These are only taken care of AFTER patches [1]-[6] above are applied. BTW, I have not had the time to fix this issue of missing error checks of |Close()| in files under mailnews/import directory completely yet. If you don't believe me, read the source! Fixing this issue would be a good student project. [8] Bug 1176857 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman The patch has been posted to the following bugzilla entry. Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) After the latest review of my local patches and testing, I realized that some buffering operations in my patch may be unnecessary for some operations. But I leave them as such. This patch from aceman could fall under that category, too. We may want to review the operation of the program more carefully to see if it is indeed unnecessary for some operations, but it does not hurt to have this buffering function for now. Lack of internal documentation of low-level protocol handling, etc. hurts. [9] (NEW in 2016) Bug 1242046: Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O removing-a-Seek-rev02.patch: removing a Seek that is performance pig! This was extracted into a new patch from the patches above [5]-[8]. Even if buffering is properly enabled, one offending |Seek()| needs to be disabled so that the buffering is effective for writing messages to a local file. Otherwise, this |Seek()| interferes with buffering, and essentially we call system |write| for EACH LINE of message, thus lowering output performance significantly. Killing this |Seek| did the trick. We now call ONE |Seek| per message, NOT one |Seek| per ONE LINE of message! This patch requires the landing of [1]-[8] before it. [10] New in 2016 1242050 - C-C TB: use a default output stream buffer size of 16KB instead of 4KB buffer-by-16K-by-default.patch: output buffer by 16KB by default I created a patch to make the default buffering to 16KB for message copying. In this day and age, we can use 16KB for output buffering without worrying about memory exhaustion. (Already, mork uses 32KB buffering.) This should help when a local LAN uses a jumbo packet, too. This patch is theoretically independent of any of the other patches here. But in practice, the buffering is not quite effective without the patches in [1]-[9] above. [11] Bug 1170606 add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read This patch set *could* be theoretically independent of [1]-[9], and [10]. But my patch application order assumes the changes in [1]-[9] to have landed first. (The patches in bug 1170606 may not apply cleanly without [1]-[9] in advance. The files and lines touched overlap significantly. ) Also, note that some missing I/O error checks, i.e., missing from the early patches in [1]-[9] above, have been included in this patch set during more than 12 months of testing since I started working on the general issue of ineffective output buffering that slows down C-C TB. Simply stated, the current code sucks so badly regarding the failure to check the return code of I/O and low-level routines that creating a neat set of patches is IMPOSSIBLE. You read a dozen lines of code and realize the failure to test some low-level functions. This happens almost always when I check certain part of files :-( This situation is pathetic, but has to be dealt with to have a "rock solid mail client (tm)" :-) [But seriously, after analyzing the code in detail, my hope is very distant now.] The patch in bug 1170606 has been split into smaller patch chunks that address files in one directory or two at a time. add-short-read-handling-base.patch add-short-read-handling-IMPORT-directory.patch add-short-read-handling-BASE-directory.patch add-short-read-handling-COMPOSE-directory-part-1.patch add-short-read-handling-IMAP-directory.patch add-short-read-handling-LOCAL-directory.patch add-short-read-handling-MIME-directory.patch add-short-read-handling-OTHER-directory.patch There are some additional short-read issues UNDER M-C portion of the source tree. They can be applied independent of this series and I am not mentioning them here. TIA
Neil, do I need another review because of the bit rot, etc.?
Flags: needinfo?(neil)
No you do not need a new review if you haven't changed the logic of the patch, just shift some some lines due to bitrot. But if the precise placement of some lines in the patch is actually the logic of the patch, then you can try another review. Thanks for looking into this and keeping working on the issue for so long. My question to Neil/rkent would actually be whether we want these potential dataloss-causing patches in trunk NOW (when we are polishing TB45), or we wait some weeks until TB45 is out and we can focus on improvements and testing on trunk.
> Once successful x86 DEBUG build is as follows: > https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=b22bd125725f This has all the patches, correct?
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #105) > > Once successful x86 DEBUG build is as follows: > > https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=b22bd125725f > > This has all the patches, correct? Yes, it does. There have been a few changes in C-C TB tree that affected my patches and so I have been updating the patches very slightly, and am trying to make sure C-C TB complies for the last couple of weeks. Unfortunately, C-C TB built is totally broken due to Bug 1243760 - Replace nsPIDOMWindow with nsPIDOMWindowInner/Outer in C-C due to bug 1241764 Since midnight of Jan 31 (JST), and so I can't show you a clean build right now. TIA
Flags: needinfo?(ishikawa)
(I thought I had already posted this, but it did not seem to go out somehow. So here it goes.) The previous version of this memo was posted to https://bugzilla.mozilla.org/show_bug.cgi?id=1116055#c101 on Jan 23, and then I got boggled down to fix Windows ONLY errors on try-comm-server and could not follow up adequately. Finally the strange errors observed only for Windows build on try-comm-central have been fixed. So here is the updated road-map for this and other patches. (Feb 2016 version) ======================================== Updated List of patches and rough road map. ======================================== The following is the updated list of patches in Feb 2016 (with exact bugzilla #, etc.) regarding the enabling of buffering to make message writing faster in C-C TB and related necessary error checking of low-level I/O routines. On top of that, a measure to handle short-read that may be introduced by read against remote file system is introduced. First and foremost, I don't claim my patch sets solve all the related I/O issues related to faster output operation with the use of buffering. It still lacks proper error recovery when network errors occur in some places. (The primitive case of failure to download message from remote mail server is more or less better handled than before. At least, there are more error checks, and should be MUCH FASTER when your e-mail messages contain large attachments.) At least, my patch set is a *GOOD* IMPROVEMENT over the current situation. Also, my patch makes it clear where proper error detection/handling is missing. The current code does not check many errors which I could make happen by running C-C TB with the message store on a remote network file system and simulating network outages by pulling network cables. Anyway, I think my patch set brings much faster operation when you have a lot of messages with large attachments (in a typical office environment, for example, that need to produce reports to external clients) and especially when the mail is stored on a remote computer. My patches for short-read tries to address the issue of network file read failures. But do note (explained in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1170564#c1 ) that there are some libraries under M-C trees that need to be upgraded to make TB fully prepared with user profile on remote server: NSS for one. Basically, these libraries must cope with short-read and potential TRANSIENT I/O errors including read/write/seek/tell/close and TB may need to be tweaked to handle such issues in a user-friendly manner eventually. ======================================== PLANNED ORDER of landing patches ======================================== bug 1116055 -> bug 1242030 (consolidation of 1122698, 1134527, 1134529, 1174500) -> bug 1242042 -> bug 1176857 -> bug 1242046 -> bug 1242050 -> bug 1170606 ======================================== An overview of patches in bugzilla entries. ---------------------------------------- [In February 2016, I fixed the errors observed ONLY UNDER WINDOWS when we use Maildir format for storage. Thus I needed to tweak my patch set somewhat and added a patch to take care of this issue and so I updated this write-up.] [1] Bug 1116055 (the patch in this bug needs updating due to bit rot.) check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it This was considered a necessary step so that we can know that if any user experiences a strange seek position change under the hood, so to speak. The single most conspicuous culprit of slow down of I/O was a |Seek| was called at the writing of EACH LINE of message. It is hard to believe, but it is real. This kills buffering. So the first thing to do was to investigate WHY this |Seek| was necessary. NOTE: The patch has already been given r+ before, and I have tested the operation of |Seek| and figured out the subtle issue of some functions to create buffered output stream resetting the file's current position to the beginning without explicitly mentioning it. That, I think, was the cause of message corruption when buffering was tried a few years ago. It is fixed in my patch series below. However, a bit rot crept in due to other patches between the time the patch was given r+ and now. So I needed to updated it. I think I may need to ask a review of the updated version. [2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 The |Seek| before writing each line was introduced, I think, when function to enable buffering inadvertently resets file position to the beginning. This may be a natural behavior of such functions, but I think it was not documented well and people were caught by surprise. I suspect that the excessive use of |Seek| was an attempt to put a band-aid over this issue, and when the attempt was made to enable buffering, a subtle issue of rewinding the file position seemed to cause file corruption. (I experienced such corruption myself, but thanks to the patch in [1], I could learn the resetting of file position occurred, which was traced to a function to enable buffering, and so I could fix the issue.) I have merged the four patches from bug 1122698, 1134527, 1134529, 1174500 mentioned below, into one patch set in bug 1242030, consisting of small patches to files under a directory each. Each patch is smaller than the whole gigantic consolidated patch since it now only address changes under a particular directory or two. The reason for consolidation is that patch as a whole is much easier to understand this way. (Error checks are uniformly visible in the consolidated patch whereas in the former separate patches, low-level I/O functions whose return values were not checked in the earlier patch are checked later. This makes the reader of the earlier patch sometimes wonder why we don't check the error status of a particular low-level I/O function when, in fact, such a function is checked in a later patch. Consolidated patch avoids this issue. To enhance error recovery processing, I had to change the semantics of a couple of functions: - |discardNewMessage| does not close aOutputStream. (Clarification/Rationale in Bug 1121842, 1122698) - |finishdNewMessage| does not close aOutputStream. (Clarification/Rationale in Bug 1121842, 1122698) (Feb 2016) EXCEPT for the version of DiscardNewMessage and FinishNewMessage for Maildir storage format. The original analysis and reasoning is given in Bug 1121842 in a verbose comment a la stream of consciousness monologue. This change required the modifications to files under mailnews/import directory, too. The first patch in the following list addresses this need.: Nulling-filestream-after-close-under-IMPORT-directory.patch *** begin Feb 2016 Feb **** Additional comment about this design But I had to change one thing in Feb 2016 to fix errors only observed under Windows on try-comm-central. (Feb 2016) I said, we don't close file stream EXCEPT for the version of DiscardNewMessage and FinishNewMessage for Maildir storage format. DiscardNewMessage and FinishNewMessage() DO close the stream in that case. This has something to do with renaming/removing a file cannot be done under Windows if an open file descriptor exists for that file. This was the cause of the failures of WINDOWS BUILD on try-comm-central. (Under linux or OSX, it is possible to do so.) My patch set was created originally under linux. However, my desire, explained in bug 1121842, to keep handling the Close() of the file stream to the newly downloaded/copied/moved message file (or mbox, etc.) done on the caller-side of the Finish/DiscardNewMessage() so that yet-to-be-devised better error-handling could be done in the logically higher-level of routines necessitated adding a third argument to the pair of functions to return the indicator whether these functions closed the passed file stream (as the 1st argument) before returning. By looking at the indicator, the callers can decide whether to close the filestream and check the error code on their own. (With the change, Windows DEBUG build on try-comm-central ran without a hitch!) The addition of third argument to Finish/DiscardNewMessage() necessitated an additional patch to modify the code under mailnews/import to import messages from OTHER e-mail clients such as OE5, etc. **** end Feb 2016 addition. Consolidated patch set in bug Bug 1242030 are split into the following smaller patches. Nulling-filestream-after-close-under-IMPORT-directory.patch fix-close-MERGED-base-dir.patch fix-close-MERGED-imap-dir.patch fix-close-MERGED-local-dir.patch (The next one is an addition in Feb 2016) *fix-close-MERGED-add-closedflag-IMPORT-directory.patch and another patch that is explained later: enable-buffering-1st-step.patch (explained later in a different bugzilla, Bug 1242042.) [3] (Now merged in Bug 1242030) Bug 1122698 fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series) [4] (Now merged in Bug 1242030) Bug 1134527 fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series) [5] (Now merged in Bug 1242030) Bug 1134529 fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series) [6] (Now merged in Bug 1242030, but then the essential part of bug 1174500 [was mistyped as 1117450 in a few bugzilla entries. :-( ] removing a |Seek| was spun off as a separate patch: Bug 1242046 ) removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) [6.5] (New in Feb 2016) An addition to 1242030 in Feb. fix-close-MERGED-add-closedflag-IMPORT-directory.patch This patch takes care of the modification of files under mailnews/import directory to take care of the addition of the third argument to |FinishNewMessage| and |DiscardNewMessage|. [7] (New in 2016 ) Bug 1242042 - Enabling buffering for file stream to write message for C-C TB A enable-buffering-1st-step.patch: Enabling buffering first step. I have extracted the code to enable buffering to be applied after the additional error checks in the preceding patches [1]-[6]. After a couple of months of testing of the final patches, I realize a certain buffering operations that were in the patch are actually unnecessary, and so commented them out. (We can enable them back again if my understanding of the code flow is incorrect. There are many combination of execution paths, and I may misunderstood the nature of certain paths. My gripe about slow downloading of messages and saving of large attachment *ARE* handled properly by buffering now.) I have created the separate patch to enable buffering since it will be necessary to apply this patch ONLY AFTER previous patches to check for primitive I/O errors have landed. (This is because enabling the buffering requires proper error checking of |Close()| and |Flush()|. |Close| was never checked properly in the existing code. These are only taken care of AFTER patches [1]-[6] above are applied. BTW, I have not had the time to fix this issue of missing error checks of |Close()| in files under mailnews/import directory completely yet. If you don't believe me, read the source! Fixing this issue would be a good student project. [8] Bug 1176857 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman The patch has been posted to the following bugzilla entry. Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) After the latest review of my local patches and testing, I realized that some buffering operations in my patch may be unnecessary for some operations. But I leave them as such. This patch from aceman could fall under that category, too. We may want to review the operation of the program more carefully to see if it is indeed unnecessary for some operations, but it does not hurt to have this buffering function for now. Lack of internal documentation of low-level protocol handling, etc. hurts. [9] (NEW in 2016) Bug 1242046: Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O removing-a-Seek-rev02.patch: removing a Seek that is performance pig! This was extracted into a new patch from the patches above [5]-[8]. Even if buffering is properly enabled, one offending |Seek()| needs to be disabled so that the buffering is effective for writing messages to a local file. Otherwise, this |Seek()| interferes with buffering, and essentially we call system |write| for EACH LINE of message, thus lowering output performance significantly. Killing this |Seek| did the trick. We now call ONE |Seek| per message, NOT one |Seek| per ONE LINE of message! This patch requires the landing of [1]-[8] before it. [10] New in 2016 1242050 - C-C TB: use a default output stream buffer size of 16KB instead of 4KB buffer-by-16K-by-default.patch: output buffer by 16KB by default I created a patch to make the default buffering to 16KB for message copying. In this day and age, we can use 16KB for output buffering without worrying about memory exhaustion. (Already, mork uses 32KB buffering.) This should help when a local LAN uses a jumbo packet, too. This patch is theoretically independent of any of the other patches here. But in practice, the buffering is not effective at all without the patches in [1]-[9] above. [11] Bug 1170606 add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read This patch set *could* be theoretically independent of [1]-[9], and [10]. But my patch application order assumes the changes in [1]-[9] to have landed first. (The patches in bug 1170606 may not apply cleanly without [1]-[9] in advance. The files and lines touched overlap significantly. ) Also, note that some missing I/O error checks, i.e., missing from the early patches in [1]-[9] above, have been included in this patch set during more than 12 months of testing since I started working on the general issue of ineffective output buffering that slows down C-C TB. Simply stated, the current code sucks so badly regarding the failure to check the return code of I/O and low-level routines that creating a neat set of patches is IMPOSSIBLE. You read a dozen lines of code and realize it fails to test return values of some low-level functions. This happens almost always when I check certain part of files in detail:-( This situation is pathetic, but has to be dealt with to have a "rock solid mail client (tm)" :-) [But seriously, after analyzing the code in detail for the last 12 months, my hope is very distant now.] The patch in bug 1170606 has been split into smaller patch chunks that address files in one directory or two at a time. add-short-read-handling-base.patch add-short-read-handling-IMPORT-directory.patch add-short-read-handling-BASE-directory.patch add-short-read-handling-COMPOSE-directory-part-1.patch add-short-read-handling-IMAP-directory.patch add-short-read-handling-LOCAL-directory.patch add-short-read-handling-MIME-directory.patch add-short-read-handling-OTHER-directory.patch There are some additional short-read issues UNDER M-C portion of the source tree. They can be applied independent of this series and I am not mentioning them here. *** additional comment: I made an incorrect short-read patch for M-C which rendered OSX binary useless, and it was backed out. Bug 1170646 - A few M-C fixes to handle short read in Cache code ( from [META] Failure to deal with short read) I have no idea why linux version did not suffer from a similar problem. But even under OSX, only a version of SDK seemed to suffer from the issue and not another version. I may revisit it in a few weeks now that OSX debug build succeeds on try-comm-server (back then I could not, and did not assume OSX version of PR_Read() behaved slightly differently from linux's.) My understanding of nsIInputStream Read() semantics at least is better now. Read() returns 0=NS_OK when it encounters EOF without reading an octet whereas typically read() returns EOF under POSIX when it tries to read it. There may be another subtle semantic issue regarding PR_Read() that was handled in bug 1170646, and I may have introduced a subtle mix-up here, or that the OSX SDK have been updated. Here again my confusion and subsequent debugging were not wasted. I found a few interesting things. There seems to be a few places where EOF processing is not quite correct. At least inefficient. I will upload the current patches shortly (the version that have been tested on try-comm-central, not the version from which extra print statements are removed.) TIA
Updated patch to address recent bitrot.
Attachment #8711179 - Attachment is obsolete: true
Attachment #8721154 - Attachment description: Updated patch to address bitrot → Updated patch to address bitrot (Feb 18 version)
Blocks: 1244456
I did a few tests with build https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=35b902d1613b today. In general the test is looking very good so far. Test setup: Server=[Hitachi NAS Platform 12.4.3924.14] /proc/mounts: rw,relatime,vers=1.0,sec=ntlmssp,cache=strict,username=xxx.qmux,domain=DOMAIN,uid=10000,forceuid,gid=500,forcegid,addr=10.160.160.160,file_mode=0700,dir_mode=0700,nocase,nounix,nobrl,rsize=16384,wsize=65216,actimeo=1 Kernel: Linux 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:28:32 UTC 2016 i686 i686 i686 GNU/Linux Task: Move 441 mails (size: 66MB) from one "local" folder to another - both of them on the same network share mounted via cifs. Results: For comparison of the results I've used the "opt" builds of Thunderbird 24.8.1, Thunderbird 47.0a1 (Nighly) and Thunderbird 47.0a1 (try build with your patches). Thunderbird 24.8.1: 8s Thunderbird 47.0a1 (daily): 10s Thunderbird 47.0a1 (try): 15s First of all: It is notable that the overall performance for the ideal case has decreased over time. However this changes dramatically when other applications (like the file manager, antivirus software or even Thunderbird itself) access the file at the same time: All versions except the try-build hit massive performance penalties, probably due to loosing the exclusive lock on the file. As a consequence it took over 10 Minutes to transfer the mails with these versions. The try-build version seems to be immune to this: Except for the reduced bandwidth due to the parallel access Thunderbird was unimpressed and continued writing the file with high speed. According to Wireshark the file is opened and closed several times per minute now - once for each mail. However trying to interrupt the transfer of a large mail didn't succeed, Thunderbird continued the transfer with high speed. Thunderbird once locked up for me, waiting for file I/O after trying to compact one of the folders. I wasn't able to reproduce this yet though. I will (hopefully be able to) continue testing next week, this time with the debug builds. If you are still interested I will also attach the strace output.
(In reply to Ignaz Forster from comment #109) > I did a few tests with build > https://treeherder.mozilla.org/#/jobs?repo=try-comm- > central&revision=35b902d1613b today. In general the test is looking very > good so far. > > Test setup: > Server=[Hitachi NAS Platform 12.4.3924.14] > /proc/mounts: > rw,relatime,vers=1.0,sec=ntlmssp,cache=strict,username=xxx.qmux, > domain=DOMAIN,uid=10000,forceuid,gid=500,forcegid,addr=10.160.160.160, > file_mode=0700,dir_mode=0700,nocase,nounix,nobrl,rsize=16384,wsize=65216, > actimeo=1 > Kernel: Linux 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:28:32 UTC 2016 > i686 i686 i686 GNU/Linux > > Task: Move 441 mails (size: 66MB) from one "local" folder to another - both > of them on the same network share mounted via cifs. > > Results: > For comparison of the results I've used the "opt" builds of Thunderbird > 24.8.1, Thunderbird 47.0a1 (Nighly) and Thunderbird 47.0a1 (try build with > your patches). > > Thunderbird 24.8.1: 8s > Thunderbird 47.0a1 (daily): 10s > Thunderbird 47.0a1 (try): 15s > > First of all: It is notable that the overall performance for the ideal case > has decreased over time. However this changes dramatically when other > applications (like the file manager, antivirus software or even Thunderbird > itself) access the file at the same time: All versions except the try-build > hit massive performance penalties, probably due to loosing the exclusive > lock on the file. As a consequence it took over 10 Minutes to transfer the > mails with these versions. > The try-build version seems to be immune to this: Except for the reduced > bandwidth due to the parallel access Thunderbird was unimpressed and > continued writing the file with high speed. > > According to Wireshark the file is opened and closed several times per > minute now - once for each mail. However trying to interrupt the transfer of > a large mail didn't succeed, Thunderbird continued the transfer with high > speed. > > Thunderbird once locked up for me, waiting for file I/O after trying to > compact one of the folders. I wasn't able to reproduce this yet though. > > > I will (hopefully be able to) continue testing next week, this time with the > debug builds. If you are still interested I will also attach the strace > output. (I already exchanged an e-mail with Ignaz over the weekend when I did not have a web access. This is a formal public thank you note!). Thank you for the very encouraging benchmarking information. It motivates me to move forward with the patches for the NEXT (not the immediate one, but the next one) release cycle. I would appreciate more benchmark information such as system call traces because there may be a few more things I may have missed. This file lock of CIFS is something I may have overlooked (but there is nothing C-C TB can do except for issuing far less system write calls. The chances of C-C TB getting off the CPU due to context switch of many WRITE/READ system calls obviously becomes much higher when there are so many system calls. This overhead alone is rather painful when there are competing programs. Again, thank you for benchmarking info. I will try hard to push the parches into C-C tree. I hope others who have experienced similar problems with LAN would report performance result similarly. TIA
First of all a correction of my last post: The total size of the actual mails was only 13.7 MB - I forgot to compress the folders before checking the size of the mbox file... For this week's test setup I changed to our test network, providing two CIFS capable servers: An old Netapp filer and a Samba server. The above strace files were taken from the Samba server, as I didn't find a way to get system calls from the Netapp filer. The results would have been interesting though... For comparing of the results I've used the "opt" builds of Thunderbird 24.8.1, Thunderbird 47.0a1 (Nighly) and Thunderbird 47.0a1 (try build with your patches) again. Test setup 1: ------------- Server=NetApp Release 8.1.3P3 7-Mode Results: Thunderbird 24.8.1: 80s Thunderbird 47.0a1 (daily): 70s Thunderbird 47.0a1 (try): 44s Test setup 2: ------------- Server=Samba 3.6.22 Results: Thunderbird 24.8.1: 5s Thunderbird 47.0a1 (daily): 7s Thunderbird 47.0a1 (try): 7 - 30s (I wasn't able to find out why causes the variations) As can be seen from the strace logs the actual number of system calls has increased significantly - which would explain the longer time for fast connections. However for slow connections the time has decreased. When accessing the files with another application during transfer the try-build continued without speed loss, while the other builds got significantly slower again (e.g. 115s for the Samba setup).
(In reply to Ignaz Forster from comment #113) > First of all a correction of my last post: The total size of the actual > mails was only 13.7 MB - I forgot to compress the folders before checking > the size of the mbox file... > > For this week's test setup I changed to our test network, providing two CIFS > capable servers: An old Netapp filer and a Samba server. > The above strace files were taken from the Samba server, as I didn't find a > way to get system calls from the Netapp filer. The results would have been > interesting though... > > For comparing of the results I've used the "opt" builds of Thunderbird > 24.8.1, Thunderbird 47.0a1 (Nighly) and Thunderbird 47.0a1 (try build with > your patches) again. > > Test setup 1: > ------------- > Server=NetApp Release 8.1.3P3 7-Mode > > Results: > Thunderbird 24.8.1: 80s > Thunderbird 47.0a1 (daily): 70s > Thunderbird 47.0a1 (try): 44s > > > Test setup 2: > ------------- > Server=Samba 3.6.22 > > Results: > Thunderbird 24.8.1: 5s > Thunderbird 47.0a1 (daily): 7s > Thunderbird 47.0a1 (try): 7 - 30s (I wasn't able to find out why causes > the variations) > > > As can be seen from the strace logs the actual number of system calls has > increased significantly - which would explain the longer time for fast > connections. However for slow connections the time has decreased. When > accessing the files with another application during transfer the try-build > continued without speed loss, while the other builds got significantly > slower again (e.g. 115s for the Samba setup). Thank you for posting the strace data. I don't understand the variation on Samba server, but I will try to digest the strace over the next few days. Thank you so much! BTW, I read this thread all over again myself and there are two issues concerning the network remote usage. 1. Slow write This is being fixed by the buffering patch of mine). 2. Potentially broken read on a very busy server/LAN (linux-only) noted by Eric. This is hopefully being fixed mostly by my patch for Bug 1170606 add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read (Windows seem to handle short read/retry, etc. under the hood (i.e., within the system call. Linux doesn't. Thus the need for explicit check for short read.) However, note that there *ARE* a few libraries under ./mozilla directory (M-C) portion that needs updating to cope with this. (For example, NSS, which is being worked on, I think.) In any case, I am glad to see the performance go up very much, and this should be a good news for SOHO or small office environment (20-30) users, I suppose. TIA
Hi, Are there any news/patch regarding this bug? I think our 1400 users are facing this issue: We have version 45.0 deployed and since at least version 31.0.5 moving e-mails from IMAP inbox to local folders (which are located on a Novell network drive mapped with NCP) takes an eternity. If we move more e-mails at once it even happens that Thunderbird hangs completely and the e-mails get lost! My test with a mail with a 6 MB attachement: move from Inbox to Local Folders (NCP) -> 1min 20s. from Local Folders back to Inbox (NCP) -> 2s. ! move from Inbox to Local Folders (CIFS) -> 20s. Using outlook with the same configurations takes as expected only 2 seconds to move the 6 MB from inbox to local folders. Also other data transfers work normally. Moving the local folders locally to C:\ also works fine. With Wireshark, it looks like thunderbird (with local folders on a Novell drive) only sends packages of ~40 Bytes to the Novell server. regards
(In reply to yr from comment #115) > Hi, > Are there any news/patch regarding this bug? > > I think our 1400 users are facing this issue: > We have version 45.0 deployed and since at least version 31.0.5 moving > e-mails from IMAP inbox to local folders > (which are located on a Novell network drive mapped with NCP) takes an > eternity. If we move more e-mails at once it even happens > that Thunderbird hangs completely and the e-mails get lost! > My test with a mail with a 6 MB attachement: > move from Inbox to Local Folders (NCP) -> 1min 20s. > from Local Folders back to Inbox (NCP) -> 2s. ! > move from Inbox to Local Folders (CIFS) -> 20s. > Using outlook with the same configurations takes as expected only 2 seconds > to move the 6 MB from inbox to local folders. > Also other data transfers work normally. > Moving the local folders locally to C:\ also works fine. > > With Wireshark, it looks like thunderbird (with local folders on a Novell > drive) only sends packages of ~40 Bytes to the Novell server. > > regards Is your client linux-based? Oh well, you mention that "local folders locally to C:\" meaning that you are likely to use Windows clients. Windows OS have a built-in API to move files using most efficient and error-resistant manner for Windows SHARE, and TB seems to use it when that API is usable. Thus for Windows client, CIFS aka Samba is known to work rather well. However, if the remote file system is not CIFS or your client is not Windows-based, we see this type of problems since TB uses hand-crafted copy routine that does not use buffered output always. In your case, NCP may not support the full CIFS/Samba semantics and thus TB under Window OS may be falling back to the hand-crafted version of copying files(?): this I am not sure. I will try to get the patch(es) accepted into C-C tree. In the meantime, can you possibly try the binary in http://archive.mozilla.org/pub/thunderbird/try-builds/ishikawa@yk.rim.or.jp-35b902d1613be8871f49ee82b7eef14d3210cd2a/try-comm-central-win32-debug/ to see if the binary with my patches improves the operation against NCP file mount? I have a nagging feeling there may be unknown cause for slowdown with NCP and running the build with my patches to see if there is an improvement of transfer speed answers my doubt. TIA
Thank you for the fast reply. I installed the thunderbird-47.0a1 you provided and the move of the 6MB sample mail takes now 13s :) not perfect but i think we could work with it. Let me know if you need more information or logs
(In reply to ISHIKAWA, Chiaki from comment #116) > I will try to get the patch(es) accepted into C-C tree. Yes, please now is the right time in the developement cycle to land risky stuff like this. Please set up review requests in bugs/patches that are ready. Just do not set the request on Neil or Irving, they are currently away. Try rkent. I think you can leave the debugging output to warn for the case when you have skipped the seek. I think that problem is still open we do not know if the seek can be removed altogether.
(In reply to yr from comment #117) > Thank you for the fast reply. I installed the thunderbird-47.0a1 you > provided and the move of the 6MB sample mail takes now 13s :) not perfect > but i think we could work with it. Let me know if you need more information > or logs That sounds good. Even NCP can benefit from the patches I created, then. I think this is good enough at the moment. (I will think hard if I can log some pertinent information in the future regarding the buffer size used by system calls, but it seems to me it is always easier to use the external tools to check for the buffer size on the wire, etc.) (In reply to :aceman from comment #118) > (In reply to ISHIKAWA, Chiaki from comment #116) > > I will try to get the patch(es) accepted into C-C tree. > > Yes, please now is the right time in the developement cycle to land risky > stuff like this. > Please set up review requests in bugs/patches that are ready. Just do not > set the request on Neil or Irving, they are currently away. Try rkent. > I think you can leave the debugging output to warn for the case when you > have skipped the seek. I think that problem is still open we do not know if > the seek can be removed altogether. Will do. The daytime work brings a lot of visitors to the office and I have been overwhelmed by the visitors at this moment, but I will as you suggest. TIA
Is it possible to provide us a patch we could deploy on our version 45.0 so we have suitable workaround for our situation?
I am uploading the latest patch that has been tested and used locally on my PC: This patch ought to be followed by the following patches. Patches marked with "*" are new updated uploads to cope with bit rot. (June 9th, 2016) *[1] Bug 1116055 (THIS BUGZILLA's patch) check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it [2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 Consolidated patch set in bug Bug 1242030 are split into the following smaller patches. Nulling-filestream-after-close-under-IMPORT-directory.patch fix-close-MERGED-base-dir.patch fix-close-MERGED-imap-dir.patch fix-close-MERGED-local-dir.patch *fix-close-MERGED-add-closedflag-IMPORT-directory.patch [7] (New in 2016 ) Bug 1242042 - Enabling buffering for file stream to write message for C-C TB A enable-buffering-1st-step.patch: Enabling buffering first step. [8] Bug 1176857 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman [9] (NEW in 2016) Bug 1242046: Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O removing-a-Seek-rev02.patch: removing a Seek that is performance pig! This was extracted into a new patch from the patches above [5]-[8]. [10] New in 2016 1242050 - C-C TB: use a default output stream buffer size of 16KB instead of 4KB buffer-by-16K-by-default.patch: output buffer by 16KB by default
Attachment #8721154 - Attachment is obsolete: true
I forgot: Please check the following tryserver job for testing result, etc. There are no new errors aside from other errors noticed by others. https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=2a082a9b933239bb49f22bab3f8bf6b445326127
Blocks: 760859
Comment on attachment 8761265 [details] [diff] [review] Updated patch to address bit rot (May 27th version) Review of attachment 8761265 [details] [diff] [review]: ----------------------------------------------------------------- Is this the same patch that was already reviewed by Neil? If so, we could finish it now. ::: mailnews/base/util/nsMsgUtils.cpp @@ +95,5 @@ > + const nsAString &aFilename, > + uint32_t aLinenumber, > + uint32_t flags) > +{ > +// adapted from nsContentUtils::LogSmpleConsoleError Typo, LogSimpleConsoleError. ::: mailnews/local/src/nsPop3Sink.cpp @@ +534,5 @@ > > if (closure) > *closure = (void*) this; > > + /* May 1st. 2015: Something in the file handling changed and we Please prefix these comment blocks with "Bug 1116055". @@ +702,5 @@ > return WriteLineToMailbox(m_outputBuffer); > } > > +/* #define DISABLE_SEEK_POS_CHECK */ > +/* #define DISABLE_SEEK 1 */ Please set up these variables or the ifdefs below so that we only report the seek() problems, but never disable the seek for now. @@ +787,5 @@ > + So we switched to stderr printing during debugging... > + */ > +#if 0 > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > +#else Is this 'if 0' block needed? @@ +788,5 @@ > + */ > +#if 0 > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > +#else > + fprintf(stderr, "We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); Can we point the users to a new bug where we collect the reports? As this bug 1116055 is already crowded.
Blocks: 1242046
(In reply to :aceman from comment #123) > Comment on attachment 8761265 [details] [diff] [review] > Updated patch to address bit rot (May 27th version) > > Review of attachment 8761265 [details] [diff] [review]: > ----------------------------------------------------------------- > > Is this the same patch that was already reviewed by Neil? If so, we could > finish it now. > > ::: mailnews/base/util/nsMsgUtils.cpp > @@ +95,5 @@ > > + const nsAString &aFilename, > > + uint32_t aLinenumber, > > + uint32_t flags) > > +{ > > +// adapted from nsContentUtils::LogSmpleConsoleError > > Typo, LogSimpleConsoleError. > > ::: mailnews/local/src/nsPop3Sink.cpp > @@ +534,5 @@ > > > > if (closure) > > *closure = (void*) this; > > > > + /* May 1st. 2015: Something in the file handling changed and we > > Please prefix these comment blocks with "Bug 1116055". > > @@ +702,5 @@ > > return WriteLineToMailbox(m_outputBuffer); > > } > > > > +/* #define DISABLE_SEEK_POS_CHECK */ > > +/* #define DISABLE_SEEK 1 */ > > Please set up these variables or the ifdefs below so that we only report the > seek() problems, but never disable the seek for now. > > @@ +787,5 @@ > > + So we switched to stderr printing during debugging... > > + */ > > +#if 0 > > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > +#else > > Is this 'if 0' block needed? > > @@ +788,5 @@ > > + */ > > +#if 0 > > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > +#else > > + fprintf(stderr, "We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > Can we point the users to a new bug where we collect the reports? As this > bug 1116055 is already crowded. aceman, Thank you for getting the review process started. I will be on a three day business trip starting tomorrow (well, actually it is already past midnight in Japan). So I will come back to this patch and related issues later next week, hopefully Wednesday night or Thursday. TIA
Blocks: 1306914
Yes, this is the patch that was originally given r+ by Neal, and updated to take care of bitrot since that time, etc. (My PC at home went through a very bad hardware problem in the summer and early fall of 2015 and I could not follow it up quickly enough.) (In reply to :aceman from comment #123) > Comment on attachment 8761265 [details] [diff] [review] > Updated patch to address bit rot (May 27th version) > > Review of attachment 8761265 [details] [diff] [review]: > ----------------------------------------------------------------- > > Is this the same patch that was already reviewed by Neil? If so, we could > finish it now. Yes, let's do it. > > ::: mailnews/base/util/nsMsgUtils.cpp > @@ +95,5 @@ > > + const nsAString &aFilename, > > + uint32_t aLinenumber, > > + uint32_t flags) > > +{ > > +// adapted from nsContentUtils::LogSmpleConsoleError > > Typo, LogSimpleConsoleError. > fixed. > ::: mailnews/local/src/nsPop3Sink.cpp > @@ +534,5 @@ > > > > if (closure) > > *closure = (void*) this; > > > > + /* May 1st. 2015: Something in the file handling changed and we > > Please prefix these comment blocks with "Bug 1116055". I put "Bug 1116055" in a few places. > @@ +702,5 @@ > > return WriteLineToMailbox(m_outputBuffer); > > } > > > > +/* #define DISABLE_SEEK_POS_CHECK */ > > +/* #define DISABLE_SEEK 1 */ > > Please set up these variables or the ifdefs below so that we only report the > seek() problems, but never disable the seek for now. Maybe the use of defined() or !defined() was not quite intuitive. I changed that to read: // When you define the following macros to 1, // you disable the functions: // #define DISABLE_SEEK_POS_CHECK 1 // #define DISABLE_SEEK 1 > @@ +787,5 @@ > > + So we switched to stderr printing during debugging... > > + */ > > +#if 0 > > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > +#else > > Is this 'if 0' block needed? Nope when I think about it. So it is removed with #else and #endif. > @@ +788,5 @@ > > + */ > > +#if 0 > > + MSG_LOG_INFO_TO_CONSOLE("We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > +#else > > + fprintf(stderr, "We have disabled a |Seek()| in WriteLineToMailbox. cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1116055"); > > Can we point the users to a new bug where we collect the reports? As this > bug 1116055 is already crowded. OK, I created Bug 1306914 - Report a file position issue: unnecessary |Seek()| is removed (re: Performance issue: Failure to use buffered write (comm-central thunderbird)) for the sole purpose of collecting the report of inconsistent file seek position, and reflect the bug entry number in the code. TIA
Attachment #8761265 - Attachment is obsolete: true
Attachment #8796887 - Flags: review+
Blocks: 1242030
It seems to me the needinfo for neil can be cleared per comment 108, no?
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #126) > It seems to me the needinfo for neil can be cleared per comment 108, no? I was under the impression that Neal's review is no longer needed for taking care of bitrot ? But I think it is better to hear his opinion.
Flags: needinfo?(ishikawa)
Blocks: 939548
Comment on attachment 8796887 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 7) modified based comments of aceman as of late carrying on r=neal Review of attachment 8796887 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this is getting ready. Hopefully last round of polishing. ::: mailnews/local/src/nsPop3Sink.cpp @@ +564,5 @@ > + > + if (first_pre_seek_pos != first_post_seek_pos) { > + /* This is to be expected for direct download into Inbox > + (only when buffering is enabled, it seems. The seek > + position seems to be rewound to the begining depending on Typo, beginning. @@ +567,5 @@ > + (only when buffering is enabled, it seems. The seek > + position seems to be rewound to the begining depending on > + the particular function we use for enabling the > + buffering! That makes sense. This may explain the > + corruption of messages when buffering was enabled? No, it One space missing. @@ +701,5 @@ > > return WriteLineToMailbox(m_outputBuffer); > } > > +// When you define the following macros to 1, Please add that these defines are for bug 1116055 and bug 1306914 only. @@ +711,5 @@ > nsresult nsPop3Sink::WriteLineToMailbox(const nsACString& buffer) > { > +#define HOWOFTEN 100000 > + > +#if DISABLE_SEEK != 1 Sorry, I was not specific enough. We use '#ifdef' and '#ifndef' instead of '#if defined' and '#if ! defined'. Please use those instead of checking the value of '1'. The change of name of the defines to DISABLE_ is fine, thanks. @@ +731,5 @@ > + // But now I need to add a |Seek()| before this routine gets > + // called in |IncorporateBegin()| in local testing. > + // This one-time Seek() becomes necessary when buffering is > + // eventually turned on. Turning on buffering by certain function > + // call rewinds the seek position to the beginnig! Typo, beginning. @@ +758,5 @@ > + nsresult rv3 = seekableOutStream->Tell(&after_seek_pos); > + NS_ASSERTION(NS_SUCCEEDED(rv3), "seekableOutStream->Tell(&before_seek_pos); failed."); > + > + if (NS_SUCCEEDED(rv2) && NS_SUCCEEDED(rv3)) { > + if( before_seek_pos != after_seek_pos ) { if (before_seek_pos != after_seek_pos) { @@ +772,5 @@ > + since_last_seek_report = 0; > + // printing message to console log using MSG_LOG_INFO_TO_CONSOLE(...) > + // causes such tests to be flagged as error during xpcshell test. And so > + // avoid that for now. > + NS_WARNING("Periodic check of unnecessary Seek() in WriteLineToMailbox() during BETA testing Please ignore."); Is this output needed to be landed in TB trunk? Or was it only for your experiments?
Attachment #8796887 - Flags: feedback+
I have addressed the previous comment in this patch. (In reply to :aceman from comment #128) > Comment on attachment 8796887 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch. (Take 7) modified based comments of aceman as of late > carrying on r=neal > > Review of attachment 8796887 [details] [diff] [review]: > ----------------------------------------------------------------- > > Thanks, this is getting ready. > Hopefully last round of polishing. Thank you again for your review. It has been helpful. > > ::: mailnews/local/src/nsPop3Sink.cpp > @@ +564,5 @@ > > + > > + if (first_pre_seek_pos != first_post_seek_pos) { > > + /* This is to be expected for direct download into Inbox > > + (only when buffering is enabled, it seems. The seek > > + position seems to be rewound to the begining depending on > > Typo, beginning. Fixed. > > @@ +567,5 @@ > > + (only when buffering is enabled, it seems. The seek > > + position seems to be rewound to the begining depending on > > + the particular function we use for enabling the > > + buffering! That makes sense. This may explain the > > + corruption of messages when buffering was enabled? No, it > > One space missing. Fixed. > > @@ +701,5 @@ > > > > return WriteLineToMailbox(m_outputBuffer); > > } > > > > +// When you define the following macros to 1, > > Please add that these defines are for bug 1116055 and bug 1306914 only. Added a comment. I took the liberty of adding a couple of lines to explain exactly what functions the macros disable. > > @@ +711,5 @@ > > nsresult nsPop3Sink::WriteLineToMailbox(const nsACString& buffer) > > { > > +#define HOWOFTEN 100000 > > + > > +#if DISABLE_SEEK != 1 > > Sorry, I was not specific enough. We use '#ifdef' and '#ifndef' instead of > '#if defined' and '#if ! defined'. Please use those instead of checking the > value of '1'. The change of name of the defines to DISABLE_ is fine, thanks. I see. Done/fixed. > @@ +731,5 @@ > > + // But now I need to add a |Seek()| before this routine gets > > + // called in |IncorporateBegin()| in local testing. > > + // This one-time Seek() becomes necessary when buffering is > > + // eventually turned on. Turning on buffering by certain function > > + // call rewinds the seek position to the beginnig! > > Typo, beginning. Fixed. > > @@ +758,5 @@ > > + nsresult rv3 = seekableOutStream->Tell(&after_seek_pos); > > + NS_ASSERTION(NS_SUCCEEDED(rv3), "seekableOutStream->Tell(&before_seek_pos); failed."); > > + > > + if (NS_SUCCEEDED(rv2) && NS_SUCCEEDED(rv3)) { > > + if( before_seek_pos != after_seek_pos ) { > > if (before_seek_pos != after_seek_pos) { > Fixed. > @@ +772,5 @@ > > + since_last_seek_report = 0; > > + // printing message to console log using MSG_LOG_INFO_TO_CONSOLE(...) > > + // causes such tests to be flagged as error during xpcshell test. And so > > + // avoid that for now. > > + NS_WARNING("Periodic check of unnecessary Seek() in WriteLineToMailbox() during BETA testing Please ignore."); > > Is this output needed to be landed in TB trunk? Or was it only for your > experiments? You may have already realized, but this has been meant to be printed during BETA period to make sure that the consistency check code for seek position is working. Sort of poor man's telemetry. Once the code is known to be working correctly, this can be disabled. TIA
Attachment #8796887 - Attachment is obsolete: true
Attachment #8797832 - Flags: review+
OK, but please change '#if DEBUG' to '#ifdef DEBUG' when landing this.
Status: NEW → ASSIGNED
Flags: needinfo?(neil)
Keywords: checkin-needed
This needs further cleaning up. Clearing checkin-needed for now.
Keywords: checkin-needed
(In reply to :aceman from comment #130) > OK, but please change '#if DEBUG' to '#ifdef DEBUG' when landing this. Ouch, I missed that. I will clean them up (and other patches that follow need to be cleaned up, too since they use the same construct and of course, there will be merge conflict.) Oh well. TIA
Don't worry. I'm just doing the clean-up now.
Which means Jorg will clean the patch in this bug. Chiaki, you can still do the changes in patches in other bugs. Thanks.
Clean-up: Removed comments which were just rambling on. Those comments don't help to understand the code. Removed further comments that tried to document where code had come from. Cleaned up white-space issues and cut and paste errors. This is NOT ready for checking since there are compile warnings on Windows: 0:11.78 Warning: C4477 in c:\mozilla-source\comm-central\mailnews\local\src\nsPop3Sink.cpp: 'fprintf' : format string '%08lx' requires an argument of type 'unsigned long', but variadic argument 1 has type 'int64_t' 0:11.78 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): warning C4477: 'fprintf' : format string '%08lx' requires an argument of type 'unsigned long', but variadic argument 1 has type 'int64_t' 0:11.78 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): note: consider using '%llx' in the format string 0:11.78 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): note: consider using '%I64x' in the format string 0:11.78 Warning: C4477 in c:\mozilla-source\comm-central\mailnews\local\src\nsPop3Sink.cpp: 'fprintf' : format string '%08lx' requires an argument of type 'unsigned long', but variadic argument 2 has type 'int64_t' 0:11.79 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): warning C4477: 'fprintf' : format string '%08lx' requires an argument of type 'unsigned long', but variadic argument 2 has type 'int64_t' 0:11.79 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): note: consider using '%llx' in the format string 0:11.79 c:/mozilla-source/comm-central/mailnews/local/src/nsPop3Sink.cpp(743): note: consider using '%I64x' in the format string Printing long ints is tricky since the different platforms use different format strings, see attachment 8796209 [details] [diff] [review] for an example. Search for I64d. Stand-by for further review comments. Please fix the issues and answer the questions and then present again for review. BTW, Neil is called Neil and not Neal.
Attachment #8632789 - Attachment is obsolete: true
Attachment #8797832 - Attachment is obsolete: true
Comment on attachment 8798216 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch. Review of attachment 8798216 [details] [diff] [review]: ----------------------------------------------------------------- This needs another round. Also please check that my reworking of the patch hasn't broken it. ::: mailnews/local/src/nsPop3Sink.cpp @@ +548,5 @@ > + // Debugging, see bug 1116055. > + int64_t first_post_seek_pos; > + nsresult rv4 = seekableOutStream->Tell(&first_post_seek_pos); > + if (NS_SUCCEEDED(rv3) && NS_SUCCEEDED(rv4)) { > + fprintf(stderr,"(debug) first_pre_seek_pos = 0x%016llx, first_post_seek_pos=0x%016llx\n", Here you use 016llx. @@ +699,5 @@ > + } > + } > +#else > +#define HOWOFTEN 100000 > + static int since_last_seek_report = (HOWOFTEN - 2); Please explain the logic behind this. @@ +739,5 @@ > + // This merits a console message. > + MSG_LOG_ERR_TO_CONSOLE("WriteLineToMailbox detected an unexpected file position change. " > + "If you can reliably reproduce this message, please report the steps you used to bug 1306914."); > +#ifdef DEBUG > + fprintf(stderr,"(debug) before_seek_pos=0x%08lx, after_seek_pos=0x%08lx\n", And here you use 08lx. The data type is int64_t here and above. @@ +742,5 @@ > +#ifdef DEBUG > + fprintf(stderr,"(debug) before_seek_pos=0x%08lx, after_seek_pos=0x%08lx\n", > + before_seek_pos, after_seek_pos); > +#endif > + } else if (since_last_seek_report >= HOWOFTEN) { If I read this correctly, since you start at HOWOFTEN-2, the second one will get NS_WARNING, and then we wait 100.000 and give a warning again. I think this entire logic can be removed since the warnings don't help at all. @@ +746,5 @@ > + } else if (since_last_seek_report >= HOWOFTEN) { > + since_last_seek_report = 0; > + // Printing message to console log using MSG_LOG_INFO_TO_CONSOLE(...) > + // causes xpcshell tests to be flagged as error. > + NS_WARNING("Periodic check of unnecessary Seek() in WriteLineToMailbox() during BETA testing. Please ignore."); BETA?? Why do I need a warning that I should ignore?
Fixed print issues, now using 0x%016llx everywhere.
Attachment #8798216 - Attachment is obsolete: true
OK, I made all the changes I wanted. If you agree, I'll land the patch.
Attachment #8798222 - Attachment is obsolete: true
Attachment #8798224 - Flags: feedback?(ishikawa)
Sorry about the noise. I killed some more unused stuff.
Attachment #8798224 - Attachment is obsolete: true
Attachment #8798224 - Flags: feedback?(ishikawa)
Attachment #8798226 - Flags: feedback?(ishikawa)
Just making sure this compiles on Mac due to the 0x%016llx format descriptor for int64_t. https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=1dae1c56f370abc181b3fbb1130f777369ee23b9 I can see it compiles on Windows, I assume it compiles on Linux, since this is where the patch came from, so we need to cover Mac.
Fixed more white-space issues. Now it looks good and ready for landing. Who will run this with DISABLE_SEEK or DISABLE_SEEK_POS_CHECK set?
Attachment #8798226 - Attachment is obsolete: true
Attachment #8798226 - Flags: feedback?(ishikawa)
Attachment #8798332 - Flags: feedback?(ishikawa)
Attachment #8798332 - Flags: feedback?(acelists)
Dear Jorg, Thank you for all the cleanup work. I appreciate it very much. The format string issues were noticed this summer, and I checked and "fixed" them in my patch sets, but unfortunately, by mistake, the change of format strings ended up in the later patch and did not appear in this patch. Tough. About the message that appears every 100 writes or so, it was meant for me to make sure that the version of TB was compiled with the internal check and not seeing the warning message was OK and not because of build screw up and the check code was not compiled in. Thinking that probably only a few people including me will check the TB's operation in this regard, I can safely put a debug code in my local build when there is a problem about this series of patch sets. There are couple of requests with your patch posted. First: you removed some definitions from MsgUtils.h since they are NOT used in THIS particular patch. But some macros are used later in my series. For me, it was very natural to put the macros in this first patch since they are going to be used later in the patch sets. So if you are OK, can you revive the macros in the header file. (Your patch also missed the last ending newline.) [If you don't agree, you can omit them. I need to re-create some patches to accommodate that.] Second: the format string compiles OK under linux, but actually, even under linux we need a cast. The format string causes a couple of warnings to be printed. (I "fixed" them in my older patch, but again it came too late in the subsequent patch.) I put the cast (long long unsigned int) in the uploaded patch. The header file can wait. But this cast ought to go in. We don't know when linux GCC starts to make this a fatal error on try-comm-central and local build. As for the definitions of DISABLE_* macros in my patch series, grep "#define DISABLE" * check_fileptr_change-jorgk-v5-ci-r1.patch:+// #define DISABLE_SEEK_POS_CHECK 1 check_fileptr_change-jorgk-v5-ci-r1.patch:+// #define DISABLE_SEEK 1 fix-close-MERGED-local-dir-Rev02.patch: // #define DISABLE_SEEK_POS_CHECK 1 fix-close-MERGED-local-dir-Rev02.patch:-// #define DISABLE_SEEK 1 fix-close-MERGED-local-dir-Rev02.patch:+#define DISABLE_SEEK 1 removing-a-Seek-rev02.patch:-// #define DISABLE_SEEK_POS_CHECK 1 removing-a-Seek-rev02.patch:+#define DISABLE_SEEK_POS_CHECK 1 removing-a-Seek-rev02.patch: #define DISABLE_SEEK 1 So disabling the unnecessary |Seek()| is done in fix-close-MERGED-local-dir-Rev02.patch which is patch 1242030-d in Bug 1242030, and disabling the file seek position check is done finally in removing-a-Seek-rev02.patch in Bug Bug 1242046 Timeline: bug 1116055 (this bugzilla) => bug 939548 -> bug 1242030 (extra |Seek()| is removed) -> bug 1242042 -> bug 1176857 -> bug 1242046 (seek position check is removed finally here.) -> bug 1242050 -> bug 1170606 TIA PS: I will post the patches in the subsequent buzilla entries that have been recreated with the posted patch in mind. (I may not be able to keep up with the recreation of the patches later this week although I try.)
Attachment #8798386 - Flags: review+
Attachment #8798332 - Flags: feedback?(ishikawa) → feedback+
Gee, I should have hit feedback- since I request at least an addition of casts.
Attachment #8798332 - Flags: feedback?(acelists)
Attachment #8798332 - Flags: feedback-
Attachment #8798332 - Flags: feedback+
Oh my goodness. Bugzilla UI needs to be improved, I tried to reset my feedback to "-", but somehow I ended up deleting the request entry for aceman (!) Jorg, I believe you will understand the cast issue from the comment 142. Aceman, sorry. Can you comment on the Jorg's v5 patch and my tweak to the patch?
Flags: needinfo?(acelists)
(In reply to ISHIKAWA, Chiaki from comment #142) > About the message that appears every 100 writes or so, it was meant for me > to make sure that the version of TB was compiled with the internal check and > not seeing the warning message was OK and not because of build screw up and > the check code was not compiled in. Let's trust the compiler ;-) Also, you had it every 100,000 checks. > But some macros are used later in my series. OK. > I put the cast (long long unsigned int) in the uploaded patch. OK. We can use the new patch. Now, what I like to understand is this, preferably without reading through five or more bugs. The aim is to eliminate superfluous Seek() calls and finally move to a buffered write. This patch here does this: 1) It adds a "seek to end" in IncorporateBegin() together with some checks. No effect for the user. 2) In IncorporateWrite() it adds another set of checks and the possibility to turn off seeking be setting DISABLE_SEEK. You can also disable the checking be setting DISABLE_SEEK_POS_CHECK. There is absolutely no change in processing other than writing a message to the console asking the user to report certain things. I understand that you added 1). The idea is to seek to the end once and that's it, or perhaps not at all if it turns out unnecessary. I assume you will be scanning Xpcshell and Mozmill logs to see. But why do you need DISABLE_SEEK and DISABLE_SEEK_POS_CHECK? They are currently not defined, so the code will just compile. If you want to disable seek, you could just remove the code later instead of disabling it by setting DISABLE_SEEK. DISABLE_SEEK_POS_CHECK is even less necessary. Either the checking code is removed later or the seeking and checking code is removed. Personally I'd remove those defines altogether.
Attachment #8798332 - Attachment is obsolete: true
Flags: needinfo?(acelists)
Attachment #8798386 - Flags: feedback?(acelists)
Comment on attachment 8798386 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v5) with a couple of tweaks. Frankly, this hasn't passed my review yet. I want to understand what those defines are for. Currently they make no sense and and if we want to disable certain code later, we can disable it later or remove it later.
Attachment #8798386 - Flags: review+
OK, I got rid of the defines. Also, I changed the reporting to e-mail dev-apps-thunderbird@lists.mozilla.org. I don't think people will open a BMO account to report this. If you give them an e-mail, they might send an e-mail. Also, I'm dumping this out as a debug message, so you can see it in the debug log next to the other debug output.
Attachment #8798415 - Flags: feedback?(ishikawa)
Attachment #8798415 - Flags: feedback?(acelists)
(In reply to Jorg K (GMT+2) from comment #147) > Created attachment 8798415 [details] [diff] [review] > Check fileposition before and after a supposedly unnecessary seek() and > report mismatch (JK v6). > > OK, I got rid of the defines. The use of defines were to make it easy to disable the features (enable back if the change was bad !). I wanted to make it easy to explain what happens in the series of patches, bug 1116055 (this bugzilla that enables the check) => bug 939548 -> bug 1242030 (extra |Seek()| is removed) <--- ESPECIALLY HERE -> bug 1242042 -> bug 1176857 -> bug 1242046 (seek position check is removed finally here.) <--- AND HERE. -> bug 1242050 -> bug 1170606 bug 1242045 is basically a one liner now. Bug from the look of past activities, if you and aceman are fine with removing the defines and understand what happens in bug 1242030 and bug 1242045 easily, I am OK with that. > Also, I changed the reporting to e-mail > dev-apps-thunderbird@lists.mozilla.org. I don't think people will open a BMO > account to report this. If you give them an e-mail, they might send an > e-mail. Do we have to open a BMO account to report a bug?! It has been years since I began reporting bugs. I forgot about it. Grrr... Reporting bugs should be easier than that. > > Also, I'm dumping this out as a debug message, so you can see it in the > debug log next to the other debug output. All right. I think the priority here is to get this patch landed so that the subsequent ones can be checked in. I will rebase the patches in the subsequent bugzilla entries and upload them, hopefully tonight (JST, Japan Standard Time.). Thank you again for the clean up work and comments. It is great to have someone who looks in the code and knows the outline of the changes, albeitvaguely. (The source file itself of C-C TB, and hg repo, and now defunct cvs repo don't cut it when something is mysterious and hard to figure out. We need someone who looked at the code before, preferably trying to understand it, and recalls the motivation behind the changes, history, etc. In this sense, C-C TB is in a precarious state, but since TB is an important tool for work myself, I will try to support it.)
Attachment #8798415 - Flags: feedback?(ishikawa) → feedback+
(In reply to ISHIKAWA, Chiaki from comment #148) > The use of defines were to make it easy to disable the features (enable back > if the change was bad !). Each bug needs to be understandable by itself. This bug is about looking at what happens with the Seek(). I assume we will leave this in for a while to collect data. In general, if changes are bad, they get backed out. If the aim was to remove the Seek() then we could remove it straight away. But I understand that you want to remove it later, so later is when you remove it. In the bug that removes it, we can discuss whether you just remove the code or do: #if 0 THIS CODE HAS BEEN REMOVED BY BUG xxx. WE NO LONGER SEEK. CODE IS LEFT IN PLACE FOR REFERENCE ONLY. ... #endif. Typically we remove dead code, otherwise we would have a graveyard of superseded erroneous rubbish.
Attachment #8798386 - Attachment is obsolete: true
Attachment #8798386 - Flags: feedback?(acelists)
(In reply to Jorg K (GMT+2) from comment #149) > (In reply to ISHIKAWA, Chiaki from comment #148) > > The use of defines were to make it easy to disable the features (enable back > > if the change was bad !). > Each bug needs to be understandable by itself. This bug is about looking at > what happens with the Seek(). > I assume we will leave this in for a while to collect data. > > In general, if changes are bad, they get backed out. > > If the aim was to remove the Seek() then we could remove it straight away. > But I understand that you want to remove it later, so later is when you > remove it. In the bug that removes it, we can discuss whether you just > remove the code or do: > > #if 0 > THIS CODE HAS BEEN REMOVED BY BUG xxx. WE NO LONGER SEEK. > CODE IS LEFT IN PLACE FOR REFERENCE ONLY. > ... > #endif. > > Typically we remove dead code, otherwise we would have a graveyard of > superseded erroneous rubbish. Fair enough. I should explain why I am doing this roundabout way of writing this particular series of patches. When I proposed a simple-minded buffering patch a couple of years ago, I was advised to proceed with caution. This was because there had been an earlier attempt but the patch had to be backed out due to mysterious mail file corruptions. Back then nobody knew why the corruption happened. (I can't find where the comments are now. They are in the bugzilla comments.) So I broken down the patches: one to introduce internal check, one to remove the seek and report error, etc. This paid off although the series of patches is a little awkward to maintain. Also, I have tried to make the patches easier to explain by the breakdown. However, I am not sure if I succeeded in this respetct. Anyway, when I enabled buffering and tested the combination of {pop3, imap} x {mbox, maildir} using a remote file system to store my profile and mail store [and pulling off the network cable to simulate file system errors], the internal check did report seek position inconsistency which was bad. Eventually, I discovered that the resetting of seek position to the beginning by some buffer-enabling functions was to blame. I figured out after many trials that one |Seek()| per single message download is necessary for POP3 at least and imap seems to be handled satisfactorily now, and the code with the patches seems to work as far as xpcshell/mozmill test goes. Everything else in the series of patches is about taking care of the error return value of |Close()| now that the error can certainly occur when |Close()| is called and flushing buffer causes real write. Previously return value of Close were ignored in many places. A shame. And in doing so I uncovered a few issues regarding error recovery, some incorrect modifications of previous versions, etc. Since the patches inserted error checks to many I/O calls whose return value had not been checked before, the patches are lengthy. These error checks ought to lead to a better error recovery strategy in the face of network file system error, but that has to wait for another round of development (and I wish to learn how frequent these network file system I/O errors are.). TIA
Thanks for the explanation. Yes, we need to be careful not to smash peoples message stores ;-) (In reply to ISHIKAWA, Chiaki from comment #150) > I figured out after many trials that one |Seek()| per single message > download is necessary for POP3 at least and imap seems to be handled > satisfactorily now, and the code with the patches seems to work as far as > xpcshell/mozmill test goes. Yes, that's the Seek() in IncorporateBegin() and no one objects. > Everything else in the series of patches is about taking care of the error > return value of |Close()| now that the error can certainly occur when > |Close()| is called and flushing buffer causes real write. Well, we should really distinguish between cleaning up faulty or missing error handling, for which Kent is a specialist, and changing the main code path of the code that retrieves POP and IMAP messages for local storage. I think Wayne, Aceman and I are willing to see this through and we can ask Kent for help were required. I'll wait for Aceman's feedback and then land the patch later today so we can see whether we get any unexpected file position losses reported.
Another idea: I think I'll change the fprintf(stderr,"(debug) ... to fprintf(stderr,"(seekdebug) ... so those lines can be easily found in a long log file. OK?
I liked the defines as they clearly marked which code we need to remove later (not only the seek but also the debugging output). And as Chiaki says, it allows devs to easily turn off the seek to test the other patches in the stack to see if the seek was hampering the speed.
(In reply to Jorg K (GMT+2) from comment #152) > Another idea: I think I'll change the > fprintf(stderr,"(debug) ... > to > fprintf(stderr,"(seekdebug) ... > so those lines can be easily found in a long log file. OK? Come to think of it, this is OK. (In reply to :aceman from comment #153) > I liked the defines as they clearly marked which code we need to remove > later (not only the seek but also the debugging output). > > And as Chiaki says, it allows devs to easily turn off the seek to test the > other patches in the stack to see if the seek was hampering the speed. Jorg, actually I share this sentiment with Aceman. As a matter of fact, it has not been quite easy to figure out which part should be removed or kept when the {#ifdef,#ifndef}[/#else]/#endif markers are gone. I found this out the hard way when I tried to create follow up patches. Please recall that these patches could have been written in one chunk, but were divided into separate chunks for a few reasons, and they are very much inter-dependent. If you still insist, I would suggest removing the dead code (i.e., getting rid of the #ifdef/#else/#endif) maybe a few weeks AFTER 1242046 landed. Changing the #ifdef,#ifndef,#else,#endif construct at this stage turned out to be so error-prone in rebasing follow-up patches in 1242030. bug 1116055 (this bugzilla that enables the check) => bug 939548 -> bug 1242030 (extra |Seek()| is removed) <--- ESPECIALLY HERE -> bug 1242042 -> bug 1176857 -> bug 1242046 (seek position check is removed finally here.) <--- AND HERE. -> bug 1242050 (already landed) -> bug 1170606 (this can wait) What do you think?
This is badly done from beginning to end. If the aim was to move to buffered write, do it in one bug. There can be multiple parts which can land independently. This all seems like a *big* WIP to me. We want to kill the seek, but we're not brave enough to do it. In this case, the developer needs to compile his own version and run with it for a few months to convince himself that all is good, then land the patch. We developing a product here this is not a WIP playground. We don't need to make it easy for a developer to change other parts of the system in a single line. I explained it in comment #149 that you can disable the seek with two lines: #if 0 ... #endif. So here we're landing something that we already shouldn't be doing, but never mind. Perhaps we can gather some information.
Summary: Performance issue: Failure to use buffered write (comm-central thunderbird) → Performance issue: Failure to use buffered write - Part 1: Collect file seek position information.
By popular demand, I brought one define back. But nothing of the #ifndef where one has to ask oneself whether it's defined or not, perhaps as a compiler option. This clearly shows the section to be removed later. Last offer ;-)
Attachment #8798415 - Attachment is obsolete: true
Attachment #8798415 - Flags: feedback?(acelists)
Attachment #8798495 - Flags: feedback?(acelists)
(In reply to Jorg K (GMT+2) from comment #156) > This is badly done from beginning to end. > > If the aim was to move to buffered write, do it in one bug. There can be > multiple parts which can land independently. This all seems like a *big* WIP > to me. We want to kill the seek, but we're not brave enough to do it. In > this case, the developer needs to compile his own version and run with it > for a few months to convince himself that all is good, then land the patch. > > We developing a product here this is not a WIP playground. We don't need to > make it easy for a developer to change other parts of the system in a single > line. > I am afraid that you get my approach wrong. I did it one chunk originally. I split the patch into a few chunks exactly to gather information from various beta testers (nightly users) for the sake of collecting information based on some comments. And this information is to support the tacit assumption in the patch that only one |Seek()| is necessary for one message each by checking the seek position. > I explained it in comment #149 that you can disable the seek with two lines: > #if 0 ... #endif. > The marker (one ifdef you resurrected is more than enough as a marker for editing.) Thanks. > So here we're landing something that we already shouldn't be doing, but > never mind. Perhaps we can gather some information. Yes, the aim is exactly to gather some information whether there is an esoteric system where the assumption about the |Seek()| does not hold along the way. Very unlikely, but who knows. Given the number of nightly users (surprisingly high to my taste, and some do use them for real work !), we can safely check sanity of the patch set approach about killing the |Seek()|. bug 1116055 (this bugzilla that enables the check) => bug 939548 (could be split into two, one being merged into bug 1242030) -> bug 1242030 (extra |Seek()| is removed) <--- ESPECIALLY HERE -> bug 1242042 -> bug 1176857 -> bug 1242046 (seek position check is removed finally here.) <--- AND HERE. -> bug 1242050 (already landed) -> bug 1170606 (this can wait) (cf. Would someone be able to check the gigantic patch that is the merge of 1116055-1242046? If so, I would have stuck to the original one chunk approach.) During the application of the above patches, if there is a consistent problem report regarding the seek position from a user or two but not from others, I would want to learn the hardware configuration, software version, etc. of such users. My take at this moment is that of a suspicion about their unique hardware/software problem. Unless a large number of users with similar hardware/software report the issues, I would disregard the reports as false positives. At least over the last dozen months or so, a few corporate users who need to use remote file system for their profile and mail store tried the binaries created on try-comm-central using my patches and follow-up patches and they did not report any seek position issues, but they reported positive speed improvement due to the less number of read system calls issued locally and thrown at the remote servers. TIA
Comment on attachment 8798495 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v7). Review of attachment 8798495 [details] [diff] [review]: ----------------------------------------------------------------- So you merged the 2 defines into one. I can live with that :) The reduced comments are OK, the patch is more readable now, thanks. ::: mailnews/local/src/nsPop3Sink.cpp @@ +713,5 @@ > + if (before_seek_pos != after_seek_pos) { > + // This merits a console message. > + MSG_LOG_ERR_TO_CONSOLE("Unexpected file position change detected. " > + "If you can reliably reproduce this, please report the steps " > + "you used to dev-apps-thunderbird@lists.mozilla.org."); Things can go lost in a mailinglist and also not everybody is subscribed there. I would prefer if you put back the bug number as an option too.
Attachment #8798495 - Flags: feedback?(acelists) → feedback+
Isn't it past 2 AM in Japan? (In reply to ISHIKAWA, Chiaki from comment #158) Never mind, let's land this and see how we can gather information. About large and complicated bugs: They are done in parts in one bug and the reviewer looks at one part at a time. I can give you references of you want. I'll take a look at all the other bugs you have in the pipeline. Please don't confuse Daily/Nightly with Aurora or Beta. They way the patch is written, it will even go into the product. Anyway, I discussed it with Aceman on IRC and we're happy to land what we have. I'll replace the NS_ASSERTION with MOZ_ASSERT. (In reply to :aceman from comment #159) > > + "you used to dev-apps-thunderbird@lists.mozilla.org."); > Things can go lost in a mailinglist and also not everybody is subscribed > there. I would prefer if you put back the bug number as an option too. Hmm, but normal users don't have an BMO account. I don't know what a good solution is here.
Comment on attachment 8798495 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v7). Review of attachment 8798495 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/local/src/nsPop3Sink.cpp @@ +713,5 @@ > + if (before_seek_pos != after_seek_pos) { > + // This merits a console message. > + MSG_LOG_ERR_TO_CONSOLE("Unexpected file position change detected. " > + "If you can reliably reproduce this, please report the steps " > + "you used to dev-apps-thunderbird@lists.mozilla.org."); Chuaki, are we able to print the folder name as part of the message? Also, I think the message should say what the benefit for the user could be, to motivate the reporting. E.g.: Unexpected file position change detected in folder XYZ. If you can reliably reproduce this, please report the steps you have done to dev-apps-thunderbird@lists.mozilla.org or into bug 123456 at bugzilla.mozilla.org. Removing this problem will allow speeding up message downloads.");
I'm looking at getting the folder name now. I'll change the message, too.
Now with folder name and new text. Do you want to open a new bug for the reporting?
Attachment #8798495 - Attachment is obsolete: true
Attachment #8798557 - Flags: feedback?(acelists)
I forced the message and this came out on the console: Unexpected file position change detected in folder Inbox. If you can reliably reproduce this, please report the steps you used to dev-apps-thunderbird@lists.mozilla.org or to bug XXX at bugzilla.mozilla.org. Resolving this problem will allow speeding up message downloads. So I assume this works. I will of course run this in a Daily and as a POP user I can see whether this is ever displayed.
Comment on attachment 8798557 [details] [diff] [review] Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v8). Review of attachment 8798557 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. Just put in the new bug number.
Attachment #8798557 - Flags: feedback?(acelists) → review+
https://hg.mozilla.org/comm-central/rev/72efbbd0811ea56ea2fedf4c78a774c04cd7aa38 Bug 1308335 will collect user feedback, if any ;-) This is just the beginning of switching to buffered write. Expect further action in bug 939548, bug 1242030, bug 1242042, bug 1176857 and bug 1242046.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 52.0
Hooray! Thank you very much. I got a bout of summer cold and thus I was sleeping and getting up from time to time in the last 24 hours. that is why I could send patches from time to time.) Then, this morning my display died, I have switched to a moth-balled display. But going from 1920 x 1600 to 1600 x 1200 hurts. This weekend is not going to be fun :-( I will try to rebase other patches based on the final patch from this bugzilla entry, but don't hold your breath. It may be Monday morning (a national holiday in Japan) when I can upload the synced patches. (I have already uploaded the patches based on v7. So the interim review can be done reasonably without difficulty.)
(In reply to ISHIKAWA, Chiaki from comment #167) > So the interim review can be done reasonably without difficulty. There won't be any review if you don't request it ;-)
I've looked at the debug runs of https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=72efbbd0811ea56ea2fedf4c78a774c04cd7aa38 looking for "seekdebug" but didn't find any. How do you intend to find cases where the Seek() was necessary, either in IncorporateBegin() or WriteLineToMailbox()? Users may report the latter, but it will be hard to catch the former.
(In reply to Jorg K (GMT+2) from comment #169) > I've looked at the debug runs of > https://treeherder.mozilla.org/#/jobs?repo=comm- > central&revision=72efbbd0811ea56ea2fedf4c78a774c04cd7aa38 > looking for "seekdebug" but didn't find any. How do you intend to find cases > where the Seek() was necessary, either in IncorporateBegin() or > WriteLineToMailbox()? Users may report the latter, but it will be hard to > catch the former. Maybe I was not clear enough (or I am missing something). Until the buffering is enabled, we won't see the necessity of |Seek()| in IncorporateBegin because the file pointer is not reset and stays as expected by the following write. But ONCE the buffering is enabled, |Seek()| must be issued once in IncorporateBegin. I will submit jobs to show you the messages for the required |Seek()|. The latter report: If, in the course of applying the patches, a sizeable user population reports that the |Seek()| is required inside WriteLineToMailbox(), then something definitely is wrong! (OK, I admit the the title of the bugzilla entries are somewhat off to what the patches do in each entry after so many months. I think I will clean them up tomorrow.) bug 1116055 (this bugzilla that enables the check) => bug 939548 (could be split into two, one being merged into bug 1242030) -> bug 1242030 (extra |Seek()| is removed) -> bug 1242042 Enabling buffering for file stream to write message for C-C TB (This enables buffering). *Note*: We will start to see the necessity of one |Seek()| in IncorporateBegin only here or after the next patch. -> bug 1176857 C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) *Note*: this may be also the reason why we need one |Seek()| in IncorporateBegin. So we may start to see the warning here. -> bug 1242046 Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O (seek position check is removed finally here.) -> bug 1242050 (already landed) -> bug 1170606 (this can wait) The title of the patches are a little outdated (or off the mark from the patches posted there) after so many months (almost a year or two). I will clean them up tomorrow.
Now I am a little puzzled why I don't see the "Seek was necessary message" in my local mozmill log. Hmm. But don't think the patch was in vain. The following log lines are from the TTY console where I invoked C-C TB with the patches up to just before removing the extra |Seek()| together with the inconsistency check, and downloading a few messages from local pop3 server. YOU CAN SEE CLEARLY such lines as (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. (seekdebug) first_pre_seek_pos = 0x0000000000000000, first_post_seek_pos=0x0000000000629156 The patches are applied bug 1116055 (this bugzilla that enables the check) => bug 939548 (could be split into two, one being merged into bug 1242030) -> bug 1242030 (extra |Seek()| is removed) -> bug 1242042 Enabling buffering for file stream to write message for C-C TB (This enables buffering). *Note*: We will start to see the necessity of one |Seek()| in IncorporateBegin only here or after the next patch. -> bug 1176857 C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) *Note*: this may be also the reason why we need one |Seek()| in IncorporateBegin. So we may start to see the warning here. ============== up to here ============= -> bug 1242046 Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O (seek position check is removed finally here.) -> bug 1242050 (already landed) -> bug 1170606 (this can wait) Excerpt from the message lines printed in TTY console. Yay, we now know which folder required the |Seek()| in this case, Inbox. Oh, I will try moving the files between folders and see other folder names show up or not. GetDiskSpaceAvailable returned: 6016299008 bytes Incorporate message begin: uidl string: 0000a7fd533eeb2c [19250] WARNING: (debug): IncorporateBegin: !m_downloadingToTempFile path: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 539 (debug) Creating buffered output stream to mboxFile=<</home/ishikawa/.thunderbird/wczlstuh.default/Mail/localhost/Inbox>> in nsMsgBrkMBoxStore::GetNewMsgOutputStream in nsMsgBrkMBoxStore.cpp; (debug) nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream, &rv);failed with rv=0x80004002 [19250] WARNING: (debug) We are enabling buffering for m_outFileStream in nsPop3Sink::IncorporateBegin in nsPop3Sink.cpp.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 611 620: m_outFileStream->Flush() returned 0x00000000 (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. (seekdebug) first_pre_seek_pos = 0x0000000000000000, first_post_seek_pos=0x0000000000629156 [19250] WARNING: (debug): ApplyForwardAndReplyFilter getting called.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 1249 Incorporate message complete. Incorporate message begin: uidl string: 0000a7fe533eeb2c [19250] WARNING: (debug): IncorporateBegin: !m_downloadingToTempFile path: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 539 (debug) Creating buffered output stream to mboxFile=<</home/ishikawa/.thunderbird/wczlstuh.default/Mail/localhost/Inbox>> in nsMsgBrkMBoxStore::GetNewMsgOutputStream in nsMsgBrkMBoxStore.cpp; (debug) nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream, &rv);failed with rv=0x80004002 [19250] WARNING: (debug) We are enabling buffering for m_outFileStream in nsPop3Sink::IncorporateBegin in nsPop3Sink.cpp.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 611 620: m_outFileStream->Flush() returned 0x00000000 (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. (seekdebug) first_pre_seek_pos = 0x0000000000000000, first_post_seek_pos=0x00000000006294db [19250] WARNING: (debug): ApplyForwardAndReplyFilter getting called.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 1249 Incorporate message complete. ++DOMWINDOW == 32 (0x791fc80) [pid = 19250] [serial = 32] [outer = 0x4948110] Incorporate message begin: uidl string: 0000a7ff533eeb2c [19250] WARNING: (debug): IncorporateBegin: !m_downloadingToTempFile path: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 539 (debug) Creating buffered output stream to mboxFile=<</home/ishikawa/.thunderbird/wczlstuh.default/Mail/localhost/Inbox>> in nsMsgBrkMBoxStore::GetNewMsgOutputStream in nsMsgBrkMBoxStore.cpp; (debug) nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream, &rv);failed with rv=0x80004002 [19250] WARNING: (debug) We are enabling buffering for m_outFileStream in nsPop3Sink::IncorporateBegin in nsPop3Sink.cpp.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 611 620: m_outFileStream->Flush() returned 0x00000000 (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. (seekdebug) first_pre_seek_pos = 0x0000000000000000, first_post_seek_pos=0x0000000000629860 BTW, you see the problem immediately. The file pointer position became 0 (!) after the buffered output was enabled. (That is, it was rewound to the beginning of the file. If it was reset to a value, something more complex and opaque, I would not have figured out the real cause.)
(In reply to ISHIKAWA, Chiaki from comment #171) > YOU CAN SEE CLEARLY such lines as > (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. I'm glad you like my work of adding the folder name which was landed in your name ;-)
Lost track of this. I planned put out a call for testing so we get feedback for bug 1308335. What types of users do we most/very much want to reach? Those with profiles on network drive? slow disk?
Blocks: 1308335
Flags: needinfo?(ishikawa)
No one will see the message just yet. Refer to comment #170: === Until the buffering is enabled, we won't see the necessity of |Seek()| in IncorporateBegin because the file pointer is not reset and stays as expected by the following write. But ONCE the buffering is enabled, |Seek()| must be issued once in IncorporateBegin. I will submit jobs to show you the messages for the required |Seek()|. ***>>> The latter report: If, **in the course of applying the patches**, a sizeable user population reports that the |Seek()| is required inside WriteLineToMailbox(), then something definitely is wrong! ===
(In reply to Jorg K (GMT+2) from comment #174) > No one will see the message just yet. Refer to comment #170: > > === > Until the buffering is enabled, we won't see the necessity of |Seek()| in > IncorporateBegin > because the file pointer is not reset and stays as expected by the following > write. > > But ONCE the buffering is enabled, |Seek()| must be issued once in > IncorporateBegin. > I will submit jobs to show you the messages for the required |Seek()|. > > ***>>> The latter report: If, **in the course of applying the patches**, a > sizeable user population reports > that the |Seek()| is required inside WriteLineToMailbox(), then something > definitely is wrong! > === Hi, Jorg is correct. But I still want Wayne to alert the user base who uses TB with one's profile (including mail store) on remote file system. Especially the linux user. (Maybe a small percentage, though.) The reason being is that many corruption issues (one from Eric comes to mind) is caused by the network file system error. And if the errors in congested enterprise network file systems is so rampant, it *IS* possible that those poor users even may see the warning about inconsistent file seek position EVEN today. That is because their network file system is so overloaded that some seek operation may not finish properly before the execution hits this code path. It is indeed a very rare and buggy situation, but I have a nagging suspicion that somewhat outdated SAMBA/CIFS implementation on the server-side may allow such situation to occur with enough frequency to harrass TB users: very rarely I hope, but still if you a few hundred thousand corporate users (?), I have no idea about this, you may see an error of this magnitude once a week or so??? TIA PS: if indeed such broken or congested network file servers are in place in today's corporate world, we should add error checks to all the Seek() calls and exit early.
Flags: needinfo?(ishikawa)

I will update the patches in mentioned bugzilla esp. comment 170 with the latest patches.
(I needed to take of the bitrot due to clang-format reformating , directory layout changes over the last 18 months or so.)

Flags: needinfo?(ishikawa)

One thing I do not understand is that the bug is marked as closed (but still ndoes not work on Linux with stored archived mail stored on a filer) and now you push new patches. Could you explain why the bug has been closed?

(In reply to Eric Valette from comment #177)

One thing I do not understand is that the bug is marked as closed (but still ndoes not work on Linux with stored archived mail stored on a filer) and now you push new patches. Could you explain why the bug has been closed?

I am not sure whether closed was the right choice.

But as in comment 171, this patch is only the first part of the series of patches to enable buffering write during mail download, copy, move, etc.

The series.

bug 1116055 (this bugzilla that enables the check)
=> bug 939548 (could be split into two, one being merged into bug 1242030)
-> bug 1242030
(extra |Seek()| is removed)
-> bug 1242042 Enabling buffering for file stream to write message for C-C TB
(This enables buffering).
Note: We will start to see the necessity of one |Seek()| in IncorporateBegin only here or
after the next patch.
-> bug 1176857 C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation)
Note: this may be also the reason why we need one |Seek()| in IncorporateBegin.
So we may start to see the warning here.
============== up to here =============
-> bug 1242046 Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O
(seek position check is removed finally here.)
-> bug 1242050 (already landed)
-> bug 1170606 (this can wait) <=========== THIS IS THE PATCH TO MAKE IT POSSIBLE
TO USE PROFILE ON REMOTE FILE SYSTEM.

And unfortunately, because of historical reasons, the patches for bug 1170606 (C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read) depend on the changes in the preceding patches for enabling buffered write.

So in hindsight, I probably should have renamed this bugzilla entry as something like "[META] Performance issue: Failure to use buffered write"
and let this bugzilla depend on all the patches mentioned:
bug 1242030
bug 1242042
bug 1176857
bug 1242046
bug 1242050 (already landed)

And let bug 1170606 depend on such a META entry (either this one, or a newly created META entry.) simply because of textual patch issues.

Since this bugzilla is about buffered WRITE,
and bug 1170606 is about SHORT READ,
they are orthogonal in nature and can be applied in parallel IN PRINCIPLE, but
since they are sprinkled together in many files with not-well-checked return values,
practically there are so many overlaps when we create patches, then one set MUST come before the other for successful compilation and
for historical reasons, patch set for buffered WRITE was created first.
Thus bug 1170606 have to wait.

So this bugzilla is closed but the rest of the patches are still pending to enable buffered write.
And the patches for short read depend on the patches for buffered write. There is a fixed ordering dictated by my patch sets.
Because of the directory laytout changes and clang-format formating changes, I need to update the patches with the ones that compile locally and on tryserver as well.
So that is the long answer.

Now, due to the size of the patch(es), it took a long time to check and review, and then
modify to accommodate various changes in M-C, C-C including the change of directory layout, and change of semantics of some I/O routines.
It has been an uphill battle to settle on a stable patch set.
But I think the time is ripe for installing a stable patch set now that
automatic c++ formating in place (no manual variation of layout or coding style),
I have been running mochitest and xpcshel tests using the TB binary with the local patch set, which now works again after accommodating the changes in the last 26 months or so.

So I am clearing my local patch queue one by one right now to get these series of patches landed in C-C.

Flags: needinfo?(ishikawa)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: