Closed Bug 1306914 Opened 8 years ago Closed 8 years ago

Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change.

Categories

(MailNews Core :: Networking: POP, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1116055

People

(Reporter: ishikawa, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

+++ This bug was initially created as a clone of Bug #1116055 +++ I have spotted a very conspicuous offender of not using buffered output where it can in comm-central thunderbird. I think as long as a buffer of reasonable size, even if it is 4KB or 16KB, is used, I/O performance will be much better and reasonably good. (especially for a remote file system, or imap case.) It does not have to be 128KB or much larger something, I think, although it would help for a local disk write and read. However, here I report that I have found a routine is copying message data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY LINE. ... the quote from the original bug continues at the end of this comment. This bug entry was created to see if users may encounter possible strange seek position issues when unnecessary |Seek()| that had been issued for each line of message saving (POP3 case), thus negating any effort to use buffering for output, have been removed. A set of patches has been created in bug 1116055 and friends that disables the extra |Seek()| calls. Unless we do this, buffering does not happen at all. In the patch, it was finally decided only ONE |Seek()| per message saving is absolutely necessary, but no more. However, to make sure the code is correct on all platforms, the patch inserts a probe to see if the implicit assumption about the file position is correct. Should a user experience a bug possibly related to the removing of |Seek()| issue, and see the message WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug THISBUGNUMBER, the user is encouraged to report to this bugzilla entry. TIA The quote from the original bug continues here. The culprit routine I found is NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) in the file comm-central/mailnews/local/src/nsLocalMailFolder.cpp And the problematic unbuffered write in question is http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111 2111 rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); This does not seem to use buffering at all. (Please check the excerpt of system call trace from strace under linux attached in the following.). I agree that, since this function seems to be in need of scanning the data for "From" at the beginning, and escaping "From" at the beginning of line, etc., it needs to SCAN the INPUT line by line, but there is NO REASON AT ALL TO USE UNBUFFERED *OUTPUT*! Repeat-by: Monitor and record the system calls issued by TB. I used strace under linux. strace -o /tmp/t.out TB-BINARY Start TB, and then copy a message with reasonably large binary attachment in one folder to another folder. I found the repetition of small write(fd, bug, len) with length parameter 74 MANY times. E.g.: Excerpt from a typical strace record. I only excerpted the interesting lines. There are many other system calls. The particular message I copied had an attachment of 1.2MB size. The repeated writes() have the first parameter (file descriptor) as 54. I searched backward in trace file and found that FD 54 is for the target folder. (folder-C) ... open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54 ... Extraction of the initial part of the writes to this file descriptor: (I think I created the message using HTML format mail. That is why you see a few lines of HTML main message body. But that is not of any concern. Concern is the long repetition of mime data that follows.) ... write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29 .... I omit clock_gettime() below ... write(54, "X-Mozilla-Keys: $label3 "..., 97) = 97 write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44 write(54, "X-Identity-Key: id1\r\n", 21) = 21 write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44 write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39 write(54, "From: ishikawa <ishikawa@example"..., 39) = 39 write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70 write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49 write(54, "MIME-Version: 1.0\r\n", 19) = 19 write(54, "To: nobody@example.com\r\n", 24) = 24 write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28 write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32 write(54, " boundary=\"------------000003080"..., 50) = 50 write(54, "\r\n", 2) = 2 write(54, "This is a multi-part message in "..., 46) = 46 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: text/html; charset"..., 45) = 45 write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33 write(54, "\r\n", 2) = 2 write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65 write(54, "<html>\r\n", 8) = 8 write(54, "<head>\r\n", 8) = 8 write(54, "</head>\r\n", 9) = 9 write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41 write(54, "<br>\r\n", 6) = 6 write(54, "<br>\r\n", 6) = 6 write(54, "</body>\r\n", 9) = 9 write(54, "</html>\r\n", 9) = 9 write(54, "\r\n", 2) = 2 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: application/octet-"..., 41) = 41 write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21 write(54, "Content-Transfer-Encoding: base6"..., 35) = 35 write(54, "Content-Disposition: attachment;"..., 34) = 34 write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25 write(54, "\r\n", 2) = 2 write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74 write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74 write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74 write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74 ... MANY lines of write with 74 octets.... write(54, "--------------000003080301040107"..., 42) = 42 write(54, "\r\n", 2) = 2 close(54) = 0 Observation: For this message with 1.2MB attachment, I counted 24016 write calls. (fgrep "write(54," /tmp/t.out | wc) Granted that the routine seems to be doing something about seeing "From", escaping "From" with ">From", etc., but it has *NO* reason to use UNBUFFERED WRITE here. It can buffer the output and then finally at the end of the loop it can flush or something. HOW TO FIX? I am not familiar with I/O routine inside mozilla code base. I am not sure where this file stream is opened [See the stack backtrace at the end.] and why it is not using buffered mode. rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); Is there a way to set the buffer-mode to m_fileStream inside CopyData (just in case there are multiple wayward callers)? Or can we simply replace this Write with a buffered output version, but what that function would be? [The stack trace is shown below for those who want to chase what is calling this function.] The above is for copying an existing message to another folder. (So it can happen offline.) When I look up CopyData in mxr database, I found another instance of CopyData (for imap) and that seems to have the same unbuffered output issue. (Or at least superficially the code looks the same. Unless "->Write" semantics is different (i.e., unbuffered vs buffered) we have the crawling I/O performance when we copy / download large message (when we write to a local folder) when imap is used, too! MXR: E.g. Other place(s) where CopyData was found: I searched the identifier CopyData using mxr: MXR is slightly buggy, I think: I only left mail/mailnews-related entries. Defined as a function in: mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations) line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This function, CopyData calls another function CopyDataToOutputStreamForAppend() and in it, we have 3303 { 3304 rv = outputStream->Write(start, 3305 end-start, 3306 &writeCount); 3307 rv = outputStream->Write(CRLF, 2, &writeCount); 3308 } We should make sure that these writes are buffered! (and flush at the end of the loop/return.) From what I read before about the slowness of I/O in a comment posted by IMAP user, probably these are NOT buffered, either. AND, we SHOULD CHECK the return value rv: there is NO ERROR checking here !? No wonder there are reports of IMAP users losing messages, etc. in the past? [Added comment: even if these UNBUFFERED writes failed, close will succeed! So checking here is very important! ] Defined as a function prototype in: mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg annotations) line 20 -- void copyData(in nsIInputStream aIStream, in long aLength); Defined as a variable in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize); line 2026 -- NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This is the function I found out about and report here. The questions I have are: Is there a way to set the buffer-mode to m_fileStream inside CopyDate (just in case there are other wayward callers)? Or can we simply replace this Write with a buffered output version? See Related bugs: Bug 558528 - Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize. Well, I have not used the aceman's patch in there yet since I wanted to find out where the offending code for calling write() repeatedly with smallish 74 octet length originated. I noticed this repeated write() of smallish size earlier this year when someone reported a serious issue when his mail store is on a remote file system (CIFS) and he seems to have used imap. I tried to re-create his problem locally by mounting Windows 7 file system from my local linux installation and in so doing noticed a few issues with CIFS code itself during a transient network error (which seems to have been fixed nicely), and a missing error code or two on TB's side, and in so doing noticed a trace of MANY write system calls with short (70+ octets) on wire. Such many small write calls over wire can kill performance easily, and the missing error handling was bad BTW, :aceman's patch there seems to handle many "read" side of the issue. The problem I am reporting is "write" side. Also, note that the above bug deals with buffer size when buffering is enabled. This bug is about a place where buffering is not used at all when it should. Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel I wanted to use the great profiler ( see https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G ). But it never seemed to work for me when I tried to use it against C-C TB until lately. But finally, thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems to work! (But I have not yet used that profiler for this particular performance issue because it has been only a day since I could get it working after a long fiasco with bug regarding loadinfo, Bug 1111304 - assertion failure loadinfo, this month. ) I wanted to get out the words fast so that something can be done during the holidays for this serious performance issue. But once I become familiar with the profiler and can offer performance figures using the profiler, I will. It looks the profiler could have saved me in my efforts to capture the traceback mentioned below. I see a traceback in the profile window somehow. This is the stack trace of C-C TB when the write with this small chunk is invoked. (gdb) where #0 write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315 #2 0x00007ffff04beccc in (anonymous namespace)::interposedWrite ( aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68 #3 0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114 #4 0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>, buf=<optimized out>, count=<optimized out>, result=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154 #5 0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData ( this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111 #6 0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable ( this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>, sourceOffset=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86 #7 0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse ( this=<optimized out>, inputStream=<optimized out>, sourceOffset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594 #8 0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState ( this=<optimized out>, url=<optimized out>, inputStream=<optimized out>, offset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679 #9 0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable ( this=<optimized out>, request=<optimized out>, ctxt=<optimized out>, inStr=<optimized out>, sourceOffset=<optimized out>, count=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353 #10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer ( this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609 #11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady ( this=<optimized out>, stream=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436 #12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88 #13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855 #14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265 #15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run ( this=<optimized out>, aDelegate=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99 #16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233 #17 0x00007ffff08853fc in RunHandler (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226 #18 MessageLoop::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200 #19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164 #20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281 #21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153 #22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229 #23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449 #24 0x000000000040378e in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195 #25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380 (gdb) info register Yeah, right. I had to set a breakpoint to "write" system call and checked the registers (this is under Debian GNU/Linux 64-bit) and when the passed length parameter is 74 in, I hit the breakpoint and voila. It took me a while to figure out gdb's esoteric command expression syntax, and I have to skip the major part of the execution without setting up the break point (because with the conditional breakpoint, the execution speed is much slower [maybe error message written to stderr causes one |write| call per character and thus slows down TB execution very much], and hitting control-C to revert back to gdb prompt when TB is finally waiting for user interaction after showing the three window panes, and THEN set the conditional breakpoint break write if ($rdx == 74) to gdb prompt and continue, then hit the breakpoint to obtain the above stack trace. The great profiler I mentioned above seems to record the repeated write nicely (since it is sampled many times, and at the sampled time, the profiler seems to record the stack trace, and so it would have been much easier for me to figure out which higher level routine invokes short writes) TIA
Blocks: 1242046
Sorry, this bugzilla entry might have been made by mistake. Please ignore it for the moment. I will close it if this was indeed was created by accident...
Sorry I was not clear enough (and I almost forgot myself !). This bugzilla is to collect the report of "WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914." which is printed if there is a system that does not honor the tacit assumption about the |Seek()| position in nsPop3Sink.cpp. This is crucial in the enabling-buffering patches. I am going to change the title a little bit.
When the user sees a seek position inconsistency during the beta testing period of patches to enable write buffering (see bug 1116055), C-C TB nightly would print the following message: WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914. Please report the type of OS, 32-bit vs 64-bit, and CPU type and report the exact line you see in the error log or in the console. Thank you!
Summary: Report a file position issue: unnecessary |Seek()| is removed (re: Performance issue: Failure to use buffered write (comm-central thunderbird)) → Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914.
We're landing this in bug 1116055.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Summary: Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914. → Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change.
We will collect user feedback in bug 1308335.
You need to log in before you can comment on or make changes to this bug.