1116055 - Performance issue: Failure to use buffered write - Part 1: Collect file seek position information.

Assignee

Description

•

10 years ago

I have spotted a very conspicuous offender of not using buffered output where it can in comm-central thunderbird. I think as long as a buffer of reasonable size, even if it is 4KB or 16KB, is used, I/O performance will be much better and reasonably good. (especially for a remote file system, or imap case.) It does not have to be 128KB or much larger something, I think, although it would help for a local disk write and read. However, here I report that I have found a routine is copying message data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY LINE. The culprit routine I found is NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) in the file comm-central/mailnews/local/src/nsLocalMailFolder.cpp And the problematic unbuffered write in question is http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111 2111 rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); This does not seem to use buffering at all. (Please check the excerpt of system call trace from strace under linux attached in the following.). I agree that, since this function seems to be in need of scanning the data for "From" at the beginning, and escaping "From" at the beginning of line, etc., it needs to SCAN the INPUT line by line, but there is NO REASON AT ALL TO USE UNBUFFERED *OUTPUT*! Repeat-by: Monitor and record the system calls issued by TB. I used strace under linux. strace -o /tmp/t.out TB-BINARY Start TB, and then copy a message with reasonably large binary attachment in one folder to another folder. I found the repetition of small write(fd, bug, len) with length parameter 74 MANY times. E.g.: Excerpt from a typical strace record. I only excerpted the interesting lines. There are many other system calls. The particular message I copied had an attachment of 1.2MB size. The repeated writes() have the first parameter (file descriptor) as 54. I searched backward in trace file and found that FD 54 is for the target folder. (folder-C) ... open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54 ... Extraction of the initial part of the writes to this file descriptor: (I think I created the message using HTML format mail. That is why you see a few lines of HTML main message body. But that is not of any concern. Concern is the long repetition of mime data that follows.) ... write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29 .... I omit clock_gettime() below ... write(54, "X-Mozilla-Keys: $label3 "..., 97) = 97 write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44 write(54, "X-Identity-Key: id1\r\n", 21) = 21 write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44 write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39 write(54, "From: ishikawa <ishikawa@example"..., 39) = 39 write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70 write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49 write(54, "MIME-Version: 1.0\r\n", 19) = 19 write(54, "To: nobody@example.com\r\n", 24) = 24 write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28 write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32 write(54, " boundary=\"------------000003080"..., 50) = 50 write(54, "\r\n", 2) = 2 write(54, "This is a multi-part message in "..., 46) = 46 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: text/html; charset"..., 45) = 45 write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33 write(54, "\r\n", 2) = 2 write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65 write(54, "<html>\r\n", 8) = 8 write(54, "<head>\r\n", 8) = 8 write(54, "</head>\r\n", 9) = 9 write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41 write(54, "<br>\r\n", 6) = 6 write(54, "<br>\r\n", 6) = 6 write(54, "</body>\r\n", 9) = 9 write(54, "</html>\r\n", 9) = 9 write(54, "\r\n", 2) = 2 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: application/octet-"..., 41) = 41 write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21 write(54, "Content-Transfer-Encoding: base6"..., 35) = 35 write(54, "Content-Disposition: attachment;"..., 34) = 34 write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25 write(54, "\r\n", 2) = 2 write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74 write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74 write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74 write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74 ... MANY lines of write with 74 octets.... write(54, "--------------000003080301040107"..., 42) = 42 write(54, "\r\n", 2) = 2 close(54) = 0 Observation: For this message with 1.2MB attachment, I counted 24016 write calls. (fgrep "write(54," /tmp/t.out | wc) Granted that the routine seems to be doing something about seeing "From", escaping "From" with ">From", etc., but it has *NO* reason to use UNBUFFERED WRITE here. It can buffer the output and then finally at the end of the loop it can flush or something. HOW TO FIX? I am not familiar with I/O routine inside mozilla code base. I am not sure where this file stream is opened [See the stack backtrace at the end.] and why it is not using buffered mode. rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); Is there a way to set the buffer-mode to m_fileStream inside CopyData (just in case there are multiple wayward callers)? Or can we simply replace this Write with a buffered output version, but what that function would be? [The stack trace is shown below for those who want to chase what is calling this function.] The above is for copying an existing message to another folder. (So it can happen offline.) When I look up CopyData in mxr database, I found another instance of CopyData (for imap) and that seems to have the same unbuffered output issue. (Or at least superficially the code looks the same. Unless "->Write" semantics is different (i.e., unbuffered vs buffered) we have the crawling I/O performance when we copy / download large message (when we write to a local folder) when imap is used, too! MXR: E.g. Other place(s) where CopyData was found: I searched the identifier CopyData using mxr: MXR is slightly buggy, I think: I only left mail/mailnews-related entries. Defined as a function in: mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations) line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This function, CopyData calls another function CopyDataToOutputStreamForAppend() and in it, we have 3303 { 3304 rv = outputStream->Write(start, 3305 end-start, 3306 &writeCount); 3307 rv = outputStream->Write(CRLF, 2, &writeCount); 3308 } We should make sure that these writes are buffered! (and flush at the end of the loop/return.) From what I read before about the slowness of I/O in a comment posted by IMAP user, probably these are NOT buffered, either. AND, we SHOULD CHECK the return value rv: there is NO ERROR checking here !? No wonder there are reports of IMAP users losing messages, etc. in the past? [Added comment: even if these UNBUFFERED writes failed, close will succeed! So checking here is very important! ] Defined as a function prototype in: mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg annotations) line 20 -- void copyData(in nsIInputStream aIStream, in long aLength); Defined as a variable in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize); line 2026 -- NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This is the function I found out about and report here. The questions I have are: Is there a way to set the buffer-mode to m_fileStream inside CopyDate (just in case there are other wayward callers)? Or can we simply replace this Write with a buffered output version? See Related bugs: Bug 558528 - Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize. Well, I have not used the aceman's patch in there yet since I wanted to find out where the offending code for calling write() repeatedly with smallish 74 octet length originated. I noticed this repeated write() of smallish size earlier this year when someone reported a serious issue when his mail store is on a remote file system (CIFS) and he seems to have used imap. I tried to re-create his problem locally by mounting Windows 7 file system from my local linux installation and in so doing noticed a few issues with CIFS code itself during a transient network error (which seems to have been fixed nicely), and a missing error code or two on TB's side, and in so doing noticed a trace of MANY write system calls with short (70+ octets) on wire. Such many small write calls over wire can kill performance easily, and the missing error handling was bad BTW, :aceman's patch there seems to handle many "read" side of the issue. The problem I am reporting is "write" side. Also, note that the above bug deals with buffer size when buffering is enabled. This bug is about a place where buffering is not used at all when it should. Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel I wanted to use the great profiler ( see https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G ). But it never seemed to work for me when I tried to use it against C-C TB until lately. But finally, thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems to work! (But I have not yet used that profiler for this particular performance issue because it has been only a day since I could get it working after a long fiasco with bug regarding loadinfo, Bug 1111304 - assertion failure loadinfo, this month. ) I wanted to get out the words fast so that something can be done during the holidays for this serious performance issue. But once I become familiar with the profiler and can offer performance figures using the profiler, I will. It looks the profiler could have saved me in my efforts to capture the traceback mentioned below. I see a traceback in the profile window somehow. This is the stack trace of C-C TB when the write with this small chunk is invoked. (gdb) where #0 write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315 #2 0x00007ffff04beccc in (anonymous namespace)::interposedWrite ( aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68 #3 0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114 #4 0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>, buf=<optimized out>, count=<optimized out>, result=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154 #5 0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData ( this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111 #6 0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable ( this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>, sourceOffset=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86 #7 0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse ( this=<optimized out>, inputStream=<optimized out>, sourceOffset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594 #8 0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState ( this=<optimized out>, url=<optimized out>, inputStream=<optimized out>, offset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679 #9 0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable ( this=<optimized out>, request=<optimized out>, ctxt=<optimized out>, inStr=<optimized out>, sourceOffset=<optimized out>, count=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353 #10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer ( this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609 #11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady ( this=<optimized out>, stream=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436 #12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88 #13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855 #14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265 #15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run ( this=<optimized out>, aDelegate=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99 #16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233 #17 0x00007ffff08853fc in RunHandler (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226 #18 MessageLoop::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200 #19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164 #20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281 #21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153 #22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229 #23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449 #24 0x000000000040378e in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195 #25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380 (gdb) info register Yeah, right. I had to set a breakpoint to "write" system call and checked the registers (this is under Debian GNU/Linux 64-bit) and when the passed length parameter is 74 in, I hit the breakpoint and voila. It took me a while to figure out gdb's esoteric command expression syntax, and I have to skip the major part of the execution without setting up the break point (because with the conditional breakpoint, the execution speed is much slower [maybe error message written to stderr causes one |write| call per character and thus slows down TB execution very much], and hitting control-C to revert back to gdb prompt when TB is finally waiting for user interaction after showing the three window panes, and THEN set the conditional breakpoint break write if ($rdx == 74) to gdb prompt and continue, then hit the breakpoint to obtain the above stack trace. The great profiler I mentioned above seems to record the repeated write nicely (since it is sampled many times, and at the sampled time, the profiler seems to record the stack trace, and so it would have been much easier for me to figure out which higher level routine invokes short writes) TIA

1st try: make m_fileStream buffered. (works) 10 years ago ISHIKAWA, Chiaki (deleted), patch	neil : review- Irving : review-	Details \| Diff \| Splinter Review
experiment 2 10 years ago :aceman (deleted), patch	ishikawa : feedback+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. 10 years ago ISHIKAWA, Chiaki (deleted), patch	aceman : feedback+ neil : feedback+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 2. Uses rkent's logging code to console) 10 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 3. Uses rkent's logging code to console) modified based on aceman's comment 10 years ago ISHIKAWA, Chiaki (deleted), patch	neil : feedback+	Details \| Diff \| Splinter Review
Screen dump to show the "Informational" message printed by the patch. 10 years ago ISHIKAWA, Chiaki (deleted), image/png		Details
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 4. Uses rkent's logging code to error console) modified based comments of aceman and Neal 10 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 5. Uses rkent's logging code to error console) modified based comments of aceman and Neal 9 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 6. Uses rkent's logging code to error console) modified based comments of aceman and Neal 9 years ago ISHIKAWA, Chiaki (deleted), patch	neil : review+	Details \| Diff \| Splinter Review
Updated patch to address bitrot 9 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Updated patch to address bitrot (Feb 18 version) 9 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Server side strace with Thunderbird-try-build from Feb 22 9 years ago Ignaz Forster (deleted), application/gzip		Details
Server side strace with Thunderbird-daily from Mar 10 9 years ago Ignaz Forster (deleted), application/gzip		Details
Updated patch to address bit rot (May 27th version) 8 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 7) modified based comments of aceman as of late carrying on r=neal 8 years ago ISHIKAWA, Chiaki (deleted), patch	ishikawa : review+ aceman : feedback+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. (Take 8) modified based comments of aceman as of late carrying on r=neal 8 years ago ISHIKAWA, Chiaki (deleted), patch	ishikawa : review+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch. 8 years ago Jorg K (CEST = GMT+2) (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v2). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v3). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v4). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v5). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch	ishikawa : feedback- ishikawa : feedback-	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v5) with a couple of tweaks. 8 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v6). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch	ishikawa : feedback+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v7). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch	aceman : feedback+	Details \| Diff \| Splinter Review
Check fileposition before and after a supposedly unnecessary seek() and report mismatch (JK v8). 8 years ago Jorg K (CEST = GMT+2) (deleted), patch	aceman : review+	Details \| Diff \| Splinter Review