846123 - Thunderbird 100% CPU for minutes when copying a large number of messages (IMAP Online Copy, Copy between Offline-Use=Off folders)

Reporter

Description

•

12 years ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130219 Firefox/17.0 Build ID: 20130219212323 Steps to reproduce: 1) Select 30,000 messages in an IMAP folder 2) Right click selected messages and choose to copy to another IMAP folder Actual results: Thunderbird eats up 100% cpu for a number of minutes. There is no IMAP activity in the background. After a number of minutes, cpu goes back down and it actually starts doing the IMAP work to copy the messages. During the 100% cpu period, a strace reveals that futexes are slowly timing out: [pid 706] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 706] futex(0x7f273e7f5de8, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 706] futex(0x7f273d69a8cc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 55, {1362017517, 64864000}, ffffffff <unfinished ...> [pid 704] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 704] futex(0x7f273e7f6d58, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 704] futex(0x7f2740263ecc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 45, {1362017521, 981496000}, ffffffff <unfinished ...> [pid 707] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 707] futex(0x7f27422f7128, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 707] futex(0x7f273e09328c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 95, {1362017523, 224369000}, ffffffff <unfinished ...> [pid 512] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 512] futex(0x7f27418cffc8, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 512] futex(0x7f273d6c5e4c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 55, {1362017530, 805453000}, ffffffff <unfinished ...> [pid 2588] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 2588] futex(0x7f2749adf908, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 2588] futex(0x7f270c3fc1cc, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...> <snip> When it finishes timing out, then the IMAP transfers actually start. Expected results: Thunderbird should copy the messages without the initial long period of timing out where it doesn't appear to be doing anyhing.

WADA:World Anti-bad-Duping Agency

Comment 1

•

12 years ago

(In reply to thomaschan from comment #0) > 2) Right click selected messages and choose to copy to another IMAP folder Which case? (a-1) IMAP account of target Mbox is same as IMAP accunt of source Mbox (a-2) IMAP account of target Mbox is defferent from IMAP account of source Mbox I assum (a-1). I assume auto-sync is enabled. mail.server.serverN.autosync_offline_stores = true, or if not false, mail.server.default.autosync_offline_stores = true(deault) Which case? (c-1) copy source folder : Offline-Use=On, copy target folder : Offline-Use=On (c-2) copy source folder : Offline-Use=On, copy target folder : Offline-Use=Off (c-3) copy source folder : Offline-Use=Off, copy target folder : Offline-Use=On (c-4) copy source folder : Offline-Use=Off, copy target folder : Offline-Use=Off Do you enable Global Search and Indexer of Tools/Options/Advanced/General? mailnews.database.global.indexer.enabled = true( default is true) If yes, change for Bug 562115, Bug 574441 etc. may be relevant. And, if yes, following part of phenomenon you observed is perhaps explained. > There is no IMAP activity in the background. After a number of minutes, > (snip) and it actually starts doing the IMAP work to copy the messages. because Tb do copy between offline-store file then sync with IMAP server. Because above "copy between offline-store" consists of "large file block read + large file block write" and "indexing of mail data for Global Search using SQL", High CPU usage may be; Efficient CPU usage / aggressive CPU usage, in file copy and indexing of data. Because "sync with IMAP" phase consists of "uid a:b,p:q,y:z copy target for all 30000 mails at source Mbox" and "header fetch of all 30000 mails at taret Mbox" and "uid x fetch body[] for all 30000 mails may be executed if Offline-store", workload of IMAP server for them is usually heavy and timeout may occur due to continuous fetch requests for 30000 mails. Do you see your problem with following? (1) Go Work Offline mode, copy 30000 mails. (file to file copy + indexing) (2) Go Work Online mode => Tb tries to sync with IMAP server. Do you see your problem with "Global Search and Indexer" disabled?

WADA:World Anti-bad-Duping Agency

Comment 2

•

12 years ago

FYI. Tb has logging capability. See 402793 comment #28 for protocol logging etc. > Linux sample > export NSPR_LOG_MODULES=timestamp,imap:5,MsgCopyService:5,ImapAutoSync:5,IMAPOFFLINE:5,MsgPurge:5,MSGDB:5 > ",MsgPurge:5,MSGDB:5" may not be needed in your case. > If log for Socket level activity is also needed, add following. > timestamp,imap:5, ... ,nsSocketTransport:5,nsHostResolver:5 Because FUTEX_WAIT/FUTEX_WAKE looks relevant to "STS poll iter [0]" log after "SocketInputStream::AsyncWait" log by nsSocketTransport:5, Socket level log may be useful for matching Tb's activity with your strace.

WADA:World Anti-bad-Duping Agency

Comment 3

•

12 years ago

IMAP logging writes mail data in log, so log file is huge. Socket log produces very very many log lines, so it makes log file huger. Please don't attach huge log file to this bug, because developers and QA peoples also can't read and understand such huge file easily. Please extract importat part only if log analysis by developer is needed, and remove/replace private data from log file, if you need to open log fle to public.

thomaschan

Reporter

Comment 4

•

12 years ago

(In reply to WADA from comment #1) > > 2) Right click selected messages and choose to copy to another IMAP folder > > Which case? > (a-1) IMAP account of target Mbox is same as IMAP accunt of source Mbox > (a-2) IMAP account of target Mbox is defferent from IMAP account of source > Mbox > I assum (a-1). a-1. > I assume auto-sync is enabled. > mail.server.serverN.autosync_offline_stores = true, > or if not false, mail.server.default.autosync_offline_stores = true(deault) > Which case? > (c-1) copy source folder : Offline-Use=On, copy target folder : > Offline-Use=On > (c-2) copy source folder : Offline-Use=On, copy target folder : > Offline-Use=Off > (c-3) copy source folder : Offline-Use=Off, copy target folder : > Offline-Use=On > (c-4) copy source folder : Offline-Use=Off, copy target folder : > Offline-Use=Off I don't have the autosync setting for the specific server, but the default is true. So I think that would mean c-1? > Do you enable Global Search and Indexer of Tools/Options/Advanced/General? > mailnews.database.global.indexer.enabled = true( default is true) Global Search and Indexer is false. > Do you see your problem with following? > (1) Go Work Offline mode, copy 30000 mails. (file to file copy + indexing) > (2) Go Work Online mode => Tb tries to sync with IMAP server. I went offline (but chose not to download the message bodies). I then did the copy and it finished the operation pretty quickly. After going offline, it then goes to 100% cpu and starts with the futex timeouts.

thomaschan

Reporter

Comment 5

•

12 years ago

(In reply to thomaschan from comment #4) > > I assume auto-sync is enabled. > > mail.server.serverN.autosync_offline_stores = true, > > or if not false, mail.server.default.autosync_offline_stores = true(deault) > > Which case? > > (c-1) copy source folder : Offline-Use=On, copy target folder : > > Offline-Use=On > > (c-2) copy source folder : Offline-Use=On, copy target folder : > > Offline-Use=Off > > (c-3) copy source folder : Offline-Use=Off, copy target folder : > > Offline-Use=On > > (c-4) copy source folder : Offline-Use=Off, copy target folder : > > Offline-Use=Off > > I don't have the autosync setting for the specific server, but the default > is true. So I think that would mean c-1? Sorry, I read the my config setting wrong. Actually I had already set mail.server.default.autosync_offline_store = false. So it's actually c-4.

thomaschan

Reporter

Comment 6

•

12 years ago

Attached file Log of thunderbird when futex timeouts are happening (deleted) — Details

thomaschan

Reporter

Comment 7

•

12 years ago

The log was done with: export NSPR_LOG_MODULES=timestamp,imap:5,MsgCopyService:5,ImapAutoSync:5,IMAPOFFLINE:5,nsSocketTransport:5,nsHostResolver:5 There's a time gap between 19:48 and 19:58 where the futex timeouts are happening and cpu is 100%

WADA:World Anti-bad-Duping Agency

Comment 8

•

12 years ago

(In reply to thomaschan from comment #6) > Log of thunderbird when futex timeouts are happening > Inbox is selected at a cached connection, Tb requested IDLE, and server replied "done". > Because resync with server starts, Tb terminated IDLE, then server returned OK to IDLE. > 2013-03-04 19:58:16.876813 UTC :S-INBOX:CreateNewLineFromSocket: 11 OK IDLE completed. > 2013-03-04 19:58:16.877003 UTC :S-INBOX:ProcessCurrentURL: entering > 2013-03-04 19:58:16.877212 UTC :S-INBOX:ProcessCurrentURL:imap://tbird17test@imapserver:993/onlinecopy%3EUID%3E/INBOX%3E51:100,152:201,253:302,352:401, ... ,64081:64205,64212:64251%3E/INBOX/test1: = currentUrl Log line length=7372 bytes. Because this is single "copy mail" request to IMAP code from Tb's main task, IMAP code perhaps issue single 'uid a:b,c:d:, ..., y:z x:x "INBOX/test1"' command for 30,000 mails. I believe mail.server.default.imap_mail_move.granularity = 100 mails or 10MB will be needed. mail.server.default.imap_mail_header_fetch.granularity = 3000 mails, mail.server.default.imap_mail_body_fetch.granularity = 10 mails, like one will also be needed. External look is similar to bug 538283(and dup bugs) and bug 559676, and FUTEX WAIT/WAKEUP is perhaps by "PR_Sleep and wakingup" which is seen in bug 559676. IIRC, mutex lock is used by it when some conditions exist. In your case, as autosync_offline_store=false, following occurs. (1) While Offline mode, msgDBHeader is copied using fakedMsgKey(MsgKey is UID, but actual UID of mail in copy target folder can be known only after copy command completion, so temporary UID is used.). Because this is in-memory operation of many small data, completes quickly. (2) After go back to Work Online mode, Tb tries to sync with server, and requests copy of 30,000 mails to IMAP server by single command. (A) It takes long to copy 30,000 mails at server if ordinal/normal IMAP server. (B) And, it takes long to return response to the copy command(unsol response of UID for each mail if UIDPLUS is supported, or OK reponse after end of copy command). (C) Because no response from server for long time, Tb(IP, TCP, Socket, SSL, Socket, ...) tries to read response data repeatedly. (D) Because IMAP API of Tb is synchronouse, and request to IMAP is done at UI task, Tb's UI freeze occur. If IMAP server is accessed via SSL tunnel of Avast!, "CPU hog for long time" in phase (2) won't occur in many cases. Tb <-(non-SSL)-> [143] Avast! IMAP proxy <-(SSL)-> [993] IMAP server. I think "CPU 100% in your case" is essentialy problem of bug 559676 in SSL code used by Tb/Fx. - Too aggressive/frequent read operation for response, repeatedly, shortly. - Bad PR_poll/PR_iter relevant logic, because many "wait for 0 microsecond" like log is seen in Socket log always if SSL is used. For "Data Loss" part in your bug summary. It sounds similar to bugs which are set in dependency tree for Bug 720158. In your case, "lost" mails seems correctly copied/moved to target Mbox by go Work Offline/go back Work Online. It may be a difference between "ordinal bulk copy/move" and "batch type move by archive".

WADA:World Anti-bad-Duping Agency

Updated

•

12 years ago

Blocks: 847285

WADA:World Anti-bad-Duping Agency

Comment 9

•

12 years ago

For "single uid copy command" in your log. When I tested "many mail move by move in context menu", Tb split copy command to multiple "uid copy" commands. This perhaps because; "many mail move operation" is split to multiple "smaller mail move operations" interally, so IMAP command is also split. When "Archive while online", archive operation is split into; Move request from source to Archives/YYY1 for all mails which should go YYYY1 | | Move request from source to Archives/YYYn for all mails which should go YYYYn Each "move to Archives/YYYx" is executed separatelly(serially) at single cached connection where source Mbox is selectd. I don't know each "move to Archives/YYYx" is split to multiple "uid copy" commands or executed by single "uid copy" command. Because single "uid copy" was used in your case when "re-do of copy/move of mails while Work Offline", I think single "uid copy" command is used also by each "move to Archives/YYYx" by Archive, but I'm not sure.

WADA:World Anti-bad-Duping Agency

Comment 10

•

12 years ago

Oh, sorry, your case was "copy" case and no "Data Loss". I was confused with other bug. Although archive is a "move" and similar issue is relevant, archive is diferent from move/copy. Please ignore my archive and "Data loss" related comments. For "single uid copy command" or "multiple uid copy commands" by "move/copy while Online" : Check IMAP log for "copy mm mails" or "move mm mails" with smaller number of mails, please.

thomaschan

Reporter

Comment 11

•

12 years ago

I tried setting: mail.server.default.imap_mail_move.granularity = 100 mail.server.default.imap_mail_header_fetch.granularity = 3000 mail.server.default.imap_mail_body_fetch.granularity = 10 It didn't seem to help the wait before the transfer times. I did some timings of a variable amount of messages, timing between the two log events where the cpu spike happens, for example: 2013-03-05 23:56:54.988515 UTC - 1724770080[7f9166a64040]: request 10e662c0 Clearing OK request - src imap://tbird17test@imapserver/INBOX dest imap://tbird17test@imapserver/INBOX/test1 numItems 30000 type=0 <cpu spike> 2013-03-06 00:06:12.597309 UTC - 1024456448[7f914d9aa9b0]: 45398800:imapserver:S-INBOX:SendData: DONE Number_of_messages, Timeout_in_seconds 1000, 0 3000, 5 4000, 9 6000, 20 10000, 55 20000, 222 30000, 557 The general trend seems to be that the timeout period is doubled when the number of messages copied increases by 50%.

WADA:World Anti-bad-Duping Agency

Comment 12

•

12 years ago

(In reply to thomaschan from comment #11) > I tried setting: > mail.server.default.imap_mail_move.granularity = 100 Sorry for my confusing comment. Such setting doesn't exist. All of them is based on my feeling of; if such settings existed in Tb, "30,000 UID in one command" might be avoided... See bug 610264 for reason why I thought header_fetch.granularity like one would be needed, please.

WADA:World Anti-bad-Duping Agency

Comment 13

•

12 years ago

(In reply to thomaschan from comment #11) > Number_of_messages, Timeout_in_seconds > 1000, 0 > (snip) > 30000, 557 What do you call by "Timeout_in_seconds"? > The general trend seems to be that the timeout period is doubled when the number of messages copied increases by 50%. What do you call by "timeout period"?

WADA:World Anti-bad-Duping Agency

Comment 14

•

12 years ago

Attached file IMAP log, copy 2 mails, between Offline-Use=Off folder, Online Copy, Gmail IMAPe Wo Gmail (deleted) — Details

WADA:World Anti-bad-Duping Agency

Comment 15

•

12 years ago

(In reply to thomaschan from comment #11) > 2013-03-05 23:56:54.988515 UTC - 1724770080[7f9166a64040]: request 10e662c0 > Clearing OK request - src imap://tbird17test@imapserver/INBOX dest > imap://tbird17test@imapserver/INBOX/test1 numItems 30000 type=0 > <cpu spike> > 2013-03-06 00:06:12.597309 UTC - 1024456448[7f914d9aa9b0]: > 45398800:imapserver:S-INBOX:SendData: DONE Atttached is IMAP log for "copy 2 mails"(Inbox/Inbox1 to Inbox/Inbox2). Because I didn't click Inbox/Inbox2(target), header is not fetched yet at Inbox/Inbox2 after copy. As seen in my log, response to "uid x:y copy Target" is "OK" only(as UIDPLUS is supported, UID is returned by OK response). "SendData: DONE" is perhaps IDLE related log after OK for "uid copy" for mails. What phase does logs of "request 10e662c0", "Clearing OK request - src imap:/..." correspond to?

thomaschan

Reporter

Comment 16

•

12 years ago

(In reply to WADA from comment #13) > (In reply to thomaschan from comment #11) > > Number_of_messages, Timeout_in_seconds > > 1000, 0 > > (snip) > > 30000, 557 > > What do you call by "Timeout_in_seconds"? > > > The general trend seems to be that the timeout period is doubled when the number of messages copied increases by 50%. > > What do you call by "timeout period"? What I mean by timeout is the duration between the two events in the log: 2013-03-05 23:56:54.988515 UTC - 1724770080[7f9166a64040]: request 10e662c0 Clearing OK request - src imap://tbird17test@imapserver/INBOX dest imap://tbird17test@imapserver/INBOX/test1 numItems 30000 type=0 and 2013-03-06 00:06:12.597309 UTC - 1024456448[7f914d9aa9b0]: 45398800:imapserver:S-INBOX:SendData: DONE This is when the UI hangs, cpu spikes, and nothing appears to happen except for the futex timeouts in strace.

thomaschan

Reporter

Comment 17

•

12 years ago

Attached file Copy 2 emails from one IMAP folder to another (deleted) — Details

thomaschan

Reporter

Comment 18

•

12 years ago

(In reply to WADA from comment #15) > "SendData: DONE" is perhaps IDLE related log after OK for "uid copy" for > mails. > What phase does logs of "request 10e662c0", "Clearing OK request - src > imap:/..." correspond to? I've uploaded a new log of copying two emails to another folder. The phase it corresponds to is before "uid copy". The "Clearing OK request" is right after "CopyMessages request" and "DoCopy".

WADA:World Anti-bad-Duping Agency

Comment 19

•

12 years ago

(In reply to thomaschan from comment #17) > Created attachment 721633 [details] > Copy 2 emails from one IMAP folder to another (In reply to thomaschan from comment #18) > The phase it corresponds to is before "uid copy". > The "Clearing OK request" is right after "CopyMessages request" and "DoCopy". Thanks for log. I got log with imap:5 only. You look got log with other options. By your log, following is known by your log only. (1) Tb does do "copy" locally. (msgDBHeader copy only, because Offline-Use=Off, auto-sync=Disabled) (2) log like next. request c8f15680 Clearing OK request - src imap:// ... IBOX/test2 dest imap:// ... /INBOX/test1 numItems 2 type=0 (3) Terminate IDLE for test2, to issue command to server :S-INBOX/test2:SendData: DONE (to terminate IDLE) :S-INBOX/test2:CreateNewLineFromSocket: 19 OK IDLE completed. (4) Issue copy command, to request "copy from test2 to test1" to server. S-INBOX/test2:SendData: 21 uid copy 173959,173961 "INBOX/test1" (5) copy command completed at server. :S-INBOX/test2:CreateNewLineFromSocket: 21 OK ... COPY completed. (6) After it, IDLE command is issued for test2 :S-INBOX/test2:SendData: 24 IDLE :S-INBOX/test2: ... + IDLE accepted, awaiting DONE command. (7) After it, you accessed test1 or new mail check is invoked at test1. :S-INBOX/test1:SendData: DONE (to terminate IDLE) :S-INBOX/test1:CreateNewLineFromSocket: * 4 RECENT :S-INBOX/test1:CreateNewLineFromSocket: * 60038 EXISTS :S-INBOX/test1:CreateNewLineFromSocket: 38 OK IDLE completed. :S-INBOX/test1:SendData: 39 noop Start of your "timeout" of "Timeout_in_seconds" is step (2). Which "DONE", at step (3), or step (7), does correnpond to "DONE" in your comment #11 and comment #16?

thomaschan

Reporter

Comment 20

•

12 years ago

(In reply to WADA from comment #19) > > By your log, following is known by your log only. > (1) Tb does do "copy" locally. > (msgDBHeader copy only, because Offline-Use=Off, auto-sync=Disabled) > (2) log like next. > request c8f15680 Clearing OK request - > src imap:// ... IBOX/test2 > dest imap:// ... /INBOX/test1 numItems 2 type=0 > (3) Terminate IDLE for test2, to issue command to server > :S-INBOX/test2:SendData: DONE (to terminate IDLE) > :S-INBOX/test2:CreateNewLineFromSocket: 19 OK IDLE completed. <snip> > Which "DONE", at step (3), or step (7), does correnpond to "DONE" in your > comment #11 and comment #16? The "DONE" corresponds to step (3).

WADA:World Anti-bad-Duping Agency

Comment 21

•

12 years ago

(In reply to thomaschan from comment #20) > The "DONE" corresponds to step (3). Next step is step (4), "uid 30,000_UIDs copy INBOX/test1" which produces length=7372 bytes log line. After step (4), at IMAP level, server doesn't return data until step (5), OK response to "uid 30,000_UIDs copy INBOX/test1". After step (4), until step (5), client side action is "waiting for response from server" at any level; (a) Tb's main task who requested "mail copy between Mbox" to IMAP code. (b) IMAP code sends IMAP "uid copy" command to Socket, waits for OK. (c) SSL sends data of "uid copy" to server, and waits for response. (d) Socket sends data blocks to server's port, waits for ACK. (e) TCP transfers data blocks to server, and waits for ACK. (f) TCP transfers data packets to server, and waits for ACK. Because no data transfer occurs between "uid copy" and "OK" in your case, I think we could see culprit of problem. In your case, culprit couldn't hide behind continuous data transfer for huge mail data. As seen by Socket log, Tb's SSL code produces huge Socket level logs for "read 4046 bytes" followed by "no data" while waiting for IMAP level OK response. I think this is reason of "CPU 100%" for long time and many FUTEX WAIT/WAKE during the "CPU 100%". I believe bug 559676(and other bugs for CPU 100% with IMAP/SSL, SMTP/SSL) is this phenomenon.

WADA:World Anti-bad-Duping Agency

Comment 22

•

12 years ago

cc-ing to Nelson Bolyard. No data transfer occurs between "uid x:y copy traget_mbox" and "OK response" this bug's case, then client side activity is "waiting for response or ACK" only in this bug's case. So, I think problem in Tb's SSL code. Nelson, what kind of data is required for problem analysis? Is nsSocketTransport log sufficient?

thomaschan

Reporter

Comment 23

•

12 years ago

(In reply to WADA from comment #21) I'm not quite sure I get what you are saying. The "futex timeout/100% cpu spike" I'm seeing starts after step (2) ("Clearing OK request" is printed), and stops when step (3) starts ("SendData: DONE" is printed). From step (3) on, the thunderbird UI is no longer hung and cpu is about 25-30%. So, it seems like anything past step (4) is unrelated?

WADA:World Anti-bad-Duping Agency

Comment 24

•

12 years ago

(In reply to thomaschan from comment #23) > (In reply to WADA from comment #21) > > I'm not quite sure I get what you are saying. The "futex timeout/100% cpu > spike" I'm seeing starts after step (2) ("Clearing OK request" is printed), > and stops when step (3) starts ("SendData: DONE" is printed). > > From step (3) on, the thunderbird UI is no longer hung and cpu is about > 25-30%. Sorry, I misunderstood following in your comment #16. > This is when the UI hangs, cpu spikes, (snip) I did read it as "Above DONE is when the UI hangs, cpu spike. ...", instead of "Between 'Clearing OK request' and 'DONE' of above, is when ...". Following is your log for "2 mails copy" till send "DONE". No log line is omitted. > 2013-03-06 09:57:26.549210 CopyMessages > 2013-03-06 09:57:26.549237 request c8f15680 CopyMessages request - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 0 type=0 > 2013-03-06 09:57:26.549259 request c8f15680 DoCopy - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 > 2013-03-06 09:57:26.571005 msg id 2a789 SetNewFlags was 0 to 1 > 2013-03-06 09:57:26.572969 msg id 2a789 setOperation was 0 add 4 > 2013-03-06 09:57:26.573222 msg id 41579 SetNewFlags was 0 to 1 > 2013-03-06 09:57:26.582938 msg id 41579 setOperation was 0 add 8 > 2013-03-06 09:57:26.582971 msg id 2a787 SetNewFlags was 0 to 1 > 2013-03-06 09:57:26.582979 msg id 2a787 setOperation was 0 add 4 > 2013-03-06 09:57:26.583214 msg id 4157a SetNewFlags was 0 to 1 > 2013-03-06 09:57:26.583221 msg id 4157a setOperation was 0 add 8 > 2013-03-06 09:57:26.593623 NotifyCompletion - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 > 2013-03-06 09:57:26.593636 request c8f15680 Clearing OK request - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 > <here is the place you saw UI hangs, cpu spike> > 2013-03-06 09:57:27.090900 : c9504800:imapserver:S-INBOX/test2:SendData: DONE > After the sending DONE, "uid x:y copy INBOX/test1" is requested, > and, thunderbird UI is no longer hung and cpu is about 25-30%. How long does it took to complete following job from (Start) to (End)? No something-wrong8such as like CPU hog, UI freeze) between (Start) and (End)? >(Start) > request c8f15680 DoCopy - > src imap://tbird17test@imapserver/INBOX/test2 > dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 >(End) > request c8f15680 Clearing OK request - > src imap://tbird17test@imapserver/INBOX/test2 > dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 Because CPU 100% and FUTEX WAIT/WAKE doesn't occur during (Start) till (End), and CPU 100% and FUTEX WAIT/WAKE occurs between "above (End)" and "sending DONE"(== ready to send 'uid x:y copy test1"), Rebuild-Index like job may be executed after end of Docopy, before sending IMAP "uid copy" command. Anyway, removing Nelson from cc, because SSL is never relevant. Nelson, sorry for my confusion and spam.

WADA:World Anti-bad-Duping Agency

Comment 25

•

12 years ago

Sorry, typo. No something-wrong(such as CPU hog, UI freeze) between (Start) and (End)?

WADA:World Anti-bad-Duping Agency

Comment 26

•

12 years ago

FYI. "CopyMessages request" log > http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgCopyService.cpp#481 "DoCopy" log > http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgCopyService.cpp#259 "Clearing OK request" log > http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgCopyService.cpp#178 nsMsgCopyService::CopyMessages is called by; From menu. > http://mxr.mozilla.org/comm-central/source/mail/base/content/mailWindowOverlay.js#1758 Via Drag&Drop > http://mxr.mozilla.org/comm-central/source/mail/base/content/folderPane.js#662

thomaschan

Reporter

Comment 27

•

12 years ago

(In reply to WADA from comment #24) > How long does it took to complete following job from (Start) to (End)? > No something-wrong8such as like CPU hog, UI freeze) between (Start) and > (End)? > > >(Start) > > request c8f15680 DoCopy - > > src imap://tbird17test@imapserver/INBOX/test2 > > dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 > > >(End) > > request c8f15680 Clearing OK request - > > src imap://tbird17test@imapserver/INBOX/test2 > > dest imap://tbird17test@imapserver/INBOX/test1 numItems 2 type=0 From (Start) to (End) on a 6000 message copy, it takes 1.4 seconds according to the log. I can't say for sure that there is no UI freeze during that time as it's fairly short. It definitely has a UI freeze after that in the 20 seconds before "SendData: DONE" comes back. Here is the log/relevant timing for a 6000 message copy: 2013-03-06 23:14:25.273813 UTC - 836396832[7f0f31b64040]: CopyMessages 2013-03-06 23:14:25.273845 UTC - 836396832[7f0f31b64040]: request ff651260 CopyMessages request - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 0 type=0 2013-03-06 23:14:25.275208 UTC - 836396832[7f0f31b64040]: request ff651260 DoCopy - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 6000 type=0 2013-03-06 23:14:25.550859 UTC - 836396832[7f0f31b64040]: msg id 2ba06 SetNewFlags was 0 to 1 2013-03-06 23:14:25.552963 UTC - 836396832[7f0f31b64040]: msg id 2ba06 setOperation was 0 add 4 <snip> 2013-03-06 23:14:26.577110 UTC - 836396832[7f0f31b64040]: msg id 47ef2 SetNewFlags was 0 to 1 2013-03-06 23:14:26.577117 UTC - 836396832[7f0f31b64040]: msg id 47ef2 setOperation was 0 add 8 2013-03-06 23:14:26.679098 UTC - 836396832[7f0f31b64040]: NotifyCompletion - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 2013-03-06 23:14:26.679111 UTC - 836396832[7f0f31b64040]: request ff651260 Clearing OK request - src imap://tbird17test@imapserver/INBOX/test2 dest imap://tbird17test@imapserver/INBOX/test1 numItems 6000 type=0 2013-03-06 23:14:46.801915 UTC - 54523648[7f0f03f026a0]: 3ff1800:imapserver:S-INBOX/test2:SendData: DONE

thomaschan

Reporter

Comment 28

•

12 years ago

(In reply to thomaschan from comment #27) > From (Start) to (End) on a 6000 message copy, it takes 1.4 seconds according > to the log. I can't say for sure that there is no UI freeze during that > time as it's fairly short. It definitely has a UI freeze after that in the > 20 seconds before "SendData: DONE" comes back. I just tried on a 30000 message copy, and I definitely could not click on folders or get the screen redrawn in the time between (Start) and (End) when all the "msg id xxxxx" lines were scrolling through. It took about 7-8 seconds, which makes sense since 6000 messages took 1.4 seconds. Still, that "msg id xxxxx" time period is fairly short (and seems to be linear in time to the number of messages) compared to the long UI hang after "Clearing OK request" (which seems to be more exponential in time to the number of messages).

WADA:World Anti-bad-Duping Agency

Comment 29

•

12 years ago

For FUTEX log on Linux. Because IMAP to IMAP copy message, nsImapService::CopyMessages is perhaps finally invoked. > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapService.cpp#737 > 737 NS_IMETHODIMP nsImapService::CopyMessages(uint32_t aNumKeys, nsImapService::CopyMessages calls imapUrl->SetCopyState > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapService.cpp#777 > 777 imapUrl->SetCopyState(aMailboxCopy); nsImapUrl::SetCopyState requests mutex lock to support multi-task/multi-cpu. > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapUrl.cpp#1111 > 1111 NS_IMETHODIMP nsImapUrl::SetCopyState(nsISupports* copyState) > 1112 { > 1113 MutexAutoLock mon(mLock); > 1114 m_copyState = copyState; Above imapUrl->SetCopyState(aMailboxCopy) looks for Mbox, but similar state setting for each message may be executed with getting mutex lock. Mutex lock used by Tb on Linux is perhaps FUTEX lock. If "mutex lock per message for message state", it's probably normal, but if "mutex lock for Mbox state update" is invoked for each message, it may be problem of inefficiency, or simply a bug. thomaschan, can "consecutive FUTEX lock requests for 30,000 messages" easily produce CPU 100%? If job relevan to FUTEX lock is message related one, I also think it should be linear to number of mesages. It should be O(n). It shouldn't be O(n^1.5) or O(n^2) like one, or O(2^n) or O(n!) like one. I suspect "scan of all previous messages"(average = half of number of messages) for each message" like problem is involved. (See Bug 452221 for this kind of problem. issue is: if top mail is removed from thread, update of all childs is needed.) (In reply to thomaschan from comment #28) > I definitely could not click on folders or get the screen redrawn > in the time between (Start) and (End) > when all the "msg id xxxxx" lines were scrolling through. > It took about 7-8 seconds, which makes sense since 6000 messages took 1.4 seconds. > Still, that "msg id xxxxx" time period is fairly short (and seems to be linear in time to the number of messages) > compared to the long UI hang after "Clearing OK request" > (which seems to be more exponential in time to the number of messages). For UI freeze while long IMAP processing. See bug 842371 and read 842371 comment #3, please. Please separate followings, although these are never absolutely independednt and are relevant. (i) CPU 100% for long time, very many FUTX lock requests, and so on, during funny phase. (ii) UI waits for long time if IMAP process takes long, because IMAP API is synchronous. (iii) UI freeze due to CPU 100% by other task or by myself.

WADA:World Anti-bad-Duping Agency

Comment 30

•

12 years ago

Question about your operation. Do you use Drag&Drop for mail copy test? Because that bug is problem in "onDragStart" and this bug is after DoCopy which occurs after "onDrop" if Drag&Drop is used, I don't think that bug is relevant to this bug. But, please surely rule out bug 812923 from this bug. (read bug 812923 comment #21, please)

thomaschan

Reporter

Comment 31

•

12 years ago

(In reply to WADA from comment #29) > thomaschan, can "consecutive FUTEX lock requests for 30,000 messages" easily > produce CPU 100%? > > If job relevan to FUTEX lock is message related one, I also think it should > be linear to number of mesages. It should be O(n). It shouldn't be O(n^1.5) > or O(n^2) like one, or O(2^n) or O(n!) like one. > I suspect "scan of all previous messages"(average = half of number of > messages) for each message" like problem is involved. > (See Bug 452221 for this kind of problem. issue is: if top mail is removed > from thread, update of all childs is needed.) Hmm, this bug seems interesting. My test IMAP mailbox is actually a clone of many duplicate messages, so that might be it. I will try to find another test mailbox with more unique mails to see if I can reproduce the same timings. On a side note, this problem does not occur with Thunderbird 2. It seems to start doing the IMAP ops right away.

thomaschan

Reporter

Comment 32

•

12 years ago

(In reply to WADA from comment #30) > Question about your operation. > Do you use Drag&Drop for mail copy test? No, I'm using right-click copy instead of drag and drop.

WADA:World Anti-bad-Duping Agency

Comment 33

•

12 years ago

(In reply to thomaschan from comment #31) > On a side note, this problem does not occur with Thunderbird 2. > It seems to start doing the IMAP ops right away. IIRC, change from "request copy to server, then fetch the copied mail as newly added mail" style to "Do Offline Copy, then propagate the copy operation to server" style was done in Tb 3.x. And, IIRC, Tb issued multiple "uid a,..,z copy target" commands in the past when number of mails to copy/move is large. I guess single "uid 30,000_UID copy target" command is a result of that IMAP Online Copy after Tb 3.x is similar to "Copy while Work Offline, then go Work Online".

WADA:World Anti-bad-Duping Agency

Comment 34

•

12 years ago

(In reply to thomaschan from comment #31) > My test IMAP mailbox is actually a clone of many duplicate messages, so that might be it. If "MsgDB update for threading at copy_target after mail copy by DoCopy" like one is cause, I think high CPU utilization should be observed in both of next. (1) Copy from INBOX/test2 to test1 while Work Online. (2) Copy from INBOX/test2 to test1 while Work Offline. If "Setting up for requesting IMAP command(uid 30000_UID copy test1)" is cause, I think high CPU utilization should be observed by; (3) Copy from INBOX/test2 to test1 while Work Offline, go Work Online. Do you see difference among them?

thomaschan

Reporter

Comment 35

•

12 years ago

(In reply to WADA from comment #34) > If "MsgDB update for threading at copy_target after mail copy by DoCopy" > like one is cause, I think high CPU utilization should be observed in both > of next. > (1) Copy from INBOX/test2 to test1 while Work Online. This is the normal test case, so high CPU utilization is seen after "Clearing OK Request" (20 seconds for 6000 messages). > (2) Copy from INBOX/test2 to test1 while Work Offline. When doing the offline copy (without downloading bodies), high CPU is not seen. > If "Setting up for requesting IMAP command(uid 30000_UID copy test1)" is > cause, I think high CPU utilization should be observed by; > (3) Copy from INBOX/test2 to test1 while Work Offline, go Work Online. As per (2) the first part (copy while offline) does not incur high CPU. With 6000 messages, after going back online, we get a really quick spike for less than a second and then it's fine. With 30000 messages, after going back online, we get the CPU spike for 513 seconds.

WADA:World Anti-bad-Duping Agency

Comment 36

•

12 years ago

(In reply to thomaschan from comment #35) > With 6000 messages, after going back online, we get a really quick spike for less than a second and then it's fine. > With 30000 messages, after going back online, we get the CPU spike for 513 seconds. Problem occurs when going back online, so, perhaps problem during IMAP request creation. Your measurement in comment #11. > Number_of_messages, Timeout_in_seconds > 1000, 0 > 3000, 5 > 4000, 9 > 6000, 20 > 10000, 55 > 20000, 222 20000=10000*2, 55*(2^2)=55* 4=220 > 30000, 557 30000= 6000*5, 20*(5^2)=20*25=500 > 30000=10000*3, 55*(3^2)=55* 9=495 Actually, approximately O(n^2) issue. As seen in next request(and uid copy command too), UID is sorted, and merged to "x:y" if continuous(11,12,13,14,15 => 11:15). > :S-INBOX:ProcessCurrentURL:imap://tbird17test@imapserver:993/onlinecopy>UID>/INBOX> > 51:100,152:201,253:302,352:401,453:502,554:603,653:702,... (1) Algorythm like next is suspected. For each UID, scan all UIDs to get continuous UIDs (similar one to bug 812923 comment #21) I'll try to check Tb's IMAP code for it. (2) Another suspect - Sorting. Do you see difference between followings? (A) At Order Recieved column(UID), sort in ascending order, Copy (B) At Order Recieved column(UID), sort in descending order, Copy If Select All, "current selection" may not be used(entire Mbox copy may be executed). Deselct one mail after Select All in this test, please.

WADA:World Anti-bad-Duping Agency

Comment 37

•

12 years ago

In my quick check with 2000-1 mails, difference was not observed. (A) Ascending : Timeout_in_seconds = 8.5 sec (B) Descending : Timeout_in_seconds = 8.5 sec

WADA:World Anti-bad-Duping Agency

Comment 38

•

12 years ago

CPU 50% of thunderird.exe at Task Manager was observed in Tb 17 on MS Win during the "Timeout_in_seconds = 8.5 sec with 2000 mail copy" in commet #37. Tested on Core2Duo, so phenomenon is "100% CPU utilization on one CPU".

Status: UNCONFIRMED → NEW

Component: Untriaged → Networking: IMAP

Ever confirmed: true

Keywords: perf

OS: Linux → Other

Product: Thunderbird → MailNews Core

Hardware: x86_64 → All

Summary: Thunderbird 100% CPU for minutes when copying a large number of messages → Thunderbird 100% CPU for minutes when copying a large number of messages (IMAP Online Copy, Copy between Offline-Use=Off folders)

thomaschan

Reporter

Comment 39

•

12 years ago

(In reply to WADA from comment #38) > CPU 50% of thunderird.exe at Task Manager was observed in Tb 17 on MS Win > during the "Timeout_in_seconds = 8.5 sec with 2000 mail copy" in commet #37. > Tested on Core2Duo, so phenomenon is "100% CPU utilization on one CPU". Yes, this corresponds to what I see (a single CPU topped out) on a 12 core Linux machine.

thomaschan

Reporter

Comment 40

•

12 years ago

(In reply to thomaschan from comment #31) > Hmm, this bug seems interesting. My test IMAP mailbox is actually a clone > of many duplicate messages, so that might be it. I will try to find another > test mailbox with more unique mails to see if I can reproduce the same > timings. Ok, I found a folder of spam emails to copy. This one still had some duplicate subject lines, but was a lot more unique overall. 6000 messages still incurred the 20 second timeout as before.

thomaschan

Reporter

Comment 41

•

12 years ago

(In reply to WADA from comment #37) > In my quick check with 2000-1 mails, difference was not observed. > (A) Ascending : Timeout_in_seconds = 8.5 sec > (B) Descending : Timeout_in_seconds = 8.5 sec I get the same results with 6000 mails, 20 seconds regardless of ascending or descending. I did the select all and deselecting the first mail as requested.

WADA:World Anti-bad-Duping Agency

Comment 42

•

12 years ago

Following may be relevant code. > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapProtocol.cpp#7772 > 737 NS_IMETHODIMP nsImapService::CopyMessages(uint32_t aNumKeys, > 767 AllocateImapUidString(aKeys, aNumKeys, nullptr, messageIds); > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapUtils.cpp#248 > 250 void AllocateImapUidString(uint32_t *msgUids, uint32_t &msgCount, > 285 flagState->GetMessageFlagsFromUID(curSequenceEnd, &foundIt, &curFlagStateIndex); > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapFlagAndUidState.cpp#231 > 235 imapMessageFlagsType nsImapFlagAndUidState::GetMessageFlagsFromUID(uint32_t uid, bool *foundIt, int32_t *ndx) > 238 *ndx = (int32_t) fUids.IndexOfFirstElementGt(uid) - 1; > http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/glue/nsTArray.h#890 Two IndexOfFirstElementGt is defined. > 905 template<class Item, class Comparator> > 906 index_type > 907 IndexOfFirstElementGt(const Item& item, > 908 const Comparator& comp) const { > > 928 // A variation on the IndexOfFirstElementGt method defined above. > 929 template<class Item> > 930 index_type > 931 IndexOfFirstElementGt(const Item& item) const { > 932 return IndexOfFirstElementGt(item, nsDefaultComparator<elem_type, Item>()); > 933 }

WADA:World Anti-bad-Duping Agency

Comment 43

•

12 years ago

If JavaScript, and if Sort is possible, simple logic like following is possible. > UIDarray is JavaScript array which holds UIDs. > var UIDarrayX = Sorted result of UIDarray as number // <= hard to do it effitiently > > var Table = new Array(); var Last=0; var Index; > for(var nn=0;nn<UIDarrayX.length;nn++) > { > if(Last<UIDarrayX[nn]) // because UID>0, this step is always executed by nn=0 > { Index=Table.length; Table[Index]={}; Table[Index]["start"]=UIDarrayX[nn]; Table[Index]["end"]=UIDarrayX[nn]; Last=UIDarrayX[nn]; } > else if(Last+1==UIDarrayX[nn]) > { Table[Index]["end"] =UIDarrayX[nn]; Last=UIDarrayX[nn]; } > else if(Last==UIDarrayX[nn]){ continue; } // Duplicate => ignore > else { continue; } // Sort logic is broken > } > var UIDarrayY = new Array(); > for(var nn=0;nn<Table.length;nn++) > { > if (Table["start"]==Table["end"}{ UIDarrayY[UIDarrayY.length]=Table["start"]; } > else if(Table["start"]< Table["end"}{ UIDarrayY[UIDarrayY.length]=Table["start"]+":"+Table["end"]; } > else{ Throw error, because logic error exists; } > } > var UIDstring =UIDarrayY.join(","); Problem is; Efficient sort.

WADA:World Anti-bad-Duping Agency

Comment 44

•

12 years ago

(In reply to WADA from comment #43) > Problem is; Efficient sort. FYI. JavaScript Array sort is not so bad. function sorter(a,b){ return Number(a) - Number(b); } var NumArray=new Array(); Sorted by NumArray.sort(sorter); NumArray.length = 10,000, reversed order : Sort time = 47 msec NumArray.length = 80,000, reversed order : Sort time = 256 msec Time to take Merge UIDs by logic of comment #43. Number of unieque UIDs = 24,000 (30,000 is generated by Math.random) approxiately, 1/3 : non continuous 1/3 : 2 continuous UIDs 1/3 : 3 continuous UIDs Sort 30,000 UIDs(including dup) : 57 msec Merge unieque 24,000 UIDs : 17 msec

WADA:World Anti-bad-Duping Agency

Comment 45

•

12 years ago

Logic in IMAP code I referred in comment #42 was not so different from example I presented. And, UIDms or msgKeys are usually already sorted by requester if request to IMAP code. Performance issue occurs while generating request to IMAP code. After DoCopy(Offline mode copy at local), or actual Offline copy while Work Offline mode, control looks passed to nsImapOfflineSync::ProcessNextOperation(). > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapOfflineSync.cpp#742 > 742 nsresult nsImapOfflineSync::ProcessNextOperation() In our case, nsImapOfflineSync::ProcessNextOperation() does; (1) Invoke "liteselect of SourceMbox By listselect : Open SourceMbox, select SourceMbox, fetch headers (2) Invoke "ProcessCopyOperation" for SourceMbox/TargetMbox > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapOfflineSync.cpp#603 > 603 void nsImapOfflineSync::ProcessCopyOperation(nsIMsgOfflineImapOperation *aCurrentOp) > 660 rv = imapFolder->ReplayOfflineMoveCopy(matchingFlagKeys.Elements(), matchingFlagKeys.Length(), false, destFolder, > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapMailFolder.cpp#3841 > 3841 nsImapMailFolder::ReplayOfflineMoveCopy(nsMsgKey *aMsgKeys, uint32_t aNumKeys, > 3907 AllocateUidStringFromKeys(aMsgKeys, aNumKeys, uids); > 3908 rv = imapService->OnlineMessageCopy(this, uids, aDstFolder, > OnlineMessageCopy passes request to IMAP code AllocateUidStringFromKeys is also a converter from msgkey array of a,b,c,...x,y,z to UID list of a:d,...,x:z for IMAP request. > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapMailFolder.cpp#2171 > 2171 nsImapMailFolder::AllocateUidStringFromKeys(nsMsgKey *keys, uint32_t numKeys, nsCString &msgIds) AllocateUidStringFromKeys calls NS_QuickSort, just before loop for each msgKey. > 2180 // sort keys and then generate ranges instead of singletons! > 2181 NS_QuickSort(keys, numKeys, sizeof(nsMsgKey), CompareKey, nullptr); > 2182 for (uint32_t keyIndex = 0; keyIndex < total; keyIndex++) Problem is "Performance issue of NS_QuickSort is exposed by large array data"? > http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/glue/nsQuickSort.cpp#98 > 98 void NS_QuickSort ( No efficient built-in sort in CPP?

WADA:World Anti-bad-Duping Agency

Comment 46

•

12 years ago

FYI. > Bug 72196 - eliminate |NS_QuickSort|, use C's standard |qsort| instead > Reported: 2001-03-16 > Status: RESOLVED WONTFIX (on 2009-07-24) Note-1: Bug is not for performance issue. For problem when NSPR and XPCOM is compiled differently. Note-2: As written in Bug 72196 comment #3, folloowing comment is seen. > http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/glue/nsQuickSort.cpp#32 > 4 * Copyright (c) 1992, 1993 >(snip) > 32 /* We need this because Solaris' version of qsort is broken and > 33 * causes array bounds reads. > 34 */

WADA:World Anti-bad-Duping Agency

Comment 47

•

12 years ago

C++ looks to have sort(), and independent stable_sort() because sort() in C++ is not stable sort. (.sort of JavaScript is always stable sort) > http://www.cplusplus.com/reference/algorithm/sort/ > http://www.cplusplus.com/reference/algorithm/stable_sort/

WADA:World Anti-bad-Duping Agency

Comment 48

•

12 years ago

FYI. NS_QuickSort() in comm-central. > http://mxr.mozilla.org/comm-central/search?string=ns_quicksort Sort at thread pane may be affected if big Mbox(/mailnews/base/src/nsMsgDBView.cpp).

WADA:World Anti-bad-Duping Agency

Comment 49

•

12 years ago

Cc-ing to David :Bienvenu and :aceman. Does NS_QuickSort() use in nsImapMailFolder::AllocateUidStringFromKeys produce problem of this bug?

:aceman

Comment 50

•

12 years ago

Thanks for the detective work WADA. I can not test this myself, but I could look at the code you pointed out. I just remember a bug where existing brute-force quicksort had to be replaced by bubble sort to GAIN speed. It was a time where the array was mostly already sorted and we added a new element to that. What I will do here is to look if the code pattern is not like this: array = []; for each ("element from a msg to be copied") { array.append(element); quicksort(array1); } This would be O(n^2.log n). While if this is the work we need to do, it can be done like this: for each ("element from a msg to be copied") { // array is already sorted array.insert_by_binary_search(element); } In this case, it can be done in O(n.log n).

WADA:World Anti-bad-Duping Agency

Comment 51

•

12 years ago

(In reply to :aceman from comment #50) > I can not test this myself, (snip) Can you measure difference between NS_QuickSort(Array_of_30000_numbers) and C++ stable_sort(Array_of_30000_numbers)?

WADA:World Anti-bad-Duping Agency

Comment 52

•

12 years ago

FYI. Bug 738533 was for going back to qsort() from NS_QuickSort() in CSS etc., which was opened on 2012-03-22 and fixed on 2012-08-15... > Bug 738533 Review NS_QuickSort() documentation and uses: Solaris, qsort, ...

:aceman

Comment 53

•

12 years ago

From bug 738533 it is not explicit which sort is the good one. Which one we should use in c-c. Also notice there is basically no usage of stable_sort in m-c/c-c. Also we do not need a stable sort as I would hope the UIDs are unique.

OS: Other → Linux

neil@parkwaycc.co.uk

Comment 54

•

12 years ago

(In reply to aceman from comment #50) > I just remember a bug where existing brute-force quicksort had to be > replaced by bubble sort to GAIN speed. No, I denied review on the bubble sort in bug 764306, instead replacing it with the code pattern that you then go on to describe.

:aceman

Comment 55

•

12 years ago

I could also imagine that not the sort itself is slow but the later constructing msgIds string could be expensive (AppendUid extends the string which probably includes new memory allocation). Depends on how non-consecutive the uids are. So the whole nsImapMailFolder::AllocateUidStringFromKeys function should be benchmarked and that is a bit difficult for me to do. I do have a IMAP server but any idea how to produce many messages on it? :)

WADA:World Anti-bad-Duping Agency

Comment 56

•

12 years ago

(In reply to :aceman from comment #55) > I could also imagine that not the sort itself is slow but the later > constructing msgIds string could be expensive > (AppendUid extends the string which probably includes new memory allocation). Essential logic looked for me similar to my comment #43(see also comment #44), and AllocateImapUidString of nsImapService::CopyMessages also uses similar logic(see comment #42) and it looked the module has no performance problem in this bug's case, so I suspected QuickSort first, but it seems wrong. > http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapProtocol.cpp#7772 > 737 NS_IMETHODIMP nsImapService::CopyMessages(uint32_t aNumKeys, > 767 AllocateImapUidString(aKeys, aNumKeys, nullptr, messageIds); > Depends on how non-consecutive the uids are. See last 7372 bytes log line of attachment 720831 [details] for actual data. > I do have a IMAP server but any idea how to produce many messages on it? :) As I wrote in comment #38, I could see this bug by only 2,000 mails with Gmail IMAP server. I have Object Rexx script to generate crafted mails, which was used to generate test cases for Bug 452221 and test cases for 4GB/larger than 4GB mail folder related bugs. Do you need it?

thomaschan

Reporter

Comment 57

•

12 years ago

(In reply to :aceman from comment #55) > I do have a IMAP server but any idea how to produce many messages on it? :) I generated mails two ways for my tests. One was to just keep doubling your messages by copying to the same imap folder. It doesn't take too much time to get to a large number of messages. The second was to just upload a large spam folder I had :)

:aceman

Comment 58

•

12 years ago

I wonder why there are 2 similar functions nsImapMailFolder::AllocateUidStringFromKeys and AllocateImapUidString, both using different ways to append the uid numbers (PR_snprintf(buf, sizeof(buf), "%u", buf) vs. appendInt). I wonder which one is faster. And if something could be shared.

:aceman

Comment 59

•

12 years ago

OK, I was able to test on 4500 msgs. The problem is not in the function nsImapMailFolder::AllocateUidStringFromKeys(). It does not matter if uids are consecutive or not. After the function returned on the 4500 uids, I got the long hang as the bug describes, however the number of copied messages appeared in the target folder (shown by Extra folder columns addon). When in the target folder I deleted many of the messages to make the uids nonconsecutive. I then copied those to a third folder (and I really got a long list of ranges from AllocateUidStringFromKeys), I got no hang now. That is strange.

:aceman

Comment 60

•

12 years ago

Atuljangra, would you be able to see what is going on here?

WADA:World Anti-bad-Duping Agency

Comment 61

•

12 years ago

Because problem is in "Replay of Offline copy in actual(non-pseudo) online mode" after "Offline Copy", "actual Offline Copy" and "Reply after restart" is almost same, and it's easier than checking while Online mode. (1) go Work Offline, copy many mails between IMP Offline-use=Off folder. (call FolderA to FolderB). (2) Click third small folder(call FolderX), to force first select upon restart. (3) Max cached connection=1 to see "queuing IMAP request" log, stop any new mail check, and Terminate Tb. (4) Enable logging, and retart Tb. Parameter sample; > timestamp,imap:5,ImapAutoSync:5,IMAPOFFLINE:5,MsgCopyService:5,MsgPurge:5 (5) Tb does following. - Just after restart, queue "liteselect" request for FolderA is queued because max cached connections=1 > queuing url:imap://.../liteselect>/FolderA - login to server, anonymous mode, folder re-discovery after LIST etc. - select FolderX, fetch flags, because selected at folder pane - "liteselect" request for FolderA is procssed by IMAP code. > playing queued url:imap://.../liteselect>/FolderA - select FolderA, uid 1:* fetch flags - End of "liteselect" for FolderA > offline imap url succeeded :imap://.../liteselect>/FolderA - CPU hog starts. - After a while, log for queuing "onlinecopy" appears > queuing url:imap://.../onlinecopy>UID>/FoderA>1851,1906,1926,1940,1951,1959:1963>/FolderB > playing queued url:imap://.../onlinecopy>UID>/FolderA>1851,1906,1926,1940,1951,1959:1963>/FolderB - "uid a:b,...y:z copy FolderB" is issueed. There is no Tb log between "end of liteselect" and "queuing of onlinecopy", even when log is taken with all:5, and no file access by Tb(checked by Process Monitot), so CPU is consumed by CPU-bound/in- memory process. Interesting phenomenon was observed when non-consecutive mails are incidentally copied. - UIDs in onlinecopy request to IMAP is non-consecutive. - Because IMAP permits smallest-UID-in-range:highest-UID-in-range, and non-existent UID is ignored by server, IMAP code generated single smallest-UID:highest-UID in "uid xxx　copy” 　 command　 for shortest command string. > 00000284 5.06817389 [4020] 0[230f140]: offline imap url succeeded :imap://yatter.king@gmail.com@imap.gmail.com:993/liteselect>/INBOX/Inbox1 > 00000285 5.08655262 [4020] 0[230f140]: queuing url:imap://yatter.king@gmail.com@imap.gmail.com:993/onlinecopy>UID>/INBOX/Inbox1>1851,1906,1926,1940,1951,1959:1963>/INBOX/Inbox2 > 00000291 5.09421825 [4020] 0[230f140]: playing queued url:imap://yatter.king@gmail.com@imap.gmail.com:993/onlinecopy>UID>/INBOX/Inbox1>1851,1906,1926,1940,1951,1959:1963>/INBOX/Inbox2 > 00000293 5.09500217 [4020] 1956[7f91ef0]: 90dbc00:imap.gmail.com:S-INBOX/Inbox1:ProcessCurrentURL:imap://yatter%2Eking%40gmail%2Ecom@imap.gmail.com:993/onlinecopy%3EUID%3E/INBOX/Inbox1%3E1851,1906,1926,1940,1951,1959:1963%3E/INBOX/Inbox2: = currentUrl > 00000294 5.09631872 [4020] 1956[7f91ef0]: 90dbc00:imap.gmail.com:S-INBOX/Inbox1:SendData: 17 uid copy 1851:1963 "INBOX/Inbox2" 90dbc00:imap.gmail.com:S-INBOX/Inbox1:CreateNewLineFromSocket: 17 OK [COPYUID 602171931 1851,1906,1926,1940,1951,1959:1963 6084,6082,6086,6079:6081,6083,6085,6088,6087] (Success) > 00000297 5.78692436 [4020] 0[230f140]: offline imap url succeeded :imap://yatter.king@gmail.com@imap.gmail.com:993/onlinecopy>UID>/INBOX/Inbox1>1851,1906,1926,1940,1951,1959:1963>/INBOX/Inbox2 aceman, is there any way to get log with timestamp while Tb is doing something after "end of liteselect" before "queuing onlinecopy"?

Wayne Mery (:wsmwk)

Comment 62

•

11 years ago

(In reply to WADA from comment #61) > aceman, is there any way to get log with timestamp while Tb is doing > something after "end of liteselect" before "queuing onlinecopy"? (In reply to :aceman (away till Aug 11) from comment #60) > Atuljangra, would you be able to see what is going on here?

Flags: needinfo?(atuljangra66)

Flags: needinfo?(acelists)

Whiteboard: [bulkoperations]

Atul Jangra [:atuljangra]

Comment 63

•

11 years ago

Wayne, I'm quite busy with a project right now. I'll look it in a couple of days. Not clearling the need-info.

:aceman

Comment 64

•

11 years ago

Sorry, I have not much knowledge about IMAP.

Flags: needinfo?(acelists)

Alfred Peters

Assignee

Comment 65

•

11 years ago

Attached file Extended log of move operation. (deleted) — Details

I've added some additional log-commands into the source code. TB trunk 27.0a1 compiled with --enable-debug. I did move 5000 e-mail in online mode between two sub-folders. I've already found a fix for the 160-second gap at 17:09:24.26. But there is another 45-second gap at 17:08:37.42.

Alfred Peters

Assignee

Comment 66

•

11 years ago

Attached file Log of DeletMessege part (deleted) — Details

This is the log of the '45-second' gap. During this time deletemessage() is performed on the folder-db. All 300 Messages, the database is commited. The time consumption caused by the fact that each message is individually removed from the DB. I don't know if this could be improved.

neil@parkwaycc.co.uk

Comment 67

•

11 years ago

(In reply to Alfred Peters from comment #66) > This is the log of the '45-second' gap. > During this time deletemessage() is performed on the folder-db. > All 300 Messages, the database is commited. What I find strange is that the first 300 messages took over 12 seconds but subsequent commits are only about 2 seconds apart.

Alfred Peters

Assignee

Comment 68

•

11 years ago

Attached patch Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. (obsolete) (deleted) — Details — Splinter Review

And here is the announced patch for the larger gap. It reduces the time for moving 10,000 e-mail on my TB trunk 27.0a1 (optimized build) from 175 to 30 seconds. The patch replaces only the outer to the inner for-loop in the function nsImapMailFolder::ReplayOfflineMoveCopy(). As a result, most function calls are invoked only n times instead of n^2 times.

Attachment #816378 - Flags: review?(mozilla)

Alfred Peters

Assignee

Comment 69

•

11 years ago

(In reply to neil@parkwaycc.co.uk from comment #67) > What I find strange is that the first 300 messages took over 12 seconds but > subsequent commits are only about 2 seconds apart. As far as I could see, the first call of "NotifyHdrDeletedAll(msg, threadParent, flags, instigator); // tell listeners" (line 2044 in nsMsgDatabase::DeleteHeader()) consumes most of that time.

neil@parkwaycc.co.uk

Comment 70

•

11 years ago

Comment on attachment 816378 [details] [diff] [review] Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. >- for (uint32_t msgIndex = 0; msgIndex < aNumKeys; msgIndex++) > { >- nsCOMPtr<nsIMsgOfflineImapOperation> currentOp; >- for (uint32_t opIndex = 0; opIndex < offlineOps.Length(); opIndex++) > { >- dstFolderDB->GetOfflineOpForKey(offlineOps[opIndex], false, >- getter_AddRefs(currentOp)); >- if (currentOp) > { >- nsMsgKey srcMessageKey; >- currentOp->GetSrcMessageKey(&srcMessageKey); > if (srcMessageKey == aMsgKeys[msgIndex]) Whoa, this must be the slowest implementation of Contains() that I've seen...

neil@parkwaycc.co.uk

Comment 71

•

11 years ago

(In reply to Alfred Peters from comment #69) > (In reply to comment #67) > > What I find strange is that the first 300 messages took over 12 seconds but > > subsequent commits are only about 2 seconds apart. > > As far as I could see, the first call of > "NotifyHdrDeletedAll(msg, threadParent, flags, instigator); // tell > listeners" > (line 2044 in nsMsgDatabase::DeleteHeader()) consumes most of that time. I suspected as much, but I had no way to be sure. Unfortunately there are a number of implementations of OnHdrDeleted...

Assignee: nobody → infofrommozilla

WADA:World Anti-bad-Duping Agency

Comment 72

•

11 years ago

(In reply to Alfred Peters from comment #69) > As far as I could see, the first call of > "NotifyHdrDeletedAll(msg, threadParent, flags, instigator); // tell > listeners" > (line 2044 in nsMsgDatabase::DeleteHeader()) consumes most of that time. If "threading related performance issue upon delete message", it may be a part of problems found in bug 452221. See also Bug 218075. Because "delete mails in IMAP" is perhaps executed in UID order(small UID first) usually, performance issue you saw may be rather similar to Bug 218075.

Alfred Peters

Assignee

Comment 73

•

11 years ago

(In reply to WADA from comment #72) > If "threading related performance issue upon delete message", it may be a > part of problems found in bug 452221. Yes, I can see a difference. But the influence on the movement is not so dramatic. Moving 10k of my e-mail with individual subject (as mentioned in comment #68): 30 seconds Moving the same e-mail with the same subject: 36 seconds Moving your 8k e-mail from attachment 335678 [details]: 32 seconds > See also Bug 218075. Because "delete mails in IMAP" is perhaps executed in Bug 218075 is about expunge responses. I have only measured up to the first server action.

Atul Jangra [:atuljangra]

Comment 74

•

11 years ago

Nice to see someone is working on this. I was not able to take a look at this. I will start working on TB as soon as I get some time :D

Flags: needinfo?(atuljangra66)

Wayne Mery (:wsmwk)

Comment 75

•

11 years ago

Alfred, are you still wanting review from davidbienvenu? Or is a next patch more appropriate? With a different reviewer?

Flags: needinfo?(infofrommozilla)

Alfred Peters

Assignee

Comment 76

•

11 years ago

(In reply to Wayne Mery (:wsmwk) from comment #75) > Alfred, are you still wanting review from davidbienvenu? Is he the wrong reviewer? Is there a better one? If not - yes I want. > Or is a next patch more appropriate? No. The Patch has an offset of 12 lines but is still valid and necessary. The time reduction to 20% is quite something. If further improvements are needed (e.g. in notify-mechanism), we can leave the bug open. > With a different reviewer? Is there an alternative?

Flags: needinfo?(infofrommozilla)

Wayne Mery (:wsmwk)

Updated

•

10 years ago

Attachment #816378 - Flags: review?(mozilla)

Wayne Mery (:wsmwk)

Comment 77

•

10 years ago

(In reply to Alfred Peters from comment #76) > (In reply to Wayne Mery (:wsmwk) from comment #75) > > Alfred, are you still wanting review from davidbienvenu? > > Is he the wrong reviewer? Is there a better one? > If not - yes I want. kent or magnus, can you review or suggest someone? Thanks

Flags: needinfo?(mkmelin+mozilla)

Flags: needinfo?(kent)

Kent James (:rkent)

Comment 78

•

10 years ago

I would be willing to look at it, but Neil has already been involved here so I would normally expect him to handle the review. That's why I did not get involved earlier.

Flags: needinfo?(kent)

Wayne Mery (:wsmwk)

Updated

•

10 years ago

Flags: needinfo?(mkmelin+mozilla)

Wayne Mery (:wsmwk)

Comment 79

•

10 years ago

Comment on attachment 816378 [details] [diff] [review] Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. true enough. switching review

Attachment #816378 - Flags: review?(neil)

neil@parkwaycc.co.uk

Comment 80

•

10 years ago

Comment on attachment 816378 [details] [diff] [review] Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. So, we're collecting all the message headers for operations whose source folder is the current folder and message key is an element of the array. I wonder whether it would make sense to check the source folder first though.

Attachment #816378 - Flags: review?(neil) → review+

Alfred Peters

Assignee

Comment 81

•

10 years ago

Attached patch Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. (version 2) (deleted) — Details — Splinter Review

(In reply to neil@parkwaycc.co.uk from comment #80) > So, we're collecting all the message headers for operations whose source > folder is the current folder and message key is an element of the array. I > wonder whether it would make sense to check the source folder first though. If you meant it that way, yes.

Magnus Melin [:mkmelin]

Comment 82

•

10 years ago

Alfred, did you mean to ask for review for the last patch? (Also once the patch has review, mark unneeded versions as obsolete, and don't forget to set the checkin-needed keyword to get it landed.)

Magnus Melin [:mkmelin]

Updated

•

10 years ago

Status: NEW → ASSIGNED

Alfred Peters

Assignee

Updated

•

10 years ago

Attachment #8530672 - Flags: review?(neil)

neil@parkwaycc.co.uk

Updated

•

10 years ago

Attachment #8530672 - Flags: review?(neil) → review+

Alfred Peters

Assignee

Updated

•

10 years ago

Attachment #816378 - Attachment is obsolete: true

Alfred Peters

Assignee

Updated

•

10 years ago

Keywords: checkin-needed

Patrick Cloke [:clokep]

Comment 83

•

10 years ago

https://hg.mozilla.org/comm-central/rev/d414fc3554c0

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 37.0

Log of thunderbird when futex timeouts are happening 12 years ago thomaschan (deleted), text/plain		Details
IMAP log, copy 2 mails, between Offline-Use=Off folder, Online Copy, Gmail IMAPe Wo Gmail 12 years ago WADA:World Anti-bad-Duping Agency (deleted), text/plain		Details
Copy 2 emails from one IMAP folder to another 12 years ago thomaschan (deleted), text/plain		Details
Extended log of move operation. 11 years ago Alfred Peters (deleted), text/plain		Details
Log of DeletMessege part 11 years ago Alfred Peters (deleted), text/plain		Details
Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. 11 years ago Alfred Peters (deleted), patch	neil : review+	Details \| Diff \| Splinter Review
Reduce the time consumption from O(n^2) to O(n) when copying email between two IMAP-folders. (version 2) 10 years ago Alfred Peters (deleted), patch	neil : review+	Details \| Diff \| Splinter Review