Closed Bug 172786 Opened 22 years ago Closed 22 years ago

Building mail summary file takes forever

Categories

(MailNews Core :: Database, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rob, Assigned: Bienvenu)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826

Sometimes, when I have to shut Mozilla down and restart, it can take 5-10 minutes to build the summary file for my Inbox. There are more examples of users complaining about this here:
http://groups.google.com/groups?q=mozilla+building+summary+file&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=353D2505.C8304A20%40netscape.com&rnum=3
http://www.geocrawler.com/archives/3/116/1999/2/0/592662/

Reproducible: Sometimes

Steps to Reproduce:
1. Open mail window
2. Try to get mail or click on inbox (other folders don't seem to have this problem - I have only 784 messages in my inbox)

Actual Results:
3. Get "Building mail summary file..." for 5-10 minutes and hard drive crunches away... it's naptime

Expected Results:
Summary file should already be there and be built. Even if it's not, or is corrupted, it should not take that long to build. There have been times where I see each message come into the inbox and it takes about 1 second per message. That is awful! I have a 1.4GHz Athlon and reasonably fast hard drive.
These messages are VERY old (1998 and 1999). Mozilla doesn't rebuild the .msf files unless they are corrupted or deleted. I don't have this problem with my bugzilla mail folder with 11250 messages (Athlon 1.3GHz and a fast HDD). A complete rebuild of course takes time, but you should not get this unless something special happens (I never got this). Have you ever compacted the folders? (A deleted message is only marked as deleted and is only physically removed when you compact the mail folder.) How big is the mail file for that folder in your profile?
Just checked main "mail" folder sizes and it's 443MB, which is pretty big. Did an "empty trash" and it's down to 154MB, which is more reasonable and expected. My in-box folder is 128MB, where 95MB of that is a set of e-mails with 1MB attachments that I need to hang onto. I don't think there's anything too wild about that. I'm not sure if emptying the trash will fix the odd "building file..." or not. One point I was hoping to make with the old '99 and '98 messages is that this is not a new issue, and it can still happen, and still causes problems. I will be watching for the situation to occur again and add more detail as I encounter it. I don't restart Mozilla that often - once every many days, usually after it crashes. It could be that it's corrupting the in-box file when it crashes. I don't know.
The problem could be that you have all these big files in one folder. Mozilla only closes the folder file when you aren't using it. If you are looking at the inbox the whole time and Mozilla crashes, it's possible that Mozilla must rebuild the .msf file. If the .msf file is open when Mozilla crashes, it must rebuild the .msf file or you could get corrupted mail folders. Mozilla must read the complete mail file to generate the .msf, and with 450MB that is of course not very fast. I don't know more details, but bienvenu should know more.
QA Contact: gayatri → esther
yes, we have to read through the whole file to regenerate the summary file, so if it's huge, we're going to spend a lot of time in just disk I/O. we shouldn't have to regenerate summary files very often. Do you have your inbox on a file server, or on your local machine? If the former, that would explain both why your .msf files are getting out of date, and why it's so slow to regenerate them. there's also a bug where we update the message display while regenerating the .msf files, which we should fix, but I doubt that's what's taking so long.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Local machine. This would fix the problem: Store attachments outside of the "Inbox" file (or folder file) so that rebuilding the .msf file doesn't require scanning through MBs and MBs of attachment data. It is entirely useless to scan 50 or 100MB worth of attachments when you are just trying to get the header information for 700 messages. That's the only reason I can imagine for such slow performance during rebuilding. Instead, make 1 file (or a separate directory) for attachments and one for actual messages, so my 100MB of attachments don't slow down the rebuilding of this inbox.msf file that does get corrupted from time to time.
we use the berkeley mailbox format for compatibility with other mail readers - it's a standard mailbox format, but it requires us to store the attachments inline. 5-10 minutes is still an incredibly long time to read through a 128 MB file - it takes a fraction of that time on my machine (1.6 GHz with a reasonably fast hard drive). So I suspect something else is contributing to the slowness, besides the size of the file. Things like a virus checker, or horrible disk fragmentation.
I don't have a virus checker. Could be some fragmentation. OK, clear on the format standard. I give up for now.
you might defrag. I'll do some tests here on a file that size when I get a chance, for comparison's sake.
I just upgraded to Mozilla 1.2 from Netscape 4.8. Building the email summary takes AGES now, compared to old Netscape. Since the mail folders are the very same, and both programs (NS and Mozilla) read the entire folders in order to rebuild the summary, I'd expect this operation to take about the same time. That doesn't seem to be the case, though. I'd say about a 1:2 performance hit for Mozilla.
This has happened to me several more times since I posted this bug. I've defragged the hard drive and it doesn't help much. What appears to happen is: 1. Mozilla / Mozilla Mail crashes 2. Getting back into Mozilla after the crash, the mail file index is rebuilt - which takes several minutes for probably a few hundred e-mails. I'm talking full hard drive running crazy while I sit and watch the messages enter the index 1 by painful 1. Each only takes a second or less, but I say again that this is way too slow. It needs to be tuned up.
Rob and Jesus, have you tried 1.3a? It should be quite a bit faster than 1.2 for rebuilding the summary files.
I have not yet. I will get that installed soon and report findings.
MS Windows 98. Mailbox: 12.5 MB
Rebuild mailbox database:
- Netscape 4.8: 15 seconds.
- Mozilla Build 2002122208: 45 seconds :-((
*** Bug 186902 has been marked as a duplicate of this bug. ***
100% reproducible on Linux, build 2003012322

Environment:
OS: Mandrake Linux 9.0
Hardware: Pentium III 666 MHz, 512 MB RAM, hard disk storing the Mozilla profile: SAMSUNG SV2042H, ATA DISK drive (20411 MB in size), UDMA 100

A folder in Local Folders is 38M in size. Its name is "NMAIL Message Notifications". For your testing you can download a gzipped archive (3.8M) here: <http://olo.office.altkom.com.pl/domowa/qa/mozilla/2003_01_24_Large_Mail_Folder_loading/>

Mozilla's behaviour: When I enter "Local Folders/Admin/NMAIL Message Notifications", Mozilla starts to build the summary file. Mozilla consumes all available CPU time. After a long time (~20 minutes) the progress bar eventually stops moving. After some more time Mozilla hangs completely and doesn't even update its window when I switch from other apps. The summary file has a size of 0 bytes the whole time.
BTW, I have a message filter rule that moves messages to this folder, and new messages keep coming very frequently - about 5 msgs per 1 minute during peak hours - those are admin notifications from a 500 user mail server.
*** Bug 182907 has been marked as a duplicate of this bug. ***
BTW I've placed more testcases at the location from comment #15, all are compressed archives containing versions of the big folder I have trouble with. The latest ones will be compressed with bzip2 instead of gzip - now they only take 1.9MB.
ok, I get this problem too, on both Win98 SE and Linux, but it's a tad slower to build on Linux (both have the Mozilla 1.3 release). I should mention that I symlinked the 'Mail' directory from Windows to Linux so I use the same folders for both... I don't know if that could be a problem, because it seems to rebuild every time I reboot the computer. Maybe every time I switch OS it thinks the summary file is not up to date?! BTW, I have about 300 messages, and it takes about a minute to build the summary file for the inbox (size of Inbox file: 26309K). How do I prevent it from rebuilding every time? Nehal
yes, the symlink is what's causing the problem. The timestamps stored in the .msf files aren't agreeing with the timestamp of the file on the disk, presumably because the two OSes aren't in sync as far as the times are concerned.
so what would be the fix? (other than not symlinking, i would like to use common folders)
I would like to lend my support to this bug report. I get "Rebuilding summary file" at least a few times a week (probably because Mozilla or Windows hangs just as often). It takes eons to complete. When it's done, there is no visible difference, but OK, maybe it's repairing something. Surely, there must be a way to accelerate this. Maybe the answer is not to let the mail file get so big so quickly in the first place (mine is well under 100MB, and it still takes several minutes on a 1999 PC). Perhaps the answer is to index the mail file and only rebuild those parts of it which require rebuilding. Please, someone optimize this. Thank you.
I am sitting here right now waiting. I have a "backups" folder off of my inbox that has several hundred e-mails with attached files. The attached files are 1-2MB apiece. For some reason, it is taking 1 second PER e-mail to rebuild the index. This is ridiculous. For some reason, Mozilla e-mail is so incredibly unstable that I can't run my e-mail for more than a few days without needing to rebuild all of my indexes. This is a complete mess.
In case it wasn't clear, it seems like the time it takes to index somehow depends on the size of the attachment, which seems wrong to me.
parsing a mail folder/mail message is dependent on the size of the message - we have to read through the message to find the beginning of the next message. 1MB a second is still slow, however. My summary files are never invalidated. There seems to be something about some people's configurations or usage models that invalidates summary files. If I had reproducible steps on my machine that caused this bug, I'd have a much better chance to fix it.
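To illustrate why parse time follows message size, here is a minimal sketch of indexing an mbox-style file. It assumes the classic convention that every message begins with a line starting with "From "; the function name index_mbox and the sample data are hypothetical and this is not Mozilla's parser.

```python
import io

def index_mbox(stream):
    """Return (offset, size) for each message in an mbox-style byte stream.

    Assumption: a message starts at any line beginning with b"From ".
    Every line has to be read, attachment bodies included, because the
    format has no length field saying where the next message begins.
    """
    entries = []
    start = None
    offset = 0
    for line in stream:
        if line.startswith(b"From "):
            if start is not None:
                entries.append((start, offset - start))
            start = offset
        offset += len(line)
    if start is not None:
        entries.append((start, offset - start))
    return entries

# Two messages: a tiny one and one with a 1000-byte body line.
sample = (
    b"From alice@example.com Mon Jan  1 00:00:00 2003\n"
    b"Subject: small\n\nhi\n"
    b"From bob@example.com Mon Jan  1 00:01:00 2003\n"
    b"Subject: big\n\n" + b"x" * 1000 + b"\n"
)
index = index_mbox(io.BytesIO(sample))
```

Only the header lines matter for the summary, yet the loop still walks the whole body of the second message before it can know that no further message follows.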
Can someone tell me how it decides whether it should rebuild or not, so I can fix my problem (comment #19)? Does it compare dates of files or what? Thank you, Nehal
Fixing bug 58308 is likely to provide a satisfactory fix to this bug. Adding bug 58308 to deps.
Depends on: 58308
I think the problem is not frequent rebuilds but the time spent doing them. Rebuilding a mailbox under old Netscape 4.8 is THREE times faster. The very same mailbox! :-( See comment #13. This issue and the general instability in the email/news component keep me using NS 4.8 for email. Mozilla is superb as a browser, nevertheless. And no, Maildir is not an option for me. I'd like to see Mozilla reindexing a mailbox at least as fast as NS 4.8 :-). It's clearly doable, since NS 4.8 is already doing it :-). The limiting factor should be HD bandwidth, not CPU or memory. Currently, HD transfer rates of 20 MB/s are "normal".
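The gap between the reported times and a disk-bound rebuild can be put into numbers. This is a small worked calculation using only figures quoted in this bug (12.5 MB in 15 s versus 45 s, a 128 MB Inbox, 20 MB/s disk transfer); the helper name throughput_mb_per_s is made up for the example.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Effective parse rate in MB/s."""
    return size_mb / seconds

# Figures reported in this bug for a 12.5 MB mailbox on Windows 98.
netscape_48 = throughput_mb_per_s(12.5, 15)   # about 0.83 MB/s
mozilla = throughput_mb_per_s(12.5, 45)       # about 0.28 MB/s

# Time a 128 MB Inbox would need if the disk were the only limit,
# using the 20 MB/s transfer rate quoted above.
disk_bound_seconds = 128 / 20                 # 6.4 seconds
```

Both programs run far below the quoted disk rate in these measurements, which suggests the rebuild is not limited by raw disk bandwidth alone.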
mozilla decides to rebuild the summary file based on two criteria: the last modified time and the size of the mailbox file. When we make a change to the mailbox file, we sync the file and then get the last changed time and the file size, and write it to the .msf file, and save the .msf file. The next time we open the .msf file, we compare the size and last modified time of the mailbox file with what we have in the .msf file. If either doesn't match, we assume an external agent has changed the mailbox file, and thus our summary file is invalid. Outside of Mozilla crashing while writing out the .msf file, this should be relatively robust. There are a few possibilities for the .msf file getting out of sync:
1. Some other program (e.g., another e-mail program, a virus checker, etc.) is changing the last modified date or file size of the mailbox.
2. Mozilla is changing the file but not writing out the new last modified date to the .msf file for some operation.
3. The OS is reporting one last modified date when we ask for it after making a change, and then reporting a different last modified date later when we open the file again. Daylight saving time seems to cause this on some OSes. Having your mailboxes on a networked drive can also cause this problem.
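The size-and-timestamp check described here can be sketched as follows. This is a minimal illustration, not Mozilla's code: summary_is_stale and its stored_size / stored_mtime parameters are hypothetical stand-ins for whatever the .msf file actually records.

```python
import os
import tempfile

def summary_is_stale(mailbox_path, stored_size, stored_mtime):
    """Return True if the mailbox no longer matches what the summary recorded.

    stored_size and stored_mtime stand in for the values kept in the
    .msf file; how Mozilla actually stores them is not shown here.
    """
    st = os.stat(mailbox_path)
    return st.st_size != stored_size or int(st.st_mtime) != stored_mtime

# Demo with a throwaway file standing in for the Inbox.
with tempfile.TemporaryDirectory() as tmp:
    inbox = os.path.join(tmp, "Inbox")
    with open(inbox, "wb") as f:
        f.write(b"From a@example.com\n\nhello\n")
    st = os.stat(inbox)
    size, mtime = st.st_size, int(st.st_mtime)

    # Nothing changed since the values were recorded.
    fresh = summary_is_stale(inbox, size, mtime)

    # The same data reported with a timestamp one hour later (as a DST
    # shift or a second OS might do) invalidates the summary.
    os.utime(inbox, (mtime + 3600, mtime + 3600))
    after_shift = summary_is_stale(inbox, size, mtime)
```

In this sketch fresh is False and after_shift is True, even though the mailbox contents are identical in both cases. That is the failure mode in case 3: the check cannot tell a reinterpreted timestamp from a real external modification.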
I have a patch that helps somewhat, which I'll attach. But I believe the real reason we're slower than 4.x is that nsInputStreamPump::OnStateTransfer() limits itself to 16K when sending OnDataAvailable - if I comment out this check, I get speeds similar to 4.x. I'll investigate more, and talk to someone (Darin?) about this:
    // XXX need to make max ODA size configurable
    if (avail > 16384)
        avail = 16384;
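A rough model of what the cap costs, counted in callbacks: the sketch below assumes a simplified pump in which each OnDataAvailable-style call delivers one full chunk. The 64 KB buffer figure comes from the review comment later in this bug; count_oda_calls is a hypothetical helper, not part of necko.

```python
def count_oda_calls(total_bytes, pipe_capacity, max_oda=None):
    """Count OnDataAvailable-style callbacks needed to deliver total_bytes.

    Simplified model: each callback delivers whatever is buffered, up to
    pipe_capacity, further capped at max_oda when a cap is set.
    """
    chunk = pipe_capacity if max_oda is None else min(pipe_capacity, max_oda)
    return -(-total_bytes // chunk)  # ceiling division

mailbox = 128 * 1024 * 1024  # the 128 MB Inbox from this report
capped = count_oda_calls(mailbox, 64 * 1024, max_oda=16384)   # 8192 calls
uncapped = count_oda_calls(mailbox, 64 * 1024)                # 2048 calls
```

Under these assumptions the cap means four times as many callbacks for the same file. The real cost per callback (thread switches, listener overhead) is not modelled here.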
Status: NEW → ASSIGNED
Attached patch: partial fix (deleted)
this patch makes it so we delay displaying the thread pane when reparsing a local folder (so we don't waste time updating the thread pane), and I removed a status update that wasn't correct because contentlength was always 0 and slowed us down as well.
Attached patch: don't limit ODA to 16K (deleted)
this patch removes the 16K limit on ODA data - this speeds up reparsing quite a bit. I'm going to run with it a bit to make sure it doesn't break anything.
the reason Aleksander's mail folder takes so long to parse is that it's basically one giant thread, if we thread by subject, and our code that adds messages to threads breaks down performance-wise when the threads are huge. If you turn off threading by subject without_re, it should be a lot faster. I'll try to figure out a way to improve this code.
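To show how threading by subject can turn a folder into one giant thread, here is a hedged sketch. The grouping rule (strip leading "Re:" and group identical subjects, with or without the prefix) and the cost model (each new message is compared against every message already in its thread) are assumptions for illustration only; they are not taken from Mozilla's threading code. The count of 318 matches the same-subject folder reported later in this bug.

```python
import re
from collections import defaultdict

_RE_PREFIX = re.compile(r"^(\s*re\s*:\s*)+", re.IGNORECASE)

def thread_key(subject):
    """Subject with leading 'Re:' prefixes removed, used as the thread key."""
    return _RE_PREFIX.sub("", subject).strip().lower()

def thread_by_subject(subjects):
    """Group message indexes by normalized subject.

    Also counts the comparisons a naive 'scan the whole thread to place
    each new message' strategy would make (an assumed cost model).
    """
    threads = defaultdict(list)
    comparisons = 0
    for i, subject in enumerate(subjects):
        thread = threads[thread_key(subject)]
        comparisons += len(thread)  # naive placement looks at every member
        thread.append(i)
    return threads, comparisons

# 318 notifications that all share one subject.
same = ["Server notification"] * 318
threads_same, cost_same = thread_by_subject(same)

# 318 messages with distinct subjects, for comparison.
distinct = [f"Report {n}" for n in range(318)]
threads_distinct, cost_distinct = thread_by_subject(distinct)
```

With one shared subject the assumed cost grows quadratically (318 * 317 / 2 comparisons), while distinct subjects cost nothing extra. Turning off subject threading avoids building the giant thread in the first place.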
*** Bug 196607 has been marked as a duplicate of this bug. ***
Darin, I haven't had any problems with the patch that removes the 16K throttle - does it look OK to you? Are there any tests you want me to try? I'd like to consider checking this in for 1.4, or does that seem crazy?
1 folder, 68 MB (16916 messages, 500 have small text attachments), all with the same two subjects: "Ask A Question" and "Re: Ask A Question". Just imported from Outlook Express. The summary took 100% CPU for 4 hours.
XP Pro, Pentium III 600MHz, 7200rpm new hard disk, never used
1.4a Build 2003040105
I have a sent folder of 10000 messages I want to do the same to. Should I wait so it can be done using a possible patch? Or will deleting the .msf file be enough for testing?
Michael, see comment 33 - http://bugzilla.mozilla.org/show_bug.cgi?id=172786#c33 - I have a patch that will allow you to turn off threading by subject without re:, with a hidden pref, that I will check in when the tree opens again for general development.
Comment on attachment 121077: don't limit ODA to 16K
r=darin (without this check, max ODA is 64k... and i think that should be ok and probably good for most stream listener implementations)
Attachment #121077 - Flags: review+
Comment on attachment 121077: don't limit ODA to 16K
sr/a=sspitzer
Attachment #121077 - Flags: superreview+
Attachment #121077 - Flags: approval1.4b+
fix checked in for 1.4 final - should speed up folder parsing substantially.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Tp on linux and mac improved quite a bit! :-)

btek (linux)
----------------------
Tp before: 1078
Tp after: 1017 ( -61ms, or 5.67% improvement )

silverstone (mac osx)
----------------------
Tp before: 593
Tp after: 539 ( -54ms, or 9.1% improvement )

monkey (mac osx)
----------------------
Tp before: 664
Tp after: 584 ( -80ms, or 12.05% improvement )

fuego (linux)
----------------------
Tp before: 2655
Tp after: 2596 ( -59ms, or 2.22% improvement )

creature (win2k)
----------------------
Tp before: 270
Tp after: 269 and 268 ( no difference )
great!
So, what happens if we make ODA bigger than 64K? Would Tp go down even more?
Quite possibly - Darin talked about increasing the number of 16K segments. The issue is that we'd need to make sure all ODA handlers can deal with getting more data than 64K at once. I think we should try it. I suspect it would help with local file operations the most, since those operations are most likely to have more than 64K of data at a time.
roc, bienvenu: the issue is that a lot of ODA impls will take the count parameter and pass that straight to malloc (either directly or indirectly). gzip stream converter is a good example. of course, such impls could be revised :-/ anyhow, the parameters of interest are stored in netwerk/base/src/nsNetSegmentUtils.h
the default values are:
    #define NET_DEFAULT_SEGMENT_SIZE 4096
    #define NET_DEFAULT_SEGMENT_COUNT 16
also, nsIOService owns a cache of these segments. the default number of cached segments is:
    #define NS_NECKO_BUFFER_CACHE_COUNT 24
with a 15 minute expiration. if someone has spare cycles, it'd be worthwhile to play around with different configurations of these settings. fwiw: in the past i tried upping the max buffer size, but didn't see much improvement beyond 32k. i also have a bug somewhere about making these buffer sizes more configurable (per channel, transport, stream pump, etc.).
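The two defaults multiply out to the 64k figure mentioned in the review. The arithmetic below is a sketch that assumes the maximum buffered amount is simply segment size times segment count; segment_count_for is a hypothetical helper for trying other configurations.

```python
# Default values quoted from nsNetSegmentUtils.h in the comment above.
NET_DEFAULT_SEGMENT_SIZE = 4096
NET_DEFAULT_SEGMENT_COUNT = 16

# Assumed relationship: the most data that can be buffered at once.
max_buffered = NET_DEFAULT_SEGMENT_SIZE * NET_DEFAULT_SEGMENT_COUNT  # 65536 bytes

def segment_count_for(target_bytes, segment_size=NET_DEFAULT_SEGMENT_SIZE):
    """Segments needed to buffer target_bytes at a given segment size."""
    return -(-target_bytes // segment_size)  # ceiling division

segments_for_128k = segment_count_for(128 * 1024)  # 32 segments of 4096 bytes
```

Doubling the buffer to 128 KB would, on this assumption, mean raising the segment count from 16 to 32 while leaving the segment size alone.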
Tp on Windows (creature) has not improved but it has on Linux (btek). Why? Do btek and creature have the same internet connection speed?
No, I think Windows is just better behaved in this respect. Any time you have a lot of thread switches/interactions like this, Mac and Linux seem more adversely affected. At least, that's one possibility. Another possibility is that Windows Mozilla didn't run into the 16K limit as often for some reason; perhaps it processed the data quickly enough so that the data didn't back up - I don't know.
beast is pretty damn fast, so yeah... unless we have some windows numbers for a slower box, i wouldn't discount the possibility that this also helped windows (in general).
Bienvenu, concerning your comment #33: this issue hasn't been fixed yet, right? Is there a follow-up bug? I have a testcase at hand (the one from comment #15) and am willing to test a fix when available. The problem is still visible - when I visit the mentioned folder in threaded mode, Mozilla consumes all CPU for several seconds. It's on a Pentium III 666MHz, with 512 MB RAM and the folder only contains 318 messages, but all of them have the same subject.
slow threading when all messages have the same subject is a different problem that is not fixed. I think there's a separate bug filed for it, but if not, I'll file a new one and note it here.
David, bug 159660 seems to cover this.
Adam, no, that's a different issue, I think. That has to do with listing the contents of a thread and displaying them. The slowness threading messages with the same subject has to do with the algorithm used to add new messages to a thread when we're threading by subject.
David, I've opened bug 226730 for the threading problem.
I don't know if this info will be useful (I have a stripped binary), but here are the backtraces from next-stepping through the running mozilla-bin process (the one that consumes all the CPU):

0x4285f900 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0 0x4285f900 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1 0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) next
Single stepping until exit from function NSGetModule, which has no line number information.
0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0 0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
#1 0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableRawRemove, which has no line number information.
0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0 0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so
#1 0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableOperate, which has no line number information.
0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0 0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1 0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function NSGetModule, which has no line number information.
0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0 0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
#1 0x09cbb498 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableEnumerate, which has no line number information.
0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0 0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1 0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) next
Single stepping until exit from function NSGetModule, which has no line number information.
0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so
(gdb) bt
#0 0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so
#1 0x4286e64a in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
thanks, I know exactly what's going on, but it's hard to fix - turning off threading by subject will speed it up for you.
    user_pref("mail.thread_without_re", false);
Product: MailNews → Core
Product: Core → MailNews Core