Closed Bug 749983 Opened 13 years ago Closed 9 years ago

Make "compact folders" more efficient. takes too long, and puts too much load on file server's disk drive

Categories

(MailNews Core :: Backend, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 845952

People

(Reporter: psz, Unassigned)

Details

(Keywords: perf)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:12.0) Gecko/20100101 Firefox/12.0
Build ID: 20120423122624

Steps to reproduce:

I find that compacting folders takes a long time, and is very inefficient: the whole folder is duplicated (without the deleted messages), then the old file is replaced with the new one. Whereas typically, old messages are kept in the folder for archiving and some new messages are deleted: often, a simple truncate() of the file might almost suffice.

Actual results:

I observe serious issues with compaction efficiency, most directly on a Linux login server, which becomes unusably slow for a full minute, for all users, when one user is compacting his 1.5GB Inbox. The same issue occurs for Windows users who have their thunderbird mail folders on a Samba server. My workaround for now is to advise users to keep their Inbox small, and to keep long-term messages in another "Keep" folder.

Expected results:

Could compaction be made more efficient? Some ideas:
- Do in-situ. Write directly in the old folder file: starting at the first "hole", write any subsequent messages, then truncate at the end. Might not be as robust as the current duplicating method, might corrupt the file if thunderbird is interrupted while compacting; robustness may be improved by writing dummy start-of-message markers at the end of each message moved/written.
- Instead of mbox files, use MIX format as UW IMAP: http://www.washington.edu/imap/documentation/mixfmt.txt.html
  This would also help with mailbox file size limits.

Thanks, Paul

Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
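For illustration only, here is a minimal Python sketch of the "do in-situ" idea from the report above. It is not Thunderbird code. The function name `compact_in_place` and the `live_spans` argument (byte ranges of the messages to keep, as an index might supply them) are assumptions made for this example, and the sketch leaves out the dummy start-of-message markers suggested for robustness, so an interruption mid-run could still corrupt the file.

```python
import os


def compact_in_place(path, live_spans, chunk_size=1 << 20):
    """Slide live messages down over the holes, then truncate the file.

    live_spans is a hypothetical list of (offset, length) byte ranges for
    the messages to keep, sorted by offset and non-overlapping.
    Returns the new file size.
    """
    write_pos = 0
    with open(path, "r+b") as f:
        for offset, length in live_spans:
            if offset != write_pos:
                # The destination always lies before the source, so copying
                # forward chunk by chunk never overwrites unread data.
                src, dst, remaining = offset, write_pos, length
                while remaining:
                    f.seek(src)
                    chunk = f.read(min(chunk_size, remaining))
                    if not chunk:
                        raise IOError("span extends past end of file")
                    f.seek(dst)
                    f.write(chunk)
                    src += len(chunk)
                    dst += len(chunk)
                    remaining -= len(chunk)
            # Messages before the first hole are left untouched on disk.
            write_pos += length
        f.truncate(write_pos)
        f.flush()
        os.fsync(f.fileno())
    return write_pos
```

In the common case described in the report (old mail kept, recent mail deleted), only the data after the first hole is read and rewritten, instead of the whole folder being duplicated.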
is your issue primarily on the server?
Component: General → Backend
Keywords: perf
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Summary: Make "compact folders" more efficient → Make "compact folders" more efficient. takes too long
(In reply to Wayne Mery (:wsmwk) from comment #1)
> is your issue primarily on the server?

The main issue is that a single thunderbird user, compacting his Inbox, causes the Linux login server, or the Samba file server, to become slow and unresponsive for all users. That the thunderbird user himself observes his compacting take a long time is less of an issue: at least he thinks something useful is happening.

Thanks, Paul
ah, so the thunderbird profile for the user is on samba ? what size of MB are you using for the compact? bug 558528 would help individual users. Not sure how much it would reduce network load.
Summary: Make "compact folders" more efficient. takes too long → Make "compact folders" more efficient. takes too long, and puts too much load on samba networked drive
(In reply to Wayne Mery (:wsmwk) from comment #3)
> ah, so the thunderbird profile for the user is on samba ?
> what size of MB are you using for the compact?

I have two kinds of users:
- Linux users, who log in to a Linux server through Linux terminals
- Windows users, who have many of their important files, like their email profiles, on a Samba server

I observe slowness on both kinds of setups, most directly for the Linux users with an obvious correlation of cause and effect.

The "problem" users have Inboxes of several hundred MBs; the largest Inbox we currently have is 1.5GB.

> bug 558528 would help individual users. Not sure how much it would reduce
> network load.

The load I observe is not network but disk I/O congestion. On the Linux login server, the filesystem is a local RAID array. (Please change summary.)

Thanks, Paul
(In reply to Paul Szabo from comment #4)
> (In reply to Wayne Mery (:wsmwk) from comment #3)
> > ah, so the thunderbird profile for the user is on samba ?
> > what size of MB are you using for the compact?
>
> I have two kinds of users:
> - Linux users, who log in to a Linux server through Linux terminals
> - Windows users, who have many of their important files, like their email profiles, on a Samba server
> I observe slowness on both kinds of setups, most directly for the Linux users with an obvious correlation of cause and effect.
> The "problem" users have Inboxes of several hundred MBs; the largest Inbox we currently have is 1.5GB.

What I meant is: have you changed the default value in (windows) Tools | options | advanced | network and disk | disk space | Compact ... from 20 MB? Users with a big inbox or profile and/or very big mails should probably raise the size to 100MB or more.

Are any of these users using gmail?

> > bug 558528 would help individual users. Not sure how much it would reduce
> > network load.
>
> The load I observe is not network but disk I/O congestion.

understood

> On the Linux login server, the filesystem is a local RAID array. (Please change
> summary.)

done
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on samba networked drive → Make "compact folders" more efficient. takes too long, and puts too much load on networked drive
(In reply to Wayne Mery (:wsmwk) from comment #5)
> what I was mean, is have you changed the default value in (windows) Tools |
> options | advanced | network and disk | disk space | Compact ... from 20 MB ?
> Users with big inbox or profile and/or very big mails should probably raise
> the size to 100MB or more.

I suppose some may have changed it. Regardless: I observe slowness each time compaction takes place; the setting you mention may control how often that triggers. Some of my users may receive 100MB per day, so compaction happens daily, anyway. Note also that many of my users use Linux, not Windows.

> are any of these users using gmail?

No: gmail users use the web interface, not thunderbird.

>> On the Linux login server, the filesystem is a local RAID array.
>> (Please change summary.)
> done

Thanks: though it still mentions "networked drive".
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on networked drive → Make "compact folders" more efficient. takes too long, and puts too much load on disk drive
(In reply to Paul Szabo from comment #0)
> Could compaction be made more efficient? Some ideas:
> - Do in-situ. Write directly in the old folder file: starting
>   at the first "hole", write any subsequent messages, then
>   truncate at the end. Might not be as robust as the current
>   duplicating method, might corrupt the file if thunderbird is
>   interrupted while compacting; robustness may be improved by
>   writing dummy start-of-message markers at the end of each
>   message moved/written.
> - Instead of mbox files, use MIX format as UW IMAP:
>   http://www.washington.edu/imap/documentation/mixfmt.txt.html
>   This would also help with mailbox file size limits.

Further to those "ideas" for a fix, a partial improvement which may be easy to implement:
- When deleting a message, if that is the last message, then truncate the folder file (and the index file) at the end of the last-remaining message. This would be quick and easy, and may help to trigger compaction of the folder less often.
(In reply to Paul Szabo from comment #7)
> (In reply to Paul Szabo from comment #0)
> > Could compaction be made more efficient? Some ideas:
> > - Do in-situ. Write directly in the old folder file: starting
> >   at the first "hole", write any subsequent messages, then
> >   truncate at the end. Might not be as robust as the current
> >   duplicating method, might corrupt the file if thunderbird is
> >   interrupted while compacting; robustness may be improved by
> >   writing dummy start-of-message markers at the end of each
> >   message moved/written.
> > - Instead of mbox files, use MIX format as UW IMAP:
> >   http://www.washington.edu/imap/documentation/mixfmt.txt.html
> >   This would also help with mailbox file size limits.
>
> Further to those "ideas" for a fix, a partial improvement which
> may be easy to implement:
> - When deleting a message, if that is the last message, then
>   truncate the folder file (and the index file) at the end of
>   the last-remaining message. This would be quick and easy,
>   and may help to trigger compaction of the folder less often.

We do that for move/delete message filters, but not delete of a message through the UI.
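A rough Python sketch of the truncate-on-delete idea quoted above, for illustration only. The function name `truncate_trailing_deleted` and the `messages` list of `(offset, length, deleted)` tuples are hypothetical stand-ins for whatever the folder index records; this is not how Thunderbird implements it, and updating the index file itself is left out.

```python
import os


def truncate_trailing_deleted(path, messages):
    """Cut off deleted messages that sit at the physical end of the folder.

    messages is a hypothetical list of (offset, length, deleted) tuples in
    file order. Returns the resulting file size.
    """
    size = os.path.getsize(path)
    # Assumption: the index covers the whole file. If it does not (for
    # example, new mail was appended since), leave the file alone.
    if not messages or messages[-1][0] + messages[-1][1] != size:
        return size

    # Find the end of the last message that is still live.
    end = 0
    for offset, length, deleted in messages:
        if not deleted:
            end = offset + length

    if end < size:
        os.truncate(path, end)
    return end
```

This only reclaims a trailing run of deleted messages. A hole earlier in the file is left as it is, so a full compaction would still be needed for that case.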
Bienvenu,
Are there any good articles, code comments or bug comments that talk about the efficiency/deficiency/limitations/tradeoffs of implementing compact? For example, I seem to remember some discussion in the last couple of years. Maybe it was in the bugs related to making compact automatic.
(In reply to Wayne Mery (:wsmwk) from comment #9)
> Bienvenu,
> Is there one or more good article, code comment or bug comment that talk
> about the efficiency/deficiency/limitations/tradeoffs of implementing
> compact?

Probably, but I can't think of any that advance the discussion. Compacting berkeley mailbox is expensive by nature and moving to a different storage format is really the way out. Truncating folders when the physically last message is deleted would help for some use cases but it gets complicated when users read and delete messages from older to newer, if that makes sense.
One idea I haven't seen in the bugs is the idea of not compacting a folder if the benefits are nominal vs the cost. For example, don't compact a 1GB folder if it will save only 1MB or 10MB. (bug 711765 is an idea, but from a different angle)
yes, I've outlined a strategy where we compact the folders where we get the biggest bang for our buck first, i.e. the ones where the ratio of space to be reclaimed to space still used is highest.
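To make the two ideas above concrete (skip compaction when the gain is nominal, and do the best ratio first), here is a small Python sketch. It is only an illustration: the function name `compaction_order`, the `(name, live_bytes, wasted_bytes)` tuples and the `min_ratio` cutoff of 0.1 are assumptions for this example, not values or APIs from Thunderbird.

```python
def compaction_order(folders, min_ratio=0.1):
    """Return folder names worth compacting, best payoff first.

    folders is a hypothetical list of (name, live_bytes, wasted_bytes).
    The payoff is the ratio of reclaimable space to space still in use;
    folders below min_ratio are skipped as not worth the I/O.
    """
    candidates = []
    for name, live_bytes, wasted_bytes in folders:
        if wasted_bytes <= 0:
            continue  # nothing to reclaim
        # A folder with no live data left is the cheapest possible win.
        ratio = wasted_bytes / live_bytes if live_bytes else float("inf")
        if ratio >= min_ratio:
            candidates.append((ratio, name))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [name for _, name in candidates]
```

With such a cutoff, a 1GB folder that would give back only 10MB has a ratio of about 0.01 and is skipped, while a small folder that is mostly deleted mail is handled first.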
(In reply to David :Bienvenu from comment #12)
> yes, I've outlined a strategy where we compact the folders where we get the
> biggest bang for our buck first, before, where the ratio of space to be
> reclaimed to space still used is highest.

Is Bug 711765 - Percentage based automatic Compact - suitable?
(In reply to Wayne Mery (:wsmwk) from comment #13)
> Is Bug 711765 - Percentage based automatic Compact - suitable?

I do not think it is. We might argue for better strategies on deciding when or which folders to compact. But the issue here is that, when we do a compaction, it is slow and "expensive".

I agree with comment#10, the use of mbox files makes compaction expensive (maybe even with trickery like comment#8, which should be done anyway because it is "right" and "neat"). We should aim to make compaction faster and more efficient, maybe by making message deletion fast enough to do each time, sidestepping the issue of compaction completely.
To summarize...
* Short term, your option today is to increase the compact threshold to 100MB, 200MB, etc, so that compacts are less frequent.
* Your stated ultimate goal is bug 845952, which eliminates compact
* bug 558528, bug 1242042 and friends.

It's anyone's guess whether these or bug 845952 will happen first. But your stated goal is to eliminate compact, so let's dup this to bug 845952. You're welcome of course to help with or monitor the progress of the other bug reports which might aid your cause.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on disk drive → Make "compact folders" more efficient. takes too long, and puts too much load on file server's disk drive