Open Bug 1432879 Opened 7 years ago Updated 6 years ago

Inbox Folder fails silently at 4GB file system limit (FAT32) with no error message, so better error checking is needed. Also message download attempts repeat causing excessive server load. (redownload)(automatic compact failed)

Categories

(MailNews Core :: Database, defect)

defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: joe, Unassigned)

References

(Depends on 1 open bug)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.99 Safari/537.36 Steps to reproduce: Inbox IMAP file reached 4GB on Windows 7, which is Win 7's single-file size limit. Actual results: Thunderbird continues to download recent messages continually, creating excessive server load and network traffic. The loop is apparently that Thunderbird: (1) sees that there are messages on the server that are not stored locally; (2) downloads them; (3) fails to write to disk because the file is too large; (4) goes back to step (1) without reporting any issue to the user. Expected results: An error message of some kind. Preferably one that indicates something about a problem with the filesystem, and ideally one indicating that the system file limit has been reached. A simple error message would have saved me a couple of hours. More ideally also stop new message downloads until the issue is resolved. I found out about this issue when my mail server reported excessive network traffic. It took me a lot of deduction and trial and error ("what process is causing this excess traffic? dovecot. Ok, it's a mail problem. Is that a client or server issue? Client. Which user and which client?") to determine that Thunderbird was the issue, then finding the Tbird activity manager to see the odd activity of continual attempts to re-download the most recent messages, and then pinning that to the system filesize limit. Reproduced on 52.5.2 and 58.0b3.
Adding Gene for IMAP and Aceman for 4GB.
Not imap, but is reporter using FAT* or NTFS? A quick google shows FAT32 with 4GB limit but NTFS much more, maybe 16TB.
Well, the IMAP part would be that we keep downloading the same message over and over if we fail to store it.
Filesystem is FAT32.
Just a quick test of setting my INBOX mbox file read-only and immutable and receiving a new email, all I see is that the email is downloaded from the server each time it is accessed in tb, unlike other mails that are just read from INBOX file when they are accessed. I don't see repetitive fetches from the server with this simple test. Joe, was it just the INBOX file that was 4G? What was the size of the INBOX.msf? Typically it is much less. Were any other files under ImapMail huge? Curious what you did to fix this. Did you split your inbox into smaller folders? Or maybe just shutdown tb and that fixed it? Also, did you see the network activity going on constantly with any kind of sniffer such as Wireshark? Were messages being added to "Activity Mgr" every few seconds? What did the messages say? (I don't see anything in AM when messages are fetched.) Can you cause the problem again by making your INBOX large again (if you dare)? None of my mbox files are anywhere near 4G. Also, I only have systems with ext4 or NTFS so would have to have really big mbox's to reach the file size limit.
Only the INBOX file was 4GB. The msf file was not nearly that large. No other files in my profile folder were over 1G. I actually deleted the msf file first because I first suspected it was an indexing-corruption problem, but then it just started downloading my entire Inbox repeatedly. After I noticed that INBOX was exactly 4GB, googled "Windows 7 file limit" and "FAT32 file limit" like you did, and the was like "hmmmm, also 4G . . . veeeeery suspicious". I deleted the INBOX file, Thunderbird downloaded all my Inbox messages one last time, and now INBOX is like 700MB and all is well. I can't explain why the new INBOX file is one-fifth the size it was before. I have never compacted it, but docs say "The compaction process is done automatically in Thunderbird (since version 5) when it saves more than 20 MB of space on the disk."[1] I did see the network activity locally using the built-in Windows network monitor, which matched up with the Thunderbird Activity Manager, and with tcpdump of port 993 on the server (which showed a continuous stream of my client computer's IP). The Thunderbird Activity Manager would show a "downloading x messages of y", with x incrementing from zero up to y each time that lined up with a burst of 100% network use. Thunderbird would then pause for about a minute and start over. y was increasing over time (from around 90 to 120 during the time I was watching it, presumably messages that were not getting stored locally). The Windows network use graph ended up looking like a series of plateaus: about a minute of 100% network, then about a minute of 0% network, repeated. I guess I could theoretically pile a bunch of **** into my Inbox (send myself huge number of emails with large attachments?) to make it four gigs again to reproduce, but I think that exceeds my availability to science and open source at the moment. [1] https://support.mozilla.org/en-US/kb/compacting-folders#w_when-does-compaction-occur
More info here where I posted this as a question and then answered it when I figured it out; plus screenshot of Activity Monitor: https://support.mozilla.org/en-US/questions/1201115
(In reply to joe morris from comment #6) > Only the INBOX file was 4GB. The msf file was not nearly that large. No > other files in my profile folder were over 1G. I actually deleted the msf In Options->Advanced->config editor, set mailnews.allowMboxOver4GB preference to false. So that TB should not even try to cross 4GB. But I'm not sure this is implemented for IMAP to... > before. I have never compacted it, but docs say "The compaction process is > done automatically in Thunderbird (since version 5) when it saves more than > 20 MB of space on the disk."[1] > https://support.mozilla.org/en-US/kb/compacting-folders#w_when-does- > compaction-occur Can you check the Options if you really have compaction enabled?
Thanks for the info. Are you using the "move deleted items to trash folder" setting or possibly the "just mark it as deleted" or "delete immediately" settings for your server? If using the latter two, there should not be any automatic compact since it would "expunge" your emails from the server permanently and you have no "Trash" as backup. When "deleting to trash" not sure about the auto-compact. I have only seen occasional prompts to do a compact usually after accessing nntp folders. Don't think I have seen it occur just on its own. I will look into this. Unless you compact a folder, when you delete a message to trash it doesn't leave the mbox file. So after a long time with lots of emails coming in and many deleted, the file can still grow to a large size, even though a lot of it has been "virtually" deleted. Compacting will get rid of the deleted stuff in the mbox. When you delete the mbox and mbox.msf files and let them reload from the server, they will also be smaller since expunged items won't be fetched. This has the same effect as compacting. But some of this depends on whether your server supports the imap MOVE command or if it auto-expunges emails that are deleted as gmail does by default. I just stumbled on a config editor pref called mailnews.allowMboxOver4GB that defaults to true. I assume you have not touched this and it is still true on your system? Maybe this would have helped. I will look at what this does. More to come...
I set a breakpoint on where the 4GB limit is potentially checked and it only seems to occur when I copy a file to Local Folders, not when anything is added to or copied to an imap folder, so I guess aceman is right on that ;<. I guess that makes sense since it is implemented in the /mailnews/local directory.
Compaction is/was enabled (T-bird Options->Advanced->"Compact all folders when it will save over 20 MB in total" Server trash setting (Account Settings-> Server Settings) is "When I delete a message: Move it to this folder: Trash on [account name]". Trash is a subfolder of Inbox, so I think that means in the same INBOX file. It has some files in it, but I do clean it out periodically so not like three gigabytes. I have never used the config editor to change any settings. IMHO the mystery of why the INBOX is now smaller is maybe interesting and worth looking into, but the network-usage issue when the file limit is reached seems perhaps easier to trace and more critical to solve. Even if there was some file-compaction issue for me, it's quite conceivable someone else would hit the 4GB limit and get this network-flood effect with proper compaction happening. Also, one other data point that has occurred to me is that the network traffic was greater than what would have been warranted for a single download of the "missing" messages. Thunderbird's activity manager reported it downloading 100 messages or so, and each attempt was about a minute or two of ~50Mbs network traffic (then a minute pause, then repeat). So back of the napkin math (60s * 50Mbs), that's like 3GB in each ~1 minute attempt, and my last 100 messages at the time were definitely not anywhere near 3GB. Maybe within each attempt at downloading the set of messages, it was trying to download each message (download message, fail to write it, download again, write fail again) multiple times? Speculative, but the network traffic generated was greater than just a single download of the messages that Thunderbird was reporting in its activity manager. Again, I am 100% sure all the network traffic was coming from Thunderbird because I was watching it locally (Windows network monitor) and on the server (tcpdump and RRDtool), and when I closed Thunderbird it would stop, when I opened Thunderbird it resumed, and Windows network monitor showed ~1min pulses of activity that matched the Thunderbird activity monitor. The initial report from the server that got me investigating was that it "has exceeded the notification threshold (5) for outbound traffic rate by averaging 17.04 Mb/s for the last 2 hours." I was like "oh #$%!, that server only has a 50GB disk, what could possibly be downloading 60GB/hour?". It does seem a little odd that, what with Win7 being an older operating system this has never been reported before, and newer OS's and filesystems tend not to have filesize constraints. So, maybe those are reasons to de-prioritize. But this was somewhat difficult to trace -- I needed to observe server and client at the same time -- so maybe that's why it wasn't reported. And it is pretty bad if it's is true: you don't want sysadmins to be tracing mysterious floods of network activity and have it turn out to originate with a bug in Thunderbird.
(In reply to joe morris from comment #11) > It does seem a little odd that, what with Win7 being an older operating > system this has never been reported before, and newer OS's and filesystems > tend not to have filesize constraints. So, maybe those are reasons to > de-prioritize. But this was somewhat difficult to trace -- I needed to > observe server and client at the same time -- so maybe that's why it wasn't > reported. And it is pretty bad if it's is true: you don't want sysadmins to > be tracing mysterious floods of network activity and have it turn out to > originate with a bug in Thunderbird. Win7 should be OK. The default filesystem type on Windows NT-based systems for at least the last 20 years is NTFS, not FAT32. NTFS does not have the 4GB single-file limit. So you have intentionally formatted the partition with this older FS type. Which means there are not many users and sysadmins that could hit this. I'm not an expert in IMAP and do not know TB internals on what it does if the IMAP file can't accept to store a new message. What does it then do and what does it redownload. But if your real size of all messages is only 700MB, then compaction should probably have kicked in long ago.
(In reply to :aceman from comment #12) > > Win7 should be OK. The default filesystem type on Windows NT-based systems > for at least the last 20 years is NTFS, not FAT32. NTFS does not have the > 4GB single-file limit. So you have intentionally formatted the partition > with this older FS type. Which means there are not many users and sysadmins > that could hit this. That's what I thought too. I suppose the reporter has a reason for using the FAT32 FS instead of default NTFS. > > I'm not an expert in IMAP and do not know TB internals on what it does if > the IMAP file can't accept to store a new message. What does it then do and > what does it redownload. > But if your real size of all messages is only 700MB, then compaction should > probably have kicked in long ago. I don't know in the code exactly where the mbox offline storage is written, but doing an experiment on both Windows 10 and Linux, if the INBOX file is made read-only and a new and large message comes in, tb just treats that message as if offline storage/sync has been disabled and downloads the message and any attachment from the imap server each time that message or attachments are accessed. For the previously stored to INBOX emails, no network access occurs since they are just read from INBOX file. I see network activity only when the not-stored emails are accessed; no continuous or periodic access observed otherwise. However, if the INBOX is writable and a write starts but fails part way into the write (e.g., error = file size exceeded), that might be a different story. I will try to simulate that.
I found the place where the write occurs. I changed the code so that the Write() returns error NS_ERROR_FILE_TOO_BIG, that according to documentation means "Completing the write or seek operation would have resulted in a file larger than the system could handle" and printf a message to verify it is hit. I see no ill effect when the new emails arrive. The INBOX file does not get larger, due to the write error, but newly received emails can still be accessed. I see no unusual network activity either. I do notice that when older emails are accessed (those that were stored in INBOX file before the error was introduced) there is no network activity. But when the new emails that were not stored in INBOX are accessed, they are downloaded from the server over the network when accessed. This is exactly how it should work. However, I just noticed that I had "check for new messages" every X minutes turned off. When I turn this on and set it to the minimum of 1 minute, I do see and attempt to download the un-stored messages every 60 seconds. This is probably what the reporter is seeing. Maybe with 100 fairly large messages not stored due to write error over a period of time, and a fairly short "check for new messages" time, it could result in a continuous attempt to download messages that never finishes before the next attempt starts. Joe, does this seem plausible? What is you "check for new messages" time set to?
My check for new messages is set to every 10 minutes, but the checks/download bursts were happening more frequently -- it was about equal parts full network use and zero network use, about one minute on one minute off (maybe two on two off, but not ten). It also does not explain why the network traffic exceeded the data size of the messages Thunderbird reported it was trying to fetch. My server rejects messages over 20MB (postfix message_size_limit), so the theoretical size limit for 100 messages would be 2GB; and I know I didn't get 100 messages in a row that were the max file size (probably none were even close). I still suspect there's some loop that, in each attempt to download un-stored messages, was fetching each un-stored message more than once. For the curious, I have FAT32 because I created a separate encrypted partition for my Thunderbird profile, and for no particularly good reason I can now recall, I formatted it as FAT32. Not a common choice, and again I'd perceive that as a reason to de-prioritize but not ignore: some users might put their profile (or their whole file system) on removable media (USB, SD card) which is still typically FAT32, and again the inconvenience is as much for the server as the client. I documented this on the MozillaZine knowledgebase (http://kb.mozillazine.org/Repeatably_downloading_the_same_messages), and an admin there messaged me "The maximum folder size in Thunderbird used to be 4GB but that limit was removed in version 52 . . . perhaps the removal of the 4GB folder size limit broke in the version you used." So, FWIW, could be a reason this hasn't been seen before.
Ok, thanks for the reply. I've noticed that regardless of the error when writing to the mbox file, tb does not tell the user anything. So there could be other access errors other than "file too big" that could cause the same result you see. Probably the most typical mbox access error is if the file is deleted. In that case, tb just silently re-fetches all the messages and rebuilds the file. Another work-around might be to use the "Maildir" file format instead of mbox. This trades a possibly very large mbox file for numerous small mail files, one per message. However, this may cause the issue "too many files" for some legacy filesystems. And, I think there are other issues with maildir but I am using it on this win7 laptop now with no problems. Anyhow, not sure what more I can do on this. Maybe your documentation article is enough to call this "resolved" in some way?
I was going to suggest another work-around for the compact problem. That would be set the "Clean up (expunge) Inbox on exit" imap server option. However, when I tried this after deleting a large email, TB sent the expunge to the imap server, which is good, but it doesn't actually do anything to the INBOX file. I only see the INBOX file get smaller after a right-click "compact" is manually done. (This is on a system using mbox format, of course.)
Also, I was curious about what the auto compact thing does and if it really works. I set my threshold down from default 20MB to 1 MB and thought I would quickly see something happen after deleting at least 1MB of INBOX mails. Nothing happened; INBOX never got compressed. I then found in the code where AutoCompact() is periodically checked (about every 30s or so). This is in nsImapMailFolder.cpp. The problem is it is never called because unless TB is set to offline. Also, you have to leave it in offline mode for at least 1 hour before you get the prompt to allow all folders to be compacted. I set the hardcoded timer down to 2min and now I get the prompt every two minutes as long as I am offline and I cancel the prompt so the folders are not compressed. This seems to be the problem why the reporter never got a prompt to do compaction. (Also, I don't think I have ever seen it either except for nntp/news folders.) I have looked but don't see any documentation as to why compaction prompt (and compaction) only occur in offline mode for imap folders. I don't see the same restriction for pop3, news and local folders. Maybe there is a concern that compacting while online could be a problem if new emails arrive? But if that's the case, why is the manual right-click compact command allow while online? I can "fix" the problem so that compaction prompt and auto-compact occurs both online and offline by changing the condition for when AutoCompact() is called in nsImapMailFolder.cpp; just remove the check for offline: // if (rv == NS_MSG_ERROR_OFFLINE) { if (aMsgWindow) AutoCompact(aMsgWindow); }
(In reply to gene smith from comment #18) > Also, I was curious about what the auto compact thing does and if it really > works. I set my threshold down from default 20MB to 1 MB and thought I would > quickly see something happen after deleting at least 1MB of INBOX mails. > Nothing happened; INBOX never got compressed. Bug 1415468!?
(In reply to Alfred Peters from comment #19) > (In reply to gene smith from comment #18) > > Also, I was curious about what the auto compact thing does and if it really > > works. I set my threshold down from default 20MB to 1 MB and thought I would > > quickly see something happen after deleting at least 1MB of INBOX mails. > > Nothing happened; INBOX never got compressed. > > Bug 1415468!? Not the same. That bug is about reducing the total amount of offline storage that is typically kept. This bug is about why the user of imap is never prompted when he has at least X MB of possibly wasted offline storage that could be deleted (compacted out) and avoid his INBOX file reaching the FAT32 max of 4GB.
Thanks for all the attention you have given to this Gene and everyone else. I was happy to get any response at all, and it seems like perhaps there might be a fix to this file-compaction issue which I wasn't even really thinking about when I filed this. I would still like to know if it's possible to report errors writing to disk to the user. I cannot think when there would be a *write* error (as opposed to an access error) and as a user I'd be like "oh whatever". Yes, FAT32 is rare, but as Gene writes: "I've noticed that regardless of the error when writing to the mbox file, tb does not tell the user anything. So there could be other access errors other than 'file too big' that could cause the same result you see." A file permission error on linux/unix comes to mind. Maybe there are some circumstances where the error would be annoying or inappropriate, but seems like it might be worth the risk of putting it in for those that need to see it. Stick it in the status bar maybe if you are afraid of annoying users with pop-ups. I'm not sure disk *access* errors would be so important. Gene writes "the most typical mbox access error is if the file is deleted". It might be nice to have a message when that happens, and/or a user confirmation before Thunderbird commences re-downloading (e.g., if the profile is inadvertently moved/unmounted and can be moved back/remounted rather than re-downloading). Fairly often, though, someone might delete their profile/mbox on purpose, like I did to reconstruct my local mbox here. I perceive that as a nice-to-have vs. critical-to-have for write error reporting.
(In reply to joe morris from comment #21) > I would still like to know if it's possible to report errors writing to disk to the user. This definitely should be done. I have a patch set that inserts many error checks and reporting to write-related errors. Now that the workload of my day job has dwindled a bit (the second half of the year poses greater load than in the first half), I am trying to re-sync the patch sets with the latest source code: turns out a key function is missing now, and I am scratching my head. (NS_BufferOutputStream was removed in Oct, 2017. This is a key function used in a place of my patch set in Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 ) > Stick it in the status bar maybe if you are afraid of annoying users with pop-ups. Developers and users may have different opinions on how such errors are reported, and I have known that there *are* different opinions, certainly. But the errors definitely need to be reported somehow so that Joe's case can be analyzed and, what is more important, Joe could take a sensible work-around knowing, "Oh, the file system is filling up", etc. TIA
Chiaki and all - thanks for the analysis. So we have multiple bad problems here :( FWIW a short list of other recent compact reports https://mzl.la/2ohXN6f - I haven't concluded any are related, and there are more of course https://mzl.la/2od7SkD
Severity: normal → major
Status: UNCONFIRMED → NEW
Component: OS Integration → Database
Ever confirmed: true
Product: Thunderbird → MailNews Core
Summary: Thunderbird, with no error message, creates excessive server load when Win 7 4GB file limit reached → Folder becomes 4GB file with no error message because automatic compact fails/doesn't happen. Also creates excessive server load because message download attempts repeat.
Version: 58 Branch → 58
I added a patch for Bug 1157256 (found in Wayne's list) that allows auto-compacting for imap folders in online mode. See comment 18 above.
(In reply to gene smith from comment #24) > I added a patch for Bug 1157256 (found in Wayne's list) that allows > auto-compacting for imap folders in online mode. See comment 18 above. Joe, beta now has that patch. Can you try it from http://www.mozilla.org/en-US/thunderbird/channel/ and let us know it is working for you?
Flags: needinfo?(joe)
Summary: Folder becomes 4GB file with no error message because automatic compact fails/doesn't happen. Also creates excessive server load because message download attempts repeat. → Inbox Folder becomes 4GB file with no error message because automatic compact fails/doesn't happen. Also creates excessive server load because message download attempts repeat. (redownload). And fs is FAT32 with 4gb file limit.
Sorry I won't be able to test: (1) we just had a baby so my available time has radically decreased; and (2) this is not exactly an easy test. The problem went away when I deleted and re-downloaded my Inbox, which apparently caused it to be compacted. To really test, I would have to intentionally stuff a folder full to get to a 4GB folder-file to reproduce the effect, and then clean it out again, which would be fairly time consuming. It sounds like the auto-compaction should reduce the likelihood of an IMAP folder reaching 4GB in the first place, but I'm not sure how to test that.
Flags: needinfo?(joe)
Depends on: 1242030
Summary: Inbox Folder becomes 4GB file with no error message because automatic compact fails/doesn't happen. Also creates excessive server load because message download attempts repeat. (redownload). And fs is FAT32 with 4gb file limit. → Inbox Folder fails silently at 4GB file system limit (FAT32) with no error message, so better error checking is needed. Also message download attempts repeat causing excessive server load. (redownload)(automatic compact failed)
You need to log in before you can comment on or make changes to this bug.