Closed Bug 55814 Opened 24 years ago Closed 24 years ago

downloaded mails are lost when disk is full

Categories

(MailNews Core :: Backend, defect, P2)

x86
Windows 2000
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: spachti, Assigned: naving)

References

Details

(Keywords: dataloss, Whiteboard: [nsbeta1+]relnote-user [1.0 stop ship?])

Attachments

(1 file)

I tested Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20000929 Netscape6/6.0b3 and stored the Netscape profile and mail on a Samba drive where the available space is limited by quotas (Digital Unix). When the drive is nearly full and I download a mail (via POP3) that is bigger than the available disk space, the mail is downloaded to my workstation and the subject is shown. But when I try to view the mail, I get an unknown error. After closing the mail client (and browser), I deleted some files to free up space and opened the mail client again. The downloaded mails are neither in my local mailbox nor on the POP3 server --> mail lost! If you have further questions, please mail martin.sperl@gmx.net. Thanks for your great work.
not a database issue. Jeff has worked on this in the past, and a mozilla contributor checked in a fix for pre-flighting the disk space available. I wonder if FE_DiskSpaceAvailable is returning the right value for the Samba drive.
Assignee: bienvenu → putterman
Component: Mail Database → Mail Back End
reassigning to jefft. We should verify that mail is really lost. In the past we've minused these bugs because we thought mail wasn't being lost.
Assignee: putterman → jefft
Investigating...
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → M18
Will be mentioned in the general low disk space release note item. Related: bug 57902, bug 49868, bug 32443
Keywords: relnoteRTM
QA Contact: esther → laurel
Windows, accessing a Samba share limited by quotas, reports the free space of the whole disk rather than the space available to the specific user under their quota. A serious check for available disk space has to allocate the disk storage needed for the next operation to see whether it will work or not.
Whiteboard: relnote-user
reassigning jefft's bugs to naving
Assignee: jefft → naving
Status: ASSIGNED → NEW
Severity: normal → critical
Keywords: mozilla0.8
Whiteboard: relnote-user → relnote-user [1.0 stop ship?]
Losing mails is the worst thing a mail client can do :) If this bug is real, it should be evaluated as a possible stop ship for Mozilla 1.0. Nominating for mozilla 0.8 for a start. I expect this to get pushed out a little more, but let's avoid a "but now it's too late for this" dilemma for 1.0. Are there any "arch" or "highrisk" issues with this? Upping severity because of dataloss. Please correct me if I'm wrong.
Damn it, is there a path that I missed through GetMsg that doesn't hit my disk space checking routine? Adding a dependency on that bug.
Depends on: 32443
Unless I'm totally misunderstanding the code, reserving the disk space won't do anything. I got pointed to this from bug 62480, which is POP3, as is the original reporter's problem (so I haven't actually experienced this, I've just been looking at the code). nsPop3Sink::WriteLineToMailbox does *m_outFileStream << buffer, which has no way of checking for disk full errors (technically, you have to flush before you know that for sure; maybe this should be done?). The patch to 62480 actually checks for error status, but nothing's setting them, so it doesn't make a difference. I can't see anything being reserved. Mozilla checks that space is available, but this is subject to races if someone else is writing to the disk at the same time you are (nsPop3Protocol.cpp#1803 is where you're talking about, right?). Checking beforehand is OK, but Mozilla should deal with unexpected failures (IO failure, NFS server dying, etc.). Anything else just causes dataloss. Also, if Inbox is a symlink, checking the amount of space available in the folder won't help. NB: the reporter mentions using Samba. If you don't compile Samba correctly (--enable-quota, IIRC), then some older versions won't tell Windows that something went wrong (if hard quota != soft quota, but occasionally anyway, especially if the server isn't Linux), and it will happily keep writing. From personal experience, I know that MS Access and Word writing onto a quota'd Samba drive will corrupt their own files quite happily. In this case, there isn't anything Mozilla can do about it. This doesn't mean that there isn't a problem, though.
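For illustration only, here is a minimal sketch of the "flush, then check the stream state" idea from the comment above, written against the standard C++ streams rather than Mozilla's nsIOFileStream. The helper name WriteLineChecked is hypothetical and is not part of the Mozilla tree. A successful flush only means the data was handed to the operating system; a network share may still fail later.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <string>

// Hypothetical helper (not Mozilla code): writes one mailbox line and
// reports whether the stream accepted it.
bool WriteLineChecked(std::ostream& out, const std::string& line) {
  // Illustrative precondition: a mailbox line carries at least its terminator.
  assert(!line.empty());
  out << line;   // operator<< alone only records failure in the state bits
  out.flush();   // hand buffered data to the OS so a full disk can surface now
  return out.good();
}
```

A caller would stop the download and leave the message on the server as soon as this returns false, instead of carrying on as if the line had been stored.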
This currently isn't planned for mozilla0.8. I've changed the nomination to mozilla0.9 so it gets properly triaged for the next milestone. It looks like others besides naving are looking at this. If someone else is able to fix this by 0.8, then please do so!
Keywords: mozilla0.8 → mozilla0.9
putterman: The nomination for mozilla 0.8 was only to evaluate if there are any "arch" or "highrisk" issues with this bug, to avoid a "but now it's too late for moz 1.0" dilemma. Please consider doing this evaluation early in the 0.9 cycle.
yeah, I sort of fell down on the job when it came to looking at mozilla0.8 nominations (I was busy with all of the nsbeta1 nominations). It turns out there weren't too many that weren't looked at already and I've gone through and made sure that they get nominated for mozilla0.9 so that they can be evaluated rather than lost.
Ideally Mozilla wouldn't delete messages from the POP server without first confirming the messages have been successfully written to the mail file on disk. Having two copies of the same message is a lot better than losing a message forever. I assumed this was what Moz did, but some of the comments above and on bug 62480 lead me to doubt this. Can anyone clarify?
mozilla.org@pidgin.org: That's what this bug is about. Currently there is no way for the POP code to know that the write failed. Adding dataloss keyword. The full solution for this bug is more complicated, though: if we run out of space halfway through the message, we have to delete the half of the message we have already written, or the mailbox may be left corrupted. Actually noticing the error will mean that we don't delete the message from the server, which is a start. FWIW, I've had the same mail loss in NS4.x, when the Linux machines Netscape was running on got the wrong quota sizes from the SunOS NFS drives.
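The "delete the half of the message we have already written" step could look roughly like the following. This is a sketch using C++17 std::filesystem, not the patch that was checked in, and AppendMessageOrRollBack is a hypothetical name. It records the mailbox size before appending and truncates back to that size if any part of the write fails.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not Mozilla code): appends `message` to the mailbox
// file at `path`. Returns true only if the whole message was written.
// On failure, the file is truncated back to its previous size (best effort).
bool AppendMessageOrRollBack(const fs::path& path, const std::string& message) {
  assert(!path.empty());  // illustrative precondition
  std::error_code ec;
  // Remember where the mailbox ended before this message.
  std::uintmax_t old_size = fs::exists(path, ec) ? fs::file_size(path, ec) : 0;
  if (ec) return false;

  std::ofstream out(path, std::ios::binary | std::ios::app);
  out << message;
  out.flush();
  out.close();              // close() sets failbit if the final write fails
  bool ok = !out.fail();

  if (!ok) {
    // Drop the partial message so the mailbox is not left corrupted.
    fs::resize_file(path, old_size, ec);
  }
  return ok;
}
```

If the truncation itself fails (for example because the share has gone offline), the sketch still reports failure, so the caller would not delete the message from the server.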
Keywords: dataloss
Target Milestone: M18 → ---
nominating for beta1
Keywords: nsbeta1
marking nsbeta1+
Priority: P3 → P2
Whiteboard: relnote-user [1.0 stop ship?] → [nsbeta1+]relnote-user [1.0 stop ship?]
Target Milestone: --- → mozilla0.9
We don't begin downloading messages until we make sure that the disk space is available. From the comments in the code, GetDiskSpaceAvailable() may not work on all platforms. I believe this may be true for the Samba drive (Digital Unix). Reporter, you can verify this if you have a debug build: look for "Call to GetDiskSpaceAvailable FAILED! " on the console.
The problem is that GetDiskSpaceAvailable is not atomic: it tells you what is available now, not what may be available by the time we've finished downloading. The GetDiskSpaceAvailable call should be there as an optimisation ("if we know we can't finish, don't bother starting"), not as a solution. That check is also totally broken anyway, at least on Unix. Consider quotas. From nsLocalFileUnix.cpp, lines 1004-1006: "// The number of Bytes free = The number of free blocks available to a non-superuser, minus one as a fudge factor, multiplied by the size of the beforementioned blocks." This may or may not be a bug in the nsLocalFileUnix implementation, since it is returning the amount of free disk space available. The only comment in the idl file is "// maybe we should put this somewhere else.", which isn't much help as to the official definition of the attribute. Regardless, the code _must_ check the return value of each write: even if you consider this amount plus the slack checked for to be sufficient, think disk IO error, network drive going offline, etc. The problem is that nsPop3Sink::WriteLineToMailbox just uses the nsIOFileStream, which doesn't report errors.
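To make the "optimisation, not a solution" point concrete, here is a sketch of an advisory pre-flight check using C++17 std::filesystem::space. The function names and the slack parameter are assumptions for illustration and do not mirror GetDiskSpaceAvailable. As discussed above, the reported figure may ignore per-user quotas, and it can be stale by the time the download finishes.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Pure comparison, written to avoid unsigned overflow in needed + slack.
bool FitsWithSlack(std::uintmax_t available, std::uintmax_t needed,
                   std::uintmax_t slack) {
  if (needed > available) return false;
  return available - needed >= slack;
}

// Hypothetical advisory check (not Mozilla code). A "true" result is only a
// hint: the writes themselves must still be checked for errors.
bool ProbablyEnoughSpace(const fs::path& dir, std::uintmax_t needed,
                         std::uintmax_t slack) {
  assert(!dir.empty());  // illustrative precondition
  std::error_code ec;
  fs::space_info info = fs::space(dir, ec);
  // Assumption: if the query fails we cannot tell, so we do not block the
  // download and rely on write-time error checks instead.
  if (ec) return true;
  return FitsWithSlack(info.available, needed, slack);
}
```

The design choice here is that a failed or misleading space query never counts as proof of success; it only decides whether starting the download is worth attempting.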
The available space can change as we begin downloading, only due to external factors, like downloading some other app/file. What you suggest about failed writes may work in this case. However this case may not occur very often.
under multi-user operating systems, other users can affect the available space on disk through normal usage. I think the case would occur as the norm on most shared systems. am I misreading something?
Under multi user OS, I think we have quotas so one user's memory usage should not affect others. The best we can do right now is do GetDiskSpaceAvailable before we download each message but it will slow down getting messages.
> Under multi user OS, I think we have quotas so one user's memory usage should not affect others. 1. GetDiskSpaceAvailable doesn't take quotas into account on (at least) Unix. I filed bug 72892 on that. 2. Besides, that's only true if numUsers*diskSpacePerUser <= diskSize, which is not true on the two sets of machines (from two different unis) I have access to, and I doubt it's true generally, although I have no data to back that up. (On some of the machines, TAs and staff have no quotas at all, but students do. And root never has a quota, so a large log file could take up all the disk space. This happened this morning on one machine, and the Unix command-line mail gave a semi-informative error and quit.) 3. This is _mail_. Error codes must be checked. Retrieving and sending mail without losing it is the only essential purpose of a mail client.
at least for all the multi-user machines around here, we don't have quotas and are free to chew up as much disk with builds as we like. It's still our responsibility to not eat the user's mail, regardless of the setup by a sysadmin.
As an end user, I would much rather take a small performance hit every time I retrieve messages if the alternative is to lose messages every so often, and I imagine many users feel the same way. Mozilla or any other POP client should get a return value from whatever is writing the message to disk, and then delete the message only if the return value indicates the write succeeded. It's not enough to just guard against common cases like disk full, quota full, bad permissions etc. Bug 71025 reports a case where messages were lost when writing messages to disk failed for a reason other than the disk being full, and Mozilla deleted the message from the server anyway.
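The ordering asked for above (store first, delete from the server only on confirmed success) can be sketched as follows. The types and callbacks are hypothetical stand-ins for the local mail store and for issuing the POP3 DELE command; this is not the Mozilla implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical message record: POP3 message number plus its content.
struct PopMessage {
  int number;
  std::string body;
};

// Hypothetical callbacks: `store` returns true only if the message was fully
// written to the local mailbox; `dele` marks it for deletion on the server.
using StoreFn = std::function<bool(const PopMessage&)>;
using DeleteFn = std::function<void(int)>;

// Stores messages in order and deletes each one from the server only after
// its local write succeeded. Stops at the first failure, for whatever reason,
// so that message and all later ones stay on the server.
int RetrieveAll(const std::vector<PopMessage>& messages, const StoreFn& store,
                const DeleteFn& dele) {
  assert(store && dele);  // both callbacks must be set
  int stored = 0;
  for (const PopMessage& m : messages) {
    if (!store(m)) break;
    dele(m.number);
    ++stored;
  }
  return stored;
}
```

Because the decision depends only on the store callback's return value, it covers disk full, quota, permission, and I/O failures alike, at the cost of possibly downloading a message twice.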
Just another case study from reality: Our working group at a university pays per GB for (university-wide-)centralized backups. Thus our home partition is very limited in size. In the past, every two months or so it suddenly happened that the disk was full. There sometimes seems to be a (mysterious) process writing to disk continuously, until no space is left at all. So this occurs immediately, while working, even if there was plenty of space an hour ago. Usually we discover it when we try to use mail. (We are mostly using emacs or exmh for mail, in a linux/solaris environment. Also, we are still using NN4.6/4.7.) We are not using quotas, so one user's memory usage _does_ affect others.
I am working on it and will have a fix soon, but how do I test it?
Attached patch: patch that was checked in (deleted)
fixed
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
*** Bug 74321 has been marked as a duplicate of this bug. ***
OK with 2001-06-07-13-0.9.1 commercial branch (beta1) build and linux rh6.2. On a POP account, server settings have "leave messages on server" disabled. When disk is full, Get Msg (there are indeed several messages to retrieve) gives no "unknown error". Status bar text shows there are indeed messages to retrieve, but doesn't retrieve them -- text is "receiving 0 of N messages". Mail window is not left in an unusable state, no hang, etc. After freeing some disk space, Get Msg is able to retrieve the new messages just fine; properly downloaded to inbox and able to be displayed.
Same results on same scenario using 2001-06-07-13-0.9.1 with win98. Since this is such a time-consuming process to reproduce, I'm going to go out on a limb and assume mac is okay, too. Marking this verified.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core