Closed Bug 1584983 Opened 5 years ago Closed 4 years ago

Thunderbird re-downloads entire Office 365 mailbox multiple times randomly because UIDVALIDITY changed

Categories

(Thunderbird :: Folder and Message Lists, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: u597279, Unassigned)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0

Steps to reproduce:

Thunderbird 68.1.1 64 Bit, Windows 10 64 bit (platform does not matter), mail stored as maildir.

I have an Office 365 student account that I access through IMAP on thunderbird. Over weeks of use, sometime randomly the entire mailbox is redownloaded (i.e. all the mail in all folders are redownloaded as if newly configured).
Add to that, I noticed that my profile directory is 10s of gigabytes. Upon inspection, I see that there are multiple copies of each email in the profile directory (appended with -1, -2, ...) so I suspect everytime the redownload happens, I end up with multiple copies of the email stored on my profile and that ends up inflating the size of the profile directory. Compacting the mailbox does NOT remove these duplicate files. I have to delete them manually. This only appears to happen with Office 365 account as I have another email provider configured simultaneously but that has not been showing these problems.
This issue occurs on Windows and on Linux. I am not sure what other information I can provide that can be useful.

Actual results:

Random multiple redownloads, multiple copies of email on hard disk.

Expected results:

Should not redownload email, should not use up so much space on hard disk.

Reporter, Are you just reading and/or sending email with the account when you see the re-downloads? Or are you maybe also renaming or adding folders in the account or possibly changing the account settings or doing any other configuration activities?

Where you see the directories with -1, -2 etc on them, can you tell me which directory these are in and the other file structures and names you are seeing (should be under ImapMail/ directory in the profile).

Also, which directories are you deleting manually? I think it should be the un-numbered and lower numbered versions.

About how many emails get re-downloaded when the event occurs?

Also, if you use any 3rd party add-ons you might try running without them for a while (safe mode).

I don't currently have an office365 account but I do have Outlook account. Haven't noticed a problem with outlook imap; would assume they are similar servers since both M/S, but who knows.

Attached image Screenshot_20191001_012846.png (deleted) —

Duplicate files

(In reply to gene smith from comment #1)

Reporter, Are you just reading and/or sending email with the account when you see the re-downloads? Or are you maybe also renaming or adding folders in the account or possibly changing the account settings or doing any other configuration activities?

There is no real pattern as far as I can tell and I have not found an exact way to replicate it. From what I can remember, I open thunderbird and it randomly decides to do it. There is absolutely nothing I am doing that causes that. No account change, no config change, no renaming. Just open thunderbird to check email.

Where you see the directories with -1, -2 etc on them, can you tell me which directory these are in and the other file structures and names you are seeing (should be under ImapMail/ directory in the profile).

It is not the directories, but the email files themselves. Since maildir stores each email as a file, and I was annoyed by this issue, I had decided to investigate my profile directory to see if there is some funky thing going on that would explain the issue. I found multiple copies of the same email in every directory of the account. I am inferring that whenever the emails are redownloaded, a new copy of the file is stored in the profile folder. See screenshot added above. Note that it is the exact same email, but multiple copies of it are in the directory. All the emails are exactly the same size. This was taken for a relatively new profile (a couple of months old). The date modified on these are around 2 weeks apart. Many older profiles had more copies.

Also, which directories are you deleting manually? I think it should be the un-numbered and lower numbered versions.
So far, I have resorted to deleting the entire directory structure under ImapMail/outlook.office365.com/** because there are too many of them to deal with. Thunderbird then repopulates all of them. Its easier than searching and dealing with 1000s of emails, especially under windows.

About how many emails get re-downloaded when the event occurs?
ALL of it, every single email in the account. I have 1000s of emails in my account and everything is redownloaded. Currently, I am syncing about 2 GB of email in total. I set thunderbird to download all emails as I prefer to have an offline copy on my disk. But when this redownloading happens, this can quickly get out of hand (at this time, the profile folder was 6GB due to 3 copies. Checking some of my older backups, the profile folders ranged from 10 to 35 GB).

Also, if you use any 3rd party add-ons you might try running without them for a while (safe mode).

I had not used any third party addons until recently and it did not matter, it occured regardless. I can try to create a new profile and try again if you wish so, but I do not know if that changes anything. On a side note, what do I do when this event happens? Is there a way to capture any log that maybe useful in analysing this scenario? If so, please let me know.

I don't currently have an office365 account but I do have Outlook account. Haven't noticed a problem with outlook imap; would assume they are similar servers since both M/S, but who knows.

Indeed I suspect it is an Office 365 issue because only this account is affected. I have multiple accounts configured and have not seen this behaviour anywhere. I would however like to see this issue resolved as I use thunderbird extensively.

OK, I've seen directory names have number suffixes so I thought that's what you probably meant. So you are seeing duplicates (or triplicates, etc) of every eml file in every folder. That seems really weird. There is a value called "UID Validity" that is stored for every mailbox (folder) in tb and if that doesn't match what the server has I suspect all the emails in the now invalid folder are re-downloaded; but I don't remember right off the details on this.

I assume you are not seeing duplicates emails in tb's list of messages.

Anyhow, there is a way to record a log of IMAP activity. You have to define some environment vars and run tb with those set. The file that it records might get kind of big but I don't know what else you can do to debug what is causing this.

The details are here: https://wiki.mozilla.org/MailNews:Logging
You can attach the log like you attached the screenshot. Note that when you restart tb when running with the MOZ_LOG* vars set, the previously recorded log is overwritten (not appended).

My outlook account has only mbox format. I did check another account with maildir that has about 4K messages in Inbox. I didn't see any suffix number on any of the 4k files. (Also, there is no "compact" function for maildir folders that could potentially remove the redundant eml files.)

One more thing: It might be better to record the log with a new profile containing only the Office account. That would eliminate recording stuff in the log for accounts that don't have this problem and make the log smaller and easier to read. But this is not mandatory.

(In reply to gene smith from comment #4)

OK, I've seen directory names have number suffixes so I thought that's what you probably meant. So you are seeing duplicates (or triplicates, etc) of every eml file in every folder. That seems really weird. There is a value called "UID Validity" that is stored for every mailbox (folder) in tb and if that doesn't match what the server has I suspect all the emails in the now invalid folder are re-downloaded; but I don't remember right off the details on this.

I assume you are not seeing duplicates emails in tb's list of messages.

Anyhow, there is a way to record a log of IMAP activity. You have to define some environment vars and run tb with those set. The file that it records might get kind of big but I don't know what else you can do to debug what is causing this.

The details are here: https://wiki.mozilla.org/MailNews:Logging
You can attach the log like you attached the screenshot. Note that when you restart tb when running with the MOZ_LOG* vars set, the previously recorded log is overwritten (not appended).

My outlook account has only mbox format. I did check another account with maildir that has about 4K messages in Inbox. I didn't see any suffix number on any of the 4k files. (Also, there is no "compact" function for maildir folders that could potentially remove the redundant eml files.)

One more thing: It might be better to record the log with a new profile containing only the Office account. That would eliminate recording stuff in the log for accounts that don't have this problem and make the log smaller and easier to read. But this is not mandatory.

Ok, I will create a new profile with no addons and enable logging. Hopefully I will be able to capture something that might explain what is going on.

Attached file Office365MailCondensed.log (deleted) —

I have managed to capture the scenario and have provided a trimmed log of the first few 1000 lines of the log for privacy reasons. I can however try to trim it further and provide the full log when time permits and if required. Let me know if you need further information.

I see fetch of headers for inbox and archive/2015. I don't see fetch of entire messages requested. For inbox and archive/2015, are they set so that messages are saved for offline storage?
At least for Inbox, it appears that only message headers are being fetched for "new" message (those with UID exceeding the next uid value returned by select). Not sure why so many headers for archive/2015 are fetched.
Can you tell me what you were doing in tb when this log was recorded?
Does full message body fetch occur later in the cut-off part of the log?
I thought maybe the re-download is due to a uidvalidity error. But apparently not since I don't see the expected log line that occurs when this happens: "uid validity not ok".

(In reply to gene smith from comment #8)

I see fetch of headers for inbox and archive/2015. I don't see fetch of entire messages requested. For inbox and archive/2015, are they set so that messages are saved for offline storage?
At least for Inbox, it appears that only message headers are being fetched for "new" message (those with UID exceeding the next uid value returned by select). Not sure why so many headers for archive/2015 are fetched.
Can you tell me what you were doing in tb when this log was recorded?
Does full message body fetch occur later in the cut-off part of the log?
I thought maybe the re-download is due to a uidvalidity error. But apparently not since I don't see the expected log line that occurs when this happens: "uid validity not ok".

I had actually closed TB before it was able to download all files, it was requesting ALL messages under pretty much all folders (e.g. Archive/2016, /2017, {not limited to Archive directory either}, etc...), and there are LOTs of emails. I closed it because I suspected that if there an error that caused the re-downloading, it would have already appeared in the log, there is no point in me waiting for almost an hour for TB to fetch all my emails and inflate the log to several gigabytes. Also yes, I think the full body download appears later in the log. I see lots of HTML email body in the log, so definitely it is fetching the full emails.

I have some sort of info for the current scenario: earlier in the day I had moved a bunch of mails from Inbox to archive. This was in another instance of TB in my work computer. Nothing unusual, I frequently do this without any issues. However, this appears to have caused TB to redownload everything. In order to confirm this, I checked my work computer and lo behold, it appears to have downloaded a second copy of all the emails around the time I had moved the emails. The office profile is relatively new with no addons (except dictionaries and language packs). A bit later on my home computer, same thing happens. I literally ONLY started TB and I see this as soon as I launch it. This profile is relatively new, contains only this email and I do not even use it for anything other than fetching emails (no sending or anything else). Similar to my work profile, it has no addons except dictionaries. I launch an instance of TB from bash with this profile and with the appropriate logging variables to make sure I capture this whenever this happens.

You are right - interestingly it was not the case for Inbox. I cannot say for sure, but I remember this being similar to the last time it happened. I searched for "invalid" in the full log - with no results, so that possibly rules out UID problems. I am really stumped, I see no obvious errors yet the emails get re-downloaded: Why?

I see something similar when looking at my Outlook account. Seeing assert with debug build that "ASSERTION: uid validity seems to have changed, blowing away db ...". Saw re-fetch of headers for a large folder. I store only headers and not full messages. Will keep looking.

Reporter cg: If you would add another item to the logging variable, IMAP_CS:5, it would help. This will enable a log line that shows info like this:

D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=<foldername>

Specifically, I am referring to this env var:

MOZ_LOG=IMAP:5,IMAP_CS:5

I am seeing this logged for most of my folders on my outlook imap account. I don't know why the uidvalidity value has changed for most folders. I only use the outlook account for tb test purposes and don't access it often.

Also, just to be sure, please check the value of mail.server.default.use_condstore in the config editor. It should be false.

Note also that a very similar problem to what you see was reported long ago, but for a different server: Bug 264800. Someone mentioned that the account being accessed via POP3 by another client might have caused the server to "panic" and reset the uidvalidity. Possibly access to office365 via another (non-tb?) client is doing this? Just a wild guess.

(In reply to gene smith from comment #11)

Reporter cg: If you would add another item to the logging variable, IMAP_CS:5, it would help. This will enable a log line that shows info like this:

D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=<foldername>

Specifically, I am referring to this env var:

MOZ_LOG=IMAP:5,IMAP_CS:5

I am seeing this logged for most of my folders on my outlook imap account. I don't know why the uidvalidity value has changed for most folders. I only use the outlook account for tb test purposes and don't access it often.

Also, just to be sure, please check the value of mail.server.default.use_condstore in the config editor. It should be false.

Note also that a very similar problem to what you see was reported long ago, but for a different server: Bug 264800. Someone mentioned that the account being accessed via POP3 by another client might have caused the server to "panic" and reset the uidvalidity. Possibly access to office365 via another (non-tb?) client is doing this? Just a wild guess.

Does it have to be POP3? While I do use another clients (specifically one mobile client and possibly one other client active during thunderbird), but they are all IMAP. However, the circumstances are the same whether the re-download happens or not. mail.server.default.use_condstore is at the default vale of false.

Ok, I will add the variables and update when I have another log.

Does it have to be POP3?

No idea why outlook and, for you, office365 are reporting changes in the uidvalidity number. Someone reported this about POP3 long ago.
I think adding the IMAP_CS:5 will work with your version of tb. But if you don't see lines in the new log containing "IMAP_CS" or more specifically,

D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=<foldername>

then no need to attach another log file.

FWIW, I modified my local tb build so the logging for the above line includes the new uidvalidity number. I see it change to 206 from the old value 183 for a folder called "ml-test" on the outlook account. This, of course, causes a re-download.

(In reply to gene smith from comment #13)

FWIW, I modified my local tb build so the logging for the above line includes the new uidvalidity number. I see it change to 206 from the old value 183 for a folder called "ml-test" on the outlook account. This, of course, causes a re-download.

I suspect that might be the issue, and if it is, then what would be the potential solution to this? Is there something on TBs end that can prevent this?

I checked a few other test accounts that have been idle for months or possibly years and they all returned the same uidvalidity number that tb had stored before and no re-downloads occurred. No idea why outlook/office365 is doing this (or appears to be doing this).

I doubt that it is an easy change in tb code since uidvalidity is assumed to rarely, if ever, change for a folder. I can suggest some possible things to try and/or workarounds:

  1. For folders that you see this on, try repairing the folder (right-click on folder, properties). This may clean up the redundant maildir files with suffixes, not sure. This will cause a another re-download of all the folder messages, but only the selected folder you right-clicked on. Just repair one and see if it helps before repairing all your folders.

  2. You probably have a good reason for using maildir (maybe backup efficiency) but you may have better luck going with mbox format for local storage since it has been around longer and is fully (or more fully) tested and debugged. I doubt that you will end up with duplicate emails in the mbox file even if they are re-downloaded due to uidvalidity changes. But this won't stop the re-download if uidvalidity changes.

  3. Maybe don't store the full message locally at all and just keep the messages only on the imap server. If a re-download occur it will only fetch the message header and not the complete body and you won't have any storage of email bodies in either mbox or maildir formats

Don't know if any of these are acceptable, but repairing a folder or two should be worth a try. But still curious what you see in the log with env var IMAP_CS.

To update this issue: (In reply to gene smith from comment #15)

I checked a few other test accounts that have been idle for months or possibly years and they all returned the same uidvalidity number that tb had stored before and no re-downloads occurred. No idea why outlook/office365 is doing this (or appears to be doing this).

I doubt that it is an easy change in tb code since uidvalidity is assumed to rarely, if ever, change for a folder. I can suggest some possible things to try and/or workarounds:

  1. For folders that you see this on, try repairing the folder (right-click on folder, properties). This may clean up the redundant maildir files with suffixes, not sure. This will cause a another re-download of all the folder messages, but only the selected folder you right-clicked on. Just repair one and see if it helps before repairing all your folders.

From what I can see, it deletes all emails and redownloads them, which I guess is better than having multiple copies.

  1. You probably have a good reason for using maildir (maybe backup efficiency) but you may have better luck going with mbox format for local storage since it has been around longer and is fully (or more fully) tested and debugged. I doubt that you will end up with duplicate emails in the mbox file even if they are re-downloaded due to uidvalidity changes. But this won't stop the re-download if uidvalidity changes.

Yes - it is for backup reasons. I would rather not change to mbox unless I absolutely have to.

  1. Maybe don't store the full message locally at all and just keep the messages only on the imap server. If a re-download occur it will only fetch the message header and not the complete body and you won't have any storage of email bodies in either mbox or maildir formats

This maybe a way, but it requires me to change my backup strategy. I will have to evaluate if it would work.

Don't know if any of these are acceptable, but repairing a folder or two should be worth a try. But still curious what you see in the log with env var IMAP_CS.

Unfortunately, you are right: I see the "UIDVALIDITY changed, reset highest MODSEQ and UID for folder=<foldername>".

Overall I really hope there is a better solution than having to not have the email locally.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Component: Untriaged → Folder and Message Lists
Resolution: --- → INVALID
Summary: Thunderbird re-downloads entire Office 365 mailbox multiple times randomly → Thunderbird re-downloads entire Office 365 mailbox multiple times randomly because UIDVALIDITY changed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: