1796711 - Consider dropping support for various modes of having incomplete messages (IMAP: mime_parts_on_demand; POP3: fetch headers only)

Reporter

Description

2 years ago

We have a few modes where the whole message isn't available even when "available".
In IMAP there is mime_parts_on_demand, which will fetch e.g. a large attachment only when you open it, but lets you see the email message before doing that.

In POP3 there is "Fetch headers only", whose UX is nothing but horrible: a button to manually load the message appears after you are already supposed to be looking at it.

Neither of these modes would appear to be in the users' best interest, and they are both off by default. In general, getting the message and having it available at all times if you have it at all seems like a more sensible approach. Supporting the have-it-but-not-really cases adds much complexity we could do without. Thinking ahead, for doing mails in conversations with fetch headers only... what would that even look like?

There's certainly bugs: on POP, I think, around how those modes interact with filters; and for mime_parts_on_demand, Wayne has suggested many of https://mzl.la/3sdK5Sq, plus some known crashers where mime_parts_on_demand is likely to be the cause.

Wayne Mery (:wsmwk)

Comment 1

2 years ago

Thanks for filing the bug. I am in favor. I look forward to improved stability and predictability.

I am sure some pop users will disagree/complain. But really, "fetch headers only" flies in the face of the purpose of pop. (as does keep message on server, to some extent)

gene smith

Comment 2

2 years ago

For IMAP it still fetches by parts even with mime_parts_on_demand set to false (the current default). See bug 1794762 comment 4 and earlier comments.
Will probably have to figure out how to avoid that without requiring everyone to store all messages for offline. I think it can be done.

gene smith

Comment 3

2 years ago

As Magnus mentioned, the imap fetch parts on demand is now disabled by default. However, there is still one situation where it still occurs. When the message is too big to fit into the cache entry, the current code attempts to fetch just an individual attachment when it's needed. However, often this still doesn't help because the attachment itself may be too big to fit in the cache entry too. If that's the case, the cache prefs under "browser.cache.memory" have to be adjusted as described in bug 1794762 and in several other bugs, usually with "27 bytes" in the summary.

I've been working on a change to imap that never does parts fetches and will not require the user to adjust the prefs even with super-huge messages. The only limitation is the amount of free disk space the user has.

The first change is to increase the cache entry size. We currently use the "cache2" memory cache which defaults to 25M. If a part is larger than 25M, as is rarely the case, the user will see "This part will be fetched on demand" when viewing the attachment since more data was piped to the cache entry than it can hold so the entry gets "doomed" and attachment/part appears corrupted when displayed or saved.

An easy way to increase the cache entry size is to use "browser.cache.disk...." instead. Using the disk cache instead of memory (RAM) cache typically doubles the default size from 25M up to 51.2M (*). The disk cache has a default "smart" feature that adjusts the total capacity of the cache based on the total disk size and amount of free space, with a maximum of 1G allocated. Even on a laptop with a very limited flash disk (about 25G and 8.7G free), about 696M of capacity is allocated for disk cache in TB by the cache2 system, with a default max entry size of 51.2M.

While working on related "27 byte" bugs over the years, users have sent me sample emails that cause the problem. Most are at or slightly bigger than the 25M ram cache limit. None I've seen even approach the 51.2M disk cache entry size. However, in case a user has a message over the 51.2M limit, my change, before writing to cache, checks to be sure the message will fit in the entry. If it won't fit, I send the message to a temp file instead of to cache. (The save to a temp file uses code based on BenC's code that saves messages for anti-virus quarantining to a temp file.)

The temp file will only ever contain a single (too big for cache) message opened by the user and downloaded. If the user accesses another message the temp file will be deleted and the new message will be cached (if it fits) or saved again to temp (if it won't fit in cache). If the user returns to the original huge message, it will be re-downloaded and stored to the temp file again.

So as long as the user has enough free disk space to hold their largest existing message, the previous problems should be solved. But the need to resort to a temp file should be rare.
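The storage decision described above could be sketched roughly as follows. This is only an illustration in Python, not the actual C++ patch: `choose_storage`, `TempSlot`, and `MAX_ENTRY_BYTES` are hypothetical names, and the limit simply reuses the 51.2M default quoted above.

```python
from enum import Enum

# Hypothetical limit: the default disk cache entry size quoted above (51200 KB).
MAX_ENTRY_BYTES = 51200 * 1024


class Storage(Enum):
    DISK_CACHE = "disk cache entry"
    TEMP_FILE = "single temp file"


def choose_storage(message_size: int, max_entry_bytes: int = MAX_ENTRY_BYTES) -> Storage:
    """Decide where a fully downloaded message goes, before writing any bytes."""
    if message_size <= max_entry_bytes:
        return Storage.DISK_CACHE
    # Too big for one cache entry: fall back to the temp file.
    return Storage.TEMP_FILE


class TempSlot:
    """Models the rule that at most one oversized message is held at a time."""

    def __init__(self) -> None:
        self.held_uid = None  # UID of the message currently in the temp file

    def open_message(self, uid: int, size: int) -> Storage:
        target = choose_storage(size)
        # Opening any other message discards the previously held temp file.
        self.held_uid = uid if target is Storage.TEMP_FILE else None
        return target
```

Checking the size before writing means an oversized message never reaches the cache entry, which is what avoids the "doomed" entry described earlier.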

With this change, when a message is accessed the first time, the complete message will be downloaded and stored in disk cache or to temp file. When parts/attachments are accessed or need to be displayed inline, the whole message will be read back from disk and the part extracted from the stream by the existing "libmime" code. No storage or fetching of individual parts from the server occurs at all. This means that the imap fetch of bodystructure is no longer needed and the file nsImapBodyShell.cpp can be eliminated. Also, quite a bit of code in nsImapProtocol.cpp and some in nsImapServerResponseParser.cpp to support fetching parts and body structure can be removed. The only added code is to support checking if the message will fit into cache and to support writing/reading the temp file.
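To illustrate the "store the whole message, extract parts locally" idea, here is a small Python sketch. It uses Python's standard `email` package purely as a stand-in for libmime; the sample message and the helper names `build_sample` and `extract_attachment` are made up for the example and say nothing about how libmime is actually invoked.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional


def build_sample() -> bytes:
    """Build a tiny message with one text body and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "report"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content("See attached.")
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_bytes()


def extract_attachment(raw: bytes, filename: str) -> Optional[bytes]:
    """Parse the complete stored message and return one attachment's bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.iter_attachments():
        if part.get_filename() == filename:
            # decode=True undoes the base64 transfer encoding.
            return part.get_payload(decode=True)
    return None
```

The point is that no server round trip is needed once the full message bytes are available locally; the part is located by parsing the stored stream.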

Note: When the conversion to cache2 was done in 2016, disk cache was originally intended to be used. However, only with memory cache could a peek at the first 100 bytes to check for validity occur (see https://searchfox.org/comm-central/rev/5c171ecce2af72ca83a611fc5562820d6d04130d/mailnews/imap/src/nsImapProtocol.cpp#9546). So my change leaves this check out. I don't think I've ever seen or heard of this failing. However, there is still a check before this that verifies that the cache entry size matches the actual message size, which is a good validity check even though I don't think I've seen/heard of this failing either. If the removed validity check turns out to be a showstopper or if there are other reasons not to use disk cache, going back to memory cache is not a big change (one line change). With memory cache in use, the temp file might need to be used more often (due to smaller cache entry size and lower total cache capacity) but other changes would be unaffected.
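The two validity checks mentioned here could look something like the sketch below. The size comparison follows the description above. What the real 100-byte peek actually looks for is not stated in this comment, so the header-like test in the second check is an assumption made only for illustration, and `entry_looks_valid` is a hypothetical name.

```python
def entry_looks_valid(entry: bytes, expected_size: int, peek: int = 100) -> bool:
    """Return True if a cached entry plausibly holds the complete message."""
    # Check 1 (described above): cached size must equal the message size.
    if len(entry) != expected_size:
        return False
    # Check 2 (ASSUMPTION, not the real test): the first line within the
    # peeked bytes should look like a "Name: value" header.
    first_line = entry[:peek].split(b"\n", 1)[0]
    name, sep, _ = first_line.partition(b":")
    return bool(sep) and bool(name) and b" " not in name
```

Only the second check needs to read entry data, which is why it is the one tied to peeking at the entry, while the size check needs just the entry's metadata.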

Here's a try build with my WIP changes: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=b7856c1a52137521eb932fe9d3102d3001782847
There are several unit test failures at test_partsOnDemand.js and test_cacheParts.js. These tests can probably be removed since neither activity, parts on demand and caching of individual parts, is supported by my change. (Some of the failures and timeouts are probably due to some temporary MOZ_ASSERTs that I added to make sure part fetches are actually never requested.)

One more thing that should be mentioned is that whether on-demand part fetches occur or not is only relevant for users that don't use offline store. When the user uses mbox or maildir storage, the whole message is always fetched/downloaded and individual message parts are never requested from the server.

(*) The maximum size of a disk cache entry, browser.cache.disk.max_entry_size, defaults to 51.2M. However, the actual maximum can be less since cache2 restricts it to 1/8 of the cache disk capacity. The cache disk capacity is determined automatically when browser.cache.disk.smart_size.enabled is true, which is the default setting, and in that case the value seen in browser.cache.disk.capacity is irrelevant. The true disk capacity, determined by "smart sizing", can be seen in the new "about:cache" link I added to "Troubleshooting information" under "disk -- Maximum storage size:".

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1794762, https://bugzilla.mozilla.org/show_bug.cgi?id=1673093, https://bugzilla.mozilla.org/show_bug.cgi?id=1757478, https://bugzilla.mozilla.org/show_bug.cgi?id=1796538, https://bugzilla.mozilla.org/show_bug.cgi?id=1302422

gene smith

Comment 4

2 years ago

Attached patch no-fetch-parts-v0.diff (obsolete) (deleted)

I've tested this with a "modern" laptop (1T SSD) and on a very old and weak laptop with a 25G disk (65% full), and the disk caching and temp file (described in the previous comment) work great with very large messages; tested up to 117M, which is much larger than most mail servers even allow.
Testing in the real world by a user using no offline storage would be useful. Richard Leger comes to mind :).
Here's the link again to the try build: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=b7856c1a52137521eb932fe9d3102d3001782847

gene smith

Updated

2 years ago

Flags: needinfo?(richard.leger)

sebastian

Comment 5

2 years ago

We use Thunderbird with an IMAP mail server.
For our use case it's important that Thunderbird does not store all the attachments on the hard disk permanently!
We have mailboxes that receive many emails with large attachments every day (up to 70 MB per single mail). We have to keep these emails on the IMAP server / in the mailbox because we often have to look up things later.
So we disabled the local storage / caching of the emails in the Thunderbird settings menu.
Otherwise, our notebooks run out of SSD disk space (max. 256 GB SSD space overall, some only 128 GB).
This happened several times when the local mail folder caching was reenabled by some software update or bad luck. This rendered the laptops unusable with 0 bytes free disk space on /.
We'd appreciate a disk caching mechanism that deletes local cached attachment copies after they haven't been accessed for a few days. This might be a compromise between no local copies and caching everything from back years ago.
But we just cannot afford to keep full copies of the mailboxes on each and every small client laptop SSD. :(
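The age-based cleanup requested here could be sketched as below. This is a generic Python illustration of the idea, not a description of anything Thunderbird or cache2 currently does; `expire_stale` and the in-memory `last_access` map are hypothetical.

```python
import time
from typing import Dict, List, Optional


def expire_stale(last_access: Dict[str, float], max_age_days: float,
                 now: Optional[float] = None) -> List[str]:
    """Drop entries not accessed within max_age_days; return the dropped keys."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = [key for key, ts in last_access.items() if ts < cutoff]
    for key in stale:
        del last_access[key]  # a real cache would also delete the file here
    return stale
```

Run periodically (or at startup), a pass like this would keep recently used attachments local while letting old ones fall back to the server copy.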

b5

Comment 6

2 years ago

Mixing the removal of two different functions (IMAP "parts on demand", POP3: "Fetch headers only") into the one bug appears odd. "Fetch headers only" is used by various users since complaints are received each time it breaks (bug 1783552, bug 1763974, bug 1708073, bug 1503395, bug 1446679, etc.). Removing a feature used by some users will likely cause similar problems as the removal of movemail.

sebastian

Comment 7

2 years ago

(In reply to sebastian from comment #5)

An additional idea from an architectural point of view: In a typical client-server architecture a client shouldn't have to mirror the entire server database to its local storage but only request the data needed over a network.
Caching concepts like in Squid Proxy often include some idea of house-keeping and cache size limiting. Only the most recent or most frequently used entries are kept in the cache. Contents not accessed for a while are dropped if the cache runs against some limit.
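The housekeeping idea described here, keeping only recently used entries within a size limit, is essentially a least-recently-used cache bounded by bytes. A minimal Python sketch, with the hypothetical class name `ByteLimitedLRU`, and not modeled on Squid's or cache2's actual replacement policy:

```python
from collections import OrderedDict
from typing import Optional


class ByteLimitedLRU:
    """Keeps entries up to a byte budget, evicting least recently used first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> bool:
        if len(data) > self.capacity:
            return False  # can never fit; the caller must handle it elsewhere
        if key in self._entries:
            self.used -= len(self._entries.pop(key))
        while self.used + len(data) > self.capacity:
            _, evicted = self._entries.popitem(last=False)  # oldest entry
            self.used -= len(evicted)
        self._entries[key] = data
        self.used += len(data)
        return True

    def __contains__(self, key: str) -> bool:
        return key in self._entries
```

With a fixed byte budget, disk usage stays bounded no matter how large the mailbox on the server grows.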

Richard Leger

Comment 8

2 years ago

(In reply to sebastian from comment #5)

We use Thunderbird with an IMAP mail server.
For our use case it's important that Thunderbird does not store all the attachments on the hard disk permanently!
We have mailboxes that receive many emails with large attachments every day (up to 70 MB per single mail). We have to keep these emails on the IMAP server / in the mailbox because we often have to look up things later.
So we disabled the local storage / caching of the emails in the Thunderbird settings menu.
Otherwise, our notebooks run out of SSD disk space (max. 256 GB SSD space overall, some only 128 GB).
This happened several times when the local mail folder caching was re-enabled by some software update or bad luck. This rendered the laptops unusable with 0 bytes free disk space on /.
We'd appreciate a disk caching mechanism that deletes local cached attachment copies after they haven't been accessed for a few days. This might be a compromise between no local copies and caching everything from back years ago.
But we just cannot afford to keep full copies of the mailboxes on each and every small client laptop SSD. :(

I agree with Sebastian: when using IMAP in online mode (no cache), attachments should not necessarily be loaded and cached locally unless requested by the end user, e.g. through an open or save action...
If the user decides to cache locally for a period of time, then cached attachments should be removed after that period of time... I suppose that is what it should do already, but I have not checked.
I also noticed in case of big attachment, you cannot read the email content (body) while the attachment is loading which is not very convenient... but maybe it is linked to this part of the code you are trying to disable...
Will try the build and report back...

gene smith

Comment 9

2 years ago

(In reply to sebastian from comment #5)

We use Thunderbird with an IMAP mail server.
For our use case it's important that Thunderbird does not store all the attachments on the hard disk permanently!
We have mailboxes that receive many emails with large attachments every day (up to 70 MB per single mail). We have to keep these emails on the IMAP server / in the mailbox because we often have to look up things later.
So we disabled the local storage / caching of the emails in the Thunderbird settings menu.

Did you have to increase the "browser.cache.memory.----" options to be able to open the 70MB messages?

Otherwise, our notebooks run out of SSD disk space (max. 256 GB SSD space overall, some only 128 GB).
This happened several times when the local mail folder caching was reenabled by some software update or bad luck. This rendered the laptops unusable with 0 bytes free disk space on /.

I don't think this can get switched on by an update -- at least I've never seen it occur. But if you do a new install and/or create a new profile, storing ALL messages locally is the default.

We'd appreciate a disk caching mechanism that deletes local cached attachment copies after they haven't been accessed for a few days. This might be a compromise between no local copies and caching everything from back years ago.
But we just cannot afford to keep full copies of the mailboxes on each and every small client laptop SSD. :(

The disk cache I'm proposing only stores up to the default limit of 1000M or 1G and it ejects from cache older entries when space is needed for new messages. It uses mozilla/firefox's caching code (called cache2) so with your 128G SSDs you should be OK. I tested with a laptop with 25G SSD (with less than half of that actually free) and it worked fine on it.

But I'm not sure what cache2's policy is for deleting entries at shutdown or for expiring entries not accessed after some time. Often I see them deleted after TB shuts down but sometimes they aren't deleted. There is a "Clear Cache" button that deletes all the cache entry files under General Settings which does delete them all when clicked. I think maybe another Settings option would be needed so that all disk cache entries are deleted automatically and completely at shutdown. I think firefox has this option.

gene smith

Comment 10

2 years ago

(In reply to Richard Leger from comment #8)

I agree with Sebastian: when using IMAP in online mode (no cache), attachments should not necessarily be loaded and cached locally unless requested by the end user, e.g. through an open or save action...

I'm not sure that's what he's saying. Currently, if you run with no offline store (as you do) the messages you open are cached to cache2's memory cache which defaults to 200M RAM bytes. Since "mime_parts_on_demand" is now defaulted as false, the whole message, including any attachments, is put into RAM cache.

If the user decides to cache locally for a period of time, then cached attachments should be removed after that period of time... I suppose that is what it should do already, but I have not checked.

I think when you say "locally cache" you are actually meaning enable use of offline store (mbox or maildir) to store messages. Sometimes, even in the TB code, they call the offline store "cache" even though it's not a formal caching system. So sometimes the terms are not clear.

I also noticed in case of big attachment, you cannot read the email content (body) while the attachment is loading which is not very convenient... but maybe it is linked to this part of the code you are trying to disable...

Yes, I know what you mean. If you use offline store (which you don't) before you open the message it will be stored in mbox or maildir already by autosync. So the whole message just has to be read from disk and rendered to the display which is fast. But without offline store, when a new message is opened that is not yet in cache, it has to be fully downloaded from the server before it can be rendered. But if "mime_parts_on_demand" is set true with the current code, the attachment download would often be deferred until it is opened and only the message body would have to arrive in cache so the body part appears quicker.
Possibly autosync could be extended to help with this (currently autosync only works when offline storage is enabled).

Will try the build and report back...

Thanks. Let me know how it goes. FWIW, I tried the debug build for linux and it crashed on my small laptop. But the optimized linux try build ran OK. So I would recommend using the optimized win64 try build.

Anje

Comment 11

2 years ago

The temp file will only ever contain a single (too big for cache) message opened by the user and downloaded. If the user accesses another message the temp file will be deleted and the new message will be cached (if it fits) or saved again to temp (if it won't fit in cache). If the user returns to the original huge message, it will be re-downloaded and stored to the temp file again.
So as long as the user has enough free disk space to hold their largest existing message, the previous problems should be solved. But the need to resort to a temp file should be rare.

I see 'all' attachments get stored in UserName/Appdata/Local/Temp where 'pid-xxxx' folders are created to hold those opened attachments.
Size is irrelevant. Some are quite small eg: 12 KB, 120 KB
Those mp4/image/txt/html/pdf/docx etc. files in the 'pid-xxxx' folders are not deleted, nor are the 'pid-xxxx' folders. Over time, this can amount to a lot of disk space being used, which many people will not be aware of. Manual deletion is required.

I also noticed in case of big attachment, you cannot read the email content (body) while the attachment is loading which is not very convenient...

Would Anti-Virus product scanning incoming mail contribute (or be the cause) to this problem?

gene smith

Comment 12

2 years ago

(In reply to Anje from comment #11)

I see 'all' attachments get stored in UserName/Appdata/Local/Temp where 'pid-xxxx' folders are created to hold those opened attachments.
Size is irrelevant. Some are quite small eg: 12 KB, 120 KB
Those mp4/image/txt/html/pdf/docx etc. files in the 'pid-xxxx' folders are not deleted, nor are the 'pid-xxxx' folders. Over time, this can amount to a lot of disk space being used, which many people will not be aware of. Manual deletion is required.

I'm not sure if you are running the try build, but it appears that these "temp/pid-*" directories come from here: bug 1753242. I notice them too while testing my patch for this bug and didn't know what they were and thought maybe I was causing them, but apparently not.
Yes, they don't seem to get deleted and look like they keep growing. Some even have the same huge attachment file in them multiple times since my testing requires opening huge attachments a lot.

I also noticed in case of big attachment, you cannot read the email content (body) while the attachment is loading which is not very convenient...

Would Anti-Virus product scanning incoming mail contribute (or be the cause) to this problem?

I doubt it. On linux I have no AV running and the message body doesn't appear (even if it's just a few words) until the whole message, including huge attachment(s) is completely downloaded to cache.

As I mentioned above,

Possibly autosync could be extended to help with this (currently autosync only works when offline storage is enabled).

I made some changes to autosync and mostly got this working (with one issue I still need to resolve). It will help when opening newly arrived messages since they will be downloaded to cache in the background on receipt and can be displayed from cache quickly. But if you go back and open an older message, e.g., from long ago that is no longer in cache, it will still have to be downloaded from the server to cache and then read back from cache for the message to appear.

sebastian

Comment 13

2 years ago

(In reply to gene smith from comment #12)

Would Anti-Virus product scanning incoming mail contribute (or be the cause) to this problem?

I doubt it. On linux I have no AV running and the message body doesn't appear (even if it's just a few words) until the whole message, including huge attachment(s) is completely downloaded to cache.

Gene is right. I can confirm this problem for all our Linux clients running Thunderbird.

(In reply to gene smith from comment #9)

Did you have to increase the "browser.cache.memory.----" options to be able to open the 70MB messages?

Yes!

I had to set browser.cache.disk.max_entry_size, browser.cache.memory.max_entry_size and browser.cache.memory.capacity to 999999999 to make it work.
Otherwise, larger attachments can neither be opened in Thunderbird nor saved to the file system. If you try to save an attachment, TB will only write a corrupt file of a few bytes.

(In reply to gene smith from comment #9)

The disk cache I'm proposing only stores up to the default limit of 1000M or 1G and it ejects from cache older entries when space is needed for new messages. It uses mozilla/firefox's caching code (called cache2) so with your 128G SSDs you should be OK. I tested with a laptop with 25G SSD (with less than half of that actually free) and it worked fine on it.

Gene, this would be fantastic! <3

ISHIKAWA, Chiaki

Comment 14

2 years ago

I think "fetch headers only" for POP3 mail can be removed.
I have never felt the need. (If one wants to do so, I think one is better advised to use IMAP.)

But for IMAP, I do see it is the main raison d'etre of IMAP protocol.

BTW, I have never used IMAP with TB simply because a dozen or so years ago, IMAP servers were so full of bugs.
I could not lose a single message because I was using it for office work with external partners.
(I know the Internet is based on "best effort" and e-mails do get lost from time to time, but still, software should not lose e-mail messages due to bugs.)
Time flies, and Google et al. now serve many users via IMAP.

POP3 code should be kept as simple and straightforward as possible for ease of maintenance.

Just my two cents worth.

Ben Bucksch (:BenB)

Comment 15

2 years ago

First, please do not mix the conversation about POP3 "headers only", and IMAP "parts only". @Magnus: Can you please file separate bugs for each of these?

I agree that "POP3 headers only" might not be very useful these days and could possibly be removed.

Magnus wrote in comment 0:

In IMAP there is mime_parts_on_demand, which will fetch e.g. a large attachment only when you open it, but lets you see the email message before doing that.

Indeed, fetching MIME parts on demand via IMAP is a very useful feature. Email can have big attachments. Maybe not you, but some people keep sending pictures and videos or large documents per email. When you have a slow internet connection, it is helpful to first download only the headers and the body (which technically is the first "MIME part" with text) of all the emails, and the attachments (the other "MIME parts") only in a second step, or only when needed.

This is particularly true when downloading many messages at once. E.g. when downloading all new messages over a slow Internet connection (not everybody can get fast Internet, esp. in rural areas).

In IMAP, would you also remove the ability to fetch headers only? What do you do when the user installs Thunderbird, configures his existing email account, and the account has 50 GB of messages? Would Thunderbird have to download all 50 GB, only to have the headers of all messages be displayed in Thunderbird?

There's certainly bugs

For the record, the bug list you cited has nothing to do with the proposal here. You just searched for "mime part", which is merely IMAP's way of saying "attachment" or "email body". You don't want to remove attachments, do you?

Even if there are bugs: We cannot just remove useful features just because we found bugs. Otherwise there'd be nothing left of Thunderbird.

If you remove this feature, this will likely also cause bugs in areas that you didn't consider.

Magnus wrote in comment 0:

adds much complexity we could do without

Can you show that reduction of complexity in a concrete, working patch that doesn't cause other regressions? gene's patch removes only a few hundred lines of code, but also adds a lot of code, so the net removal is fairly little, if any.

Esp. considering the usefulness of the feature.

Chiaki wrote in comment 14:

for IMAP, I do see it is the main raison d'etre of IMAP protocol.

Yes, IMAP was designed so that you download only what you immediately need for display. So that you can fetch only the headers, only the body, and only a specific attachment, and you don't have to download and store everything locally.
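For readers less familiar with the protocol, the selective fetching described here maps onto standard IMAP4rev1 FETCH data items (RFC 3501). The Python sketch below only builds the command strings; `fetch_command` and its `what` keywords are hypothetical helpers for illustration, and nothing here claims these are the exact commands Thunderbird sends.

```python
from typing import Optional


def fetch_command(tag: str, uid: int, what: str, part: Optional[str] = None) -> str:
    """Build a UID FETCH command for one message using RFC 3501 data items."""
    items = {
        "size": "RFC822.SIZE",           # message size only
        "headers": "BODY.PEEK[HEADER]",  # headers only, without setting \Seen
        "structure": "BODYSTRUCTURE",    # MIME tree, no content
        "whole": "BODY.PEEK[]",          # the complete message
    }
    if what == "part":
        if not part:
            raise ValueError("a MIME part number such as '2' or '1.2' is required")
        item = f"BODY.PEEK[{part}]"      # one specific MIME part
    else:
        item = items[what]
    return f"{tag} UID FETCH {uid} ({item})"
```

For example, `fetch_command("A1", 42, "headers")` yields a headers-only request, while `fetch_command("A2", 42, "part", "2")` asks for just the second MIME part, typically the first attachment in a simple multipart message.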

This proposal here is basically reducing IMAP into a POP3 with server-side backup. Before we do that, that should have massive gains for the code. I cannot see that right now.

Mihovil Stanic [:Mikeyy - L10n HR]

Comment 16

2 years ago

(In reply to Richard Leger from comment #8)

I also noticed in case of big attachment, you cannot read the email content (body) while the attachment is loading which is not very convenient... but maybe it is linked to this part of the code you are trying to disable...

Just wanted to highlight this as it's a real issue.

(In reply to Ben from comment #15)

This proposal here is basically reducing IMAP into a POP3 with server-side backup. Before we do that, that should have massive gains for the code. I cannot see that right now.

Agreed. An important part of IMAP and TB is the ability to download everything except attachments. The only offline folder I have is INBOX, in which I keep this year's emails. In my opinion the whole archive should have a header and body downloaded locally, but attachments kept on the server and downloaded when required.

sebastian

Comment 17

2 years ago

(In reply to Ben Bucksch (:BenB) from comment #15)

Indeed, fetching MIME parts on demand via IMAP is a very useful feature. Email can have big attachments. Maybe not you, but some people keep sending pictures and videos or large documents per email. When you have a slow internet connection, it is helpful to first download only the headers and the body (which technically is the first "MIME part" with text) of all the emails, and the attachments (the other "MIME parts") only in a second step, or only when needed.

Yes! +1

(In reply to Ben Bucksch (:BenB) from comment #15)

Chiaki wrote in comment 14:

for IMAP, I do see it is the main raison d'etre of IMAP protocol.

Yes, IMAP was designed so that you download only what you immediately need for display. So that you can fetch only the headers, only the body, and only a specific attachment, and you don't have to download and store everything locally.

This proposal here is basically reducing IMAP into a POP3 with server-side backup. Before we do that, that should have massive gains for the code. I cannot see that right now.

Full ACK

(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #16)

An important part of IMAP and TB is the ability to download everything except attachments. [...] In my opinion the whole archive should have a header and body downloaded locally, but attachments kept on the server and downloaded when required.

I totally agree.

gene smith

Comment 18

2 years ago

Preface: Most of what I wrote below was before I saw comment 14 to comment 17 above. So my efforts so far have just been to implement the "always fetch it all" feature for imap as requested by Magnus.

I've changed my mind about using a "temp" file to store the huge and probably rare messages that won't fit in a cache entry. There are some issues with using a temp file such as:

supporting multiple users running tb on the same system
a single user running multiple tb instances
a user opening the huge message into two or more windows
the "temp" partition may be a fairly small RAM disk on some systems
as Anje noticed, some systems may not auto-clean old files from temp dir
and, of course, it adds a bit more complexity to the code.

Without a temp file, if the message won't fit in cache, it is just accessed via imap fetch and streamed without storing it anywhere. I don't know exactly how all the mime part processing works but apparently the whole message, if not first streamed to cache, goes directly to "libmime", which extracts parts as needed for inline display and to display the "links" to the various attachments. Without cache or offline store, when an attachment is opened or saved, the whole message has to be fetched again to extract the attachment (mime part). Without the temp file, the attachment access time will depend on network and server speed since there is no cache or temp file to re-stream the whole message from, which obviously would be much faster.

With or without a temp file, initial access and display of the too-big-to-cache email will be the same since the whole message will have to be fetched. The difference is that, without a temp file, accessing the attachments after the download will be slower since the whole message has to be fetched again to access each individual part. With a temp file, accessing individual attachments just streams the temp file to libmime, which is faster than getting the whole message again from the server.
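A rough way to quantify that trade-off is to count the bytes pulled from the server while the user stays on one oversized message. The function below is a back-of-the-envelope Python model with hypothetical names; it ignores protocol overhead and assumes every attachment open without a temp file re-fetches the entire message, as described above.

```python
def bytes_from_server(message_size: int, attachment_opens: int,
                      use_temp_file: bool) -> int:
    """Total bytes downloaded for one oversized message and its attachments."""
    initial = message_size  # the first display always fetches the whole message
    if use_temp_file:
        return initial      # later opens re-stream the local temp file
    # No local copy: each attachment open fetches the whole message again.
    return initial + attachment_opens * message_size
```

For a 70 MB message whose attachments are opened three times, this gives 70 MB with a temp file versus 280 MB without one.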

The default setting for disk cache is browser.cache.disk.max_entry_size = 51200 or 51.2M. This is plenty big to store your average huge message. But if this is defaulted in TB to -1, it will usually allow for an even larger cache entry that is 1/8 of browser.cache.disk.capacity. On the system I'm running now this would increase the effective max entry size from 51.2M to 128M (1/8th of 1G). So with a default setting of -1 for max entry size, being unable to fit to cache would be less likely and decrease the need to save to a temp file.
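The arithmetic in this paragraph can be written out as a short Python sketch. It assumes both prefs are expressed in KB (so 51200 is the "51.2M" above and a 1G capacity is 1048576) and that a negative max_entry_size means "no explicit limit"; `effective_max_entry_kb` is a hypothetical helper, not a cache2 function.

```python
def effective_max_entry_kb(max_entry_size_kb: int, capacity_kb: int) -> int:
    """Largest entry the disk cache would accept, in KB."""
    ceiling = capacity_kb // 8     # an entry is limited to 1/8 of capacity
    if max_entry_size_kb < 0:      # -1: only the 1/8 rule applies
        return ceiling
    return min(max_entry_size_kb, ceiling)
```

With a 1G capacity, the default pref yields 51200 KB, while -1 yields 131072 KB, i.e. the 128M mentioned above.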

Also, as I probably described before, the temp file I have proposed is only helpful during one access to the oversized message and its attachments. If you open another message and then come back and open the oversized message again, it has to be re-downloaded and re-stored to the temp file. This was to keep it simple and only ever have zero or one temp file in existence at any time.

I also found a way to include the cache2 validity check that I said I was going to leave out. It's there now. So using disk cache now doesn't lose this validation feature that was thought to only work with memory cache. I'll attach my WIP next.

gene smith

Comment 19

2 years ago

Attached patch no-fetch-parts-v2.diff (deleted)

Attachment #9303236 - Attachment is obsolete: true

gene smith

Comment 20

2 years ago

(In reply to ISHIKAWA, Chiaki from comment #14)

I think "fetch headers only" for POP3 mail can be removed.
I have never felt the need. (If one wants to do so, I think one is better advised to use IMAP.)

But for IMAP, I do see it is the main raison d'etre of IMAP protocol.

What is proposed by Magnus and what I have implemented so far for this bug for imap doesn't prevent the user from just fetching and storing message headers and not storing complete messages locally.

(In reply to Ben Bucksch (:BenB) from comment #15).

In IMAP, would you also remove the ability to fetch headers only? What do you do when the user installs Thunderbird, configures his existing email account, and the account has 50 GB of messages? Would Thunderbird have to download all 50 GB, only to have the headers of all messages be displayed in Thunderbird?

What is proposed here and what I have implemented so far (on this bug report) for imap wouldn't require this. However, what you describe IS the default TB behavior in that all messages are fully downloaded and saved to offline store files (mbox or maildir). To fetch headers only you would need to disable this default behavior before TB's first access to the imap server. The proposed changes only affect when we are not saving to offline store and don't prevent initial download of only headers.

Can you show that reduction of complexity in a concrete, working patch that doesn't cause other regressions? gene's patch removes only a few hundred lines of code, but also adds a lot of code, so the net removal is fairly little, if any.

I haven't counted the lines but I'm pretty sure I've removed a LOT more than I added, e.g., nsImapBodyStructure.cpp/.h gone, nsImapServerResponseParser.cpp removed all the bodystructure parsing stuff. No new files but changes mostly in nsImapProtocol.cpp.

Esp. considering the usefulness of the feature.

Do we know if anyone is actually using the mime_parts_on_demand feature considering that the last few ESRs have it disabled by default? Has anyone complained about it being off by default?

Chiaki wrote in comment 14:

for IMAP, I do see it is the main raison d'etre of IMAP protocol.

Yes, IMAP was designed so that you download only what you immediately need for display. So that you can fetch only the headers, only the body, and only a specific attachment, and you don't have to download and store everything locally.

Just to be clear (and repeat myself) to do this with TB you have to disable the default behavior to download and store everything to mbox/maildir files. Then you have to set mime_parts_on_demand back to true from default of false and also probably reduce the threshold for on demand fetches. Also, I have run into bugs with fetch on demand in that often when you want only to fetch a single attachment the whole message gets downloaded anyhow and often more than one time. So when I mention this bug usually Magnus comments (paraphrasing) that fetch on demand is a not really useful feature since web pages are usually much bigger than emails...

This proposal here is basically reducing IMAP into a POP3 with server-side backup. Before we do that, that should have massive gains for the code. I cannot see that right now.

I think another reason for imap (probably the biggest compared to POP3) is that you can have multiple clients using the same account and this wouldn't affect that. Not sure what you mean by "massive gains".

(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #16)

Agreed. An important part of IMAP and TB is the ability to download everything except attachments. The only offline folder I have is INBOX, in which I keep this year's emails. In my opinion whole archive should have a header and body downloaded locally, but attachments kept on the server and downloaded when required.

Have you changed the default setting (in config editor) for mail.server.default.mime_parts_on_demand from the default of false? If it is still at false this bug won't affect you. You can still have INBOX with offline store and everything else not have offline store and my proposed changes won't affect that.

gene smith

Comment 21

•

2 years ago

I wrote in comment 20:

I have run into bugs with fetch on demand in that often when you want only to fetch a single attachment the whole message gets downloaded anyhow and often more than one time. So when I mention this bug usually Magnus comments (paraphrasing) that fetch on demand is a not really useful feature since web pages are usually much bigger than emails...

Here's some direct quotes from Magnus:
bug 1675914 comment 1
bug 1675914 comment 9
bug 1629292 comment 2

Ben Bucksch (:BenB)

Comment 22

•

2 years ago

Thunderbird has code to fetch only the headers and display these first, and then download the full message afterwards. This is extremely useful, in a number of cases:

For slow Internet connections.
For huge mailboxes with 100 GB mails.
For roaming users with flexible desks within the office (esp. after Corona),
and many other situations.

If Thunderbird is not doing that (by default), then that's a bug. Yes, it makes sense to sync all the 100 GB - if possible. But that's going to take ages, and in the meantime, you want to read mail. That's why we have this. That's what IMAP was made for.

usually Magnus comments ... that fetch on demand is a not really useful feature

Well, that's Magnus' personal opinion. I would think that quite a number of users in rural areas would disagree. And those with 100 GB mailboxes. And those with roaming. And...

Suggesting WONTFIX for this proposal.

gene smith

Comment 23

•

2 years ago

(In reply to Ben Bucksch (:BenB) from comment #22)

Thunderbird has code to fetch only the headers and display these first, and then download the full message afterwards. This is extremely useful, in a number of cases:

This doesn't get rid of "fetch only the headers and display these first". It will still first fetch header so you will still be able to see the subject, from, date etc (the summary line) without downloading the whole message. What this proposal does is later fetch the full message when you open it and saves it to disk cache (not to offline store).
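As a rough illustration of the two steps (summary data first, full message only on open), the IMAP FETCH items could be built as below. The item names are standard IMAP (RFC 3501), but the helper names and the exact choice of header fields are assumptions for this sketch, not what TB actually sends.

```python
from typing import Sequence


def summary_fetch_items(fields: Sequence[str] = ("SUBJECT", "FROM", "DATE")) -> str:
    """FETCH items for the summary line: size plus a few headers only.

    BODY.PEEK avoids setting the \\Seen flag on the server.
    """
    return "(RFC822.SIZE BODY.PEEK[HEADER.FIELDS (%s)])" % " ".join(fields)


def full_fetch_items() -> str:
    """FETCH items for the whole message, requested when it is opened."""
    return "(BODY.PEEK[])"


# With Python's imaplib these strings could be used like this
# (needs a logged-in connection `conn` with a selected folder):
#   conn.uid("FETCH", "1:*", summary_fetch_items())
#   conn.uid("FETCH", "42", full_fetch_items())
```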

For slow Internet connections.

Yes, if you have a slow connection and the message is huge due to attachments (as indicated by the optional "size" column on the summary line) it may take a while to download the whole thing. In that case you can abort the download by selecting another small message (this is called pseudo-interrupt in the imap code) then open the huge message in its own tab or window and do other stuff in TB while the big message loads.
Anyhow, slow connections are an issue with this, but users with a slow connection will probably not want to always access emails from the server, and will instead use TB's default of letting autosync download and store them locally in mbox/maildir files in the background.

For huge mailboxes with 100 GB mails.

This amount of bytes in the mailbox (folder) shouldn't matter for this since this doesn't cause you to download all of it. In fact, this change is only relevant for when you are not using offline store which, by default, downloads the full 100G mailbox.
If you are referring to individual 100 GB messages, yes that would be a problem, but I don't think any servers allow individual emails that big (maybe 100M max, not sure.)

For roaming users with flexible desks within the office (esp. after Corona),

Not sure I understand how this affects your network speed compared to being at a fixed desk. (Unless maybe you mean flexible/floppy disks with minimal storage for huge downloads? :))

and many other situations.

If Thunderbird is not doing that (by default), then that's a bug. Yes, it makes sense to sync all the 100 GB - if possible. But that's going to take ages, and in the meantime, you want to read mail. That's why we have this. That's what IMAP was made for.

Again, this proposal isn't for sync/download/storing locally all 100 GB mailboxes. It only has an effect when you click-on/open an individual message.

usually Magnus comments ... that fetch on demand is a not really useful feature

Well, that's Magnus' personal opinion. I would think that quite a number of users in rural areas would disagree. And those with 100 GB mailboxes. And those with roaming. And...

I was just gathering together previous discussion of the issue. (I didn't always agree with his comments.)

Mihovil Stanic [:Mikeyy - L10n HR]

Comment 24

•

2 years ago

(In reply to Ben Bucksch (:BenB) from comment #22)

If Thunderbird is not doing that (by default), then that's a bug. Yes, it makes sense to sync all the 100 GB - if possible. But that's going to take ages, and in the meantime, you want to read mail. That's why we have this. That's what IMAP was made for.

I'm a 100GB mailbox user and just installed fresh TB on a new laptop over the weekend.
TB default behavior is to mark all folders for offline use. The positive side is, TB won't start downloading messages inside the folder until you click on it (folder).
Since I'm using TB for a long time, I know this, so the first thing I do after connecting a new account is to go to setting and remove the "offline" check mark from all folders except INBOX and Sent. Those are folders I use every day and I want emails there to be fast to access.

For me personally best default TB behavior should be to download HEADERS and BODY of all emails in all accounts, index them and use them for search and quick filters. Just don't download attachments.

Ben Bucksch (:BenB)

Comment 25

•

2 years ago

TB default behavior is to mark all folders for offline use

Just to clarify: Yes, "offline use" will ask TB to download all messages and store them locally. But that can happen as a background task and doesn't need to block the user from using the mailbox in an on-demand way. Indeed, I just tested, and Thunderbird does show the list (headers) of emails before it has downloaded all messages. When I click on an email which has not been downloaded yet, it downloads and displays, independent from the background download of all messages. I can see that it takes a tiny moment to display. The second time I click on the email, it displays instantaneously. This is true even before all emails are downloaded, immediately after setting up the account.

For example, what we do in Owl (different account type, not IMAP) is: First get the headers, so that at least the list of emails can show. Then download in full all emails with less than 50 KB or so, then eventually download the large emails in full. Additionally, as soon as somebody clicks on a message, if it isn't downloaded already, get the body (HTML and plaintext) first, to display the text quickly, then get the large attachments. This allows the user to very quickly see all emails, and even read them, but still eventually all emails will be downloaded in full (if so configured). This strategy is necessary for large inboxes with 100 GB. All this works in Thunderbird. This same strategy would be a sensible approach for IMAP as well. The Thunderbird IMAP implementation has the code to do that all. That's exactly how IMAP is designed and supposed to be used.
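The ordering described above can be sketched like this. It is not Owl's code: the names are hypothetical, and the 50 KB limit is just the "50 KB or so" figure from the comment.

```python
from dataclasses import dataclass
from typing import List

SMALL_LIMIT = 50 * 1024  # "50 KB or so", taken from the comment above


@dataclass
class Msg:
    uid: int
    size: int  # bytes, known once the headers have been fetched


def background_order(msgs: List[Msg], small_limit: int = SMALL_LIMIT) -> List[int]:
    """After headers are listed: small messages first, large ones last."""
    small = [m.uid for m in msgs if m.size < small_limit]
    large = [m.uid for m in msgs if m.size >= small_limit]
    return small + large


def on_click_steps(already_downloaded: bool) -> List[str]:
    """What to fetch when the user clicks a message."""
    if already_downloaded:
        return []
    # Text first so it displays quickly, then the large attachments.
    return ["body (HTML and plaintext)", "attachments"]
```

For example, with messages of 10 MB, 1 KB, 60 KB and 2 KB (uids 1 to 4), background_order() puts uids 2 and 4 ahead of 1 and 3.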

This is not a feature to remove, but to improve. It makes a big difference for users who set up a new machine with Thunderbird, who change computers in flexible offices with roaming profiles, for people with slow internet, and many other use cases.

gene smith

Comment 26

•

2 years ago

This is not a feature to remove, but to improve.

As I've tried to explain several times, I'm not proposing removing the feature of how messages are downloaded and saved locally when the user selects the (default) setting of using offline store. My possible and proposed change only affects folders without offline store.
Does the "Owl" addon allow you to configure folders where the full or partial messages are never permanently stored locally but only downloaded and not permanently saved when the user clicks on the message?

Anyhow, I think maybe if I can fix this bug 1675914 then most of my proposed changes won't be needed.
That bug seems to be caused by the OpenPGP need to see the full message and not just the individual parts even when encryption is not being used.

gene smith

Updated

•

2 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1675914

Thilo [:ThiloteE]

Comment 27

•

2 years ago

fetch headers only also

lets users avoid downloading vast amounts of SPAM-mails to their local device.
is some form of development assistance: In some parts of the world, internet flatrates are not (yet!?) common. See for example Ghana: https://blog.meqasa.com/top-internet-service-providers-in-ghana/ Depending on usage, users can minimize expenditure using this feature.

I personally don't use this feature anymore, but acknowledge its great uses to some users at the present time. I naively would expect prices for internet traffic in relation to amount of data transferred to drop further in the coming decade (looking at you "widespread rollout of fiber networks"), which might remove at least some of the incentives for using this feature. On another note, prices for hardware storage are dropping as well. Terabyte SSDs increasingly become the norm at my place and HDDs obviously have become dirt cheap.

Matt

Comment 28

•

2 years ago

(In reply to Ben Bucksch (:BenB) from comment #15)

I agree that "POP3 headers only" might not be very useful these days and could possibly be removed.

Magnus wrote in comment 0:

In IMAP there is mime_parts_on_demand, which will fetch e.g. a large attachment only when you open it, but lets you see the email message before doing that.

Indeed, fetching MIME parts on demand via IMAP is a very useful feature. Email can have big attachments. Maybe not you, but some people keep sending pictures and videos or large documents per email. When you have a slow internet connection, it is helpful to first download only the headers and the body (which technically is the first "MIME part" with text) of all the emails, and the attachments (the other "MIME parts") only in a second step, or only when needed.

And as close to that as POP can get is headers only. Otherwise, you wait for the entire email and everything else on the server to be downloaded to display even the header in a list. While I do not see a lot of usage for pop headers only, mostly because folk generally do not fiddle with such settings out of ignorance. The same can be said about the IMAP parts on demand. Both are exceedingly useful features, especially where bandwidth is an issue, failure in takeup is perhaps more a reflection on feature discoverability than of code.

Please can we not relegate POP to the status of second class citizen, just because folk on Bugzilla are IMAP users. I am seeing a cottage industry in support forums of folk trying to make IMAP into POP using filters, or expecting IMAP accounts to be immutable archives of local mail. Changes like these where data is routinely not stored in the local profile permanently are not what the average user expects to occur.

gene smith

Comment 29

•

2 years ago

(In reply to Matt from comment #28)

And as close to that as POP can get is headers only. Otherwise, you wait for the entire email and everything else on the server to be downloaded to display even the header in a list. While I do not see a lot of usage for pop headers only, mostly because folk generally do not fiddle with such settings out of ignorance.

The same can be said about the IMAP parts on demand. Both are exceedingly useful features, especially where bandwidth is an issue, failure in takeup is perhaps more a reflection on feature discoverability than of code.

Actually, for IMAP, downloading the attachment(s) on demand has not really worked since ESR78 when openpgp encryption feature was added (see bug 1675914). What happens now is when you open an imap message with attachments (even without encryption enabled) and with parts on demand enabled and the total msg size is more than 500K, only the top level body part gets fetched first as it should and then immediately the entire message (everything) gets downloaded. So if no one has complained about excessive bandwidth usage since ESR78 it doesn't seem to be a big problem.

Please can we not relegate POP to the status of second class citizen, just because folk on Bugzilla are IMAP users. I am seeing a cottage industry in support forums of folk trying to make IMAP into POP using filters, or expecting IMAP accounts to be immutable archives of local mail. Changes like these where data is routinely not stored in the local profile permanently are not what the average user expects to occur.

Not sure I understand your point here, at least regarding imap. When you say "Changes like these where data is routinely not stored in the local profile permanently are not what the average user expects to occur", it sounds like you are also thinking that my proposed change will prevent a user from storing full IMAP messages locally to mbox or maildir. It won't. My proposed change (attached above) only affects the (small?) subset of users that choose not to use offline store for imap.

Just to maybe clarify, for imap when offline store is enabled (the default) the whole message is always fetched (and stored in profile: mbox or maildir) on first access and individual parts or attachments are never fetched, downloaded or stored.
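To summarize the current behavior described in this comment, here is a small decision sketch. The function is hypothetical; the 500K threshold and the "top-level part, then immediately everything" sequence are taken from the description above (see bug 1675914), not from the imap code itself.

```python
from typing import List

PARTS_THRESHOLD = 500 * 1024  # "more than 500K", per the description above


def fetch_plan(offline_store: bool, parts_on_demand: bool, size_bytes: int,
               threshold: int = PARTS_THRESHOLD) -> List[str]:
    """What gets fetched when a message is first opened (illustrative only)."""
    if offline_store:
        # Default: whole message fetched and stored to mbox/maildir;
        # individual parts are never fetched.
        return ["whole message"]
    if parts_on_demand and size_bytes > threshold:
        # Since ESR78 the full message follows immediately (bug 1675914).
        return ["top-level body part", "whole message"]
    return ["whole message"]
```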

Thomas D. (:thomas8)

Comment 30

•

2 years ago

IIUC from Gene's Bug 1805186 Comment 8, Bug 1805186 occurs with mail.server.default.mime_parts_on_demand = false.

A quick work-around for this [Bug 1805186] is to just set mail.server.default.mime_parts_on_demand to true.

Bug 1805186 is pretty cunning and confusing, so we'd probably want to ensure that's fixed before even considering to remove that pref here (which would then default to false).

Depends on: 1805186

Summary: consider dropping support for various modes of having incomplete messages (mime_parts_on_demand, fetch headers only pop3) → Consider dropping support for various modes of having incomplete messages (IMAP: mime_parts_on_demand; POP3: fetch headers only)

Richard Leger

Comment 31

•

2 years ago

(In reply to gene smith from comment #4)

Created attachment 9303236 [details] [diff] [review]
no-fetch-parts-v0.diff

I've tested this with a "modern" laptop (1T ssd) and on a very old and weak laptop with a 25G disk (65% full) and the disk caching and temp file (described in previous comment) work great with very large messages; tested up to 117M which is much larger than most mail servers even allow.
Testing in real world by a user using no offline storage would be useful. Richard Leger comes to mind :).
Here's the link again to the try build: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=b7856c1a52137521eb932fe9d3102d3001782847

As per try build https://bugzilla.mozilla.org/show_bug.cgi?id=1805186#c20 (assuming it includes improvements related to this bug) no major issue reported (as per Bug 1805186 Comment 31). Though I cannot test with large messages as my system is limited to 50M message size.

Flags: needinfo?(richard.leger)

Wayne Mery (:wsmwk)

Updated

•

2 years ago

status-thunderbird_esr102: --- → affected

Component: General → Networking: IMAP

Product: Thunderbird → MailNews Core

Wayne Mery (:wsmwk)

Updated

•

1 year ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=628646

no-fetch-parts-v0.diff 2 years ago gene smith (deleted), patch		Details \| Diff \| Splinter Review
no-fetch-parts-v2.diff 2 years ago gene smith (deleted), patch		Details \| Diff \| Splinter Review