Closed Bug 1580480 Opened 5 years ago Closed 4 years ago

IMAP fetch chunk size is always 65536 bytes

Categories

(MailNews Core :: Networking: IMAP, defect)

defect
Not set
normal

Tracking

(thunderbird_esr68 affected, thunderbird_esr78+ fixed, thunderbird78 wontfix, thunderbird80 fixed)

RESOLVED FIXED
Thunderbird 79.0
Tracking Status
thunderbird_esr68 --- affected
thunderbird_esr78 + fixed
thunderbird78 --- wontfix
thunderbird80 --- fixed

People

(Reporter: mozilla.org, Assigned: longsonr)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, regression, Whiteboard: [TM:78.2.0][regression:TB17][requires bug 1654995])

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36

Steps to reproduce:

Thunderbird should adjust IMAP chunk size based on chunk download speed. I see from logs that it always downloads all messages in 65536 byte chunks.

If i grep m_endTime from source code then I do not see that this variable is set.

nsImapProtocol::AdjustChunkSize

Actual results:

IMAP chunk is always 65k

Expected results:

IMAP chunk should increase with good internet connection to download large attachments faster.

IIRC it should be adjusting chunk size dynamically based in part on these base config settings
mail.imap.chunk_size
mail.imap.min_chunk_size_threshold

Described at http://kb.mozillazine.org/Mail_and_news_settings and (last?) implemented at Bug 425849 - Consider changing mail.imap.fetch_by_chunks pref value

Flags: needinfo?(gds)

Yes it should, but I do not see where is m_endTime set.
There is only comparison with weird comment :)
As I understand that this is always negative as m_endTime is always 0.

comm/mailnews/imap/src/nsImapProtocol.cpp:
PRTime2Seconds(m_endTime - m_startTime, &deltaInSeconds);
m_trackingTime = false;
if (deltaInSeconds < 0) return; // bogus for some reason

As I understand m_endTime should be set in nsImapProtocol::FetchMessage after fetching chunk. (I am not programmer tho).

I'd say the max size is 64K.

If i grep m_endTime from source code then I do not see that this variable is set.

I didn't even know this feature exists. However, it seem have gotten broken with the patch associated with Bug 773782. The setting of the variable m_endTime to a non-zero value seems to have removed.

Other than the macro change, this code seems to have last been touched 19 years ago with this: Bug 46375. Comments indicate it didn't have a huge effect.

Flags: needinfo?(gds)
Regressed by: 773782

Today, when e-mail is not one text file, but extracted to small pieces (attachments base64decoded, deduplicated, etc) and e-mail is encoded every time client asks for it. Downloading one 20MB email means that whole 20MB is encoded 312 times (64k*312=20MB). This makes unneccessary load and overhead to email server.

I see from code: static int32_t gChunkSize = 250000;
but i dont see from logs that it asks 250000 bytes. It asks whole e-mail at a time or whole e-mail by 64k chunks (I do not understand why it decides to go with 64k, this seems pretty random).

Please, lets try to fix this 19 year old code.

We tried to make workaround by sending more data than Thunderbird asks, but then Thunderbird stops for asking more and e-mail ends up corrupted.

The initialization of chunk size to 25000 is irrelevant; it is actually obtained from the pref
mail.imap.chunk_size which defaults to 65536. So not random.
For now, if you feel that chunking is slowing down email accesses, you can completely turn it off in the config editor (under advanced preferences):
mail.server.default.fetch_by_chunks. Set this to false and restart tb.

Curious how your tb is setup. Are your folders synchronized for offline use? That means the whole email is fetched and stored locally to a file and when accessed again you don't go to the network. Tb defaults to this mode for all folders. With this setup, chunking may only occur when the email is first fetched from the server.

As I understand from source code, chunk size should be dynamically increased, if chunk is downloaded with less than 4seconds?

Isn't this 64k a little outdated? Maybe you can increase this default to 1MB for example?

Problem is not with my personal TB. Problem is with unnecessary load to mail servers.

Curious if you run a server that is getting hammered because of this? Anyhow, the feature to change the chunk size dynamically is there but what looks like an error in the code 7 years ago caused it to not work. I will take a look at it asap.

Assignee: nobody → gds
Status: UNCONFIRMED → NEW
Ever confirmed: true

We use e-mail server where email is stored in mongodb cluster. Every time when client asks for chunk, whole e-mail is downloaded from mongodb cluster.
If 25MB e-mail comes in and client downloads it in 64kb chunks, this means 381 chunks. 381*25MB=9525MB traffic between IMAP and mongodb server for one 25MB email download.
I know, this should be improved from server side, but still. Bigger chunks means faster e-mail download.

Wow, it's really been broken since 2012?

Keywords: perf, regression
Whiteboard: [regression:TB17]
Attached patch patch (obsolete) (deleted) — Splinter Review
Assignee: gds → longsonr
Attachment #9158044 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9158044 [details] [diff] [review] patch Review of attachment 9158044 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me, thx! Would you mind attaching a patch with the normal hg headers? Set up your hg username/email and do an hg export of the patch.
Attachment #9158044 - Flags: review?(mkmelin+mozilla) → review+
Status: NEW → ASSIGNED

Commit message should be "Bug 1580480 - .... <why you did what>. r=mkmelin"

Attached patch 1580480.txt (deleted) — Splinter Review
Attachment #9158044 - Attachment is obsolete: true

Should I just push this to comm-central?

Flags: needinfo?(mkmelin+mozilla)

We try to sync pushes with the m-c merges, so it's preferable if you attach the patch and add the checkin-needed-tb keyword.

Flags: needinfo?(mkmelin+mozilla)
Component: Preferences → Networking: IMAP
OS: Unspecified → All
Product: Thunderbird → MailNews Core
Hardware: Unspecified → All
Target Milestone: --- → Thunderbird 79.0

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/2ee4ad68689a
Fix IMAP fetch chunk size so that it adjusts based on network bandwidth. Reverts an inadvertent change from bug 773782. r=mkmelin DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

(In reply to Pulsebot from comment #19)

Fix IMAP fetch chunk size so that it adjusts based on network bandwidth. Reverts an inadvertent change from bug 773782. r=mkmelin DONTBUILD

RIght, this one: https://hg.mozilla.org/comm-central/rev/31da1d3a11c3#l25.40

Comment on attachment 9158094 [details] [diff] [review] 1580480.txt Patch author or reviewer should remember to request uplift where appropriate. Low risk: Restoring code accidentally removed in 2012, see previous comment.
Attachment #9158094 - Flags: approval-comm-beta?
Comment on attachment 9158094 [details] [diff] [review] 1580480.txt On the face of it the change is certainly trivial and it will be a nice win, but it has been broken for so long that I am concerned about consequences from unbreaking it at the very end of the 78 cycle. So let's have it first go through at least an entire beta - in this case 79 - and then consider it for uplift to 68.11 and 78.1. Setting flags for those now now so we don't forget it. [Approval Request Comment] Regression caused by (bug #): bug 773782 User impact if declined: slow download of large messages Testing completed (on c-c, etc.): TBD Risk to taking this patch (and alternatives if risky): TBD after checking 79 beta crash stats
Flags: needinfo?(vseerror)
Attachment #9158094 - Flags: approval-comm-esr78?
Attachment #9158094 - Flags: approval-comm-esr68?
Attachment #9158094 - Flags: approval-comm-beta?
Attachment #9158094 - Flags: approval-comm-beta-

Gene, possible to devise a test so we don't get bit in the future?

Flags: needinfo?(vseerror) → needinfo?(gds)
Flags: needinfo?(vseerror)

(In reply to Wayne Mery (:wsmwk) from comment #23)

Gene, possible to devise a test so we don't get bit in the future?

Wayne, I hope you mean just an informal local test I can do to verify this fix doesn't break but improves chunking with fast speeds and slow? If so, I have a way to slow down my "lighting fast" wifi/internet to 1990's and/or 3rd world speeds and latency and see how it behaves while doing chunks. If you mean a unit test included in tb source then probably not.

Flags: needinfo?(gds)

Testing this, I see a few things that don't seem quite right to me:

  1. The adjustment occurs for only one possible case where chunking can occur. Seem like it should occur in FetchTryChunking() where calls to FetchMessage() occurs in chunks so all cases are covered. However, where it is covers the most common chunking case.

  2. Chunk size is adjust up by 8192 bytes when the time between chunk fetches is less than 2 seconds. However it only changes every other fetch so you get chunk fetch sequence sizes X, X, X+8192, X+8192, X+2*8192, X+2*8192 ... Seems like it should just increase by 8192 on every fetch instead of every other. (X is the configured chunk size, default 65535.)

  3. As long as the time is less than 2 seconds between fetches, the chunk size will keep increasing. You have to download several large messages before you get to a large enough chunk size that chunking is effectively disabled. Issue 2 also makes it take longer.

  4. At one time there was a limit on maximum chunk size but now there is not. The chunk size is stored as uint32_t so it could go up to 4Gbytes without roll-over which would be a much larger email than any system supports. So I guess this is OK.

  5. When the time between chunk fetches is between 2 and 4 seconds, no adjustment up or down occurs. When the time goes above 4 seconds, the chunk size is decreased by 8192 bytes. To test this you have to slow down your network, both outgoing and especially incoming. I had to slow down incoming to about 8000bps to bring the time to above 4 seconds. Seems to work OK. Minimum chunk size seems to be 2 (actually, 8192) so can't go to zero or underflow.

  6. Adjusted chunk size and threshold (always 1.5 times chunk size) are saved as new prefs when the connection ends or tb shuts down so it is reused as the new prefs on new connections or emails or when tb restarts. But if you want to go back to the default chunk size settings using config editor, the default or user settings gets overwritten by the adjusted values. You may have to disable chunking and then restart tb and set the values back to default or custom values again and then enable chunking for the user settings to take effect.

So maybe there are a few problem. Slow to converge on the optimum larger chunk size due to the linear increase and due to the problem described in item 2. I am curious what would happen if you just double the chunk size when the time is short or half it when the time is long?
Also, resetting back to default or custom values seems like of difficult. But If this really works you probably wouldn't need to do that. Also, maybe a way to disable auto-chunk adjustment and just let the user set their own fixed chunking parameters is needed? Of course, disabling chunking or possibly making that the default is still a possibility since I don't see chunking as that helpful unless the user's connection is really slow, messages are stored only on the server and the messages accessed are often large enough to need chunking.

Comment on attachment 9158094 [details] [diff] [review] 1580480.txt Gene, thanks for digging into the details. Sounds like there is significant room for improvement. Let's revisit uplifting after considering these issues further.
Flags: needinfo?(vseerror)
Attachment #9158094 - Flags: approval-comm-esr78?
Attachment #9158094 - Flags: approval-comm-esr68?

I don't understand the reasoning not to take this simple fix.
Sure, there is always room for improvements elsewhere, but taking this one-liner will just improve things, not make them worse. Further enhancements should be handled in another bugs.

Comment on attachment 9158094 [details] [diff] [review] 1580480.txt [Approval Request Comment] Regression caused by (bug #): bug 773782 many years ago User impact if declined: inefficient chunk fetching Testing completed (on c-c, etc.): yes Risk to taking this patch (and alternatives if risky): I don't see much risk even if it won't fully fix all code paths
Attachment #9158094 - Flags: approval-comm-esr78?

(paraphrasing) I didn't say "no". I said "no, not now". See comment 22 for the exact rationale. There's no way we are going to take code at face value into release which has been totally idle for 8 years after just a few days on nightly or beta. (or into beta for that matter) And although I don't say it in comment 26, the breadth of information in comments 25 gives me further reason be skeptical. In short, I think an assessment of "not much risk" is a bit too bright.

There are some issues but probably not a lot of risk in incorporating the patch above as is.

I made a few corrections and modification to comment 25 in bold. I also did a few experimental changes that seem to improve chunking that I will clone as an enhancement bug for future consideration.

Blocks: 1649553
Attached file chunk-no-m_endTime.diff -- maybe? (deleted) —

Just noticed that the approved patch could eliminate the variable m_endTime with the same functionality. Probably not that important but just thought I'd mention it. With this attached change two files and 3 lines would be affected.

I guess it depends on if there's more use of it in other places later.

79 hasn't hit users until today, so deferring esr approval until 78.1.0

Whiteboard: [regression:TB17] → [TM:78.1.0][regression:TB17]
Comment on attachment 9158094 [details] [diff] [review] 1580480.txt Approved for esr78
Attachment #9158094 - Flags: approval-comm-esr78? → approval-comm-esr78+

Did this cause bug 1654995? If so, let's not put it into TB 78.

Regressions: 1654995

Must not land on TB 78, Wayne, please remove the flag since you withdrew my ability to do it without actually informing me :-(

Flags: needinfo?(vseerror)
Flags: needinfo?(rob)
Comment on attachment 9158094 [details] [diff] [review] 1580480.txt (In reply to Jorg K (CEST = GMT+2) from comment #36) > Must not land on TB 78, Wayne, please remove the flag since you withdrew my ability to do it without actually informing me :-( You are mistaken, I didn't remove any capabilities. (In reply to gene smith from comment #24) > (In reply to Wayne Mery (:wsmwk) from comment #23) > > Gene, possible to devise a test so we don't get bit in the future? > > If you mean a unit test included in tb source then probably not. Yes, I meant unit tests. Seems now like we REALLY need that. (not your fault of course that we don't have it) Maybe now we should have such tests before allowing this to go onto ESR. And also re-reevaluate / group think comment 25. [Triage Comment]
Flags: needinfo?(vseerror)
Flags: needinfo?(rob)
Attachment #9158094 - Flags: approval-comm-esr78+ → approval-comm-esr78-

(I've debated about whether to post this but it seems in the best interest of the project to do so, nothing personal here ...)

At least three highly competent developers stated very clearly that this change was low risk. I was the lone person to push back against it in the interest of quality control.

One basic lesson here is that imap is fragile, and no patches can be assumed to be safe or low risk.

In the next 48 hours, before 80 beta ships, we need to assess whether bug 1654995 is a sufficient fix, and if not then back out this patch / bug 1580480 from the beta branch.

Alternatively, we back it out of beta branch today and rebuild 79.0b3 without it. Please speak up now.

Your conservative approach saved the day, and it's a pity we were all wrong. Backouts are always a pain, so personally I'd go forward with the follow-up fix. It won't get worse. The bad patch has been on beta for a while and hasn't caused too much damage and no dataloss.

Sorry, I thought I tested this with a big attachment that would have caught the problem right away. Not sure how I missed it. I'm going to try that attachment again right now and try to understand how I missed it...

Backing out the follow-up fix, it does fail with the original patch so maybe I didn't actually test with that attachments. Or maybe I just observed that the download seemed to work with no error reported and in the imap log there were no obvious indications of a problem. Of course, when I look closer at the log I can now see that the amount of bytes received is a bit less than the expected amount so the inline .png displays as a "broken picture".

No worries - mistakes happen.

Eventually this should go out in TB 78, Wayne, can you set the flag back to "?" (I can't).

Flags: needinfo?(vseerror)

(In reply to Jorg K (CEST = GMT+2) from comment #43)

Eventually this should go out in TB 78

Yes, eventually, but not yet. It was decided to have this and bug 1654995 go through a full beta, so to the end of August. Ref: bug 1654995 comment 12.

Flags: needinfo?(vseerror)
Whiteboard: [TM:78.1.0][regression:TB17] → [TM:78.2.0][regression:TB17][requires bug 1654995]

Comment on attachment 9158094 [details] [diff] [review]
1580480.txt

[Approval Request Comment]
Should be uplifted with bug 1654995

Attachment #9158094 - Flags: approval-comm-esr78- → approval-comm-esr78?

Comment on attachment 9158094 [details] [diff] [review]
1580480.txt

[Triage Comment]
Approved for esr78

Attachment #9158094 - Flags: approval-comm-esr78? → approval-comm-esr78+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: