Closed Bug 1787963 Opened 2 years ago Closed 2 years ago

Dataloss when moving messages from POP/Local to IMAP while offline and then going online

Categories

(MailNews Core :: Backend, defect, P1)

Thunderbird 102

Tracking

(thunderbird_esr102+ affected, thunderbird105 wontfix)

RESOLVED FIXED
106 Branch
Tracking Status
thunderbird_esr102 + affected
thunderbird105 --- wontfix

People

(Reporter: TbSync, Assigned: benc)

References

(Blocks 1 open bug)

Details

(4 keywords)

Attachments

(2 files)

STR:

  • Select File -> Work Offline and select "do not download all messages" in the following prompt
  • Move a message from a Local/POP folder to an IMAP folder
  • Observe the message being moved
  • Deleselect File -> Work Offline
  • Observe the moved file vanishing from the IMAP folder

The message did not return to the original folder, it is gone.

Expectation:

The message should be sent to the IMAP server after going back online. If that is not possible (for example server is down), keep the message in the folder until a later time, to retry sending it to the server.

At no time should a message, which has not yet been acknowledged by the server be removed from Thunderbird.

Good find, John, thank you! Confirming exactly as described (102.2.0 (64-bit), Win10).

This constitutes unexpected and serious dataloss, and imo pretty easy to get into and not exactly an edge case, there might be very good reasons to move messages from a local folder into an IMAP folder. Even if it was an edge case, it may still have catastrophic consequences for large numbers of messages moved. I guess similar desasters may happen when the network just goes away during the move or some such, so it may be even easier to get into. Per bug fields legend for severity, dataloss is S1. We should seek to fix this asap.

I noticed that the message body is already gone after the move to the IMAP folder when offline. Trying to drag the message back to a local folder shows an alert which might be helpful for understanding this.

v v Note: tracking-firefox* flags are wrongly set by BMO automation, wasn't me. Can't remove them either.

Severity: S2 → S1
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(benc)
Priority: P2 → P1
Summary: Data-loss when moving messages from POP/Local to IMAP while offline and then going online → Dataloss when moving messages from POP/Local to IMAP while offline and then going online

I can follow the STR and confirm the issue (I used a purely Local folder - no POP3 involvement).

Some more observations:

  1. Watching the raw mbox files for both the source (local) and destination (IMAP, but offline) folders, I can see that the message is never copied into the destination mbox file.
  2. after the move, while still in offline mode, you can click on the message in the IMAP folder, but the body is not available.
  3. the issue doesn't seem to occur with a copy (rather than a move): When you copy a message, the IMAP folder still won't serve you up the message body while offline, but when you go back online it's still in the imap folder and the full body can be viewed.

2 is consistent with the IMAP folder msgDB having an entry for the message, but there being no message available to display (I think it first looks for an offline copy in the msgStore, which we know isn't there because of 1. It then tries to stream it down from the IMAP server, which of course it can't do because we're offline).

Further thoughts:

  • The IMAP code seems to treat the local msgStore as a cache, and is pretty cavalier about throwing away messages. So I'm not sure how robust it is to rely on messages copied into there while offline being uploaded to the IMAP server once we go back online.
  • I haven't looked at the IMAP offline playback stuff in detail (it's firmly in the "Here be monsters" bin in my mind ; -). But I will. It's possible it does something clever...
  • Would it be acceptable to treat IMAP folders as "read only" while in offline mode? And have CopyMessage()/CopyFolder() et al fail appropriately? I'm kind of guessing that the point 3 above and the existence of IMAP offline playback code suggests "no", but hey. Have to ask :-)

In any case, it'd be good to have a xpcshell unit test to illustrate the issue (i.e a test that currently fails but will pass once it's fixed). So that's probably the first thing I'll do tomorrow. If anyone is interested in having a crack at that in the meantime, I'd suggest comm/mailnews/imap/test/unit/test_copyThenMove.js would be a good starting point - just strip out all but the move operation and add in code to go offline first then back online (or whatever causes the imap folder to resync and discard the moved message).

Assignee: nobody → benc
Status: NEW → ASSIGNED
Flags: needinfo?(benc)

I used the offline mode to create a 100% reproducible situation.

But the actual issue might be deeper and more severe: Unstable Internet. Even if the offline mode is not enabled, there are cases where the message is not transferred to the server and then gets thrown away. (Riding a train throu a tunnel). IIRC, we have offline detection which switches to offline mode when the internet is away, but that has lag. So making IMAP read-only during offline would not solve the issue if offline mode is not (yet) enabled but internet is down.

This describes a somewhat different bug, but I still think it is worth mentioning to find the proper solution for this bug, which is a rock solid offline -> online IMAP handling. We should not depend on state detection and prevent the move/copy, but make sure that a message which is only in offline msgStore (I hope I use the right terminology here) never gets removed before it has been acknowledged by the server.

this bug are actually two independent issues?

  1. the message is lost when going online
  2. the body is lost while offline, directly after moving

(In reply to John Bieling (:TbSync) from comment #3)

But the actual issue might be deeper and more severe: Unstable Internet. Even if the offline mode is not enabled, there are cases where the message is not transferred to the server and then gets thrown away. (Riding a train throu a tunnel). IIRC, we have offline detection which switches to offline mode when the internet is away, but that has lag. So making IMAP read-only during offline would not solve the issue if offline mode is not (yet) enabled but internet is down.

Yep, in the case of flaky internet connections, the solution is to have rock solid error handling, so if anything fails (internet, disk error, whatever), it doesn't screw anything up and is correctly rolled back. So I'll treat it as a separate issue to this one.
(And I don't think we're there, by a long shot - the message and folder copy/move code is unfathomably complex and brittle. I really think it needs major refactoring and simplification before we have any hope of rock solid error handling).

This describes a somewhat different bug, but I still think it is worth mentioning to find the proper solution for this bug, which is a rock solid offline -> online IMAP handling. We should not depend on state detection and prevent the move/copy, but make sure that a message which is only in offline msgStore (I hope I use the right terminology here) never gets removed before it has been acknowledged by the server.

You're right. And this one does seem to be a specific bug when copying from local folder to an IMAP folder. See next comment.

I've spent the day digging down into this (This is mostly a braindump so tomorrow I can remember how far I got today :)

  • Moving a message from a local folder to an IMAP folder happens in two parts.
    1. nsImapMailFolder::CopyMessagesOffline() is called to copy the message to the IMAP folder.
    2. The message in the local folder is deleted (using code in nsMsgLocalMailFolder).

Regarding part 1:

  • When offline, rather than actually doing a copy, an nsMsgOfflineImapOperation move op is set up in nsImapMailFolder::CopyMessagesOffline().
  • This is stored in the folder msgDB.
  • When we go back online, UpdateFolderWithListener() is called which goes through any nsMsgOfflineImapOperations which are in the msgDB, and performs them for real. Move operations are handled by nsImapOfflineSync::ProcessMoveOperation.

I think the problem is that part 2 (deleting the original message in the local folder) happens when part 1 returns, and has no knowledge that the copy has been deferred until we go back online.
So when we go online and UpdateFolderWithListener() is called, the original message is long gone, so there's nothing left to move.

Not sure if this offline local->IMAP case ever worked. It could well be a fundamental design flaw rather than a bug. I need to keep digging to see if I can spot any hints of historical attempts to address it. It's a hard slog because the message copy/move code is pretty mindbending to trace through (soooo many ill-defined listener callbacks!).

But I suspect the real issue is architectural. Fundamentally, all folder types should be aware of deferred operations like this, and have mechanisms to cope. But as things stand, only IMAP has any concept of deferring operations.

This is truely significant and not an edge case, due to the fact that TB stores sent messages in local folders upon loss of network connectivity.

So the attempt to copy them next into the account's sent folder may lead to loosing them.

This patch addresses a couple of issues in nsImapMailFolder::CopyMessagesOffline(),
which handles copying messages to IMAP folders when in Offline mode:

  1. It uses nsIMsgFolder.HasMessageOffline() to see if a full local copy of a
    message is available in the source folder. But the local folder implementation
    of HasMessagesOffline() would always return false. So copy operations from
    a local folder would never copy the full message across to the msgStore of
    the destination IMAP folder. This patch adds a local folder implementation
    of HasMessageOffline().
    (The roots of this issue is down to pointless differences between the way
    local folders and IMAP folder handle message storage on disk).
  2. The CopyOfflineMsgBody() routine used to actually copy the message between
    the local folder msgStore and the IMAP folder msgStore would corrupt the
    message data. I've replaced it with a much simpler dumb copy.

The upshot is that after the offline copy, a full message will appear in the
destination IMAP folder. It's a required first step, but it doesn't fix the
rest of the bug - the moved message disappearing during the IMAP sync after going
back into online mode.

Keywords: leave-open
Target Milestone: --- → 106 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/54bab87d2605
Copy message to msgStore as part of local->IMAP offline copy op. r=mkmelin

So this patch isn't exactly a perfect fix, but it's as good as I can get without big architectural changes.

The problem was that during an offline move from a local folder to an IMAP folder, the source message was being deleted while still offline. Then, when the user went back online, the stored-up IMAP operation would attempt to perform the move for real. But it'd fail... because the source message no longer exists.
This patch just avoids the immediate source message deletion while offline. So the source message is effectively copied rather than moved. When the user goes back online, then the source message is properly moved.

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/af7d8cd3db0e
Fix offline local->IMAP message move operation. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Nice find. However, if this is not a regression, it probably does not qualify as S1

It is a severe dataloss issue, which alone should qualify it for S1, not?

I was thinking S1=blocker, which is the old criteria. But they have unfortunately collapsed the criteria of severity and dataloss is now in that category. The old definition of blocker also means we should stop shipping, stop updates, etc. which we would not do in this case. I guess I must get comfortable with the new definition.

(Catastrophic) Blocks development/testing,

No indicators of that

may impact more than 25% of users,

No indicators of that - I would think moving messages local to imap is somewhat infrequent occurrence, if the primary driver of that circumstance is a user being offline and then going online. If that is true (and we don't have data, so it is subjective), I would not rate this as severe/frequent.

causes data loss,

yes, for the literal definition of the criteria this is definitely S1. I have no objection to keep it there.

However, if it is a NN year old bug and none of the other criteria are met, for me, within in the many shades of "bad", this isn't an uber-severe bug.

On the other hand, if this is a regression, that would make it a strong candidate for stopping updates immediately. Hence, my question about regression.

Blocks: 1787104

No indicators of that - I would think moving messages local to imap is somewhat infrequent occurrence

not really: if offline, TB stores a lot of messages in localfolders - e.g. all sent copies I then do find in a special sent folder under local folders, if the imap sent folder is inaccessible.
If I now copy them to imap sent, they are lost.

I can also reproduce the problem as described in the STR by John in comment #0 100% of the time using Thunderbird 102.2.2. Additionally, I have also tested Thunderbird 91.13.0 and there the problem DOES NOT occur. So it sounds like a very serious regression resulting in dataloss.

Until the problem is adequately fixed in Thunderbird 102 I strongly recommend to stop automatic and semi-automatic updates via ("Help -> About") immediately.

Especially considering the other just observed problem resulting in dataloss: Profile import in Thunderbird 102 causes corruption and potentially secret key loss

With the current Daily (2022-09-13) it gets complicated, because there are a few notable observations:

  1. When in offline-mode and moving a message from a local folder to an imap folder, the message is now actually copied at first, i.e. there are temporarily two identical messages in the source and destination folder. As soon as Thunderbird gets back online, the move is actually performed, i.e. the (original) message in the source folder is removed.

I find this a little bit irritating, because the first impression is, that one accidentally copied the message instead of moving it. The user is tempted to either perform the move operation another time or to delete the source message manually.

  1. If the user indeed moves the same message multiple times, then temporarily multiple messages are displayed in the destination folder. As soon as Thunderbird gets back online, the duplicate copies are removed automatically. (Still irritating, but luckily no real dataloss...)

  2. If the user decides to manually delete the source message... As soon as Thunderbird gets back online, the message in the destination folder is automatically deleted - resulting again in the very same dataloss situation as originally!!!

  3. When in offline-mode and moving a message from a local folder to an imap folder and - while still in offline-mode - back to another local folder. As soon as Thunderbird gets back online, there are now two messages - one in the imap folder and another one in the local folder.

Do we have any candidates for what caused the regression?

And, it apparently needs test coverage.

I just took the liberty to narrow down the problem:
The last working version is Daily 96.0a1 (2021-11-25) and the first broken version is Daily 96.0a1 (2021-11-26).

Wayne, when I look at the mentioned commits in the TXT files in the archive for Daily 96.0a1 (2021-11-25) and Daily 96.0a1 (2021-11-26), then - maybe - this pushlog interval is more appropriate?

https://hg.mozilla.org/comm-central/pushloghtml?startdate=2021-11-25+12%3A00%3A00&enddate=2021-11-26+12%3A00%3A00

(In reply to Alexander Bergmann from comment #23)

https://hg.mozilla.org/comm-central/pushloghtml?startdate=2021-11-25+12%3A00%3A00&enddate=2021-11-26+12%3A00%3A00

Could be something of Ben's tweaked a longstanding flaw.

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #24)

(In reply to Alexander Bergmann from comment #23)

https://hg.mozilla.org/comm-central/pushloghtml?startdate=2021-11-25+12%3A00%3A00&enddate=2021-11-26+12%3A00%3A00

Hmm... I can't really see any of those changes causing this issue... I assume one of them must have, but nothing jumps out.

Could be something of Ben's tweaked a longstanding flaw.

I'm pretty sure the old code only ever worked because all the longstanding flaws lined up in such a way that they were cancelling each other out...

Alexander's points in comment #19 are all true, but I don't really have any good ideas for them, short of just disabling the offline IMAP copy/move operations (note, that's the explicit offline mode, which is very different to the normal error case of a connection going down).
The message-copy architecture as it stands just doesn't support offline operations in any considered kind of way - whatever fix I hack in will likely fail in some other way (or, more likely, break something else).
I realise that it used to work, but I think that was more down to luck than anything else. It's not just a case of rolling back a patch that broke it.
The proper solution would be to step back and re-implement it in a more robust way - this definitely needs to be done for message copying in general at some point, but it's a big job (eg I spent the last couple of days just tracing through the local folder message copy path for Bug 1731177, and that was without any IMAP involvement).

Flags: needinfo?(benc)

Are we satisifed the tests give us sufficient comfort to uplift to 102?

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #26)

Are we satisifed the tests give us sufficient comfort to uplift to 102?

To be honest, I'm not sure any level of unit test coverage would give me much confidence on the offline-move case - the whole architecture is just fundamentally a bit dodgy.
If you think it's a serious enough issue, then I'd say go for it if the changesets apply cleanly to 102.

Flags: needinfo?(benc)

tbysync and I had chat this weekend. he is going to test with the patch.

Alexandar, can you give your assessment of how well behaved, or not, it is with the patch on beta or nightly?

Flags: needinfo?(myaddons)

klaus too :)

Wayne, I am not sure which patch exactly you want me to test... But all the mentioned problems in comment #19 are still one hundred percent reproducible for me both on Beta 106.0b3 and Daily 107.0a1 (2022-10-02).

Flags: needinfo?(myaddons)

I have just discovered another variation of the bug - verified on Beta 106.0b3 and Daily 107.0a1 (2022-10-02):

When in offline-mode and you move a message from a local folder to an imap folder and you use "Edit -> Undo" or "Ctrl + Z", then:

  1. two completely empty ghost messages are created in the source folder
  2. the move is performed anyway when Thunderbird gets back online
  3. the two ghost messages are still there afterwards

I have run a test in Thunderbird 91 and there the ghost messages mentioned in my previous comment #31 are also occurring. So I don't think they are related to the problem we are discussing in this bug. I guess I will have to file a new bug for them.

Today I spent the time to learn some valuable basics - I guess:

First I used mozregression to verify my reported finding regarding the pushlog range. This is the range mozregression came up with:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=7296008c832d5c0510fd36901e6cc5713867bfd5&tochange=8c44f53caece8bfd13295337d23deba36dbc92ca

This is actually the same range as already reported by me in comment #23. (Using mozregression was actually much more easy than downloading the versions manually, installing them manually and keeping records about the result manually.)

However, this still leaves us with a few commits, of which one might be the culprit! So I went on and built Thunderbird from source myself for the very first time - following the guide at Building Thunderbird.

After setting up the build environment and building Thunderbird initially, I reverted the two patches D156269 and D157071:

hg backout 54bab87d2605
hg backout af7d8cd3db0e

Afterwards the problem did occur in the very same way again as in Thunderbird 102. And all the mentioned side-effects in comment #19 are gone as well. Of course moving the message from local to imap now always fails and results in the message being lost.

Now I was able to finally look into the pushlog range. I started by reverting the first patch in the pushlog range, i.e. 8c44f53caece8bfd13295337d23deba36dbc92ca:

hg backout 8c44f53caece8bfd13295337d23deba36dbc92ca

And this indeed did the trick! :-)

Now Thunderbird behaves again as in version 91: When in offline-mode the message is moved from local to imap instantly and - when back online - it stays this way. As all of this is pretty new for me, I ask you to please verify my findings.

For me it looks like simply reverting the patch 8c44f53caece8bfd13295337d23deba36dbc92ca is the way to go for now. At the moment I have a bad feeling about the two proposed patches D156269 and D157071, because of the side-effects reported in my comment #19.

(In reply to Alexander Bergmann from comment #33)
Ben, what do you want to do ^^^

Flags: needinfo?(benc)

Alex: nice work in comment #33 !

(In reply to Magnus Melin [:mkmelin] from comment #34)

(In reply to Alexander Bergmann from comment #33)
Ben, what do you want to do ^^^

It kind of reinforces my opinion that the old code was damn lucky to have worked at all...

The only thing patch 8c44f53caece8bfd13295337d23deba36dbc92ca does is to return an obviously wrong value when an unset .messageOffset is read, rather than a subtly wrong value (it returns the message key rather than offset, which is never going to be valid, but is way more likely to still be within the extent of the mbox file).
Any code that relies on that behaviour is going to cause trouble at some point, no matter what.

So. Practical steps forward in the short term:

  • Back out 8c44f53caece8bfd13295337d23deba36dbc92ca
  • Back out D157071
  • Back out D156269 (but maybe not - see below)
  • Separate out the unit test in D157071 into it's own patch.
  • Extend that unit test to ensure the message data stays intact, especially when the message key starts getting higher (I'm concerned that the the borked messageOffset might be causing the start of the message to be clipped off).
  • Write up a bug for either implementing all this stuff properly and robustly, or just ensure IMAP folders are treated as read-only when in offline mode.

But first... Alex:
Does it work if you back out 8c44f53caece8bfd13295337d23deba36dbc92ca and
D157071?

but keep in D156269 ?

(D156269 includes some long-overdue cleanup which will probably have to happen at sometime anyway)

Flags: needinfo?(benc) → needinfo?(myaddons)

Ben, I think you are right that 8c44f53caece8bfd13295337d23deba36dbc92ca is not the solution by itself. With the patch applied, I get the behavior of Thunderbird 102, i.e. every message is lost instantly when going back online. With the patch reverted, I get the behavior of Thunderbird 91, which seemed to work fine - at first glance.

Today I realized that in fact not everything is working fine in Thunderbird 91 either. In all my previous tests I moved a single message from the source folder to the destination folder. In both folders there were no other messages at the time. Pretty much by accident I discovered Thunderbird 91 is also discarding the message or creating a garbled message, when there are a lot of messages in both folders. (In my further tests I had 50 other messages each in the source folder and destination folder.)

This is one of the garbled messages - the header is broken and the body is missing:

0001
X-Mozilla-Keys:                                                                                 
rom - Tue Oct 04 23:57:45 2022
X-Mozilla-Keys:                                                                                 
X-Mozilla-Keys:                                                                                 
Message-ID: <50d3f48c-7db7-2517-12d9-6e6a25b79f27@example.com>
Date: Sun, 24 Jul 2022 17:27:28 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.13.1
From: John Doe <john.doe@example.com>
Subject: Contacts
To: John Doe <john.doe@example.com>
Content-Language: en-US
X-Mozilla-Draft-Info: internal/draft; vcard=0; receipt=0; DSN=0; uuencode=0;
 attachmentreminder=0; deliveryformat=0
X-Identity-Key: id2
Fcc: imap://john.doe@example.com/Sent

I did another mozregression and came up with this pushlog:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=3667b3a2f949&tochange=4d8d5d6eecfb

Daily 45.0a1 (2015-12-06) seems to be (not) working just like Thunderbird 91, i.e. moving the message leads to a garbled message in the destination folder. (At least I haven't found anything obviously different, yet.) On the other hand Daily 45.0a1 (2015-12-05) moves the message correctly from local to imap - even when going back online. BUT: It removes another one of my other 50 messages in the local source folder as well!

From a quick scanning through the pushlog, the change 4d8d5d6eecfb07238b223c5fbb26ff871b2c0ccb looks interesting in my eyes. Unfortunately I cannot simply revert it like the other patches, because there are some merge conflicts.

But honestly I have spent the last hours testing various variations of changes and I am that close to loosing overview completely. There are so many things breaking in all kind of subtle differences in different versions of Thunderbird and with different combinations of patches. (I come to wonder, whether making imap accounts read-only when in offline-mode is a better solution? And I wonder, whether this has ever worked...)

Flags: needinfo?(myaddons)

Judging from my tests, the patch D156269 seems to have no direct effect on whether messages are deleted or garbled. However, it definitely improves the situation from a usability perspective, because the moved message is now shown while still in offline-mode.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: