Closed Bug 1790610 Opened 2 years ago Closed 2 years ago

Detect, report and repair Thunderbird key storage corruption

Categories

(MailNews Core :: Security, defect, P1)

Unspecified
All

Tracking

(thunderbird_esr102 fixed, thunderbird106 fixed)

RESOLVED FIXED
107 Branch
Tracking Status
thunderbird_esr102 --- fixed
thunderbird106 --- fixed

People

(Reporter: KaiE, Assigned: KaiE)

References

Details

Attachments

(5 files, 4 obsolete files)

The Thunderbird profile directory contains several files that depend on each other.
A very important file is: key4.db

If you lose key4.db, you can no longer access logins.json (saved passwords)

If you lose key4.db, you can no longer decrypt file encrypted-openpgp-passphrase.txt
And the consequence of that is, you can no longer access the OpenPGP secret keys stored in secring.gpg

If you lose key4.db, you have lost the secret keys for the personal S/MIME certificates.

The Thunderbird profile importer feature shipped with Thunderbird 102 may overwrite file key4.db, which has the consequences explained above.

We have several reports from users who are no longer able to use the OpenPGP feature at all, and their problems are caused by this inconsistency. The current Thunderbird code isn't prepared for this kind of inconsistency, and when it occurrs, the code simply runs into an exception and stops to work.

As a first step, I suggest to disable the broken import in 1790605, to avoid that more users will be affected.

However, we also must help our users who are already affected.

In my opinion, we cannot expect users to clean this situation up themselves.

We need to improve Thunderbird to perform a consistency check, that discovers this kind of inconsistency, and assists the user to repair the broken configuration.

Severity: -- → S2
Priority: -- → P1

Let's focus on the inconsistency between key4.db and encrypted-openpgp-passphrase.txt and secring.gpg, because that is a known possible consequence of the code from bug 1720042.

We should perform a once-per-session consistency check, and offer assistance if the check fails.

I suggest to perform the check at the time we attempt to read file encrypted-openpgp-passphrase.txt - as implemented in mail/extensions/openpgp/content/modules/masterpass.jsm

Currently, that code will attempt to decrypt encrypted-openpgp-passphrase.txt, and simply throw an exception on failure (the exception is invisible to the user, and results in an attempt action to not work).

We should instead use the following new logic:

  • find out if the application primary password is set
  • if a primary password is set, find out if the user has unlocked using the PP. If not, we aren't experiencing corruption (return an exception, we'll check again if the user retries after having unlocked with the PP)
  • if a primary password is set, and the user has already unlocked using the PP, and we're unable to decrypt, we have corruption
  • if no primary password is set, and we're unable to decrypt, we have corruption

If we have corruption, check if the corruption is bad, or not problematic.

Check how many secret keys are stored in file secring.pgp.

If there are zero secret keys, we could automatically repair the corruption. It should be fine to delete the contents of encrypted-openpgp-passphrase.txt and receate it (using a fresh automatic passphrase, that will be used for future secret OpenPGP keys).

If there are nonzero secret keys, the corruption means dataloss, we're no longer able to access the secret keys in secring.gpg.
We have to inform the user. We should tell the user that the secret OpenPGP keys are lost.
We could offer two choices:

  • automatically delete the unaccessible secret OpenPGP keys to restore a working configuration with no OpenPGP keys
  • tell the user to quit Thunderbird, and to restore a backup of the profile directory (we should show the full path of the corrupted profile on screen to assist the user in identifying which folder needs to be restored from a backup)

I found another scenario, and I think this scenario is a good argument for implementing automatic reparing:

  • create a fresh profile
  • don't setup an email account, just quit
  • start again, same profile
  • import from another profile
  • quit and start again
  • confirm that you can send and receive email (imported account, imported passwords)
  • try to configure openpgp, and you get a failure

In this scenario, the user wouldn't notice the profile corruption immediately.
The user might start to invest into configuring and using this profile.

At a later time, the user might decide to try OpenPGP, and will notice with frustration that it doesn't work, with no good indication why.

Flags: needinfo?(sancus)
Attached image fatal-corruption-report.png (deleted) —

Even if we decide to not automatically repair (for example, because we're worried about potentially make a situation worse by deleting files), in my opinion, at the very least we should automatically detect the corruption and inform the user. The attached patch implements that, see also the screenshot.

I realize simply reporting isn't ideal... We'd inform users who aren't using OpenPGP. And it will be more work to report only when users attempt to use OpenPGP with a broken profile.

I think automatic repairing really would be better.

Attachment #9294618 - Attachment description: Bug 1790610 - Detect and report OpenPGP key storage corruption. r=mkmelin → Bug 1790610 - Detect and report OpenPGP key storage corruption, offer to repair minimal corruption. r=mkmelin

Comment on attachment 9294620 [details]
fatal-corruption-report.png

I suggest, only show this warning if there really is a bad problem, if we find existing secret keys that we can no longer access.

Attachment #9294620 - Attachment description: corruption-report.png → fatal-corruption-report.png
Attachment #9294620 - Attachment filename: corruption-report.png → fatal-corruption-report.png
Attached image simple-corruption-and-repair.png (deleted) —

I suggest to show this simpler warning, if we detect the corruption, but the corruption isn't a problem. If there are no secret keys stored, it should be safe to delete the passphrase file.

I suggest to still inform the user, and require the user to confirm repairing (as done in this screenshot and updated patch).

Status: NEW → ASSIGNED

I would strongly prefer we avoid any messaging if possible.

Flags: needinfo?(sancus)
Attachment #9294618 - Attachment description: Bug 1790610 - Detect and report OpenPGP key storage corruption, offer to repair minimal corruption. r=mkmelin → Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=mkmelin

Patch updated, automatic repair, no prompts.

Based on Kai's patch.

Attachment #9295207 - Attachment is obsolete: true

(In reply to Andrei Hajdukewycz [:sancus] from comment #9)

I would strongly prefer we avoid any messaging if possible.

Why ?

Flags: needinfo?(sancus)

Just to reiterate what I wrote on phab:
Looks OK to me.... but that's coming in blind without knowing anything about OpenPGP and just reading over the bug comments and patch.
So I don't have a great understanding of the bigger picture here.
Kai: I can come back to this tomorrow and take a deeper look (i.e. try and exercise all the failure cases) if you'd prefer!

Comment on attachment 9294618 [details]
Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC

[Approval Request Comment]
Regression caused by (bug #): profile import
User impact if declined: corrupted profile that breaks users attempting to use openpgp
Testing completed (on c-c, etc.):
Risk to taking this patch (and alternatives if risky): low theoretical risk for additional dataloss, because we're moving important files, but the code should only do it when really necessary

Attachment #9294618 - Flags: approval-comm-beta?

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/08d9b3319b64
Detect and automatically repair OpenPGP key storage corruption. r=BenC

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 107 Branch

Comment on attachment 9294618 [details]
Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC

[Triage Comment]
Approved for beta

Attachment #9294618 - Flags: approval-comm-beta? → approval-comm-beta+

Please don't ship this yet. We need to check if this makes things worse.

Attachment #9294618 - Flags: approval-comm-beta+

This code is broken on comm-central. We have a regression that destroys more. Please back out.

Backout by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/cc3f06995948 Backed out changeset 08d9b3319b64 for causing bug 1792450. r=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 1792450
No longer blocks: 1792450
Regressions: 1792450
Attachment #9294618 - Attachment description: Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=mkmelin → Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC

I'm trying to find a way to test the regression bug scenario, which is difficult.

The regression happened, when we already had files from a previous Thunderbird session, and our code unnecessarily replaced it.

To test that our code wouldn't replace existing information, we'd need to run a test that reuses profile files from a different session.
This is difficult with our automated tests apparently. I cannot find any test that restarts Thunderbird, rather, I find tests with comments, stating that they avoid a restart, because that would be tricky.

Another idea is to copy prepared files into the profile directory, prior to executing the tests. This is also difficult. If our process starts with an empty profile, then NSS init will create a fresh key4.db very early during startup - which means we cannot combine it with prepared secring.gpg / encrypted-openpgp-passphrase.txt files, because they would mismatch.

I'm also not aware of a way to copy a prepared file key4.db into the profile folder, prior to starting up.

I found tests using that approach in mailnews/compose/test/unit/ - a function setupForPassword() does this copying. But are those tests used by Thunderbird, or are they old tests from SeaMonkey? Even if they are Thunderbird tests, they are unit tests. I'm not sure if the same action would work with a mochitest.

Any ideas are welcome.

I have a simpler idea, which could simply test the flow of the related code.

Currently the related code is executed only once. As soon as we have succeeded init, we have the password from encrypted-openpgp-passphrase.txt cached in memory. If it's cached, we don't attempt to read it again, which means we also won't run through that repair code again.

We could use a test that erases the cached password from memory, thereby forcing us to run through that code twice. We could ensure that after the second execution we're still using the same password. This would confirm that our second re-init doesn't replace the earlier data.

OS: Unspecified → All

I've implemented the idea from comment 21 in the separate patch.

With the original, broken repair patch, this test fails.

The updated repair patch contains two changes, the logic fix, and the safety consistency check. I have tested that each of those two changes is sufficient to prevent the regression. Having both improves safety further against accidental, future code changes.

Attached patch 1790610-fix-esr102.patch (obsolete) (deleted) — Splinter Review

fix backported to 102, I'm running my local thunderbird with this patch

Ben, do you have time to review the updated patch?

Flags: needinfo?(benc)

The revised patch looks good to me, but I'd recommend getting a second opinion too.

Flags: needinfo?(benc)

During additional testing I found that the updated patch is still insufficient.

Most users will have both a secring.gpg and an automatic backup file secring.gpg.old, which we create every time we save an updated secring.gpg file.

In addition, we have automatic fallback loading old the secring.gpg.old file.

With the current patch if corruption is detected, we move file secring.gpg away. Then we automatically restart. Then our code attempts to fall back loading from the secring.gpg.old file, and it reports corruption. (This report is from older repair code, from a past corruption scenario, bug 1656287, which we haven't yet removed.)

I suggest that we do two things in addition, to prevent that misinterpretation:

  • also rename file secring.gpg.old to secring.gpg.old.corrupt
  • as a second precaution, create an empty file secring.gpg as part of repairing (to avoid falling back)

While the repair code helps to repair the original corruption in stable Thunderbird caused by bug 1720042,
it cannot repair the corruption that was caused by the earlier patch in this bug (only in Thunderbird Daily).

We currently don't have detection for that scenario.
The scenario is: We are able to decrypt the passphrase from encrypted-openpgp-passphrase.txt, but key stored in file secring.gpg cannot be decrypted using it.

I suggest that we add an additional patch, only for Daily, that reports this inconsistency.
(It could also happen if a user modifies file secring.gpg using external OpenPGP software, which the user shouldn't do.)

Attached patch backport-1790610-esr102-v2.patch (obsolete) (deleted) — Splinter Review
Attachment #9296270 - Attachment is obsolete: true
Attachment #9296455 - Attachment is obsolete: true

(In reply to Kai Engert (:KaiE:) from comment #29)

While the repair code helps to repair the original corruption in stable Thunderbird caused by bug 1720042,
it cannot repair the corruption that was caused by the earlier patch in this bug (only in Thunderbird Daily).

We currently don't have detection for that scenario.
The scenario is: We are able to decrypt the passphrase from encrypted-openpgp-passphrase.txt, but key stored in file secring.gpg cannot be decrypted using it.

I suggest that we add an additional patch, only for Daily, that reports this inconsistency.
(It could also happen if a user modifies file secring.gpg using external OpenPGP software, which the user shouldn't do.)

I'll move this patch to bug 1792450, because that bug introduced the need for this additional patch on c-c.

Comment on attachment 9296454 [details]
Bug 1790610 - Report if a secret key cannot be decrypted using our automatic passphrase. r=mkmelin

Revision D158250 was moved to bug 1792450. Setting attachment 9296454 [details] to obsolete.

Attachment #9296454 - Attachment is obsolete: true
Attachment #9294618 - Attachment description: Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC → Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=mkmelin,BenC

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/7cb73c32cc14
Detect and automatically repair OpenPGP key storage corruption. r=mkmelin,BenC
https://hg.mozilla.org/comm-central/rev/bbe2423152f3
Automated test to ensure OpenPGP key storage repair code doesn't contain a logic error. r=BenC

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

I've done additional testing using daily-broken-import from 09-23, and with today's daily, and the repairing worked.
Also, if good openpgp already exist, today's daily don't corrupt them.
I've also been running the backported patch in my local 102 build for the last days, and it's been working fine.

I think the patch is good and can be added to beta.

Comment on attachment 9294618 [details]
Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC

[Approval Request Comment]
Regression caused by (bug #): 1720042 (need repair)
User impact if declined: corrupted profiles and users not knowing about the need to repair, experiencing broken functionality at the time they attempt to use OpenPGP without indication what's wrong
Testing completed (on c-c, etc.): yes
Risk to taking this patch (and alternatives if risky): medium

Attachment #9294618 - Attachment description: Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=mkmelin,BenC → Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC
Attachment #9294618 - Flags: approval-comm-beta?
Attachment #9296244 - Flags: approval-comm-beta?

Comment on attachment 9296244 [details]
Bug 1790610 - Automated test to ensure OpenPGP key storage repair code doesn't contain a logic error. r=BenC

[Triage Comment]
Approved for beta

Attachment #9296244 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9294618 [details]
Bug 1790610 - Detect and automatically repair OpenPGP key storage corruption. r=BenC

[Triage Comment]
Approved for beta

Attachment #9294618 - Flags: approval-comm-beta? → approval-comm-beta+

(In reply to Wayne Mery (:wsmwk) from comment #12)

(In reply to Andrei Hajdukewycz [:sancus] from comment #9)

I would strongly prefer we avoid any messaging if possible.

Why ?

Kai and I had discussed this right before I made the comment, so it was more summarizing the result of our meeting, but my view was basically: If we screwed it up, and it's worth fixing, then it should be done as automatically as possible. Complicated popup messages will confuse many users, and doing messages also means adding strings and localizing them as well. It's better to avoid all that if at all possible.

Flags: needinfo?(sancus)

Magnus, FYI, I think this bug shouldn't be responsible for scenarios where stored passwords cannot be read.
Unusable stored passwords could be seen it key4.db doesn't contain the symmetric key used for decrypting logins.json.

For the scenario handled in this bug (as caused by the code that was backed out in bug 1790605), if file logins.json already existed, file key4.db was not copied.

In other words, this bug here only affected users who did not yet have saved passwords.
(It affected openpgp and s/mime key storage, it did not affected stored passwords.)

I would like to see this uplifted to esr102, for the reason explained in comment 38.

I have tested the fix myself with my local esr102 build (which I'm running as my primary Thunderbird) , and I didn't see any problems.
Should we consider my own testing, plus no more bad reports from daily/beta sufficient for uplifting to esr102?

Or would you like to see additional testing by someone else?

I think it's most important to double check that there aren't any other lingering regressions like bug 1792450.

It would be good to get testing assistance from users who are already using OpenPGP with esr102, who own one or more OpenPGP secret keys.
Those users should backup their profile, then start the esr102-test build that I would provide, and confirm they are not experiencing any regressions, and no key dataloss.

I can ask for volunteers in the openpgp matrix channel.

If you're running Thunderbird 102 with your profile, and you have at least one OpenPGP secret key, and you'd like to volunteer to test the fix for this bug:

  • quit thunderbird
  • find your profile folder, and create a full backup of the folder
  • download one of the builds from the below links, for your respective platform
  • unpack and start that Thunderbird build

Then please test that you can still use the OpenPGP keys managed by Thunderbird (e.g. decrypt a message and send an encrypted an signed message.)

Windows 64bit:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TVg9JX1uRbmhd_wpjsqpQA/runs/0/artifacts/public/build/setup.exe
or
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TVg9JX1uRbmhd_wpjsqpQA/runs/0/artifacts/public/build/target.zip

Windows 32bit:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/K7PDJpUzTSCzu-yvYGWuGg/runs/0/artifacts/public/build/setup.exe
or
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/K7PDJpUzTSCzu-yvYGWuGg/runs/0/artifacts/public/build/target.zip

Linux 64bit:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/cZ_cJ5-TT0uQs9kQ6u_kUA/runs/0/artifacts/public/build/target.tar.bz2

Linux 32bit:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DZtCCWDLSLKNgAsLm8LosQ/runs/0/artifacts/public/build/target.tar.bz2

macOS: build not yet ready

Downloaded Linux 64bit, imported profile from TB 91. Allowed several certificate exceptions (as I run a private server with self-signed certificate) and passwords as prompted. The file encrypted-openpgp-passphrase.txt existed, with time of original run of downloaded daily. File secring.gpg did not exist.
Account settings showed OpenPGP key ID, but said key could not be found. Imported ~/.gnupg/secring.gpg and all keys from ~/.gnupg/pubring.gpg.
Tested sending signed and encypted email, and receiving reply.
All OK.
Ubuntu 20.04 LTS
File encrypted-openpgp-passphrase.txt still had original timestamp.

Comment on attachment 9296457 [details] [diff] [review]
backport-1790610-esr102-v3.patch

see comment 38.

Requesting approval, to get this on Wayne's radar.

Attachment #9296457 - Flags: approval-comm-esr102?

Comment on attachment 9296457 [details] [diff] [review]
backport-1790610-esr102-v3.patch

[Triage Comment]
Approved for esr102

Attachment #9296457 - Flags: approval-comm-esr102? → approval-comm-esr102+

Comment on attachment 9296244 [details]
Bug 1790610 - Automated test to ensure OpenPGP key storage repair code doesn't contain a logic error. r=BenC

please uplift the test, too

Attachment #9296244 - Flags: approval-comm-esr102?

Comment on attachment 9296244 [details]
Bug 1790610 - Automated test to ensure OpenPGP key storage repair code doesn't contain a logic error. r=BenC

[Triage Comment]
Approved for esr102

Attachment #9296244 - Flags: approval-comm-esr102? → approval-comm-esr102+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: