Closed Bug 1750188 Opened 3 years ago Closed 3 years ago

connectivity failure to many https sites: SEC_ERROR_LIBRARY_FAILURE

Categories

(Core :: Security: PSM, defect, P1)

Tracking

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- unaffected
firefox97 --- unaffected
firefox98 --- fixed

People

(Reporter: aryx, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Firefox 98.0a1 20220114093102 on Windows 8.1

Since the update to the mentioned build, connecting to many https sites fails with SEC_ERROR_LIBRARY_FAILURE, e.g. https://sql.telemetry.mozilla.org/ , https://calendar.google.com/

Accessing the resources works with Firefox 97.0 beta 2.

Bug 1748341 and bug 1747320 landed in the last Nightly.

Benjamin, Dennis, can you confirm what the issue is?

Flags: needinfo?(djackson)
Flags: needinfo?(bbeurdouche)

I used mozregression with my regular Nightly profile and got:

Last good revision: 1bad8541569f702f0673a92bdd64b70bc22d5292
First bad revision: d4a6f5cb9b3f66a9c2282b334b087503e99c462d
Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=1bad8541569f702f0673a92bdd64b70bc22d5292&tochange=d4a6f5cb9b3f66a9c2282b334b087503e99c462d

This points to bug 1747320.

Thanks Viktor. This reproduces with existing profiles but not new profiles (due to blocklist not downloaded yet?). Backout of bug 1747320 and new Nightlies scheduled.

Edit: After ~5 minutes it also reproduces in a new Nightly profile.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(djackson)
Flags: needinfo?(bbeurdouche)
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Keywords: regression
Regressed by: 1747320
Has Regression Range: --- → yes

Bug 1748341 and bug 1747320 landed in the last Nightly.
Benjamin, Dennis, can you confirm what the issue is?

Yes, I can confirm after reproducing that backing out 1747320 locally seems to resolve the issue.

We need to find a way to add tests for this too. It's pretty concerning that nothing in CI went orange with this bug.

I think it wasn't caught because it requires the use of an old profile. I couldn't reproduce on a fresh profile.
I would suspect we don't keep a collection of old profiles?

Conditioned profiles ("condprof" at Treeherder) generate profiles which can be used by other CI tasks. See bug 1562870 which implemented them.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #4)

We need to find a way to add tests for this too. It's pretty concerning that nothing in CI went orange with this bug.

This is an issue that we've faced in IndexedDB. I think the general proposal would be that, to detect profile-related regressions, we need not only point-in-time profiles (e.g. a profile from the last release, a profile from the last upgrade-path release) but also test infrastructure that runs tests against rolling profiles: each night's run starts from the profile left at the end of the preceding night's run, which was itself based on the run before it, and the resulting profile is exposed to the next night's Nightly as long as the run passed, etc. Edit: Clearly a fresh profile also needs to be in the mix.

In particular, in IDB we regularly create tests that involve point-in-time snapshots, but a regression we encountered was due to newly introduced code that affected nightlies going forward. Manually created snapshots can't cover that unless a team manually creates snapshots every day, etc. (And in the specific case in question someone changed something without review by the people who would know that a new snapshot was required if the change in question hadn't been unsound, etc. I will note herald rules do potentially help address that underlying review and awareness-related issue.)
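The rolling-profile idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `nightlyRun`, `runTests`, and the profile object shape are hypothetical names invented here, not anything that exists in Mozilla's CI.

```javascript
// Hypothetical sketch; none of these names exist in Mozilla's CI.
// runTests(build, profile) must return { passed: boolean, profile: object },
// where `profile` is the profile state left behind at the end of the run.
function nightlyRun(build, rollingProfile, runTests) {
  // A fresh profile always stays in the mix.
  const fresh = runTests(build, { usedBy: [] });

  // Work on a copy so a failing run cannot alter the stored rolling profile.
  const copy = JSON.parse(JSON.stringify(rollingProfile));
  const rolling = runTests(build, copy);

  return {
    passed: fresh.passed && rolling.passed,
    // Expose the updated profile to the next night only if this run passed;
    // otherwise keep handing out the last known-good profile.
    nextRollingProfile: rolling.passed ? rolling.profile : rollingProfile,
  };
}
```

Each night's result feeds `nextRollingProfile` into the following night's call, so a regression that only shows up on profiles touched by earlier Nightlies has a chance to turn a run orange.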

(In reply to Benjamin Beurdouche [:beurdouche] from comment #5)

I think it wasn't caught because it requires the use of an old profile. I couldn't reproduce on a fresh profile.
I would suspect we don't keep a collection of old profiles?

Fresh profiles catch this issue after a few minutes too.

Just out of curiosity, when can we expect Win64 builds with the fix? The latest Win64 build from http://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ still has the issue.

As mentioned above, the issue did affect new profiles after some period of use. The exact conditions were that RemoteSecuritySettings had to be initialized prior to a RemoteSettings poll. You can trigger the issue in an affected build in a new profile by running:

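// Run in a privileged context such as the Browser Console.
// 1. Initialize RemoteSecuritySettings first...
const { RemoteSecuritySettings } = ChromeUtils.import("resource://gre/modules/psm/RemoteSecuritySettings.jsm");
RemoteSecuritySettings.init();

// 2. ...then trigger a RemoteSettings poll.
const { RemoteSettings } = ChromeUtils.import("resource://services-settings/remote-settings.js");
RemoteSettings.pollChanges();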

Old profiles with a data.safe.bin file in the security_state directory were affected immediately.
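To see whether a given profile falls into that immediately affected group, one can check for the file. A minimal sketch, assuming Node.js is available and the caller supplies the profile directory path; the helper name `hasSecurityStateData` is hypothetical and not part of Firefox:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: returns true if the profile directory contains
// security_state/data.safe.bin, the file named in the comment above.
// It only checks that the file is present, nothing about its contents.
function hasSecurityStateData(profileDir) {
  return fs.existsSync(path.join(profileDir, "security_state", "data.safe.bin"));
}
```

For example, `hasSecurityStateData("/path/to/profile")` would be called with the profile folder shown in about:support.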

(In reply to Julien Cristau [:jcristau] from comment #10)

Builds from https://ftp.mozilla.org/pub/firefox/nightly/2022/01/2022-01-14-13-16-31-mozilla-central/ are fixed.

Confirmed, thanks very much!

It appears affected builds are not able to recover via auto-update, i.e. the build I had just claims there are no new versions available.

Severity: -- → S1
Priority: -- → P1

If you are affected by this issue and the browser is stuck on 2022-01-14 without seeing any update:

In about:config, set security.pki.crlite_mode to 0.
Now when going to "About Nightly" you should be able to see the update.

After restarting, don't forget to set security.pki.crlite_mode back to what it was (1 for most people) and enjoy! : )

Ps: an alternative way to fix this is to download Nightly from the website again
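The pref change above can also be made from the Browser Console instead of about:config. A minimal sketch, assuming a `prefs` object that offers the `getIntPref`/`setIntPref` methods of `Services.prefs`; the helper name `setCrliteMode` is made up for illustration:

```javascript
const CRLITE_PREF = "security.pki.crlite_mode";

// Hypothetical helper: sets the pref to `mode` and returns the value it
// had before, so it can be written down and restored after the update.
function setCrliteMode(prefs, mode) {
  const previous = prefs.getIntPref(CRLITE_PREF);
  prefs.setIntPref(CRLITE_PREF, mode);
  return previous;
}

// In the Browser Console (not in Node.js), usage would look like:
//   setCrliteMode(Services.prefs, 0);   // note the returned previous value
//   ...update via "About Nightly" and restart...
//   setCrliteMode(Services.prefs, 1);   // or whatever value was returned earlier
```

Returning the previous value rather than a restore callback is deliberate: the browser restarts between the two steps, so anything held in console state would be lost.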

Flags: qe-verify+

I was not able to reproduce this issue using an affected Nightly build from 2022-01-14. I've tested on macOS 11 and Win 8.1 x86; in my case the error is not displayed when accessing https://sql.telemetry.mozilla.org/ or https://calendar.google.com/. I've used both fresh and existing profiles.

Hi, Sebastian! Could you please help us check whether the bug is fixed on the latest Beta 89?

Flags: needinfo?(aryx.bugmail)

The failure got fixed by backout of bug 1747320 and hasn't been observed afterwards. One had to wait some minutes after the creation of a new profile with the affected build until all the certificate data had been downloaded to be able to reproduce the issue.

Flags: needinfo?(aryx.bugmail)