Closed Bug 1577944 Opened 5 years ago Closed 5 years ago

Painfully slow loading of HTTPS when importing Windows certificates

Categories

(Core :: Security: PSM, defect, P1)

69 Branch
Desktop
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr68 --- unaffected
firefox69 --- wontfix
firefox70 --- fixed
firefox71 --- fixed

People

(Reporter: steven+mozilla, Assigned: keeler)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [psm-assigned])

Attachments

(3 files)

Over the past three or four months I've noticed an intermittent but frequent massive performance regression in Firefox Nightly. Sites are often painfully slow to load and in some cases load incorrectly, for example, the site may fail to load its stylesheets.

The system I am using is a Windows 10 machine. I'm running 32 bit nightly (70.0a1, for example 2019-07-21 and 2019-08-30, but I think version 69 is affected, see later). The problem occurs even in safe mode and on a fresh profile.

(When investigating I used a mix of my normal profile and a fresh profile and normal mode and safe mode, so individual results later might be for any combination, however, this makes no difference.)

The problem shows only on sites that use TLS (that is, HTTPS sites). I think it's related to importing enterprise root certificates. I'm a home user but I have both AVG and Qustodio installed on this system. Although AVG does not have web filtering turned on and although Qustodio is not enabled for my account, they do install certificates. I think the mere presence of these certificates is a factor.

The problem has been difficult to track down because it has proven to be awkward to reproduce under controlled conditions. This is, I believe, because at least one component of the problem is a cascading failure where performance degrades over short time periods (maybe queues fill up). The problem shows most clearly when navigating rapidly between sites as that is the time it can affect resources loading early in the page as opposed to less important resources that load later. Having other tabs running might make things worse.

I think sites which trigger HTTPS requests as they unload are most likely to affect the next page.

Even when the problem isn't severe enough to cause complete failure of the page, it shows up clearly in the network debugging tab as long blocked and TLS handshake times. In the example later, a resource on a site that normally spends 100 ms blocked, 80 ms on the TLS handshake and 62 ms to load, can balloon to 6.2 seconds blocked and 6.2 seconds on the TLS handshake (but the loading time is unaffected at 50 ms). However, this is not the worst case and requests can time out.

The problem is easiest to reproduce if the disable cache option is selected on the network debugging tab but the problem shows in normal operation even when the cache is enabled.

Some experimentation showed that the problem is correlated with having the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Mozilla\Firefox\Certificates\ImportEnterpriseRoots set to 1 (presumably the equivalent about:config setting, security.enterprise_roots.enabled, would also trigger the problem but I haven't checked). If this key is set to 0 then the problem goes away completely.

Because the problem appears to be related to a sequence of browsing, I developed a test where I navigated to the four sites https://www.bleepingcomputer.com/, http://www.theregister.co.uk/, http://www.telegraph.co.uk/ and http://news.bbc.co.uk/ in that order. Although I've used HTTP URLs for some of the sites, they immediately redirect to the HTTPS variant.

I also found another site that would consistently show a difference in performance even though it never completely failed. It was https://chromachecker.com/info/en/page/webbrowser

Attached are two screenshots of the network debugging tab, one with ImportEnterpriseRoots set to 0 and one with it set to 1. This is for the last step in the sequence test. I've highlighted the first CSS load for the site being navigated to (BBC). I also draw your attention to the requests due to the site I've just left (as identified by the referrer header). I've marked those with a dark red dot on the screenshot. This shows how navigating to a low resource site can suffer from a sequence of other network requests getting in the way (I believe this is why the problem was so difficult to get a reproduction for).

Some investigation with Process Monitor shows a vast difference in the number of accesses to cert9.db and the associated -journal and -wal files. I believe this indicates a change in the number of database accesses due to certificate activity. I don't know if these accesses are directly the cause of the slowdown or just a marker.

For the chromachecker site, for one measurement run, with ImportEnterpriseRoots=0, the browser made 780 access to cert9.db and 130 accesses to each of the -journal and -wal files (different runs show slightly different numbers so 780/130 could be 720/120). With ImportEnterpriseRoots=1 I saw 36354 accesses to cert9.db and 6059 accesses to the -journal and -wal files. This is an increase of roughly a factor of 45.

At one point when I was experimenting with sequences (possibly not the same sequence I described earlier), I saw a change from 3570/595 accesses to 291660/48654, an 80-fold increase.

Having Process Monitor running does slow down the system but, for reference, on just chromachecker, the total time of the cert9.db* accesses increases from ~50 ms to ~2 s. For the sequence the time goes from ~250 ms to ~20 s.

Using moz-regresssion, I've tracked the regression down to the fix for bug 1551177 (skip self-signed intermediate certificates that aren't trust anchors in path building). That's why I've marked this against Firefox version 69 and against the Core Security: PSM component.

On the last working autoland build (8b074e2a3b68) for the chromachecker site for ImportEnterpriseRoots of 0 and 1, I saw 876/146 and 720/120 accesses respectively (the 876/146 measurement could be a normal variation). For the first broken autoland build (5a721a7648f2, https://hg.mozilla.org/integration/autoland/rev/5a721a7648f2) I saw 720/120 and 34308/5718 accesses.

Given that the problem is related to accessing the OS's root certificates and that bug 1551177 is something to do with self-signed certificates, I had a look at the Windows certificate store. However, at this point it's really getting past the level I can follow. The following information from Powershell might be useful to others:

PS C:\Users\Steven> import-module pki
PS C:\Users\Steven> cd Cert:\LocalMachine\Root
PS Cert:\LocalMachine\Root> @(dir -recurse | ? {$_.Issuer -ne $_.Subject}).Count
1
PS Cert:\LocalMachine\Root> @(dir -recurse | ? {$_.Issuer -eq $_.Subject}).Count
414
PS Cert:\LocalMachine\Root> @(dir -recurse | ? {$_.Issuer -like 'CN=*Qustodio*' }).Count
18
PS Cert:\LocalMachine\Root> @(dir -recurse | ? {$_.Issuer -like 'CN=*AVG*' }).Count
2

Searching through Bugzilla, I've found bug 1555306 (Collision on NS find_objects()/sqlite3PagerSharedLock() when fetching OSCP causes extereme variance in page load times), so I tried turning off the option "Query OCSP responder servers to confirm the current validity of certificates" but it had no effect.

I also found bug 1577038 (add PK11_GetCertsFromPrivateKey to return all certificates with public keys matching a particular private key) and related bug 1573542 (Slow performance (switching tabs, interacting with map) on this map) which sound potentially similar.

I've been digging in with the profiler. This is my first time using it so I may have got something wrong.

It's somewhat difficult to work out what's going on as a lot of the threads spend a lot of time stalled waiting. I think there are two dominant reasons: one is some sort of thread pool exhaustion. The other is that the problem is dominated by waiting for other threads to release locks. I think the first problem is a symptom of the second problem.

I've published a profile trace at https://perfht.ml/2PJCPgK

For example, for SSL Cert #46, the thread runs for 4.2 seconds spending 2.4 seconds waiting (in nsThreadPool::Run::Wait) and 1.8 seconds apparently trying to do some work.

Chasing down the 1.8 seconds, it starts in SSLServerCertVerificationJob::Run() then ending up in NSSCertDBTrustDomain::GetCertTrust being called from CheckCandidates.

I believe this call to GetCertTrust from CheckCandidates was added by the fix for bug 1551177. The code has moved a little since then, but I think the code is now:

static Result CheckCandidates(TrustDomain& trustDomain,
                              TrustDomain::IssuerChecker& checker,
                              Vector<Input>& candidates,
                              Input* nameConstraintsInputPtr, bool& keepGoing) {
  for (Input candidate : candidates) {
    if (ShouldSkipSelfSignedNonTrustAnchor(trustDomain, candidate)) { <--- Added by bug 1551177, the delay is here
      continue;
    }
    Result rv = checker.Check(candidate, nameConstraintsInputPtr, keepGoing);
    if (rv != Success) {
      return rv;
    }
    if (!keepGoing) {
      return Success;
    }
  }

  return Success;
}

ShouldSkipSelfSignedNonTrustAnchor doesn't appear in the profile trace, so I think it's been inlined. The code is:

static bool ShouldSkipSelfSignedNonTrustAnchor(TrustDomain& trustDomain,
                                               Input certDER) {
  [...]					       
  if (trustDomain.GetCertTrust(EndEntityOrCA::MustBeCA, CertPolicyId::anyPolicy,   <-- This call causes the delay
                               certDER, trust) != Success) {
    return false;
  }
  [...]

Digging further, the profiler says 100% of the time spend in GetCertTrust is accrued by the call to CERT_NewTempCertificate. The code looks like this:

Result NSSCertDBTrustDomain::GetCertTrust(EndEntityOrCA endEntityOrCA,
                                          const CertPolicyId& policy,
                                          Input candidateCertDER,
                                          /*out*/ TrustLevel& trustLevel) {
  // XXX: This would be cleaner and more efficient if we could get the trust
  // information without constructing a CERTCertificate here, but NSS doesn't
  // expose it in any other easy-to-use fashion. The use of
  // CERT_NewTempCertificate to get a CERTCertificate shouldn't be a
  // performance problem because NSS will just find the existing
  // CERTCertificate in its in-memory cache and return it.
  SECItem candidateCertDERSECItem = UnsafeMapInputToSECItem(candidateCertDER);
  UniqueCERTCertificate candidateCert(CERT_NewTempCertificate(                       <-- This call causes the delay
      CERT_GetDefaultCertDB(), &candidateCertDERSECItem, nullptr, false, true));
  if (!candidateCert) {
    return MapPRErrorCodeToResult(PR_GetError());
  }
  [...]

That comment is worrying. I think it's not true.

Inside CERT_NewTempCertificate, the 1.8 seconds splits into 1 second spent in NSSTrustDomain_FindCertificateByEncodedCertificate and 0.8 seconds in stan_GetCERTCertificate.

Following the first of those, which delegates into nssTrustDomain_FindCertificateByIssuerAndSerialNumber, we end in getting to nssToke_FindObjectsByTempate twice, each time this calls RtlEnterCriticalSection (and follows the contended path). The first takes 0.54 seconds to get the critical section, the other 0.04 seconds.

The second path (0.8 s total) also has two child paths of interest. The first starting with nssTrustDomain_FindTrustForCertificate costs 0.38 seconds of which 0.31 s is in RtlEnterCriticalSection from nssToken_FindObjectsByTemplate. The second starts with nssTrust_GetCERTCertTrustForCert and also ends up at nssToken_FindObjectsByTemplate this time spending 0.2 seconds to enter the critical section.

So, of the 1.8 seconds spent in this thread, 1.1 seconds (~60%) is spent trying to acquire the critical section in find_objects called from nssToken_FindObjectsByTemplate.

In addition there are four other paths that end up at RtlEnterCriticalSection which do not get there via find_objects. These account for 0.14 s taking the total critical section deadlock time up to nearer 70% of the 1.8 seconds active time.

The code in nssToken_FindObjectsByTemplate, looks like this:

NSS_IMPLEMENT nssCryptokiObject **
nssToken_FindObjectsByTemplate(
    NSSToken *token,
    nssSession *sessionOpt,
    CK_ATTRIBUTE_PTR obj_template,
    CK_ULONG otsize,
    PRUint32 maximumOpt,
    PRStatus *statusOpt)
{
    [...]
    /* If these objects are being cached, try looking there first */
    if (token->cache &&
        nssTokenObjectCache_HaveObjectClass(token->cache, objclass)) {           <-- Don't see calls to this
        PRStatus status;
        objects = nssTokenObjectCache_FindObjectsByTemplate(token->cache,        <-- Don't see calls to this
                                                            objclass,
                                                            obj_template,
                                                            otsize,
                                                            maximumOpt,
                                                            &status);
        if (status == PR_SUCCESS) {
            if (statusOpt)
                *statusOpt = status;
            return objects;
        }
    }
    /* Either they are not cached, or cache failed; look on token. */
    objects = find_objects(token, sessionOpt,                                    <-- This is the expensive call
                           obj_template, otsize,
                           maximumOpt, statusOpt);
    return objects;
}

So, judging by the comments, the code path we're seeing is taken only if the objects are not cached, but the explanatory comment in GetCertTrust that says it's cheap to call CERT_NewTempCertificate is relying on the object being in cache.

As well as the time spend waiting to enter critical sections, it's worth noting that the thread also sees delays accessing the file system when performing sqlite operations. This may, effectively, be another contention with concurrent database accesses colliding in the file system rather than directly in the process. The file access delays are much smaller than the critical section contention delays, at least on the one thread I've looked at.

Overall, (that is, not just in deadlock contention), of the 1.8 seconds the thread spends trying to do useful work, 1.4 seconds is spent in find_objects.

Comment 9 of bug 1555306 speculates that missing the cache in CERT_NewTempCertificate() could be really expensive as the system performs I/O while holding the lock.

Thanks for looking in to this!
How many enterprise roots/intermediates is Firefox importing for you? To determine this, first you'll need to enable the browser console by setting the preference devtools.chrome.enabled to true in about:config. Then you can open the browser console with ctrl + shift + j. There should be a prompt (>>) where you can run the following:
Cc["@mozilla.org/psm;1"].getService(Ci.nsINSSComponent).getEnterpriseRoots().length
Cc["@mozilla.org/psm;1"].getService(Ci.nsINSSComponent).getEnterpriseIntermediates().length
(if you see a red error that says NS_NOINTERFACE, don't worry about it - it's unrelated)

Flags: needinfo?(steven+mozilla)

On the test profile I've been using:

Cc["@mozilla.org/psm;1"].getService(Ci.nsINSSComponent).getEnterpriseRoots().length
142
Cc["@mozilla.org/psm;1"].getService(Ci.nsINSSComponent).getEnterpriseIntermediates().length
0

142 certificates is somewhat less than the 415 stored on the machine, but it's still a lot, and it's more than the 45 fold increase I'm seeing in cert9.db accesses. I got curious what they were. The certificates are returns as plain byte arrays rather than objects and I don't know how to turn them into objects. However, I can see they contain things that look like ASCII strings. On the assumption the strings contained something identify the certificates, I wrote a bit of terse Javascript to extract the strings and print them.

(I've added linebreaks to show what I'm doing. You'll have to remove them if you want to run it)

console.log(
  Cc["@mozilla.org/psm;1"]
    .getService(Ci.nsINSSComponent)
    .getEnterpriseRoots().map(
      function (x) {
        return x.map(function (x) { return String.fromCharCode(x < 32 || x >= 127 ? 1 : x) })
                .join("")
                .split("\001")
                .filter(function (x) { return x.length > 8 })
                .join("/")
                .replace(/^[^A-Za-z]+/, "")
                .substr(0, 60)
      }).sort().join("\n"))
undefined
AC Camerfirma SA CIF A827432871#0!/http://www.chambersign.or
AC Camerfirma SA CIF A827432871#0!/http://www.chambersign.or
ACEDICOM Root1/080418162422Z/280413162422Z0D1/ACEDICOM Root1
AOL Time Warner Inc.1/America Online Inc.1705/.AOL Time Warn
AOL Time Warner Inc.1/America Online Inc.1705/.AOL Time Warn
AVG Technologies cz1/Engineering1/AVG Technologies0/16082519
AddTrust AB1&0$/AddTrust External TTP Network1"0 /AddTrust E
AddTrust AB1/AddTrust TTP Network1 0/AddTrust Public CA Root
AddTrust AB1/AddTrust TTP Network1!0/AddTrust Class 1 CA Roo
AddTrust AB1/AddTrust TTP Network1#0!/AddTrust Qualified CA 
America Online Inc.1604/-America Online Root Certification A
America Online Inc.1604/-America Online Root Certification A
Baden-Wuerttemberg (BW)1/Stuttgart1)0'/ Deutscher Sparkassen
Baltimore1/CyberTrust1"0 /Baltimore CyberTrust Root0/0005121
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Barcelona1/Barcelona1.0,/%IPS Internet publishing Services s
Budapest1'0%/NetLock Halozatbiztonsagi Kft.1/Tanusitvanykiad
Budapest1/NetLock Kft.1705/k (Certification Services)1503/,N
CNNIC ROOT0/070416070914Z/270416070914Z021/CNNIC ROOT0
Chunghwa Telecom Co., Ltd.1*0(/!ePKI Root Certification Auth
ComSign CA1/040324113218Z/290319150218Z041/ComSign CA1/,http
ComSign Secured CA1/040324113720Z/290316150456Z0<1/ComSign S
Copyright (c) 1997 Microsoft Corp.1/Microsoft Corporation1!0
Copyright (c) 20051"0 /OISTE Foundation Endorsed1(0&/OISTE W
Cybertrust, Inc1/Cybertrust Global Root0/061215080000Z/21121
Dhimyotis1/Certigna0/070629151305Z/270629151305Z041/Dhimyoti
DigiCert Inc1/www.digicert.com1 0/DigiCert Global Root CA0/0
DigiCert Inc1/www.digicert.com1$0"/DigiCert Assured ID Root 
DigiCert Inc1/www.digicert.com1+0)/"DigiCert High Assurance 
Digital Signature Trust Co.1/DST Root CA X30/000930211219Z/2
Entrust, Inc.1907/0www.entrust.net/CPS is incorporated by re
Entrust.net1?0=/6www.entrust.net/SSL_CPS incorp. by ref. (li
Entrust.net1@0>/7www.entrust.net/CPS_2048 incorp. by ref. (l
Entrust.net1@0>/7www.entrust.net/GCCA_CPS incorp. by ref. (l
Entrust.net1H0F/?www.entrust.net/Client_CA_Info/CPS incorp. 
Equifax Secure Inc.1&0$/Equifax Secure eBusiness CA-10/99062
Equifax Secure Inc.1-0+/$Equifax Secure Global eBusiness CA-
GeoTrust Inc.1 0/GeoTrust Universal CA 20/040304050000Z/2903
GeoTrust Inc.1/GeoTrust Global CA0/020521040000Z/22052104000
GeoTrust Inc.1/GeoTrust Universal CA0/040304050000Z/29030405
GeoTrust Inc.110//(GeoTrust Primary Certification Authority0
GeoTrust Inc.1907/0(c) 2007 GeoTrust Inc. - For authorized u
GeoTrust Inc.1907/0(c) 2008 GeoTrust Inc. - For authorized u
GlobalSign Root CA - R21/GlobalSign1/GlobalSign0/06121508000
GlobalSign nv-sa1/GlobalSign Root CA0/980901120000Z/28012812
Government Root Certification Authority0/021205132333Z/32120
Greater Manchester1/COMODO CA Limited1'0%/COMODO Certificati
Greater Manchester1/COMODO CA Limited1+0)/"COMODO ECC Certif
Greater Manchester1/Comodo CA Limited1!0/AAA Certificate Ser
Greater Manchester1/Comodo CA Limited1$0"/Secure Certificate
Greater Manchester1/Comodo CA Limited1%0#/Trusted Certificat
Hongkong Post1 0/Hongkong Post Root CA 10/030515051314Z/2305
IGC/A1#0!/igca@sgdn.pm.gouv.fr0/021213142923Z/201017142922Z0
Japan Certification Services, Inc.1/SecureSign RootCA110/090
Network Solutions L.L.C.100./'Network Solutions Certificate 
Primary Certificate Authority (2009)1/California1/Netflix In
QuoVadis Limited1%0#/Root Certification Authority1.0,/%QuoVa
QuoVadis Limited1/QuoVadis Root CA 20/061124182700Z/31112418
QuoVadis Limited1/QuoVadis Root CA 30/061124191123Z/31112419
Qustodio (35254)0/181228143644Z/210924143644Z0(1/Qustodio (3
Qustodio (37641)0/180412174853Z/210107174853Z0(1/Qustodio (3
Qustodio (39484)0/190206210813Z/211103210813Z0(1/Qustodio (3
Qustodio (41694)0/171012125621Z/200709125621Z0(1/Qustodio (4
Qustodio (43953)0/180320192149Z/201215192149Z0(1/Qustodio (4
Qustodio (46094)0/170223003405Z/191120003405Z0(1/Qustodio (4
Qustodio (63571)0/170222225247Z/191119225247Z0(1/Qustodio (6
Qustodio (64201)0/170222233730Z/191119233730Z0(1/Qustodio (6
Qustodio (64827)0/180722123707Z/210418123707Z0(1/Qustodio (6
Qustodio (65621)0/171209232632Z/200905232632Z0(1/Qustodio (6
Qustodio (77477)0/190301232033Z/211126232033Z0(1/Qustodio (7
Qustodio (77654)0/180521010034Z/210215010034Z0(1/Qustodio (7
Qustodio (79771)0/170529203151Z/200224203151Z0(1/Qustodio (7
Qustodio (9676)0/181109193641Z/210806193641Z0'1/Qustodio (96
Qustodio (98685)0/170614093025Z/200311093025Z0(1/Qustodio (9
RSA Security Inc1/RSA Security 2048 V30/010222203923Z/260222
SECOM Trust Systems CO.,LTD.1*0(/!Security Communication EV 
SECOM Trust.net1'0%/Security Communication RootCA10/03093004
Scottsdale1%0#/Starfield Technologies, Inc.1200/)Starfield R
Scottsdale1/GoDaddy.com, Inc.110//(Go Daddy Root Certificate
SecureTrust Corporation1/Secure Global CA0/061107194228Z/291
SecureTrust Corporation1/SecureTrust CA0/061107193118Z/29123
Sociedad Cameral de Certificaci/n Digital - Certic/mara S.A.
Sonera Class1 CA0/010406104913Z/210406104913Z091/Sonera Clas
Sonera Class2 CA0/010406072940Z/210406072940Z091/Sonera Clas
Sony Computer Entertainment Inc.1/SCEI DNAS Root 010/0407120
Sony Computer Entertainment Inc.1/SCEI DNAS Root 020/0407120
Sony Computer Entertainment Inc.1/SCEI DNAS Root 030/0407120
Sony Computer Entertainment Inc.1/SCEI DNAS Root 040/0407120
Sony Computer Entertainment Inc.1/SCEI DNAS Root 050/0407120
Sony Electronics1/San Diego1/California1/internet.sony.tv Ro
Staat der Nederlanden1+0)/"Staat der Nederlanden Root CA - G
Starfield Technologies, Inc.1200/)Starfield Class 2 Certific
StartCom Ltd.1+0)/"Secure Digital Certificate Signing1)0'/ S
SwissSign AG1!0/SwissSign Silver CA - G20/061025083246Z/3610
SwissSign AG1#0!/SwissSign Platinum CA - G20/061025083600Z/3
SwissSign AG1/SwissSign Gold CA - G20/061025083035Z/36102508
Swisscom1%0#/Digital Certificate Services1/Swisscom Root CA 
TC TrustCenter GmbH1"0 /TC TrustCenter Class 2 CA1%0#/TC Tru
TC TrustCenter GmbH1"0 /TC TrustCenter Class 3 CA1%0#/TC Tru
TC TrustCenter GmbH1$0"/TC TrustCenter Universal CA1&0$/TC T
TDC Internet1/TDC Internet Root CA0/010405163317Z/2104051703
TDC OCES CA0/030211083930Z/370211090930Z011/TDC OCES CA0/#ht
The Go Daddy Group, Inc.110//(Go Daddy Class 2 Certification
Unizeto Sp. z o.o.1/Certum CA0/020611104639Z/270611104639Z0>
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 1999 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 1999 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 1999 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 1999 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 2006 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 2007 VeriSig
VeriSign, Inc.1/VeriSign Trust Network1:08/1(c) 2008 VeriSig
VeriSign, Inc.1705/.Class 1 Public Primary Certification Aut
VeriSign, Inc.1705/.Class 2 Public Primary Certification Aut
VeriSign, Inc.1705/.Class 3 Public Primary Certification Aut
VeriSign, Inc.1<0:/3Class 1 Public Primary Certification Aut
VeriSign, Inc.1<0:/3Class 2 Public Primary Certification Aut
VeriSign, Inc.1<0:/3Class 3 Public Primary Certification Aut
VeriSign, Inc.1<0:/3Class 4 Public Primary Certification Aut
Visa International Service Association1/Visa eCommerce Root0
Washington1/Microsoft Corporation1200/)Microsoft Root Certif
Washington1/Microsoft Corporation1200/)Microsoft Root Certif
Wells Fargo WellsSecure1/Wells Fargo Bank NA1604/-WellsSecur
Wells Fargo1,0*/#Wells Fargo Certification Authority1/0-/&We
Western Cape1/Cape Town1/Thawte Consulting cc1(0&/Certificat
Western Cape1/Cape Town1/Thawte Consulting cc1(0&/Certificat
Western Cape1/Cape Town1/Thawte Consulting1(0&/Certification
Western Cape1/Durbanville1/Thawte Certification1/Thawte Time
beTRUSTed1/beTRUSTed Root CAs1/0-/&beTRUSTed Root CA - RSA I
beTRUSTed1/beTRUSTed Root CAs1301/*beTRUSTed Root CA - Entru
beTRUSTed1/beTRUSTed Root CAs1301/*beTRUSTed Root CA-Baltimo
certSIGN1/certSIGN ROOT CA0/060704172004Z/310704172004Z0;1/c
generated by AVG Antivirus for SSL/TLS scanning1/AVG Web/Mai
me.sel.sony.com ROOT CA1/Sony Electronics1/San Diego1/Califo
microsoft1-0+/$Microsoft Root Certificate Authority0/0105092
thawte, Inc.1(0&/Certification Services Division1806//(c) 20
thawte, Inc.1(0&/Certification Services Division1806//(c) 20
thawte, Inc.1806//(c) 2007 thawte, Inc. - For authorized use
www.xrampsecurity.com1$0"/XRamp Security Services Inc1-0+/$X
Flags: needinfo?(steven+mozilla)
Assignee: nobody → dkeeler
Priority: -- → P1
Regressed by: 1551177
Whiteboard: [psm-assigned]
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

You got me curious about how many certificates this computer has and whether it's normal. According to Manage User Certificates, this machine has 415 certificates in "Trusted Root Certificate Authorities" (equivalent to PowerShell path Cert:\LocalMachine\Root or Cert:\CurrentUser\Root, I get the same number in both) and 227 in "Third-Party Root Certification Authorities" (Cert:...\AuthRoot). That's a lot of trust. I had a look at another machine we have and the numbers are 78 and 45 respectively, which is quite a difference.

I've not gone out of my way to install lots of certificates on this machine. So either they were installed when I got the machine or they've been acquired since. This machine is old: it was bought in 2011 so it's had many updates deployed to it (including the Windows 7 to Windows 10 upgrade). The OEM is Sony so perhaps they install a lot by default. There are multiple users running different browsers.

I checked to see if there were a lot of expired certificates, and although there are some, they're not dominant, and indeed, many are recently expired, which may well be reasonable (I'll note that this morning the total is 416 instead of 415 recorded earlier, it was also 415 last night, so this may just be some short-duration churn):

PS Cert:\LocalMachine\Root\> @(dir | ? {$_.NotAfter -lt (Get-Date)}).Count
83
PS Cert:\LocalMachine\Root\> @(dir | ? {$_.NotAfter -ge (Get-Date)}).Count
333
PS Cert:\LocalMachine\Root\> dir | ? {$_.NotAfter -lt (Get-Date)} | Group-Object {$_.NotAfter.Year} | Sort-Object Name

Count Name                      Group
----- ----                      -----
    2 1999                      {[Subject]...
    3 2004                      {[Subject]...
    1 2006                      {[Subject]...
    1 2010                      {[Subject]...
    2 2011                      {[Subject]...
    1 2013                      {[Subject]...
    6 2015                      {[Subject]...
   10 2016                      {[Subject]...
   11 2017                      {[Subject]...
   10 2018                      {[Subject]...
   36 2019                      {[Subject]...

I also used SysInternals sigcheck to see if there were a lot of certificates that weren't trusted by Microsoft. In total it found 65 but it's not even as bad as that. 18 are local roots installed by AVG or Qustodio so there are only 47 non-standard certificates.

None of the certificates reported by sigcheck -tv are expired so I think it ignores expired certificates (as it reported 15 Qustodio certificates but there are 18 in total, 3 of which are expired).

I have a patch that I think will help - can you try this build: https://queue.taskcluster.net/v1/task/B5aW7SJPTriluKHifB_biA/runs/0/artifacts/public/build/setup.exe (with the same profile you've been using)

Flags: needinfo?(steven+mozilla)

Well that was more hassle than it should have been :-)

The link you gave to setup.exe didn't work. It downloaded a 223 kB file which uninstalled Firefox but then didn't actually install anything. I tried several times, thinking I'd done something wrong but to no avail.

Eventually I switched to https://queue.taskcluster.net/v1/task/B5aW7SJPTriluKHifB_biA/runs/0/artifacts/public/build/install/sea/target.installer.exe (a more plausible 44 MB) which seemed to install fine (except by the time I'd got home this evening, Nightly had already upgraded so your build was treated as a downgrade and I had to remember how to tell it to allow my existing profile to run just in case).

I mention this not because I'm complaining but more as a warning there may have been some finger trouble on my part and a somewhat confused system (I even managed to trigger my antivirus at one point). I found it interesting to learn where these builds were stored.

The build you provided seems to have fixed the problem. I now see no performance difference when enterprise roots are imported compared to when they are excluded. Pages load quickly in both cases. I've also used Process Monitor to check the number of accesses to cert9.db and they seem to be the same both with and without enterprise roots imported.

I cross-checked I hadn't done something funny by using the Cc...getEnterpriseRoots() command you gave earlier and I confirmed my tests were running with 142 certificates imported.

Then, as a further cross-check, I uninstalled your build and reinstalled the latest Nightly and confirmed the problem still existed with that.

So, it all looks good to me.

Flags: needinfo?(steven+mozilla)

Calling CERT_NewTempCertificate on an enterprise certificate is inefficient
because NSS tries (and fails) to find a copy of that certificate in its internal
data structures (which includes querying softoken, which involves hitting the
disk). We can avoid doing so for these certificates in
NSSCertDBTrustDomain::GetCertTrust because we already know what trust values
they should have (after checking the relevant blocklists).

Pushed by dkeeler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2f8f58d37cbc avoid calling CERT_NewTempCertificate in NSSCertDBTrustDomain::GetCertTrust for enterprise certificates r=jcj,kjacobs
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Does this need uplift to beta 70?

Flags: needinfo?(dkeeler)

Beta would be nice. ESR 68 should also get this.

Flags: needinfo?(dkeeler)

Comment on attachment 9092199 [details]
bug 1577944 - avoid calling CERT_NewTempCertificate in NSSCertDBTrustDomain::GetCertTrust for enterprise certificates r?jcj,kjacobs

Beta/Release Uplift Approval Request

  • User impact if declined: For users that enable the enterprise roots feature, certificate verification can be much slower than necessary.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch reorders some code and adds an early-return case where we don't need to keep looking for information, so it should be safe.
  • String changes made/needed:

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is a performance issue.
  • User impact if declined: For users that enable the enterprise roots feature, certificate verification can be much slower than necessary.
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch reorders some code and adds an early-return case where we don't need to keep looking for information, so it should be safe.
  • String or UUID changes made by this patch:
Attachment #9092199 - Flags: approval-mozilla-esr68?
Attachment #9092199 - Flags: approval-mozilla-beta?

As the person who originally reported this, I feel I should say something to confirm my issue is now resolved in Nightly and nothing went wrong during the final promotion.

From the updates on this ticket, I saw this land on Nightly a few days ago. I reset ImportEnterpriseRoots to 1 and I've been keeping an eye on the performance.

I'm pleased to say the problem is completely resolved. Nightly feels light and responsive, just as it always should.

I've also used Process Monitor to confirm that the number of accesses to cert9.db is back under control and it is. The behaviour is similar to before the regression.

Thanks for resolving this.

Comment on attachment 9092199 [details]
bug 1577944 - avoid calling CERT_NewTempCertificate in NSSCertDBTrustDomain::GetCertTrust for enterprise certificates r?jcj,kjacobs

Fix for a P1 perf regression importing Windows certificates, the fix was verified as working on Nightly by the reporter and is covered by test, uplift accepted for 70 beta 9, thanks.

Attachment #9092199 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9092199 [details]
bug 1577944 - avoid calling CERT_NewTempCertificate in NSSCertDBTrustDomain::GetCertTrust for enterprise certificates r?jcj,kjacobs

Fixes a severe perf regression under some circumstances. Approved for 68.2esr.

Attachment #9092199 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Backed out changeset e7b03786bc00 (bug 1577944) for build bustage at security/certverifier/NSSCertDBTrustDomain.cpp

Backout: https://hg.mozilla.org/releases/mozilla-esr68/rev/3d0cfdc3894b01bb3f1ef035c8cb3e6816502d8f

Failure push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr68&selectedJob=268368289&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=86981b66c0ce813515653d57f9e4651981174f07

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=268368289&repo=mozilla-esr68&lineNumber=11790

task 2019-09-25T14:16:32.918Z] 14:16:32 INFO - config/external/icu/i18n/region.o
[task 2019-09-25T14:16:32.918Z] 14:16:32 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/config/external/icu/i18n'
[task 2019-09-25T14:16:33.538Z] 14:16:33 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/security/certverifier'
[task 2019-09-25T14:16:33.539Z] 14:16:33 INFO - /builds/worker/workspace/build/src/clang/bin/clang++ -m32 -o Unified_cpp_certverifier0.o -c -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DDEBUG=1 -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/security/certverifier -I/builds/worker/workspace/build/src/obj-firefox/security/certverifier -I/builds/worker/workspace/build/src/security/ct -I/builds/worker/workspace/build/src/security/manager/ssl -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Qunused-arguments -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fcrash-diagnostics-dir=/builds/worker/artifacts -march=pentium-m -msse -msse2 -mfpmath=sse -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -Os -fno-omit-frame-pointer -funwind-tables -Werror -Wall -Wextra -Wunreachable-code -Wno-unused-parameter -MD -MP -MF .deps/Unified_cpp_certverifier0.o.pp /builds/worker/workspace/build/src/obj-firefox/security/certverifier/Unified_cpp_certverifier0.cpp
[task 2019-09-25T14:16:33.539Z] 14:16:33 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/security/certverifier/Unified_cpp_certverifier0.cpp:29:
[task 2019-09-25T14:16:33.539Z] 14:16:33 ERROR - /builds/worker/workspace/build/src/security/certverifier/NSSCertDBTrustDomain.cpp:1360:3: error: unknown type name 'BackCert'
[task 2019-09-25T14:16:33.539Z] 14:16:33 INFO - BackCert cert(certDER, endEntityOrCA, nullptr);
[task 2019-09-25T14:16:33.539Z] 14:16:33 INFO - ^
[task 2019-09-25T14:16:33.539Z] 14:16:33 INFO - 1 error generated.
[task 2019-09-25T14:16:33.542Z] 14:16:33 INFO - /builds/worker/workspace/build/src/config/rules.mk:826: recipe for target 'Unified_cpp_certverifier0.o' failed
[task 2019-09-25T14:16:33.542Z] 14:16:33 ERROR - make[4]: *** [Unified_cpp_certverifier0.o] Error 1
[task 2019-09-25T14:16:33.543Z] 14:16:33 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/security/certverifier'
[task 2019-09-25T14:16:33.543Z] 14:16:33 INFO - /builds/worker/workspace/build/src/config/recurse.mk:74: recipe for target 'security/certverifier/target' failed
[task 2019-09-25T14:16:33.543Z] 14:16:33 ERROR - make[3]: *** [security/certverifier/target] Error 2
[task 2019-09-25T14:16:33.543Z] 14:16:33 INFO - make[3]: *** Waiting for unfinished jobs....
[task 2019-09-25T14:16:33.551Z] 14:16:33 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/config/external/icu/i18n'
[task 2019-09-25T14:16:33.551Z] 14:16:33 INFO - config/external/icu/i18n/reldatefmt.o

Flags: needinfo?(dkeeler)

Whoops - we actually shouldn't need to uplift this to esr68. My mistake.

Flags: needinfo?(dkeeler)

Ryan - it looks like I can't clear the esr68 flag on attachment 9092199 [details] - would you mind unsetting it? Thanks!

Flags: needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
Attachment #9092199 - Flags: approval-mozilla-esr68+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: