508081 - Sqlite shared database continues to malfunction after we lost and regained access to the database

Reporter

Description

•

15 years ago

This bug is first reported in Chromium issue 15630 (http://crbug.com/15630). Suppose the file system where the NSS sqlite shared database resides goes away and comes back while NSS is already initialized. The softoken returns CKR_DEVICE_ERROR (0x48) when we try to search for certificates because some sqlite function fails with SQLITE_IOERR. The stack trace is: Breakpoint 3, sdb_mapSQLError (type=SDB_CERT, sqlerr=10) at sdb.c:349 349 return CKR_DEVICE_ERROR; (gdb) where #0 sdb_mapSQLError (type=SDB_CERT, sqlerr=10) at sdb.c:349 #1 0xf1bbdcdd in sdb_FindObjects (sdb=0xf1287220, sdbFind=0xf56619c8, object=0xf5694a18, arraySize=5, count=0xf57608e0) at sdb.c:781 #2 0xf1bc23b3 in sftkdb_FindObjects (handle=0xf1266d08, find=0xf56619c8, ids=0xf5694a18, arraySize=5, count=0xf57608e0) at sftkdb.c:1241 #3 0xf1ba9963 in sftk_searchDatabase (handle=0xf1266d08, search=0xf1267be0, pTemplate=0xf5760a84, ulCount=3) at pkcs11.c:4144 #4 0xf1ba9ccf in sftk_searchTokenList (slot=0xf1276f60, search=0xf1267be0, pTemplate=0xf5760a84, ulCount=3, tokenOnly=0xf5760968, isLoggedIn=1) at pkcs11.c:4265 #5 0xf1ba9f0e in NSC_FindObjectsInit (hSession=16777217, pTemplate=0xf5760a84, ulCount=3) at pkcs11.c:4317 #6 0xf77861ff in find_objects (tok=0xf12a45e0, sessionOpt=0xf12a1db8, obj_template=0xf5760a84, otsize=3, maximumOpt=0, statusOpt=0xf5760aec) at devtoken.c:334 #7 0xf7786599 in find_objects_by_template (token=0xf12a45e0, sessionOpt=0xf12a1db8, obj_template=0xf5760a84, otsize=3, maximumOpt=0, statusOpt=0xf5760aec) at devtoken.c:463 #8 0xf7786e59 in nssToken_FindCertificatesBySubject (token=0xf12a45e0, sessionOpt=0xf12a1db8, subject=0xf5760b80, searchType=nssTokenSearchType_TokenOnly, maximumOpt=0, statusOpt=0xf5760aec) at devtoken.c:657 #9 0xf777d95d in nssTrustDomain_FindCertificatesBySubject (td=0xf12a1cd0, subject=0xf5760b80, rvOpt=0x0, maximumOpt=0, arenaOpt=0x0) at trustdomain.c:646 #10 0xf777daa7 in NSSTrustDomain_FindCertificatesBySubject (td=0xf12a1cd0, subject=0xf5760b80, rvOpt=0x0, maximumOpt=0, arenaOpt=0x0) at trustdomain.c:702 #11 0xf77760f2 in CERT_CreateSubjectCertList (certList=0x0, handle=0xf12a1cd0, name=0xf12b02f8, sorttime=1249323300718664, validOnly=1) at stanpcertdb.c:691 #12 0xf782fed7 in pkix_pl_Pk11CertStore_CertQuery (params=0xf126a8b0, pSelected=0xf5760cc0, plContext=0xf19068a8) at pkix_pl_pk11certstore.c:238 #13 0xf78310f7 in pkix_pl_Pk11CertStore_GetCert (store=0xf564b330, selector=0xf564f840, parentVerifyNode=0xf1247b08, pNBIOContext=0xf5760d2c, pCertList=0xf5760d34, plContext=0xf19068a8) at pkix_pl_pk11certstore.c:618 #14 0xf77c72e0 in pkix_Build_GatherCerts (state=0xf190d248, certSelParams=0xf126a8b0, pNBIOContext=0xf5760df0, plContext=0xf19068a8) at pkix_build.c:1807 #15 0xf77c87ed in pkix_BuildForwardDepthFirstSearch (pNBIOContext=0xf5760f5c, state=0xf190d248, pValResult=0xf5760f54, plContext=0xf19068a8) at pkix_build.c:2343 #16 0xf77cd4df in pkix_Build_InitiateBuildChain (procParams=0xf126bcd8, pNBIOContext=0xf5761010, pState=0xf5761018, pBuildResult=0xf5761014, pVerifyNode=0xf5761078, plContext=0xf19068a8) at pkix_build.c:3551 #17 0xf77ce1d6 in PKIX_BuildChain (procParams=0xf126bcd8, pNBIOContext=0xf576108c, pState=0xf5761088, pBuildResult=0xf5761090, pVerifyNode=0xf5761078, plContext=0xf19068a8) at pkix_build.c:3719 #18 0xf7728fea in CERT_PKIXVerifyCert (cert=0xbd07188, usages=2, paramsIn=0xf57611d8, paramsOut=0xf5761190, wincx=0x0) at certvfypkix.c:2148 [Rest of stack trace omitted for brevity] The sqlite function that failed is the sqlite3_step call in sdb_FindObjects. That sqlite3_step call returns SQLITE_IOERR (10 = 0x0A). That error seems to come from this failure: #0 unixFileSize (id=0xf1277d60, pSize=0xf5781010) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/os_unix.c:1110 #1 0x09465419 in sqlite3OsFileSize (id=0xf1277d60, pSize=0xf5781010) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/os.c:80 #2 0x0946a162 in sqlite3PagerPagecount (pPager=0xf1277c78, pnPage=0xf5781058) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/pager.c:2548 #3 0x0946b66b in sqlite3PagerAcquire2 (pPager=0xf1277c78, pgno=1, ppPage=0xf57810dc, noContent=0, pDataToFill=0x0) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/pager.c:3848 #4 0x0946b537 in pagerAcquire (pPager=0xf1277c78, pgno=1, ppPage=0xf57810dc, noContent=0) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/pager.c:3780 #5 0x0946b889 in sqlite3PagerAcquire (pPager=0xf1277c78, pgno=1, ppPage=0xf57810dc, noContent=0) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/pager.c:3918 #6 0x094a5cf6 in sqlite3BtreeGetPage (pBt=0xf12777e8, pgno=1, ppPage=0xf5781120, noContent=0) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/btree.c:1082 #7 0x094a6ad1 in lockBtree (pBt=0xf12777e8) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/btree.c:1715 #8 0x094a7097 in sqlite3BtreeBeginTrans (p=0xf12777c0, wrflag=0) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/btree.c:1983 #9 0x094ba1d3 in sqlite3VdbeExec (p=0xf091cce8) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/vdbe.c:2479 #10 0x0947f293 in sqlite3Step (p=0xf091cce8) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/vdbeapi.c:476 #11 0x0947f4d9 in sqlite3_step (pStmt=0xf091cce8) at /usr/local/google/home/wtc/chrome1/src/third_party/sqlite/src/vdbeapi.c:540 #12 0xf1bbdc4f in sdb_FindObjects (sdb=0xf1287220, sdbFind=0xf091aea0, object=0xf091ade0, arraySize=5, count=0xf57818e0) at sdb.c:762 #13 0xf1bc23b3 in sftkdb_FindObjects (handle=0xf1266d08, find=0xf091aea0, ids=0xf091ade0, arraySize=5, count=0xf57818e0) at sftkdb.c:1241 #14 0xf1ba9963 in sftk_searchDatabase (handle=0xf1266d08, search=0xf091b250, pTemplate=0xf5781a84, ulCount=3) at pkcs11.c:4144 #15 0xf1ba9ccf in sftk_searchTokenList (slot=0xf1276f60, search=0xf091b250, pTemplate=0xf5781a84, ulCount=3, tokenOnly=0xf5781968, isLoggedIn=1) at pkcs11.c:4265 #16 0xf1ba9f0e in NSC_FindObjectsInit (hSession=16777217, pTemplate=0xf5781a84, ulCount=3) at pkcs11.c:4317 #17 0xf77861ff in find_objects (tok=0xf12a45e0, sessionOpt=0xf12a1db8, obj_template=0xf5781a84, otsize=3, maximumOpt=0, statusOpt=0xf5781aec) at devtoken.c:334 #18 0xf7786599 in find_objects_by_template (token=0xf12a45e0, sessionOpt=0xf12a1db8, obj_template=0xf5781a84, otsize=3, maximumOpt=0, statusOpt=0xf5781aec) at devtoken.c:463 #19 0xf7786e59 in nssToken_FindCertificatesBySubject (token=0xf12a45e0, sessionOpt=0xf12a1db8, subject=0xf5781b80, searchType=nssTokenSearchType_TokenOnly, maximumOpt=0, statusOpt=0xf5781aec) at devtoken.c:657 #20 0xf777d95d in nssTrustDomain_FindCertificatesBySubject (td=0xf12a1cd0, subject=0xf5781b80, rvOpt=0x0, maximumOpt=0, arenaOpt=0x0) at trustdomain.c:646 #21 0xf777daa7 in NSSTrustDomain_FindCertificatesBySubject (td=0xf12a1cd0, subject=0xf5781b80, rvOpt=0x0, maximumOpt=0, arenaOpt=0x0) at trustdomain.c:702 #22 0xf77760f2 in CERT_CreateSubjectCertList (certList=0x0, handle=0xf12a1cd0, name=0xf12b02f8, sorttime=1249325704833941, validOnly=1) at stanpcertdb.c:691 #23 0xf782fed7 in pkix_pl_Pk11CertStore_CertQuery (params=0xf091b008, pSelected=0xf5781cc0, plContext=0xf091e868) at pkix_pl_pk11certstore.c:238 #24 0xf78310f7 in pkix_pl_Pk11CertStore_GetCert (store=0xf091b5a8, selector=0xf091af00, parentVerifyNode=0xf091ad68, pNBIOContext=0xf5781d2c, pCertList=0xf5781d34, plContext=0xf091e868) at pkix_pl_pk11certstore.c:618 #25 0xf77c72e0 in pkix_Build_GatherCerts (state=0xf091ac18, certSelParams=0xf091b008, pNBIOContext=0xf5781df0, plContext=0xf091e868) at pkix_build.c:1807 #26 0xf77c87ed in pkix_BuildForwardDepthFirstSearch (pNBIOContext=0xf5781f5c, state=0xf091ac18, pValResult=0xf5781f54, plContext=0xf091e868) at pkix_build.c:2343 #27 0xf77cd4df in pkix_Build_InitiateBuildChain (procParams=0xf091b418, pNBIOContext=0xf5782010, pState=0xf5782018, pBuildResult=0xf5782014, pVerifyNode=0xf5782078, plContext=0xf091e868) at pkix_build.c:3551 #28 0xf77ce1d6 in PKIX_BuildChain (procParams=0xf091b418, pNBIOContext=0xf578208c, pState=0xf5782088, pBuildResult=0xf5782090, pVerifyNode=0xf5782078, plContext=0xf091e868) at pkix_build.c:3719 #29 0xf7728fea in CERT_PKIXVerifyCert (cert=0xbd07188, usages=2, paramsIn=0xf57821d8, paramsOut=0xf5782190, wincx=0x0) at certvfypkix.c:2148 [Rest of stack trace omitted for brevity] The fstat() call in unixFileSize fails (returns -1) with errno 107 (ENOTCONN). Note that Chromium uses its own copy of the sqlite library. I'm not sure if that matters to this bug.

Wan-Teh Chang

Reporter

Comment 1

•

15 years ago

I just verified that the old DBM database doesn't have this bug. So this bug can be considered a regression by products that switch from DBM to sqlite. Bob, I have the setup to reproduce and debug this bug. You are welcome to come to my office to debug this.

Version: 3.11.14 → 3.12

Robert Relyea

Comment 2

•

15 years ago

Question: Does old dbm lack the bug because old dbm does not return an error, returns a different error, or the error is handled differently? My first thought is that the DBM sematics were wrong (they should have returned an error if the underlying file went way), but the upper level code should have been more tolerant of underlying failures in softoken itself. NOTE: Has a couple of issues.... Say you have a database on one of these flakey file system. You delete a certificate from the builtins (which puts a 'delete' record in the old database). If your file system goes away, does dbm suddenly 'see' that new certificate? Is this the failure mode we want? bob

Nelson Bolyard (seldom reads bugmail)

Comment 3

•

15 years ago

DBM always held the file open the whole time. IINM, sqlite3 closes and opens it between (some) accesses.

Wan-Teh Chang

Reporter

Comment 4

•

15 years ago

Bob, the old DBM database successfully returns the root CA cert I added manually after it loses and regains access to the cert database.

Wan-Teh Chang

Reporter

Comment 5

•

15 years ago

I am running into this bug again. Chromium copies PSM's algorithm of storing intermediate CA certs in the NSS cert database. This causes the new temp certs that libSSL creates for the intermediate CA certs to become "perm" certs: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certdb/stanpcertdb.c&rev=1.84&mark=378-380,398#362 When we lose access to ~/.pki/nssdb, these intermediate CA certs become undiscoverable if we use the sql db, even after we regain access to ~/.pki/nssdb. On the other hand, if we use the dbm db, the softoken's db slot continues to return these intermediate CA certs even when we lose access to ~/.pki/nssdb. Re: Nelson's comment 3: I found that we open the sql or dbm databases at NSS initialization, and do not close them until NSS shutdown. However, the dbm code reads from the file only once, and can satisfy subsequent C_FindObjects calls from memory, so the dbm code is oblivious to the fact that the filesystem is gone. I believe this is why the dbm database handles this condition "better" then the sql database. As Bob noted, dbm's behavior is less correct, but from a user's point of view, dbm's behavior makes the cert chains validate, and is therefore more desirable. I don't know the softoken code well enough to propose a fix. I think it'll require reopening the db after detecing an error with the current db handle. This bug is a serious problem for Linux Chromium users at Google, so I am afraid that I will have to not add the intermediate CA certs to the cert db.

Wan-Teh Chang

Reporter

Comment 6

•

15 years ago

I forgot to add: I will not give up the sql db yet.

Wan-Teh Chang

Reporter

Comment 7

•

15 years ago

One obvious "workaround" I tried was to remove the code in CERT_NewTempCertificate that simply returns the perm cert if the cert is already in the cert db: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certdb/stanpcertdb.c&rev=1.84&mark=378-380,398#362 Index: mozilla/security/nss/lib/certdb/stanpcertdb.c =================================================================== RCS file: /cvsroot/mozilla/security/nss/lib/certdb/stanpcertdb.c,v retrieving revision 1.84 diff -u -u -r1.84 stanpcertdb.c --- mozilla/security/nss/lib/certdb/stanpcertdb.c 29 May 2009 19:16:54 -0000 1.84 +++ mozilla/security/nss/lib/certdb/stanpcertdb.c 11 Sep 2009 05:16:26 -0000 @@ -374,11 +374,13 @@ /* First, see if it is already a temp cert */ c = NSSCryptoContext_FindCertificateByEncodedCertificate(gCC, &encoding); +#if 0 if (!c) { /* Then, see if it is already a perm cert */ c = NSSTrustDomain_FindCertificateByEncodedCertificate(handle, &encoding); } +#endif if (c) { /* actually, that search ends up going by issuer/serial, * so it is still possible to return a cert with the same However, when I did this, the certificate chain building code in libpkix always get into a loop. I tried several changes to eliminate the duplicate temp and perm certs, but couldn't get rid of the cert loop problem. I also tried opening two cert db's containing the same intermediate CA certs, but libpkix did not get a cert loop. So it seems that having two perm certs, from two tokens, of the same intermediate CA is fine.

Robert Relyea

Comment 8

•

15 years ago

This whole thing sounds like an sql lite issue, particularly if the failure is persistent. bob

certutil patch for testing 12 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
potential fix (v6) 12 years ago Kai Engert (:KaiE:) (deleted), patch	KaiE : review? wtc	Details \| Diff \| Splinter Review
certutil patch for testing (v6) 12 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
vfychain patch for testing 12 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
unixfilesize patch used for tracing 12 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review