Closed Bug 435496 Opened 16 years ago Closed 6 years ago

crash [@ ocsp_CacheKeyHashFunction]

Categories

(NSS :: Libraries, defect, P1)

3.12
x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: MatsPalmgren_bugz, Unassigned)

References

Details

(Keywords: crash)

Crash Data

"Top Crashers for Firefox 3.0" currently has this crash at #21. A few crash IDs: bp-c64537fd-266d-11dd-a520-001cc45a2c28 bp-db190d47-25f2-11dd-9d85-001cc45a2ce4 bp-73c6bb97-25a4-11dd-84b4-0013211cbf8a A couple of example stacks: ocsp_CacheKeyHashFunction mozilla/security/nss/lib/certhigh/ocsp.c:300 PL_HashTableRemove mozilla/nsprpub/lib/ds/plhash.c:374 nss3.dll@0xa101f nsNSSComponent::DoProfileBeforeChange mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388 nsNSSComponent::Observe mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972 nsObserverService::NotifyObservers mozilla/xpcom/ds/nsObserverService.cpp:181 xul.dll@0x7b88d3 xul.dll@0x7b89a3 ================================== ocsp_CacheKeyHashFunction PL_HashTableRemove mozilla/nsprpub/lib/ds/plhash.c:374 ocsp_RemoveCacheItem CERT_ClearOCSPCache OCSP_ShutdownGlobal NSS_Shutdown nsNSSComponent::ShutdownNSS mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1690 nsNSSComponent::DoProfileBeforeChange mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388 nsNSSComponent::Observe mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972 nsObserverList::NotifyObservers mozilla/xpcom/ds/nsObserverList.cpp:128 nsObserverService::NotifyObservers mozilla/xpcom/ds/nsObserverService.cpp:181 nsXREDirProvider::DoShutdown mozilla/toolkit/xre/nsXREDirProvider.cpp:853 ScopedXPCOMStartup::~ScopedXPCOMStartup mozilla/toolkit/xre/nsAppRunner.cpp:905 XRE_main mozilla/toolkit/xre/nsAppRunner.cpp:3211 main mozilla/browser/app/nsBrowserApp.cpp:158
Odds are very good that this bug has the same cause as bug 433594, and that the fix for bug 433386 will also cure this crash, but to be safe, we shuold also insert a NULL pointer check into ocsp_CacheKeyHashFunction. As an aside, I really don't understand why PL_HashTableRemove is calling ocsp_CacheKeyHashFunction here. That seems wrong. I want to understand that before declaring that a NULL check in ocsp_CacheKeyHashFunction is the right fix.
After looking at this further, I no longer think that avoiding a NULL ptr dereference in ocsp_CacheKeyHashFunction will solve any problem. The problem is probably not a NULL ptr but rather a stale ptr. This hash function returns a value that is used to find a "hash bucket" that is the head of a chain of objects of this type. All return values from this hash function are valid. There is no value that this function can return that means "not found" or "no valid hash for this object". Obviously this function is being called with a stale certID handle, which means that a certID cache entry contains a pointer to a stale certID. A certID handle has been put into the cache, and then that certID has been destroyed. The only real fix to this problem is to avoid calling PL_HashTableRemove with a stale certID handle/ptr. It wouldn't hurt to enclose the call to PL_HashTableRemove inside ocsp_RemoveCacheItem with a NULL test (item->certID != NULL) but that won't detect a stale cache entry. Since there's no good way to detect that a certID handle is stale, we must prevent cache entries from becoming stale. I think bug 433386 is one known cause of that problem. Kai, can you confirm that? If we knew how to cause this particular crash, and could reproduce it, we could test whether the fix for bug 433386 fixes this crash too, or not.
I think I cc'ed the wrong kai. :(
(In reply to comment #2) > If we knew how to cause this particular crash, and could reproduce it, Most of the crash report comments mentioned "startup" or "quit".
Yes, the second stack trace above clearly happens during shutdown. Some code path has apparently destroyed an object to which the cache still holds a pointer. The question is: what series of operations (e.g. a visit to what URL) leaves the ocsp certID cache in that state. I think the solution to this may involve making the certID object be a reference counted object. The present strategy, where a certID object can only have one "owner" who holds the sole reference to the object, seems doomed. I can see that it would work when creating the certID object, and putting it into the cache. Once the reference is put into the cache, the caller no longer needs to hold its copy. But when find the certID in the cache, you clearly want to have two references to it, one that remains in the cache, and another than is returned to the caller of the cache lookup. I think the present design just doesn't handle that important and most common case at all well. But I could be wrong.
This bug and bug 428746 probably have the same cause.
Keywords: topcrash
Whiteboard: possibly fixed in rc2 by bug 433386
I wonder if this bug is a side effect of the cause of bug 433594, which is now fixed.
Priority: -- → P2
Target Milestone: --- → 3.12.1
Priority: P2 → P1
Having heard nothing new about this bug in 9 months, I gather it is no longer an issue. If it is, we need new evidence.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
There are 215 reported Firefox crashes in the past 4 weeks with this signature. http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=ocsp_CacheKeyHashFunction&date=&range_value=4&range_unit=weeks&do_query=1&signature=ocsp_CacheKeyHashFunction All except two are with Firefox 3.0 rc1 or earlier. The exceptions are: Firefox 3.0 rc2 on Windows XP: bp-07225492-9907-4987-a3fd-92d552090326 Firefox 3.0.8 on MacOSX 10.4.11: bp-7af8a2a1-37fa-4287-871e-920222090401 So it seems bug 433594 made this crash much less likely to occur, but there is evidence it still occurs with that bug fixed.
Status: RESOLVED → REOPENED
Keywords: topcrash
Priority: P1 → --
Resolution: WORKSFORME → ---
Whiteboard: possibly fixed in rc2 by bug 433386
Mats, Thanks for that additional info. I propose that, henceforth, the scope of this bug will be about the crash in Firefox 3.0.8 on MacOSX 10.4.11: bp-7af8a2a1-37fa-4287-871e-920222090401 http://crash-stats.mozilla.com/report/index/7af8a2a1-37fa-4287-871e-920222090401
FWIW, it also occurs in Firefox 3.5 on Windows XP: bp-13386624-3ff5-45d4-94df-df29d2090625
Assignee: nobody → julien.pierre.boogz
Priority: -- → P1
Target Milestone: 3.12.1 → 3.12.5
Version: unspecified → 3.12
Mats, Thanks for the report in comment 11 . That one has a stack with the full debug info. From it, I can tell that the crash is due to an invalid issuerNameHash in a CERTOCSPCertID in the OCSP cache. Either len is incorrect, or the data pointer is NULL. I will check what can cause that situation to occur.
The only place where issuerNameHash is supposed to be set is in ocsp_CreateCertID, by a call to cert_GetSubjectNameDigest, which in turns calls ocsp_DigestValue . Looking at all the error paths, I can't find any which would result in an invalid issuerNameHash . There are proper size checks in ocsp_DigestValue, such as the function would fail if the fill buffer was too small. This would in turn cause ocsp_CertID to fail. My preliminary condition is thus that the OCSPCertID was somehow corrupt some time after it was created initially successfully in a valid state. It would be very helpful to dump the structure from the crash logs as is possible with a core file, but I don't know if this capability is available.
Target Milestone: 3.12.5 → ---
Bugs that are currently assigned to Julien => assigning to nobody. Search for 20100628-kaie-jp
Assignee: bugzilla+nospam → nobody
Crash Signature: [@ ocsp_CacheKeyHashFunction]
(In reply to Mats Palmgren (:mats) from comment #9) > There are 215 reported Firefox crashes in the past 4 weeks with this > signature. way rare these days - on average only a couple per version per month attempting to contact user of bp-4a45a655-cdf3-42c1-8534-0632b2130311
Closing because no crash reported since 12 weeks.
Status: REOPENED → RESOLVED
Closed: 16 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.