Closed
Bug 435496
Opened 16 years ago
Closed 6 years ago
crash [@ ocsp_CacheKeyHashFunction]
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: MatsPalmgren_bugz, Unassigned)
References
Details
(Keywords: crash)
Crash Data
"Top Crashers for Firefox 3.0" currently has this crash at #21.
A few crash IDs:
bp-c64537fd-266d-11dd-a520-001cc45a2c28
bp-db190d47-25f2-11dd-9d85-001cc45a2ce4
bp-73c6bb97-25a4-11dd-84b4-0013211cbf8a
A couple of example stacks:
ocsp_CacheKeyHashFunction mozilla/security/nss/lib/certhigh/ocsp.c:300
PL_HashTableRemove mozilla/nsprpub/lib/ds/plhash.c:374
nss3.dll@0xa101f
nsNSSComponent::DoProfileBeforeChange mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388
nsNSSComponent::Observe mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972
nsObserverService::NotifyObservers mozilla/xpcom/ds/nsObserverService.cpp:181
xul.dll@0x7b88d3
xul.dll@0x7b89a3
==================================
ocsp_CacheKeyHashFunction
PL_HashTableRemove mozilla/nsprpub/lib/ds/plhash.c:374
ocsp_RemoveCacheItem
CERT_ClearOCSPCache
OCSP_ShutdownGlobal
NSS_Shutdown
nsNSSComponent::ShutdownNSS mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1690
nsNSSComponent::DoProfileBeforeChange mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388
nsNSSComponent::Observe mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972
nsObserverList::NotifyObservers mozilla/xpcom/ds/nsObserverList.cpp:128
nsObserverService::NotifyObservers mozilla/xpcom/ds/nsObserverService.cpp:181
nsXREDirProvider::DoShutdown mozilla/toolkit/xre/nsXREDirProvider.cpp:853
ScopedXPCOMStartup::~ScopedXPCOMStartup mozilla/toolkit/xre/nsAppRunner.cpp:905
XRE_main mozilla/toolkit/xre/nsAppRunner.cpp:3211
main mozilla/browser/app/nsBrowserApp.cpp:158
Comment 1•16 years ago
|
||
Odds are very good that this bug has the same cause as bug 433594, and
that the fix for bug 433386 will also cure this crash, but to be safe,
we shuold also insert a NULL pointer check into ocsp_CacheKeyHashFunction.
As an aside, I really don't understand why PL_HashTableRemove is calling
ocsp_CacheKeyHashFunction here. That seems wrong. I want to understand
that before declaring that a NULL check in ocsp_CacheKeyHashFunction is
the right fix.
Comment 2•16 years ago
|
||
After looking at this further, I no longer think that avoiding a NULL ptr
dereference in ocsp_CacheKeyHashFunction will solve any problem. The
problem is probably not a NULL ptr but rather a stale ptr.
This hash function returns a value that is used to find a "hash bucket"
that is the head of a chain of objects of this type. All return values
from this hash function are valid. There is no value that this function
can return that means "not found" or "no valid hash for this object".
Obviously this function is being called with a stale certID handle,
which means that a certID cache entry contains a pointer to a stale
certID. A certID handle has been put into the cache, and then that
certID has been destroyed.
The only real fix to this problem is to avoid calling PL_HashTableRemove
with a stale certID handle/ptr. It wouldn't hurt to enclose the call to
PL_HashTableRemove inside ocsp_RemoveCacheItem with a NULL test
(item->certID != NULL) but that won't detect a stale cache entry.
Since there's no good way to detect that a certID handle is stale, we must prevent cache entries from becoming stale.
I think bug 433386 is one known cause of that problem.
Kai, can you confirm that?
If we knew how to cause this particular crash, and could reproduce it,
we could test whether the fix for bug 433386 fixes this crash too, or not.
Comment 3•16 years ago
|
||
I think I cc'ed the wrong kai. :(
Reporter | ||
Comment 4•16 years ago
|
||
(In reply to comment #2)
> If we knew how to cause this particular crash, and could reproduce it,
Most of the crash report comments mentioned "startup" or "quit".
Comment 5•16 years ago
|
||
Yes, the second stack trace above clearly happens during shutdown.
Some code path has apparently destroyed an object to which the cache
still holds a pointer. The question is: what series of operations
(e.g. a visit to what URL) leaves the ocsp certID cache in that state.
I think the solution to this may involve making the certID object be
a reference counted object. The present strategy, where a certID
object can only have one "owner" who holds the sole reference to the
object, seems doomed. I can see that it would work when creating
the certID object, and putting it into the cache. Once the reference
is put into the cache, the caller no longer needs to hold its copy.
But when find the certID in the cache, you clearly want to have two
references to it, one that remains in the cache, and another than is
returned to the caller of the cache lookup. I think the present design
just doesn't handle that important and most common case at all well.
But I could be wrong.
Comment 6•16 years ago
|
||
This bug and bug 428746 probably have the same cause.
Comment 7•16 years ago
|
||
I wonder if this bug is a side effect of the cause of bug 433594, which is
now fixed.
Updated•16 years ago
|
Updated•16 years ago
|
Priority: P2 → P1
Comment 8•16 years ago
|
||
Having heard nothing new about this bug in 9 months, I gather it is no
longer an issue. If it is, we need new evidence.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 9•16 years ago
|
||
There are 215 reported Firefox crashes in the past 4 weeks with this signature.
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=ocsp_CacheKeyHashFunction&date=&range_value=4&range_unit=weeks&do_query=1&signature=ocsp_CacheKeyHashFunction
All except two are with Firefox 3.0 rc1 or earlier. The exceptions are:
Firefox 3.0 rc2 on Windows XP:
bp-07225492-9907-4987-a3fd-92d552090326
Firefox 3.0.8 on MacOSX 10.4.11:
bp-7af8a2a1-37fa-4287-871e-920222090401
So it seems bug 433594 made this crash much less likely to occur,
but there is evidence it still occurs with that bug fixed.
Status: RESOLVED → REOPENED
Keywords: topcrash
Priority: P1 → --
Resolution: WORKSFORME → ---
Whiteboard: possibly fixed in rc2 by bug 433386
Comment 10•16 years ago
|
||
Mats, Thanks for that additional info. I propose that, henceforth, the scope
of this bug will be about the crash in Firefox 3.0.8 on MacOSX 10.4.11:
bp-7af8a2a1-37fa-4287-871e-920222090401
http://crash-stats.mozilla.com/report/index/7af8a2a1-37fa-4287-871e-920222090401
Reporter | ||
Comment 11•15 years ago
|
||
FWIW, it also occurs in Firefox 3.5 on Windows XP:
bp-13386624-3ff5-45d4-94df-df29d2090625
Updated•15 years ago
|
Assignee: nobody → julien.pierre.boogz
Priority: -- → P1
Target Milestone: 3.12.1 → 3.12.5
Version: unspecified → 3.12
Comment 12•15 years ago
|
||
Mats,
Thanks for the report in comment 11 . That one has a stack with the full debug info.
From it, I can tell that the crash is due to an invalid issuerNameHash in a CERTOCSPCertID in the OCSP cache. Either len is incorrect, or the data pointer is NULL.
I will check what can cause that situation to occur.
Comment 13•15 years ago
|
||
The only place where issuerNameHash is supposed to be set is in ocsp_CreateCertID, by a call to cert_GetSubjectNameDigest, which in turns calls ocsp_DigestValue .
Looking at all the error paths, I can't find any which would result in an invalid issuerNameHash . There are proper size checks in ocsp_DigestValue, such as the function would fail if the fill buffer was too small. This would in turn cause ocsp_CertID to fail.
My preliminary condition is thus that the OCSPCertID was somehow corrupt some time after it was created initially successfully in a valid state. It would be very helpful to dump the structure from the crash logs as is possible with a core file, but I don't know if this capability is available.
Updated•15 years ago
|
Target Milestone: 3.12.5 → ---
Comment 14•14 years ago
|
||
Bugs that are currently assigned to Julien => assigning to nobody.
Search for 20100628-kaie-jp
Assignee: bugzilla+nospam → nobody
Updated•13 years ago
|
Crash Signature: [@ ocsp_CacheKeyHashFunction]
Comment 15•11 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #9)
> There are 215 reported Firefox crashes in the past 4 weeks with this
> signature.
way rare these days - on average only a couple per version per month
attempting to contact user of bp-4a45a655-cdf3-42c1-8534-0632b2130311
Comment 16•6 years ago
|
||
Closing because no crash reported since 12 weeks.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•