Main thread IO in the parent process in nsNSSComponent::HasUserCertsInstalled
Categories
(Core :: Networking, defect, P3)
Tracking
()
People
(Reporter: mstange, Assigned: valentin)
References
Details
(Keywords: perf, perf:responsiveness, regression, Whiteboard: [necko-triaged])
Attachments
(2 files)
Updated•7 years ago
|
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
Updated•7 years ago
|
Comment hidden (mozreview-request) |
Comment 4•7 years ago
|
||
mozreview-review |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 7•7 years ago
|
||
mozreview-review |
Comment 8•7 years ago
|
||
mozreview-review-reply |
Comment 9•7 years ago
|
||
mozreview-review |
Assignee | ||
Comment 10•7 years ago
|
||
mozreview-review-reply |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 12•7 years ago
|
||
Updated•7 years ago
|
Assignee | ||
Comment 14•6 years ago
|
||
Comment hidden (mozreview-request) |
Reporter | ||
Updated•6 years ago
|
Assignee | ||
Comment 16•6 years ago
|
||
Assignee | ||
Comment 17•6 years ago
|
||
Assignee | ||
Comment 18•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 20•6 years ago
|
||
Here's another profile where this consumes about 300ms on start-up: https://perfht.ml/2VLGcTa
Updated•6 years ago
|
Assignee | ||
Comment 21•6 years ago
|
||
Assignee | ||
Comment 22•6 years ago
|
||
Assignee | ||
Comment 23•6 years ago
|
||
Comment 24•6 years ago
|
||
Here is a profile of nsNSSComponent::HasUserCertsInstalled with main thread I/O markers: https://perfht.ml/2CX20Tc
I'm surprised we need to do that many read and stat calls. The stat calls are almost all on cert9.db-wal and cert9.db-journal. Would it be possible to have a copy of this file in memory to access it quickly without doing so much disk I/O?
See also bug 1478148. While it would be beneficial to improve that implementation, there are significant architectural constraints that make it difficult.
Assignee | ||
Comment 26•6 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #23)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=204a7b073ff84b30cca8768fcc0a46f9c8d0d7ac
In my latest try patch it seems we no longer hang without a profile like before, but sometimes we get:
MOZ_CRASH(NSS_Shutdown failed)
I'm not really sure why that would happen. Could just holding the nssComponent mutex cause NSS shutdown to fail?
Updated•6 years ago
|
My guess is it has something to do with that CanEnableSpeculativeConnect
doesn't have an owning reference to the nsINSSComponent
- there's nothing keeping that alive, from what I can tell, although the documentation says the thread will be joined at xpcom shutdown, which should happen strictly before trying to call NSS_Shutdown, so I don't really see how that would prevent NSS from shutting down successfully (I would just expect some uaf crash). It could be that that code isn't threadsafe in NSS and we're merely exercising an existing bug. I've tried to narrow it down to either checking for smartcards or checking for client certs, but only checking for one or the other doesn't seem to result in failures (although maybe I just haven't tried enough retriggers). I'll keep looking, but I don't see anything in this patch that would cause a new NSS shutdown failure (the most common cause of these is not releasing some NSS resource).
Assignee | ||
Comment 28•6 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #27)
My guess is it has something to do with that
CanEnableSpeculativeConnect
doesn't have an owning reference to thensINSSComponent
- there's nothing keeping that alive, from what I can tell, although the documentation says the thread will be joined at xpcom shutdown, which should happen strictly before trying to call NSS_Shutdown, so I don't really see how that would prevent NSS from shutting down successfully (I would just expect some uaf crash). It could be that that code isn't threadsafe in NSS and we're merely exercising an existing bug. I've tried to narrow it down to either checking for smartcards or checking for client certs, but only checking for one or the other doesn't seem to result in failures (although maybe I just haven't tried enough retriggers). I'll keep looking, but I don't see anything in this patch that would cause a new NSS shutdown failure (the most common cause of these is not releasing some NSS resource).
The lambda that we pass to the thread holds a ref to the nsINSSComponent - it may not even run by the time we shutdown. That could be the cause.
Looks like I just needed to run more tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=91bdbc705eddb6fd6270b8f9004764049bc6895a indicates there may be a leak in CERT_FindUserCertsByUsage (maybe caused by a race?). Another thing I noticed is that NSS_IsInitialized() isn't protected by any kind of synchronization - we essentially race on the variable that backs it, so if nsINSSComponent is being held alive after xpcom shutdown somehow, then that check is essentially useless (but again I don't see how this could be happening because we've supposedly joined all threads at that point).
Stepping back, this mechanism is a bit fragile, as we've seen. Would it be possible to re-work things slightly so that when we're speculatively connecting, if the server asks for a client certificate, we just close the connection? (NSS doesn't directly expose a way to do this, I think, so we would have to essentially make a note that this happened in nsNSSSocketInfo that necko checks and then manually closes the connection.)
Assignee | ||
Comment 30•6 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #29)
Looks like I just needed to run more tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=91bdbc705eddb6fd6270b8f9004764049bc6895a indicates there may be a leak in CERT_FindUserCertsByUsage (maybe caused by a race?). Another thing I noticed is that NSS_IsInitialized() isn't protected by any kind of synchronization - we essentially race on the variable that backs it, so if nsINSSComponent is being held alive after xpcom shutdown somehow, then that check is essentially useless (but again I don't see how this could be happening because we've supposedly joined all threads at that point).
Is CERT_FindUserCertsByUsage supposed to be thread-safe? Because this happens in all of my try runs, so it might be very racy.
Stepping back, this mechanism is a bit fragile, as we've seen. Would it be possible to re-work things slightly so that when we're speculatively connecting, if the server asks for a client certificate, we just close the connection? (NSS doesn't directly expose a way to do this, I think, so we would have to essentially make a note that this happened in nsNSSSocketInfo that necko checks and then manually closes the connection.)
Would that actually work? Looking at bug 910207, it seems like something like this was suggested before, but it didn't work.
I think the problem was that by the time we actually connect, that speculative connection might already be used by a HTTP request.
It looks like what was missing from that original implementation was the part where necko says "ok, the server wanted a client certificate so we killed the speculative connection, and now that the user has actually gone to this site, we need to connect again and allow the client certificate dialog".
Assignee | ||
Comment 32•6 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #31)
It looks like what was missing from that original implementation was the part where necko says "ok, the server wanted a client certificate so we killed the speculative connection, and now that the user has actually gone to this site, we need to connect again and allow the client certificate dialog".
I think if we had a way to pass a callback into NSS that can be used to choose if to show the cert chooser or fail the connection that would be great. I suspect that's not as easy as it sounds :)
Updated•6 years ago
|
Assignee | ||
Comment 33•6 years ago
|
||
Dana, do you think the patch could still be used/useful?
What could I do to work around the issues in comment 29?
It looks like that patch has bitrotted some, but I imagine it's salvageable. In particular, nsNSSComponent::mNSSInitialized
doesn't exist any more, so you shouldn't need to acquire nsNSSComponent::mMutex
at all. Also, the leaks may be fixed by first calling nsNSSComponent::BlockUntilLoadableRootsLoaded();
in each of HasUserCertsInstalled
and HasActiveSmartCards
(just to be safe).
Assignee | ||
Comment 35•6 years ago
|
||
Assignee | ||
Comment 36•6 years ago
|
||
Updated•6 years ago
|
Comment 37•6 years ago
|
||
Comment 38•6 years ago
|
||
bugherder |
Updated•6 years ago
|
Updated•3 years ago
|
Description
•