Collision on NS find_objects()/sqlite3PagerSharedLock() when fetching OSCP causes extereme variance in page load times
Categories
(Core :: Security: PSM, task, P1)
Tracking
()
Performance Impact | high |
Tracking | Status | |
---|---|---|
firefox69 | --- | affected |
People
(Reporter: acreskey, Assigned: jesup)
References
Details
(Keywords: perf:pageload, Whiteboard: [psm-assigned])
Attachments
(3 files)
It was observed that there is extreme variance in the page load times for https://www.washingtonpost.com
e.g. results from ~2000ms to ~5000ms when testing against a recorded http session.
Finding the root cause may help us pinpoint and reduce noise in our runtime.
Notes:
Browsertime was used against static sites recorded with WebPageReplay.
The attached graph shows the distribution of 2 sets of 25 cold loads on the 2017 reference laptop.
Both sets of results, straight_wpr_0
and straight_wpr_1
were made with identical configurations. i.e. outside of noise you would expect the same results.
Reporter | ||
Comment 1•5 years ago
|
||
jesup:
I've attached captured profiles through browsertime.
geckoProfile-7 and geckoProfile-25 are of the more common fast load times (~2100ms)
geckoProfile-19 and geckoProfile-22 are two of the slowest loads (~5000ms)
Added Socket,JS to the thread list and also turned off responsiveness.
Reporter | ||
Comment 2•5 years ago
|
||
There's a video which autoplays on this recording, by the way.
Reporter | ||
Comment 3•5 years ago
|
||
I noticed that in the slow loads the network requests visible in the content process are generally "waiting for socket thread"
for 4800+ms.
Reporter | ||
Comment 4•5 years ago
|
||
Added 'SSL,DNS' to the thread list.
Fast load (2164ms)
http://bit.ly/30PixDL
Slow load (4781ms)
http://bit.ly/30PYDbQ
I think it's the OCSP requests in the SSL cert threads that have to wait for main thread?
Reporter | ||
Comment 5•5 years ago
|
||
Some profiles from live sites to see if the OCSP delays are a biproduct of the webpagereplay recording:
Faster live site: 3357ms
http://bit.ly/2W2XcDv
Slow live site: 6281ms (why are the no samples from the parent process from the first 5seconds?)
http://bit.ly/2VZliPd
Reporter | ||
Comment 6•5 years ago
|
||
Note that I did see long, eg 862ms, DoOCSPRequest
calls in the SSL Cert threads on the slow live site.
Assignee | ||
Comment 7•5 years ago
|
||
SSL threads are colliding with each other for an sqlite DB access critical section, and with the SocketThread.
Assignee | ||
Comment 8•5 years ago
|
||
Clip from #necko:
(base profile: http://bit.ly/2KbbJdI)
[15:14] jesup Socket thread is 90% in ssl3_HandleHandshakeMessage(), split evenly between CERT_NewTempCertificate(), HandshakeCallback() and tls13_HandlePostHelloHandshakeMessage() basically
...
[ lots of blocking involving sqlite3PagerSharedLock() ]
...
[15:30] jesup both the socketthread and the SSL Cert#2 thread are calling find_objects and blocking on a critical section
[15:33] mayhemer I'm honestly lost in the profile, hard to figure out where delays are in the graphs and where to actually start
[15:36] jesup Summary: the socketthread and the SSL Cert#2 thread are calling find_objects() and blocking on a critical section. This eats the first 1/3. then SSL cert #2 dispatches to mainthread (which likely hands off to Socket Thread) the ssl connection to get the OCSP, and 1/2 of that is CERT_NewTempCertificate() (and waiting on the DB), and 1/2 is tls13_HandlePostHelloHandshakeMessage() (which...
[15:36] jesup ...blocks on find_objects() again)
[15:38] jesup let me find who owns that find_objects critical section...
[15:40] jesup aha, the lock on the sqlite is within find_objects also, but apparently not in the critical section
[15:40] jesup aha, it's fighting with SSL Cert #5
[15:41] jesup Look at this profile, with find_objects highlighted: http://bit.ly/2Ke3qOA
[15:43] jesup Note that in the two sections where SocketThread is blocked on the critical section in find_objects(), SSL cert #5 is in find_objects. And SSL Cert #6 is also fighting for it
Assignee | ||
Comment 9•5 years ago
|
||
[15:46] mayhemer in other words, we are doing IO under the lock!
[15:46] mayhemer I like the busy waits sqlite does.. 2x 100ms sleep
[15:48] acreskey jesup: good find! I think longterm that sqllite DB is going away
[15:50] mayhemer I just wonder how to fix this. this seems to me like a nss bug, since it calls a sync API to read data (always a bad thing)
[15:51] mayhemer note that this may also go to HW security tokens, not just disk
Assignee | ||
Comment 10•5 years ago
|
||
Moving to Networking; this might be more an NSS issue/design issue, but it's at least partly an issue in how networking is interfacing to NSS.
Comment 12•5 years ago
|
||
The socket thread is blocked by calling SSL_ForceHandshake from nsNSSSocketInfo::DriveHandshake:
RtlEnterCriticalSection
struct nssCryptokiInstanceStr * * find_objects(struct NSSTokenStr *, struct nssSessionStr *, struct CK_ATTRIBUTE *, unsigned long, unsigned int, <unnamed-tag> *)
nssToken_FindObjectsByTemplate
nssToken_FindCertificateByIssuerAndSerialNumber
nssTrustDomain_FindCertificateByIssuerAndSerialNumber
NSSTrustDomain_FindCertificateByEncodedCertificate
CERT_NewTempCertificate
ssl3_HandleHandshakeMessage
ssl3_HandleNonApplicationData
ssl3_HandleRecord
ssl3_GatherCompleteHandshake
SSL_ForceHandshake
nsNSSSocketInfo::DriveHandshake()
bool mozilla::net::nsHttpConnection::EnsureNPNComplete(nsresult & const, unsigned int & const)
mozilla::net::nsHttpConnection::OnSocketWritable()
Cache I/O thread is deserializing security info:
find_objects
nssToken_FindObjectsByTemplate
nssToken_FindCertificateByIssuerAndSerialNumber
nssTrustDomain_FindCertificateByIssuerAndSerialNumber
nssTrustDomain_FindCertificateByEncodedCertificate
CERT_NewTempCertificate
CERT_DecodeCertFromPackage
nsNSSCertificate::InitFromDER(char*, int)
nsNSSCertificate::Read(nsIObjectInputStream*)
nsBinaryInputStream::ReadObject(bool, nsISupports**)
NS_ReadOptionalObject(nsIObjectInputStream*, bool, nsISupports**)
mozilla::psm::TransportSecurityInfo::Read(nsIObjectInputStream*)
nsBinaryInputStream::ReadObject(bool, nsISupports**)
NS_DeserializeObject(nsTSubstring<char> const&, nsISupports**)
mozilla::net::CacheEntry::GetSecurityInfo(nsISupports**)
mozilla::net::nsHttpChannel::OpenCacheInputStream(nsICacheEntry*, bool, bool)
mozilla::net::nsHttpChannel::OnCacheEntryCheck(nsICacheEntry*, nsIApplicationCache*, unsigned int*)
mozilla::net::CacheEntry::InvokeCallback(mozilla::net::CacheEntry::Callback&)
mozilla::net::CacheEntry::InvokeCallbacks(bool)
mozilla::net::CacheEntry::InvokeCallbacks()
mozilla::net::CacheEntry::OnFileReady(nsresult, bool)
non-virtual thunk to mozilla::net::CacheEntry::OnFileReady(nsresult, bool)
mozilla::net::CacheFile::OnMetadataRead(nsresult)
non-virtual thunk to mozilla::net::CacheFile::OnMetadataRead(nsresult)
mozilla::net::CacheFileMetadata::OnDataRead(mozilla::net::CacheFileHandle*, char*, nsresult)
mozilla::net::ReadEvent::Run()
mozilla::net::CacheIOThread::LoopOneLevel(unsigned int)
mozilla::net::CacheIOThread::ThreadFunc()
Assignee | ||
Comment 13•5 years ago
|
||
We're seeing this sort of problem crop up regularly in test runs; here's one from foxnews:
https://perfht.ml/2X4bwjw
2s OCSP response - 1s "waiting for Socket Thread", 1s waiting for DNS. From previous investigations, we'd likely find (like above) that we have a bunch of SSL threads and the Socket thread fighting over the sqlite DB via find_names().
Also this profile https://perfht.ml/2QIC4RE (simpleperf on an geckoview_example startup) shows Cache2 IO hitting find_names() as well, for deserialization of security info stored in cache entries.
Is there any way to reduce the delay when we collide (change the 100ms backoff in sqlite perhaps (per mayhemer)), or to make these async operations in some manner? Or even serialize all of these onto a single thread (even if it blocks callers) to avoid the 100ms backoff behavior?
I hear the sqlite DB is to be replaced with an rkv store - when would this occur? Bug#? Would that impact this pattern/delay?
The NSS sqlite DB is not being replaced with rkv. You're probably thinking of intermediate preloading (bug 1535662), which we're still in the process of shipping (it's only enabled on Nightly right now). Intermediate preloading helps a bit with this contention by potentially avoiding a call to find_objects
(bug 1557092).
I think the issue here is the transaction created by the intermediate caching code importing certificates[0] is blocking readers. Setting PRAGMA journal_mode = WAL;
made a test program I wrote that pits a reader thread against a writer thread go orders of magnitude faster, but I'm not sure if we can set and ship that in NSS (for one, the sqlite documentation page says it doesn't work with DBs on networked drives - I don't know if that means "it just won't do WAL and falls back to DELETE" or if it means "it'll fail in byzantine ways if you try"). To reduce the long sleeps we could override the default sqlite busy handler, but that would still block any threads trying to read while the transaction completes. Since that involves syncing to the underlying filesystem, that'll always be slow. Another option would be to stop caching intermediates altogether. In theory once intermediate preloading ships intermediate caching is less important (it's mostly a band-aid anyway to help with misconfigured servers).
[0] See bug 1529044 for the most recent implementation, although the feature has been around for a long time. Also, any code that imports new certificates would have this problem. Luckily, it's rare that users do this.
Assignee | ||
Comment 15•5 years ago
|
||
I see this is currently done off an Idle runnable... is that causing this collision?
I do see all the SSL threads and the SocketThread calling VerifyCert() https://perfht.ml/2RbtLhH and this profile shows SSL Thread 4 (https://perfht.ml/2R9WjYM) running PK11_ImportCerts() throughout this time period.
My concern with using 'idle' priority here is that it doesn't mean we're really idle... this is off-main-thread, so unless there's other SSL runnables queued to this SSL thread, those are empty, and the only thing delaying us running the idle runnable is GetIdlePeriodHint(), which looks at upcoming RefreshDriver and timer executions. The net result is that we'll be pretty likely to run it even if we're super-busy on MainThread.
Idle as currently designed was intended for MainThread usage, unfortunately. You could dispatch an Idle runnable to mainThread that dispatched a runnable to an SSL thread; that would ensure we run it in a real MainThread idle period (or very close to). This wouldn't stop all collisions, but I suspect would make them much less likely.
Perhaps even better would be running this off some "deep idle" callback, with a (soft) time limit to avoid failing to import forever, but waiting for a longer idle period.
Assignee | ||
Comment 16•5 years ago
|
||
So unique_ptr<> and lambdas hate each other, and this doesn't compile (nor any mutation of it I've tried); one could stop trying to abuse lambdas and just create a runnable class that does "if (NS_IsMainThread()) { thread->Dispatch(this, NS_DISPATCH_NORMAL); return; }", and send that runnable to the MainThread idle queue to start. I'd advise that instead of twisting the code in a knot to make the 3-level-lambda thing work.
I was trying this to a) ensure it works, and b) see if the collisions (mostly) disappear
f?'s to put it on both of your radars. Andrew, if we can get a working patch we can see if this makes the noise go away or lessen.
You could "best case" test it by turning off intermediate cert import like you tried before
Assignee | ||
Comment 17•5 years ago
|
||
So, trying to run this on "Idle" on an SSL thread is exactly the same as just Dispatching it; non-MainThreads don't implement PrioritizedEventQueue
Assignee | ||
Comment 18•5 years ago
|
||
When an intermediate cert save is needed, we could Dispatch to Socket Thread to start (or reset if running) a Timer. When the timer fires, either Dispatch to SSL Cert threadpool to save, or Dispatch to MainThread Idle queue, and when that runs that Dispatch to SSL Cert to save. Perhaps use a 10s or 60s timer, with perhaps a deadline that we won't reset the timer to more than N seconds from the oldest dirty data waiting (example: 30s timer, with a max 2 min dirty time). It depends on how much we care about possible not writing it out. (We could also force a write as part of shutdown if dirty, perhaps)
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #15)
I see this is currently done off an Idle runnable... is that causing this collision?
I don't think that's the issue - it's that we have many threads wanting to read the sqldb and some threads wanting to occasionally write to the sqldb (the fact that NSS decoding a certificate results in reading the sqldb is an unfortunate consequence of design decisions made in NSS).
Perhaps even better would be running this off some "deep idle" callback, with a (soft) time limit to avoid failing to import forever, but waiting for a longer idle period.
Right - we basically want to do this when the network is idle. Maybe something like that already exists?
Also, it's not really a problem if importing gets delayed indefinitely (until shutdown, basically) because as long as we hold those certificates alive in memory, we'll be able to find them during verification, which is why we're saving them in the first place. Of course, saving a bunch of intermediates at shutdown might have performance characteristics we want to avoid.
One thing to keep in mind is that when we ship intermediate preloading, we have the option of saving certificates in cert_storage (rkv), which won't have this problem. (The issue is there's some crash bugs and performance issues we have to solve before we can ship that.)
Comment 21•5 years ago
|
||
Necko currently can't provide any net-idle notification (like that there are no requests happening for a while) but we have a mechanism that we could relatively easily base such a notification on.
Reporter | ||
Comment 22•5 years ago
|
||
I can try out some of these variations in the next bit.
I'd like to find a scenario in CI where I can see the impact of this -- the tp6 mobile cold load tests should do it.
But definitely I will need to do some local performance measurement for sites like foxnews.com.
Reporter | ||
Comment 24•5 years ago
|
||
Assignee | ||
Comment 25•5 years ago
|
||
You may need to do the test in browsertime where it was first seen, not in raptor, especially if there are network effects here (like really fast OCSP responses, or something)
Reporter | ||
Comment 26•5 years ago
|
||
Ah, from a slack conversation (thank you Dana) I realized why I'm not see intermediate import in our raptor tests.
We import the mitmproxy CA cert into firefox so we can connect directly to mitmproxy to return the archived http responses.
And so no intermediates are encountered during testing. (Same thing when using WebPageReplay)
Reporter | ||
Comment 27•5 years ago
|
||
FYI, I just found out that we don't do many OCSP request on android because by pref we only make them for EV certificates:
https://searchfox.org/mozilla-central/rev/5e660d3dfcba897c8501e3fda1d415565a096e7e/mobile/android/app/mobile.js#421
Comment 28•5 years ago
|
||
A lock convoy through CERT_NewTempCertificate() and find_objects() seem to be implicated in bug 1577944 (see comment 2) via a new (non OCSP) path to CERT_NewTempCertificate() that was added in bug 1551177.
I believe bug 1694542 addressed this - Randell, can you confirm?
Assignee | ||
Comment 30•3 years ago
|
||
forwarding to andrew
Reporter | ||
Comment 31•3 years ago
|
||
Let me just double check this on the reference laptop. I'll need a day for it to stabilize from OS updates.
Reporter | ||
Comment 32•3 years ago
|
||
I haven't been able to find any cases where the OCSP requests are blocked.
Trying the usual sites and I can see the requests but they are completing quite quickly.
Great - thanks!
Updated•3 years ago
|
Description
•