Closed Bug 717761 Opened 13 years ago Closed 9 years ago

Main thread can be blocked by IO on the cache thread

Status:

RESOLVED DUPLICATE of bug 913806

People

(Reporter: u408661, Assigned: michal)

Details

(Keywords: main-thread-io, perf, Whiteboard: [Snappy:P1][necko-backlog])

Attachments

(4 files, 2 obsolete files)

start of a patch, 13 years ago, u408661 (deleted), patch
stacks of bad locks, 13 years ago, u408661 (deleted), text/plain
stacks of bad locks, 13 years ago, u408661 (deleted), application/x-gzip
Classify uses of cache lock, 13 years ago, Brian Smith (:briansmith, :bsmith, use NEEDINFO?) (deleted), patch
Remove isStorageEnabledForPolicy, 13 years ago, Brian Smith (:briansmith, :bsmith, use NEEDINFO?) (deleted), patch, u408661 : review-
Stacks from chrome hangs involving cache operations, 12 years ago, Vladan Djeric (:vladan) (deleted), text/plain

u408661

Reporter

Description

13 years ago

In bug 715774 comment 3 dmandelin identifies a call stack that caused a disturbingly long jank via the cache:

NtWaitForSingleObject
RtlIntegerToUnicodeString
PR_Lock
nsCacheService::Lock
nsCacheEntryDescriptor::GetMetaDataElement
nsHttpChannel::CheckCache
nsHttpChannel::Connect
nsHttpChannel::OnNormalCacheEntryAvailable
nsHttpChannel::OnCacheEntryAvailable
nsCacheListenerEvent::Run
nsThread::ProcessNextEvent

It seems that getting a metadata element from a particular cache entry can block if anything else is being done with the cache service, as we have one global lock for the entire cache. In this case, it runs on the main thread, causing the jank. We should probably have finer-grained locking to prevent things like this from happening.

Assigning to myself as part of the cache work to be done this quarter.
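To make the finer-grained locking idea concrete, here is a minimal Python sketch. It is not the Necko cache code: `Cache`, `CacheEntry`, and `demo` are hypothetical names invented for illustration. It assumes a design where one short-lived lock guards only the key-to-entry table and each entry carries its own lock, so slow work on one entry does not block a metadata read on another.

```python
import threading


class CacheEntry:
    """Hypothetical cache entry with its own lock (not the real nsCacheEntry)."""

    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()  # guards only this entry's state
        self.metadata = {}


class Cache:
    """Hypothetical cache: the table lock is held only for map lookups."""

    def __init__(self):
        self._table_lock = threading.Lock()  # guards only the key -> entry map
        self._entries = {}

    def entry(self, key):
        with self._table_lock:
            if key not in self._entries:
                self._entries[key] = CacheEntry(key)
            return self._entries[key]

    def set_metadata(self, key, name, value):
        e = self.entry(key)
        with e.lock:
            e.metadata[name] = value

    def get_metadata(self, key, name):
        e = self.entry(key)
        with e.lock:
            return e.metadata.get(name)


def demo():
    cache = Cache()
    cache.set_metadata("b", "response-head", "HTTP/1.1 200 OK")
    holding = threading.Event()
    release = threading.Event()

    def slow_io_on_a():
        # Stands in for the cache thread doing slow I/O on entry "a".
        with cache.entry("a").lock:
            holding.set()
            release.wait()

    worker = threading.Thread(target=slow_io_on_a)
    worker.start()
    holding.wait()  # entry "a" is now locked by the worker

    # Reading entry "b" does not wait on entry "a"'s lock.
    value = cache.get_metadata("b", "response-head")
    # A non-blocking acquire on "a" fails because the worker still holds it.
    a_busy = not cache.entry("a").lock.acquire(blocking=False)

    release.set()
    worker.join()
    return value, a_busy
```

With a single global lock, the `get_metadata` call in `demo` would have to wait until the worker released it, which is the situation the stack above shows on the main thread.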

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 1

13 years ago

There are many problems in this code. One of them is that the HSTS (and SPDY Alternate-Protocol?) checks are happening in the wrong spot: they should happen BEFORE we look up the entry in the cache, but currently they happen AFTER we look up the entry in the cache.

But a more major problem is that it doesn't make sense for the cache to call a callback on the main thread so that the callback can decide whether to read an entry from the cache (involving the cache thread/lock) or from the network (involving the socket transport thread). AFAICT, instead of calling this callback on the main thread, the cache itself should decide whether validation is necessary. If validation isn't necessary, then the cache should return the response directly to the main thread. If validation is necessary, it should jump directly to the socket transport thread to do the validation (skipping the main thread). If that validation response comes back 304, then the socket transport thread should transfer control back to the cache thread to read the response, and then the response should be sent to the main thread. If the validation response comes back otherwise, then that response data should be sent to the main thread and then later sent to the cache.

More generally, the main thread is responsible for issuing a cache lookup (to the cache thread) or issuing a network request (to the socket transport thread), and it is responsible for receiving the response data or error result. But, between those two points, the main thread should never be involved. And, even more generally than that, there should never be any case where any thread (except the cache thread) obtains any cache lock, except for the lock built into the internals of the cache thread's nsIEventTarget implementation. (All of this AFAICT.)

Basically, this is very much like what bz suggested in bug 612632 comment 8. AFAICT, a good solution for bug 612632 will solve this problem too.
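The flow of control proposed in this comment can be sketched roughly as follows. This is a Python illustration only, not Necko code: `Loader`, `needs_validation`, `fake_network`, and `main_queue` are hypothetical names, and two single-worker executors stand in for the cache thread and the socket transport thread. The main thread only issues the lookup and receives the result; the hops in between never touch it.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class Loader:
    """Hypothetical sketch: the cache thread decides whether to validate."""

    def __init__(self, store, network):
        self.store = store      # url -> entry dict; touched only on the cache thread
        self.network = network  # hypothetical callable: url -> (status, body)
        self.cache_thread = ThreadPoolExecutor(max_workers=1)
        self.socket_thread = ThreadPoolExecutor(max_workers=1)
        self.main_queue = queue.Queue()  # stands in for the main thread's event queue

    def load(self, url):
        # Main thread: issue the lookup and return immediately.
        self.cache_thread.submit(self._lookup, url)

    def _lookup(self, url):
        # Cache thread: decide here, not in a main-thread callback.
        entry = self.store.get(url)
        if entry is not None and not entry["needs_validation"]:
            self.main_queue.put((url, entry["body"], "cache"))
        else:
            self.socket_thread.submit(self._fetch, url, entry is not None)

    def _fetch(self, url, have_entry):
        # Socket thread: validate or fetch, skipping the main thread.
        status, body = self.network(url)
        if status == 304 and have_entry:
            self.cache_thread.submit(self._read_validated, url)
        else:
            self.main_queue.put((url, body, "network"))
            self.cache_thread.submit(self._store, url, body)

    def _read_validated(self, url):
        # Cache thread: the entry is still valid, so read it and deliver it.
        self.main_queue.put((url, self.store[url]["body"], "validated"))

    def _store(self, url, body):
        # Cache thread: record the network response after it was delivered.
        self.store[url] = {"body": body, "needs_validation": False}


def demo():
    store = {
        "fresh": {"body": "fresh-body", "needs_validation": False},
        "stale": {"body": "stale-body", "needs_validation": True},
    }

    def fake_network(url):
        # Pretend the server answers 304 for "stale" and 200 otherwise.
        return (304, None) if url == "stale" else (200, url + "-body")

    loader = Loader(store, fake_network)
    for url in ("fresh", "stale", "new"):
        loader.load(url)

    results = {}
    for _ in range(3):
        url, body, source = loader.main_queue.get(timeout=5)
        results[url] = (body, source)

    # Drain the socket thread first so its final hand-off to the cache
    # thread is queued before the cache thread is shut down.
    loader.socket_thread.shutdown(wait=True)
    loader.cache_thread.shutdown(wait=True)
    return results, store
```

In this sketch the only synchronization the main thread ever touches is the queue it receives results on, which mirrors the point that no thread other than the cache thread should take a cache lock.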

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 2

13 years ago

Although the I/O doesn't happen directly on the main thread, the main thread is blocked (because of the lock) on I/O happening on the cache thread, so this is effectively main thread I/O.

Keywords: main-thread-io, perf

Whiteboard: [Snappy]

Version: unspecified → Trunk

u408661

Reporter

Comment 3

13 years ago

(In reply to Brian Smith (:bsmith) from comment #1)
> Basically, this is very much like what bz suggested in bug 612632 comment 8.
> AFAICT, a good solution for bug 612632 will solve this problem too.

While I agree in general that it would be better for validation to be non-blocking in some way or another (which is why I filed this bug), that's different (in my view) from what bz is talking about. He's talking, effectively, about reducing the number of "round trips" in the cache, while this bug is about not being dumb about our locking. Perhaps they can/will both be resolved in the same patch, but I'm not convinced of that, since we could easily solve one without solving the other. (I would even argue that we SHOULD do this piecemeal, to make it easier to identify errors we may have made in the implementation of one versus the other.)

u408661

Reporter