Closed Bug 767277 Opened 12 years ago Closed 12 years ago

Close cache entries off the main thread to avoid the most common cause of cache-related jank

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
mozilla17
Tracking Status
firefox14 - ---
firefox15 - ---
firefox16 - ---

People

(Reporter: briansmith, Assigned: briansmith)

References

(Depends on 1 open bug)

Details

(Keywords: main-thread-io, perf, Whiteboard: [Snappy:p1])

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #767275 +++ Using the telemetry in bug 767275, I found that nsHttpChannel::OnStopRequest() is the place where we most frequently wait on the cache service lock on the main thread for janky amounts of time (>=25ms). I believe this is because, right before we call onStopRequest, we write data to the cache entry on a background thread, and that background thread holds the lock during the write. This patch is pretty straightforward, except for the tricky bit about cacheAsFile. Before this patch, there was a hack that marks the cache entry valid in the case that XHR had requested the cache entry to be marked valid before we notify the listener that the response is finished. If we don't do that, then at least one of the XHR blob-related mochitests tests (test_xhr_progressevents.html, I believe) will fail. But, the place where we check to see if we are caching the entry as a file is EXACTLY where the jank tends to occur. Accordingly, I added a hack on top of that hack to ensure we only check if the entry is being cached as a file if somebody explicitly told us to via SetCacheAsFile. This will do the right thing for XHR, which is really all we care about, AFAICT. (We really need to kill cacheAsFile.) In my local testing, as measured by the telemetry I added in bug 767275, this made a substantial reduction in the amount of jank. Asking for two reviews because this may be trickier than it appears, as we've seen in the case of bug 764171.
Attachment #635645 - Flags: review?(michal.novotny)
Attachment #635645 - Flags: review?(honzab.moz)
This feels like something we'd be willing to take on Aurora 15 once ready, but it's unclear that the user impact is high enough to warrant the risk to 14.
Comment on attachment 635645 [details] [diff] [review] Close cache entries on the cache service thread in nsHttpChannel to avoid cache-lock-related jank Review of attachment 635645 [details] [diff] [review]: ----------------------------------------------------------------- This patch doesn't replace CloseCacheEntry(true) in nsHttpChannel::OnCacheEntryAvailable(). ::: netwerk/cache/nsCacheEntryDescriptor.cpp @@ +11,5 @@ > #include "nsCacheEntry.h" > #include "nsReadableUtils.h" > #include "nsIOutputStream.h" > #include "nsCRT.h" > +#include "nsThreadUtils.h" This is probably some leftover from a previous version. ::: netwerk/protocol/http/nsHttpChannel.cpp @@ +334,5 @@ > + MOZ_ASSERT(!(mFlags & nsHttpChannel::CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE)); > + } > +private: > + const nsCOMPtr<nsICacheEntryDescriptor> mCacheEntry; > + const nsHttpChannel::CloseFlags mFlags; What's a point of having these const? @@ +345,5 @@ > + LOG(("dooming cache entry!!")); > + mCacheEntry->Doom(); > + } > + > + mCacheEntry = NULL; nsnull @@ +5067,5 @@ > bool asFile = false; > if (mInitedCacheEntry && !mCachedContentIsPartial && > (NS_SUCCEEDED(mStatus) || contentComplete) && > (mCacheAccess & nsICache::ACCESS_WRITE) && > + mWasAskedToStoreOnDisk && I think this really needs a comment explaining why it is here. A link to the comment in bugzilla would be sufficient. ::: netwerk/protocol/http/nsHttpChannel.h @@ +343,5 @@ > > // state flags > PRUint32 mCachedContentIsValid : 1; > PRUint32 mCachedContentIsPartial : 1; > + PRUint32 mWasAskedToStoreOnDisk : 1; Maybe mWasAskedToCacheAsFile would be a slightly better name.
Attachment #635645 - Flags: review?(michal.novotny) → review-
Comment on attachment 635645 [details] [diff] [review] Close cache entries on the cache service thread in nsHttpChannel to avoid cache-lock-related jank Review of attachment 635645 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/nsHttpChannel.cpp @@ +330,5 @@ > + { > + // CloseCacheEntry should have set or cleared CLOSE_CACHE_ENTRY_DOOM > + // accordingly if it was passed CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE instead of > + // forwarding CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE here. > + MOZ_ASSERT(!(mFlags & nsHttpChannel::CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE)); According this assertion, doesn't it make more sense just pass a bool to this class? @@ +671,5 @@ > if (mCacheEntry) { > if (NS_FAILED(rv)) > mCacheEntry->Doom(); > } > + CloseCacheEntry(0); Hmm... have an enum value for this too. Also, there is a call to Doom() right above, could that be optimized too? @@ +1499,5 @@ > // expiration time (bug 87710). > if (mCacheEntry) { > rv = InitCacheEntry(); > if (NS_FAILED(rv)) > + CloseCacheEntry(CLOSE_CACHE_ENTRY_DOOM); CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE ? @@ +3644,5 @@ > mCacheEntry = 0; > mCacheAccess = 0; > mInitedCacheEntry = false; > + > + DispatchCacheRunnable(r); According this, you are also postponing release of the cache entry to be made on the cache thread even there is no need to doom it. It means that parallel loads waiting for this cache entry to close may be delayed. Should we measure this as a regression? BTW, this goes beyond scope of this bug, though I don't say it's wrong. ::: netwerk/protocol/http/nsHttpChannel.h @@ +225,5 @@ > nsresult CheckCache(); > bool ShouldUpdateOfflineCacheEntry(); > nsresult ReadFromCache(bool alreadyMarkedValid); > + > + typedef PRUint32 CloseFlags; Call this CloseCacheEntryFlags please...
Attachment #635645 - Flags: review?(honzab.moz) → review-
(In reply to Michal Novotny (:michal) from comment #2) > This patch doesn't replace CloseCacheEntry(true) in > nsHttpChannel::OnCacheEntryAvailable(). Thanks for noticing this. > > + const nsCOMPtr<nsICacheEntryDescriptor> mCacheEntry; > > + const nsHttpChannel::CloseFlags mFlags; > > What's a point of having these const? I tend to make everything const by default, as a way of documenting that it doesn't change. But, mCacheEntry shouldn't be const here because of this: > > + mCacheEntry = NULL; > > nsnull > I think this really needs a comment explaining why it is here. A link to the > comment in bugzilla would be sufficient. OK, I will try to clarify the comment. > Maybe mWasAskedToCacheAsFile would be a slightly better name. +1. (In reply to Honza Bambas (:mayhemer) from comment #3) > > + // forwarding CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE here. > > + MOZ_ASSERT(!(mFlags & nsHttpChannel::CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE)); > > According this assertion, doesn't it make more sense just pass a bool to > this class? A patch on top of this (in a follow-up bug) will pass additional flags, so I'd rather leave it as is. > > + CloseCacheEntry(0); > > Hmm... have an enum value for this too. > > Also, there is a call to Doom() right above, could that be optimized too? Seems like it; I will check. > > if (NS_FAILED(rv)) > > + CloseCacheEntry(CLOSE_CACHE_ENTRY_DOOM); > > CLOSE_CACHE_ENTRY_DOOM_ON_FAILURE ? If we failed to initialize the cache entry then we must doom it regardless of mStatus, right? > According this, you are also postponing release of the cache entry to be > made on the cache thread even there is no need to doom it. It means that > parallel loads waiting for this cache entry to close may be delayed. Should > we measure this as a regression? BTW, this goes beyond scope of this bug, > though I don't say it's wrong. The goal of this bug is to make sure we *always* close the cache entry off the main thread, so I don't see any scoping issue, but maybe I am overlooking something? Definitely, I don't intend for this to happen only in the case where we doom entries, which is relatively uncommon. I did not see any worsening of page load times with this patch (in fact, if anything, things were faster in my local testing). We will have to watch the HTTP_PAGE_* telemetry after this lands. > Call this CloseCacheEntryFlags please... Thanks guys!
Assignee: nobody → bsmith
No longer depends on: 769493
No longer depends on: 769491
Here's the patch, updated to address the review comments. I also moved the some of the closings of mCacheInputStream off the main thread.
Attachment #635645 - Attachment is obsolete: true
Attachment #637852 - Flags: review?(michal.novotny)
Attachment #637852 - Flags: review?(honzab.moz)
Here is the tryserver run for these changes: https://tbpl.mozilla.org/?tree=Try&rev=bce013d8a04b
Assignee: bsmith → nobody
Component: Networking: Cache → Networking: HTTP
QA Contact: networking.cache → networking.http
(In reply to Brian Smith (:bsmith) from comment #6) > Here is the tryserver run for these changes: > https://tbpl.mozilla.org/?tree=Try&rev=bce013d8a04b The try run looks shockingly good other than Windows which is taking forever to complete. Would it be possible to get a review today or tonight?
Attachment #637852 - Flags: review?(michal.novotny) → review+
Comment on attachment 637852 [details] [diff] [review] Close cache entries on the cache service thread in nsHttpChannel to avoid cache-lock-related jank [v2] Review of attachment 637852 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/nsHttpChannel.cpp @@ +5438,5 @@ > { > if (!mCacheEntry || mLoadFlags & INHIBIT_PERSISTENT_CACHING) > return NS_ERROR_NOT_AVAILABLE; > nsCacheStoragePolicy policy; > + mWasAskedToCacheAsFile = value; Could somebody drop this to false and thus break the logic in OnStopRequest?
Attachment #637852 - Flags: review?(honzab.moz)
Attachment #637852 - Flags: review+
I pushed the patch for bug 767275 and I will wait ~3-5 days to push this patch so that we use the new telemetry to see before/after results. (In reply to Honza Bambas (:mayhemer) from comment #8) > ::: netwerk/protocol/http/nsHttpChannel.cpp > @@ +5438,5 @@ > > { > > if (!mCacheEntry || mLoadFlags & INHIBIT_PERSISTENT_CACHING) > > return NS_ERROR_NOT_AVAILABLE; > > nsCacheStoragePolicy policy; > > + mWasAskedToCacheAsFile = value; > > Could somebody drop this to false and thus break the logic in OnStopRequest? I will change this to mWasAskedToCacheAsFile = mWasAskedToCacheAsFile || value just to be safe.
Brian - are we good to go here?
Assignee: nobody → bsmith
Whiteboard: [Snappy] → [Snappy:p1]
Target Milestone: mozilla16 → mozilla17
Backed out because I suspect it is causing the test for bug 770243 to fail: https://hg.mozilla.org/integration/mozilla-inbound/rev/71854c060f90 The problem case is this: We async dispatch a runnable to close the cache entry "later," thus leaving the cache entry open for a while after we've marked it valid. During that time between when we marked the cache entry valid and the time we close it, another channel decides to open the cache entry. That channel gets a ACCESS_READ descriptor instead of an ACCESS_READ_WRITE descriptor, and so it skips the validation of the cached response. Bug 770243's tests passed previously when this patch was not applied (we should verify this last claim). But, with this patch, case 4 of that bug's tests fails. If I modify that case to use do_execute_soon, then the test succeeds. Debugging the failure, I saw that the cache is returning an ACCESS_READ descriptor. This causes HttpCacheQuery::CheckCache() to skip the validation of the entry on the assumption that it is valid. But, the test only works when each channel opened does revalidation.
15 is now on Beta. Same as with comment 1, this probably carries too much risk for uplifting to that branch.
Apparently the consensus is that this is a bad idea and I'm a bad person for having come up with it. Michal is working on another, more general, prettier solution.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: