Consider optionally enabling the cache chunks behaviour
Categories
(Core :: Networking: Cache, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox87 | --- | affected |
People
(Reporter: acreskey, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged])
There is disabled cache behaviour, CACHE_CHUNKS, that caches unused cache file chunks even when not using memory cache.
Mayhemer suggested that it could be worth looking into as a way of improving cache performance without going completely to the memory cache.
While I have been able to measure significant improvements using only the memory cache, so far with CACHE_CHUNKS, I'm seeing that it seems to slow down cached request completion.
I measured the time from nsHttpChannel::AsyncOpen()
to nsHttpChannel::OnStopRequest()
for cached, items.
On a motoG5, over a series of pageloads, this looks to slow down mean request completion time by ~25%.
------------------
baseline
count 1511.000000
mean 490.058240
std 1738.014898
min 5.000000
10% 15.000000
20% 19.000000
40% 38.000000
50% 48.000000
60% 59.000000
80% 84.000000
90% 443.000000
max 11652.000000
------------------
------------------
test (CACHE_CHUNKS)
count 1518.000000
mean 619.027009
std 1598.994609
min 5.000000
10% 15.000000
20% 19.000000
40% 37.000000
50% 46.000000
60% 58.000000
80% 89.600000
90% 2462.900000
max 7708.000000
------------------
Any thoughts on why this is case, or if I'm missing something would be appreciated.
Comment 1•4 years ago
|
||
If you just uncomment #define CACHE_CHUNKS then we'll very shortly hit max_chunks_memory_usage limit which will break cache completely. Just for testing, you can disable the limit and see if it makes a positive change. For proper implementation, you would need to introduce some kind of eviction. This was never implemented, see e.g. this commented-out code.
Reporter | ||
Comment 2•4 years ago
|
||
That's very helpful, thank you.
As a first step, I will see how it behaves without the limit.
Updated•4 years ago
|
Reporter | ||
Comment 3•4 years ago
|
||
I haven't been able to measure a difference even with no limit on the cache chunks memory usage.
But I don't like that I'm only measuring on Android.
So I'm shifting to adding a way to collect these fine-grained timings in our CI performance tests where I can run on all devices (Bug 1690373).
Reporter | ||
Comment 4•4 years ago
|
||
Using the same process that is described in this bug, I've evaluated the impact of enabling CACHE_CHUNKS
(with an unlimited cache). Patch here.
The test measures total http channel completion time as the user visits 5 popular sites (cold page load), and the revisits them again (warm pageload).
Unfortunately, I'm not picking up any measurable improvements:
Linux
MacOS
Windows
I disabled RCWN to simplify the runtime, but I also saw the same lack of measurable improvements with RCWN enabled.
I'll do some profiling tomorrow, but I think that maybe the cache is really not a bottleneck in these scenarios.
If anyone has any ideas, let me know!
Comment 5•4 years ago
|
||
IMO, the best thing to do here is to check in the log that the chunks are really reused during the warm pageload. The patch just ensures that the chunks are not freed once they are used, but their lifetime is tied to the lifetime of the CacheFile, so if the CacheFile goes away and a new one is created for the reload, then it has no effect.
Also, the chunks could still be freed in CacheFile::ThrowMemoryCachedData(), but AFAICS that's possible only by calling nsICacheStorageService::purgeFromMemory(nsICacheStorageService::PURGE_DISK_DATA_ONLY) which I don't expect to be called during the test.
Reporter | ||
Comment 6•4 years ago
|
||
(In reply to Michal Novotny [:michal] from comment #5)
IMO, the best thing to do here is to check in the log that the chunks are really reused during the warm pageload. The patch just ensures that the chunks are not freed once they are used, but their lifetime is tied to the lifetime of the CacheFile, so if the CacheFile goes away and a new one is created for the reload, then it has no effect.
Also, the chunks could still be freed in CacheFile::ThrowMemoryCachedData(), but AFAICS that's possible only by calling nsICacheStorageService::purgeFromMemory(nsICacheStorageService::PURGE_DISK_DATA_ONLY) which I don't expect to be called during the test.
Ah, yes, I'm not 100% sure yet, but it looks like the CacheFiles might be purged quite early because of a tight limit on browser.cache.disk.metadata_memory_limit
of a quarter meg.
Called from here.
Investigating.
Reporter | ||
Comment 7•4 years ago
|
||
As far as I can tell, this log is the best indication that we are using the cached chunks:
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFile.cpp#1437-1439
In a simplified scenario of just two sites:
The baseline builds only logs that once.
But it's logged 130 times with CACHE_CHUNKS
set (browser.cache.disk.max_chunks_memory_usageset to infinite, and
browser.cache.disk.metadata_memory_limit` increased.)
Something I noticed from local debugging is that the CacheFiles are released after their metadata is written (they drop out of function scope)
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFileIOManager.cpp#1493
I'm not sure yet if that's affecting anything.
Comment 8•4 years ago
|
||
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #7)
As far as I can tell, this log is the best indication that we are using the cached chunks:
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFile.cpp#1437-1439But it's logged 130 times with
CACHE_CHUNKS
set (browser.cache.disk.max_chunks_memory_usageset to infinite, and
browser.cache.disk.metadata_memory_limit` increased.)
This shows that some chunks are re-used without reading them from the disk again, but it doesn't tell how often we need to go to the disk. You can compare it with the number of chunks that were read from the disk, see (2) below.
Something I noticed from local debugging is that the CacheFiles are released after their metadata is written (they drop out of function scope)
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFileIOManager.cpp#1493
I'm not sure yet if that's affecting anything.
This means that CacheFile was released by CacheEntry, so it's destroyed after finalizing the file on the disk.
If the cache is empty when you start the test, then you can check 2 logs:
-
https://searchfox.org/mozilla-central/rev/6a41d8d7ad42d57cf1adcc7067fd8f1689c42ab3/netwerk/cache2/CacheEntry.cpp#455
If you see any line with "new=0" it means that the CacheFile was released and then initialized again to serve the data from the disk. -
https://searchfox.org/mozilla-central/rev/6a41d8d7ad42d57cf1adcc7067fd8f1689c42ab3/netwerk/cache2/CacheFile.cpp#1482
This is logged when the chunk is read from the disk.
Reporter | ||
Comment 9•4 years ago
|
||
Thanks Michal.
Here I've counted of the number of times each log is present.
The cache is empty when the test is started (it's always a new profile).
It looks to me like we are seeing the chunks being cached.
baseline CACHE_CHUNKS
CacheFile::GetChunkLocked() - Reusing cached chunk 4 316
CacheEntry::OnFileReady, where new=0 364 0
CacheFile::GetChunkLocked() - Reading newly created chunk 374 17
So far with these changes I've reproduced some improvements in mean HttpChannelCompletion from cache, here and here.
But none have been picked up as high confidence by perfherder's t-test compare.
I'm running a compare now where I've stripped out the cold page load from the measurement to focus on the warm page load.
Reporter | ||
Comment 10•4 years ago
|
||
This is what I see when only measuring warm pageload:
MacOS: 4.28% improvement in cache timing, low confidence
Windows: 5.23% improvement in cache timing, low confidence
Linux: 1.21% regression in cache timing, low confidence
Since it looks to me that cache chunk is working (comment 9), I'll make sure this reproduces.
Reporter | ||
Comment 11•4 years ago
|
||
I ran the warm pageload test again, this time it did pick up a high confidence improvement on MacOS:
Linux: ~2.4% to 2.8% improvement in network timing, low confidence
I'm running it now with RCWN re-enabled.
Reporter | ||
Comment 12•4 years ago
|
||
This is interesting.
With RCWN enabled, I believe we can see that this improves cache performance:
More channels end up completing via cache, which has the effect of increasing the perfstat HttpChannelCompletion_Cache
(high confidence) and decreasing the total durations for HttpChannelCompletion_Network
(high confidence).
Overall time spent completing network requests, HttpChannelCompletion
is improved. (3.39% low confidence, macOS).
A similar effect is seen on all three desktop platforms:
Reporter | ||
Comment 13•4 years ago
|
||
I won't be working on platform performance for a while so I'll summarize.
I believe that we can see reproducible evidence of a modest performance improvement from the Cache Chunks behaviour (given large cache sizes).
On spinning disk hardware I would expect this improvement to be larger, but we are not well setup test on platter drives. In addition, from recent scans of top selling PC hardware, the platter drive is quickly becoming the minority.
I see additional potential for this feature on Android when the user is on a cellular connection and RCWN is disabled.
Updated•3 years ago
|
Updated•3 years ago
|
Description
•