Closed
Bug 597224
Opened 14 years ago
Closed 14 years ago
HTTP Cache: use directory tree to store cache files
Categories
(Core :: Networking: Cache, defect)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta9+ |
People
(Reporter: jduell.mcbugs, Assigned: michal)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file, 3 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review |
The squid developers have told us that some filesystems start to degrade in performance once they have 256 or more files per directory, and we currently dump 1000s of cache files into just one directory. This will be alleviated somewhat by bug 593198 (which we should probably do first), but we'll still probably want to move to use multiple directories.
(unless we determine that all our filesystems of interest do OK with lots of files--including mobile devices.)
Comment 1•14 years ago
|
||
I'm currently using a cache directory which is itself heavily fragmented, because there are much more files as before.
Reporter | ||
Comment 2•14 years ago
|
||
Taras has found (bug 593198 comment 20) that we're taking a 1-2 second hit at some point at startup to open (and list?) the cache directory, because it's so fragmented from having too many files in one dir.
I like his proposal:
"lets change the storage algorithm so a file like 3E3E27FEd01 would become
Cache/3/E/3E27FEd01 with Cache/0->F/ directories preallocated on cache
creation to avoid fragmentation. We can lazy allocate their subdirs."
I'd love to see this in FF4, but I don't expect we can block for it. If we can get a patch in time, though, I hope we can approve it. (Is there a "wanted-2.0" flag or something I should set?).
blocking2.0: --- → ?
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → michal.novotny
Assignee | ||
Comment 3•14 years ago
|
||
Attachment #480884 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 4•14 years ago
|
||
I forgot to cut the first 2 chars from the filename.
Attachment #480884 -
Attachment is obsolete: true
Attachment #480901 -
Flags: review?(jduell.mcbugs)
Attachment #480884 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 6•14 years ago
|
||
I think this is wanted but I don't see a flag.
Why does this bug depend on 593198? They are IMO independent or maybe this bug blocks 593198.
Reporter | ||
Comment 7•14 years ago
|
||
Yes, I think we do want this if it can be done for ff4, but also don't know what flag to set/ask for "want but not block".
This seems likely to be easier to do quickly than bug 593198, which has the whole "how to shrink" problem. And you're right, I don't think they depend on each other.
No longer depends on: 593198
Comment 8•14 years ago
|
||
For consideration: offline application's nsDiskCacheDeviceSQL is already using something like that, see [1]. Maybe stay consistent?
[1] http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDiskCacheDeviceSQL.cpp#180
Comment 9•14 years ago
|
||
Maybe better code example:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDiskCacheDeviceSQL.cpp#346
Assignee | ||
Comment 10•14 years ago
|
||
Should be blocking because bug #604897 is blocking.
blocking2.0: - → ?
Reporter | ||
Comment 11•14 years ago
|
||
Comment on attachment 480901 [details] [diff] [review]
patch v2
So this patch winds up having a directory tree with 16 parent subdirectories, each of which can have 16 subdirs, for 255 directories total. I suspect we want more.
I don't know that we have good statistics on the breakdown of our cache files, but many users will have a 1 GB cache, and our current _CACHE_123 files are only 30 MB or so each, so until/unless we change that, users are going to have a *lot* of separate files.
(Ex: if avg size is 2K, that's 500K files, most of which won't fit in the block files, if my read of bug 604897 comment 5 is right (all three CACHE files will add up to 90 MB). Even if we fit 90% of files in the block files, that's still 50K files, which would be 196 files per directory--close to the 255 cutoff where squid has found perf degradation.)
So I propose we use two hex digits for the leaf directory names, ie. have 16 parent subdirs, and 255 leaf ones, for 4080 directories total. This would add a little extra cost to reading files (another directory inode to traverse, with less likelihood of it being in the filesystem cache), but it would avoid what seem to be more serious problems with too many files per directory on some systems.
I'm open to arguments that we can store enough files in the cache block files, if someone has numbers to back that up.
+r with this issue sorted out.
>diff --git a/netwerk/cache/nsDiskCacheMap.cpp b/netwerk/cache/nsDiskCacheMap.cpp
>@@ -87,16 +88,19 @@ nsDiskCacheMap::Open(nsILocalFile * cac
> // check size of map file
> PRUint32 mapSize = PR_Available(mMapFD);
> if (mapSize == 0) { // creating a new _CACHE_MAP_
>
> // block files shouldn't exist if we're creating the _CACHE_MAP_
> if (cacheFilesExist)
> goto error_exit;
Do we want a "cacheSubdirsExist" variable, too? I.e. should we check for the existence of at least the 16 top-level subdirs, at the same time we check for _CACHE_MAP_ and the cache block files? I'm less concerned about the case of them preexisting when we don't expect them (this code block) as I am about the 'else' block below all this (for opening an existing cache directory) where we'd implicitly assume they're present, but if any are missing, we might behave badly. Perhaps I'm being too paranoid--I'm not sure what would delete those directories (and if it did, it could always delete them during operation, after we've checked them at startup, and still break things).
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11)
> (Ex: if avg size is 2K, that's 500K files, most of which won't fit in the block
> files, if my read of bug 604897 comment 5 is right (all three CACHE files will
> add up to 90 MB). Even if we fit 90% of files in the block files, that's still
> 50K files, which would be 196 files per directory--close to the 255 cutoff
> where squid has found perf degradation.)
After 604897 lands we can put up to 172032 records into the block files. So there is no way to fit 90% of the entries to the block files. Also remember that the cache entry has 2 records (metadata and data).
> So I propose we use two hex digits for the leaf directory names, ie. have 16
> parent subdirs, and 255 leaf ones, for 4080 directories total. This would add
> a little extra cost to reading files (another directory inode to traverse, with
> less likelihood of it being in the filesystem cache), but it would avoid what
> seem to be more serious problems with too many files per directory on some
> systems.
Why not to use Cache/0/1/2/34567d01 instead of Cache/0/12/34567d01 ?
> I'm open to arguments that we can store enough files in the cache block files,
> if someone has numbers to back that up.
I'm running locally firefox with the fix from bug #604897 for several weeks now, so my FF caches entries even if the block files are full. It occupies 614MB. I'll try to get some stats from my cache.
> Do we want a "cacheSubdirsExist" variable, too? I.e. should we check for the
> existence of at least the 16 top-level subdirs, at the same time we check for
> _CACHE_MAP_ and the cache block files? I'm less concerned about the case of
> them preexisting when we don't expect them (this code block) as I am about the
> 'else' block below all this (for opening an existing cache directory) where
> we'd implicitly assume they're present, but if any are missing, we might behave
> badly. Perhaps I'm being too paranoid--I'm not sure what would delete those
> directories (and if it did, it could always delete them during operation, after
> we've checked them at startup, and still break things).
We don't need to do it. Any missing subdirectory would be created.
Reporter | ||
Comment 13•14 years ago
|
||
> Why not to use Cache/0/1/2/34567d01 instead of Cache/0/12/34567d01 ?
a 2-level directory tree instead of 3 level means fewer filesystem directory inodes to traverse, and it seems like 256 subdirs is still small enough to avoid perf problems. Keeping the top-level to 16 dirs makes it more likely that the inodes for the first directory is in the OS cache.
I just looked at how squid does this, and they use this same scheme (16 parent, with 256 leaf dirs), so that's a good sign.
Comment 14•14 years ago
|
||
Blocking, and I think this needs to happen ASAP, to get some mileage on this before we release.
blocking2.0: ? → beta9+
Assignee | ||
Comment 15•14 years ago
|
||
Here is some statistics from my cache that occupies 639989 KB (max size is 640MB). Average count of files per subdirectory is 169. If the cache size were 1GB then this the count would be around 444. So I agree that 16*256 subdirectories should be used.
There are few interesting things that I'd like to investigate later:
- high number of files with zero size
- there should be 43381 separate files on the disk but 2 are missing
- a lot of small files are stored as separate files, although there is quite a lot of free blocks
metadata data both
------------------------------------------
count 58592 58592 117184
average 668.13 9289.15 4978.64
Records grouped by block files limits:
------------------------------------------
0 0 3091 3091
1-768 50327 16797 67124
769-3072 5641 14961 20602
3073-16384 2624 16632 19256
16385- 0 7111 7111
Records grouped by blocks usage in block files:
------------------------------------------
1-256 351 10289 10640
257-512 39916 4376 44292
513-768 10060 2132 12192
769-1024 3454 2002 5456
1025-2048 1944 7613 9557
2049-3072 243 5346 5589
3073-4096 1575 3542 5117
4097-8192 1049 7706 8755
8193-12288 0 3487 3487
12289-16384 0 1897 1897
Number of blocks needed to fit all small files into block files:
------------------------------------------
256 110355 25436 135791
1024 8071 33260 41331
4096 3673 37003 40676
Where are actually stored files of size 1-768:
------------------------------------------
separate file 8570 3347 11917
_CACHE_001_ 41305 13435 54740
_CACHE_002_ 299 9 308
_CACHE_003_ 153 6 159
Where are actually stored files of size 769-3072:
------------------------------------------
separate file 3167 5845 9012
_CACHE_002_ 2367 8535 10902
_CACHE_003_ 107 581 688
Where are actually stored files of size 3073-16384:
------------------------------------------
separate file 1752 13589 15341
_CACHE_003_ 872 3041 3913
Bitmap of _CACHE_000_:
------------------------------------------
free bits : 20419
free continuous blocks (1) : 46
free continuous blocks (2) : 217
free continuous blocks (3) : 34
free continuous blocks (4+) : 881
Bitmap of _CACHE_001_:
------------------------------------------
free bits : 9768
free continuous blocks (1) : 14
free continuous blocks (2) : 19
free continuous blocks (3) : 28
free continuous blocks (4+) : 387
Bitmap of _CACHE_002_:
------------------------------------------
free bits : 0
free continuous blocks (1) : 0
free continuous blocks (2) : 0
free continuous blocks (3) : 0
free continuous blocks (4+) : 0
Assignee | ||
Comment 16•14 years ago
|
||
Attachment #480901 -
Attachment is obsolete: true
Attachment #495003 -
Flags: superreview?(bzbarsky)
Attachment #480901 -
Flags: review?(jduell.mcbugs)
Comment 17•14 years ago
|
||
Comment on attachment 495003 [details] [diff] [review]
patch v3 - now using 16*256 subdirectories
Please add some comments explaining that the shifts by 28 and 20 and the 0xFFFFF mask partition the hashnumber, just so that's clear?
sr=me.
Attachment #495003 -
Flags: superreview?(bzbarsky) → superreview+
Assignee | ||
Comment 18•14 years ago
|
||
Attachment #495003 -
Attachment is obsolete: true
Assignee | ||
Updated•14 years ago
|
Keywords: checkin-needed
Comment 19•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
|
Keywords: checkin-needed
You need to log in
before you can comment on or make changes to this bug.
Description
•