consider emitting metrics for file sizes in disk cache
Categories: Socorro :: Processor, task, P2
Tracking: Not tracked
People: Reporter: willkg, Assigned: willkg
References: Blocks 1 open bug
Attachments: 2 files
We're trying to figure out what's going on with processors and why they slow down and then never speed back up again. Dashboard graphs suggest that disk read/write times increase and then throughput drops.
What if it's related to the size of files in the disk cache?
This bug covers looking at emitting metrics for file sizes in disk cache.
Comment 1•1 year ago
I think I have code for emitting things every second-ish from the disk cache managers. Emitting the median and 95th-percentile file sizes might be what we want. I'm pretty sure that's easy to compute with data the disk cache manager is already tracking.
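For illustration only, here is a rough sketch of how the median and 95th-percentile sizes could be computed, assuming the cache manager can hand over a plain collection of file sizes in bytes. The function name `size_summary` is hypothetical and not taken from the Socorro code, and the nearest-rank percentile method is one choice among several.

```python
import statistics


def size_summary(sizes):
    """Return (median, p95) for file sizes in bytes, or None if empty.

    Hypothetical helper: `sizes` stands in for whatever per-file size
    data the disk cache manager already tracks.
    """
    ordered = sorted(sizes)
    if not ordered:
        return None
    median = statistics.median(ordered)
    # Nearest-rank 95th percentile: smallest value with at least 95% of
    # the files at or below it. Integer math avoids float rounding.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return median, p95
```

With only a handful of files in the cache, the 95th percentile computed this way is just the largest file, so it becomes more informative as the file count grows.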
Comment 5•1 year ago
Between those two PRs, I added:
processor.cache_manager.files.count
processor.cache_manager.files.gt_500
processor.cache_manager.file_sizes.avg
processor.cache_manager.file_sizes.median
processor.cache_manager.file_sizes.ninety_five
processor.cache_manager.file_sizes.max
Beyond that rough signal, I'll need to get a directory listing of what's in the disk cache. It's harder to see that over time, though.
Also, I changed the heartbeat interval from 1s to 60s because 1s is probably not helpful and it's incredibly spammy in a local development environment.
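The interval change can be pictured with a minimal sketch like the one below. It is not the actual Socorro implementation: the `Heartbeat` class, its `tick` method, and the injected `clock` are all hypothetical names, and the assumption is that the cache manager's main loop calls `tick()` frequently while the callback only fires once per interval.

```python
import time


class Heartbeat:
    """Call `callback` at most once every `interval` seconds.

    Hypothetical sketch: the callback is where gauges such as
    processor.cache_manager.files.count would be emitted.
    """

    def __init__(self, callback, interval=60.0, clock=time.monotonic):
        self.callback = callback
        self.interval = interval
        self.clock = clock  # injectable so tests can fake time
        self._last = None

    def tick(self):
        """Run the callback if the interval has elapsed; return whether it ran."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            self.callback()
            return True
        return False
```

Keeping the interval as a constructor argument means a local development environment could use a longer interval still, or a shorter one when debugging the metrics themselves.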
Comment 7•1 year ago
I deployed this to prod just now in bug #1851648. Marking as FIXED.