Closed Bug 1850750 Opened 1 year ago Closed 1 year ago

consider emitting metrics for file sizes in disk cache

Categories

(Socorro :: Processor, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

We're trying to figure out what's going on with processors and why they slow down and then never speed back up again. Dashboard graphs suggest that disk read/write times increase and then throughput drops.

What if it's related to the size of files in the disk cache?

This bug covers looking at emitting metrics for file sizes in disk cache.

I think I have code for emitting things every second-ish from the disk cache managers. Emitting the median and 95th-percentile file sizes might be what we want. I'm pretty sure they're easy to compute from data the disk cache manager is already tracking.

Assignee: nobody → willkg
Status: NEW → ASSIGNED

Between those two PRs, I added:

  • processor.cache_manager.files.count
  • processor.cache_manager.files.gt_500
  • processor.cache_manager.file_sizes.avg
  • processor.cache_manager.file_sizes.median
  • processor.cache_manager.file_sizes.ninety_five
  • processor.cache_manager.file_sizes.max
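Here's a rough sketch of how gauges like these could be computed from a list of tracked file sizes. It's illustrative, not the code from the PRs. Assumptions: sizes are in bytes, "gt_500" means "larger than 500 MB" (the bug doesn't state the unit), the 95th percentile uses the nearest-rank method, and `file_size_metrics` is a hypothetical helper name. The keys are the metric names above without the "processor.cache_manager." prefix.

```python
import statistics
from typing import Dict, Iterable

# ASSUMPTION: the unit behind "gt_500" isn't stated in the bug; megabytes assumed here.
GT_500_THRESHOLD_BYTES = 500 * 1024 * 1024


def file_size_metrics(sizes: Iterable[int]) -> Dict[str, float]:
    """Hypothetical helper: compute cache-manager gauges from file sizes in bytes."""
    ordered = sorted(sizes)
    count = len(ordered)
    if count == 0:
        # Emit zeros rather than skipping the gauges when the cache is empty.
        return {
            "files.count": 0,
            "files.gt_500": 0,
            "file_sizes.avg": 0,
            "file_sizes.median": 0,
            "file_sizes.ninety_five": 0,
            "file_sizes.max": 0,
        }
    # Nearest-rank 95th percentile: ceil(0.95 * count) - 1 as a 0-based index,
    # done in integer arithmetic to avoid floating-point rounding surprises.
    p95_index = (95 * count + 99) // 100 - 1
    return {
        "files.count": count,
        "files.gt_500": sum(1 for size in ordered if size > GT_500_THRESHOLD_BYTES),
        "file_sizes.avg": statistics.mean(ordered),
        "file_sizes.median": statistics.median(ordered),
        "file_sizes.ninety_five": ordered[p95_index],
        "file_sizes.max": ordered[-1],
    }
```

Sorting once per heartbeat is cheap for the number of files a disk cache typically holds. If that ever became a problem, the manager could keep a sorted structure instead.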

These give a rough signal. Beyond that, I'll need to get a directory listing of what's in the disk cache, though that's harder to track over time.
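For a one-off look at the disk cache, something like the following could list the largest files. It's a minimal sketch: `largest_cache_files` is a hypothetical helper, and the cache directory path is whatever the processor is configured to use (not specified in this bug).

```python
import os
from typing import List, Tuple


def largest_cache_files(cache_dir: str, limit: int = 20) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (size_in_bytes, path) for the largest files under cache_dir."""
    entries: List[Tuple[int, str]] = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The cache manager may evict a file between listing and stat; skip it.
                continue
            entries.append((size, path))
    # Largest first.
    entries.sort(reverse=True)
    return entries[:limit]
```

Unlike the gauges, this is a point-in-time snapshot, which is why it's harder to use for seeing trends over time.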

Also, I changed the heartbeat interval from 1s to 60s because 1s is probably not helpful and it's incredibly spammy in a local development environment.
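A heartbeat like this can be implemented as a throttle that a busy loop calls often but that only fires its callback once per interval. This is a sketch under assumptions: the `Heartbeat` class and its `tick()` method are hypothetical, not Socorro's actual API. It uses `time.monotonic` so wall-clock adjustments don't affect the interval, and it fires on the first call.

```python
import time
from typing import Callable, Optional


class Heartbeat:
    """Hypothetical throttle: run callback at most once per `interval` seconds."""

    def __init__(
        self,
        callback: Callable[[], None],
        interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.callback = callback
        self.interval = interval
        self.clock = clock
        self._last: Optional[float] = None

    def tick(self) -> bool:
        """Call frequently; returns True when the callback actually ran."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            self.callback()
            return True
        return False
```

Going from 1s to 60s is then just a change to `interval`; the injectable `clock` makes the throttle easy to test without sleeping.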

I deployed this to prod just now in bug #1851648. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED