Closed Bug 1130591 Opened 10 years ago Closed 9 years ago

docker-worker: Runs out of disk-space

Categories

(Taskcluster :: Workers, defect)

Platform: All / Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jonasfj, Unassigned)


Details

kgrandon reported disk-space issues here:
https://tools.taskcluster.net/task-inspector/#4Wy1Uh6XQ--DMWEHlf2eIw/1

For some reason, docker-worker is running out of disk space on /mnt. I suspect a misconfiguration introduced when jlal reconfigured all workerTypes after the workerTypes Azure table was dropped.

I ssh'ed in and dumped some data here:
https://gist.github.com/jonasfj/581d8356410c35fb2d39

Potential issues:
1) The number of caches in /mnt/var/cache/docker-worker/gaia-misc-caches is alarming. Shouldn't there be at most <capacity> instances of any named cache?
2) `docker ps` and `docker ps -a` don't show the same number of containers, implying that some halted containers are just sitting around. As far as I know, they should always be deleted fairly quickly.
3) `docker ps` shows a lot of running logserve containers. Somehow they aren't being killed.

instance-id: i-ff57e4f1

Note: this seems to be happening on multiple instances, not just this one. See the other runs of the task.
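The three symptoms above can be checked mechanically on an affected host. The following is a minimal sketch, not docker-worker's own tooling: the helper names (`overfull_caches`, `halted_containers`, `running_image_counts`, `survey`) are hypothetical, and it assumes (hypothetically) that each named cache is a directory under /mnt/var/cache/docker-worker containing one subdirectory per cache instance. The docker calls use only the standard `docker ps` flags `-q`, `-a` and `--format`.

```python
import os
import subprocess
from collections import Counter

# Cache root quoted in this bug; the <name>/<instance> layout below is an assumption.
CACHE_ROOT = "/mnt/var/cache/docker-worker"


def overfull_caches(cache_root, capacity):
    """Return {cache_name: instance_count} for every named cache holding more
    instance directories than the worker's capacity (issue 1).

    Hypothetical layout: <cache_root>/<cache_name>/<instance_dir>.
    """
    overfull = {}
    for name in sorted(os.listdir(cache_root)):
        path = os.path.join(cache_root, name)
        if not os.path.isdir(path):
            continue
        with os.scandir(path) as entries:
            count = sum(1 for entry in entries if entry.is_dir())
        if count > capacity:
            overfull[name] = count
    return overfull


def halted_containers(all_ids, running_ids):
    """IDs listed by `docker ps -a` but not by `docker ps`, i.e. stopped
    containers that are still lying around (issue 2)."""
    return sorted(set(all_ids) - set(running_ids))


def running_image_counts(images):
    """Count running containers per image, e.g. to spot piled-up
    logserve containers (issue 3)."""
    return Counter(images)


def _docker_lines(*args):
    # Run a docker CLI command and return its non-empty output lines.
    out = subprocess.run(["docker", *args], capture_output=True,
                         text=True, check=True).stdout
    return [line for line in out.splitlines() if line.strip()]


def survey(capacity, cache_root=CACHE_ROOT):
    """Collect all three diagnostics from the local docker daemon."""
    running = _docker_lines("ps", "-q")
    return {
        "overfull_caches": overfull_caches(cache_root, capacity),
        "halted_containers": halted_containers(_docker_lines("ps", "-aq"), running),
        "running_per_image": running_image_counts(
            _docker_lines("ps", "--format", "{{.Image}}")).most_common(),
    }
```

On an affected instance, `print(survey(capacity=N))` (with N set to the workerType's configured capacity) would report all three symptoms at once; an empty `overfull_caches` dict and an empty `halted_containers` list would indicate a healthy host.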
Blocks: 1076681
Correction: my previous note appears to have been wrong. This is not happening on all instances. I feared it was one of those "everything is broken" situations, but I had misread the other runs, which seem to have still been running at the time.
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
This has been resolved on multiple levels: reducing the number of lingering containers, better garbage-collection checks around volumes, and handling of stalled containers.
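The container side of that cleanup can be pictured as a selection policy run by a periodic garbage-collection pass. This is a minimal sketch under stated assumptions, not the actual docker-worker fix: the `Container` record, `containers_to_remove`, and the `exit_grace` / `max_runtime` thresholds are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Container:
    # Hypothetical record; fields loosely mirror what `docker inspect` reports.
    id: str
    running: bool
    started_at: float              # epoch seconds
    finished_at: Optional[float]   # epoch seconds, None while running


def containers_to_remove(containers: List[Container], now: float,
                         exit_grace: float, max_runtime: float) -> List[str]:
    """Pick the containers a GC pass would delete:
    - stopped containers that exited more than `exit_grace` seconds ago
      (the "lingering" ones), and
    - running containers older than `max_runtime` seconds (treated as stalled).
    """
    doomed = []
    for c in containers:
        if not c.running:
            # Only reap stopped containers once their grace period has passed.
            if c.finished_at is not None and now - c.finished_at > exit_grace:
                doomed.append(c.id)
        elif now - c.started_at > max_runtime:
            doomed.append(c.id)
    return doomed
```

The grace period is a design choice: it presumably leaves a short window after exit for any post-run work before the container is reaped, while the runtime cap catches containers (such as a leftover logserve) that never exit on their own.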
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: Docker-Worker → Workers