Closed Bug 589006 Opened 14 years ago Closed 14 years ago

Adjust nagios settings for disk usage alerts for masters.

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: x86 macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: jabba)

Details

Currently, nagios alerts on disk space when slaves and masters run out of space. Slaves routinely get close to disk capacity during builds, because slaves have limited disk space so that more slaves fit on the same shared SAN. This means the nagios settings for *slaves* are correctly set to error at 100% full.

However, masters should be treated differently in nagios, because when a master runs out of space, it takes down all the slaves and closes the tree. Masters should warn at 90% and go critical at 95% disk usage. This should help us avoid surprise outages like the one a few weeks ago, when a master ran out of space and crashed, closing the tree.

The masters are:
================
production-master
production-master01
production-master02
production-master03
test-master01
test-master02
talos-master02
talos-master
production-mobile-master
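As a minimal sketch of the thresholds requested above (not the nagios configuration actually deployed for these hosts), the hypothetical Python function `disk_status` maps a percent-used value to the standard nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The role names and the `THRESHOLDS` table are assumptions made for illustration.

```python
# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical per-role thresholds, in percent of disk used, taken from the
# bug description. None means "no warning level" for that role.
THRESHOLDS = {
    "master": (90.0, 95.0),  # warn at 90%, critical at 95%
    "slave": (None, 100.0),  # only error when the disk is completely full
}


def disk_status(used_percent: float, role: str) -> int:
    """Map a disk-usage percentage to a nagios exit code for the given role."""
    warn, crit = THRESHOLDS[role]
    if used_percent >= crit:
        return CRITICAL
    if warn is not None and used_percent >= warn:
        return WARNING
    return OK
```

For example, `disk_status(92.0, "master")` returns 1 (WARNING), while the same usage on a slave returns 0 (OK). Splitting thresholds by role reflects the point of this bug: a nearly full slave is routine, but a nearly full master can close the tree.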
I think we already did this in bug 574537 for the /builds partitions, with the possible exception of test-master02.b.m.o. Is this a dupe, or are you talking about the other partitions?
Assignee: server-ops → jdow
All of those are indeed already set up properly for /builds. Please let me know if you need this check for / as well.
(In reply to comment #2)
> All of those are indeed already set up properly for /builds. Please let me know
> if you need this check for / as well.

Checking "/" also would be great, yes please. Not having that has bitten us recently during buildbot master reconfigs.
Added / in addition to /builds.
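A minimal sketch of checking both partitions on a master, assuming a standalone Python plugin rather than whatever check was actually deployed: the hypothetical `check_paths` function measures each mount point with `shutil.disk_usage`, applies the 90%/95% master thresholds, and reports the worst result, so a full / raises an alert even when /builds is fine. A path that cannot be read is reported as UNKNOWN (exit code 3).

```python
import shutil

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Severity used to pick the "worst" result; ranking UNKNOWN below WARNING
# is an assumption made for this sketch.
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}

# Master thresholds from this bug: warn at 90% used, critical at 95% used.
WARN_PCT, CRIT_PCT = 90.0, 95.0


def used_percent(path):
    """Percent of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_paths(paths):
    """Return (exit_code, message) reflecting the worst-off path in `paths`."""
    worst, details = OK, []
    for path in paths:
        try:
            pct = used_percent(path)
        except OSError as exc:
            # Missing mount point, permission problem, etc.
            code, detail = UNKNOWN, f"{path}: {exc.strerror or exc}"
        else:
            if pct >= CRIT_PCT:
                code = CRITICAL
            elif pct >= WARN_PCT:
                code = WARNING
            else:
                code = OK
            detail = f"{path}: {pct:.1f}% used"
        if RANK[code] > RANK[worst]:
            worst = code
        details.append(detail)
    return worst, f"DISK {LABELS[worst]} - " + "; ".join(details)


def main(paths=("/", "/builds")):
    """Print a one-line status and return the exit code for the caller."""
    code, message = check_paths(paths)
    print(message)
    return code
```

Run as a plugin, a wrapper would call `sys.exit(main())` so nagios sees the exit code. Note that `shutil.disk_usage` counts reserved blocks in `total`, so the percentage can differ slightly from what `df` shows.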
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard