Closed Bug 420346 Opened 17 years ago Closed 17 years ago

new build machines to monitor with nagios

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: mrz)

I've been adding the nrpe daemon to a number of build machines the past couple of days. I'd like to get nagios monitoring them. Aravind, these are using the same configs that the others are using, so check_load, check_disk, etc. with the same arguments should work fine.

I'd like to be around when these are brought up to deal with any failures. This generally means before 1 or 2pm PST. Let me know a good day/time for you, though.

The try server machines are all in the sandbox network, I don't know if you'll be able to monitor them or not. If you can, please do.

Here's the list:
(machine) - (disks to watch) - (watch check_buildbot?)

Linux:
These machines should all respond to the following checks:
check_load -w $ARG1$ -c $ARG2$
check_users -w $ARG1$ -c $ARG2$
check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

fxdbug-linux-tbox - /, /builds - no
l10n-linux-tbox - /, /builds - no
argo - / - no
balsa-18branch - /, /builds - no
fx-linux-tbox - /, /builds - no
tb-linux-tbox - /, /builds - no
sm-try1-linux-slave - /, /builds - yes
sm-try2-linux-slave - /, /builds - yes
sm-try-master - /, /builds - yes
staging-master - /, /builds - yes
xr-linux-tbox - /, /builds - no
sm-staging-try-master - / - yes
sm-staging-try1-linux-slave - / - yes
prometheus-vm - / - no

Windows:
These machines should all respond to the following checks:
check_load=inject checkCPU warn=$ARG1$ crit=$ARG2$ time=1m time=5m time=15m
check_disk=inject CheckDriveSize MinWarnFree=$ARG1$ MinCritFree=$ARG2$ Drive=$ARG3$
check_procs=inject checkCounter "Counter=\Objects\Processes" ShowAll MaxWarn=$ARG1$ MaxCrit=$ARG2$

cerberus-vm - C - no
fxdbug-win32-tbox - C, D, E - no
fx-win32-tbox - C, D, E - no
moz2-win32-slave1 - C, D, E - yes
l10n-win32-tbox - C, D, E - no
patrocles - C, E - no
tbnewref-win32-tbox - C, D, E - no
sm-try1-win32-slave - C, D, E - yes
sm-try2-win32-slave - C, D, E - yes
xr-win32-tbox - C, D, E - no
sm-staging-try1-win32-slave - C, D - yes
pacifica-vm - C - no

Mac:
Most of these are already monitored it looks like, but the following can be updated to do PING, RAID, avg_load, and root_partition checks. Where indicated, check_buildbot should be watched as well.

bm-xserve01 - / - no
bm-xserve08 - / - no
sm-xserve15 - / - yes
bm-xserve16 - / - no

A few more machines will trickle in next week, but this is the bulk of them.
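If it helps whoever ends up scripting the config, the "(machine) - (disks to watch) - (watch check_buildbot?)" entries above can be turned into a per-host list of checks mechanically. This is only a rough sketch for the Linux entries, assuming Python 3.9+. The `Machine` class and the `parse_entry` / `planned_checks` helpers are hypothetical names, not part of nagios or nrpe, and the warning/critical thresholds are left out on purpose since they should match whatever the already-monitored machines use.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """One entry from the list (hypothetical helper type)."""
    name: str
    disks: list[str]
    watch_buildbot: bool


def parse_entry(line: str) -> Machine:
    """Parse one '(machine) - (disks to watch) - (watch check_buildbot?)' entry."""
    # Split on ' - ' (with surrounding spaces) so hyphens inside
    # hostnames such as 'sm-try1-linux-slave' are left alone.
    name, disks, buildbot = (part.strip() for part in line.split(" - "))
    return Machine(
        name=name,
        disks=[d.strip() for d in disks.split(",")],
        watch_buildbot=(buildbot.lower() == "yes"),
    )


def planned_checks(machine: Machine) -> list[str]:
    """List the checks to set up for a Linux machine (thresholds omitted)."""
    checks = ["check_load", "check_users", "check_procs"]
    # One check_disk per watched partition, using the -p argument
    # from the check_disk command line quoted above.
    checks += [f"check_disk -p {disk}" for disk in machine.disks]
    if machine.watch_buildbot:
        checks.append("check_buildbot")
    return checks


if __name__ == "__main__":
    entries = [
        "fxdbug-linux-tbox - /, /builds - no",
        "sm-try1-linux-slave - /, /builds - yes",
        "argo - / - no",
    ]
    for entry in entries:
        machine = parse_entry(entry)
        print(machine.name, planned_checks(machine))
```

The Windows entries would need a different check list (no check_users, drive letters instead of mount points), so they are not covered here.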
Assignee: server-ops → mrz
Whew. All added.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
The try server monitors are all red, I guess that's because they are in the sandbox network. Can you allow nagios (tcp/5666) through that firewall?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and bm-xserve07. Can you add them, too?
(In reply to comment #2)
> The try server monitors are all red, I guess that's because they are in the
> sandbox network. Can you allow nagios (tcp/5666) through that firewall?

Probably, punched a hole through for icmp and tcp/5666.
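A quick way to confirm the hole from the nagios side is a plain TCP connect to port 5666 on each sandbox machine. A minimal sketch, assuming Python 3 is available on the monitoring host; `nrpe_port_open` is a hypothetical helper name, and it only tests that the port accepts connections, not that nrpe answers checks correctly.

```python
import socket


def nrpe_port_open(host: str, port: int = 5666, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Hostnames taken from the try server entries in the list above.
    for host in ("sm-try1-linux-slave", "sm-try2-linux-slave", "sm-try-master"):
        state = "open" if nrpe_port_open(host) else "unreachable"
        print(f"{host}: tcp/5666 {state}")
```

A machine that still shows as unreachable after the firewall change is either not running nrpe or is blocked somewhere else along the path.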

(In reply to comment #3)
> Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> bm-xserve07. Can you add them, too?

buildbot checks too?

(In reply to comment #5)
> (In reply to comment #3)
> > Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> > bm-xserve07. Can you add them, too?
>
> buildbot checks too?

Only for the moz2 slave.

(In reply to comment #3)
> Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> bm-xserve07. Can you add them, too?

added
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard