Closed
Bug 420346
Opened 17 years ago
Closed 17 years ago
new build machines to monitor with nagios
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: mrz)
I've been adding the nrpe daemon to a number of build machines over the past couple of days, and I'd like to get nagios monitoring them. Aravind, these use the same configs as the others, so check_load, check_disk, etc. with the same arguments should work fine. I'd like to be around when these are brought up so I can deal with any failures, which generally means before 1 or 2pm PST. Let me know a good day/time for you, though.
The try server machines are all in the sandbox network, so I don't know whether you'll be able to monitor them. If you can, please do.
Here's the list:
(machine) - (disks to watch) - (watch check_buildbot?)
Linux:
These machines should all respond to the following checks:
check_load -w $ARG1$ -c $ARG2$
check_users -w $ARG1$ -c $ARG2$
check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
fxdbug-linux-tbox - /, /builds - no
l10n-linux-tbox - /, /builds - no
argo - / - no
balsa-18branch - /, /builds - no
fx-linux-tbox - /, /builds - no
tb-linux-tbox - /, /builds - no
sm-try1-linux-slave - /, /builds - yes
sm-try2-linux-slave - /, /builds - yes
sm-try-master - /, /builds - yes
staging-master - /, /builds - yes
xr-linux-tbox - /, /builds - no
sm-staging-try-master - / - yes
sm-staging-try1-linux-slave - / - yes
prometheus-vm - / - no
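For illustration only, here is a minimal Python sketch of how entries in the "(machine) - (disks to watch) - (watch check_buildbot?)" format above could be expanded into a per-host list of checks. The names `Machine`, `parse_machine_line`, and `checks_for` are hypothetical and not part of nagios or NRPE; the check names are the ones listed in this bug, and no thresholds are assumed.

```python
from typing import List, NamedTuple


class Machine(NamedTuple):
    """One entry from the list in this bug (hypothetical helper type)."""
    name: str
    disks: List[str]
    watch_buildbot: bool


def parse_machine_line(line: str) -> Machine:
    """Parse a line such as 'sm-try-master - /, /builds - yes'."""
    # Split on " - " (with surrounding spaces) so that hyphens inside
    # hostnames like 'sm-try1-linux-slave' are left alone.
    name, disks, buildbot = [part.strip() for part in line.split(" - ")]
    disk_list = [disk.strip() for disk in disks.split(",")]
    return Machine(name, disk_list, buildbot.lower() == "yes")


def checks_for(machine: Machine) -> List[str]:
    """List the checks requested for one Linux machine."""
    checks = ["check_load", "check_users", "check_procs"]
    # One check_disk entry per watched partition.
    checks += [f"check_disk {disk}" for disk in machine.disks]
    if machine.watch_buildbot:
        checks.append("check_buildbot")
    return checks
```

Calling `checks_for(parse_machine_line("argo - / - no"))` would give the load, users, and procs checks plus a single disk check for `/`, with no buildbot check.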
Windows:
These machines should all respond to the following checks:
check_load=inject checkCPU warn=$ARG1$ crit=$ARG2$ time=1m time=5m time=15m
check_disk=inject CheckDriveSize MinWarnFree=$ARG1$ MinCritFree=$ARG2$ Drive=$ARG3$
check_procs=inject checkCounter "Counter=\Objects\Processes" ShowAll MaxWarn=$ARG1$ MaxCrit=$ARG2$
cerberus-vm - C - no
fxdbug-win32-tbox - C, D, E - no
fx-win32-tbox - C, D, E - no
moz2-win32-slave1 - C, D, E - yes
l10n-win32-tbox - C, D, E - no
patrocles - C, E - no
tbnewref-win32-tbox - C, D, E - no
sm-try1-win32-slave - C, D, E - yes
sm-try2-win32-slave - C, D, E - yes
xr-win32-tbox - C, D, E - no
sm-staging-try1-win32-slave - C, D - yes
pacifica-vm - C - no
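The `$ARG1$`, `$ARG2$`, `$ARG3$` placeholders in the check definitions above are filled in from the arguments the monitoring side sends with each check. The Python sketch below only illustrates that substitution; it is not how NRPE or the Windows agent implements it, `expand_macros` is a hypothetical name, and the threshold values in the usage note are made up for the example.

```python
import re
from typing import Sequence


def expand_macros(template: str, args: Sequence[str]) -> str:
    """Replace $ARGn$ placeholders with the n-th argument (1-based)."""

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1
        if index < 0 or index >= len(args):
            raise ValueError(f"no value supplied for {match.group(0)}")
        return str(args[index])

    # The replacement is a function, so backslashes in the template
    # (e.g. "\Objects\Processes") and in the arguments pass through untouched.
    return re.sub(r"\$ARG(\d+)\$", substitute, template)
```

For example, `expand_macros("CheckDriveSize MinWarnFree=$ARG1$ MinCritFree=$ARG2$ Drive=$ARG3$", ["10%", "5%", "C:"])` fills the three placeholders in order. The values `10%`, `5%`, and `C:` are assumptions, not the thresholds actually used for these machines.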
Mac:
It looks like most of these are already monitored, but the following can be updated to do PING, RAID, avg_load, and root_partition checks. Where indicated, check_buildbot should be watched as well.
bm-xserve01 - / - no
bm-xserve08 - / - no
sm-xserve15 - / - yes
bm-xserve16 - / - no
A few more machines will trickle in next week, but this is the bulk of them.
Updated (Assignee) • 17 years ago
Assignee: server-ops → mrz
Comment 1 (Assignee) • 17 years ago
Whew. All added.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 2 (Reporter) • 17 years ago
The try server monitors are all red, I guess that's because they are in the sandbox network. Can you allow nagios (tcp/5666) through that firewall?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3 (Reporter) • 17 years ago
Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and bm-xserve07. Can you add them, too?
Comment 4 (Assignee) • 17 years ago
(In reply to comment #2)
> The try server monitors are all red, I guess that's because they are in the
> sandbox network. Can you allow nagios (tcp/5666) through that firewall?
>
Probably. Punched a hole through for icmp and tcp/5666.
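One quick way to confirm that the firewall change took effect is to try a plain TCP connection to the NRPE port from the monitoring host. The Python sketch below assumes the port discussed in this bug (tcp/5666). The function name `nrpe_port_reachable` is hypothetical, and the check only shows that something accepts connections on that port. It does not speak the NRPE protocol or test ICMP.

```python
import socket


def nrpe_port_reachable(host: str, port: int = 5666, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects, raising OSError
        # (refused, timed out, unreachable, DNS failure) on any problem.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `nrpe_port_reachable("sm-try-master")` from the nagios server would be one use, assuming that hostname resolves from there. A `False` result cannot distinguish a firewall drop from a stopped nrpe daemon.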
Comment 5 (Assignee) • 17 years ago
(In reply to comment #3)
> Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> bm-xserve07. Can you add them, too?
buildbot checks too?
Comment 6 (Reporter) • 17 years ago
(In reply to comment #5)
> (In reply to comment #3)
> > Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> > bm-xserve07. Can you add them, too?
>
> buildbot checks too?
>
Only for the moz2 slave.
Comment 7 (Assignee) • 17 years ago
(In reply to comment #3)
> Sorry, I missed a couple of machines in the first list: moz2-linux-slave1 and
> bm-xserve07. Can you add them, too?
>
added
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard