Bug 412816
Opened 17 years ago
Closed 17 years ago
add nrpe daemon to production grade tinderboxen and other important build machines
Categories: Release Engineering :: General, defect, P3
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum, Assigned: bhearsum
Attachments (1 file): (deleted) patch, rhelmer: review+
Since we're moving to automation for nightly/depend builds, we don't need to worry about fx-win32-tbox, fx-linux-tbox, and bm-xserve08. I _think_ the following list covers all the machines the nrpe daemon should be added to (reference platforms are excluded; see bug #412443):
fxdbug-linux-tbox
l10n-linux-tbox
staging-try1-linux-slave
moz2-win32-slave1
staging-try1-win32-slave
staging-try-master
try-master
l10n-win32-tbox
egg
try1-linux-slave
try1-win32newref-slave
fxdbug-win32-tbox
xr-win32-tbox
karma
fxexp-win32-tbox
moz2-linux-slave1
tbnewref-win32-tbox
xr-linux-tbox
tb-linux-tbox
bm-xserve01
bm-xserve02
bm-xserve07
bm-xserve11
bm-xserve12
bm-xserve15
bl-bldlnx01
bl-bldlnx03
bl-bldxp01
balsa-18branch
argo-vm
crazyhorse
patrocles
pacifica-vm
prometheus-vm
I *think* that's all of them. If someone could verify this that'd be great.
Updated•17 years ago (Assignee)
Assignee: nobody → bhearsum
Priority: -- → P2
Updated•17 years ago (Assignee)
Priority: P2 → P3
Comment 1•17 years ago
These look good for a start; we should get this going. Do you need help? I saw that you added a Win32 setup doc (http://wiki.mozilla.org/Build:Nagios:Win32); are there equivalents for Linux and OS X? Also, can we get the configs checked in if they aren't already? They'd be a good starting point if nothing else. (I understand that different machines have different partitions, thresholds, etc.)
I'll cross-check the list against the Build:Farm page and try to dig up any machines that aren't listed on Build:Farm while I'm at it.
Comment 2•17 years ago (Assignee)
I'm pretty caught up in other things right now, so anything you want to do would be appreciated :-).
Comment 3•17 years ago (Assignee)
These are basically the same as the configs currently used on the release automation machines. I've added some header comments to them. They should "just work" when checked out, but the Mac/Linux configs do need to be adjusted for the number of Buildbot processes running.
I propose checking them into mozilla/tools/nagios. If we want to start pushing things to hg we could create hg.mozilla.org/build/software-configs or misc-configs (or something) and put them there.
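Comment 3 notes that the Mac/Linux configs need adjusting for the number of Buildbot processes on each machine. As a rough illustration only, the Python sketch below renders nrpe.cfg `command[...]` lines for disk, load, user, and process checks and widens the `check_procs` thresholds for each Buildbot process. The plugin directory, every threshold value, and the per-process headroom rule are hypothetical assumptions, not the values in the checked-in configs.

```python
# Hypothetical sketch: render nrpe.cfg "command[...]" lines for the disk,
# load, user, and process checks, scaling the process-count thresholds
# with the number of Buildbot processes on the host.
# ASSUMPTIONS: plugin directory, all threshold values, and the
# per-Buildbot-process headroom are illustrative, not the real configs.

PLUGIN_DIR = "/usr/lib/nagios/plugins"  # assumed plugin install location


def nrpe_commands(buildbot_procs, base_warn=150, base_crit=200, per_proc=10):
    """Return nrpe.cfg command definitions as a list of strings.

    buildbot_procs: number of Buildbot processes expected on the host.
    per_proc: assumed extra process headroom per Buildbot process.
    """
    extra = buildbot_procs * per_proc
    # Insertion order is preserved (Python 3.7+), so output order is stable.
    checks = {
        "check_users": "check_users -w 5 -c 10",
        "check_load": "check_load -w 15,10,5 -c 30,25,20",
        "check_disk": "check_disk -w 20% -c 10% -p /",
        "check_total_procs": "check_procs -w %d -c %d"
        % (base_warn + extra, base_crit + extra),
    }
    return ["command[%s]=%s/%s" % (name, PLUGIN_DIR, cmd)
            for name, cmd in checks.items()]


if __name__ == "__main__":
    # Example: a machine running two Buildbot processes.
    for line in nrpe_commands(buildbot_procs=2):
        print(line)
```

Generating the lines from one function keeps the Mac and Linux variants identical except for the Buildbot-dependent thresholds, which is the only per-machine difference comment 3 calls out.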
Attachment #306023 - Flags: review?(rhelmer)
Updated•17 years ago
Attachment #306023 - Flags: review?(rhelmer) → review+
Comment 4•17 years ago (Assignee)
Comment on attachment 306023
[checked in] basic nagios configs
RCS file: /cvsroot/mozilla/tools/nagios/NSC.ini,v
done
Checking in NSC.ini;
/cvsroot/mozilla/tools/nagios/NSC.ini,v <-- NSC.ini
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/nagios/nrpe-linux.cfg,v
done
Checking in nrpe-linux.cfg;
/cvsroot/mozilla/tools/nagios/nrpe-linux.cfg,v <-- nrpe-linux.cfg
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/nagios/nrpe-mac.cfg,v
done
Checking in nrpe-mac.cfg;
/cvsroot/mozilla/tools/nagios/nrpe-mac.cfg,v <-- nrpe-mac.cfg
initial revision: 1.1
done
Attachment #306023 - Attachment description: basic nagios configs → [checked in] basic nagios configs
Comment 5•17 years ago (Assignee)
Alright, so far I've got the nrpe daemon running on the following machines:
fxdbug-linux-tbox
l10n-linux-tbox
argo
fx-linux-tbox
tb-linux-tbox
try1-linux-slave
try2-linux-slave
try-master
staging-master
moz2-linux-slave1
xr-linux-tbox
staging-try-master
staging-try1-linux-slave
prometheus-vm
cerberus-vm
fxdbug-win32-tbox
fx-win32-tbox
moz2-win32-slave1
l10n-win32-tbox
tbnewref-win32-tbox
I think most of the Macs were already running it, but I updated nrpe.cfg on the following machines to add disk, load, user, and process monitors:
bm-xserve01
bm-xserve02
bm-xserve04
bm-xserve07
bm-xserve08
bm-xserve11
bm-xserve12
bm-xserve15
I've got additional Windows boxes to put it on tomorrow, and once all of the new Macs come up, I'll check their nrpe.cfg to make sure it is in line with the rest.
Comment 6•17 years ago (Assignee)
I managed to get the nrpe daemon going on balsa-18branch (Red Hat 7.2) today. crazyhorse and the Red Hat 8.0 machines have been troublesome; I'm still working on those.
I also finished up with the Windows machines. In addition to the ones listed in comment #5, the nrpe daemon is running on:
try1-win32-slave
try2-win32-slave
xr-win32-tbox
staging-try1-win32-slave
pacifica-vm
bm-xserve16 is also ready to go.
I'm about to file an IT bug to get monitoring on these going.
Comment 7•17 years ago
Ben, since IT added the checks in bug 430246, it looks like all the MSYS machines are failing the PING test. Some sort of firewall or MSYS issue on the ref platform?
Depends on: 420346
Comment 8•17 years ago (Assignee)
(In reply to comment #7)
> Ben, since IT added the checks in bug 430246, it looks like all the MSYS
> machines are failing the PING test. Some sort of firewall or MSYS issue on the
> ref platform ?
>
Yep, those VMs were blocking ICMP. I enabled it on them and on the reference VM itself.
Comment 9•17 years ago (Assignee)
Alright, I've got all this monitoring stuff sorted out now. avg_load notifications have been disabled for Windows machines; we should figure out a good way to monitor load on Windows at some point in the future.
Comment 10•17 years ago (Assignee)
Hit enter too soon.
egg, karma, and crazyhorse are not being monitored. I can't find working packages for karma and egg (Red Hat 8), and rpm just hangs on crazyhorse. I bet we could get crazyhorse working after a reboot, but I'm not sure about the other two.
Comment 11•17 years ago (Assignee)
Does anyone feel strongly that egg, karma, and crazyhorse should be monitored? I'm inclined to just let them be.
Comment 12•17 years ago (Assignee)
I'm going to take the lack of response as silent agreement.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering