Bug 419506
Opened 17 years ago
Closed 17 years ago
add nagios to all unittest machines
Product/Component: Release Engineering :: General (defect, P2)
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: rcampbell
Assignee: Unassigned
We should be running nagios on all these machines, qm-rhel02, qm-centos5-01, etc. For a full list, see:
http://wiki.mozilla.org/Buildbot/IT_Unittest_Support_Document
and,
http://wiki.mozilla.org/Buildbot/Talos/Machines
Comment 1•17 years ago
Also see:
http://wiki.mozilla.org/Build:Nagios:Win32
Setting up Nagios on Mac and Linux is quite a bit easier. bhearsum, do we have docs for those too?
Not sure if our configs are in CVS or not, but we should make those available.
Should we get Nagios client into the ref platform, so we don't need to do this after-the-fact in the future?
Comment 2•17 years ago (Reporter)
That'd be ideal. Thanks for the pointers!
Comment 3•17 years ago
I'm collecting stock nagios configs for all the platforms. I'll be putting a patch in bug 412816 shortly.
Comment 4•17 years ago
With regard to Linux docs, I'll be adding the nrpe daemon to the ref platform today; docs for installing it will go on the ref platform page.
Our Macs come with nagios installed (afaik), so I don't have any plans to write docs there. All of the ones I've set up have literally been "drop in nrpe.cfg".
Comment 5•17 years ago
I'm setting up nagios on the rest of the build machines right now, do you want me to do these, too? You may have to hook me up with passwords, I don't think I know them (anymore).
With regards to Talos machines, I think we should be careful as it may affect the numbers. If there's a machine on each platform I can install it on as a test I'd be happy to do so.
Comment 6•17 years ago
(In reply to comment #5)
> I'm setting up nagios on the rest of the build machines right now, do you want
> me to do these, too? You may have to hook me up with passwords, I don't think I
> know them (anymore).
Ben, yes, that would be great if you could do that too. I'll send you the usernames/passwords that I know of offline, but I think I only have some of them.
> With regards to Talos machines, I think we should be careful as it may affect
> the numbers. If there's a machine on each platform I can install it on as a
> test I'd be happy to do so.
Excellent point. The buildbot masters would be good either way, but Alice would know best about whether we should touch the talos slave machines...
Comment 7•17 years ago
Rob's getting me a list of these machines and the passwords. I'll do this.
Assignee: build → bhearsum
Updated•17 years ago
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 8•17 years ago
To be clear, I'm going to be adding nagios to the following machines:
qm-rhel02
qm-centos5-01
qm-xserve01
qm-win2k3-01
Comment 9•17 years ago
Alright, Nagios is on them.
Comment 10•17 years ago
Anything left to do here or can we mark as FIXED?
Component: Testing → Release Engineering
Product: Core → mozilla.org
Version: Trunk → other
Updated•17 years ago
QA Contact: testing → release
Comment 11•17 years ago
If we still want Nagios on Talos machines, then sure. I'm not sure how viable this is going to be. We haven't even tested to see if/how much nagios is going to affect the numbers.
Comment 12•17 years ago (Reporter)
I'm inclined to leave nagios off the slaves for the reason Ben states. We can add it later if the need arises after we've tested it on the staging machines.
Comment 13•17 years ago
Alright, this is done then.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: add nagios to all unittest / talos machines → add nagios to all unittest machines
Comment 14•17 years ago
The following emails have been getting sent to build@moco for the last few days, since this was enabled. Reopening bug to track.
Subject: [build] ** PROBLEM alert - qm-xserve01/buildbot is WARNING **
Date: Tue, 18 Mar 2008 03:12:08 -0700 (PDT)
From: nagios@dm-nagios01.mozilla.org (nagios)
To: build@mozilla.org
***** Nagios *****
Notification Type: PROBLEM
Service: buildbot
Host: qm-xserve01
Address: 10.2.73.11
State: WARNING
Date/Time: 03-18-2008 03:12:08
Additional Info:
PROCS WARNING: 0 processes with args /tools/buildbot/bin/buildbot
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 15•17 years ago
The error leads me to believe that the test is checking for a process with /tools/buildbot/bin/buildbot in the name.
Buildbot is running, but it's installed in MacPython, so the process has the name:
/Library/Frameworks/Python.framework/Versions/Current/bin/buildbot
This could be the issue, but I don't know enough about nagios to tell one way or another.
Comment 16•17 years ago
Another quick observation: one of the tests this is running is clicking on a webcal: URL, which launches iCal.
The URL it's trying to open is:
webcal://127.0.0.1/rheeeeet.html
Comment 17•17 years ago
Sorry, I totally forgot about this problem. I'll have a look at it today.
Comment 18•17 years ago
Alright, the Buildbot monitor is fixed. I had to change this:
command[check_buildbot]=/usr/local/nagios/plugins/check_procs -w 1:1 -a /Library/Frameworks/Python.framework/Versions/Current/bin/buildbot
to this:
command[check_buildbot]=/usr/local/nagios/plugins/check_procs -w 1:1 -a buildbot
For some reason the full path to Buildbot doesn't work on OS X. It works fine on the Build machines, not sure why it doesn't here.
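The thread never establishes why the full path failed on OS X, so the following is only a way to reason about the check, not a diagnosis. With `-a`, check_procs counts processes whose argument string contains the given text, and `-w 1:1` raises WARNING unless exactly one process matches. The Python sketch below mimics that counting logic. The function names and the sample command lines are hypothetical (they are not taken from qm-xserve01), and the versioned Python path is an invented example of an argument string that differs from the configured one.

```python
# Minimal sketch of the matching logic behind
#   check_procs -w 1:1 -a <string>
# This is NOT the Nagios plugin; it only models "count the processes whose
# argument string contains <string>, warn if the count is outside min:max".

# Hypothetical process list. Real entries would come from `ps` on the host.
SAMPLE_COMMAND_LINES = [
    "/usr/sbin/syslogd",
    "python /Library/Frameworks/Python.framework/Versions/2.5/bin/buildbot start /builds/slave",
]


def parse_range(spec):
    """Parse a simple 'min:max' range such as '1:1' into two integers."""
    low, high = spec.split(":")
    return int(low), int(high)


def check_procs(command_lines, needle, warn_range="1:1"):
    """Return (state, count) for processes whose args contain `needle`."""
    low, high = parse_range(warn_range)
    count = sum(1 for line in command_lines if needle in line)
    state = "OK" if low <= count <= high else "WARNING"
    return state, count


if __name__ == "__main__":
    for needle in (
        "/tools/buildbot/bin/buildbot",
        "/Library/Frameworks/Python.framework/Versions/Current/bin/buildbot",
        "buildbot",
    ):
        print(needle, check_procs(SAMPLE_COMMAND_LINES, needle))
```

Matching on the bare string `buildbot` is more forgiving than matching on a full path, but it would also count any other process with "buildbot" in its arguments, so the `1:1` range could then warn for the opposite reason.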
Comment 19•17 years ago
I'm not sure about Calendar opening up -- maybe one of the tests is supposed to do that?
Assignee: bhearsum → nobody
Status: REOPENED → NEW
Comment 20•17 years ago
Anything left to do here or can we mark as FIXED? Is nagios on the new PGO unittest machines in bug 420073?
Component: Release Engineering: Talos → Release Engineering
Updated•17 years ago
Status: NEW → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering