Closed
Bug 752332
Opened 13 years ago
Closed 11 years ago
enable nagios check of puppet agent status on bm32
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: ashish)
Trial deploy of new check that puppet agent has been successful recently. (See bug 685527#c4 for details)
Please add checking of "check_puppet_agent" on buildbot-master32 via NRPE, with notifications disabled. Sample nagios service configuration given at:
<https://github.com/hwine/nagios-plugins/blob/master/check_puppet_agent>
After some burn-in and tuning, we'll deploy via puppet on a broader scale and then ask for general activation in another bug.
Comment 1 • 12 years ago (Reporter)
To clarify, please use this line for enabling the service:
check_command check_nrpe!check_puppet_agent!3600!7200
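For readers unfamiliar with the two trailing arguments, here is a minimal sketch of how a freshness check of this kind might classify the age of the last puppet run. It assumes that 3600 and 7200 are warning and critical thresholds in seconds, which the thread does not confirm. The function name `classify_puppet_age` is hypothetical and is not taken from the actual check_puppet_agent plugin. The message format mimics the status line reported in comment 15.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_puppet_age(
    age_sec: int, warn_sec: int = 3600, crit_sec: int = 7200
) -> Tuple[int, str]:
    """Map seconds since the last puppet run to (exit_code, message).

    Assumption: the two NRPE arguments are warning and critical
    thresholds in seconds; the real plugin may interpret them differently.
    """
    if age_sec >= crit_sec:
        code, label = CRITICAL, "CRITICAL"
    elif age_sec >= warn_sec:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"{label}: Puppet agent last run: {age_sec} sec ago"


if __name__ == "__main__":
    # Example: a run 1706 seconds ago falls under the assumed warning threshold.
    print(classify_puppet_age(1706))
```

Under these assumptions, a run that finished less than an hour ago reports OK, one between one and two hours old reports WARNING, and anything older reports CRITICAL.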
Comment 2 • 12 years ago
I've added the check to the existing nagios with notifications disabled. Rick, please make sure to copy this check from admin1.infra.scl1.mozilla.com to nagios1.private.releng.scl3.mozilla.com when you migrate things today.
Assignee: server-ops-releng → rbryce
Component: Server Operations: RelEng → Server Operations
QA Contact: arich → phong
Comment 3 • 12 years ago
This is no longer needed, as releng is staying put on scl1.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Comment 4 • 12 years ago (Reporter)
The move to scl3 may be invalid, but tracking this on scl1 nagios is not. Moving back to server-ops-releng and marking fixed instead.
Assignee: rbryce → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: phong → arich
Resolution: INVALID → FIXED
Comment 5 • 12 years ago (Reporter)
(In reply to Rick Bryce [:rbryce] from comment #3)
> This is no longer needed, as releng is staying put on scl1.
Times have changed; we're in scl3 now. Please enable the check per comment 1 for buildbot-master32 on the nagios server at http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/
The plugin functions correctly locally, but I want to triple-check that it works okay before rolling out to all hosts.
Assignee: server-ops-releng → server-ops
Status: RESOLVED → REOPENED
Component: Server Operations: RelEng → Server Operations
QA Contact: arich → shyam
Resolution: FIXED → ---
Comment 6 • 12 years ago
Given that we're not building masters with old-puppet any more, and that all current masters are on KVM and thus will be replaced in the move to scl3, is this still necessary?
Updated • 12 years ago (Assignee)
Flags: needinfo?(hwine)
Comment 7 • 12 years ago (Reporter)
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Given that we're not building masters with old-puppet any more, and that all
> current masters are on KVM and thus will be replaced in the move to scl3, is
> this still necessary?
Yes - it will be deployed on ALL puppetized non-talos machines. I'm just starting with this older one since the check used to work there -- before nagios, etc. were upgraded. That makes it easier to troubleshoot.
And, yes, we want a nagios alert for this condition. As I understood puppetAgain, the dashboard will flag the error, but not trigger a nagios alert.
Flags: needinfo?(hwine)
Comment 8 • 12 years ago
We also get an email for every failed puppet run, in the releng-shared mailbox. I really don't think a nagios alert is necessary.
Comment 9 • 12 years ago
And I should add, at least Callek and I check those religiously. I'd like to know that others are watching that mailbox, too.
Comment 10 • 12 years ago (Reporter)
Per IRC chat with Dustin, we can proceed on hooking this up.
Comment 11 • 12 years ago (Assignee)
OK, there is no buildbot-master32:
Host buildbot-master32.srv.releng.scl3.mozilla.com not found: 3(NXDOMAIN)
Or am I missing something? :)
Assignee: server-ops → ashish
Comment 12 • 11 years ago
That was one of the buildbot-masters that was recently decommissioned.
Comment 13 • 11 years ago (Reporter)
(In reply to Ashish Vijayaram [:ashish] from comment #11)
> OK, there is no buildbot-master32:
>
> Host buildbot-master32.srv.releng.scl3.mozilla.com not found: 3(NXDOMAIN)
>
> Or am I missing something? :)
No, I am - it was there when I started testing. I'll move my setup, then update this request. Taking out of your queue for now.
Assignee: ashish → hwine
Comment 14 • 11 years ago (Reporter)
Okay, build-master12 is even older than 32 was (puppet version) -- I'll have to do some work there to support the plugin.
Ashish - can you hook up buildbot-master63.srv.releng.use1.mozilla.com (in AWS), please? The plugin runs cleanly there.
Thanks!
Assignee: hwine → ashish
Comment 15 • 11 years ago (Assignee)
Done!
< nagios-releng> ashish: buildbot-master63.srv.releng.use1.mozilla.com:Puppet freshness is OK - OK: Puppet agent last run: 1706 sec ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 16 • 11 years ago (Reporter)
(In reply to Hal Wine [:hwine] from comment #14)
> Okay, build-master12 is even older than 32 was (puppet version) -- I'll have
> to do some work there to support the plugin.
No update needed, see bug 685527 comment 13
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard