Closed Bug 669229 Opened 13 years ago Closed 13 years ago

Race condition in puppet nagios config

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: catlee)

Details

(Whiteboard: [puppet])

The nrpe.cfg changes in bug 656413 led to a lot of alerts like this:

moz2-linux-slave07.build.sjc1:disk - /var is 4CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:disk - / is 4CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:disk - /builds is 4CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:buildbot is 4CRITICAL: NRPE: Command check_buildbot not defined

Only some linux32/linux64 hosts so far, and fixed by a 'service nrpe restart' or a reboot.

The template for /etc/nagios/nrpe.cfg changed in http://hg.mozilla.org/build/puppet-manifests/rev/369888bba343

Turns out it's a race condition (moz2-linux-slave07 again):

Jul 4 13:32:27 moz2-linux-slave07 puppetd[2189]: Starting catalog run
...
Jul 4 13:32:36 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/File[/etc/nagios/nrpe.cfg]/content) content changed '{md5}74e04c65fcd07eca040415ea87ab1449' to '{md5}ced55880e90e540b14c576892d3554e6'
Jul 4 13:32:36 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/Service[nrpe]) Triggering 'refresh' from 1 dependencies

We got the new nrpe.cfg and restart the service ...

Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Caught SIGTERM - shutting down...
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Cannot remove pidfile '/var/run/nrpe.pid' - check your privileges.
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Daemon shutdown
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2603]: Could not open config directory '/etc/nagios/nrpe.d' for reading.

... but didn't create /etc/nagios/nrpe.d yet ...

Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Starting up daemon
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Warning: Daemon is configured to accept command arguments from clients!
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Listening for connections on port 5666
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Allowing connections from: <redacted>

... here it goes ...

Jul 4 13:32:37 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/File[/etc/nagios/nrpe.d]/ensure) created
...
Jul 4 13:32:51 moz2-linux-slave07 puppetd[2189]: Finished catalog run in 24.49 seconds

Puppet bug? Puppet config bug?
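For illustration only: the log suggests that Service[nrpe] is refreshed by File[/etc/nagios/nrpe.cfg] but has no ordering relationship with File[/etc/nagios/nrpe.d], so Puppet is free to restart nrpe before the directory exists. A minimal sketch of how that ordering could be declared is below. The resource titles and the class name are taken from the log; the attribute values and the template name are assumptions, and this is not necessarily the change that actually landed.

```puppet
# Sketch only: attribute values and the template path are hypothetical.
class nagios::service {
    # The include directory that the new nrpe.cfg refers to.
    file { "/etc/nagios/nrpe.d":
        ensure => directory,
    }

    # Make sure the directory exists before the config that references it.
    file { "/etc/nagios/nrpe.cfg":
        content => template("nagios/nrpe.cfg.erb"),  # hypothetical template name
        require => File["/etc/nagios/nrpe.d"],
    }

    # subscribe implies ordering plus a refresh on change, so nrpe is only
    # restarted after nrpe.cfg (and, through its require, nrpe.d) is in place.
    service { "nrpe":
        ensure    => running,
        subscribe => File["/etc/nagios/nrpe.cfg"],
        require   => File["/etc/nagios/nrpe.d"],
    }
}
```

With explicit dependencies like these, the restart seen at 13:32:37 should happen only after the "created" event for /etc/nagios/nrpe.d, whatever order the resources appear in the manifest.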
I think this is fixed now.
Assignee: nobody → catlee
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering