Closed
Bug 917462
Opened 11 years ago
Closed 11 years ago
please adjust nagios alert for gaia_bumper.stamp on buildbot-master66
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: ericz)
References
Details
(Whiteboard: [reit-ops])
Attachments
(1 file)
(deleted),
patch
|
ashish
:
review+
|
buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp
Looks like it's warning at ~574 seconds and going critical at ~930 seconds?
Could we adjust this to warn at 1200 seconds and critical at 1800 seconds?
It keeps flapping with no-human-intervention-needed notifications.
Updated•11 years ago
|
Whiteboard: [reit-ops]
Assignee | ||
Updated•11 years ago
|
Assignee: infra → server-ops
Component: Infrastructure: Monitoring → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: jdow → shyam
Assignee | ||
Updated•11 years ago
|
Assignee: server-ops → eziegenhorn
Assignee | ||
Comment 1•11 years ago
|
||
This is committed in rev 75199...will take a bit to get pushed out via puppet.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 2•11 years ago
|
||
Thank you!
Reporter | ||
Comment 3•11 years ago
|
||
This appeared today:
[19:39] <nagios-releng> Mon 19:39:55 PDT [4316] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 559 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
Do you know how long it'll take to take effect?
Flags: needinfo?(eziegenhorn)
Comment 4•11 years ago
|
||
It's already in effect:
define service{
use generic-service
host_name buildbot-master66.srv.releng.usw2.mozilla.com
service_description File Age - /builds/gaia_bumper/gaia_bumper.stamp
check_command check_file_age!1200!1800!/builds/gaia_bumper/gaia_bumper.stamp
That's odd you saw it show up :|
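For readers unfamiliar with the notation above: Nagios splits a check_command value on "!", treating the first field as the command name and the remaining fields as $ARG1$, $ARG2$, and so on. A minimal Python sketch of that mapping follows. The helper name is made up for illustration, it ignores escaped "\!" sequences, and which macro feeds the warning versus the critical threshold depends on the command definition, which is not shown in this bug.

```python
def split_check_command(check_command: str):
    """Split a Nagios check_command into its command name and $ARGn$ macros.

    Illustrative helper only (not part of Nagios); escaped '!' is not handled.
    """
    name, *args = check_command.split("!")
    macros = {f"$ARG{i}$": value for i, value in enumerate(args, start=1)}
    return name, macros


name, macros = split_check_command(
    "check_file_age!1200!1800!/builds/gaia_bumper/gaia_bumper.stamp"
)
print(name)
for macro, value in macros.items():
    print(macro, "=", value)
```

Given the request in the description, $ARG1$ and $ARG2$ are presumably the warning and critical ages in seconds, and $ARG3$ the file path.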
Assignee | ||
Comment 5•11 years ago
|
||
:aki Yeah, that has been in effect for two weeks, so I have no idea how you saw this alert. I double-checked nagios1.private.releng.scl3 and it has the correct, current config values. I looked in the logs there, and the last time this alert showed up was at the end of June. What channel did you see this alert in?
Flags: needinfo?(eziegenhorn)
Reporter | ||
Comment 6•11 years ago
|
||
This was in #buildduty.
Is nagios-releng controlled by this service?
Assignee | ||
Comment 7•11 years ago
|
||
Ok, so :ashish had some great insights into why this isn't working right (the host isn't puppetized, and there is a bad interaction with a weirdly-defined check). He also got me access to the box, which will be a great help. I believe the critical threshold is working now; I am still working on the warning threshold, which still seems broken.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 8•11 years ago
|
||
Ran a number of tests and this appears to be working reliably now. Let me know if it false-alarms any longer.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 9•11 years ago
|
||
From today: [21:01] <nagios-releng> Sun 21:01:47 PDT [4742] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 501 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
Assignee | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 10•11 years ago
|
||
Ok, what it was actually alerting about is that the file is 0 bytes. For whatever reason, we specify that it should be at least 1 byte in size. This is not the behavior we want for this box, but I'm not sure about others. Additionally, we pass the -m flag to check_file_age, and it doesn't take a -m flag. Perhaps that's from an older version. Since fixing this affects other hosts, I'm going to ask Ashish to review the patch before I put it in.
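To illustrate why a 0-byte file can warn regardless of its age, here is a minimal Python sketch of the kind of decision a file-age check makes: a state trips when the file is too old or too small. This is an approximation written for this thread, not the plugin's actual source. The function name is hypothetical, and placing the 1-byte minimum on the warning side (rather than the critical side) is an assumption based on the WARNING state in the alerts above.

```python
def file_age_state(age, size, warn_age, crit_age, warn_size=0, crit_size=0):
    """Approximate a file-age check: too old OR too small trips a state.

    Ages are in seconds, sizes in bytes. Sketch only, not the plugin's code.
    """
    if age > crit_age or size < crit_size:
        return "CRITICAL"
    if age > warn_age or size < warn_size:
        return "WARNING"
    return "OK"


# Alert from comment 3: 559 seconds old and 0 bytes, thresholds 1200/1800.
# Assumed: the "at least 1 byte" rule is a warning-level minimum size.
print(file_age_state(age=559, size=0, warn_age=1200, crit_age=1800, warn_size=1))
# Same file with the minimum size dropped to 0 bytes.
print(file_age_state(age=559, size=0, warn_age=1200, crit_age=1800, warn_size=0))
```

Under these assumptions the first call reports WARNING even though 559 seconds is well under the 1200-second threshold, and the second reports OK.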
Attachment #824183 -
Flags: review?(ashish)
Comment 11•11 years ago
|
||
Comment on attachment 824183 [details] [diff] [review]
checkcommands.pp patch
Review of attachment 824183 [details] [diff] [review]:
-----------------------------------------------------------------
The choice of 1 byte is quite likely historical. I can't imagine this change breaking anything, but keep an eye out after it is pushed out. The -m flag applies to an earlier version of the bundled plugin but doesn't work unless used along with a different NRPE plugin, check_file_age2...
Attachment #824183 -
Flags: review?(ashish) → review+
Assignee | ||
Comment 12•11 years ago
|
||
Patch committed in r77265. Will watch #buildduty for a few days.
Comment 13•11 years ago
|
||
13:42 nagios-releng: Wed 13:42:07 PDT [4782] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 527 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
13:58 nagios-releng: Wed 13:58:07 PDT [4783] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 1487 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
13:00 nagios-releng: Wed 14:00:08 PDT [4784] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is OK: FILE_AGE OK: /builds/gaia_bumper/gaia_bumper.stamp is 15 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
Reporter | ||
Comment 14•11 years ago
|
||
[12:55] <nagios-releng> Fri 12:55:48 PDT [4852] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 591 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
Reporter | ||
Comment 15•11 years ago
|
||
[05:09] <nagios-releng> [#buildduty] Tue 05:09:49 PST [4153] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is WARNING: FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 554 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
[05:19] <nagios-releng> [#buildduty] Tue 05:19:50 PST [4159] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/gaia_bumper/gaia_bumper.stamp is OK: FILE_AGE OK: /builds/gaia_bumper/gaia_bumper.stamp is 61 seconds old and 0 bytes (http://m.allizom.org/File+Age+-+/builds/gaia_bumper/gaia_bumper.stamp)
Assignee | ||
Comment 16•11 years ago
|
||
Yeah something is still busted. To wit:
-sh-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H buildbot-master66.srv.releng.usw2.mozilla.com -t 15 -c check_file_age -a "-w 677 -c 1500 -W 0 -C 0 -m -f /builds/gaia_bumper/gaia_bumper.stamp"
FILE_AGE WARNING: /builds/gaia_bumper/gaia_bumper.stamp is 261 seconds old and 0 bytes
That shouldn't warn with those parameters. Once I regain access to the box I'll troubleshoot more.
Assignee | ||
Comment 17•11 years ago
|
||
We were bumping up against differences in /etc/nagios/nrpe.cfg between releng hosts and infra hosts. The two configs defined the check_file_age check's arguments differently, so the first argument specified for releng hosts (only buildbot-master66 uses it) was garbled and ignored. As a result, the check fell back to the default warning age of 240 seconds. Dustin just landed a patch to nrpe.cfg for releng hosts to make it match infra hosts. I will watch it for a few more days.
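The effect described above can be sketched in a few lines of Python: if the warning-age value never reaches the plugin intact, the comparison silently uses the 240-second default, which reproduces the test in comment 16 (261 seconds old, warning requested at 677). The dictionary representation and function names are hypothetical, and exactly how the argument was garbled is not stated in this bug, so the sketch simply models it as missing.

```python
DEFAULT_WARN_AGE = 240  # default warning age in seconds, as cited above


def effective_warn_age(args):
    """Return the -w value if it arrived intact, else the default."""
    value = args.get("-w")
    if value is not None and value.isdigit():
        return int(value)
    return DEFAULT_WARN_AGE


def is_warning(age, args):
    """True when the file age exceeds the effective warning threshold."""
    return age > effective_warn_age(args)


sent = {"-w": "677", "-c": "1500"}  # what the Nagios server asked for
received = {"-c": "1500"}           # hypothetical: the -w value was lost

print(is_warning(261, sent))      # 261 > 677 is false
print(is_warning(261, received))  # 261 > 240 is true
```

This is why a file only 261 seconds old could still come back as WARNING despite the thresholds passed on the command line.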
Assignee | ||
Comment 18•11 years ago
|
||
This has alerted in the last three days, but they were all valid alerts. Hesitantly going to close this again. Thanks for your patience.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Product: mozilla.org → mozilla.org Graveyard