Closed
Bug 930021
Opened 11 years ago
Closed 11 years ago
Monitor free inodes on buildbot masters
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jhopkins, Assigned: ashish)
Attachments
(1 file)
patch (deleted), dustin: review+
buildbot-master10 ran out of inodes due to a stale cleanup lock file but nagios did not alert on the inode situation beforehand.
We should make sure we monitor and alert not only on free disk space but on free inodes as well.
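To make the distinction concrete, here is a minimal Python sketch of the two quantities involved, assuming a POSIX host. The function name `free_percentages` is hypothetical and this is not what the Nagios plugin runs; it only shows that free blocks and free inodes are separate numbers, so a filesystem can have plenty of space and still run out of inodes.

```python
import os


def free_percentages(path):
    """Return (percent of blocks free, percent of inodes free) for the
    filesystem that holds `path`.

    Either value is None when the filesystem reports a zero total
    (some filesystems do not expose an inode count).
    """
    st = os.statvfs(path)

    def pct(available, total):
        return None if total == 0 else 100.0 * available / total

    # f_bavail and f_favail count what an unprivileged process can still use.
    return pct(st.f_bavail, st.f_blocks), pct(st.f_favail, st.f_files)


space_free, inodes_free = free_percentages("/")
print(f"space free: {space_free}, inodes free: {inodes_free}")
```

Many small files (for example, ones that a stalled cleanup job never removes) lower the second number much faster than the first.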
Comment 1•11 years ago
Nagios is checking free space and inodes for /, /builds/, and /var, which is triplication because those paths are all on the same / partition (wat). The checks have been green for at least 61 days.
Where is the lock file stored?
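The redundancy noted above (several checked paths living on one partition) can be detected by comparing device IDs. This is a small Python sketch under that assumption; `group_by_device` and `redundant_paths` are hypothetical helper names, not part of Nagios or NRPE.

```python
import os
from collections import defaultdict


def group_by_device(paths):
    """Map each filesystem device ID to the paths that live on it."""
    groups = defaultdict(list)
    for p in paths:
        groups[os.stat(p).st_dev].append(p)
    return dict(groups)


def redundant_paths(paths):
    """Return paths that share a filesystem with an earlier path in the list."""
    return [p for members in group_by_device(paths).values() for p in members[1:]]
```

On a host laid out like the one described here, calling `redundant_paths(["/", "/builds", "/var"])` would be expected to return the last two paths, since checking them repeats the check on `/`.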
Comment 2•11 years ago (Reporter)
> Nagios is checking free space and inodes for /, /builds/, and /var, which is triplication because those paths are all on the same / partition (wat). The checks have been green for at least 61 days.
What are the inodes warning/alert thresholds?
> Where is the lock file stored ?
The lock file was /etc/cron.d/bm10-tests1-tegra. See also: bug 930216.
Comment 3•11 years ago
Something isn't working with the nagios check then. The first indication that something was wrong was at 23:44 ET, when nagios alerted about the # of dead items. The notification for /builds never happened AFAICT.
Comment 4•11 years ago
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/config.cgi?type=services&expand=buildbot-master10.build.mtv1.mozilla.com says 5% and 10%, which presumably applies for both space and inodes.
ashish, any ideas on this?
Flags: needinfo?(ashish)
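For reference, the 10% and 5% figures are "percent free" limits: warning once less than 10% is free, critical once less than 5% is free. A minimal Python sketch of that rule follows; the function name `status_for` is hypothetical, and treating a value exactly on a threshold as not yet breached is an assumption rather than confirmed plugin behavior.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def status_for(free_pct, warn_pct=10.0, crit_pct=5.0):
    """Classify a percent-free value against warning and critical limits."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


for value in (50.0, 7.0, 2.0):
    print(value, status_for(value))
```

The same rule can be applied to either free space or free inodes, but only to whichever quantity the check actually measures.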
Comment 5•11 years ago (Assignee)
(In reply to Nick Thomas [:nthomas] from comment #4)
> http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/config.cgi?type=services&expand=buildbot-master10.build.mtv1.mozilla.com says 5% and 10%, which presumably applies for both space and inodes.
>
> ashish, any ideas on this?
That is correct. The thresholds apply to inodes as well. Unsure why Nagios didn't alert. But I can't verify that now either...
Flags: needinfo?(ashish)
Comment 6•11 years ago (Reporter)
Looking at [1]:
$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c check_disk -a $ARG1$ $ARG2$ $ARG3$
-> $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c check_disk -a 10% 5% /
The arguments "10% 5% /" get translated on the client via /etc/nagios/nrpe.cfg:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
So the 'rendered' command that gets run in this case is:
check_disk -w 10% -c 5% -p /
I did some testing and found that the above command will only check free disk space - there are separate arguments for checking inodes: -W and -C (note capitalization).
If we want to use the same thresholds for inodes as for free disk space, we could modify /etc/nagios/nrpe.cfg to read:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -W $ARG1$ -C $ARG2$ -p $ARG3$
Otherwise, we could create a new command like:
command[check_inodes]=/usr/lib64/nagios/plugins/check_disk -W $ARG1$ -C $ARG2$ -p $ARG3$
[1] http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/config.cgi?type=command&expand=check_nrpe_disk!10%25!5%25!%2F
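The substitution described in this comment can be mimicked with a short Python sketch. It only imitates how the `$ARGn$` placeholders are filled in; `render_nrpe_command` is a hypothetical helper, not NRPE's own code. The three templates are the ones quoted above.

```python
import re


def render_nrpe_command(template, args):
    """Replace $ARG1$, $ARG2$, ... in `template` with the matching item of `args`."""
    def substitute(match):
        return args[int(match.group(1)) - 1]

    return re.sub(r"\$ARG(\d+)\$", substitute, template)


PLUGIN = "/usr/lib64/nagios/plugins/check_disk"
ARGS = ["10%", "5%", "/"]  # what the server passes after `-a`

# Existing definition: thresholds for free space only.
current = render_nrpe_command(PLUGIN + " -w $ARG1$ -c $ARG2$ -p $ARG3$", ARGS)

# Option 1: reuse the same thresholds for inodes in the existing command.
combined = render_nrpe_command(
    PLUGIN + " -w $ARG1$ -c $ARG2$ -W $ARG1$ -C $ARG2$ -p $ARG3$", ARGS
)

# Option 2: a separate inode-only command.
inodes_only = render_nrpe_command(PLUGIN + " -W $ARG1$ -C $ARG2$ -p $ARG3$", ARGS)

for rendered in (current, combined, inodes_only):
    print(rendered)
```

Only the second and third rendered commands contain `-W` and `-C`, which is why the existing check stayed green while inodes ran out.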
Updated•11 years ago (Reporter)
Component: Other → Platform Support
QA Contact: joduinn → coop
Comment 7•11 years ago
(In reply to John Hopkins (:jhopkins) from comment #6)
> Otherwise, we could create a new command like:
>
> command[check_inodes]=/usr/lib64/nagios/plugins/check_disk -W $ARG1$ -C $ARG2$ -p $ARG3$
I'm all for having a distinct check here.
Comment 8•11 years ago (Reporter)
Attachment #8365187 - Flags: review?(dustin)
Updated•11 years ago
Attachment #8365187 - Flags: review?(dustin) → review+
Comment 9•11 years ago (Reporter)
Comment on attachment 8365187: [puppet] add check_inodes
https://hg.mozilla.org/build/puppet/rev/c340ec61e2d9
Next, I believe we have to request that IT make use of this new check command.
Attachment #8365187 - Flags: checked-in+
Comment 10•11 years ago (Reporter)
ashish: what do we need to do to have nagios use the check_inodes command?
Flags: needinfo?(ashish)
Comment 11•11 years ago (Assignee)
(In reply to John Hopkins (:jhopkins) from comment #10)
> ashish: what do we need to do to have nagios use the check_inodes command?
Which hostgroups should this new check be added to? buildbot-master10 doesn't exist anymore...
Flags: needinfo?(ashish)
Comment 12•11 years ago (Reporter)
At a minimum, these hostgroups:
dev-buildbot-masters
scl3-production-buildbot-masters
use1-production-buildbot-masters
usw2-production-buildbot-masters
I expect check_inodes would be a good counterpart to most (all?) existing UNIX-based check_disk checks.
Flags: needinfo?(ashish)
Comment 13•11 years ago (Assignee)
The check has been added to the specified hostgroups. However, one host has not received the NRPE configuration:
dev-master01.build.scl1.mozilla.com
I've acked the host per conversation with :Callek on IRC.
Assignee: nobody → ashish
Status: NEW → RESOLVED
Closed: 11 years ago
Component: Platform Support → Server Operations
Flags: needinfo?(ashish)
Flags: checked-in+
Product: Release Engineering → mozilla.org
QA Contact: coop → shyam
Resolution: --- → FIXED
Version: unspecified → other
Comment 14•11 years ago
Re: comment 13, added the following to /etc/nagios/nrpe.cfg on dev-master01:
command[check_inodes]=/usr/lib64/nagios/plugins/check_disk -W $ARG1$ -C $ARG2$ -p $ARG3$
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard