Closed Bug 624260 Opened 14 years ago Closed 14 years ago

nagios load average warnings should be on masters only

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: arich)

Details

(Whiteboard: [buildslaves][nagios])

09:22 < nagios> [33] moz2-linux-slave22.build:avg load is WARNING: WARNING - load average: 0.09, 6.04, 11.40
09:27 < nagios> moz2-linux-slave22.build:avg load is OK: OK - load average: 1.07, 2.93, 8.56

It looks like this slave was running the fuzzer; it's running the fuzzer now, at any rate.  I'm not sure this particular warning is helpful.  Even if I had caught the slave during the five minutes when the warning was active, no action would have been necessary.

If there's utility to this check, then let's keep it but increase the timescale and thresholds.  Otherwise, I think we should just remove it.
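
To make the "increase the timescale and thresholds" option concrete, here is a minimal Python sketch, not the actual Nagios configuration. It parses the load figures out of messages like the two above and only notifies when several consecutive samples exceed the limits. The threshold values in WARN, the min_consecutive parameter, and all function names are assumptions chosen for illustration; the real thresholds in use are not stated in this bug.

```python
import re
from typing import List, Tuple

Load = Tuple[float, float, float]

# Hypothetical warning limits for the (1, 5, 15)-minute load averages.
# These are NOT the values from the real Nagios setup.
WARN: Load = (12.0, 10.0, 10.0)

LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load(message: str) -> Load:
    """Extract the three load averages from a Nagios-style status message."""
    match = LOAD_RE.search(message)
    if match is None:
        raise ValueError("no load average found in message")
    one, five, fifteen = match.groups()
    return (float(one), float(five), float(fifteen))


def exceeds(load: Load, warn: Load = WARN) -> bool:
    """True if any of the three averages is above its limit."""
    return any(value > limit for value, limit in zip(load, warn))


def should_notify(samples: List[Load], warn: Load = WARN,
                  min_consecutive: int = 3) -> bool:
    """Notify only if the latest min_consecutive samples all exceed the limits.

    This is the "longer timescale" idea: a short burst of load that clears
    by the next check never produces a notification.
    """
    if len(samples) < min_consecutive:
        return False
    return all(exceeds(s, warn) for s in samples[-min_consecutive:])
```

With these assumed limits, a slave that is over the limit at one check and back under it five minutes later, as in the log above, would not notify anyone.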
I think it's just part of the default set of checks for nagios (or the local nagios setup). The only thing I recall this check catching is thrashing Linux boxes which had 'make -j9' on some mobile builds, instead of our usual -j4. We fixed the mozconfigs for that (and you could argue review should have caught it originally).

We have already disabled notifications and/or removed the equivalent check on Windows (because we'd routinely hit 100% CPU during a compile).
if -j9 results in quicker builds and those builds aren't running on shared hosts, is there any reason that we don't want to use -j9?
(In reply to comment #2)
> if -j9 results in quicker builds and those builds aren't running on shared
> hosts, is there any reason that we don't want to use -j9?

That seems entirely unrelated to this bug.
During triage, we agreed:

* slaves go from totally idle to totally busy, and this is valid. Nagios alerts on this are not useful.
* masters should stay at a fairly consistent, safely low load. Nagios alerts on masters are *very* useful.
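
The triage decision above can be sketched as a small Python function that keeps the load check only on masters. This is an illustration, not the change that was actually made to the Nagios configuration: inferring the role from the hostname, the function names, and the check names other than "avg load" are all assumptions.

```python
from typing import List

LOAD_CHECK = "avg load"  # check name as it appears in the Nagios messages


def host_role(hostname: str) -> str:
    """Guess a host's role from its name (a hypothetical naming convention)."""
    name = hostname.lower()
    if "master" in name:
        return "master"
    if "slave" in name:
        return "slave"
    return "unknown"


def checks_for(hostname: str, base_checks: List[str]) -> List[str]:
    """Return the checks for a host, keeping the load check only on masters."""
    if host_role(hostname) == "master":
        return list(base_checks)
    return [check for check in base_checks if check != LOAD_CHECK]


# "ping" and "disk" are placeholder check names for the example.
base = ["ping", "disk", LOAD_CHECK]
print(checks_for("moz2-linux-slave22.build", base))
print(checks_for("example-master01.build", base))  # hypothetical master name
```

Hosts whose role cannot be determined lose the load check here; treating them like masters instead would be an equally reasonable choice.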

Morphing and pushing to IT per zandr.
Assignee: dustin → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Summary: nagios load average warnings - helpful? → nagios load average warnings should be on masters only
Assignee: server-ops-releng → zandr
Assignee: zandr → arich
All slaves have been modified so that they do not check avg load.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations