Closed
Bug 627821
Opened 14 years ago
Closed 14 years ago
Relax nagios checks on dm-wwwbuild01 file age
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: zandr, Assigned: rtucker)
References
Details
These checks are very noisy these days, since the files aren't always getting built and uploaded in time.
bug 625978 is to fix the root cause.
In the meantime, we should relax the age check to 15 minutes to reduce nagios chatter.
Comment 1•14 years ago
Did this already happen?
Updated•14 years ago
Assignee: server-ops-releng → rtucker
Comment 2 (Assignee)•14 years ago
These service checks are commented out:
#http_age&contact_groups build::dm-wwwbuild01:build.mozilla.org:/builds/builds-running.js!7m
#http_age&contact_groups build::dm-wwwbuild01:build.mozilla.org:/builds/builds-pending.js!7m
#http_age&contact_groups build::dm-wwwbuild01:build.mozilla.org:/builds/builds-4hr.js.gz!7m
Would you like one of them to be enabled and set to a 15 minute threshold?
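For reference, here is a minimal sketch of how the threshold in the definitions above could be read and applied. It assumes the trailing "!7m" means "maximum file age of 7 minutes"; the helper names parse_threshold and is_stale are hypothetical and are not part of nagios or of the actual check plugin.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the suffix after the final '!' is a number plus a unit letter.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_threshold(definition: str) -> timedelta:
    """Return the age threshold found after the final '!' (hypothetical helper)."""
    _, _, raw = definition.rpartition("!")
    if len(raw) < 2 or raw[-1] not in _UNITS or not raw[:-1].isdigit():
        raise ValueError(f"unrecognised threshold: {raw!r}")
    return timedelta(**{_UNITS[raw[-1]]: int(raw[:-1])})


def is_stale(last_modified: datetime, now: datetime, threshold: timedelta) -> bool:
    """True when the file is older than the threshold allows."""
    return now - last_modified > threshold


line = ("#http_age&contact_groups build::dm-wwwbuild01:"
        "build.mozilla.org:/builds/builds-running.js!7m")
now = datetime(2011, 1, 24, 12, 0, tzinfo=timezone.utc)
built = now - timedelta(minutes=10)  # illustrative timestamp, not real data

print(is_stale(built, now, parse_threshold(line)))     # checked against 7 minutes
print(is_stale(built, now, timedelta(minutes=15)))     # checked against 15 minutes
```

Under these assumptions, a file built 10 minutes ago would alert at the current 7 minute threshold but not at the proposed 15 minute one.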
Comment 3•14 years ago
https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=dm-wwwbuild01
has other ideas. Is there more than one set of definitions?
Comment 4 (Assignee)•14 years ago
I found them. They are hardcoded in the services.cfg file rather than generated, which is how we usually define these. Do you want me to change all of them to a different value? If so, what value?
Comment 5•14 years ago
We haven't had much flapping recently, but it's somewhat dependent on our VM and the DB server in ways we don't yet understand. And load on the buildbot cluster has been low while we're all frozen for the 4.0 releases, so we'll have to see what happens as everything cranks back up.
Which is a long way of saying it depends on what value they all have at the moment. Could you look it up?
Comment 6 (Assignee)•14 years ago
They are currently set to 7 minutes.
Comment 7•14 years ago
Let's leave them at 7 minutes. They haven't been flapping recently, and anyway we want this information to be timely. Plus, we know there's a leak somewhere in the buildapi code which provides the files we're monitoring, and the sooner we know that's slowing things down the better.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations