Closed
Bug 754288
Opened 13 years ago
Closed 11 years ago
Fix nagios alerts for preproduction and preproduction-stage
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: coop, Unassigned)
References
Details
(Whiteboard: [monitoring][nagios][preproduction])
Attachments
(1 file)
(deleted),
patch
A bunch of nagios checks are still failing for the new preproduction boxes:
https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=preproduction
https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=preproduction-stage
Comment 1•13 years ago
On the old pp-master, the nagios checks were not puppetized.
To make them work with the new system, we have to fix the following checks:
* MySQL connectivity: it should use localhost
* buildbot: min 1, max 6 processes
Updated•13 years ago
Priority: -- → P3
Comment 2•12 years ago
* Added min/max_master variable
* check_mysql moved to check_mysql.cfg.erb
* $libdir moved to paths::libdir
I tested it with --noop --environment=rail to check the diff, and it looked fine on pp-master and on one of the production bm.
Attachment #638503 -
Flags: review?(catlee)
Updated•12 years ago
Assignee: nobody → rail
Comment 3•12 years ago
I found another couple of issues.
preproduction-master is trying to upload Thunderbird logs using the ffxbld key, so we end up with alerts like this
preproduction-master.srv.releng.scl3:Command Queue is CRITICAL: 9 dead items
when that fails. If you look at /builds/buildbot/builder-master/postrun.cfg, it doesn't have all the things you'd expect from a prod master. Is this a problem with how we set up the masters after tearing them down every week?
/builds/buildbot/release-master/master/postrun.cfg is also missing.
Puppet also seems to be busted:
Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Could not retrieve catalog from remote server: Error 400 on SERVER: No support for http method POST
Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Using cached catalog
Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Could not run Puppet configuration client: interning empty string
Google says this is from running a v2.7 client against a v2.6 master, but we have 0.25.4 on pp-m, and 0.25.5 on master-puppet1.
Comment 4•12 years ago
> /builds/buildbot/release-master/master/postrun.cfg is also missing.
Yeah, that's bug 739513
postrun.cfg is managed by puppet, but pp masters aren't really managed by it...
Comment 5•12 years ago
Comment on attachment 638503 [details] [diff] [review]
preproduction puppet changes
Review of attachment 638503 [details] [diff] [review]:
-----------------------------------------------------------------
::: modules/buildmaster/manifests/init.pp
@@ +25,5 @@
> + $min_masters = $num_masters
> + }
> + if $max_masters == '' {
> + $max_masters = $num_masters
> + }
are you sure this works? puppet doesn't like overriding variables.
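For reference, Puppet does not allow a variable to be reassigned within the same scope. One way to sidestep the question is to leave the incoming values alone and derive new local variables from them. The following is only a rough sketch, not the contents of attachment 638503: the class name and the $real_min_masters / $real_max_masters names are hypothetical, and it assumes $min_masters, $max_masters and $num_masters are visible from the enclosing scope.

```puppet
# Sketch only: the class name and the $real_* variable names are hypothetical
# and do not come from attachment 638503.
class buildmaster::limits_example {
    # Fall back to $num_masters when no explicit minimum is set.
    $real_min_masters = $min_masters ? {
        ''      => $num_masters,
        default => $min_masters,
    }
    # Same fallback for the maximum.
    $real_max_masters = $max_masters ? {
        ''      => $num_masters,
        default => $max_masters,
    }
}
```

Templates and checks would then read the $real_* variables, so nothing is assigned twice in one scope.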
Comment 6•12 years ago
Comment on attachment 638503 [details] [diff] [review]
preproduction puppet changes
(In reply to Chris AtLee [:catlee] from comment #5)
> Comment on attachment 638503 [details] [diff] [review]
> preproduction puppet changes
>
> Review of attachment 638503 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: modules/buildmaster/manifests/init.pp
> @@ +25,5 @@
> > + $min_masters = $num_masters
> > + }
> > + if $max_masters == '' {
> > + $max_masters = $num_masters
> > + }
>
> are you sure this works? puppet doesn't like overriding variables.
--noop generated no diff for a production master and generated 1:6 for the pp master. I'll investigate this issue further.
Attachment #638503 -
Flags: review?(catlee)
Comment 8•11 years ago
We killed preprod masters.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•7 years ago
Component: General Automation → General