Closed Bug 754288 Opened 13 years ago Closed 11 years ago

Fix nagios alerts for preproduction and preproduction-stage

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: coop, Unassigned)

References

Details

(Whiteboard: [monitoring][nagios][preproduction])

Attachments

(1 file)

On the old pp-master, the nagios checks were not puppetized. To make them work with the new system, we have to fix the following checks:
* MySQL connectivity: it should use localhost
* buildbot: min 1, max 6 processes
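The buildbot threshold above can be sketched as a tiny Nagios-style check. This is only an illustration of the min 1 / max 6 rule, not the actual check used on pp-master; the function name `check_process_count` and the message wording are hypothetical, and only the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL) are assumed.

```python
from typing import Tuple

# Standard Nagios plugin exit codes used by this sketch.
OK = 0
CRITICAL = 2


def check_process_count(count: int, minimum: int = 1, maximum: int = 6) -> Tuple[int, str]:
    """Return (exit_code, message) for a process count checked against [minimum, maximum].

    Hypothetical helper: the defaults mirror the "min 1, max 6" rule from the bug.
    """
    if count < minimum or count > maximum:
        return CRITICAL, (
            f"PROCS CRITICAL: {count} buildbot processes "
            f"(expected {minimum}-{maximum})"
        )
    return OK, f"PROCS OK: {count} buildbot processes"


if __name__ == "__main__":
    # Example: three running buildbot processes is inside the allowed range.
    code, message = check_process_count(3)
    print(message)
    raise SystemExit(code)
```

With these defaults a pp master running anywhere from one to six buildbot processes reports OK, while zero or seven and above reports CRITICAL.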
Priority: -- → P3
Attached patch preproduction puppet changes (deleted) — Splinter Review
* Added min_masters/max_masters variables
* check_mysql moved to check_mysql.cfg.erb
* $libdir moved to paths::libdir

I tested it with --noop --environment=rail to check the diff, and it looked fine on pp-master and on one of the production bm.
Attachment #638503 - Flags: review?(catlee)
Assignee: nobody → rail
I found another couple of issues.

preproduction-master is trying to upload Thunderbird logs using the ffxbld key, so when that fails we end up with alerts like this:

preproduction-master.srv.releng.scl3:Command Queue is CRITICAL: 9 dead items

If you look at /builds/buildbot/builder-master/postrun.cfg, it doesn't have all the things you'd expect from a prod master. Is this a problem with how we set up the masters after tearing them down every week? /builds/buildbot/release-master/master/postrun.cfg is also missing.

Puppet also seems to be busted:

Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Could not retrieve catalog from remote server: Error 400 on SERVER: No support for http method POST
Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Using cached catalog
Jul 4 14:30:53 preproduction-master puppet-agent[5775]: Could not run Puppet configuration client: interning empty string

Google says this is from running a v2.7 client against a v2.6 master, but we have 0.25.4 on pp-m, and 0.25.5 on master-puppet1.
> /builds/buildbot/release-master/master/postrun.cfg is also missing.

Yeah, that's bug 739513. postrun.cfg is managed by puppet, but pp masters aren't really managed by it...
Comment on attachment 638503
preproduction puppet changes

Review of attachment 638503:
-----------------------------------------------------------------

::: modules/buildmaster/manifests/init.pp
@@ +25,5 @@
> +        $min_masters = $num_masters
> +    }
> +    if $max_masters == '' {
> +        $max_masters = $num_masters
> +    }

are you sure this works? puppet doesn't like overriding variables.
Comment on attachment 638503
preproduction puppet changes

(In reply to Chris AtLee [:catlee] from comment #5)
> Comment on attachment 638503
> preproduction puppet changes
>
> Review of attachment 638503:
> -----------------------------------------------------------------
>
> ::: modules/buildmaster/manifests/init.pp
> @@ +25,5 @@
> > +        $min_masters = $num_masters
> > +    }
> > +    if $max_masters == '' {
> > +        $max_masters = $num_masters
> > +    }
>
> are you sure this works? puppet doesn't like overriding variables.

--noop generated no diff for a production master and generated 1:6 for pp master. I'll investigate this issue further.
Attachment #638503 - Flags: review?(catlee)
back to the pool
Assignee: rail → nobody
Priority: P3 → --
We killed preprod masters.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering
Component: General Automation → General