Closed Bug 805587 Opened 12 years ago Closed 12 years ago

Remove CentOS5 cluster Nagios alert once those build slaves are retired

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhopkins, Unassigned)

References

Details

Once everything is running on mock and we don't need the centos5 build slaves anymore, this Nagios alert can be retired: admin1.infra.scl1:scl3-bld-centos5-32-try-buildbot cluster is CRITICAL: CLUSTER CRITICAL: scl3-bld-centos5-32-try-buildbot cluster: 2 ok, 15 warning, 0 unknown, 0 critical
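For anyone triaging alerts like the one quoted above, here is a minimal sketch of pulling the per-state counts out of the cluster summary. The helper name `parse_cluster_summary` and the regular expression are assumptions inferred from that single alert line; they are not taken from the Nagios configuration or any RelEng tooling.

```python
import re

# Hypothetical pattern, inferred only from the alert text quoted in this bug.
SUMMARY_RE = re.compile(
    r"(?P<ok>\d+) ok, (?P<warning>\d+) warning, "
    r"(?P<unknown>\d+) unknown, (?P<critical>\d+) critical"
)


def parse_cluster_summary(alert):
    """Return a dict of per-state host counts found in a cluster alert line."""
    match = SUMMARY_RE.search(alert)
    if match is None:
        raise ValueError("no cluster summary found in alert text")
    return {state: int(count) for state, count in match.groupdict().items()}


alert = (
    "CLUSTER CRITICAL: scl3-bld-centos5-32-try-buildbot cluster: "
    "2 ok, 15 warning, 0 unknown, 0 critical"
)
counts = parse_cluster_summary(alert)

# Members of the cluster that are in any state other than OK.
non_ok = sum(counts.values()) - counts["ok"]
print(counts, non_ok)
```

With the alert above, most members of the cluster are in a non-OK state, which fits the expectation that these slaves are being retired.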
Depends on: 813612, 813613
Per arr:
> that cluster check will go away as soon as we get the clear to delete the vms
> that check is ONLY for the vmware vms
No longer depends on: 813612, 813613
This check doesn't exist anymore. Did you perhaps mean: scl3-bld-centos5-32-build-buildbot cluster, scl3-bld-centos5-32-build-ping cluster, scl3-bld-centos5-64-build-buildbot cluster, and scl3-bld-centos5-64-build-ping cluster? Nagios shows that all of the 64-bit builders are disabled and 16 of the 22 32-bit builders are disabled. http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/
Depends on: 813612, 813613
I think nthomas may have more context on what the plans are.
Flags: needinfo?(nthomas)
As Amy says, we already removed the try VMs and the nagios checks (bug 804763 and friends). Bug 804766 has the work on the releng side to get us ready to remove the non-try VMs. That was originally filed when the netapp was overloaded, but it helped us realize we're not using the VMs much so they're for the chop.
Flags: needinfo?(nthomas)
So this is WORKSFORME then?
coop: yes, I believe it is
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
This would be WFM if it refers to the *try* VMs, which are gone, but it's for the non-try VMs.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
The cluster checks I can see on nagios1.private.releng.scl3.mozilla.com are:

scl1-linux-slave-buildbot cluster
scl1-linux-slave-ping cluster
scl1-linux64-slave-buildbot cluster
scl1-linux64-slave-ping cluster

scl3-bld-centos5-32-build-buildbot cluster
scl3-bld-centos5-32-build-ping cluster
scl3-bld-centos5-64-build-buildbot cluster
scl3-bld-centos5-64-build-ping cluster

AFAICT the first block is dependent on the physical ix machines getting reimaged as w64, the second on the VMs in bug 804766 getting destroyed. Both of those depend on bug 805577, bug 813613 and the related T'bird ESR moves to stop using slaves.
All but 6 of the bld-centos5-32 vms have not been running buildbot for over three months now. Can we delete those unused VMs and reclaim those VMWare resources?
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?servicegroup=scl3-bld-centos5-32-build-buildbot-service&style=detail&&hoststatustypes=2&hostprops=0

None of the bld-centos5-64 vms have been running buildbot. Can we also delete those and reclaim those VMWare resources?
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?servicegroup=scl3-bld-centos5-64-build-buildbot-service&style=detail&&hoststatustypes=2&hostprops=0
Yes, please do that. If you would rather remove all the VMs in one swoop it shouldn't be hard for us to move the fuzzing jobs to the pretty-much-idle ix slaves. RelEng: As well as cleaning up production_config.py, I think we'd have to remove the line http://hg.mozilla.org/build/buildbotcustom/file/b796dd95b29f/misc.py#l2279. Otherwise there are only fast slaves (via mozilla/build_localconfig.py) and we'd never assign work. This is already the case with linux64.
Blocks: 841331
I've gone ahead and removed the checks for all of the bld-centos5-64 machines (including the cluster checks) and for the bld-centos5-32 machines that were not running buildbot, and have filed bug 841331 to have the vms deleted. We can open a second bug when you're ready to get rid of the last 6 vms.
No longer blocks: 841331
Depends on: 841331
Can we get rid of the last 6 vms now that we've mostly moved to mock on centos 6?
Yeah, we can get rid of them. That work should probably happen in bug 804766.
Okay, I've removed the nagios cluster checks for them.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering