Bug 712398 (Closed) — Opened 13 years ago, Closed 13 years ago

Setup buildbot-master21 as a mac test master

Categories

(Release Engineering :: General, defect, P3)

Platform: x86, All

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: bhearsum)

Details

(Whiteboard: [buildmasters][capacity][buildduty])

Attachments

(2 files)

In bug 712244 we're seeing the PB maximum connection limit being reached and high CPU I/O wait. There are 2 masters for Linux slaves, 3 masters for Mac OS X, and 2 masters for Windows. Nevertheless, there are slightly more Windows machines per silo as their jobs take longer. I am adding almost another 30 Windows slaves and would like to be ready for them.
ok, buildbot-master21 is up in scl1, with the root password changed, and has been added to nagios and inventory.
over to releng for puppetization/configuration.
Assignee: server-ops-releng → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Thanks Dustin!
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
I won't be able to touch it until the week of January 9th, when I come back from vacation.
Priority: -- → P3
14:16 < nagios-sjc1> buildbot-master21.build.scl1:MySQL connectivity is ACKNOWLEDGEMENT (CRITICAL): CHECK_NRPE: Socket timeout after 10 seconds.;dustin;not set up yet

So the MySQL ACL for this isn't in place yet. The new plan for such tasks is to file a "Server Ops: ACLs" request for the network change and a "Server Ops: Database" request for the MySQL change, so they can be done in parallel. Please do so on the 9th, or earlier if someone else picks this up.
I missed that I need to modify the number of CPUs *after* building the VM. Arr spotted it, and has modified the VM, so it will come up with 2 CPUs after it's rebooted. I'll leave it to you guys to schedule that (since I'm not sure if it's in prod yet).
I'll use bug 712244 to set it up.
Assignee: armenzg → server-ops-releng
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Component: Release Engineering → Server Operations: RelEng
Priority: P3 → P2
QA Contact: release → zandr
Resolution: --- → FIXED
We'll do the setup in here and leave bug 712244 to determine the way forward.
Assignee: server-ops-releng → nobody
No longer blocks: 712244
Status: RESOLVED → REOPENED
Component: Server Operations: RelEng → Release Engineering
Priority: P2 → --
QA Contact: zandr → release
Resolution: FIXED → ---
Summary: Please create buildbot-master21 → Setup buildbot-master21
Priority: -- → P3
Whiteboard: [buildmasters][capacity]
Armen: you switched from "I'll" in comment #7 to "We'll" in comment #8 -- who is actually going to do the work here?
Assignee: nobody → armenzg
I have not had time to work on this for the last month and don't see the end of the tunnel, so I'm putting it back in the queue. This is not urgent; it might become more pressing when more r4 machines are out of the pool due to dongles. We now have 3 masters for each testing OS group. I believe we should have one more Mac OS X test master, based on the ratio of #slaves/#masters:

darwin10/darwin11 -> 170
darwin9 -> 61
win7 -> 73
xp -> 67
fed32 -> 73
fed64 -> 68

tests1-linux -> 73+68=141
tests1-windows -> 73+67=140
tests1-macosx -> 170+61=231
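The slaves-per-master arithmetic above can be tallied with a short sketch. This is just an illustration of the comment's numbers, not RelEng tooling; the platform counts and the three-masters-per-silo figure are taken from the comment, and the dict/grouping names are made up for this example.

```python
# Slave counts per platform, as listed in the comment (hypothetical structure).
slaves = {
    "darwin10/darwin11": 170,
    "darwin9": 61,
    "win7": 73,
    "xp": 67,
    "fed32": 73,
    "fed64": 68,
}

# Grouping of platforms into test silos, matching the comment's totals.
silos = {
    "tests1-linux": ["fed32", "fed64"],
    "tests1-windows": ["win7", "xp"],
    "tests1-macosx": ["darwin10/darwin11", "darwin9"],
}

MASTERS_PER_SILO = 3  # "We now have 3 masters of each testing OS group"

for silo, platforms in silos.items():
    total = sum(slaves[p] for p in platforms)
    print(f"{silo}: {total} slaves, "
          f"{total / MASTERS_PER_SILO:.1f} slaves per master")
```

With three masters per silo, the Mac silo carries 77 slaves per master versus roughly 47 for Linux and Windows, which is why the comment argues the spare master should go to Mac testing.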
Assignee: armenzg → nobody
Whiteboard: [buildmasters][capacity] → [buildmasters][capacity][buildduty]
Armen, is this bug simply about getting a master instance up and running on this machine, and putting it in the production pool?
Assignee: nobody → bhearsum
Yes, that is correct. Based on comment 10, having an extra Mac OS X master would be the best use of it. Thanks!
OK, thanks. I'll try to get this done this week.
Summary: Setup buildbot-master21 → Setup buildbot-master21 as a mac test master
Attached patch update json (deleted) — Splinter Review
Attachment #602914 - Flags: review?(armenzg)
Attached patch puppet config update (deleted) — Splinter Review
Attachment #602916 - Flags: review?(armenzg)
I added this master to slavealloc and put the proper SSH keys on it. At this point I'm just waiting for reviews; then I can turn it on, I think. Once it's actually up and running we need to update Nagios to look for it, as this check is currently looking for 0 instances of buildbot ;) https://nagios.mozilla.org/nagios/cgi-bin/extinfo.cgi?type=2&host=buildbot-master21.build.scl1&service=buildbot
Attachment #602914 - Flags: review?(armenzg) → review+
Attachment #602916 - Flags: review?(armenzg) → review+
Attachment #602914 - Flags: checked-in+
Comment on attachment 602916 [details] [diff] [review] puppet config update This is landed, and the master has been created and turned on. I've pushed talos-r3-leopard-032 and talos-r4-{snow,lion}-032 to it, and after I see successful jobs from them, I'll enable the master fully.
Attachment #602916 - Flags: checked-in+
(In reply to Ben Hearsum [:bhearsum] from comment #16)
> Once it's actually up and running we need to update Nagios to look for it,
> as this check is currently looking for 0 instances of buildbot ;)
> https://nagios.mozilla.org/nagios/cgi-bin/extinfo.cgi?type=2&host=buildbot-master21.build.scl1&service=buildbot

Apparently Puppet changes this somewhere, so it's OK now.
I've seen successful jobs from all 3 slaves, so I just unlocked them and enabled the master fully. The slaves should rebalance themselves over the next couple of hours.
So, nothing else to do here, woo!
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering