Closed Bug 567147 Opened 14 years ago Closed 14 years ago

re-image production try slaves and move out of sandbox and into build network

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
All
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lsblakk, Assigned: phong)

References

Details

Here's the list of win32 slaves. They have been scheduled for 10hrs of downtime in nagios, please update nagios when they are back online.

try-w32-slave01
try-w32-slave02
try-w32-slave03
try-w32-slave04
try-w32-slave06
try-w32-slave07
try-w32-slave08
try-w32-slave09
try-w32-slave10
try-w32-slave11
try-w32-slave15
try-w32-slave21
try-w32-slave22
try-w32-slave23
try-w32-slave24
try-w32-slave25
try-w32-slave26
try-w32-slave27
try-w32-slave28
try-w32-slave29
Assignee: server-ops → phong
Try slaves are currently idle and can be turned off - also scheduled for 48 hours flexible nagios downtime

try-linux-slave02
try-linux-slave03
try-linux-slave04
try-linux-slave07
try-linux-slave09
try-linux-slave11
try-linux-slave12
try-linux-slave17
try-linux-slave18
Cancel re-imaging of:

try-linux-slave02
try-linux-slave03

We will be keeping those two in the sandbox on the current try server for now to carry on with Maemo builds.
Here is the list of mac slaves to be re-imaged and moved to build:

try-mac-slave02
try-mac-slave03
try-mac-slave04
try-mac-slave06
try-mac-slave07
try-mac-slave10
try-mac-slave12
try-mac-slave13
try-mac-slave14
try-mac-slave15
try-mac-slave17
try-mac-slave18
try-mac-slave19

they have a 96 hour nagios downtime scheduled.
try-w32-slave12
try-w32-slave13
try-w32-slave14
try-w32-slave16
try-w32-slave17
try-w32-slave18
try-w32-slave19
try-w32-slave20

Those all have 80 GB build drives that need to be reduced to 30 GB.
(In reply to comment #1)
> Try slaves are currently idle and can be turned off - also scheduled for 48
> hours flexible nagios downtime
>
> try-linux-slave02
> try-linux-slave03
> try-linux-slave04
> try-linux-slave07
> try-linux-slave09
> try-linux-slave11
> try-linux-slave12
> try-linux-slave17
> try-linux-slave18

These are done. I will come back and update nagios, but they should be online and ready.
(In reply to comment #0)
> Here's the list of win32 slaves. They have been scheduled for 10hrs of
> downtime in nagios, please update nagios when they are back online.
>
> try-w32-slave01
> try-w32-slave02
> try-w32-slave03
> try-w32-slave04
> try-w32-slave06
> try-w32-slave07
> try-w32-slave08
> try-w32-slave09
> try-w32-slave10
> try-w32-slave11
> try-w32-slave15
> try-w32-slave21

Those are done.
try-w32-slave22 try-w32-slave23 try-w32-slave24 try-w32-slave25 should be online now.
all linux and w32 slaves are done.
Flags: colo-trip+
win32 and linux slaves have been attached to the try master.
phong - try-linux-slave17 says it's the linux64 ref platform - any ideas?
i probably messed up and used the wrong template. I will recreate it in the morning.
(In reply to comment #3)
> Here is the list of mac slaves to be re-imaged and moved to build:
>
> try-mac-slave02
> try-mac-slave03
> try-mac-slave04
> try-mac-slave06
> try-mac-slave07
> try-mac-slave10
> try-mac-slave12
> try-mac-slave13
> try-mac-slave14
> try-mac-slave15
> try-mac-slave17
> try-mac-slave18
> try-mac-slave19
>
> they have a 96 hour nagios downtime scheduled.

try-mac-slave13 was already done from a previous batch. The rest have been re-imaged and updated in nagios. I will bring these to MPT and rack them.
all done.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Looks like we ended up leaving try-linux-slave02/03 untouched to provide Maemo builds on the old try server. Please restore the DNS (and DHCP?) entries for them. ESX says they currently have

try-linux-slave02.m.o 10.2.74.46
try-linux-slave03.m.o 10.2.74.249

Nagios also thinks try-linux-slave03 is failing PING but none of the other tests.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
try-linux-slave02 had the correct IP of 10.2.76.46. try-linux-slave03 got added back to DHCP and its IP should be 10.2.76.65.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
try-mac-slave18 thinks that it's try-mac-slave16 - i reset the hostname but it didn't stick, on reboot it came back as 16.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
raising to critical as this is causing slaves to fail out, and hurting our wait times
Severity: normal → critical
scutil --set HostName try-mac-slave18.mozilla.org
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
i meant .build.mozilla.org.
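The fix above sets only the `HostName` value. macOS also keeps `LocalHostName` and `ComputerName`, which `scutil --set` can change as well. The sketch below is a minimal illustration, assuming all three names should match the slave name; this bug does not say whether the other two were wrong on try-mac-slave18. `scutil_commands` is a hypothetical helper that only builds and prints the commands and does not run them.

```python
import shlex


def scutil_commands(short_name, domain="build.mozilla.org"):
    """Build (but do not run) scutil calls that set all three macOS names.

    Assumption: HostName is the fully qualified name, while LocalHostName
    and ComputerName use the short slave name.
    """
    fqdn = f"{short_name}.{domain}"
    return [
        ["scutil", "--set", "HostName", fqdn],
        ["scutil", "--set", "LocalHostName", short_name],
        ["scutil", "--set", "ComputerName", short_name],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run as root.
    for cmd in scutil_commands("try-mac-slave18"):
        print("sudo " + shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands themselves only work on a Mac and need root.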
Something's still missing here on some of the machines. For example, try-linux-slave02 can't connect to production-puppet:80. Can you double check the linux and mac try machines and ensure they have the same access to production-puppet that their moz2-* counterparts do?
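One way to double check the access described above is a plain TCP connect test run from each slave. This is a minimal sketch, not the check that was actually used in this bug; `can_connect` is a hypothetical helper, and the host name and port are taken from the comment above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_connect("production-puppet", 80)` on a try slave and on its moz2-* counterpart would show whether the two have the same reachability. It tests only that the port accepts connections, not that puppet itself responds correctly.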
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See comment #2. Those 2 are still in the sandbox network. That's to be expected.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
(In reply to comment #21)
> See comment #2. Those 2 are still in the sandbox network. That's to be
> expected.

They still need access to production-puppet:80.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Do we just need this for the 2 slaves left or for everything in that vlan?
(In reply to comment #23)
> Do we just need this for the 2 slaves left or all everything in that vlan?

What else is left there?
bug 571006 filed for this.
access to puppet granted in bug 571006
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard