Closed Bug 712456 Opened 13 years ago Closed 11 years ago

upgrade heatfink/fan/memory on remaining ix builder machines and move them to scl3

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

References

Details

(Whiteboard: #UJU-709-53233 - replacement drive for 009)

Low-priority at the moment since scl3 isn't built yet. These hosts will need to be allocated space in scl3 and moved there, via iX for heatsink/fan replacement. mw32-ix-slave01 mw32-ix-slave02 mw32-ix-slave03 mw32-ix-slave04 mw32-ix-slave05 mw32-ix-slave06 mw32-ix-slave07 mw32-ix-slave08 mw32-ix-slave09 mw32-ix-slave10 mw32-ix-slave11 mw32-ix-slave12 mw32-ix-slave13 mw32-ix-slave14 mw32-ix-slave15 mw32-ix-slave16 mw32-ix-slave17 mw32-ix-slave18 mw32-ix-slave19 mw32-ix-slave20 mw32-ix-slave21 mw32-ix-slave22 mw32-ix-slave23 mw32-ix-slave24 mw32-ix-slave25 mw32-ix-slave26
Assignee: server-ops-releng → arich
colo-trip: --- → mtv1
I don't think this is *quite* ready for an mtv1 trip yet!
colo-trip: mtv1 → ---
Summary: move mw32-ix-slave* to scl3 → upgrade heatfink/fan/memory and move mw32-ix-slave* to scl3
Assigning this to Jake and we can have iX come out and do this once we are ready to move them.
Assignee: arich → jwatkins
Status: NEW → ASSIGNED
linux-ix-ref mv-moz2-linux-ix-slave01 mv-moz2-linux-ix-slave02 mv-moz2-linux-ix-slave03 mv-moz2-linux-ix-slave04 mv-moz2-linux-ix-slave05 mv-moz2-linux-ix-slave06 mv-moz2-linux-ix-slave07 mv-moz2-linux-ix-slave08 mv-moz2-linux-ix-slave09 mv-moz2-linux-ix-slave10 mv-moz2-linux-ix-slave11 mv-moz2-linux-ix-slave12 mv-moz2-linux-ix-slave13 mv-moz2-linux-ix-slave14 mv-moz2-linux-ix-slave15 mv-moz2-linux-ix-slave16 mv-moz2-linux-ix-slave17 mv-moz2-linux-ix-slave18 mv-moz2-linux-ix-slave19 mv-moz2-linux-ix-slave20 mv-moz2-linux-ix-slave21 mv-moz2-linux-ix-slave22 mv-moz2-linux-ix-slave23 mw32-ix-slave01 mw32-ix-slave02 mw32-ix-slave03 mw32-ix-slave04 mw32-ix-slave05 mw32-ix-slave06 mw32-ix-slave07 mw32-ix-slave08 mw32-ix-slave09 mw32-ix-slave10 mw32-ix-slave11 mw32-ix-slave12 mw32-ix-slave13 mw32-ix-slave14 mw32-ix-slave15 mw32-ix-slave16 mw32-ix-slave17 mw32-ix-slave18 mw32-ix-slave19 mw32-ix-slave20 mw32-ix-slave21 mw32-ix-slave22 mw32-ix-slave23 mw32-ix-slave24 mw32-ix-slave25 mw32-ix-slave26 win32-ix-ref
Summary: upgrade heatfink/fan/memory and move mw32-ix-slave* to scl3 → upgrade heatfink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3
Assignee: jwatkins → mlarrain
No longer blocks: releng-scl3
Depends on: 774829
Assignee: mlarrain → server-ops
Component: Server Operations: RelEng → Server Operations: DCOps
QA Contact: zandr → dmoore
This project is on hold until the hardware is released for move. Please update here when DC Ops is cleared to begin planning.
Whiteboard: [reit]
colo-trip: --- → mtv1
This bug is not currently actionable, so I'm making it infra-only to avoid confusion.
Group: infra
No longer blocks: 780022
Blocks: 780022
Group: infra
Summary: upgrade heatfink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3 → upgrade heatfink/fan/memory on remaining ix builder machines and move them to scl3
Blocks: 784721
No longer blocks: 780022
No longer depends on: 774829
Please upgrade the following machines now but leave them in service in mtv1: mv-moz2-linux-ix-slave01 mv-moz2-linux-ix-slave02 mv-moz2-linux-ix-slave03 mv-moz2-linux-ix-slave04 mv-moz2-linux-ix-slave05 mv-moz2-linux-ix-slave06 mv-moz2-linux-ix-slave07 mv-moz2-linux-ix-slave08 mv-moz2-linux-ix-slave09 mv-moz2-linux-ix-slave10 mv-moz2-linux-ix-slave11 mv-moz2-linux-ix-slave12 mv-moz2-linux-ix-slave13 mv-moz2-linux-ix-slave14 mv-moz2-linux-ix-slave15 mv-moz2-linux-ix-slave16 mv-moz2-linux-ix-slave17 mv-moz2-linux-ix-slave18 mv-moz2-linux-ix-slave19 mv-moz2-linux-ix-slave20 mv-moz2-linux-ix-slave21 mv-moz2-linux-ix-slave22 mv-moz2-linux-ix-slave23 They can be taken offline at any time.
Blocks: 847529
Hostnames have been remapped per upcoming retask:
mv-moz2-linux-ix-slave01 bld-centos6-ix-051
mv-moz2-linux-ix-slave02 bld-centos6-ix-052
mv-moz2-linux-ix-slave03 bld-centos6-ix-053
mv-moz2-linux-ix-slave04 bld-centos6-ix-054
mv-moz2-linux-ix-slave05 bld-centos6-ix-055
mv-moz2-linux-ix-slave06 bld-centos6-ix-056
mv-moz2-linux-ix-slave07 bld-centos6-ix-057
mv-moz2-linux-ix-slave08 bld-centos6-ix-058
mv-moz2-linux-ix-slave09 bld-centos6-ix-059
mv-moz2-linux-ix-slave10 bld-centos6-ix-060
mv-moz2-linux-ix-slave11 bld-centos6-ix-061
mv-moz2-linux-ix-slave12 bld-centos6-ix-062
mv-moz2-linux-ix-slave13 bld-centos6-ix-063
mv-moz2-linux-ix-slave14 bld-centos6-ix-064
mv-moz2-linux-ix-slave15 bld-centos6-ix-065
mv-moz2-linux-ix-slave16 bld-centos6-ix-066
mv-moz2-linux-ix-slave17 bld-centos6-ix-067
mv-moz2-linux-ix-slave18 bld-centos6-ix-068
mv-moz2-linux-ix-slave19 bld-centos6-ix-069
mv-moz2-linux-ix-slave20 bld-centos6-ix-070
mv-moz2-linux-ix-slave21 bld-centos6-ix-071
mv-moz2-linux-ix-slave22 bld-centos6-ix-072
mv-moz2-linux-ix-slave23 bld-centos6-ix-073
My mistake, that should be: bld-linux64-ix-051.build.mtv1.mozilla.com bld-linux64-ix-052.build.mtv1.mozilla.com bld-linux64-ix-053.build.mtv1.mozilla.com bld-linux64-ix-054.build.mtv1.mozilla.com bld-linux64-ix-055.build.mtv1.mozilla.com bld-linux64-ix-056.build.mtv1.mozilla.com bld-linux64-ix-057.build.mtv1.mozilla.com bld-linux64-ix-058.build.mtv1.mozilla.com bld-linux64-ix-059.build.mtv1.mozilla.com bld-linux64-ix-060.build.mtv1.mozilla.com bld-linux64-ix-061.build.mtv1.mozilla.com bld-linux64-ix-062.build.mtv1.mozilla.com bld-linux64-ix-063.build.mtv1.mozilla.com bld-linux64-ix-064.build.mtv1.mozilla.com bld-linux64-ix-065.build.mtv1.mozilla.com bld-linux64-ix-066.build.mtv1.mozilla.com bld-linux64-ix-067.build.mtv1.mozilla.com bld-linux64-ix-068.build.mtv1.mozilla.com bld-linux64-ix-069.build.mtv1.mozilla.com bld-linux64-ix-070.build.mtv1.mozilla.com bld-linux64-ix-071.build.mtv1.mozilla.com bld-linux64-ix-072.build.mtv1.mozilla.com bld-linux64-ix-073.build.mtv1.mozilla.com
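For anyone scripting the rename: the corrected list follows a fixed offset (slaveNN becomes host 0(50+NN)). A minimal sketch, assuming that offset holds for all 23 hosts as listed above; the function and constant names are illustrative, not part of any existing tooling.

```python
OLD_PREFIX = "mv-moz2-linux-ix-slave"
NEW_PREFIX = "bld-linux64-ix-"
DOMAIN = "build.mtv1.mozilla.com"
OFFSET = 50  # assumption read off the list: slave01 -> 051, slave23 -> 073


def remap_hostnames(count=23):
    """Return {old_short_name: new_fqdn} for slaves 1..count."""
    mapping = {}
    for n in range(1, count + 1):
        old = f"{OLD_PREFIX}{n:02d}"
        new = f"{NEW_PREFIX}{n + OFFSET:03d}.{DOMAIN}"
        mapping[old] = new
    return mapping


if __name__ == "__main__":
    # Print one "old new" pair per line.
    for old, new in remap_hostnames().items():
        print(old, new)
```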
Blocks: 849022
Slight change of plan here. 19 of the iX boxes are going to become Linux foopies to get us off of the Mac minis there. Can we please rename a subset of these for use as foopies, specifically: bld-linux64-ix-0[55-73]
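The bracket shorthand above can be expanded mechanically to confirm that it covers 19 hosts. A rough sketch; `expand_host_range` is a hypothetical helper that only handles a single `[start-end]` group.

```python
import re


def expand_host_range(pattern):
    """Expand one "[start-end]" group, e.g. "host-0[55-73]", into hostnames."""
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]  # no range present; return the name unchanged
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep the zero padding used inside the brackets
    return [
        f"{prefix}{i:0{width}d}{suffix}"
        for i in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    hosts = expand_host_range("bld-linux64-ix-0[55-73]")
    print(len(hosts), hosts[0], hosts[-1])
```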
coop: I'll handle that in a different bug next week since it's just a name change. That doesn't impact the need to upgrade the hardware in this dcops bug.
So far we have upgraded the heatsinks and memory of iX systems with asset tags 3132, 3133, 3134, 3135, 3136, 3137, 3139, 3140, and 3144. We will complete the rest of the upgrades on Monday.
The only one of those that had a working IPMI was 3139 (foopy125). Could you please make sure that the IPMI comes up on each device so I can kickstart them?
So I may have found some secret sauce to making the IPMI lan connections recover. From the local machine, you have to set the IP src type to static, wait till it picks that up, then switch it back to dhcp. This seems to work MUCH more reliably than doing an mc reset.
    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipsrc dhcp
Doing this I got the connections to all but foopy122 (doesn't seem to take) and foopy124 (can't get to the host) up. Please check on those two.
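The static-then-dhcp toggle could be wrapped in a small script for running on each host. This is only a sketch: the 30-second wait is an assumption (the comment does not say how long "wait till it picks that up" takes), the function names are illustrative, and it defaults to a dry run that returns the commands without executing them.

```python
import subprocess
import time


def ipsrc_toggle_commands(channel=1):
    """Return the two ipmitool invocations from the comment above, in order."""
    base = ["ipmitool", "lan", "set", str(channel), "ipsrc"]
    return [base + ["static"], base + ["dhcp"]]


def toggle_ipsrc(channel=1, wait_seconds=30, dry_run=True):
    """Set the IP source to static, wait, then switch back to dhcp.

    wait_seconds is a guess, not a documented value. With dry_run=True
    nothing is executed; the command strings are only returned.
    """
    static_cmd, dhcp_cmd = ipsrc_toggle_commands(channel)
    executed = []
    for cmd, pause in ((static_cmd, wait_seconds), (dhcp_cmd, 0)):
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs ipmitool and root on the host
            time.sleep(pause)
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    for line in toggle_ipsrc():
        print(line)
```

Run with `dry_run=False` only on the local machine whose BMC needs recovering, since `ipmitool lan set` changes the management controller's network settings.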
We've completed the heatsink/memory upgrade of: bld-linux64-ix-051.build.mtv1.mozilla.com bld-linux64-ix-052.build.mtv1.mozilla.com bld-linux64-ix-053.build.mtv1.mozilla.com bld-linux64-ix-054.build.mtv1.mozilla.com bld-linux64-ix-055.build.mtv1.mozilla.com bld-linux64-ix-056.build.mtv1.mozilla.com bld-linux64-ix-057.build.mtv1.mozilla.com bld-linux64-ix-058.build.mtv1.mozilla.com bld-linux64-ix-059.build.mtv1.mozilla.com bld-linux64-ix-060.build.mtv1.mozilla.com bld-linux64-ix-061.build.mtv1.mozilla.com bld-linux64-ix-062.build.mtv1.mozilla.com bld-linux64-ix-063.build.mtv1.mozilla.com bld-linux64-ix-064.build.mtv1.mozilla.com bld-linux64-ix-065.build.mtv1.mozilla.com bld-linux64-ix-066.build.mtv1.mozilla.com bld-linux64-ix-067.build.mtv1.mozilla.com bld-linux64-ix-068.build.mtv1.mozilla.com bld-linux64-ix-069.build.mtv1.mozilla.com bld-linux64-ix-070.build.mtv1.mozilla.com bld-linux64-ix-071.build.mtv1.mozilla.com bld-linux64-ix-072.build.mtv1.mozilla.com bld-linux64-ix-073.build.mtv1.mozilla.com Foopy124 is still not cooperating; the green indicator light is still not responsive. We troubleshot the equipment by resetting it, swapping the ethernet cable, and plugging it into another port, as well as opening it up to check whether anything came loose while we were upgrading it.
Okay, so all of the ones done last week and this week look good except: foopy124 - I can log into the mgmt console, but when I power the machine on, it doesn't even get a display. That means that either something needs to be reseated, or the magic smoke got out. foopy122 - I can get to the running OS but can't get to the IPMI. Maybe it just needs to have the power unplugged and plugged back in, or it may be a bad cable or switch port?
Blocks: 851579
foopy122 has bad IPMI, foopy124 had a bad DIMM and is back online.
The rest of these machines are now out of warranty and will be decommissioned when their current purpose is fulfilled. They will not be moving to scl3.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee: server-ops → server-ops-dcops
It turns out we're not going to decommission the iX machines after all, so reopening this bug to get these machines hardware upgraded and moved to scl3: mw32-ix-slave01.build.mtv1.mozilla.com mw32-ix-slave02.build.mtv1.mozilla.com mw32-ix-slave03.build.mtv1.mozilla.com mw32-ix-slave04.build.mtv1.mozilla.com mw32-ix-slave05.build.mtv1.mozilla.com mw32-ix-slave06.build.mtv1.mozilla.com mw32-ix-slave07.build.mtv1.mozilla.com mw32-ix-slave08.build.mtv1.mozilla.com mw32-ix-slave09.build.mtv1.mozilla.com mw32-ix-slave10.build.mtv1.mozilla.com mw32-ix-slave11.build.mtv1.mozilla.com mw32-ix-slave12.build.mtv1.mozilla.com win32-ix-ref.build.mtv1.mozilla.com linux-ix-ref.build.mtv1.mozilla.com
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 947950
Blocks: 947951
The primary nics of the machines in comment 18 should be put on vlan 236 in scl3 when they are moved. The management nics should be on vlan 216.
Replaced heat sinks. It seems we're short one heat sink for asset-03107, which we'll try to procure.
ServiceNow order submitted. REQ0021153
RITM0022787
colo-trip: mtv1 → scl3
van: we have three more machines that need upgrades (bug 948997). Are we lacking the parts for those? If so, how much will it cost to buy new parts (it may not be worth it)?
Flags: needinfo?(vle)
:arr, we are short on heat sinks so we would have to order 3 more. they're $35 each before taxes/shipping. http://www.heatsinkfactory.com/cooljag-den-7-cpu-cjg-36.html
Flags: needinfo?(vle)
Okay, let's go ahead and order the parts in expectation that we're going to do these last 3 machines soon after we wrap up the tegra move.
Order has been placed through servicenow. RITM0022977
Whiteboard: [reit] → 3 heatsinks ordered RITM0022977
Added the bug for the last three machines to be upgraded and moved to scl3. That should bring us to a total of 17 machines moving to scl3 and being repurposed as windows2008r2 builders. We're still hashing out the hostnames for these, but please put all the primary nics on VLAN236 (winbuild) https://inventory.mozilla.org/en-US/core/vlan/139/ and the oob interfaces on VLAN216 (inband) https://inventory.mozilla.org/en-US/core/vlan/138/
Blocks: 948997
I've started a spreadsheet to track the move for these: https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AhyKG0L2cstIdEMzd2RoMk1UaHg4Ty10Q1NHQXNzd0E I noticed that there are TWO switch ports listed for the primary nic, and I presume that's either an error or we mistakenly cabled up the secondary nic on those as well. If we did the latter, that's unnecessary, since we only use the primary and not secondary nic. Please make sure that we're not wasting cable and switches there if so. I put in switch1.r202-10.console.scl3.mozilla.net:ge-0/0/<port> in the spreadsheet to update inventory, so if that's incorrect, please let me know and update that column. I've also added the last three systems so that you guys can fill in switch ports, rack, rack order, and oob switch. I'm going to work with uberj to get all of the information updated in inventory based on this ss.
Inventory/DNS/DHCP has been updated for the 15 machines that are already in scl3. dcops: can someone verify that they're on the correct vlans and power cycle them so that the oob interfaces are pingable?
uber: can you take a look at amy's spreadsheet? i have updated the location and switch info for the last 3 hosts. please also note that i have updated the name of the switch, giving it the FQDN and the correct name.
arr: inband mgmt switch and host switch ports have been configured. let me know if you're having any issues.
vle@switch1.r202-10.ops.releng.scl3.mozilla.net# show
member-range ge-0/0/23 to ge-0/0/36;
member-range ge-0/0/7 to ge-0/0/9;
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members releng-winbuild;
        }
    }
}
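As a quick sanity check on the switch config above, the two member-range statements can be expanded and counted. A minimal sketch; `expand_member_range` is a hypothetical helper that only parses the simple "member-range X to Y;" form shown here, not general Junos syntax. The result is 17 ports, which lines up with the 17 machines mentioned earlier in this bug, assuming one access port per machine.

```python
import re

MEMBER_RANGE = re.compile(
    r"\s*member-range\s+(\S+/)(\d+)\s+to\s+(\S+/)(\d+);?\s*"
)


def expand_member_range(line):
    """Expand "member-range ge-0/0/23 to ge-0/0/36;" into port names."""
    m = MEMBER_RANGE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a member-range statement: {line!r}")
    start_prefix, start, end_prefix, end = m.groups()
    if start_prefix != end_prefix:
        raise ValueError("range endpoints are on different FPC/PIC slots")
    return [f"{start_prefix}{i}" for i in range(int(start), int(end) + 1)]


def all_ports(lines):
    """Concatenate the ports from several member-range statements."""
    ports = []
    for line in lines:
        ports.extend(expand_member_range(line))
    return ports


if __name__ == "__main__":
    config = [
        "member-range ge-0/0/23 to ge-0/0/36;",
        "member-range ge-0/0/7 to ge-0/0/9;",
    ]
    ports = all_ports(config)
    print(len(ports), ports)
```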
:uberj: I also had to change the FQDN of the ipmi since it was missing the "releng" atom (so the SREG, A, PTR, and CNAMEs will need to be updated for that).
:van: I am unable to reach the ipmi interfaces for 0002, 0011, and 0014. :uberj: might you get a chance to update the information from the spreadsheet and add the CNAMEs today?
Flags: needinfo?(juber)
:van: also, 9 doesn't seem to see its disk, and 13 doesn't look like it's powering on.
:arr, [0002,0011,0014] are back online. 0013 was hung during the reimage phase and I've rebooted it. 0009 has a bad disk that is no longer spinning up. Can we replace it with any spare drive (we have spare 1 TB drives) or does it have to match the specs of the other iX hosts? We moved [0017-0019] and I've confirmed IPMI is reachable.
>we moved [0017-0019] and i've confirmed IPMI is reachable. I meant we moved [0015-0017].
van: the drive for 0009 should match specs, please order and replace. and I can't reach ipmi for 0015
#UJU-709-53233 opened for drive RMA/quote. IPMI fixed for 15 and 17.
Everything but 0009 is up now, thanks!
Flags: needinfo?(juber)
following up with iX regarding hard drive for 009.
Whiteboard: 3 heatsinks ordered RITM0022977 → #UJU-709-53233 - replacement drive for 009
Heat sinks upgraded on hosts and moved to SCL3. We're running into an issue with 0009 as it won't image after the drive swap. I have opened Bug 964535 for relops to take a look at "The Dirty Environment" error. Closing this bug as we have a different bug to track 0009.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations