Closed
Bug 712456
Opened 13 years ago
Closed 11 years ago
upgrade heatsink/fan/memory on remaining ix builder machines and move them to scl3
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
References
Details
(Whiteboard: #UJU-709-53233 - replacement drive for 009)
Low-priority at the moment since scl3 isn't built yet. These hosts will need to be allocated space in scl3 and moved there, via iX for heatsink/fan replacement.
mw32-ix-slave01
mw32-ix-slave02
mw32-ix-slave03
mw32-ix-slave04
mw32-ix-slave05
mw32-ix-slave06
mw32-ix-slave07
mw32-ix-slave08
mw32-ix-slave09
mw32-ix-slave10
mw32-ix-slave11
mw32-ix-slave12
mw32-ix-slave13
mw32-ix-slave14
mw32-ix-slave15
mw32-ix-slave16
mw32-ix-slave17
mw32-ix-slave18
mw32-ix-slave19
mw32-ix-slave20
mw32-ix-slave21
mw32-ix-slave22
mw32-ix-slave23
mw32-ix-slave24
mw32-ix-slave25
mw32-ix-slave26
Reporter | ||
Updated•13 years ago
|
Blocks: releng-scl3
Updated•13 years ago
|
Assignee: server-ops-releng → arich
colo-trip: --- → mtv1
Reporter | ||
Comment 1•13 years ago
|
||
I don't think this is *quite* ready for an mtv1 trip yet!
colo-trip: mtv1 → ---
Updated•13 years ago
|
Summary: move mw32-ix-slave* to scl3 → upgrade heatsink/fan/memory and move mw32-ix-slave* to scl3
Comment 2•13 years ago
|
||
Assigning this to Jake and we can have iX come out and do this once we are ready to move them.
Assignee: arich → jwatkins
Status: NEW → ASSIGNED
Comment 3•13 years ago
|
||
linux-ix-ref
mv-moz2-linux-ix-slave01
mv-moz2-linux-ix-slave02
mv-moz2-linux-ix-slave03
mv-moz2-linux-ix-slave04
mv-moz2-linux-ix-slave05
mv-moz2-linux-ix-slave06
mv-moz2-linux-ix-slave07
mv-moz2-linux-ix-slave08
mv-moz2-linux-ix-slave09
mv-moz2-linux-ix-slave10
mv-moz2-linux-ix-slave11
mv-moz2-linux-ix-slave12
mv-moz2-linux-ix-slave13
mv-moz2-linux-ix-slave14
mv-moz2-linux-ix-slave15
mv-moz2-linux-ix-slave16
mv-moz2-linux-ix-slave17
mv-moz2-linux-ix-slave18
mv-moz2-linux-ix-slave19
mv-moz2-linux-ix-slave20
mv-moz2-linux-ix-slave21
mv-moz2-linux-ix-slave22
mv-moz2-linux-ix-slave23
mw32-ix-slave01
mw32-ix-slave02
mw32-ix-slave03
mw32-ix-slave04
mw32-ix-slave05
mw32-ix-slave06
mw32-ix-slave07
mw32-ix-slave08
mw32-ix-slave09
mw32-ix-slave10
mw32-ix-slave11
mw32-ix-slave12
mw32-ix-slave13
mw32-ix-slave14
mw32-ix-slave15
mw32-ix-slave16
mw32-ix-slave17
mw32-ix-slave18
mw32-ix-slave19
mw32-ix-slave20
mw32-ix-slave21
mw32-ix-slave22
mw32-ix-slave23
mw32-ix-slave24
mw32-ix-slave25
mw32-ix-slave26
win32-ix-ref
Summary: upgrade heatsink/fan/memory and move mw32-ix-slave* to scl3 → upgrade heatsink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3
Updated•13 years ago
|
Assignee: jwatkins → mlarrain
Updated•13 years ago
|
No longer blocks: releng-scl3
Updated•12 years ago
|
Assignee: mlarrain → server-ops
Component: Server Operations: RelEng → Server Operations: DCOps
QA Contact: zandr → dmoore
Comment 4•12 years ago
|
||
This project is on hold until the hardware is released for move. Please update here when DC Ops is cleared to begin planning.
Updated•12 years ago
|
Whiteboard: [reit]
Updated•12 years ago
|
colo-trip: --- → mtv1
Comment 5•12 years ago
|
||
This bug is not currently actionable, so I'm making it infra-only to avoid confusion.
Group: infra
Updated•12 years ago
|
Group: infra
Updated•12 years ago
|
Summary: upgrade heatsink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3 → upgrade heatsink/fan/memory on remaining ix builder machines and move them to scl3
Comment 6•12 years ago
|
||
Please upgrade the following machines now but leave them in service in mtv1:
mv-moz2-linux-ix-slave01
mv-moz2-linux-ix-slave02
mv-moz2-linux-ix-slave03
mv-moz2-linux-ix-slave04
mv-moz2-linux-ix-slave05
mv-moz2-linux-ix-slave06
mv-moz2-linux-ix-slave07
mv-moz2-linux-ix-slave08
mv-moz2-linux-ix-slave09
mv-moz2-linux-ix-slave10
mv-moz2-linux-ix-slave11
mv-moz2-linux-ix-slave12
mv-moz2-linux-ix-slave13
mv-moz2-linux-ix-slave14
mv-moz2-linux-ix-slave15
mv-moz2-linux-ix-slave16
mv-moz2-linux-ix-slave17
mv-moz2-linux-ix-slave18
mv-moz2-linux-ix-slave19
mv-moz2-linux-ix-slave20
mv-moz2-linux-ix-slave21
mv-moz2-linux-ix-slave22
mv-moz2-linux-ix-slave23
They can be taken offline at any time.
Comment 7•12 years ago
|
||
Hostnames have been remapped per upcoming retask:
mv-moz2-linux-ix-slave01 bld-centos6-ix-051
mv-moz2-linux-ix-slave02 bld-centos6-ix-052
mv-moz2-linux-ix-slave03 bld-centos6-ix-053
mv-moz2-linux-ix-slave04 bld-centos6-ix-054
mv-moz2-linux-ix-slave05 bld-centos6-ix-055
mv-moz2-linux-ix-slave06 bld-centos6-ix-056
mv-moz2-linux-ix-slave07 bld-centos6-ix-057
mv-moz2-linux-ix-slave08 bld-centos6-ix-058
mv-moz2-linux-ix-slave09 bld-centos6-ix-059
mv-moz2-linux-ix-slave10 bld-centos6-ix-060
mv-moz2-linux-ix-slave11 bld-centos6-ix-061
mv-moz2-linux-ix-slave12 bld-centos6-ix-062
mv-moz2-linux-ix-slave13 bld-centos6-ix-063
mv-moz2-linux-ix-slave14 bld-centos6-ix-064
mv-moz2-linux-ix-slave15 bld-centos6-ix-065
mv-moz2-linux-ix-slave16 bld-centos6-ix-066
mv-moz2-linux-ix-slave17 bld-centos6-ix-067
mv-moz2-linux-ix-slave18 bld-centos6-ix-068
mv-moz2-linux-ix-slave19 bld-centos6-ix-069
mv-moz2-linux-ix-slave20 bld-centos6-ix-070
mv-moz2-linux-ix-slave21 bld-centos6-ix-071
mv-moz2-linux-ix-slave22 bld-centos6-ix-072
mv-moz2-linux-ix-slave23 bld-centos6-ix-073
Comment 8•12 years ago
|
||
My mistake, that should be:
bld-linux64-ix-051.build.mtv1.mozilla.com
bld-linux64-ix-052.build.mtv1.mozilla.com
bld-linux64-ix-053.build.mtv1.mozilla.com
bld-linux64-ix-054.build.mtv1.mozilla.com
bld-linux64-ix-055.build.mtv1.mozilla.com
bld-linux64-ix-056.build.mtv1.mozilla.com
bld-linux64-ix-057.build.mtv1.mozilla.com
bld-linux64-ix-058.build.mtv1.mozilla.com
bld-linux64-ix-059.build.mtv1.mozilla.com
bld-linux64-ix-060.build.mtv1.mozilla.com
bld-linux64-ix-061.build.mtv1.mozilla.com
bld-linux64-ix-062.build.mtv1.mozilla.com
bld-linux64-ix-063.build.mtv1.mozilla.com
bld-linux64-ix-064.build.mtv1.mozilla.com
bld-linux64-ix-065.build.mtv1.mozilla.com
bld-linux64-ix-066.build.mtv1.mozilla.com
bld-linux64-ix-067.build.mtv1.mozilla.com
bld-linux64-ix-068.build.mtv1.mozilla.com
bld-linux64-ix-069.build.mtv1.mozilla.com
bld-linux64-ix-070.build.mtv1.mozilla.com
bld-linux64-ix-071.build.mtv1.mozilla.com
bld-linux64-ix-072.build.mtv1.mozilla.com
bld-linux64-ix-073.build.mtv1.mozilla.com
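The renaming in comments 7 and 8 appears to follow a simple rule: the old slave number plus 50 becomes the new three-digit index. A minimal sketch of that rule follows, assuming the offset of 50 inferred from the two lists; the function name `new_hostname` is hypothetical and not part of any tooling mentioned in this bug.

```python
from typing import Dict

OLD_PREFIX = "mv-moz2-linux-ix-slave"
NEW_SUFFIX = ".build.mtv1.mozilla.com"


def new_hostname(old: str, offset: int = 50) -> str:
    """Map an old slave name to its corrected bld-linux64 FQDN.

    The offset of 50 is an assumption read off the lists in comments 7 and 8.
    """
    if not old.startswith(OLD_PREFIX):
        raise ValueError(f"unexpected hostname: {old}")
    index = int(old[len(OLD_PREFIX):])
    return f"bld-linux64-ix-{index + offset:03d}{NEW_SUFFIX}"


# Build the full old -> new table for slaves 01 through 23.
mapping: Dict[str, str] = {
    f"{OLD_PREFIX}{i:02d}": new_hostname(f"{OLD_PREFIX}{i:02d}")
    for i in range(1, 24)
}
```

Generating the table from one rule makes it easier to spot a transposed entry than comparing two hand-typed lists.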
Comment 9•12 years ago
|
||
Slight change of plan here. 19 of the iX boxes are going to become linux foopys to get us off of the Mac minis there. Can we please rename a subset of these for use as foopies, specifically:
bld-linux64-ix-0[55-73]
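For reference, the bracket shorthand above can be expanded into individual hostnames. This is only an illustrative sketch; `expand_host_range` is a hypothetical helper, and it assumes the bracket holds a single inclusive numeric range.

```python
import re
from typing import List


def expand_host_range(pattern: str) -> List[str]:
    """Expand 'prefix[START-END]suffix' into names, keeping zero padding."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the pattern as a single hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


hosts = expand_host_range("bld-linux64-ix-0[55-73]")
```

The expansion yields 19 names, which matches the 19 boxes requested for foopy use.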
Comment 10•12 years ago
|
||
coop: I'll handle that in a different bug next week since it's just a name change. That doesn't impact the need to upgrade the hardware in this dcops bug.
Comment 11•12 years ago
|
||
So far we have upgraded the heatsinks and memory of Ix systems asset tags 3132, 3133, 3134, 3135, 3136, 3137, 3139, 3140, and 3144. We will complete the rest of the upgrades on Monday.
Comment 12•12 years ago
|
||
The only one of those that had a working IPMI was 3139 (foopy125). Could you please make sure that the IPMI comes up on each device so I can kickstart them?
Comment 13•12 years ago
|
||
So I may have found some secret sauce to making the IPMI lan connections recover. From the local machine, you have to set the IP src type to static, wait till it picks that up, then switch it back to dhcp. This seems to work MUCH more reliably than doing an mc reset.
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipsrc dhcp
Doing this I got the connections to all but foopy122 (doesn't seem to take) and foopy124 (can't get to the host) up. Please check on those two.
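The static-then-dhcp toggle described above could be wrapped in a small script run locally on each host. This is a rough sketch only: the two `ipmitool` commands are the ones quoted in this comment, but the 10-second settle time is an assumption (the comment only says to wait until the setting is picked up), and `toggle_ipmi_ipsrc` is a hypothetical name.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def toggle_ipmi_ipsrc(
    channel: int = 1,
    settle_seconds: float = 10.0,  # assumed wait; tune per hardware
    run: Callable[[Sequence[str]], None] = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Set the LAN IP source to static, wait, then switch it back to dhcp."""
    run(["ipmitool", "lan", "set", str(channel), "ipsrc", "static"])
    sleep(settle_seconds)
    run(["ipmitool", "lan", "set", str(channel), "ipsrc", "dhcp"])
```

Calling `toggle_ipmi_ipsrc()` on the affected machine issues both commands in order. The `run` and `sleep` parameters exist so the sequence can be dry-run without touching a real BMC.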
Comment 14•12 years ago
|
||
We've completed the heatsink/memory upgrade of:
bld-linux64-ix-051.build.mtv1.mozilla.com
bld-linux64-ix-052.build.mtv1.mozilla.com
bld-linux64-ix-053.build.mtv1.mozilla.com
bld-linux64-ix-054.build.mtv1.mozilla.com
bld-linux64-ix-055.build.mtv1.mozilla.com
bld-linux64-ix-056.build.mtv1.mozilla.com
bld-linux64-ix-057.build.mtv1.mozilla.com
bld-linux64-ix-058.build.mtv1.mozilla.com
bld-linux64-ix-059.build.mtv1.mozilla.com
bld-linux64-ix-060.build.mtv1.mozilla.com
bld-linux64-ix-061.build.mtv1.mozilla.com
bld-linux64-ix-062.build.mtv1.mozilla.com
bld-linux64-ix-063.build.mtv1.mozilla.com
bld-linux64-ix-064.build.mtv1.mozilla.com
bld-linux64-ix-065.build.mtv1.mozilla.com
bld-linux64-ix-066.build.mtv1.mozilla.com
bld-linux64-ix-067.build.mtv1.mozilla.com
bld-linux64-ix-068.build.mtv1.mozilla.com
bld-linux64-ix-069.build.mtv1.mozilla.com
bld-linux64-ix-070.build.mtv1.mozilla.com
bld-linux64-ix-071.build.mtv1.mozilla.com
bld-linux64-ix-072.build.mtv1.mozilla.com
bld-linux64-ix-073.build.mtv1.mozilla.com
Foopy124 is still not cooperating; the green indicator light is still not responsive. We troubleshot the equipment by resetting it, swapping the ethernet cable, and plugging it into another port, as well as opening it up to check whether anything had come loose while we were upgrading it.
Comment 15•12 years ago
|
||
Okay, so all of the ones done last week and this week look good except:
foopy124 - I can log into the mgmt console, but when I power the machine on, it doesn't even get a display. That means that either something needs to be reseated, or the magic smoke got out.
foopy122 - I can get to the running OS but can't get to the IPMI. Maybe it just needs to have the power unplugged and plugged back in, or it may be a bad cable or switch port?
Comment 16•12 years ago
|
||
foopy122 has bad IPMI, foopy124 had a bad DIMM and is back online.
Comment 17•12 years ago
|
||
The rest of these machines are now out of warranty and will be decommissioned when their current purpose is fulfilled. They will not be moving to scl3.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Assignee: server-ops → server-ops-dcops
Comment 18•11 years ago
|
||
It turns out we're not going to decommission the iX machines after all, so reopening this bug to get these machines hardware upgraded and moved to scl3:
mw32-ix-slave01.build.mtv1.mozilla.com
mw32-ix-slave02.build.mtv1.mozilla.com
mw32-ix-slave03.build.mtv1.mozilla.com
mw32-ix-slave04.build.mtv1.mozilla.com
mw32-ix-slave05.build.mtv1.mozilla.com
mw32-ix-slave06.build.mtv1.mozilla.com
mw32-ix-slave07.build.mtv1.mozilla.com
mw32-ix-slave08.build.mtv1.mozilla.com
mw32-ix-slave09.build.mtv1.mozilla.com
mw32-ix-slave10.build.mtv1.mozilla.com
mw32-ix-slave11.build.mtv1.mozilla.com
mw32-ix-slave12.build.mtv1.mozilla.com
win32-ix-ref.build.mtv1.mozilla.com
linux-ix-ref.build.mtv1.mozilla.com
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 19•11 years ago
|
||
The primary nics of the machines in comment 18 should be put on vlan 236 in scl3 when they are moved. The management nics should be on vlan 216.
Comment 20•11 years ago
|
||
replaced heat sinks, seems we're short one heat sink for asset-03107, which we'll try to procure.
Comment 21•11 years ago
|
||
ServiceNow order submitted. REQ0021153
Comment 22•11 years ago
|
||
RITM0022787
Updated•11 years ago
|
colo-trip: mtv1 → scl3
Comment 23•11 years ago
|
||
van: we have three more machines that need upgrades (bug 948997). Are we lacking the parts for those? If so, how much will it cost to buy new parts (it may not be worth it)?
Updated•11 years ago
|
Flags: needinfo?(vle)
Comment 24•11 years ago
|
||
:arr, we are short on heat sinks so we would have to order 3 more. they're $35 each before taxes/shipping.
http://www.heatsinkfactory.com/cooljag-den-7-cpu-cjg-36.html
Flags: needinfo?(vle)
Comment 25•11 years ago
|
||
Okay, let's go ahead and order the parts in expectation that we're going to do these last 3 machines soon after we wrap up the tegra move.
Comment 26•11 years ago
|
||
Order has been placed through servicenow. RITM0022977
Whiteboard: [reit] → 3 heatsinks ordered RITM0022977
Comment 27•11 years ago
|
||
Added the bug for the last three machines to be upgraded and moved to scl3. That should bring us to a total of 17 machines moving to scl3 and being repurposed as windows2008r2 builders.
We're still hashing out the hostnames for these, but please put all the primary nics on VLAN236 (winbuild) https://inventory.mozilla.org/en-US/core/vlan/139/ and the oob interfaces on VLAN216 (inband) https://inventory.mozilla.org/en-US/core/vlan/138/
Blocks: 948997
Comment 28•11 years ago
|
||
I've started a spreadsheet to track the move for these: https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AhyKG0L2cstIdEMzd2RoMk1UaHg4Ty10Q1NHQXNzd0E
I noticed that there are TWO switch ports listed for the primary nic, and I presume that's either an error or we mistakenly cabled up the secondary nic on those as well. If we did the latter, that's unnecessary, since we only use the primary and not secondary nic. Please make sure that we're not wasting cable and switches there if so. I put in switch1.r202-10.console.scl3.mozilla.net:ge-0/0/<port> in the spreadsheet to update inventory, so if that's incorrect, please let me know and update that column.
I've also added the last three systems so that you guys can fill in switch ports, rack, rack order, and oob switch.
I'm going to work with uberj to get all of the information updated in inventory based on this ss.
Comment 29•11 years ago
|
||
Inventory/DNS/DHCP has been updated for the 15 machines that are already in scl3. dcops: can someone verify that they're on the correct vlans and power cycle them so that the oob interfaces are pingable?
Comment 30•11 years ago
|
||
uber: can you take a look at amy's spreadsheet? i have updated the location and switch info for the last 3 hosts. please also note that i have updated the name of the switch, giving it the FQDN and the correct name.
arr: inband mgmt switch and host switch ports have been configured. let me know if you're having any issues.
vle@switch1.r202-10.ops.releng.scl3.mozilla.net# show
member-range ge-0/0/23 to ge-0/0/36;
member-range ge-0/0/7 to ge-0/0/9;
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members releng-winbuild;
        }
    }
}
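As a sanity check on the port count, the two member-range lines in the output above can be expanded into individual ports. A minimal sketch, assuming each line has the form `member-range <first> to <last>;` with both ends on the same FPC/PIC; `expand_member_range` is a hypothetical helper, not a Junos tool.

```python
from typing import List


def expand_member_range(line: str) -> List[str]:
    """Expand one 'member-range A to B;' line into individual port names."""
    words = line.strip().rstrip(";").split()
    if len(words) != 4 or words[0] != "member-range" or words[2] != "to":
        raise ValueError(f"not a member-range line: {line!r}")
    first_prefix, first_port = words[1].rsplit("/", 1)
    last_prefix, last_port = words[3].rsplit("/", 1)
    if first_prefix != last_prefix:
        raise ValueError("range endpoints must share the same FPC/PIC")
    return [
        f"{first_prefix}/{port}"
        for port in range(int(first_port), int(last_port) + 1)
    ]


ports = (
    expand_member_range("member-range ge-0/0/23 to ge-0/0/36;")
    + expand_member_range("member-range ge-0/0/7 to ge-0/0/9;")
)
```

The two ranges cover 17 access ports in total, which appears consistent with the 17 machines mentioned in comment 27.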
Comment 31•11 years ago
|
||
:uberj: I also had to change the FQDN of the ipmi since it was missing the "releng" atom (so the SREG, A, PTR, and CNAMEs will need to be updated for that).
Comment 32•11 years ago
|
||
:van: I am unable to reach the ipmi interfaces for 0002, 0011, and 0014.
:uberj: might you get a chance to update the information from the spreadsheet and add the CNAMEs today?
Flags: needinfo?(juber)
Comment 33•11 years ago
|
||
:van: also, 9 doesn't seem to see its disk, and 13 doesn't look like it's powering on.
Comment 34•11 years ago
|
||
I've updated info and created cnames. Just need to track down macs for https://inventory.mozilla.org/en-US/systems/show/1645/ https://inventory.mozilla.org/en-US/systems/show/1644/ and https://inventory.mozilla.org/en-US/systems/show/1643/
Comment 35•11 years ago
|
||
:arr, [0002,0011,0014] are back online. 0013 was hung during the reimage phase and i've rebooted it. 0009 has a bad disk that is no longer spinning up. can we replace it with any spare drive (we have spare 1 TB drives) or does it have to match the specs of the other iX hosts?
we moved [0017-0019] and i've confirmed IPMI is reachable.
Comment 36•11 years ago
|
||
>we moved [0017-0019] and i've confirmed IPMI is reachable.
I meant we moved [0015-0017].
Comment 37•11 years ago
|
||
van: the drive for 0009 should match specs, please order and replace.
and I can't reach ipmi for 0015
Comment 38•11 years ago
|
||
#UJU-709-53233 opened for drive RMA/quote. IPMI fixed for 15 and 17.
Comment 40•11 years ago
|
||
following up with iX regarding hard drive for 009.
Updated•11 years ago
|
Whiteboard: 3 heatsinks ordered RITM0022977 → #UJU-709-53233 - replacement drive for 009
Comment 41•11 years ago
|
||
heat sinks upgraded on hosts and moved to SCL3. we're running into an issue with 0009 as it won't image after the drive swap. i have opened Bug 964535 for relops to take a look at "The Dirty Environment" error. closing bug as we have a different bug to track 0009.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Product: mozilla.org → Infrastructure & Operations