Closed Bug 848109 Opened 12 years ago Closed 11 years ago

Recycle 10-of-15 ATeam rev3 minis as WinXP test slaves

Categories

(Infrastructure & Operations :: DCOps, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: arich)

Details

(Whiteboard: [reit])

There are 10-of-15 former ATeam rev3 minis in mtv that are currently unused. They are in a tower-stack on the floor behind ctalbert's desk. 

We should re-image these rev3 minis as WinXP test machines to help with wait times. Since the machines are currently in mtv, we'll probably want to relocate them to a proper colo too (scl1?).
colo-trip: --- → mtv1
I'd need hostname/asset tag/serial num info to find these in inventory and I need the mac address to set up dhcp and deploystudio.
We picked up the Mac-Minis from ctalbert's desk.

We are going to open a ticket to get another power supply at SCL1.
I was able to get inventory and dhcp data for two of these machines, just inventory with no dhcp for one other, and am still missing data on the other seven.  

talos-r3-xp-101.build.scl1.mozilla.com 34:15:9E:18:3C:C0
talos-r3-xp-102.build.scl1.mozilla.com 34:15:9E:18:41:08

For those two that I had dhcp data for, they should be plug and netbootable now.  Please make sure that the dongle status of these matches other xp talos machines and that the switch and rack locations are updated in inventory for the three that are already there.  

For the final eight, please provide a csv with all of the data necessary to create inventory entries from scratch.
Augh, no wonder I couldn't find the others in inventory, they had a different name.  They got re-purposed back in bug 643506, but I'm guessing they had labels on them that never got removed.  With the asset tags, I was able to track them all down and will have inventory updated shortly.
Depends on: 835395
These will need the following changed/added in inventory:

serial
switch ports
system rack
slot
Oh, ha, helps if I give you hostnames

talos-r3-xp-101.build.scl1.mozilla.com
talos-r3-xp-102.build.scl1.mozilla.com
talos-r3-xp-103.build.scl1.mozilla.com
talos-r3-xp-104.build.scl1.mozilla.com
talos-r3-xp-105.build.scl1.mozilla.com
talos-r3-xp-106.build.scl1.mozilla.com
talos-r3-xp-107.build.scl1.mozilla.com
talos-r3-xp-108.build.scl1.mozilla.com
talos-r3-xp-109.build.scl1.mozilla.com
talos-r3-xp-110.build.scl1.mozilla.com
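
A minimal sketch of how the inventory request above could be turned into a CSV template, one row per host with the four requested fields left blank for DCOps to fill in. The column names are hypothetical; the real inventory schema is not given in this bug.

```python
import csv
import io

# Hostnames from the list above: talos-r3-xp-101 through talos-r3-xp-110.
HOSTNAMES = [f"talos-r3-xp-{n}.build.scl1.mozilla.com" for n in range(101, 111)]

# Hypothetical column names; the actual inventory field names are an assumption.
COLUMNS = ["hostname", "serial", "switch_ports", "system_rack", "slot"]


def inventory_template(hostnames=HOSTNAMES):
    """Return CSV text with one row per host and blank fields to fill in."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for name in hostnames:
        # Keys missing from the row dict are written as empty strings.
        writer.writerow({"hostname": name})
    return buf.getvalue()


if __name__ == "__main__":
    print(inventory_template(), end="")
```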
colo-trip: mtv1 → scl1
Depends on: 850317
Blocks: 851529
Whiteboard: [reit]
dcops: don't forget to make sure that you can log in as cltbld and run:

tasklist

and get output of tasks (not a credential error).  A credential error means the machine needs to be reimaged again.
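
A rough sketch of how that check could be automated once the tasklist output has been captured as text. It only looks for the default tasklist table layout (an "Image Name ... PID" header, a row of "=" characters, then process rows) and treats anything else as a failure. The function name is made up for illustration, and the exact wording of the credential error is not assumed.

```python
def tasklist_looks_healthy(output: str) -> bool:
    """Return True if the text looks like a normal tasklist process table."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        # The default table header starts with "Image Name" and includes "PID".
        if line.startswith("Image Name") and "PID" in line:
            rest = lines[i + 1:]
            # Expect a separator row of '=' plus at least one process row.
            return len(rest) >= 2 and rest[0].startswith("=")
    # No table header found: an error message or empty output.
    return False
```

Anything that returns False here would put the machine back in the reimage queue.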
Sadly, none of them imaged correctly.  I suspect it's going to be a long fight of continual reimaging till we get them to run tasklist successfully.  Armen, I think at one point you said something about moving away from using that.  Has that come to pass?

DCOps, can you please reimage them all again (many of them aren't even pingable right now).
I've beat these two into submission:

talos-r3-xp-101.build.scl1.mozilla.com
talos-r3-xp-105.build.scl1.mozilla.com
And I got talos-r3-xp-103.build.scl1.mozilla.com working.
talos-r3-xp-109.build.scl1.mozilla.com works
(In reply to Amy Rich [:arich] [:arr] from comment #8)
> Sadly, none of them imaged correctly.  I suspect it's going to be a long
> fight of continual reimaging till we get them to run tasklist successfully. 
> Armen, I think at one point you said something about moving away from using
> that.  Has that come to pass?
> 
I've tried to find the bug where we discussed this. Do you remember which bug it was? I could find out from there.
So far the following are working:

talos-r3-xp-101.build.scl1.mozilla.com
talos-r3-xp-102.build.scl1.mozilla.com
talos-r3-xp-103.build.scl1.mozilla.com

talos-r3-xp-105.build.scl1.mozilla.com

talos-r3-xp-109.build.scl1.mozilla.com


talos-r3-xp-106.build.scl1.mozilla.com looks like it died half way through an install.  I can't boot it into OS X and therefore can't reimage it remotely.  dcops, please do that one by hand.  I will keep beating 4, 7, 8, and 10 since I can reach them remotely.
Armen: Hrm, I don't remember... we might have had the conversation about tasklist on irc.
I will set up those 5 machines from comment 13.

(In reply to Amy Rich [:arich] [:arr] from comment #14)
> Armen: Hrm, I don't remember... we might have had the conversation about
> tasklist on irc.

I will try my best to figure it out. I think it had to do with switching to mozharness for talos jobs but I can't remember.
talos-r3-xp-106 should now be reachable.
talos-r3-xp-107 is working.

talos-r3-xp-106 died in the middle of the install again.  I'm guessing it might be bad hardware.
I think xp-106, xp-108, and xp-110 should be working now.  I was viewing the DeployStudio server, and it indicated that all three finished reimaging. I'm now able to reach the three hosts via ssh and ping.

ping  talos-r3-xp-106.build.scl1.mozilla.com
PING talos-r3-xp-106.build.scl1.mozilla.com (10.12.51.212): 56 data bytes
64 bytes from 10.12.51.212: icmp_seq=0 ttl=122 time=11.896 ms
64 bytes from 10.12.51.212: icmp_seq=1 ttl=122 time=9.955 ms
^C
--- talos-r3-xp-106.build.scl1.mozilla.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 9.955/10.925/11.896/0.971 ms

vhua-7605:Desktop vhua$ ping  talos-r3-xp-108.build.scl1.mozilla.com
PING talos-r3-xp-108.build.scl1.mozilla.com (10.12.51.234): 56 data bytes
64 bytes from 10.12.51.234: icmp_seq=0 ttl=122 time=12.472 ms
64 bytes from 10.12.51.234: icmp_seq=1 ttl=122 time=10.904 ms
64 bytes from 10.12.51.234: icmp_seq=2 ttl=122 time=10.887 ms
^C
--- talos-r3-xp-108.build.scl1.mozilla.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 10.887/11.421/12.472/0.743 ms

vhua-7605:Desktop vhua$ ping  talos-r3-xp-110.build.scl1.mozilla.com
PING talos-r3-xp-110.build.scl1.mozilla.com (10.12.51.236): 56 data bytes
64 bytes from 10.12.51.236: icmp_seq=0 ttl=122 time=9.595 ms
64 bytes from 10.12.51.236: icmp_seq=1 ttl=122 time=8.799 ms
^C
--- talos-r3-xp-110.build.scl1.mozilla.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.799/9.197/9.595/0.398 ms


vhua-7605:Desktop vhua$ ssh cltbld@talos-r3-xp-110.build.scl1.mozilla.com
The authenticity of host 'talos-r3-xp-110.build.scl1.mozilla.com (10.12.51.236)' can't be established.
RSA key fingerprint is a9:68:97:6b:57:7a:c9:3d:ce:e6:0f:e9:52:95:ef:04.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'talos-r3-xp-110.build.scl1.mozilla.com,10.12.51.236' (RSA) to the list of known hosts.
cltbld@talos-r3-xp-110.build.scl1.mozilla.com's password: 

vhua-7605:Desktop vhua$ ssh cltbld@talos-r3-xp-108.build.scl1.mozilla.com
The authenticity of host 'talos-r3-xp-108.build.scl1.mozilla.com (10.12.51.234)' can't be established.
RSA key fingerprint is a9:68:97:6b:57:7a:c9:3d:ce:e6:0f:e9:52:95:ef:04.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'talos-r3-xp-108.build.scl1.mozilla.com,10.12.51.234' (RSA) to the list of known hosts.
cltbld@talos-r3-xp-108.build.scl1.mozilla.com's password: 

vhua-7605:Desktop vhua$ ssh cltbld@talos-r3-xp-106.build.scl1.mozilla.com
cltbld@talos-r3-xp-106.build.scl1.mozilla.com's password:
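
For checking many hosts at once, the ping summary lines in the transcript above can be parsed rather than read by eye. This is a small sketch with made-up function names, written against the summary format shown in this bug. Note that a ping reply and an ssh password prompt only show that the host is on the network; they do not show that tasklist runs as cltbld.

```python
import re

# Matches e.g. "2 packets transmitted, 2 packets received, 0.0% packet loss".
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)


def ping_summary(output: str):
    """Return (transmitted, received, loss_percent), or None if no summary line."""
    m = _SUMMARY.search(output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def reachable(output: str) -> bool:
    """True if the ping output reports at least one reply received."""
    summary = ping_summary(output)
    return summary is not None and summary[1] > 0
```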
We're now up to the following having imaged correctly:

talos-r3-xp-101.build.scl1.mozilla.com
talos-r3-xp-102.build.scl1.mozilla.com
talos-r3-xp-103.build.scl1.mozilla.com
talos-r3-xp-104.build.scl1.mozilla.com
talos-r3-xp-105.build.scl1.mozilla.com

talos-r3-xp-109.build.scl1.mozilla.com

These still aren't functional (you have to log in as cltbld and successfully run the tasklist command to make sure they're working correctly):

talos-r3-xp-106.build.scl1.mozilla.com
talos-r3-xp-108.build.scl1.mozilla.com
talos-r3-xp-110.build.scl1.mozilla.com

Will try the last three again today.  If we can't get them working by the end of today, and they persist in failing in the middle of the imaging process, I'm going to call them unusable.
Got talos-r3-xp-110.build.scl1.mozilla.com working.
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> Got talos-r3-xp-110.build.scl1.mozilla.com working.

I can't reach the slave with just "talos-r3-xp-110" as I can with other slaves.

Armens-MacBook-Air ~ $ host talos-r3-xp-110.build.scl1.mozilla.com
talos-r3-xp-110.build.scl1.mozilla.com has address 10.12.51.236
Armens-MacBook-Air ~ $ host talos-r3-xp-110
Host talos-r3-xp-110 not found: 3(NXDOMAIN)
Armens-MacBook-Air ~ $ host talos-r3-xp-109
talos-r3-xp-109.build.mozilla.org is an alias for talos-r3-xp-109.build.scl1.mozilla.com.
talos-r3-xp-109.build.scl1.mozilla.com has address 10.12.51.235
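
The output above suggests that short names resolve through an alias under build.mozilla.org, and that talos-r3-xp-110 has no such alias yet. A small sketch for spotting this across several hosts by parsing captured `host` output follows; the function name and the result keys are made up for illustration.

```python
import re


def parse_host_output(output: str) -> dict:
    """Summarize `host` output: alias target, addresses, and NXDOMAIN flag."""
    result = {"alias": None, "addresses": [], "nxdomain": False}
    for raw in output.splitlines():
        line = raw.strip()
        if "NXDOMAIN" in line:
            result["nxdomain"] = True
        # "X is an alias for Y." -> capture Y without the trailing dot.
        m = re.search(r" is an alias for (\S+?)\.?$", line)
        if m:
            result["alias"] = m.group(1)
        # "Y has address a.b.c.d" -> capture the address.
        m = re.search(r" has address (\S+)$", line)
        if m:
            result["addresses"].append(m.group(1))
    return result
```

A host whose short name comes back with `nxdomain` set but whose fully qualified name has an address is most likely just missing its alias record.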
(In reply to Amy Rich [:arich] [:arr] from comment #19)
> 
> Will try the last three again today.  If we can't get them working by the
> end of today, and they persist in failing in the middle of the imaging
> process, I'm going to call them unusable.

If you think it is a good idea and they still give you trouble, we can try to make them win7 machines instead and see if they behave on staging.

Thanks for all your help with this.
At last, success.  I have all of them reporting tasklist successfully.
Assignee: server-ops-dcops → arich
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations