Closed Bug 937313 Opened 11 years ago Closed 10 years ago

please run hardware diagnostics on and reimage talos-r3-fed-057

Categories

(Infrastructure & Operations :: DCOps, task)

Platform: x86_64 Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: error: 4M0T/1/40000002: Master -0)

It's having read-only filesystem issues.
colo-trip: --- → scl1
Whiteboard: Running Diagnostics (AHT)
The machine keeps crashing and restarting while running diagnostics. Will resume work on the fix tomorrow.
The machine is still crashing during the memory-check phase of the diagnostics test. Will start the re-image process to see if it goes through.
Whiteboard: Re-Image in Progress
Whiteboard: Re-Image in Progress → Waiting for re-image
Diagnostics indicates the following error code:
4M0T/1/40000002: Master -0

I was not able to find details on this exact code online; apparently Apple does not make them available to the general public.  *shame on them*
  
I'm thinking it might be a motherboard issue.  Let me know how you want to proceed.
Whiteboard: Waiting for re-image → error: 4M0T/1/40000002: Master -0
vinh: can you try the PRAM and SMC reset procedures (https://discussions.apple.com/thread/1404579?start=0&tstart=0) and then rerun the diagnostics?
Flags: needinfo?(vhua)
Sure I'll give it a shot.
Flags: needinfo?(vhua)
The PRAM and SMC resets did not resolve the 4M0T/1/40000002: Master -0 error.
We've just decommissioned the talos-r3-xp/w7 minis. I can reimage one of those to replace this faulty one, or would you prefer to decommission this host?
Flags: needinfo?(jhopkins)
Armen: should we replace this failed talos-r3-fed slave with one of the talos-r3-xp/w7 minis or just decomm without replacement?
Flags: needinfo?(jhopkins) → needinfo?(armenzg)
vinh, it would be great if we could repurpose a batch of Win7/WinXP machines as Fed64/Fed machines.

Let me include you on an email thread.
Flags: needinfo?(armenzg)
Armen,
I haven't heard any further discussion on the email thread about repurposing the win7/xp machines as Fedora machines. In the meantime, should r3-fed-057 be decommissioned?
Hi vinh,
I'm encouraging developers to move us to AWS before the end of Q1.
Even if we imaged one of those win7/winxp machines as Fedora machines:
1) we don't know if the image still works as expected (I don't know when we last re-imaged a fed/fed64 machine);
2) we no longer have puppet for these machines.

I expect we will leave the win7/xp machines untouched until we're closer to the end of Q1 and can see how running the jobs on AWS is going.

If we can't fix fed-057 easily (I don't know what comment 6 means) let's decommission it.

Thanks for your patience and keeping me accountable.
:arr - Can you remove this host from nagios so it can be decommissioned?
Removed from nagios.
The host has been decommissioned.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations