Closed
Bug 1069095
(panda-0619)
Opened 10 years ago
Closed 10 years ago
panda-0619 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
Details
(Whiteboard: [buildduty][buildslaves][capacity])
Hasn't taken a job for 26 days.
Comment 1•10 years ago
Had to do a little bit of extra work to get this panda back into production. Required a self-test run, followed by a re-image, but it's taking jobs now.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reporter
Comment 2•10 years ago
Failing every other job, disabled in slavealloc. (Also rather suspicious that it only did two jobs, while most of this busted set did six or eight today before I disabled them.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•10 years ago
Tools repo updated on foopy so that this panda can be properly rebooted again.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Reporter
Comment 4•10 years ago
30% green, 14% orange, 18% red, 38% retry. According to Ouija, we expect 21.1% failure for pandas, not 70%. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5•10 years ago
replaced SD card, panda passed self test.
Reporter
Comment 6•10 years ago
Reenabled to build up to strike two.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Reporter
Comment 7•10 years ago
Score since then: 10 green, 3 orange, 5 red, 16 blue. Expected failure rate (including retries) for a panda is currently at 26.9%, and this one is at 70.6%.
Strike two, disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
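The 70.6% figure in comment 7 can be reproduced from the job counts given there. The sketch below is only an illustration: it assumes that "blue" jobs are retries and that every non-green result counts as a failure, which matches the "including retries" wording. The function name `failure_rate` is hypothetical and not part of any Mozilla tool.

```python
def failure_rate(green, orange, red, blue):
    """Share of jobs that did not end green.

    Assumption: blue jobs are retries and count as failures,
    as do orange and red jobs.
    """
    total = green + orange + red + blue
    if total == 0:
        raise ValueError("no jobs recorded")
    return (orange + red + blue) / total


# Counts reported in comment 7.
observed = failure_rate(green=10, orange=3, red=5, blue=16)
expected = 0.269  # expected panda failure rate quoted in comment 7

print(f"observed {observed:.1%}, expected {expected:.1%}")
```

With these counts, 24 of 34 jobs were not green, which rounds to the 70.6% quoted above.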
Comment 8•10 years ago
this panda had a bad power cable going back to the power control relay board. i moved the power over to bank 8 and updated inventory however the board is not completing the self test. is there another file that needs to be updated to reflect the relay setting or maybe perhaps time to decomm this board?
Comment 9•10 years ago
(In reply to Van Le [:van] from comment #8)
> this panda had a bad power cable going back to the power control relay
> board. i moved the power over to bank 8 and updated inventory however the
> board is not completing the self test. is there another file that needs to
> be updated to reflect the relay setting or maybe perhaps time to decomm this
> board?
Adding a NI for Jake to see if there's something else that needs updating.
Flags: needinfo?(jwatkins)
Comment 10•10 years ago
(In reply to Van Le [:van] from comment #8)
> this panda had a bad power cable going back to the power control relay
> board. i moved the power over to bank 8 and updated inventory however the
> board is not completing the self test. is there another file that needs to
> be updated to reflect the relay setting or maybe perhaps time to decomm this
> board?
Simply changing the system.relay.0 k/v in inventory is correct. The mozpool inventory sync cron will pick it up at the 15,45 hour marks.
Bank 8 does not exist on those relay boards. You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing. So you might want to double check which relay it is connected to. At the bottom of this mana page is the board layout. You can use that to see which bank and relay you are actually connected to.
https://mana.mozilla.org/wiki/display/IT/Power+Control+Relay+Board
If you have a volt meter, you can check that the cable is getting power and you can use that meter to see if the selftest is actually hitting the relay you 'think' you are on.
Also, how did you determine the power cable was bad? It may have been a blown fuse. In that case, the power cable connector shorted out on the chassis or the pandaboard smoked and blew the fuse. And I seem to recall we left the fuses out on the 12th power cable (the one that goes to the empty panda bracket). So if you switched over to that one, make sure there is a fuse for it. And you should remove the fuse for the power connector that is not in place. (so it doesn't short out also)
If you are sure it is receiving power and connected to the correct relay, then the pandaboard is probably smoked, in which case I would say decomm it. If you're sure you're on the right relay and don't have a meter, decomm it.
Flags: needinfo?(jwatkins)
Comment 11•10 years ago
>Bank 8 does not exist on those relay boards. You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing.
yah inventory is correct, i meant to write bank 2 relay 8, not sure what happened.
>Also, how did you determine the power cable was bad? It may have been a blown fuse.
I tested by moving the cables back and forth and replaced the fuse several times. It could be possible that the relay or cable somehow shorted but I checked the connections as well. I didn't have a volt meter on hand but I'll give it another look before I decomm it. Thanks for the info.
Comment 12•10 years ago
(In reply to Van Le [:van] from comment #11)
> >Bank 8 does not exist on those relay boards. You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing.
>
> yah inventory is correct, i meant to write bank 2 relay 8, not sure what
> happened.
I've updated our copy of the relay info:
https://hg.mozilla.org/build/tools/rev/751cafa489e5
> >Also, how did you determine the power cable was bad? It may have been a blown fuse.
>
> I tested by moving the cables back and forth and replaced the fuse several
> times. It could be possible that the relay or cable somehow shorted but
> I checked the connections as well. I didn't have a volt meter on hand but
> I'll give it another look before I decomm it. Thanks for the info.
Did we come to any resolution here, i.e. did we find any further problems? Can I return this panda to service? In the absence of any errors, I'd prefer not to decomm.
Flags: needinfo?(vle)
Comment 13•10 years ago
It's bad, we should decommission it. FWIW, we have more than 200 panda spares just sitting around.
Flags: needinfo?(vle)
Comment 14•10 years ago
Decommissioned.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard