Bug 900873
(panda-0044)
Opened 11 years ago
Closed 11 years ago
panda-0044 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
(Whiteboard: [buildduty][buildslaves][capacity])
This slave is failing over 50% of the jobs it has run:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=panda-0044
e.g.:
https://tbpl.mozilla.org/php/getParsedLog.php?id=26061280&tree=Try#error0
{
00:26:26 INFO - 08/02/2013 00:26:26: INFO: Uninstalling org.mozilla.fennec...
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: INFO: verifyDevice: failing to cleanup device
00:26:28 INFO - reconnecting socket
00:26:28 INFO - removing file: /mnt/sdcard/writetest
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 ERROR - Return code: 1
00:26:28 CRITICAL - Preparing to abort run due to failed verify check.
00:26:28 INFO - Request 'http://mobile-imaging-010.p10.releng.scl1.mozilla.com/api/request/284670/' deleted on cleanup
00:26:28 FATAL - Dieing due to failing verification
00:26:28 FATAL - Exiting -1
00:26:28 INFO - Running post-action listener: _resource_record_post_action
}
Please disable.
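For triaging logs like the one quoted above, a minimal sketch of tallying "Automation Error" messages might look like the following. The function name `count_automation_errors` and the regular expression are illustrative assumptions and are not part of the releng tooling; the sample lines are copied from the log excerpt in this bug.

```python
import re

# Assumption: automation errors always appear as "Automation Error: <message>"
# at the end of a log line, as they do in the excerpt quoted in this bug.
AUTOMATION_ERROR = re.compile(r"Automation Error: (?P<msg>.+)$")


def count_automation_errors(log_text):
    """Return a dict mapping each automation error message to its count."""
    counts = {}
    for line in log_text.splitlines():
        match = AUTOMATION_ERROR.search(line)
        if match:
            msg = match.group("msg").strip()
            counts[msg] = counts.get(msg, 0) + 1
    return counts


# Two lines taken from the log excerpt above, plus one unrelated line.
sample_log = "\n".join([
    "00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.",
    "00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.",
    "00:26:28 INFO - reconnecting socket",
])
counts = count_automation_errors(sample_log)
```

A repeated message with a high count, as here, points at a single failing operation (the relay board reboot) rather than several unrelated problems.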
Flags: needinfo?(bugspam.Callek)
Comment 2•11 years ago
I might be reading this wrong, but it looks like the relay board is being contacted directly and it's making a request to mozpool. Those two don't mix.
Callek, am I reading that right?
Flags: needinfo?(bugspam.Callek)
Comment 3•11 years ago
For the record, changing the state in mozpool is absolutely, unequivocally, 100% the wrong thing to do here.
IRC conversation suggests that there's no way in the releng automation to disable a particular panda. If that's true, then that needs to be fixed quickly, and in the interim, manually killing clientproxy/buildslave processes is a better solution.
Comment 4•11 years ago
panda-0044 is not listed in devices.json so we have no mapping from panda->foopy. I'm not sure how that happened (I know now that it's a bug), but it led me to believe that using mozpool was the way to disable the slave. I then reused that logic on other mozpool-managed slaves.
I understand now that *all* pandas are managed via the disabled flag on a foopy, and that this specific instance with panda-0044 was a one-off. I've added this to our wiki documentation. Having said that, I've searched the devices.json history in Mercurial and not found any matches for panda-0044.
After looking at all the foopies, I found that panda-0044 is hosted on foopy103. I've added it to devices.json based on details in Inventory and stopped it via manage_foopies.py.
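The missing panda->foopy mapping described above can be illustrated with a small lookup sketch. It assumes that devices.json is a JSON object keyed by device name, with each entry carrying a "foopy" field; that layout and the helper name `find_foopy` are assumptions made for illustration, not confirmed by this bug.

```python
import json


def find_foopy(devices, device_name):
    """Return the foopy hosting device_name, or None if it is unmapped.

    `devices` is the parsed content of devices.json, assumed here to be
    a dict of {device name: {"foopy": foopy name, ...}}.
    """
    entry = devices.get(device_name)
    if entry is None:
        return None
    return entry.get("foopy")


# Before the fix: panda-0044 has no entry, so there is no mapping at all.
before = json.loads("{}")

# After the fix: the entry added in this bug (only the assumed "foopy" field shown).
after = json.loads('{"panda-0044": {"foopy": "foopy103"}}')

missing = find_foopy(before, "panda-0044")
found = find_foopy(after, "panda-0044")
```

A check like this, run over every device known to Inventory, would flag unmapped pandas before someone needs to disable one.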
Updated•11 years ago
Product: mozilla.org → Release Engineering
Comment 5•11 years ago
(In reply to Jake Watkins [:dividehex] from comment #2)
> I might be reading this wrong but it looks like the relay board is being
> contacted directly and it's making a request to mozpool. Those two don't mix.
>
> Callek, am I reading that right?
We've been over this before: after we request a device from mozpool, we *do* still do things that contact the relay board directly. We can't fix that until all pandas are handled by mozpool.
Flags: needinfo?(bugspam.Callek)
Comment 6•11 years ago
Sending this slave to recovery
-->Automated message.
Comment 7•11 years ago
recovered by "panda-recovery" bug 902657
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard