Closed Bug 900873 (panda-0044) Opened 11 years ago Closed 11 years ago

panda-0044 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Failing over 50% of the jobs it's done:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=panda-0044

eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=26061280&tree=Try#error0
{
00:26:26 INFO - 08/02/2013 00:26:26: INFO: Uninstalling org.mozilla.fennec...
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: WARNING: Automation Error: Unable to reboot panda-0044 via Relay Board.
00:26:28 INFO - 08/02/2013 00:26:28: INFO: verifyDevice: failing to cleanup device
00:26:28 INFO - reconnecting socket
00:26:28 INFO - removing file: /mnt/sdcard/writetest
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 INFO - reconnecting socket
00:26:28 ERROR - Return code: 1
00:26:28 CRITICAL - Preparing to abort run due to failed verify check.
00:26:28 INFO - Request 'http://mobile-imaging-010.p10.releng.scl1.mozilla.com/api/request/284670/' deleted on cleanup
00:26:28 FATAL - Dieing due to failing verification
00:26:28 FATAL - Exiting -1
00:26:28 INFO - Running post-action listener: _resource_record_post_action
}

Please disable.
Flags: needinfo?(bugspam.Callek)
Forced state "disabled" via lifeguard
Flags: needinfo?(bugspam.Callek)
I might be reading this wrong, but it looks like the relay board is being contacted directly and it's making a request to mozpool. Those two don't mix. Callek, am I reading that right?
Flags: needinfo?(bugspam.Callek)
For the record, changing the state in mozpool is absolutely, unequivocally, 100% the wrong thing to do here. IRC conversation suggests that there's no way in the releng automation to disable a particular panda. If that's true, then that needs to be fixed quickly, and in the interim, manually killing clientproxy/buildslave processes is a better solution.
panda-0044 is not listed in devices.json so we have no mapping from panda->foopy. I'm not sure how that happened (I know now that it's a bug), but it led me to believe that using mozpool was the way to disable the slave. I then reused that logic on other mozpool-managed slaves. I understand now that *all* pandas are managed via the disabled flag on a foopy, and that this specific instance with panda-0044 was a one-off. I've added this to our wiki documentation. Having said that, I've searched the devices.json history in Mercurial and not found any matches for panda-0044. After looking at all the foopies, I found that panda-0044 is hosted on foopy103. I've added it to devices.json based on details in Inventory and stopped it via manage_foopies.py.
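For anyone hitting the same missing-mapping problem, here is a minimal sketch of the panda->foopy lookup described above. It assumes devices.json is a JSON object keyed by device name, with a "foopy" field per entry; that layout, the sample device names, and the find_foopy helper are all assumptions for illustration and not taken from the real file or from manage_foopies.py.

```python
import json

# Hypothetical excerpt of devices.json. The real file's layout may differ;
# the device and foopy names below are placeholders, not real hosts.
SAMPLE_DEVICES_JSON = """
{
  "panda-example-a": {"foopy": "foopy-example"},
  "panda-example-b": {"foopy": "foopy-example"}
}
"""


def find_foopy(devices, device_name):
    """Return the foopy that hosts device_name, or None if it is unmapped."""
    entry = devices.get(device_name)
    if entry is None:
        return None
    return entry.get("foopy")


devices = json.loads(SAMPLE_DEVICES_JSON)

# panda-0044 is deliberately absent from the sample, mirroring this bug.
for name in ("panda-example-a", "panda-0044"):
    foopy = find_foopy(devices, name)
    if foopy is None:
        print(f"{name}: no foopy mapping in devices.json")
    else:
        print(f"{name}: hosted on {foopy}")
```

A check like this, run before choosing how to disable a slave, would flag a device with no foopy mapping instead of leaving the operator to guess that mozpool is the right place.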
Product: mozilla.org → Release Engineering
Depends on: 902657
(In reply to Jake Watkins [:dividehex] from comment #2)
> I might be reading this wrong but it looks like the relay board is being
> contacted directly and it's making a request to mozpool. Those two don't mix.
>
> Callek, am I reading that right?

We've been over this before: after we request from mozpool, we *do* do stuff that involves direct-relay-board. We can't fix that until after all pandas are handled with mozpool.
Flags: needinfo?(bugspam.Callek)
Sending this slave to recovery. --> Automated message.
recovered by "panda-recovery" bug 902657
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Depends on: 1148116
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard