Closed
Bug 1256375
Opened 9 years ago
Closed 9 years ago
Rebooting through slaveapi fails with "Expecting value: line 2 column 1 (char 1)"
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: kmoir)
Attachments
(2 files, 1 obsolete file)
(deleted), patch | coop: review+ | kmoir: checked-in+
(deleted), patch | kmoir: checked-in+
e.g. https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-xp32-ix&name=t-xp32-ix-095 though it affects all flavors of test and build slaves.
We've already lost 13% of the WinXP slaves, so if there aren't any events like a temporary slavealloc outage which takes out a huge swath of slaves, figure three or four days before it becomes a blocker.
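The error text in the summary has the shape of a JSON parse failure. As a minimal sketch, assuming the reply body was not JSON at all (for example a blank line followed by an HTML error page, which is a guess and not something this bug confirms), Python 3's standard json module reports exactly this position. The parse_reply helper and the sample body are hypothetical.

```python
import json


def parse_reply(body):
    """Parse a reply body as JSON and return (data, error_message)."""
    try:
        return json.loads(body), None
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return None, str(exc)


# Hypothetical body: a leading newline followed by an HTML error page.
data, error = parse_reply("\n<html>Internal Server Error</html>")
print(error)  # Expecting value: line 2 column 1 (char 1)
```

Logging the raw body alongside the parse error usually makes the real cause (an error page, an empty reply) visible right away.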
Comment 1 • 9 years ago (Assignee)
I reverted http://hg.mozilla.org/build/slave_health/rev/49cab2f5cf4c since it looks like the change to slave health might be the problem.
Comment 2 • 9 years ago (Assignee)
That didn't make a difference, so I'm relanding it.
Comment 3 • 9 years ago (Assignee)
Remove references to mozpool, devices.json, etc. that are preventing reboots.
Attachment #8730454 - Flags: review?(coop)
Updated • 9 years ago (Assignee)
Assignee: nobody → kmoir
Comment 4 • 9 years ago
I managed to connect to almost all of the 13 slaves mentioned above - some were running jobs, so I only rebooted the idle ones.
Another problem here is that their status on the Slave Health dashboard does not change, because the /slave_health/json/test-t-xp32-ix.json file is not being updated (the same is true for other pools).
Opening the JSON file shows generated: "2016-03-14T15:00:09.008130Z", which does not look right.
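A stale report like this can be spotted mechanically by comparing the generated stamp with the current time. The following is only a sketch: the is_stale helper, the one-hour threshold, and the sample "now" value are assumptions, and only the timestamp string comes from the comment above.

```python
from datetime import datetime, timedelta


def is_stale(generated, now, max_age=timedelta(hours=1)):
    """Return True if the report's `generated` stamp is older than max_age."""
    # Format matches the stamp quoted above, e.g. 2016-03-14T15:00:09.008130Z
    stamp = datetime.strptime(generated, "%Y-%m-%dT%H:%M:%S.%fZ")
    return now - stamp > max_age


# The "now" value is illustrative: roughly 21 hours after the stamp.
print(is_stale("2016-03-14T15:00:09.008130Z", datetime(2016, 3, 15, 12, 0, 0)))  # True
```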
Updated • 9 years ago
Attachment #8730454 - Flags: review?(coop) → review+
Comment 5 • 9 years ago (Assignee)
Comment on attachment 8730454 [details] [diff] [review]
bug1256375.patch
Now following the instructions at https://wiki.mozilla.org/ReleaseEngineering/Applications/SlaveAPI to deploy to dev, then prod.
Attachment #8730454 - Flags: checked-in+
Comment 6 • 9 years ago (Assignee)
Comment on attachment 8730454 [details] [diff] [review]
bug1256375.patch
Actually, I'm getting the following error and can't land it there:
remote: abort: could not lock repository /repo/hg/mozilla/build/slaveapi: Permission denied
abort: unexpected response: empty string
Attachment #8730454 - Flags: checked-in+ → checked-in-
Comment 7 • 9 years ago (Reporter)
Win7 slaves dead so far (with 3232 pending jobs): t-w732-ix-173 t-w732-ix-047 t-w732-ix-026 t-w732-ix-126 t-w732-ix-028 t-w732-ix-041 t-w732-ix-258 t-w732-ix-281 t-w732-ix-031 t-w732-ix-162 t-w732-ix-150
Comment 8 • 9 years ago (Assignee)
I added back buildfarm/mobile/devices.json until we can get my commit rights sorted out in bug 1257283.
Comment 9 • 9 years ago (Assignee)
Comment 10 • 9 years ago (Assignee)
Comment on attachment 8730454 [details] [diff] [review]
bug1256375.patch
Really checked in this time, but in git.
Attachment #8730454 - Flags: checked-in- → checked-in+
Comment 11 • 9 years ago (Assignee)
I hit this error when I restarted production slaveapi, so I have reverted the slaveapi version in puppet/production to 1.5.0:
2016-03-17 11:26:08,442 - INFO - t-w732-ix-026 - Getting inventory info
2016-03-17 11:26:08,629 - ERROR - t-w732-ix-026 - Something went wrong while processing!
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 - Traceback (most recent call last):
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 - File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/processor.py", line 64, in _worker
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 - res, msg = action(slave, *args, **kwargs)
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,630 - ERROR - t-w732-ix-026 - File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/actions/reboot.py", line 32, in reboot
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 - slave.load_inventory_info()
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 - File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/slave.py", line 60, in load_inventory_info
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 - info = Machine.load_inventory_info(self)
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 - File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/machines/base.py", line 44, in load_inventory_info
2016-03-17 11:26:08,631 - ERROR - t-w732-ix-026 - if info["pdu_fqdn"]:
2016-03-17 11:26:08,632 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,632 - ERROR - t-w732-ix-026 - TypeError: 'NoneType' object has no attribute '__getitem__'
2016-03-17 11:26:08,632 - ERROR - t-w732-ix-026 -
2016-03-17 11:26:08,632 - ERROR - t-w732-ix-026 -
Comment 12 • 9 years ago (Assignee)
Found the problem: my previous patch accidentally removed the return statement from get_system.
Attachment #8731833 - Flags: review?(coop)
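The TypeError in the traceback is what Python raises when code subscripts None, which is what a lookup function returns once its return statement is gone. The sketch below illustrates that failure mode and one possible guard. The real get_system body is not shown in this bug, so both get_system_* functions, the dictionary contents, and the guard are hypothetical, and reading the comment above as describing a dropped return statement is an assumption.

```python
def get_system_broken(fqdn):
    # Hypothetical stand-in for an inventory lookup whose `return` was dropped.
    info = {"fqdn": fqdn, "pdu_fqdn": None}
    # No return statement, so callers receive None.


def get_system_fixed(fqdn):
    # Same hypothetical lookup with the return restored.
    info = {"fqdn": fqdn, "pdu_fqdn": None}
    return info


def load_inventory_info(lookup, fqdn):
    """Fetch inventory info and fail with a clear message if it is missing."""
    info = lookup(fqdn)
    if info is None:
        # Without this guard, info["pdu_fqdn"] raises TypeError on None.
        raise ValueError("no inventory info for %s" % fqdn)
    return info["pdu_fqdn"]


print(get_system_broken("t-w732-ix-026") is None)  # True
```

An explicit check like this turns the opaque TypeError into a message that names the machine whose inventory lookup came back empty.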
Comment 13 • 9 years ago (Assignee)
I built slaveapi 1.6.1 with this patch and bumped the slaveapi version in puppet. I deployed it to the production slaveapi instance and it all seems to be working now: I can reboot machines, etc. Now all I have to do is land these patches on g.m.org once it is up again.
Attachment #8731833 - Attachment is obsolete: true
Attachment #8731833 - Flags: review?(coop)
Updated • 9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 14 • 9 years ago
...err reopened until kim can land the patches
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated • 9 years ago (Assignee)
Attachment #8732148 - Flags: checked-in+
Comment 15 • 9 years ago (Assignee)
Landed
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard