Closed Bug 970075 (b-2008-ix-0063) Opened 11 years ago Closed 9 years ago

b-2008-ix-0063 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86_64
Windows Server 2008

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Last job was January 22nd.
I hesitatingly put it back into production. His slave health page looks like a Christmas tree. I will have a look after it takes some jobs.
Assignee: nobody → armenzg
(In reply to (Wed-Thu. Feb.19-20th in TRIBE) Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #1)
> I hesitatingly put it back into production.
> His slave health page looks like a Christmas tree.
>
> I will have a look after it takes some jobs.
I don't see this slave having taken a production job since it went nuts in January. What have you done with it?!
I will be looking into it.
Depends on: 977524
Assignee: armenzg → nobody
I am currently debugging the machine.
Let's go with your diagnostics and re-image if needed.
runslave.py still exits almost immediately. Armen, I'll punt this back to you to investigate.
over to armen per c#6
Assignee: nobody → armenzg
The master believes that the slave is connected and rejects the new machine. I wonder how to fix this (bug 984578). I cheated and I locked the machine to another master. This is what I'm seeing:
2014-03-17 13:10:06-0700 [-] Log opened.
2014-03-17 13:10:06-0700 [-] twistd 10.2.0 (C:\mozilla-build\buildbotve\scripts\python.exe 2.6.5) starting up.
2014-03-17 13:10:06-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2014-03-17 13:10:06-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Connecting to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [-] Watching c:\builds\moz2_slave\shutdown.stamp's mtime to initiate shutdown
2014-03-17 13:10:06-0700 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2014-03-17 13:10:06-0700 [Broker,client] While trying to connect:
Traceback from remote host -- Traceback (most recent call last):
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
    d = self.portal.login(self, mind, IPerspective)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
    ).addCallback(self.realm.requestAvatar, mind, *interfaces
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
    callbackKeywords=kw)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
    p = self.botmaster.getPerspective(mind, avatarID)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
    d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
    _name, args, kw)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
    raise DeadReferenceError("Calling Stale Broker")
twisted.spread.pb.DeadReferenceError: Calling Stale Broker
2014-03-17 13:10:06-0700 [Broker,client] Lost connection to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Main loop terminated.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
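For triage, the failure signature in the log above can be checked mechanically. This is a minimal sketch, assuming the twistd log is available as plain text. The function name has_stale_broker_rejection is hypothetical and is not part of buildbot or Twisted. It only looks for the two marker strings that appear in the log in this bug.

```python
def has_stale_broker_rejection(log_text):
    """Return True if a slave-side twistd log shows the login being
    rejected with a stale-broker error, as in the log quoted in this bug.

    Hypothetical helper: it does plain substring matching and knows
    nothing about buildbot internals.
    """
    # Marker 1: the slave's login attempt failed.
    login_failed = (
        "ReconnectingPBClientFactory.failedToGetPerspective" in log_text
    )
    # Marker 2: the remote traceback ended in the stale-broker error.
    stale_broker = "DeadReferenceError: Calling Stale Broker" in log_text
    return login_failed and stale_broker
```

A log that contains only one of the two markers returns False, so an unrelated login failure is not reported as this problem.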
I added the try keys and put it back into production.
Green.
Assignee: armenzg → nobody
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Alias: w64-ix-slave169 → b-2008-ix-0063
Summary: w64-ix-slave169 problem tracking → b-2008-ix-0063 problem tracking
back in production after train D move
Attempting SSH reboot...Failed. Attempting IPMI reboot...Failed. Filed IT bug for reboot (bug 1165784)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
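The reboot comment above follows a fixed escalation order: SSH first, then IPMI, then an IT bug. Here is a minimal sketch of that order, assuming each reboot method is a callable that returns True on success. The name escalate_reboot, the callables, and the "Succeeded." wording are all hypothetical. This is not the code of the tool that posted the comment.

```python
def escalate_reboot(host, methods, file_bug):
    """Try each reboot method in order; file a bug if every one fails.

    methods:  list of (name, callable) pairs; each callable takes the
              host name and returns True on success (hypothetical).
    file_bug: callable taking the host name and returning a bug number
              (hypothetical).
    Returns a one-line summary in the style of the comments in this bug.
    """
    messages = []
    for name, attempt in methods:
        try:
            ok = bool(attempt(host))
        except Exception:
            ok = False  # any error counts as a failed attempt
        outcome = "Succeeded." if ok else "Failed."
        messages.append("Attempting %s reboot...%s" % (name, outcome))
        if ok:
            # Stop at the first method that works; no bug is filed.
            return " ".join(messages)
    bug_id = file_bug(host)
    messages.append("Filed IT bug for reboot (bug %s)" % bug_id)
    return " ".join(messages)
```

When both methods fail, the summary has the same shape as the comment above: two failed attempts followed by the filed bug number.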
Status: REOPENED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed. Attempting IPMI reboot...Failed. Filed IT bug for reboot (bug 1195405)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
allocated to bug 1198317
Status: RESOLVED → REOPENED
Depends on: 1198317
Resolution: FIXED → ---
deallocated from bug 1198317
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed. Attempting IPMI reboot...Failed. Filed IT bug for reboot (bug 1223172)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Returned to prod after having its disk drives replaced and getting a re-image.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard