Closed
Bug 1168639
Opened 9 years ago
Closed 9 years ago
Figure out what broke six Windows Try buildslaves on May 14th, and fix it
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
Details
My money is on these being leftovers from that day when every single Windows try buildslave except the puppet testing pool was broken, but I don't actually remember for sure that that day was the 14th.
Comment 1•9 years ago
I don't know what happened. I'm going to reimage all of them.
Comment 2•9 years ago
So it looks like the c:\builds\moz2_slave dir is missing after I reimaged them with the instructions here: https://mana.mozilla.org/wiki/display/AVSE/MoCo+Vidyo+Room+Security
Talked to Q in IRC, and he will investigate.
Flags: needinfo?(q)
It looks like the start buildbot scheduled task is missing; this may have something to do with the filters set up for the runner testing. I am hashing that out now.
Looks like the wrong GPO was linked to try during the runner testing. "Scheduled_tasks_testers" was on the try OU and "Scheduled_tasks_testers_builders" was on the build OU. Correcting this and running a gpupdate /force fixed the issue on 0173. A reboot should now fix the problem.
Do you want to reboot the affected slaves, or should I?
Flags: needinfo?(kmoir)
It looks like 0173 was set to re-image after boot; the others may be as well. That can be undone with IPMI commands, or we can just let them re-image.
After the unexpected re-image 0173 took a job as expected:
http://buildbot-master83.bb.releng.scl3.mozilla.com:8101/buildslaves/b-2008-ix-0173
Comment 9•9 years ago
CCing Rob since he's been working on the runner GPO stuff.
Comment 11•9 years ago
Thanks Amy!
Reporter
Comment 12•9 years ago
Only 0173 and 0183 remain enabled; someone at some point disabled the others.
Those two enabled ones are taking jobs, but only in the worst possible sense of taking them: they have Mercurial 1.9.1 installed instead of the Mercurial 3.2.1 that unbroken slaves have. As a result, they each took a job and spent 4 hours and 3 hours respectively just trying and failing to clone try, before I rebooted them to free up those two jobs for slaves that would actually give the developers genuine builds instead of 1800s timeout after 1800s timeout.
So now those two are disabled as well.
Comment 14•9 years ago
I haven't changed anything for 2008, but it could be affected. Let me look at the GPO.
Q
Flags: needinfo?(q)
Comment 15•9 years ago
This was indeed affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1169387. It looks like the win8 GPO that was disabled was installing Mercurial on 2008 (both are x64 operating systems). I corrected the 2008 GPO and rebooted 0173, and it got version 3.2.1.0.
Q
Comment 16•9 years ago
A reboot of 174 also shows the correct version:
C:\Users\cltbld>wmic datafile where name='c:\\mozilla-build\\hg\\hg.exe' get version
Version
3.2.1.0
C:\Users\cltbld>
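The manual check above could be scripted when several slaves need verifying. This is a minimal sketch, assuming the `wmic` output has already been captured as text (how it is collected from each slave is not covered in this bug). The function names are hypothetical, and the expected version 3.2.1 is taken from comment 12.

```python
# Sketch only: parse the text printed by
#   wmic datafile where name='c:\\mozilla-build\\hg\\hg.exe' get version
# and compare it with the Mercurial version the unbroken slaves have.
# Function names are hypothetical; 3.2.1 comes from comment 12.

EXPECTED_HG_VERSION = (3, 2, 1)


def parse_wmic_version(output):
    """Return the version as a tuple of ints, or None if it cannot be found."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        # wmic prints a "Version" header followed by the value on the next line.
        if line.lower() == "version" and i + 1 < len(lines):
            try:
                return tuple(int(part) for part in lines[i + 1].split("."))
            except ValueError:
                return None
    return None


def has_expected_hg(output, expected=EXPECTED_HG_VERSION):
    """True when the reported version starts with the expected components."""
    version = parse_wmic_version(output)
    return version is not None and version[:len(expected)] == expected


if __name__ == "__main__":
    sample = "Version\n3.2.1.0\n"
    print(has_expected_hg(sample))
```

Comparing only the leading components lets the four-part file version reported by wmic (3.2.1.0) match the three-part Mercurial release number (3.2.1).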
Comment 17•9 years ago
I will put these two in and let them bake in case something else is wrong.
Reporter
Comment 18•9 years ago
Those two are looking good, so I reenabled the rest. With any luck, we're done here, thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard