Closed Bug 1168639 Opened 9 years ago Closed 9 years ago

Figure out what broke six Windows Try buildslaves on May 14th, and fix it

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Windows Server 2008
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

My money is on these being leftovers from that day when every single Windows try buildslave except the puppet testing pool was broken, but I don't actually remember for sure that that day was the 14th.
I don't know what happened. I'm going to reimage all of them.
So it looks like the c:\builds\moz2_slave dir is missing after I reimaged them with the instructions here: https://mana.mozilla.org/wiki/display/AVSE/MoCo+Vidyo+Room+Security. I talked to Q in IRC and he will investigate.
Flags: needinfo?(q)
I assume that was a mis-paste for the instructions? Q
Flags: needinfo?(q)
It looks like the start buildbot scheduled task is missing. This may have something to do with the filters set up for the runner testing. I am hashing that out now.
Looks like the wrong GPO was linked to try during the runner testing. "Scheduled_tasks_testers" was on the try OU and "Scheduled_tasks_testers_builders" was on the build OU. Correcting this and running a gpupdate /force fixed the issue on 0173. A reboot should now fix the problem.
Do you want to reboot the affected slaves, or should I?
Flags: needinfo?(kmoir)
It looks like 0173 was set to re-image after boot; the others may be as well. That can be undone with IPMI commands, or we can just let them re-image.
After the unexpected re-image 0173 took a job as expected: http://buildbot-master83.bb.releng.scl3.mozilla.com:8101/buildslaves/b-2008-ix-0173
Ccing rob since he's been working on the runner GPO stuff.
I've just kicked off a reinstall for all but 0173.
Flags: needinfo?(kmoir)
Thanks Amy!
Only 0173 and 0183 remain enabled, someone at some point disabled the others. Those two enabled ones are taking jobs, but only in the worst possible sense of taking them: they have Mercurial 1.9.1 installed instead of Mercurial 3.2.1 like unbroken slaves have, and as a result they each took a job, and spent 4 hours and 3 hours respectively just trying and failing to clone try before I rebooted them to free up those two jobs to be taken by slaves which would actually give the developers genuine builds, instead of 1800s timeout after 1800s timeout. So now those two are disabled as well.
Q: is this a result of bug 1169387?
Flags: needinfo?(q)
I haven't changed anything for 2008, but it could be affected. Let me look at the GPO. Q
Flags: needinfo?(q)
This was indeed affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1169387. It looks like the win8 GPO that was disabled was installing Mercurial on 2008 (both are x64 operating systems). I corrected the 2008 GPO and rebooted 0173, and it got version 3.2.1.0. Q
A reboot of 174 also shows correct:
C:\Users\cltbld>wmic datafile where name='c:\\mozilla-build\\hg\\hg.exe' get version
Version
3.2.1.0
C:\Users\cltbld>
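For anyone wanting to script the same check across several slaves, here is a minimal sketch in Python. It only parses text that has already been captured from the wmic command above; the function names and the EXPECTED_HG_VERSION constant are hypothetical, and it assumes the output is a "Version" header line followed by the dotted version number, as in the paste above.

```python
import re
from typing import Optional

# Version reported in this bug for correctly configured slaves (assumption:
# wmic prints it in this four-part form, as in the output pasted above).
EXPECTED_HG_VERSION = "3.2.1.0"


def parse_wmic_version(output: str) -> Optional[str]:
    """Return the first line that is purely a dotted version number, or None."""
    for line in output.splitlines():
        candidate = line.strip()
        if re.fullmatch(r"\d+(\.\d+)+", candidate):
            return candidate
    return None


def hg_is_expected(output: str, expected: str = EXPECTED_HG_VERSION) -> bool:
    """True only if the captured wmic output reports the expected hg version."""
    return parse_wmic_version(output) == expected
```

A slave for which hg_is_expected returns False (wrong version, or no version line at all) would be a candidate to leave disabled until the GPO has been reapplied.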
I will put these two in and let them bake in case something else is wrong.
Those two are looking good, so I reenabled the rest. With any luck, we're done here, thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard