Closed
Bug 428124
Opened 17 years ago
Closed 16 years ago
mac buildbot slaves should reboot ready for use
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Assigned: bhearsum)
References
Details
Attachments
(2 files, 3 obsolete files)
(deleted), patch | catlee: review+, bhearsum: checked-in+
(deleted), text/plain
Splitting this out from bug 417887, as each OS will have different gotchas.
Basically: how do we make each buildbot master/slave reboot cleanly, reconnect, and handle new jobs?
Assignee | ||
Comment 1•17 years ago
|
||
IIRC we don't have a way to launch Buildbot properly on boot in OS X. I believe there are hdiutil security context problems when it is launched from a startup script. This will require further investigation.
Comment 2•17 years ago
|
||
Yeah, the problem on Mac is that you need to be on the console when you launch the process so that it inherits security settings from the login window.
We might be able to fake that by invoking an AppleScript from the user's StartupItems that does something like the following:
tell application "Terminal"
do script "buildbot start /builds/slave"
end tell
Assignee | ||
Updated•17 years ago
|
Component: Release Engineering → Release Engineering: Future
Priority: -- → P3
Comment 3•17 years ago
|
||
Taking this to test out on qm-moz2mini01.
Assignee: nobody → rcampbell
Priority: P3 → P2
Updated•16 years ago
|
Assignee: rcampbell → nobody
Priority: P2 → P3
Assignee | ||
Comment 4•16 years ago
|
||
We chatted a bunch about this today and decided that part of this will be doing scheduled, periodic reboots of staging machines both to iron out kinks in the rebooting and to look for potential performance gains.
Status: NEW → ASSIGNED
Component: Release Engineering: Future → Release Engineering
Assignee | ||
Updated•16 years ago
|
Assignee: nobody → bhearsum
Assignee | ||
Updated•16 years ago
|
Priority: P3 → P2
Assignee | ||
Comment 5•16 years ago
|
||
Thanks to the work in bug 430833 this was easy-peasy to do on moz2-darwin9-slave08. I ran it overnight and all builds went green, except for mozilla-central leak tests, which are legitimately busted right now. I'm going to roll this out on the rest of the staging Macs later today. Once that's run for a bit we can roll it out in production.
Assignee | ||
Comment 6•16 years ago
|
||
Attachment #361784 -
Flags: review?(catlee)
Assignee | ||
Comment 7•16 years ago
|
||
Assignee | ||
Comment 8•16 years ago
|
||
I should probably document the details here:
* Ensure /builds/slave is the slavedir or a symlink to it
* Download buildbot.start.slave.plist and put it in /Library/LaunchAgents
* Make sure it is owned by root:wheel
Copy and paste:
cd /Library/LaunchAgents
sudo wget --no-check-certificate -Obuildbot.start.slave.plist https://bug428124.bugzilla.mozilla.org/attachment.cgi?id=361795
sudo chown root:wheel buildbot.start.slave.plist
From VNC:
* Make sure the resolution is set to 1280x1024.
* System Prefs -> Accounts -> Login Options
** Set 'Automatic Login' to 'cltbld', enter the password when prompted.
Reboot.
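The checklist above can be sanity-checked before the reboot. This is only a minimal sketch, assuming a POSIX shell; the helper names file_owner and check_slave_setup are made up for illustration and are not part of the attached files.

```shell
#!/bin/sh
# Print "owner:group" for a file, parsed from `ls -ld` so the same code
# works on OS X and Linux (the `stat` flags differ between the two).
file_owner() {
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# check_slave_setup SLAVEDIR PLIST [EXPECTED_OWNER]
# Hypothetical helper: verifies the slavedir and plist preconditions.
check_slave_setup() {
    slavedir=$1
    plist=$2
    expected=${3:-root:wheel}

    # -d follows symlinks, so a symlink to the real slavedir passes too.
    if [ ! -d "$slavedir" ]; then
        echo "FAIL: $slavedir is not a directory or a symlink to one"
        return 1
    fi
    if [ ! -f "$plist" ]; then
        echo "FAIL: $plist is missing"
        return 1
    fi
    actual=$(file_owner "$plist")
    if [ "$actual" != "$expected" ]; then
        echo "FAIL: $plist is owned by $actual, expected $expected"
        return 1
    fi
    echo "OK: $slavedir and $plist look ready"
}
```

On a slave this would be called as: check_slave_setup /builds/slave /Library/LaunchAgents/buildbot.start.slave.plist. The expected owner defaults to root:wheel. It does not check the VNC resolution or the automatic login setting.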
Updated•16 years ago
|
Attachment #361784 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 9•16 years ago
|
||
Comment on attachment 361784 [details] [diff] [review]
enable periodic reboots on staging for macs
changeset: 938:9710768473dd
Attachment #361784 -
Flags: checked‑in+
Assignee | ||
Comment 10•16 years ago
|
||
Alright, these changes have been deployed on moz2-darwin9-slave03, 04, and 08 and periodic reboots of them have been enabled in staging. Once we're confident they're stable this can be rolled out to production.
Assignee | ||
Comment 11•16 years ago
|
||
I found out this morning that Macs need an /etc/sudoers update just like Linux. I added this to the staging Macs; hopefully they'll reboot this time.
Assignee | ||
Comment 12•16 years ago
|
||
Just had the first Mac reboot and come back up successfully. Let's keep running this for a while to make sure it's stable.
Assignee | ||
Comment 13•16 years ago
|
||
Some of the Macs haven't been coming back up properly. I've added the following to the plist file to try and help with that:
<key>StartInterval</key>
<integer>600</integer>
This will cause OS X to try to start the slave every 600 seconds. The attempt fails gracefully if the slave is already running, but _should_ start it if it failed to come up on boot.
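For illustration only, the "start it unless it is already running" behaviour can also be made explicit in a small wrapper. This is a sketch and not what the plist does: it assumes the slave records its process ID in twistd.pid inside the slavedir, and start_slave_if_needed is a hypothetical name.

```shell
#!/bin/sh
# start_slave_if_needed BASEDIR [START_COMMAND...]
# Assumption: a running slave leaves its PID in BASEDIR/twistd.pid.
start_slave_if_needed() {
    basedir=$1
    shift
    # With no explicit command, fall back to the one used in this bug.
    [ $# -gt 0 ] || set -- buildbot start "$basedir"

    pidfile="$basedir/twistd.pid"
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "slave already running"
        return 0
    fi
    # No PID file, or the recorded process is gone: run the start command.
    "$@"
}
```

Calling start_slave_if_needed /builds/slave every 600 seconds would then be a no-op while the slave is up, and a fresh start otherwise.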
Assignee | ||
Comment 14•16 years ago
|
||
I haven't seen a Mac fail to restart its slave since I added the <key>StartInterval</key> parameter. This is ready to deploy.
Assignee | ||
Comment 15•16 years ago
|
||
Here's an updated plist file with the StartInterval in it.
Attachment #361795 -
Attachment is obsolete: true
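The attachment itself is no longer available, so the following is only a guess at the general shape of such a LaunchAgent, not a copy of the real file. The label, the bare "buildbot" program name, and the RunAtLoad key are assumptions; only the StartInterval key and its value of 600 come from the comments above. The helper write_slave_plist is hypothetical.

```shell
#!/bin/sh
# write_slave_plist OUTFILE [SLAVEDIR]
# Writes an illustrative launchd plist; SLAVEDIR defaults to /builds/slave.
write_slave_plist() {
    out=$1
    slavedir=${2:-/builds/slave}
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Assumed label; the label in the real attachment is not known. -->
    <key>Label</key>
    <string>buildbot.start.slave</string>
    <!-- Assumed command line; an absolute path to buildbot may be needed. -->
    <key>ProgramArguments</key>
    <array>
        <string>buildbot</string>
        <string>start</string>
        <string>${slavedir}</string>
    </array>
    <!-- Assumed: run once when the agent is loaded at login. -->
    <key>RunAtLoad</key>
    <true/>
    <!-- From comment 13: retry the start every 600 seconds. -->
    <key>StartInterval</key>
    <integer>600</integer>
</dict>
</plist>
EOF
}
```

Writing the output to a scratch location first (for example write_slave_plist /tmp/buildbot.start.slave.plist) makes it easy to inspect before anything is copied into /Library/LaunchAgents.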
Assignee | ||
Comment 16•16 years ago
|
||
This is getting deployed tomorrow. Here are copy/paste instructions (assumes /builds/slave is the slavedir or a symlink to it):
cd /Library/LaunchAgents
sudo wget --no-check-certificate -Obuildbot.start.slave.plist https://bugzilla.mozilla.org/attachment.cgi?id=364362
sudo chown root:wheel buildbot.start.slave.plist
And once the slave becomes idle it should be rebooted to ensure everything got installed okay.
Assignee | ||
Comment 17•16 years ago
|
||
Here's the plist file with the correct syntax. I didn't have time to deploy it on everything today, but the following slaves are set up for clean boots:
bm-xserve05
bm-xserve04
bm-xserve16
bm-xserve17
bm-xserve18
bm-xserve19
bm-xserve20
bm-xserve21
bm-xserve22
fx-mac-1.9-slave1
fx-mac-1.9-slave2
moz2-darwin9-slave01
moz2-darwin9-slave02
moz2-darwin9-slave03
moz2-darwin9-slave04
moz2-darwin9-slave05
moz2-darwin9-slave06
moz2-darwin9-slave07
moz2-darwin9-slave08
This only leaves the try slaves (try-mac-slave01 -> try-mac-slave05), which I will finish up on Monday.
Attachment #364362 -
Attachment is obsolete: true
Comment 18•16 years ago
|
||
Adjusting summary to remove masters, since they all run on linux.
Summary: mac buildbot masters/slaves should reboot ready for use → mac buildbot slaves should reboot ready for use
Assignee | ||
Comment 19•16 years ago
|
||
Alright, I've deployed these changes (with the CVS_RSH key) to the try server pool: try-mac-slave01 -> try-mac-slave05. I'm still waiting for slave04 to become idle to make sure it comes back up okay. Things are looking good other than that.
Still need to update support docs and inventory, almost done here though.
Assignee | ||
Comment 20•16 years ago
|
||
Whoops, messed up the dependencies.
Assignee | ||
Comment 21•16 years ago
|
||
Okay, this has been deployed on all of our Mac buildbot slaves now, and the support docs have been updated. We're all done here.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Summary: mac buildbot masters/slaves should reboot ready for use → mac buildbot slaves should reboot ready for use
Comment 22•16 years ago
|
||
Also fixed up the CVS_RSH=ssh on fx-mac-1.9-slave1 & 2.
Comment 23•16 years ago
|
||
And on bm-xserve16 thru 19 & 22, moz2-darwin9-slave01 thru 08 - need CVS for release update verify.
Comment 24•16 years ago
|
||
Attachment #364605 -
Attachment is obsolete: true
Updated•11 years ago
|
Product: mozilla.org → Release Engineering