Closed
Bug 631851
Opened 14 years ago
Closed 14 years ago
deploy Buildbot-0.8.4-pre-moz1 to the puppet buildslaves
Categories: Release Engineering :: General, defect, P2
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: dustin, Assigned: dustin
Whiteboard: [puppet]
Attachments (2 files, 1 obsolete file):
(deleted) patch | bhearsum: review+ | dustin: checked-in+
(deleted) patch | bhearsum: review+ | dustin: checked-in+
For a number of bugs (listed as deps to this one), we'll need to get a new version of Buildbot deployed across the pool.
Right now the plan is to make this 0.8.0r1. The code deployed now is 0.8.0 plus some backported fixes.
The alternative is to upgrade to 0.8.3, but that's a lot of changes to take. Upgrading to 0.8.3 will be easier later, once everything is run from runslave.py, since runslave.py skips 'buildbot start' (which was renamed to 'buildslave start' in 0.8.1 and up).
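The rename is what makes moving past 0.8.0 awkward for anything that shells out to start a slave. A minimal sketch of the branch such a launcher would need, assuming the Buildbot version is available as a tuple and the slave's base directory is passed as the last argument. The helper name `start_command` and the path `/builds/slave` are hypothetical, not part of Buildbot or runslave.py.

```python
def start_command(version, basedir):
    """Return the argv that starts a slave in `basedir` for a Buildbot version.

    Hypothetical helper: it encodes only the rename described above, where
    'buildbot start' became 'buildslave start' in 0.8.1 and later.
    """
    if tuple(version) >= (0, 8, 1):
        return ["buildslave", "start", basedir]
    return ["buildbot", "start", basedir]


if __name__ == "__main__":
    # Currently deployed 0.8.0-based code vs. the proposed 0.8.3 upgrade.
    print(start_command((0, 8, 0), "/builds/slave"))  # ['buildbot', 'start', '/builds/slave']
    print(start_command((0, 8, 3), "/builds/slave"))  # ['buildslave', 'start', '/builds/slave']
```

Since runslave.py does not go through either command, rolling it out removes the need for this branch entirely.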
Comment 1 • 14 years ago (Assignee)
From discussion on bug 631854, we should upgrade the slaves to 0.8.3, or more accurately to the current upstream HEAD, which is 0.8.3 plus some patches we want. From my probably-imperfect 'hg diff' invocations, it looks like the only local patch that must be applied is the mozilla properties command.
We can avoid drama due to 'buildbot'->'buildslave' by rolling this out at the same time as runslave.py on every machine. runslave.py does not use the 'buildslave' command.
Once the blocker bugs are done, I'll need to stage this on each platform as a proof of concept before deploying globally.
Summary: deploy a new version of Buildbot to the buildslaves → deploy Buildbot-0.8.3+patches to the buildslaves
Updated • 14 years ago
Priority: -- → P4
Whiteboard: [puppet]
Updated • 14 years ago (Assignee)
Priority: P4 → P2
Updated • 14 years ago
Assignee: nobody → dustin
Comment 2 • 14 years ago (Assignee)
This will change the default location of the Buildbot code from /tools/buildbot to /tools/buildbot-slave, which nicely keeps the new install out of the way of the old one.
The existing /tools/buildbot install was done differently on each kind of slave, for no clear reason: in some cases it uses a hand-compiled Python, in others a Python hand-compiled and installed in /tools/python-X.Y.Z, and in others the system Python.
From what I can gather on IRC, some of the runtime scripts may be using /tools/buildbot/bin/python, which is why that Python had to be 2.6 or higher.
So, a few notes:
1. As written, this will use the system Python (/usr/bin/python) everywhere. This can be changed easily, but I won't change it until I see a reason to.
2. /tools/buildbot/bin/python needs to stick around for the benefit of ateam scripts. A new bug should explore this particular interaction (probably by just blowing away the directory on staging and seeing what breaks).
3. /tools/buildbot/bin and /tools/python/bin are in $PATH at least on try-mac-slaveNN. They shouldn't be. This may mess with some ateam scripts, too - in fact, maybe that's how they're using /tools/buildbot/bin/python?
Comment 3 • 14 years ago (Assignee)
Catlee rightly points out that the slaves need to have the master code installed, because they run 'buildbot sendchange'. They also need to have buildbot in their PATH, so point 3, above, is invalid.
Comment 4 • 14 years ago (Assignee)
My plan is to get this installed and operational on each flavor of slave, and then deploy it in such a way that 0.8.4 is installed everywhere but only active on staging slaves. Then we can look for any trouble in staging before rolling it out everywhere.
Updated • 14 years ago (Assignee)
Summary: deploy Buildbot-0.8.3+patches to the buildslaves → deploy Buildbot-0.8.4-pre-moz1 to the puppet buildslaves
Comment 5 • 14 years ago (Assignee)
Scripts run by the slaves depend explicitly on simplejson and do not fall back to or from the built-in json module (bug 637508), so simplejson will need to go into the virtualenv as well.
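Bug 637508 tracks the missing fallback. A minimal sketch of the common import-fallback pattern, offered as an assumption about how such a fix could look rather than the actual patch from that bug: prefer simplejson when it is installed, otherwise use the standard-library json module. The stdlib module only exists from Python 2.6 on, and the darwin9 test slaves run Python 2.5.1, so on those machines simplejson remains the only option.

```python
try:
    # Preferred: the third-party package the slave scripts already expect.
    import simplejson as json
except ImportError:
    # Fallback: the standard-library module (Python 2.6 and later).
    import json

if __name__ == "__main__":
    # Hypothetical payload; both modules produce the same text for this input.
    print(json.dumps({"slave": "talos-r3-fed-001", "ok": True}, sort_keys=True))
    # {"ok": true, "slave": "talos-r3-fed-001"}
```

Either way the rest of the script only ever refers to the name `json`, so it does not care which module satisfied the import.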
Comment 6 • 14 years ago (Assignee)
I'm also seeing a lot of this:
    rm -rf build
     in dir /home/cltbld/talos-slave/test/. (timeout 1200 secs)
     watching logfiles {}
     argv: ['rm', '-rf', 'build']
    ...
     closing stdin
     using PTY: True
    process killed by signal 1
    program finished with exit code -1
This is caused by usepty=1 on the slaves. I see the same failure in production, so I'm not going to worry about it here. Once slavealloc is live and 0.8.4 is out, we can disable usepty.
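Signal 1 is SIGHUP, the hangup signal a process typically receives when its controlling terminal goes away (here, the PTY that usepty=1 allocates). The sketch below does not reproduce Buildbot's PTY handling. It only shows, with Python's standard subprocess module on a POSIX system, how a parent observes a child killed by SIGHUP: the return code is the negated signal number. Whether Buildbot arrives at its "exit code -1" line the same way is not something this sketch establishes.

```python
import signal
import subprocess
import sys

# Child program: restore the default SIGHUP action (in case this parent is
# ignoring SIGHUP, e.g. under nohup), then send SIGHUP (signal 1) to itself.
CHILD = (
    "import os, signal; "
    "signal.signal(signal.SIGHUP, signal.SIG_DFL); "
    "os.kill(os.getpid(), signal.SIGHUP)"
)

proc = subprocess.run([sys.executable, "-c", CHILD])

# On POSIX, subprocess reports "terminated by signal N" as returncode -N.
print(proc.returncode)                        # -1
print(signal.Signals(-proc.returncode).name)  # SIGHUP
```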
Comment 8 • 14 years ago (Assignee)
I also ran into some trouble with idle slaves being marked as disconnected due to NAT timeouts (bug 637541). Again, nothing new.
I've been running this on
talos-r3-fed-001
talos-r3-fed64-001
talos-r3-snow-001
talos-r3-leopard-002
linux-ix-slave01
moz2-darwin9-slave08
moz2-darwin10-slave03
At this point, I've seen enough green runs that I'd like to deploy this new version universally to staging. Then we can shake out any bugs before pushing to production.
Attachment #515842 - Flags: review?(bhearsum)
Comment 9 • 14 years ago
Comment on attachment 515842: m631851-puppet-manifests-r1.patch
Correct me if I'm wrong, but this patch doesn't cause anything to be deployed, does it? I don't see buildslave::install::production being used anywhere. I'm probably missing something, though.
>+ # platform_python is whatever's available on this platform. If that's not
>+ # good enough, we should start installing Pythons with Puppet.
>+ $platform_python = "/usr/bin/python"
This assumption isn't correct in all cases. We use a Python out of /tools for all linux/mac build slaves and one out of ~cltbld for fed, fed64, and snow leopard. (On snow leopard, it's just a symlink to /usr/bin/python, though). Leopard uses the default system Python. We need to continue using these.
Do you have a solution for removing old, unwanted versions of Buildbot?
Attachment #515842 - Flags: review?(bhearsum) → review-
Comment 10 • 14 years ago (Assignee)
(In reply to comment #9)
> Correct me if I'm wrong, but this patch doesn't cause anything to be deployed,
> does it? I don't see buildslave::install::production being used anywhere...I'm
> probably missing something though.
Apparently I forgot to qrefresh. I'll do so for the next patch.
> >+ # platform_python is whatever's available on this platform. If that's not
> >+ # good enough, we should start installing Pythons with Puppet.
> >+ $platform_python = "/usr/bin/python"
>
> This assumption isn't correct in all cases. We use a Python out of /tools for
> all linux/mac build slaves and one out of ~cltbld for fed, fed64, and snow
> leopard. (On snow leopard, it's just a symlink to /usr/bin/python, though).
> Leopard uses the default system Python. We need to continue using these.
Well, you're correct that /usr/bin/python isn't good enough in all cases, but your suggestions of which pythons to use are also incorrect :)
Based on MD5 sums of the existing /tools/buildbot/bin/python and various other Pythons (noting that the sums change when virtualenv copies the Python binary around on Mac, probably due to a small resource fork), these are the interpreters in use:
test slaves:
  linux    - /usr/bin/python (2.6.2)
  linux64  - /usr/bin/python (2.6.2)
  darwin9  - /usr/bin/python (2.5.1)
  darwin10 - /usr/bin/python (2.6.1)
build slaves:
  linux    - /tools/python-2.6.5/bin/python
  darwin9  - /tools/python/bin/python
  darwin10 - /tools/python-2.6.4/bin/python
(I'm honestly surprised by test-darwin9, but /tools/buildbot/bin/python and /usr/bin/python both report the same build info, with version 2.5.1.)
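The survey above works by comparing the MD5 sum of each candidate binary against /tools/buildbot/bin/python. A minimal sketch of that comparison using Python's hashlib; the helper names `md5_of` and `matching_interpreters` are hypothetical, and the candidate paths come from the list above. As noted, a mismatch is not conclusive on Mac, where a virtualenv copy of the binary hashes differently, so comparing the build info each interpreter reports (as was done for test-darwin9) is a useful cross-check.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matching_interpreters(reference, candidates):
    """Return the candidates that exist and are byte-identical to `reference`."""
    wanted = md5_of(reference)
    return [c for c in candidates if os.path.isfile(c) and md5_of(c) == wanted]


if __name__ == "__main__":
    reference = "/tools/buildbot/bin/python"
    candidates = [
        "/usr/bin/python",
        "/tools/python/bin/python",
        "/tools/python-2.6.4/bin/python",
        "/tools/python-2.6.5/bin/python",
    ]
    # Only meaningful on a slave that still has the old install.
    if os.path.isfile(reference):
        print(matching_interpreters(reference, candidates))
```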
This will need a new Puppet module to clean it up and do installs the same way everywhere (particularly since build-darwin9 has some mac-like /tools/python/Python.framework/Versions/2.6/bin/python thing going on), but that's not necessary at the moment.
> Do you have a solution for removing old, unwanted versions of Buildbot?
ensure => absent
Comment 11 • 14 years ago (Assignee)
I was wrong about the ensure => absent in the last patch, but I added it here :)
Attachment #515842 - Attachment is obsolete: true
Attachment #516078 - Flags: review?(bhearsum)
Comment 12 • 14 years ago
Comment on attachment 516078: m631851-puppet-manifests-r2.patch
Sorry this sat for so long :(.
Attachment #516078 - Flags: review?(bhearsum) → review+
Comment 13 • 14 years ago (Assignee)
Comment on attachment 516078: m631851-puppet-manifests-r2.patch
552df8913261
deployed everywhere, although it only affects staging (one hopes!)
Attachment #516078 - Flags: checked-in+
Comment 14 • 14 years ago (Assignee)
This seems to be going well so far - Aki's killing a bunch of builds on sm01 and the slaves are doing well at killing the underlying processes.
Comment 15 • 14 years ago (Assignee)
This seems fine in staging so far - let's roll it out!
Attachment #520308 - Flags: review?(bhearsum)
Updated • 14 years ago
Attachment #520308 - Flags: review?(bhearsum) → review+
Comment 16 • 14 years ago (Assignee)
Hm, it looks like the Python path is wrong for talos-r3-snow-*. Sigh.
Comment 17 • 14 years ago (Assignee)
Scratch that: talos-r3-snow-002 has the wrong version of Mac OS X installed (10.2.0 instead of 10.6.0). This is still OK to roll out once jhford is done with the linux64 work.
Comment 18 • 14 years ago (Assignee)
Problems with talos-r3-fed-002, too: bug 645012
Depends on: 645012
Comment 19 • 14 years ago (Assignee)
Just need to test this on a moz2-linux64-slaveNN machine, and it will be ready to deploy.
Updated • 14 years ago (Assignee)
Attachment #520308 - Flags: checked-in+
Comment 20 • 14 years ago (Assignee)
OK, this is deployed on all puppet masters now, and seems to be going smoothly - at least, I've seen a bunch of slaves come up in production with the appropriate version. I'll keep watching puppet master logs to see if there are any machines constantly pinging (which would indicate a puppet failure).
Comment 21 • 14 years ago (Assignee)
hooray!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Release Engineering