Closed
Bug 481886
Opened 16 years ago
Closed 15 years ago
Tracking bug for buildbot 0.7.10p1 upgrade
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: bhearsum)
Attachments
(3 files, all deleted: two text/plain, one patch with catlee: review+ and bhearsum: checked-in+)
Buildbot 0.7.10p1 has lots of features that would be useful to us. We should upgrade!
Comment 1•16 years ago
We need to take a version of buildbot that fixes at least http://buildbot.net/trac/ticket/446, too.
Comment 2•16 years ago (Assignee)
We're going to try and do this early in Q2.
Assignee: catlee → bhearsum
Status: NEW → ASSIGNED
Priority: -- → P3
Comment 3•16 years ago (Assignee)
Some of the nice features in 0.7.10p1 include:
* ATOM/RSS
* Fixed 'ping' button
* Fixed reconfig (no more tracebacks)
* Graceful slave shutdown
* Configurable BuildRequest merging
There's also a few patches which landed post-0.7.10 we should consider including:
* Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
* Mercurial step fixes (http://buildbot.net/trac/ticket/462 and http://buildbot.net/trac/ticket/277)
We don't strictly need to take the Mercurial fixes but it might be good to get that testing out of the way - it wouldn't surprise me if it breaks us a bit.
Comment 4•16 years ago
Most notably, we need to take http://buildbot.net/trac/ticket/446, aka http://github.com/djmitche/buildbot/commit/557c49750d4fea2c2b977bb2c0b4a3db447ab447. Otherwise our builds will break.
Comment 5•16 years ago (Assignee)
This is going to be a pretty easy import, by the looks of it. I'm going to be backing out the patch in bug 485584 since it hasn't solved our issue, and conflicts with some incoming changes. Other than that, there's a few conflicts: process/base.py, process/builder.py, slave/commands.py - all of which are trivial to resolve.
I still need to do a lot of testing in staging before we think about deploying this. It's going to be a bit of a pain to roll out, too, because we'll need to update all of the build slaves (we can probably omit Talos slaves from this since there's no big commands.py changes that affect them). It's probably going to require a fairly big downtime.
So, here's the plan:
* Test the new Buildbot in staging well, focusing on the Mercurial step
* Import 0.7.10p1 into production
* Schedule downtime, roll out across the farm.
Comment 6•16 years ago
There have been some major patches to the mail notifier stuff, and previously, I ran across problems with TinderboxNotifier, too. We should make sure those don't break, including the l10n-specific uses with WithProperties in tree names.
Comment 7•16 years ago
(In reply to comment #3)
> There's also a few patches which landed post-0.7.10 we should consider
> including:
> * Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
> * Mercurial step fixes (http://buildbot.net/trac/ticket/462 and
> http://buildbot.net/trac/ticket/277)
>
> We don't strictly need to take the Mercurial fixes but it might be good to get
> that testing out of the way - it wouldn't surprise me if it breaks us a bit.
Why not just import a clean 0.7.10p1, and omit those extra later fixes until they are included in 0.7.11, and we import a clean 0.7.11?
It feels easier (and safer?) to import a clean 0.7.10p1, rather than pick-and-choose additional later changes, but I could be missing something.
Comment 8•16 years ago
Mantra #1, use 0.7.10p1 without patches, and you break the build. The builds won't get their .hg/hgrc set up with paths, which will break about:buildconfig, and make ident required for l10n builds.
Besides, there's the technical detail that we need to patch slave/commands.py to include the custom slave-side step code we have. I wonder if it's worth forking that code out to slave/mozcommands.py and importing it from commands.py, to make the distinction more apparent.
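The module split suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: `COMMAND_REGISTRY`, `register_command`, `CustomStepCommand` and the command names are hypothetical stand-ins, not Buildbot's real slave-side API, and everything is shown in one file for brevity.

```python
# Hypothetical stand-in for a slave-side command registry. This is NOT
# Buildbot's real API; it only illustrates keeping local code separate.
COMMAND_REGISTRY = {}


def register_command(name, factory, version):
    """Record a command factory under the name the master will ask for."""
    COMMAND_REGISTRY[name] = (factory, version)


# --- what would stay in the stock commands.py ---
class ShellCommand:
    """Placeholder for an upstream command."""


def register_upstream_commands():
    register_command("shell", ShellCommand, "1.0")


# --- what would move to a separate mozcommands.py ---
class CustomStepCommand:
    """Placeholder for a locally maintained slave-side command."""


def register_local_commands():
    register_command("custom-step", CustomStepCommand, "1.0")


# commands.py would then carry a single hook for local additions, so a
# new upstream release only conflicts on this one call.
register_upstream_commands()
register_local_commands()
```

With this layout, importing a new upstream release would touch commands.py in one place only, and a dropped hook is easier to spot in a merge than a dropped class.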
Comment 9•16 years ago (Reporter)
I wonder if it's worth it to stop using buildbot's built-in mercurial support completely.
Comment 10•16 years ago
There are good things coming up, in particular the clobber on switching from one repo-as-branch to another is pretty tough to mimic in pure shell scripts. That's in patches towards .11, too.
Basically, when you have a fx36x clone, and you branch to a releases/mozilla-1.9.2 repo, the Mercurial step realizes that you're now pulling from some place else, and does a clobber. That's the same scenario that makes us clobber build/tools all the time at the moment: the current code doesn't compare the repo you pulled from with the repo you want to pull from.
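The comparison described here can be sketched in a few lines. This is a minimal illustration, not the upstream Mercurial step: it assumes the checkout records its origin as `default` under `[paths]` in `.hg/hgrc`, that the file is simple enough for Python's `configparser` to read, and the function names are made up for the example.

```python
import configparser
import os


def recorded_source(workdir):
    """Return the [paths] default URL stored in workdir/.hg/hgrc, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(os.path.join(workdir, ".hg", "hgrc")):
        return None  # no hgrc file could be read
    return parser.get("paths", "default", fallback=None)


def needs_clobber(workdir, wanted_url):
    """Decide whether an existing checkout must be wiped before updating."""
    if not os.path.isdir(os.path.join(workdir, ".hg")):
        return False  # no checkout yet, a fresh clone is enough
    source = recorded_source(workdir)
    if source is None:
        return True  # origin unknown, so the checkout cannot be trusted
    # Ignore a trailing slash so equivalent URLs do not force a clobber.
    return source.rstrip("/") != wanted_url.rstrip("/")
```

Treating an unknown origin as a reason to clobber is a conservative choice made for this sketch; a real step might prefer to pull and verify instead.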
Comment 11•16 years ago (Assignee)
(In reply to comment #7)
> (In reply to comment #3)
> > There's also a few patches which landed post-0.7.10 we should consider
> > including:
> > * Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
> > * Mercurial step fixes (http://buildbot.net/trac/ticket/462 and
> > http://buildbot.net/trac/ticket/277)
> >
> > We don't strictly need to take the Mercurial fixes but it might be good to get
> > that testing out of the way - it wouldn't surprise me if it breaks us a bit.
>
> Why not just import a clean 0.7.10p1, and omit those extra later fixes until
> they are included in 0.7.11, and we import a clean 0.7.11?
>
> > It feels easier (and safer?) to import a clean 0.7.10p1, rather than
> pick-and-choose additional later changes, but I could be missing something.
We've been importing a release + some patches every time we import a new Buildbot - so it's nothing new.
Some of these changes we don't _have_ to take, but since I'm going to be doing the work to import 0.7.10p1 I figure we may as well take some patches that will benefit us. I really want to take the Mercurial ones, and now that Axel mentions it, the MailNotifier ones, so we can deal with whatever bustage there is at the same time.
In any case, as Axel mentions, 0.7.10p1 stock will break the build:
(In reply to comment #8)
> Mantra #1, use 0.7.10p1 without patches, and you break the build. The builds
> won't get their .hg/hgrc set up with paths, which will break about:buildconfig,
> and make ident required for l10n builds.
Comment 12•16 years ago (Assignee)
(In reply to comment #8)
> Besides the technical details that we need to patch slave/commands.py to
> include the custom slave-side step code we have. I wonder if it's worth to fork
> those to slave/mozcommands.py, and to import that from commands.py, to make the
> distinction more apparent.
I think we should avoid these as much as possible, mainly because of the huge PITA of deploying them initially, plus the inevitable bugfixes. But we do have one custom command in here currently, and I think it's a great idea to move it out.
Comment 13•16 years ago (Assignee)
I've imported 0.7.10p1 and the following tickets into my user repository:
http://buildbot.net/trac/ticket/455
http://buildbot.net/trac/ticket/446
http://buildbot.net/trac/ticket/451
http://buildbot.net/trac/ticket/277
http://buildbot.net/trac/ticket/462
The repository is here: http://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot. I plan to start testing this week, beginning with staging-master:moz2-master. Once I have all of that sorted out I'll move on to try and talos.
Comment 14•16 years ago (Assignee)
Turns out I forgot to 'hg addremove' after unpacking 0.7.10p1. I've fixed my repository to include all the new files.
Comment 15•16 years ago (Assignee)
While testing 0.7.10p1 on the staging try server I encountered a problem with the MozillaPatchDownload step. I landed a fix upstream for it, and also in http://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot.
Other than that, and the issue I filed bug 487496 for, everything has been fine. I still have to test the Talos buildbot though, and I wouldn't be surprised to find a thing or two that needs fixing.
Priority: P3 → P2
Comment 16•16 years ago (Assignee)
Comment 17•16 years ago (Assignee)
Comment 18•16 years ago (Assignee)
Deployment on Linux:
* Log on as root
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374948
chmod +x buildbot-0.7.10p1.sh
./buildbot-0.7.10p1.sh
Deployment on Mac:
* Log on as cltbld
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374948
chmod +x buildbot-0.7.10p1.sh
sudo ./buildbot-0.7.10p1.sh
Deployment on Windows:
* Log on as Administrator
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374949
chmod +x buildbot-0.7.10p1.sh
./buildbot-0.7.10p1.sh
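The three recipes above differ only in the login account, the attachment id and whether sudo is used. As a rough sketch (the table layout and the `deployment_commands` helper are hypothetical, the values are copied from the steps above), the per-platform differences can be kept as data:

```python
ATTACHMENT_URL = "https://bugzilla.mozilla.org/attachment.cgi?id=%d"
SCRIPT = "buildbot-0.7.10p1.sh"

# Per-platform facts copied from the deployment steps in this comment.
PLATFORMS = {
    "linux": {"login": "root", "attachment": 374948, "sudo": False},
    "mac": {"login": "cltbld", "attachment": 374948, "sudo": True},
    "windows": {"login": "Administrator", "attachment": 374949, "sudo": False},
}


def deployment_commands(platform):
    """Return the shell commands to run, in order, for one platform."""
    info = PLATFORMS[platform]
    url = ATTACHMENT_URL % info["attachment"]
    run = "./" + SCRIPT
    if info["sudo"]:
        run = "sudo " + run
    return [
        "wget --no-check-certificate -O%s %s" % (SCRIPT, url),
        "chmod +x " + SCRIPT,
        run,
    ]


if __name__ == "__main__":
    for name in sorted(PLATFORMS):
        print("%s (log on as %s):" % (name, PLATFORMS[name]["login"]))
        for command in deployment_commands(name):
            print("  " + command)
```

The helper only builds command strings; it does not run anything, so it is safe to use for generating per-slave instructions.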
Comment 19•16 years ago (Assignee)
After a few bumps in the road we've got this deployed. Major problems were:
* Talos losing the ability to override commands (fixed in bug 487496)
* Windows slaves failing due to http://buildbot.net/trac/ticket/456. We checked in this patch and updated the slaves to fix it.
* Many builds failing due to SetMozillaBuildProperties not existing on the slaves. This was the result of a bad merge during the initial import. To fix it, we re-added the command to commands.py and updated the slaves.
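A missing slave-side command like the one in the last item is cheap to detect before a rollout. The check below is a hypothetical sketch: how the list of commands a slave offers is obtained is left out, and the command names in the usage example are placeholders.

```python
def missing_commands(required, available):
    """Return the required command names a slave does not offer, sorted."""
    return sorted(set(required) - set(available))


# Usage example with placeholder names: the custom command was lost in a merge.
required = ["shell", "hg", "custom-step"]
available = ["shell", "hg"]
problems = missing_commands(required, available)
if problems:
    print("slave is missing commands: " + ", ".join(problems))
```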
Comment 20•16 years ago (Reporter)
try-mac-slave06
moz2-darwin9-slave03
weren't updated because they're offline
Comment 21•16 years ago (Reporter)
moz2-darwin9-slave03 has been upgraded.
Holding off on try-mac-slave06 until we get the new buildbot code working on try slaves.
Comment 22•16 years ago (Assignee)
All of the production-1.8 and production-1.9 master + slaves have been updated now. Still to do:
staging-1.9
1.9 unittest
Comment 23•16 years ago (Assignee)
staging-1.9 has been upgraded.
Comment 24•16 years ago (Assignee)
We're hitting what seems to be an ignorable traceback on the 1.9 masters, related to l10n:
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 759, in <lambda>
    d.addCallback(lambda res: self.loadConfig_Schedulers(schedulers))
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 835, in loadConfig_Schedulers
    d.addCallback(updateDownstreams)
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 191, in addCallback
    callbackKeywords=kw)
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 834, in updateDownstreams
    s.checkUpstreamScheduler()
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/scheduler.py", line 350, in checkUpstreamScheduler
    for s in self.parent.allSchedulers():
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'allSchedulers'
I've commented on the upstream ticket about it (http://buildbot.net/trac/ticket/35). It doesn't seem to be interfering with anything, though.
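The last frame shows that `self.parent` is None when the upstream check runs. The sketch below reproduces that failure mode with simplified stand-in classes and shows one possible guard. The class and method names are hypothetical, this is not Buildbot's code, and the guard is not the upstream fix. Why the scheduler has no parent at that point is not established in this bug.

```python
class Master:
    """Simplified stand-in for the object that owns the schedulers."""

    def __init__(self):
        self.schedulers = []

    def attach(self, scheduler):
        scheduler.parent = self
        self.schedulers.append(scheduler)

    def all_schedulers(self):
        return list(self.schedulers)


class DependentScheduler:
    """Simplified stand-in for a scheduler that follows an upstream one."""

    def __init__(self, name, upstream_name=None):
        self.name = name
        self.upstream_name = upstream_name
        self.parent = None  # stays None until a master attaches it

    def check_upstream_unguarded(self):
        # Mirrors the failing line: AttributeError when parent is None.
        for s in self.parent.all_schedulers():
            if s.name == self.upstream_name:
                return s
        return None

    def check_upstream_guarded(self):
        if self.parent is None:
            return None  # not attached, nothing to look up
        return self.check_upstream_unguarded()
```

Calling `check_upstream_unguarded()` on a scheduler that was never attached raises the same kind of AttributeError as in the traceback, while the guarded variant returns None.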
Comment 25•15 years ago (Assignee)
The only thing left to do here is get the Try Server slaves upgraded to 0.7.10p1. This is blocked on figuring out how to avoid them breaking when the try repository grows too many heads.
Comment 26•15 years ago (Assignee)
Last week I worked with the maintainers of the Buildbot Mercurial code and they landed an upstream patch that will enable us to use 'hg clone --rev' on the try server. We'll need to pull in http://github.com/djmitche/buildbot/commit/483a6043ed2cab2436009eeb7465269b7a48e65f, and land the attached patch. We'll need a short downtime so we can upgrade the slaves at the same time as we land these.
Attachment #378326 -
Flags: review?(catlee)
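For context on the 'hg clone --rev' approach mentioned in comment 26: cloning with `--rev` fetches only the requested changeset and its ancestors, so the other heads in the repository are never pulled. A minimal sketch of building such a command line follows; the helper name and the placeholder values are hypothetical, and this is not the code from the attached patch.

```python
def hg_clone_command(repo_url, dest, rev=None):
    """Build the argv for an hg clone, optionally limited to one revision."""
    cmd = ["hg", "clone"]
    if rev is not None:
        # Restrict the clone to this changeset and its ancestors.
        cmd += ["--rev", rev]
    cmd += [repo_url, dest]
    return cmd


# Usage example with placeholder values.
print(" ".join(hg_clone_command("https://hg.example.org/try", "build", rev="abcdef123456")))
```

Returning a list rather than a single string keeps the arguments unambiguous if the result is later passed to something like `subprocess`.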
Updated•15 years ago (Reporter)
Attachment #378326 -
Flags: review?(catlee) → review+
Comment 27•15 years ago (Assignee)
Comment on attachment 378326 [details] [diff] [review]
MozillaTryServerHgClone fixes for 0.7.10p1+
changeset: 299:1aa4bb2bdf4d
Attachment #378326 -
Flags: checked-in+
Comment 28•15 years ago (Assignee)
I got the Try Server slaves upgraded today (yay).
Comment 29•15 years ago (Assignee)
This bug is ripe for the closing - all of our installations have been updated to 0.7.10p1, save 1.9 unittests (which is ok).
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 30•15 years ago
For the moz2-master slaves we used
http://hg.mozilla.org/build/buildbot/rev/96306d317882
For the try slaves it was
http://hg.mozilla.org/build/buildbot/rev/78b923616fb0
Updated•11 years ago
Product: mozilla.org → Release Engineering