Closed
Bug 712206
Opened 13 years ago
Closed 10 years ago
slave pre-flight tasks (runner)
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: mrrrgn)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal])
Attachments
(2 files, 5 obsolete files)
(deleted),
patch
|
ianconnolly
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
a lot of slave maintenance is done at the beginning of build and test jobs that could be done instead before even starting buildbot.
things like doing clobbers, purging old directories, making sure the tools checkout is up-to-date.
in the case where we have spare capacity in the pool this can be a slight win for end-to-end time since builds don't have to waste time cleaning up. in the case where we're maxed out, it's no worse than the current situation.
pre-flight checks could be added here as well (can I compile something? is my display correct?), and refuse to start buildbot if the checks fail.
Updated•13 years ago
|
Whiteboard: [unittest][talos]
Updated•13 years ago
|
Priority: -- → P4
Whiteboard: [unittest][talos] → [unittest][talos][automation]
Reporter | ||
Updated•13 years ago
|
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Comment 2•12 years ago
|
||
Among the things this should do is to verify and, if necessary, correct the screen resolution on all affected platforms.
I can see two ways to implement this:
1. add functionality to runslave.py to run a directory full of scripts
2. insert a step *before* runslave.py on all platforms that runs a directory full of scripts
Comment 4•12 years ago
|
||
I can help with the Windows side as I have increased my expertise by setting the Windows 8, 7 and 64-bit several times in the last two quarters.
Comment 5•12 years ago
|
||
I'm thinking that we should add bad slaves in the "see also" field and keep track in which weird ways it starts burning jobs even if hardware diagnostics did not find any problem.
Comment 6•12 years ago
|
||
I would also add the ability to report issues into a dashboard.
I would like to avoid having to go and check each single slave when it fails to start.
Updated•11 years ago
|
Component: Release Engineering: Machine Management → Release Engineering: Platform Support
QA Contact: armenzg → coop
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Comment 7•11 years ago
|
||
We need to clean temporary files on bld-lion machines like on bug 906706.
Comment 8•11 years ago
|
||
It would be great if we had a script that would help us determine if the right graphic card is installed on the win7 machines: see bug https://bugzilla.mozilla.org/show_bug.cgi?id=873566#c24
Comment 9•11 years ago
|
||
An approach to slave pre-flight tasks for puppet:
https://github.com/catlee/puppet/compare/master...startup
Comment 10•11 years ago
|
||
Very cool! I can't comment on the "compare", so a few notes here:
* /etc/preflight.d should be purged, so that puppet can remove no-longer-desired scripts
* the reboot should be delayed or, given the presence of slave-health/kittenherder, maybe just hang forever on the assumption it will be rebooted externally? At any rate, trying to fix something that's rebooting every 30s is no fun at all.
* As for starting Buildbot, it might be easiest to just invoke this script from the various shell scripts used for puppet::atboot, since they already have a mechanism to delay buildbot startup.
* It'd be nicer to have preflight::step take a 'contents' attribute. Then the in-module classes like preflight::steps::tools can use contents => template("${module_name}/tools.erb"), and preflight::step can also be used outside of the preflight module, with arbitrary scripts.
* Really minor, but grammatically I'd prefer preflight::step::tools (singular), just for the symmetry with preflight::step.
* It'd be great to have additional logging from these scripts, both to syslog and, potentially, to nagios via NSCA.
* Given the necessity of running these scripts on all platforms, would it be better to write them in Python or Ruby? That would also allow libraries for platform-independent logging, etc.
I'm sad that we've been sitting on this since May :(
Updated•11 years ago
|
Assignee: nobody → armenzg
Whiteboard: [unittest][talos][automation] → [unittest][talos][automation][q4goal]
Comment 11•11 years ago
|
||
I would prefer writing this in python to make it cross-platform compatible.
It is also easier to have people contribute to it.
I assume there will be some Puppet and GPO to deploy some part of it.
I will do an analysis to see where to hook this up for each OS.
Comment 12•11 years ago
|
||
I can't look at this bug until past the work week.
I'm afraid this will be a missed goal.
Priority: P2 → P3
Comment 13•11 years ago
|
||
Simone: how do you feel about owning the whole thing, and not just the tools repo part?
Flags: needinfo?(sbruno)
Comment 14•11 years ago
|
||
Coop: it surely makes sense since I started working on Bug 712205. I'm taking this!
Assignee: armenzg → sbruno
Flags: needinfo?(sbruno)
Comment 15•11 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #10)
> * As for starting Buildbot, it might be easiest to just invoke this script
> from the various shell scripts used for puppet::atboot, since they already
> have a mechanism to delay buildbot startup.
+1
> * It'd be great to have additional logging from these scripts, both to
> syslog and, potentially, to nagios via NSCA.
I think we handle this. Windows will still suck.
> * Given the necessity of running these scripts on all platforms, would it be
> better to write them in Python or Ruby? That would also allow libraries for
> platform-independent logging, etc.
Yes, I think we want python scripts/libs here.
Comment 16•11 years ago
|
||
Some other notes that have come out of conversations with releng this week:
* measuring improvement: we track current setup and teardown times per OS in brasstacks (http://brasstacks.mozilla.com/gofaster/#/ but the charts seem to be busted right now, I'll investigate). We should gather baseline stats now, so we can assess whether (and how far, hopefully) we've moved the needle at the end
* incremental gains: once we have our baselines, we can and should roll this out per OS as each step is ready, rather than waiting for a giant improvement across the board at the end. Making things a little bit better every week, even on one platform at a time, is progress.
* specific tasks:
** clobberer should run at start-up, although we will still need to check clobberer as part of the build, but only for the tree in question. Depending on how long a slave sits idle before its first job, there's a chance that a clobber request will come in between buildbot startup and job start.
** hg-shared: we should iterate through all the trees in the hg-shared dir and make sure they're updated to tip before starting buildbot. This will give jobs the minimum possible code delta to pull during the actual build process.
** moving post-flight tasks to pre-flight: obviously this makes sense too, but may require some special logic. Some of the post-flight tasks we do are specific to the current build, e.g. removing build and/or objdirs. Moving those tasks to pre-flight means you wouldn't have a current build as reference. Maybe post-flight tasks get merged in a later round of improvements here. I'm hoping better build system dependencies make this a non-issue in the near future.
Comment 17•11 years ago
|
||
** one pony, with rainbow hair
Let's get the basic framework in place before we start hammering every nail!
Comment 18•11 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #16)
> * measuring improvement: we track current setup and teardown times per OS in
> brasstacks (http://brasstacks.mozilla.com/gofaster/#/ but the charts seem to
> be busted right now, I'll investigate).
Filed bug 932877.
Comment 20•11 years ago
|
||
I am going to write a preflight_runner.py script, which will be invoked just before running runslave.py in the following files, which reside in http://hg.mozilla.org/build/puppet/file/default/modules/buildslave/files:
darwin-run-buildslave.sh
run-puppet-and-buildbot.sh
start-buildbot-win64.bat
startTalos-w7.bat
startTalos-w764.bat
startTalos-w864.bat
startTalos-xp.bat
Using these files as hook points for preflight calls will make it possible to roll out preflight scripts per OS, as proposed by coop.
preflight_runner.py will run a directory containing scripts, determining the order of execution based on some simple naming conventions.
I will start implementing support for python scripts, but it should not be difficult to extend to direct support of native shell scripts (.sh scripts for unix, .bat/.cmd scripts for Windows).
Armen, can you please confirm that this plan is compatible with our last conversation?
Flags: needinfo?(armenzg)
Comment 21•11 years ago
|
||
First of all, could you please go to each Windows host and grab a recent version of start*.bat?
What is committed puppet could be easily be out-of-date since we have refactor the Windows infrastructure with GPO and have changed a bunch of assumptions.
This matches what we spoke.
I have no problem making the script more OS-centric, however, I would *suggest* not mixing them in the same directory but split by OS (e.g. win_preflight_tasks/, unix_preflight_tasks/ and osx_preflight_tasks/).
Flags: needinfo?(armenzg)
Comment 22•11 years ago
|
||
Hi Armen,
Thanks for your quick feedback.
I will have a look to the actual .bat files on windows hosts - even if there are differences, I believe the principle of using these files as hook points before runslave.py is called remains valid.
Regarding the preflight tasks, I completely agree with your recommendation; they will necessarily be written and stored separately for each OS, at least for two reasons:
1) because we may want to add different behavior per OS
2) because we may want to support native shell scripts
Comment 23•11 years ago
|
||
The scripts will, presumably, be installed by puppet, so I would recommend having the per-OS (or per-silo, really) logic handled in puppet, not by the runner script.
Reporter | ||
Comment 24•11 years ago
|
||
I'd like to shift focus slightly for this. Instead of running pre-flight tasks on boot, we should run them before running buildbot. A minor distinction right now, but I'd like to get to the point where we're running some kind of loop like:
while True:
- run pre-flight tasks
- run buildbot for one job
- run post-flight tasks
then we can stop rebooting machines after every job.
Reporter | ||
Comment 25•10 years ago
|
||
We're going to do this with runner:
https://github.com/catlee/runner
Assignee: sbruno → catlee
Comment 27•10 years ago
|
||
Puppet module for installing and configuring Runner. Runner itself is already packaged as an sdist and uploaded. Unsure if :catlee wants a review on runner itself or not.
Attachment #8436062 -
Flags: review?(dustin)
Attachment #8436062 -
Flags: feedback?(catlee)
Comment 28•10 years ago
|
||
Comment on attachment 8436062 [details] [diff] [review]
runner.diff
Review of attachment 8436062 [details] [diff] [review]:
-----------------------------------------------------------------
It looks like this has a bunch of other stuff mixed in. Can you re-post a diff containing only your changes?
Attachment #8436062 -
Flags: review?(dustin)
Comment 29•10 years ago
|
||
A diff not created on an brain-addled Friday evening...
Attachment #8436062 -
Attachment is obsolete: true
Attachment #8436062 -
Flags: feedback?(catlee)
Attachment #8437030 -
Flags: review?(dustin)
Attachment #8437030 -
Flags: feedback?(catlee)
Comment 30•10 years ago
|
||
Comment on attachment 8437030 [details] [diff] [review]
Puppet module for Catlee's Runner
Review of attachment 8437030 [details] [diff] [review]:
-----------------------------------------------------------------
I think we talked a bit in irc about this, but just to follow up, it seems like dependencies are going to be the gotcha here, especially if this is going to be useful for other organizations (e.g., QA probably has some pre-flight tasks, but they're probably not very similar to releng's).
Using sorted, numbered scripts works OK if different numbers have meaning (e.g., "all repositories are up to date by level 10"), but they don't represent dependencies very well. For example, if I rename 15-frobnicate.sh to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it?
Since the runner itself is Python, it'd actually be pretty easy to specify dependencies explicitly, either in the scripts themselves (perhaps in comments) or in some nearby file (e.g., barterize.deps), and then flatten that dependency tree into a fixed order in the Python script. That would also leave the door open to parallelizing the preflight tasks at a later date. And if those dependencies are reflected in the Puppet configuration, then Puppet can ensure that every task's dependency tasks are also installed.
Systemd, launchd, and Gentoo all solve this a little bit differently, and they all have complicated syntax to say 'starts before this service *or* that service' or to refer to virtual services or whatever. I don't see runner needing such levels of complexity.
This, combined with the config handling I suggest below, would mean that toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have
include runner::task::update_shared_repos
include runner::task::clean _tempfiles
and all of the reqiured configuration, dependent tasks, and runner itself would be installed automatically.
::: modules/runner/files/checkout_tools
@@ +6,5 @@
> + exit $?
> +fi
> +HGTOOL=/usr/local/bin/hgtool.py
> +
> +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo)
I don't see $RUNNER_CONFIG_CMD defined anywhere..
::: modules/runner/manifests/service.pp
@@ +6,5 @@
> + include runner::settings
> + case $::operatingsystem {
> + 'CentOS': {
> + # Which service this relies on
> + $initd_required_start = 'network'
What's this for? Would this change?
::: modules/toplevel/manifests/slave/build.pp
@@ -33,5 @@
> ccache::ccache_dir {
> - "/builds/ccache":
> - maxsize => "10G",
> - owner => $users::builder::username,
> - group => $users::builder::group;
Looks like you have some kind of automatic puppetlint thing going on? It's OK for files you've modified, but not every file..
::: modules/toplevel/manifests/slave/build/mock.pp
@@ -17,5 @@
> # good way to communicate the need to that class.
> exec {
> 'add-builder-to-mock_mozilla':
> - command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla",
> - unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'",
The addition of {..} here is legit, but we don't follow the style of lining up the => in a single column.
@@ +32,5 @@
> + data => template('toplevel/slave/build/hg.cfg.erb');
> + 'buildbot.cfg':
> + sectionname => 'buildbot',
> + data => template('toplevel/slave/build/buildbot.cfg.erb');
> + }
I think there are bits of the config stuff missing in this patch, so maybe I'm missing something, but -- why have all of these config files? And it looks like each just has a single section? You could use the 'concat' puppet module to concatenate those into a single file and make things easier for users to read in place.
From what I can tell, the config's are orthogonal to the tasks - -meaning that a task may use several config's, and several tasks may use a config. What if each task included the configs it needed?
So runner::tasks::checkout_tools would have
include runner::config::hg
and runner::tasks::update_shared_repos would have
include runner::config::hg
include runner::config::env
Attachment #8437030 -
Flags: review?(dustin) → feedback+
Comment 31•10 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #30)
> Comment on attachment 8437030 [details] [diff] [review]
> Puppet module for Catlee's Runner
>
> Review of attachment 8437030 [details] [diff] [review]:
> -----------------------------------------------------------------
Thanks a bunch for taking the time to review this so thoroughly, Dustin!
> I think we talked a bit in irc about this, but just to follow up, it seems
> like dependencies are going to be the gotcha here, especially if this is
> going to be useful for other organizations (e.g., QA probably has some
> pre-flight tasks, but they're probably not very similar to releng's).
>
> Using sorted, numbered scripts works OK if different numbers have meaning
> (e.g., "all repositories are up to date by level 10"), but they don't
> represent dependencies very well. For example, if I rename 15-frobnicate.sh
> to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it?
>
> Since the runner itself is Python, it'd actually be pretty easy to specify
> dependencies explicitly, either in the scripts themselves (perhaps in
> comments) or in some nearby file (e.g., barterize.deps), and then flatten
> that dependency tree into a fixed order in the Python script. That would
> also leave the door open to parallelizing the preflight tasks at a later
> date. And if those dependencies are reflected in the Puppet configuration,
> then Puppet can ensure that every task's dependency tasks are also installed.
>
> Systemd, launchd, and Gentoo all solve this a little bit differently, and
> they all have complicated syntax to say 'starts before this service *or*
> that service' or to refer to virtual services or whatever. I don't see
> runner needing such levels of complexity.
>
> This, combined with the config handling I suggest below, would mean that
> toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have
>
> include runner::task::update_shared_repos
> include runner::task::clean _tempfiles
>
> and all of the reqiured configuration, dependent tasks, and runner itself
> would be installed automatically.
I need to grab :catlee to discuss this today. Will get back to you with my thoughts after that.
>
> ::: modules/runner/files/checkout_tools
> @@ +6,5 @@
> > + exit $?
> > +fi
> > +HGTOOL=/usr/local/bin/hgtool.py
> > +
> > +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo)
>
> I don't see $RUNNER_CONFIG_CMD defined anywhere..
Runner sets this itself (and guarantees it being set).
Definitely worthy of a comment though. Will fix.
>
> ::: modules/runner/manifests/service.pp
> @@ +6,5 @@
> > + include runner::settings
> > + case $::operatingsystem {
> > + 'CentOS': {
> > + # Which service this relies on
> > + $initd_required_start = 'network'
>
> What's this for? Would this change?
Good question. Will find out if still needed
> ::: modules/toplevel/manifests/slave/build.pp
> @@ -33,5 @@
> > ccache::ccache_dir {
> > - "/builds/ccache":
> > - maxsize => "10G",
> > - owner => $users::builder::username,
> > - group => $users::builder::group;
>
> Looks like you have some kind of automatic puppetlint thing going on? It's
> OK for files you've modified, but not every file..
Oh, damn. Apologies.
>
> ::: modules/toplevel/manifests/slave/build/mock.pp
> @@ -17,5 @@
> > # good way to communicate the need to that class.
> > exec {
> > 'add-builder-to-mock_mozilla':
> > - command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla",
> > - unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'",
>
> The addition of {..} here is legit, but we don't follow the style of lining
> up the => in a single column.
Will know for the future - thanks!
>
> @@ +32,5 @@
> > + data => template('toplevel/slave/build/hg.cfg.erb');
> > + 'buildbot.cfg':
> > + sectionname => 'buildbot',
> > + data => template('toplevel/slave/build/buildbot.cfg.erb');
> > + }
>
> I think there are bits of the config stuff missing in this patch, so maybe
> I'm missing something, but -- why have all of these config files? And it
> looks like each just has a single section? You could use the 'concat'
> puppet module to concatenate those into a single file and make things easier
> for users to read in place.
>
> From what I can tell, the config's are orthogonal to the tasks - -meaning
> that a task may use several config's, and several tasks may use a config.
> What if each task included the configs it needed?
>
> So runner::tasks::checkout_tools would have
> include runner::config::hg
> and runner::tasks::update_shared_repos would have
> include runner::config::hg
> include runner::config::env
There doesn't seem to be any config missing. Is there something in particularly that made you think that that I may have forgotten?
And that's a good point on being able to `include` the configs. I'll look into that today.
Thanks again.
Comment 32•10 years ago
|
||
RUNNER_CONFIG_CMD made me think something was missing, but I see the error of my ways now.
Reporter | ||
Updated•10 years ago
|
Assignee: catlee → iconnolly
Reporter | ||
Comment 33•10 years ago
|
||
Comment on attachment 8437030 [details] [diff] [review]
Puppet module for Catlee's Runner
Review of attachment 8437030 [details] [diff] [review]:
-----------------------------------------------------------------
We also spoke on IRC and vidyo about this.
I like the idea of moving the configs into their own modules, related to which tasks are defined.
Let's leave inter-task dependencies for the next phase.
Attachment #8437030 -
Flags: feedback?(catlee) → feedback+
Comment 34•10 years ago
|
||
Updated runner module post-feedback with Dustin.
Just need a quick look at the config changes to make sure what I envisaged was aligned with what :dustin was thinking.
Left inter-task dependencies to iteration n+1.
Attachment #8437030 -
Attachment is obsolete: true
Attachment #8439356 -
Flags: review?(dustin)
Updated•10 years ago
|
Comment 36•10 years ago
|
||
Comment on attachment 8439356 [details] [diff] [review]
post-feedback diff
Review of attachment 8439356 [details] [diff] [review]:
-----------------------------------------------------------------
All pretty minor comments here -- nice work!
::: modules/runner/files/checkout_tools
@@ +1,1 @@
> +#!/bin/bash
Minor, but these files should have license headers on them
@@ +7,5 @@
> +fi
> +HGTOOL=/usr/local/bin/hgtool.py
> +
> +# $RUNNER_CONFIG_CMD is guaranteed to be set by Runner itself. See Runner
> +# documentation for more information.
Thanks :)
::: modules/runner/manifests/config.pp
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +# config file for runner
> +define runner::config($sectionname, $data) {
This appears unused now - remove?
::: modules/runner/manifests/service.pp
@@ +6,5 @@
> + include runner::settings
> + case $::operatingsystem {
> + 'CentOS': {
> + # Which service this relies on
> + $initd_required_start = 'network'
This is still hanging around..
::: modules/runner/manifests/tasks/checkout_tools.pp
@@ +3,5 @@
> + file {
> + '/tools/checkouts':
> + ensure => directory,
> + owner => cltbld,
> + group => cltbld;
replace cltbld with $::config::builder_username
@@ +4,5 @@
> + '/tools/checkouts':
> + ensure => directory,
> + owner => cltbld,
> + group => cltbld;
> + }
one space too many
::: modules/toplevel/manifests/slave/build/mock.pp
@@ +22,4 @@
> require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']];
> }
> +
> + include runner
If you make each of the runner::tasks::* include runner, then you don't even need this here. That might be a bit easier.
Attachment #8439356 -
Flags: review?(dustin) → review+
Comment 37•10 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #36)
> Comment on attachment 8439356 [details] [diff] [review]
> post-feedback diff
>
> Review of attachment 8439356 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> All pretty minor comments here -- nice work!
Thanks!
>
> ::: modules/runner/files/checkout_tools
> @@ +1,1 @@
> > +#!/bin/bash
>
> Minor, but these files should have license headers on them
>
Fixed!
>
> ::: modules/runner/manifests/config.pp
> @@ +1,5 @@
> > +# This Source Code Form is subject to the terms of the Mozilla Public
> > +# License, v. 2.0. If a copy of the MPL was not distributed with this
> > +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> > +# config file for runner
> > +define runner::config($sectionname, $data) {
>
> This appears unused now - remove?
Done.
>
> ::: modules/runner/manifests/service.pp
> @@ +6,5 @@
> > + include runner::settings
> > + case $::operatingsystem {
> > + 'CentOS': {
> > + # Which service this relies on
> > + $initd_required_start = 'network'
>
> This is still hanging around..
Oops! My bad, I had already removed it but I must have generated the diff beforehand. Fixed.
>
> ::: modules/runner/manifests/tasks/checkout_tools.pp
> @@ +3,5 @@
> > + file {
> > + '/tools/checkouts':
> > + ensure => directory,
> > + owner => cltbld,
> > + group => cltbld;
>
> replace cltbld with $::config::builder_username
Fixed.
>
> @@ +4,5 @@
> > + '/tools/checkouts':
> > + ensure => directory,
> > + owner => cltbld,
> > + group => cltbld;
> > + }
>
> one space too many
Annoyed the linter didn't catch that. Thanks.
>
> ::: modules/toplevel/manifests/slave/build/mock.pp
> @@ +22,4 @@
> > require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']];
> > }
> > +
> > + include runner
>
> If you make each of the runner::tasks::* include runner, then you don't even
> need this here. That might be a bit easier.
Good idea. Done.
Comment 38•10 years ago
|
||
Could I get a check-in? :)
Carrying the r+ from :dustin over from Attachment 8439356 [details] [diff].
Attachment #8439356 -
Attachment is obsolete: true
Attachment #8441600 -
Flags: review+
Attachment #8441600 -
Flags: checked-in?
Comment 39•10 years ago
|
||
Comment on attachment 8441600 [details] [diff] [review]
runner.diff -- Changes made as per Dustin's 2nd review
remote: https://hg.mozilla.org/build/puppet/rev/58a3c0e9ae64
remote: https://hg.mozilla.org/build/puppet/rev/6fa2dc80d46a
Attachment #8441600 -
Flags: checked-in? → checked-in+
Reporter | ||
Comment 40•10 years ago
|
||
Comment on attachment 8441600 [details] [diff] [review]
runner.diff -- Changes made as per Dustin's 2nd review
we didn't end up with the proper boot order for some reason
Attachment #8441600 -
Flags: checked-in+ → checked-in-
Comment 45•10 years ago
|
||
This should fix the boot order problems we were having previously.
The problems were:
a) We had rc3.d/ symlinks for puppet and buildbot already on ours existing machines. We were changing their boot priority in their init.d/ scripts, but the symlinks weren't being updated to reflect that. Puppet doesn't do this as part of its Service type unfortunately, but I've run an external command to reset these symlinks. (This was all working fine on *new* machines because the symlinks had the correct priority value if they were newly created)
b) Another issue was that we were moving new services to after puppet, that weren't currently in the boot order for after puppet (ie. we were changing puppet's boot priority while it was running from its initscript). This has been fixed by getting puppet for force a reboot whenever its initscript is changed.
These were the known issues that arose from the last deployment. Upon a reboot of a server configured with this patch we correctly see puppet -> runner -> buildbot so hopefully that means we're good.
Attachment #8441600 -
Attachment is obsolete: true
Attachment #8443701 -
Flags: review?(dustin)
Attachment #8443701 -
Flags: checked-in?
Comment 46•10 years ago
|
||
Comment on attachment 8443701 [details] [diff] [review]
Runner - with boot order fixes
Review of attachment 8443701 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM. Just a couple of trailing space nazi comments below. :)
I can fix them when I land the patch.
::: manifests/moco-config.pp
@@ +225,4 @@
> $selfserve_agent_carrot_exchange = "buildapi.control"
> $selfserve_agent_carrot_queue = "buildapi-agent-rabbit2"
>
> +
Can you remove the added empty line above?
::: modules/puppet/manifests/atboot.pp
@@ +27,5 @@
> exec {
> # ask the puppet startup script to reboot
> "reboot-after-puppet":
> + command => "touch /REBOOT_AFTER_PUPPET",
> + path => ['/bin/', '/usr/bin/'],
From IRC, "touch" lives in /bin on CentOS. Good catch!
@@ +61,5 @@
> + # rc3.d symlink and we want to reboot because we've changed
> + # what we want to come after puppet in the boot order
> + notify => [ Exec['initd-refresh'], Exec['reboot-after-puppet'] ];
> + }
> +
.. and here
@@ +65,5 @@
> +
> + exec {
> + 'initd-refresh':
> + # resetpriorities tells chkconfig to update the
> + # symlinks in rcX.d with the values from the service's
and on these 2 lines
@@ +68,5 @@
> + # resetpriorities tells chkconfig to update the
> + # symlinks in rcX.d with the values from the service's
> + # init.d script
> + command => '/sbin/chkconfig puppet resetpriorities',
> + refreshonly => true;
And here
(trailing spaces nazi mode off)
::: modules/puppet/templates/puppet-centos-initd.erb
@@ +8,5 @@
> +### BEGIN INIT INFO
> +# Provides: puppet
> +# Required-Start: $local_fs $network
> +# Required-Start: $local_fs $network
> +# Should-Start: $remote_fs
Ooop, and here
Attachment #8443701 -
Flags: review?(dustin) → review+
Comment 47•10 years ago
|
||
Whitespace fixes from Rail's review.
Carrying over the r+.
Attachment #8443701 -
Attachment is obsolete: true
Attachment #8443701 -
Flags: checked-in?
Attachment #8445296 -
Flags: review+
Updated•10 years ago
|
Attachment #8445296 -
Flags: checked-in?
Reporter | ||
Updated•10 years ago
|
Attachment #8445296 -
Flags: checked-in? → checked-in+
Comment 48•10 years ago
|
||
Attachment #8445296 -
Attachment is obsolete: true
Comment 49•10 years ago
|
||
We're sticking with this - will pull fix out into new bug.
Reporter | ||
Updated•10 years ago
|
Attachment #8445296 -
Attachment is obsolete: false
Updated•10 years ago
|
Summary: slave pre-flight tasks → slave pre-flight tasks (runner)
Updated•10 years ago
|
Whiteboard: [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal]
Updated•10 years ago
|
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal]
Assignee | ||
Updated•10 years ago
|
Assignee: ian → winter2718
Assignee | ||
Comment 50•10 years ago
|
||
It seems like this is essentially finished. I'm starting on getting rid of post-build reboots; will open bugs for any additional runner tasks that need creating catch-as-catch-can.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•