Closed
Bug 713846
Opened 13 years ago
Closed 12 years ago
unified log handling
Categories
(Release Engineering :: General, defect, P2)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: catlee)
References
Details
(Whiteboard: [logs])
Attachments
(4 files, 1 obsolete file)
(deleted),
patch
|
bhearsum
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
bhearsum
:
review+
bhearsum
:
feedback+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
bhearsum
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
bhearsum
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
rework logging so that:
- all jobs have their logs uploaded somewhere
- url to log is recorded in log_url property
- pulse message is generated
Assignee | ||
Comment 1•13 years ago
|
||
The bulk of the work is here. We rip out all the single-purpose log handlers and rely on one global one instantiated per master (in another patch).
The log handler will call postrun.py for every build that finishes.
postrun.py will then:
- upload the log file to the "right place"
- possibly mail the author of the change with a link to the log (for try)
- update statusdb, including the log url and schedulerdb request ids as properties
- send a message to pulse
Each one of these steps is a separate command in the command queue. postrun.py calls itself with additional arguments to proceed to the next step.
philor, please have a look at the changes to log_uploader.py and see if the formatting changes there make sense (e.g. start/stop times per step). If you think there's any extra information that would be useful to add, now's the time to put it in!
Attachment #590345 -
Flags: feedback?(philringnalda)
Attachment #590345 -
Flags: feedback?(bhearsum)
Comment 2•13 years ago
|
||
Comment on attachment 590345 [details] [diff] [review]
unified log handling - buildbotcustom changes
I'm flabbergasted that I don't want moar moar moar, but the only things I remember wanting are projects uploading, log_url, timestamps for steps, and not colliding when a retrigger on a different master happens to be the same build number, and I see all of those in there. Yay!
Attachment #590345 -
Flags: feedback?(philringnalda) → feedback+
Comment 3•13 years ago
|
||
Comment on attachment 590345 [details] [diff] [review]
unified log handling - buildbotcustom changes
Review of attachment 590345 [details] [diff] [review]:
-----------------------------------------------------------------
::: bin/log_uploader.py
@@ +36,5 @@
> if identity:
> cmd.extend(['-i', identity])
> cmd.extend(['-p', str(port), host, remote_cmd])
>
> + return retry(do_cmd, attempts=retries+1, sleeptime=retry_sleep, args=(cmd,))
Why +1 on the number of retries?
@@ +142,5 @@
> shortText = ' '.join(step.getText()) + ' (results: %s, elapsed: %s)' % (results, elapsed)
> + if times and times[0]:
> + logFile.write("========= Started %s (at %s) =========\n" % (shortText, datetime.fromtimestamp(times[0])))
> + else:
> + logFile.write("========= Started %s =========\n" % shortText)
There's a block above that claims the step is "not started" if times or times[0] evaluates to false. This else block seems to be hit only in that state too, so isn't it wrong to print out "Started" here?
@@ +219,5 @@
> if options.l10n:
> + try:
> + suffix = '-%s' % build.getProperty('locale')
> + except KeyError:
> + suffix = '-unknown'
Could this fallback get us into a case where we have duplicate log names? (Eg, two builds on different masters that share a build_number and upload to the same place).
::: bin/postrun.py
@@ +44,5 @@
> + if "nightly" in builder.name:
> + upload_args.append("--nightly")
> + if builder.name.startswith("release-"):
> + upload_args.append("--release")
> + upload_args.append("%s/%s" % (info.get['version'], info.get['build_number']))
Shouldn't these be .get() instead of .get[]?
@@ +59,5 @@
> + upload_args.extend(["--product", product])
> + if platform:
> + upload_args.extend(["--platform", platform])
> + elif not builder.name.startswith("release-"):
> + upload_args.extend(["--platform", 'noarch'])
Don't want to use noarch for release jobs?
@@ +75,5 @@
> +
> + proc = subprocess.Popen(cmd,
> + stdout=subprocess.PIPE,
> + stderr=subprocess.STDOUT,
> + stdin=devnull)
All of this command execution makes me wonder if you should be using utils.commands.run_cmd/get_output - buildbotcustom already has a dependency on build/tools anyways!
@@ +79,5 @@
> + stdin=devnull)
> +
> + retcode = proc.wait()
> + output = proc.stdout.read().strip()
> + print output
Do these prints get captured anywhere? If so, they should probably use the logging module. If not, just get rid of them.
@@ +114,5 @@
> + ssh_info = ssh_info.get(branch, ssh_info["*"])
> +
> + retval = ['--user', ssh_info['user']]
> + if 'sshkey' in ssh_info:
> + retval.extend(["-i", ssh_info['sshkey']])
Have you tested this on all platforms? IIRC "-i" doesn't always resolve ~/ but "-oIdentityFile" does.
@@ +242,5 @@
> +
> + def processBuild(self, options, build_path, request_ids):
> + build = self.getBuild(build_path)
> + info = self.getBuildInfo(build)
> + if not options.log_url:
Let me make sure I understand the flow of this correctly. When a build finishes, Buildbot calls the QueuedCommandHandler that runs postrun.py. postrun.py runs processBuild, and there will be no log_url set, so it runs this block. Inside this block, it adds another postrun.py command to the queue, this time with log_url set. It also sends out mail to people, if necessary. When that runs it skips this block and runs the next one, which adds it to StatusDB, and adds _another_ run to the queue, this time with log_url and statusdb_id set. When that runs, it skips down to the last block and does the Pulse publishing.
::: process/release.py
@@ -22,1 @@
> reload(release.paths)
Hopefully this file hasn't bitrotted too much :S.
Assignee | ||
Comment 4•13 years ago
|
||
Comment on attachment 590345 [details] [diff] [review]
unified log handling - buildbotcustom changes
Review of attachment 590345 [details] [diff] [review]:
-----------------------------------------------------------------
::: bin/log_uploader.py
@@ +36,5 @@
> if identity:
> cmd.extend(['-i', identity])
> cmd.extend(['-p', str(port), host, remote_cmd])
>
> + return retry(do_cmd, attempts=retries+1, sleeptime=retry_sleep, args=(cmd,))
setting -r 1 should mean try once, and if it fails, try once more. Without the +1 it would only try once.
@@ +142,5 @@
> shortText = ' '.join(step.getText()) + ' (results: %s, elapsed: %s)' % (results, elapsed)
> + if times and times[0]:
> + logFile.write("========= Started %s (at %s) =========\n" % (shortText, datetime.fromtimestamp(times[0])))
> + else:
> + logFile.write("========= Started %s =========\n" % shortText)
Yeah, I think the current logs output something like
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
I'll clean that up.
@@ +219,5 @@
> if options.l10n:
> + try:
> + suffix = '-%s' % build.getProperty('locale')
> + except KeyError:
> + suffix = '-unknown'
No, that's handled by this block in formatLog():
if master_name:
build_name = "%s%s-%s-build%s.txt.gz" % (builder_name, builder_suffix, master_name, build_number)
else:
build_name = "%s%s-build%s.txt.gz" % (builder_name, builder_suffix, build_number)
So we'll always be including the builder name, build number, and master name, which *should* make these files unique.
::: bin/postrun.py
@@ +44,5 @@
> + if "nightly" in builder.name:
> + upload_args.append("--nightly")
> + if builder.name.startswith("release-"):
> + upload_args.append("--release")
> + upload_args.append("%s/%s" % (info.get['version'], info.get['build_number']))
yup!
@@ +59,5 @@
> + upload_args.extend(["--product", product])
> + if platform:
> + upload_args.extend(["--platform", platform])
> + elif not builder.name.startswith("release-"):
> + upload_args.extend(["--platform", 'noarch'])
I think this is because postupload.py requires --platform for other jobs, but not for release jobs.
@@ +75,5 @@
> +
> + proc = subprocess.Popen(cmd,
> + stdout=subprocess.PIPE,
> + stderr=subprocess.STDOUT,
> + stdin=devnull)
sure
@@ +79,5 @@
> + stdin=devnull)
> +
> + retcode = proc.wait()
> + output = proc.stdout.read().strip()
> + print output
yes, they'll show up in /dev/shm/queue/commands/logs/XXXXXXX.log. I'll switch to use logging.
@@ +114,5 @@
> + ssh_info = ssh_info.get(branch, ssh_info["*"])
> +
> + retval = ['--user', ssh_info['user']]
> + if 'sshkey' in ssh_info:
> + retval.extend(["-i", ssh_info['sshkey']])
This will only be run on masters, which we restrict to linux, and osx for personal setups. Does -i ~/ work on osx?
@@ +242,5 @@
> +
> + def processBuild(self, options, build_path, request_ids):
> + build = self.getBuild(build_path)
> + info = self.getBuildInfo(build)
> + if not options.log_url:
yeah, that sounds about right.
I considered creating separate scripts for each one of the stages, or having a config that specified which stages get run when. It seemed to add a lot of extra boilerplate, but maybe makes the execution order more clear?
Comment 5•13 years ago
|
||
Comment on attachment 590345 [details] [diff] [review]
unified log handling - buildbotcustom changes
(In reply to Chris AtLee [:catlee] from comment #4)
> This will only be run on masters, which we restrict to linux, and osx for
> personal setups. Does -i ~/ work on osx?
Yeah, looks like it works fine on Linux+Mac.
> @@ +242,5 @@
> > +
> > + def processBuild(self, options, build_path, request_ids):
> > + build = self.getBuild(build_path)
> > + info = self.getBuildInfo(build)
> > + if not options.log_url:
>
> yeah, that sounds about right.
>
> I considered creating separate scripts for each one of the stages, or having
> a config that specified which stages get run when. It seemed to add a lot of
> extra boilerplate, but maybe makes the execution order more clear?
I think it's fine to keep it in here and avoid the boiler plate. Could you add a comment to the file as a whole, or to processBuild that explains the typical flow though?
Attachment #590345 -
Flags: feedback?(bhearsum) → feedback+
Assignee | ||
Updated•13 years ago
|
Component: Release Engineering → Release Engineering: Automation
Priority: -- → P2
QA Contact: release → catlee
Assignee | ||
Comment 6•13 years ago
|
||
This creates the postrun.cfg file that postrun.py expects in the master directory. It also removes support for the (now deprecated) update_from_files cron job. It also moves management of the build master cron jobs into puppet.
Deploying this could be tricky, since you need to have puppet run before doing a reconfig so that postrun.cfg exists. However, after puppet runs, the cron job won't be run any more.
Perhaps a manual run of update_from_files to clean things up would work. Or graceful shutdown each master before deploying?
Attachment #603266 -
Flags: review?(bhearsum)
Comment 7•13 years ago
|
||
Comment on attachment 603266 [details] [diff] [review]
puppet manifests for new log uploading
Review of attachment 603266 [details] [diff] [review]:
-----------------------------------------------------------------
::: modules/buildmaster/manifests/buildbot_master.pp
@@ -98,5 @@
> - "cron-$basedir":
> - require => [
> - Exec["setup-$basedir"]
> - ],
> - command => "/usr/bin/make -f Makefile.setup cron BASEDIR=$full_master_dir MASTER_NAME=$master_name",
If we're not going to use the cron target anymore, let's rip it out rather than let it get out of date.
Attachment #603266 -
Flags: review?(bhearsum) → review+
Assignee | ||
Comment 8•13 years ago
|
||
Attachment #603466 -
Flags: feedback?(bhearsum)
Assignee | ||
Comment 9•13 years ago
|
||
Attachment #590345 -
Attachment is obsolete: true
Attachment #603467 -
Flags: review?(bhearsum)
Updated•13 years ago
|
Attachment #603467 -
Flags: review?(bhearsum) → review+
Updated•13 years ago
|
Attachment #603466 -
Flags: feedback?(bhearsum) → feedback+
Assignee | ||
Comment 10•13 years ago
|
||
Comment on attachment 603466 [details] [diff] [review]
buildbot-configs for log uploading
does a f+ without comment imply r+? let's find out!
Attachment #603466 -
Flags: review?(bhearsum)
Updated•13 years ago
|
Attachment #603466 -
Flags: review?(bhearsum) → review+
Assignee | ||
Updated•13 years ago
|
Attachment #603466 -
Flags: checked-in+
Assignee | ||
Updated•13 years ago
|
Attachment #603467 -
Flags: checked-in+
Assignee | ||
Updated•13 years ago
|
Attachment #603266 -
Flags: checked-in+
Assignee | ||
Comment 11•13 years ago
|
||
deployed as part of this morning's reconfig
Assignee | ||
Comment 12•13 years ago
|
||
Attachment #607982 -
Flags: review?(bhearsum)
Comment 13•13 years ago
|
||
Comment on attachment 607982 [details] [diff] [review]
treat 'fennec' logs as 'mobile' for log uploads
Review of attachment 607982 [details] [diff] [review]:
-----------------------------------------------------------------
Sucks =\
Attachment #607982 -
Flags: review?(bhearsum) → review+
Assignee | ||
Updated•13 years ago
|
Attachment #607982 -
Flags: checked-in+
Comment 14•13 years ago
|
||
Live in production as of 7AM PDT.
Assignee | ||
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•