Closed Bug 713846 Opened 13 years ago Closed 12 years ago

unified log handling

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

References

Details

(Whiteboard: [logs])

Attachments

(4 files, 1 obsolete file)

rework logging so that: - all jobs have their logs uploaded somewhere - url to log is recorded in log_url property - pulse message is generated
Depends on: 718234
Attached patch unified log handling - buildbotcustom changes (obsolete) (deleted) — Splinter Review
The bulk of the work is here. We rip out all the single-purpose log handlers and rely on one global one instantiated per master (in another patch). The log handler will call postrun.py for every build that finishes. postrun.py will then: - upload the log file to the "right place" - possibly mail the author of the change with a link to the log (for try) - update statusdb, including the log url and schedulerdb request ids as properties - send a message to pulse Each one of these steps is a separate command in the command queue. postrun.py calls itself with additional arguments to proceed to the next step. philor, please have a look at the changes to log_uploader.py and see if the formatting changes there make sense (e.g. start/stop times per step). If you think there's any extra information that would be useful to add, now's the time to put it in!
Attachment #590345 - Flags: feedback?(philringnalda)
Attachment #590345 - Flags: feedback?(bhearsum)
Comment on attachment 590345 [details] [diff] [review] unified log handling - buildbotcustom changes I'm flabbergasted that I don't want moar moar moar, but the only things I remember wanting are projects uploading, log_url, timestamps for steps, and not colliding when a retrigger on a different master happens to be the same build number, and I see all of those in there. Yay!
Attachment #590345 - Flags: feedback?(philringnalda) → feedback+
Comment on attachment 590345 [details] [diff] [review] unified log handling - buildbotcustom changes Review of attachment 590345 [details] [diff] [review]: ----------------------------------------------------------------- ::: bin/log_uploader.py @@ +36,5 @@ > if identity: > cmd.extend(['-i', identity]) > cmd.extend(['-p', str(port), host, remote_cmd]) > > + return retry(do_cmd, attempts=retries+1, sleeptime=retry_sleep, args=(cmd,)) Why +1 on the number of retries? @@ +142,5 @@ > shortText = ' '.join(step.getText()) + ' (results: %s, elapsed: %s)' % (results, elapsed) > + if times and times[0]: > + logFile.write("========= Started %s (at %s) =========\n" % (shortText, datetime.fromtimestamp(times[0]))) > + else: > + logFile.write("========= Started %s =========\n" % shortText) There's a block above that claims the step is "not started" if times or times[0] evaluates to false. This else block seems to be hit only in that state too, so isn't it wrong to print out "Started" here? @@ +219,5 @@ > if options.l10n: > + try: > + suffix = '-%s' % build.getProperty('locale') > + except KeyError: > + suffix = '-unknown' Could this fallback get us into a case where we have duplicate log names? (Eg, two builds on different masters that share a build_number and upload to the same place). ::: bin/postrun.py @@ +44,5 @@ > + if "nightly" in builder.name: > + upload_args.append("--nightly") > + if builder.name.startswith("release-"): > + upload_args.append("--release") > + upload_args.append("%s/%s" % (info.get['version'], info.get['build_number'])) Shouldn't these be .get() instead of .get[]? @@ +59,5 @@ > + upload_args.extend(["--product", product]) > + if platform: > + upload_args.extend(["--platform", platform]) > + elif not builder.name.startswith("release-"): > + upload_args.extend(["--platform", 'noarch']) Don't want to use noarch for release jobs? @@ +75,5 @@ > + > + proc = subprocess.Popen(cmd, > + stdout=subprocess.PIPE, > + stderr=subprocess.STDOUT, > + stdin=devnull) All of this command execution makes me wonder if you should be using utils.commands.run_cmd/get_output - buildbotcustom already has a dependency on build/tools anyways! @@ +79,5 @@ > + stdin=devnull) > + > + retcode = proc.wait() > + output = proc.stdout.read().strip() > + print output Do these prints get captured anywhere? If so, they should probably use the logging module. If not, just get rid of them. @@ +114,5 @@ > + ssh_info = ssh_info.get(branch, ssh_info["*"]) > + > + retval = ['--user', ssh_info['user']] > + if 'sshkey' in ssh_info: > + retval.extend(["-i", ssh_info['sshkey']]) Have you tested this on all platforms? IIRC "-i" doesn't always resolve ~/ but "-oIdentityFile" does. @@ +242,5 @@ > + > + def processBuild(self, options, build_path, request_ids): > + build = self.getBuild(build_path) > + info = self.getBuildInfo(build) > + if not options.log_url: Let me make sure I understand the flow of this correctly. When a build finishes, Buildbot calls the QueuedCommandHandler that runs postrun.py. postrun.py runs processBuild, and there will be no log_url set, so it runs this block. Inside this block, it adds another postrun.py command to the queue, this time with log_url set. It also sends out mail to people, if necessary. When that runs it skips this block and runs the next one, which adds it to StatusDB, and adds _another_ run to the queue, this time with log_url and statusdb_id set. When that runs, it skips down to the last block and does the Pulse publishing. ::: process/release.py @@ -22,1 @@ > reload(release.paths) Hopefully this file hasn't bitrotted too much :S.
Comment on attachment 590345 [details] [diff] [review] unified log handling - buildbotcustom changes Review of attachment 590345 [details] [diff] [review]: ----------------------------------------------------------------- ::: bin/log_uploader.py @@ +36,5 @@ > if identity: > cmd.extend(['-i', identity]) > cmd.extend(['-p', str(port), host, remote_cmd]) > > + return retry(do_cmd, attempts=retries+1, sleeptime=retry_sleep, args=(cmd,)) setting -r 1 should mean try once, and if it fails, try once more. Without the +1 it would only try once. @@ +142,5 @@ > shortText = ' '.join(step.getText()) + ' (results: %s, elapsed: %s)' % (results, elapsed) > + if times and times[0]: > + logFile.write("========= Started %s (at %s) =========\n" % (shortText, datetime.fromtimestamp(times[0]))) > + else: > + logFile.write("========= Started %s =========\n" % shortText) Yeah, I think the current logs output something like ========= Started (results: not started, elapsed: not started) ========== ======== Finished (results: not started, elapsed: not started) ======== I'll clean that up. @@ +219,5 @@ > if options.l10n: > + try: > + suffix = '-%s' % build.getProperty('locale') > + except KeyError: > + suffix = '-unknown' No, that's handled by this block in formatLog(): if master_name: build_name = "%s%s-%s-build%s.txt.gz" % (builder_name, builder_suffix, master_name, build_number) else: build_name = "%s%s-build%s.txt.gz" % (builder_name, builder_suffix, build_number) So we'll always be including the builder name, build number, and master name, which *should* make these files unique. ::: bin/postrun.py @@ +44,5 @@ > + if "nightly" in builder.name: > + upload_args.append("--nightly") > + if builder.name.startswith("release-"): > + upload_args.append("--release") > + upload_args.append("%s/%s" % (info.get['version'], info.get['build_number'])) yup! @@ +59,5 @@ > + upload_args.extend(["--product", product]) > + if platform: > + upload_args.extend(["--platform", platform]) > + elif not builder.name.startswith("release-"): > + upload_args.extend(["--platform", 'noarch']) I think this is because postupload.py requires --platform for other jobs, but not for release jobs. @@ +75,5 @@ > + > + proc = subprocess.Popen(cmd, > + stdout=subprocess.PIPE, > + stderr=subprocess.STDOUT, > + stdin=devnull) sure @@ +79,5 @@ > + stdin=devnull) > + > + retcode = proc.wait() > + output = proc.stdout.read().strip() > + print output yes, they'll show up in /dev/shm/queue/commands/logs/XXXXXXX.log. I'll switch to use logging. @@ +114,5 @@ > + ssh_info = ssh_info.get(branch, ssh_info["*"]) > + > + retval = ['--user', ssh_info['user']] > + if 'sshkey' in ssh_info: > + retval.extend(["-i", ssh_info['sshkey']]) This will only be run on masters, which we restrict to linux, and osx for personal setups. Does -i ~/ work on osx? @@ +242,5 @@ > + > + def processBuild(self, options, build_path, request_ids): > + build = self.getBuild(build_path) > + info = self.getBuildInfo(build) > + if not options.log_url: yeah, that sounds about right. I considered creating separate scripts for each one of the stages, or having a config that specified which stages get run when. It seemed to add a lot of extra boilerplate, but maybe makes the execution order more clear?
Comment on attachment 590345 [details] [diff] [review] unified log handling - buildbotcustom changes (In reply to Chris AtLee [:catlee] from comment #4) > This will only be run on masters, which we restrict to linux, and osx for > personal setups. Does -i ~/ work on osx? Yeah, looks like it works fine on Linux+Mac. > @@ +242,5 @@ > > + > > + def processBuild(self, options, build_path, request_ids): > > + build = self.getBuild(build_path) > > + info = self.getBuildInfo(build) > > + if not options.log_url: > > yeah, that sounds about right. > > I considered creating separate scripts for each one of the stages, or having > a config that specified which stages get run when. It seemed to add a lot of > extra boilerplate, but maybe makes the execution order more clear? I think it's fine to keep it in here and avoid the boiler plate. Could you add a comment to the file as a whole, or to processBuild that explains the typical flow though?
Attachment #590345 - Flags: feedback?(bhearsum) → feedback+
Component: Release Engineering → Release Engineering: Automation
Priority: -- → P2
QA Contact: release → catlee
This creates the postrun.cfg file that postrun.py expects in the master directory. It also removes support for the (now deprecated) update_from_files cron job. It also moves management of the build master cron jobs into puppet. Deploying this could be tricky, since you need to have puppet run before doing a reconfig so that postrun.cfg exists. However, after puppet runs, the cron job won't be run any more. Perhaps a manual run of update_from_files to clean things up would work. Or graceful shutdown each master before deploying?
Attachment #603266 - Flags: review?(bhearsum)
Comment on attachment 603266 [details] [diff] [review] puppet manifests for new log uploading Review of attachment 603266 [details] [diff] [review]: ----------------------------------------------------------------- ::: modules/buildmaster/manifests/buildbot_master.pp @@ -98,5 @@ > - "cron-$basedir": > - require => [ > - Exec["setup-$basedir"] > - ], > - command => "/usr/bin/make -f Makefile.setup cron BASEDIR=$full_master_dir MASTER_NAME=$master_name", If we're not going to use the cron target anymore, let's rip it out rather than let it get out of date.
Attachment #603266 - Flags: review?(bhearsum) → review+
Attachment #603466 - Flags: feedback?(bhearsum)
Attachment #590345 - Attachment is obsolete: true
Attachment #603467 - Flags: review?(bhearsum)
Attachment #603467 - Flags: review?(bhearsum) → review+
Attachment #603466 - Flags: feedback?(bhearsum) → feedback+
Comment on attachment 603466 [details] [diff] [review] buildbot-configs for log uploading does a f+ without comment imply r+? let's find out!
Attachment #603466 - Flags: review?(bhearsum)
Attachment #603466 - Flags: review?(bhearsum) → review+
Attachment #603466 - Flags: checked-in+
Attachment #603467 - Flags: checked-in+
Attachment #603266 - Flags: checked-in+
deployed as part of this morning's reconfig
Depends on: 737543
Depends on: 737579
Depends on: 696516
Attachment #607982 - Flags: review?(bhearsum)
Comment on attachment 607982 [details] [diff] [review] treat 'fennec' logs as 'mobile' for log uploads Review of attachment 607982 [details] [diff] [review]: ----------------------------------------------------------------- Sucks =\
Attachment #607982 - Flags: review?(bhearsum) → review+
Attachment #607982 - Flags: checked-in+
Live in production as of 7AM PDT.
Depends on: 738697
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 846752
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: