Closed Bug 676879 Opened 13 years ago Closed 13 years ago

usebuildbot=1 doesn't show Valgrind builds

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: coop)

References

Details

(Keywords: regression, Whiteboard: [project])

Attachments

(4 files, 3 obsolete files)

This is caused by bug 669137.
Depends on: 669137
It is? The slowest of them took 3 hours according to tinderbox (which lies), most were an hour or an hour and a half.
Okay, then maybe it's not.
Pretty much everything about the Valgrind builds is a bit odd, from the scheduling to the way they run, so I'm ready to believe there's something missing in what's in the json (or that they don't make it into the json at all - I don't think I've ever seen them show as running, so they may not make it into that json either), but I don't know what we require or what's required to wind up in either json file, or how to catch the json when they are either running or freshly finished, since they just run when they feel like it.
There's a patch on bug 678685 which should set the branch properly for Valgrind.
Depends on: 678685
That patch is now landed, and show get deployed Tuesday morning.
Keywords: regression
Summary: tbpl.allizom.org/?usebuildbot=1 doesn't show Valgrind builds → usebuildbot=1 doesn't show Valgrind builds
Was actually deployed today. Please reopen if there are further issues.
You didn't close it, so I can't reopen it, but it still doesn't show, so perhaps it had some other problem besides the branch.
Blocks: 630538
No longer blocks: 669000
If this is still happening, could this be caused by bug#685299 ?
That actually means we don't get any logs for debugging, but we should still be getting them on TBPL (like SpiderMonkey now). I found a couple of issues 1, The builders in the js.gz files don't have branch set to mozilla-central instead of idle, so this is the same as the end of bug 684158. I've fixed that manually to speed it up 2, We have |"revision": null| in the properties for the builds, so tbpl doesn't know what to display the build against. It does have the branch set properly. Hopefully setting the revision property will fix both the running and finished cases.
Might as well have this in RelEng until we've worked all the kinks out on our side. The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3 idle slaves requirement) so buildbot never sets that revision property when creating the job. Since the script does the checkout, build, and valgrind all in one go we don't get a chance to run a SetProperty step until afterwards. The build wouldn't show as running on tbpl, instead appearing as finished. Alternatively, we could make Valgrind another nightly build, just one that runs after all the normal jobs finish. That'd mix up the project/non-project distinction we're breaking right now. What say you RelEng ?
Component: Tinderboxpushlog → Release Engineering
Product: Webtools → mozilla.org
QA Contact: tinderboxpushlog → release
Version: Trunk → other
That might or might not be a step in the direction it really wants to go, dunno. If we could ever get it run on try, and get it greened up, what it actually wants to be is a tier-1, break it and you back out, on-change build.
Despite still not seeing them show up in the display, I noticed while looking for something else in the tree admin panel that valgrind-linux and valgrind-linux64 now show up there, so I hid them (they'll be permared if they ever do show up), so now to know whether or not they're showing up you'll need to use https://tbpl.mozilla.org/?noignore=1&jobname=valgrind (and starting with a &usetinderbox=1 tacked on there, so you know where you should be looking to see if they are there, helps).
(In reply to Nick Thomas [:nthomas] (gone until Sep. 25) from comment #11) > Might as well have this in RelEng until we've worked all the kinks out on > our side. > > The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3 > idle slaves requirement) so buildbot never sets that revision property when > creating the job. Since the script does the checkout, build, and valgrind > all in one go we don't get a chance to run a SetProperty step until > afterwards. The build wouldn't show as running on tbpl, instead appearing as > finished. Alternatively, we could make Valgrind another nightly build, just > one that runs after all the normal jobs finish. That'd mix up the > project/non-project distinction we're breaking right now. What say you > RelEng ? I don't think it's necessarily a bad thing for the Valgrind builds to be run off of a Nightly scheduler, but I don't know that that'll fix the revision problem immediately. From my reading of the Nightly Scheduler, revision will still be None. So, we probably need some way of upleveling the changeset that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+' seems like a generic enough thing for ScriptFactory support, but I'm curious if Catlee has any thoughts on this.
(In reply to Ben Hearsum [:bhearsum] from comment #14) > (In reply to Nick Thomas [:nthomas] (gone until Sep. 25) from comment #11) > > Might as well have this in RelEng until we've worked all the kinks out on > > our side. > > > > The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3 > > idle slaves requirement) so buildbot never sets that revision property when > > creating the job. Since the script does the checkout, build, and valgrind > > all in one go we don't get a chance to run a SetProperty step until > > afterwards. The build wouldn't show as running on tbpl, instead appearing as > > finished. Alternatively, we could make Valgrind another nightly build, just > > one that runs after all the normal jobs finish. That'd mix up the > > project/non-project distinction we're breaking right now. What say you > > RelEng ? > > I don't think it's necessarily a bad thing for the Valgrind builds to be run > off of a Nightly scheduler, but I don't know that that'll fix the revision > problem immediately. From my reading of the Nightly Scheduler, revision will > still be None. If we can find a way to get revision set, however, this is probably the cleanest way.
Whiteboard: [triage-followup]
Priority: -- → P3
(In reply to Ben Hearsum [:bhearsum] from comment #14) > I don't think it's necessarily a bad thing for the Valgrind builds to be run > off of a Nightly scheduler, but I don't know that that'll fix the revision > problem immediately. From my reading of the Nightly Scheduler, revision will > still be None. So, we probably need some way of upleveling the changeset > that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+' > seems like a generic enough thing for ScriptFactory support, but I'm curious > if Catlee has any thoughts on this. Why uplevel when we can downlevel, i.e. could we make the valgrind build triggerable after a successful nightly, and pass the nightly changeset to the valgrind script?
tbpl.allizom.org has gone away, but the root problem is still the same. Here are refreshed URLs that reproduce the problem: https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&usetinderbox=1 does show valgrind builds. https://tbpl.mozilla.org/?noignore=1&jobname=valgrind does not show valgrind builds.
(In reply to Chris Cooper [:coop] from comment #16) > (In reply to Ben Hearsum [:bhearsum] from comment #14) > > I don't think it's necessarily a bad thing for the Valgrind builds to be run > > off of a Nightly scheduler, but I don't know that that'll fix the revision > > problem immediately. From my reading of the Nightly Scheduler, revision will > > still be None. So, we probably need some way of upleveling the changeset > > that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+' > > seems like a generic enough thing for ScriptFactory support, but I'm curious > > if Catlee has any thoughts on this. > > Why uplevel when we can downlevel, i.e. could we make the valgrind build > triggerable after a successful nightly, and pass the nightly changeset to > the valgrind script? These jobs do their own builds, so I don't think it makes sense to have them depend on nightly jobs...
(In reply to Ben Hearsum [:bhearsum] from comment #18) > These jobs do their own builds, so I don't think it makes sense to have them > depend on nightly jobs... OK, could we just pass them the same changeset that we pass to the other nightlies and just make them another nightly build?
(In reply to Chris Cooper [:coop] from comment #19) > (In reply to Ben Hearsum [:bhearsum] from comment #18) > > These jobs do their own builds, so I don't think it makes sense to have them > > depend on nightly jobs... > > OK, could we just pass them the same changeset that we pass to the other > nightlies and just make them another nightly build? That's what I was describing in #14, but because we don't use any of the Buildbot Source steps, I'm not sure revision will be set. Might be worth a try, though.
I mucked with this a bit on Friday, and was able to get Valgrind running as a nightly build and it *did* get the revision set. Now I just have to alter how the script factory is called to pass the revision.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: P3 → P2
Whiteboard: [triage-followup] → [project]
Attachment #566898 - Flags: review?(bhearsum)
Attached patch Change valgrind to a nightly builder (obsolete) (deleted) — Splinter Review
I've subclassed ScriptFactory as ValgrindFactory (felt kinda dirty about that), but it was the easiest way to get the revisions into buildbot. It's shorter than generateValgrindObjects which it replaces, though.
Attachment #566901 - Flags: review?(bhearsum)
Attachment #566903 - Flags: review?(bhearsum)
Comment on attachment 566901 [details] [diff] [review] Change valgrind to a nightly builder >diff --git a/misc.py b/misc.py >+ if config['enable_valgrind'] and \ >+ platform in config['valgrind_platforms']: >+ valgrind_env=platform_env >+ valgrind_env['REVISION'] = WithProperties("%(revision)s") Drive-by comment - we normally copy envs so that other builders don't get modified.
Comment on attachment 566901 [details] [diff] [review] Change valgrind to a nightly builder Review of attachment 566901 [details] [diff] [review]: ----------------------------------------------------------------- I'm going back and forth on this approach...on one hand, I feel like it's a step backwards to change this lightweight project into something attached to the monolithic set of branch configs. On the other hand, this _is_ a Firefox job, so maybe it belongs there? I'm curious what Catlee thinks about this, as he set these jobs up in the first place. Aside from that, I'm not sure why Valgrind needs its own factory. The only thing it does is override got_revision and revision. However, now that you're triggering Valgrind jobs through a Scheduler which sets revision, I don't think there's any need for that.
Attachment #566901 - Flags: review?(catlee)
Attachment #566901 - Flags: review?(bhearsum)
Attachment #566901 - Flags: review-
Attachment #566898 - Flags: review?(bhearsum) → review+
Comment on attachment 566903 [details] [diff] [review] Config changes required to run valgrind as a nightly builder Review of attachment 566903 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla/config.py @@ +86,5 @@ > 'enable_blocklist_update': False, > 'blocklist_update_on_closed_tree': False, > 'enable_nightly': True, > 'enabled_products': ['firefox', 'mobile'], > + 'enable_valgrind': False, Can you make the default to this True, if you end up sticking with this approach? ISTR that we try to keep mozilla-central's config the same as GLOBAL_VARS.
Attachment #566903 - Flags: review?(bhearsum) → review-
(In reply to Ben Hearsum [:bhearsum] from comment #26) > Aside from that, I'm not sure why Valgrind needs its own factory. The only > thing it does is override got_revision and revision. However, now that > you're triggering Valgrind jobs through a Scheduler which sets revision, I > don't think there's any need for that. I did this because if the ScriptFactory is ever force-built without setting the revision, the revision won't bubble up. Maybe we don't care about that? I can tack the SetProperty step onto the existing ScriptFactory call in misc.py instead of subclassing - is that acceptable?
(In reply to Ben Hearsum [:bhearsum] from comment #27) > Can you make the default to this True, if you end up sticking with this > approach? ISTR that we try to keep mozilla-central's config the same as > GLOBAL_VARS. Really? So an entry per branch instead of just one for m-c? That seems inefficient unless we expect it to run most places, but I'll bow to best practice here.
(In reply to Chris Cooper [:coop] from comment #29) > (In reply to Ben Hearsum [:bhearsum] from comment #27) > > Can you make the default to this True, if you end up sticking with this > > approach? ISTR that we try to keep mozilla-central's config the same as > > GLOBAL_VARS. > > Really? So an entry per branch instead of just one for m-c? That seems > inefficient unless we expect it to run most places, but I'll bow to best > practice here. That's the way I've thought of it for the last few years. It's the config used for both mozilla-central and for project branches, so it makes a lot of sense IMHO.
Attachment #566901 - Attachment is obsolete: true
Attachment #566901 - Flags: review?(catlee)
Attachment #567116 - Flags: review?(bhearsum)
Makes the default True for enable_valgrind on m-c, other branches must pref it off, as requested.
Attachment #566903 - Attachment is obsolete: true
Attachment #567118 - Flags: review?(bhearsum)
Attachment #566898 - Attachment is obsolete: true
Attachment #567119 - Flags: review?(bhearsum)
Attachment #567118 - Flags: review?(bhearsum) → review+
Attachment #567116 - Flags: review?(bhearsum) → review+
Attachment #567119 - Flags: review?(bhearsum) → review+
Comment on attachment 567119 [details] [diff] [review] Put revision into files in properties dir for consumption by buildbot https://hg.mozilla.org/build/tools/rev/95eb6eb0311a
Attachment #567119 - Flags: checked-in+
Comment on attachment 567116 [details] [diff] [review] Add generic property-setting steps to ScriptFactory https://hg.mozilla.org/build/buildbotcustom/rev/4b2cc709a159
Attachment #567116 - Flags: checked-in+
Comment on attachment 567118 [details] [diff] [review] Config changes required to run valgrind as a nightly builder, v2 https://hg.mozilla.org/build/buildbot-configs/rev/f93cf77775d3
Attachment #567118 - Flags: checked-in+
Flags: needs-reconfig?
Comment on attachment 567119 [details] [diff] [review] Put revision into files in properties dir for consumption by buildbot This |cat| step failed on the SeaMonkey Blocklist Update Script Factory. This was the last step run. This as-is is suboptimal, as I suspect it may cause other ScriptFactory things to fail. bash -c 'cat *' in dir /builds/slave/comm-cen-trunk-lnx-blocklistupdate/properties (timeout 1200 secs) watching logfiles {} argv: ['bash', '-c', 'cat *'] environment: CC=/tools/gcc/bin/gcc CVS_RSH=ssh CXX=/tools/gcc/bin/g++ G_BROKEN_FILENAMES=1 HISTSIZE=1000 HOME=/home/seabld HOSTNAME=cn-sea-qm-centos5-01 INPUTRC=/etc/inputrc JAVA_HOME=/builds/jdk LANG=en_US.UTF-8 LESSOPEN=|/usr/bin/lesspipe.sh %s LOGNAME=seabld LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35: MAIL=/var/spool/mail/seabld PATH=/opt/local/bin:/tools/python/bin:/tools/buildbot/bin:/tools/python/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/seabld/bin PWD=/builds/slave/comm-cen-trunk-lnx-blocklistupdate/properties SHELL=/bin/bash SHLVL=1 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass TBOX_CLIENT_CVS_DIR=/builds/tinderbox/mozilla/tools TERM=linux USER=seabld _=/tools/python/bin/python using PTY: False cat: *: No such file or directory program finished with exit code 1 elapsedTime=0.077063
Attachment #567119 - Flags: feedback-
(In reply to Justin Wood (:Callek) from comment #37) > This |cat| step failed on the SeaMonkey Blocklist Update Script Factory. > This was the last step run. This as-is is suboptimal, as I suspect it may > cause other ScriptFactory things to fail. Would something like "for file in `ls -1`; do cat $file; done" be acceptable?
Preproduction release failed during tagging: ====== bash -c 'cat *' in dir /builds/slave/rel-m-beta-firefox-tag/properties (timeout 1200 secs) watching logfiles {} argv: ['bash', '-c', 'cat *'] ...... cat: *: No such file or directory program finished with exit code 1
(In reply to Chris Cooper [:coop] from comment #38) > (In reply to Justin Wood (:Callek) from comment #37) > > This |cat| step failed on the SeaMonkey Blocklist Update Script Factory. > > This was the last step run. This as-is is suboptimal, as I suspect it may > > cause other ScriptFactory things to fail. > > Would something like "for file in `ls -1`; do cat $file; done" be acceptable? That could be fine, but I wonder how buildbot behaves on a SetProperty like that with an empty stdout. Better still, imo, would be |flunkOnFailure=False|
(In reply to Justin Wood (:Callek) from comment #40) > (In reply to Chris Cooper [:coop] from comment #38) > > (In reply to Justin Wood (:Callek) from comment #37) > > > This |cat| step failed on the SeaMonkey Blocklist Update Script Factory. > > > This was the last step run. This as-is is suboptimal, as I suspect it may > > > cause other ScriptFactory things to fail. > > > > Would something like "for file in `ls -1`; do cat $file; done" be acceptable? > > That could be fine, but I wonder how buildbot behaves on a SetProperty like > that with an empty stdout. We should probably find this out rathen than guessing. > Better still, imo, would be |flunkOnFailure=False| I think we should do this, too, but I'd still prefer not to have a step that always fails for some jobs.
(In reply to Ben Hearsum [:bhearsum] from comment #41) > We should probably find this out rathen than guessing. Tested in dev env. If there are no files, and hence, no output, no properties are set. Likewise, if the files containing the properties are not in the correct format, the regexp doesn't match anything so no properties are set. > > Better still, imo, would be |flunkOnFailure=False| I've added flunkOnFailure/warnOnFailure now, but honestly we shouldn't hit those cases now that we're iterating over files rather than doing a blanket 'cat.'
Attachment #567458 - Flags: review?(bear)
Attachment #567458 - Flags: review?(bear) → review+
Attachment #567458 - Flags: checked-in+
Valgrind results appeared last night: https://tbpl.mozilla.org/?rev=0a5e72d1b479 The builds are currently broken, but devs can see the builds and logs now and can hopefully get them greened up.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Flags: needs-reconfig?
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: