Closed Bug 704006 Opened 13 years ago Closed 12 years ago

Add a new buildbot job status for "User cancelled"

Categories

(Release Engineering :: General, enhancement, P5)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: ewong)

References

Details

(Keywords: buildapi, sheriffing-P2, Whiteboard: [tbpl][buildduty])

Attachments

(2 files, 7 obsolete files)

(deleted), patch
ewong
: review+
emorley
: checked-in+
Details | Diff | Splinter Review
(deleted), patch
ewong
: review+
ewong
: feedback+
emorley
: checked-in+
Details | Diff | Splinter Review
I know, people should actually look at their own results, rather than drawing silly conclusions, but... Because we're broken on Windows, an awful lot of failures there are purple when they should be red or orange. Because we're hopeful, we send a lot of broken pushes to try asking for everything, and then cancel most of the jobs once we see something fail. Because we're silly, when we get every Windows instance of a particular test turning purple, rather than looking at the results to see why it was purple, we look at the presence of a lot of purple in nearby pushes and think "well, Try must be busted, I'll just push this." We should show manual kills in a different color on tbpl, and to do that we should have a different status for the job.
Severity: normal → enhancement
Priority: -- → P5
Whiteboard: [tbpl][buildapi]
Whiteboard: [tbpl][buildapi] → [tbpl][buildapi][sheriff-want]
Assignee: nobody → ewong
Status: NEW → ASSIGNED
Attachment #652656 - Flags: review?(catlee)
Attachment #652657 - Flags: review?(mstange)
Attachment #652657 - Flags: review?(mstange) → review?(bmo)
Comment on attachment 652657 [details] [diff] [review] Add a new separate job status for Manually Killed. (tbpl part) (WIP) (v1) Review of attachment 652657 [details] [diff] [review]: ----------------------------------------------------------------- Looking good so far :-) However missing the import-buildbot-data.py changes. Few other nits: * I don't want to bikeshed too much, but I'm not sure how clear 'mkill' is - perhaps 'cancelled' (/'userkilled' etc) would be better? Happy to defer to whomever reviews the buildbot part. * Please can you keep the order of the failure types consistent throughout. ie: mkill (/cancelled) the last type before 'unknown' each time. (To match the order needed on the buildbot side) Tested locally in a Vagrant TBPL instance by modifying import-buildbot-data.py and looked good otherwise. ::: css/style.css @@ +337,5 @@ > } > > +.machineResult.mkill:hover, > +.machineResult.mkill[active=true] { > + background-color: #000; To be consistent with the other result types, the background colour should match the non-hover foreground colour (#F0C).
Attachment #652657 - Flags: review?(bmo) → review-
Comment on attachment 652656 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (WIP) (v1) Review of attachment 652656 [details] [diff] [review]: ----------------------------------------------------------------- ::: master/buildbot/status/builder.py @@ +22,5 @@ > # sibling imports > from buildbot import interfaces, util, sourcestamp > > +SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, MKILL, RETRY = range(7) > +Results = ["success", "warnings", "failure", "skipped", "mkill", "exception", "retry"] I'm pretty sure this is going to break all scripts relying on the results value in http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz, since it shifts them all along by one. 'mkill' (or 'cancelled' etc) needs to go at the end. Bonus of this is will be that the tbpl and buildbot changes don't have to be rolled out at the same time, since the 'unknown' case will just cover the new result value until TBPL knows how to interpret it :-) @@ +30,2 @@ > # Retry needs to be considered the worst so that conusmers don't have to > # worry about other failures undermining the RETRY. Doesn't mkill need to go last, to ensure a retry condition doesn't override the user's request to cancel the job?
Added the import-buildbot-part.py changes. And s/mkill/usercancel/
Attachment #652657 - Attachment is obsolete: true
Attachment #652706 - Flags: review?(bmo)
Attachment #652656 - Attachment is obsolete: true
Attachment #652656 - Flags: review?(catlee)
Attachment #652707 - Flags: review?(catlee)
Comment on attachment 652706 [details] [diff] [review] Add a new separate job status for Manually Killed. (tbpl part) (WIP) (v2) r=me with the s/was manually killed/was cancelled/ change in the help text/hover text, that we mentioned on IRC :-)
Attachment #652706 - Flags: review?(bmo) → review+
Comment on attachment 652707 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (WIP) (v2) Review of attachment 652707 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine to me. I'll tag dustin for review, and comment on whether this is something that could go upstream as well?
Attachment #652707 - Flags: review?(dustin)
Attachment #652707 - Flags: review?(catlee)
Attachment #652707 - Flags: review+
ewong, see also comment 5 :-)
Comment on attachment 652707 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (WIP) (v2) You're re-numbering existing statuses by adding that in the middle of the range(). I'd add it at the end, instead. This could have a lot of knock-on effects in various status filters, so I'd be leery of merging it upstream. If it works for the releng case, though, I have no real issue with it. I'm not quite qualified to actually *review* that, though :)
Attachment #652707 - Flags: review?(dustin) → feedback+
Comment on attachment 652707 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (WIP) (v2) I didn't notice that you inserted USERCANCEL in the middle of the existing range. Please add it to the end, so USERCANCEL == 6.
Attachment #652706 - Attachment is obsolete: true
Attachment #653029 - Flags: review+
(In reply to Ed Morley [:edmorley] from comment #5) > @@ +30,2 @@ > > # Retry needs to be considered the worst so that conusmers don't have to > > # worry about other failures undermining the RETRY. > > Doesn't mkill need to go last, to ensure a retry condition doesn't override > the user's request to cancel the job? This part is what's confusing me. This is what I have now. # SUCCESS > SKIPPED > WARNINGS > FAILURE > EXCEPTION > RETRY > USERCANCEL So if RETRY > USERCANCEL, wouldn't RETRY override the USERCANCEL? I don't think I understand this line. (clarifications greatly appreciated.)
Attachment #652707 - Attachment is obsolete: true
Attachment #653030 - Flags: review?(catlee)
Attachment #653030 - Flags: feedback?(bmo)
Comment on attachment 653029 [details] [diff] [review] Add a new separate job status for Manually Killed. (tbpl part) (v3) >diff --git a/js/UserInterface.js b/js/UserInterface.js >--- a/js/UserInterface.js >+++ b/js/UserInterface.js >@@ -883,16 +885,17 @@ var UserInterface = { >+ "usercancel": type + ' on ' + Config.OSNames[result.machine.os] + ' was manually killed', Missed this one :-)
(In reply to Edmund Wong (:ewong) from comment #14) > This part is what's confusing me. > > This is what I have now. > > # SUCCESS > SKIPPED > WARNINGS > FAILURE > EXCEPTION > RETRY > USERCANCEL > > So if RETRY > USERCANCEL, wouldn't RETRY override the USERCANCEL? > I don't think I understand this line. (clarifications greatly appreciated.) How you have it in the v3 patch is correct. The comment is talking about best to worst, I wouldn't take the greater than sign too literally :-)
removed the extra 'manually killed' description from UserInterface.js
Attachment #653029 - Attachment is obsolete: true
Attachment #653239 - Flags: review+
Comment on attachment 653030 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (v3) Sorry, missed this request's bugmail, inbetween that for the other patch. often in so far as compatible with tbpl part :-)
Attachment #653030 - Flags: feedback?(bmo) → feedback+
swype being annoying. s/often/lgtm/
Depends on: 790960
No longer depends on: 790960
catlee, do you have a rough eta for when you'll be able to look at the review for the buildbot-custom part of this/is there someone else that could do it? :-) (I'm just rather keen to not let the momentum die out for our new superstar TBPL contributor, given that this review has been pending for 5 weeks)
(In reply to Ed Morley [:edmorley UTC+1] from comment #21) > catlee, do you have a rough eta for when you'll be able to look at the > review for the buildbot-custom part of this/is there someone else that could > do it? :-) Which part is that? We have an hg repository called buildbotcustom, so I may be getting confused here :) I like :ewong's patch in principle. I really think we should try a bit harder to get this upstream first. RelEng tries very hard only to run core buildbot code that has been landed upstream, as this makes upgrading to new versions much easier.
(In reply to Chris AtLee [:catlee] from comment #22) > Which part is that? We have an hg repository called buildbotcustom, so I may > be getting confused here :) Ah, the patch is against buildbot not buildbotcustom, sorry it has been so long since I reviewed the TBPL part. In that case, yeah I understand the desire to upstream - who can I pester on that front?
That would be me! Pull requests on github, por favor. Please build the patch based on the 'nine' branch - this doesn't sound like something fun to forward-port from the 0.8.x series.
Comment on attachment 653030 [details] [diff] [review] Add a new separate job status for Manually Killed. (buildbot part) (v3) waiting for upstream resolution
Attachment #653030 - Flags: review?(catlee)
ewong, would you like me to file & patch upstream, or would you like to? :-)
(In reply to Dustin J. Mitchell [:dustin] from comment #27) > he already has - https://github.com/buildbot/buildbot/pull/555 Ah amazing! :-)
Keywords: buildapi
Keywords: sheriffing-P2
Whiteboard: [tbpl][buildapi][sheriff-want] → [tbpl]
I've made an upstream patch for this and my patch was accepted; but it was for the nine branch and not the main master 0.8 one. Dustin, is there something I need to do here? Or is this not going to work because the current buildbot is 0.8 based and my patch was done for nine?
I really should finish reading the comments before I open my mouth. Didn't realize I had done it in October... time sure flies...
You can probably backport this to 0.8.2 without too much trouble.
backporting from nine branch. Took also the liberty to clean up some spaces.
Attachment #653030 - Attachment is obsolete: true
Attachment #697320 - Flags: review?(catlee)
Attachment #697320 - Flags: feedback?(dustin)
Attachment #697320 - Flags: feedback?(dustin) → feedback+
Comment on attachment 697320 [details] [diff] [review] Add a new separate job status for "Manually killed" (now User cancelled) (v1) Looks great! I'll get this tested in staging and then we can get it deployed.
Attachment #697320 - Flags: review?(catlee) → review+
Comment on attachment 697320 [details] [diff] [review] Add a new separate job status for "Manually killed" (now User cancelled) (v1) Review of attachment 697320 [details] [diff] [review]: ----------------------------------------------------------------- Looks like it works with some minor tweaks! http://people.mozilla.org/~catlee/sattap/41aa28f2.png ::: master/buildbot/status/tinderbox.py @@ +6,5 @@ > from twisted.internet import defer > > from buildbot import interfaces > from buildbot.status import mail > +from buildbot.status.builder import SUCCESS, WARNINGS, EXCEPTION, missing \ line continuation character here ::: master/buildbot/status/web/base.py @@ +5,5 @@ > from zope.interface import Interface > from twisted.web import resource, static > from twisted.python import log > from buildbot.status import builder > +from buildbot.status.builder import SUCCESS, WARNINGS, FAILURE, SKIPPED, and here ::: master/buildbot/test/unit/test_process_base.py @@ +3,5 @@ > > from buildbot.process.base import Build > from buildbot.process.properties import Properties > from buildbot.buildrequest import BuildRequest > +from buildbot.status.builder import FAILURE, SUCCESS, WARNINGS, RETRY, and here
I've updated the patch with the nits fixed. I noticed the colour in the png looks very nice. I'm glad someone thought of the colour pink.
Attachment #697320 - Attachment is obsolete: true
Attachment #708898 - Flags: review+
Attachment #708898 - Flags: feedback+
So, where are we now?
Flags: needinfo?(catlee)
needs landing and then all buildbot masters restarted
Flags: needinfo?(catlee)
Comment on attachment 708898 [details] [diff] [review] Add a new separate job status for "Manually killed" (now User cancelled) (v2) https://hg.mozilla.org/build/buildbot/rev/797a4d4b2c01 Catlee, ok for me to merge to the production branch? Anything specific else I need to do whilst doing that?
Attachment #708898 - Flags: checked-in+
Comment on attachment 653239 [details] [diff] [review] Add a new separate job status for Manually Killed. (tbpl part) (v4) https://hg.mozilla.org/webtools/tbpl/rev/b59287ec8b68
Attachment #653239 - Flags: checked-in+
Summary: Add a new separate job status for "Manually killed" → Add a new buildbot job status for "User cancelled"
> Catlee, ok for me to merge to the production branch? Anything specific else > I need to do whilst doing that? Looking at previous merges, the answer to that seems to be no, so I've just gone ahead and merged to production-0.8 so we can expedite this bug :-)
Depends on: 855670
* TBPL part if landed + deployed (can land before the rest). * buildbot repo part landed + merged to production-0.8 -> Please can you pull the buildbot repo onto the masters (although I imagine this is automated?) and restart them. Cheers :-)
Flags: needinfo?(catlee)
s/if/is/
It's semi-automated. We need to restart all masters to pick up this change. This can be done piecemeal, but still will take quite some time.
Flags: needinfo?(catlee)
Whiteboard: [tbpl] → [tbpl][buildduty]
This is on all build/test/try masters right now. Still needs to be deployed on the schedulers just for completeness, but they don't actually make use of it.
All deployed!
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: