421895 - Enable users to cancel Try Server builds

Reporter

Description

•

17 years ago

If I upload a patch and I see almost immediately that it doesn't work on Mac, I'd like to be able to cancel the other builds so I can try again without waiting hours for the Windows build to complete. This could really shorten the cycle between various attempts to get a patch working on other platforms.

As well as a manual button to cancel your build, it would be nice to have an option to have all platforms cancel their builds automatically if even one of the platforms fails, since usually you're going to want to retry a new patch. Might then be good to have the automatically canceled builds turn a color other than red (so you can see which build was the actual failing build).

Reed Loden [:reed]

Updated

•

17 years ago

Assignee: server-ops → nobody

Component: Server Operations → Try Server

Product: mozilla.org → Webtools

QA Contact: justin → try-server

bhearsum@mozilla.com (:bhearsum)

Updated

•

17 years ago

Priority: -- → P3

bhearsum@mozilla.com (:bhearsum)

Comment 2

•

16 years ago

Mass change of target milestone.

Target Milestone: --- → Future

Reed Loden [:reed]

Updated

•

15 years ago

Component: Try Server → Release Engineering

Product: Webtools → mozilla.org

QA Contact: try-server → release

Target Milestone: Future → ---

Aki Sasaki (not active)

Comment 3

•

15 years ago

I believe this is wanted for production-master as well.
Futuring until we have the resources for this.

Component: Release Engineering → Release Engineering: Future

Chris Cooper [:coop] (he/him)

Comment 4

•

15 years ago

Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.

Component: Release Engineering: Future → Release Engineering

bhearsum@mozilla.com (:bhearsum)

Updated

•

15 years ago

Depends on: 539588

Whiteboard: [tryserver]

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Depends on: 520227

Priority: P3 → P4

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Priority: P4 → P5

Chris AtLee [:catlee]

Assignee

Comment 5

•

14 years ago

It's possible to cancel *pending* builds using schedulerdb.  Canceling active builds still requires talking directly to the buildbot master.

Whiteboard: [tryserver] → [tryserver][db]

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Blocks: try_enhancements

neil@parkwaycc.co.uk

Comment 6

•

14 years ago

For cancelling pending builds, it would be nice to be able to
hg commit --close-branch
hg push -f try

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Assignee: nobody → lsblakk

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Whiteboard: [tryserver][db] → [tryserver][db][triagefollowup]

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Whiteboard: [tryserver][db][triagefollowup] → [tryserver][db]

Chris AtLee [:catlee]

Assignee

Comment 7

•

14 years ago

Will do this as part of buildapi

Assignee: lsblakk → catlee

Depends on: 591351

Chris AtLee [:catlee]

Assignee

Updated

•

14 years ago

Priority: P5 → P3

Chris AtLee [:catlee]

Assignee

Comment 8

•

14 years ago

Attached patch read-only bits enabled (obsolete) (deleted) — Details — Splinter Review

Let's start with this.

The read-write stuff is almost there, pending actual testing in staging and read/write access to the database.

Attachment #488118 - Flags: review?(nrthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 9

•

14 years ago

Comment on attachment 488118 [details] [diff] [review]
read-only bits enabled

Lots of great work here cleaning and expanding on what we have now. I mostly have questions and a few suggestions, but a bit much to r+ with changes.

>diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
>+log = logging.getLogger(__name__)
>+access_log = logging.getLogger("buildapi.access")

What was the strategy for using log vs access_log in the functions in this file ? require_auth might be better using access_log, and the log() in branch() is the odd one out for the r/o functions.

>+    def require_auth(self):

We just need ldap auth set up in apache to make this work ? When we get to r/w that is.

>diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> def convert_master(m):
>+    fqdn, port = addr_for_master(m)
>+    pretty_name = '%s:%i' % (fqdn, port)

I was using everything up to the first '.' before, which is useful for pending/running because it's shorter. Are you making that change intentionally ?

>diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
>+def requestFromRow(row):
>+def buildFromRow(row):
>+    request = requestFromRow(row)
>+    build = {
>+        'buildid': row.buildid,
>+        'requests': [request],

I had thought that including requests in a build blog was duplicating quite a lot of information, but perhaps it's just buildername and branch.

>+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
>+    # TODO: Look at changes table too
>+    q = getBuildsQuery(branch)

For unscheduled jobs ?

>+def reprioritizeRequest(who, brid, priority, engine):
>+    if r['claimed_at'] != 0:
>+        log.info("Request %i is already running, nothing to do", brid)
>+        return False
>+    if r['complete'] != 0:
>+        log.info("Request %i is complete, nothing to do", brid)
>+        return False

These are the messages end users would want to see. Can we pass them back in the response ?

>+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))

Surprised there isn't a claimed_at != 0 condition here too. It looks like a guard against the job getting picked up since the first query.

>+    res = q.execute(bind=engine)
>+    return True

What happens if the execute fails, or stalls ? Is there a timeout ? Is there meant to be some handling of res here ?

>+def cancelRequest(who, brid, engine):
Similar comments as reprioritizeRequest.

>+def cancelBuild(who, bid, engine):

You could use the r/o connection here for smidgen more safety.

>diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
>+    def setUp(self):
>+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
>+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")

state.sql is another file to be committed ?

>diff --git a/test.ini b/test.ini
>+branches = branch1, branch2
>+
>+auth_override =
>+
>+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
>+masters.aglon.fqdn = aglon.localdomain
>+masters.aglon.port = 5000

How will the prod values be handled for this ? Perhaps add a production.ini.

Attachment #488118 - Flags: review?(nrthomas) → review-

Chris AtLee [:catlee]

Assignee

Comment 10

•

14 years ago

(In reply to comment #9)
> Comment on attachment 488118 [details] [diff] [review]
> read-only bits enabled
> 
> Lots of great work here cleaning and expanding on what we have now. I mostly
> have questions and a few suggestions, but a bit much to r+ with changes.
> 
> >diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
> >+log = logging.getLogger(__name__)
> >+access_log = logging.getLogger("buildapi.access")
> 
> What was the strategy for using log vs access_log in the functions in this file
> ? require_auth might be better using access_log, and the log() in branch() is
> the odd one out for the r/o functions.

Ok, I'll fix it up so that r/o functions don't do explicit logging (it's available elsewhere if we need it), and have require_auth use access_log.

> 
> >+    def require_auth(self):
> 
> We just need ldap auth set up in apache to make this work ? When we get to r/w
> that is.

The proxy on build.m.o is set up to do LDAP auth and forwards that along to the app on cruncher.

> >diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> > def convert_master(m):
> >+    fqdn, port = addr_for_master(m)
> >+    pretty_name = '%s:%i' % (fqdn, port)
> 
> I was using everything up to the first '.' before, which is useful for
> pending/running because it's shorter. Are you making that change intentionally
> ?

Nope, fixed it now.

> >diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
> >+def requestFromRow(row):
> >+def buildFromRow(row):
> >+    request = requestFromRow(row)
> >+    build = {
> >+        'buildid': row.buildid,
> >+        'requests': [request],
> 
> I had thought that including requests in a build blog was duplicating quite a
> lot of information, but perhaps it's just buildername and branch.

I wanted to make it easy to go back and forth between builds and requests.  Including the request information here seems like a good concession to make for convenience.

> >+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
> >+    # TODO: Look at changes table too
> >+    q = getBuildsQuery(branch)
> 
> For unscheduled jobs ?

And for changes that were merged with others into a single sourcestamp.  I've updated the comment.

> >+def reprioritizeRequest(who, brid, priority, engine):
> >+    if r['claimed_at'] != 0:
> >+        log.info("Request %i is already running, nothing to do", brid)
> >+        return False
> >+    if r['complete'] != 0:
> >+        log.info("Request %i is complete, nothing to do", brid)
> >+        return False
> 
> These are the messages end users would want to see. Can we pass them back in
> the response ?

This code is gone now with the message broker approach.

> 
> >+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))
> 
> Surprised there isn't a claimed_at != 0 condition here too. It looks like a
> guard against the job getting picked up since the first query.

gone.

> 
> >+    res = q.execute(bind=engine)
> >+    return True
> 
> What happens if the execute fails, or stalls ? Is there a timeout ? Is there
> meant to be some handling of res here ?

gone.

> 
> >+def cancelRequest(who, brid, engine):
> Similar comments as reprioritizeRequest.

gone.

> 
> >+def cancelBuild(who, bid, engine):
> 
> You could use the r/o connection here for smidgen more safety.

gone.

> >diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
> >+    def setUp(self):
> >+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
> >+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")
> 
> state.sql is another file to be committed ?

Yes, it's in my repo...forget if I stripped that out of the last patch for brevity or if I forgot to hg add it.

> 
> >diff --git a/test.ini b/test.ini
> >+branches = branch1, branch2
> >+
> >+auth_override =
> >+
> >+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
> >+masters.aglon.fqdn = aglon.localdomain
> >+masters.aglon.port = 5000
> 
> How will the prod values be handled for this ? Perhaps add a production.ini.

Yup!

Chris AtLee [:catlee]

Assignee

Comment 11

•

14 years ago

Attached patch read-only bits + a bit of read-write (deleted) — Details — Splinter Review

I think I've addressed your comments from the previous patch.

The read-write parts have been gutted and replaced by calls out to a message broker.

scripts/buildapi-agent.py is responsible for acting on the job requests to actually cancel builds, reprioritize, etc.

Some implementation notes are here:
http://hg.mozilla.org/users/catlee_mozilla.com/buildapi/file/fa37a00544bf/RESTAPI_NOTES.txt

Attachment #488118 - Attachment is obsolete: true

Attachment #494194 - Flags: feedback?(nrthomas)

Chris AtLee [:catlee]

Assignee

Updated

•

14 years ago

Attachment #494194 - Flags: feedback?(salbiz)

Chris AtLee [:catlee]

Assignee

Updated

•

14 years ago

Attachment #494194 - Flags: feedback?(bear)

Chris AtLee [:catlee]

Assignee

Comment 12

•

14 years ago

Attached patch Initial self-serve implementation (deleted) — Details — Splinter Review

It's still missing some bits, but I think it's good enough to start having people poke at.

Attachment #495564 - Flags: review?

Nick Thomas [:nthomas] (UTC+12)

Comment 13

•

14 years ago

Comment on attachment 494194 [details] [diff] [review]
read-only bits + a bit of read-write

Is it still useful to look at this patch given the later one attached and the changes we discussed last week ?

Chris AtLee [:catlee]

Assignee

Updated

•

14 years ago

Attachment #494194 - Flags: feedback?(salbiz)

Attachment #494194 - Flags: feedback?(nrthomas)

Attachment #494194 - Flags: feedback?(bear)

Chris AtLee [:catlee]

Assignee

Updated

•

14 years ago

Attachment #495564 - Flags: review?

Chris AtLee [:catlee]

Assignee

Comment 14

•

14 years ago

self-serve now allows you to do this.

https://build.mozilla.org/buildapi/self-serve/try

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

read-only bits enabled 14 years ago Chris AtLee [:catlee] (deleted), patch	nthomas : review-	Details \| Diff \| Splinter Review
read-only bits + a bit of read-write 14 years ago Chris AtLee [:catlee] (deleted), patch		Details \| Diff \| Splinter Review
Initial self-serve implementation 14 years ago Chris AtLee [:catlee] (deleted), patch		Details \| Diff \| Splinter Review