Closed Bug 421895 Opened 17 years ago Closed 14 years ago

Enable users to cancel Try Server builds

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jwatt, Assigned: catlee)

References

Details

(Whiteboard: [tryserver][db])

Attachments

(2 files, 1 obsolete file)

If I upload a patch and I see almost immediately that it doesn't work on Mac, I'd like to be able to cancel the other builds so I can try again without waiting hours for the Windows build to complete. This could really shorten the cycle between various attempts to get a patch working on other platforms.

As well as a manual button to cancel your build, it would be nice to have an option to have all platforms cancel their builds automatically if even one of the platforms fails, since usually you're going to want to retry a new patch. Might then be good to have the automatically canceled builds turn a color other than red (so you can see which build was the actual failing build).
Assignee: server-ops → nobody
Component: Server Operations → Try Server
Product: mozilla.org → Webtools
QA Contact: justin → try-server
Priority: -- → P3
Mass change of target milestone.
Target Milestone: --- → Future
Component: Try Server → Release Engineering
Product: Webtools → mozilla.org
QA Contact: try-server → release
Target Milestone: Future → ---
I believe this is wanted for production-master as well.
Futuring until we have the resources for this.
Component: Release Engineering → Release Engineering: Future
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Depends on: 539588
Whiteboard: [tryserver]
Depends on: 520227
Priority: P3 → P4
Priority: P4 → P5
It's possible to cancel *pending* builds using schedulerdb.  Canceling active builds still requires talking directly to the buildbot master.
Whiteboard: [tryserver] → [tryserver][db]
For cancelling pending builds, it would be nice to be able to
hg commit --close-branch
hg push -f try
Assignee: nobody → lsblakk
Whiteboard: [tryserver][db] → [tryserver][db][triagefollowup]
Whiteboard: [tryserver][db][triagefollowup] → [tryserver][db]
Will do this as part of buildapi
Assignee: lsblakk → catlee
Depends on: 591351
Priority: P5 → P3
Attached patch read-only bits enabled (obsolete) (deleted) — Splinter Review
Let's start with this.

The read-write stuff is almost there, pending actual testing in staging and read/write access to the database.
Attachment #488118 - Flags: review?(nrthomas)
Comment on attachment 488118 [details] [diff] [review]
read-only bits enabled

Lots of great work here cleaning and expanding on what we have now. I mostly have questions and a few suggestions, but a bit much to r+ with changes.

>diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
>+log = logging.getLogger(__name__)
>+access_log = logging.getLogger("buildapi.access")

What was the strategy for using log vs access_log in the functions in this file ? require_auth might be better using access_log, and the log() in branch() is the odd one out for the r/o functions.

>+    def require_auth(self):

We just need ldap auth set up in apache to make this work ? When we get to r/w that is.

>diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> def convert_master(m):
>+    fqdn, port = addr_for_master(m)
>+    pretty_name = '%s:%i' % (fqdn, port)

I was using everything up to the first '.' before, which is useful for pending/running because it's shorter. Are you making that change intentionally ?

>diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
>+def requestFromRow(row):
>+def buildFromRow(row):
>+    request = requestFromRow(row)
>+    build = {
>+        'buildid': row.buildid,
>+        'requests': [request],

I had thought that including requests in a build blog was duplicating quite a lot of information, but perhaps it's just buildername and branch.

>+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
>+    # TODO: Look at changes table too
>+    q = getBuildsQuery(branch)

For unscheduled jobs ?

>+def reprioritizeRequest(who, brid, priority, engine):
>+    if r['claimed_at'] != 0:
>+        log.info("Request %i is already running, nothing to do", brid)
>+        return False
>+    if r['complete'] != 0:
>+        log.info("Request %i is complete, nothing to do", brid)
>+        return False

These are the messages end users would want to see. Can we pass them back in the response ?

>+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))

Surprised there isn't a claimed_at != 0 condition here too. It looks like a guard against the job getting picked up since the first query.

>+    res = q.execute(bind=engine)
>+    return True

What happens if the execute fails, or stalls ? Is there a timeout ? Is there meant to be some handling of res here ?

>+def cancelRequest(who, brid, engine):
Similar comments as reprioritizeRequest.

>+def cancelBuild(who, bid, engine):

You could use the r/o connection here for smidgen more safety.

>diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
>+    def setUp(self):
>+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
>+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")

state.sql is another file to be committed ?

>diff --git a/test.ini b/test.ini
>+branches = branch1, branch2
>+
>+auth_override =
>+
>+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
>+masters.aglon.fqdn = aglon.localdomain
>+masters.aglon.port = 5000

How will the prod values be handled for this ? Perhaps add a production.ini.
Attachment #488118 - Flags: review?(nrthomas) → review-
(In reply to comment #9)
> Comment on attachment 488118 [details] [diff] [review]
> read-only bits enabled
> 
> Lots of great work here cleaning and expanding on what we have now. I mostly
> have questions and a few suggestions, but a bit much to r+ with changes.
> 
> >diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
> >+log = logging.getLogger(__name__)
> >+access_log = logging.getLogger("buildapi.access")
> 
> What was the strategy for using log vs access_log in the functions in this file
> ? require_auth might be better using access_log, and the log() in branch() is
> the odd one out for the r/o functions.

Ok, I'll fix it up so that r/o functions don't do explicit logging (it's available elsewhere if we need it), and have require_auth use access_log.

> 
> >+    def require_auth(self):
> 
> We just need ldap auth set up in apache to make this work ? When we get to r/w
> that is.

The proxy on build.m.o is set up to do LDAP auth and forwards that along to the app on cruncher.

> >diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> > def convert_master(m):
> >+    fqdn, port = addr_for_master(m)
> >+    pretty_name = '%s:%i' % (fqdn, port)
> 
> I was using everything up to the first '.' before, which is useful for
> pending/running because it's shorter. Are you making that change intentionally
> ?

Nope, fixed it now.

> >diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
> >+def requestFromRow(row):
> >+def buildFromRow(row):
> >+    request = requestFromRow(row)
> >+    build = {
> >+        'buildid': row.buildid,
> >+        'requests': [request],
> 
> I had thought that including requests in a build blog was duplicating quite a
> lot of information, but perhaps it's just buildername and branch.

I wanted to make it easy to go back and forth between builds and requests.  Including the request information here seems like a good concession to make for convenience.

> >+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
> >+    # TODO: Look at changes table too
> >+    q = getBuildsQuery(branch)
> 
> For unscheduled jobs ?

And for changes that were merged with others into a single sourcestamp.  I've updated the comment.

> >+def reprioritizeRequest(who, brid, priority, engine):
> >+    if r['claimed_at'] != 0:
> >+        log.info("Request %i is already running, nothing to do", brid)
> >+        return False
> >+    if r['complete'] != 0:
> >+        log.info("Request %i is complete, nothing to do", brid)
> >+        return False
> 
> These are the messages end users would want to see. Can we pass them back in
> the response ?

This code is gone now with the message broker approach.

> 
> >+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))
> 
> Surprised there isn't a claimed_at != 0 condition here too. It looks like a
> guard against the job getting picked up since the first query.

gone.

> 
> >+    res = q.execute(bind=engine)
> >+    return True
> 
> What happens if the execute fails, or stalls ? Is there a timeout ? Is there
> meant to be some handling of res here ?

gone.

> 
> >+def cancelRequest(who, brid, engine):
> Similar comments as reprioritizeRequest.

gone.

> 
> >+def cancelBuild(who, bid, engine):
> 
> You could use the r/o connection here for smidgen more safety.

gone.

> >diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
> >+    def setUp(self):
> >+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
> >+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")
> 
> state.sql is another file to be committed ?

Yes, it's in my repo...forget if I stripped that out of the last patch for brevity or if I forgot to hg add it.

> 
> >diff --git a/test.ini b/test.ini
> >+branches = branch1, branch2
> >+
> >+auth_override =
> >+
> >+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
> >+masters.aglon.fqdn = aglon.localdomain
> >+masters.aglon.port = 5000
> 
> How will the prod values be handled for this ? Perhaps add a production.ini.

Yup!
I think I've addressed your comments from the previous patch.

The read-write parts have been gutted and replaced by calls out to a message broker.

scripts/buildapi-agent.py is responsible for acting on the job requests to actually cancel builds, reprioritize, etc.

Some implementation notes are here:
http://hg.mozilla.org/users/catlee_mozilla.com/buildapi/file/fa37a00544bf/RESTAPI_NOTES.txt
Attachment #488118 - Attachment is obsolete: true
Attachment #494194 - Flags: feedback?(nrthomas)
Attachment #494194 - Flags: feedback?(salbiz)
Attachment #494194 - Flags: feedback?(bear)
It's still missing some bits, but I think it's good enough to start having people poke at.
Attachment #495564 - Flags: review?
Comment on attachment 494194 [details] [diff] [review]
read-only bits + a bit of read-write

Is it still useful to look at this patch given the later one attached and the changes we discussed last week ?
Attachment #494194 - Flags: feedback?(salbiz)
Attachment #494194 - Flags: feedback?(nrthomas)
Attachment #494194 - Flags: feedback?(bear)
Attachment #495564 - Flags: review?
self-serve now allows you to do this.

https://build.mozilla.org/buildapi/self-serve/try
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: