Closed
Bug 1049657
Opened 10 years ago
Closed 8 years ago
monitoring for buildbot master step delay
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Unassigned)
Details
Sometimes our buildbot steps take far longer than they should, due to load on the master.
What causes the load varies widely, and the usual fix is to add more masters or to split the slave pool in some other way.
We should monitor for this.
(e.g. a step that normally takes <1s on a slave can take 30s+ to complete and start the next step; this extra time adds up fast)
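To make "adds up fast" concrete: if every step in a build pays the same master-side delay before the next step starts, the overhead grows linearly with the number of steps. This is a minimal sketch; the step count and timings are hypothetical, not measured.

```python
def added_overhead_s(num_steps, expected_s, observed_s):
    """Extra wall-clock seconds a build loses when each of its steps
    takes observed_s instead of expected_s (assumes a uniform lag)."""
    return num_steps * (observed_s - expected_s)


# Hypothetical build: 40 steps, each expected to take 1s but taking 30s.
extra = added_overhead_s(40, 1.0, 30.0)
print("%.0f extra seconds (%.1f minutes)" % (extra, extra / 60))
```

With these made-up numbers a single build loses 1160 seconds, a little over 19 minutes, before any real work is counted.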
Comment 1 • 10 years ago
I've been using this code to submit master lag times to Graphite. Somebody should be able to use it to generate a coarser metric for Nagios, e.g. if the 50th percentile rises above 10s, we should get an alert.
#!/usr/bin/env python
import logging

import sqlalchemy as sa

log = logging.getLogger(__name__)


def find_lag_since(db, build_id):
    # get_basedir normally finishes almost instantly on the slave, so most
    # of its measured duration is assumed to be master-side lag.
    q = sa.text("""
        SELECT builds.id AS build_id, masters.name AS master,
               steps.starttime, steps.endtime
        FROM masters, builds, steps
        WHERE
            builds.master_id = masters.id AND
            steps.build_id = builds.id AND
            steps.name = 'get_basedir' AND
            builds.id > :build_id
    """)
    # Engine.execute() with keyword bind parameters is the pre-2.0 SQLAlchemy API.
    return db.execute(q, build_id=build_id)


def get_last_build_id(db, d):
    # Id of the first build that started on or after date d.
    q = sa.text("SELECT id FROM builds WHERE starttime >= :d "
                "ORDER BY starttime ASC LIMIT 1")
    row = db.execute(q, d=d).fetchone()
    if row is None:
        raise ValueError("no builds started on or after %s" % d)
    return row[0]


def main():
    import config  # local module providing graphite_api_key
    from build_times import GraphiteSubmitter, td2s, dt2ts

    logging.basicConfig(format="%(asctime)s - %(message)s", level=logging.DEBUG)
    db = sa.create_engine("mysql://foobar")

    log.debug("getting last_build_id")
    last_build_id = get_last_build_id(db, "2014-05-01")

    g = GraphiteSubmitter("graphitehost", 2003, config.graphite_api_key)
    log.debug("getting lag")
    for row in find_lag_since(db, last_build_id):
        if row.endtime is None:
            continue  # step still running; no duration yet
        # Step duration in seconds, timestamped at the step's start.
        d = td2s(row.endtime - row.starttime)
        t = dt2ts(row.starttime)
        g.submit("masterlag.%s" % row.master, d, t)


if __name__ == '__main__':
    main()
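The script imports td2s, dt2ts and GraphiteSubmitter from a build_times module that is not included in the bug. Here is a minimal sketch of what the two conversion helpers plausibly do, plus a sender for Carbon's plaintext protocol (one "<metric path> <value> <timestamp>" line per metric over TCP, port 2003 by default). All bodies are hypothetical stand-ins, not the original build_times code. dt2ts assumes the database stores naive UTC datetimes, and the API-key handling of the original GraphiteSubmitter is not reproduced.

```python
import calendar
import socket
from datetime import datetime, timedelta


def td2s(td):
    """Hypothetical stand-in: timedelta -> float seconds."""
    return td.total_seconds()


def dt2ts(dt):
    """Hypothetical stand-in: naive datetime (assumed UTC) -> Unix timestamp."""
    return calendar.timegm(dt.utctimetuple())


def graphite_line(path, value, ts):
    """Format one metric as a Carbon plaintext protocol line."""
    return "%s %s %d\n" % (path, value, ts)


def send_lines(lines, host="graphitehost", port=2003):
    """Send already-formatted plaintext lines to a Carbon listener."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical step on master "bm01": started 2014-05-01 12:00 UTC, took 31.5s.
start = datetime(2014, 5, 1, 12, 0, 0)
end = start + timedelta(seconds=31.5)
print(repr(graphite_line("masterlag.bm01", td2s(end - start), dt2ts(start))))
# 'masterlag.bm01 31.5 1398945600\n'
```

Graphite splits metric paths on dots, so a master name that contains dots would add extra hierarchy levels under masterlag. Replacing dots in row.master before building the path avoids that.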
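Comment 1 suggests deriving a coarser alerting metric, such as alerting when the 50th percentile rises above 10s. This is a minimal sketch of that check in the style of a Nagios plugin. It assumes the caller has already gathered recent get_basedir durations (in seconds) for one master. The function name, the 5s warning threshold and the sample values are hypothetical; only the 10s critical threshold comes from the comment. The return codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import statistics

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_master_lag(lags_s, warn_s=5.0, crit_s=10.0):
    """Classify one master's recent step lag by its median (50th percentile).

    lags_s: recent get_basedir durations in seconds (assumed already fetched).
    warn_s: hypothetical warning threshold.
    crit_s: alert threshold; 10s is the value suggested in the bug comment.
    Returns (exit_code, status_line).
    """
    if not lags_s:
        return UNKNOWN, "UNKNOWN - no recent lag samples"
    p50 = statistics.median(lags_s)
    if p50 > crit_s:
        return CRITICAL, "CRITICAL - p50 master lag %.1fs" % p50
    if p50 > warn_s:
        return WARNING, "WARNING - p50 master lag %.1fs" % p50
    return OK, "OK - p50 master lag %.1fs" % p50


# Hypothetical samples for one master.
code, status = check_master_lag([0.4, 0.6, 12.0, 15.5, 11.2])
print(status)  # CRITICAL - p50 master lag 11.2s
```

A real plugin would print the status line and exit with the returned code (for example via sys.exit(code)) so that Nagios picks up the state. Using the median rather than the mean keeps a single slow outlier step from triggering an alert on its own.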
Updated • 8 years ago
Component: Tools → General
Updated • 8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED