Closed
Bug 1186078
Opened 9 years ago
Closed 7 years ago
[Meta] Tracking bug to bring 24 hours backouts a reality
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: vaibhav1994, Unassigned)
We currently have some things in place to make 24-hour backouts a reality for perf regressions, but a lot of work is still left. Let's use this bug as a tracker.
:jmaher points out the stages in the life of a perf sheriff:
> -1: needs attention
> 0: new
> - possibly do something different if this is a merge (look on other branches, etc. - for automation we don't need to)
> 1: backfilling: needs backfilling (could be the same as #1)
> - mozci to verify rev +- 2 (rev-2, rev-1, rev, rev+1, rev+2) has data
> - mozci to schedule 6 data points (builds/jobs) for rev +- 2 (might need a repeat if there are no builds)
> - need to do this in 2 parts: 1. ensure we have builds, 2. ensure we have tests
> - move to stage -1 if we cannot fill in the holes 100% (i.e. build bustage, dontbuild, trees closed, etc.)
> 2: has more data for specific test
> - somehow verify we have a non merge revision and that revision 'a' is where we shift (we could script this in perfherder/alertmanager)
> 3: needs all-talos run
> - mozci: given revision A showing a regression, schedule all-talos (6 runs) for all tests/platforms for Rev A and A-1.
> - mozci: might have to wait for builds
> 4: has all-talos data for revision a and a-1
> - sanity check we have the full set of data
> 5: bug filed
> 6: closed (wontfix, backout, fixed)
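The stage numbers and the "rev +- 2" backfill window above can be sketched in Python. This is only an illustration: the enum member names, `backfill_window`, and `revisions_needing_jobs` are hypothetical and are not part of mozci or perfherder, and the push list is assumed to be an ordered list of revision identifiers.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Alert stages as numbered in the list above (member names are hypothetical)."""
    NEEDS_ATTENTION = -1
    NEW = 0
    BACKFILLING = 1
    HAS_DATA = 2
    NEEDS_ALL_TALOS = 3
    HAS_ALL_TALOS = 4
    BUG_FILED = 5
    CLOSED = 6


def backfill_window(pushes, rev, radius=2):
    """Return the pushes from rev-radius to rev+radius, clipped to the push list."""
    i = pushes.index(rev)
    return pushes[max(0, i - radius): i + radius + 1]


def revisions_needing_jobs(window, data_counts, wanted=6):
    """Map each revision with fewer than `wanted` data points to the number still missing."""
    return {
        r: wanted - data_counts.get(r, 0)
        for r in window
        if data_counts.get(r, 0) < wanted
    }
```

An alert whose window still has entries in `revisions_needing_jobs` after all builds have finished would correspond to the move to stage -1 described above.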
Comment 1•9 years ago (Reporter)
A rough state machine suggested by :jmaher:
for alert in alerts:
    startRev = getPushLog(alert.rev) - 2
    endRev = getPushLog(alert.rev) + 2
    dataPoints = perfherder.query(alert.branch, alert.platform, alert.test, startRev, endRev)
    switch alert.stage:
        case 0: # new
            if getRevision(alert.rev).merge:
                alert.stage = -1
                break
            if alert.branch.endswith('pgo'):
                alert.stage = -1
                break
            alert.stage = 1
        case 1: # backfilling
            if len(dataPoints) < 5:
                status = mozci.trigger(alert.buildername, startRev, endRev, times=6)
                if len(status.builds) > 0:
                    alert.stage = 1 # we are waiting on builds, need to run this again
                else:
                    alert.stage = 2
                break
            alert.stage = 2
        case 2: # enough data after initial backfilling, verify
            status = mozci.trigger(alert.buildername, startRev, endRev, times=6)
            if len(status.builds) > 0 or status.pending > 0 or status.running > 0:
                alert.stage = 1 # waiting on builds/tests
                break
            if len(dataPoints) < 5:
                alert.stage = -1 # all builds are done, missing jobs for revisions
                break
            for data in dataPoints:
                if len(data) < 6:
                    alert.stage = -1 # all builds are done, missing data for jobs
                    break
            # analyze the data, find specific revision:
            pl = getPushLog()
            badRevisions = []
            for rev in pl[startRev:endRev]:
                results = perfherder.compare(pl[rev], pl[rev-1], alert.branch, alert.platform, alert.test)
                if results.change < -2.0:
                    badRevisions.append(rev)
            if len(badRevisions) != 1:
                alert.stage = -1 # too noisy, other issues
                break
            if getRevision(badRevisions[0]).merge:
                alert.stage = -1
                break
            if alert.rev != badRevisions[0]:
                alert.rev = badRevisions[0] # we misreported initially, possibly update other tools/status
            alert.stage = 3
        case 3: # needs all-talos run
            mozci.trigger_all_talos(alert.rev, alert.branch, times=6)
            previous_rev = getPushLog(alert.rev) - 1 # rev A-1, per stage 3 in the description
            mozci.trigger_all_talos(previous_rev, alert.branch, times=6)
            alert.stage = 4
            break
        case 4: # has all-talos data for rev A and A-1
            # verify all data exists, i.e. jobs are completed
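The "analyze the data, find specific revision" step in case 2 could look roughly like the following self-contained Python sketch. It does not call perfherder: `percent_change` and `find_bad_revision` are hypothetical helpers, the samples are assumed to be plain lists of numbers per revision, and the sign convention (a change below -2.0 counts as a regression) simply mirrors the comparison in the pseudocode. Unlike the pseudocode, it only compares revisions inside the window with each other.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change of the mean from `before` to `after` (hypothetical metric)."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


def find_bad_revision(window, samples, threshold=-2.0):
    """Return the single revision whose change from its predecessor is below
    `threshold`, or None when zero or several revisions qualify (too noisy)."""
    bad = []
    for prev, cur in zip(window, window[1:]):
        if percent_change(samples[prev], samples[cur]) < threshold:
            bad.append(cur)
    return bad[0] if len(bad) == 1 else None
```

A `None` result corresponds to moving the alert to stage -1; a returned revision that differs from `alert.rev` corresponds to the "we misreported initially" branch.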
Comment 2•9 years ago (Reporter)
We had a meeting, and these are some things to take action on: https://etherpad.mozilla.org/perf-backouts
Comment 3•7 years ago
closing out old bugs that haven't been a priority
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX