Closed Bug 1286693 Opened 8 years ago Closed 7 years ago

[meta] Improve developer UX for evaluating the outcomes of Try pushes

Categories

(Tree Management :: Treeherder, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jgriffin, Unassigned)

References

Details

Currently, evaluating the output of a Try push can be pretty confusing for the uninitiated. This is partly due to the difficulty in determining whether a failed job is a known intermittent or not, and partly because the UI is dense and doesn't provide much in the way of guidance. This meta bug will track efforts to improve the situation.
Depends on: 1286695
I asked the sheriffs last week how they determine if a push is good. I attempted to translate their answers into code. https://hg.mozilla.org/users/gszorc_mozilla.com/version-control-tools/rev/debb35cb9a23 is what I have so far. It is far from robust. I'm sure anyone who knows the Treeherder data model better than me can find tons of bugs. But it was able to correctly identify a few "good" changesets from autoland and inbound, so it kinda/sorta works. I'd love for Treeherder to expose an "is this commit good" bit. But I'll settle for a reusable Python function that can determine the same thing by querying HTTP APIs :)
(In reply to Gregory Szorc [:gps] from comment #1)
> I asked the sheriffs last week how they determine if a push is good. I
> attempted to translate their answers into code.
>
> https://hg.mozilla.org/users/gszorc_mozilla.com/version-control-tools/rev/debb35cb9a23
> is what I have so far. It is far from robust. I'm sure anyone
> who knows the Treeherder data model better than me can find tons of bugs.
> But it was able to correctly identify a few "good" changesets from autoland
> and inbound, so it kinda/sorta works.
>
> I'd love for Treeherder to expose an "is this commit good" bit. But I'll
> settle for a reusable Python function that can determine the same thing by
> querying HTTP APIs :)

Yes, that's the ideal end state. Doing that for a Try push, as opposed to determining when an inbound push is safe to merge, probably depends on ramping up auto classification so that it can match most known failures, and adding some more intelligent retrigger logic to handle cases not matched by auto classify.

In the meanwhile, we might get some mileage out of creating some docs on MDN that describe how to determine if a job failure is intermittent or not, and linking to that from Try pushes in Treeherder.
I think the algorithm to determine whether a Try push is good to land looks something like this, comments welcome:

1 - Click on the first failed job. Compare the bug suggestions for this job against the failure lines reported in the log. If they look similar, assume that it's a known failure and continue.
2 - If there's no good match in bug suggestions, retrigger the job. If the retriggered job fails again in the same way, or in another way that also has no good match in bug suggestions, assume that the failure is likely my fault. I may retrigger once or twice more to verify.
3 - Continue iterating through failed jobs until I either find a failure that I suspect is mine, or run out of failures.

I think we could formalize this a bit better, with the help of sheriffs, to give people an idea of how to approach the problem, until automatic classification is robust enough that we can employ some retrigger-and-compare algorithm to handle the unmatched failures.
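To make the three steps above easier to discuss, here is a rough Python sketch of them. It does not talk to Treeherder at all: `JobResult`, `looks_known`, `first_suspect`, the `retrigger` callback, and the 0.8 similarity threshold are all hypothetical names and values invented for illustration, and "look similar" is approximated with `difflib` string similarity. It also assumes that a job is only blamed on the push if every retrigger fails again without a good match.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Optional


@dataclass
class JobResult:
    """Hypothetical record for one failed job run (not a Treeherder type)."""
    name: str
    failure_lines: List[str]
    bug_suggestions: List[str] = field(default_factory=list)


def looks_known(job: JobResult, threshold: float = 0.8) -> bool:
    """Step 1: every failure line resembles at least one bug suggestion.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    def similar(a: str, b: str) -> bool:
        return SequenceMatcher(None, a, b).ratio() >= threshold

    return bool(job.failure_lines) and all(
        any(similar(line, suggestion) for suggestion in job.bug_suggestions)
        for line in job.failure_lines
    )


def first_suspect(
    failed_jobs: List[JobResult],
    retrigger: Callable[[JobResult], Optional[JobResult]],
    max_retriggers: int = 3,
) -> Optional[str]:
    """Return the name of the first job that is probably the push's fault.

    `retrigger` is a caller-supplied callback (hypothetical): it reruns a job
    and returns the new failed JobResult, or None if the rerun passed.
    """
    for job in failed_jobs:  # Step 3: walk through all failed jobs
        if looks_known(job):
            continue  # Step 1: matches a known failure, move on
        for _ in range(max_retriggers):  # Step 2: retrigger and compare
            rerun = retrigger(job)
            if rerun is None or looks_known(rerun):
                break  # passed, or failed in a known way: likely intermittent
        else:
            return job.name  # every rerun failed with no good match
    return None  # ran out of failures without finding a suspect
```

A return value of `None` means no failure looked like it belonged to the push; anything else is the first job worth investigating by hand.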
This meta bug isn't adding much value at the moment. Let's use mailing lists or other forums to discuss ideas, and file bugs for concrete tasks.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE