Closed Bug 1121655 Opened 10 years ago Closed 8 years ago

Define "tier 2" automated-test frameworks

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Assigned: bc)

Attachments

(1 file, 1 obsolete file)

"Tier 2" automated-test frameworks are loosely defined as "jobs that are semi-sheriffable", meaning they will show up in Treeherder, but sheriffs will not deeply investigate failures, referring them to developers instead.

We need to define and document this whole concept and the work to be done both on our tools and infrastructure and in the frameworks themselves. This work includes

* Definition of Tier 2 requirements.
* Definition of Treeherder features required by sheriffs or test framework owners to sheriff the Tier 2 test framework.
* Definition of Treeherder API features needed by Tier 2 frameworks.
* Definitions of Tier 2 framework API features needed by a framework in order to report to Treeherder. This will include Python code examples.
OS: Mac OS X → All
Hardware: x86 → All
Attached file rationale.org (obsolete) (deleted) —
Attachment #8561519 - Flags: feedback?(mcote)
Comment on attachment 8561519 [details]
rationale.org

Excellent thinking to start with the rationale. :)

This is a good overview, but there's one significant problem: we're defining tier 2 as something that is "semi-" or "partially" sheriffable. Your definition above is closer to the status quo; the sheriffs either monitor the results and take action, or they completely ignore the results.

The idea for the future, as I understand it anyway, is that sheriffs would still be monitoring tier 2 jobs, and they *may* still do back outs and/or flag intermittents, but it is recognized that there may be failures that they are uncomfortable diagnosing, due to instability or a general lack of knowledge/experience in the test system. In this case, developers need to take action, and the test itself is marked as "broken" until the developers show that it is fixed.

So there are three categories:

* Tier 1 jobs, which are fully sheriffed,
* Tier 2 jobs, which are sheriffed to a certain degree but may be referred to developers, and
* Unsheriffed jobs, which would be hidden from the default view, but may be visible to developers (when that feature is added to Treeherder).

Also, to be effective, sheriffs must also be able to determine when a failure is due to an intermittent issue, in which case it is not treated as a reason to back out the patch.

Related, a reason that a framework may not be fully sheriffable, but may be partially sheriffable, is if the sheriffs do not have the necessary knowledge or experience with the framework to determine if a failure is the result of the patch being tested or is an intermittent or infrastructure failure.

I would also not bother mentioning about third-party browser tests. I don't think there are any plans to ever display these tests in treeherder, which is the focus here; things like mozbench are going to remain separate from treeherder and sheriffing for the foreseeable future.

In light of the above, I think you should distinguish between the points in the second list that may still make a framework eligible for tier 2, versus points that make a framework completely unsheriffable (but still potentially visible to the developers).

Hope that makes sense and I haven't just completely missed something while on PTO. :)
Attachment #8561519 - Flags: feedback?(mcote) → feedback-
(In reply to Mark Côté [:mcote] from comment #2)
> Comment on attachment 8561519 [details]
> rationale.org
>
> Excellent thinking to start with the rationale. :)
>
> This is a good overview, but there's one significant problem: we're defining tier 2 as something that is "semi-" or "partially" sheriffable. Your definition above is closer to the status quo; the sheriffs either monitor the results and take action, or they completely ignore the results.
>
> The idea for the future, as I understand it anyway, is that sheriffs would still be monitoring tier 2 jobs, and they *may* still do back outs and/or flag intermittents, but it is recognized that there may be failures that they are uncomfortable diagnosing, due to instability or a general lack of knowledge/experience in the test system. In this case, developers need to take action, and the test itself is marked as "broken" until the developers show that it is fixed.

Ok. I was definitely approaching it from the idea that Tier 2 jobs would report to Treeherder but not be sheriffable.

mdoglio, what is your understanding about Tier 2 being partially sheriffable?

> So there are three categories:
>
> * Tier 1 jobs, which are fully sheriffed,
> * Tier 2 jobs, which are sheriffed to a certain degree but may be referred to developers, and
> * Unsheriffed jobs, which would be hidden from the default view, but may be visible to developers (when that feature is added to Treeherder).
>
> Also, to be effective, sheriffs must also be able to determine when a failure is due to an intermittent issue, in which case it is not treated as a reason to back out the patch.
>
> Related, a reason that a framework may not be fully sheriffable, but may be partially sheriffable, is if the sheriffs do not have the necessary knowledge or experience with the framework to determine if a failure is the result of the patch being tested or is an intermittent or infrastructure failure.

These two seem appropriate to the necessary features/enhancements to Treeherder?

> I would also not bother mentioning about third-party browser tests. I don't think there are any plans to ever display these tests in treeherder, which is the focus here; things like mozbench are going to remain separate from treeherder and sheriffing for the foreseeable future.

Ok.

> In light of the above, I think you should distinguish between the points in the second list that may still make a framework eligible for tier 2, versus points that make a framework completely unsheriffable (but still potentially visible to the developers).

From my point of view, the thing that would make a framework Tier 2/partially sheriffable would be that it met all of the requirements for Tier 1 except for running on all of the trees that merge into mozilla-central. If a bad patch lands directly on one of the repos that is being tested, the sheriff could directly back it out. If the bad patch lands on a repo that isn't tested, then the sheriff would mark the framework as failing and the developers would be responsible for identifying the offending patch.

It seems to me that there is no real difference between Tier 1 and Tier 2 with regard to intermittent failures so long as Bug 1080731 - Add mechanism to flag jobs as "ignore failures" until X and Bug 1131071 - Allow to select a visibility profile in the ui are implemented in Treeherder.

mdoglio:

Would bug 1080731 'ignore failures until X' allow the marking of a specific test, e.g. job_symbol, or framework, e.g. group_name, as ignorable in the Tier 1 profile?

Would bug 1131071 'select a visibility profile' handle the case of making the ignorable failures visible when desired?

I envision the following process:

Treeherder would maintain a "tier" attribute which can be modified by sheriffs. It can have values:

Tier 3 - unsheriffable job which reports to Treeherder.
Tier 2 - partially sheriffable job which doesn't run on all repos merged to mozilla-central.
Tier 1 - fully sheriffable job running on all repos merged to mozilla-central.

A new test framework begins submitting results to Treeherder. It is unknown and therefore automatically classified as a "Tier 3" unsheriffable job. It may or may not immediately meet the Sheriffing/Job Visibility requirements, but it is invisible to the default Sheriff visibility profile.

The framework developer continues to add any missing sheriffing/job visibility requirements, while determining which tests are reliable and hiding the broken or intermittent tests.

Once the test framework is green modulo the ignored tests, the framework developer could nominate the framework for Tier 1 or 2 depending on whether it runs on all trees merged into mozilla-central.

Tier 1 and 2 would have the same visibility profile but Tier 2 would indicate which set of tests only run on a limited set of repos. If bustage appeared due to a merge from an untested repo, Sheriffs would then be able to file a bug and mark the framework as failing and invisible to sheriffs until the bug is fixed.

Is this definition of Tier 2 as Tier 1 without the full repo coverage sufficient?
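
The nomination process proposed in this comment could be sketched roughly as follows. This is only an illustration of the proposal, not Treeherder code: the `Tier`, `Framework`, and `nominate` names and all of the boolean fields are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Tier values as proposed in the comment (hypothetical representation)."""
    FULLY_SHERIFFED = 1      # runs on all repos merged to mozilla-central
    PARTIALLY_SHERIFFED = 2  # meets Tier 1 requirements, but only on some repos
    UNSHERIFFED = 3          # reports to Treeherder, hidden from the default sheriff view


@dataclass
class Framework:
    """Assumed summary of a framework's state; field names are illustrative."""
    name: str
    meets_visibility_requirements: bool = False
    green_modulo_ignored_tests: bool = False
    runs_on_all_merged_repos: bool = False
    # A new, unknown framework is automatically classified as Tier 3.
    tier: Tier = Tier.UNSHERIFFED


def nominate(fw: Framework) -> Tier:
    """Return the tier the framework could be nominated for under the proposal."""
    if not (fw.meets_visibility_requirements and fw.green_modulo_ignored_tests):
        # Still missing requirements or not yet green: stays unsheriffed.
        return Tier.UNSHERIFFED
    if fw.runs_on_all_merged_repos:
        return Tier.FULLY_SHERIFFED
    return Tier.PARTIALLY_SHERIFFED
```

Under this sketch the only difference between Tier 1 and Tier 2 is repo coverage, which is exactly the point the question above asks the others to confirm.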
Flags: needinfo?(mdoglio)
Flags: needinfo?(mcote)
Ryan, sorry for leaving the sheriffs out of the discussion.
Flags: needinfo?(ryanvm)
> mdoglio, what is your understanding about Tier 2 being partially sheriffable?

I guess :mcote is referring to bug 1080731 when he says that a job can be partially sheriffable. My understanding is that a Tier 2 job should not be visible to the sheriffs; I think bug 1080731 should be more about making a job invisible until X. Asking the sheriffs' opinion is probably the best thing to do.
I think we have a major disconnect on what a tier 2 job is and how it is to be used. I'll send out an email asking for a good time to meet up where we can talk it out.
> would bug 1080731 'ignore failures until X' allow the marking of a specific test, e.g. job_symbol, or framework, e.g. group_name, as ignorable in the Tier 1 profile?

As I said above, I think that bug should be "make a job invisible to sheriffs until X".

> would bug 1131071 'select a visibility profile' handle the case of making the ignorable failures visible when desired?

A visibility profile will be composed of N visibility rules. Each rule will have an "apply until X" clause as per bug 1080731.

> I envision the following process:
>
> Treeherder would maintain a "tier" attribute which can be modified by sheriffs. It can have values:
>
> Tier 3 - unsheriffable job which reports to Treeherder.
> Tier 2 - partially sheriffable job which doesn't run on all repos merged to mozilla-central.
> Tier 1 - fully sheriffable job running on all repos merged to mozilla-central.
>
> A new test framework begins submitting results to Treeherder. It is unknown and therefore automatically classified as a "Tier 3" unsheriffable job. It may or may not immediately meet the Sheriffing/Job Visibility requirements, but it is invisible to the default Sheriff visibility profile.
>
> The framework developer continues to add any missing sheriffing/job visibility requirements, while determining which tests are reliable and hiding the broken or intermittent tests.
>
> Once the test framework is green modulo the ignored tests, the framework developer could nominate the framework for Tier 1 or 2 depending on if it ran on all trees merged into mozilla-central.
>
> Tier 1 and 2 would have the same visibility profile but Tier 2 would indicate which set of tests only run on a limited set of repos. If bustage appeared due to a merge from an untested repo, Sheriffs would then be able to file a bug and mark the framework as failing and invisible to sheriffs until the bug is fixed.
>
> Is this definition of Tier 2 as Tier 1 without the full repo coverage sufficient?

It seems to be coherent, but I'm still not sure whether the Tier 2 jobs will be part of the sheriff activity or not. In the scenario you described, it looks like Tier 1 jobs are important for release managers, Tier 1+2 for sheriffs and devs, and Tier 1+2+3 for some devs.
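
A visibility profile made of rules with an "apply until X" clause might look something like the sketch below. It assumes X is a timestamp (the bugs leave X open), and the `VisibilityRule` and `VisibilityProfile` classes are hypothetical rather than Treeherder's data model; only the `job_symbol` and `group_name` fields come from the discussion above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class VisibilityRule:
    """Hides matching jobs until `until` passes (assumed to be a timestamp)."""
    job_symbol: Optional[str] = None  # None matches any job symbol
    group_name: Optional[str] = None  # None matches any group
    until: Optional[datetime] = None  # None means the rule never expires

    def hides(self, job_symbol: str, group_name: str, now: datetime) -> bool:
        # An expired rule no longer hides anything.
        if self.until is not None and now >= self.until:
            return False
        if self.job_symbol is not None and self.job_symbol != job_symbol:
            return False
        if self.group_name is not None and self.group_name != group_name:
            return False
        return True


@dataclass
class VisibilityProfile:
    """A named collection of N visibility rules."""
    name: str
    rules: List[VisibilityRule] = field(default_factory=list)

    def is_visible(self, job_symbol: str, group_name: str, now: datetime) -> bool:
        # A job is visible unless at least one active rule hides it.
        return not any(r.hides(job_symbol, group_name, now) for r in self.rules)
```

For example, a sheriff profile could carry one rule hiding a whole group until a given date, while a developer profile with no rules would show everything.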
Flags: needinfo?(mdoglio)
I'm going to defer to our meeting tomorrow.
Flags: needinfo?(ryanvm)
Hopefully the meeting cleared everything up; if there are still open questions, needinfo me again.
Flags: needinfo?(mcote)
Comment on attachment 8561519 [details]
rationale.org

Obsoleting the rationale as it was completely off target.
Attachment #8561519 - Attachment is obsolete: true
Attached file amended notes from 2015-02-11 meeting (deleted) —
Status: NEW → ASSIGNED
Depends on: 1137519
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED