Add raptor test option to specify what measurement(s) to alert on
Categories
(Testing :: Raptor, enhancement, P1)
Tracking
(firefox66 fixed)
Tracking | Status | |
---|---|---|
firefox66 | --- | fixed |
People
(Reporter: jmaher, Assigned: rwood)
References
Details
Attachments
(2 files)
Assignee | ||
Comment 15•6 years ago
|
||
Question: given the new information that the geometric mean really isn't useful, should we make the top-level value reported in Treeherder/Perfherder be FCP instead? We could then alert on it by default, since it's the top-level value, and the subtests would also have alerting set to true for 'loadtime'. What do you think? If a test INI doesn't list 'fcp' as a measurement, we could still run the test but not dump anything to Perfherder in production. Just an idea. I don't see the point in calculating and reporting the geomean if it's not useful.
Assignee | ||
Comment 16•6 years ago
|
||
(In reply to Robert Wood [:rwood] from comment #15)
(in the case where the test INI has 'alert_on' = fcp, loadtime)
Or perhaps we could use the first 'alert_on' value (i.e. fcp) as the top-level overall result reported. Or add another INI setting, 'suite_result = fcp', to determine the overall result reported, and then have 'alert_on = fcp, loadtime'.
Comment 17•6 years ago
|
||
I don't like the idea of having the top-level reported value be variable. If fcp is guaranteed to be available then I'm okay with using that; however, I think I'd prefer either to leave the top-level value as the geometric mean for now, or to use 0. We've discussed that fcp and loadtime are useful to engineers, especially for alerting on regressions, but there is also the possibility that some calculated score may be useful for dashboards. The geometric mean may not be the ideal approach, but any such score would make sense as the top-level value.
:ekyle do you have some thoughts on this?
Comment 18•6 years ago
|
||
The geometric mean is a good statistic to alert on: It has less variance than the individual measures; it is less prone to false positives.
I attended Vicky's performance team meeting, and my conclusions on what to build have not changed:
- Engineers require dashboards that are fine-grained and measure what they can control; they should be directed to Perfherder whenever possible.
- The performance sheriffs are interested in raising issues on regressions. This is different from the engineers: sheriffs care why there is a regression only insofar as it helps them identify the engineer who can fix the problem; otherwise the why is not relevant. What's important to sheriffs is controlling the inevitable volume of alerts and keeping the number of false positives within reason. The geo-mean provides this: it reduces the number of tracked statistics, and it reduces variance so there are fewer false positives.
- Management wants KPI tracking. These are usually large aggregates over many measures, smoothed over some time window. Management wants to know how far we are from our goal. A geo-mean of particular measures can be used here (our particular geo-mean is unlikely to be useful to management). In this case the geo-mean is valued not for its reduced variance but for combining multiple disparate measures into a single statistic. The health dashboard is where such aggregate numbers can/should be reported.
So, keeping in mind that we have three different categories of customer, we should not change TP6's suite geo-mean calculation; our team (igolden) will use it.
We can turn on alerting for subtest measures the engineers are interested in. I am concerned that these subtests will have too much noise, produce too many false positives, and will increase our sheriffing load with little benefit. I suspect our team will alert on regressions in the geo-mean, highlight the particular measure that caused the geo-mean regression, then inform the engineer on what happened. I will leave it to the perf sheriffs to figure that out.
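The variance argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the three measures, their "true" values, and the 5% noise level are all made up, and the noise is assumed to be independent between measures, which real page-load metrics may well not satisfy.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" values (ms) for three page-load measures.
true_values = {"fcp": 300.0, "hero": 800.0, "loadtime": 1300.0}
NOISE = 0.05  # assumed multiplicative noise per measure, independent
RUNS = 2000


def coefficient_of_variation(samples):
    """Relative spread: population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


samples = {name: [] for name in true_values}
geomeans = []
for _ in range(RUNS):
    # One simulated page cycle: each measure gets its own noise factor.
    run = {
        name: value * rng.lognormvariate(0.0, NOISE)
        for name, value in true_values.items()
    }
    for name, value in run.items():
        samples[name].append(value)
    # Plain geometric mean of the measures in this cycle.
    geomeans.append(math.exp(sum(math.log(v) for v in run.values()) / len(run)))

cv = {name: coefficient_of_variation(vals) for name, vals in samples.items()}
cv_geomean = coefficient_of_variation(geomeans)

for name, value in cv.items():
    print(f"{name}: relative spread {value:.3f}")
print(f"geomean: relative spread {cv_geomean:.3f}")
```

Under these assumptions the geomean's relative spread comes out smaller than that of any single measure. If the measures move together (correlated noise), the reduction would be smaller.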
Assignee | ||
Comment 19•6 years ago
|
||
Ok, thank you for the excellent feedback, guys! I'll leave the geomean as is and just add a new 'alert_on' option.
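A minimal sketch of how such an 'alert_on' option might be applied when building subtest results. This is not the actual patch: `parse_alert_on` and `build_subtests` are hypothetical helper names, and the measurement values are illustrative. It assumes the INI value is a comma-separated list and that only the listed measurements get a `shouldAlert` flag.

```python
def parse_alert_on(raw):
    """Split a comma-separated 'alert_on' value into measurement names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_subtests(test_name, measurements, alert_on):
    """Build subtest dicts, flagging only the measurements listed in alert_on."""
    subtests = []
    for measure, value in measurements.items():
        subtest = {
            "name": f"{test_name}-{measure}",
            "value": value,
            "unit": "ms",
        }
        # Hypothetical rule: add the flag only when alerting is requested.
        if measure in alert_on:
            subtest["shouldAlert"] = True
        subtests.append(subtest)
    return subtests


alert_on = parse_alert_on("fcp, loadtime")
subtests = build_subtests(
    "raptor-tp6-google-firefox",
    {"fcp": 303, "fnbpaint": 303, "loadtime": 1305},  # illustrative values
    alert_on,
)
for subtest in subtests:
    print(subtest["name"], subtest.get("shouldAlert", False))
```

The suite-level value is left untouched here, in line with the decision to keep the geomean as the top-level result.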
Assignee | ||
Comment 22•6 years ago
|
||
Depends on D17288
Assignee | ||
Comment 23•6 years ago
|
||
Here's an example of the output for tp6-google (with just 2 page-cycles) with 'alert_on = fcp, loadtime' set in the test INI:
15:44:08 INFO - raptor-output PERFHERDER_DATA: {"framework": {"name": "raptor"}, "suites": [{"extraOptions": [], "name": "raptor-tp6-google-firefox", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 569.9, "subtests": [{"name": "raptor-tp6-google-firefox-fcp", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 303, "shouldAlert": true, "replicates": [558, 303], "unit": "ms"}, {"name": "raptor-tp6-google-firefox-fnbpaint", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 303, "replicates": [558, 303], "unit": "ms"}, {"name": "raptor-tp6-google-firefox-hero:hero1", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 790, "replicates": [1228, 790], "unit": "ms"}, {"name": "raptor-tp6-google-firefox-loadtime", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 1305, "shouldAlert": true, "replicates": [1811, 1305], "unit": "ms"}, {"name": "raptor-tp6-google-firefox-dcf", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 291, "replicates": [545, 291], "unit": "ms"}, {"name": "raptor-tp6-google-firefox-ttfi", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 1241, "replicates": [1776, 1241], "unit": "ms"}], "type": "pageload", "unit": "ms"}]}
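To make the output above easier to read, here is a small sketch that loads a trimmed copy of that payload (replicates, thresholds, and units omitted), lists the subtests flagged with `shouldAlert`, and recomputes a plain geometric mean of the subtest values. The exact formula Raptor uses for the suite value is not shown in this bug, so the plain geometric mean is an assumption; it lands close to the logged 569.9 but is not guaranteed to match it exactly.

```python
import json
import math

# Trimmed copy of the PERFHERDER_DATA payload from the log line above.
payload = json.loads("""
{"suites": [{"name": "raptor-tp6-google-firefox", "value": 569.9, "subtests": [
  {"name": "raptor-tp6-google-firefox-fcp", "value": 303, "shouldAlert": true},
  {"name": "raptor-tp6-google-firefox-fnbpaint", "value": 303},
  {"name": "raptor-tp6-google-firefox-hero:hero1", "value": 790},
  {"name": "raptor-tp6-google-firefox-loadtime", "value": 1305, "shouldAlert": true},
  {"name": "raptor-tp6-google-firefox-dcf", "value": 291},
  {"name": "raptor-tp6-google-firefox-ttfi", "value": 1241}
]}]}
""")

suite = payload["suites"][0]

# Subtests without the key are treated as not alerting.
alerting = [s["name"] for s in suite["subtests"] if s.get("shouldAlert", False)]

# Plain geometric mean of the subtest values (assumed formula, see above).
values = [s["value"] for s in suite["subtests"]]
geomean = math.exp(sum(math.log(v) for v in values) / len(values))

print("alerting subtests:", alerting)
print("recomputed geomean:", round(geomean, 1), "logged suite value:", suite["value"])
```

Only the fcp and loadtime subtests carry `shouldAlert`, matching 'alert_on = fcp, loadtime', while the suite-level value remains the aggregate.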
Comment 26•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9a008f412fd7
https://hg.mozilla.org/mozilla-central/rev/07500fee706b