Closed Bug 1478057 Opened 6 years ago Closed 6 years ago

Report the mean of all measurements per pageload, not separate values per measurement type

Categories

(Testing :: Raptor, defect)

Version: Version 3
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla63
Tracking Status: firefox63 --- fixed

People

(Reporter: rwood, Assigned: rwood)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently if you turn on multiple measurements per pageload, e.g. fnbpaint and hero, in say raptor-tp6, Raptor will report to treeherder/perfherder like this:

raptor-firefox-tp6-amazon raptor-firefox-tp6-amazon-fnbpaint opt: 906
raptor-firefox-tp6-amazon raptor-firefox-tp6-amazon-hero:hero1 opt: 2645.5
raptor-firefox-tp6-facebook raptor-firefox-tp6-facebook-fnbpaint opt: 1037
raptor-firefox-tp6-facebook raptor-firefox-tp6-facebook-hero:hero1 opt: 2545
raptor-firefox-tp6-google raptor-firefox-tp6-google-fnbpaint opt: 276
raptor-firefox-tp6-google raptor-firefox-tp6-google-hero:hero1 opt: 645.5
raptor-firefox-tp6-youtube raptor-firefox-tp6-youtube-fnbpaint opt: 500
raptor-firefox-tp6-youtube raptor-firefox-tp6-youtube-hero:hero1 opt: 2551.5

That's too many data points. Update Raptor so that when multiple values are measured per pageload, it reports just one median value per page. Each page will then have a single entry (i.e. 'raptor-firefox-tp6-amazon'), which will be the median of all the measurements for that page.
Blocks: 1473365
Assignee: nobody → rwood
Status: NEW → ASSIGNED
There are two items here.

1) As in the description, if a single raptor pageload test (i.e. raptor-tp6-google-firefox) has multiple values being measured per pageload, we should just report one median for all of those measures as the top-level page result. That way we can add as many measurements per page as we like, but it will still only result in a single test output line on treeherder/perfherder. The replicates for each measurement type will still be available in the detailed JSON data anyway.

2) Also add an option so that test pages within a raptor test can be aggregated into one overall suite result. For example, to add a new test page to a test like raptor-tp6, currently you would add a new section in the test INI. It's done that way so that each page can be run individually (i.e. raptor-tp6-google-firefox) instead of running the entire test with all the pages. Each test INI section results in a new test output line on treeherder/perfherder. What if we want to add a new raptor test that has, say, 25 test pages? That's way too many data points, so we need the ability to have the results aggregated. Possibly add a new 'aggregate_tests = true' option in the INI. If that is set, then when the overall test is run (top-level INI), only one median overall result value will appear on treeherder/perfherder. Again, all the replicates for all the pages will be available in the detailed result JSON.
This bug will take care of part (1). Along with that, I'll be able to turn on the measurement of hero elements in raptor-tp6 and raptor-gdocs as well as fnbpaint. Filed Bug 1478773 for part (2).
Blocks: 1478773
In summary, with this change the raptor-tp6 and raptor-gdocs tests will measure both fnbpaint and the hero element on each pageload, and have one overall value reported for each page. For example, for raptor-tp6, before this patch the output in treeherder would be:

raptor-tp6-amazon-firefox raptor-tp6-amazon-firefox-fnbpaint opt: N
raptor-tp6-amazon-firefox raptor-tp6-amazon-firefox-hero:hero1 opt: N
raptor-tp6-facebook-firefox raptor-tp6-facebook-firefox-fnbpaint opt: N
raptor-tp6-facebook-firefox raptor-tp6-facebook-firefox-hero:hero1 opt: N
raptor-tp6-google-firefox raptor-tp6-google-firefox-fnbpaint opt: N
raptor-tp6-google-firefox raptor-tp6-google-firefox-hero:hero1 opt: N
raptor-tp6-youtube-firefox raptor-tp6-youtube-firefox-fnbpaint opt: N
raptor-tp6-youtube-firefox raptor-tp6-youtube-firefox-hero:hero1 opt: N

With this patch, the output will be a single line per test page, the median of both, i.e.:

raptor-tp6-amazon-firefox opt: N
raptor-tp6-facebook-firefox opt: N
raptor-tp6-google-firefox opt: N
raptor-tp6-youtube-firefox opt: N

If you look at the PERFHERDER_DATA output (terminal/logs) or the perfherder-data.json (taskcluster artifact for the test job), you'll see all the replicates - fnbpaint and hero measurements for each pageload.
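For illustration, here is a minimal sketch of that aggregation - hypothetical helper names only, not the actual testing/raptor/raptor/output.py code. It assumes each measurement type's replicates are reduced with a median, and the per-measurement values are then combined into a single page-level number (the landed patch combines them with a geometric mean, as discussed in the review comments below):

```python
# Hypothetical sketch, not Raptor's real output code.
from math import exp, log

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0

def page_value(measurements):
    """measurements: measurement name -> list of replicate values for one page."""
    # one representative value per measurement type (e.g. fnbpaint, hero:hero1)
    per_measurement = [median(replicates) for replicates in measurements.values()]
    # combined into a single per-page value; a geometric mean keeps the
    # differently-scaled measurements contributing proportionally
    return round(exp(sum(log(v) for v in per_measurement) / len(per_measurement)), 2)

# one output line per page instead of one per measurement type:
print(page_value({'fnbpaint': [906, 912, 898], 'hero:hero1': [2645.5, 2650.0, 2639.0]}))
```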
Blocks: 1449180
No longer blocks: 1473078
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259772/#review266902

::: testing/raptor/raptor/output.py:94
(Diff revision 4)
> +            # for pageload tests, if there are > 1 subtests here, that means there
> +            # were multiple measurements captured in each single pageload; we want
> +            # to get the mean of those values and report 1 overall 'suite' value
> +            # for the page; so that each test page/URL only has 1 line output
> +            # on treeherder/perfherder (all replicates available in the JSON)
> +            if len(subtests) > 1:

We already have a similar construct under the `elif test.type == "benchmark"`. Please unindent that one level instead and update the Output.construct_summary() method. We need to be consistent here.
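A hypothetical sketch of the structure being asked for here (names and dict layout are assumptions, not the real Output class): a single "more than one subtest" branch shared by pageload and benchmark suites, rather than one copy per test type:

```python
# Hypothetical sketch, not the actual Output class in output.py.
def summarize(values):
    # stand-in for the real summarizing filter (mean / geometric mean / score)
    return round(sum(values) / float(len(values)), 2)

def build_suite(test_type, testname, subtests):
    suite = {'type': test_type, 'name': testname, 'subtests': subtests}
    if len(subtests) > 1:
        # shared by 'pageload' and 'benchmark' suites alike, so each test
        # still produces only one top-level line on treeherder/perfherder
        suite['value'] = summarize([s['value'] for s in subtests])
    elif subtests:
        suite['value'] = subtests[0]['value']
    return suite

print(build_suite('pageload', 'raptor-tp6-amazon-firefox',
                  [{'name': 'fnbpaint', 'value': 906},
                   {'name': 'hero:hero1', 'value': 2645.5}]))
```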
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259772/#review266908

::: testing/raptor/raptor/output.py:74
(Diff revision 4)
>         # 328, 295, 290, 286, 270, 279, 280, 346, 303, 308, 398, 281]}, u'browser':
>         # u'Firefox 62.0a1 20180528123052', u'lower_is_better': True, u'page':
>         # u'https://www.amazon.com/s/url=search-alias%3Daps&field-keywords=laptop',
>         # u'unit': u'ms', u'alert_threshold': 2}
>
>         for key, values in test.measurements.iteritems():

Please replace `key` with `measurement_name` and `values` with `replicates` to make the code more readable.
(In reply to Robert Wood [:rwood] from comment #1)
> 1) As in the description, if a single raptor pageload test (i.e.
> raptor-tp6-google-firefox) has multiple values being measured per pageload,
> we should just report one median for all of those measures as the top-level
> page result. That way we can add as many measurements per page as we like,
> but it will still only result in a single test output line on
> treeherder/perfherder. The replicates for each measurement type will still be
> available in the detailed JSON data anyway.

This bug was filed for aggregating values of multiple measurement types using a median, but the implementation aggregates them by averaging the values (via filter.mean, not filter.median). I believe we need to update the summary of this bug.
Flags: needinfo?(rwood)
Basically, we want "Report the mean of all measurements per pageload, not separate values per measurement type".
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259772/#review266916
Attachment #8995301 - Flags: review?(igoldan) → review-
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #14)
> Comment on attachment 8995301 [details]
> Bug 1478057 - Report the median of all measurements per pageload, not
> separate values per measurement type;
>
> https://reviewboard.mozilla.org/r/259772/#review266916

Can you please r+? The two issues you raised are not issues, just nits IMO, thank you.
Flags: needinfo?(rwood)
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

I will update those 2 nits before landing, thanks.
Attachment #8995301 - Flags: review- → review?
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259772/#review266974
Attachment #8995301 - Flags: review+
(In reply to Robert Wood [:rwood] from comment #16)
> Comment on attachment 8995301 [details]
> Bug 1478057 - Report the median of all measurements per pageload, not
> separate values per measurement type;
>
> I will update those 2 nits before landing, thanks.

I have to read a review how-to. I thought I had to r- if I asked for anything.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #18)
> (In reply to Robert Wood [:rwood] from comment #16)
> > Comment on attachment 8995301 [details]
> > Bug 1478057 - Report the median of all measurements per pageload, not
> > separate values per measurement type;
> >
> > I will update those 2 nits before landing, thanks.
>
> I have to read a review how-to. I thought I had to r- if I asked for anything.

Well, generally if it's just small items you can give an r+ on the condition that the nits are fixed. I appreciate it in this case, as that allows me to update the patch and land it today - otherwise, because of our time zone difference, I probably wouldn't be able to land it until after the weekend. :)
Summary: Report the median of all measurements per pageload, not separate values per measurement type → Report the mean of all measurements per pageload, not separate values per measurement type
Attachment #8995301 - Attachment is obsolete: true
Attachment #8995301 - Flags: review?
I'm getting HTTP 500 errors again from MozReview when updating a review; the only way I can seem to fix that is to create a new review. Creating a new one; I will carry forward the r+ and push to try.
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259770/#review266996
(In reply to Robert Wood [:rwood] from comment #20)
> I'm getting HTTP 500 errors again from MozReview when updating a review; the
> only way I can seem to fix that is to create a new review. Creating a new
> one; I will carry forward the r+ and push to try.

I can't carry forward the r+ in MozReview, I guess, because it's a new review. Will await another review by igoldan...
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

::: testing/raptor/raptor/output.py:412
(Diff revision 1)
>         elif testname.startswith('raptor-sunspider'):
>             return self.sunspider_score(vals)
>         elif testname.startswith('raptor-webaudio'):
>             return self.webaudio_score(vals)
>         elif len(vals) > 1:
> -            return filter.geometric_mean([i for i, j in vals])
> +            return round(filter.geometric_mean([i for i, j in vals]), 2)

We have 2 measurement types, so `elif len(vals) > 1` evaluates to True, meaning we'll get a geometric mean instead of just filter.mean for our pageload test.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> We have 2 measurement types, so `elif len(vals) > 1` evaluates to True, meaning we'll get a geometric mean instead of just filter.mean for our pageload test.

Yes, thanks for noticing - we actually want a geometric mean, because that's better to use when the values aren't directly comparable, i.e. they come from first non-blank paint and hero element display. If they were all of the same type then a plain mean would be fine. I realized this when doing the update (that is how Talos does it as well).
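To illustrate the reasoning above, here is a small self-contained example (not Raptor code; the numbers are made up) of why a geometric mean is the better combiner when the measurements live on different scales, such as fnbpaint (hundreds of ms) and a hero element (thousands of ms):

```python
# Illustration only - made-up numbers, not Raptor code.
from math import sqrt

def arithmetic_mean(a, b):
    return (a + b) / 2.0

def geometric_mean(a, b):
    return sqrt(a * b)

fnbpaint, hero = 300.0, 2600.0
base_arith = arithmetic_mean(fnbpaint, hero)
base_geo = geometric_mean(fnbpaint, hero)

# With an arithmetic mean the larger hero value dominates: a 50% fnbpaint
# regression barely moves the combined value, while a 50% hero regression
# moves it a lot. The geometric mean shifts by the same factor (sqrt(1.5),
# about 1.22) no matter which measurement regresses by 50%.
print(arithmetic_mean(fnbpaint * 1.5, hero) / base_arith)  # ~1.05
print(arithmetic_mean(fnbpaint, hero * 1.5) / base_arith)  # ~1.45
print(geometric_mean(fnbpaint * 1.5, hero) / base_geo)     # ~1.22
print(geometric_mean(fnbpaint, hero * 1.5) / base_geo)     # ~1.22
```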
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> Yes, thanks for noticing - we actually want a geometric mean, because that's better to use when the values aren't directly comparable, i.e. they come from first non-blank paint and hero element display. If they were all of the same type then a plain mean would be fine. I realized this when doing the update (that is how Talos does it as well).

This is because Output.construct_summary was used only for benchmark tests. Maybe specify an extra param for this function and move all those if/elif inside a bigger `if test_type == 'pageload'... elif test_type == 'benchmark'`, else return the mean.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> This is because Output.construct_summary was used only for benchmark tests. Maybe specify an extra param for this function and move all those if/elif inside a bigger `if test_type == 'pageload'... elif test_type == 'benchmark'`, else return the mean.

If you look back in the file history, you'll see construct_results was originally being called for all types of tests, but at one point I moved it. I don't agree with your latest suggestion; there's no need to rework this anymore IMO, thanks.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267016

If all tests go well, this is good to land.
Attachment #8995531 - Flags: review?(igoldan) → review+
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18d62cfe5d86
Report the mean of all measurements per pageload, not separate values per measurement type; r=igoldan
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
No longer blocks: 1473365