Closed Bug 1478057 Opened 6 years ago Closed 6 years ago

Report the mean of all measurements per pageload, not separate values per measurement type

Categories

(Testing :: Raptor, defect)

Version: Version 3
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla63
Tracking Status: firefox63 --- fixed

People

(Reporter: rwood, Assigned: rwood)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently if you turn on multiple measurements per pageload, e.g. fnbpaint and hero, in say raptor-tp6, Raptor will report to treeherder/perfherder like this:

raptor-firefox-tp6-amazon raptor-firefox-tp6-amazon-fnbpaint opt: 906
raptor-firefox-tp6-amazon raptor-firefox-tp6-amazon-hero:hero1 opt: 2645.5
raptor-firefox-tp6-facebook raptor-firefox-tp6-facebook-fnbpaint opt: 1037
raptor-firefox-tp6-facebook raptor-firefox-tp6-facebook-hero:hero1 opt: 2545
raptor-firefox-tp6-google raptor-firefox-tp6-google-fnbpaint opt: 276
raptor-firefox-tp6-google raptor-firefox-tp6-google-hero:hero1 opt: 645.5
raptor-firefox-tp6-youtube raptor-firefox-tp6-youtube-fnbpaint opt: 500
raptor-firefox-tp6-youtube raptor-firefox-tp6-youtube-hero:hero1 opt: 2551.5

That's too many data points. Update Raptor so that when multiple values are measured per pageload, it reports just one median value per page. Each page will then have a single entry (i.e. 'raptor-firefox-tp6-amazon'), which will be the median of all the measurements for that page.
Blocks: 1473365
Assignee: nobody → rwood
Status: NEW → ASSIGNED
There are two items here.

1) As in the description, if a single raptor pageload test (i.e. raptor-tp6-google-firefox) has multiple values being measured per pageload, we should just report one median for all of those measures as the top-level page result. That way we can add as many measurements per page as we like, but it will still only result in a single test output line on treeherder/perfherder. The replicates for each measurement type will still be available in the detailed JSON data anyway.

2) Also add an option so that test pages within a raptor test can be aggregated into one overall suite result. For example, to add a new test page to a test like raptor-tp6, currently you would add a new section in the test INI. It's done that way so that each page can be run individually (i.e. raptor-tp6-google-firefox) instead of running the entire test with all the pages. Each test INI section results in a new test output line on treeherder/perfherder. What if we want to add a new raptor test that has, say, 25 test pages? That's way too many data points, so we need the ability to have the results aggregated. Possibly add a new 'aggregate_tests = true' option in the INI. If that is set, then when the overall test is run (top-level INI), only one median overall result value will appear on treeherder/perfherder. Again, all the replicates for all the pages will be available in the detailed result JSON.
This bug will take care of part (1). Along with that, I'll be able to turn on the measurement of hero elements in raptor-tp6 and raptor-gdocs as well as fnbpaint. Filed Bug 1478773 for part (2).
Blocks: 1478773
In summary, with this change the raptor-tp6 and raptor-gdocs tests will measure both fnbpaint and the hero element on each pageload, and have one overall value reported for each page. For example, for raptor-tp6, before this patch the output in treeherder would be:

raptor-tp6-amazon-firefox raptor-tp6-amazon-firefox-fnbpaint opt: N
raptor-tp6-amazon-firefox raptor-tp6-amazon-firefox-hero:hero1 opt: N
raptor-tp6-facebook-firefox raptor-tp6-facebook-firefox-fnbpaint opt: N
raptor-tp6-facebook-firefox raptor-tp6-facebook-firefox-hero:hero1 opt: N
raptor-tp6-google-firefox raptor-tp6-google-firefox-fnbpaint opt: N
raptor-tp6-google-firefox raptor-tp6-google-firefox-hero:hero1 opt: N
raptor-tp6-youtube-firefox raptor-tp6-youtube-firefox-fnbpaint opt: N
raptor-tp6-youtube-firefox raptor-tp6-youtube-firefox-hero:hero1 opt: N

With this patch, the output will be a single line per test page, the median of both, i.e.:

raptor-tp6-amazon-firefox opt: N
raptor-tp6-facebook-firefox opt: N
raptor-tp6-google-firefox opt: N
raptor-tp6-youtube-firefox opt: N

If you look at the PERFHERDER_DATA output (terminal/logs) or the perfherder-data.json (taskcluster artifact for the test job), you'll see all the replicates - fnbpaint and hero measurements for each pageload.
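For illustration, here is a minimal sketch of that aggregation - hypothetical helper names only, not the actual testing/raptor/raptor/output.py code. It assumes each measurement type's replicates are reduced with a median, and the per-measurement values are then combined into a single page-level number (the landed patch combines them with a geometric mean, as discussed in the review comments below):

```python
# Hypothetical sketch, not Raptor's real output code.
from math import exp, log

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0

def page_value(measurements):
    """measurements: measurement name -> list of replicate values for one page."""
    # one representative value per measurement type (e.g. fnbpaint, hero:hero1)
    per_measurement = [median(replicates) for replicates in measurements.values()]
    # combined into a single per-page value; a geometric mean keeps the
    # differently-scaled measurements contributing proportionally
    return round(exp(sum(log(v) for v in per_measurement) / len(per_measurement)), 2)

# one output line per page instead of one per measurement type:
print(page_value({'fnbpaint': [906, 912, 898], 'hero:hero1': [2645.5, 2650.0, 2639.0]}))
```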
Blocks: 1449180
No longer blocks: 1473078
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259772/#review266902

::: testing/raptor/raptor/output.py:94
(Diff revision 4)
> +            # for pageload tests, if there are > 1 subtests here, that means there
> +            # were multiple measurements captured in each single pageload; we want
> +            # to get the mean of those values and report 1 overall 'suite' value
> +            # for the page; so that each test page/URL only has 1 line output
> +            # on treeherder/perfherder (all replicates available in the JSON)
> +            if len(subtests) > 1:

We already have a similar construct under the `elif test.type == "benchmark"`. Please unindent that one level instead and update the Output.construct_summary() method. We need to be consistent here.
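A hypothetical sketch of the structure being asked for here (names and dict layout are assumptions, not the real Output class): a single "more than one subtest" branch shared by pageload and benchmark suites, rather than one copy per test type:

```python
# Hypothetical sketch, not the actual Output class in output.py.
def summarize(values):
    # stand-in for the real summarizing filter (mean / geometric mean / score)
    return round(sum(values) / float(len(values)), 2)

def build_suite(test_type, testname, subtests):
    suite = {'type': test_type, 'name': testname, 'subtests': subtests}
    if len(subtests) > 1:
        # shared by 'pageload' and 'benchmark' suites alike, so each test
        # still produces only one top-level line on treeherder/perfherder
        suite['value'] = summarize([s['value'] for s in subtests])
    elif subtests:
        suite['value'] = subtests[0]['value']
    return suite

print(build_suite('pageload', 'raptor-tp6-amazon-firefox',
                  [{'name': 'fnbpaint', 'value': 906},
                   {'name': 'hero:hero1', 'value': 2645.5}]))
```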
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259772/#review266908

::: testing/raptor/raptor/output.py:74
(Diff revision 4)
>         # 328, 295, 290, 286, 270, 279, 280, 346, 303, 308, 398, 281]}, u'browser':
>         # u'Firefox 62.0a1 20180528123052', u'lower_is_better': True, u'page':
>         # u'https://www.amazon.com/s/url=search-alias%3Daps&field-keywords=laptop',
>         # u'unit': u'ms', u'alert_threshold': 2}
>
>         for key, values in test.measurements.iteritems():

Please replace `key` with `measurement_name` and `values` with `replicates` to make the code more readable.
(In reply to Robert Wood [:rwood] from comment #1)
> 1) As in the description, if a single raptor pageload test (i.e.
> raptor-tp6-google-firefox) has multiple values being measured per pageload,
> we should just report one median for all of those measures as the top-level
> page result. That way we can add as many measurements per page as we like,
> but it will still only result in a single test output line on
> treeherder/perfherder. The replicates for each measurement type will still be
> available in the detailed JSON data anyway.

This bug was filed for aggregating values of multiple measurement types using a median, but the implementation aggregates them by averaging the values (via filter.mean, not filter.median). I believe we need to update the summary of this bug.
Flags: needinfo?(rwood)
Basically, we want "Report the mean of all measurements per pageload, not separate values per measurement type".
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259772/#review266916
Attachment #8995301 - Flags: review?(igoldan) → review-
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #14)
> Comment on attachment 8995301 [details]
> Bug 1478057 - Report the median of all measurements per pageload, not
> separate values per measurement type;
>
> https://reviewboard.mozilla.org/r/259772/#review266916

Can you please r+? The two issues you raised are not issues, just nits IMO, thank you.
Flags: needinfo?(rwood)
Comment on attachment 8995301 [details]
Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type;

I will update those 2 nits before landing, thanks.
Attachment #8995301 - Flags: review- → review?
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259772/#review266974
Attachment #8995301 - Flags: review+
(In reply to Robert Wood [:rwood] from comment #16)
> Comment on attachment 8995301 [details]
> Bug 1478057 - Report the median of all measurements per pageload, not
> separate values per measurement type;
>
> I will update those 2 nits before landing, thanks.

I have to read a review how-to. I thought I had to r- if I asked for anything.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #18)
> (In reply to Robert Wood [:rwood] from comment #16)
> > Comment on attachment 8995301 [details]
> > Bug 1478057 - Report the median of all measurements per pageload, not
> > separate values per measurement type;
> >
> > I will update those 2 nits before landing, thanks.
>
> I have to read a review how-to. I thought I had to r- if I asked for anything.

Well, generally if it's just small items you can give an r+ on the condition that the nits are fixed. I appreciate it in this case, as that allows me to update the patch and land it today - otherwise, because of our time zone difference, I probably wouldn't be able to land it until after the weekend. :)
Summary: Report the median of all measurements per pageload, not separate values per measurement type → Report the mean of all measurements per pageload, not separate values per measurement type
Attachment #8995301 - Attachment is obsolete: true
Attachment #8995301 - Flags: review?
I'm getting HTTP 500 errors again from MozReview when updating a review; the only way I can seem to fix that is to create a new review. Creating a new one; I will carry forward the r+ and push to try.
Comment on attachment 8995301 [details] Bug 1478057 - Report the median of all measurements per pageload, not separate values per measurement type; https://reviewboard.mozilla.org/r/259770/#review266996
(In reply to Robert Wood [:rwood] from comment #20)
> I'm getting HTTP 500 errors again from MozReview when updating a review; the
> only way I can seem to fix that is to create a new review. Creating a new
> one; I will carry forward the r+ and push to try.

I can't carry forward the r+ in MozReview, I guess, because it's a new review. Will await another review by igoldan...
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

::: testing/raptor/raptor/output.py:412
(Diff revision 1)
>         elif testname.startswith('raptor-sunspider'):
>             return self.sunspider_score(vals)
>         elif testname.startswith('raptor-webaudio'):
>             return self.webaudio_score(vals)
>         elif len(vals) > 1:
> -            return filter.geometric_mean([i for i, j in vals])
> +            return round(filter.geometric_mean([i for i, j in vals]), 2)

We have 2 measurement types, so `elif len(vals) > 1` evaluates to True, meaning we'll get a geometric mean instead of just filter.mean for our pageload test.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> We have 2 measurement types, so `elif len(vals) > 1` evaluates to True, meaning we'll get a geometric mean instead of just filter.mean for our pageload test.

Yes, thanks for noticing - we actually want a geometric mean, because that's better to use when the values aren't directly comparable, i.e. they come from first non-blank paint and hero element display. If they were all of the same type then a plain mean would be fine. I realized this when doing the update (that is how Talos does it as well).
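To illustrate the reasoning above, here is a small self-contained example (not Raptor code; the numbers are made up) of why a geometric mean is the better combiner when the measurements live on different scales, such as fnbpaint (hundreds of ms) and a hero element (thousands of ms):

```python
# Illustration only - made-up numbers, not Raptor code.
from math import sqrt

def arithmetic_mean(a, b):
    return (a + b) / 2.0

def geometric_mean(a, b):
    return sqrt(a * b)

fnbpaint, hero = 300.0, 2600.0
base_arith = arithmetic_mean(fnbpaint, hero)
base_geo = geometric_mean(fnbpaint, hero)

# With an arithmetic mean the larger hero value dominates: a 50% fnbpaint
# regression barely moves the combined value, while a 50% hero regression
# moves it a lot. The geometric mean shifts by the same factor (sqrt(1.5),
# about 1.22) no matter which measurement regresses by 50%.
print(arithmetic_mean(fnbpaint * 1.5, hero) / base_arith)  # ~1.05
print(arithmetic_mean(fnbpaint, hero * 1.5) / base_arith)  # ~1.45
print(geometric_mean(fnbpaint * 1.5, hero) / base_geo)     # ~1.22
print(geometric_mean(fnbpaint, hero * 1.5) / base_geo)     # ~1.22
```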
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> Yes, thanks for noticing - we actually want a geometric mean, because that's better to use when the values aren't directly comparable, i.e. they come from first non-blank paint and hero element display. If they were all of the same type then a plain mean would be fine. I realized this when doing the update (that is how Talos does it as well).

This is because Output.construct_summary was used only for benchmark tests. Maybe specify an extra param for this function and move all those if/elif inside a bigger `if test_type == 'pageload'... elif test_type == 'benchmark'`, else return the mean.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267004

> This is because Output.construct_summary was used only for benchmark tests. Maybe specify an extra param for this function and move all those if/elif inside a bigger `if test_type == 'pageload'... elif test_type == 'benchmark'`, else return the mean.

If you look back in the file history, you'll see construct_results was originally being called for all types of tests, but at one point I moved it. I don't agree with your latest suggestion; there's no need to rework this anymore IMO, thanks.
Comment on attachment 8995531 [details]
Bug 1478057 - Report the mean of all measurements per pageload, not separate values per measurement type;

https://reviewboard.mozilla.org/r/259932/#review267016

If all tests go well, this is good to land.
Attachment #8995531 - Flags: review?(igoldan) → review+
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18d62cfe5d86
Report the mean of all measurements per pageload, not separate values per measurement type; r=igoldan
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
No longer blocks: 1473365