54.82% raptor-youtube-playback-geckoview-live (android-hw-g5-7-0-arm7-api-16) regression (Wed Jun 5 2019)
Categories
(Testing :: Raptor, defect, P2)
Tracking
(firefox-esr60 unaffected, firefox67 unaffected, firefox68 unaffected, firefox69+ fixed)
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | + | fixed |
People
(Reporter: igoldan, Assigned: alexandru.irimovici)
References
Details
(Keywords: perf, regression)
Raptor has detected a Firefox performance regression from push:
We need your help to address this regression.
Regressions:
55% raptor-youtube-playback-geckoview-live android-hw-g5-7-0-arm7-api-16 pgo 36.29 -> 56.18
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=21372
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.
To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•5 years ago
|
||
Other Android platforms seem to experience this, according to :whimboo. We need to look up & present those cases as well.
As this is a mozilla-central type of alert, the patch must be bisected, as it cannot be backfilled via Treeherder.
Comment 2•5 years ago
|
||
Note that there is also an increase for Aarch64 on Windows and MacOS, but the latter has only a slight increase.
Comment 3•5 years ago
|
||
Comment 4•5 years ago
|
||
It might be good if the last good commit could be re-run with Raptor. If the numbers of that commit is still lower we would know that something in our code caused this regression. Otherwise it might also be a problem with the host serving the media files.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
Hi Jesup, did you already investigate this?
Comment 6•5 years ago
|
||
I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7
Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.
Assignee | ||
Comment 7•5 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)
I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7
Should we revert the changes from bug 1555370?
Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.
So there is no real regression, only harness update. Do you suggest that we should employ another mean algorithm for the benchmark score?
Comment 8•5 years ago
|
||
(In reply to Alexandru Irimovici from comment #7)
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)
I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7
Should we revert the changes from bug 1555370?
No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.
Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.
So there is no real regression, only harness update. Do you suggest that we should employ another mean algorithm for the benchmark score?
Doing any type of "mean" among things that measure different types of values can cause these problems. It's made worse because geomean (and means in general) treat all the things as the same weight unless you fiddle with them, so when you have it geomeaning loadtime, FCPaint, FNBPaint, and DCF - changes to FCP/FNBP move almost always in lockstep since they're very close in time, so they have "double" weight.
People want a single top-line number - and inherently this can hide information, even if you avoid that sort of problem.
Comment 9•5 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)
(In reply to Alexandru Irimovici from comment #7)
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)
I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7
Should we revert the changes from bug 1555370?
No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.
Regarding the increase. This cannot be THIS change. The percentage value is is not taking into account for displaying results in perfherder. It's only in use for the health dashboard. Also note that if it would be, the difference shouldn't only be some decimals but hundreds of them.
Comment 10•5 years ago
|
||
The priority flag is not set for this bug.
:gbrown, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 11•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #9)
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)
...Should we revert the changes from bug 1555370?
No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.
Regarding the increase. This cannot be THIS change. The percentage value is is not taking into account for displaying results in perfherder. It's only in use for the health dashboard. Also note that if it would be, the difference shouldn't only be some decimals but hundreds of them.
To make things clear, if you look at the subtests (comment 6), you see the %-dropped-frames tests have "regressions" on the order of 10000% (+/-); if you look at the numbers, they're on the order of 100x different (which matches) (1->100, 0.45->38.39, 0.82->92.47, etc).
Due to geomeaning of all the different tests (only the %-dropped-frames tests changed), this maps to a ~55% change in the geomean.
If you filter the subtests by "X_d" (which happens to exclude the "X_%_d" tests), the results look more typical - there's plenty of noise, but no clear regression.
Rob - I think we should just mark this as invalid
Comment 12•5 years ago
|
||
Strange! The initial alert indeed references the % values only! Why that? Those values should not taken into account for perfherder as already explained above.
The overall suite value should be computed from the [..]X_dropped_frames
values:
But I can see the problem now! Check that code:
It should not check for name.endswith("dropped_frames")
, which also takes the percentage numbers into account. :/
As such this bug is not invalid but clearly needs a fix! Alexandru, can you work on that soon so we can get all the numbers fixed?
Assignee | ||
Comment 13•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #12)
Strange! The initial alert indeed references the % values only! Why that? Those values should not taken into account for perfherder as already explained above.
The overall suite value should be computed from the
[..]X_dropped_frames
values:But I can see the problem now! Check that code:
It should not check for
name.endswith("dropped_frames")
, which also takes the percentage numbers into account. :/As such this bug is not invalid but clearly needs a fix! Alexandru, can you work on that soon so we can get all the numbers fixed?
I created this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1561889 and I'm working on it
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 14•5 years ago
|
||
Based on the landing of bug 1561889 we now should have real numbers - expect some up and downs given the adjustments for the harness.
Description
•