1557724 - 54.82% raptor-youtube-playback-geckoview-live (android-hw-g5-7-0-arm7-api-16) regression (Wed Jun 5 2019)

Ionuț Goldan [:igoldan]

Reporter

Description

•

5 years ago

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

We need your help to address this regression.

Regressions:

55% raptor-youtube-playback-geckoview-live android-hw-g5-7-0-arm7-api-16 pgo 36.29 -> 56.18

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=21372

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Ionuț Goldan [:igoldan]

Reporter

Comment 1

•

5 years ago

Other Android platforms seem to experience this, according to :whimboo. We need to look up & present those cases as well.
As this is a mozilla-central type of alert, the patch must be bisected, as it cannot be backfilled via Treeherder.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 2

•

5 years ago

Note that there is also an increase for Aarch64 on Windows and MacOS, but the latter has only a slight increase.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 3

•

5 years ago

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,2043558,1,10&series=mozilla-central,2054686,1,10&series=mozilla-central,2051167,1,10

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 4

•

5 years ago

It might be good if the last good commit could be re-run with Raptor. If the numbers of that commit is still lower we would know that something in our code caused this regression. Otherwise it might also be a problem with the host serving the media files.

Alexandru Irimovici

Assignee

Updated

•

5 years ago

Assignee: nobody → airimovici

Alexandru Irimovici

Assignee

Comment 5

•

5 years ago

Hi Jesup, did you already investigate this?

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 6

•

5 years ago

I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.

Flags: needinfo?(rjesup)

Alexandru Irimovici

Assignee

Comment 7

•

5 years ago

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)

I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

Should we revert the changes from bug 1555370?

Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.

So there is no real regression, only harness update. Do you suggest that we should employ another mean algorithm for the benchmark score?

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 8

•

5 years ago

(In reply to Alexandru Irimovici from comment #7)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)

I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

Should we revert the changes from bug 1555370?

No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.

Because of geomeaning, this appears as a much smaller regression in the top numbers..... geomeaning these values is bad - you can geomean dropped frames with each other, but geomeaning decoded frames with dropped frames, etc gets ugly. Even geomeaning dropped frames with % dropped frames is problematic.

So there is no real regression, only harness update. Do you suggest that we should employ another mean algorithm for the benchmark score?

Doing any type of "mean" among things that measure different types of values can cause these problems. It's made worse because geomean (and means in general) treat all the things as the same weight unless you fiddle with them, so when you have it geomeaning loadtime, FCPaint, FNBPaint, and DCF - changes to FCP/FNBP move almost always in lockstep since they're very close in time, so they have "double" weight.

People want a single top-line number - and inherently this can hide information, even if you avoid that sort of problem.

Flags: needinfo?(rjesup)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 9

•

5 years ago

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)

(In reply to Alexandru Irimovici from comment #7)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #6)

I looked at the list of commits. This is bug 1555370 - increase the dropped-frames subtest by 100x (% instead of ratio). very clear when you look at https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

Should we revert the changes from bug 1555370?

No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.

Regarding the increase. This cannot be THIS change. The percentage value is is not taking into account for displaying results in perfherder. It's only in use for the health dashboard. Also note that if it would be, the difference shouldn't only be some decimals but hundreds of them.

BugBot [:suhaib / :marco/ :calixte]

Comment 10

•

5 years ago

The priority flag is not set for this bug.
:gbrown, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gbrown)

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

5 years ago

Component: General → Raptor

Flags: needinfo?(gbrown) → needinfo?(rwood)

Robert Wood [:rwood]

Updated

•

5 years ago

Flags: needinfo?(rwood)

Priority: -- → P2

Randell Jesup [:jesup] (needinfo me)

Comment 11

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #9)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)
...

Should we revert the changes from bug 1555370?

No - that simply made it record the right value; it had been recording as a % something that was 1/100th of the percentage.

Regarding the increase. This cannot be THIS change. The percentage value is is not taking into account for displaying results in perfherder. It's only in use for the health dashboard. Also note that if it would be, the difference shouldn't only be some decimals but hundreds of them.

To make things clear, if you look at the subtests (comment 6), you see the %-dropped-frames tests have "regressions" on the order of 10000% (+/-); if you look at the numbers, they're on the order of 100x different (which matches) (1->100, 0.45->38.39, 0.82->92.47, etc).

Due to geomeaning of all the different tests (only the %-dropped-frames tests changed), this maps to a ~55% change in the geomean.

If you filter the subtests by "X_d" (which happens to exclude the "X_%_d" tests), the results look more typical - there's plenty of noise, but no clear regression.

Rob - I think we should just mark this as invalid

Flags: needinfo?(rwood)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 12

•

5 years ago

Strange! The initial alert indeed references the % values only! Why that? Those values should not taken into account for perfherder as already explained above.

Here the referenced sub tests:
https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

The overall suite value should be computed from the [..]X_dropped_frames values:

https://searchfox.org/mozilla-central/rev/06bd14ced96f25ff1dbd5352cb985fc0fa12a64e/testing/raptor/raptor/output.py#1026

But I can see the problem now! Check that code:

https://searchfox.org/mozilla-central/rev/06bd14ced96f25ff1dbd5352cb985fc0fa12a64e/testing/raptor/raptor/output.py#768

It should not check for name.endswith("dropped_frames"), which also takes the percentage numbers into account. :/

As such this bug is not invalid but clearly needs a fix! Alexandru, can you work on that soon so we can get all the numbers fixed?

Status: NEW → ASSIGNED

Flags: needinfo?(rwood) → needinfo?(airimovici)

Alexandru Irimovici

Assignee

Updated

•

5 years ago

Depends on: 1561889

Alexandru Irimovici

Assignee

Comment 13

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #12)

Strange! The initial alert indeed references the % values only! Why that? Those values should not taken into account for perfherder as already explained above.

Here the referenced sub tests:
https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=10&originalProject=mozilla-central&originalSignature=2051167&newProject=mozilla-central&newSignature=2051167&originalRevision=4d1920fc1dd0d97630d5d59f978e14b9f3ded156&newRevision=ebe7fdd19be5dbca6817987e2d80bbe46b357ad7

The overall suite value should be computed from the [..]X_dropped_frames values:

https://searchfox.org/mozilla-central/rev/06bd14ced96f25ff1dbd5352cb985fc0fa12a64e/testing/raptor/raptor/output.py#1026

But I can see the problem now! Check that code:

https://searchfox.org/mozilla-central/rev/06bd14ced96f25ff1dbd5352cb985fc0fa12a64e/testing/raptor/raptor/output.py#768

It should not check for name.endswith("dropped_frames"), which also takes the percentage numbers into account. :/

As such this bug is not invalid but clearly needs a fix! Alexandru, can you work on that soon so we can get all the numbers fixed?

I created this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1561889 and I'm working on it

Alexandru Irimovici

Assignee

Updated

•

5 years ago

Flags: needinfo?(airimovici)

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox67: --- → unaffected

status-firefox68: --- → unaffected

status-firefox69: --- → affected

status-firefox-esr60: --- → unaffected

tracking-firefox69: --- → +

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 14

•

5 years ago

Based on the landing of bug 1561889 we now should have real numbers - expect some up and downs given the adjustments for the harness.

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox69: affected → fixed

OS: Android → All

Hardware: x86 → All

Resolution: --- → FIXED

Target Milestone: --- → mozilla69

Bugzilla

54.82% raptor-youtube-playback-geckoview-live (android-hw-g5-7-0-arm7-api-16) regression (Wed Jun 5 2019)

Categories

(Testing :: Raptor, defect, P2)

Tracking

(firefox-esr60 unaffected, firefox67 unaffected, firefox68 unaffected, firefox69+ fixed)

People

(Reporter: igoldan, Assigned: alexandru.irimovici)

References

Details

(Keywords: perf, regression)

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated

Comment 11

Comment 12

Updated

Comment 13

Updated

Updated

Comment 14