Closed Bug 1448826 Opened 7 years ago Closed 7 years ago

20.27% displaylist_mutate (linux64) regression on push d9f154931d6db3e9b552b0ff8148ead23c11977d (Mon Mar 26 2018)

Categories

(Core :: Web Painting, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + wontfix

People

(Reporter: igoldan, Assigned: mattwoodrow)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=d9f154931d6db3e9b552b0ff8148ead23c11977d As author of one of the patches included in that push, we need your help to address this regression. Regressions: 20% displaylist_mutate linux64 pgo e10s stylo 2,853.51 -> 3,431.89 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12370 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Layout: Web Painting
Product: Firefox → Core
:mattwoodrow Big regressions are starting to show up on *all* desktop platforms. The one above is the first that landed. They are happening on both PGO and OPT builds. Please look over this. Can we resolve this issue or should we accept/back it out?
Flags: needinfo?(matt.woodrow)
Bug 1443027 was backed out for causing crashes on Nightly as well, so these regressions should go away shortly. I'm going to leave this bug open for investigation, though.
Assignee: nobody → matt.woodrow
The backout canceled the regressions: == Change summary for alert #12372 (as of Mon, 26 Mar 2018 10:19:36 GMT) == Improvements: 16% displaylist_mutate linux64 pgo e10s stylo 3,418.00 -> 2,862.11 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12372
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
The second push [1] also caused big perf regressions. The backout that followed the push canceled them again. == Change summary for alert #12430 (as of Thu, 29 Mar 2018 23:14:04 GMT) == Regressions: 19% displaylist_mutate windows10-64 pgo e10s stylo 5,616.99 -> 6,691.43 14% displaylist_mutate windows7-32 opt e10s stylo 8,589.95 -> 9,817.83 14% displaylist_mutate windows10-64 opt e10s stylo 6,785.94 -> 7,720.01 11% displaylist_mutate linux64 opt e10s stylo 3,270.36 -> 3,635.71 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12430 [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1443027#c29
While this is technically "fixed" by backout at the moment, that feels pretty disingenuous. Let's leave this open until bug 1443027 manages to land without regressing performance.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Unfortunately, I think landing with the performance regression is inevitable. The new algorithm is more correct (hopefully completely correct!), but involves more work to do so. I do have some new optimization ideas, but I'll work on these separately after landing, since I really want to fix the correctness for 60.
Flags: needinfo?(matt.woodrow)
Getting bug 1443027 into 60 seems fairly risky, seeing how it bounced several times for breaking nightly, I'm not too excited about that prospect.
(In reply to Julien Cristau [:jcristau] from comment #7) > Getting bug 1443027 into 60 seems fairly risky, seeing how it bounced > several times for breaking nightly, I'm not too excited about that prospect. It's definitely a concern. Bug 1443027 is fixing a known (although rare) correctness issue with retained-dl though, so if we choose not to uplift, then we'll have to consider delaying retained-dl until 61.
The relanding from https://bugzilla.mozilla.org/show_bug.cgi?id=1443027#c37 brought back the same perf regressions: == Change summary for alert #12467 (as of Mon, 02 Apr 2018 22:41:06 GMT) == Regressions: 20% displaylist_mutate windows10-64 pgo e10s stylo 3,543.44 -> 4,245.75 19% displaylist_mutate linux64 pgo e10s stylo 2,864.80 -> 3,419.79 18% displaylist_mutate windows7-32 pgo e10s stylo 4,129.38 -> 4,867.35 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12467
These regressions are compared to the known-broken implementation of retained-dl before bug 1443027. We should rather compare the performance between "correct retained-dl" and "no retained-dl". Matt, do we have any numbers for "no retained-dl" on these tests?
The relanding from https://bugzilla.mozilla.org/show_bug.cgi?id=1443027#c43 brought back the same perf regressions: == Change summary for alert #12483 (as of Tue, 03 Apr 2018 20:22:39 GMT) == Regressions: 21% displaylist_mutate windows10-64 pgo e10s stylo 3,553.71 -> 4,309.13 19% displaylist_mutate linux64 pgo e10s stylo 2,855.87 -> 3,410.19 18% displaylist_mutate windows7-32 pgo e10s stylo 4,130.80 -> 4,894.38 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12483
(In reply to Markus Stange [:mstange] from comment #10) > These regressions are compared to the known-broken implementation of > retained-dl before bug 1443027. > We should rather compare the performance between "correct retained-dl" and > "no retained-dl". Matt, do we have any numbers for "no retained-dl" on these > tests? I ran the latest m-c talos tests with display list retaining disabled and enabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3eba631bda10&newProject=try&newRevision=68d53e6a15b9&framework=1 disabled enabled Delta Confidence linux64 5,426.32 ± 0.41% > 3,720.88 ± 1.95% -31.43% 50.30 (high) linux64-qr 3,944.27 ± 0.56% > 3,266.20 ± 0.61% -17.19% 51.14 (high) Based on this, retained display lists are still a massive improvement over the baseline.
Thanks for the clarification. Sounds like we're still way ahead of where we started and the previous numbers were based on a flawed implementation. Calling this wontfix based on that, but we will of course still welcome any and all improvements stemming from comment 6 :)
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.