Closed Bug 1416765 Opened 7 years ago Closed 7 years ago

7.29 - 23.16% displaylist_mutate (linux64, osx-10-10, windows10-64) regression on push d5e1ed773fefe2eb9d52402dd74b5fea19b8b4a4 (Sun Nov 12 2017)

Categories

(Core :: Web Painting, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox58 --- affected

People

(Reporter: igoldan, Unassigned, NeedInfo)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=d5e1ed773fefe2eb9d52402dd74b5fea19b8b4a4 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 23% displaylist_mutate summary linux64 opt e10s 20,685.31 -> 25,475.28 18% displaylist_mutate summary osx-10-10 opt e10s 20,234.77 -> 23,934.07 7% displaylist_mutate summary windows10-64 opt e10s22,171.26 -> 23,788.47 Improvements: 7% displaylist_mutate summary windows7-32 opt e10s 27,981.42 -> 25,985.53 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10507 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Layout: Web Painting
Product: Firefox → Core
:mattwoodrow Bug 1416055 caused very serious regressions on our displaylist_mutate tests. We need to look for performance regain.
Flags: needinfo?(matt.woodrow)
I am quite certain that this is a false positive caused by the code behind MOZ_DIAGNOSTIC_ASSERT_ENABLED flag. Especially merging phase is really slow because we call nsDisplayListBuilder::DebugContains (which is O(n) complexity) when we move items from the new and old list to the final merged list. I ran benchmarks locally (just once, so there might be some noise) on MBP and got the following results: retain-dl-disabled.json: 13611.3875 retain-dl-enabled.json: 18768.75 retain-dl-disabled-no-assert.json: 6406.885 retain-dl-enabled-no-assert.json: 4499.945 This means we are actually 30% faster, which is expected as this patch enables display list retaining. :)
this also seems to have a memory regression: == Change summary for alert #10502 (as of Mon, 13 Nov 2017 01:32:24 GMT) == Regressions: 10% Heap Unclassified summary linux64-stylo-sequential opt stylo-sequential 56,382,323.76 -> 62,016,322.59 8% Heap Unclassified summary windows7-32 opt 39,778,604.35 -> 42,830,737.38 7% Heap Unclassified summary windows7-32-stylo-disabled opt stylo-disabled 40,217,791.25 -> 43,163,248.62 7% Heap Unclassified summary windows7-32 pgo 39,759,862.28 -> 42,516,938.59 6% Heap Unclassified summary windows10-64-stylo-disabled opt stylo-disabled 47,275,422.42 -> 50,101,763.52 6% Heap Unclassified summary windows10-64 pgo 46,989,200.51 -> 49,783,850.41 6% Heap Unclassified summary windows10-64 opt 47,073,697.66 -> 49,839,939.10 4% Heap Unclassified summary osx-10-10 opt 69,403,570.22 -> 72,135,125.47 4% Heap Unclassified summary macosx64-stylo-disabled opt stylo-disabled 69,974,128.60 -> 72,546,720.89 3% Explicit Memory summary linux32-stylo-disabled opt stylo-disabled 235,713,965.45 -> 242,208,571.89 2% Explicit Memory summary linux64 opt 314,877,489.06 -> 322,000,828.94 2% Explicit Memory summary linux64-stylo-disabled opt stylo-disabled 308,452,114.93 -> 315,314,757.03 2% Explicit Memory summary linux64-stylo-sequential opt stylo-sequential 313,418,948.68 -> 320,044,638.66 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10502 ^ note: 3 patches landed at the same time: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7fb001f811d183095e9cfdd958c9215484377396&tochange=d5e1ed773fefe2eb9d52402dd74b5fea19b8b4a4 regarding the concern about the regression, this is a newer test that was added in bug 1411804, :dvander, could you comment on the stability of this benchmark that you added?
Flags: needinfo?(dvander)
Well... I don't actually know yet :) Matt & Miko's work on retained-dl was the motivation for adding this test. We expected RDL to have a large positive impact on it and for that to stabilize at a new number. Given what Miko said in comment #2 - the regression is expected and when they disable the assert the test will restabilize. (I guess the assert could also be turned off for Talos via a flag or something.)
Flags: needinfo?(dvander)
We also noticed these regressions on Windows 10: 10% tsvgr_opacity summary windows10-64 pgo e10s 222.65 -> 244.30 9% tsvgr_opacity summary windows10-64 opt e10s 234.26 -> 256.35 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10507
:miko We still need to resolve the AWSY regressions and the tsvgr_opacity from Windows 10. Please look over these.
Flags: needinfo?(mikokm)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6) > :miko We still need to resolve the AWSY regressions and the tsvgr_opacity > from Windows 10. Please look over these. Most of the additional memory usage is caused by us keeping the previous display list in memory, while building a new one for the invalidated area. This is needed to perform a partial build of the display list. We also need to store some extra state to be able to merge display lists properly. I ran Talos SVG suite on try with the MOZ_DIAGNOSTIC_ASSERT_ENABLED macro enabled and disabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=ad2286b11bff34b632f3df4afa5b628de0582998&newProject=try&newRevision=2e2eabeee970&framework=1 Removing MOZ_DIAGNOSTIC_ASSERT_ENABLED makes tsvgr_opacity run 12% faster on win10-64 opt build, and 15% faster on pgo build, when retained display list is enabled. Here is a comparison of retained display list enabled and disabled (with MOZ_DIAGNOSTIC_ASSERT_ENABLED disabled): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=880f2d443fdf&newProject=try&newRevision=2e2eabeee970&framework=1 The results are practically identical, which is expected, since the display list created in the test is quite small.
Flags: needinfo?(mikokm)
(In reply to Miko Mynttinen [:miko] from comment #7) > Most of the additional memory usage is caused by us keeping the previous > display list in memory, while building a new one for the invalidated area. > This is needed to perform a partial build of the display list. We also need > to store some extra state to be able to merge display lists properly. Is there something we can do to partially reduce this memory consumption? If not, we can resolve this bug as WONTFIX. > I ran Talos SVG suite on try with the MOZ_DIAGNOSTIC_ASSERT_ENABLED macro > enabled and disabled: > https://treeherder.mozilla.org/perf.html#/ > compare?originalProject=try&originalRevision=ad2286b11bff34b632f3df4afa5b628d > e0582998&newProject=try&newRevision=2e2eabeee970&framework=1 > > Removing MOZ_DIAGNOSTIC_ASSERT_ENABLED makes tsvgr_opacity run 12% faster on > win10-64 opt build, and 15% faster on pgo build, when retained display list > is enabled. > > Here is a comparison of retained display list enabled and disabled (with > MOZ_DIAGNOSTIC_ASSERT_ENABLED disabled): > https://treeherder.mozilla.org/perf.html#/ > compare?originalProject=try&originalRevision=880f2d443fdf&newProject=try&newR > evision=2e2eabeee970&framework=1 > > The results are practically identical, which is expected, since the display > list created in the test is quite small. Thank you for sharing this!
Flags: needinfo?(mikokm)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8) > (In reply to Miko Mynttinen [:miko] from comment #7) > > Most of the additional memory usage is caused by us keeping the previous > > display list in memory, while building a new one for the invalidated area. > > This is needed to perform a partial build of the display list. We also need > > to store some extra state to be able to merge display lists properly. > Is there something we can do to partially reduce this memory consumption? > If not, we can resolve this bug as WONTFIX. I believe the memory consumption will go down a little, when we optimize the algorithm to rebuild less items. We are currently also storing state for display items twice, because parts of the code modify the state, and we need the original state when reusing the items. Hopefully we can eventually force immutability for display items.
Flags: needinfo?(mikokm)
Resolving as WONTFIX as per comment 8.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.