21.71 - 9.47% wikipedia FirstVisualChange / espn fnbpaint + 10 more (Android, Linux) regression on Tue July 20 2021
Categories
(Core :: Widget: Gtk, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | unaffected |
firefox90 | --- | unaffected |
firefox91 | --- | unaffected |
firefox92 | --- | wontfix |
firefox93 | --- | fixed |
People
(Reporter: alexandrui, Assigned: emilio)
References
(Regressed 1 open bug, Regression)
Details
(Keywords: perf, perf-alert, regression)
Crash Data
Attachments
(2 files)
Perfherder has detected a browsertime performance regression from push 9ac290ec5884fd52bb6c16e9794da5b42f211cbb. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
22% | wikipedia | FirstVisualChange | linux1804-64-shippable-qr | cold webrender | 1,636.67 -> 1,992.00 |
22% | wikipedia | fnbpaint | linux1804-64-shippable-qr | cold webrender | 1,618.08 -> 1,967.15 |
21% | wikipedia | ContentfulSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,649.00 -> 2,002.65 |
21% | wikipedia | SpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,645.50 -> 1,996.15 |
21% | wikipedia | PerceptualSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,660.67 -> 2,002.85 |
19% | wikipedia | fnbpaint | linux1804-64-shippable-qr | cold webrender | 1,646.65 -> 1,959.00 |
19% | wikipedia | FirstVisualChange | linux1804-64-shippable-qr | cold webrender | 1,670.00 -> 1,980.00 |
18% | wikipedia | ContentfulSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,681.42 -> 1,990.58 |
18% | wikipedia | SpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,677.12 -> 1,984.08 |
18% | wikipedia | PerceptualSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,692.21 -> 1,991.17 |
10% | espn | fcp | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 2,301.48 -> 2,520.08 |
9% | espn | fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 2,309.79 -> 2,528.58 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Comment 1•3 years ago
|
||
Set release status flags based on info from the regressing bug 1718755
Assignee | ||
Comment 2•3 years ago
|
||
I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.
Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?
Comment 3•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)
Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?
Joel may be in a better position to answer this question.
Comment 4•3 years ago
|
||
davehunt/sparky are much more familiar with these harnesses. Usually something like fission or webrender will come in via cli args or preferences in the profile.
Reporter | ||
Comment 5•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)
I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.
Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?
Don't think so. I did some retriggers on android.
Comment 6•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)
Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?
These profiles are merged for our Raptor performance tests:
- https://searchfox.org/mozilla-central/source/testing/profiles/perf/user.js
- https://searchfox.org/mozilla-central/source/testing/profiles/raptor/user.js
Tests can specify their own preferences, and variants (such as fission) can also set their own preferences. As far as I know none of these preferences relate to themes.
Updated•3 years ago
|
Comment 8•3 years ago
|
||
:alexandrui indicates that the retriggers point to bug 1718755 (I presume)
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Remove the need to post a runnable to coalesce them (we still coalesce
theme changes because those do non-trivial work).
What can go on is that on cold load we load the font list async, and
that goes through this code via a faked font.internaluseonly.changed
pref change, which stores a reframe change hint in
mChangeHintForPrefChange.
If the page flushes after this code runs, but before the next refresh
driver tick, we will be just wasting work. So instead do the
RebuildAllStyleData call synchronously. It just schedules a flush and
posts a hint, so it doesn't do the work right there, but makes the next
flush do the work as needed, so it should be faster and also more
correct.
Also improve handling of preference sheet prefs and such to avoid more
wasted work when we go through this code, as the assumption their
handling was doing was that it gets called infrequently, but it's not
the case due to the font list updates.
Depends on D123102
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
bugherder |
Comment 12•3 years ago
|
||
I'm not seeing any obvious improvements in the graphs from comment 0 so far since this landed :(
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
|
||
Ok, so I see what's going on. Before my patch, the actual font list change wasn't being processed in time.
I get the expected wins if I avoid the reframe that the font list update causes: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5c41af5c0cd502c4784dc74f049732d46ab4f3d8&newProject=try&newRevision=a6d8cbfe3c9d8a3356ce3c4b8c33117189927d54&framework=13&page=1
(This is the patch for reference).
Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...
The reframe comes from bug 1647573. If we don't want to just take this regression, we could perhaps avoid reframing here again? Before the patch in this bug, there's the possibility of other layouts happening before the pref update timer, and we were fine with that before... This only seems to affect Linux / Android because presumably other platforms don't trigger font fallback in a way that we need this reflow for this test...
Comment 14•3 years ago
|
||
Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...
Yeah, that makes sense. So the test doesn't fully reflect user experience, because there may be a further visual change coming after what the test measures. But OTOH, in many cases it may turn out that nothing visibly changes after the update, so showing the pre-update content to the user was actually a win even though it wasn't reliably correct.
When there is a "complete" font-list update (e.g. in response to the user installing a new font while Firefox is running), it's essential that we do a full reframe of everything, because the existing frame tree will be holding references to textruns that have references to font objects that no longer exist.
However, in the case of this reflow we don't actually need to reframe, because existing font objects haven't gone away, and any cached references will still be valid; we should just need to restyle so that textframes can reconsider their font matching.
Maybe it's time to finally get rid of the old hack of touching the font.internaluseonly.updated
pref in order to force updates, and instead have a more fine-grained way of notifying such updates.
Assignee | ||
Comment 16•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Reporter | ||
Comment 17•3 years ago
|
||
(In reply to Pulsebot from comment #10)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5235f051d952
Improve preference change handling code to be faster. r=jfkthame
== Change summary for alert #31020 (as of Wed, 25 Aug 2021 11:28:26 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
7% | google-search-restaurants dcf | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 535.52 -> 496.75 |
7% | web-de fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 391.42 -> 365.42 |
6% | web-de fcp | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 387.73 -> 363.50 |
6% | web-de fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 391.48 -> 367.04 |
6% | google-search-restaurants fcp | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 511.19 -> 480.00 |
... | ... | ... | ... | ... |
3% | google-search-restaurants loadtime | android-hw-g5-7-0-arm7-shippable-qr | cold webrender | 713.48 -> 690.71 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31020
Comment 18•3 years ago
|
||
It's good to see progress being made on this bug, but I think we're out of time to get a fix into Fx92. I'm honestly a bit unsure whether we're better off just taking this hit for a cycle vs. trying to backout the regressing patches this late in the cycle. What are your thoughts, Emilio?
Comment 19•3 years ago
|
||
Let's see what Emilio says, but FWIW my vote would be to just take the hit for now. Cold-start wikipedia is something of a worst-case, and doesn't represent a perf regression that users are going to experience on an ongoing basis during a browsing session.
(The second patch here should help further, but I wouldn't want to try and uplift to 92 this late in the cycle; I think we should just aim to make these improvements in 93.)
Comment 22•3 years ago
|
||
Comment 23•3 years ago
|
||
bugherder |
Comment 24•3 years ago
|
||
What's the follow-on, now that this bug is marked resolved? We've had no noticeable improvement since this regression happened. Are improvements still planned for 93?
Assignee | ||
Comment 25•3 years ago
|
||
I don't think it's easy to improve on this (ni? Jonathan to confirm). The current behavior is more correct than before. I could avoid or artificially delay the reflow when this happens to restore the previous metrics, but it'd be a correctness issue / potentially cause flashes.
Comment 26•3 years ago
|
||
Yeah, comment 25 sounds right to me. If we render sooner by deferring the reflow here, we introduce/increase the chance of rendering temporarily-incorrect content that will then flash/shift when it updates to the final rendering, which is undesirable.
Updated•3 years ago
|
Updated•3 years ago
|
Description
•