Closed Bug 1722487 Opened 3 years ago Closed 3 years ago

21.71 - 9.47% wikipedia FirstVisualChange / espn fnbpaint + 10 more (Android, Linux) regression on Tue July 20 2021

Tracking

()

Status:

RESOLVED FIXED

Milestone:

93 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

unaffected

firefox-esr91

---

unaffected

firefox90

---

unaffected

firefox91

---

unaffected

firefox92

---

wontfix

firefox93

---

fixed

People

(Reporter: alexandrui, Assigned: emilio)

References

(Regressed 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression)

Crash Data

Attachments

(2 files)

Bug 1722487 - Improve preference change handling code to be faster. r=jfkthame 3 years ago Emilio Cobos Álvarez (:emilio) (deleted), text/x-phabricator-request		Details
Bug 1722487 - Avoid some work for font list updates. r=jfkthame 3 years ago Emilio Cobos Álvarez (:emilio) (deleted), text/x-phabricator-request		Details

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Description

•

3 years ago

Perfherder has detected a browsertime performance regression from push 9ac290ec5884fd52bb6c16e9794da5b42f211cbb. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio	Suite	Test	Platform	Options	Absolute values (old vs new)
22%	wikipedia	FirstVisualChange	linux1804-64-shippable-qr	cold webrender	1,636.67 -> 1,992.00
22%	wikipedia	fnbpaint	linux1804-64-shippable-qr	cold webrender	1,618.08 -> 1,967.15
21%	wikipedia	ContentfulSpeedIndex	linux1804-64-shippable-qr	cold webrender	1,649.00 -> 2,002.65
21%	wikipedia	SpeedIndex	linux1804-64-shippable-qr	cold webrender	1,645.50 -> 1,996.15
21%	wikipedia	PerceptualSpeedIndex	linux1804-64-shippable-qr	cold webrender	1,660.67 -> 2,002.85
19%	wikipedia	fnbpaint	linux1804-64-shippable-qr	cold webrender	1,646.65 -> 1,959.00
19%	wikipedia	FirstVisualChange	linux1804-64-shippable-qr	cold webrender	1,670.00 -> 1,980.00
18%	wikipedia	ContentfulSpeedIndex	linux1804-64-shippable-qr	cold webrender	1,681.42 -> 1,990.58
18%	wikipedia	SpeedIndex	linux1804-64-shippable-qr	cold webrender	1,677.12 -> 1,984.08
18%	wikipedia	PerceptualSpeedIndex	linux1804-64-shippable-qr	cold webrender	1,692.21 -> 1,991.17
10%	espn	fcp	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	2,301.48 -> 2,520.08
9%	espn	fnbpaint	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	2,309.79 -> 2,528.58

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(emilio)

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

3 years ago

Set release status flags based on info from the regressing bug 1718755

status-firefox90: --- → unaffected

status-firefox91: --- → unaffected

status-firefox-esr78: --- → unaffected

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 2

•

3 years ago

I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Flags: needinfo?(emilio) → needinfo?(aionescu)

Ryan VanderMeulen [:RyanVM]

Comment 3

•

3 years ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Joel may be in a better position to answer this question.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

3 years ago

davehunt/sparky are much more familiar with these harnesses. Usually something like fission or webrender will come in via cli args or preferences in the profile.

Flags: needinfo?(jmaher) → needinfo?(dave.hunt)

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Comment 5

•

3 years ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Don't think so. I did some retriggers on android.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 6

•

3 years ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

These profiles are merged for our Raptor performance tests:

Tests can specify their own preferences, and variants (such as fission) can also set their own preferences. As far as I know none of these preferences relate to themes.

Flags: needinfo?(dave.hunt)

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Comment 7

•

3 years ago

The retriggers reveal this bug.

Flags: needinfo?(aionescu)

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox92: affected → fix-optional

status-firefox-esr91: --- → unaffected

Randell Jesup [:jesup] (needinfo me)

Comment 8

•

3 years ago

:alexandrui indicates that the retriggers point to bug 1718755 (I presume)

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

3 years ago

Depends on: 1726548

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

3 years ago

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 9

•

3 years ago

Attached file Bug 1722487 - Improve preference change handling code to be faster. r=jfkthame (deleted) — Details

Remove the need to post a runnable to coalesce them (we still coalesce
theme changes because those do non-trivial work).

What can go on is that on cold load we load the font list async, and
that goes through this code via a faked font.internaluseonly.changed
pref change, which stores a reframe change hint in
mChangeHintForPrefChange.

If the page flushes after this code runs, but before the next refresh
driver tick, we will be just wasting work. So instead do the
RebuildAllStyleData call synchronously. It just schedules a flush and
posts a hint, so it doesn't do the work right there, but makes the next
flush do the work as needed, so it should be faster and also more
correct.

Also improve handling of preference sheet prefs and such to avoid more
wasted work when we go through this code, as the assumption their
handling was doing was that it gets called infrequently, but it's not
the case due to the font list updates.

Depends on D123102

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → emilio

Status: NEW → ASSIGNED

Pulsebot

Comment 10

•

3 years ago

Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5235f051d952 Improve preference change handling code to be faster. r=jfkthame

Marian-Vasile Laza

Comment 11

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5235f051d952

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox93: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Ryan VanderMeulen [:RyanVM]

Comment 12

•

3 years ago

I'm not seeing any obvious improvements in the graphs from comment 0 so far since this landed :(

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

3 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 13

•

3 years ago

Ok, so I see what's going on. Before my patch, the actual font list change wasn't being processed in time.

I get the expected wins if I avoid the reframe that the font list update causes: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5c41af5c0cd502c4784dc74f049732d46ab4f3d8&newProject=try&newRevision=a6d8cbfe3c9d8a3356ce3c4b8c33117189927d54&framework=13&page=1

(This is the patch for reference).

Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...

The reframe comes from bug 1647573. If we don't want to just take this regression, we could perhaps avoid reframing here again? Before the patch in this bug, there's the possibility of other layouts happening before the pref update timer, and we were fine with that before... This only seems to affect Linux / Android because presumably other platforms don't trigger font fallback in a way that we need this reflow for this test...

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 14

•

3 years ago

Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...

Yeah, that makes sense. So the test doesn't fully reflect user experience, because there may be a further visual change coming after what the test measures. But OTOH, in many cases it may turn out that nothing visibly changes after the update, so showing the pre-update content to the user was actually a win even though it wasn't reliably correct.

When there is a "complete" font-list update (e.g. in response to the user installing a new font while Firefox is running), it's essential that we do a full reframe of everything, because the existing frame tree will be holding references to textruns that have references to font objects that no longer exist.

However, in the case of this reflow we don't actually need to reframe, because existing font objects haven't gone away, and any cached references will still be valid; we should just need to restyle so that textframes can reconsider their font matching.

Maybe it's time to finally get rid of the old hack of touching the font.internaluseonly.updated pref in order to force updates, and instead have a more fine-grained way of notifying such updates.

Flags: needinfo?(jfkthame)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 15

•

3 years ago

Alright, I'll give that a shot.

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 16

•

3 years ago

Attached file Bug 1722487 - Avoid some work for font list updates. r=jfkthame (deleted) — Details

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

3 years ago

Flags: needinfo?(emilio)

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Comment 17

•

3 years ago

(In reply to Pulsebot from comment #10)

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5235f051d952
Improve preference change handling code to be faster. r=jfkthame

== Change summary for alert #31020 (as of Wed, 25 Aug 2021 11:28:26 GMT) ==

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
7%	google-search-restaurants dcf	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	535.52 -> 496.75
7%	web-de fnbpaint	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	391.42 -> 365.42
6%	web-de fcp	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	387.73 -> 363.50
6%	web-de fnbpaint	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	391.48 -> 367.04
6%	google-search-restaurants fcp	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	511.19 -> 480.00
...	...	...	...	...
3%	google-search-restaurants loadtime	android-hw-g5-7-0-arm7-shippable-qr	cold webrender	713.48 -> 690.71

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31020

Ryan VanderMeulen [:RyanVM]

Comment 18

•

3 years ago

It's good to see progress being made on this bug, but I think we're out of time to get a fix into Fx92. I'm honestly a bit unsure whether we're better off just taking this hit for a cycle vs. trying to backout the regressing patches this late in the cycle. What are your thoughts, Emilio?

Flags: needinfo?(emilio)

Jonathan Kew [:jfkthame]

Comment 19

•

3 years ago

Let's see what Emilio says, but FWIW my vote would be to just take the hit for now. Cold-start wikipedia is something of a worst-case, and doesn't represent a perf regression that users are going to experience on an ongoing basis during a browsing session.

(The second patch here should help further, but I wouldn't want to try and uplift to 92 this late in the cycle; I think we should just aim to make these improvements in 93.)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 20

•

3 years ago

Yeah, I agree with Jonathan here.

Flags: needinfo?(emilio)

Ryan VanderMeulen [:RyanVM]

Comment 21

•

3 years ago

Thanks.

Status: REOPENED → ASSIGNED

status-firefox92: fix-optional → wontfix

status-firefox93: fixed → affected

Target Milestone: 93 Branch → ---

Pulsebot

Comment 22

•

3 years ago

Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d400d0a16ce6 Avoid some work for font list updates. r=jfkthame

Cristina Cozmuta (:CrissCozmuta)

Comment 23

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d400d0a16ce6

Status: ASSIGNED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox93: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Randell Jesup [:jesup] (needinfo me)

Comment 24

•

3 years ago

What's the follow-on, now that this bug is marked resolved? We've had no noticeable improvement since this regression happened. Are improvements still planned for 93?

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 25

•

3 years ago

I don't think it's easy to improve on this (ni? Jonathan to confirm). The current behavior is more correct than before. I could avoid or artificially delay the reflow when this happens to restore the previous metrics, but it'd be a correctness issue / potentially cause flashes.

Flags: needinfo?(emilio) → needinfo?(jfkthame)

blinky

Updated

•

3 years ago

Regressions: 1730456

Jonathan Kew [:jfkthame]

Comment 26

•

3 years ago

Yeah, comment 25 sounds right to me. If we render sooner by deferring the reflow here, we introduce/increase the chance of rendering temporarily-incorrect content that will then flash/shift when it updates to the final rendering, which is undesirable.

Flags: needinfo?(jfkthame)

Tom

Updated

•

3 years ago

Regressions: 1727399

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Crash Signature: [@ gfxFontGroup::FindFallbackFaceForChar] [@ gfxFontGroup::FindFontForChar]

BMO Automation

Updated

•

3 years ago

Has Regression Range: --- → yes

Chris Peterson [:cpeterson]

Updated

•

2 years ago

Regressions: 1771810

You need to log in before you can comment on or make changes to this bug.