Open Bug 1704463 Opened 4 years ago Updated 3 years ago

89.75 - 3.43% imgur ContentfulSpeedIndex / nytimes PerceptualSpeedIndex + 10 more (Linux, OSX, Windows) regression on Wed April 7 2021

Categories: Firefox :: Theme, defect, P3

Performance Impact: none

Tracking Status:
firefox-esr78 --- unaffected
firefox87 --- unaffected
firefox88 --- unaffected
firefox89 --- fix-optional

People: Reporter: Bebe, Unassigned

References: Blocks 1 open bug, Regression

Details: Keywords: perf, perf-alert, regression; Whiteboard: [proton]

Attachments: 4 files

Perfherder has detected a browsertime performance regression from push 1716229005d8b97f305bd663b8bc8d3fec4a4f3a. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio  Suite        Test                  Platform                    Options         Absolute values (old vs new)
90%    imgur        ContentfulSpeedIndex  linux1804-64-shippable-qr   cold webrender  1,352.75 -> 2,566.83
80%    imgur        ContentfulSpeedIndex  linux1804-64-shippable-qr   cold webrender  1,352.92 -> 2,437.92
72%    imgur        LastVisualChange      linux1804-64-shippable-qr   cold webrender  3,096.67 -> 5,330.00
23%    youtube      SpeedIndex            windows10-64-shippable-qr   warm webrender  827.08 -> 1,020.75
19%    youtube      SpeedIndex            windows10-64-shippable-qr   cold webrender  1,286.42 -> 1,536.92
7%     cnn          ContentfulSpeedIndex  linux1804-64-shippable-qr   warm webrender  1,246.58 -> 1,328.75
6%     reddit       ContentfulSpeedIndex  windows10-64-shippable-qr   warm webrender  361.17 -> 384.17
5%     twitch       ContentfulSpeedIndex  macosx1015-64-shippable-qr  warm webrender  977.00 -> 1,028.25
4%     bing-search  ContentfulSpeedIndex  linux1804-64-shippable-qr   cold webrender  320.08 -> 333.50
4%     twitch       ContentfulSpeedIndex  macosx1015-64-shippable-qr  cold webrender  1,009.21 -> 1,047.50
4%     google-docs  ContentfulSpeedIndex  linux1804-64-shippable-qr   warm webrender  1,390.08 -> 1,441.58
3%     nytimes      PerceptualSpeedIndex  linux1804-64-shippable-qr   warm webrender  851.83 -> 881.08

Improvements:

Ratio  Suite       Test                  Platform                                    Options         Absolute values (old vs new)
20%    outlook     ContentfulSpeedIndex  linux1804-64-shippable                      warm            926.88 -> 737.25
20%    outlook     ContentfulSpeedIndex  linux1804-64-shippable-qr                   warm webrender  991.42 -> 790.58
19%    yahoo-mail  LastVisualChange      linux1804-64-shippable                      warm            1,096.67 -> 886.67
19%    youtube     SpeedIndex            linux1804-64-shippable-qr                   warm webrender  1,044.50 -> 849.50
17%    yahoo-mail  LastVisualChange      linux1804-64-shippable-qr                   warm webrender  1,163.33 -> 960.00
17%    outlook     ContentfulSpeedIndex  macosx1015-64-shippable-qr                  warm webrender  919.62 -> 763.17
15%    yahoo-mail  LastVisualChange      linux1804-64-shippable                      cold            1,566.67 -> 1,333.33
15%    yahoo-mail  LastVisualChange      macosx1015-64-shippable-qr                  warm webrender  1,096.67 -> 933.33
15%    youtube     SpeedIndex            linux1804-64-shippable-qr                   cold webrender  1,488.75 -> 1,267.67
14%    yahoo-mail  LastVisualChange      linux1804-64-shippable-qr                   cold webrender  1,626.67 -> 1,403.33
10%    yahoo-mail  LastVisualChange      macosx1015-64-shippable-qr                  cold webrender  1,603.33 -> 1,450.00
9%     yahoo-mail  ContentfulSpeedIndex  windows10-64-shippable-qr                   warm webrender  354.12 -> 321.17
8%     paypal      ContentfulSpeedIndex  windows10-64-shippable-qr                   cold webrender  738.88 -> 682.75
4%     fandom      ContentfulSpeedIndex  windows10-64-shippable-qr                   warm webrender  219.58 -> 211.08
4%     imdb        loadtime              android-hw-g5-7-0-arm7-api-16-shippable-qr  warm webrender  4,221.24 -> 4,061.46
3%     twitch      ContentfulSpeedIndex  windows10-64-shippable-qr                   warm webrender  1,331.00 -> 1,297.67
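
For readers checking the numbers: the Ratio column appears to be the magnitude of the relative change between the old and new values, which is consistent with the 89.75% and 3.43% figures in the bug title. A minimal sketch of that arithmetic, assuming this is how the percentages are derived (`percent_change` is an illustrative name, not part of Perfherder):

```python
def percent_change(old: float, new: float) -> float:
    """Magnitude of the relative change from old to new, in percent."""
    return abs(new - old) / old * 100.0


# (label, old value, new value) taken from the alert tables in this bug
rows = [
    ("imgur ContentfulSpeedIndex (regression)", 1352.75, 2566.83),
    ("nytimes PerceptualSpeedIndex (regression)", 851.83, 881.08),
    ("outlook ContentfulSpeedIndex (improvement)", 926.88, 737.25),
]

for label, old, new in rows:
    print(f"{label}: {percent_change(old, new):.2f}%")
```

The first two rows come out at 89.75% and 3.43%, matching the range quoted in the title; the tables show the same values rounded to whole percentages.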

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(bigiri)

Set release status flags based on info from the regressing bug 1700109

== Change summary for alert #29614 (as of Thu, 08 Apr 2021 05:56:29 GMT) ==

Regressions:

Ratio  Suite      Test  Platform                   Options                  Absolute values (old vs new)
5%     tresize          linux1804-64-shippable-qr  e10s stylo webrender-sw  19.48 -> 20.51
5%     tresize          linux1804-64-shippable-qr  e10s stylo webrender-sw  19.49 -> 20.47
4%     tabswitch        linux1804-64-shippable-qr  e10s stylo webrender     6.27 -> 6.49

Improvements:

Ratio  Suite     Test                        Platform                    Options                  Absolute values (old vs new)
18%    tart                                  linux1804-64-shippable-qr   e10s stylo webrender-sw  4.08 -> 3.34
12%    tart                                  windows10-64-shippable-qr   e10s stylo webrender     2.90 -> 2.56
10%    tart                                  macosx1015-64-shippable     e10s stylo               2.29 -> 2.05
7%     tart                                  macosx1015-64-shippable-qr  e10s stylo webrender     2.12 -> 1.97
5%     tart                                  macosx1015-64-shippable-qr  e10s stylo webrender-sw  2.23 -> 2.11
4%     tresize                               macosx1015-64-shippable-qr  e10s stylo webrender     7.28 -> 6.95
4%     twinopen  ext+twinopen:twinopen.html  macosx1015-64-shippable-qr  e10s stylo webrender-sw  95.81 -> 91.72
4%     twinopen  ext+twinopen:twinopen.html  macosx1015-64-shippable-qr  e10s stylo webrender     113.17 -> 108.80
4%     tart                                  linux1804-64-shippable      e10s stylo               2.89 -> 2.78

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29614

Any thoughts on this, Mike?

Flags: needinfo?(bigiri) → needinfo?(mconley)

The tresize and tabswitch regressions listed in comment 2 for Linux (only) look small enough in magnitude (roughly 1ms or less on average each) to be worth ignoring.

The page speed regressions and improvements in comment 0 are much more surprising.

Hey jesup, do you know what the current state-of-the-art is for getting comparison profiles for page speed tests?

Flags: needinfo?(mconley) → needinfo?(rjesup)

Making sure rtestard is in the loop.

Blocks: proton
Whiteboard: [proton]

Some of the regressions relate to sites with video playback (Twitch and YouTube). Could that be related to the second line of text now displayed on the tab?

Setting as P1 to reflect the fact that we need to investigate in order to understand the source of the issue. It can then be re-assessed once we better understand the root cause.

Priority: -- → P1

Dusting off my perf chops to see what's going on here.

Assignee: nobody → mconley

Here are some profiles for the 90% Imgur regression:

Before: https://share.firefox.dev/3snSLDf

After: https://share.firefox.dev/3uSqYwh

The "After" profile with Proton seems to be loading more JS in the content process than the "Before" profile. Here's a list of JS files that the "after" profile seems to load that the "before" profile does not:

One thing worth noting is that the browser chrome is now larger, and the content area is smaller, which may affect the visual metrics calculations.

A side-by-side comparison reveals that the Proton one actually did a bit better.

So jesup notes that the last major frame of difference is that "We value your privacy" banner, and that it appears slightly later in the Proton case. Because that change takes up so much of the content area, it has a relatively high impact on the overall score.
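
To illustrate why a late, large element weighs so heavily, here is a simplified Speed Index-style calculation: the score is the area above the visual-completeness curve over time. This is only a toy model under stated assumptions. The frame data is hypothetical, `speed_index` is an illustrative name, and the real visual metrics (ContentfulSpeedIndex in particular) measure completeness differently.

```python
from typing import List, Tuple

# (timestamp in ms, fraction of the final frame already painted, 0.0 to 1.0)
Frame = Tuple[float, float]


def speed_index(frames: List[Frame]) -> float:
    """Area above the visual-completeness curve, in ms (lower is better)."""
    total = 0.0
    # Each interval contributes its duration weighted by how incomplete
    # the page still was at the start of that interval.
    for (t0, done0), (t1, _) in zip(frames, frames[1:]):
        total += (1.0 - done0) * (t1 - t0)
    return total


# Hypothetical load: 40% painted at 500 ms, then a banner covering the
# remaining 60% of the content area appears at 1000 ms or at 1200 ms.
banner_at_1000 = [(0, 0.0), (500, 0.4), (1000, 1.0)]
banner_at_1200 = [(0, 0.0), (500, 0.4), (1200, 1.0)]

print(speed_index(banner_at_1000))
print(speed_index(banner_at_1200))
```

In this toy model a 200 ms delay of an element covering 60% of the viewport adds 0.6 × 200 = 120 ms to the score, whereas the same delay on an element covering 5% would add only 10 ms.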

I've zoomed in on the part of each profile where (I believe) the banner is being shown after a setTimeout:

Before: https://share.firefox.dev/3ahxIfg

After: https://share.firefox.dev/2QwCBdi

Notably, there's a long-ish (47.1ms) layout flush in the "after" profile.

Some updates here:

Imgur regressions on Linux 64 cold webrender

I wrote a patch that took the pre-Proton state of Firefox, and inflated the navigation toolbar padding to match Proton, which caused the content area to be the same size in both the before and after cases.

This seems to address the Imgur issues on Linux 64: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5bc22fff4a703d278926430e3eae22f5740b04d0&newProject=try&newRevision=379a834aa848613146bc7b0c590336926acf7058&framework=13
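
For anyone wanting to run the same kind of comparison on other try pushes, the compare link above can be rebuilt from its query parameters. A minimal sketch, assuming the parameter names seen in that URL stay stable; `perfherder_compare_url` is a hypothetical helper, not a Treeherder API, and the framework id is simply copied from the link:

```python
from urllib.parse import urlencode

COMPARE_BASE = "https://treeherder.mozilla.org/perfherder/compare"


def perfherder_compare_url(
    original_revision: str,
    new_revision: str,
    project: str = "try",
    framework: int = 13,  # value copied from the compare link in this bug
) -> str:
    """Build a compare URL using the query parameters seen in this bug."""
    params = {
        "originalProject": project,
        "originalRevision": original_revision,
        "newProject": project,
        "newRevision": new_revision,
        "framework": framework,
    }
    return f"{COMPARE_BASE}?{urlencode(params)}"


# The two try revisions used in the comparison above.
url = perfherder_compare_url(
    "5bc22fff4a703d278926430e3eae22f5740b04d0",
    "379a834aa848613146bc7b0c590336926acf7058",
)
print(url)
```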

So my conclusion is that this "regression" is really just a re-baselining. I don't think it actually is capturing a meaningful regression that would impact our users, so I suggest we accept it as the new baseline with Proton enabled.

YouTube regressions on Windows 10 64-bit webrender (cold and warm)

These regressions appear to have gone away?:

So I guess there's nothing to do here.

CNN

The CNN regression also appears to have receded:

The rest

The rest of the regressions persist. I'm going to see if the change to the content area is the cause for them, too.

After retriggers, the only signal that appears to stick around is reddit ContentfulSpeedIndex opt warm webrender on Windows 10: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5bc22fff4a703d278926430e3eae22f5740b04d0&newProject=try&newRevision=379a834aa848613146bc7b0c590336926acf7058&framework=13

This regression still appears to persist: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&series=autoland,3381656,1,13&series=autoland,3400821,1,13&timerange=1209600

Investigating - though note that the Reddit vismet test is Tier 2, so this should not block.

Attached video: warm-side-by-side.mp4 (deleted)

Side-by-side comparison of the Reddit warm run on Windows.

I used sparky's generate_side_by_side.py script with these arguments:

python3 generate_side_by_side.py --base-revision 5bc22fff4a703d278926430e3eae22f5740b04d0 --base-branch try --new-revision 379a834aa848613146bc7b0c590336926acf7058 --new-branch try --platform test-windows10-64-shippable-qr/opt --test-name browsertime-tp6-firefox-reddit-e10s --warm --output ./output

Slowing the video down, it looks like the "after" video starts the load a frame or two later, and the cookie banner at the bottom comes in a few frames after.

The before and after profiles don't reveal anything obviously actionable here. I suspect the change in browser UI has caused a shift in scheduling, or memory layout, or something along those lines, which is causing this small regression.

rtestard: Given my conclusion for the regressions in comment 18, and in this comment, I suggest this bug no longer block MR1.

Flags: needinfo?(rtestard)

Thanks for the thorough investigations! Agreed that if this is most likely a re-baselining it should not block Proton.
Leaving this open as a P3, since it seems a follow-up is needed to ensure that the Proton changes are accounted for moving forward (set a new baseline?).

Flags: needinfo?(rtestard)
Priority: P1 → P3
Whiteboard: [proton] → [proton][qf]
Assignee: mconley → nobody
Whiteboard: [proton][qf] → [proton][qf-]
Has Regression Range: --- → yes
Performance Impact: --- → -
Whiteboard: [proton][qf-] → [proton]