17.4 - 8.01% tart / tart + 3 more (Linux, OSX, Windows) regression on Sat April 10 2021
Categories
(Firefox :: Address Bar, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox87 | --- | unaffected |
firefox88 | --- | unaffected |
firefox89 | --- | wontfix |
People
(Reporter: alexandrui, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(4 keywords, Whiteboard: [proton-address-bar])
Attachments
(8 files)
Perfherder has detected a talos performance regression from push 135c5397086dac5623bc45d92bd6e7709806d89d. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
17% | tart | windows10-64-shippable | e10s stylo | 2.24 -> 2.63 | |
17% | tart | linux1804-64-shippable | e10s stylo | 2.74 -> 3.21 | |
15% | tart | linux1804-64-shippable | e10s stylo | 2.81 -> 3.23 | |
14% | tart | macosx1015-64-shippable | e10s stylo | 2.08 -> 2.38 | |
8% | tart | windows10-64-shippable-qr | e10s stylo webrender | 2.58 -> 2.78 |
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
9% | tresize | macosx1015-64-shippable | e10s stylo | 9.71 -> 8.87 | |
7% | tp5o_scroll | macosx1015-64-shippable | e10s stylo | 1.52 -> 1.41 | |
7% | tresize | macosx1015-64-shippable-qr | e10s stylo webrender-sw | 7.82 -> 7.28 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Comment 1•4 years ago
|
||
It's possible this is partially undoing what graduating the main proton pref did, and may be related to to the pixel rounding problem that made us first mark WPT tests passing when the main pref graduated, and then failing again when urlbar graduated, there must be an half pixel varying the toolbar size.
At least some of the graphs show us going back to previous values.
It's also possible tart doesn't appreciate much the wider shadow around the urlbar, though we also removed the expansion of the urlbar.
This affects mostly Linux and Windows, on the Mac we just seem to go back to the status-quo.
We should probably run some experiments related to the shadow and focus ring.
Let's move this to the right Epic.
Comment 2•4 years ago
|
||
== Change summary for alert #29679 (as of Tue, 13 Apr 2021 11:40:36 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
23% | youtube | SpeedIndex | windows10-64-shippable | warm | 792.62 -> 972.50 |
22% | youtube | SpeedIndex | windows10-64-shippable | warm | 789.29 -> 962.17 |
18% | youtube | SpeedIndex | windows10-64-shippable | cold | 1,240.92 -> 1,469.58 |
10% | paypal | ContentfulSpeedIndex | windows10-64-shippable-qr | cold webrender | 688.31 -> 758.17 |
5% | google-docs | SpeedIndex | windows10-64-shippable | warm | 1,103.17 -> 1,153.33 |
4% | bing-search | ContentfulSpeedIndex | linux1804-64-shippable-qr | cold webrender | 334.25 -> 347.33 |
4% | bing-search | ContentfulSpeedIndex | linux1804-64-shippable | cold | 323.69 -> 335.42 |
3% | bing-search | ContentfulSpeedIndex | linux1804-64-shippable-qr | warm webrender | 252.29 -> 260.33 |
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
19% | youtube | SpeedIndex | windows10-64-shippable-qr | warm webrender | 1,009.12 -> 813.25 |
16% | youtube | SpeedIndex | windows10-64-shippable-qr | cold webrender | 1,524.38 -> 1,275.33 |
7% | ebay | ContentfulSpeedIndex | linux1804-64-shippable | warm | 468.88 -> 434.83 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29679
Comment 3•4 years ago
|
||
all these tests graph seem to pretty much mimic the tart one.
Updated•4 years ago
|
Comment 4•4 years ago
|
||
Set release status flags based on info from the regressing bug 1703274
Comment 5•4 years ago
|
||
Making it a P1 to reflect the fact we need to investigate in order to better understand the issue
Comment 6•4 years ago
|
||
adding ni? on myself, I'll do some Try pushes with parts disabled (like the shadow, updating engines...) to see if numbers change.
Comment 7•4 years ago
|
||
Regarding TART, I first tried the simplest things: removing the larger shadow, removing the border, avoiding search engines update work. Those didn't help. Then I generated 2 perf profiles (sorry, bit.ly not working atm):
before: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FWcqw-rNyQfGLgi99-cvFmw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tart.zip
after: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FepnJQbFwSxGPFaqEYCI1SA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tart.zip
Looking through them, and more specifically looking at the Marker Chart, there's one obvious difference; in the good
case there is a long urlbar-grow
css animation, that is taking up most of the initial time and likely delaying other things. This is the old urlbar expansion animation.
In the bad
case, that animation doesn't exist, so things should be a bit cheaper but, due to that, other things enter the measurement bracket before the TART - End
event, for example I see a BookmarksToolbarVisibilityUpdated
event.
Most of the additional events seem to be related to the bookmarks toolbar now being visible sooner.
Thus, I suspect we are effectively measuring different things now, maybe we should further delay the toolbars update? Though that may also introduce visual "flicker".
All in all, it doesn't look like the urlbar changes are directly causing a tart regression, but rather making things happen sooner than usual.
Mike, is this a plausible interpretation based on your experience?
Next step, I'll look at Youtube speedindex.
Comment 8•4 years ago
|
||
I have no idea how to trigger perf profiles on vismet tests, thus I'm asking on Matrix (and waiting...). Without profiles it may not be trivial to understand where we spend time.
I assumed maybe updating opensearch engines made a difference, but my patch disabling that didn't change much, so it's likely unrelated.
Since this is another test that measures how long it takes for something to appear on screen, I wonder if it's somehow related to the tart problem, thus things happen sooner and delay page painting. Maybe I should try to disable the bookmarks toolbar completely and see what happens.
Comment 9•4 years ago
|
||
Unfortunately, those shortlinks to the profiles aren't working for me. :/ Can you post the long links?
So one of the bad things about TART is that what it's measuring is the average cost to paint frames during tab animations. It's possible that by getting rid of the URL bar grow animation, we've reduced the number of frames we need to paint (that's good), which reduces the size of the set of numbers that we're taking the average of, which can then shift the numbers (especially if that grow animation required little time to paint).
However, I highly recommend that we ignore the TART regressions for now. The percentages look bad, but the actual magnitude in milliseconds is small (less than a millisecond for each). I think we should focus our energies elsewhere, like looking at the SpeedIndex regressions.
Comment 10•4 years ago
|
||
I updated the above comment with links to the profiles.
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos) from comment #9)
So one of the bad things about TART is that what it's measuring is the average cost to paint frames during tab animations. It's possible that by getting rid of the URL bar grow animation, we've reduced the number of frames we need to paint (that's good), which reduces the size of the set of numbers that we're taking the average of, which can then shift the numbers (especially if that grow animation required little time to paint).
It was showing as a large entry on the profile, but I'm not expert in reading graphics costs on these profiles.
However, I highly recommend that we ignore the TART regressions for now. The percentages look bad, but the actual magnitude in milliseconds is small (less than a millisecond for each). I think we should focus our energies elsewhere, like looking at the SpeedIndex regressions.
I'm on those, comparing videos based on your idea in the other bug.
Comment 11•4 years ago
|
||
While many of the test runs show no difference, as visible here there is some cases where content appears slightly sooner before the urlbar change.
Also the urlbar contents get drawn slightly later.
Though the video also shows how the urlbar is pretty much doing nothing fancy.
I'll run more experiments on Try changing some padding here and there to see if that influences the score.
Comment 12•4 years ago
|
||
at this point, I'll take the bug since I spent quite some time investigating it already.
Updated•4 years ago
|
Comment 13•4 years ago
|
||
profile before:
https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2Fe09Y_z_nT22IKy-AjflwCg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_youtube.zip/calltree/?v=5
profile after:
https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FbEKqikkQQWOMuwip29sbIA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_youtube.zip/calltree/?v=5
Comment 14•4 years ago
|
||
I ran an experiment increasing the padding around the urlbar to 10px (from 4px) to check if it had any effect.
It actually improved youtube SpeedIndex opt cold by 17%... but not effect on warm. It is still interesting how the test is sensitive to changes of the toolbox height.
On the profiles I've noticed this so far:
before
reports things likesetURI
,setTabTitle
, that come earlier in the navigation, whileafter
reports things likeupdateReaderButton
,onSecurityStateChange
. This seems to suggest there is a shift in times, and maybe the test measurement is suffering an initial delay.- There is a block of telemetry markers that in the
before
profile happen around 2.4s, in theafter
profile they happen at 1.3s, so they happen much sooner. This comes from profile-after-change TelemetryControllerParent.jsm::setupTelemetry() that contains a DeferredTask base d on toolkit.telemetry.initDelay, with a default time of 60s. It also creates a new thread and seems to also cause a couple of writeAtomics. It is happening about 1s sooner inafter
and more or less that time coincides with the time where we see a paint delay. should telemetry setup time be included in this measurement?
I'll run a further experiment delaying that telemetry block and checking its effect.
Comment 15•4 years ago
|
||
my telemetry experiment brought some improvements in Bing Search Speedindex, but not in Youtube.
Comment 16•4 years ago
|
||
Let me sum up a bit the status here:
Experiments:
- the shadow around the urlbar doesn't seem to have a measurable effect
- the bookmarks toolbar updates don't seem to have a measurable effect
- the transparent/thick border doesn't seem to have a measurable effect
- the toolbar size seems to have a measurable effect, increasing padding improved some cold speedindex measurement
- delaying telemetry seems to have a measurable effect, it cleaned up considerably the profile and improved some speedindex but had no effect on others
Notes:
- Things happen sooner in the
after
profile. Some examples are tab-burst-animation, reload-to-stop animation, network events and Navigation events all tend to happen sooner. Some of these even 100ms sooner. Paints though seem to happen later. May we be seeing a shift in workload? - in general, chrome doesn't seem to do additional work, or at least I cannot clearly identity any of it
- since this bug is about the urlbar more in specific, I cannot find any additional work added by the urlbar in the profile
My conclusions:
- I'll file a bug to evaluate delaying telemetry init, so it doesn't happen during these tests and in profiles. It doesn't seem to have a huge impact, but it causes I/O, GC and thread creation, that can create noise.
- there is a time shift in events, certain events happen sooner, while graphic events seem to happen later. In general most of the chrome things seem to happen sooner, that seems counter intuitive
- The toolbar padding seems to also have an effect on delays.
- it may be useful to have someone from layout/graphics to look at the profiles
Updated•4 years ago
|
Comment 17•4 years ago
|
||
Bug 1706333 is a further indication that changing animation duration in the chrome toolbar affects these tests. As a reminder here we removed a quite long animation due to the urlbar expanding.
Comment 18•4 years ago
|
||
Hey mak, I wonder if the fact that we're getting to certain animations earlier is, as you say, reordering some work and priority. What happens if you hide the stop/reload and tab throbber elements in the before and after cases? Does the "after" case continue to regress?
Comment 19•4 years ago
|
||
An experiment I want to run is to remove the urlbar expansion in the before case, and see the effect of doing that.
Removing all the animations is another possibility we should test, even if ideally those didn't change when we enabled the proton urlbar.
Comment 20•4 years ago
|
||
And, to be clear, I didn't find any obvious new jank or explanations in the profiles that I looked at. I'm reasonably certain this is a work re-scheduling thing, and that we should just accept this regression.
Comment 21•4 years ago
|
||
I tried running a test on pre-proton-urlbar with ui.prefersReducedMotion = 1, it didn't have interesting effects on SpeedIndex.
Comment 22•4 years ago
|
||
Here's a side-by-side comparison, comparing 6542eba3a6dfb79c7c979f6ec7eea9315e6824c4 on autoland to 3a4f4d99eb21080af52e408da7b60885140136d2 using sparky's script here.
I honestly don't see a real difference here. I don't know what SpeedIndex is talking about.
python3 generate_side_by_side.py --base-revision 6542eba3a6dfb79c7c979f6ec7eea9315e6824c4 --base-branch autoland --new-revision 3a4f4d99eb21080af52e408da7b60885140136d2 --new-branch autoland --platform test-windows10-64-shippable/opt --test-name browsertime-tp6-firefox-youtube-e10s --warm --output ./output
Comment 23•4 years ago
|
||
Same deal as above, except for cold load.
So I think the YouTube regression report is bogus.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 24•4 years ago
|
||
Again, no visible regression.
Comment 25•4 years ago
|
||
In this case, it actually looks like the after case is faster to get the table of contents listed. For some reason, the before case never gets that far.
So, again, I'm not convinced there's a regression for this one too.
Comment 26•4 years ago
|
||
There is a slight difference on Bing with WebRender enabled. It's very slight.
Comment 27•4 years ago
|
||
Very slight difference on Bing here.
Comment 28•4 years ago
|
||
Very slight difference here, too.
Comment 29•4 years ago
|
||
Okay, here's my conclusion:
Enabling the Proton URL bar resulted in a small change to the content area. This has resulted in some changes to our browsertime scores, and I think we can chalk that up as a re-baselining (similar to bug 1704463).
The change has also resulted in slight differences in scores for some of our other tests, but there's really nothing actionable from the profiles as a single "culprit". If anything, the change in the URL bar has changed the environment slightly, as well as shifting some scheduling around. That's what these tests are detecting. I don't think this bug should block MR1. I think we should at least deprioritize this bug, if not closing it as WONTFIX.
Comment 30•3 years ago
|
||
Thanks for the thorough investigations! Agreed that if this is most likely a re-baselining it should not block Proton.
Leaving this opened as a P3 since it seems like a follow-up is needed to ensure that the proton changes are accounted for moving forward (set a new baseline?)
Updated•3 years ago
|
Comment 31•3 years ago
|
||
Not actively working on this atm, and I did all I could here.
Comment 32•3 years ago
|
||
I'm going to close this as WFM.
I already explained above how tart is now measuring stuff that didn't enter its measurement bracket before.
And both me and mconley were unable to find a clear cause for the SpeedIndex regression. Now today this happens: https://bugzilla.mozilla.org/show_bug.cgi?id=1707949#c9. That bug only changed opacity of urlbar-icons, and shows a huge 18% win on youtube SpeedIndex.
At this point keeping this open doesn't make sense.
Updated•3 years ago
|
Description
•