Closed Bug 1771977 Opened 2 years ago Closed 2 years ago

Understand the facebook performance change caused by enabling stacking context display item reuse

Categories

(Core :: Web Painting, task, P3)

task


RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: tnikkel)

References

(Blocks 1 open bug)

Details

Attachments

(1 obsolete file)

https://treeherder.mozilla.org/perfherder/alerts?id=33153 shows a regression in LastVisual by 500% (from 1 second to 5 seconds). It seems unlikely to me that this could be because of the performance difference between retained display lists and stacking context reuse and more likely caused by some kind of painting or other behaviour change. We should understand what's going on by getting a profile of the before and after with a pref flip.

Blocks: 1771974

I think the facebook*LastVisualChange tests are just flaky. Enabling partial drawing of bg images improved facebook-nav.marketplace LastVisualChange from 6s to 1s on all platforms. When I fixed a reddit-billgates-ama.members ContentfulSpeedIndex regression (which I'm still not sure represents something real rather than the test measurement being weird) by skipping the attempt to draw a partial bg image if we didn't have any decoded pixels yet (which should be purely an improvement), the above facebook numbers went back to 6s again. (Bug 1769195 has the alerts linked for both.)

I used the ability to generate a profile (with screenshots so we can see what is going on) of the browsertime run on try, both on trunk and with the layout.display-list.retain.sc pref turned on. In both cases we first show the little fb chat overlay bubble in the bottom right very close to 9 seconds in. We aren't busy before this, so it must be some timeout or idle timer firing. The profiler also helpfully graphs the percent visually complete, and we can see that with the pref disabled the graph hits 100% long before the chat bubble shows up, while with the pref enabled the graph only hits 100% when the chat bubble shows up. So we are showing the same visual content at roughly the same time but getting very different outputs.

I suspect that there might be a 9 second cut off in determining what the final page looks like, and the chat bubble showing up just before/just after 9 seconds determines what our perf infrastructure decides is the final visual image of the page. If the chat bubble comes in just after 9 seconds we consider the final visual to be without the chat bubble, so we reach that after 1 second. If the chat bubble comes in just before 9 seconds then what we consider to be the final visual includes the chat bubble, so we only hit a complete visual picture after 9 seconds.
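The hypothesized cutoff behaviour can be sketched with a toy calculation (the cutoff value, function name, and frame times here are assumptions for illustration, not the actual analysis code):

```python
# Toy model of the hypothesized behaviour: LastVisualChange is the time of
# the last visual change captured before some recording/analysis cutoff.
# If the chat bubble lands after the cutoff it is excluded from the final
# image, so the metric ends at ~1s instead of ~9s.
CUTOFF_S = 9.0  # hypothetical cutoff

def last_visual_change(frame_times_s, cutoff_s=CUTOFF_S):
    """Timestamp of the last visual change captured before the cutoff."""
    captured = [t for t in frame_times_s if t <= cutoff_s]
    return max(captured) if captured else 0.0

# Main content settles at ~1s; chat bubble appears near the 9s mark.
print(last_visual_change([0.2, 0.6, 1.0, 9.05]))  # bubble misses cutoff -> 1.0
print(last_visual_change([0.2, 0.6, 1.0, 8.92]))  # bubble under cutoff -> 8.92
```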

I also downloaded the browsertime.tgz file, which has videos of the page loads and overlays the metrics with timestamps when they happen (domcontentloaded etc). With the pref off the chat bubble is never included (the videos end well before that); with the pref on the chat bubble shows up at the end (and the videos are much longer). So this seems to confirm that the perf measurement code is including the chat bubble in the final visual image only when the pref is on.

Now how does this relate to the retain.sc pref? It seems like the retain.sc pref makes the chat bubble show up slightly sooner, so it is consistently under the 9 second mark.

pref off: 8.9816135    9.08688915   9.09305110   avg 9.05385125
pref on:  8.927768475  8.9236057    8.9204410    avg 8.92393839167

About 130ms (7 or 8 frames) sooner with the pref on. So if the retain.sc pref is a net win on the rest of this page load (such that we queue the setTimeout or whatever it is that loads the chat bubble 130ms sooner), that would explain it.
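The averages above can be checked with a quick calculation (the numbers are the chat-bubble appearance times quoted in the previous comment):

```python
# Reproduce the averages quoted above: chat-bubble appearance times
# (seconds) from three runs each, with the pref off and on.
pref_off = [8.9816135, 9.08688915, 9.09305110]
pref_on = [8.927768475, 8.9236057, 8.9204410]

avg_off = sum(pref_off) / len(pref_off)  # ~9.0539 s
avg_on = sum(pref_on) / len(pref_on)     # ~8.9239 s
delta_ms = (avg_off - avg_on) * 1000     # ~130 ms, i.e. 7-8 frames at 60fps

print(f"off={avg_off:.8f} on={avg_on:.8f} delta={delta_ms:.0f}ms")
```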

tl;dr: This seems like a positive result from turning on the retain.sc pref; the perf measurement framework is getting confused and reporting it as a regression instead.

Sean, why do you think the perf measurement framework is getting confused?

Flags: needinfo?(sefeng)

From comment 3, it sounds like we didn't wait long enough after the page load, so the chat bubble was missing with the pref off.

I think this could be verified more easily with a longer timeout: we could just modify the default timeout and push to try. However, I am not sure where the code for that is... sparky knows, I believe!

Flags: needinfo?(sefeng) → needinfo?(gmierz2)

This is an interactive test so you can easily add extra browsertime commands here to increase the delays between page navigations and recordings: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/tests/interactive/browsertime-responsiveness.ini#50-66

If adding or increasing the wait.byTime calls doesn't help, then you could try modifying the --pageCompleteCheck* and --pageCompleteWait flags. You can add those flags to the test like this: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/tests/custom/browsertime-process-switch.ini#26
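For illustration, a hypothetical raptor manifest fragment showing where such flags could go, in the style of the browsertime-process-switch.ini example linked above (the section name and flag values here are placeholders, not tested settings):

```ini
# Hypothetical example only: extra browsertime flags appended to a raptor
# test entry via browsertime_args.
[browsertime-responsiveness-example]
browsertime_args = --pageCompleteWaitTime=15000 --pageCompleteCheckInactivity=true
```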

Flags: needinfo?(gmierz2)

Thanks!

The "page_timeout = 90000" line is what grabs my attention because the chat bubble happens right around 9 seconds. I pushed to try with both that and wait.byTime increased.

The page shows a chat bubble right around 9 seconds. So increase from 9 seconds to 12 seconds.

Assignee: nobody → tnikkel
Status: NEW → ASSIGNED

Moving the conversation from Phab to here:

> In D148234#4844342, @tnikkel wrote:
>> In D148234#4844304, @sparky wrote:
>> Thanks for the explanation. I think the question now is do you want to measure the chat bubble...

> I don't think I know enough about perf tests to decide if we want to measure the chat bubble showing up or not, so I would be curious what other people think and who would make a call on that.

To me, if we are catching the improvement from your pref elsewhere then we could ignore the improvement in the chat bubble and decrease the delay to ignore it. But others may feel differently about this, :sefeng :denispal do either of you have any thoughts on this?

Here's a tl;dr of the issue: the fb chat bubble in the bottom right shows up right around the 9 second mark. When the layout.display-list.retain.sc pref is on, the chat bubble shows up ~130ms sooner and is included in what the perf measurement framework considers to be the final or complete image of the page. When the pref is off, the chat bubble shows up ~130ms later and is _not_ included in what the framework considers the final image. This means that when the pref is on it takes until 9 seconds for the page to be determined visually complete, while with the pref off we determine the page is complete around 1 second.

As a side note, if we decrease the delay, we might end up catching it again one day so increasing the delay would prevent that issue.

See comment above.

Flags: needinfo?(sefeng)
Flags: needinfo?(dpalmeiro)

My opinion is we should increase the delay to catch the bubble, just because it's part of the page load and I don't see why we'd want to exclude it. I believe we should generally try to catch as much of the page contents as possible and exclude any inconsistency. So including it in our measurement would simply make it more consistent.

We can just say this is a baseline change if we increase the delay.

Flags: needinfo?(sefeng)

Yes, I think increasing the delay seems reasonable here. This is also a good example of where a verification step would be useful. We could use that step to rerun the test with a longer delay to see if things pass.

Flags: needinfo?(dpalmeiro)

I agree with :sefeng, and :denispal. Good point about the verification step!
:tnikkel, you can move forward with increasing the delay instead of decreasing/removing it.

Hmm, I've tried several different things as suggested here (wait.byTime, --pageCompleteCheck*, and --pageCompleteWait), but so far none of them have changed the LastVisual number on the test on try. Haven't dug into it too far yet.

Blocks: 1773761

I think this might not be working because the frames that the composition recorder captures look to be one frame delayed, which means that if the final frame produced by the page (say frame N) has a significant change in it then that change will not be in the captured frames, as the captured frames only go up to frame N-1. At least in one configuration that I have locally and observed on try.

Depends on: 1779953

Bug 1779953 fixed the issue in comment 15 but didn't make this problem go away (i.e. LastVisualChange on this test is still 9000ms with retain.sc=true and much shorter with retain.sc=false).

If I introduce a "fake invalidate" after the chat bubble loads at 9s, it causes a useless paint, which causes the composition recorder to record one more frame (identical to the previous frame), and then LastVisualChange is 9000ms even when retain.sc=false. So the software analyzing the recorded frames is sensitive to the number of identical frames at the end of the recording. If there are too few it seems to ignore them; if there are enough it says "okay, then that is the final picture".
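The sensitivity described above can be sketched as follows (this is illustrative logic only, not the actual visualmetrics code; the threshold value and function name are assumptions):

```python
# Illustrative sketch of the observed behaviour: a visual change at the end
# of the recording is only accepted if enough identical frames follow it.
# Frames are modelled as comparable image identifiers.
MIN_TRAILING_COPIES = 2  # hypothetical threshold

def last_visual_change_index(frames, min_copies=MIN_TRAILING_COPIES):
    """Index of the last accepted change; a change confirmed by too few
    trailing identical frames is ignored."""
    if not frames:
        return None
    final = frames[-1]
    trailing = 0
    for f in reversed(frames):
        if f != final:
            break
        trailing += 1
    if trailing >= min_copies:
        # Enough confirmation: the change is where `final` first appears.
        return len(frames) - trailing
    # Too few copies: fall back to the previous stable image.
    return last_visual_change_index(frames[:len(frames) - trailing], min_copies)

# One lone "chatbubble" frame at the end is ignored...
print(last_visual_change_index(["blank", "page", "page", "chatbubble"]))
# ...but an extra recorded copy (e.g. from a forced repaint) makes it count.
print(last_visual_change_index(["blank", "page", "page", "chatbubble", "chatbubble"]))
```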

retain.sc=true causes an extra frame to be recorded by the composition recorder. This is because the site makes some change a second or two after the 9s introduction of the chat bubble. retain.sc=false is smart enough to realize this change does not affect anything and is able to perform an empty transaction, which does not cause the composition recorder to record another frame. retain.sc=true performs a full transaction, so the composition recorder records one more frame and trips the aforementioned odd behaviour of the analysis code.

You'll likely want to focus on this line: https://github.com/sitespeedio/browsertime/blob/22985a262cd01cbf1bb51190f3aa8bccf34defbf/browsertime/visualmetrics-portable.py#L883

It has a max pixel difference of 5, meaning that up to 5 pixels (with a fuzz of 15% around the intensities) can differ and the frames are still considered the same. As you might guess, using pixel counts is a bad way of determining whether frames are different. In the future, we want to implement a segmentation step to determine how clustered the differing areas are, which would allow us to remove this pixel count and look at the size of the segmented patch instead.
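A minimal sketch of this kind of comparison, using the thresholds quoted above (this mirrors the described check, not the actual browsertime implementation; the function name and frame representation are assumptions):

```python
# Sketch of a pixel-count comparison with a fuzz tolerance: two frames count
# as "the same" if at most `max_diff_pixels` pixels differ, where a pixel
# only counts as different if its intensity delta exceeds a 15% fuzz band.
def frames_match(frame_a, frame_b, fuzz=0.15, max_diff_pixels=5):
    """frame_a/frame_b: equal-length lists of grayscale intensities in [0, 1]."""
    differing = sum(1 for a, b in zip(frame_a, frame_b) if abs(a - b) > fuzz)
    return differing <= max_diff_pixels

base = [0.5] * 100

# 5 clearly different pixels: still considered the same frame.
near = base.copy()
for i in range(5):
    near[i] = 1.0
print(frames_match(base, near))  # True

# 6 scattered different pixels: now considered a different frame.
far = base.copy()
for i in range(6):
    far[i] = 1.0
print(frames_match(base, far))   # False
```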

You may be hitting that issue here: there might be 6+ disjoint differing pixels spread across the frames being compared.

(In reply to Greg Mierzwinski [:sparky] from comment #18)

You'll likely want to focus on this line: https://github.com/sitespeedio/browsertime/blob/22985a262cd01cbf1bb51190f3aa8bccf34defbf/browsertime/visualmetrics-portable.py#L883

I think the images are identical, so if that line does turn out to be important then the most basic image comparison is broken in that file.

(In reply to Timothy Nikkel (:tnikkel) from comment #19)

(In reply to Greg Mierzwinski [:sparky] from comment #18)

You'll likely want to focus on this line: https://github.com/sitespeedio/browsertime/blob/22985a262cd01cbf1bb51190f3aa8bccf34defbf/browsertime/visualmetrics-portable.py#L883

I think the images are identical, so if that line does turn out to be important then the most basic image comparison is broken in that file.

The code there works fine apart from using a rudimentary method. It's the fuzz in the images that's the issue, which is coming either from resizing the frame, from image compression, or from the recorder itself.

By identical I mean the images were bit for bit identical after coming out of the composition recorder.

This seems to be caused by a bug in ffmpeg.

If I use the binary .mozbuild/browsertime/ffmpeg-4.1.4-i686-static/ffmpeg (which I think is what browsertime uses, as there is no other ffmpeg binary on my system) I get an mp4 file that is 4 seconds long and never shows the chat bubble.

If I then install ffmpeg from apt-get, which gives me version 4.4.2, and use the exact same command line and input (the only thing changed is the ffmpeg binary), I get an mp4 file that is 7 seconds long and does show the chat bubble.

The ffmpeg command line uses a durations.txt file that lists an image file and then a duration to show it for. The end of the file looks something like this in the test we are looking at:

...
file 'nochatbubble.png'
duration 4.71
file 'chatbubble.png'
duration 0.023
file 'chatbubble.png'

So the chat bubble is shown for 0.023 seconds, which is less than 1 frame at 30fps (0.033s). So it seems plausible that ffmpeg had a bug handling short-duration frames at the end of an image sequence concatenated like this.
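The arithmetic behind that observation is simple enough to check directly:

```python
# The final durations.txt entry shows the chat-bubble image for 0.023s.
# At 30fps one frame lasts 1/30 s (~0.0333s), so the last image is on
# screen for less than a single output frame: the kind of edge case a
# concat-demuxer bug could plausibly mishandle.
fps = 30
frame_duration = 1 / fps        # ~0.0333 s
bubble_duration = 0.023

print(bubble_duration < frame_duration)            # True
print(round(frame_duration - bubble_duration, 4))  # shortfall in seconds
```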

How do we go about upgrading the ffmpeg version in browsertime?

Flags: needinfo?(gmierz2)

Oh that's an interesting find!

You would need to update these packages for local testing: https://searchfox.org/mozilla-central/source/tools/browsertime/mach_commands.py#175-200

And these ones will need to be updated for CI testing: https://searchfox.org/mozilla-central/source/taskcluster/ci/fetch/browsertime.yml#5-30

Flags: needinfo?(gmierz2)

:tnikkel, let me know if you need anything else for this. If you can provide me with the archives you want to update to, I can upload them to our perf-automation repo so you can pull from there instead of a personal repo: https://github.com/mozilla/perf-automation

Maybe someone on the perf team who is more familiar with the requirements of bumping package versions could pick this up?

Flags: needinfo?(gmierz2)

:tnikkel, yes we can definitely take care of it. I've added it to our triage list.

Flags: needinfo?(gmierz2)
Whiteboard: [perftest:triage]

:tnikkel can you provide the archives that you want to update to? If you'd prefer, we can source them from https://ffmpeg.org/download.html if you provide your preferred version number. I see from comment 22 that you tested with 4.4, but it looks like the latest is 5.1. I suggest we take care of this update via bug 1735398.

Depends on: 1735398
Flags: needinfo?(tnikkel)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #27)

:tnikkel can you provide the archives that you want to update to? If you'd prefer, we can source them from https://ffmpeg.org/download.html if you provide your preferred version number.

I don't have a preference on where to get the builds. Where do you usually get the binaries from?

I don't have a preference on the version number (as long as it's 4.4 or newer so the bug is fixed), I'm not sure what your policy is there. Do you prefer updating to the latest? Or holding back a little to let it stabilize? I would be inclined to update to the latest if there are not other issues/concerns with that.

I see from comment 22 that you tested with 4.4, but it looks like the latest is 5.1. I suggest we can take care of this update via bug 1735398.

I tested 4.4 because that was what apt-get gave me on my ubuntu install, otherwise there is no specific significance of that version number.

Thanks for working on this!

Flags: needinfo?(tnikkel)

(In reply to Timothy Nikkel (:tnikkel) from comment #28)

I don't have a preference on the version number (as long as it's 4.4 or newer so the bug is fixed), I'm not sure what your policy is there. Do you prefer updating to the latest? Or holding back a little to let it stabilize? I would be inclined to update to the latest if there are not other issues/concerns with that.

We don't have a policy that I'm aware of, but we should establish and document one. I'll move this conversation over to bug 1735398.

Severity: -- → S3
Priority: -- → P3

We have a patch that upgrades the ffmpeg version to 4.4.1.
Being so close to all-hands, we are planning on merging it once all-hands completes:
https://bugzilla.mozilla.org/show_bug.cgi?id=1735398

Thanks for working on this!

(In reply to Andrej (:andrej) from comment #30)

We have a patch that upgrades the ffmpeg version to 4.4.1.
Being so close to all-hands, we are planning on merging it once all-hands completes:
https://bugzilla.mozilla.org/show_bug.cgi?id=1735398

I tested the patch in try server and it does indeed "fix" this bug as expected.

Whiteboard: [perftest:triage]

Now that bug 1735398 has landed I think we can call this fixed.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Attachment #9279552 - Attachment is obsolete: true

What is the measured performance difference from enabling retain.sc on Facebook now that the ffmpeg problem has been fixed?

Flags: needinfo?(tnikkel)

I did a try push with and without retain.sc and retriggered.

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3b557f107b430054a6d982ef91556b2b443ba776&newProject=try&newRevision=13d8ec4a91f3b60f07886c7eada42bbf490882a5&framework=13&page=1

Looks to be flat except for facebook-nav.friends ContentfulSpeedIndex opt cold fission webrender on linux and mac regressing a bit. I investigated this by looking at the profiles, which show screenshots along with the Contentful Visual Progress. This page load has 5 visual states that are the same in all cases: previous page showing, blanked out, most of new page showing, one more image drawn, and chat bubble drawn. We hit the same states at the same time with and without retain.sc, however the visual progress we get credited with differs at one point.

With retain.sc off we go from 0% visual progress to 89% when we blank out the previous page and draw nothing. When we actually draw the page we hit 90%. When we draw one more image we hit 91%. Then we have to wait 9 seconds for the chat bubble to draw and we hit 100%.

With retain.sc on the exact same visual progress happens at the same time, except we only get credited with 82% visual completeness when we blank the page (instead of 89%). We then hit 89% while the page is still blank, but the only thing that has changed is the little chrome network status in the bottom right (from "www.facebook.com" to "Transferring data from www.facebook.com..."). The rest then matches retain.sc off. This is based on the profiler-captured screenshots.

If we look at the composition recorder screenshots (slightly different), which is what browsertime actually uses for these metrics, I see that with retain.sc off we do not grab any screenshots between the page being blank and most of the page being drawn. With retain.sc on we update the chrome network status in the bottom during this time period while the page is blank, so retain.sc on makes the composition recorder decide that the frame has changed and grab one where retain.sc off does not. Looks like this is because there is an empty transaction right in this time period in the retain.sc off profile (DisplayList marker but no ForwardDPTransaction marker). So the empty transaction likely gets skipped by the composition recorder screenshots, leading to slightly different metrics due to the chrome network status changing while nothing in the page changes.

So this is not a real regression and would also likely go away if we supported empty transactions with retain.sc.

Flags: needinfo?(tnikkel)