Open Bug 1641070 Opened 4 years ago Updated 2 years ago

Frames skipped during fling on mac

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
macOS
defect

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(3 files)

Take a look at https://share.firefox.dev/2ZGH6om
There should be a steady stream of new frames, but sometimes we don't paint.

Blocks: wr-mac

We're calling wr_notifier_nop_frame_done for the skipped frames.

Severity: -- → S3
OS: Unspecified → macOS
Priority: -- → P3

IIRC Jamie was looking at this code not long ago; there was a patch fixing something here that might be related. I don't recall the details, though.

It could be related: my patch inverted the logic for how doc.rendered_frame_is_valid is set. I would have thought that during a fling the composite_descriptor would be changing, so we'd set it to false (and therefore render the frame), but maybe I misunderstand what that describes.
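
(For context, a minimal sketch of the check being described, with simplified, invented names rather than the actual WebRender render backend code:)

// Illustrative sketch only; not the real WebRender code.
// The idea: a rendered frame stays valid only while the composite descriptor
// (roughly, the set of surface offsets/clips) is unchanged.
#[derive(Clone, PartialEq)]
struct CompositeDescriptor {
    surface_offsets: Vec<(f32, f32)>, // simplified stand-in
}

struct Document {
    rendered_frame_is_valid: bool,
    last_descriptor: CompositeDescriptor,
}

impl Document {
    fn on_frame_built(&mut self, new_descriptor: CompositeDescriptor) {
        if new_descriptor != self.last_descriptor {
            // During a fling the scroll offsets change every frame, so we'd
            // expect to land here and render the frame.
            self.rendered_frame_is_valid = false;
            self.last_descriptor = new_descriptor;
        } else {
            // Nothing moved since the last composite; the renderer can report
            // a no-op frame (which is where wr_notifier_nop_frame_done fires).
            self.rendered_frame_is_valid = true;
        }
    }
}

fn main() {
    let mut doc = Document {
        rendered_frame_is_valid: true,
        last_descriptor: CompositeDescriptor { surface_offsets: vec![(0.0, 100.0)] },
    };
    doc.on_frame_built(CompositeDescriptor { surface_offsets: vec![(0.0, 144.0)] });
    assert!(!doc.rendered_frame_is_valid);
}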

Here's a bit of logging:

update_doc true
FrameMsg::ScrollId (0,100) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
BuildFrame
new Frame
update_doc true
FrameMsg::ScrollId (0,144) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
BuildFrame
new Frame
update_doc true
FrameMsg::ScrollId (0,144) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
nop Frame
update_doc false
FrameMsg::ScrollId (0,-78) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
update_doc true
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
BuildFrame
new Frame
update_doc true
FrameMsg::ScrollId (0,34) ExternalScrollId(2, PipelineId(1, 10))
FrameMsg::ScrollId (0,0) ExternalScrollId(2, PipelineId(1, 1))
BuildFrame
new Frame

Of note: we get two consecutive frames with the same FrameMsg::ScrollId offset of (0,144)

A profile showing the problem with some extra annotation: https://share.firefox.dev/2M7XFBE

Flags: needinfo?(kats)

What are the STR for this bug?

Flags: needinfo?(kats) → needinfo?(jmuizelaar)

I believe I was just flinging on a long hackernews thread.

Flags: needinfo?(jmuizelaar)

I think this is the same problem as the one we had on Android before we switched to delivering Vsync from the main thread: The fling "wheel" events fire on the main thread, at vsync rate, and we can get unlucky if they fire just after we start the composite.

You can see this in the profile. For every no-op composite in the renderer thread, zoom in on the previous frame (since we composite APZ scrolls one frame after they happen) and compare the time between the CompositeToTarget marker on the compositor thread and the DOMEvent marker on the main thread. If the DOMEvent happens after the CompositeToTarget, then the event fired too late and we skip the next frame.

Of course this is not a good situation to be in.

Linking to the relevant Android bug. Thanks for the analysis, it makes sense (from what I recall of this bug).

We can fix this in a bit of a hacky way if we define some threshold time (e.g. 3ms after vsync) during which any user input events will cause a resampling of APZ state for the next vsync. With the changes in bug 1630781 we have better encapsulation of the sampled state, so implementing this sort of change should be relatively easy. However, defining the threshold time itself feels not-great.
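
Roughly what I mean, as a sketch (invented names, not the actual APZ or bug 1630781 sampled-state code): if an input event arrives within the grace window after a vsync we already sampled for, resample so the upcoming composite picks up the new offset.

use std::time::{Duration, Instant};

// Illustrative only; hypothetical types standing in for APZ's sampled state.
const RESAMPLE_THRESHOLD: Duration = Duration::from_millis(3);

struct SampledState {
    offset_y: f32,
}

struct Apz {
    current_offset_y: f32,
    sampled: SampledState,
}

impl Apz {
    fn handle_input(&mut self, delta_y: f32, now: Instant, last_vsync: Instant) {
        self.current_offset_y += delta_y;
        if now.duration_since(last_vsync) <= RESAMPLE_THRESHOLD {
            // The event just missed the sample taken at this vsync; resample
            // so the next composite reflects it instead of skipping a frame.
            self.sampled = SampledState { offset_y: self.current_offset_y };
        }
    }
}

fn main() {
    let vsync = Instant::now();
    let mut apz = Apz {
        current_offset_y: 0.0,
        sampled: SampledState { offset_y: 0.0 },
    };
    // An event that lands 1 ms after the vsync still updates the sampled state.
    apz.handle_input(34.0, vsync + Duration::from_millis(1), vsync);
    assert_eq!(apz.sampled.offset_y, 34.0);
}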

If the input events dispatched from the OS have timestamps that are consistently related to the vsync timestamp, then we can use that for a less hacky solution. That is, if we get user inputs that arrive after CompositeToTarget but with a timestamp from before the vsync, then we know they just got held up somewhere for Reasons, and we should resample the APZ state after the user input is applied and use that for the next composite. I don't know if the timestamps will line up, though.
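
A sketch of that timestamp-based variant (again with invented names, not a Gecko API): rather than a fixed threshold, compare the event's OS timestamp to the vsync timestamp the composite was sampled for.

// Illustrative only; invented helper.
// `sampled_for_vsync_ms` is the timestamp of the vsync the compositor already
// sampled for; `event_timestamp_ms` is the OS timestamp on the input event.
fn should_resample(event_timestamp_ms: f64, sampled_for_vsync_ms: f64) -> bool {
    // The event was generated before the vsync but delivered after
    // CompositeToTarget: it got held up somewhere, so it belongs to this
    // frame and we should resample APZ state for the next composite.
    event_timestamp_ms <= sampled_for_vsync_ms
}

fn main() {
    // Stamped before the vsync, delivered late: resample.
    assert!(should_resample(5494.9, 5495.6));
    // Genuinely generated after the vsync: leave it for the next frame.
    assert!(!should_resample(5496.7, 5495.6));
}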

I can do some investigation into the timestamp approach.

Assignee: nobody → kats

Attaching WIPs for what I had in mind. I'm not sure of a good way to test this yet.

I tested the try build for a bit. It was a bit hard to get a conclusive comparison because in my machine's current state, even a regular Nightly is mostly smooth during flings. I'm not sure how to increase the machine load in a way that reliably results in jittery wheel event timing.

However, while scrolling was mostly smooth, I did encounter a few instances of frame skips during flings in the try build.
Here's a profile from a jittery fling: https://share.firefox.dev/3gfPTD6
You can see a mix of 1-frame and 2-frame "APZScroll Payload Presented" latencies.

Ah, the patch didn't touch the payload stuff, so the payload for "just after vsync" events wouldn't get reported properly. I can fix that but I think your experience underscores the need for a more reliable way to test this. I'll give this some thought.

Oh I see. Yeah, switching to the Renderer thread, the no-op composites are in similar but different places: https://share.firefox.dev/2XfVIcj

I wrote some patches to get test_smoothness usable again (bug 1251612) and added some more logging to understand this problem better. It looks like the unsmoothness actually happens because the events are processed too early rather than too late. That is, this bit is smooth:

 0:13.30 GECKO(91709) 0x12506d000 Sampling for frame at 5377.784193
 0:13.34 GECKO(91709) 0x12506d000 Handled event with timestamp 5387.830474
 0:13.34 GECKO(91709) 0x12506d000 Sampling for frame at 5411.221955
 0:13.34 GECKO(91709) 0x12506d000 Handled event with timestamp 5399.613695

and this bit is not:

 0:13.42 GECKO(91709) 0x12506d000 Sampling for frame at 5495.568431
 0:13.43 GECKO(91709) 0x12506d000 Handled event with timestamp 5484.319876
 0:13.44 GECKO(91709) 0x12506d000 Handled event with timestamp 5496.650982
 0:13.44 GECKO(91709) 0x12506d000 Sampling for frame at 5511.339022
 0:13.46 GECKO(91709) 0x12506d000 Sampling for frame at 5528.897627
 0:13.46 GECKO(91709) 0x12506d000 Handled event with timestamp 5516.373299
 0:13.48 GECKO(91709) 0x12506d000 Handled event with timestamp 5530.763990

We can see that the events arrive bunched up, but the second event actually has a timestamp that's later than the previous "Sampling for frame" timestamp, so we actually want to exclude that event from going into the sample. That's the opposite of the problem I thought we had, and maybe not as easy to fix. Maybe we should keep the input events in the input queue until the sampling for the previous interval happens.
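
A sketch of that "keep events in the input queue" idea (invented types, not the real APZ input queue): when sampling, only drain events stamped at or before the sample timestamp, and hold anything newer for the next sample.

use std::collections::VecDeque;

// Illustrative only; invented types standing in for the APZ input queue.
struct WheelEvent {
    timestamp_ms: f64,
    delta_y: f32,
}

struct InputQueue {
    pending: VecDeque<WheelEvent>,
}

impl InputQueue {
    // Called when we sample scroll state for the frame at `sample_time_ms`.
    // Events stamped after the sample time stay queued so they contribute to
    // the next sample instead of double-stepping this one.
    fn drain_for_sample(&mut self, sample_time_ms: f64) -> Vec<WheelEvent> {
        let mut taken = Vec::new();
        while let Some(ev) = self.pending.front() {
            if ev.timestamp_ms <= sample_time_ms {
                taken.push(self.pending.pop_front().unwrap());
            } else {
                break;
            }
        }
        taken
    }
}

fn main() {
    // Timestamps loosely based on the "unsmooth" log excerpt above.
    let mut queue = InputQueue {
        pending: VecDeque::from(vec![
            WheelEvent { timestamp_ms: 5484.3, delta_y: 30.0 },
            WheelEvent { timestamp_ms: 5496.7, delta_y: 28.0 },
        ]),
    };
    let applied = queue.drain_for_sample(5495.6);
    assert_eq!(applied.len(), 1);        // only the first event feeds this sample
    assert_eq!(queue.pending.len(), 1);  // the newer one waits for the next sample
}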

I'm going to leave this bug parked with me for now although I don't plan on working on it in the immediate future as I'm focused on desktop zooming stuff. The discovery outlined in the previous comment means the fix is more complicated than I anticipated and I want to think about how to do it best.

This came up during the gfx-daily today. I doubt I'll be working on this in the near or medium term, so I'm going to unassign myself. Markus has plans for a bit of a vsync rewrite which might make fixing this easier.

Assignee: kats → nobody

Is this still happening? I looked at the profile in comment 0; I'm guessing the problem is that there isn't a payload-presented marker for every vsync timestamp?

It looks like our markers have changed quite a bit since that profile, but in my local profiles (https://share.firefox.dev/3ssydKp) I see an APZ payload presented every vsync. Or should I be looking at the contentpaint payload presented, which only seems to happen after we get a new display list?

Yes, I still see this regularly. Profile from just now: https://share.firefox.dev/3uzXLan
You can see the APZ scroll payload markers fluctuate between 20ms and 36ms.

Yeah, I see that in your profile, thanks. Is there anything more to the STR than just flinging a simple page that I might be missing?

I'm not sure why it only happens sometimes. Maybe the synthetic momentum events aren't generated based on vsync and there's another clock in play, in which case it might depend on the clock drift between those two timing sources.

I tried to reproduce this on two other Macs and couldn't. (Don't spend any time trying to figure out why; I'm not sure if I'll look into this.)

Depends on: 1735391

I see this problem all the time now. I'm not sure what has changed. Whether wheel events get delivered at a consistent rate seems to be really finicky; it appears to depend on the window size, and possibly on the power state. The stutter becomes much less pronounced when the profiler is running, or when I record the screen with QuickTime, which makes it really hard to get a handle on.
