Closed Bug 1542808 Opened 6 years ago Closed 5 years ago

Wayland: Use frame callbacks as per-surface VsyncSource

Categories

(Core :: Graphics, enhancement, P3)

Unspecified
Linux
enhancement

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: kennylevinsen, Assigned: kennylevinsen)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 2 obsolete files)

Wayland does not expose a concept of vsync or refresh rates. Instead, an application can be notified when the compositor decides that the application should render its next frame through "frame callbacks".

These frame callbacks exists per surface, and while they very explicitly do not indicate vsync, they are guaranteed to fire at a time that would give the application the most time to render before the next vblank. These events will under normal operation usually fire at display refresh rate, however, they are throttled as the Wayland compositor sees fit. The events may occur at a reduced rate or not at all, such as is the case for hidden windows.

We should utilize these frame callbacks, both to avoid tearing and to minimize unnecessary CPU load/power Wayland consumption believes rendering is futile.

As my first try at working with the Firefox source, I have been experimenting with various ways of doing this. My initial implementation used a global VSyncSource, fed by each surface's frame callback. However, this does not solve the problem where some, but not all windows are hidden, and thus should not render. It also leads to the vsync rate increasing for every window created. With that in mind, this does not seem like the correct approach.

Instead, I am now working on an implementation of per-surface VsyncSources in nsWindow, installing the source in its VsyncDispatcher when it has been made available. This results in compositing running at the correct rate.

However, this does not solve when to fire refresh timers. I do not immediately see a way to hook these timers up to their relevant VsyncSource. This will need to be solved to complete the implementation.

Input greatly appreciated. Patches for current approaches will be posted.

Blocks: wayland
Priority: -- → P3
Summary: Use frame callbacks as per-surface VsyncSource → Wayland: Use frame callbacks as per-surface VsyncSource

Bug 1515448 landed so we should throttle the eglSwapBuffers() cadence calling now.

Please note that the frame callback is not fired for hidden surfaces so some approximation needs to be used when frame callback is missing.

I'd rather not fire a vsync source at anything other than a frame callback. If there are tasks tied to a vsync source that must fire in the background, then I believe that the correct solution would be to attach those directly to a soft timer source, rather than attempting to simulate vsync in its absence.

That will allow us to avoid unnecessary render/composite tasks for a hidden surface.

Attaching my original naive implementation where each nsWindow attaches surface callbacks that feed a gfxPlatform-global VsyncSource. Works great for 1 window, vsync gets crazy with more.

Attached patch nsWindow-local WaylandVsyncSource (obsolete) (deleted) — Splinter Review

Initial implementation of a nsWindow-local WaylandVsyncSource which attaches to the nsBaseWidget CompositorVsyncDispatcher. It is currently missing proper shutdown of the vsync threads, and segfaults on shutdown in ProfilingStack::~ProfilingStack() for some reason.

Ignoring those two bugs, it seems to work and handles composition, but does not take care of the nsRefreshDrivers.

My attempts to deal with nsRefreshDrivers are based around retrieving the nsWindow instance as a nsBaseWidget through GetPresContext()->GetRootWidget(), and then creating a VsyncRefreshDriverTimer from the nsWindow VsyncSource. For child processes, this will have to go through a VsyncBridge... somehow.

I have not yet had success with the nsRefreshDriver aspect. I am still trying to get a grasp of how things attach to each other down there.

Attachment #9057481 - Attachment is obsolete: true
Blocks: 1531756

Lets Wayland sessions run vsync off wayland surface frame callbacks by creating
an interface for widgets to return a local VsyncSource, if applicable.

This interface is currently used for the compositor, and for refresh drivers
in the parent process. It is not yet used for vsync in content processes.

Attachment #9057486 - Attachment is obsolete: true

Any updates on this front? This build had the best jank-free scrolling experience so far, on all compositing backends - even compared to the latest nightly build.

Still waiting on review. :/

Flags: needinfo?(stransky)

I'm working on it right now.

Flags: needinfo?(stransky)
Attached image "After" screenshot for vsynctester.com (deleted) —

Without in any way endorsing vsynctester.com by itself, I am attaching before/after screenshots from the test to give an idea as to how this affects our readings on wayland, although mostly just for the fun of it. Note that you do see spikes, the screenshot was just taken while there were none.

Test performed without webrender, as the test moves an enourmous texture along a canvas in the background which stalls webrender quite a bit. Also note that while the text is red in the screenshot, it is mostly grey as it should be when observed, flickering occasionally red or cyan.

This implementation works on everything but Mutter, due to what by common protocol interpretation are Mutter bugs (see the review).

See bug 1563075, which implements a different approach.

I have reopened the review with a version rebased on central. However, I have not yet tested it myself on latest mutter post-fix. heftig, do you think you'd have a chance to do that? I also do not know what the process should be with regards to merging this. Do we wait for next release of mutter?

I just tested myself on sway to verify that the patch still behaves as intended, and it looks good.

I am parking my other implementation for now. It's more correct, but debugging test failures is quite time consuming, and as volunteer, I have limited time available. I believe moving forward with this implementation is most appropriate for now.

Flags: needinfo?(jan.steffens)

Well, it definitely doesn't freeze anymore.

Difficult to tell if things improved, though. I've only got a 60 Hz monitor and https://www.vsynctester.com/ still has occasional second-long stretches of about every second frame being dropped. I think it got better?

Flags: needinfo?(jan.steffens)

Regarding second-long stretches: Do you see this both with and without this patch? What renderer are you testing with? You can also try clicking the cog in the upper right corner and disable the moving background.

For reference, without this patch, vsynctester estimates the refresh rate of my display to be ~61.5Hz, and I get both consistent blips (i.e. cyan/red text that should have been grey) more than once a second caused by the accumulated error, as well as a bunch of sporadic blips.

With this patch, I hit the refresh rate of my screen pretty much spot on (59.99something), and I only get occasional blips.

Firefox has already gotten a lot smoother on this test. The failure without this patch is bad, but nowhere near as catastrophic as in my original screenshots.

(In reply to Kenny Levinsen :kennylevinsen from comment #16)

Regarding second-long stretches: Do you see this both with and without this patch? What renderer are you testing with? You can also try clicking the cog in the upper right corner and disable the moving background.

I'm not sure. After the page settles into steady state they're rare and I haven't seen any since removing the patch again. I've tried GL layers and WebRender.

For reference, without this patch, vsynctester estimates the refresh rate of my display to be ~61.5Hz, and I get both consistent blips (i.e. cyan/red text that should have been grey) more than once a second caused by the accumulated error, as well as a bunch of sporadic blips.

With this patch, I hit the refresh rate of my screen pretty much spot on (59.99something), and I only get occasional blips.

Yes, I can confirm this. ~62Hz without the patch and frequent drops, 60Hz with and much more stable.

Seems like this might be breaking widget.wayland_dmabuf_backend.enabled (disclaimer: haven't really tested without the patch, just noticed the Vsync related lines in the backtrace):

* thread #1, name = 'firefox', stop reason = signal SIGBUS: hardware error
  * frame #0: 0x00000008174918ce iris_dri.so`___lldb_unnamed_symbol23455$$iris_dri.so + 46
    frame #1: 0x00000008038123bd libgbm.so.1`___lldb_unnamed_symbol9$$libgbm.so.1 + 77
    frame #2: 0x00000008073b1b4c libxul.so`WaylandDMABufSurface::Unmap() + 28
    frame #3: 0x0000000805caac30 libxul.so`mozilla::layers::TextureClient::Unlock() + 272
    frame #4: 0x0000000805bbe74d libxul.so`mozilla::layers::RemoteRotatedBuffer::Unlock() + 141
    frame #5: 0x0000000805c9bc8c libxul.so`mozilla::layers::ContentClientRemoteBuffer::EndPaint(mozilla::layers::ContentClient::PaintState&, nsTArray<mozilla::layers::ReadbackProcessor::Update>*) + 188
    frame #6: 0x0000000805c97035 libxul.so`mozilla::layers::ClientPaintedLayer::RenderLayerWithReadback(mozilla::layers::ReadbackProcessor*) + 997
    frame #7: 0x0000000805ca0450 libxul.so`mozilla::layers::ClientContainerLayer::RenderLayer() + 192
    frame #8: 0x0000000805c94d5c libxul.so`mozilla::layers::ClientLayerManager::EndTransactionInternal(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 764
    frame #9: 0x0000000805c94ff1 libxul.so`mozilla::layers::ClientLayerManager::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 81
    frame #10: 0x0000000807804c03 libxul.so`nsDisplayList::PaintRoot(nsDisplayListBuilder*, gfxContext*, unsigned int) + 867
    frame #11: 0x0000000807589204 libxul.so`nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) + 7476
    frame #12: 0x0000000807537ebd libxul.so`mozilla::PresShell::Paint(nsView*, nsRegion const&, mozilla::PaintFlags) + 1213
    frame #13: 0x000000080735156b libxul.so`nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) + 459
    frame #14: 0x0000000807351208 libxul.so`nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) + 568
    frame #15: 0x00000008073524a9 libxul.so`nsViewManager::ProcessPendingUpdates() + 121
    frame #16: 0x00000008075110d2 libxul.so`nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) + 5346
    frame #17: 0x000000080751503f libxul.so`mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) + 399
    frame #18: 0x0000000807514e0d libxul.so`mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) + 93
    frame #19: 0x000000080751499b libxul.so`mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) + 123
    frame #20: 0x0000000807513e18 libxul.so`mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() + 40
    frame #21: 0x000000080531a07d libxul.so`nsThread::ProcessNextEvent(bool, bool*) + 1917

There's also Bug 1588589 opened for X11.

(In reply to greg v [:myfreeweb] from comment #18)

Seems like this might be breaking widget.wayland_dmabuf_backend.enabled (disclaimer: haven't really tested without the patch, just noticed the Vsync related lines in the backtrace):

Can you please test without the patch? I don't think it's related.

Flags: needinfo?(greg)

(In reply to Martin Stránský [:stransky] from comment #20)

Can you please test without the patch? I don't think it's related.

Yeah, true.

Flags: needinfo?(greg)

stransky: Any ideas on how landing this should be handled, considering Mutter only just had its fix merged?

Flags: needinfo?(stransky)

(In reply to Kenny Levinsen :kennylevinsen from comment #22)

stransky: Any ideas on how landing this should be handled, considering Mutter only just had its fix merged?

Can this be enabled run-time? I can imagine a simple test where we just perform an empty commit to wl_surface and check if we get a frame callback in some time (say ~ 50ms which is under 30fps).

Flags: needinfo?(stransky)

(In reply to Martin Stránský [:stransky] from comment #23)

Can this be enabled run-time? I can imagine a simple test where we just perform an empty commit to wl_surface and check if we get a frame callback in some time (say ~ 50ms which is under 30fps).

Not reliably. We can't be sure that the callback we got was because of our commit. Even the unpatched shell serves all callbacks when any surface is damaged and the scene is recomposited.

The other way around, we can't be sure that not getting a callback right away means the shell is unpatched; it might just be a momentary glitch.

(In reply to Jan Alexander Steffens [:heftig] from comment #24)

(In reply to Martin Stránský [:stransky] from comment #23)

Can this be enabled run-time? I can imagine a simple test where we just perform an empty commit to wl_surface and check if we get a frame callback in some time (say ~ 50ms which is under 30fps).

Not reliably. We can't be sure that the callback we got was because of our commit. Even the unpatched shell serves all callbacks when any surface is damaged and the scene is recomposited.

The other way around, we can't be sure that not getting a callback right away means the shell is unpatched; it might just be a momentary glitch.

Yes, I suggest to use the frame-callback based vsync by default and if we don't get the frame callback in time even if we should (surface is visible etc.) then switch to timer emulation.

If that can be a run-time option, why not just hide it behind some about:config flag which will be off by default? At least the code would be merged and easy to test, and once enough users are deemed eligible it can easily be turned on by default.

I don't think the fallback would work. If I recall correctly, the issue was not that frame callbacks do not occur, but that they would stop occurring a while after the last damage was submitted. Having the fallback constantly monitoring would probably just result in false positives and fallback to vsync-less behavior on problem-free compositors.

I believe the fix in Mutter might make it into an upcoming bugfix release. Maybe we could just wait for that to land this patch. Implementing workarounds for Mutter Wayland defects, after they have already been fixed upstream, before we even officially ship our Wayland support seem a bit silly to me.

(In reply to Kenny Levinsen :kennylevinsen from comment #27)

I believe the fix in Mutter might make it into an upcoming bugfix release. Maybe we could just wait for that to land this patch. Implementing workarounds for Mutter Wayland defects, after they have already been fixed upstream, before we even officially ship our Wayland support seem a bit silly to me.

Do you know if that already works in other compositors like sway/kwin?

Fedora Mutter maintainers just told me they will include the patch in an Fedora 31 update shortly after freeze is over, so even before 3.34.2 is released (which then officially includes the patch). As Fedora 31 is AFAIK the only distribution shipping with FF-Wayland by default, I guess it would be fine to land the patch afterwards? I also asked for a 3.32 backport, as Fedora 30 has the FF-Wayland package, and they will probably do.

(In reply to Kenny Levinsen :kennylevinsen from comment #27)

I believe the fix in Mutter might make it into an upcoming bugfix release. Maybe we could just wait for that to land this patch.

Maybe add a pref that, when enabled, uses the new vsync source and, when disabled, uses the old one? This should make the patch less controversial to land. We can change the pref's default and/or remove the old code later.

I'm fine with the perf for now, let's have something to start with. The auto detection can be implemented later.

(In reply to Martin Stránský [:stransky] from comment #31)

I'm fine with the perf for now, let's have something to start with. The auto detection can be implemented later.

I suppose this implies having to readd the old sorta-almost-vsync source that we're currently using?

As for your other question, wlroots never had the issue, and GNOME is getting a fix. I don't have a recent kwin wayland install to test with ATM, but I'll try to get around to it.

(In reply to Kenny Levinsen :kennylevinsen from comment #32)

(In reply to Martin Stránský [:stransky] from comment #31)

I'm fine with the perf for now, let's have something to start with. The auto detection can be implemented later.

I suppose this implies having to readd the old sorta-almost-vsync source that we're currently using?

IMHO You can have an universal methods:

void WaylandVsyncSource::WaylandDisplay::SetupVSyncCallback()

which will call

void WaylandVsyncSource::WaylandDisplay::SetupFrameCallback()

for frame-callback implementation and

void WaylandVsyncSource::WaylandDisplay::SetupTimerCallback()

for timer-based implementation.

NOTE: You can't use pref module from other than main thread so I suppose to load the pref as a part of nsWaylandDisplay or gfxPlatformGtk class and use it indirectly from WaylandVsyncSource class.

As for your other question, wlroots never had the issue, and GNOME is getting a fix. I don't have a recent kwin wayland install to test with ATM, but I'll try to get around to it.

That's great. We can detect actual compositor by XDG_CURRENT_DESKTOP env variable.

for timer-based implementation.

I was referring to the re-implementing the semi-synchronizing timer solution we're using right now on wayland vs. just falling back to a plain software timer (which is what our current implementation ends up as 99% of the time anyway—IIRC, only the basic compositor correctly drives the synchronization mechanism).

Adding a pref for a basic timer would be a quick task, after figuring out the pref system.

That's great. We can detect actual compositor by XDG_CURRENT_DESKTOP env variable.

[kenny@lappy ~]$ echo $XDG_CURRENT_DESKTOP

[kenny@lappy ~]$

However, if all we want to detect is GNOME/Mutter for a temporary blacklist, we can probably rely on the fact that GDM sets that variable, and very few people are likely starting Mutter in other ways.

(In reply to Jan Alexander Steffens [:heftig] from comment #24)

The other way around, we can't be sure that not getting a callback right away means the shell is unpatched

There is a case where this is actually expected behavior: If Firefox's window is off-screen or completely obscured, then there should be no callbacks.
See https://gitlab.gnome.org/GNOME/mutter/issues/892.

(In reply to Jan Alexander Steffens [:heftig] from comment #35)

(In reply to Jan Alexander Steffens [:heftig] from comment #24)

The other way around, we can't be sure that not getting a callback right away means the shell is unpatched

There is a case where this is actually expected behavior: If Firefox's window is off-screen or completely obscured, then there should be no callbacks.
See https://gitlab.gnome.org/GNOME/mutter/issues/892.

We know when the window is visible/hidden so we can check if we get/don't get frame callback for visible window.

(In reply to Martin Stránský [:stransky] from comment #36)

We know when the window is visible/hidden so we can check if we get/don't get frame callback for visible window.

I'm pretty sure you can't check if the window is completely obscured by another surface.

(In reply to Jan Alexander Steffens [:heftig] from comment #37)

(In reply to Martin Stránský [:stransky] from comment #36)

We know when the window is visible/hidden so we can check if we get/don't get frame callback for visible window.

I'm pretty sure you can't check if the window is completely obscured by another surface.

We use 'visibility-notify-event' to trigger rendering but I don't know how is that reliable now:
https://developer.gnome.org/gtk3/stable/GtkWidget.html#GtkWidget-visibility-notify-event

The patch works fine for me with "widget.wayland_vsync.enabled" set to false on Fedora 31 we can say it does not break anything by default. But I see video shuttering/freezing while playback when widget.wayland_vsync.enabled is set to true, WR enabled, Fedora 31 / mutter-3.34.1-5.fc31.x86_64.

I'm going to retest it with fixed mutter (https://gitlab.gnome.org/GNOME/mutter/issues/892).

I think the more important fix is https://gitlab.gnome.org/GNOME/mutter/merge_requests/839, which is not yet in 3.34.1-5. I hope we get out a new version with that patch soonish, definitely with 3.34.2.

https://gitlab.gnome.org/GNOME/mutter/issues/892 should not really add anything new, it only helps to make completely covered applications behave as if they were on another workspace. It should not change how visible applications behave.

Thanks, I'll try to patch/build it locally.

:kennylevinsen could you rebase the patch? I'm having trouble doing so myself and I'd like to test it a bit.

Rebased. Had some conflicts on wayland buffer scale additions.

Thx. So from some local testing I can say that it seems to run nicely under Mutter (git master). vsynctester reports a 60Hz display, which is correct. But for some reason the config does not seem to have any effect: whether it is TRUE or FALSE, in both cases 60Hz are reported, not 61.something like without this patch.

Did you restart the browser after toggling the setting?

Did you restart the browser after toggling the setting?
Yes. But from the code it doesn't make sense to me :/

Another nit: there seem to be leftovers from the rebase, the two places with wl_surface_set_buffer_scale. Unfortunately I don't have permission for reviews on phabricator, so commenting here.

Just a recap of what this change does, and what to expect:

New setup

With the setting set to false, vsync runs off a 60Hz timer. Set to true, vsync runs off frame callbacks where possible.

Note the "where possible": Multi-process IPC boundaries complicate tying things to its respective window widget. In these locations, the 60Hz timer is still used.

Old setup

Vsync runs off a 62.5Hz timer. More accurately, the timer is a loop that sleeps until last_vsync+16ms, with some logic to attempt to define last_vsync as a frame callback time (mostly only successful under the basic compositor).

Observable difference

Both with and without wayland vsync enabled, this patch should be closer to a normal 60Hz refresh. With it enabled, things like compositing will run off frame callbacks and synchronize to any refresh rate with any compositor.

If you have a 60Hz monitor, the difference between enabled and disabled might be subtle: The timer will start out of phase with vsync, and drift ever so slightly over time.

Another nit: there seem to be leftovers from the rebase, the two places with wl_surface_set_buffer_scale

That is intentional as solution to a conflict. WaylandVsyncSource uses get_wl_surface to get the wl surface, and is not interested in configuring it. However, buffer scale code had been placed into it, which doesn't make sense for WaylandVsyncSource to interact with.

We should probably split the setup logic out of get_wl_surface, so that get_wl_surface does what it says on the tin without the magic that is currently in place.

Thanks for the explanation! Defaulting to 60Hz sounds reasonable and I can now confirm that things work as expected: if no frame callbacks are send, e.g. by hiding the window below gnome-terminal, vsynctester reports much lower FPS (when unhiding it). More importantly, CPU usage drops heavily (on Mutter this needs https://gitlab.gnome.org/GNOME/mutter/merge_requests/918 to be effective, on 3.34 it can be tested on a separate workspace). So this has great potential for saving battery time when FF is hidden behind other windows.

The only thing I found that could still improve is the multi-window case: apparently if one window receives frame-callbacks, all windows continue to render. So if one window completely hides the other one, the hidden one will still update. But that could be a future optimization and should not block this to land as it is.

Gnome 3.34.2 is scheduled for next week, then it should be save to land, no?

It should be safe to land independently of GNOME; that's why we went with the off-by-default pref.

As of current Fedora 31 stable (mutter-3.34.1-11.fc31) the necessary patches are also included. So :stransky if you wanted to test this before landing you can now do so now. Would love to see this land in time for FF 72.

(In reply to robert.mader from comment #50)

As of current Fedora 31 stable (mutter-3.34.1-11.fc31) the necessary patches are also included. So :stransky if you wanted to test this before landing you can now do so now. Would love to see this land in time for FF 72.

I'm fine with the patch as far as it's disabled by default, I suggest to land it ASAP to prevent another bit rot.
It can be enabled on fixed compositors by default (you can read XDG_CURRENT_DESKTOP env variable for it for instance).

Is there any blocker to landing this patch?

(In reply to Yariv from comment #52)

Is there any blocker to landing this patch?

No, it can be landed now.

(In reply to Martin Stránský [:stransky] from comment #53)

No, it can be landed now.

Please forgive my ignorance, what has to be done in order to merge it?
This patch could be useful for testing some GTK code I'm working on.

Sorry, Yariv, I've mostly just been busy.

I'll request landing as it is now, as we can always add automagic gnome bug detection later (such as XDG_CURRENT_DESKTOP == "gnome").

Hmm, the "checkin-needed" keyword I normally use is not available. :stransky, do you know what's going on?

Flags: needinfo?(stransky)

You need to go to "Edit Revision" in Phabricator and add the "Check-in Needed” tag.

Flags: needinfo?(stransky)

kennylevinsen:, I've tried to land your patch, but I got the following error:

On Tue, November 26, 2019, 11:28 PM GMT+2, by dvarga@mozilla.com.
Revisions: D28430 diff 193142
Details: We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpRvRf6W widget/gtk/nsWindow.h Hunk #4 succeeded at 259 with fuzz 1. widget/gtk/nsWindow.cpp Hunk #5 FAILED at 3928. 1 out of 12 hunks FAILED -- saving rejects to file widget/gtk/nsWindow.cpp.rej abort: patch command failed: exited with status 256

Flags: needinfo?(bugzilla)

:(In reply to Daniel Varga [:dvarga] from comment #58)

kennylevinsen:, I've tried to land your patch, but I got the following error:

On Tue, November 26, 2019, 11:28 PM GMT+2, by dvarga@mozilla.com.
Revisions: D28430 diff 193142
Details: We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpRvRf6W widget/gtk/nsWindow.h Hunk #4 succeeded at 259 with fuzz 1. widget/gtk/nsWindow.cpp Hunk #5 FAILED at 3928. 1 out of 12 hunks FAILED -- saving rejects to file widget/gtk/nsWindow.cpp.rej abort: patch command failed: exited with status 256

Rebased and tagged for checkin again. Thanks!

Flags: needinfo?(bugzilla)
Pushed by dvarga@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/248d909386f5 Implement widget-local VsyncSource for Wayland windows. r=stransky,lsalzman
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Assignee: nobody → bugzilla

This interface is currently used for the compositor, and for refresh drivers
in the parent process. It is not yet used for vsync in content processes.

This needs to be done before e.g. vsynctester.com will run fluidly at any refresh rate?

(In reply to walmartguy from comment #62)

This interface is currently used for the compositor, and for refresh drivers
in the parent process. It is not yet used for vsync in content processes.

This needs to be done before e.g. vsynctester.com will run fluidly at any refresh rate?

To get a perfect vsynctester.com result, yes.

However, compositing is done fluidly at any refresh rate with this new feature enabled. Scrolling will follow you screen refresh rate, for example.

Seems to be working well on Sway, thanks.
One exception is scrolling with keyboard arrow keys though, it doesn't look fluid. Perhaps this is due to how input is handled, and not compositing?

(In reply to walmartguy from comment #65)

Seems to be working well on Sway, thanks.
One exception is scrolling with keyboard arrow keys though, it doesn't look fluid. Perhaps this is due to how input is handled, and not compositing?

Have you enabled 'Use smooth scrolling' in Firefox settings (about:preferences)?

(In reply to Yariv from comment #66)

(In reply to walmartguy from comment #65)

Seems to be working well on Sway, thanks.
One exception is scrolling with keyboard arrow keys though, it doesn't look fluid. Perhaps this is due to how input is handled, and not compositing?

Have you enabled 'Use smooth scrolling' in Firefox settings (about:preferences)?

Yes, it's fluid on Xorg.

(In reply to walmartguy from comment #67)

Yes, it's fluid on Xorg.

That's interesting. For me it's smooth on the 'about:support' page. However it's indeed less smooth on other pages - case in point, this and other https://bugzilla.mozilla.org pages.

(In reply to Yariv from comment #68)

(In reply to walmartguy from comment #67)

Yes, it's fluid on Xorg.

That's interesting. For me it's smooth on the 'about:support' page. However it's indeed less smooth on other pages - case in point, this and other https://bugzilla.mozilla.org pages.

Yeah, about:support is smoother (but still worse than mouse scrolling on other sites). It's also odd that Firefox seems to render this particular site with fixed 60fps when using autoscroll, which looks really bad on my 75Hz screen.

(In reply to walmartguy from comment #69)

It's also odd that Firefox seems to render this particular site with fixed 60fps when using autoscroll, which looks really bad on my 75Hz screen.

That sounds like another issue. Perhaps you should file a bug?

I didn't write this patch, so I also wonder about the relationship between input events and the frame callbacks. IIUC Firefox uses GDK for input handling, however I'm not sure whether they use the GDK frame clock signals for display.

(In reply to walmartguy from comment #69)

(In reply to Yariv from comment #68)

(In reply to walmartguy from comment #67)

Yes, it's fluid on Xorg.

That's interesting. For me it's smooth on the 'about:support' page. However it's indeed less smooth on other pages - case in point, this and other https://bugzilla.mozilla.org pages.

Yeah, about:support is smoother (but still worse than mouse scrolling on other sites). It's also odd that Firefox seems to render this particular site with fixed 60fps when using autoscroll, which looks really bad on my 75Hz screen.

If mouse scrolling is smooth and synchronized to your display refresh rate, but keyboard scrolling seems locked to 60fps, I would guess that keyboard processing ends up in a queue served by the global 60Hz software timer that is in place under Wayland as fallback for the consumers of vsync that could not easily be associated with a window, which under Wayland mean that it cannot observe vsync, which is a window-local concept (surface frame callbacks).

On other platforms, this global timer is served by a global vsync source, but such a thing is either a simulated concept or only synchronized to a single monitor, so it's broken by design and not implementable under Wayland without a large pile of hacks.

(In reply to Kenny Levinsen :kennylevinsen from comment #71)

If mouse scrolling is smooth and synchronized to your display refresh rate, but keyboard scrolling seems locked to 60fps, I would guess that keyboard processing ends up in a queue served by the global 60Hz software timer that is in place under Wayland as fallback for the consumers of vsync that could not easily be associated with a window, which under Wayland mean that it cannot observe vsync, which is a window-local concept (surface frame callbacks).

On other platforms, this global timer is served by a global vsync source, but such a thing is either a simulated concept or only synchronized to a single monitor, so it's broken by design and not implementable under Wayland without a large pile of hacks.

That's a pity. Keyboard scrolling is fluid with Chromium XWayland. I suppose Firefox XWayland would be fluid too if it didn't suffer this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1588589

Soon also Gnome Shell and others will be able to run XWayland windowed with native refresh rate vsync:
https://gitlab.freedesktop.org/xorg/xserver/merge_requests/356
So it would be really nice if that Firefox XWayland issue was sorted out.

(In reply to walmartguy from comment #72)

Soon also Gnome Shell and others will be able to run XWayland windowed with native refresh rate vsync:
https://gitlab.freedesktop.org/xorg/xserver/merge_requests/356
So it would be really nice if that Firefox XWayland issue was sorted out.

FWIW, I think it's better to put effort into solving this properly in the native Wayland case, instead of in the X11 case for Xwayland.

(In reply to Michel Dänzer from comment #73)

FWIW, I think it's better to put effort into solving this properly in the native Wayland case, instead of in the X11 case for Xwayland.

Yes, but you also mentioned that this bug can cause other issues also on native Xorg. So one could argue that having XWayland fixed would be more like a positive side effect.

(In reply to walmartguy from comment #72)

That's a pity. Keyboard scrolling is fluid with Chromium XWayland. I suppose Firefox XWayland would be fluid too if it didn't suffer this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1588589

(In reply to walmartguy from comment #74)

Yes, but you also mentioned that this bug can cause other issues also on native Xorg. So one could argue that having XWayland fixed would be more like a positive side effect.

This is a general Firefox X11 issue (meaning related to neither XWayland nor Wayland).

To fix it, one would have to:

  1. Implement a widget-local VsyncSource for X11, like the one already implemented for Wayland here.
  2. Replace all (or most) use of the global VsyncSource with widget-local sources.

Implementing point 1 immediately makes X11 equal to Wayland with respect Vsync (i.e. partial synchronization). Implementing point 2 fixes both Wayland and X11.

I spent a lot of time in a dead experiment trying to redesign frame scheduling in Firefox for more correct behavior under Wayland and as workaround for GNOME issues, and haven't had the energy to look at Firefox timers again. I also personally have no interest in working on X11 issues, but anyone else are free to give that a shot. I'd still recommend starting with Wayland, though, as it allows you to dig into the hard parts quicker.

Also, please discuss other issues, such as X11 backend issues, in their appropriate bug reports. This bug concerns implementation of a widget-local VsyncSource for Wayland.

(In reply to Kenny Levinsen :kennylevinsen from comment #75)

To fix it, one would have to:

  1. Implement a widget-local VsyncSource for X11, like the one already implemented for Wayland here.
  2. Replace all (or most) use of the global VsyncSource with widget-local sources.

This bug concerns implementation of a widget-local VsyncSource for Wayland.

I'm working on input event interpolation for GDK (https://gitlab.gnome.org/GNOME/gtk/merge_requests/1117) in order to solve discrepancies between input rate and display refresh. The idea is to replace the original input events with an interpolated event, which is synthesized once per 'vsync', in order to achieve smooth scrolling and the like. I currently use GDK's frame clock 'update' signal to deliver the once-per-frame synthesized events.

Your comment leads me to believe that in Firefox input handling is separate and maybe asynchronous w.r.t display handling. In your PR, did you rely on the GDK frame clock signals or did you use some other mechanism? My concern is that if the input/output handling in Firefox is async, it might be possible for input events to sometimes arrive before the frame is updated and sometimes after, defeating the objective of getting exactly one input event per frame.

Your comment leads me to believe that in Firefox input handling is separate and maybe asynchronous w.r.t display handling.

Note that I have minimal knowledge of Firefox's input pipeline after the basic GDK events have fired. Also note that this patch does not modify input processing, as basically only composite/render is tied to widget-local vsync.

You are correct that render is asynchronous in a separate thread, and can be seen as just sampling the document on vsync (widget-local if possible, which on Wayland is through frame callbacks, although this should also be what lies under a GdkFrameClock). There is no direct connection to user input (apart from dirtying the document and kickstarting render).

Event interpolation might help, but it will race against content sampling for render, so there will most likely be missed events. If input events are synchronized to a frame, rendering and event processing will kick off in parallel, and it's very likely that input processing will not make the sampling deadline.

I suspect the only bullet-proof fix within this model (as opposed to models only reporting changes in input) is to interpolate events at a multiple of the display refresh rate, so that on every render there will be N pending events where N≥2 so that a synchronization error of ±1 event will not cause motion to stop or significantly change. Smoothing can then hide minor velocity differences.

(In reply to Kenny Levinsen :kennylevinsen from comment #77)

I suspect the only bullet-proof fix within this model (as opposed to models only reporting changes in input) is to interpolate events at a multiple of the display refresh rate, so that on every render there will be N pending events where N≥2 so that a synchronization error of ±1 event will not cause motion to stop or significantly change. Smoothing can then hide minor velocity differences.

Ok, thanks! The solution that you suggested is probably not good enough, since the whole point of the interpolation is to get rid of these differences. I'll try to do some deep dive into the event handling code in Firefox when I have some time, see if there is a way to sync event handling with frame drawing.

Thanks for the explanations @Kenny Levinsen, I had no idea how much work was required to solve the the issues of the X11 implementation. In that case, I'm totally with Michel to get Wayland done instead. At least it's already working remarkably well with 72.
I'm going to open tickets for the stuttery keyboard scrolling input and about:support weirdness. Though my two cents would be that having Wayland widget vsync for rasterization and WebGL would at least be equally exciting.

(In reply to Yariv from comment #78)

Ok, thanks! The solution that you suggested is probably not good enough, since the whole point of the interpolation is to get rid of these differences. I'll try to do some deep dive into the event handling code in Firefox when I have some time, see if there is a way to sync event handling with frame drawing.

Delaying event processing to render time will introduce input delay, especially so for lower refresh rate display (e.g. 60/75Hz). Web applications also need time to process events and will miss sampling deadlines (effectively input latency) or postpone render using rAF if rAF is spec compliant (missed frames). This will probably matter a lot for games and such.

However, this is off-topic, and 32 people are CC this issue. If you create a new issue, we and interested parties can discuss it there. :)

(In reply to Kenny Levinsen :kennylevinsen from comment #80)

However, this is off-topic, and 32 people are CC this issue. If you create a new issue, we and interested parties can discuss it there. :)

Good call. I think that bug 1554408 could be a good place for that. It might not be obvious from the title, but it's basically about the (input-event rate / display rate) ratio issue - I wasn't aware of the root cause at the time.

Report for the stuttery keyboard arrow key scrolling (I btw. was mistaken regarding about:support, it's completely fluid when not moving the cursor):
https://bugzilla.mozilla.org/show_bug.cgi?id=1601962

Report for the missing hardware vsync for auto scroll at about:support:
https://bugzilla.mozilla.org/show_bug.cgi?id=1601966

Update: what I claimed it comment #48 was actually wrong:

The only thing I found that could still improve is the multi-window case: apparently if one window receives frame-callbacks, all windows continue to render. So if one window completely hides the other one, the hidden one will still update. But that could be a future optimization and should not block this to land as it is.

This was just caused by bug 1602309, as I had the terminal with htop focused, not FF. When FF is focused (and therefore the opaque reqion is properly set), the hidden window does not receive frame callbacks and drops in CPU usage. So this implementation here appears to have the multi window case covered just fine.

In current nightly since around the 74 update, enabling widget.wayland_vsync.enabled causes DnD to not work any more here. I already recreated the profile etc. and this (rather unrelated) setting reliably stops any DnD action from working. In may main browser, version 72, this does not happen. Can anybody confirm?

(In reply to robert.mader from comment #84)

In current nightly since around the 74 update, enabling widget.wayland_vsync.enabled causes DnD to not work any more here. I already recreated the profile etc. and this (rather unrelated) setting reliably stops any DnD action from working. In may main browser, version 72, this does not happen. Can anybody confirm?

I marked a bunch of text from the 'about:support' page, and successfully dragged that to a gedit window. Is this what you referred to?

(In reply to Yariv from comment #85)

(In reply to robert.mader from comment #84)

In current nightly since around the 74 update, enabling widget.wayland_vsync.enabled causes DnD to not work any more here. I already recreated the profile etc. and this (rather unrelated) setting reliably stops any DnD action from working. In may main browser, version 72, this does not happen. Can anybody confirm?

I marked a bunch of text from the 'about:support' page, and successfully dragged that to a gedit window. Is this what you referred to?

Hm, yes, after a rebuild today things work again, so probably something odd on my setup. Sorry for the noise!

Regressions: 1614212

Assuming bug 1614212 can get fixed, should we let this ride the train? If we ship it with 75 there were three releases to find regressions. I'm asking because in combination with GS 3.36 this can significantly help to reduce background activity (GS 3.36 does not send frame callbacks any more if the surface is hidden behind other opaque surfaces).
And here this has worked pretty well so far.

Regressions: 1629350
Blocks: 1629140
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: