Closed Bug 1627505 Opened 5 years ago Closed 5 years ago

compositor window blocks direct manipulation from working

Categories

(Core :: Graphics, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: tnikkel, Assigned: tnikkel)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

I found that disabling the compositor window allows my direct manipulation code to work. If I add WS_EX_LAYERED here

https://searchfox.org/mozilla-central/rev/3f9e822318e8ec18ce673a9cb983d3608a3e2ed2/widget/windows/WinCompositorWindowThread.cpp#162

then my dmanip code works. We used to have WS_EX_LAYERED there but we removed it in bug 1544074 due to a blank window on startup reported in at least on configuration.

Let's rewind back a bit to why we had WS_EX_LAYERED there in the first place. In https://bugs.chromium.org/p/chromium/issues/detail?id=871257 Chrome had problems with laggy scrolling (late 2018). From comment 55 in that bug https://bugs.chromium.org/p/chromium/issues/detail?id=871257#c55 a contact from Microsoft provided this explanation:

"the speed hit-test is finding a window in one chrome process [A] (class “Intermediate D3D Window”), but the secondary hit-test once inside that process is finding a different one in a different chrome process ... We fixed one bug on the OS side that was causing issues here, but it seems like there’s another. ... So my recommendation is to figure out why this window is getting hit-tested, and stop it."

And from comment 63 https://bugs.chromium.org/p/chromium/issues/detail?id=871257#c63 the same Microsoft contact:

"Unfortunately, that means that neither of these is a Chrome issue."

Comment 77 https://bugs.chromium.org/p/chromium/issues/detail?id=871257#c77 from the Microsoft contact:

"Ross suggest us using 'WS_EX_LAYERED | WS_EX_TRANSPARENT | WS_EX_NOREDIRECTIONBITMAP' to fix this issue.

The first two should make the window transparent for input, and the third avoids allocating a bitmap that would otherwise be allocated with WS_EX_LAYERED (and is only necessary if using Gdi objects with the window)."

Comment 79 https://bugs.chromium.org/p/chromium/issues/detail?id=871257#c79 explains this further (quoting from msdn):

'"Hit Testing
Hit testing of a layered window is based on the shape and transparency of the window. This means that the areas of the window that are color-keyed or whose alpha value is zero will let the mouse messages through.

If the layered window has the WS_EX_TRANSPARENT extended window style, the shape of the layered window will be ignored and the mouse events will be passed to the other windows underneath the layered window."'

So Chrome lands that workaround and then they here back from Microsoft

"So, in conclusion, we have a fix for 19H1 but in the existing RS4 and RS5 OS, we recommend the workaround from Chrome’s end."

No fast forward to Feb 2019 and in bug 1525183 we are seeing laggy scrolling (I think because we started using Direct Composition which was also the reason in Chrome) and so we import the same 3 flags to our compositor window and it seems to fix the bugs we were seeing.

In April 2019 in bug 1544074 a blank window on startup was reported on some hardware and on a hunch Sotaro suspects WS_EX_LAYERED | WS_EX_TRANSPARENT and removing those fixes the problem. The 19H1 release of Windows 10 wasn't until May, so either the flags didn't fix any problems when they were added or Microsofts fix was backported to earlier versions of Windows 10 by then because there are no linked regressions from removing those flags.

(Chrome also does one thing we don't: put the compositor window on the bottom using HWND_BOTTOM. I tried doing that when we SetParent the window and it doesn't fix damnip.)

So my guess is that "speed hit-testing" that Windows does (mentioned above) is used when determining if a dmanip session should be started, and the "secondary hit-test" is skipped either intentionally or by a bug in Windows. So we'll probably need to use this workaround even if we report it to MS and MS agrees that it is a bug (because it will take a while for that and older versions of Win 10 might not get that fix).

Our options:

  1. add back WS_EX_LAYERED and test on the hardware from bug 1544074 if that flag alone causes the problem. If it regresses bug 1544074 then blocklist the hardware from bug 1544074 from some feature so we don't hit this (the hardware is from 2011 so seems quite reasonable to do this). This assumes WS_EX_LAYERED is enough and we don't need WS_EX_TRANSPARENT to fully fix this.

  2. Same as 1) but add back both WS_EX_LAYERED and WS_EX_TRANSPARENT

  3. Some other solution?

Any other thoughts, anything I missed?

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(jmuizelaar)
Priority: -- → P3

:tnikkel, thank you for detailed explanation! All necessary information is there. It seems that option(1) is reasonable. If it has a problem, we could try (2). By the way, how can we test the hardware from bug 1544074?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Sotaro Ikeda [:sotaro] from comment #1)

By the way, how can we test the hardware from bug 1544074?

We could try to reproduce the original problem on similar hardware in our inventory, or ask the original reporter from that bug to test for us if they are willing, as a last resort we could buy the hardware if we deem it necessary and worthwhile.

Assignee: nobody → tnikkel
Status: NEW → ASSIGNED

Michal, we need to partially revert the fix from bug 1544074. Would you be willing to try a couple of builds to see if the problem from bug 1544074 returns or not?

Flags: needinfo?(kluka)

One more question Sotaro. From msdn:

"Hit testing of a layered window is based on the shape and transparency of the window. This means that the areas of the window that are color-keyed or whose alpha value is zero will let the mouse messages through."

Is the compositor window always going to satisfy that?

Flags: needinfo?(sotaro.ikeda.g)

I am actually not sure it. But the following says that the layered window will not become visible until the SetLayeredWindowAttributes or UpdateLayeredWindow function has been called for this window.

And they are not called for compositor windows. Then it seems like same as transparent.

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(jmuizelaar)
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/707b309fb85e Use WS_EX_LAYERED on the compositor window to prevent Direct Manipulation from finding it. r=sotaro
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77

(In reply to Timothy Nikkel (:tnikkel) from comment #4)

Michal, we need to partially revert the fix from bug 1544074. Would you be willing to try a couple of builds to see if the problem from bug 1544074 returns or not?

Yes I can try. Pls ping me when needed.

Flags: needinfo?(kluka)

Also, since that machine has nvidia, try to avoid the latest nvidia drivers, they have problems (bug 1625857). So verify in about:support that you are still getting webrender and your driver is not blocked so it's the same as in bug 1544074. Thanks!

Flags: needinfo?(kluka)

Just to add to comment 12 :

  1. after updating chipset drivers your builds will no longer crash
  2. current nightly 77.0a1 (20200408033650) will only show blank window
  3. when running on nvidia card the performance is poor. UI is lagging, scrolling is terrible...

(In reply to Michal Kluka from comment #12)

OK - https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CCgM3MFpRWeAuA-q3aXohg/runs/0/artifacts/public/build/target.zip
blank window - https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Hs-8z1W_S_eF8Jv6Ic3yPA/runs/0/artifacts/public/build/target.zip
OK - https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/JdDA2aBdTYW79ZuUu8w0sQ/runs/0/artifacts/public/build/target.zip
blank window - https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/f3cGM2agSRqeIZoi0cNVWQ/runs/0/artifacts/public/build/target.zip

Thanks! The two OK builds correspond to a unmodified build and a build with WS_EX_TRANSPARENT. The two blank builds corresponds to adding WS_ES_LAYERED and WS_ES_LAYERED | WS_EX_TRANSPARENT respectively.

(In reply to Michal Kluka from comment #13)

Just to add to comment 12 :

  1. after updating chipset drivers your builds will no longer crash

Good!

  1. current nightly 77.0a1 (20200408033650) will only show blank window

current nightly has WS_ES_LAYERED so that is expected.

  1. when running on nvidia card the performance is poor. UI is lagging, scrolling is terrible...

In which builds is the performance poor? All of them?

Flags: needinfo?(kluka)

Sotaro, the issue returns with WS_ES_LAYERED. So should we blocklist something on that hardware? I don't know much about the compositor window and blocklist stuff. Can you suggest what to do here?

Flags: needinfo?(sotaro.ikeda.g)

If WS_ES_LAYERED caused the problem, it seems better to block on the hardware. The following could be one idea for it.


It is a bit tricky to block compositor window usage. Current gecko does not have one gfxVars config just for controlling compositor window creation. But there are some gfxVars configs that enable compositor window if gpu process exists.

  • gfxVars::UseDoubleBufferingWithCompositor()
  • gfxVars::UseWebRenderDCompWin()
  • gfxVars::UseWebRenderFlipSequentialWin()

Then it seems that we could do the following.

WinCompositorWidget::EnsureCompositorWindow() creates compositor window.It is called in the following places.
https://searchfox.org/mozilla-central/rev/9120151ddb35f2d4c37bfe613a54a4f10a9a3dc5/gfx/layers/ipc/CompositorBridgeParent.cpp#1602
https://searchfox.org/mozilla-central/rev/9120151ddb35f2d4c37bfe613a54a4f10a9a3dc5/gfx/layers/ipc/CompositorBridgeParent.cpp#1602

Flags: needinfo?(sotaro.ikeda.g)
  1. when running on nvidia card the performance is poor. UI is lagging, scrolling is terrible...

In which builds is the performance poor? All of them?

All of them. But that piece of hardware in not officially supported with Win10 and windows update will bork drivers regularly.
I myself stopped using discrete NVIDIA graphics card because it was causing a lot of troubles (not just with firefox). Firefox works fine with integrated one.

Flags: needinfo?(kluka)

(In reply to Michal Kluka from comment #17)

  1. when running on nvidia card the performance is poor. UI is lagging, scrolling is terrible...

In which builds is the performance poor? All of them?

All of them. But that piece of hardware in not officially supported with Win10 and windows update will bork drivers regularly.
I myself stopped using discrete NVIDIA graphics card because it was causing a lot of troubles (not just with firefox). Firefox works fine with integrated one.

Oh, so maybe we don't even need to bother blocklisting this because it's already broken? So no one would realistically be using that combination?

Andrew, does that make sense? To bring you up to speed, the machine from https://bugzilla.mozilla.org/show_bug.cgi?id=1544074#c0 when running on the nvidia card will get a blank window after the changes from this bug. But that machine is quite old, should we bother doing the plan outlined in comment 16 or just let this be?

cut/paste of about:support from the machine in question
GPU #1
Active: Yes
Description: Intel(R) HD Graphics 3000
Vendor ID: 0x8086
Device ID: 0x0126
Driver Version: 9.17.10.4459
Driver Date: 5-19-2016
Drivers: igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Subsys ID: 21d117aa
RAM: Unknown
GPU #2
Active: No
Description: NVIDIA Quadro 2000M
Vendor ID: 0x10de
Device ID: 0x0dda
Driver Version: 21.21.13.7748
Driver Date: 6-8-2017

Flags: needinfo?(aosmond)
Regressions: 1632357

Do you also get a blank window with Chrome? They use a similar setup to us after adding the problematic flag.

Flags: needinfo?(kluka)

We added the pref apz.windows.force_disable_direct_manipulation, you can set it to true and that should avoid this problem for you.

Blocks: 1633343
Attached file chrome://gpu/ for itegrated intel GPU (deleted) —
Flags: needinfo?(kluka)
Attached file chrome://gpu/ for nvidia GPU (deleted) —

(In reply to Timothy Nikkel (:tnikkel) from comment #19)

Do you also get a blank window with Chrome? They use a similar setup to us after adding the problematic flag.

No I do not. It seems that GPU is blacklisted.
Below is from chrome://gpu/

Gpu compositing has been disabled, either via blacklist, about:flags or the command line. The browser will fall back to software compositing and hardware acceleration will be unavailable.
Disabled Features: gpu_compositing

I have attached chrome://gpu/ output for both integrated on NVIDIA graphics.
Seems that integrated is actually able to accelerate something while NVIDIA almost nothing.

Flags: needinfo?(aosmond)

(In reply to Timothy Nikkel (:tnikkel) from comment #24)

We have a possible fix from this try push

https://treeherder.mozilla.org/#/jobs?repo=try&revision=38304ada606c995c553dfb329eea452bd407e136

could you try this build

https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/OVzBeHgbRAS4mTSHP40Bzg/runs/0/artifacts/public/build/target.zip

Thanks!

It works. Actually performance is pretty good.

Flags: needinfo?(kluka)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: