Closed Bug 1667423 Opened 4 years ago Closed 4 years ago

mingw-clang builds do not render properly and show blank windows

Categories

(Core :: Graphics: WebRender, defect, P2)

Desktop
Windows
defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- wontfix
firefox82 --- wontfix
firefox83 --- wontfix
firefox84 --- fixed

People

(Reporter: bryce, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

STR:

Actual result:
The profile browser (or Firefox window if I specify a profile on the command line) render as blank/white. Rendering the build unusable.

Expected result:
Rendering functions as in non-mingw-clang builds.

Any estimate of when was the last working build? (Note that they haven't been produced every day lately, the builds are tier2 and have seen a lot of bustage that didn't get backed out)

(In reply to :dmajor from comment #1)

Any estimate of when was the last working build? (Note that they haven't been produced every day lately, the builds are tier2 and have seen a lot of bustage that didn't get backed out)

I'm afraid not, this is the first one I can recall running. Is there a way I could bisect? I'm largely unfamiliar with these builds (ended up here trying to debug another bug).

Maybe there's a nicer way, but as a quick and dirty approach, you could find a build on m-c's treeherder and go to "Similar Jobs" and get the past few weeks of history there.

NI me to bisect when I have moment.

Flags: needinfo?(bvandyk)

I tried a bit today going back on https://treeherder.mozilla.org/#/jobs?repo=mozilla-central to find the last working mingw-clang build. It turns out it is the one from 7/29 (commit 3b87c49182a4f435447a4f9147c1db7d496ecb11) while the first available after that on 4/8 (commit 02be42e5105ba49e93890fdbd9e24c4f55a9251c) is broken.

It's a bit unfortunate that the mingw-clang builds in between those two dates are either busted or did not happen at all. Or maybe my treeherder knowledge is just too limited to find them.

Oh, I overlooked that there is one more build that succeeded on 7/29 (commit 4c1c82402c09bd9b6a6bf57f96edc719ece2b2f3) and that runs properly on my Windows box, too. So the pushlog in question is: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4c1c82402c09bd9b6a6bf57f96edc719ece2b2f3&tochange=02be42e5105ba49e93890fdbd9e24c4f55a9251c. Still a lot of commits. :(

It's really frustrating that we let these builds stay red for days at a time.

If we need to narrow it down further, I guess we'll have to build locally, or push try builds against intermediate revisions in the regression range, with the bustage fixes "backported" so to speak.

(In reply to :dmajor from comment #7)

It's really frustrating that we let these builds stay red for days at a time.

If we need to narrow it down further, I guess we'll have to build locally, or push try builds against intermediate revisions in the regression range, with the bustage fixes "backported" so to speak.

Yeah, that was my plan. I can try spending some time tomorrow on that (doing the try builds version).

I'm going to hold off on bug 1668304 while this investigation is active, to avoid having too many moving pieces at once.

Blocks: 1668304

Alright, this took me longer than expected. The build bustage encountered is pretty well described in https://bugzilla.mozilla.org/show_bug.cgi?id=1656725#c0. Taking that into account while bisecting gave me bug 1653166 as the culprit. In particular, the first bad revision is: https://hg.mozilla.org/mozilla-central/rev/2ce5cfe5cbe9c4331b547293db3b1e188d006de9.

Regressed by: 1653166
Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1653166

Flags: needinfo?(bvandyk)

(In reply to Georg Koppen from comment #10)

Alright, this took me longer than expected. The build bustage encountered is pretty well described in https://bugzilla.mozilla.org/show_bug.cgi?id=1656725#c0. Taking that into account while bisecting gave me bug 1653166 as the culprit. In particular, the first bad revision is: https://hg.mozilla.org/mozilla-central/rev/2ce5cfe5cbe9c4331b547293db3b1e188d006de9.

Thanks for tracking that down!

Matt, would you be able to take a look at this?

Flags: needinfo?(matt.woodrow)

Any idea how I would go about debugging this? Do I need to setup a MinGW build environment?

Setting gfx.webrender.dcomp-win.enabled=false fixes the issue here, so it's definitely this change.

The changes should only have affected videos/<canvas>, but it does slightly change how we pass coords through to direct composition (using IDCompositionVisual::SetTransform instead of SetOffsetX/Y).

I went through the WinGW headers for IDCompositionVisual, D2D1_RECT_F and D2D1_MATRIX_3X2_F but nothing really stands out as possibly incorrect.

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #13)

Any idea how I would go about debugging this? Do I need to setup a MinGW build environment?

I am actually not sure but in case it helps here is the try build I made for bisecting:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a822d6a135f0cd7ac75eea1ca9a9eb597a01be7a

it has binaries with .pdb files that might help.

Setting gfx.webrender.dcomp-win.enabled=false fixes the issue here, so it's definitely this change.

The changes should only have affected videos/<canvas>, but it does slightly change how we pass coords through to direct composition (using IDCompositionVisual::SetTransform instead of SetOffsetX/Y).

I went through the WinGW headers for IDCompositionVisual, D2D1_RECT_F and D2D1_MATRIX_3X2_F but nothing really stands out as possibly incorrect.

I am not sure which commit you looked at but maybe the headers are fixed on trunk now while we are still on commit 1b373beec6d07478ffba33726bb3bb21f32e4411? Might be worth double-checking...

Moving based on regressing bug 1653166 mentioned in Comment 10.

Component: General → Graphics: WebRender
Product: Firefox → Core

I did an "Add New Job" for a regular Win64 build to:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a822d6a135f0cd7ac75eea1ca9a9eb597a01be7a

in case it might be helpful to watch the values going into dcomp in side-by-side builds.

Matt and I dug into the disassembly of the call to visual->SetTransform().

The IDCompositionVisual interface has two overloads of this function, but the MinGW build calls the wrong variant, because the vtable order for overloaded functions is different depending on the ABI that you compile for: https://godbolt.org/z/cG6WTG (look for the call qword ptr [rax + ...]). The Microsoft ABI adds overloaded functions to the vtable in reverse order of declaration.

tjr, do you know if there a reason why the target triple for MinGW builds is x86_64-w64-mingw32 rather than x86_64-w64-win32?

Flags: needinfo?(tom)

Maybe Jacek has some idea for comment #17, too.

Wow, thanks for digging into this! I don't know. I've sent in a try build changing the target....
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba8a58af6215de49c62b4dcc00b332cd7dcd6a1f

Flags: needinfo?(tom)
Severity: -- → S3
Priority: -- → P2

Jacek let me know in #build that changing the target will not work, but supplied a patch he thinks will. I sent in https://treeherder.mozilla.org/#/jobs?repo=try&revision=882ce7c614b30cba01771243cfa97b15f29fe602

Just in case that doesn't work, there's also an overload for SetClip further down in the header.

What are next steps? Is it to change the interface definition ala Jacek's patch?

(In reply to :dmajor from comment #22)

What are next steps? Is it to change the interface definition ala Jacek's patch?

Does the build work? :) I haven't been able to test it as my Windows machine is still packed in a box.

Works for me! (Though I'd still recommend adding SetClip and possibly scanning for other interfaces with overloads as well)

Okay, I will work on getting a patch upstreamed; and we will need to finish up Bug 1655482 and then we can bump.

Attached file overloads in dcomp.h (deleted) —

Comment 26 assumes that overloads are listed consecutively so that uniq can flag them.

As far as I can tell only dcomp.h has this problem. My search command found a few hits in other files but they were false positives.

Thanks for great findings. It should be fixed in mingw-w64 git:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/af56acd2c36af609e701fa4880f25c5f37abe502/

It's unfortunate that ABIs don't match in this case. COM interfaces are generally meant to be portable, that's why function overloading is prohibited in IDL files. Sadly, it looks like Microsoft started introducing them in hand-written interfaces at some point (there is similar problem with dwrite_3.h).

BTW, the difference between *-win32 (or *-windows) and *-mingw32 (or *-windows-gnu) triplets is if it uses mingw or msvc ABI. Currently it's not possible to use msvc ABI with headers/libs provided by mingw (Windows SDK is required). In theory, it could be possible to adjust mingw-w64 to something like that, but it's a lot of work. I experimented a bit with clang msvc mode on top of Wine and we allow something like that in Wine, but it's not possible to use that for purposes we need here (at least not in current form, maybe at some point).

The mingw fix was included in the bump for bug 1655482. I just tried the latest m-c build and it works!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: