mingw-clang builds do not render properly and show blank windows
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox81 | --- | wontfix |
firefox82 | --- | wontfix |
firefox83 | --- | wontfix |
firefox84 | --- | fixed |
People
(Reporter: bryce, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
(deleted),
text/plain
|
Details |
STR:
- Download a nightly mingw-clang build (https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.latest.firefox/win64-mingwclang-debug -- opt is broken in a similar way).
- Run the build
Actual result:
The profile browser (or Firefox window if I specify a profile on the command line) render as blank/white. Rendering the build unusable.
Expected result:
Rendering functions as in non-mingw-clang builds.
Any estimate of when was the last working build? (Note that they haven't been produced every day lately, the builds are tier2 and have seen a lot of bustage that didn't get backed out)
Reporter | ||
Comment 2•4 years ago
|
||
(In reply to :dmajor from comment #1)
Any estimate of when was the last working build? (Note that they haven't been produced every day lately, the builds are tier2 and have seen a lot of bustage that didn't get backed out)
I'm afraid not, this is the first one I can recall running. Is there a way I could bisect? I'm largely unfamiliar with these builds (ended up here trying to debug another bug).
Maybe there's a nicer way, but as a quick and dirty approach, you could find a build on m-c's treeherder and go to "Similar Jobs" and get the past few weeks of history there.
Comment 5•4 years ago
|
||
I tried a bit today going back on https://treeherder.mozilla.org/#/jobs?repo=mozilla-central to find the last working mingw-clang build. It turns out it is the one from 7/29 (commit 3b87c49182a4f435447a4f9147c1db7d496ecb11) while the first available after that on 4/8 (commit 02be42e5105ba49e93890fdbd9e24c4f55a9251c) is broken.
It's a bit unfortunate that the mingw-clang builds in between those two dates are either busted or did not happen at all. Or maybe my treeherder knowledge is just too limited to find them.
Comment 6•4 years ago
|
||
Oh, I overlooked that there is one more build that succeeded on 7/29 (commit 4c1c82402c09bd9b6a6bf57f96edc719ece2b2f3) and that runs properly on my Windows box, too. So the pushlog in question is: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4c1c82402c09bd9b6a6bf57f96edc719ece2b2f3&tochange=02be42e5105ba49e93890fdbd9e24c4f55a9251c. Still a lot of commits. :(
It's really frustrating that we let these builds stay red for days at a time.
If we need to narrow it down further, I guess we'll have to build locally, or push try builds against intermediate revisions in the regression range, with the bustage fixes "backported" so to speak.
Comment 8•4 years ago
|
||
(In reply to :dmajor from comment #7)
It's really frustrating that we let these builds stay red for days at a time.
If we need to narrow it down further, I guess we'll have to build locally, or push try builds against intermediate revisions in the regression range, with the bustage fixes "backported" so to speak.
Yeah, that was my plan. I can try spending some time tomorrow on that (doing the try builds version).
I'm going to hold off on bug 1668304 while this investigation is active, to avoid having too many moving pieces at once.
Comment 10•4 years ago
|
||
Alright, this took me longer than expected. The build bustage encountered is pretty well described in https://bugzilla.mozilla.org/show_bug.cgi?id=1656725#c0. Taking that into account while bisecting gave me bug 1653166 as the culprit. In particular, the first bad revision is: https://hg.mozilla.org/mozilla-central/rev/2ce5cfe5cbe9c4331b547293db3b1e188d006de9.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 11•4 years ago
|
||
Set release status flags based on info from the regressing bug 1653166
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Comment 12•4 years ago
|
||
(In reply to Georg Koppen from comment #10)
Alright, this took me longer than expected. The build bustage encountered is pretty well described in https://bugzilla.mozilla.org/show_bug.cgi?id=1656725#c0. Taking that into account while bisecting gave me bug 1653166 as the culprit. In particular, the first bad revision is: https://hg.mozilla.org/mozilla-central/rev/2ce5cfe5cbe9c4331b547293db3b1e188d006de9.
Thanks for tracking that down!
Matt, would you be able to take a look at this?
Comment 13•4 years ago
|
||
Any idea how I would go about debugging this? Do I need to setup a MinGW build environment?
Setting gfx.webrender.dcomp-win.enabled=false fixes the issue here, so it's definitely this change.
The changes should only have affected videos/<canvas>, but it does slightly change how we pass coords through to direct composition (using IDCompositionVisual::SetTransform instead of SetOffsetX/Y).
I went through the WinGW headers for IDCompositionVisual, D2D1_RECT_F and D2D1_MATRIX_3X2_F but nothing really stands out as possibly incorrect.
Comment 14•4 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #13)
Any idea how I would go about debugging this? Do I need to setup a MinGW build environment?
I am actually not sure but in case it helps here is the try build I made for bisecting:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a822d6a135f0cd7ac75eea1ca9a9eb597a01be7a
it has binaries with .pdb files that might help.
Setting gfx.webrender.dcomp-win.enabled=false fixes the issue here, so it's definitely this change.
The changes should only have affected videos/<canvas>, but it does slightly change how we pass coords through to direct composition (using IDCompositionVisual::SetTransform instead of SetOffsetX/Y).
I went through the WinGW headers for IDCompositionVisual, D2D1_RECT_F and D2D1_MATRIX_3X2_F but nothing really stands out as possibly incorrect.
I am not sure which commit you looked at but maybe the headers are fixed on trunk now while we are still on commit 1b373beec6d07478ffba33726bb3bb21f32e4411? Might be worth double-checking...
Comment 15•4 years ago
|
||
Moving based on regressing bug 1653166 mentioned in Comment 10.
Comment 16•4 years ago
|
||
I did an "Add New Job" for a regular Win64 build to:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a822d6a135f0cd7ac75eea1ca9a9eb597a01be7a
in case it might be helpful to watch the values going into dcomp in side-by-side builds.
Comment 17•4 years ago
|
||
Matt and I dug into the disassembly of the call to visual->SetTransform()
.
The IDCompositionVisual
interface has two overloads of this function, but the MinGW build calls the wrong variant, because the vtable order for overloaded functions is different depending on the ABI that you compile for: https://godbolt.org/z/cG6WTG (look for the call qword ptr [rax + ...]
). The Microsoft ABI adds overloaded functions to the vtable in reverse order of declaration.
tjr, do you know if there a reason why the target triple for MinGW builds is x86_64-w64-mingw32
rather than x86_64-w64-win32
?
Comment 18•4 years ago
|
||
Maybe Jacek has some idea for comment #17, too.
Comment 19•4 years ago
|
||
Wow, thanks for digging into this! I don't know. I've sent in a try build changing the target....
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba8a58af6215de49c62b4dcc00b332cd7dcd6a1f
Updated•4 years ago
|
Comment 20•4 years ago
|
||
Jacek let me know in #build that changing the target will not work, but supplied a patch he thinks will. I sent in https://treeherder.mozilla.org/#/jobs?repo=try&revision=882ce7c614b30cba01771243cfa97b15f29fe602
Comment 21•4 years ago
|
||
Just in case that doesn't work, there's also an overload for SetClip
further down in the header.
Updated•4 years ago
|
Comment 22•4 years ago
|
||
What are next steps? Is it to change the interface definition ala Jacek's patch?
Comment 23•4 years ago
|
||
(In reply to :dmajor from comment #22)
What are next steps? Is it to change the interface definition ala Jacek's patch?
Does the build work? :) I haven't been able to test it as my Windows machine is still packed in a box.
Comment 24•4 years ago
|
||
Works for me! (Though I'd still recommend adding SetClip
and possibly scanning for other interfaces with overloads as well)
Comment 25•4 years ago
|
||
Okay, I will work on getting a patch upstreamed; and we will need to finish up Bug 1655482 and then we can bump.
Comment 26•4 years ago
|
||
Comment 27•4 years ago
|
||
Comment 26 assumes that overloads are listed consecutively so that uniq
can flag them.
As far as I can tell only dcomp.h has this problem. My search command found a few hits in other files but they were false positives.
Comment 28•4 years ago
|
||
Thanks for great findings. It should be fixed in mingw-w64 git:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/af56acd2c36af609e701fa4880f25c5f37abe502/
It's unfortunate that ABIs don't match in this case. COM interfaces are generally meant to be portable, that's why function overloading is prohibited in IDL files. Sadly, it looks like Microsoft started introducing them in hand-written interfaces at some point (there is similar problem with dwrite_3.h).
BTW, the difference between *-win32 (or *-windows) and *-mingw32 (or *-windows-gnu) triplets is if it uses mingw or msvc ABI. Currently it's not possible to use msvc ABI with headers/libs provided by mingw (Windows SDK is required). In theory, it could be possible to adjust mingw-w64 to something like that, but it's a lot of work. I experimented a bit with clang msvc mode on top of Wine and we allow something like that in Wine, but it's not possible to use that for purposes we need here (at least not in current form, maybe at some point).
Comment 29•4 years ago
|
||
The mingw fix was included in the bump for bug 1655482. I just tried the latest m-c build and it works!
Updated•4 years ago
|
Description
•