Closed Bug 1829487 Opened 2 years ago Closed 1 years ago

Hardware accelerated UI rendering broken (Sony Vaio VPCCA2S0E gen6 gt2)

Categories

(Core :: Graphics: WebRender, defect)

x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox112 --- unaffected
firefox113 --- unaffected
firefox114 + fixed

People

(Reporter: steven+mozilla, Assigned: gw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: correctness, regression)

Attachments

(7 files)

Following a recent Nightly update, the whole UI just renders unreadably as soon as I start Firefox (screenshot attached). If I turn off hardware acceleration it renders just fine (screenshot attached).

The problem reproduces using mozregression-gui which uses a fresh profile with no add-ons.

So far I've tried only 32-bit Firefox.

Reproduction steps

  1. Open up Firefox.

Expected outcome

The GUI should render correctly (screenshot attached).

Actual outcome

The GUI renders unreadably (screenshot attached).

Workaround

Disable hardware acceleration (Preferences->Performance->Use recommended performance settings: off->Use hardware acceleration when available: off)

Bisection results

Last working build: dccf044c
First broken build: 1c58e874

Change that caused the issue

Bug 1823578

https://phabricator.services.mozilla.com/D173095

GPU information

This is extracted from the full troubleshooting information which I've attached.

Active: Yes
Description: Intel(R) HD Graphics 3000
Vendor ID: 0x8086
Device ID: 0x0116
Driver Version: 9.17.10.4459
Driver Date: 5-19-2016
Drivers: igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Subsys ID: 00000000
RAM: 0

Other potentially relevant information

I can't get any useful logs (MOZ_LOG=all:5 gives no output, I can't work out a suitable set of modules for logging).

Some time ago I had a problem with WebRender on this machine. I got partway through tracking it down but never reported it. I worked round it by setting gfx.webrender.force-disabled to true and I've never set it back to false. If you think information from there might be relevant then I'll dig up my notes.

Attachment #9329812 - Attachment description: Buggy rendering screenshot (fresh profile, → Buggy rendering screenshot (fresh profile)

This probably should be moved to product: Core, component: Graphics: WebRender, but I don't think I have permission to create bugs there.

Another workaround: instead of disabling hardware acceleration in about:preferences, the problem can be avoided by setting gfx.webrender.software to true.

I see that gfx.webrender.force-disabled has been renamed and possibly removed. So, it's possible that this is a repeat of the investigation I started some time ago (serves me right for not filing a bug report at the time).

In the troubleshooting information, under Graphics/Features, with no changes in settings (fresh profile) it says:

Compositing WebRender

With hardware acceleration completely disabled it says:

Compositing WebRender (Software)

With hardware acceleration enabled but gfx.webrender.software set to true, it says:

Compositing WebRender (Software D3D11)

The first one has broken rendering, the other two are fine.

All of this means the right answer is either to disable WebRender on gen6 gt2 (undoing Bug 1638905) or to diagnose and fix the problem on this hardware.

It looks like it's different from the problem I was investigating some time ago.

I found my notes and tried my reproduction of the old issue. On build dccf044c, the website in question rendered just fine on a fresh profile without altering any settings. On 1c58e874, the rendering is damaged but it's not the catastrophe I saw when I was investigating some time ago.

So, it looks like in general, webrender for gen6 gt2 is OK, and this could be a new, specific bug.

I started poking random gfx.webrender settings. These are my results so far:

gfx.webrender.software=true fixes the problem (as previously reported).
gfx.webrender.debug.disable-batching=true has no effect.
gfx.webrender.compositor=false has no effect.
gfx.webrender.max-partial-present-rects=0 has no effect.
gfx.webrender.debug.gpu-cache=true has no effect.
gfx.webrender.batched-texture-uploads=false has no effect.
gfx.webrender.blob-images=false has no effect.
gfx.webrender.dcomp-use-virtual-surfaces=false has no effect.
gfx.webrender.max-filter-ops-per-chain=1 has no effect.
gfx.webrender.multithreading=false has no effect.
gfx.webrender.use-optimized-shaders=false fixes the problem.

So, something in the shaders? I'm out of my depth here.

I should note that in my previous comment, I tried each of those settings one at a time and then reverted them.
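
The one-at-a-time procedure described above can be sketched in Python. The sketch generates the text of a `user.js` file (the per-profile file Firefox reads at startup, using `user_pref("name", value);` lines) with exactly one pref changed per test profile. The comments do not say how the prefs were actually set, so this is only one possible way of doing it; `CANDIDATES` is a subset of the prefs listed above, and `format_pref` and `single_pref_profiles` are hypothetical helper names.

```python
from typing import Dict, Iterator, Tuple, Union

PrefValue = Union[bool, int, str]

# A subset of the prefs and values from the list above.
CANDIDATES: Dict[str, PrefValue] = {
    "gfx.webrender.software": True,
    "gfx.webrender.debug.disable-batching": True,
    "gfx.webrender.compositor": False,
    "gfx.webrender.max-partial-present-rects": 0,
    "gfx.webrender.use-optimized-shaders": False,
}


def format_pref(name: str, value: PrefValue) -> str:
    """Format one user.js line: bools become true/false, strings are quoted."""
    if isinstance(value, bool):  # check bool first, bool is a subclass of int
        literal = "true" if value else "false"
    elif isinstance(value, int):
        literal = str(value)
    else:
        literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f'user_pref("{name}", {literal});'


def single_pref_profiles(
    candidates: Dict[str, PrefValue]
) -> Iterator[Tuple[str, str]]:
    """Yield (pref name, user.js text) with exactly one pref changed each time."""
    for name, value in candidates.items():
        yield name, format_pref(name, value) + "\n"


for pref_name, user_js in single_pref_profiles(CANDIDATES):
    print(f"--- profile testing {pref_name} ---")
    print(user_js, end="")
```

Changing a single pref per fresh profile keeps each result attributable to that pref alone, which is what makes the "fixes the problem" entries for gfx.webrender.software and gfx.webrender.use-optimized-shaders meaningful.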

Component: General → Graphics: WebRender
Keywords: regression
Product: Firefox → Core
Regressed by: 1823578

:gw, since you are the author of the regressor, bug 1823578, could you take a look? Also, could you set the severity field?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gwatson)

Using software rendering is certainly a valid way to mitigate this, but I am a little curious what we did that broke on these drivers. The strange images on the left are clearly an offscreen render target that should not be making it to the picture cache tiles shown in the UI.

It's interesting that gfx.webrender.use-optimized-shaders=false fixes the problem, because the most likely explanation for this rendering artifact would be that one or more of the shaders is not compiling successfully in the driver. This is made somewhat more obscure by the fact that the shader is translated from OpenGL to Direct3D11 (via ANGLE), so the driver may actually be having a problem with the Direct3D11 shader we're sending.

Blocks: gfx-triage
Severity: -- → S2

We may need to back this patch out because this is a common GPU and SWGL fallback may not be ideal. We'll be figuring this out on Monday.

Attempting to fix the shader optimizer may be easier than backing out the patch, but it will be difficult to pin down the problem in there.

We may want to make a downloadable blocklist implementation for disabling the shader optimizer as well.

Flags: needinfo?(jmuizelaar)

I'll compare the shader differences with previous bugs we saw on Intel gen6 (https://github.com/jrmuizel/gen6-miscompilation linked from https://github.com/servo/webrender/wiki/Driver-issues).

Flags: needinfo?(ahale)

I'm a little confused by this bug, actually, as the regressor (bug 1823578) was backed out already?

Backed out but relanded a few days later.

(Steven Singer from comment #0)

First broken build: 1c58e874

And this is the re-landed commit.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: correctness

Backed out but relanded a few days later.

This matches what I saw. I saw it break one day, but by the time I got round to looking, I updated Nightly and it was fixed. Then it broke again a few days later and stayed broken. I tried to target the mozregression search to the second breakage (because, for all I knew at the time, these were two separate bugs and I didn't want to report something that had already been fixed).

Depends on: 1830691

I can reproduce this locally on a Gen6 0x126 with 9.17.10.3347

Flags: needinfo?(jmuizelaar)

The bug is marked as tracked for firefox114 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)

Seems like this could be the same underlying cause as bug 1708937

Assignee: nobody → ahale
Flags: needinfo?(bhood)

This should be temporarily resolved when bug #1830691 lands. We'll need to work out the underlying cause of this before we can re-enable the new clip-mask rendering paths.

Flags: needinfo?(gwatson)
Assignee: ahale → gwatson

Steven, are you able to confirm if the most recent nightly now works correctly on your hardware (with config settings reverted to their previous values)?

Flags: needinfo?(ahale) → needinfo?(steven+mozilla)

No. It's not fully fixed.

It's better: more of the UI is working (notably, many of the buttons), but there are still problems in the UI and the web page.

I've made sure I'm up to date on Nightly 2023-05-04 (20230504215417).

Disabling optimised shaders fixes it. I'll attach a couple of screenshots.

Flags: needinfo?(steven+mozilla)

To my untrained eye, it looks like the problem is now restricted to backgrounds.

Definitely backgrounds, but also composition of text onto backgrounds.

All the images, text and so on are in the right places but backgrounds and text are the wrong colour. Sometimes it's the background that's the wrong colour. The problem with the text is that it's invisible (same colour as the background) regardless of whether the background was the correct colour.

I've attached a video which should make things clearer.

I double checked. On build dccf044c, just before bug 1823578 landed, everything was OK. On 1c58e874, just after, everything was broken. On build 45d725a4, just before bug 1830691 landed, everything was still broken. On build a62a959b, just after, things are in the right place but the background/text problem exists.

As before, turning off shader optimisation fixes the problem.

There must be a bug being exposed in the shader optimizer / driver by the changes to the base shader in that patch, I suspect. I'll create a build today that reverts more of that. I'll post a link to one or more test builds here once that's done, if it would be feasible for you to test them for me.

I may be able to test a build depending on when you send the link. I'm on UK time.

I normally run 32-bit Firefox, but the problem shows on both 64-bit and 32-bit, so make whatever's easier.

This try run will create both 32 and 64 bit builds [1].

For 32-bit Windows, the build has completed and a zip artifact can be downloaded from [2]. I believe if you unzip that to a local directory and run it, you should be able to test it without any installer etc. It should use your existing nightly profile, I think.

The 64-bit build hasn't quite completed yet.

[1] https://treeherder.mozilla.org/jobs?repo=try&revision=977d29f58a3d25c83a06562bfe17c98271f36c15&selectedTaskRun=Nql4Iwc8RQ2mOwEF1K5I9Q.0

[2] https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Nql4Iwc8RQ2mOwEF1K5I9Q/runs/0/artifacts/public/build/target.zip

Both zip files (32-bit and 64-bit) still show the problem (I've made a blank profile for testing, so I started them with -P "Test").

If you want to check I was running the right version, the troubleshooting information for both reports version 20230507194423.

That does look like the correct build id, thanks for checking. Do you happen to know if the driver on your machine from 2016 is the most recent driver available? It seems very old, but then maybe that's the last supported driver for that GPU?

I wonder if the best option might be to block hw-rendering on hd3000 drivers from 2016. How does the browser performance feel on your machine in general if you have gfx.webrender.software enabled?

Would it be better to only block optimized shaders on hd3000? It looks like there's already infrastructure for doing this
https://searchfox.org/mozilla-central/rev/4e6970cd336f1b642c0be6c9b697b4db5f7b6aeb/widget/GfxInfoBase.cpp#227

I'm a bit worried that will just mask the driver bug until the next issue we hit like this. But it's probably worth doing in this case, and if we run into it again we might block hw-wr completely.

I'm pretty sure this is the latest driver. This is a really old processor (2nd generation Intel Core) and is not being updated (it is end-of-life).

The Intel web site (https://www.intel.com/content/www/us/en/download/17608/intel-graphics-driver-for-windows-15-28.html) lists 9.17.10.4229 from 6/5/2015 as the latest. I'm running something similar (9.17.10.4459, maybe a manufacturer variant). Comment 10 on bug 1678903 gives a similar date.
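
To make the comparison of driver versions concrete, dotted version strings can be parsed into integer tuples and compared numerically, as in the Python sketch below. The three version strings are the ones quoted in this bug; `parse_version` and the constant names are hypothetical. Comparing the raw strings instead would be unreliable, because string comparison is character by character rather than numeric.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '9.17.10.4459' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


INTEL_LISTED = parse_version("9.17.10.4229")  # latest listed on Intel's site
REPORTER = parse_version("9.17.10.4459")      # from the GPU information above
LOCAL_REPRO = parse_version("9.17.10.3347")   # driver used for the local reproduction

# Tuples compare element by element, so the final build number decides here.
print("Reporter's driver is newer than Intel's listing:", REPORTER > INTEL_LISTED)
print("Oldest to newest:", sorted([INTEL_LISTED, REPORTER, LOCAL_REPRO]))
```

All three versions sit in the same 9.17.10 series, which is consistent with the reporter's driver being a manufacturer variant rather than a different driver generation.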

I'll have to find a site to play with to see what the performance is like. I've been running just with optimised shaders turned off since I found that worked.

Jeff, Andrew, is it easy to block optimized shaders on Windows for this device (old gen6)? What would be the right approach to do that? I think that might be the best workaround for this for now, since merge day is tomorrow.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)

I hunted around and found a benchmark at https://browserbench.org/MotionMark1.2/ that shows the differences.

I tried latest Nightly (20230507095340) with three settings: default ("h/w"), gfx.webrender.use-optimized-shaders: false ("unopt"), gfx.webrender.software: true ("s/w"). I also tried the builds just before and after the first breakage (default settings only) and Chrome (113.0.5672.64) and Edge (113.0.1774.35).

I should note that the first run with Firefox after starting it up gave bad results (like 1.00±300.00%) for the first test (Multiply) which distorted the results so I discarded that run and took the values from the second run. Still, this was better than Edge which just failed to run at all the first time but ran OK after refreshing the page.

Configuration         20230507 h/w   20230507 unopt     20230507 s/w     dccf044c h/w   Build 1c58e874             Edge           Chrome
Renders correctly            no[1]              yes            no[2]              yes            no[3]              yes              yes
Overall score       135.67 ± 8.59%   140.15 ± 7.74%    99.84 ± 8.25%   133.46 ± 9.87%   144.30 ± 8.09%   150.29 ± 4.99%   137.85 ± 7.07%
Multiply             30.73 ±29.34%    41.31 ±20.87%    60.88 ±22.66%    35.62 ±35.50%    68.37 ±18.35%   230.61 ± 4.54%   266.18 ± 5.71%
Canvas Arcs         380.07 ± 5.12%   395.85 ± 4.04%   204.77 ± 5.28%   376.33 ± 5.48%   391.85 ± 4.31%   160.43 ± 2.14%   112.28 ±10.41%
Leaves              149.59 ± 7.84%   148.62 ± 6.32%    93.14 ± 9.72%   147.51 ±15.16%   117.65 ±13.03%    92.00 ± 4.35%    53.55 ± 7.44%
Paths              1020.45 ± 4.74%  1062.25 ± 3.68%   698.75 ± 4.46%  1072.05 ± 5.60%  1019.13 ± 6.67%   433.37 ± 2.78%   376.93 ± 4.28%
Canvas Lines        777.33 ± 8.38%   729.07 ± 7.35%   500.22 ± 3.25%   811.53 ± 4.21%   744.90 ± 4.38%  2485.60 ± 5.58%  2474.41 ± 3.47%
Images                9.09 ± 3.74%     8.89 ± 4.40%    30.12 ± 5.94%     8.31 ± 4.70%     8.63 ± 3.18%    53.06 ± 7.30%    41.30 ± 6.24%
Design               47.53 ± 9.01%    48.21 ± 9.04%     8.73 ± 6.83%    39.04 ± 8.24%    45.86 ±10.85%    25.65 ±12.59%    36.00 ±22.22%
Suits[4]            191.60 ± 3.56%   184.38 ± 5.18%    92.55 ±10.40%   180.37 ± 4.45%   198.56 ± 4.50%    52.18 ± 6.19%    58.75 ± 3.77%

The difference in benchmarks between full hardware acceleration and skipping shader optimisation is within the run-to-run variation.

[1] One of the shapes in Multiply flickers (and there are issues with text before and after the test).

[2] Many of the shapes in Multiply have stray pixels in a near-line outside the shape at some angles (maybe worth investigating separately).

[3] Many shapes in Multiply flicker and, as per the initial bug report, the UI is badly corrupted.

[4] In Firefox, there's a noticeable stutter at the start of each burst whereas in Edge it's smooth.

So the headline figure doesn't change that much between hardware and software rendering (about 140 drops to about 100), but this is because some scores drop while others, notably Images, improve. That may be worth investigating separately, since hardware rendering shouldn't make anything worse.
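
The "within the run-to-run variation" observation can be checked roughly with the Python sketch below. It treats each "score ±percent" pair as a symmetric interval around the score and tests whether two intervals overlap. That reading of the ± figure is an assumption, interval overlap is only a crude check rather than a proper statistical test, and `interval` and `overlaps` are hypothetical helper names. Only four rows from the 20230507 columns are included.

```python
from typing import Dict, Tuple

Score = Tuple[float, float]  # (score, plus-or-minus percent)

# Four rows copied from the 20230507 h/w, unopt and s/w columns of the table.
HW: Dict[str, Score] = {
    "Overall": (135.67, 8.59), "Multiply": (30.73, 29.34),
    "Images": (9.09, 3.74), "Design": (47.53, 9.01),
}
UNOPT: Dict[str, Score] = {
    "Overall": (140.15, 7.74), "Multiply": (41.31, 20.87),
    "Images": (8.89, 4.40), "Design": (48.21, 9.04),
}
SW: Dict[str, Score] = {
    "Overall": (99.84, 8.25), "Multiply": (60.88, 22.66),
    "Images": (30.12, 5.94), "Design": (8.73, 6.83),
}


def interval(score: Score) -> Tuple[float, float]:
    """Assumed reading: the percentage is a symmetric band around the score."""
    mean, pct = score
    half_width = mean * pct / 100.0
    return mean - half_width, mean + half_width


def overlaps(a: Score, b: Score) -> bool:
    """True when the two assumed intervals share at least one value."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


for test in HW:
    print(f"{test:8s} h/w vs unopt overlap: {overlaps(HW[test], UNOPT[test])}, "
          f"h/w vs s/w overlap: {overlaps(HW[test], SW[test])}")
```

On these four rows the h/w and unopt intervals overlap in every case, while the h/w and s/w intervals do not overlap in any, with Multiply and Images higher under software rendering and Overall and Design lower.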

Yes. The new builds (20230507234502) that block shader optimisation appear to work (I checked both 32-bit and 64-bit).

Thanks for the detailed benchmarks and testing. We'll land that patch as an interim fix, and look for a better long-term fix.

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/da0d796d2b7b Block shader optimization feature on Windows + SandyBridge r=gfx-reviewers,jrmuizel
Status: NEW → RESOLVED
Closed: 1 years ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch

Just to close the loop, my main copy of Nightly just upgraded to 115.0a1 (20230509093033) and everything's rendering correctly (on both my normal profile and the one I'm using for testing).

The graphics section of the troubleshooting info says WEBRENDER_OPTIMIZED_SHADERS, env, blocklisted, Blocklisted by gfxInfo, Blocklisted due to known issues: bug 1829487
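
The kind of rule behind that status line can be modelled as in the Python sketch below. This is a simplified illustration, not Gecko's GfxInfo code or its API: `GpuInfo`, `BlockRule` and `feature_status` are hypothetical names, and the device list contains only the two Gen6 device IDs mentioned in this bug (0x0116 and 0x0126), not a full Sandy Bridge list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class GpuInfo:
    os: str
    vendor_id: int
    device_id: int


@dataclass(frozen=True)
class BlockRule:
    feature: str
    os: str
    vendor_id: int
    device_ids: FrozenSet[int]
    reason: str

    def matches(self, gpu: GpuInfo) -> bool:
        """A rule applies when OS, vendor and device all match."""
        return (
            gpu.os == self.os
            and gpu.vendor_id == self.vendor_id
            and gpu.device_id in self.device_ids
        )


# Hypothetical rule list. Only the device IDs quoted in this bug are included.
RULES = [
    BlockRule(
        feature="WEBRENDER_OPTIMIZED_SHADERS",
        os="windows",
        vendor_id=0x8086,  # Intel
        device_ids=frozenset({0x0116, 0x0126}),
        reason="bug 1829487",
    )
]


def feature_status(feature: str, gpu: GpuInfo,
                   rules: Sequence[BlockRule] = RULES) -> str:
    """Return 'blocklisted (<reason>)' if any rule matches, else 'available'."""
    for rule in rules:
        if rule.feature == feature and rule.matches(gpu):
            return f"blocklisted ({rule.reason})"
    return "available"


reporter_gpu = GpuInfo(os="windows", vendor_id=0x8086, device_id=0x0116)
print(feature_status("WEBRENDER_OPTIMIZED_SHADERS", reporter_gpu))
```

Blocking only the optimized-shaders feature, rather than hardware WebRender as a whole, matches the trade-off discussed earlier in the thread: the affected GPUs keep hardware rendering and lose only the shader optimization step.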

Excellent, thanks for confirming.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)
No longer blocks: gfx-triage