Closed Bug 1600357 Opened 5 years ago Closed 5 years ago

Visual elements suddenly disappear with WebRender on Win10

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- fixed

People

(Reporter: tsmith, Unassigned)

References

Details

(Keywords: regression)

Attachments

(3 files)

Attached image broken.png (deleted)

Running Firefox Nightly on Windows 10, once a day or so (maybe every other day) visual elements will suddenly disappear or become discolored. This does not seem to be limited to either the browser chrome or the page content. This has been happening for at least a month, maybe two.

In the attached example:

  1. the URL bar seems to be missing text
  2. the tabs seem to be missing some or all of the text
  3. a folder in the bookmark toolbar is missing text and has a strange blue artifact
  4. the github favicon is yellow
  5. on the page the B in Bugzilla is partially missing
  6. the red circle around my ni? count is partially missing

When I move the mouse over the affected areas they pop back to normal. This appears to only be visual not functional.

STR: No clue, just seems to happen randomly.

Attached file graphics_info.txt (deleted)
Component: Web Painting → Graphics: WebRender
Summary: Visual elements suddenly disappear → Visual elements suddenly disappear with WebRender on Win10
Blocks: wr-72

tsmith - if this is still happening and it seems to be a regression, could you try using mozregression to pinpoint a regression window?

Flags: needinfo?(twsmith)

(In reply to Jessie [:jbonisteel] plz needinfo from comment #2)

tsmith - if this is still happening and it seems to be a regression, could you try using mozregression to pinpoint a regression window?

Yes this is still happening and has been for months. I don't know how to get a regression range because this seems to be a visual issue (see the attached example). Nor do I know how to trigger the issue, seems to be random.

Flags: needinfo?(twsmith)

tsmith can you go to about:config and set gfx.webrender.picture-caching, restart, and then let me know if it still keeps happening?

Flags: needinfo?(twsmith)

Sure. What would you like me to set it to? It is gfx.webrender.picture-caching=true at the moment. Set it to false instead I assume?

Flags: needinfo?(twsmith) → needinfo?(jbonisteel)

Yep set it to false.
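
For anyone who would rather script this than toggle it in about:config, here is a minimal sketch that appends the pref to a profile's user.js file. The helper names (render_user_pref, append_pref) and the profile path are hypothetical and only for illustration; the pref name and value are the ones requested above. A restart is still needed afterwards.

```python
import json
from pathlib import Path


def render_user_pref(name: str, value) -> str:
    """Format one pref as a user.js line.

    json.dumps produces JS-compatible literals (true/false, quoted strings).
    """
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def append_pref(profile_dir: Path, name: str, value) -> Path:
    """Append the pref line to user.js in profile_dir (caller supplies the path)."""
    user_js = profile_dir / "user.js"
    with user_js.open("a", encoding="utf-8") as fh:
        fh.write(render_user_pref(name, value) + "\n")
    return user_js
```

Calling append_pref(Path("<your profile folder>"), "gfx.webrender.picture-caching", False) would add the line to that profile; the profile folder itself is listed in about:support.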

Done. I will update in a week if I no longer see the issue.

Flags: needinfo?(twsmith)

Glenn - NI-ing just to get this on your radar

Flags: needinfo?(jbonisteel) → needinfo?(gwatson)

From the driver section in about:support:

Description: NVIDIA GeForce GTX 1080
Vendor ID: 0x10de
Device ID: 0x1b80
Driver Version: 26.21.14.4120
Driver Date: 11-6-2019
Drivers: C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumdx.dll C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumd.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumd.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumd.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvddi.inf_amd64_b9df9caf8256ebae\nvldumd.dll
Subsys ID: 33661028
RAM: 8192
GPU #2
Active: No
Description: Intel(R) HD Graphics 630
Vendor ID: 0x8086
Device ID: 0x5912
Driver Version: 21.20.16.4550
Driver Date: 11-11-2016

The NVIDIA driver looks recent, but the Intel driver looks very old. The NVIDIA GPU is active here, so it shouldn't be related - but might be worth trying to update the Intel GPU driver, just in case that is doing something weird.

I wonder if this might be related too, from the log file:

Failure Log
(#0) Error: WMF VPX video decoding is disabled due to a previous crash.
(#1): CP+[GFX1-]: WMF VPX video decoding is disabled due to a previous crash.
(#2) Error: WMF VPX video decoding is disabled due to a previous crash.
(#3): CP+[GFX1-]: WMF VPX video decoding is disabled due to a previous crash.

Sotaro, what happens when the WMF VPX decoder crashes? Does that take down the GPU process at all? Could the GPU process be crashing and not resetting properly, in a way that affects the WR texture cache or something like that?

Flags: needinfo?(gwatson) → needinfo?(sotaro.ikeda.g)

(In reply to Glenn Watson [:gw] from comment #9)

Failure Log
(#0) Error: WMF VPX video decoding is disabled due to a previous crash.
(#1): CP+[GFX1-]: WMF VPX video decoding is disabled due to a previous crash.
(#2) Error: WMF VPX video decoding is disabled due to a previous crash.
(#3): CP+[GFX1-]: WMF VPX video decoding is disabled due to a previous crash.


Sotaro, what happens when the WMF VPX decoder crashes? Does that take down the GPU process at all? Could the GPU process be crashing and not resetting properly, in a way that affects the WR texture cache or something like that?

The log does not seem to be related to this bug. The failure log seems to be related to Bug 1570046. The above logs say that WMF VPX video decoding was disabled because WMFVPXVideoCrashGuard detected that a crash happened during a previous Firefox session while instantiating the WMF VPX decoder. In this case, no crash happened this time.

When a crash does happen, about:support has the following log. The crash happens in the content process; the GPU process is not involved. Bug 1570046 comment 32 explains where the crash occurs.

(#0) Error: WMF VPX decoder just crashed; hardware video will be disabled.

Flags: needinfo?(sotaro.ikeda.g)

It has been a week with gfx.webrender.picture-caching=false and I have not seen the issue.

Flags: needinfo?(twsmith)

I imagine this will be tough to nail down without specific STR

Flags: needinfo?(gwatson)

Yep, I think our possible options to fix this are (a) finding a way that we can reproduce locally or (b) providing a special build that Tyson could run with a heap of extra logging information, to see if there are any reported OpenGL errors when this occurs.

(a) seems unlikely, since it's so random - but perhaps it only occurs with a specific configuration that we haven't tried here (e.g. the exact set of extensions on Tyson's machine)? Jessie, perhaps we could set up a machine in the Toronto office with the same GPU / drivers, and leave it running for a few days, occasionally browsing on it to see if we can reproduce?

(b) also seems unlikely, since these kinds of bugs are often driver issues that don't report a specific API error back to us - but might be worth a shot.

Flags: needinfo?(gwatson)

The priority flag is not set for this bug.
:jbonisteel, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jbonisteel)

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

I think our VR machine is the closest one that we have. Alexis, can you take a look at the about:support of this bug and let me know if there are any differences between the specs of the machine listed there and our VR machine?

Flags: needinfo?(jbonisteel) → needinfo?(a.beingessner)
Attached file vr-machine-about-support.txt (deleted)

Current setup for the VR machine. I need to do more checks on it, though.

Flags: needinfo?(a.beingessner)

Hello, when you reproduce the problem, could you try pressing Ctrl-Shift-3? This should generate a wr-capture folder in your AppData\Local Windows folder (for example C:\Users\you\AppData\Local\wr-capture). Then please zip and share the contents of that folder. Thanks!
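
If zipping by hand is inconvenient, a rough sketch like the following could package the capture folder. The function name zip_wr_capture and the output location are assumptions for illustration; only the wr-capture folder under AppData\Local comes from the instructions above.

```python
import shutil
from pathlib import Path


def zip_wr_capture(capture_dir: Path, out_base: Path) -> Path:
    """Zip the contents of capture_dir into '<out_base>.zip' and return its path.

    out_base should live outside capture_dir so the archive does not
    end up including itself.
    """
    if not capture_dir.is_dir():
        raise FileNotFoundError(f"no capture folder at {capture_dir}")
    # make_archive appends the '.zip' suffix and returns the archive file name.
    archive = shutil.make_archive(str(out_base), "zip", root_dir=str(capture_dir))
    return Path(archive)
```

For example, zip_wr_capture(Path.home() / "AppData" / "Local" / "wr-capture", Path.home() / "wr-capture") would leave wr-capture.zip in the home folder, assuming the capture was written to the default location.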

tsmith, can you enable picture caching again and try the suggestion above from bpeers?

Flags: needinfo?(twsmith)

Will do.

Flags: needinfo?(twsmith)

Flags: needinfo?(jbonisteel)
Flags: needinfo?(jbonisteel)
Priority: -- → P3
No longer blocks: wr-72

Another thing worth confirming - can you let us know what version of Win10 you are using?

Flags: needinfo?(twsmith)

I've seen the issue on both 1903 and 1909 after upgrading.

Flags: needinfo?(twsmith)

There is a new flag available in the next nightly build (and only available in nightly builds) called gfx.webrender.panic-on-gl-error that can be set in about:config. After changing this value, a restart is required before it takes effect.

When this flag is set, any time the GPU driver reports a GL error, we will detect this and panic (controlled crash) the entire GPU process. It shouldn't take the entire browser down, just the GPU process (I believe the GPU process is enabled on Windows and Linux, not sure about Mac).

If you see the bug occur while that is active, and then restart the browser, the logs from the GL error should be visible in about:support.

If you see the glitch occur while that option is active, we can infer a few things:

  • If there is no GPU process crash / output logs, then no GL error is being reported (likely signals a driver bug).
  • If there is a GPU process crash, the logs should give us a clue as to what is occurring (even if nothing is logged, it would still be a clue there is a GL error occurring).
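
The inference above can be written down as a small decision function. This is only a restatement of the reasoning in this comment, not anything in Firefox; the names Diagnosis and interpret are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Diagnosis(Enum):
    NOT_OBSERVED = "glitch not seen while the pref was active; nothing to infer"
    LIKELY_DRIVER_BUG = "no GL error reported; likely signals a driver bug"
    GL_ERROR_WITH_LOGS = "GL error reported; check the logs in about:support"
    GL_ERROR_NO_LOGS = "GPU process crashed without logs; still hints at a GL error"


def interpret(glitch_seen: bool, gpu_process_crashed: bool,
              log_lines: Sequence[str]) -> Diagnosis:
    """Map what was observed with panic-on-gl-error enabled to a conclusion."""
    if not glitch_seen:
        return Diagnosis.NOT_OBSERVED
    if not gpu_process_crashed:
        # The glitch happened but the driver never reported a GL error.
        return Diagnosis.LIKELY_DRIVER_BUG
    # A crash means a GL error was detected, with or without useful logs.
    return Diagnosis.GL_ERROR_WITH_LOGS if log_lines else Diagnosis.GL_ERROR_NO_LOGS
```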

Alright I will set that pref when the build is available. ATM I have all gfx.webrender prefs set to their defaults. I would have sworn the gfx.webrender.compositor default was true last week but it is false now. Is that how I should keep everything?

Flags: needinfo?(gwatson)

Sounds good, thanks!

And yes, the gfx.webrender.compositor flag did change - we've defaulted it back to off for now, since it's not quite ready for next release, well spotted!

Flags: needinfo?(gwatson)

Thought of another question to ask, tsmith - are you using multiple monitors?

Flags: needinfo?(twsmith)

(In reply to Jessie [:jbonisteel] plz needinfo from comment #27)

Thought of another question to ask, tsmith - are you using multiple monitors?

Yes I am, they are 3840x2160 (scaled to 175%) and 1920x1200 (no scaling). I primarily use Firefox on the lower-res monitor.

Flags: needinfo?(twsmith)

Curious to know if you have seen this again and if you've been able to follow the instructions in comment 24 when it does happen?

Flags: needinfo?(twsmith)

Just adding some notes from a Slack convo

  • this was happening on a desktop machine
  • twsmith says it has not happened recently

twsmith - just curious if this has popped up at all again lately?

Nothing. I've been using my desktop again consistently enough that I'd be happy to close this. I can always reopen it if I see it again (hopefully I don't). Does that work for you?

Flags: needinfo?(twsmith) → needinfo?(jbonisteel)

Actually if you don't mind, I'd like to leave this issue open for now as there are other folks who have been reporting this and we are still trying to figure out what's up.

Flags: needinfo?(jbonisteel)

Wontfix for 74 given where we are in the cycle.

Tyson - I assume you still haven't seen this in quite some time?

Flags: needinfo?(twsmith)

(In reply to Jessie [:jbonisteel] pls NI from comment #35)

Tyson - I assume you still haven't seen this in quite some time?

Nope, still nothing.

Flags: needinfo?(twsmith)

Suspected fix is bug 1617083. We can reopen if this starts happening again.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Depends on: 1617083
Target Milestone: --- → mozilla75