Closed Bug 1682876 Opened 4 years ago Closed 4 years ago

Fix NVIDIA device reset handling and recovery

Categories

(Core :: Graphics: WebRender, defect, P3)

Desktop
Linux

Tracking

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: aosmond, Assigned: aosmond)

References

(Blocks 1 open bug)

There are too many buffers to clear and WR is becoming unstable the more I attempt to clear. I don't think it is worth the effort to avoid a full reset just for NVIDIA binary driver resets. Let's just handle it like a normal reset.

While testing this via suspend and resume, I often found that we would lose WebRender entirely and fall back to basic. The cause is this code:

https://searchfox.org/mozilla-central/rev/c7cf087b6e1384608ca3989f042f12f7cabd0a5f/gfx/gl/GLContext.cpp#520

We would call fGetError, it would return GL_CONTEXT_LOST and set mContextLost = true in the process, and every later fGetError call would keep returning GL_CONTEXT_LOST as a result. We should probably call mSymbols.fGetError directly for the initial call, as that seems more in line with the intent here.

These NVIDIA device resets are specific to Linux and trying to handle
them more gracefully is increasingly difficult. There are many
textures/buffers that we need to clear inside WebRender, but attempting
to add them to the list has proved difficult due to the number of places
we need to add, as well as race conditions with clearing them. Given
this shouldn't happen often, it doesn't seem worth optimizing for and we
should treat it just as an innocent device reset.

Testing this revealed an issue during recovery where unflushed device
resets were not handled as expected. When we checked for errors after
creating a new GL context, we would encounter a GL_CONTEXT_LOST error
which we failed to recover from. This is because we called
GLContext::fGetError instead of the GL method directly; the context lost
state was saved in mContextLost, and any subsequent calls to
GLContext::fGetError would continue to return GL_CONTEXT_LOST.
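
The sticky-flag behaviour described above can be illustrated with a small standalone sketch. This is not the Firefox code: FakeDriver, FakeContext, WrappedGetError, RawGetError and DrainStaleErrors are hypothetical names. It assumes the driver reports one stale GL_CONTEXT_LOST on a freshly created context and then returns GL_NO_ERROR.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Values of the real GL enums; everything else here is hypothetical.
constexpr uint32_t kNoError = 0;           // GL_NO_ERROR
constexpr uint32_t kContextLost = 0x0507;  // GL_CONTEXT_LOST

// Stand-in for the driver: each query pops one pending error, if any.
struct FakeDriver {
  std::deque<uint32_t> pending;
  uint32_t GetError() {
    if (pending.empty()) return kNoError;
    uint32_t e = pending.front();
    pending.pop_front();
    return e;
  }
};

// Stand-in for a context wrapper that latches context loss in a flag.
class FakeContext {
 public:
  explicit FakeContext(FakeDriver& driver) : mDriver(driver) {}

  // Wrapped query: once GL_CONTEXT_LOST is seen, it is reported forever.
  uint32_t WrappedGetError() {
    if (mContextLost) return kContextLost;
    uint32_t e = mDriver.GetError();
    if (e == kContextLost) mContextLost = true;
    return e;
  }

  // Raw query: asks the driver directly and leaves the flag untouched.
  uint32_t RawGetError() { return mDriver.GetError(); }

  bool IsContextLost() const { return mContextLost; }

 private:
  FakeDriver& mDriver;
  bool mContextLost = false;
};

// Polls until the error state is clear or maxPolls is reached.
// Returns true if the context is still considered usable afterwards.
bool DrainStaleErrors(FakeContext& ctx, bool useRawCall, int maxPolls = 16) {
  assert(maxPolls > 0);
  for (int i = 0; i < maxPolls; ++i) {
    uint32_t e = useRawCall ? ctx.RawGetError() : ctx.WrappedGetError();
    if (e == kNoError) break;
  }
  return !ctx.IsContextLost();
}
```

In this model, draining through the wrapped query marks the new context as lost on the first poll and never clears it, whereas draining through the raw query consumes the stale error and leaves the context usable.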

Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/73bec1147728 Better handle Linux NVIDIA device resets, and unflushed device resets. r=sotaro,jgilbert
Blocks: wr-nv-linux
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
Crash Signature: [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::update_texture_cache]