Crash in [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame]
Categories
(Core :: Graphics: WebRender, defect)
Tracking
Version | Tracking | Status
---|---|---
firefox-esr78 | --- | unaffected
firefox84 | --- | disabled
firefox85 | --- | disabled
firefox86 | --- | disabled
People
(Reporter: dholbert, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
My main browsing profile is insta-crashing whenever my laptop resumes from suspend. Crash report auto-populated info below.
Also: I tried setting media.ffmpeg.dmabuf-textures.disabled to true, per bug 1632698 comment 25, but that doesn't help.
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/b2d63560-8682-47e8-b2dc-a054d0201218
Reason: SIGSEGV /SEGV_MAPERR
Top 10 frames of crashing thread:
0 libc.so.6 __memmove_avx_unaligned_erms
1 libxul.so webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer.rs:5917
2 libxul.so webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer.rs:3490
3 libxul.so webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer.rs:3246
4 libxul.so wr_renderer_render gfx/webrender_bindings/src/bindings.rs:639
5 libxul.so mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:186
6 libxul.so mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:476
7 libxul.so mozilla::wr::RenderThread::HandleFrameOneDoc gfx/webrender_bindings/RenderThread.cpp:336
8 libxul.so mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void xpcom/threads/nsThreadUtils.h:1201
9 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35
Comment 1•4 years ago (Reporter)
I'm using Ubuntu 20.10 with nvidia driver metapackage "nvidia-driver-455 (proprietary, tested)" from the Ubuntu "Software and Updates | Additional Drivers" dialog.
Comment 2•4 years ago (Reporter)
(And I've manually opted in to WebRender by setting gfx.webrender.all to true, and I also have Fission enabled with fission.autostart set to true. There may be other prefs/settings that are required to trigger the crash, too; I'm not sure. I haven't triggered it in a ~fresh profile so far (enabling WebRender isn't sufficient to trigger it in a fresh profile), but I can trigger it 100% of the time in my main browsing profile.)
Comment 3•4 years ago (Reporter)
(I also only started hitting this recently because I just switched from Nouveau to NVIDIA drivers in the past day or so.)
Comment 4•4 years ago (Reporter)
I ran mozregression (with a copy of my main browser profile), and I determined that this crash started happening in the push for bug 1661528, BTW. https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=77c01f13f298d667b6f42fa6867698ac32ad2d77
Tentatively flagging as a regression from that bug.
Comment 5•4 years ago
Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.
Working around this might not be so complicated if we can detect when the suspend/resume has occurred - if we trigger a memory pressure event on the PBO pool it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible Andrew?
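The workaround proposed above could be sketched roughly as follows. This is a minimal, self-contained model and not WebRender's actual upload pool: `MappedPbo`, `PboPool`, and `on_memory_pressure` are hypothetical names, and a plain `Vec<u8>` stands in for a persistently mapped GL buffer. The idea shown is only that a purge drops every pooled mapping and bumps a generation counter, so buffers still in flight are discarded when they come back instead of being reused.

```rust
/// Hypothetical stand-in for a persistently mapped pixel buffer object.
/// `generation` records which pool generation created the mapping.
struct MappedPbo {
    generation: u64,
    bytes: Vec<u8>,
}

/// Hypothetical pool; the names do not match WebRender's real types.
struct PboPool {
    generation: u64,
    buffer_size: usize,
    free: Vec<MappedPbo>,
}

impl PboPool {
    fn new(buffer_size: usize) -> Self {
        PboPool { generation: 0, buffer_size, free: Vec::new() }
    }

    /// Hand out a pooled buffer, or allocate a fresh one if none is free.
    fn get(&mut self) -> MappedPbo {
        let generation = self.generation;
        let size = self.buffer_size;
        self.free.pop().unwrap_or_else(|| MappedPbo {
            generation,
            bytes: vec![0; size],
        })
    }

    /// Return a buffer to the pool; drop it if it predates the last purge.
    fn put(&mut self, pbo: MappedPbo) {
        if pbo.generation == self.generation {
            self.free.push(pbo);
        }
    }

    /// Assumed hook: called on memory pressure, or after a detected resume.
    /// Frees all pooled mappings so later uploads allocate new ones.
    fn on_memory_pressure(&mut self) {
        self.free.clear();
        self.generation += 1;
    }
}
```

The sketch leaves open the hard part raised in the comment, namely how to detect that a suspend/resume has happened in the first place.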
Comment 6•4 years ago
The fallback option would just be to disable persistently mapped buffers on nVidia, but hopefully we can do better than that.
Comment 8•4 years ago
Set release status flags based on info from the regressing bug 1661528
Comment 9•4 years ago
(In reply to Jamie Nicol [:jnicol] from comment #5)
> Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.
> Working around this might not be so complicated if we can detect when the suspend/resume has occurred - if we trigger a memory pressure event on the PBO pool it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible Andrew?
I tried this approach. It was a huge pain. Rather than trying to be clever, I just decided to treat these special NVIDIA device resets as innocent device resets in bug 1682876, and we tear everything down and bring it back up again. Resolved problems for me on suspend/resume (but with a much older driver as my hardware is ancient).
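A rough illustration of the "treat it as an innocent reset" policy described above, assuming the status value comes from `glGetGraphicsResetStatus`. The numeric constants are the standard GL robustness values plus `GL_PURGED_CONTEXT_RESET_NV` (0x92BB, the value seen in the logs later in this bug); the `ResetAction` enum and `classify_reset` function are hypothetical and are not taken from the patch in bug 1682876.

```rust
// Values that glGetGraphicsResetStatus can return.
const GL_NO_ERROR: u32 = 0;
const GL_GUILTY_CONTEXT_RESET: u32 = 0x8253;
const GL_INNOCENT_CONTEXT_RESET: u32 = 0x8254;
const GL_UNKNOWN_CONTEXT_RESET: u32 = 0x8255;
// From the NV_robustness_video_memory_purge extension.
const GL_PURGED_CONTEXT_RESET_NV: u32 = 0x92BB;

/// Hypothetical policy outcome; not a real Gecko or WebRender type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResetAction {
    /// No reset happened; keep rendering.
    None,
    /// Tear everything down and bring it back up, without blaming our content.
    Rebuild,
    /// Rebuild, and also count the reset against us (assumed stricter path).
    RebuildAndBlame,
}

/// Map a reset status to an action, treating the NVIDIA purge as innocent.
fn classify_reset(status: u32) -> ResetAction {
    match status {
        GL_NO_ERROR => ResetAction::None,
        GL_INNOCENT_CONTEXT_RESET | GL_PURGED_CONTEXT_RESET_NV => ResetAction::Rebuild,
        GL_GUILTY_CONTEXT_RESET | GL_UNKNOWN_CONTEXT_RESET => ResetAction::RebuildAndBlame,
        // Unrecognized values are handled conservatively in this sketch.
        _ => ResetAction::RebuildAndBlame,
    }
}
```

With this mapping, `classify_reset(0x92bb)` yields `ResetAction::Rebuild`, the same path as an ordinary innocent reset.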
Comment 10•4 years ago
This crash is reproducible for me.
Gnome X11, Debian Testing, GTX1060: Press Ctrl+Alt+F3 to switch to a text console, then Ctrl+Alt+F2 to switch back to the X11 desktop.
Mostly instant crash, sometimes corruptions for two seconds, then a crash.
bp-afb77d59-4ee2-42de-9fd2-d4af30201222
(The "Device Reset" button on about:support seems fine.)
Comment 12•4 years ago
The crash seems characteristic of a device reset, but the crash report has no such annotation: neither DeviceResetReason nor anything in the critical log. I wonder if the timing is different and we crash before we get to detect it.
Comment 13•4 years ago
So far I was exclusively using MOZ_X11_EGL=1 although my crash reports contain "EGL? EGL-".
(MOZ_X11_EGL/proprietary Nvidia uses GLX for Visual (bug 1663003 comment 17) and "software timer" for vsync (bug 1650583 comment 30).)
I seem to get a different crash with pure GLX. I have gfx.webrender.panic-on-gl-error=true.
bp-2b4b6b54-04d7-4e8e-8c71-ef5a60201222
bp-619bed62-175e-4266-a351-d11680201222 [@ webrender::device::gl::Device::new::{{closure}} ]
MOZ_CRASH Reason (Sanitized) Caught GL error 507 at bind_framebuffer
A fresh profile with gfx.webrender.all true doesn't crash and does not fall back:
firefox/firefox -P nvtest
# this came right after startup:
Unflushed glGetGraphicsResetStatus: 0x92bb
# this came after switching to text console and back:
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
Comment 14•4 years ago
Oh. I missed this. Yes, no device reset handling on EGL -- I expect that.
The crash with GLX is because of the panic -- it would have gracefully handled the context loss otherwise.
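To make the distinction concrete, here is a small sketch of how a panic-on-error switch changes the outcome of the same GL error. It assumes the "507" in the crash reason is hexadecimal, which would make it `GL_CONTEXT_LOST` (0x0507). `GlErrorOutcome` and `handle_gl_error` are hypothetical names for illustration and do not mirror WebRender's actual error callback.

```rust
const GL_CONTEXT_LOST: u32 = 0x0507;

/// Hypothetical result of observing a GL error after a call.
#[derive(Debug, PartialEq, Eq)]
enum GlErrorOutcome {
    /// What a panic-on-gl-error setting would turn into a crash, with its message.
    Panic(String),
    /// Context loss noted; the caller can go through device-reset handling.
    ContextLost,
    /// Any other error is only logged in this sketch.
    Logged,
}

/// Decide what to do with an error code reported for the GL call `call`.
fn handle_gl_error(call: &str, code: u32, panic_on_gl_error: bool) -> GlErrorOutcome {
    if panic_on_gl_error {
        // Same shape as the crash reason quoted in comment 13.
        return GlErrorOutcome::Panic(format!("Caught GL error {:x} at {}", code, call));
    }
    if code == GL_CONTEXT_LOST {
        GlErrorOutcome::ContextLost
    } else {
        GlErrorOutcome::Logged
    }
}
```

In this model, the pref only decides whether a lost context becomes a crash or is left for the reset path to recover from.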
Comment 15•4 years ago
I think the combination of bug 1680759, bug 1682876 and bug 1694909 gradually reduced the crash rate to nothing; the crash is gone in 88+.