Closed Bug 1683266 Opened 4 years ago Closed 4 years ago

Crash in [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame]

Categories

(Core :: Graphics: WebRender, defect)

Tracking

()

RESOLVED DUPLICATE of bug 1694909
Tracking Status
firefox-esr78 --- unaffected
firefox84 --- disabled
firefox85 --- disabled
firefox86 --- disabled

People

(Reporter: dholbert, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

My main browsing profile is insta-crashing whenever my laptop resumes from suspend. Crash report auto-populated info below.

Also: I tried setting media.ffmpeg.dmabuf-textures.disabled to true, per bug 1632698 comment 25, but that doesn't help.


Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/b2d63560-8682-47e8-b2dc-a054d0201218

Reason: SIGSEGV /SEGV_MAPERR

Top 10 frames of crashing thread:

0 libc.so.6 __memmove_avx_unaligned_erms 
1 libxul.so webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer.rs:5917
2 libxul.so webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer.rs:3490
3 libxul.so webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer.rs:3246
4 libxul.so wr_renderer_render gfx/webrender_bindings/src/bindings.rs:639
5 libxul.so mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:186
6 libxul.so mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:476
7 libxul.so mozilla::wr::RenderThread::HandleFrameOneDoc gfx/webrender_bindings/RenderThread.cpp:336
8 libxul.so mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void  xpcom/threads/nsThreadUtils.h:1201
9 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35

I'm using Ubuntu 20.10 with nvidia driver metapackage "nvidia-driver-455 (proprietary, tested)" from the Ubuntu "Software and Updates | Additional Drivers" dialog.

(And I've manually opted in to WebRender by setting gfx.webrender.all to true, and I also have Fission enabled with fission.autostart set to true. There may be other prefs/settings that are required to trigger the crash, too; I'm not sure. I haven't triggered it in a ~fresh profile so far, and enabling WebRender alone isn't sufficient to trigger it there, but I can trigger it 100% of the time in my main browsing profile.)

(I also only started hitting this recently because I just switched from Nouveau to NVIDIA drivers in the past day or so.)

I ran mozregression (with a copy of my main browser profile), and I determined that this crash started happening in the push for bug 1661528, BTW. https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=77c01f13f298d667b6f42fa6867698ac32ad2d77

Tentatively flagging as a regression from that bug.

Has Regression Range: --- → yes
Regressed by: 1661528

Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.

Working around this might not be so complicated if we can detect when the suspend/resume has occurred - if we trigger a memory pressure event on the PBO pool it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible Andrew?
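To illustrate the idea in this comment, here is a minimal sketch of a buffer pool that drops its cached buffers on a memory pressure event, so that later requests get freshly allocated ones. `MappedPbo`, `PboPool`, and `on_memory_pressure` are hypothetical names for illustration only; this is not WebRender's actual PBO pool, and no real GL calls are made.

```rust
/// Stand-in for a persistently mapped pixel buffer object.
/// `id` is a hypothetical handle; nothing here talks to GL.
#[derive(Debug, PartialEq)]
struct MappedPbo {
    id: u32,
    size: usize,
}

/// Hypothetical pool that caches returned buffers for reuse.
struct PboPool {
    next_id: u32,
    free: Vec<MappedPbo>,
}

impl PboPool {
    fn new() -> Self {
        PboPool { next_id: 1, free: Vec::new() }
    }

    /// Reuse a cached buffer that is large enough, else allocate a new one.
    fn get(&mut self, size: usize) -> MappedPbo {
        if let Some(pos) = self.free.iter().position(|b| b.size >= size) {
            return self.free.swap_remove(pos);
        }
        let id = self.next_id;
        self.next_id += 1;
        MappedPbo { id, size }
    }

    /// Hand a buffer back to the pool for later reuse.
    fn put(&mut self, pbo: MappedPbo) {
        self.free.push(pbo);
    }

    /// Drop every cached buffer so a stale mapping is never handed out again.
    /// Returns how many buffers were released.
    fn on_memory_pressure(&mut self) -> usize {
        let released = self.free.len();
        self.free.clear();
        released
    }
}
```

In this sketch, a resume handler would call `on_memory_pressure()` once and every later `get()` would allocate a new buffer. Buffers that are checked out at the time of the event are not covered and would need separate handling.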

Blocks: wr-nv-linux
Severity: -- → S2
Flags: needinfo?(aosmond)

The fallback option would just be to disable persistently mapped buffers on nVidia, but hopefully we can do better than that.

Meant to set this to S3, as we don't currently ship to NVIDIA.

Severity: S2 → S3

Set release status flags based on info from the regressing bug 1661528

(In reply to Jamie Nicol [:jnicol] from comment #5)

> Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.
>
> Working around this might not be so complicated if we can detect when the suspend/resume has occurred - if we trigger a memory pressure event on the PBO pool it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible Andrew?

I tried this approach. It was a huge pain. Rather than trying to be clever, I just decided to treat these special NVIDIA device resets as innocent device resets in bug 1682876, and we tear everything down and bring it back up again. That resolved the problems for me on suspend/resume (but with a much older driver, as my hardware is ancient).
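As a rough illustration of "treat these special NVIDIA device resets as innocent", the sketch below classifies a `glGetGraphicsResetStatus` value. The constants are the standard GL robustness values plus `GL_PURGED_CONTEXT_RESET_NV` (0x92BB) from NV_robustness_video_memory_purge, which matches the 0x92bb status logged further down in this bug. `ResetAction` and `classify_reset` are hypothetical names; the actual handling in bug 1682876 may differ.

```rust
// Reset status values from the GL robustness extensions.
const GL_NO_ERROR: u32 = 0;
const GL_GUILTY_CONTEXT_RESET: u32 = 0x8253;
const GL_INNOCENT_CONTEXT_RESET: u32 = 0x8254;
const GL_UNKNOWN_CONTEXT_RESET: u32 = 0x8255;
// From NV_robustness_video_memory_purge.
const GL_PURGED_CONTEXT_RESET_NV: u32 = 0x92BB;

/// Hypothetical set of reactions to a reset status.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ResetAction {
    /// No reset happened; keep rendering.
    KeepRendering,
    /// Tear everything down and bring it back up without blaming the app.
    RecreateQuietly,
    /// Tear down and rebuild, and record the reset as a possible fault.
    RecreateAndReport,
}

/// Map a raw reset status to an action, treating the NVIDIA video memory
/// purge like an innocent reset (assumed policy for this sketch).
fn classify_reset(status: u32) -> ResetAction {
    match status {
        GL_NO_ERROR => ResetAction::KeepRendering,
        GL_INNOCENT_CONTEXT_RESET | GL_PURGED_CONTEXT_RESET_NV => ResetAction::RecreateQuietly,
        GL_GUILTY_CONTEXT_RESET | GL_UNKNOWN_CONTEXT_RESET => ResetAction::RecreateAndReport,
        // Anything unrecognized is handled conservatively.
        _ => ResetAction::RecreateAndReport,
    }
}
```

The point of the design is that a purge after suspend/resume or a VT switch takes the same recovery path as any other innocent reset, with no attempt to salvage individual buffers.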

Flags: needinfo?(aosmond)
Depends on: 1682876

This crash is reproducible for me.
Gnome X11, Debian Testing, GTX1060: Press Ctrl+Alt+F3 to switch to a text console, then Ctrl+Alt+F2 to switch back to the X11 desktop.
Mostly instant crash, sometimes corruptions for two seconds, then a crash.
bp-afb77d59-4ee2-42de-9fd2-d4af30201222
(The "Device Reset" button on about:support seems fine.)

The crash seems characteristic of a device reset, but the crash report has no such annotation: neither a DeviceResetReason nor anything in the critical log. I wonder if the timing is different and we crash before we get to detect it.

So far I have been exclusively using MOZ_X11_EGL=1, although my crash reports contain "EGL? EGL-".
(MOZ_X11_EGL/proprietary Nvidia uses GLX for Visual (bug 1663003 comment 17) and "software timer" for vsync (bug 1650583 comment 30).)


I seem to get a different crash with pure GLX. I have gfx.webrender.panic-on-gl-error=true.
bp-2b4b6b54-04d7-4e8e-8c71-ef5a60201222
bp-619bed62-175e-4266-a351-d11680201222 [@ webrender::device::gl::Device::new::{{closure}} ]

MOZ_CRASH Reason (Sanitized) Caught GL error 507 at bind_framebuffer
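For readers decoding the reason string above: assuming the "507" is printed in hexadecimal (an assumption, not confirmed in this bug), it corresponds to the standard GL error code 0x0507, `GL_CONTEXT_LOST`. The helper names below are hypothetical and only illustrate the lookup.

```rust
/// Map a standard GL error code to its symbolic name.
fn gl_error_name(code: u32) -> &'static str {
    match code {
        0x0000 => "GL_NO_ERROR",
        0x0500 => "GL_INVALID_ENUM",
        0x0501 => "GL_INVALID_VALUE",
        0x0502 => "GL_INVALID_OPERATION",
        0x0503 => "GL_STACK_OVERFLOW",
        0x0504 => "GL_STACK_UNDERFLOW",
        0x0505 => "GL_OUT_OF_MEMORY",
        0x0506 => "GL_INVALID_FRAMEBUFFER_OPERATION",
        0x0507 => "GL_CONTEXT_LOST",
        _ => "unknown",
    }
}

/// Parse the number from the crash reason, assuming it is hexadecimal
/// without a `0x` prefix. Returns `None` if the text is not valid hex.
fn parse_logged_gl_error(text: &str) -> Option<u32> {
    u32::from_str_radix(text.trim(), 16).ok()
}
```

Under that assumption, `parse_logged_gl_error("507").map(gl_error_name)` yields `Some("GL_CONTEXT_LOST")`, which would mean the panic fired on a lost context rather than on an ordinary GL usage error.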


A fresh profile with gfx.webrender.all true doesn't crash and does not fall back:

firefox/firefox -P nvtest
# this came right after startup:
Unflushed glGetGraphicsResetStatus: 0x92bb
# this came after switching to text console and back:
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb

Oh. I missed this. Yes, no device reset handling on EGL -- I expect that.

The crash with GLX is because of the panic -- it would have gracefully handled the context loss otherwise.

I think the combination of bug 1680759, bug 1682876 and bug 1694909 reduced the crash rate over time, and the crash is gone entirely in 88+.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE