Closed Bug 1683266 Opened 4 years ago Closed 4 years ago

Crash in [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame]

Status:

RESOLVED DUPLICATE of bug 1694909

Tracking Flags:

                Tracking   Status
firefox-esr78   ---        unaffected
firefox84       ---        disabled
firefox85       ---        disabled
firefox86       ---        disabled

People

(Reporter: dholbert, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Daniel Holbert [:dholbert]

Reporter

Description

•

4 years ago

My main browsing profile is insta-crashing whenever my laptop resumes from suspend. Crash report auto-populated info below.

Also: I tried setting media.ffmpeg.dmabuf-textures.disabled to true, per bug 1632698 comment 25, but that doesn't help.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/b2d63560-8682-47e8-b2dc-a054d0201218

Reason: SIGSEGV /SEGV_MAPERR

Top 10 frames of crashing thread:

0 libc.so.6 __memmove_avx_unaligned_erms 
1 libxul.so webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer.rs:5917
2 libxul.so webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer.rs:3490
3 libxul.so webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer.rs:3246
4 libxul.so wr_renderer_render gfx/webrender_bindings/src/bindings.rs:639
5 libxul.so mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:186
6 libxul.so mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:476
7 libxul.so mozilla::wr::RenderThread::HandleFrameOneDoc gfx/webrender_bindings/RenderThread.cpp:336
8 libxul.so mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void  xpcom/threads/nsThreadUtils.h:1201
9 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35

Daniel Holbert [:dholbert]

Reporter

Comment 1

•

4 years ago

I'm using Ubuntu 20.10 with nvidia driver metapackage "nvidia-driver-455 (proprietary, tested)" from the Ubuntu "Software and Updates | Additional Drivers" dialog.

Daniel Holbert [:dholbert]

Reporter

Comment 2

•

4 years ago

(And I've manually opted in to WebRender by setting gfx.webrender.all to true, and I also have Fission enabled with fission.autostart set to true. There may be other prefs/settings that are required to trigger the crash, too; I'm not sure. I haven't triggered it in a ~fresh profile so far (enabling WebRender isn't sufficient to trigger it in a fresh profile), but I can trigger it 100% of the time in my main browsing profile.)

Daniel Holbert [:dholbert]

Reporter

Comment 3

•

4 years ago

(I also only started hitting this recently because I just switched from Nouveau to NVIDIA drivers in the past day or so.)

Daniel Holbert [:dholbert]

Reporter

Comment 4

•

4 years ago

I ran mozregression (with a copy of my main browser profile), and I determined that this crash started happening in the push for bug 1661528, BTW. https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=77c01f13f298d667b6f42fa6867698ac32ad2d77

Tentatively flagging as a regression from that bug.

Has Regression Range: --- → yes

Regressed by: 1661528

Jamie Nicol [:jnicol]

Comment 5

•

4 years ago

Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.

Working around this might not be so complicated if we can detect when the suspend/resume has occurred: if we trigger a memory pressure event on the PBO pool, it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible, Andrew?
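The workaround proposed above can be sketched as a toy model. This is only an illustration in Python, not WebRender's actual Rust code: `MappedBuffer`, `PboPool`, and the `on_resume` hook are hypothetical names, and no real GL calls are made.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)


@dataclass
class MappedBuffer:
    """Stand-in for a persistently mapped PBO (hypothetical, no real GL)."""
    size: int
    buffer_id: int = field(default_factory=lambda: next(_ids))


class PboPool:
    """Keeps idle buffers around for reuse instead of reallocating them."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.idle = []  # buffers that are mapped but not currently in use

    def acquire(self):
        # Reuse an idle buffer when one exists, otherwise allocate a new one.
        return self.idle.pop() if self.idle else MappedBuffer(self.buffer_size)

    def release(self, buf):
        self.idle.append(buf)

    def on_memory_pressure(self):
        """Drop every idle buffer and return how many were freed."""
        freed = len(self.idle)
        self.idle.clear()
        return freed


def on_resume(pool):
    # Hypothetical hook: treat a detected suspend/resume like memory pressure,
    # so a buffer mapped before the suspend is never handed out again.
    return pool.on_memory_pressure()
```

The sketch only covers idle buffers. Buffers that are in use when the suspend happens would need separate handling, and the open question in the comment (whether the suspend/resume can be detected at all) is not addressed here.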

Blocks: wr-nv-linux

Severity: -- → S2

Flags: needinfo?(aosmond)

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1680138

Jamie Nicol [:jnicol]

Comment 6

•

4 years ago

The fallback option would just be to disable persistently mapped buffers on NVIDIA, but hopefully we can do better than that.

Jamie Nicol [:jnicol]

Comment 7

•

4 years ago

Meant to set S3, as we don't currently ship to NVIDIA.

Severity: S2 → S3

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Keywords: regression

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

4 years ago

Set release status flags based on info from the regressing bug 1661528

status-firefox84: --- → affected

status-firefox85: --- → affected

status-firefox86: --- → affected

status-firefox-esr78: --- → unaffected

Andrew Osmond [:aosmond] (he/him)

Comment 9

•

4 years ago

(In reply to Jamie Nicol [:jnicol] from comment #5)

> Presumably the driver does not correctly handle the persistently mapped buffers after a resume, leading to a crash when we attempt to write to one. Seems basically the same as bug 1680138.
>
> Working around this might not be so complicated if we can detect when the suspend/resume has occurred: if we trigger a memory pressure event on the PBO pool, it will free the persistently mapped buffers and allocate new ones instead. Do you know if that is possible, Andrew?

I tried this approach. It was a huge pain. Rather than trying to be clever, I just decided to treat these special NVIDIA device resets as innocent device resets in bug 1682876, so we tear everything down and bring it back up again. That resolved the suspend/resume problems for me (though with a much older driver, as my hardware is ancient).
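As a rough illustration of the "tear everything down and bring it back up" approach, here is a toy Python model. It is not the patch from bug 1682876: `ResetReason`, `GraphicsStack`, and the resource names are hypothetical and exist only for this sketch.

```python
from enum import Enum, auto


class ResetReason(Enum):
    # Hypothetical labels for this sketch; not Gecko's real enum.
    GUILTY = auto()
    INNOCENT = auto()
    NVIDIA_PURGE = auto()  # driver-specific reset seen around suspend/resume


class GraphicsStack:
    """Toy model of a renderer whose GPU resources die with the context."""

    def __init__(self):
        self.generation = 0
        self.resources = []
        self.bring_up()

    def bring_up(self):
        self.generation += 1
        # Everything is recreated from scratch, mapped buffers included.
        self.resources = [
            f"pbo-pool-{self.generation}",
            f"texture-cache-{self.generation}",
        ]

    def tear_down(self):
        self.resources = []

    def handle_reset(self, reason):
        # Fold the driver-specific reason into the ordinary innocent case
        # instead of trying to repair individual buffers.
        if reason is ResetReason.NVIDIA_PURGE:
            reason = ResetReason.INNOCENT
        self.tear_down()
        self.bring_up()
        return reason
```

The design trade-off in this sketch is that a full rebuild costs more than freeing a few buffers, but it needs no knowledge of which resources the driver invalidated.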

Flags: needinfo?(aosmond)

Darkspirit

Updated

•

4 years ago

Depends on: 1682876

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox84: affected → disabled

status-firefox85: affected → disabled

status-firefox86: affected → disabled

Darkspirit

Comment 10

•

4 years ago

This crash is reproducible for me.
Gnome X11, Debian Testing, GTX1060: Press Ctrl+Alt+F3 to switch to a text console, then Ctrl+Alt+F2 to switch back to the X11 desktop.
Mostly instant crash, sometimes corruptions for two seconds, then a crash.
bp-afb77d59-4ee2-42de-9fd2-d4af30201222
(The "Device Reset" button on about:support seems fine.)

Andrew Osmond [:aosmond] (he/him)

Comment 12

•

4 years ago

The crash seems characteristic of a device reset, but the crash report has no such annotation: neither DeviceResetReason nor anything in the critical log. I wonder if the timing is different and we crash before we get to detect it.

Darkspirit

Comment 13

•

4 years ago

So far I was exclusively using MOZ_X11_EGL=1 although my crash reports contain "EGL? EGL-".
(MOZ_X11_EGL/proprietary Nvidia uses GLX for Visual (bug 1663003 comment 17) and "software timer" for vsync (bug 1650583 comment 30).)

I seem to get a different crash with pure GLX. I have gfx.webrender.panic-on-gl-error=true.
bp-2b4b6b54-04d7-4e8e-8c71-ef5a60201222
bp-619bed62-175e-4266-a351-d11680201222 [@ webrender::device::gl::Device::new::{{closure}} ]

MOZ_CRASH Reason (Sanitized) Caught GL error 507 at bind_framebuffer

A fresh profile with gfx.webrender.all true doesn't crash and does not fall back:

firefox/firefox -P nvtest
# this came right after startup:
Unflushed glGetGraphicsResetStatus: 0x92bb
# this came after switching to text console and back:
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
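For readers decoding the log above, a small Python sketch can map the reported status to a name. The constant values come from the GL robustness extensions; 0x92BB is, to my knowledge, GL_PURGED_CONTEXT_RESET_NV from NV_robustness_video_memory_purge. The helper name `summarize_reset_statuses` is made up for this example.

```python
import re
from collections import Counter

# Reset status values from the GL robustness extensions; 0x92BB is taken
# to be GL_PURGED_CONTEXT_RESET_NV (NV_robustness_video_memory_purge).
RESET_STATUS_NAMES = {
    0x0000: "GL_NO_ERROR",
    0x8253: "GL_GUILTY_CONTEXT_RESET",
    0x8254: "GL_INNOCENT_CONTEXT_RESET",
    0x8255: "GL_UNKNOWN_CONTEXT_RESET",
    0x92BB: "GL_PURGED_CONTEXT_RESET_NV",
}

STATUS_LINE = re.compile(r"glGetGraphicsResetStatus:\s*(0x[0-9a-fA-F]+)")


def summarize_reset_statuses(log_text):
    """Count each reset status that appears in the given log text."""
    counts = Counter()
    for match in STATUS_LINE.finditer(log_text):
        code = int(match.group(1), 16)
        name = RESET_STATUS_NAMES.get(code, f"unknown (0x{code:04x})")
        counts[name] += 1
    return counts


LOG = """\
Unflushed glGetGraphicsResetStatus: 0x92bb
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Unflushed glGetGraphicsResetStatus: 0x92bb
"""

print(summarize_reset_statuses(LOG))
```

Run on the log lines quoted in this comment, the counter reports three occurrences of GL_PURGED_CONTEXT_RESET_NV.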

Andrew Osmond [:aosmond] (he/him)

Comment 14

•

4 years ago

Oh. I missed this. Yes, no device reset handling on EGL -- I expect that.

The crash with GLX is because of the panic -- it would have gracefully handled the context loss otherwise.
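The difference between the two behaviours can be sketched in Python as follows. This is a simplified model, not WebRender's error-checking code: `check_gl_error`, `GlPanic`, and `ContextLost` are hypothetical names, and reading the "507" in the crash reason as hex 0x0507 (GL_CONTEXT_LOST) is an assumption.

```python
GL_NO_ERROR = 0x0000
GL_CONTEXT_LOST = 0x0507  # assumed to be the "507" in the crash reason


class GlPanic(RuntimeError):
    """Models the deliberate crash when panic-on-gl-error is enabled."""


class ContextLost(Exception):
    """Recoverable: the caller can rebuild the GL context and carry on."""


def check_gl_error(code, call_name, panic_on_gl_error):
    """Decide what to do with the error code returned after a GL call."""
    if code == GL_NO_ERROR:
        return
    if panic_on_gl_error:
        # With the debugging pref set, any GL error aborts immediately.
        raise GlPanic(f"Caught GL error {code:x} at {call_name}")
    if code == GL_CONTEXT_LOST:
        # Without the pref, context loss is reported so it can be handled.
        raise ContextLost(call_name)
    # Other error codes are ignored in this sketch.
```

With `panic_on_gl_error=True`, a lost context at `bind_framebuffer` ends in `GlPanic`; with `False`, the same error surfaces as `ContextLost`, which a caller could catch and answer by recreating the context.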

Andrew Osmond [:aosmond] (he/him)

Comment 15

•

4 years ago

I think the combination of bug 1680759, bug 1682876 and bug 1694909 reduced the crash rate to nothing over time; it is gone in 88+.

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: Graphics: WebRender, defect)

Security

(public)
