Closed Bug 1745530 Opened 3 years ago Closed 3 years ago

startup crash without crash report (works in safe mode) caused by mesa_glthread=true in Mesa config file

Categories

(Core :: Graphics, defect)

Firefox 97
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox95 --- unaffected
firefox96 + fixed
firefox97 + fixed
firefox98 --- fixed

People

(Reporter: norbert.pfeiler, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, nightly-community, regression)

Attachments

(1 file)

since upgrade to nightly 97 i can only get firefox to start in safe mode
every second start crashes, and every other start it offers to start in safe mode which i gladly accept
disabling addons from safe mode doesn’t help

i find 2 suspicious threads, that are not poll/syscall/recvmsg or __futex_abstimed_wait_common64

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f506d8687c0 in ?? () from /usr/lib/libEGL_mesa.so.0
[Current thread is 1 (Thread 0x7f50592fe640 (LWP 2251478))]
(gdb) bt
#0 0x00007f506d8687c0 in ?? () from /usr/lib/libEGL_mesa.so.0
#1 0x00007f508778e13c in ?? () from /usr/lib/dri/radeonsi_dri.so
#2 0x0000000000000000 in ?? ()

[Switching to thread 39 (Thread 0x7f50a1883780 (LWP 2251274))]
#0 0x00007f50991b0902 in ?? () from /opt/firefox-nightly/libxul.so
(gdb) bt
#0 0x00007f50991b0902 in ?? () from /opt/firefox-nightly/libxul.so
#1 0x00007fff86562fd0 in ?? ()
#2 0x00007f5099145f08 in ?? () from /opt/firefox-nightly/libxul.so
#3 0x00007fff86565620 in ?? ()
#4 0x00007f503990f8d8 in ?? ()
#5 0x00007fff86563010 in ?? ()
#6 0x00007f509913cd55 in ?? () from /opt/firefox-nightly/libxul.so
#7 0x00007f500000000b in ?? ()
#8 0x00007fff86563038 in ?? ()
#9 0x00007fff86565620 in ?? ()
#10 0x00007f503990f8d8 in ?? ()
#11 0x00007fff865630e0 in ?? ()
#12 0x00007f5091b1b900 in ?? ()
#13 0x7b0c0d6934588600 in ?? ()
#14 0x00007fff865642d0 in ?? ()
#15 0x00007fff865642d0 in ?? ()
#16 0x0000000000000002 in ?? ()
#17 0xc01a56afea8dc885 in ?? ()
#18 0x0000000000000015 in ?? ()
#19 0x00007fff86563080 in ?? ()
#20 0x00007f5098041797 in ?? () from /opt/firefox-nightly/libxul.so
#21 0x0000000100000001 in ?? ()
#22 0x00007fff865642d0 in ?? ()
#23 0x0000000000000002 in ?? ()
#24 0x0000000000000002 in ?? ()
#25 0x00000000ffffffff in ?? ()
#26 0x0000000000000015 in ?? ()
#27 0x00007fff865630b0 in ?? ()
#28 0x00007f50991b5e59 in ?? () from /opt/firefox-nightly/libxul.so
#29 0x0000000060000008 in ?? ()
#30 0x0000000000000002 in ?? ()
#31 0x00007fff865631a8 in ?? ()
#32 0x00007fff865641c0 in ?? ()
#33 0x00007fff86563160 in ?? ()
#34 0x00007f50980432e7 in ?? () from /opt/firefox-nightly/libxul.so
#35 0x00007fff865630e8 in ?? ()
#36 0x0000000286566200 in ?? ()
#37 0x00007f503990f8d8 in ?? ()
#38 0x00007fff865631a8 in ?? ()
#39 0x0000000100000007 in ?? ()
#40 0x00007f5000000008 in ?? ()
#41 0x0000000000000000 in ?? ()

(In reply to Andre Klapper from comment #1)

Please see https://support.mozilla.org/en-US/kb/troubleshoot-firefox-crashes-closing-or-quitting

It works with hwacc disabled.
During testing just now, although i had crashes, about:crashes doesn’t list anything dated today.

this is with arch linux+gnome+x11+amdgpu

Flags: needinfo?(norbert.pfeiler)
Component: General → Graphics
Product: Firefox → Core

Thanks for the report! Please open about:support in your address bar, click on "Copy text to clipboard" and paste it here.

Are you able to find a regression range? You should get a pushlog URL at the end:
$ pip3 install --user mozregression
$ ~/.local/bin/mozregression --good 95 --bad 2021-12-12

Blocks: wr-linux
Attached file about:support (deleted)

hm, so with a new (linux) user nightly starts fine using either Wayland or X11

the 2021-12-12 mozregression test also doesn’t crash

but even
firefox-nightly -P
crashes in my session

when hardware acceleration is disabled i still get a segfault in the debugger:
gdb --args firefox-nightly -P
even though
firefox-nightly -P
works

Does a crash reporter open? Do you have any recent unsent crash reports on about:crashes? Please submit and post some IDs (bp-XXXXX).

Is the crash reproducible if you run mozregression with your settings? For example:
$ ~/.local/bin/mozregression --launch 2021-12-12 --pref gfx.webrender.all:true fission.autostart:true general.autoScroll:true image.jxl.enabled:true layout.frame_rate:120 media.hardware-video-decoding.force-enabled:true mousewheel.default.delta_multiplier_y:200

Keywords: crash, regression

no crash reporter and my last about:crashes is from the 9th, which was still nightly v96

mozregression still works with those args

Does it reproduce if you open 3 windows and then restart Firefox via about:restartrequired (browser.startup.page:3)?
$ ~/.local/bin/mozregression --launch 2021-12-12 --pref gfx.webrender.all:true fission.autostart:true general.autoScroll:true image.jxl.enabled:true layout.frame_rate:120 media.hardware-video-decoding.force-enabled:true mousewheel.default.delta_multiplier_y:200 browser.startup.page:3
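For anyone repeating this with a different set of prefs, the long `--pref name:value ...` list can be generated instead of typed by hand. This is only a sketch: the helper name `mozregression_launch_cmd` is made up for illustration, and it only relies on the `--launch` and `--pref` options as used in the commands above.

```python
import shlex


def mozregression_launch_cmd(build, prefs, binary="mozregression"):
    """Build an argv list for `mozregression --launch BUILD --pref k:v ...`.

    `binary` is an assumption; in this thread it lives at
    ~/.local/bin/mozregression.
    """
    cmd = [binary, "--launch", build]
    if prefs:
        cmd.append("--pref")
        for name, value in prefs.items():
            # Prefs are passed as name:value; booleans are written lowercase,
            # matching the commands in this bug.
            if isinstance(value, bool):
                value = "true" if value else "false"
            cmd.append(f"{name}:{value}")
    return cmd


prefs = {
    "gfx.webrender.all": True,
    "fission.autostart": True,
    "layout.frame_rate": 120,
    "browser.startup.page": 3,
}
# Print a copy-pasteable shell command rather than running anything.
print(shlex.join(mozregression_launch_cmd("2021-12-12", prefs)))
```

The list form can be passed straight to `subprocess.run` if the command should actually be executed.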

now we’re getting somewhere

INFO: Last good revision: ecc17529126cd64d51c6f0c4842395a9ee93ec22
INFO: First bad revision: 799a0280b2137880417e093103f136643506ac60

799a0280b2137880417e093103f136643506ac60 crashes on startup
9eb74149f75b2444be4d13b049ce4d8dd4d894a5 crashes when opening a 2nd window (ctrl+n)

to reproduce i paste about:restartrequired upon startup, press ctrl+n twice and click the restart button with the mouse
the 3rd window overlaps the 1st

when i tried to figure out if it has to do with window overlap, where the cursor is or if it also happens with 2 windows the results were too inconsistent to make something out
i.e. it also managed to restart with 3 windows sometimes, but when i did it like described above immediately, it was deterministic

Thanks for this information. Unfortunately, nothing in the pushlog for that range sticks out or even seems remotely related :/

Just to confirm: This does not reproduce with a fresh profile, just your existing one?

(forgot to set NI for comment 10).

Flags: needinfo?(norbert.pfeiler)

mozregression doesn’t use anything from my user, does it?

Flags: needinfo?(norbert.pfeiler)

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Severity: -- → S4
Flags: needinfo?(jmathies)

(In reply to norbert.pfeiler from comment #12)

mozregression doesn’t use anything from my user, does it?

Right, mozregression should use its own profile.

Also, am I understanding this correctly that you are using our official version of Firefox with Crash Reporting enabled, but this crash is not creating a crash report that shows up in about:crashes ? If that is the case, that's yet another thing we should investigate, and we might be able to progress with a different build. Thanks in advance for checking!

Flags: needinfo?(norbert.pfeiler)

I’m on Arch Linux and use https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=firefox-nightly
which downloads https://download-installer.cdn.mozilla.net/pub/firefox/nightly/latest-mozilla-central
I think the only »modified« thing is that it has automatic updates disabled.

Since i have crash reports up until 2021-12-09 (which was still nightly 96) i’m guessing there is nothing explicitly preventing it.
Does mozregression have a crash reporter running (or can it provide some more crash info)?
If only for this issue, i could upload a coredump collected by systemd.

Flags: needinfo?(norbert.pfeiler)

A core dump would be very useful, but don't upload it on the bug as it might contain sensitive information. Send it to me via e-mail and I'll analyze the crash.

when hardware acceleration is disabled i still get a segfault in the debugger:

Are you actually crashing or does GDB stop with SIGSYS?

it’s always SIGSEGV

e.g.

~> gdb --args firefox-nightly -P
[…]
Thread 175 "firefox-ni:gl0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff8cdfe640 (LWP 983222)]
0x00007fffa33d3fd4 in ?? () from /usr/lib/libEGL_mesa.so.0

I've tried extracting a stack trace from the core file that the reporter sent me, but sadly it went nowhere because Arch doesn't provide separate debug information for its packages (I should have remembered that, I already stumbled upon this issue in the past). I find it odd that we're tripping over a Mesa failure before we have a chance to set the exception handler and catch it.

That being said I found similar crashes also happening on Arch, see this query. They seem to be happening on Mesa 21.2.5.0, could you try a different version and see if the problem persists?

Since official binaries are used, aren’t there symbols available for it somewhere?

To try a different version of Mesa, you mean?

(In reply to norbert.pfeiler from comment #20)

Since official binaries are used, aren’t there symbols available for it somewhere?

No, I think that it's possible to build them if you build packages locally (see this bug) but I don't think they're distributed for pre-built packages. Or at least I couldn't find them.

To try a different version of Mesa, you mean?

Yes, to figure out if it's an issue on their side rather than on ours.

Confirming the bug in the meantime.

Status: UNCONFIRMED → NEW
Ever confirmed: true

firefox-nightly is not built (compiled), it’s just downloaded and extracted
i was thinking the symbols in the stripped binary can be matched to those in the unstripped version
but anyway, i can also disable stripping

and i can confirm Mesa versions change things:
20.3.4-3 works
21.0.0-1 doesn’t
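These two data points sit on either side of Mesa 21, which lines up with the note right after this comment that EGL is enabled on X11 for Mesa >= 21. A rough sketch of that check on Arch-style package versions follows; both function names are hypothetical, and treating the major version as the only dividing line is an assumption that two data points cannot prove.

```python
def mesa_major(pkgver):
    """Return the major version from an Arch-style string like '21.0.0-1'.

    Drops an optional 'epoch:' prefix and the '-pkgrel' suffix first.
    """
    upstream = pkgver.split(":")[-1].split("-")[0]
    return int(upstream.split(".")[0])


def uses_egl_on_x11(pkgver):
    # Assumption taken from this thread: the EGL path on X11 is used
    # when Mesa is version 21 or newer.
    return mesa_major(pkgver) >= 21


for version in ("20.3.4-3", "21.0.0-1"):
    print(version, uses_egl_on_x11(version))
```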

EGL is enabled on X11 for Mesa >= 21.
Ideas: Could this be one of

(In reply to norbert.pfeiler from comment #23)

firefox-nightly is not built (compiled), it’s just downloaded and extracted
i was thinking the symbols in the stripped binary can be matched to those in the unstripped version
but anyway, i can also disable stripping

Yes, I can retrieve the debug information for that Firefox binary, but the crash is happening in libEGL_mesa.so.0 so without the debug information for that library I can't get a proper stack trace.

using mesa_glthread=false works

but fyi it behaves the same regardless of X11 or Wayland session
idk if that’s only because it’s running in xwayland at the end

(In reply to norbert.pfeiler from comment #26)

using mesa_glthread=false works

Just to be sure: This crash can be fixed by starting Nightly with mesa_glthread=false env var, right?
Please try to find out why it is even enabled on your system. It should be false by default.
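The launch being asked about (Nightly started with mesa_glthread=false in its environment) could be scripted roughly as below. This is a sketch only: the helper names are invented, and `firefox-nightly -P` is simply the command already used earlier in this bug.

```python
import os
import subprocess


def env_with_glthread_disabled(base_env=None):
    """Return a copy of the environment with mesa_glthread forced to false."""
    env = dict(os.environ if base_env is None else base_env)
    # Same variable name and value as the workaround reported in this bug.
    env["mesa_glthread"] = "false"
    return env


def launch(argv):
    """Start a program with the modified environment and return the process."""
    return subprocess.Popen(argv, env=env_with_glthread_disabled())
```

Calling `launch(["firefox-nightly", "-P"])` would then start the profile manager with the override applied to that one process only, leaving the rest of the session untouched.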

Just to be sure: This crash can be fixed by starting Nightly with mesa_glthread=false env var, right?

yes

Please try to find out why it is even enabled on your system. It should be false by default.

any pointers?

~> sudo grep -lr mesa_glthread /etc/* /usr/share/*
only gives /usr/share/drirc.d/00-mesa-defaults.conf
and i don’t find anything suspicious in there

~> env mesa_glthread=false glinfo
also gives the ATTENTION output, so it seems to be set there as well

ha!
alright, i have found a ~/.drirc that sets it
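The grep above missed this because it only covered /etc and /usr/share. A sketch of a check that also looks at the per-user file is below. The function names are made up, and the list of locations is an assumption: /usr/share/drirc.d and ~/.drirc come from this thread, /etc/drirc is added as a further likely location, and Mesa may read other files too.

```python
import glob
import os
import xml.etree.ElementTree as ET


def glthread_settings(xml_data):
    """Return (driver, application, value) for every mesa_glthread option.

    Accepts driconf XML as str or bytes.
    """
    found = []
    root = ET.fromstring(xml_data)
    for device in root.iter("device"):
        for app in device.iter("application"):
            for opt in app.iter("option"):
                if opt.get("name") == "mesa_glthread":
                    found.append(
                        (device.get("driver"), app.get("name"), opt.get("value"))
                    )
    return found


def candidate_files():
    """List existing driconf files in the assumed locations."""
    paths = sorted(glob.glob("/usr/share/drirc.d/*.conf"))
    paths += ["/etc/drirc", os.path.expanduser("~/.drirc")]
    return [p for p in paths if os.path.isfile(p)]


def scan():
    """Map each readable config file to the mesa_glthread options it sets."""
    results = {}
    for path in candidate_files():
        try:
            with open(path, "rb") as f:
                hits = glthread_settings(f.read())
        except (OSError, ET.ParseError):
            continue  # unreadable or malformed file: skip it
        if hits:
            results[path] = hits
    return results
```

Running `scan()` and looking for entries with the value "true" would show which file turns the option on, and for which driver and application.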

What did this line look like? Was it upper- and lowercase mixed?

[Tracking Requested - why for this release]:
Please consider backing out bug 1744389 from Beta 96 and Nightly 97 because it reintroduced a startup crash without crash report. It was previously fixed by bug 1670545.

What did this line look like? Was it upper- and lowercase mixed?

do you mean the .drirc?

it contained

<driconf>
  <device screen="0" driver="radeonsi">
    <application name="Default">
      <option name="mesa_glthread" value="true" />
    </application>
  </device>
</driconf>
Summary: startup crash (works in safe mode) → startup crash without crash report (works in safe mode) caused by mesa_glthread=true in Mesa config file

(In reply to Darkspirit from comment #32)

[Tracking Requested - why for this release]:
Please consider backing out bug 1744389 from Beta 96 and Nightly 97 because it reintroduced a startup crash without crash report. It was previously fixed by bug 1670545.

Robert, what are your thoughts on this?

Flags: needinfo?(robert.mader)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #34)

(In reply to Darkspirit from comment #32)

[Tracking Requested - why for this release]:
Please consider backing out bug 1744389 from Beta 96 and Nightly 97 because it reintroduced a startup crash without crash report. It was previously fixed by bug 1670545.

Robert, what are your thoughts on this?

We can back out on 96, but I'd like to keep it in nightly.

I was already fearing something like this would happen. The option mesa_glthread is not enabled for a reason. Users who unconditionally enable it are asking for trouble. The fix from bug 1670545 adds noisy warning for almost all our users and fills up our bug reports so I'm against keeping it forever. But we may find some better way to handle this.

Flags: needinfo?(robert.mader)

(In reply to Robert Mader [:rmader] from comment #35)

But we may find some better way to handle this.

How about comment 31?

How about asserting (with a nice message) that mesa_glthread is disabled rather than crashing?

Now that i know what the issue is, I’m fine with removing the setting for my user.
I think this was created by some configuration utility wrt mangohud (gl/vulkan stats overlay).

(In reply to norbert.pfeiler from comment #37)

How about asserting (with a nice message) that mesa_glthread is disabled rather than crashing?

The issue is that there are multiple ways to enable it - we didn't catch this one so the driver went crashing :/

(In reply to Darkspirit from comment #36)

(In reply to Robert Mader [:rmader] from comment #35)

But we may find some better way to handle this.

How about comment 31?

Hm, not a fan of MESA_DEBUG=silent - the point is that mesa usually only prints stuff if it wants users to be aware that there's something odd with their setup. That's the reason why they print a loud warning when setting mesa_glthread via env var - sadly it doesn't when the option is set via driconf. Setting MESA_DEBUG=silent would mean more hard to debug reports because people wouldn't be able to see such warnings.

Maybe there's some simple way to detect the setting, however overall I'm somewhat against us trying to work around all kinds of buggy setups. IIUC enabling the gpu process should prevent crashes in such a scenario and allow a fallback to software rendering. Given that mesa_glthread seems to work fine on Wayland (where enabling the gpu process is way harder), I hope we can do that soon on X11.

The regressing bug 1744389 was backed out for 96.0rc2; see below:
Backout link

I believe I'm now seeing something related to this with Ubuntu 20.04.3. It doesn't crash, so much as the profile manager just kind of hangs and renders a transparent window.

(In reply to Christopher Smith from comment #41)
If you have the Nvidia driver installed, you are seeing bug 1745172.

Has Regression Range: --- → yes
Has STR: --- → yes

(In reply to Darkspirit from comment #42)

(In reply to Christopher Smith from comment #41)
If you have the Nvidia driver installed, you are seeing bug 1745172.

I do, and it was. Thanks.

ATTENTION: default value of option mesa_glthread overridden by environment.
was EGL/X11-only.

  • As no crash report is generated, we don't know how many X11 users ran into bug 1670545.
  • Could X11 users who are annoyed by the log entry just set MESA_DEBUG=silent themselves (at least for the moment) or switch to Wayland?
  • Nightly+Beta Xwayland users have been switched to Wayland now (bug 1749174).
  • bug 1653444: Shouldn't X11 and Wayland be as close as possible? IIUC, a corresponding Wayland GPU process (bug 1732951) would break gfx.webrender.compositor.force-enabled (bug 1617498) as it is right now because the parent process would need to act as Wayland proxy server?

This bug can be closed as bug 1744389 was backed out from trunk. In case we don't find a nice solution it'll be another case of "fixed by Wayland" ¯_(ツ)_/¯

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch