786383 - OS crashes when opening a page

Reporter

Description

12 years ago

Attached file relevant info from messages, Xorg.0.log and .xsession-errors (deleted)

Occasionally, and especially soon after resuming from suspend, Minefield crashes and takes the whole OS with it. (On another computer, I think it just kills the X server, but I'm not sure it is the same issue.) I realize that a report like this is almost useless, but I am reporting it because I did manage to find some relevant information in /var/log/messages, Xorg.0.log.old, and .xsession-errors.old. Most of this looks like problems with the video system, but I am reporting it here because the problem happens only when I open a page with Minefield (and once when I closed a tab, but I'm not sure about that). After I reboot, I can always open the same page that "caused" the crash with no problem.

Scoobidiver (away)

Comment 1

12 years ago

Minefield no longer exists. It's now called Nightly: http://nightly.mozilla.org/ Can you provide a valid stack trace (see https://developer.mozilla.org/docs/How_to_get_a_stacktrace_for_a_bug_report)?

Severity: normal → critical

Keywords: crash

Jonathan Baron

Reporter

Comment 2

12 years ago

(In reply to Scoobidiver from comment #1)
> Minefield no longer exists. It's now called Nightly:
> http://nightly.mozilla.org/

I guess I knew this.

> Can you provide a valid stack trace (see
> https://developer.mozilla.org/docs/How_to_get_a_stacktrace_for_a_bug_report)?

I have already looked, and it seems that the answer is no. The crash takes down the entire operating system before a stack trace can be written. And of course I cannot reliably reproduce the crash; things work most of the time. It also seems that I have abrt running:

root@barber ~ > systemctl list-units | grep abrtd
abrtd.service loaded active running ABRT Automated Bug Reporting Tool

So I guess the crash also happens before abrt can catch it. Any other ideas?

Scoobidiver (away)

Updated

12 years ago

Summary: crashes when opening a page → OS crashes when opening a page

Jonathan Baron

Reporter

Comment 3

12 years ago

Additional information. This seems like the same bug. http://lists.freedesktop.org/archives/dri-devel/2012-June/024091.html It does seem to happen reliably when I start Nightly after a resume. I'm not sure that this is the only time it happens. And the crash does not happen until I start Nightly. That is the only thing that triggers it.

Scoobidiver (away)

Updated

12 years ago

Component: General → Graphics

Product: Firefox → Core

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 4

12 years ago

First we need to know what driver you're using. From the log it appears to be Nouveau. Can you give more details: what exact Nouveau and Mesa versions? Do you have the Nouveau GL driver installed or only 2D? If you have GL, try:

glxinfo | egrep vendor\|renderer\|version

Does your crash reproduce with default preferences (i.e. in a clean profile) or did you toggle some preference, such as layers.acceleration.force-enabled?
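To collect the same details from a script (for example on several machines), the `egrep` filter above can be mirrored in Python. This is a minimal sketch, not Mozilla tooling: `filter_glx_info` and `parse_glx_info` are hypothetical helpers, and they take the text of `glxinfo` output as input rather than running `glxinfo` themselves.

```python
import re

# Same pattern as `egrep vendor\|renderer\|version`: keep any line that
# mentions one of the three words (case-sensitive, like egrep by default).
_PATTERN = re.compile(r"vendor|renderer|version")


def filter_glx_info(text):
    """Return the lines of glxinfo output that mention vendor, renderer or version."""
    return [line.strip() for line in text.splitlines() if _PATTERN.search(line)]


def parse_glx_info(lines):
    """Split 'key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

Fed the output quoted in comment 5, `parse_glx_info` would map "OpenGL renderer string" to "Gallium 0.4 on NV86", which identifies the Gallium Nouveau driver in use.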

Jonathan Baron

Reporter

Comment 5

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #4)
> First we need to know what driver you're using. From the log it appears to
> be Nouveau. Can you give more details: what exact Nouveau and Mesa versions?

From lspci:

nVidia Corporation G86 [Quadro NVS 290] (rev a1) (prog-if 00 [VGA controller])
Subsystem: nVidia Corporation Device 0492

> Do you have the Nouveau GL driver installed or only 2D? If you have GL, try:
>
> glxinfo | egrep vendor\|renderer\|version

baron@barber ~ > glxinfo | egrep vendor\|renderer\|version
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
GLX version: 1.4
OpenGL vendor string: nouveau
OpenGL renderer string: Gallium 0.4 on NV86
OpenGL version string: 2.1 Mesa 8.0.3
OpenGL shading language version string: 1.20

(I don't know whether this answers your question.)

> Does your crash reproduce with default preferences (i.e. in a clean profile)
> or did you toggle some preference, such as layers.acceleration.force-enabled?

I find that I can now reproduce the crash reliably: I close Nightly, suspend the computer, wake it up, and try to start Nightly. That does it. Other things may trigger it too, but this is reliable. I just did this with a completely new profile, so I did not toggle any preferences.

I also checked ps and lsmod to see whether anything was different before and after suspend, thinking that maybe "resume" was the problem, and was unable to find anything. But clearly "resume" is not working properly. On the other hand, I have been suspending this computer every night for several years with no problem. Anything else I should check along these lines?

Jonathan Baron

Reporter

Comment 6

12 years ago

https://bugs.archlinux.org/task/31338 seems to be the same bug. It links to this, which is also the same: http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/01611.html

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 7

12 years ago

Thanks.

Can you try this: go to about:config, set gfx.xrender.enabled to false, and restart the browser. Does the problem persist?

Also, very recently (yesterday) a fix landed that changes some of the OpenGL probing that Firefox does unconditionally on startup to detect system information. That was done to avoid X server crashes on certain drivers. See bug 680644. Please retry with today's Nightly build, as it should be the first build with the fix.

Jonathan Baron

Reporter

Comment 8

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #7)
> Thanks.
>
> Can you try this: go to about:config, set gfx.xrender.enabled to false,
> restart the browser. Does the problem persist?

Yes.

> Also, very recently (yesterday) a fix landed that changes some OpenGL stuff
> that Firefox does unconditionally on startup to detect system information.
> That was done to avoid X server crashes on certain drivers. See bug 680644.

This looks pretty different, unfortunately.

> Please retry with today's Nightly build, as it should be the first build
> with the fix.

That didn't help either. I waited until I thought the new version (8-30) came out and tried this first, before changing gfx.xrender.enabled. Neither change helped.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 9

12 years ago

OK.

The only way we could understand more about this problem is by using some system tracing tool to record a trace of what happens just before the system crash.

Two things come to mind:

- strace can help you record low-level system calls
- I would also look into recording X11 activity: try to see if there exists some X11 tracing tool out there.

Jonathan Baron

Reporter

Comment 10

12 years ago

Attached file output of strace -o trace /home/baron/firefox/firefox (deleted)

Jonathan Baron

Reporter

Comment 11

12 years ago

I replied already, then "added" the attachment, but now the reply did not show up. I can never figure out how to do this correctly.

(In reply to Benoit Jacob [:bjacob] from comment #9)
> OK.
>
> The only way we could understand more about this problem is by using some
> system tracing tool to record a trace of what happens last just before the
> system crash.
>
> 2 things come to mind:
>
> - strace can help you record low-level system calls

I attached the output of strace. The crash happened as usual, and I let it run through what seemed to be two attempts to restart X11.

> - I would also look into recording X11 activity: try to see if there exists
> some X11 tracing tool out there.

I looked and could not find one.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 12

12 years ago

The only relevant thing I see in the strace is this at the end:

write(2, "firefox: Fatal IO error 0 (Succe"..., 52) = 52

But this only writes an error message about an IO error; it is not the IO error itself. I was hoping that the IO error itself would show in strace, but that's not the case.

A Google search for X11 trace gave this: http://xtrace.alioth.debian.org/
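When a trace is long, the kind of line pointed to here (a write to file descriptor 2, i.e. stderr) can be pulled out mechanically. A minimal Python sketch, assuming the single-process strace line format shown in this comment (`syscall(args) = result`, no PID prefix); `stderr_writes` is a hypothetical helper, not part of strace.

```python
import re

# Matches lines such as:  write(2, "firefox: Fatal IO error 0 (Succe"..., 52) = 52
# strace shortens long string arguments and marks that with "..." after the quote.
_WRITE_STDERR = re.compile(
    r'^write\(2, "(?P<text>(?:[^"\\]|\\.)*)"(?P<truncated>\.\.\.)?, '
    r'(?P<count>\d+)\)\s*=\s*(?P<result>-?\d+)'
)


def stderr_writes(trace_lines):
    """Yield (text, was_truncated, bytes_written) for each write(2, ...) call."""
    for line in trace_lines:
        m = _WRITE_STDERR.match(line)
        if m:
            yield m.group("text"), m.group("truncated") is not None, int(m.group("result"))
```

strace only follows processes that the traced program forks when it is run with `-f`; without it, the trace covers just the process that was started.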

Jonathan Baron

Reporter

Comment 13

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #12)
> A Google search for X11 trace gave this:
> http://xtrace.alioth.debian.org/

Sorry, I did the Google search but missed this one. (There were others that seemed not to have a way of installing on my system.) I will attempt to attach the output of

xtrace /home/baron/firefox/firefox > xtrace-output

up to the time of the crash, in the next comment.

Jonathan Baron

Reporter

Comment 14

12 years ago

Attached file output of xtrace /home/baron/firefox/firefox > xtrace-output (deleted)

Jonathan Baron

Reporter

Comment 15

12 years ago

I discovered that abrt (the Automatic Bug Reporting Tool in Fedora 17) has actually been working, sort of. It saves a lot of information but, as far as I know, does not actually report anything. I thought that it wasn't working at all. So I have a whole bunch of information from the last crash, and I wonder whether any of the following might be relevant (before I just send it all). The information seems to be about a crash in Xorg and does not mention firefox.

abrt_version component executable package pkg_release
usr_share_xorg_conf_d.tar.gz
analyzer count hostname pkg_arch pkg_version
uuid
architecture duphash kernel pkg_epoch reason
Xorg.0.log
backtrace etc_X11_xorg_conf_d.tar.gz os_release pkg_name time

#####

Another thought: Why does this happen after suspend/resume? Clearly there is another bug, not in Firefox, that is causing the resume to be incomplete in some way. I have looked quite a bit and found nothing so far. For example, all the same kernel modules are loaded before and after suspend/resume. All the same processes are running. The output of "systemctl list-units" is the same. (Still, Firefox is the only thing that causes the crash, and note that others have reported the same problem, although not here.)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 16

12 years ago

> abrt_version component executable package pkg_release
> usr_share_xorg_conf_d.tar.gz
> analyzer count hostname pkg_arch pkg_version
> uuid
> architecture duphash kernel pkg_epoch reason
> Xorg.0.log
> backtrace etc_X11_xorg_conf_d.tar.gz os_release pkg_name time

This looks like mostly "column titles" in a table without the following table contents, except for the X-related filenames, which indeed point to an issue in X.

>
> #####
>
> Another thought: Why does this happen after suspend/resume? Clearly there is
> another bug, not in Firefox, that is causing the resume to be incomplete in
> some way. I have looked quite a bit and found nothing so far. For example,
> all the same kernel modules are loaded before and after suspend/resume. All
> the same processes are running. The output of "systemctl list-units" is the
> same.

No idea; but driver/X bugs on suspend/resume are not rare. I have some right here with the proprietary NVIDIA driver.

Jonathan Baron

Reporter

Comment 17

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #16)
> > abrt_version component executable package pkg_release
> > usr_share_xorg_conf_d.tar.gz
> > analyzer count hostname pkg_arch pkg_version
> > uuid
> > architecture duphash kernel pkg_epoch reason
> > Xorg.0.log
> > backtrace etc_X11_xorg_conf_d.tar.gz os_release pkg_name time
>
> This looks like mostly "column titles" in a table without the following
> table contents; except for the X-related filenames indeed which point to an
> issue in X.

Sorry for not being clear. This is a listing of a directory. The file backtrace, for example, looks like this:

0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x4652a6]
1: /usr/bin/Xorg (mieqEnqueue+0x26b) [0x5514ab]
2: /usr/bin/Xorg (0x400000+0x47f02) [0x447f02]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f4421dbd000+0x60e4) [0x7f4421dc30e4]
4: /usr/bin/Xorg (0x400000+0x80787) [0x480787]
5: /usr/bin/Xorg (0x400000+0xa4a80) [0x4a4a80]
6: /lib64/libpthread.so.0 (0x38c4600000+0xefe0) [0x38c460efe0]
7: /lib64/libc.so.6 (ioctl+0x7) [0x38c3eea2f7]
8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x38ddc03548]
9: /lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x38ddc0577b]
10: /lib64/libdrm_nouveau.so.1 (0x7f4425b77000+0x3085) [0x7f4425b7a085]
11: /lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0x103) [0x7f4425b7a6b3]
12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f4425d99000+0x6718) [0x7f4425d9f718]
13: /usr/lib64/xorg/modules/libexa.so (0x7f4424cf2000+0xb007) [0x7f4424cfd007]
14: /usr/bin/Xorg (0x400000+0x160483) [0x560483]
15: /usr/bin/Xorg (0x400000+0xc9d50) [0x4c9d50]
16: /usr/bin/Xorg (0x400000+0xfa8da) [0x4fa8da]
17: /usr/bin/Xorg (0x400000+0x3444a) [0x43444a]
18: /usr/bin/Xorg (0x400000+0x23485) [0x423485]
19: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x38c3e21735]
20: /usr/bin/Xorg (0x400000+0x2375d) [0x42375d]
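A backtrace like this one is easier to read once it is split into frames and grouped by library. The sketch below assumes the frame format shown in this comment (`index: module (symbol+offset) [address]`, with a load address in place of the symbol when none is known); `parse_backtrace` and `modules_in_order` are hypothetical helpers, and the grouping is only a reading aid, not a diagnosis.

```python
import re

# One frame of an Xorg backtrace, e.g.
#   8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x38ddc03548]
#   2: /usr/bin/Xorg (0x400000+0x47f02) [0x447f02]   <- no symbol, only a base address
_FRAME = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>[^+()]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_backtrace(text):
    """Return a list of (index, module, symbol_or_None) tuples, one per frame line."""
    frames = []
    for line in text.splitlines():
        m = _FRAME.match(line)
        if not m:
            continue
        base = m.group("base")
        # When no symbol is known, the module's load address is printed instead.
        symbol = None if base.startswith("0x") else base
        frames.append((int(m.group("index")), m.group("module"), symbol))
    return frames


def modules_in_order(frames):
    """Collapse consecutive frames from the same module, innermost frame first."""
    order = []
    for _, module, _ in frames:
        if not order or order[-1] != module:
            order.append(module)
    return order
```

Applied to frames 7 to 12, the grouping runs from libc's `ioctl` through libdrm and libdrm_nouveau into the Nouveau X driver.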

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 18

12 years ago

Ah, OK. I still don't see anything more there than "this is an issue with X". But your xtrace output has more precise information. The last lines are:

001:<:001d: 16: Request(98): QueryExtension name='XFIXES'
001:>:001d:32: Reply to QueryExtension: present=true(0x01) major-opcode=147 first-event=98 first-error=158
001:<:001e: 12: XFIXES-Request(147,0): QueryVersion major version=5 minor version=0
001:>:001e:32: Reply to QueryVersion: major version=5 minor version=0
001:<:001f: 16: XFIXES-Request(147,5): CreateRegion region=0x02c00004 rectangles={x=0 y=0 w=16 h=16};
001:<:0020: 20: DRI2-Request(137,6): CopyRegion drawable=0x02c00002 region=0x02c00004 dest=FrontLeft(0x00000000) src=FakeFrontLeft(0x00000007)
001:>:0020:32: Reply to CopyRegion:
001:<:0021: 8: XFIXES-Request(147,10): DestroyRegion region=0x02c00004
001:<:0022: 8: GLX-Request(153,4): glXDestroyContext context=0x02c00003
001:<:0023: 8: Request(4): DestroyWindow window=0x02c00002
001:<:0024: 8: Request(79): FreeColormap cmap=0x02c00001
001:<:0025: 8: Request(60): FreeGC gc=0x02c00000
001:<:0026: 4: Request(43): GetInputFocus
001:>:0026:32: Reply to GetInputFocus: revert-to=Parent(0x02) focus=0x01e00037
*EOF*

As these are the last lines, it seems that the problem has something to do with the XFIXES extension. Could you try disabling it (maybe it's an option in xorg.conf)?

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 19

12 years ago

Also, the other question is: why is GLX being used there? The preceding lines in the log, with MOZILLA_COMMANDLINE, show that we are running our X11 initialization code, which can be either

http://mxr.mozilla.org/mozilla-central/source/widget/xremoteclient/XRemoteClient.cpp
or
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/remote/nsGTKRemoteService.cpp

This doesn't use GLX. Are you by any chance using an OpenGL-based compositing window manager? Could it be what's using XFIXES and GLX here, resulting in the crash? Could you check whether the crash reproduces with a non-OpenGL-compositing window manager?

Karl Tomlinson (:karlt)

Comment 20

12 years ago

xtrace run in this way wouldn't catch what the window manager was doing. Mesa uses XFIXES with DRI2.

Note that there are two X client connections in the log. 000 is the main Firefox process doing the remote communication. 001, with the last requests, is glxtest.cpp. (CopyRegion seems to be a result of dri2FlushFrontBuffer, which I guess happens when finished with the GL context.)

Attachment 656114 and "Fatal IO error" look like an X server crash, but the stack in comment 17 may be a hang. It does look like a problem with the graphics card/driver and suspend.
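The point about two interleaved client connections can be checked mechanically. A minimal sketch, assuming the xtrace line layout visible in comment 18 (`connection:direction:sequence:length: message`, with `<` on requests and `>` on replies); `split_by_connection` and `last_request` are hypothetical helper names.

```python
import re
from collections import defaultdict

# xtrace line layout, as seen in comment 18:
#   <connection>:<direction>:<sequence>:<length>: <message>
# '<' lines carry client requests, '>' lines carry server replies.
_XTRACE = re.compile(
    r"^(?P<conn>\d+):(?P<dir>[<>]):(?P<seq>[0-9a-fA-F]+):\s*(?P<length>\d+): (?P<msg>.*)$"
)


def split_by_connection(lines):
    """Group xtrace messages per X client connection, preserving their order."""
    by_conn = defaultdict(list)
    for line in lines:
        m = _XTRACE.match(line.strip())
        if m:  # lines such as "*EOF*" are skipped
            by_conn[m.group("conn")].append((m.group("dir"), m.group("msg")))
    return dict(by_conn)


def last_request(messages):
    """Return the last client request sent on one connection, or None."""
    for direction, msg in reversed(messages):
        if direction == "<":
            return msg
    return None
```

On the excerpt in comment 18, every line belongs to connection 001 (glxtest), and the last request sent there is `GetInputFocus`.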

Karl Tomlinson (:karlt)

Comment 21

12 years ago

Does running glxinfo (instead of firefox) after resume cause similar symptoms?

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 22

12 years ago

(In reply to Karl Tomlinson (:karlt) from comment #20)
> xtrace run in this way wouldn't catch what the window manager was doing.
> Mesa uses XFIXES with DRI2.
>
> Note that there are two X client connections in the log.
> 000 is the main Firefox process doing the remote communication.
> 001, with the last requests, is glxtest.cpp.

Oh! Got it now, thanks.

(In reply to Karl Tomlinson (:karlt) from comment #21)
> Does running glxinfo (instead of firefox) after resume cause similar
> symptoms?

That is indeed what I'd like to know, as glxtest.cpp does almost the same as glxinfo.

Jonathan Baron

Reporter

Comment 23

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #22)
> > Does running glxinfo (instead of firefox) after resume cause similar
> > symptoms?
>
> That is indeed what I'd like to know, as glxtest.cpp does almost the same as
> glxinfo.

Yes. glxinfo causes the crash. It doesn't look quite the same, but it has all the main features. After a reboot, glxinfo and firefox both work fine.

I suppose the workaround is to turn off GLX. Maybe I don't need it anyway.

I'd be interested to see if this is in fact a bug in firefox. Probably not, since Firefox no longer uniquely causes the crash.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 24

12 years ago

Thanks, that was very helpful.

In case you're interested in what's happening: on startup, Firefox uses GLX to query information about the driver, to check whether enabling certain graphics features is safe. This is what we called "glxtest" above. It is very similar to the glxinfo program. The fact that both show similar symptoms strongly confirms that this is a driver bug rather than an issue in either program.

It also means that GLX is really broken on this system (glxinfo is among the simplest and most common GLX-using programs), so you're better off disabling GLX anyway. An easy way to do that is to uninstall the Nouveau OpenGL driver (but do keep the Nouveau 2D driver). Alternatively, you can disable GLX in an xorg.conf file.

It would be interesting to debug this driver bug, but I don't know how to do that; and now that it reproduces with glxinfo, glxinfo is a much better testcase than Firefox. If you have the time, you could file a bug at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa and select Drivers/DRI/Nouveau for Component.
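The general idea of probing the driver in a separate process, so that a failure inside the probe does not take the caller down with it, can be sketched as follows. This is a Python illustration for a POSIX system, not Firefox's glxtest code; `probe` is a hypothetical stand-in for the GLX queries. As this bug shows, such isolation only protects the calling process: it cannot help when the X server or the kernel driver itself fails.

```python
import os


def run_probe_in_child(probe):
    """Run probe() in a forked child and return the text it reports, or None.

    The child sends its result through a pipe. If probe() raises or the child
    is killed, the parent simply gets None instead of failing itself.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        exit_code = 1
        try:
            os.write(write_fd, probe().encode())
            exit_code = 0
        finally:
            os._exit(exit_code)  # never fall back into the parent's code path

    os.close(write_fd)  # parent process: keep only the read end
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF: the child's write end is closed (it exited or died)
            break
        chunks.append(chunk)
    os.close(read_fd)

    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return b"".join(chunks).decode()
    return None
```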

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 25

12 years ago

On our side, it's hard to do anything as we crash precisely while trying to get the information that would tell us if we need to do something special. What we could do though is check if glxtest crashed, and in that case permanently disable it (and all the features that depend on it). The downside is that we would in this case no longer automatically re-enable these features when the driver bug gets fixed.
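The fallback proposed here (detect that the probe crashed, then disable it permanently) could look like the sketch below. It is a hypothetical illustration: the marker file, its name, and the helper names are assumptions, since the comment does not say how a "crashed" state would be stored. It also makes the stated downside concrete: once the marker exists, nothing re-enables the probe, even after the driver is fixed.

```python
import os


def probe_exited_cleanly(probe):
    """Run probe() in a forked child; True only if the child exited with status 0."""
    pid = os.fork()
    if pid == 0:  # child process
        exit_code = 1
        try:
            probe()
            exit_code = 0
        finally:
            os._exit(exit_code)
    _, status = os.waitpid(pid, 0)
    # A child killed by a signal (e.g. a segfault) is not WIFEXITED.
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def guarded_probe(probe, marker_path):
    """Run the probe unless an earlier crash was recorded at marker_path.

    Returns "disabled", "crashed" or "ok". After a crash the marker file stays,
    so every later start skips the probe and the features that depend on it.
    """
    if os.path.exists(marker_path):
        return "disabled"
    if not probe_exited_cleanly(probe):
        with open(marker_path, "w") as marker:
            marker.write("probe crashed; GL features disabled\n")
        return "crashed"
    return "ok"
```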

Jonathan Baron

Reporter

Comment 26

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #25)
> On our side, it's hard to do anything as we crash precisely while trying to
> get the information that would tell us if we need to do something special.
>
> What we could do though is check if glxtest crashed, and in that case
> permanently disable it (and all the features that depend on it). The
> downside is that we would in this case no longer automatically re-enable
> these features when the driver bug gets fixed.

In my opinion, which is not worth much since I'm just a bug reporter, it should be labeled NOTABUG (if that still exists). Now that you all have found what the problem is, the few people who suffer from this problem will do a Google search and find this bug report, which should be sufficient until the actual bug is fixed.

An alternative might be to put something in about:config to disable the use of glx. (I actually looked for such a thing.) Users might have other reasons to do this.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 27

12 years ago

(In reply to Jonathan Baron from comment #26)
> (In reply to Benoit Jacob [:bjacob] from comment #25)
> > On our side, it's hard to do anything as we crash precisely while trying to
> > get the information that would tell us if we need to do something special.
> >
> > What we could do though is check if glxtest crashed, and in that case
> > permanently disable it (and all the features that depend on it). The
> > downside is that we would in this case no longer automatically re-enable
> > these features when the driver bug gets fixed.
>
> In my opinion, which is not worth much since I'm just a bug reporter, it
> should be labeled NOTABUG (if that still exists).

That would be INVALID. We will do that unless we decide to do something here.

> Now that you all have
> found what the problem is, the few people who suffer from this problem will
> do a Google search and find this bug report, which should be sufficient
> until the actual bug is fixed.

Unfortunately, most users react differently when their browser repeatedly crashes on startup: they switch browsers. On the other hand, a good reason NOT to do as I proposed in comment 25 is that it would cause permanent degradation on systems that had a one-time issue.

>
> An alternative might be to put something in about:config to disable the use
> of glx. (I actually looked for such a thing.) Users might have other reasons
> to do this.

We can't easily do this because this glxtest thing has to run very early during startup, before we start reading preferences. But we could allow disabling it with an environment variable, and that would indeed be a good idea, as it would at least have no downside.
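An environment-variable switch of the kind proposed here needs nothing but the process environment, which is available before any preferences are read. A minimal sketch: the variable name comes from the patch attached in comment 28, but treating any non-empty value as "disable" is an assumption, and this is not the patch's code.

```python
import os

# Variable name taken from the patch title in this bug. Treating any
# non-empty value as "disable" is an assumption of this sketch.
AVOID_OPENGL_VAR = "MOZ_AVOID_OPENGL_ALTOGETHER"


def glxtest_enabled(environ=None):
    """Decide whether to run the startup GL probe, using only the environment.

    This check can run before any preferences are loaded, which is why an
    environment variable works here where an about:config pref could not.
    """
    env = os.environ if environ is None else environ
    return not env.get(AVOID_OPENGL_VAR)
```

Under that assumption, a user would opt out by starting the browser as `MOZ_AVOID_OPENGL_ALTOGETHER=1 firefox`.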

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 28

12 years ago

Attached patch MOZ_AVOID_OPENGL_ALTOGETHER env var (deleted)

Attachment #658950 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Comment 29

12 years ago

Comment on attachment 658950
MOZ_AVOID_OPENGL_ALTOGETHER env var

Nice and simple.

Attachment #658950 - Flags: review?(karlt) → review+

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 30

12 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/5c5001289c36

Assignee: nobody → bjacob

Target Milestone: --- → mozilla18

Ryan VanderMeulen [:RyanVM]

Comment 31

12 years ago

https://hg.mozilla.org/mozilla-central/rev/5c5001289c36

Status: NEW → RESOLVED

Closed: 12 years ago

Flags: in-testsuite-

Resolution: --- → FIXED

Attachments:

relevant info from messages, Xorg.0.log and .xsession-errors (Jonathan Baron, 12 years ago, text/plain, deleted)
output of strace -o trace /home/baron/firefox/firefox (Jonathan Baron, 12 years ago, text/plain, deleted)
output of xtrace /home/baron/firefox/firefox > xtrace-output (Jonathan Baron, 12 years ago, text/plain, deleted)
MOZ_AVOID_OPENGL_ALTOGETHER env var (Benoit Jacob [:bjacob], 12 years ago, patch, deleted; karlt: review+)