Closed Bug 124377 Opened 23 years ago Closed 23 years ago

LiveLock in Xsun

Categories

(Core Graveyard :: X-remote, defect)

Sun
SunOS
defect
Not set
minor

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: animalfriend, Assigned: blizzard)

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:0.9.8) Gecko/20020205
BuildID: 2002020508

(I have a workaround for this bug, see below.)

When running Mozilla-0.9.8 for some time, the whole application freezes, using 100% CPU, half of it spent in Xsun and the rest in mozilla-bin. There is no error message, and this state continues until Mozilla is killed (SIGTERM or Ctrl-C). Unfortunately, no specific action triggers this behaviour; it often happens when I change to a new page, but sometimes also when scrolling a (rather simple) page. I have seen this happen on SunRays (aka X terminals) and Ultra3 stations.

Reproducible: Sometimes

Steps to Reproduce:
1. Go to the gif-buster
2. Wait a bit; the freeze normally occurs after 4-10 reloads

Actual Results: Freeze, have to kill the process. After restarting, I noticed:
- My history is blank (interrupted during page load when the history file is updated?)
- I have a file with length 0 in my disk cache (lots of space free, so the disk is not the problem)

Expected Results: Anything but a lockup. ;-)

Workaround: Start mozilla with the --sync option (synchronized X commands). This seems to hurt performance (networked X) but makes mozilla rock solid.

Environment:
----------
sysinfo
Manufacturer (Short) is Sun
Manufacturer (Full) is Sun Microsystems
System Model is Fire 280R
Main Memory is 4.0 GB
Virtual Memory is 7.2 GB
Number of CPUs is 2
CPU Type is sparcv9+vis2
App Architecture is sparc
Kernel Architecture is sun4u
OS Name is SunOS
OS Version is 5.8
OS Distribution is Solaris 8 ... SPARC
Kernel Version is ..... 64-bit
----------
$DISPLAY is set to ":6.0" (SunRay Terminal)

Freezing appears less frequently on single-CPU systems (Ultra3), so maybe there is a concurrency problem.
Maybe this bug is connected with 102552
1. Could you try updating your GDK/GTK+ library versions, please?
2. Did you apply the recommended patches for Solaris 2.8?

(OT: SunRays are not X terminals; they are a completely different architecture...)
1. Cannot update GTK+, because I am a student and this is my University's server (no admin). GTK is 1.2.6
2. I checked the patch status; nearly all are installed and up to date (5 lagging behind by 1 version or so).

I was able to track the bug down a bit by running truss mozilla --no-shm (I suspected the bug was connected with that). Here's a snippet:
----------
18129: open("/tmp/.X11-pipe/X6", O_RDWR) = 6
18129: fstat(6, 0xFFBEEAA0) = 0
18129: uname(0xFFBEE900) = 1
18129: fcntl(6, F_SETFD, 0x00000001) = 0
18129: access("/home/cip/96/jnlukasc/.Xauthority", 4) = 0
18129: open("/home/cip/96/jnlukasc/.Xauthority", O_RDONLY) = 7
18129: fstat64(7, 0xFFBEE9C8) = 0
18129: ioctl(7, TCGETA, 0xFFBEE954) Err#25 ENOTTY
18129: read(7, "\0\0\00483BC1E u\001 6\0".., 8192) = 8192
18129: read(7, " s w s k 4 G j 2 K o / 3".., 8192) = 5777
18129: read(7, 0x0007FE4C, 8192) = 0
18129: llseek(7, 0, SEEK_CUR) = 13969
18129: close(7) = 0
18129: writev(6, 0xFFBEF008, 4) = 48
18129: fstat64(6, 0xFFBEEE98) = 0
18129: fcntl(6, F_SETFL, 0x00000080) = 0
18129: read(6, "01\0\0\v\0\0\0 *", 8) = 8
18129: read(6, "\0\019\n04\0\0\0\0 ?FFFF".., 168) = 168
18129: write(6, " 7\0\00504\0\0\0\0\0\0 %".., 64) = 64
18129: read(6, "01\0\002\0\0\0\0\0\0\0\0".., 32) = 32
18129: read(6, "01\b\003\0\00219\0\0\01F".., 32) = 32
18129: readv(6, 0xFFBEF020, 2) = 2148
18129: writev(6, 0xFFBEEEA4, 3) = 20
18129: read(6, "01\0\004\0\0\0\0\0\0\0\0".., 32) = 32
18129: getpid() = 18129 [18118]
18129: writev(6, 0xFFBEEF94, 3) = 20
18129: read(6, "01\0\005\0\0\0\00188\0\0".., 32) = 32
18129: writev(6, 0xFFBEEE64, 3) = 20
18129: read(6, "01\0\006\0\0\0\00188\0\0".., 32) = 32
18129: getuid() = 30165 [30165]
18129: writev(6, 0xFFBEEE54, 3) = 32
18129: read(6, "01\0\0\b\0\0\0\00194 [\0".., 32) = 32
18129: write(6, "9401\002\001\0\0", 8) = 8
18129: read(6, "01\0\0\t\0\0\0\0\0\0\010".., 32) = 32
18129: open("/tmp/.X11-sme16", O_RDWR) = 7
18129: mmap(0x00000000, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) = 0xFE5B0000
18129: close(7) = 0
18129: unlink("/tmp/.X11-sme16") = 0
18129: write(6, "9402\002\001\0\0", 8) = 8
18129: read(6, "01\0\0\n\0\0\0\0\0\0\0\0".., 32) = 32
18129: uname(0xFFBEEA88) = 1
18129: write(6, " ", 1) = 1
18129: read(6, "019B\00F\0\0\0\0\0\0\091".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\011\0\0\0\0\0\0\095".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\012\0\0\0\0\0\0\0D6".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\013\0\0\0\0\0\0\096".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\014\0\0\0\0\0\001 W".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN
----------

I am not an X expert, but I'd say that MIT-SHM gets used (huh?). EAGAIN means "out-of-process", strange in a read IMHO. In the LiveLock, there are lots of read(6) operations with EAGAIN.

Am I the only one with this problem? What else could cause this (KDE-2.2)? With the --sync workaround Mozilla is rock solid for me (better than <4.7), so I'd say this is a minor glitch; maybe just document it in the release notes.

OT: The difference between a SunRay and an X terminal is IMHO only that no Unix runs on the appliance, just the X server, but the principle is the same.
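To put a number on "lots of read(6) operations with EAGAIN", a saved truss log could be tallied per file descriptor. The following is only a rough sketch: the function name and the sample lines are illustrative (the sample imitates the line shape of the snippet above and is not a real capture), and it assumes one system call per line.

```python
import re
from collections import Counter

# Matches truss lines of the shape seen in this report, for example:
#   18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN
EAGAIN_READ = re.compile(r"\bread\((\d+),.*?\)\s+Err#11 EAGAIN")


def count_eagain_reads(truss_text):
    """Return a Counter mapping file descriptor -> number of EAGAIN reads."""
    counts = Counter()
    for line in truss_text.splitlines():
        match = EAGAIN_READ.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Illustrative sample only, not taken from a real log.
sample = "\n".join([
    '18129: write(6, " ", 1) = 1',
    "18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN",
    "18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN",
    "18129: readv(6, 0xFFBEF020, 2) = 2148",
])

for fd, n in count_eagain_reads(sample).items():
    print(f"fd {fd}: {n} read() calls returned EAGAIN")
```

A high count on the X connection descriptor compared with successful reads would suggest the client is polling rather than waiting, but that alone does not show which side is at fault.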
> 1. Cannot update GTK+, because I am a student and this is my University's
> server (no admin). GTK is 1.2.6

AFAIK you can always install newer versions of GDK/GTK+ in your home directory and use them by setting LD_LIBRARY_PATH to the location of the newer libs.

----

> I am not an X expert, but I'd say that MIT-SHM gets used (huh?)

Uhm, no. The use of /tmp/.X11-sme16 means that Sun's shared memory _transport_ is being used instead of CPU-intensive piping of data (packets are exchanged via shared memory).

----

> EAGAIN means "out-of-process", strange in a read IMHO. In the LiveLock,
> there are lots of read(6) operations with EAGAIN.

From the read(2) manual page (% man -s2 read):
-- snip --
When attempting to read from an empty pipe (or FIFO):
[snip]
o If some process has the pipe open for writing and O_NONBLOCK is set, read() returns -1 and sets errno to EAGAIN.
-- snip --
This means the pipe simply has no new data - nothing special.

----

Do you know the current Xsun patch revision, e.g. what does
-- snip --
% showrev -p | fgrep 108652-
-- snip --
say?
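The read(2) behaviour quoted above can be reproduced in a few lines. This is a minimal sketch in Python rather than C, using an ordinary pipe instead of the X connection; `read_nonblocking` is a name made up for the example.

```python
import os


def read_nonblocking(fd, size=32):
    """Return the bytes read, or None if the descriptor has no data yet."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # Python raises BlockingIOError when read() fails with EAGAIN
        # (or EWOULDBLOCK) on a descriptor that has O_NONBLOCK set.
        return None


read_end, write_end = os.pipe()
os.set_blocking(read_end, False)  # same effect as setting O_NONBLOCK

print(read_nonblocking(read_end))  # pipe is empty, writer still open
os.write(write_end, b"x")
print(read_nonblocking(read_end))  # data has arrived
```

A single EAGAIN is harmless, as noted above. A client that retries such a read in a tight loop without waiting in select() or poll() would burn CPU, which would fit the reported symptoms, although nothing in this report establishes that this is what actually happens.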
I forgot to mention something: every glib/GDK/GTK+ version below 1.2.8 will result in a more or less unstable mozilla (unless you use an Xlib toolkit Zilla, which does not rely on the GDK/GTK+ libraries) ...
1. Tried that, not better. (I compiled 1.2.10 of glib and gtk, used the LD_ variable, and checked with ldd that they were getting used; zilla was functional but locked up after some time.) Please add the gtk version to the README then (the requirements section for Solaris only lists patches).
2. The problem must be linked with the transport somehow. I used loopback and it ran stable (is loopback faster than sync???).
3. Uh, sorry about that read() thing (5 years of CS study wasted). I also saw a SIGALRM on some mutex that is repeated in the truss output - would that help? Or should I add the truss output as an attachment?
4. Patch is -47. Probably the recommended patch cluster; pretty much everything else is up to date.
5. It could be that the Solaris version was built on campus. I will try to get in contact with the admins to see if they did something special (==wrong). Am I the only installation with this problem????
6. A friend just tested with KDE1, same error, so KDE2 is not the culprit either.
Just chiming in with a me too. I noticed this same behavior when upgrading patches from 108652-40 to 108652-46. As Frankenstein would say: -40 good, -46 bad. When -47 came out, I tried that and it exhibits the same behavior. I am willing to collect more info on machine state with either patch installed if someone gives me some direction on what they want (truss, etc.). I have backed out to patch level -40 and mozilla runs fine. I should also say that this problem has persisted through three milestones of mozilla, 0.9.6-.8 (I tend to use the precompiled one provided by mozilla.org).
Thank god someone else has this problem and I'm not crazy. I can confirm your findings: 0.9.6 and 0.9.8 (self-compiled and precompiled) both had the same problem, and the newest gtk didn't help either. I built gtk myself using POSIX threads as stated on this site. I cannot back out the Xsun patch (without getting killed by root), but if that fixes the problem, I guess we have tracked it down to a Sun X bug. Agreed? Or are there precompiled and proven-to-work gtk libs somewhere that we should test? Add DISPLAY reroute code to mozilla.sh as a patch for the Sun architecture for 1.0???
Well, you could check whether the GDK/GTK+-free Xlib toolkit in Zilla also suffers from this problem. Simply build it with:
-- snip --
# unpack source tarball
% ./configure --enable-defaut-toolkit=xlib
% gmake
-- snip --
If this Zilla works, then the GDK/GTK+ libraries (or the GDK/GTK+-specific code in Mozilla) have a bug ...
Summary: LiveLock in XSun → LiveLock in Xsun
BTW: There is a minor typo in my last comment. It's
-- snip --
configure --enable-default-toolkit=xlib
-- snip --
An 'l' was missing...
We too have been seeing this issue. We noticed it when a 10/1 MU version was rock solid (108652-38) and the latest recommended patches kept hanging Xsun with both Mozilla and OpenOffice. We were at 108652-51 (the latest on the Sun site); users reported hangs frequently (sometimes several per hour). We are backing down to -40 to see if that fixes the problem.
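The revisions of patch 108652 mentioned so far can be collected into a small lookup for checking a machine's patch listing. This is a sketch only: the good/bad sets reflect nothing more than the reports in this thread, the function names are invented for the example, and the sample string is an illustrative fragment rather than real showrev output.

```python
import re

# Revisions of Xsun patch 108652 as reported in this thread (not an official list).
REPORTED_STABLE = {38, 40}
REPORTED_HANGING = {46, 47, 51}


def xsun_patch_revision(patch_listing):
    """Return the highest 108652 revision mentioned in the text, or None."""
    revisions = [int(r) for r in re.findall(r"\b108652-(\d+)", patch_listing)]
    return max(revisions) if revisions else None


def classify(revision):
    """Map a revision number to what commenters in this thread reported."""
    if revision in REPORTED_STABLE:
        return "reported stable"
    if revision in REPORTED_HANGING:
        return "reported to hang"
    return "no report in this thread"


# Illustrative fragment only; real listings contain more fields.
sample = "Patch: 108652-47"
rev = xsun_patch_revision(sample)
print(rev, classify(rev))
```

Revisions between -40 and -46 are not mentioned by anyone, so the point at which the behaviour changed cannot be narrowed further from these reports.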
Brian, did you try changing $DISPLAY to use local loopback (like "hostname:0.0")? Did that fix the problem? What would be the downside of going back to patch level -40? IMHO the performance impact is not that big. Can anyone check whether the problem is known at Sun? I don't know how to get access to their bug database.
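The loopback workaround amounts to putting a host name in front of a local display string, so that ":6.0" becomes "hostname:6.0". A minimal sketch of that rewrite follows; `loopback_display` and the host name "myhost" are hypothetical, and it assumes the X server accepts TCP connections and that X authorization permits them.

```python
import re


def loopback_display(display, hostname):
    """Turn a local display such as ':6.0' into 'hostname:6.0'.

    Displays that already name a host are returned unchanged.
    """
    match = re.fullmatch(r"(?:unix)?(:\d+(?:\.\d+)?)", display)
    if match is None:
        return display
    return hostname + match.group(1)


# "myhost" is a placeholder; a launcher would typically pass socket.gethostname().
print(loopback_display(":6.0", "myhost"))
print(loopback_display("otherhost:0.0", "myhost"))
```

A wrapper script could apply this to the DISPLAY environment variable before starting the browser, which is roughly what the "DISPLAY reroute" idea for mozilla.sh would do.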
This bug seems to be FIXED as of Mozilla 1.0! I had the gif-buster URL running for several minutes without the $DISPLAY workaround and everything ran smoothly.

Configuration:
- Mozilla 1.0
- gtk 1.2.10
- glib 1.2.10
- Xsun patch revision: 108652-53

I don't know which of these fixed the issue; maybe it was the combination.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard