Closed Bug 1694 Opened 26 years ago Closed 26 years ago

SunOS 5.6, Gcc 2.8.1, Latest pitches in Threads

Categories

(NSPR :: NSPR, defect, P2)

Sun
Solaris
defect

Tracking

(Not tracked)

CLOSED INVALID

People

(Reporter: igb, Assigned: wtc)

Details

Building from a cvs snapshot, I consistently get the following problem:

mungo:/u/igb/mozilla/mozilla/obj-sparc-sun-solaris2.6/dist/bin 08:10:49 (543) $ ./xpviewer
Segmentation Fault (core dumped)

Using GDB to look at the entrails, I see:

#0 0xef118bac in _hashKeyCompare (key1=
Cannot access memory at address 0xef7fffec.
) at ../../../xpcom/src/nsHashtable.cpp:31
31 static PR_CALLBACK PRIntn _hashKeyCompare(const void *key1, const void *key2) {
(gdb) bt
#0 0xef118bac in _hashKeyCompare (key1=
Cannot access memory at address 0xef7fffec.
) at ../../../xpcom/src/nsHashtable.cpp:31
Cannot access memory at address 0xef7fff74.

which to me, not currently doing development for a living, says the stack has got trampled. So I run it under GDB control, and for as long as I leave the break-point at prulock:207 I can do `cont' and it keeps going. Delete the break point and it pitches.

ian

Program received signal SIGSEGV, Segmentation fault.
0xee612754 in PR_Lock (lock=0xa6940) at prulock.c:208
208 PRThread *me = _PR_MD_CURRENT_THREAD();
Current language: auto; currently c

which isn't the same thing, but

(gdb) bt
#0 0xee612754 in PR_Lock (lock=0xa6940) at prulock.c:208
Cannot access memory at address 0xef7fffa4.

The stack's still broken. So, guessing that the problem's happening around that area, I set a breakpoint and re-run:

(gdb) break 207
Breakpoint 1 at 0xee612728: file prulock.c, line 207.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /u/igb/mozilla/mozilla/obj-sparc-sun-solaris2.6/dist/bin/./xpviewer
warning: Unable to find dynamic linker breakpoint function.
warning: GDB will be unable to debug shared library initializers
warning: and track explicitly loaded dynamic code.
Cannot insert breakpoint 1:
Temporarily disabling shared library breakpoints: 1
Breakpoint 1, PR_Lock (lock=0xc8220) at prulock.c:207
207 {
(gdb) bt
#0 PR_Lock (lock=0xc8220) at prulock.c:207
#1 0xef11c030 in nsRepository::FindFactory (aClass=@0x83f90, aFactory=0xeffff304) at ../../../xpcom/src/nsRepository.cpp:281
#2 0xef11cb80 in nsRepository::RegisterFactory (aClass=@0x83f90, aLibrary=0x83cc0 "libwidgetgtk.so", aReplace=0, aPersist=0) at ../../../xpcom/src/nsRepository.cpp:474
#3 0x2663c in NS_SetupRegistry () at ../../../../xpfe/xpviewer/src/nsSetupRegistry.cpp:140
#4 0x3b01c in nsViewerApp::SetupRegistry (this=0xaa880) at ../../../../xpfe/xpviewer/src/nsViewerApp.cpp:151
#5 0x3b0dc in nsViewerApp::Initialize (this=0xaa880, argc=1, argv=0xeffff554) at ../../../../xpfe/xpviewer/src/nsViewerApp.cpp:169
#6 0x3aa64 in main (argc=1, argv=0xeffff554) at ../../../../xpfe/xpviewer/src/nsBrowserMain.cpp:104
(gdb) step
PR_Lock (lock=0xc8240) at prulock.c:208
208 PRThread *me = _PR_MD_CURRENT_THREAD();
(gdb) step
213 PR_ASSERT(me != suspendAllThread);
(gdb) step
215 PR_ASSERT(!(me->flags & _PR_IDLE_THREAD));
(gdb) step
225 _PR_INTSOFF(is);
(gdb) step
227 PR_ASSERT(_PR_IS_NATIVE_THREAD(me) || _PR_MD_GET_INTSOFF() != 0);
(gdb) step
230 if (lock->owner == 0) {
(gdb) step
232 lock->owner = me;
(gdb) step
233 lock->priority = me->priority;
(gdb) step
235 PR_APPEND_LINK(&lock->links, &me->lockList);
(gdb) step
238 _PR_FAST_INTSON(is);
(gdb) step
239 return;
(gdb) step
307 }
(gdb) step
PR_EnterMonitor (mon=0xc8220) at prmon.c:80
80 mon->entryCount = 1;
(gdb) step
82 }
(gdb) step
nsRepository::FindFactory (aClass=@0x83f90, aFactory=0xeffff304) at ../../../xpcom/src/nsRepository.cpp:283
283 IDKey key(aClass);
Current language: auto; currently c++
(gdb) step
284 FactoryEntry *entry = (FactoryEntry*) factories->Get(&key);
(gdb) step
286 nsresult res = NS_ERROR_FACTORY_NOT_REGISTERED;
(gdb) step
298 PR_ExitMonitor(monitor);
(gdb) cont
Status: NEW → ASSIGNED
People have reported infinite recursion problems on Linux/x86. I myself got a stack overflow on Digital Unix V4.0D, which could be caused by infinite recursion. Now that you have also seen a corrupted stack, I think it's likely to be caused by the same infinite recursion.

In short, problems elsewhere caused an infinite recursion, which resulted in a crash in NSPR functions. This is my theory.

You can try the following. If it works, then we know it is due to the same infinite recursion problem (from Mike Shaver):

if you want to test your port, update widget/ with the datestamp of "1998-11-24 02:00", and all should work again. The nsBaseWidget changes since then have tripped some resize bugs in the GTK code, which case [sic] this stack death.

By "widget/", he's referring to the directory mozilla/widget.
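To make the theory above concrete: every nested call takes another slice of stack, so a recursion that never terminates eventually runs off the end, and the fault is reported in whichever function happens to be executing at that moment (here PR_Lock or _hashKeyCompare) rather than in the code that recursed. The following is only a rough sketch of that effect, not code from NSPR or Mozilla. The names deepest_frame and stack_bytes_used are made up for illustration, and the recursion is deliberately bounded so that it does not crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recurse `remaining` more levels and return the
 * address of a local variable in the deepest frame reached. */
static uintptr_t deepest_frame(int remaining)
{
    volatile char marker = 0;            /* forces a real stack slot in each frame */
    uintptr_t here = (uintptr_t)&marker;

    if (remaining > 0) {
        uintptr_t below = deepest_frame(remaining - 1);
        marker = (char)(below & 1);      /* work after the call, so it is not a tail call */
        return below;
    }
    return here;
}

/* Hypothetical helper: approximate number of stack bytes consumed by
 * `depth` nested calls. The absolute difference is used because the
 * stack may grow in either direction depending on the platform. */
static size_t stack_bytes_used(int depth)
{
    uintptr_t top, bottom;

    assert(depth >= 0);
    top = deepest_frame(0);
    bottom = deepest_frame(depth);
    return (size_t)(top > bottom ? top - bottom : bottom - top);
}
```

Calling stack_bytes_used with increasing depths should show the usage growing roughly in proportion to the depth. With no bound on the depth, the same growth would eventually exceed the thread's stack limit, which would be consistent with the "Cannot access memory" backtraces reported in this bug.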
I think there's more to it than the fix you suggest. Simply checking out mozilla/widget at the date quoted breaks stuff in layout/events/src. An example follows. I think NS_KEY_PRESS and the associated mKeyListener->KeyPress(*aDOMEvent) have been added to, for example, layout/events/src/nsEventListenerManager.cpp since the widget changes were made.

ian

make[3]: Entering directory `/u/igb/mozilla/mozilla/obj-sparc-sun-solaris2.6/layout/events/src'
/usr/local/gcc-2.8.1/bin/g++ -o nsEventListenerManager.o -c -DXP_UNIX -g -fPIC -DUSE_AUTOCONF=1 -DMOZILLA_CLIENT=1 -DBROKEN_QSORT=1 -DSTDC_HEADERS=1 -DHAVE_ST_BLKSIZE=1 -DHAVE_ST_RDEV=1 -DHAVE_TZNAME=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_SYS_WAIT_H=1 -DTIME_WITH_SYS_TIME=1 -DHAVE_FCNTL_H=1 -DHAVE_LIMITS_H=1 -DHAVE_MALLOC_H=1 -DHAVE_STRINGS_H=1 -DHAVE_UNISTD_H=1 -DHAVE_SYS_FILE_H=1 -DHAVE_SYS_IOCTL_H=1 -DHAVE_SYS_TIME_H=1 -DHAVE_SYS_CDEFS_H=1 -DHAVE_LIBC=1 -DHAVE_LIBM=1 -DHAVE_LIBDL=1 -DHAVE_LIBRESOLV=1 -DHAVE_LIBSOCKET=1 -DHAVE_LIBNSL=1 -DHAVE_LIBELF=1 -DHAVE_LIBINTL=1 -DHAVE_LIBPOSIX4=1 -DHAVE_LIBW=1 -DHAVE_LIBL=1 -DHAVE_ALLOCA_H=1 -DHAVE_ALLOCA=1 -DHAVE_UNISTD_H=1 -DHAVE_GETPAGESIZE=1 -DHAVE_MMAP=1 -DRETSIGTYPE=void -DHAVE_STRCOLL=1 -DHAVE_STRFTIME=1 -DHAVE_UTIME_NULL=1 -DHAVE_VPRINTF=1 -DHAVE_FTIME=1 -DHAVE_GETCWD=1 -DHAVE_GETHOSTNAME=1 -DHAVE_GETWD=1 -DHAVE_MKDIR=1 -DHAVE_MKTIME=1 -DHAVE_PUTENV=1 -DHAVE_RMDIR=1 -DHAVE_SELECT=1 -DHAVE_SOCKET=1 -DHAVE_STRCSPN=1 -DHAVE_STRDUP=1 -DHAVE_STRERROR=1 -DHAVE_STRSPN=1 -DHAVE_STRSTR=1 -DHAVE_STRTOL=1 -DHAVE_STRTOUL=1 -DHAVE_UNAME=1 -DHAVE_QSORT=1 -DHAVE_SNPRINTF=1 -DHAVE_WAITID=1 -DHAVE_FORK1=1 -DHAVE_REMAINDER=1 -DHAVE_LCHOWN=1 -DHAVE_GETTIMEOFDAY=1 -DGETTIMEOFDAY_TWO_ARGS=1 -DHAVE_IOS_BINARY=1 -DHAVE_IOS_BIN=1 -D_IMPL_NS_HTML -UDEBUG -DNDEBUG -DTRIMMED -DNETSCAPE -DOSTYPE=\"SunOS5\" -DMOZILLA_CLIENT -DLAYERS -DUNIX_EMBED -DX_PLUGINS -DJS_THREADSAFE -DUNIX_ASYNC_DNS -DSTANDALONE_IMAGE_LIB -DMODULAR_NETLIB -DMOZ_USER_DIR=\".mozilla\" -I../../../dist/./include -I../../../dist/include -I../../../../include -I/u/igb/mozilla/build/include -I../../../dist/./public/jpeg -I../../../dist/./public/png -I../../../dist/./public/zlib -I../../../dist/public/dom -I../../../../layout/events/src/../../html/base/src -I/usr/openwin/include ../../../../layout/events/src/nsEventListenerManager.cpp
../../../../layout/events/src/nsEventListenerManager.cpp: In method `unsigned int nsEventListenerManager::HandleEvent(class nsIPresContext &, struct nsEvent *, class nsIDOMEvent **, enum nsEventStatus &)':
../../../../layout/events/src/nsEventListenerManager.cpp:361: `NS_KEY_PRESS' undeclared (first use this function)
../../../../layout/events/src/nsEventListenerManager.cpp:361: (Each undeclared identifier is reported only once
../../../../layout/events/src/nsEventListenerManager.cpp:361: for each function it appears in.)
make[3]: *** [nsEventListenerManager.o] Error 1
I haven't tried the suggested workaround myself. Sorry about that. I think I will wait until people fix the infinite recursion problem. You can monitor the newsgroup netscape.public.mozilla.unix for the current status, especially the thread titled "crashing on startup" started by Mike Shaver.
Hi, are you still getting this bug? My Netscape colleague Chris McAfee reported that he ran into three compiler bugs when building SeaMonkey with gcc 2.8.1 on Solaris 2.6. So this could also be a possible cause. Can you revert to gcc 2.7.* or switch to egcs?
As wtc says, 2.8.1 isn't ready for primetime and people think that 2.7.* or egcs is the way to go right now.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → INVALID
Status: RESOLVED → CLOSED
I am going to mark this bug as INVALID because gcc 2.8.1 is not ready for prime time.
Closed the bug.
NSPR now has its own Bugzilla product. Moving this bug to the NSPR product.