Closed Bug 5755 Opened 26 years ago Closed 26 years ago

Bus error in PR_StackPop at os_Irix.s:81

Categories

(NSPR :: NSPR, defect, P3)

3.1.1
SGI
IRIX
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: srinivas)

Details

The OS is IRIX 6.5. The test machines are foo3.mcom.com and hsync.mcom.com. The NSPR release is 3.1.1.

When running the poll_nm test, optimized build, occasionally I get a core dump due to a bus error. This is very hard to reproduce. I need to write a shell script to run the poll_nm test repeatedly:

    while true; do
        poll_nm
        echo ok
    done

Eventually you will get a core file. The stack trace at the crash is:

    dbx poll_nm core
    dbx version 7.3 BETA 54632_Mar27_BETA Mar 27 1999 02:40:31
    Core from signal SIGBUS: Bus error
    (dbx) where
    >  0 PR_StackPop(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/md/unix/os_Irix.s":81, 0x40307b8]
       1 _PR_Getfd(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/io/prfdcach.c":78, 0x400dc54]
       2 pt_SetMethods(0x7, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/pthreads/ptio.c":2790, 0x4026df8]
       3 pt_Accept(0x0, 0x0, 0xffffffff, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/pthreads/ptio.c":1666, 0x4025740]
       4 PR_Accept(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/io/priometh.c":166, 0x400fc90]
       5 main(0x0, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x0, 0x0) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/tests/poll_nm.c":283, 0x10001e08]
       6 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x10001788]
    (dbx)

This does not happen in the debug build because in the debug build the fd cache is not implemented as an atomic stack.

One can work around this bug by setting the environment variable NSPR_FD_CACHE_SIZE_HIGH to a nonzero value to disable the atomic stack code in NSPR's fd cache, e.g.,

    setenv NSPR_FD_CACHE_SIZE_HIGH 1024
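As an illustration of the workaround only, here is a minimal C sketch of the decision described above: a nonzero NSPR_FD_CACHE_SIZE_HIGH turns the atomic stack off. The function name use_atomic_fd_stack is hypothetical, and the parsing shown (strtol, with an unset variable treated as zero) is an assumption. This report does not show how NSPR actually reads the variable.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not NSPR code.
 * 'value' is the text of NSPR_FD_CACHE_SIZE_HIGH, or NULL if it is unset.
 * Returns 1 if the fd cache would use the atomic stack, 0 otherwise.
 * Assumption: unset or zero keeps the atomic stack; nonzero disables it.
 */
static int use_atomic_fd_stack(const char *value)
{
    if (value == NULL) {
        return 1;               /* variable not set */
    }
    return strtol(value, NULL, 10) == 0;
}
```

A caller would pass the result of getenv("NSPR_FD_CACHE_SIZE_HIGH") to this helper.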
Status: NEW → ASSIGNED
There is a bug in PR_StackPop: a branch instruction sits in the delay slot of another branch instruction, which can result in undefined behaviour. Files modified (NSPR_3_1_BRANCH): pr/src/md/unix/os_Irix.s - rev. 2.4.4.1
There is a hardware bug in the R10K chip, rev 3.1 and earlier, that can cause an ll/sc instruction sequence to succeed incorrectly when two ll instructions are executed within a span of 32 instructions.
Added extra "nop" instructions to the stack push/pop routines as a workaround. Files modified: pr/src/md/unix/os_Irix.s - rev 2.7
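For readers unfamiliar with the kind of routine involved, the following is a rough C11 sketch of an atomic stack push/pop retry loop. It is not the code in os_Irix.s: the node and atomic_stack types are hypothetical, and C11 compare-and-swap stands in for the MIPS ll/sc pair. The "nop" padding from the fix cannot be expressed in portable C. The sketch only shows the read, modify, conditional-store loop that an incorrectly succeeding sc would corrupt.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not NSPR's stack layout. */
typedef struct node {
    struct node *next;
    int value;
} node;

typedef struct {
    _Atomic(node *) head;
} atomic_stack;

/* Push: link the new node to the observed head, then try to publish it. */
static void stack_push(atomic_stack *s, node *n)
{
    node *old = atomic_load(&s->head);
    do {
        n->next = old;
        /* On failure 'old' is refreshed with the current head; retry. */
    } while (!atomic_compare_exchange_weak(&s->head, &old, n));
}

/* Pop: try to swing head to head->next; returns NULL if the stack is empty. */
static node *stack_pop(atomic_stack *s)
{
    node *old = atomic_load(&s->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->head, &old, old->next)) {
        /* 'old' was refreshed by the failed exchange; loop re-checks it. */
    }
    return old;
}
```

If the conditional store succeeds when it should have failed, head can end up pointing at a stale node, which would be consistent with a crash inside the pop routine. Note that a plain compare-and-swap pop like this one is exposed to the ABA problem under concurrent use, which a correctly working ll/sc pair avoids.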
Checked in the extra nop fix to NSPR20_RELEASE_3_1_BRANCH, in preparation for the NSPR 3.1.2 patch release. /m/src/ns/nspr20/pr/src/md/unix/os_Irix.s, revision 2.4.4.2.
I can't log into hsync.mcom.com right now. But I used the shell script to run the poll_nm test repeatedly on foo3.mcom.com (IRIX 6.5) and foo2.mcom.com (IRIX 6.2) and it still hasn't crashed after 5 minutes.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Marked the bug fixed.