461502 - ForkAndExec is crashing on Solaris 8/9 due to environ being NULL

Reporter

Description

16 years ago

These tests first failed in nightly builds starting with build 20081022.1, in all 4 combinations of 32/64-bit and DBG/OPT builds on 2 machines (Solaris 8 and Solaris 9). I also checked today's results and they failed again.

BEGIN TEST: pipeping2 (2008-09-22 21:34:11)
END TEST: pipeping2 (2008-09-22 21:34:15)
TEST STATUS: pipeping2 = FAILED (errno 11)
BEGIN TEST: sockping (2008-09-22 21:42:46)
END TEST: sockping (2008-09-22 21:42:47)
TEST STATUS: sockping = FAILED (errno 11)

I compared the code of pipeping2.c with pipeping.c (which passed). The only significant difference is the call to PR_ProcessAttrSetInheritableFD (pipeping.c calls PR_ProcessAttrSetStdioRedirect at the same place in the code). Errno 11 means "Resource temporarily unavailable". It is not clear which part of the code returns this error or which resource is unavailable.

Christophe Ravel

Comment 1

16 years ago

NSPR 4.7.2 was released earlier this week. Setting the target milestone to 4.7.3.

Target Milestone: 4.7.1 → 4.7.3

Christophe Ravel

Comment 2

16 years ago

This bug is seen on the latest NSPR 4.7.2.

Version: 4.7 → 4.7.2

Nelson Bolyard (seldom reads bugmail)

Comment 3

16 years ago

Is it an NSPR bug? Or did the system run out of some resource? Does the system perhaps need to have the amount of some resource increased?

Christophe Ravel

Comment 4

16 years ago

The failure appeared when we switched from NSPR 4.7.1 to 4.7.2. Unless there are known system resource limit changes between these two micro releases, I don't think that is the problem. If the problem is an increased system resource requirement caused as a side effect of a change in NSPR, that change is a problem in itself and needs to be tracked.

Christophe Ravel

Comment 5

16 years ago

Before running the tests, we set the maximum fd number to 1024 (ulimit -n 1024). I ran a truss with pipeping2 "truss -f ./pipeping2". The relevant part at the end is:

24936: sigaction(SIGPIPE, 0xFFBEF188, 0x00000000) = 0
24936: pipe() = 3 [4]
24936: fcntl(3, F_GETFD, 0x00000000) = 0
24936: fcntl(3, F_GETFL, 0x00000000) = 2
24936: fstat64(3, 0xFFBEF270) = 0
24936: fstat64(3, 0xFFBEF270) = 0
24936: fcntl(3, F_SETFL, 0x00000082) = 0
24936: fcntl(4, F_GETFD, 0x00000000) = 0
24936: fcntl(4, F_GETFL, 0x00000000) = 2
24936: fstat64(4, 0xFFBEF270) = 0
24936: fstat64(4, 0xFFBEF270) = 0
24936: fcntl(4, F_SETFL, 0x00000082) = 0
24936: pipe() = 5 [6]
24936: fcntl(5, F_GETFD, 0x00000000) = 0
24936: fcntl(5, F_GETFL, 0x00000000) = 2
24936: fstat64(5, 0xFFBEF270) = 0
24936: fstat64(5, 0xFFBEF270) = 0
24936: fcntl(5, F_SETFL, 0x00000082) = 0
24936: fcntl(6, F_GETFD, 0x00000000) = 0
24936: fcntl(6, F_GETFL, 0x00000000) = 2
24936: fstat64(6, 0xFFBEF270) = 0
24936: fstat64(6, 0xFFBEF270) = 0
24936: fcntl(6, F_SETFL, 0x00000082) = 0
24936: fcntl(3, F_SETFD, 0x00000001) = 0
24936: fcntl(6, F_SETFD, 0x00000001) = 0
24936: lwp_cond_signal(0xFF1D34E8) = 0
24936: lwp_cond_wait(0xFF1D34E8, 0xFF1D34F8, 0xFF1CCD80) = 0
24936: lwp_self() = 3
24936: Incurred fault #6, FLTBOUNDS %pc = 0xFF352A48
24936: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
24936: Received signal #11, SIGSEGV [default]
24936: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
24936: *** process killed ***

Christophe Ravel

Comment 6

16 years ago

I ran a truss with sockping (same conditions, same machine):

24939: sigaction(SIGPIPE, 0xFFBEF190, 0x00000000) = 0
24939: so_socket(1, 2, 0, "", 1) = 3
24939: so_socket(1, 2, 0, "", 1) = 4
24939: so_socketpair(0xFFBEF4BC) = 0
24939: close(3) = 0
24939: fcntl(5, F_GETFD, 0x00000000) = 0
24939: fcntl(5, F_GETFL, 0x00000000) = 2
24939: fstat64(5, 0xFFBEF278) = 0
24939: getsockopt(5, 65535, 8192, 0xFFBEF378, 0xFFBEF370, 1) = 0
24939: fstat64(5, 0xFFBEF278) = 0
24939: getsockopt(5, 65535, 8192, 0xFFBEF378, 0xFFBEF374, 1) = 0
24939: setsockopt(5, 65535, 8192, 0xFFBEF378, 4, 1) = 0
24939: fcntl(5, F_SETFL, 0x00000082) = 0
24939: fcntl(4, F_GETFD, 0x00000000) = 0
24939: fcntl(4, F_GETFL, 0x00000000) = 2
24939: fstat64(4, 0xFFBEF278) = 0
24939: getsockopt(4, 65535, 8192, 0xFFBEF378, 0xFFBEF370, 0) = 0
24939: fstat64(4, 0xFFBEF278) = 0
24939: getsockopt(4, 65535, 8192, 0xFFBEF378, 0xFFBEF374, 0) = 0
24939: setsockopt(4, 65535, 8192, 0xFFBEF378, 4, 0) = 0
24939: fcntl(4, F_SETFL, 0x00000082) = 0
24939: fcntl(5, F_SETFD, 0x00000001) = 0
24939: lwp_cond_signal(0xFF1D34E8) = 0
24939: lwp_cond_wait(0xFF1D34E8, 0xFF1D34F8, 0xFF1CCD80) = 0
24939: lwp_self() = 3
24939: Incurred fault #6, FLTBOUNDS %pc = 0xFF352A48
24939: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
24939: Received signal #11, SIGSEGV [default]
24939: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
24939: *** process killed ***
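Both truss runs end with signal #11 (SIGSEGV), which suggests the "errno 11" in the test summary may really be a termination signal number rather than an errno value; on Solaris both EAGAIN and SIGSEGV are numbered 11. A minimal C sketch of how a harness could tell the two apart by decoding the wait status is shown here. All function names are hypothetical and are not taken from NSPR or runtests.pl.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of NSPR or runtests.pl): run child_fn in a
 * forked child and store the raw wait status in *status.
 * Returns 0 on success, -1 if fork() or waitpid() failed. */
int run_in_child(void (*child_fn)(void), int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        child_fn();
        _exit(0);               /* reached only if child_fn returns */
    }
    if (waitpid(pid, status, 0) != pid)
        return -1;
    return 0;
}

/* Returns the number of the signal that killed the child, or 0 if the
 * child was not terminated by a signal. */
int killing_signal(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Prints a test summary line that says explicitly whether the number is a
 * signal or an exit status, so neither is mistaken for an errno value. */
void report_status(const char *name, int status)
{
    if (WIFSIGNALED(status))
        printf("%s = FAILED (killed by signal %d)\n", name, WTERMSIG(status));
    else if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
        printf("%s = FAILED (exit status %d)\n", name, WEXITSTATUS(status));
    else
        printf("%s = PASSED\n", name);
}

/* Stand-in for a crashing test: dies from SIGSEGV like the truss output. */
void crash_with_sigsegv(void)
{
    raise(SIGSEGV);
}
```

Calling run_in_child(crash_with_sigsegv, &status) and passing the result to report_status reports a signal rather than an exit status, which points at a crash in the test process instead of a resource shortage.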

Wan-Teh Chang

Comment 7

16 years ago

Christophe: all the changes in NSPR 4.7.2 are listed in the release notes: http://www.mozilla.org/projects/nspr/release-notes/nspr472.html There are two incomplete porting changes that are not listed in the release notes: the Symbian OS port and 64-bit Mac OS X port.

Christophe Ravel

Comment 8

16 years ago

Note: the crash is only on Solaris 8 and 9 SPARC, not on Solaris 10 SPARC, not on Solaris 9 and 10 x86.

Christophe Ravel

Comment 9

16 years ago

System resources:

> limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 32768 kbytes
coredumpsize unlimited
vmemoryuse unlimited
descriptors 1024

Christophe Ravel

Comment 10

16 years ago

Stack for pipeping2:

core 'core' of 24992: ./pipeping2
----------------- lwp# 1 / thread# 1 --------------------
ff352a48 ForkAndExec (119d0, 21c78, 0, 236d8, 0, 0) + c0
ff3530e8 _MD_CreateUnixProcess (119d0, 21c78, 0, 236d8, 5, 4) + 80
ff32d470 PR_CreateProcess (119d0, 21c78, 0, 236d8, 31960, 0) + 30
00011394 main (1, ffbef9b4, ffbef9bc, 21c00, 0, 0) + 364
00010c00 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
ff29edc4 _signotifywait (ff1cc000, 0, 0, 0, 0, 0) + 8
ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 --------------------------------
ff29c968 _door_return (3, ff1cd658, ff1cd670, 3, ff1cc000, 1) + 10
ff1aa358 _lwp_start (ff145d98, 0, 6000, ffbef2e4, 0, 0) + 18
ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-------------------------- thread# 3 --------------------
ff1ad9b8 _reap_wait (ff1d0980, 1e924, 0, ff1cc000, 0, 0) + 38
ff1ad710 _reaper (ff1cce00, ff1d2708, ff1d0980, ff1ccdd8, 1, fe400000) + 38
ff1bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
-------------------------- thread# 4 --------------------
ff1bafd4 _restorefsr (249c0, 0, 0, 0, 0, 0) + 8

Christophe Ravel

Comment 11

16 years ago

Stack for sockping:

core 'core' of 24998: ./sockping
----------------- lwp# 1 / thread# 1 --------------------
fdb52a48 ForkAndExec (11834, 21ad8, 0, 23538, 0, 0) + c0
fdb530e8 _MD_CreateUnixProcess (11834, 21ad8, 0, 23538, 2, 4) + 80
fdb2d470 PR_CreateProcess (11834, 21ad8, 0, 23538, 31960, 0) + 30
0001126c main (1, ffbef9b4, ffbef9bc, 21800, 0, 0) + 234
00010c08 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
fda9edc4 _signotifywait (fd9cc000, 0, 0, 0, 0, 0) + 8
fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 --------------------------------
fda9c968 _door_return (3, fd9cd658, fd9cd670, 3, fd9cc000, 1) + 10
fd9aa358 _lwp_start (fd945d98, 0, 6000, ffbef2e4, 0, 0) + 18
fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-------------------------- thread# 3 --------------------
fd9ad9b8 _reap_wait (fd9d0980, 1e924, 0, fd9cc000, 0, 0) + 38
fd9ad710 _reaper (fd9cce00, fd9d2708, fd9d0980, fd9ccdd8, 1, fe400000) + 38
fd9bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
-------------------------- thread# 4 --------------------
fd9bafd4 _restorefsr (24820, 0, 0, 0, 0, 0) + 8
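Both stacks fault inside ForkAndExec at address 0, and the bug title attributes this to environ being NULL. As an illustration only, the following C sketch shows one way to extend an environment vector with an extra entry while tolerating a NULL base vector. It is not NSPR's actual code; the function names and the EXAMPLE_VAR entry used below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Counts the entries of a NULL-terminated environment vector.
 * A NULL vector is treated as empty instead of being dereferenced. */
size_t env_count(char *const *envp)
{
    size_t n = 0;
    if (envp == NULL)
        return 0;
    while (envp[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: returns a newly allocated vector holding every
 * entry of base followed by extra and a terminating NULL.
 * The strings themselves are shared, not copied, so the caller frees
 * only the returned array. Returns NULL if allocation fails. */
char **env_with_extra(char *const *base, const char *extra)
{
    size_t n = env_count(base);
    char **out = malloc((n + 2) * sizeof *out);
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        out[i] = base[i];
    out[n] = (char *)extra;     /* shared with the caller, never modified */
    out[n + 1] = NULL;
    return out;
}
```

With this guard, env_with_extra(NULL, "EXAMPLE_VAR=1") yields a one-entry vector rather than reading through a null pointer, so a child could still be started with only the added variable.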

Christophe Ravel

Comment 12

16 years ago

The bug appeared between NSPR_4_7_2_BETA2 and NSPR_4_7_2_BETA3.

Nelson Bolyard (seldom reads bugmail)

Comment 13

16 years ago

Hmm. Out of resources, trying to fork. I suspect the system ran out of process table entries.

Christophe Ravel

Comment 14

16 years ago

That leaves us with the commits for the following bugs:

432430: [PATCH] NSPR port to Symbian OS, unit tests tested
313282: In strcstr.c there is an 'obvious improvement' waiting to be performed
451476: NSPR shared libraries should use direct bindings on Solaris

Wan-Teh Chang

Comment 15

16 years ago

Attached patch: Diffs between NSPR 4.7.2 Beta 2 and Beta 3 (deleted)

I omitted the changes that cannot affect Solaris.

Christophe Ravel

Comment 16

16 years ago

I made a build with NSPR_4_7_2_RTM but without the changes for bug 451476 (removed -Bdirect). Neither pipeping2 nor sockping crashes with this build.

Christophe Ravel

Comment 17

16 years ago

I made another build with NSPR_4_7_2_RTM but replaced "-Bdirect" with "-z direct". Both tests, pipeping2 and sockping, still crash.

Wan-Teh Chang

Comment 18

16 years ago

Julien, this bug is about the test failures caused by the -Bdirect patch of bug 451476.

Assignee: wtc → julien.pierre.boogz

Diffs between NSPR 4.7.2 Beta 2 and Beta 3, 16 years ago, Wan-Teh Chang (deleted), patch
Fix typos (errno => exit status) in runtests.pl, 16 years ago, Wan-Teh Chang (deleted), patch, christophe.ravel.bugs: review+
Proposed patch, 16 years ago, Wan-Teh Chang (deleted), patch, julien.pierre: review-
Back out the -Bdirect linker flag (checked in), 16 years ago, Wan-Teh Chang (deleted), patch, julien.pierre: review+
test case for using -Bdirect in shared library, 16 years ago, Julien Pierre (deleted), application/zip
Additional patch to enable -Bdirect on Solaris 10 and above, 16 years ago, Julien Pierre (deleted), patch, wtc: review-
Updated patch, 16 years ago, Julien Pierre (deleted), patch
Updated patch (as checked in), 16 years ago, Wan-Teh Chang (deleted), patch