Closed Bug 451476 Opened 16 years ago Closed 16 years ago

NSPR shared libraries should use direct bindings on Solaris

Categories

(NSPR :: NSPR, defect, P2)

4.7.1
x86
SunOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: julien.pierre)

Details

Attachments

(2 files)

We ran into a problem recently where accidental interposition of a symbol by an application's shared library caused a crash during NSPR initialization. The crash stack looked like this : (dbx) where current thread: t@1 [1] __lwp_kill(0x1, 0x6), at 0xfe0b54d7 [2] _thr_kill(0x1, 0x6), at 0xfe0b2c86 [3] raise(0x6), at 0xfe0613f3 [4] abort(0xfe3f9b9c, 0xfe3b2cd9, 0x8046338, 0xfe3d2624, 0xfe3e8470, 0xfe3e8478), at 0xfe041709 =>[5] PR_Assert(s = 0xfe3e8470 "0 == rv", file = 0xfe3e8478 "../../../../pr/src/pthreads/ptsynch.c", ln = 178), line 551 in "prlog.c" [6] PR_NewLock(), line 178 in "ptsynch.c" [7] session_thread_init(0xfee7f720, 0xfe3c5139, 0x0, 0x8046374, 0xfece4b02, 0xfeedebf0), at 0xfece4ad3 [8] PR_CallOnce(once = 0xfeedebf0, func = 0xfece4ac8 = &`libns-httpd40.so`session.cpp`#__1cTsession_thread_init6F_nIPRStatus__ [non -g, demangles to: session_thread_init]()), line 808 in "prinit.c" [9] session_alloc_thread_slot(0xfece912c), at 0xfece4b02 [10] __STATIC_CONSTRUCTOR(0xfeffb818, 0xb4, 0xfeff04f0, 0x80463c0, 0xfefd58e1, 0xfeffb28c), at 0xfece92e4 [11] __cplus_fini_at_exit(0xfeffb28c, 0xfeff04f0, 0xfeffb818, 0xfde90e5c, 0xfdeb03dc, 0xfee3f1d4), at 0xfee3f299 [12] call_init(0xfde90e58, 0x3), at 0xfefd58e1 [13] is_dep_init(0xfeff04f0, 0xfefa0f00), at 0xfefd5695 [14] elf_bndr(0xfefa0f00, 0x1c8, 0xfe3b2534), at 0xfefdfc22 [15] elf_rtbndr(0x1c8, 0xfe3b2534, 0x80464a0, 0xfe3e5398, 0xfe3f9b9c, 0xfe3b2459), at 0xfefcba64 [16] 0xfefa0f00(0x8082350, 0x0), at 0xfefa0f00 [17] PR_NewLogModule(name = 0xfe3e74e0 "clock"), line 366 in "prlog.c" [18] _PR_InitStuff(), line 181 in "prinit.c" [19] _PR_ImplicitInitialization(), line 251 in "prinit.c" [20] PR_NewLock(), line 172 in "ptsynch.c" [21] __STATIC_CONSTRUCTOR(0xfeffb818, 0x44, 0xfe7d0930, 0x8046594, 0xfefd58e1, 0xfeffb28c), at 0xfdee5b1c [22] __cplus_fini_at_exit(0xfeffb28c, 0xfeffdcc8, 0xfeffb818, 0xfdeb094c, 0xfe000bd4, 0xfdee834c), at 0xfdee8407 [23] call_init(0xfdeb08d8, 0x1), at 0xfefd58e1 [24] setup(0x80467e0, 0x8046944, 0x0, 0x8047fc4, 0x1000, 0xfefca0ad, 0xfeffbaec, 0xfefc4000, 0xfefc4000, 0xffffffff, 0x8050034, 0x8047fca, 0x80467d8, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x2), at 0xfefd4f9c [25] _setup(0x8046794, 0xfeffbaec, 0x3, 0x80467d8, 0x4, 0x80467e0), at 0xfefe0c66 [26] _rt_boot(0x0, 0x80469e0, 0x80469ed, 0x8046b28, 0x8046b3e, 0x8046b49), at 0xfefcb94c The intercepted symbol was strncasecmp, which is used in PR_NewLogModule. This triggered the loading of a C++ shared library that implemented strncasecmp. The init function for that library called PR_NewLock, even though NSPR had not completed its initialization. This ended up causing a failure in pthread_mutex_init and an assertion in NSPR. The only way to solve this problem is to prevent interposition of this symbol in order to avoid causing the other shared library to be loaded. The Solaris linker provides -Bdirect and -z direct options to accomplish this. Both of them solve the problem. I'm not sure which one is preferrable to use. We may also want to do this on other Unix platforms as well if such a linker option is available.
The problem is that that C++ shared library implements strncasecmp. We can change NSPR to not call strncasecmp. If -Bdirect works like -Bsymbolic, it'll prevent us from overriding some standard library function calls easily, and we should not impose that on all users of NSPR.
Wan-Teh, -Bdirect is not the same as -Bsymbolic. The way I understand things, it's still possible to interpose if one builds an interposer library with -z interpose. It just prevents additional interposition. -B direct | nodirect Options governing direct binding. -B direct establishes direct binding information by recording the relationship between each symbol reference and the dependency that provides the definition. In addition, direct binding information is established between each symbol reference and an associated definition within the object being created. The runtime linker uses this information to search directly for a symbol in the associated object rather than to carry out a default symbol search. Direct binding information can only be established to dependencies specified with the link-edit. Thus, you should use the -z defs option. Objects that wish to interpose on symbols in a direct binding environment should identify themselves as interposers with the -z interpose option. The use of -B direct enables -z lazy- load for all dependencies. -B nodirect prevents any direct binding to the inter- faces offered by the object being created. The object being created can continue to directly bind to external interfaces by specifying the -z direct option.
I meant "accidental" interposition.
Here is the doc for -z interpose : -z interpose Marks the object as an interposer. When direct bindings are in effect, the runtime linker searchs for symbols in any interposers before the object associated to the direct binding. See -B direct. I did not try to build an interposer library.
Re: strncasecmp I believe that NSPR uses strcasecmp instead of strncasecmp. Could you please doublecheck? In any case, we can modify NSPR so that it doesn't call any non-standard C library function such as strcasecmp. That'll significantly reduce the likelihood of accidental interposition because there is little reason for a normal library (as opposed to a special-purpose library such as jemalloc or SmartHeap) to implement a universally available libc function such as strlen or strcmp. Re: -Bdirect If the Solaris system libraries such as libpthread.so, libthread.so, librt.so, libsocket.so, libnsl.so are built with -Bdirect, then I'm fine with building NSPR with -Bdirect. Is libmtmalloc.so built with -z interpose?
Wan-Teh, You are right, it is strcasecmp. Re: mtmalloc, I don't know if it's built with -z interpose. ldd / nm / file don't show that.
prlog.c uses strcasecmp but not strncasecmp, so this macro definition can be removed.
Attachment #335134 - Flags: review?(julien.pierre.boogz)
Comment on attachment 335134 [details] [diff] [review] Remove an unused strncasecmp macro definition. (checked in) This change is fine, although it is not directly related to this Solaris problem .
Attachment #335134 - Flags: review?(julien.pierre.boogz) → review+
Comment on attachment 335134 [details] [diff] [review] Remove an unused strncasecmp macro definition. (checked in) I checked in this patch on the NSPR trunk (NSPR 4.7.3). Checking in prlog.c; /cvsroot/mozilla/nsprpub/pr/src/io/prlog.c,v <-- prlog.c new revision: 3.43; previous revision: 3.42 done
Attachment #335134 - Attachment description: Remove an unused strncasecmp macro definition. → Remove an unused strncasecmp macro definition. (checked in)
I verified that this still allows LD_PRELOAD of /usr/lib/libmtmalloc.so to work . The malloc function from libmtmalloc.so.1 is called in dbx, not the one from libc.so.1 .
Attachment #336128 - Flags: review?(wtc)
Comment on attachment 336128 [details] [diff] [review] Use -Bdirect when linking NSPR shared libraries on Solaris r=wtc. If GNU ld doesn't support the -Bdirect flag, you can test the GCC_USE_GNU_LD variable.
Attachment #336128 - Flags: review?(wtc) → review+
NSPR 4.7.2 is almost done. Please set target milestone to 4.7.3.
Thanks for the review, Wan-Teh. Should I wait until NSPR 4.7.2 has an RTM tag before I check in to the trunk ?
Target Milestone: 4.7.2 → 4.7.3
You can check it in now.
I was hoping we could get it into 4.7.2 when we release 3.11.10 soon (maybe next week).
Thanks, Wan-Teh. I checked this in to the trunk. Checking in configure; /cvsroot/mozilla/nsprpub/configure,v <-- configure new revision: 1.237; previous revision: 1.236 done Checking in configure.in; /cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in new revision: 1.241; previous revision: 1.240 done I'm also in favor of having this go into 4.7.2 if it's not too late.
Priority: -- → P2
Summary: NSPR shared libraries should use direct bindings → NSPR shared libraries should use direct bindings on Solaris
Version: other → 4.7.1
OK. I put this in NSPR 4.7.2 Beta 3. I just imported NSPR_4_7_2_BETA3 into mozilla-central for testing on the Mozilla trunk.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: 4.7.3 → 4.7.2
Stack for pipeping2: core 'core' of 24992: ./pipeping2 ----------------- lwp# 1 / thread# 1 -------------------- ff352a48 ForkAndExec (119d0, 21c78, 0, 236d8, 0, 0) + c0 ff3530e8 _MD_CreateUnixProcess (119d0, 21c78, 0, 236d8, 5, 4) + 80 ff32d470 PR_CreateProcess (119d0, 21c78, 0, 236d8, 31960, 0) + 30 00011394 main (1, ffbef9b4, ffbef9bc, 21c00, 0, 0) + 364 00010c00 _start (0, 0, 0, 0, 0, 0) + 108 ----------------- lwp# 2 / thread# 2 -------------------- ff29edc4 _signotifywait (ff1cc000, 0, 0, 0, 0, 0) + 8 ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c ----------------- lwp# 3 -------------------------------- ff29c968 _door_return (3, ff1cd658, ff1cd670, 3, ff1cc000, 1) + 10 ff1aa358 _lwp_start (ff145d98, 0, 6000, ffbef2e4, 0, 0) + 18 ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c -------------------------- thread# 3 -------------------- ff1ad9b8 _reap_wait (ff1d0980, 1e924, 0, ff1cc000, 0, 0) + 38 ff1ad710 _reaper (ff1cce00, ff1d2708, ff1d0980, ff1ccdd8, 1, fe400000) + 38 ff1bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40 -------------------------- thread# 4 -------------------- ff1bafd4 _restorefsr (249c0, 0, 0, 0, 0, 0) + 8
System resources available: > limit cputime unlimited filesize unlimited datasize unlimited stacksize 32768 kbytes coredumpsize unlimited vmemoryuse unlimited descriptors 1024 Originally, stacksize was 8192 kb. I tried to increase it to 32,768 kb but it didn't change anything.
Stack for sockping: core 'core' of 24998: ./sockping ----------------- lwp# 1 / thread# 1 -------------------- fdb52a48 ForkAndExec (11834, 21ad8, 0, 23538, 0, 0) + c0 fdb530e8 _MD_CreateUnixProcess (11834, 21ad8, 0, 23538, 2, 4) + 80 fdb2d470 PR_CreateProcess (11834, 21ad8, 0, 23538, 31960, 0) + 30 0001126c main (1, ffbef9b4, ffbef9bc, 21800, 0, 0) + 234 00010c08 _start (0, 0, 0, 0, 0, 0) + 108 ----------------- lwp# 2 / thread# 2 -------------------- fda9edc4 _signotifywait (fd9cc000, 0, 0, 0, 0, 0) + 8 fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c ----------------- lwp# 3 -------------------------------- fda9c968 _door_return (3, fd9cd658, fd9cd670, 3, fd9cc000, 1) + 10 fd9aa358 _lwp_start (fd945d98, 0, 6000, ffbef2e4, 0, 0) + 18 fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c -------------------------- thread# 3 -------------------- fd9ad9b8 _reap_wait (fd9d0980, 1e924, 0, fd9cc000, 0, 0) + 38 fd9ad710 _reaper (fd9cce00, fd9d2708, fd9d0980, fd9ccdd8, 1, fe400000) + 38 fd9bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40 -------------------------- thread# 4 -------------------- fd9bafd4 _restorefsr (24820, 0, 0, 0, 0, 0) + 8
I am sorry, comments 18, 19 and 20 where meant for another bug. Please disregard these comments.
Well, after investigation, it looks like the 2 stacks above are related to this bug... I made a build for NSPR_4_7_2_RTM without the -Bdirect and these 2 tests don't crash anymore.
Christophe, is there another bug open for these test failures ?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: