Closed
Bug 451476
Opened 16 years ago
Closed 16 years ago
NSPR shared libraries should use direct bindings on Solaris
Categories
(NSPR :: NSPR, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
4.7.2
People
(Reporter: julien.pierre, Assigned: julien.pierre)
Details
Attachments
(2 files)
(deleted),
patch
|
julien.pierre
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
We ran into a problem recently where accidental interposition of a symbol by an application's shared library caused a crash during NSPR initialization.
The crash stack looked like this :
(dbx) where
current thread: t@1
[1] __lwp_kill(0x1, 0x6), at 0xfe0b54d7
[2] _thr_kill(0x1, 0x6), at 0xfe0b2c86
[3] raise(0x6), at 0xfe0613f3
[4] abort(0xfe3f9b9c, 0xfe3b2cd9, 0x8046338, 0xfe3d2624, 0xfe3e8470, 0xfe3e8478), at 0xfe041709
=>[5] PR_Assert(s = 0xfe3e8470 "0 == rv", file = 0xfe3e8478 "../../../../pr/src/pthreads/ptsynch.c", ln = 178), line 551 in "prlog.c"
[6] PR_NewLock(), line 178 in "ptsynch.c"
[7] session_thread_init(0xfee7f720, 0xfe3c5139, 0x0, 0x8046374, 0xfece4b02, 0xfeedebf0), at 0xfece4ad3
[8] PR_CallOnce(once = 0xfeedebf0, func = 0xfece4ac8 = &`libns-httpd40.so`session.cpp`#__1cTsession_thread_init6F_nIPRStatus__ [non -g, demangles to: session_thread_init]()), line 808 in "prinit.c" [9] session_alloc_thread_slot(0xfece912c), at 0xfece4b02
[10] __STATIC_CONSTRUCTOR(0xfeffb818, 0xb4, 0xfeff04f0, 0x80463c0, 0xfefd58e1, 0xfeffb28c), at 0xfece92e4
[11] __cplus_fini_at_exit(0xfeffb28c, 0xfeff04f0, 0xfeffb818, 0xfde90e5c, 0xfdeb03dc, 0xfee3f1d4), at 0xfee3f299
[12] call_init(0xfde90e58, 0x3), at 0xfefd58e1
[13] is_dep_init(0xfeff04f0, 0xfefa0f00), at 0xfefd5695
[14] elf_bndr(0xfefa0f00, 0x1c8, 0xfe3b2534), at 0xfefdfc22
[15] elf_rtbndr(0x1c8, 0xfe3b2534, 0x80464a0, 0xfe3e5398, 0xfe3f9b9c, 0xfe3b2459), at 0xfefcba64
[16] 0xfefa0f00(0x8082350, 0x0), at 0xfefa0f00
[17] PR_NewLogModule(name = 0xfe3e74e0 "clock"), line 366 in "prlog.c"
[18] _PR_InitStuff(), line 181 in "prinit.c"
[19] _PR_ImplicitInitialization(), line 251 in "prinit.c"
[20] PR_NewLock(), line 172 in "ptsynch.c"
[21] __STATIC_CONSTRUCTOR(0xfeffb818, 0x44, 0xfe7d0930, 0x8046594, 0xfefd58e1, 0xfeffb28c), at 0xfdee5b1c
[22] __cplus_fini_at_exit(0xfeffb28c, 0xfeffdcc8, 0xfeffb818, 0xfdeb094c, 0xfe000bd4, 0xfdee834c), at 0xfdee8407
[23] call_init(0xfdeb08d8, 0x1), at 0xfefd58e1
[24] setup(0x80467e0, 0x8046944, 0x0, 0x8047fc4, 0x1000, 0xfefca0ad, 0xfeffbaec, 0xfefc4000, 0xfefc4000, 0xffffffff, 0x8050034, 0x8047fca, 0x80467d8, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x2), at 0xfefd4f9c
[25] _setup(0x8046794, 0xfeffbaec, 0x3, 0x80467d8, 0x4, 0x80467e0), at 0xfefe0c66
[26] _rt_boot(0x0, 0x80469e0, 0x80469ed, 0x8046b28, 0x8046b3e, 0x8046b49), at 0xfefcb94c
The intercepted symbol was strncasecmp, which is used in PR_NewLogModule.
This triggered the loading of a C++ shared library that implemented strncasecmp. The init function for that library called PR_NewLock, even though NSPR had not completed its initialization. This ended up causing a failure in pthread_mutex_init and an assertion in NSPR.
The only way to solve this problem is to prevent interposition of this symbol in order to avoid causing the other shared library to be loaded.
The Solaris linker provides -Bdirect and -z direct options to accomplish this. Both of them solve the problem. I'm not sure which one is preferrable to use.
We may also want to do this on other Unix platforms as well if such a linker option is available.
Comment 1•16 years ago
|
||
The problem is that that C++ shared library implements
strncasecmp. We can change NSPR to not call strncasecmp.
If -Bdirect works like -Bsymbolic, it'll prevent us from
overriding some standard library function calls easily,
and we should not impose that on all users of NSPR.
Assignee | ||
Comment 2•16 years ago
|
||
Wan-Teh,
-Bdirect is not the same as -Bsymbolic.
The way I understand things, it's still possible to interpose if one builds an interposer library with -z interpose. It just prevents additional interposition.
-B direct | nodirect
Options governing direct binding. -B direct establishes
direct binding information by recording the relationship
between each symbol reference and the dependency that
provides the definition. In addition, direct binding
information is established between each symbol reference
and an associated definition within the object being
created. The runtime linker uses this information to
search directly for a symbol in the associated object
rather than to carry out a default symbol search.
Direct binding information can only be established to
dependencies specified with the link-edit. Thus, you
should use the -z defs option. Objects that wish to
interpose on symbols in a direct binding environment
should identify themselves as interposers with the -z
interpose option. The use of -B direct enables -z lazy-
load for all dependencies.
-B nodirect prevents any direct binding to the inter-
faces offered by the object being created. The object
being created can continue to directly bind to external
interfaces by specifying the -z direct option.
Assignee | ||
Comment 3•16 years ago
|
||
I meant "accidental" interposition.
Assignee | ||
Comment 4•16 years ago
|
||
Here is the doc for -z interpose :
-z interpose
Marks the object as an interposer. When direct bindings
are in effect, the runtime linker searchs for symbols in
any interposers before the object associated to the
direct binding. See -B direct.
I did not try to build an interposer library.
Comment 5•16 years ago
|
||
Re: strncasecmp
I believe that NSPR uses strcasecmp instead of strncasecmp.
Could you please doublecheck?
In any case, we can modify NSPR so that it doesn't call
any non-standard C library function such as strcasecmp.
That'll significantly reduce the likelihood of accidental
interposition because there is little reason for a normal
library (as opposed to a special-purpose library such as
jemalloc or SmartHeap) to implement a universally available
libc function such as strlen or strcmp.
Re: -Bdirect
If the Solaris system libraries such as libpthread.so,
libthread.so, librt.so, libsocket.so, libnsl.so are
built with -Bdirect, then I'm fine with building NSPR
with -Bdirect.
Is libmtmalloc.so built with -z interpose?
Assignee | ||
Comment 6•16 years ago
|
||
Wan-Teh,
You are right, it is strcasecmp.
Re: mtmalloc, I don't know if it's built with -z interpose. ldd / nm / file don't show that.
Comment 7•16 years ago
|
||
prlog.c uses strcasecmp but not strncasecmp, so this macro
definition can be removed.
Attachment #335134 -
Flags: review?(julien.pierre.boogz)
Assignee | ||
Comment 8•16 years ago
|
||
Comment on attachment 335134 [details] [diff] [review]
Remove an unused strncasecmp macro definition. (checked in)
This change is fine, although it is not directly related to this Solaris problem .
Attachment #335134 -
Flags: review?(julien.pierre.boogz) → review+
Comment 9•16 years ago
|
||
Comment on attachment 335134 [details] [diff] [review]
Remove an unused strncasecmp macro definition. (checked in)
I checked in this patch on the NSPR trunk (NSPR 4.7.3).
Checking in prlog.c;
/cvsroot/mozilla/nsprpub/pr/src/io/prlog.c,v <-- prlog.c
new revision: 3.43; previous revision: 3.42
done
Attachment #335134 -
Attachment description: Remove an unused strncasecmp macro definition. → Remove an unused strncasecmp macro definition. (checked in)
Assignee | ||
Comment 10•16 years ago
|
||
I verified that this still allows LD_PRELOAD of /usr/lib/libmtmalloc.so to work . The malloc function from libmtmalloc.so.1 is called in dbx, not the one from libc.so.1 .
Attachment #336128 -
Flags: review?(wtc)
Comment 11•16 years ago
|
||
Comment on attachment 336128 [details] [diff] [review]
Use -Bdirect when linking NSPR shared libraries on Solaris
r=wtc.
If GNU ld doesn't support the -Bdirect flag, you can
test the GCC_USE_GNU_LD variable.
Attachment #336128 -
Flags: review?(wtc) → review+
Comment 12•16 years ago
|
||
NSPR 4.7.2 is almost done. Please set target milestone
to 4.7.3.
Assignee | ||
Comment 13•16 years ago
|
||
Thanks for the review, Wan-Teh. Should I wait until NSPR 4.7.2 has an RTM tag before I check in to the trunk ?
Target Milestone: 4.7.2 → 4.7.3
Comment 14•16 years ago
|
||
You can check it in now.
Comment 15•16 years ago
|
||
I was hoping we could get it into 4.7.2 when we release 3.11.10 soon (maybe
next week).
Assignee | ||
Comment 16•16 years ago
|
||
Thanks, Wan-Teh. I checked this in to the trunk.
Checking in configure;
/cvsroot/mozilla/nsprpub/configure,v <-- configure
new revision: 1.237; previous revision: 1.236
done
Checking in configure.in;
/cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in
new revision: 1.241; previous revision: 1.240
done
I'm also in favor of having this go into 4.7.2 if it's not too late.
Priority: -- → P2
Summary: NSPR shared libraries should use direct bindings → NSPR shared libraries should use direct bindings on Solaris
Version: other → 4.7.1
Comment 17•16 years ago
|
||
OK. I put this in NSPR 4.7.2 Beta 3. I just imported
NSPR_4_7_2_BETA3 into mozilla-central for testing on the
Mozilla trunk.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: 4.7.3 → 4.7.2
Comment 18•16 years ago
|
||
Stack for pipeping2:
core 'core' of 24992: ./pipeping2
----------------- lwp# 1 / thread# 1 --------------------
ff352a48 ForkAndExec (119d0, 21c78, 0, 236d8, 0, 0) + c0
ff3530e8 _MD_CreateUnixProcess (119d0, 21c78, 0, 236d8, 5, 4) + 80
ff32d470 PR_CreateProcess (119d0, 21c78, 0, 236d8, 31960, 0) + 30
00011394 main (1, ffbef9b4, ffbef9bc, 21c00, 0, 0) + 364
00010c00 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
ff29edc4 _signotifywait (ff1cc000, 0, 0, 0, 0, 0) + 8
ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 --------------------------------
ff29c968 _door_return (3, ff1cd658, ff1cd670, 3, ff1cc000, 1) + 10
ff1aa358 _lwp_start (ff145d98, 0, 6000, ffbef2e4, 0, 0) + 18
ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-------------------------- thread# 3 --------------------
ff1ad9b8 _reap_wait (ff1d0980, 1e924, 0, ff1cc000, 0, 0) + 38
ff1ad710 _reaper (ff1cce00, ff1d2708, ff1d0980, ff1ccdd8, 1, fe400000) + 38
ff1bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
-------------------------- thread# 4 --------------------
ff1bafd4 _restorefsr (249c0, 0, 0, 0, 0, 0) + 8
Comment 19•16 years ago
|
||
System resources available:
> limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 32768 kbytes
coredumpsize unlimited
vmemoryuse unlimited
descriptors 1024
Originally, stacksize was 8192 kb. I tried to increase it to 32,768 kb but it didn't change anything.
Comment 20•16 years ago
|
||
Stack for sockping:
core 'core' of 24998: ./sockping
----------------- lwp# 1 / thread# 1 --------------------
fdb52a48 ForkAndExec (11834, 21ad8, 0, 23538, 0, 0) + c0
fdb530e8 _MD_CreateUnixProcess (11834, 21ad8, 0, 23538, 2, 4) + 80
fdb2d470 PR_CreateProcess (11834, 21ad8, 0, 23538, 31960, 0) + 30
0001126c main (1, ffbef9b4, ffbef9bc, 21800, 0, 0) + 234
00010c08 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
fda9edc4 _signotifywait (fd9cc000, 0, 0, 0, 0, 0) + 8
fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 --------------------------------
fda9c968 _door_return (3, fd9cd658, fd9cd670, 3, fd9cc000, 1) + 10
fd9aa358 _lwp_start (fd945d98, 0, 6000, ffbef2e4, 0, 0) + 18
fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-------------------------- thread# 3 --------------------
fd9ad9b8 _reap_wait (fd9d0980, 1e924, 0, fd9cc000, 0, 0) + 38
fd9ad710 _reaper (fd9cce00, fd9d2708, fd9d0980, fd9ccdd8, 1, fe400000) + 38
fd9bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
-------------------------- thread# 4 --------------------
fd9bafd4 _restorefsr (24820, 0, 0, 0, 0, 0) + 8
Comment 21•16 years ago
|
||
I am sorry, comments 18, 19 and 20 where meant for another bug. Please disregard these comments.
Comment 22•16 years ago
|
||
Well, after investigation, it looks like the 2 stacks above are related to this bug...
I made a build for NSPR_4_7_2_RTM without the -Bdirect and these 2 tests don't crash anymore.
Assignee | ||
Comment 23•16 years ago
|
||
Christophe, is there another bug open for these test failures ?
You need to log in
before you can comment on or make changes to this bug.
Description
•