Closed
Bug 362768
Opened 18 years ago
Closed 17 years ago
Unix: clean up NSPR when the NSPR library is unloaded
Categories
(NSPR :: NSPR, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
4.7
People
(Reporter: wtc, Assigned: wtc)
References
Details
Attachments
(2 files, 4 obsolete files)
(deleted),
patch
|
alvolkov.bgs
:
review+
nelson
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
Right now NSPR initialization is done implicitly the first time
an NSPR function is called, and one has to call PR_Cleanup explicitly
to clean up NSPR.
The problem with the current implicit initialization is that it is
not thread-safe. If two or more threads call NSPR functions at the
same time when NSPR isn't initialized, both of them will call the
_PR_ImplicitInitialization function.
The problem with calling PR_Cleanup is that it is impossible for an
NSPR-based library such as NSS and LDAP C SDK to know that the process
is done with NSPR and therefore it is safe to shut down NSPR.
This enhancement request asks that we use techniques such as a Unix
shared library's init and fini functions or GCC's constructor and
destructor attributes for functions to so that NSPR is initialized and
cleaned up when the NSPR library is loaded and unloaded.
This enhancement request doesn't cover Win32. In particular, there are
a lot of restrictions on the functions that can be called from DllMain
that we can't perform NSPR initialization and cleanup in DllMain on
Windows.
Assignee | ||
Comment 1•18 years ago
|
||
I'm proposing this patch for the NSPR trunk and the NSPR client
branch only, so that we can get some testing on the Mozilla trunk.
This patch uses GCC's destructor function attribute to specify that
the _PR_Fini function should be called when the NSPR library is
unloaded. Right now _PR_Fini only cleans up the thread local storage
of the current thread. Eventually it needs to do everything we do in
PR_Cleanup except waiting for all the PR_USER_THREAD's to terminate.
Attachment #247477 -
Flags: review?(alexei.volkov)
Assignee | ||
Updated•18 years ago
|
Attachment #247477 -
Flags: review?(alexei.volkov) → review?(alexei.volkov.bugs)
Assignee | ||
Comment 2•18 years ago
|
||
Comment on attachment 247477 [details] [diff] [review]
Incomplete patch for NSPR cleanup
I should explain why, among all the resources used by NSPR,
I want to clean up the thread local storage first.
If the thread local storage used by NSPR to store a pointer
to the PRThread structure is not NULL, when the thread terminates,
the pthread library will pass that pointer to the thread local
storage destructor function, _pt_thread_death. _pt_thread_death
is defined in libnspr4.so, so if the pthread library calls
_pt_thread_death after libnspr4.so is unloaded, it will crash.
Therefore, it's important to set the thread local storage to
a NULL pointer before libnspr4.so is unloaded.
So while not freeing the other resources will only result in
resource leaks, not cleaning up the thread local storage may
result in a crash. I will write a new version of this patch
that adds the _fini function to libnspr4.so for non-GCC compilers.
Assignee | ||
Comment 3•18 years ago
|
||
This is the complete patch. It covers all the major platforms.
1. GCC: used on Linux, Mac OS X, and the BSD's. I tested it on
Linux and Mac OS X.
2. Sun compilers: used on Solaris. I tested it on Solaris 9 with
Forte 6 update 2 and Studio 11.
3. HP C Compiler: used on HP-UX. Unfortunately I can't find a
macro that is predefined by all versions of the HP C Compiler,
so I'm using the HPUX macro as a substitute. I tested it on
HP-UX 11.11 (PA-RISC) and 11.23 (ia64), both 32-bit and 64 bit
builds.
The only Unix platform I don't cover that we may care about is
AIX. If necessary, I can do the research. But I don't have any
AIX machine.
Attachment #247477 -
Attachment is obsolete: true
Attachment #247602 -
Flags: superreview?(nelson)
Attachment #247602 -
Flags: review?(alexei.volkov.bugs)
Attachment #247477 -
Flags: review?(alexei.volkov.bugs)
Assignee | ||
Comment 4•18 years ago
|
||
On AIX, we can use the "initfini" linker option:
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds3/ld.htm
The linker option –binitfini::_PR_Fini should work for us.
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 4.6.5
Assignee | ||
Comment 5•18 years ago
|
||
Improved comments. No other changes.
Attachment #247602 -
Attachment is obsolete: true
Attachment #247747 -
Flags: superreview?(nelson)
Attachment #247747 -
Flags: review?(alexei.volkov.bugs)
Attachment #247602 -
Flags: superreview?(nelson)
Attachment #247602 -
Flags: review?(alexei.volkov.bugs)
Comment 6•18 years ago
|
||
Comment on attachment 247747 [details] [diff] [review]
Proposed patch v2
Wan-Teh, is this patch incomplete for AIX?
In ptthread.c you add this code for AIX:
>+#elif defined(AIX)
>+/* Need to use the binitfini::_PR_Fini linker option. */
>+#endif
Yet the patch contains no makefile change to add that linker option.
Assignee | ||
Comment 7•18 years ago
|
||
Nelson, yes, this patch is incomplete for AIX. I don't
have an AIX system.
Comment 8•18 years ago
|
||
Comment on attachment 247747 [details] [diff] [review]
Proposed patch v2
Wan-Teh, Christophe is willing to test a patch for AIX.
Please attach the proposed patch to this bug.
One more question. _PR_Fini is declared static.
Will the linker be able to use a static symbol with the initfini: command?
Attachment #247747 -
Flags: superreview?(nelson) → superreview+
Assignee | ||
Comment 9•18 years ago
|
||
Comment on attachment 247747 [details] [diff] [review]
Proposed patch v2
I checked in proposed patch v2 on the NSPR trunk (NSPR 4.7)
and the NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2).
Checking in src/Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in
new revision: 1.44; previous revision: 1.43
done
Checking in src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c
new revision: 3.70; previous revision: 3.69
done
Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in
new revision: 1.25.2.22; previous revision: 1.25.2.21
done
Checking in pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c
new revision: 3.42.2.16; previous revision: 3.42.2.15
done
Assignee | ||
Comment 10•18 years ago
|
||
(In reply to comment #8)
>
> One more question. _PR_Fini is declared static.
> Will the linker be able to use a static symbol with the initfini: command?
If _PR_Fini can't be declared static, we can remove the 'static'.
On all the platforms (GCC, Solaris, and HP-UX), a static symbol
works.
I will arrange a time to come over to Sun and write the AIX implementation.
The testing of the function requires the use of the debugger.
Comment 11•18 years ago
|
||
the checkin on trunk turned balsa orange (and nye more orange)
Comment 12•18 years ago
|
||
The tree is closed due to the orange above, by the way. Please fix ASAP or back out on NSPRPUB_PRE_4_2_CLIENT_BRANCH so the tree can be reopened?
Assignee | ||
Comment 13•18 years ago
|
||
Comment on attachment 247747 [details] [diff] [review]
Proposed patch v2
Thank you for letting me know. I backed out the patch
on the NSPRPUB_PRE_4_2_CLIENT_BRANCH.
Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in
new revision: 1.25.2.23; previous revision: 1.25.2.22
done
Checking in pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c
new revision: 3.42.2.17; previous revision: 3.42.2.16
done
Comment 14•18 years ago
|
||
For what it's worth, my build with the patch in hung on shutdown with the stack:
#0 0x0028e39b in js_ThreadDestructorCB (ptr=0x8d9b240)
at /builds/trunk/mozilla/js/src/jscntxt.c:82
#1 0x0012604b in _PR_DestroyThreadPrivate (self=0x8c84c10)
at /builds/trunk/mozilla/nsprpub/pr/src/threads/prtpd.c:265
#2 0x0013e033 in _pt_thread_death (arg=0x8c84c10)
at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:825
#3 0x0013e11e in _PR_Fini ()
at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:990
#4 0x0011c7b0 in __do_global_dtors_aux () from ./libnspr4.so
#5 0x00143e7c in _fini () from ./libnspr4.so
#6 0x4f5aa572 in _dl_fini () at dl-fini.c:248
#7 0x4f5e4939 in *__GI_exit (status=0) at exit.c:75
#8 0x4f5cef34 in __libc_start_main (main=0x80485f0 <main>, argc=3,
ubp_av=0xbfe7d0c4, init=0x8048660 <__libc_csu_init>,
fini=0x8048650 <__libc_csu_fini>, rtld_fini=0x4f5aa390 <_dl_fini>,
stack_end=0xbfe7d0bc) at libc-start.c:262
#9 0x08048561 in _start ()
Assignee | ||
Comment 15•18 years ago
|
||
I checked in this workaround on the NSPR trunk (NSPR 4.7) and
NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2).
The _pt_thread_death function calls the NSPR thread private data
destructors. In Mozilla the thread private data destructor for
JavaScript was causing the hang on shutdown. Before I have the
setup to debug it, it is best to comment out the _pt_thread_death
call in _PR_Fini. In general it seems risky to call NSPR thread
private data destructors in NSPR's shared library finalization
function because the libraries in which the destructors are defined
may have been unloaded.
For our immediate problem, the ptthread_setspecific(pt_book.key, NULL)
call is what we need. So there is still value to test the partial
patch on the Mozilla trunk.
Assignee | ||
Comment 16•18 years ago
|
||
Brendan, please refer to the stack trace in comment 14.
I checked in an NSPR patch to add a shared library finalization function,
traditionally known as "_fini". On Linux this function is implemented
using the GCC __attribute__ ((destructor)) language extension.
Right now the only cleanup I do is to set the pthread thread-specific
data used by NSPR to NULL. Since that data is a pointer to the
PRThread structure, I used the existing function _pt_thread_death
to free the PRThread structure first. But _pt_thread_death does
more cleanup than just freeing the memory. In particular, it invokes
NSPR's thread private data destructors on the NSPR thread private data
for that thread. (Note the distinction between pthread's thread-specific
data and NSPR's thread private data. Both are used here.)
On Mozilla shutdown, we have a hang in js_ThreadDestructorCB. Since
the line is in a while loop, I suspect that the hang is really an
infinite loop. It seems that only debug builds have this problem.
Do you have any idea what might be the cause?
In any case, it seems risky for NSPR's shared library finalization
function to call NSPR's thread private data destructors. I am just
curious as to why we hang in js_ThreadDestructorCB.
Assignee | ||
Comment 17•18 years ago
|
||
We reproduced this hang in both debug and optimized builds.
The hang comes with 100% CPU utilization, so it's an infinite
loop. Here are some data values in the debugger.
Stack trace is in http://pastebin.mozilla.org/2792
(gdb) print thread->contextList
$1 = {next = 0xbe5d08, prev = 0x2aaab7a6e4f8}
(gdb) print *thread
$6 = {contextList = {next = 0xbe5d08, prev = 0x2aaab7a6e4f8}, id = 5532192, gcFreeLists = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0}, gcMallocBytes = 48, gcRunningCloseHooks = 0, gsnCache = {script = 0x0, table = {ops = 0x0, data = 0x0,
hashShift = 0, maxAlphaFrac = 0 '\0', minAlphaFrac = 0 '\0', entrySize = 0, entryCount = 0, removedCount = 0,
generation = 0, entryStore = 0x0}}}
(gdb) print thread
$7 = (JSThread *) 0x680c40
which is identical after a couple of iterations
(gdb) print *thread->contextList.next
$11 = {next = 0xbe5d08, prev = 0xbe5d08}
(gdb) print *thread->contextList.prev
$12 = {next = 0x680c40, prev = 0x2aaab79e2138}
Comment 18•18 years ago
|
||
Probably a free memory read, but I have no idea why. The prev link looks bad. Can you purify or equivalent?
/be
Comment 19•18 years ago
|
||
I'm so old-school -- of course I mean "run valgrind." ;-).
/be
Comment 20•18 years ago
|
||
Or try MALLOC_CHECK_ set to 2 in the environment, for Linux?
/be
Assignee | ||
Comment 21•18 years ago
|
||
Brendan, we found out why. It's a bug in the JavaScript code.
This is a variant of the common problem of passing x++ as the
argument to a macro that instantiates the argument twice,
resulting in x being incremented twice.
The JavaScript code passes thread->contextList.next to the
JS_REMOVE_AND_INIT_LINK macro. The macro changes the value
of thread->contextList.next in the first assignment:
(_e)->prev->next = (_e)->next;
Comment 22•18 years ago
|
||
Wow, embarrassing. Thanks for finding this.
/be
Attachment #251328 -
Flags: review?(wtchang)
Assignee | ||
Comment 23•18 years ago
|
||
Comment on attachment 251328 [details] [diff] [review]
fix js_ThreadDestructorCB
Brendan, thanks for the patch. Please use bug 366869 to follow up on the JavaScript bug.
Attachment #251328 -
Attachment is obsolete: true
Attachment #251328 -
Flags: review?(wtchang)
Assignee | ||
Comment 24•18 years ago
|
||
Comment on attachment 251300 [details] [diff] [review]
Workaround: don't call _pt_thread_death in _PR_Fini
I restored the _pt_thread_death call in _PR_Fini on the
NSPR trunk and NSPRPUB_PRE_4_2_CLIENT_BRANCH because Brendan
has checked in the JavaScript fix.
Attachment #251300 -
Attachment is obsolete: true
Assignee | ||
Updated•18 years ago
|
Target Milestone: 4.6.5 → 4.7
Assignee | ||
Comment 25•18 years ago
|
||
Tested on Sun's AIX 5.1 machine 'zebra' with both 32-bit and 64-bit
builds.
On AIX, the _PR_Fini function cannot be static. So I moved the
static declaration of _PR_Fini to the ifdef sections for the platforms
that allow _PR_Fini to be static.
Attachment #252010 -
Flags: review?(nelson)
Assignee | ||
Comment 26•18 years ago
|
||
Comment on attachment 252010 [details] [diff] [review]
AIX patch
On some platforms, this patch uses the 'static' specifier on
only the declaration of _PR_Fini but not the definition of _PR_Fini:
static void _PR_Fini(void);
void _PR_Fini(void)
{
....
}
This is allowed in C. The function _PR_Fini is static in this
case. See The C Programming Language, 2nd Ed., Appendix A10.2.
On page 227, it says:
If the first external declaration for a function or object
includes the 'static' specifier, the identifier has internal
linkage; otherwise it has external linkage.
If this is an esoteric feature of C, I can remove those static
declarations from the patch.
Updated•18 years ago
|
Attachment #247747 -
Flags: review?(alexei.volkov.bugs) → review+
Updated•18 years ago
|
Attachment #252010 -
Flags: review?(nelson) → review+
Assignee | ||
Comment 27•18 years ago
|
||
I checked in the AIX patch on the NSPR trunk (NSPR 4.7) and
the NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2).
The remaining work is to implement the NSPR shared library
initialization function. It may be as simple as calling
_PR_ImplicitInitialization as the "init" routine for all
the platforms.
Whiteboard: Remaining work: implement NSPR shared library initialization function
Assignee | ||
Updated•18 years ago
|
Priority: P2 → P1
Comment 28•17 years ago
|
||
is this ready for check in?
Assignee | ||
Comment 29•17 years ago
|
||
Kai, all the patches in this bug have been checked in.
The bug was left open because I haven't done the remaining
work (see comment 27). I don't have time to finish the
remaining work in NSPR 4.7, so I'm going to update the
bug's summary to reflect the completed work, and mark
the bug fixed.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: Unix: initialize and clean up NSPR when the NSPR library is loaded and unloaded → Unix: clean up NSPR when the NSPR library is unloaded
Whiteboard: Remaining work: implement NSPR shared library initialization function
You need to log in
before you can comment on or make changes to this bug.
Description
•