Closed Bug 362768 Opened 18 years ago Closed 17 years ago

Unix: clean up NSPR when the NSPR library is unloaded

Categories

(NSPR :: NSPR, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: wtc)

References

Details

Attachments

(2 files, 4 obsolete files)

Right now NSPR initialization is done implicitly the first time an NSPR function is called, and one has to call PR_Cleanup explicitly to clean up NSPR. The problem with the current implicit initialization is that it is not thread-safe. If two or more threads call NSPR functions at the same time when NSPR isn't initialized, both of them will call the _PR_ImplicitInitialization function. The problem with calling PR_Cleanup is that it is impossible for an NSPR-based library such as NSS and LDAP C SDK to know that the process is done with NSPR and therefore it is safe to shut down NSPR. This enhancement request asks that we use techniques such as a Unix shared library's init and fini functions or GCC's constructor and destructor attributes for functions to so that NSPR is initialized and cleaned up when the NSPR library is loaded and unloaded. This enhancement request doesn't cover Win32. In particular, there are a lot of restrictions on the functions that can be called from DllMain that we can't perform NSPR initialization and cleanup in DllMain on Windows.
Attached patch Incomplete patch for NSPR cleanup (obsolete) (deleted) — Splinter Review
I'm proposing this patch for the NSPR trunk and the NSPR client branch only, so that we can get some testing on the Mozilla trunk. This patch uses GCC's destructor function attribute to specify that the _PR_Fini function should be called when the NSPR library is unloaded. Right now _PR_Fini only cleans up the thread local storage of the current thread. Eventually it needs to do everything we do in PR_Cleanup except waiting for all the PR_USER_THREAD's to terminate.
Attachment #247477 - Flags: review?(alexei.volkov)
Attachment #247477 - Flags: review?(alexei.volkov) → review?(alexei.volkov.bugs)
Comment on attachment 247477 [details] [diff] [review] Incomplete patch for NSPR cleanup I should explain why, among all the resources used by NSPR, I want to clean up the thread local storage first. If the thread local storage used by NSPR to store a pointer to the PRThread structure is not NULL, when the thread terminates, the pthread library will pass that pointer to the thread local storage destructor function, _pt_thread_death. _pt_thread_death is defined in libnspr4.so, so if the pthread library calls _pt_thread_death after libnspr4.so is unloaded, it will crash. Therefore, it's important to set the thread local storage to a NULL pointer before libnspr4.so is unloaded. So while not freeing the other resources will only result in resource leaks, not cleaning up the thread local storage may result in a crash. I will write a new version of this patch that adds the _fini function to libnspr4.so for non-GCC compilers.
Attached patch Proposed patch (obsolete) (deleted) — Splinter Review
This is the complete patch. It covers all the major platforms. 1. GCC: used on Linux, Mac OS X, and the BSD's. I tested it on Linux and Mac OS X. 2. Sun compilers: used on Solaris. I tested it on Solaris 9 with Forte 6 update 2 and Studio 11. 3. HP C Compiler: used on HP-UX. Unfortunately I can't find a macro that is predefined by all versions of the HP C Compiler, so I'm using the HPUX macro as a substitute. I tested it on HP-UX 11.11 (PA-RISC) and 11.23 (ia64), both 32-bit and 64 bit builds. The only Unix platform I don't cover that we may care about is AIX. If necessary, I can do the research. But I don't have any AIX machine.
Attachment #247477 - Attachment is obsolete: true
Attachment #247602 - Flags: superreview?(nelson)
Attachment #247602 - Flags: review?(alexei.volkov.bugs)
Attachment #247477 - Flags: review?(alexei.volkov.bugs)
On AIX, we can use the "initfini" linker option: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds3/ld.htm The linker option –binitfini::_PR_Fini should work for us.
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 4.6.5
Attached patch Proposed patch v2 (deleted) — Splinter Review
Improved comments. No other changes.
Attachment #247602 - Attachment is obsolete: true
Attachment #247747 - Flags: superreview?(nelson)
Attachment #247747 - Flags: review?(alexei.volkov.bugs)
Attachment #247602 - Flags: superreview?(nelson)
Attachment #247602 - Flags: review?(alexei.volkov.bugs)
Comment on attachment 247747 [details] [diff] [review] Proposed patch v2 Wan-Teh, is this patch incomplete for AIX? In ptthread.c you add this code for AIX: >+#elif defined(AIX) >+/* Need to use the binitfini::_PR_Fini linker option. */ >+#endif Yet the patch contains no makefile change to add that linker option.
Nelson, yes, this patch is incomplete for AIX. I don't have an AIX system.
Comment on attachment 247747 [details] [diff] [review] Proposed patch v2 Wan-Teh, Christophe is willing to test a patch for AIX. Please attach the proposed patch to this bug. One more question. _PR_Fini is declared static. Will the linker be able to use a static symbol with the initfini: command?
Attachment #247747 - Flags: superreview?(nelson) → superreview+
Comment on attachment 247747 [details] [diff] [review] Proposed patch v2 I checked in proposed patch v2 on the NSPR trunk (NSPR 4.7) and the NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2). Checking in src/Makefile.in; /cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in new revision: 1.44; previous revision: 1.43 done Checking in src/pthreads/ptthread.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c new revision: 3.70; previous revision: 3.69 done Checking in Makefile.in; /cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in new revision: 1.25.2.22; previous revision: 1.25.2.21 done Checking in pthreads/ptthread.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c new revision: 3.42.2.16; previous revision: 3.42.2.15 done
(In reply to comment #8) > > One more question. _PR_Fini is declared static. > Will the linker be able to use a static symbol with the initfini: command? If _PR_Fini can't be declared static, we can remove the 'static'. On all the platforms (GCC, Solaris, and HP-UX), a static symbol works. I will arrange a time to come over to Sun and write the AIX implementation. The testing of the function requires the use of the debugger.
the checkin on trunk turned balsa orange (and nye more orange)
The tree is closed due to the orange above, by the way. Please fix ASAP or back out on NSPRPUB_PRE_4_2_CLIENT_BRANCH so the tree can be reopened?
Comment on attachment 247747 [details] [diff] [review] Proposed patch v2 Thank you for letting me know. I backed out the patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH. Checking in Makefile.in; /cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v <-- Makefile.in new revision: 1.25.2.23; previous revision: 1.25.2.22 done Checking in pthreads/ptthread.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c new revision: 3.42.2.17; previous revision: 3.42.2.16 done
For what it's worth, my build with the patch in hung on shutdown with the stack: #0 0x0028e39b in js_ThreadDestructorCB (ptr=0x8d9b240) at /builds/trunk/mozilla/js/src/jscntxt.c:82 #1 0x0012604b in _PR_DestroyThreadPrivate (self=0x8c84c10) at /builds/trunk/mozilla/nsprpub/pr/src/threads/prtpd.c:265 #2 0x0013e033 in _pt_thread_death (arg=0x8c84c10) at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:825 #3 0x0013e11e in _PR_Fini () at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:990 #4 0x0011c7b0 in __do_global_dtors_aux () from ./libnspr4.so #5 0x00143e7c in _fini () from ./libnspr4.so #6 0x4f5aa572 in _dl_fini () at dl-fini.c:248 #7 0x4f5e4939 in *__GI_exit (status=0) at exit.c:75 #8 0x4f5cef34 in __libc_start_main (main=0x80485f0 <main>, argc=3, ubp_av=0xbfe7d0c4, init=0x8048660 <__libc_csu_init>, fini=0x8048650 <__libc_csu_fini>, rtld_fini=0x4f5aa390 <_dl_fini>, stack_end=0xbfe7d0bc) at libc-start.c:262 #9 0x08048561 in _start ()
I checked in this workaround on the NSPR trunk (NSPR 4.7) and NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2). The _pt_thread_death function calls the NSPR thread private data destructors. In Mozilla the thread private data destructor for JavaScript was causing the hang on shutdown. Before I have the setup to debug it, it is best to comment out the _pt_thread_death call in _PR_Fini. In general it seems risky to call NSPR thread private data destructors in NSPR's shared library finalization function because the libraries in which the destructors are defined may have been unloaded. For our immediate problem, the ptthread_setspecific(pt_book.key, NULL) call is what we need. So there is still value to test the partial patch on the Mozilla trunk.
Brendan, please refer to the stack trace in comment 14. I checked in an NSPR patch to add a shared library finalization function, traditionally known as "_fini". On Linux this function is implemented using the GCC __attribute__ ((destructor)) language extension. Right now the only cleanup I do is to set the pthread thread-specific data used by NSPR to NULL. Since that data is a pointer to the PRThread structure, I used the existing function _pt_thread_death to free the PRThread structure first. But _pt_thread_death does more cleanup than just freeing the memory. In particular, it invokes NSPR's thread private data destructors on the NSPR thread private data for that thread. (Note the distinction between pthread's thread-specific data and NSPR's thread private data. Both are used here.) On Mozilla shutdown, we have a hang in js_ThreadDestructorCB. Since the line is in a while loop, I suspect that the hang is really an infinite loop. It seems that only debug builds have this problem. Do you have any idea what might be the cause? In any case, it seems risky for NSPR's shared library finalization function to call NSPR's thread private data destructors. I am just curious as to why we hang in js_ThreadDestructorCB.
We reproduced this hang in both debug and optimized builds. The hang comes with 100% CPU utilization, so it's an infinite loop. Here are some data values in the debugger. Stack trace is in http://pastebin.mozilla.org/2792 (gdb) print thread->contextList $1 = {next = 0xbe5d08, prev = 0x2aaab7a6e4f8} (gdb) print *thread $6 = {contextList = {next = 0xbe5d08, prev = 0x2aaab7a6e4f8}, id = 5532192, gcFreeLists = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, gcMallocBytes = 48, gcRunningCloseHooks = 0, gsnCache = {script = 0x0, table = {ops = 0x0, data = 0x0, hashShift = 0, maxAlphaFrac = 0 '\0', minAlphaFrac = 0 '\0', entrySize = 0, entryCount = 0, removedCount = 0, generation = 0, entryStore = 0x0}}} (gdb) print thread $7 = (JSThread *) 0x680c40 which is identical after a couple of iterations (gdb) print *thread->contextList.next $11 = {next = 0xbe5d08, prev = 0xbe5d08} (gdb) print *thread->contextList.prev $12 = {next = 0x680c40, prev = 0x2aaab79e2138}
Probably a free memory read, but I have no idea why. The prev link looks bad. Can you purify or equivalent? /be
I'm so old-school -- of course I mean "run valgrind." ;-). /be
Or try MALLOC_CHECK_ set to 2 in the environment, for Linux? /be
Brendan, we found out why. It's a bug in the JavaScript code. This is a variant of the common problem of passing x++ as the argument to a macro that instantiates the argument twice, resulting in x being incremented twice. The JavaScript code passes thread->contextList.next to the JS_REMOVE_AND_INIT_LINK macro. The macro changes the value of thread->contextList.next in the first assignment: (_e)->prev->next = (_e)->next;
Attached patch fix js_ThreadDestructorCB (obsolete) (deleted) — Splinter Review
Wow, embarrassing. Thanks for finding this. /be
Attachment #251328 - Flags: review?(wtchang)
Comment on attachment 251328 [details] [diff] [review] fix js_ThreadDestructorCB Brendan, thanks for the patch. Please use bug 366869 to follow up on the JavaScript bug.
Attachment #251328 - Attachment is obsolete: true
Attachment #251328 - Flags: review?(wtchang)
Comment on attachment 251300 [details] [diff] [review] Workaround: don't call _pt_thread_death in _PR_Fini I restored the _pt_thread_death call in _PR_Fini on the NSPR trunk and NSPRPUB_PRE_4_2_CLIENT_BRANCH because Brendan has checked in the JavaScript fix.
Attachment #251300 - Attachment is obsolete: true
Target Milestone: 4.6.5 → 4.7
Attached patch AIX patch (deleted) — Splinter Review
Tested on Sun's AIX 5.1 machine 'zebra' with both 32-bit and 64-bit builds. On AIX, the _PR_Fini function cannot be static. So I moved the static declaration of _PR_Fini to the ifdef sections for the platforms that allow _PR_Fini to be static.
Attachment #252010 - Flags: review?(nelson)
Comment on attachment 252010 [details] [diff] [review] AIX patch On some platforms, this patch uses the 'static' specifier on only the declaration of _PR_Fini but not the definition of _PR_Fini: static void _PR_Fini(void); void _PR_Fini(void) { .... } This is allowed in C. The function _PR_Fini is static in this case. See The C Programming Language, 2nd Ed., Appendix A10.2. On page 227, it says: If the first external declaration for a function or object includes the 'static' specifier, the identifier has internal linkage; otherwise it has external linkage. If this is an esoteric feature of C, I can remove those static declarations from the patch.
Attachment #247747 - Flags: review?(alexei.volkov.bugs) → review+
Attachment #252010 - Flags: review?(nelson) → review+
I checked in the AIX patch on the NSPR trunk (NSPR 4.7) and the NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.9 alpha 2). The remaining work is to implement the NSPR shared library initialization function. It may be as simple as calling _PR_ImplicitInitialization as the "init" routine for all the platforms.
Whiteboard: Remaining work: implement NSPR shared library initialization function
Priority: P2 → P1
Depends on: 368289
Blocks: 379550
Blocks: 383977
is this ready for check in?
Kai, all the patches in this bug have been checked in. The bug was left open because I haven't done the remaining work (see comment 27). I don't have time to finish the remaining work in NSPR 4.7, so I'm going to update the bug's summary to reflect the completed work, and mark the bug fixed.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: Unix: initialize and clean up NSPR when the NSPR library is loaded and unloaded → Unix: clean up NSPR when the NSPR library is unloaded
Whiteboard: Remaining work: implement NSPR shared library initialization function
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: