Closed Bug 451024 Opened 16 years ago Closed 16 years ago

certutil.exe crashes with Segmentation fault inside PR_Cleanup

Categories

(NSS :: Libraries, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
3.12.2

People

(Reporter: maksr3, Assigned: wtc)

Details

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008081311 Minefield/3.0.1 Build Identifier: nss 3.12 May be I have a somewhat unusual build, but may be that is why the bug at least showed up. I've succeeded to build Firefox unofficially on Windows XP with gcc+mingw+buildtools. Firefox runs OK. Then in the same build directories, inside mozilla/security/nss/cmd/certutil I've built certutil.exe with the command /bin/make -C $(pwd) MAKE="/bin/make -j1" -j1 CC="gcc -g -mno-cygwin" SOURCE_MD_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist DIST=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist NSPR_INCLUDE_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/include/nspr NSPR_LIB_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/lib MOZILLA_CLIENT=1 NO_MDUPDATE=1 NSS_ENABLE_ECC=1 BUILD_TREE=d:/Maksr/Produkti/Install/Mozilla/mozilla BUILD_OPT=1 OPT_CODE_SIZE=1 NS_USE_GCC=1 NS_USE_NATIVE= OS_TARGET=WIN95 I add the .../mozilla/dist/bin to my PATH. I copy the three files, cert8.db, key3.db, secmod.db from my Firefox profile directory to a temporary working directory and run the command certuil -L -d . It lists a few certs, then it crashes with a Dialog box to send an error report to Microsoft. I've tried to run it with gdb and I've found out, that at the end of the certutil_main function in nss/cmd/certutil/certutil.c there is a call to NSS_Shutdown, which unloads among others the library nssckbi.dll, but after that the call to PR_Cleanup wants to clean some thread and calls a destructor at address 0x63c0b96c that is no longer in mermory, because it has been unloaded with nssckbi. Here is an excerpt from my session with gdb: Breakpoint 9, certutil_main (argc=0, argv=0x22fc98, initialize=1) at certutil.c:2894 2894 if ((initialized == PR_TRUE) && NSS_Shutdown() != SECSuccess) { (gdb) info share From To Syms Read Shared Object Library 0x7c901000 0x7c9afe88 Yes C:\WINNT\system32\ntdll.dll 0x7c801000 0x7c8f4bec Yes C:\WINNT\system32\kernel32.dll 0x77c11000 0x77c67d74 Yes C:\WINNT\system32\msvcrt.dll 0x643c1000 0x644943ec Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\nss3.dll 0x64f41000 0x64f727b0 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\nspr4.dll 0x77dd1000 0x77e6ab38 Yes C:\WINNT\system32\advapi32.dll 0x77e71000 0x77f01464 Yes C:\WINNT\system32\rpcrt4.dll 0x77fe1000 0x77ff0884 Yes C:\WINNT\system32\secur32.dll 0x76b41000 0x76b6c8b4 Yes C:\WINNT\system32\winmm.dll 0x7e411000 0x7e49fde8 Yes C:\WINNT\system32\user32.dll 0x77f11000 0x77f568a0 Yes C:\WINNT\system32\gdi32.dll 0x71ad1000 0x71ad804c Yes C:\WINNT\system32\wsock32.dll 0x71ab1000 0x71ac6dc8 Yes C:\WINNT\system32\ws2_32.dll 0x71aa1000 0x71aa7324 Yes C:\WINNT\system32\ws2help.dll 0x622c1000 0x622c9110 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\plds4.dll 0x6ce41000 0x6ce49140 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\plc4.dll 0x68f81000 0x68f947cc Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\nssutil3.dll 0x6e1c1000 0x6e1e2718 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\smime3.dll 0x64a81000 0x64aafe08 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\softokn3.dll 0x61c01000 0x61c65f84 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\sqlite3.dll 0x679c1000 0x679e2914 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\nssdbm3.dll 0x696c1000 0x69707f04 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\freebl3.dll 0x63c01000 0x63c4c030 Yes D:\Maksr\Produkti\Install\Mozilla\mozilla\di st\bin\nssckbi.dll (gdb) info symbol 0x63c0b96c PR_Free in section .text (gdb) next 2898 PR_Cleanup(); (gdb) info symbol 0x63c0b96c No symbol matches 0x63c0b96c. (gdb) next Breakpoint 6, _PR_DestroyThreadPrivate (self=0x3d4790) at prtpd.c:245 245 { (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. 0x63c0b96c in ?? () (gdb) (gdb) backtrace #0 0x63c0b96c in ?? () #1 0x64f484ed in _PR_DestroyThreadPrivate (self=0x3d4790) at prtpd.c:265 #2 0x64f57fe2 in _PR_CleanupThread (thread=0x3d4790) at prcthr.c:62 #3 0x64f4d8ee in PR_Cleanup () at prinit.c:424 #4 0x00406f5d in certutil_main (argc=0, argv=0x22fc98, initialize=1) at certutil.c:2898 #5 0x004095aa in main (argc=4, argv=0x3d2890) at certutil.c:2910 (gdb) The most informing are two different answers to "info symbol 0x63c0b96c". Maks Romih. Reproducible: Sometimes Steps to Reproduce: 1. 2. 3.
What version of NSPR are you using? Where/how did you get it?
Assignee: nobody → wtc
Component: Tools → NSPR
Product: NSS → NSPR
QA Contact: tools → nspr
Version: unspecified → other
Maks, PR_Free is the NSPR thread private data destructor installed by this call: http://mxr.mozilla.org/security/source/security/nss/lib/base/error.c#103 100 static PRStatus 101 error_once_function ( void) 102 { 103 return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free); 104 } I need to know the version of NSS you're using. 1. In mozilla/client.mk, what's the value of the NSS_CO_TAG variable? 2. In mozilla/security/nss/lib/nss/nss.h, what's the value of the NSS_VERSION macro?
(In reply to comment #1) > What version of NSPR are you using? > Where/how did you get it? > I've got NSPR with the Firefox source tarball firefox-3.0.1-source.tar.bz2 that I downloaded on Jully 28th. Properties/Version on my nspr4.dll shows 4.7.1.0.
> > PR_Free is the NSPR thread private data destructor installed by > this call: > http://mxr.mozilla.org/security/source/security/nss/lib/base/error.c#103 > > 100 static PRStatus > 101 error_once_function ( void) > 102 { > 103 return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free); > 104 } > Yes, exactly. This function in my case gets called twice. Problematic is second call which is made after loading nssckbi.dll. Here is the call stack: #0 PR_NewThreadPrivateIndex (newIndex=0x63c43040, dtor=0x63c0b96c <PR_Free>) at prtpd.c:131 #1 0x63c0b35a in error_once_function () at error.c:99 #2 0x64f4df74 in PR_CallOnce (once=0x63c43050, func=0x63c0b348 <error_once_function>) at prinit.c:808 #3 0x63c0b37a in error_get_my_stack () at error.c:120 #4 0x63c43050 in error_stack_index () from D:\Maksr\Produkti\Install\Mozilla\mozilla\dist\bin\nssckbi.dll #5 0x63c0b348 in nss_arena_hash_free_entry () from D:\Maksr\Produkti\Install\Mozilla\mozilla\dist\bin\nssckbi.dll #6 0x63c0b453 in nss_ClearErrorStack () at error.c:277 #7 0x63c0ac09 in NSSArena_Create () at arena.c:385 #8 0x63c07a7c in nssCKFWInstance_Create (pInitArgs=0x6445b70a, LockingState=MultiThreaded, mdInstance=0x63c10080, pError=0x22f954) at instance.c:217 #9 0x63c01efa in NSSCKFWC_Initialize (pFwInstance=0x63c43020, mdInstance=0x63c10080, pInitArgs=0x6445b70a) at wrap.c:205 #10 0x63c01195 in builtinsC_Initialize (pInitArgs=0x6445b70a) at d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/public/nss/nssck.api:117 #11 0x643d54f7 in secmod_ModuleInit (mod=0xb38b08, alreadyLoaded=0x22f9c8) at pk11load.c:147 #12 0x643d58bf in SECMOD_LoadPKCS11Module (mod=0xb38b08) at pk11load.c:379 #13 0x643e4930 in SECMOD_AddModule (newModule=0xb38b08) at pk11util.c:487 #14 0x643e49b8 in SECMOD_AddNewModuleEx (moduleName=0x6445a244 "Root Certs", dllPath=0xb389e8 "./nssckbi.dll", defaultMechanismFlags=0, cipherEnableFlags=0, modparms=0x0, nssparms=0x0) at pk11util.c:586 #15 0x643e4a81 in SECMOD_AddNewModule (moduleName=0x6445a244 "Root Certs", dllPath=0xb389e8 "./nssckbi.dll", defaultMechanismFlags=0, cipherEnableFlags=0) at pk11util.c:626 #16 0x643c1ae6 in nss_Init (configdir=0x421e90 ".", certPrefix=0x417141 "", keyPrefix=0x417141 "", secmodName=0x418ab3 "secmod.db", updateDir=0x6445a00c "", updCertPrefix=0x6445a00c "", updKeyPrefix=0x6445a00c "", updateID=0x6445a00c "", updateName=0x6445a00c "", readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, noRootInit=0, optimizeSpace=0, noSingleThreadedModules=0, allowAlreadyInitializedModules=0, dontFinalizeModules=0) at nssinit.c:388 #17 0x643c1ca1 in NSS_Initialize (configdir=0x421e90 ".", certPrefix=0x417141 "", keyPrefix=0x417141 "", secmodName=0x418ab3 "secmod.db", flags=1) at nssinit.c:661 #18 0x00406d9c in certutil_main (argc=4288833, argv=0x1, initialize=1) at certutil.c:2324 #19 0x004095aa in main (argc=4, argv=0x3d2980) at certutil.c:2910 In my nssckbi.dll the PR_Free is imported, so the address, that gets transferred in the second parameter to PR_NewThreadPrivateIndex is from the segment of nssckbi.dll. This causes segfault afterwards, when the NSS_Shutdown unloads the nssckbi.dll before PR_Cleanup tries to destroy the thread. I will investigate now if the MSVC build has the same problem, however I'm not so at home wih WinDbg as I'm with Gdb. Maybe import mechanism between DLLs in MSVC works differently than in GCC. > I need to know the version of NSS you're using. > 1. In mozilla/client.mk, what's the value of the NSS_CO_TAG variable? > 2. In mozilla/security/nss/lib/nss/nss.h, what's the value of the > NSS_VERSION macro? > NSS_CO_TAG = FIREFOX_3_0_1_RELEASE #define NSS_VERSION "3.12.0.3" _NSS_ECC_STRING _NSS_CUSTOMIZED
yeah, you're basically passing a thunk to the cleanup library. and then unloading the library. by the time you call pr_cleanup, the library containing the thunk is dead and the pointer isn't valid. you can use nspr to lookup the real PR_Free function instead.
Maks, Thanks for the NSS version info. I was worried that you're using a very old version of NSS (used in Firefox 1.0.x), whose nssckbi.dll has its own stub of PR_Free: http://mxr.mozilla.org/aviarybranch/source/security/nss/lib/ckfw/nsprstub.c#317 If you unload that old nssckbi.dll, it would make sense that its PR_Free stub is also gone from the address space. You are using the latest version of NSS, whose nssckbi.dll uses the real PR_Free exported from nspr4.dll. Except that in this case it is a thunk in nssckbi.dll for PR_Free that is passed to PR_NewThreadPrivateIndex. The fix that timeless suggested in comment 5 should work, but is ugly. Can you modify GCC (or suggest to the GCC maintainers) so that GCC uses the address of the real function rather than the thunk when the address is passed to a function defined in another DLL? We fixed a related memory leak (bug 431805) before. I believe the same fix applied to nssckbi.dll would fix this crash. In mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize, could you add this after the two NULL pointer checks? nss_DestroyErrorStack(); There are two other ways to fix this crash: 1. Make lib/base a shared library. (We could just make it part of nssutil3.dll.) Right now lib/base is a static library. So anything that links with lib/base (nss3.dll and nssckbi.dll) has its own copy of the NSS error stack, and each one needs to call nss_DestroyErrorStack() before shutdown. 2. Remove the NSS error stack. I don't think anyone is using it.
Assignee: wtc → nobody
Status: UNCONFIRMED → NEW
Component: NSPR → Libraries
Ever confirmed: true
Product: NSPR → NSS
QA Contact: nspr → libraries
Version: other → unspecified
Status: NEW → ASSIGNED
Assignee: nobody → wtc
Status: ASSIGNED → NEW
Is this a consequence of the fix for bug 337887, which added -mnop-fun-dllimport to the CFLAGS for GCC on win32? That was a workaround for an unnecessary gcc incompatibility. Maybe we can get the gcc folks to fix the cause of bug 337887 and then back out that workaround?
(In reply to comment #6) Wan-Teh, thank you for the solution! > We fixed a related memory leak (bug 431805) before. I believe > the same fix applied to nssckbi.dll would fix this crash. In > mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize, > could you add this after the two NULL pointer checks? > > nss_DestroyErrorStack(); > This really helps! No more crash. I've also verified with gdb that the destructor PR_Free is now called only once from _PR_DestroyThreadPrivate. Maks.
(In reply to comment #8) > (In reply to comment #6) > > Wan-Teh, thank you for the solution! > > > We fixed a related memory leak (bug 431805) before. I believe > > the same fix applied to nssckbi.dll would fix this crash. In > > mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize, > > could you add this after the two NULL pointer checks? > > > > nss_DestroyErrorStack(); > > > > This really helps! No more crash. > I've tried certutil and it really works now. But there's still a problem with modutil, that still crashes in a similar manner. I think it would be better to put nss_DestroyErrorStack after the row error = nssCKFWInstance_Destroy(*pFwInstance); because, when linked into modutil, this row creates a new error stack. Maks.
Maks, You're right. The NSSArena_Destroy call at the end of nssCKFWInstance_Destroy creates the error stack if it doesn't exist. We should put nss_DestroyErrorStack right before the "return error;" statement at the end of NSSCKFWC_Finalize. I am wondering why certutil doesn't crash when we put nss_DestroyErrorStack before the line error = nssCKFWInstance_Destroy(*pFwInstance); nssCKFWInstance_Destroy should also create a new error stack when linked into certutil, right? Can you find out what happens in gdb?
(In reply to comment #10) > > We should put nss_DestroyErrorStack right before the > "return error;" statement at the end of NSSCKFWC_Finalize. > Now I've put it there and both, certutil and modutil seem to work OK. > I am wondering why certutil doesn't crash when we put > nss_DestroyErrorStack before the line > > error = nssCKFWInstance_Destroy(*pFwInstance); > > nssCKFWInstance_Destroy should also create a new error stack > when linked into certutil, right? Can you find out what > happens in gdb? Yeah, I was wondering also. The fact is in comment 8 and comment 9 I was mistaken. This bug was as elusive as dangling pointers can be (I marked it as "Reproducible: Sometimes" in comment 1). When tracing with gdb I was misled, because in the function _PR_DestroyThreadPrivate the destructor really got called one time less than before, so I thought it was OK. But one call less was the result of my applying the patch from bug 431805, not the fix suggested by Wan-Teh. Now, that I've put the nss_DestroyErrorStack just before return in NSSCKFWC_Finalize, the destructor PR_Free doesn't get called at all. Both error stacks are destroyed before PR_Cleanup, one from NSSCKFWC_Finalize (called indirectly from NSS_Shutdown) and one from NSS_Shutdown itself. I hope this concludes the matter. This was my first reported bug in this bugzilla and I'm really not handy enough to make an official patch. Wan-Teh, will you take care? Maks.
(In reply to comment #11) > NSSCKFWC_Finalize, the destructor PR_Free doesn't get called at all. Both PR_Free of course do get called, but I mean not from _PR_DestroyThreadPrivate when PR_Cleanup destroys the threads. Maks.
Attached patch Proposed patch (obsolete) (deleted) — Splinter Review
Maks, please test this patch, which is slightly different from what you tested before. Note that I call nss_DestroyErrorStack() right before NSSCKFWC_Finalize returns, but only if NSSCKFWC_Finalize returns successfully (CKR_OK). The reason is that people may want to examine the error stack if NSSCKFWC_Finalize returns an error. Of course, people may ignore the NSSCKFWC_Finalize failure and unload the nssckbi.dll module anyway, so this may not be a good idea.
Attachment #335151 - Flags: superreview?(nelson)
Attachment #335151 - Flags: review?(maksr3)
Comment on attachment 335151 [details] [diff] [review] Proposed patch If I remove the very large comment from the patch, it boils down to: > switch( error ) { > PRInt32 remainingInstances; > case CKR_OK: > remainingInstances = PR_AtomicDecrement(&liveInstances); > if (!remainingInstances) { > nssArena_Shutdown(); > } >+ nss_DestroyErrorStack(); > break; > case CKR_CRYPTOKI_NOT_INITIALIZED: > case CKR_FUNCTION_FAILED: > case CKR_GENERAL_ERROR: > case CKR_HOST_MEMORY: > break; > default: > error = CKR_GENERAL_ERROR; > break; > } > > return error; Seems like the crash will occur in the event that "error" is any value other than CKR_OK. No?
Right. But that's my dilemma. By definition, the error stack is used to provide more information about an error. That's why I don't want to destroy the error stack on error. But I suspect that our users will ignore the return value of NSSCKFWC_Finalize and unload the nssckbi.dll module anyway. So I'll be happy to move the nss_DestroyErrorStack() call before the "return error;" statement.
Well, my first preference would be to find a fix that avoids/prevents passing an address from within nssckbi.dll to PR_NewThreadPrivateIndex.
Thinking out loud here. Comment 16 wasn't meant to imply that I oppose the fix you proposed in comment 14. Maybe the new line you're proposing to add should be ifdef'ed for MinGW too. I think that's entirely appropriate. Or Maybe nssckbi should call NSPR's equivalent of dlsym to get the address of PR_Free, and then pass that address to PR_NewThreadPrivateIndex, but only for these MinGW builds.
The nss_DestroyErrorStack() call fixes the leak of the primordial thread's error stack if PR_Cleanup is not called, so it should not be ifdef'ed for MinGW. (This leak is similar to the leak of bug 431805, and is caused by lib/base being a static library and being included in nss3.dll and nssckbi.dll, resulting in two error stacks per thread.) I don't want to spend too much time on this MinGW bug. I'd rather be working on more important NSS bugs.
Well, if you attach the patch that I think you were suggesting at the end of comment 15, I'm willing to r+ it.
Attached patch Proposed patch v2 (deleted) — Splinter Review
Thanks, Nelson. Here is the patch that Maks tested.
Attachment #335151 - Attachment is obsolete: true
Attachment #335228 - Flags: review?(nelson)
Attachment #335151 - Flags: superreview?(nelson)
Attachment #335151 - Flags: review?(maksr3)
Comment on attachment 335228 [details] [diff] [review] Proposed patch v2 r=nelson
Attachment #335228 - Flags: review?(nelson) → review+
I checked in the patch on the NSS trunk (NSS 3.12.2). Checking in wrap.c; /cvsroot/mozilla/security/nss/lib/ckfw/wrap.c,v <-- wrap.c new revision: 1.17; previous revision: 1.16 done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.12.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: