Closed
Bug 1389967
Opened 7 years ago
Closed 6 years ago
Browser built with mingw-w64 crashes on shutdown (Faulting module name: nssckbi.dll_unloaded)
Categories
(NSS :: Libraries, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
3.39
People
(Reporter: gk, Assigned: tjr)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [tor])
Attachments
(1 file, 10 obsolete files)
(deleted),
patch
|
franziskus
:
review+
|
Details | Diff | Splinter Review |
After we switched Tor Browser to ESR 52 we got reports of crashes during shutdown (Faulting module name: nssckbi.dll_unloaded).
It turns out we had to deal with those already quite some time ago (see: https://trac.torproject.org/projects/tor/ticket/10761 and bug 1151447). Back then during the ESR 38 timeframe they got fixed by removing the -mnop-fun-dllimport flag for NSS builds. And this was before library folding (bug 856404) hit m-c.
During ESR 45 we had to revert bug 856404 due to bug 1248552 and did not get a single report about shutdown crashes. They only showed up again shortly after switching to ESR 52 which has library folding enabled.
Reporter | ||
Comment 1•7 years ago
|
||
Jacek, I am not sure what is going on. Could that be related to the library folding being enabled and/or MOZ_FOLD_LIBS_FLAGS="-mnop-fun-dllimport" (still) being available in old-configure.in (https://dxr.mozilla.org/mozilla-esr52/source/old-configure.in#1005)?
FWIW the bug on our side is https://trac.torproject.org/projects/tor/ticket/22581.
Flags: needinfo?(jacek)
Whiteboard: [tor]
Comment 2•7 years ago
|
||
Yes, it may be related. Now that libraries are folded, there should not be inter-library calls that were problematic previously, but there may be other similar calls (I can't test myself ATM, sorry). Ideally we wouldn't use -mnop-fun-dllimport, but that's impossible with current way libraries are folded. Maybe there is a workaround for that crash, but we'd need to know which exact call is problematic.
Flags: needinfo?(jacek)
Reporter | ||
Comment 3•7 years ago
|
||
Okay, I'll try to get a debug build done and some stack traces from the problematic machine. It could take a while, though.
Reporter | ||
Comment 4•7 years ago
|
||
Yesterday I got a machine in my hands exhibiting the problem. Here is the gdb backtrace I got with a non-stripped build:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1048.0x1cc]
0x6edfda4c in ?? ()
(gdb) bt
#0 0x6edfda4c in ?? ()
#1 0x634cca60 in _PR_DestroyThreadPrivate ()
from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#2 0x634cbb1a in _PR_CleanupThread ()
from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#3 0x634ca36c in _PR_NativeRunThread ()
from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#4 0x634bc19e in pr_root@4 ()
from C:\Users\user\Desktop\Tor Browser\Browser\nss3.dll
#5 0x6c29c556 in endthreadex ()
from C:\Users\user\Desktop\Tor Browser\Browser\msvcr100.dll
#6 0x6c29c600 in endthreadex ()
from C:\Users\user\Desktop\Tor Browser\Browser\msvcr100.dll
#7 0x7572ef8c in KERNEL32!AcquireSRWLockExclusive ()
from C:\Windows\system32\kernel32.dll
#8 0x76ed367a in ntdll!RtlInsertElementGenericTableAvl ()
from C:\Windows\system32\ntdll.dll
#9 0x76ed364d in ntdll!RtlInsertElementGenericTableAvl ()
from C:\Windows\system32\ntdll.dll
#10 0x00000000 in ?? ()
Jacek: Does that ring some bell?
Flags: needinfo?(jacek)
Comment 5•7 years ago
|
||
Sorry for late response. I can't be sure, but it looks like -mnop-fun-dllimport may be the problem once again. The thing is that we can't do libraries folding without it. Maybe, if it's just one place that's problematic, we could hack it in problematic cases. It crashes when destroying thread private data via registered destructors. It would be interesting to know, where the problematic destructor was registered.
Flags: needinfo?(jacek)
Reporter | ||
Comment 6•7 years ago
|
||
Okay, thanks. I'll have access to the machine where this is reproducible in a couple of weeks again. I guess I'll compile a bundle without library folding again and see whether that fixes it. Meanwhile, how would I find out where the destructor got resgistered assuming I have an unstripped build and gdb handy? Would it help to have stack traces from all threads and not just the crashing one? Or...?
Flags: needinfo?(jacek)
Comment 7•7 years ago
|
||
One way would be to try to track address of instruction on top of the stack. It's 0x6edfda4c in backtrace from comment 4. If the crash is indeed similar to observed previously, it will be a thunk in unloaded library.
Another thing worth trying is a breakpoint at PR_NewThreadPrivateIndex and watch dtor arguments passed there.
Flags: needinfo?(jacek)
Updated•7 years ago
|
Product: Core → Firefox Build System
Assignee | ||
Comment 8•7 years ago
|
||
Turns out :dmajor and I have been chasing this bug for the past couple of weeks.
I think there's five non-null dtors that get registered:
A: https://searchfox.org/mozilla-central/source/ipc/glue/BackgroundImpl.cpp#413
B: https://searchfox.org/mozilla-central/source/toolkit/crashreporter/ThreadAnnotation.cpp#206
C: https://searchfox.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#100
D: https://searchfox.org/mozilla-central/source/nsprpub/pr/src/threads/prrwlock.c#417
E: https://searchfox.org/mozilla-central/source/nsprpub/pr/src/malloc/prmem.c#453
I've seen successful calls to:
22:37:39 INFO - GECKO(4164) | function pointer call index 4: 00000000030488E0
22:37:40 INFO - GECKO(4164) | function pointer call index 3: 00000000038AA910
22:37:40 INFO - GECKO(4164) | function pointer call index 7: 000000006B6A2EB0
And then
22:37:40 INFO - GECKO(4164) | function pointer call index 8: 0000000069A81458 <- Failure
3 and 4 are in xul.dll:
22:37:41 INFO - 0x02620000 - 0x0d48bfff xul.dll 60.0.0.6697 (WARNING: No symbols, , 4F252EBC1D196EB8F28B802D8950E0951)
And are probably A, B, or C
7 is in nss:
22:37:41 INFO - 0x6b640000 - 0x6b93bfff nss3.dll 60.0.0.6697 (WARNING: No symbols, , BB0FF785D171D372AB55BFCF3F03C9C61)
:dmajor found the unloaded module and connected out issue to this bug:
> <Unloaded_nssckbi.dll>+0x1458:
> 00000000`69a81458 ?? ???
Assignee: nobody → tom
The only one from that list that I could find in nssckbi was this: https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/security/nss/lib/base/error.c#68
Which on an MSVC build looks like:
nssckbi!error_once_function [z:\build\build\src\security\nss\lib\base\error.c @ 67]:
00000001`800022b4 488b15b5be0000 mov rdx,qword ptr [nssckbi!_imp_PR_Free (00000001`8000e170)]
00000001`800022bb 488d0dceaf0500 lea rcx,[nssckbi!error_stack_index (00000001`8005d290)]
00000001`800022c2 48ff2587be0000 jmp qword ptr [nssckbi!_imp_PR_NewThreadPrivateIndex (00000001`8000e150)]
PR_Free is imported from nss3 by nssckbi. When nssckbi registers `PR_Free` as that function pointer, we go and grab the value out of the import table and so the function pointer we register is actually code in NSS, and we can survive the unloading of nssckbi. That's on an MSVC build.
What if, for some reason mingw instead registers some thunk within nssckbi that says "call qword ptr [nssckbi!_imp_PR_Free]"? Then if nssckbi gets unloaded, that function pointer is no longer usable.
Comment 10•7 years ago
|
||
> What if, for some reason mingw instead registers some thunk within nssckbi
> that says "call qword ptr [nssckbi!_imp_PR_Free]"? Then if nssckbi gets
> unloaded, that function pointer is no longer usable.
https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/old-configure.in#885-886
Comment 11•7 years ago
|
||
I wonder if we could remove -mnop-fun-dllimport just for the one .c file containing error_once_function? Or even move that function to its own file...
Comment 12•7 years ago
|
||
Or maybe worst case do a GetProcAddress to get to the actual code of PR_Free.
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #11)
> I wonder if we could remove -mnop-fun-dllimport just for the one .c file
> containing error_once_function? Or even move that function to its own file...
I have been messing with the NSS gyp files for a bit and have had no luck at moving the flag into them (to subsequently remove it for the one file.) I think I'm going to need some NSS help on how to make this happen....
(I confirmed it doesn't build without the flag entirely.)
Comment 14•7 years ago
|
||
We will soon need to change -mnop-fun-dllimport for supporting clang builds (bug 1390583) since it's not supported by clang. Proposed change may be useful here as well:
- MOZ_FOLD_LIBS_FLAGS="-mnop-fun-dllimport"
+ MOZ_FOLD_LIBS_FLAGS="-Ddllimport=noop"
Once we have that, we could just
#undef dllimport
wherever we need in source/header files. We could even combine that with #pragma push_macro/pop_macro to limit its scope to problematic functions.
Assignee | ||
Comment 15•7 years ago
|
||
Still analyzing the results but changing -mnop-fun-dllimport to -Ddllimport=noop did not work (was it intended to work with gcc?)
[task 2018-05-07T14:18:27.482Z] 14:18:27 INFO - /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/mingw32/bin/x86_64-w64-mingw32-gcc -std=gnu99 -mwindows -o error.o -c -DDEBUG -DNSS_FIPS_DISABLED -DNSS_NO_INIT_SUPPORT -DNSS_X86_OR_X64 -DNSS_X64 -DNSS_USE_64 -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT -DSSL_DISABLE_DEPRECATED_CIPHER_SUITE_NAMES -D_WINDOWS -DWIN95 -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_WARNINGS -DNSS_DISABLE_LIBPKIX -I/builds/worker/workspace/build/src/security/nss/lib/base -I/builds/worker/workspace/build/src/obj-firefox/security/nss/lib/base/base_nssb -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/private/nss -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -I/builds/worker/workspace/build/src/obj-firefox/dist/include -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wduplicated-cond -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-unknown-pragmas -Wno-unused-function -Wno-switch -Wno-enum-compare -fno-strict-aliasing -mms-bitfields -fno-keep-inline-dllexport -ffunction-sections -fdata-sections -Wa,-mbig-obj -fno-math-errno -pthread -pipe -g -fno-omit-frame-pointer -Ddllimport=noop -MD -MP -MF .deps/error.o.pp /builds/worker/workspace/build/src/security/nss/lib/base/error.c
> [task 2018-05-07T14:18:27.483Z] 14:18:27 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/nspr/prlock.h:51:1: warning: 'noop' attribute directive ignored [-Wattributes]
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dde068be0fcc2719ffdf49356a6150a9c9662111
Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Tom Ritter [:tjr] from comment #15)
> Still analyzing the results but changing -mnop-fun-dllimport to
> -Ddllimport=noop did not work (was it intended to work with gcc?)
Sorry ignore that, I need to analyze the results more...
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to Tom Ritter [:tjr] from comment #16)
> (In reply to Tom Ritter [:tjr] from comment #15)
> > Still analyzing the results but changing -mnop-fun-dllimport to
> > -Ddllimport=noop did not work (was it intended to work with gcc?)
>
> Sorry ignore that, I need to analyze the results more...
Okay, I can confirm that while the build did compile with -Ddllimport=noop; undefing it for the one function, as shown in this build https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c53a08e2be15c51099a921709b1ccd409271640 , the thunk is still generated:
> nss3!PR_DelSleepQ+0x335:
> 00000000`6b6d6965 ffd0 call rax {nssckbi+0x1458 (00000000`69a81458)}
> 0:070> uf nssckbi+0x1458
> nssckbi+0x1458:
> 00000000`69a81458 ff25da0e0700 jmp qword ptr [nssckbi!C_GetFunctionList+0x6f878 (00000000`69af2338)] Branch
> 0:070> t
> nssckbi+0x1458:
> 00000000`69a81458 ff25da0e0700 jmp qword ptr [nssckbi!C_GetFunctionList+0x6f878 (00000000`69af2338)] ds:00000000`69af2338={nss3!PR_Free (00000000`6b6a2e80)}
> 0:070> t
> nss3!PR_Free:
> 00000000`6b6a2e80 55 push rbp
Comment 18•7 years ago
|
||
> undefing it for the one function, as shown in this build
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=0c53a08e2be15c51099a921709b1ccd409271640 , the thunk
> is still generated:
That patch didn't really undefine the culprit dllimport. Presumably, somewhere in the preprocessed error.c, there is some #included definition of PR_Free containing a dllimport declaration. *That's* what you want to undefine. Try moving the #undef dllimport to the top of this file, before any #includes.
Comment 19•7 years ago
|
||
Or, perhaps you'll need...
#undef dllimport
#define dllimport noop
...to match the spirit of the previous patch.
For good measure you may want to move error_once_function to its own .c file in case the remainder of error.c doesn't want its dllimport definition messed with.
Assignee | ||
Comment 20•7 years ago
|
||
Okay, we tried three approaches to fix this.
1) https://hg.mozilla.org/try/rev/e9d1d7d60c9034526008709dd3823a428c1a07bf
This adds a new function
>PR_IMPLEMENT(PRStatus) PR_NewThreadPrivateIndexWithPR_Free(
> PRUintn *newIndex)
>{
> return PR_NewThreadPrivateIndex(newIndex, PR_Free);
>}
that we call in error_once_function instead, to avoid storing the address of a thunk.
2) https://hg.mozilla.org/try/rev/d85a8ca8b416b26055ce23bd7456e6c0a350a5c6
This changes mnop-fun-dllimport to Ddllimport=noop and then undefines it for PR_Free only in error.c
The uglier part is that without it, it's looking for __imp_PR_Free which doesn't exist. So we we add it with 'typeof(PR_Free) *__imp_PR_Free = PR_Free;'
3) https://hg.mozilla.org/try/rev/19f8badaebec136c917a49980ad88835f308d279
This is the same idea as #2, but it uses mno-nop-fun-dllimport. To make it work, we have to move error_once_function to a new file, and then add a new gyp target.
#3 is extremely ugly. #2 seems less ugly than #1. So the patch proposal is #2.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8974759 -
Flags: review?(franziskuskiefer)
Comment 24•7 years ago
|
||
mozreview-review |
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi
https://reviewboard.mozilla.org/r/243152/#review249538
As your name says this is pretty hacky... Looking at the 3 options you mention I think I'd prefer #1. It gives us a real solution to the problem instead of hacking around it.
::: nsprpub/pr/src/malloc/prmem.c:462
(Diff revision 3)
> else
> free(ptr);
> }
>
> +// See the comment about MINGW_HACK in prmem.h
> +#if defined(__MINGW32__)
Is this always needed or only `#ifdef MINGW_HACK`?
Attachment #8974759 -
Flags: review?(franziskuskiefer)
Comment 25•6 years ago
|
||
mozreview-review |
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi
https://reviewboard.mozilla.org/r/243152/#review249592
::: nsprpub/pr/include/prmem.h:63
(Diff revision 3)
> + ** we're not doing the fixup there! So inside prmem.c we create a
> + ** global function pointer to the real PR_Free that has the incorrect
> + ** name: __imp_PR_Free
> + */
> +#ifdef MINGW_HACK
> +#pragma push_macro("dllimport")
Do we want to limit the change to error.c? If we're going to support it anyway, it may be a good idea to have it enabled for other files as well. Other callers may also benefit from that. I'd suggest to change it to #ifdef __MINGW32__ or even #ifdef __MINGW_IMP_SYMBOL and leave error.c unchanged.
::: nsprpub/pr/src/malloc/prmem.c:520
(Diff revision 3)
> #endif
> }
>
> +// See the comment about MINGW_HACK in prmem.h
> +#if defined(__MINGW32__)
> +#if defined(HAVE_64BIT_BUILD)
mingw-w64 provides __MINGW_IMP_SYMBOL macro for that. It may be better to use it.
Comment 26•6 years ago
|
||
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #24)
> As your name says this is pretty hacky... Looking at the 3 options you
> mention I think I'd prefer #1. It gives us a real solution to the problem
> instead of hacking around it.
FWIW (I'm not suggesting to do that, but it's probably too much of complication, I think we should take one of Tom's solution, but I think it's worth mentioning), I think that the real non-hack solution would be changing build system to not link the same object files into different modules, where sometimes they need dllimport, sometimes live in the same library as imported functions. MSVC somehow can handle that (although AFAIK it gives warnings and code may be sub-optimal), but it's not really the right thing to do. I recall that we had a bug for that, but I can't find it.
Comment hidden (mozreview-request) |
Comment 28•6 years ago
|
||
mozreview-review |
Comment on attachment 8974759 [details]
Bug 1389967 In MinGW, work around a pointer to a function thunk disappearing when we unload nssckbi
https://reviewboard.mozilla.org/r/243152/#review250200
I agree with Jacek that the way this is built is the real issue. But that's unfortunately nothing we can easily change.
From the doable solutions I like this one best.
To get this landed we need two separate patches for NSS and NSPR though.
Attachment #8974759 -
Flags: review?(franziskuskiefer)
Assignee | ||
Updated•6 years ago
|
Attachment #8974759 -
Attachment is obsolete: true
Assignee | ||
Comment 29•6 years ago
|
||
Attachment #8976283 -
Flags: review?(franziskuskiefer)
Assignee | ||
Comment 30•6 years ago
|
||
Attachment #8976284 -
Flags: review?(franziskuskiefer)
Assignee | ||
Comment 31•6 years ago
|
||
Franziskus - I will want to uplift these to esr60; what is the process for doing that with NSS/NSPR?
Updated•6 years ago
|
Attachment #8976283 -
Attachment is patch: true
Comment 32•6 years ago
|
||
Comment on attachment 8976283 [details] [diff] [review]
nss.patch
Review of attachment 8976283 [details] [diff] [review]:
-----------------------------------------------------------------
thanks. Can you put up a patch that I can apply? :) (this patch has windows line endings)
Attachment #8976283 -
Flags: review?(franziskuskiefer) → review+
Comment 33•6 years ago
|
||
Comment on attachment 8976284 [details] [diff] [review]
nspr.patch
Review of attachment 8976284 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm
Attachment #8976284 -
Flags: review?(franziskuskiefer) → review+
Comment 34•6 years ago
|
||
(In reply to Tom Ritter [:tjr] from comment #31)
> Franziskus - I will want to uplift these to esr60; what is the process for
> doing that with NSS/NSPR?
Well that's a little difficult. We'd have to make new point releases for NSS and NSPR. There are currently no plans for an NSS 3.36.2 release and I don't know if we can require a new NSPR version for an NSS point release. So I'm not sure if this going to happen. Kai, what do you think?
Flags: needinfo?(kaie)
Assignee | ||
Comment 35•6 years ago
|
||
I'm not sure why one had Windows line endings and the other didn't, but maybe this will work...?
Attachment #8976283 -
Attachment is obsolete: true
Attachment #8976530 -
Flags: review?(franziskuskiefer)
Updated•6 years ago
|
Attachment #8976530 -
Flags: review?(franziskuskiefer) → review+
Comment 36•6 years ago
|
||
IIUC this fix is only required for the Windows environment.
The version dependency chain is tricky. If we change NSS on a stable branch on all platforms to require a new NSPR function, and if any packager fails to increase the version dependency on the NSPR package, things break. We had this happen in the past, when we changed NSS to use PR_GetEnvSecure instead of getenv.
Because you're changing nss/lib/base/error.c I suspect that the effect won't be limited to nssckbi, but will affect all NSS libs, which would then all require the newer NSPR version with the new symbol.
Could we limit the use of this new API to the platform that requires it? Could the error_once_function contain an #idef that uses the new API on mingw, only, and keeps using the old API on all other environments?
Are nspr/nss ever packaged separately from the application with mingw builds? If in all mingw builds the application is always bundled together with the nspr/nss libraries, then you have full control, and could use a local patch in your application to fix this, because you don't depend on external nspr/nss packages to contain this fix.
Is this correct? If yes, it would avoid a lot of work, because we wouldn't have to work on the uplifting to older branches.
Instead of branch uplifting, we could limit this change to the latest development versions. We could add the new API in NSPR 4.20. And NSS 3.38 could contain the updated error_once_function with the #ifdef to use the new API on the mingw environment, only. We can pick that up for Firefox 62.
If you require this to be added to the official Firefox 60 ESR.x branch, could you please briefly justify why?
If it's really necessary, we'd have to create an NSPR 4.19.1 release, and also NSS 3.36.2 (ESR 60) and NSS 3.37.1 (FF 61) releases.
In any case, there's one more change that the NSPR patch will require. All exported NSPR functions must be declared in file pr/src/nspr.def. You'd have to add a new section at the end of that file, with the first version of the library that provides it.
Flags: needinfo?(kaie)
Comment 37•6 years ago
|
||
Also, I have an idea for a cleaner and more general solution.
IIUC, you require a mechanism that returns a function pointer from the point of view from within the NSPR library.
How about a new API PR_GetNSPRScopeFunctionPointer(const char *name_of_function).
This function could contain a switch that returns function pointers based on the given name parameter.
If we ever require additional function pointers, besides PR_Free, we could reuse that API, and could simply extend the implementation of PR_GetNSPRScopeFunctionPointer to support more names. We'd still have to increase the version number dependency on future enhancements. However, with this approach, code that uses this API could be implemented in a semi failsafe way, because they could fall back like this:
ptr = PR_GetNSPRScopeFunctionPointer("PR_Free");
if (!ptr) { ptr = PR_Free; }
With this approach, we wouldn't introduce new immediate breakage if the required implementation isn't available in a secondary nspr package. Instead, it would be limited to the crash-at-shutdown that we have today.
Comment 38•6 years ago
|
||
Either like this:
static PRStatus
error_once_function(void)
{
#ifdef __MINGW32__
PRLibrary *lib = NULL;
if (PR_FindFunctionSymbolAndLibrary("PR_NewThreadPrivateIndexWithPR_Free",
&lib)) {
PR_UnloadLibrary(lib);
return PR_NewThreadPrivateIndexWithPR_Free(&error_stack_index);
}
#endif
return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
}
or like this:
static PRStatus
error_once_function(void)
{
#ifdef __MINGW32__
PRLibrary *lib = NULL;
PRFuncPtr *func = NULL;
if (PR_FindFunctionSymbolAndLibrary("PR_GetNSPRScopeFunctionPointer",
&lib)) {
PR_UnloadLibrary(lib);
func = PR_GetNSPRScopeFunctionPointer("PR_Free");
if (func) {
return PR_NewThreadPrivateIndex(&error_stack_index, func);
}
}
#endif
return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
}
Comment 39•6 years ago
|
||
I'd like to ask for the use of PR_FindFunctionSymbolAndLibrary for any official stable branches this must be backported to.
For Firefox 62 and newer, we don't need PR_FindFunctionSymbolAndLibrary but can assume the functions exist.
Comment 40•6 years ago
|
||
If MINGW never packages NSPR/NSS separately, and if the use of PR_NewThreadPrivateIndexWithPR_Free (or PR_GetNSPRScopeFunctionPointer) is limited with #idef __MINGW32__, then we don't need PR_FindFunctionSymbolAndLibrary on the stable branch either.
Assignee | ||
Comment 41•6 years ago
|
||
(In reply to Kai Engert (:kaie:) from comment #36)
> IIUC this fix is only required for the Windows environment.
Yes.
> Could we limit the use of this new API to the platform that requires it?
> Could the error_once_function contain an #idef that uses the new API on
> mingw, only, and keeps using the old API on all other environments?
Yes.
> Are nspr/nss ever packaged separately from the application with mingw
> builds?
This is needed for -esr60 (and eventually -central) builds in TaskCluster and for Tor (who will track esr60 in their own repo).
> If in all mingw builds the application is always bundled together
> with the nspr/nss libraries, then you have full control, and could use a
> local patch in your application to fix this, because you don't depend on
> external nspr/nss packages to contain this fix.
While Tor could apply a local patch in their repo, we would like to have these patches in TaskCluster builds of esr60. I don't know what the options are for this: apply a patch directly onto -esr60? apply a patch during the ./mach build process?
> If you require this to be added to the official Firefox 60 ESR.x branch,
> could you please briefly justify why?
Our goal (in ~1-2 months) is to get MinGW x86/x64 running in esr60 in TC with passing tests. Without this fix, all the tests fail. We'd like to have TC esr60 building and running the MinGW build to avoid unexpected build breakage from changes in esr60.
We can delay bringing this into esr60 until we've cleared all other test blockers, but we would like to bring it in in that timeframe (1-2 months).
> How about a new API PR_GetNSPRScopeFunctionPointer(const char
> *name_of_function).
This approach is fine by me.
Comment 42•6 years ago
|
||
Tom, thanks for your response. You also clarified (on IRC) that none of the mingw environments require separate packaging of nss/nspr.
I'd like to dismiss my idea about the generic PR_GetNSPRScopeFunctionPointer API. It doesn't really create a benefit, if the availability of this API cannot guarantee that any function pointer can be obtained with it, but that some kind of version dependency would still be necessary. Also, I'd like to dismiss the idea about checking whether the symbol is present in a shared library. If we are careful about not using it outside of mingw on the stable branches, we should never run into a situation were it isn't available.
I still think that function PR_NewThreadPrivateIndexWithPR_Free isn't ideal.
So here's a new suggestion:
- Instead of PR_NewThreadPrivateIndexWithPR_Free, we create a new NSPR API:
PR_GetFuncPtr_PR_Free, which just does "return PR_Free".
- we create a new NSPR 4.19.1 release that offers PR_GetFuncPtr_PR_Free
for all platforms.
(Both ESR 60 and FF 61 use NSPR 4.19, so this release can be sufficient
for both.)
- ESR 60 uses NSS 3.36.1
We release a new NSS 3.36.2, which only uses the new NSPR API on the
mingw platform. This way we should avoid all dependency issues for
all platforms that package it separately.
static PRStatus
error_once_function(void)
{
#ifdef __MINGW32__
return PR_NewThreadPrivateIndex(&error_stack_index, PR_GetFuncPtr_PR_Free());
#else
return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
#endif
}
- Because FF 61 uses NSS 3.37, we should probably release NSS 3.37.1,
which contains the same platform specific fix. This will ensure that upgrading
to a newer version doesn't reintroduce the same issue.
- In theory, for FF 62 and newer, we could get rid of the platform specific
implementation of the error_once_function.
However, it might also be fine to keep it. It would be a good documentation
why this is necessary.
So my suggestion is, use the same #ifdef on NSS trunk for NSS 3.38+.
Comment 43•6 years ago
|
||
(In reply to Kai Engert (:kaie:) from comment #42)
> - Instead of PR_NewThreadPrivateIndexWithPR_Free, we create a new NSPR API:
> PR_GetFuncPtr_PR_Free, which just does "return PR_Free".
(Sorry if I'm intruding)
Since all of this is limited to MinGW anyway, is `PR_NewThreadPrivateIndexWithPR_Free` much different from just calling the system `GetProcAddress`?
Comment 44•6 years ago
|
||
(Er, I meant `PR_GetFuncPtr_PR_Free`.)
Comment 45•6 years ago
|
||
I don't know the answer to comment 43 and 44. I have the impression that what matters is the location of the caller. If you try to obtain the function address from within NSS, what you get is apparently different from what you get, if you obtain that address from within the NSPR scope. That's how I understood Tom's explanation. I don't know if GetProcAddress would be independent of the scope from which it is called.
Comment 46•6 years ago
|
||
One minor change.
Instead of calling the NSPR release 4.19.1, I'd prefer to call it 4.20, but it depends on being able to convince the ESR 60.x drivers to accept such an upgrade.
So far, the NSPR trunk hasn't received any functional changes. Only platform specific build defines had been added. These changes should be fine for the ESR 60 branch. Calling it 4.20 would be more aligned with the rule that new APIs aren't introduced on stable NSPR branches.
You'd have to ask that ESR 60 gets upgraded to NSPR 4.20 (instead of 4.19.1), which also picks up these two minor changes for the Risc-V and FreeBSD platforms. I think this should be accepteable for the ESR 60.x release drivers.
Comment 47•6 years ago
|
||
(In reply to Kai Engert (:kaie:) from comment #45)
> I don't know if GetProcAddress would be independent of the scope from which it is called.
Windows would look it up using nss3.dll's export table, so it would not be from error_once_function's perspective.
Comment 48•6 years ago
|
||
NSPR part for my latest suggestion.
Comment 49•6 years ago
|
||
NSS part
Comment 50•6 years ago
|
||
(In reply to David Major [:dmajor] from comment #47)
> (In reply to Kai Engert (:kaie:) from comment #45)
> > I don't know if GetProcAddress would be independent of the scope from which it is called.
>
> Windows would look it up using nss3.dll's export table, so it would not be
> from error_once_function's perspective.
If GetProcAddress(PR_Free) would solve the issue, that would be great. No change to NSPR would then be necessary.
Tom, would you like to try that?
Assignee | ||
Comment 51•6 years ago
|
||
(In reply to Kai Engert (:kaie:) from comment #50)
> (In reply to David Major [:dmajor] from comment #47)
> > (In reply to Kai Engert (:kaie:) from comment #45)
> > > I don't know if GetProcAddress would be independent of the scope from which it is called.
> >
> > Windows would look it up using nss3.dll's export table, so it would not be
> > from error_once_function's perspective.
>
> If GetProcAddress(PR_Free) would solve the issue, that would be great. No
> change to NSPR would then be necessary.
>
> Tom, would you like to try that?
Sure I'll give it a shot. Won't have results until next week.
Assignee | ||
Comment 52•6 years ago
|
||
Seems to work!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3dc264cbf93b32256802110a71ebb7de007849f8&filter-searchStr=mingw&selectedJob=179518769
I'd like to hold off on landing this for the time being until I've got everything else in place and know for certain this is sufficient.
Attachment #8976284 -
Attachment is obsolete: true
Attachment #8976530 -
Attachment is obsolete: true
Attachment #8977033 -
Attachment is obsolete: true
Attachment #8977034 -
Attachment is obsolete: true
Attachment #8979285 -
Flags: review?(franziskuskiefer)
Attachment #8979285 -
Flags: review?(dmajor)
Attachment #8979285 -
Flags: feedback?(kaie)
Comment 53•6 years ago
|
||
Comment on attachment 8979285 [details] [diff] [review]
nss-getprocaddress.patch
Review of attachment 8979285 [details] [diff] [review]:
-----------------------------------------------------------------
I can't review in nss land but this looks good to me!
Are we sure that nss3.dll will always be loaded at the point of this call? Otherwise GetModuleHandle will return null and GetProcAddress will be unhappy. I'll let the reviewers weigh in but IMO we shouldn't worry about it unless we have a known case where nss3 isn't present.
::: lib/base/error.c
@@ +83,5 @@
> + * crash. So we do this bit of ugly to avoid that crash. Fortunately
> + * this is the only place we've had to do this.
> + */
> +#if defined(__MINGW32__)
> + HMODULE nss3 = GetModuleHandle("nss3");
Minor nitpick, I'd write this as GetModuleHandleW(L"nss3.dll") to avoid making Windows do the translation for you. Same 'W' and 'L' on GetProcAddress.
Attachment #8979285 -
Flags: review?(dmajor) → feedback+
Comment 54•6 years ago
|
||
Might as well use A variants rather than W.
Comment 55•6 years ago
|
||
Why? 'A' means that Windows has to do the conversion for you.
Comment 56•6 years ago
|
||
Great, so we don't need a new NSPR API, which simplifies the situation a lot.
The implementation of the PR_Free function isn't inside nss3, it's implemented in the nspr4 library.
The Firefox build configuration for Windows uses library folding. Which DLL contains the NSPR library code? You should use GetModuleHandle with that library name.
(I cannot comment on the use of character conversion functions on Windows.)
Once we have found the final code (after the library name and conversion discussion), we should still make a new NSS release for the ESR 60.x branch, IMHO.
Comment 57•6 years ago
|
||
On Windows, the implementation of PR_Free is indeed in nss3.dll.
Assignee | ||
Comment 58•6 years ago
|
||
Updated patch to use W variants.
Attachment #8979285 -
Attachment is obsolete: true
Attachment #8979285 -
Flags: review?(franziskuskiefer)
Attachment #8979285 -
Flags: feedback?(kaie)
Attachment #8980590 -
Flags: review?(franziskuskiefer)
Assignee | ||
Updated•6 years ago
|
Attachment #8980590 -
Attachment is patch: true
Assignee | ||
Comment 59•6 years ago
|
||
Attachment #8980590 -
Attachment is obsolete: true
Attachment #8980590 -
Flags: review?(franziskuskiefer)
Attachment #8980633 -
Flags: review?(franziskuskiefer)
Updated•6 years ago
|
Attachment #8980633 -
Attachment is patch: true
Comment 60•6 years ago
|
||
Comment on attachment 8980633 [details] [diff] [review]
nss-getprocaddress.patch
Review of attachment 8980633 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm, except for the comment.
Let me know then when you want this landed.
::: lib/base/error.c
@@ +84,5 @@
> + * this is the only place we've had to do this.
> + */
> +#if defined(__MINGW32__)
> + HMODULE nss3 = GetModuleHandleW(L"nss3");
> + return PR_NewThreadPrivateIndex(&error_stack_index, GetProcAddress(nss3, "PR_Free"));
I'd like to be prudent here and check that nss3 and the result of GetProcAddress are not null.
Attachment #8980633 -
Flags: review?(franziskuskiefer)
Comment 61•6 years ago
|
||
FWIW, in mingw clang build, I managed to support library folding without ignoring dllimport attribute. This means that clang builds shouln't be affected by this bug.
Updated•6 years ago
|
Depends on: mingw-clang
Assignee | ||
Updated•6 years ago
|
Blocks: mingw-clang
No longer depends on: mingw-clang
Comment 62•6 years ago
|
||
I'm not sure it should block 1465790. In clang builds we don't use dllimport tricks, so this bug should not happen.
Assignee | ||
Comment 64•6 years ago
|
||
Forget to update this patch to check return values; finally did so.
Attachment #8980633 -
Attachment is obsolete: true
Attachment #8993135 -
Flags: review?(franziskuskiefer)
Updated•6 years ago
|
Attachment #8993135 -
Attachment is patch: true
Comment 65•6 years ago
|
||
Comment on attachment 8993135 [details] [diff] [review]
nss-getprocaddess.patch
Review of attachment 8993135 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm but can you make one that doesn't have windows line endings?
Attachment #8993135 -
Flags: review?(franziskuskiefer)
Assignee | ||
Comment 66•6 years ago
|
||
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #65)
> Comment on attachment 8993135 [details] [diff] [review]
> nss-getprocaddess.patch
>
> Review of attachment 8993135 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> lgtm but can you make one that doesn't have windows line endings?
Hopefully this works. I don't know why that happens, I didn't write it on Windows, I didn't post it on Windows, etc....
Attachment #8993135 -
Attachment is obsolete: true
Assignee | ||
Updated•6 years ago
|
Attachment #8993399 -
Flags: review?(franziskuskiefer)
Updated•6 years ago
|
Attachment #8993399 -
Flags: review?(franziskuskiefer) → review+
Comment 67•6 years ago
|
||
Status: NEW → RESOLVED
Closed: 6 years ago
Component: General → Libraries
Product: Firefox Build System → NSS
Resolution: --- → FIXED
Target Milestone: --- → 3.39
Version: 52 Branch → other
You need to log in
before you can comment on or make changes to this bug.
Description
•