Closed Bug 4303 Opened 26 years ago Closed 26 years ago

[PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner

Categories

(NSPR :: NSPR, defect, P1)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 8849

People

(Reporter: ramiro, Assigned: alecf)

Details

I get the following error:

*** SilentDownload is being registered
**************************************************
nsComponentManager: Load(/builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so) FAILED with error: /builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so: undefined symbol: ZIP_CloseArchive
**************************************************
**************************************************
nsComponentManager: Load(libxpcom.so) FAILED with error: /builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so: undefined symbol: ZIP_CloseArchive
**************************************************
Aborted (core dumped)
There are two separate problems here. The first is a problem with the XPinstall build. But XPCOM should not dump core because it can't load one of the files in the components directory.
dp is working on the dlopen problem. I see the same behavior in both dual and single cpu machines, fyi.
This looks related to 4306.
Assignee: dveditz → larryh
Component: XPInstall → NSPR
For the record, the problem is that Linux seems to core dump on dlopen() if a previous dlopen() failed. It is suspected that PR_LoadLibrary() isn't clearing the error if there was one, and that this is causing the weirdness.
Ccing shaver in.
dp / ramiro are you sure it core dumps? For me, it was just exiting with 1.
Status: NEW → ASSIGNED
I ran the dlltest.c test case again on both RedHat 5.2 and RH52 with kernel 2.2.1. I was unable to reproduce the symptom. ... OK, dlltest.c is pretty simple. So, I hacked dlltest.c to do a PR_LoadLibrary() on a known non existent library, then did another PR_LoadLibrary() on a library known to exist. The test passes. Absent a core dump, stack trace, other diagnostic data, I'm stuck. Somebody got more data?
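The fail-then-succeed sequence described above can be sketched with plain dlopen() instead of PR_LoadLibrary(). This is not the dlltest.c code, only a minimal stand-in: the function name and the missing library path are hypothetical, and dlopen(NULL, ...) (a handle for the main program) stands in for "a library known to exist".

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if a load that should succeed still succeeds after a load
 * that should fail, and 0 otherwise. */
int load_after_failure_works(void)
{
    /* Hypothetical path: any file that does not exist will do. */
    void *missing = dlopen("./libdoes-not-exist-4303.so", RTLD_LAZY);
    if (missing != NULL) {
        dlclose(missing);
        return 0;               /* the "bad" load unexpectedly succeeded */
    }

    /* A NULL filename asks for a handle to the main program, which
     * always exists, so this load is expected to succeed. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return 0;               /* the symptom reported in this bug */

    dlclose(self);
    return 1;
}
```

On older glibc versions this needs to be linked with -ldl. As the comment above notes, a test this simple may well pass even on a machine that shows the problem with real components.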
Today, 3-29-99, I am not seeing this problem. I removed my registry and ran apprunner, and it did not exit 1. I'll double check and report back.
This problem has been sporadic for the last few weeks. Sometimes I see it if I recompile a library and forget a symbol (e.g. forget the =0 in an nsI*.h interface file); then when the library load fails, the app either crashes or exits, and sometimes will continue to do so for the first run or two after you remove the registry file. A clean build with all correct libraries (no unknown symbols) usually won't demonstrate the problem; you have to have one or more libraries with missing symbols or other problems.
this is a wily bug. it seems that every other checkout and build can switch behaviour, depending on how the libraries got linked. we should keep this bug open until we are sure it is squashed.
More traffic on seamonkey-eng. Talked to dp to get a better understanding of what is going on. ... Here it is as I understand it: Client says PR_LoadLibrary(). The library being loaded itself needs some library. The needed library has unresolved symbols. Subsequent calls to PR_LoadLibrary() fail even if no error would otherwise occur. dp suspected that dlerror() was not being called by NSPR after the first error, and that the man page says the error must be cleared by a call to dlerror() before other dlopen() calls can succeed. By inspection, I believe we determined that PR_LoadLibrary() does call dlerror() via DLLErrorInternal() for Linux after dlopen() fails. Somebody check my work: ...nsprpub/pr/src/linking/prlink.c. I'm gonna try to construct a test case that operates as described above to see if I can reproduce the problem. ... Target: RH Linux 5.2, kernel 2.0.36. Will that do it?
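The "read dlerror() after a failed dlopen()" pattern discussed above might look roughly like the sketch below. This is not the prlink.c code: load_library_checked() and its parameters are hypothetical names used only for illustration, and the extra dlerror() call before dlopen() is an added precaution rather than something the comment says NSPR does.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: loads `path` and, on failure, copies the
 * dlerror() text into errbuf. Reading dlerror() also resets the
 * loader's pending-error state. */
void *load_library_checked(const char *path, char *errbuf, size_t errlen)
{
    if (errbuf != NULL && errlen > 0)
        errbuf[0] = '\0';

    dlerror();                          /* discard any stale error */

    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        const char *msg = dlerror();    /* fetch and clear the error */
        if (errbuf != NULL && errlen > 0)
            snprintf(errbuf, errlen, "%s",
                     msg != NULL ? msg : "unknown dlopen error");
    }
    return handle;
}
```

A caller would pass a buffer such as `char err[256]` and inspect it only when the returned handle is NULL. Copying the message matters because the string returned by dlerror() may be overwritten by later dl calls.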
larry: yes, that's a good setup to test. If I understood dp correctly, the problem is that if dlerror() returns a real error (non-NULL), it needs to be cleared before calling other dl functions. Is this right, dp? According to the man page below, if dlerror() is called following a dl call that resulted in an error, it will return NULL the next time it is called. So, is this a bug in dlerror()? Does it not behave as the man page says?

man dlerror:

If dlopen fails for any reason, it returns NULL. A human readable string describing the most recent error that occurred from any of the dl routines (dlopen, dlsym or dlclose) can be extracted with dlerror(). dlerror returns NULL if no errors have occurred since initialization or since it was last called. (Calling dlerror() twice consecutively, will always result in the second call returning NULL.)

One workaround to try might be to enable xpinstall (or other broken components) on unix and try it on another platform. Solaris with gcc 2.7, for example, and see if dlerror() is broken only on unix.
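The man page behaviour quoted above (the second of two consecutive dlerror() calls returns NULL) can be checked directly. A minimal sketch, assuming a hypothetical missing library path and a hypothetical function name:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if dlerror() reports a failed dlopen() exactly once and
 * returns NULL on the immediately following call, 0 otherwise. */
int dlerror_clears_after_read(void)
{
    dlerror();                          /* start from a clean state */

    /* Hypothetical path: any file that does not exist will do. */
    void *h = dlopen("./libdoes-not-exist-4303.so", RTLD_LAZY);
    if (h != NULL) {
        dlclose(h);
        return 0;                       /* load was expected to fail */
    }

    const char *first = dlerror();      /* should describe the failure */
    const char *second = dlerror();     /* should be NULL per the man page */

    return first != NULL && second == NULL;
}
```

If this returns 0 on an affected machine, dlerror() is not behaving as documented there. If it returns 1, the error state is being reset as documented and the cause lies elsewhere.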
I was unable to reproduce the problem with my build from Friday April 9. Does anyone have a reliable way to reproduce this?
Summary: Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner → [PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner
Assignee: larryh → dveditz
Status: ASSIGNED → NEW
Sigh. ... I have been unable to reproduce this. I'm giving this back to dveditz.
NSPR now has its own Bugzilla product. Moving this bug to the NSPR product.
Target Milestone: M6
This looks like it's working for me, too. Putting this on M6 radar.
Assignee: dveditz → dp
This bug seems to have morphed into a dlopen() bug
Assignee: dp → larryh
Here is a way to reproduce this:

- cd intl/strres/src
- apply the following patch to nsStringBundle.cpp
- change the #if 0 in the patch to #if 1
- gmake

Now if you run apprunner, you will see the problem. Here is the patch. All it does is introduce an undefined symbol into the libstrres.so component.

Index: nsStringBundle.cpp
===================================================================
RCS file: /cvsroot/mozilla/intl/strres/src/nsStringBundle.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsStringBundle.cpp
*** nsStringBundle.cpp 1999/04/22 07:32:49 1.12
--- nsStringBundle.cpp 1999/05/01 14:16:52
***************
*** 64,69 ****
--- 64,76 ----
  {
    NS_INIT_REFCNT();

+ #if 0
+   // XXX specially for larryh to
+   // XXX reproduce the linux dlopen() crash bug# 5795
+   extern int undefined_symbol;
+   undefined_symbol = 1;
+ #endif
+
    mProps = nsnull;

    nsINetService* pNetService = nsnull;
Status: NEW → ASSIGNED
Target Milestone: M6 → M7
I don't see progress on this one. Larry, can we plan to get this in for M7?
If you need help like a tree to debug and stuff, let me know.
not likely to show in m6 release builds. need to get this fixed as soon as we can in m7
Target Milestone: M7 → M8
would like to get this in m8.
Target Milestone: M8 → M9
Assignee: larryh → alecf
Status: ASSIGNED → NEW
If this instance of the problem is seen on 5.2 only, alecf and leaf are posting the upgrade minimums that we need to resolve this problem, and they are documented in bug 8849, so we can close this bug out. We need to drop support for standard RH 5.2 installations.
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → DUPLICATE
oh, HERE is this bug...yes, this is the same as 8849. I'm marking dupe. *** This bug has been marked as a duplicate of 8849 ***
Status: RESOLVED → VERIFIED
Target Milestone: M9 → ---