Closed
Bug 4303
Opened 26 years ago
Closed 26 years ago
[PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner
Categories
(NSPR :: NSPR, defect, P1)
Tracking
(Not tracked)
People
(Reporter: ramiro, Assigned: alecf)
Details
I get the following error:
*** SilentDownload is being registered
**************************************************
nsComponentManager:
Load(/builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so) FAILED with
error: /builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so: undefined
symbol: ZIP_CloseArchive
**************************************************
**************************************************
nsComponentManager: Load(libxpcom.so) FAILED with error:
/builds3/ramiro/s/mozilla/dist/bin/components/libxpinstall.so: undefined symbol:
ZIP_CloseArchive
**************************************************
Aborted (core dumped)
Comment 1•26 years ago
There are two separate problems here. The first is a problem with the XPinstall
build. But XPCOM should not dump core because it can't load one of the files in
the components directory.
Comment 2•26 years ago
dp is working on the dlopen problem.
I see the same behavior in both dual and single cpu machines, fyi.
Comment 4•26 years ago
This looks related to 4306.
Comment 5•26 years ago
added alecf to the cc
Updated•26 years ago
Assignee: dveditz → larryh
Component: XPInstall → NSPR
Comment 6•26 years ago
For the record, the problem is that Linux seems to core dump on dlopen() if a
previous dlopen() failed.
It is suspected that PR_LoadLibrary() isn't clearing the error if there was one,
and that is causing this weirdness.
Comment 7•26 years ago
Ccing shaver in.
Comment 8•26 years ago
dp / ramiro are you sure it core dumps? For me, it was just exiting with 1.
Updated•26 years ago
Status: NEW → ASSIGNED
Comment 9•26 years ago
I ran the dlltest.c test case again on both RedHat 5.2 and RH52 with kernel
2.2.1. I was unable to reproduce the symptom. ... OK, dlltest.c is pretty
simple. So, I hacked dlltest.c to do a PR_LoadLibrary() on a known non existent
library, then did another PR_LoadLibrary() on a library known to exist. The test
passes.
Absent a core dump, stack trace, or other diagnostic data, I'm stuck. Somebody got
more data?
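The load sequence described above can be sketched roughly as follows. This is not the actual dlltest.c; it calls dlopen() directly as a stand-in for PR_LoadLibrary(), and the function name load_after_failure is made up for illustration. On older glibc it may need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper, not part of NSPR or dlltest.c.
 * Tries to load a library that should not exist, then one that should.
 * Returns 1 if the second load succeeds, 0 otherwise.
 * Passing NULL as present_path asks dlopen() for the main program itself. */
int load_after_failure(const char *missing_path, const char *present_path)
{
    void *first = dlopen(missing_path, RTLD_NOW);
    if (first != NULL) {
        /* The "missing" library loaded after all, so the check is meaningless. */
        dlclose(first);
        return 0;
    }

    /* Reading the message with dlerror() also clears the pending error. */
    const char *err = dlerror();
    fprintf(stderr, "first load failed as expected: %s\n",
            err ? err : "(no message)");

    void *second = dlopen(present_path, RTLD_NOW);
    if (second == NULL) {
        err = dlerror();
        fprintf(stderr, "second load failed: %s\n",
                err ? err : "(no message)");
        return 0;
    }

    dlclose(second);
    return 1;
}
```

A return value of 1 only says that a failed load did not poison the next one on that machine; it says nothing about the case where a dependent library has unresolved symbols.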
Comment 10•26 years ago
Today, 3-29-99, I am not seeing this problem.
I removed my registry and ran apprunner, and it did not exit 1.
I'll double check and report back.
Comment 11•26 years ago
This problem has been sporadic for the last few weeks. Sometimes I see it if I
recompile a library and forget a symbol (e.g. forget the =0 in an nsI*.h
interface file); then when the library load fails, the app either crashes or
exits, and sometimes will continue to do so for the first run or two after you
remove the registry file.
A clean build with all correct libraries (no unknown symbols) usually won't
demonstrate the problem; you have to have one or more libraries with missing
symbols or other problems.
Comment 12•26 years ago
this is a wily bug. it seems that every other checkout and build can switch
behaviour, depending on how the libraries got linked.
we should keep this bug open until we are sure it is squashed.
Comment 13•26 years ago
More traffic on seamonkey-eng. Talked to dp to get a better understanding of
what is going on. ... Here it is as I understand it:
Client says PR_LoadLibrary(). The library being loaded itself needs some
library. The needed library has unresolved symbols. Subsequent calls to
PR_LoadLibrary() fail even when no error would otherwise occur.
dp suspected that dlerror() was not being called by NSPR after the first
error, and that the man page says the error must be cleared by a call to dlerror()
before other dlopen() calls can succeed. By inspection, I believe we determined
that PR_LoadLibrary() does call dlerror() via DLLErrorInternal() for Linux after
dlopen() fails. Somebody check my work: ...nsprpub/pr/src/linking/prlink.c.
I'm gonna try to construct a test case that operates as described above to see
if I can reproduce the problem. ... Target: RH Linux 5.2, kernel 2.0.36. Will
that do it?
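For anyone checking the work, here is a minimal sketch of the pattern under discussion: a loader wrapper that always reads dlerror() after a failed dlopen(), so no error message is left pending. This is not the NSPR source; load_library_checked is a hypothetical name, and it uses plain dlopen() rather than PR_LoadLibrary().

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical wrapper, not NSPR code.
 * Loads a library and, on failure, copies the dlerror() text into errbuf.
 * errbuf must point to at least errlen bytes when errlen is non-zero. */
void *load_library_checked(const char *path, char *errbuf, size_t errlen)
{
    if (errlen > 0)
        errbuf[0] = '\0';

    /* Discard any stale message left by an earlier dl call. */
    dlerror();

    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        /* Fetching the message here also clears the pending error. */
        const char *msg = dlerror();
        if (errlen > 0)
            snprintf(errbuf, errlen, "%s", msg ? msg : "unknown dlopen error");
    }
    return handle;
}
```

With a wrapper shaped like this, a caller that sees NULL gets the reason in errbuf, and a later dlerror() call should report nothing left over from the failed load.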
Comment 14•26 years ago
larry: yes, that's a good setup to test.
If I understood dp correctly, the problem is that if dlerror() returns a real
error (non-NULL) it needs to be cleared before calling other dl functions.
Is this right, dp? According to the man page below, once dlerror() has been called
following a dl call that resulted in an error, the next call will return NULL.
So, is this a bug in dlerror()? Doesn't it behave as the man page says?
man dlerror:
If dlopen fails for any reason, it returns NULL. A human
readable string describing the most recent error that
occurred from any of the dl routines (dlopen, dlsym or
dlclose) can be extracted with dlerror(). dlerror returns
NULL if no errors have occurred since initialization or
since it was last called. (Calling dlerror() twice
consecutively, will always result in the second call
returning NULL.)
One workaround to try might be to enable xpinstall (or other broken components)
on unix and try it on another platform, Solaris with gcc 2.7 for example, and
see if dlerror() is broken only on unix.
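One way to check whether dlerror() behaves as the man page excerpt says on a given machine is a small probe like the one below. It is only a sketch: the function name dlerror_is_cleared_by_reading is invented here, and the path passed in is assumed to name a library that does not exist.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical probe, not taken from NSPR.
 * Returns 1 if, after a failed dlopen(), the first dlerror() call yields a
 * message and the second yields NULL, as the man page excerpt describes.
 * Returns 0 otherwise, including when the library unexpectedly loads. */
int dlerror_is_cleared_by_reading(const char *missing_path)
{
    /* Start from a clean state in case an earlier dl call left an error. */
    dlerror();

    void *handle = dlopen(missing_path, RTLD_NOW);
    if (handle != NULL) {
        dlclose(handle);
        return 0;
    }

    /* Only test the first result for NULL; the string itself may be
     * invalidated by the next dlerror() call. */
    int had_message = (dlerror() != NULL);
    int second_is_null = (dlerror() == NULL);

    return had_message && second_is_null;
}
```

If this returns 0 on a platform where the load really did fail, that would point at dlerror() itself rather than at the caller.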
Comment 15•26 years ago
I was unable to reproduce the problem with my build from Friday April 9. Does
anyone have a reliable way to reproduce this?
Summary: Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner → [PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner
Updated•26 years ago
Assignee: larryh → dveditz
Status: ASSIGNED → NEW
Comment 16•26 years ago
Sigh. ... I have been unable to reproduce this.
I'm giving this back to dveditz.
Comment 17•26 years ago
NSPR now has its own Bugzilla product. Moving this bug to the NSPR product.
Updated•26 years ago
Target Milestone: M6
Comment 18•26 years ago
This looks like it's working for me, too.
Putting this on M6 radar.
Updated•26 years ago
Assignee: dveditz → dp
Comment 19•26 years ago
This bug seems to have morphed into a dlopen() bug
Updated•26 years ago
Assignee: dp → larryh
Comment 20•26 years ago
Here is a way to reproduce this:
- cd intl/strres/src
- apply the following patch to nsStringBundle.cpp
- change this to #if 1
- gmake
Now if you run apprunner, you will see the problem.
Here is the patch. All it does is introduce a reference to an undefined symbol
in the libstrres.so component.
Index: nsStringBundle.cpp
===================================================================
RCS file: /cvsroot/mozilla/intl/strres/src/nsStringBundle.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsStringBundle.cpp
*** nsStringBundle.cpp 1999/04/22 07:32:49 1.12
--- nsStringBundle.cpp 1999/05/01 14:16:52
***************
*** 64,69 ****
--- 64,76 ----
{
NS_INIT_REFCNT();
+ #if 0
+ // XXX specially for larryh to
+ // XXX reproduce the linux dlopen() crash bug# 5795
+ extern int undefined_symbol;
+ undefined_symbol = 1;
+ #endif
+
mProps = nsnull;
nsINetService* pNetService = nsnull;
Updated•26 years ago
Status: NEW → ASSIGNED
Updated•26 years ago
Target Milestone: M6 → M7
Comment 21•26 years ago
I don't see progress on this one.
Larry, can we plan to get this in for M7?
Comment 22•26 years ago
If you need help like a tree to debug and stuff, let me know.
Comment 23•26 years ago
not likely to show in m6 release builds.
need to get this fixed as soon as we can in m7
Updated•26 years ago
Target Milestone: M7 → M8
Comment 24•26 years ago
would like to get this in m8.
Updated•26 years ago
Target Milestone: M8 → M9
Comment 25•26 years ago
Updated•26 years ago
Assignee: larryh → alecf
Status: ASSIGNED → NEW
Comment 26•26 years ago
If this instance of the problem is seen on 5.2 only,
alecf and leaf are posting the upgrade minimums that we need to
resolve this problem, and they are documented in bug 8849,
so we can close this bug out.
We need to drop support for standard RH 5.2 installations.
Updated•26 years ago
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → DUPLICATE
Comment 27•26 years ago
oh, HERE is this bug...yes, this is the same as 8849.
I'm marking dupe.
*** This bug has been marked as a duplicate of 8849 ***
Updated•26 years ago
Status: RESOLVED → VERIFIED
Updated•25 years ago
Target Milestone: M9 → ---