Closed Bug 89488 Opened 23 years ago Closed 23 years ago

Profile mgr and java cause hang (was: mozilla catatonic with Java installed)

Categories

(Core Graveyard :: Java: OJI, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: siemsen, Assigned: dbaron)

Details

(Keywords: regression, Whiteboard: fixed on branch)

Attachments

(5 files)

I just installed Mozilla 0.9.2 on my Dell laptop (Pentium III, SuSE Linux 7.0, 2.4.5 kernel, XFree86 4.0.3, Gnome, sawfish). It worked fine. Then I installed the JRE 1.3 plug-in and got "Installation successful". I exited Mozilla. When I restarted, I got a message that some part of Mozilla was still running, so I rebooted. Now when I start Mozilla, I get the "Select user profile" window as usual, and then:
I am inside the initialize
Hey : You are in QFA Startup
(QFA)Talkback loaded Ok.
...and then nothing. I found the release note about putting a link in my mozilla0.9.2/plugins directory for libjavaplugin_oji.so. That seems to apply to a previous version of Mozilla, because the link was already there when I looked. Note that the Release Notes say "libjava.oji.so directory", which I think should say "libjavaplugin_oji.so file". Anyway, I tried renaming the link to something meaningless, to make Mozilla ignore the Java plugin and start working again, but no luck: it just hangs. This is quite reproducible: I can't run Mozilla at all now. Is there a command-line argument I can use to produce verbose debugging output to give you more information than this?
this is probably the profile manager issue where you cannot start with an old profile or something. I do not know the bug number. cc'ing QA.
I believe you are beyond profiles here. I think the console lines before what is included here are what profile passed in etc. ccing dbragg for insights on the first message - remove xpicleanup.dat?
I'd need to see the actual text of the "mozilla is running" message. Was it more like, "Mozilla needs to shutdown to allow a previous installation to finish"? Are you getting absolutely nothing when you start mozilla or are you getting a message dialog every time?
Seeing this on the commercial builds as well. If you install the commercial build with custom or full install method and include the Java Plugin, this hangs the browser on initial startup. Changing severity to critical and adding nsbranch keyword. Confirming bug. The last known set of builds that worked was the July02 builds. Trying to trace down what changed between July 02 and July 03.
Severity: normal → critical
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: nsBranch
shannon, I believe you mentioned that you are seeing this on HPUX. Adding you to Cc.
Is this a plugin issue or an installer issue? What happens if one manually installs the Java plugin?
Peter, I installed the plugin from http://home.netscape.com/plugins/jvm.html and the browser started with no problems (using one single profile). I see the problem if I do a clean installation of the commercial and trunk bits on my linux system (clean meaning I delete my .mozilla directory). If I install today's bits (2001070510) with the "Recommended" option, the browser starts up fine. If I choose a full or custom install, and specifically choose to install the JVM plugin with no .mozilla directory in my home directory, I can reproduce this hang. From what the reporter stated, he had multiple profiles in his .mozilla directory. After I installed the Java Plugin with multiple profiles in my .mozilla directory and attempted to restart the browser, I could reproduce the hang. So the issue happens in both cases: after plugin install (assuming multiple profiles) and during initial browser install (with no profiles existing on the system).
see bug 89188, sounds similar
*** Bug 89188 has been marked as a duplicate of this bug. ***
Cancel the comment about the july 02 builds working. Just checked my build, and I installed it without the java plugin (thus multi-profiles work). I've been seeing variations of this hang for a while (see bug 897843, and bugscape-bug 5992), hopefully this is narrowed down to the java plugin installation.
Whiteboard: critical for 0.9.2
reassign : dbragg?
Assignee: av → dbragg
QA Contact: shrir → gbush
Uh, No one has verified that this is a java "installation" issue and not just a java problem that I can see. Does anyone know what the java installer is putting on the system? Sounds like the java installer (a Sun product as far as I know) is putting things on the system that break Mozilla.
I don't know exactly who wrote the java installer or what it's SUPPOSED to be doing but install issues are supposed to go to Syd. reassigning.
Assignee: dbragg → syd
Blocks: 88893
Adding the bug this blocks. Syd, see bug 88893 - N6 does not launch when using profile manager - with FULL setup type build. I confirmed by doing all my regular Profile and Activation tests with a recommended setup type. Then I added Java to the recommended installation and am unable to launch, even with profiles created in the first set of tests. I can also reproduce the problem with a FULL setup type install, etc.
see bug 5016- java installs ok on commercial trunk
Yeah but that was closed on 5/23. It seems like this has broken since then. Can someone verify it's working on latest commercial branch/trunk?
Found a simple work-around! The comments about different results depending on the existence of multiple profiles made me try deleting the "default" profile, leaving only my own profile. This fixed the problem. Thanks again! I'm a bugzilla newbie, so I don't know the "proper" disposition of this bug. I suppose it should have a lower priority, and be marked "work-around exists". BTW, when the bug was alive, another "fix" was to run mozilla as root. Thanks to all for the help! Now if I can just figure out how to make mozilla stop using massively ugly fonts...
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → LATER
This is NOT working on the 2001070604 branch build for either K'Trina or me. Once I install Java, whether by FULL setup type install or by RECOMMENDED and then adding Java using the plugin, I cannot launch N6. Marking blocker: if you want to use Java, you cannot launch with a new or migrated profile. Not sure about an old profile, but will test.
Severity: critical → blocker
Status: RESOLVED → REOPENED
Keywords: nsbeta1, regression
Resolution: LATER → ---
Lisa, Wanted you to be aware of this going into PDT today.
PDT+. We need to fully understand what's going on here. Adding a Sun person to the CC list...
Whiteboard: critical for 0.9.2 → PDT+
Reporter: Can you use plugins other than the Java Plugin, such as the flash media player?
Is there any eta that can be added to the status whiteboard?
Whiteboard: PDT+ → PDT+; no eta
Manually installing the flash player files (ShockwaveFlash.class and libflashplayer.so) in the plugins directory works fine using today's branch builds. Installed in both a single-profile and a multi-profile environment. The Java plugin looks like the only plugin that results in a hang for multi-profiles and initial install (install w/out .mozilla directory and migrate a 4.x profile).
I did a custom install - deselected Java, select Flash Player and profiles acted as expected, app launched as expected
Looking
Status: REOPENED → ASSIGNED
Adding eng. mgr. of Java Plug-In group; don't know if this is relevant to him, though.
*** Bug 64351 has been marked as a duplicate of this bug. ***
Attached file files we install (deleted) —
verified failure/success scenarios. Here are the files we install (did a diff of the directory contents). Can someone verify there is nothing missing?
Second patch indicates that there are some missing files, can someone intimate with the JRE plugin comment? This was a "typical" install followed by a visit to http://home.netscape.com/plugins/jvm.html, followed by a directory compare against the installed version that fails.
The bug happens for me if I install with the stub (native) installer when I have a profile already established on the machine. I fail to see the connection between profiles and the JRE at this point.
are we sure that we have the same version of the JRE on the smartupdate (it is smartupdate, right) download page as we are currently shipping? cc'ing jcall and maier for help figuring out what we have on the download page.
Do we have two different Java versions here? I think we use 1.3.1 now but used to use 1.3.0_x
Here are stack traces of the various threads when we are "hung"

#0 0x405592c7 in __poll (fds=0x830a1a8, nfds=3, timeout=9) at ../sysdeps/unix/sysv/linux/poll.c:63
#1 0x40369485 in g_main_poll () from /usr/lib/libglib-1.2.so.0
#2 0x40368dca in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#3 0x403691cc in g_main_run () from /usr/lib/libglib-1.2.so.0
#4 0x4027fe57 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#5 0x40876f70 in NSGetModule () from /tmp/nsinstallertest/components/libwidget_gtk.so
#6 0x40710a5a in NSGetModule () from /tmp/nsinstallertest/components/libnsappshell.so
#7 0x0804f8bf in main1 ()
#8 0x08050165 in main ()
#9 0x4049eb65 in __libc_start_main (main=0x8050038 <main>, argc=1, ubp_av=0xbffffa44, init=0x804b2d0 <_init>, fini=0x8051e78 <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffffa3c) at ../sysdeps/generic/libc-start.c:111

#0 0x405592c7 in __poll (fds=0x8103f14, nfds=1, timeout=2000) at ../sysdeps/unix/sysv/linux/poll.c:63
#1 0x401d97b0 in __pthread_manager (arg=0xb) at manager.c:148
#2 0x401da29b in __pthread_manager_event (arg=0xb) at manager.c:230

#0 0x405592c7 in __poll (fds=0xbf7ffa9c, nfds=1, timeout=28948) at ../sysdeps/unix/sysv/linux/poll.c:63
#1 0x401be684 in PR_Poll () from /tmp/nsinstallertest/libnspr4.so
#2 0x4077af22 in NSGetModule () from /tmp/nsinstallertest/components/libnecko.so
#3 0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#4 0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#5 0x401d9a4f in pthread_start_thread_event (arg=0xbf7ffc00) at manager.c:274

#0 0x404af585 in __sigsuspend (set=0xbf5ff9b8) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x401dc4b9 in __pthread_wait_for_restart_signal (self=0xbf5ffc00) at pthread.c:896
#2 0x401d8a59 in pthread_cond_wait (cond=0x810089c, mutex=0x81217c8) at restart.h:34
#3 0x401bb3fe in PR_WaitCondVar () from /tmp/nsinstallertest/libnspr4.so
#4 0x407903b3 in NSGetModule () from /tmp/nsinstallertest/components/libnecko.so
#5 0x4079000b in NSGetModule () from /tmp/nsinstallertest/components/libnecko.so
#6 0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#7 0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#8 0x401d9a4f in pthread_start_thread_event (arg=0xbf5ffc00) at manager.c:274

#0 0x404af585 in __sigsuspend (set=0xbf3ff990) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x401dc4b9 in __pthread_wait_for_restart_signal (self=0xbf3ffc00) at pthread.c:896
#2 0x401d8a59 in pthread_cond_wait (cond=0x81040fc, mutex=0x81214a8) at restart.h:34
#3 0x401bb3fe in PR_WaitCondVar () from /tmp/nsinstallertest/libnspr4.so
#4 0x401381a5 in nsThreadPool::GetRequest () from /tmp/nsinstallertest/libxpcom.so
#5 0x4013881e in nsThreadPoolRunnable::Run () from /tmp/nsinstallertest/libxpcom.so
#6 0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#7 0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#8 0x401d9a4f in pthread_start_thread_event (arg=0xbf3ffc00) at manager.c:274
ok, now I can hang with the bits downloaded from netscape.com/plugins
(from the Reporter, in response to earlier question) Now that I've deleted the "default" profile, leaving only a single profile for myself, Mozilla starts, and I can use Java and flash plugins.
Blocks: 89792
It seems that I have the same problem with Win98. Update OS to "all"?
win98 works fine for me with multiple profiles (>1). I tried a full install and also a recommended, then triggering jvm.xpi. Both worked fine without errors. What's the error you're seeing under win98?
Looks like this was somehow fixed for linux on the trunk. Linux trunk build 2001071008 shows this bug but on build 2001071021 it's fixed. Unfortunately, this is still broken on the branch.
Just a quick guess, but brian checked in a Mac plugin fix on the tenth that might have fixed this problem. the bug ID for that is bug 85231. Thanks shannon for pointing this out. I'm applying his patches to my debug build (last updated on the ninth) to see if this works. Brian, any idea if your patch fixed this problem as well?
Cancel the last comment, Brian's patches didn't do the trick for me.
Sean Su: Hello! On 2001-07-09 the bug 64351, which I have opened, was marked as a duplicate of this one. The problem: If I try to enter a site with java-applets (JRE 1.3.0_01 is installed through mozilla), Mozilla crashes and talkback appears. I really don't know if the bugs have the same origin. Bye, Daniel
*** Bug 90958 has been marked as a duplicate of this bug. ***
Keywords: smoketest
Do we need to bring any more help on this one? Who do we want?
Assignee: syd → racham
Status: ASSIGNED → NEW
Component: Plug-ins → Profile Migration
Summary: mozilla catatonic after JRE plugin installation → mozilla catatonic inside Migrate Profile Routine with Java installed
I'm now able to reproduce this easily by simply blowing away my ~/.mozilla directory. Starting back up after killing the frozen task seemed to make Mozilla work okay. The installer may be off the hook because everything works fine as long as we don't go into profile migration after having installed Java. In fact, I can even reproduce by doing:
1) Typical install
2) Install Java (either from web or manually create symbolic links)
3) Exit
4) Remove ~/.mozilla
5) Start Netscape and notice the same hang, except I got the Activation window (still catatonic) to appear for a few moments.
Having said that, I think this bug may belong to whoever owns the "Migrate Profile Routine" because that's where it seems to freeze and it's the last thing on the console.
Peter, I am not sure it is just profile migration. I can go into profile manager and create a profile and hang also. I can have my profile migrated from an earlier test and it will not be usable after I install Java.
Shannond discovered that a checkin on July 10 between 08:00 a.m. and 09:00 p.m. fixed this problem on the trunk. I verified that yes this is fixed in the trunk with a single profile (no .mozilla directory) and multiple profiles. If anybody knows what checkin fixed this problem, please inform the masses. I've tried looking on bonsai.mozilla.org to see what checkin it could possibly be, still haven't found it since there were NUMEROUS checkins during that period. I'm currently patching up my debug branch build one checkin at a time which really isn't producing any significant results...for me anyway. If anybody knows an easier way to find out what fixed this bug, please advise. Here is the URL of all the checkins to the trunk on July 10 between 8 a.m. and 9 p.m. http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=SeaMonkeyAll&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Change+Size&hours=2&date=explicit&mindate=07/10/2001+08:00:00&maxdate=07/10/2001+21:00:00&cvsroot=/cvsroot Hope this helps us narrow down what fixed this problem.
Summary: mozilla catatonic inside Migrate Profile Routine with Java installed → mozilla catatonic after JRE plugin installation
sweetlou no longer has the 2001-07-10-08-trunk bits (but you can still get the working 07-10-21-trunk). If you want a copy of the last bad trunk bits, e-mail me and I'll let you know where to get 'em on my machine.
Syd and I are seeing the same thing: trunk works, branch doesn't. This could possibly be caused on the branch only because of bug 87913. That patch still needs to be backed out of the branch but it's already out of the trunk. It changed binary compatibility of the component manager; however, that checkin was made on the 11th, not the 10th. I'll try backing out that change tonight and we'll see if this bug gets fixed tomorrow.
Summary: mozilla catatonic after JRE plugin installation → mozilla catatonic inside Migrate Profile Routine with Java installed
Yes, only branch. I see the profile manager dialog come up and go away just before hanging. But that doesn't prove anything to me regarding where the blame lies.
Grace mentioned that we see this even with new profiles. Peter did you notice that ? Getting linux bits from sweetlou.
I am not sure if this is a profile migration issue unless it is something new that needs to be taken care of (due to other changes). Getting all my builds ready. Adding Putterman, Conrad and Seth to the list.
I've got a debug build and have the plugins folder filled in. Some interesting things, if I remove .mozilla and run, the activation window comes up but the content does not display (there are a bunch of JS errors that display). After a while the timeout occurs for the registration window and then it hangs. Like before the stack trace tells me nothing. If I ctrl-C and restart, with a .mozilla folder now in place everything runs great.
So, the next step was to leave everything the same, remove the Java plugin, and try the same tests. Now I can't break it. This is *long* after installer, I'm just playing with stuff in a directory on my machine. So you *know* that the java plugin being there is hosing us, because I can't crash it no matter what the profile situation is (i.e., regardless of my deleting .mozilla or not). It is probably *not* component registration, as that happens only once, not each time I run. So, I really need to know the following from someone -- what is the interaction between mozilla and this java plugin each time the browser is started? Do we call into the java plugin each time we startup? If so, what is done? I think we should focus on that code and see if anything is getting trashed.
If this is Buvhan's bug, then I'm Britney Spears. Reassigning to Peter to help me understand the plugin issues.
Assignee: racham → peterl
Summary: mozilla catatonic inside Migrate Profile Routine with Java installed → mozilla catatonic with Java installed
I commented out the call to nsPluginHostImpl::LoadPlugins. This fixes things too.
OK, this is interesting, and the best clue so far. Seems that LoadPlugins happens when a docshell is created. Guess when the first docshell is created? Yes, when the window that shows the barber pole is displayed, showing you profile migration. And we hang after that docshell is destroyed. Obviously, then, by not removing the .mozilla folder, we are deferring creation of a docshell, and thus loading of plugins, until later in the startup sequence. Perhaps the problem isn't one of memory corruption (after all, we have seen both profile migration and the java plugin work fine, just not fine together). Perhaps it is one of timing.
Just to clarify, when I refer to timing, I think this is about stuff happening too early, or a docshell going away too soon. If I comment out the call to show the little progress bar dialog, I get past this problem. I'll attach a patch. This patch is not the fix, necessarily.
Index: pref-migrator/src/nsPrefMigration.cpp
===================================================================
RCS file: /cvsroot/mozilla/profile/pref-migrator/src/nsPrefMigration.cpp,v
retrieving revision 1.144.24.1
diff -u -r1.144.24.1 nsPrefMigration.cpp
--- nsPrefMigration.cpp 2001/06/25 13:59:53 1.144.24.1
+++ nsPrefMigration.cpp 2001/07/17 04:59:30
@@ -309,7 +309,8 @@
 nsPrefMigration::ProcessPrefs(PRBool showProgressAsModalWindow)
 {
   nsresult rv;
-
+
+#if 0
   nsCOMPtr<nsIWindowWatcher> windowWatcher(do_GetService("@mozilla.org/embedcomp/window-watcher;1", &rv));
   if (NS_FAILED(rv)) return rv;
@@ -321,6 +322,7 @@
   nsnull, getter_AddRefs(mPMProgressWindow));
   if (NS_FAILED(rv)) return rv;
+#endif
   return NS_OK;
 }
okay, I can also hang if I give it a --profilemanager argument. So, this is all about window creation, not profile migration or java. Or maybe java plugin initialization can't handle its docshell being closed, according to dan veditz.
In the Bonsai query pasted by Rodney Velasco 2001-07-16, it looks like Timeless made some check-ins to nsWindow.cpp and nsWidget.cpp. Could this explain Syd's window timing theory?
No. Dan and I applied those changes. There was no effect. I think there is a real problem here of some kind. Let's look for the real bug and get this solved.
nsDocShell::NewContentViewerObj() appears to be trying to load the plugins only when we can't otherwise find a doc loader factory with which to view content. What is |aContentType| when you hit this code?
Peter and I discovered if we call the profile migration code in nsAppRunner.cpp *after* we allow the hidden window to be created, then the plugin loading triggered off the docshell works great. Dan Veditz has suggested maybe there is a dependency on hidden window, is that possible? Chris, I'll look into your question.
Well, the plugin loading code shouldn't be triggered off the docshell unless we're trying to load a content type that mozilla doesn't know about. So, what content type are we trying to load that mozilla doesn't know about?
So, what if you make the hidden window earlier. There is this comment here: http://lxr.mozilla.org/mozilla/source/xpfe/bootstrap/nsAppRunner.cpp#1145 but I'm pretty sure that's not true anymore.
Thanks to Syd's detail in debuggin', he found the problem. It seems that the profile migration code happens BEFORE the hidden window is created.
Status: NEW → ASSIGNED
Keywords: patch
Priority: -- → P1
Whiteboard: PDT+; no eta → PDT+ [SEEKING REVIEWS]
But still, what's the actual problem? Why should migration have to happen after the hidden window is created? The whole point of windowwatcher is that it can make windows with or without a parent and without affecting or being affected by the hidden window. If that's not the case, there's a problem. Making the hidden window before migration might do the trick, but...
Will somebody please determine what MIME type is failing before we wallpaper over another gaping hole? Thanks.
Right, and when we changed the order of the hidden window creation vs. the profile migration in the AppShell, it worked great! However, we then noticed that the profile manager has this same problem: that window is created BEFORE the hidden window is shown. The relationship between LoadPlugins() and the hidden window is that the hidden window is created with a script context. This, in turn, causes LiveConnect to init, which in turn causes the JVMManager to start, which in turn attempts to LoadPlugins() to see if Java is there. Ed Burns says in e-mail that this is really touchy. Our idea of having LoadPlugins() create the hidden window if it hasn't been created didn't work. What worked very well was adding an EnsureScriptEnvironment() call in the docshell before we call LoadPlugins(). We don't know the docshell code well enough to understand the side effects of this, but it would solve this problem in one line. Checking content-type next.
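For anyone following along, the chain described in this comment (first window with a script context, then LiveConnect, then the JVM manager, then LoadPlugins) can be modelled with a small toy. This is only a sketch: StartupModel, createWindow and pluginTriggerFor are made-up names, not Mozilla classes or functions, and the model says nothing about why the hang occurs. It only shows that whichever window is created first is the one that ends up triggering plugin loading.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the startup chain discussed in this bug. These are NOT the
// real Mozilla classes; every name here is illustrative.
class StartupModel {
public:
    // Creating any window with a script context lazily initializes the
    // scripting environment, which cascades down to plugin loading.
    void createWindow(const std::string& name) {
        assert(!name.empty());
        log_.push_back("create " + name);
        ensureScriptEnvironment(name);
    }

    // Name of the window whose creation first triggered plugin loading,
    // or an empty string if plugins have not been loaded yet.
    const std::string& pluginTrigger() const { return pluginTrigger_; }
    const std::vector<std::string>& log() const { return log_; }

private:
    void ensureScriptEnvironment(const std::string& window) {
        if (scriptEnvReady_) return;  // only the first window pays the cost
        scriptEnvReady_ = true;
        log_.push_back("StartupLiveConnect");
        log_.push_back("StartupJVM");
        log_.push_back("LoadPlugins");
        pluginTrigger_ = window;
    }

    bool scriptEnvReady_ = false;
    std::string pluginTrigger_;
    std::vector<std::string> log_;
};

// Runs one of the two orderings discussed in the thread and reports which
// window ends up triggering LoadPlugins.
std::string pluginTriggerFor(bool hiddenWindowFirst) {
    StartupModel model;
    if (hiddenWindowFirst) {
        model.createWindow("hidden window");
        model.createWindow("migration progress window");
    } else {
        model.createWindow("migration progress window");
        model.createWindow("hidden window");
    }
    return model.pluginTrigger();
}
```

With the hidden window created first, the short-lived migration window never owns the plugin initialization, which matches the ordering that was observed to work.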
Waterson, the content type is application/vnd.mozilla.xul.+xml. Should we just special case this?
Is it _exactly_ "application/vnd.mozilla.xul.+xml"? If so, it _should_ be application/vnd.mozilla.xul+xml and somebody is screwing it up somewhere. LXR couldn't find any instances of the string you typed, so did you miskey it? If not, maybe someone mangled a strcat() along the way to the docshell?
Sorry, it is exactly: "application/vnd.mozilla.xul+xml" Here is a link to the full text search: http://lxr.mozilla.org/seamonkey/search?string=application%2Fvnd.mozilla.xul
Okay, so -- we should be able to create a doc loader factory for that MIME type; specifically, content/build/nsContentDLF.cpp is able to handle it. I dug around a bit but couldn't find where on earth we actually _register_ the component ID properly, though. What does the stack trace look like at the time we try to |do_CreateInstance()| the DLF? Presumably we'll unwind to the caller with a failure code, who'll have some hard-coded knowledge about using the built-in DLF? Maybe we just need to reverse that, so we try our DLF first before throwing it at the docshell. Alternatively (and probably better), we could make sure that we properly register all the DLF's contract ID's. I've finally got the JRE installed, so I'll dig at it a bit, too.
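To make the lookup being discussed concrete, here is a minimal sketch of "try a doc loader factory for the content type, and only fall back to plugin loading when none is registered". DocLoaderRegistry, ViewerSource and resolve are hypothetical names for illustration only; the real docshell and DLF registration code is more involved and is not reproduced here.

```cpp
#include <cassert>
#include <set>
#include <string>

// Where the content viewer would come from in this toy model.
enum class ViewerSource { BuiltInFactory, PluginFallback };

// Hypothetical registry standing in for doc loader factory registration.
class DocLoaderRegistry {
public:
    void registerFactory(const std::string& contentType) {
        assert(!contentType.empty());
        known_.insert(contentType);
    }

    // An exact string match is required; any miss counts as one attempt
    // to load plugins, which is the path that is under suspicion here.
    ViewerSource resolve(const std::string& contentType) {
        if (known_.count(contentType) != 0) {
            return ViewerSource::BuiltInFactory;
        }
        ++pluginLoads_;
        return ViewerSource::PluginFallback;
    }

    int pluginLoads() const { return pluginLoads_; }

private:
    std::set<std::string> known_;
    int pluginLoads_ = 0;
};
```

In this model a registered "application/vnd.mozilla.xul+xml" never reaches the plugin path, while a mis-keyed string such as "application/vnd.mozilla.xul.+xml" would. That is why it matters whether the type is exactly right and whether the factory is actually registered for it.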
What if the call to LoadPlugins() was completely removed from the docshell? I don't think we need it here because plugins will be loaded by the JVMManager anyway.
It's conceivable that someone would want plugins without a JVM (e.g., embedded on a device), but that may be a reasonable way to fix this on the branch.
I think nsContentDLF is unused code and you should be modifying nsLayoutDLF. See bug 87476.
Peter, not sure how you came to this conclusion, maybe we had a testing error or something, but adding a call to EnsureScriptEnvironment() does not fix the problem. The only fix I have seen is to get a hidden window created ahead of time, for whatever reason. cc'ing patrick beard at the request of PDT.
If I don't install Java, would Peter's proposal result in plugins not working? That wouldn't be OK since not all users take Java from our installs. Is it reasonable to create a patch based on Chris's comment about the MIME type?
I just chatted with Syd and he says there is even a later call to LoadPlugins() that causes the same problem if the one in docshell is commented out. We're trying to get a stack. I think I've got reproducing down. Starting mozilla with the -ProfileWizard switch can trigger this every time for me without deleting any files.
Keywords: patch
Whiteboard: PDT+ [SEEKING REVIEWS] → PDT+
I've tried narrowing down which check-ins fixed the trunk. On 7/10 at 16:00 we hang, and at 16:50 we don't (incorporating one later typo build-bustage fix). I'm trying to narrow it down further, but having to clobber my build. Don't see how any of the 3 checkins in that time block would affect startup ordering.
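The narrowing described here is a bisection over the checkin timeline. A rough sketch of the idea follows, assuming the checkins are ordered by time, the earliest one hangs, the latest one does not, and the behaviour flips exactly once. Checkin, firstFixedCheckin and the hangsAt callback are hypothetical names; in practice each probe means a rebuild (possibly a clobber) and a manual test.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one trunk checkin, ordered by commit time.
struct Checkin {
    std::string label;  // illustrative only, e.g. a file and revision
};

// Returns the index of the first checkin at which the build no longer hangs.
// Assumes hangsAt(0) is true, hangsAt(last) is false, and that the result
// flips from "hangs" to "works" exactly once along the timeline.
std::size_t firstFixedCheckin(const std::vector<Checkin>& checkins,
                              const std::function<bool(std::size_t)>& hangsAt) {
    assert(!checkins.empty());
    std::size_t lo = 0;                    // known to hang
    std::size_t hi = checkins.size() - 1;  // known to work
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (hangsAt(mid)) {
            lo = mid;  // fix landed after mid
        } else {
            hi = mid;  // fix landed at or before mid
        }
    }
    return hi;
}
```

Each probe halves the remaining window, so even a day with dozens of checkins needs only a handful of builds instead of one build per checkin.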
so, a couple of answers to a couple of questions. 1) no, the plugins are not being unloaded when the windows (profile) go down. 2) I still get the bug if I remove the call to LoadPlugins from inside the docshell stuff. It just gets called elsewhere, as in the following stack. Makes me think that we need to make it work instead of trying to not make it happen.

#0 goobooroo () at prmem.c:37
#1 0x40b2df89 in nsPluginHostImpl::LoadPlugins (this=0x80fdd68) at nsPluginHostImpl.cpp:3940
#2 0x40b2d2f1 in nsPluginHostImpl::GetPluginFactory (this=0x80fdd68, aMimeType=0x40a342b1 "application/x-java-vm", aPlugin=0xbfffe43c) at nsPluginHostImpl.cpp:3685
#3 0x40a26630 in nsJVMManager::StartupJVM (this=0x840cf38) at nsJVMManager.cpp:602
#4 0x40a26bee in nsJVMManager::MaybeStartupLiveConnect (this=0x840cf38) at nsJVMManager.cpp:783
#5 0x40a2cc2f in nsJVMManager::StartupLiveConnect (this=0x840cf38, runtime=0x810a440, outStarted=@0xbfffe538) at nsJVMManager.h:128
#6 0x41bed87b in nsJSEnvironment::nsJSEnvironment (this=0x840ce98) at nsJSEnvironment.cpp:1527
#7 0x41bed265 in nsJSEnvironment::GetScriptingEnvironment () at nsJSEnvironment.cpp:1446
#8 0x41bedcf4 in NS_CreateScriptContext (aGlobal=0x83e86d0, aContext=0x83e7f20) at nsJSEnvironment.cpp:1574
#9 0x41be5c74 in nsDOMSOFactory::NewScriptContext (this=0x83e8668, aGlobal=0x83e86d0, aContext=0x83e7f20) at nsDOMFactory.cpp:123
#10 0x42b15a15 in nsDocShell::EnsureScriptEnvironment (this=0x83e7e70) at nsDocShell.cpp:5830
#11 0x42b17b24 in nsWebShell::GetInterface (this=0x83e7e70, aIID=@0x40d28e54, aInstancePtr=0xbfffe850) at nsWebShell.cpp:322
#12 0x401e9f1e in nsGetInterface::operator() (this=0xbfffe8d0, aIID=@0x40d28e54, aInstancePtr=0xbfffe850) at nsIInterfaceRequestor.cpp:37
#13 0x40d1c014 in ?? () from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#14 0x40d1eaa0 in ?? () from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#15 0x40d12925 in ?? () from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#16 0x40d0f549 in ?? () from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#17 0x40d0e6d9 in ?? () from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#18 0x42b5491c in nsPrefMigration::ProcessPrefs (this=0x83e7ae0, showProgressAsModalWindow=0) at nsPrefMigration.cpp:322
#19 0x420e33e3 in nsProfile::MigrateProfile (this=0x83d5320, profileName=0x83e7b48, showProgressAsModalWindow=0) at nsProfile.cpp:1957
#20 0x420e3d68 in nsProfile::MigrateAllProfiles (this=0x83d5320) at nsProfile.cpp:2086
#21 0x420dc44c in nsProfile::AutoMigrate (this=0x83d5320) at nsProfile.cpp:610
#22 0x420dd6e3 in nsProfile::ProcessArgs (this=0x83d5320, cmdLineArgs=0x8249888, profileDirSet=0xbffff5e8, profileURLStr=@0xbffff5c0) at nsProfile.cpp:841
#23 0x420dad70 in nsProfile::StartupWithArgs (this=0x83d5320, cmdLineArgs=0x8249888, canInteract=1) at nsProfile.cpp:376
#24 0x080592f6 in InitializeProfileService (cmdLineArgs=0x8249888) at nsAppRunner.cpp:904
#25 0x0805a81d in main1 (argc=1, argv=0xbffff9d4, nativeApp=0x0) at nsAppRunner.cpp:1191
#26 0x0805b83f in main (argc=1, argv=0xbffff9d4) at nsAppRunner.cpp:1532
#27 0x405b1b65 in __libc_start_main (main=0x805b640 <main>, argc=1, ubp_av=0xbffff9d4, init=0x8053fe4 <_init>, fini=0x8066f20 <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff9cc) at ../sysdeps/generic/libc-start.c:111

(goobooroo is my little hack to get me a breakpoint in gdb without loading the .so, which would be too late) I think we have to understand what (if there is one) the dependency on a global window is all about.
This sounds very much like bug 87843 which happened to be fixed right around the time this one was opened. David Baron fixed the problem with the nsDeviceContextGTK but perhaps there is something additional that needs to be done for Java to work. cc:ing Marc and Brendan, reviewers from bug 87843.
Although it may *sound* like bug 87843, the underlying cause is most likely very different. The problem in bug 87843 is now fixed.
Bug 87843 was about the screen resolution not being calculated, or being calculated incorrectly, when the activation window was drawn. The telltale sign of that bug was:
###!!! ASSERTION: Negative Width Input - very bad: 'mComputedWidth>=0', file nsHTMLReflowState.cpp, line 2472
I don't think it's similar.
*** Bug 87571 has been marked as a duplicate of this bug. ***
Change subject.
Summary: mozilla catatonic with Java installed → Profile mgr and java cause hang (was: mozilla catatonic with Java installed)
*** Bug 91385 has been marked as a duplicate of this bug. ***
After working with Ed Burns, here's my first attempt at a patch of hackery to actually fully fix this. Please review and comment, maybe someone has a better idea. I tested this out only on Linux branch so far in the following situations:
1) No .mozilla and profile migration
2) One profile and no profile manager
3) Several profiles and the profile manager comes up
4) Using the --ProfileWizard switch
All of those work for me with this patch. On each one, I also verified that Java was working by visiting http://www.javasoft.com. Marc suggested that Liveconnect be tested as well. Could someone more familiar with Liveconnect either try this or point me to a testcase? Thanks!
Keywords: patch
Lamentably, the hack breaks liveconnect. Peter, you need to make it so the hack is in place if and only if we're in the (ProfileManager|ProfileWizard|Profile Migration) case. If I remove the hack and start mozilla with only one profile, liveconnect works.
Peter would it be possible to look for the existence of some service that only is known to exist during the *real* browser running situation? How could we determine whether or not we're in the *real* browser case, or in the "before the browser really starts" case?
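One way to picture the "if and only if we're in the pre-browser case" condition is an explicit startup phase that the hack consults. This is a hypothetical sketch, not existing Mozilla code: StartupPhase and StartupPhaseTracker are invented names, and nothing in this bug confirms that such a service exists or where it would be set.

```cpp
#include <cassert>

// Invented phases for illustration; the first three are the cases named in
// this bug where a window appears before the browser proper is running.
enum class StartupPhase {
    ProfileManager,
    ProfileWizard,
    ProfileMigration,
    BrowserRunning
};

// Hypothetical tracker that startup code would update as it progresses.
class StartupPhaseTracker {
public:
    void enter(StartupPhase phase) { phase_ = phase; }

    // True only while one of the pre-browser profile UIs is active, i.e. the
    // only time the workaround should be applied.
    bool inPreBrowserUi() const {
        switch (phase_) {
            case StartupPhase::ProfileManager:
            case StartupPhase::ProfileWizard:
            case StartupPhase::ProfileMigration:
                return true;
            case StartupPhase::BrowserRunning:
                return false;
        }
        assert(false && "unhandled StartupPhase");
        return false;
    }

private:
    // Default to the normal browser case so the hack is off unless a
    // profile UI explicitly announces itself.
    StartupPhase phase_ = StartupPhase::BrowserRunning;
};
```

Defaulting to BrowserRunning keeps the single-profile path, where liveconnect was reported to work without the hack, untouched.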
Does it still solve the problem to create the hidden window before calling InitializeProfileService()? That seems like a bit less of a hack than changing nsJVMManager.cpp. All you'd have to do is move nsAppRunner.cpp#1147 up to #1134. The comment at #1145 just isn't so. You'd have to make a new scary comment though.
dbaron and I updated modules/libpref/src/init/all.js from r3.252 to r3.253, and it fixes the problem. This was the magic checkin that ``fixed'' things on the trunk. cc'ing jesse, who came up with the patch.
Heh, here's the fix:

Index: all.js
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/init/all.js,v
retrieving revision 3.245.2.7
diff -u -r3.245.2.7 all.js
--- all.js 2001/07/17 03:40:18 3.245.2.7
+++ all.js 2001/07/19 03:04:17
@@ -230,6 +230,10 @@
 pref("capability.policy.mailnews.sites", "mailbox: imap: news:");
 pref("capability.policy.mailnews.Window.name.set", "noAccess");
 pref("capability.policy.mailnews.Window.location", "noAccess");
+////////////////////////////////////////////////////////////
+pref("capability.principal.codebase.foo.id", "http://www.netscape.com");
+pref("capability.principal.codebase.foo.granted", "UniversalFoo");
+//////////////////////////////////////////////////////////
 pref("javascript.enabled", true);
 pref("javascript.allow.mailnews", false);
From a conversation with mstoltz, this fix was checked in accidentally (along with jesse's changes that were meant to be checked in). It shouldn't actually do anything, but it probably affects initialization order in some way that fixes this bug.
Glad to see that the stench of the hack that Peter and I were working on was enough to wake people up from their torpor. Waterson's all.js patch works like a charm.
linux:
./mozilla --ProfileWizard -> browser, with java and liveconnect works fine
./mozilla (two profiles) -> browser with java and liveconnect works fine
./mozilla (one profile) -> browser with java and liveconnect works fine.
The presence of those lines in all.js (which were checked in by accident) cause the for loop in nsScriptSecurityManager::InitPrincipals to iterate once. Without those prefs, aPrefCount will probably be zero, and the code in the for loop will never run. My guess is that the code in that for loop is affecting initialization order somehow.
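To make the loop-count point above concrete, here is a toy model in Python. It is only a sketch: the real code is C++ in nsScriptSecurityManager::InitPrincipals, and the function name, the callback, and the way prefs are matched here are all assumptions made for illustration. Only the pref names are taken from the all.js patch.

```python
# Toy model only. The real loop lives in C++ (nsScriptSecurityManager::InitPrincipals).
# The pref names come from the all.js patch; every other name here is hypothetical.

def init_principals(prefs, on_principal):
    """Call on_principal once per 'capability.principal.*.id' pref that is present."""
    id_prefs = [name for name in prefs
                if name.startswith("capability.principal.") and name.endswith(".id")]
    for name in id_prefs:            # zero iterations when no such pref exists
        on_principal(prefs[name])    # stands in for whatever side effect the loop body has
    return len(id_prefs)

without_hack = {"javascript.enabled": True}
with_hack = dict(without_hack, **{
    "capability.principal.codebase.foo.id": "http://www.netscape.com",
    "capability.principal.codebase.foo.granted": "UniversalFoo",
})

touched = []
count_without = init_principals(without_hack, touched.append)
count_with = init_principals(with_hack, touched.append)
```

In this model the loop body runs only when the accidental prefs are present, which is the behavior the guess about initialization order depends on.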
Great work Waterson! Get that into the trunk!
..er..I mean branch!
Wow, finally a fix. Don't mean to be a bringdown since you all found the fix.....but anybody know why xpicleanup is still running after you install the JVM, and restart browser? I get this dialog box "The program must close to allow a previous installation attempt to complete. Please restart" Right after I installed the jvm from http://home.netscape.com/plugins/jvm.html and file -> quit the browser. I need to start the netscape binary about two times before I can actually get the Profile Manager window (which actually works when I choose a profile..yeah). With that said, I'll file another bug on this...I'm sure most of you hate to see the word 'catatonic' in your mailbox.
bug 91427 filed for above issue I described.
The replacement utility has a far too long sleep cycle, resulting in most people being able to re-launch Mozilla again before xpicleanup realizes Mozilla has shut down and gotten out of its way.
Has anyone already checked in the fix from Chris Waterson 2001-07-18 20:07? We need this in to make the next build a viable candidate build. I've asked dveditz to do the checkin if it hasn't been done and nobody is around.
Checked in to MOZILLA_0_9_2_BRANCH, 2001-07-19 01:02 PDT, all.js rev. 3.245.2.8.
Whiteboard: PDT+ → PDT+ fixed on branch
Thank You Thank You Thank You!!!
Yes!!! fixed on branch build 2001-07-19-04-0.9.2
not to rain on dbaron and waterson's parade, but... we're going to keep this open though and find and resolve the real underlying problem, right? Otherwise what are we going to do when someone removes those prefs for some reason and it comes back...
Uh, yeah. I hope so. In fact, I think mstoltz is going to back the spurious ``fix'' out on the trunk, so we'll soon be able to debug the problem there.
Reassigning to mstoltz...
Assignee: peterl → mstoltz
Status: ASSIGNED → NEW
Keywords: smoketest
The comments suggest that this is checked in. But I just installed the 20010719 4am branch build (net installer, checked ALL to get Java) and when I go to http://www.shallowsky.com/jupiter.html (which has an applet on it) the app hangs before it even loads any of the page, while displaying "Connecting to ..." in the status bar. This is repeatable. The page works in other java-enabled browsers (e.g. galeon). Am I seeing a different bug, or did the fix somehow not make it into 20010719?
akkana, you might be seeing the 'lazy java loading' thing fixed with bug 26516.
If so, it needs to be done differently. The fact is that going to a page containing java completely locks up the browser (presumably forever, but I only timed it for 3 minutes -- should I wait longer than that? The whole app never used to take that long to start up) so that you have to kill the app externally and start over. BTW, this build is: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010719 Netscape6/6.1
Akkana: You're probably seeing bug 84093 if you're running RedHat 7.1.
Yes, I'm seeing bug 84093.
Whiteboard: PDT+ fixed on branch → fixed on branch
*** Bug 89792 has been marked as a duplicate of this bug. ***
I don't understand, why was this assigned to me? Shouldn't we come up with a real fix before I back out the accidental fix and reopen the blocker? Back to peterl.
Assignee: mstoltz → peterlubczynski
Mitch, I assigned it to you because I thought nobody knows more about this mysterious pref than you and maybe you could shed some light (in greater detail) as to what this pref actually does? Do you have ANY ideas WHY Java MUST have this pref set? Using this pref trips _something_ pretty important in security that makes the browser not hang with Java. Also, what's wrong with considering keeping this pref as the REAL fix? I agree the name is awful, but what's wrong with replacing "foo" with "oji" or something like that? I think the problem is with the interaction of browser security and OJI. Even if I did understand the real problem, I don't know enough about OJI or security to fix it. Reassigning to OJI.
Assignee: peterlubczynski → edburns
Component: Profile Migration → OJI
QA Contact: gbush → shrir
I mentioned above what those prefs do - they cause some additional initialization code to run as nsScriptSecurityManager is initialized. I don't know how or why that affects java, and I'd really like to see this fixed at the root of the problem, not haphazardly. Does anyone know when this started/what checkin caused the bug to begin with?
Over the weekend I debugged this a bit and found that the hang was happening where the OnStartRequest for the load of navigator.xul was sent into a proxy and never came out on the other side and got handled by the main thread. On a suggestion from a conversation with darin and gordon (that in the past there was a bug with the cache where holding on to an event queue would stop events from being processed), I debugged this a bit with danm. We found that:
* when Java is not installed, there are two event queues destroyed, one right after the profile manager closes (right before the load of navigator.xul starts) and the other at app shutdown
* when Java is installed and we don't grant UniversalFoo, the first destruction doesn't happen and the navigator.xul load events that are dropped go through the codepath in PostEvent that punts to an elder queue.
* when Java is installed and we do grant UniversalFoo, the event queue destruction at app shutdown doesn't happen, but the one after the profile manager closes does happen (which presumably causes us to run correctly)
So it seems like this bug might be related to holding on to an event queue. It's not fully understood why this causes a hang, but the solution may just be not to hold on to the event queue. danm says the event queue stuff is very fragile and changing that might not be a good idea. (Or something like that...) Is the Java plugin holding on to the event queue somehow? (It's an XPCOM plugin, right?) If so, could it be changed not to do so?
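The "holding on to an event queue" observation can be sketched with a manually reference-counted object. This is a hypothetical Python model, not the XPCOM event queue code: the class, its methods, and the plugin_holds_queue flag are all invented for illustration, and it only shows why one unreleased reference would keep the nested queue from being destroyed when the profile manager closes.

```python
# Hypothetical model of a reference-counted event queue; none of these names
# exist in Mozilla. It illustrates the leak described above, nothing more.

class EventQueue:
    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.destroyed = False

    def add_ref(self):
        self.refs += 1
        return self

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.destroyed = True   # destruction only happens at refcount zero


def run_modal_window(plugin_holds_queue):
    """Push a nested queue for a modal window, then drop it when the window closes."""
    nested = EventQueue("profile-manager").add_ref()   # reference owned by the modal loop
    if plugin_holds_queue:
        nested.add_ref()    # assumed extra reference that is never released
    nested.release()        # modal window closes; the loop drops its own reference
    return nested
```

With plugin_holds_queue set to False the nested queue is destroyed as soon as the window closes; with it set to True the queue lingers, which corresponds to the "first destruction doesn't happen" case in the list above.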
It would be nice if someone who has access to the source of the Java plugin (Ed Burns?) could investigate my last comment. The "UniversalFoo" hack really shouldn't be the permanent fix for this bug.
With 0.9.1, I installed the java plugin (1.3.0_01). When starting 0.9.2 or 0.9.3, I got the "Mozilla needs to shutdown to allow a previous installation to finish" error message. I ran rm -rf on my .mozilla directory and ran mozilla as root, to no avail. Following the thread here, I removed the symlink /usr/lib/mozilla/plugins/libjavaplugin_oji.so, ran mozilla as root (it worked), ran rm -rf .mozilla, and ran mozilla as a user. It worked. After that, I put back the symlink for /usr/lib/mozilla/plugins/libjavaplugin_oji.so, and mozilla started, although java is not working (complete hang). Galeon (0.11, 0.11.1, 0.11.2, 0.11.3) does not have those problems.
*** Bug 93169 has been marked as a duplicate of this bug. ***
Priorities in the new economy.
Severity: blocker → critical
Priority: P1 → P2
Target Milestone: --- → mozilla0.9.5
SPAM: reassigning all OJI bugs to new OJI QA, pmac ( 227 bugs)
QA Contact: shrir → pmac
this checkin has a significant negative effect on startup time, because it forces http to initialize during startup. http in turn causes several other components to be initialized. could we get by with just changing the pref to reference about:blank instead of www.netscape.com? any help testing this would be most appreciated. thanks! see bug 97462 and bug 96681
We could probably come up with a better hack anyway -- like doing something that causes the Java library to be initialized before we start the nested event queue for the profile manager. Although, really, it would be nice if the plugin itself were fixed, or if we fixed event queues not to behave this way.
Would a "good" place for a less-invasive hack be nsProfile::LoadDefaultProfileDir, a little before we call OpenWindow?
I don't think so. There is another place in profile startup (Confirming auto-migration) where a modal window is opened with windowwatcher. There may be others. In general, we should be able to open modal windows without hacking around this problem. Let's find a real answer.
OK, I've been assuming that the problem was in the java plugin itself, but I think it's in the JVM manager, so I think we can find a real fix.
Actually, never mind that. The JVM Manager doesn't hold on to the event queue that it gets, so I think the problem is in the Java plugin, which we can't change. Only Sun can, and they don't seem interested in doing that anytime soon.
it sounds to me like someone (maybe the java plugin) has cached an eventQ... Then when a new eventQ is pushed (because of the modal window) some events end up being pushed to the wrong eventQ. This ultimately leads to the hang... -- rick
It's actually a leak of an event queue -- see my comments dated 2001-07-24 15:53.
I assume that the eventQ leaks because someone is holding a reference to it... This is really starting to sound as if the java plugin (or someone) is grabbing an eventQ during early initialization, later posting events to the cached Q (rather than the currently active Q)... Since the eventQ is cached, and everything goes to hell in a hand-basket, we end up leaking the eventQ... Can we add an ASSERT in the event posting code which checks that the eventQ being targeted is the 'active' event Q for that thread? This could catch similar situations in the future :-) -- rick
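A rough sketch of the proposed check, written in Python rather than the real C++ event queue code. The QueueStack class and its methods are hypothetical, and since an earlier comment notes that PostEvent sometimes punts to an elder queue on purpose, a strict check like this may well be too aggressive for the real code. It only illustrates the idea of comparing the target queue with the thread's active queue.

```python
# Hypothetical model of one thread's stack of event queues. Not Mozilla code.

class QueueStack:
    def __init__(self):
        self.stack = [[]]              # the thread's initial queue

    def active(self):
        return self.stack[-1]

    def push(self):
        """Push a nested queue, as a modal window would."""
        self.stack.append([])
        return self.stack[-1]

    def post(self, queue, event):
        # The proposed ASSERT: the target must be the thread's active queue.
        if queue is not self.active():
            raise AssertionError("event posted to a non-active queue")
        queue.append(event)


queues = QueueStack()
cached = queues.active()               # grabbed during early initialization
queues.post(cached, "early event")     # fine: cached is still the active queue
queues.push()                          # modal profile manager pushes a new queue

try:
    queues.post(cached, "late event")  # stale cached queue: the check fires
    stale_post_caught = False
except AssertionError:
    stale_post_caught = True
```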
Whiteboard says 'fixed on branch'. Is that the 0.9.4 branch? If so, could someone please remove the nsbranch keyword to get it off the radar?
It's been fixed temporarily, with a hack of sorts, before 0.9.4 was cut, so removing nsbranch. I would still like this to be fixed soon so we can remove the 'UniversalFoo' hack from all.js.
Severity: critical → normal
Keywords: nsbranch
reassign
Assignee: edburns → joe.chou
Reassign to Joe as I'm leaving the role of OJI module owner.
Assignee: joe.chou → edburns
Reassign to Joe Chou, as I am no longer working officially on OJI.
Assignee: edburns → joe.chou
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I just installed mozilla 0.9.4-2 on a system running RH7.1 with the 2.4.3-12 kernel. I still have the same problems I originally had way back with 0.9.1. As root I tried through debug applets. It asked me to download java which I did, and it claimed it was successfully installed. So I exited mozilla and tried to run it again. I got that same old message about mozilla needing to shutdown. This time I was able to kill the running process (although it wasn't up on the screen) and then I could start mozilla. But when I went to test the applets through debug, it hung again just as it always has. It also hung when I tried to test javascript. Is any progress being made on this at all? Or are we just going around in circles? Any suggestions about what I might try?
len: you seem anxious to try something, and I have something you could try. I don't have the JRE on my machine and I'm pressed for time right now. If anyone out there can build with the patch I'm about to post and let me know if it helps... The patch isn't what I'd propose for an actual checkin. I'm just curious whether it fixes the problem. If it does, we at least know for certain that we understand the problem. The patch wants to be tried using some version of the source that doesn't have the 0.9.4 UniversalFoo "fix." That fix just disguises the problem through some mysterious process. It seems to be checked in everywhere; it can be backed out by removing the two lines in all.js that contain the word "foo". I've tried the patch on my machine. It doesn't hurt anything. I'm curious to know whether it helps. Any volunteers? (Lacking a response, I'll try this myself in a week or so, maybe.)
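For anyone backing the hack out by hand, the "remove the two lines that contain foo" step could be scripted roughly as below. This is a sketch under assumptions: the helper name is made up, it works on the text of all.js rather than on a file, and it leaves the surrounding comment-slash lines from the patch in place.

```python
# Hypothetical helper: drop every line of all.js text that contains "foo".

def strip_foo_prefs(all_js_text):
    kept = [line for line in all_js_text.splitlines() if "foo" not in line]
    return "\n".join(kept) + "\n"

sample = (
    'pref("capability.policy.mailnews.Window.location", "noAccess");\n'
    'pref("capability.principal.codebase.foo.id", "http://www.netscape.com");\n'
    'pref("capability.principal.codebase.foo.granted", "UniversalFoo");\n'
    'pref("javascript.enabled", true);\n'
)
cleaned = strip_foo_prefs(sample)
```

Reading all.js, passing its contents through strip_foo_prefs, and writing the result back would have the same effect as deleting the two pref lines by hand.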
Blocks: 104166
I just installed version 0.9.5 from the RedHat_7x rpm package. As usual, I started at debug and picked applets. That prompted a download of the java plugin and installed it. This time I was able (as root) to close java without any hanging. But when I went back to the debug > applets test, it hung as usual. Why does the Mozilla web page keep saying that Mozilla has been tested with java, when it still doesn't work? It seems to me this is not a minor issue.
Okay. I finally gave up on getting the Java plugin from mozilla which is supposed to automatically find the plugin you need. Instead I went to sun and got the rpm package for jre 1.3.1. I installed that and moved the link in /usr/lib/mozilla/plugins so it pointed at the plugin in the /usr/java/... tree. Now it works. I don't know how long it has been working, but perhaps the web page should be rewritten. It seems to imply that 1.3.0 will work and you get a version of that if you let mozilla automatically get the plugin. But in fact that version doesn't work.
What you're describing sounds like bug 84093 rather than this bug. (I think the web site should have been changed before resolving that bug, but it's probably going to happen sometime, anyway...)
Sun's JRE 1.3.0 does not like RH 7.1 unless you set LD_ASSUME_KERNEL=2.2.5. See bug 84093. The remaining issues with JRE 1.3.1 and Mozilla have been fixed and the next release of Netscape will come with JRE 1.3.1. The web page for download should also be updated soon, see bug 103926.
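As a small illustration of the workaround above, this sketch builds an environment with LD_ASSUME_KERNEL set before launching the browser. The helper name and the ./mozilla path in the comment are assumptions; only the variable name and the value 2.2.5 come from the comment above.

```python
import os

def mozilla_env(base_env=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set for JRE 1.3.0."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = "2.2.5"
    return env

# Hypothetical launch (path is an assumption, adjust to your install):
#   import subprocess
#   subprocess.Popen(["./mozilla"], env=mozilla_env())
example_env = mozilla_env({"HOME": "/home/user"})
```

Setting the variable only for the browser process, rather than exporting it globally, keeps the workaround from affecting other programs.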
Blocks: 97462
Re-assign to nidheesh.
Assignee: joe.chou → nidheesh
Target Milestone: mozilla0.9.6 → mozilla0.9.8
Status: NEW → ASSIGNED
Keywords: mozilla1.0
Keywords: mozilla1.0+
Keywords: mozilla1.0
Nidheesh, any update on this?
*** Bug 97462 has been marked as a duplicate of this bug. ***
Could this be the same issue as bug #99026? If so, perhaps that patch fixes this too :-)
Attached patch patch to remove hack (deleted) — Splinter Review
This is fixed in my tree even with the hack removed. (I tested in my Linux debug build by linking the java plugin into the plugins directory, and starting with ./mozilla -ProfileManager. The build started, and java worked.) So here's a patch that removes the hack that is no longer necessary (and causes HTTP to start up earlier, etc., etc.).
hey david, can you tell if it was the patch for bug #99026 that fixed the problem? this patch 'kinda snuck' into the tree as part of darin's nsIChannel changes ;-) it would be nice if we knew *why* we no longer have this problem. -- rick
Yes, it was the patch for bug 99026. (It looks like only the first part of that patch was checked in. When I back out that one line change in my tree, I get the hang again.)
Comment on attachment 77992 [details] [diff] [review] patch to remove hack sr=darin (with pleasure!)
Attachment #77992 - Flags: superreview+
Yeah! finally :-)
Comment on attachment 77992 [details] [diff] [review] patch to remove hack I second that! r=mstoltz.
Attachment #77992 - Flags: review+
thank you david!!!!! it's GREAT to get rid of this wacky bug ;-) -- rick
I'm taking this bug with the intent of closing it once the above patch is checked in (and the hack removed), since the real bug in Mozilla is already fixed. There remains the issue of the leak of the event queue within the Java plugin itself, but I'm not sure how to report a bug to Sun on the Java plugin. Having it in bugzilla didn't seem to lead to a fix.
Assignee: nidheesh → dbaron
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.8 → mozilla1.0
Comment on attachment 77992 [details] [diff] [review] patch to remove hack a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77992 - Flags: approval+
Hack removed 2002-04-08 18:22 PDT. Marking fixed (although it was really bug 99026).
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified on linux redhat 7.1, jre 1.3.1(branch build: 2002-04-17-10-1.0.0). Visiting http://www.java.sun.com, no hang!
Keywords: verified1.0.0
verified
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard