Closed Bug 64332 Opened 24 years ago Closed 23 years ago

Webclient test, EMWindow, freezes after a period of use.

Categories

(Core Graveyard :: Java APIs to WebShell, defect)

x86
Windows NT
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: edburns, Assigned: edburns)

Attachments

(5 files)

Environment:
Mozilla: Netscape_20000922_BRANCH
OS: Winnt 4.0 SP 6
Webclient: JAVADEV_RTM_20001102
Launch webclient with .\runem. The app will work for a while, and then freeze.
I accept.
Status: NEW → ASSIGNED
I see the same behavior on Linux-SMP. The embedded application freezes after a few minutes, leaving the Java end running but seemingly waiting for an event from Mozilla. On the same OS but with a single processor, we've seen the webclient run for roughly 24-36 hours without a hitch.
My config:
OS: Red Hat 6.2, kernel 2.2.14-5.0smp
Box: Dell Precision, Dual PIII 866, 256 RAM
JDK: (1.2.2/1.2.2rc4)
I've tried both the Sun distro and the Blackdown distro, each with native and green threads, to no avail.
I'm homing in on the cause of this bug. After much trouble, I have a debuggable software stack, but I still can't debug into the HotSpot VM. No matter: the problem occurs in the classic VM as well, so I can trace it from there. It looks like it's deadlocking here:
USER32! 77e72c30()
Java_sun_awt_windows_WToolkit_eventLoop(JNIEnv_ * 0x00849510, _jobject * 0x0516f0ec) line 1453
invoke_V_V(Hjava_lang_Object * 0x015d2b98, methodblock * 0x04e55914, int 1, execenv * 0x00849510) line 71
invokeLazyNativeMethod(Hjava_lang_Object * 0x015d2b98, methodblock * 0x04e55914, int 1, execenv * 0x00849510) line 680 + 22 bytes
ExecuteJava_C(unsigned char * 0x0516fed8, execenv * 0x00849510) line 1559 + 22 bytes
do_execute_java_method_vararg(execenv * 0x00849510, void * 0x015d2e48, char * 0x0077d968, char * 0x00773bb8, methodblock * 0x00000000, int 0, char * 0x0516ff60, long * 0x00000000, int 0) line 561 + 14 bytes
execute_java_dynamic_method(execenv * 0x00849510, Hjava_lang_Object * 0x015d2e48, char * 0x1006bf50, char * 0x1006bf4c) line 277 + 33 bytes
ThreadRT0(Hjava_lang_Thread * 0x015d2e48) line 2084 + 23 bytes
saveStackBase(void * 0x10042340 ThreadRT0(Hjava_lang_Thread *)) line 139 + 10 bytes
_start(sys_thread * 0x00849590) line 293 + 13 bytes
_threadstartex(void * 0x00849470) line 212 + 13 bytes
KERNEL32! 77f04f3e()
More status to come.
Hi Bryan, Try this. Add the following command line options to the java interpreter: -classic -Djava.compiler=NONE This disables hotspot, and disables the Just In Time compiler. This seems to prevent freezes on my system. Please post here whether or not this works around the freeze.
Just tried the "-classic -Djava.compiler=NONE" options on a Linux-SMP and it still hangs after some use. But the SMP freeze may be a separate issue. I'll wait and see what Bryan Hunter observes on a single processor windows machine. Brian
I posted the previous attachment, and am typing this message using webclient with the attached patch. You can patch your runem.pl script using the patch in the attachment to install the workaround. I'm planning on checking this workaround in once I get approval. Ashu, the rest of this patch makes it so util_InitStringConstants() no longer takes a JNIEnv pointer. It now obtains one using JNU_GetEnv().
First attachment checked in. Geetha, this is a dup of another bug, which bug is it?
I tried running the test app for WebClient using the "-classic -Djava.compiler=NONE" options, but it still freezes up on me. I tried several times after clean reboots of my system, but no luck. I modified the runem.pl in src_share and verified on the screen that these options are being passed to the interpreter.
My comments above were for a WinNT 4.0 SP6 system running JDK1.3
I have confirmed that upgrading to jdk1.3.1 will fix this bug. Please wait for jdk1.3.1 beta to come out and we'll re-try it then.
In the meantime, please try the most recent workaround. If it works for you, I'll check it in.
I applied the patch (id=24600) and tested. It did appear to be more stable, but still froze up after some use. I think this patch would only affect the test application and not the API itself, is this correct? I am going to try JDK1.3.0_01.
I tried JDK 1.3.0_01 but did not see any improvement. I recently downloaded and installed JDK1.3.1 Beta for WinNT (after uninstalling all previous versions of the JDK and confirming that the relevant env vars were changed accordingly). The test app is more stable, but it still locks up on me. I start at www.google.com, then proceed to "Google Web Directory" and navigate through the various levels of categories. I can't find a fully reliable trigger, but this sequence locks it up most of the time: from the Google main page, select "Google Web Directory", "Business", "Financial Services", "Mortgages", "United States", "Pennsylvania". Usually it locks up while loading the "Pennsylvania" page.
Environment:
Mozilla: Netscape_20000922_BRANCH
OS: Winnt 4.0 SP 6
Webclient: JAVADEV_RTM_20001102
Rejoice. JDK1.3.1Beta plus the newly released WebClient 1.0 is working well for me now. I would recommend that everyone who needs WebClient use 1.0 with JDK1.3.1. As far as I'm concerned, this bug is fixed.
This is very good news. I'll update the release notes, bug 64334.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
My previous comments referred to Linux. On WinNT I'm still having trouble; this may be due to a configuration issue. Will work with Ed to identify and resolve.
This ain't fixed. Bryan, after the freeze occurs, can you please give keyboard focus to the console from which you started webclient and press Ctrl-Break? This should give you a thread state dump. Please post that to this bug. Thanks, Ed
Creating Event Queue
InitMozillaStuff(784670): Create the action queue
Init the baseWindow
Create the BaseWindow...
Creation Done.....
Show the webBrowser
in BrowserControlCanvas setBounds: x = 4 y = 59 w = 632 h = 410
native library does implement webclient.Navigation
in BrowserControlCanvas setBounds: x = 4 y = 59 w = 632 h = 410
native library does implement webclient.CurrentPage
native library does implement webclient.History
native library does implement webclient.Preferences
java.lang.Exception: nativeRegisterPrefChangedCallback: can't set callback
native library does implement webclient.EventRegistration
native library does implement webclient.Bookmarks
debug: edburns: got Bookmarks instance
+++++++++++++++++++++ Thread Id ---- 00785510
has multiple monitor apis is 0
+++++++++++++++++++++ Thread Id ---- 00785510
debug: edburns: Currently Viewing: http://www.google.com/
Button1
debug: edburns: Currently Viewing: http://directory.google.com/
Button1
debug: edburns: Currently Viewing: http://directory.google.com/Top/Arts/
Button1
debug: edburns: Currently Viewing: http://directory.google.com/Top/Arts/Animation/
Button1
debug: edburns: Currently Viewing: http://directory.google.com/Top/Arts/Animation/Anime/
Button1
debug: edburns: Currently Viewing: http://directory.google.com/Top/Arts/Animation/Anime/Fandom/
Full thread dump:
"Thread-1" prio=5 tid=0x9184910 nid=0x103 waiting on monitor [0..0x6fb30]
"Screen Updater" prio=5 tid=0x857d80 nid=0x127 waiting on monitor [0x916f000..0x916fdc0]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.ScreenUpdater.nextEntry(ScreenUpdater.java:76)
    at sun.awt.ScreenUpdater.run(ScreenUpdater.java:95)
"EventThread-7882352" prio=5 tid=0x785ed0 nid=0x112 runnable [0x8f6f000..0x8f6fdc0]
    at org.mozilla.webclient.wrapper_native.NativeEventThread.nativeProcessEvents(Native Method)
    at org.mozilla.webclient.wrapper_native.NativeEventThread.run(NativeEventThread.java:244)
"AWT-Windows" prio=7 tid=0x778470 nid=0x109 runnable [0x8f0f000..0x8f0fdc0]
    at sun.awt.windows.WToolkit.eventLoop(Native Method)
    at sun.awt.windows.WToolkit.run(WToolkit.java:188)
    at java.lang.Thread.run(Thread.java:484)
"SunToolkit.PostEventQueue-0" prio=7 tid=0x7773b0 nid=0x128 waiting on monitor [0x8ecf000..0x8ecfdc0]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.PostEventQueue.run(SunToolkit.java:491)
"AWT-EventQueue-0" prio=7 tid=0x777a90 nid=0x11f waiting on monitor [0x8e8f000..0x8e8fdc0]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.awt.EventQueue.getNextEvent(EventQueue.java:260)
    at java.awt.EventDispatchThread.pumpOneEvent(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:84)
"Signal Dispatcher" daemon prio=10 tid=0x768180 nid=0xe2 waiting on monitor [0..0]
"Finalizer" daemon prio=9 tid=0x766560 nid=0x14d waiting on monitor [0x8d8f000..0x8d8fdc0]
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:108)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:123)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:162)
"Reference Handler" daemon prio=10 tid=0x765280 nid=0x140 waiting on monitor [0x8d4f000..0x8d4fdc0]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:110)
"VM Thread" prio=5 tid=0x7644c0 nid=0x100 runnable
"VM Periodic Task Thread" prio=10 tid=0x7687d0 nid=0x14c waiting on monitor
Reopen
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I've compared Thread dumps from the same invocation in the pre and post freeze states, and they are exactly the same. I think this tells me that the freeze has to be in mozilla code.
I have determined that when the freeze occurs, the NativeEventThread does indeed stop.
Changing QA contact
QA Contact: geetha.vaidyanaathan → avm
I reproduced this bug with the latest nightly Mozilla build, on Linux 2.2 on a dual Intel Pentium processor machine. Webclient freezes after about 10 minutes of use.
Attached file: Tar.gz of files for stress test. (deleted)
Ed, I ran your stress test on Linux SMP and on NT with the nightly Mozilla and Webclient builds. The test failed on both platforms. On Linux it crashed with the following errors:
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
Enabling Quirk StyleSheet
/~mindeec/mindee.html
/adi/tr.ln/member;h=misc;sz=468x60;ord=104413188219354?
WEBSHELL- = 2
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
###!!! ASSERTION: couldn't lazily create the server : 'NS_SUCCEEDED(rv)', file nsMsgAccount.cpp, line 94
###!!! Break: at file nsMsgAccount.cpp, line 94
Enabling Quirk StyleSheet
/~danreed/
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
Opening file cookperm.txt failed
Enabling Quirk StyleSheet
Enabling Quirk StyleSheet
/alt.irc.undernet
Opening file cookperm.txt failed
###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(mTreeOwner->FindItemWithName(aName, static_cast< nsIDocShellTreeItem * >( this), _retval))) failed: '(!((mTreeOwner->FindItemWithName( aName, static_cast< nsIDocShellTreeItem * >( this), _retval)) & 0x80000000))', file nsDocShell.cpp, line 1143
###!!! Break: at file nsDocShell.cpp, line 1143
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
Enabling Quirk StyleSheet
/life/cyber/tech/ctg660.htm
+++++++++++++++++++++ Thread Id ---- 0x812dca8
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
Enabling Quirk StyleSheet
/
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../dist/include/nsCOMPtr.h, line 652
###!!! Break: at file ../../../../dist/include/nsCOMPtr.h, line 652
#
# An unexpected exception has been detected in native code outside the VM.
# Program counter=0x4ae1821e
#
# Problematic Thread: prio=1 tid=0x48ec6d68 nid=0x147e runnable
#
On the NT platform it hung after 5 minutes with the following output:
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
Enabling Quirk StyleSheet
debug: edburns: STATE_TRANSFERRING
/
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
Enabling Quirk StyleSheet
debug: edburns: STATE_TRANSFERRING
debug: edburns: STATE_TRANSFERRING
/
Opening file cookperm.txt failed
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
Enabling Quirk StyleSheet
debug: edburns: STATE_TRANSFERRING
/
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
debug: edburns: STATE_REDIRECTING
debug: edburns: STATE_TRANSFERRING
Enabling Quirk StyleSheet
debug: edburns: STATE_TRANSFERRING
/dnsmith/Archie/archie.html
debug: edburns: Currently Viewing: http://www.pichamber.com/
Opening file cookperm.txt failed
Enabling Quirk StyleSheet
debug: edburns: STATE_TRANSFERRING
/pimaine.html
Opening file cookperm.txt failed
debug: edburns: STATE_TRANSFERRING
+++++++++++++++++++++ Thread Id ---- 050357D0
debug: edburns: Currently Viewing: http://random.yahoo.com/bin/ryl
WARNING: Never finished decoding the JPEG., file c:\mozilla\mozilla\modules\libpr0n\decoders\jpeg\nsJPEGDecoder.cpp, line 177
debug: edburns: STATE_REDIRECTING
Sorry, on the NT platform Webclient hung after 5 minutes :)
Still occurs. This does not occur with WinEmbed, so it must have something to do with the interaction of Java and Mozilla.
I notice that before the app hangs both the java event queue, in AwtToolkit::MessageLoop(), and the mozilla event queue, in NativeEventThread_nativeProcessEvents() continually fire. After the hang, the mozilla event queue doesn't fire at all, and the java event queue only fires when the mouse is moved inside the java part of the app.
Status: REOPENED → ASSIGNED
Post freeze thread dump:
Full thread dump Classic VM (1.3.1-rc1-b21, native threads):
"Thread-1" (TID:0x15c74f0, sys_thread_t:0x86ff90, state:CW, native ID:0x137) prio=5
"Screen Updater" (TID:0x15c8f70, sys_thread_t:0x84ff70, state:CW, native ID:0x107) prio=5
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.ScreenUpdater.nextEntry(ScreenUpdater.java:76)
    at sun.awt.ScreenUpdater.run(ScreenUpdater.java:95)
"EventThread-85677680" (TID:0x15bcb40, sys_thread_t:0x83e8c0, state:R, native ID:0x125) prio=5
    at org.mozilla.webclient.wrapper_native.NativeEventThread.nativeProcessEvents(Native Method)
    at org.mozilla.webclient.wrapper_native.NativeEventThread.run(NativeEventThread.java:244)
"AWT-Windows" (TID:0x15c63c8, sys_thread_t:0x7f00c0, state:R, native ID:0x13f) prio=6
    at sun.awt.windows.WToolkit.eventLoop(Native Method)
    at sun.awt.windows.WToolkit.run(WToolkit.java:188)
    at java.lang.Thread.run(Thread.java:484)
"SunToolkit.PostEventQueue-0" (TID:0x15c62f0, sys_thread_t:0x7f08e0, state:CW, native ID:0x138) prio=6
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.PostEventQueue.run(SunToolkit.java:491)
"AWT-EventQueue-0" (TID:0x15c62c8, sys_thread_t:0x7ef1d0, state:CW, native ID:0x14d) prio=6
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.awt.EventQueue.getNextEvent(EventQueue.java:260)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:106)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:98)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:85)
"Finalizer" (TID:0x15a9528, sys_thread_t:0x78b9a0, state:CW, native ID:0x14c) prio=8
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:108)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:123)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:162)
"Reference Handler" (TID:0x15a9300, sys_thread_t:0x789130, state:CW, native ID:0x13b) prio=10
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:110)
"Signal dispatcher" (TID:0x15a9330, sys_thread_t:0x7894e0, state:R, native ID:0x144) prio=5
Monitor Cache Dump:
    sun.awt.ScreenUpdater@15C8F70/1637190: <unowned>
        Waiting to be notified: "Screen Updater" (0x84ff70)
    java.lang.ref.ReferenceQueue$Lock@15A9540/15DF358: <unowned>
        Waiting to be notified: "Finalizer" (0x78b9a0)
    java.lang.ref.Reference$Lock@15A9310/15DEF30: <unowned>
        Waiting to be notified: "Reference Handler" (0x789130)
    sun.awt.PostEventQueue@15C62F0/15E1A00: <unowned>
        Waiting to be notified: "SunToolkit.PostEventQueue-0" (0x7f08e0)
    org.mozilla.webclient.wrapper_native.NativeEventThread@15BCB40/1648938: owner "EventThread-85677680" (0x83e8c0) 1 entry
    java.awt.EventQueue@15C6418/1630538: <unowned>
        Waiting to be notified: "AWT-EventQueue-0" (0x7ef1d0)
Registered Monitor Dump:
    utf8 hash table: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class linking lock: <unowned>
    System class loader lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Monitor cache lock: owner "Signal dispatcher" (0x7894e0) 1 entry
    Thread queue lock: owner "Signal dispatcher" (0x7894e0) 1 entry
        Waiting to be notified: "Thread-1" (0x86ff90)
    Monitor registry: owner "Signal dispatcher" (0x7894e0) 1 entry
Pre freeze thread dump:
Full thread dump Classic VM (1.3.1-rc1-b21, native threads):
"Thread-1" (TID:0x15c66e8, sys_thread_t:0x86e210, state:CW, native ID:0x1ec) prio=5
"Screen Updater" (TID:0x15c8f70, sys_thread_t:0x84ff70, state:CW, native ID:0x13d) prio=5
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.ScreenUpdater.nextEntry(ScreenUpdater.java:76)
    at sun.awt.ScreenUpdater.run(ScreenUpdater.java:95)
"EventThread-85677680" (TID:0x15bcb40, sys_thread_t:0x83e8c0, state:R, native ID:0x138) prio=5
    at org.mozilla.webclient.wrapper_native.NativeEventThread.nativeProcessEvents(Native Method)
    at org.mozilla.webclient.wrapper_native.NativeEventThread.run(NativeEventThread.java:244)
"AWT-Windows" (TID:0x15c63c8, sys_thread_t:0x7f00c0, state:R, native ID:0xf7) prio=6
    at sun.awt.windows.WToolkit.eventLoop(Native Method)
    at sun.awt.windows.WToolkit.run(WToolkit.java:188)
    at java.lang.Thread.run(Thread.java:484)
"SunToolkit.PostEventQueue-0" (TID:0x15c62f0, sys_thread_t:0x7f08e0, state:CW, native ID:0x13b) prio=6
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at sun.awt.PostEventQueue.run(SunToolkit.java:491)
"AWT-EventQueue-0" (TID:0x15c62c8, sys_thread_t:0x7ef1d0, state:CW, native ID:0x144) prio=6
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.awt.EventQueue.getNextEvent(EventQueue.java:260)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:106)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:98)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:85)
"Finalizer" (TID:0x15a9528, sys_thread_t:0x78b9a0, state:CW, native ID:0x147) prio=8
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:108)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:123)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:162)
"Reference Handler" (TID:0x15a9300, sys_thread_t:0x789130, state:CW, native ID:0x10c) prio=10
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:110)
"Signal dispatcher" (TID:0x15a9330, sys_thread_t:0x7894e0, state:R, native ID:0x1e3) prio=5
Monitor Cache Dump:
    sun.awt.ScreenUpdater@15C8F70/1637950: <unowned>
        Waiting to be notified: "Screen Updater" (0x84ff70)
    java.lang.ref.ReferenceQueue$Lock@15A9540/15DF3A8: <unowned>
        Waiting to be notified: "Finalizer" (0x78b9a0)
    java.lang.ref.Reference$Lock@15A9310/15DEF80: <unowned>
        Waiting to be notified: "Reference Handler" (0x789130)
    sun.awt.PostEventQueue@15C62F0/15E1A50: <unowned>
        Waiting to be notified: "SunToolkit.PostEventQueue-0" (0x7f08e0)
    org.mozilla.webclient.wrapper_native.NativeEventThread@15BCB40/16499A8: owner "EventThread-85677680" (0x83e8c0) 1 entry
    java.awt.EventQueue@15C6418/1630B18: <unowned>
        Waiting to be notified: "AWT-EventQueue-0" (0x7ef1d0)
Registered Monitor Dump:
    utf8 hash table: <unowned>
    JNI pinning lock: <unowned>
    JNI global reference lock: <unowned>
    BinClass lock: <unowned>
    Class linking lock: <unowned>
    System class loader lock: <unowned>
    Code rewrite lock: <unowned>
    Heap lock: <unowned>
    Monitor cache lock: owner "Signal dispatcher" (0x7894e0) 1 entry
    Thread queue lock: owner "Signal dispatcher" (0x7894e0) 1 entry
        Waiting to be notified: "Thread-1" (0x86e210)
    Monitor registry: owner "Signal dispatcher" (0x7894e0) 1 entry
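The pre and post freeze dumps can also be compared mechanically rather than by eye. What follows is a minimal Python sketch, not the tooling used in this investigation: the helper names `thread_states` and `diff_states` are made up for illustration, and the regular expression assumes the Classic VM thread header format seen in these dumps (`"name" (TID:..., sys_thread_t:..., state:..., native ID:...)`). It only compares the per-thread `state:` field, so it cannot say anything about what a thread in state R is doing inside native code.

```python
import re

# Assumed header format, taken from the Classic VM dumps in this bug:
#   "Screen Updater" (TID:0x15c8f70, sys_thread_t:0x84ff70, state:CW, native ID:0x107) prio=5
THREAD_RE = re.compile(
    r'"([^"]+)"\s+\(TID:0x[0-9a-fA-F]+,\s*sys_thread_t:0x[0-9a-fA-F]+,\s*state:(\w+),'
)


def thread_states(dump_text):
    """Map each thread name found in a dump to its reported state."""
    return {name: state for name, state in THREAD_RE.findall(dump_text)}


def diff_states(pre_dump, post_dump):
    """Return {thread name: (state before, state after)} for threads whose
    state differs, including threads present in only one of the dumps."""
    before = thread_states(pre_dump)
    after = thread_states(post_dump)
    changes = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changes[name] = (before.get(name), after.get(name))
    return changes


if __name__ == "__main__":
    # Two header lines excerpted from each of the dumps in this bug.
    pre = '''
    "Thread-1" (TID:0x15c66e8, sys_thread_t:0x86e210, state:CW, native ID:0x1ec) prio=5
    "EventThread-85677680" (TID:0x15bcb40, sys_thread_t:0x83e8c0, state:R, native ID:0x138) prio=5
    '''
    post = '''
    "Thread-1" (TID:0x15c74f0, sys_thread_t:0x86ff90, state:CW, native ID:0x137) prio=5
    "EventThread-85677680" (TID:0x15bcb40, sys_thread_t:0x83e8c0, state:R, native ID:0x125) prio=5
    '''
    print(diff_states(pre, post))
```

For the excerpt above the result is an empty dict: the thread states match before and after the freeze, even though the native event thread has stopped making progress.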
I think I have a testcase that will reliably bring on the freeze:
1. Go to http://www.ebay.com/ in webclient.
2. Type something in the search text field and press Enter.
After writing some debugging code, I found the last MSG.message processed by the mozilla event queue is c138. What is this message value? It's always the last one before the freeze.
Solicited help from newsgroup:
From: Ed Burns <ed.burnsREMOVE_THIS@sun.com>
Newsgroups: netscape.public.mozilla.embedding,netscape.public.mozilla.general
Subject: Judson Valeski (or any other embedding guru): Please Help
Date: 13 Jul 2001 13:17:46 -0700
Message-ID: <7cb4rsgeg7p.fsf@sun.com>
I modified prmon.c to print out a message on monitor enter and exit like this:
    fprintf(msgFile, "Enter Monitor: %p\n", mon);
    fflush(msgFile);
I analyzed the output and found that the monitor with pointer value 0x056BB390 was "Entered" 5153 times and "Exited" 5150 times before the crash. Also monitor 0x063FA9C0 "Entered" 11 times, "Exited" 9 times. Could this be causing deadlock? Here's the full output of my test data. The number after the pointer is the number of times that monitor was entered or exited.
Enter Monitor: 051D2170 37      Exit Monitor: 051D2170 37
Enter Monitor: 051D4420 6       Exit Monitor: 051D4420 6
Enter Monitor: 051D4CF0 117     Exit Monitor: 051D4CF0 117
Enter Monitor: 051D6880 4327    Exit Monitor: 051D6880 4327
Enter Monitor: 052A66C0 14      Exit Monitor: 052A66C0 14
Enter Monitor: 056566B0 3182    Exit Monitor: 056566B0 3182
Enter Monitor: 056BB390 5153    Exit Monitor: 056BB390 5150
Enter Monitor: 056BED70 9       Exit Monitor: 056BED70 9
Enter Monitor: 056D36A0 55      Exit Monitor: 056D36A0 55
Enter Monitor: 06382750 178     Exit Monitor: 06382750 178
Enter Monitor: 063B8D10 2       Exit Monitor: 063B8D10 2
Enter Monitor: 063C1030 7       Exit Monitor: 063C1030 7
Enter Monitor: 063C1420 5       Exit Monitor: 063C1420 5
Enter Monitor: 063C2D50 4       Exit Monitor: 063C2D50 4
Enter Monitor: 063F0E90 16      Exit Monitor: 063F0E90 16
Enter Monitor: 063F1A60 21      Exit Monitor: 063F1A60 21
Enter Monitor: 063FA9C0 11      Exit Monitor: 063FA9C0 9
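For anyone repeating this experiment, the tallying step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the script actually used here: it assumes the raw log has one `Enter Monitor: <pointer>` or `Exit Monitor: <pointer>` line per call (only the "Enter" format string is shown above, so the "Exit" line format is a guess), and the function name `unbalanced_monitors` is made up for illustration.

```python
import re
from collections import Counter

# Assumed raw log line format: "Enter Monitor: 056BB390" / "Exit Monitor: 056BB390".
# The pointer may or may not carry a 0x prefix depending on the C runtime's %p.
LINE_RE = re.compile(r'^(Enter|Exit) Monitor: (?:0x)?([0-9A-Fa-f]+)\s*$')


def unbalanced_monitors(lines):
    """Count enters and exits per monitor pointer and return only the
    monitors where the two counts differ, as {pointer: (enters, exits)}."""
    enters = Counter()
    exits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore unrelated console output
        kind, pointer = match.group(1), match.group(2).upper()
        if kind == "Enter":
            enters[pointer] += 1
        else:
            exits[pointer] += 1
    return {
        pointer: (enters[pointer], exits[pointer])
        for pointer in sorted(set(enters) | set(exits))
        if enters[pointer] != exits[pointer]
    }


if __name__ == "__main__":
    sample = [
        "Enter Monitor: 051D2170",
        "Exit Monitor: 051D2170",
        "Enter Monitor: 056BB390",
        "Enter Monitor: 056BB390",
        "Exit Monitor: 056BB390",
        "debug: edburns: STATE_TRANSFERRING",
    ]
    print(unbalanced_monitors(sample))
```

A surplus of enters at the moment of the freeze is only a hint. Depending on whether the message is printed before or after the blocking call inside the enter routine, the surplus may count threads that are still waiting for the monitor as well as threads that actually hold it.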
Here's a google link to the news articles posted about this bug. http://groups.google.com/groups?as_umsgid=7cb4rsgeg7p.fsf@sun.com
Ed Burns <ed.burnsREMOVE_THIS@sun.com> writes:
> Ed Burns <ed.burnsREMOVE_THIS@sun.com> writes:
>
> > Grr. Lamentably, incorporating these calls in my event loop in the same
> > manner as used in winEmbed did not fix the problem.
> >
> > This time the last event processed by the msg queue is 0xC16F. Any ideas?
>
> I modified prmon.c to print out a message on monitor enter and exit like
> this:
>
> fprintf(msgFile, "Enter Monitor: %p\n", mon);
> fflush(msgFile);
I have refined the test data some more, writing a perl program to take the monitor exit and enter output data and print out only the cases where numEnters != numExits. I ran the program in the debugger until it froze and collected the data, along with the stack traces on the threads mentioned in the data. I did this twice.
Hypothesis: I believe that deadlock is occurring and I have a hunch that there are some valuable clues in the data below. Can someone please look at the stack traces and see if they can spot the deadlock?
Is this the right forum for this kind of information? I'm really stuck here. My MO is to collect enough information for someone who is an expert to gain insight.
CASE 1
------
Enter Monitor: 056BBD70: 229949.
Exit Monitor: 056BBD70: 229945.
Enter Monitor: 08AFFA80: 14.
Exit Monitor: 08AFFA80: 12.
Thread A
========
PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
util_PostEvent(WebShellInitContext * 0x051d53d0, PLEvent * 0x08ae7404) line 49 + 21 bytes
Java_org_mozilla_webclient_wrapper_1native_NavigationImpl_nativeStop(JNIEnv_ * 0x0086faa0, _jobject * 0x070dfebc, long 85808080) line 295 + 13 bytes
Thread B
========
PR_EnterMonitor(PRMonitor * 0x08affa80) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x08affa80) line 184 + 13 bytes
nsSocketTransport::Dispatch(nsSocketRequest * 0x08aff790) line 1288
nsSocketRequest::Cancel(nsSocketRequest * const 0x08aff790, unsigned int 2152398850) line 2527
nsHttpConnection::OnTransactionComplete(unsigned int 2152398850) line 247
nsHttpTransaction::Cancel(nsHttpTransaction * const 0x08afadd0, unsigned int 2152398850) line 598
nsHttpChannel::Cancel(nsHttpChannel * const 0x08afccb0, unsigned int 2152398850) line 1563
nsLoadGroup::Cancel(nsLoadGroup * const 0x063765b0, unsigned int 2152398850) line 239 + 16 bytes
nsDocLoaderImpl::Stop(nsDocLoaderImpl * const 0x06376620) line 278 + 31 bytes
nsURILoader::Stop(nsURILoader * const 0x06376e40, nsISupports * 0x06376638) line 536 + 23 bytes
nsDocShell::Stop(nsDocShell * const 0x06376e90) line 2211
wsStopEvent::handleEvent() line 355 + 18 bytes
handleEvent(PLEvent * 0x08aff844) line 48 + 11 bytes
PL_HandleEvent(PLEvent * 0x08aff844) line 590 + 10 bytes
processEventLoop(WebShellInitContext * 0x051d53d0) line 439 + 9 bytes
Java_org_mozilla_webclient_wrapper_1native_NativeEventThread_nativeProcessEvents(JNIEnv_ * 0x0083e840, _jobject * 0x0544febc, long 85808080) line 242 + 9 bytes
Thread C
========
PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x051d5eac) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x051d5eac) line 251 + 16 bytes
nsMemoryImpl::FlushMemory(const unsigned short * 0x05141db4, int 0) line 432 + 30 bytes
MemoryFlusher::Run(MemoryFlusher * const 0x051d6d60) line 177 + 43 bytes
nsThread::Main(void * 0x051d6bb0) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x051d6990) line 399 + 13 bytes
_threadstartex(void * 0x051d67e0) line 212 + 13 bytes
Thread D
========
PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x08aff324) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x08aff324) line 251 + 16 bytes
nsRequestObserverProxy::FireEvent(nsARequestObserverEvent * 0x08aff320) line 244 + 35 bytes
nsRequestObserverProxy::OnStartRequest(nsRequestObserverProxy * const 0x08afb630, nsIRequest * 0x08afadd0, nsISupports * 0x00000000) line 185 + 12 bytes
nsStreamListenerProxy::OnStartRequest(nsStreamListenerProxy * const 0x08afcee0, nsIRequest * 0x08afadd0, nsISupports * 0x00000000) line 224
nsHttpTransaction::HandleContent(char * 0x07a90b88, unsigned int 0, unsigned int * 0x05cbfdc8) line 466 + 41 bytes
nsHttpTransaction::Read(nsHttpTransaction * const 0x08afadd4, char * 0x07a90b88, unsigned int 0, unsigned int * 0x05cbfdc8) line 709 + 23 bytes
nsReadFromInputStream(nsIOutputStream * 0x08afb5c4, void * 0x08afadd4, char * 0x07a90b88, unsigned int 0, unsigned int 4096, unsigned int * 0x05cbfdc8) line 831
nsPipe::nsPipeOutputStream::WriteSegments(nsPipe::nsPipeOutputStream * const 0x08afb5c4, unsigned int (nsIOutputStream *, void *, char *, unsigned int, unsigned int, unsigned int *)* 0x050b5530 nsReadFromInputStream(nsIOutputStream *, void *, char *, unsigned int, unsigned int, unsigned int *), void * 0x08afadd4, unsigned int 16384, unsigned int * 0x05cbfe5c) line 704 + 29 bytes
nsPipe::nsPipeOutputStream::WriteFrom(nsPipe::nsPipeOutputStream * const 0x08afb5c4, nsIInputStream * 0x08afadd4, unsigned int 16384, unsigned int * 0x05cbfe5c) line 839
nsStreamListenerProxy::OnDataAvailable(nsStreamListenerProxy * const 0x08afcee0, nsIRequest * 0x08afadd0, nsISupports * 0x00000000, nsIInputStream * 0x08afadd4, unsigned int 0, unsigned int 16384) line 283 + 38 bytes
nsHttpTransaction::OnDataReadable(nsIInputStream * 0x08afc3b0) line 214 + 72 bytes
nsHttpConnection::OnDataAvailable(nsHttpConnection * const 0x08afaa10, nsIRequest * 0x08aff790, nsISupports * 0x00000000, nsIInputStream * 0x08afc3b0, unsigned int 0, unsigned int 8192) line 631 + 15 bytes
nsSocketReadRequest::OnRead() line 2670 + 57 bytes
nsSocketTransport::doReadWrite(short 1) line 991 + 14 bytes
nsSocketTransport::Process(short 1) line 477 + 13 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x056b9fb4) line 419 + 13 bytes
nsThread::Main(void * 0x056bd950) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x056bd730) line 399 + 13 bytes
_threadstartex(void * 0x056bd580) line 212 + 13 bytes
---------------------------------------------------------------------------
CASE 2
------
Enter Monitor: 056BBD70: 5828.
Exit Monitor: 056BBD70: 5825.
Enter Monitor: 063EA6B0: 6.
Exit Monitor: 063EA6B0: 3.
Thread A
========
PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
util_PostEvent(WebShellInitContext * 0x051d53d0, PLEvent * 0x063ebe64) line 49 + 21 bytes
Java_org_mozilla_webclient_wrapper_1native_NavigationImpl_nativeStop(JNIEnv_ * 0x0086f4c0, _jobject * 0x0705fe98, long 85808080) line 295 + 13 bytes
Thread B
========
PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x063ea180) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x063ea180) line 251 + 16 bytes
nsProxyObject::Post(unsigned int 4, nsXPTMethodInfo * 0x06dc2d6c, nsXPTCMiniVariant * 0x05cbfd4c, nsIInterfaceInfo * 0x05296c10) line 470
nsProxyEventObject::CallMethod(nsProxyEventObject * const 0x063ea940, unsigned short 4, const nsXPTMethodInfo * 0x06dc2d6c, nsXPTCMiniVariant * 0x05cbfd4c) line 463 + 52 bytes
PrepareAndDispatch(nsXPTCStubBase * 0x063ea940, unsigned int 4, unsigned int * 0x05cbfdfc, unsigned int * 0x05cbfdec) line 100 + 31 bytes
SharedStub() line 124
nsHttpConnection::OnStatus(nsHttpConnection * const 0x063eaf48, nsIRequest * 0x063ea470, nsISupports * 0x063eaf40, unsigned int 2152398851, const unsigned short * 0x05cbfe50) line 666
nsSocketTransport::OnStatus(nsSocketRequest * 0x063ea470, nsISupports * 0x063eaf40, unsigned int 2152398851) line 1772 + 63 bytes
nsSocketTransport::OnStatus(unsigned int 2152398851) line 1787
nsSocketTransport::Process(short 0) line 462
nsSocketTransportService::ProcessWorkQ() line 243 + 10 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x056b9fb4) line 446 + 11 bytes
nsThread::Main(void * 0x056bd950) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x056bd730) line 399 + 13 bytes
_threadstartex(void * 0x056bd580) line 212 + 13 bytes
Thread C
========
PR_EnterMonitor(PRMonitor * 0x063ea6b0) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x063ea6b0) line 184 + 13 bytes
nsSocketTransport::OnFound(nsSocketTransport * const 0x063ea804, nsISupports * 0x00000000, const char * 0x063ea350, nsHostEnt * 0x06db8e74) line 1337
nsDNSRequest::FireStop(unsigned int 0) line 271 + 62 bytes
nsDNSLookup::CompleteLookup(unsigned int 0) line 702 + 18 bytes
nsDNSService::ProcessLookup(HWND__ * 0x005502d8, unsigned int 1024, unsigned int 1, long 64) line 849 + 22 bytes
nsDNSEventProc(HWND__ * 0x005502d8, unsigned int 1024, unsigned int 1, long 64) line 869 + 27 bytes
Thread D
========
PR_EnterMonitor(PRMonitor * 0x063ea6b0) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x063ea6b0) line 184 + 13 bytes
nsSocketTransport::AsyncRead(nsSocketTransport * const 0x063ea800, nsIStreamListener * 0x063eaf40, nsISupports * 0x00000000, unsigned int 0, unsigned int 4294967295, unsigned int 3, nsIRequest * * 0x063eaf60) line 1420
nsHttpConnection::ActivateConnection() line 382 + 65 bytes
nsHttpConnection::SetTransaction(nsHttpTransaction * 0x063e9300) line 154 + 8 bytes
nsHttpHandler::InitiateTransaction(nsHttpTransaction * 0x063e9300, nsHttpConnectionInfo * 0x063e7110, int 0) line 387 + 12 bytes
nsHttpChannel::Connect(int 1) line 242
nsHttpChannel::AsyncOpen(nsHttpChannel * const 0x063e71c0, nsIStreamListener * 0x063e8d80, nsISupports * 0x00000000) line 1802 + 10 bytes
nsDocumentOpenInfo::Open(nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80) line 184 + 18 bytes
nsURILoader::OpenURIVia(nsURILoader * const 0x06376e40, nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80, unsigned int 0) line 521 + 20 bytes
nsURILoader::OpenURI(nsURILoader * const 0x06376e40, nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80) line 483
nsDocShell::DoChannelLoad(nsIChannel * 0x063e71c0, int 0, nsIURILoader * 0x06376e40) line 4667 + 24 bytes
nsDocShell::DoURILoad(nsIURI * 0x063e5ba0, nsIURI * 0x00000000, nsISupports * 0x00000000, int 0, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000) line 4456 + 36 bytes
nsDocShell::InternalLoad(nsDocShell * const 0x06376e80, nsIURI * 0x063e5ba0, nsIURI * 0x00000000, nsISupports * 0x00000000, int 1, int 0, const unsigned short * 0x0544fab4, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, unsigned int 1, nsISHEntry * 0x00000000) line 4275 + 43 bytes
nsDocShell::LoadURI(nsDocShell * const 0x06376e80, nsIURI * 0x063e5ba0, nsIDocShellLoadInfo * 0x00000000, unsigned int 0) line 559 + 72 bytes
nsDocShell::LoadURI(nsDocShell * const 0x06376e90, const unsigned short * 0x063e1d60, unsigned int 0) line 2161 + 31 bytes
wsLoadURLEvent::handleEvent() line 70 + 33 bytes
handleEvent(PLEvent * 0x063e1f84) line 48 + 11 bytes
PL_HandleEvent(PLEvent * 0x063e1f84) line 590 + 10 bytes
processEventLoop(WebShellInitContext * 0x051d53d0) line 439 + 9 bytes
Java_org_mozilla_webclient_wrapper_1native_NativeEventThread_nativeProcessEvents(JNIEnv_ * 0x0083e840, _jobject * 0x0544febc, long 85808080) line 242 + 9 bytes
For the record, here is a little audit trail of email:

On 16 July 17:52:38, Judson Valeski wrote:
> There's definitely a lock/monitor imbalance here. I'm at a complete
> loss as to what the cause could be though (esp. now that you're "doing
> embedding idle stuff").

On 17 July 09:47:53, Judson Valeski wrote:
> Rick just landed (on the trunk) a re-write of the proxy object code to
> prevent some crashing. I haven't looked at the code, but it's quite
> possible that monitors/locks were re-worked in the process and may have
> a positive impact here.
>
> The socket transport thread shows up here too (no surprise).... could be
> some bad interaction there.

On 17 July 10:05:01, Judson Valeski wrote:
> You're clearly banging on this hard :-/. I'm going to be out there next
> week for a few days, but we're in the middle of shipping a few products
> (NS6, and a couple embedding products). On top of that I'll be giving a
> presentation in San Diego on embedding on Wed of next week. In short,
> I'm totally slammed for the foreseeable future.

On 17 July 14:13:05, Rick Potts wrote:
> hey ed,
>
> I looked at your thread stack-traces and here's my "wild ass guess" as
> to what's happening :-)
>
> It appears that the basic deadlock is between the UI thread and the
> socket transport thread in a classic A-B-B-A deadlock. The two monitors
> appear to be the PLEventQ monitor and the socket transport monitor.
>
> In order for this to happen, a couple of things must be true:
>
> 1. Your version of nsSocketTransport.cpp < rev 2.206. Since in rev
>    2.206 darin added a patch to release the sockettransport monitor
>    *before* calling OnRead(...)
> 2. The function processEventLoop(...) must be holding on to the
>    PLEventQ monitor when PL_HandleEvent(...) is called.
>
> It appears that the second case exposes another potential deadlock in
> the nsSocketTransport. Because the socketTransport lock is *not*
> released before OnStatus(...) is called. I think that it probably
> should be...
>
> ed, do these ramblings make any sense to you?

On 18 July 15:46:39, Rick Potts wrote:
> hey ed,
>
> so it looks like you are still deadlocking because your function
> processEventLoop(...) is holding onto the eventQ monitor when
> PLEvent->HandleEvent(...) is called.
>
> I'm assuming that processEventLoop(...) is your function :-) It looks
> like it should be very similar to PL_ProcessPendingEvents(...) except
> that there, the monitor is released before calling HandleEvent(...)
>
> We definitely need to fix the corresponding problem on our side - that
> we are calling out of the sockettransport while we are holding onto the
> socket transport lock...
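For readers following along, here is a minimal sketch of the pattern Rick describes: take the queue monitor only long enough to dequeue an event, and release it before the handler runs. This is not the webclient or NSPR code. It uses std::mutex in place of a PRMonitor, and the names EventQueue, post and processPendingEvents are hypothetical stand-ins for the PLEventQueue, PL_PostEvent and processEventLoop(...) roles.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for an event queue guarded by a monitor.
struct EventQueue {
    std::mutex monitor;                        // plays the role of the eventQ monitor
    std::deque<std::function<void()>> events;  // pending events, oldest first

    // Append an event; the monitor is held only for the push.
    void post(std::function<void()> ev) {
        std::lock_guard<std::mutex> guard(monitor);
        events.push_back(std::move(ev));
    }
};

// Drain the queue. The monitor is held only while dequeuing and is
// released before each handler is invoked, so a handler that needs
// another lock (or posts back to this queue) cannot form an A-B-B-A
// cycle through the queue monitor.
int processPendingEvents(EventQueue& q) {
    int handled = 0;
    for (;;) {
        std::function<void()> ev;
        {
            std::lock_guard<std::mutex> guard(q.monitor);
            if (q.events.empty()) {
                break;                         // guard releases the monitor on exit
            }
            ev = std::move(q.events.front());
            q.events.pop_front();
        }                                      // monitor released here
        ev();                                  // handler runs without the monitor held
        ++handled;
    }
    return handled;
}
```

The deadlocking variant would call ev() inside the guarded block. The handler could then block on a second lock (the socket transport monitor in the traces above) while another thread holding that lock blocks in post().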
Rick suggested a modification to NativeEventThread.cpp::processEventLoop(), which I made to webclient. I also updated to today's trunk on win32. After doing this, my WCRandom app, which reloads <http://random.yahoo.com/bin/ryl> every ten seconds, ran for half an hour before crashing due to a memory problem. It didn't hang. I'm posting the log data to this bug. Next I'll try Rick's modification in webclient against the 0.9.1 build, which is the build that both Netscape 6.1 beta for Solaris and webclient will ship with.
Tried Rick's fix with 0.9.1 and it prevents the freeze. This time it ran for 45 minutes then crashed in some JavaScript code, not related to webclient. Marking FIXED. Woohoo!
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
*** Bug 96826 has been marked as a duplicate of this bug. ***
Verified with Webclient under Linux and Mozilla 0.9.3. I loaded about 100 URLs over 1 hour and webclient did not freeze.
Marking VERIFIED according to Vladimir's comment.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard