Closed Bug 4779 Opened 26 years ago Closed 26 years ago

[REGRESSION] Crash in file download dialog

Categories

(SeaMonkey :: UI Design, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mcmullen, Assigned: law)

References

()

Details

(Whiteboard: Fix checked in.)

In Apprunner, type a URL of a small file on an http server that is not an html file (such as a .exe file, as in the URL in this bug report). The download dialog appears. Type a path and click OK. It may work the first time. If so, try it again (and again?) Eventually, you will first assert as status messages from Netlib are sent and forwarded to an unprepared and unsuspecting XULDocumentImpl thingie with uninitialized or null members in it. after about 10 asserts, you will crash on a callback from rdf.
Component: Apprunner → XPApps
Priority: P3 → P1
Target Milestone: M4
Set target milestone to M4, changed component to XPApps, and priority to P1. Bill, this happens on Linux too according to Steve, right?
Status: NEW → ASSIGNED
Yes; seems to be a timing thing with the loading of the progress dialog's .xul file competing with the download of the data. I think there's an easy workaround and I'll talk to waterson about it.
QA Contact: 3853 → 4078
I coded up the work around. It works well on win32. Mediocre on Linux (but no crashes with limited testing). Waiting to here about Mac. Check tomorrow.
Sorry to report this, but it doesn't crash now, but it doesn't work, either.
... the file gets downloaded, but the progress meter stays at "unknown" as if the download has not started yet.
on linux 1999-04-09-08, i get an accurate progress meter, which went from 0% -> 55% -> 100% and the download completed. it looks pretty nice. but the dialog is never dismissed; here is standard err: =-=-=-=-=-=-= JavaScript error: dialog.cancel has no properties URL: file:////u/phillip/seamonkey/linux/package/res/samples/downloadProgress.xul LineNo: 65 JavaScript error: dialog.cancel has no properties URL: file:////u/phillip/seamonkey/linux/package/res/samples/downloadProgress.xul LineNo: 131 =-=-=-=-=-=-=
First, the stuck "unknown" state is a known problem that can occur with small downloads (the download completes before the dialog document is to a state that it can receive notification). I'm working on resolving that problem. It is likely we'll have to live with this for M4. BTW, this is symptomatic of the same problem that used to result in the crash (so we're making progress). Second, the dialog is not dismissed because at the time I coded it, it couldn't be (without crashing). Reportedly, it can be made to automatically dismiss when done. I'll look into it and if it works, I'll apply for getting that fix in M4. There is one other problem that relates to the mysterious JS errors. The problem appears to be that the new expat XML parser doesn't like the html that defines the Cancel button. So the cancel button doesn't appear and the code that disables it when the download is complete fails. I have to talk to Nisheeth about fixing that html (or the parser; the code looks OK to me).
Last time I looked, Close() did not crash anywhere, but it only worked properly on Macintosh. Can't hurt to call it though, I think.
Target Milestone: M4 → M5
Checked in a fix that reduces the severity of this problem and fixes the XML problem that caused the cancel button to not appear. Moving to M5.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Bill says he's checked in a fix for this last night.
Status: RESOLVED → REOPENED
OS: Mac System 8.5 → All
This is crashing on windows95 and linux, 6/14 builds. On Mac, nothing really happens, no crash, but you don't even get to the Save As.. dialogue. To reproduce, put http://law/test.exe in the location bar and hit return. Then choose a filename and hit Open. Seems to crash about half the time. I filed a talkback report but can't get to the server right now... Don't know if it's the same problem, as this was marked fixed 6 weeks ago. Re-opening bug and clearing milestone info, as M5 is long gone.
Resolution: FIXED → ---
Clearing resolution.
Target Milestone: M5 → M7
Moved to M7. Bill, is this the same bug or a newer more exciting problem?
Here's is the windows95 stack trace and link to talkback report. http://cyclone.mcom.com/reports/incidenttemplate.CFM?reportID=1076&style=0&tc=1& cp=1&ck1=SNub+trigger+event+time&cd1=1999%2F06%2F15&bbid=10006256 Call Stack: (Signature = js_Invoke 7bb35294) js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 703] js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2207] js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 676] js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2207] js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 676] js_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 741] JS_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 2556] nsJSEventListener::HandleEvent [d:\builds\seamonkey\mozilla\dom\src\events\nsJSEventListener.cpp, line 98] nsEventListenerManager::HandleEvent [d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 886] RDFElementImpl::HandleDOMEvent [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFElement.cpp, line 2289] RDFElementImpl::ExecuteJSCode [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFElement.cpp, line 2591] RDFElementImpl::ExecuteOnChangeHandler [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFElement.cpp, line 2517] RDFElementImpl::SetAttribute [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFElement.cpp, line 1900] RDFXULBuilderImpl::AddAttribute [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFXULBuilder.cpp, line 2414] RDFXULBuilderImpl::OnAssert [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFXULBuilder.cpp, line 982] CompositeDataSourceImpl::OnAssert [d:\builds\seamonkey\mozilla\rdf\base\src\nsCompositeDataSource.cpp, line 1171] InMemoryDataSource::Assert [d:\builds\seamonkey\mozilla\rdf\base\src\nsInMemoryDataSource.cpp, line 1136] CompositeDataSourceImpl::Assert [d:\builds\seamonkey\mozilla\rdf\base\src\nsCompositeDataSource.cpp, line 797] RDFXULBuilderImpl::OnSetAttribute [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFXULBuilder.cpp, line 1546] XULDocumentImpl::OnSetAttribute [d:\builds\seamonkey\mozilla\rdf\content\src\nsXULDocument.cpp, line 3666] RDFElementImpl::SetAttribute [d:\builds\seamonkey\mozilla\rdf\content\src\nsRDFElement.cpp, line 898] setAttribute [d:\builds\seamonkey\mozilla\xpfe\components\xfer\src\nsDownloadProgressDialog.c pp, line 342] nsDownloadProgressDialog::OnStopBinding [d:\builds\seamonkey\mozilla\xpfe\components\xfer\src\nsDownloadProgressDialog.c pp, line 264] OnStopBindingProxyEvent::HandleEvent [d:\builds\seamonkey\mozilla\network\module\nsNetThread.cpp, line 594] StreamListenerProxyEvent::HandlePLEvent [d:\builds\seamonkey\mozilla\network\module\nsNetThread.cpp, line 474] PL_HandleEvent [plevent.c, line 492] PL_ProcessPendingEvents [plevent.c, line 453] _md_EventReceiverProc [plevent.c, line 872] KERNEL32.DLL + 0x3663 (0xbff73663) KERNEL32.DLL + 0x22894 (0xbff92894) 0x00768c08
The problem occurs when the code handling the netlib stream transfer informs the .xul (via a broadcaster node) that the download is complete. The RDF element attempts to query the onchange attribute of the <observes> node that is observing the attribute and this fails an assertion for mDocument!=0. I.e., the observes RDF element's mDocument member is null. I thought this might be due to the firing of the onload handler occurring too soon. However, it was another dialog that I had changed; this one still fires (in effect) when the nsIXULWindowCallbacks ConstructBeforeJavaScript member is called. In fact, I inserted a 10 second delay before the download commences and this had no effect. Something is different in the way this document's observes elements are being constructed. The interesting thing is that other broadcaster nodes in this dialog seem to work OK. It's only the one that is changed when the stream transfer completes that causes a problem. I think maybe Chris Waterson should take over on this one.
The stack trace posted here doesn't match what I'm seeing. I'm getting an assertion failure within RDFElementImpl::ExecuteOnChangeHandler.
Assignee: law → waterson
Status: REOPENED → NEW
Reassigned to Chris Waterson, with his permission.
Bill, could you post the stack trace you're seeing?
Assignee: waterson → law
I have debugged the problem. What happens is that an attribute change is triggering synchronous window destruction via an observer. Destroying a window synchronously causes the window's JS context to be released. This recursively causes the mDocument parameter of each content node in the document to be nullified. Unfortunately, there are TWO observers of this attribute. So the next observer tries to poke an identical attribute change, which asserts because the content is dying. Recommendation: don't synchronously close the window. Use window.close(). Meta recommendation: re-write this -- now that we have XPConnect, I think this should reduce to 80 lines of JS code, as opposed to the huge mess it is now. Reassigning to law to fix.
One further comment: the assert, although annoying in debug builds, is not indicative of a fatal condition, and the code recovers from this. So whatever was causing the crash that McMullen describes has been fixed. (I commented out the assert and downloaded a file 10 times. Yeah, I don't have much better to do.)
I believe this will get a lot better once we stop doing this reverse poke of a broadcaster just to go back into C++. Write XPIDL, and just call some exposed functions.
Here's my stack trace: NTDLL! 77f76148() nsDebug::Assertion RDFElementImpl::GetAttribute RDFElementImpl::GetAttribute RDFElementImpl::ExecuteOnChangeHandler RDFElementImpl::SetAttribute RDFXULBuilderImpl::AddAttribute RDFXULBuilderImpl::OnAssert CompositeDataSourceImpl::OnAssert InMemoryDataSource::Assert CompositeDataSourceImpl::Assert RDFXULBuilderImpl::OnSetAttribute XULDocumentImpl::OnSetAttribute RDFElementImpl::SetAttribute setAttribute nsDownloadProgressDialog::OnStopBinding OnStopBindingProxyEvent::HandleEvent StreamListenerProxyEvent::HandlePLEvent PL_HandleEvent PL_ProcessPendingEvents _md_EventReceiverProc USER32! 77e713ed() 00c2cab0() So I don't understand what Chris is saying. There's no window closing going on here. Netlib has said that the stream being downloaded is finished (the OnStopBinding call). A broadcaster attribute is set in order for the UI to be notified of this fact (so it can display that fact to the user). The .xul file has an <observes> element observing that attribute; the GetAttribute call on that <observes> element (to extract the JavaScript code for its onchange handler) is failing the assertion that the element's mDocument data member be non-null. Apparently that's not what Chris saw. Perhaps we should figure out why we're seeing two different things. Next, I don't understand his admonition to rewrite this code. How would that simplify things, exactly? As it stands, there's one class (nsDownloadProgressDialog), that implements nsIStreamListener (required to handle the netlib stream coming in) plus 2 methods used to communicate with the "front-end" (ConstructBeforeJavaScript and AttributeChanged). The latter is 17 lines of C++ code. The former is a few lines of code, but it's mostly all xpcom boilerplate (QI's, etc.). To rewrite that as an xpconnect interface would certainly require more C++ (albeit much of that boilerplate, also), in addition to the 80 lines of JS you suggest. While we could rearrange the code, I'm not sure how wise that would be in the long term. The code currently in place implements but one of a set of related "stream transfer" operations (see nsIStreamTransfer). The current clean separation between the front-end and back-end code hopefully will enable that front-end code to be reused to implement progress dialogs for the majority of the stream transfer operations. Moving back-end logic into JS code in the front-end would seem to make that more difficult, not less. I'll try your advice about switching to window.close(), however. At the time this dialog was implemented, that didn't work, of course. I'm still unsure about how that will solve the assertion failure I'm seeing (since I wouldn't have gotten to the window.close() yet). BTW: This code worked fine just 2 weeks ago (I used this extensively as a test for the nsIXULWindowCallbacks onload-handler "adjustments" I did). I wonder what changed to cause it to stop working? Does that change fix something else we can't forego, or is it the source of the problem? One further comment: The crash John McMullen reported was fixed a long time ago by coding up this intricate start-up/shut-down synchronization (perhaps the "huge mess" you refer to). That problem, as I recall, was due to the stream listener setting the progress broadcaster before the document had loaded sufficiently far along to accept that change. Thus, the delay of the stream transfer till the onload handler fires.
I have also been seeing some odd assertions where mDocument is null. Maybe it's related.
No Bill, I saw exactly what you are referring to. Let me explain what is happening. 1. When the stream closes, it calls nsDownloadDialog::OnStopBinding(). 2. This sets the "completed" attribute to "true" on "data.progress". 3. There is a broadcaster node observing "data.progress"; the onchange handler calls the JS onCompletion() method. 4. The JS onCompletion() method sets the "command" attribute to "close" on the "data.execute" node. 5. Your dialog implements nsIDocumentObserver: setting the "close" attribute calls the nsDownloadProgress::AttributeChanged() method. 6. In AttributeChanged(), you switch on the "close" value of the "command" property to call your own OnClose() method. 7. nsDownloadProgressWindow::OnClose() calls mWindow->Close(), which closes the window. Now for the part that isn't your fault: 8. Closing the window releases the document. 9. Releasing the document invalidates the context for the DOM nodes that live in the document. 10. The XUL broadcaster mechanism attempts to push the attribute change to another observer. 11. The SetAttribute() asserts because an attempt is being made to SetAttribute() on a node whose context has been destroyed. Like I said, the assert is harmless, but annoying in debug builds. I don't think using "window.close()" will work either; when I wrote that, I thought it would asynchronously close the window. (You could do setTimeout("window.close()", 0) -- that should work.) I considered fixing the broadcaster mechanism to check mDocument after each poke, and *not* set attributes if the mDocument == nsnull; however, I think this isn't going to buy us much. Somebody could always write JS code that does: var node = /* get a DOM node somehost */; window.close(); node.setAttribute('foo', 'bar'); Like I said, this causes an ASSERT, but it is idempotent and not anything serious to worry about. The XUL code should be protecting itself, and I was unable to reproduce the crash that McMullen mentioned.
So you could later this bug without fear of anything going wrong in release builds.
This bug has morphed so many times that it is not really the same bug I originally reported.
Good news and bad news. The bad news is that window.close() doesn't work (the window doesn't close). The good news is that it doesn't crash, either. This has me confused. I must be misreading what's going on here. I see my SetAttribute call (to tell the front end that the download is complete) still on the call stack and concluded that the attribute didn't get set. But it got far enough to trigger the call to the onchange handler for the <observes> element that's looking for completion. That handler used to set the attribute that caused the window to be closed. Apparently, this used to go into the AttributeChanged function, issue the CloseWindow() call, and return, all the way back through the JS setAttribute call, and back to the code that called the onchange handler to execute that JS code. Then that code proceeded to examine the other <observes> elements, and asserted on *those* (since the document died at the CloseWindow() call). Now I understand what Chris meant. The spooky thing is that if I add a dump() call to the onchange handler in question, I get a crash *elsewhere* (an assertion in the style-sheet code which seems to think there's no content). Most likely, when I previously attempted to determine whether my onchange handler was called, I inserted a similar dump() call, ran it, got a crash (Win32 message box), observed that my dump() msg wasn't in the console window, and concluded my onchange handler wasn't called (without bothering to launch the debugger to see if the crash was in the same place). My mistake; sorry for any confusion that ensued. So Chris is correct about the cause of the problem. We just need a better fix (one that works, and hopefully one that's not so much work). Here's my plan: 1. Add two state flags to the nsDownloadProgressDialog object. One to indicate whether the front-end has stopped, another to indicate whether the stream transfer is complete (the latter may already be there, actually). 2. Set the second flag upon completion of the download. 3. Check the second flag when the "front end" sets the "stop" attribute and if set, skip the call to CloseWindow(), instead setting the "front-end has requested close" flag. 4. Add code downstream from the completion notification code that will now check the "close requested" flag, and if set, calls CloseWindow(). Basically, we keep track of whether there's a SetAttribute call for the dialog currently on the call stack and avoid the CloseWindow() till later if there is. One might think that the CloseWindow() call could be issued unilaterally, but they'd be wrong. There has been talk of modifying the front end so that it prompts the user to execute the downloaded file. In that case, the CloseWindow() might need to be deferred till the user dismisses *that* dialog. I'll code this up and see how it works. I know this will make the code a bigger mess, but a mess in the hand is worth two in the bush (or something like that).
First, I tried the trick of deferring the window.close() using setTimeout but that didn't help. I'm not sure window.close() is working yet. My suggested fix worked fine. So I'll guess I'll petition to have it accepted into M7. BTW, I still wonder why this suddenly broke. I presume it's due to some memory leak plugging that now results in the window/document actually getting properly freed on window closing. That's a good thing so forget about tracking down the underlying change that caused this to stop working.
Summary: Crash in file download dialog → [REGRESSION] Crash in file download dialog
Whiteboard: Hold for fix in M7
Whiteboard: Hold for fix in M7 → Fix is ready and needs review by waterson
Status: NEW → RESOLVED
Closed: 26 years ago26 years ago
Resolution: --- → FIXED
Fix reviewed by waterson and just now checked in: ? diffs Index: src/nsDownloadProgressDialog.cpp =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/xfer/src/nsDownloadProgressDialog.cpp,v retrieving revision 1.4 diff -r1.4 nsDownloadProgressDialog.cpp 37c37 < // Close the window. --- > // Close the window (if broadcaster/observer traffic isn't pending). 39c39,46 < mWindow->Close(); --- > if ( mStopNotificationPending ) { > // Remember that we need to close the window when that > // notification completes. > mCloseRequested = PR_TRUE; > } else { > // Go ahead and close the window. > mWindow->Close(); > } 75c82,84 < mStopped( PR_FALSE ) { --- > mStopped( PR_FALSE ), > mCloseRequested( PR_FALSE ), > mStopNotificationPending( PR_FALSE ) { 262a272 > mStopNotificationPending = PR_TRUE; 263a274,279 > mStopNotificationPending = PR_FALSE; > // See if that notification triggered close request. > if ( mCloseRequested ) { > // Close the window now. > OnClose(); > } Index: src/nsDownloadProgressDialog.h =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/xfer/src/nsDownloadProgressDialog.h,v retrieving revision 1.6 diff -r1.6 nsDownloadProgressDialog.h 126a127,128 > PRBool mCloseRequested; > PRBool mStopNotificationPending;
Status: RESOLVED → VERIFIED
Whiteboard: Fix is ready and needs review by waterson → Fix checked in.
the crash is no longer there on 1999-06-29-08 RedHat Linux 5.2 kernel 2.2.7 1999-06-29-09 WinNT 4.0 sp4 1999-06-29-09 MacOS 8.51
Product: Core → Mozilla Application Suite
You need to log in before you can comment on or make changes to this bug.