Closed Bug 104769 Opened 23 years ago Closed 23 years ago

[regression] Some JS created windows become invisible zombies (it's complicated, see description)

Categories

(Core :: XUL, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla0.9.6

People

(Reporter: sdagley, Assigned: rpotts)

References

()

Details

(Keywords: regression, Whiteboard: [PDT+])

Attachments

(2 files, 1 obsolete file)

There is a failure to create a window from the JS on the example page to view a trailer for the Spider-Man movie. When this failure occurs, it leaves Mozilla in a state where it won't quit without the user forcing a quit from the OS. When running under OS X it will cause Mozilla to consume a large portion of CPU time until the app is forced to quit, even if there are no visible windows open. This is a regression from Mozilla 0.9.3/Netscape 6.1. It happens under both Mac OS 9 and Mac OS X 10.1 on the current 0.9.4 branch and the 0.9.4 milestone release (an the current trunk build). 1) Go to the example URL and click on the link for a Low Res trailer (it doesn't actually matter what size trailer you pick) 2) Observe the cursor turns into the busy arrow 3) Observe no window for the trailer opens 4) Click in the Tasks menu and observe that there is a new window in the list which I'll refer to as an 'Invisible Zombie' 5) Observe that if you pick the Invisible Zombie from the Tasks list the trailer window doesn't become deactivated (but Close from the File menu doesn't close the trailer window but a command-W will the remove Invisible Zombie from the Tasks list) 6) Observe that it is now impossible to quit the app without a force quit (the menu bar will remain but nothing can be selected) 7) If you're running under Mac OS X, bring up a terminal window and run top and observe that Mozilla is now hogging lots of CPU time.
Applying regression keyword and flagging for PDT on status whiteboard. I don't know what other sites might experience this problem (I just randomly clicked the Spider-Man trailer link to test a plugin related bug) but it seems pretty nasty.
Keywords: regression
OS: Mac System 8.5 → All
Priority: -- → P1
Whiteboard: PDT
Target Milestone: --- → mozilla0.9.6
adding gagan, darin and pav to cc: as bbaetz thinks it may be related to 77072
This keeps getting better and better. All platforms are impacted to some degree. There's also another issue on Mac - due to the manner in which new windows are created as being tiled from the last one created, any new windows brought up post-creation of the Invisible Zombie will be invisible themselves become Invisible Zombies.
Hardware: Macintosh → All
http://bugzilla.mozilla.org/show_bug.cgi?id=77072 was filed back in April which means the bug should have happened in N6.1. N6.1, in Steve's example, brought up the window with the trailer w/out any problems.
maybe related ? bug 88355
i'm debugging now. it appears that once you click the link, nsWebShellWindow::OnStateChanged() gets called only when you click on the stop button, but it thinks that the chrome has already loaded (mChromeLoaded is true) so it doesn't ever resize the window. For every subsequent window, nsWebShellWindow::OnStateChanged() simply doesn't get called at all, so neither does nsXULWindow::OnChromeLoaded(). Subsequent windows are being created, they just never get resized to bigger than (1,1). this smells like necko to me. over gagan.
Assignee: sdagley → gagan
the OnStateChange I mentioned above is getting called on the main browser window (hence why mChromeLoaded is true). So we're never getting the correct state changes on the zombie window.
hmm. This is some wierd js foo. Attaching a simplified test case...
Attached file simple test case (obsolete) (deleted) —
Attachment #53625 - Attachment is obsolete: true
ignore that last attachment.
Rpotts is looking into this. He believes that this is one of the class of several "inital modal event loop" problems. I guess the best bet for that class of bugs is either rpotts or danm. (both are cc'd here) but it's definitely not necko.
A second event queue does become active as you click on the movie trailer link -- that's to be expected if you're opening a new content window. The new event queue is never deactivated, so any events pending on the older queue go forever unprocessed. There were six such events in the trial I just ran. That explains the hang. But this is also behaviour you'd expect if the new content window never finished loading its chrome.
Has anyone confirmed that this is a problem on all platforms? Using 2001-10-12 commercial branch on WinNT, I went to the test URL (http://www.apple.com/trailers/columbia/spiderman/) and chose the low res trailer. A new window was correctly opened. The only error I saw was that QuickTime 5 was not correctly detected (I have 5.0.1 on my machine, and I was redirected to Apple to download Quicktime). I had no problem with the window for the trailer, however.
dan: if this is not your bug pls. reassign, but based on your comments I am assigning this to you.
Assignee: gagan → danm
Jeez, I was just trying to be helpful. My comments above stated that things were being thrown out of whack because the window didn't finish loading. So I get to be the "why didn't the window finish loading" bitch?
confirmed on all the platforms. linux 10-15-05-0.9.4 windows 10-15-05-0.9.4 mac 0Sx 10-15-11-0.9.4 The new window is not launched but the task shows other way. window can be closed using File>Close in all platforms does not have to kill. Cntl-N or "File>New window" does not work (incase u have open windows). The cpu usage: linux - no significant change. mac - 84.5% windows - 0% (behaves as if the netscape is idle) window.open(url, '', ''); worked fine. window.open('http://www.apple.com/trailers/columbia/spiderman/qt_low.html', '300', '400'); also works...
adding to cc list
Whiteboard: PDT → [PDT]
Per request, I went through the builds between when 0.9.3 branched (day of 7/31), and when 0.9.4 branched (day of 9/04), to track down the date that this changed. I have: 2001-08-08-09-trunk Mac OS 9.0: works 2001-08-09-09-trunk Mac OS 9.0: works 2001-08-10-08-trunk Mac OS 9.0: zombie 2001-08-13-08-trunk Mac OS 9.0: zombie (no builds for 8/11 and 8/12). I'll attach a link to the checkins for that period of time (8/9 8am to 8/10 8am).
Attached file checkins to trunk 8/9 8am to 8/10 8am (deleted) —
hey you!! modality bitch!! here is what seems to be going on... hold onto your chair 'cause it aaint pretty !! First, an image is loaded (in this case chrome://navigator/skin/btn1/stop.gif). it turns out that this image is *not* in the imgCache, so an imgRequest is created and the underlying storage is read asynchronously... Now before this imgRequest finishes, a new eventQ is pushed (for our new window modality joy)... next, the *same* image is loaded again (with the new eventQ active)... this time, the image *is* in the cache - because it was in the process of loading... so, the *same* underlying imgRequest is reused. Unfortunattely, the first imgRequest is STALLED because it's eventQ is no longer active. This causes the second imgRequest to also STALL... thus preventing navigator.xul from ever finishing and so we never exit our modal event loop :-( The bottom line is that we have an underlying race condition whenever we load certain chrome:// urls in this situation :-) By totally disabling the imgCache we no longer have this problem :-) Of course we don't have a cache either :-) So, i'm looking into the possibility of adding the eventQ as a 'imgCache discriminator'... hopefully this will work :-)
By the way, that page will correctly load the popup if you remove the humongous background image (1500px x 1000px JPEG == 6,000,000 bytes). Of course, that's probably the reason why the chrome image isn't in the cache, which leads to the above situation ... e.g., http://jrgm.mcom.com/bugs/104769/page4.html -- replica minus background, works http://jrgm.mcom.com/bugs/104769/page5.html -- replica with background, hangs
-> rpotts i'm putting together a patch to deal with this situation... --rick
this time i *actually* reassigned it :-)
Assignee: danm → rpotts
Attached patch Proposed patch... (deleted) — Splinter Review
I've just attached a patch which addresses this problem. The patch does *not* adversely effect 'new window opening' !! It only comes into play in the specific instance when an image is actively being loaded on one event Q and it is requested again from another event queue. If this happens, then a second request is made however, it is NOT put into the imgCache. -- rick
Comment on attachment 53860 [details] [diff] [review] Proposed patch... r=pavlov
Attachment #53860 - Flags: review+
Who can SR this patch, so we can get it inot tonight?
Comment on attachment 53860 [details] [diff] [review] Proposed patch... Make the cacheId thingy a void* in stead of a PRInt32 so that it'll work reliably on 64 bit platforms too. sr=jst
Attachment #53860 - Flags: superreview+
tentative PDT+, pls check this into the 094 branch ... but , pls not jst's comments.
Whiteboard: [PDT] → [PDT+]
patch checked into the 0.9.4 branch (incorporating jst's comments).
patch checked into the trunk (incorporating jst's comments). closing out.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Can you recommend some test scenarios to try out which this code will mostly affect? Thanks.
*** Bug 88355 has been marked as a duplicate of this bug. ***
The patch effects image loading and when images are retrieved from the image cache... if the code is 'correct' it should only be a factor when new windows are opened, or there is some type of window modailty. however, since some of the new code is executed whenever an image is loaded, it would be a good idea to verify that there are no image loading regressions... -- rick
looks god on commercila branch builds: windows 2001-10-17-05-0.9.4 linux 2001-10-17-04-0.9.4 mac 2001-10-17-03-0.9.4
Status: RESOLVED → VERIFIED
We should also verify that the cases in dup Bug 88355 (and it's many dupes) are verified as working correctly. stephend - since you filed the other bug, do you have time to do this on the 094 branch builds? Thanks
My bug is truly fixed, but as far as verifying all of the other testcases, I'm not so sure. I checked Linux, Mac OS 9.1 and Windows 2000 on my testcase. Maybe Prashant can help out on the DUPs?
Lisa, as far as I can tell, bug 91212, bug 99333, bug 102486, bug 96360, bug 104650 and bug 104820 all work fine on the 0.9.4 commercial branch. Can't verify yet on trunk. Most of the bugs have clear testcases which can be broken down to exactly the same as mine in bug 88355. For 100% assurance, some Javascript QA might be needed though.
Great, thanks. tpreston has run through the image loading testcases she has and no major problems found. desale is out so I will cc: gerardo for the JS testing although we've will run through the set of JS tests with today's branch builds, but will double check.
The Client-side JavaScript test suite is being run; and HTML Elements that open new windows, as well as elements used for images are being tested on the latest 0.9.4 builds.
per request (although I'm a tad tardy :/), this checking had no performance impact for page load or new window time on any of mac/linux/win32.
*** Bug 105483 has been marked as a duplicate of this bug. ***
*** Bug 94630 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: