Closed Bug 132759 (100%CPU) Opened 23 years ago Closed 22 years ago

[HANG]high(100%) cpu usage with Flash: WM_USER+1 events given too high priority and starve UI

Categories

(Core Graveyard :: Plug-ins, defect, P1)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED
Future

People

(Reporter: andrea.aime, Assigned: serhunt)

References

()

Details

(Keywords: hang, perf, topembed-, Whiteboard: [PL2:Vendor]][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA])

Attachments

(5 files, 22 obsolete files)

(deleted), application/x-shockwave-flash
Details
(deleted), application/octet-stream
Details
(deleted), application/octet-stream
Details
(deleted), application/octet-stream
Details
(deleted), patch
serhunt
: review+
serhunt
: superreview+
Details | Diff | Splinter Review
From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.9+) Gecko/20020314 BuildID: 2002031403 Just try to go to the cited URL and see... there are lots of problems with this page, fixed background, animated gifs and lots of flash animation, but in IE 5 it works fine Reproducible: Always Steps to Reproduce: 1. go to www.informaniak.it 2. 3. Actual Results: The page should not hang mozilla with 100% CPU utilization Expected Results: Had to kill Mozilla... Hardware: PII450 with 380 MB RAM
Also very slow on www.infomaniak.it, but CPU goes to 100% only when scrolling... seems that fixed background is playin a major part in the slowdown... maybe only a duplicate of some other bugs?
It seems to be no problems with 21th march build under WinXP. On my computer, of course ! ;-)
My 1.5 GHz machine loads this page but it seems to be chewing plenty CPU cycles. This particular page is triggering the Flash plugin but I've seen other pages running other plugins like Acrobat Reader also cause the browser to become unresponsive. With this page loaded I can click in the URL bar (delayed responsiveness though) but can't seem to close the button with the top right `x' window close widget. Over to plugins for a look.
Assignee: sgehani → beppe
Component: XP Apps → Plug-ins
QA Contact: paw → shrir
Could also be that it is unrelated to the page. Today I saw Mozilla suddenly jump to 100% CPU just by looking severely on it...
4.79 bows down too. same behaviour on this page as 6.x
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Bug 141400 has been marked as a duplicate of this bug. ***
*** Bug 144390 has been marked as a duplicate of this bug. ***
*** Bug 144713 has been marked as a duplicate of this bug. ***
*** Bug 143580 has been marked as a duplicate of this bug. ***
I heard that the latest release of flash 6.0r26 (probably) fixes these freeze issues. Can anyone check as well ? Thx!
Mozilla with Shockwave Flash 6.0 r29 still goes to 100%
Henrik: what sites did you visit? Did you test the ones listed here, or do you have other places you test?
http://terraplus.terra.com.br/ is also making mozilla suck
Priority: -- → P2
Whiteboard: [Flash]
Target Milestone: --- → mozilla1.0.1
http://terraplus.terra.com.br/ I crashed my debug build on the last URL playing with back/forward. By the looks of my stack and the source, I hope bug 66748 will fix the crash. http://zooch.hn.org/mozcrash/mozcrash1.html Something funky is going on here. I see the plugin get created and destroyed. A simplified testcase would help. http://www.ananova.com/assets/ticketsbanner.swf Is more than normal (about 30-40%) but not 100% cpu using Flash 6.0r29
this site http://radio2.dk/kobenhavn.htm is also pretty high on CPU usage. It's Java. using Sun Java JRE 1.4.0
ok..my findings: This new version does definitely bring *down* the cpu usage on "shockwave.com" . Initially, it peaks up to 100%..but later on comes down withiina s econd ( this wasn't the case with 6.0r23)..however I did not see the two advertisement banners come up this time ( orbitz andd another one)..maybe that's the reason? Also, the links Henrik mentioned are still showing a rise rise in cpu usage ...playing in the mid 90's. Especially, The terraplus site...it has some mambo jumbo music on it as well...it's really slowing things down. cc'ing Troy evans from Macromedia for his insight.
Quite likely that this depends on Windows flavor. ananova.com works fine for me with 5.0 r41 on WinXP.
I tested on NT
http://www.ananova.com/assets/ticketsbanner.swf definitely spikes up to 100% and stays there with flash6.0r29 on my NT.
av, are u using flash5.x? Pls try flash6 (r29). flash6.r23 introduced these issues..is what I learnt from people.
shockwave.com definitely does not chow on cpu with 6.0r29 .I just checked with the ad banners ON.
playing of http://www.overclockmaniak.it/newsite/images/ventola.swf (I've saved the content of that file in this attachment) is very cpu expensive and this is flash plugin problem not mozilla, well mozilla problem is that it allows to run such kind of cpu hungry plugins on its main ui thread:( BTW: with IE cpu usage jumps 100% too, but its ui still can respond.
Adding also topembed here. Equally important for embed-based products. Adding Roger to the cc since terraplus.terra.com.br is major portal in Brazil.
Keywords: topembed
I just repro'd this using a 1.0 mfcEmbed build from today. As this is a problem in the flash plugin, the likely only way we can resolve this is to start running plugins out in their own thread/process which isn't going to happen anytime soon AFAICT.
Keywords: topembedtopembed-
http://www.idefense.com/XSS.html also seems to do some high cpu usage
*** Bug 146821 has been marked as a duplicate of this bug. ***
*** Bug 148341 has been marked as a duplicate of this bug. ***
Flash and high CPU usage also in bug 99361, bug 115666, bug 124643
a re-written Summary for this bug would help find it: high cpu usage with Flash?
life has not become any easier with this 6.0r29 drop from flash. just going to shockwave.com or the test url above still takes the cpu to around 100%..slows everything down on my box and stays there for a while. the browser is literally hung upside down...ugh !
Summary: 100% cpu utilization, I had to kill Mozilla → high(100%) cpu usage with Flash
Still works fine for me with 6.0 r29 on WinXP, CPU is about 30-40% in almost all testcases. The test case in the attachment does show 100% but the browser is still perfectly responsive. I am testing with NS7.
*** Bug 149849 has been marked as a duplicate of this bug. ***
shrir: what utility are you using to monitor your usage? I am not seeing consistent peeking, in my testing I have seen small peeks in respect to mouse events, which the plug-in instance may be coded to monitor. On other pages, I do not see that same problem. I wonder if this is not due to event monitoring by the plug-in
am using task manager...I do not move my mouse once I am on shockwave.com...and I see a lot of slowness...arun is shockwave looking into this? Thx!
*** Bug 151357 has been marked as a duplicate of this bug. ***
is anyone taking this bug seriously ? Oh pls.. look at the number of dup....trust me. Can someone form macromedia be contacted? Arun, before u leave on vacation, you gotta clear this one up :)...pls !
*** Bug 124643 has been marked as a duplicate of this bug. ***
shrir - I think people are taking this bug seriously, but based on Comment #27 From Judson Valeski. This sounds like a issue Macromedia needs to address to ME.
Whiteboard: [Flash] → [Flash] [adt2 RTM]
ok guys, heres the deal. It's not a flash problem totally. Flash eats a lot of cpu cycles. It has to, to do what it does. The problem is in mozilla, and it's deep down inside it. I have researched this issue, and some of the pages posted here are getting cpu cycles eaten up because the flash is very large on the screen. Many times this takes up 100% or close cpu time in ie or mozilla, this is not a problem. The problem is that unlike ie, mozilla runs plugins in the same thread as its ui. This is why ie appears responsive while mozilla does not. One of the main problems i've had deals with multiple instances of flash. If there are multiple instances of flash on a page, they slow down the browser considerably, this is because mozilla renders the flash even when it is not visible in the window. In IE, you can minimize the window, or scroll down the page to where flash isnt visible, and the cpu cycles it eats go down. IE doesnt render flash that isnt in view! what a good idea! I love mozilla, but i have recently had to go to many websites with multiple flash instances(forums with flash signatures). This is killing my browser so i have to go to ie(which i hate) so pleas fix this so i can use mozilla 100% all the time!
This bug is not about a problem in Mozilla but rather something with Flash 6 because the same problem is present using 4.x with the testcases in this bug. I don't believe the problem was present with Flash 5, but can someone confirm? Does anyone know if Macromedia will fix this in a dot release?
Summary: high(100%) cpu usage with Flash → high(100%) cpu usage with Flash 6
Whiteboard: [Flash] [adt2 RTM] → [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH 6 ISSUE -- NOT MOZILLA]
Bug 124643 was marked a dupe of this, but Bug 124643 is linux-only. Also, that bug clearly showed how flash used much more CPU time in Mozilla as compared to Netscape 4. (which would hint towards Mozilla problem, not flash). Should that be re-opened??
i had the same problem with flash 5, its all in the way mozilla handles plugins. I dont think this will get fixed very quickly, due to the massive changes it will take to do something like this. Plugins need to be run in their own thread, and plugins need to be turned on only when in the viewable area. These 2 things will make flash in mozilla 100% better imho. as i said before, flash takes a lot of cpu cycles by nature, in all browsers, it's the fact that mozilla handles this much less gracefully than IE.
Having a separate thread for a plugin may be a valid point, but I can hardly imagine, say, some audio plugin which stops playing music when user scrolls it out of view. Does Flash really stop do whatever it is doing? Maybe they have a special communications with IE to allow the browser to stop plugin in such a case?
I do not think it is correct to shut down the plug-in if it is out of view, as AV mentioned that is not appropriate for an audio instance. We also have to contend with the tabbing option - if a user tabs to a different window -- should we really stop playing the radio? We are looking at separating out the plug-ins and determining how effective that will really be. As Peter has mentioned, we have had discussions with Flash and they know that there is a issue and they are actively working that issue.
As per comment 44 I'm not convinced that bug 124643 really is a dup of this. The complaint was that flash on Linux had become slower in newere versions of Mozilla (not that it chewed up 100% of the CPU - I don't think it does). Flash does not chew up 100% of the CPU for me and if it were a flash problem why don't older versions of Mozilla (or indeed Opera 6) run so slowly? I would also ask that someone check that 100% CPU usage happen on not Windows OSes (I'm not sure it does but if someone can check and change the OS on this bug).
*** Bug 143759 has been marked as a duplicate of this bug. ***
*** Bug 152043 has been marked as a duplicate of this bug. ***
*** Bug 152186 has been marked as a duplicate of this bug. ***
check about Bug 152186 ... it's not a duplicate.. the cpu usage is only bout 40 - 50%.. scrolling works okay.. the only problem is bout those 3 buttons
Blocks: 71668
Blocks: 152186
No longer blocks: 71668
Keywords: perf
this should have been marked as nsbeta1- already, this is out of the mozilla code
Keywords: nsbeta1nsbeta1-
*** Bug 104218 has been marked as a duplicate of this bug. ***
*** Bug 152731 has been marked as a duplicate of this bug. ***
*** Bug 152994 has been marked as a duplicate of this bug. ***
*** Bug 153408 has been marked as a duplicate of this bug. ***
http://www.macromedia.com/software/trial_download/ downloaded flash development trial s/w from here. Then saved a flash ad from usatoday.com locally and launched it in just the standalone flash player (no browser was open , as u will see in the screenshot)...the CPU reaches 100% with just two instances of the flash file open. It's a single frame with fr/rate =15 (shows the app debug info).
http://jazz/users/shrir/publish/Clipboard.jpg could not attach this big a file.
Doron and I just discovered if we set the flash plugin size to 1x1 pixel the CPU usage goes way down but sound continues. Perhaps we could check if a plugin is not visible and then artificially set its size to 1 by 1 pixel.
To update everybody, here's what I've found so far: With the help of Spy++, as a test, I overrode Flash's default window procedure with my own so I could filter windows messages. I put some printf's in and some ::GetTickCount()'s to measure how long between each message. I'm not sure if this has anything to do with the problem, but here's what I found: a) WM_TIMER messages seem to increase with complexity of swf and/or total quantity of swfs open. b) I get many WM_USER+1 (code = 0x0401) messages. Many, many more with two macromedia.com windows opened. c) Sometimes, I get a difference of zero in tick count between consecutive 0x0401 messages. Am I incorrectly sampling time? d) So I tried consuming these 0x0401 messages when the diff was zero, and lo and behold Flash would not do any animation however the background color seemed to paint. It also would not send any more of these kind of events after that. What's more interesting is that I could easily open many, many browser windows with macromedia.com and still have a responsive GUI. I'm not sure if any of this info will help, but I wonder if anyone else thinks there could be something up with the timing of these messages or perhaps in the Flash painting? Ideas? Thoughts? Doron's observation that a 1x1 sized plugin doesn't show this bug could possible mean it's related to painting. Tomorrow, I may try to figure out a way to slightly delay the forwarding of these 0x0401 messages. We already have a bug that we are at least 25% slower on the Mac sending idle events and I hope the same bug won't be repeated on Windows. :(
Peter, don't rely much on GetTickCount, it has quite an error span. I think its accurateness is somewhere around 50 ms.
|GetLocalTime| is probably better. However (from MSVC help): "It is not recommended that you add and subtract values from the SYSTEMTIME structure to obtain relative times. Instead, you should: o Convert the SYSTEMTIME structure to a FILETIME structure. o Copy the resulting FILETIME structure to a LARGE_INTEGER structure. o Use normal 64-bit arithmetic on the LARGE_INTEGER value."
*** Bug 118949 has been marked as a duplicate of this bug. ***
handing over to andrei to help resolve this issue
Assignee: beppe → av
Real fix shouldn't pass everything around in global vars, but this demonstrates that we can throttle the WM_USER events that Flash is apparently using as a callback timer without impacting the performance of the animation. In fact, this hack gets better cpu usage than IE. Cheers
Added Freeze to summary and bumped up to P1 The patch needs to be reviewed for risk, we also need to figure out how to resolve this issue on mac and linux. Once the patch is considered ready for primetime, this needs to be checked in on the trunk and baked for a few days. QA will need to test it thoroughly. We will also need to get MM to test this and let us know if it affects anything within their application. This may be a bit too risky to check in at this point in time (on the branch).
Keywords: hang
Priority: P2 → P1
Summary: high(100%) cpu usage with Flash 6 → [HANG]high(100%) cpu usage with Flash 6
We need to consider several options in what to do with fast consequitive messages: 1. just pretend they are consumed and return 2. reschedule with a timer, and when it fires: 2.1. do SendMessage (append to the MQ synchronously) 2.2. do PostMessage (append to the MQ asynchronously) 2.3. do CallWindowProc (call immediately) 3. say it is not consumed and put it back to the message queue I am a little concerned with the approach in the above patch. As far as I understand it follows scenario 2.3. In this case we will accumulate things and create new timers faster than old ones fire. We probably need some smart compromise on dropping messages. Ideas?
The WM_USER throttling makes me very nervous. While it fixes the 100% bug, it will almost certainly introduce other bugs into the Flash Player, notably in regards to video and sound playback. Also, there is some risk to Mozilla that WM_USER events become unreliable, which could cause Mozilla problems in the future. I fear the throttling will lead to a "whack a mole" bug situation where you're always putting one down to find one more... Peter's analysis is essentially correct: we post about 120 WM_USER messages per second, per movie, for a 12 fps movie. This is extravagent for the plugin architecture, and it is a bug we are reviewing. I think a solution on the Flash Player side makes more sense: we're causing the bug, and our code can make a better quality / responsiveness decision than the browser could make for us. A related issue -- the plugin draws when off screen -- is compounding the problem. We're also working on a fix for this, put the "1x1" behavior is shutting down the paint loop, which works around the problem. Note that the "1x1" solution is inadvertantly taking advantage of an eccentric behavior of the Flash Player, and I would hesitate to rely on it for the future. We're working on a fix, but have no timeline for release. lee thomason Flash Player Architect
Our system also gives higher priority to WM_USER events (which is how PLEvents work) than other events like painting.
Summary: [HANG]high(100%) cpu usage with Flash 6 → [HANG]high(100%) cpu usage with Flash: WM_USER+1 events given too high priority and starve UI
Whiteboard: [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH 6 ISSUE -- NOT MOZILLA] → [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Lee: right now there are a variety of conditions where mozilla looks pretty bad when running Flash. Any suggestions for things we can do in the short term to address this?
For what it's worth, the throttling hasn't adversely affected any animations or sound in my testing, and based on what Peter was seeing in Spy running IE, you're getting the WM_USER events back from the queue at 70ms intervals in their environment, so I didn't randomly choose that number. That said, I'd much prefer the bug be fixed on the Flash side. As far as I can tell, the message queue is being used as a heartbeat/timer. Are there any other uses for those WM_USER events that I might want to know about? Is Flash internally limited to a maximum frame rate or WM_USER callback rate? If so we could match the timer to that maximum, and probably still solve the problem.
I was looking at Opera, not IE. IE looks like it's running Flash in some sort of "windowless" mode so I don't think it's a good comparison. In looking at Opera, I noticed different speeds, some as small as 10ms but they all seemed uniform and most times were closer to ~50ms.
*** Bug 155815 has been marked as a duplicate of this bug. ***
Blocks: 154896
No longer blocks: 154896
Blocks: 154896
*** Bug 156268 has been marked as a duplicate of this bug. ***
Peter, we have a contact inside Opera if you think they are doing something special with flash.
"Any suggestions for things we can do in the short term to address this?" Probably not; the essential problem is a bug in the Flash Player. However, even if you strive for the commendable goal of making the browser safe to ill- behaved plugins, you tread a fine line between safety and introducing unforseen behavior to plugins (not just Flash). I'll be in on some brainstorming later this week on how that might be possible, but from what I know at this point it's not a quick fix. (Well, the fix may or may not, but the testing of both Mozilla and the relevant plugins will take some time.) I don't mean to be overly conservative about this bug fix (Flash Player and / or Mozilla enhancement) but Mozilla and Flash needs a robust solution that we can be happy with for a long time.
An interesting behaviour I've noticed (I most probably missed something related to this, so sorry if I'm repeating) in http://www.salo.cl is that every flash piece seems to be highly responsive, while the rest of mozilla can't even manage to redraw itself. (Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1a+) Gecko/20020709)
We noticed that IE takes no CPU when the window containing Flash is hidden, while Gecko still takes up the cpu as if it was showing. Is there a way to get this benefit into our code?
IE looks to default to display flash in "windowless" mode so it has fine grain control over the events. If the Flash Netscape plugin was windowless, I think we could do the same. See bug 93959 for more information about windowless/transparent Flash and also some sample code.
*** Bug 115666 has been marked as a duplicate of this bug. ***
Whiteboard: [Flash] [adt2 RTM][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA] → [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Target Milestone: mozilla1.0.1 → mozilla1.0.2
Attached patch hack to throttle flash plugin callbacks (obsolete) (deleted) — Splinter Review
This patch makes sure there is at least 33mS between each call to flash. note: this (hack) only supports one flash per window
Attachment #90109 - Attachment is obsolete: true
Attached patch same hack moved to nsWindow (obsolete) (deleted) — Splinter Review
Here's some changes I've been working on to the platform-specific widget Windows code to do the same kind of throttle hack as the last few patches. It doesn't quite work very good, but nsWindow is probably a more likely place to implement some kind of subclassing as there is already code that supports it. I just publicly exposed it and also added some code from bug 128127. Then I gave plugin windows their own ProcessMessage routine. Some of the globals still need to be moved into the class. The globals may be why this patch also does not solve the problem of several windows open to the macromedia.com page. Kind funny that this also has the side effect of indirectly fixing bug 38484. This is a hack and not a real patch.
> Created an attachment (id=91206) > same hack moved to nsWindow Cool! My current opinion is that we have 3 problems: 1) Flash draws even when its drawing area is not visible. 2) Mozilla has a single event loop which give priority to WM_USER events 3) Moz should gracefully manage plugins that request service too often Throttling plugins to some upper limit (say 30-50 per second) is a good defensive strategy for moz. We should probably this add to moz. But I want to point out that even with throttling I could still get the CPU load up to 100% by opening multiple windows with Flash running. When the plugin load is 100% moz should still allow the user to use the window close button (the "X" button) to get rid of offending windows. Moz will probably also need to re-prioritize its event queue or have multiple event queues. However, It is my current opinion that the really critical problem is that Flash continues to draw even when it drawing area is not visible (as Jon pointed out in comment #42). If this were addressed then I believe the chance of seeing the other problems (#2 and #3) diminish significantly.
Hm...I think we could do a better job of hiding the plugin too. This may help bug 116766. What we currently do is rely on the OS when a parent window is hidden. For example, on a tab switch, nsWindow::Show(PR_FALSE) is called on the document's widget. :ShowWindow(mWnd, SW_HIDE); is called on that window, but it doesn't appear that that children are ever notified. A possible hack could be going through the children when nsWindow::Show is called and notifying (or possibly hacking the size) of windows that are plugins. That last patch helps a little bit in that it allows for nsWindow to detect plugins by eWindowType_plugin. Of course, this does not apply to top level windows which is where the problem appears to most manifests itself.
simple flash that has low drawing load use this to see the load from the servicing the event loop
Removing minus from topembed to reconsider for embeddin
Keywords: topembed-topembed
My belief is that this was nsbeta1- and topembed- because the fix for this needs to come from Macromedia in a new drop of Flash (i.e. 3rd party issue, not a Mozilla issue). Returning the -, per my chat with Mike.
Keywords: topembedtopembed-
Whiteboard: [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA] → [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA][PL2:NA]
Alias: 100%CPU
The bug also occurs when you visit http://news.sina.com.cn , the biggest chinese news site that has lots of flashes.
On my 500MHz PIII displaying http://news.sina.com.cn with today's branch and trunk builds with Flash6 r40 the CPU stays at 100%. Using the experimental Flash6 r40 with the "non visible" fix the CPU is in the 60% range. IE 6 seems to have problems displaying this page.
IE6 have no problem in displaying the http://news.sina.com.cn. I browse it everyday with IE6 and lots of people in china visit the site using IE.
I don't know why, but the Flash IE uses (ActiveX) does not spend CPU rendering Flash that is not visible. The current Flash that gecko uses (Netscape plugin) does and thus pages like this use lots of CPU that is unnecessary. MacroMedia has been contacted about this and we should see a fixed version in the near future. Once this new version is out then we should retest to see what issues exist then.
reassign to me since this is a vendor issue
Assignee: av → beppe
Whiteboard: [Flash][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA][PL2:NA] → [PL2:Vendor]][NOT A MOZILLA ISSUE THIS IS A FLASH ISSUE -- NOT MOZILLA]
Target Milestone: mozilla1.0.2 → Future
Blocks: 139820
Blocks: 38484
I found that a small change to the event queue management made the buttons responsive. There was a secondary problem and I ended up talking to danm. He pointed me to a patch by hyatt that exactly takes out the code that I find helps make the buttons responsive. 3.41 hyatt%netscape.com Oct 16 2001 Fix for 97805, also fixing event prioritization problems on Win32 http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&fi le=nsAppShell.cpp&root=/cvsroot&subdir=mozilla/widget/src/windows&command=DIFF_F RAMESET&rev1=3.40&rev2=3.41 Danm said hyatt did this to get a 10% performance improvment.
Yes, I remember this change. I had it measured at 7.5%, but 10% is the fish story version of that number :-). I'll give this another spin and see what the current number looks like.
So I just ran a couple of tests with and without that old change to remove the PeekMessage for WM_USER-1. Whereas before, in Oct 2001, this made a big change in pageload results, now, at least on win2k, I am not seeing a significant difference. I probably need to try this out on win98 to see if there is a difference on that OS. I probably need to sanity check my tests too, since I'm puzzled. But maybe this is no longer an issue (e.g., something else changed that made this change moot?). [hyatt: I know you no longer remember anything about Win32, but maybe you have a comment]. [I should note that I'm running the current trunk build, but I also have pending changes from bug 144533 that substantially reduce the number of progress notifications that arrive in the front-end code.]
Attached patch patch to prioritize min/max/close buttons (obsolete) (deleted) — Splinter Review
This patch prioritizes the minimize/maximize/close button so the respond even when flash has the CPU at 100%. It could be reduced to just test for the single event WM_NCLBUTTONDOWN.
If the code includes WM_PAINT in the prioritization then I see the throbber blinking between its final or static background and the transitional images; eg: prioritizing this "::PeekMessage(&msg, NULL, 0, WM_USER-1, PM_REMOVE)", or any range that includes WM_PAINT seems to cause the throbber to blink.
the patch attachment 93671 [details] [diff] [review] helps but when flash is using 100% CPU and the user tries to move the window moz hangs in ::DispatchMessage(&msg); where it appears to be reading the event queue in the natural (unsorted/unprioritized) order. As long as Flash use PostMessage their events will have priority over keyboard and mouse. This is the result of Microsoft's design choice to always handle posted event before user/hardware events. The theory is that 2 hardware events, eg: keydown followed by keyup, can generate multiple posted events; eg: keydown, send char, keyup, which should be processed before the hardware events. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/messagesandmessagequeues/messagesandmessagequeuesreference/messagesandmessagequeuesfunctions/peekmessage.asp As far as I can tell, as long as Flash uses PostMessage (or SendMessage) and not a timer event, moz will always be subject to hangs when Flash is using 100% CPU.
Note: bug 157144 is related to this. Bug 157144 covers the case where Mozilla posts WM_APP events for notification of PLEvents which are used for reflow, timers, network data available, etc. When Mozilla posts these events during the load of long documents the UI will freeze if the user tries to move the top level window or resize it.
Blocks: 151917
we can fix this by have a window proc that checks if user input is pending is if it is then delays the plugin by putting it on a timer with a delay of 0. PRBool nsWindow::ProcessPluginMessage(UINT msg, WPARAM wParam, LPARAM lParam, LRESULT *aRetValue) { *aRetValue = 0; // check if user input is being starved // input waiting so delay flash if (HIWORD(GetQueueStatus(QS_INPUT))) { if (msg == WM_USER+1) { // need to make this generic for all plugins plugin_wparam = wParam; plugin_lparam = lParam; //printf("delay Flash\n"); return ::SetTimer(mWnd, NS_EVENT_THROTTLE_TIMER_ID, 0, (TIMERPROC)EventTimerThrottleCallback); } } nsWindow *someWindow = GetNSWindowPtr(mWnd); if (someWindow) *aRetValue = ::CallWindowProc((WNDPROC)someWindow->GetPrevWindowProc(), someWindow->mWnd, msg, wParam, lParam); return true; }
FYI: I'm working on a more complete solution over in bug 157144.
www.sina.com.cn fails to draw some of the flash areas if I delay these messages: 1121, 1122, 1123
Attached patch WIP patch (obsolete) (deleted) — Splinter Review
This still needs some work: I still need to find out what Flash is doing with message IDs 1121, 1122, 1123. This patch crashes if the Flash plugin is missing but this appears to be in the code carried over from attachment 91206 [details] [diff] [review].
Attachment #91037 - Attachment is obsolete: true
Attachment #93671 - Attachment is obsolete: true
Attachment #96096 - Attachment is obsolete: true
I've opened bug 164084 to discuss whether we should remove the PeekMessage code. Please cc: yourself if interested.
I find that with attachment 91206 [details] [diff] [review] and no Flash plugin moz crashes when I try to go to http://www.sony.com.cn/home/index.htm
Attached file sample java applet (deleted) —
moz crashes displaying this applet
Just to mention another web site with slooooow performance on Mozilla (1.0.1 RC2), Shockwave Flash 6.0 r47 : http://www.citroen.fr (the site is horrible even on IE, but this is not the point !) I think it is related to this bug too.
Attached patch WIP; needs tuneup of the subclassing code (obsolete) (deleted) — Splinter Review
yes, the http://www.citroen.fr site can use 100%CPU and this WIP patch allows gecko to continue to respond to user input.
Attachment #91206 - Attachment is obsolete: true
Attachment #96259 - Attachment is obsolete: true
Should the owner for this bug be reassigned to Peter or bstell, as they are actively working on the issue?
Here's a patch to only the plugins module that will put any WM_USER+1 messages sent to the plugin's window on a normal Gecko PLEvent, like is used other actions such as layout reflow and timers. On my 1.4 Ghz, I get slightly better response from the UI on http://www.overclockmaniak.it/ with my patch than with Brian's but I need to test on a slower system. Since this patch combines the priority of both Flash's events with ours, it's a bit more difficult to lock us up than before however it's still possible to have many Flash plugin instances overwhelm us with events. Also this is quite touchy as when I rescheduled other events besides WM_USER+1, Flash completely stopped working. I think the real fix needs to come from Macromedia. This patch is also still a work in progress as there are still remaining issues to be resolved but please try it out.
Peter, didn't we see the other day how a plugin steels our subclassing? Was it with previous version of Flash?
Jamie: ultimately, this issue needs to be resolved by the plug-in vendor. Any hack that we provide will truly only mask the issue, not resolve it. So, regardless if someone is working on this or not, or if a patch is checked in or not, this bug needs to remain open until such time as the vendor supplies the resolution. Keeping this one assigned to me is fine, I am tracking vendor related issues.
Another site using Flash which drive CPU to 100%: http://disney.com.cn
Attached patch revised patch using PLEvents (obsolete) (deleted) — Splinter Review
Attachment #97862 - Attachment is obsolete: true
Attachment #98279 - Attachment is obsolete: true
Attached patch revised patch that only delays Flash messages (obsolete) (deleted) — Splinter Review
Attachment #98535 - Attachment is obsolete: true
Attached patch patch v.11 (obsolete) (deleted) — Splinter Review
This is slightly modified previous patch. - I think using hash table to create window/instance associations is a bit overkill here, we can do that using Win32 API - I put winproc code in a separate file just to keep thing cleaner - SetProp/GetProp API calls are not supported in Windows CE, is it going to be a problem? From the other hand I found them elsewhere in the Mozilla code.
I worked with Rick Potts a bit today and he noticed that flash runs much faster on Navigator compared to gecko.
Brian's last comment suggests that we have to be careful as we declare issues as Macromedia issues (and NOT Mozilla issues) vs. issues we find are legitimately Mozilla issues. Brian stopped by and showed me how N7's Flash performance varies distinctly from Netscape Communicator 4.x, which displayed Flash animations working faster than on N7. So perhaps this bug does indeed boil down to an issue which we can, at some point, hand off to the plugin vendor. But a *separate bug* should be logged in which we dig a bit deeper into our performance concerns. I guess I'm looking to Brian to log this bug, and to mention that bug number here for reference.
Is the following URL affectected by this precise bug? Pay attention, I loose the control of Mozilla when I use it! Patrick --- http://www.speed.be/ADSL/resultat_adsl.asp
the http://disney.com.cn site has a 100% load from java and does not show a UI lockup
Patrick, yes, running a build built with one of the last few patches does allow UI responsivness for me on this URL: http://www.speed.be/ADSL/resultat_adsl.asp Without the patch, the page doesn't even complete loading for me before the hang. ---- I really like the approach Andrei took in attachment 98665 [details] [diff] [review] in that it creates a new class for our platform specific plugin window code which is currently #ifdef sprinkled throughout.
My concern with using the current PLEvent starvation code is that the paint starvation time of 750 mSec (3000 mSec for win9X) makes using menus very sluggish and cascading menus do not cascade at all since the PLEvent starvation code does not test for timer starvation. I do not find that moz hangs when displaying http://www.speed.be/ADSL/resultat_adsl.asp but the startup time is very long.
s/I do not find that moz hangs/With these fixes I do not find that moz hangs/
*** Bug 151750 has been marked as a duplicate of this bug. ***
*** Bug 168197 has been marked as a duplicate of this bug. ***
Attachment #98612 - Attachment is obsolete: true
We have kind of two separate tasks here: subclassing plugin window and message handling. I think we should benefit from subclassing anyway and must take more carefull approach here, the current patches limit us to 4x plugins only. After talking to Peter who started to look at it, I will continue to try to find the better way of implementing this. Meanwhile, message handling experimenting may continue with the existing patch as it is more or less independent thing, and whatever is found the best eventually will be combined with whatever subclassing approach Peter and I may come up with.
We should be able to solve this problem by rescheduling the WM_USER events as PLEvents after I land the fix for bug 164931. I am planning on making the plevent native notification use timers on WIN32 when a document is not loading. Using the patch in bug 164931 and rescheduling the WM_USER events as PL_Events cut the test case in this bug from 100% to 90% in a Debug build and the UI remained interactive while flash was running. The patch in bug 163528 should also make Mozilla more interactive. It forces the synchronous paint of any pending paint events before processing mouse or keyboard input
I've also experienced this using the final 1.1 version. You can "predictably" get the (almost) 100% CPU when visiting www.theinquirer.net Also, sometimes "No reason" is clear (ie. no flash animations running or visible). Usually "just waiting" brings back the UI, had to kill the app only on a couple of ocasions
Painting with the latest patch occurs much more smoothly.
Attached patch patch v.13 (obsolete) (deleted) — Splinter Review
This patch utilizes Peter's idea of subclassing the plugin window from object frame rather than plugin instance code, because we don't have an access to the plugin instance code in general case. - code moved to layout - new object |nsPluginNativeWindow| introduced, it is derived from |nsPluginWindow| struct and has pure virtuals to manipulate native window stuff - subclassing now happens for all plugins, although all the messages are just relayed to the original winproc, except for WM_USER+1 for Flash when starvation occurs - event handling logic was just taken from the latest Brian's patch, Brian please take a look to make sure I did not break anything.
Attachment #98665 - Attachment is obsolete: true
In general I do not care exactly how this solved. I'm okay with using the PLEvent code for delaying the messages. There are several issues I do care about. Avoiding unneeded malloc's ========================== The patches I have seen so far put all Flash messages on the PLEvent queue. This causes us to use a malloc'd class to hold the event info. In general I prefer to avoid malloc'ing when not necessary. For Flash messages which are projected to come in at 120 messages per second per Flash area I really would strongly prefer not to malloc. For example, the www.sina.com.cn page today has 9 Flash animations which if there were 120 messages per second would mean 1080 malloc's per second. I believe that to avoid malloc'ing for every message we will need to test for starvation locally and only defer the message when starvation is happening. Perhaps we can reuse the PLEvent starvation logic locally. Relatively Prompt Painting =========================== Bug 164931 may allow us to use a much shorter paint starvation time. Since page layout is a one time event we probably want a relatively long time so we do not paint too often. This optimizes layout speed. If menus are sluggish during page layout that seems okay since this will not last long. Flash animations are quite different in that they are a continuous activity so we need prompter painting so menus are usable. The current 3 second starvation for Win9x seems far too long for menus to be usable. Timer Starvation Detection ========================== If we use PLEvents we will need to add timer starvation logic so cascading menus will work.
Blocks: 169071
"Bug 164931 may allow us to use a much shorter paint starvation time." When the fix for bug 164931 is checked in *all* PL_Event notification will be done with a timer except when a document is loading. The input and paint starvation limits will only apply when a page is loading. All painting will complete before the next PL_Event notification is processed. There will not be any paint starvation. If I check in the fix for bug 164931 and we continue to use posted WM_APP events for flash after the page is loaded flash will be able to starve out Mozilla layout, paint and timer events.
"Avoiding unneeded malloc's" this is probably an issue for Mozilla timers as well since they are implemented as a separate thread which posts a PL_Event when the timer expires. DHTML animations often have at least one timer and some have many timers running simultaneously. Maybe we need to create an arena for PL_Events? "Relatively Prompt Painting" The fix for bug 164931 should eliminate this issue for flash animations after the page has loaded. Paints should not be starved for flash or Mozilla after the page has loaded since the timers for PL_Event notification will not fire until the paints have been processed "Timer Starvation Detection" The fix for bug 164931 should also eliminate this issue for flash animations after the page has loaded. If we decide not to fix this issue by converting the WM_USER flash user messages to PL_Events I would suggest always converting the WM_USER flash messages into timer events instead of doing it only when timers, paint, and input are starved. This will significantly lower the CPU utilization on pages with flash animations.
Per Peter's suggestion I made a version that adds a PluginWindowEvent as a member of the ns4xPluginInstance window to avoid the malloc per message. The patch always puts Flash messages on the PLEvent queue. I also applied the patch from bug 164391 which makes the UI responsive even when running http://www.overclockmaniak.it/. I do note that with the patch from bug 164391 that the CPU never goes over about 80% on the 1.8 GHz system I am developing on and the Flash animations seem a bit slower. I heard that Andrei is working on a revision. If desired I could merge this into his new version.
the previous message should read: "bug 164931"
Taking it back since I am one of those who is working on it. Brian, Peter, are we decided on what approach to take eventwise? Brian, you may post your latest version using any of the last three patches since the event code is pretty distinct and I can easily incorporate it to whatever subclassing code I will come up with.
Assignee: beppe → av
I think the problem of always mallocing PLEvents is one beyond the scope of this bug. I've opened bug 169247 to investigate using areans for PLEvents throughout the code.
Andrei: here is a version that uses the PLEvent queue and only malloc's once. The patch in bug 164931 is also required to actually prevent UI starvation.
Attachment #98957 - Attachment is obsolete: true
Depends on: 164931
Attached patch patch v.15 (obsolete) (deleted) — Splinter Review
This patch combines the latest from event handling (the very previous patch) and subclassing. I think it is complete enough and is in working condition. I tested it with Flash, the default plugin, scriptable sample, Java and Real. Please review.
Attachment #99307 - Attachment is obsolete: true
Attachment #99558 - Attachment is obsolete: true
*** Bug 160775 has been marked as a duplicate of this bug. ***
Attached patch patch v.16 (obsolete) (deleted) — Splinter Review
Some Peter's comments from our conversation addressed: - default native window stubs for other platforms are now common and reside in one file - full page mode is added to the mechanism - |nsIPluginInstance->SetWindow| call is now encapsulated in |nsPluginNativeWindow->CallSetWindow| - undoing subclassing is now hidden in the dtor Please review.
Attachment #99611 - Attachment is obsolete: true
Andrei: I think we should add code to remove any pending events when destroying the window. I believe it should look like this: nsPluginNativeWindowWin::~nsPluginNativeWindowWin() { + UndoSubclassAndAssociateWindow(); } PluginWindowEvent* nsPluginNativeWindowWin::GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM aWParam, LPARAM aLParam) { if (!mPluginWindowEvent) mPluginWindowEvent = new PluginWindowEvent; if (mPluginWindowEvent) { mPluginWindowEvent->Init(aWnd, aMsg, aWParam, aLParam); - PL_InitEvent(mPluginWindowEvent, null, &PluginWindowEvent_Handle, PluginWindowEvent_Destroy); + PL_InitEvent(mPluginWindowEvent, this, &PluginWindowEvent_Handle, PluginWindowEvent_Destroy); } return mPluginWindowEvent; }; nsresult nsPluginNativeWindowWin::UndoSubclassAndAssociateWindow() { HWND hWnd = (HWND)window; if (IsWindow(hWnd)) ::RemoveProp(hWnd, NS_PLUGIN_WINDOW_PROPERTY_ASSOCIATION); + // clear any pending events to avoid dangling pointers + nsCOMPtr<nsIEventQueueService> eventService(do_GetService(kEventQueueServiceCID)); + if (eventService) { + nsCOMPtr<nsIEventQueue> eventQueue; + eventService->GetThreadEventQueue(PR_GetCurrentThread(), getter_AddRefs(eventQueue)); + if (eventQueue) { + eventQueue->RevokeEvents(this); + } + } + if (mPluginWindowEvent) { + delete mPluginWindowEvent; + mPluginWindowEvent = NULL; + } + return NS_OK; }
Kevin: does this look correct to you?
Attached patch patch v.17 (obsolete) (deleted) — Splinter Review
- event leak fixed, my fault, thanks, Brian for helping with this one - some minor perfections added to nsPluginNativeWindow base class and its derivative for Windows Please review.
Attachment #99734 - Attachment is obsolete: true
Except for the fix to the assertion this looks fine to me. - NS_ASSERTION(!win, "plugin window already has property"); + NS_ASSERTION(!win || (win == this), "plugin window already has property and this is not us"); I feel competent to review the event queue part, so r=bstell@ix.netcom.com on the event queue part. Perhaps Peter would r= the subclassing part. Thanks for working on this!
Attached patch patch v.17.1 (obsolete) (deleted) — Splinter Review
Very minor change to patch v.17 - fixed assertion in |nsPluginNativeWindowWin::SubclassAndAssociateWindow| after ::GetProp - made property name a bit more unique
Attachment #99749 - Attachment is obsolete: true
Comment on attachment 99762 [details] [diff] [review] patch v.17.1 Carrying over r=bstell on the event part.
Comment on attachment 99762 [details] [diff] [review] patch v.17.1 r=peterl
Attachment #99762 - Flags: review+
am testing av's special build: ran thru flash, qktime, real shockwave, viewpoint, acrobat, still testing. found one crasher on cbsnews.com (video url on home page, click to view..and then while the video is playing, close the window using 'x'..browser crashes). This is using real1 and real8 and happens only on this special build. same thing happens on cnet radio live(cnet.com ,click on 'Listen Live' on bottom right). Confirmed with Beppe. will update once again after finishing my tests.
I'm seeing the same thing Shrir is seeing but only with the build found here: http://warp.mcom.com/u/av/publish_html/mozilla-100CPU.zip If I apply attachment 99762 [details] [diff] [review] to my debug trunk build, I do not crash. I'll double check the optmized build I built with this patch at home later.
I am also testing av's build. Tested flash, qt, java, and real1. I had up to 10 different windows open each with at least one active plug-in (some had several instances). I did not experience reduced performance, the app did not lock up either. I ran through most of the URL's referenced in this bug as well as the macromedia site, the quicktime site, bmwfilms, aim.com, izzy-sings.com. It's looking great. I was able to repro the crash that Shrir describes, but could not repro the crash on the current trunk build.
and WMP too -- forgot to mention that one
Well, I don't build optimized frequently, so maybe I did not do it right way. Could have I missed something? I specified MOZCONFIG with the following three options: ac_add_options --enable-crypto ac_add_options --disable-debug ac_add_options --enable-optimize Then updated the tree, clobbered and rebuild. Zip only contains /bin folder content.
We need to make sure that this is not like my local build is spoiled. I am rebuilding with symbols now, would be nice if somebody can rebuild optimized with the patch on the fresh tree.
Comment on attachment 99762 [details] [diff] [review] patch v.17.1 - In nsObjectFrame.cpp: + nsCOMPtr<nsIPluginInstance> inst; + // we need to finish with the plugin before native window is destroyed // doing this in the destructor is too late. if(mInstanceOwner != nsnull) { - nsIPluginInstance *inst; - if(NS_OK == mInstanceOwner->GetInstance(inst)) Why move the declaration of |inst|? Is it used in a broader scope now? Also: - inst->SetWindow(nsnull); + window ? window->CallSetWindow(nullinst) : inst->SetWindow(nsnull); For readability's sake I'd suggest you rewrite the above with an if statement, i.e.: + if (window) { + window->CallSetWindow(nullinst); + } else { + inst->SetWindow(nsnull); + } - In nsPluginInstanceOwner::GetWindow(): { - aWindow = &mPluginWindow; + aWindow = mPluginWindow; return NS_OK; } Should we assert that mPluginWindow is not null here? Before this change a caller never got null back from this method, now they could possibly get null back... - In nsPluginHostImpl.cpp: + rv =aInstance->GetPeer(getter_AddRefs(peer)); Be consistent with space around '=' (I know you didn't introduce this inconsistency here, but clean it up while you're at it). - In nsPluginNativeWindow.h: +#ifndef _nsPluginWindow_h_ +#define _nsPluginWindow_h_ ... +#endif //_nsPluginWindow_h_ Those should all be _nsPluginNativeWindow_h_ - In nsPluginNativeWindow.cpp: +class nsPluginNativeWindowPLATFORM : public nsPluginNativeWindow { +public: + nsPluginNativeWindowPLATFORM(); + ~nsPluginNativeWindowPLATFORM(); +}; Since nsPluginNativeWindow has virtual methods, you'll need to mark the destructor in this class as virtual too. +nsPluginNativeWindowPLATFORM::nsPluginNativeWindowPLATFORM() +{ Don't you want to call nsPluginNativeWindow's ctor here too (for clarity and consistency, if nothing else)? - In nsPluginNativeWindowWin.cpp: +static NS_DEFINE_CID(kEventQueueServiceCID, NS_EVENTQUEUESERVICE_CID); +static NS_DEFINE_IID(kCPluginManagerCID, NS_PLUGINMANAGER_CID); // needed for NS_TRY_SAFE_CALL Don't use NS_DEFINE_IID for defining CID's, use NS_DEFINE_CID in both those places. +class nsPluginNativeWindowWin : public nsPluginNativeWindow { +public: + nsPluginNativeWindowWin(); + ~nsPluginNativeWindowWin(); Mark this dtor virtual too. +private: + nsresult SubclassAndAssociateWindow(); + nsresult UndoSubclassAndAssociateWindow(); + +public: + nsresult CallSetWindow(nsCOMPtr<nsIPluginInstance> &aPluginInstance); Mark this virtual too, for clarity if nothing else. + +public: + // locals + WNDPROC GetWindowProc(); + PluginWindowEvent * GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM aWParam, LPARAM aLParam); + +private: + WNDPROC mPluginWinProc; + PluginWindowEvent *mPluginWindowEvent; What's up with all those public and private keywords? :-) Can't you reorder things a bit so you only need one of each? - In ProcessFlashMessageDelayed() First of all, this method returns an nsresult and based on that result we decide if the message was processed or not. That's IMO not clean, nsresults have special meaning, and since we don't use the return value for anything other than to know if the message was processed or not, don't use an nsresult. How about using a PRBool here in stead? I.e. the method returns PR_TRUE if the message was processed and PR_FALSE if it wasn't processed. Second, we do a fair bit of work here if we're processing a WM_USER+1 message. We should at the very least use a boolean to cache if the plugin is flash or not (i.e. have a boolean in the native window or somesuch, or set a property on hWnd) so that we don't need to get the plugin mime type for every message and do a string compare. Additionally, we should really also cache the nsIEventQueueService on the plugin window so that we don't need to hit the service manager every single time we go through this code (which can be really frequently). Also, the code: + nsCOMPtr<nsIEventQueue> eventQueue; + eventService->GetThreadEventQueue(PR_GetCurrentThread(), getter_AddRefs(eventQueue)); should be changed to: + nsCOMPtr<nsIEventQueue> eventQueue; + eventService->GetSpecialEventQueue(nsIEventQueueService::UI_THREAD_EVENT_QUEUE, getter_AddRefs(eventQueue)); Since that call is faster (AFAIK) than the GetThreadEventQueue. We're quaranteed to always be on the UI thread here, right? Then: + PluginWindowEvent *pwe = win->GetPluginWindowEvent(hWnd, msg, wParam, lParam); + if (pwe) { + eventQueue->PostEvent(pwe); + } GetPluginWindowEvent() always returns the same event, are we guaranteed that the event is not already in use? I don't see any code that guarantees that, and I don't feel good about making the assumption that flash will never post a WM_USER+1 message before the last one has been processed. Or did I miss something here? - In nsPluginNativeWindowWin::GetPluginWindowEvent(): + if (!mPluginWindowEvent) + mPluginWindowEvent = new PluginWindowEvent; + + if (mPluginWindowEvent) { ... + return mPluginWindowEvent; You can avoid the double check on mPluginWindowEvent (in the case when there is an mPluginWindowEvent) if you'd write the above as: + if (!mPluginWindowEvent) { + mPluginWindowEvent = new PluginWindowEvent; + + if (!mPluginWindowEvent) { + return nsnull; + } + } ... + return mPluginWindowEvent; And this code really should make sure that it doesn't return an event that's already in use. Maybe the code that fires this event can null out mPluginWindowEvent when it fires the event and then once the event is destroyed mPluginWindowEvent could be set to point to the event again, unless it already points to another event in which case you'd just free the memory used by the handled event. - In PLUG_NewPluginNativeWindow(): + *aPluginNativeWindow = new nsPluginNativeWindowWin(); + + return *aPluginNativeWindow ? NS_OK : NS_ERROR_FAILURE; Use NS_ERROR_OUT_OF_MEMORY to flag OOM, not NS_ERROR_FAILURE. - In nsPluginViewer.cpp: } } - - return NS_OK; } Please clean up that weird indentation. Remove the tab(s). - In pluginInstanceOwner :: GetWindow(): { - aWindow = &mPluginWindow; + aWindow = mPluginWindow; return NS_OK; } Again, should we assert if mPluginWindow is null? Other than that, the changes look reasonable to me, but I heard rumors about a crash, so I'll hold my sr for now, and the above needs some fixing too...
Attachment #99762 - Flags: needs-work+
I think I know why we crash. We don't want to subclass after we called SetWindow(null). Patch is coming.
This fixes real crash on sbsnews. Please tell me if it looks reasonable.
Johnny, thanks for great comments. Brian and me will address all of them in the incoming patch. I just have a couple of things on which I disagree: > Since nsPluginNativeWindow has virtual methods, you'll need to mark the > destructor in this class as virtual too. Destructor of the base class is marked virtual. According to MSVC help "The virtual keyword is needed only in the base class's declaration of the function; any subsequent declarations in derived classes are virtual by default." Is not this standard? Also, is not the destructor always virtual? I will change it but just wanted to know for sure for the future. >+nsCOMPtr<nsIEventQueue> eventQueue; >+eventService->GetSpecialEventQueue(nsIEventQueueService::UI_THREAD_EVENT_QUEUE, getter_AddRefs(eventQueue)); > > Since that call is faster (AFAIK) than the GetThreadEventQueue. We're > quaranteed to always be on the UI thread here, right? No. Java is not in the UI thread AFAIK, and being in our thread or not is up to plugins themselves, this is not under our control. I would not change this unless somebody convinces me.
Attached file WIP: rework the event code (obsolete) (deleted) —
Attachment #99950 - Attachment is obsolete: true
Attached patch patch v.18 (obsolete) (deleted) — Splinter Review
- crash on closing plugin window is fixed - some clean up and effectiveness is done, thanks, Johnny for comments! - event handling logic, when new event comes before old is handled, fixed, thanks, Brian Please review.
Attachment #99762 - Attachment is obsolete: true
Attachment #99923 - Attachment is obsolete: true
1) could we change: - PRBool InUse() { return !(mWnd==NULL && mMsg==0 && mWParam==0 && mLParam==0); }; + PRBool InUse() { return (mWnd!=NULL || mMsg!=0); }; 2) since we use "(mWnd!=NULL || mMsg!=0)" to tell that the event is in use can we add an assertion void PluginWindowEvent::Init(HWND aWnd, UINT aMsg, WPARAM aWParam, LPARAM aLParam) { + NS_ASSERTION(aWnd!=NULL && mMsg!=0, "invalid plugin event value"); NS_ASSERTION(mWnd==NULL && mMsg==0 && mWParam==0 && mLParam==0,"event already in use"); PluginWindowEvent* nsPluginNativeWindowWin::GetPluginWindowEvent(HWND aWnd, UINT aMsg, WPARAM aWParam, LPARAM aLParam) { PluginWindowEvent *event; if (mPluginWindowEvent.InUse()) { + // We have the ability to alloc if needed in case in the future some plugin + // should post multiple PostMessages. However, this could lead to many + // alloc's per second which could become a performance issue. If/when this + // is asserting then this needs to be studied. See bug 169247 + NS_ASSERTION(1, "possible plugin performance issue"); event = new PluginWindowEvent;
> GetPluginWindowEvent() always returns the same event, are we guaranteed that > the event is not already in use? This is a good point. While it is true that currently a given Flash only posts one at a time this could change in the future. Code to alloc was added in attachment 99954 [details] [diff] [review]. We should have the assertion mentioned in my comment #167 to notice if this code is being used so we can check for performance issues.
used the newer special build: stuff seems great, the crasher is gone, other plugins work fine too. However, another issue came up... steps: 1. go here : http://www.mozilla.org/quality/browser/front-end/testcases/plugins/ra5.html 2. observe that a dialog comes up saying 'real network is being contacted...' which ultimately comes back saying 'nothing was found to play this clip' (.ram which runs just fine on 4.79) 3. Click CANCEL on this dialog and click BACK to go back to the previous page... Result : Cpu goes to 100% and stays there...previous page loads after a big delay but cpu is still 100% and UI navigation is impossible.
Av, yes, if a base class' method (including the destructor) is marked virtual the virtual keyword is implied on any overridden methods, but for clarity it's always good to put the keyword in there anyways. And in your case, the base class' destructor is virtual so you're ok, at least kinda. You're out on thin ice here though since nsPluginNativeWindow (which is the class that has the destructor that's actually marked virtual) inherits from nsPluginWindow which is a struct that doesn't have a destructor, let alone a virtual one. This causes weird things to happen, depending on the compiler you're using, some cases are worse than others. Independent of the compiler, if someone was to delete one of these nsPluginNativeWindows objects through an nsPluginWindow pointer w/o doing the proper cast, the nsPluginNativeWindow's destructor wouldn't be called. Same goes for the destructors on any other classes that inherits nsPluginNativeWindow, of course. Using MSVC, you'd even crash if you'd do the above. The reason for this is that when you instanciate a new nsPluginNativeWindow the MSVC compiler will put the vtable for the virtual methods in the first 4 bytes of the chunk of memory allocated for this new object. This means that the nsPluginWindow is at a 4-byte offset from the start of the memory that was allocated, and thus, if you cast this object to a nsPluginWindow you'll get a pointer to the nsPluginNativeWindow + 4 bytes, and delete (and internally free) won't be able to free that memory (due to the 4-byte offset). Using gcc, you'd be ok since it seems to put the vtable at the end of the memory allocated for a class, but on windows you're kinda screwed. This is fragile, at the very least. I assume we can't change nsPluginWindow any more, right? If we could, this whole mess could be solved by adding a virtual destructor to the nsPluginWindow struct, but w/o that we're stuck. The other option which is a bit more work and a bit trickier to start with, but much more robust once done, is to not mix C-style structs with C++ classes that have virtual methods. IOW, you'd need to rewrite the nsPluginNativeWindow so that there are no more virtual methods, you'd need to use function pointers or something like that to make that code work w/o virtual methods in nsPluginNativeWindow. You'd need to manually do the cleanup of everything that a nsPluginNativeWindow holds on to, and that would get kinda messy :-(. I'm not sure what to suggest here, if you think the code is safe and you know that a nsPluginNativeWindow will never be deleted through a pointer to nsPluginWindow, then I guess we can live with that if we choose to accept this fragility. I guess you could add a protected non-virtual destructor to nsPluginWindow so that it's impossible to call delete on a pointer to a nsPluginWindow directly (that would not change the layout of the struct), would that make sense? (You'd need to wrap that in an |extern "C++" { }| block to leave the struct usable from C.) As far as the threading issues goes, you're right, Java and other plugins may run on other threads, but flash doesn't AFAIK, and the code in question would never be hit using other plugins. Either way, it's safer to do what you did, and who knows, maybe flash does, or will some day, run on other threads as well... Let me know what you think about the above virtual method stuff and if you think that what you have is what you want then I'll go over this patch once more and sr it for you.
Johnny, thanks for the clarification! I was not objecting to marking functions virtual (in fact, I did it in the last patch, v.18), I just wanted to make sure I understand your motivation. It's perfectly fine with me. You are right we cannot change nsPluginWindow -- it stems from 2.0 days and everybody is relying on the way it is now. The good news is nobody except us is supposed to delete this object, so I think we are fine here as we will do proper cast as necessary. Adding a protected non-virtual destructor to nsPluginWindow seems reasonable to me, as it would catch the crash situation you described on the compilation stage.
*** Bug 162007 has been marked as a duplicate of this bug. ***
I'd be cool with this if we'd add a protected destructor to nsPluginWindow, that way we won't do the wrong thing in our code...
it appears to be in the real player plugin PNEN3260
Looks like I spoke too soon. We cannot add a destructor to |nsPluginWindow|. This struct is used as a union member in other structs in nsplugindefs.h. Johnny, I think we are still fine, because only we in our code will delete it. If you think is not acceptable, I can rewrite the code so that this SetWindow call is not encapsulated and we don't derive from |nsPluginWindow| at all. This will be less clear and not as convenient though. Please let me know.
my last comment, #174, is regarding the infinite loop mentioned in comment 169 http://bugzilla.mozilla.org/show_bug.cgi?id=132759#c169
the real player infinite loop of comment 169 appears (as Andrei mentioned in a private email) to happen for RealPlayer 8 but not RealOne.
Attached patch patch v.19 (obsolete) (deleted) — Splinter Review
- more accurate implementation of subclassing/unsubclassing. This fixed the reported Real 100% CPU problem - big comment about potential danger during destruction of the |nsPluginNativeWindow| object added to nsPluginNativeWindow.h - latest Brian's additions to event handling incorporated
Attachment #99954 - Attachment is obsolete: true
I have tested the latest test build with the above patch included. I confirm all the issues that had been reported have now got fixed(cpu usage issue for real). Did extensive functionality testing of flash/scriptable flash, shockwave director,quicktime,ipix,viewpoint,realplayer,acrobat (tier 1 & 2 plugins). Everything is working fine.
Attachment #100352 - Flags: review+
Comment on attachment 100352 [details] [diff] [review] patch v.19 r=peterl
Comment on attachment 100352 [details] [diff] [review] patch v.19 - In nsPluginHostImpl::HandleBadPlugin(): - rv =instance->GetPeer(getter_AddRefs(peer)); + rv =aInstance->GetPeer(getter_AddRefs(peer)); Since you're already touching this code, add a space after '='. - In nsPluginNativeWindowWin::GetPluginWindowEvent(): + event = new PluginWindowEvent; Add () after the type name. I guess it doesn't matter to the compiler, but the parens are normally there, so for consistency's sake if nothing else... + if (event) { + event->Clear(); No point in calling Clear() here, the ctor of PluginWindowEvent call Clear() for you. + event->SetIsAlloced(PR_TRUE); + } + } + else { + event = &mPluginWindowEvent; + } + + if (!event) { + return nsnull; + } This last if check should be moved into the if (mPluginWindowEvent.InUse()) check since &mPluginWindowEvent is never null. sr=jst if you change that when you check in.
Attachment #100352 - Flags: superreview+
We also need to null out the |nsPluginNativeWindow| struct in generic platform implementation, as Linux checks on some fields before accessing them (ws_infor). Thanks Serge for pointing this out.
Attached patch patch v.20 (deleted) — Splinter Review
- the above comments are addressed - priming |nsPluginWindow| struct fields with null added to the default |nsPluginNativeWindow| constructor platform implementations - added a new method |nsPIPluginHost::DeletePluginNativeWindow| to pair with |nsPIPluginHost::NewPluginNativeWindow|, this is done mainly because some compilers don't like to delete objects using pointers to the base class, so we need to cast it back to platform specific type before deleting.
Attachment #100352 - Attachment is obsolete: true
Comment on attachment 100631 [details] [diff] [review] patch v.20 carrying over r= and sr=
Attachment #100631 - Flags: superreview+
Attachment #100631 - Flags: review+
Checked in to the trunk. Marking fixed. Please try it heavily.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Using today's trunk build (2002-09-26-08-trunk), I get crash consistently when dragging around the flash pop-up on http://www.ccidnet.com I'm on Simplified Chinese XP.
http://www.overlookmaniak.it and http://www.yesky.com works for me though with the same build. The UI lockup problem is gone and CPU usage dropped down quite a bit for these two URLs.
Using 2002092608 on Win2k http://www.ccidnet.com hangs mozilla (or hogs so much CPU that it cannot be used)
Problem with ccidnet.com seems to be yet another issue. In the console output I see that while loading the page layout keeps creating webshells and gives up when the number reaches 100. I remember filing a bug on this the other day. This is likely because bad page coding causes some sort of recursion, e.g. window.reload in JavaScript will cause this, and layout does not have a mechanism to handle such a situation protectively.
Here is the similar bug -- bug 126466, it even has the same original url. It is marked fixed in May. Regressed?
And the problem exists even without Flash. Should we reopen that bug?
I compared the 2002-09-24-08-trunk with 2002-09-26-08-trunk (2002-09-25-08-trunk fails to install) http://www.overclockmaniak.it 9-24 = hard freeze, no ui response including unable to close with "x" 9-26 = a bit sluggish but responds to all UI input http://www.yesky.com 9-24 = hard freeze, no ui response including unable to close with "x" 9-26 = a bit sluggish but responds to all UI input http://www.ccidnet.com 9-24 = hard freeze, no ui response including unable to close with "x" 9-26 = this freezes but in an odd way; memory use goes up a down, app can be moved around, app appears not to paint completely, the "x" button visually responds but instead of closing the app a dialog appears saying the app is not responding and asking if the app should be ended.
On http://www.ccidnet.com I note that when I move the mouse over a link then the flash animations stop.
When I disable flash by removing it from the plugins dir http://www.ccidnet.com comes up, the flash plugin areas show the missing plugin symbol, and the page has a low CPU load. When I move the mouse over a link then the CPU load goes to 100% and the app freezes (marquee stops, jpgs/gifs stop) but I can still move the app and the "x" button visually responds but brings up a dialog saying the app is not responding.
With the 9-24 build if I cover the flash areas the CPU load is about 80% and the "x" is visually responisive until I move the mouse over a link then the load goes to 100% and the app stop responding. I fairly positive that the http://www.ccidnet.com issue is not related to this bug's checkin.
*** Bug 155001 has been marked as a duplicate of this bug. ***
*** Bug 171057 has been marked as a duplicate of this bug. ***
guess I can mark this bug verif now.
Status: RESOLVED → VERIFIED
*** Bug 154070 has been marked as a duplicate of this bug. ***
***IF*** we were to land this on the 1.0 branch, what other patches do we need? Do we need the patches for bug 164931 and bug 163528?
Kevin's work in bug 164931 is a companion fix with this bug, so yes it will also need to go in. I'm not sure of bug 163528, looking at the fix date, it does not seem to be related, but I will leave that up to av
I think this description is accurate. Bug 163528 is not needed for the patch in the present bug to work.
*** Bug 179032 has been marked as a duplicate of this bug. ***
*** Bug 179752 has been marked as a duplicate of this bug. ***
Blocks: 507216
(In reply to Henrik Gemal from comment #10) > http://www.ananova.com/assets/ticketsbanner.swf is also a 100% > http://www.dailyhosting.net/blog is also a 100%
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: