Open Bug 648353 Opened 14 years ago Updated 2 years ago

when processing a large volume of internal events we spend too much time checking for native events

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

x86
macOS
defect


People

(Reporter: gal, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(5 files)

No description provided.
Attached file test case (deleted) —
This is something I see all the time with the mac event loop... I _think_ we do it to prevent starvation of user events. But yes, it really hurts us perf-wise.
Based on sysprof, the situation is a bit less bad on linux. "Only" 25% is under DoProcessNextNativeEvent, not 50% like on OSX.
Still very lousy if you ask me.
Can we use a timer to control the rate at which we check for native events?
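A minimal sketch of that rate-limiting idea, assuming a plain timestamp check rather than an actual nsITimer; the class and constant names below are illustrative only, not Gecko code:

#include <chrono>

// Illustrative throttle: only allow a native-event poll once every
// kNativeEventInterval. Everything here is a sketch, not the real appshell.
class NativeEventThrottle {
  using Clock = std::chrono::steady_clock;
  static constexpr auto kNativeEventInterval = std::chrono::milliseconds(20);
  Clock::time_point mLastNativeCheck = Clock::now();

public:
  // Returns true when enough time has passed to justify polling the
  // (comparatively expensive) native event queue again.
  bool ShouldCheckNativeEvents() {
    const auto now = Clock::now();
    if (now - mLastNativeCheck < kNativeEventInterval) {
      return false;  // skip the native poll this time around
    }
    mLastNativeCheck = now;
    return true;
  }
};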
#6 would also help with the couple percent we spend in polling the timer in AppShellBase.
Or could we just change nsBaseAppShell::OnProcessNextEvent a bit more, like removing the performance hint? I have a patch which seems to work well with the testcase on my Linux laptop (it cuts 33% of the processing time and the UI stays responsive). I need to run it through tryserver to see if it regresses Tp badly. I have no idea how well it works on other platforms.
Can you post the patch? I am curious what it looks like.
Attached patch wip (deleted) — Splinter Review
So the idea is to process native events a bit less often, but when we do process them, to process as many as possible. The current else {} block in nsBaseAppShell::OnProcessNextEvent, where DoProcessNextNativeEvent(PR_FALSE); is called just once, doesn't seem to behave too well. This is all very experimental.
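Roughly, the idea reads like the following sketch. This is not the actual wip patch: the counter, the ratio, and the stubbed DoProcessNextNativeEvent are all hypothetical.

// Stand-in for nsBaseAppShell::DoProcessNextNativeEvent(bool aMayWait),
// which returns true if a native event was processed. Stubbed for this sketch.
static bool DoProcessNextNativeEvent(bool /*aMayWait*/) {
  return false;
}

static void OnProcessNextEventSketch() {
  static unsigned sGeckoEventCount = 0;
  const unsigned kNativeEventRatio = 32;  // hypothetical: poll natively every N Gecko events

  if (++sGeckoEventCount % kNativeEventRatio != 0) {
    return;  // most of the time, stay on the Gecko event path
  }

  // When we do poll, drain whatever native events are pending instead of
  // handling just one, so user input does not pile up between polls.
  while (DoProcessNextNativeEvent(/*aMayWait=*/false)) {
  }
}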
To prevent starving either user events or web events, don't we want to process both in small chunks as opposed to big batches?
If we can't keep up with native events, I think we are kinda screwed anyway. No point in delaying them. No?
I seem to recall there being cases (on Windows?) where we can get as many native events as we ask for... Maybe during resize event loops? But I might be misremembering. It's been a while, and I was watching from the sidelines.
Resizes and native modal dialogs spin their own event loops. (I don't know how Gecko events get processed in that model.)
(In reply to comment #13)
> I seem to recall there being cases (on Windows?) where we can get as many
> native events as we ask for... Maybe during resize event loops? But I might
> be misremembering. It's been a while, and I was watching from the sidelines.

That was a bit different - in that case Windows internal modal dispatch loops were prioritizing the native msg we post that triggers gecko event processing, starving input events. (Windows and mozilla prioritize differently.) The result was locked up native ui actions like window resize.

http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsAppShell.cpp#275
(In reply to comment #11)
> To prevent starving either user events or web events, don't we want to process
> both in small chunks as opposed to big batches?

Well, the wip patch makes us process user events in smaller chunks, yet allows gecko events to get processed too.
The patch seems to work on tryserver, but it does trigger random oranges quite often. I mean oranges that have already been reported before. I would assume that in many cases the tests are buggy, or some other code has wrong assumptions.
For Windows builds, a little bit of general testing shows everything working as expected. The test case returns 375->450 in both the try build and the latest nightly.
No longer blocks: 648850
Attached patch simpler (deleted) — Splinter Review
Don't remove perfmode, since I suspect the removal caused some twin regression. Uploaded to tryserver.
(In reply to comment #17)
> The patch seems to work on tryserver, but it does trigger random oranges
> quite often. I mean oranges that have already been reported before. I would
> assume that in many cases the tests are buggy, or some other code has wrong
> assumptions.

I'd be very cautious with this patch then...
Yeah, need to probably fix those tests before landing this kind of change.
(In reply to comment #22)
> Yeah, need to probably fix those tests before landing this kind of change.

I meant, do we know that it only affects tests in a negative way? Could it also affect web content in a negative way, without us realizing?
My *guess* is that those (already without the patch randomly failing) tests have some bad assumptions about when timers and runnables run.
(In reply to comment #24)
> My *guess* is that those (already without the patch randomly failing) tests
> have some bad assumptions about when timers and runnables run.

That's interesting. Do you have a set of those tests? They may be among our flaky tests which fail intermittently...
I posted the patch to tryserver and this time I got the following failures. No new random failures.

Bug 608206 - Intermittent timeout in test_prompt.html | Test timed out.
* uses nsITimer

Bug 632102 - Intermittent test_bug629838.html | Got MozAfterPaint event 25 time(s).
* has bogus assumptions about how often events fire. There is a patch to fix the test.

Bug 645267 - Intermittent failure in layout/reftests/svg/as-image/img-and-image-1.html, svg-image-recursive-1a.svg, svg-image-recursive-1b.svg, svg-image-recursive-2a.svg and svg-image-recursive-2b.svg | image comparison (==)
* not sure about this. Happens quite often anyway.

Bug 637806 - Intermittent failures in test_titlebar.xul
* this one uses setTimeout

Bug 542928 - [Linux] browser_bug511456.js | application timed out after 330 seconds with no output
* common one

Bug 626998 - Intermittent failure in browser_bug553455.js | Test timed out
* There are patches for this

Bug 620721 - Intermittent test_timeupdate_small_files.html | Test timed out. followed by shutdown timeout
* relies on setTimeout
Patch looks great. I would say land it. If this patch does alter the behavior of some test, that's most likely due to a brittle test and should be fixed anyway.
(In reply to comment #27)
> Patch looks great. I would say land it. If this patch does alter the behavior
> of some test, that's most likely due to a brittle test and should be fixed
> anyway.

I don't agree; we shouldn't just land stuff which can cause tests to fail more frequently without fixing the issues before landing. Let's take a look at these:

(In reply to comment #26)
> I posted the patch to tryserver and this time I got the following failures.
> No new random failures.

> Bug 608206 - Intermittent timeout in test_prompt.html | Test timed out.
> * uses nsITimer

Can you please trigger this test multiple times and see how frequently it fails? The frequency should be ~0.05 times per test run (which is pretty low): <http://brasstacks.mozilla.com/orangefactor/?display=Bug&endday=2011-04-26&startday=2011-02-25&bugid=608206>

> Bug 632102 - Intermittent test_bug629838.html | Got MozAfterPaint event 25 time(s).
> * has bogus assumptions about how often events fire. There is a patch to fix the test.

Then this bug should depend on bug 632102. Can you try pushing this patch together with the patch on bug 632102 to see if that helps the situation?

> Bug 645267 - Intermittent failure in layout/reftests/svg/as-image/img-and-image-1.html, svg-image-recursive-1a.svg, svg-image-recursive-1b.svg, svg-image-recursive-2a.svg and svg-image-recursive-2b.svg | image comparison (==)
> * not sure about this. Happens quite often anyway.

Yeah, this one is really flaky, and I don't think your patch here can affect it in any way.

> Bug 637806 - Intermittent failures in test_titlebar.xul
> * this one uses setTimeout

I think we need to fix this bug before this can land.

> Bug 542928 - [Linux] browser_bug511456.js | application timed out after 330 seconds with no output
> * common one

Actually it's not that common: <http://brasstacks.mozilla.com/orangefactor/?display=Bug&endday=2011-04-26&startday=2011-02-25&bugid=542928&tree=all> How often does it happen for you on the try server?

> Bug 626998 - Intermittent failure in browser_bug553455.js | Test timed out
> * There are patches for this

It seems to have been fixed. If you update to tip, reapply the patch, and push it to try, this shouldn't happen again.

> Bug 620721 - Intermittent test_timeupdate_small_files.html | Test timed out. followed by shutdown timeout
> * relies on setTimeout

This one also seems to be close to getting fixed.
Depends on: 632102, 637806, 626998, 620721
I don't see how it makes sense to gate patches on shitty tests. If a test is clearly flaky and the patch is unrelated and merely causes the alignment of the sun and the moon to change enough for a test to fail more often, we don't win anything from not landing it. Such tests should be disabled until they are fixed.
(In reply to comment #29)
> I don't see how it makes sense to gate patches on shitty tests. If a test is
> clearly flaky and the patch is unrelated and merely causes the alignment of
> the sun and the moon to change enough for a test to fail more often, we don't
> win anything from not landing it. Such tests should be disabled until they
> are fixed.

Landing something which increases the frequency of intermittent failures is going to make the lives of everyone harder. There is a cost to landing such patches. Like I said in comment 28, a lot of these issues have a fix at hand, and for those which don't, we should advocate on the respective bugs for more attention. None of these bugs seem like big issues to me, but if for some reason we can't find anybody to work on some of them, we should disable the test with the blessing of the respective module owner.
Also, Olli, can you please do another try server test run from the tip so that we have more data on the frequency of the failures? Thanks!
Sorry, forgot to push to try. A new build is coming http://tbpl.mozilla.org/?tree=Try&rev=81bb5e879230
This time the failures were Bug 662800, Bug 636379, Bug 632117, and REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/491323-1.xul | image comparison (==)
Comment on attachment 525490 [details] [diff] [review]
simpler

gal, bz, what do you think of landing this? If this causes any problems, it would be easy to just back it out.
Attachment #525490 - Flags: review?(gal)
Attachment #525490 - Flags: review?(bzbarsky)
So I'm trying to understand this patch. If we get into nsBaseAppShell::OnProcessNextEvent and we recently processed a native event so we skip native event processing per this patch... what will make sure we process those native events eventually?
Comment on attachment 525490 [details] [diff] [review]
simpler

I uploaded this once again to tryserver. It seems like this is still causing too many random (usually known) failures.
Attachment #525490 - Flags: review?(gal)
Attachment #525490 - Flags: review?(bzbarsky)
(In reply to comment #37)
> So I'm trying to understand this patch.
>
> If we get into nsBaseAppShell::OnProcessNextEvent and we recently processed
> a native event so we skip native event processing per this patch... what
> will make sure we process those native events eventually?

Hmm, good question. But if there is such a problem, it exists also without the patch: if we're in perfmode, we can easily not process native events, and even in non-perfmode we may not process all the native events.

/me reads some more code...
Ah, fair. I sort of wonder what determines whether we're in perf mode or not...
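One possible safeguard for the starvation concern above (an assumption for illustration, not something this patch is known to do) would be a hard cap on how long the native poll may be skipped:

#include <chrono>

// Sketch only: whatever throttling is in place, force a native poll if more
// than kMaxSkipTime has elapsed since the last one, so native events cannot
// be starved indefinitely. Names and the 50 ms bound are hypothetical.
class NativePollGuard {
  using Clock = std::chrono::steady_clock;
  static constexpr auto kMaxSkipTime = std::chrono::milliseconds(50);
  Clock::time_point mLastNativePoll = Clock::now();

public:
  // Returns true if we must poll natively now, regardless of throttling.
  bool MustPollNative() const {
    return Clock::now() - mLastNativePoll >= kMaxSkipTime;
  }

  void NotePolled() { mLastNativePoll = Clock::now(); }
};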
Attached patch safer? (deleted) — Splinter Review
Uploaded this (perhaps a tiny bit safer) patch to tryserver.
Component: Event Handling → User events and focus handling
Severity: normal → S3