Closed Bug 177958 Opened 22 years ago Closed 22 years ago

DHTML Performance Regression

Categories

(Core :: Layout, defect, P3)

x86
Windows XP
defect

Tracking


RESOLVED WONTFIX
Future

People

(Reporter: kerz, Unassigned)

References


Details

(Keywords: perf, regression)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2b) Gecko/20021023 Netscape/7.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2b) Gecko/20021023 Netscape/7.0+

We're running some tests using the DHTML Pageload suite, and we've found that one of the tests' times has risen a huge amount compared to the 1.0 branch. If you go to the above URL, the main problem is with the slidein test. It's taking 3-4 minutes to finish, where it takes no more than 15 seconds on the branch.

Reproducible: Always

Steps to Reproduce:
1. Run the pageload test with a branch build
2. Run the pageload test with a trunk build
3. Compare

Actual Results: Trunk build's results are terrible.

Working on a standalone testcase for that page. You can see the source of the page at http://pageload.craptastic.org/pages/dhtml/slidein/index.html but it won't do anything, because the JS gets triggered by the test script.
Unconfirmed? I think not. CCing some folks. The internal pageload server for this is http://cowtools.mcom.com/page-loader-dhtml/loader.pl
Status: UNCONFIRMED → NEW
Ever confirmed: true
This is principally due to the fix for bug 123273 - "setTimeout(something, 0) causes 100% CPU constant" - which only applies to the trunk. The fix in that case was to set a minimum of 10msec as the timeout time to prevent a flood of timers. If I modify the testcase to explicitly use a timeout of 10msec and run that test with a branch build, I get essentially the same result as I do in a trunk build. (In other words, this change in performance is the inevitable result of setting that 10msec minimum timeout time.)

However (I'm a little curious about this), I tried backing out the fix in GlobalWindow.cpp in a trunk build and running the testcase with a timeout of 1msec. The test starts out moving the divs across the page quite rapidly, but then "bogs down", presumably overrun by timer requests. I'm wondering what particular change on the trunk makes us more susceptible to this 'overrun' than we are on the branch (although I vaguely recall that some significant changes in timer code landed after we branched).
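A rough model of why the clamp stretches the test time (a sketch only; the 10 ms floor is from the bug 123273 fix, but the frame counts below are illustrative, not taken from the real suite):

```javascript
// Sketch: how a minimum-timeout clamp stretches a setTimeout-driven
// animation. MIN_TIMEOUT_MS mirrors the 10 ms floor added on the trunk;
// the frame count used below is hypothetical.
const MIN_TIMEOUT_MS = 10;

// Effective delay after clamping a requested setTimeout() interval.
function clampedDelay(requestedMs) {
  return Math.max(requestedMs, MIN_TIMEOUT_MS);
}

// Lower bound on total wall-clock time for an animation that schedules
// `frames` moves via setTimeout(step, requestedMs).
function minAnimationTimeMs(frames, requestedMs) {
  return frames * clampedDelay(requestedMs);
}

// With a requested 1 ms interval, 1000 frames take at least 10 seconds
// on a clamped build, versus roughly 1 second if 1 ms were honored.
```

This is why explicitly using a 10 msec timeout on the branch reproduces the trunk numbers: the clamp, not rendering speed, sets the floor on the total time.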
> it's taking 3-4 minutes to finish, where it takes no more than 15 seconds on
> the branch

Funny. For me, it takes about 70 seconds on trunk, 18 seconds on branch, for win2k/500MHz. (But then, in some ways, this test is really about timer resolution (and OS/system dependencies), and not so much about painting/reflow/etc.)
Blocks: 21762
QA Contact: gerardok → trix
jst and kevin, can you guys help take a look at this major DHTML regression?
This is probably the result of the fix for http://bugzilla.mozilla.org/show_bug.cgi?id=164931, which changes the PLEvent notification mechanism to use timers 2 seconds after the document is loaded. John, does it "bog down" 2 seconds after the page loads?

I agree with John's assessment that the testcase http://pageload.craptastic.org/pages/dhtml/slidein/index.html is only measuring the frequency of the timer callback, not the reflow/painting performance when moving the object. For more detail see http://bugzilla.mozilla.org/show_bug.cgi?id=164931#c24
Yeah, I see a degradation of performance on all the tests in the suite about 2 seconds after they start.
I'm currently trying to repair my windows box, but from memory, yeah, it did seem to 'bog down' on that test after about 2 seconds.
I think this bug is a WONTFIX. The new behavior of limiting the timer frequency is intended and provides up to 100 frames per second for animations, which is plenty. This also matches IE's behavior: IE uses native timers, which have a 10ms granularity.

I would suggest changing the DHTML Pageload suite to do the following:
1. Set the timeout between moves to 10ms instead of 1ms.
2. Change the testcases so the machines we typically run the perf test on can *not* reflow/render within the 10ms interval. This could be accomplished by moving a large number of elements simultaneously.
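The idea behind suggestion 2 can be sketched with a simple frame-time model (names and numbers here are illustrative, not from the actual suite):

```javascript
// Sketch: when a timed DHTML test measures rendering rather than timer
// granularity. If reflow+paint for one move takes longer than the timer
// interval, the per-frame rendering cost dominates the measured time.
const TIMER_INTERVAL_MS = 10; // the clamped minimum

// Time one frame actually takes: the timer cannot fire before the
// interval elapses, and the next move cannot be processed before
// rendering of the current one finishes.
function frameTimeMs(renderCostMs) {
  return Math.max(TIMER_INTERVAL_MS, renderCostMs);
}

// True when the test result reflects rendering performance instead of
// the fixed timer interval.
function measuresRendering(renderCostMs) {
  return renderCostMs > TIMER_INTERVAL_MS;
}
```

Moving enough elements per tick pushes the render cost per frame above 10ms, so the suite would start measuring reflow/paint throughput instead of the timeout floor.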
Resolving as WONTFIX
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
resolving wontfix
Status: RESOLVED → VERIFIED
Wait, don't move this from WONTFIX to VERIFIED yet! According to our DHTML test suite, this is a 200-500% regression. We need some second opinions on this WONTFIX decision. CC'ing folks...

win98 800MHz
---------------------
1.0 branch   4341 ms
trunk       20042 ms
ie6         21195 ms

win98 1.7GHz
---------------------
1.0 branch   4587 ms
trunk       20131 ms
ie6         18647 ms

winXP 800MHz
---------------------
1.0 branch   4655 ms
trunk        9654 ms
ie6          7634 ms

winXP 1.7GHz
---------------------
1.0 branch   4752 ms
trunk        9711 ms
ie6          7076 ms
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
And Linux and Mac numbers...

linux RH7.2 800MHz
---------------------
1.0 branch   4648 ms
trunk        9451 ms

linux RH7.2 1.7GHz
---------------------
1.0 branch   3962 ms
trunk        8732 ms

mac OSX G4 733MHz
---------------------
1.0 branch   4646 ms
trunk        8436 ms
Actually, I was okay with the wontfix resolution. It looks like a big regression in performance but, as noted in comment #3, it's really just the direct consequence of setting a minimum time of 10ms for setTimeout() calls. I'd like to see a "real world" example where that cap is significantly hurting our usability/performance before revisiting that decision. But, other opinions welcome.
Note: the DHTML perf results using the 1.0 branch are not valid, because we used to allow painting to be starved; i.e., we would move the element but we would not always paint it. Using system timers for PLEvent dispatch fixed this, because the timer will not fire until all painting is complete. System timers on Win32 are limited to a 10ms interval.

Here's why we have to limit it to 10ms:
1. Compatibility with DHTML created to run on IE.
2. Allowing sub-10ms intervals causes high CPU usage when compared with IE.
3. If we require sub-10ms intervals we cannot use system timers to dispatch PLEvents, and painting will be starved again.
4. The only usage for sub-10ms intervals I've seen is these DHTML perf tests, which I believe are flawed. In my opinion, these tests need to be measuring reflow/painting performance, not the minimum timeout interval.
5. A timeout interval of 10ms allows for 100 frames per second, which is more than adequate.
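The arithmetic behind point 5, as a quick check:

```javascript
// A 10 ms timer interval caps a setTimeout-driven animation at
// 1000 ms / 10 ms = 100 frames per second.
const INTERVAL_MS = 10;
const maxFps = 1000 / INTERVAL_MS; // 100
```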
Another indication that the DHTML perf tests are flawed is that they show the Mozilla 1.0 branch being 200-500% faster than IE; this does not match what I've seen on real-world examples.
OK, we've updated the test suite to allow specification of the timeout. It defaults to 10 ms. We are still showing a performance regression (although not as bad as before):

2002-12-05-08-1.0   - 5447 ms
2002-12-05-08-trunk - 6988 ms
IE 6                - 6835 ms

This was using my desktop machine (2.4 GHz).
It's still a bogus comparison because the 1.0 branch can still starve painting when the timeout is set to 10ms.
So, what would be the right approach to make a comparison?
Someone would have to modify the 1.0 branch to force it to synchronously paint any invalidated areas before processing the PLEvent associated with the JavaScript timer driving the animation. I don't think it's worth spending time to do this. There is only a 2% performance difference between the trunk and IE according to comment 17.
I think so too - time is better spent on bug 157681 and on optimizing our painting performance now, I'd say. What's going to be done about this one now (WONTFIX)?
Resolving as WONTFIX
Status: REOPENED → RESOLVED
Closed: 22 years ago
Priority: -- → P3
Resolution: --- → WONTFIX
Target Milestone: --- → Future
No longer blocks: 21762
Blocks: 21762