Closed Bug 277762 Opened 20 years ago Closed 20 years ago

DHTML performance regression because of the patch from bug 228399

Categories

(Core Graveyard :: GFX: Win32, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: martijn.martijn, Assigned: emaijala+moz)

References

()

Details

(Keywords: perf, regression, testcase)

Attachments

(6 files)

This is a spin-off from bug 277044. See the table with times in the testcase from bug 277044: Transparent image: Build relative absolute fixed static 2004-11-26 7:42am 24285ms 8032ms 16043ms 8032ms 2004-11-27 7:48am 39937ms 36222ms 36463ms 8032ms Non-transparent image: Build relative absolute fixed static 2004-11-26 7:42am 19949ms 8022ms 10044ms 8032ms 2004-11-27 7:48am 10035ms 8031ms 8052ms 8032ms The build I used for this test was a Firefox trunk build. Also I build a non-debug build from a 2004-11-23 dated source of Mozilla, and tested with and without the patch from bug 228399, using the automatic testcase from this url: http://martijn.heelveel.info/test/mozilla/slowdhtml/slow_relative4.html Those test results are here: http://martijn.heelveel.info/test/mozilla/slowdhtml/results_patch228399.htm moz_20050109 = the build without the patch from bug 228399 moz_20050110 = the build with the patch from bug 228399 In the positioned fixed/transparent image case, the red block begins to move especially slow in the build, when it is moving at the end, out of the transparent image.
Depends on: 277766
No longer depends on: 277766
Note: this regression seems Windows-only (I don't see this effect on Linux...)
Assignee: win32 → emaijala
Flags: blocking1.8b?
Could you test a build from 2004-11-03? There were two patches in bug 228399. The first one was checked in 2004-11-04 and fixed the actual issue. The second one was checked in 2004-11-27 and reverted the system partially back to the old behavior because of other performance problems (possibly the same you can see with the non-transparent image here) the first change caused.
Ok, here are my test results. For completeness I also added the results for Mozilla at 2004-11-05 Results from Mozilla 2004-11-03 Mozilla 2004-11-05 relative: transpare: 8031 8031 absolute: transpare: 8021 8031 fixed : transpare: 9243 8382 static : transpare: 8032 8032 relative: not_trans: 8022 8032 absolute: not_trans: 8032 8032 fixed : not_trans: 8052 8132 static : not_trans: 8032 8032 relative: div_block: 8032 8032 absolute: div_block: 8022 8032 fixed : div_block: 8032 8142 static : div_block: 8032 8032
I took my few days old SeaMonkey trunk build and removed both patches of bug 228399. I did the test, added added the first patch, tested again, added the supplementary patch and tested again. Here are the results: Original: relative: transpare: 10766 absolute: transpare: 8032 fixed : transpare: 8022 static : transpare: 8032 relative: not_trans: 10245 absolute: not_trans: 8022 fixed : not_trans: 8022 static : not_trans: 8032 relative: div_block: 8022 absolute: div_block: 8022 fixed : div_block: 8022 static : div_block: 8032 With patch 1: Results: relative: transpare: 17044 absolute: transpare: 10595 fixed : transpare: 10956 static : transpare: 8031 relative: not_trans: 14861 absolute: not_trans: 8713 fixed : not_trans: 8833 static : not_trans: 8022 relative: div_block: 8983 absolute: div_block: 8021 fixed : div_block: 8031 static : div_block: 8031 With patch 2 (supplementary optimization patch): Results: relative: transpare: 10815 absolute: transpare: 8011 fixed : transpare: 8021 static : transpare: 8021 relative: not_trans: 10044 absolute: not_trans: 8021 fixed : not_trans: 8021 static : not_trans: 8021 relative: div_block: 8031 absolute: div_block: 8031 fixed : div_block: 8011 static : div_block: 8021 So, the primary patch slowed the rendering down as I noticed, but the secondary patch pushed it back to the original speed. So, for some reason my results don't support the others. My tests were run with a debug build on Athlon XP 2700+ with Matrox Millennium G550 at 1280x1024 32bit.
So... a few things: 1) Looking at performance on a debug build is usually not useful 2) The number is not all there is to this test. The number will be at _least_ 8000-some no matter what due to how we process timeouts (at least 10ms per timeout). The question is also how much CPU is being used in the process... on slower machines that may increase the times, but on a modern fast machine chances are we can still get all the processing into that 10ms timeslot with the simple testcase....
I've found debug build adequate for testing performance delta. In the test case the timeout was 20 milliseconds. I changed it to 0 (=10ms) and the results didn't halve, so I'm now jumping to the assumption the results now reflect the real maximum speed better. Here are the results again: Original: relative: transpare: 10935 absolute: transpare: 5778 fixed : transpare: 5989 static : transpare: 5207 relative: not_trans: 10205 absolute: not_trans: 5457 fixed : not_trans: 5608 static : not_trans: 5148 relative: div_block: 7681 absolute: div_block: 4947 fixed : div_block: 4977 static : div_block: 5087 Patch 1: relative: transpare: 17064 absolute: transpare: 10605 fixed : transpare: 10766 static : transpare: 5158 relative: not_trans: 15002 absolute: not_trans: 8893 fixed : not_trans: 8823 static : not_trans: 5158 relative: div_block: 8993 absolute: div_block: 4927 fixed : div_block: 5027 static : div_block: 5017 Patch 2: relative: transpare: 10926 absolute: transpare: 5809 fixed : transpare: 5949 static : transpare: 5218 relative: not_trans: 10145 absolute: not_trans: 5448 fixed : not_trans: 5628 static : not_trans: 5157 relative: div_block: 7791 absolute: div_block: 4937 fixed : div_block: 5007 static : div_block: 5058 Anyway, even if these are completely meaningless, I can't actually see why the latter patch would make the situation worse than it was unpatched.
Could this depend on graphics card or Windows version?
Fwiw, these are the results on my WinXP computer, Duron 600MHz, NVidia GeForce2 MX 100/200 32MB, screen setting 1024*768 16bit color depth. Results build: 2004-11-26 2004-11-28 relative: transpare: 24676 40738 absolute: transpare: 8032 38174 fixed : transpare: 16484 38295 static : transpare: 8032 8032 relative: not_trans: 16574 8042 absolute: not_trans: 8032 8032 fixed : not_trans: 10035 8032 static : not_trans: 8022 8032 relative: div_block: 11487 8032 absolute: div_block: 8022 8032 fixed : div_block: 8112 8032 static : div_block: 8032 8032
Attached file movement testcase (deleted) —
This is probably not the best Bug to place this comment, since it does not relate directly to the bugs mentioned here. Since Bug #245131 Cpu utilization has dropped dramatically. But at least since 10/14/04 some scripts are not aethetically inferior to the branch builds. The testcase should move the dots smoothly and with constant velocity, and thats true on the branch builds. Viewed with trunk builds of thunderbird or firefox the movement is jerky and erratic, compared to the branch. If this testcase/comment should be posted elsewhere, sorry for the spam. I am using WinXP pro sp2, P3-s 1.4 g, 512m Ram, ATI radeon 7200 64m DDR
Yes, that testcase should be placed elsewhere. Specifically, in a bug of its own, with more information on when you see the performance degradation start. Please cc me on said bug. _This_ bug covers a specific performance regression that happened on a specific day due to a specific patch. It's got enough complications with failures to reproduce that adding random unrelated comments is very much NOT appreciated. Please don't ever do that again. Especially since you KNEW that your issue was unrelated to this bug, as your comment shows. I've seen your name around Bugzilla enough that you should know the simple "one bug per issue" rule by now....
Attached file What I changed (deleted) —
Ok, this is what I changed/removed in nsDrawingSurfaceWin.cpp (I then rebuilt the gfx directory). These are the results in a non-debug build: Results unchanged changed: relative: transpare: 36232 16464 absolute: transpare: 36061 16033 fixed : transpare: 68128 24045 static : transpare: 8022 8032 relative: not_trans: 8082 12188 absolute: not_trans: 8042 8182 fixed : not_trans: 11957 12148 static : not_trans: 8031 8032 relative: div_block: 8031 8032 absolute: div_block: 8031 8022 fixed : div_block: 8031 8202 static : div_block: 8031 8032
Interesting. That's just backing out the optimization patch from bug 228399... I wonder why that patch makes things faster on some systems and slower on others?
What it comes down to is video card drivers. Most drivers handle 24 bit CreateDIBSection fast in any video mode. A few, do not. CreateCompatibleBitmap, for the most part, is slower than CreateDIBSection, but not always. That's been my experience. The really odd thing is that sometimes notching down the graphics accelleration sometimes speeds things up. Older nVidea cards were especially bad for this. Personally, I stay away from CreateCompatibleBitmap, but then again, I currently have a nVidea card. I played with this sort of stuff a few years ago, when I had a matrox and an nVidea card. It becomes really frustrating after a while. I'll dig back and see if I have documented anything from back then, but most of my work/testing is probably long gone (and forgotten by me)
Well, as I said, the optimization patch reverts the functionality partially back to what it was before the first patch as the first patch caused a quite bad slowdown in a specific case at 16 bit and I thought it would be safer to do as we previously did for the cases where blending isn't needed. I know that many things are faster without the "optimization" patch, but it seems that with the current architecture we'll always have a compromise. I wonder if it would be worth a shot trying to always use a dib section if the color depth is >=24.
Btw, it's not that CreateDIBSection would be slow, it's that GetDIBits is slow. It would be so much easier if we had the data around and wouldn't need GetDIBits.
Comment 15 sounds like a reasonable approach given comment 14. Want to post a patch that Martijn could take for a spin?
Attached patch Test patch (deleted) — Splinter Review
My results were not too encouraging. The test case slowed down considerably with this patch, but it would be nice to see Martijn's results. I think we might do way better if we'd be able to keep a backbuffer around all the time so there wouldn't be need to get the data from the device.
These are my results before and after I applied the test patch: Results before: after: relative: transpare: 36463 36522 absolute: transpare: 36062 36062 fixed : transpare: 68549 68458 static : transpare: 8031 8031 relative: not_trans: 8092 8081 absolute: not_trans: 8042 8041 fixed : not_trans: 12037 12047 static : not_trans: 8031 8032 relative: div_block: 8031 8032 absolute: div_block: 8031 8022 fixed : div_block: 8031 8032 static : div_block: 8031 8032 It doesn't seem to make any difference.
Attached image img1 for testcase (transparent) (deleted) —
My server is down for quite some time now, that's why I'm attaching the testcase to this bug.
Attached image img2 for testcase (not transparent) (deleted) —
Attached file Testcase (deleted) —
Right, you're running @ 16 bits color depth, right? Then it doesn't make a difference. Is it possible for you to test 24 bits mode?
Well, I can run in 32 bits mode. That seems also covered by your test patch. These are the results: Results: Before: After: relative: transpare: 72855 22592 absolute: transpare: 72114 20179 fixed : transpare: 141664 36062 static : transpare: 8032 8032 relative: not_trans: 12098 17565 absolute: not_trans: 8112 16133 fixed : not_trans: 16203 28051 static : not_trans: 8031 8032 relative: div_block: 8061 8022 absolute: div_block: 8031 8022 fixed : div_block: 8021 8042 static : div_block: 8021 8032 So in that case, it seems improve things quite a bit.
If it only didn't make the non-transparents so much worse..
too late for 1.8b1, transferring request to 1.8b2
Flags: blocking1.8b?
Flags: blocking1.8b2?
Flags: blocking1.8b-
*** Bug 255648 has been marked as a duplicate of this bug. ***
There's been several image drawing bugs created over the last little while related to slow drawing on 1 bit transparent images on nVidia cards. I dup'd them all to one bug, then dup'd that one here. The patch fixes the problem. However, from the looks of it, drawing ontop anything that needs blending will be slow for nVidia cards. But I suppose in the future if ::AlphaBlend can be incorperated (as mentioned in the other bug), then we will be cooking. Just for the record, I'll paste my comment from Bug 255648: It appears using CreateDIBSection to create the surface instead of CreateCompatibleBitmap slows down nVidia's (on card?) optimization routines (namely StretchDIBits).
Blocks: 255648
Sorry, that should read: It appears using CreateCompatibleBitmap to create the surface instead of CreateDIBSection slows down nVidia's (on card?) optimization routines (namely StretchDIBits). In other words: CreateDIBSection + nVidia StretchDIBits() = ok CreateCompatibleBitmap + nVidia StretchDIBits() = slow painting
nVidia Card; Win2k; I removed all the div_block results since they hovered around 8031 a) Without "Test Patch" (Uses CreateCompatibleBitmap): relative: transpare: 53593 absolute: transpare: 52094 fixed : transpare: 52187 static : transpare: 8031 relative: not_trans: 8032 absolute: not_trans: 8031 fixed : not_trans: 8031 static : not_trans: 8031 b) With "Test Patch" (Uses CreateDIBSection) relative: transpare: 14750 absolute: transpare: 13437 fixed : transpare: 13391 static : transpare: 8047 relative: not_trans: 13750 absolute: not_trans: 13359 fixed : not_trans: 13328 static : not_trans: 8031 c) Without "Test Patch", but with last patch of Bug 205893 backed out: relative: transpare: 8031 absolute: transpare: 8031 fixed : transpare: 8031 static : transpare: 8032 relative: not_trans: 8031 absolute: not_trans: 8031 fixed : not_trans: 8031 static : not_trans: 8032 d) With "Test Patch", last patch of 205893 backedout: Results: relative: transpare: 12891 absolute: transpare: 12516 fixed : transpare: 12516 static : transpare: 8031 relative: not_trans: 12750 absolute: not_trans: 8015 fixed : not_trans: 8188 static : not_trans: 8015 I bet you like c ;) The last patch in Bug 205893 changed image optimization from CreateDIBitmap (which create a DDB) to CreateDIBSection (which creates a DIB). The premise of the patch in Bug 205893 was that DIB sections are just as fast as DDBs. Just backing out the last patch of Bug 205893 will not solve everything. Currently we have a 0x20000 minimum size limit before optimizing an image. When an image is smaller than that, we'll get results like we do in a), which is horribly slow, or with test patch, b) which is moderately slow. The best solution seems to be always have an image of any size optimized, and use CreateCompatibleBitmap (no "test patch"). The best solution is also the most complex, as in order to have all images optimized, we need some sort of HBITMAP caching mechanism. See http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsImageWin.cpp#1443 To update my statement on Comment #29: Surface | Image | Result --------------------------------------------- CompatibleBitmap(DDB) | DIB | VERY Slow CompatibleBitmap(DDB) | DDB | Fast CompatibleBitmap(DDB) | None | VERY Slow DIBSection | DIB | Slow DIBSection | DDB | Not as Slow DIBSection | None | Slow
Martijn, Ere, others, could you run the test with the latest nightly?
Depends on: 284716
Ok, results with a 2005-3-4 and a 2005-3-6 trunk build. On a 600MHz Duron, Geforce 2MX 100/200 at 16bits color depth. Results 03-04: 03-06: relative: transpare: 41860 8031 absolute: transpare: 36492 8031 fixed : transpare: 37303 8031 static : transpare: 8031 8031 relative: not_trans: 8312 8031 absolute: not_trans: 8032 8031 fixed : not_trans: 8032 8031 static : not_trans: 8032 8031 relative: div_block: 8032 8021 absolute: div_block: 8022 8031 fixed : div_block: 8032 8031 static : div_block: 8032 8031 The results in 32 color depth are the same for the 3-6 build (the 4-3 build is even slower than in 16b color depth). Also, I don't see any cpu strain anymore in the 3-6 build. Thank you!!! I would call this bug fixed, not? :) Minor nit, when dynamically changing the color depth and then running the test, the test becomes very slow. I have to restart the browser to make it fast again.
Fixed by checkin for bug 284716
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Isn't that a WFM instead of fix?
wfm is when it magically works and you don't know why. We _do_ know the exact code change that fixed this bug, though.
Flags: blocking1.8b2?
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: