Closed
Bug 277762
Opened 20 years ago
Closed 20 years ago
DHTML performance regression because of the patch from bug 228399
Categories
(Core Graveyard :: GFX: Win32, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: martijn.martijn, Assigned: emaijala+moz)
References
()
Details
(Keywords: perf, regression, testcase)
Attachments
(6 files)
This is a spin-off from bug 277044.
See the table with times in the testcase from bug 277044:
Transparent image:
Build relative absolute fixed static
2004-11-26 7:42am 24285ms 8032ms 16043ms 8032ms
2004-11-27 7:48am 39937ms 36222ms 36463ms 8032ms
Non-transparent image:
Build relative absolute fixed static
2004-11-26 7:42am 19949ms 8022ms 10044ms 8032ms
2004-11-27 7:48am 10035ms 8031ms 8052ms 8032ms
The build I used for this test was a Firefox trunk build.
Also I build a non-debug build from a 2004-11-23 dated source of Mozilla, and
tested with and without the patch from bug 228399, using the automatic testcase
from this url:
http://martijn.heelveel.info/test/mozilla/slowdhtml/slow_relative4.html
Those test results are here:
http://martijn.heelveel.info/test/mozilla/slowdhtml/results_patch228399.htm
moz_20050109 = the build without the patch from bug 228399
moz_20050110 = the build with the patch from bug 228399
In the positioned fixed/transparent image case, the red block begins to move
especially slow in the build, when it is moving at the end, out of the
transparent image.
Comment 1•20 years ago
|
||
Note: this regression seems Windows-only (I don't see this effect on Linux...)
Assignee: win32 → emaijala
Flags: blocking1.8b?
Assignee | ||
Comment 2•20 years ago
|
||
Could you test a build from 2004-11-03? There were two patches in bug 228399.
The first one was checked in 2004-11-04 and fixed the actual issue. The second
one was checked in 2004-11-27 and reverted the system partially back to the old
behavior because of other performance problems (possibly the same you can see
with the non-transparent image here) the first change caused.
Reporter | ||
Comment 3•20 years ago
|
||
Ok, here are my test results. For completeness I also added the results for
Mozilla at 2004-11-05
Results from Mozilla 2004-11-03 Mozilla 2004-11-05
relative: transpare: 8031 8031
absolute: transpare: 8021 8031
fixed : transpare: 9243 8382
static : transpare: 8032 8032
relative: not_trans: 8022 8032
absolute: not_trans: 8032 8032
fixed : not_trans: 8052 8132
static : not_trans: 8032 8032
relative: div_block: 8032 8032
absolute: div_block: 8022 8032
fixed : div_block: 8032 8142
static : div_block: 8032 8032
Assignee | ||
Comment 4•20 years ago
|
||
I took my few days old SeaMonkey trunk build and removed both patches of bug
228399. I did the test, added added the first patch, tested again, added the
supplementary patch and tested again. Here are the results:
Original:
relative: transpare: 10766
absolute: transpare: 8032
fixed : transpare: 8022
static : transpare: 8032
relative: not_trans: 10245
absolute: not_trans: 8022
fixed : not_trans: 8022
static : not_trans: 8032
relative: div_block: 8022
absolute: div_block: 8022
fixed : div_block: 8022
static : div_block: 8032
With patch 1:
Results:
relative: transpare: 17044
absolute: transpare: 10595
fixed : transpare: 10956
static : transpare: 8031
relative: not_trans: 14861
absolute: not_trans: 8713
fixed : not_trans: 8833
static : not_trans: 8022
relative: div_block: 8983
absolute: div_block: 8021
fixed : div_block: 8031
static : div_block: 8031
With patch 2 (supplementary optimization patch):
Results:
relative: transpare: 10815
absolute: transpare: 8011
fixed : transpare: 8021
static : transpare: 8021
relative: not_trans: 10044
absolute: not_trans: 8021
fixed : not_trans: 8021
static : not_trans: 8021
relative: div_block: 8031
absolute: div_block: 8031
fixed : div_block: 8011
static : div_block: 8021
So, the primary patch slowed the rendering down as I noticed, but the secondary
patch pushed it back to the original speed. So, for some reason my results don't
support the others. My tests were run with a debug build on Athlon XP 2700+ with
Matrox Millennium G550 at 1280x1024 32bit.
Comment 5•20 years ago
|
||
So... a few things:
1) Looking at performance on a debug build is usually not useful
2) The number is not all there is to this test. The number will be at _least_
8000-some no matter what due to how we process timeouts (at least 10ms per
timeout). The question is also how much CPU is being used in the process...
on slower machines that may increase the times, but on a modern fast machine
chances are we can still get all the processing into that 10ms timeslot with
the simple testcase....
Assignee | ||
Comment 6•20 years ago
|
||
I've found debug build adequate for testing performance delta.
In the test case the timeout was 20 milliseconds. I changed it to 0 (=10ms) and
the results didn't halve, so I'm now jumping to the assumption the results now
reflect the real maximum speed better. Here are the results again:
Original:
relative: transpare: 10935
absolute: transpare: 5778
fixed : transpare: 5989
static : transpare: 5207
relative: not_trans: 10205
absolute: not_trans: 5457
fixed : not_trans: 5608
static : not_trans: 5148
relative: div_block: 7681
absolute: div_block: 4947
fixed : div_block: 4977
static : div_block: 5087
Patch 1:
relative: transpare: 17064
absolute: transpare: 10605
fixed : transpare: 10766
static : transpare: 5158
relative: not_trans: 15002
absolute: not_trans: 8893
fixed : not_trans: 8823
static : not_trans: 5158
relative: div_block: 8993
absolute: div_block: 4927
fixed : div_block: 5027
static : div_block: 5017
Patch 2:
relative: transpare: 10926
absolute: transpare: 5809
fixed : transpare: 5949
static : transpare: 5218
relative: not_trans: 10145
absolute: not_trans: 5448
fixed : not_trans: 5628
static : not_trans: 5157
relative: div_block: 7791
absolute: div_block: 4937
fixed : div_block: 5007
static : div_block: 5058
Anyway, even if these are completely meaningless, I can't actually see why the
latter patch would make the situation worse than it was unpatched.
Comment 7•20 years ago
|
||
Could this depend on graphics card or Windows version?
Reporter | ||
Comment 8•20 years ago
|
||
Fwiw, these are the results on my WinXP computer, Duron 600MHz, NVidia GeForce2
MX 100/200 32MB, screen setting 1024*768 16bit color depth.
Results build: 2004-11-26 2004-11-28
relative: transpare: 24676 40738
absolute: transpare: 8032 38174
fixed : transpare: 16484 38295
static : transpare: 8032 8032
relative: not_trans: 16574 8042
absolute: not_trans: 8032 8032
fixed : not_trans: 10035 8032
static : not_trans: 8022 8032
relative: div_block: 11487 8032
absolute: div_block: 8022 8032
fixed : div_block: 8112 8032
static : div_block: 8032 8032
Comment 9•20 years ago
|
||
Comment 10•20 years ago
|
||
This is probably not the best Bug to place this comment, since it does not
relate directly to the bugs mentioned here. Since Bug #245131 Cpu utilization
has dropped dramatically. But at least since 10/14/04 some scripts are not
aethetically inferior to the branch builds. The testcase should move the dots
smoothly and with constant velocity, and thats true on the branch builds. Viewed
with trunk builds of thunderbird or firefox the movement is jerky and erratic,
compared to the branch.
If this testcase/comment should be posted elsewhere, sorry for the spam.
I am using WinXP pro sp2, P3-s 1.4 g, 512m Ram, ATI radeon 7200 64m DDR
Comment 11•20 years ago
|
||
Yes, that testcase should be placed elsewhere. Specifically, in a bug of its
own, with more information on when you see the performance degradation start.
Please cc me on said bug. _This_ bug covers a specific performance regression
that happened on a specific day due to a specific patch. It's got enough
complications with failures to reproduce that adding random unrelated comments
is very much NOT appreciated. Please don't ever do that again. Especially
since you KNEW that your issue was unrelated to this bug, as your comment shows.
I've seen your name around Bugzilla enough that you should know the simple "one
bug per issue" rule by now....
Reporter | ||
Comment 12•20 years ago
|
||
Ok, this is what I changed/removed in nsDrawingSurfaceWin.cpp (I then rebuilt
the gfx directory). These are the results in a non-debug build:
Results unchanged changed:
relative: transpare: 36232 16464
absolute: transpare: 36061 16033
fixed : transpare: 68128 24045
static : transpare: 8022 8032
relative: not_trans: 8082 12188
absolute: not_trans: 8042 8182
fixed : not_trans: 11957 12148
static : not_trans: 8031 8032
relative: div_block: 8031 8032
absolute: div_block: 8031 8022
fixed : div_block: 8031 8202
static : div_block: 8031 8032
Comment 13•20 years ago
|
||
Interesting. That's just backing out the optimization patch from bug 228399...
I wonder why that patch makes things faster on some systems and slower on others?
Comment 14•20 years ago
|
||
What it comes down to is video card drivers. Most drivers handle 24 bit
CreateDIBSection fast in any video mode. A few, do not.
CreateCompatibleBitmap, for the most part, is slower than CreateDIBSection, but
not always.
That's been my experience. The really odd thing is that sometimes notching down
the graphics accelleration sometimes speeds things up. Older nVidea cards were
especially bad for this. Personally, I stay away from CreateCompatibleBitmap,
but then again, I currently have a nVidea card.
I played with this sort of stuff a few years ago, when I had a matrox and an
nVidea card. It becomes really frustrating after a while. I'll dig back and
see if I have documented anything from back then, but most of my work/testing is
probably long gone (and forgotten by me)
Assignee | ||
Comment 15•20 years ago
|
||
Well, as I said, the optimization patch reverts the functionality partially back
to what it was before the first patch as the first patch caused a quite bad
slowdown in a specific case at 16 bit and I thought it would be safer to do as
we previously did for the cases where blending isn't needed. I know that many
things are faster without the "optimization" patch, but it seems that with the
current architecture we'll always have a compromise.
I wonder if it would be worth a shot trying to always use a dib section if the
color depth is >=24.
Assignee | ||
Comment 16•20 years ago
|
||
Btw, it's not that CreateDIBSection would be slow, it's that GetDIBits is slow.
It would be so much easier if we had the data around and wouldn't need GetDIBits.
Comment 17•20 years ago
|
||
Comment 15 sounds like a reasonable approach given comment 14. Want to post a
patch that Martijn could take for a spin?
Assignee | ||
Comment 18•20 years ago
|
||
My results were not too encouraging. The test case slowed down considerably
with this patch, but it would be nice to see Martijn's results.
I think we might do way better if we'd be able to keep a backbuffer around all
the time so there wouldn't be need to get the data from the device.
Reporter | ||
Comment 19•20 years ago
|
||
These are my results before and after I applied the test patch:
Results before: after:
relative: transpare: 36463 36522
absolute: transpare: 36062 36062
fixed : transpare: 68549 68458
static : transpare: 8031 8031
relative: not_trans: 8092 8081
absolute: not_trans: 8042 8041
fixed : not_trans: 12037 12047
static : not_trans: 8031 8032
relative: div_block: 8031 8032
absolute: div_block: 8031 8022
fixed : div_block: 8031 8032
static : div_block: 8031 8032
It doesn't seem to make any difference.
Reporter | ||
Comment 20•20 years ago
|
||
My server is down for quite some time now, that's why I'm attaching the
testcase to this bug.
Reporter | ||
Comment 21•20 years ago
|
||
Reporter | ||
Comment 22•20 years ago
|
||
Assignee | ||
Comment 23•20 years ago
|
||
Right, you're running @ 16 bits color depth, right? Then it doesn't make a
difference. Is it possible for you to test 24 bits mode?
Reporter | ||
Comment 24•20 years ago
|
||
Well, I can run in 32 bits mode. That seems also covered by your test patch.
These are the results:
Results: Before: After:
relative: transpare: 72855 22592
absolute: transpare: 72114 20179
fixed : transpare: 141664 36062
static : transpare: 8032 8032
relative: not_trans: 12098 17565
absolute: not_trans: 8112 16133
fixed : not_trans: 16203 28051
static : not_trans: 8031 8032
relative: div_block: 8061 8022
absolute: div_block: 8031 8022
fixed : div_block: 8021 8042
static : div_block: 8021 8032
So in that case, it seems improve things quite a bit.
Assignee | ||
Comment 25•20 years ago
|
||
If it only didn't make the non-transparents so much worse..
Comment 26•20 years ago
|
||
too late for 1.8b1, transferring request to 1.8b2
Flags: blocking1.8b?
Flags: blocking1.8b2?
Flags: blocking1.8b-
Reporter | ||
Updated•20 years ago
|
Comment 27•20 years ago
|
||
*** Bug 255648 has been marked as a duplicate of this bug. ***
Comment 28•20 years ago
|
||
There's been several image drawing bugs created over the last little while
related to slow drawing on 1 bit transparent images on nVidia cards. I dup'd
them all to one bug, then dup'd that one here.
The patch fixes the problem. However, from the looks of it, drawing ontop
anything that needs blending will be slow for nVidia cards. But I suppose in
the future if ::AlphaBlend can be incorperated (as mentioned in the other bug),
then we will be cooking.
Just for the record, I'll paste my comment from Bug 255648:
It appears using CreateDIBSection to create the surface instead of
CreateCompatibleBitmap slows down nVidia's (on card?) optimization routines
(namely StretchDIBits).
Comment 29•20 years ago
|
||
Sorry, that should read:
It appears using CreateCompatibleBitmap to create the surface instead of
CreateDIBSection slows down nVidia's (on card?) optimization routines
(namely StretchDIBits).
In other words:
CreateDIBSection + nVidia StretchDIBits() = ok
CreateCompatibleBitmap + nVidia StretchDIBits() = slow painting
Comment 30•20 years ago
|
||
nVidia Card; Win2k; I removed all the div_block results since they hovered
around 8031
a) Without "Test Patch" (Uses CreateCompatibleBitmap):
relative: transpare: 53593
absolute: transpare: 52094
fixed : transpare: 52187
static : transpare: 8031
relative: not_trans: 8032
absolute: not_trans: 8031
fixed : not_trans: 8031
static : not_trans: 8031
b) With "Test Patch" (Uses CreateDIBSection)
relative: transpare: 14750
absolute: transpare: 13437
fixed : transpare: 13391
static : transpare: 8047
relative: not_trans: 13750
absolute: not_trans: 13359
fixed : not_trans: 13328
static : not_trans: 8031
c) Without "Test Patch", but with last patch of Bug 205893 backed out:
relative: transpare: 8031
absolute: transpare: 8031
fixed : transpare: 8031
static : transpare: 8032
relative: not_trans: 8031
absolute: not_trans: 8031
fixed : not_trans: 8031
static : not_trans: 8032
d) With "Test Patch", last patch of 205893 backedout:
Results:
relative: transpare: 12891
absolute: transpare: 12516
fixed : transpare: 12516
static : transpare: 8031
relative: not_trans: 12750
absolute: not_trans: 8015
fixed : not_trans: 8188
static : not_trans: 8015
I bet you like c ;) The last patch in Bug 205893 changed image optimization
from CreateDIBitmap (which create a DDB) to CreateDIBSection (which creates a
DIB). The premise of the patch in Bug 205893 was that DIB sections are just as
fast as DDBs.
Just backing out the last patch of Bug 205893 will not solve everything.
Currently we have a 0x20000 minimum size limit before optimizing an image. When
an image is smaller than that, we'll get results like we do in a), which is
horribly slow, or with test patch, b) which is moderately slow.
The best solution seems to be always have an image of any size optimized, and
use CreateCompatibleBitmap (no "test patch"). The best solution is also the
most complex, as in order to have all images optimized, we need some sort of
HBITMAP caching mechanism. See
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsImageWin.cpp#1443
To update my statement on Comment #29:
Surface | Image | Result
---------------------------------------------
CompatibleBitmap(DDB) | DIB | VERY Slow
CompatibleBitmap(DDB) | DDB | Fast
CompatibleBitmap(DDB) | None | VERY Slow
DIBSection | DIB | Slow
DIBSection | DDB | Not as Slow
DIBSection | None | Slow
Comment 31•20 years ago
|
||
Martijn, Ere, others, could you run the test with the latest nightly?
Reporter | ||
Comment 32•20 years ago
|
||
Ok, results with a 2005-3-4 and a 2005-3-6 trunk build. On a 600MHz Duron,
Geforce 2MX 100/200 at 16bits color depth.
Results 03-04: 03-06:
relative: transpare: 41860 8031
absolute: transpare: 36492 8031
fixed : transpare: 37303 8031
static : transpare: 8031 8031
relative: not_trans: 8312 8031
absolute: not_trans: 8032 8031
fixed : not_trans: 8032 8031
static : not_trans: 8032 8031
relative: div_block: 8032 8021
absolute: div_block: 8022 8031
fixed : div_block: 8032 8031
static : div_block: 8032 8031
The results in 32 color depth are the same for the 3-6 build (the 4-3 build is
even slower than in 16b color depth). Also, I don't see any cpu strain anymore
in the 3-6 build.
Thank you!!! I would call this bug fixed, not? :)
Minor nit, when dynamically changing the color depth and then running the test,
the test becomes very slow. I have to restart the browser to make it fast again.
Comment 33•20 years ago
|
||
Fixed by checkin for bug 284716
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 34•20 years ago
|
||
Isn't that a WFM instead of fix?
Comment 35•20 years ago
|
||
wfm is when it magically works and you don't know why. We _do_ know the exact
code change that fixed this bug, though.
Updated•20 years ago
|
Flags: blocking1.8b2?
Updated•16 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•