Closed Bug 397303 Opened 17 years ago Closed 16 years ago

Thebes rendering of inset/outset borders very slow

Categories

(Core :: Graphics, defect, P2)

x86
All
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: vlad)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression)

Attachments

(1 file, 1 obsolete file)

As I debugged bug 392629, I figured out that changing the forms.css rules that say: border: 1px outset black !important; border-left-width: 2px ! important; to either have "border-style: solid" on all sides or have "border-width: 1px" on all sides sped things up immensely. Setting all sides to the same width but keeping the same style was still painfully slow. We're going to fix bug 392629 by just not repainting a the list control's border as one scrolls inside it. But we should still address the beveled border painting issue.
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Priority: -- → P4
Assignee: nobody → vladimir
This might be related to the wikipedia problem with clip performance issues, perhaps? In any case, should this really be a P4, and not P3 given the current "P4 is actually just 'wanted'" thing?
(In reply to comment #1) > This might be related to the wikipedia problem with clip performance issues, > perhaps? > > In any case, should this really be a P4, and not P3 given the current "P4 is > actually just 'wanted'" thing? > Hrm. Do we think this is common perf issue? If so I'd vote for movin it up..
Hmm. I'm being self-contradictory in comment 0; that's what happens when I type things as I test. I seem to recall that this was a problem no matter what the widths were, as long as the outset style was used. Which means any <table border="N"> for N > 0 will trigger it. I can double-check when I get back to Chicago, I guess...
Bz you get a chance to check this out?
This got lost somehow. I just retested, and the behavior is: 1) Solid borders of different widths are reasonably fast. 2) Outset borders all of the same width > 1px are slow. The 1px case is fast. 3) Outset borders of different widths are even slower than outset borders all of the same width. So yes, <table border="N"> for N > 1, will trigger this bug.
(In reply to comment #5) > This got lost somehow. I just retested, and the behavior is: > > 1) Solid borders of different widths are reasonably fast. > 2) Outset borders all of the same width > 1px are slow. The 1px case is fast. > 3) Outset borders of different widths are even slower than outset borders all > of the same width. > > So yes, <table border="N"> for N > 1, will trigger this bug. > K - thanks Bz. Vlad you got this? Should it be higher than P4?
I've got a benchmark somewhere here for border rendering that I used when writing this code; let me grab some numbers from all three platforms and see where we're at.
Ok, BorderBench 2008* says that inset/outset/groove/ridge borders are significantly slower than even the most general case of 4 totally different borders. This is obviously wrong and will be fixed. This is also the case on Win32 as well as Linux; oddly, Mac doesn't show this problem. [*] http://people.mozilla.com/~vladimir/misc/borderbench.html
OS: Linux → All
Priority: P4 → P2
For what it's worth, with Linux and outset borders, the gfxContext::Fill call under FillFastBorderPath seems to be a large contributor to painting performance on the testcase from comment 0 with the invalidate that covers the border rect reintroduced. On that codepath, we call _cairo_surface_fallback_fill, then _clip_and_composite_trapezoids, and end up spending almost all our time actually under XSync called from XRenderCompositeTrapezoids (this is a --sync profile, of course).
Can we add this benchmark to Tgfx?
We can, though I'm starting to rethink the value of very targetted tests in Tgfx. Right now there are a bunch of actual pages in there, which is good, but also some specific benchmarks that I wrote. These were good while I was working on those code paths, but I don't think they're useful to monitor in general -- only people working on those specific areas should really have need of the fine-grained benchmarks.
(In reply to comment #11) > We can, though I'm starting to rethink the value of very targetted tests in > Tgfx. Right now there are a bunch of actual pages in there, which is good, but > also some specific benchmarks that I wrote. These were good while I was > working on those code paths, but I don't think they're useful to monitor in > general -- only people working on those specific areas should really have need > of the fine-grained benchmarks. > Why not? Fine grained benchmarks are a great way to isolate the particular problem at hand. If we already got them why not get them into the "permanent record" as it were.
Attached patch test patch [Checkin: Comment 17] (obsolete) (deleted) — Splinter Review
bz, can you test this patch out and see if it improves things? It's not a final fix, because I still need to understand why the optimization is actually slowing things down, but this improves my perf by at least 2x.
That patch definitely improves things a lot.
(In reply to comment #14) > That patch definitely improves things a lot. Not sure whether to be happy or not about that. I'm trying to isolate things down into a cairo-perf testcase so that I can see what the heck the bad 'optimized' path is doing, since the 4-pass method does a pile more clipping than the 2-pass. However, what I'm hoping I'll find is that there's some brokenness that causes 2-pass to be slower for a whole bunch of good reasons, in which case I can go and delete a pile of code and have either 1-pass or 4-pass rendering.
Comment on attachment 296232 [details] [diff] [review] test patch [Checkin: Comment 17] Ok, let's go with this for now. What's happening is that in the 4-pass case, we set up a clip path for just that one side. This means that we end up creating temporary surfaces that are overall small compared to the entire border area, even though we do a bunch more work. For the 2-pass case, we don't do any clipping, we just draw two paths.. but the bounds of both of those paths are the entire rectangle covered by the border area. This means that a temporary surface gets created that's huge. A real, final fix for all of this stuff is to render each rectangular side region independently from the corners, since the sides can be very fast rectangular aligned fills. I don't know whether I should keep this bug open for that or file a new one, but we should at least take this patch for now. Once I do the other bit, I'll remove all the unnecessary code that deals with 4.
Attachment #296232 - Flags: review?(roc)
Attachment #296232 - Attachment description: test patch → test patch [Checkin: Comment 17]
See bug 377419 comment 39 for testcase(s), exhibiting 2(!) regressions, which were fixed by this checkin; there was also a discussion in that bug which actually concerns the current bug and not bug 377419...
No longer blocks: 377419
(In reply to comment #18) > See bug 377419 comment 39 for testcase(s), exhibiting 2(!) regressions, which > were fixed by this checkin; You're talking about performance regressions and not functionality/rendering/etc, correct?
So the bad news is that the bandaid made scrolling tinderbox a lot slower. :( A single middle-click on the right edge of the bottom scrollbar (to look at what the Windows builds are doing) takes about 3-4 seconds to repaint. Dragging the scroll thumb or using the arrow keys is even worse (more repaints). Reverting back to 2-pass rendering for inset/outset makes the number < 0.5 second for sure. I bet part of the problem is that tables are drawing too many of the clipped-out borders. In any case, I profiled scrolling the page with arrow keys while running --sync, and during said scrolling we spend about 75% of total time under nsCSSRendering::PaintBorder. I'll attach a jprof profile of just the time spent under there. To make a long story short, the time is largely spent calling: XRenderCompositeTrapezoids XRenderComposite _create_a8_picture XRenderFreePicture all under gfxContext::Clip().
Attached file Profile of the painting (deleted) —
Flags: wanted-next+
Flags: tracking1.9+
Flags: blocking1.9-
Blocks: 421601
It might be worth backing out the band-aid if it fixes bug 421601 or something.... does it?
(In reply to comment #23) > It might be worth backing out the band-aid if it fixes bug 421601 or > something.... does it? Yes, it seems like it does. With the testcase from that bug, I get 160206ms as result before I backed out the patch, after backing out the patch, I got 17914ms. (note, I tested with a debug build)
Usually numbers from debug builds are pretty meaningless... They can be off by many orders of magnitude in some cases.
From my experience, I seem to get pretty reliable numbers from debug builds. Fwiw, I'm trying to get a non-debug build, but I'm now blocked by bug 338224 (and was blocked by all kinds of Vista woes). I guess it could take me quite some time to get a working non-debug build, I'm afraid.
Comment on attachment 296232 [details] [diff] [review] test patch [Checkin: Comment 17] Backing this patch out; marking obsolete. It caused a significant perf regression in more common situations.
Attachment #296232 - Attachment is obsolete: true
This should be significantly better now after bug 424423 was fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: