Bugzilla

Updated

•

17 years ago

Flags: blocking1.9? → blocking1.9+

Updated

•

17 years ago

Priority: -- → P4

Updated

•

17 years ago

Assignee: nobody → vladimir

Reporter

Comment 1

•

17 years ago

This might be related to the wikipedia problem with clip performance issues, perhaps? In any case, should this really be a P4, and not P3 given the current "P4 is actually just 'wanted'" thing?

Comment 2

•

17 years ago

(In reply to comment #1) > This might be related to the wikipedia problem with clip performance issues, > perhaps? > > In any case, should this really be a P4, and not P3 given the current "P4 is > actually just 'wanted'" thing? > Hrm. Do we think this is common perf issue? If so I'd vote for movin it up..

Reporter

Comment 3

•

17 years ago

Hmm. I'm being self-contradictory in comment 0; that's what happens when I type things as I test. I seem to recall that this was a problem no matter what the widths were, as long as the outset style was used. Which means any <table border="N"> for N > 0 will trigger it. I can double-check when I get back to Chicago, I guess...

Comment 4

•

17 years ago

Bz you get a chance to check this out?

Reporter

Comment 5

•

17 years ago

This got lost somehow. I just retested, and the behavior is: 1) Solid borders of different widths are reasonably fast. 2) Outset borders all of the same width > 1px are slow. The 1px case is fast. 3) Outset borders of different widths are even slower than outset borders all of the same width. So yes, <table border="N"> for N > 1, will trigger this bug.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 6

•

17 years ago

(In reply to comment #5) > This got lost somehow. I just retested, and the behavior is: > > 1) Solid borders of different widths are reasonably fast. > 2) Outset borders all of the same width > 1px are slow. The 1px case is fast. > 3) Outset borders of different widths are even slower than outset borders all > of the same width. > > So yes, <table border="N"> for N > 1, will trigger this bug. > K - thanks Bz. Vlad you got this? Should it be higher than P4?

Assignee

Comment 7

•

17 years ago

I've got a benchmark somewhere here for border rendering that I used when writing this code; let me grab some numbers from all three platforms and see where we're at.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 8

•

17 years ago

Ok, BorderBench 2008* says that inset/outset/groove/ridge borders are significantly slower than even the most general case of 4 totally different borders. This is obviously wrong and will be fixed. This is also the case on Win32 as well as Linux; oddly, Mac doesn't show this problem. [*] http://people.mozilla.com/~vladimir/misc/borderbench.html

OS: Linux → All

Priority: P4 → P2

Reporter

Comment 9

•

17 years ago

For what it's worth, with Linux and outset borders, the gfxContext::Fill call under FillFastBorderPath seems to be a large contributor to painting performance on the testcase from comment 0 with the invalidate that covers the border rect reintroduced. On that codepath, we call _cairo_surface_fallback_fill, then _clip_and_composite_trapezoids, and end up spending almost all our time actually under XSync called from XRenderCompositeTrapezoids (this is a --sync profile, of course).

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 10

•

17 years ago

Can we add this benchmark to Tgfx?

Assignee

Comment 11

•

17 years ago

We can, though I'm starting to rethink the value of very targetted tests in Tgfx. Right now there are a bunch of actual pages in there, which is good, but also some specific benchmarks that I wrote. These were good while I was working on those code paths, but I don't think they're useful to monitor in general -- only people working on those specific areas should really have need of the fine-grained benchmarks.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 12

•

17 years ago

(In reply to comment #11) > We can, though I'm starting to rethink the value of very targetted tests in > Tgfx. Right now there are a bunch of actual pages in there, which is good, but > also some specific benchmarks that I wrote. These were good while I was > working on those code paths, but I don't think they're useful to monitor in > general -- only people working on those specific areas should really have need > of the fine-grained benchmarks. > Why not? Fine grained benchmarks are a great way to isolate the particular problem at hand. If we already got them why not get them into the "permanent record" as it were.

Assignee

Comment 13

•

17 years ago

Attached patch test patch [Checkin: Comment 17] (obsolete) (deleted) — Details — Splinter Review

bz, can you test this patch out and see if it improves things? It's not a final fix, because I still need to understand why the optimization is actually slowing things down, but this improves my perf by at least 2x.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Reporter

Comment 14

•

17 years ago

That patch definitely improves things a lot.

Assignee

Comment 15

•

17 years ago

(In reply to comment #14) > That patch definitely improves things a lot. Not sure whether to be happy or not about that. I'm trying to isolate things down into a cairo-perf testcase so that I can see what the heck the bad 'optimized' path is doing, since the 4-pass method does a pile more clipping than the 2-pass. However, what I'm hoping I'll find is that there's some brokenness that causes 2-pass to be slower for a whole bunch of good reasons, in which case I can go and delete a pile of code and have either 1-pass or 4-pass rendering.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 16

•

17 years ago

Comment on attachment 296232 [details] [diff] [review] test patch [Checkin: Comment 17] Ok, let's go with this for now. What's happening is that in the 4-pass case, we set up a clip path for just that one side. This means that we end up creating temporary surfaces that are overall small compared to the entire border area, even though we do a bunch more work. For the 2-pass case, we don't do any clipping, we just draw two paths.. but the bounds of both of those paths are the entire rectangle covered by the border area. This means that a temporary surface gets created that's huge. A real, final fix for all of this stuff is to render each rectangular side region independently from the corners, since the sides can be very fast rectangular aligned fills. I don't know whether I should keep this bug open for that or file a new one, but we should at least take this patch for now. Once I do the other bit, I'll remove all the unnecessary code that deals with 4.

Attachment #296232 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Blocks: 377419

Updated

•

17 years ago

Attachment #296232 - Flags: superreview+

Attachment #296232 - Flags: review?(roc)

Attachment #296232 - Flags: review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 17

•

17 years ago

Ok, bandaid checked in.

Updated

•

17 years ago

Attachment #296232 - Attachment description: test patch → test patch [Checkin: Comment 17]

Comment 18

•

17 years ago

See bug 377419 comment 39 for testcase(s), exhibiting 2(!) regressions, which were fixed by this checkin; there was also a discussion in that bug which actually concerns the current bug and not bug 377419...

No longer blocks: 377419

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

17 years ago

Blocks: 400226

Assignee

Comment 19

•

17 years ago

(In reply to comment #18) > See bug 377419 comment 39 for testcase(s), exhibiting 2(!) regressions, which > were fixed by this checkin; You're talking about performance regressions and not functionality/rendering/etc, correct?

Comment 20

•

17 years ago

'Perf', yes: see bug 377419 comment 46.

Reporter

Comment 21

•

17 years ago

So the bad news is that the bandaid made scrolling tinderbox a lot slower. :( A single middle-click on the right edge of the bottom scrollbar (to look at what the Windows builds are doing) takes about 3-4 seconds to repaint. Dragging the scroll thumb or using the arrow keys is even worse (more repaints). Reverting back to 2-pass rendering for inset/outset makes the number < 0.5 second for sure. I bet part of the problem is that tables are drawing too many of the clipped-out borders. In any case, I profiled scrolling the page with arrow keys while running --sync, and during said scrolling we spend about 75% of total time under nsCSSRendering::PaintBorder. I'll attach a jprof profile of just the time spent under there. To make a long story short, the time is largely spent calling: XRenderCompositeTrapezoids XRenderComposite _create_a8_picture XRenderFreePicture all under gfxContext::Clip().

Reporter

Comment 22

•

17 years ago

Attached file Profile of the painting (deleted) — Details

Updated

•

17 years ago

Flags: wanted-next+

Flags: tracking1.9+

Flags: blocking1.9-

Reporter

Updated

•

17 years ago

Blocks: 421601

Reporter

Comment 23

•

17 years ago

It might be worth backing out the band-aid if it fixes bug 421601 or something.... does it?

Martijn Wargers (dead)

Comment 24

•

17 years ago

(In reply to comment #23) > It might be worth backing out the band-aid if it fixes bug 421601 or > something.... does it? Yes, it seems like it does. With the testcase from that bug, I get 160206ms as result before I backed out the patch, after backing out the patch, I got 17914ms. (note, I tested with a debug build)