Closed
Bug 585258
Opened 14 years ago
Closed 14 years ago
Performance regression on the testcase in bug 424715
Categories
(Core :: Web Painting, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | final+ |
People
(Reporter: bzbarsky, Assigned: roc)
References
(Depends on 1 open bug, Blocks 1 open bug, )
Details
(Keywords: perf, regression)
Attachments
(4 files, 1 obsolete file)
(deleted),
application/java-archive
|
Details | |
(deleted),
patch
|
dbaron
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
We've got to add this to some regression suite...
I profiled the testcase in bug 424715 after bug 424715 comment 35, and a large part of what seems to be happening is that we're doing a LOT of nsRegion::Sub operations in nsDisplayItem::RecomputeVisibility. I would assume that we in fact do one Sub() call for each div, so each painting pass ends up O(N^2) in number of divs. Is this a region we can sanely simplify outward if it gets too complicated or something?
Note that this code was introduced in "Bug 564991. Part 9: Retain layer trees. r=tnikkel,sr=mats"
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
Keywords: perf,
regression
Assignee | ||
Comment 1•14 years ago
|
||
Yes, we should simplify there.
Reporter | ||
Updated•14 years ago
|
Comment 2•14 years ago
|
||
I made a scrolling test out of this. These are the results:
Fx3.6.8: 46154ms
2010-07-14: 54926ms
2010-07-16: 35751ms
2010-08-30: 43296ms
Seamonkey 1.9.1 build of 2010-07-01: 8376ms
Opera: 10703ms
Between 2010-07-14 and 2010-07-16 is when bug 564991 landed.
Btw, notice how 1.9.1 builds seem to be that much quicker.
Is this test useful for this bug? Is this useful for a regression testsuite? What regression suite was commented about in comment 0?
Reporter | ||
Comment 3•14 years ago
|
||
> Is this test useful for this bug?
No, but it's a useful test....
> What regression suite was commented about in comment 0?
We don't have one like that yet. That's the problem.
roc, this should probably block a beta, right?
Assignee | ||
Updated•14 years ago
|
Attachment #471538 -
Attachment mime type: application/zip → application/java-archive
Assignee | ||
Comment 4•14 years ago
|
||
I don't think it needs to block beta, since all we need to do here is back off some analysis when it's getting expensive and repaint the entire window. But it definitely blocks!
Martijn's test should probably be added to Tscroll. But we don't need to use a huge file, we can document.write() our way to victory.
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #4)
> I don't think it needs to block beta, since all we need to do here is back off
> some analysis when it's getting expensive and repaint the entire window.
In fact we won't even need to repaint the window.
Assignee | ||
Updated•14 years ago
|
blocking2.0: ? → final+
Comment 6•14 years ago
|
||
I know this is "blocking final+" but my understanding is there are a large number of blocking bugs and not very much time to ship and cuts must come.
Can anyone really fix this fairly large performance regression (in terms of effort people made the test case performant in the first place) or should it really be considered a more long term gaol?
Assignee | ||
Comment 7•14 years ago
|
||
I think we can fix it fairly easily.
Comment 8•14 years ago
|
||
Doing a quick test of the original benchmark on my fairly fast system the current browsers look like this to me:
Firefox nightly build: 160 seconds at about 110/120 frames and then crash http://crash-stats.mozilla.com/report/index/bp-3f3378ae-890e-480d-b5a5-7e13a2101219
IE 9 Beta 2: 46.774 seconds
Chrome latest Canary Build: 10.877 seconds
Opera 11: 8.17 seconds
Reporter | ||
Comment 9•14 years ago
|
||
Yes, there's no point retesting until this bug is fixed. :(
Comment 10•14 years ago
|
||
Sorry for the bugspam, I get a little excited sometimes as I don't have much spare time to do Firefox testing these days. Good luck for Fx4 release.
Assignee | ||
Comment 11•14 years ago
|
||
Link to the testcase: jar:https://bug585258.bugzilla.mozilla.org/attachment.cgi?id=471538!/progressive_raytracer.html
Assignee | ||
Comment 12•14 years ago
|
||
My time is 53692ms in my debug build.
Assignee | ||
Comment 13•14 years ago
|
||
Actually that was 53s in an opt build.
Assignee | ||
Comment 14•14 years ago
|
||
This reduces it to 37s for me.
Attachment #499230 -
Flags: review?(dbaron)
Reporter | ||
Comment 15•14 years ago
|
||
OK, so I applied that patch. Things are still Really Slow, though. Compared to bug 424715 comment 26, I see something like 87% of the time in painting (used to be 42%). 20% is spent building the display list (14%) and computing visibility (6%). But painting itself is 65%: 10% building layers and calling WillEndTransaction on the layer builder, and 55% painting backgrounds...
Quartz debug (insofar as I can trust it) claims we repaint a box bounded by the text in the upper right corner and the most-recent line of the raytrace, so we would be repainting all the previously-painted things. I bet we didn't use to do that before...
Assignee | ||
Comment 16•14 years ago
|
||
On this hardware, FF 3.6 is 70s, Chrome is 13s. I'll see what else we can do, but this is a rare sort of testcase; we hardly ever see real pages with such an absurd number of element visible at the same time.
Assignee | ||
Comment 17•14 years ago
|
||
Oh, I'm testing the scroll link from the testcase in comment #2. Are you testing pageload in the original test?
Assignee | ||
Comment 18•14 years ago
|
||
At least during scrolling, we're repainting exactly the areas you'd expect.
Reporter | ||
Comment 19•14 years ago
|
||
3.6 is before we started fixing the deps of bug 424715, as I recall....
I just tried modifying the testcase by removing the innerHTML set for the thing at the top, and time I see dropped from 61s (with the attached patch) to about 37s. Safari 5 and Chrome are both at about 11s (with the innerHTML set).
We still have the fix for bug 516740 in, so that's not it. :(
Reporter | ||
Comment 20•14 years ago
|
||
> Oh, I'm testing the scroll link from the testcase in comment #2.
Oh. See comment 3!
I'm testing the original test. Load http://nontroppo.org/timer/progressive_raytracer.html and click "Full Render".
Comment 21•14 years ago
|
||
Just FYI... Early betas of FF 3.7 had this test case down to 11 seconds on my Core2Duo, so this appears to have been more or less completely resolved at one point.
Reporter | ||
Comment 22•14 years ago
|
||
Yes, we know... See comment 0. ;)
Assignee | ||
Comment 23•14 years ago
|
||
OK thanks for clarifying that.
Now I think that the problem is in FrameLayerBuilder::InvalidateThebesLayerContents. The invalid region we accumulate gets complicated because the rects we invalidate aren't simply in top-to-bottom, left-to-right order; we get some holes. When it gets complicated, nsRegion::SimplifyOutward fluffs out to the bounding box, which includes the text at the top-right, so we invalidate a big chunk of the layer.
I think we need to make nsRegion::SimplifyOutward smarter.
Assignee | ||
Comment 24•14 years ago
|
||
This takes the "full render" from 45s to 30s on my machine. We repaint a constant number of display items in every pass --- 732, which seems high. I'll look into that next.
Attachment #499257 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 25•14 years ago
|
||
Oh, 732 is about right actually. 3 rows, each 240 pixels (elements) wide.
Assignee | ||
Comment 26•14 years ago
|
||
Hmm, xperf is still not nearly as good as Shark for this work, if I'm using it right...
Assignee | ||
Comment 27•14 years ago
|
||
Maybe --enable-profiling would help...
Assignee | ||
Comment 28•14 years ago
|
||
Indeed, xperf looks quite good now.
Painting time still dominates (74% of total profile in nsLayoutUtils::PaintFrame). Of the painting time, roughly 33% is display-list construction (BuildDisplayListForStackingContext), 28% is ComputeVisibilityForRoot, 34% is layer construction (under FrameLayerBuilder::BuildContainerLayerFor and FrameLayerBuilder::WillEndTransaction), and 5% is actual painting (under LayerManagerD3D9::EndTransaction).
Reporter | ||
Comment 29•14 years ago
|
||
Hmm. So I tried applying the second patch, but it doesn't seem to help anything here... In fact, even if I make nsRegion::SimplifyOutward a no-op that changes nothing. I still see over 50% of the time spent in actual painting, and over 87% spent in painting-related stuff (including painting, layer construction, etc)...
Maybe we're simplifying the region somewhere else on Mac too or something? Or maybe our painting really has gotten that much slower? I'm seeing this crap land inside CoreGraphics color management stuff, for one thing.
Reporter | ||
Comment 30•14 years ago
|
||
Comment on attachment 499257 [details] [diff] [review]
Part 2: Make SimplifyOutward smarter
>+ pRect->YMost () >= pRect->next->y)
Drop the space char after "YMost" here?
>+ pRect->UnionRect(*pRect, *pRect->next);
>+ delete Remove (pRect->next);
And here after "Remove"?
r=me with those nits.
Attachment #499257 -
Flags: review?(bzbarsky) → review+
Reporter | ||
Comment 31•14 years ago
|
||
But again, part 2 seems to do nothing for me on Mac.
Assignee | ||
Comment 32•14 years ago
|
||
(In reply to comment #30)
> Comment on attachment 499257 [details] [diff] [review]
> Part 2: Make SimplifyOutward smarter
>
> >+ pRect->YMost () >= pRect->next->y)
>
> Drop the space char after "YMost" here?
>
> >+ pRect->UnionRect(*pRect, *pRect->next);
> >+ delete Remove (pRect->next);
>
> And here after "Remove"?
>
> r=me with those nits.
Those are consistent with the rest of nsRegion.cpp.
Assignee | ||
Comment 33•14 years ago
|
||
If you count the number of times we reach here:
http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#1672
in each call to DrawThebesLayer, you should see that number growing linearly with each paint before part 2, and not growing with each paint after part 2. At least I did.
(In reply to comment #28)
> Painting time still dominates (74% of total profile in
> nsLayoutUtils::PaintFrame). Of the painting time, roughly 33% is display-list
> construction (BuildDisplayListForStackingContext), 28% is
> ComputeVisibilityForRoot, 34% is layer construction (under
> FrameLayerBuilder::BuildContainerLayerFor and
> FrameLayerBuilder::WillEndTransaction), and 5% is actual painting (under
> LayerManagerD3D9::EndTransaction).
Of those, the layer construction time (34%) is new. The rest should be pretty much unchanged since 3.6. I guess it's possible that Quartz painting has slowed down ... we've switched from drawing to a CGBitmapContext to a CGLayer, and maybe that adds overhead that shows up when you're drawing 59,000 1x1 pixel rects.
I'm willing to do small tweaks to make this testcase faster but not major restructuring. This testcase has 100x more visible elements than any page you'll ever see that's not doing some kind of one-DIV-per-pixel hack ... and <canvas> is the right API for this use-case.
Reporter | ||
Comment 34•14 years ago
|
||
> Those are consistent with the rest of nsRegion.cpp.
Ah, ok.
I agree we shouldn't spend too much time on this except insofar as real pages might be affected. I'll take a look at the counts I see.
Assignee | ||
Comment 35•14 years ago
|
||
If the counts keep growing, you'd want to look at calls to ThebesLayerOGL::InvalidateRegion to see which region(s) are being invalidated at each paint. It should be something reasonable, a rectangle for the text and one or more rectangles for the strip of DIVs.
Reporter | ||
Comment 36•14 years ago
|
||
Over here (on Mac), unless I'm screwing something up, I see the number of times we reach that line growing linearly (adding 729 per paint) no matter what I do with nsRegion::SimplifyOutward (including just putting a return at the very beginning of the function)...
Lemme take a look at those regions.
Reporter | ||
Comment 37•14 years ago
|
||
Yeah, those look broken to me. I see two InvalidateRegion calls per paint. The first is passed an empty region. The second is passed a region with three rects. Typical rects for that second call:
x=502 y=-1 width=745 height=231
x=502 y=230 width=243 height=2
x=745 y=230 width=502 height=1
On each call the y values for rects 2 and 3 and the height for rect 1 increase by 3.
So I guess the question is where that first rect comes from.
Comment 38•14 years ago
|
||
Comment on attachment 499257 [details] [diff] [review]
Part 2: Make SimplifyOutward smarter
>+ // Combine with the following rectangle if they have the same YMost
>+ // or if they overlap vertically. This ensures that all overlapping
>+ // rectangles are merged, preserving the invariant that rectangles
>+ // don't overlap.
"... don't intersect" would distinguish from other kinds of overlapping.
Comment 39•14 years ago
|
||
Real page load times are always most important, but it would be cool if there existed a suite of 'chaos' benchmarks that reacted very badly to unintended performance regressions in very important functions, that way they could be treated as less important but give developers of an idea of potential unintended regressions.
Also it's a bit of a shame between Fx 3.6 and 4.0 this benchmark has had Fx as the slowest modern browser, to the fastest, to the slowest again. Much appreciate your attention roc!
Reporter | ||
Comment 40•14 years ago
|
||
I _think_ it's already present in the invalidThebesContent region we get in FrameLayerBuilder::BuildContainerLayerFor. Though that region has 5 rects, and the region that's returned from ToOutsidePixels only has 3...
Comment 41•14 years ago
|
||
Comment on attachment 499230 [details] [diff] [review]
Part 1: Don't make visible region arbitrarily complex in RecomputeVisibility
r=dbaron
Attachment #499230 -
Flags: review?(dbaron) → review+
Assignee | ||
Updated•14 years ago
|
Keywords: checkin-needed
Whiteboard: [needsd landing]
Assignee | ||
Comment 42•14 years ago
|
||
(In reply to comment #40)
> I _think_ it's already present in the invalidThebesContent region we get in
> FrameLayerBuilder::BuildContainerLayerFor. Though that region has 5 rects, and
> the region that's returned from ToOutsidePixels only has 3...
That region is added to in FrameLayerBuilder::InvalidateThebesLayerContents. If you log the incoming aRect and the resulting invalidThebesContent region, you should be able to understand how the region ends up with that oversized first rectangle.
The SimplifyOutward(20) call there is what was causing trouble for me, which patch 2 fixed. Everything you're saying makes it sound like patch 2 was not applied ...
Reporter | ||
Comment 43•14 years ago
|
||
> Everything you're saying makes it sound like patch 2 was not applied ...
It really truly is!
I did this with printfs just now instead of a debugger, and in an opt build to avoid whatever noise a debug build was introducing. I can no longer reproduce comment 37 on, but I do still see comment 36. Here's what a log looks like:
Our region is: Rect count: 2
[(52261,5130) 22559x930]
[(30120,26976) 14580x180]
And will become: Rect count: 2
[(871,85) 376x16]
[(502,449) 243x4]
Invaliding region: Rect count: 2
[(871,-1) 376x16]
[(502,363) 243x4]
Resulting valid region: Rect count: 6
[(0,0) 871x15]
[(0,15) 502x409]
[(502,15) 745x348]
[(745,363) 502x4]
[(502,367) 745x57]
[(0,424) 1247x420]
Painted item count: 45293
The logging for regions logs the rect count and then logs [(x,y) width x height] for each rect.
Those first two regions above are before and after the ToOutsidePixels call in FrameLayerBuilder::BuildContainerLayerFor. The next two are from ThebesLayerOGL::InvalidateRegion. The "painted item count" is the number of times in FrameLayerBuilder::DrawThebesLayer that we get past the cdi->mItem->GetVisibleRect().IsEmpty() check...
So that looks like the valid region we end up with in ThebesLayerOGL::InvalidateRegion is correct (I drew those 6 rects out, and it's the whole viewport except a 376px wide by 16px tall strip at top right and a 243px wide by 4px tall strip in the middle). But we're still painting too much in DrawThebesLayer for some reason.
Reporter | ||
Comment 44•14 years ago
|
||
OK. Looks like the aRegionToDraw passed to FrameLayerBuilder::DrawThebesLayer is too big. Typical values:
region to draw: Rect count: 1
[(502,0) 745x199]
region to draw: Rect count: 1
[(502,0) 745x202]
region to draw: Rect count: 1
[(502,0) 745x205]
etc
Reporter | ||
Comment 45•14 years ago
|
||
Ah, I found it! BasicTextureImage::BeginUpdate in GLContext.cpp always fluffs out aRegion to its bounding rect, with this comment:
// the basic impl can only upload updates to rectangles
So in BasicBufferOGL::BeginPaint we call BeginUpdate, and that sets result.mRegionToDraw to the bounding rect of the two-rect region we started with. With GL disabled, I see about 38 seconds for this testcase over here after applying the patches in this bug; with GL enabled it's closer to 60 seconds. Do we need a separate bug on this?
If I disable opengl layers, then the number of items I'm seeing painted per call to DrawThebesLayer stops growing and stabilizes at 1137 (no idea why; since we're invalidating 4px strips for some reason, I would have expected 4 * 243 = 972 plus maybe a few for the text, so 1137 is still a bit high, but it's less than 5*243...).
Reporter | ||
Comment 46•14 years ago
|
||
Reporter | ||
Comment 47•14 years ago
|
||
And with gl disabled my "paint-related" time is about 75% instead of 87%; most of this is display list management and such; only 4% is actual painting. There seem to be no obvious hotspots in the sense that all the display list stuff is just linear in the number of all things on the page, not just the things in the dirty area.
Assignee | ||
Comment 48•14 years ago
|
||
(In reply to comment #45)
> So in BasicBufferOGL::BeginPaint we call BeginUpdate, and that sets
> result.mRegionToDraw to the bounding rect of the two-rect region we started
> with. With GL disabled, I see about 38 seconds for this testcase over here
> after applying the patches in this bug; with GL enabled it's closer to 60
> seconds. Do we need a separate bug on this?
Yes, thanks. Fixing that would bring Mac in line with Windows here, and after these patches are applied, the problem boils down to "processing display items for all 59,000 visible rectangles in the window is slow".
Reporter | ||
Comment 49•14 years ago
|
||
File bug 621778.
Comment 50•14 years ago
|
||
This compiles, which is nice, but sensible people like a little more than "philor blindly unrotted it with absolutely no understanding of the patch or the change that bitrotted it" for things that are going to land under their name.
Attachment #499230 -
Attachment is obsolete: true
Attachment #500629 -
Flags: feedback?(roc)
Updated•14 years ago
|
Keywords: checkin-needed
Whiteboard: [needsd landing] → [needs landing]
Comment 51•14 years ago
|
||
The reason the original didn't compile is not because it is bitrotted, but because it depends on other changes in roc's queue.
Comment 52•14 years ago
|
||
(specifically bug 602757)
Updated•14 years ago
|
Attachment #500629 -
Attachment is obsolete: true
Attachment #500629 -
Flags: feedback?(roc)
Updated•14 years ago
|
Attachment #499230 -
Attachment is obsolete: false
Updated•14 years ago
|
Depends on: 602757
Whiteboard: [needs landing] → [needs landing][part 1 applies on top of attachment 491151 from bug 602757]
Assignee | ||
Comment 53•14 years ago
|
||
Yes, sorry, I added checkin-need prematurely. Very sorry!
Assignee | ||
Comment 54•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/35013af94ec7
http://hg.mozilla.org/mozilla-central/rev/c3825c079c00
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing][part 1 applies on top of attachment 491151 from bug 602757]
Comment 55•14 years ago
|
||
Does this resolve the actual reported problem or just get it back to Firefox 3.6 levels? Which bug 424715 orriginal solved by making Firefox about 20x faster at this testcase than Firefox 3.6 was. Just need to know if I need to open another bug on the issue.
Assignee | ||
Comment 56•14 years ago
|
||
Gets us back to Firefox 3.6 levels I guess.
We're building a complete display list for the entire window on every paint to simplify the construction and update of our layer tree. That works pretty well when the window has hundreds of visible elements, not well when the window has 60,000 visible elements like here. I don't think we should be designing around this rare case.
Comment 57•14 years ago
|
||
I concur with the reasoning and only hopped the absurdity of the testcase would help find underlying problems when firefox has to deal with a large number of visible elements. And as other browsers still do thisin excess 10x faster I think it is still valid in that sense, will make another low priority bug for this.
Comment 58•14 years ago
|
||
When I was testing this, I found out that on my machine, each next test takes longer (first 3.5 seconds, second about 6 seconds, third 9 seconds, etc). Anyone can reproduce this? Or is this already a known bug?
Comment 59•14 years ago
|
||
All browsers will do this as it is related to the way the code is written.
Updated•6 years ago
|
Component: Layout: View Rendering → Layout: Web Painting
You need to log in
before you can comment on or make changes to this bug.
Description
•