Closed Bug 554004 Opened 15 years ago Closed 2 years ago

Moving lots of SVG circles around the screen has huge display list overhead

Categories

(Core :: SVG, defect)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: webmaster, Unassigned)

References

(Depends on 2 open bugs, )

Details

(Keywords: perf, regression, Whiteboard: [in-the-wild] [external-report][animating cx/cy of lots of circles][display list overhead])

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; sv-SE; rv:1.9.1.8) Gecko/20100216 Fedora/3.5.8-1.fc12 Firefox/3.5.8 Build Identifier: A few benchmarks are provided on the page. Running on my Z61p I can confirm that Firefox is way much slower than Chrome. Note that Jeff Schiller has made a much more Firefox friendly version of this test at http://www.codedread.com/blog/archives/2010/03/22/know-the-beast/ However, the difference for the un-optimized case is worth investigating. Reproducible: Always Steps to Reproduce: 1. Open URL Actual Results: 2.5 fps on my Thinkpad Z61p running Fedora Linux Expected Results: At least 15 fps
14fps on Webkit/Chrome, 12fps on Opera 10.51, 1.8fps on recent trunk nightly for me. i guess profiling is needed here ...
A profile shows us spending most of our time doing invalidation. 0.1% 58.7% AppKit -[NSWindow _setNeedsDisplayInRect:] 0.1% 58.5% AppKit -[NSRegion addRect:] 0.1% 57.8% CoreGraphics CGSUnionRegionWithRect 0.0% 57.3% CoreGraphics shape_union_with_bounds 49.3% 57.2% CoreGraphics shape_union 0.0% 7.8% CoreGraphics mem_heap_realloc 0.1% 7.8% libSystem.B.dylib malloc_zone_realloc
Status: UNCONFIRMED → NEW
Ever confirmed: true
Indeed. At least on Mac and Linux every invalidate just adds to the native invalidation region.... which can get expensive if lots of little rects are invalidated. I could have sworn that we had bugs on this, but I can't find them right now. In any case, I'd think that display-list based invalidation would fix this...
Depends on: dlbi
Another way to fix it is to delay the native widget invalidation flush and to use our own dirty region handling as long as possible. That way we'll keep the region simple (dirtyRegion->SimplifyOutward(8)). I've got an early patch that does that by flushing invalidations from the refresh driver and it speeds up this testcase from 3.5 FPS to 22 FPS on my machine.
Yes, that's the other obvious option. Care to attach the patch? Then I can do some more profiling to see where the remaining bottlenecks are...
Is this something we could get done for Firefox 4? Is the patch likely to help on other work loads as well?
I seem to recall that invalidates from style changes go through a view update batch, so are already coalesced as per comment 5 (though maybe that's changed since I last looked at it?). So this sort of change would primarily help workloads that have non-style-change-related invalidates. That would be SVG and animated images, mostly. Then again, my recollection could be wrong... If the patch described in comment 5 is safe enough, I think it's very much worth doing for 2.0.
There are a number of performance bugs on SVG that are all down to invalidation. In fact they probably form the majority of the performance bugs we have on SVG.
Note that proposal in comment 5 will likely leave invalidation taking more time than fixing bug 539356 would, though.
(In reply to comment #6) > Yes, that's the other obvious option. Care to attach the patch? Filed bug 598482 with the wip patch. (In reply to comment #7) > Is this something we could get done for Firefox 4? Could be possible, but I wouldn't block on it.
Depends on: 598482
OK, with Markus' patch I get 34fps, which is a lot better than by old 3fps. Chrome gets 55fps, though. ;) Curiously, if I turn on GL layers I get closer to 22fps, not 34fps. Matt, joe, want a bug on that? Profile says that: 18% of the time is still under nsIFrame::InvalidateInternal, breaking down as: 4% the recursion up the frame tree 4% doing MayHavePaintEventListener checks from nsPresContext::NotifyInvalidation 3% calling UsingEffectsForFrame 2% calling InvalidateRoot and bouncing into the viewmanager to add to the dirty region 2% GetOffsetToCrossDoc 2% invalidating thebes layers). 11% is under nsSVGPathGeometryFrame::UpdateCoveredRegion (generating paths, creating/destroying gfxContexts, GetUserPathExtent, etc). This part is actually svg code. 2% FindFilterInvalidation 4% HasAttributeDependentStyle (from before and after attr change notifications) 8% nsSVGElement::ParseAttribute; fully 2/3 of this is under PR_strtod via nsSVGLength2::SetBaseValueString. I wonder whether we can expose the CSS parser's number-tokenizing code here; as long as we want to end up with floats, not doubles, it's good enough and it's a lot faster than strtod. And we already use it for numbers elsewhere in SVG, of course.... Jonathan, worth a bug on this? 3% converting numbers to strings in the JS engine (presumably because numbers are what's passed to SetAttribute here). Doesn't SVG actually have APIs that would preclude the need for this serialization-and-parsing dance? 5% running JS stuff 35% painting. Of this, 27% is nsSVGPathGeometryFrame::Render, which lands in cairo and CoreGraphics stuff. There's got to be ways to make that stuff faster... I wonder how it does on Windows with D2D.
Oh, and the point is that there should be a win of about 10-15% (I suspect) there from doing invalidation off display listsand another maybe 3% if we stop using PR_strtod. Which will get us from 34fps to 41fps or so, if all goes right. To make progress past that, we need to speed up painting and UpdateCoveredRegion.
(In reply to comment #12) > > 8% nsSVGElement::ParseAttribute; fully 2/3 of this is under PR_strtod via > nsSVGLength2::SetBaseValueString. I wonder whether we can expose the CSS > parser's number-tokenizing code here; as long as we want to end up with > floats, not doubles, it's good enough and it's a lot faster than strtod. > And we already use it for numbers elsewhere in SVG, of course.... Jonathan, > worth a bug on this? > You already thought of that :-) bug 270255
Bah! Nothing like 6 year's passage to forget that sort of thing! Though in this case, I think the values aren't mapped into CSS, at least, right? I assume they're x and y.
Depends on: 270255
Right. Not sure what you mean about mapped again to CSS in that bug.
(In reply to comment #13) > Oh, and the point is that there should be a win of about 10-15% (I suspect) > there from doing invalidation off display listsand another maybe 3% if we stop > using PR_strtod. Which will get us from 34fps to 41fps or so, if all goes > right. > > To make progress past that, we need to speed up painting and > UpdateCoveredRegion. Caching paths and tesselations would help. Refactoring the cairo D2D and Quartz backends to be thin wrappers around platform APIs would also help.
Keywords: perf
The work in bug 734079 has improved this a bit. For me the testcase now settles at ~40 "FPS" instead of ~30 "FPS". The pre- and post-change nightlies that I tested were: cset 4bdae514b9be https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/03/2012-03-21-03-11-51-mozilla-central/firefox-14.0a1.en-US.mac.dmg cset 5c13fce74f83 https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/03/2012-03-22-03-12-20-mozilla-central/firefox-14.0a1.en-US.mac.dmg
Updated scores from the bug reporter: 15 fps in Nightly on Linux 14.0a1 (2012-03-22) (has the latest bug fixes been applied?) Opera 11.62 manages 31 fps Chromium 130 fps (I've got a better computer nowadays. Sandy Bridge i7 quad core, Nvidia Mobile 400 graphics, but not the best graphics driver installed at the moment. Chrome does not seem to mind, though.)
The most obvious thing I see in the profile from comment 20 is SVG path painting, but even that is only about 35% of the total.
(In reply to On vacation Oct 12 - Oct 27 from comment #13) > Oh, and the point is that there should be a win of about 10-15% (I suspect) > there from doing invalidation off display listsand another maybe 3% if we > stop using PR_strtod. Which will get us from 34fps to 41fps or so, if all > goes right. > Bug 919319 got rid of PR_strtod.
Whiteboard: [in-the-wild] [external-report]
We have massively regressed (30x) between Release 32 and current Nightly.
Bug 982338 regressed this. If I flip off the tiling prefs, then it seems that bug 1074294 improved performance from oscillating 25-30 FPS to oscillating between 30-35 FPS. So maybe a 15% improvement. Still a long way to go on this one.
Blocks: osx-tiling
Depends on: 1074294
Attached patch Simplify invalid region for tiled layers (obsolete) (deleted) — Splinter Review
This just copies what we do for the non-tiled version.
Attachment #8500195 - Flags: review?(jmuizelaar)
Can we please put the regression fix in a different bug instead of hijacking this one?
Comment on attachment 8500195 [details] [diff] [review] Simplify invalid region for tiled layers Yeah, good point, attaching this to bug 1077842 instead.
Attachment #8500195 - Attachment is obsolete: true
Attachment #8500195 - Flags: review?(jmuizelaar)
Bug 1090211 seems to have provided a big boost on this test. It's a bit erratic, but on my machine things go from ~30-40 "FPS" to ~70-100 "FPS". Still some way short of Chrome's 200-250 "FPS" and visually a lot less smooth, but moving in the right direction.
Depends on: 1090211
Comment on attachment 477057 [details] Testcase - append &particles=2000 to the URL to increase the number of particles For me on Mac, increasing the number of circles to 2000 brings Firefox down to ~3 FPS, while Chrome manages 40 FPS.
Attachment #477057 - Attachment description: Standalone testcase → Testcase - append &particles=2000 to the URL to increase the number of particles
Profiling, it seems like we're spending 75% of the time building display lists: https://perfht.ml/2nQUtfp
Depends on: 869505
Summary: On screen rendering of SVG particles is painfully slow → Moving lots of SVG circles around the screen has huge display list overhead
Whiteboard: [in-the-wild] [external-report] → [in-the-wild] [external-report][animating cx/cy of lots of circles][display list overhead]

Here is the latest profile on Windows with WR on, sequential styling enabled in profiler :

https://bit.ly/2UGaI0V
Looks like most of the time is spent in stylo. Emilio, is this something worth looking over, or can this bug be closed?

(For a more extreme case, use : https://bug554004.bmoattachments.org/attachment.cgi?id=477057&particles=2000 . Both Chrome and nightly ~20fps)

Flags: needinfo?(emilio)

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

I think this looks mostly as expected. We could optimize something here: Changing mapped attribute values and such could avoid selector-matching. That's save a 0.6%. Similar optimizations could be done here and there too...

I may take a look at them, but not right now.

Flags: needinfo?(emilio)
Severity: minor → S4

The severity field for this bug is relatively low, S4. However, the bug has 12 votes.
:jwatt, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jwatt)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(jwatt)

We're much much faster than Chrome now. 120 FPS for 2000 particles against Chrome at 15 FPS.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: