572200 - Consider always caching reset structs on rulenodes when asked for them

Assignee

Description

•

15 years ago

Attached patch Measurement patch (deleted) — Details — Splinter Review

I used the attached patch to record which rulenodes had PropagateDependentBit called on them, on the assumption that we'd cache at those callsites. Then in ~nsRuleNode I classified rulenodes into 4 bins: either the rulenode has _some_ reset struct cached on itself or it does not, and either it's had its PropagateDependentBit member called or it has not. The data varies slightly depending on the test load, but here are the numbers for our Tp4 test: 43471 had PropagateDependentBit but no cached data 62778 had cached data but not PropagateDependentBit 109515 had both 78368 had neither. So looks like about 41% of rulenodes are saving the memory needed by a nsCachedResetData. 14% would stop doing so if we cached in PropagateDependentBit. The memory involved is 14 words; I'm still working on measuring the performance impact, if any. If I look at actual dependent bits, not just the nodes that had PropagateDependentBit called on them, then a higher percentage of nodes has the bit but not cached data; apparently some rulenodes have dependent bits but never have style data gotten from them directly. For inherit structs we'd almost certainly not want to do this. I didn't measure, but the chance of many rulenodes having at least one inherit struct is ... low.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 1

•

15 years ago

I tried doing this, and it seems to be a 1% win or so, maybe, on the benchmark I was measuring on. I'll try to get more data, ideally with less noise...

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

15 years ago

Blocks: 500410

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Priority: -- → P1

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Priority: P1 → P2

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

14 years ago

One other note. If we did this, so that the mCachedResetData would never be null for a rulenode, then we could also ensure that mCachedResetData is non-null for a style context by just pointing it at mRuleNode's mCachedResetData as needed. We could further cache structs on the rulenode that asked for them, so that reset structs are treated in the rule tree just like inherit structs in the context tree: always cached right where you want them...

David Baron :dbaron:

Comment 3

•

14 years ago

So the idea here is that you're proposing that we cache structs when we set dependent bits, possibly only on the requesting node rather than on all nodes on the path? It seems like a classic space vs. size tradeoff; I don't have an opinion on what the right thing is other than that we should measure the tradeoff.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 4

•

14 years ago

> you're proposing that we cache structs when we set dependent bits Yes, pretty much. And that if we do this for all nodes in that path, then we should consider not heap-allocating the mCachedResetData for rulenodes. Any idea what would actually be useful to measure here?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

14 years ago

I did do some measurements basically using the following testcase: 1) Load HTML5 single-page spec. 2) Change body.style.top. 3) Flush out restyles only (not layout). Measure the time this step takes. Note that the vast majority of the time is reparenting style contexts, so there is no selector matching or anything involved; just recursing down in ReResolveStyleContext, creating the new style contexts, and the various Get/PeekStyleData involved, plus the CalcDifference calls. On my machine a vanilla Mac build does this in 275ms or so. If I implement the suggestion from this bug, that drops to 260ms. If I then inline the fast path in nsRuleNode::GetStyle* (which is now always hit after the first time for that rulenode), it drops to 245ms. The overall time for that testcase is about 23% recursive calls in ReResolveStyleContext (maybe we should pass along fewer arguments... if we can), 15% GetChildList calls on nsBlockFrame and nsContainerFrame (mostly the latter; see bug 645597), 6% the actual creation of new style contexts. The places where time is spent getting styles are DidSetStyleContext (5% of total time after the changes listed above), CalcStyleDifference (20-25% of the time after said changes). So there's no way we can get more than a 30% speedup on that testcase no matter what we do in this bug. I _think_ that testcase is the one where we call into the ruletree most in comparison to other things we're doing. I could be wrong, of course. I find myself wishing for the ability to write self-modifying code so that after the first GetStyle* call and caching the data I can just self-modify away all the null-checks and gunk and have a fast-path that gets from the cache. I've clearly been spending too much time with the jits. :( I wonder whether a better approach is reducing the number of style structs we have, such that we can stop lazily heap-allocating the storage for them. That might get us enough wins in terms of not needing null-checks plus better cache locality

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

14 years ago

To continue: to do even better than the above numbers (if nothing else because there would be no cost then to caching everything on every rulenode, etc).

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

14 years ago

This could be an interesting project for a layout intern if we ever get one: measuring just what properties we actually care about how much and figuring out whether our current set of style structs is the right one...

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

14 years ago

Attached patch Hackeypatch I used to get those numbers (deleted) — Details — Splinter Review

David Baron :dbaron:

Comment 9

•

12 years ago

So as a result of profiling the testcase in bug 780341 comment 56, I came back to this. I wrote a somewhat similar patch to Boris's, also based on the idea of caching only on rule nodes that actually have a style context (though that's technically slightly different from Boris's idea of doing it on-request), rather than the intermediate rule nodes in the chain that aren't pointed to. Except I took a slightly different tack, and decided to (a) mark rule nodes that have a style context pointing to them and (b) maintain the invariant that any rule node that has a style context pointing to it *always* has any style structs in its mDependentBits cached, which allows completely eliminating nsRuleNode::GetParentData/GetParent##struct_ and removing the mDependentBits check in nsRuleNode::GetStyleData/GetStyle##struct_ around the call to it. (I'm also doing it for both reset and inherited structs; I'm not sure whether that's a good idea or not, though I expect it won't make much difference, and I think the uniformity is probably valuable.) This ended up giving me a 14% improvement on the testcase, which is at least intended to have the same performance characteristics as the gaia homescreen, and thus basically be a real testcase (related to dynamic changes of transform). (This is substantially bigger, by percentage, than the 275ms to 260ms that Boris reported in comment 5; presumably the additional inlining that Boris used there to get to 245ms could help here as well.) The patches I tested with are: https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/rule-node-know-used-directly https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/rule-node-dont-destroy-structs https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/cache-on-used-rule-nodes I think it's worth noting that this is likely to have the most performance improvement on testcases that are doing simple dynamic changse to non-inherited properties on elements with a lot of descendants that match a lot of style rules.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 10

•

12 years ago

Hmm. So those patches don't give us an always-non-null mCachedResetData, right? So comment 2 won't really apply... Even without that I think it's a good idea.

David Baron :dbaron:

Comment 11

•

12 years ago

Some performance numbers: without patches: https://tbpl.mozilla.org/?tree=Try&rev=1c8e1f919a6d with patches: https://tbpl.mozilla.org/?tree=Try&rev=283945137774

Boris Zbarsky [:bzbarsky]

Assignee

Comment 12

•

12 years ago

I'm not seeing any obvious positive changes there based on http://perf.snarkfest.net/compare-talos/index.html?oldRevs=1c8e1f919a6d&newRev=283945137774&submit=true :(

David Baron :dbaron:

Comment 13

•

12 years ago

Are any of the numbers a page loading performance test? None of them look like it to me. (Do we still run such a test?) I was also interested in looking at memory numbers for the pageload tests.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 14

•

12 years ago

> Are any of the numbers a page loading performance test? Hrm. I don't know.... :(

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

12 years ago

The claim on irc is that tp5n_paint may or may not be the pageload time. And that compare-talos didn't get updated when people once again renamed tests....

David Baron :dbaron:

Comment 16

•

12 years ago

On the assumption that tp5n_paint is the relevant pageloading performance test, and tp5n_main_rss_paint is the relvant pageloading memory test: Roughly a 1% improvement on tp5n_paint Linux32: Welch Two Sample t-test data: c(366.71, 369.69, 368.22, 367.83, 368.06) and c(365.59, 366.56, 363.35, 361.9, 368.34) t = 2.3839, df = 5.35, p-value = 0.05955 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1695921 6.0775921 sample estimates: mean of x mean of y 368.102 365.148 Roughly a 2% improvement on tp5n_paint Linux64: Welch Two Sample t-test data: c(317.95, 323.61, 322.27, 322.78, 319.85) and c(314.7, 315.12, 316.87, 316.68, 313.85) t = 4.8944, df = 6.264, p-value = 0.002417 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 2.953949 8.742051 sample estimates: mean of x mean of y 321.292 315.444 Roughly a 1% increase on tp5n_main_rss_paint Linux32: Welch Two Sample t-test data: c(109.9, 109.9, 109.8, 109.6, 109.3) and c(110.7, 111.2, 114.9, 110.5, 110.3) t = -2.1024, df = 4.141, p-value = 0.101 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -4.1915361 0.5515361 sample estimates: mean of x mean of y 109.70 111.52 Much smaller increase on tp5n_main_rss_paint Linux64: Welch Two Sample t-test data: c(143.4, 143.2, 142.8, 142.7, 142.4) and c(143.3, 142.8, 143.3, 142.9, 143) t = -0.7752, df = 6.388, p-value = 0.4659 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.6576939 0.3376939 sample estimates: mean of x mean of y 142.90 143.06

David Baron :dbaron:

Comment 17

•

12 years ago

Performance tests on additional platforms: baseline: https://tbpl.mozilla.org/?tree=Try&rev=dedd3fa71549 with patches: https://tbpl.mozilla.org/?tree=Try&rev=88183b4e69e8

David Baron :dbaron:

Comment 18

•

12 years ago

Some but not all of those results (before patches, then after): OSX 10.6 tp5n_paint: data: c(257.82, 258, 258.68, 258.84, 257.93) and c(255.72, 257.29, 256.31, 255.71, 257.52) t = 3.993, df = 6.208, p-value = 0.006694 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.6838871 2.8041129 sample estimates: mean of x mean of y 258.254 256.510 OSX 10.6 tp5n_main_rss_paint: data: c(179.2, 179.8, 179.2, 178.7, 179.2) and c(178.6, 177.9, 179.1, 178.1, 178.7) t = 2.6702, df = 7.667, p-value = 0.02944 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.09608106 1.38391894 sample estimates: mean of x mean of y 179.22 178.48 WINNT 6.1 tp5n_paint: data: c(318.46, 318.38, 315.55, 316.99, 313.13) and c(314.06, 314.48, 314.6, 314.35, 312.33) t = 2.3477, df = 5.366, p-value = 0.06225 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1849325 5.2609325 sample estimates: mean of x mean of y 316.502 313.964 WINNT 6.1 tp5n_pbytes_paint: data: c(119.7, 121.1, 121.4, 121.9, 119.9) and c(121.3, 120.7, 121, 120, 121.2) t = -0.0819, df = 6.182, p-value = 0.9373 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.226767 1.146767 sample estimates: mean of x mean of y 120.80 120.84

David Baron :dbaron:

Comment 19

•

12 years ago

Attached patch , part 1: Make rule nodes know whether they're used directly by a style context. (deleted) — Details — Splinter Review

Attachment #678991 - Flags: review?(bzbarsky)

David Baron :dbaron:

Comment 20

•

12 years ago

Attached patch , part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own. (deleted) — Details — Splinter Review

Attachment #678992 - Flags: review?(bzbarsky)

David Baron :dbaron:

Comment 21

•

12 years ago

Attached patch , part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly. (deleted) — Details — Splinter Review

Attachment #678994 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 22

•

12 years ago

Comment on attachment 678991 [details] [diff] [review] , part 1: Make rule nodes know whether they're used directly by a style context. r=me

Attachment #678991 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 23

•

12 years ago

Comment on attachment 678992 [details] [diff] [review] , part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own. r=me

Attachment #678992 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 24

•

12 years ago

Comment on attachment 678994 [details] [diff] [review] , part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly. I don't think allocation from the presshell arena is infallible yet. We should really fix that... r=me modulo that.

Attachment #678994 - Flags: review?(bzbarsky) → review+

David Baron :dbaron:

Updated

•

12 years ago

Depends on: 809533

David Baron :dbaron:

Comment 25

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #24) > Comment on attachment 678994 [details] [diff] [review] > , part 3: Cache data that lives in the rule tree on every relevant rule > node that has a style context directly pointing to it directly. > > I don't think allocation from the presshell arena is infallible yet. We > should really fix that... > > r=me modulo that. ok, fixing that in bug 809533.

David Baron :dbaron:

Comment 26

•

12 years ago

remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/bc2720708bc2 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d5566569a5ce remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/074e0ab07f0b

Ryan VanderMeulen [:RyanVM]

Comment 27

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/bc2720708bc2 https://hg.mozilla.org/mozilla-central/rev/d5566569a5ce https://hg.mozilla.org/mozilla-central/rev/074e0ab07f0b

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla19

David Baron :dbaron:

Comment 28

•

12 years ago

Comment on attachment 678991 [details] [diff] [review] , part 1: Make rule nodes know whether they're used directly by a style context. [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen. Testing completed (on m-c, etc.): on mozilla-central Risk to taking this patch (and alternatives if risky): Very low; just setting a bit on an an object. String or UUID changes made by this patch: None

Attachment #678991 - Flags: approval-mozilla-aurora?

David Baron :dbaron:

Comment 29

•

12 years ago

Comment on attachment 678992 [details] [diff] [review] , part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own. [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen. Testing completed (on m-c, etc.): on mozilla-central Risk to taking this patch (and alternatives if risky): Low; just tells destructor not to destroy objects that we don't own. String or UUID changes made by this patch: None

Attachment #678992 - Flags: approval-mozilla-aurora?

David Baron :dbaron:

Comment 30

•

12 years ago

Comment on attachment 678994 [details] [diff] [review] , part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly. [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen. Testing completed (on m-c, etc.): on mozilla-central Risk to taking this patch (and alternatives if risky): The risk from this patch is probably relatively low simply because the code is exercised so well by just about everything in the browser that we'd notice quickly if there were something wrong with it. That said, there's always the possibility that it has some sort of mistake that we don't catch quickly (e.g., a new rare crash, a new memory leak). String or UUID changes made by this patch: None

Attachment #678994 - Flags: approval-mozilla-aurora?

bhavana bajaj [:bajaj]

Comment 31

•

12 years ago

Comment on attachment 678991 [details] [diff] [review] , part 1: Make rule nodes know whether they're used directly by a style context. Approving on aurora as the risk is manageable and this is something that helps b2g

Attachment #678991 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

bhavana bajaj [:bajaj]

Updated

•

12 years ago

Attachment #678992 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

bhavana bajaj [:bajaj]

Updated

•

12 years ago

Attachment #678994 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

David Baron :dbaron:

Comment 32

•

12 years ago

remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/af8428e09956 remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/44de69873585 remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/dd0abb35994c

status-firefox18: --- → fixed

Measurement patch 15 years ago Boris Zbarsky [:bzbarsky] (deleted), patch		Details \| Diff \| Splinter Review
Hackeypatch I used to get those numbers 14 years ago Boris Zbarsky [:bzbarsky] (deleted), patch		Details \| Diff \| Splinter Review
, part 1: Make rule nodes know whether they're used directly by a style context. 12 years ago David Baron :dbaron: (deleted), patch	bzbarsky : review+ bajaj : approval-mozilla-aurora+	Details \| Diff \| Splinter Review
, part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own. 12 years ago David Baron :dbaron: (deleted), patch	bzbarsky : review+ bajaj : approval-mozilla-aurora+	Details \| Diff \| Splinter Review
, part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly. 12 years ago David Baron :dbaron: (deleted), patch	bzbarsky : review+ bajaj : approval-mozilla-aurora+	Details \| Diff \| Splinter Review