Closed
Bug 1398658
Opened 7 years ago
Closed 7 years ago
stylo: Refactor some stuff in the style sharing cache
Categories
(Core :: CSS Parsing and Computation, defect, P3)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla57
People
(Reporter: bholley, Assigned: bholley)
References
Details
Attachments
(4 files)
(deleted),
patch
|
emilio
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
emilio
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
emilio
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
emilio
:
review+
|
Details | Diff | Splinter Review |
I spent the weekend working on a rule-node-based reuse scheme for ComputedValues (bug 1397976), which is analogous to how Gecko shares style contexts via the style context tree.
The results there are encouraging, demonstrating a 1% topline improvement on AWSY for 4 threads per heycam's measurements. However, my measurements on the HTML spec show that we still do significantly worse than Gecko, even for STYLO_THREADS=1.
From what I can tell, this is the result of eviction in the style sharing cache. We can store a maximum of 31 entries in the cache, after which point we start evicting elements. Since my scheme in bug 1397976 is based on top of the style sharing cache, this places a limit on how much reuse we can get.
Gecko does have a limit of 10 [1], but it's per-parent. This isn't directly comparable with Stylo's scheme (31 shared for the whole level), and it's easy to construct scenarios where each approach shares better than the other. That said, on the HTML spec, Gecko's approach seems to be working significantly better.
Luckily, I think we can avoid weighing the tradeoffs, and get the best of both worlds. Specifically, my plan is to replace the single LRUCache with a HashMap of LRUCaches, keyed off of parent. In terms of both sharing and lookup performance, this should perform no worse than either Stylo or Gecko, since we basically just trade a hashtable lookup for a reduction in elements to examine, which in turn allows us to cache more elements without increasing search time.
I hacked this up today and measured it, and the results are encouraging. Here are the style context counts for the HTML spec with the various approaches:
Gecko: 27k
Stylo (1 thread, 3 threads): 46k, 84k
Stylo + bug 1397976 (1 thread, 3 threads); 40k, 72k
Stylo + bug 1397976 + this bug (1 thread, 3 threads); 31k, 61.5k
So that's a quite significant improvement.
The obvious downside to my proposed 2-tiered setup is that it's heavier-weight, and the extra allocation and memory traffic could bite us if we're not careful. However, I think we can make it work, especially since we already store the cache in TLS and reuse it across traversals.
My proposal is the following:
* Instead of just an LRUCache, the cache becomes a HashMap<*const ComputedValues, CacheEntry>
* CacheEntry is an enum that is either an Element of a Box<LRUCache>. This would just be two words, so the hashmap would be small and efficient to manage.
* The Element variant is an optimization for the case where we have only one entry in the cache. We can avoid storing an entire StyleSharingCandidate here, and just promote the Element to a Box<LRUCache> when somebody does a lookup for the given key, since anybody performing that lookup has a good chance of inserting a second element afterward.
* To avoid a lot of allocation traffic on the LRUCache buffers, we collect them when clearing the cache, and store them in a vec for reuse, which can also persist across traversals. We may want to set a maximum number of free LRUCaches to persist across traversals in order to keep overhead down.
On the whole, this scheme would increase static leaked data somewhat, but not too much. Instead of 8k per thread for the LRUCache, we'd have a multiplier for the number of LRUCaches we decided to persist (4 might be a reasonable number).
Emilio, WDYT?
[1] http://searchfox.org/mozilla-central/rev/70cfd6ceecacbe779456654b596bbee4f2b8890b/layout/style/GeckoStyleContext.cpp#325
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(emilio)
Comment 1•7 years ago
|
||
I think optimizing for the HTML spec is not the best we could do, fwiw.
> * The Element variant is an optimization for the case where we have only one entry in the cache. We can avoid storing an entire StyleSharingCandidate here, and just promote the Element to a Box<LRUCache> when somebody does a lookup for the given key, since anybody performing that lookup has a good chance of inserting a second element afterward.
Hmm, even when it matches? That doesn't sound great to me, but maybe it's not too bad.
> * To avoid a lot of allocation traffic on the LRUCache buffers, we collect them when clearing the cache, and store them in a vec for reuse, which can also persist across traversals. We may want to set a maximum number of free LRUCaches to persist across traversals in order to keep overhead down.
I'm assuming these would be all empty, right? Do you plan to clear them recursively I guess? As long as we don't leak weak element references outside of the style pass I guess it's fine...
I think this approach sounds super heavy-weight overall.
Your measurements show that this would save 9kb extra on the HTML spec, but you say you want to persist up to 4 LRUCaches per thread, which add up to much more than that (and of course the more threads, the less sharing and more leaked data).
Flags: needinfo?(emilio)
Comment 2•7 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #1)
> Your measurements show that this would save 9kb extra on the HTML spec, but
> you say you want to persist up to 4 LRUCaches per thread, which add up to
> much more than that (and of course the more threads, the less sharing and
> more leaked data).
Err, this is wrong, your 9k is number of ComputedValues, not kb, so I guess the real memory saved depends on what those are, and how much do they cost us.
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #1)
> I think optimizing for the HTML spec is not the best we could do, fwiw.
>
> > * The Element variant is an optimization for the case where we have only one entry in the cache. We can avoid storing an entire StyleSharingCandidate here, and just promote the Element to a Box<LRUCache> when somebody does a lookup for the given key, since anybody performing that lookup has a good chance of inserting a second element afterward.
>
> Hmm, even when it matches? That doesn't sound great to me, but maybe it's
> not too bad.
So, the alternatives would be:
(A) Storing a StyleSharingCandidate inline via something like a SmallVec[;1]. This bumps the size of the HashMap entry from 16 bytes to several hundred (because of the ValidationData), which means that the robin-hood memmoves get expensive.
(B) Throw away the ValidationData in the event of a cache hit.
Neither of those seem great.
>
> > * To avoid a lot of allocation traffic on the LRUCache buffers, we collect them when clearing the cache, and store them in a vec for reuse, which can also persist across traversals. We may want to set a maximum number of free LRUCaches to persist across traversals in order to keep overhead down.
>
> I'm assuming these would be all empty, right?
When we store them, yes - just like the way we store the LRUCache now.
> Do you plan to clear them
> recursively I guess?
Not sure what you mean by "recursively". When we clear the HashMap, we'd hold on to the first N entries or something, and clear them before storing.
> As long as we don't leak weak element references
> outside of the style pass I guess it's fine...
>
> I think this approach sounds super heavy-weight overall.
>
> Your measurements show that this would save 9kb extra on the HTML spec, but
> you say you want to persist up to 4 LRUCaches per thread, which add up to
> much more than that (and of course the more threads, the less sharing and
> more leaked data).
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> (In reply to Emilio Cobos Álvarez [:emilio] from comment #1)
> > Your measurements show that this would save 9kb extra on the HTML spec, but
> > you say you want to persist up to 4 LRUCaches per thread, which add up to
> > much more than that (and of course the more threads, the less sharing and
> > more leaked data).
>
> Err, this is wrong, your 9k is number of ComputedValues, not kb, so I guess
> the real memory saved depends on what those are, and how much do they cost
> us.
Bug 1384945 comment 36 suggests these are costing us a lot.
Assignee | ||
Comment 4•7 years ago
|
||
MozReview-Commit-ID: EcVQDLoxwFP
Assignee | ||
Comment 5•7 years ago
|
||
MozReview-Commit-ID: 3l7L0MfdOxh
Assignee | ||
Comment 6•7 years ago
|
||
MozReview-Commit-ID: 2x9BIhmwH83
Assignee | ||
Comment 7•7 years ago
|
||
MozReview-Commit-ID: 70w3bZ2FU0W
Assignee | ||
Updated•7 years ago
|
Attachment #8906831 -
Flags: review?(emilio)
Assignee | ||
Updated•7 years ago
|
Attachment #8906832 -
Flags: review?(emilio)
Assignee | ||
Updated•7 years ago
|
Attachment #8906833 -
Flags: review?(emilio)
Assignee | ||
Updated•7 years ago
|
Attachment #8906834 -
Flags: review?(emilio)
Assignee | ||
Comment 8•7 years ago
|
||
There are just some initial refactoring / bug fixes that I wanted to get reviewed along the way.
Updated•7 years ago
|
Attachment #8906831 -
Flags: review?(emilio) → review+
Comment 9•7 years ago
|
||
Comment on attachment 8906832 [details] [diff] [review]
Part 2 - Fix an awful bug in LRUCache::touch(). v1
Review of attachment 8906832 [details] [diff] [review]:
-----------------------------------------------------------------
::: servo/components/style/cache.rs
@@ +38,5 @@
>
> #[inline]
> /// Touch a given entry, putting it first in the list.
> pub fn touch(&mut self, pos: usize) {
> + if pos != 0 {
Gah, this has been here since we switched the order of the cache.
Attachment #8906832 -
Flags: review?(emilio) → review+
Comment 10•7 years ago
|
||
Comment on attachment 8906833 [details] [diff] [review]
Part 3 - Encapsulate the sharing cache backend better. v1
Review of attachment 8906833 [details] [diff] [review]:
-----------------------------------------------------------------
looks good, only formatting nits.
::: servo/components/style/sharing/mod.rs
@@ +400,5 @@
> + validation_data: validation_data,
> + });
> + }
> +
> + fn lookup<F>(&mut self, mut f: F) -> Option<ElementStyles>
nit: maybe rename `f` to `test_one`?
@@ +401,5 @@
> + });
> + }
> +
> + fn lookup<F>(&mut self, mut f: F) -> Option<ElementStyles>
> + where F: FnMut(&mut StyleSharingCandidate<E>) -> bool {
nit: move the F condition to the next line, brace to the following one.
@@ +464,5 @@
> }
> }
>
> impl<E: TElement> StyleSharingCache<E> {
> + #[allow(dead_code)]
I'd just remove it, unless you want to use it later.
@@ +584,3 @@
> }
>
> + self.cache_mut().lookup(|candidate|
nit: braces for multi-line closure,
@@ +682,4 @@
> }
>
> + debug_assert!(target.has_current_styles_for_traversal(
> + &candidate.element.borrow_data().unwrap(),
nit: indentation looks slightly weird.
Attachment #8906833 -
Flags: review?(emilio) → review+
Comment 11•7 years ago
|
||
Comment on attachment 8906834 [details] [diff] [review]
Part 4 - Avoid memmoving ValidationData more than necessary. v1
Review of attachment 8906834 [details] [diff] [review]:
-----------------------------------------------------------------
I'd really prefer not doing this, have you verified you're doing actual memmoving, and that your patch skips it?
::: servo/components/style/sharing/mod.rs
@@ +500,1 @@
> pub fn insert_if_possible(&mut self,
We only have one caller, can you instead inline this function?
I'm actually surprised LLVM doesn't inline this by default.
Attachment #8906834 -
Flags: review?(emilio)
Comment 12•7 years ago
|
||
Comment on attachment 8906834 [details] [diff] [review]
Part 4 - Avoid memmoving ValidationData more than necessary. v1
Review of attachment 8906834 [details] [diff] [review]:
-----------------------------------------------------------------
Per IRC discussion inlining doesn't help, so this is ok, I guess, but let's remove the other stuff.
::: servo/components/style/sharing/mod.rs
@@ +392,5 @@
> }
> }
>
> impl<E: TElement> SharingCache<E> {
> #[inline(always)]
Since I thought this was there only to avoid memmoving, and per IRC discussion it's not necessary, let's rip it off.
@@ +393,5 @@
> }
>
> impl<E: TElement> SharingCache<E> {
> #[inline(always)]
> + fn insert(&mut self, el: E, validation_data_holder: &mut StyleSharingTarget<E>) {
I don't really prefer if this was called target, or style_sharing_target, but I guess it's fine.
Attachment #8906834 -
Flags: review+
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #10)
> Comment on attachment 8906833 [details] [diff] [review]
> Part 3 - Encapsulate the sharing cache backend better. v1
>
> Review of attachment 8906833 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> looks good, only formatting nits.
>
> ::: servo/components/style/sharing/mod.rs
> @@ +400,5 @@
> > + validation_data: validation_data,
> > + });
> > + }
> > +
> > + fn lookup<F>(&mut self, mut f: F) -> Option<ElementStyles>
>
> nit: maybe rename `f` to `test_one`?
I'll go with is_match().
>
> @@ +401,5 @@
> > + });
> > + }
> > +
> > + fn lookup<F>(&mut self, mut f: F) -> Option<ElementStyles>
> > + where F: FnMut(&mut StyleSharingCandidate<E>) -> bool {
>
> nit: move the F condition to the next line, brace to the following one.
Fixed.
>
> @@ +464,5 @@
> > }
> > }
> >
> > impl<E: TElement> StyleSharingCache<E> {
> > + #[allow(dead_code)]
>
> I'd just remove it, unless you want to use it later.
Given the unsafe code involved, I think it's better to keep, since somebody may want to use it and might prefer to not page in all the safety properties.
>
> @@ +584,3 @@
> > }
> >
> > + self.cache_mut().lookup(|candidate|
>
> nit: braces for multi-line closure,
Fixed.
>
> @@ +682,4 @@
> > }
> >
> > + debug_assert!(target.has_current_styles_for_traversal(
> > + &candidate.element.borrow_data().unwrap(),
>
> nit: indentation looks slightly weird.
Fixed.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #12)
> > impl<E: TElement> SharingCache<E> {
> > #[inline(always)]
>
> Since I thought this was there only to avoid memmoving, and per IRC
> discussion it's not necessary, let's rip it off.
Fixed.
>
> @@ +393,5 @@
> > }
> >
> > impl<E: TElement> SharingCache<E> {
> > #[inline(always)]
> > + fn insert(&mut self, el: E, validation_data_holder: &mut StyleSharingTarget<E>) {
>
> I don't really prefer if this was called target, or style_sharing_target,
> but I guess it's fine.
I did it this way just to make it easier for the reader to realize that a target isn't conceptually necessary for the cache insertion per se, and that we're just using it as a hacky storage for the validation data.
Assignee | ||
Comment 14•7 years ago
|
||
Assignee | ||
Comment 15•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=77e223bd969c2255b21063c7fd81e6df990b64bb
Landing the set of already-reviewed refactoring patches in https://github.com/servo/servo/pull/18453
Updated•7 years ago
|
status-firefox55:
--- → wontfix
status-firefox56:
--- → wontfix
status-firefox57:
--- → affected
status-firefox-esr52:
--- → wontfix
Comment 16•7 years ago
|
||
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Assignee | ||
Comment 17•7 years ago
|
||
This didn't land, only the refactoring patches.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 18•7 years ago
|
||
Actually, since we're probably not going to land 2tier at this point, I'll just morph this bug to describe what did land.
The full 2tier patches are at https://github.com/bholley/gecko/commits/2tiercache
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Summary: stylo: Use a two-tiered setup for the style sharing cache → stylo: Refactor some stuff in the style sharing cache
You need to log in
before you can comment on or make changes to this bug.
Description
•