Closed Bug 631527 Opened 14 years ago Closed 6 years ago

Parallelize selector matching

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [tech-p3])

Attachments

(4 files, 15 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
This should be doable, if we make sure that selector matching never changes DOM state.
Blocks: 630480
Of course right now it _does_ change DOM state: the various slow selector flags.  We'd need to figure out how to handle that.
OS: Mac OS X → All
Hardware: x86 → All
Depends on: 660959
Depends on: 598833
Depends on: 661746
The other data structure whose updates are interleaves with selector matching is the ruletree.  We need to fix that too.
Assignee: nobody → dzbarsky
Attached patch wip (obsolete) (deleted) — Splinter Review
Notes on pastebin'd WIP

 - Is there a wide variation in the cost of FindRulesFor()?  That is, is it way more expensive for some nodes that others.

 - What's the cost of ResolveStyleFor()/ResolveStyleForNonElement() compared to FindRulesFor(), usually?

You could cut down on a lot of possibly unnecessary synchronization overhead here by dividing the node list among threads before dispatching the tasks to them.  Similarly, you could eliminate a lot of contention by having each thread keep its own table of results.

If there's a huge variation in FindRulesFor(), it might be better to dynamically schedule the tasks among threads by having them steal nodes off a list as here.

If ResolveStyleFor()/ResolveStyleForNonElement() is expensive (and main-thread only?), it might make sense to have the main thread doing this in parallel to workers running FindRulesFor().  In that case, it would be better for the workers to post tasks to a shared data structure that the main thread steals work from.

Stylistic notes

 - Instead of the PR* synchronization stuff, use the mozilla:: wrappers in xpcom/glue (Mutex, CondVar).  They have useful checking

 - Use MutexAutoLock instead of explicit Lock/Unlock.  It's much less error prone.

 - Your WIP is "spinning" on a shared counter and doing a non-atomic load (technically).  If we're going to fork threads for style-matching here and then destroy them, then it would be better to Join() all the forked threads.  If the idea is to keep them around, then the primitive we want here is a barrier; we should land bug 486442 with fast pthreads+windows impls soon, if so.

Sorry I couldn't spot the bug in the WIP.  I'd recommend setting up a VM to run valgrind.

Let me know if you have any more questions.
> - Is there a wide variation in the cost of FindRulesFor()?

There can be, yes.  Certainly it's trivial to write testcases where this is the case...  What happens in practice on web pages is a good question.

> - What's the cost of ResolveStyleFor()/ResolveStyleForNonElement() compared to
> FindRulesFor(), usually?

It varies, but generally the non-FindRulesFor parts are cheaper than FindRulesFor.  Sometimes much cheaper.

Having the main thread doing ResolveStyleFor (which for now is indeed main-thread-only) in parallel with worker threads doing the rule matching is a pretty interesting idea.  I like it, but I'm ok with trying it as a followup too.

Keeping the matching thread around seems fine to me.  Is there any drawback to doing that?
(In reply to comment #5)
> Keeping the matching thread around seems fine to me.  Is there any drawback
> to doing that?

Well, thread*s* eventually, right?  Nothing subtle, just standard memory vs. complexity tradeoff.  If the pool would be per process, and matching doesn't require much stack space (so the threads don't need much allocated), then I don't see any reason to bother with expiring threads.
Pool would be per process, yes.  We'd need to measure matching stack space, but I suspect it won't be too bad (in particular, we avoid recursive algorithms in matching already).
Attachment #544023 - Attachment is obsolete: true
Attachment #546175 - Flags: review?(bzbarsky)
Comment on attachment 546175 [details] [diff] [review]
Part 1: Separate rule matching from tree building

>--- a/content/base/src/nsGenericElement.h
>+++ b/content/base/src/nsGenericElement.h
>+class nsRuleWalker;

This doesn't seem really necessary

>--- a/layout/style/nsCSSRuleProcessor.cpp
>+++ b/layout/style/nsCSSRuleProcessor.cpp
>-    if (!next || SelectorMatchesTree(data->mElement, next,
>+    return (!next || SelectorMatchesTree(data->mElement, next,
>                                      data->mTreeMatchContext,

Indentation is off.

>--- a/layout/style/nsStyleSet.h
>+++ b/layout/style/nsStyleSet.h
>+  already_AddRefed<nsStyleContext>
>+  
>+  ResolveStyleFor(ElementRuleProcessorData* aData,
>+                  nsStyleContext* aParentContext,
>+                  TreeMatchContext& aTreeMatchContext);

New line *before* already_AddRefed<nsStyleContext>?

And maybe file a followup to move WalkContentStyleRules from nsIContent to Element?
Comment on attachment 546176 [details] [diff] [review]
Part 2: Add a function to nsCSSFrameConstructor to resolve styles in parallel

>--- a/layout/base/nsCSSFrameConstructor.cpp
>+++ b/layout/base/nsCSSFrameConstructor.cpp
>+  for (nsIContent* child = aStartChild;
>+        child != aEndChild;
>+        child = child->GetNextSibling()) {
>+

Indentation; also below.

>+    // XXX the GetContent() != aContent check is needed due to bug 135040.

No aContent in sight.

>+    // Remove it once that's fixed.  

Trailing spaces.

>+  
>+  /*for (nsIContent* child = aFirstNewContent;
>+       child;
>+       child = child->GetNextSibling()) {
>+    AddFrameConstructionItems(state, child, PR_FALSE, parentFrame, items);
>+  }*/

I'm assuming you're removing this code.
Attached patch wip Part 3 (obsolete) (deleted) — Splinter Review
Attachment #546175 - Attachment is obsolete: true
Attachment #546176 - Attachment is obsolete: true
Attachment #546815 - Attachment is obsolete: true
Attachment #548089 - Flags: review?(bzbarsky)
Attachment #546175 - Flags: review?(bzbarsky)
Attachment #546176 - Flags: review?(bzbarsky)
Attached patch Part 3: Use multiple threads (obsolete) (deleted) — Splinter Review
Attachment #548091 - Flags: review?(bzbarsky)
Attached patch Part 3: Use multiple threads (obsolete) (deleted) — Splinter Review
Attachment #548091 - Attachment is obsolete: true
Attachment #548091 - Flags: review?(bzbarsky)
Attachment #548094 - Flags: review?(bzbarsky)
Avoid copying arrays
Attachment #548089 - Attachment is obsolete: true
Attachment #548870 - Flags: review?(bzbarsky)
Attachment #548089 - Flags: review?(bzbarsky)
Depends on: 486442
So one obvious issue with part 1: nsHTMLBodyElement::WalkContentStyleRules is writing to mContentStyleRule.

In fact, WalkContentStyleRules is not const.  It needs to be, right?

For the preshint level, can we assert that we either got css rules or non-CSS ones but not both?  Is that even an invariant that holds?  If we get both, we don't know how to sort them against each other with this patch, right?

The code duplication in FileRules/FileRules2 and ConstructRuleTree bothers me, but maybe there's really no better way....
The changes we talked about on IRC
Attachment #548870 - Attachment is obsolete: true
Attachment #549581 - Flags: review?(bzbarsky)
Attachment #548870 - Flags: review?(bzbarsky)
Attached patch rollup patch (obsolete) (deleted) — Splinter Review
Attachment #548090 - Attachment is obsolete: true
Attachment #548094 - Attachment is obsolete: true
Attachment #549581 - Attachment is obsolete: true
Attachment #550935 - Flags: review?(bzbarsky)
Attachment #548090 - Flags: review?(bzbarsky)
Attachment #548094 - Flags: review?(bzbarsky)
Attachment #549581 - Flags: review?(bzbarsky)
Attachment #550936 - Flags: review?(bzbarsky)
Attachment #551081 - Flags: review?(bzbarsky)
Blocks: 676895
Blocks: 676900
Attachment #550601 - Attachment is obsolete: true
Attachment #551835 - Flags: review?(bzbarsky)
Change a little due to part 0.
Attachment #550936 - Attachment is obsolete: true
Attachment #551836 - Flags: review?(bzbarsky)
Attachment #550936 - Flags: review?(bzbarsky)
Depends on: 677685
No longer depends on: 677685
Attached patch rollup patch (obsolete) (deleted) — Splinter Review
Attachment #550935 - Attachment is obsolete: true
Attachment #551081 - Attachment is obsolete: true
Attachment #551980 - Attachment is obsolete: true
Attachment #552675 - Flags: review?(bzbarsky)
Attachment #550935 - Flags: review?(bzbarsky)
Attachment #551081 - Flags: review?(bzbarsky)
Attachment #556314 - Flags: review?(bzbarsky)
Blocks: 676054
Blocks: 702160
Blocks: 705877
Attachment #551835 - Flags: review?(bzbarsky)
Attachment #551836 - Flags: review?(bzbarsky)
Attachment #552675 - Flags: review?(bzbarsky)
Attachment #556314 - Flags: review?(bzbarsky)
So one thing I definitely recall being a problem with the above patches and that I've meant to put in here for a while...  They were calling into the animation/transition manager from off-thread, during selector matching, iirc.  Those parts needed to move to ruletree construction instead.
(We need this kind of work to take better advantage of multicore phones.)
Blocks: b2g-v-next
For the b2g use case, we'll need this for multicore devices, but those are a bit off in the future so for now important technical issue.

Things are totally different on higher-end HW though! :)
Whiteboard: [tech-p3]
Done by stylo.
Assignee: dzbarsky → nobody
Status: NEW → RESOLVED
Closed: 6 years ago
Depends on: stylo
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: