Open Bug 355322 Opened 18 years ago Updated 2 years ago

Be smarter about which templates are tested in txStylesheet::findTemplate

Categories

(Core :: XSLT, defect)

defect

Tracking

()

People

(Reporter: sicking, Unassigned)

References

(Blocks 1 open bug)

Details

We could be a lot smarter about which templates we search in txStylesheet::findTemplate. By making it possible to ask a pattern which types of nodes it can match we could put the templates in different buckets and then ignore the buckets that a node could not possibly match. For example, if we could ask the pattern which node type it could match, (element, text, pi, etc), we could create one bucket for each node type. Templates that can match any node (i.e. patterns like "foo/node()") would be placed in every bucket. Then when the time comes to find the template for, say, an element we just go to the element bucket and look through the elements there. But we could be even more fine-grained than that. The problem with the above is that the majority of the nodes tested are likely elements, and the majority of the templates match elements. This would mean that we'd test most nodes against most templates anyway. What we could do to improve that is to create separate buckets for different element names. So templates like "foo" and "bar/foo[@bin]" would end up in the 'foo' bucket, but "foo/bin" would end up in the 'bin' bucket. Patterns like "foo/*" could either be placed in every named bucket or in a separate bucket that is always seached alongside with a named bucket. The latter is probably better from a memory perspective. The same thing could be done for attributes and even PIs. However we should consider how much perf-for-code we're getting. I don't think matching elements to attributes and PIs are *that* common. I suspect a good code-vs-speed tradeoff would be to have one bucket for each element-name, one for wildcard elements, one for textnodes, one for attributes and one for 'other' nodetypes. Oh, and one tricky aspect is that most nodes can be a root in addition to whatever DOM-type they are, since they can be the root of an disconnected subtree. And once we have the ability to check if a pattern can match attributes or not, we could use that to speed up key()s significantly by only walking attribute nodes when they could possibly be matched.
This sounds a lot like what the style system does, actually. The rules are hashed by tag, id, class of the rightmost selector. Then when we need to find the list of rules that apply to a node we look up the tag/id/class in the relevant hashtables and toss in all rules that don't have any of thsoe specified. See RuleHash::EnumerateAllRules.
This is a very common optimization among XSLT engines. Pretty much every other one I've looked at out there does this.
QA Contact: keith → xslt

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: jonas → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.