Closed Bug 1232023 Opened 9 years ago Closed 6 years ago

cloneNode slower than in Chrome

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Performance Impact low

People

(Reporter: jandem, Assigned: bytesized)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

Attached file Micro benchmark (deleted) —
Noticed this in a profile of the Ember Render-List test. Attached is a micro-benchmark; I think it's similar to the cloneNode calls Ember makes on this test, not entirely sure though. Anyways I get the following results: Nightly 45: 290 ms Safari 9: 160 ms Chrome 48: 201 ms So both Safari and Chrome are faster than us here.
Keywords: perf
Summary: cloneNode slower than in Safari → cloneNode slower than in Safari and Chrome
So a profile shows that out of about 325 ms the time breaks down as follows: 19ms JS stuff 13ms self time in NodeBinding::cloneNode 8ms under AutoJSContext ctor under CloneAndAdopt (???) 10ms under HTMLTableRowElement::Clone (mostly malloc, addref, ctor) 10ms under HTMLTableCellElement::Clone (same) 12ms under nsContentUtils::IsCustomElementName, half is MOZ_XMLCheckQName 18ms under HTMLTableSectionElement::Clone (mostly malloc) 30ms under HTMLTableElement::Clone (mostly CopyInnerTo copying the attribute; we should be able to make this faster by not reparsing the attr value and a few other tweaks, I bet) 70ms under nsGenericDOMDataNode::Clone. 10ms of this is addref, the rest is nsTextNode::CloneDataNode, which is mostly malloc and the ctor and nsTextFragment assignment. 74ms under nsINode::doInsertChildAt. Realloc for the child array (20ms), BindToTree calls (34ms), removing script blockers (5ms, not least because it calls NS_IsMainThread(!)) 12ms creating JS reflector for the return value plus some long-tailish stuff with refcounting, ~AutoJSAPI, etc.
I wonder if sharing textfragment would make sense - copy-on-write. Would save some memory too. nsContentUtils::IsCustomElementName should be optimized out when web components is disabled, but that would be a temporary fix.
How about we just fix IsCustomElementName to be smarter? It's doing 2 checks like so: 2779 if (NS_FAILED(nsContentUtils::CheckQName(str, false, &colon)) || colon || 2780 str.FindChar('-') == -1) { 2781 return false; but in practice the CheckQName() call will always return success, there will almost never be a colon and there will almost never be a '-'. So why aren't we checking for the '-' first and skipping the expensive CheckQName call in the common case of there being no '-'? Filed bug 1237714 and will measure how much that helps.
Depends on: 1237714
So fundamentally, I think the main slowness here is the combination of malloc and just doing too many things... We can shave off something like 10ms maybe by being smarter with attributes. We can try to preallocate the child array, maybe. Not sure how much that would help. Jan, how much is this actually important in practice?
Flags: needinfo?(jdemooij)
(In reply to Boris Zbarsky [:bz] (Out June 25-July 6) from comment #4) > Jan, how much is this actually important in practice? Hm I've seen this a few times in profiles but probably not terribly important. I'll look at some benchmarks again soon, will update this bug if it's still an issue.
Flags: needinfo?(jdemooij)
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
Assignee: nobody → ksteuber
These are the micro-benchmark results using current software versions and my hardware: Nightly 55: 350ms Chrome 57: 230ms Edge 38: 920ms System Configuration: Dell Precision T1700 Desktop Windows 10 64-bit 32 GB RAM Intel i7-4790 3.60Ghz
FWIW, on linux FF is faster than Chrome. Chrome ~360ms Nightly ~320ms Since for some reason our malloc seems to be rather slow on OSX and Windows, optimizing out some extra malloc/free might be rather effective.
Depends on: 1357865
Well I finished a patch that pre-allocates the attribute/child array when cloning a node, but it does not improve the performance of the microbenchmark and I think I know why. The attribute/child array (nsAttrAndChildArray), when it is first created, does not allocate any space for the array. Once it needs to start expanding, it adds 8 slots at a time (enough space, I believe, for 4 attributes or 8 children) until the array has 32 slots, then expands to the next power of two on each expansion [1]. Since no element cloned by the microbenchmark has more than 8 children, the child arrays are never actually resized in that benchmark. Thus, all my optimization introduces is a small amount of overhead in that case. I could make a different benchmark where elements have more than 8 children, which might help to reveal the improvements that this patch gives. However, I am not sure whether this would accurately reflect real world usage. If we want to optimize this function to perform better with nodes possessing less than 8 children, I do not think this patch will cut it. :bz - Do you have any additional insight here? Do you think this patch has merit, or should I discard it? Do you have any other suggestions for optimizations here? Otherwise I think I need to go back to the profiler and start over. [1] http://searchfox.org/mozilla-central/rev/6e1c138a06a80f2bb3167d9dac050cc0fd4fb39e/dom/base/nsAttrAndChildArray.cpp#798
Flags: needinfo?(bzbarsky)
Ah, I didn't realize we already did bulk-allocation of slots in there. Unless there's a slowdown from the patch on the microbenchmark, I still think it makes sense to do, since for cases with more kids it should in fact help. Real-world usage certainly has nodes with more than 8 kids. ;) We probably want to do it in a separate bug blocking this one... Past that, no obvious ideas for optimizing things; I'd have to reprofile.
Flags: needinfo?(bzbarsky)
Since stuff under GrowBy() when used in InsertChildAt takes 8% of the clone time, I wonder how bug 651120 will affect to this.
Depends on: 651120
If I am correctly understanding Bug 651120, the actual array in nsAttrAndChildArray will be removed. If this is the case, it seems pointless to submit a patch to optimize the creation of that array. I could check the patch in if Bug 651120 may not be finished anytime soon, but it seems pointless to fix this if it is going to be removed in the near future. @smaug What do you think?
Flags: needinfo?(bugs)
It'd still be a speedup for elements with more than 4 attrs, fwiw. I expect that's pretty common for SVG.
Ah, perhaps I misunderstood. The array will continue to hold attributes, but not children? Is that correct?
right. And there are other things to optimize for cloneNode, I think. (would need to re-profile)
Flags: needinfo?(bugs)
Depends on: 1359556
Comment on attachment 8860531 [details] Bug 1232023 - Optimize cloneNode by preinitializing attribute and child arrays Moving this patch to Bug 1359556
Attachment #8860531 - Attachment is obsolete: true
Attachment #8860531 - Attachment is obsolete: true
Depends on: 1357645
Depends on: 1375189
Depends on: 1377219
This bug won't be fix by 57; moving it to P2 for post 57 work.
Whiteboard: [qf:p1] → [qf:p2]
Whiteboard: [qf:p2] → [perf:p2]
Whiteboard: [perf:p2] → [qf:p2]
Whiteboard: [qf:p2] → [qf:p3]
Priority: -- → P3
Results on Windows 10: Latest Nightly - 420 ms Firefox 61.0.1 - 490 ms Chrome 68 - 540 ms
Setting a new title, and closing. On Linux Nightly is faster than Chrome too. There are already other bugs to make this even faster, at least in case element has attributes when doing cloning.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Summary: cloneNode slower than in Safari and Chrome → cloneNode slower than in Chrome
Component: DOM → DOM: Core & HTML
Performance Impact: --- → P3
Whiteboard: [qf:p3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: