Closed
Bug 1232023
Opened 9 years ago
Closed 6 years ago
cloneNode slower than in Chrome
Categories
(Core :: DOM: Core & HTML, defect, P3)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
Performance Impact | low |
People
(Reporter: jandem, Assigned: bytesized)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
(deleted),
text/html
|
Details |
Noticed this in a profile of the Ember Render-List test.
Attached is a micro-benchmark; I think it's similar to the cloneNode calls Ember makes on this test, not entirely sure though.
Anyways I get the following results:
Nightly 45: 290 ms
Safari 9: 160 ms
Chrome 48: 201 ms
So both Safari and Chrome are faster than us here.
Reporter | ||
Updated•9 years ago
|
Keywords: perf
Summary: cloneNode slower than in Safari → cloneNode slower than in Safari and Chrome
Comment 1•9 years ago
|
||
So a profile shows that out of about 325 ms the time breaks down as follows:
19ms JS stuff
13ms self time in NodeBinding::cloneNode
8ms under AutoJSContext ctor under CloneAndAdopt (???)
10ms under HTMLTableRowElement::Clone (mostly malloc, addref, ctor)
10ms under HTMLTableCellElement::Clone (same)
12ms under nsContentUtils::IsCustomElementName, half is MOZ_XMLCheckQName
18ms under HTMLTableSectionElement::Clone (mostly malloc)
30ms under HTMLTableElement::Clone (mostly CopyInnerTo copying the attribute; we should be able to make this
faster by not reparsing the attr value and a few other tweaks, I bet)
70ms under nsGenericDOMDataNode::Clone. 10ms of this is addref, the rest is nsTextNode::CloneDataNode,
which is mostly malloc and the ctor and nsTextFragment assignment.
74ms under nsINode::doInsertChildAt. Realloc for the child array (20ms), BindToTree calls (34ms), removing
script blockers (5ms, not least because it calls NS_IsMainThread(!))
12ms creating JS reflector for the return value
plus some long-tailish stuff with refcounting, ~AutoJSAPI, etc.
Comment 2•9 years ago
|
||
I wonder if sharing textfragment would make sense - copy-on-write. Would save some memory too.
nsContentUtils::IsCustomElementName should be optimized out when web components is disabled, but that
would be a temporary fix.
Comment 3•9 years ago
|
||
How about we just fix IsCustomElementName to be smarter? It's doing 2 checks like so:
2779 if (NS_FAILED(nsContentUtils::CheckQName(str, false, &colon)) || colon ||
2780 str.FindChar('-') == -1) {
2781 return false;
but in practice the CheckQName() call will always return success, there will almost never be a colon and there will almost never be a '-'. So why aren't we checking for the '-' first and skipping the expensive CheckQName call in the common case of there being no '-'? Filed bug 1237714 and will measure how much that helps.
Depends on: 1237714
Comment 4•9 years ago
|
||
So fundamentally, I think the main slowness here is the combination of malloc and just doing too many things... We can shave off something like 10ms maybe by being smarter with attributes. We can try to preallocate the child array, maybe. Not sure how much that would help.
Jan, how much is this actually important in practice?
Flags: needinfo?(jdemooij)
Reporter | ||
Comment 5•8 years ago
|
||
(In reply to Boris Zbarsky [:bz] (Out June 25-July 6) from comment #4)
> Jan, how much is this actually important in practice?
Hm I've seen this a few times in profiles but probably not terribly important. I'll look at some benchmarks again soon, will update this bug if it's still an issue.
Flags: needinfo?(jdemooij)
Updated•8 years ago
|
Whiteboard: [qf]
Updated•8 years ago
|
Whiteboard: [qf] → [qf:p1]
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → ksteuber
Assignee | ||
Comment 6•8 years ago
|
||
These are the micro-benchmark results using current software versions and my hardware:
Nightly 55: 350ms
Chrome 57: 230ms
Edge 38: 920ms
System Configuration:
Dell Precision T1700 Desktop
Windows 10 64-bit
32 GB RAM
Intel i7-4790 3.60Ghz
Comment 7•8 years ago
|
||
FWIW, on linux FF is faster than Chrome.
Chrome ~360ms
Nightly ~320ms
Since for some reason our malloc seems to be rather slow on OSX and Windows, optimizing out some extra malloc/free might be rather effective.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•8 years ago
|
||
Well I finished a patch that pre-allocates the attribute/child array when cloning a node, but it does not improve the performance of the microbenchmark and I think I know why. The attribute/child array (nsAttrAndChildArray), when it is first created, does not allocate any space for the array. Once it needs to start expanding, it adds 8 slots at a time (enough space, I believe, for 4 attributes or 8 children) until the array has 32 slots, then expands to the next power of two on each expansion [1]. Since no element cloned by the microbenchmark has more than 8 children, the child arrays are never actually resized in that benchmark. Thus, all my optimization introduces is a small amount of overhead in that case.
I could make a different benchmark where elements have more than 8 children, which might help to reveal the improvements that this patch gives. However, I am not sure whether this would accurately reflect real world usage. If we want to optimize this function to perform better with nodes possessing less than 8 children, I do not think this patch will cut it.
:bz - Do you have any additional insight here? Do you think this patch has merit, or should I discard it? Do you have any other suggestions for optimizations here? Otherwise I think I need to go back to the profiler and start over.
[1] http://searchfox.org/mozilla-central/rev/6e1c138a06a80f2bb3167d9dac050cc0fd4fb39e/dom/base/nsAttrAndChildArray.cpp#798
Flags: needinfo?(bzbarsky)
Comment 10•8 years ago
|
||
Ah, I didn't realize we already did bulk-allocation of slots in there.
Unless there's a slowdown from the patch on the microbenchmark, I still think it makes sense to do, since for cases with more kids it should in fact help. Real-world usage certainly has nodes with more than 8 kids. ;)
We probably want to do it in a separate bug blocking this one...
Past that, no obvious ideas for optimizing things; I'd have to reprofile.
Flags: needinfo?(bzbarsky)
Comment 11•8 years ago
|
||
Since stuff under GrowBy() when used in InsertChildAt takes 8% of the clone time, I wonder how
bug 651120 will affect to this.
Depends on: 651120
Assignee | ||
Comment 12•8 years ago
|
||
If I am correctly understanding Bug 651120, the actual array in nsAttrAndChildArray will be removed. If this is the case, it seems pointless to submit a patch to optimize the creation of that array.
I could check the patch in if Bug 651120 may not be finished anytime soon, but it seems pointless to fix this if it is going to be removed in the near future.
@smaug What do you think?
Flags: needinfo?(bugs)
Comment 13•8 years ago
|
||
It'd still be a speedup for elements with more than 4 attrs, fwiw.
I expect that's pretty common for SVG.
Assignee | ||
Comment 14•8 years ago
|
||
Ah, perhaps I misunderstood. The array will continue to hold attributes, but not children? Is that correct?
Comment 15•8 years ago
|
||
right.
And there are other things to optimize for cloneNode, I think. (would need to re-profile)
Flags: needinfo?(bugs)
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8860531 [details]
Bug 1232023 - Optimize cloneNode by preinitializing attribute and child arrays
Moving this patch to Bug 1359556
Attachment #8860531 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8860531 -
Attachment is obsolete: true
Comment 18•7 years ago
|
||
This bug won't be fix by 57; moving it to P2 for post 57 work.
Whiteboard: [qf:p1] → [qf:p2]
Updated•7 years ago
|
Whiteboard: [qf:p2] → [perf:p2]
Updated•7 years ago
|
Whiteboard: [perf:p2] → [qf:p2]
Updated•7 years ago
|
Whiteboard: [qf:p2] → [qf:p3]
Updated•6 years ago
|
Priority: -- → P3
Comment 19•6 years ago
|
||
Results on Windows 10:
Latest Nightly - 420 ms
Firefox 61.0.1 - 490 ms
Chrome 68 - 540 ms
Comment 20•6 years ago
|
||
Setting a new title, and closing. On Linux Nightly is faster than Chrome too.
There are already other bugs to make this even faster, at least in case element has attributes when doing cloning.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Summary: cloneNode slower than in Safari and Chrome → cloneNode slower than in Chrome
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
Updated•3 years ago
|
Performance Impact: --- → P3
Whiteboard: [qf:p3]
You need to log in
before you can comment on or make changes to this bug.
Description
•