Closed Bug 744830 Opened 13 years ago Closed 12 years ago

Implement fast serializer for innerHTML/outerHTML

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: smaug, Assigned: smaug)

References

(Depends on 1 open bug,
URL
)

Details

Attachments

(7 files, 4 obsolete files)

Implement fast serializer for innerHTML/outerHTML. 13 years ago Josh Matthews [:jdm] (deleted), patch		Details \| Diff \| Splinter Review
Implement fast serializer for innerHTML/outerHTML. 13 years ago Josh Matthews [:jdm] (deleted), patch		Details \| Diff \| Splinter Review
a stringbuilder 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch		Details \| Diff \| Splinter Review
Implement fast serializer for innerHTML/outerHTML. 13 years ago Josh Matthews [:jdm] (deleted), patch		Details \| Diff \| Splinter Review
current patch. newline handling per spec 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch		Details \| Diff \| Splinter Review
+missing null check 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch		Details \| Diff \| Splinter Review
better to not leak 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch		Details \| Diff \| Splinter Review
without newline changes 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch	jdm : feedback+	Details \| Diff \| Splinter Review
without ignoreLevel 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch		Details \| Diff \| Splinter Review
patch 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch	hsivonen : review+	Details \| Diff \| Splinter Review
v7 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), patch	hsivonen : review+	Details \| Diff \| Splinter Review

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Description

•

13 years ago

Currently we use the slow and generic serializer. We could do better.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 1

•

13 years ago

http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsGenericHTMLElement.cpp#670 should do something better than use document encoder.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 2

•

13 years ago

I suggest doing the following: Don't have a heap-allocated serializer object. Instead, use a method with a signature like static nsresult Serialize(nsINode* aRoot, bool aDescendantsOnly, nsAString& aOut) where the return value signals OOM (fallible string appends should probably be used in order to avoid OOM crashes when a page gets the innerHTML of the root of a huge document), aRoot is the root of the subtree to serialize, aDescendantsOnly indicates whether tags for the aRoot node itself should be included and aOut is the resulting serialization. Don't recurse. Instead, use the iterative algorithm seen in https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/dom/Dom2Sax.java#l57 . However, instead of calling out to other methods, just inline the stuff that's calls out to contentHandler or lexicalHandler in the Dom2Sax code above. When the Java code above is used for serialization, both contentHander and lexicalHandler point to an instance of https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/sax/HtmlSerializer.java

Boris Zbarsky [:bzbarsky]

Comment 3

•

13 years ago

Another option would be to change nsINode::GetNextNode() to make a callback (to an inline function template parameter) when stepping out of an element. Not sure whether that would be better than just redoing the iterative walk or not.....

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 4

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #3) > Another option would be to change nsINode::GetNextNode() to make a callback > (to an inline function template parameter) when stepping out of an element. > Not sure whether that would be better than just redoing the iterative walk > or not..... To me, that seems more complicated than letting the tree walk loop appear in the serialization method itself. (I didn't suggest GetNextNode, because it's not nice for writing end tags.)

Josh Matthews [:jdm]

Comment 5

•

13 years ago

I'm playing around with implementing comment 2.

Josh Matthews [:jdm]

Comment 6

•

13 years ago

Attached patch Implement fast serializer for innerHTML/outerHTML. (obsolete) (deleted) — Details — Splinter Review

WIP.

Josh Matthews [:jdm]

Updated

•

13 years ago

Assignee: nobody → josh

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 7

•

13 years ago

Comment on attachment 616606 [details] [diff] [review] Implement fast serializer for innerHTML/outerHTML. Review of attachment 616606 [details] [diff] [review]: ----------------------------------------------------------------- Some nits... ::: content/html/content/src/nsGenericHTMLElement.cpp @@ +672,5 @@ > + aBuf.BeginReading(start_iter); > + aBuf.EndReading(end_iter); > + > + const PRUnichar* c = start_iter.get(); > + const PRUnichar* end = end_iter.get(); const PRUnichar* c = aBuf.BeginReading(); const PRUnichar* end = aBuf.EndReading(); @@ +694,5 @@ > + break; > + } > + c++; > + } > + aBuf = modified; We probably only want to do this if anything changed. @@ +738,5 @@ > + return true; > +} > + > +static nsresult > +StartElement(const nsAString& aLocalName, nsIContent* aContent, Element* aElement Does this need to return nsresult, or is a bool sufficient? @@ +763,5 @@ > + PRInt32 attNs = name->NamespaceID(); > + nsIAtom* attName = name->LocalName(); > + > + if (attNs == kNameSpaceID_None) { > + NS_ENSURE_TRUE(FallibleAppend(aOut, NS_LITERAL_STRING(" ")), NS_ERROR_OUT_OF_MEMORY); Mm... A macro for NS_ENSURE_TRUE(FallibleAppend(foo, bar), NS_ERROR_OUT_OF_MEMORY);? @@ +799,5 @@ > + static const char* voidElements[] = { > + "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", > + "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr" > + }; > + for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) { Drop "mozilla::" @@ +800,5 @@ > + "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", > + "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr" > + }; > + for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) { > + if (aLocalName.EqualsASCII(voidElements[i])) { Atoms? @@ +816,5 @@ > + static const char* nonEscapingElements[] = { > + "iframe", "noembed", "noframes", "noscript", "plaintext", "script", "style", "xmp" > + }; > + for (PRUint32 i = 0; i < mozilla::ArrayLength(nonEscapingElements); i++) { > + if (aLocalName.EqualsASCII(nonEscapingElements[i])) { Same @@ +841,5 @@ > + case nsIDOMNode::ELEMENT_NODE: { > + nsString name = current->LocalName(); > + if (name.IsEmpty()) > + name = current->NodeName(); > + nsresult rv = StartElement(name, static_cast<nsIContent*>(current), current->AsElement() @@ +857,5 @@ > + break; > + } > + > + case nsIDOMNode::COMMENT_NODE: { > + const nsTextFragment* text =static_cast<nsIContent*>(current)->GetText(); ' = ' @@ +868,5 @@ > + break; > + } > + > + case nsIDOMNode::DOCUMENT_NODE: > + buf = NS_LITERAL_STRING("<!DOCTYPE html>"); Eh? @@ +876,5 @@ > + NS_ENSURE_SUCCESS(FallibleAppend(aOut, buf), NS_ERROR_OUT_OF_MEMORY); > + > + nsIContent* next; > + if ((next = current->GetFirstChild())) { > + current = static_cast<nsINode*>(next); No cast @@ +883,5 @@ > + > + while (true) { > + if (current->NodeType() == nsIDOMNode::ELEMENT_NODE) { > + if (escapeLevel > 0) > + escapeLevel--; {} @@ +902,5 @@ > + return NS_OK; > + } > + > + if ((next = current->GetNextSibling())) { > + current = static_cast<nsINode*>(next); No need for the cast

Boris Zbarsky [:bzbarsky]

Comment 8

•

13 years ago

> Atoms? We should file a followup to consider moving that boolean into nodeinfo.

(no longer active)

Comment 9

•

13 years ago

This patch needs to handle moz-dirty based on the work being done in bug 599983.

Depends on: 599983

Josh Matthews [:jdm]

Comment 10

•

13 years ago

Smaug says this is a good testcase: https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034

(no longer active)

Comment 11

•

13 years ago

(In reply to Josh Matthews [:jdm] from comment #10) > Smaug says this is a good testcase: > https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034 (Note to everybody, this will hang your browser for a while)

Josh Matthews [:jdm]

Comment 12

•

13 years ago

Attached patch Implement fast serializer for innerHTML/outerHTML. (obsolete) (deleted) — Details — Splinter Review

Josh Matthews [:jdm]

Updated

•

13 years ago

Attachment #616606 - Attachment is obsolete: true

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 13

•

13 years ago

Comment on attachment 616711 [details] [diff] [review] Implement fast serializer for innerHTML/outerHTML. Review of attachment 616711 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/html/content/src/nsGenericHTMLElement.cpp @@ +743,5 @@ > + > +#define APPEND_OR_FAIL(a, b) NS_ENSURE_TRUE(FallibleAppend(a, b), false) > + > +static bool > +StartElement(nsIAtom* aLocalName, Element* aContent, Why pass aLocalName separately? @@ +782,5 @@ > + // Filter out special case of <br type="_moz"> or <br _moz*>, > + // used by the editor. Bug 16988. Yuck. > + // > + PRInt32 tagNS = aContent->GetNameSpaceID(); > + if (aLocalName == nsGkAtoms::br && tagNS == kNameSpaceID_XHTML && aContent->IsHTML(nsGkAtoms::br) @@ +941,5 @@ > nsGenericHTMLElement::GetMarkup(bool aIncludeSelf, nsAString& aMarkup) > { > aMarkup.Truncate(); > > nsIDocument* doc = OwnerDoc(); Move this below the if

Josh Matthews [:jdm]

Comment 14

•

13 years ago

Builds should be available for profiling at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001 in a few hours. I fixed up the problems in the last try run, so I would expect all tests to be passing with this build.

Mozilla RelEng Bot

Comment 15

•

13 years ago

Try run for 4cbda5817a8c is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=4cbda5817a8c Results (out of 15 total builds): exception: 14 failure: 1 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-4cbda5817a8c

Mozilla RelEng Bot

Comment 16

•

13 years ago

Try run for ccc14e9dc24d is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=ccc14e9dc24d Results (out of 296 total builds): exception: 20 success: 139 warnings: 81 failure: 56 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-ccc14e9dc24d

Mozilla RelEng Bot

Comment 17

•

13 years ago

Try run for 9d168c7b6001 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=9d168c7b6001 Results (out of 288 total builds): success: 246 warnings: 40 failure: 2 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 18

•

13 years ago

Comment on attachment 616711 [details] [diff] [review] Implement fast serializer for innerHTML/outerHTML. Thanks for working on this! Some comments: >+static bool >+StartElement(nsIAtom* aLocalName, Element* aContent, >+ PRUint32& aEscapeLevel, PRUint32& aIgnoreLevel, >+ nsAString& aOut) >+{ >+ if (aEscapeLevel > 0) { >+ aEscapeLevel++; >+ } Hmm. Looks like the spec now only considers the parent--not ancestors for deciding when not to escape. See http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm >+ if (attNs == kNameSpaceID_None) { >+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" ")); >+ } else if (!aContent->IsHTML() && attNs == kNameSpaceID_XLink) { >+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xlink:")); The spec says to do this even if the element is an HTML element. >+ } else if (attNs == kNameSpaceID_XML) { >+ if (aContent->IsHTML()) { >+ if (attName == nsGkAtoms::lang) { >+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" ")); This lang stuff is not needed per spec. It's again for differing use cases of the Java code. >+ } else { >+ continue; Again, the spec says to do this even if the element is an HTML element. >+ } Additionally, the xmlns cases need to be handled here. (The Java code doesn't handle them, because Java SAX doesn't represent namespace declarations as attributes.) >+ } else { >+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xml:")); >+ } >+ } else { >+ continue; The spec says to use the attributes qualified name in the anything else case. >+ static const nsIAtom* voidElements[] = { >+ nsGkAtoms::area, nsGkAtoms::base, nsGkAtoms::basefont, nsGkAtoms::bgsound, nsGkAtoms::br, nsGkAtoms::cols, >+ nsGkAtoms::command, nsGkAtoms::embed, nsGkAtoms::frame, nsGkAtoms::hr, nsGkAtoms::img, nsGkAtoms::input, >+ nsGkAtoms::keygen, nsGkAtoms::link, nsGkAtoms::meta, nsGkAtoms::param, nsGkAtoms::source, nsGkAtoms::track, >+ nsGkAtoms::wbr >+ }; >+ for (PRUint32 i = 0; i < ArrayLength(voidElements); i++) { >+ if (aLocalName == voidElements[i]) { >+ aIgnoreLevel++; >+ return true; >+ } >+ } Instead of having an ignore level integer here and having the caller method walk the descendants of void elements, you could inline the start tag handling into the tree walking loop and make the tree walking skip the next sibling when the current node is a void element. >+ /*if (aLocalName == nsGkAtoms::pre || aLocalName == nsGkAtoms::textarea || >+ aLocalName == nsGkAtoms::listing) { >+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING("\n")); >+ }*/ This needs to check the element namespace, too. >+ static const nsIAtom* nonEscapingElements[] = { >+ nsGkAtoms::iframe, nsGkAtoms::noembed, nsGkAtoms::noframes, nsGkAtoms::noscript, nsGkAtoms::plaintext, >+ nsGkAtoms::script, nsGkAtoms::style, nsGkAtoms::xmp >+ }; >+ for (PRUint32 i = 0; i < ArrayLength(nonEscapingElements); i++) { >+ if (aLocalName == nonEscapingElements[i]) { >+ aEscapeLevel++; >+ } >+ } Per spec, escaping only considers the parent of the text node, so there's no need to maintain an escape level integer. >+ case nsIDOMNode::COMMENT_NODE: { >+ const nsTextFragment* text = static_cast<nsIContent*>(current)->GetText(); >+ text->AppendTo(buf); >+ if (escapeLevel == 0 && ignoreLevel == 0) { >+ buf = NS_LITERAL_STRING(""); >+ } else { >+ buf = EmptyString(); The spec serializes comments unconditionally even if the result doesn't round trip. >+ case nsIDOMNode::DOCUMENT_NODE: >+ buf = NS_LITERAL_STRING("<!DOCTYPE html>"); >+ break; Again, the Java code intentionally deviates from the spec here due to different use cases and different APIs. In Gecko, instead of having a case for the document node, there should be a spec-compliant case for doctype nodes. Curiously, the spec requires handling processing instruction nodes, even though they don't round trip. >+ nsString name = NS_LITERAL_STRING("</"); >+ nsString nodeName = current->LocalName(); For elements that aren't in the HTML, SVG or MathML namespaces, the spec requires using the qualified name as the tag name. (I'm slightly uncomfortable with using the qualified name when the qualified name is the same as the local name, since the effects will be surprising when round-tripping, but that's what the spec says currently.)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 19

•

13 years ago

Attached patch a stringbuilder (deleted) — Details — Splinter Review

I was just testing something based on a stringbuilder, but still couldn't get the speed of the current serializer.

Josh Matthews [:jdm]

Comment 20

•

13 years ago

Attached patch Implement fast serializer for innerHTML/outerHTML. (deleted) — Details — Splinter Review

This is my most recent work from before I left on my travels. I optimized the 'always append everything' codepath fairly well based on profiles; as I recall, the majority of the work seemed to be in SetCapacity and the actual DOM grubbing last time I checked. I don't recall how the speed stacked up against the current serializer. I'm happy to let somebody run with this, as I'm focusing on per-window pb in my limited free time.

Josh Matthews [:jdm]

Updated

•

13 years ago

Attachment #616711 - Attachment is obsolete: true

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 21

•

13 years ago

Comment on attachment 620064 [details] [diff] [review] Implement fast serializer for innerHTML/outerHTML. Review of attachment 620064 [details] [diff] [review]: ----------------------------------------------------------------- A couple of nits for when you get back ::: content/html/content/src/nsGenericHTMLElement.cpp @@ +681,5 @@ > + if (!found) > + return; > + > + nsAutoString modified; > + modified.SetCapacity(aBuf.Length() + 5 * found); I would like a constant for the 5 (MAX_ENTITY_LENGTH?) @@ +747,5 @@ > + > +static inline bool > +FallibleAppend(nsAString& aBuf, const nsAString& aExtra) > +{ > + if(NS_UNLIKELY(!aBuf.SetCapacity(aBuf.Length() + aExtra.Length()))) 'if (', and {}s @@ +766,5 @@ > + APPEND_OR_FAIL(aOut, nsDependentAtomString(localName)); > + //toAppend.AppendElement(NS_LITERAL_STRING("<")); > + //toAppend.AppendElement(nsDependentAtomString(localName)); > + > + PRInt32 count = aContent->GetAttrCount(); PRUint32 @@ +787,5 @@ > + // > + // Filter out special case of <br type="_moz"> or <br _moz*>, > + // used by the editor. Bug 16988. Yuck. > + // > + PRInt32 tagNS = aContent->GetNameSpaceID(); Why is this in the loop? @@ +885,5 @@ > +Serialize(nsINode* aRoot, bool aDescendentsOnly, nsAString& aOut) > +{ > + nsINode* current = aDescendentsOnly ? aRoot->GetFirstChild() : aRoot; > + if (!current) > + return true; {} @@ +893,5 @@ > + while (true) { > + nsAutoString buf; > + switch (current->NodeType()) { > + case nsIDOMNode::ELEMENT_NODE: { > + mozilla::dom::Element* elem = current->AsElement(); This is in content/, so drop the 'mozilla::dom::' @@ +935,5 @@ > + buf = NS_LITERAL_STRING("<?"); > + buf.Append(current->NodeName()); > + buf.Append(NS_LITERAL_STRING(">")); > + break; > + } default case with MOZ_NOT_REACHED? @@ +943,5 @@ > + APPEND_OR_FAIL(aOut, buf); > + } > + > + nsIContent* next; > + if ((next = current->GetFirstChild())) { Either if (nsIContent* next = current->GetFirstChild()) { or nsIContent* next = current->GetFirstChild(); if (next) { @@ +955,5 @@ > + ignoreLevel--; > + } else { > + nsString name = NS_LITERAL_STRING("</"); > + nsString nodeName; > + mozilla::dom::Element* elem = current->AsElement(); Same @@ +971,5 @@ > + if (current == aRoot) { > + return true; > + } > + > + if ((next = current->GetNextSibling())) { Same

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 22

•

13 years ago

I can't figure out how the magical newline handling should work. Per spec document.body.innerHTML = null; document.body.appendChild(document.createElement("textarea")); document.body.firstChild.appendChild(document.createTextNode("\nhello")); console.log(document.body.innerHTML) should report "<textarea>\n\nhello</textarea>" But that is not what current browsers do. (tested N, C, O)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 23

•

13 years ago

Attached patch current patch. newline handling per spec (deleted) — Details — Splinter Review

In the testcase this gives same or a bit better performance than trunk-C. https://tbpl.mozilla.org/?tree=Try&rev=9911221e883d

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 24

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #22) > I can't figure out how the magical newline handling should work. > Per spec > document.body.innerHTML = null; > document.body.appendChild(document.createElement("textarea")); > document.body.firstChild.appendChild(document.createTextNode("\nhello")); > console.log(document.body.innerHTML) > should report "<textarea>\n\nhello</textarea>" > > But that is not what current browsers do. (tested N, C, O) Current browsers are broken -- document.body.innerHTML = document.body.innerHTML will eat the newline in this case. I.e., the DOM doesn't round-trip through innerHTML. There's no reason this should happen here. Same for <pre>. Browsers should change to match the spec. Also, <xmp> needs to have its contents not HTML-escaped, same as <style> and <script>. The spec requires this and WebKit does it, but Gecko gets it wrong. (This is obviously a corner case, but my editing tests hit it, FWIW.) See bug 608322 and dependencies.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 25

•

13 years ago

(In reply to Aryeh Gregor from comment #24) > Current browsers are broken -- document.body.innerHTML = > document.body.innerHTML will eat the newline in this case. I.e., the DOM > doesn't round-trip through innerHTML. There's no reason this should happen > here. Same for <pre>. Browsers should change to match the spec. Ok, but then this is a *very* risky change, and need to happen at the beginning of a cycle.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 26

•

13 years ago

Attached patch +missing null check (obsolete) (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=27500fcb7375

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 27

•

13 years ago

Attached patch better to not leak (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=e02eb979f605

Attachment #627201 - Attachment is obsolete: true

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 28

•

13 years ago

Actually, I think we should not change newline handling in this bug. There are enough changes and risk for regressions even without the change.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 29

•

13 years ago

Attached patch without newline changes (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=ddfe41520365

Attachment #627305 - Flags: feedback?(josh)

Josh Matthews [:jdm]

Comment 30

•

13 years ago

Comment on attachment 627305 [details] [diff] [review] without newline changes Nothing sticks out at me as an obvious optimization, but the linked list made my brain explode. You could look into Henri's suggestion of the way to get rid of ignoreLevel.

Attachment #627305 - Flags: feedback?(josh) → feedback+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 31

•

13 years ago

Wouldn't simply writing to a string but using a doubling strategy for growing the string length result in significantly simpler code but with just as good performance?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 32

•

13 years ago

(In reply to Jonas Sicking (:sicking) from comment #31) > Wouldn't simply writing to a string but using a doubling strategy for > growing the string length That is what our string code does atm. > result in significantly simpler code but with just > as good performance? And no, it doesn't give good enough performance. At least I haven't managed to do so.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 33

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #18) > Instead of having an ignore level integer here and having the caller method > walk the descendants of void elements, you could inline the start tag > handling into the tree walking loop and make the tree walking skip the next > sibling when the current node is a void element. Hmm, actually I don't understand the spec. "If current node is an area, base, basefont, bgsound, br, col, command, embed, frame, hr, img, input, keygen, link, meta, param, source, track or wbr element, then continue on to the next child node at this point." The next child node of what? current node? or the node?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 34

•

13 years ago

I wonder which behavior the spec defines here. At least not Gecko nor Webkit behavior.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 35

•

13 years ago

Since Gecko creates <input><div></div> if div is a child of an input element. Webkit creates <input></input>

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 36

•

13 years ago

What should happen if DOM is <input><div></div></input> and one does inputElement.innerHTML?

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 37

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #36) > What should happen if DOM is <input><div></div></input> and one does > inputElement.innerHTML? I would say '<input>'.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 38

•

13 years ago

For a DOM like <input><div/></input> returning "<input>" for myInputElement.innerHTML wouldn't make sense. The question is if we should return "" or "<div></div>". (Or throw, but I think that would be a bad idea)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 39

•

13 years ago

Attached patch without ignoreLevel (obsolete) (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=dc27746c2c24

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 40

•

13 years ago

Comment on attachment 627640 [details] [diff] [review] without ignoreLevel Apparently I broke something with this.

Attachment #627640 - Attachment is obsolete: true

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 41

•

13 years ago

Attached patch patch (deleted) — Details — Splinter Review

Assignee: josh → bugs

Attachment #627682 - Flags: review?(hsivonen)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 42

•

13 years ago

Comment on attachment 627682 [details] [diff] [review] patch >+ var PI = document.createProcessingInstruction('foo', 'bar="1.0"'); >+ t.appendChild(PI); >+ is(t.innerHTML, '<span>hi</span> there <?foo bar="1.0">', >+ "pi nodes should be included"); Please add tests for: 1) An SVG element 2) A MathML element 3) An element that's none of HTML, MathML or SVG and has a prefixed node name 4) An element that's none of HTML, MathML or SVG and has a prefixless node name 5) An xml:* attribute 6) An xlink:* attribute 7) An xmlns="foo" attribute 8) An xmlns:foo="bar" attribute 9) A namespaced attribute that's none of the above 10) A CDATA section. 11) HTML script and style with <, >, & and no-break space in the content 12) SVG script and style with <, >, & and no-break space in the content (different escaping from the HTML case) 13) MathML script and style with <, >, & and no-break space in the content (same as the SVG case) >+#define STRING_BUFFER_UNITS 1020 Please add a comment that explains where this magic number comes from. The magic number looks like it's trying to avoid a clown-shoe allocation. If possible, please add a static assertion about sizeof(StringBuilder) to make sure that subsequent changes don't accidentally push the size of the object over a clown shoe boundary. >+ enum Type >+ { >+ eUnknown, >+ eAtom, >+ eString, >+ eStringWithEncode, >+ eLiteral, >+ eTextFragment, >+ eTextFragmentWithEncode, >+ }; There are plenty of cases where the unit ends up containing a single-character string literal. Would it be worthwhile to save a pointer dereference and have a unit type for a single character and passing e.g. '<' instead of "<" to Append()? >+ template<int N> >+ void Append(const char (&aLiteral)[N]) I haven't seen this template pattern before, but I trust that it does what it seems to do. >+ void Append(const nsAString& aString) >+ { >+ Unit* u = AddUnit(); >+ u->mString = new nsAutoString(aString); Using nsAutoString instead of nsString seems to be against the rules. However, using nsString doesn't make sense, because append to the final output string always copies the buffer of the string instead of adopting it. It seems to me that it would make sense to use PRUnichar[] instead of nsAutoString here. >+ PRUint32 mLength; Please add a comment that explains that this field is only meaningful on the first StringBuffer object in the linked list. In general, it would be nice to have a comment that explains the linked list/unit array duality. >+ if (attName == nsGkAtoms::xmlns) { >+ aBuilder.Append(" xmlns"); ... >+ aBuilder.Append(attName); The xmlns="foo" case looks wrong. >+static void >+AppendEncodedCharacters(const nsTextFragment* aText, StringBuilder& aBuilder) ... >+static void >+AppendEncodedAttributeValue(nsAutoString* aValue, StringBuilder& aBuilder) >+ ++found; >+ aBuilder.AppendWithAttrEncode(aValue, aValue->Length() + (5 * found)); Instead of getting an inaccurate result my multiplication here, I think it would make more sense to count the exact number of extra characters generated by each escape by incrementing |found| by 3 in the case of < and >, by 4 in the case of & and by 5 in the case of   and ". Please also add a comment that explains why it's worthwhile to iterate over the strings twice in order to make a better-sized allocation for the final output string. r=me with the above points addressed.

Attachment #627682 - Flags: review?(hsivonen) → review+

Nathan Froyd [:froydnj]

Comment 43

•

13 years ago

Comment on attachment 627682 [details] [diff] [review] patch Review of attachment 627682 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/html/content/src/nsGenericHTMLElement.cpp @@ +667,5 @@ > + nsAutoString* mString; > + const nsTextFragment* mTextFragment; > + }; > + Type mType; > + PRUint32 mLength; Worth using mozilla/StandardInteger.h types here and other places? I realize this doesn't match existing style of the file, but this is mostly new code anyway. @@ +811,5 @@ > + const PRUnichar* end = aValue.EndReading(); > + while (c < end) { > + switch (*c) { > + case '"': > + aOut.Append(NS_LITERAL_STRING(""")); AppendLiteral(""") here and other places is slightly clearer, IMHO.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 44

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #43) > > + aOut.Append(NS_LITERAL_STRING(""")); > > AppendLiteral(""") here and other places is slightly clearer, IMHO. Oops. Yeah, consider that part of my review, too.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 45

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #42) > >+ template<int N> > >+ void Append(const char (&aLiteral)[N]) > > I haven't seen this template pattern before, but I trust that it does what > it seems to do. That is how AppendLiteral works in strings. > >+ void Append(const nsAString& aString) > >+ { > >+ Unit* u = AddUnit(); > >+ u->mString = new nsAutoString(aString); > > Using nsAutoString instead of nsString seems to be against the rules. > However, using nsString doesn't make sense, because append to the final > output string always copies the buffer of the string instead of adopting it. > It seems to me that it would make sense to use PRUnichar[] instead of > nsAutoString here. Well, the whole idea to use nsAutoString here is that autostring has internal buffer which is enough for common cases. Append(const nsAString& aString) is quite rare case. I don't see any good way to get PRUnichar[] attr values, and there shouldn't be need for special case PI and DTD nodes.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 46

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #45) > Append(const nsAString& aString) is quite rare case. OK. > I don't see any good way to get PRUnichar[] attr values, and there shouldn't > be need > for special case PI and DTD nodes. I see. Let's go with the code you already wrote, then.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 47

•

13 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #42) > > There are plenty of cases where the unit ends up containing a > single-character string literal. Would it be worthwhile to save a pointer > dereference and have a unit type for a single character and passing e.g. '<' > instead of "<" to Append()? Not sure worth the slight complexity. Could do after profiling.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 48

•

13 years ago

Attached patch v7 (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=8f58f13a9e00 couldn't figure out how to add a truly useful check for jemalloc bucket size, but added a comment.

Attachment #628790 - Flags: review?(hsivonen)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 49

•

12 years ago

Comment on attachment 628790 [details] [diff] [review] v7 Looks good. Thanks. If you have benchmark numbers that compare the new and old serializers, it would probably be a good idea to ping the Hacks blog team about those numbers to get them into a "what's new" article.

Attachment #628790 - Flags: review?(hsivonen) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 50

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/7bf0125b26b5 I'll get the numbers from Nightlies. I hope this change doesn't break too many websites.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 51

•

12 years ago

(In reply to Josh Matthews [:jdm] (travelling until June 25th) from comment #10) > Smaug says this is a good testcase: > https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034 Tested Nightly builds on an old Windows machine: a build w/o the patch with the patch div.innerHTML : 2905 1350 div.outerHTML : 15513 5101 body.innerHTML : 1119 790 body.outerHTML : 1116 758

Josh Matthews [:jdm]

Updated

•

12 years ago

Depends on: 770867

Alice0775 White

Updated

•

12 years ago

Depends on: 788444

Loic

Updated

•

12 years ago

Depends on: 789081

Alice0775 White

Updated

•

12 years ago

Depends on: 813441

Robert Longson [:longsonr]

Updated

•

11 years ago

Depends on: 986913

Boris Zbarsky [:bzbarsky]

Updated

•

9 years ago

Blocks: 578141

You need to log in before you can comment on or make changes to this bug.