Closed
Bug 744830
Opened 13 years ago
Closed 12 years ago
Implement fast serializer for innerHTML/outerHTML
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: smaug, Assigned: smaug)
References
(Depends on 1 open bug, )
Details
Attachments
(7 files, 4 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
jdm
:
feedback+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
Currently we use the slow and generic serializer. We could do better.
Assignee | ||
Comment 1•13 years ago
|
||
http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsGenericHTMLElement.cpp#670
should do something better than use document encoder.
I suggest doing the following:
Don't have a heap-allocated serializer object. Instead, use a method with a signature like
static nsresult Serialize(nsINode* aRoot, bool aDescendantsOnly, nsAString& aOut)
where the return value signals OOM (fallible string appends should probably be used in order to avoid OOM crashes when a page gets the innerHTML of the root of a huge document), aRoot is the root of the subtree to serialize, aDescendantsOnly indicates whether tags for the aRoot node itself should be included and aOut is the resulting serialization.
Don't recurse. Instead, use the iterative algorithm seen in https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/dom/Dom2Sax.java#l57 .
However, instead of calling out to other methods, just inline the stuff that's calls out to contentHandler or lexicalHandler in the Dom2Sax code above. When the Java code above is used for serialization, both contentHander and lexicalHandler point to an instance of https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/sax/HtmlSerializer.java
Comment 3•13 years ago
|
||
Another option would be to change nsINode::GetNextNode() to make a callback (to an inline function template parameter) when stepping out of an element. Not sure whether that would be better than just redoing the iterative walk or not.....
(In reply to Boris Zbarsky (:bz) from comment #3)
> Another option would be to change nsINode::GetNextNode() to make a callback
> (to an inline function template parameter) when stepping out of an element.
> Not sure whether that would be better than just redoing the iterative walk
> or not.....
To me, that seems more complicated than letting the tree walk loop appear in the serialization method itself. (I didn't suggest GetNextNode, because it's not nice for writing end tags.)
Comment 6•13 years ago
|
||
WIP.
Updated•13 years ago
|
Assignee: nobody → josh
Comment 7•13 years ago
|
||
Comment on attachment 616606 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.
Review of attachment 616606 [details] [diff] [review]:
-----------------------------------------------------------------
Some nits...
::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +672,5 @@
> + aBuf.BeginReading(start_iter);
> + aBuf.EndReading(end_iter);
> +
> + const PRUnichar* c = start_iter.get();
> + const PRUnichar* end = end_iter.get();
const PRUnichar* c = aBuf.BeginReading();
const PRUnichar* end = aBuf.EndReading();
@@ +694,5 @@
> + break;
> + }
> + c++;
> + }
> + aBuf = modified;
We probably only want to do this if anything changed.
@@ +738,5 @@
> + return true;
> +}
> +
> +static nsresult
> +StartElement(const nsAString& aLocalName, nsIContent* aContent,
Element* aElement
Does this need to return nsresult, or is a bool sufficient?
@@ +763,5 @@
> + PRInt32 attNs = name->NamespaceID();
> + nsIAtom* attName = name->LocalName();
> +
> + if (attNs == kNameSpaceID_None) {
> + NS_ENSURE_TRUE(FallibleAppend(aOut, NS_LITERAL_STRING(" ")), NS_ERROR_OUT_OF_MEMORY);
Mm... A macro for NS_ENSURE_TRUE(FallibleAppend(foo, bar), NS_ERROR_OUT_OF_MEMORY);?
@@ +799,5 @@
> + static const char* voidElements[] = {
> + "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame",
> + "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"
> + };
> + for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) {
Drop "mozilla::"
@@ +800,5 @@
> + "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame",
> + "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"
> + };
> + for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) {
> + if (aLocalName.EqualsASCII(voidElements[i])) {
Atoms?
@@ +816,5 @@
> + static const char* nonEscapingElements[] = {
> + "iframe", "noembed", "noframes", "noscript", "plaintext", "script", "style", "xmp"
> + };
> + for (PRUint32 i = 0; i < mozilla::ArrayLength(nonEscapingElements); i++) {
> + if (aLocalName.EqualsASCII(nonEscapingElements[i])) {
Same
@@ +841,5 @@
> + case nsIDOMNode::ELEMENT_NODE: {
> + nsString name = current->LocalName();
> + if (name.IsEmpty())
> + name = current->NodeName();
> + nsresult rv = StartElement(name, static_cast<nsIContent*>(current),
current->AsElement()
@@ +857,5 @@
> + break;
> + }
> +
> + case nsIDOMNode::COMMENT_NODE: {
> + const nsTextFragment* text =static_cast<nsIContent*>(current)->GetText();
' = '
@@ +868,5 @@
> + break;
> + }
> +
> + case nsIDOMNode::DOCUMENT_NODE:
> + buf = NS_LITERAL_STRING("<!DOCTYPE html>");
Eh?
@@ +876,5 @@
> + NS_ENSURE_SUCCESS(FallibleAppend(aOut, buf), NS_ERROR_OUT_OF_MEMORY);
> +
> + nsIContent* next;
> + if ((next = current->GetFirstChild())) {
> + current = static_cast<nsINode*>(next);
No cast
@@ +883,5 @@
> +
> + while (true) {
> + if (current->NodeType() == nsIDOMNode::ELEMENT_NODE) {
> + if (escapeLevel > 0)
> + escapeLevel--;
{}
@@ +902,5 @@
> + return NS_OK;
> + }
> +
> + if ((next = current->GetNextSibling())) {
> + current = static_cast<nsINode*>(next);
No need for the cast
Comment 8•13 years ago
|
||
> Atoms?
We should file a followup to consider moving that boolean into nodeinfo.
Comment 9•13 years ago
|
||
This patch needs to handle moz-dirty based on the work being done in bug 599983.
Depends on: 599983
Comment 10•13 years ago
|
||
Smaug says this is a good testcase: https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034
Comment 11•13 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #10)
> Smaug says this is a good testcase:
> https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034
(Note to everybody, this will hang your browser for a while)
Comment 12•13 years ago
|
||
Updated•13 years ago
|
Attachment #616606 -
Attachment is obsolete: true
Comment 13•13 years ago
|
||
Comment on attachment 616711 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.
Review of attachment 616711 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +743,5 @@
> +
> +#define APPEND_OR_FAIL(a, b) NS_ENSURE_TRUE(FallibleAppend(a, b), false)
> +
> +static bool
> +StartElement(nsIAtom* aLocalName, Element* aContent,
Why pass aLocalName separately?
@@ +782,5 @@
> + // Filter out special case of <br type="_moz"> or <br _moz*>,
> + // used by the editor. Bug 16988. Yuck.
> + //
> + PRInt32 tagNS = aContent->GetNameSpaceID();
> + if (aLocalName == nsGkAtoms::br && tagNS == kNameSpaceID_XHTML &&
aContent->IsHTML(nsGkAtoms::br)
@@ +941,5 @@
> nsGenericHTMLElement::GetMarkup(bool aIncludeSelf, nsAString& aMarkup)
> {
> aMarkup.Truncate();
>
> nsIDocument* doc = OwnerDoc();
Move this below the if
Comment 14•13 years ago
|
||
Builds should be available for profiling at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001 in a few hours. I fixed up the problems in the last try run, so I would expect all tests to be passing with this build.
Comment 15•13 years ago
|
||
Try run for 4cbda5817a8c is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=4cbda5817a8c
Results (out of 15 total builds):
exception: 14
failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-4cbda5817a8c
Comment 16•13 years ago
|
||
Try run for ccc14e9dc24d is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=ccc14e9dc24d
Results (out of 296 total builds):
exception: 20
success: 139
warnings: 81
failure: 56
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-ccc14e9dc24d
Comment 17•13 years ago
|
||
Try run for 9d168c7b6001 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=9d168c7b6001
Results (out of 288 total builds):
success: 246
warnings: 40
failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001
Comment on attachment 616711 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.
Thanks for working on this! Some comments:
>+static bool
>+StartElement(nsIAtom* aLocalName, Element* aContent,
>+ PRUint32& aEscapeLevel, PRUint32& aIgnoreLevel,
>+ nsAString& aOut)
>+{
>+ if (aEscapeLevel > 0) {
>+ aEscapeLevel++;
>+ }
Hmm. Looks like the spec now only considers the parent--not ancestors for deciding when not to escape. See http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm
>+ if (attNs == kNameSpaceID_None) {
>+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" "));
>+ } else if (!aContent->IsHTML() && attNs == kNameSpaceID_XLink) {
>+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xlink:"));
The spec says to do this even if the element is an HTML element.
>+ } else if (attNs == kNameSpaceID_XML) {
>+ if (aContent->IsHTML()) {
>+ if (attName == nsGkAtoms::lang) {
>+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" "));
This lang stuff is not needed per spec. It's again for differing use cases of the Java code.
>+ } else {
>+ continue;
Again, the spec says to do this even if the element is an HTML element.
>+ }
Additionally, the xmlns cases need to be handled here. (The Java code doesn't handle them, because Java SAX doesn't represent namespace declarations as attributes.)
>+ } else {
>+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xml:"));
>+ }
>+ } else {
>+ continue;
The spec says to use the attributes qualified name in the anything else case.
>+ static const nsIAtom* voidElements[] = {
>+ nsGkAtoms::area, nsGkAtoms::base, nsGkAtoms::basefont, nsGkAtoms::bgsound, nsGkAtoms::br, nsGkAtoms::cols,
>+ nsGkAtoms::command, nsGkAtoms::embed, nsGkAtoms::frame, nsGkAtoms::hr, nsGkAtoms::img, nsGkAtoms::input,
>+ nsGkAtoms::keygen, nsGkAtoms::link, nsGkAtoms::meta, nsGkAtoms::param, nsGkAtoms::source, nsGkAtoms::track,
>+ nsGkAtoms::wbr
>+ };
>+ for (PRUint32 i = 0; i < ArrayLength(voidElements); i++) {
>+ if (aLocalName == voidElements[i]) {
>+ aIgnoreLevel++;
>+ return true;
>+ }
>+ }
Instead of having an ignore level integer here and having the caller method walk the descendants of void elements, you could inline the start tag handling into the tree walking loop and make the tree walking skip the next sibling when the current node is a void element.
>+ /*if (aLocalName == nsGkAtoms::pre || aLocalName == nsGkAtoms::textarea ||
>+ aLocalName == nsGkAtoms::listing) {
>+ APPEND_OR_FAIL(aOut, NS_LITERAL_STRING("\n"));
>+ }*/
This needs to check the element namespace, too.
>+ static const nsIAtom* nonEscapingElements[] = {
>+ nsGkAtoms::iframe, nsGkAtoms::noembed, nsGkAtoms::noframes, nsGkAtoms::noscript, nsGkAtoms::plaintext,
>+ nsGkAtoms::script, nsGkAtoms::style, nsGkAtoms::xmp
>+ };
>+ for (PRUint32 i = 0; i < ArrayLength(nonEscapingElements); i++) {
>+ if (aLocalName == nonEscapingElements[i]) {
>+ aEscapeLevel++;
>+ }
>+ }
Per spec, escaping only considers the parent of the text node, so there's no need to maintain an escape level integer.
>+ case nsIDOMNode::COMMENT_NODE: {
>+ const nsTextFragment* text = static_cast<nsIContent*>(current)->GetText();
>+ text->AppendTo(buf);
>+ if (escapeLevel == 0 && ignoreLevel == 0) {
>+ buf = NS_LITERAL_STRING("<!--") + buf + NS_LITERAL_STRING("-->");
>+ } else {
>+ buf = EmptyString();
The spec serializes comments unconditionally even if the result doesn't round trip.
>+ case nsIDOMNode::DOCUMENT_NODE:
>+ buf = NS_LITERAL_STRING("<!DOCTYPE html>");
>+ break;
Again, the Java code intentionally deviates from the spec here due to different use cases and different APIs. In Gecko, instead of having a case for the document node, there should be a spec-compliant case for doctype nodes.
Curiously, the spec requires handling processing instruction nodes, even though they don't round trip.
>+ nsString name = NS_LITERAL_STRING("</");
>+ nsString nodeName = current->LocalName();
For elements that aren't in the HTML, SVG or MathML namespaces, the spec requires using the qualified name as the tag name. (I'm slightly uncomfortable with using the qualified name when the qualified name is the same as the local name, since the effects will be surprising when round-tripping, but that's what the spec says currently.)
Assignee | ||
Comment 19•13 years ago
|
||
I was just testing something based on a stringbuilder, but still couldn't get
the speed of the current serializer.
Comment 20•13 years ago
|
||
This is my most recent work from before I left on my travels. I optimized the 'always append everything' codepath fairly well based on profiles; as I recall, the majority of the work seemed to be in SetCapacity and the actual DOM grubbing last time I checked. I don't recall how the speed stacked up against the current serializer. I'm happy to let somebody run with this, as I'm focusing on per-window pb in my limited free time.
Updated•13 years ago
|
Attachment #616711 -
Attachment is obsolete: true
Comment 21•13 years ago
|
||
Comment on attachment 620064 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.
Review of attachment 620064 [details] [diff] [review]:
-----------------------------------------------------------------
A couple of nits for when you get back
::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +681,5 @@
> + if (!found)
> + return;
> +
> + nsAutoString modified;
> + modified.SetCapacity(aBuf.Length() + 5 * found);
I would like a constant for the 5 (MAX_ENTITY_LENGTH?)
@@ +747,5 @@
> +
> +static inline bool
> +FallibleAppend(nsAString& aBuf, const nsAString& aExtra)
> +{
> + if(NS_UNLIKELY(!aBuf.SetCapacity(aBuf.Length() + aExtra.Length())))
'if (', and {}s
@@ +766,5 @@
> + APPEND_OR_FAIL(aOut, nsDependentAtomString(localName));
> + //toAppend.AppendElement(NS_LITERAL_STRING("<"));
> + //toAppend.AppendElement(nsDependentAtomString(localName));
> +
> + PRInt32 count = aContent->GetAttrCount();
PRUint32
@@ +787,5 @@
> + //
> + // Filter out special case of <br type="_moz"> or <br _moz*>,
> + // used by the editor. Bug 16988. Yuck.
> + //
> + PRInt32 tagNS = aContent->GetNameSpaceID();
Why is this in the loop?
@@ +885,5 @@
> +Serialize(nsINode* aRoot, bool aDescendentsOnly, nsAString& aOut)
> +{
> + nsINode* current = aDescendentsOnly ? aRoot->GetFirstChild() : aRoot;
> + if (!current)
> + return true;
{}
@@ +893,5 @@
> + while (true) {
> + nsAutoString buf;
> + switch (current->NodeType()) {
> + case nsIDOMNode::ELEMENT_NODE: {
> + mozilla::dom::Element* elem = current->AsElement();
This is in content/, so drop the 'mozilla::dom::'
@@ +935,5 @@
> + buf = NS_LITERAL_STRING("<?");
> + buf.Append(current->NodeName());
> + buf.Append(NS_LITERAL_STRING(">"));
> + break;
> + }
default case with MOZ_NOT_REACHED?
@@ +943,5 @@
> + APPEND_OR_FAIL(aOut, buf);
> + }
> +
> + nsIContent* next;
> + if ((next = current->GetFirstChild())) {
Either
if (nsIContent* next = current->GetFirstChild()) {
or
nsIContent* next = current->GetFirstChild();
if (next) {
@@ +955,5 @@
> + ignoreLevel--;
> + } else {
> + nsString name = NS_LITERAL_STRING("</");
> + nsString nodeName;
> + mozilla::dom::Element* elem = current->AsElement();
Same
@@ +971,5 @@
> + if (current == aRoot) {
> + return true;
> + }
> +
> + if ((next = current->GetNextSibling())) {
Same
Assignee | ||
Comment 22•13 years ago
|
||
I can't figure out how the magical newline handling should work.
Per spec
document.body.innerHTML = null;
document.body.appendChild(document.createElement("textarea"));
document.body.firstChild.appendChild(document.createTextNode("\nhello"));
console.log(document.body.innerHTML)
should report "<textarea>\n\nhello</textarea>"
But that is not what current browsers do. (tested N, C, O)
Assignee | ||
Comment 23•13 years ago
|
||
In the testcase this gives same or a bit better performance than trunk-C.
https://tbpl.mozilla.org/?tree=Try&rev=9911221e883d
Comment 24•13 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #22)
> I can't figure out how the magical newline handling should work.
> Per spec
> document.body.innerHTML = null;
> document.body.appendChild(document.createElement("textarea"));
> document.body.firstChild.appendChild(document.createTextNode("\nhello"));
> console.log(document.body.innerHTML)
> should report "<textarea>\n\nhello</textarea>"
>
> But that is not what current browsers do. (tested N, C, O)
Current browsers are broken -- document.body.innerHTML = document.body.innerHTML will eat the newline in this case. I.e., the DOM doesn't round-trip through innerHTML. There's no reason this should happen here. Same for <pre>. Browsers should change to match the spec.
Also, <xmp> needs to have its contents not HTML-escaped, same as <style> and <script>. The spec requires this and WebKit does it, but Gecko gets it wrong. (This is obviously a corner case, but my editing tests hit it, FWIW.)
See bug 608322 and dependencies.
Assignee | ||
Comment 25•13 years ago
|
||
(In reply to Aryeh Gregor from comment #24)
> Current browsers are broken -- document.body.innerHTML =
> document.body.innerHTML will eat the newline in this case. I.e., the DOM
> doesn't round-trip through innerHTML. There's no reason this should happen
> here. Same for <pre>. Browsers should change to match the spec.
Ok, but then this is a *very* risky change, and need to happen at the beginning of a cycle.
Assignee | ||
Comment 26•13 years ago
|
||
Assignee | ||
Comment 27•13 years ago
|
||
Attachment #627201 -
Attachment is obsolete: true
Assignee | ||
Comment 28•13 years ago
|
||
Actually, I think we should not change newline handling in this bug.
There are enough changes and risk for regressions even without the change.
Assignee | ||
Comment 29•13 years ago
|
||
Attachment #627305 -
Flags: feedback?(josh)
Comment 30•13 years ago
|
||
Comment on attachment 627305 [details] [diff] [review]
without newline changes
Nothing sticks out at me as an obvious optimization, but the linked list made my brain explode. You could look into Henri's suggestion of the way to get rid of ignoreLevel.
Attachment #627305 -
Flags: feedback?(josh) → feedback+
Wouldn't simply writing to a string but using a doubling strategy for growing the string length result in significantly simpler code but with just as good performance?
Assignee | ||
Comment 32•13 years ago
|
||
(In reply to Jonas Sicking (:sicking) from comment #31)
> Wouldn't simply writing to a string but using a doubling strategy for
> growing the string length
That is what our string code does atm.
> result in significantly simpler code but with just
> as good performance?
And no, it doesn't give good enough performance. At least I haven't managed to do so.
Assignee | ||
Comment 33•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #18)
> Instead of having an ignore level integer here and having the caller method
> walk the descendants of void elements, you could inline the start tag
> handling into the tree walking loop and make the tree walking skip the next
> sibling when the current node is a void element.
Hmm, actually I don't understand the spec.
"If current node is an area, base, basefont, bgsound, br, col, command, embed, frame, hr, img, input, keygen, link, meta, param, source, track or wbr element, then continue on to the next child node at this point."
The next child node of what? current node? or the node?
Assignee | ||
Comment 34•13 years ago
|
||
I wonder which behavior the spec defines here. At least not Gecko nor Webkit behavior.
Assignee | ||
Comment 35•13 years ago
|
||
Since Gecko creates <input><div></div>
if div is a child of an input element.
Webkit creates <input></input>
Assignee | ||
Comment 36•13 years ago
|
||
What should happen if DOM is <input><div></div></input> and one does
inputElement.innerHTML?
Comment 37•13 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #36)
> What should happen if DOM is <input><div></div></input> and one does
> inputElement.innerHTML?
I would say '<input>'.
For a DOM like <input><div/></input> returning "<input>" for myInputElement.innerHTML wouldn't make sense. The question is if we should return "" or "<div></div>". (Or throw, but I think that would be a bad idea)
Assignee | ||
Comment 39•13 years ago
|
||
Assignee | ||
Comment 40•13 years ago
|
||
Comment on attachment 627640 [details] [diff] [review]
without ignoreLevel
Apparently I broke something with this.
Attachment #627640 -
Attachment is obsolete: true
Assignee | ||
Comment 41•13 years ago
|
||
Assignee: josh → bugs
Attachment #627682 -
Flags: review?(hsivonen)
Comment on attachment 627682 [details] [diff] [review]
patch
>+ var PI = document.createProcessingInstruction('foo', 'bar="1.0"');
>+ t.appendChild(PI);
>+ is(t.innerHTML, '<span>hi</span> there <!-- mon ami --><?foo bar="1.0">',
>+ "pi nodes should be included");
Please add tests for:
1) An SVG element
2) A MathML element
3) An element that's none of HTML, MathML or SVG and has a prefixed node name
4) An element that's none of HTML, MathML or SVG and has a prefixless node name
5) An xml:* attribute
6) An xlink:* attribute
7) An xmlns="foo" attribute
8) An xmlns:foo="bar" attribute
9) A namespaced attribute that's none of the above
10) A CDATA section.
11) HTML script and style with <, >, & and no-break space in the content
12) SVG script and style with <, >, & and no-break space in the content (different escaping from the HTML case)
13) MathML script and style with <, >, & and no-break space in the content (same as the SVG case)
>+#define STRING_BUFFER_UNITS 1020
Please add a comment that explains where this magic number comes from. The magic number looks like it's trying to avoid a clown-shoe allocation. If possible, please add a static assertion about sizeof(StringBuilder) to make sure that subsequent changes don't accidentally push the size of the object over a clown shoe boundary.
>+ enum Type
>+ {
>+ eUnknown,
>+ eAtom,
>+ eString,
>+ eStringWithEncode,
>+ eLiteral,
>+ eTextFragment,
>+ eTextFragmentWithEncode,
>+ };
There are plenty of cases where the unit ends up containing a single-character string literal. Would it be worthwhile to save a pointer dereference and have a unit type for a single character and passing e.g. '<' instead of "<" to Append()?
>+ template<int N>
>+ void Append(const char (&aLiteral)[N])
I haven't seen this template pattern before, but I trust that it does what it seems to do.
>+ void Append(const nsAString& aString)
>+ {
>+ Unit* u = AddUnit();
>+ u->mString = new nsAutoString(aString);
Using nsAutoString instead of nsString seems to be against the rules. However, using nsString doesn't make sense, because append to the final output string always copies the buffer of the string instead of adopting it. It seems to me that it would make sense to use PRUnichar[] instead of nsAutoString here.
>+ PRUint32 mLength;
Please add a comment that explains that this field is only meaningful on the first StringBuffer object in the linked list. In general, it would be nice to have a comment that explains the linked list/unit array duality.
>+ if (attName == nsGkAtoms::xmlns) {
>+ aBuilder.Append(" xmlns");
...
>+ aBuilder.Append(attName);
The xmlns="foo" case looks wrong.
>+static void
>+AppendEncodedCharacters(const nsTextFragment* aText, StringBuilder& aBuilder)
...
>+static void
>+AppendEncodedAttributeValue(nsAutoString* aValue, StringBuilder& aBuilder)
>+ ++found;
>+ aBuilder.AppendWithAttrEncode(aValue, aValue->Length() + (5 * found));
Instead of getting an inaccurate result my multiplication here, I think it would make more sense to count the exact number of extra characters generated by each escape by incrementing |found| by 3 in the case of < and >, by 4 in the case of & and by 5 in the case of and ". Please also add a comment that explains why it's worthwhile to iterate over the strings twice in order to make a better-sized allocation for the final output string.
r=me with the above points addressed.
Attachment #627682 -
Flags: review?(hsivonen) → review+
Comment 43•13 years ago
|
||
Comment on attachment 627682 [details] [diff] [review]
patch
Review of attachment 627682 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +667,5 @@
> + nsAutoString* mString;
> + const nsTextFragment* mTextFragment;
> + };
> + Type mType;
> + PRUint32 mLength;
Worth using mozilla/StandardInteger.h types here and other places? I realize this doesn't match existing style of the file, but this is mostly new code anyway.
@@ +811,5 @@
> + const PRUnichar* end = aValue.EndReading();
> + while (c < end) {
> + switch (*c) {
> + case '"':
> + aOut.Append(NS_LITERAL_STRING("""));
AppendLiteral(""") here and other places is slightly clearer, IMHO.
(In reply to Nathan Froyd (:froydnj) from comment #43)
> > + aOut.Append(NS_LITERAL_STRING("""));
>
> AppendLiteral(""") here and other places is slightly clearer, IMHO.
Oops. Yeah, consider that part of my review, too.
Assignee | ||
Comment 45•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #42)
> >+ template<int N>
> >+ void Append(const char (&aLiteral)[N])
>
> I haven't seen this template pattern before, but I trust that it does what
> it seems to do.
That is how AppendLiteral works in strings.
> >+ void Append(const nsAString& aString)
> >+ {
> >+ Unit* u = AddUnit();
> >+ u->mString = new nsAutoString(aString);
>
> Using nsAutoString instead of nsString seems to be against the rules.
> However, using nsString doesn't make sense, because append to the final
> output string always copies the buffer of the string instead of adopting it.
> It seems to me that it would make sense to use PRUnichar[] instead of
> nsAutoString here.
Well, the whole idea to use nsAutoString here is that autostring has internal buffer
which is enough for common cases.
Append(const nsAString& aString) is quite rare case.
I don't see any good way to get PRUnichar[] attr values, and there shouldn't be need
for special case PI and DTD nodes.
(In reply to Olli Pettay [:smaug] from comment #45)
> Append(const nsAString& aString) is quite rare case.
OK.
> I don't see any good way to get PRUnichar[] attr values, and there shouldn't
> be need
> for special case PI and DTD nodes.
I see. Let's go with the code you already wrote, then.
Assignee | ||
Comment 47•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #42)
>
> There are plenty of cases where the unit ends up containing a
> single-character string literal. Would it be worthwhile to save a pointer
> dereference and have a unit type for a single character and passing e.g. '<'
> instead of "<" to Append()?
Not sure worth the slight complexity. Could do after profiling.
Assignee | ||
Comment 48•13 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=8f58f13a9e00
couldn't figure out how to add a truly useful check for jemalloc bucket size, but
added a comment.
Attachment #628790 -
Flags: review?(hsivonen)
Comment on attachment 628790 [details] [diff] [review]
v7
Looks good. Thanks.
If you have benchmark numbers that compare the new and old serializers, it would probably be a good idea to ping the Hacks blog team about those numbers to get them into a "what's new" article.
Attachment #628790 -
Flags: review?(hsivonen) → review+
Assignee | ||
Comment 50•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/7bf0125b26b5
I'll get the numbers from Nightlies.
I hope this change doesn't break too many websites.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 51•12 years ago
|
||
(In reply to Josh Matthews [:jdm] (travelling until June 25th) from comment #10)
> Smaug says this is a good testcase:
> https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034
Tested Nightly builds on an old Windows machine:
a build w/o the patch with the patch
div.innerHTML : 2905 1350
div.outerHTML : 15513 5101
body.innerHTML : 1119 790
body.outerHTML : 1116 758
You need to log in
before you can comment on or make changes to this bug.
Description
•