1385172 - Improve nsEscapeHTML() and nsEscapeHTML2()

Assignee

Description

•

7 years ago

These functions return malloc'd |char*|/|char16_t*| buffers which are, in most cases, immediately Adopt()ed into nsStrings. We should make them operate on nsAString/nsACString. Also, they over-allocate terribly! All they do is escape '<', '>', '&', '"', and '\''. The worst case is '"' which expands to """, which is six chars. So they just assume the worst and allocate 6x more chars for the output than what the input has!

Nicholas Nethercote [inactive]

Assignee

Comment 1

•

7 years ago

I did some ad hoc profiling. Starting the browser and looking at a couple of pages, nsEscapeHTML() was called a grand total of 8 times, all for "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.

Eric Rahm [:erahm]

Comment 2

•

7 years ago

(In reply to Nicholas Nethercote [:njn] from comment #1) > I did some ad hoc profiling. Starting the browser and looking at a couple of > pages, nsEscapeHTML() was called a grand total of 8 times, all for > "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all. The best I can tell nsEscapeHTML2 is just used for rendering indexes (think foo.bar.com/~erahm/).

Nicholas Nethercote [inactive]

Assignee

Comment 3

•

7 years ago

Attached patch Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function (obsolete) (deleted) — Details — Splinter Review

The existing functions work with C strings but almost all the call sites use Mozilla strings. The replacement function has the following properties. - It works with Mozilla strings, which makes it much simpler and also improves the call sites. - It appends to the destination string because that's what a lot of the call sites need. For those that don't, we can just append to an empty string. - It is declared outside the |extern "C"| section because there is no need for it to be in that section. Note: there is no 16-bit variant of nsAppendEscapedHTML(). This is because there are only two places that need 16-bit variants, both rarely executed, and so converting to and from 8-bit is good enough. The patch also adds some testing of the new function, renaming TestEscapeURL.cpp as TestEscape.cpp in the process, because that file is now testing other kinds of escaping.

Attachment #8898603 - Flags: review?(erahm)

Nicholas Nethercote [inactive]

Assignee

Updated

•

7 years ago

Assignee: nobody → n.nethercote

Status: NEW → ASSIGNED

Nicholas Nethercote [inactive]

Assignee

Comment 4

•

7 years ago

Attached patch Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function (deleted) — Details — Splinter Review

Attachment #8898626 - Flags: review?(erahm)

Nicholas Nethercote [inactive]

Assignee

Updated

•

7 years ago

Attachment #8898603 - Attachment is obsolete: true

Attachment #8898603 - Flags: review?(erahm)

Nicholas Nethercote [inactive]

Assignee

Comment 5

•

7 years ago

Attached patch Move some comments in nsEscape.h (deleted) — Details — Splinter Review

Comments about functions should go before the declaration, not after!

Attachment #8898629 - Flags: review?(erahm)

Eric Rahm [:erahm]

Comment 6

•

7 years ago

Comment on attachment 8898626 [details] [diff] [review] Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function Review of attachment 8898626 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this is much better. ::: xpcom/io/nsEscape.cpp @@ +214,3 @@ > { > + // Preparation: aDst's length will increase by at least aSrc's length. > + aDst.SetCapacity(aDst.Length() + aSrc.Length()); Nice, no more 6X! @@ +217,4 @@ > > + nsACString::const_iterator cur, end; > + aSrc.BeginReading(cur); > + aSrc.EndReading(end); This could just be: > for (auto cur = aSrc.BeginReading(); cur != aSrc.EndReading(); cur++) or slightly more concise (and avoiding our odd iterator class): > auto cur = aSrc.BeginReading(); > auto end = aSrc.EndReading(); It's fine as-is of course.

Attachment #8898626 - Flags: review?(erahm) → review+

Eric Rahm [:erahm]

Updated

•

7 years ago

Attachment #8898629 - Flags: review?(erahm) → review+

Nathan Froyd [:froydnj]

Comment 7

•

7 years ago

Comment on attachment 8898626 [details] [diff] [review] Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function Review of attachment 8898626 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/nsEscape.cpp @@ +214,3 @@ > { > + // Preparation: aDst's length will increase by at least aSrc's length. > + aDst.SetCapacity(aDst.Length() + aSrc.Length()); The result of this addition should be checked for overflow. Although, hm, this function isn't fallible...

Nicholas Nethercote [inactive]

Assignee

Comment 8

•

7 years ago

> The result of this addition should be checked for overflow. Although, hm, > this function isn't fallible... Turns out it can't overflow, and it wouldn't matter if it did. I've added this comment: > // Preparation: aDst's length will increase by at least aSrc's length. > // This should never overflow because kMaxCapacity is less than half of the > // maximum value of nsACString::kMaxCapacity. But even if that changed, > // Length() is unsigned, so any overflow will just cause this sum to be > // smaller than it should be, in which case this call will be a no-op. > aDst.SetCapacity(aDst.Length() + aSrc.Length());

Nathan Froyd [:froydnj]

Comment 9

•

7 years ago

(In reply to Nicholas Nethercote [:njn] from comment #8) > > // Preparation: aDst's length will increase by at least aSrc's length. > > // This should never overflow because kMaxCapacity is less than half of the > > // maximum value of nsACString::kMaxCapacity. But even if that changed, Huh, interesting. What is the non-qualified kMaxCapacity in the above, where is that coming from? Do you know offhand if we have a static assert for the relationship cited here? > > // Length() is unsigned, so any overflow will just cause this sum to be > > // smaller than it should be, in which case this call will be a no-op. > > aDst.SetCapacity(aDst.Length() + aSrc.Length()); Smaller capacities don't look like a no-op to me, particularly: https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#683 https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#708 We'll overwrite old data if we shrink. But perhaps this is not a problem because of the capacity arguments cited above.

Nicholas Nethercote [inactive]

Assignee

Comment 10

•

7 years ago

Eh, I'll just add overflow checking and skip the SetCapacity() call on failure.

Nicholas Nethercote [inactive]

Assignee

Comment 11

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a9fb936a3c82cf1a34cbd1f7ac0e7abd0d10eba4 Bug 1385172 - Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function. r=erahm. https://hg.mozilla.org/integration/mozilla-inbound/rev/1f1322219d9ffefe7a51c000fb68d1722d64901d Bug 1385172 - Move some comments in nsEscape.h. r=erahm.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a9fb936a3c82 https://hg.mozilla.org/mozilla-central/rev/1f1322219d9f

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Jorg K (CEST = GMT+2)

Comment 13

•

7 years ago

That busted us without any warning :-(

Jorg K (CEST = GMT+2)

Comment 14

•

7 years ago

Just a string question: I converted this: char *escapedBody = nsEscapeHTML(*body); if (escapedBody) { PR_Free(*body); *body = escapedBody; bodyLen = strlen(*body); } to this: nsCString escapedBody; nsAppendEscapedHTML(nsDependentCString(*body), escapedBody); if (!escapedBody.IsEmpty()) { PR_Free(*body); *body = strdup(escapedBody.get()); bodyLen = strlen(*body); } The 'strdup()' is really unfortunate, but I guess there is no opposite of Adopt(), so something like: *body = escapedBody.get(); escapedBody.ForgetBuffer(); right?

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 15

•

7 years ago

(In reply to Jorg K (GMT+2) from comment #14) > Just a string question: > > I converted this: > char *escapedBody = nsEscapeHTML(*body); > if (escapedBody) > { > PR_Free(*body); > *body = escapedBody; > bodyLen = strlen(*body); > } > > to this: > nsCString escapedBody; > nsAppendEscapedHTML(nsDependentCString(*body), escapedBody); > if (!escapedBody.IsEmpty()) > { > PR_Free(*body); > *body = strdup(escapedBody.get()); > bodyLen = strlen(*body); > } > > The 'strdup()' is really unfortunate, but I guess there is no opposite of > Adopt(), so something like: > *body = escapedBody.get(); > escapedBody.ForgetBuffer(); > right? Correct, we don't have a `ForgetBuffer` (the buffer itself includes the ref counting bits so it's not as easy as just forgetting it). You can use `ToNewCString` [1] instead of the `strdup` and `nsCString::Length` instead of `strlen` at least: > *body = ToNewCString(escapedBody); > bodyLen = escapedBody.Length(); [1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/xpcom/string/nsReadableUtils.h#87

Flags: needinfo?(erahm)

Jorg K (CEST = GMT+2)

Comment 16

•

7 years ago

Thanks Eric, I was reading the string guide https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings on my mobile device while afk and came to the same conclusions, but you answered before I could cancel the NI. That page even says: "In other words, there is no way to "grab" the internal char * from a String, i.e. have the string "forget" about it and hand off ownership to other code. Sorry."

Nicholas Nethercote [inactive]

Assignee

Comment 17

•

7 years ago

Apologies for the breakage. For no good reason I utterly failed to check this change against comm-central. I will try to do better in the future.

Jorg K (CEST = GMT+2)

Comment 18

•

7 years ago

No problem. There were about 20 call sites and the new function allowed us to toss a lot of clumsy code.

Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function 7 years ago Nicholas Nethercote [inactive] (deleted), patch		Details \| Diff \| Splinter Review
Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function 7 years ago Nicholas Nethercote [inactive] (deleted), patch	erahm : review+	Details \| Diff \| Splinter Review
Move some comments in nsEscape.h 7 years ago Nicholas Nethercote [inactive] (deleted), patch	erahm : review+	Details \| Diff \| Splinter Review