Closed Bug 1385172 Opened 7 years ago Closed 7 years ago

Improve nsEscapeHTML() and nsEscapeHTML2()

Categories

(Core :: XPCOM, enhancement)

54 Branch
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Attachments

(2 files, 1 obsolete file)

These functions return malloc'd |char*|/|char16_t*| buffers which are, in most cases, immediately Adopt()ed into nsStrings. We should make them operate on nsAString/nsACString. Also, they over-allocate terribly! All they do is escape '<', '>', '&', '"', and '\''. The worst case is '"' which expands to "&quot;", which is six chars. So they just assume the worst and allocate 6x more chars for the output than what the input has!
I did some ad hoc profiling. Starting the browser and looking at a couple of pages, nsEscapeHTML() was called a grand total of 8 times, all for "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.
(In reply to Nicholas Nethercote [:njn] from comment #1) > I did some ad hoc profiling. Starting the browser and looking at a couple of > pages, nsEscapeHTML() was called a grand total of 8 times, all for > "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all. The best I can tell nsEscapeHTML2 is just used for rendering indexes (think foo.bar.com/~erahm/).
The existing functions work with C strings but almost all the call sites use Mozilla strings. The replacement function has the following properties. - It works with Mozilla strings, which makes it much simpler and also improves the call sites. - It appends to the destination string because that's what a lot of the call sites need. For those that don't, we can just append to an empty string. - It is declared outside the |extern "C"| section because there is no need for it to be in that section. Note: there is no 16-bit variant of nsAppendEscapedHTML(). This is because there are only two places that need 16-bit variants, both rarely executed, and so converting to and from 8-bit is good enough. The patch also adds some testing of the new function, renaming TestEscapeURL.cpp as TestEscape.cpp in the process, because that file is now testing other kinds of escaping.
Attachment #8898603 - Flags: review?(erahm)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Attachment #8898603 - Attachment is obsolete: true
Attachment #8898603 - Flags: review?(erahm)
Comments about functions should go before the declaration, not after!
Attachment #8898629 - Flags: review?(erahm)
Comment on attachment 8898626 [details] [diff] [review] Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function Review of attachment 8898626 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this is much better. ::: xpcom/io/nsEscape.cpp @@ +214,3 @@ > { > + // Preparation: aDst's length will increase by at least aSrc's length. > + aDst.SetCapacity(aDst.Length() + aSrc.Length()); Nice, no more 6X! @@ +217,4 @@ > > + nsACString::const_iterator cur, end; > + aSrc.BeginReading(cur); > + aSrc.EndReading(end); This could just be: > for (auto cur = aSrc.BeginReading(); cur != aSrc.EndReading(); cur++) or slightly more concise (and avoiding our odd iterator class): > auto cur = aSrc.BeginReading(); > auto end = aSrc.EndReading(); It's fine as-is of course.
Attachment #8898626 - Flags: review?(erahm) → review+
Attachment #8898629 - Flags: review?(erahm) → review+
Comment on attachment 8898626 [details] [diff] [review] Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function Review of attachment 8898626 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/nsEscape.cpp @@ +214,3 @@ > { > + // Preparation: aDst's length will increase by at least aSrc's length. > + aDst.SetCapacity(aDst.Length() + aSrc.Length()); The result of this addition should be checked for overflow. Although, hm, this function isn't fallible...
> The result of this addition should be checked for overflow. Although, hm, > this function isn't fallible... Turns out it can't overflow, and it wouldn't matter if it did. I've added this comment: > // Preparation: aDst's length will increase by at least aSrc's length. > // This should never overflow because kMaxCapacity is less than half of the > // maximum value of nsACString::kMaxCapacity. But even if that changed, > // Length() is unsigned, so any overflow will just cause this sum to be > // smaller than it should be, in which case this call will be a no-op. > aDst.SetCapacity(aDst.Length() + aSrc.Length());
(In reply to Nicholas Nethercote [:njn] from comment #8) > > // Preparation: aDst's length will increase by at least aSrc's length. > > // This should never overflow because kMaxCapacity is less than half of the > > // maximum value of nsACString::kMaxCapacity. But even if that changed, Huh, interesting. What is the non-qualified kMaxCapacity in the above, where is that coming from? Do you know offhand if we have a static assert for the relationship cited here? > > // Length() is unsigned, so any overflow will just cause this sum to be > > // smaller than it should be, in which case this call will be a no-op. > > aDst.SetCapacity(aDst.Length() + aSrc.Length()); Smaller capacities don't look like a no-op to me, particularly: https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#683 https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#708 We'll overwrite old data if we shrink. But perhaps this is not a problem because of the capacity arguments cited above.
Eh, I'll just add overflow checking and skip the SetCapacity() call on failure.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
That busted us without any warning :-(
Just a string question: I converted this: char *escapedBody = nsEscapeHTML(*body); if (escapedBody) { PR_Free(*body); *body = escapedBody; bodyLen = strlen(*body); } to this: nsCString escapedBody; nsAppendEscapedHTML(nsDependentCString(*body), escapedBody); if (!escapedBody.IsEmpty()) { PR_Free(*body); *body = strdup(escapedBody.get()); bodyLen = strlen(*body); } The 'strdup()' is really unfortunate, but I guess there is no opposite of Adopt(), so something like: *body = escapedBody.get(); escapedBody.ForgetBuffer(); right?
Flags: needinfo?(erahm)
(In reply to Jorg K (GMT+2) from comment #14) > Just a string question: > > I converted this: > char *escapedBody = nsEscapeHTML(*body); > if (escapedBody) > { > PR_Free(*body); > *body = escapedBody; > bodyLen = strlen(*body); > } > > to this: > nsCString escapedBody; > nsAppendEscapedHTML(nsDependentCString(*body), escapedBody); > if (!escapedBody.IsEmpty()) > { > PR_Free(*body); > *body = strdup(escapedBody.get()); > bodyLen = strlen(*body); > } > > The 'strdup()' is really unfortunate, but I guess there is no opposite of > Adopt(), so something like: > *body = escapedBody.get(); > escapedBody.ForgetBuffer(); > right? Correct, we don't have a `ForgetBuffer` (the buffer itself includes the ref counting bits so it's not as easy as just forgetting it). You can use `ToNewCString` [1] instead of the `strdup` and `nsCString::Length` instead of `strlen` at least: > *body = ToNewCString(escapedBody); > bodyLen = escapedBody.Length(); [1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/xpcom/string/nsReadableUtils.h#87
Flags: needinfo?(erahm)
Thanks Eric, I was reading the string guide https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings on my mobile device while afk and came to the same conclusions, but you answered before I could cancel the NI. That page even says: "In other words, there is no way to "grab" the internal char * from a String, i.e. have the string "forget" about it and hand off ownership to other code. Sorry."
Apologies for the breakage. For no good reason I utterly failed to check this change against comm-central. I will try to do better in the future.
No problem. There were about 20 call sites and the new function allowed us to toss a lot of clumsy code.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: