Closed
Bug 1385172
Opened 7 years ago
Closed 7 years ago
Improve nsEscapeHTML() and nsEscapeHTML2()
Categories
(Core :: XPCOM, enhancement)
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
These functions return malloc'd |char*|/|char16_t*| buffers which are, in most cases, immediately Adopt()ed into nsStrings. We should make them operate on nsAString/nsACString.
Also, they over-allocate terribly! All they do is escape '<', '>', '&', '"', and '\''. The worst case is '"' which expands to """, which is six chars. So they just assume the worst and allocate 6x more chars for the output than what the input has!
Assignee | ||
Comment 1•7 years ago
|
||
I did some ad hoc profiling. Starting the browser and looking at a couple of pages, nsEscapeHTML() was called a grand total of 8 times, all for "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.
Comment 2•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #1)
> I did some ad hoc profiling. Starting the browser and looking at a couple of
> pages, nsEscapeHTML() was called a grand total of 8 times, all for
> "moz-extension:" URLs. nsEscapeHTML2() wasn't called at all.
The best I can tell nsEscapeHTML2 is just used for rendering indexes (think foo.bar.com/~erahm/).
Assignee | ||
Comment 3•7 years ago
|
||
The existing functions work with C strings but almost all the call sites use
Mozilla strings.
The replacement function has the following properties.
- It works with Mozilla strings, which makes it much simpler and also improves
the call sites.
- It appends to the destination string because that's what a lot of the call
sites need. For those that don't, we can just append to an empty string.
- It is declared outside the |extern "C"| section because there is no need for
it to be in that section.
Note: there is no 16-bit variant of nsAppendEscapedHTML(). This is because
there are only two places that need 16-bit variants, both rarely executed,
and so converting to and from 8-bit is good enough.
The patch also adds some testing of the new function, renaming
TestEscapeURL.cpp as TestEscape.cpp in the process, because that file is now
testing other kinds of escaping.
Attachment #8898603 -
Flags: review?(erahm)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•7 years ago
|
||
Attachment #8898626 -
Flags: review?(erahm)
Assignee | ||
Updated•7 years ago
|
Attachment #8898603 -
Attachment is obsolete: true
Attachment #8898603 -
Flags: review?(erahm)
Assignee | ||
Comment 5•7 years ago
|
||
Comments about functions should go before the declaration, not after!
Attachment #8898629 -
Flags: review?(erahm)
Comment 6•7 years ago
|
||
Comment on attachment 8898626 [details] [diff] [review]
Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function
Review of attachment 8898626 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, this is much better.
::: xpcom/io/nsEscape.cpp
@@ +214,3 @@
> {
> + // Preparation: aDst's length will increase by at least aSrc's length.
> + aDst.SetCapacity(aDst.Length() + aSrc.Length());
Nice, no more 6X!
@@ +217,4 @@
>
> + nsACString::const_iterator cur, end;
> + aSrc.BeginReading(cur);
> + aSrc.EndReading(end);
This could just be:
> for (auto cur = aSrc.BeginReading(); cur != aSrc.EndReading(); cur++)
or slightly more concise (and avoiding our odd iterator class):
> auto cur = aSrc.BeginReading();
> auto end = aSrc.EndReading();
It's fine as-is of course.
Attachment #8898626 -
Flags: review?(erahm) → review+
Updated•7 years ago
|
Attachment #8898629 -
Flags: review?(erahm) → review+
Comment 7•7 years ago
|
||
Comment on attachment 8898626 [details] [diff] [review]
Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function
Review of attachment 8898626 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/io/nsEscape.cpp
@@ +214,3 @@
> {
> + // Preparation: aDst's length will increase by at least aSrc's length.
> + aDst.SetCapacity(aDst.Length() + aSrc.Length());
The result of this addition should be checked for overflow. Although, hm, this function isn't fallible...
Assignee | ||
Comment 8•7 years ago
|
||
> The result of this addition should be checked for overflow. Although, hm,
> this function isn't fallible...
Turns out it can't overflow, and it wouldn't matter if it did. I've added this comment:
> // Preparation: aDst's length will increase by at least aSrc's length.
> // This should never overflow because kMaxCapacity is less than half of the
> // maximum value of nsACString::kMaxCapacity. But even if that changed,
> // Length() is unsigned, so any overflow will just cause this sum to be
> // smaller than it should be, in which case this call will be a no-op.
> aDst.SetCapacity(aDst.Length() + aSrc.Length());
Comment 9•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #8)
> > // Preparation: aDst's length will increase by at least aSrc's length.
> > // This should never overflow because kMaxCapacity is less than half of the
> > // maximum value of nsACString::kMaxCapacity. But even if that changed,
Huh, interesting. What is the non-qualified kMaxCapacity in the above, where is that coming from? Do you know offhand if we have a static assert for the relationship cited here?
> > // Length() is unsigned, so any overflow will just cause this sum to be
> > // smaller than it should be, in which case this call will be a no-op.
> > aDst.SetCapacity(aDst.Length() + aSrc.Length());
Smaller capacities don't look like a no-op to me, particularly:
https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#683
https://dxr.mozilla.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#708
We'll overwrite old data if we shrink. But perhaps this is not a problem because of the capacity arguments cited above.
Assignee | ||
Comment 10•7 years ago
|
||
Eh, I'll just add overflow checking and skip the SetCapacity() call on failure.
Assignee | ||
Comment 11•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a9fb936a3c82cf1a34cbd1f7ac0e7abd0d10eba4
Bug 1385172 - Replace nsEscapeHTML{,2}() with new nsAppendEscapedHTML() function. r=erahm.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f1322219d9ffefe7a51c000fb68d1722d64901d
Bug 1385172 - Move some comments in nsEscape.h. r=erahm.
Comment 12•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a9fb936a3c82
https://hg.mozilla.org/mozilla-central/rev/1f1322219d9f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 13•7 years ago
|
||
That busted us without any warning :-(
Comment 14•7 years ago
|
||
Just a string question:
I converted this:
char *escapedBody = nsEscapeHTML(*body);
if (escapedBody)
{
PR_Free(*body);
*body = escapedBody;
bodyLen = strlen(*body);
}
to this:
nsCString escapedBody;
nsAppendEscapedHTML(nsDependentCString(*body), escapedBody);
if (!escapedBody.IsEmpty())
{
PR_Free(*body);
*body = strdup(escapedBody.get());
bodyLen = strlen(*body);
}
The 'strdup()' is really unfortunate, but I guess there is no opposite of Adopt(), so something like:
*body = escapedBody.get();
escapedBody.ForgetBuffer();
right?
Flags: needinfo?(erahm)
Comment 15•7 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #14)
> Just a string question:
>
> I converted this:
> char *escapedBody = nsEscapeHTML(*body);
> if (escapedBody)
> {
> PR_Free(*body);
> *body = escapedBody;
> bodyLen = strlen(*body);
> }
>
> to this:
> nsCString escapedBody;
> nsAppendEscapedHTML(nsDependentCString(*body), escapedBody);
> if (!escapedBody.IsEmpty())
> {
> PR_Free(*body);
> *body = strdup(escapedBody.get());
> bodyLen = strlen(*body);
> }
>
> The 'strdup()' is really unfortunate, but I guess there is no opposite of
> Adopt(), so something like:
> *body = escapedBody.get();
> escapedBody.ForgetBuffer();
> right?
Correct, we don't have a `ForgetBuffer` (the buffer itself includes the ref counting bits so it's not as easy as just forgetting it). You can use `ToNewCString` [1] instead of the `strdup` and `nsCString::Length` instead of `strlen` at least:
> *body = ToNewCString(escapedBody);
> bodyLen = escapedBody.Length();
[1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/xpcom/string/nsReadableUtils.h#87
Flags: needinfo?(erahm)
Comment 16•7 years ago
|
||
Thanks Eric, I was reading the string guide https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings on my mobile device while afk and came to the same conclusions, but you answered before I could cancel the NI.
That page even says: "In other words, there is no way to "grab" the internal char * from a String, i.e. have the string "forget" about it and hand off ownership to other code. Sorry."
Assignee | ||
Comment 17•7 years ago
|
||
Apologies for the breakage. For no good reason I utterly failed to check this change against comm-central. I will try to do better in the future.
Comment 18•7 years ago
|
||
No problem. There were about 20 call sites and the new function allowed us to toss a lot of clumsy code.
You need to log in
before you can comment on or make changes to this bug.
Description
•