Closed
Bug 432525
Opened 17 years ago
Closed 15 years ago
optimize string.replace for flat strings
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 511777
People
(Reporter: shaver, Assigned: dmandelin)
Details
(Keywords: perf, Whiteboard: [parity-IE][parity-Safari])
Attachments
(2 files, 2 obsolete files)
(deleted),
application/x-javascript
|
Details | |
(deleted),
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
Lots of cases on the web of using a flat string as the needle, and it's pretty expensive to spin up the regex engine for such a case. We have such code in our own FE, for doing entity replacement and other such processing.
I think there are 3 phases here:
1) detect the simple str.replace("needle", "new") case and short-circuit it by reusing indexOf's internals
2) handle the global and case-insensitive versions, like str.replace("needLE", "new", "gi")
3) undo all of this in favour of optimizing the regex engine for the simple-string case
I have a patch for 1, which shows a decent gain on Celtic Kane's Test_String:
Before: Test_String : 308
After : Test_String : 221
(The next big chunk of that test is spent finding the String prototype to look up the methods, with my patch.)
As well as my patch, I'll attach my sharkified ck.js here, I typically run it via
js -f ck.js -e 'useshark = 1; testcount = 1; RunTests("Test_String")'
to profile.
Reporter | ||
Comment 1•17 years ago
|
||
Assignee: general → shaver
Status: NEW → ASSIGNED
Reporter | ||
Comment 2•17 years ago
|
||
Comment 3•17 years ago
|
||
Comment on attachment 319669 [details] [diff] [review]
phase 1: str.replace("needle", "flat")
>+#if 0
> /* XXX tune the BMH threshold (512) */
> if (textlen - i >= 512 && (jsuint)(patlen - 2) <= BMH_PATLEN_MAX - 2) {
> index = js_BoyerMooreHorspool(text, textlen, pat, patlen, i);
>+ v = JS_ARGV(cx, vp)[0];
>+ if (!JSVAL_IS_REGEXP(cx, v) && repstr != NULL && argc < 3) {
Fastest path is for JSVAL_IS_STRING(v). Who cares about objects, etc.?
>+ global = JS_FALSE; /* for when we do flat-string regexes through this */
Premature logic control alert! :-P
>+ do {
>+ jsint found;
>+
>+ found = patlen
>+ ? str_indexOf_inner(text, textlen, pat, patlen, i)
>+ : 0;
Nit: underhang right-hand side of assignment.
Prefer _helper to _inner (see jsxml.c).
>+ if (found < 0)
>+ break;
No more occurrences.
>+
>+ length += found - i + replen;
>+ chars = (jschar *) JS_realloc(cx, chars, length * sizeof(jschar));
>+ if (!chars)
>+ return JS_FALSE;
>+
>+ APPEND_CHARS(text, i, found - i);
>+ APPEND_CHARS(reptext, 0, replen);
>+
>+ i = found + patlen;
>+ if (!global)
>+ break;
>+ chars = (jschar *)JS_realloc(cx, chars, length * sizeof(jschar));
Nit: space after cast.
>+ if (!chars)
>+ return JS_FALSE;
Non-nit: leak on error. Need newchars or similar.
>+ APPEND_CHARS(text, i, textlen - i);
>+ str = js_NewString(cx, chars, length);
Non-nit: need (length + 1) * sizeof(jschar) JS_realloc argument.
>+ if (!str) {
>+ JS_free(cx, chars);
>+ return JS_FALSE;
>+ }
/be
Reporter | ||
Comment 4•17 years ago
|
||
Ripping out the unused global-replace capabilities let me use a single allocation point, making for better speed and simpler code.
Test_String : 172
Attachment #319669 -
Attachment is obsolete: true
Attachment #320027 -
Flags: review?(brendan)
Comment 5•16 years ago
|
||
Comment on attachment 320027 [details] [diff] [review]
brendan's comments addressed, performance improved
>+ if (JSVAL_IS_STRING(v) && repstr != NULL && argc < 3) {
Eschew != NULL unless nesting assignment on the right in a loop condition.
>+ jschar *text, *pat, *reptext;
>+ size_t textlen, patlen, replen, charsindex;
>+ jsint found;
>+
Order by order of initialization?
>+ str = JSVAL_TO_STRING(v);
>+ JSSTRING_CHARS_AND_LENGTH(str, pat, patlen);
>+
>+ NORMALIZE_THIS(cx, vp, str);
>+ JSSTRING_CHARS_AND_LENGTH(str, text, textlen);
>+
>+ JSSTRING_CHARS_AND_LENGTH(repstr, reptext, replen);
>+
>+ chars = NULL;
>+ charsindex = 0;
chars is not used before set after this.
charsindex is unused after this.
>+
>+ found = patlen ? str_indexOf_helper(text, textlen, pat, patlen, 0) : 0;
>+ if (found < 0) {
>+ *vp = vp[1]; /* Return original value if pattern wasn't found. */
>+ return JS_TRUE;
>+ }
Use an if-else to avoid testing found < 0 in the patlen == 0 case.
>+
>+ length = textlen + replen - patlen;
>+ chars = (jschar *) JS_malloc(cx, (length + 1) * sizeof(jschar));
>+ if (!chars)
>+ return JS_FALSE;
>+
>+ js_strncpy(chars, text, found); /* Prefix. */
>+ js_strncpy(chars + found, reptext, replen); /* Replacement text. */
>+ js_strncpy(chars + found + replen, text + found + patlen,
>+ textlen - found - patlen); /* Suffix. */
Nit: "inline" or "on the right" comments use sentence fragments -- no capitalization or full stop.
/be
Reporter | ||
Comment 6•16 years ago
|
||
Attachment #320027 -
Attachment is obsolete: true
Attachment #322955 -
Flags: review?(brendan)
Attachment #320027 -
Flags: review?(brendan)
Comment 7•16 years ago
|
||
Comment on attachment 322955 [details] [diff] [review]
with brendan's comments addressed
>+ jsint i, index, textlen, patlen;
Given the following changes, should i be renamed to start? Probably -- r=me with that if you agree.
/be
Attachment #322955 -
Flags: review?(brendan) → review+
Reporter | ||
Comment 8•16 years ago
|
||
Committed with the i/start rename:
15341[tip] 1f599577eca2 2008-06-12 18:30 -0700 shaver
Bug 432525: optimize string.replace for flat strings; r=brendan
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This caused a large number of mochitest failures on all platforms so I backed it out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 10•16 years ago
|
||
Um, yeah. I think I had my mq entirely qpopped when I mochitested, because wow. Sorry, and thanks to roc for wiping my chin.
Updated•16 years ago
|
Flags: wanted1.9.1?
Comment 11•16 years ago
|
||
It's probably a moot point now - but historically I've found:
string.split("needle").join("flat")
to be faster than
string.replace("needle", "flat")
Would there be interest in some test cases to see if/how we've improved here?
Reporter | ||
Comment 12•16 years ago
|
||
I think Dave is going to pick up this patch and iron it out.
The first version of the patch has the infrastructure for global replace, though not wired up to the "g" arg. (SunSpider, at least, needs it.) You want to work from the most recent patch, I think, but the first one shows one way to get the global stuff connected.
Assignee: shaver → dmandelin
Assignee | ||
Comment 13•16 years ago
|
||
Thanks for the tip about global. I know I have to do it but I wasn't aware you had already coded it up in the first patch--I had looked only at the latest one.
Assignee | ||
Comment 14•16 years ago
|
||
So, do we still care about this now that SunSpider has been updated not to use the 'g' flag? As of now, this patch speeds up trunk SunSpider by about 2ms, by reducing the time for the replace operations from 15ms to 13ms.
I think it's worthwhile only if we care to speed up the old SunSpider, which will presumably be the one on the web for some time. In that case I'd need to get back the "g" machinery and also put in something for $&. Let me know.
Comment 15•16 years ago
|
||
It seems like both IE and Safari do this optimization (and both without any $& substitution) which can and does confuse web developers (see e.g. bug 459378).
Whiteboard: [parity-IE][parity-Safari]
Comment 16•15 years ago
|
||
There are still some things mentioned here that aren't being done by bug 511777 (e.g. picking out flat RegExp objects, and 'g' matching).
Status: REOPENED → RESOLVED
Closed: 16 years ago → 15 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•