Closed
Bug 587366
Opened 14 years ago
Closed 14 years ago
Regexp failure for ")".replace(")","*$&*");
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla2.0
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta5+ |
People
(Reporter: alice0775, Assigned: cdleary)
References
Details
(Keywords: regression, Whiteboard: fixed-in-tracemonkey)
Attachments
(3 files, 2 obsolete files)
(deleted),
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100814 Minefield/4.0b4pre ID:20100814040443
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100814 Minefield/4.0b4pre ID:20100814040443
See forum, http://forums.mozillazine.org/viewtopic.php?p=9754685#p9754685
When execute the following code, returns an error.
")".replace(")","*$&*");
Actual Results:
Error: unmatched ) in regular expression
Source File: javascript:%20")".replace(")","*$&*")
Line: 1
Expected Results:
*)*
Regression pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0ee09dea0911&tochange=fc6783c960ca
Comment 1•14 years ago
|
||
Regression from the YARR! landing?
Comment 2•14 years ago
|
||
Looks like a combination of our turning flat strings into flat regexps to search for (per ES3 and ES5, a non-RegExp first arg to replace means convert that arg to string and find it in the |this| string) and YARR/JSPCRE being pickier about unbalanced parens.
/be
Updated•14 years ago
|
Assignee: general → cdleary
blocking2.0: --- → beta5+
Assignee | ||
Comment 3•14 years ago
|
||
Patching now. As Brendan says, turning flat string to flat regexp and displaying a compilation error is incorrect behavior here.
Interesting side note: jsc's result for this '*$&*'. They have a special single-char pattern replacement path within JSC::stringProtoFuncReplace that does no substitution of back references. (Sure, there can be no parenthetical backreference substitution in this case, but not having the leftContext, rightContext, and match substitutions is in error per ECMAv5:15.5.4.11.) I'll file a bug against webkit before closing this.
Another aside, the behavior for numbered capture group backreference substitution in this scenario is implementation defined according to that section! Why don't we just specify something reasonable there, like literal character substitution?
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•14 years ago
|
||
Out of curiosity I ran some stats for single-char flat regexp replace on tracemonkey benchmarks:
SunSpider single character replace (in cycles):
Mean: 781.455
StdDev: 613.818
Count: 11
Total: 8596
ms @ 3GHz: 0.00286533
v8 single character replace (in cycles):
Mean: 229.77
StdDev: 167.209
Count: 4155
Total: 954695
ms @ 3GHz: 0.318232
Assignee | ||
Comment 5•14 years ago
|
||
Disregard those measurements, was measuring text length instead of pattern length... some of the variable names in that area of code are easily confused.
Assignee | ||
Comment 6•14 years ago
|
||
Here are the corrected stats for single char flat patterns on tracemonkey. SunSpider only, v8 had no such patterns:
single character flat patterns (in cycles):
Mean: 1304.24
StdDev: 49330.3
Count: 2512
Total: 3.27624e+06
ms @ 3GHz: 1.09208
Assignee | ||
Comment 7•14 years ago
|
||
So it turns out that our String.prototype.replace has been a little more non-standard than is correct. If there is a dollar in the replaceValue or a lambda, we turn a non-regexp value of searchValue into a regular expression and perform a regular expression match. This has the side-effect of updating the RegExp statics, which is not to spec! Unfortunately, it meshes nicely with our optional third parameter, flags ( https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/replace ). This allows users to specify things like 'g', whereas the normal String.prototype.replace can only perform a single replace.
To rectify the situation we'll rip out the spec-violating behavior, make the non-flags case as fast and minimal as possible, and put the flags case on a slow path that updates the RegExp statics.
Assignee | ||
Comment 8•14 years ago
|
||
Refactors the RegExpGuard results into result classes instead of accessing RegExpGuard members directly.
Revised |String.prototype.replace| functionality soon to follow.
Attachment #466791 -
Flags: review?(lw)
Assignee | ||
Comment 10•14 years ago
|
||
Lambda case is still outstanding. Going back to tweak a few things in the base patch after talking to lw.
Assignee | ||
Comment 11•14 years ago
|
||
Attachment #466791 -
Attachment is obsolete: true
Attachment #466879 -
Flags: review?(lw)
Attachment #466791 -
Flags: review?(lw)
Assignee | ||
Comment 12•14 years ago
|
||
There's one fixme left in there for lambda-based substitution that I'll make the next patch on the stack.
Attachment #466868 -
Attachment is obsolete: true
Attachment #466881 -
Flags: review?(lw)
Assignee | ||
Comment 13•14 years ago
|
||
Attachment #466967 -
Flags: review?(lw)
Comment 14•14 years ago
|
||
Comment on attachment 466879 [details] [diff] [review]
Part 1: Clean up RegExpGuard results.
Looks good, a few tiny comments:
>+class FlatMatch
>+{
>+ bool fail;
...
>+ public:
>+ operator bool() const { return !fail; }
As discussed, tryFlatMatch can just return a FlatMatch*, so no need for 'fail' or operator bool().
> /*
> * RegExpGuard factors logic out of String regexp operations. After each
> * operation completes, RegExpGuard data members become available, according to
> * the comments below.
> *
> * Notes on parameters to RegExpGuard member functions:
> * - 'optarg' indicates in which argument position RegExp flags will be found,
> * if present. This is a Mozilla extension and not part of any ECMA spec.
> * - 'flat' indicates that the given pattern string will not be interpreted as
> * a regular expression, hence regexp meta-characters are ignored.
> */
It looks like this comment can be gutted.
> if (optarg < argc ||
>- (!flat &&
>- (patlen > sMaxFlatPatLen || RegExp::hasMetaChars(pat, patlen)))) {
>- return false;
>+ (checkMetaChars && (fm.patlen > MAX_FLAT_PAT_LEN
>+ || RegExp::hasMetaChars(fm.pat, fm.patlen)))) {
SM style puts the || at the end of the line and probably align RegExp::hasMetaChars with fm.patlen.
>+ if (!rdata.dollar && !rdata.lambda) {
>+ FlatMatch fm = rdata.g.tryFlatMatch(rdata.str, 2, argc, false);
>+ if (fm)
>+ return BuildFlatReplacement(cx, rdata.str, rdata.repstr, fm, vp);
Can merge decl and test.
Attachment #466879 -
Flags: review?(lw) → review+
Comment 15•14 years ago
|
||
(In reply to comment #14)
> > if (optarg < argc ||
> >+ (checkMetaChars && (fm.patlen > MAX_FLAT_PAT_LEN
> >+ || RegExp::hasMetaChars(fm.pat, fm.patlen)))) {
>
> SM style puts the || at the end of the line and probably align
> RegExp::hasMetaChars with fm.patlen.
Yes to || at end (&& too), but rather than run on in a right-heavy way past the nested &&, break after && in the parenthetical and indent accordingly.
Even better, since the consequent looks to have been (still be?) return false;, write two if-then-return-false statements, one for the optarg < argc condition, the second after it for the longer condition.
> >+ if (!rdata.dollar && !rdata.lambda) {
> >+ FlatMatch fm = rdata.g.tryFlatMatch(rdata.str, 2, argc, false);
> >+ if (fm)
> >+ return BuildFlatReplacement(cx, rdata.str, rdata.repstr, fm, vp);
>
> Can merge decl and test.
Doing that all over lately, and Waldo is too. While gal's Java-tainted reptile brain had issues with it, I declare it pure C++ win (something Algol had, IIRC, lost in the C lineage early on).
/be
Comment 16•14 years ago
|
||
Workaround:
String.prototype._replace_ = String.prototype.replace;
String.prototype.replace = function replace() {
var args = Array.slice(arguments);
if (typeof args[0] == "string") {
args[0] = RegExp(args[0]._replace_(/[{[(\\^.$|?*+/)\]}]/g, "\\$&"));
}
return this._replace_.apply(this, args);
};
Comment 17•14 years ago
|
||
Comment on attachment 466881 [details] [diff] [review]
Part 2: Fix non-regexp replace.
r+ with some nits:
>+ JSCharBuffer newStringChars(cx); /* newstring in the spec is the replaceValue with subs. */
Perhaps:
JSCharBuffer newReplaceChars(cx); /* = dollarSub(replaceValue) */
?
>+ /*
>+ * Move the rest of the replacement string char-by-char, interpreting dollars as we encounter
>+ * them.
>+ */
4 lines of comment are so much more than 1; perhaps s/of the replacement string// so that it fits on one line?
>+ /*
>+ * Note: we know it's a dollar in the first iteration, but this isn't
>+ * that hot of a path to care to optimize it at this point.
>+ */
True, but probably doesn't need a comment. We won't judge you for not caring ;-)
>+ JSString *newString = js_NewStringFromCharBuffer(cx, newStringChars);
>+ if (!newString)
>+ return false;
Perhaps newReplace?
>+ newStringChars.append(*it);
It looks like you need to OOM-check this and all the other append() calls in this function.
>+ FlatMatch fm = rdata.g.tryFlatMatch(rdata.str, 2, argc, false);
>+ if (!fm)
>+ return str_replace_regexp(cx, argc, vp, rdata);
In the spirit of the nice comment before this, could you JS_ASSERT(argc > 2) in the then-path? (This assert depends on there not being an error/OOM path out of tryFlatMatch, but I didn't see one...)
>+ default: /* The dollar we saw was not special (no matter what its mother told it). */
:)
>+ * Note: we could optimize the text.length == pattern.length case if we wanted,
>+ * even in the presence of dolla' bills (y'all).
:(
Attachment #466881 -
Flags: review?(lw) → review+
Comment 18•14 years ago
|
||
Comment on attachment 466967 [details] [diff] [review]
Part 3: Fix non-regexp searchValue with lambda replaceValue.
Thanks for cutting this up into 3 readable patches!
>+static inline bool
>+str_replace_flat_lambda(JSContext *cx, uintN argc, Value *vp, ReplaceData &rdata,
>+ const FlatMatch &fm)
>+{
>+ JS_ASSERT(fm.match() >= 0);
>+ /* lambda(matchStr, matchStart, textstr) */
>+ LeaveTrace(cx);
Could you push this comment down to right before the definition of lambdaArgc?
>+ Value *sp = args.argv();
>+ sp++->setString(matchStr);
>+ sp++->setInt32(fm.match());
>+ sp++->setString(rdata.str);
With base+offset addressing, I think it would be faster to write:
Value *argv = args.argv();
argv[0].setString(matchStr);
argv[1].setInt32(fm.match());
argv[2].setString(rdata.str);
Not that it matters thaaaat much, as we are about to call Invoke (OTOH, Invoke is getting faster by the day!).
>diff --git a/js/src/tests/ecma_5/String/15.5.4.11-01.js b/js/src/tests/ecma_5/String/15.5.4.11-01.js
>new file mode 100644
>--- /dev/null
>+++ b/js/src/tests/ecma_5/String/15.5.4.11-01.js
Great!
Attachment #466967 -
Flags: review?(lw) → review+
Assignee | ||
Comment 19•14 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 20•14 years ago
|
||
Still has a problem when third argument flag is specified like a following expression:
")".replace(")","*$&*","g");
Workaround:
Make a first argument into regexp with flag, e.g. Bug 587608.
Comment 21•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 22•14 years ago
|
||
(In reply to comment #21)
> http://hg.mozilla.org/mozilla-central/rev/77f1425d2731
So what?
The problem in comment #20 is not yet fixed.
Should I file a new bug?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23•14 years ago
|
||
cdleary will file a followup to deal with the "g" case.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 24•14 years ago
|
||
Filed as bug 592027 (YARR: regexp syntax error on unbalanced parens).
Comment 25•14 years ago
|
||
Chris, bug 592027 is not about the optional third flags ("g") parameter that our String.prototype.replace implementation used to take. But that seems to be the bone of contention in comment 20 and comment 22, and sayrer names it explicitly in comment 23.
IMHO we should remove that old extension. It already burned us on SunSpider.
/be
Assignee | ||
Comment 26•14 years ago
|
||
(In reply to comment #25)
> IMHO we should remove that old extension. It already burned us on SunSpider.
Oh cool! I've thought about this a little bit before but got stumped on how to make it a smooth transition -- is there a way to provide user feedback that they provided us one-too-many arguments?
Removing the functionality of the third (flags) argument could create very subtle bugs in existing code -- i.e. regexps silently becoming case-sensitive or non-multiline.
Comment 27•14 years ago
|
||
Grep for warnedAboutTwoArgumentEval.
/be
Assignee | ||
Comment 28•14 years ago
|
||
Filed bug 592143 (retire three-argument RegExp extensions).
Comment 29•14 years ago
|
||
Workaround:
String.prototype.replace = (function() {
var func = String.prototype.replace;
return function replace() {
if (typeof arguments[0] == "string") {
arguments[0] = RegExp(arguments[0].replace(/[{[(\\^|$.?*+/)\]}]/g, "\\$&"), arguments[2]);
}
return func.apply(this, arguments);
};
})();
You need to log in
before you can comment on or make changes to this bug.
Description
•