Closed Bug 85721 (RegExpPerf) Opened 24 years ago Closed 21 years ago

Regexp performance degraded from 4.7

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.6alpha

People

(Reporter: rogerl, Assigned: rogerl)

References

Details

(Keywords: perf, Whiteboard: [Have filed bug 125562 against Rhino for the same issue])

Attachments

(9 files, 30 obsolete files)

(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), text/plain
Details
Here's the email reporting the issue from zalman@macromedia.com -------------------------------------------------------------------------------- I am working on Dreamweaver at Macromedia. Narciso Jaramillo gave me your email address as a good contact for issues pertaining to the regular expressions component of the Spider Monkey JavaScript interpreter. Dreamweaver and UltraDev use this interpreter to provide an extension language within the applications. Fairly sophisticated extensions are written in JavaScript and one aspect that gets used a lot is regular expressions. We recently updated to the JavaScript 1.5 interpreter and are seeing a huge slowdown in regular expression performance. Admittedly the regexps in question are quite complicated, but the performance is off by a factor of 25 or so compared to the 1.2 version of the Netscape interpreter we were using before. I've enclosed an example HTML file which takes approximately 40 seconds to load in Mozilla 0.9 on a 450 MHz Celeron running Win2k. (I expect results are similar for the Mac given that's what I've been testing Dreamweaver regexp performance on.) I profiled the code in jsregexp.c and took a look at what's going on inside. Note that there are a hair over 90 million calls to matchChar in evaluating this regexp. I understand that getting correct results requires a certain amount of backtracking, but this seems like something is wrong. (E.g. the pcre package evaluates this regexp on a 180 MHz PPro in 80 milliseconds. Navigator 4.7 on the Mac loads the file almost instantly, as does IE 6 preview on Windows. IE 5.5 on the Mac fails to compile the regexp, so I'll freely admit it could be worse :-)) Any ideas on what could be done to make this faster? Can you comment on the changes to the regular expression algorithm between 1.2 and 1.5? It used to "compile" the parse tree into a byte coded state machine, with some optimization. Now it seems to execute directly off the parse tree. I assume this was done to address correctness bugs in the matching. Is this true?
Attached file regexp timing test (deleted) —
I'm attaching a proposed patch - it passses the current test suite but I'm nervous enough about it that I'm waiting for the new regexp test cases that Phil and I are working on before merging into the tree.
Attached patch Early-out optimization (obsolete) (deleted) — Splinter Review
Keywords: perf
Status: NEW → ASSIGNED
rogerl, how much does that patch help? /be
I think byte-coding was dropped to simplify the task of implementing ECMA-262 edition 3 extensions -- rogerl can say for sure. I feel it was dropped without enough consideration of the performance tradeoffs (or any measurement, AFAICT). I'll see if I can dust off the old code and migrate it forward, and measure A vs. B. /be
Actually Norris & I did get measurements to support dropping the bytecode - it was mostly a space saving, but there was no performance degradation - in fact I recall that small simple cases ran faster. Having said that however, I don't know how well the bytecode engine ran on the nested quantifier (e.g. /(a+)+a/) cases that have been the issue here, it would be interesting to see. The patch for this bug changes the Macromedia test case run from 26000 to 1 (as timed by Date.getTime()). I still remain insecure about the fix, though it passes the test suite fine. Personally I'd rather get the other more obvious fixes into the tarball before worrying about this one. I'm hoping to clean up the regexp engine over the christmas break by incorporating some ideas from Waldemar's algorithm, and that would obviate the need to make this patch at all.
Ok, sounds like we can live without the fix for RC4 -- I foresee an RC4a or even RC5 before mozilla1.0/js1.5. Given the separately malloc'd RENodes vs. the old code's temporary (during compile only) arena-allocated nodes, which translated to single-allocation bytecode, I wonder why the memory use of the bytecoded implementation was worse, after compilation. Maybe there was a leak, or the worst-case memory use was counted? /be
How much does the patch help? Should it go into 0.9.8 (tonight) or 0.9.9? /be
I'm not happy enough with the results of the PERL test cases to put this is now, so I'm moving to 0.9.9. I have a different version that feels like a better fix in any case.
Target Milestone: --- → mozilla0.9.9
Testcase added to JS testsuite: js/tests/ecma_3/RegExp/regress-85721.js N.B. - this test takes 1 1/2 minutes to run in the current JS Engine. If you don't want to wait, you can run the test driver with the skip option (the -L option to jsDriver.pl), or you can rename the test with a different extension. The test driver only runs tests that have a '.js' extension. When the fix for this bug goes in, the test will run in a few milliseconds.
Whiteboard: [Have filed bug 125562 against Rhino for the same issue]
Not going in for 0.9.9.
Target Milestone: mozilla0.9.9 → mozilla1.2
roger, can you put your new bytecoded-NFA jsregexp.[ch] in an attachment here? I agree this should cook more, but how about 1.1? /be
Attached file New implementation of fix. (obsolete) (deleted) —
Proposed fix - it's a big change so I'm attaching the pair of files jsregexp.[ch] instead of as a diff. This is a return to the bytecode scheme with the addition of moving recursion/backtracking onto a heap stack instead. Passes the test suite (except maybe the one about returning undefined instead of null strings) including performance bugs. Needs 'baking' before acceptance.
Attachment #38268 - Attachment is obsolete: true
Attached file Header file for above (obsolete) (deleted) —
Attached file Tweaks (obsolete) (deleted) —
Removed parenIndex & parenCount from state - can reload them from bytecode to save space on backtrack stack. Some other clean-ups also. Now returns undefined instead of empty strings for empty paren contents - (this causes three more test case failures, but they had incorrectly required the old "" result instead).
Attachment #70537 - Attachment is obsolete: true
Roger: do you have those test names? I'll go fix them; thanks -
Sure it's: ecma_3/RegExp/regress-31316.js, section 1 ecma_3/RegExp/regress-57572.js, section 3 ecma_3/RegExp/regress-87231.js, section 3 & section 6
Note: this bug has a patch needing r= and sr=
Attached file Missing re-initialization of empty paren contents (obsolete) (deleted) —
Attachment #72515 - Attachment is obsolete: true
Attached file Changed parenCount to uint16 (obsolete) (deleted) —
Fixed compiler warning from assigning JSRegExp->parenCount (which is a 24 bit value) to the internal parenCount (which is now uint16 throughout). I'm prepared to argue that uint16 is way plenty of parentheses. [The bytecode only allows for uint16 encoding of parenthesis indices in any case]
Attachment #85675 - Attachment is obsolete: true
cc'ing Waldemar as a super-reviewer -
waldemar is not a super-reviewer, though I'm happy to propose him if he wants to become one. I'll take a look at the latest patch soon. It would be good if khanson would r= it first. /be
I will review.
Blocks: 149801
Attached file Fixed beyond end of string reference from \w & \W (obsolete) (deleted) —
Java port discovered that the code for \w & \W was reading beyond the end of the string. Plus some preemptive nit-picking.
Attachment #86255 - Attachment is obsolete: true
Attached file One more fix from Java port (obsolete) (deleted) —
The Java port also discovered the off-the-end-of-the-array bug when resetting the capture contents for a failed quantifier child.
Attachment #88734 - Attachment is obsolete: true
Attached file Fixed match on empty paren (obsolete) (deleted) —
Expressions like /(a)|\1/("x") were matching because an empty (i.e. un-matched) capture content was matching against any input instead of none.
Attachment #89002 - Attachment is obsolete: true
Attached file Re-fixed empty paren match (obsolete) (deleted) —
Backing out previous change to empty capture match. An undefined value in the captures array should trivially match, not fail. Also fixed enumerability (or required lack thereof) of RegExp properties.
Attachment #89618 - Attachment is obsolete: true
Attached file DontEnum regexp props (obsolete) (deleted) —
Added fix from Bug155291 to make RegExp properties not enumerable.
Attachment #90652 - Attachment is obsolete: true
Oops, running out of time for 1.2alpha, but I'll plead the case with drivers. Newly updated patch needed? /be
Attached file Updated jsregexp.c (obsolete) (deleted) —
Updated for octal handling & get/setLastIndex changes.
Attachment #95665 - Attachment is obsolete: true
Attached file Updated jsregexp.h (obsolete) (deleted) —
Ditto. I've run these set of changes against the test suite - all passed (well, all except for 8 that have nothing to do with regexp).
Attachment #70538 - Attachment is obsolete: true
*** Bug 169497 has been marked as a duplicate of this bug. ***
Ok, I'm going to review this patch this week. Rogerl, do you have that testcase from the back of Friedl's regex book from O'Reilly, the huge RFC-822 address-matching one? We should benchmark that and get it into the testsuite. /be
It's in the test suite already - it's parts 3 & 4 of ecma_3/RegExp/regress-85721.js, the original DreamWeaver test case is parts 1 & 2 of that file - Phil has it set to pass if it runs in under 100ms.
How does perl do on the same regexp and input, on the same machine? /be
Best I can tell, the JS engine is about 25 times slower than Perl's, I'm attaching the test files - I ran Perl with 1000000 iterations in 43 seconds (I could only figure out how to get timing to a second). JS ran 10000 iterations in 10 seconds.
Attached file JS version of timing test (deleted) —
Attached file Perl version of timing test (deleted) —
Performance improvement on ecma_3/RegExp/regress-85721.js. In optimized JS shell, WinNT4.0(SP6) 500Mhz 128M Section 1 Section 3 BEFORE THE PATCH 7687 ms 49329 ms AFTER THE PATCH 0 ms 0 ms Yes, 0 ms most of the time after the patch! (Some passes are 15 ms, but most are 0 ms)
Attached file Peek ahead to avoid backtrack traffic (obsolete) (deleted) —
Quantify showed that about 75% of the time is allocating and loading the backtrack stack. This optimization peeks at the next opcode and, if it's 'simple' enough, speculatively executes it. If it fails, no need to push/pop on the stack. Speeds things up by about 3 - so now I have only another 8x to go :-)
Attachment #97781 - Attachment is obsolete: true
Attached file Squeezed out another 15% (obsolete) (deleted) —
Cleaned up previous effort - take advantage of peek match if it was succesful. Reduced size & complexity of backtrack allocation.
Attachment #100122 - Attachment is obsolete: true
This is the current situation reported by Quantify. The only stuff being arena allocated is the copy of the state stack that accompanies each backtrack state structure. The maximum depth of that is only 5, so I think it'd be cheaper to allocate a simple pool and track it myself rather than end up manipulating the free list? Function Calls F. Time F+D. Time F% F+D% ---------------------------------------------------------------------------------- MatchRegExp 10000 730.11 4992238.85 0.01 100.00 executeREBytecode 10000 1119551.86 4394836.28 22.43 88.03 pushBackTrackState 3855000 388977.84 1765455.43 7.79 35.36 memcpy 10955000 1745378.14 1745378.14 34.96 34.96 JS_ArenaRelease 10000 433.03 595804.77 0.01 11.93 FreeArenaList 10000 6329.31 595371.74 0.13 11.93 simpleMatch 9530000 521571.15 521664.91 10.45 10.45 JS_ArenaAllocate 135000 12643.50 519454.43 0.25 10.41 malloc 135026 351878.89 506883.45 7.05 10.15 free 135000 285937.59 452865.68 5.73 9.07 memset 270026 136180.92 136180.92 2.73 2.73 JS_realloc 15000 287.01 100676.78 0.01 2.02 realloc 15000 49925.71 100389.77 1.00 2.01 processCharSet 26 8.21 93.76 0.00 0.00 JS_malloc 26 0.65 73.18 0.00 0.00 addCharacterToCharSet 198 5.38 5.38 0.00 0.00 addCharacterRangeToCharSet 25 2.82 2.82 0.00 0.00 Maximum depth of backtrack stack is 64, maximum depth of state stack is 5
Attached file More speed-ups (obsolete) (deleted) —
Latest version - uses a single Arena for all regexp execution-time allocations, re-uses parent stack for all quantifier children, some inlining. Now running 100000 iterations of the timing test in 24 seconds, so only off by a factor of 6 instead of 25!
Attachment #100198 - Attachment is obsolete: true
Alias: RegExpPerf
Attached file Latest 'n' greatest (obsolete) (deleted) —
A variety of changes; some more minor performance tweaks, including prerequisites for REOP_ALT when both paths have either FLAT or a FLAT and CLASS. Collapsed the top parent state being held separately into the parent state stack, keeping it separate turned out to be a false economy. A bunch of source cleaning, avoiding some unneccesary trips through the dispatch loop. [Still need to copy these changes back into the Java version].
Attachment #102047 - Attachment is obsolete: true
Blocks: 188206
Roger, it looks like we need a new jsregexp.h, too. (REGlobalData.pool is missing). --scole
Attached file Really latest 'n' greatest (obsolete) (deleted) —
No, I'm just the idiot that uploaded the file I was messing around with. I'd commented out the 'pool' field to check the references to it.
Attachment #111344 - Attachment is obsolete: true
* quantatom: atom An unquantified atom. ... * any of which can be optionally followed by '?' for ungreedy Is this ungreedy option new?
/x*{/ is no longer a regex syntax error, as the ECMA spec says it should be. Is this desired behavior? --scole
- * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or + * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express oqr You might not want to put this misspelling back into the code base. --scole
Nope, the default case on the atom parser is not reporting an error for the meta-characters; fix forthcoming (and for the text).
Ungreedy has been in ECMA3 from the start, it behaves just like the Perl version.
REMatchState state; /* the state of the current match */ /* Note that it's size depends on it's => its the comments in REGlobalData aren't aligned... /* Construct an initialize an RENode, returning NULL for out-of-memory */ an => and else { if (tailTerm == NULL) { headTerm->next = state->result; tailTerm = state->result; while (tailTerm->next) tailTerm = tailTerm->next; } else { tailTerm->next = state->result; tailTerm = tailTerm->next; while (tailTerm->next) tailTerm = tailTerm->next; } } is there a reason not to unify the two while loops? if (JS7_ISDEC(c)) { value = (10 * value) + JS7_UNDEC(c); ++state->cp; } else break; to me this is the equivalent of an else after return, instead you could do: if (!JS7_ISDEC(c)) break; ... the documentation for: parseTerm(CompilerState *state) doesn't mention \u break; } else { <-- useless else /* a trailing '\' is an error */ js_ReportCompileErrorNumber(state->context, state->tokenStream, NULL, JSREPORT_ERROR, JSMSG_TRAILING_SLASH); return JS_FALSE; } your normal style appears to be to report errors early and then return, so if (<cond>) { <code> break; } else { <reportError> return JS_FALSE; } => if (!<cond>) { <reportError> return JS_FALSE; } <code> js_ReportCompileErrorNumber(state->context, state->tokenStream, NULL, JSREPORT_ERROR, JSMSG_MISSING_PAREN, termStart); return JS_FALSE; } else <-- useless else the switch in js_NewRegExpOpt indents |case| statements 2 spaces, whereas parseTerm doesn't.... both indentation styles appear in the attachment...
Attached file Error for bad quantifiers + style fixes (obsolete) (deleted) —
Thanks! I think I addressed everything with this. Note that two of the tests fail with this patch (regress-57572.js & perlstress-002.js) because of improperly escaped '?' characters. Phil; I think the right thing would be to change the tests and add a negative test for invalid syntax. The demi-indent was a left over from way back when the regular indenting had hit the 80 column too often, I think I've removed all such now.
Attachment #111427 - Attachment is obsolete: true
Testcase added to JS testsuite: mozilla/js/tests/ecma_3/RegExp/regress-188206.js As suggested by Roger in the comment above, this test deliberately misuses regexp quantifiers and checks for SyntaxErrors to result. I will now work on correcting regress-57572.js and perlstress-002.js.
Testcases regress-57572.js and perlstress-002.js have been corrected. In the former, one section was removed that tested /(\S+)????(.*)/ The ECMA spec allows no more than two consecutive |?| operators. In the latter, about twenty sections have been commented out. They tested the following operators, which are supported in Perl, but not in ECMAScript: ECMA doesn't support (?< ECMA doesn't support (?(condition) For example, here are two that have been commented out: pattern = /(?<=a)b/; pattern = /((?i)a)b/; etc.
Attached file Update for bug fixes (obsolete) (deleted) —
Attachment #111524 - Attachment is obsolete: true
Blocks: 191479
Blocks: 190685
Blocks: 94570
Blocks: 165353
Blocks: 169534
Depends on: 173067
Blocks: 187133
Attached file Switched to iterative parsing (obsolete) (deleted) —
Changed recursive RegExp parser & tree-walker to be iterative instead.
Attachment #113430 - Attachment is obsolete: true
Attachment #115991 - Attachment is obsolete: true
Blocks: 192626
Attached file Error message addition. (obsolete) (deleted) —
The latest patch passes the JS testsuite on WinNT, in both the debug and optimized JS shell. Additionally, the patch fixes all our current RegExp test failures. In fact, these were the only failures I got (all due to other known bugs): js1_5/Array/regress-101964.js js1_5/Object/regress-90596-001.js js1_5/Object/regress-90596-002.js js1_5/Object/regress-90596-003.js js1_5/Regress/regress-152646.js js1_5/Regress/regress-192414.js js1_5/Regress/regress-96128-n.js
How's performance with the latest patch, vs. Nav4 JS and vs. Perl5? Should I start reviewing? Sorry I haven't really done anything yet. I will help get this reviewed for 1.4a, for sure! /be
Attached file HTML timing test (Don't run under NS7!!) (obsolete) (deleted) —
Brendan; sorry for the delay, I've been 'flu'd. Yes, this is the version to be reviewed, please. I made a version of the original Macromedia testcase (the Friedl timing test won't work under Nav4 because it uses Perl5 features). I pulled JS1.4 release shell to test against: JS1.4 shell - 37594 ms. Patched 1.5 shell - 14981 ms. IE6 - 11076 ms. NS7 - Don't do it! 10 iterations took 51 seconds, 100000 iterations will hurt. I don't have a Perl version just yet. Perhaps when the dwarves in my head take a break from their mining operations.
Attached file Perl timing test (resolution in sec, not ms) (obsolete) (deleted) —
NOTE: the two loop UBounds in the HTML test in Comment #62 are 10, 1, resp. Typo: in the Perl test I just attached, the two UBounds are 100, 1, resp. I've just realized, that first UBound should be 100000 to be consistent with Roger's results. I'll go ahead and re-attach both testcases below. I'm assuming the second UBound is 1 because it's just testing a no-match. NOTE: the default install of Perl 5 can only time in seconds, not milliseconds. For ms, install the HiRes.pm module and use the |gettimeofday| function. I have assumed the default, and so the Perl testcase times in seconds -
Attachment #116327 - Attachment is obsolete: true
Attachment #116358 - Attachment is obsolete: true
Attachment #116362 - Attachment is obsolete: true
Sorry for the spam. Just spoke with Roger, and that second UBound should also be 100000. I will attach the final corrections below -
Attachment #116363 - Attachment is obsolete: true
Attachment #116365 - Attachment is obsolete: true
Attached file Cleaned up, reduced paren state thrashing. (obsolete) (deleted) —
Even cleaner, more reviewable version. [Note that I think there's an issue in the timing tests. In a straightforward backtracking head-to-head with the 1.4 engine, I'd expect the iterative engine to lose unless the stack was getting swapped, which I doubt is happening with these small test cases. I'm looking into what's up.]
Attachment #116044 - Attachment is obsolete: true
Target Milestone: mozilla1.2alpha → mozilla1.4alpha
Blocks: 197451
Comment on attachment 116728 [details] Cleaned up, reduced paren state thrashing. In addition to the usual stuff, I'm looking for comments on the performance. The bottleneck is the memory traffic for saving & restoring the backtrack state. Any insights there would be great! [Note that you need the jsregexp.c from this attachment, the jsregexp.h from attachment #97782 [details] and the js.msg from attachment #116046 [details] ]
Attachment #116728 - Flags: review?(brendan)
Attached patch js.msg change as a patch (obsolete) (deleted) — Splinter Review
Thanks to Phil for pointing out the js.msg attachment had rotted. This is a patch version against the current tip instead. The other attachments are the complete files and should drop straight into a tip build.
Attachment #116046 - Attachment is obsolete: true
Attachment #117909 - Attachment is obsolete: true
Blocks: js-perf
Comment on attachment 97782 [details] Updated jsregexp.h > * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express oqr I notice that this jsregexp.h still has the "oqr" misspelling present... --scole
*** Bug 202564 has been marked as a duplicate of this bug. ***
I believe 1.5alpha is the right time to get this in. As it happens, we are currently in that milestone. Let's try to get this reviewed and checked in as soon as possible so that we will have reasonable amount of time to find and fix any regressions.
Target Milestone: mozilla1.4alpha → mozilla1.5alpha
Attached file Fixed string index overflow (deleted) —
Bug 209067 revealed limits from using int16. Also fixed 'oqr' in comment.
Attachment #97782 - Attachment is obsolete: true
Attachment #116728 - Attachment is obsolete: true
Blocks: 209067
Blocks: 209919
A minor problem with the patch: for ( var txt= "", i= 0; i < 65536; ++i ) txt += "x"; /^x{0,65535}$/.test( txt ); should be false, but now is true, because 65535 == (uint16)(-1), and thus it'll be treated as /^x*$/. The upper limit for min/max needs to be reduced to 65534. There's no explicit limit in the ECMA spec and I don't think it's much a practical backwards compatibility problem. And the parameter for the error messages for "overlarge minimum" and "overlarge maximum" ('{') isn't very meaningful. But it wasn't much better before (single digit instead of the whole number).
Blocks: 123437
moving out...
Target Milestone: mozilla1.5alpha → mozilla1.6alpha
Comment on attachment 125833 [details] Fixed string index overflow. Fixed illegal {} usage detection MSVC Level 4 warnings produce: mozjs\src\jsregexp.c(882) : warning C4189: 'parenBaseCount' : local variable is initialized but not referenced mozjs\src\jsregexp.c(3756) : warning C4505: 'flatNMatcher' : unreferenced local function has been removed
*** Bug 216591 has been marked as a duplicate of this bug. ***
Checked in for 1.6alpha. I'll file a sequel bug tomorrow. /be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Attachment #116728 - Flags: review?(brendan)
Bug 223273 might be caused by this patch...
This have added a number of "might be used uninitialized" warnings on brad TBox: +js/src/jsregexp.c:1836 + `rangeStart' might be used uninitialized in this function + +js/src/jsregexp.c:2270 + `result' might be used uninitialized in this function + +js/src/jsregexp.c:2902 + `parsub' might be used uninitialized in this function + +js/src/jsregexp.c:425 + `errPos' might be used uninitialized in this function + +js/src/jsregexp.c:426 + `parenIndex' might be used uninitialized in this function + +js/src/jsregexp.c:430 + `op' might be used uninitialized in this function + +js/src/jsregexp.c:661 + `rangeStart' might be used uninitialized in this function
No longer blocks: 165349
Attachment #70537 - Attachment is patch: true
Comment on attachment 70537 [details] New implementation of fix. woops
Attachment #70537 - Attachment is patch: false
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: