Closed Bug 1634135 Opened 5 years ago Closed 5 years ago

Enable new regexp engine

Categories

(Core :: JavaScript Engine, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
relnote-firefox --- 78+
firefox78 --- fixed

People

(Reporter: iain, Assigned: iain)

References

(Regressed 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(12 files, 1 obsolete file)

(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details

It's a little late in the 77 cycle to want to flip the switch, but we want to turn this on and get some testing in Nightly as soon as 78 opens.

My try builds are all green, and the performance numbers look good. There is still a ~5% regression on the backbonejs speedometer tests that I'm looking into, but it's balanced out by some nice wins on jetstream2.

With these patches, we support lookbehind assertions, unicode escape sequences, and the dotAll property. Named captures and capture indices will be implemented separately.

This was broken by changes to token streams in bug 1592105.

If a look-behind back-reference succeeds, we have to subtract the length of the capture from the current position (so that the current position points to the beginning of the capture). We don't have the length in a register, so we have to read it from the capture registers, which are stored on the stack. However, we pushed the initial value of the current position, so the stack pointer is offset by one word from where we expect.

The fix is to pop the saved value before subtracting the length.

With this fix, we pass all the test262 tests for look-behind assertions, dotAll, and unicode property escapes. (I will turn them on in a separate bug.)

Depends on D73112

The engine supports parsing named captures, but we don't have the code in place to expose the captured groups. Until we do, this code will make sure that we get a syntax error when parsing them.

Depends on D73113

Interpreted bytecode is ~3-5 times slower than compiled code, but ~5-10 times smaller. In general it seems like this is a good trade-off for the first few iterations, but for particularly long input strings this can cause a significant slowdown before we tier up. V8 eagerly tiers up to compiled code when the input string is 1000+ characters long. Following their lead on this fixes a significant regression in sunspider-regexp-dna.

Depends on D73115

This is as good a time as any to pull in some recent upstream changes (many of which I wrote). This patch was auto-generated using import-irregexp.py. The necessary changes to the shim code are in the next patch.

Depends on D73116

There are a few changes here:

  1. The code that used to be in wrapBody was moved to RegExpCompiler::PreprocessRegExp.

  2. The code field in RegExpCompileData used to be a bare Object, and is now a Handle<Object>. (This makes named captures way easier to implement, because it lets us to allocate GC things while parsing a regexp without angering the hazard analysis.) I also took this opportunity to remove some dead code in the shim implementation of Handle.

  3. Upstream V8 made a change to simplify the interface with the interpreter. Previously, the caller of IrregexpInterpreter::MatchForCallFromRuntime was responsible for allocating enough memory to hold both the capture results and the scratch registers. Now, the interpreter has the same interface as the compiler: the caller passes in a buffer large enough to hold the capture results, and the memory for the scratch registers is allocated internally. This requires a few small additions to the shim (IsInRange plus new methods on JSRegExp and SmallVector).

Depends on D73117

These two tests need to be updated to be aware of the new dotAll flag.

If we ever have to turn off the new engine, this patch should also be temporarily reverted.

Depends on D73118

Pull the lever!

(After responsibly waiting for 78 to open.)

Depends on D73119

Once the patch for this landed, it should probably be documented at https://wiki.developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features.

Though it's not clear to me yet whether enabling the new engine actually also automatically enables all the features blocked by bug 1367105. Iain, can you please clarify that?

Sebastian

Flags: needinfo?(iireland)

With these patches, we support lookbehind assertions, unicode escape sequences, and the dotAll property. Named captures and capture indices will be implemented separately.

Thanks for the pointer to the wiki.

Flags: needinfo?(iireland)
Blocks: 1634810

If a regular expression is too big, the assembler may fail with RegExpError::kTooLarge. When it does so, we want to throw an error: "regexp too big".

Until the most recent reimport of irregexp, we were actually reporting an OOM in these cases, because CompilationResult::code was default-constructed as an UndefinedValue and we took the "OOM in GetCode" path. Now CompilationResult::code is a Handle, so we crash if we try to access the value.

Making the situation slightly more complicated is the fact that we still have a macroassembler live, which means that we can't GC, which means that we can't report an error. The old code used an AutoSuppressGC for this (https://searchfox.org/mozilla-central/source/js/src/irregexp/RegExpEngine.cpp#1703), but that seems like an extremely blunt instrument.

Instead, I've refactored CompilePattern to call a separate Assemble function. This means that we clean up the macroassembler before we call JS_ReportErrorASCII. The new function is a straight copy-paste of the old code, except for error handling and . to -> conversions for the values being passed by reference. Note that the order of checks has changed after calling compiler->Assemble(...): now we check result.Succeeded() before examining result.code.

We also change the shared labels in SMRegExpMacroAssembler to be NonAssertingLabels. This suppresses assertions in the Label destructor that they are not used without being bound. The assertion is already suppressed for OOM (https://searchfox.org/mozilla-central/source/js/src/jit/Label.h#82-86), which is why we did not trigger it previously.

Blocks: 1635275
Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/35f6669e31a9 Fix dummy TokenStream r=djvj https://hg.mozilla.org/integration/autoland/rev/6aaf6e87a5d1 Fix look-behind back-references r=jandem https://hg.mozilla.org/integration/autoland/rev/18b93628199b Disable named capture parsing until fully supported r=mgaudet https://hg.mozilla.org/integration/autoland/rev/c2183a88c085 Eagerly tier up for long regexp inputs r=mgaudet https://hg.mozilla.org/integration/autoland/rev/4560a77cbe7b Fresh import of irregexp r=mgaudet https://hg.mozilla.org/integration/autoland/rev/f849ce4fe7de Update shim code r=mgaudet https://hg.mozilla.org/integration/autoland/rev/080d06573454 Throw 'regexp too big' errors properly r=mgaudet https://hg.mozilla.org/integration/autoland/rev/88cf3b3be886 Update tests to expect RegExp.prototype.dotAll r=mgaudet https://hg.mozilla.org/integration/autoland/rev/392feb3c6f73 Turn new regexp engine on by default in Nightly r=mgaudet

We are adding support for the dotAll (/s) RegExp flag, so the list of expected properties on the RegExp prototype has to be updated.

This assertion triggered in dom/base/crashtests/1505811.html, which exhausts the stack before calling IsPatternMatching with an empty string pattern. The first attempt at parsing doesn't involve any stack checks, but the second regexp is more complicated and calls CheckRecursionLimit, which fails.

The fix is to handle the failure instead of asserting.

(Note: I think returning Nothing() here is correct, for the same reasons that we return Nothing() if the subsequent call to ExecuteRegExpNoStatics fails. I'm not sure why the first attempt at compilation returns Some, though.)

Depends on D74149

ASAN found a leak.

We destroy the isolate with a level of indirection through RegexpAPI so that JSContext doesn't have to be able to see the full definition of Isolate.

Severity: -- → N/A

The regexp engine is adding support for the dotAll flag (/s). As a result, /abc/gimsuy in structured-clone.any.js (https://searchfox.org/mozilla-central/source/testing/web-platform/tests/IndexedDB/structured-clone.any.js#159) is no longer a syntax error, which means that the structured cloning tests don't all immediately fail.

Some of them still fail, presumably for different reasons. This patch is the result of running the wpt tests and updating the expected results according to the instructions here: https://searchfox.org/mozilla-central/source/testing/web-platform/README.md#158-173

Try build: https://searchfox.org/mozilla-central/source/testing/web-platform/README.md#158-173

Attachment #9146340 - Attachment is obsolete: true

Thanks for all your hard work - good to see the new regexes are finally here!

Named captures and capture indices will be implemented separately.

Do you know when that will also be in? Will it be come before the lookbehind support moves from nightly into the beta release?

If a regular expression is too big, the assembler may fail with RegExpError::kTooLarge.

Are there any built-in limits, or is "too big" just based on available machine memory? And is the limit any different from the current Firefox release?

(Our largest, working on Chrome/Opera, is made from a string with over 1M unicode characters.)

Do you know when that will also be in? Will it be come before the lookbehind support moves from nightly into the beta release?

Named captures are mostly done, and should be in Firefox 78 alongside lookbehind and friends. I just have to finish up the integration with Regexp.prototype[@@replace].

I haven't started capture indices yet. I'm a little concerned that adding them might slow down the common case. I have some ideas for how to work around that, but it would be a larger project, and I'm not sure it will fit into 78.

Are there any built-in limits, or is "too big" just based on available machine memory? And is the limit any different from the current Firefox release?
Under the covers, we are using the same regexp engine as Chrome, so if it's not too big for Chrome, it shouldn't be too big for Firefox.

Flags: needinfo?(iireland)
Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a73e5ff68480 Fix dummy TokenStream r=djvj https://hg.mozilla.org/integration/autoland/rev/49b9d6a8a3a5 Fix look-behind back-references r=jandem https://hg.mozilla.org/integration/autoland/rev/f53110858cb9 Disable named capture parsing until fully supported r=mgaudet https://hg.mozilla.org/integration/autoland/rev/ba01ceb63cf3 Eagerly tier up for long regexp inputs r=mgaudet https://hg.mozilla.org/integration/autoland/rev/9dddfd577a3d Fresh import of irregexp r=mgaudet https://hg.mozilla.org/integration/autoland/rev/b3b82e1cda7f Update shim code r=mgaudet https://hg.mozilla.org/integration/autoland/rev/f3b9c956fa85 Throw 'regexp too big' errors properly r=mgaudet https://hg.mozilla.org/integration/autoland/rev/3f680457842f Update tests to expect RegExp.prototype.dotAll r=mgaudet https://hg.mozilla.org/integration/autoland/rev/d8f770d123f2 Don't leak Isolate r=mgaudet https://hg.mozilla.org/integration/autoland/rev/4eda5acc8e1f Update test_xrayToJS to handle Regexp.prototype.dotAll r=bholley https://hg.mozilla.org/integration/autoland/rev/0081b4c73633 Update expectations for wpt IndexedDB structured clone tests r=dom-workers-and-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/ee1018a8611a Turn new regexp engine on by default in Nightly r=mgaudet
Depends on: 1636495

Iain, I suppose this should be documented in the developer release notes for 78? https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/78 (dev-doc-needed keyword). Is that big enough of a change to be also in the developer section of our general release notes?

Flags: needinfo?(iireland)
Backout by archaeopteryx@coole-files.de: https://hg.mozilla.org/mozilla-central/rev/2ad15139df29 Backed out 12 changesets for causing crashes e.g. when urls get pasted in Slack (bug 1637243). a=backout

Backed out together with bug 1636495 for causing crashes e.g. when urls get pasted in Slack (Bug 1637243) - had to be done to create new Nightlies.

https://hg.mozilla.org/mozilla-central/rev/2ad15139df29d03fbf6157bb37bf3592bd3348aa

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Regressions: 1637243
Regressions: 1637523
Attachment #9146339 - Attachment description: Bug 1634135: Update test_xrayToJS to handle Regexp.prototype.dotAll r=mccr8 → Bug 1634135: Update test_xrayToJS to handle Regexp.prototype.dotAll r=bholley
Attachment #9146637 - Attachment description: Bug 1634135: Update expectations for wpt IndexedDB structured clone tests → Bug 1634135: Update expectations for wpt IndexedDB structured clone tests r=asuth
Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/80d21da86415 Fix dummy TokenStream r=djvj https://hg.mozilla.org/integration/autoland/rev/e0f929077cce Fix look-behind back-references r=jandem https://hg.mozilla.org/integration/autoland/rev/c474eb63f626 Disable named capture parsing until fully supported r=mgaudet https://hg.mozilla.org/integration/autoland/rev/596dca6e3ed1 Eagerly tier up for long regexp inputs r=mgaudet https://hg.mozilla.org/integration/autoland/rev/09b8bab47f6b Fresh import of irregexp r=mgaudet https://hg.mozilla.org/integration/autoland/rev/534658e99952 Update shim code r=mgaudet https://hg.mozilla.org/integration/autoland/rev/98d7859686e6 Throw 'regexp too big' errors properly r=mgaudet https://hg.mozilla.org/integration/autoland/rev/16a1d1bbdd3e Update tests to expect RegExp.prototype.dotAll r=mgaudet https://hg.mozilla.org/integration/autoland/rev/5c0565636d06 Don't leak Isolate r=mgaudet https://hg.mozilla.org/integration/autoland/rev/5871f4ecce87 Update test_xrayToJS to handle Regexp.prototype.dotAll r=bholley https://hg.mozilla.org/integration/autoland/rev/3d92a3a541ff Update expectations for wpt IndexedDB structured clone tests r=dom-workers-and-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/e365cbeabc4f Turn new regexp engine on by default in Nightly r=mgaudet

Pascal: Yes, this needs a dev doc update. I think it is probably big enough to be in the general release notes, but I don't know what the criteria are for that.

Flags: needinfo?(iireland)
Keywords: dev-doc-needed

Please note that only shipping features should be mentioned in the release notes. As far as I can see, this bug only enables it for Nightly, though, which rather qualifies it to be mentioned at https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features, as I mentioned earlier.

Iain, I didn't see a bug yet to let the changes ride the trains to release. Is there already one? If not, could you please create one? Once that one is fixed, it should be mentioned in the release notes. For MDN setting the dev-doc-needed keyword is enough, for the general Firefox release notes you should set the relnote-firefox then.

And by the way, thank you very much for implementing this! I've been waiting for the lookbehinds and the other features for years!

Sebastian

relnote-firefox: --- → ?
Flags: needinfo?(iireland)

Sorry, the name of this bug is misleading. To be clear: these changes are intended to ride the trains to release in 78.

(The first version of the final patch in this stack only turned it on for Nightly, but since we're targeting 78 anyway, we turned it on everywhere to avoid merge-to-beta issues on the new tests. I forgot to update the patch title.)

Flags: needinfo?(iireland)
Summary: Enable new regexp engine in Nightly → Enable new regexp engine

Ah, I see! So if the changes aren't backed out again, it should be fine to set relnote-firefox.

Sebastian

(In reply to Sebastian Zartner [:sebo] from comment #28)

Please note that only shipping features should be mentioned in the release notes. As far as I can see, this bug only enables it for Nightly, though, which rather qualifies it to be mentioned at https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features, as I mentioned earlier.

This is not correct. We have release notes for nightly and beta maintained by the release management team so any new feature in nightly and beta should be noted there following this process https://wiki.mozilla.org/Release_Management/Release_Notes#How_to_nominate_a_bug_for_release_notes_addition.3F

Release notes in nightly do not get into beta release notes automatically and beta release notes do not get into the final release notes automatically either.

So yes please, request an addition to the general release notes :)

Regards

Regressions: 1637977
Regressions: 1637913
Regressions: 1638154
Regressions: 1637174

Hi Iain, can you suggest some text for this release note?

Flags: needinfo?(iireland)

Release Note Request (optional, but appreciated)
[Why is this notable]: This adds support for several new RegExp features, closing a significant gap in web-compat.
[Affects Firefox for Android]: Yes.
[Suggested wording]: New RegExp engine in SpiderMonkey, adding support for the dotAll flag, Unicode escape sequences, lookbehind references, and named captures.
[Links (documentation, blog post, etc)]: I am in the process of writing a Hacks blog post. I will add a comment linking to that post once it's up.

Flags: needinfo?(iireland)

Thanks! Added to the draft beta notes.

Regressions: 1644513
Regressions: 1685798
Regressions: 1691184
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: