Closed Bug 1559253 Opened 5 years ago Closed 4 years ago

Invalid regexp literal shows wrong line and column number

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: sites+mozilla, Assigned: evilpie, Mentored)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

Create a JS file with an unsupported regexp group (eg. using lookbehind). For example:

const testRegex = /(?<=[a-z])(?=[A-Z])/g;

Add some blank lines or comments to the top of the JS file so that the regex is not on the first line.
Load the page
Notice that the "invalid regexp group" error always appears on "line 1, column 1", rather than the proper line and column on which the regex is located at.

Actual results:

SyntaxError: invalid regexp group shown with incorrect line number

Expected results:

Correct line number should be shown

Component: Untriaged → JavaScript Engine
Product: Firefox → Core

Jeff, does this having thing to do with UTF-8 column issues? Probably not, but could quickly look at this when you have time.

Flags: needinfo?(jwalden)
Priority: -- → P2

This was a deliberate decision/change made while working through the UTF-8 parsing stuff.

https://searchfox.org/mozilla-central/rev/e5327b05c822cdac24e233afa37d72c0552dbbaf/js/src/irregexp/RegExpParser.cpp#287

The theory is that line/column numbers within the regular expression text are more useful than line/column numbers of the overall script, because regular expressions in general (e.g. created by new RegExp) can contain newlines and other such characters, and even plain old scripts can contain escapes that would be converted to literal U+000A LINE FEED and so on by the time the regular expression engine saw them. Also, in the HTML context, for an inline script you could have &lt; and such within the content of a <script> that would further throw off column numbering, so all in all for regular expressions it makes more sense to just make the coordinates be relative to the regular expression string itself. That's why the line number is one, and that's why the column number is 1 (add some characters after that opening slash and the column number would be correspondingly increased).

However, as it happens regular expression literals' contents are accumulated code point by code point, so the JS engine does see two characters for the regular expression literal /\n/. And <script> element contents escaping doesn't quite work exactly as I was thinking it did, so that logic doesn't quite make sense either.

So this could probably be changed/reverted, and arguably it should be changed/reverted. I'm not 100% certain, but I think this could be done without interfering with any of the UTF-8 work that was done, that ostensibly motivated the change.

This is possibly a good first-ish bug, for someone who's willing to dig in a little for a change that shouldn't be too big, but will be a little gnarly in the probably couple-dozen lines that would need to be looked at/changed.

Mentor: jwalden
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jwalden)
Version: 67 Branch → Trunk

I agree that having the line/column number from the regex can be useful. However, in my case the bad regex was in a large JavaScript bundle (1+ MB) and it was extremely difficult to track down the issue as the error message just said the error was on "line 1" :)

Is it possible to show both the JS line/column AND the line/column within the regex? For example, just include the regex line/column in the error message itself?

The gist of comment 3 was that we probably should just change back to the previous semantics. :-) The rationale for the new behavior isn't terribly great, and the old behavior is basically what we should be doing. So I don't think we need to put effort into showing both forms of information somehow.

I am currently facing this issue as well. As already noted, displaying the line and col of the Regex isn't very helpful when you can't find the actual regex that is offending. (just click the in error in the console and you get no were useful)..

Just as @Daniel15, I am having this in a JS bundle.
My problem is that if I disable bundling and load all the bundled script files separate I am not seeing this problem in FF and our page is loading as expected, so I really need to track down the problem in the bundle script to get a better understanding of what is going wrong.

Has anyone found a way to assist in identifying where the actual issue in such a script may reside?

I assume your are using something like named capture groups, those will be supported by the next stable Firefox 78.

However we should definitely fix the underlying issue here: RegExp errors need to show the line and column number in the source instead of inside the regexp. Who can fix this?

Flags: needinfo?(jwalden)
Flags: needinfo?(iireland)
Blocks: jserror

(In reply to Tom Schuster [:evilpie] from comment #8)

I assume your are using something like named capture groups, those will be supported by the next stable Firefox 78.

However we should definitely fix the underlying issue here: RegExp errors need to show the line and column number in the source instead of inside the regexp. Who can fix this?

It turned out to be a REGEX where one of our interns had introduced Positive and Negative Lookbehinds, which apparently works in Chrome.
But that was besides the point, the Question was more if anyone had a way to somewhat easily identify where the issue came from, sitting with 100k+ lines and trying to identify where the error is can be a daunting task.

I ended up having to do sort of a manual "Binary Search", Delete ½ of the file, see if the error wen't away. If it resided, delete ½ of what was left, check again, and then repeat until I had a small enough segment that I could scan through to find the error. But that was... well... Annoying to say the least. :S...

But that was my way, if anyone has a more optimal one please share >.<

Bug 1635724 is a dupe of this. I made a half-hearted attempt at fixing it, but I lost momentum in a twisty maze of error code. It should mostly involve rewriting this code.

Flags: needinfo?(iireland)
Summary: "SyntaxError: invalid regexp group" shows wrong line and column number → Invalid regexp literarl shows wrong line and column number
Summary: Invalid regexp literarl shows wrong line and column number → Invalid regexp literal shows wrong line and column number

This is somewhat of an ugly approach to implementing this. I couldn't figure out how to implement
this by calling computeErrorMetadata, because that doesn't exist on TokenStreamAnyChars.
This approach is non invasive and only causes changes for calls from the frontend Parser.

Depends on D81567

Assignee: nobody → evilpies
Status: NEW → ASSIGNED
Pushed by evilpies@gmail.com: https://hg.mozilla.org/integration/autoland/rev/8f807266e109 Make RegExp errors caused by parser point to source script line/column. r=iain
Flags: needinfo?(jwalden)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: