Closed Bug 1478045 Opened 6 years ago Closed 6 years ago

Replace ungetCodePointIgnoreEOL/ungetNonAsciiNormalizedCodePoint with SourceUnits::{peek,consumeKnown}CodePoint

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(6 files, 2 obsolete files)

(deleted), patch
arai
: review+
Details | Diff | Splinter Review
(deleted), patch
arai
: review+
Details | Diff | Splinter Review
(deleted), patch
arai
: review+
Details | Diff | Splinter Review
(deleted), patch
arai
: review+
Details | Diff | Splinter Review
(deleted), patch
arai
: review+
Details | Diff | Splinter Review
(deleted), patch
arai
: review+
Details | Diff | Splinter Review
I added these interfaces earlier, but upon further review they're not really the right thing. Most cases where they're used, it's super-obnoxious to have to unget-and-undo-EOL-updating. It's also messy to have to unget a variable number of code units -- a variable count that will be a bit more complicated in a UTF-8 world (and will involve us computing length-counts two separate places). Much better to peek at the next code point -- the actual operation we usually want -- and act based on that, sometimes never getting the code point, sometimes consuming it based on that already-computed info. It's maybe a bit subtle, but it's clearer than the alternative.
The first patch adds these functions and introduces one use of them -- incidentally, the most complicated use, because it must manually address updates for EOL (no other users will). Then I have patches replacing the other users of the existing functions, then a final patch to remove the old functions.
Attachment #8994549 - Flags: review?(arai.unmht)
Oh -- in the previous patch, note that PeekedCodePoint being CharT-parametrized is only to be able to assert proper length-in-units. I assume compilers can boil away the multiple class instantiations in practice.
Attachment #8994550 - Flags: review?(arai.unmht)
I suppose I should note that the existing users are all known-non-ASCII, but {peek,consumeKnown}CodePoint handle ASCII as well as non-ASCII. In principle I could exclude the ASCII case, maybe, but I'm not sure it buys anything, and I kinda feel like handling everything may come in handy if we use this peek-code-point functionality in any other places in the future.
Attachment #8994556 - Flags: review?(arai.unmht)
Turns out this algorithm comes in handy for SourceUnits::findWindowEnd, so it makes sense to define it such that SourceUnits::peekCodePoint *and* that function can use it.
Attachment #8994687 - Flags: review?(arai.unmht)
Attachment #8994549 - Attachment is obsolete: true
Attachment #8994549 - Flags: review?(arai.unmht)
Blocks: 1478170
Attachment #8994704 - Flags: review?(arai.unmht)
Attachment #8994687 - Attachment is obsolete: true
Attachment #8994687 - Flags: review?(arai.unmht)
Comment on attachment 8994704 [details] [diff] [review] One last bit of tweaking to this, for realz Review of attachment 8994704 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +986,5 @@ > +template<> > +struct SourceUnitTraits<mozilla::Utf8Unit> > +{ > + public: > + static constexpr uint8_t maxUnitsLength = 2; 4 @@ +993,5 @@ > + return codePoint < 0x80 > + ? 1 > + : codePoint < 0x800 > + ? 2 > + : codePoint < 0x10000 might be nice to add constants for those boundaries in Unicode.h or somewhere, maybe in followup bug/patch. @@ +1016,5 @@ > + MOZ_ASSERT(lengthInUnits != 0, "bad code point length"); > + MOZ_ASSERT(lengthInUnits == SourceUnitTraits::lengthInUnits(codePoint)); > + } > + > + static PeekedCodePoint none() { can you add some comment at the top of this class that clarifies this class represents either a single code point or "none", and what the "none" actually means? (basically similar thing as the comment for `SourceUnits::peekCodePoint()`) without that it's surprising to see this method here.
Attachment #8994704 - Flags: review?(arai.unmht) → review+
Attachment #8994550 - Flags: review?(arai.unmht) → review+
Attachment #8994551 - Flags: review?(arai.unmht) → review+
Attachment #8994552 - Flags: review?(arai.unmht) → review+
Attachment #8994553 - Flags: review?(arai.unmht) → review+
Attachment #8994556 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/b4a63f805c03 Implement SourceUnits::{peek,consumeKnown}CodePoint for the uncommon cases where a code point must be gotten, tested against one or more predicates, then sometimes ungotten based on those predicates. (This is unfortunately a bit subtle, but getting and ungetting is arguably worse, because ungetting has to unget a variable number of code units -- whereas peeking can compute that number of code units and then use it directly when the peeked code point is consumed, avoiding double-computation and increased potential for error.) r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/c79e79e350bf Use SourceUnits::peekCodePoint to look for an IdentifierStart after most decimal numeric literals, rather than getting and ungetting. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/51e2ba3c67d6 Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate a directive in a comment. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/466d4573b39a Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate an IdentifierName. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/d1d2c7b998ef Use SourceUnits::peekCodePoint when examining a non-ASCII code point after non-decimal numeric literals and a subset of decimal numeric literals. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/d93c6f631ca7 Remove TokenStreamChars::{ungetCodePointIgnoreEOL,ungetNonAsciiNormalizedCodePoint} as unused now that SourceUnits::{peek,consumeKnown}CodePoint have replaced them. r=arai
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: