Closed
Bug 1478045
Opened 6 years ago
Closed 6 years ago
Replace ungetCodePointIgnoreEOL/ungetNonAsciiNormalizedCodePoint with SourceUnits::{peek,consumeKnown}CodePoint
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(6 files, 2 obsolete files)
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
I added these interfaces earlier, but upon further review they're not really the right thing. Most cases where they're used, it's super-obnoxious to have to unget-and-undo-EOL-updating. It's also messy to have to unget a variable number of code units -- a variable count that will be a bit more complicated in a UTF-8 world (and will involve us computing length-counts two separate places). Much better to peek at the next code point -- the actual operation we usually want -- and act based on that, sometimes never getting the code point, sometimes consuming it based on that already-computed info. It's maybe a bit subtle, but it's clearer than the alternative.
Assignee | ||
Comment 1•6 years ago
|
||
The first patch adds these functions and introduces one use of them -- incidentally, the most complicated use, because it must manually address updates for EOL (no other users will). Then I have patches replacing the other users of the existing functions, then a final patch to remove the old functions.
Attachment #8994549 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 2•6 years ago
|
||
Oh -- in the previous patch, note that PeekedCodePoint being CharT-parametrized is only to be able to assert proper length-in-units. I assume compilers can boil away the multiple class instantiations in practice.
Attachment #8994550 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 3•6 years ago
|
||
Attachment #8994551 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 4•6 years ago
|
||
Attachment #8994552 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 5•6 years ago
|
||
Attachment #8994553 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 6•6 years ago
|
||
I suppose I should note that the existing users are all known-non-ASCII, but {peek,consumeKnown}CodePoint handle ASCII as well as non-ASCII. In principle I could exclude the ASCII case, maybe, but I'm not sure it buys anything, and I kinda feel like handling everything may come in handy if we use this peek-code-point functionality in any other places in the future.
Attachment #8994556 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 7•6 years ago
|
||
Turns out this algorithm comes in handy for SourceUnits::findWindowEnd, so it makes sense to define it such that SourceUnits::peekCodePoint *and* that function can use it.
Attachment #8994687 -
Flags: review?(arai.unmht)
Assignee | ||
Updated•6 years ago
|
Attachment #8994549 -
Attachment is obsolete: true
Attachment #8994549 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 8•6 years ago
|
||
Attachment #8994704 -
Flags: review?(arai.unmht)
Assignee | ||
Updated•6 years ago
|
Attachment #8994687 -
Attachment is obsolete: true
Attachment #8994687 -
Flags: review?(arai.unmht)
Comment 9•6 years ago
|
||
Comment on attachment 8994704 [details] [diff] [review]
One last bit of tweaking to this, for realz
Review of attachment 8994704 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/frontend/TokenStream.h
@@ +986,5 @@
> +template<>
> +struct SourceUnitTraits<mozilla::Utf8Unit>
> +{
> + public:
> + static constexpr uint8_t maxUnitsLength = 2;
4
@@ +993,5 @@
> + return codePoint < 0x80
> + ? 1
> + : codePoint < 0x800
> + ? 2
> + : codePoint < 0x10000
might be nice to add constants for those boundaries in Unicode.h or somewhere, maybe in followup bug/patch.
@@ +1016,5 @@
> + MOZ_ASSERT(lengthInUnits != 0, "bad code point length");
> + MOZ_ASSERT(lengthInUnits == SourceUnitTraits::lengthInUnits(codePoint));
> + }
> +
> + static PeekedCodePoint none() {
can you add some comment at the top of this class that clarifies this class represents either a single code point or "none", and what the "none" actually means?
(basically similar thing as the comment for `SourceUnits::peekCodePoint()`)
without that it's surprising to see this method here.
Attachment #8994704 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Attachment #8994550 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Attachment #8994551 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Attachment #8994552 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Attachment #8994553 -
Flags: review?(arai.unmht) → review+
Updated•6 years ago
|
Attachment #8994556 -
Flags: review?(arai.unmht) → review+
Comment 10•6 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4a63f805c03
Implement SourceUnits::{peek,consumeKnown}CodePoint for the uncommon cases where a code point must be gotten, tested against one or more predicates, then sometimes ungotten based on those predicates. (This is unfortunately a bit subtle, but getting and ungetting is arguably worse, because ungetting has to unget a variable number of code units -- whereas peeking can compute that number of code units and then use it directly when the peeked code point is consumed, avoiding double-computation and increased potential for error.) r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/c79e79e350bf
Use SourceUnits::peekCodePoint to look for an IdentifierStart after most decimal numeric literals, rather than getting and ungetting. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/51e2ba3c67d6
Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate a directive in a comment. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/466d4573b39a
Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate an IdentifierName. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1d2c7b998ef
Use SourceUnits::peekCodePoint when examining a non-ASCII code point after non-decimal numeric literals and a subset of decimal numeric literals. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d93c6f631ca7
Remove TokenStreamChars::{ungetCodePointIgnoreEOL,ungetNonAsciiNormalizedCodePoint} as unused now that SourceUnits::{peek,consumeKnown}CodePoint have replaced them. r=arai
Comment 11•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b4a63f805c03
https://hg.mozilla.org/mozilla-central/rev/c79e79e350bf
https://hg.mozilla.org/mozilla-central/rev/51e2ba3c67d6
https://hg.mozilla.org/mozilla-central/rev/466d4573b39a
https://hg.mozilla.org/mozilla-central/rev/d1d2c7b998ef
https://hg.mozilla.org/mozilla-central/rev/d93c6f631ca7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in
before you can comment on or make changes to this bug.
Description
•