Closed
Bug 207923
Opened 22 years ago
Closed 21 years ago
[RFE] Uint32toUTF16 (Uint32toString?) to use with fromCharCode()
Categories
(Core :: JavaScript Engine, enhancement)
VERIFIED
INVALID
People
(Reporter: jshin1987, Assigned: waldemar)
Details
(Keywords: intl)
Attachments
(1 file: text/html, deleted)
A spin-off from bug 162431.
The fromCharCode() method of the String class doesn't understand non-BMP
characters. For a non-BMP character, it always returns the character as if the
input char code were bitwise 'AND'ed with 0xffff (i.e. only the lowest 16 bits
are interpreted). JavaScript is supposed to use UTF-16 (NOT UCS-2), but it may
still be using UCS-2 in some places.
Comment 1•22 years ago (Reporter)
This is to demonstrate that String.fromCharCode(0x10400) gives the same result
as String.fromCharCode(0x0400). That is, only the low 16 bits are used.
On the other hand, String.fromCharCode(0xd801, 0xdc00) (the pair of surrogate
code points for U+10400) works. I guess this is just by coincidence, because
Mozilla internally uses UTF-16.
It has to be checked what the ECMAScript standard says about fromCharCode():
whether it's supposed to accept Unicode scalar values (USVs) for non-BMP
characters or to accept surrogate pairs.
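For illustration, the behaviour described in this comment can be checked with a few lines of script. This is only a sketch: the variable names are made up for the example, and it assumes an engine in which fromCharCode() keeps just the low 16 bits of each argument.

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument,
// so 0x10400 is treated exactly like 0x0400.
const truncated = String.fromCharCode(0x10400);
const cyrillic = String.fromCharCode(0x0400);
console.log(truncated === cyrillic); // true

// Passing the surrogate pair explicitly produces U+10400 as two
// 16-bit code units in the resulting string.
const deseret = String.fromCharCode(0xd801, 0xdc00);
console.log(deseret.length); // 2
```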
Comment 2•22 years ago (Reporter)
Adding Brendan, the father of JS, to CC.
Comment 3•22 years ago (Reporter)
This bug might be invalid because ECMA-262 has the following and ToUint16 is
defined to return the input value modulo 2^16. I'm not sure, but this doesn't
seem to be the best way to define fromCharCode.
---------------
15.5.3.2 String.fromCharCode ( [ char0 [ , char1 [ , … ] ] ] )
Returns a string value containing as many characters as the number of
arguments. Each argument specifies one character of the resulting string, with
the first argument specifying the first character, and so on, from left to
right. An argument is converted to a character by applying the operation
ToUint16 (9.7) and regarding the resulting 16-bit integer as the code point
value of a character. If no arguments are supplied, the result is the empty
string.
The length property of the fromCharCode function is 1.
--------------
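To make the "modulo 2^16" point concrete, here is a rough sketch of what ToUint16 does to a numeric argument. The helper name toUint16 is hypothetical (the spec operation is not exposed to scripts), and the sketch only approximates the spec for ordinary numeric inputs.

```javascript
// Hypothetical helper approximating the spec's ToUint16 operation:
// NaN and infinities map to 0, everything else is truncated toward zero
// and reduced modulo 2^16 into the range 0..65535.
function toUint16(value) {
  const n = Math.trunc(Number(value));
  if (!Number.isFinite(n)) return 0;
  return ((n % 0x10000) + 0x10000) % 0x10000;
}

console.log(toUint16(0x10400) === 0x0400); // true
console.log(
  String.fromCharCode(0x10400) === String.fromCharCode(toUint16(0x10400))
); // true
```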
Comment 4•22 years ago
This is not a valid bug against the engine. The ECMA TC39 working group is
looking into full Unicode 17 plane support, I hear. Cc'ing waldemar, but I
expect this bug should be an RFE for now.
/be
Severity: normal → enhancement
Summary: fromCharaCode doesn't understand non-BMP characters → fromCharCode doesn't understand non-BMP characters
Comment 5•22 years ago (Reporter)
Brendan, you came here while I was reading through ECMA-262 and writing this
:-). I agree with you.
ECMA section 6 has the following about 'code point', 'character' and 'Unicode
character'. According to this and the definition of fromCharCode(), this bug is
invalid.
Although not convenient, one has to write a simple function (if not already
available) to convert a USV corresponding to a non-BMP character to a pair of
surrogate code points and use that before invoking fromCharCode().
I'm changing the summary line to make this an RFE for such a function. I
realize that this can't be done by Mozilla alone (although offering additional
functions does not violate the standard) and has to be coordinated through the
standards body.
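The kind of helper meant here could look roughly like the following. It is only a sketch: the name stringFromScalarValues is hypothetical and not part of any standard or of SpiderMonkey, and the range checks are an assumption about what such a function ought to reject.

```javascript
// Hypothetical helper: builds a string from Unicode scalar values (USVs),
// splitting each non-BMP value into a UTF-16 surrogate pair before
// handing the 16-bit units to String.fromCharCode.
function stringFromScalarValues(...scalars) {
  const units = [];
  for (const usv of scalars) {
    const isSurrogate = usv >= 0xd800 && usv <= 0xdfff;
    if (!Number.isInteger(usv) || usv < 0 || usv > 0x10ffff || isSurrogate) {
      throw new RangeError("Not a Unicode scalar value: " + usv);
    }
    if (usv < 0x10000) {
      units.push(usv); // BMP character: a single 16-bit unit
    } else {
      const offset = usv - 0x10000; // 20-bit offset into the supplementary planes
      units.push(0xd800 + (offset >> 10)); // high surrogate: top 10 bits
      units.push(0xdc00 + (offset & 0x3ff)); // low surrogate: bottom 10 bits
    }
  }
  return String.fromCharCode(...units);
}

// U+10400 becomes the pair 0xd801, 0xdc00.
console.log(
  stringFromScalarValues(0x10400) === String.fromCharCode(0xd801, 0xdc00)
); // true
```

Engines implementing ECMAScript 2015 or later also provide String.fromCodePoint(), which performs this conversion natively; it did not exist when this bug was filed.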
--------------
Throughout the rest of this document, the phrase “code point” and the word
“character” will be used to refer to a 16-bit unsigned value used to represent
a single 16-bit unit of UTF-16 text. The phrase “Unicode character” will be
used to refer to the abstract linguistic or typographical unit represented by a
single Unicode scalar value (which may be longer than 16 bits and thus may be
represented by more than one code point).
----------------------
Summary: fromCharCode doesn't understand non-BMP characters → [RFE] Uint32toUTF16 (Uint32toString?) to use with fromCharCode()
Comment 6•22 years ago
Since this is a standards issue, let me reassign this to Waldemar -
Assignee: rogerl → waldemar
Comment 7•21 years ago (Assignee)
SpiderMonkey works as specified by ECMAScript Edition 3. For Edition 4 we've
already changed the standard to allow supplementary character codes as input to
fromCharCode, which will no longer treat the integers modulo 2^16. For
supplementary characters you'll get a pair of surrogates in the string.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID
Comment 8•21 years ago
Marking Verified.
jshin@mailaps.org: thank you for raising this question -
Status: RESOLVED → VERIFIED