Closed Bug 207923 Opened 22 years ago Closed 21 years ago

[RFE] Uint32toUTF16 (Uint32toString?) to use with fromCharCode()

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

VERIFIED INVALID

People

(Reporter: jshin1987, Assigned: waldemar)

Details

(Keywords: intl)

Attachments

(1 file)

A spin-off from bug 162431. The fromCharCode() method of the String class doesn't understand non-BMP characters. For a non-BMP character, it always returns the character as if the input char code were bitwise ANDed with 0xffff (i.e., only the lowest 16 bits are interpreted). JavaScript is supposed to use UTF-16 (NOT UCS-2), but it may still be using UCS-2 in some places.
Attached file: a test case (deleted)
This is to demonstrate that String.fromCharCode(0x10400) gives the same result as String.fromCharCode(0x0400). That is, only the low 16 bits are used. On the other hand, String.fromCharCode(0xd801, 0xdc00) (the pair of surrogate code points for U+10400) works. I guess this is just a coincidence, because Mozilla uses UTF-16 internally. We need to check what the ECMAScript standard says about fromCharCode(): whether it is supposed to accept Unicode scalar values (USVs) for non-BMP characters, or surrogate pairs.
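The attached test case is no longer available, so the following is only a minimal sketch of the behavior described above, not the original attachment. It assumes any ECMAScript engine that follows the Edition 3 definition of fromCharCode(); the variable names are illustrative.

```javascript
// 0x10400 lies outside the BMP; fromCharCode keeps only its low 16 bits (0x0400).
const truncated = String.fromCharCode(0x10400);
const bmp = String.fromCharCode(0x0400);
console.log(truncated === bmp); // true
console.log(truncated.length);  // 1

// Passing the two surrogate code units explicitly yields a two-unit string.
const pair = String.fromCharCode(0xd801, 0xdc00);
console.log(pair.length); // 2
console.log(pair.charCodeAt(0).toString(16), pair.charCodeAt(1).toString(16)); // d801 dc00
```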
Adding Brendan, the father of JS, to CC.
This bug might be invalid because ECMA-262 has the following and ToUint16 is defined to return the input value modulo 2^16. I'm not sure, but this doesn't seem to be the best way to define fromCharCode.
---------------
15.5.3.2 String.fromCharCode ( [ char0 [ , char1 [ , … ] ] ] )

Returns a string value containing as many characters as the number of arguments. Each argument specifies one character of the resulting string, with the first argument specifying the first character, and so on, from left to right. An argument is converted to a character by applying the operation ToUint16 (9.7) and regarding the resulting 16-bit integer as the code point value of a character. If no arguments are supplied, the result is the empty string.

The length property of the fromCharCode function is 1.
--------------
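To make the modulo 2^16 step concrete, here is a simplified sketch of ToUint16 for numeric input. The function name toUint16 is hypothetical and is not an engine API; the sketch maps non-finite values to 0, truncates toward zero, and reduces the result modulo 2^16.

```javascript
// Simplified sketch of the ToUint16 operation (hypothetical helper, not an engine API).
function toUint16(value) {
  const n = Number(value);
  if (!Number.isFinite(n)) return 0;      // NaN and +/-Infinity map to 0
  const int = Math.trunc(n);              // drop the fractional part
  return ((int % 65536) + 65536) % 65536; // reduce into the range 0..65535
}

console.log(toUint16(0x10400).toString(16)); // 400
console.log(toUint16(-1).toString(16));      // ffff
```

Under this definition, 0x10400 and 0x0400 are indistinguishable to fromCharCode(), which is the behavior reported above.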
This is not a valid bug against the engine. The ECMA TC39 working group is looking into full Unicode 17 plane support, I hear. Cc'ing waldemar, but I expect this bug should be an RFE for now. /be
Severity: normal → enhancement
Summary: fromCharaCode doesn't understand non-BMP characters → fromCharCode doesn't understand non-BMP characters
Brendan, you came here while I was reading through ECMA-262 and writing this. :-) I agree with you. ECMA-262 section 6 has the following about 'code point', 'character' and 'Unicode character'. According to this and the definition of fromCharCode(), this bug is invalid. Although it is not convenient, one has to write a simple function (if one is not already available) to convert the USV of a non-BMP character to a pair of surrogate code points, and use that before invoking fromCharCode(). I'm changing the summary line to make this an RFE for such a function. I realize that this can't be done by Mozilla alone (although offering additional functions does not violate the standard) and has to be coordinated through the standards body.
--------------
Throughout the rest of this document, the phrase “code point” and the word “character” will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of UTF-16 text. The phrase “Unicode character” will be used to refer to the abstract linguistic or typographical unit represented by a single Unicode scalar value (which may be longer than 16 bits and thus may be represented by more than one code point).
----------------------
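The "simple function" mentioned above could look roughly like this. It is a sketch only: the names toUtf16Units and stringFromScalars are hypothetical, and rejecting surrogate code points as input is a design choice of this example, not something the standard requires of such a helper.

```javascript
// Hypothetical helper: convert one Unicode scalar value to its UTF-16 code units.
function toUtf16Units(scalar) {
  if (!Number.isInteger(scalar) || scalar < 0 || scalar > 0x10ffff) {
    throw new RangeError("not a Unicode scalar value: " + scalar);
  }
  if (scalar >= 0xd800 && scalar <= 0xdfff) {
    throw new RangeError("surrogate code points are not scalar values");
  }
  if (scalar < 0x10000) return [scalar];   // BMP: a single 16-bit unit
  const offset = scalar - 0x10000;         // 20-bit offset into the supplementary planes
  return [0xd800 + (offset >> 10),         // high surrogate: top 10 bits
          0xdc00 + (offset & 0x3ff)];      // low surrogate: bottom 10 bits
}

// Hypothetical wrapper: build a string from scalar values via fromCharCode().
function stringFromScalars(...scalars) {
  const units = [];
  for (const s of scalars) units.push(...toUtf16Units(s));
  return String.fromCharCode(...units);
}

console.log(toUtf16Units(0x10400).map((u) => u.toString(16))); // [ 'd801', 'dc00' ]
console.log(stringFromScalars(0x41, 0x10400).length);          // 3
```

For U+10400 this produces the pair 0xd801, 0xdc00 used earlier in this bug.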
Summary: fromCharCode doesn't understand non-BMP characters → [RFE] Uint32toUTF16 (Uint32toString?) to use with fromCharCode()
Since this is a standards issue, let me reassign this to Waldemar.
Assignee: rogerl → waldemar
SpiderMonkey works as specified by ECMAScript Edition 3. For Edition 4 we've already changed the standard to allow supplementary character codes as input to fromCharCode, which will no longer treat the integers modulo 2^16. For supplementary characters you'll get a pair of surrogates in the string.
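As an illustration of the "pair of surrogates" result described here: in engines that implement ECMAScript 2015, String.fromCodePoint() accepts supplementary code points and behaves this way, while String.fromCharCode() itself still applies ToUint16. This sketch assumes such an engine and is not a statement about the Edition 4 draft.

```javascript
// String.fromCodePoint (ECMAScript 2015) encodes a supplementary code point as a surrogate pair.
const deseret = String.fromCodePoint(0x10400);
console.log(deseret.length);                                  // 2
console.log(deseret === String.fromCharCode(0xd801, 0xdc00)); // true

// fromCharCode still reduces its argument modulo 2^16.
console.log(String.fromCharCode(0x10400) === "\u0400");       // true
```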
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID
Marking Verified. jshin@mailaps.org: thank you for raising this question.
Status: RESOLVED → VERIFIED