Bug 103165 - String.fromCharCode does not support converting to Unicode chars in the supplementary planes
Status: CLOSED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 420+
Hardware: All Linux
Importance: P2 Major
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-23 18:47 PST by Edwin H
Modified: 2012-11-23 23:52 PST
Description Edwin H 2012-11-23 18:47:16 PST
In the Chrome JS console:

console.log(String.fromCharCode(119134));
텞
parseInt('1D15E', 16)
119134
parseInt('D15E', 16)
53598
console.log(String.fromCharCode(119134).charCodeAt(0));
53598

(Expected output: the musical half note character U+1D15E. If Bugzilla preserved the character from this submission form, the output shown above is instead a Korean character, U+D15E.)

It looks like only the low 16 bits of the value are used. If you pass the two UTF-16 surrogate code units instead, it works fine:

console.log(String.fromCharCode(55348) + String.fromCharCode(56670));
𝅗𝅥

(You should see the musical half note character U+1D15E.)
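
For reference, the two code units used above (55348 and 56670) follow from the standard UTF-16 surrogate arithmetic. A minimal sketch, assuming a hypothetical helper named toSurrogatePair that is not part of any built-in API:

```javascript
// Hypothetical helper (illustration only, not a built-in): split a
// supplementary-plane code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary-plane code point: " + codePoint);
  }
  var offset = codePoint - 0x10000;      // 20-bit offset into the supplementary planes
  var high = 0xD800 + (offset >> 10);    // high surrogate carries the top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);   // low surrogate carries the bottom 10 bits
  return [high, low];
}

var pair = toSurrogatePair(0x1D15E);     // 0x1D15E === 119134
console.log(pair[0], pair[1]);           // 55348 56670

// Passing both code units to fromCharCode yields the half note.
var halfNote = String.fromCharCode(pair[0], pair[1]);
console.log(halfNote.length);            // 2
```

The resulting string has two UTF-16 code units, which is why its length is 2 even though it displays as one character.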
Comment 1 Glenn Adams 2012-11-23 21:06:27 PST
The current implementation correctly implements the semantics of ECMAScript 5.1, Clause 15.5.3.2 [1], which states that "An argument is converted to a character by applying the operation ToUint16 (9.7) and regarding the resulting 16-bit integer as the code unit value of a character."

[1] http://www.ecma-international.org/ecma-262/5.1/#sec-15.5.3.2
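
To illustrate why 119134 comes back as 53598, here is a rough sketch of the ToUint16 conversion quoted above. The function name toUint16 is a hypothetical stand-in written for this example; engines implement the operation internally.

```javascript
// Hypothetical approximation of the spec's ToUint16 operation,
// written for illustration only.
function toUint16(value) {
  var n = Number(value);
  if (!isFinite(n)) return 0;                 // NaN and +/-Infinity map to 0
  n = n < 0 ? Math.ceil(n) : Math.floor(n);   // truncate toward zero
  return ((n % 65536) + 65536) % 65536;       // reduce modulo 2^16 into 0..65535
}

console.log(toUint16(119134));                // 53598, i.e. 0xD15E
console.log(toUint16(119134) ===
            String.fromCharCode(119134).charCodeAt(0));  // true
```

Under this reading, the leading 0x1 of 0x1D15E is simply discarded, leaving 0xD15E.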
Comment 2 Edwin H 2012-11-23 22:48:59 PST
Okay, given that definition for ECMAScript, I agree that the implementation is correct. I have emailed the guys I know on the ECMAScript i18n committee to see if they can get the definition updated to support the Unicode supplementary characters.
Comment 3 Alexey Proskuryakov 2012-11-23 23:23:05 PST
> In the Chrome JS console:

As an aside, Chrome does not use JavaScriptCore, so their JS bugs should not be reported here. Of course, this specific case is not a bug, and JSC has the same behavior.
Comment 4 Oliver Hunt 2012-11-23 23:52:58 PST
(In reply to comment #2)
> Okay, given that definition for ECMAScript, I agree that the implementation is correct. I have emailed the guys I know on the ECMAScript i18n committee to see if they can get the definition updated to support the Unicode supplementary characters.

fromCharCode is unlikely to change in the way you want -- for a single argument, it can't really risk starting to return strings with length != 1.

I believe the new internationalization APIs have new functions to better deal with these issues.
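
As a sketch of the length concern: a supplementary character occupies two UTF-16 code units, so any function returning it from a single argument must return a string of length 2. Separately from this thread, ECMAScript 2015 later added String.fromCodePoint and String.prototype.codePointAt to the core language; the example below feature-tests for them, since engines from the time of this report would not have them.

```javascript
// One supplementary character is two code units long.
var viaSurrogates = String.fromCharCode(0xD834, 0xDD5E);
console.log(viaSurrogates.length);             // 2

// String.fromCodePoint exists only in ES2015+ engines, so guard the call.
var viaCodePoint = null;
if (typeof String.fromCodePoint === "function") {
  viaCodePoint = String.fromCodePoint(0x1D15E);
  console.log(viaCodePoint === viaSurrogates); // true
  console.log(viaCodePoint.codePointAt(0));    // 119134
}
```

Because fromCodePoint is a separate function, existing callers of fromCharCode keep the one-code-unit-per-argument behavior.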