103165 – String.fromCharCode does not support converting to Unicode chars in the supplementary planes

Status:	CLOSED WONTFIX

Alias:	None

Product:	WebKit
Classification:	Unclassified
Component:	JavaScriptCore
Version:	420+
Hardware:	All Linux

Importance:	P2 Major
Assignee:	Nobody

Reported:	2012-11-23 18:47 PST by Edwin H
Modified:	2012-11-23 23:52 PST
CC List:	4 users

Description Edwin H 2012-11-23 18:47:16 PST

In the Chrome JS console:

console.log(String.fromCharCode(119134));
텞
parseInt('1D15E', 16)
119134
parseInt('D15E', 16)
53598
console.log(String.fromCharCode(119134).charCodeAt(0));
53598

(Expected output: the musical half note character, U+1D15E, rather than the Korean character U+D15E shown above. Bugzilla may not render these characters correctly in this submission form.)

It looks like only the bottom 16 bits of the value were used. If you pass the two UTF-16 surrogate code units instead, it works fine:

console.log(String.fromCharCode(55348) + String.fromCharCode(56670));
𝅗𝅥

(you should see the musical half note char U+1D15E)
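For reference, the two values 55348 and 56670 used above follow from the standard UTF-16 surrogate arithmetic. The snippet below is a minimal sketch of that calculation; `toSurrogatePair` is a hypothetical helper name chosen for illustration, not an existing API.

```javascript
// Hypothetical helper (not a standard API): split a supplementary-plane
// code point (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  var offset = codePoint - 0x10000;      // 20-bit value
  var high = 0xD800 + (offset >> 10);    // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

var pair = toSurrogatePair(0x1D15E);     // [55348, 56670]
// fromCharCode accepts several code units, so one call builds the pair.
var halfNote = String.fromCharCode(pair[0], pair[1]);
console.log(pair, halfNote.length);      // the string has length 2
```

The resulting string is two code units long, which is why a single-argument fromCharCode call cannot produce it.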

Comment 1 Glenn Adams 2012-11-23 21:06:27 PST

The current implementation correctly implements the semantics of ECMAScript 5.1, Clause 15.5.3.2 [1], which states that "An argument is converted to a character by applying the operation ToUint16 (9.7) and regarding the resulting 16-bit integer as the code unit value of a character."

[1] http://www.ecma-international.org/ecma-262/5.1/#sec-15.5.3.2
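To illustrate how the quoted clause produces the value reported in the description, here is a small sketch. `toUint16` is a hypothetical stand-in for the spec's abstract ToUint16 operation, approximated with a bit mask; it is not how any particular engine implements it.

```javascript
// Hypothetical stand-in for the abstract ToUint16 operation:
// keep only the low 16 bits of the (integer-converted) argument.
function toUint16(n) {
  return n & 0xFFFF;
}

var truncated = toUint16(119134);                 // 0x1D15E -> 0xD15E
console.log(truncated);                           // 53598
// fromCharCode(119134) therefore yields the same one-unit string
// as fromCharCode(53598).
var same = String.fromCharCode(119134) === String.fromCharCode(53598);
console.log(same);                                // true
```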

Comment 2 Edwin H 2012-11-23 22:48:59 PST

Okay, given that definition for ECMAScript, I agree that the implementation is correct. I have emailed the guys I know on the ECMAScript i18n committee to see if they can get the definition updated to support the Unicode supplementary characters.

Comment 3 Alexey Proskuryakov 2012-11-23 23:23:05 PST

> In the chrome JS console:

As an aside, Chrome does not use JavaScriptCore, so their JS bugs should not be reported here. Of course, this specific case is not a bug, and JSC has the same behavior.

Comment 4 Oliver Hunt 2012-11-23 23:52:58 PST

(In reply to comment #2)
> Okay, given that definition for ECMAScript, I agree that the implementation is correct. I have emailed the guys I know on the ECMAScript i18n committee to see if they can get the definition updated to support the Unicode supplementary characters.

fromCharCode is unlikely to change in the way you want -- it can't really risk changing to return strings with length != 1 for a single argument.

I believe the new internationalization APIs have new functions to better deal with these issues.
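Note for later readers: the thread does not say which functions were meant. One pair that covers this case is String.fromCodePoint and String.prototype.codePointAt, which were standardized in ECMAScript 2015 (the core language, not the internationalization API). The sketch below assumes an engine that supports them, and shows the length difference that comment 4 refers to.

```javascript
// Assumes an ES2015-capable engine (String.fromCodePoint, codePointAt).
var viaCodePoint = String.fromCodePoint(0x1D15E);
var viaCharCode = String.fromCharCode(0x1D15E);

console.log(viaCodePoint.length);          // 2: a surrogate pair
console.log(viaCharCode.length);           // 1: truncated to 16 bits
console.log(viaCodePoint.codePointAt(0));  // 119134, the full code point
console.log(viaCodePoint.charCodeAt(0));   // 55348, the high surrogate only
```

Leaving fromCharCode unchanged keeps the one-code-unit-per-argument guarantee intact for existing code, while the newer function handles supplementary-plane code points.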