RESOLVED FIXED Bug 144160
[ES6] Implement String.fromCodePoint
https://bugs.webkit.org/show_bug.cgi?id=144160
Summary [ES6] Implement String.fromCodePoint
Yusuke Suzuki
Reported 2015-04-24 14:20:51 PDT
[ES6] Implement String.fromCodePoint
Attachments
Patch (6.72 KB, patch)
2015-04-24 15:29 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews103 for mac-mavericks (539.07 KB, application/zip)
2015-04-24 16:02 PDT, Build Bot
no flags
Archive of layout-test-results from ews107 for mac-mavericks-wk2 (647.44 KB, application/zip)
2015-04-24 16:03 PDT, Build Bot
no flags
Patch (6.54 KB, patch)
2015-04-25 06:36 PDT, Yusuke Suzuki
no flags
Patch (6.48 KB, patch)
2015-04-25 06:49 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews102 for mac-mavericks (543.14 KB, application/zip)
2015-04-25 07:22 PDT, Build Bot
no flags
Archive of layout-test-results from ews104 for mac-mavericks-wk2 (589.31 KB, application/zip)
2015-04-25 07:45 PDT, Build Bot
no flags
Patch (10.66 KB, patch)
2015-04-25 07:49 PDT, Yusuke Suzuki
darin: review+
Yusuke Suzuki
Comment 1 2015-04-24 15:29:11 PDT
Build Bot
Comment 2 2015-04-24 16:02:22 PDT
Comment on attachment 251579 [details] Patch Attachment 251579 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5169665965817856 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 3 2015-04-24 16:02:25 PDT
Created attachment 251581 [details] Archive of layout-test-results from ews103 for mac-mavericks The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-mavericks Platform: Mac OS X 10.9.5
Build Bot
Comment 4 2015-04-24 16:03:13 PDT
Comment on attachment 251579 [details] Patch Attachment 251579 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/5934236179628032 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 5 2015-04-24 16:03:17 PDT
Created attachment 251582 [details] Archive of layout-test-results from ews107 for mac-mavericks-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-mavericks-wk2 Platform: Mac OS X 10.9.5
Alexey Proskuryakov
Comment 6 2015-04-24 17:09:35 PDT
Comment on attachment 251579 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251579&action=review > Source/JavaScriptCore/runtime/StringConstructor.cpp:100 > + StringBuilder builder; > + > + for (unsigned i = 0; i < length; ++i) { Would it be more efficient to preallocate a 2*length size buffer than using StringBuilder?
Darin Adler
Comment 7 2015-04-24 18:08:59 PDT
Comment on attachment 251579 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251579&action=review Need to update the Object.getOwnPropertyNames test. >> Source/JavaScriptCore/runtime/StringConstructor.cpp:100 >> + for (unsigned i = 0; i < length; ++i) { > > Would it be more efficient to preallocate a 2*length size buffer than using StringBuilder? Or since it’s highly likely it will be length, not 2*length, should just call: builder.reserveCapacity(length); > Source/JavaScriptCore/runtime/StringConstructor.cpp:101 > + double nextCP = exec->uncheckedArgument(i).toNumber(exec); I suggest naming this codePointAsDouble rather than nextCP. > Source/JavaScriptCore/runtime/StringConstructor.cpp:109 > + if (!sameValue(exec, jsNumber(nextCP), jsNumber(jsNumber(nextCP).toInteger(exec)))) > + return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points"))); > + > + if (nextCP < 0 || nextCP > 0x10FFFF) > + return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points"))); This conversion back to jsNumber is really strange. I think we can do this check without using the sameValue function and without all this conversion back and forth. I think it’s probably simply this: UChar32 codePoint = codePointAsDouble; if (codePoint != codePointAsDouble || codePoint > 0x10FFFF) return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points"))); > Source/JavaScriptCore/runtime/StringConstructor.cpp:111 > + unsigned codePoint = static_cast<unsigned>(nextCP); Should be UChar32 rather than unsigned. > Source/JavaScriptCore/runtime/StringConstructor.cpp:119 > + int encodeOffset = 0; > + UChar encodeBuffer[2] = { 0 }; > + U16_APPEND_UNSAFE(encodeBuffer, encodeOffset, codePoint); > + > + builder.append(encodeBuffer[0]); > + if (encodeOffset == 2) > + builder.append(encodeBuffer[1]); Here is a much cleaner way to write it: if (U_IS_BMP(codePoint)) builder.append(codePoint); else { builder.append(U16_LEAD(codePoint)); builder.append(U16_TRAIL(codePoint)); }
Yusuke Suzuki
Comment 8 2015-04-25 06:36:32 PDT
Yusuke Suzuki
Comment 9 2015-04-25 06:49:49 PDT
Yusuke Suzuki
Comment 10 2015-04-25 06:51:51 PDT
Comment on attachment 251579 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251579&action=review >>> Source/JavaScriptCore/runtime/StringConstructor.cpp:100 >>> + for (unsigned i = 0; i < length; ++i) { >> >> Would it be more efficient to preallocate a 2*length size buffer than using StringBuilder? > > Or since it’s highly likely it will be length, not 2*length, should just call: > > builder.reserveCapacity(length); Looks nice! If there's no code points represented by surrogate pairs, a length of a generated string becomes `length`. OK, so `builder.reserveCapacity(length);` looks very nice. >> Source/JavaScriptCore/runtime/StringConstructor.cpp:101 >> + double nextCP = exec->uncheckedArgument(i).toNumber(exec); > > I suggest naming this codePointAsDouble rather than nextCP. Looks nice. Fixed. >> Source/JavaScriptCore/runtime/StringConstructor.cpp:109 >> + return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points"))); > > This conversion back to jsNumber is really strange. I think we can do this check without using the sameValue function and without all this conversion back and forth. > > I think it’s probably simply this: > > UChar32 codePoint = codePointAsDouble; > if (codePoint != codePointAsDouble || codePoint > 0x10FFFF) > return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points"))); Looks nice. However, since `UChar32` is typedef-ed as `int32_t`, when `-1` comes, `codePoint != codePointAsDouble` becomes false. So instead of UChar32, I've changed it to `uint32_t` in the revised patch. >> Source/JavaScriptCore/runtime/StringConstructor.cpp:111 >> + unsigned codePoint = static_cast<unsigned>(nextCP); > > Should be UChar32 rather than unsigned. Sounds very nice! UChar32 is nice for representing unicode code points. >> Source/JavaScriptCore/runtime/StringConstructor.cpp:119 >> + builder.append(encodeBuffer[1]); > > Here is a much cleaner way to write it: > > if (U_IS_BMP(codePoint)) > builder.append(codePoint); > else { > builder.append(U16_LEAD(codePoint)); > builder.append(U16_TRAIL(codePoint)); > } Right. Your suggestion is cleaner :D
Build Bot
Comment 11 2015-04-25 07:22:20 PDT
Comment on attachment 251633 [details] Patch Attachment 251633 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5669308101296128 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 12 2015-04-25 07:22:24 PDT
Created attachment 251635 [details] Archive of layout-test-results from ews102 for mac-mavericks The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews102 Port: mac-mavericks Platform: Mac OS X 10.9.5
Build Bot
Comment 13 2015-04-25 07:45:28 PDT
Comment on attachment 251633 [details] Patch Attachment 251633 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/4974387224641536 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 14 2015-04-25 07:45:32 PDT
Created attachment 251638 [details] Archive of layout-test-results from ews104 for mac-mavericks-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-mavericks-wk2 Platform: Mac OS X 10.9.5
Yusuke Suzuki
Comment 15 2015-04-25 07:49:27 PDT
Darin Adler
Comment 16 2015-04-25 09:54:55 PDT
Comment on attachment 251639 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251639&action=review Looks great. Correctness-wise this seems just right. Optimization-wise, maybe worth revisiting after landing. - We might consider a fast path in the extremely likely case that an argument is already an Int32; converting to double and back is expensive and unnecessary. The JSValue::toInt32 and toUInt32 functions does something like this, and while they are doing a different operation, the same principle applies. This one is really easy, we’d just check isUInt32 before calling toNumber and refactor so we don’t do the floating point arithmetic at all in that case. - We might consider a fast path in the extremely likely case that there is a single BMP argument that should return a single character string, like the special case we already have for fromCharCode. - If fromCharCode or fromCodePoint ever gets really hot usage then we could also consider optimizing either in the JIT the way we do with some other built-in functions like the math functions. It’s unfortunate that toUInt32 has the unwanted behavior of doing modulo rather than failing when the value cannot be converted into a number that is in the UInt32 range. Because aside from that, toUInt32 has desirable behavior such as avoiding the "to double and back" thing. > Source/JavaScriptCore/tests/stress/string-from-code-points.js:51 > + [ 0x10000, "\uD800\uDC00" ], > + [ 0x10001, "\uD800\uDC01" ], Should test that when you pass in an unpaired surrogate such as 0xD800 or 0xDC00 it just gets passed through unmodified, since that seems to be the specified behavior. Or even a pair of surrogates as code points that then turn into a single code point in the resulting string! > Source/JavaScriptCore/tests/stress/string-from-code-points.js:59 > +var invalidTests = [ Should specifically test a value that works with fromCharCode because of toUInt32 but not for this class. Such as 2^32 + 32, which produces a space if passed to fromCharCode, but should throw an exception if passed to fromCodePoint. > Source/JavaScriptCore/tests/stress/string-from-code-points.js:103 > + [ null, "\0" ], > + [ [], "\0" ], > + [ "0x41", "A" ], > + [ "", "\0" ], Should probably add tests for boolean true and false also.
Darin Adler
Comment 17 2015-04-25 10:01:50 PDT
Comment on attachment 251639 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251639&action=review >> Source/JavaScriptCore/tests/stress/string-from-code-points.js:51 >> + [ 0x10001, "\uD800\uDC01" ], > > Should test that when you pass in an unpaired surrogate such as 0xD800 or 0xDC00 it just gets passed through unmodified, since that seems to be the specified behavior. Or even a pair of surrogates as code points that then turn into a single code point in the resulting string! Besides unpaired surrogates, there are these values that are technically “non-characters” but that we intentionally handle rather than rejecting: - the last two code points on each plane (U+__FFFE and U+__FFFF, 34 code points total in 17 planes); of these we test only 0x10FFFF, 0x10FFFE and 0xFFFF here but we could test some others - U+FDD0-U+FDEF (32 code points) We might want to test a few more of them, at least the true edge cases. If we had used, for example, the U_IS_UNICODE_CHAR function from the ICU library rather than comparing against 0x10FFFF explicitly we would have rejected those characters.
Yusuke Suzuki
Comment 18 2015-04-25 14:59:51 PDT
Comment on attachment 251639 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251639&action=review Thank you for your review! Your thought is correct. I think if we need more performance for this area, implementing a fast path in JIT layer (implementing intrinsic thunk code for JS function) is preferable. >>> Source/JavaScriptCore/tests/stress/string-from-code-points.js:51 >>> + [ 0x10001, "\uD800\uDC01" ], >> >> Should test that when you pass in an unpaired surrogate such as 0xD800 or 0xDC00 it just gets passed through unmodified, since that seems to be the specified behavior. Or even a pair of surrogates as code points that then turn into a single code point in the resulting string! > > Besides unpaired surrogates, there are these values that are technically “non-characters” but that we intentionally handle rather than rejecting: > > - the last two code points on each plane (U+__FFFE and U+__FFFF, 34 code points total in 17 planes); of these we test only 0x10FFFF, 0x10FFFE and 0xFFFF here but we could test some others > - U+FDD0-U+FDEF (32 code points) > > We might want to test a few more of them, at least the true edge cases. If we had used, for example, the U_IS_UNICODE_CHAR function from the ICU library rather than comparing against 0x10FFFF explicitly we would have rejected those characters. Looks nice. I've added 1. unpaired surrogates 2. all non-characters code points (17 * 2 + 32 = 66) >> Source/JavaScriptCore/tests/stress/string-from-code-points.js:59 >> +var invalidTests = [ > > Should specifically test a value that works with fromCharCode because of toUInt32 but not for this class. Such as 2^32 + 32, which produces a space if passed to fromCharCode, but should throw an exception if passed to fromCodePoint. Added it (0x100000000 + 32). >> Source/JavaScriptCore/tests/stress/string-from-code-points.js:103 >> + [ "", "\0" ], > > Should probably add tests for boolean true and false also. Sounds nice. Added.
Yusuke Suzuki
Comment 19 2015-04-25 15:02:07 PDT
Note You need to log in before you can comment on or make changes to this bug.