RESOLVED FIXED 143934
Implement String.codePointAt()
https://bugs.webkit.org/show_bug.cgi?id=143934
Summary Implement String.codePointAt()
Benjamin Poulain
Reported 2015-04-19 01:09:11 PDT
Implement String.codePointAt()
Attachments
Patch (25.36 KB, patch)
2015-04-19 01:13 PDT, Benjamin Poulain
no flags
Patch (19.22 KB, patch)
2015-04-19 12:03 PDT, Benjamin Poulain
no flags
Patch (21.27 KB, patch)
2015-04-20 16:00 PDT, Benjamin Poulain
no flags
Patch (21.45 KB, patch)
2015-04-20 23:52 PDT, Benjamin Poulain
darin: review+
Benjamin Poulain
Comment 1 2015-04-19 01:13:21 PDT
Anders Carlsson
Comment 2 2015-04-19 08:21:33 PDT
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > Source/WTF/wtf/text/StringImpl.h:657 > + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned); If this is only used by JSC I don't think it needs to go into WTF.
Alexey Proskuryakov
Comment 3 2015-04-19 09:51:36 PDT
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > Source/WTF/wtf/text/StringImpl.cpp:371 > + // -Let first be the code unit value of the element at index position in the String S. Delete these comments now that the code is written? > LayoutTests/js/script-tests/string-code-point-at.js:80 > +shouldThrow('"abc".codePointAt(Symbol("WebKit"))'); Nice test suite. Curious why Symbol is so much of a special case. > LayoutTests/js/script-tests/string-code-point-at.js:82 > +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints. Can you reword this comment to explain what exactly doesn't work?
Yusuke Suzuki
Comment 4 2015-04-19 10:36:56 PDT
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > LayoutTests/js/Object-getOwnPropertyNames-expected.txt:51 > +FAIL getSortedOwnPropertyNames(String.prototype) should be anchor,big,blink,bold,charAt,charCodeAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf. Was anchor,big,blink,bold,charAt,charCodeAt,codePointAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf. test failed. adding "charCodeAt' to test solves it :D
Darin Adler
Comment 5 2015-04-19 11:23:46 PDT
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review if > Source/JavaScriptCore/runtime/StringPrototype.cpp:816 > +EncodedJSValue JSC_HOST_CALL stringProtoFuncCodePointAt(ExecState* exec) Here’s the helper function I suggest implementing: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); if (string.is8Bit()) return string[position]; UChar32 character; U16_NEXT(string.characters16(), position, string.length(), character); return character; } Or you could skip the is8Bit check and not bother with characters16 to make this cleaner: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); UChar32 character; U16_NEXT(string, position, string.length(), character); return character; } I’m not sure how good the compiler will be at optimizing this second case, but it might do just fine. This implements the same rule that the JavaScript specification calls for. It’s annoying how close this is to StringImpl::characterStartingAt but for some reason that function returns 0 when it’s not a valid surrogate pair. >> Source/WTF/wtf/text/StringImpl.h:657 >> + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned); > > If this is only used by JSC I don't think it needs to go into WTF. I agree. We don’t need to put this in StringImpl.
Darin Adler
Comment 6 2015-04-19 11:28:52 PDT
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review >> LayoutTests/js/script-tests/string-code-point-at.js:82 >> +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints. > > Can you reword this comment to explain what exactly doesn't work? A good point. I’d like to know more.
Benjamin Poulain
Comment 7 2015-04-19 11:34:47 PDT
This reason for StringImpl + explicit implementation was the extra testing and having the spec text in code. Looks like nobody appreciates it. I'll move back to JSC.
Benjamin Poulain
Comment 8 2015-04-19 12:03:14 PDT
Darin Adler
Comment 9 2015-04-19 16:58:12 PDT
Comment on attachment 251123 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251123&action=review > Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > return JSValue::encode(jsNaN()); Strange that charCodeAt returns NaN and codePointAt returns undefined in the analogous situation. > Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > + String string = thisValue.toWTFString(exec); I think the fact that we convert "this" to a string before converting the argument to an integer is detectable. We should write tests using valueOf to see if so. And if so, decide whether that’s the correct order or not.
Benjamin Poulain
Comment 10 2015-04-19 17:28:14 PDT
(In reply to comment #9) > Comment on attachment 251123 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=251123&action=review > > > Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > > return JSValue::encode(jsNaN()); > > Strange that charCodeAt returns NaN and codePointAt returns undefined in the > analogous situation. Agreed, that is just plain weird. > > Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > > + String string = thisValue.toWTFString(exec); > > I think the fact that we convert "this" to a string before converting the > argument to an integer is detectable. We should write tests using valueOf to > see if so. And if so, decide whether that’s the correct order or not. Good point! It is the correct order. It is defined by the spec. I'll add coverage for that.
Darin Adler
Comment 11 2015-04-19 21:40:34 PDT
Oh, I thought of something else. If converting “this” to a string throws an exception, then the argument should not be converted to an integer; the exception should stop the processing before the second conversion is done. To implement this correctly, we probably need to add one early return if we see an exception. I bet we have this wrong for charCodeAt and probably other functions as well. It’s one good argument for writing more of the runtime library in JavaScript instead of C++ long term because it’s easier to get exceptions right that way.
Benjamin Poulain
Comment 12 2015-04-19 21:55:04 PDT
(In reply to comment #11) > Oh, I thought of something else. > > If converting “this” to a string throws an exception, then the argument > should not be converted to an integer; the exception should stop the > processing before the second conversion is done. To implement this > correctly, we probably need to add one early return if we see an exception. > I bet we have this wrong for charCodeAt and probably other functions as > well. It’s one good argument for writing more of the runtime library in > JavaScript instead of C++ long term because it’s easier to get exceptions > right that way. I thought about that, but what would be the way of checking if "this" is coercible to object in JavaScript?
Benjamin Poulain
Comment 13 2015-04-19 22:14:51 PDT
Comment on attachment 251123 [details] Patch I'll post a new patch tomorrow. The last bug you uncovered shows this need more work.
Jordan Harband
Comment 14 2015-04-20 14:52:20 PDT
Comment on attachment 251123 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251123&action=review Please also see es6-shim's tests here: https://github.com/paulmillr/es6-shim/blob/master/test/string.js#L429-L474 > LayoutTests/js/script-tests/string-code-point-at.js:8 > +shouldBe('String.prototype.codePointAt.length', '1'); It'd be good to also assert `shouldBeEqualToString('String.prototype.codePointAt.name', 'codePointAt')` > LayoutTests/js/script-tests/string-code-point-at.js:69 > +shouldBe('"abc".codePointAt(4)', 'undefined'); Maybe also a `shouldBe('var str = "abc"; str.codePointAt(str.length)', 'undefined');`?
Benjamin Poulain
Comment 15 2015-04-20 16:00:54 PDT
Benjamin Poulain
Comment 16 2015-04-20 16:02:07 PDT
Ok, the exception on toString() is not a problem since the generated string is null so the remaining code does nothing. I added test coverage for the issues pointed out so far.
Benjamin Poulain
Comment 17 2015-04-20 23:52:56 PDT
Darin Adler
Comment 18 2015-04-21 08:48:32 PDT
Comment on attachment 251221 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251221&action=review > Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > + RELEASE_ASSERT(position < string.length()); This should not be RELEASE_ASSERT. The two call sites just below guarantee this invariant in a clear, obvious way. Maybe if this function was used in more places, but it seems like overkill given the way this is used. > Source/JavaScriptCore/runtime/StringPrototype.cpp:834 > + if (UNLIKELY(exec->hadException())) > + return JSValue::encode(jsUndefined()); You could put this after the isUInt32 check below, since we only need this check if we continue and call toInteger. Of course, you could also argue that putting it right here is best for clarity.
Benjamin Poulain
Comment 19 2015-04-22 14:04:18 PDT
(In reply to comment #18) > Comment on attachment 251221 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=251221&action=review > > > Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > > + RELEASE_ASSERT(position < string.length()); > > This should not be RELEASE_ASSERT. The two call sites just below guarantee > this invariant in a clear, obvious way. Maybe if this function was used in > more places, but it seems like overkill given the way this is used. I think it is a good safety net. Today the code will be eliminated by the compiler. If anyone ever modify a caller, this branch could protect the "string.characters8()[position]".
Benjamin Poulain
Comment 20 2015-04-22 15:26:33 PDT
Note You need to log in before you can comment on or make changes to this bug.