Summary: | Implement String.codePointAt() | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Benjamin Poulain <benjamin> | ||||||||||
Component: | New Bugs | Assignee: | Benjamin Poulain <benjamin> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | ap, cmarcelo, commit-queue, darin, ggaren | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Attachments: |
|
Description
Benjamin Poulain
2015-04-19 01:09:11 PDT
Created attachment 251118 [details]
Patch
Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > Source/WTF/wtf/text/StringImpl.h:657 > + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned); If this is only used by JSC I don't think it needs to go into WTF. Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > Source/WTF/wtf/text/StringImpl.cpp:371 > + // -Let first be the code unit value of the element at index position in the String S. Delete these comments now that the code is written? > LayoutTests/js/script-tests/string-code-point-at.js:80 > +shouldThrow('"abc".codePointAt(Symbol("WebKit"))'); Nice test suite. Curious why Symbol is so much of a special case. > LayoutTests/js/script-tests/string-code-point-at.js:82 > +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints. Can you reword this comment to explain what exactly doesn't work? Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review > LayoutTests/js/Object-getOwnPropertyNames-expected.txt:51 > +FAIL getSortedOwnPropertyNames(String.prototype) should be anchor,big,blink,bold,charAt,charCodeAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf. Was anchor,big,blink,bold,charAt,charCodeAt,codePointAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf. test failed. adding "charCodeAt' to test solves it :D Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review if > Source/JavaScriptCore/runtime/StringPrototype.cpp:816 > +EncodedJSValue JSC_HOST_CALL stringProtoFuncCodePointAt(ExecState* exec) Here’s the helper function I suggest implementing: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); if (string.is8Bit()) return string[position]; UChar32 character; U16_NEXT(string.characters16(), position, string.length(), character); return character; } Or you could skip the is8Bit check and not bother with characters16 to make this cleaner: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); UChar32 character; U16_NEXT(string, position, string.length(), character); return character; } I’m not sure how good the compiler will be at optimizing this second case, but it might do just fine. This implements the same rule that the JavaScript specification calls for. It’s annoying how close this is to StringImpl::characterStartingAt but for some reason that function returns 0 when it’s not a valid surrogate pair. >> Source/WTF/wtf/text/StringImpl.h:657 >> + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned); > > If this is only used by JSC I don't think it needs to go into WTF. I agree. We don’t need to put this in StringImpl. Comment on attachment 251118 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251118&action=review >> LayoutTests/js/script-tests/string-code-point-at.js:82 >> +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints. > > Can you reword this comment to explain what exactly doesn't work? A good point. I’d like to know more. This reason for StringImpl + explicit implementation was the extra testing and having the spec text in code. Looks like nobody appreciates it. I'll move back to JSC. Created attachment 251123 [details]
Patch
Comment on attachment 251123 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251123&action=review > Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > return JSValue::encode(jsNaN()); Strange that charCodeAt returns NaN and codePointAt returns undefined in the analogous situation. > Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > + String string = thisValue.toWTFString(exec); I think the fact that we convert "this" to a string before converting the argument to an integer is detectable. We should write tests using valueOf to see if so. And if so, decide whether that’s the correct order or not. (In reply to comment #9) > Comment on attachment 251123 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=251123&action=review > > > Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > > return JSValue::encode(jsNaN()); > > Strange that charCodeAt returns NaN and codePointAt returns undefined in the > analogous situation. Agreed, that is just plain weird. > > Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > > + String string = thisValue.toWTFString(exec); > > I think the fact that we convert "this" to a string before converting the > argument to an integer is detectable. We should write tests using valueOf to > see if so. And if so, decide whether that’s the correct order or not. Good point! It is the correct order. It is defined by the spec. I'll add coverage for that. Oh, I thought of something else. If converting “this” to a string throws an exception, then the argument should not be converted to an integer; the exception should stop the processing before the second conversion is done. To implement this correctly, we probably need to add one early return if we see an exception. I bet we have this wrong for charCodeAt and probably other functions as well. It’s one good argument for writing more of the runtime library in JavaScript instead of C++ long term because it’s easier to get exceptions right that way. (In reply to comment #11) > Oh, I thought of something else. > > If converting “this” to a string throws an exception, then the argument > should not be converted to an integer; the exception should stop the > processing before the second conversion is done. To implement this > correctly, we probably need to add one early return if we see an exception. > I bet we have this wrong for charCodeAt and probably other functions as > well. It’s one good argument for writing more of the runtime library in > JavaScript instead of C++ long term because it’s easier to get exceptions > right that way. I thought about that, but what would be the way of checking if "this" is coercible to object in JavaScript? Comment on attachment 251123 [details]
Patch
I'll post a new patch tomorrow. The last bug you uncovered shows this need more work.
Comment on attachment 251123 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251123&action=review Please also see es6-shim's tests here: https://github.com/paulmillr/es6-shim/blob/master/test/string.js#L429-L474 > LayoutTests/js/script-tests/string-code-point-at.js:8 > +shouldBe('String.prototype.codePointAt.length', '1'); It'd be good to also assert `shouldBeEqualToString('String.prototype.codePointAt.name', 'codePointAt')` > LayoutTests/js/script-tests/string-code-point-at.js:69 > +shouldBe('"abc".codePointAt(4)', 'undefined'); Maybe also a `shouldBe('var str = "abc"; str.codePointAt(str.length)', 'undefined');`? Created attachment 251197 [details]
Patch
Ok, the exception on toString() is not a problem since the generated string is null so the remaining code does nothing. I added test coverage for the issues pointed out so far. Created attachment 251221 [details]
Patch
Comment on attachment 251221 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=251221&action=review > Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > + RELEASE_ASSERT(position < string.length()); This should not be RELEASE_ASSERT. The two call sites just below guarantee this invariant in a clear, obvious way. Maybe if this function was used in more places, but it seems like overkill given the way this is used. > Source/JavaScriptCore/runtime/StringPrototype.cpp:834 > + if (UNLIKELY(exec->hadException())) > + return JSValue::encode(jsUndefined()); You could put this after the isUInt32 check below, since we only need this check if we continue and call toInteger. Of course, you could also argue that putting it right here is best for clarity. (In reply to comment #18) > Comment on attachment 251221 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=251221&action=review > > > Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > > + RELEASE_ASSERT(position < string.length()); > > This should not be RELEASE_ASSERT. The two call sites just below guarantee > this invariant in a clear, obvious way. Maybe if this function was used in > more places, but it seems like overkill given the way this is used. I think it is a good safety net. Today the code will be eliminated by the compiler. If anyone ever modify a caller, this branch could protect the "string.characters8()[position]". Committed r183141: <http://trac.webkit.org/changeset/183141> |