WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
143934
Implement String.codePointAt()
https://bugs.webkit.org/show_bug.cgi?id=143934
Summary
Implement String.codePointAt()
Benjamin Poulain
Reported
2015-04-19 01:09:11 PDT
Implement String.codePointAt()
Attachments
Patch
(25.36 KB, patch)
2015-04-19 01:13 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(19.22 KB, patch)
2015-04-19 12:03 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(21.27 KB, patch)
2015-04-20 16:00 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(21.45 KB, patch)
2015-04-20 23:52 PDT
,
Benjamin Poulain
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Benjamin Poulain
Comment 1
2015-04-19 01:13:21 PDT
Created
attachment 251118
[details]
Patch
Anders Carlsson
Comment 2
2015-04-19 08:21:33 PDT
Comment on
attachment 251118
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251118&action=review
> Source/WTF/wtf/text/StringImpl.h:657 > + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned);
If this is only used by JSC I don't think it needs to go into WTF.
Alexey Proskuryakov
Comment 3
2015-04-19 09:51:36 PDT
Comment on
attachment 251118
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251118&action=review
> Source/WTF/wtf/text/StringImpl.cpp:371 > + // -Let first be the code unit value of the element at index position in the String S.
Delete these comments now that the code is written?
> LayoutTests/js/script-tests/string-code-point-at.js:80 > +shouldThrow('"abc".codePointAt(Symbol("WebKit"))');
Nice test suite. Curious why Symbol is so much of a special case.
> LayoutTests/js/script-tests/string-code-point-at.js:82 > +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints.
Can you reword this comment to explain what exactly doesn't work?
Yusuke Suzuki
Comment 4
2015-04-19 10:36:56 PDT
Comment on
attachment 251118
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251118&action=review
> LayoutTests/js/Object-getOwnPropertyNames-expected.txt:51 > +FAIL getSortedOwnPropertyNames(String.prototype) should be anchor,big,blink,bold,charAt,charCodeAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf. Was anchor,big,blink,bold,charAt,charCodeAt,codePointAt,concat,constructor,endsWith,fixed,fontcolor,fontsize,includes,indexOf,italics,lastIndexOf,length,link,localeCompare,match,repeat,replace,search,slice,small,split,startsWith,strike,sub,substr,substring,sup,toLocaleLowerCase,toLocaleUpperCase,toLowerCase,toString,toUpperCase,trim,trimLeft,trimRight,valueOf.
test failed. adding "charCodeAt' to test solves it :D
Darin Adler
Comment 5
2015-04-19 11:23:46 PDT
Comment on
attachment 251118
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251118&action=review
if
> Source/JavaScriptCore/runtime/StringPrototype.cpp:816 > +EncodedJSValue JSC_HOST_CALL stringProtoFuncCodePointAt(ExecState* exec)
Here’s the helper function I suggest implementing: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); if (string.is8Bit()) return string[position]; UChar32 character; U16_NEXT(string.characters16(), position, string.length(), character); return character; } Or you could skip the is8Bit check and not bother with characters16 to make this cleaner: static inline UChar32 codePointAt(const String& string, unsigned position) { ASSERT(position < string.length()); UChar32 character; U16_NEXT(string, position, string.length(), character); return character; } I’m not sure how good the compiler will be at optimizing this second case, but it might do just fine. This implements the same rule that the JavaScript specification calls for. It’s annoying how close this is to StringImpl::characterStartingAt but for some reason that function returns 0 when it’s not a valid surrogate pair.
>> Source/WTF/wtf/text/StringImpl.h:657 >> + WTF_EXPORT_STRING_API UChar32 jsCodePointAt(unsigned); > > If this is only used by JSC I don't think it needs to go into WTF.
I agree. We don’t need to put this in StringImpl.
Darin Adler
Comment 6
2015-04-19 11:28:52 PDT
Comment on
attachment 251118
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251118&action=review
>> LayoutTests/js/script-tests/string-code-point-at.js:82 >> +// The following are using special test functions because of limitations of WebKitTestRunner when handling strings with invalid codepoints. > > Can you reword this comment to explain what exactly doesn't work?
A good point. I’d like to know more.
Benjamin Poulain
Comment 7
2015-04-19 11:34:47 PDT
This reason for StringImpl + explicit implementation was the extra testing and having the spec text in code. Looks like nobody appreciates it. I'll move back to JSC.
Benjamin Poulain
Comment 8
2015-04-19 12:03:14 PDT
Created
attachment 251123
[details]
Patch
Darin Adler
Comment 9
2015-04-19 16:58:12 PDT
Comment on
attachment 251123
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251123&action=review
> Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > return JSValue::encode(jsNaN());
Strange that charCodeAt returns NaN and codePointAt returns undefined in the analogous situation.
> Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > + String string = thisValue.toWTFString(exec);
I think the fact that we convert "this" to a string before converting the argument to an integer is detectable. We should write tests using valueOf to see if so. And if so, decide whether that’s the correct order or not.
Benjamin Poulain
Comment 10
2015-04-19 17:28:14 PDT
(In reply to
comment #9
)
> Comment on
attachment 251123
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=251123&action=review
> > > Source/JavaScriptCore/runtime/StringPrototype.cpp:813 > > return JSValue::encode(jsNaN()); > > Strange that charCodeAt returns NaN and codePointAt returns undefined in the > analogous situation.
Agreed, that is just plain weird.
> > Source/JavaScriptCore/runtime/StringPrototype.cpp:832 > > + String string = thisValue.toWTFString(exec); > > I think the fact that we convert "this" to a string before converting the > argument to an integer is detectable. We should write tests using valueOf to > see if so. And if so, decide whether that’s the correct order or not.
Good point! It is the correct order. It is defined by the spec. I'll add coverage for that.
Darin Adler
Comment 11
2015-04-19 21:40:34 PDT
Oh, I thought of something else. If converting “this” to a string throws an exception, then the argument should not be converted to an integer; the exception should stop the processing before the second conversion is done. To implement this correctly, we probably need to add one early return if we see an exception. I bet we have this wrong for charCodeAt and probably other functions as well. It’s one good argument for writing more of the runtime library in JavaScript instead of C++ long term because it’s easier to get exceptions right that way.
Benjamin Poulain
Comment 12
2015-04-19 21:55:04 PDT
(In reply to
comment #11
)
> Oh, I thought of something else. > > If converting “this” to a string throws an exception, then the argument > should not be converted to an integer; the exception should stop the > processing before the second conversion is done. To implement this > correctly, we probably need to add one early return if we see an exception. > I bet we have this wrong for charCodeAt and probably other functions as > well. It’s one good argument for writing more of the runtime library in > JavaScript instead of C++ long term because it’s easier to get exceptions > right that way.
I thought about that, but what would be the way of checking if "this" is coercible to object in JavaScript?
Benjamin Poulain
Comment 13
2015-04-19 22:14:51 PDT
Comment on
attachment 251123
[details]
Patch I'll post a new patch tomorrow. The last bug you uncovered shows this need more work.
Jordan Harband
Comment 14
2015-04-20 14:52:20 PDT
Comment on
attachment 251123
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251123&action=review
Please also see es6-shim's tests here:
https://github.com/paulmillr/es6-shim/blob/master/test/string.js#L429-L474
> LayoutTests/js/script-tests/string-code-point-at.js:8 > +shouldBe('String.prototype.codePointAt.length', '1');
It'd be good to also assert `shouldBeEqualToString('String.prototype.codePointAt.name', 'codePointAt')`
> LayoutTests/js/script-tests/string-code-point-at.js:69 > +shouldBe('"abc".codePointAt(4)', 'undefined');
Maybe also a `shouldBe('var str = "abc"; str.codePointAt(str.length)', 'undefined');`?
Benjamin Poulain
Comment 15
2015-04-20 16:00:54 PDT
Created
attachment 251197
[details]
Patch
Benjamin Poulain
Comment 16
2015-04-20 16:02:07 PDT
Ok, the exception on toString() is not a problem since the generated string is null so the remaining code does nothing. I added test coverage for the issues pointed out so far.
Benjamin Poulain
Comment 17
2015-04-20 23:52:56 PDT
Created
attachment 251221
[details]
Patch
Darin Adler
Comment 18
2015-04-21 08:48:32 PDT
Comment on
attachment 251221
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=251221&action=review
> Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > + RELEASE_ASSERT(position < string.length());
This should not be RELEASE_ASSERT. The two call sites just below guarantee this invariant in a clear, obvious way. Maybe if this function was used in more places, but it seems like overkill given the way this is used.
> Source/JavaScriptCore/runtime/StringPrototype.cpp:834 > + if (UNLIKELY(exec->hadException())) > + return JSValue::encode(jsUndefined());
You could put this after the isUInt32 check below, since we only need this check if we continue and call toInteger. Of course, you could also argue that putting it right here is best for clarity.
Benjamin Poulain
Comment 19
2015-04-22 14:04:18 PDT
(In reply to
comment #18
)
> Comment on
attachment 251221
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=251221&action=review
> > > Source/JavaScriptCore/runtime/StringPrototype.cpp:818 > > + RELEASE_ASSERT(position < string.length()); > > This should not be RELEASE_ASSERT. The two call sites just below guarantee > this invariant in a clear, obvious way. Maybe if this function was used in > more places, but it seems like overkill given the way this is used.
I think it is a good safety net. Today the code will be eliminated by the compiler. If anyone ever modify a caller, this branch could protect the "string.characters8()[position]".
Benjamin Poulain
Comment 20
2015-04-22 15:26:33 PDT
Committed
r183141
: <
http://trac.webkit.org/changeset/183141
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug