Bug 131704

Summary: Simple ES6 feature:String prototype additions
Product: WebKit Reporter: Oliver Hunt <oliver>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, buildbot, cmarcelo, commit-queue, darin, dpino, ggaren, m.goleb+bugzilla, oliver, rniwa, webkit-bug-importer
Priority: P2 Keywords: EasyFix, InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 142350    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from webkit-ews-03 for mac-mountainlion
none
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Oliver Hunt 2014-04-15 15:04:31 PDT
String.prototype.startsWith
String.prototype.endsWith
String.prototype.contains
String.prototype.toArray
Comment 1 Radar WebKit Bug Importer 2014-04-15 15:04:49 PDT
<rdar://problem/16626077>
Comment 2 Diego Pino 2014-08-18 03:29:49 PDT
Created attachment 236753 [details]
Patch
Comment 3 Diego Pino 2014-08-18 03:31:52 PDT
String.prototype.toArray is no longer part of the ES6 draft (http://people.mozilla.org/~jorendorff/es6-draft.html).
Comment 4 Build Bot 2014-08-18 04:54:51 PDT
Comment on attachment 236753 [details]
Patch

Attachment 236753 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/6691662772305920

New failing tests:
js/Object-getOwnPropertyNames.html
fast/loader/redirect-to-invalid-url-using-javascript-disallowed-after-load.html
Comment 5 Build Bot 2014-08-18 04:54:54 PDT
Created attachment 236757 [details]
Archive of layout-test-results from webkit-ews-03 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-03  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 6 Build Bot 2014-08-18 05:57:18 PDT
Comment on attachment 236753 [details]
Patch

Attachment 236753 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/6400938583523328

New failing tests:
js/Object-getOwnPropertyNames.html
fast/loader/redirect-to-invalid-url-using-javascript-disallowed-after-load.html
Comment 7 Build Bot 2014-08-18 05:57:21 PDT
Created attachment 236760 [details]
Archive of layout-test-results from webkit-ews-07 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-07  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 8 Diego Pino 2014-08-18 06:44:27 PDT
Created attachment 236762 [details]
Patch
Comment 9 Geoffrey Garen 2014-08-18 11:30:43 PDT
Comment on attachment 236762 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236762&action=review

Looks pretty good, but I think I see some bugs.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1561
> +    if (a0.isObject() && a0.inherits(RegExpObject::info()))

jsDynamicCast<RegExpObject> is the best way to do this.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1564
> +    String str = exec->thisValue().toString(exec)->value(exec);

You need to do some kind of type check on exec->thisValue(), since it can be anything, including an environment record (JSActivation), which you should not use directly. Perhaps the specification calls for something like "checkObjectCoercible"?

You should add a test for the case where this function is called with an environment record as 'this'. You can do so by doing something like "(function() { var f = String.prototype.startsWith; (function() { startsWith("a"); })(); })()".

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1582
> +        if (str.characterAt(i) != searchStr.characterAt(j))

You should use bracket access instead, as the WTFString.h header indicates:

    // Workaround for a compiler bug. Use operator[] instead.
    UChar characterAt(unsigned index) const

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1588
> +EncodedJSValue JSC_HOST_CALL stringProtoFuncEndsWith(ExecState* exec)

Same comments as above.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1623
> +EncodedJSValue JSC_HOST_CALL stringProtoFuncContains(ExecState* exec)

Ditto.
Comment 10 Darin Adler 2014-08-18 12:44:36 PDT
Comment on attachment 236762 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236762&action=review

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1579
> +    uint32_t start = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);
> +
> +    if (searchLength + start > len)
> +        JSValue::encode(jsBoolean(false));
> +
> +    if (!searchLength)
> +        return JSValue::encode(jsBoolean(!len));

We likely don’t need this code if we use WTF::String::startsWith.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1581
> +    for (uint32_t i = start, j = 0; j < searchLength; i++, j++) {

Why do this a character at a time instead of using WTF::String::startsWith? If WTF::String::startsWith is missing some features, we could add them.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1613
> +    uint32_t end = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);
> +    int32_t start = end - searchLength;
> +
> +    if (start < 0)
> +        return JSValue::encode(jsBoolean(false));
> +
> +    if (!searchLength)
> +        return JSValue::encode(jsBoolean(!len));

We likely don’t need this code if we use WTF::String::endsWith.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1616
> +    for (uint32_t j = 0; j < searchLength; i++, j++) {

Why do this a character at a time instead of using WTF::String::endsWith? If WTF::String::endsWith is missing some features, we could add them.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1644
> +    uint32_t start = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);
> +
> +    if (!searchLength)
> +        return JSValue::encode(jsBoolean(!len));

We likely don’t need this code if we use WTF::String::contains.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1646
> +    for (uint32_t i = start; i < len; i++) {

Why do this a character at a time instead of using WTF::String::contains? If WTF::String::contains is missing some features, we could add them.
Comment 11 Diego Pino 2014-08-18 16:40:19 PDT
Comment on attachment 236762 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236762&action=review

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1581
>> +    for (uint32_t i = start, j = 0; j < searchLength; i++, j++) {
> 
> Why do this a character at a time instead of using WTF::String::startsWith? If WTF::String::startsWith is missing some features, we could add them.

The reason is that WTF::String::startsWith doesn't support a starting position, same as endsWith (doesn't support an endPosition) and contains (doesn't support a starting position). I can add new methods to WTF:String for supporting a second parameter indicating position for these methods and rely the implementation of String.prototype.startsWith on WTF:String::startsWith (same for endsWith and contains).
Comment 12 Diego Pino 2014-08-18 23:05:13 PDT
(In reply to comment #11)
> (From update of attachment 236762 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=236762&action=review
> 
> >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1581
> >> +    for (uint32_t i = start, j = 0; j < searchLength; i++, j++) {
> > 
> > Why do this a character at a time instead of using WTF::String::startsWith? If WTF::String::startsWith is missing some features, we could add them.
> 
> The reason is that WTF::String::startsWith doesn't support a starting position, same as endsWith (doesn't support an endPosition) and contains (doesn't support a starting position). I can add new methods to WTF:String for supporting a second parameter indicating position for these methods and rely the implementation of String.prototype.startsWith on WTF:String::startsWith (same for endsWith and contains).

Forget about this comment, I can substring 'str' by 'pos' and use the methods of WTF::String. No need to add new methods to WTF::String.
Comment 13 Diego Pino 2014-08-19 01:39:05 PDT
Created attachment 236806 [details]
Patch
Comment 14 Diego Pino 2014-08-19 01:45:56 PDT
Fixes:

   * Use checkObjectCoercible. Added test.
   * Cast to regexp using jsDynamicCast<RegExpObject*>. Added test.
   * Implement startsWith, endsWith and contains using equivalent functions in WTF::String.
Comment 15 Geoffrey Garen 2014-08-19 13:15:56 PDT
Comment on attachment 236806 [details]
Patch

r=me
Comment 16 WebKit Commit Bot 2014-08-19 13:18:35 PDT
Comment on attachment 236806 [details]
Patch

Rejecting attachment 236806 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.appspot.com', '--bot-id=webkit-cq-02', 'apply-attachment', '--no-update', '--non-interactive', 236806, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Last 500 characters of output:
 #1 succeeded at 1 with fuzz 3.
patching file LayoutTests/js/Object-getOwnPropertyNames-expected.txt
patching file LayoutTests/js/script-tests/Object-getOwnPropertyNames.js
patching file LayoutTests/js/script-tests/string-contains.js
patching file LayoutTests/js/string-contains-expected.txt
patching file LayoutTests/js/string-contains.html

Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'Geoffrey Garen']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Full output: http://webkit-queues.appspot.com/results/6394734234828800
Comment 17 Geoffrey Garen 2014-08-19 13:23:18 PDT
Comment on attachment 236806 [details]
Patch

Looks like this didn't merge cleanly.
Comment 18 Darin Adler 2014-08-19 20:27:19 PDT
(In reply to comment #12)
> Forget about this comment, I can substring 'str' by 'pos' and use the methods of WTF::String.

That’s not a good strategy. Calling String::substring allocates a new string every time, so it’s really slow. We need to either add additional overloads or arguments to String member functions to let us deal with substrings without computing them, or use StringView::substring and StringView member functions instead.
Comment 19 Diego Pino 2014-08-25 07:05:04 PDT
Created attachment 237082 [details]
Patch
Comment 20 Diego Pino 2014-08-25 07:14:39 PDT
I added new methods to WTF::String to support startsWith, endsWith and contains supporting a position parameter.

In the case of contains, it was easy as the underlying implementation already supported this parameter but it was always set to 0.

For the cases of startsWith and endWith I added new methods. I added a new implementation of StringImpl::equalInner to support endOffset and do the comparison using StringImpl, instead of const char*, which is what the other implementation of equalInner uses.

I believe that some of the code for supporting the different implementations of startsWith and endsWith could be refactored, but I think that makes sense to do it in a different patch.
Comment 21 Darin Adler 2014-08-25 12:13:46 PDT
Comment on attachment 237082 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237082&action=review

Good approach, but there are some mistakes here, so it’s not ready to go yet.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1564
> +    if (jsDynamicCast<RegExpObject*>(a0))

I believe we should only use jsDynamicCast when we want a pointer. If we are just looking for a boolean predicate, then we should just use a0.inherits(RegExpObject::info()).

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1570
> +    String str = thisValue.toString(exec)->value(exec);

I know the other functions in this file are using this kind of shorthand, but there is little value in using “str” instead of “string” for this local variable, or maybe even stringToSearchIn.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1571
> +    uint32_t len = str.length();

Same thing about “len” vs. “length”. But also, this variable isn’t needed at all.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1573
> +    String searchStr = a0.toString(exec)->value(exec);

And “searchStr” vs. “searchString”. But also, search string could either be the string we are searching "in" or the string we are searching "for". I’d use a name like “targetString” or “stringToSearchFor”.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1576
> +    JSValue a1 = exec->argument(1);
> +    int32_t pos = a1.isUndefined() ? 0 : a1.toInt32(exec);

The check for isUndefined here is unneeded. toInt32 will already yield zero for undefined as well as for null and quite a few other types. The code should just say:

    int position = exec->argument(1).toInt32(exec);

Or even better:

    unsigned start = std::max(0, exec->argument(1).toInt32(exec));

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1578
> +    uint32_t start = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);

This call to std::min is not needed: String::startsWith already will handle that case.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1580
> +    return JSValue::encode(jsBoolean(str.startsWith(searchStr, start, true)));

We should make caseSensitive default to true and not pass it explicitly here. There should be far fewer local variables here.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1583
> +EncodedJSValue JSC_HOST_CALL stringProtoFuncEndsWith(ExecState* exec)

Same comments apply here.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1609
> +EncodedJSValue JSC_HOST_CALL stringProtoFuncContains(ExecState* exec)

Same comments apply here.

> Source/WTF/wtf/text/StringImpl.cpp:1373
> +    ASSERT(stringImpl);

If we are asserting a pointer is not null then it should be a reference, not a pointer. Both stringImpl and matchString should be of type StringImpl&. There is no need to mark it const, because a StringImpl is intrinsically const. The argument names don’t need “impl” in their names.

> Source/WTF/wtf/text/StringImpl.cpp:1375
> +    ASSERT(startOffset + matchString->length() <= stringImpl->length());

This can overflow. It should be written so it can’t overflow.

> Source/WTF/wtf/text/StringImpl.cpp:1379
> +    if (endOffset >= stringImpl->length())
> +        endOffset = stringImpl->length();

It’s strange that this function does this work for endOffset, but not startOffset, where it instead asserts.

> Source/WTF/wtf/text/StringImpl.cpp:1382
> +        const LChar* str = stringImpl->characters8() + startOffset;

Please don’t name a character pointer variable “str”. I think characters is a good name for this.

> Source/WTF/wtf/text/StringImpl.cpp:1384
> +        if (endOffset < stringImpl->length())
> +            memcpy((LChar*) str, (LChar*) str, endOffset);

This code is nonsense and does nothing. What were we trying to do here?

> Source/WTF/wtf/text/StringImpl.cpp:1391
> +    if (endOffset < stringImpl->length())
> +        memcpy((UChar*) str, (UChar*) str, endOffset);

This code is nonsense and does nothing. What were we trying to do here?

> Source/WTF/wtf/text/StringImpl.cpp:1431
> +    if (matchString->length() + startOffset > length())
> +        return false;

This can overflow. It needs to be rewritten in a way that cannot overflow.

> Source/WTF/wtf/text/StringImpl.cpp:1461
> +    ASSERT(matchLength);

How does this compile? There is nothing named matchLength.

> Source/WTF/wtf/text/StringImpl.cpp:1462
> +    int startOffset = endOffset - matchString->length();

This can overflow. It needs to be written in a way that cannot overflow.

> Source/WTF/wtf/text/StringImpl.h:676
> +    WTF_EXPORT_STRING_API bool startsWith(const StringImpl *, unsigned startOffset, bool caseSensitive) const;

Argument type should be StringImpl&, not const StringImpl*.

> Source/WTF/wtf/text/StringImpl.h:683
> +    WTF_EXPORT_STRING_API bool endsWith(const StringImpl *, unsigned endOffset, bool caseSensitive) const;

Argument type should be StringImpl&, not const StringImpl*.

> Source/WTF/wtf/text/WTFString.h:260
> +    bool contains(const LChar* str, bool caseSensitive = true, unsigned startOffset = 0) const { return find(str, startOffset, caseSensitive) != notFound; }
> +    bool contains(const String& str, bool caseSensitive = true, unsigned startOffset = 0) const { return find(str, startOffset, caseSensitive) != notFound; }

We don’t want the case sensitive boolean after the start offset. You should make a separate overload to add the start offset rather than tacking it on the end.

> Source/WTF/wtf/text/WTFString.h:271
> +    bool startsWith(const String& s, unsigned startOffset, bool caseSensitive)

Should default case sensitive to true.

> Source/WTF/wtf/text/WTFString.h:282
> +    bool endsWith(const String& s, unsigned endOffset, bool caseSensitive) const

Should default case sensitive to true.

> Source/WTF/wtf/text/WTFString.h:283
> +    { return m_impl ? m_impl->endsWith(s.impl(), endOffset, caseSensitive) : false; }

Incorrect indentation.
Comment 22 Diego Pino 2014-08-25 15:41:03 PDT
Comment on attachment 237082 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237082&action=review

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1564
>> +    if (jsDynamicCast<RegExpObject*>(a0))
> 
> I believe we should only use jsDynamicCast when we want a pointer. If we are just looking for a boolean predicate, then we should just use a0.inherits(RegExpObject::info()).

Thanks for the extensive review.

Originally I did that but Geoffrey suggested me to use jsDynamicCast instead. See comment #9.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1570
>> +    String str = thisValue.toString(exec)->value(exec);
> 
> I know the other functions in this file are using this kind of shorthand, but there is little value in using “str” instead of “string” for this local variable, or maybe even stringToSearchIn.

OK

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1571
>> +    uint32_t len = str.length();
> 
> Same thing about “len” vs. “length”. But also, this variable isn’t needed at all.

OK. The variable is used in line 1578, although could be replaced directly as it's the only line where is used.

> uint32_t start = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1573
>> +    String searchStr = a0.toString(exec)->value(exec);
> 
> And “searchStr” vs. “searchString”. But also, search string could either be the string we are searching "in" or the string we are searching "for". I’d use a name like “targetString” or “stringToSearchFor”.

What about "matchString"? It is what StringImpl uses and it will be consistent. If not, "stringToSearchFor" sounds good to me.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1576
>> +    int32_t pos = a1.isUndefined() ? 0 : a1.toInt32(exec);
> 
> The check for isUndefined here is unneeded. toInt32 will already yield zero for undefined as well as for null and quite a few other types. The code should just say:
> 
>     int position = exec->argument(1).toInt32(exec);
> 
> Or even better:
> 
>     unsigned start = std::max(0, exec->argument(1).toInt32(exec));

OK.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1578
>> +    uint32_t start = std::min(static_cast<uint32_t>(std::max(pos, 0)), len);
> 
> This call to std::min is not needed: String::startsWith already will handle that case.

OK.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1580
>> +    return JSValue::encode(jsBoolean(str.startsWith(searchStr, start, true)));
> 
> We should make caseSensitive default to true and not pass it explicitly here. There should be far fewer local variables here.

OK.

>> Source/WTF/wtf/text/StringImpl.cpp:1373
>> +    ASSERT(stringImpl);
> 
> If we are asserting a pointer is not null then it should be a reference, not a pointer. Both stringImpl and matchString should be of type StringImpl&. There is no need to mark it const, because a StringImpl is intrinsically const. The argument names don’t need “impl” in their names.

OK, agree. My doubt is that many other methods in this file (the several implementations of startsWith and endsWith, for instance) use "const StringImpl*" as datatype and follow the same naming convention for "stringImpl". I think that maybe is valuable to use the same datatype and naming convention for keeping consistency, if not I can change it. WDYT?

>> Source/WTF/wtf/text/StringImpl.cpp:1375
>> +    ASSERT(startOffset + matchString->length() <= stringImpl->length());
> 
> This can overflow. It should be written so it can’t overflow.

OK.

>> Source/WTF/wtf/text/StringImpl.cpp:1382
>> +        const LChar* str = stringImpl->characters8() + startOffset;
> 
> Please don’t name a character pointer variable “str”. I think characters is a good name for this.

OK.

>> Source/WTF/wtf/text/StringImpl.cpp:1384
>> +            memcpy((LChar*) str, (LChar*) str, endOffset);
> 
> This code is nonsense and does nothing. What were we trying to do here?

What I was trying to do was to shrink "str", being its new length "endOffset" characters. But yes, you are correct, this code has no effect. So I was wondering how is that this still working? Well, the reason is that the value of "endOffset" is implicitly set in "startOffset". In StringImpl:endsWith:

> int startOffset = endOffset - matchString->length();
> if (startOffset < 0)
>    return false;

> return equalInner(this, startOffset, endOffset, matchString, caseSensitive);

So basically the parameter "endOffset" in this method is useless. There's another "equalInner" implementation in this class. It doesn't has a "endOffset" parameter and "matchString" is "const char*" instead of "StringImpl *". I guess I will use this method instead, and remove the one I added.

>> Source/WTF/wtf/text/StringImpl.cpp:1431
>> +        return false;
> 
> This can overflow. It needs to be rewritten in a way that cannot overflow.

OK.

>> Source/WTF/wtf/text/StringImpl.cpp:1461
>> +    ASSERT(matchLength);
> 
> How does this compile? There is nothing named matchLength.

True. AFAIK, ASSERTs are removed in shipping builds, it could be that. I added a new ASSERT with a non-existent variable and compiled :/ In any case, I'd expect this failed.

>> Source/WTF/wtf/text/StringImpl.h:676
>> +    WTF_EXPORT_STRING_API bool startsWith(const StringImpl *, unsigned startOffset, bool caseSensitive) const;
> 
> Argument type should be StringImpl&, not const StringImpl*.

OK.

>> Source/WTF/wtf/text/WTFString.h:260
>> +    bool contains(const String& str, bool caseSensitive = true, unsigned startOffset = 0) const { return find(str, startOffset, caseSensitive) != notFound; }
> 
> We don’t want the case sensitive boolean after the start offset. You should make a separate overload to add the start offset rather than tacking it on the end.

OK.

>> Source/WTF/wtf/text/WTFString.h:271
>> +    bool startsWith(const String& s, unsigned startOffset, bool caseSensitive)
> 
> Should default case sensitive to true.

OK.

>> Source/WTF/wtf/text/WTFString.h:283
>> +    { return m_impl ? m_impl->endsWith(s.impl(), endOffset, caseSensitive) : false; }
> 
> Incorrect indentation.

OK.
Comment 23 Darin Adler 2014-08-25 15:55:16 PDT
Comment on attachment 237082 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237082&action=review

>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1564
>>> +    if (jsDynamicCast<RegExpObject*>(a0))
>> 
>> I believe we should only use jsDynamicCast when we want a pointer. If we are just looking for a boolean predicate, then we should just use a0.inherits(RegExpObject::info()).
> 
> Thanks for the extensive review.
> 
> Originally I did that but Geoffrey suggested me to use jsDynamicCast instead. See comment #9.

OK, Geoff is the authority on this, so go with what he said.

>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1573
>>> +    String searchStr = a0.toString(exec)->value(exec);
>> 
>> And “searchStr” vs. “searchString”. But also, search string could either be the string we are searching "in" or the string we are searching "for". I’d use a name like “targetString” or “stringToSearchFor”.
> 
> What about "matchString"? It is what StringImpl uses and it will be consistent. If not, "stringToSearchFor" sounds good to me.

Either of those is OK with me.

>>> Source/WTF/wtf/text/StringImpl.cpp:1373
>>> +    ASSERT(stringImpl);
>> 
>> If we are asserting a pointer is not null then it should be a reference, not a pointer. Both stringImpl and matchString should be of type StringImpl&. There is no need to mark it const, because a StringImpl is intrinsically const. The argument names don’t need “impl” in their names.
> 
> OK, agree. My doubt is that many other methods in this file (the several implementations of startsWith and endsWith, for instance) use "const StringImpl*" as datatype and follow the same naming convention for "stringImpl". I think that maybe is valuable to use the same datatype and naming convention for keeping consistency, if not I can change it. WDYT?

Using & instead of * for things that are never null is newly the style in WebKit this year, so older code would be using * and newer &. As far as const, it’s an easy mistake to make. Someone might not realize that StringImpl is an immutable class.

>>> Source/WTF/wtf/text/StringImpl.cpp:1375
>>> +    ASSERT(startOffset + matchString->length() <= stringImpl->length());
>> 
>> This can overflow. It should be written so it can’t overflow.
> 
> OK.

The idiom here for not overflowing is:

    ASSERT(startOffset <= stringImpl->length());
    ASSERT(matchString->length() <= stringImpl->length());
    ASSERT(startOffset + matchString->length() <= stringImpl->length());
Comment 24 Diego Pino 2014-08-26 13:13:27 PDT
Created attachment 237169 [details]
Patch
Comment 25 Diego Pino 2014-08-26 13:31:57 PDT
I applied the suggestions pointed out in the last review.

There's a change I was not able to do, which is setting a default value for argument "caseSensitive". The problem is that the signature of the new added methods collide with other existing methods, and although the compiler is able to solve them there are many annoying Warning messages during the build. More precisely, the problem is the following:

bool startsWith(const String& s, bool caseSensitive) const;
bool startsWith(String& s, unsigned startOffset, bool caseSensitive = true) const;

A call like "consumedString.startsWith(string, caseSensitive)" matches both signatures because the compiler can do an internal casting of boolean to int, although for the commented call it picks the first method.

The warning message would be something like:

../../Source/WebCore/platform/text/SegmentedString.h:397:60: warning: ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second: [enabled by default]
         if (consumedString.startsWith(string, caseSensitive))
                                                            ^
In file included from ../../Source/WTF/wtf/text/AtomicString.h:26:0,
                 from ../../Source/WebCore/dom/EventListenerMap.h:39,
                 from ../../Source/WebCore/dom/EventTarget.h:35,
                 from ../../Source/WebCore/dom/Node.h:29,
                 from ../../Source/WebCore/dom/ContainerNode.h:28,
                 from ../../Source/WebCore/dom/DocumentFragment.h:27,
                 from ../../Source/WebCore/dom/DocumentFragment.cpp:24:
../../Source/WTF/wtf/text/WTFString.h:266:10: note: candidate 1: bool WTF::String::startsWith(const WTF::String&, bool) const
     bool startsWith(const String& s, bool caseSensitive) const
          ^
../../Source/WTF/wtf/text/WTFString.h:273:10: note: candidate 2: bool WTF::String::startsWith(const WTF::String&, unsigned int, bool)
     bool startsWith(const String& s, unsigned startOffset, bool caseSensitive = true)

Possible solutions:

   * Name the new methods differently so their signature doesn't collide with other existing methods.
   * Refactor the code to reduce ambiguity, although it could be a big refactoring that may affect many files. I think that if we are opting for this solution, it should be a different patch.

To avoid similar problems in "contains" I opted for the solution suggested on the last patch, which is adding an extra paratemer to the existing "contains" method. What I don't like much of this solution is that ends up with "contains" having a different signature than "endsWith" and "startsWith".

bool startsWith(String& s, unsigned startOffset, bool caseSensitive);
bool endsWith(String& s, unsigned endOffset, bool caseSensitive);
bool contains(const String& str, bool caseSensitive = true, unsigned startOffset = 0);
Comment 26 Darin Adler 2014-08-26 20:37:32 PDT
Comment on attachment 237169 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237169&action=review

Thanks for working so hard on this. Sorry to keep coming up with new things, but I realized a few issues i had overlooked earlier.

Also, this overloading problem is a bad thing to introduce into the String class. We really don’t want to avoid it by simply having mysterious additional "true" arguments at every call site; it’s simply an ugly pattern. I think the direction we will want to go to clean this up is that instead of adding more functions to String, we will want to add and use StringView::contains/startsWith/endsWith and use StringView::substring to restrict the ranges of the searches. When we do that we will then get rid of these new overloads we just added. I suppose it’s OK to land them like this for now, but since I do see some difficulty here in writing those functions correctly without overflow problems we might want to just write the StringView ones instead.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1571
> +    String stringToSearchIn = thisValue.toString(exec)->value(exec);
> +    String matchString = a0.toString(exec)->value(exec);

After each of these lines we need:

    if (exec->hadException())
        return JSValue::encode(jsUndefined());

That’s because toString can end up calling valueOf, which can raise an exception. It’s tricky to construct test cases that cover this, but it can be done by constructing multiple objects with valueOf functions, one that raises an exception and others that have side effects. The trick is that if one valueOf function raises an exception we can’t be seen calling the second one or the third one, since toInt32 can also end up calling valueOf. The test needs to check that the side effects of the subsequent valueOf functions do not occur when an earlier one raises an exception. We have very little test coverage for this particular fine point of getting these functions right. Making them work properly is relatively straightforward as long as we are careful to evaluate the arguments in the correct order and return if an exception occurs. Strictly speaking we also want to decide if thisValue is converted to a string before or after we check the type of argument 0.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1590
> +    String stringToSearchIn = thisValue.toString(exec)->value(exec);

Needs a hadException return statement as above.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1593
> +    String matchString = a0.toString(exec)->value(exec);

Needs a hadException return statement as above.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1615
> +    String stringToSearchIn = thisValue.toString(exec)->value(exec);

Needs a hadException return statement as above.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1616
> +    String matchString = a0.toString(exec)->value(exec);

Needs a hadException return statement as above.

> Source/WTF/wtf/text/StringImpl.cpp:1405
> +    ASSERT(matchString);

This assertion is incorrect and should be removed. References can’t be null. I’m not sure why this compiles.

> Source/WTF/wtf/text/StringImpl.cpp:1407
> +    if (matchString.length() + startOffset > length())
> +        return false;

This can overflow. It needs to be written so it is not subject to overflow. For example, matchString can be 1 character long and startOffset can be 0xFFFFFFFF. Then we’ll hit the assertion inside equalInner. Please write the code in a way that is not susceptible to overflow.

> Source/WTF/wtf/text/StringImpl.cpp:1409
> +    return equalInner(this, startOffset, reinterpret_cast<const char*>(matchString.characters8()), 
> +        matchString.length(), caseSensitive);

This won’t work if the string is not an 8-bit string.

> Source/WTF/wtf/text/StringImpl.cpp:1438
> +    ASSERT(matchString);

This assertion is incorrect and should be removed. References can’t be null. I’m not sure why this compiles.

> Source/WTF/wtf/text/StringImpl.cpp:1441
> +    int startOffset = endOffset - matchString.length();
> +    if (startOffset < 0)
> +        return false;

This can overflow. It needs to be written so it is not subject to overflow. For example, matchString can be 1 character long and endOffset can be 0x80000000. Then startOffset will be 0x7FFFFFFF and we’ll hit the assertion inside equalInner. Please write the code in a way that is not susceptible to overflow.

> Source/WTF/wtf/text/StringImpl.cpp:1443
> +    return equalInner(this, startOffset, reinterpret_cast<const char*>(matchString.characters8()), 
> +        matchString.length(), caseSensitive);

This won’t work if the string is not an 8-bit string.

> LayoutTests/js/script-tests/string-contains.js:1
> +description("This test checks the ES6 string functions startsWith(), endsWith() and contains().");

We need more edge cases here. These tests would miss all the problems with the range checking overflow in the functions.
Comment 27 Diego Pino 2014-08-29 03:08:39 PDT
Created attachment 237340 [details]
Patch
Comment 28 Diego Pino 2014-08-29 03:35:32 PDT
Comment on attachment 237340 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237340&action=review

> Source/WTF/wtf/text/StringImpl.cpp:1454
> +    ASSERT(!(endOffset <= matchString.length() ^ startOffset <= 0));

This reads as if 'endOffset' is greater or equals than 'matchLength', 'startOffset' should be positive or zero. On the other hand, if 'endOffset' is lower or equals than 'matchLength', 'startOffset' should be negative.

> LayoutTests/js/script-tests/string-contains.js:56
> +sideEffect = 0;

It happened that offset.valueOf() was executed, so instead of making matchString.valueOf() to launch an exception, I did it the other way around. The offset.valueOf() launches an exception and check matchString.valueOf() was not run. I also changed the body of the JavaScript functions so arg(1) (offset) is fetched before arg(0) (matchString).

> LayoutTests/js/string-contains-expected.txt:42
> +PASS 'foo bar'.contains('bar', 0x7fffffff) is false

I didn't know what's the best way to validate an ASSERT. All functions return 0 when value of startOffset is negative. Now looking at it, startsWith and contains are returning false because the offset is larger than the length of the string. In the case of endsWith I think is working as 0x80000010 - 3 = 0x80000007 which is a negative number in int32_t.

I'm not sure about the idiom for checking for an overflow in startsWith and contains. I think it should check that the sum of the two values should be greater than any of them:

ASSERT(startOffset + matchString.length() >= startOffset);
Comment 29 Diego Pino 2014-08-29 03:40:07 PDT
(In reply to comment #26)

> Also, this overloading problem is a bad thing to introduce into the String class. We really don’t want to avoid it by simply having mysterious additional "true" arguments at every call site; it’s simply an ugly pattern. I think the direction we will want to go to clean this up is that instead of adding more functions to String, we will want to add and use StringView::contains/startsWith/endsWith and use StringView::substring to restrict the ranges of the searches. When we do that we will then get rid of these new overloads we just added. I suppose it’s OK to land them like this for now, but since I do see some difficulty here in writing those functions correctly without overflow problems we might want to just write the StringView ones instead.

I couldn't find similar functions in StringView (contains/startsWith/endsWith), there's a contain, but only works for one char, that's why I stick with adding new functions to WTF::String. Maybe I didn't look in the right place. I agree with your reasoning, I don't like very much either the mandatory caseSensitive flag.
Comment 30 Darin Adler 2014-08-29 09:18:24 PDT
(In reply to comment #29)
> (In reply to comment #26)
> 
> > Also, this overloading problem is a bad thing to introduce into the String class. We really don’t want to avoid it by simply having mysterious additional "true" arguments at every call site; it’s simply an ugly pattern. I think the direction we will want to go to clean this up is that instead of adding more functions to String, we will want to add and use StringView::contains/startsWith/endsWith and use StringView::substring to restrict the ranges of the searches. When we do that we will then get rid of these new overloads we just added. I suppose it’s OK to land them like this for now, but since I do see some difficulty here in writing those functions correctly without overflow problems we might want to just write the StringView ones instead.
> 
> I couldn't find similar functions in StringView (contains/startsWith/endsWith)

Correct. I said “add and use”, not “find and use”.
Comment 31 Darin Adler 2014-08-29 09:54:12 PDT
Comment on attachment 237340 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237340&action=review

I appreciate your continued work on this. It’s almost right, but there are various things wrong.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1565
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

Is the ordering right here? Does this need to be done before the checking of the argument?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1572
> +    if (a0.isUndefinedOrNull())
> +        return JSValue::encode(jsBoolean(false));

This is new. Is it correct? Is a null value supposed to turn into a false result or turn into the string "null"? What about undefined? What about a missing argument? What does the specification say?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1578
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

It’s OK to include these two lines of code, but they are not needed. There is no way for the JavaScript code to detect whether we do the startsWith or not. The return value is undefined if we have an exception.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1591
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

Is the ordering right here? Does this need to be done before the checking of the argument?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1598
> +    if (a0.isUndefinedOrNull())
> +        return JSValue::encode(jsBoolean(false));

Same question as above.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1607
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

Same point about not needing these two lines of code.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1627
> +    if (a0.isUndefinedOrNull())
> +        return JSValue::encode(jsBoolean(false));

Same question again.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1633
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

Same point about not needing these two lines of code.

> Source/WTF/wtf/text/StringImpl.cpp:1414
> +    ASSERT(startOffset <= length());
> +    ASSERT(matchString.length() <= length());
> +    ASSERT(startOffset + matchString.length() <= length());

It’s not right to assert this. All the other functions in this class clamp these values. They don’t treat it as illegal to pass a large value. Putting an assertion in doesn’t fix the overflow issue.

> Source/WTF/wtf/text/StringImpl.cpp:1417
> +    if (matchString.length() + startOffset > length())
> +        return false;

Here’s how we write this and avoid overflow:

    if (startOffset > length())
        return false;
    if (matchString.length() > length())
        return false;
    if (matchString.length() + startOffset > length())
        return false;

No assertions. But also, I suggest we put these checks into equalInner rather than here. This function can just be a forwarder to equalInner.

> Source/WTF/wtf/text/StringImpl.cpp:1423
> +    return equalInner(this, startOffset, reinterpret_cast<const char*>(matchString.characters16()), 
> +        matchString.length(), caseSensitive);

This is wrong. We can’t just cast a run of 16-bit characters to a run of 8-bit characters. The code won’t compare all the characters! And it will convert individual bytes of the characters to lowercase if asked to not be case sensitive.

> Source/WTF/wtf/text/StringImpl.cpp:1452
> +    int startOffset = endOffset - matchString.length();

I am still not sure that I understand what an end offset is, so it’s really hard for me to understand what this function is trying to do.

But this is not the right way to write this to correctly range check and avoid overflow. We should instead just write this:

    if (endOffset > matchString.length())
        return false;
    unsigned startOffset = matchString.length() - endOffset;

Don’t involve signed arithmetic at all.

And we will need these checks:

    if (startOffset > length())
        return false;
    if (matchString.length() > length())
        return false;
    if (matchString.length() + startOffset > length())
        return false;

But I suggest putting them inside equalInner.

>> Source/WTF/wtf/text/StringImpl.cpp:1454
>> +    ASSERT(!(endOffset <= matchString.length() ^ startOffset <= 0));
> 
> This reads as if 'endOffset' is greater or equals than 'matchLength', 'startOffset' should be positive or zero. On the other hand, if 'endOffset' is lower or equals than 'matchLength', 'startOffset' should be negative.

Fine, but it’s not appropriate to assert these things.

> Source/WTF/wtf/text/StringImpl.cpp:1457
> +    if (startOffset < 0)
> +        return false;

Please just leave this out.

> Source/WTF/wtf/text/StringImpl.cpp:1463
> +    return equalInner(this, startOffset, reinterpret_cast<const char*>(matchString.characters16()),
> +        matchString.length(), caseSensitive);

Again, wrong handling of 16-bit characters, and not enough test coverage to see that’s what’s happening.

> LayoutTests/js/script-tests/string-contains.js:52
> +// If int32.valueOf() is called, string.valueOf() should not be called.

These tests are on the right track, but here’s what I would suggest:

Make three objects that all have valueOf on them (for the string itself, the offset, and the match string), each with separate side effects (appending to a string is one good way to do it) and then have a test to make sure the side effects all happen in the right order. For example, the use "A", "B", and "C" and make sure the side effects string is "ABC".

Then for each of the three objects, replace with one that raises an exception and make sure that the exception is propagated properly and only the side effects that are supposed to come first happen. So you’d get an exception and the string is empty, then "A", then "AB".

> LayoutTests/js/string-contains-expected.txt:14
> +PASS 'foo bar'.contains() is false

No test coverage for contains(null) or contains(undefined) or even contains(<number>). In particular we would want test cases like: "null".contains(null) and "xnullx".contains(null) and "xundefinedx".contains(undefined) and "xundefinedx".contains().

>> LayoutTests/js/string-contains-expected.txt:42
>> +PASS 'foo bar'.contains('bar', 0x7fffffff) is false
> 
> I didn't know what's the best way to validate an ASSERT. All functions return 0 when value of startOffset is negative. Now looking at it, startsWith and contains are returning false because the offset is larger than the length of the string. In the case of endsWith I think is working as 0x80000010 - 3 = 0x80000007 which is a negative number in int32_t.
> 
> I'm not sure about the idiom for checking for an overflow in startsWith and contains. I think it should check that the sum of the two values should be greater than any of them:
> 
> ASSERT(startOffset + matchString.length() >= startOffset);

If we ever hit an ASSERT due to JavaScript code, then the ASSERT is wrong or there is a bug we need to fix. ASSERT is used to state something that can never be true unless there is a mistake in the WebKit code.

So there is no way to “validate” an ASSERT, and this assertion indicates the patch is not yet ready.

> LayoutTests/js/string-contains-expected.txt:51
> +TEST COMPLETE

No test cases at all that check the 16-bit-character code path. That’s a problem.
Comment 32 Diego Pino 2014-09-02 03:14:00 PDT
Comment on attachment 237340 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237340&action=review

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1565
>> +        return JSValue::encode(jsUndefined());
> 
> Is the ordering right here? Does this need to be done before the checking of the argument?

Do you mean to check if there were an Exception right after calling toString(exec)? Reading other snippets of code in the same file what it seems to be the general pattern is to verify if an exception was raised after calling to "toString(exec)->value(exec)", for example StringPrototype::replaceUsingStringSearch(), and others.

> String searchString = searchValue.toString(exec)->value(exec);
> if (exec->hadException())
>    return JSValue::encode(jsUndefined());

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1572
>> +        return JSValue::encode(jsBoolean(false));
> 
> This is new. Is it correct? Is a null value supposed to turn into a false result or turn into the string "null"? What about undefined? What about a missing argument? What does the specification say?

No, it's not new. It has been there since the first proposed patch. In any case, good catch because this is not correct. What the specifications says is: 

Step 5. Let searchStr be ToString(searchString).

And the ToString() operation specifies that and undefined value is turned to an "undefined" string and null to string "null". So 'null'.startsWith(null) should be true and 'undefined'.startsWith() should be true too. I will add those cases to the test, plus some other more with numbers and booleans, as suggested below.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1578
>> +        return JSValue::encode(jsUndefined());
> 
> It’s OK to include these two lines of code, but they are not needed. There is no way for the JavaScript code to detect whether we do the startsWith or not. The return value is undefined if we have an exception.

According to comment #26, I thought we were adding this check because toString(exec) may end up calling valueOf() and that might return an exception.

>> Source/WTF/wtf/text/StringImpl.cpp:1417
>> +        return false;
> 
> Here’s how we write this and avoid overflow:
> 
>     if (startOffset > length())
>         return false;
>     if (matchString.length() > length())
>         return false;
>     if (matchString.length() + startOffset > length())
>         return false;
> 
> No assertions. But also, I suggest we put these checks into equalInner rather than here. This function can just be a forwarder to equalInner.

Agree. But I think the last 'if' should check if the sum of matchString.length() + startOffset() is lower than matchString.length() or startOffset. Imagine length() is 0xFFFFFFFF, the result of matchString.length() + startOffset will be a 32-bit unsigned value too and could never be higher than 0xFFFFFFFF. When an overflow of unsigned values happens the result of the sum is lower than any of the addends. For instance, 0xFFFFFFFF + 0x1 = 0x0.

>> Source/WTF/wtf/text/StringImpl.cpp:1423
>> +        matchString.length(), caseSensitive);
> 
> This is wrong. We can’t just cast a run of 16-bit characters to a run of 8-bit characters. The code won’t compare all the characters! And it will convert individual bytes of the characters to lowercase if asked to not be case sensitive.

OK, I will look into this.

>> Source/WTF/wtf/text/StringImpl.cpp:1452
>> +    int startOffset = endOffset - matchString.length();
> 
> I am still not sure that I understand what an end offset is, so it’s really hard for me to understand what this function is trying to do.
> 
> But this is not the right way to write this to correctly range check and avoid overflow. We should instead just write this:
> 
>     if (endOffset > matchString.length())
>         return false;
>     unsigned startOffset = matchString.length() - endOffset;
> 
> Don’t involve signed arithmetic at all.
> 
> And we will need these checks:
> 
>     if (startOffset > length())
>         return false;
>     if (matchString.length() > length())
>         return false;
>     if (matchString.length() + startOffset > length())
>         return false;
> 
> But I suggest putting them inside equalInner.

endOffset is the end position of stringToSearchIn that matchString will match against. For instance:

'foo bar'.endsWith('ba', 6)  // true, if 6 is considered the last position of 'foo bar', 'ba' is matched.
'foo bar'.endsWith('ba')     // false, same as endsWith('ba', 7)

Agree with the refactoring and moving the overflow checks to equalInner().

>>> Source/WTF/wtf/text/StringImpl.cpp:1454
>>> +    ASSERT(!(endOffset <= matchString.length() ^ startOffset <= 0));
>> 
>> This reads as if 'endOffset' is greater or equals than 'matchLength', 'startOffset' should be positive or zero. On the other hand, if 'endOffset' is lower or equals than 'matchLength', 'startOffset' should be negative.
> 
> Fine, but it’s not appropriate to assert these things.

OK, I will remove them.
Comment 33 Diego Pino 2014-09-02 03:25:41 PDT
(In reply to comment #32)

> Agree. But I think the last 'if' should check if the sum of matchString.length() + startOffset() is lower than matchString.length() or startOffset. 

I was thinking only of an overflow case, but this checking is still valid. In addition to it, I suggest to add a check for an overflow case.
Comment 34 Diego Pino 2014-09-11 08:22:07 PDT
Created attachment 237954 [details]
Patch
Comment 35 Diego Pino 2014-09-11 08:34:06 PDT
I fixed the issues from the last patch. The most important changes are:

* More tests added. Test 'undefined', 'null', 'true', 'false' values, numbers too.
* Improved the tests of side effect as suggested by Darin. For the string parameters, it seems valueOf() is not being called, but toString() instead, so I overloaded that method.
* I added a new implementation of equalInner that receives a StringImpl& matchString and casts the parameter to 8bit() or 16bit() accordingly. I added a test case with non-ascii strings (Japanese characters). The test run fine with:

$ bin/jsc standalone-pre.js string-contains.js standalone-post.js

but if I run it as a Layout test:

$ Tools/Scripts/run-webkit-tests --gtk -2 LayoutTests/js/string-contains.html

it messed up the encoding and I got this:

-PASS 'フーバー'.contains('ーバ') is true
-PASS 'フーバー'.contains('クー') is false
+PASS 'フーãƒ<90>ー'.contains('ーãƒ<90>') is true
+PASS 'フーãƒ<90>ー'.contains('クー') is false

So, how should I test non-ascii strings?
Comment 36 Darin Adler 2014-09-11 20:09:01 PDT
Comment on attachment 237954 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237954&action=review

> LayoutTests/js/string-contains.html:1
> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

This is the wrong doctype to use. The right one is:

<!DOCTYPE html>

Or no DOCTYPE at all would be OK.

> LayoutTests/js/string-contains.html:4
> +<script src="../resources/js-test-pre.js"></script>

We also need this:

<meta charset="UTF-8">

That’s what will make non-ASCII tests work properly.
Comment 37 Darin Adler 2014-09-11 20:19:41 PDT
Comment on attachment 237954 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237954&action=review

Looking better, but still not quite right. Thanks for so much hard work on this!

> Source/JavaScriptCore/runtime/StringPrototype.cpp:88
> +

Shouldn’t add an extra blank line here.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1561
> +    if (!checkObjectCoercible(thisValue))
> +        return throwVMTypeError(exec);

Why is this needed? I don’t see this in the existing string functions. What test covers this?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1571
> +    unsigned start = std::max(0, exec->argument(1).toInt32(exec));

Need an if (exec->hadException()) here since toInt32 can raise an exception.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1575
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

As I mentioned in earlier versions of this patch, we don’t actually need this exec->hadException check because it’s the last and no JavaScript executes after this point. If you try to construct a test case that shows this is needed, you will find that it’s impossible. It’s OK to leave it in, but it’s untestable.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1584
> +    if (!checkObjectCoercible(thisValue))
> +        return throwVMTypeError(exec);

Why is this needed? I don’t see this in the existing string functions. What test covers this?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1596
> +    int pos = a1.isUndefined() ? length : a1.toInt32(exec);

Need an if (exec->hadException()) here since toInt32 can raise an exception.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1601
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

As I mentioned in earlier versions of this patch, we don’t actually need this exec->hadException check because it’s the last and no JavaScript executes after this point. If you try to construct a test case that shows this is needed, you will find that it’s impossible. It’s OK to leave it in, but it’s untestable.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1610
> +    if (!checkObjectCoercible(thisValue))
> +        return throwVMTypeError(exec);

Why is this needed? I don’t see this in the existing string functions. What test covers this?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1620
> +    unsigned start = std::max(0, exec->argument(1).toInt32(exec));

Need an if (exec->hadException()) here since toInt32 can raise an exception.

> Source/JavaScriptCore/runtime/StringPrototype.cpp:1624
> +    if (exec->hadException())
> +        return JSValue::encode(jsUndefined());

As I mentioned in earlier versions of this patch, we don’t actually need this exec->hadException check because it’s the last and no JavaScript executes after this point. If you try to construct a test case that shows this is needed, you will find that it’s impossible. It’s OK to leave it in, but it’s untestable.

> Source/WTF/wtf/text/StringImpl.cpp:1172
> +    ASSERT(index <= length());
> +    ASSERT(matchLength <= length());
> +    ASSERT(index + matchLength <= length());
> +
> +    if (matchLength + index > length())
> +        return notFound;

All of this is wrong. These cases are handled below. Please don’t add this code.

> Source/WTF/wtf/text/StringImpl.cpp:1378
> +ALWAYS_INLINE static bool equalInner(const StringImpl* stringImpl, unsigned startOffset, StringImpl& matchString, bool caseSensitive)

Do we really need ALWAYS_INLINE for this?

> Source/WTF/wtf/text/StringImpl.cpp:1380
> +    ASSERT(stringImpl);

This indicates the function should take a StringImpl&, not a StringImpl*. If we can’t take a null we should take a reference, not a pointer.

> Source/WTF/wtf/text/StringImpl.cpp:1389
> +    if (matchString.length() + startOffset < startOffset)
> +        return false;

This fourth check is not needed. The three checks above suffice cover this already.

> Source/WTF/wtf/text/StringImpl.cpp:1435
> +    if (matchString.length() + startOffset > length())
> +        return false;

This isn’t needed, since we already have this check in equalInner.

> Source/WTF/wtf/text/WTFString.h:262
> +    bool contains(const LChar* str, bool caseSensitive = true, unsigned startOffset = 0) const 
> +        { return find(str, startOffset, caseSensitive) != notFound; }
> +    bool contains(const String& str, bool caseSensitive = true, unsigned startOffset = 0) const 
> +        { return find(str, startOffset, caseSensitive) != notFound; }

This is really not good. We don’t want a startOffset argument that comes after the case sensitive boolean in this function. Is there some way to avoid introducing this?

> Source/WTF/wtf/text/WTFString.h:274
> +    bool startsWith(String& s, unsigned startOffset, bool caseSensitive) const
> +        { return m_impl ? m_impl->startsWith(*s.impl(), startOffset, caseSensitive) : false; }

Please name the argument “prefix”, not “s”. We prefer words to letters for argument names.

It’s not appropriate to assume that s.impl() is non-null without checking. We should not make functions where it’s illegal to call them with null strings.

> Source/WTF/wtf/text/WTFString.h:285
> +    bool endsWith(String& s, unsigned endOffset, bool caseSensitive) const
> +        { return m_impl ? m_impl->endsWith(*s.impl(), endOffset, caseSensitive) : false; }

Please name the argument “suffix”, not “s”. We prefer words to letters for argument names.

It’s not appropriate to assume that s.impl() is non-null without checking. We should not make functions where it’s illegal to call them with null strings.
Comment 38 Diego Pino 2014-09-12 12:19:26 PDT
Comment on attachment 237954 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=237954&action=review

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:1561
>> +        return throwVMTypeError(exec);
> 
> Why is this needed? I don’t see this in the existing string functions. What test covers this?

It's required by the ES6 spec. It's the first step in 'startsWith' (in addition to 'endsWith' and 'contains').

1. Let O be RequireObjectCoercible(this value).

Other String functions (trim, toLowerCase, substring, etc) require this step too, and it's implemented in the same fashion.

According to the spec, the abstract operation RequireObjectCoercible throws an TypeError Exception if argument cannot be converted to an object using ToObject. The tests that call 'startsWith', 'endsWith' and 'contains' with an environment record 'this', cover this case, they all throw an Exception. More info in comment #9.

>> Source/WTF/wtf/text/StringImpl.cpp:1378
>> +ALWAYS_INLINE static bool equalInner(const StringImpl* stringImpl, unsigned startOffset, StringImpl& matchString, bool caseSensitive)
> 
> Do we really need ALWAYS_INLINE for this?

The logic of the functions that use this 'equalInner' implementation, 'startsWith' and 'endsWith', is rather small, the real matching is done here. If inlining is not going to save much I can't remove it though. FWIW, the other 'equalInner' implementation is inlined.

> Source/WTF/wtf/text/StringImpl.cpp:1379
> +{

As this method is new, I tried to make it as "StringImpl& stringImpl" and pass *this as parameter, but I got the following error:

../../Source/WTF/wtf/text/StringImpl.cpp: In member function 'bool WTF::StringImpl::startsWith(WTF::StringImpl&, unsigned int, bool) const':
../../Source/WTF/wtf/text/StringImpl.cpp:1434:69: error: no matching function for call to 'equalInner(const WTF::StringImpl&, unsigned int&, WTF::StringImpl&, bool&)'
../../Source/WTF/wtf/text/StringImpl.cpp:1434:69: note: candidates are:
../../Source/WTF/wtf/text/StringImpl.cpp:1362:55: note: bool WTF::equalInner(const WTF::StringImpl*, unsigned int, const char*, unsigned int, bool)
../../Source/WTF/wtf/text/StringImpl.cpp:1362:55: note:   candidate expects 5 arguments, 4 provided
../../Source/WTF/wtf/text/StringImpl.cpp:1378:55: note: bool WTF::equalInner(WTF::StringImpl&, unsigned int, WTF::StringImpl&, bool)
../../Source/WTF/wtf/text/StringImpl.cpp:1378:55: note:   no known conversion for argument 1 from 'const WTF::StringImpl' to 'WTF::StringImpl&'

It seems 'const' was required, but Darin told me all StringImpl& are implicitly const. Finally I decide to leave it as "const StringImpl* stringImpl" and pass 'this'.

>> Source/WTF/wtf/text/WTFString.h:262
>> +        { return find(str, startOffset, caseSensitive) != notFound; }
> 
> This is really not good. We don’t want a startOffset argument that comes after the case sensitive boolean in this function. Is there some way to avoid introducing this?

I agree, I don't like it either, but as I commented in #25 this is necessary to avoid ambiguous warning messages from the compiler. If I put 'startOffset' before 'caseSensitive', all the existing code of the form "contains(str)" or "contains(str, false)" will turn to be into "contains(str, 1, true)" and "contains(str, 0, true)", because the compiler does an implicit conversion between boolean types and integers.

At this point I think it might be better to not overload 'contains', 'startsWith' and 'endsWith' methods and name the new ones as 'containsFromPos', 'startsWithFromPos' and 'endsWithUntilPos'. In addition, having a single contains(str, offset, caseSenstive), forces to explicitly set an offset when searching a string non-sensisite, I mean, str.contains(substr, false) must be written as str.contains(substr, 0, false), which is unclear IMHO. I think having two different names helps to avoid this problem.
Comment 39 Diego Pino 2014-09-12 12:21:59 PDT
Created attachment 238045 [details]
Patch
Comment 40 Diego Pino 2014-09-12 12:32:37 PDT
Thanks Darin for the extensive review. I uploaded a new version of the patch with most of the last issues fixed.

I don't know why, while commenting your last review I repeated the comment about "const StringImpl&". Finally I managed to remove the 'const' modifier. I const_cast<StringImpl&>(*this). I don't know what's best.

There are still some issues pending to be clarified, so I didn't cq? the patch. Mostly we need to agree whether to leave the order of the parameters in 'contains' as it is, or instead, do not overload 'contains', 'startsWith' and 'endsWith' and add these functions with other names, that will give more freedom (and avoid any problem or error with the compiler warnings because of ambiguity). Each approach has its pros and cons. If we opted for the first approach a refactoring will be desired, but likely in another patch.
Comment 41 Darin Adler 2014-09-12 17:56:36 PDT
Comment on attachment 238045 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=238045&action=review

> Source/WTF/wtf/text/StringImpl.cpp:1424
> +bool StringImpl::startsWith(StringImpl& matchString, unsigned startOffset, bool caseSensitive) const
> +{
> +    return equalInner(const_cast<StringImpl&>(*this), startOffset, matchString, caseSensitive);
> +}

We should follow this up by removing all the const on StringImpl member functions since it’s an immutable class, and all StringImpl are const; when we remove the const from this one, we can remove the const_cast.

> Source/WTF/wtf/text/StringImpl.cpp:1455
> +bool StringImpl::endsWith(StringImpl& matchString, unsigned endOffset, bool caseSensitive) const
> +{
> +    if (endOffset < matchString.length())
> +        return false;
> +    return equalInner(const_cast<StringImpl&>(*this), endOffset - matchString.length(), matchString, caseSensitive);
> +}

We should follow this up by removing all the const on StringImpl member functions since it’s an immutable class, and all StringImpl are const; when we remove the const from this one, we can remove the const_cast.
Comment 42 Diego Pino 2014-09-19 00:46:50 PDT
(In reply to comment #41)
> (From update of attachment 238045 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=238045&action=review
> 
> > Source/WTF/wtf/text/StringImpl.cpp:1424
> > +bool StringImpl::startsWith(StringImpl& matchString, unsigned startOffset, bool caseSensitive) const
> > +{
> > +    return equalInner(const_cast<StringImpl&>(*this), startOffset, matchString, caseSensitive);
> > +}
> 
> We should follow this up by removing all the const on StringImpl member functions since it’s an immutable class, and all StringImpl are const; when we remove the const from this one, we can remove the const_cast.
> 

I'm a bit lost at this. This is the current declaration of equalInner:

equalInner(StringImpl& stringImpl, unsigned startOffset, ...)

It's declared without a 'const' modifier, since the class it's inmutable and it's not needed.

In startsWith, dereferencing 'this' returns a 'const StringImpl'. I need to remove the constness from *this to call equalInner, and that's what the casting does: const_cast<StringImpl&>(*this), converts a 'const StringImpl' to a 'StringImpl&'.

The compiler complains if I don't do the casting. Calling 'equalInner(*this,...) returns this:

../../Source/WTF/wtf/text/StringImpl.cpp: In member function 'bool WTF::StringImpl::startsWith(WTF::StringImpl&, unsigned int, bool) const':
../../Source/WTF/wtf/text/StringImpl.cpp:1423:24: error: invalid initialization of reference of type 'WTF::StringImpl&' from expression of type 'const WTF::StringImpl'

Another possibility is to declare equalInner as 'equalInner(const String&, ...)' and pass '*this' without removing the constness.
Comment 43 Darin Adler 2014-09-19 08:59:21 PDT
(In reply to comment #42)
> In startsWith, dereferencing 'this' returns a 'const StringImpl'.

Exactly. That’s a StringImpl member function. And I said that we should follow this patch up by removing all the const on StringImpl member functions since it’s an immutable class. So we would remove the const from the startsWith function declaration. And then after that was done we could get rid of all const StringImpl since it would mean the same thing as StringImpl.
Comment 44 Diego Pino 2014-09-19 09:58:29 PDT
(In reply to comment #43)
> (In reply to comment #42)
> > In startsWith, dereferencing 'this' returns a 'const StringImpl'.
> 
> Exactly. That’s a StringImpl member function. And I said that we should follow this patch up by removing all the const on StringImpl member functions since it’s an immutable class. So we would remove the const from the startsWith function declaration. And then after that was done we could get rid of all const StringImpl since it would mean the same thing as StringImpl.

OK, thanks for the clarification, now I understand better. So if there are no more changes to make I will switch the patch to cq?. Could you land?
Comment 45 WebKit Commit Bot 2014-09-19 10:47:58 PDT
Comment on attachment 238045 [details]
Patch

Clearing flags on attachment: 238045

Committed r173761: <http://trac.webkit.org/changeset/173761>
Comment 46 WebKit Commit Bot 2014-09-19 10:48:05 PDT
All reviewed patches have been landed.  Closing bug.