Bug 140047

Summary: Implement ES6 String.prototype.repeat(count)
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, darin, rniwa, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews105 for mac-mountainlion-wk2
none
Archive of layout-test-results from ews100 for mac-mountainlion
none
Patch
none
Patch
none
Patch darin: review+

Yusuke Suzuki
Reported 2015-01-02 17:34:52 PST
Implement ES6 String.prototype.repeat(count)
Attachments
Patch (10.07 KB, patch)
2015-01-02 18:02 PST, Yusuke Suzuki
no flags
Archive of layout-test-results from ews105 for mac-mountainlion-wk2 (515.56 KB, application/zip)
2015-01-02 18:35 PST, Build Bot
no flags
Archive of layout-test-results from ews100 for mac-mountainlion (516.06 KB, application/zip)
2015-01-02 18:48 PST, Build Bot
no flags
Patch (14.69 KB, patch)
2015-01-02 18:51 PST, Yusuke Suzuki
no flags
Patch (16.90 KB, patch)
2015-01-06 06:10 PST, Yusuke Suzuki
no flags
Patch (17.50 KB, patch)
2015-01-06 06:36 PST, Yusuke Suzuki
darin: review+
Yusuke Suzuki
Comment 1 2015-01-02 18:02:07 PST
Build Bot
Comment 2 2015-01-02 18:35:11 PST
Comment on attachment 243909 [details] Patch Attachment 243909 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6324153409863680 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 3 2015-01-02 18:35:13 PST
Created attachment 243910 [details] Archive of layout-test-results from ews105 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 4 2015-01-02 18:48:17 PST
Comment on attachment 243909 [details] Patch Attachment 243909 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/6369738045259776 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 5 2015-01-02 18:48:19 PST
Created attachment 243911 [details] Archive of layout-test-results from ews100 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Yusuke Suzuki
Comment 6 2015-01-02 18:51:50 PST
Darin Adler
Comment 7 2015-01-04 13:34:30 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review Looks generally good. Needs a little work. > Source/JavaScriptCore/runtime/StringPrototype.cpp:688 > + JSValue a0 = exec->argument(0); > + double n = a0.toInteger(exec); No need to put argument 0 into a local variable since we are using it only once. > Source/JavaScriptCore/runtime/StringPrototype.cpp:692 > + if (n < 0 || !std::isfinite(n)) > + return throwVMError(exec, createRangeError(exec, ASCIILiteral("repeat() argument must be greater than or equal to 0 and not be infinity."))); This error message doesn’t seem quite right for NaN. > Source/JavaScriptCore/runtime/StringPrototype.cpp:694 > + if ((n * string->length()) > std::numeric_limits<int32_t>::max()) We don’t normally use parentheses for an expression like this. We are used to recognizing that arithmetic operators like * bind more tightly that comparison operators like >. We should probably cut down a little bit on the floating point math. We can do this: if (n > std::numeric_limits<int32_t>::max() / string->length()) I don’t understand why the math is being done with int32_t rather than unsigned. String lengths are unsigned, not signed, so it doesn’t make sense to mention <int32_t>::max() here rather than <unsigned>::max(). > Source/JavaScriptCore/runtime/StringPrototype.cpp:712 > + JSString* result = jsEmptyString(exec); > + int32_t count = static_cast<int32_t>(n); > + while (count > 0) { > + if (count & 1) { > + JSValue concatenated = jsString(exec, result, string); > + if (exec->hadException()) > + return JSValue::encode(jsUndefined()); > + result = jsCast<JSString*>(concatenated); > + } > + count >>= 1; > + JSValue concatenated = jsString(exec, string, string); > + if (exec->hadException()) > + return JSValue::encode(jsUndefined()); > + string = jsCast<JSString*>(concatenated); > + } > + return JSValue::encode(result); This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: VM& vm = exec->vm(); if (!string->length() || !repeatCount) return jsEmptyString(vm); if (repeatCount == 1) return string; JSRopeString::RopeBuilder ropeBuilder(vm); for (unsigned i = 0; i < count; ++i) { if (!ropeBuilder.append(string)) return throwOutOfMemoryError(exec); } return ropeBuilder.release(); I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need.
Darin Adler
Comment 8 2015-01-04 13:36:59 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >> Source/JavaScriptCore/runtime/StringPrototype.cpp:692 >> + return throwVMError(exec, createRangeError(exec, ASCIILiteral("repeat() argument must be greater than or equal to 0 and not be infinity."))); > > This error message doesn’t seem quite right for NaN. Oh, I see, this won’t happen for NaN because toInteger is guaranteed to not return NaN. In that case, the code should use std::isinf rather than std::isfinite, since it is more efficient: if (n < 0 || std::isinf(n))
Darin Adler
Comment 9 2015-01-04 13:37:51 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>> Source/JavaScriptCore/runtime/StringPrototype.cpp:692 >>> + return throwVMError(exec, createRangeError(exec, ASCIILiteral("repeat() argument must be greater than or equal to 0 and not be infinity."))); >> >> This error message doesn’t seem quite right for NaN. > > Oh, I see, this won’t happen for NaN because toInteger is guaranteed to not return NaN. In that case, the code should use std::isinf rather than std::isfinite, since it is more efficient: > > if (n < 0 || std::isinf(n)) Do other error strings end in periods? I note that the one for out-of-memory does not (you can see that in the test result).
Darin Adler
Comment 10 2015-01-04 13:39:29 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >> Source/JavaScriptCore/runtime/StringPrototype.cpp:694 >> + if ((n * string->length()) > std::numeric_limits<int32_t>::max()) > > We don’t normally use parentheses for an expression like this. We are used to recognizing that arithmetic operators like * bind more tightly that comparison operators like >. > > We should probably cut down a little bit on the floating point math. We can do this: > > if (n > std::numeric_limits<int32_t>::max() / string->length()) > > I don’t understand why the math is being done with int32_t rather than unsigned. String lengths are unsigned, not signed, so it doesn’t make sense to mention <int32_t>::max() here rather than <unsigned>::max(). Hmm, reading more code it seems there is some kind of int32_t limit for JSString.
Yusuke Suzuki
Comment 11 2015-01-04 14:09:15 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review Thank you for your review! >> Source/JavaScriptCore/runtime/StringPrototype.cpp:688 >> + double n = a0.toInteger(exec); > > No need to put argument 0 into a local variable since we are using it only once. Nice! I'll fix it. >>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:692 >>>> + return throwVMError(exec, createRangeError(exec, ASCIILiteral("repeat() argument must be greater than or equal to 0 and not be infinity."))); >>> >>> This error message doesn’t seem quite right for NaN. >> >> Oh, I see, this won’t happen for NaN because toInteger is guaranteed to not return NaN. In that case, the code should use std::isinf rather than std::isfinite, since it is more efficient: >> >> if (n < 0 || std::isinf(n)) > > Do other error strings end in periods? I note that the one for out-of-memory does not (you can see that in the test result). > Oh, I see, this won’t happen for NaN because toInteger is guaranteed to not return NaN. In that case, the code should use std::isinf rather than std::isfinite, since it is more efficient: Looks very nice! I'll change to `if (n < 0 || std::isinf(n))`. > Do other error strings end in periods? I note that the one for out-of-memory does not (you can see that in the test result). Seeing the code, some ends in periods[1], but some doesn't end in periods[2]. But seeing the results, it seems that ending without periods is major, so I'll follow it (dropping the period). [1]: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/RegExpPrototype.cpp#L116 [2]: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/DatePrototype.cpp#L1083 >>> Source/JavaScriptCore/runtime/StringPrototype.cpp:694 >>> + if ((n * string->length()) > std::numeric_limits<int32_t>::max()) >> >> We don’t normally use parentheses for an expression like this. We are used to recognizing that arithmetic operators like * bind more tightly that comparison operators like >. >> >> We should probably cut down a little bit on the floating point math. We can do this: >> >> if (n > std::numeric_limits<int32_t>::max() / string->length()) >> >> I don’t understand why the math is being done with int32_t rather than unsigned. String lengths are unsigned, not signed, so it doesn’t make sense to mention <int32_t>::max() here rather than <unsigned>::max(). > > Hmm, reading more code it seems there is some kind of int32_t limit for JSString. Yes. Seeing the `JSValue jsString(ExecState* exec, JSString* s1, JSString* s2)` code, I found that there's a int32_t length limitation for JSString. http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/Operations.h#L47 Maybe, IMHO, it's introduced for some optimizations. By limiting JSString's length in int32_t, string.length results can be assumed within int32_t JSValue range. At this time, I'll use `JSRopeString::RopeBuilder` and dropping this check :) >> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >> + return JSValue::encode(result); > > This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: > > VM& vm = exec->vm(); > > if (!string->length() || !repeatCount) > return jsEmptyString(vm); > > if (repeatCount == 1) > return string; > > JSRopeString::RopeBuilder ropeBuilder(vm); > > for (unsigned i = 0; i < count; ++i) { > if (!ropeBuilder.append(string)) > return throwOutOfMemoryError(exec); > } > > return ropeBuilder.release(); > > I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. > > The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. I'll change the code to it. > I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. > The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time.
Yusuke Suzuki
Comment 12 2015-01-04 14:13:56 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >>> + return JSValue::encode(result); >> >> This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: >> >> VM& vm = exec->vm(); >> >> if (!string->length() || !repeatCount) >> return jsEmptyString(vm); >> >> if (repeatCount == 1) >> return string; >> >> JSRopeString::RopeBuilder ropeBuilder(vm); >> >> for (unsigned i = 0; i < count; ++i) { >> if (!ropeBuilder.append(string)) >> return throwOutOfMemoryError(exec); >> } >> >> return ropeBuilder.release(); >> >> I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. >> >> The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. > > Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). > JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. > I'll change the code to it. **Reformatting my exiting replies in the review to easily see in the "Review Patch" viewer.** your comment: "I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness." Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. your comment: "The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need." Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time.
Darin Adler
Comment 13 2015-01-04 14:17:53 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >>>> + return JSValue::encode(result); >>> >>> This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: >>> >>> VM& vm = exec->vm(); >>> >>> if (!string->length() || !repeatCount) >>> return jsEmptyString(vm); >>> >>> if (repeatCount == 1) >>> return string; >>> >>> JSRopeString::RopeBuilder ropeBuilder(vm); >>> >>> for (unsigned i = 0; i < count; ++i) { >>> if (!ropeBuilder.append(string)) >>> return throwOutOfMemoryError(exec); >>> } >>> >>> return ropeBuilder.release(); >>> >>> I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. >>> >>> The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. >> >> Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). >> JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. >> I'll change the code to it. > > **Reformatting my exiting replies in the review to easily see in the "Review Patch" viewer.** > > your comment: "I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness." > > Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. > > your comment: "The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need." > > Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time. I see your point, but single character strings with a large repeat count would be an optimization for string length 1, not repeat count 1. An optimized path for single character strings might make sense. Building a rope where each string in the rope is a single character is really inefficient. There's probably a cutoff for a small enough string size where we’d want to use StringBuilder instead of RopeBuilder and create the JSString from a single String instead of making a rope. It’s straightforward to use StringBuilder; just make one, then call reserveCapacity, then append in a loop. Or for single characters we could just use String::createUninitialized rather than using StringBuilder or RopeBuilder.
Yusuke Suzuki
Comment 14 2015-01-04 14:35:38 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >>>>> + return JSValue::encode(result); >>>> >>>> This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: >>>> >>>> VM& vm = exec->vm(); >>>> >>>> if (!string->length() || !repeatCount) >>>> return jsEmptyString(vm); >>>> >>>> if (repeatCount == 1) >>>> return string; >>>> >>>> JSRopeString::RopeBuilder ropeBuilder(vm); >>>> >>>> for (unsigned i = 0; i < count; ++i) { >>>> if (!ropeBuilder.append(string)) >>>> return throwOutOfMemoryError(exec); >>>> } >>>> >>>> return ropeBuilder.release(); >>>> >>>> I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. >>>> >>>> The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. >>> >>> Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). >>> JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. >>> I'll change the code to it. >> >> **Reformatting my exiting replies in the review to easily see in the "Review Patch" viewer.** >> >> your comment: "I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness." >> >> Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. >> >> your comment: "The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need." >> >> Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time. > > I see your point, but single character strings with a large repeat count would be an optimization for string length 1, not repeat count 1. > > An optimized path for single character strings might make sense. Building a rope where each string in the rope is a single character is really inefficient. There's probably a cutoff for a small enough string size where we’d want to use StringBuilder instead of RopeBuilder and create the JSString from a single String instead of making a rope. It’s straightforward to use StringBuilder; just make one, then call reserveCapacity, then append in a loop. Or for single characters we could just use String::createUninitialized rather than using StringBuilder or RopeBuilder. That's very nice insight! Repeating a single character is considered as very common use cases (e.g. `' '.repeat(4)` to generate 4 spaces for indentation). Optimizing for this case seems reasonable. And one note about the current code is that, in the current implementation, we concat the strings in the loop (`string` variable in the code) even if we don't use it (not adding it to `result`). As a result, we only loop log(N) times. If the N becomes larger, it might have an effect. But on the other hand, we can say that we just delay the processing time until resolving the ropes. In the current code, I took the log(N) method. What do you think of this? I'd like to hear your thought.
Darin Adler
Comment 15 2015-01-04 15:13:15 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>>>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >>>>>> + return JSValue::encode(result); >>>>> >>>>> This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: >>>>> >>>>> VM& vm = exec->vm(); >>>>> >>>>> if (!string->length() || !repeatCount) >>>>> return jsEmptyString(vm); >>>>> >>>>> if (repeatCount == 1) >>>>> return string; >>>>> >>>>> JSRopeString::RopeBuilder ropeBuilder(vm); >>>>> >>>>> for (unsigned i = 0; i < count; ++i) { >>>>> if (!ropeBuilder.append(string)) >>>>> return throwOutOfMemoryError(exec); >>>>> } >>>>> >>>>> return ropeBuilder.release(); >>>>> >>>>> I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. >>>>> >>>>> The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. >>>> >>>> Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). >>>> JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. >>>> I'll change the code to it. >>> >>> **Reformatting my exiting replies in the review to easily see in the "Review Patch" viewer.** >>> >>> your comment: "I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness." >>> >>> Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. >>> >>> your comment: "The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need." >>> >>> Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time. >> >> I see your point, but single character strings with a large repeat count would be an optimization for string length 1, not repeat count 1. >> >> An optimized path for single character strings might make sense. Building a rope where each string in the rope is a single character is really inefficient. There's probably a cutoff for a small enough string size where we’d want to use StringBuilder instead of RopeBuilder and create the JSString from a single String instead of making a rope. It’s straightforward to use StringBuilder; just make one, then call reserveCapacity, then append in a loop. Or for single characters we could just use String::createUninitialized rather than using StringBuilder or RopeBuilder. > > That's very nice insight! > Repeating a single character is considered as very common use cases (e.g. `' '.repeat(4)` to generate 4 spaces for indentation). Optimizing for this case seems reasonable. > > And one note about the current code is that, in the current implementation, we concat the strings in the loop (`string` variable in the code) even if we don't use it (not adding it to `result`). As a result, we only loop log(N) times. If the N becomes larger, it might have an effect. But on the other hand, we can say that we just delay the processing time until resolving the ropes. In the current code, I took the log(N) method. > What do you think of this? I'd like to hear your thought. The rope builder still loops through all the strings to make each rope, so the O(log(N)) is a false economy; the looping across every string is still happening and the code is still O(N), not O(log(N)). Our single loop that runs across all N strings is more efficient than the loop that calls the rope code log(N) times.
Yusuke Suzuki
Comment 16 2015-01-04 15:22:24 PST
Comment on attachment 243912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=243912&action=review >>>>>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:712 >>>>>>> + return JSValue::encode(result); >>>>>> >>>>>> This is unnecessarily inefficient. We should create only a single JSString, not all these extra ones. Each of those intermediate strings will need to be allocated and then garbage collected so it’s inappropriately slow. The code should look more like this: >>>>>> >>>>>> VM& vm = exec->vm(); >>>>>> >>>>>> if (!string->length() || !repeatCount) >>>>>> return jsEmptyString(vm); >>>>>> >>>>>> if (repeatCount == 1) >>>>>> return string; >>>>>> >>>>>> JSRopeString::RopeBuilder ropeBuilder(vm); >>>>>> >>>>>> for (unsigned i = 0; i < count; ++i) { >>>>>> if (!ropeBuilder.append(string)) >>>>>> return throwOutOfMemoryError(exec); >>>>>> } >>>>>> >>>>>> return ropeBuilder.release(); >>>>>> >>>>>> I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness. >>>>>> >>>>>> The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need. >>>>> >>>>> Ah, thank you!! I've misunderstood the JSC's String representation (I assumed that JSC has simple Seq/Cons Strings like V8, but JSC has multiple (3) slots). >>>>> JSRopeString::RopeBuilder efficiently manages ropes, it looks very nice. >>>>> I'll change the code to it. >>>> >>>> **Reformatting my exiting replies in the review to easily see in the "Review Patch" viewer.** >>>> >>>> your comment: "I’m not sure if we need the special case for empty strings, for repeat count 0, for repeat count 1, and for large lengths that would overflow. All of them are performance optimizations and not needed for correctness." >>>> >>>> Additionally, I think checking "empty strings, for repeat count 0, for repeat count 1" allow JSC to execute `"".repeat(very large number)` without a OOM error. So I think introducing it looks nice. >>>> >>>> your comment: "The rope builder code will handle out of memory correctly without an explicit length check. And even with a length check there are going to be large repeat counts that will cause the code to loop for a long time before detecting that it ran out of memory. The length check is *not* needed to prevent overflow, so it’s solely a performance optimization for overflow cases, one that I’m not sure we need." >>>> >>>> Agreed. At the end of the execution, both cases produces the OOM error, so there's no observable results except for the consumed time. >>> >>> I see your point, but single character strings with a large repeat count would be an optimization for string length 1, not repeat count 1. >>> >>> An optimized path for single character strings might make sense. Building a rope where each string in the rope is a single character is really inefficient. There's probably a cutoff for a small enough string size where we’d want to use StringBuilder instead of RopeBuilder and create the JSString from a single String instead of making a rope. It’s straightforward to use StringBuilder; just make one, then call reserveCapacity, then append in a loop. Or for single characters we could just use String::createUninitialized rather than using StringBuilder or RopeBuilder. >> >> That's very nice insight! >> Repeating a single character is considered as very common use cases (e.g. `' '.repeat(4)` to generate 4 spaces for indentation). Optimizing for this case seems reasonable. >> >> And one note about the current code is that, in the current implementation, we concat the strings in the loop (`string` variable in the code) even if we don't use it (not adding it to `result`). As a result, we only loop log(N) times. If the N becomes larger, it might have an effect. But on the other hand, we can say that we just delay the processing time until resolving the ropes. In the current code, I took the log(N) method. >> What do you think of this? I'd like to hear your thought. > > The rope builder still loops through all the strings to make each rope, so the O(log(N)) is a false economy; the looping across every string is still happening and the code is still O(N), not O(log(N)). Our single loop that runs across all N strings is more efficient than the loop that calls the rope code log(N) times. Thanks, I see. So I'll fix the code to use RopeBuilder.
Yusuke Suzuki
Comment 17 2015-01-06 06:10:16 PST
Yusuke Suzuki
Comment 18 2015-01-06 06:36:42 PST
Yusuke Suzuki
Comment 19 2015-01-06 06:46:37 PST
Comment on attachment 244056 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=244056&action=review I've updated the patch :) > Source/JavaScriptCore/runtime/StringPrototype.cpp:707 > + double repeatCountDouble = exec->argument(0).toInteger(exec); Directly use `exec->argument(0)`. > Source/JavaScriptCore/runtime/StringPrototype.cpp:710 > + if (repeatCountDouble < 0 || std::isinf(repeatCountDouble)) Use `isinf`. > Source/JavaScriptCore/runtime/StringPrototype.cpp:711 > + return throwVMError(exec, createRangeError(exec, ASCIILiteral("repeat() argument must be greater than or equal to 0 and not be infinity"))); Dropped the period. Seeing the code, some ends in periods[1], but some doesn't end in periods[2]. But seeing the results, it seems that ending without periods is major, so I've followed :) 1: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/RegExpPrototype.cpp#L116 2: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/DatePrototype.cpp#L1083 > Source/JavaScriptCore/runtime/StringPrototype.cpp:715 > + if (!string->length() || !repeatCountDouble) Fast path for length == 0 OR repeatCount == 0. > Source/JavaScriptCore/runtime/StringPrototype.cpp:718 > + if (repeatCountDouble == 1) Fast path for repeatCount == 1. > Source/JavaScriptCore/runtime/StringPrototype.cpp:721 > + // JSString requires the limitation that its length is in the range of int32_t. At first, I used `std::numeric_limits<unsigned>::max()` for checking the length. But JSString implicitly requires the StringImpl's length is in range of int32_t. (http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSString.h#L127) So we need to check `std::numeric_limits<int32_t>::max()` here. > Source/JavaScriptCore/runtime/StringPrototype.cpp:722 > + if (repeatCountDouble > std::numeric_limits<int32_t>::max() / string->length()) At first, we use the JSRopeString's OOM check. However, since it requires N time loop, it consumes much time and tests cannot be passed due to timeout. So in this time, I inserted the check that darin suggested. > Source/JavaScriptCore/runtime/StringPrototype.cpp:724 > + unsigned repeatCount = static_cast<unsigned>(repeatCountDouble); Reach here if repeatCountDouble doesn't exceed the int32_t max value. So we can cast safely. (Since string->length() == 0 case early returned, string->length() is always > 0 here. And since the above `repeatCountDouble > std::numeric_limits<int32_t>::max() / string->length()` test is passed, `repeatCountDouble should be <= std::numeric_limits<int32_t>::max()`. > Source/JavaScriptCore/runtime/StringPrototype.cpp:727 > + // allocating a sequential buffer and fill with the repeated string for efficiency. For repeating small string (in this time, length == 1), we use StringImpl::tryCreateUninitialized path. > Source/JavaScriptCore/runtime/StringPrototype.cpp:732 > + return JSValue::encode(repeatSmallString<UChar>(exec, s, repeatCount)); Extracted a repeating code into a function with CharacterType. > Source/JavaScriptCore/runtime/StringPrototype.cpp:735 > + JSRopeString::RopeBuilder ropeBuilder(vm); Use JSRopeString::RopeBuilder for constructing the string.
Darin Adler
Comment 20 2015-01-06 10:41:51 PST
Comment on attachment 244056 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=244056&action=review > Source/JavaScriptCore/runtime/StringPrototype.cpp:728 > + if (string->length() == 1) { If this is really for single character strings, then we can write a much tighter version of the repeatSmallString function, and I think we should. > Source/JavaScriptCore/runtime/StringPrototype.cpp:729 > + String s = string->value(exec); Please don’t use a single letter variable name.
Yusuke Suzuki
Comment 21 2015-01-06 11:19:45 PST
Comment on attachment 244056 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=244056&action=review Thank you for your review! >> Source/JavaScriptCore/runtime/StringPrototype.cpp:728 >> + if (string->length() == 1) { > > If this is really for single character strings, then we can write a much tighter version of the repeatSmallString function, and I think we should. Yes. So instead of generic version of repeatSmallString function, I've introduced new version optimized for single character. In this version, I've used `std::fill_n` to fill the buffer. >> Source/JavaScriptCore/runtime/StringPrototype.cpp:729 >> + String s = string->value(exec); > > Please don’t use a single letter variable name. I've changed it to `repeatedString` variable :)
Darin Adler
Comment 22 2015-01-06 11:32:03 PST
Comment on attachment 244056 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=244056&action=review >>> Source/JavaScriptCore/runtime/StringPrototype.cpp:728 >>> + if (string->length() == 1) { >> >> If this is really for single character strings, then we can write a much tighter version of the repeatSmallString function, and I think we should. > > Yes. So instead of generic version of repeatSmallString function, I've introduced new version optimized for single character. In this version, I've used `std::fill_n` to fill the buffer. Also should decide LChar vs. UChar based on the value of the character, not on whether the string happens to be 8-bit.
Yusuke Suzuki
Comment 23 2015-01-06 11:33:22 PST
Yusuke Suzuki
Comment 24 2015-01-06 12:04:39 PST
Comment on attachment 244056 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=244056&action=review >>>> Source/JavaScriptCore/runtime/StringPrototype.cpp:728 >>>> + if (string->length() == 1) { >>> >>> If this is really for single character strings, then we can write a much tighter version of the repeatSmallString function, and I think we should. >> >> Yes. So instead of generic version of repeatSmallString function, I've introduced new version optimized for single character. In this version, I've used `std::fill_n` to fill the buffer. > > Also should decide LChar vs. UChar based on the value of the character, not on whether the string happens to be 8-bit. Oops, before landing the patch, I've missed reflecting your last comment, sorry for troubling you. I've created the subsequent patch for fixing this issue[1]. 1: https://bugs.webkit.org/show_bug.cgi?id=140139
Note You need to log in before you can comment on or make changes to this bug.