Bug 26590 - Support for String.trim(), String.trimLeft() and String.trimRight() methods
Summary: Support for String.trim(), String.trimLeft() and String.trimRight() methods
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-21 15:51 PDT by Ioseb Dzmanashvili
Modified: 2013-08-13 03:13 PDT (History)
3 users (show)

See Also:


Attachments
Implementation of all three methods(trim, leftTrim, rightTrim) and corresponding test case. (5.60 KB, patch)
2009-06-21 15:55 PDT, Ioseb Dzmanashvili
oliver: review-
Details | Formatted Diff | Diff
String.trim(), String.trimLeft(), String.trimRight() patch update (11.54 KB, patch)
2009-06-22 15:52 PDT, Ioseb Dzmanashvili
no flags Details | Formatted Diff | Diff
String.trim(), String.trimLeft(), String.trimRight() patch update (11.88 KB, patch)
2009-06-23 12:25 PDT, Ioseb Dzmanashvili
darin: review-
Details | Formatted Diff | Diff
String.trim(), String.trimLeft(), String.trimRight() patch update (10.82 KB, patch)
2009-06-23 15:07 PDT, Ioseb Dzmanashvili
darin: review-
Details | Formatted Diff | Diff
String.trim(), String.trimLeft(), String.trimRight() patch update (11.61 KB, patch)
2009-06-24 01:28 PDT, Ioseb Dzmanashvili
no flags Details | Formatted Diff | Diff
Support for String.trim(), String.trimLeft() and String.trimRight() methods (12.97 KB, patch)
2009-10-10 19:48 PDT, Oliver Hunt
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ioseb Dzmanashvili 2009-06-21 15:51:06 PDT
Just created patch for supporting trim(), trimLeft() and trimRight() methods.

For more details see: https://developer.mozilla.org/En/Core_JavaScript_1.5_Reference/Global_Objects/String/Trim

String.trim() is part of the ECMAScript 5 standard, while String.trimLeft() and String.trimRight() are not...
Comment 1 Ioseb Dzmanashvili 2009-06-21 15:55:22 PDT
Created attachment 31624 [details]
Implementation of all three methods(trim, leftTrim, rightTrim) and corresponding test case.
Comment 2 Oliver Hunt 2009-06-21 16:06:45 PDT
Comment on attachment 31624 [details]
Implementation of all three methods(trim, leftTrim, rightTrim) and corresponding test case.

> +static inline int isWhiteSpace(int c) 
> +{
> +	return (c >= 0x09 && c <= 0x0d) || c == 0x20;
> +}
There is already an isWhitespace style function  
> +static inline JSValue trimString(ExecState* exec, JSValue thisValue, bool left, bool right) 
> +{		
> +	JSString* sVal = thisValue.toThisJSString(exec);
toThisJSString may throw, so there should be an exception check here.

> +	const UString& s = sVal->value();
> +		
> +	int sSize = s.size();
> +	if (!sSize)
> +		return sVal;
> +		
> +	const UChar* sData = s.data();
> +		
> +	int start = 0;
> +	int end = sSize - 1;
...		
> +	return jsSubstring(exec, s, static_cast<unsigned>(start), static_cast<unsigned>((end - start) + 1));
> +}
Rather than casting here just define start and end as unsigned in the first place
> Index: LayoutTests/fast/js/resources/string-trim.js
> ===================================================================
> --- LayoutTests/fast/js/resources/string-trim.js	(revision 0)
> +++ LayoutTests/fast/js/resources/string-trim.js	(revision 0)
> @@ -0,0 +1,12 @@
> +description(
> +"This test checks trim(), leftTrim() and rightTrim() functions."
> +);
> +
> +shouldBe("'					   '.trim()",          "''");
> +shouldBe("'					   '.trimLeft()",      "''");
> +shouldBe("'					   '.trimRight()",     "''");
> +shouldBe("'		foo bar			   '.trim()",      "'foo bar'");
> +shouldBe("'		foo bar			   '.trimLeft()",  "'foo bar			   '");
> +shouldBe("'		foo bar			   '.trimRight()", "'		foo bar'");

You should test non-space whitespace as well, \n, \t, \r, etc, also you should test trim.apply({}), etc
Comment 3 Darin Adler 2009-06-21 18:08:01 PDT
Looks really good. Thanks for taking this on!

The test case needs to cover a lot more cases, including the various characters that are and are not whitespace.

Also, we frown on the use of mysterious booleans. I think an enum TrimLeft, TrimRight, TrimBothLeftAndRight would work better than two booleans anyway.
Comment 4 Ioseb Dzmanashvili 2009-06-22 15:52:02 PDT
Created attachment 31687 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update
Comment 5 Ioseb Dzmanashvili 2009-06-22 15:55:06 PDT
(In reply to comment #2)

Thank you Oliver!

I've added tests for all whitespace characters that are recommended by Ecma 3.1 spec. Tests for *.call() and *.apply() methods are added as well.

> (From update of attachment 31624 [details] [review])
> > +static inline int isWhiteSpace(int c) 
> > +{
> > +	return (c >= 0x09 && c <= 0x0d) || c == 0x20;
> > +}
> There is already an isWhitespace style function  
> > +static inline JSValue trimString(ExecState* exec, JSValue thisValue, bool left, bool right) 
> > +{		
> > +	JSString* sVal = thisValue.toThisJSString(exec);
> toThisJSString may throw, so there should be an exception check here.
> 
> > +	const UString& s = sVal->value();
> > +		
> > +	int sSize = s.size();
> > +	if (!sSize)
> > +		return sVal;
> > +		
> > +	const UChar* sData = s.data();
> > +		
> > +	int start = 0;
> > +	int end = sSize - 1;
> ...             
> > +	return jsSubstring(exec, s, static_cast<unsigned>(start), static_cast<unsigned>((end - start) + 1));
> > +}
> Rather than casting here just define start and end as unsigned in the first
> place
> > Index: LayoutTests/fast/js/resources/string-trim.js
> > ===================================================================
> > --- LayoutTests/fast/js/resources/string-trim.js	(revision 0)
> > +++ LayoutTests/fast/js/resources/string-trim.js	(revision 0)
> > @@ -0,0 +1,12 @@
> > +description(
> > +"This test checks trim(), leftTrim() and rightTrim() functions."
> > +);
> > +
> > +shouldBe("'					   '.trim()",          "''");
> > +shouldBe("'					   '.trimLeft()",      "''");
> > +shouldBe("'					   '.trimRight()",     "''");
> > +shouldBe("'		foo bar			   '.trim()",      "'foo bar'");
> > +shouldBe("'		foo bar			   '.trimLeft()",  "'foo bar			   '");
> > +shouldBe("'		foo bar			   '.trimRight()", "'		foo bar'");
> 
> You should test non-space whitespace as well, \n, \t, \r, etc, also you should
> test trim.apply({}), etc
> 

Comment 6 Ioseb Dzmanashvili 2009-06-22 16:01:32 PDT
(In reply to comment #4)

Thank you for positive response Darin!

I've added more tests for all mentioned characters. 

"Also, we frown on the use of mysterious booleans. I think an enum TrimLeft,
TrimRight, TrimBothLeftAndRight would work better than two booleans anyway."

Actually this is my first attempt to contribute code to the WebKit engine and I don't know if my code violates coding conventions.. if this is case I'll change code of course... these "mysterious booleans" are used in "trimString()" method that is used only internally by: trim(), trimLeft() and trimRight() methods and actually is not accessible from JavaScript.




> Created an attachment (id=31687) [review]
> String.trim(), String.trimLeft(), String.trimRight() patch update
> 

Comment 7 Darin Adler 2009-06-22 16:46:25 PDT
(In reply to comment #6)
> "Also, we frown on the use of mysterious booleans. I think an enum TrimLeft,
> TrimRight, TrimBothLeftAndRight would work better than two booleans anyway."
> 
> Actually this is my first attempt to contribute code to the WebKit engine and I
> don't know if my code violates coding conventions. if this is case I'll change
> code of course. these "mysterious booleans" are used in "trimString()" method
> that is used only internally by: trim(), trimLeft() and trimRight() methods and
> actually is not accessible from JavaScript.

Even inside the code, those booleans are unnecessarily vague, and an enum would work better.
Comment 8 Ioseb Dzmanashvili 2009-06-22 17:06:55 PDT
(In reply to comment #7)

Thank you for quick and helpful response, I agree with you I'll update patch ASAP.

> 
> Even inside the code, those booleans are unnecessarily vague, and an enum would
> work better.
> 

Comment 9 Ioseb Dzmanashvili 2009-06-23 12:25:43 PDT
Created attachment 31724 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update
Comment 10 Ioseb Dzmanashvili 2009-06-23 12:33:14 PDT
(In reply to comment #7)

Darin,

I've updated patch and removed those booleans from the code... I changed them with enum as you suggested.

> 
> Even inside the code, those booleans are unnecessarily vague, and an enum would
> work better.
> 

Comment 11 Darin Adler 2009-06-23 13:03:56 PDT
Comment on attachment 31724 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update

Thanks very much for addressing my comments from the early version. I now read this even more closely and have some new comments.

I have doubts about is the isWhiteSpace function.

There is already an isWhiteSpace function in Lexer, an isStrWhiteSpace function in JSGlobalObjectFunctions, and WTF::Unicode::isSeparatorSpace.

I suspect that this new isWhiteSpace function is identical to WTF::Unicode::isSeparatorSpace -- could you check on that?

Also, I'm not sure it's appropriate for the large isWhiteSpace function to be marked inline, and I suspect that the const int array ends up being re-created and initialized every time the function is called. None of this matters if we can just use isSeparatorSpace.

> +        (JSC::):

Please don't leave a line like this in the ChangeLog, even if the prepare-ChangeLog script puts it there.

> +static inline JSValue trimString(ExecState* exec, JSValue thisValue, TrimMode trimMode) 
> +{		
> +    JSString* sVal = thisValue.toThisJSString(exec);
> +    const UString& s = sVal->value();
> +		
> +    int sSize = s.size();
> +    if (!sSize)
> +        return sVal;
> +		
> +    const UChar* sData = s.data();
> +		
> +    int start = 0;
> +    int end = sSize - 1;
> +		
> +    if (trimMode == TrimLeft || trimMode == TrimBothLeftAndRight) {
> +        while (start < sSize && isWhiteSpace(sData[start])) {
> +            start++;
> +        }
> +    }
> +		
> +    if(trimMode == TrimRight || trimMode == TrimBothLeftAndRight) {
> +        while (end >= start && isWhiteSpace(sData[end])) {
> +            end--;
> +        }
> +    }
> +	
> +    return jsSubstring(exec, s, static_cast<unsigned>(start), static_cast<unsigned>((end - start) + 1));
> +}

I think the logic would be slightly clearer if end pointed past the last character instead of at the last character. You would have to say "end - 1" in the place you call isWhiteSpace, but otherwise it would be easier to read the function.

I suggest using unsigned instead of int for end, start, and sSize. You only need a signed int because of end being a pointer before the last character, I think. You could get rid of the type casts, then.

I think sSize could just be named size.

No braces around single line bodies of while. Need a space after if and before the open parenthesis.

This needs a special case to return sVal when start is 0 and end is sSize. I think it's common to call trim and not trim anything, and in that case we don't want to allocate a new JSString cell.

There is no need for the special case for a size of zero. The rest of the function already does the right thing for that case already, including the jsSubstring function which won't allocate anything for a zero-length string, and I don't think an empty string needs extra performance optimization.

> +var successfullyParsed = true;
> \ No newline at end of file

Please add the newline here.

I'd like to see more test cases of strings that don't need to be trimmed, including characters that might seem to be spaces but don't qualify, such as control characters. If we can make the test run fast enough it might even be good to find a way to test many other characters to see they are not trimmed. I can't think of an efficient way to do it on all 2^21 characters, though, so you might have to be creative.

review- for now since I'd like you to do at least some of the things I suggest.
Comment 12 Ioseb Dzmanashvili 2009-06-23 15:07:45 PDT
Created attachment 31741 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update
Comment 13 Ioseb Dzmanashvili 2009-06-23 15:33:23 PDT
(In reply to comment #11)

Darin,

Thank you for detailed explanations and suggestions! I've changed the code.

> I have doubts about is the isWhiteSpace function.
> 
> There is already an isWhiteSpace function in Lexer, an isStrWhiteSpace function
> in JSGlobalObjectFunctions, and WTF::Unicode::isSeparatorSpace.
> 
> I suspect that this new isWhiteSpace function is identical to
> WTF::Unicode::isSeparatorSpace -- could you check on that?

I examined functions you mentioned before I wrote my own version. WTF::Unicode::isSeparatorSpace only checks if character is "space" and nothing more, I already tried the function before.

isStrWhiteSpace from JSGlobalObjectFunctions and isWhiteSpace from Lexer functions are not enough as far as they check very limited amount of possible whitespaces. Whitespace list I used in my own version is recommended by Ecma 3.1 spec.

> Also, I'm not sure it's appropriate for the large isWhiteSpace function to be
> marked inline, and I suspect that the const int array ends up being re-created
> and initialized every time the function is called. None of this matters if we
> can just use isSeparatorSpace.

I changed function to be inline no more. I analyzed code more carefully and found that there were two ranges of characters - one from 0x09 to 0x0d and second 0x2000 to 0x200b, without these ranges there left very few whitespace characters so I changed implementation and made it similar to the function from JSGlobalObjectFunctions.. so array re-creation and re-initialization is not the case anymore I think.
 
> > +        (JSC::):
> 
> Please don't leave a line like this in the ChangeLog, even if the
> prepare-ChangeLog script puts it there.

Done

> I think the logic would be slightly clearer if end pointed past the last
> character instead of at the last character. You would have to say "end - 1" in
> the place you call isWhiteSpace, but otherwise it would be easier to read the
> function.

Done
 
> I suggest using unsigned instead of int for end, start, and sSize. You only
> need a signed int because of end being a pointer before the last character, I
> think. You could get rid of the type casts, then.

Done
 
> I think sSize could just be named size.

Done
 
> No braces around single line bodies of while. Need a space after if and before
> the open parenthesis.

Done
 
> This needs a special case to return sVal when start is 0 and end is sSize. I
> think it's common to call trim and not trim anything, and in that case we don't
> want to allocate a new JSString cell.

Done
 
> There is no need for the special case for a size of zero. The rest of the
> function already does the right thing for that case already, including the
> jsSubstring function which won't allocate anything for a zero-length string,
> and I don't think an empty string needs extra performance optimization.

Done
 
> > +var successfullyParsed = true;
> > \ No newline at end of file
> 
> Please add the newline here.

Done
 
> I'd like to see more test cases of strings that don't need to be trimmed,
> including characters that might seem to be spaces but don't qualify, such as
> control characters. If we can make the test run fast enough it might even be
> good to find a way to test many other characters to see they are not trimmed. I
> can't think of an efficient way to do it on all 2^21 characters, though, so you
> might have to be creative.

Maybe I'm wrong, but in test cases I'm using whitespace character list recommended by Ecma 3.1 spec... other whitespace alike characters couldn't be treated as whitespace chars because characters are matched against predefined list.

Thank you for being responsive and helpful!

Comment 14 Darin Adler 2009-06-23 17:02:36 PDT
Comment on attachment 31741 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update

Looks better. Thanks again for working on it.

> Index: JavaScriptCore/ChangeLog
> ===================================================================
> --- JavaScriptCore/ChangeLog	(revision 45008)
> +++ JavaScriptCore/ChangeLog	(working copy)
> @@ -1,3 +1,14 @@
> +2009-06-23  Ioseb Dzmanashvili  <ioseb.dzmanashvili@gmail.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        * runtime/StringPrototype.cpp:
> +        (JSC::isWhiteSpace):
> +        (JSC::trimString):
> +        (JSC::stringProtoFuncTrim):
> +        (JSC::stringProtoFuncTrimLeft):
> +        (JSC::stringProtoFuncTrimRight):

You need to have a change log entry here, with comments explaining what you did, and quoting the URL and title of the bug. Look at other ChangeLog entries to see examples of how to do it. Ideallyou you have a separate comment for each function.

> +bool isWhiteSpace(int c) 
> +{	
> +    switch (c) {
> +        case 0x20:
> +        case 0xa0:
> +        case 0x1680:
> +        case 0x180e:
> +        case 0x2028:
> +        case 0x2029:
> +        case 0x202f:
> +        case 0x205f:
> +        case 0x3000:
> +            return true;
> +        default:
> +            return c >= 0x09 && c <= 0x0d || c >= 0x2000 && c <= 0x200b || isSeparatorSpace(c);
> +	}
> +}

If this is used only within the file it should be marked "static" to give it internal linkage.

I don't understand what's going on with this function. It has the name "isWhiteSpace" which doesn't seem specific enough. Is there any difference between this and the isStrWhiteSpace function in JSGlobalObjectFunctions.h? Can you just use that function instead of adding your own?

> +    if(trimMode == TrimRight || trimMode == TrimBothLeftAndRight) {

Need a space here as I mentioned before.

> +    return jsSubstring(exec, s, static_cast<unsigned>(start), static_cast<unsigned>(end - start));

Please remove these unneeded static_cast.

review- because of the isWhiteSpace issue and lack of ChangeLog contents
Comment 15 Ioseb Dzmanashvili 2009-06-24 01:28:38 PDT
Created attachment 31774 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update
Comment 16 Ioseb Dzmanashvili 2009-06-24 01:37:23 PDT
(In reply to comment #14)

> You need to have a change log entry here, with comments explaining what you
> did, and quoting the URL and title of the bug. Look at other ChangeLog entries
> to see examples of how to do it. Ideallyou you have a separate comment for each
> function.

Done

> If this is used only within the file it should be marked "static" to give it
> internal linkage.

Done

> I don't understand what's going on with this function. It has the name
> "isWhiteSpace" which doesn't seem specific enough. Is there any difference
> between this and the isStrWhiteSpace function in JSGlobalObjectFunctions.h? Can
> you just use that function instead of adding your own?

As I mentioned in my previous comment "isStrWhiteSpace" function in JSGlobalObjectFunction checks passed character against very limited list of whitespace chars... that was the reason to implementing my own function. I've nothing against existing functions but what they do is not really enough. 

> > +    if(trimMode == TrimRight || trimMode == TrimBothLeftAndRight) {
> 
> Need a space here as I mentioned before.

Done

> > +    return jsSubstring(exec, s, static_cast<unsigned>(start), static_cast<unsigned>(end - start));
> 
> Please remove these unneeded static_cast.
> 
> review- because of the isWhiteSpace issue and lack of ChangeLog contents
> 

Comment 17 Darin Adler 2009-06-24 09:47:35 PDT
(In reply to comment #16)
> As I mentioned in my previous comment "isStrWhiteSpace" function in
> JSGlobalObjectFunction checks passed character against very limited list of
> whitespace chars.

Please cite an example of a particular character that your function return true for and isStrWhiteSpace returns false for. And if these functions are different, please change the name of your to make it clear why it's different.
Comment 18 Ioseb Dzmanashvili 2009-06-24 15:24:53 PDT
(In reply to comment #17)

> Please cite an example of a particular character that your function return true
> for and isStrWhiteSpace returns false for. And if these functions are
> different, please change the name of your to make it clear why it's different.

Darin,

I'm very sorry, I wasn't careful enough.. problem was really in code I used to test capabilities of "isStrWhiteSpace". Real problem is in one character "\u200B" - ZERO WIDTH SPACE (category Cf) for other white space characters "isStrWhiteSpace" works perfectly. Obviously I need to remove my own function(at least rewrite) and what would be your suggestion? one solution is to check this particular character in the "while" condition, second I think would be creating inline function that wraps "isStrWhiteSpace()" function and  "\u200B" character check, something like this:

>static inline bool isCharWhiteSpace(int c) {
> return isStrWhiteSpace(c) || c == 0x200b;
>}

third one would ignoring this particular char.. but I thinks this solution is not suitable..
Comment 19 Eric Seidel (no email) 2009-06-30 03:17:44 PDT
Comment on attachment 31774 [details]
String.trim(), String.trimLeft(), String.trimRight() patch update

r- per the above (unresolved) isWhiteSpace discussion.

My question would be why should isStrWhiteSpace not respect 0x200b?  If it should then we should just change isStrWhiteSpace.  If it shouldn't, then you should add your own function:

static inline bool isTrimableWhiteSpaceChar(int c)
{
    return isStrWhiteSpace(c) || c == 0x200b; // zero width space
}

or similar.
Comment 20 Ioseb Dzmanashvili 2009-06-30 07:59:08 PDT
(In reply to comment #19)

Hi Eric,

Thank you for response.

> My question would be why should isStrWhiteSpace not respect 0x200b?  If it
> should then we should just change isStrWhiteSpace.

I'm not sure about this... I don't know if "0x200b" should be included in isStrWhiteSpace function's implementation. I took these characters from Ecma 3.1 proposal for String.trim();

Here is the link to the PDF document: http://wiki.ecmascript.org/lib/exe/fetch.php?id=es3.1%3Atargeted_additions_to_array_string_object_date&cache=cache&media=es3.1:es31_stringobject.pdf

Source: http://wiki.ecmascript.org/doku.php?id=es3.1:targeted_additions_to_array_string_object_date

> If it shouldn't, then you
> should add your own function:
> 
> static inline bool isTrimableWhiteSpaceChar(int c)
> {
>     return isStrWhiteSpace(c) || c == 0x200b; // zero width space
> }

Decision is up to you, I just need to know what to do next. I'm waiting for response and I'll upload patch immediately.

Thank you again.
Comment 21 Oliver Hunt 2009-10-10 19:48:12 PDT
Created attachment 40996 [details]
Support for String.trim(), String.trimLeft() and String.trimRight() methods
Comment 22 Maciej Stachowiak 2009-10-10 20:44:08 PDT
Comment on attachment 40996 [details]
Support for String.trim(), String.trimLeft() and String.trimRight() methods

r=me
Comment 23 Oliver Hunt 2009-10-10 20:47:28 PDT
Committed r49423
Comment 24 Vano Beridze 2009-10-13 04:38:12 PDT
IMHO Ioseb Dzmanashvili should be mentioned in the patch as the original author of this enhacement.