Bug 100576

Summary: Non-special escape character sequences cause JSC::Lexer::parseString to create 16 bit strings
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
oliver: review-
Patch with suggested updates darin: review+, msaboff: commit-queue?

Description Michael Saboff 2012-10-26 16:47:00 PDT
During processing escape characters, like \n or \xXX, if parseString() sees an escaped character that doesn't change it will create a 16 bit string.  That is an escaped character that returns the character itself. A common example would be slashes in a URL:  "http:\/\/foo.com\/bar".
Comment 1 Michael Saboff 2012-10-26 23:46:23 PDT
Created attachment 171077 [details]
Patch
Comment 2 Oliver Hunt 2012-10-27 10:27:12 PDT
Comment on attachment 171077 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=171077&action=review

Can you also add tests for characters < 32 and greater than 127

Essentially okay, but i'd like to see the refreshed patch before r+'ing

> Source/JavaScriptCore/parser/Lexer.cpp:654
> +        return singleCharacterEscapeValuesForASCII[c-32];

Alas you're not using the FixedArray type (you can't because there's no nice way to initialize it), so i would at the very least add
ASSERT(c - 32 < ARRAY_SIZE(singleCharacterEscapeValuesForASCII));

> LayoutTests/fast/js/script-tests/normal-character-escapes-in-string-literals.js:9
> +    shouldBe('eval(\'"' + escapedCharacter + '"\')', "expectedResult");

don't quote expectedResult here, it will make the test output much more sane.
Comment 3 Michael Saboff 2012-10-29 11:39:41 PDT
Created attachment 171285 [details]
Patch with suggested updates

Needed to change the ShouldBe() to shouldBeEqualToString() to work with the unquoted expectedResult.
Comment 4 Darin Adler 2012-10-29 11:51:30 PDT
Comment on attachment 171285 [details]
Patch with suggested updates

View in context: https://bugs.webkit.org/attachment.cgi?id=171285&action=review

> Source/JavaScriptCore/ChangeLog:16
> +        (JSC::::Lexer):

?

> Source/JavaScriptCore/ChangeLog:19
> +        (JSC::::parseString):
> +        (JSC::::parseStringSlowCase):

Should be:

    (JSC::Lexer::parseString):
    (JSC::Lexer::parseStringSlowCase):

The script gets it wrong, but you could fix it.

> Source/JavaScriptCore/parser/Lexer.cpp:657
> +    if ((c >= 32) && (c <= 127)) {
> +        ASSERT(static_cast<size_t>(c - 32) < ARRAY_SIZE(singleCharacterEscapeValuesForASCII));
> +        return singleCharacterEscapeValuesForASCII[c - 32];
>      }
> +    return 0;

Why not save two branches by putting a bunch of zeros in for 0..31 and 128..255?

> LayoutTests/ChangeLog:8
> +        Added a new test to validated the behaviorC of the corresponding changes to string processing

Typo: behaviorC

> LayoutTests/fast/js/normal-character-escapes-in-string-literals-expected.txt:88
> +PASS eval('"\ "') is " "
> +PASS eval('"\£"') is "£"
> +PASS eval('"\«"') is "«"
> +PASS eval('"\´"') is "´"

All the others are great, but these end up a little ugly.

> LayoutTests/fast/js/script-tests/normal-character-escapes-in-string-literals.js:8
> +    escapedCharacter = '\\' + character;
> +    expectedResult = character;

No need for these local variables.

> LayoutTests/fast/js/script-tests/normal-character-escapes-in-string-literals.js:9
> +    shouldBeEqualToString('eval(\'"' + escapedCharacter + '"\')', expectedResult);

Could be:
 
    shouldBeEqualToString('eval(\'"\\' + character + '"\')', character);
Comment 5 Michael Saboff 2012-10-29 13:46:37 PDT
(In reply to comment #4)
> (From update of attachment 171285 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=171285&action=review
> 
> > Source/JavaScriptCore/ChangeLog:16
> > +        (JSC::::Lexer):
> 
> ?

I think this is from the table.  I deleted it.

> > Source/JavaScriptCore/ChangeLog:19
> > +        (JSC::::parseString):
> > +        (JSC::::parseStringSlowCase):
> 
> Should be:
> 
>     (JSC::Lexer::parseString):
>     (JSC::Lexer::parseStringSlowCase):
> 
> The script gets it wrong, but you could fix it.

Fixed.

> > Source/JavaScriptCore/parser/Lexer.cpp:657
> > +    if ((c >= 32) && (c <= 127)) {
> > +        ASSERT(static_cast<size_t>(c - 32) < ARRAY_SIZE(singleCharacterEscapeValuesForASCII));
> > +        return singleCharacterEscapeValuesForASCII[c - 32];
> >      }
> > +    return 0;
> 
> Why not save two branches by putting a bunch of zeros in for 0..31 and 128..255?

I filled in entries for 0..31 and eliminated one branch and the subtraction.  The argument is an int and can be in the UChar range (0..65535) so I didn't extend beyond 127.

> > LayoutTests/ChangeLog:8
> > +        Added a new test to validated the behaviorC of the corresponding changes to string processing
> 
> Typo: behaviorC

Fixed.

> > LayoutTests/fast/js/normal-character-escapes-in-string-literals-expected.txt:88
> > +PASS eval('"\ "') is " "
> > +PASS eval('"\£"') is "£"
> > +PASS eval('"\«"') is "«"
> > +PASS eval('"\´"') is "´"
> 
> All the others are great, but these end up a little ugly.

I made them half as ugly by using charCodeAt().  New testOther() contains
    shouldEvaluateTo('eval(\'"\\' + character + '"\').charCodeAt(0)', character.charCodeAt(0));

Now those entries are
+PASS eval('"\ "').charCodeAt(0) is 160
+PASS eval('"\£"').charCodeAt(0) is 163
+PASS eval('"\«"').charCodeAt(0) is 171
+PASS eval('"\´"').charCodeAt(0) is 180

If we test characters with codes above 127 I think we'll have this ugliness.

> > LayoutTests/fast/js/script-tests/normal-character-escapes-in-string-literals.js:8
> > +    escapedCharacter = '\\' + character;
> > +    expectedResult = character;
> 
> No need for these local variables.
> 
> > LayoutTests/fast/js/script-tests/normal-character-escapes-in-string-literals.js:9
> > +    shouldBeEqualToString('eval(\'"' + escapedCharacter + '"\')', expectedResult);
> 
> Could be:
> 
>     shouldBeEqualToString('eval(\'"\\' + character + '"\')', character);

Fixed.
Comment 6 Michael Saboff 2012-10-29 15:50:22 PDT
Committed r132853: <http://trac.webkit.org/changeset/132853>