Bug 140420

Summary: JavaScript identifier incorrectly parsed if the prefix before an escape sequence is a keyword
Product: WebKit
Component: JavaScriptCore
Version: 528+ (Nightly build)
Severity: Normal
Priority: P2
Hardware: Unspecified
OS: Unspecified
Reporter: Alan Tam <Tam>
Assignee: Michael Saboff <msaboff>
CC: ggaren, msaboff
Description Flags
oliver: review+
Performance results of the patch none

Description Alan Tam 2015-01-13 17:49:23 PST
1. Run "in\u00e9dit = 1" in JavaScript console.

EXPECTED: It sets variable inédit to 1 and returns 1.
ACTUAL: SyntaxError: Unexpected keyword 'in'

This is a result of a minifier trying to convert all code to ASCII.
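
For illustration, here is a minimal sketch of the kind of ASCII conversion that could produce such input. The function name ascii_escape is hypothetical and not taken from any real minifier; a real tool would be token-aware and would only rewrite identifiers and string literals.

```python
def ascii_escape(source: str) -> str:
    """Replace every non-ASCII BMP character with a \\uXXXX escape.

    Hypothetical helper: it works on raw text and knows nothing about
    JavaScript tokens, so it only models the effect on an identifier.
    """
    out = []
    for ch in source:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                    # plain ASCII passes through
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)      # e.g. U+00E9 -> \u00e9
        else:
            raise ValueError("non-BMP characters are outside this sketch")
    return "".join(out)


original = "in\N{LATIN SMALL LETTER E WITH ACUTE}dit = 1"
print(ascii_escape(original))  # in\u00e9dit = 1
```

The escaped output begins with the two characters "in" followed by a backslash, which is the shape that triggers the error.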

Chrome 39 and Firefox 34 both work fine.
Comment 1 Michael Saboff 2015-01-13 20:47:24 PST
This is probably related to adding for..in iteration to the parser.
Comment 2 Michael Saboff 2015-01-13 20:57:33 PST
Using ToT r178251.

For "in\u00e9dit = 1;" I get:
SyntaxError: Unexpected keyword 'in'

For "var in\u00e9dit = 1;" I get:
SyntaxError: Cannot use the keyword 'in' as a variable name.
Comment 3 Alan Tam 2015-01-14 02:45:00 PST
It is not limited to for..in; it affects all keywords. Indeed, an object literal is another way to trigger the bug.

> ({while\u00e9dit:1})
SyntaxError: Unexpected identifier '\u00e9dit'. Expected a ':' following the property name 'while'.

Again, this works in Chrome and Firefox, returning this hash: {"whileédit":1}
Comment 4 Michael Saboff 2015-01-14 08:09:39 PST
Yes, it affects all keywords.

Testing the performance of a patch now.
Comment 5 Michael Saboff 2015-01-14 08:58:09 PST
The problem is that parseKeyword() matches "in" (or any other keyword) and then calls isIdentPart() on the next character, which here is the \ that begins the Unicode escape.  isIdentPart() only accepts characters of the types CharacterIdentifierStart, CharacterZero and CharacterNumber.  The \ character is CharacterBackSlash, so the keyword match is accepted.  The character that results from the Unicode escape \u00e9 is é, which has the character class CharacterIdentifierStart and should have continued the identifier.

parseKeyword() is generated from KeywordLookupGenerator.py.  Looks like it needs to be taught about escaped characters.

Adding a new isIdentPartOrEscape() function that calls isIdentPart().  If that fails, it looks for a '\' and a valid Unicode escape.  If it finds one, it checks the escaped Unicode character with isIdentPart().
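
The check described above can be modeled roughly as follows. This is a toy Python sketch, not the actual JavaScriptCore code: the Python names mirror isIdentPart() and isIdentPartOrEscape(), but matches_keyword(), the `fixed` flag, and the use of str.isalnum() as a stand-in for the lexer's character classes are all assumptions made for illustration. It only handles the four-hex-digit \uXXXX form.

```python
import re

# Assumption: only the classic \uXXXX escape form is modeled.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def is_ident_part(ch: str) -> bool:
    # Stand-in for the lexer's character-class test (an approximation).
    return ch.isalnum() or ch in "_$"


def is_ident_part_or_escape(text: str, pos: int) -> bool:
    """True if an identifier part, literal or escaped, starts at pos."""
    if pos >= len(text):
        return False
    if is_ident_part(text[pos]):
        return True
    m = ESCAPE.match(text, pos)
    # A backslash only counts if it starts a valid escape whose decoded
    # character is itself an identifier part.
    return bool(m) and is_ident_part(chr(int(m.group(1), 16)))


def matches_keyword(text: str, keyword: str, fixed: bool) -> bool:
    """True if text starts with keyword and the identifier ends there."""
    if not text.startswith(keyword):
        return False
    end = len(keyword)
    if fixed:
        return not is_ident_part_or_escape(text, end)
    # Old behavior: only the raw next character is inspected.
    return end >= len(text) or not is_ident_part(text[end])


src = r"in\u00e9dit = 1"
print(matches_keyword(src, "in", fixed=False))  # True: "in" wrongly taken as a keyword
print(matches_keyword(src, "in", fixed=True))   # False: the escape continues the identifier
```

With the escape-aware check, a real keyword followed by a space (for example "in x") still matches, while "in\u00e9dit" and "while\u00e9dit" fall through to ordinary identifier parsing.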
Comment 6 Michael Saboff 2015-01-14 10:26:22 PST
Created attachment 244611
Comment 7 Michael Saboff 2015-01-14 10:29:43 PST
Created attachment 244612
Performance results of the patch

Seems to be neutral.
Comment 8 Michael Saboff 2015-01-14 10:48:54 PST
Committed r178427: <http://trac.webkit.org/changeset/178427>