Bug 140420 - JavaScript identifier incorrectly parsed if the prefix before an escape sequence is a keyword
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Michael Saboff
Depends on:
Reported: 2015-01-13 17:49 PST by Alan Tam
Modified: 2015-01-14 10:48 PST

Patch (6.25 KB, patch)
2015-01-14 10:26 PST, Michael Saboff
oliver: review+
Performance results of the patch (50.09 KB, text/plain)
2015-01-14 10:29 PST, Michael Saboff
no flags

Description Alan Tam 2015-01-13 17:49:23 PST
1. Run "in\u00e9dit = 1" in JavaScript console.

EXPECTED: It sets variable inédit to 1 and returns 1.
ACTUAL: SyntaxError: Unexpected keyword 'in'

This is a result of a minifier trying to convert all code to ASCII.

Chrome 39 and Firefox 34 both work fine.
Comment 1 Michael Saboff 2015-01-13 20:47:24 PST
This is probably related to adding for..in iteration to the parser.
Comment 2 Michael Saboff 2015-01-13 20:57:33 PST
Using ToT r178251.

For "in\u00e9dit = 1;" I get:
SyntaxError: Unexpected keyword 'in'

For "var in\u00e9dit = 1;" I get:
SyntaxError: Cannot use the keyword 'in' as a variable name.
Comment 3 Alan Tam 2015-01-14 02:45:00 PST
It is not limited to for..in; it affects all keywords. Indeed, an object literal is another way to trigger the bug.

> ({while\u00e9dit:1})
SyntaxError: Unexpected identifier '\u00e9dit'. Expected a ':' following the property name 'while'.

Again, this works in Chrome and Firefox, returning this hash: {"whileédit":1}
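
Both cases from this report can be checked together with a small script. This is only a sketch: the variable names are made up for illustration, and the source text is passed through indirect eval so that an affected engine throws a SyntaxError at run time instead of refusing to load the whole script.

```javascript
// Indirect eval runs the string as sloppy-mode global code, so the
// assignment creates a global variable named inédit.
const assignResult = (0, eval)("in\\u00e9dit = 1");

// The object literal should end up with a single property named whileédit.
const literalKeys = Object.keys((0, eval)("({while\\u00e9dit: 1})"));

console.log(assignResult, literalKeys);
```

On an engine without the bug, the first expression evaluates to 1 and the second yields a single key, whileédit.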
Comment 4 Michael Saboff 2015-01-14 08:09:39 PST
Yes, it affects all keywords.

Testing the performance of a patch now.
Comment 5 Michael Saboff 2015-01-14 08:58:09 PST
The problem is that parseKeyword() matches "in" (or any other keyword) and then calls isIdentPart() on the next character, which here is the \ that starts the unicode escape.  isIdentPart() only accepts characters of the types CharacterIdentifierStart, CharacterZero and CharacterNumber.  The \ character is CharacterBackSlash, so the keyword match is accepted.  The character that results from the unicode escape \u00e9 is é, which has the character class CharacterIdentifierStart, so it should have continued the identifier instead.

parseKeyword() is generated from KeywordLookupGenerator.py.  Looks like it needs to be taught about escaped characters.

Adding a new isIdentPartOrEscape() function that will call isIdentPart().  If that fails, it looks for '\' and a valid unicode escape.  If it finds one, it checks that unicode character with isIdentPart().
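
The check described above can be modelled roughly as follows. This is a simplified JavaScript sketch, not the C++ from the patch: the function names come from this comment, but the bodies, the (src, pos) parameters, and the use of the Unicode ID_Continue property in place of the lexer's character-type table are all assumptions made for illustration. Only the four-hex-digit \uXXXX form is handled, and characters outside the Basic Multilingual Plane are ignored.

```javascript
// Stand-in for the lexer's isIdentPart(): accepts "$", "_", ZWNJ/ZWJ and
// any character Unicode classifies as ID_Continue. (Assumption: the real
// lexer uses its own character-type table instead.)
function isIdentPart(ch) {
  return /^[$_\u200c\u200d\p{ID_Continue}]$/u.test(ch);
}

// Hypothetical model of isIdentPartOrEscape(). `src` is the source text and
// `pos` is the index of the character right after a matched keyword.
function isIdentPartOrEscape(src, pos) {
  const ch = src[pos];
  if (ch === undefined) return false;   // end of input
  if (isIdentPart(ch)) return true;     // plain identifier character
  if (ch !== "\\") return false;        // not the start of an escape
  // Look for "u" followed by exactly four hex digits after the backslash.
  const m = /^u([0-9a-fA-F]{4})/.exec(src.slice(pos + 1));
  if (m === null) return false;         // malformed escape
  // Decode the escape and classify the character it stands for.
  return isIdentPart(String.fromCharCode(parseInt(m[1], 16)));
}
```

With the input "in\u00e9dit = 1", the character at index 2 is the backslash, the escape decodes to é, and the function returns true, which tells the keyword matcher that "in" is only the prefix of a longer identifier.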
Comment 6 Michael Saboff 2015-01-14 10:26:22 PST
Created attachment 244611
Comment 7 Michael Saboff 2015-01-14 10:29:43 PST
Created attachment 244612
Performance results of the patch

Seems to be neutral.
Comment 8 Michael Saboff 2015-01-14 10:48:54 PST
Committed r178427: <http://trac.webkit.org/changeset/178427>