Bug 140420

Summary: JavaScript identifier incorrectly parsed if the prefix before an escape sequence is a keyword
Product: WebKit
Reporter: Alan Tam <Tam>
Component: JavaScriptCore
Assignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED
Severity: Normal
CC: ggaren, msaboff
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified

Alan Tam
Reported 2015-01-13 17:49:23 PST
STEPS:
1. Run "in\u00e9dit = 1" in the JavaScript console.
EXPECTED: It sets the variable inédit to 1 and returns 1.
ACTUAL: SyntaxError: Unexpected keyword 'in'
The input comes from a minifier that converts all code to ASCII. Chrome 39 and Firefox 34 both work fine.
Attachments
Patch (6.25 KB, patch)
2015-01-14 10:26 PST, Michael Saboff
oliver: review+
Performance results of the patch (50.09 KB, text/plain)
2015-01-14 10:29 PST, Michael Saboff
no flags
Michael Saboff
Comment 1 2015-01-13 20:47:24 PST
This is probably related to adding for..in iteration to the parser.
Michael Saboff
Comment 2 2015-01-13 20:57:33 PST
Using ToT r178251.
For "in\u00e9dit = 1;" I get:
SyntaxError: Unexpected keyword 'in'
For "var in\u00e9dit = 1;" I get:
SyntaxError: Cannot use the keyword 'in' as a variable name.
Alan Tam
Comment 3 2015-01-14 02:45:00 PST
It is not limited to for..in; it affects all keywords. For example, an object literal is another way to trigger the bug:
> ({while\u00e9dit:1})
SyntaxError: Unexpected identifier '\u00e9dit'. Expected a ':' following the property name 'while'.
Again, this works in Chrome and Firefox, returning this object: {"whileédit":1}
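The cases reported so far can be checked in one pass with a small script. This is only a sketch: parseError() is a hypothetical helper name, and it relies on the standard Function constructor to compile each snippet without running it. On an engine without this bug, every line should report "parses".

```javascript
// Hypothetical helper: returns null if `source` parses as a function body,
// otherwise the message of the error thrown while compiling it.
function parseError(source) {
  try {
    new Function(source); // compiles the body, does not execute it
    return null;
  } catch (e) {
    return e instanceof SyntaxError ? e.message : String(e);
  }
}

// "\\u00e9" keeps the escape sequence in the source text handed to the parser.
const cases = [
  "in\\u00e9dit = 1;",
  "var in\\u00e9dit = 1;",
  "({while\\u00e9dit: 1});",
];

for (const source of cases) {
  const error = parseError(source);
  console.log(source, "=>", error === null ? "parses" : error);
}
```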
Michael Saboff
Comment 4 2015-01-14 08:09:39 PST
Yes, it affects all keywords. Testing the performance of a patch now.
Michael Saboff
Comment 5 2015-01-14 08:58:09 PST
The problem is in parseKeyword(), which matches "in" or any other keyword and then calls isIdentPart() on the next character, here the \ that starts the unicode escape. isIdentPart() only accepts characters of type CharacterIdentifierStart, CharacterZero or CharacterNumber. The \ character is CharacterBackSlash, so the keyword match is accepted even though the identifier continues. The character that results from the unicode escape \u00e9 is é, which has the character class CharacterIdentifierStart.
parseKeyword() is generated from KeywordLookupGenerator.py. It looks like it needs to be taught about escaped characters.
I am adding a new isIdentPartOrEscape() function that first calls isIdentPart(). If that fails, it looks for a '\' followed by a valid unicode escape. If it finds one, it checks the resulting unicode character with isIdentPart().
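To illustrate the check described above, here is a minimal sketch of the logic in JavaScript. It is not the patch, which changes the C++ lexer and the generated parseKeyword(). The isIdentPart() below is a simplified stand-in built on Unicode property escapes rather than the lexer's character-type table, only the four-digit \uXXXX escape form is handled, and startsWithKeyword() is a hypothetical name for the keyword-boundary test.

```javascript
// Simplified stand-in for the lexer's isIdentPart(): '$', ZWNJ, ZWJ, or any
// code point with the Unicode ID_Continue property (letters, digits, '_', ...).
function isIdentPart(ch) {
  return /^[$\u200c\u200d\p{ID_Continue}]$/u.test(ch);
}

// True if the code point at `pos` can continue an identifier, either
// directly or through a \uXXXX escape that decodes to such a character.
function isIdentPartOrEscape(src, pos) {
  const cp = src.codePointAt(pos);
  if (cp === undefined) return false; // end of input
  const ch = String.fromCodePoint(cp);
  if (isIdentPart(ch)) return true;
  if (ch !== "\\") return false;
  const m = /^\\u([0-9a-fA-F]{4})/.exec(src.slice(pos));
  if (!m) return false; // not a valid four-digit unicode escape
  return isIdentPart(String.fromCharCode(parseInt(m[1], 16)));
}

// Hypothetical boundary test: the text is the keyword only if the
// identifier really ends right after it.
function startsWithKeyword(src, keyword) {
  return src.startsWith(keyword) && !isIdentPartOrEscape(src, keyword.length);
}

console.log(startsWithKeyword("in x", "in"));             // true
console.log(startsWithKeyword("index", "in"));            // false
console.log(startsWithKeyword("in\\u00e9dit = 1", "in")); // false
```

With a plain isIdentPart() check in startsWithKeyword(), the last call would return true, which corresponds to the "Unexpected keyword 'in'" error in the report.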
Michael Saboff
Comment 6 2015-01-14 10:26:22 PST
Michael Saboff
Comment 7 2015-01-14 10:29:43 PST
Created attachment 244612 [details] Performance results of the patch Seems to be neutral.
Michael Saboff
Comment 8 2015-01-14 10:48:54 PST