WebKit doesn't permit escaped sequences in JavaScript identifiers. As I understand ECMA-262, these are supposed to be handled exactly like the corresponding characters. Firefox passes this test.
Created attachment 3849 [details] test case
Created attachment 4309 [details] Darin's patch extracted from 4885. Fixes unicode escapes Darin's patch extracted from bug 4885. Fixes unicode escapes. Note: in 4885, there's no handling of the case where there is an unicode escape in an identifier which maps to a token after the escape is augmented.
Comment on attachment 4309 [details] Darin's patch extracted from 4885. Fixes unicode escapes Looks great (not sure if I can review my own patch). Where does the specification say that if you use \u sequences, then an identifier is not a keyword? I re-read it and that was not clear to me. Did you test other implementations to see if that was so? (Part of the reason I ask is that I know in C++ it does not work this way; you can use the sequences in keywords.) The boolean identifierIsNotToken is a great way to implement that rule, but I recommend naming it identifierIsNotKeyword instead. Another way to implement it would be to create a separate state. The old one could be renamed IdentifierOrKeyword and the new state would be just Identifier. The IdentifierOrKeyword case could fall into the Identifier case after checking for and handling keywords. Besides allowing \u sequences in identifiers, this patch also implements the full rule for what characters are allowed in identifiers. So the attached test case is insufficient. We need test cases that cover a variety of valid and invalid identifiers, as well as covering the case of an identifer that uses a \u sequence to avoid collision with a keyword. The minimum that would be needed on this patch would be renaming identifierIsNotToken to identifierIsNotKeyword and adding more test cases.
Created attachment 4313 [details] New version of the unicode escape fix patch Unicode escape fix. Changes the bool variable identifierIsNotToken to few extra states.
(In reply to comment #3) > Where does the specification say that if you use \u sequences, then an > identifier is not a keyword? I re-read it and that was not clear to me. Did you > test other implementations to see if that was so? (Part of the reason I ask is > that I know in C++ it does not work this way; you can use the sequences in > keywords.) The grammar says that 'Keyword' can be one of the set of keywords and that 'IdentifierStart' can have 'UnicodeEscape', so I assumed that that means that 'Keyword' can not have 'UnicodeEscape'. As far as I can see, the spec says that you shouldn't have unicode escapes in keywords. It's not explicitly written out, so I could be missing something.. Yes, Firefox behaves in same way. There's also mozilla test to test this - ecma_3/Unicode/uc-003.js is fixed with the patch.. > The boolean identifierIsNotToken is a great way to implement that rule, but I > recommend naming it identifierIsNotKeyword instead. Another way to implement it > would be to create a separate state. The old one could be renamed > IdentifierOrKeyword and the new state would be just Identifier. The > IdentifierOrKeyword case could fall into the Identifier case after checking for > and handling keywords. This is certainly more explicit and clearer. I added this to the new patch. > Besides allowing \u sequences in identifiers, this patch also implements the > full rule for what characters are allowed in identifiers. So the attached test > case is insufficient. We need test cases that cover a variety of valid and > invalid identifiers, as well as covering the case of an identifer that uses a > \u sequence to avoid collision with a keyword. There's atleast that ecma_3/Unicode/uc-003.js which is fixed. It tests the collision with the keyword.
Created attachment 4314 [details] Testcase to demonstrate keyword clash with unicode escape sequences Testcase to demonstrate keyword clash with unicode escape sequences. Duplicates the test of ecma_3/Unicode/uc-003.js
BTW: here's the list of tests this fixes. Unfortunately there are one unrelated test case which is fixed.. ecma_3/Function/regress-193555.js I don't know why this is fixed. I thought it has something to do with additional checks for whitespace characters (ie. the test may have unicode whitespace chars which were previously not accepted as whitespace). With fast inspection, I couldn't find such a characters, though. ecma_3/Function/regress-58274.js Unicode escape test for escaped chars. ecma_3/Unicode/uc-003.js Tests identifiers which have escaped unicode chars. Also tests the keyword clash ecma_3/Unicode/uc-005.js Unicode escape test for escaped chars. BTW. There might be problems with following constructs: function id\1234() { }; if (id\1234.toString() == somestringvar) because 'id\1234.toString()' doesn't return string with escaped text, but string 'id?' where ? is the augmented unicode character. So some information of the source is lost. In Firefox situation is similar but slightly better: for 'id\1234' FF returns 'id\1234', but for 'id\abcd' FF returns 'id\ABCD'.
(In reply to comment #7) > function id\1234() { }; > if (id\1234.toString() == somestringvar) And of course there were tons of typos in the comment #7. The \1234 should've been \u1234. For example the above snippet should have been something like function id\u1234() { }; if (id\u1234.toString() == somestringvar)
This also fixes kjs/parse.js, via permitting unicode identifiers. I can spot a bug though --- it needs to check that decoded chars are valid identifier start/continuation. From 7.6: "A UnicodeEscapeSequence cannot be used to put a character into an identifier that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be a valid Identifier that has the exact same sequence of characters as the original Identifier."
Comment on attachment 4313 [details] New version of the unicode escape fix patch The bug Maks mentions is probably worthy of a separate bug report. This patch, however, seems to have addressed Darin's previous comments. Darin?
Comment on attachment 4313 [details] New version of the unicode escape fix patch Yes, seems fine to land this as the next step in improving things. I'd like to see the problem Maks reports fixed.
Comment on attachment 4313 [details] New version of the unicode escape fix patch The patch contains some tab characters. It would be best to land it without those.
Landed. Didn't land layout test because comments indicate it's a duplicate of the mozilla test.
*** Bug 4918 has been marked as a duplicate of this bug. ***