WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
4921
\u escape sequences in JavaScript identifiers
https://bugs.webkit.org/show_bug.cgi?id=4921
Summary
\u escape sequences in JavaScript identifiers
Alexey Proskuryakov
Reported
2005-09-10 13:09:45 PDT
WebKit doesn't permit escaped sequences in JavaScript identifiers. As I understand ECMA-262, these are supposed to be handled exactly like the corresponding characters. Firefox passes this test.
Attachments
test case
(669 bytes, text/html)
2005-09-10 13:09 PDT
,
Alexey Proskuryakov
no flags
Details
Darin's patch extracted from 4885. Fixes unicode escapes
(9.55 KB, patch)
2005-10-11 07:44 PDT
,
Kimmo Kinnunen
darin
: review-
Details
Formatted Diff
Diff
New version of the unicode escape fix patch
(9.59 KB, patch)
2005-10-11 09:58 PDT
,
Kimmo Kinnunen
darin
: review+
Details
Formatted Diff
Diff
Testcase to demonstrate keyword clash with unicode escape sequences
(701 bytes, text/html)
2005-10-11 10:09 PDT
,
Kimmo Kinnunen
no flags
Details
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2005-09-10 13:09:59 PDT
Created
attachment 3849
[details]
test case
Kimmo Kinnunen
Comment 2
2005-10-11 07:44:43 PDT
Created
attachment 4309
[details]
Darin's patch extracted from 4885. Fixes unicode escapes Darin's patch extracted from
bug 4885
. Fixes unicode escapes. Note: in 4885, there's no handling of the case where there is an unicode escape in an identifier which maps to a token after the escape is augmented.
Darin Adler
Comment 3
2005-10-11 08:40:01 PDT
Comment on
attachment 4309
[details]
Darin's patch extracted from 4885. Fixes unicode escapes Looks great (not sure if I can review my own patch). Where does the specification say that if you use \u sequences, then an identifier is not a keyword? I re-read it and that was not clear to me. Did you test other implementations to see if that was so? (Part of the reason I ask is that I know in C++ it does not work this way; you can use the sequences in keywords.) The boolean identifierIsNotToken is a great way to implement that rule, but I recommend naming it identifierIsNotKeyword instead. Another way to implement it would be to create a separate state. The old one could be renamed IdentifierOrKeyword and the new state would be just Identifier. The IdentifierOrKeyword case could fall into the Identifier case after checking for and handling keywords. Besides allowing \u sequences in identifiers, this patch also implements the full rule for what characters are allowed in identifiers. So the attached test case is insufficient. We need test cases that cover a variety of valid and invalid identifiers, as well as covering the case of an identifer that uses a \u sequence to avoid collision with a keyword. The minimum that would be needed on this patch would be renaming identifierIsNotToken to identifierIsNotKeyword and adding more test cases.
Kimmo Kinnunen
Comment 4
2005-10-11 09:58:57 PDT
Created
attachment 4313
[details]
New version of the unicode escape fix patch Unicode escape fix. Changes the bool variable identifierIsNotToken to few extra states.
Kimmo Kinnunen
Comment 5
2005-10-11 10:06:52 PDT
(In reply to
comment #3
)
> Where does the specification say that if you use \u sequences, then an > identifier is not a keyword? I re-read it and that was not clear to me. Did you > test other implementations to see if that was so? (Part of the reason I ask is > that I know in C++ it does not work this way; you can use the sequences in > keywords.)
The grammar says that 'Keyword' can be one of the set of keywords and that 'IdentifierStart' can have 'UnicodeEscape', so I assumed that that means that 'Keyword' can not have 'UnicodeEscape'. As far as I can see, the spec says that you shouldn't have unicode escapes in keywords. It's not explicitly written out, so I could be missing something.. Yes, Firefox behaves in same way. There's also mozilla test to test this - ecma_3/Unicode/uc-003.js is fixed with the patch..
> The boolean identifierIsNotToken is a great way to implement that rule, but I > recommend naming it identifierIsNotKeyword instead. Another way to implement it > would be to create a separate state. The old one could be renamed > IdentifierOrKeyword and the new state would be just Identifier. The > IdentifierOrKeyword case could fall into the Identifier case after checking for > and handling keywords.
This is certainly more explicit and clearer. I added this to the new patch.
> Besides allowing \u sequences in identifiers, this patch also implements the > full rule for what characters are allowed in identifiers. So the attached test > case is insufficient. We need test cases that cover a variety of valid and > invalid identifiers, as well as covering the case of an identifer that uses a > \u sequence to avoid collision with a keyword.
There's atleast that ecma_3/Unicode/uc-003.js which is fixed. It tests the collision with the keyword.
Kimmo Kinnunen
Comment 6
2005-10-11 10:09:56 PDT
Created
attachment 4314
[details]
Testcase to demonstrate keyword clash with unicode escape sequences Testcase to demonstrate keyword clash with unicode escape sequences. Duplicates the test of ecma_3/Unicode/uc-003.js
Kimmo Kinnunen
Comment 7
2005-10-11 10:20:56 PDT
BTW: here's the list of tests this fixes. Unfortunately there are one unrelated test case which is fixed.. ecma_3/Function/regress-193555.js I don't know why this is fixed. I thought it has something to do with additional checks for whitespace characters (ie. the test may have unicode whitespace chars which were previously not accepted as whitespace). With fast inspection, I couldn't find such a characters, though. ecma_3/Function/regress-58274.js Unicode escape test for escaped chars. ecma_3/Unicode/uc-003.js Tests identifiers which have escaped unicode chars. Also tests the keyword clash ecma_3/Unicode/uc-005.js Unicode escape test for escaped chars. BTW. There might be problems with following constructs: function id\1234() { }; if (id\1234.toString() == somestringvar) because 'id\1234.toString()' doesn't return string with escaped text, but string 'id?' where ? is the augmented unicode character. So some information of the source is lost. In Firefox situation is similar but slightly better: for 'id\1234' FF returns 'id\1234', but for 'id\abcd' FF returns 'id\ABCD'.
Kimmo Kinnunen
Comment 8
2005-10-11 10:24:22 PDT
(In reply to
comment #7
)
> function id\1234() { }; > if (id\1234.toString() == somestringvar)
And of course there were tons of typos in the
comment #7
. The \1234 should've been \u1234. For example the above snippet should have been something like function id\u1234() { }; if (id\u1234.toString() == somestringvar)
Maks Orlovich
Comment 9
2005-12-30 12:32:39 PST
This also fixes kjs/parse.js, via permitting unicode identifiers. I can spot a bug though --- it needs to check that decoded chars are valid identifier start/continuation. From 7.6: "A UnicodeEscapeSequence cannot be used to put a character into an identifier that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be a valid Identifier that has the exact same sequence of characters as the original Identifier."
Geoffrey Garen
Comment 10
2005-12-30 13:20:37 PST
Comment on
attachment 4313
[details]
New version of the unicode escape fix patch The bug Maks mentions is probably worthy of a separate bug report. This patch, however, seems to have addressed Darin's previous comments. Darin?
Darin Adler
Comment 11
2005-12-30 13:29:18 PST
Comment on
attachment 4313
[details]
New version of the unicode escape fix patch Yes, seems fine to land this as the next step in improving things. I'd like to see the problem Maks reports fixed.
Darin Adler
Comment 12
2005-12-30 13:29:38 PST
Comment on
attachment 4313
[details]
New version of the unicode escape fix patch The patch contains some tab characters. It would be best to land it without those.
Geoffrey Garen
Comment 13
2006-01-04 13:43:27 PST
Landed. Didn't land layout test because comments indicate it's a duplicate of the mozilla test.
Alexey Proskuryakov
Comment 14
2006-03-11 07:38:13 PST
***
Bug 4918
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug