According to http://es5.github.com/x7.html#x7.6 (spec), http://mathiasbynens.be/notes/javascript-identifiers (summary), and http://mothereff.in/js-variables#a%5Cu200d (validator), `a\u200d` is a valid JavaScript identifier. However, JavaScriptCore doesn’t seem to support it: var a\u200d; // throws SyntaxError: Unexpected token ILLEGAL This is a spec violation.
Created attachment 127603 [details] Test case
Actually, it throws an Error, not a SyntaxError: > Error: Unrecognized token 'a\u200d'
This one should be allowed too: var Ꙭൽↈⴱ;
FWIW, we have a regression test checking that this character is not allowed, following discussion in bug 4931 and related. Note that the error message in console is better in nightlies.
Looks like the spec has changed, we should probably update our position to match.
This one should be allowed too as of Unicode 6.1.0: http://mothereff.in/js-variables#%5Cu0cf1 var \u0cf1;
In case another example is needed: http://mothereff.in/js-variables#%5Cua7aa var \ua7aa;
Let’s make this bug about the ZWJ and ZWNJ characters missing from IdentifierPart only. I’ll file a separate bug for Unicode 6.1.0 support.
Created attachment 128289 [details] Proposed patch
This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).
Hi Mathias, The patch looks great - thanks for fixing this! However we do ask for bug fixes to be accompanied by regression tests. Do you think it would be possible to add one? You should be able to test this with something like: shouldBe("var x\u200c = 42; x\u200c", "42"); shouldBe("var x\u200d = 43; x\u200d", "43"); You'll find the tests in LayoutTests/fast/js, just pick an existing test to add the tests to, or if none look appropriate create a new .html / .js based on an existing one. Let me know if you have an questions. Many thanks, G.
Comment on attachment 128289 [details] Proposed patch Attachment 128289 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11561643
Created attachment 128296 [details] Proposed patch including new layout tests
Created attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test
Comment on attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test Attachment 128301 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11574009
(In reply to comment #10) > This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior). Could you please comment on this? Specifically, fast/js/removing-Cf-characters.html looks like it should break.
Sorry, I missed the latest patch between bot spew. Please disregard comment 16.
Comment on attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test Attachment 128301 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11560655 New failing tests: fast/js/var-declarations-zero-width.html
Created attachment 128451 [details] Patch
Comment on attachment 128451 [details] Patch Attachment 128451 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11576248 New failing tests: fast/js/var-declarations-zero-width.html
Created attachment 128463 [details] Patch Disable the new test in Chromium for now. It can be enabled as soon as V8 r10800 (http://code.google.com/p/v8/source/detail?r=10800) has rolled into WebKit. Thanks to Peter Beverloo in #webkit for the helpful pointers.
Created attachment 128464 [details] Patch Fixed a typo in both changelog entries.
Comment on attachment 128464 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128464&action=review Cool, thanks for fixing this in JSC and V8 :-). > Source/JavaScriptCore/runtime/LiteralParser.cpp:3 > + * Copyright (C) Mathias Bynens (mathias@qiwi.be) nit: Include the copyright year (as you do in Lexer.cpp)
Created attachment 128467 [details] Patch Updated as per Peter’s feedback in comment #23.
Comment on attachment 128467 [details] Patch Attachment 128467 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11608295
(In reply to comment #25) > (From update of attachment 128467 [details]) > Attachment 128467 [details] did not pass qt-ews (qt): > Output: http://queues.webkit.org/results/11608295 The Qt bot shows that the equation between *m_ptr and 0x200C on line 285 (LiteralParser<LChar>::Lexer::lexIdentifier) is out of range. Alexey, could you maybe give me some pointers about how to handle the comparison in this case, or can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?
> can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code? Yes, my understanding is that this function is called with char template parameter when the string is known ASCII (not UTF-8). It is surprising that this check has cases for Unicode spaces, but not for regular 0x20.
Correction: LChar is L is Latin-1, not just ASCII.
Comment on attachment 128467 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review > Source/JavaScriptCore/parser/Lexer.cpp:384 > + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)); Move the checks for 0x200C and 0x200D to the end as these are not the typical case. > Source/JavaScriptCore/runtime/LiteralParser.cpp:285 > + while (m_ptr < m_end && (isASCIIAlphanumeric(*m_ptr) || *m_ptr == '_' || *m_ptr == '$' || *m_ptr == 0x200C || *m_ptr == 0x200D)) Remove the checks for 0x200C and 0x200D here as this explicit template instantiation is only for 8 bit (LChar) characters. > LayoutTests/fast/js/var-declarations-zero-width.html:29 > +<p>This page tests if U+200C and U+200D are allowed as part of an identifier. > +<pre id=console></pre> > + > +<script> > +if (window.layoutTestController) > + layoutTestController.dumpAsText(); > + > +function log(s) { > + document.getElementById("console").appendChild(document.createTextNode(s + "\n")); > +} > + > +function shouldBe(a, b) { > + var evalA; > + try { > + evalA = eval(a); > + } catch (e) { > + evalA = e; > + } > + > + if (evalA === b) { > + log("PASS: " + a + " should be " + b + " and is."); > + } else { > + log("FAIL: " + a + " should be " + b + " but instead is " + evalA + "."); > + } > +} > + > +shouldBe("var x\u200c = 42; x\u200c", 42); > +shouldBe("var x\u200d = 43; x\u200d", 43); > +</script> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. The .html should end up something like: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <script src="resources/js-test-pre.js"></script> </head> <body> <script src="script-tests/var-declarations-zero-width.js"></script> <script src="resources/js-test-post.js"></script> </body> And the script-tests/var-declarations-zero-width.js would be something like: description("This page tests if U+200C and U+200D are allowed as part of an identifier."); shouldBe("var x\u200c = 42; x\u200c", 42); shouldBe("var x\u200d = 43; x\u200d", 43);
> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. (fast/js is an exception - everywhere else, it's correct to have one file)
Comment on attachment 128467 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review >> Source/JavaScriptCore/parser/Lexer.cpp:384 >> + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)); > > Move the checks for 0x200C and 0x200D to the end as these are not the typical case. Will do. In case you were wondering, I figured I’d place them at the start as the category lookups are probably less efficient than a simple equality check. >> LayoutTests/fast/js/var-declarations-zero-width.html:29 >> +</script> > > Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. > > The .html should end up something like: > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > <html> > <head> > <script src="resources/js-test-pre.js"></script> > </head> > <body> > <script src="script-tests/var-declarations-zero-width.js"></script> > <script src="resources/js-test-post.js"></script> > </body> > > And the script-tests/var-declarations-zero-width.js would be something like: > description("This page tests if U+200C and U+200D are allowed as part of an identifier."); > > shouldBe("var x\u200c = 42; x\u200c", 42); > shouldBe("var x\u200d = 43; x\u200d", 43); Thanks; will do. I did not know about this convention.
Created attachment 128524 [details] Patch Updated as per Michael’s and Alexey’s feedback. (Thanks!)
Created attachment 128527 [details] Patch Corrected a typo in the LayoutTests/ChangeLog entry.
Comment on attachment 128527 [details] Patch Clearing flags on attachment: 128527 Committed r108742: <http://trac.webkit.org/changeset/108742>
All reviewed patches have been landed. Closing bug.
Thanks for the help with this patch, everyone! Bug 79353 tracks Unicode 6.1.0 support for identifiers, and I filed bug 79361 to enable the new layout test for Chromium as soon as V8 r10800 has rolled into WebKit.