Bug 78908

Summary: `\u200c` and `\u200d` should be allowed in IdentifierPart, as per ES5
Product: WebKit Reporter: Mathias Bynens <mathias>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, barraclough, dglazkov, joethomas, mathias, msaboff, oliver, peter, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Test case
none
Proposed patch
mathias: review-, webkit-ews: commit-queue-
Proposed patch including new layout tests
none
Proposed patch including new layout tests that updates old ZWJ test
webkit-ews: commit-queue-
Patch
webkit.review.bot: commit-queue-
Patch
none
Patch
none
Patch
mathias: review-, webkit-ews: commit-queue-
Patch
none
Patch none

Mathias Bynens
Reported 2012-02-17 09:12:24 PST
According to http://es5.github.com/x7.html#x7.6 (spec), http://mathiasbynens.be/notes/javascript-identifiers (summary), and http://mothereff.in/js-variables#a%5Cu200d (validator), `a\u200d` is a valid JavaScript identifier. However, JavaScriptCore doesn’t seem to support it: var a\u200d; // throws SyntaxError: Unexpected token ILLEGAL This is a spec violation.
Attachments
Test case (187 bytes, text/html)
2012-02-17 09:12 PST, Mathias Bynens
no flags
Proposed patch (3.51 KB, patch)
2012-02-22 14:18 PST, Mathias Bynens
mathias: review-
webkit-ews: commit-queue-
Proposed patch including new layout tests (5.68 KB, patch)
2012-02-22 14:57 PST, Mathias Bynens
no flags
Proposed patch including new layout tests that updates old ZWJ test (6.89 KB, patch)
2012-02-22 15:21 PST, Mathias Bynens
webkit-ews: commit-queue-
Patch (6.88 KB, patch)
2012-02-23 03:52 PST, Mathias Bynens
webkit.review.bot: commit-queue-
Patch (7.66 KB, patch)
2012-02-23 05:42 PST, Mathias Bynens
no flags
Patch (7.66 KB, patch)
2012-02-23 05:45 PST, Mathias Bynens
no flags
Patch (7.66 KB, patch)
2012-02-23 06:04 PST, Mathias Bynens
mathias: review-
webkit-ews: commit-queue-
Patch (7.44 KB, patch)
2012-02-23 11:56 PST, Mathias Bynens
no flags
Patch (7.44 KB, patch)
2012-02-23 12:03 PST, Mathias Bynens
no flags
Mathias Bynens
Comment 1 2012-02-17 09:12:57 PST
Created attachment 127603 [details] Test case
Mathias Bynens
Comment 2 2012-02-17 09:15:42 PST
Actually, it throws an Error, not a SyntaxError: > Error: Unrecognized token 'a\u200d'
Mathias Bynens
Comment 3 2012-02-17 15:58:08 PST
This one should be allowed too: var Ꙭൽↈⴱ;
Alexey Proskuryakov
Comment 4 2012-02-17 16:55:19 PST
FWIW, we have a regression test checking that this character is not allowed, following discussion in bug 4931 and related. Note that the error message in console is better in nightlies.
Gavin Barraclough
Comment 5 2012-02-17 23:39:39 PST
Looks like the spec has changed, we should probably update our position to match.
Mathias Bynens
Comment 6 2012-02-19 06:19:13 PST
This one should be allowed too as of Unicode 6.1.0: http://mothereff.in/js-variables#%5Cu0cf1 var \u0cf1;
Mathias Bynens
Comment 7 2012-02-21 08:50:49 PST
In case another example is needed: http://mothereff.in/js-variables#%5Cua7aa var \ua7aa;
Mathias Bynens
Comment 8 2012-02-22 14:01:54 PST
Let’s make this bug about the ZWJ and ZWNJ characters missing from IdentifierPart only. I’ll file a separate bug for Unicode 6.1.0 support.
Mathias Bynens
Comment 9 2012-02-22 14:18:02 PST
Created attachment 128289 [details] Proposed patch
Alexey Proskuryakov
Comment 10 2012-02-22 14:27:57 PST
This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).
Gavin Barraclough
Comment 11 2012-02-22 14:30:20 PST
Hi Mathias, The patch looks great - thanks for fixing this! However we do ask for bug fixes to be accompanied by regression tests. Do you think it would be possible to add one? You should be able to test this with something like: shouldBe("var x\u200c = 42; x\u200c", "42"); shouldBe("var x\u200d = 43; x\u200d", "43"); You'll find the tests in LayoutTests/fast/js, just pick an existing test to add the tests to, or if none look appropriate create a new .html / .js based on an existing one. Let me know if you have an questions. Many thanks, G.
Early Warning System Bot
Comment 12 2012-02-22 14:51:01 PST
Comment on attachment 128289 [details] Proposed patch Attachment 128289 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11561643
Mathias Bynens
Comment 13 2012-02-22 14:57:06 PST
Created attachment 128296 [details] Proposed patch including new layout tests
Mathias Bynens
Comment 14 2012-02-22 15:21:23 PST
Created attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test
Early Warning System Bot
Comment 15 2012-02-22 15:34:34 PST
Comment on attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test Attachment 128301 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11574009
Alexey Proskuryakov
Comment 16 2012-02-22 16:01:55 PST
(In reply to comment #10) > This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior). Could you please comment on this? Specifically, fast/js/removing-Cf-characters.html looks like it should break.
Alexey Proskuryakov
Comment 17 2012-02-22 16:03:00 PST
Sorry, I missed the latest patch between bot spew. Please disregard comment 16.
WebKit Review Bot
Comment 18 2012-02-22 16:54:39 PST
Comment on attachment 128301 [details] Proposed patch including new layout tests that updates old ZWJ test Attachment 128301 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11560655 New failing tests: fast/js/var-declarations-zero-width.html
Mathias Bynens
Comment 19 2012-02-23 03:52:51 PST
WebKit Review Bot
Comment 20 2012-02-23 04:38:11 PST
Comment on attachment 128451 [details] Patch Attachment 128451 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11576248 New failing tests: fast/js/var-declarations-zero-width.html
Mathias Bynens
Comment 21 2012-02-23 05:42:09 PST
Created attachment 128463 [details] Patch Disable the new test in Chromium for now. It can be enabled as soon as V8 r10800 (http://code.google.com/p/v8/source/detail?r=10800) has rolled into WebKit. Thanks to Peter Beverloo in #webkit for the helpful pointers.
Mathias Bynens
Comment 22 2012-02-23 05:45:32 PST
Created attachment 128464 [details] Patch Fixed a typo in both changelog entries.
Peter Beverloo
Comment 23 2012-02-23 06:00:30 PST
Comment on attachment 128464 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128464&action=review Cool, thanks for fixing this in JSC and V8 :-). > Source/JavaScriptCore/runtime/LiteralParser.cpp:3 > + * Copyright (C) Mathias Bynens (mathias@qiwi.be) nit: Include the copyright year (as you do in Lexer.cpp)
Mathias Bynens
Comment 24 2012-02-23 06:04:39 PST
Created attachment 128467 [details] Patch Updated as per Peter’s feedback in comment #23.
Early Warning System Bot
Comment 25 2012-02-23 09:22:51 PST
Mathias Bynens
Comment 26 2012-02-23 10:00:45 PST
(In reply to comment #25) > (From update of attachment 128467 [details]) > Attachment 128467 [details] did not pass qt-ews (qt): > Output: http://queues.webkit.org/results/11608295 The Qt bot shows that the equation between *m_ptr and 0x200C on line 285 (LiteralParser<LChar>::Lexer::lexIdentifier) is out of range. Alexey, could you maybe give me some pointers about how to handle the comparison in this case, or can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?
Alexey Proskuryakov
Comment 27 2012-02-23 10:15:51 PST
> can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code? Yes, my understanding is that this function is called with char template parameter when the string is known ASCII (not UTF-8). It is surprising that this check has cases for Unicode spaces, but not for regular 0x20.
Alexey Proskuryakov
Comment 28 2012-02-23 10:18:35 PST
Correction: LChar is L is Latin-1, not just ASCII.
Michael Saboff
Comment 29 2012-02-23 10:41:21 PST
Comment on attachment 128467 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review > Source/JavaScriptCore/parser/Lexer.cpp:384 > + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)); Move the checks for 0x200C and 0x200D to the end as these are not the typical case. > Source/JavaScriptCore/runtime/LiteralParser.cpp:285 > + while (m_ptr < m_end && (isASCIIAlphanumeric(*m_ptr) || *m_ptr == '_' || *m_ptr == '$' || *m_ptr == 0x200C || *m_ptr == 0x200D)) Remove the checks for 0x200C and 0x200D here as this explicit template instantiation is only for 8 bit (LChar) characters. > LayoutTests/fast/js/var-declarations-zero-width.html:29 > +<p>This page tests if U+200C and U+200D are allowed as part of an identifier. > +<pre id=console></pre> > + > +<script> > +if (window.layoutTestController) > + layoutTestController.dumpAsText(); > + > +function log(s) { > + document.getElementById("console").appendChild(document.createTextNode(s + "\n")); > +} > + > +function shouldBe(a, b) { > + var evalA; > + try { > + evalA = eval(a); > + } catch (e) { > + evalA = e; > + } > + > + if (evalA === b) { > + log("PASS: " + a + " should be " + b + " and is."); > + } else { > + log("FAIL: " + a + " should be " + b + " but instead is " + evalA + "."); > + } > +} > + > +shouldBe("var x\u200c = 42; x\u200c", 42); > +shouldBe("var x\u200d = 43; x\u200d", 43); > +</script> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. The .html should end up something like: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <script src="resources/js-test-pre.js"></script> </head> <body> <script src="script-tests/var-declarations-zero-width.js"></script> <script src="resources/js-test-post.js"></script> </body> And the script-tests/var-declarations-zero-width.js would be something like: description("This page tests if U+200C and U+200D are allowed as part of an identifier."); shouldBe("var x\u200c = 42; x\u200c", 42); shouldBe("var x\u200d = 43; x\u200d", 43);
Alexey Proskuryakov
Comment 30 2012-02-23 11:07:12 PST
> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. (fast/js is an exception - everywhere else, it's correct to have one file)
Mathias Bynens
Comment 31 2012-02-23 11:39:46 PST
Comment on attachment 128467 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review >> Source/JavaScriptCore/parser/Lexer.cpp:384 >> + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)); > > Move the checks for 0x200C and 0x200D to the end as these are not the typical case. Will do. In case you were wondering, I figured I’d place them at the start as the category lookups are probably less efficient than a simple equality check. >> LayoutTests/fast/js/var-declarations-zero-width.html:29 >> +</script> > > Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. > > The .html should end up something like: > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > <html> > <head> > <script src="resources/js-test-pre.js"></script> > </head> > <body> > <script src="script-tests/var-declarations-zero-width.js"></script> > <script src="resources/js-test-post.js"></script> > </body> > > And the script-tests/var-declarations-zero-width.js would be something like: > description("This page tests if U+200C and U+200D are allowed as part of an identifier."); > > shouldBe("var x\u200c = 42; x\u200c", 42); > shouldBe("var x\u200d = 43; x\u200d", 43); Thanks; will do. I did not know about this convention.
Mathias Bynens
Comment 32 2012-02-23 11:56:58 PST
Created attachment 128524 [details] Patch Updated as per Michael’s and Alexey’s feedback. (Thanks!)
Mathias Bynens
Comment 33 2012-02-23 12:03:41 PST
Created attachment 128527 [details] Patch Corrected a typo in the LayoutTests/ChangeLog entry.
WebKit Review Bot
Comment 34 2012-02-24 00:20:32 PST
Comment on attachment 128527 [details] Patch Clearing flags on attachment: 128527 Committed r108742: <http://trac.webkit.org/changeset/108742>
WebKit Review Bot
Comment 35 2012-02-24 00:20:42 PST
All reviewed patches have been landed. Closing bug.
Mathias Bynens
Comment 36 2012-02-24 00:23:46 PST
Thanks for the help with this patch, everyone! Bug 79353 tracks Unicode 6.1.0 support for identifiers, and I filed bug 79361 to enable the new layout test for Chromium as soon as V8 r10800 has rolled into WebKit.
Note You need to log in before you can comment on or make changes to this bug.