Bug 78908 - `\u200c` and `\u200d` should be allowed in IdentifierPart, as per ES5
Summary: `\u200c` and `\u200d` should be allowed in IdentifierPart, as per ES5
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-17 09:12 PST by Mathias Bynens
Modified: 2012-02-24 00:23 PST (History)
9 users (show)

See Also:


Attachments
Test case (187 bytes, text/html)
2012-02-17 09:12 PST, Mathias Bynens
no flags Details
Proposed patch (3.51 KB, patch)
2012-02-22 14:18 PST, Mathias Bynens
mathias: review-
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
Proposed patch including new layout tests (5.68 KB, patch)
2012-02-22 14:57 PST, Mathias Bynens
no flags Details | Formatted Diff | Diff
Proposed patch including new layout tests that updates old ZWJ test (6.89 KB, patch)
2012-02-22 15:21 PST, Mathias Bynens
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
Patch (6.88 KB, patch)
2012-02-23 03:52 PST, Mathias Bynens
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Patch (7.66 KB, patch)
2012-02-23 05:42 PST, Mathias Bynens
no flags Details | Formatted Diff | Diff
Patch (7.66 KB, patch)
2012-02-23 05:45 PST, Mathias Bynens
no flags Details | Formatted Diff | Diff
Patch (7.66 KB, patch)
2012-02-23 06:04 PST, Mathias Bynens
mathias: review-
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
Patch (7.44 KB, patch)
2012-02-23 11:56 PST, Mathias Bynens
no flags Details | Formatted Diff | Diff
Patch (7.44 KB, patch)
2012-02-23 12:03 PST, Mathias Bynens
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mathias Bynens 2012-02-17 09:12:24 PST
According to http://es5.github.com/x7.html#x7.6 (spec), http://mathiasbynens.be/notes/javascript-identifiers (summary), and http://mothereff.in/js-variables#a%5Cu200d (validator), `a\u200d` is a valid JavaScript identifier. 

However, JavaScriptCore doesn’t seem to support it:

    var a\u200d; // throws SyntaxError: Unexpected token ILLEGAL

This is a spec violation.
Comment 1 Mathias Bynens 2012-02-17 09:12:57 PST
Created attachment 127603 [details]
Test case
Comment 2 Mathias Bynens 2012-02-17 09:15:42 PST
Actually, it throws an Error, not a SyntaxError:

> Error: Unrecognized token 'a\u200d'
Comment 3 Mathias Bynens 2012-02-17 15:58:08 PST
This one should be allowed too:

    var Ꙭൽↈⴱ;
Comment 4 Alexey Proskuryakov 2012-02-17 16:55:19 PST
FWIW, we have a regression test checking that this character is not allowed, following discussion in bug 4931 and related.

Note that the error message in console is better in nightlies.
Comment 5 Gavin Barraclough 2012-02-17 23:39:39 PST
Looks like the spec has changed, we should probably update our position to match.
Comment 6 Mathias Bynens 2012-02-19 06:19:13 PST
This one should be allowed too as of Unicode 6.1.0: http://mothereff.in/js-variables#%5Cu0cf1

    var \u0cf1;
Comment 7 Mathias Bynens 2012-02-21 08:50:49 PST
In case another example is needed: http://mothereff.in/js-variables#%5Cua7aa

    var \ua7aa;
Comment 8 Mathias Bynens 2012-02-22 14:01:54 PST
Let’s make this bug about the ZWJ and ZWNJ characters missing from IdentifierPart only.

I’ll file a separate bug for Unicode 6.1.0 support.
Comment 9 Mathias Bynens 2012-02-22 14:18:02 PST
Created attachment 128289 [details]
Proposed patch
Comment 10 Alexey Proskuryakov 2012-02-22 14:27:57 PST
This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).
Comment 11 Gavin Barraclough 2012-02-22 14:30:20 PST
Hi Mathias,

The patch looks great - thanks for fixing this!

However we do ask for bug fixes to be accompanied by regression tests.  Do you think it would be possible to add one?

You should be able to test this with something like:

shouldBe("var x\u200c = 42; x\u200c", "42");
shouldBe("var x\u200d = 43; x\u200d", "43");

You'll find the tests in LayoutTests/fast/js, just pick an existing test to add the tests to, or if none look appropriate create a new .html / .js based on an existing one.  Let me know if you have an questions.

Many thanks,
G.
Comment 12 Early Warning System Bot 2012-02-22 14:51:01 PST
Comment on attachment 128289 [details]
Proposed patch

Attachment 128289 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11561643
Comment 13 Mathias Bynens 2012-02-22 14:57:06 PST
Created attachment 128296 [details]
Proposed patch including new layout tests
Comment 14 Mathias Bynens 2012-02-22 15:21:23 PST
Created attachment 128301 [details]
Proposed patch including new layout tests that updates old ZWJ test
Comment 15 Early Warning System Bot 2012-02-22 15:34:34 PST
Comment on attachment 128301 [details]
Proposed patch including new layout tests that updates old ZWJ test

Attachment 128301 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11574009
Comment 16 Alexey Proskuryakov 2012-02-22 16:01:55 PST
(In reply to comment #10)
> This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).

Could you please comment on this? Specifically, fast/js/removing-Cf-characters.html looks like it should break.
Comment 17 Alexey Proskuryakov 2012-02-22 16:03:00 PST
Sorry, I missed the latest patch between bot spew. Please disregard comment 16.
Comment 18 WebKit Review Bot 2012-02-22 16:54:39 PST
Comment on attachment 128301 [details]
Proposed patch including new layout tests that updates old ZWJ test

Attachment 128301 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/11560655

New failing tests:
fast/js/var-declarations-zero-width.html
Comment 19 Mathias Bynens 2012-02-23 03:52:51 PST
Created attachment 128451 [details]
Patch
Comment 20 WebKit Review Bot 2012-02-23 04:38:11 PST
Comment on attachment 128451 [details]
Patch

Attachment 128451 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/11576248

New failing tests:
fast/js/var-declarations-zero-width.html
Comment 21 Mathias Bynens 2012-02-23 05:42:09 PST
Created attachment 128463 [details]
Patch

Disable the new test in Chromium for now. It can be enabled as soon as V8 r10800 (http://code.google.com/p/v8/source/detail?r=10800) has rolled into WebKit.

Thanks to Peter Beverloo in #webkit for the helpful pointers.
Comment 22 Mathias Bynens 2012-02-23 05:45:32 PST
Created attachment 128464 [details]
Patch

Fixed a typo in both changelog entries.
Comment 23 Peter Beverloo 2012-02-23 06:00:30 PST
Comment on attachment 128464 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=128464&action=review

Cool, thanks for fixing this in JSC and V8 :-).

> Source/JavaScriptCore/runtime/LiteralParser.cpp:3
> + * Copyright (C) Mathias Bynens (mathias@qiwi.be)

nit: Include the copyright year (as you do in Lexer.cpp)
Comment 24 Mathias Bynens 2012-02-23 06:04:39 PST
Created attachment 128467 [details]
Patch

Updated as per Peter’s feedback in comment #23.
Comment 25 Early Warning System Bot 2012-02-23 09:22:51 PST
Comment on attachment 128467 [details]
Patch

Attachment 128467 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11608295
Comment 26 Mathias Bynens 2012-02-23 10:00:45 PST
(In reply to comment #25)
> (From update of attachment 128467 [details])
> Attachment 128467 [details] did not pass qt-ews (qt):
> Output: http://queues.webkit.org/results/11608295

The Qt bot shows that the equation between *m_ptr and 0x200C on line 285 (LiteralParser<LChar>::Lexer::lexIdentifier) is out of range. Alexey, could you maybe give me some pointers about how to handle the comparison in this case, or can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?
Comment 27 Alexey Proskuryakov 2012-02-23 10:15:51 PST
> can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?

Yes, my understanding is that this function is called with char template parameter when the string is known ASCII (not UTF-8).

It is surprising that this check has cases for Unicode spaces, but not for regular 0x20.
Comment 28 Alexey Proskuryakov 2012-02-23 10:18:35 PST
Correction: LChar is L is Latin-1, not just ASCII.
Comment 29 Michael Saboff 2012-02-23 10:41:21 PST
Comment on attachment 128467 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review

> Source/JavaScriptCore/parser/Lexer.cpp:384
> +        | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector));

Move the checks for 0x200C and 0x200D to the end as these are not the typical case.

> Source/JavaScriptCore/runtime/LiteralParser.cpp:285
> +    while (m_ptr < m_end && (isASCIIAlphanumeric(*m_ptr) || *m_ptr == '_' || *m_ptr == '$' || *m_ptr == 0x200C || *m_ptr == 0x200D))

Remove the checks for 0x200C and 0x200D here as this explicit template instantiation is only for 8 bit (LChar) characters.

> LayoutTests/fast/js/var-declarations-zero-width.html:29
> +<p>This page tests if U+200C and U+200D are allowed as part of an identifier.
> +<pre id=console></pre>
> +
> +<script>
> +if (window.layoutTestController)
> +    layoutTestController.dumpAsText();
> +    
> +function log(s) {
> +    document.getElementById("console").appendChild(document.createTextNode(s + "\n"));
> +}
> +
> +function shouldBe(a, b) {
> +    var evalA;
> +    try {
> +        evalA = eval(a);
> +    } catch (e) {
> +        evalA = e;
> +    }
> +    
> +    if (evalA === b) {
> +        log("PASS: " + a + " should be " + b + " and is.");
> +    } else {
> +        log("FAIL: " + a + " should be " + b + " but instead is " + evalA + ".");
> +    }
> +}
> +
> +shouldBe("var x\u200c = 42; x\u200c", 42);
> +shouldBe("var x\u200d = 43; x\u200d", 43);
> +</script>

Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html.  Also use the provided "js-test-pre.js" and "js-test-post.js" scripts.

The .html should end up something like:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<script src="resources/js-test-pre.js"></script>
</head>
<body>
<script src="script-tests/var-declarations-zero-width.js"></script>
<script src="resources/js-test-post.js"></script>
</body>

And the script-tests/var-declarations-zero-width.js would be something like:
description("This page tests if U+200C and U+200D are allowed as part of an identifier.");

shouldBe("var x\u200c = 42; x\u200c", 42);
shouldBe("var x\u200d = 43; x\u200d", 43);
Comment 30 Alexey Proskuryakov 2012-02-23 11:07:12 PST
> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html.

(fast/js is an exception - everywhere else, it's correct to have one file)
Comment 31 Mathias Bynens 2012-02-23 11:39:46 PST
Comment on attachment 128467 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=128467&action=review

>> Source/JavaScriptCore/parser/Lexer.cpp:384
>> +        | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector));
> 
> Move the checks for 0x200C and 0x200D to the end as these are not the typical case.

Will do. In case you were wondering, I figured I’d place them at the start as the category lookups are probably less efficient than a simple equality check.

>> LayoutTests/fast/js/var-declarations-zero-width.html:29
>> +</script>
> 
> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html.  Also use the provided "js-test-pre.js" and "js-test-post.js" scripts.
> 
> The .html should end up something like:
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> <html>
> <head>
> <script src="resources/js-test-pre.js"></script>
> </head>
> <body>
> <script src="script-tests/var-declarations-zero-width.js"></script>
> <script src="resources/js-test-post.js"></script>
> </body>
> 
> And the script-tests/var-declarations-zero-width.js would be something like:
> description("This page tests if U+200C and U+200D are allowed as part of an identifier.");
> 
> shouldBe("var x\u200c = 42; x\u200c", 42);
> shouldBe("var x\u200d = 43; x\u200d", 43);

Thanks; will do. I did not know about this convention.
Comment 32 Mathias Bynens 2012-02-23 11:56:58 PST
Created attachment 128524 [details]
Patch

Updated as per Michael’s and Alexey’s feedback. (Thanks!)
Comment 33 Mathias Bynens 2012-02-23 12:03:41 PST
Created attachment 128527 [details]
Patch

Corrected a typo in the LayoutTests/ChangeLog entry.
Comment 34 WebKit Review Bot 2012-02-24 00:20:32 PST
Comment on attachment 128527 [details]
Patch

Clearing flags on attachment: 128527

Committed r108742: <http://trac.webkit.org/changeset/108742>
Comment 35 WebKit Review Bot 2012-02-24 00:20:42 PST
All reviewed patches have been landed.  Closing bug.
Comment 36 Mathias Bynens 2012-02-24 00:23:46 PST
Thanks for the help with this patch, everyone!

Bug 79353 tracks Unicode 6.1.0 support for identifiers, and I filed bug 79361 to enable the new layout test for Chromium as soon as V8 r10800 has rolled into WebKit.