WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 78908
`\u200c` and `\u200d` should be allowed in IdentifierPart, as per ES5
https://bugs.webkit.org/show_bug.cgi?id=78908
Summary
`\u200c` and `\u200d` should be allowed in IdentifierPart, as per ES5
Mathias Bynens
Reported
2012-02-17 09:12:24 PST
According to
http://es5.github.com/x7.html#x7.6
(spec),
http://mathiasbynens.be/notes/javascript-identifiers
(summary), and
http://mothereff.in/js-variables#a%5Cu200d
(validator), `a\u200d` is a valid JavaScript identifier. However, JavaScriptCore doesn’t seem to support it: var a\u200d; // throws SyntaxError: Unexpected token ILLEGAL This is a spec violation.
Attachments
Test case
(187 bytes, text/html)
2012-02-17 09:12 PST
,
Mathias Bynens
no flags
Details
Proposed patch
(3.51 KB, patch)
2012-02-22 14:18 PST
,
Mathias Bynens
mathias
: review-
webkit-ews
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch including new layout tests
(5.68 KB, patch)
2012-02-22 14:57 PST
,
Mathias Bynens
no flags
Details
Formatted Diff
Diff
Proposed patch including new layout tests that updates old ZWJ test
(6.89 KB, patch)
2012-02-22 15:21 PST
,
Mathias Bynens
webkit-ews
: commit-queue-
Details
Formatted Diff
Diff
Patch
(6.88 KB, patch)
2012-02-23 03:52 PST
,
Mathias Bynens
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Patch
(7.66 KB, patch)
2012-02-23 05:42 PST
,
Mathias Bynens
no flags
Details
Formatted Diff
Diff
Patch
(7.66 KB, patch)
2012-02-23 05:45 PST
,
Mathias Bynens
no flags
Details
Formatted Diff
Diff
Patch
(7.66 KB, patch)
2012-02-23 06:04 PST
,
Mathias Bynens
mathias
: review-
webkit-ews
: commit-queue-
Details
Formatted Diff
Diff
Patch
(7.44 KB, patch)
2012-02-23 11:56 PST
,
Mathias Bynens
no flags
Details
Formatted Diff
Diff
Patch
(7.44 KB, patch)
2012-02-23 12:03 PST
,
Mathias Bynens
no flags
Details
Formatted Diff
Diff
Show Obsolete
(8)
View All
Add attachment
proposed patch, testcase, etc.
Mathias Bynens
Comment 1
2012-02-17 09:12:57 PST
Created
attachment 127603
[details]
Test case
Mathias Bynens
Comment 2
2012-02-17 09:15:42 PST
Actually, it throws an Error, not a SyntaxError:
> Error: Unrecognized token 'a\u200d'
Mathias Bynens
Comment 3
2012-02-17 15:58:08 PST
This one should be allowed too: var Ꙭൽↈⴱ;
Alexey Proskuryakov
Comment 4
2012-02-17 16:55:19 PST
FWIW, we have a regression test checking that this character is not allowed, following discussion in
bug 4931
and related. Note that the error message in console is better in nightlies.
Gavin Barraclough
Comment 5
2012-02-17 23:39:39 PST
Looks like the spec has changed, we should probably update our position to match.
Mathias Bynens
Comment 6
2012-02-19 06:19:13 PST
This one should be allowed too as of Unicode 6.1.0:
http://mothereff.in/js-variables#%5Cu0cf1
var \u0cf1;
Mathias Bynens
Comment 7
2012-02-21 08:50:49 PST
In case another example is needed:
http://mothereff.in/js-variables#%5Cua7aa
var \ua7aa;
Mathias Bynens
Comment 8
2012-02-22 14:01:54 PST
Let’s make this bug about the ZWJ and ZWNJ characters missing from IdentifierPart only. I’ll file a separate bug for Unicode 6.1.0 support.
Mathias Bynens
Comment 9
2012-02-22 14:18:02 PST
Created
attachment 128289
[details]
Proposed patch
Alexey Proskuryakov
Comment 10
2012-02-22 14:27:57 PST
This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).
Gavin Barraclough
Comment 11
2012-02-22 14:30:20 PST
Hi Mathias, The patch looks great - thanks for fixing this! However we do ask for bug fixes to be accompanied by regression tests. Do you think it would be possible to add one? You should be able to test this with something like: shouldBe("var x\u200c = 42; x\u200c", "42"); shouldBe("var x\u200d = 43; x\u200d", "43"); You'll find the tests in LayoutTests/fast/js, just pick an existing test to add the tests to, or if none look appropriate create a new .html / .js based on an existing one. Let me know if you have an questions. Many thanks, G.
Early Warning System Bot
Comment 12
2012-02-22 14:51:01 PST
Comment on
attachment 128289
[details]
Proposed patch
Attachment 128289
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/11561643
Mathias Bynens
Comment 13
2012-02-22 14:57:06 PST
Created
attachment 128296
[details]
Proposed patch including new layout tests
Mathias Bynens
Comment 14
2012-02-22 15:21:23 PST
Created
attachment 128301
[details]
Proposed patch including new layout tests that updates old ZWJ test
Early Warning System Bot
Comment 15
2012-02-22 15:34:34 PST
Comment on
attachment 128301
[details]
Proposed patch including new layout tests that updates old ZWJ test
Attachment 128301
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/11574009
Alexey Proskuryakov
Comment 16
2012-02-22 16:01:55 PST
(In reply to
comment #10
)
> This is not marked for review, but this almost certainly breaks existing layout tests (which will need to be updated accordingly to new expected behavior).
Could you please comment on this? Specifically, fast/js/removing-Cf-characters.html looks like it should break.
Alexey Proskuryakov
Comment 17
2012-02-22 16:03:00 PST
Sorry, I missed the latest patch between bot spew. Please disregard
comment 16
.
WebKit Review Bot
Comment 18
2012-02-22 16:54:39 PST
Comment on
attachment 128301
[details]
Proposed patch including new layout tests that updates old ZWJ test
Attachment 128301
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/11560655
New failing tests: fast/js/var-declarations-zero-width.html
Mathias Bynens
Comment 19
2012-02-23 03:52:51 PST
Created
attachment 128451
[details]
Patch
WebKit Review Bot
Comment 20
2012-02-23 04:38:11 PST
Comment on
attachment 128451
[details]
Patch
Attachment 128451
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/11576248
New failing tests: fast/js/var-declarations-zero-width.html
Mathias Bynens
Comment 21
2012-02-23 05:42:09 PST
Created
attachment 128463
[details]
Patch Disable the new test in Chromium for now. It can be enabled as soon as V8
r10800
(
http://code.google.com/p/v8/source/detail?r=10800
) has rolled into WebKit. Thanks to Peter Beverloo in #webkit for the helpful pointers.
Mathias Bynens
Comment 22
2012-02-23 05:45:32 PST
Created
attachment 128464
[details]
Patch Fixed a typo in both changelog entries.
Peter Beverloo
Comment 23
2012-02-23 06:00:30 PST
Comment on
attachment 128464
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=128464&action=review
Cool, thanks for fixing this in JSC and V8 :-).
> Source/JavaScriptCore/runtime/LiteralParser.cpp:3 > + * Copyright (C) Mathias Bynens (
mathias@qiwi.be
)
nit: Include the copyright year (as you do in Lexer.cpp)
Mathias Bynens
Comment 24
2012-02-23 06:04:39 PST
Created
attachment 128467
[details]
Patch Updated as per Peter’s feedback in
comment #23
.
Early Warning System Bot
Comment 25
2012-02-23 09:22:51 PST
Comment on
attachment 128467
[details]
Patch
Attachment 128467
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/11608295
Mathias Bynens
Comment 26
2012-02-23 10:00:45 PST
(In reply to
comment #25
)
> (From update of
attachment 128467
[details]
) >
Attachment 128467
[details]
did not pass qt-ews (qt): > Output:
http://queues.webkit.org/results/11608295
The Qt bot shows that the equation between *m_ptr and 0x200C on line 285 (LiteralParser<LChar>::Lexer::lexIdentifier) is out of range. Alexey, could you maybe give me some pointers about how to handle the comparison in this case, or can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?
Alexey Proskuryakov
Comment 27
2012-02-23 10:15:51 PST
> can we rely on the UChar code-path to be taken when 0x200C/0x200D has been found in the code?
Yes, my understanding is that this function is called with char template parameter when the string is known ASCII (not UTF-8). It is surprising that this check has cases for Unicode spaces, but not for regular 0x20.
Alexey Proskuryakov
Comment 28
2012-02-23 10:18:35 PST
Correction: LChar is L is Latin-1, not just ASCII.
Michael Saboff
Comment 29
2012-02-23 10:41:21 PST
Comment on
attachment 128467
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=128467&action=review
> Source/JavaScriptCore/parser/Lexer.cpp:384 > + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector));
Move the checks for 0x200C and 0x200D to the end as these are not the typical case.
> Source/JavaScriptCore/runtime/LiteralParser.cpp:285 > + while (m_ptr < m_end && (isASCIIAlphanumeric(*m_ptr) || *m_ptr == '_' || *m_ptr == '$' || *m_ptr == 0x200C || *m_ptr == 0x200D))
Remove the checks for 0x200C and 0x200D here as this explicit template instantiation is only for 8 bit (LChar) characters.
> LayoutTests/fast/js/var-declarations-zero-width.html:29 > +<p>This page tests if U+200C and U+200D are allowed as part of an identifier. > +<pre id=console></pre> > + > +<script> > +if (window.layoutTestController) > + layoutTestController.dumpAsText(); > + > +function log(s) { > + document.getElementById("console").appendChild(document.createTextNode(s + "\n")); > +} > + > +function shouldBe(a, b) { > + var evalA; > + try { > + evalA = eval(a); > + } catch (e) { > + evalA = e; > + } > + > + if (evalA === b) { > + log("PASS: " + a + " should be " + b + " and is."); > + } else { > + log("FAIL: " + a + " should be " + b + " but instead is " + evalA + "."); > + } > +} > + > +shouldBe("var x\u200c = 42; x\u200c", 42); > +shouldBe("var x\u200d = 43; x\u200d", 43); > +</script>
Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. The .html should end up something like: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <script src="resources/js-test-pre.js"></script> </head> <body> <script src="script-tests/var-declarations-zero-width.js"></script> <script src="resources/js-test-post.js"></script> </body> And the script-tests/var-declarations-zero-width.js would be something like: description("This page tests if U+200C and U+200D are allowed as part of an identifier."); shouldBe("var x\u200c = 42; x\u200c", 42); shouldBe("var x\u200d = 43; x\u200d", 43);
Alexey Proskuryakov
Comment 30
2012-02-23 11:07:12 PST
> Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html.
(fast/js is an exception - everywhere else, it's correct to have one file)
Mathias Bynens
Comment 31
2012-02-23 11:39:46 PST
Comment on
attachment 128467
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=128467&action=review
>> Source/JavaScriptCore/parser/Lexer.cpp:384 >> + | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)); > > Move the checks for 0x200C and 0x200D to the end as these are not the typical case.
Will do. In case you were wondering, I figured I’d place them at the start as the category lookups are probably less efficient than a simple equality check.
>> LayoutTests/fast/js/var-declarations-zero-width.html:29 >> +</script> > > Please move the script part to LayoutTests/fast/js/script-tests/var-declarations-zero-width.js and reference the .js file in the .html. Also use the provided "js-test-pre.js" and "js-test-post.js" scripts. > > The .html should end up something like: > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > <html> > <head> > <script src="resources/js-test-pre.js"></script> > </head> > <body> > <script src="script-tests/var-declarations-zero-width.js"></script> > <script src="resources/js-test-post.js"></script> > </body> > > And the script-tests/var-declarations-zero-width.js would be something like: > description("This page tests if U+200C and U+200D are allowed as part of an identifier."); > > shouldBe("var x\u200c = 42; x\u200c", 42); > shouldBe("var x\u200d = 43; x\u200d", 43);
Thanks; will do. I did not know about this convention.
Mathias Bynens
Comment 32
2012-02-23 11:56:58 PST
Created
attachment 128524
[details]
Patch Updated as per Michael’s and Alexey’s feedback. (Thanks!)
Mathias Bynens
Comment 33
2012-02-23 12:03:41 PST
Created
attachment 128527
[details]
Patch Corrected a typo in the LayoutTests/ChangeLog entry.
WebKit Review Bot
Comment 34
2012-02-24 00:20:32 PST
Comment on
attachment 128527
[details]
Patch Clearing flags on attachment: 128527 Committed
r108742
: <
http://trac.webkit.org/changeset/108742
>
WebKit Review Bot
Comment 35
2012-02-24 00:20:42 PST
All reviewed patches have been landed. Closing bug.
Mathias Bynens
Comment 36
2012-02-24 00:23:46 PST
Thanks for the help with this patch, everyone!
Bug 79353
tracks Unicode 6.1.0 support for identifiers, and I filed
bug 79361
to enable the new layout test for Chromium as soon as V8
r10800
has rolled into WebKit.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug