WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 208998
JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
https://bugs.webkit.org/show_bug.cgi?id=208998
Summary
JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
Mathias Bynens
Reported
2020-03-12 08:58:42 PDT
Since ES2015, ID_Start or ID_Continue symbols _outside of the BMP_ can be used in identifiers. Example using U+102A7 CARIAN LETTER A2 (taken from
https://mathiasbynens.be/notes/javascript-identifiers-es6
): var 𐊧; // 1 var \u{102A7}; // 2 Expected result: The above program evaluates without throwing an exception. Actual result: Although JavaScriptCore handles line 2 correctly, it throws on line 1: SyntaxError: Invalid character '\ud800' Note that JSC is the only engine with this bug: $ eshost -sx 'var 𐊧' #### Chakra, SpiderMonkey, V8, V8 --harmony, XS #### JavaScriptCore SyntaxError: Invalid character '\ud800'
Attachments
WIP
(31.43 KB, patch)
2020-03-15 15:57 PDT
,
Keith Miller
no flags
Details
Formatted Diff
Diff
Patch
(34.10 KB, patch)
2020-03-16 15:11 PDT
,
Keith Miller
no flags
Details
Formatted Diff
Diff
Patch
(40.20 KB, patch)
2020-03-16 15:42 PDT
,
Keith Miller
no flags
Details
Formatted Diff
Diff
Patch for landing
(40.19 KB, patch)
2020-03-16 16:28 PDT
,
Keith Miller
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2020-03-12 12:22:25 PDT
<
rdar://problem/60381814
>
Keith Miller
Comment 2
2020-03-13 11:35:31 PDT
Interesting our implementation seems semantically the same as v8's: static inline bool isIdentStart(UChar32 c) { return isLatin1(c) ? isIdentStart(static_cast<LChar>(c)) : isNonLatin1IdentStart(c); } static NEVER_INLINE bool isNonLatin1IdentStart(UChar c) { return u_hasBinaryProperty(c, UCHAR_ID_START); } and similarly for non-start: static ALWAYS_INLINE bool isIdentPart(UChar32 c) { return isLatin1(c) ? isIdentPart(static_cast<LChar>(c)) : isNonLatin1IdentPart(c); } static NEVER_INLINE bool isNonLatin1IdentPart(UChar32 c) { return u_hasBinaryProperty(c, UCHAR_ID_CONTINUE) || c == 0x200C || c == 0x200D; } My guess is this is a bug in the system ICU. Does that analysis seem correct to you? My Unicode/ICU knowledge is very limited...
Yusuke Suzuki
Comment 3
2020-03-13 13:04:58 PDT
I think we should have a fallback path for surrogate pairs. Something like this. if (!isIdentStart(code)) { // Check whether the code is leading surrogate. if (!isLeadSurrogate(code)) goto fail; codeNext = getOneCharacter(); if (!isTrailSurrogate(codeNext)) goto fail; codePoint = codePointFromSurrogatePair(code, codeNext); if (!isIdentStart(codePoint)) goto fail; }
Mathias Bynens
Comment 4
2020-03-13 16:28:32 PDT
The error message mentions U+D800 which is the high surrogate half of the character in my example ('\u{102A7}' === '\uD800\uDEA7'), so I think the code paths Keith mentions are hit for each surrogate half (and surrogate code points are never valid identifier characters, hence the exception) as opposed to considering surrogate pairs where applicable.
Keith Miller
Comment 5
2020-03-14 00:27:00 PDT
(In reply to Mathias Bynens from
comment #4
)
> The error message mentions U+D800 which is the high surrogate half of the > character in my example ('\u{102A7}' === '\uD800\uDEA7'), so I think the > code paths Keith mentions are hit for each surrogate half (and surrogate > code points are never valid identifier characters, hence the exception) as > opposed to considering surrogate pairs where applicable.
Yeah, I think you're right. I just didn't know what was going on... >.>
Keith Miller
Comment 6
2020-03-15 15:57:01 PDT
Created
attachment 393627
[details]
WIP
Keith Miller
Comment 7
2020-03-16 15:11:00 PDT
Created
attachment 393691
[details]
Patch
Keith Miller
Comment 8
2020-03-16 15:42:09 PDT
Created
attachment 393695
[details]
Patch
Michael Saboff
Comment 9
2020-03-16 16:17:36 PDT
Comment on
attachment 393695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=393695&action=review
r=me
> Source/JavaScriptCore/parser/Lexer.cpp:2625 > + // FIXME: This should probably not be a lex error but dealing with surrogate pairs here is annoying and it's going to be an error anyway...
Add a bug for this FIXME.
Keith Miller
Comment 10
2020-03-16 16:19:07 PDT
Comment on
attachment 393695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=393695&action=review
>> Source/JavaScriptCore/parser/Lexer.cpp:2625 >> + // FIXME: This should probably not be a lex error but dealing with surrogate pairs here is annoying and it's going to be an error anyway... > > Add a bug for this FIXME.
Ehh, This should probably not be a FIXME. I don't think we're actually going to fix it.
Keith Miller
Comment 11
2020-03-16 16:28:14 PDT
Created
attachment 393704
[details]
Patch for landing
WebKit Commit Bot
Comment 12
2020-03-16 17:12:22 PDT
Comment on
attachment 393704
[details]
Patch for landing Clearing flags on attachment: 393704 Committed
r258531
: <
https://trac.webkit.org/changeset/258531
>
WebKit Commit Bot
Comment 13
2020-03-16 17:12:24 PDT
All reviewed patches have been landed. Closing bug.
Alexey Shvayka
Comment 14
2020-03-23 10:19:48 PDT
***
Bug 167940
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug