WebKit Bugzilla
NEW
212725
Pasted JavaScript code from pasteboard is unicode-normalized and this changes meaning of code
https://bugs.webkit.org/show_bug.cgi?id=212725
Huáng Jùnliàng
Reported
2020-06-03 20:37:15 PDT
Safari TP 107 throws on the following snippet

```js
// \u1f7c-\u1f7d
var range = "[ὼ-ώ]";
var regex = new RegExp(range);
```

However, it does not throw when the range above is escaped as ASCII only:

```js
var range = "[\u1f7c-\u1f7d]";
var regex = new RegExp(range);
```

While `1f7c` seems random, the following snippet is good.

```js
// \u1f7b-\u1f7c
var range="[ύ-ὼ]",regex=new RegExp(range);
```

I don't think this issue can be related to recent Unicode version updates because \u1f7b - \u1f7d have been available since Unicode 1.1.

**Context**
I found this issue when debugging https://github.com/babel/website/issues/2254: JSC throws on Babel's ugly identifier name detection regex (https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js) after it is minified by terser.

**Related version**
This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).
Radar WebKit Bug Importer
Comment 1
2020-06-05 09:57:35 PDT
<rdar://problem/64033253>
Yusuke Suzuki
Comment 2
2020-06-09 16:55:44 PDT
(In reply to Huáng Jùnliàng from comment #0)
> Safari TP 107 throws on the following snippet
>
> ```js
> // \u1f7c-\u1f7d
> var range = "[ὼ-ώ]";
> var regex = new RegExp(range);
> ```

In the above code, it looks like the range is not `\u1f7c-\u1f7d`.

[..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

Since 0x3ce is smaller than 0x1f7c, this throws a SyntaxError. I've confirmed that the above script throws a SyntaxError in Firefox and Chrome. So, it seems that this is a minifier's bug. Can you check the result with the other browsers?
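To make the range-order point concrete, here is a minimal sketch that compiles both character classes from explicit code points, so nothing depends on how the text was pasted. `tryCompileRange` is a hypothetical helper introduced only for this illustration.

```javascript
// Hypothetical helper (not from the bug report): build "[lo-hi]" from two
// code points and report whether the engine accepts it as a character class.
function tryCompileRange(lo, hi) {
  const source =
    "[" + String.fromCodePoint(lo) + "-" + String.fromCodePoint(hi) + "]";
  try {
    new RegExp(source);
    return "ok";
  } catch (e) {
    // An out-of-order range is rejected when the pattern is compiled.
    return e.name;
  }
}

// The range the reporter intended: U+1F7C to U+1F7D.
const intended = tryCompileRange(0x1f7c, 0x1f7d);

// The range that actually reached the engine: U+1F7C to U+03CE.
const received = tryCompileRange(0x1f7c, 0x03ce);

console.log(intended, received);
```

The second call fails because its upper endpoint is numerically below its lower endpoint, which matches the `["1f7c","2d","3ce"]` output above.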
> However it does not throw when the range above is escaped as ascii only:
>
> ```js
> var range = "[\u1f7c-\u1f7d]";
> var regex = new RegExp(range);
> ```
>
> While `1f7c` seems random, the following snippet is good.
>
> ```js
> // \u1f7b-\u1f7c
> var range="[ύ-ὼ]",regex=new RegExp(range);
> ```
>
> I don't think this issue can be related to recent Unicode version updates
> because \u1f7b - \u1f7d have been available since Unicode 1.1.
>
> **Context**
>
> Found this issue when debugging https://github.com/babel/website/issues/2254,
> JSC throws on Babel's ugly identifier name detection regex:
> https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js
> after minified by terser.
>
> **Related version**
> This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).
Huáng Jùnliàng
Comment 3
2020-06-09 19:27:10 PDT
> In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
I can confirm that if you copy & paste the code example from this webpage, it throws a SyntaxError on other browsers too, and `ώ` becomes U+03CE.

I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org

```
"\u{1f7d}".codePointAt(0).toString(16);
```

Then copy the output code to the Chrome/Firefox/Safari console. Both Chrome and Firefox return `1f7d`, but the Safari console returns `3ce`.

Note that U+1F7D has a singleton decomposition, which means

```
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```

and it is included in CompositionExclusions, which means U+1F7D should never exist in a normalized Unicode string.

Since ECMAScript does not require the source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.

However, I think this bug also affects certain feature implementations, because a runtime error is thrown exclusively on Safari in https://github.com/babel/website/issues/2254.

I will try to isolate that issue again and post an update; please leave this unresolved.
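The normalization behaviour described above can be checked without pasting anything, by building the characters from code points. This is a small sketch; `hex` is a hypothetical helper used only to print code points.

```javascript
// Hypothetical helper: list the code points of a string as hex strings.
const hex = (s) => [...s].map((ch) => ch.codePointAt(0).toString(16));

const omegaOxia = String.fromCodePoint(0x1f7d);   // upper end of the failing range
const omegaVaria = String.fromCodePoint(0x1f7c);  // lower end of the failing range
const upsilonOxia = String.fromCodePoint(0x1f7b); // lower end of the "good" range

// U+1F7D has a singleton decomposition, so NFC maps it to U+03CE.
const nfcOxia = hex(omegaOxia.normalize("NFC"));
// NFD goes one step further: omega followed by a combining acute accent.
const nfdOxia = hex(omegaOxia.normalize("NFD"));
// U+1F7C is stable under NFC.
const nfcVaria = hex(omegaVaria.normalize("NFC"));
// U+1F7B also has a singleton decomposition and maps to U+03CD.
const nfcUpsilon = hex(upsilonOxia.normalize("NFC"));

console.log(nfcOxia, nfdOxia, nfcVaria, nfcUpsilon);
```

This also suggests why the `\u1f7b-\u1f7c` snippet kept working: after normalization its lower endpoint moves down to U+03CD while the upper endpoint stays at U+1F7C, so the range is still in order.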
Yusuke Suzuki
Comment 4
2020-06-09 19:41:00 PDT
(In reply to Huáng Jùnliàng from comment #3)
> > In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> > [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
>
> I can confirm that if you copy & paste the code example from this webpage,
> it throws syntax error on other browsers too and `ώ` becomes U+03CE.

Thanks for your confirmation :)

> I can also confirm that the minifier is working properly. You can try the
> following example on https://try.terser.org
>
> ```
> "\u{1f7d}".codePointAt(0).toString(16);
> ```
>
> And copy the output code to Chrome/Firefox/Safari console. Both Chrome and
> Firefox returns `1f7d` but Safari console returns `3ce`.

Interesting! This reproduced on my machine too. One interesting thing is that:

1. I copied this from the terser result
2. I created a secret gist from Safari
3. I copied the text from the gist
4. I pasted it into the Chrome / Firefox consoles

Then I got the same normalized results. That sounds like the pasteboard is normalizing the content when pasting text in WebKit? I should talk to platform folks.

> Note that U+1F7D has singleton decompositions, which means
>
> ```
> String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
> ```
>
> and it is included in CompositionExclusions, which means U+1F7D should never
> exist in a normalized Unicode string.
>
> Since ECMAScript does not require the source text to be normalized, I think
> this is a bug of Safari console, which applies normalization on the pasted
> source code. I will file a new bug report about that.

Maybe this is not the console's bug, given that I encountered this normalization even in a textarea. Rather, it sounds like the pasteboard is doing the normalization. Could this be coming from UIKit...?

> However I think this bug also affects certain feature implementations,
> because runtime error is thrown on Safari exclusively in
> https://github.com/babel/website/issues/2254.
>
> I will try to isolate that issue again and post an update, please leave it
> unresolved.

Sure! Thanks. This helps us a lot :D
Huáng Jùnliàng
Comment 5
2020-06-09 20:01:22 PDT
> That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit?
> I should talk to platform folks.

That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.
Yusuke Suzuki
Comment 6
2020-06-09 20:59:00 PDT
(In reply to Huáng Jùnliàng from comment #5)
> > That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit?
> > I should talk to platform folks.
>
> That looks reasonable. Since you are much more familiar with the internals
> than I am, can you file a radar on that? I have also prepared a gist at
> https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

OK, I've talked with Wenson and Alexey about it. We have a code path that Unicode-normalizes text going from the pasteboard into a textarea, to alleviate situations such as a fancy decomposed-Unicode HFS+ filename being pasted directly into the textarea. We are still discussing the direction.

BTW, from this information, I guess that the Babel website console has a code path that copies & pastes the Babel code itself, correct?
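As a rough illustration of why the ASCII-only form from comment #0 survives this code path, the sketch below simulates the paste step with `normalize("NFC")`. That is an assumption: the thread does not say which normalization form the pasteboard path applies. `escapeNonAscii` is a hypothetical helper, not part of terser or WebKit.

```javascript
// Hypothetical helper: rewrite every non-ASCII UTF-16 code unit as a \uXXXX
// escape. This is only meaningful inside string or regex literals.
function escapeNonAscii(source) {
  return source.replace(
    /[^\x00-\x7f]/g,
    (unit) => "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// A string literal containing the raw range U+1F7C to U+1F7D, built from
// code points so this file itself stays ASCII-only.
const literal =
  '"[' + String.fromCodePoint(0x1f7c) + "-" + String.fromCodePoint(0x1f7d) + ']"';

// Assumption: paste normalization behaves like NFC.
const pastedRaw = literal.normalize("NFC");

// The escaped form contains only ASCII, which normalization leaves alone.
const escaped = escapeNonAscii(literal);
const pastedEscaped = escaped.normalize("NFC");

console.log(pastedRaw === literal, pastedEscaped === escaped, escaped);
```

Under that assumption, the raw literal changes when pasted while the escaped literal does not, which is consistent with the observation that `"[\u1f7c-\u1f7d]"` does not throw.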
Yusuke Suzuki
Comment 7
2020-06-18 21:48:17 PDT
After looking into the Babel console code, I suspect that this is normalized when creating a Blob. @Huáng, I guess that the Babel REPL website creates a large Blob that includes all of the Babel source code, and that this Blob is created from user JS with something like `new Blob([codeText])`. Is that correct?
Yusuke Suzuki
Comment 8
2020-06-18 21:48:37 PDT
I'll keep this bug for pasteboard text encoding.
Yusuke Suzuki
Comment 9
2020-06-18 21:51:35 PDT
I think babel's repl issue is derived from
https://bugs.webkit.org/show_bug.cgi?id=213254
Huáng Jùnliàng
Comment 10
2020-06-19 15:21:01 PDT
@Yusuke Thanks for attaching the See Also issue.
> babel repl website creates large Blob which includes all babel source code
Yes! You are right. I will reply on that issue as this issue should be focused on pasteboard.