212725 – Pasted JavaScript code from pasteboard is unicode-normalized and this changes meaning of code

https://bugs.webkit.org/show_bug.cgi?id=212725

Huáng Jùnliàng

Reported 2020-06-03 20:37:15 PDT

Safari TP 107 throws on the following snippet:

```js
// \u1f7c-\u1f7d
var range = "[ὼ-ώ]";
var regex = new RegExp(range);
```

However, it does not throw when the range above is escaped as ASCII only:

```js
var range = "[\u1f7c-\u1f7d]";
var regex = new RegExp(range);
```

While `1f7c` seems random, the following snippet is good.

```js
// \u1f7b-\u1f7c
var range="[ύ-ὼ]",regex=new RegExp(range);
```

I don't think this issue can be related to recent Unicode version updates because \u1f7b - \u1f7d have been available since Unicode 1.1.

**Context**

Found this issue when debugging https://github.com/babel/website/issues/2254. JSC throws on Babel's ugly identifier name detection regex: https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js after it has been minified by terser.

**Related version**

This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).

Radar WebKit Bug Importer

Comment 1 2020-06-05 09:57:35 PDT

<rdar://problem/64033253>

Yusuke Suzuki

Comment 2 2020-06-09 16:55:44 PDT

(In reply to Huáng Jùnliàng from comment #0)
> Safari TP 107 throws on the following snippet
>
> ```js
> // \u1f7c-\u1f7d
> var range = "[ὼ-ώ]";
> var regex = new RegExp(range);
> ```

In the above code, it looks like the range is not `\u1f7c-\u1f7d`.

[..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

Since 0x3ce is smaller than 0x1f7c, this throws a SyntaxError. I've ensured that the above script throws a SyntaxError in Firefox and Chrome. So, it seems that this is a minifier's bug. Can you check the result with the other browsers?

> However it does not throw when the range above is escaped as ascii only:
>
> ```js
> var range = "[\u1f7c-\u1f7d]";
> var regex = new RegExp(range);
> ```
>
> While `1f7c` seems random, the following snippet is good.
>
> ```js
> // \u1f7b-\u1f7c
> var range="[ύ-ὼ]",regex=new RegExp(range);
> ```
>
> I don't think this issue can be related to recent Unicode version updates
> because \u1f7b - \u1f7d have been available since Unicode 1.1.
>
> **Context**
>
> Found this issue when debugging
> https://github.com/babel/website/issues/2254, JSC throws on Babel's ugly
> identifier name detection regex:
> https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js
> after minified by terser.
>
> **Related version**
> This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).
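The check in comment 2 can be repeated without depending on copy and paste by writing both ranges with escapes. This is a minimal sketch; `codeUnitsHex` and `isAscendingRange` are hypothetical helper names introduced here for illustration, not part of any library.

```js
// Hypothetical helper: list each character's first UTF-16 code unit as hex.
function codeUnitsHex(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0).toString(16));
}

// Hypothetical helper: true when "a-b" runs from a lower to a higher code point.
function isAscendingRange(range) {
  const [start, , end] = Array.from(range); // skip the "-" in the middle
  return start.codePointAt(0) <= end.codePointAt(0);
}

const intendedRange = "\u1f7c-\u1f7d"; // what the reporter meant to write
const pastedRange = "\u1f7c-\u03ce";   // what comment 2 observed after copy and paste

console.log(codeUnitsHex(intendedRange)); // 1f7c, 2d, 1f7d
console.log(codeUnitsHex(pastedRange));   // 1f7c, 2d, 3ce

console.log(isAscendingRange(intendedRange)); // true
console.log(isAscendingRange(pastedRange));   // false, so new RegExp("[...]") rejects it
```

Because the escapes are plain ASCII, the snippet itself is unaffected by whatever the pasteboard does to non-ASCII text.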

Huáng Jùnliàng

Comment 3 2020-06-09 19:27:10 PDT

> In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
>
> [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

I can confirm that if you copy & paste the code example from this webpage, it throws a syntax error on other browsers too and `ώ` becomes U+03CE.

I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org

```
"\u{1f7d}".codePointAt(0).toString(16);
```

And copy the output code to the Chrome/Firefox/Safari console. Both Chrome and Firefox return `1f7d` but the Safari console returns `3ce`.

Note that U+1F7D has a singleton decomposition, which means

```
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```

and it is included in CompositionExclusions, which means U+1F7D should never exist in a normalized Unicode string.

Since ECMAScript does not require the source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.

However, I think this bug also affects certain feature implementations, because a runtime error is thrown exclusively on Safari in https://github.com/babel/website/issues/2254. I will try to isolate that issue again and post an update, so please leave it unresolved.
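The singleton decomposition also accounts for why only one of the two ranges from the original report fails. The sketch below assumes the normalization applied is NFC (the default form of `String.prototype.normalize()`); the thread does not say which form WebKit uses. `codePointsHex` and `classCompiles` are hypothetical helper names.

```js
// Hypothetical helper: format a string's code points as hex strings.
function codePointsHex(text) {
  return Array.from(text, (ch) => ch.codePointAt(0).toString(16));
}

// Hypothetical helper: true when the source compiles as a RegExp pattern.
function classCompiles(patternSource) {
  try {
    new RegExp(patternSource);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const throwingClass = "[\u1f7c-\u1f7d]"; // first snippet in the report
const workingClass = "[\u1f7b-\u1f7c]";  // third snippet in the report

// U+1F7D maps to U+03CE, so the range end drops below its start (U+1F7C).
const throwingNfc = throwingClass.normalize("NFC");
console.log(codePointsHex(throwingNfc)); // 5b, 1f7c, 2d, 3ce, 5d
console.log(classCompiles(throwingNfc)); // false

// U+1F7B maps to U+03CD, which is still below U+1F7C, so the range stays valid.
const workingNfc = workingClass.normalize("NFC");
console.log(codePointsHex(workingNfc)); // 5b, 3cd, 2d, 1f7c, 5d
console.log(classCompiles(workingNfc)); // true
```

In both cases the normalized range matches a different set of characters than the author wrote; only the first happens to fail loudly.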

Yusuke Suzuki

Comment 4 2020-06-09 19:41:00 PDT

(In reply to Huáng Jùnliàng from comment #3)
> > In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> >
> > [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
>
> I can confirm that if you copy & paste the code example from this webpage,
> it throws syntax error on other browsers too and `ώ` becomes U+03CE.

Thanks for your confirmation :)

> I can also confirm that the minifier is working properly. You can try the
> following example on https://try.terser.org
>
> ```
> "\u{1f7d}".codePointAt(0).toString(16);
> ```
>
> And copy the output code to Chrome/Firefox/Safari console. Both Chrome and
> Firefox returns `1f7d` but Safari console returns `3ce`.

Interesting! This reproduced on my machine too. One interesting thing is that:

1. I copied this from the terser result
2. I created a secret gist from Safari
3. I copied the text from the gist
4. I pasted it into the Chrome / Firefox consoles

Then, I got the same normalized results. That sounds like the pasteboard is normalizing the content when pasting text in WebKit? I should talk to platform folks.

> Note that U+1F7D has singleton decompositions, which means
>
> ```
> String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
> ```
>
> and it is included in CompositionExclusions, which means U+1F7D should never
> exist in a normalized Unicode string.
>
> Since ECMAScript does not require the source text to be normalized, I think
> this is a bug of Safari console, which applies normalization on the pasted
> source code. I will file a new bug report about that.

Maybe this is not the console's bug, given that I encountered this normalization even in a textarea. Rather, it sounds like the pasteboard is doing the normalization. It is possible that this is derived from UIKit...?

> However I think this bug also affects certain feature implementations,
> because runtime error is thrown on Safari exclusively in
> https://github.com/babel/website/issues/2254.
>
> I will try to isolate that issue again and post an update, please leave it
> unresolved.

Sure! Thanks. This helps us a lot :D
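Until the pasteboard behavior is settled, one way to keep a snippet intact across copy and paste is to make it ASCII-only first, which matches the observation in the original report that the escaped form does not throw. The following is a sketch of that idea, not a fix proposed in this thread. `escapeNonAscii` and `changedByNfc` are hypothetical helpers, NFC is an assumption, and the textual rewrite is only safe where JavaScript accepts `\uXXXX` escapes (string literals, regex literals, identifiers).

```js
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with \uXXXX.
function escapeNonAscii(source) {
  return source.replace(/[^\x00-\x7f]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0"));
}

// Hypothetical helper: would NFC normalization rewrite this text?
function changedByNfc(text) {
  return text.normalize("NFC") !== text;
}

const snippet = 'var range = "[\u1f7c-\u1f7d]";'; // contains raw U+1F7C and U+1F7D
const asciiSnippet = escapeNonAscii(snippet);

console.log(changedByNfc(snippet));      // true: U+1F7D would become U+03CE
console.log(asciiSnippet);               // var range = "[\u1f7c-\u1f7d]"; (literal backslashes)
console.log(changedByNfc(asciiSnippet)); // false: ASCII text is already normalized
```

`changedByNfc` on its own is also a quick way to tell whether a given piece of source is at risk at all.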

Huáng Jùnliàng

Comment 5 2020-06-09 20:01:22 PDT

> That sounds like paste-board is normalizing the content when pasting a text in WebKit? I should talk to platform folks.

That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

Yusuke Suzuki

Comment 6 2020-06-09 20:59:00 PDT

(In reply to Huáng Jùnliàng from comment #5)
> > That sounds like paste-board is normalizing the content when pasting a text in WebKit?
> > I should talk to platform folks.
>
> That looks reasonable. Since you are much more familiar with the internals
> than I am, can you file a radar on that? I have also prepared a gist at
> https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

OK, I've talked with Wenson and Alexey about it. We have a code path which Unicode-normalizes text going from the pasteboard to a textarea, to alleviate situations such as a fancy decomposed-Unicode HFS+ filename being pasted directly into the textarea. We are still discussing the direction.

BTW, from this information, I guess that the babel website console has a code path which copies & pastes the babel code itself, correct?

Yusuke Suzuki

Comment 7 2020-06-18 21:48:17 PDT

After looking into the babel console code, I suspect that this is normalized when creating a Blob. @Huáng I guess that the babel repl website creates a large Blob which includes all of the babel source code, and that this Blob is created from user JS with something like `new Blob([codeText])`, is that correct?
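A suspicion like this can be probed with a round trip: build a Blob from a string containing U+1F7D, read it back, and compare. This is only a diagnostic sketch with a hypothetical helper name (`survivesBlobRoundTrip`). It does not claim what any particular engine returns, and the actual path in the babel repl may differ from this one.

```js
// Hypothetical diagnostic: does text come back unchanged from a Blob?
async function survivesBlobRoundTrip(codeText) {
  const blob = new Blob([codeText], { type: "text/javascript" });
  const readBack = await blob.text(); // decodes the Blob's bytes as UTF-8
  return readBack === codeText;
}

// U+1F7D is the code point that NFC would rewrite to U+03CE.
survivesBlobRoundTrip('var s = "\u1f7d";').then((unchanged) => {
  console.log("unchanged after Blob round trip:", unchanged);
});
```

A `false` result in one engine and `true` in others would point at the Blob path rather than the pasteboard.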

Yusuke Suzuki

Comment 8 2020-06-18 21:48:37 PDT

I'll keep this bug for pasteboard text encoding.

Yusuke Suzuki

Comment 9 2020-06-18 21:51:35 PDT

I think babel's repl issue is derived from https://bugs.webkit.org/show_bug.cgi?id=213254

Huáng Jùnliàng

Comment 10 2020-06-19 15:21:01 PDT

@Yusuke Thanks for attaching the See Also issue.

> babel repl website creates large Blob which includes all babel source code

Yes! You are right. I will reply on that issue, as this issue should stay focused on the pasteboard.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version Safari Technology Preview

Hardware Mac

OS macOS 10.15

Product WebKit

Component Web Inspector

Assignee Wenson Hsieh

Reported 2020-06-03 20:37 PDT

Modified 2020-06-19 15:21 PDT

CC List 8 users

Keywords InRadar