Bug 212725
| Summary: | Pasted JavaScript code from pasteboard is unicode-normalized and this changes meaning of code | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Huáng Jùnliàng <jlhwung> |
| Component: | Web Inspector | Assignee: | Wenson Hsieh <wenson_hsieh> |
| Status: | NEW | ||
| Severity: | Normal | CC: | ap, fpizlo, inspector-bugzilla-changes, ljharb, msaboff, webkit-bug-importer, wenson_hsieh, ysuzuki |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Safari Technology Preview | ||
| Hardware: | Mac | ||
| OS: | macOS 10.15 | ||
| See Also: | https://bugs.webkit.org/show_bug.cgi?id=213254 | ||
Huáng Jùnliàng
Safari TP 107 throws on the following snippet:
```js
// \u1f7c-\u1f7d
var range = "[ὼ-ώ]";
var regex = new RegExp(range);
```
However, it does not throw when the range above is escaped as ASCII only:
```js
var range = "[\u1f7c-\u1f7d]";
var regex = new RegExp(range);
```
The problem does not seem to be `1f7c` itself: the following snippet, which also uses it, works fine:
```js
// \u1f7b-\u1f7c
var range="[ύ-ὼ]",regex=new RegExp(range);
```
I don't think this issue is related to recent Unicode version updates, because U+1F7B through U+1F7D have been available since Unicode 1.1.
**Context**
Found this issue while debugging https://github.com/babel/website/issues/2254: JSC throws on Babel's ugly identifier name detection regex (https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js) after it is minified by terser.
**Related version**
This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).
Radar WebKit Bug Importer
<rdar://problem/64033253>
Yusuke Suzuki
(In reply to Huáng Jùnliàng from comment #0)
> Safari TP 107 throws on the following snippet
>
> ```js
> // \u1f7c-\u1f7d
> var range = "[ὼ-ώ]";
> var regex = new RegExp(range);
> ```
In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
```js
[..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
```
Since 0x3ce is smaller than 0x1f7c, this throws a SyntaxError.
I've confirmed that the above script throws a SyntaxError in Firefox and Chrome as well.
So it seems this is a bug in the minifier.
Can you check the result in other browsers?
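The SyntaxError itself comes from the out-of-order range. To check this without depending on what the page or the clipboard does to the literal characters, the sketch below rebuilds the ranges from explicit code points and applies NFC normalization (an assumption about what happened to the pasted copy, consistent with the later comments in this thread). `rangeFrom` and `compiles` are illustrative helpers, not part of any library.

```javascript
// Build a character-class pattern from two code points and apply the NFC
// normalization that the pasted copy of the snippet is assumed to have gone through.
const rangeFrom = (start, end) =>
  `[${String.fromCodePoint(start)}-${String.fromCodePoint(end)}]`.normalize("NFC");

// Returns true if the pattern compiles, false if RegExp rejects it with a SyntaxError.
function compiles(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Escaped source keeps the intended code points: [U+1F7C-U+1F7D] is ascending.
console.log(compiles("[\u1f7c-\u1f7d]")); // true
// U+1F7D normalizes to U+03CE, giving [U+1F7C-U+03CE]: range out of order.
console.log(compiles(rangeFrom(0x1f7c, 0x1f7d))); // false
// U+1F7B normalizes to U+03CD, giving [U+03CD-U+1F7C]: still ascending.
console.log(compiles(rangeFrom(0x1f7b, 0x1f7c))); // true
```

This would also explain why the `[ύ-ὼ]` snippet in the original report does not throw: normalization changes its first endpoint, but the range stays in ascending order.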
Huáng Jùnliàng
> In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> ```js
> [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
> ```
I can confirm that if you copy and paste the code example from this webpage, it throws a SyntaxError in other browsers too, and `ώ` becomes U+03CE.
I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org
```js
"\u{1f7d}".codePointAt(0).toString(16);
```
Then copy the output code into the Chrome, Firefox, and Safari consoles. Chrome and Firefox both return `1f7d`, but the Safari console returns `3ce`.
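The console observation can be reproduced without any clipboard. The sketch below builds source text containing a literal U+1F7D (assuming, as described above, that terser's output contains the literal character rather than an escape) and evaluates it before and after NFC normalization. `run` is a hypothetical stand-in for entering the code into a console.

```javascript
// Source text containing a literal U+1F7D, not a \u escape.
const literal = String.fromCodePoint(0x1f7d);
const source = `"${literal}".codePointAt(0).toString(16)`;

// Evaluate a snippet roughly the way a console would.
const run = (src) => new Function(`return ${src};`)();

console.log(run(source)); // 1f7d  (the text as the minifier produced it)
console.log(run(source.normalize("NFC"))); // 3ce  (the same text after NFC normalization)
```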
Note that U+1F7D has a singleton decomposition, which means
```js
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```
and it is included in CompositionExclusions, which means U+1F7D should never exist in a normalized Unicode string.
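The singleton decomposition can be checked with `String.prototype.normalize`. The sketch below shows that U+1F7D decomposes to U+03C9 U+0301 under NFD, recomposes to U+03CE under NFC, and does not survive any of the four normalization forms. `hex` is a small illustrative helper.

```javascript
// List the code points of a string in hex.
const hex = (s) => [...s].map((ch) => ch.codePointAt(0).toString(16));
const omegaOxia = String.fromCodePoint(0x1f7d);

console.log(hex(omegaOxia.normalize("NFD"))); // [ '3c9', '301' ]
console.log(hex(omegaOxia.normalize("NFC"))); // [ '3ce' ]

// Recomposition never returns to U+1F7D, so no normalization form preserves it.
const forms = ["NFC", "NFD", "NFKC", "NFKD"];
console.log(forms.some((f) => omegaOxia.normalize(f).includes(omegaOxia))); // false
```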
Since ECMAScript does not require the source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.
However, I think this bug also affects other features, because a runtime error is thrown only in Safari in https://github.com/babel/website/issues/2254.
I will try to isolate that issue again and post an update; please leave this bug unresolved.
Yusuke Suzuki
(In reply to Huáng Jùnliàng from comment #3)
> > In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
>
> > [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
>
> I can confirm that if you copy & paste the code example from this webpage,
> it throws syntax error on other browsers too and `ώ` becomes U+03CE.
Thanks for your confirmation :)
>
> I can also confirm that the minifier is working properly. You can try the
> following example on https://try.terser.org
>
> ```
> "\u{1f7d}".codePointAt(0).toString(16);
> ```
>
> And copy the output code to Chrome/Firefox/Safari console. Both Chrome and
> Firefox returns `1f7d` but Safari console returns `3ce`.
Interesting! This reproduced on my machine too.
One interesting thing:
1. I copied this from the terser result
2. I created a secret gist from Safari
3. I copied the text from the gist
4. I pasted it into the Chrome / Firefox consoles
Then I got the same normalized result in those browsers. So it sounds like the pasteboard normalizes the content when text is pasted in WebKit?
I should talk to platform folks.
>
> Note that U+1F7D has singleton decompositions, which means
>
> ```
> String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
> ```
>
> and it is included in CompositionExclusions, which means U+1F7D should never
> exist in a normalized Unicode string.
>
> Since ECMAScript does not require the source text to be normalized, I think
> this is a bug of Safari console, which applies normalization on the pasted
> source code. I will file a new bug report about that.
Maybe this is not the console's bug, given that I encountered this normalization even in a textarea.
Rather, it sounds like the pasteboard is doing the normalization. Could this be coming from UIKit...?
> However I think this bug also affects certain feature implementations,
> because runtime error is thrown on Safari exclusively in
> https://github.com/babel/website/issues/2254.
>
> I will try to isolate that issue again and post an update, please leave it
> unresolved.
Sure! Thanks. This helps us a lot :D
Huáng Jùnliàng
> That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit?
> I should talk to platform folks.
That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.
Yusuke Suzuki
(In reply to Huáng Jùnliàng from comment #5)
> > That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit?
> I should talk to platform folks.
>
> That looks reasonable. Since you are much more familiar with the internals
> than I am, can you file a radar on that? I have also prepared a gist at
> https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.
OK, I've talked with Wenson and Alexey about it.
We have a code path that Unicode-normalizes text pasted from the pasteboard into a textarea, to handle cases such as a decomposed HFS+ file name being pasted directly into the textarea. We are still discussing the direction.
BTW, from this information, I guess that the Babel website console has a code path that copies and pastes the Babel code itself. Is that correct?
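To illustrate the trade-off described above (a sketch only, not WebKit's actual code path): normalizing composes the decomposed sequences that HFS+ produces, which is presumably the goal, but it also silently replaces characters like U+1F7D. `codePointsChangedByNFC` is a hypothetical helper that flags code points NFC rewrites even when they stand alone.

```javascript
// A file name in decomposed form, roughly as HFS+ stores it:
// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const fileName = "Cafe\u0301.txt";
// NFC composes the pair into U+00E9, shortening the string by one code unit.
console.log(fileName.length, fileName.normalize("NFC").length); // 9 8

// Hypothetical helper: code points that NFC rewrites even in isolation,
// such as U+1F7D. These are the substitutions that can change code.
function codePointsChangedByNFC(text) {
  const changed = [];
  for (const ch of text) {
    if (ch.normalize("NFC") !== ch) {
      changed.push(ch.codePointAt(0).toString(16));
    }
  }
  return changed;
}

console.log(codePointsChangedByNFC(fileName)); // []
console.log(codePointsChangedByNFC(`"${String.fromCodePoint(0x1f7d)}"`)); // [ '1f7d' ]
```

Both inputs are changed by `normalize("NFC")`, but only the second contains a code point that NFC replaces on its own. A per-code-point check like this is just one way to tell the two cases apart.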
Yusuke Suzuki
After looking into the Babel console code, I suspect that this is normalized when creating a Blob.
@Huáng I guess that the Babel REPL website creates a large Blob that includes all of the Babel source code, and this Blob is created from user JS with something like `new Blob([codeText])`. Is that correct?
Yusuke Suzuki
I'll keep this bug for pasteboard text encoding.
Yusuke Suzuki
I think babel's repl issue is derived from https://bugs.webkit.org/show_bug.cgi?id=213254
Huáng Jùnliàng
@Yusuke Thanks for attaching the see-also issue.
> babel repl website creates large Blob which includes all babel source code
Yes! You are right. I will reply on that issue, since this one should stay focused on the pasteboard.