<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>212725</bug_id>
          
          <creation_ts>2020-06-03 20:37:15 -0700</creation_ts>
          <short_desc>JavaScript code pasted from the pasteboard is Unicode-normalized, which changes the meaning of the code</short_desc>
          <delta_ts>2020-06-19 15:21:01 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Web Inspector</component>
          <version>Safari Technology Preview</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>macOS 10.15</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=213254</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Huáng Jùnliàng">jlhwung</reporter>
          <assigned_to name="Wenson Hsieh">wenson_hsieh</assigned_to>
          <cc>ap</cc>
    
    <cc>fpizlo</cc>
    
    <cc>inspector-bugzilla-changes</cc>
    
    <cc>ljharb</cc>
    
    <cc>msaboff</cc>
    
    <cc>webkit-bug-importer</cc>
    
    <cc>wenson_hsieh</cc>
    
    <cc>ysuzuki</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1658962</commentid>
    <comment_count>0</comment_count>
    <who name="Huáng Jùnliàng">jlhwung</who>
    <bug_when>2020-06-03 20:37:15 -0700</bug_when>
    <thetext>Safari TP 107 throws on the following snippet

```js
// \u1f7c-\u1f7d
var range = &quot;[ὼ-ώ]&quot;;
var regex = new RegExp(range);
```

However, it does not throw when the range above is escaped as ASCII only:

```js
var range = &quot;[\u1f7c-\u1f7d]&quot;;
var regex = new RegExp(range);
```
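If the pattern source is kept ASCII-only, as in the escaped variant above, a normalizing copy and paste has nothing to change. The sketch below shows one way to produce such a source. `toAsciiSource` is a hypothetical helper, not a terser or Babel API. It emits regex-level `\u{...}` escapes, and those require the `u` flag.

```js
// Hypothetical helper (not part of terser or Babel): rewrite every
// non-ASCII code point as a \u{...} escape so the pattern source is plain ASCII.
function toAsciiSource(text) {
  let out = ``;
  // for...of iterates by code point, so astral characters stay intact.
  for (const ch of text) {
    if (/^[\x00-\x7f]$/.test(ch)) {
      out += ch;
    } else {
      out += `\\u{` + ch.codePointAt(0).toString(16) + `}`;
    }
  }
  return out;
}

const pattern = `[` + String.fromCodePoint(0x1f7c) + `-` + String.fromCodePoint(0x1f7d) + `]`;
const escaped = toAsciiSource(pattern);
console.log(escaped); // [\u{1f7c}-\u{1f7d}]

// The \u{...} escape form is only recognized with the u flag.
const regex = new RegExp(escaped, `u`);
console.log(regex.test(String.fromCodePoint(0x1f7d))); // true
console.log(regex.test(String.fromCodePoint(0x3ce))); // false
```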

The problem is not `1f7c` by itself, though: the following snippet, which also uses U+1F7C, works fine.

```js
// \u1f7b-\u1f7c
var range=&quot;[ύ-ὼ]&quot;,regex=new RegExp(range);
```

I don&apos;t think this issue is related to recent Unicode version updates, because U+1F7B through U+1F7D have been available since Unicode 1.1.


**Context**

I found this issue while debugging https://github.com/babel/website/issues/2254: JSC throws on Babel&apos;s ugly identifier-name detection regex (https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js) once it has been minified by terser.

**Related version**
This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1659642</commentid>
    <comment_count>1</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2020-06-05 09:57:35 -0700</bug_when>
    <thetext>&lt;rdar://problem/64033253&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1661016</commentid>
    <comment_count>2</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-09 16:55:44 -0700</bug_when>
    <thetext>(In reply to Huáng Jùnliàng from comment #0)
&gt; Safari TP 107 throws on the following snippet
&gt; 
&gt; ```js
&gt; // \u1f7c-\u1f7d
&gt; var range = &quot;[ὼ-ώ]&quot;;
&gt; var regex = new RegExp(range);
&gt; ```

In the above code, it looks like the range is not `\u1f7c-\u1f7d`.

[...&quot;ὼ-ώ&quot;].map((ch) =&gt; ch.charCodeAt(0).toString(16)) // =&gt; [&quot;1f7c&quot;,&quot;2d&quot;,&quot;3ce&quot;]

Since 0x3ce is smaller than 0x1f7c, the class range is out of order, so this throws a SyntaxError.
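This diagnosis can be checked mechanically. The sketch below prints each code point of the pasted range and confirms that the RegExp constructor rejects it. The helper name `codePointsHex` is illustrative.

```js
// Hypothetical helper: list the code points of a string in hex,
// iterating by code point rather than by UTF-16 code unit.
function codePointsHex(text) {
  return Array.from(text, function (ch) {
    return ch.codePointAt(0).toString(16);
  });
}

// The range as it arrived after pasting: U+1F7C, U+002D, U+03CE.
const pasted = String.fromCodePoint(0x1f7c, 0x2d, 0x3ce);
console.log(codePointsHex(pasted).join(` `)); // 1f7c 2d 3ce

// A class range is only valid when its start does not exceed its end.
let threw = false;
try {
  new RegExp(`[` + pasted + `]`);
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true, because 0x3ce is smaller than 0x1f7c
```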

I&apos;ve confirmed that the above script also throws a SyntaxError in Firefox and Chrome.
So it seems that this is a bug in the minifier.
Can you check the result in the other browsers?
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1661056</commentid>
    <comment_count>3</comment_count>
    <who name="Huáng Jùnliàng">jlhwung</who>
    <bug_when>2020-06-09 19:27:10 -0700</bug_when>
    <thetext>&gt; In the above code, it looks like the range is not `\u1f7c-\u1f7d`.

&gt; [...&quot;ὼ-ώ&quot;].map((ch) =&gt; ch.charCodeAt(0).toString(16)) // =&gt; [&quot;1f7c&quot;,&quot;2d&quot;,&quot;3ce&quot;]

I can confirm that if you copy &amp; paste the code example from this webpage, it throws a SyntaxError in other browsers too, and `ώ` becomes U+03CE.

I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org

```js
&quot;\u{1f7d}&quot;.codePointAt(0).toString(16);
```

Then copy the output code into the Chrome, Firefox, and Safari consoles. Chrome and Firefox both return `1f7d`, but the Safari console returns `3ce`.

Note that U+1F7D has a singleton decomposition, which means

```js
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```

and it is included in CompositionExclusions, which means U+1F7D never appears in a normalized Unicode string.
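Running the identity above under all four normalization forms shows that U+1F7D does not survive any of them. NFC and NFKC map it to U+03CE. NFD and NFKD decompose it further.

```js
const omegaOxia = String.fromCodePoint(0x1f7d); // GREEK SMALL LETTER OMEGA WITH OXIA

// U+1F7D has a singleton canonical decomposition to U+03CE, so no
// normalization form keeps it: NFC/NFKC yield U+03CE, while NFD/NFKD
// decompose further to U+03C9 U+0301.
for (const form of [`NFC`, `NFD`, `NFKC`, `NFKD`]) {
  const hex = Array.from(omegaOxia.normalize(form), function (ch) {
    return ch.codePointAt(0).toString(16);
  });
  console.log(form, hex.join(` `));
}
// NFC 3ce
// NFD 3c9 301
// NFKC 3ce
// NFKD 3c9 301
```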

Since ECMAScript does not require source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.
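You can see that the engine evaluates source text as given, without normalizing it, by building the source programmatically instead of pasting it. The sketch below uses the standard `Function` constructor.

```js
// Build source text with a raw U+1F7D inside a string literal.
// JSON.stringify adds the quotes and leaves U+1F7D unescaped.
const source =
  `return ` + JSON.stringify(String.fromCodePoint(0x1f7d)) + `.codePointAt(0).toString(16);`;

// The engine evaluates the source exactly as given.
console.log(new Function(source)()); // 1f7d

// Normalizing the source first, as a normalizing paste would, changes the result.
console.log(new Function(source.normalize(`NFC`))()); // 3ce
```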

However, I think this bug also affects other features, because in https://github.com/babel/website/issues/2254 a runtime error is thrown only in Safari.

I will try to isolate that issue again and post an update; please leave this bug open.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1661057</commentid>
    <comment_count>4</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-09 19:41:00 -0700</bug_when>
    <thetext>(In reply to Huáng Jùnliàng from comment #3)
&gt; &gt; In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
&gt; 
&gt; &gt; [...&quot;ὼ-ώ&quot;].map((ch) =&gt; ch.charCodeAt(0).toString(16)) // =&gt; [&quot;1f7c&quot;,&quot;2d&quot;,&quot;3ce&quot;]
&gt; 
&gt; I can confirm that if you copy &amp; paste the code example from this webpage,
&gt; it throws syntax error on other browsers too and `ώ` becomes U+03CE.

Thanks for your confirmation :)

&gt; 
&gt; I can also confirm that the minifier is working properly. You can try the
&gt; following example on https://try.terser.org
&gt; 
&gt; ```
&gt; &quot;\u{1f7d}&quot;.codePointAt(0).toString(16);
&gt; ```
&gt; 
&gt; And copy the output code to Chrome/Firefox/Safari console. Both Chrome and
&gt; Firefox returns `1f7d` but Safari console returns `3ce`.

Interesting! This reproduces on my machine too.
One interesting thing: when I

1. copied the terser output,
2. created a secret gist from Safari,
3. copied the text from the gist, and
4. pasted it into the Chrome / Firefox consoles,

I got the same normalized result. That sounds like the pasteboard is normalizing the content when text is pasted in WebKit?
I should talk to platform folks.

&gt; 
&gt; Note that U+1F7D has singleton decompositions, which means
&gt; 
&gt; ```
&gt; String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
&gt; ```
&gt; 
&gt; and it is included in CompositionExclusions, which means U+1F7D should never
&gt; exist in a normalized Unicode string.
&gt; 
&gt; Since ECMAScript does not require the source text to be normalized, I think
&gt; this is a bug of Safari console, which applies normalization on the pasted
&gt; source code. I will file a new bug report about that.

Maybe this is not a console bug, given that I encountered the same normalization in a textarea.
Rather, it sounds like the pasteboard is doing the normalization. Could this be coming from UIKit...?

&gt; However I think this bug also affects certain feature implementations,
&gt; because runtime error is thrown on Safari exclusively in
&gt; https://github.com/babel/website/issues/2254.
&gt; 
&gt; I will try to isolate that issue again and post an update, please leave it
&gt; unresolved.

Sure! Thanks. This helps us a lot :D</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1661060</commentid>
    <comment_count>5</comment_count>
    <who name="Huáng Jùnliàng">jlhwung</who>
    <bug_when>2020-06-09 20:01:22 -0700</bug_when>
    <thetext>&gt; That sounds like the pasteboard is normalizing the content when text is pasted in WebKit?
&gt; I should talk to platform folks.

That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1661071</commentid>
    <comment_count>6</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-09 20:59:00 -0700</bug_when>
    <thetext>(In reply to Huáng Jùnliàng from comment #5)
&gt; &gt; That sounds like the pasteboard is normalizing the content when text is pasted in WebKit?
&gt; &gt; I should talk to platform folks.
&gt; 
&gt; That looks reasonable. Since you are much more familiar with the internals
&gt; than I am, can you file a radar on that? I have also prepared a gist at
&gt; https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

OK, I&apos;ve talked with Wenson and Alexey about it.
We have a code path that Unicode-normalizes text pasted from the pasteboard into a textarea, to alleviate cases such as a filename in HFS+&apos;s fancy decomposed Unicode form being pasted directly into the textarea. We are still discussing the direction.
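NFC illustrates the trade-off. It helpfully recomposes a decomposed filename, but it also rewrites U+1F7D in source code. `changedByNfc` is a hypothetical helper that shows which text such a paste path would alter. The sketch assumes NFC as the normalization form, since the actual form used by the paste path is not stated here.

```js
// Hypothetical helper: would NFC normalization change this text?
function changedByNfc(text) {
  return text !== text.normalize(`NFC`);
}

// HFS+ stores filenames in a decomposed (NFD-like) form: e followed by U+0301.
const decomposedName = `cafe\u0301`;
console.log(changedByNfc(decomposedName)); // true
console.log(decomposedName.normalize(`NFC`) === `caf\u00e9`); // true

// Source text containing U+1F7D is changed as well, but here the new
// code point (U+03CE) alters what the program does.
const code = `[` + String.fromCodePoint(0x1f7c) + `-` + String.fromCodePoint(0x1f7d) + `]`;
console.log(changedByNfc(code)); // true

// Plain ASCII source is never affected.
console.log(changedByNfc(`[\\u{1f7c}-\\u{1f7d}]`)); // false
```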

BTW, given this, I guess the Babel website console has a code path that copies &amp; pastes the Babel code itself, correct?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1664101</commentid>
    <comment_count>7</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-18 21:48:17 -0700</bug_when>
    <thetext>After looking into the Babel console code, I suspect that the text is normalized when the Blob is created.
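One way to test the Blob suspicion in isolation is to round-trip a string containing U+1F7D through a Blob and compare the result. The helper `blobPreservesText` is hypothetical; `Blob` and `Blob.prototype.text()` are standard APIs. The sketch assumes a runtime that provides them (browsers, Node 18+).

```js
// Sketch: does a Blob round-trip return exactly the code points that went in?
async function blobPreservesText(text) {
  const roundTripped = await new Blob([text]).text();
  return roundTripped === text;
}

const sample = `var x = ` + JSON.stringify(String.fromCodePoint(0x1f7d)) + `;`;
blobPreservesText(sample).then(function (ok) {
  // The Blob constructor UTF-8 encodes strings without normalizing them,
  // so a conforming implementation should print true here.
  console.log(ok);
});
```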

@Huáng I guess the Babel REPL website creates a large Blob that includes all of the Babel source code, and this Blob is created from user JS with something like `new Blob([codeText])`. Is that correct?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1664102</commentid>
    <comment_count>8</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-18 21:48:37 -0700</bug_when>
    <thetext>I&apos;ll keep this bug for the pasteboard text encoding issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1664105</commentid>
    <comment_count>9</comment_count>
    <who name="Yusuke Suzuki">ysuzuki</who>
    <bug_when>2020-06-18 21:51:35 -0700</bug_when>
    <thetext>I think Babel&apos;s REPL issue is caused by https://bugs.webkit.org/show_bug.cgi?id=213254</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1664559</commentid>
    <comment_count>10</comment_count>
    <who name="Huáng Jùnliàng">jlhwung</who>
    <bug_when>2020-06-19 15:21:01 -0700</bug_when>
    <thetext>@Yusuke Thanks for attaching the see-also issue.

&gt; babel repl website creates large Blob which includes all babel source code

Yes! You are right. I will reply on that issue, since this one should stay focused on the pasteboard.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>