Safari TP 107 throws on the following snippet:

```js
// \u1f7c-\u1f7d
var range = "[ὼ-ώ]";
var regex = new RegExp(range);
```

However, it does not throw when the range above is escaped as ASCII only:

```js
var range = "[\u1f7c-\u1f7d]";
var regex = new RegExp(range);
```

In case `1f7c` looks like the culprit, note that the following snippet works fine:

```js
// \u1f7b-\u1f7c
var range="[ύ-ὼ]",regex=new RegExp(range);
```

I don't think this issue is related to recent Unicode version updates, because U+1F7B through U+1F7D have been available since Unicode 1.1.

**Context**

Found this issue while debugging https://github.com/babel/website/issues/2254: JSC throws on Babel's ugly identifier name detection regex (https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js) after it has been minified by terser.

**Related version**

This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).
<rdar://problem/64033253>
(In reply to Huáng Jùnliàng from comment #0)
> Safari TP 107 throws on the following snippet
>
> ```js
> // \u1f7c-\u1f7d
> var range = "[ὼ-ώ]";
> var regex = new RegExp(range);
> ```

In the above code, it looks like the range is not `\u1f7c-\u1f7d`:

[..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

Since 0x3ce is smaller than 0x1f7c, the range is out of order and this throws a SyntaxError. I've confirmed that the above script throws a SyntaxError in Firefox and Chrome as well, so it seems that this is a minifier bug. Can you check the result with the other browsers?
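To make the ordering rule concrete, here is a small sketch that builds a character class from two endpoint characters and reports whether `new RegExp` accepts it. `tryRange` is a hypothetical helper introduced only for illustration; the second call uses U+03CE, the code point that `ώ` turned out to have in the snippet above.

```js
// Hypothetical helper: build "[lo-hi]" from two literal characters and
// report whether the RegExp constructor accepts it.
function tryRange(lo, hi) {
  const source = `[${lo}-${hi}]`;
  try {
    new RegExp(source);
    return "ok";
  } catch (e) {
    // Ranges whose start code point is greater than their end throw here.
    return e instanceof SyntaxError ? "SyntaxError" : "other error";
  }
}

console.log(tryRange("\u1f7c", "\u1f7d")); // U+1F7C..U+1F7D is in order
console.log(tryRange("\u1f7c", "\u03ce")); // U+1F7C..U+03CE is out of order
```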
> In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

I can confirm that if you copy and paste the code example from this web page, it throws a SyntaxError on other browsers too, and `ώ` becomes U+03CE.

I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org:

```
"\u{1f7d}".codePointAt(0).toString(16);
```

Then copy the output code into the Chrome/Firefox/Safari console. Both Chrome and Firefox return `1f7d`, but the Safari console returns `3ce`.

Note that U+1F7D has a singleton decomposition, which means

```
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```

and it is included in CompositionExclusions, which means U+1F7D should never appear in a normalized Unicode string.

Since ECMAScript does not require source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.

However, I think this bug also affects other features, because in https://github.com/babel/website/issues/2254 the runtime error is thrown only on Safari. I will try to isolate that issue again and post an update; please leave this bug unresolved.
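A minimal sketch of the normalization behavior described above, runnable in any engine that implements `String.prototype.normalize`. `hex` is a hypothetical helper for printing code points; the values follow from U+1F7D decomposing to U+03CE, which in turn decomposes to U+03C9 U+0301.

```js
// U+1F7D GREEK SMALL LETTER OMEGA WITH OXIA
const oxia = "\u1f7d";

// Hypothetical helper: list the code points of a string in hex.
const hex = (s) => [...s].map((c) => c.codePointAt(0).toString(16));

console.log(hex(oxia.normalize("NFC"))); // composes back to U+03CE, not U+1F7D
console.log(hex(oxia.normalize("NFD"))); // fully decomposed: U+03C9 U+0301

// A source string containing U+1F7D is therefore not normalization-stable,
// so normalizing pasted JavaScript changes what the program computes.
const src = '"\u1f7d".codePointAt(0).toString(16)';
console.log(src.normalize() === src); // false
```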
(In reply to Huáng Jùnliàng from comment #3)
> > In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
>
> > [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
>
> I can confirm that if you copy & paste the code example from this webpage,
> it throws syntax error on other browsers too and `ώ` becomes U+03CE.

Thanks for your confirmation :)

> I can also confirm that the minifier is working properly. You can try the
> following example on https://try.terser.org
>
> ```
> "\u{1f7d}".codePointAt(0).toString(16);
> ```
>
> And copy the output code to Chrome/Firefox/Safari console. Both Chrome and
> Firefox returns `1f7d` but Safari console returns `3ce`.

Interesting! This reproduced on my machine too. One interesting thing is this:

1. I copied the terser result.
2. I created a secret gist from Safari.
3. I copied the text from the gist.
4. I pasted it into the Chrome / Firefox consoles.

Then I got the same normalized results. That sounds like the pasteboard is normalizing the content when pasting text in WebKit? I should talk to platform folks.

> Note that U+1F7D has singleton decompositions, which means
>
> ```
> String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
> ```
>
> and it is included in CompositionExclusions, which means U+1F7D should never
> exist in a normalized Unicode string.
>
> Since ECMAScript does not require the source text to be normalized, I think
> this is a bug of Safari console, which applies normalization on the pasted
> source code. I will file a new bug report about that.

Maybe this is not a console bug, given that I encountered this normalization even in a textarea. Rather, it sounds like the pasteboard is doing the normalization. Possibly this comes from UIKit...?

> However I think this bug also affects certain feature implementations,
> because runtime error is thrown on Safari exclusively in
> https://github.com/babel/website/issues/2254.
>
> I will try to isolate that issue again and post an update, please leave it
> unresolved.

Sure! Thanks. This helps us a lot :D
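To check whether a bundle (for example, the minified identifier regex) contains characters that a normalizing paste would rewrite, one could scan it code point by code point. This sketch assumes the paste path applies NFC, which matches the observed result (U+1F7D became U+03CE while U+1F7C was left intact); `findNormalizationSensitive` is a hypothetical name. It only flags characters that change in isolation, so it catches singletons like U+1F7D but not sequences such as `e` + U+0301 that compose only together.

```js
// Hypothetical helper: report code points that NFC changes even when taken
// on their own (e.g. singleton decompositions such as U+1F7D -> U+03CE).
function findNormalizationSensitive(text) {
  const hits = [];
  let index = 0; // UTF-16 offset of the current code point in `text`
  for (const ch of text) {
    const normalized = ch.normalize("NFC");
    if (normalized !== ch) {
      hits.push({
        index,
        from: ch.codePointAt(0).toString(16),
        to: [...normalized].map((c) => c.codePointAt(0).toString(16)),
      });
    }
    index += ch.length;
  }
  return hits;
}

// The character class from the original report, written with escapes.
console.log(findNormalizationSensitive("[\u1f7c-\u1f7d]"));
```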
> That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit? I should talk to platform folks.

That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.
(In reply to Huáng Jùnliàng from comment #5)
> > That sounds like paste-board is normalizing the content when pasting when pasting a text in WebKit?
> > I should talk to platform folks.
>
> That looks reasonable. Since you are much more familiar with the internals
> than I am, can you file a radar on that? I have also prepared a gist at
> https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

OK, I've talked with Wenson and Alexey about it. We have a code path that Unicode-normalizes text pasted from the pasteboard into a textarea, to alleviate situations such as a filename in HFS+'s decomposed Unicode form being pasted directly into the textarea. We are still discussing the direction.

BTW, given this, I guess that the Babel website console has a code path that copies and pastes the Babel code itself, correct?
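The trade-off described above can be illustrated with a short sketch. It assumes NFC as the normalization form (an assumption, consistent with the U+03CE result seen earlier) and uses `café` as a stand-in for a filename in the decomposed form HFS+ uses (a variant of NFD); it is not WebKit's actual code path.

```js
// A decomposed filename: "cafe" + U+0301 COMBINING ACUTE ACCENT.
const decomposedName = "cafe\u0301";
// The precomposed form users usually expect: U+00E9.
const composedName = "caf\u00e9";

console.log(decomposedName === composedName); // false: different code points
console.log(decomposedName.normalize("NFC") === composedName); // true

// The same NFC pass applied to all pasted text also rewrites U+1F7D,
// which is why pasted JavaScript source can change meaning.
console.log("\u1f7d".normalize("NFC") === "\u03ce"); // true
```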
After looking into the Babel console code, I suspect that the text is normalized when the Blob is created. @Huáng I guess that the Babel REPL website creates a large Blob that includes all of the Babel source code, and that this Blob is created from user JS with something like `new Blob([codeText])`. Is that correct?
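One way to test the Blob hypothesis in isolation is to round-trip a string through a Blob and compare. Per the File API specification, string parts are UTF-8 encoded without any Unicode normalization, so a conforming engine should report that the code points are preserved; if the suspicion above is correct, an affected Safari build would report otherwise. `blobRoundTripPreserves` is a hypothetical helper, and `Blob.prototype.text()` requires a reasonably recent browser or Node.js 18+.

```js
// Hypothetical helper: encode `text` into a Blob, read it back, and check
// that the result is identical (no normalization or other rewriting).
async function blobRoundTripPreserves(text) {
  const blob = new Blob([text], { type: "text/javascript" });
  const roundTripped = await blob.text();
  return roundTripped === text;
}

// A fragment like the minified identifier regex, containing U+1F7D.
blobRoundTripPreserves("[\u1f7c-\u1f7d]").then((ok) => {
  console.log(ok ? "code points preserved" : "text was altered");
});
```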
I'll keep this bug for the pasteboard text normalization issue.
I think Babel's REPL issue stems from https://bugs.webkit.org/show_bug.cgi?id=213254
@Yusuke Thanks for attaching the See Also issue.

> babel repl website creates large Blob which includes all babel source code

Yes! You are right. I will reply on that issue, as this issue should stay focused on the pasteboard.