Summary: | Pasted JavaScript code from pasteboard is unicode-normalized and this changes meaning of code | ||
---|---|---|---|
Product: | WebKit | Reporter: | Huáng Jùnliàng <jlhwung> |
Component: | Web Inspector | Assignee: | Wenson Hsieh <wenson_hsieh> |
Status: | NEW --- | ||
Severity: | Normal | CC: | ap, fpizlo, inspector-bugzilla-changes, ljharb, msaboff, webkit-bug-importer, wenson_hsieh, ysuzuki |
Priority: | P2 | Keywords: | InRadar |
Version: | Safari Technology Preview | ||
Hardware: | Mac | ||
OS: | macOS 10.15 | ||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=213254 |
Description
Huáng Jùnliàng
2020-06-03 20:37:15 PDT
(In reply to Huáng Jùnliàng from comment #0)
> Safari TP 107 throws on the following snippet:
>
> ```js
> // \u1f7c-\u1f7d
> var range = "[ὼ-ώ]";
> var regex = new RegExp(range);
> ```

In the above code, it looks like the range is not `\u1f7c-\u1f7d`.

[..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

Since 0x3ce is smaller than 0x1f7c, this throws a SyntaxError. I've confirmed that the above script throws a SyntaxError in Firefox and Chrome too, so it seems that this is a minifier bug. Can you check the result in the other browsers?

> However, it does not throw when the range above is escaped as ASCII only:
>
> ```js
> var range = "[\u1f7c-\u1f7d]";
> var regex = new RegExp(range);
> ```
>
> While `1f7c` seems random, the following snippet is good:
>
> ```js
> // \u1f7b-\u1f7c
> var range="[ύ-ὼ]",regex=new RegExp(range);
> ```
>
> I don't think this issue can be related to recent Unicode version updates,
> because \u1f7b - \u1f7d have been available since Unicode 1.1.
>
> **Context**
>
> Found this issue when debugging
> https://github.com/babel/website/issues/2254: JSC throws on Babel's ugly
> identifier name detection regex
> (https://github.com/babel/babel/blob/master/packages/babel-helper-validator-identifier/src/identifier.js)
> after it is minified by terser.
>
> **Related version**
> This issue is also reproducible on Safari Version 13.1.1 (15609.2.9.1.2).

> In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]

I can confirm that if you copy & paste the code example from this webpage, it throws a syntax error on other browsers too, and `ώ` becomes U+03CE.

I can also confirm that the minifier is working properly. You can try the following example on https://try.terser.org

```
"\u{1f7d}".codePointAt(0).toString(16);
```

And copy the output code to the Chrome/Firefox/Safari console.
Both Chrome and Firefox return `1f7d`, but the Safari console returns `3ce`.

Note that U+1F7D has a singleton decomposition, which means

```
String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
```

and it is included in CompositionExclusions, which means U+1F7D should never exist in a normalized Unicode string.

Since ECMAScript does not require the source text to be normalized, I think this is a bug in the Safari console, which applies normalization to the pasted source code. I will file a new bug report about that.

However, I think this bug also affects certain feature implementations, because a runtime error is thrown exclusively on Safari in https://github.com/babel/website/issues/2254.

I will try to isolate that issue again and post an update; please leave it unresolved.

(In reply to Huáng Jùnliàng from comment #3)
> > In the above code, it looks like the range is not `\u1f7c-\u1f7d`.
> >
> > [..."ὼ-ώ"].map((ch) => ch.charCodeAt(0).toString(16)) // => ["1f7c","2d","3ce"]
>
> I can confirm that if you copy & paste the code example from this webpage,
> it throws a syntax error on other browsers too, and `ώ` becomes U+03CE.

Thanks for your confirmation :)

> I can also confirm that the minifier is working properly. You can try the
> following example on https://try.terser.org
>
> ```
> "\u{1f7d}".codePointAt(0).toString(16);
> ```
>
> And copy the output code to the Chrome/Firefox/Safari console. Both Chrome and
> Firefox return `1f7d`, but the Safari console returns `3ce`.

Interesting! This reproduced on my machine too. One interesting thing:

1. I copied this from the terser result.
2. I created a secret gist from Safari.
3. I copied the text from the gist.
4. I pasted it into the Chrome / Firefox consoles.

Then I got the same normalized results. That sounds like the pasteboard is normalizing the content when pasting text in WebKit? I should talk to platform folks.
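The failure mode can be reproduced without a pasteboard by simulating the normalization step with `String.prototype.normalize`. This is a minimal sketch, assuming the pasted text ends up in NFC (the default form of `normalize()`, and consistent with the `3ce` result reported above); `original`, `normalized` and `codePoints` are illustrative names, not part of WebKit or Babel.

```javascript
// Escaped form of the character class from comment #0: U+1F7C, "-", U+1F7D.
const original = "[\u1f7c-\u1f7d]";

// Assumption: the pasteboard applies NFC, simulated here with normalize().
const normalized = original.normalize("NFC");

// Lists code points in hex, like the snippet used earlier in the thread.
const codePoints = (s) => [...s].map((ch) => ch.codePointAt(0).toString(16));

console.log(codePoints(original));   // ["5b", "1f7c", "2d", "1f7d", "5d"]
console.log(codePoints(normalized)); // ["5b", "1f7c", "2d", "3ce", "5d"]

// The original range is in order, so this compiles fine...
new RegExp(original);

// ...but after normalization the range runs from 0x1f7c down to 0x3ce,
// which is out of order and raises a SyntaxError.
try {
  new RegExp(normalized);
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

Only U+1F7D changes: U+1F7C is a primary composite, so NFC decomposes it and recomposes it back to itself.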
> Note that U+1F7D has a singleton decomposition, which means
>
> ```
> String.fromCodePoint(0x1f7d).normalize() === String.fromCodePoint(0x3ce)
> ```
>
> and it is included in CompositionExclusions, which means U+1F7D should never
> exist in a normalized Unicode string.
>
> Since ECMAScript does not require the source text to be normalized, I think
> this is a bug in the Safari console, which applies normalization to the pasted
> source code. I will file a new bug report about that.

Maybe this is not the console's bug, given that I encountered this normalization even in a textarea. Rather, it sounds like the pasteboard is doing the normalization. It is possible that this comes from UIKit...?

> However, I think this bug also affects certain feature implementations,
> because a runtime error is thrown exclusively on Safari in
> https://github.com/babel/website/issues/2254.
>
> I will try to isolate that issue again and post an update; please leave it
> unresolved.

Sure! Thanks. This helps us a lot :D

> That sounds like the pasteboard is normalizing the content when pasting text in WebKit? I should talk to platform folks.

That looks reasonable. Since you are much more familiar with the internals than I am, can you file a radar on that? I have also prepared a gist at https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

(In reply to Huáng Jùnliàng from comment #5)
> > That sounds like the pasteboard is normalizing the content when pasting text in WebKit?
> > I should talk to platform folks.
>
> That looks reasonable. Since you are much more familiar with the internals
> than I am, can you file a radar on that? I have also prepared a gist at
> https://gist.github.com/JLHwung/64fed33e2dbb3da7a18566fab26f045f.

OK, I've talked with Wenson and Alexey about it.
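Comment #0 noted that the regex does not throw when the range is written with ASCII-only escapes. Because NFC leaves ASCII text unchanged, source written that way survives a normalizing pasteboard. Below is a minimal sketch of that workaround for string values, assuming the value is embedded as a string literal; `toAsciiStringLiteral` is a hypothetical helper and does not handle identifiers or other source constructs. Minifiers can do something similar; terser, for example, has an `ascii_only` output option.

```javascript
// Hypothetical helper (not part of any library): serialize a string value
// as an ASCII-only JavaScript string literal.
function toAsciiStringLiteral(value) {
  // JSON.stringify produces a valid JS string literal and already escapes
  // quotes, backslashes, control characters and lone surrogates.
  const json = JSON.stringify(value);
  // Escape every remaining non-ASCII UTF-16 code unit as \uXXXX, so astral
  // characters become a surrogate pair of escapes.
  return json.replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// The character class from comment #0: U+1F7C, "-", U+1F7D.
const range = "[\u1f7c-\u1f7d]";
const literal = toAsciiStringLiteral(range);

console.log(literal); // "[\u1f7c-\u1f7d]"

// ASCII text is unchanged by NFC, so pasting it cannot alter its meaning...
console.log(literal.normalize("NFC") === literal); // true
// ...and parsing it back recovers the original code points exactly.
console.log(JSON.parse(literal) === range); // true
```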
We have a code path that Unicode-normalizes text pasted from the pasteboard into a textarea, to alleviate situations like a filename in HFS+'s decomposed Unicode form being pasted directly into the textarea. We are still discussing the direction.

BTW, from this information, I guess that the Babel website console has a code path which copies & pastes the Babel code itself, correct?

After looking into the Babel console code, I suspect that this is normalized when creating a Blob. @Huáng I guess that the Babel REPL website creates a large Blob which includes all of Babel's source code, and that this Blob is created from user JS with something like `new Blob([codeText])`. Is that correct?

I'll keep this bug for the pasteboard text encoding. I think Babel's REPL issue comes from https://bugs.webkit.org/show_bug.cgi?id=213254

@Yusuke Thanks for attaching the See Also issue.
> the Babel REPL website creates a large Blob which includes all of Babel's source code

Yes! You are right. I will reply on that issue, as this issue should focus on the pasteboard.
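To tell the pasteboard problem apart from the Blob hypothesis, one can check whether a Blob round trip preserves code points. This is a minimal sketch, assuming a runtime with a global `Blob` that supports `text()` (current browsers, Node.js 18+); `blobPreservesText` and `sample` are hypothetical names. The File API specification only UTF-8-encodes string parts and does not normalize them, so a conforming engine should report `true` for any input without lone surrogates (those are replaced with U+FFFD).

```javascript
// Resolves to true when new Blob([codeText]) gives back exactly the same
// string. Assumes a global Blob with .text() (browsers, Node.js 18+).
async function blobPreservesText(codeText) {
  const roundTripped = await new Blob([codeText]).text();
  return roundTripped === codeText;
}

// Source text containing U+1F7D, the code point NFC rewrites to U+03CE.
const sample = 'var range = "[\u1f7c-\u1f7d]";';

blobPreservesText(sample).then((ok) => {
  // A conforming engine should log true; false would suggest the text is
  // being rewritten (e.g. normalized) during Blob construction or reading.
  console.log(ok);
});
```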