Bug 290567
| Field | Value |
|---|---|
| Summary | RegExp Unicode JIT treats escaped surrogate followed by literal surrogate as surrogate pair |
| Product | WebKit |
| Component | JavaScriptCore |
| Status | RESOLVED FIXED |
| Severity | Normal |
| Priority | P2 |
| Version | WebKit Local Build |
| Hardware | Mac (Apple Silicon) |
| OS | macOS 15 |
| Reporter | Ben Grant <ben> |
| Assignee | Michael Saboff <msaboff> |
| CC | keith_miller, mark.lam, msaboff, webkit-bug-importer, ysuzuki |
| Keywords | InRadar |
Ben Grant
To reproduce, evaluate either of the following examples:
> new RegExp("\\ud800\udc00+", "u").exec("\u{10000}\u{10000}")
> new RegExp("\\uD83D\uDC38", "u").exec("\u{1F438}")
(the first is from https://github.com/oven-sh/bun/issues/18540, and the second is from https://github.com/tc39/test262/blob/ce7e72d2107f99d165f4259571f10aa75753d997/test/staging/sm/RegExp/unicode-raw.js#L56)
These should both return null. The reason is that, in Unicode mode, a \u-escaped surrogate followed by a literal surrogate should not form a pair. So these regular expressions are trying to match an unpaired high surrogate followed by an unpaired low surrogate, which is impossible: two such adjacent code units in the input would themselves form a pair.
But by default, JavaScriptCore does match the first code point of the input string in both cases:
> >>> new RegExp("\\ud800\udc00+", "u").exec("\u{10000}\u{10000}")
> [𐀀]
> >>> new RegExp("\\uD83D\uDC38", "u").exec("\u{1F438}")
> [🐸]
I'm using a local build from 292785@main. The correct behavior is observed in SpiderMonkey and V8, and in JavaScriptCore with --useRegExpJIT=0.
This was originally reported to Deno at https://github.com/denoland/deno/issues/28587, but the Deno team believes (and I agree given the test262 coverage) that V8 has the more correct behavior here.
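For illustration, here is a minimal sketch contrasting the three ways a surrogate pair can be spelled in a Unicode-mode pattern. The variable names are only illustrative; the third pattern is the mixed form from this report, and the comments describe the behavior the report expects from a conforming engine, not what an affected JavaScriptCore build prints.

```javascript
// Input is the single code point U+10000 (code units D800 DC00).
const input = "\u{10000}";

// Both halves escaped in the pattern source: \uD800\uDC00 denotes one code point.
const bothEscaped = new RegExp("\\ud800\\udc00", "u");

// Both halves literal in the pattern source: also read as one code point.
const bothLiteral = new RegExp("\ud800\udc00", "u");

// Escaped lead surrogate followed by a literal trail surrogate (the case in
// this report): the two should stay unpaired, so nothing can match.
const mixed = new RegExp("\\ud800\udc00", "u");

const results = {
  bothEscaped: bothEscaped.exec(input) !== null,
  bothLiteral: bothLiteral.exec(input) !== null,
  mixed: mixed.exec(input) !== null,
};

// Expected per this report: the first two match and `mixed` does not.
console.log(results);
```

On an affected build, `mixed` would be reported as matching unless the RegExp JIT is disabled.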
Radar WebKit Bug Importer
<rdar://problem/148548273>
Michael Saboff
Pull request: https://github.com/WebKit/WebKit/pull/44450
EWS
Committed 294066@main (ab6288d351f1): <https://commits.webkit.org/294066@main>
Reviewed commits have been landed. Closing PR #44450 and removing active labels.