WebKit Bugzilla
RESOLVED FIXED
291923
[Yarr] Improve reading of Surrogate Pairs in Unicode Regular Expressions
https://bugs.webkit.org/show_bug.cgi?id=291923
Michael Saboff
Reported
2025-04-22 16:54:07 PDT
Currently we create a helper to read possible surrogate pairs. That helper reads a single 16-bit character and checks whether it is a surrogate. If it is a leading surrogate, the helper reads a second character to see whether it is a trailing surrogate. If so, we construct a non-BMP character and return it. That helper is generated at the end of every RegExp's JIT'ed code. There are a few optimizations we can make:
1. If possible, we can load 32 bits and check whether the two characters that were read form a valid surrogate pair. If so, we convert the pair and return.
2. We can reduce the number of branches in the hot paths.
3. We can turn the helper into a thunk that is created when needed, thus reducing the JIT footprint when multiple Unicode RegExps have been compiled.
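As a rough illustration of optimizations 1 and 2, here is a minimal C++ sketch of the 32-bit load idea. It is not the Yarr JIT code: the function name `readCodePoint` and its signature are hypothetical, and it assumes a little-endian target so that the first code unit lands in the low 16 bits of the loaded word. One masked compare checks both halves of the pair, so the common paths take a single branch on the surrogate test.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not taken from WebKit.
// Decodes one code point from UTF-16 input and advances `index` by 1 or 2 code units.
// Assumes a little-endian target: the first code unit occupies the low 16 bits.
inline char32_t readCodePoint(const char16_t* input, std::size_t length, std::size_t& index)
{
    // Fast path: at least two code units remain, so fetch both with one 32-bit load.
    if (index + 1 < length) {
        std::uint32_t pair;
        std::memcpy(&pair, input + index, sizeof(pair)); // safe for unaligned addresses
        std::uint32_t lead = pair & 0xFFFF;
        std::uint32_t trail = pair >> 16;

        // One masked compare tests both halves at once:
        // lead in 0xD800..0xDBFF and trail in 0xDC00..0xDFFF.
        if ((pair & 0xFC00FC00u) == 0xDC00D800u) {
            index += 2;
            return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
        }

        // Not a valid pair: return the first code unit unchanged
        // (a BMP character or an unpaired surrogate).
        index += 1;
        return lead;
    }

    // Slow path: only one code unit is left, so no pair is possible.
    return input[index++];
}
```

In this sketch an unpaired surrogate is returned as-is, which is one plausible policy; the actual helper may handle that case differently.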
Radar WebKit Bug Importer
Comment 1
2025-04-22 16:54:40 PDT
<rdar://problem/149811541>
Michael Saboff
Comment 2
2025-04-22 17:48:43 PDT
Pull request:
https://github.com/WebKit/WebKit/pull/44394
EWS
Comment 3
2025-04-23 22:21:24 PDT
Committed 294046@main (aaee2a6f166a): <https://commits.webkit.org/294046@main>

Reviewed commits have been landed. Closing PR #44394 and removing active labels.