Bug 291923
Status: RESOLVED FIXED
[Yarr] Improve reading of Surrogate Pairs in Unicode Regular Expressions
https://bugs.webkit.org/show_bug.cgi?id=291923
Michael Saboff
Reported 2025-04-22 16:54:07 PDT
Currently we create a helper to read possible surrogate pairs. That helper reads a single 16-bit character and checks whether it is a surrogate. If it is a leading surrogate, the helper reads a second character to see whether it is a trailing surrogate; if so, it constructs a non-BMP character and returns it. This helper is generated at the end of every JIT'ed RegExp. There are a few optimizations we can make:
1. Where possible, load 32 bits and check whether the two characters read form a valid surrogate pair. If so, convert the pair and return.
2. Reduce the number of branches in the hot paths.
3. Turn the helper into a thunk that is created when needed, reducing the JIT footprint when multiple Unicode RegExps have been compiled.
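The surrogate-pair reading described above can be modeled in portable C++. This is a hypothetical sketch, not JavaScriptCore's code: the real helper is emitted as machine code by the Yarr JIT, and the names `readCharacterBaseline`, `readCharacterWide`, and `combineSurrogatePair` are made up for illustration. The first function mirrors the current helper (one 16-bit load, then a conditional second load). The second models optimization 1: it loads both code units as one 32-bit value and tests both halves with a single mask-and-compare, assuming at least two code units remain and a little-endian layout, and otherwise falls back to the baseline read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// UTF-16 surrogate ranges: leading D800-DBFF, trailing DC00-DFFF.
constexpr bool isLeadingSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
constexpr bool isTrailingSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Combine a valid pair into a non-BMP code point (U+10000..U+10FFFF).
constexpr char32_t combineSurrogatePair(char16_t lead, char16_t trail)
{
    return static_cast<char32_t>(0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00));
}

// Baseline (hypothetical model of the existing helper): one 16-bit load,
// then a second load only when the first unit is a leading surrogate.
// Sets consumed to the number of code units read (1 or 2).
char32_t readCharacterBaseline(const char16_t* input, size_t index, size_t length, unsigned& consumed)
{
    assert(index < length);
    char16_t c = input[index];
    consumed = 1;
    if (isLeadingSurrogate(c) && index + 1 < length) {
        char16_t next = input[index + 1];
        if (isTrailingSurrogate(next)) {
            consumed = 2;
            return combineSurrogatePair(c, next);
        }
    }
    return c; // BMP character or lone surrogate, returned as-is
}

// Detect byte order at runtime; a JIT would simply know its target.
inline bool isLittleEndian()
{
    const uint16_t probe = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

// Wide variant (hypothetical model of optimization 1): one 32-bit load and
// one compare decide whether the two units form a valid surrogate pair.
char32_t readCharacterWide(const char16_t* input, size_t index, size_t length, unsigned& consumed)
{
    assert(index < length);
    if (!isLittleEndian())
        return readCharacterBaseline(input, index, length, consumed);
    if (index + 1 < length) {
        uint32_t pair;
        std::memcpy(&pair, input + index, sizeof(pair)); // low half = input[index]
        // Leading unit in the low half, trailing unit in the high half.
        if ((pair & 0xFC00FC00u) == 0xDC00D800u) {
            consumed = 2;
            return combineSurrogatePair(static_cast<char16_t>(pair & 0xFFFF),
                                        static_cast<char16_t>(pair >> 16));
        }
    }
    // Not a valid pair: the first unit stands alone.
    consumed = 1;
    return input[index];
}
```

In the common valid-pair case, the wide version takes a single compare-and-branch instead of two dependent checks, which is the kind of branch reduction optimization 2 is after. Lone surrogates are returned unchanged with a width of 1, matching how a Unicode-mode RegExp treats an unpaired surrogate as its own code point.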
Radar WebKit Bug Importer
Comment 1 2025-04-22 16:54:40 PDT
Michael Saboff
Comment 2 2025-04-22 17:48:43 PDT
EWS
Comment 3 2025-04-23 22:21:24 PDT
Committed 294046@main (aaee2a6f166a): <https://commits.webkit.org/294046@main> Reviewed commits have been landed. Closing PR #44394 and removing active labels.