Bug 291923
| Summary: | [Yarr] Improve reading of Surrogate Pairs in Unicode Regular Expressions | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Michael Saboff <msaboff> |
| Component: | New Bugs | Assignee: | Michael Saboff <msaboff> |
| Status: | RESOLVED FIXED | ||
| Severity: | Normal | CC: | webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Other | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
Michael Saboff
Currently we create a helper to read possible surrogate pairs. That helper reads a single 16-bit character and checks whether it is a surrogate. If it is a leading surrogate, the helper reads a second character to see whether it is a trailing surrogate. If so, we construct a non-BMP character and return it. That helper is generated at the end of every RegExp's JIT'ed code.
There are a few optimizations we can make.
1. If possible, we can load 32 bits and check whether the two characters that were read form a valid surrogate pair. If so, we convert the pair and return it.
2. We can reduce the number of branches in the hot paths.
3. We can turn the helper into a thunk that is created when needed, thus reducing the JIT footprint when multiple Unicode RegExps have been compiled.
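As a rough illustration of optimizations 1 and 2, here is a minimal C++ sketch of the idea. It is not the Yarr JIT code: `readCodePoint` and its parameters are hypothetical names chosen for this example. It combines two 16-bit units into one 32-bit value (what a single 32-bit load would yield on a little-endian target) and validates both surrogates with one masked compare instead of separate range checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not taken from WebKit.
// Reads one code point from UTF-16 data at `p`, where `remaining` is the
// number of 16-bit units available (must be at least 1). `consumed` is set
// to the number of units used (1 or 2).
inline char32_t readCodePoint(const char16_t* p, size_t remaining, size_t& consumed)
{
    assert(remaining >= 1);
    if (remaining >= 2) {
        // Model of a 32-bit load: first unit in the low half, second in the high half.
        uint32_t pair = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 16);

        // A lead surrogate has top six bits 110110 (0xD800..0xDBFF) and a
        // trail surrogate has top six bits 110111 (0xDC00..0xDFFF), so a
        // single masked compare checks both units at once.
        if ((pair & 0xFC00FC00u) == 0xDC00D800u) {
            uint32_t lead = pair & 0xFFFFu;
            uint32_t trail = pair >> 16;
            consumed = 2;
            return 0x10000u + ((lead - 0xD800u) << 10) + (trail - 0xDC00u);
        }

        // Not a valid pair: return the first unit unchanged.
        consumed = 1;
        return pair & 0xFFFFu;
    }

    // Only one unit left, so a 32-bit read is not possible.
    consumed = 1;
    return p[0];
}
```

With this shape, the common BMP case and the valid-pair case each take one compare and one branch. A lone or misordered surrogate falls through and is returned as a single unit.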
Radar WebKit Bug Importer
<rdar://problem/149811541>
Michael Saboff
Pull request: https://github.com/WebKit/WebKit/pull/44394
EWS
Committed 294046@main (aaee2a6f166a): <https://commits.webkit.org/294046@main>
Reviewed commits have been landed. Closing PR #44394 and removing active labels.