WebKit Bugzilla
RESOLVED FIXED
289567
[Yarr] Improve processing of adjacent or near adjacent single characters
https://bugs.webkit.org/show_bug.cgi?id=289567
Michael Saboff
Reported
2025-03-11 14:39:38 PDT
There is currently an optimization in the Yarr JIT that processes adjacent single character atoms together. For example, /abcd/ is processed as:

    1:Term PatternCharacter checked-offset:(4) 'a'
        <44> 0x12f018b6c: sub x17, x0, #4
        <48> 0x12f018b70: ldr w17, [x17, x1]
        <52> 0x12f018b74: movz w16, #0x6261
        <56> 0x12f018b78: movk w16, #0x6463, lsl #16 -> 0x64636261
        <60> 0x12f018b7c: cmp w17, w16
        <64> 0x12f018b80: b.ne 0x12f018b90 -> <80>
    2:Term PatternCharacter checked-offset:(4) 'b' already handled
    3:Term PatternCharacter checked-offset:(4) 'c' already handled
    4:Term PatternCharacter checked-offset:(4) 'd' already handled

But if there is something in between, characters that are nearly adjacent are checked individually. For example, /a\dbc/ is currently processed as:

    1:Term PatternCharacter checked-offset:(4) 'a'
        <84> 0x12f015054: sub x17, x0, #4
        <88> 0x12f015058: ldrb w6, [x17, x1]
        <92> 0x12f01505c: cmp w6, #97
        <96> 0x12f015060: b.ne 0x12f015098 -> <152>
    2:Term PatternCharacter checked-offset:(4) 'b'
        <100> 0x12f015064: sub x17, x0, #2
        <104> 0x12f015068: ldrh w6, [x17, x1]
        <108> 0x12f01506c: movz w16, #0x6362 -> 25442
        <112> 0x12f015070: cmp w6, w16
        <116> 0x12f015074: b.ne 0x12f015098 -> <152>
    3:Term PatternCharacter checked-offset:(4) 'c' already handled
    4:Term PatternCharacterClass checked-offset:(4) <digits>
        ...

Note that we have an existing optimization that moves the matching of character classes to after the single character atoms.
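As a rough illustration of what the /abcd/ sequence above computes, the following Python sketch packs adjacent 8-bit characters into one little-endian constant and matches them with a single 32-bit load and compare. It is not the Yarr implementation: the helper names `pack_chars` and `match_abcd` are hypothetical, and the sketch assumes an 8-bit (Latin-1) subject string and a little-endian load, which is what the constants in the disassembly suggest.

```python
import struct


def pack_chars(chars: str) -> int:
    """Pack 8-bit characters into one little-endian integer constant.

    Hypothetical helper: the first character lands in the lowest byte,
    mirroring how a little-endian load sees the bytes in memory.
    """
    value = 0
    for i, ch in enumerate(chars):
        value |= ord(ch) << (8 * i)
    return value


def match_abcd(subject: bytes, index: int) -> bool:
    """Match 'abcd' at index with one 32-bit load and one compare."""
    if index < 0 or index + 4 > len(subject):
        return False  # the JIT guarantees this via its checked offset
    (word,) = struct.unpack_from("<I", subject, index)  # like `ldr w17, [...]`
    return word == pack_chars("abcd")                   # like `cmp w17, w16`
```

The constants produced by `pack_chars` line up with the ones in the listings: `pack_chars("abcd")` is 0x64636261 and `pack_chars("bc")` is 0x6362.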
For the second case, we could load 4 characters and mask out the character class character, like:

    1:Term PatternCharacter checked-offset:(4) 'a'
        <84> 0x12f014f54: sub x17, x0, #4
        <88> 0x12f014f58: ldr w6, [x17, x1]
        <92> 0x12f014f5c: and w6, w6, #0xffff00ff
        <96> 0x12f014f60: movz w16, #0x61
        <100> 0x12f014f64: movk w16, #0x6362, lsl #16 -> 0x63620061
        <104> 0x12f014f68: cmp w6, w16
        <108> 0x12f014f6c: b.ne 0x12f014f90 -> <144>
    2:Term PatternCharacter checked-offset:(4) 'b' already handled
    3:Term PatternCharacter checked-offset:(4) 'c' already handled
    4:Term PatternCharacterClass checked-offset:(4) <digits>
        ...

This eliminates a load, a compare, and a branch. The more general case is to use larger load, compare, and branch code sequences for single character atoms, including patterns that have single-character-wide character class atoms mixed in.
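The masked compare proposed for /a\dbc/ can be sketched in Python as follows. This is only a model of the idea under stated assumptions, not the code from the patch: `build_mask_and_value` and `match_a_digit_bc` are hypothetical names, the subject is assumed to be 8-bit (Latin-1), and the load is assumed to be little-endian. A `None` entry stands for a position occupied by a character class, which is masked out of the wide compare and checked separately afterwards.

```python
import struct


def build_mask_and_value(atoms):
    """Build the AND mask and compare constant for up to 4 one-byte atoms.

    atoms: list where each entry is a single character, or None for a
    position held by a character class (hypothetical representation).
    """
    mask = 0
    value = 0
    for i, atom in enumerate(atoms):
        if atom is not None:
            mask |= 0xFF << (8 * i)        # keep this byte in the compare
            value |= ord(atom) << (8 * i)  # expected byte at this position
    return mask, value


# /a\dbc/: the digit class sits at byte 1 and is masked out.
MASK, VALUE = build_mask_and_value(["a", None, "b", "c"])


def match_a_digit_bc(subject: bytes, index: int) -> bool:
    """Match /a\\dbc/ at index: one masked 32-bit compare, then the class."""
    if index < 0 or index + 4 > len(subject):
        return False
    (word,) = struct.unpack_from("<I", subject, index)  # like `ldr w6, [...]`
    if (word & MASK) != VALUE:                          # like `and` + `cmp`
        return False
    return 0x30 <= subject[index + 1] <= 0x39           # \d checked afterwards
```

For this pattern the helper yields `MASK == 0xFFFF00FF` and `VALUE == 0x63620061`, the same constants that appear in the proposed instruction sequence.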
Radar WebKit Bug Importer
Comment 1
2025-03-11 14:40:17 PDT
<rdar://problem/146795365>
Michael Saboff
Comment 2
2025-03-11 15:28:33 PDT
Pull request:
https://github.com/WebKit/WebKit/pull/42284
EWS
Comment 3
2025-03-12 01:38:01 PDT
Committed 292003@main (1e14cbbdc2f5): <https://commits.webkit.org/292003@main>
Reviewed commits have been landed. Closing PR #42284 and removing active labels.
EWS
Comment 4
2025-03-31 12:40:08 PDT
Committed 289651.362@safari-7621-branch (b78009996aa0): <https://commits.webkit.org/289651.362@safari-7621-branch>
Reviewed commits have been landed. Closing PR #2897 and removing active labels.