Bug 289567

Summary: [Yarr] Improve processing of adjacent or near adjacent single characters
Product: WebKit
Reporter: Michael Saboff <msaboff>
Component: JavaScriptCore
Assignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED
Severity: Normal
CC: webkit-bug-importer
Priority: P2
Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   

Michael Saboff
Reported 2025-03-11 14:39:38 PDT
There currently is an optimization in the Yarr JIT where we process adjacent single character atoms. For example, /abcd/ is processed as:

     1:Term PatternCharacter checked-offset:(4) 'a'
          <44> 0x12f018b6c:    sub      x17, x0, #4
          <48> 0x12f018b70:    ldr      w17, [x17, x1]
          <52> 0x12f018b74:    movz     w16, #0x6261
          <56> 0x12f018b78:    movk     w16, #0x6463, lsl #16 -> 0x64636261
          <60> 0x12f018b7c:    cmp      w17, w16
          <64> 0x12f018b80:    b.ne     0x12f018b90 -> <80>
     2:Term PatternCharacter checked-offset:(4) 'b' already handled
     3:Term PatternCharacter checked-offset:(4) 'c' already handled
     4:Term PatternCharacter checked-offset:(4) 'd' already handled

But if there is something in between, we can end up checking characters that are nearly adjacent individually. For example, /a\dbc/ is currently processed as:

     1:Term PatternCharacter checked-offset:(4) 'a'
          <84> 0x12f015054:    sub      x17, x0, #4
          <88> 0x12f015058:    ldrb     w6, [x17, x1]
          <92> 0x12f01505c:    cmp      w6, #97
          <96> 0x12f015060:    b.ne     0x12f015098 -> <152>
     2:Term PatternCharacter checked-offset:(4) 'b'
         <100> 0x12f015064:    sub      x17, x0, #2
         <104> 0x12f015068:    ldrh     w6, [x17, x1]
         <108> 0x12f01506c:    movz     w16, #0x6362 -> 25442
         <112> 0x12f015070:    cmp      w6, w16
         <116> 0x12f015074:    b.ne     0x12f015098 -> <152>
     3:Term PatternCharacter checked-offset:(4) 'c' already handled
     4:Term PatternCharacterClass checked-offset:(4) <digits>
     ...

Note that we have an existing optimization to move the matching of character classes to after single character atoms.
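The constants in the listings above follow from packing adjacent characters in little-endian order. The following is only a minimal sketch of that arithmetic, assuming an 8-bit (Latin-1) subject string; pack_chars is a hypothetical helper for illustration and is not a Yarr function.

```python
def pack_chars(chars: str) -> int:
    """Return the integer a little-endian load of these 8-bit characters yields.

    Hypothetical helper: it models the immediate that the JIT compares
    against after an ldrb (1 char), ldrh (2 chars) or ldr (4 chars).
    """
    data = chars.encode("latin-1")
    if len(data) not in (1, 2, 4):
        raise ValueError("expected 1, 2 or 4 characters")
    # The first character lands in the lowest byte of the register.
    return int.from_bytes(data, "little")


print(hex(pack_chars("abcd")))  # 0x64636261, the movz/movk pair for /abcd/
print(pack_chars("bc"))         # 25442 (0x6362), the ldrh compare for 'b','c'
print(pack_chars("a"))          # 97, the ldrb compare for 'a'
```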
For the second case, we could load 4 characters and mask out the character class character, like so:

     1:Term PatternCharacter checked-offset:(4) 'a'
          <84> 0x12f014f54:    sub      x17, x0, #4
          <88> 0x12f014f58:    ldr      w6, [x17, x1]
          <92> 0x12f014f5c:    and      w6, w6, #0xffff00ff
          <96> 0x12f014f60:    movz     w16, #0x61
         <100> 0x12f014f64:    movk     w16, #0x6362, lsl #16 -> 0x63620061
         <104> 0x12f014f68:    cmp      w6, w16
         <108> 0x12f014f6c:    b.ne     0x12f014f90 -> <144>
     2:Term PatternCharacter checked-offset:(4) 'b' already handled
     3:Term PatternCharacter checked-offset:(4) 'c' already handled
     4:Term PatternCharacterClass checked-offset:(4) <digits>
     ...

This eliminates a load, compare and branch. The more general case is to use larger load, compare and branch code sequences for single character atoms, including patterns that mix in character class atoms that are a single character wide.
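The mask and compare constant for the /a\dbc/ case can be derived mechanically. The following is only a rough sketch under stated assumptions: an 8-bit (Latin-1) subject string, little-endian loads, and a list with one entry per character position where None stands for a single-character-wide character class that is matched separately. The names mask_and_expected and literals_match are hypothetical and do not come from the Yarr sources.

```python
from typing import Optional, Sequence, Tuple


def mask_and_expected(atoms: Sequence[Optional[str]]) -> Tuple[int, int]:
    """Build the AND mask and compare constant for one wide load.

    atoms holds one entry per 8-bit character position: a one-character
    string for a literal, or None for a position covered by a character
    class (assumption: that class is checked by separate code).
    """
    mask = 0
    expected = 0
    for index, atom in enumerate(atoms):
        if atom is None:
            continue  # leave this byte zero in both mask and constant
        shift = 8 * index  # little-endian: first character is the low byte
        mask |= 0xFF << shift
        expected |= ord(atom) << shift
    return mask, expected


def literals_match(subject: bytes, offset: int,
                   atoms: Sequence[Optional[str]]) -> bool:
    """Emulate the load / and / cmp sequence on a byte string."""
    width = len(atoms)
    chunk = subject[offset:offset + width]
    if len(chunk) < width:
        return False  # the JIT's input length check would already have failed
    mask, expected = mask_and_expected(atoms)
    loaded = int.from_bytes(chunk, "little")
    return (loaded & mask) == expected


pattern = ["a", None, "b", "c"]  # /a\dbc/ with the \d position masked out
mask, expected = mask_and_expected(pattern)
print(hex(mask), hex(expected))          # 0xffff00ff 0x63620061
print(literals_match(b"a7bc", 0, pattern))  # True
print(literals_match(b"a7bd", 0, pattern))  # False
```

Note that literals_match only covers the literal characters; a subject such as "axbc" passes this check and would still have to be rejected by the separate digit class test.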
Radar WebKit Bug Importer
Comment 1 2025-03-11 14:40:17 PDT
Michael Saboff
Comment 2 2025-03-11 15:28:33 PDT
EWS
Comment 3 2025-03-12 01:38:01 PDT
Committed 292003@main (1e14cbbdc2f5): <https://commits.webkit.org/292003@main> Reviewed commits have been landed. Closing PR #42284 and removing active labels.
EWS
Comment 4 2025-03-31 12:40:08 PDT
Committed 289651.362@safari-7621-branch (b78009996aa0): <https://commits.webkit.org/289651.362@safari-7621-branch> Reviewed commits have been landed. Closing PR #2897 and removing active labels.