Bug 289567
| Summary: | [Yarr] Improve processing of adjacent or near adjacent single characters | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Michael Saboff <msaboff> |
| Component: | JavaScriptCore | Assignee: | Michael Saboff <msaboff> |
| Status: | RESOLVED FIXED | | |
| Severity: | Normal | CC: | webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Other | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
Michael Saboff
The Yarr JIT currently has an optimization that processes adjacent single-character atoms together, using one load and one compare. For example, /abcd/ is processed as:
1:Term PatternCharacter checked-offset:(4) 'a'
<44> 0x12f018b6c: sub x17, x0, #4
<48> 0x12f018b70: ldr w17, [x17, x1]
<52> 0x12f018b74: movz w16, #0x6261
<56> 0x12f018b78: movk w16, #0x6463, lsl #16 -> 0x64636261
<60> 0x12f018b7c: cmp w17, w16
<64> 0x12f018b80: b.ne 0x12f018b90 -> <80>
2:Term PatternCharacter checked-offset:(4) 'b' already handled
3:Term PatternCharacter checked-offset:(4) 'c' already handled
4:Term PatternCharacter checked-offset:(4) 'd' already handled
However, if something sits in between, characters that are nearly adjacent end up being checked separately. For example, /a\dbc/ is currently processed as:
1:Term PatternCharacter checked-offset:(4) 'a'
<84> 0x12f015054: sub x17, x0, #4
<88> 0x12f015058: ldrb w6, [x17, x1]
<92> 0x12f01505c: cmp w6, #97
<96> 0x12f015060: b.ne 0x12f015098 -> <152>
2:Term PatternCharacter checked-offset:(4) 'b'
<100> 0x12f015064: sub x17, x0, #2
<104> 0x12f015068: ldrh w6, [x17, x1]
<108> 0x12f01506c: movz w16, #0x6362 -> 25442
<112> 0x12f015070: cmp w6, w16
<116> 0x12f015074: b.ne 0x12f015098 -> <152>
3:Term PatternCharacter checked-offset:(4) 'c' already handled
4:Term PatternCharacterClass checked-offset:(4) <digits>
...
Note that we have an existing optimization to move the matching of character classes to after single character atoms.
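The immediates in the two dumps above are the pattern characters packed in little-endian byte order, so a single load of the subject can be compared against them. A minimal sketch of that packing, assuming an 8-bit (Latin-1) subject string; `pack_little_endian` is a hypothetical helper name for illustration, not a Yarr function:

```python
def pack_little_endian(chars: bytes) -> int:
    # Value a little-endian load of these bytes would produce:
    # the first character lands in the lowest byte.
    return int.from_bytes(chars, "little")


print(hex(pack_little_endian(b"abcd")))  # 0x64636261, the constant for /abcd/
print(pack_little_endian(b"bc"))         # 25442 (0x6362), the constant for 'b','c'
```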
For the second case, we could instead load 4 characters and mask out the character matched by the character class, like:
1:Term PatternCharacter checked-offset:(4) 'a'
<84> 0x12f014f54: sub x17, x0, #4
<88> 0x12f014f58: ldr w6, [x17, x1]
<92> 0x12f014f5c: and w6, w6, #0xffff00ff
<96> 0x12f014f60: movz w16, #0x61
<100> 0x12f014f64: movk w16, #0x6362, lsl #16 -> 0x63620061
<104> 0x12f014f68: cmp w6, w16
<108> 0x12f014f6c: b.ne 0x12f014f90 -> <144>
2:Term PatternCharacter checked-offset:(4) 'b' already handled
3:Term PatternCharacter checked-offset:(4) 'c' already handled
4:Term PatternCharacterClass checked-offset:(4) <digits>
...
This eliminates a load, compare, and branch.
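The masked comparison in the proposed sequence can be modelled outside the JIT. The sketch below reuses the two constants from the dump (0xffff00ff and 0x63620061) and assumes an 8-bit subject; `literals_match` is a hypothetical name, and the digit check for \d is deliberately left out because the dump shows it handled by the later character class term:

```python
MASK = 0xFFFF00FF      # clears byte 1, the slot matched by \d
EXPECTED = 0x63620061  # 'a' in byte 0, 'b' in byte 2, 'c' in byte 3


def literals_match(subject: bytes, index: int) -> bool:
    # One 4-byte little-endian load covers all four pattern positions.
    window = subject[index:index + 4]
    if len(window) < 4:
        return False
    word = int.from_bytes(window, "little")
    # Ignore the character class slot, then compare the three literals at once.
    return (word & MASK) == EXPECTED


print(literals_match(b"a7bc", 0))  # True
print(literals_match(b"a7bd", 0))  # False
```

Note that `literals_match(b"aXbc", 0)` is also true: the mask hides the second byte entirely, so the character class still has to be tested on its own afterwards.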
The more general case is to use larger load, compare, and branch code sequences for runs of single-character atoms, including runs that are interleaved with character class atoms matching a single character.
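One way to picture the general case is to derive the mask and the comparison constant from a run of terms, with a hole wherever a character class sits. This is only a sketch under stated assumptions: 8-bit characters, little-endian loads, and hypothetical names (`build_mask_and_value`, `run_matches`). It does not describe how the landed patch chooses load widths or emits code.

```python
from typing import Optional, Sequence, Tuple


def build_mask_and_value(terms: Sequence[Optional[int]]) -> Tuple[int, int]:
    # terms holds one entry per character position: an int is a literal
    # 8-bit character code, None marks a character class slot.
    mask = 0
    value = 0
    for slot, term in enumerate(terms):
        if term is not None:
            mask |= 0xFF << (8 * slot)   # keep this byte in the compare
            value |= term << (8 * slot)  # expected byte at this position
    return mask, value


def run_matches(subject: bytes, index: int, terms: Sequence[Optional[int]]) -> bool:
    # Load len(terms) bytes at once and compare only the literal positions.
    window = subject[index:index + len(terms)]
    if len(window) < len(terms):
        return False
    mask, value = build_mask_and_value(terms)
    return (int.from_bytes(window, "little") & mask) == value


# /a\dbc/ reproduces the constants from the dump above.
mask, value = build_mask_and_value([ord("a"), None, ord("b"), ord("c")])
print(hex(mask), hex(value))  # 0xffff00ff 0x63620061
```

The same helper works for an 8-character run, which would correspond to a single 64-bit load. As before, every `None` slot still needs its own character class check.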
Radar WebKit Bug Importer
<rdar://problem/146795365>
Michael Saboff
Pull request: https://github.com/WebKit/WebKit/pull/42284
EWS
Committed 292003@main (1e14cbbdc2f5): <https://commits.webkit.org/292003@main>
Reviewed commits have been landed. Closing PR #42284 and removing active labels.
EWS
Committed 289651.362@safari-7621-branch (b78009996aa0): <https://commits.webkit.org/289651.362@safari-7621-branch>
Reviewed commits have been landed. Closing PR #2897 and removing active labels.