Bug 285121
Summary: | [Yarr] Improve processing of \s* and related simple Character Class atoms | ||
---|---|---|---|
Product: | WebKit | Reporter: | Michael Saboff <msaboff> |
Component: | New Bugs | Assignee: | Michael Saboff <msaboff> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | webkit-bug-importer, ysuzuki |
Priority: | P2 | Keywords: | InRadar |
Version: | Other | ||
Hardware: | Unspecified | ||
OS: | Unspecified |
Michael Saboff
Currently the JIT code generated for matching \s* for an 8 bit string is:
1:Term PatternCharacterClass checked-offset:(0) <whitespace> {0,...} greedy
<68> 0x11aabce04: movz w7, #0x0
<72> 0x11aabce08: cmp w1, w2
<76> 0x11aabce0c: b.eq 0x11aabce44 -> <132>
<80> 0x11aabce10: ldrb w6, [x0, x1]
<84> 0x11aabce14: cmp w6, #9
<88> 0x11aabce18: b.lt 0x11aabce34 -> <116>
<92> 0x11aabce1c: cmp w6, #13
<96> 0x11aabce20: b.le 0x11aabce38 -> <120>
<100> 0x11aabce24: cmp w6, #32
<104> 0x11aabce28: b.eq 0x11aabce38 -> <120>
<108> 0x11aabce2c: cmp w6, #160
<112> 0x11aabce30: b.eq 0x11aabce38 -> <120>
<116> 0x11aabce34: b 0x11aabce44 -> <132>
<120> 0x11aabce38: add w1, w1, #1
<124> 0x11aabce3c: add w7, w7, #1
<128> 0x11aabce40: b 0x11aabce08 -> <72>
<132> 0x11aabce44: stur x7, [sp, #8]
The JIT code generated for matching the same atom for 16 bit strings is slightly better:
1:Term PatternCharacterClass checked-offset:(0) <whitespace> {0,...} greedy
<68> 0x11aabcf44: movz w7, #0x0
<72> 0x11aabcf48: cmp w1, w2
<76> 0x11aabcf4c: b.eq 0x11aabcf78 -> <120>
<80> 0x11aabcf50: ldrh w6, [x0, x1, lsl #1]
<84> 0x11aabcf54: movz x17, #0xb501
<88> 0x11aabcf58: movk x17, #0xe5d, lsl #16
<92> 0x11aabcf5c: movk x17, #0x1, lsl #32 -> 0x10e5db501
<96> 0x11aabcf60: ldrb w17, [x6, x17]
<100> 0x11aabcf64: cbnz w17, 0x11aabcf6c -> <108>
<104> 0x11aabcf68: b 0x11aabcf78 -> <120>
<108> 0x11aabcf6c: add w1, w1, #1
<112> 0x11aabcf70: add w7, w7, #1
<116> 0x11aabcf74: b 0x11aabcf48 -> <72>
<120> 0x11aabcf78: stur x7, [sp, #8]
There are two issues with the 8 bit matching. First, it isn't using the character table from the builtin spaces character class. Second, we branch over a branch (instructions at offsets 112 & 116). The 16 bit matching code only has the branch over a branch issue (see instructions at offsets 100 & 104).
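For illustration only, here is a minimal C++ sketch of the two ideas, not the actual Yarr change. The names `isSpaceByCompares`, `spaceTable`, `matchGreedySpaces` and `checkTableMatchesCompares` are hypothetical. The compare chain mirrors the 8 bit listing (9..13, 32 and 160 match), and the table variant replaces it with a single byte load, as the 16 bit listing already does. The loop leaves directly on a zero table entry, which is roughly what a single inverted conditional branch would do in place of a conditional branch over an unconditional one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compare chain equivalent to the 8 bit listing: 9..13, 32 and 160 match.
inline bool isSpaceByCompares(uint8_t c)
{
    if (c < 9)
        return false;
    if (c <= 13)
        return true;
    return c == 32 || c == 160;
}

// Hypothetical 256-entry table, one byte per 8 bit code unit (1 = whitespace).
inline std::array<uint8_t, 256> makeSpaceTable()
{
    std::array<uint8_t, 256> table {};
    for (unsigned c = 0; c < 256; ++c)
        table[c] = isSpaceByCompares(static_cast<uint8_t>(c)) ? 1 : 0;
    return table;
}

static const std::array<uint8_t, 256> spaceTable = makeSpaceTable();

// Greedy \s* over an 8 bit string: one table load per character, and the
// loop exits straight to the end on the first non-matching character.
// Returns the number of characters consumed starting at index.
inline size_t matchGreedySpaces(const uint8_t* chars, size_t index, size_t length)
{
    size_t count = 0;
    while (index < length && spaceTable[chars[index]]) {
        ++index;
        ++count;
    }
    return count;
}

// Sanity check that the table and the compare chain agree for every byte.
inline void checkTableMatchesCompares()
{
    for (unsigned c = 0; c < 256; ++c)
        assert((spaceTable[c] != 0) == isSpaceByCompares(static_cast<uint8_t>(c)));
}
```

How a C++ compiler lays out the branches for this loop is up to the compiler. The sketch is only meant to show the control flow the JIT could emit directly.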
Radar WebKit Bug Importer
<rdar://problem/141967884>
Michael Saboff
Pull request: https://github.com/WebKit/WebKit/pull/38353
EWS
Committed 288284@main (33a069473f47): <https://commits.webkit.org/288284@main>
Reviewed commits have been landed. Closing PR #38353 and removing active labels.