WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
228810
[JSC] Yarr's character tracking for BoyerMoore search should be more precise
https://bugs.webkit.org/show_bug.cgi?id=228810
Summary
[JSC] Yarr's character tracking for BoyerMoore search should be more precise
Yusuke Suzuki
Reported
2021-08-04 20:34:33 PDT
[JSC] Yarr's character tracking for BoyerMoore search should be more precise
Attachments
Patch
(14.19 KB, patch)
2021-08-04 20:37 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(14.44 KB, patch)
2021-08-04 23:31 PDT
,
Yusuke Suzuki
saam
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2021-08-04 20:37:00 PDT
Created
attachment 434963
[details]
Patch
Yusuke Suzuki
Comment 2
2021-08-04 23:31:44 PDT
Created
attachment 434965
[details]
Patch
Saam Barati
Comment 3
2021-08-05 15:14:32 PDT
Comment on
attachment 434965
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434965&action=review
r=me
> Source/JavaScriptCore/ChangeLog:8 > + We should track candidate characters without masking up to 2 so
let's fix this first sentence to be more clear like you suggested on slack.
> Source/JavaScriptCore/ChangeLog:12 > + We also remove m_table giving up. This is because m_table can be used
"We also remove m_table giving up...." => "We also add support for m_table because m_table can be used"
> Source/JavaScriptCore/ChangeLog:17 > + To make `\s` work on Char8 case, we use Char8 / Char16 information > + effectively in BoyreMoore search bitmap construction.
how so? It'd be helpful to share more detail here, like, what makes it "effective"
> Source/JavaScriptCore/yarr/YarrJIT.cpp:2414 > + auto [map, characters] = op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex);
nit: "characters" is a generic name given this context. Similar suggestion to naming the class. maybe "charactersFastPath" or something?
> Source/JavaScriptCore/yarr/YarrJIT.cpp:2429 > UChar32 character2 = 0xff; > - if (mapCount == 2) { > - character2 = map.findBit(character1 + 1, true); > - ASSERT(character2 != BoyerMooreBitmap::Map::size()); > - } > - dataLogLnIf(Options::verboseRegExpCompilation(), "Found 1-or-2 characters lookahead character:(0x", hex(character1), "),character2:(", hex(character2), "),isMaskEffective:(", isMaskEffective,"),range:[", beginIndex, ", ", endIndex, ")"); > + if (characters.size() > 1) > + character2 = characters.at(1); > + dataLogLnIf(Options::verboseRegExpCompilation(), "Found ", characters.size(), " characters lookahead character:(0x", hex(character1), "),character2:(", hex(character2), "),range:[", beginIndex, ", ", endIndex, ")");
Can all of the "character2", "characters.size() > 1" just be moved into "characters.size() > 1" branch below? I think that'd make things a lot more clear. No need to set to 0xff, etc.
> Source/JavaScriptCore/yarr/YarrJIT.h:63 > +class BoyerMooreCharacterCandidates {
this name feels weirdly generic. I wonder if we can go with "BoyerMooreFastCase" or "BoyerMooreFastCandidates" or something similar that's just a bit more descriptive of what we're aiming for? Since this isn't really a comprehensive list of "character candidates".
Saam Barati
Comment 4
2021-08-05 15:14:52 PDT
Comment on
attachment 434965
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434965&action=review
> Source/JavaScriptCore/yarr/YarrJIT.h:145 > + // Early return since characters are sorted.
can we debug assert they're sorted?
Yusuke Suzuki
Comment 5
2021-08-06 16:55:13 PDT
Comment on
attachment 434965
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434965&action=review
Thanks
>> Source/JavaScriptCore/ChangeLog:8 >> + We should track candidate characters without masking up to 2 so > > let's fix this first sentence to be more clear like you suggested on slack.
Fixed.
>> Source/JavaScriptCore/ChangeLog:12 >> + We also remove m_table giving up. This is because m_table can be used > > "We also remove m_table giving up...." => "We also add support for m_table because m_table can be used"
Fixed.
>> Source/JavaScriptCore/ChangeLog:17 >> + effectively in BoyreMoore search bitmap construction. > > how so? It'd be helpful to share more detail here, like, what makes it "effective"
Fixed.
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2414 >> + auto [map, characters] = op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex); > > nit: "characters" is a generic name given this context. Similar suggestion to naming the class. maybe "charactersFastPath" or something?
Fixed.
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2429 >> + dataLogLnIf(Options::verboseRegExpCompilation(), "Found ", characters.size(), " characters lookahead character:(0x", hex(character1), "),character2:(", hex(character2), "),range:[", beginIndex, ", ", endIndex, ")"); > > Can all of the "character2", "characters.size() > 1" just be moved into "characters.size() > 1" branch below? I think that'd make things a lot more clear. No need to set to 0xff, etc.
Fixed.
>> Source/JavaScriptCore/yarr/YarrJIT.h:63 >> +class BoyerMooreCharacterCandidates { > > this name feels weirdly generic. I wonder if we can go with "BoyerMooreFastCase" or "BoyerMooreFastCandidates" or something similar that's just a bit more descriptive of what we're aiming for? Since this isn't really a comprehensive list of "character candidates".
Fixed.
>> Source/JavaScriptCore/yarr/YarrJIT.h:145 >> + // Early return since characters are sorted. > > can we debug assert they're sorted?
Added.
Yusuke Suzuki
Comment 6
2021-08-06 17:03:18 PDT
Committed
r280744
(
240330@main
): <
https://commits.webkit.org/240330@main
>
Radar WebKit Bug Importer
Comment 7
2021-08-06 17:04:22 PDT
<
rdar://problem/81638511
>
Yusuke Suzuki
Comment 8
2021-08-06 17:26:36 PDT
Committed
r280745
(
240331@main
): <
https://commits.webkit.org/240331@main
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug