WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
228613
[JSC] Yarr BoyerMoore search should support character-class
https://bugs.webkit.org/show_bug.cgi?id=228613
Summary
[JSC] Yarr BoyerMoore search should support character-class
Yusuke Suzuki
Reported
2021-07-29 15:12:51 PDT
...
Attachments
Patch
(13.42 KB, patch)
2021-07-29 23:45 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(15.11 KB, patch)
2021-07-29 23:52 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(15.38 KB, patch)
2021-07-30 12:53 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(19.65 KB, patch)
2021-07-30 13:14 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(19.87 KB, patch)
2021-07-30 13:22 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(19.83 KB, patch)
2021-07-30 18:26 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(22.04 KB, patch)
2021-07-30 19:40 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(22.05 KB, patch)
2021-07-31 13:56 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(23.78 KB, patch)
2021-08-02 10:55 PDT
,
Yusuke Suzuki
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(23.56 KB, patch)
2021-08-02 11:12 PDT
,
Yusuke Suzuki
saam
: review+
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(9)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2021-07-29 23:45:13 PDT
Created
attachment 434606
[details]
Patch
Yusuke Suzuki
Comment 2
2021-07-29 23:52:00 PDT
Created
attachment 434607
[details]
Patch
Yusuke Suzuki
Comment 3
2021-07-30 12:53:44 PDT
Created
attachment 434655
[details]
Patch
Yusuke Suzuki
Comment 4
2021-07-30 13:14:35 PDT
Created
attachment 434659
[details]
Patch
Yusuke Suzuki
Comment 5
2021-07-30 13:22:01 PDT
Created
attachment 434661
[details]
Patch
Yusuke Suzuki
Comment 6
2021-07-30 18:26:20 PDT
Created
attachment 434681
[details]
Patch
Yusuke Suzuki
Comment 7
2021-07-30 19:40:48 PDT
Created
attachment 434682
[details]
Patch
Yusuke Suzuki
Comment 8
2021-07-30 19:42:56 PDT
Comment on
attachment 434682
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434682&action=review
> Source/JavaScriptCore/yarr/YarrJIT.cpp:2462 > if (!m_pattern.m_body->m_hasFixedSize) {
Previously, this path was dead code since we run this only when `disjunction->m_hasFixedSize` is true. After removing that restriction, I found the bug that we didn't update that when the match is failed, that's the above change.
Yusuke Suzuki
Comment 9
2021-07-31 13:56:40 PDT
Created
attachment 434704
[details]
Patch
Yusuke Suzuki
Comment 10
2021-08-02 10:55:45 PDT
Created
attachment 434770
[details]
Patch
Yusuke Suzuki
Comment 11
2021-08-02 11:12:03 PDT
Created
attachment 434773
[details]
Patch
Saam Barati
Comment 12
2021-08-02 15:50:14 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
> Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 > + if (!characterClass.m_rangesUnicode.isEmpty()) > + bmInfo.addRanges(cursor, characterClass.m_rangesUnicode); > + if (!characterClass.m_matchesUnicode.isEmpty()) > + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode);
don't we check that the regex isn't unicode before collecting BM info?
> Source/JavaScriptCore/yarr/YarrJIT.h:111 > + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) {
why isn't this looking at m_count + (range.end-range.begin+1)?
> Source/JavaScriptCore/yarr/YarrJIT.h:122 > + m_count = mapSize;
we don't need to set the actual bits?
Yusuke Suzuki
Comment 13
2021-08-02 16:00:25 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 >> + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode); > > don't we check that the regex isn't unicode before collecting BM info?
unicode() flag is whether we decode surrogate-pairs. This is different concept from including unicode in RegExp, so RegExp can include non-ASCII characters without unicode flag.
>> Source/JavaScriptCore/yarr/YarrJIT.h:111 >> + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) { > > why isn't this looking at m_count + (range.end-range.begin+1)?
It is possible that these characters overlaps with the already included characters in the bitmap. In that case, `m_count + (range.end-range.begin+1)` is too restrictive. On the other hand, if the range exceeds the mapSize, then we can definitely say "this range does not fit in mapSize".
>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >> + m_count = mapSize; > > we don't need to set the actual bits?
This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful".
Saam Barati
Comment 14
2021-08-02 16:06:01 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
r=me
>>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>> + m_count = mapSize; >> >> we don't need to set the actual bits? > > This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful".
can we static assert that the limit is <= mapSize?
Yusuke Suzuki
Comment 15
2021-08-02 16:31:20 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
>>>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>>> + m_count = mapSize; >>> >>> we don't need to set the actual bits? >> >> This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful". > > can we static assert that the limit is <= mapSize?
Yes, done :)
Yusuke Suzuki
Comment 16
2021-08-02 16:43:24 PDT
Committed
r280570
(
240194@main
): <
https://commits.webkit.org/240194@main
>
Radar WebKit Bug Importer
Comment 17
2021-08-02 16:44:25 PDT
<
rdar://problem/81435189
>
Darin Adler
Comment 18
2021-08-02 17:40:06 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
> Source/JavaScriptCore/yarr/YarrJIT.h:98 > + for (UChar character : characters)
Why is this narrowing the UChar32 to UChar?
Yusuke Suzuki
Comment 19
2021-08-02 18:05:48 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
>> Source/JavaScriptCore/yarr/YarrJIT.h:98 >> + for (UChar character : characters) > > Why is this narrowing the UChar32 to UChar?
Oops, I'll change it. (but this is not an issue since we mask this character with 0x7f anyway in `add`.
Yusuke Suzuki
Comment 20
2021-08-02 18:11:38 PDT
Committed
r280577
(
240199@main
): <
https://commits.webkit.org/240199@main
>
Darin Adler
Comment 21
2021-08-03 13:25:30 PDT
Comment on
attachment 434773
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=434773&action=review
>>> Source/JavaScriptCore/yarr/YarrJIT.h:98 >>> + for (UChar character : characters) >> >> Why is this narrowing the UChar32 to UChar? > > Oops, I'll change it. (but this is not an issue since we mask this character with 0x7f anyway in `add`.
Glad this didn’t matter. It’s an example of why I like auto so much.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug