Bug 228613

Summary: [JSC] Yarr BoyerMoore search should support character-class
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, ews-watchlist, keith_miller, mark.lam, msaboff, saam, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch saam: review+, ews-feeder: commit-queue-

Yusuke Suzuki
Reported 2021-07-29 15:12:51 PDT
...
Attachments
Patch (13.42 KB, patch)
2021-07-29 23:45 PDT, Yusuke Suzuki
no flags
Patch (15.11 KB, patch)
2021-07-29 23:52 PDT, Yusuke Suzuki
no flags
Patch (15.38 KB, patch)
2021-07-30 12:53 PDT, Yusuke Suzuki
no flags
Patch (19.65 KB, patch)
2021-07-30 13:14 PDT, Yusuke Suzuki
no flags
Patch (19.87 KB, patch)
2021-07-30 13:22 PDT, Yusuke Suzuki
no flags
Patch (19.83 KB, patch)
2021-07-30 18:26 PDT, Yusuke Suzuki
no flags
Patch (22.04 KB, patch)
2021-07-30 19:40 PDT, Yusuke Suzuki
no flags
Patch (22.05 KB, patch)
2021-07-31 13:56 PDT, Yusuke Suzuki
no flags
Patch (23.78 KB, patch)
2021-08-02 10:55 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (23.56 KB, patch)
2021-08-02 11:12 PDT, Yusuke Suzuki
saam: review+
ews-feeder: commit-queue-
Yusuke Suzuki
Comment 1 2021-07-29 23:45:13 PDT
Yusuke Suzuki
Comment 2 2021-07-29 23:52:00 PDT
Yusuke Suzuki
Comment 3 2021-07-30 12:53:44 PDT
Yusuke Suzuki
Comment 4 2021-07-30 13:14:35 PDT
Yusuke Suzuki
Comment 5 2021-07-30 13:22:01 PDT
Yusuke Suzuki
Comment 6 2021-07-30 18:26:20 PDT
Yusuke Suzuki
Comment 7 2021-07-30 19:40:48 PDT
Yusuke Suzuki
Comment 8 2021-07-30 19:42:56 PDT
Comment on attachment 434682 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434682&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:2462 > if (!m_pattern.m_body->m_hasFixedSize) { Previously, this path was dead code since we run this only when `disjunction->m_hasFixedSize` is true. After removing that restriction, I found the bug that we didn't update that when the match is failed, that's the above change.
Yusuke Suzuki
Comment 9 2021-07-31 13:56:40 PDT
Yusuke Suzuki
Comment 10 2021-08-02 10:55:45 PDT
Yusuke Suzuki
Comment 11 2021-08-02 11:12:03 PDT
Saam Barati
Comment 12 2021-08-02 15:50:14 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 > + if (!characterClass.m_rangesUnicode.isEmpty()) > + bmInfo.addRanges(cursor, characterClass.m_rangesUnicode); > + if (!characterClass.m_matchesUnicode.isEmpty()) > + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode); don't we check that the regex isn't unicode before collecting BM info? > Source/JavaScriptCore/yarr/YarrJIT.h:111 > + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) { why isn't this looking at m_count + (range.end-range.begin+1)? > Source/JavaScriptCore/yarr/YarrJIT.h:122 > + m_count = mapSize; we don't need to set the actual bits?
Yusuke Suzuki
Comment 13 2021-08-02 16:00:25 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >> Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 >> + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode); > > don't we check that the regex isn't unicode before collecting BM info? unicode() flag is whether we decode surrogate-pairs. This is different concept from including unicode in RegExp, so RegExp can include non-ASCII characters without unicode flag. >> Source/JavaScriptCore/yarr/YarrJIT.h:111 >> + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) { > > why isn't this looking at m_count + (range.end-range.begin+1)? It is possible that these characters overlaps with the already included characters in the bitmap. In that case, `m_count + (range.end-range.begin+1)` is too restrictive. On the other hand, if the range exceeds the mapSize, then we can definitely say "this range does not fit in mapSize". >> Source/JavaScriptCore/yarr/YarrJIT.h:122 >> + m_count = mapSize; > > we don't need to set the actual bits? This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful".
Saam Barati
Comment 14 2021-08-02 16:06:01 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review r=me >>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>> + m_count = mapSize; >> >> we don't need to set the actual bits? > > This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful". can we static assert that the limit is <= mapSize?
Yusuke Suzuki
Comment 15 2021-08-02 16:31:20 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >>>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>>> + m_count = mapSize; >>> >>> we don't need to set the actual bits? >> >> This map-count value is used to check whether we should care this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not sure, but this map is not useful". > > can we static assert that the limit is <= mapSize? Yes, done :)
Yusuke Suzuki
Comment 16 2021-08-02 16:43:24 PDT
Radar WebKit Bug Importer
Comment 17 2021-08-02 16:44:25 PDT
Darin Adler
Comment 18 2021-08-02 17:40:06 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review > Source/JavaScriptCore/yarr/YarrJIT.h:98 > + for (UChar character : characters) Why is this narrowing the UChar32 to UChar?
Yusuke Suzuki
Comment 19 2021-08-02 18:05:48 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >> Source/JavaScriptCore/yarr/YarrJIT.h:98 >> + for (UChar character : characters) > > Why is this narrowing the UChar32 to UChar? Oops, I'll change it. (but this is not an issue since we mask this character with 0x7f anyway in `add`.
Yusuke Suzuki
Comment 20 2021-08-02 18:11:38 PDT
Darin Adler
Comment 21 2021-08-03 13:25:30 PDT
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >>> Source/JavaScriptCore/yarr/YarrJIT.h:98 >>> + for (UChar character : characters) >> >> Why is this narrowing the UChar32 to UChar? > > Oops, I'll change it. (but this is not an issue since we mask this character with 0x7f anyway in `add`. Glad this didn’t matter. It’s an example of why I like auto so much.
Note You need to log in before you can comment on or make changes to this bug.