...
Created attachment 434606 [details] Patch
Created attachment 434607 [details] Patch
Created attachment 434655 [details] Patch
Created attachment 434659 [details] Patch
Created attachment 434661 [details] Patch
Created attachment 434681 [details] Patch
Created attachment 434682 [details] Patch
Comment on attachment 434682 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434682&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:2462 > if (!m_pattern.m_body->m_hasFixedSize) { Previously, this path was dead code because we ran it only when `disjunction->m_hasFixedSize` was true. After removing that restriction, I found a bug: we were not updating that when the match fails. The change above fixes it.
Created attachment 434704 [details] Patch
Created attachment 434770 [details] Patch
Created attachment 434773 [details] Patch
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 > + if (!characterClass.m_rangesUnicode.isEmpty()) > + bmInfo.addRanges(cursor, characterClass.m_rangesUnicode); > + if (!characterClass.m_matchesUnicode.isEmpty()) > + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode); don't we check that the regex isn't unicode before collecting BM info? > Source/JavaScriptCore/yarr/YarrJIT.h:111 > + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) { why isn't this looking at m_count + (range.end-range.begin+1)? > Source/JavaScriptCore/yarr/YarrJIT.h:122 > + m_count = mapSize; we don't need to set the actual bits?
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >> Source/JavaScriptCore/yarr/YarrJIT.cpp:3974 >> + bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode); > > don't we check that the regex isn't unicode before collecting BM info? The unicode() flag controls whether we decode surrogate pairs. That is a different concept from the RegExp containing non-ASCII characters, so a RegExp can include non-ASCII characters without the unicode flag. >> Source/JavaScriptCore/yarr/YarrJIT.h:111 >> + if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) { > > why isn't this looking at m_count + (range.end-range.begin+1)? These characters may overlap with characters already included in the bitmap. In that case, `m_count + (range.end-range.begin+1)` is too restrictive. On the other hand, if the range alone is at least mapSize wide, then we can definitely say "this range does not fit in mapSize". >> Source/JavaScriptCore/yarr/YarrJIT.h:122 >> + m_count = mapSize; > > we don't need to set the actual bits? This map-count value is used to check whether we should consider this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not known precisely, but this map is not useful".
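The overlap argument and the saturation trick in the reply above can be illustrated with a minimal sketch. This is not the code from YarrJIT.h: `BitmapSketch`, `RangeSketch`, `isSaturated`, and `mapMask` are hypothetical names chosen for illustration, and only `mapSize`, `m_count`, `add`, `addRanges`, and the 0x7f mask are taken from the thread. It assumes each character is folded into one of 128 slots by masking.

```cpp
#include <bitset>
#include <vector>

// Hypothetical stand-in for the character range type discussed in the review.
struct RangeSketch {
    char32_t begin;
    char32_t end; // inclusive
};

// Sketch of a Boyer-Moore style character bitmap (assumption: 128 slots, 0x7f mask).
class BitmapSketch {
public:
    static constexpr unsigned mapSize = 128;
    static constexpr unsigned mapMask = mapSize - 1; // 0x7f

    unsigned count() const { return m_count; }
    bool isSaturated() const { return m_count == mapSize; }

    void add(char32_t character)
    {
        if (isSaturated())
            return;
        unsigned index = character & mapMask;
        // Count a slot only the first time it is set, so overlapping input is not double counted.
        if (!m_map.test(index)) {
            m_map.set(index);
            ++m_count;
        }
    }

    void addRanges(const std::vector<RangeSketch>& ranges)
    {
        for (const RangeSketch& range : ranges) {
            if (isSaturated())
                return;
            unsigned length = static_cast<unsigned>(range.end - range.begin + 1);
            // A range this wide touches every slot after masking, so skip the bit
            // setting and record only that the map is full (and therefore not useful).
            if (length >= mapSize) {
                m_count = mapSize;
                return;
            }
            for (unsigned offset = 0; offset < length; ++offset)
                add(range.begin + offset);
        }
    }

private:
    std::bitset<mapSize> m_map;
    unsigned m_count { 0 };
};
```

With this sketch, adding the ranges a..f and then c..h yields a count of 8, whereas a check based on `m_count + length` would have estimated 12 and could reject a range that actually fits.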
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review r=me >>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>> + m_count = mapSize; >> >> we don't need to set the actual bits? > > This map-count value is used to check whether we should consider this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not known precisely, but this map is not useful". can we static assert that the limit is <= mapSize?
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >>>> Source/JavaScriptCore/yarr/YarrJIT.h:122 >>>> + m_count = mapSize; >>> >>> we don't need to set the actual bits? >> >> This map-count value is used to check whether we should consider this bitmap in `findBestCharacterSequence`. So by setting it to mapSize, we can say "the collected characters are not known precisely, but this map is not useful". > > can we static assert that the limit is <= mapSize? Yes, done :)
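The static assertion requested above might look roughly like the following sketch. The thread does not give the name or value of the limit or the exact comparison used in `findBestCharacterSequence`, so `candidateLimit`, its value of 64, `isWorthConsidering`, and the strict `<` comparison are all assumptions made for illustration.

```cpp
// Assumption: the bitmap has 128 slots, matching the 0x7f mask mentioned in the thread.
constexpr unsigned mapSize = 128;

// Hypothetical threshold; the real name and value are not stated in the review.
constexpr unsigned candidateLimit = 64;

// If the limit could exceed mapSize, a saturated map (count == mapSize) might
// slip past the usefulness check, so the relationship is enforced at compile time.
static_assert(candidateLimit <= mapSize, "a saturated bitmap must never look useful");

// Assumed shape of the usefulness check: only sparse maps are considered.
constexpr bool isWorthConsidering(unsigned count)
{
    return count < candidateLimit;
}
```

Under these assumptions, setting the count to `mapSize` guarantees the map is skipped regardless of which bits were actually set.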
Committed r280570 (240194@main): <https://commits.webkit.org/240194@main>
<rdar://problem/81435189>
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review > Source/JavaScriptCore/yarr/YarrJIT.h:98 > + for (UChar character : characters) Why is this narrowing the UChar32 to UChar?
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >> Source/JavaScriptCore/yarr/YarrJIT.h:98 >> + for (UChar character : characters) > > Why is this narrowing the UChar32 to UChar? Oops, I'll change it. (But this is not an issue, since we mask this character with 0x7f anyway in `add`.)
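Why the narrowing was harmless can be shown with a small sketch. It uses the standard `char32_t` and `char16_t` types as stand-ins for `UChar32` and `UChar`, and `slotFor` and `slotForNarrowed` are hypothetical helper names. The only fact taken from the thread is that `add` masks the character with 0x7f.

```cpp
constexpr unsigned mapMask = 0x7f;

// Slot computed from the full 32-bit code point.
constexpr unsigned slotFor(char32_t character)
{
    return character & mapMask;
}

// Slot computed after truncating to 16 bits first, as the original loop did.
constexpr unsigned slotForNarrowed(char32_t character)
{
    return static_cast<char16_t>(character) & mapMask;
}

// Truncation drops only bits 16 and above, and the mask keeps only bits 0 to 6,
// so both paths agree even for a code point outside the BMP.
static_assert(slotFor(U'\U0001F600') == slotForNarrowed(U'\U0001F600'), "masking hides the narrowing");
```

The result matches here only because of the mask; code that used the full value of the character would behave differently after narrowing.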
Committed r280577 (240199@main): <https://commits.webkit.org/240199@main>
Comment on attachment 434773 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434773&action=review >>> Source/JavaScriptCore/yarr/YarrJIT.h:98 >>> + for (UChar character : characters) >> >> Why is this narrowing the UChar32 to UChar? > > Oops, I'll change it. (But this is not an issue, since we mask this character with 0x7f anyway in `add`.) Glad this didn’t matter. It’s an example of why I like auto so much.