RESOLVED FIXED 228301
[JSC] Yarr should perform BoyerMoore search
https://bugs.webkit.org/show_bug.cgi?id=228301
Summary [JSC] Yarr should perform BoyerMoore search
Yusuke Suzuki
Reported 2021-07-26 14:08:46 PDT
We would like to start the simple one. Let's try to deploy BM search for the top-level disjunction first. Then, let's extend the coverage for the other disjunctions.
Attachments
Patch (7.58 KB, patch)
2021-07-26 14:18 PDT, Yusuke Suzuki
no flags
Patch (8.26 KB, patch)
2021-07-26 14:24 PDT, Yusuke Suzuki
no flags
Patch (23.67 KB, patch)
2021-07-27 14:54 PDT, Yusuke Suzuki
no flags
Patch (23.59 KB, patch)
2021-07-27 14:58 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (24.00 KB, patch)
2021-07-27 15:13 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (24.01 KB, patch)
2021-07-27 16:35 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (26.73 KB, patch)
2021-07-27 19:43 PDT, Yusuke Suzuki
no flags
Patch (28.76 KB, patch)
2021-07-27 20:21 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (28.87 KB, patch)
2021-07-27 20:37 PDT, Yusuke Suzuki
no flags
Patch (29.03 KB, patch)
2021-07-27 21:03 PDT, Yusuke Suzuki
no flags
Patch (29.12 KB, patch)
2021-07-27 23:01 PDT, Yusuke Suzuki
no flags
Patch (46.44 KB, patch)
2021-07-28 16:42 PDT, Yusuke Suzuki
no flags
Patch (46.91 KB, patch)
2021-07-28 17:55 PDT, Yusuke Suzuki
no flags
Patch (46.98 KB, patch)
2021-07-28 17:57 PDT, Yusuke Suzuki
no flags
Patch (47.27 KB, patch)
2021-07-28 18:00 PDT, Yusuke Suzuki
no flags
Patch (47.29 KB, patch)
2021-07-28 18:07 PDT, Yusuke Suzuki
no flags
Patch (47.78 KB, patch)
2021-07-28 18:51 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Patch (47.83 KB, patch)
2021-07-28 18:54 PDT, Yusuke Suzuki
saam: review+
Yusuke Suzuki
Comment 1 2021-07-26 14:18:47 PDT
Yusuke Suzuki
Comment 2 2021-07-26 14:24:52 PDT
Yusuke Suzuki
Comment 3 2021-07-27 14:54:43 PDT
Yusuke Suzuki
Comment 4 2021-07-27 14:58:32 PDT
Yusuke Suzuki
Comment 5 2021-07-27 15:13:55 PDT
Yusuke Suzuki
Comment 6 2021-07-27 16:35:06 PDT
Yusuke Suzuki
Comment 7 2021-07-27 19:43:09 PDT
Yusuke Suzuki
Comment 8 2021-07-27 20:21:25 PDT
Yusuke Suzuki
Comment 9 2021-07-27 20:37:09 PDT
Yusuke Suzuki
Comment 10 2021-07-27 21:03:41 PDT
Yusuke Suzuki
Comment 11 2021-07-27 23:01:13 PDT
Yusuke Suzuki
Comment 12 2021-07-28 16:42:14 PDT
Sam Weinig
Comment 13 2021-07-28 17:46:49 PDT
Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer.
Yusuke Suzuki
Comment 14 2021-07-28 17:52:36 PDT
(In reply to Sam Weinig from comment #13) > Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer. Sounds good, relaced.
Yusuke Suzuki
Comment 15 2021-07-28 17:55:13 PDT
Yusuke Suzuki
Comment 16 2021-07-28 17:57:39 PDT
Yusuke Suzuki
Comment 17 2021-07-28 18:00:25 PDT
Yusuke Suzuki
Comment 18 2021-07-28 18:07:05 PDT
Yusuke Suzuki
Comment 19 2021-07-28 18:51:22 PDT
Yusuke Suzuki
Comment 20 2021-07-28 18:54:06 PDT
Saam Barati
Comment 21 2021-07-29 13:56:24 PDT
Comment on attachment 434488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review r=me I also suggest filing/linking bugs for your FIXMEs > Source/JavaScriptCore/yarr/YarrJIT.cpp:944 > + BoyerMooreInfo* m_bmInfo; can we just default this to nullptr so we don't need to set it at the call sites > Source/JavaScriptCore/yarr/YarrJIT.cpp:2415 > + auto vector = makeUniqueRef<BoyerMooreByteVector>(map); > + dataLogLnIf(Options::verboseRegExpCompilation(), "Found bitmap lookahead count:(", mapCount, "),range:[", beginIndex, ", ", endIndex, ")"); > + const uint8_t* pointer = vector->data(); > + if (const auto* existingPointer = m_codeBlock.findSameVector(op.m_bmInfo->index(), vector.get())) > + pointer = existingPointer; > + else > + m_bmVector.append(WTFMove(vector)); small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map& > Source/JavaScriptCore/yarr/YarrJIT.h:127 > + void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership > Source/JavaScriptCore/yarr/YarrJIT.h:134 > + void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) ditto > Source/JavaScriptCore/yarr/YarrJIT.h:144 > + void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) ditto > Source/JavaScriptCore/yarr/YarrJIT.h:151 > + void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) ditto > Source/JavaScriptCore/yarr/YarrJIT.h:243 > + // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically. nit: would be nicer if they just took a const Locker& instead of a comment > Source/JavaScriptCore/yarr/YarrJIT.h:254 > + const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector? findSameVector is a weirdly generic name
Yusuke Suzuki
Comment 22 2021-07-29 15:05:40 PDT
Comment on attachment 434488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review Thanks! >> Source/JavaScriptCore/yarr/YarrJIT.cpp:944 >> + BoyerMooreInfo* m_bmInfo; > > can we just default this to nullptr so we don't need to set it at the call sites Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.cpp:2415 >> + m_bmVector.append(WTFMove(vector)); > > small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map& Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:127 >> + void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) > > this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:134 >> + void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) > > ditto Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:144 >> + void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) > > ditto Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:151 >> + void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps) > > ditto Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:243 >> + // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically. > > nit: would be nicer if they just took a const Locker& instead of a comment Fixed. >> Source/JavaScriptCore/yarr/YarrJIT.h:254 >> + const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const > > tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector? > > findSameVector is a weirdly generic name Fixed.
Yusuke Suzuki
Comment 23 2021-07-29 15:26:22 PDT
Radar WebKit Bug Importer
Comment 24 2021-07-29 15:27:19 PDT
Yusuke Suzuki
Comment 25 2021-07-30 14:16:53 PDT
Note You need to log in before you can comment on or make changes to this bug.