Bug 228301

Summary: [JSC] Yarr should perform BoyerMoore search
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, ews-watchlist, keith_miller, mark.lam, msaboff, saam, sam, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch saam: review+

Description Yusuke Suzuki 2021-07-26 14:08:46 PDT
We would like to start the simple one. Let's try to deploy BM search for the top-level disjunction first.
Then, let's extend the coverage for the other disjunctions.
Comment 1 Yusuke Suzuki 2021-07-26 14:18:47 PDT
Created attachment 434238 [details]
Patch
Comment 2 Yusuke Suzuki 2021-07-26 14:24:52 PDT
Created attachment 434239 [details]
Patch
Comment 3 Yusuke Suzuki 2021-07-27 14:54:43 PDT
Created attachment 434319 [details]
Patch
Comment 4 Yusuke Suzuki 2021-07-27 14:58:32 PDT
Created attachment 434320 [details]
Patch
Comment 5 Yusuke Suzuki 2021-07-27 15:13:55 PDT
Created attachment 434343 [details]
Patch
Comment 6 Yusuke Suzuki 2021-07-27 16:35:06 PDT
Created attachment 434384 [details]
Patch
Comment 7 Yusuke Suzuki 2021-07-27 19:43:09 PDT
Created attachment 434394 [details]
Patch
Comment 8 Yusuke Suzuki 2021-07-27 20:21:25 PDT
Created attachment 434397 [details]
Patch
Comment 9 Yusuke Suzuki 2021-07-27 20:37:09 PDT
Created attachment 434398 [details]
Patch
Comment 10 Yusuke Suzuki 2021-07-27 21:03:41 PDT
Created attachment 434399 [details]
Patch
Comment 11 Yusuke Suzuki 2021-07-27 23:01:13 PDT
Created attachment 434403 [details]
Patch
Comment 12 Yusuke Suzuki 2021-07-28 16:42:14 PDT
Created attachment 434474 [details]
Patch
Comment 13 Sam Weinig 2021-07-28 17:46:49 PDT
Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer.
Comment 14 Yusuke Suzuki 2021-07-28 17:52:36 PDT
(In reply to Sam Weinig from comment #13)
> Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer.

Sounds good, relaced.
Comment 15 Yusuke Suzuki 2021-07-28 17:55:13 PDT
Created attachment 434480 [details]
Patch
Comment 16 Yusuke Suzuki 2021-07-28 17:57:39 PDT
Created attachment 434481 [details]
Patch
Comment 17 Yusuke Suzuki 2021-07-28 18:00:25 PDT
Created attachment 434482 [details]
Patch
Comment 18 Yusuke Suzuki 2021-07-28 18:07:05 PDT
Created attachment 434484 [details]
Patch
Comment 19 Yusuke Suzuki 2021-07-28 18:51:22 PDT
Created attachment 434486 [details]
Patch
Comment 20 Yusuke Suzuki 2021-07-28 18:54:06 PDT
Created attachment 434488 [details]
Patch
Comment 21 Saam Barati 2021-07-29 13:56:24 PDT
Comment on attachment 434488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review

r=me

I also suggest filing/linking bugs for your FIXMEs

> Source/JavaScriptCore/yarr/YarrJIT.cpp:944
> +        BoyerMooreInfo* m_bmInfo;

can we just default this to nullptr so we don't need to set it at the call sites

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2415
> +                            auto vector = makeUniqueRef<BoyerMooreByteVector>(map);
> +                            dataLogLnIf(Options::verboseRegExpCompilation(), "Found bitmap lookahead count:(", mapCount, "),range:[", beginIndex, ", ", endIndex, ")");
> +                            const uint8_t* pointer = vector->data();
> +                            if (const auto* existingPointer = m_codeBlock.findSameVector(op.m_bmInfo->index(), vector.get()))
> +                                pointer = existingPointer;
> +                            else
> +                                m_bmVector.append(WTFMove(vector));

small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map&

> Source/JavaScriptCore/yarr/YarrJIT.h:127
> +    void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership

> Source/JavaScriptCore/yarr/YarrJIT.h:134
> +    void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:144
> +    void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:151
> +    void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:243
> +    // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically.

nit: would be nicer if they just took a const Locker& instead of a comment

> Source/JavaScriptCore/yarr/YarrJIT.h:254
> +    const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const

tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector?

findSameVector is a weirdly generic name
Comment 22 Yusuke Suzuki 2021-07-29 15:05:40 PDT
Comment on attachment 434488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review

Thanks!

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:944
>> +        BoyerMooreInfo* m_bmInfo;
> 
> can we just default this to nullptr so we don't need to set it at the call sites

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2415
>> +                                m_bmVector.append(WTFMove(vector));
> 
> small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map&

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:127
>> +    void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:134
>> +    void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:144
>> +    void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:151
>> +    void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:243
>> +    // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically.
> 
> nit: would be nicer if they just took a const Locker& instead of a comment

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:254
>> +    const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const
> 
> tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector?
> 
> findSameVector is a weirdly generic name

Fixed.
Comment 23 Yusuke Suzuki 2021-07-29 15:26:22 PDT
Committed r280452 (240087@main): <https://commits.webkit.org/240087@main>
Comment 24 Radar WebKit Bug Importer 2021-07-29 15:27:19 PDT
<rdar://problem/81293620>
Comment 25 Yusuke Suzuki 2021-07-30 14:16:53 PDT
Committed r280496 (240128@main): <https://commits.webkit.org/240128@main>