Bug 228301 - [JSC] Yarr should perform BoyerMoore search
Summary: [JSC] Yarr should perform BoyerMoore search
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-07-26 14:08 PDT by Yusuke Suzuki
Modified: 2021-07-30 14:16 PDT (History)
11 users (show)

See Also:


Attachments
Patch (7.58 KB, patch)
2021-07-26 14:18 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (8.26 KB, patch)
2021-07-26 14:24 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (23.67 KB, patch)
2021-07-27 14:54 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (23.59 KB, patch)
2021-07-27 14:58 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (24.00 KB, patch)
2021-07-27 15:13 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (24.01 KB, patch)
2021-07-27 16:35 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (26.73 KB, patch)
2021-07-27 19:43 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (28.76 KB, patch)
2021-07-27 20:21 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (28.87 KB, patch)
2021-07-27 20:37 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (29.03 KB, patch)
2021-07-27 21:03 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (29.12 KB, patch)
2021-07-27 23:01 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (46.44 KB, patch)
2021-07-28 16:42 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (46.91 KB, patch)
2021-07-28 17:55 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (46.98 KB, patch)
2021-07-28 17:57 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (47.27 KB, patch)
2021-07-28 18:00 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (47.29 KB, patch)
2021-07-28 18:07 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (47.78 KB, patch)
2021-07-28 18:51 PDT, Yusuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (47.83 KB, patch)
2021-07-28 18:54 PDT, Yusuke Suzuki
saam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2021-07-26 14:08:46 PDT
We would like to start the simple one. Let's try to deploy BM search for the top-level disjunction first.
Then, let's extend the coverage for the other disjunctions.
Comment 1 Yusuke Suzuki 2021-07-26 14:18:47 PDT
Created attachment 434238 [details]
Patch
Comment 2 Yusuke Suzuki 2021-07-26 14:24:52 PDT
Created attachment 434239 [details]
Patch
Comment 3 Yusuke Suzuki 2021-07-27 14:54:43 PDT
Created attachment 434319 [details]
Patch
Comment 4 Yusuke Suzuki 2021-07-27 14:58:32 PDT
Created attachment 434320 [details]
Patch
Comment 5 Yusuke Suzuki 2021-07-27 15:13:55 PDT
Created attachment 434343 [details]
Patch
Comment 6 Yusuke Suzuki 2021-07-27 16:35:06 PDT
Created attachment 434384 [details]
Patch
Comment 7 Yusuke Suzuki 2021-07-27 19:43:09 PDT
Created attachment 434394 [details]
Patch
Comment 8 Yusuke Suzuki 2021-07-27 20:21:25 PDT
Created attachment 434397 [details]
Patch
Comment 9 Yusuke Suzuki 2021-07-27 20:37:09 PDT
Created attachment 434398 [details]
Patch
Comment 10 Yusuke Suzuki 2021-07-27 21:03:41 PDT
Created attachment 434399 [details]
Patch
Comment 11 Yusuke Suzuki 2021-07-27 23:01:13 PDT
Created attachment 434403 [details]
Patch
Comment 12 Yusuke Suzuki 2021-07-28 16:42:14 PDT
Created attachment 434474 [details]
Patch
Comment 13 Sam Weinig 2021-07-28 17:46:49 PDT
Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer.
Comment 14 Yusuke Suzuki 2021-07-28 17:52:36 PDT
(In reply to Sam Weinig from comment #13)
> Fun! I would replace "BM" with "BoyerMoore" to make it a bit clearer.

Sounds good, relaced.
Comment 15 Yusuke Suzuki 2021-07-28 17:55:13 PDT
Created attachment 434480 [details]
Patch
Comment 16 Yusuke Suzuki 2021-07-28 17:57:39 PDT
Created attachment 434481 [details]
Patch
Comment 17 Yusuke Suzuki 2021-07-28 18:00:25 PDT
Created attachment 434482 [details]
Patch
Comment 18 Yusuke Suzuki 2021-07-28 18:07:05 PDT
Created attachment 434484 [details]
Patch
Comment 19 Yusuke Suzuki 2021-07-28 18:51:22 PDT
Created attachment 434486 [details]
Patch
Comment 20 Yusuke Suzuki 2021-07-28 18:54:06 PDT
Created attachment 434488 [details]
Patch
Comment 21 Saam Barati 2021-07-29 13:56:24 PDT
Comment on attachment 434488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review

r=me

I also suggest filing/linking bugs for your FIXMEs

> Source/JavaScriptCore/yarr/YarrJIT.cpp:944
> +        BoyerMooreInfo* m_bmInfo;

can we just default this to nullptr so we don't need to set it at the call sites

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2415
> +                            auto vector = makeUniqueRef<BoyerMooreByteVector>(map);
> +                            dataLogLnIf(Options::verboseRegExpCompilation(), "Found bitmap lookahead count:(", mapCount, "),range:[", beginIndex, ", ", endIndex, ")");
> +                            const uint8_t* pointer = vector->data();
> +                            if (const auto* existingPointer = m_codeBlock.findSameVector(op.m_bmInfo->index(), vector.get()))
> +                                pointer = existingPointer;
> +                            else
> +                                m_bmVector.append(WTFMove(vector));

small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map&

> Source/JavaScriptCore/yarr/YarrJIT.h:127
> +    void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership

> Source/JavaScriptCore/yarr/YarrJIT.h:134
> +    void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:144
> +    void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:151
> +    void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)

ditto

> Source/JavaScriptCore/yarr/YarrJIT.h:243
> +    // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically.

nit: would be nicer if they just took a const Locker& instead of a comment

> Source/JavaScriptCore/yarr/YarrJIT.h:254
> +    const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const

tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector?

findSameVector is a weirdly generic name
Comment 22 Yusuke Suzuki 2021-07-29 15:05:40 PDT
Comment on attachment 434488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434488&action=review

Thanks!

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:944
>> +        BoyerMooreInfo* m_bmInfo;
> 
> can we just default this to nullptr so we don't need to set it at the call sites

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2415
>> +                                m_bmVector.append(WTFMove(vector));
> 
> small nit: I think it's nicer to encapsulate this logic in its own function, so we don't mistakenly use a stale pointer somehow. Like, "getBMByteVector" or something, and it take in BoyerMooreBitmap::Map&

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:127
>> +    void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> this should take Vector& not Vector&& unless it actually takes ownership over "maps". Alternatively, it can take "Vector", and take ownership

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:134
>> +    void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:144
>> +    void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:151
>> +    void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>>&& maps)
> 
> ditto

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:243
>> +    // Compile and Clear are guarded by RegExp's cell-lock. So these operations work atomically.
> 
> nit: would be nicer if they just took a const Locker& instead of a comment

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.h:254
>> +    const uint8_t* findSameVector(unsigned index, BoyerMooreByteVector& vector) const
> 
> tryFindSameBoyerMooreVector? or tryFindSameBMVector? or tryReuseBMVector?
> 
> findSameVector is a weirdly generic name

Fixed.
Comment 23 Yusuke Suzuki 2021-07-29 15:26:22 PDT
Committed r280452 (240087@main): <https://commits.webkit.org/240087@main>
Comment 24 Radar WebKit Bug Importer 2021-07-29 15:27:19 PDT
<rdar://problem/81293620>
Comment 25 Yusuke Suzuki 2021-07-30 14:16:53 PDT
Committed r280496 (240128@main): <https://commits.webkit.org/240128@main>