Bug 230469 - Inline RegExp.test JIT code in DFG and FTL
Summary: Inline RegExp.test JIT code in DFG and FTL
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-09-19 18:05 PDT by Michael Saboff
Modified: 2021-11-14 00:08 PST (History)
10 users (show)

See Also:


Attachments
Patch (248.79 KB, patch)
2021-09-19 22:43 PDT, Michael Saboff
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Updated Patch with suggested changes and speculative fixes for EWS failures. (248.86 KB, patch)
2021-09-20 10:39 PDT, Michael Saboff
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Updated patch adding Boyer-Moore searching and test fixes (255.01 KB, patch)
2021-09-25 22:01 PDT, Michael Saboff
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch with speculative fix for Windows (255.02 KB, patch)
2021-09-26 13:24 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch with changes from review (255.67 KB, patch)
2021-10-04 14:11 PDT, Michael Saboff
saam: review+
Details | Formatted Diff | Diff
Patch for landing (252.66 KB, patch)
2021-10-19 10:15 PDT, Michael Saboff
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch for landing (252.67 KB, patch)
2021-10-19 11:05 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2021-09-19 18:05:52 PDT
RegExp.test() is used to check if a string matches, without recording any captured groups.  Currently we call out to C++ code that compiles JIT code for the regular expression, then calls the regular expression, creates a matches array with the result, and then converts the result to a boolean. There are several levels of calls.  If we inlined the RegExp processing and customize it to do what is minimally needed for the "test()" function this would benefit frequently use.  It makes sense to limit the amount of inlined code and continue to use the the current path for larger expressions.
Comment 1 Radar WebKit Bug Importer 2021-09-19 18:06:17 PDT
<rdar://problem/83289752>
Comment 2 Michael Saboff 2021-09-19 22:43:42 PDT
Created attachment 438636 [details]
Patch
Comment 3 Yusuke Suzuki 2021-09-20 00:04:26 PDT
Comment on attachment 438636 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review

Added quick comment.

> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:809
> +                    dataLog(" can't inline expr\n");

We should not emit dataLog.

> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14321
> +        patchpoint->append(ConstrainedValue(stringLength, ValueRep::SomeLateRegister));

Let's add zeroExt(stringLength, Int64) since YarrJIT signature is UCPURegister (to avoid emitting zeroExtend32ToPtr in the YarrJIT side).
Comment 4 Yusuke Suzuki 2021-09-20 00:32:28 PDT
Comment on attachment 438636 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773
> +        Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters);

How does DFG / FTL hold used Bitmap for BoyreMoore search?
It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory.
Comment 5 Michael Saboff 2021-09-20 10:21:34 PDT
Comment on attachment 438636 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773
>> +        Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters);
> 
> How does DFG / FTL hold used Bitmap for BoyreMoore search?
> It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory.

The inline path doesn't use BoyerMoore to save one register.  See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation.

>> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:809
>> +                    dataLog(" can't inline expr\n");
> 
> We should not emit dataLog.

Removed.

>> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14321
>> +        patchpoint->append(ConstrainedValue(stringLength, ValueRep::SomeLateRegister));
> 
> Let's add zeroExt(stringLength, Int64) since YarrJIT signature is UCPURegister (to avoid emitting zeroExtend32ToPtr in the YarrJIT side).

Added.
Comment 6 Yusuke Suzuki 2021-09-20 10:36:27 PDT
Comment on attachment 438636 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review

>>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773
>>> +        Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters);
>> 
>> How does DFG / FTL hold used Bitmap for BoyreMoore search?
>> It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory.
> 
> The inline path doesn't use BoyerMoore to save one register.  See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation.

Hmm, I think we need to enable it: disabling BoyreMoore searching dramatically decreases performance of RegExp in Speedometer2/jQuery-TodoMVC (50%).
Comment 7 Michael Saboff 2021-09-20 10:39:04 PDT
Created attachment 438693 [details]
Updated Patch with suggested changes and speculative fixes for EWS failures.
Comment 8 Michael Saboff 2021-09-21 13:38:28 PDT
(In reply to Yusuke Suzuki from comment #6)
> Comment on attachment 438636 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=438636&action=review
> 
> >>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773
> >>> +        Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters);
> >> 
> >> How does DFG / FTL hold used Bitmap for BoyreMoore search?
> >> It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory.
> > 
> > The inline path doesn't use BoyerMoore to save one register.  See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation.
> 
> Hmm, I think we need to enable it: disabling BoyreMoore searching
> dramatically decreases performance of RegExp in Speedometer2/jQuery-TodoMVC
> (50%).

Will make these changes and repost.
Comment 9 Michael Saboff 2021-09-25 22:01:59 PDT
Created attachment 439286 [details]
Updated patch adding Boyer-Moore searching and test fixes
Comment 10 Michael Saboff 2021-09-26 13:24:26 PDT
Created attachment 439295 [details]
Patch with speculative fix for Windows
Comment 11 Saam Barati 2021-09-28 17:51:37 PDT
Comment on attachment 439286 [details]
Updated patch adding Boyer-Moore searching and test fixes

View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2731
> +    auto jitCodeBlock = regExp->getRegExpJITCodeBlock();
> +    auto inlineCodeStats8Bit = jitCodeBlock->get8BitInlineStats();

How is this not a total race? it seems like m_regExpJITCode pointer is set before the stats are. I think you want some kind of concurrency protocol, probably at the very least a storeStoreFence before storing to the pointer. Especially because you're using this to drive register allocation, this seems super important.

If it were up to me, I'd remove the ensureRegExpJITCode function. I'd allocate my own data structure I pass into ::compile, like this:
```
auto jitCode = makeUnique<Yarr::YarrCodeBlock>();
Yarr::jitCompile(pattern, m_patternString, charSize, vm, *jitCode, Yarr::JITCompileMode::IncludeSubpatterns);
storeStoreFence()
m_regExpJITCode = = WTFMove(jitCode);
```

Something like that. What isn't clear to me is why we're calling ensure at all, instead of straight up allocating it. Like, can that pointer ever be non-null? If so, then my suggestion probably won't work, and you'll need something else.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2746
> +    auto swapRegIfNeeded = [&] {

this whole thing looks so weird. If you didn't use GPRFlushedCallResult I don't think any of these checks would be needed

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2748
> +            swapReg = allocate();

why not use a GPRTemporary?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2796
> +        if (inlineCodeStats8Bit.needsTemp2()) {
> +            JSGlobalObject* globalObjectPtr = m_jit.graph().globalObjectFor(node->origin.semantic);
> +            m_jit.move(CCallHelpers::TrustedImmPtr(globalObjectPtr), globalObjectGPR);
> +        }

"needsTemp2" is a weird name for something that means I need the lexical global object. Why would we ever be dealing with lexical global object here instead of the other global object gpr that is a child to this node?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2874
> +    m_jit.exceptionCheck();

does the exception check only need to be for the callOpeartion?

> Source/JavaScriptCore/runtime/StackAlignment.h:58
> +    unsigned sizeInRegisters = sizeInBytes / sizeof(void*);

I think you want sizeof(CPURegister) instead of void* here, because arm64_32

> Source/JavaScriptCore/runtime/StackAlignment.h:61
> +    if (sizeInRegisters <= CallFrame::headerSizeInRegisters)
> +        return 0;

This check looks weird. What's the deal with it?
Comment 12 Michael Saboff 2021-09-29 13:50:31 PDT
Comment on attachment 439286 [details]
Updated patch adding Boyer-Moore searching and test fixes

View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2731
>> +    auto inlineCodeStats8Bit = jitCodeBlock->get8BitInlineStats();
> 
> How is this not a total race? it seems like m_regExpJITCode pointer is set before the stats are. I think you want some kind of concurrency protocol, probably at the very least a storeStoreFence before storing to the pointer. Especially because you're using this to drive register allocation, this seems super important.
> 
> If it were up to me, I'd remove the ensureRegExpJITCode function. I'd allocate my own data structure I pass into ::compile, like this:
> ```
> auto jitCode = makeUnique<Yarr::YarrCodeBlock>();
> Yarr::jitCompile(pattern, m_patternString, charSize, vm, *jitCode, Yarr::JITCompileMode::IncludeSubpatterns);
> storeStoreFence()
> m_regExpJITCode = = WTFMove(jitCode);
> ```
> 
> Something like that. What isn't clear to me is why we're calling ensure at all, instead of straight up allocating it. Like, can that pointer ever be non-null? If so, then my suggestion probably won't work, and you'll need something else.

The m_regExpJITCode pointer is null before we JIT a RegExp expression.  It will always be null for RegExp's that use the YARR interpreter and not the JIT.  Once we have a m_regExpJITCode, we have stats that at least say we can't inline.

The YarrCodeBlock can point to up to 4 JIT'ed regions, with all combinations of 8 bit / 16 bit and match only / match with sub patterns.  Therefore we don't want to replace the YarrCodeBlock when we compile a possibly additional JIT variant for a given RegExp*.  I believe that is why we have ensureRegExpJITCode().  Once a RegExp has a m_regExpJITCode, it is never deleted only cleared which deletes the code and currently resets the stats.  I will eliminate the clearing of the stats, since that information is useful for inlining even if the corresponding out of line YARR JIT code has been deleted.  I'll  reorder the setting of the stats so that m_canInline is the last things set with it preceded by a store store fence.

 I'll add a null check in DFGStrengthReduction where we first access the m_regExpJITCode for a given RegExp.  I'll also add a null check at the top of this function and call the operation if it is null.

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2746
>> +    auto swapRegIfNeeded = [&] {
> 
> this whole thing looks so weird. If you didn't use GPRFlushedCallResult I don't think any of these checks would be needed

Since there is the possibility we will call out to C++, I use GPRFlushedCallResult.  Therefore we may need to swap registers.

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2748
>> +            swapReg = allocate();
> 
> why not use a GPRTemporary?

I was following the pattern I see in compileArithDiv() & compileArithMod().

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2796
>> +        }
> 
> "needsTemp2" is a weird name for something that means I need the lexical global object. Why would we ever be dealing with lexical global object here instead of the other global object gpr that is a child to this node?

If we need Temp2 on X86_64, we run out of registers.  I figured I'd use the global object register for the inline code and materialize the GO afterwards.

I'll change it so that we always materialize the constant GlobalObject for node1 since strength reduction has already proven that.

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2874
>> +    m_jit.exceptionCheck();
> 
> does the exception check only need to be for the callOpeartion?

Yes.  The inline path doesn't throw any exceptions.

>> Source/JavaScriptCore/runtime/StackAlignment.h:58
>> +    unsigned sizeInRegisters = sizeInBytes / sizeof(void*);
> 
> I think you want sizeof(CPURegister) instead of void* here, because arm64_32

Will change.

>> Source/JavaScriptCore/runtime/StackAlignment.h:61
>> +        return 0;
> 
> This check looks weird. What's the deal with it?

See the return statement below.  If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number.

This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header.  The rest of the out going stack space is based on argument count.  This function converts the YARR stack needs to argument count.

Want a comment?
Comment 13 Michael Saboff 2021-10-04 14:11:49 PDT
Created attachment 440102 [details]
Patch with changes from review
Comment 14 Saam Barati 2021-10-13 19:05:18 PDT
Comment on attachment 439286 [details]
Updated patch adding Boyer-Moore searching and test fixes

View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review

>>> Source/JavaScriptCore/runtime/StackAlignment.h:61
>>> +        return 0;
>> 
>> This check looks weird. What's the deal with it?
> 
> See the return statement below.  If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number.
> 
> This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header.  The rest of the out going stack space is based on argument count.  This function converts the YARR stack needs to argument count.
> 
> Want a comment?

Gotcha, I suppose I glossed right over that subtract.

The people calling this function, why would they pass < headerSizeInRegisters?
Comment 15 Michael Saboff 2021-10-13 20:32:14 PDT
(In reply to Saam Barati from comment #14)
> Comment on attachment 439286 [details]
> Updated patch adding Boyer-Moore searching and test fixes
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=439286&action=review
> 
> >>> Source/JavaScriptCore/runtime/StackAlignment.h:61
> >>> +        return 0;
> >> 
> >> This check looks weird. What's the deal with it?
> > 
> > See the return statement below.  If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number.
> > 
> > This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header.  The rest of the out going stack space is based on argument count.  This function converts the YARR stack needs to argument count.
> > 
> > Want a comment?
> 
> Gotcha, I suppose I glossed right over that subtract.
> 
> The people calling this function, why would they pass <
> headerSizeInRegisters?

As I said above, the count coming from the YARR JIT engine could be 0 or 8.  This is the space that YARR uses to save state needed for backtracking and not the space for it to make an outgoing call.
Comment 16 Saam Barati 2021-10-14 13:23:58 PDT
Comment on attachment 440102 [details]
Patch with changes from review

View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review

> Source/JavaScriptCore/dfg/DFGClobberize.h:686
> +#define ABSTRACT_HEAP_NOT_RegExpObject_lastIndex(name) if (name != InvalidAbstractHeap && \
> +    name != InvalidAbstractHeap && \
> +    name != World && \
> +    name != Stack && \
> +    name != Heap && \
> +    name != RegExpObject_lastIndex) \
> +        write(name);
> +    FOR_EACH_ABSTRACT_HEAP_KIND(ABSTRACT_HEAP_NOT_RegExpObject_lastIndex)

😊

> Source/JavaScriptCore/dfg/DFGCommonData.h:126
> +    Yarr::YarrBoyerMoyerData m_boyerMooreData;

is it ok for this to be shared between all RegExpTestInline nodes?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2710
> +    GPRFlushedCallResult result(this);

I'm not sure anything is preventing the above from allocating into rax (on x86). Is it ok if one of them do and have the same value as resultGPR?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2749
> +            swapReg = allocate();

I still think it's simpler to have something like std::optional<GPRTemporary> that we emplace here. Not sure if there is a reason not to use that approach? The benefit is it frees us from worrying about calling unlock() below.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2856
> +    swapRegIfNeeded();

this seems slightly wrong to do after flushRegisters above. It might be innocuous since we don't associate it with any particular value in the graph, but logically, I think these should be swapped in order with flushRegisters. Like, the call below might clobber the swap we allocate(), and if we had associated it with a value, the register allocation state might be wrong.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2876
> +    m_jit.exceptionCheck();

should this only go after the callOperation?

> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14297
> +        LValue globalObject = lowCell(m_node->child1());

Isn't this a constant? Why is it still in child1? Same goes to code in DFG.

I think we can just change the code to use the constant in op info?

This should be sound since we're using KnownCelluse on child1, so we don't need to preserve the "type check".

> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14404
> +        patchpoint->effects = Effects::none();

This doesn't seem right. At the very least we're storing some stuff to global object, etc.
Don't we also read from the Heap?
Comment 17 Saam Barati 2021-10-14 13:29:00 PDT
Comment on attachment 440102 [details]
Patch with changes from review

View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review

> Source/JavaScriptCore/yarr/YarrJITRegisters.h:38
> +struct YarrJITDefaultRegisters : private MacroAssembler {

Does this need to inherit from MacroAssembler?

> Source/JavaScriptCore/yarr/YarrJITRegisters.h:151
> +class YarrJITRegisters : private MacroAssembler {

ditto
Comment 18 Saam Barati 2021-10-14 15:00:27 PDT
Comment on attachment 440102 [details]
Patch with changes from review

View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review

r=me with the stuff we spoke about when talking on the phone

> Source/JavaScriptCore/yarr/YarrJIT.cpp:4100
> +        m_boyerMooreData = dynamic_cast<YarrBoyerMoyerData*>(m_codeBlock);

dynamic_cast?
Why not static_cast?

> Source/JavaScriptCore/yarr/YarrJITRegisters.h:32
> +#define YARR_CALL

don't think this is needed.
Comment 19 Michael Saboff 2021-10-19 10:15:28 PDT
Created attachment 441749 [details]
Patch for landing

Incorporated changes in response to reviewer's comments.
Comment 20 Michael Saboff 2021-10-19 11:05:32 PDT
Created attachment 441759 [details]
Patch for landing

Fixed Windows build issue.
Comment 21 Michael Saboff 2021-11-11 13:02:35 PST
Committed r285651 (244149@main): <https://commits.webkit.org/244149@main>