RESOLVED FIXED 230469
Inline RegExp.test JIT code in DFG and FTL
https://bugs.webkit.org/show_bug.cgi?id=230469
Summary Inline RegExp.test JIT code in DFG and FTL
Michael Saboff
Reported 2021-09-19 18:05:52 PDT
RegExp.test() is used to check if a string matches, without recording any captured groups. Currently we call out to C++ code that compiles JIT code for the regular expression, then calls the regular expression, creates a matches array with the result, and then converts the result to a boolean. There are several levels of calls. If we inlined the RegExp processing and customize it to do what is minimally needed for the "test()" function this would benefit frequently use. It makes sense to limit the amount of inlined code and continue to use the the current path for larger expressions.
Attachments
Patch (248.79 KB, patch)
2021-09-19 22:43 PDT, Michael Saboff
ews-feeder: commit-queue-
Updated Patch with suggested changes and speculative fixes for EWS failures. (248.86 KB, patch)
2021-09-20 10:39 PDT, Michael Saboff
ews-feeder: commit-queue-
Updated patch adding Boyer-Moore searching and test fixes (255.01 KB, patch)
2021-09-25 22:01 PDT, Michael Saboff
ews-feeder: commit-queue-
Patch with speculative fix for Windows (255.02 KB, patch)
2021-09-26 13:24 PDT, Michael Saboff
no flags
Patch with changes from review (255.67 KB, patch)
2021-10-04 14:11 PDT, Michael Saboff
saam: review+
Patch for landing (252.66 KB, patch)
2021-10-19 10:15 PDT, Michael Saboff
ews-feeder: commit-queue-
Patch for landing (252.67 KB, patch)
2021-10-19 11:05 PDT, Michael Saboff
no flags
Radar WebKit Bug Importer
Comment 1 2021-09-19 18:06:17 PDT
Michael Saboff
Comment 2 2021-09-19 22:43:42 PDT
Yusuke Suzuki
Comment 3 2021-09-20 00:04:26 PDT
Comment on attachment 438636 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review Added quick comment. > Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:809 > + dataLog(" can't inline expr\n"); We should not emit dataLog. > Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14321 > + patchpoint->append(ConstrainedValue(stringLength, ValueRep::SomeLateRegister)); Let's add zeroExt(stringLength, Int64) since YarrJIT signature is UCPURegister (to avoid emitting zeroExtend32ToPtr in the YarrJIT side).
Yusuke Suzuki
Comment 4 2021-09-20 00:32:28 PDT
Comment on attachment 438636 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773 > + Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters); How does DFG / FTL hold used Bitmap for BoyreMoore search? It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory.
Michael Saboff
Comment 5 2021-09-20 10:21:34 PDT
Comment on attachment 438636 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773 >> + Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters); > > How does DFG / FTL hold used Bitmap for BoyreMoore search? > It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory. The inline path doesn't use BoyerMoore to save one register. See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation. >> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:809 >> + dataLog(" can't inline expr\n"); > > We should not emit dataLog. Removed. >> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14321 >> + patchpoint->append(ConstrainedValue(stringLength, ValueRep::SomeLateRegister)); > > Let's add zeroExt(stringLength, Int64) since YarrJIT signature is UCPURegister (to avoid emitting zeroExtend32ToPtr in the YarrJIT side). Added.
Yusuke Suzuki
Comment 6 2021-09-20 10:36:27 PDT
Comment on attachment 438636 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438636&action=review >>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773 >>> + Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters); >> >> How does DFG / FTL hold used Bitmap for BoyreMoore search? >> It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory. > > The inline path doesn't use BoyerMoore to save one register. See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation. Hmm, I think we need to enable it: disabling BoyreMoore searching dramatically decreases performance of RegExp in Speedometer2/jQuery-TodoMVC (50%).
Michael Saboff
Comment 7 2021-09-20 10:39:04 PDT
Created attachment 438693 [details] Updated Patch with suggested changes and speculative fixes for EWS failures.
Michael Saboff
Comment 8 2021-09-21 13:38:28 PDT
(In reply to Yusuke Suzuki from comment #6) > Comment on attachment 438636 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=438636&action=review > > >>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2773 > >>> + Yarr::jitCompileInlinedTest(&m_graph.m_stackChecker, regExp->pattern(), regExp->flags(), Yarr::CharSize::Char8, &vm(), m_jit, yarrRegisters); > >> > >> How does DFG / FTL hold used Bitmap for BoyreMoore search? > >> It is held by YarrCodeBlock. Which can be cleared. If DFG / FTL does not keep the bitmap used inside the inlined code, then we will access the freed memory. > > > > The inline path doesn't use BoyerMoore to save one register. See YarrJIT.cpp ~3811, where we create and use BoyerMoore info where we don't go down that path for InlineTest code generation. > > Hmm, I think we need to enable it: disabling BoyreMoore searching > dramatically decreases performance of RegExp in Speedometer2/jQuery-TodoMVC > (50%). Will make these changes and repost.
Michael Saboff
Comment 9 2021-09-25 22:01:59 PDT
Created attachment 439286 [details] Updated patch adding Boyer-Moore searching and test fixes
Michael Saboff
Comment 10 2021-09-26 13:24:26 PDT
Created attachment 439295 [details] Patch with speculative fix for Windows
Saam Barati
Comment 11 2021-09-28 17:51:37 PDT
Comment on attachment 439286 [details] Updated patch adding Boyer-Moore searching and test fixes View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2731 > + auto jitCodeBlock = regExp->getRegExpJITCodeBlock(); > + auto inlineCodeStats8Bit = jitCodeBlock->get8BitInlineStats(); How is this not a total race? it seems like m_regExpJITCode pointer is set before the stats are. I think you want some kind of concurrency protocol, probably at the very least a storeStoreFence before storing to the pointer. Especially because you're using this to drive register allocation, this seems super important. If it were up to me, I'd remove the ensureRegExpJITCode function. I'd allocate my own data structure I pass into ::compile, like this: ``` auto jitCode = makeUnique<Yarr::YarrCodeBlock>(); Yarr::jitCompile(pattern, m_patternString, charSize, vm, *jitCode, Yarr::JITCompileMode::IncludeSubpatterns); storeStoreFence() m_regExpJITCode = = WTFMove(jitCode); ``` Something like that. What isn't clear to me is why we're calling ensure at all, instead of straight up allocating it. Like, can that pointer ever be non-null? If so, then my suggestion probably won't work, and you'll need something else. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2746 > + auto swapRegIfNeeded = [&] { this whole thing looks so weird. If you didn't use GPRFlushedCallResult I don't think any of these checks would be needed > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2748 > + swapReg = allocate(); why not use a GPRTemporary? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2796 > + if (inlineCodeStats8Bit.needsTemp2()) { > + JSGlobalObject* globalObjectPtr = m_jit.graph().globalObjectFor(node->origin.semantic); > + m_jit.move(CCallHelpers::TrustedImmPtr(globalObjectPtr), globalObjectGPR); > + } "needsTemp2" is a weird name for something that means I need the lexical global object. Why would we ever be dealing with lexical global object here instead of the other global object gpr that is a child to this node? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2874 > + m_jit.exceptionCheck(); does the exception check only need to be for the callOpeartion? > Source/JavaScriptCore/runtime/StackAlignment.h:58 > + unsigned sizeInRegisters = sizeInBytes / sizeof(void*); I think you want sizeof(CPURegister) instead of void* here, because arm64_32 > Source/JavaScriptCore/runtime/StackAlignment.h:61 > + if (sizeInRegisters <= CallFrame::headerSizeInRegisters) > + return 0; This check looks weird. What's the deal with it?
Michael Saboff
Comment 12 2021-09-29 13:50:31 PDT
Comment on attachment 439286 [details] Updated patch adding Boyer-Moore searching and test fixes View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2731 >> + auto inlineCodeStats8Bit = jitCodeBlock->get8BitInlineStats(); > > How is this not a total race? it seems like m_regExpJITCode pointer is set before the stats are. I think you want some kind of concurrency protocol, probably at the very least a storeStoreFence before storing to the pointer. Especially because you're using this to drive register allocation, this seems super important. > > If it were up to me, I'd remove the ensureRegExpJITCode function. I'd allocate my own data structure I pass into ::compile, like this: > ``` > auto jitCode = makeUnique<Yarr::YarrCodeBlock>(); > Yarr::jitCompile(pattern, m_patternString, charSize, vm, *jitCode, Yarr::JITCompileMode::IncludeSubpatterns); > storeStoreFence() > m_regExpJITCode = = WTFMove(jitCode); > ``` > > Something like that. What isn't clear to me is why we're calling ensure at all, instead of straight up allocating it. Like, can that pointer ever be non-null? If so, then my suggestion probably won't work, and you'll need something else. The m_regExpJITCode pointer is null before we JIT a RegExp expression. It will always be null for RegExp's that use the YARR interpreter and not the JIT. Once we have a m_regExpJITCode, we have stats that at least say we can't inline. The YarrCodeBlock can point to up to 4 JIT'ed regions, with all combinations of 8 bit / 16 bit and match only / match with sub patterns. Therefore we don't want to replace the YarrCodeBlock when we compile a possibly additional JIT variant for a given RegExp*. I believe that is why we have ensureRegExpJITCode(). Once a RegExp has a m_regExpJITCode, it is never deleted only cleared which deletes the code and currently resets the stats. I will eliminate the clearing of the stats, since that information is useful for inlining even if the corresponding out of line YARR JIT code has been deleted. I'll reorder the setting of the stats so that m_canInline is the last things set with it preceded by a store store fence. I'll add a null check in DFGStrengthReduction where we first access the m_regExpJITCode for a given RegExp. I'll also add a null check at the top of this function and call the operation if it is null. >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2746 >> + auto swapRegIfNeeded = [&] { > > this whole thing looks so weird. If you didn't use GPRFlushedCallResult I don't think any of these checks would be needed Since there is the possibility we will call out to C++, I use GPRFlushedCallResult. Therefore we may need to swap registers. >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2748 >> + swapReg = allocate(); > > why not use a GPRTemporary? I was following the pattern I see in compileArithDiv() & compileArithMod(). >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2796 >> + } > > "needsTemp2" is a weird name for something that means I need the lexical global object. Why would we ever be dealing with lexical global object here instead of the other global object gpr that is a child to this node? If we need Temp2 on X86_64, we run out of registers. I figured I'd use the global object register for the inline code and materialize the GO afterwards. I'll change it so that we always materialize the constant GlobalObject for node1 since strength reduction has already proven that. >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2874 >> + m_jit.exceptionCheck(); > > does the exception check only need to be for the callOpeartion? Yes. The inline path doesn't throw any exceptions. >> Source/JavaScriptCore/runtime/StackAlignment.h:58 >> + unsigned sizeInRegisters = sizeInBytes / sizeof(void*); > > I think you want sizeof(CPURegister) instead of void* here, because arm64_32 Will change. >> Source/JavaScriptCore/runtime/StackAlignment.h:61 >> + return 0; > > This check looks weird. What's the deal with it? See the return statement below. If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number. This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header. The rest of the out going stack space is based on argument count. This function converts the YARR stack needs to argument count. Want a comment?
Michael Saboff
Comment 13 2021-10-04 14:11:49 PDT
Created attachment 440102 [details] Patch with changes from review
Saam Barati
Comment 14 2021-10-13 19:05:18 PDT
Comment on attachment 439286 [details] Updated patch adding Boyer-Moore searching and test fixes View in context: https://bugs.webkit.org/attachment.cgi?id=439286&action=review >>> Source/JavaScriptCore/runtime/StackAlignment.h:61 >>> + return 0; >> >> This check looks weird. What's the deal with it? > > See the return statement below. If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number. > > This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header. The rest of the out going stack space is based on argument count. This function converts the YARR stack needs to argument count. > > Want a comment? Gotcha, I suppose I glossed right over that subtract. The people calling this function, why would they pass < headerSizeInRegisters?
Michael Saboff
Comment 15 2021-10-13 20:32:14 PDT
(In reply to Saam Barati from comment #14) > Comment on attachment 439286 [details] > Updated patch adding Boyer-Moore searching and test fixes > > View in context: > https://bugs.webkit.org/attachment.cgi?id=439286&action=review > > >>> Source/JavaScriptCore/runtime/StackAlignment.h:61 > >>> + return 0; > >> > >> This check looks weird. What's the deal with it? > > > > See the return statement below. If sizeInRegisters is < CallFrame::headerSizeInRegisters, then we underflow on the unsigned subtraction and return a huge number. > > > > This code handles the case where the YARR JIT may not need any space or space that is equal to the call frame header. The rest of the out going stack space is based on argument count. This function converts the YARR stack needs to argument count. > > > > Want a comment? > > Gotcha, I suppose I glossed right over that subtract. > > The people calling this function, why would they pass < > headerSizeInRegisters? As I said above, the count coming from the YARR JIT engine could be 0 or 8. This is the space that YARR uses to save state needed for backtracking and not the space for it to make an outgoing call.
Saam Barati
Comment 16 2021-10-14 13:23:58 PDT
Comment on attachment 440102 [details] Patch with changes from review View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review > Source/JavaScriptCore/dfg/DFGClobberize.h:686 > +#define ABSTRACT_HEAP_NOT_RegExpObject_lastIndex(name) if (name != InvalidAbstractHeap && \ > + name != InvalidAbstractHeap && \ > + name != World && \ > + name != Stack && \ > + name != Heap && \ > + name != RegExpObject_lastIndex) \ > + write(name); > + FOR_EACH_ABSTRACT_HEAP_KIND(ABSTRACT_HEAP_NOT_RegExpObject_lastIndex) 😊 > Source/JavaScriptCore/dfg/DFGCommonData.h:126 > + Yarr::YarrBoyerMoyerData m_boyerMooreData; is it ok for this to be shared between all RegExpTestInline nodes? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2710 > + GPRFlushedCallResult result(this); I'm not sure anything is preventing the above from allocating into rax (on x86). Is it ok if one of them do and have the same value as resultGPR? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2749 > + swapReg = allocate(); I still think it's simpler to have something like std::optional<GPRTemporary> that we emplace here. Not sure if there is a reason not to use that approach? The benefit is it frees us from worrying about calling unlock() below. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2856 > + swapRegIfNeeded(); this seems slightly wrong to do after flushRegisters above. It might be innocuous since we don't associate it with any particular value in the graph, but logically, I think these should be swapped in order with flushRegisters. Like, the call below might clobber the swap we allocate(), and if we had associated it with a value, the register allocation state might be wrong. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2876 > + m_jit.exceptionCheck(); should this only go after the callOperation? > Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14297 > + LValue globalObject = lowCell(m_node->child1()); Isn't this a constant? Why is it still in child1? Same goes to code in DFG. I think we can just change the code to use the constant in op info? This should be sound since we're using KnownCelluse on child1, so we don't need to preserve the "type check". > Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:14404 > + patchpoint->effects = Effects::none(); This doesn't seem right. At the very least we're storing some stuff to global object, etc. Don't we also read from the Heap?
Saam Barati
Comment 17 2021-10-14 13:29:00 PDT
Comment on attachment 440102 [details] Patch with changes from review View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review > Source/JavaScriptCore/yarr/YarrJITRegisters.h:38 > +struct YarrJITDefaultRegisters : private MacroAssembler { Does this need to inherit from MacroAssembler? > Source/JavaScriptCore/yarr/YarrJITRegisters.h:151 > +class YarrJITRegisters : private MacroAssembler { ditto
Saam Barati
Comment 18 2021-10-14 15:00:27 PDT
Comment on attachment 440102 [details] Patch with changes from review View in context: https://bugs.webkit.org/attachment.cgi?id=440102&action=review r=me with the stuff we spoke about when talking on the phone > Source/JavaScriptCore/yarr/YarrJIT.cpp:4100 > + m_boyerMooreData = dynamic_cast<YarrBoyerMoyerData*>(m_codeBlock); dynamic_cast? Why not static_cast? > Source/JavaScriptCore/yarr/YarrJITRegisters.h:32 > +#define YARR_CALL don't think this is needed.
Michael Saboff
Comment 19 2021-10-19 10:15:28 PDT
Created attachment 441749 [details] Patch for landing Incorporated changes in response to reviewer's comments.
Michael Saboff
Comment 20 2021-10-19 11:05:32 PDT
Created attachment 441759 [details] Patch for landing Fixed Windows build issue.
Michael Saboff
Comment 21 2021-11-11 13:02:35 PST
Note You need to log in before you can comment on or make changes to this bug.