As indicated in the title, we would like to turn recursive tail calls into simple loop, preferably before optimizations like LICM for optimal performance.
We expect this to be a big win on the (admittedly contrived) benchmark in https://bugs.webkit.org/show_bug.cgi?id=175754.
There are several difficulties. The main one is the interaction with inlining.
Consider the following example: A() calls B() that calls C() that tail calls B ().
If the inliner inlines C into B into A, then we want to insert a jump from the end of C to the beginning of B, in A.
But currently the beginning of B might be in the middle of a basic block, so we must change the inliner to split the basic block.
More surprisingly, the transformation itself must also happen at the same time and not in a later phase (for example after Fixup).
Otherwise the intervening passes would not see the control flow edge, and might hoist parts of B() above the entry point of B (into the original code of A).
So I will do the basic block splitting in DFGBytecodeParser::inlineCall and the actual transformation in DFGBytecodeParser::handleInlining.
In the first time, we will only do the optimization for monomorphic non-varargs calls for simplicity. It should be straightforward (or at least possible) to extend it to the other cases later on.
Attachment 320892[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/ChangeLog:9: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:10: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6282: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4]
Total errors found: 3 in 2 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 320897[details]
Archive of layout-test-results from ews103 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 320898[details]
Archive of layout-test-results from ews113 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 320899[details]
Archive of layout-test-results from ews106 for mac-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Created attachment 320900[details]
Archive of layout-test-results from ews125 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.5
Created attachment 321543[details]
Archive of layout-test-results from ews114 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 321544[details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Created attachment 321546[details]
Archive of layout-test-results from ews103 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 321547[details]
Archive of layout-test-results from ews121 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews121 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6
Comment on attachment 321539[details]
WIP
View in context: https://bugs.webkit.org/attachment.cgi?id=321539&action=review> Source/JavaScriptCore/ChangeLog:15
> + This makes it a lot easier to make sure that they are in the "right" inlineStackEntry, i.e. the one corresponding to the function that their successor comes from.
I'm not sure I follow this comment. What's "right" here? What's "successor"?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1652
> + // If the current block is not empty, we must split it so that recursive tail calls in the inlinee have somewhere to jump to.
I don't think you mean empty here given assert below. Perhaps you just mean if the block only has code for the op_call?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1661
> + if (m_currentIndex != m_currentBlock->bytecodeBegin) {
> + ASSERT(!m_currentBlock->isEmpty());
> + Node* jumpNode = addToGraph(Jump);
> + Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, m_currentBlock->executionCount));
> + jumpNode->targetBlock() = block.ptr();
> + m_inlineStackTop->m_blockLinkingTargets.append(block.ptr());
> + m_currentBlock = block.ptr();
> + m_graph.appendBlock(WTFMove(block));
> + }
This split needs to happen after we process the setLocalQueue we may have appended to above. Otherwise, if we split, we'll generate invalid IR. I think a simple tail call that needs arity fixup will trigger this.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6352
> + Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, 1)); // TODO: why 1 ?
Maybe we just start all blocks at 1? I know we change this value in the StaticExecutionCountEstimationPhase
Attachment 321681[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1311: Should have a space between // and comment [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1312: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4136: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4138: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4144: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4]
Total errors found: 5 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 321685[details]
Archive of layout-test-results from ews100 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 321686[details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Created attachment 321687[details]
Archive of layout-test-results from ews112 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 321689[details]
Archive of layout-test-results from ews126 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6
Comment on attachment 321826[details]
WIP, frustrating heisenbug
View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:83
> +#define VERBOSE_LOG(...) do { \
> +if (Options::verboseDFGByteCodeParsing()) \
> + dataLog(__VA_ARGS__); \
> +} while (0)
Nit: You could make this
#define VERBOSE_LOG(...) dataLogIf(Options::verboseDFGByteCodeParsing(), __VA_ARGS__);
Comment on attachment 321826[details]
WIP, frustrating heisenbug
View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1278
> + for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i)
> + set(virtualRegisterForLocal(i), undefined, ImmediateNakedSet); // TODO: check that it is still reasonable to do this "ImmediateNakedSet" stuff here.
This is wrong. You need setDirect here and you need to map the virtual register manually
Attachment 322183[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/ChangeLog:29: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:30: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:31: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4166: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4168: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4173: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6285: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4]
Total errors found: 9 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 322185[details]
Archive of layout-test-results from ews104 for mac-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Created attachment 322186[details]
Archive of layout-test-results from ews112 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 322187[details]
Archive of layout-test-results from ews103 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Created attachment 322455[details]
Fix stupid bug in handleInlining, everything optimizing tail calls is commented out, it should hopefully pass ews tests this time
Attachment 322455[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6265: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4]
Total errors found: 6 in 6 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Attachment 322511[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6264: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4]
Total errors found: 6 in 6 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Attachment 322545[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1313: Multi line control clauses should use braces. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Should have a space between // and comment [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4178: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4180: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4185: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6267: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4]
Total errors found: 7 in 6 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 322545[details]
Commented out the buggy splitting of entry block
View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review
Seems reasonable but there are some things you should clean up.
> Source/JavaScriptCore/ChangeLog:17
> + - m_linkingTargets is a dictionnary from bytecode indices to BasicBlock*
typo: dictionnary -> dictionary.
> Source/JavaScriptCore/ChangeLog:21
> + terminal that need to be taken off m_unlinkedBlocks.
typo: needs
> Source/JavaScriptCore/ChangeLog:28
> + - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability
> + - The 9 argument version was cleaned up and simplified
I feel like something is wrong when a function has more than 6 arguments... :(
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236
> + BasicBlock* blockPtr = block.ptr();
Nit: you could probably drop this line.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273
> + Node* callTarget = get(VirtualRegister(pc[2].u.operand));
> + int argumentCountIncludingThis = pc[3].u.operand;
> + int registerOffset = -pc[4].u.operand;
It would be great if you gave these names in the BytecodeList.json and used them here.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1325
> +bool ByteCodeParser::handleRecursiveTailCall(Node* callTargetNode, VirtualRegister thisArgument, CallLinkStatus callLinkStatus)
> +{
> + // We currently only do this optimisation in the simple, non-polymorphic case.
> + if (callLinkStatus.couldTakeSlowPath() || callLinkStatus.size() != 1)
> + return false;
> +
> + auto targetExecutable = callLinkStatus[0].executable();
> + InlineStackEntry* stackEntry = m_inlineStackTop;
> + do {
> + if (targetExecutable != stackEntry->executable())
> + continue;
> +
> + VERBOSE_LOG(" We found a recursive tail, optimizing it to a jump to block ", *stackEntry->m_callsiteBlockHead, ".\n");
> + // TODO: set the arguments to the right value
> + // TODO: shadow chicken log
> +
> + // We must also add some check that the profiling information was correct and the target of this call is what we thought
> + emitFunctionChecks(callLinkStatus[0], callTargetNode, thisArgument);
> +
> + VERBOSE_LOG(" Repeating the work of op_enter as we are jumping to right after it. \n");
> + Node* undefined = addToGraph(JSConstant, OpInfo(m_constantUndefined));
> + // Initialize all locals to undefined.
> + for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i)
> + setDirect(stackEntry->remapOperand(virtualRegisterForLocal(i)), undefined, ImmediateSetWithFlush);
> + // TODO: check that ImmediateSetWithFlush is the right pick here.
> +
> + Node* jumpNode = addToGraph(Jump);
> + jumpNode->targetBlock() = stackEntry->m_callsiteBlockHead;
> + m_currentBlock->didLink();
> + flushForTerminal();
> + return true;
> + } while ((stackEntry = stackEntry->m_caller));
> +
> + // The tail call was not recursive
> + return false;
This function isn't used, can you remove it for now?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344
> + //if (op == TailCall && handleRecursiveTailCall(callTarget, thisArgument, callLinkStatus))
> + // return Terminal;
Please remove commented code.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052
> + // TODO: we should be able to get rid of this somehow.
Make this a FIXME and link a bugzilla.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080
> + // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall.
ditto.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4252
> + // to be true. TODO: change this.
Ditto.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276
> + if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not.
Ditto.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284
> + // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one.
Ditto.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257
> + BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter
Ditto.
> > Source/JavaScriptCore/ChangeLog:28
> > + - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability
> > + - The 9 argument version was cleaned up and simplified
>
> I feel like something is wrong when a function has more than 6 arguments...
> :(
I totally agree, but I saw no easy way to solve this problem, and my refactoring was already pretty big.
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236
> > + BasicBlock* blockPtr = block.ptr();
>
> Nit: you could probably drop this line.
I am not sure it would remain correct to refer to block.ptr() after I WTFMove(block).
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273
> > + Node* callTarget = get(VirtualRegister(pc[2].u.operand));
> > + int argumentCountIncludingThis = pc[3].u.operand;
> > + int registerOffset = -pc[4].u.operand;
>
> It would be great if you gave these names in the BytecodeList.json and used
> them here.
Is there a way to do so that correctly negate the registerOffset, and correctly "get"s the callTarget? If not, I am not sure what to call those before the operands are processed.
Comment on attachment 322545[details]
Commented out the buggy splitting of entry block
View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review
Robin, is the goal of this review to get feedback on the tail loop stuff, or to first check in the refactoring? If the latter, then let's open a new bug with a title for the refactoring.
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052
>> + // TODO: we should be able to get rid of this somehow.
>
> Make this a FIXME and link a bugzilla.
What's your conclusion here? I don't vote for a bugzilla unless we know how we'll get rid of this.
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080
>> + // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall.
>
> ditto.
Seems like this might be worth just doing.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4292
> + if (false) {
> + VERBOSE_LOG("Splitting the entry block for potential recursive tail calls.\n");
> + Node* jumpNode = addToGraph(Jump);
> + BasicBlock* newBlock = allocateUntargetableBlock();
> + jumpNode->targetBlock() = newBlock;
> + m_currentBlock->didLink();
> + m_currentBlock = newBlock;
> + }
What happens when you run this code?
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276
>> + if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not.
>
> Ditto.
Going against Keith's ditto here: please don't file a bug on you not fully understanding of the code. I'm like 95% sure this has to do w/ setters, but regardless of that, you just figure out why this happens, and remove the TODO
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284
>> + // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one.
>
> Ditto.
Ditto here. This seems like something you need to figure out before landing
>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257
>> + BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter
>
> Ditto.
Ditto here. This is something you should either do or remove your TODO
Attachment 323474[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
After running a few tests, it is clear that this regression is entirely caused by the splitting of op_enter blocks: in particular it reproduces even if handleRecursiveTailCall is replaced by return false.
Some more things about this performance regression:
- It only shows on Octane/raytrace: neither sun spider, kraken or microbenchmarks have any significant regression (maybe a 1% regression on Kraken, to investigate later)
- It is caused purely by the splitting of entry blocks, not by the actual optimisation
- It persists even when not doing the splitting for functions at the leaves of the call graph (through testing for the emptiness of callLinkInfoMap).
- It persists even when running DFGCFGSimplificationPhase right after the byte code parser (to eliminate unnecessary jumps)
- It does not appear to be located in a single JS function: when attempting to bisect with JSC_bytecodeRangeToDFGCompile, the difference with the baseline progressively shrinks instead of having a cliff somewhere.
Attachment 323558[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/ChangeLog:12: Line contains tab character. [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:13: Line contains tab character. [whitespace/tab] [5]
Total errors found: 3 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 323558[details]
Fixed a horrible design bug, and the performance regression
View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review> Source/JavaScriptCore/bytecode/PreciseJumpTargets.cpp:68
> + // We need to insert a jump after op_enter, so recursive tail calls have somewhere to jump to.
> + // But we only want to pay that price for functions that have at least one tail call.
> + CallLinkInfoMap map;
> + codeBlock->getCallLinkInfoMap(map);
> + for (auto info : map.values()) {
> + if (info->callMode() == CallMode::Tail) {
> + out.append(bytecodeOffset + opcodeLengths[op_enter]);
> + break;
> + }
> + }
> + }
You should just set a bit if a CodeBlock has an op_tail_call in it. You can probably do it in bytecode generator via UnlinkedCodeBlock.
Comment on attachment 323558[details]
Fixed a horrible design bug, and the performance regression
View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1433
> + if (auto callFrame = stackEntry->m_inlineCallFrame) {
nit: I prefer a type here, or at least, auto* instead of auto.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1436
> + if (argumentCountIncludingThis != (int) callFrame->argumentCountIncludingThis)
Style: WebKit style uses static_cast instead of C-style casts.
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1465
> + // We want to emit the SetLocals with a code origin that points to the place we are jumping to.
Nit: "with a code origin" => "with an exit origin"
since the semantic origin won't be there.
Attachment 323710[details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5]
Total errors found: 1 in 7 files
If any of these errors are false positives, please file a bug against check-webkit-style.
We now see an improvement of TailBench9000 but it is rather disappointing:
- n-body-run.js roughly went from 1.05s to 0.95s
- merge-sort-run.js roughly went from 1.40s to 1.10s
I have not yet ran other benchmarks on this last version of the code.
Created attachment 323785[details]
Archive of layout-test-results from ews114 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Comment on attachment 323710[details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened
My experiments with inserting LoopHint have not improved performance so far, so I would like to get this patch landed as is, and make it a separate bug.
The build-bot failure looks very much like a flaky test to me, I don't see how my patch could break layout. Is there a simple way to convince the buildbot to try again? Or should I just resubmit the patch?
(In reply to Robin Morisset from comment #78)
Is there a simple way to convince the
> buildbot to try again? Or should I just resubmit the patch?
I think you just need to re-upload the change unfortunately.
Comment on attachment 323710[details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened
View in context: https://bugs.webkit.org/attachment.cgi?id=323710&action=review> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418
> + // We currently only do this optimisation in the simple, non-polymorphic case.
Maybe add a FIXME with a bugzilla link here since you told me you wanted to do this and you thought it'd be pretty easy?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429
> + BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock);
m_blockLInkingTargets is sorted?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480
> + // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case.
Please add tests for this since you didn't find any when running the tests with the earlier broken patch. It'd be good to verify your earlier broken version would fail such a test.
(In reply to Saam Barati from comment #80)
> Comment on attachment 323710[details]
> Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly
> recognize when it has happened
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323710&action=review
>
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418
> > + // We currently only do this optimisation in the simple, non-polymorphic case.
>
> Maybe add a FIXME with a bugzilla link here since you told me you wanted to
> do this and you thought it'd be pretty easy?
Done
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429
> > + BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock);
>
> m_blockLInkingTargets is sorted?
Yes. It was already mentioned in a comment; I added some ASSERTs to allocateTargetableBlock and makeBlockTargetable.
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480
> > + // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case.
>
> Please add tests for this since you didn't find any when running the tests
> with the earlier broken patch. It'd be good to verify your earlier broken
> version would fail such a test.
I added some test, but the buggy code currently appears to be passing the test somehow, I am investigating.
Mystery solved: there was yet another bug (!) that was causing this entire code path not to be exercised by the tests for either the buggy or the "fixed" version of the code.
In short, because each code block has a superset of the jump targets, a function with no control-flow was deemed to have no jump targets, even if it had a tail call, inhibiting the splitting of the basic block that includes op_enter.
After fixing this bug, we can now correctly detect the other bugs from previous versions of this patch with the test. I also added another test as I realized that one code path was still not tested. The tests now (hopefully) correctly find wrong code generation; but I do not see how to test that the transformation occurred when it should.
Comment on attachment 324120[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=324120&action=review
r=me with a few comments.
> Source/JavaScriptCore/bytecode/CodeBlock.h:928
> + bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();}
style nit: this should be written as:
{ return m_unlinkedCode->hasTailCalls(); }
for the spacing to follow webkit style
> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321
> + // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block
> + // to have somewhere to jump to if there is a recursive tail-call that points to this function.
> + m_codeBlock->addJumpTarget(instructions().size());
> + // This disables peephole optimizations when an instruction is a jump target
> + m_lastOpcodeID = op_end;
Is it worth just emitting a label since that will do this work for us? Or should we keep this as is?
> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469
> + // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR.
This is not an accurate comment. The invariant is that a root in the CFG cannot have a predecessor. An OSR entrypoint *can* have predecessors. For example, loops are often OSR entry points. FWIW, we could totally jump to op_enter, we'd just need to split before op_enter, not after. This may be awkward. I'm totally OK with how you've implemented it.
(In reply to Saam Barati from comment #85)
> Comment on attachment 324120[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=324120&action=review
>
> r=me with a few comments.
>
> > Source/JavaScriptCore/bytecode/CodeBlock.h:928
> > + bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();}
>
> style nit: this should be written as:
> { return m_unlinkedCode->hasTailCalls(); }
> for the spacing to follow webkit style
Done
> > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321
> > + // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block
> > + // to have somewhere to jump to if there is a recursive tail-call that points to this function.
> > + m_codeBlock->addJumpTarget(instructions().size());
> > + // This disables peephole optimizations when an instruction is a jump target
> > + m_lastOpcodeID = op_end;
>
> Is it worth just emitting a label since that will do this work for us? Or
> should we keep this as is?
That was my first idea. The problem is that we would then have a Ref<Label> with nowhere to put it. It is not a serious problem, it just felt wrong to allocate data that we would never even read back before freeing it. I can change it back if you prefer.
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469
> > + // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR.
>
> This is not an accurate comment. The invariant is that a root in the CFG
> cannot have a predecessor. An OSR entrypoint *can* have predecessors. For
> example, loops are often OSR entry points. FWIW, we could totally jump to
> op_enter, we'd just need to split before op_enter, not after. This may be
> awkward. I'm totally OK with how you've implemented it.
Thanks for the comment, will fix.
This should finally be finished.
Diff with the previous version of the patch:
- Applied both of Saam's comments
- Added more tests, and slight refactoring of the tests.
- Added one RELEASE_ASSERT instead of silently failing if for whatever reason the op_enter block is not split when we are expecting it to be.
This patch broke Speedometer 2's React-Redux-TodoMVC test case.
To reproduce,
1. Go to http://browserbench.org/Speedometer2.0/InteractiveRunner.html
2. Click on "Unselect all"
3. Select "React-Redux-TodoMVC"
4. Click on "Run"
Open Web Inspector and observe that there is an exception being thrown.
You have to close the inspector & reload the page before running it again since the bug doesn't reproduce when Web Inspector is open.
Thanks for the bug report, I am looking into it.
Here is what I have found about it so far:
- The bug appears deep within React code, taken from http://browserbench.org/Speedometer2.0/resources/todomvc/architecture-examples/react-redux/dist/index.html (sadly partially minified).
- The bug is that "this" is apparently undefined in "this.moveChild", in _updateChildren in that file.
- _updateChildren appears many times in the stack trace, but it does not appear to make any tail call so it should not be optimized.
- The bug disappear if we refuse to make jumps to inlined functions, or if we refuse to make jumps across levels of the inline stack stack. So it is a case where we are creating a jump from one inlined function to another (higher in the call stack).
- The bug does not disappear if I disable the optimization when the number of arguments is different from the number of parameters of the relevant code block (for inlined frames).
- It reproduces fine with JSC_useConcurrentJIT=0
- Removing the repetition of op_enter does not make the bug disappear change.
- Removing the setting of extra parameters to undefined also does not make the bug disappear or change.
Because of how tricky it was to write a minimized regression test, and speedometer provides some protection against regression, I decided to reland this patch with the fix described in https://bugs.webkit.org/show_bug.cgi and no extra test.
I marked it r=sbarati as we discussed this together.
Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to
gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used
the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark
execution times with 95% confidence intervals in milliseconds.
Baseline TailCall
n-body 837.5192+-2.5306 ^ 742.2701+-2.1430 ^ definitely 1.1283x faster
merge-sort 597.9785+-5.6828 ^ 521.2900+-3.2286 ^ definitely 1.1471x faster
bf-interpreter 478.1371+-4.7080 ^ 334.9072+-1.8962 ^ definitely 1.4277x faster
<geometric> 620.9622+-3.5729 ^ 506.0356+-1.2795 ^ definitely 1.2271x faster
No visible change on other benchmarks that I've tested.
2017-09-15 05:39 PDT, Robin Morisset
2017-09-15 06:33 PDT, Build Bot
2017-09-15 06:36 PDT, Build Bot
2017-09-15 06:37 PDT, Build Bot
2017-09-15 07:03 PDT, Build Bot
2017-09-18 07:41 PDT, Robin Morisset
2017-09-22 06:32 PDT, Robin Morisset
2017-09-22 07:14 PDT, Build Bot
2017-09-22 07:25 PDT, Build Bot
2017-09-22 07:30 PDT, Build Bot
2017-09-22 07:41 PDT, Build Bot
2017-09-25 03:23 PDT, Robin Morisset
2017-09-25 06:13 PDT, Robin Morisset
2017-09-25 07:23 PDT, Robin Morisset
2017-09-25 08:09 PDT, Build Bot
2017-09-25 08:16 PDT, Build Bot
2017-09-25 08:19 PDT, Build Bot
2017-09-25 08:39 PDT, Build Bot
2017-09-26 09:36 PDT, Robin Morisset
2017-09-27 08:37 PDT, Robin Morisset
2017-09-27 10:52 PDT, Robin Morisset
2017-09-28 00:32 PDT, Robin Morisset
2017-09-29 07:01 PDT, Robin Morisset
2017-09-29 08:12 PDT, Build Bot
2017-09-29 08:40 PDT, Build Bot
2017-09-29 08:55 PDT, Build Bot
2017-10-02 15:18 PDT, Robin Morisset
2017-10-03 05:20 PDT, Robin Morisset
2017-10-03 10:43 PDT, Robin Morisset
2017-10-11 13:42 PDT, Robin Morisset
2017-10-11 15:40 PDT, Robin Morisset
2017-10-12 13:44 PDT, Robin Morisset
2017-10-13 11:17 PDT, Robin Morisset
2017-10-13 18:37 PDT, Build Bot
2017-10-18 07:44 PDT, Robin Morisset
2017-10-18 08:57 PDT, Robin Morisset
2017-10-19 04:00 PDT, Robin Morisset
2017-10-31 10:29 PDT, Robin Morisset
2017-11-08 12:11 PST, Robin Morisset