WebKit Bugzilla
Attachment 342499 Details for
Bug 181409
: Reduce graph size by replacing terminal nodes in blocks that have a ForceOSRExit with Unreachable
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Requests
|
Help
|
New Account
|
Log In
Remember
[x]
|
Forgot Password
Login:
[x]
[patch]
patch
c-backup.diff (text/plain), 10.48 KB, created by
Saam Barati
on 2018-06-11 18:53:15 PDT
(
hide
)
Description:
patch
Filename:
MIME Type:
Creator:
Saam Barati
Created:
2018-06-11 18:53:15 PDT
Size:
10.48 KB
patch
obsolete
>Index: Source/JavaScriptCore/ChangeLog >=================================================================== >--- Source/JavaScriptCore/ChangeLog (revision 232737) >+++ Source/JavaScriptCore/ChangeLog (working copy) >@@ -1,3 +1,58 @@ >+2018-06-11 Saam Barati <sbarati@apple.com> >+ >+ Reduce graph size by replacing terminal nodes in blocks that have a ForceOSRExit with Unreachable >+ https://bugs.webkit.org/show_bug.cgi?id=181409 >+ <rdar://problem/36383749> >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ This patch is me redoing r226655. This is a patch I wrote when >+ profiling Speedometer. Fil rolled this change out in r230928. He >+ showed this slowed down a sunspider tests by ~2x. This sunspider >+ regression revealed a real performance bug in the original change: >+ we would kill blocks that reached OSR entry targets, sometimes leading >+ us to not do OSR entry into the DFG, since we could end up deleting >+ entire loops from the CFG. The reason for this is that code that has run >+ ~once and that reaches loops often has ForceOSRExits inside of it. The >+ solution to this is to not perform this optimization on blocks that can >+ reach OSR entry targets. >+ >+ The reason I'm redoing this patch is that it turns out Fil rolling >+ out the change was a Speedometer 2 regression. >+ >+ This is a modified version of the original ChangeLog I wrote in r226655: >+ >+ When I was looking at profiler data for Speedometer, I noticed that one of >+ the hottest functions in Speedometer is around 1100 bytecode operations long. >+ Only about 100 of those bytecode ops ever execute. However, we ended up >+ spending a lot of time compiling basic blocks that never executed. We often >+ plant ForceOSRExit nodes when we parse bytecodes that have a null value profile. >+ This is the case when such a node never executes. >+ >+ This patch makes it so that anytime a block has a ForceOSRExit, and that block >+ can not reach an OSR entry target, we replace its terminal node with an Unreachable >+ node, and remove all nodes after the ForceOSRExit. This cuts down the graph >+ size since it removes control flow edges from the CFG. This allows us to get >+ rid of huge chunks of the CFG in certain programs. When doing this transformation, >+ we also insert Flushes/PhantomLocals to ensure we can recover values that are bytecode >+ live-in to the ForceOSRExit. >+ >+ Using ForceOSRExit as the signal for this is a bit of a hack. It definitely >+ does not get rid of all the CFG that it could. If we decide it's worth >+ it, we could use additional inputs into this mechanism. For example, we could >+ profile if a basic block ever executes inside the LLInt/Baseline, and >+ remove parts of the CFG based on that. >+ >+ When running Speedometer with the concurrent JIT turned off, this patch >+ improves DFG/FTL compile times by around 5%. >+ >+ * dfg/DFGByteCodeParser.cpp: >+ (JSC::DFG::ByteCodeParser::addToGraph): >+ (JSC::DFG::ByteCodeParser::inlineCall): >+ (JSC::DFG::ByteCodeParser::parse): >+ * dfg/DFGGraph.cpp: >+ (JSC::DFG::Graph::blocksInPostOrder): >+ > 2018-06-11 Mark Lam <mark.lam@apple.com> > > Add support for webkit-test-runner jscOptions in DumpRenderTree and WebKitTestRunner. >Index: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp >=================================================================== >--- Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp (revision 232736) >+++ Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp (working copy) >@@ -697,6 +697,9 @@ private: > Node* addToGraph(Node* node) > { > VERBOSE_LOG(" appended ", node, " ", Graph::opName(node->op()), "\n"); >+ >+ m_hasAnyForceOSRExits |= (node->op() == ForceOSRExit); >+ > m_currentBlock->append(node); > if (clobbersExitState(m_graph, node)) > m_exitOK = false; >@@ -1150,6 +1153,7 @@ private: > > Instruction* m_currentInstruction; > bool m_hasDebuggerEnabled; >+ bool m_hasAnyForceOSRExits { false }; > }; > > BasicBlock* ByteCodeParser::allocateTargetableBlock(unsigned bytecodeIndex) >@@ -6832,6 +6836,104 @@ void ByteCodeParser::parse() > parseCodeBlock(); > linkBlocks(inlineStackEntry.m_unlinkedBlocks, inlineStackEntry.m_blockLinkingTargets); > >+ if (m_hasAnyForceOSRExits) { >+ BlockSet blocksToIgnore; >+ for (BasicBlock* block : m_graph.blocksInNaturalOrder()) { >+ if (block->isOSRTarget) >+ blocksToIgnore.add(block); >+ } >+ >+ { >+ bool isSafeToValidate = false; >+ auto postOrder = m_graph.blocksInPostOrder(isSafeToValidate); // This algorithm doesn't rely on the predecessors list, which is not yet built. >+ bool changed; >+ do { >+ changed = false; >+ for (BasicBlock* block : postOrder) { >+ for (BasicBlock* successor : block->successors()) { >+ if (blocksToIgnore.contains(successor)) { >+ changed |= blocksToIgnore.add(block); >+ break; >+ } >+ } >+ } >+ } while (changed); >+ } >+ >+ InsertionSet insertionSet(m_graph); >+ Operands<VariableAccessData*> mapping(OperandsLike, m_graph.block(0)->variablesAtHead); >+ >+ for (BasicBlock* block : m_graph.blocksInNaturalOrder()) { >+ if (blocksToIgnore.contains(block)) >+ continue; >+ >+ mapping.fill(nullptr); >+ if (validationEnabled()) { >+ // Verify that it's correct to fill mapping with nullptr. >+ for (unsigned i = 0; i < block->variablesAtHead.size(); ++i) { >+ Node* node = block->variablesAtHead.at(i); >+ RELEASE_ASSERT(!node); >+ } >+ } >+ >+ for (unsigned nodeIndex = 0; nodeIndex < block->size(); ++nodeIndex) { >+ Node* node = block->at(nodeIndex); >+ >+ if (node->hasVariableAccessData(m_graph)) >+ mapping.operand(node->local()) = node->variableAccessData(); >+ >+ if (node->op() == ForceOSRExit) { >+ NodeOrigin endOrigin = node->origin.withExitOK(true); >+ >+ if (validationEnabled()) { >+ // This verifies that we don't need to change any of the successors's predecessor >+ // list after planting the Unreachable below. At this point in the bytecode >+ // parser, we haven't linked up the predecessor lists yet. >+ for (BasicBlock* successor : block->successors()) >+ RELEASE_ASSERT(successor->predecessors.isEmpty()); >+ } >+ >+ block->resize(nodeIndex + 1); >+ >+ insertionSet.insertNode(block->size(), SpecNone, ExitOK, endOrigin); >+ >+ auto insertLivenessPreservingOp = [&] (InlineCallFrame* inlineCallFrame, NodeType op, VirtualRegister operand) { >+ VariableAccessData* variable = mapping.operand(operand); >+ if (!variable) { >+ variable = newVariableAccessData(operand); >+ mapping.operand(operand) = variable; >+ } >+ >+ VirtualRegister argument = operand - (inlineCallFrame ? inlineCallFrame->stackOffset : 0); >+ if (argument.isArgument() && !argument.isHeader()) { >+ const Vector<ArgumentPosition*>& arguments = m_inlineCallFrameToArgumentPositions.get(inlineCallFrame); >+ arguments[argument.toArgument()]->addVariable(variable); >+ } >+ >+ insertionSet.insertNode(block->size(), SpecNone, op, endOrigin, OpInfo(variable)); >+ }; >+ auto addFlushDirect = [&] (InlineCallFrame* inlineCallFrame, VirtualRegister operand) { >+ insertLivenessPreservingOp(inlineCallFrame, Flush, operand); >+ }; >+ auto addPhantomLocalDirect = [&] (InlineCallFrame* inlineCallFrame, VirtualRegister operand) { >+ insertLivenessPreservingOp(inlineCallFrame, PhantomLocal, operand); >+ }; >+ flushForTerminalImpl(endOrigin.semantic, addFlushDirect, addPhantomLocalDirect); >+ >+ insertionSet.insertNode(block->size(), SpecNone, Unreachable, endOrigin); >+ insertionSet.execute(block); >+ break; >+ } >+ } >+ } >+ } else if (validationEnabled()) { >+ // Ensure our bookkeeping for ForceOSRExit nodes is working. >+ for (BasicBlock* block : m_graph.blocksInNaturalOrder()) { >+ for (Node* node : *block) >+ RELEASE_ASSERT(node->op() != ForceOSRExit); >+ } >+ } >+ > m_graph.determineReachability(); > m_graph.killUnreachableBlocks(); > >Index: Source/JavaScriptCore/dfg/DFGGraph.cpp >=================================================================== >--- Source/JavaScriptCore/dfg/DFGGraph.cpp (revision 232736) >+++ Source/JavaScriptCore/dfg/DFGGraph.cpp (working copy) >@@ -911,7 +911,7 @@ BlockList Graph::blocksInPreOrder() > return result; > } > >-BlockList Graph::blocksInPostOrder() >+BlockList Graph::blocksInPostOrder(bool isSafeToValidate) > { > BlockList result; > PostOrderBlockWorklist worklist; >@@ -930,7 +930,7 @@ BlockList Graph::blocksInPostOrder() > } > } > >- if (validationEnabled()) { >+ if (isSafeToValidate && validationEnabled()) { // There are users of this where we haven't yet built of the CFG enough to be able to run dominators. > auto validateResults = [&] (auto& dominators) { > // When iterating over reverse post order, we should see dominators > // before things they dominate. >Index: Source/JavaScriptCore/dfg/DFGGraph.h >=================================================================== >--- Source/JavaScriptCore/dfg/DFGGraph.h (revision 232736) >+++ Source/JavaScriptCore/dfg/DFGGraph.h (working copy) >@@ -627,7 +627,7 @@ public: > void initializeNodeOwners(); > > BlockList blocksInPreOrder(); >- BlockList blocksInPostOrder(); >+ BlockList blocksInPostOrder(bool isSafeToValidate = true); > > class NaturalBlockIterable { > public:
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Diff
View Attachment As Raw
Actions:
View
|
Formatted Diff
|
Diff
Attachments on
bug 181409
:
330759
|
330760
|
330772
|
330785
|
342315
|
342352
| 342499