NEW 84598
Eliminate unnecessary SetLocal/GetLocals in DFG
https://bugs.webkit.org/show_bug.cgi?id=84598
Summary Eliminate unnecessary SetLocal/GetLocals in DFG
Yuqiang Xian
Reported 2012-04-23 07:42:43 PDT
In current DFG, we use SetLocal/GetLocals to pass values across blocks. Also the SetLocal serves OSR exit to determine how to do value recoveries. This makes it hard to perform more global high level optimizations. Given the fact that we don't have global register allocation at current stage, we cannot get rid of all of the SetLocal/GetLocals easily. But what we can do is to eliminate the unnecessary SetLocal/GetLocals, and we think they're unnecessary if the variable they operate on only has one definition - we already have this information with the redundant Phi elimination, in which case we can have the use nodes in other blocks directly reference the define nodes producing the results without SetLocal/GetLocals. In order to do so we need to retain the results, and currently we have to keep them in memory due to the lack of global reg alloc. Here's an example on how we would like to change the DFG IR: Source code snippet: var a = 1; var b = a + 1; for (var i = 0; i < iteration; i++) { result += 2 * (a + b); } IR generated without this patch: Block #0 9: < 3:5> JSConstant(ResultJS|UsedAsNum|NeedsNegZero, $1= Int32: 1) 10: < 1:-> SetLocal(@9<Int32>, r1(M)) predicting Int, double ratio 0.000000 11: <!1:5> ValueAdd(@9<Int32>, @9<Int32>, ResultJS|MustGenerate|MightClobber|UsedAsNum|NeedsNegZero) 12: < 1:-> SetLocal(@11<Int32>, r2(N)) predicting Int, double ratio 0.000000 Block #1 20: < 1:5> GetLocal(@10, ResultJS|UsedAsNum|NeedsNegZero, r1(M)) predicting Int, double ratio 0.000000 22: < 1:6> GetLocal(@12, ResultJS|UsedAsNum|NeedsNegZero, r2(N)) predicting Int, double ratio 0.000000 23: <!1:6> ValueAdd(@20<Int32>, @22<Int32>, ResultJS|MustGenerate|MightClobber|UsedAsNum|NeedsNegZero) IR generated with this patch: Block #0 9: < 3:5> JSConstant(ResultJS|UsedAsNum|NeedsNegZero, $1 = Int32: 1) 11: <!1:6> ValueAdd(@9<Int32>, @9<Int32>, ResultJS|MustGenerate|MightClobber|UsedAsNum|NeedsNegZero) Block #2 29: <!1:4> ValueAdd(@9<Int32>, @11<Int32>, ResultJS|MustGenerate|MightClobber|UsedAsNum|NeedsNegZero)
Attachments
WIP patch (88.78 KB, patch)
2012-04-23 07:50 PDT, Yuqiang Xian
no flags
WIP patch (90.03 KB, patch)
2012-04-25 06:25 PDT, Yuqiang Xian
no flags
WIP patch updated (90.41 KB, patch)
2012-04-26 02:08 PDT, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2012-04-23 07:50:11 PDT
Created attachment 138347 [details] WIP patch WIP patch - this passed the javascriptcore tests and the benchmark cases.
Yuqiang Xian
Comment 2 2012-04-23 07:54:01 PDT
(In reply to comment #1) > Created an attachment (id=138347) [details] > WIP patch > > WIP patch - this passed the javascriptcore tests and the benchmark cases. This patch has included the patch for bug #84324.
Filip Pizlo
Comment 3 2012-04-23 09:30:26 PDT
Comment on attachment 138347 [details] WIP patch View in context: https://bugs.webkit.org/attachment.cgi?id=138347&action=review > Source/JavaScriptCore/dfg/DFGAbstractState.cpp:1076 > return; > for (size_t i = indexInBlock + 1; i--;) > forNode(m_block->at(i)).clobberStructures(); > + > + // Deal with the nodes defined in other blocks but referenced in current block. > + for (size_t arg = 0; arg < m_block->variablesForOSR.numberOfArguments(); ++arg) { > + NodeIndex nodeIndex = m_block->variablesForOSR.argument(arg); > + if (nodeIndex != NoNode) > + forNode(nodeIndex).clobberStructures(); > + } > + for (size_t local = 0; local < m_block->variablesForOSR.numberOfLocals(); ++local) { > + NodeIndex nodeIndex = m_block->variablesForOSR.local(local); > + if (nodeIndex != NoNode) > + forNode(nodeIndex).clobberStructures(); > + } > + > for (size_t i = 0; i < m_variables.numberOfArguments(); ++i) > m_variables.argument(i).clobberStructures(); > for (size_t i = 0; i < m_variables.numberOfLocals(); ++i) It seems like these changes to AbstractState are not enough. You need to make sure that each basic block has its own copy of all variables it uses, from the standpoint of AbstractState. Otherwise you will get unsound behavior. Example: 1: var o = ...; 2: if (p) 3: ... = o.f; 4: ... = o.f; AbstractState will prove in (3) that o is a cell and that it has a particular structure. But we need to be careful and make sure that we remember that there exists a path (from (2) to (4) directly, skipping (3)) where this proof has not been introduced. Hence, at (4) we actually don't know anything about o yet. With your patch it seems that you will end up propagating the proof from (3) to (4), which is wrong. One of the reasons why I think that SetLocal/GetLocal are useful, and why I haven't been able to convince myself to go directly to SSA, is that the SetLocal/GetLocal situation allows us to easily reason about the set of variables that are live at block boundaries, and to easy access them with a unified indexing scheme (operand numbers). We could build similar functionality for SSA, but it will take some care.
Yuqiang Xian
Comment 4 2012-04-25 06:25:33 PDT
Created attachment 138801 [details] WIP patch
Yuqiang Xian
Comment 5 2012-04-25 07:05:59 PDT
(In reply to comment #3) > > It seems like these changes to AbstractState are not enough. > > You need to make sure that each basic block has its own copy of all variables it uses, from the standpoint of AbstractState. Otherwise you will get unsound behavior. Example: > > 1: var o = ...; > 2: if (p) > 3: ... = o.f; > 4: ... = o.f; > > AbstractState will prove in (3) that o is a cell and that it has a particular structure. But we need to be careful and make sure that we remember that there exists a path (from (2) to (4) directly, skipping (3)) where this proof has not been introduced. Hence, at (4) we actually don't know anything about o yet. > > With your patch it seems that you will end up propagating the proof from (3) to (4), which is wrong. > Thanks for pointing out this. Yes it's a bug, and the new patch is supposed to fix this, by merging the predictions of o from both (2) and (3) to (4), and making the prediction of o block local. Thankfully the valuesForOSR aids. > One of the reasons why I think that SetLocal/GetLocal are useful, and why I haven't been able to convince myself to go directly to SSA, is that the SetLocal/GetLocal situation allows us to easily reason about the set of variables that are live at block boundaries, and to easy access them with a unified indexing scheme (operand numbers). We could build similar functionality for SSA, but it will take some care. I agree that SetLocal/GetLocal are useful. But I think for the variable having exactly one definition, we should be able to get rid of SetLocal/GetLocal without much pain, and that's exactly what this patch tries to do. And it should provide a better situation for more high level optimizations (e.g., we're not required to place SetLocal/GetLocals and possibly allocate new operands when we move some nodes across blocks, which also should not impact OSR as SetLocal is still used for OSR exit hints and we don't move the SetLocals). Long term if we'll have liveness analysis and global reg alloc, that would be more sound.
Yuqiang Xian
Comment 6 2012-04-26 02:08:41 PDT
Created attachment 138957 [details] WIP patch updated This seems passing the layout tests, while with 1~2% performance regression on the 3 benchmarks.
Yuqiang Xian
Comment 7 2012-05-14 19:41:08 PDT
(In reply to comment #6) > This seems passing the layout tests, while with 1~2% performance regression on the 3 benchmarks. The regression is mainly caused by the frequent OSR entries. Suppose we have such a scenario: 1), a loop becomes hot and dfg is triggered to optimize it; 2), during the execution of the optimized code for the loop, we encounter a ForceOSRExit and fall-back to the generic code; 3), the generic code is executed and we reach the next iteration of the loop, and re-enter the optimized code through the OSR entry; 4), repeat 2. As some local variables are not placed in the stack slots corresponding to the operands via SetLocals with this patch, instead they might be "displaced" to the virtual registers (we now don't have global reg alloc), we need to have the values of those variables placed in the "displaced" virtual registers at the OSR entries, which involves memory reads/writes for each variable. Thus we suffer a lot with too frequent OSR entries.
Note You need to log in before you can comment on or make changes to this bug.