Bug 80415

Summary: Eliminate redundant Phis in DFG
Product: WebKit Reporter: Yuqiang Xian <yuqiang.xian>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, fpizlo, rakuco, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 80361    
Bug Blocks: 76770    
Attachments:
Description Flags
WIP patch
none
Initial performance result
none
WIP patch
none
Case showing the problem of type predications propagation after Phi elimination
none
proposed patch
none
Latest performance result
none
updated patch
fpizlo: review+
patch updated fpizlo: review+

Description Yuqiang Xian 2012-03-06 06:17:20 PST
Although this may not have any advantage at current stage, this is towards minimal SSA to make more high level optimizations (like bug 76770) easier.
We have the choices either to build minimal SSA from scratch or to keep current simple Phi insertion mechanism and remove the redundancy in another phase. Currently we choose the latter because the change could be smaller.
Comment 1 Yuqiang Xian 2012-03-06 07:13:42 PST
Created attachment 130375 [details]
WIP patch

WIP - Please let me know if this is the right approach or not. Thanks.
Comment 2 Yuqiang Xian 2012-03-06 07:16:56 PST
Created attachment 130377 [details]
Initial performance result

There seems to be a 9% regression on SunSpider-crypto-sha1, which should be caused by the additional compilation overhead. Will try to investigate it later.
Comment 3 Yuqiang Xian 2012-03-06 07:53:29 PST
Created attachment 130384 [details]
WIP patch

This partially reduces the 9% regression on sunspider-crypto-sha1 to 6-7%. The remaining overhead mainly comes from recording Phi uses when building the graph.
Comment 4 Filip Pizlo 2012-03-06 13:55:51 PST
Comment on attachment 130384 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=130384&action=review

I think this is great.  My intuition is that we could simplify this further by doing away with the phi use map.  But maybe I'm wrong?  Have you tried it without the use map?

> Source/JavaScriptCore/dfg/DFGAbstractState.cpp:1006
> -    if (!node.refCount())
> +    // We don't want to ignore the skipped Phi nodes, otherwise we may fail to
> +    // propagate the type predictions to other blocks.
> +    if (!node.refCount() && node.op != Phi)

Why?  The idea here is that if we have a node that claims that it is using a variable from another block, and its uses of that variable are dead, then we have no work to do.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:594
> +    void addPhiUse(NodeIndex phi, NodeIndex node, unsigned childIndex, NodeIndex replace = NoNode)
> +    {
> +        ASSERT(phi != NoNode && m_graph[phi].op == Phi);
> +
> +        Vector<PhiUse>* phiUse;
> +        PhiUseMap::iterator iter = m_graph.m_phiUses.find(phi);
> +        if (iter != m_graph.m_phiUses.end())
> +            phiUse = &iter->second;
> +        else {
> +            Vector<PhiUse> emptyUse;
> +            pair<PhiUseMap::iterator, bool> result = m_graph.m_phiUses.add(phi, emptyUse);
> +            phiUse = &result.first->second;
> +        }
> +
> +        // If a replacement is needed, we want to remove the old use entry.
> +        // This would find and delete an entry in the middle of a vector,
> +        // but we expect this would be rarely happened in practice - only when
> +        // a Phi node in a joint block needs to handle more than 3 incoming edges.
> +        if (replace != NoNode) {
> +            size_t i = phiUse->find(PhiUse(replace, childIndex));
> +            ASSERT(i != notFound);
> +            phiUse->remove(i);
> +        }
> +
> +        phiUse->append(PhiUse(node, childIndex));
> +    }

Not sure I buy that the phi use map is necessary.  I kind of don't like it because it means we cannot rerun the redundant phi replacement algorithm a second time unless we also maintain the consistency of this map.  In general, in the future I want to be able to have a phase that introduces new nodes or switches the children of nodes, and then rerun the phi simplification if I had also done some control flow simplification.  But playing with the children of nodes seems hard if I have to maintain the phi use map as I do it.

> Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:87
> +                    switch (use.childIndex()) {
> +                    case 1:
> +                        ASSERT(useNode.child1().indexUnchecked() == phi);
> +                        useNode.children.child1().setIndex(replacement);
> +                        m_graph[replacement].ref();
> +                        break;
> +                    case 2:
> +                        ASSERT(useNode.child2().indexUnchecked() == phi);
> +                        useNode.children.child2().setIndex(replacement);
> +                        m_graph[replacement].ref();
> +                        break;
> +                    case 3:
> +                        ASSERT(useNode.child3().indexUnchecked() == phi);
> +                        useNode.children.child3().setIndex(replacement);
> +                        m_graph[replacement].ref();
> +                        break;
> +                    default:
> +                        break;
> +                    }

Why not kill the switch statement and say useNode.children.child(use.childIndex() - 1)?

> Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:126
> +    NodeIndex getRedundantReplacement(NodeIndex phi)
> +    {
> +        NodeIndex child1 = m_graph[phi].child1().indexUnchecked();
> +        NodeIndex candidate = child1 == phi ? NoNode : child1;
> +
> +        NodeIndex child2 = m_graph[phi].child2().indexUnchecked();
> +        if (candidate != NoNode) {
> +            if (child2 != NoNode && child2 != candidate && child2 != phi)
> +                return NoNode;
> +        } else if (child2 != phi)
> +            candidate = child2;
> +
> +        NodeIndex child3 = m_graph[phi].child3().indexUnchecked();
> +        if (candidate != NoNode) {
> +            if (child3 != NoNode && child3 != candidate && child3 != phi)
> +                return NoNode;
> +        } else if (child3 != phi)
> +            candidate = child3;
> +        
> +        return candidate;
> +    }

Can't we run this algorithm without having a phi use map?  Seems to me that we could:

1) Wrap the algorithm in a fixpoint.  You're using a worklist; maybe we could do that too.

2) Walk over the graph inside the fixpoint.

3) For every child that is a phi and fits the simplification criteria, attempt to simplify.
Comment 5 Filip Pizlo 2012-03-06 13:58:42 PST
(In reply to comment #2)
> Created an attachment (id=130377) [details]
> Initial performance result
> 
> There seems to be a 9% regression on SunSpider-crypto-sha1, which should be caused by the additional compilation overhead. Will try to investigate it later.

I would not worry about this too much, since it hardly affects the overall average.  Looking at the perf results, it does not feel like a blocker for landing this patch.

Plus, maybe it's just because the DFG is recompiling like crazy on that benchmark because it is doing something that our speculations don't support?  If that were the case then that would further imply to me that we should ignore that result, and file a separate bug to reduce compile times on that program.
Comment 6 Yuqiang Xian 2012-03-06 17:30:51 PST
(In reply to comment #4)
> I think this is great.  My intuition is that we could simplify this further by doing away with the phi use map.  But maybe I'm wrong?  Have you tried it without the use map?

No, I haven't tried it - I was just worried that traversing the whole graph for Phi replacement may be time consuming especially when the traversal can be iterative. We can try it and I think we should only need the iterative traversal on the Phi nodes (if we have the fix of bug 80361 it would be easier) and a one-time traversal on other nodes in the graph (actually, at current stage, the GetLocals) after all of the Phi replacements are identified.

> 
> > Source/JavaScriptCore/dfg/DFGAbstractState.cpp:1006
> > -    if (!node.refCount())
> > +    // We don't want to ignore the skipped Phi nodes, otherwise we may fail to
> > +    // propagate the type predictions to other blocks.
> > +    if (!node.refCount() && node.op != Phi)
> 
> Why?  The idea here is that if we have a node that claims that it is using a variable from another block, and its uses of that variable are dead, then we have no work to do.
> 

I will attach a case to explain this problem soon.

> 
> Not sure I buy that the phi use map is necessary.  I kind of don't like it because it means we cannot rerun the redundant phi replacement algorithm a second time unless we also maintain the consistency of this map.  In general, in the future I want to be able to have a phase that introduces new nodes or switches the children of nodes, and then rerun the phi simplification if I had also done some control flow simplification.  But playing with the children of nodes seems hard if I have to maintain the phi use map as I do it.
> 

Yes. In the redundant phi elimination, we maintain the consistency of this map. But on the other hand, it really brought troubles when we play with the children, though currently only GetLocals have Phi children (except for Phis), but in the future if we get rid of GetLocal/SetLocals (will?) that may be a problem.

> > Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:87
> 
> Why not kill the switch statement and say useNode.children.child(use.childIndex() - 1)?
> 

Agree.

> > Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:126
> 
> Can't we run this algorithm without having a phi use map?  Seems to me that we could:
> 
> 1) Wrap the algorithm in a fixpoint.  You're using a worklist; maybe we could do that too.
> 
> 2) Walk over the graph inside the fixpoint.
> 
> 3) For every child that is a phi and fits the simplification criteria, attempt to simplify.

It's possible to go without phi use map. I will try and see if it can be more efficient and simpler.

Thanks for the review!
Comment 7 Filip Pizlo 2012-03-06 17:34:41 PST
(In reply to comment #6)
> (In reply to comment #4)
> > I think this is great.  My intuition is that we could simplify this further by doing away with the phi use map.  But maybe I'm wrong?  Have you tried it without the use map?
> 
> No, I haven't tried it - I was just worried that traversing the whole graph for Phi replacement may be time consuming especially when the traversal can be iterative. We can try it and I think we should only need the iterative traversal on the Phi nodes (if we have the fix of bug 80361 it would be easier) and a one-time traversal on other nodes in the graph (actually, at current stage, the GetLocals) after all of the Phi replacements are identified.

It seems like one way around this is to just track the Phis, and not the Phi uses.

We can first do a fixpoint where we simplify Phis only - which may leave dummy Phis like:

somePhi: Phi(@something)

where @something is either another Phi or something real (like a SetLocal), and somePhi is still referenced from various other places (GetLocal, etc).

Then we can do a second pass that considers all nodes (not just Phis) and checks if those nodes reference trivial Phis.  If they do, then replace.

I think that this should be sufficient, since the fixpoint (as I understand it) only depends on the shape of Phis and not the shapes of users of Phis.  So you only need to visit Phis repeatedly - and after that you can visit all non-Phi uses of Phis in one go.

Would that work?
Comment 8 Yuqiang Xian 2012-03-06 17:39:24 PST
Created attachment 130493 [details]
Case showing the problem of type predications propagation after Phi elimination

This is the case showing the problem of type predications propagation after Phi elimination. Please pay attention to block 3, where node 51 and node 49 are eliminated since those two variables each have only one definition (in block 0). They're not used in block 3 but they should still be alive because they will be used in block 1 (block 3's successor). But after we eliminate 51 and 49 in block 3, the CFA ignores them and the type predictions of the two variables are not propagated to the end of the block, and hence not to the successors.
Comment 9 Yuqiang Xian 2012-03-06 17:42:28 PST
(In reply to comment #7)
> > 
> > No, I haven't tried it - I was just worried that traversing the whole graph for Phi replacement may be time consuming especially when the traversal can be iterative. We can try it and I think we should only need the iterative traversal on the Phi nodes (if we have the fix of bug 80361 it would be easier) and a one-time traversal on other nodes in the graph (actually, at current stage, the GetLocals) after all of the Phi replacements are identified.
> 
> It seems like one way around this is to just track the Phis, and not the Phi uses.
> 
> We can first do a fixpoint where we simplify Phis only - which may leave dummy Phis like:
> 
> somePhi: Phi(@something)
> 
> where @something is either another Phi or something real (like a SetLocal), and somePhi is still referenced from various other places (GetLocal, etc).
> 
> Then we can do a second pass that considers all nodes (not just Phis) and checks if those nodes reference trivial Phis.  If they do, then replace.
> 
> I think that this should be sufficient, since the fixpoint (as I understand it) only depends on the shape of Phis and not the shapes of users of Phis.  So you only need to visit Phis repeatedly - and after that you can visit all non-Phi uses of Phis in one go.
> 
> Would that work?

Yes. I think that's exactly what I mean for "iterative traversal on the Phi nodes only" at first and then "one-time traversal on other nodes in the graph". - if I understand you correctly. :)
Comment 10 Filip Pizlo 2012-03-06 17:43:22 PST
(In reply to comment #9)
> (In reply to comment #7)
> > > 
> > > No, I haven't tried it - I was just worried that traversing the whole graph for Phi replacement may be time consuming especially when the traversal can be iterative. We can try it and I think we should only need the iterative traversal on the Phi nodes (if we have the fix of bug 80361 it would be easier) and a one-time traversal on other nodes in the graph (actually, at current stage, the GetLocals) after all of the Phi replacements are identified.
> > 
> > It seems like one way around this is to just track the Phis, and not the Phi uses.
> > 
> > We can first do a fixpoint where we simplify Phis only - which may leave dummy Phis like:
> > 
> > somePhi: Phi(@something)
> > 
> > where @something is either another Phi or something real (like a SetLocal), and somePhi is still referenced from various other places (GetLocal, etc).
> > 
> > Then we can do a second pass that considers all nodes (not just Phis) and checks if those nodes reference trivial Phis.  If they do, then replace.
> > 
> > I think that this should be sufficient, since the fixpoint (as I understand it) only depends on the shape of Phis and not the shapes of users of Phis.  So you only need to visit Phis repeatedly - and after that you can visit all non-Phi uses of Phis in one go.
> > 
> > Would that work?
> 
> Yes. I think that's exactly what I mean for "iterative traversal on the Phi nodes only" at first and then "one-time traversal on other nodes in the graph". - if I understand you correctly. :)

Ooops, sorry, yes, we're thinking the same thing. :-)  I didn't read carefully enough the first time.
Comment 11 Filip Pizlo 2012-03-06 17:44:01 PST
(In reply to comment #8)
> Created an attachment (id=130493) [details]
> Case showing the problem of type predications propagation after Phi elimination
> 
> This is the case showing the problem of type predications propagation after Phi elimination. Please pay attention to block 3, where node 51 and node 49 are eliminated since those two variables each have only one definition (in block 0). They're not used in block 3 but they should still be alive because they will be used in block 1 (block 3's successor). But after we eliminate 51 and 49 in block 3, the CFA ignores them and the type predictions of the two variables are not propagated to the end of the block, and hence not to the successors.

Aha!  I think that the thing that our dummy Phi insertion was giving us was liveness, and you're saying that we've lost this.

This is dangerous since we need to know liveness to compute DFG OSR entry constraints.

So we have two options:

1) Have a different way of computing liveness.  I'm OK with this if you want to do it, since we'll need it eventually!

2) If we eliminate a Phi from a block, then have the block's references to that Phi (i.e. variablesAtHead/AtTail) refer to its replacement.

I like (2) better for this patch, since it feels like a more minimal change.

What do you think?
Comment 12 Yuqiang Xian 2012-03-06 17:53:51 PST
(In reply to comment #11)
> 
> Aha!  I think that the thing that our dummy Phi insertion was giving us was liveness, and you're saying that we've lost this.
> 
> This is dangerous since we need to know liveness to compute DFG OSR entry constraints.
> 
> So we have two options:
> 
> 1) Have a different way of computing liveness.  I'm OK with this if you want to do it, since we'll need it eventually!
> 
> 2) If we eliminate a Phi from a block, then have the block's references to that Phi (i.e. variablesAtHead/AtTail) refer to its replacement.
> 
> I like (2) better for this patch, since it feels like a more minimal change.
> 
> What do you think?

I agree. Then the variablesAtHead/Tail of a block may reference a node outside the block and we need to extract the type predictions from it. But it _should_ not be a problem as we now keep the abstract values of all the nodes in the graph in one AbstractState - will verify it.
Comment 13 Yuqiang Xian 2012-03-07 02:54:30 PST
Created attachment 130579 [details]
proposed patch

Patch w/o the Phi use map.

Additionally, we don't observe any performance regression on the 3 benchmarks with this version.
Comment 14 Yuqiang Xian 2012-03-07 02:56:44 PST
Created attachment 130581 [details]
Latest performance result

As you could see, no obvious performance regression on the 3 major benchmarks.
Comment 15 WebKit Review Bot 2012-03-07 02:58:17 PST
Attachment 130579 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/CMakeLists.txt', u'S..." exit_code: 1
Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:128:  Tab found; better to use spaces  [whitespace/tab] [1]
Total errors found: 1 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Yuqiang Xian 2012-03-07 06:51:21 PST
Created attachment 130619 [details]
updated patch
Comment 17 Filip Pizlo 2012-03-07 11:59:18 PST
Comment on attachment 130619 [details]
updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=130619&action=review

R=me as it is, but it might be worth considering doing away with the Phi-to-block links, if possible.

> Source/JavaScriptCore/dfg/DFGRedundantPhiEliminationPhase.cpp:83
> +                if (replacement != NoNode) {
> +                    // Current Phi node will be skipped, we need to update
> +                    // the variable information if it references this node.
> +                    BasicBlock* basicBlock = m_graph.m_blocks[node.blockIndex()].get();
> +                    VirtualRegister operand = node.local();
> +                    if (operandIsArgument(operand)) {
> +                        unsigned arg = operandToArgument(operand);
> +                        if (basicBlock->variablesAtHead.argument(arg) == index) {
> +                            // This argument must be unused in this block.
> +                            ASSERT(basicBlock->variablesAtTail.argument(arg) == index);
> +                            basicBlock->variablesAtHead.argument(arg) = replacement;
> +                            basicBlock->variablesAtTail.argument(arg) = replacement;
> +                        }
> +                    } else {
> +                        if (basicBlock->variablesAtHead.local(operand) == index) {
> +                            // This local variable must be unused in this block.
> +                            ASSERT(basicBlock->variablesAtTail.local(operand) == index);
> +                            basicBlock->variablesAtHead.local(operand) = replacement;
> +                            basicBlock->variablesAtTail.local(operand) = replacement;
> +                        }
> +                    }

I think this is fine, but why didn't you just walk all blocks' variablesAtHead|Tail lists after the Phi fixup fixpoint?  That way, you wouldn't have had to save the block index in the Phi nodes' OpInfos.
Comment 18 Yuqiang Xian 2012-03-07 17:26:58 PST
Created attachment 130732 [details]
patch updated

Yes. It should be far cleaner! Patch is updated.

Thanks for the review comments!
Comment 19 Filip Pizlo 2012-03-07 17:53:37 PST
Comment on attachment 130732 [details]
patch updated

Yay! :-)
Comment 20 Yuqiang Xian 2012-03-07 19:10:19 PST
Landed as http://trac.webkit.org/changeset/110132