125430 – CSE should work in SSA

RESOLVED FIXED 125430

CSE should work in SSA

https://bugs.webkit.org/show_bug.cgi?id=125430

Summary CSE should work in SSA

Filip Pizlo

Reported 2013-12-08 14:35:02 PST

Patch forthcoming.

Attachments
the patch (3.20 KB, patch) 2013-12-08 15:37 PST, Filip Pizlo	oliver: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Filip Pizlo

Comment 1 2013-12-08 14:35:12 PST

This is actually non-trivial.

Filip Pizlo

Comment 2 2013-12-08 15:37:44 PST

Created attachment 218717 [details] the patch

Nadav Rotem

Comment 3 2013-12-08 19:00:39 PST

>> m_graph.getBlocksInDepthFirstOrder(depthFirst); Why are you scanning the code in DFS? Is this a preparation for the multi-block CSE support? If it is then I have some code for that. Also, I think that I need some help with my CSE patch. It boosts one of the sun spider benchmarks by 30%, but I had a crash I could not debug when I was refactoring the code to use the DoesReadOrWrite API.

Filip Pizlo

Comment 4 2013-12-08 19:06:31 PST

(In reply to comment #3) > >> m_graph.getBlocksInDepthFirstOrder(depthFirst); > > Why are you scanning the code in DFS? It guarantees that if A dominates B then I visit A before B. Essentially, CSE is doing a RAUW even though DFG doesn't have pointers from a node to its users. So even though I don't find global redundancies, I may have a basic block with: x: ValueAdd(@a, @b) y: ValueAdd(@a, @b) And @y gets used in some basic blocks dominated by this block. CSE is responsible for finding all of the uses of @y and replacing them with @x. That's why we do DFS order. > Is this a preparation for the multi-block CSE support? If it is then I have some code for that. Also, I think that I need some help with my CSE patch. It boosts one of the sun spider benchmarks by 30%, but I had a crash I could not debug when I was refactoring the code to use the DoesReadOrWrite API. OK - we should chat this week!

Nadav Rotem

Comment 5 2013-12-08 20:18:51 PST

(In reply to comment #4) > (In reply to comment #3) > > >> m_graph.getBlocksInDepthFirstOrder(depthFirst); > > > > Why are you scanning the code in DFS? > > It guarantees that if A dominates B then I visit A before B. Essentially, CSE is doing a RAUW even though DFG doesn't have pointers from a node to its users. So even though I don't find global redundancies, I may have a basic block with: > > x: ValueAdd(@a, @b) > y: ValueAdd(@a, @b) > > And @y gets used in some basic blocks dominated by this block. CSE is responsible for finding all of the uses of @y and replacing them with @x. That's why we do DFS order. > I know how dominator based value numbering works :) I was wondering why you are implementing it this way. My understanding is that this patch will not change anything since the code still does intra block cse. You will still need to support the non SSA dfg phase with the same code base, so I am not sure how you want to implement the multi block cse. The block scan order is the least of my worries. > > Is this a preparation for the multi-block CSE support? If it is then I have some code for that. Also, I think that I need some help with my CSE patch. It boosts one of the sun spider benchmarks by 30%, but I had a crash I could not debug when I was refactoring the code to use the DoesReadOrWrite API. > > OK - we should chat this week!

Filip Pizlo

Comment 6 2013-12-08 20:38:13 PST

(In reply to comment #5) > (In reply to comment #4) > > (In reply to comment #3) > > > >> m_graph.getBlocksInDepthFirstOrder(depthFirst); > > > > > > Why are you scanning the code in DFS? > > > > It guarantees that if A dominates B then I visit A before B. Essentially, CSE is doing a RAUW even though DFG doesn't have pointers from a node to its users. So even though I don't find global redundancies, I may have a basic block with: > > > > x: ValueAdd(@a, @b) > > y: ValueAdd(@a, @b) > > > > And @y gets used in some basic blocks dominated by this block. CSE is responsible for finding all of the uses of @y and replacing them with @x. That's why we do DFS order. > > > > I know how dominator based value numbering works :) I was wondering why you are implementing it this way. That's the thing - this isn't meant to be dominator-based value numbering. This is still doing block-local CSE. The only thing global about it is propagating node->misc.replacement, which is like a bulk version of RAUW. I think we're in violent agreement over the fact that this isn't dominator based value numbering. It's not meant to be. This patch is just a bugfix to make it legal to run the current CSE phase in SSA mode. Previously doing so would either assert or corrupt the IR. > My understanding is that this patch will not change anything since the code still does intra block cse. Correct, for now. This work: https://bugs.webkit.org/show_bug.cgi?id=125253 will eventually benefit from it, since it's doing a lowering that happens after SSAification. The long-term idea is to do some DFG->DFG lowerings only along the FTL path. Then it makes sense to do some extra CSE along the FTL path, as well. > You will still need to support the non SSA dfg phase with the same code base, so I am not sure how you want to implement the multi block cse. The block scan order is the least of my worries. I'm not implementing multi-block CSE. This patch isn't intended to make multi-block CSE easier to implement. I'm just making it so that it is possible to run the existing CSE phase when in SSA form. Prior to this patch, if you needed to do *any* CSE - even the stupid block-local one - over SSA, you'd crash because the block-local CSE would either assert or corrupt the IR. I think that this patch is mainly interesting from the standpoint of just making all tests pass when CSE is run over SSA. Ultimately, I want all optimization phases to be runnable in SSA and to have some way of testing that they do, so that we have the option of doing SSAification prior to the optimization fixpoint in FTL mode. I suspect that if nothing else, this ought to reduce compile times. But it should also make the CFA and the folder more precise. On the topic of multi-block CSE, I'm not sure how much the DFG CSE is going to end up buying us, in FTL mode. It certainly finds *some* things that LLVM's GVN won't find (in case of nodes that call C functions that do write some memory but don't clobber the node being eliminated), but for *most* things, LLVM GVN+TBAA is likely to be strictly better. Maybe then the DFG CSE would serve a similar purpose to LLVM's EarlyCSE, which as I understand it, has more to do with compile times than anything else? > > > > Is this a preparation for the multi-block CSE support? If it is then I have some code for that. Also, I think that I need some help with my CSE patch. It boosts one of the sun spider benchmarks by 30%, but I had a crash I could not debug when I was refactoring the code to use the DoesReadOrWrite API. > > > > OK - we should chat this week!

Filip Pizlo

Comment 7 2013-12-10 09:54:45 PST

Landed in http://trac.webkit.org/changeset/160328

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee

Filip Pizlo

Reported

2013-12-08 14:35 PST

Modified

2013-12-10 09:54 PST History

CC List

8 users Show

URL

Keywords

Depends on

Blocks

112840 125429

Dependencies

tree graph