Bug 104638 (UNCONFIRMED)
Support op_switch_imm bytecode on DFG.
https://bugs.webkit.org/show_bug.cgi?id=104638
Roman Zhuykov
Reported 2012-12-10 23:45:38 PST
Here I make an attempt to support op_switch on the DFG level.

Is an approach of parsing the jump table and expanding it into several comparisons feasible?

The DFG bytecode parser is unable to support creating multiple DFG basic blocks for one bytecode operation. Here I mostly solve that problem with tricks; what is the proper way to allow such operations?
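For readers unfamiliar with the bytecode, here is a minimal standalone sketch of the semantics being expanded: op_switch_imm's jump table maps immediate case values to bytecode targets, and the proposed lowering tests them one by one with equality comparisons. The types and names below are illustrative placeholders, not the actual patch or JSC code.

    #include <cstdint>
    #include <vector>

    // Illustrative placeholders only; not the actual patch or JSC types.
    struct CaseEntry {
        int32_t value;            // case constant from op_switch_imm's jump table
        unsigned bytecodeTarget;  // bytecode offset of the case body
    };

    // Conceptually, the expansion turns the jump table into:
    //   if (scrutinee == v0) goto t0;
    //   if (scrutinee == v1) goto t1;
    //   ...
    //   goto defaultTarget;
    // Each comparison needs its own basic block, which the DFG bytecode
    // parser normally creates only at bytecode-level jump targets.
    unsigned resolveSwitchByComparisons(int32_t scrutinee,
                                        const std::vector<CaseEntry>& jumpTable,
                                        unsigned defaultTarget)
    {
        for (const CaseEntry& entry : jumpTable) {
            if (scrutinee == entry.value)
                return entry.bytecodeTarget;
        }
        return defaultTarget;
    }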
Attachments
Proposed patch (8.87 KB, patch)
2012-12-10 23:48 PST, Roman Zhuykov
fpizlo: review-
Roman Zhuykov
Comment 1 2012-12-10 23:48:27 PST
Created attachment 178731: Proposed patch
Filip Pizlo
Comment 2 2012-12-11 01:26:56 PST
Comment on attachment 178731: Proposed patch

I don't think this is the right approach. For large switch statements, this will perform worse. I also fear that your way of creating multiple basic blocks is fragile; I can't see any obvious bugs in it, but it's a dangerous approach that requires a mutable context to be held in place while you repeatedly revisit the same bytecode.

A better way of handling a switch in an optimizing compiler is to (a) have a Switch node in the DFG IR and (b) perform transformations on the Switch later in compilation if you find that it's a trivial one. Writing a phase to handle (b) would give you a more straightforward handling of basic block creation. Using (a) and allowing the Switch to sometimes survive into the backend (particularly when it's a Switch with a lot of cases) would take care of the pathologies for large switch statements.
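For illustration, here is a minimal standalone sketch of what such a Switch terminal could carry (case values, per-case targets, and a fall-through) so that a later phase or the backend can decide between a comparison chain and a jump table. All names here (SwitchCase, SwitchData, shouldExpandToBranches, the threshold of 8) are assumptions for the sketch, not actual DFG IR.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative only; not the actual DFG IR.
    typedef unsigned BlockIndex;

    struct SwitchCase {
        int32_t value;      // immediate case value
        BlockIndex target;  // successor block for this case
    };

    struct SwitchData {
        std::vector<SwitchCase> cases;
        BlockIndex fallThrough; // "default" successor
    };

    // A later phase can inspect the SwitchData and either:
    //  - expand small switches into a tree of CompareStrictEq/Branch blocks, or
    //  - keep the Switch node and let the backend emit a jump table.
    bool shouldExpandToBranches(const SwitchData& data, size_t threshold = 8)
    {
        return data.cases.size() <= threshold;
    }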
Filip Pizlo
Comment 3 2012-12-11 01:31:02 PST
(In reply to comment #0)
> Here I make an attempt to support op_switch on the DFG level.
>
> Is an approach of parsing the jump table and expanding it into several comparisons feasible?
>
> The DFG bytecode parser is unable to support creating multiple DFG basic blocks for one bytecode operation.
> Here I mostly solve that problem with tricks; what is the proper way to allow such operations?

I can't see anything obviously wrong with your approach, but I think one way around it, as I say in the other comment, is to have the parser just create a Switch node and only create additional blocks later, if it's appropriate.

One thing you'd have to take care of if you did that is that currently, basic block index implies basic block order. So if you added blocks, you'd end up having the DFG generate code for those blocks at the end of the code. That's not a show-stopper, but it probably would lead to suboptimal performance. You could get around this by decoupling program order from basic block index, for example by having the DFG process blocks using an index vector:

    class Graph {
        // all of the current things...
        Vector<BlockIndex> m_blockOrder;
    };

But probably an even better way to do this in a first cut is to just wire the Switch node all the way through to the backend, and have the backend do all of the code generation for the Switch. Then you don't have to add any blocks. This might end up being slightly suboptimal for small switch statements (for example, we already have lots of optimizations for CompareStrictEq that you might not end up implementing for Switch even though it does the same things), but it will surely be better for large switch statements, where a linear sequence of CompareStrictEq's is likely to be a regression compared to the baseline JIT's lookup table approach.
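To make the decoupling concrete, here is a small standalone sketch (with placeholder types rather than the real Graph and JIT classes) of the backend walking blocks in an explicit program order instead of by ascending block index:

    #include <vector>

    // Illustrative placeholders; not the real DFG classes.
    typedef unsigned BlockIndex;
    struct BasicBlock { /* nodes, terminal, ... */ };

    struct Graph {
        std::vector<BasicBlock> m_blocks;
        // Program order, decoupled from block index; blocks appended late
        // (e.g. by a switch-expansion phase) can still be placed anywhere.
        std::vector<BlockIndex> m_blockOrder;
    };

    template<typename Functor>
    void forEachBlockInProgramOrder(Graph& graph, const Functor& generateCodeFor)
    {
        for (BlockIndex index : graph.m_blockOrder)
            generateCodeFor(graph.m_blocks[index]);
    }

With something like this, a late phase that splits a Switch into comparison blocks can append new blocks at any index and still splice them into a sensible program order.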
Filip Pizlo
Comment 4 2012-12-11 01:32:43 PST
(In reply to comment #0)
> Here I make an attempt to support op_switch on the DFG level.
>
> Is an approach of parsing the jump table and expanding it into several comparisons feasible?
>
> The DFG bytecode parser is unable to support creating multiple DFG basic blocks for one bytecode operation.
> Here I mostly solve that problem with tricks; what is the proper way to allow such operations?

One other thought: it seems that you're generating a linear sequence of CompareStrictEq's. If you really want to go with the CompareStrictEq approach to compiling a Switch, you should at least use a binary tree of blocks that contain comparisons, so that the amount of comparing you have to do is log(size of switch). But I'm not sure that this would be any faster than a lookup-table-based Switch in most of the cases where people actually use switch statements.
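For illustration, here is a standalone sketch of the binary-tree idea: sorted case values are split recursively, so the number of comparisons on any path is logarithmic in the number of cases rather than linear. The ComparisonNode type is a placeholder for the blocks of CompareStrictEq/Branch nodes that the DFG would actually materialize; it is not real JSC code.

    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <vector>

    // Illustrative only: one node per comparison block the compiler would emit.
    struct ComparisonNode {
        int32_t value;                           // pivot/case value tested here
        std::unique_ptr<ComparisonNode> less;    // blocks for values below the pivot
        std::unique_ptr<ComparisonNode> greater; // blocks for values above the pivot
    };

    // Split sorted case values recursively so the comparison depth is
    // O(log n) instead of a linear chain of CompareStrictEq nodes.
    std::unique_ptr<ComparisonNode> buildComparisonTree(
        const std::vector<int32_t>& sortedValues, size_t begin, size_t end)
    {
        if (begin >= end)
            return nullptr; // falls through to the default target
        size_t middle = begin + (end - begin) / 2;
        auto node = std::make_unique<ComparisonNode>();
        node->value = sortedValues[middle];
        node->less = buildComparisonTree(sortedValues, begin, middle);
        node->greater = buildComparisonTree(sortedValues, middle + 1, end);
        return node;
    }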
Roman Zhuykov
Comment 5 2012-12-12 05:16:07 PST
(In reply to comment #2)
Certainly, many comparisons for a large switch statement perform slower than a jump table. I don't like the idea of an additional phase which will only expand the switch into blocks; that will at least be slower than processing op_switch in the DFG bytecode parser.

Speaking about using a Switch DFG node all the way through to the backend: it seems complicated to implement processing of such a node in all the other phases. I see a lot of code that assumes each basic block in the control-flow graph has edges to one or two other blocks (jump or branch). A basic block with a Switch node may have up to 1000 outgoing edges. Is it possible to implement such support, or is there logic which prevents creating blocks with more than two exits?
Filip Pizlo
Comment 6 2012-12-12 13:03:58 PST
(In reply to comment #5)
> (In reply to comment #2)
> Certainly, many comparisons for a large switch statement perform slower than a jump table. I don't like the idea of an additional phase which will only expand the switch into blocks; that will at least be slower than processing op_switch in the DFG bytecode parser.

This is a blanket assertion with zero evidence. We don't tend to make architectural decisions based on zero evidence. Also, we rarely use compilation speed as the primary guide for deciding how to implement optimizations. There are three considerations when writing a compiler:

1) How fast the code that it generates runs.
2) How maintainable the compiler is.
3) How fast the compiler runs.

We prioritize in exactly that order: speed of generated code comes first, maintainability second, and compiler run-time third. The idea that you should cram everything into the bytecode parser makes sense only if you cared more about (3) than (2).

But wait, that's not all. Doing all Switch lowering in the bytecode parser prevents you from being able to choose between a lookup switch (i.e. decompose to branches) and a table switch (i.e. use a jump table) based on information gleaned from prior optimizations. Remember, the bytecode parser has close to zero knowledge about the behavior of the program beyond "peephole" profiling on a few key ops (heap accesses and calls, mainly). But later in the compiler we aggregate the profiling data to generate much more information, and at that point we also generate much stronger proofs about the nature of values. Hence leaving a Switch alone until more information is available is generally a good long-term architectural strategy.

> Speaking about using a Switch DFG node all the way through to the backend: it seems complicated to implement processing of such a node in all the other phases. I see a lot of code that assumes each basic block in the control-flow graph has edges to one or two other blocks (jump or branch).

Then you should fix that code as part of your patch.

> A basic block with a Switch node may have up to 1000 outgoing edges. Is it possible to implement such support, or is there logic which prevents creating blocks with more than two exits?

The only reason we have logic that assumes blocks have at most two outgoing edges is that currently our only true "control" construct is Branch. But there isn't any code that really relies on this property. In particular, if you took all of the places that switch on the NodeType of a block's terminal and do different things for Jump and Branch, and added a Switch case, then you wouldn't really be breaking anything (see the sketch after this comment).

This of course assumes that you're not going to implement sparse conditional constant propagation on Switch statements in the first go. That's a sensible thing not to do, since the sparse conditional component is really only there for conditional loop branches. You're unlikely to have a Switch statement where one of the cases acts as a loop back-edge while the other cases act as fall-through forward edges.
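To make that concrete, here is a hedged, standalone sketch of the "add a Switch case wherever code switches on the block terminal" pattern. The types below (TerminalKind, Terminal, and its successor fields) are illustrative stand-ins, not the real DFG Node/NodeType API.

    #include <vector>

    // Illustrative types; the real code would switch on the DFG NodeType
    // of a block's terminal node and use its successor accessors.
    typedef unsigned BlockIndex;

    enum TerminalKind { Jump, Branch, Switch };

    struct Terminal {
        TerminalKind kind;
        BlockIndex taken;                    // Jump target / Branch taken edge
        BlockIndex notTaken;                 // Branch only
        std::vector<BlockIndex> caseTargets; // Switch only
        BlockIndex fallThrough;              // Switch only ("default")
    };

    // Code that used to assume one or two successors generalizes by adding
    // a Switch case alongside the existing Jump and Branch cases.
    template<typename Functor>
    void forEachSuccessor(const Terminal& terminal, const Functor& visit)
    {
        switch (terminal.kind) {
        case Jump:
            visit(terminal.taken);
            break;
        case Branch:
            visit(terminal.taken);
            visit(terminal.notTaken);
            break;
        case Switch:
            for (BlockIndex target : terminal.caseTargets)
                visit(target);
            visit(terminal.fallThrough);
            break;
        }
    }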