|Summary:||Proposed byte-code optimization framework|
|Product:||WebKit||Reporter:||Zoltan Herczeg <zherczeg>|
|Severity:||Enhancement||CC:||ggaren, jasy, loki, ossy, zwarich|
|Version:||528+ (Nightly build)|
Description Zoltan Herczeg 2008-09-10 07:05:54 PDT
Hi All, We would like to share our ideas about a byte-code optimization framework. Here is a short description about it: Purpose: The code-block optimizer is a framework, which first parses a given code block, builds its basic block structure, executes various optimizations, and finally, recreates the code block again. Invocation: - always invoked by nodes.cpp from FunctionBodyNode::generateCode and ProgramNode::generateCode Shell extensions: - the jsc command line tool has a new argument: -o opt[,opt[...]] (comma separated keywords) The optimizations are performed in the same order as they specified by this argument - The currently supported optimization keywords: dce - dead code elimination cfp - constant folding and propagation Short description of the current optimizations: - dead code elimination Now working on extended basic block level. It creates a simple def-use system to eliminate "no side effects" instructions. - constant propagation This implementation is a general iterative solution for constant propagation. It determines the values of all constant expressions with the help of constans folding. Folding tries to evaulate the values of possibly constant expressions and after that the results are propagated. Testing: - The framework was tested by inserting dummy movs (mov tr0, tr0) after every instructions. If this code passes the regression tests (jump targets are correctly redirected) we can assume the byte code generation is OK. - all optimization tests are tested individually Major changes in the source code: - new files added: CodeBlockOptimizer.cpp and .h OptConstansFoldingPropagation.cpp - files changed: CodeBlock.cpp - new structures added, which are basically replicas of the original structures, except the instruction indices are changed to byte code pointers. Opcode.h - opcode flags are added to every opcode Alternative invocation: - when a certain treshold is reached for a given code-block. In the latter case, an independent thread performs the optimization, and replaces the original block after the process is done (however, it is not included in the proposed patch). Any feedbacks, questions are welcome. We hope it would be useful.
Comment 2 Geoffrey Garen 2008-09-10 10:49:28 PDT
I had to make a number of changes to get this building on Mac OS X. Attaching fixed-up patch.
Comment 3 Geoffrey Garen 2008-09-10 10:49:59 PDT
Created attachment 23322 [details] fixed up patch Setting the review flag on this patch, since it was set on the last patch.
Comment 4 Geoffrey Garen 2008-09-10 10:51:18 PDT
Created attachment 23323 [details] benchmark results The results on SunSpider and SunSpider --v8 are a wash overall, perhaps a small regression on SunSpider --v8, but a 3.8% speedup on richards.
Comment 5 Geoffrey Garen 2008-09-10 10:58:18 PDT
Some initial impressions, from trying to get the patch to build: RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*) @@ -1795,6 +1799,9 @@ CodeGenerator generator(this, globalObject->debugger(), scopeChain, &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack); generator.generate(); + + CodeBlockOptimizer opt; + m_code.set(static_cast<JSC::ProgramCodeBlock*>(opt.optimizeBlock(m_code.get(), globalObject->globalExec()))); } This cast is invalid, since a FunctionBodyNode does not hold a ProgramCodeBlock. You should change the optimizer to accept any kind of codeblock. OptConstantFoldingPropagation.cpp should just fold into with CodeBlockOptimizer.cpp, since it holds part of the implementation for CodeBlockOptimizer. The flags in Opcode.h are pretty hard to read. I think something like "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a shame when a variable name requires a comment just to explain what it stands for. You can put the constants inside a class or a namespace to avoid name collisions. We know there are strings and floats that can be constant-folded in SunSpider. It's surprising that those tests didn't speed up at all.
Comment 6 Zoltan Herczeg 2008-09-10 11:45:11 PDT
(In reply to comment #5) > Some initial impressions, from trying to get the patch to build: > > RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*) > @@ -1795,6 +1799,9 @@ > > CodeGenerator generator(this, globalObject->debugger(), scopeChain, > &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack); > generator.generate(); > + > + CodeBlockOptimizer opt; > + > m_code.set(static_cast<JSC::ProgramCodeBlock*>(opt.optimizeBlock(m_code.get(), > globalObject->globalExec()))); > } > > This cast is invalid, since a FunctionBodyNode does not hold a > ProgramCodeBlock. You should change the optimizer to accept any kind of > codeblock. It is strange, that the gcc does not throw any warning messages. Maybe because the CodeBlock is the base class of ProgramCodeBlock. We aware of the derived classes of the CodeBlock, and CodeBlockOptimizer::genCode() creates the new code block based on the original codeType, so that request has already implemented. > OptConstantFoldingPropagation.cpp should just fold into with > CodeBlockOptimizer.cpp, since it holds part of the implementation for > CodeBlockOptimizer. True. It was separated only for the sake of easier development. > The flags in Opcode.h are pretty hard to read. I think something like > "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a > shame when a variable name requires a comment just to explain what it stands > for. You can put the constants inside a class or a namespace to avoid name > collisions. I agree with this. Code cleaning is far from complete :) > We know there are strings and floats that can be constant-folded in SunSpider. > It's surprising that those tests didn't speed up at all. Which test cases and where? We always struggle to find good optimization opportunities, so we apreciate your help to show us any direction.
Comment 7 Zoltan Herczeg 2008-09-12 05:27:53 PDT
Hi, We have found some good peephole optimizations in SunSpider. 1) Optimizing negates: [ 434] negate tr132, lr2 [ 437] negate tr133, lr2 [ 440] negate tr134, lr2 (sometimes a move operation is inserted between the negates) Can be changed to: [ 434] negate tr132, lr2 [ 437] op_mov tr133, tr132 [ 440] op_mov tr134, tr132 2) Mov folding: a) [ 6] mov tr36, tr0 [ 9] mov tr37, tr1 [ 12] mov tr38, tr2 [ 15] mov tr39, tr3 ... b) [ 538] mov tr132, tr3 [ 541] mov tr133, tr3 [ 544] mov tr134, tr3 [ 547] mov tr135, tr3 ... Used especially in array initialization and function call with constant arguments. We introduced two new instruction: op_mov_range for the first version (implemented by a memcpy call) op_set_range for the second version Preliminary measurements shows 2.3% performance improve with SunSpider, but we will see what will happen after we merge our svn with the official one in Monday.
Comment 8 Maciej Stachowiak 2008-09-13 14:22:46 PDT
I'd like to see both Geoff and Cameron comment on this.
Comment 9 Geoffrey Garen 2008-10-24 13:23:15 PDT
> We have found some good peephole optimizations in SunSpider. Do you have a patch that implements these?
Comment 10 Geoffrey Garen 2008-10-24 13:33:59 PDT
Comment on attachment 23322 [details] fixed up patch I'm going to r- this because I don't want to add this kind of complexity to the engine without a measurable speedup. Here's a simple suggestion for a place to start (inspired by what you've posted): - annotate each opcode with a length - change CodeBlock::dump, CTI::privateCompile*, etc. -- anything that iterates bytecode -- to use the abstract length annotations instead of hard-coded constants That will allow you to land parts of this patch while reducing, rather than increasing, complexity. Then, for a speedup: - annotate each opcode with 1) a flag indicating whether the opcode has a destination, which, if true, means that operand 0 holds the destination register 2) a list of source operands (or maybe an enum of Unary | Binary | Ternary | etc. with canonical source operand locations - change programs to issue explicit op_load instructions to initialize their variables to undefined - add an optimization pass to the CodeGenerator that iterates up to the first label and removes any op_load instructions to a destinations that are written before they are read This should substantially improve code like: var x = 1; var y = 2; ... by removing the need to initialize x and y to undefined.
Comment 11 Csaba Osztrogonác 2009-12-16 23:20:49 PST
I closed it , because it is obsolete now.