We would like to share our ideas about a byte-code optimization framework.
Here is a short description of it:
The code-block optimizer is a framework that first parses a given code block, builds its basic block structure, executes various optimizations, and finally recreates the code block.
- always invoked by nodes.cpp from FunctionBodyNode::generateCode and ProgramNode::generateCode
- the jsc command line tool has a new argument: -o opt[,opt[...]] (comma separated keywords)
The optimizations are performed in the same order as they are specified in this argument
- The currently supported optimization keywords:
dce - dead code elimination
cfp - constant folding and propagation
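To make the dispatch concrete, here is a minimal sketch of how comma-separated keywords could select passes in argument order. All names (FakeCodeBlock, runOptimizations, Pass) are illustrative, not the actual patch code:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a code block; the log records which passes ran.
struct FakeCodeBlock { std::vector<std::string> log; };

using Pass = std::function<void(FakeCodeBlock&)>;

// Split "-o opt[,opt[...]]" on commas and run each registered pass
// in the order the keywords appear.
static void runOptimizations(FakeCodeBlock& block, const std::string& optArg,
                             const std::map<std::string, Pass>& passes)
{
    std::istringstream in(optArg);
    std::string keyword;
    while (std::getline(in, keyword, ',')) {
        auto it = passes.find(keyword);
        if (it != passes.end())
            it->second(block); // run passes in argument order
    }
}
```

Unknown keywords are silently skipped here; a real driver would presumably report them.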
Short description of the current optimizations:
- dead code elimination
It currently works at the extended basic block level. It builds a simple def-use system to eliminate instructions with no side effects.
- constant propagation
This implementation is a general iterative solution for constant propagation. It determines the values of all constant expressions with the help of constant folding. Folding tries to evaluate the values of possibly constant expressions, and the results are then propagated.
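The iterative fold-then-propagate idea can be sketched on a toy three-address code. The instruction shape, opcode names, and register names below are illustrative; this version is only sound for straight-line, single-assignment code, whereas the real pass has to handle control-flow joins:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy three-address instruction (names illustrative).
struct Instr {
    std::string op;    // "load_const", "add", "mov"
    std::string dst, src1, src2;
    int imm = 0;
};

// Iterate until no more registers can be proven constant: constants are
// discovered, folded through "add", and propagated through "mov".
static std::map<std::string, int> propagateConstants(const std::vector<Instr>& code)
{
    std::map<std::string, int> known;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Instr& i : code) {
            if (i.op == "load_const" && !known.count(i.dst)) {
                known[i.dst] = i.imm;
                changed = true;
            } else if (i.op == "mov" && known.count(i.src1) && !known.count(i.dst)) {
                known[i.dst] = known[i.src1]; // propagate
                changed = true;
            } else if (i.op == "add" && known.count(i.src1) && known.count(i.src2)
                       && !known.count(i.dst)) {
                known[i.dst] = known[i.src1] + known[i.src2]; // fold
                changed = true;
            }
        }
    }
    return known;
}
```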
- The framework was tested by inserting dummy movs (mov tr0, tr0) after every instruction. If this code passes the regression tests (jump targets are correctly redirected), we can assume the byte code regeneration is OK.
- all optimizations are tested individually
Major changes in the source code:
- new files added:
CodeBlockOptimizer.cpp and .h
- files changed:
CodeBlock.cpp - new structures added, which are basically replicas of the original structures, except the instruction indices are changed to byte code pointers.
Opcode.h - opcode flags are added to every opcode
- the optimization can also be triggered when a certain threshold is reached for a given code-block. In that case, an independent thread performs the optimization and replaces the original block after the process is done (however, this is not included in the proposed patch).
Any feedback or questions are welcome. We hope it will be useful.
Created attachment 23320 [details]
I had to make a number of changes to get this building on Mac OS X. Attaching fixed-up patch.
Created attachment 23322 [details]
fixed up patch
Setting the review flag on this patch, since it was set on the last patch.
Created attachment 23323 [details]
The results on SunSpider and SunSpider --v8 are a wash overall, perhaps a small regression on SunSpider --v8, but a 3.8% speedup on richards.
Some initial impressions, from trying to get the patch to build:
RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*)
@@ -1795,6 +1799,9 @@
CodeGenerator generator(this, globalObject->debugger(), scopeChain, &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack);
+ CodeBlockOptimizer opt;
+ m_code.set(static_cast<JSC::ProgramCodeBlock*>(opt.optimizeBlock(m_code.get(), globalObject->globalExec())));
This cast is invalid, since a FunctionBodyNode does not hold a ProgramCodeBlock. You should change the optimizer to accept any kind of codeblock.
OptConstantFoldingPropagation.cpp should just be folded into CodeBlockOptimizer.cpp, since it holds part of the implementation for CodeBlockOptimizer.
The flags in Opcode.h are pretty hard to read. I think something like "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a shame when a variable name requires a comment just to explain what it stands for. You can put the constants inside a class or a namespace to avoid name collisions.
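The suggested style could look something like the following; the flag names and namespace are illustrative, not taken from the patch:

```cpp
#include <cassert>

// Descriptive, self-documenting flag constants scoped in a namespace,
// instead of OPFLAG_* macros that need comments to explain them.
namespace OpcodeFlags {
    enum Flag {
        kNone              = 0,
        kConditionalJump   = 1 << 0,
        kUnconditionalJump = 1 << 1,
        kHasDestination    = 1 << 2,
    };
}
```

The namespace keeps the global scope clean while letting call sites read as OpcodeFlags::kConditionalJump.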
We know there are strings and floats that can be constant-folded in SunSpider. It's surprising that those tests didn't speed up at all.
(In reply to comment #5)
> Some initial impressions, from trying to get the patch to build:
> RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*)
> @@ -1795,6 +1799,9 @@
> CodeGenerator generator(this, globalObject->debugger(), scopeChain,
> &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack);
> + CodeBlockOptimizer opt;
> This cast is invalid, since a FunctionBodyNode does not hold a
> ProgramCodeBlock. You should change the optimizer to accept any kind of
It is strange that gcc does not emit any warnings. Maybe that is because CodeBlock is the base class of ProgramCodeBlock.
We are aware of the derived classes of CodeBlock, and CodeBlockOptimizer::genCode() creates the new code block based on the original codeType, so that request has already been implemented.
> OptConstantFoldingPropagation.cpp should just fold into with
> CodeBlockOptimizer.cpp, since it holds part of the implementation for
True. It was separated only for the sake of easier development.
> The flags in Opcode.h are pretty hard to read. I think something like
> "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a
> shame when a variable name requires a comment just to explain what it stands
> for. You can put the constants inside a class or a namespace to avoid name
I agree with this. Code cleaning is far from complete :)
> We know there are strings and floats that can be constant-folded in SunSpider.
> It's surprising that those tests didn't speed up at all.
Which test cases, and where? We always struggle to find good optimization opportunities, so we would appreciate your help in pointing us in any direction.
We have found some good peephole optimizations in SunSpider.
1) Optimizing negates:
[ 434] negate tr132, lr2
[ 437] negate tr133, lr2
[ 440] negate tr134, lr2
(sometimes a move operation is inserted between the negates)
Can be changed to:
[ 434] negate tr132, lr2
[ 437] op_mov tr133, tr132
[ 440] op_mov tr134, tr132
2) Mov folding:
[ 6] mov tr36, tr0
[ 9] mov tr37, tr1
[ 12] mov tr38, tr2
[ 15] mov tr39, tr3
[ 538] mov tr132, tr3
[ 541] mov tr133, tr3
[ 544] mov tr134, tr3
[ 547] mov tr135, tr3
These patterns appear especially in array initialization and in function calls with constant arguments.
We introduced two new instructions:
op_mov_range for the first version (implemented by a memcpy call)
op_set_range for the second version
Preliminary measurements show a 2.3% performance improvement on SunSpider, but we will see what happens after we merge our svn with the official one on Monday.
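The two instructions described above could be sketched on a flat register file as follows. The opcode names op_mov_range and op_set_range come from the comment; the register representation and function signatures are illustrative:

```cpp
#include <cassert>
#include <cstring>

// Registers live in contiguous slots, so a run of movs from consecutive
// sources (tr36..tr39 <- tr0..tr3) can become one memcpy: op_mov_range.
// Note memcpy requires non-overlapping ranges; overlapping copies would
// need memmove instead.
static void movRange(int* regs, int dst, int src, int count)
{
    std::memcpy(regs + dst, regs + src, count * sizeof(int));
}

// Repeated movs of a single source (tr132..tr135 <- tr3) become a fill
// loop: op_set_range.
static void setRange(int* regs, int dst, int src, int count)
{
    for (int i = 0; i < count; ++i)
        regs[dst + i] = regs[src];
}
```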
I'd like to see both Geoff and Cameron comment on this.
> We have found some good peephole optimizations in SunSpider.
Do you have a patch that implements these?
Comment on attachment 23322 [details]
fixed up patch
I'm going to r- this because I don't want to add this kind of complexity to the engine without a measurable speedup.
Here's a simple suggestion for a place to start (inspired by what you've posted):
- annotate each opcode with a length
- change CodeBlock::dump, CTI::privateCompile*, etc. -- anything that iterates bytecode -- to use the abstract length annotations instead of hard-coded constants
That will allow you to land parts of this patch while reducing, rather than increasing, complexity.
Then, for a speedup:
- annotate each opcode with
1) a flag indicating whether the opcode has a destination, which, if true, means that operand 0 holds the destination register
2) a list of source operands (or maybe an enum of Unary | Binary | Ternary | etc., with canonical source operand locations)
- change programs to issue explicit op_load instructions to initialize their variables to undefined
- add an optimization pass to the CodeGenerator that iterates up to the first label and removes any op_load instructions to destinations that are written before they are read
This should substantially improve code like:
var x = 1;
var y = 2;
by removing the need to initialize x and y to undefined.
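The suggested pass could be sketched like this, on an illustrative instruction representation ("load_undef" stands in for the op_load of undefined; everything here is a sketch, not CodeGenerator code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative instruction: one optional destination, a list of sources.
struct Instr {
    std::string op;          // "load_undef", "load_const", "label", "ret", ...
    int dst = -1;            // destination register, if any
    std::vector<int> srcs;   // source registers
};

// True if reg is overwritten before any read, within the first
// straight-line run. Stop conservatively at a label, since control
// flow can join there.
static bool writtenBeforeRead(const std::vector<Instr>& code, size_t from, int reg)
{
    for (size_t i = from; i < code.size(); ++i) {
        if (code[i].op == "label")
            return false;
        for (int s : code[i].srcs)
            if (s == reg)
                return false; // read first: the load is live
        if (code[i].dst == reg)
            return true;      // overwritten first: the load is dead
    }
    return false;
}

// Drop "load undefined" instructions, before the first label, whose
// destination is written again before it is ever read.
static std::vector<Instr> removeDeadLoads(const std::vector<Instr>& code)
{
    std::vector<Instr> out;
    bool beforeFirstLabel = true;
    for (size_t i = 0; i < code.size(); ++i) {
        const Instr& ins = code[i];
        if (ins.op == "label")
            beforeFirstLabel = false;
        if (beforeFirstLabel && ins.op == "load_undef"
            && writtenBeforeRead(code, i + 1, ins.dst))
            continue; // dead initialization to undefined
        out.push_back(ins);
    }
    return out;
}
```

For `var x = 1; var y = 2;` this removes both loads of undefined, since x and y are assigned constants before any read.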
I closed it because it is obsolete now.