Bug 20764 - Proposed byte-code optimization framework
Summary: Proposed byte-code optimization framework
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Enhancement
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-09-10 07:05 PDT by Zoltan Herczeg
Modified: 2009-12-16 23:20 PST (History)
5 users (show)

See Also:


Attachments
first revision (85.82 KB, patch)
2008-09-10 08:29 PDT, Zoltan Herczeg
no flags Details | Formatted Diff | Diff
fixed up patch (92.61 KB, patch)
2008-09-10 10:49 PDT, Geoffrey Garen
ggaren: review-
Details | Formatted Diff | Diff
benchmark results (4.43 KB, text/plain)
2008-09-10 10:51 PDT, Geoffrey Garen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zoltan Herczeg 2008-09-10 07:05:54 PDT
Hi All,

We would like to share our ideas about a byte-code optimization framework. 

Here is a short description about it:

Purpose:
The code-block optimizer is a framework, which first parses a given code block, builds its basic block structure, executes various optimizations, and finally, recreates the code block again.

Invocation:
  - always invoked by nodes.cpp from FunctionBodyNode::generateCode and ProgramNode::generateCode

Shell extensions:
  - the jsc command line tool has a new argument: -o opt[,opt[...]]  (comma separated keywords)
    The optimizations are performed in the same order as they specified by this argument
  - The currently supported optimization keywords:
        dce   - dead code elimination
        cfp   - constant folding and propagation

Short description of the current optimizations:
  - dead code elimination
      Now working on extended basic block level. It creates a simple def-use system to eliminate "no side effects" instructions.
  - constant propagation
      This implementation is a general iterative solution for constant propagation. It determines the values of all constant expressions with the help of constans folding. Folding tries to evaulate the values of possibly constant expressions and after that the results are propagated.

Testing:
  - The framework was tested by inserting dummy movs (mov tr0, tr0) after every instructions. If this code passes the regression tests (jump targets are correctly redirected) we can assume the byte code generation is OK. 
 - all optimization tests are tested individually

Major changes in the source code:
  - new files added:
     CodeBlockOptimizer.cpp and .h
     OptConstansFoldingPropagation.cpp
   - files changed:
     CodeBlock.cpp - new structures added, which are basically replicas of the original structures, except the instruction indices are changed to byte code pointers.
     Opcode.h - opcode flags are added to every opcode

Alternative invocation:
  - when a certain treshold is reached for a given code-block. In the latter case, an independent thread performs the optimization, and replaces the original block after the process is done (however, it is not included in the proposed patch).

Any feedbacks, questions are welcome. We hope it would be useful.
Comment 1 Zoltan Herczeg 2008-09-10 08:29:02 PDT
Created attachment 23320 [details]
first revision
Comment 2 Geoffrey Garen 2008-09-10 10:49:28 PDT
I had to make a number of changes to get this building on Mac OS X. Attaching fixed-up patch.
Comment 3 Geoffrey Garen 2008-09-10 10:49:59 PDT
Created attachment 23322 [details]
fixed up patch

Setting the review flag on this patch, since it was set on the last patch.
Comment 4 Geoffrey Garen 2008-09-10 10:51:18 PDT
Created attachment 23323 [details]
benchmark results

The results on SunSpider and SunSpider --v8 are a wash overall, perhaps a small regression on SunSpider --v8, but a 3.8% speedup on richards.
Comment 5 Geoffrey Garen 2008-09-10 10:58:18 PDT
Some initial impressions, from trying to get the patch to build: 

 RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*)
@@ -1795,6 +1799,9 @@
     
     CodeGenerator generator(this, globalObject->debugger(), scopeChain, &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack);
     generator.generate();
+
+    CodeBlockOptimizer opt;
+    m_code.set(static_cast<JSC::ProgramCodeBlock*>(opt.optimizeBlock(m_code.get(), globalObject->globalExec())));
 }

This cast is invalid, since a FunctionBodyNode does not hold a ProgramCodeBlock. You should change the optimizer to accept any kind of codeblock.

OptConstantFoldingPropagation.cpp should just fold into with CodeBlockOptimizer.cpp, since it holds part of the implementation for CodeBlockOptimizer.

The flags in Opcode.h are pretty hard to read. I think something like "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a shame when a variable name requires a comment just to explain what it stands for. You can put the constants inside a class or a namespace to avoid name collisions.

We know there are strings and floats that can be constant-folded in SunSpider. It's surprising that those tests didn't speed up at all.
Comment 6 Zoltan Herczeg 2008-09-10 11:45:11 PDT
(In reply to comment #5)
> Some initial impressions, from trying to get the patch to build: 
> 
>  RegisterID* FunctionBodyNode::emitCode(CodeGenerator& generator, RegisterID*)
> @@ -1795,6 +1799,9 @@
> 
>      CodeGenerator generator(this, globalObject->debugger(), scopeChain,
> &globalObject->symbolTable(), m_code.get(), m_varStack, m_functionStack);
>      generator.generate();
> +
> +    CodeBlockOptimizer opt;
> +   
> m_code.set(static_cast<JSC::ProgramCodeBlock*>(opt.optimizeBlock(m_code.get(),
> globalObject->globalExec())));
>  }
> 
> This cast is invalid, since a FunctionBodyNode does not hold a
> ProgramCodeBlock. You should change the optimizer to accept any kind of
> codeblock.

It is strange, that the gcc does not throw any warning messages. Maybe because the CodeBlock is the base class of ProgramCodeBlock.

We aware of the derived classes of the CodeBlock, and CodeBlockOptimizer::genCode() creates the new code block based on the original codeType, so that request has already implemented.

> OptConstantFoldingPropagation.cpp should just fold into with
> CodeBlockOptimizer.cpp, since it holds part of the implementation for
> CodeBlockOptimizer.

True. It was separated only for the sake of easier development.

> The flags in Opcode.h are pretty hard to read. I think something like
> "kConditionalJump" would be much easier to read than "OPFLAG_CJUMP." It's a
> shame when a variable name requires a comment just to explain what it stands
> for. You can put the constants inside a class or a namespace to avoid name
> collisions.

I agree with this. Code cleaning is far from complete :)

> We know there are strings and floats that can be constant-folded in SunSpider.
> It's surprising that those tests didn't speed up at all.

Which test cases and where? We always struggle to find good optimization opportunities, so we apreciate your help to show us any direction.
Comment 7 Zoltan Herczeg 2008-09-12 05:27:53 PDT
Hi,

We have found some good peephole optimizations in SunSpider. 

1) Optimizing negates:

[ 434] negate            tr132, lr2
[ 437] negate            tr133, lr2
[ 440] negate            tr134, lr2
(sometimes a move operation is inserted between the negates)

Can be changed to:
[ 434] negate            tr132, lr2
[ 437] op_mov            tr133, tr132
[ 440] op_mov            tr134, tr132

2) Mov folding:

a)
[   6] mov               tr36, tr0
[   9] mov               tr37, tr1
[  12] mov               tr38, tr2
[  15] mov               tr39, tr3
...

b)
[ 538] mov               tr132, tr3
[ 541] mov               tr133, tr3
[ 544] mov               tr134, tr3
[ 547] mov               tr135, tr3
...

Used especially in array initialization and function call with constant arguments.

We introduced two new instruction:
op_mov_range for the first version (implemented by a memcpy call)
op_set_range for the second version

Preliminary measurements shows 2.3% performance improve with SunSpider, but we will see what will happen after we merge our svn with the official one in Monday.
Comment 8 Maciej Stachowiak 2008-09-13 14:22:46 PDT
I'd like to see both Geoff and Cameron comment on this.
Comment 9 Geoffrey Garen 2008-10-24 13:23:15 PDT
> We have found some good peephole optimizations in SunSpider. 

Do you have a patch that implements these?
Comment 10 Geoffrey Garen 2008-10-24 13:33:59 PDT
Comment on attachment 23322 [details]
fixed up patch

I'm going to r- this because I don't want to add this kind of complexity to the engine without a measurable speedup.

Here's a simple suggestion for a place to start (inspired by what you've posted):
- annotate each opcode with a length
- change CodeBlock::dump, CTI::privateCompile*, etc. -- anything that iterates bytecode -- to use the abstract length annotations instead of hard-coded constants

That will allow you to land parts of this patch while reducing, rather than increasing, complexity.

Then, for a speedup:
- annotate each opcode with
    1) a flag indicating whether the opcode has a destination, which, if true, means that operand 0 holds the destination register
    2) a list of source operands (or maybe an enum of Unary | Binary | Ternary | etc. with canonical source operand locations
- change programs to issue explicit op_load instructions to initialize their variables to undefined
- add an optimization pass to the CodeGenerator that iterates up to the first label and removes any op_load instructions to a destinations that are written before they are read

This should substantially improve code like:
var x = 1;
var y = 2;
...
by removing the need to initialize x and y to undefined.
Comment 11 Csaba Osztrogonác 2009-12-16 23:20:49 PST
I closed it , because it is obsolete now.