We were thinking about if there any useful way to increase the maintainability and the efficiency of the current peephole framework.
The current approach is quick, but the more peepholes are included the slower the bytecode generation is. (Because the continuous checks.) We have implemented a peephole framework which works with a state machine. The idea is originated from the Tamarin. This state machine works in the following way:
1. After each bytecode is emitted the state machine changes its state according to the given bytecode.
2. If it is found a pattern candidate, it tries to apply the peephole. Otherwise it counts the states forward according to the next bytecodes.
In this framework it is very easy to adding a new peephole pattern to the system. All peepholes are grouped in the same file, and in a same style. We found there is no performance regression with this new framework.
What are your opinions and thoughts about this solution?
Created attachment 25843 [details]
Peephole framwork with state machine
This patch introduces the peephole framework, which works with the help of the generated state tables.
We can switch among the original peepholes and the peepholes of the introduced framework with the macro PEEPHOLE_OPTIMIZER.
What is the performance change due to this patch? Is it a win?
(In reply to comment #2)
> What is the performance change due to this patch? Is it a win?
Whit these existed peepholes there is no any significant gain (and sometimes are minimal slow down). But I think if we increase the number of peepholes there could be some performance improvement.
I will send the detailed performance results soon.
Created attachment 25850 [details]
performance results (on jit)
Created attachment 25851 [details]
Comment on attachment 25843 [details]
Peephole framwork with state machine
While this patch looks like it would be a great addition and improvement over the existing simplistic optimiser there are a number of issues that we'll need to address. The most significant of these is the use of the STL -- basically WebKit does not use STL, and for the more complex STL structures (such as vector, etc) we run the risk of mismatching malloc implementations.
My other concern is that the state tables appear to be hand coded, and I would much rather we had a way to autogenerate them.
Ignoring those issues the code itself looks good (bar some minor style errors) and probably just needs to be brought up to tip of tree (we do now have length info for each opcode which should remove/reduce that portion of this patch).
Will CC myself to this bug so i don't miss updates.
Many apologies for the huge delay in review :-(
Created attachment 26973 [details]
This patch contains the up to date version of the peephole framework patch with some new patterns.
There are no significant performance differences on the SunSpider tests, but only our WindScorpion benchmark (http://www.sed.hu/webkit/?page=downloads). The performance gain is about 3% in this later case.
The state tables of the peepholes are included by the peephole.cpp. This file is fully (re)generated at compilation time by the execution of the code of peephole_generator.cpp
If you want to use this peephole framework, define the PEEPHOLE_OPTIMIZER macro.
Comment on attachment 26973 [details]
Ah ha, peephole_generator is stand alone executable i take it? Sorry, I misunderstood that in my first pass
Hmmm, I'm not sure I understand the change to JITCall.cpp?
(In reply to comment #8)
> (From update of attachment 26973 [details] [review])
> Ah ha, peephole_generator is stand alone executable i take it? Sorry, I
> misunderstood that in my first pass
> Hmmm, I'm not sure I understand the change to JITCall.cpp?
Ok, the modification of the JITCall does not connect to the peephole framework.
It is only the remainder of my earlier bug riport: https://bugs.webkit.org/show_bug.cgi?id=23497
Latest performance results still show no improvement.
Geoff, while i realise there isn't a perf improvement i believe this is the direction we wish to move in future.
A few things that need to occur in the patch (that i noticed over the weekend) peephole_optimiser should be renamed to PeepholeOptimiser (or rather appropriate changes to the generator need to occur to make this happen). Generation of the optimiser should be a build step -- the generated file itself shouldn't be included in the patch.
Created attachment 28469 [details]
I removed the previously included generated file, and renamed files suite to code style guidelines. Additionally the optimizer generation moved to a build phase, already it works on MAC, Qt, GTK and Windows platforms.
Created attachment 28470 [details]
performance result on MAC
Created attachment 28471 [details]
performance result (if JIT and WREC disabled)
(In reply to comment #12)
> Created an attachment (id=28469) [review]
> updated patch
And I forgot to mention, the second peephole pattern fixed, which had a hidden bug. (It works correctly on SunSpider and Regression tests, but appeared with another test.)
I'm not entirely convinced by this patch, while it is helpful to have a sane peephole optimiser the current tradeoff between win vs. additional complexity does not make it seem entirely worth while.
It would be interesting to know what the results are if you do
(which uses the original v8 tests)
to get further performance information.
Comment on attachment 28469 [details]
Sounds like Oliver is waiting on v8 numbers to review this. Given that there has been no activity in this bug in 2 months, I'm marking this as r- to remove it from the review queue.
You should feel encouraged to email oliver or geoff or maciej directly for further guidance as to where to take this patch.
Thanks again for the patch!
Please reflag for review when the requested perf testing info is provided.
I closed it, because it is obsolete now.