|Summary:||State-of-art peephole state machine|
|Product:||WebKit||Reporter:||Judit Jász <jasy>|
|Severity:||Normal||CC:||emacemac7, ggaren, oliver, ossy|
|Version:||528+ (Nightly build)|
Description Judit Jász 2008-12-08 07:42:01 PST
We were thinking about if there any useful way to increase the maintainability and the efficiency of the current peephole framework. The current approach is quick, but the more peepholes are included the slower the bytecode generation is. (Because the continuous checks.) We have implemented a peephole framework which works with a state machine. The idea is originated from the Tamarin. This state machine works in the following way: 1. After each bytecode is emitted the state machine changes its state according to the given bytecode. 2. If it is found a pattern candidate, it tries to apply the peephole. Otherwise it counts the states forward according to the next bytecodes. In this framework it is very easy to adding a new peephole pattern to the system. All peepholes are grouped in the same file, and in a same style. We found there is no performance regression with this new framework. What are your opinions and thoughts about this solution?
Comment 1 Judit Jász 2008-12-08 10:33:43 PST
Created attachment 25843 [details] Peephole framwork with state machine This patch introduces the peephole framework, which works with the help of the generated state tables. We can switch among the original peepholes and the peepholes of the introduced framework with the macro PEEPHOLE_OPTIMIZER.
Comment 2 Sam Weinig 2008-12-08 10:38:33 PST
What is the performance change due to this patch? Is it a win?
Comment 3 Judit Jász 2008-12-08 11:02:56 PST
(In reply to comment #2) > What is the performance change due to this patch? Is it a win? > Whit these existed peepholes there is no any significant gain (and sometimes are minimal slow down). But I think if we increase the number of peepholes there could be some performance improvement. I will send the detailed performance results soon.
Comment 4 Judit Jász 2008-12-08 12:46:01 PST
Created attachment 25850 [details] performance results (on jit)
Comment 6 Oliver Hunt 2009-01-22 18:34:23 PST
Comment on attachment 25843 [details] Peephole framwork with state machine While this patch looks like it would be a great addition and improvement over the existing simplistic optimiser there are a number of issues that we'll need to address. The most significant of these is the use of the STL -- basically WebKit does not use STL, and for the more complex STL structures (such as vector, etc) we run the risk of mismatching malloc implementations. My other concern is that the state tables appear to be hand coded, and I would much rather we had a way to autogenerate them. Ignoring those issues the code itself looks good (bar some minor style errors) and probably just needs to be brought up to tip of tree (we do now have length info for each opcode which should remove/reduce that portion of this patch). Will CC myself to this bug so i don't miss updates. Many apologies for the huge delay in review :-(
Comment 7 Judit Jász 2009-01-23 09:04:26 PST
Created attachment 26973 [details] updated patch This patch contains the up to date version of the peephole framework patch with some new patterns. There are no significant performance differences on the SunSpider tests, but only our WindScorpion benchmark (http://www.sed.hu/webkit/?page=downloads). The performance gain is about 3% in this later case. The state tables of the peepholes are included by the peephole.cpp. This file is fully (re)generated at compilation time by the execution of the code of peephole_generator.cpp If you want to use this peephole framework, define the PEEPHOLE_OPTIMIZER macro.
Comment 8 Oliver Hunt 2009-01-23 11:06:58 PST
Comment on attachment 26973 [details] updated patch Ah ha, peephole_generator is stand alone executable i take it? Sorry, I misunderstood that in my first pass Hmmm, I'm not sure I understand the change to JITCall.cpp? --Oliver
Comment 9 Judit Jász 2009-01-23 11:53:44 PST
(In reply to comment #8) > (From update of attachment 26973 [details] [review]) > Ah ha, peephole_generator is stand alone executable i take it? Sorry, I > misunderstood that in my first pass > > Hmmm, I'm not sure I understand the change to JITCall.cpp? > > --Oliver Ok, the modification of the JITCall does not connect to the peephole framework. It is only the remainder of my earlier bug riport: https://bugs.webkit.org/show_bug.cgi?id=23497
Comment 10 Geoffrey Garen 2009-01-26 17:16:51 PST
Latest performance results still show no improvement.
Comment 11 Oliver Hunt 2009-01-26 17:28:17 PST
Geoff, while i realise there isn't a perf improvement i believe this is the direction we wish to move in future. A few things that need to occur in the patch (that i noticed over the weekend) peephole_optimiser should be renamed to PeepholeOptimiser (or rather appropriate changes to the generator need to occur to make this happen). Generation of the optimiser should be a build step -- the generated file itself shouldn't be included in the patch.
Comment 12 Csaba Osztrogonác 2009-03-11 03:23:31 PDT
Created attachment 28469 [details] updated patch I removed the previously included generated file, and renamed files suite to code style guidelines. Additionally the optimizer generation moved to a build phase, already it works on MAC, Qt, GTK and Windows platforms.
Comment 13 Csaba Osztrogonác 2009-03-11 03:27:15 PDT
Created attachment 28470 [details] performance result on MAC
Comment 14 Csaba Osztrogonác 2009-03-11 03:28:50 PDT
Created attachment 28471 [details] performance result (if JIT and WREC disabled)
Comment 15 Csaba Osztrogonác 2009-03-11 03:33:51 PDT
(In reply to comment #12) > Created an attachment (id=28469) [review] > updated patch And I forgot to mention, the second peephole pattern fixed, which had a hidden bug. (It works correctly on SunSpider and Regression tests, but appeared with another test.)
Comment 16 Oliver Hunt 2009-03-11 03:53:26 PDT
I'm not entirely convinced by this patch, while it is helpful to have a sane peephole optimiser the current tradeoff between win vs. additional complexity does not make it seem entirely worth while. It would be interesting to know what the results are if you do run-sunspider --v8 (which uses the original v8 tests) to get further performance information.
Comment 17 Eric Seidel (no email) 2009-05-21 19:04:21 PDT
Comment on attachment 28469 [details] updated patch Sounds like Oliver is waiting on v8 numbers to review this. Given that there has been no activity in this bug in 2 months, I'm marking this as r- to remove it from the review queue. You should feel encouraged to email oliver or geoff or maciej directly for further guidance as to where to take this patch. Thanks again for the patch!
Comment 18 Maciej Stachowiak 2009-05-21 22:36:46 PDT
Please reflag for review when the requested perf testing info is provided.
Comment 19 Csaba Osztrogonác 2009-12-16 23:23:17 PST
I closed it, because it is obsolete now.