There's a softModulo implementation for ARM in the baseline JIT. We can extend it to support more platforms such as X86, and also apply it to the DFG JIT. Because it has fast paths that handle special cases, performance gains are expected.

On X86 (Core i7 Nehalem) Linux we see a 1% improvement on the Kraken benchmark, mostly due to the 11% improvement on audio-oscillator. SunSpider and V8 are neutral.

TEST                         COMPARISON            FROM                 TO               DETAILS
=============================================================================
** TOTAL **:                 1.011x as fast    5198.2ms +/- 0.2%   5141.6ms +/- 0.2%     significant
=============================================================================
  ai:                        1.007x as fast     794.1ms +/- 0.4%    788.9ms +/- 0.2%     significant
    astar:                   1.007x as fast     794.1ms +/- 0.4%    788.9ms +/- 0.2%     significant
  audio:                     1.029x as fast    1455.4ms +/- 0.3%   1413.9ms +/- 0.4%     significant
    beat-detection:          ??                 424.1ms +/- 1.0%    425.0ms +/- 0.6%     not conclusive: might be *1.002x as slow*
    dft:                     -                  378.3ms +/- 0.5%    376.8ms +/- 0.4%
    fft:                     -                  255.6ms +/- 0.3%    254.8ms +/- 0.4%
    oscillator:              1.112x as fast     397.4ms +/- 0.3%    357.3ms +/- 0.6%     significant
  imaging:                   -                 2062.5ms +/- 0.3%   2056.4ms +/- 0.3%
    gaussian-blur:           -                  753.6ms +/- 0.2%    752.7ms +/- 0.7%
    darkroom:                -                  417.7ms +/- 0.6%    417.1ms +/- 0.9%
    desaturate:              1.005x as fast     891.2ms +/- 0.5%    886.6ms +/- 0.1%     significant
  json:                      -                  199.4ms +/- 0.7%    199.0ms +/- 0.5%
    parse-financial:         ??                  77.9ms +/- 0.3%     78.1ms +/- 0.3%     not conclusive: might be *1.003x as slow*
    stringify-tinderbox:     -                  121.5ms +/- 1.0%    120.9ms +/- 0.8%
  stanford:                  1.005x as fast     686.8ms +/- 0.5%    683.4ms +/- 0.2%     significant
    crypto-aes:              ??                 137.6ms +/- 0.9%    138.3ms +/- 0.6%     not conclusive: might be *1.005x as slow*
    crypto-ccm:              -                  144.1ms +/- 0.3%    143.3ms +/- 0.7%
    crypto-pbkdf2:           1.010x as fast     294.9ms +/- 0.4%    292.0ms +/- 0.3%     significant
    crypto-sha256-iterative: -                  110.2ms +/- 0.9%    109.8ms +/- 1.2%
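For readers unfamiliar with the idea, here is a rough sketch of what "fast paths handling special cases" can look like for an integer modulo. It is not the code from the patch: the function name `trySoftModuloSketch`, the choice of fast paths, and the decision to defer zero divisors, `INT32_MIN`, and negative-zero results to a slow path are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not WebKit code. Tries to compute dividend % divisor
// with the sign of the dividend (as in JavaScript) using cheap integer ops.
// Returns false when the caller should fall back to a generic slow path.
bool trySoftModuloSketch(int32_t dividend, int32_t divisor, int32_t& result)
{
    // A zero divisor gives NaN in JavaScript, and INT32_MIN has no positive
    // counterpart; this sketch leaves both to the slow path.
    if (divisor == 0 || dividend == INT32_MIN || divisor == INT32_MIN)
        return false;

    // Work on magnitudes in local copies; the operands stay untouched.
    bool negativeResult = dividend < 0;
    uint32_t n = static_cast<uint32_t>(negativeResult ? -dividend : dividend);
    uint32_t d = static_cast<uint32_t>(divisor < 0 ? -divisor : divisor);

    uint32_t remainder;
    if (n < d) {
        // Fast path 1: the dividend is already smaller than the divisor.
        remainder = n;
    } else if ((d & (d - 1)) == 0) {
        // Fast path 2: power-of-two divisor, so a mask replaces the division.
        remainder = n & (d - 1);
    } else {
        // General case: shift-and-subtract, no hardware divide needed.
        uint32_t shifted = d;
        while (shifted <= (n >> 1))
            shifted <<= 1;
        remainder = n;
        while (shifted >= d) {
            if (remainder >= shifted)
                remainder -= shifted;
            shifted >>= 1;
        }
    }
    assert(remainder < d);

    // A zero remainder with a negative dividend is -0 in JavaScript, which
    // an int32 cannot represent, so hand that case to the slow path as well.
    if (!remainder && negativeResult)
        return false;

    result = negativeResult ? -static_cast<int32_t>(remainder)
                            : static_cast<int32_t>(remainder);
    return true;
}
```

With this sketch, `trySoftModuloSketch(13, 8, r)` takes the mask path and `trySoftModuloSketch(2, 5, r)` returns immediately, while `trySoftModuloSketch(-8, 4, r)` reports failure so that the caller can produce -0.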
Created attachment 115385 [details]
proposed patch

Not setting the review? flag yet. The main problem is that I don't have an ARM build environment or device to test the patch on. Gavin, if you think the change is worthwhile, could you please help test it on ARM? Otherwise I will try to set up such an environment, though that may take a while. Thanks.
Will do!
Created attachment 115509 [details]
patch updated

I just found that the previous patch may have a problem in the DFG: in certain cases (a negative dividend or a negative divisor) the dividend or divisor register could be modified. If that operand is not spilled and this is not its last use, later operations on it could produce wrong results.
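To illustrate the hazard described above, here is a toy model in C++. It is only a sketch: `RegisterFile`, `moduloClobbering`, and `moduloPreserving` are hypothetical names, the "registers" are plain array slots, and the sketch assumes a nonzero divisor and operands other than `INT32_MIN`.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file standing in for machine registers (hypothetical).
struct RegisterFile {
    int32_t reg[4];
};

// Unsafe variant: takes absolute values in place, so the operand registers
// no longer hold their original values once the modulo has been computed.
void moduloClobbering(RegisterFile& rf, int dividend, int divisor, int result)
{
    assert(rf.reg[divisor] != 0);
    bool negative = rf.reg[dividend] < 0;
    if (rf.reg[dividend] < 0)
        rf.reg[dividend] = -rf.reg[dividend];
    if (rf.reg[divisor] < 0)
        rf.reg[divisor] = -rf.reg[divisor];
    int32_t r = rf.reg[dividend] % rf.reg[divisor];
    rf.reg[result] = negative ? -r : r;
}

// Safe variant: works on scratch copies, so a later use of either operand
// still sees the original value.
void moduloPreserving(RegisterFile& rf, int dividend, int divisor, int result)
{
    assert(rf.reg[divisor] != 0);
    int32_t n = rf.reg[dividend];
    int32_t d = rf.reg[divisor];
    bool negative = n < 0;
    if (n < 0)
        n = -n;
    if (d < 0)
        d = -d;
    int32_t r = n % d;
    rf.reg[result] = negative ? -r : r;
}
```

Both variants produce the same result register, but after `moduloClobbering` runs on a dividend of -7 the dividend slot holds 7, so any later read of that operand would see the wrong value.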
Created attachment 115512 [details]
Another update

Sorry, the previous patch was missing another fix. A new one is attached.
Comment on attachment 115512 [details]
Another update

I think it's ready for review. I tested on ARMv7 and observed no regression compared to the code without the patch, with the DFG on or off.
r=me. Sorry for the long delay in getting around to looking at this! By the way, do you know how many of the cases where this is a win have a constant divisor?
(In reply to comment #6)
> r=me. Sorry for the long delay in getting around to looking at this!
>
> By the way, do you know how many of the cases where this is a win have a constant divisor?

Thanks! I don't know the answer to your question. I guess we could add fast paths for complex arithmetic operations (e.g., mod, div, mul) with known constant operands, apply strength-reduction peephole optimizations there, and then see how they affect performance. What do you think?
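As a concrete example of the strength reduction suggested above, here is a minimal sketch of how a compiler could pick a cheaper sequence once the divisor is a known constant. The names `ModPlan`, `planModByConstant`, and `applyModPlan` are hypothetical, the sketch only handles positive constant divisors, and it ignores the -0 case that a real JavaScript engine has to distinguish.

```cpp
#include <cassert>
#include <cstdint>

// Decision made once, at compile time, for a known constant divisor.
struct ModPlan {
    bool useMask;    // true when the divisor is a power of two
    uint32_t mask;   // divisor - 1, only meaningful when useMask is set
    int32_t divisor; // the original constant, used by the generic path
};

// Hypothetical planner: restricted to positive constants in this sketch.
ModPlan planModByConstant(int32_t divisor)
{
    assert(divisor > 0);
    ModPlan plan;
    plan.useMask = (divisor & (divisor - 1)) == 0;
    plan.mask = static_cast<uint32_t>(divisor) - 1;
    plan.divisor = divisor;
    return plan;
}

// What the emitted code would compute for a given run-time dividend.
int32_t applyModPlan(const ModPlan& plan, int32_t x)
{
    if (!plan.useMask)
        return x % plan.divisor; // generic path, divisor is known positive

    // Power of two: mask the magnitude, then restore the dividend's sign.
    // Unsigned negation keeps INT32_MIN well defined.
    uint32_t ux = static_cast<uint32_t>(x);
    uint32_t magnitude = x < 0 ? 0u - ux : ux;
    int32_t masked = static_cast<int32_t>(magnitude & plan.mask);
    return x < 0 ? -masked : masked;
}
```

For instance, a plan for the constant 8 reduces `x % 8` to a mask with 7 plus a sign fix-up, whereas a plan for 6 falls back to the generic remainder.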
Comment on attachment 115512 [details]
Another update

Clearing flags on attachment: 115512

Committed r100881: <http://trac.webkit.org/changeset/100881>
All reviewed patches have been landed. Closing bug.