RESOLVED FIXED 72878
Strength reduction for Mul and Mod operations for known constants in DFG
https://bugs.webkit.org/show_bug.cgi?id=72878
Summary Strength reduction for Mul and Mod operations for known constants in DFG
Yuqiang Xian
Reported 2011-11-21 06:41:44 PST
We can do strength reduction for some expensive arithmetic operations like Mul, Mod etc., especially it would be easier for known constants such as power-of-two constants.
Attachments
the patch (7.05 KB, patch)
2011-11-21 06:57 PST, Yuqiang Xian
no flags
patch updated (25.75 KB, patch)
2011-11-21 19:21 PST, Yuqiang Xian
no flags
patch fixing the 64bit regression (25.90 KB, patch)
2011-11-21 21:51 PST, Yuqiang Xian
no flags
simple test case (598 bytes, text/plain)
2011-11-22 20:00 PST, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2011-11-21 06:57:02 PST
Created attachment 116079 [details] the patch No obvious performance gains on v8, sunspider and kraken benchmarks, though. Also currently only for 32_64 DFG.
Filip Pizlo
Comment 2 2011-11-21 10:57:16 PST
Can you add 64-bit code for this? It would be great if we could keep the two JITs in sync whenever possible.
Yuqiang Xian
Comment 3 2011-11-21 19:21:02 PST
Created attachment 116168 [details] patch updated Adding 64bit support, and also try to share the code between 64 and 32_64.
Yuqiang Xian
Comment 4 2011-11-21 19:47:06 PST
Comment on attachment 116168 [details] patch updated Sorry. Seems one failure in 64-bit layout test. Still under investigation. So clear review flags for now.
Yuqiang Xian
Comment 5 2011-11-21 21:51:15 PST
Created attachment 116177 [details] patch fixing the 64bit regression In the ArithMod fast path, the dividend value could be the final result (in case of |dividend| < |divisor|) and the result cannot be a tagged integer, so we speculate it as strict int32.
Yuqiang Xian
Comment 6 2011-11-22 18:59:13 PST
Performance results on "64-bit": 1% on SunSpider, neutral elsewhere. SunSpider: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.011x as fast 181.7ms +/- 0.9% 179.7ms +/- 0.6% significant ============================================================================= 3d: - 26.7ms +/- 1.8% 26.4ms +/- 1.4% cube: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% morph: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% raytrace: - 8.7ms +/- 5.5% 8.4ms +/- 4.4% access: - 17.1ms +/- 6.4% 16.9ms +/- 4.2% binary-trees: - 2.6ms +/- 23.2% 2.4ms +/- 25.1% fannkuch: - 7.3ms +/- 4.7% 7.1ms +/- 3.2% nbody: - 4.0ms +/- 0.0% 4.0ms +/- 0.0% nsieve: ?? 3.2ms +/- 14.1% 3.4ms +/- 10.9% not conclusive: might be *1.063x as slow* bitops: - 13.5ms +/- 3.7% 13.1ms +/- 1.7% 3bit-bits-in-byte: - 1.0ms +/- 0.0% 1.0ms +/- 0.0% bits-in-byte: - 4.3ms +/- 8.0% 4.1ms +/- 5.5% bitwise-and: - 3.1ms +/- 7.3% 3.0ms +/- 0.0% nsieve-bits: - 5.1ms +/- 4.4% 5.0ms +/- 0.0% controlflow: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% recursive: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% crypto: ?? 11.0ms +/- 0.0% 11.1ms +/- 2.0% not conclusive: might be *1.009x as slow* aes: ?? 7.0ms +/- 0.0% 7.1ms +/- 3.2% not conclusive: might be *1.014x as slow* md5: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: 1.052x as fast 24.4ms +/- 1.5% 23.2ms +/- 1.3% significant format-tofte: - 12.4ms +/- 3.0% 12.2ms +/- 2.5% format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant math: - 19.0ms +/- 0.0% 19.0ms +/- 0.0% cordic: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% partial-sums: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% spectral-norm: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% regexp: - 14.0ms +/- 0.0% 14.0ms +/- 0.0% dna: - 14.0ms +/- 0.0% 14.0ms +/- 0.0% string: - 54.0ms +/- 0.0% 54.0ms +/- 0.0% base64: - 4.0ms +/- 0.0% 4.0ms +/- 0.0% fasta: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% tagcloud: - 13.0ms +/- 0.0% 13.0ms +/- 0.0% unpack-code: - 23.0ms +/- 0.0% 23.0ms +/- 0.0% validate-input: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% Kraken: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 3782.4ms +/- 0.4% 3766.0ms +/- 0.3% ============================================================================= ai: - 803.6ms +/- 1.0% 800.3ms +/- 0.5% astar: - 803.6ms +/- 1.0% 800.3ms +/- 0.5% audio: - 1049.8ms +/- 0.3% 1049.5ms +/- 0.1% beat-detection: - 208.1ms +/- 0.1% 207.9ms +/- 0.1% dft: - 436.4ms +/- 0.5% 435.0ms +/- 0.1% fft: ?? 137.3ms +/- 0.3% 138.0ms +/- 0.6% not conclusive: might be *1.005x as slow* oscillator: ?? 268.0ms +/- 0.4% 268.6ms +/- 0.1% not conclusive: might be *1.002x as slow* imaging: 1.010x as fast 1159.8ms +/- 0.6% 1148.0ms +/- 0.5% significant gaussian-blur: - 585.7ms +/- 0.3% 585.0ms +/- 0.1% darkroom: - 341.0ms +/- 2.0% 334.9ms +/- 1.8% desaturate: 1.022x as fast 233.1ms +/- 0.2% 228.1ms +/- 0.1% significant json: ?? 170.1ms +/- 0.4% 170.3ms +/- 0.3% not conclusive: might be *1.001x as slow* parse-financial: *1.014x as slow* 74.0ms +/- 0.5% 75.0ms +/- 0.6% significant stringify-tinderbox: 1.008x as fast 96.1ms +/- 0.5% 95.3ms +/- 0.4% significant stanford: - 599.1ms +/- 0.4% 597.9ms +/- 0.3% crypto-aes: - 127.9ms +/- 0.8% 128.0ms +/- 0.4% crypto-ccm: ?? 127.9ms +/- 0.3% 128.2ms +/- 0.4% not conclusive: might be *1.002x as slow* crypto-pbkdf2: - 237.0ms +/- 0.5% 236.2ms +/- 0.8% crypto-sha256-iterative: 1.008x as fast 106.3ms +/- 0.3% 105.5ms +/- 0.4% significant V8: FROM - Richards: 9613 DeltaBlue: 6407 Crypto: 15087 RayTrace: 8427 EarleyBoyer: 8602 RegExp: 1838 Splay: 6306 ---- Score (version 6): 6947 TO - Richards: 9616 DeltaBlue: 6341 Crypto: 15109 RayTrace: 8312 EarleyBoyer: 8648 RegExp: 1856 Splay: 6275 ---- Score (version 6): 6935
Filip Pizlo
Comment 7 2011-11-22 19:13:14 PST
Comment on attachment 116177 [details] patch fixing the 64bit regression View in context: https://bugs.webkit.org/attachment.cgi?id=116177&action=review Looks great! Two comments: 1) Can you check that the indentation is 4 spaces, no tabs, in the one place I pointed out? 2) Since there is no clear win on major benchmarks, can you construct a micro benchmark that demonstrates the win? It can be arbitrarily simple. I think that this patch is a good idea, and it's the right direction, but we just need to make sure that it does what we think it does. I'm marking it r+ and cq- so that you can check the indentation. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:1602 > + if (x) { > + num = x; > + log += shift; > + } Is the indentation right here?
Filip Pizlo
Comment 8 2011-11-22 19:16:34 PST
> format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-)
Yuqiang Xian
Comment 9 2011-11-22 20:00:21 PST
Created attachment 116309 [details] simple test case Filip, thanks for the comments! Here's a very simple test case for constant power of two mul and mod. Tested on 64bit: ToT - constant multiply result: 9999999900000000, in 878 ms constant modulo result: 50000000, in 784 ms Patched - constant multiply result: 9999999900000000, in 721 ms constant modulo result: 50000000, in 721 ms
Filip Pizlo
Comment 10 2011-11-22 20:04:13 PST
(In reply to comment #9) > Created an attachment (id=116309) [details] > simple test case > > Filip, thanks for the comments! > > Here's a very simple test case for constant power of two mul and mod. > > Tested on 64bit: > > ToT - > constant multiply result: 9999999900000000, in 878 ms > constant modulo result: 50000000, in 784 ms > > Patched - > constant multiply result: 9999999900000000, in 721 ms > constant modulo result: 50000000, in 721 ms That's great! Note that 'i' and 'iter' are global variables in your test, so probably you could get even better performance by making 'iter' an argument to the test functions (so it's local) or maybe loading it into a local 'var' at the top of those functions, and then declaring 'i' as a var.
Yuqiang Xian
Comment 11 2011-11-22 20:19:23 PST
Yuqiang Xian
Comment 12 2011-11-22 20:20:08 PST
Comment on attachment 116177 [details] patch fixing the 64bit regression Clear flags for landed patch.
Andy Wingo
Comment 13 2011-11-23 08:32:30 PST
(In reply to comment #8) > > format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant > > Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-) Sounds like the integral milliseconds truncation of bug 71801.
Filip Pizlo
Comment 14 2011-11-23 11:33:46 PST
(In reply to comment #13) > (In reply to comment #8) > > > format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant > > > > Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-) > > Sounds like the integral milliseconds truncation of bug 71801. This pathology shows up even when doing microsecond timing with Tools/Scripts/bencher.
Note You need to log in before you can comment on or make changes to this bug.