72878 – Strength reduction for Mul and Mod operations for known constants in DFG

RESOLVED FIXED 72878

Strength reduction for Mul and Mod operations for known constants in DFG

https://bugs.webkit.org/show_bug.cgi?id=72878

Summary Strength reduction for Mul and Mod operations for known constants in DFG

Yuqiang Xian

Reported 2011-11-21 06:41:44 PST

We can do strength reduction for some expensive arithmetic operations like Mul, Mod etc., especially it would be easier for known constants such as power-of-two constants.

Attachments
the patch (7.05 KB, patch) 2011-11-21 06:57 PST, Yuqiang Xian	no flags	Details Formatted Diff Diff
patch updated (25.75 KB, patch) 2011-11-21 19:21 PST, Yuqiang Xian	no flags	Details Formatted Diff Diff
patch fixing the 64bit regression (25.90 KB, patch) 2011-11-21 21:51 PST, Yuqiang Xian	no flags	Details Formatted Diff Diff
simple test case (598 bytes, text/plain) 2011-11-22 20:00 PST, Yuqiang Xian	no flags	Details
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Yuqiang Xian

Comment 1 2011-11-21 06:57:02 PST

Created attachment 116079 [details] the patch No obvious performance gains on v8, sunspider and kraken benchmarks, though. Also currently only for 32_64 DFG.

Filip Pizlo

Comment 2 2011-11-21 10:57:16 PST

Can you add 64-bit code for this? It would be great if we could keep the two JITs in sync whenever possible.

Yuqiang Xian

Comment 3 2011-11-21 19:21:02 PST

Created attachment 116168 [details] patch updated Adding 64bit support, and also try to share the code between 64 and 32_64.

Yuqiang Xian

Comment 4 2011-11-21 19:47:06 PST

Comment on attachment 116168 [details] patch updated Sorry. Seems one failure in 64-bit layout test. Still under investigation. So clear review flags for now.

Yuqiang Xian

Comment 5 2011-11-21 21:51:15 PST

Created attachment 116177 [details] patch fixing the 64bit regression In the ArithMod fast path, the dividend value could be the final result (in case of |dividend| < |divisor|) and the result cannot be a tagged integer, so we speculate it as strict int32.

Yuqiang Xian

Comment 6 2011-11-22 18:59:13 PST

Performance results on "64-bit": 1% on SunSpider, neutral elsewhere. SunSpider: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.011x as fast 181.7ms +/- 0.9% 179.7ms +/- 0.6% significant ============================================================================= 3d: - 26.7ms +/- 1.8% 26.4ms +/- 1.4% cube: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% morph: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% raytrace: - 8.7ms +/- 5.5% 8.4ms +/- 4.4% access: - 17.1ms +/- 6.4% 16.9ms +/- 4.2% binary-trees: - 2.6ms +/- 23.2% 2.4ms +/- 25.1% fannkuch: - 7.3ms +/- 4.7% 7.1ms +/- 3.2% nbody: - 4.0ms +/- 0.0% 4.0ms +/- 0.0% nsieve: ?? 3.2ms +/- 14.1% 3.4ms +/- 10.9% not conclusive: might be *1.063x as slow* bitops: - 13.5ms +/- 3.7% 13.1ms +/- 1.7% 3bit-bits-in-byte: - 1.0ms +/- 0.0% 1.0ms +/- 0.0% bits-in-byte: - 4.3ms +/- 8.0% 4.1ms +/- 5.5% bitwise-and: - 3.1ms +/- 7.3% 3.0ms +/- 0.0% nsieve-bits: - 5.1ms +/- 4.4% 5.0ms +/- 0.0% controlflow: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% recursive: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% crypto: ?? 11.0ms +/- 0.0% 11.1ms +/- 2.0% not conclusive: might be *1.009x as slow* aes: ?? 7.0ms +/- 0.0% 7.1ms +/- 3.2% not conclusive: might be *1.014x as slow* md5: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: 1.052x as fast 24.4ms +/- 1.5% 23.2ms +/- 1.3% significant format-tofte: - 12.4ms +/- 3.0% 12.2ms +/- 2.5% format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant math: - 19.0ms +/- 0.0% 19.0ms +/- 0.0% cordic: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% partial-sums: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% spectral-norm: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% regexp: - 14.0ms +/- 0.0% 14.0ms +/- 0.0% dna: - 14.0ms +/- 0.0% 14.0ms +/- 0.0% string: - 54.0ms +/- 0.0% 54.0ms +/- 0.0% base64: - 4.0ms +/- 0.0% 4.0ms +/- 0.0% fasta: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% tagcloud: - 13.0ms +/- 0.0% 13.0ms +/- 0.0% unpack-code: - 23.0ms +/- 0.0% 23.0ms +/- 0.0% validate-input: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% Kraken: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 3782.4ms +/- 0.4% 3766.0ms +/- 0.3% ============================================================================= ai: - 803.6ms +/- 1.0% 800.3ms +/- 0.5% astar: - 803.6ms +/- 1.0% 800.3ms +/- 0.5% audio: - 1049.8ms +/- 0.3% 1049.5ms +/- 0.1% beat-detection: - 208.1ms +/- 0.1% 207.9ms +/- 0.1% dft: - 436.4ms +/- 0.5% 435.0ms +/- 0.1% fft: ?? 137.3ms +/- 0.3% 138.0ms +/- 0.6% not conclusive: might be *1.005x as slow* oscillator: ?? 268.0ms +/- 0.4% 268.6ms +/- 0.1% not conclusive: might be *1.002x as slow* imaging: 1.010x as fast 1159.8ms +/- 0.6% 1148.0ms +/- 0.5% significant gaussian-blur: - 585.7ms +/- 0.3% 585.0ms +/- 0.1% darkroom: - 341.0ms +/- 2.0% 334.9ms +/- 1.8% desaturate: 1.022x as fast 233.1ms +/- 0.2% 228.1ms +/- 0.1% significant json: ?? 170.1ms +/- 0.4% 170.3ms +/- 0.3% not conclusive: might be *1.001x as slow* parse-financial: *1.014x as slow* 74.0ms +/- 0.5% 75.0ms +/- 0.6% significant stringify-tinderbox: 1.008x as fast 96.1ms +/- 0.5% 95.3ms +/- 0.4% significant stanford: - 599.1ms +/- 0.4% 597.9ms +/- 0.3% crypto-aes: - 127.9ms +/- 0.8% 128.0ms +/- 0.4% crypto-ccm: ?? 127.9ms +/- 0.3% 128.2ms +/- 0.4% not conclusive: might be *1.002x as slow* crypto-pbkdf2: - 237.0ms +/- 0.5% 236.2ms +/- 0.8% crypto-sha256-iterative: 1.008x as fast 106.3ms +/- 0.3% 105.5ms +/- 0.4% significant V8: FROM - Richards: 9613 DeltaBlue: 6407 Crypto: 15087 RayTrace: 8427 EarleyBoyer: 8602 RegExp: 1838 Splay: 6306 ---- Score (version 6): 6947 TO - Richards: 9616 DeltaBlue: 6341 Crypto: 15109 RayTrace: 8312 EarleyBoyer: 8648 RegExp: 1856 Splay: 6275 ---- Score (version 6): 6935

Filip Pizlo

Comment 7 2011-11-22 19:13:14 PST

Comment on attachment 116177 [details] patch fixing the 64bit regression View in context: https://bugs.webkit.org/attachment.cgi?id=116177&action=review Looks great! Two comments: 1) Can you check that the indentation is 4 spaces, no tabs, in the one place I pointed out? 2) Since there is no clear win on major benchmarks, can you construct a micro benchmark that demonstrates the win? It can be arbitrarily simple. I think that this patch is a good idea, and it's the right direction, but we just need to make sure that it does what we think it does. I'm marking it r+ and cq- so that you can check the indentation. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:1602 > + if (x) { > + num = x; > + log += shift; > + } Is the indentation right here?

Filip Pizlo

Comment 8 2011-11-22 19:16:34 PST

> format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-)

Yuqiang Xian

Comment 9 2011-11-22 20:00:21 PST

Created attachment 116309 [details] simple test case Filip, thanks for the comments! Here's a very simple test case for constant power of two mul and mod. Tested on 64bit: ToT - constant multiply result: 9999999900000000, in 878 ms constant modulo result: 50000000, in 784 ms Patched - constant multiply result: 9999999900000000, in 721 ms constant modulo result: 50000000, in 721 ms

Filip Pizlo

Comment 10 2011-11-22 20:04:13 PST

(In reply to comment #9) > Created an attachment (id=116309) [details] > simple test case > > Filip, thanks for the comments! > > Here's a very simple test case for constant power of two mul and mod. > > Tested on 64bit: > > ToT - > constant multiply result: 9999999900000000, in 878 ms > constant modulo result: 50000000, in 784 ms > > Patched - > constant multiply result: 9999999900000000, in 721 ms > constant modulo result: 50000000, in 721 ms That's great! Note that 'i' and 'iter' are global variables in your test, so probably you could get even better performance by making 'iter' an argument to the test functions (so it's local) or maybe loading it into a local 'var' at the top of those functions, and then declaring 'i' as a var.

Yuqiang Xian

Comment 11 2011-11-22 20:19:23 PST

Landed as r101042 - http://trac.webkit.org/changeset/101042

Yuqiang Xian

Comment 12 2011-11-22 20:20:08 PST

Comment on attachment 116177 [details] patch fixing the 64bit regression Clear flags for landed patch.

Andy Wingo

Comment 13 2011-11-23 08:32:30 PST

(In reply to comment #8) > > format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant > > Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-) Sounds like the integral milliseconds truncation of bug 71801.

Filip Pizlo

Comment 14 2011-11-23 11:33:46 PST

(In reply to comment #13) > (In reply to comment #8) > > > format-xparb: 1.091x as fast 12.0ms +/- 0.0% 11.0ms +/- 0.0% significant > > > > Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason. So I tend to ignore it. That's not to say that this isn't a true progression. But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-) > > Sounds like the integral milliseconds truncation of bug 71801. This pathology shows up even when doing microsecond timing with Tools/Scripts/bencher.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Nobody

Reported

2011-11-21 06:41 PST

Modified

2011-11-23 11:33 PST History

CC List

3 users Show

URL

Keywords

Depends on

Blocks