WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
72501
Improve modulo operation on 32bit platforms
https://bugs.webkit.org/show_bug.cgi?id=72501
Summary
Improve modulo operation on 32bit platforms
Yuqiang Xian
Reported
2011-11-16 06:41:51 PST
There's softModulo implementation for ARM in baseline JIT. We can extend it to support more platforms like X86, and also apply it to DFG JIT. As there are some fast paths handling special cases performance gains are expected. On X86 (Core i7 Nehalem) Linux we're able to see 1% improvement on Kraken benchmark, mostly due to the 11% improvement on audio-oscillator. Neutral on SunSpider and V8. TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.011x as fast 5198.2ms +/- 0.2% 5141.6ms +/- 0.2% significant ============================================================================= ai: 1.007x as fast 794.1ms +/- 0.4% 788.9ms +/- 0.2% significant astar: 1.007x as fast 794.1ms +/- 0.4% 788.9ms +/- 0.2% significant audio: 1.029x as fast 1455.4ms +/- 0.3% 1413.9ms +/- 0.4% significant beat-detection: ?? 424.1ms +/- 1.0% 425.0ms +/- 0.6% not conclusive: might be *1.002x as slow* dft: - 378.3ms +/- 0.5% 376.8ms +/- 0.4% fft: - 255.6ms +/- 0.3% 254.8ms +/- 0.4% oscillator: 1.112x as fast 397.4ms +/- 0.3% 357.3ms +/- 0.6% significant imaging: - 2062.5ms +/- 0.3% 2056.4ms +/- 0.3% gaussian-blur: - 753.6ms +/- 0.2% 752.7ms +/- 0.7% darkroom: - 417.7ms +/- 0.6% 417.1ms +/- 0.9% desaturate: 1.005x as fast 891.2ms +/- 0.5% 886.6ms +/- 0.1% significant json: - 199.4ms +/- 0.7% 199.0ms +/- 0.5% parse-financial: ?? 77.9ms +/- 0.3% 78.1ms +/- 0.3% not conclusive: might be *1.003x as slow* stringify-tinderbox: - 121.5ms +/- 1.0% 120.9ms +/- 0.8% stanford: 1.005x as fast 686.8ms +/- 0.5% 683.4ms +/- 0.2% significant crypto-aes: ?? 137.6ms +/- 0.9% 138.3ms +/- 0.6% not conclusive: might be *1.005x as slow* crypto-ccm: - 144.1ms +/- 0.3% 143.3ms +/- 0.7% crypto-pbkdf2: 1.010x as fast 294.9ms +/- 0.4% 292.0ms +/- 0.3% significant crypto-sha256-iterative: - 110.2ms +/- 0.9% 109.8ms +/- 1.2%
Attachments
proposed patch
(14.40 KB, patch)
2011-11-16 08:33 PST
,
Yuqiang Xian
no flags
Details
Formatted Diff
Diff
patch updated
(14.82 KB, patch)
2011-11-16 18:36 PST
,
Yuqiang Xian
no flags
Details
Formatted Diff
Diff
Another update
(14.84 KB, patch)
2011-11-16 18:58 PST
,
Yuqiang Xian
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Yuqiang Xian
Comment 1
2011-11-16 08:33:32 PST
Created
attachment 115385
[details]
proposed patch Not marking review? - The major problem is that I don't have an ARM build environment and platform to test the patch. Gavin, if possible could you please help test it on ARM if you think the modification is worthwhile? Otherwise I will try to setup such environment though it may take a bit long time... Thanks.
Gavin Barraclough
Comment 2
2011-11-16 10:51:44 PST
Will do!
Yuqiang Xian
Comment 3
2011-11-16 18:36:33 PST
Created
attachment 115509
[details]
patch updated Just found the previous patch may have problem in DFG as in certain cases the dividend or divisor registers could be modified (for negative dividend or negative divisor), if they're not spilled and not the last use then future operations on those operands could generate wrong results.
Yuqiang Xian
Comment 4
2011-11-16 18:58:41 PST
Created
attachment 115512
[details]
Another update Sorry... missing another fix in the previous patch. Attached a new one.
Yuqiang Xian
Comment 5
2011-11-17 23:07:11 PST
Comment on
attachment 115512
[details]
Another update I think it's ready for review. Had a test on ARMv7 and no regression is observed comparing to the code w/o the patch, DFG on or not.
Filip Pizlo
Comment 6
2011-11-20 19:19:56 PST
r=me. Sorry for the long delay in getting around to looking at this! By the way, do you know how many of the cases where this is a win have a constant divisor?
Yuqiang Xian
Comment 7
2011-11-20 19:40:42 PST
(In reply to
comment #6
)
> r=me. Sorry for the long delay in getting around to looking at this! > > By the way, do you know how many of the cases where this is a win have a constant divisor?
Thanks! I don't know the answer to your question. While I guess we can add some fast paths to complex arithmetic operations (e.g., mod, div, mul) for known constants, where we could do some strength reduction peephole optimizations and see how they impact the performance. What do you think?
WebKit Review Bot
Comment 8
2011-11-20 20:47:17 PST
Comment on
attachment 115512
[details]
Another update Clearing flags on attachment: 115512 Committed
r100881
: <
http://trac.webkit.org/changeset/100881
>
WebKit Review Bot
Comment 9
2011-11-20 20:47:26 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug