Bug 72501 - Improve modulo operation on 32bit platforms
Summary: Improve modulo operation on 32bit platforms
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-16 06:41 PST by Yuqiang Xian
Modified: 2011-11-20 20:47 PST (History)
3 users (show)

See Also:


Attachments
proposed patch (14.40 KB, patch)
2011-11-16 08:33 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff
patch updated (14.82 KB, patch)
2011-11-16 18:36 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff
Another update (14.84 KB, patch)
2011-11-16 18:58 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuqiang Xian 2011-11-16 06:41:51 PST
There's softModulo implementation for ARM in baseline JIT. We can extend it to support more platforms like X86, and also apply it to DFG JIT. As there are some fast paths handling special cases performance gains are expected. On X86 (Core i7 Nehalem) Linux we're able to see 1% improvement on Kraken benchmark, mostly due to the 11% improvement on audio-oscillator. Neutral on SunSpider and V8.

TEST                         COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:                 1.011x as fast    5198.2ms +/- 0.2%   5141.6ms +/- 0.2%     significant

=============================================================================

  ai:                        1.007x as fast     794.1ms +/- 0.4%    788.9ms +/- 0.2%     significant
    astar:                   1.007x as fast     794.1ms +/- 0.4%    788.9ms +/- 0.2%     significant

  audio:                     1.029x as fast    1455.4ms +/- 0.3%   1413.9ms +/- 0.4%     significant
    beat-detection:          ??                 424.1ms +/- 1.0%    425.0ms +/- 0.6%     not conclusive: might be *1.002x as slow*
    dft:                     -                  378.3ms +/- 0.5%    376.8ms +/- 0.4%
    fft:                     -                  255.6ms +/- 0.3%    254.8ms +/- 0.4%
    oscillator:              1.112x as fast     397.4ms +/- 0.3%    357.3ms +/- 0.6%     significant

  imaging:                   -                 2062.5ms +/- 0.3%   2056.4ms +/- 0.3%
    gaussian-blur:           -                  753.6ms +/- 0.2%    752.7ms +/- 0.7%
    darkroom:                -                  417.7ms +/- 0.6%    417.1ms +/- 0.9%
    desaturate:              1.005x as fast     891.2ms +/- 0.5%    886.6ms +/- 0.1%     significant

  json:                      -                  199.4ms +/- 0.7%    199.0ms +/- 0.5%
    parse-financial:         ??                  77.9ms +/- 0.3%     78.1ms +/- 0.3%     not conclusive: might be *1.003x as slow*
    stringify-tinderbox:     -                  121.5ms +/- 1.0%    120.9ms +/- 0.8%

  stanford:                  1.005x as fast     686.8ms +/- 0.5%    683.4ms +/- 0.2%     significant
    crypto-aes:              ??                 137.6ms +/- 0.9%    138.3ms +/- 0.6%     not conclusive: might be *1.005x as slow*
    crypto-ccm:              -                  144.1ms +/- 0.3%    143.3ms +/- 0.7%
    crypto-pbkdf2:           1.010x as fast     294.9ms +/- 0.4%    292.0ms +/- 0.3%     significant
    crypto-sha256-iterative: -                  110.2ms +/- 0.9%    109.8ms +/- 1.2%
Comment 1 Yuqiang Xian 2011-11-16 08:33:32 PST
Created attachment 115385 [details]
proposed patch

Not marking review? - The major problem is that I don't have an ARM build environment and platform to test the patch.
Gavin, if possible could you please help test it on ARM if you think the modification is worthwhile? Otherwise I will try to setup such environment though it may take a bit long time... Thanks.
Comment 2 Gavin Barraclough 2011-11-16 10:51:44 PST
Will do!
Comment 3 Yuqiang Xian 2011-11-16 18:36:33 PST
Created attachment 115509 [details]
patch updated

Just found the previous patch may have problem in DFG as in certain cases the dividend or divisor registers could be modified (for negative dividend or negative divisor), if they're not spilled and not the last use then future operations on those operands could generate wrong results.
Comment 4 Yuqiang Xian 2011-11-16 18:58:41 PST
Created attachment 115512 [details]
Another update

Sorry... missing another fix in the previous patch. Attached a new one.
Comment 5 Yuqiang Xian 2011-11-17 23:07:11 PST
Comment on attachment 115512 [details]
Another update

I think it's ready for review. Had a test on ARMv7 and no regression is observed comparing to the code w/o the patch, DFG on or not.
Comment 6 Filip Pizlo 2011-11-20 19:19:56 PST
r=me.  Sorry for the long delay in getting around to looking at this!

By the way, do you know how many of the cases where this is a win have a constant divisor?
Comment 7 Yuqiang Xian 2011-11-20 19:40:42 PST
(In reply to comment #6)
> r=me.  Sorry for the long delay in getting around to looking at this!
> 
> By the way, do you know how many of the cases where this is a win have a constant divisor?

Thanks!

I don't know the answer to your question. While I guess we can add some fast paths to complex arithmetic operations (e.g., mod, div, mul) for known constants, where we could do some strength reduction peephole optimizations and see how they impact the performance. What do you think?
Comment 8 WebKit Review Bot 2011-11-20 20:47:17 PST
Comment on attachment 115512 [details]
Another update

Clearing flags on attachment: 115512

Committed r100881: <http://trac.webkit.org/changeset/100881>
Comment 9 WebKit Review Bot 2011-11-20 20:47:26 PST
All reviewed patches have been landed.  Closing bug.