Bug 72878 - Strength reduction for Mul and Mod operations for known constants in DFG
Summary: Strength reduction for Mul and Mod operations for known constants in DFG
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-21 06:41 PST by Yuqiang Xian
Modified: 2011-11-23 11:33 PST (History)
3 users (show)

See Also:


Attachments
the patch (7.05 KB, patch)
2011-11-21 06:57 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff
patch updated (25.75 KB, patch)
2011-11-21 19:21 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff
patch fixing the 64bit regression (25.90 KB, patch)
2011-11-21 21:51 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff
simple test case (598 bytes, text/plain)
2011-11-22 20:00 PST, Yuqiang Xian
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yuqiang Xian 2011-11-21 06:41:44 PST
We can do strength reduction for some expensive arithmetic operations like Mul, Mod etc., especially it would be easier for known constants such as power-of-two constants.
Comment 1 Yuqiang Xian 2011-11-21 06:57:02 PST
Created attachment 116079 [details]
the patch

No obvious performance gains on v8, sunspider and kraken benchmarks, though. Also currently only for 32_64 DFG.
Comment 2 Filip Pizlo 2011-11-21 10:57:16 PST
Can you add 64-bit code for this?  It would be great if we could keep the two JITs in sync whenever possible.
Comment 3 Yuqiang Xian 2011-11-21 19:21:02 PST
Created attachment 116168 [details]
patch updated

Adding 64bit support, and also try to share the code between 64 and 32_64.
Comment 4 Yuqiang Xian 2011-11-21 19:47:06 PST
Comment on attachment 116168 [details]
patch updated

Sorry. Seems one failure in 64-bit layout test. Still under investigation. So clear review flags for now.
Comment 5 Yuqiang Xian 2011-11-21 21:51:15 PST
Created attachment 116177 [details]
patch fixing the 64bit regression

In the ArithMod fast path, the dividend value could be the final result (in case of |dividend| < |divisor|) and the result cannot be a tagged integer, so we speculate it as strict int32.
Comment 6 Yuqiang Xian 2011-11-22 18:59:13 PST
Performance results on "64-bit": 1% on SunSpider, neutral elsewhere.

SunSpider:

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.011x as fast    181.7ms +/- 0.9%   179.7ms +/- 0.6%     significant

=============================================================================

  3d:                  -                  26.7ms +/- 1.8%    26.4ms +/- 1.4%
    cube:              -                  10.0ms +/- 0.0%    10.0ms +/- 0.0%
    morph:             -                   8.0ms +/- 0.0%     8.0ms +/- 0.0%
    raytrace:          -                   8.7ms +/- 5.5%     8.4ms +/- 4.4%

  access:              -                  17.1ms +/- 6.4%    16.9ms +/- 4.2%
    binary-trees:      -                   2.6ms +/- 23.2%     2.4ms +/- 25.1%
    fannkuch:          -                   7.3ms +/- 4.7%     7.1ms +/- 3.2%
    nbody:             -                   4.0ms +/- 0.0%     4.0ms +/- 0.0%
    nsieve:            ??                  3.2ms +/- 14.1%     3.4ms +/- 10.9%     not conclusive: might be *1.063x as slow*

  bitops:              -                  13.5ms +/- 3.7%    13.1ms +/- 1.7%
    3bit-bits-in-byte: -                   1.0ms +/- 0.0%     1.0ms +/- 0.0%
    bits-in-byte:      -                   4.3ms +/- 8.0%     4.1ms +/- 5.5%
    bitwise-and:       -                   3.1ms +/- 7.3%     3.0ms +/- 0.0%
    nsieve-bits:       -                   5.1ms +/- 4.4%     5.0ms +/- 0.0%

  controlflow:         -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%
    recursive:         -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%

  crypto:              ??                 11.0ms +/- 0.0%    11.1ms +/- 2.0%     not conclusive: might be *1.009x as slow*
    aes:               ??                  7.0ms +/- 0.0%     7.1ms +/- 3.2%     not conclusive: might be *1.014x as slow*
    md5:               -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%
    sha1:              -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%

  date:                1.052x as fast     24.4ms +/- 1.5%    23.2ms +/- 1.3%     significant
    format-tofte:      -                  12.4ms +/- 3.0%    12.2ms +/- 2.5%
    format-xparb:      1.091x as fast     12.0ms +/- 0.0%    11.0ms +/- 0.0%     significant

  math:                -                  19.0ms +/- 0.0%    19.0ms +/- 0.0%
    cordic:            -                   7.0ms +/- 0.0%     7.0ms +/- 0.0%
    partial-sums:      -                  10.0ms +/- 0.0%    10.0ms +/- 0.0%
    spectral-norm:     -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%

  regexp:              -                  14.0ms +/- 0.0%    14.0ms +/- 0.0%
    dna:               -                  14.0ms +/- 0.0%    14.0ms +/- 0.0%

  string:              -                  54.0ms +/- 0.0%    54.0ms +/- 0.0%
    base64:            -                   4.0ms +/- 0.0%     4.0ms +/- 0.0%
    fasta:             -                   8.0ms +/- 0.0%     8.0ms +/- 0.0%
    tagcloud:          -                  13.0ms +/- 0.0%    13.0ms +/- 0.0%
    unpack-code:       -                  23.0ms +/- 0.0%    23.0ms +/- 0.0%
    validate-input:    -                   6.0ms +/- 0.0%     6.0ms +/- 0.0%


Kraken:

TEST                         COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:                 -                 3782.4ms +/- 0.4%   3766.0ms +/- 0.3%

=============================================================================

  ai:                        -                  803.6ms +/- 1.0%    800.3ms +/- 0.5%
    astar:                   -                  803.6ms +/- 1.0%    800.3ms +/- 0.5%

  audio:                     -                 1049.8ms +/- 0.3%   1049.5ms +/- 0.1%
    beat-detection:          -                  208.1ms +/- 0.1%    207.9ms +/- 0.1%
    dft:                     -                  436.4ms +/- 0.5%    435.0ms +/- 0.1%
    fft:                     ??                 137.3ms +/- 0.3%    138.0ms +/- 0.6%     not conclusive: might be *1.005x as slow*
    oscillator:              ??                 268.0ms +/- 0.4%    268.6ms +/- 0.1%     not conclusive: might be *1.002x as slow*

  imaging:                   1.010x as fast    1159.8ms +/- 0.6%   1148.0ms +/- 0.5%     significant
    gaussian-blur:           -                  585.7ms +/- 0.3%    585.0ms +/- 0.1%
    darkroom:                -                  341.0ms +/- 2.0%    334.9ms +/- 1.8%
    desaturate:              1.022x as fast     233.1ms +/- 0.2%    228.1ms +/- 0.1%     significant

  json:                      ??                 170.1ms +/- 0.4%    170.3ms +/- 0.3%     not conclusive: might be *1.001x as slow*
    parse-financial:         *1.014x as slow*    74.0ms +/- 0.5%     75.0ms +/- 0.6%     significant
    stringify-tinderbox:     1.008x as fast      96.1ms +/- 0.5%     95.3ms +/- 0.4%     significant

  stanford:                  -                  599.1ms +/- 0.4%    597.9ms +/- 0.3%
    crypto-aes:              -                  127.9ms +/- 0.8%    128.0ms +/- 0.4%
    crypto-ccm:              ??                 127.9ms +/- 0.3%    128.2ms +/- 0.4%     not conclusive: might be *1.002x as slow*
    crypto-pbkdf2:           -                  237.0ms +/- 0.5%    236.2ms +/- 0.8%
    crypto-sha256-iterative: 1.008x as fast     106.3ms +/- 0.3%    105.5ms +/- 0.4%     significant


V8:

FROM -
Richards: 9613
DeltaBlue: 6407
Crypto: 15087
RayTrace: 8427
EarleyBoyer: 8602
RegExp: 1838
Splay: 6306
----
Score (version 6): 6947

TO -
Richards: 9616
DeltaBlue: 6341
Crypto: 15109
RayTrace: 8312
EarleyBoyer: 8648
RegExp: 1856
Splay: 6275
----
Score (version 6): 6935
Comment 7 Filip Pizlo 2011-11-22 19:13:14 PST
Comment on attachment 116177 [details]
patch fixing the 64bit regression

View in context: https://bugs.webkit.org/attachment.cgi?id=116177&action=review

Looks great!  Two comments:

1) Can you check that the indentation is 4 spaces, no tabs, in the one place I pointed out?

2) Since there is no clear win on major benchmarks, can you construct a micro benchmark that demonstrates the win?  It can be arbitrarily simple.  I think that this patch is a good idea, and it's the right direction, but we just need to make sure that it does what we think it does.

I'm marking it r+ and cq- so that you can check the indentation.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:1602
> +        if (x) {
> +          num = x;
> +          log += shift;
> +        }

Is the indentation right here?
Comment 8 Filip Pizlo 2011-11-22 19:16:34 PST
>     format-xparb:      1.091x as fast     12.0ms +/- 0.0%    11.0ms +/- 0.0%     significant

Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason.  So I tend to ignore it.  That's not to say that this isn't a true progression.  But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-)
Comment 9 Yuqiang Xian 2011-11-22 20:00:21 PST
Created attachment 116309 [details]
simple test case

Filip, thanks for the comments!

Here's a very simple test case for constant power of two mul and mod.

Tested on 64bit:

ToT -
constant multiply result: 9999999900000000, in 878 ms
constant modulo result: 50000000, in 784 ms

Patched -
constant multiply result: 9999999900000000, in 721 ms
constant modulo result: 50000000, in 721 ms
Comment 10 Filip Pizlo 2011-11-22 20:04:13 PST
(In reply to comment #9)
> Created an attachment (id=116309) [details]
> simple test case
> 
> Filip, thanks for the comments!
> 
> Here's a very simple test case for constant power of two mul and mod.
> 
> Tested on 64bit:
> 
> ToT -
> constant multiply result: 9999999900000000, in 878 ms
> constant modulo result: 50000000, in 784 ms
> 
> Patched -
> constant multiply result: 9999999900000000, in 721 ms
> constant modulo result: 50000000, in 721 ms

That's great!  Note that 'i' and 'iter' are global variables in your test, so probably you could get even better performance by making 'iter' an argument to the test functions (so it's local) or maybe loading it into a local 'var' at the top of those functions, and then declaring 'i' as a var.
Comment 11 Yuqiang Xian 2011-11-22 20:19:23 PST
Landed as r101042 - http://trac.webkit.org/changeset/101042
Comment 12 Yuqiang Xian 2011-11-22 20:20:08 PST
Comment on attachment 116177 [details]
patch fixing the 64bit regression

Clear flags for landed patch.
Comment 13 Andy Wingo 2011-11-23 08:32:30 PST
(In reply to comment #8)
> >     format-xparb:      1.091x as fast     12.0ms +/- 0.0%    11.0ms +/- 0.0%     significant
> 
> Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason.  So I tend to ignore it.  That's not to say that this isn't a true progression.  But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-)

Sounds like the integral milliseconds truncation of bug 71801.
Comment 14 Filip Pizlo 2011-11-23 11:33:46 PST
(In reply to comment #13)
> (In reply to comment #8)
> > >     format-xparb:      1.091x as fast     12.0ms +/- 0.0%    11.0ms +/- 0.0%     significant
> > 
> > Something to keep in mind: date-format-xparb fluctuates by 10% every time I breathe on the DFG, for any reason.  So I tend to ignore it.  That's not to say that this isn't a true progression.  But taking progressions on xparb seriously is bad luck: next time you make a patch you might find xparb regressing by 10%, for no reason at all! ;-)
> 
> Sounds like the integral milliseconds truncation of bug 71801.

This pathology shows up even when doing microsecond timing with Tools/Scripts/bencher.