Bug 69717 - Fix value profiling in 32_64 JIT
Summary: Fix value profiling in 32_64 JIT
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-09 01:02 PDT by Yuqiang Xian
Modified: 2011-10-09 12:11 PDT (History)
3 users (show)

See Also:


Attachments
the patch (8.32 KB, patch)
2011-10-09 01:12 PDT, Yuqiang Xian
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuqiang Xian 2011-10-09 01:02:13 PDT
Current value profiling for 32_64 JIT is broken and cannot record correct predicated types, which results in many speculation failures in the 32_64 DFG JIT, fallbacks to baseline JIT, and re-optimizations again and again. With this fix 32_64 DFG JIT can demonstrate real performance gains.
Comment 1 Yuqiang Xian 2011-10-09 01:12:20 PDT
Created attachment 110292 [details]
the patch
Comment 2 Filip Pizlo 2011-10-09 01:29:00 PDT
Comment on attachment 110292 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110292&action=review

It would be great to see the performance impact, if any, on JSVALUE64 and JSVALUE32_64.

> Source/JavaScriptCore/bytecode/ValueProfile.cpp:64
> +        if (!value) {

Have you made sure that these changes don't perturb SunSpider, V8, and Kraken performance on JSVALUE64?
Comment 3 Yuqiang Xian 2011-10-09 02:03:11 PDT
Performance result on Mac 64 - no difference with this change.

Kraken

TEST                         COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:                 -                 4142.9ms +/- 0.2%   4146.3ms +/- 0.4%

=============================================================================

  ai:                        -                  802.3ms +/- 0.1%    802.7ms +/- 0.1%
    astar:                   -                  802.3ms +/- 0.1%    802.7ms +/- 0.1%

  audio:                     ??                1091.6ms +/- 0.3%   1093.9ms +/- 0.8%     not conclusive: might be *1.002x as slow*
    beat-detection:          -                  222.0ms +/- 0.3%    221.3ms +/- 0.2%
    dft:                     ??                 440.9ms +/- 0.1%    445.0ms +/- 1.9%     not conclusive: might be *1.009x as slow*
    fft:                     -                  152.0ms +/- 0.2%    152.0ms +/- 0.0%
    oscillator:              -                  276.7ms +/- 1.2%    275.6ms +/- 0.3%

  imaging:                   ??                1342.4ms +/- 0.8%   1343.9ms +/- 1.1%     not conclusive: might be *1.001x as slow*
    gaussian-blur:           ??                 604.2ms +/- 0.2%    606.8ms +/- 0.6%     not conclusive: might be *1.004x as slow*
    darkroom:                -                  503.3ms +/- 2.1%    502.1ms +/- 2.6%
    desaturate:              -                  234.9ms +/- 0.1%    235.0ms +/- 0.1%

  json:                      1.006x as fast     158.6ms +/- 0.2%    157.7ms +/- 0.3%     significant
    parse-financial:         -                   69.2ms +/- 0.4%     68.9ms +/- 0.3%
    stringify-tinderbox:     1.007x as fast      89.4ms +/- 0.4%     88.8ms +/- 0.5%     significant

  stanford:                  -                  748.0ms +/- 0.6%    748.1ms +/- 0.4%
    crypto-aes:              -                  153.5ms +/- 0.5%    153.0ms +/- 0.5%
    crypto-ccm:              -                  137.7ms +/- 1.1%    137.7ms +/- 1.0%
    crypto-pbkdf2:           ??                 358.0ms +/- 1.1%    358.4ms +/- 0.5%     not conclusive: might be *1.001x as slow*
    crypto-sha256-iterative: ??                  98.8ms +/- 0.5%     99.0ms +/- 0.3%     not conclusive: might be *1.002x as slow*


SunSpider

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           -                 185.4ms +/- 0.4%   184.6ms +/- 0.4%

=============================================================================

  3d:                  -                  27.1ms +/- 1.5%    26.9ms +/- 2.0%
    cube:              -                  10.0ms +/- 0.0%    10.0ms +/- 0.0%
    morph:             -                   8.0ms +/- 0.0%     8.0ms +/- 0.0%
    raytrace:          -                   9.1ms +/- 4.5%     8.9ms +/- 5.9%

  access:              -                  17.2ms +/- 3.3%    16.7ms +/- 4.5%
    binary-trees:      -                   2.5ms +/- 15.1%     2.5ms +/- 20.2%
    fannkuch:          -                   7.5ms +/- 6.7%     7.1ms +/- 3.2%
    nbody:             -                   4.2ms +/- 10.8%     4.0ms +/- 0.0%
    nsieve:            ??                  3.0ms +/- 0.0%     3.1ms +/- 7.3%     not conclusive: might be *1.033x as slow*

  bitops:              -                  14.3ms +/- 2.4%    14.0ms +/- 0.0%
    3bit-bits-in-byte: -                   1.0ms +/- 0.0%     1.0ms +/- 0.0%
    bits-in-byte:      -                   5.0ms +/- 0.0%     5.0ms +/- 0.0%
    bitwise-and:       -                   3.2ms +/- 9.4%     3.0ms +/- 0.0%
    nsieve-bits:       -                   5.1ms +/- 4.4%     5.0ms +/- 0.0%

  controlflow:         -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%
    recursive:         -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%

  crypto:              -                  12.0ms +/- 0.0%    12.0ms +/- 0.0%
    aes:               -                   7.0ms +/- 0.0%     7.0ms +/- 0.0%
    md5:               -                   3.0ms +/- 0.0%     3.0ms +/- 0.0%
    sha1:              -                   2.0ms +/- 0.0%     2.0ms +/- 0.0%

  date:                -                  22.0ms +/- 0.0%    22.0ms +/- 0.0%
    format-tofte:      -                  12.0ms +/- 0.0%    12.0ms +/- 0.0%
    format-xparb:      -                  10.0ms +/- 0.0%    10.0ms +/- 0.0%

  math:                -                  20.0ms +/- 0.0%    20.0ms +/- 0.0%
    cordic:            -                   7.0ms +/- 0.0%     7.0ms +/- 0.0%
    partial-sums:      -                  10.0ms +/- 0.0%    10.0ms +/- 0.0%
    spectral-norm:     -                   3.0ms +/- 0.0%     3.0ms +/- 0.0%

  regexp:              ??                 12.8ms +/- 2.4%    13.0ms +/- 0.0%     not conclusive: might be *1.016x as slow*
    dna:               ??                 12.8ms +/- 2.4%    13.0ms +/- 0.0%     not conclusive: might be *1.016x as slow*

  string:              -                  58.0ms +/- 0.0%    58.0ms +/- 0.0%
    base64:            -                   6.0ms +/- 0.0%     6.0ms +/- 0.0%
    fasta:             -                   7.0ms +/- 0.0%     7.0ms +/- 0.0%
    tagcloud:          -                  13.0ms +/- 0.0%    13.0ms +/- 0.0%
    unpack-code:       -                  25.0ms +/- 0.0%    25.0ms +/- 0.0%
    validate-input:    -                   7.0ms +/- 0.0%     7.0ms +/- 0.0%


V8

Before

Richards: 5786
DeltaBlue: 4463
Crypto: 13952
RayTrace: 7770
EarleyBoyer: 8285
RegExp: 1741
Splay: 5025
----
Score (version 6): 5730


After

Richards: 5952
DeltaBlue: 4450
Crypto: 13942
RayTrace: 8083
EarleyBoyer: 8240
RegExp: 1738
Splay: 5039
----
Score (version 6): 5779
Comment 4 Yuqiang Xian 2011-10-09 02:06:10 PDT
Performance result on Linux IA32 with DFG JIT turned on. Only part of the kraken result is available. There are still some issues with v8 benchmarks.

TEST                     COMPARISON            FROM (ToT)                 TO (Mod)        DETAILS

=============================================================================

** TOTAL **:             1.47x as fast     10246.2ms +/- 0.3%   6970.3ms +/- 0.1%     significant

=============================================================================

  ai:                    2.88x as fast      2270.8ms +/- 0.3%    789.7ms +/- 0.1%     significant
    astar:               2.88x as fast      2270.8ms +/- 0.3%    789.7ms +/- 0.1%     significant

  audio:                 1.071x as fast     2427.6ms +/- 0.6%   2266.4ms +/- 0.2%     significant
    beat-detection:      1.116x as fast      728.1ms +/- 0.5%    652.5ms +/- 0.5%     significant
    dft:                 1.081x as fast      602.8ms +/- 0.8%    557.6ms +/- 0.3%     significant
    fft:                 1.164x as fast      552.6ms +/- 0.5%    474.6ms +/- 0.1%     significant
    oscillator:          *1.069x as slow*    544.1ms +/- 0.8%    581.7ms +/- 0.1%     significant

  imaging:               1.49x as fast      4376.0ms +/- 0.5%   2929.3ms +/- 0.1%     significant
    gaussian-blur:       2.21x as fast      2843.7ms +/- 1.2%   1284.9ms +/- 0.1%     significant
    darkroom:            *1.135x as slow*    634.1ms +/- 1.8%    719.5ms +/- 0.6%     significant
    desaturate:          *1.030x as slow*    898.2ms +/- 0.3%    924.9ms +/- 0.2%     significant

  json:                  -                   215.1ms +/- 0.4%    214.8ms +/- 0.5%
    parse-financial:     1.021x as fast       98.5ms +/- 0.4%     96.5ms +/- 0.5%     significant
    stringify-tinderbox: *1.015x as slow*    116.6ms +/- 0.5%    118.3ms +/- 0.7%     significant

  stanford:              1.24x as fast       956.7ms +/- 0.4%    770.1ms +/- 0.3%     significant
    crypto-aes:          1.105x as fast      189.7ms +/- 1.3%    171.6ms +/- 1.2%     significant
    crypto-pbkdf2:       1.28x as fast       767.0ms +/- 0.4%    598.5ms +/- 0.1%     significant
Comment 5 Filip Pizlo 2011-10-09 02:14:44 PDT
This looks great!

By the way, you might find it easier to run all of the benchmarks in one go using Tools/Scripts/bencher.  You need two checkouts, which you build separately: one for baseline (no changes), one with your changes.  You also need ruby installed, and the json gem.  And a ~/.bencher file.  When you run bencher the first time, it will tell you how to set this file up.

Then do:

Tools/Scripts/bencher TipOfTree:<path-to-baseline-jsc> MyChanges:<path-to-your-jsc>

Example:

Tools/Scripts/bencher TipOfTree:/Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc CFA:/Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc

I find that this makes it a bit faster to run benchmarks.  But the way you ran them is fine, as well.
Comment 6 WebKit Review Bot 2011-10-09 03:18:29 PDT
Comment on attachment 110292 [details]
the patch

Clearing flags on attachment: 110292

Committed r97025: <http://trac.webkit.org/changeset/97025>
Comment 7 WebKit Review Bot 2011-10-09 03:18:33 PDT
All reviewed patches have been landed.  Closing bug.
Comment 8 Yuqiang Xian 2011-10-09 07:09:58 PDT
Thanks for your hints, Filip. I will try the approach you suggested next time.

BTW, one thing I think need to clarify is that the performance result of 32-bit DFG I pasted before is for the comparison between DFG w/o and w/ this patch (i.e. both have DFG turned on).

Here's the comparsion between w/o DFG and w/ DFG after this change, tested on Linux ia32. We still have two cases failed to run (stanford-crypto-ccm produces wrong results and stanford-crypto-sha256-iterative crashes) and under investigations.

TEST                     COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:             1.39x as fast     9678.1ms +/- 0.0%   6970.3ms +/- 0.1%     significant

=============================================================================

  ai:                    2.68x as fast     2114.8ms +/- 0.1%    789.7ms +/- 0.1%     significant
    astar:               2.68x as fast     2114.8ms +/- 0.1%    789.7ms +/- 0.1%     significant

  audio:                 1.037x as fast    2349.3ms +/- 0.1%   2266.4ms +/- 0.2%     significant
    beat-detection:      1.090x as fast     711.5ms +/- 0.3%    652.5ms +/- 0.5%     significant
    dft:                 1.045x as fast     582.5ms +/- 0.4%    557.6ms +/- 0.3%     significant
    fft:                 1.113x as fast     528.4ms +/- 0.2%    474.6ms +/- 0.1%     significant
    oscillator:          *1.104x as slow*   526.9ms +/- 0.1%    581.7ms +/- 0.1%     significant

  imaging:               1.39x as fast     4082.4ms +/- 0.1%   2929.3ms +/- 0.1%     significant
    gaussian-blur:       1.98x as fast     2543.3ms +/- 0.1%   1284.9ms +/- 0.1%     significant
    darkroom:            *1.142x as slow*   630.0ms +/- 0.1%    719.5ms +/- 0.6%     significant
    desaturate:          *1.017x as slow*   909.1ms +/- 0.0%    924.9ms +/- 0.2%     significant

  json:                  *1.013x as slow*   212.0ms +/- 0.4%    214.8ms +/- 0.5%     significant
    parse-financial:     *1.010x as slow*    95.5ms +/- 0.4%     96.5ms +/- 0.5%     significant
    stringify-tinderbox: *1.015x as slow*   116.5ms +/- 0.5%    118.3ms +/- 0.7%     significant

  stanford:              1.194x as fast     919.6ms +/- 0.2%    770.1ms +/- 0.3%     significant
    crypto-aes:          ??                 170.1ms +/- 0.3%    171.6ms +/- 1.2%     not conclusive: might be *1.009x as slow*
    crypto-pbkdf2:       1.25x as fast      749.5ms +/- 0.2%    598.5ms +/- 0.1%     significant
Comment 9 Filip Pizlo 2011-10-09 12:11:50 PDT
(In reply to comment #8)
> Thanks for your hints, Filip. I will try the approach you suggested next time.
> 
> BTW, one thing I think need to clarify is that the performance result of 32-bit DFG I pasted before is for the comparison between DFG w/o and w/ this patch (i.e. both have DFG turned on).
> 
> Here's the comparsion between w/o DFG and w/ DFG after this change, tested on Linux ia32. We still have two cases failed to run (stanford-crypto-ccm produces wrong results and stanford-crypto-sha256-iterative crashes) and under investigations.

Ah!  Thanks for the clarification.

> ** TOTAL **:             1.39x as fast     9678.1ms +/- 0.0%   6970.3ms +/- 0.1%     significant

This is starting to look good!