Bug 126679 - FTL should be able to be parallel
Summary: FTL should be able to be parallel
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks: 112840
Reported: 2014-01-08 21:12 PST by Filip Pizlo
Modified: 2014-01-30 13:32 PST

Attachments
the patch (3.37 KB, patch)
2014-01-08 21:14 PST, Filip Pizlo
oliver: review+

Description Filip Pizlo 2014-01-08 21:12:18 PST
FTL is already concurrent, but we should also allow parallel FTL compiles.
Comment 1 Filip Pizlo 2014-01-08 21:14:18 PST
Created attachment 220697
the patch
Comment 2 Filip Pizlo 2014-01-08 21:15:53 PST
The short story:


Benchmark report for Octane on dethklok (MacBookPro9,1).

VMs tested:
"DFG" at /Volumes/Data/pizlo/cStack/OpenSource/WebKitBuild/Release/jsc (r161543)
"FTL" at /Volumes/Data/pizlo/cStack/OpenSource/WebKitBuild/Release/jsc (r161543)
    export JSC_useExperimentalFTL=true
"FTLParallel" at /Volumes/Data/pizlo/cStack/OpenSource/WebKitBuild/Release/jsc (r161543)
    export JSC_useExperimentalFTL=true
    export JSC_numberOfCompilerThreads=7

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements.
Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level
timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                 DFG                       FTL                   FTLParallel              FTLParallel v. DFG    
Octane and V8v7:
   gbemu          x2      37.09279+-0.16632    !    57.01742+-3.35945    ^    46.94429+-0.28350       ! definitely 1.2656x slower

V8v7:
   <arithmetic>                 ERROR                     ERROR                     ERROR             
   <geometric> *                ERROR                     ERROR                     ERROR             
   <harmonic>                   ERROR                     ERROR                     ERROR             

Octane including V8v7:
   <arithmetic>           37.09279+-0.16632    !    57.01742+-3.35945    ^    46.94429+-0.28350       ! definitely 1.2656x slower
   <geometric> *          37.09279+-0.16632    !    57.01742+-3.35945    ^    46.94429+-0.28350       ! definitely 1.2656x slower
   <harmonic>             37.09279+-0.16632    !    57.01742+-3.35945    ^    46.94429+-0.28350       ! definitely 1.2656x slower
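For readers unfamiliar with the report format, the verdict column can be reproduced by hand. The sketch below is not the benchmark harness's code: it assumes (based only on the numbers in these reports) that "definitely" is printed when the two 95% confidence intervals do not overlap and "might be" otherwise, and that the factor is the ratio of the two means. The `Measurement` class and `compare` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One cell of the report: mean time and 95% CI half-width, in ms."""
    mean: float
    half_width: float

    @property
    def low(self) -> float:
        return self.mean - self.half_width

    @property
    def high(self) -> float:
        return self.mean + self.half_width


def compare(baseline: Measurement, candidate: Measurement) -> str:
    """Describe candidate relative to baseline (lower time is better).

    Assumed rule, inferred from the report: non-overlapping intervals
    give "definitely", overlapping intervals give "might be".
    """
    overlap = baseline.low <= candidate.high and candidate.low <= baseline.high
    if candidate.mean >= baseline.mean:
        ratio, direction = candidate.mean / baseline.mean, "slower"
    else:
        ratio, direction = baseline.mean / candidate.mean, "faster"
    qualifier = "might be" if overlap else "definitely"
    return f"{qualifier} {ratio:.4f}x {direction}"


# gbemu numbers copied from the report above.
dfg = Measurement(37.09279, 0.16632)
ftl = Measurement(57.01742, 3.35945)
ftl_parallel = Measurement(46.94429, 0.28350)

# Matches the report's last column: "definitely 1.2656x slower".
print(compare(dfg, ftl_parallel))
# FTLParallel against plain FTL, which the report only marks with "^".
print(compare(ftl, ftl_parallel))
```

Read this way, the gbemu row says that parallel compilation recovers part of the FTL slowdown relative to DFG, but not all of it.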
Comment 3 Filip Pizlo 2014-01-08 22:01:12 PST
More numbers. Note that parallel JIT is still disabled by default, but making it available seems like a good thing:


Benchmark report for SunSpider, Octane, and Kraken on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/cStack/OpenSource/WebKitBuild/Release/jsc (r161543)
    export JSC_useExperimentalFTL=true
"Parallel" at /Volumes/Data/fromMiniMe/cStack/OpenSource/WebKitBuild/Release/jsc (r161543)
    export JSC_useExperimentalFTL=true
    export JSC_numberOfCompilerThreads=7

Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree                  Parallel                                     
SunSpider:
   3d-cube                                    7.7071+-0.0627     ?      7.7599+-0.0630        ?
   3d-morph                                   8.7913+-0.0855     !      9.1583+-0.1142        ! definitely 1.0417x slower
   3d-raytrace                                9.3980+-0.1465     ?      9.5145+-0.1579        ? might be 1.0124x slower
   access-binary-trees                        2.4403+-0.0295     ^      2.3333+-0.0685        ^ definitely 1.0458x faster
   access-fannkuch                            7.9568+-0.1316     !      8.4934+-0.3379        ! definitely 1.0674x slower
   access-nbody                               4.2729+-0.0189     !      4.4826+-0.0303        ! definitely 1.0491x slower
   access-nsieve                              5.5803+-0.0928     ?      5.6215+-0.0719        ?
   bitops-3bit-bits-in-byte                   1.9870+-0.0047     !      2.1859+-0.0122        ! definitely 1.1001x slower
   bitops-bits-in-byte                        6.7047+-0.0682     !      6.8630+-0.0448        ! definitely 1.0236x slower
   bitops-bitwise-and                         3.0412+-0.0320     !      3.1931+-0.0858        ! definitely 1.0499x slower
   bitops-nsieve-bits                         5.8168+-0.0552     !      6.1890+-0.0612        ! definitely 1.0640x slower
   controlflow-recursive                      4.5129+-0.0739     ^      3.9195+-0.0261        ^ definitely 1.1514x faster
   crypto-aes                                 5.8784+-0.0767     !      6.1094+-0.0306        ! definitely 1.0393x slower
   crypto-md5                                 3.5477+-0.0188     ?      3.5822+-0.0360        ?
   crypto-sha1                                3.6050+-0.0223     !      3.7944+-0.0272        ! definitely 1.0525x slower
   date-format-tofte                         11.6571+-0.1033     !     11.9440+-0.1331        ! definitely 1.0246x slower
   date-format-xparb                          8.6383+-0.1450     ?      8.8529+-0.2685        ? might be 1.0248x slower
   math-cordic                                4.7751+-0.0291     !      4.9111+-0.0260        ! definitely 1.0285x slower
   math-partial-sums                         10.1289+-0.1080     !     10.3787+-0.0792        ! definitely 1.0247x slower
   math-spectral-norm                         4.6044+-0.0323     ^      3.2050+-0.0165        ^ definitely 1.4366x faster
   regexp-dna                                13.0012+-0.1077           12.9501+-0.1136        
   string-base64                              5.5442+-0.0405     !      5.9641+-0.0575        ! definitely 1.0757x slower
   string-fasta                              11.3186+-0.1675     ^     10.8574+-0.0923        ^ definitely 1.0425x faster
   string-tagcloud                           15.0780+-0.1324     !     15.5265+-0.1503        ! definitely 1.0297x slower
   string-unpack-code                        32.4747+-0.4656     ?     32.9790+-0.2077        ? might be 1.0155x slower
   string-validate-input                      7.1373+-0.0561     !      7.2932+-0.0750        ! definitely 1.0218x slower

   <arithmetic> *                             7.9076+-0.0206     !      8.0024+-0.0196        ! definitely 1.0120x slower
   <geometric>                                6.5238+-0.0140     !      6.5626+-0.0162        ! definitely 1.0059x slower
   <harmonic>                                 5.5163+-0.0139     ?      5.5209+-0.0174        ? might be 1.0008x slower

                                                TipOfTree                  Parallel                                     
Octane and V8v7:
   encrypt                                   0.43707+-0.00054    ?     0.43721+-0.00042       ?
   decrypt                                   8.07201+-0.01921    !     8.17635+-0.06581       ! definitely 1.0129x slower
   deltablue                        x2       0.55410+-0.00355          0.54569+-0.01000         might be 1.0154x faster
   earley                                    0.89137+-0.01115          0.88520+-0.00610       
   boyer                                    11.92959+-0.05730    ?    12.02242+-0.06071       ?
   raytrace                         x2       3.95029+-0.04078          3.94056+-0.01886       
   regexp                           x2      30.88141+-0.08180    ?    30.97033+-0.05840       ?
   richards                         x2       0.22391+-0.00123    ?     0.22518+-0.00330       ?
   splay                            x2       0.63047+-0.00325          0.62737+-0.00320       
   navier-stokes                    x2       8.24171+-0.00417    ^     8.22514+-0.00855       ^ definitely 1.0020x faster
   closure                                   0.77295+-0.00243    ?     0.77768+-0.00246       ?
   jquery                                   10.88617+-0.07203    ?    11.05496+-0.12591       ? might be 1.0155x slower
   gbemu                            x2      94.17431+-0.30754    ^    83.98530+-0.59289       ^ definitely 1.1213x faster
   mandreel                         x2     102.53284+-0.44228        101.71001+-0.63318       
   pdfjs                            x2     100.40156+-0.33543    ?   100.56494+-0.24322       ?
   box2d                            x2      34.75103+-0.13026    ^    34.16445+-0.16824       ^ definitely 1.0172x faster

V8v7:
   <arithmetic>                              6.89336+-0.01301    ?     6.91186+-0.01271       ? might be 1.0027x slower
   <geometric> *                             2.16438+-0.00520          2.16183+-0.00799         might be 1.0012x faster
   <harmonic>                                0.79419+-0.00231          0.79301+-0.00622         might be 1.0015x faster

Octane including V8v7:
   <arithmetic>                             30.21817+-0.04109    ^    29.35661+-0.07554       ^ definitely 1.0293x faster
   <geometric> *                             6.62176+-0.01072    ^     6.55245+-0.01652       ^ definitely 1.0106x faster
   <harmonic>                                1.20092+-0.00332          1.19956+-0.00877         might be 1.0011x faster

                                                TipOfTree                  Parallel                                     
Kraken:
   ai-astar                                  494.728+-0.926            494.582+-0.917         
   audio-beat-detection                      222.562+-1.220      ?     223.112+-1.602         ?
   audio-dft                                 292.784+-1.339      ?     293.696+-2.648         ?
   audio-fft                                 129.641+-0.122      ?     129.743+-0.207         ?
   audio-oscillator                          530.735+-0.598      !     537.215+-3.845         ! definitely 1.0122x slower
   imaging-darkroom                          293.979+-2.450      ^     289.578+-0.685         ^ definitely 1.0152x faster
   imaging-desaturate                        109.575+-0.117      ?     109.999+-0.386         ?
   imaging-gaussian-blur                     192.666+-7.232      ?     203.863+-26.531        ? might be 1.0581x slower
   json-parse-financial                       79.570+-0.179      !      81.684+-0.424         ! definitely 1.0266x slower
   json-stringify-tinderbox                  107.233+-2.463            106.180+-1.573         
   stanford-crypto-aes                        92.793+-0.701             92.711+-0.393         
   stanford-crypto-ccm                       101.024+-1.560      ?     104.447+-4.284         ? might be 1.0339x slower
   stanford-crypto-pbkdf2                    260.186+-0.474      ?     260.354+-0.869         ?
   stanford-crypto-sha256-iterative          113.937+-0.544      !     115.170+-0.659         ! definitely 1.0108x slower

   <arithmetic> *                            215.815+-0.647      ?     217.310+-1.820         ? might be 1.0069x slower
   <geometric>                               178.208+-0.642      ?     179.650+-1.418         ? might be 1.0081x slower
   <harmonic>                                151.420+-0.633      ?     152.836+-0.932         ? might be 1.0094x slower

                                                TipOfTree                  Parallel                                     
All benchmarks:
   <arithmetic>                              60.7982+-0.1331     ?     60.8132+-0.3843        ? might be 1.0002x slower
   <geometric>                               13.2356+-0.0099           13.2342+-0.0324          might be 1.0001x faster
   <harmonic>                                 2.4947+-0.0056            2.4928+-0.0158          might be 1.0007x faster

                                                TipOfTree                  Parallel                                     
Geomean of preferred means:
   <scaled-result>                           22.4405+-0.0235     ?     22.5024+-0.0734        ? might be 1.0028x slower
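As a sanity check on how the summary rows relate to the per-benchmark rows, the sketch below recomputes two of them from the rounded values printed above. It assumes that the rows marked with `*` are the "preferred means" that feed the final geomean (SunSpider arithmetic, Octane geometric, Kraken arithmetic); that reading is inferred from the report layout, not from the harness source. Because the harness presumably works from raw samples rather than these rounded means, small differences in the last digits are expected.

```python
import math

# SunSpider per-benchmark means for TipOfTree, copied from the table above.
sunspider_tip_of_tree = [
    7.7071, 8.7913, 9.3980, 2.4403, 7.9568, 4.2729, 5.5803, 1.9870,
    6.7047, 3.0412, 5.8168, 4.5129, 5.8784, 3.5477, 3.6050, 11.6571,
    8.6383, 4.7751, 10.1289, 4.6044, 13.0012, 5.5442, 11.3186, 15.0780,
    32.4747, 7.1373,
]


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Average in log space to avoid overflow on long lists.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


# Rows marked "*" in the report (assumed to be the preferred means):
# SunSpider <arithmetic>, Octane <geometric>, Kraken <arithmetic>.
preferred_tip_of_tree = [7.9076, 6.62176, 215.815]
preferred_parallel = [8.0024, 6.55245, 217.310]

print(f"SunSpider arithmetic mean: {arithmetic_mean(sunspider_tip_of_tree):.4f}")
print(f"TipOfTree scaled result:   {geometric_mean(preferred_tip_of_tree):.4f}")
print(f"Parallel scaled result:    {geometric_mean(preferred_parallel):.4f}")
```

The recomputed values land close to the reported 7.9076, 22.4405, and 22.5024. This shows why the overall verdict is only "might be 1.0028x slower": the gbemu win and the SunSpider losses roughly cancel once the three suites are combined.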
Comment 4 Filip Pizlo 2014-01-08 22:11:39 PST
Landed on branch in http://trac.webkit.org/changeset/161546
Comment 5 Mark Lam 2014-01-13 18:41:05 PST
Review status updated in r161938: <http://trac.webkit.org/r161938>.