Bug 112839

Summary: fourthTier: DFG should be able to run on a separate thread
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCore Assignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, ggaren, mark.lam, mhahnenberg, msaboff, oliver, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 114707, 114708, 114762, 114906, 114909, 114987, 115083, 115297, 115299, 115300, 115301, 115445, 115582, 115594, 115598, 116060, 116126, 116350    
Bug Blocks: 112836    
Attachments:
Description                            Flags
work in progress                       none
it runs things                         none
it's a speed-up, for the most part     none
fixed more buggies                     none
moar                                   none
more things                            none
the patch                              ggaren: review+

Description Filip Pizlo 2013-03-20 13:43:52 PDT
Because threads are awesome.
Comment 1 Filip Pizlo 2013-05-18 14:48:47 PDT
Created attachment 202219 [details]
work in progress
Comment 2 Filip Pizlo 2013-05-18 17:30:57 PDT
Created attachment 202222 [details]
it runs things
Comment 3 Filip Pizlo 2013-05-18 21:58:20 PDT
Wow.

Benchmark report for SunSpider on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/fourthTier/OpenSource/WebKitBuild/Release/jsc (r150349)
"CoCo" at /Volumes/Data/fromMiniMe/fourthTier/secondary/OpenSource/WebKitBuild/Release/jsc (r150349)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between
sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific
preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.
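
For readers unfamiliar with the "mean+-interval" notation used in the tables, here is a minimal sketch of how a figure like that can be derived from 12 samples. It assumes a Student's t interval, which is a common choice; the report does not say which method the benchmark harness actually uses. The function name `mean_with_ci` and the sample timings are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 11 degrees of freedom
# (12 samples - 1). Hardcoded to keep the sketch dependency-free; it is
# only correct for 12 samples.
T_95_DF11 = 2.201


def mean_with_ci(samples, t_crit=T_95_DF11):
    """Return (mean, half_width) of a t-based confidence interval.

    Assumption: the interval is mean +/- t * (sample stdev / sqrt(n)).
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return mean, t_crit * std_err


# Hypothetical timings in milliseconds, not taken from the report.
samples = [9.6, 9.9, 9.7, 10.1, 9.8, 9.5, 9.9, 10.0, 9.7, 9.8, 9.6, 10.0]
mean, half_width = mean_with_ci(samples)
print(f"{mean:.4f}+-{half_width:.4f}")
```

A noisy benchmark shows up as a wide half-width relative to its mean, which is worth keeping in mind when reading the rows that follow.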

                                  TipOfTree                    CoCo                                       

3d-cube                         9.7970+-0.3084     ^      7.0279+-0.5858        ^ definitely 1.3940x faster
3d-morph                        9.0063+-0.2394            8.8302+-0.5370          might be 1.0199x faster
3d-raytrace                    10.8543+-0.3490     ^      8.3311+-0.1109        ^ definitely 1.3029x faster
access-binary-trees             1.9811+-0.0081     ?      2.1782+-0.5912        ? might be 1.0994x slower
access-fannkuch                 7.8915+-0.1643     ?      8.5080+-2.2365        ? might be 1.0781x slower
access-nbody                    4.7534+-0.0515     ^      3.9523+-0.0584        ^ definitely 1.2027x faster
access-nsieve                   4.9548+-0.0678     !      6.1717+-0.1144        ! definitely 1.2456x slower
bitops-3bit-bits-in-byte        1.8517+-0.0107     ^      1.7721+-0.0183        ^ definitely 1.0449x faster
bitops-bits-in-byte             6.6758+-0.0920            6.6059+-0.1225          might be 1.0106x faster
bitops-bitwise-and              2.7389+-0.0519     !      4.2159+-1.1382        ! definitely 1.5393x slower
bitops-nsieve-bits              4.7877+-0.0547     ?      7.3282+-3.2467        ? might be 1.5306x slower
controlflow-recursive           3.2102+-0.0488            3.1139+-0.1090          might be 1.0309x faster
crypto-aes                      7.8632+-0.1244     ^      5.1990+-0.0911        ^ definitely 1.5125x faster
crypto-md5                      4.4320+-0.0377     ^      3.0869+-0.0767        ^ definitely 1.4357x faster
crypto-sha1                     3.3562+-0.0248     ^      2.8028+-0.0159        ^ definitely 1.1974x faster
date-format-tofte              15.4000+-0.3783     ?     15.5954+-1.3100        ? might be 1.0127x slower
date-format-xparb               9.6366+-0.1786     ^      8.2300+-0.1334        ^ definitely 1.1709x faster
math-cordic                     4.0619+-0.0539     ^      3.9165+-0.0129        ^ definitely 1.0371x faster
math-partial-sums              12.6170+-0.1246     ^     12.1648+-0.0850        ^ definitely 1.0372x faster
math-spectral-norm              3.1937+-0.0119     ^      2.7017+-0.0070        ^ definitely 1.1821x faster
regexp-dna                     13.2100+-0.3825     ^     12.6349+-0.1839        ^ definitely 1.0455x faster
string-base64                   5.0036+-0.0991     ?      5.0320+-0.0483        ?
string-fasta                   11.1450+-0.2891     ^     10.6305+-0.1452        ^ definitely 1.0484x faster
string-tagcloud                14.3686+-0.2043           14.2502+-0.2312        
string-unpack-code             27.9101+-0.1143     ?     28.0778+-0.6290        ?
string-validate-input           7.3840+-0.2730     ^      6.5804+-0.1159        ^ definitely 1.1221x faster

<arithmetic> *                  8.0032+-0.0840     ^      7.6515+-0.1927        ^ definitely 1.0460x faster
<geometric>                     6.4521+-0.0570     ^      6.0837+-0.1248        ^ definitely 1.0605x faster
<harmonic>                      5.2095+-0.0315     ^      4.8993+-0.1004        ^ definitely 1.0633x faster
Comment 4 Filip Pizlo 2013-05-18 22:33:28 PDT
Created attachment 202227 [details]
it's a speed-up, for the most part

Still have some slow-downs to investigate.  But it's starting to really work.
Comment 5 Filip Pizlo 2013-05-18 23:06:35 PDT
Current results. Notice that some benchmarks, like navier-stokes, exhibit a ridiculous slow-down. This is almost certainly a bug.
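
The "definitely" and "might be" labels in these reports look consistent with a simple rule: "definitely" when the two 95% confidence intervals do not overlap, and "might be" when they overlap but the means differ by at least about 1%. The sketch below encodes that reading. The rule and the 1% threshold are inferred from the reported rows, not taken from the benchmark harness, and the function name `compare` is hypothetical.

```python
def compare(base_mean, base_ci, new_mean, new_ci, min_ratio=1.01):
    """Label a benchmark row the way the report appears to.

    Assumptions (inferred from the tables, not from the harness source):
    - "definitely" means the two confidence intervals are disjoint;
    - "might be" means they overlap but the ratio is at least min_ratio;
    - otherwise the row gets no label.
    """
    faster = new_mean < base_mean
    ratio = base_mean / new_mean if faster else new_mean / base_mean
    direction = "faster" if faster else "slower"

    # Two closed intervals overlap when each starts before the other ends.
    overlap = (base_mean - base_ci <= new_mean + new_ci
               and new_mean - new_ci <= base_mean + base_ci)

    if not overlap:
        return f"definitely {ratio:.4f}x {direction}"
    if ratio >= min_ratio:
        return f"might be {ratio:.4f}x {direction}"
    return ""


# navier-stokes row from the Octane table: TipOfTree vs CoCo.
print(compare(10.79146, 0.01235, 23.36345, 8.28389))
```

Under this reading, navier-stokes is flagged "definitely" slower even though the CoCo interval is very wide, because the lower end of that interval still sits well above the TipOfTree mean. Ratios recomputed from the rounded table values can differ from the reported ones in the last digit.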


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/fourthTier/OpenSource/WebKitBuild/Release/jsc (r150349)
"CoCo" at /Volumes/Data/fromMiniMe/fourthTier/secondary/OpenSource/WebKitBuild/Release/jsc (r150349)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                     TipOfTree                    CoCo                                       
SunSpider:
   3d-cube                                         9.7786+-0.2486     ^      6.7297+-0.0589        ^ definitely 1.4531x faster
   3d-morph                                        8.8167+-0.0967     ?      8.9819+-0.5222        ? might be 1.0187x slower
   3d-raytrace                                    10.8967+-0.3513     ^      8.2750+-0.1397        ^ definitely 1.3168x faster
   access-binary-trees                             1.9855+-0.0096     ^      1.8045+-0.0477        ^ definitely 1.1003x faster
   access-fannkuch                                 7.8129+-0.1508     ?      9.3907+-2.8883        ? might be 1.2020x slower
   access-nbody                                    4.7573+-0.0358            4.4756+-1.1266          might be 1.0630x faster
   access-nsieve                                   4.9352+-0.0712     !      6.1566+-0.1039        ! definitely 1.2475x slower
   bitops-3bit-bits-in-byte                        1.8525+-0.0102     ?      1.9083+-0.2077        ? might be 1.0301x slower
   bitops-bits-in-byte                             6.6271+-0.0720     ?      6.7302+-0.1056        ? might be 1.0156x slower
   bitops-bitwise-and                              2.7114+-0.0832     !      3.9366+-1.1098        ! definitely 1.4519x slower
   bitops-nsieve-bits                              4.7344+-0.0604     ?      5.9164+-2.4752        ? might be 1.2497x slower
   controlflow-recursive                           3.1815+-0.0339            3.1131+-0.1030          might be 1.0220x faster
   crypto-aes                                      7.9432+-0.1245     ^      5.1281+-0.0759        ^ definitely 1.5490x faster
   crypto-md5                                      4.4196+-0.0349     ^      3.0194+-0.0297        ^ definitely 1.4637x faster
   crypto-sha1                                     3.3745+-0.0404     ^      2.7956+-0.0178        ^ definitely 1.2071x faster
   date-format-tofte                              15.3180+-0.2270           15.0403+-0.2231          might be 1.0185x faster
   date-format-xparb                               9.5845+-0.1476     ^      8.0790+-0.1620        ^ definitely 1.1863x faster
   math-cordic                                     4.0867+-0.0078     ?      4.3198+-0.9029        ? might be 1.0570x slower
   math-partial-sums                              12.4959+-0.0957     ^     12.1021+-0.0746        ^ definitely 1.0325x faster
   math-spectral-norm                              3.1934+-0.0110     ^      2.7070+-0.0119        ^ definitely 1.1797x faster
   regexp-dna                                     12.7434+-0.2371           12.6044+-0.2043          might be 1.0110x faster
   string-base64                                   5.0709+-0.0966     ?      5.1673+-0.2849        ? might be 1.0190x slower
   string-fasta                                   10.8532+-0.1253           10.6744+-0.1372          might be 1.0168x faster
   string-tagcloud                                14.5169+-0.2767           14.2301+-0.1903          might be 1.0202x faster
   string-unpack-code                             28.0144+-0.1109           27.6866+-0.5252          might be 1.0118x faster
   string-validate-input                           7.1890+-0.1689     ^      6.6443+-0.1905        ^ definitely 1.0820x faster

   <arithmetic> *                                  7.9574+-0.0694     ^      7.6007+-0.1121        ^ definitely 1.0469x faster
   <geometric>                                     6.4203+-0.0483     ^      6.0430+-0.0871        ^ definitely 1.0624x faster
   <harmonic>                                      5.1931+-0.0283     ^      4.8566+-0.0738        ^ definitely 1.0693x faster

                                                     TipOfTree                    CoCo                                       
V8Spider:
   crypto                                        247.3585+-0.4020     ^    242.7377+-0.5920        ^ definitely 1.0190x faster
   deltablue                                     126.7658+-0.1396     ^    106.2332+-7.5506        ^ definitely 1.1933x faster
   earley-boyer                                   83.7641+-0.4134     ^     74.7976+-5.4524        ^ definitely 1.1199x faster
   raytrace                                       62.7493+-0.2024     ^     37.6050+-0.1536        ^ definitely 1.6686x faster
   regexp                                        102.2530+-0.1857     ^    100.3067+-0.4839        ^ definitely 1.0194x faster
   richards                                      118.3580+-0.2858     ^    113.2298+-1.0252        ^ definitely 1.0453x faster
   splay                                          48.7244+-0.3366     ^     47.4327+-0.4667        ^ definitely 1.0272x faster

   <arithmetic>                                  112.8533+-0.1439     ^    103.1918+-1.7441        ^ definitely 1.0936x faster
   <geometric> *                                  99.5922+-0.1724     ^     87.3317+-1.5862        ^ definitely 1.1404x faster
   <harmonic>                                     89.1031+-0.2119     ^     74.8030+-1.1203        ^ definitely 1.1912x faster

                                                     TipOfTree                    CoCo                                       
Octane and V8v7:
   encrypt                                        1.49987+-0.00933          1.49929+-0.00906       
   decrypt                                       28.16765+-0.06705         28.15719+-0.04085       
   deltablue                             x2       0.56801+-0.00090    !     0.57283+-0.00201       ! definitely 1.0085x slower
   earley                                         0.88494+-0.00446          0.88448+-0.00576       
   boyer                                         12.78958+-0.09568         12.77414+-0.04585       
   raytrace                              x2       4.46236+-0.04225    ?     4.54341+-0.05089       ? might be 1.0182x slower
   regexp                                x2      32.43851+-0.17818         32.35575+-0.08666       
   richards                              x2       0.30187+-0.00008    !     0.31085+-0.00573       ! definitely 1.0297x slower
   splay                                 x2       0.62218+-0.01490    ?     0.65646+-0.04560       ? might be 1.0551x slower
   navier-stokes                         x2      10.79146+-0.01235    !    23.36345+-8.28389       ! definitely 2.1650x slower
   closure                                        0.31881+-0.03402          0.31669+-0.03423       
   jquery                                         4.47099+-0.54795    ?     4.51045+-0.55188       ?
   gbemu                                 x2     263.87720+-10.81013       258.18663+-11.14612        might be 1.0220x faster
   box2d                                 x2      33.77843+-0.13454    ?    33.86534+-0.28680       ?

V8v7:
   <arithmetic>                                   8.85693+-0.02612    !    10.43254+-1.03001       ! definitely 1.1779x slower
   <geometric> *                                  2.78715+-0.01020    !     3.05287+-0.15261       ! definitely 1.0953x slower
   <harmonic>                                     1.00189+-0.00451    !     1.03018+-0.01599       ! definitely 1.0282x slower

Octane including V8v7:
   <arithmetic>                                  33.71872+-1.00995    ?    34.35690+-1.24531       ? might be 1.0189x slower
   <geometric> *                                  4.88991+-0.07278    !     5.21323+-0.20748       ! definitely 1.0661x slower
   <harmonic>                                     1.13031+-0.02225    ?     1.15546+-0.03132       ? might be 1.0222x slower

                                                     TipOfTree                    CoCo                                       
Kraken:
   ai-astar                                       493.841+-0.650      ?     496.357+-4.487         ?
   audio-beat-detection                          1204.313+-3.631      !    1224.124+-10.203        ! definitely 1.0164x slower
   audio-dft                                      310.932+-0.909            309.026+-1.527         
   audio-fft                                     1144.335+-7.195           1131.977+-6.795           might be 1.0109x faster
   audio-oscillator                               233.131+-1.077            232.778+-1.102         
   imaging-darkroom                               292.055+-0.850            291.050+-0.829         
   imaging-desaturate                             853.947+-3.313            852.413+-3.945         
   imaging-gaussian-blur                          406.844+-5.471            403.461+-0.323         
   json-parse-financial                            82.815+-1.210             81.831+-0.319           might be 1.0120x faster
   json-stringify-tinderbox                        99.140+-0.318      !     100.808+-0.249         ! definitely 1.0168x slower
   stanford-crypto-aes                            183.598+-0.868            183.096+-1.188         
   stanford-crypto-ccm                            215.641+-1.535            190.852+-24.833          might be 1.1299x faster
   stanford-crypto-pbkdf2                         270.211+-0.635            269.042+-2.759         
   stanford-crypto-sha256-iterative               116.998+-0.515      ^     115.881+-0.389         ^ definitely 1.0096x faster

   <arithmetic> *                                 421.986+-0.899            420.192+-1.980           might be 1.0043x faster
   <geometric>                                    301.550+-0.579            297.993+-3.142           might be 1.0119x faster
   <harmonic>                                     222.969+-0.643            219.542+-3.109           might be 1.0156x faster

                                                     TipOfTree                    CoCo                                       
JSRegress:
   adapt-to-double-divide                         22.4739+-0.0930     ?     31.6828+-10.5863       ? might be 1.4098x slower
   aliased-arguments-getbyval                      0.9335+-0.0125     ^      0.8175+-0.0079        ^ definitely 1.1419x faster
   allocate-big-object                             2.5427+-0.0354     ?      2.6232+-0.2472        ? might be 1.0317x slower
   arity-mismatch-inlining                         0.7796+-0.0117     ?      1.0926+-0.3755        ? might be 1.4015x slower
   array-access-polymorphic-structure              7.0968+-0.0844     ^      6.8144+-0.0696        ^ definitely 1.0414x faster
   array-with-double-add                           5.8303+-0.0842            5.7995+-0.0644        
   array-with-double-increment                     4.1078+-0.0123     ?      4.1340+-0.0686        ?
   array-with-double-mul-add                       7.0954+-0.0899     ?     15.1081+-11.8846       ? might be 2.1293x slower
   array-with-double-sum                           7.8917+-0.1094     ?     15.5021+-8.7747        ? might be 1.9644x slower
   array-with-int32-add-sub                       10.4537+-0.1057     ?     10.5442+-0.1097        ?
   array-with-int32-or-double-sum                  7.9409+-0.0891     ?     10.4671+-5.4904        ? might be 1.3181x slower
   big-int-mul                                     4.9434+-0.0658     ?      5.6440+-1.2370        ? might be 1.1417x slower
   boolean-test                                    4.4148+-0.0760            4.3702+-0.0616          might be 1.0102x faster
   cast-int-to-double                             13.9257+-0.1357     ?     14.8823+-2.1077        ? might be 1.0687x slower
   cell-argument                                  14.4188+-0.1139     ?     17.8431+-3.9468        ? might be 1.2375x slower
   cfg-simplify                                    3.9719+-0.0662     ?      3.9803+-0.0439        ?
   cmpeq-obj-to-obj-other                         12.2042+-0.4054     ?     13.1207+-2.1311        ? might be 1.0751x slower
   constant-test                                   8.3285+-0.1645     ?     11.8873+-7.3971        ? might be 1.4273x slower
   direct-arguments-getbyval                       0.8539+-0.0080     ^      0.7389+-0.0307        ^ definitely 1.1557x faster
   double-pollution-getbyval                      10.7442+-0.1385     ?     14.5226+-4.6349        ? might be 1.3517x slower
   double-pollution-putbyoffset                    5.1270+-0.0994     ?      5.5488+-1.1070        ? might be 1.0823x slower
   empty-string-plus-int                          10.9205+-0.1937           10.8740+-0.1591        
   external-arguments-getbyval                     2.2125+-0.0260     ^      1.8978+-0.0153        ^ definitely 1.1658x faster
   external-arguments-putbyval                     3.3016+-0.0091            3.2219+-0.1288          might be 1.0247x faster
   Float32Array-matrix-mult                       14.2612+-0.0958     ^     13.7907+-0.1085        ^ definitely 1.0341x faster
   fold-double-to-int                             21.9474+-0.1485     ?     23.3149+-2.9695        ? might be 1.0623x slower
   function-dot-apply                              3.2041+-0.0132     ?      3.4739+-0.5573        ? might be 1.0842x slower
   function-test                                   4.9854+-0.0949     ?      5.6177+-1.4156        ? might be 1.1268x slower
   get-by-id-chain-from-try-block                  6.7581+-0.1894     ?      6.8534+-0.1469        ? might be 1.0141x slower
   HashMap-put-get-iterate-keys                   96.1959+-1.2993     ^     93.8202+-0.9095        ^ definitely 1.0253x faster
   HashMap-put-get-iterate                        99.3947+-1.0722     ^     97.4217+-0.6628        ^ definitely 1.0203x faster
   HashMap-string-put-get-iterate                 75.1716+-0.3378     ^     71.5885+-0.7680        ^ definitely 1.0501x faster
   indexed-properties-in-objects                   4.5383+-0.0646     ?      4.7673+-0.3401        ? might be 1.0505x slower
   inline-arguments-access                         1.2664+-0.0086     ?      1.9950+-0.8140        ? might be 1.5753x slower
   inline-arguments-local-escape                  23.1776+-0.1920           22.9687+-0.1958        
   inline-get-scoped-var                           6.5421+-0.1403     ?      9.4513+-4.0656        ? might be 1.4447x slower
   inlined-put-by-id-transition                   16.7348+-0.3361           16.3492+-0.5430          might be 1.0236x faster
   int-or-other-abs-then-get-by-val                8.7916+-0.0755     ?      9.6336+-1.9292        ? might be 1.0958x slower
   int-or-other-abs-zero-then-get-by-val          37.3649+-0.2140     ?     38.8876+-2.7263        ? might be 1.0408x slower
   int-or-other-add-then-get-by-val               10.2339+-0.0866     !     15.3102+-4.7637        ! definitely 1.4960x slower
   int-or-other-add                               10.4608+-0.0814     !     17.8036+-4.0116        ! definitely 1.7019x slower
   int-or-other-div-then-get-by-val                7.8719+-0.0881            7.8416+-0.0912        
   int-or-other-max-then-get-by-val                9.9540+-0.2137     ?     11.8540+-2.9076        ? might be 1.1909x slower
   int-or-other-min-then-get-by-val                8.0446+-0.0652     ?      8.9856+-1.3212        ? might be 1.1170x slower
   int-or-other-mod-then-get-by-val                8.0506+-0.1008            7.9150+-0.1011          might be 1.0171x faster
   int-or-other-mul-then-get-by-val                7.1369+-0.0927     ?      7.8557+-1.6019        ? might be 1.1007x slower
   int-or-other-neg-then-get-by-val                8.1312+-0.1002     !     10.1972+-1.6909        ! definitely 1.2541x slower
   int-or-other-neg-zero-then-get-by-val          36.5502+-0.2997           36.3447+-0.1276        
   int-or-other-sub-then-get-by-val               10.2345+-0.1038     ?     11.5472+-2.9679        ? might be 1.1283x slower
   int-or-other-sub                                8.2688+-0.0986     ?     11.0039+-3.2494        ? might be 1.3308x slower
   int-overflow-local                             12.8705+-0.0951     ?     13.7738+-2.2198        ? might be 1.0702x slower
   Int16Array-bubble-sort                         49.4250+-0.1755     ?     52.1285+-4.3410        ? might be 1.0547x slower
   Int16Array-load-int-mul                         1.8956+-0.0095     ?      2.6325+-1.5924        ? might be 1.3888x slower
   Int8Array-load                                  4.8114+-0.0689     ?      5.1702+-0.5741        ? might be 1.0746x slower
   integer-divide                                 15.1066+-0.1226     ?     17.8787+-6.1291        ? might be 1.1835x slower
   integer-modulo                                  2.1554+-0.0116     ^      2.0299+-0.0368        ^ definitely 1.0618x faster
   make-indexed-storage                            3.9397+-0.0472            3.8425+-0.2482          might be 1.0253x faster
   method-on-number                               25.9195+-0.5861           25.4819+-0.3081          might be 1.0172x faster
   nested-function-parsing-random                379.4199+-13.0925    ?    381.5733+-13.0563       ?
   nested-function-parsing                        47.9182+-1.0874           47.4300+-1.0438          might be 1.0103x faster
   new-array-buffer-dead                           3.6458+-0.0122     ?      6.2953+-5.8220        ? might be 1.7267x slower
   new-array-buffer-push                          10.4308+-0.2114           10.1843+-0.1701          might be 1.0242x faster
   new-array-dead                                 28.2093+-0.0816     !     42.3392+-9.3632        ! definitely 1.5009x slower
   new-array-push                                  6.9565+-0.1293            6.9478+-0.3606        
   number-test                                     4.3422+-0.0524            4.2916+-0.0562          might be 1.0118x faster
   object-closure-call                             8.4023+-0.1510     !     11.8350+-1.7579        ! definitely 1.4085x slower
   object-test                                     4.9035+-0.0348     ?      5.4789+-1.3989        ? might be 1.1173x slower
   poly-stricteq                                 984.3646+-13.4675    ?   1028.7016+-49.3085       ? might be 1.0450x slower
   polymorphic-structure                          20.0156+-0.1019     !     58.9155+-14.8085       ! definitely 2.9435x slower
   polyvariant-monomorphic-get-by-id              12.5454+-0.1149     ?     15.0380+-5.4914        ? might be 1.1987x slower
   rare-osr-exit-on-local                         20.6400+-0.1368     !     26.4702+-3.7327        ! definitely 1.2825x slower
   register-pressure-from-osr                     31.5666+-0.1310     ?     34.1304+-5.3456        ? might be 1.0812x slower
   simple-activation-demo                         34.3976+-0.1046     ?     61.1678+-58.8882       ? might be 1.7783x slower
   slow-array-profile-convergence                  4.4311+-0.0428     ^      4.1532+-0.1402        ^ definitely 1.0669x faster
   slow-convergence                                3.8460+-0.0511     ^      3.4631+-0.0825        ^ definitely 1.1105x faster
   sparse-conditional                              1.3262+-0.0136     ?      1.4397+-0.3124        ? might be 1.0856x slower
   splice-to-remove                               50.3002+-0.1648     ^     49.2651+-0.1984        ^ definitely 1.0210x faster
   string-concat-object                            2.7010+-0.0456            2.6934+-0.0286        
   string-concat-pair-object                       2.6667+-0.0202            2.6593+-0.0438        
   string-concat-pair-simple                      17.0172+-0.2936     ?     18.2761+-1.7396        ? might be 1.0740x slower
   string-concat-simple                           16.8689+-0.2328     ?     17.5145+-1.2728        ? might be 1.0383x slower
   string-cons-repeat                             10.0805+-0.0302     ?     12.7000+-2.9505        ? might be 1.2599x slower
   string-cons-tower                              10.9122+-0.0330           10.8540+-0.0572        
   string-equality                               106.5795+-0.1729     ^    104.4463+-0.1456        ^ definitely 1.0204x faster
   string-hash                                     2.6712+-0.0127     ^      2.5469+-0.0123        ^ definitely 1.0488x faster
   string-repeat-arith                            45.3087+-0.6742     ^     44.2748+-0.1749        ^ definitely 1.0234x faster
   string-sub                                     88.7269+-1.3198           87.4230+-1.7861          might be 1.0149x faster
   string-test                                     4.3118+-0.0364     ?      4.8124+-1.2970        ? might be 1.1161x slower
   structure-hoist-over-transitions                3.2898+-0.0278     ?      3.3188+-0.1845        ?
   tear-off-arguments-simple                       1.8101+-0.0077     ?      1.9340+-0.2435        ? might be 1.0684x slower
   tear-off-arguments                              3.3889+-0.0155            3.2447+-0.1354          might be 1.0444x faster
   temporal-structure                             20.7727+-0.0501     !     49.8408+-19.4181       ! definitely 2.3993x slower
   to-int32-boolean                               30.5883+-0.1113     ?     30.6623+-0.1350        ?
   undefined-test                                  4.5833+-0.0375            4.5686+-0.0860        

   <arithmetic>                                   30.4665+-0.2127     !     33.0164+-1.0522        ! definitely 1.0837x slower
   <geometric> *                                   9.6806+-0.0289     !     10.4774+-0.1859        ! definitely 1.0823x slower
   <harmonic>                                      5.2218+-0.0194     ?      5.3190+-0.0787        ? might be 1.0186x slower

                                                     TipOfTree                    CoCo                                       
All benchmarks:
   <arithmetic>                                   64.4805+-0.2557     ?     65.4113+-0.7401        ? might be 1.0144x slower
   <geometric>                                    12.2789+-0.0595     !     12.7496+-0.1594        ! definitely 1.0383x slower
   <harmonic>                                      3.8243+-0.0368     ?      3.8577+-0.0445        ? might be 1.0087x slower

                                                     TipOfTree                    CoCo                                       
Geomean of preferred means:
   <scaled-result>                                27.5350+-0.1518           27.3131+-0.2670          might be 1.0081x faster
Comment 6 Filip Pizlo 2013-05-19 12:04:44 PDT
Created attachment 202238 [details]
fixed more buggies

Still investigating things, but we're now up to 8% SunSpider speed-up.
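
As a quick check on the 8% figure, the SunSpider arithmetic means in the report below (7.9468 ms for TipOfTree, 7.3199 ms for CoCo) can be turned into a speed-up in two common ways. This is only a worked-arithmetic sketch; which of the two definitions the comment has in mind is an assumption, and the variable names are hypothetical.

```python
# SunSpider <arithmetic> means from the report, in milliseconds.
tip_of_tree_ms = 7.9468
coco_ms = 7.3199

# Speed-up as a ratio of times: how many times faster CoCo is.
ratio = tip_of_tree_ms / coco_ms

# Speed-up as a reduction in running time, in percent.
time_saved_pct = (tip_of_tree_ms - coco_ms) / tip_of_tree_ms * 100

print(f"{ratio:.4f}x faster, {time_saved_pct:.1f}% less time")
```

Both readings round to roughly 8%, so the claim holds whichever definition is meant.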


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/fourthTier/OpenSource/WebKitBuild/Release/jsc (r150349)
"CoCo" at /Volumes/Data/fromMiniMe/fourthTier/secondary/OpenSource/WebKitBuild/Release/jsc (r150349)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                     TipOfTree                    CoCo                                       
SunSpider:
   3d-cube                                         9.5285+-0.1540     ^      6.6626+-0.1032        ^ definitely 1.4301x faster
   3d-morph                                        8.8888+-0.1291     ^      8.5282+-0.1103        ^ definitely 1.0423x faster
   3d-raytrace                                    10.5736+-0.1471     ^      8.3586+-0.1267        ^ definitely 1.2650x faster
   access-binary-trees                             1.9797+-0.0078     ^      1.8267+-0.0135        ^ definitely 1.0838x faster
   access-fannkuch                                 7.8151+-0.1464     ^      7.4967+-0.0599        ^ definitely 1.0425x faster
   access-nbody                                    4.7413+-0.0605     ^      3.9856+-0.0109        ^ definitely 1.1896x faster
   access-nsieve                                   4.9422+-0.0293     ^      4.8525+-0.0294        ^ definitely 1.0185x faster
   bitops-3bit-bits-in-byte                        1.8526+-0.0120     ^      1.7759+-0.0164        ^ definitely 1.0432x faster
   bitops-bits-in-byte                             6.6707+-0.1072     ?      6.6850+-0.1330        ?
   bitops-bitwise-and                              2.7080+-0.0744     ?      2.7749+-0.0337        ? might be 1.0247x slower
   bitops-nsieve-bits                              4.7892+-0.0502     ^      4.4798+-0.0357        ^ definitely 1.0691x faster
   controlflow-recursive                           3.1806+-0.0350     ^      3.0745+-0.0192        ^ definitely 1.0345x faster
   crypto-aes                                      7.8931+-0.1327     ^      5.1704+-0.0682        ^ definitely 1.5266x faster
   crypto-md5                                      4.4070+-0.0595     ^      3.0655+-0.0275        ^ definitely 1.4376x faster
   crypto-sha1                                     3.3563+-0.0149     ^      2.8171+-0.0185        ^ definitely 1.1914x faster
   date-format-tofte                              15.2275+-0.1845           14.9006+-0.1817          might be 1.0219x faster
   date-format-xparb                               9.7970+-0.2375     ^      8.3792+-0.1627        ^ definitely 1.1692x faster
   math-cordic                                     4.0941+-0.0115     ^      3.9618+-0.0486        ^ definitely 1.0334x faster
   math-partial-sums                              12.5590+-0.1206     ^     12.2155+-0.1270        ^ definitely 1.0281x faster
   math-spectral-norm                              3.1948+-0.0121     ^      2.7176+-0.0178        ^ definitely 1.1756x faster
   regexp-dna                                     12.7943+-0.2196           12.6545+-0.1833          might be 1.0110x faster
   string-base64                                   5.0671+-0.0789            5.0279+-0.0450        
   string-fasta                                   10.8400+-0.1384           10.7529+-0.1644        
   string-tagcloud                                14.4488+-0.1885           14.1776+-0.2011          might be 1.0191x faster
   string-unpack-code                             28.0919+-0.1948     ^     27.5249+-0.1342        ^ definitely 1.0206x faster
   string-validate-input                           7.1768+-0.1531     ^      6.4514+-0.1304        ^ definitely 1.1124x faster

   <arithmetic> *                                  7.9468+-0.0666     ^      7.3199+-0.0543        ^ definitely 1.0856x faster
   <geometric>                                     6.4135+-0.0449     ^      5.8037+-0.0351        ^ definitely 1.1051x faster
   <harmonic>                                      5.1892+-0.0250     ^      4.6901+-0.0211        ^ definitely 1.1064x faster

                                                     TipOfTree                    CoCo                                       
V8Spider:
   crypto                                        247.7954+-0.5886     ^    243.3546+-0.8866        ^ definitely 1.0182x faster
   deltablue                                     126.7398+-0.0859     ^    102.6285+-1.0971        ^ definitely 1.2349x faster
   earley-boyer                                   83.8980+-0.3234     ^     69.5889+-0.2082        ^ definitely 1.2056x faster
   raytrace                                       62.7492+-0.2910     ^     38.0090+-0.1596        ^ definitely 1.6509x faster
   regexp                                        102.9754+-0.9878     ^     99.7671+-0.1285        ^ definitely 1.0322x faster
   richards                                      118.3877+-0.2179     ^    112.6261+-0.8601        ^ definitely 1.0512x faster
   splay                                          48.7556+-0.1596     ^     47.0282+-0.3036        ^ definitely 1.0367x faster

   <arithmetic>                                  113.0430+-0.2012     ^    101.8575+-0.2796        ^ definitely 1.1098x faster
   <geometric> *                                  99.7493+-0.1809     ^     86.0376+-0.2319        ^ definitely 1.1594x faster
   <harmonic>                                     89.2255+-0.1756     ^     73.9141+-0.1959        ^ definitely 1.2072x faster

                                                     TipOfTree                    CoCo                                       
Octane and V8v7:
   encrypt                                        1.49364+-0.00896    ?     1.49674+-0.00804       ?
   decrypt                                       28.19836+-0.04289         28.16538+-0.05527       
   deltablue                             x2       0.56748+-0.00049    !     0.57325+-0.00166       ! definitely 1.0102x slower
   earley                                         0.88295+-0.00333          0.87931+-0.00275       
   boyer                                         12.74352+-0.01894    !    12.81610+-0.04487       ! definitely 1.0057x slower
   raytrace                              x2       4.43320+-0.02796    !     4.51827+-0.03663       ! definitely 1.0192x slower
   regexp                                x2      32.55321+-0.15627         32.54845+-0.17454       
   richards                              x2       0.30373+-0.00066    !     0.31527+-0.00570       ! definitely 1.0380x slower
   splay                                 x2       0.61463+-0.00495    ?     0.64555+-0.02756       ? might be 1.0503x slower
   navier-stokes                         x2      10.78794+-0.01333         10.77550+-0.00845       
   closure                                        0.31921+-0.03402          0.31802+-0.03442       
   jquery                                         4.48681+-0.54602    ?     4.49695+-0.54853       ?
   gbemu                                 x2     262.74903+-10.90715       258.06415+-12.03657        might be 1.0182x faster
   box2d                                 x2      33.82863+-0.11753         33.76532+-0.21236       

V8v7:
   <arithmetic>                                   8.86493+-0.02362    ?     8.88188+-0.02486       ? might be 1.0019x slower
   <geometric> *                                  2.78219+-0.00647    !     2.82190+-0.01711       ! definitely 1.0143x slower
   <harmonic>                                     1.00140+-0.00210    !     1.02883+-0.01185       ! definitely 1.0274x slower

Octane including V8v7:
   <arithmetic>                                  33.62728+-1.01897         33.20837+-1.12328         might be 1.0126x faster
   <geometric> *                                  4.88340+-0.07030    ?     4.92456+-0.07864       ? might be 1.0084x slower
   <harmonic>                                     1.13010+-0.02092    ?     1.15460+-0.02540       ? might be 1.0217x slower

                                                     TipOfTree                    CoCo                                       
Kraken:
   ai-astar                                       494.364+-0.295      ^     491.928+-0.577         ^ definitely 1.0050x faster
   audio-beat-detection                          1210.120+-7.781      ?    1225.708+-18.890        ? might be 1.0129x slower
   audio-dft                                      310.353+-0.997      ^     308.155+-0.674         ^ definitely 1.0071x faster
   audio-fft                                     1132.330+-5.151      ?    1230.324+-98.202        ? might be 1.0865x slower
   audio-oscillator                               233.828+-1.205            232.605+-0.973         
   imaging-darkroom                               291.906+-0.870            291.580+-0.877         
   imaging-desaturate                             853.856+-3.697            852.350+-3.563         
   imaging-gaussian-blur                          403.180+-0.329      ?     403.314+-0.203         ?
   json-parse-financial                            81.987+-0.429             81.485+-0.206         
   json-stringify-tinderbox                        99.899+-0.374             99.860+-0.293         
   stanford-crypto-aes                            185.453+-1.308            184.623+-0.956         
   stanford-crypto-ccm                            213.829+-1.239      ?     215.180+-2.589         ?
   stanford-crypto-pbkdf2                         270.560+-0.884      !     272.451+-0.965         ! definitely 1.0070x slower
   stanford-crypto-sha256-iterative               116.885+-0.365      ^     115.510+-0.158         ^ definitely 1.0119x faster

   <arithmetic> *                                 421.325+-1.027      ?     428.934+-7.186         ? might be 1.0181x slower
   <geometric>                                    301.263+-0.526      ?     302.555+-1.703         ? might be 1.0043x slower
   <harmonic>                                     222.795+-0.509            222.298+-0.468           might be 1.0022x faster

                                                     TipOfTree                    CoCo                                       
JSRegress:
   adapt-to-double-divide                         22.4466+-0.0983     ?     22.5301+-0.1122        ?
   aliased-arguments-getbyval                      0.9305+-0.0109     ^      0.8241+-0.0078        ^ definitely 1.1291x faster
   allocate-big-object                             2.6085+-0.0957     ^      2.4611+-0.0276        ^ definitely 1.0599x faster
   arity-mismatch-inlining                         0.7793+-0.0115     ?      0.7802+-0.0195        ?
   array-access-polymorphic-structure              7.0565+-0.1269     ^      6.8076+-0.0788        ^ definitely 1.0366x faster
   array-with-double-add                           5.8062+-0.0914     ?      5.8666+-0.0731        ? might be 1.0104x slower
   array-with-double-increment                     4.1395+-0.0935            4.1098+-0.0347        
   array-with-double-mul-add                       7.0991+-0.1008            7.0900+-0.0834        
   array-with-double-sum                           7.8402+-0.1096     ?      7.9001+-0.0895        ?
   array-with-int32-add-sub                       10.3817+-0.0805     ?     10.4571+-0.0570        ?
   array-with-int32-or-double-sum                  7.9751+-0.0966            7.9568+-0.0838        
   big-int-mul                                     4.9398+-0.0774     ^      4.7980+-0.0588        ^ definitely 1.0295x faster
   boolean-test                                    4.4610+-0.0665            4.3665+-0.0777          might be 1.0216x faster
   cast-int-to-double                             13.8484+-0.1551     ?     13.8849+-0.1852        ?
   cell-argument                                  14.4327+-0.1460           14.4141+-0.0964        
   cfg-simplify                                    3.9975+-0.0456            3.9727+-0.0631        
   cmpeq-obj-to-obj-other                         12.1682+-0.2698           12.1626+-0.3441        
   constant-test                                   8.4068+-0.1362     ?      8.5511+-0.0826        ? might be 1.0172x slower
   direct-arguments-getbyval                       0.8562+-0.0086     ^      0.7316+-0.0112        ^ definitely 1.1703x faster
   double-pollution-getbyval                      10.7168+-0.1099     ?     10.9031+-0.1260        ? might be 1.0174x slower
   double-pollution-putbyoffset                    5.0900+-0.1512            4.9592+-0.0735          might be 1.0264x faster
   empty-string-plus-int                          10.9669+-0.2028           10.8820+-0.1666        
   external-arguments-getbyval                     2.2131+-0.0352     ^      1.9324+-0.0121        ^ definitely 1.1453x faster
   external-arguments-putbyval                     3.2895+-0.0200     ^      3.1656+-0.0221        ^ definitely 1.0391x faster
   Float32Array-matrix-mult                       14.2250+-0.0722     ^     13.5711+-0.1048        ^ definitely 1.0482x faster
   fold-double-to-int                             21.9252+-0.3220     ?     22.0347+-0.1511        ?
   function-dot-apply                              3.2002+-0.0133     ^      3.1235+-0.0254        ^ definitely 1.0245x faster
   function-test                                   4.9808+-0.0608     ?      5.0180+-0.0878        ?
   get-by-id-chain-from-try-block                  6.7345+-0.1870            6.7021+-0.1603        
   HashMap-put-get-iterate-keys                   96.9098+-1.6110     ^     92.4773+-0.8929        ^ definitely 1.0479x faster
   HashMap-put-get-iterate                       100.5368+-0.5532     ^     95.4911+-0.9847        ^ definitely 1.0528x faster
   HashMap-string-put-get-iterate                 75.2821+-0.5074     ^     72.5220+-0.7001        ^ definitely 1.0381x faster
   indexed-properties-in-objects                   4.5818+-0.0126     ^      4.4151+-0.0072        ^ definitely 1.0378x faster
   inline-arguments-access                         1.2674+-0.0093     !      1.2855+-0.0085        ! definitely 1.0143x slower
   inline-arguments-local-escape                  23.2293+-0.1921           23.1472+-0.2079        
   inline-get-scoped-var                           6.5505+-0.0923     ?      6.6310+-0.0851        ? might be 1.0123x slower
   inlined-put-by-id-transition                   16.8299+-0.2828     ^     16.1305+-0.2523        ^ definitely 1.0434x faster
   int-or-other-abs-then-get-by-val                8.8349+-0.0854            8.8140+-0.0954        
   int-or-other-abs-zero-then-get-by-val          37.2495+-0.1369     ^     36.9133+-0.1011        ^ definitely 1.0091x faster
   int-or-other-add-then-get-by-val               10.2423+-0.1246           10.2387+-0.1324        
   int-or-other-add                               10.5475+-0.1212     ?     10.6019+-0.0853        ?
   int-or-other-div-then-get-by-val                7.8563+-0.0869            7.7964+-0.0981        
   int-or-other-max-then-get-by-val               10.0489+-0.1153     ?     10.0814+-0.1695        ?
   int-or-other-min-then-get-by-val                8.2059+-0.1147     ^      8.0141+-0.0687        ^ definitely 1.0239x faster
   int-or-other-mod-then-get-by-val                8.0701+-0.0709            7.9355+-0.0870          might be 1.0170x faster
   int-or-other-mul-then-get-by-val                7.2558+-0.0939     ^      7.0464+-0.0838        ^ definitely 1.0297x faster
   int-or-other-neg-then-get-by-val                8.1465+-0.1022            8.0454+-0.0954          might be 1.0126x faster
   int-or-other-neg-zero-then-get-by-val          36.8138+-0.3771           36.4259+-0.2154          might be 1.0106x faster
   int-or-other-sub-then-get-by-val               10.3160+-0.1392           10.2053+-0.0971          might be 1.0108x faster
   int-or-other-sub                                8.2569+-0.0999            8.2380+-0.0979        
   int-overflow-local                             12.8636+-0.1229           12.7774+-0.1217        
   Int16Array-bubble-sort                         49.3466+-0.1220           49.1553+-0.1455        
   Int16Array-load-int-mul                         1.8942+-0.0112     !      1.9136+-0.0067        ! definitely 1.0102x slower
   Int8Array-load                                  4.8423+-0.0599            4.7814+-0.0618          might be 1.0127x faster
   integer-divide                                 15.1039+-0.1330     ?     15.2086+-0.1146        ?
   integer-modulo                                  2.1546+-0.0110     ^      2.0247+-0.0153        ^ definitely 1.0641x faster
   make-indexed-storage                            3.9221+-0.0345     ^      3.5742+-0.0347        ^ definitely 1.0973x faster
   method-on-number                               25.3716+-0.4273           25.3462+-0.4099        
   nested-function-parsing-random                379.8008+-12.9035    ?    379.9561+-13.0431       ?
   nested-function-parsing                        47.9680+-1.0905           47.5330+-1.0243        
   new-array-buffer-dead                           3.6434+-0.0123     ?      3.6564+-0.0304        ?
   new-array-buffer-push                          10.4945+-0.2015           10.2637+-0.1779          might be 1.0225x faster
   new-array-dead                                 28.2327+-0.1104     ?     28.3446+-0.0945        ?
   new-array-push                                  6.9503+-0.1022            6.8054+-0.0983          might be 1.0213x faster
   number-test                                     4.3402+-0.0535            4.2790+-0.0582          might be 1.0143x faster
   object-closure-call                             8.4685+-0.0987     !     10.5277+-1.9032        ! definitely 1.2432x slower
   object-test                                     4.9035+-0.0603            4.8286+-0.0750          might be 1.0155x faster
   poly-stricteq                                 980.3060+-7.8710     !   1029.5500+-41.2130       ! definitely 1.0502x slower
   polymorphic-structure                          20.0159+-0.1304     !     20.8507+-0.0808        ! definitely 1.0417x slower
   polyvariant-monomorphic-get-by-id              12.5403+-0.0946           12.4817+-0.0682        
   rare-osr-exit-on-local                         20.4776+-0.0831     ?     20.5206+-0.0833        ?
   register-pressure-from-osr                     31.4887+-0.1135           31.2958+-0.1150        
   simple-activation-demo                         34.5363+-0.2762           34.2328+-0.0794        
   slow-array-profile-convergence                  4.4793+-0.0917     ^      4.0832+-0.0466        ^ definitely 1.0970x faster
   slow-convergence                                3.8507+-0.0066     ^      3.4197+-0.0096        ^ definitely 1.1261x faster
   sparse-conditional                              1.3163+-0.0188     ?      1.3168+-0.0166        ?
   splice-to-remove                               50.1300+-0.1299     ?     50.2866+-0.1812        ?
   string-concat-object                            2.7650+-0.0503            2.7096+-0.0393          might be 1.0204x faster
   string-concat-pair-object                       2.6786+-0.0516            2.6283+-0.0225          might be 1.0192x faster
   string-concat-pair-simple                      17.2471+-0.2528     ^     16.7335+-0.2530        ^ definitely 1.0307x faster
   string-concat-simple                           16.9704+-0.3704     ?     17.0389+-0.3626        ?
   string-cons-repeat                             10.0613+-0.0222     !     10.1398+-0.0250        ! definitely 1.0078x slower
   string-cons-tower                              10.9021+-0.0357           10.8617+-0.0354        
   string-equality                               106.4793+-0.1430          105.6643+-1.4763        
   string-hash                                     2.6725+-0.0152     ^      2.5458+-0.0092        ^ definitely 1.0497x faster
   string-repeat-arith                            45.2100+-0.3428     ?     45.2340+-0.2505        ?
   string-sub                                     87.5054+-0.8567     ?     89.4133+-1.1788        ? might be 1.0218x slower
   string-test                                     4.3056+-0.0354     ^      4.2396+-0.0269        ^ definitely 1.0156x faster
   structure-hoist-over-transitions                3.2815+-0.0216     ^      3.1741+-0.0264        ^ definitely 1.0338x faster
   tear-off-arguments-simple                       1.8062+-0.0077     ^      1.7394+-0.0074        ^ definitely 1.0384x faster
   tear-off-arguments                              3.3933+-0.0155     ^      3.1772+-0.0196        ^ definitely 1.0680x faster
   temporal-structure                             20.9042+-0.0911     ?     20.9465+-0.0642        ?
   to-int32-boolean                               30.5509+-0.0859     ?     30.6098+-0.1244        ?
   undefined-test                                  4.5289+-0.0791            4.5066+-0.0441        

   <arithmetic>                                   30.4389+-0.1196     ?     30.7956+-0.3653        ? might be 1.0117x slower
   <geometric> *                                   9.6932+-0.0261     ^      9.5507+-0.0353        ^ definitely 1.0149x faster
   <harmonic>                                      5.2284+-0.0169     ^      5.0416+-0.0231        ^ definitely 1.0371x faster

                                                     TipOfTree                    CoCo                                       
All benchmarks:
   <arithmetic>                                   64.4020+-0.2359     ?     64.6243+-0.5400        ? might be 1.0035x slower
   <geometric>                                    12.2837+-0.0565     ^     11.9286+-0.0628        ^ definitely 1.0298x faster
   <harmonic>                                      3.8258+-0.0348            3.7547+-0.0388          might be 1.0189x faster

                                                     TipOfTree                    CoCo                                       
Geomean of preferred means:
   <scaled-result>                                27.5276+-0.1477     ^     26.3489+-0.1858        ^ definitely 1.0447x faster
Comment 7 Filip Pizlo 2013-05-19 22:28:23 PDT
Created attachment 202256 [details]
moar

Still playing around with alternatives.
Comment 8 Filip Pizlo 2013-05-20 15:50:43 PDT
Created attachment 202329 [details]
more things

Changed how GC is deferred.  This appears to be an improvement and it fixes some crashes.  Still testing though.
Comment 9 Filip Pizlo 2013-05-20 23:40:51 PDT
Created attachment 202380 [details]
the patch

It passes tests.
Comment 10 Geoffrey Garen 2013-05-21 09:22:21 PDT
Comment on attachment 202380 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=202380&action=review

r=me

> Source/JavaScriptCore/jit/JITStubs.cpp:959
> +        worklistState =
> +            stackFrame.vm->worklist->completeAllReadyPlansForVM(*stackFrame.vm, codeBlock);

This seems OK for now, but it might become a problem if LLVM compilation sticks at ~4ms / function. If you had 4 functions queued up, one call to cti_optimize would cause you to drop an animation frame. If you had 16 functions queued up, you'd drop 4 frames. Basically, there's a tradeoff here between startup time and responsiveness, and the knob is turned all the way toward startup time right now, which is probably the wrong balance.
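
The arithmetic behind this concern can be sketched as follows. This is only an illustration: it assumes a frame budget of roughly 16 ms (about 60 frames per second, a figure not stated above) and that the main thread is blocked for the full compile time of every queued function. The names framesBlocked, kAssumedFrameBudgetMs and kAssumedCompileMsPerFunction are hypothetical.

```cpp
// Assumption: ~16 ms per animation frame (roughly 60 Hz); not stated in the review.
constexpr int kAssumedFrameBudgetMs = 16;
// From the review comment: LLVM compilation at ~4 ms per function.
constexpr int kAssumedCompileMsPerFunction = 4;

// Whole frames consumed if the main thread had to wait for every queued compile.
constexpr int framesBlocked(int queuedFunctions)
{
    return (queuedFunctions * kAssumedCompileMsPerFunction) / kAssumedFrameBudgetMs;
}
```

Under these assumptions, framesBlocked(4) is 1 and framesBlocked(16) is 4, matching the figures quoted above.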

> Source/JavaScriptCore/runtime/JSSegmentedVariableObject.h:97
> +    typedef ConcurrentJITLock Lock;
> +    typedef ConcurrentJITLocker Locker;

Actually, I think this reads even better without the typedef, since it calls out the other client you're locking against.
Comment 11 Filip Pizlo 2013-05-21 09:29:08 PDT
(In reply to comment #10)
> (From update of attachment 202380 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=202380&action=review
> 
> r=me
> 
> > Source/JavaScriptCore/jit/JITStubs.cpp:959
> > +        worklistState =
> > +            stackFrame.vm->worklist->completeAllReadyPlansForVM(*stackFrame.vm, codeBlock);
> 
> This seems OK for now, but it might become a problem if LLVM compilation sticks at ~4ms / function. If you had 4 functions queued up, one call to cti_optimize would cause you to drop an animation frame. If you had 16 functions queued up, you'd drop 4 frames. Basically, there's a tradeoff here between startup time and responsiveness, and the knob is turned all the way toward startup time right now, which is probably the wrong balance.

No, it won't; there's no trade-off.  This method, "complete all ready [sic] plans for VM", does not wait for the compilation thread.  It just completes the plans that are already done.  The only waiting it does is on Worklist::m_lock, which is held only while modifying the queue/map/finished data structures.  Those modifications are quick: at worst, grab the lock, loop over the list, release the lock.  The compilation thread holds the lock only briefly, at worst to dequeue or to append.

This is distinct from completeAllPlansForVM(), which does wait for the compilation thread to finish.
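
To make the distinction concrete, here is a minimal sketch of the two behaviors. It is not the code from the patch: ToyPlan, ToyWorklist and all of their members are hypothetical stand-ins, and std::mutex / std::condition_variable are used in place of whatever primitives the real Worklist uses. The non-blocking call holds the lock only long enough to move finished plans out of the list; the blocking call waits until every queued plan has been compiled.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a compilation plan; not the JavaScriptCore type.
struct ToyPlan {
    bool compiled = false;   // set by the compiler thread, under the lock
    bool finalized = false;  // set by the main thread, outside the lock
};

class ToyWorklist {
public:
    using PlanList = std::vector<std::shared_ptr<ToyPlan>>;

    // Main thread: hand a plan to the worklist.
    std::shared_ptr<ToyPlan> enqueue()
    {
        auto plan = std::make_shared<ToyPlan>();
        std::lock_guard<std::mutex> locker(m_lock);
        m_plans.push_back(plan);
        return plan;
    }

    // Compiler thread: publish a finished compilation.
    void markCompiled(const std::shared_ptr<ToyPlan>& plan)
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            plan->compiled = true;
        }
        m_condition.notify_all();
    }

    // Non-blocking: finalize only the plans that are already compiled.
    std::size_t completeAllReadyPlans()
    {
        PlanList ready;
        {
            std::lock_guard<std::mutex> locker(m_lock);
            takeCompiledPlans(ready);
        }
        return finalize(ready);
    }

    // Blocking: wait for the compiler thread to finish every queued plan.
    std::size_t completeAllPlans()
    {
        PlanList ready;
        {
            std::unique_lock<std::mutex> locker(m_lock);
            m_condition.wait(locker, [this] { return allCompiled(); });
            takeCompiledPlans(ready);
        }
        return finalize(ready);
    }

    std::size_t pendingCount()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        return m_plans.size();
    }

private:
    // Caller must hold m_lock.
    bool allCompiled() const
    {
        for (const auto& plan : m_plans) {
            if (!plan->compiled)
                return false;
        }
        return true;
    }

    // Caller must hold m_lock. Moves compiled plans into `out`, keeps the rest.
    void takeCompiledPlans(PlanList& out)
    {
        PlanList stillCompiling;
        for (const auto& plan : m_plans) {
            if (plan->compiled)
                out.push_back(plan);
            else
                stillCompiling.push_back(plan);
        }
        m_plans.swap(stillCompiling);
    }

    // Runs without the lock, so the compiler thread is never held up by it.
    static std::size_t finalize(const PlanList& ready)
    {
        for (const auto& plan : ready)
            plan->finalized = true;
        return ready.size();
    }

    std::mutex m_lock;
    std::condition_variable m_condition;
    PlanList m_plans;
};
```

In this sketch, calling completeAllReadyPlans() while some plans are still compiling returns immediately and leaves those plans queued; only completeAllPlans() can stall the caller.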

> 
> > Source/JavaScriptCore/runtime/JSSegmentedVariableObject.h:97
> > +    typedef ConcurrentJITLock Lock;
> > +    typedef ConcurrentJITLocker Locker;
> 
> Actually, I think this reads even better without the typedef, since it calls out the other client you're locking against.

Do you advocate making this change throughout?  There are a bunch of places that use the typedef-ConcurrentJITLock-to-Lock idiom.  I kind of like how this makes most code terse but still lets you find what the typedef means.

But I kind of like the change you're proposing.  Since that would be a larger patch (all of the places that use ConcurrentJITLock, and there are a bunch of them), how about I do it in a separate bug?
Comment 12 Filip Pizlo 2013-05-21 10:40:04 PDT
Most up-to-date numbers:


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/fourthTier/OpenSource/WebKitBuild/Release/jsc (r150398)
"CoCo" at /Volumes/Data/fromMiniMe/fourthTier/secondary/OpenSource/WebKitBuild/Release/jsc (r150398)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
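
For reading the tables, the following sketch shows one way the verdict column can be reproduced from the printed numbers. It rests on an assumption that is consistent with the rows below but is not stated by the harness output itself: a result is called "definitely" faster or slower only when the two 95% confidence intervals do not overlap. Measurement, Verdict, compare and speedup are hypothetical names.

```cpp
// Mean and 95% confidence half-width, as printed in the tables ("mean+-halfWidth").
struct Measurement {
    double mean;
    double halfWidth;
};

enum class Verdict { DefinitelyFaster, DefinitelySlower, Inconclusive };

// Assumption: the difference is "definite" only if the intervals do not overlap.
// Values are execution times, so a lower candidate value means faster.
Verdict compare(const Measurement& baseline, const Measurement& candidate)
{
    if (candidate.mean + candidate.halfWidth < baseline.mean - baseline.halfWidth)
        return Verdict::DefinitelyFaster;
    if (candidate.mean - candidate.halfWidth > baseline.mean + baseline.halfWidth)
        return Verdict::DefinitelySlower;
    return Verdict::Inconclusive;
}

// Ratio shown in the "N.NNNNx faster" column: baseline time over candidate time.
double speedup(const Measurement& baseline, const Measurement& candidate)
{
    return baseline.mean / candidate.mean;
}
```

For example, the 3d-cube row (9.6499+-0.2056 versus 6.7133+-0.1048) has non-overlapping intervals and a ratio of about 1.4374, while the access-binary-trees row (1.9843+-0.0099 versus 1.9258+-0.0948) overlaps and is therefore reported only as "might be" faster.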

                                                     TipOfTree                    CoCo                                       
SunSpider:
   3d-cube                                         9.6499+-0.2056     ^      6.7133+-0.1048        ^ definitely 1.4374x faster
   3d-morph                                        8.9624+-0.1326     ^      8.5918+-0.1158        ^ definitely 1.0431x faster
   3d-raytrace                                    10.6995+-0.1522     ^      8.3448+-0.1436        ^ definitely 1.2822x faster
   access-binary-trees                             1.9843+-0.0099            1.9258+-0.0948          might be 1.0304x faster
   access-fannkuch                                 7.8671+-0.1521     ^      7.5370+-0.0595        ^ definitely 1.0438x faster
   access-nbody                                    4.7480+-0.0643     ^      4.0090+-0.0225        ^ definitely 1.1844x faster
   access-nsieve                                   4.9202+-0.0510     ^      4.7887+-0.0721        ^ definitely 1.0275x faster
   bitops-3bit-bits-in-byte                        1.8576+-0.0104     ^      1.7855+-0.0192        ^ definitely 1.0404x faster
   bitops-bits-in-byte                             6.6267+-0.0955            6.5624+-0.0832        
   bitops-bitwise-and                              2.8060+-0.0186            2.7799+-0.0239        
   bitops-nsieve-bits                              4.7735+-0.0727     ^      4.4933+-0.0226        ^ definitely 1.0624x faster
   controlflow-recursive                           3.1904+-0.0353     ^      3.0758+-0.0189        ^ definitely 1.0372x faster
   crypto-aes                                      8.0045+-0.1763     ^      5.1721+-0.0808        ^ definitely 1.5476x faster
   crypto-md5                                      4.3211+-0.0290     ^      3.1718+-0.0213        ^ definitely 1.3624x faster
   crypto-sha1                                     3.3981+-0.0154     ^      2.8351+-0.0177        ^ definitely 1.1986x faster
   date-format-tofte                              15.3863+-0.1878     ^     14.5663+-0.1854        ^ definitely 1.0563x faster
   date-format-xparb                               9.4774+-0.1809     ^      8.2224+-0.1213        ^ definitely 1.1526x faster
   math-cordic                                     4.1031+-0.0089     ^      3.9208+-0.0402        ^ definitely 1.0465x faster
   math-partial-sums                              12.5822+-0.1089     ^     12.2123+-0.1300        ^ definitely 1.0303x faster
   math-spectral-norm                              3.2215+-0.0147     ^      2.7153+-0.0182        ^ definitely 1.1864x faster
   regexp-dna                                     12.5889+-0.1631     ?     12.7145+-0.1816        ?
   string-base64                                   5.0933+-0.0910            5.0118+-0.0700          might be 1.0163x faster
   string-fasta                                   10.7299+-0.0893           10.7248+-0.1361        
   string-tagcloud                                14.3621+-0.1760           14.2318+-0.1842        
   string-unpack-code                             28.3396+-0.1400     ^     27.6466+-0.1300        ^ definitely 1.0251x faster
   string-validate-input                           7.1897+-0.1548     ^      6.5280+-0.0847        ^ definitely 1.1014x faster

   <arithmetic> *                                  7.9571+-0.0665     ^      7.3185+-0.0530        ^ definitely 1.0873x faster
   <geometric>                                     6.4260+-0.0464     ^      5.8174+-0.0378        ^ definitely 1.1046x faster
   <harmonic>                                      5.2093+-0.0267     ^      4.7217+-0.0327        ^ definitely 1.1033x faster

                                                     TipOfTree                    CoCo                                       
V8Spider:
   crypto                                        252.3019+-6.3185     ^    242.0084+-0.3358        ^ definitely 1.0425x faster
   deltablue                                     127.7899+-0.8304     ^    102.0635+-0.6066        ^ definitely 1.2521x faster
   earley-boyer                                   84.0409+-0.2748     ^     69.8289+-0.3471        ^ definitely 1.2035x faster
   raytrace                                       63.6212+-0.3199     ^     38.1306+-0.1315        ^ definitely 1.6685x faster
   regexp                                        102.3147+-0.5922     ^     99.8834+-0.3194        ^ definitely 1.0243x faster
   richards                                      118.7354+-0.2458     ^    112.7906+-0.8365        ^ definitely 1.0527x faster
   splay                                          48.7566+-0.2879     ^     47.2623+-0.5742        ^ definitely 1.0316x faster

   <arithmetic>                                  113.9372+-0.9922     ^    101.7097+-0.1476        ^ definitely 1.1202x faster
   <geometric> *                                 100.2889+-0.4861     ^     86.0756+-0.1392        ^ definitely 1.1651x faster
   <harmonic>                                     89.6062+-0.2913     ^     74.0569+-0.1679        ^ definitely 1.2100x faster

                                                     TipOfTree                    CoCo                                       
Octane and V8v7:
   encrypt                                        1.49828+-0.00917          1.49704+-0.00886       
   decrypt                                       28.21896+-0.04017         28.14959+-0.14971       
   deltablue                             x2       0.57479+-0.00275    ?     0.57546+-0.00286       ?
   earley                                         0.88702+-0.00528    ?     0.89320+-0.00329       ?
   boyer                                         12.76316+-0.02460    ?    12.81662+-0.04635       ?
   raytrace                              x2       4.43644+-0.00628    !     4.47048+-0.02746       ! definitely 1.0077x slower
   regexp                                x2      32.48712+-0.13902    ?    32.61359+-0.18111       ?
   richards                              x2       0.30402+-0.00062    !     0.31225+-0.00339       ! definitely 1.0271x slower
   splay                                 x2       0.61994+-0.00764    ?     0.62812+-0.01887       ? might be 1.0132x slower
   navier-stokes                         x2      10.78609+-0.01360         10.77549+-0.00675       
   closure                                        0.31899+-0.03415    ?     0.31931+-0.03499       ?
   jquery                                         4.49077+-0.55383          4.47929+-0.54674       
   gbemu                                 x2     264.59222+-10.11509       258.36364+-12.90555        might be 1.0241x faster
   box2d                                 x2      33.82537+-0.14673         33.74638+-0.16887       

V8v7:
   <arithmetic>                                   8.86151+-0.02077    ?     8.88170+-0.02389       ? might be 1.0023x slower
   <geometric> *                                  2.79117+-0.00590    !     2.81027+-0.01164       ! definitely 1.0068x slower
   <harmonic>                                     1.00681+-0.00232    !     1.02130+-0.00737       ! definitely 1.0144x slower

Octane including V8v7:
   <arithmetic>                                  33.79223+-0.94658         33.23299+-1.19610         might be 1.0168x faster
   <geometric> *                                  4.89788+-0.06747    ?     4.90984+-0.07650       ? might be 1.0024x slower
   <harmonic>                                     1.13487+-0.02029    ?     1.14839+-0.02437       ? might be 1.0119x slower

                                                     TipOfTree                    CoCo                                       
Kraken:
   ai-astar                                       494.467+-0.810      ^     492.339+-0.497         ^ definitely 1.0043x faster
   audio-beat-detection                          1201.365+-6.093      ?    1251.031+-68.854        ? might be 1.0413x slower
   audio-dft                                      310.148+-0.726      ?     310.679+-0.772         ?
   audio-fft                                     1138.103+-13.501     ?    1194.025+-59.244        ? might be 1.0491x slower
   audio-oscillator                               234.181+-1.042            233.800+-0.967         
   imaging-darkroom                               290.995+-1.136            290.596+-0.884         
   imaging-desaturate                             849.050+-3.729      ?     849.382+-4.051         ?
   imaging-gaussian-blur                          403.118+-0.605      ?     404.140+-2.560         ?
   json-parse-financial                            81.772+-0.193      ^      80.663+-0.203         ^ definitely 1.0137x faster
   json-stringify-tinderbox                        99.190+-0.263      ?      99.664+-0.355         ?
   stanford-crypto-aes                            183.103+-0.577      ^     181.488+-0.813         ^ definitely 1.0089x faster
   stanford-crypto-ccm                            214.439+-1.003            191.840+-25.989          might be 1.1178x faster
   stanford-crypto-pbkdf2                         269.982+-1.299      ^     265.512+-2.326         ^ definitely 1.0168x faster
   stanford-crypto-sha256-iterative               115.778+-0.193            115.552+-0.124         

   <arithmetic> *                                 420.406+-1.448      ?     425.765+-9.608         ? might be 1.0127x slower
   <geometric>                                    300.374+-0.569            298.448+-4.175           might be 1.0065x faster
   <harmonic>                                     221.866+-0.313      ^     218.375+-3.120         ^ definitely 1.0160x faster

                                                     TipOfTree                    CoCo                                       
JSRegress:
   adapt-to-double-divide                         22.4341+-0.0826     ?     22.5966+-0.1355        ?
   aliased-arguments-getbyval                      0.9417+-0.0102     ^      0.8408+-0.0091        ^ definitely 1.1200x faster
   allocate-big-object                             2.5568+-0.0350            2.5014+-0.0249          might be 1.0222x faster
   arity-mismatch-inlining                         0.7822+-0.0136     ?      0.7833+-0.0211        ?
   array-access-polymorphic-structure              7.2140+-0.1301     ^      6.7576+-0.0936        ^ definitely 1.0675x faster
   array-with-double-add                           5.7864+-0.1013     ?      5.8422+-0.0693        ?
   array-with-double-increment                     4.0798+-0.0531     ?      4.2154+-0.0862        ? might be 1.0332x slower
   array-with-double-mul-add                       7.1538+-0.0799     ?      7.1720+-0.0855        ?
   array-with-double-sum                           7.8619+-0.0979     ?      7.8726+-0.0719        ?
   array-with-int32-add-sub                       10.6305+-0.1364           10.5590+-0.0803        
   array-with-int32-or-double-sum                  7.9269+-0.1150     ?      8.0773+-0.1024        ? might be 1.0190x slower
   big-int-mul                                     4.8795+-0.0737            4.8129+-0.0411          might be 1.0138x faster
   boolean-test                                    4.4385+-0.0589            4.3904+-0.0627          might be 1.0110x faster
   cast-int-to-double                             13.8579+-0.0780     ?     13.9432+-0.1146        ?
   cell-argument                                  14.3010+-0.0312     !     14.4514+-0.0801        ! definitely 1.0105x slower
   cfg-simplify                                    4.0578+-0.0474            4.0042+-0.0364          might be 1.0134x faster
   cmpeq-obj-to-obj-other                         12.2170+-0.3348     ?     12.3393+-0.3465        ? might be 1.0100x slower
   constant-test                                   8.5006+-0.1432     ?      8.5705+-0.1229        ?
   direct-arguments-getbyval                       0.8609+-0.0087     ^      0.7323+-0.0073        ^ definitely 1.1756x faster
   double-pollution-getbyval                      10.7288+-0.1148     ?     10.8311+-0.1005        ?
   double-pollution-putbyoffset                    5.1052+-0.1032            4.9191+-0.0994          might be 1.0378x faster
   empty-string-plus-int                          10.8354+-0.1787     ?     10.8436+-0.1678        ?
   external-arguments-getbyval                     2.2258+-0.0301     ^      1.9093+-0.0095        ^ definitely 1.1658x faster
   external-arguments-putbyval                     3.3275+-0.0165            3.2827+-0.1412          might be 1.0137x faster
   Float32Array-matrix-mult                       14.2371+-0.0983     ^     13.7713+-0.2161        ^ definitely 1.0338x faster
   fold-double-to-int                             21.9802+-0.3169     ?     22.0115+-0.2034        ?
   function-dot-apply                              3.2082+-0.0135     ^      3.1038+-0.0120        ^ definitely 1.0336x faster
   function-test                                   5.1082+-0.0752            5.0986+-0.0716        
   get-by-id-chain-from-try-block                  6.8377+-0.1457            6.7745+-0.1775        
   HashMap-put-get-iterate-keys                   94.8348+-0.8266           93.9742+-1.9139        
   HashMap-put-get-iterate                        98.8796+-1.1131           97.1083+-0.9762          might be 1.0182x faster
   HashMap-string-put-get-iterate                 74.8393+-0.4343     ^     71.7546+-0.4862        ^ definitely 1.0430x faster
   indexed-properties-in-objects                   4.5585+-0.0447     ^      4.3592+-0.0602        ^ definitely 1.0457x faster
   inline-arguments-access                         1.2686+-0.0085     !      1.2946+-0.0073        ! definitely 1.0205x slower
   inline-arguments-local-escape                  22.8639+-0.1066     ?     23.0111+-0.2137        ?
   inline-get-scoped-var                           6.5241+-0.0904     ?      6.5947+-0.0744        ? might be 1.0108x slower
   inlined-put-by-id-transition                   16.9741+-0.3131     ^     16.3867+-0.1678        ^ definitely 1.0358x faster
   int-or-other-abs-then-get-by-val                8.9580+-0.1331            8.8304+-0.0711          might be 1.0145x faster
   int-or-other-abs-zero-then-get-by-val          37.2762+-0.2962           37.0678+-0.2655        
   int-or-other-add-then-get-by-val               10.1959+-0.0883           10.1918+-0.0833        
   int-or-other-add                               10.4838+-0.0927           10.4806+-0.0976        
   int-or-other-div-then-get-by-val                7.9154+-0.1208            7.7420+-0.0836          might be 1.0224x faster
   int-or-other-max-then-get-by-val               10.0459+-0.1317            9.9063+-0.1830          might be 1.0141x faster
   int-or-other-min-then-get-by-val                8.1799+-0.1150            8.0404+-0.0963          might be 1.0174x faster
   int-or-other-mod-then-get-by-val                8.0952+-0.1000     ^      7.8596+-0.0957        ^ definitely 1.0300x faster
   int-or-other-mul-then-get-by-val                7.1842+-0.1142            7.0712+-0.0996          might be 1.0160x faster
   int-or-other-neg-then-get-by-val                8.0313+-0.0966            7.9997+-0.1039        
   int-or-other-neg-zero-then-get-by-val          36.5081+-0.1659     ?     36.5956+-0.3472        ?
   int-or-other-sub-then-get-by-val               10.2755+-0.0902           10.2391+-0.1035        
   int-or-other-sub                                8.2252+-0.0955            8.2185+-0.1083        
   int-overflow-local                             12.8977+-0.0772           12.7714+-0.1342        
   Int16Array-bubble-sort                         49.4208+-0.1271           49.1982+-0.1818        
   Int16Array-load-int-mul                         1.8965+-0.0104     !      1.9175+-0.0086        ! definitely 1.0111x slower
   Int8Array-load                                  4.8578+-0.0608            4.8493+-0.0222        
   integer-divide                                 15.1363+-0.0848           15.0918+-0.0723        
   integer-modulo                                  2.1610+-0.0169     ^      2.0387+-0.0145        ^ definitely 1.0600x faster
   make-indexed-storage                            4.1687+-0.0329     ^      3.7362+-0.0354        ^ definitely 1.1158x faster
   method-on-number                               25.9305+-0.4858           25.5215+-0.5658          might be 1.0160x faster
   nested-function-parsing-random                382.7938+-13.2770         382.5710+-13.3033       
   nested-function-parsing                        48.0004+-1.1081           47.6056+-0.9559        
   new-array-buffer-dead                           3.6478+-0.0126     ?      3.6654+-0.0320        ?
   new-array-buffer-push                          10.4742+-0.1599           10.3453+-0.1926          might be 1.0125x faster
   new-array-dead                                 28.2037+-0.0834     ?     28.3341+-0.0778        ?
   new-array-push                                  6.9144+-0.1146            6.8513+-0.0614        
   number-test                                     4.2870+-0.0863     ?      4.3301+-0.0699        ? might be 1.0101x slower
   object-closure-call                             8.4225+-0.1019     !     10.5617+-1.9059        ! definitely 1.2540x slower
   object-test                                     4.8792+-0.0772     ?      4.9431+-0.0602        ? might be 1.0131x slower
   poly-stricteq                                 970.9461+-2.8302     !    998.5610+-12.9388       ! definitely 1.0284x slower
   polymorphic-structure                          19.9364+-0.0806     !     21.2821+-0.6624        ! definitely 1.0675x slower
   polyvariant-monomorphic-get-by-id              12.4956+-0.0837     ?     12.5439+-0.0887        ?
   rare-osr-exit-on-local                         20.4981+-0.1298           20.4968+-0.0769        
   register-pressure-from-osr                     31.5793+-0.1081     ^     31.3128+-0.1067        ^ definitely 1.0085x faster
   simple-activation-demo                         34.4835+-0.1380     ^     34.2321+-0.0932        ^ definitely 1.0073x faster
   slow-array-profile-convergence                  4.4407+-0.0591     ^      4.1033+-0.0225        ^ definitely 1.0822x faster
   slow-convergence                                3.8817+-0.0047     ^      3.4549+-0.0263        ^ definitely 1.1235x faster
   sparse-conditional                              1.3439+-0.0099            1.3207+-0.0219          might be 1.0175x faster
   splice-to-remove                               50.2737+-0.2501     ^     49.5566+-0.1260        ^ definitely 1.0145x faster
   string-concat-object                            2.7306+-0.0457            2.7267+-0.0512        
   string-concat-pair-object                       2.7407+-0.0789            2.6530+-0.0241          might be 1.0331x faster
   string-concat-pair-simple                      16.8871+-0.3057     ?     16.9288+-0.2450        ?
   string-concat-simple                           17.0945+-0.2832     ?     17.2314+-0.2649        ?
   string-cons-repeat                             10.0806+-0.0248     !     10.1844+-0.0261        ! definitely 1.0103x slower
   string-cons-tower                              10.9453+-0.0229     ^     10.8847+-0.0327        ^ definitely 1.0056x faster
   string-equality                               108.0721+-1.9209     ^    105.1292+-0.8394        ^ definitely 1.0280x faster
   string-hash                                     2.6790+-0.0128     ^      2.5520+-0.0106        ^ definitely 1.0497x faster
   string-repeat-arith                            43.9696+-0.4475     ?     44.3525+-0.1472        ?
   string-sub                                     85.6510+-1.0897     !     87.2145+-0.3700        ! definitely 1.0183x slower
   string-test                                     4.2977+-0.0257            4.2428+-0.0657          might be 1.0129x faster
   structure-hoist-over-transitions                3.3313+-0.0247     ^      3.1696+-0.0227        ^ definitely 1.0510x faster
   tear-off-arguments-simple                       1.8326+-0.0105     ^      1.7574+-0.0085        ^ definitely 1.0428x faster
   tear-off-arguments                              3.4296+-0.0153     ^      3.1928+-0.0189        ^ definitely 1.0742x faster
   temporal-structure                             21.2194+-0.5357           21.0447+-0.1684        
   to-int32-boolean                               30.6567+-0.2084     ?     30.7856+-0.2886        ?
   undefined-test                                  4.5862+-0.0360            4.4873+-0.0853          might be 1.0220x faster

   <arithmetic>                                   30.3231+-0.1327     ?     30.5050+-0.1677        ? might be 1.0060x slower
   <geometric> *                                   9.7065+-0.0245     ^      9.5811+-0.0356        ^ definitely 1.0131x faster
   <harmonic>                                      5.2523+-0.0209     ^      5.0702+-0.0228        ^ definitely 1.0359x faster

                                                     TipOfTree                    CoCo                                       
All benchmarks:
   <arithmetic>                                   64.3186+-0.2581           64.1813+-0.8267          might be 1.0021x faster
   <geometric>                                    12.3018+-0.0546     ^     11.9361+-0.0614        ^ definitely 1.0306x faster
   <harmonic>                                      3.8423+-0.0339     ^      3.7581+-0.0380        ^ definitely 1.0224x faster

                                                     TipOfTree                    CoCo                                       
Geomean of preferred means:
   <scaled-result>                                27.5761+-0.1454     ^     26.3109+-0.1840        ^ definitely 1.0481x faster
Comment 13 Filip Pizlo 2013-05-21 11:02:20 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > (From update of attachment 202380 [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=202380&action=review
> > 
> > r=me
> > 
> > > Source/JavaScriptCore/jit/JITStubs.cpp:959
> > > +        worklistState =
> > > +            stackFrame.vm->worklist->completeAllReadyPlansForVM(*stackFrame.vm, codeBlock);
> > 
> > This seems OK for now, but it might become a problem if LLVM compilation sticks at ~4ms / function. If you had 4 functions queued up, one call to cti_optimize would cause you to drop an animation frame. If you had 16 functions queued up, you'd drop 4 frames. Basically, there's a tradeoff here between startup time and responsiveness, and the knob is turned all the way toward startup time right now, which is probably the wrong balance.
> 
> No it won't, there's no trade-off.  This method, "complete all ready [sic] plans for VM", does not wait for the compilation thread.  It just completes the ones that are done.  The only waiting it does is on the Worklist::m_lock, which is only held when modifying the queue/map/finished data structures.  Those modifications are pretty quick - at worst, grab the lock, loop over the list, release the lock.  The compilation thread only holds the lock for a very short time - at worst to dequeue or to append.
> 
> This is distinct from completeAllPlansForVM(), which does wait for the compilation thread to finish.
> 
> > 
> > > Source/JavaScriptCore/runtime/JSSegmentedVariableObject.h:97
> > > +    typedef ConcurrentJITLock Lock;
> > > +    typedef ConcurrentJITLocker Locker;
> > 
> > Actually, I think this reads even better without the typedef, since it calls out the other client you're locking against.
> 
> Do you advocate making this change throughout?  There are a bunch of places that use the typedef-ConcurrentJITLock-to-Lock idiom.  I kind of like how this makes most code terse but still lets you find what the typedef means.
> 
> But I kind of like the change you're proposing.  Since that would be a larger patch (all of the places that use ConcurrentJITLock, and there are a bunch of them), how about I do it in a separate bug?

Filed: https://bugs.webkit.org/show_bug.cgi?id=116561
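
For readers following the completeAllReadyPlansForVM() versus completeAllPlansForVM() distinction in the quoted reply, here is a minimal sketch of the non-blocking variant. It is not the code from the patch: Plan, WorklistSketch, markOneReady() and pendingCount() are hypothetical names invented for illustration, VMs are identified by a plain int, and only the method name completeAllReadyPlansForVM and the idea of a single short-held lock come from the discussion above.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a compilation plan; not the real JSC class.
struct Plan {
    int vmID;      // which VM asked for this compile (assumed int id)
    bool isReady;  // set once the compiler thread has finished the plan
};

// Hypothetical worklist: one lock guards the queue, held only briefly.
class WorklistSketch {
public:
    // Append a plan. The lock is held only for the push.
    void enqueue(Plan plan)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        m_plans.push_back(plan);
    }

    // Stand-in for the compiler thread finishing one plan: marks the
    // first unfinished plan as ready. Returns false if none was pending.
    bool markOneReady()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        for (Plan& plan : m_plans) {
            if (!plan.isReady) {
                plan.isReady = true;
                return true;
            }
        }
        return false;
    }

    // Non-blocking completion: grab the lock, loop over the list, take
    // only the plans for this VM that are already finished, release the
    // lock. It never waits for the compiler thread to finish anything.
    std::vector<Plan> completeAllReadyPlansForVM(int vmID)
    {
        std::vector<Plan> ready;
        std::lock_guard<std::mutex> locker(m_lock);
        for (auto it = m_plans.begin(); it != m_plans.end();) {
            if (it->vmID == vmID && it->isReady) {
                ready.push_back(*it);
                it = m_plans.erase(it);
            } else
                ++it;
        }
        return ready;
    }

    // Number of plans still sitting in the queue (ready or not).
    std::size_t pendingCount()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        return m_plans.size();
    }

private:
    std::mutex m_lock;
    std::deque<Plan> m_plans;
};
```

A blocking counterpart in the spirit of completeAllPlansForVM() would additionally have to wait, for example on a condition variable, until every plan for the VM is finished. That wait is what the sketch deliberately leaves out, and it is why the call from cti_optimize costs only a lock acquisition and a list walk rather than a full compile.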
Comment 14 Geoffrey Garen 2013-05-21 11:26:40 PDT
> > But I kind of like the change you're proposing.  Since that would be a larger patch (all of the places that use ConcurrentJITLock, and there are a bunch of them), how about I do it in a separate bug?

Sounds great!
Comment 15 Filip Pizlo 2013-05-21 12:10:42 PDT
Landed in http://trac.webkit.org/changeset/150465