Bug 114762

Summary: fourthTier: all inline caches should thread-safe enough to allow a concurrent compilation thread to read them safely
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, ggaren, mark.lam, mhahnenberg, msaboff, oliver, sam, xinchao.peng
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 112839    
Attachments:
Description Flags
the patch
none
the patch mhahnenberg: review+

Description Filip Pizlo 2013-04-17 13:37:22 PDT
Patch forthcoming.
Comment 1 Filip Pizlo 2013-04-17 14:06:36 PDT
Created attachment 198609 [details]
the patch

Not yet ready for review since I'm still running perf tests.
Comment 2 Filip Pizlo 2013-04-17 14:07:56 PDT
Comment on attachment 198609 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=198609&action=review

> Source/JavaScriptCore/jit/JITStubs.cpp:2178
> +    else {

I will revert this style goof.
Comment 3 Filip Pizlo 2013-04-17 14:21:03 PDT
Created attachment 198611 [details]
the patch

Still not ready for review.  I just did a few fixes, like implementing the generic version of weakCompareAndSwap over bytes.
Comment 4 Filip Pizlo 2013-04-17 14:46:17 PDT
Looks sufficiently neutralish.

Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/fourthTier/OpenSource/WebKitBuild/Release/jsc (r148622)
"Conf#2" at /Volumes/Data/fromMiniMe/fourthTier/primary/OpenSource/WebKitBuild/Release/jsc (r148622)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                     TipOfTree                   Conf#2                                      
SunSpider:
   3d-cube                                        10.7732+-0.7529     ?     10.8990+-0.9224        ? might be 1.0117x slower
   3d-morph                                        9.9823+-0.8885     ?     10.5479+-0.8095        ? might be 1.0567x slower
   3d-raytrace                                    12.7587+-0.9596           12.3760+-0.8274          might be 1.0309x faster
   access-binary-trees                             2.4409+-0.5366            2.3562+-0.5066          might be 1.0360x faster
   access-fannkuch                                 9.4951+-0.7698            8.5674+-0.5061          might be 1.1083x faster
   access-nbody                                    4.9198+-0.3752     ?      5.5967+-0.6018        ? might be 1.1376x slower
   access-nsieve                                   5.1594+-0.3121     ?      5.7071+-0.6323        ? might be 1.1061x slower
   bitops-3bit-bits-in-byte                        2.2412+-0.3776            2.1458+-0.4400          might be 1.0444x faster
   bitops-bits-in-byte                             7.7304+-0.4993     ?      7.7334+-0.5442        ?
   bitops-bitwise-and                              3.2975+-0.6362     ?      3.4661+-0.6822        ? might be 1.0511x slower
   bitops-nsieve-bits                              4.8339+-0.5006     ?      5.2027+-0.7069        ? might be 1.0763x slower
   controlflow-recursive                           3.5442+-0.4818     ?      4.3113+-0.7373        ? might be 1.2164x slower
   crypto-aes                                      8.9056+-0.6071     ?      9.4226+-0.6975        ? might be 1.0581x slower
   crypto-md5                                      4.2070+-0.0690     !      5.1098+-0.5744        ! definitely 1.2146x slower
   crypto-sha1                                     3.9989+-0.6133            3.9405+-0.6392          might be 1.0148x faster
   date-format-tofte                              17.3577+-1.3181     ?     18.3363+-1.5610        ? might be 1.0564x slower
   date-format-xparb                              10.6231+-0.7370     ?     11.1372+-1.2858        ? might be 1.0484x slower
   math-cordic                                     4.5687+-0.5623            4.1780+-0.2841          might be 1.0935x faster
   math-partial-sums                              14.0246+-1.3121     ?     15.5887+-0.7774        ? might be 1.1115x slower
   math-spectral-norm                              3.7217+-0.5433            3.6799+-0.4566          might be 1.0114x faster
   regexp-dna                                     13.4874+-0.9232           13.2055+-0.8186          might be 1.0213x faster
   string-base64                                   5.8741+-0.7326            5.5280+-0.7045          might be 1.0626x faster
   string-fasta                                   12.5830+-0.8348     ?     12.6457+-0.9810        ?
   string-tagcloud                                17.0785+-1.2532     ?     17.3610+-0.8267        ? might be 1.0165x slower
   string-unpack-code                             32.5443+-2.2888           30.3825+-1.9038          might be 1.0712x faster
   string-validate-input                           9.5832+-0.5710     ^      8.1781+-0.6308        ^ definitely 1.1718x faster

   <arithmetic> *                                  9.0667+-0.3284     ?      9.1386+-0.2873        ? might be 1.0079x slower
   <geometric>                                     7.2184+-0.2548     ?      7.3315+-0.2477        ? might be 1.0157x slower
   <harmonic>                                      5.7588+-0.2265     ?      5.8350+-0.2167        ? might be 1.0132x slower

                                                     TipOfTree                   Conf#2                                      
V8Spider:
   crypto                                         98.2067+-2.2626     !    104.2140+-2.8089        ! definitely 1.0612x slower
   deltablue                                     142.8194+-9.2890     ?    150.9483+-8.1493        ? might be 1.0569x slower
   earley-boyer                                   89.3803+-5.0535     ?     90.9872+-4.8304        ? might be 1.0180x slower
   raytrace                                       72.6136+-3.5272           69.5225+-3.9035          might be 1.0445x faster
   regexp                                        113.4736+-6.3801     ?    113.7722+-5.4376        ?
   richards                                      134.9424+-6.5422          133.4757+-5.8802          might be 1.0110x faster
   splay                                          50.5034+-2.0854     ?     52.7329+-2.6118        ? might be 1.0441x slower

   <arithmetic>                                  100.2771+-2.6839     ?    102.2361+-1.9171        ? might be 1.0195x slower
   <geometric> *                                  94.9037+-2.3333     ?     96.5785+-1.5952        ? might be 1.0176x slower
   <harmonic>                                     89.1866+-2.0660     ?     90.6574+-1.5417        ? might be 1.0165x slower

                                                     TipOfTree                   Conf#2                                      
Octane and V8v7:
   encrypt                                        0.52861+-0.01990          0.49864+-0.01720         might be 1.0601x faster
   decrypt                                       10.77387+-0.16534    ?    10.90941+-0.17941       ? might be 1.0126x slower
   deltablue                             x2       0.68750+-0.01796    ?     0.68863+-0.01681       ?
   earley                                         0.93213+-0.01255    ?     0.96996+-0.03889       ? might be 1.0406x slower
   boyer                                         13.65013+-0.65114         12.94433+-0.15999         might be 1.0545x faster
   raytrace                              x2       4.68156+-0.21498    ?     4.74780+-0.24781       ? might be 1.0141x slower
   regexp                                x2      36.74121+-0.83399    ?    37.24425+-1.13091       ? might be 1.0137x slower
   richards                              x2       0.37891+-0.00634    ?     0.38478+-0.00909       ? might be 1.0155x slower
   splay                                 x2       0.65758+-0.02612    ?     0.66659+-0.03335       ? might be 1.0137x slower
   navier-stokes                         x2      12.46268+-0.15495    ?    12.80540+-0.31373       ? might be 1.0275x slower
   closure                                        0.35169+-0.05558    ?     0.35659+-0.05839       ? might be 1.0139x slower
   jquery                                         5.01616+-0.69583    ?     5.10675+-0.76303       ? might be 1.0181x slower
   gbemu                                 x2     287.18190+-17.21174       285.78391+-17.74520      
   box2d                                 x2      36.43667+-0.94683         36.22425+-0.84636       

V8v7:
   <arithmetic>                                   8.56898+-0.13079    ?     8.64983+-0.12523       ? might be 1.0094x slower
   <geometric> *                                  2.73193+-0.02610    ?     2.75016+-0.02116       ? might be 1.0067x slower
   <harmonic>                                     1.06501+-0.01425    ?     1.06880+-0.01532       ? might be 1.0036x slower

Octane including V8v7:
   <arithmetic>                                  35.89585+-1.63302         35.81259+-1.66363         might be 1.0023x faster
   <geometric> *                                  4.93247+-0.09883    ?     4.95755+-0.09884       ? might be 1.0051x slower
   <harmonic>                                     1.20386+-0.03107    ?     1.21003+-0.03595       ? might be 1.0051x slower

                                                     TipOfTree                   Conf#2                                      
Kraken:
   ai-astar                                       564.016+-22.741           548.852+-26.473          might be 1.0276x faster
   audio-beat-detection                           271.087+-13.166           267.935+-15.064          might be 1.0118x faster
   audio-dft                                      359.598+-13.298     ?     387.805+-29.153        ? might be 1.0784x slower
   audio-fft                                      163.107+-6.692      ?     164.352+-4.663         ?
   audio-oscillator                               262.675+-12.257           258.578+-10.941          might be 1.0158x faster
   imaging-darkroom                               322.852+-14.686     ?     331.091+-13.220        ? might be 1.0255x slower
   imaging-desaturate                             189.341+-7.705            180.871+-10.651          might be 1.0468x faster
   imaging-gaussian-blur                          449.468+-31.862           438.416+-16.551          might be 1.0252x faster
   json-parse-financial                            89.767+-4.353      ?      91.340+-4.658         ? might be 1.0175x slower
   json-stringify-tinderbox                       117.635+-2.998      ?     118.585+-3.430         ?
   stanford-crypto-aes                            274.929+-9.222            263.918+-16.025          might be 1.0417x faster
   stanford-crypto-ccm                            119.097+-6.197      ?     124.912+-4.557         ? might be 1.0488x slower
   stanford-crypto-pbkdf2                         289.861+-13.921     ?     291.108+-13.613        ?
   stanford-crypto-sha256-iterative               127.304+-7.241      ?     130.271+-4.828         ? might be 1.0233x slower

   <arithmetic> *                                 257.196+-3.358            257.002+-3.752           might be 1.0008x faster
   <geometric>                                    223.973+-2.605      ?     224.630+-2.758         ? might be 1.0029x slower
   <harmonic>                                     194.187+-2.128      ?     195.746+-2.284         ? might be 1.0080x slower

                                                     TipOfTree                   Conf#2                                      
JSRegress:
   adapt-to-double-divide                         25.3017+-2.0167     ?     25.5738+-2.0607        ? might be 1.0108x slower
   aliased-arguments-getbyval                      1.0736+-0.2077            1.0026+-0.1462          might be 1.0708x faster
   allocate-big-object                             3.1486+-0.5895            2.8882+-0.4687          might be 1.0902x faster
   arity-mismatch-inlining                         0.9315+-0.2269            0.8802+-0.2071          might be 1.0582x faster
   array-access-polymorphic-structure              8.2992+-0.7990            7.9141+-0.7346          might be 1.0487x faster
   array-with-double-add                           6.8133+-0.7338            6.5764+-0.6513          might be 1.0360x faster
   array-with-double-increment                     4.7634+-0.6554            4.6154+-0.4589          might be 1.0321x faster
   array-with-double-mul-add                       8.1105+-0.5581     ?      8.5930+-0.3939        ? might be 1.0595x slower
   array-with-double-sum                           8.9950+-0.7045     ?      9.0182+-0.5771        ?
   array-with-int32-add-sub                       12.6781+-0.9069           12.3052+-0.4419          might be 1.0303x faster
   array-with-int32-or-double-sum                  8.7711+-0.7163     ?      8.7902+-0.5434        ?
   big-int-mul                                     5.5825+-0.6763            5.4768+-0.6071          might be 1.0193x faster
   boolean-test                                    5.0977+-0.5875     ?      5.1502+-0.5915        ? might be 1.0103x slower
   cast-int-to-double                             14.8425+-1.1230     ?     16.0399+-1.0950        ? might be 1.0807x slower
   cell-argument                                  16.0172+-1.1938           15.6231+-0.9089          might be 1.0252x faster
   cfg-simplify                                    4.2933+-0.4263     ?      4.7271+-0.6268        ? might be 1.1010x slower
   cmpeq-obj-to-obj-other                         11.8893+-0.9601     ?     12.7418+-0.8596        ? might be 1.0717x slower
   constant-test                                   9.7388+-0.5613            9.6625+-0.6959        
   direct-arguments-getbyval                       1.0562+-0.3271            0.9729+-0.1991          might be 1.0856x faster
   double-pollution-getbyval                      12.8150+-1.1972           12.0079+-0.7601          might be 1.0672x faster
   double-pollution-putbyoffset                    6.0374+-0.7701            6.0309+-0.6917        
   empty-string-plus-int                          12.6062+-0.8203     !     14.3878+-0.8777        ! definitely 1.1413x slower
   external-arguments-getbyval                     2.7412+-0.5916            2.4407+-0.3284          might be 1.1231x faster
   external-arguments-putbyval                     3.8867+-0.5553     ?      4.0535+-0.5437        ? might be 1.0429x slower
   Float32Array-matrix-mult                       15.8385+-0.6888     ?     16.3619+-1.1122        ? might be 1.0331x slower
   fold-double-to-int                             24.7382+-1.2906           23.5455+-1.3652          might be 1.0507x faster
   function-dot-apply                              3.9800+-0.6518            3.7141+-0.6022          might be 1.0716x faster
   function-test                                   5.7699+-0.6511     ?      5.9686+-0.7408        ? might be 1.0344x slower
   get-by-id-chain-from-try-block                  8.0133+-0.5624     ?      8.9817+-0.6651        ? might be 1.1208x slower
   HashMap-put-get-iterate-keys                  102.2048+-5.0722     ?    109.6164+-6.3441        ? might be 1.0725x slower
   HashMap-put-get-iterate                       107.1619+-6.1954     ?    110.7891+-4.5439        ? might be 1.0338x slower
   HashMap-string-put-get-iterate                 82.7761+-4.5444           80.2078+-4.8661          might be 1.0320x faster
   indexed-properties-in-objects                   5.2805+-0.6568            4.9070+-0.4738          might be 1.0761x faster
   inline-arguments-access                         1.4816+-0.2234     ?      1.6259+-0.4115        ? might be 1.0974x slower
   inline-arguments-local-escape                  25.8086+-1.9293     ?     26.8575+-1.4942        ? might be 1.0406x slower
   inline-get-scoped-var                           7.7334+-0.5407            7.2062+-0.7153          might be 1.0732x faster
   inlined-put-by-id-transition                   16.6553+-0.2585     ?     17.0550+-0.7717        ? might be 1.0240x slower
   int-or-other-abs-then-get-by-val               10.8286+-0.8712            9.7244+-0.7090          might be 1.1136x faster
   int-or-other-abs-zero-then-get-by-val          40.5876+-2.7099     ?     41.7003+-2.9785        ? might be 1.0274x slower
   int-or-other-add-then-get-by-val               10.6907+-0.5346     !     12.5845+-0.4341        ! definitely 1.1772x slower
   int-or-other-add                               12.1626+-0.7466           12.1310+-0.8930        
   int-or-other-div-then-get-by-val                8.3704+-0.5576     ?      8.7899+-0.6522        ? might be 1.0501x slower
   int-or-other-max-then-get-by-val               11.1460+-0.6125     ?     11.3571+-0.8223        ? might be 1.0189x slower
   int-or-other-min-then-get-by-val                9.5575+-0.6996     ?      9.9255+-0.6154        ? might be 1.0385x slower
   int-or-other-mod-then-get-by-val                9.2831+-0.4987            9.0938+-0.3757          might be 1.0208x faster
   int-or-other-mul-then-get-by-val                8.2409+-0.5250            8.1759+-0.6603        
   int-or-other-neg-then-get-by-val                9.8055+-0.4325            9.5058+-0.6032          might be 1.0315x faster
   int-or-other-neg-zero-then-get-by-val          40.9331+-2.5917           40.6453+-2.9694        
   int-or-other-sub-then-get-by-val               11.0429+-0.7352     ?     11.1979+-0.8853        ? might be 1.0140x slower
   int-or-other-sub                                9.3172+-0.7569            9.2757+-0.7517        
   int-overflow-local                             15.1200+-1.1562     ?     15.6340+-0.7867        ? might be 1.0340x slower
   Int16Array-bubble-sort                         54.2952+-2.9049     ?     57.5772+-0.9861        ? might be 1.0604x slower
   Int16Array-load-int-mul                         2.4510+-0.5453     ?      2.6050+-0.6645        ? might be 1.0628x slower
   Int8Array-load                                  5.4719+-0.6333     ?      5.8526+-0.6967        ? might be 1.0696x slower
   integer-divide                                 16.3649+-0.8985     ?     16.8948+-1.0491        ? might be 1.0324x slower
   integer-modulo                                  2.5957+-0.4867            2.4641+-0.4221          might be 1.0534x faster
   make-indexed-storage                            4.5676+-0.6015            4.4355+-0.4995          might be 1.0298x faster
   method-on-number                               26.8215+-1.8931           24.9067+-1.1704          might be 1.0769x faster
   nested-function-parsing-random                420.2076+-27.4093    ?    424.9766+-29.3867       ? might be 1.0113x slower
   nested-function-parsing                        52.3087+-3.4263     ?     53.2750+-4.5216        ? might be 1.0185x slower
   new-array-buffer-dead                           4.3082+-0.5686            4.0262+-0.5919          might be 1.0700x faster
   new-array-buffer-push                          12.3439+-1.0000           11.8319+-1.1610          might be 1.0433x faster
   new-array-dead                                 33.4362+-0.9490           32.5353+-2.6809          might be 1.0277x faster
   new-array-push                                  8.2143+-0.6605     ?      8.7602+-0.6099        ? might be 1.0665x slower
   number-test                                     5.0777+-0.6558            4.9625+-0.6016          might be 1.0232x faster
   object-closure-call                             8.6807+-0.4051     !     10.1376+-0.8897        ! definitely 1.1678x slower
   object-test                                     6.0050+-0.6591     ?      6.1622+-0.7019        ? might be 1.0262x slower
   poly-stricteq                                 110.2450+-2.0018          109.8285+-3.3010        
   polymorphic-structure                          21.7346+-1.3504     ?     23.7902+-1.1342        ? might be 1.0946x slower
   polyvariant-monomorphic-get-by-id              15.4141+-0.9650           14.8617+-0.6667          might be 1.0372x faster
   rare-osr-exit-on-local                         22.0829+-1.5925     !     24.4667+-0.6956        ! definitely 1.1080x slower
   register-pressure-from-osr                     36.8713+-1.5757     ^     32.8851+-1.4277        ^ definitely 1.1212x faster
   simple-activation-demo                         38.5390+-2.7533     ?     41.2185+-1.2459        ? might be 1.0695x slower
   slow-array-profile-convergence                  5.0100+-0.6991     ?      5.1050+-0.6707        ? might be 1.0190x slower
   slow-convergence                                4.7688+-0.7602            4.1473+-0.4916          might be 1.1499x faster
   sparse-conditional                              1.5460+-0.3089     ?      1.6908+-0.2650        ? might be 1.0937x slower
   splice-to-remove                               57.7990+-3.7626     ?     59.0822+-2.8042        ? might be 1.0222x slower
   string-concat-object                            2.9805+-0.4187     ?      3.0939+-0.2808        ? might be 1.0380x slower
   string-concat-pair-object                       3.1770+-0.5265            2.7198+-0.0424          might be 1.1681x faster
   string-concat-pair-simple                      18.6781+-1.3768           18.0079+-1.3392          might be 1.0372x faster
   string-concat-simple                           17.3103+-0.1950     !     18.9641+-1.3489        ! definitely 1.0955x slower
   string-cons-repeat                             11.6338+-0.9880           11.1892+-0.8615          might be 1.0397x faster
   string-cons-tower                              12.1271+-0.8269     ?     12.8342+-1.0835        ? might be 1.0583x slower
   string-equality                               120.8965+-9.1981     ?    125.2224+-3.3817        ? might be 1.0358x slower
   string-hash                                     2.8263+-0.1812     ?      3.1405+-0.4257        ? might be 1.1112x slower
   string-repeat-arith                            53.5491+-0.6551           53.5349+-2.8296        
   string-sub                                     95.1613+-4.3599     !    104.7923+-4.7190        ! definitely 1.1012x slower
   string-test                                     5.0047+-0.6573            4.9646+-0.6174        
   structure-hoist-over-transitions                3.8305+-0.5905            3.7142+-0.4848          might be 1.0313x faster
   tear-off-arguments-simple                       2.2737+-0.5667            2.1894+-0.4130          might be 1.0385x faster
   tear-off-arguments                              3.6298+-0.3333     ?      4.2440+-0.6852        ? might be 1.1692x slower
   temporal-structure                             23.2880+-1.0497     ?     23.7900+-1.2156        ? might be 1.0216x slower
   to-int32-boolean                               34.1202+-2.4273     ?     34.1966+-1.5382        ?
   undefined-test                                  5.1342+-0.6350            4.9934+-0.5581          might be 1.0282x faster

   <arithmetic>                                   23.4340+-0.5584     ?     23.8736+-0.5201        ? might be 1.0188x slower
   <geometric> *                                  10.6473+-0.3040     ?     10.7392+-0.2830        ? might be 1.0086x slower
   <harmonic>                                      5.8342+-0.1597            5.8334+-0.1966          might be 1.0001x faster

                                                     TipOfTree                   Conf#2                                      
All benchmarks:
   <arithmetic>                                   46.2020+-0.6356     ?     46.5232+-0.7133        ? might be 1.0070x slower
   <geometric>                                    12.8693+-0.3133     ?     12.9877+-0.2960        ? might be 1.0092x slower
   <harmonic>                                      4.1725+-0.1058     ?      4.1878+-0.1152        ? might be 1.0037x slower

                                                     TipOfTree                   Conf#2                                      
Geomean of preferred means:
   <scaled-result>                                25.8772+-0.4952     ?     26.0788+-0.4600        ? might be 1.0078x slower
Comment 5 Mark Hahnenberg 2013-04-18 14:40:34 PDT
Comment on attachment 198611 [details]
the patch

r=me
Comment 6 Filip Pizlo 2013-04-18 14:47:40 PDT
Landed in http://trac.webkit.org/changeset/148703
Comment 7 Peng Xinchao 2013-08-06 02:46:01 PDT
at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
Comment 8 Filip Pizlo 2013-08-06 06:40:54 PDT
(In reply to comment #7)
> at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?

I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea.
Comment 9 Peng Xinchao 2013-08-06 20:25:36 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> 
> I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea

I try to add locking to value profile:
at function CodeBlock:CodeBlock()
  
    m_instructions = WTF::RefCountedArray<Instruction>(instructions);

--->
   {
     ConcurrentJITLocker locker(m_lock);
     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
  }

but it still happened crash。Where do I still need to motify ?
Comment 10 Filip Pizlo 2013-08-06 20:30:58 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> > 
> > I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea
> 
> I try to add locking to value profile:
> at function CodeBlock:CodeBlock()
> 
>     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> 
> --->
>    {
>      ConcurrentJITLocker locker(m_lock);
>      m_instructions = WTF::RefCountedArray<Instruction>(instructions);
>   }
> 
> but it still happened crash。Where do I still need to motify ?

I'm really confused.  How does your fix resolve the word-tearing problems when generated assembly code in the LLInt and baseline JIT attempts to store a 64-bit value into a value profile using two 32-bit stores?
Comment 11 Peng Xinchao 2013-08-06 20:41:01 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> > > 
> > > I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea
> > 
> > I try to add locking to value profile:
> > at function CodeBlock:CodeBlock()
> > 
> >     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > 
> > --->
> >    {
> >      ConcurrentJITLocker locker(m_lock);
> >      m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> >   }
> > 
> > but it still happened crash。Where do I still need to motify ?
> 
> I'm really confused.  How does your fix resolve the word-tearing problems when generated assembly code in the LLInt and baseline JIT attempts to store a 64-bit value into a value profile using two 32-bit stores

My fix is that i need to implement DFG thread in the 32 bit ARM_THUMB2 .
I think i need add lock to value profiles .so i do that .
do my doing have issue?
Comment 12 Filip Pizlo 2013-08-06 21:01:50 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > (In reply to comment #8)
> > > > (In reply to comment #7)
> > > > > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> > > > 
> > > > I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea
> > > 
> > > I try to add locking to value profile:
> > > at function CodeBlock:CodeBlock()
> > > 
> > >     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > > 
> > > --->
> > >    {
> > >      ConcurrentJITLocker locker(m_lock);
> > >      m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > >   }
> > > 
> > > but it still happened crash。Where do I still need to motify ?
> > 
> > I'm really confused.  How does your fix resolve the word-tearing problems when generated assembly code in the LLInt and baseline JIT attempts to store a 64-bit value into a value profile using two 32-bit stores
> 
> My fix is that i need to implement DFG thread in the 32 bit ARM_THUMB2 .
> I think i need add lock to value profiles .so i do that .
> do my doing have issue?

Yes.  The issue is that the concurrent JIT depends on being on a 64-bit system, or more broadly, systems where doing atomic 64-bit stores to memory is cheap.  The 32-bit JIT and 32-bit LLInt have code that stores to value profiles using two 32-bit stores.  You would need to fix that, either by placing a lock in JITed code (and the LLInt) around stores to value profiles, or convert those stores to using 64-bit atomic stores (probably double strex would do it).  But that's probably not a good idea: the cost of doing locked/atomic 64-bit stores on 32-bit systems probably outweighs any benefit of doing concurrent JITing.  But, you're welcome to experiment with this; maybe it will work for you.  Certainly, your proposed fix doesn't accomplish what I'm suggesting.

An alternative would be to drastically refactor the concurrent JIT to scrape all value profiles synchronously.  This would be a huge change, and I don't recommend it: again, the cost of doing this would probably outweigh the benefits of concurrent JIT.
Comment 13 Peng Xinchao 2013-08-06 21:27:27 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > (In reply to comment #9)
> > > > (In reply to comment #8)
> > > > > (In reply to comment #7)
> > > > > > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> > > > > 
> > > > > I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea
> > > > 
> > > > I try to add locking to value profile:
> > > > at function CodeBlock:CodeBlock()
> > > > 
> > > >     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > > > 
> > > > --->
> > > >    {
> > > >      ConcurrentJITLocker locker(m_lock);
> > > >      m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > > >   }
> > > > 
> > > > but it still happened crash。Where do I still need to motify ?
> > > 
> > > I'm really confused.  How does your fix resolve the word-tearing problems when generated assembly code in the LLInt and baseline JIT attempts to store a 64-bit value into a value profile using two 32-bit stores
> > 
> > My fix is that i need to implement DFG thread in the 32 bit ARM_THUMB2 .
> > I think i need add lock to value profiles .so i do that .
> > do my doing have issue?
> 
> Yes.  The issue is that the concurrent JIT depends on being on a 64-bit system, or more broadly, systems where doing atomic 64-bit stores to memory is cheap.  The 32-bit JIT and 32-bit LLInt have code that stores to value profiles using two 32-bit stores.  You would need to fix that, either by placing a lock in JITed code (and the LLInt) around stores to value profiles, or convert those stores to using 64-bit atomic stores (probably double strex would do it).  But that's probably not a good idea: the cost of doing locked/atomic 64-bit stores on 32-bit systems probably outweighs any benefit of doing concurrent JITing.  But, you're welcome to experiment with this; maybe it will work for you.  Certainly, your proposed fix doesn't accomplish what I'm suggesting.
> 
> An alternative would be to drastically refactor the concurrent JIT to scrape all value profiles synchronously.  This would be a huge change, and I don't recommend it: again, the cost of doing this would probably outweigh the benefits of concurrent JIT.

If i do motification that you think , i guess that  performance of motified javascriptcore are not better than now .
Is my idea unnecessary? Do you thinks i have necessary to motify ?
Comment 14 Filip Pizlo 2013-08-06 21:31:01 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > (In reply to comment #10)
> > > > (In reply to comment #9)
> > > > > (In reply to comment #8)
> > > > > > (In reply to comment #7)
> > > > > > > at 32 bit ARM_THUMB2 ,Implement concurrent compilation thread , how i should do ?Do i need to add lock at writing profiler ?Where?
> > > > > > 
> > > > > > I don't think that concurrent compilation would be a speed up on 32-bit systems. You'd have to add locking to value profiles. That's almost certainly a terrible idea
> > > > > 
> > > > > I try to add locking to value profile:
> > > > > at function CodeBlock:CodeBlock()
> > > > > 
> > > > >     m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > > > > 
> > > > > --->
> > > > >    {
> > > > >      ConcurrentJITLocker locker(m_lock);
> > > > >      m_instructions = WTF::RefCountedArray<Instruction>(instructions);
> > > > >   }
> > > > > 
> > > > > but it still happened crash。Where do I still need to motify ?
> > > > 
> > > > I'm really confused.  How does your fix resolve the word-tearing problems when generated assembly code in the LLInt and baseline JIT attempts to store a 64-bit value into a value profile using two 32-bit stores
> > > 
> > > My fix is that i need to implement DFG thread in the 32 bit ARM_THUMB2 .
> > > I think i need add lock to value profiles .so i do that .
> > > do my doing have issue?
> > 
> > Yes.  The issue is that the concurrent JIT depends on being on a 64-bit system, or more broadly, systems where doing atomic 64-bit stores to memory is cheap.  The 32-bit JIT and 32-bit LLInt have code that stores to value profiles using two 32-bit stores.  You would need to fix that, either by placing a lock in JITed code (and the LLInt) around stores to value profiles, or convert those stores to using 64-bit atomic stores (probably double strex would do it).  But that's probably not a good idea: the cost of doing locked/atomic 64-bit stores on 32-bit systems probably outweighs any benefit of doing concurrent JITing.  But, you're welcome to experiment with this; maybe it will work for you.  Certainly, your proposed fix doesn't accomplish what I'm suggesting.
> > 
> > An alternative would be to drastically refactor the concurrent JIT to scrape all value profiles synchronously.  This would be a huge change, and I don't recommend it: again, the cost of doing this would probably outweigh the benefits of concurrent JIT.
> 
> If i do motification that you think , i guess that  performance of motified javascriptcore are not better than now .
> Is my idea unnecessary? Do you thinks i have necessary to motify ?

Let me put it this way: concurrent JIT is a modest speed-up (around 8%) on SunSpider.  It's about an 11% speed-up when you run V8v6 under the SunSpider harness.  It's not a speed-up on Octane or V8v6/v7 when run using Google's harness.  It's not a speed-up on other benchmarks to my knowledge.

It's probably not a good idea to do major architectural work entirely based on benchmarks, but what this tells us is that you're at best looking at a 10% speed-up.  That's great, but it's not worth it if you have to introduce horrible bugs in the process or put locks around basic operations in the LLInt and Baseline JIT.

But I may be wrong, and attempting to switch the baseline JIT to use two-word ldrex/strex for value profiles would be easy.  If that turns out to have low overhead then maybe this whole thing would be worth a shot.  I recommend performing that experiment, but only if you think that 10% speed-up is worth it for your purposes ... and only if you're willing to deal with the bug fallout.