Bug 112780 - DFG implementation of op_strcat should inline rope allocations
Summary: DFG implementation of op_strcat should inline rope allocations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-20 00:45 PDT by Filip Pizlo
Modified: 2013-03-20 13:32 PDT
CC List: 7 users

See Also:


Attachments
the patch (36.71 KB, patch)
2013-03-20 01:06 PDT, Filip Pizlo
oliver: review+

Description Filip Pizlo 2013-03-20 00:45:54 PDT
Patch forthcoming.
Comment 1 Filip Pizlo 2013-03-20 00:51:40 PDT
OMG.


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and JSRegress on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc (r146250)
"StrCat" at /Volumes/Data/fromMiniMe/secondary/OpenSource/WebKitBuild/Release/jsc (r146250)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
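
The "definitely" rows in the tables below appear to be the ones where the two 95% confidence intervals do not overlap, while "might be" rows have overlapping intervals. That reading is inferred from the numbers in this report, not taken from the benchmark harness. A minimal C++ sketch under that assumption follows; `Interval`, `confidenceInterval95`, `compare`, and the Student-t critical value are illustrative choices, not the harness's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean and half-width of a confidence interval, both in milliseconds.
struct Interval {
    double mean;
    double halfWidth;
};

// Hypothetical helper: mean +- t * s / sqrt(n).
// The default t value 2.201 is the two-sided 95% Student-t critical value
// for 11 degrees of freedom, which matches the 12 samples mentioned above.
// Pass a different value for other sample counts.
Interval confidenceInterval95(const std::vector<double>& samples, double tCritical = 2.201)
{
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());

    double sum = 0;
    for (double sample : samples)
        sum += sample;
    const double mean = sum / n;

    double squaredDeviations = 0;
    for (double sample : samples)
        squaredDeviations += (sample - mean) * (sample - mean);
    const double stdDev = std::sqrt(squaredDeviations / (n - 1)); // sample standard deviation

    return { mean, tCritical * stdDev / std::sqrt(n) };
}

enum class Verdict { DefinitelyFaster, DefinitelySlower, Inconclusive };

// Assumed rule: the result is "definite" only when the intervals are disjoint.
// Lower times are better, so a candidate interval entirely below the baseline is faster.
Verdict compare(const Interval& baseline, const Interval& candidate)
{
    if (candidate.mean + candidate.halfWidth < baseline.mean - baseline.halfWidth)
        return Verdict::DefinitelyFaster;
    if (candidate.mean - candidate.halfWidth > baseline.mean + baseline.halfWidth)
        return Verdict::DefinitelySlower;
    return Verdict::Inconclusive;
}
```

Applied to the date-format-xparb row (10.5381+-0.2785 vs. 9.3676+-0.1994), this rule classifies the change as definitely faster, which agrees with the report.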

                                                     TipOfTree                   StrCat                                      
SunSpider:
   3d-cube                                         9.1020+-0.1405            9.0950+-0.1850        
   3d-morph                                        8.7251+-0.1225     ?      8.8282+-0.1520        ? might be 1.0118x slower
   3d-raytrace                                    10.4436+-0.1461           10.4374+-0.1566        
   access-binary-trees                             1.9471+-0.0509            1.9353+-0.0111        
   access-fannkuch                                 7.7616+-0.1153            7.7001+-0.1067        
   access-nbody                                    4.6520+-0.0112     ?      4.6531+-0.0095        ?
   access-nsieve                                   4.9701+-0.0401            4.9238+-0.0813        
   bitops-3bit-bits-in-byte                        1.8327+-0.0085            1.8307+-0.0101        
   bitops-bits-in-byte                             7.0533+-0.1048     ?      7.0751+-0.1076        ?
   bitops-bitwise-and                              2.5455+-0.0763     ?      2.6443+-0.0709        ? might be 1.0388x slower
   bitops-nsieve-bits                              4.1817+-0.0232            4.1740+-0.0162        
   controlflow-recursive                           3.1199+-0.0449            3.1174+-0.0490        
   crypto-aes                                      7.6646+-0.1196     ?      7.7703+-0.1513        ? might be 1.0138x slower
   crypto-md5                                      3.9127+-0.0297     !      4.0287+-0.0361        ! definitely 1.0296x slower
   crypto-sha1                                     3.2083+-0.0224            3.1921+-0.0152        
   date-format-tofte                              14.8310+-0.1966     ?     14.8933+-0.1365        ?
   date-format-xparb                              10.5381+-0.2785     ^      9.3676+-0.1994        ^ definitely 1.1250x faster
   math-cordic                                     4.0293+-0.0119     ?      4.0297+-0.0107        ?
   math-partial-sums                              12.4347+-0.1101     ?     12.4886+-0.1218        ?
   math-spectral-norm                              3.1625+-0.0169            3.1514+-0.0139        
   regexp-dna                                     11.4363+-0.2102           11.3769+-0.1685        
   string-base64                                   4.8215+-0.0351            4.7259+-0.0886          might be 1.0202x faster
   string-fasta                                   10.8107+-0.1140     ?     10.8940+-0.1457        ?
   string-tagcloud                                14.3466+-0.3045           14.0300+-0.2068          might be 1.0226x faster
   string-unpack-code                             26.7938+-0.0828     !     27.2721+-0.1211        ! definitely 1.0179x slower
   string-validate-input                           7.5661+-0.1156            7.4003+-0.0780          might be 1.0224x faster

   <arithmetic> *                                  7.7650+-0.0574            7.7321+-0.0674          might be 1.0043x faster
   <geometric>                                     6.2588+-0.0384            6.2363+-0.0459          might be 1.0036x faster
   <harmonic>                                      5.0462+-0.0299            5.0432+-0.0279          might be 1.0006x faster

                                                     TipOfTree                   StrCat                                      
V8Spider:
   crypto                                         87.5234+-0.2011     ?     88.3779+-1.6128        ?
   deltablue                                     125.2600+-0.4073     ?    125.2918+-0.8755        ?
   earley-boyer                                   82.7613+-0.3771           82.7290+-0.2061        
   raytrace                                       61.5320+-0.1550     !     61.9204+-0.1081        ! definitely 1.0063x slower
   regexp                                        101.8963+-0.5501          101.4172+-0.1028        
   richards                                      119.0732+-0.3724     ?    119.1839+-0.5371        ?
   splay                                          56.6075+-0.4767     ^     48.7913+-0.2259        ^ definitely 1.1602x faster

   <arithmetic>                                   90.6648+-0.1710     ^     89.6731+-0.1945        ^ definitely 1.0111x faster
   <geometric> *                                  87.2010+-0.1745     ^     85.5134+-0.1948        ^ definitely 1.0197x faster
   <harmonic>                                     83.7071+-0.1899     ^     81.1221+-0.1878        ^ definitely 1.0319x faster

                                                     TipOfTree                   StrCat                                      
Octane and V8v7:
   encrypt                                        0.46706+-0.00054    ?     0.46774+-0.00056       ?
   decrypt                                        8.65021+-0.02235    ?     8.67169+-0.06606       ?
   deltablue                             x2       0.56827+-0.00067          0.56742+-0.00166       
   earley                                         0.88077+-0.00188    !     0.90650+-0.00392       ! definitely 1.0292x slower
   boyer                                         12.79319+-0.03411    ?    12.81921+-0.04082       ?
   raytrace                              x2       4.48527+-0.05895          4.43804+-0.03000         might be 1.0106x faster
   regexp                                x2      32.37329+-0.15671         32.25172+-0.14972       
   richards                              x2       0.30726+-0.00048          0.30637+-0.00106       
   splay                                 x2       0.70344+-0.00925    ^     0.64376+-0.02271       ^ definitely 1.0927x faster
   navier-stokes                         x2      10.81820+-0.01309         10.79608+-0.01563       
   closure                                        0.30878+-0.03588    ?     0.30972+-0.03426       ?
   jquery                                         4.39308+-0.55456    ?     4.41432+-0.55612       ?
   gbemu                                 x2     252.03606+-16.94385       251.16266+-16.24598      
   box2d                                 x2      31.63886+-0.18567         31.46117+-0.08973       

V8v7:
   <arithmetic>                                   7.58142+-0.02224          7.55450+-0.02116         might be 1.0036x faster
   <geometric> *                                  2.45074+-0.00605    ^     2.42239+-0.00884       ^ definitely 1.0117x faster
   <harmonic>                                     0.93920+-0.00210    ^     0.92490+-0.00497       ^ definitely 1.0155x faster

Octane including V8v7:
   <arithmetic>                                  31.51611+-1.55121         31.40198+-1.49411         might be 1.0036x faster
   <geometric> *                                  4.39657+-0.06955          4.35776+-0.06437         might be 1.0089x faster
   <harmonic>                                     1.06504+-0.02038          1.05233+-0.01717         might be 1.0121x faster

                                                     TipOfTree                   StrCat                                      
Kraken:
   ai-astar                                       494.695+-0.353      ?     495.012+-0.590         ?
   audio-beat-detection                           246.476+-2.484            246.077+-2.210         
   audio-dft                                      312.825+-0.970            312.310+-1.964         
   audio-fft                                      144.105+-0.240            143.736+-0.145         
   audio-oscillator                               234.673+-1.084            234.629+-1.034         
   imaging-darkroom                               291.928+-1.033            290.231+-0.802         
   imaging-desaturate                             160.432+-0.333            160.428+-0.308         
   imaging-gaussian-blur                          398.305+-0.991            397.517+-0.641         
   json-parse-financial                            79.658+-0.365             79.631+-0.155         
   json-stringify-tinderbox                       100.691+-0.306      ?     101.113+-0.543         ?
   stanford-crypto-aes                             97.129+-0.381             96.490+-0.544         
   stanford-crypto-ccm                            105.133+-4.045            104.265+-4.112         
   stanford-crypto-pbkdf2                         273.904+-7.458            268.631+-0.808           might be 1.0196x faster
   stanford-crypto-sha256-iterative               117.754+-2.031      ^     115.072+-0.131         ^ definitely 1.0233x faster

   <arithmetic> *                                 218.408+-0.856            217.510+-0.467           might be 1.0041x faster
   <geometric>                                    186.885+-0.895            186.008+-0.689           might be 1.0047x faster
   <harmonic>                                     160.695+-0.915            159.914+-0.861           might be 1.0049x faster

                                                     TipOfTree                   StrCat                                      
JSRegress:
   adapt-to-double-divide                         22.4650+-0.0973     ?     22.6001+-0.1221        ?
   aliased-arguments-getbyval                      0.9029+-0.0087     ?      0.9087+-0.0103        ?
   allocate-big-object                             2.5204+-0.0474     ?      2.6371+-0.1047        ? might be 1.0463x slower
   arity-mismatch-inlining                         0.7669+-0.0108     ?      0.7744+-0.0219        ?
   array-access-polymorphic-structure              7.1098+-0.0810            7.0706+-0.0928        
   array-with-double-add                           5.7874+-0.0919     ?      5.7987+-0.0900        ?
   array-with-double-increment                     4.2175+-0.0912            4.1231+-0.0101          might be 1.0229x faster
   array-with-double-mul-add                       6.9980+-0.0931     ?      7.0709+-0.1059        ? might be 1.0104x slower
   array-with-double-sum                           7.9035+-0.0950            7.8891+-0.1105        
   array-with-int32-add-sub                       10.5194+-0.1474           10.4017+-0.1086          might be 1.0113x faster
   array-with-int32-or-double-sum                  7.9755+-0.1145     ?      7.9937+-0.0935        ?
   big-int-mul                                     4.9911+-0.0155     ?      5.0021+-0.0173        ?
   boolean-test                                    4.3989+-0.0565     ?      4.4199+-0.0680        ?
   cast-int-to-double                             13.9444+-0.1181           13.9162+-0.1042        
   cell-argument                                  14.4922+-0.1202           14.3659+-0.0764        
   cfg-simplify                                    3.9303+-0.0930     ?      4.0021+-0.0150        ? might be 1.0183x slower
   cmpeq-obj-to-obj-other                         11.1266+-0.2263     !     11.5914+-0.1692        ! definitely 1.0418x slower
   constant-test                                   8.5508+-0.1374            8.4728+-0.1135        
   direct-arguments-getbyval                       0.8325+-0.0093            0.8320+-0.0098        
   double-pollution-getbyval                      10.6867+-0.1196     ?     10.7506+-0.1226        ?
   double-pollution-putbyoffset                    5.0264+-0.0251            5.0231+-0.0249        
   external-arguments-getbyval                     2.2082+-0.0389            2.1934+-0.0410        
   external-arguments-putbyval                     3.3122+-0.0171     ?      3.3415+-0.0699        ?
   Float32Array-matrix-mult                       13.8183+-0.0865     !     14.4195+-0.1432        ! definitely 1.0435x slower
   fold-double-to-int                             22.0176+-0.2537           21.8695+-0.1613        
   function-dot-apply                              3.1733+-0.0080     ?      3.1754+-0.0096        ?
   function-test                                   4.9984+-0.0560            4.9837+-0.1162        
   get-by-id-chain-from-try-block                  7.4270+-0.0960     ?      7.5099+-0.0977        ? might be 1.0112x slower
   HashMap-put-get-iterate-keys                   88.5555+-0.5321     ?     89.8767+-0.9993        ? might be 1.0149x slower
   HashMap-put-get-iterate                        91.6294+-0.8363           90.5546+-0.8644          might be 1.0119x faster
   HashMap-string-put-get-iterate                 73.6103+-0.4110           73.3356+-0.3383        
   indexed-properties-in-objects                   4.5354+-0.0180     ?      4.5420+-0.0374        ?
   inline-arguments-access                         1.2481+-0.0065            1.2465+-0.0097        
   inline-arguments-local-escape                  23.2782+-0.1161     ^     22.9261+-0.1571        ^ definitely 1.0154x faster
   inline-get-scoped-var                           6.6146+-0.0857     ?      6.6248+-0.0877        ?
   inlined-put-by-id-transition                   16.7545+-0.3331           16.6549+-0.1323        
   int-or-other-abs-then-get-by-val                8.7932+-0.0988     ?      8.9186+-0.1068        ? might be 1.0143x slower
   int-or-other-abs-zero-then-get-by-val          37.0613+-0.1350     ?     37.3534+-0.3862        ?
   int-or-other-add-then-get-by-val               10.2542+-0.1285     ?     10.2806+-0.1248        ?
   int-or-other-add                               10.4798+-0.0966     ?     10.5149+-0.1229        ?
   int-or-other-div-then-get-by-val                7.9403+-0.0946     ?      7.9640+-0.0814        ?
   int-or-other-max-then-get-by-val               10.0088+-0.2449            9.9383+-0.2083        
   int-or-other-min-then-get-by-val                8.1825+-0.1114     ?      8.1835+-0.1020        ?
   int-or-other-mod-then-get-by-val                8.0365+-0.1108            8.0016+-0.1055        
   int-or-other-mul-then-get-by-val                7.2223+-0.0971     ?      7.2227+-0.1034        ?
   int-or-other-neg-then-get-by-val                8.1501+-0.0902     ?      8.1594+-0.1232        ?
   int-or-other-neg-zero-then-get-by-val          36.4814+-0.1262           36.4060+-0.1169        
   int-or-other-sub-then-get-by-val               10.2348+-0.1096     ?     10.2845+-0.1287        ?
   int-or-other-sub                                8.2182+-0.1126            8.1580+-0.1015        
   int-overflow-local                             12.8913+-0.1131           12.8646+-0.1150        
   Int16Array-bubble-sort                         49.7224+-0.4382           49.4922+-0.2366        
   Int16Array-load-int-mul                         1.8789+-0.0074     ?      1.8830+-0.0066        ?
   Int8Array-load                                  4.8646+-0.0419     ?      4.8714+-0.0239        ?
   integer-divide                                 15.1984+-0.1290           15.1204+-0.1107        
   integer-modulo                                  2.0599+-0.0125     ?      2.0610+-0.0150        ?
   make-indexed-storage                            3.9300+-0.0424            3.9054+-0.0447        
   method-on-number                               23.8250+-0.5492           23.4125+-0.4873          might be 1.0176x faster
   nested-function-parsing-random                376.1677+-13.1717    ?    377.4356+-13.0783       ?
   nested-function-parsing                        51.1583+-1.1442     ^     47.8857+-1.1675        ^ definitely 1.0683x faster
   new-array-buffer-dead                           3.6232+-0.0125     ?      3.6266+-0.0173        ?
   new-array-buffer-push                          10.4755+-0.1594           10.3989+-0.1856        
   new-array-dead                                 28.2948+-0.1251           28.2491+-0.0811        
   new-array-push                                  7.1196+-0.1813            6.9540+-0.0700          might be 1.0238x faster
   number-test                                     4.3065+-0.0908     ?      4.3227+-0.0573        ?
   object-closure-call                             8.3433+-0.0916            8.3339+-0.1054        
   object-test                                     4.9050+-0.0548     ?      4.9232+-0.1054        ?
   poly-stricteq                                  91.5765+-1.1989           90.8259+-0.2760        
   polymorphic-structure                          20.1160+-0.1612           20.0110+-0.1295        
   polyvariant-monomorphic-get-by-id              12.5509+-0.1449           12.5053+-0.1203        
   rare-osr-exit-on-local                         20.6214+-0.1457           20.5618+-0.1147        
   register-pressure-from-osr                     31.5523+-0.1350     ?     31.5747+-0.1140        ?
   simple-activation-demo                         34.4323+-0.1305     ?     34.4448+-0.1393        ?
   slow-array-profile-convergence                  4.3552+-0.0278            4.3467+-0.0201        
   slow-convergence                                3.7944+-0.0081     ?      3.8006+-0.0109        ?
   sparse-conditional                              1.3154+-0.0115            1.3125+-0.0139        
   splice-to-remove                               50.4247+-0.1741           50.3684+-0.1691        
   string-concat-object                            5.5094+-0.0579     ^      2.7209+-0.0145        ^ definitely 2.0248x faster
   string-concat-pair-object                       2.7271+-0.0297     ^      2.6626+-0.0188        ^ definitely 1.0242x faster
   string-concat-pair-simple                      17.9229+-0.2238     ^     17.2733+-0.1407        ^ definitely 1.0376x faster
   string-concat-simple                           44.9397+-0.2664     ^     16.9350+-0.1740        ^ definitely 2.6537x faster
   string-cons-repeat                             10.1017+-0.0206     ?     10.1288+-0.0274        ?
   string-cons-tower                              10.9276+-0.0291     ?     10.9645+-0.0564        ?
   string-hash                                     2.6490+-0.0112     ?      2.6492+-0.0094        ?
   string-repeat-arith                            45.7644+-0.1524     ^     45.3655+-0.1909        ^ definitely 1.0088x faster
   string-sub                                     89.3981+-0.8347     ^     87.0744+-1.3783        ^ definitely 1.0267x faster
   string-test                                     4.2867+-0.0246     ?      4.3133+-0.0555        ?
   structure-hoist-over-transitions                3.3272+-0.0755            3.2727+-0.0276          might be 1.0166x faster
   tear-off-arguments-simple                       1.7767+-0.0109            1.7759+-0.0108        
   tear-off-arguments                              3.3767+-0.0100     ?      3.3871+-0.0095        ?
   temporal-structure                             20.9459+-0.1200           20.8629+-0.1132        
   to-int32-boolean                               27.1413+-0.1167           27.1114+-0.0943        
   undefined-test                                  4.5538+-0.0417     ?      4.5629+-0.0409        ?

   <arithmetic>                                   20.2635+-0.1591     ^     19.8569+-0.1546        ^ definitely 1.0205x faster
   <geometric> *                                   9.3117+-0.0247     ^      9.1322+-0.0242        ^ definitely 1.0197x faster
   <harmonic>                                      5.1612+-0.0135     ^      5.1031+-0.0212        ^ definitely 1.0114x faster

                                                     TipOfTree                   StrCat                                      
All benchmarks:
   <arithmetic>                                   40.0736+-0.3417           39.6992+-0.3252          might be 1.0094x faster
   <geometric>                                    11.2751+-0.0537     ^     11.1165+-0.0531        ^ definitely 1.0143x faster
   <harmonic>                                      3.6747+-0.0349            3.6368+-0.0304          might be 1.0104x faster

                                                     TipOfTree                   StrCat                                      
Geomean of preferred means:
   <scaled-result>                                22.7195+-0.1281     ^     22.4654+-0.1228        ^ definitely 1.0113x faster
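
The "Geomean of preferred means" row appears to be the geometric mean of the starred (*) mean from each suite, and the "Nx faster" figure appears to be the TipOfTree time divided by the StrCat time. Both readings are inferred from the numbers above rather than from the harness source. A minimal C++ sketch under those assumptions follows; `geometricMean` and `speedup` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean, computed in log space to avoid overflow in the product.
// All inputs must be strictly positive.
double geometricMean(const std::vector<double>& values)
{
    assert(!values.empty());
    double logSum = 0;
    for (double value : values) {
        assert(value > 0);
        logSum += std::log(value);
    }
    return std::exp(logSum / static_cast<double>(values.size()));
}

// Assumed definition of the reported ratio: baseline time over candidate time,
// so values above 1 mean the candidate is faster.
double speedup(double baselineMs, double candidateMs)
{
    assert(candidateMs > 0);
    return baselineMs / candidateMs;
}
```

Feeding in the starred TipOfTree means (7.7650, 87.2010, 4.39657, 218.408, 9.3117) gives roughly 22.72, close to the reported 22.7195. The small difference is expected if the harness averages per-sample results rather than the rounded means shown here.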
Comment 2 Filip Pizlo 2013-03-20 01:06:55 PDT
Created attachment 193996
the patch
Comment 3 Oliver Hunt 2013-03-20 01:35:18 PDT
Comment on attachment 193996
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=193996&action=review

> Source/JavaScriptCore/dfg/DFGOperations.cpp:1576
> +    JSGlobalData& globalData = exec->globalData();

#if CPU(X86)
RELEASE_ASSERT_NOT_REACHED();
#endif 
?
Comment 4 Filip Pizlo 2013-03-20 01:55:19 PDT
(In reply to comment #3)
> (From update of attachment 193996)
> View in context: https://bugs.webkit.org/attachment.cgi?id=193996&action=review
> 
> > Source/JavaScriptCore/dfg/DFGOperations.cpp:1576
> > +    JSGlobalData& globalData = exec->globalData();
> 
> #if CPU(X86)
> RELEASE_ASSERT_NOT_REACHED();
> #endif 
> ?

I could do that, but would it help?  There's nothing wrong with calling that function on X86.  The fact that it won't be called is a detail that is orthogonal to the DFGOperations interface.
Comment 5 Filip Pizlo 2013-03-20 13:32:14 PDT
Landed in http://trac.webkit.org/changeset/146382