Bug 160443 - [JSC] Improve the memory locality of DFG Node's AbstractValues
Summary: [JSC] Improve the memory locality of DFG Node's AbstractValues
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-01 21:50 PDT by Benjamin Poulain
Modified: 2016-08-03 20:45 PDT (History)
5 users (show)

See Also:


Attachments
Patch (7.72 KB, patch)
2016-08-01 22:01 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (8.30 KB, patch)
2016-08-02 18:07 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch for landing (8.05 KB, patch)
2016-08-03 20:16 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-08-01 21:50:17 PDT
[JSC] Improve the memory locality of DFG Node's AbstractValues
Comment 1 Benjamin Poulain 2016-08-01 22:01:55 PDT
Created attachment 285075 [details]
Patch
Comment 2 Benjamin Poulain 2016-08-01 22:03:59 PDT
Fast benchmarks with 20 runs. https://bugs.webkit.org/show_bug.cgi?id=160370 not applied.


                                               Baseline                  NoPacking                 Packing                Packing VS Baseline
SunSpider:
   3d-cube                                 4.6536+-0.0512     ?      4.7191+-0.0418            4.6983+-0.0291        ?
   3d-morph                                4.9866+-0.0291     ?      5.0245+-0.0826     ?      5.0260+-0.0673        ?
   3d-raytrace                             4.8657+-0.0226     ?      4.8984+-0.0382            4.8764+-0.0218        ?
   access-binary-trees                     1.9769+-0.0197     ?      1.9786+-0.0198     ?      1.9910+-0.0211        ?
   access-fannkuch                         6.1405+-0.1443     ^      5.7092+-0.0503     ?      5.7431+-0.0717        ^ definitely 1.0692x faster
   access-nbody                            2.3701+-0.0247     ?      2.4108+-0.0623            2.4038+-0.0361        ? might be 1.0142x slower
   access-nsieve                           2.9353+-0.0170     ?      2.9516+-0.0359     ?      2.9606+-0.0418        ?
   bitops-3bit-bits-in-byte                1.0751+-0.0124     ?      1.0799+-0.0142     ?      1.0864+-0.0153        ? might be 1.0105x slower
   bitops-bits-in-byte                     2.5868+-0.0139     ?      2.6015+-0.0220     ?      2.6016+-0.0253        ?
   bitops-bitwise-and                      1.9917+-0.0138     ?      2.0033+-0.0185            1.9954+-0.0202        ?
   bitops-nsieve-bits                      3.1241+-0.0634            3.0963+-0.0178            3.0912+-0.0124          might be 1.0106x faster
   controlflow-recursive                   2.3295+-0.0291            2.3194+-0.0165            2.3177+-0.0189        
   crypto-aes                              4.4244+-0.0248            4.4145+-0.0258     ?      4.4314+-0.0364        ?
   crypto-md5                              2.6643+-0.0166     ?      2.6792+-0.0159            2.6630+-0.0143        
   crypto-sha1                             2.7432+-0.0192     ?      2.7508+-0.0223            2.7333+-0.0227        
   date-format-tofte                       6.4480+-0.0404     ?      6.4797+-0.0568     ?      6.4828+-0.0583        ?
   date-format-xparb                       4.7469+-0.0385     ?      4.7700+-0.0562            4.7562+-0.0216        ?
   math-cordic                             2.8051+-0.1163            2.7456+-0.0144            2.7452+-0.0189          might be 1.0218x faster
   math-partial-sums                       4.0163+-0.0811            3.9938+-0.0322            3.9796+-0.0221        
   math-spectral-norm                      2.0590+-0.0242            2.0395+-0.0201     ?      2.0402+-0.0256        
   regexp-dna                              6.4929+-0.1547            6.3862+-0.0724            6.3714+-0.0804          might be 1.0191x faster
   string-base64                           4.0149+-0.0403     ?      4.0216+-0.0520            3.9967+-0.0320        
   string-fasta                            5.5171+-0.0717     ?      5.5301+-0.0221     ?      5.5722+-0.0666        ?
   string-tagcloud                         8.2997+-0.0436     ?      8.4353+-0.1311            8.3946+-0.1105        ? might be 1.0114x slower
   string-unpack-code                     18.2689+-0.3052           18.1604+-0.1514           18.0942+-0.1289        
   string-validate-input                   4.0771+-0.0282     ?      4.0942+-0.0244            4.0794+-0.0165        ?

   <arithmetic>                            4.4467+-0.0156            4.4344+-0.0108            4.4281+-0.0097          might be 1.0042x faster

                                               Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
Kraken:
   ai-astar                                87.304+-0.459      ?      87.626+-0.604             87.563+-0.650         ?
   audio-beat-detection                    38.880+-0.096             38.858+-0.073      ?      38.925+-0.181         ?
   audio-dft                               98.224+-0.834      ?      98.612+-0.847             97.780+-0.606         
   audio-fft                               30.242+-0.043      ?      30.286+-0.090             30.261+-0.086         ?
   audio-oscillator                        47.770+-0.117      !      48.441+-0.147             48.363+-0.315         ! definitely 1.0124x slower
   imaging-darkroom                        61.236+-0.245      ?      61.334+-0.316             61.217+-0.102         
   imaging-desaturate                      45.033+-1.155             44.225+-0.287      ?      44.706+-0.494         
   imaging-gaussian-blur                   63.102+-1.443             61.888+-0.702             61.735+-0.866           might be 1.0221x faster
   json-parse-financial                    34.362+-0.423             33.975+-0.391      ?      34.308+-0.462         
   json-stringify-tinderbox                23.038+-0.134      !      23.607+-0.155      ^      22.496+-0.372         ^ definitely 1.0241x faster
   stanford-crypto-aes                     37.048+-0.287             36.878+-0.135             36.866+-0.150         
   stanford-crypto-ccm                     32.975+-0.810      ?      33.778+-0.920             33.745+-0.661         ? might be 1.0234x slower
   stanford-crypto-pbkdf2                  92.973+-0.412             92.844+-0.397      ?      93.084+-0.570         ?
   stanford-crypto-sha256-iterative        30.277+-0.087      ?      30.326+-0.220             30.235+-0.057         

   <arithmetic>                            51.605+-0.149      ?      51.620+-0.112             51.520+-0.104           might be 1.0016x faster

                                               Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
Geomean of preferred means:
   <scaled-result>                         4.7902+-0.0112            4.7843+-0.0083            4.7763+-0.0072          might be 1.0029x faster
Comment 3 Benjamin Poulain 2016-08-01 22:05:39 PDT
Top benchmarks, 12 runs.
I doubt this truly affects Octane or ASM. The patch should only improve latency.

                                                  Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
SunSpider:
   3d-cube                                    4.5283+-0.1014     ?      4.6555+-0.0992     ?      4.6592+-0.0829        ? might be 1.0289x slower
   3d-morph                                   5.0856+-0.2175            4.9622+-0.0236     ?      4.9952+-0.0475          might be 1.0181x faster
   3d-raytrace                                4.9069+-0.0752            4.8879+-0.0473            4.8830+-0.0449        
   access-binary-trees                        1.9808+-0.0331     ?      1.9852+-0.0214            1.9774+-0.0212        
   access-fannkuch                            6.0591+-0.2568            5.6775+-0.1289     ?      5.7488+-0.0521        ^ definitely 1.0540x faster
   access-nbody                               2.3775+-0.0704     ?      2.4133+-0.0721            2.3501+-0.0113          might be 1.0117x faster
   access-nsieve                              3.0211+-0.1542            2.9470+-0.0426            2.9079+-0.0161          might be 1.0389x faster
   bitops-3bit-bits-in-byte                   1.1107+-0.0374            1.0642+-0.0162     ?      1.0790+-0.0175          might be 1.0294x faster
   bitops-bits-in-byte                        2.5840+-0.0389     ?      2.6131+-0.0429            2.5571+-0.0242          might be 1.0105x faster
   bitops-bitwise-and                         1.9724+-0.0165     ?      2.0130+-0.0351     ^      1.9663+-0.0088        
   bitops-nsieve-bits                         3.1177+-0.0498            3.1062+-0.0290            3.0869+-0.0339        
   controlflow-recursive                      2.2982+-0.0174     ?      2.3464+-0.0560            2.3055+-0.0148        ?
   crypto-aes                                 4.4093+-0.0333            4.3720+-0.0149     ?      4.4218+-0.0566        ?
   crypto-md5                                 2.6692+-0.0296            2.6621+-0.0220     ?      2.6739+-0.0301        ?
   crypto-sha1                                2.7202+-0.0265     ?      2.7475+-0.0721            2.7304+-0.0306        ?
   date-format-tofte                          6.4128+-0.0736     ?      6.4836+-0.0598            6.4377+-0.0760        ?
   date-format-xparb                          4.7407+-0.0507     ?      4.7453+-0.0529     ?      4.8116+-0.0892        ? might be 1.0150x slower
   math-cordic                                2.7143+-0.0120     ?      2.7296+-0.0133     ?      2.7331+-0.0201        ?
   math-partial-sums                          3.9748+-0.0444            3.9590+-0.0413     ?      3.9781+-0.0399        ?
   math-spectral-norm                         2.1045+-0.0845            2.0234+-0.0346     ?      2.0276+-0.0424          might be 1.0380x faster
   regexp-dna                                 6.2487+-0.0942     ?      6.3608+-0.0866            6.3352+-0.0911        ? might be 1.0138x slower
   string-base64                              3.9914+-0.0415            3.9655+-0.0313     ?      4.0455+-0.1087        ? might be 1.0135x slower
   string-fasta                               5.4522+-0.0194     ?      5.4978+-0.0312     ?      5.5369+-0.0314        ! definitely 1.0155x slower
   string-tagcloud                            8.3369+-0.1944     ?      8.3548+-0.1254     ?      8.3549+-0.1651        ?
   string-unpack-code                        18.1269+-0.2051     ?     18.3100+-0.7568           18.1349+-0.2341        ?
   string-validate-input                      4.0817+-0.0515     ?      4.1097+-0.0774            4.0952+-0.0499        ?

   <arithmetic>                               4.4241+-0.0237            4.4228+-0.0276            4.4167+-0.0141          might be 1.0017x faster

                                                  Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
Octane:
   encrypt                                   0.16172+-0.00215          0.16109+-0.00128    ?     0.16152+-0.00151       
   decrypt                                   2.71957+-0.03372          2.69736+-0.00951    ?     2.72519+-0.03534       ?
   deltablue                        x2       0.12925+-0.00105    ?     0.13364+-0.00841          0.13346+-0.00792       ? might be 1.0326x slower
   earley                                    0.28582+-0.00161    ?     0.28620+-0.00141          0.28566+-0.00157       
   boyer                                     4.98424+-0.06369    ?     4.99195+-0.05932    ?     5.00784+-0.01777       ?
   navier-stokes                    x2       4.92973+-0.01107    ?     4.93727+-0.01295    ?     4.94075+-0.01603       ?
   raytrace                         x2       0.79913+-0.00429    ?     0.79932+-0.00322          0.79918+-0.00384       ?
   richards                         x2       0.08366+-0.00112          0.08281+-0.00121    ?     0.08308+-0.00096       
   splay                            x2       0.33701+-0.00093          0.33677+-0.00194          0.33657+-0.00198       
   regexp                           x2      16.85333+-0.29356         16.59632+-0.52353    ?    17.19453+-0.48489       ? might be 1.0202x slower
   pdfjs                            x2      38.77752+-0.26053         38.50117+-0.27065    ?    38.79237+-0.24719       ?
   mandreel                         x2      42.75439+-0.28420    ?    42.78746+-0.22917         42.68880+-0.22971       
   gbemu                            x2      29.81598+-0.13736         29.67881+-0.13219    ?    29.74712+-0.08697       
   closure                                   0.48299+-0.00163    ?     0.48372+-0.00187          0.48132+-0.00214       
   jquery                                    6.46137+-0.03051          6.44973+-0.03612          6.44857+-0.02278       
   box2d                            x2       9.28965+-0.03613          9.25584+-0.06818          9.20210+-0.04473       ^ definitely 1.0095x faster
   zlib                             x2     369.83863+-6.00287        366.80492+-2.10013        357.38211+-9.99199         might be 1.0349x faster
   typescript                       x2     604.51898+-4.68085        601.46037+-7.09917        599.99866+-3.66573       

   <geometric>                               5.04284+-0.01109          5.03358+-0.02908    ?     5.03844+-0.01300         might be 1.0009x faster

                                                  Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
Kraken:
   ai-astar                                   87.510+-1.135             87.323+-1.244             86.933+-1.437         
   audio-beat-detection                       38.795+-0.089      ?      38.803+-0.086      ?      38.955+-0.146         ?
   audio-dft                                  98.008+-1.442      ?      99.360+-1.819             97.978+-1.367         
   audio-fft                                  30.219+-0.036      ?      30.232+-0.057             30.217+-0.035         
   audio-oscillator                           47.718+-0.117      !      48.244+-0.383      ?      49.056+-0.959         ! definitely 1.0280x slower
   imaging-darkroom                           61.229+-0.140             61.168+-0.099             61.079+-0.118         
   imaging-desaturate                         43.766+-0.176      ?      44.381+-1.204             44.234+-1.271         ? might be 1.0107x slower
   imaging-gaussian-blur                      62.002+-2.629             61.873+-1.343             61.511+-0.835         
   json-parse-financial                       34.149+-0.528             34.027+-0.827      ?      34.664+-1.635         ? might be 1.0151x slower
   json-stringify-tinderbox                   23.238+-0.274      ?      23.801+-0.844             22.486+-0.501           might be 1.0335x faster
   stanford-crypto-aes                        36.782+-0.411      ?      37.145+-1.011      ?      37.468+-1.077         ? might be 1.0187x slower
   stanford-crypto-ccm                        33.039+-1.668      ?      34.222+-1.015             32.737+-1.606         
   stanford-crypto-pbkdf2                     92.784+-0.727             92.221+-0.190      ?      92.377+-0.257         
   stanford-crypto-sha256-iterative           30.210+-0.173      ?      30.377+-0.487             30.229+-0.106         ?

   <arithmetic>                               51.389+-0.231      ?      51.655+-0.257             51.423+-0.305         ? might be 1.0007x slower

                                                  Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
AsmBench:
   bigfib.cpp                               428.4924+-1.7935          428.1734+-3.9477          427.0936+-1.5308        
   cray.c                                   388.9729+-2.6488          388.7526+-2.2565          388.6483+-1.6919        
   dry.c                                    473.8678+-57.7763         472.7513+-52.1405         427.1050+-3.3011          might be 1.1095x faster
   FloatMM.c                                738.9834+-22.5742         736.8981+-24.3549         729.6907+-19.7195         might be 1.0127x faster
   gcc-loops.cpp                           3608.5908+-14.2060        3600.5040+-11.9867    ?   3605.0666+-12.3677       
   n-body.c                                 800.9911+-4.6294          799.0552+-1.8147     ?    799.8246+-2.8250        
   Quicksort.c                              397.6229+-2.8334          396.7095+-3.1127     ?    399.3499+-3.4433        ?
   stepanov_container.cpp                  3313.9651+-21.8253        3294.3932+-19.0162    ?   3300.6651+-8.2700        
   Towers.c                                 265.6785+-0.9580          264.7243+-0.6980     ?    265.2861+-1.3943        

   <geometric>                              729.0649+-9.1576          727.3731+-8.0087          720.1744+-2.7211          might be 1.0123x faster

                                                  Conf#1                    Conf#2                    Conf#3                 Conf#3 v. Conf#1     
Geomean of preferred means:
   <scaled-result>                           30.2352+-0.0944     ?     30.2406+-0.1047           30.1294+-0.0573          might be 1.0035x faster
Comment 4 Benjamin Poulain 2016-08-02 18:07:21 PDT
Created attachment 285174 [details]
Patch
Comment 5 Benjamin Poulain 2016-08-02 18:08:28 PDT
Comment on attachment 285174 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285174&action=review

> Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:60
> +    m_abstractValues.resize(m_graph.maxNodeCount());

I had to move this here because ConstantFolding can add values after executing each block.
Comment 6 Mark Lam 2016-08-03 09:46:14 PDT
Comment on attachment 285174 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285174&action=review

r=me

> Source/JavaScriptCore/ChangeLog:17
> +        I moved the vector to Graph as a cache shared by every instanciation of

typo: /instanciation/instantiation/.

> Source/JavaScriptCore/b3/B3SparseCollection.h:83
> +        unsigned lastValueIndex = m_vector.size();

I suggest calling this endIndex.  lastValueIndex implies that it points to the last value, when it is meant to point just past the last value.

> Source/JavaScriptCore/b3/B3SparseCollection.h:110
> +        m_vector.resize(holeIndex);

I suggest changing this to be m_vector.resize(endIndex).

>> Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:60
>> +    m_abstractValues.resize(m_graph.maxNodeCount());
> 
> I had to move this here because ConstantFolding can add values after executing each block.

Can you turn this into a comment here?  I would think that future readers of the code may want to know why this line is here, and at best may glance at the ChangeLog, but will very likely not see this comment in bugzilla.  The most likely place to look for an explanation of why this is done is in a comment right here in the code where it's relevant.
Comment 7 Benjamin Poulain 2016-08-03 20:16:37 PDT
Created attachment 285298 [details]
Patch for landing
Comment 8 WebKit Commit Bot 2016-08-03 20:45:12 PDT
Comment on attachment 285298 [details]
Patch for landing

Clearing flags on attachment: 285298

Committed r204112: <http://trac.webkit.org/changeset/204112>
Comment 9 WebKit Commit Bot 2016-08-03 20:45:16 PDT
All reviewed patches have been landed.  Closing bug.