RESOLVED FIXED 160443
[JSC] Improve the memory locality of DFG Node's AbstractValues
https://bugs.webkit.org/show_bug.cgi?id=160443
Summary [JSC] Improve the memory locality of DFG Node's AbstractValues
Benjamin Poulain
Reported 2016-08-01 21:50:17 PDT
[JSC] Improve the memory locality of DFG Node's AbstractValues
Attachments
Patch (7.72 KB, patch)
2016-08-01 22:01 PDT, Benjamin Poulain
no flags
Patch (8.30 KB, patch)
2016-08-02 18:07 PDT, Benjamin Poulain
no flags
Patch for landing (8.05 KB, patch)
2016-08-03 20:16 PDT, Benjamin Poulain
no flags
Benjamin Poulain
Comment 1 2016-08-01 22:01:55 PDT
Benjamin Poulain
Comment 2 2016-08-01 22:03:59 PDT
Fast benchmarks with 20 runs. https://bugs.webkit.org/show_bug.cgi?id=160370 not applied. Baseline NoPacking Packing Packing VS Baseline SunSpider: 3d-cube 4.6536+-0.0512 ? 4.7191+-0.0418 4.6983+-0.0291 ? 3d-morph 4.9866+-0.0291 ? 5.0245+-0.0826 ? 5.0260+-0.0673 ? 3d-raytrace 4.8657+-0.0226 ? 4.8984+-0.0382 4.8764+-0.0218 ? access-binary-trees 1.9769+-0.0197 ? 1.9786+-0.0198 ? 1.9910+-0.0211 ? access-fannkuch 6.1405+-0.1443 ^ 5.7092+-0.0503 ? 5.7431+-0.0717 ^ definitely 1.0692x faster access-nbody 2.3701+-0.0247 ? 2.4108+-0.0623 2.4038+-0.0361 ? might be 1.0142x slower access-nsieve 2.9353+-0.0170 ? 2.9516+-0.0359 ? 2.9606+-0.0418 ? bitops-3bit-bits-in-byte 1.0751+-0.0124 ? 1.0799+-0.0142 ? 1.0864+-0.0153 ? might be 1.0105x slower bitops-bits-in-byte 2.5868+-0.0139 ? 2.6015+-0.0220 ? 2.6016+-0.0253 ? bitops-bitwise-and 1.9917+-0.0138 ? 2.0033+-0.0185 1.9954+-0.0202 ? bitops-nsieve-bits 3.1241+-0.0634 3.0963+-0.0178 3.0912+-0.0124 might be 1.0106x faster controlflow-recursive 2.3295+-0.0291 2.3194+-0.0165 2.3177+-0.0189 crypto-aes 4.4244+-0.0248 4.4145+-0.0258 ? 4.4314+-0.0364 ? crypto-md5 2.6643+-0.0166 ? 2.6792+-0.0159 2.6630+-0.0143 crypto-sha1 2.7432+-0.0192 ? 2.7508+-0.0223 2.7333+-0.0227 date-format-tofte 6.4480+-0.0404 ? 6.4797+-0.0568 ? 6.4828+-0.0583 ? date-format-xparb 4.7469+-0.0385 ? 4.7700+-0.0562 4.7562+-0.0216 ? math-cordic 2.8051+-0.1163 2.7456+-0.0144 2.7452+-0.0189 might be 1.0218x faster math-partial-sums 4.0163+-0.0811 3.9938+-0.0322 3.9796+-0.0221 math-spectral-norm 2.0590+-0.0242 2.0395+-0.0201 ? 2.0402+-0.0256 regexp-dna 6.4929+-0.1547 6.3862+-0.0724 6.3714+-0.0804 might be 1.0191x faster string-base64 4.0149+-0.0403 ? 4.0216+-0.0520 3.9967+-0.0320 string-fasta 5.5171+-0.0717 ? 5.5301+-0.0221 ? 5.5722+-0.0666 ? string-tagcloud 8.2997+-0.0436 ? 8.4353+-0.1311 8.3946+-0.1105 ? might be 1.0114x slower string-unpack-code 18.2689+-0.3052 18.1604+-0.1514 18.0942+-0.1289 string-validate-input 4.0771+-0.0282 ? 4.0942+-0.0244 4.0794+-0.0165 ? <arithmetic> 4.4467+-0.0156 4.4344+-0.0108 4.4281+-0.0097 might be 1.0042x faster Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 Kraken: ai-astar 87.304+-0.459 ? 87.626+-0.604 87.563+-0.650 ? audio-beat-detection 38.880+-0.096 38.858+-0.073 ? 38.925+-0.181 ? audio-dft 98.224+-0.834 ? 98.612+-0.847 97.780+-0.606 audio-fft 30.242+-0.043 ? 30.286+-0.090 30.261+-0.086 ? audio-oscillator 47.770+-0.117 ! 48.441+-0.147 48.363+-0.315 ! definitely 1.0124x slower imaging-darkroom 61.236+-0.245 ? 61.334+-0.316 61.217+-0.102 imaging-desaturate 45.033+-1.155 44.225+-0.287 ? 44.706+-0.494 imaging-gaussian-blur 63.102+-1.443 61.888+-0.702 61.735+-0.866 might be 1.0221x faster json-parse-financial 34.362+-0.423 33.975+-0.391 ? 34.308+-0.462 json-stringify-tinderbox 23.038+-0.134 ! 23.607+-0.155 ^ 22.496+-0.372 ^ definitely 1.0241x faster stanford-crypto-aes 37.048+-0.287 36.878+-0.135 36.866+-0.150 stanford-crypto-ccm 32.975+-0.810 ? 33.778+-0.920 33.745+-0.661 ? might be 1.0234x slower stanford-crypto-pbkdf2 92.973+-0.412 92.844+-0.397 ? 93.084+-0.570 ? stanford-crypto-sha256-iterative 30.277+-0.087 ? 30.326+-0.220 30.235+-0.057 <arithmetic> 51.605+-0.149 ? 51.620+-0.112 51.520+-0.104 might be 1.0016x faster Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 Geomean of preferred means: <scaled-result> 4.7902+-0.0112 4.7843+-0.0083 4.7763+-0.0072 might be 1.0029x faster
Benjamin Poulain
Comment 3 2016-08-01 22:05:39 PDT
Top benchmarks, 12 runs. I doubt this truly affects Octane or ASM. The patch should only improve latency. Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 SunSpider: 3d-cube 4.5283+-0.1014 ? 4.6555+-0.0992 ? 4.6592+-0.0829 ? might be 1.0289x slower 3d-morph 5.0856+-0.2175 4.9622+-0.0236 ? 4.9952+-0.0475 might be 1.0181x faster 3d-raytrace 4.9069+-0.0752 4.8879+-0.0473 4.8830+-0.0449 access-binary-trees 1.9808+-0.0331 ? 1.9852+-0.0214 1.9774+-0.0212 access-fannkuch 6.0591+-0.2568 5.6775+-0.1289 ? 5.7488+-0.0521 ^ definitely 1.0540x faster access-nbody 2.3775+-0.0704 ? 2.4133+-0.0721 2.3501+-0.0113 might be 1.0117x faster access-nsieve 3.0211+-0.1542 2.9470+-0.0426 2.9079+-0.0161 might be 1.0389x faster bitops-3bit-bits-in-byte 1.1107+-0.0374 1.0642+-0.0162 ? 1.0790+-0.0175 might be 1.0294x faster bitops-bits-in-byte 2.5840+-0.0389 ? 2.6131+-0.0429 2.5571+-0.0242 might be 1.0105x faster bitops-bitwise-and 1.9724+-0.0165 ? 2.0130+-0.0351 ^ 1.9663+-0.0088 bitops-nsieve-bits 3.1177+-0.0498 3.1062+-0.0290 3.0869+-0.0339 controlflow-recursive 2.2982+-0.0174 ? 2.3464+-0.0560 2.3055+-0.0148 ? crypto-aes 4.4093+-0.0333 4.3720+-0.0149 ? 4.4218+-0.0566 ? crypto-md5 2.6692+-0.0296 2.6621+-0.0220 ? 2.6739+-0.0301 ? crypto-sha1 2.7202+-0.0265 ? 2.7475+-0.0721 2.7304+-0.0306 ? date-format-tofte 6.4128+-0.0736 ? 6.4836+-0.0598 6.4377+-0.0760 ? date-format-xparb 4.7407+-0.0507 ? 4.7453+-0.0529 ? 4.8116+-0.0892 ? might be 1.0150x slower math-cordic 2.7143+-0.0120 ? 2.7296+-0.0133 ? 2.7331+-0.0201 ? math-partial-sums 3.9748+-0.0444 3.9590+-0.0413 ? 3.9781+-0.0399 ? math-spectral-norm 2.1045+-0.0845 2.0234+-0.0346 ? 2.0276+-0.0424 might be 1.0380x faster regexp-dna 6.2487+-0.0942 ? 6.3608+-0.0866 6.3352+-0.0911 ? might be 1.0138x slower string-base64 3.9914+-0.0415 3.9655+-0.0313 ? 4.0455+-0.1087 ? might be 1.0135x slower string-fasta 5.4522+-0.0194 ? 5.4978+-0.0312 ? 5.5369+-0.0314 ! definitely 1.0155x slower string-tagcloud 8.3369+-0.1944 ? 8.3548+-0.1254 ? 8.3549+-0.1651 ? string-unpack-code 18.1269+-0.2051 ? 18.3100+-0.7568 18.1349+-0.2341 ? string-validate-input 4.0817+-0.0515 ? 4.1097+-0.0774 4.0952+-0.0499 ? <arithmetic> 4.4241+-0.0237 4.4228+-0.0276 4.4167+-0.0141 might be 1.0017x faster Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 Octane: encrypt 0.16172+-0.00215 0.16109+-0.00128 ? 0.16152+-0.00151 decrypt 2.71957+-0.03372 2.69736+-0.00951 ? 2.72519+-0.03534 ? deltablue x2 0.12925+-0.00105 ? 0.13364+-0.00841 0.13346+-0.00792 ? might be 1.0326x slower earley 0.28582+-0.00161 ? 0.28620+-0.00141 0.28566+-0.00157 boyer 4.98424+-0.06369 ? 4.99195+-0.05932 ? 5.00784+-0.01777 ? navier-stokes x2 4.92973+-0.01107 ? 4.93727+-0.01295 ? 4.94075+-0.01603 ? raytrace x2 0.79913+-0.00429 ? 0.79932+-0.00322 0.79918+-0.00384 ? richards x2 0.08366+-0.00112 0.08281+-0.00121 ? 0.08308+-0.00096 splay x2 0.33701+-0.00093 0.33677+-0.00194 0.33657+-0.00198 regexp x2 16.85333+-0.29356 16.59632+-0.52353 ? 17.19453+-0.48489 ? might be 1.0202x slower pdfjs x2 38.77752+-0.26053 38.50117+-0.27065 ? 38.79237+-0.24719 ? mandreel x2 42.75439+-0.28420 ? 42.78746+-0.22917 42.68880+-0.22971 gbemu x2 29.81598+-0.13736 29.67881+-0.13219 ? 29.74712+-0.08697 closure 0.48299+-0.00163 ? 0.48372+-0.00187 0.48132+-0.00214 jquery 6.46137+-0.03051 6.44973+-0.03612 6.44857+-0.02278 box2d x2 9.28965+-0.03613 9.25584+-0.06818 9.20210+-0.04473 ^ definitely 1.0095x faster zlib x2 369.83863+-6.00287 366.80492+-2.10013 357.38211+-9.99199 might be 1.0349x faster typescript x2 604.51898+-4.68085 601.46037+-7.09917 599.99866+-3.66573 <geometric> 5.04284+-0.01109 5.03358+-0.02908 ? 5.03844+-0.01300 might be 1.0009x faster Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 Kraken: ai-astar 87.510+-1.135 87.323+-1.244 86.933+-1.437 audio-beat-detection 38.795+-0.089 ? 38.803+-0.086 ? 38.955+-0.146 ? audio-dft 98.008+-1.442 ? 99.360+-1.819 97.978+-1.367 audio-fft 30.219+-0.036 ? 30.232+-0.057 30.217+-0.035 audio-oscillator 47.718+-0.117 ! 48.244+-0.383 ? 49.056+-0.959 ! definitely 1.0280x slower imaging-darkroom 61.229+-0.140 61.168+-0.099 61.079+-0.118 imaging-desaturate 43.766+-0.176 ? 44.381+-1.204 44.234+-1.271 ? might be 1.0107x slower imaging-gaussian-blur 62.002+-2.629 61.873+-1.343 61.511+-0.835 json-parse-financial 34.149+-0.528 34.027+-0.827 ? 34.664+-1.635 ? might be 1.0151x slower json-stringify-tinderbox 23.238+-0.274 ? 23.801+-0.844 22.486+-0.501 might be 1.0335x faster stanford-crypto-aes 36.782+-0.411 ? 37.145+-1.011 ? 37.468+-1.077 ? might be 1.0187x slower stanford-crypto-ccm 33.039+-1.668 ? 34.222+-1.015 32.737+-1.606 stanford-crypto-pbkdf2 92.784+-0.727 92.221+-0.190 ? 92.377+-0.257 stanford-crypto-sha256-iterative 30.210+-0.173 ? 30.377+-0.487 30.229+-0.106 ? <arithmetic> 51.389+-0.231 ? 51.655+-0.257 51.423+-0.305 ? might be 1.0007x slower Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 AsmBench: bigfib.cpp 428.4924+-1.7935 428.1734+-3.9477 427.0936+-1.5308 cray.c 388.9729+-2.6488 388.7526+-2.2565 388.6483+-1.6919 dry.c 473.8678+-57.7763 472.7513+-52.1405 427.1050+-3.3011 might be 1.1095x faster FloatMM.c 738.9834+-22.5742 736.8981+-24.3549 729.6907+-19.7195 might be 1.0127x faster gcc-loops.cpp 3608.5908+-14.2060 3600.5040+-11.9867 ? 3605.0666+-12.3677 n-body.c 800.9911+-4.6294 799.0552+-1.8147 ? 799.8246+-2.8250 Quicksort.c 397.6229+-2.8334 396.7095+-3.1127 ? 399.3499+-3.4433 ? stepanov_container.cpp 3313.9651+-21.8253 3294.3932+-19.0162 ? 3300.6651+-8.2700 Towers.c 265.6785+-0.9580 264.7243+-0.6980 ? 265.2861+-1.3943 <geometric> 729.0649+-9.1576 727.3731+-8.0087 720.1744+-2.7211 might be 1.0123x faster Conf#1 Conf#2 Conf#3 Conf#3 v. Conf#1 Geomean of preferred means: <scaled-result> 30.2352+-0.0944 ? 30.2406+-0.1047 30.1294+-0.0573 might be 1.0035x faster
Benjamin Poulain
Comment 4 2016-08-02 18:07:21 PDT
Benjamin Poulain
Comment 5 2016-08-02 18:08:28 PDT
Comment on attachment 285174 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285174&action=review > Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:60 > + m_abstractValues.resize(m_graph.maxNodeCount()); I had to move this here because ConstantFolding can add values after executing each block.
Mark Lam
Comment 6 2016-08-03 09:46:14 PDT
Comment on attachment 285174 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285174&action=review r=me > Source/JavaScriptCore/ChangeLog:17 > + I moved the vector to Graph as a cache shared by every instanciation of typo: /instanciation/instantiation/. > Source/JavaScriptCore/b3/B3SparseCollection.h:83 > + unsigned lastValueIndex = m_vector.size(); I suggest calling this endIndex. lastValueIndex implies that it points to the last value, when it is meant to point just past the last value. > Source/JavaScriptCore/b3/B3SparseCollection.h:110 > + m_vector.resize(holeIndex); I suggest changing this to be m_vector.resize(endIndex). >> Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:60 >> + m_abstractValues.resize(m_graph.maxNodeCount()); > > I had to move this here because ConstantFolding can add values after executing each block. Can you turn this into a comment here? I would think that future readers of the code may want to know why this line is here, and at best may glance at the ChangeLog, but will very likely not see this comment in bugzilla. The most likely place to look for an explanation of why this is done is in a comment right here in the code where it's relevant.
Benjamin Poulain
Comment 7 2016-08-03 20:16:37 PDT
Created attachment 285298 [details] Patch for landing
WebKit Commit Bot
Comment 8 2016-08-03 20:45:12 PDT
Comment on attachment 285298 [details] Patch for landing Clearing flags on attachment: 285298 Committed r204112: <http://trac.webkit.org/changeset/204112>
WebKit Commit Bot
Comment 9 2016-08-03 20:45:16 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.