The DFG virtual register allocator currently only allocates slots that are beyond the high watermark of used bytecode temporaries. But there may be unused bytecode temporaries below that high watermark, and the virtual register allocation should be able to reuse them. This will reduce the DFG call frame sizes, and will pave the way for more a fluid interaction between bytecode variables and virtual registers. This is likely to be necessary as we enable inlining of JavaScript code blocks.
This does weird, but good, things for performance. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree BetterVRAlloc SunSpider: 3d-cube 7.3773+-0.1682 ? 7.4579+-0.1385 ? might be 1.0109x slower 3d-morph 7.6129+-0.1029 ? 7.6418+-0.1258 ? 3d-raytrace 7.4375+-0.1547 ? 7.5778+-0.1489 ? might be 1.0189x slower access-binary-trees 1.7496+-0.0528 1.7096+-0.0679 might be 1.0234x faster access-fannkuch 6.4089+-0.0902 ? 6.5075+-0.1120 ? might be 1.0154x slower access-nbody 3.3459+-0.0701 3.3088+-0.0721 might be 1.0112x faster access-nsieve 2.5818+-0.0483 ? 2.6016+-0.0653 ? bitops-3bit-bits-in-byte 1.7004+-0.0294 ? 1.7129+-0.0297 ? bitops-bits-in-byte 2.7519+-0.0609 ? 2.7989+-0.0604 ? might be 1.0171x slower bitops-bitwise-and 3.3616+-0.0921 3.2038+-0.1061 might be 1.0493x faster bitops-nsieve-bits 5.4789+-0.0996 5.4371+-0.1053 controlflow-recursive 2.1314+-0.0493 ^ 2.0389+-0.0401 ^ definitely 1.0454x faster crypto-aes 6.6118+-0.1531 ? 6.8585+-0.2216 ? might be 1.0373x slower crypto-md5 2.8384+-0.0509 2.7581+-0.0578 might be 1.0291x faster crypto-sha1 2.4793+-0.0552 ? 2.5161+-0.0564 ? might be 1.0148x slower date-format-tofte 10.1105+-0.1987 9.8553+-0.1352 might be 1.0259x faster date-format-xparb 8.8825+-0.1773 ? 8.9709+-0.1812 ? math-cordic 6.3703+-0.0684 ? 6.4577+-0.1055 ? might be 1.0137x slower math-partial-sums 7.6012+-0.1293 ? 7.6175+-0.1011 ? math-spectral-norm 2.8564+-0.0711 ? 2.8866+-0.0705 ? might be 1.0106x slower regexp-dna 10.7062+-0.1282 ? 10.7601+-0.1004 ? string-base64 5.3347+-0.1450 5.2579+-0.1227 might be 1.0146x faster string-fasta 6.4862+-0.1604 6.4511+-0.0916 string-tagcloud 11.1826+-0.2440 ? 11.3868+-0.2751 ? might be 1.0183x slower string-unpack-code 20.3730+-0.3643 ? 20.7027+-0.5347 ? might be 1.0162x slower string-validate-input 6.3280+-0.1449 ? 6.3823+-0.1091 ? <arithmetic> * 6.1577+-0.0307 ? 6.1869+-0.0297 ? <geometric> 5.0812+-0.0208 ? 5.0825+-0.0289 ? <harmonic> 4.1863+-0.0221 4.1675+-0.0390 TipOfTree BetterVRAlloc V8: crypto 72.6790+-0.4355 72.3419+-0.4505 deltablue 228.8242+-2.0552 ? 231.0917+-2.1679 ? earley-boyer 92.2505+-0.3473 ! 97.0999+-0.3843 ! definitely 1.0526x slower raytrace 58.6995+-0.4796 ? 58.9699+-0.4908 ? regexp 104.1287+-0.4657 ? 104.9266+-0.5007 ? richards 207.9427+-1.3704 ^ 189.9799+-1.3203 ^ definitely 1.0946x faster splay 96.5209+-0.5152 95.3950+-0.8332 might be 1.0118x faster <arithmetic> 123.0065+-0.4590 ^ 121.4007+-0.3942 ^ definitely 1.0132x faster <geometric> * 109.4509+-0.3131 108.9284+-0.2682 <harmonic> 98.8930+-0.2627 ? 99.0225+-0.2325 ? TipOfTree BetterVRAlloc Kraken: ai-astar 497.5357+-1.0770 ? 499.8479+-1.8509 ? audio-beat-detection 192.0635+-1.4448 191.8720+-0.8279 audio-dft 272.6202+-3.5519 271.3265+-2.0227 audio-fft 124.5866+-0.5697 124.2436+-0.4142 audio-oscillator 255.3471+-1.9788 ? 255.5191+-1.5136 ? imaging-darkroom 418.8741+-0.6484 ? 419.2329+-2.7905 ? imaging-desaturate 232.8519+-1.0063 ^ 220.8624+-2.0278 ^ definitely 1.0543x faster imaging-gaussian-blur 586.1653+-0.9813 582.0832+-3.6930 json-parse-financial 55.3562+-0.2220 ? 55.7427+-0.2774 ? json-stringify-tinderbox 68.6493+-0.2618 ? 69.0735+-0.3286 ? stanford-crypto-aes 133.5733+-1.5222 133.1723+-1.8243 stanford-crypto-ccm 102.3700+-0.3699 102.1013+-0.5759 stanford-crypto-pbkdf2 194.2358+-1.1225 ? 196.5417+-2.0086 ? might be 1.0119x slower stanford-crypto-sha256-iterative 72.3107+-0.3245 ^ 71.6367+-0.2204 ^ definitely 1.0094x faster <arithmetic> * 229.0386+-0.4915 228.0897+-0.6386 <geometric> 178.9605+-0.4238 178.2923+-0.5165 <harmonic> 139.8795+-0.3317 139.6275+-0.3821 TipOfTree BetterVRAlloc All benchmarks: <arithmetic> 89.9507+-0.1980 ^ 89.4451+-0.2265 ^ definitely 1.0057x faster <geometric> 23.1890+-0.0668 23.1496+-0.0756 <harmonic> 7.3649+-0.0381 7.3327+-0.0667 TipOfTree BetterVRAlloc Geomean of preferred means: <scaled-result> 53.6428+-0.1329 53.5676+-0.1168
Created attachment 110582 [details] the patch
Ran this for a bunch of iterations on V8 to get better significance. And yes, it's a non-trivial improvement. Benchmark report for V8. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 60 samples per benchmark/VM, with 20 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree BetterVRAlloc crypto 72.8480+-0.1132 ^ 72.1891+-0.1859 ^ definitely 1.0091x faster deltablue 229.7648+-0.9618 ? 230.9335+-0.7912 ? earley-boyer 92.0746+-0.2574 ! 97.1430+-0.1545 ! definitely 1.0550x slower raytrace 58.6933+-0.1451 ! 59.3937+-0.1625 ! definitely 1.0119x slower regexp 104.4100+-0.1694 ? 104.7309+-0.1666 ? richards 208.5533+-0.5139 ^ 189.0045+-0.3390 ^ definitely 1.1034x faster splay 96.9219+-0.3767 96.0736+-0.7089 <arithmetic> 123.3237+-0.1473 ^ 121.3526+-0.1864 ^ definitely 1.0162x faster <geometric> * 109.6712+-0.1097 ^ 109.0018+-0.1606 ^ definitely 1.0061x faster <harmonic> 99.0458+-0.1036 ? 99.1898+-0.1474 ?
Attachment 110582 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/ChangeLog:1: ChangeLog entry has no bug number [changelog/bugnumber] [5] Total errors found: 1 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.
It turns out that my previous attempt at a virtual register allocator was a total bust. I fixed it, and now we have real numbers: Benchmark report for V8. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree BetterVRAlloc crypto 72.9077+-0.4529 ? 73.1820+-0.3834 ? deltablue 233.4981+-2.2990 ^ 229.2700+-1.5657 ^ definitely 1.0184x faster earley-boyer 91.4999+-0.3203 ? 91.6503+-0.4134 ? raytrace 58.7518+-0.4181 58.5479+-0.4437 regexp 104.2604+-0.2749 104.1496+-0.4870 richards 207.6681+-1.7385 ^ 188.0565+-0.9151 ^ definitely 1.1043x faster splay 95.9854+-0.3932 95.5227+-0.5338 <arithmetic> 123.5102+-0.5281 ^ 120.0541+-0.3077 ^ definitely 1.0288x faster <geometric> * 109.6144+-0.2834 ^ 107.7297+-0.1889 ^ definitely 1.0175x faster <harmonic> 98.9001+-0.1905 ^ 98.0251+-0.1539 ^ definitely 1.0089x faster
Updated numbers for all benchmarks: Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree BetterVRAlloc SunSpider: 3d-cube 7.3791+-0.1438 7.3422+-0.2024 3d-morph 7.6559+-0.1532 ? 7.6660+-0.1369 ? 3d-raytrace 7.5526+-0.1831 ? 7.6005+-0.1643 ? access-binary-trees 1.6997+-0.0453 ? 1.7195+-0.0596 ? might be 1.0117x slower access-fannkuch 6.4543+-0.1042 6.4251+-0.0914 access-nbody 3.3612+-0.0839 3.3161+-0.0804 might be 1.0136x faster access-nsieve 2.5566+-0.0616 ? 2.6244+-0.0842 ? might be 1.0265x slower bitops-3bit-bits-in-byte 1.7225+-0.0330 ? 1.7510+-0.0330 ? might be 1.0165x slower bitops-bits-in-byte 2.7106+-0.0671 ? 2.7899+-0.0492 ? might be 1.0293x slower bitops-bitwise-and 3.3565+-0.0647 3.3551+-0.1026 bitops-nsieve-bits 5.4728+-0.1269 ? 5.5231+-0.1031 ? controlflow-recursive 2.0593+-0.0302 ? 2.0820+-0.0449 ? might be 1.0110x slower crypto-aes 6.7189+-0.1746 ? 6.7529+-0.1620 ? crypto-md5 2.8141+-0.0487 ? 2.8817+-0.0505 ? might be 1.0240x slower crypto-sha1 2.4218+-0.0621 ? 2.5025+-0.0589 ? might be 1.0333x slower date-format-tofte 10.0579+-0.1816 10.0549+-0.1680 date-format-xparb 9.4010+-0.4992 ? 9.6260+-0.4280 ? might be 1.0239x slower math-cordic 6.5915+-0.1215 6.5062+-0.1217 might be 1.0131x faster math-partial-sums 7.6552+-0.1328 ? 7.7737+-0.1838 ? might be 1.0155x slower math-spectral-norm 2.8287+-0.0347 ? 2.8479+-0.0720 ? regexp-dna 10.7310+-0.1783 10.7082+-0.1497 string-base64 5.3418+-0.1218 5.3307+-0.1068 string-fasta 6.4322+-0.1368 ? 6.4788+-0.2142 ? string-tagcloud 11.3594+-0.1888 ? 11.3987+-0.2564 ? string-unpack-code 20.4972+-0.2142 ? 20.6382+-0.3662 ? string-validate-input 6.2973+-0.1110 ? 6.3350+-0.1157 ? <arithmetic> * 6.1973+-0.0333 ? 6.2319+-0.0329 ? <geometric> 5.0876+-0.0263 ? 5.1265+-0.0274 ? <harmonic> 4.1650+-0.0314 ? 4.2101+-0.0321 ? might be 1.0108x slower TipOfTree BetterVRAlloc V8: crypto 73.4915+-0.5072 ^ 72.5717+-0.3088 ^ definitely 1.0127x faster deltablue 229.3694+-2.9698 ? 230.5546+-1.8132 ? earley-boyer 91.6589+-0.2657 ? 92.8696+-1.2849 ? might be 1.0132x slower raytrace 58.4779+-0.3150 ? 59.0484+-0.5281 ? regexp 104.5795+-0.3130 104.2271+-0.4320 richards 207.9325+-1.2140 ^ 189.4965+-1.3250 ^ definitely 1.0973x faster splay 96.3336+-0.5974 95.9125+-0.5882 <arithmetic> 123.1205+-0.4639 ^ 120.6686+-0.3824 ^ definitely 1.0203x faster <geometric> * 109.5373+-0.2146 ^ 108.2105+-0.3030 ^ definitely 1.0123x faster <harmonic> 98.9614+-0.1761 ^ 98.4156+-0.2803 ^ definitely 1.0055x faster TipOfTree BetterVRAlloc Kraken: ai-astar 503.0404+-2.6460 501.6886+-4.6412 audio-beat-detection 194.5291+-1.4049 ^ 191.3495+-0.9404 ^ definitely 1.0166x faster audio-dft 272.6498+-2.3990 ? 279.1601+-5.1765 ? might be 1.0239x slower audio-fft 125.3488+-0.6055 ? 125.3593+-0.5099 ? audio-oscillator 253.9799+-2.2824 253.1516+-1.7852 imaging-darkroom 423.5625+-5.1567 ? 425.0566+-8.0295 ? imaging-desaturate 233.2266+-1.3487 232.7642+-1.3199 imaging-gaussian-blur 586.1470+-0.9626 ? 587.3914+-1.0103 ? json-parse-financial 55.5272+-0.1996 ! 56.1811+-0.3381 ! definitely 1.0118x slower json-stringify-tinderbox 69.5921+-1.4267 68.9636+-0.4248 stanford-crypto-aes 133.6110+-1.4540 133.2715+-1.5036 stanford-crypto-ccm 102.0949+-0.4570 101.1988+-0.6843 stanford-crypto-pbkdf2 194.6565+-1.3833 ^ 192.1962+-0.7880 ^ definitely 1.0128x faster stanford-crypto-sha256-iterative 72.4658+-0.2888 ^ 71.5203+-0.3084 ^ definitely 1.0132x faster <arithmetic> * 230.0308+-0.5548 229.9466+-0.9315 <geometric> 179.6701+-0.5064 179.2885+-0.6065 <harmonic> 140.4526+-0.5347 140.0540+-0.3782 TipOfTree BetterVRAlloc All benchmarks: <arithmetic> 90.2852+-0.1993 89.9141+-0.2787 <geometric> 23.2351+-0.0815 ? 23.2761+-0.0696 ? <harmonic> 7.3290+-0.0539 ? 7.4054+-0.0549 ? might be 1.0104x slower TipOfTree BetterVRAlloc Geomean of preferred means: <scaled-result> 53.8492+-0.1318 53.7238+-0.1173
Created attachment 110589 [details] the patch
Attachment 110589 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:165: Missing space after , [whitespace/comma] [3] Total errors found: 1 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 110594 [details] the patch - fix style
Comment on attachment 110594 [details] the patch - fix style Make sure you run it through at least the fast/js part of webkit tests.
Landed in r97197.
This patch appears to have broken GTK builds (missing stdio.h). Build fix attempted in http://trac.webkit.org/changeset/97197.