RESOLVED FIXED68316
DFG JIT does not have full block-local CSE
https://bugs.webkit.org/show_bug.cgi?id=68316
Summary DFG JIT does not have full block-local CSE
Filip Pizlo
Reported 2011-09-17 14:01:17 PDT
DFG JIT can perform some simple load elimination on array accesses, but other than that it cannot do common subexpression elimination (CSE). This is unfortunate, since CSE would be of great benefit for global variable accesses, more complex forms of array accesses, and repeated arithmetic. This is particularly true since these operations in JavaScript are so expensive. Thus eliminating even a few of them can have a big effect. The DFG JIT should implement block-local CSE with facilities to eliminate both pure operations (additions on numbers, etc) and heap operations (GetGlobalVar, GetByVal, CheckMethod). The latter should take care to ensure correctness in the case of conflicting side-effects. CSE should be OSR-friendly: if it eliminates an operation, it should ensure that everything needed to perform that operation in the old JIT if OSR happens is still intact.
Attachments
work in progress (42.74 KB, patch)
2011-09-17 14:08 PDT, Filip Pizlo
no flags
work in progress - fixed clobber detection (44.03 KB, patch)
2011-09-17 14:25 PDT, Filip Pizlo
no flags
fix style (44.04 KB, patch)
2011-09-17 14:30 PDT, Filip Pizlo
no flags
work in progress - fixed crypto slow-down (44.52 KB, patch)
2011-09-17 15:42 PDT, Filip Pizlo
no flags
the patch - reduced worst case performance a bit (45.35 KB, patch)
2011-09-17 16:31 PDT, Filip Pizlo
no flags
the patch - updated changelog (45.26 KB, patch)
2011-09-17 16:55 PDT, Filip Pizlo
oliver: review+
Filip Pizlo
Comment 1 2011-09-17 14:08:42 PDT
Created attachment 107773 [details] work in progress Still working on this, but it's pretty awesome for Kraken. Benchmark report for Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PhantomCSE ai-astar 634.8768+-5.2150 632.3351+-4.7085 audio-beat-detection 467.5151+-1.2716 465.7044+-1.3745 audio-dft 423.1598+-2.3857 421.5126+-2.3649 audio-fft 364.7906+-1.2717 ? 365.5174+-1.0969 ? audio-oscillator 312.1471+-1.5005 ? 331.5295+-21.8849 ? might be 1.0621x slower imaging-darkroom 412.6474+-1.6945 411.6372+-1.1734 imaging-desaturate 207.4252+-0.7089 ! 221.9861+-0.8245 ! definitely 1.0702x slower imaging-gaussian-blur 1082.7893+-2.2880 ^ 591.5093+-0.8721 ^ definitely 1.8306x faster json-parse-financial 49.5232+-0.2997 ? 50.3901+-1.3708 ? might be 1.0175x slower json-stringify-tinderbox 68.0220+-0.2688 67.8016+-0.6565 stanford-crypto-aes 144.5714+-0.6423 143.6752+-1.0772 stanford-crypto-ccm 110.8207+-0.2111 110.1401+-0.5823 stanford-crypto-pbkdf2 396.3371+-1.5674 394.0494+-5.6482 stanford-crypto-sha256-iterative 148.7113+-0.6486 ? 149.4918+-0.3482 ? <arithmetic> 344.5241+-0.4476 ^ 311.2343+-1.5880 ^ definitely 1.1070x faster <geometric> 252.2160+-0.3048 ^ 243.4866+-0.8714 ^ definitely 1.0359x faster <harmonic> 174.4002+-0.3305 174.1282+-0.7948
Filip Pizlo
Comment 2 2011-09-17 14:25:29 PDT
Created attachment 107774 [details] work in progress - fixed clobber detection This fixes a bug where nodes that might clobber the world depending on what the predictions are are not caught as having clobbered the world. That would have potentially caused a heap access to miss a mutation performed by a valueOf method. This does not affect performance, we still have a win. Benchmark report for Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PhantomCSE ai-astar 633.3743+-5.5077 628.2505+-3.3314 audio-beat-detection 467.7312+-1.1106 465.8301+-2.4056 audio-dft 425.0346+-3.3409 421.6433+-3.9040 audio-fft 364.9765+-1.0920 363.3348+-0.7880 audio-oscillator 312.2311+-1.3474 ? 334.1131+-24.5161 ? might be 1.0701x slower imaging-darkroom 421.8919+-7.9838 412.6673+-3.6377 might be 1.0224x faster imaging-desaturate 207.5554+-0.8215 ! 222.0429+-0.8075 ! definitely 1.0698x slower imaging-gaussian-blur 1080.7974+-3.1483 ^ 590.5500+-1.2227 ^ definitely 1.8302x faster json-parse-financial 49.4475+-0.2421 ? 49.8471+-0.5680 ? json-stringify-tinderbox 67.7704+-0.3214 67.3444+-0.3539 stanford-crypto-aes 145.3547+-1.7446 144.5298+-1.4606 stanford-crypto-ccm 113.2217+-1.2073 111.4091+-1.2598 might be 1.0163x faster stanford-crypto-pbkdf2 395.2784+-1.8910 393.7265+-1.4519 stanford-crypto-sha256-iterative 149.5971+-0.5527 ^ 148.2928+-0.5524 ^ definitely 1.0088x faster <arithmetic> 345.3045+-1.0924 ^ 310.9701+-1.5270 ^ definitely 1.1104x faster <geometric> 253.0919+-0.7436 ^ 243.2712+-1.0140 ^ definitely 1.0404x faster <harmonic> 174.9138+-0.5020 ^ 173.6795+-0.4600 ^ definitely 1.0071x faster
WebKit Review Bot
Comment 3 2011-09-17 14:28:02 PDT
Attachment 107774 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGPropagator.cpp:537: Place brace on its own line for function definitions. [whitespace/braces] [4] Source/JavaScriptCore/dfg/DFGPropagator.cpp:544: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 2 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 4 2011-09-17 14:30:30 PDT
Created attachment 107775 [details] fix style
Filip Pizlo
Comment 5 2011-09-17 15:22:14 PDT
This appears to cause problems in v8-crypto. Will investigate. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PhantomCSE SunSpider: 3d-cube 7.8284+-0.2287 ? 8.0779+-0.3870 ? might be 1.0319x slower 3d-morph 7.6338+-0.1110 7.4934+-0.0868 might be 1.0187x faster 3d-raytrace 7.8429+-0.1767 7.7937+-0.1765 access-binary-trees 2.3852+-0.1110 2.3590+-0.0676 might be 1.0111x faster access-fannkuch 11.8849+-0.3509 11.7750+-0.2073 access-nbody 4.2269+-0.0789 ? 4.2728+-0.0942 ? might be 1.0109x slower access-nsieve 2.8027+-0.1699 2.6734+-0.0905 might be 1.0483x faster bitops-3bit-bits-in-byte 1.6754+-0.0381 ? 1.7034+-0.0363 ? might be 1.0167x slower bitops-bits-in-byte 2.8888+-0.1243 2.7230+-0.0417 might be 1.0609x faster bitops-bitwise-and 3.5988+-0.0952 ? 3.6492+-0.0804 ? might be 1.0140x slower bitops-nsieve-bits 5.5235+-0.2110 ? 5.5242+-0.1183 ? controlflow-recursive 1.9878+-0.0313 ? 2.0334+-0.0523 ? might be 1.0229x slower crypto-aes 6.8065+-0.2288 ? 6.9506+-0.2345 ? might be 1.0212x slower crypto-md5 2.8490+-0.0740 2.8360+-0.0769 crypto-sha1 2.2570+-0.0533 2.2547+-0.0661 date-format-tofte 10.3381+-0.3317 ? 10.5701+-0.3384 ? might be 1.0225x slower date-format-xparb 8.8419+-0.1870 ? 8.8898+-0.2041 ? math-cordic 6.3064+-0.1460 6.2773+-0.1129 math-partial-sums 7.4684+-0.1287 7.3521+-0.1185 might be 1.0158x faster math-spectral-norm 2.6770+-0.0452 ? 2.7024+-0.0858 ? regexp-dna 11.1097+-0.2560 11.0797+-0.2991 string-base64 5.9207+-0.1820 ? 5.9388+-0.2183 ? string-fasta 7.4196+-0.2309 7.2921+-0.1549 might be 1.0175x faster string-tagcloud 12.2438+-0.3220 ? 12.3827+-0.3968 ? might be 1.0113x slower string-unpack-code 19.0840+-0.4811 ? 19.1473+-0.4783 ? string-validate-input 7.0255+-0.1803 6.7060+-0.1777 might be 1.0477x faster <arithmetic> 6.5626+-0.0592 6.5561+-0.0347 <geometric> 5.4158+-0.0484 5.4012+-0.0282 <harmonic> 4.4173+-0.0425 4.4048+-0.0279 TipOfTree PhantomCSE V8: crypto 85.0254+-1.0501 ! 90.1991+-0.9224 ! definitely 1.0608x slower deltablue 241.6628+-3.2628 ? 246.2428+-3.6104 ? might be 1.0190x slower earley-boyer 97.6438+-0.7456 ? 99.2094+-2.1783 ? might be 1.0160x slower raytrace 71.4261+-2.1758 71.2358+-0.9164 regexp 108.6062+-0.7931 107.9489+-1.3720 richards 222.6911+-1.9571 ? 223.6860+-1.7525 ? splay 100.8236+-1.2160 100.5770+-1.1419 <arithmetic> 132.5542+-0.8131 ? 134.1570+-0.8461 ? might be 1.0121x slower <geometric> 119.5471+-0.8401 ! 121.0450+-0.6406 ! definitely 1.0125x slower <harmonic> 109.7402+-0.9649 ? 111.1702+-0.4928 ? might be 1.0130x slower TipOfTree PhantomCSE Kraken: ai-astar 649.6561+-4.1369 646.0026+-2.4923 audio-beat-detection 475.6792+-4.4012 ? 483.0049+-4.7150 ? might be 1.0154x slower audio-dft 436.4239+-7.3389 ? 446.6362+-8.3083 ? might be 1.0234x slower audio-fft 373.1857+-1.6543 ^ 367.4489+-3.1949 ^ definitely 1.0156x faster audio-oscillator 316.8722+-2.4950 ? 320.6392+-1.8030 ? might be 1.0119x slower imaging-darkroom 422.7289+-3.2483 420.5169+-3.2162 imaging-desaturate 213.9714+-1.3662 ! 227.1311+-2.3809 ! definitely 1.0615x slower imaging-gaussian-blur 1105.4039+-10.1736 ^ 599.5372+-3.7661 ^ definitely 1.8438x faster json-parse-financial 51.4321+-1.0503 ? 52.0258+-1.3815 ? might be 1.0115x slower json-stringify-tinderbox 71.3999+-0.7193 70.0470+-1.1840 might be 1.0193x faster stanford-crypto-aes 148.7115+-1.0955 145.8912+-2.0387 might be 1.0193x faster stanford-crypto-ccm 114.4813+-1.0582 112.6201+-1.1088 might be 1.0165x faster stanford-crypto-pbkdf2 405.1123+-2.8189 401.1639+-4.8977 stanford-crypto-sha256-iterative 153.8989+-1.7272 ^ 150.0681+-1.9634 ^ definitely 1.0255x faster <arithmetic> 352.7827+-0.6156 ^ 317.3381+-1.0955 ^ definitely 1.1117x faster <geometric> 259.2795+-0.4700 ^ 248.2747+-1.0513 ^ definitely 1.0443x faster <harmonic> 180.3846+-0.9365 178.1785+-1.5075 might be 1.0124x faster TipOfTree PhantomCSE All benchmarks: <arithmetic> 128.4567+-0.2710 ^ 118.1338+-0.3207 ^ definitely 1.0874x faster <geometric> 27.1805+-0.1378 ^ 26.8417+-0.0892 ^ definitely 1.0126x faster <harmonic> 7.7977+-0.0731 7.7760+-0.0484
Filip Pizlo
Comment 6 2011-09-17 15:42:26 PDT
Created attachment 107776 [details] work in progress - fixed crypto slow-down Turns out there was a pretty stupid goof in the handling of PutByVal aliasing, which would result in almost no PutByValAlias nodes being created. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PhantomCSE SunSpider: 3d-cube 7.9232+-0.2648 7.7525+-0.1162 might be 1.0220x faster 3d-morph 7.5163+-0.1542 ? 7.6411+-0.1451 ? might be 1.0166x slower 3d-raytrace 7.6198+-0.2213 ? 7.8154+-0.2687 ? might be 1.0257x slower access-binary-trees 2.2927+-0.0631 ? 2.3430+-0.1076 ? might be 1.0219x slower access-fannkuch 11.7955+-0.2221 11.7900+-0.2864 access-nbody 4.1922+-0.1009 ? 4.2398+-0.1195 ? might be 1.0114x slower access-nsieve 2.6116+-0.0884 2.5449+-0.0317 might be 1.0262x faster bitops-3bit-bits-in-byte 1.7448+-0.0877 1.6776+-0.0485 might be 1.0401x faster bitops-bits-in-byte 2.8421+-0.1109 2.7431+-0.0741 might be 1.0361x faster bitops-bitwise-and 3.5856+-0.1136 3.5406+-0.0905 might be 1.0127x faster bitops-nsieve-bits 5.4613+-0.1900 5.3684+-0.2327 might be 1.0173x faster controlflow-recursive 2.0266+-0.0641 ? 2.0337+-0.0686 ? crypto-aes 7.0701+-0.2844 6.7988+-0.2883 might be 1.0399x faster crypto-md5 2.8068+-0.0990 2.7402+-0.0907 might be 1.0243x faster crypto-sha1 2.2296+-0.0893 ? 2.2779+-0.0488 ? might be 1.0217x slower date-format-tofte 10.1306+-0.2732 ? 10.2069+-0.3424 ? date-format-xparb 8.7109+-0.2887 ? 8.8241+-0.3383 ? might be 1.0130x slower math-cordic 6.1599+-0.1779 6.1429+-0.0971 math-partial-sums 7.4027+-0.1215 ? 7.5718+-0.2252 ? might be 1.0229x slower math-spectral-norm 2.5749+-0.0525 ? 2.7397+-0.1251 ? might be 1.0640x slower regexp-dna 10.8766+-0.1139 10.8204+-0.1617 string-base64 5.7194+-0.1861 ? 5.8138+-0.1489 ? might be 1.0165x slower string-fasta 6.9102+-0.1286 ? 7.0826+-0.1556 ? might be 1.0249x slower string-tagcloud 12.3084+-0.1789 12.1509+-0.2705 might be 1.0130x faster string-unpack-code 18.6770+-0.4576 18.5139+-0.4159 string-validate-input 6.7260+-0.2239 6.7087+-0.2291 <arithmetic> 6.4583+-0.0381 6.4570+-0.0472 <geometric> 5.3302+-0.0372 5.3297+-0.0292 <harmonic> 4.3574+-0.0437 4.3517+-0.0257 TipOfTree PhantomCSE V8: crypto 82.6909+-0.4717 ? 83.5986+-1.4897 ? might be 1.0110x slower deltablue 240.3774+-2.9172 ? 241.6129+-2.6006 ? earley-boyer 94.9925+-0.3658 ? 95.1899+-0.3365 ? raytrace 69.2022+-0.5321 68.5608+-0.3370 regexp 106.4039+-0.2924 ? 106.8109+-0.3315 ? richards 217.5292+-0.7257 ? 219.1786+-1.2439 ? splay 98.7563+-0.6588 ? 98.9796+-0.6181 ? <arithmetic> 129.9932+-0.4350 ? 130.5616+-0.4737 ? <geometric> 116.9140+-0.2731 ? 117.2857+-0.3739 ? <harmonic> 107.1039+-0.2477 ? 107.3130+-0.3540 ? TipOfTree PhantomCSE Kraken: ai-astar 633.8644+-4.5250 629.9101+-4.8375 audio-beat-detection 467.7739+-1.8784 ! 480.9118+-2.5217 ! definitely 1.0281x slower audio-dft 420.5802+-2.2915 ? 422.7050+-3.7068 ? audio-fft 365.8273+-1.0027 ^ 361.6524+-1.0137 ^ definitely 1.0115x faster audio-oscillator 312.5754+-1.2268 ? 313.3259+-1.3247 ? imaging-darkroom 414.3694+-0.6042 ? 417.3312+-6.3912 ? imaging-desaturate 208.8211+-0.4056 ! 219.5143+-1.8043 ! definitely 1.0512x slower imaging-gaussian-blur 1079.3643+-3.0724 ^ 591.0141+-1.4195 ^ definitely 1.8263x faster json-parse-financial 49.3249+-0.2466 ? 49.3504+-0.3346 ? json-stringify-tinderbox 68.2024+-0.2941 ^ 67.4662+-0.1954 ^ definitely 1.0109x faster stanford-crypto-aes 145.1407+-0.6647 143.8512+-1.0142 stanford-crypto-ccm 111.9583+-0.4591 111.0426+-0.5527 stanford-crypto-pbkdf2 393.4172+-1.8730 ! 400.2963+-3.0613 ! definitely 1.0175x slower stanford-crypto-sha256-iterative 149.7088+-0.5165 149.1306+-0.5542 <arithmetic> 344.3520+-0.4113 ^ 311.2502+-0.8385 ^ definitely 1.1064x faster <geometric> 252.5200+-0.3160 ^ 242.9262+-0.5994 ^ definitely 1.0395x faster <harmonic> 174.7026+-0.3688 ^ 173.0850+-0.4576 ^ definitely 1.0093x faster TipOfTree PhantomCSE All benchmarks: <arithmetic> 125.5063+-0.1475 ^ 115.7301+-0.2455 ^ definitely 1.0845x faster <geometric> 26.6428+-0.0985 ^ 26.3484+-0.0743 ^ definitely 1.0112x faster <harmonic> 7.6893+-0.0752 7.6787+-0.0442
Filip Pizlo
Comment 7 2011-09-17 16:31:30 PDT
Created attachment 107778 [details] the patch - reduced worst case performance a bit This improces the CSE algorithm by ensuring that pure CSE only searches for aliases in the range: From: Beginning of the basic block, or the maximum index of the children, whichever is bigger. To: Last occurence of a node with the same opcode. It searches this range backwards. This optimization does not apply for non-pure operations because it would miss side effects. Current numbers with latest changes. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 30 samples per benchmark/VM, with 10 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PhantomCSE SunSpider: 3d-cube 7.7572+-0.0872 ? 7.7909+-0.1244 ? 3d-morph 7.5184+-0.0785 ? 7.5558+-0.0963 ? 3d-raytrace 7.7239+-0.1032 ? 7.7468+-0.1156 ? access-binary-trees 2.2740+-0.0315 ? 2.3107+-0.0482 ? might be 1.0161x slower access-fannkuch 11.7960+-0.1183 ^ 11.5626+-0.0988 ^ definitely 1.0202x faster access-nbody 4.1633+-0.0454 ? 4.2540+-0.0832 ? might be 1.0218x slower access-nsieve 2.6018+-0.0398 ? 2.6032+-0.0310 ? bitops-3bit-bits-in-byte 1.6668+-0.0267 ? 1.7083+-0.0400 ? might be 1.0249x slower bitops-bits-in-byte 2.7696+-0.0319 ? 2.8233+-0.0574 ? might be 1.0194x slower bitops-bitwise-and 3.5817+-0.0621 ? 3.6520+-0.0629 ? might be 1.0196x slower bitops-nsieve-bits 5.2870+-0.0667 ? 5.2977+-0.0682 ? controlflow-recursive 2.0383+-0.0401 ? 2.1312+-0.0930 ? might be 1.0456x slower crypto-aes 7.0860+-0.2158 ? 7.1663+-0.1855 ? might be 1.0113x slower crypto-md5 2.8093+-0.0448 ? 2.8269+-0.0811 ? crypto-sha1 2.2280+-0.0402 ? 2.2414+-0.0501 ? date-format-tofte 10.2269+-0.1716 10.2042+-0.1426 date-format-xparb 8.7081+-0.1255 ? 8.8777+-0.1652 ? might be 1.0195x slower math-cordic 6.1956+-0.0690 6.1295+-0.0601 might be 1.0108x faster math-partial-sums 7.4034+-0.0937 7.3807+-0.0997 math-spectral-norm 2.5948+-0.0337 ? 2.6117+-0.0482 ? regexp-dna 10.8661+-0.1062 ? 10.9373+-0.1211 ? string-base64 5.8322+-0.1108 5.8219+-0.1171 string-fasta 6.9385+-0.1083 ? 7.0151+-0.1314 ? might be 1.0110x slower string-tagcloud 12.1121+-0.1935 ? 12.3550+-0.2062 ? might be 1.0201x slower string-unpack-code 18.8462+-0.3793 18.5825+-0.2283 might be 1.0142x faster string-validate-input 6.6845+-0.1124 ? 6.7245+-0.1514 ? <arithmetic> 6.4504+-0.0253 ? 6.4735+-0.0239 ? <geometric> 5.3125+-0.0181 ! 5.3515+-0.0169 ! definitely 1.0073x slower <harmonic> 4.3312+-0.0168 ! 4.3786+-0.0162 ! definitely 1.0109x slower TipOfTree PhantomCSE V8: crypto 82.7124+-0.2753 ? 82.9466+-0.3673 ? deltablue 239.2213+-0.8353 ? 239.8840+-1.0250 ? earley-boyer 94.9496+-0.1586 ? 95.3712+-0.4045 ? raytrace 69.0085+-0.2725 68.9221+-0.4887 regexp 107.3699+-0.7440 107.0229+-0.3693 richards 217.7471+-0.5430 ! 219.5234+-0.5972 ! definitely 1.0082x slower splay 98.7746+-0.3300 98.7596+-0.2519 <arithmetic> 129.9691+-0.2003 ? 130.3471+-0.2048 ? <geometric> 116.9542+-0.1749 ? 117.1789+-0.1988 ? <harmonic> 107.1496+-0.1678 ? 107.2769+-0.2259 ? TipOfTree PhantomCSE Kraken: ai-astar 636.9671+-4.2157 ^ 629.5812+-1.9895 ^ definitely 1.0117x faster audio-beat-detection 467.7131+-1.0341 ! 470.1425+-1.0160 ! definitely 1.0052x slower audio-dft 425.3115+-3.7983 423.8902+-2.1948 audio-fft 364.0817+-0.6708 363.9339+-1.0152 audio-oscillator 312.5008+-0.3736 ? 312.5853+-0.5099 ? imaging-darkroom 413.9106+-0.8494 ^ 410.8664+-0.5848 ^ definitely 1.0074x faster imaging-desaturate 207.8968+-0.4679 ! 218.0565+-0.4687 ! definitely 1.0489x slower imaging-gaussian-blur 1081.8128+-1.7363 ^ 589.7422+-1.1866 ^ definitely 1.8344x faster json-parse-financial 49.2932+-0.1803 ? 49.7524+-0.2908 ? json-stringify-tinderbox 69.6689+-1.5041 ^ 67.6637+-0.3880 ^ definitely 1.0296x faster stanford-crypto-aes 144.3012+-0.4594 144.2128+-0.4487 stanford-crypto-ccm 111.9439+-0.3347 111.7710+-0.3905 stanford-crypto-pbkdf2 396.0157+-1.5465 ? 397.0790+-1.5456 ? stanford-crypto-sha256-iterative 149.2708+-0.4787 ! 150.4953+-0.5297 ! definitely 1.0082x slower <arithmetic> 345.0492+-0.4615 ^ 309.9837+-0.3411 ^ definitely 1.1131x faster <geometric> 252.9574+-0.3953 ^ 242.5861+-0.2959 ^ definitely 1.0428x faster <harmonic> 175.1667+-0.5736 ^ 173.5501+-0.3501 ^ definitely 1.0093x faster TipOfTree PhantomCSE All benchmarks: <arithmetic> 125.7060+-0.1391 ^ 115.3301+-0.1168 ^ definitely 1.0900x faster <geometric> 26.6087+-0.0437 ^ 26.3931+-0.0411 ^ definitely 1.0082x faster <harmonic> 7.6444+-0.0289 ! 7.7252+-0.0277 ! definitely 1.0106x slower
Filip Pizlo
Comment 8 2011-09-17 16:55:22 PDT
Created attachment 107779 [details] the patch - updated changelog
Oliver Hunt
Comment 9 2011-09-17 17:20:48 PDT
Comment on attachment 107779 [details] the patch - updated changelog View in context: https://bugs.webkit.org/attachment.cgi?id=107779&action=review r=me but i'd like some response to my commentary > Source/JavaScriptCore/dfg/DFGNode.h:176 > + macro(PutByVal, NodeMustGenerate | NodeClobbersWorld) \ > + macro(PutByValAlias, NodeMustGenerate | NodeClobbersWorld) \ > + macro(GetById, NodeResultJS | NodeMustGenerate | NodeClobbersWorld) \ > + macro(PutById, NodeMustGenerate | NodeClobbersWorld) \ > + macro(PutByIdDirect, NodeMustGenerate | NodeClobbersWorld) \ Do we really need to consider the gets to clobber the world? if we know that the get is a pure object property, then the access is pure > Source/JavaScriptCore/dfg/DFGPropagator.cpp:732 > + case ArithAbs: Don't we support sqrt(), and min/max? In the longer term it seems we would probably want a generic handler for pure intrinsics. > Source/JavaScriptCore/dfg/DFGPropagator.cpp:-443 > -#if ENABLE(DFG_DEBUG_VERBOSE) > - graph.dump(codeBlock); > -#endif is this deliberate?
Filip Pizlo
Comment 10 2011-09-17 17:25:43 PDT
> > Source/JavaScriptCore/dfg/DFGNode.h:176 > > + macro(PutByVal, NodeMustGenerate | NodeClobbersWorld) \ > > + macro(PutByValAlias, NodeMustGenerate | NodeClobbersWorld) \ > > + macro(GetById, NodeResultJS | NodeMustGenerate | NodeClobbersWorld) \ > > + macro(PutById, NodeMustGenerate | NodeClobbersWorld) \ > > + macro(PutByIdDirect, NodeMustGenerate | NodeClobbersWorld) \ > > Do we really need to consider the gets to clobber the world? if we know that the get is a pure object property, then the access is pure Currently we never predict or speculate the purity of GetById. All GetById's get compiled with a slow path that may call arbitrary JS code for getters, or arbitrary C++ code in WebCore. This will change once we start doing reasonable things for GetById. We will still have a GetById that covers the cases where we know nothing (in which case it'll be have as it does now, and we'll still have to assume that it clobbers the world), but we will also have another GetById op that already knows the structure it's accessing and already knows that the access is pure. That's probably my next task... > > > Source/JavaScriptCore/dfg/DFGPropagator.cpp:732 > > + case ArithAbs: > > Don't we support sqrt(), and min/max? Haven't landed that patch yet. :-) > > In the longer term it seems we would probably want a generic handler for pure intrinsics. pureCSE() is that handler. So it's just a question of whether we want performNodeCSE() to have one switch statement (which is more efficient) or have an if statement at the top like: if (node.op & NodeIsPure) { setReplacement(pureCSE(node)); return; } I think that would be reasonable if the number of pure arithmetic nodes that are CSE-able grows really big. > > > Source/JavaScriptCore/dfg/DFGPropagator.cpp:-443 > > -#if ENABLE(DFG_DEBUG_VERBOSE) > > - graph.dump(codeBlock); > > -#endif > > is this deliberate? Yup, since fixpoint() now does dump() calls, including one at the end.
Filip Pizlo
Comment 11 2011-09-17 20:46:39 PDT
Landed in r95389.
Note You need to log in before you can comment on or make changes to this bug.