Summary: | DFG ArithMul power-of-two case does not check for overflow | | |
---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED FIXED | | |
Severity: | Normal | CC: | barraclough, fpizlo, yuqiang.xian |
Priority: | P2 | Keywords: | InRadar |
Version: | 528+ (Nightly build) | | |
Hardware: | All | | |
OS: | All | | |
Description
Filip Pizlo
2011-12-09 17:33:03 PST
Given how difficult it is to test whether a left-shift overflowed, and the number of registers it would burn, and the speed with which Intel chips do imull, I think the best thing to do is to disable that code-path - provided it isn't too gigantic of a slow-down.

Blowing this away does not really degrade performance.

[pizlo@nitroflex bencher] ./bencher TipOfTree:/Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc FixMulPow2:/Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc --remote oldmac,bigmac --local
Packaging VM builds for remote hosts...
Sending VM builds to oldmac...
Running on oldmac...
376/376
Generating benchmark report at TipOfTree_FixMulPow2_SunSpiderV8Kraken_20111209_1754_benchReport.txt
Benchmark report for SunSpider, V8, and Kraken on oldmac.local (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc (r102489)
"FixMulPow2" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r102489)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

TipOfTree FixMulPow2
SunSpider:
   3d-cube 8.8224+-0.0405 ? 8.8635+-0.0508 ?
   3d-morph 10.1297+-0.0367 ? 10.4209+-0.3263 ? might be 1.0287x slower
   3d-raytrace 9.2536+-0.0649 ! 9.4933+-0.1461 ! definitely 1.0259x slower
   access-binary-trees 1.9176+-0.0086 1.9080+-0.0087
   access-fannkuch 9.0631+-0.0523 ? 9.0975+-0.0089 ?
   access-nbody 4.7436+-0.0122 ? 4.8236+-0.0806 ? might be 1.0169x slower
   access-nsieve 3.7943+-0.0098 ? 3.8041+-0.0410 ?
   bitops-3bit-bits-in-byte 1.4998+-0.0177 ? 1.5089+-0.0232 ?
   bitops-bits-in-byte 6.1231+-0.0378 6.0817+-0.0340
   bitops-bitwise-and 3.9732+-0.0070 ? 4.0233+-0.0642 ? might be 1.0126x slower
   bitops-nsieve-bits 6.8225+-0.0428 ? 6.9564+-0.1278 ? might be 1.0196x slower
   controlflow-recursive 2.7872+-0.0381 ? 2.7994+-0.0512 ?
   crypto-aes 8.6859+-0.0601 ? 8.7250+-0.0617 ?
   crypto-md5 2.9454+-0.0171 ? 2.9963+-0.0467 ? might be 1.0173x slower
   crypto-sha1 2.6034+-0.0261 ? 2.6267+-0.0379 ?
   date-format-tofte 13.0016+-0.0988 ? 13.0880+-0.1038 ?
   date-format-xparb 12.6390+-0.2878 ? 12.6697+-0.2360 ?
   math-cordic 8.7144+-0.0323 8.6458+-0.0381
   math-partial-sums 12.6258+-0.0277 12.6119+-0.0144
   math-spectral-norm 3.1340+-0.0076 ? 3.1881+-0.0610 ? might be 1.0172x slower
   regexp-dna 10.7563+-0.0510 10.6554+-0.1093
   string-base64 5.1810+-0.0780 5.1474+-0.0543
   string-fasta 8.8090+-0.0119 ? 8.9538+-0.1581 ? might be 1.0164x slower
   string-tagcloud 15.0102+-0.0924 ? 15.1508+-0.1006 ?
   string-unpack-code 25.7089+-0.0443 ? 26.2098+-0.4604 ? might be 1.0195x slower
   string-validate-input 6.7554+-0.0640 ? 6.8227+-0.0939 ?
   <arithmetic> * 7.9039+-0.0266 ? 7.9720+-0.0419 ? might be 1.0086x slower
   <geometric> 6.3868+-0.0230 ? 6.4348+-0.0287 ? might be 1.0075x slower
   <harmonic> 5.0243+-0.0219 ? 5.0577+-0.0219 ? might be 1.0066x slower

TipOfTree FixMulPow2
V8:
   crypto 92.6449+-0.4076 92.4068+-0.5526
   deltablue 208.3868+-1.6903 ? 213.1338+-3.7391 ? might be 1.0228x slower
   earley-boyer 119.8552+-1.2954 ? 120.1837+-1.2574 ?
   raytrace 69.6708+-0.1800 ! 71.0942+-0.8180 ! definitely 1.0204x slower
   regexp 146.8267+-0.4641 ? 149.6573+-3.1169 ? might be 1.0193x slower
   richards 169.4280+-0.6674 169.0760+-1.2809
   splay 108.4465+-1.0325 107.4175+-0.8730
   <arithmetic> 130.7513+-0.3781 ? 131.8528+-1.0604 ? might be 1.0084x slower
   <geometric> * 123.3619+-0.3187 ? 124.2410+-0.7394 ? might be 1.0071x slower
   <harmonic> 116.2241+-0.2759 ? 117.0133+-0.5145 ? might be 1.0068x slower

TipOfTree FixMulPow2
Kraken:
   ai-astar 895.6052+-0.7002 ? 896.5565+-0.3396 ?
   audio-beat-detection 250.2319+-1.0338 249.5959+-1.0664
   audio-dft 332.1065+-2.6067 ? 332.3157+-3.0548 ?
   audio-fft 161.7327+-0.8238 ? 163.0326+-2.1831 ?
   audio-oscillator 342.5079+-5.8040 ? 344.9145+-7.2976 ?
   imaging-darkroom 409.5305+-8.4259 408.4862+-7.3023
   imaging-desaturate 291.6037+-3.6493 ^ 287.1199+-0.0666 ^ definitely 1.0156x faster
   imaging-gaussian-blur 759.6894+-3.1926 759.1672+-1.8406
   json-parse-financial 88.1471+-1.3660 87.4829+-0.3084
   json-stringify-tinderbox 101.3181+-1.6798 100.2710+-0.5065 might be 1.0104x faster
   stanford-crypto-aes 141.4638+-1.0923 141.1779+-0.9186
   stanford-crypto-ccm 139.5778+-2.3247 138.2890+-0.9216
   stanford-crypto-pbkdf2 281.1497+-2.0951 ? 282.2310+-2.1859 ?
   stanford-crypto-sha256-iterative 116.9620+-0.3171 ? 117.1520+-0.3943 ?
   <arithmetic> * 307.9733+-0.7417 307.6995+-0.6817 might be 1.0009x faster
   <geometric> 240.4913+-0.7018 240.0163+-0.5922 might be 1.0020x faster
   <harmonic> 194.4717+-0.8527 193.8532+-0.4351 might be 1.0032x faster

TipOfTree FixMulPow2
All benchmarks:
   <arithmetic> 115.5827+-0.2298 ? 115.7028+-0.1950 ? might be 1.0010x slower
   <geometric> 29.2542+-0.0760 ? 29.3894+-0.0846 ? might be 1.0046x slower
   <harmonic> 8.8560+-0.0377 ? 8.9138+-0.0379 ? might be 1.0065x slower

TipOfTree FixMulPow2
Geomean of preferred means:
   <scaled-result> 66.9642+-0.1383 ! 67.2943+-0.1674 ! definitely 1.0049x slower

Sending VM builds to bigmac...
Running on bigmac...
376/376
Generating benchmark report at TipOfTree_FixMulPow2_SunSpiderV8Kraken_20111209_1757_benchReport.txt
Benchmark report for SunSpider, V8, and Kraken on bigmac.local (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc (r102489)
"FixMulPow2" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r102489)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

TipOfTree FixMulPow2
SunSpider:
   3d-cube 7.3286+-0.0276 ? 7.3798+-0.0625 ?
   3d-morph 8.3946+-0.0454 ? 8.5986+-0.1680 ? might be 1.0243x slower
   3d-raytrace 7.7222+-0.0504 ? 7.7244+-0.0649 ?
   access-binary-trees 1.6014+-0.0182 ? 1.6031+-0.0098 ?
   access-fannkuch 7.6535+-0.1013 ^ 7.5326+-0.0071 ^ definitely 1.0160x faster
   access-nbody 3.9900+-0.0671 3.9358+-0.0061 might be 1.0138x faster
   access-nsieve 3.2448+-0.0690 3.2442+-0.0646
   bitops-3bit-bits-in-byte 1.2558+-0.0173 1.2494+-0.0160
   bitops-bits-in-byte 5.0813+-0.0231 ? 5.1031+-0.0424 ?
   bitops-bitwise-and 3.2863+-0.0052 ? 3.2889+-0.0103 ?
   bitops-nsieve-bits 5.6594+-0.0386 5.6381+-0.0379
   controlflow-recursive 2.3387+-0.0311 2.3100+-0.0223 might be 1.0125x faster
   crypto-aes 7.2298+-0.0365 7.2251+-0.0490
   crypto-md5 2.5124+-0.0425 2.4768+-0.0285 might be 1.0144x faster
   crypto-sha1 2.1796+-0.0271 ? 2.2013+-0.0383 ?
   date-format-tofte 10.8253+-0.1170 10.7042+-0.0380 might be 1.0113x faster
   date-format-xparb 10.0302+-0.1578 ! 10.3999+-0.1119 ! definitely 1.0369x slower
   math-cordic 7.2952+-0.1149 7.1475+-0.0331 might be 1.0207x faster
   math-partial-sums 10.5961+-0.1258 ? 10.6224+-0.1534 ?
   math-spectral-norm 2.6021+-0.0140 2.5999+-0.0119
   regexp-dna 8.8517+-0.0175 ^ 8.6658+-0.0564 ^ definitely 1.0215x faster
   string-base64 4.3306+-0.1115 4.2197+-0.0120 might be 1.0263x faster
   string-fasta 7.2507+-0.0228 7.2381+-0.0183
   string-tagcloud 12.7410+-0.2312 12.5263+-0.0288 might be 1.0171x faster
   string-unpack-code 20.7537+-0.0746 ? 20.8791+-0.0528 ?
   string-validate-input 5.6580+-0.0997 5.5980+-0.0407 might be 1.0107x faster
   <arithmetic> * 6.5543+-0.0180 6.5428+-0.0196 might be 1.0018x faster
   <geometric> 5.3204+-0.0183 5.3038+-0.0197 might be 1.0031x faster
   <harmonic> 4.1997+-0.0209 4.1844+-0.0202 might be 1.0037x faster

TipOfTree FixMulPow2
V8:
   crypto 76.2924+-0.2678 ? 76.7267+-0.2270 ?
   deltablue 172.7743+-0.6245 ! 175.7640+-1.9377 ! definitely 1.0173x slower
   earley-boyer 99.2092+-1.0524 ? 99.4144+-1.1150 ?
   raytrace 59.1495+-1.4139 58.4911+-0.6817 might be 1.0113x faster
   regexp 124.3739+-2.0267 122.9345+-0.4374 might be 1.0117x faster
   richards 139.6375+-1.0160 ? 140.2569+-0.5714 ?
   splay 90.9997+-0.9867 90.1261+-1.0301
   <arithmetic> 108.9195+-0.2534 ? 109.1020+-0.3772 ? might be 1.0017x slower
   <geometric> * 102.8934+-0.3178 102.8569+-0.3466 might be 1.0004x faster
   <harmonic> 97.0859+-0.4752 96.8867+-0.3668 might be 1.0021x faster

TipOfTree FixMulPow2
Kraken:
   ai-astar 827.1434+-0.7248 818.2130+-11.3371 might be 1.0109x faster
   audio-beat-detection 204.2132+-0.6689 203.3182+-0.5126
   audio-dft 274.2510+-2.8594 273.4628+-2.5241
   audio-fft 131.9620+-0.5930 ? 133.7127+-1.7760 ? might be 1.0133x slower
   audio-oscillator 288.3097+-4.6556 285.9835+-6.0897
   imaging-darkroom 334.3828+-5.3327 ? 336.4993+-5.3063 ?
   imaging-desaturate 239.4356+-2.5173 237.4307+-0.0424
   imaging-gaussian-blur 626.7838+-1.5952 626.7320+-0.8463
   json-parse-financial 71.7307+-0.1940 ? 71.9160+-0.1204 ?
   json-stringify-tinderbox 81.9776+-0.1943 ? 82.1879+-0.1845 ?
   stanford-crypto-aes 117.7920+-0.7258 117.2975+-1.0522
   stanford-crypto-ccm 115.8883+-0.9108 115.5176+-0.9126
   stanford-crypto-pbkdf2 229.5143+-0.7063 ! 236.7903+-3.2024 ! definitely 1.0317x slower
   stanford-crypto-sha256-iterative 97.4243+-1.4862 95.9179+-0.3763 might be 1.0157x faster
   <arithmetic> * 260.0578+-0.3817 259.6414+-1.3031 might be 1.0016x faster
   <geometric> 199.5318+-0.3089 199.5019+-0.7863 might be 1.0001x faster
   <harmonic> 160.0845+-0.2719 160.0598+-0.5373 might be 1.0002x faster

TipOfTree FixMulPow2
All benchmarks:
   <arithmetic> 97.3119+-0.1205 97.2086+-0.3627 might be 1.0011x faster
   <geometric> 24.3451+-0.0576 24.3007+-0.0575 might be 1.0018x faster
   <harmonic> 7.4010+-0.0362 7.3745+-0.0348 might be 1.0036x faster

TipOfTree FixMulPow2
Geomean of preferred means:
   <scaled-result> 55.9750+-0.1016 55.9049+-0.0945 might be 1.0013x faster

Running locally...
376/376
Generating benchmark report at TipOfTree_FixMulPow2_SunSpiderV8Kraken_20111209_1759_benchReport.txt
Benchmark report for SunSpider, V8, and Kraken on nitroflex.local (MacBookPro8,2).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc (r102489)
"FixMulPow2" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r102489)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

TipOfTree FixMulPow2
SunSpider:
   3d-cube 6.8294+-0.1312 ? 6.9265+-0.1014 ? might be 1.0142x slower
   3d-morph 7.4700+-0.0857 ? 7.6343+-0.0850 ? might be 1.0220x slower
   3d-raytrace 7.2177+-0.1896 7.0432+-0.2143 might be 1.0248x faster
   access-binary-trees 1.5275+-0.0481 ? 1.5482+-0.0604 ? might be 1.0135x slower
   access-fannkuch 6.1380+-0.0614 ? 6.3060+-0.1295 ? might be 1.0274x slower
   access-nbody 3.2018+-0.0747 ? 3.2778+-0.0649 ? might be 1.0237x slower
   access-nsieve 2.5930+-0.0599 2.5773+-0.0709
   bitops-3bit-bits-in-byte 1.2633+-0.0460 1.2451+-0.0223 might be 1.0146x faster
   bitops-bits-in-byte 2.3626+-0.0820 2.3135+-0.0680 might be 1.0212x faster
   bitops-bitwise-and 3.3883+-0.0481 3.3789+-0.0574
   bitops-nsieve-bits 5.4693+-0.1017 ? 5.5954+-0.0840 ? might be 1.0231x slower
   controlflow-recursive 2.0594+-0.0379 2.0503+-0.0425
   crypto-aes 6.8051+-0.1494 ? 6.9432+-0.1415 ? might be 1.0203x slower
   crypto-md5 2.3312+-0.0607 ? 2.3412+-0.0432 ?
   crypto-sha1 2.0772+-0.0602 2.0600+-0.0585
   date-format-tofte 9.7000+-0.1039 ? 9.7277+-0.1107 ?
   date-format-xparb 9.3244+-0.1726 9.0841+-0.2486 might be 1.0265x faster
   math-cordic 6.2549+-0.1035 ? 6.3003+-0.1205 ?
   math-partial-sums 7.3705+-0.1207 7.3464+-0.1472
   math-spectral-norm 2.3696+-0.0436 ^ 2.2685+-0.0297 ^ definitely 1.0446x faster
   regexp-dna 7.8428+-0.1296 7.7023+-0.1192 might be 1.0182x faster
   string-base64 4.1656+-0.0694 4.1222+-0.0543 might be 1.0105x faster
   string-fasta 6.5717+-0.1355 6.5629+-0.1258
   string-tagcloud 10.8419+-0.1830 10.7578+-0.2240
   string-unpack-code 18.5995+-0.2640 ? 18.6179+-0.2466 ?
   string-validate-input 5.1557+-0.0988 ? 5.2516+-0.1449 ? might be 1.0186x slower
   <arithmetic> * 5.7281+-0.0169 ? 5.7301+-0.0158 ? might be 1.0003x slower
   <geometric> 4.6566+-0.0239 4.6537+-0.0197 might be 1.0006x faster
   <harmonic> 3.7325+-0.0376 3.7210+-0.0284 might be 1.0031x faster

TipOfTree FixMulPow2
V8:
   crypto 68.1354+-0.5837 ? 68.6847+-0.9837 ?
   deltablue 146.6398+-1.4785 ? 148.6985+-0.9985 ? might be 1.0140x slower
   earley-boyer 79.3793+-0.8273 79.1721+-0.8123
   raytrace 52.4420+-0.6225 ? 52.6070+-0.6663 ?
   regexp 102.3748+-0.3992 101.9397+-0.5000
   richards 115.8665+-0.3406 ? 116.8016+-0.9163 ?
   splay 71.3077+-0.9325 70.9522+-0.8682
   <arithmetic> 90.8779+-0.2566 ? 91.2651+-0.3007 ? might be 1.0043x slower
   <geometric> * 86.1284+-0.2534 ? 86.3891+-0.2978 ? might be 1.0030x slower
   <harmonic> 81.7741+-0.2837 ? 81.9577+-0.3125 ? might be 1.0022x slower

TipOfTree FixMulPow2
Kraken:
   ai-astar 493.6381+-2.4018 488.1730+-4.3529 might be 1.0112x faster
   audio-beat-detection 183.4718+-0.7613 ? 184.4280+-0.8773 ?
   audio-dft 268.6801+-3.0884 267.9458+-2.6115
   audio-fft 121.0080+-0.9271 119.8034+-0.4678 might be 1.0101x faster
   audio-oscillator 245.1226+-4.0601 245.1007+-4.6031
   imaging-darkroom 299.9684+-4.9093 299.6737+-5.3073
   imaging-desaturate 204.9617+-0.6385 ! 223.7069+-1.5759 ! definitely 1.0915x slower
   imaging-gaussian-blur 533.6444+-2.4466 ? 539.1258+-7.7694 ? might be 1.0103x slower
   json-parse-financial 57.5675+-0.5910 57.2049+-0.6658
   json-stringify-tinderbox 75.0914+-2.5704 72.5306+-0.3800 might be 1.0353x faster
   stanford-crypto-aes 94.5268+-0.6884 ! 96.4248+-1.0929 ! definitely 1.0201x slower
   stanford-crypto-ccm 100.4736+-1.1978 100.0588+-1.0934
   stanford-crypto-pbkdf2 184.4436+-1.0203 ! 191.8652+-3.0205 ! definitely 1.0402x slower
   stanford-crypto-sha256-iterative 80.9935+-0.3268 ? 81.3816+-0.4025 ?
   <arithmetic> * 210.2565+-0.6947 ! 211.9588+-0.5994 ! definitely 1.0081x slower
   <geometric> 168.1878+-0.7798 ? 169.3609+-0.5405 ? might be 1.0070x slower
   <harmonic> 136.3751+-0.8477 ? 136.6586+-0.5301 ? might be 1.0021x slower

TipOfTree FixMulPow2
All benchmarks:
   <arithmetic> 79.3333+-0.2116 ! 79.8992+-0.1837 ! definitely 1.0071x slower
   <geometric> 20.9305+-0.0681 ? 20.9762+-0.0635 ? might be 1.0022x slower
   <harmonic> 6.5696+-0.0643 6.5503+-0.0489 might be 1.0030x faster

TipOfTree FixMulPow2
Geomean of preferred means:
   <scaled-result> 46.9857+-0.0972 ? 47.1652+-0.0967 ? might be 1.0038x slower

[pizlo@nitroflex bencher]

Landed in http://trac.webkit.org/changeset/102509

ok... so we cannot simply rely on the NodeMayOverflow flag which is just an assumption and not a guarantee. Sorry for introducing the bug.
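
For readers who want the overflow problem spelled out, here is a minimal C++ sketch of the arithmetic involved. It is not the JavaScriptCore code and not the landed patch: `mulPow2Unchecked` and `mulPow2Checked` are hypothetical names, and the checked version widens to 64 bits purely for illustration, whereas the discussion above is about using imull and its overflow check in the JIT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not JavaScriptCore code): multiply by 2^shift using a
// left shift, with no overflow check. The result silently wraps when the true
// product does not fit in an int32.
int32_t mulPow2Unchecked(int32_t value, unsigned shift)
{
    assert(shift < 32);
    // Shift in unsigned arithmetic to avoid signed-overflow undefined
    // behavior; the value wraps modulo 2^32 on mainstream compilers.
    return static_cast<int32_t>(static_cast<uint32_t>(value) << shift);
}

// Hypothetical helper: the same multiplication with an explicit overflow
// check. Returns false when the exact product does not fit in an int32,
// which is the case where a JIT would have to take its slow path.
bool mulPow2Checked(int32_t value, unsigned shift, int32_t& result)
{
    assert(shift < 32);
    // Widening to 64 bits makes the product exact for any shift in [0, 31].
    int64_t wide = static_cast<int64_t>(value) * (int64_t{1} << shift);
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    result = static_cast<int32_t>(wide);
    return true;
}
```

With these helpers, `0x40000000 * 2` is reported as an overflow by the checked version, while the unchecked shift quietly yields a negative int32.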