RESOLVED FIXED 74908
Temporary GPR should not be lazily allocated in DFG JIT on X86
https://bugs.webkit.org/show_bug.cgi?id=74908
Summary Temporary GPR should not be lazily allocated in DFG JIT on X86
Yuqiang Xian
Reported 2011-12-19 20:51:29 PST
On X86, we used to allocate a temporary GPR lazily when it's really used rather than defined. This may cause potential issues of allocating registers inside control flow and result in problems in subsequent code generation, for example the DFG JIT may think an operand already being spilled (to satisfy the allocation request) and generate code to read the data from memory, but the allocation and spilling are in a branch which is not taken at runtime, so the generated code is incorrect. Although current DFG JIT code doesn't have this problematic pattern, it's better to cut-off the root to avoid any potential issues in the future. Patch forthcoming.
Attachments
the patch (2.50 KB, patch)
2011-12-19 21:10 PST, Yuqiang Xian
fpizlo: review+
the patch (3.85 KB, patch)
2011-12-19 23:13 PST, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2011-12-19 21:10:28 PST
Created attachment 119984 [details] the patch
Filip Pizlo
Comment 2 2011-12-19 21:40:12 PST
I like this change. But I would like to see what performance effect this has. Can you post some performance comparisons, to show that this is either neutral or a win?
Yuqiang Xian
Comment 3 2011-12-19 22:18:22 PST
Seems some cases (especially, Kraken) are impacted by this change, on Linux IA32 (Core i7 Nehalem), possibly caused by some instruction ordering changes, which may need deeper investigation. The performance result: Benchmark report for SunSpider, V8, and Kraken on \c. VMs tested: "ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl "74908" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT 74908 SunSpider: 3d-cube 8.1819+-1.1747 7.4230+-0.0596 might be 1.1022x faster 3d-morph 10.3083+-0.0440 ? 10.3085+-0.0402 ? 3d-raytrace 9.5854+-0.0422 ? 9.6253+-0.0472 ? access-binary-trees 2.0577+-0.4597 1.8671+-0.0448 might be 1.1021x faster access-fannkuch 10.5143+-0.0494 ? 10.5700+-0.0385 ? access-nbody 5.4011+-0.0447 5.3716+-0.0419 access-nsieve 3.7123+-0.0302 3.6787+-0.0179 bitops-3bit-bits-in-byte 1.1956+-0.0357 1.1731+-0.0326 might be 1.0192x faster bitops-bits-in-byte 4.6542+-0.0578 4.5883+-0.0299 might be 1.0144x faster bitops-bitwise-and 4.2350+-0.0416 4.2235+-0.0216 bitops-nsieve-bits 6.7435+-0.1197 6.7103+-0.0849 controlflow-recursive 2.8677+-0.0536 2.8480+-0.0474 crypto-aes 10.5288+-0.0560 ! 10.7119+-0.0832 ! definitely 1.0174x slower crypto-md5 3.6444+-0.0481 3.6160+-0.0533 crypto-sha1 3.0002+-0.0659 2.9547+-0.0457 might be 1.0154x faster date-format-tofte 11.7048+-0.0871 ? 12.4378+-1.5467 ? might be 1.0626x slower date-format-xparb 11.1803+-0.0719 ? 11.1967+-0.0848 ? math-cordic 9.1706+-0.0320 ? 9.1716+-0.0408 ? math-partial-sums 14.5324+-1.4646 13.8742+-0.0552 might be 1.0474x faster math-spectral-norm 2.5975+-0.0438 2.5914+-0.0397 regexp-dna 9.1554+-0.0716 9.1394+-0.0621 string-base64 5.3683+-0.0487 ? 5.3807+-0.0720 ? string-fasta 9.6004+-0.0515 9.5828+-0.0477 string-tagcloud 15.6370+-0.0537 ? 15.6695+-0.0588 ? string-unpack-code 24.9733+-0.2835 ^ 24.5256+-0.0843 ^ definitely 1.0183x faster string-validate-input 7.1826+-0.0475 ? 7.1969+-0.0606 ? <arithmetic> * 7.9897+-0.0725 7.9399+-0.0647 might be 1.0063x faster <geometric> 6.4304+-0.0505 6.3797+-0.0265 might be 1.0079x faster <harmonic> 4.9402+-0.0567 4.8836+-0.0293 might be 1.0116x faster ToT 74908 V8: crypto 97.0162+-0.4236 ? 97.1926+-0.3836 ? deltablue 166.9306+-0.6094 ? 168.1061+-0.6851 ? earley-boyer 107.9100+-0.2276 ? 108.3725+-0.6809 ? raytrace 52.6538+-0.4061 52.5592+-0.4340 regexp 128.7707+-0.7285 128.2945+-0.2053 richards 173.4722+-0.6191 ! 176.2666+-1.7734 ! definitely 1.0161x slower splay 126.8781+-0.8428 ? 127.1935+-1.4026 ? <arithmetic> 121.9474+-0.2155 ! 122.5693+-0.2931 ! definitely 1.0051x slower <geometric> * 114.6731+-0.2402 ? 115.0977+-0.2633 ? might be 1.0037x slower <harmonic> 106.1049+-0.3105 ? 106.3385+-0.2966 ? might be 1.0022x slower ToT 74908 Kraken: ai-astar 785.4775+-1.8174 784.4535+-5.3564 audio-beat-detection 352.6068+-2.4209 ! 362.4537+-1.0279 ! definitely 1.0279x slower audio-dft 344.7694+-1.9229 ! 371.6174+-3.1400 ! definitely 1.0779x slower audio-fft 215.2775+-0.1940 ! 224.9574+-0.7567 ! definitely 1.0450x slower audio-oscillator 340.3142+-4.0461 ? 343.4859+-2.6913 ? imaging-darkroom 386.6637+-10.3607 ? 387.8107+-10.2604 ? imaging-desaturate 316.4435+-0.3136 ^ 289.5850+-0.1891 ^ definitely 1.0927x faster imaging-gaussian-blur 628.6579+-1.5163 ^ 626.8492+-0.2748 ^ definitely 1.0029x faster json-parse-financial 68.8907+-0.6046 ? 69.0699+-0.5954 ? json-stringify-tinderbox 99.8946+-0.3831 ! 107.0620+-2.1414 ! definitely 1.0717x slower stanford-crypto-aes 132.6336+-0.5289 ? 134.1565+-1.3846 ? might be 1.0115x slower stanford-crypto-ccm 128.3499+-0.9291 ? 128.4467+-0.4705 ? stanford-crypto-pbkdf2 282.3136+-2.1761 282.2923+-0.8014 stanford-crypto-sha256-iterative 108.4356+-0.2734 ! 109.9840+-0.5452 ! definitely 1.0143x slower <arithmetic> * 299.3377+-0.9721 ! 301.5874+-1.0977 ! definitely 1.0075x slower <geometric> 238.5068+-0.7272 ! 241.3118+-0.8654 ! definitely 1.0118x slower <harmonic> 187.7669+-0.5586 ! 190.6914+-0.7903 ! definitely 1.0156x slower ToT 74908 All benchmarks: <arithmetic> 111.7467+-0.3133 ! 112.4818+-0.3473 ! definitely 1.0066x slower <geometric> 28.9750+-0.1359 28.9653+-0.0887 might be 1.0003x faster <harmonic> 8.6980+-0.0973 8.6030+-0.0503 might be 1.0110x faster ToT 74908 Geomean of preferred means: <scaled-result> 64.9695+-0.2435 ? 65.0765+-0.2367 ? might be 1.0016x slower
Filip Pizlo
Comment 4 2011-12-19 22:20:31 PST
Comment on attachment 119984 [details] the patch Thanks for running the tests! The overall performance looks OK to me. I'll leave it up to you if you want to investigate the Kraken issue more, or if you want to land as-is. R=me.
Yuqiang Xian
Comment 5 2011-12-19 23:13:19 PST
Created attachment 119991 [details] the patch Found the performance root cause and updated the patch. The new performance result is: Benchmark report for SunSpider, V8, and Kraken on \c. VMs tested: "ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl "74908" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT 74908 SunSpider: 3d-cube 7.4440+-0.0813 ? 7.4742+-0.0635 ? 3d-morph 10.3584+-0.0856 ? 10.3893+-0.0923 ? 3d-raytrace 9.6399+-0.1474 ? 9.7055+-0.1280 ? access-binary-trees 1.8435+-0.0381 ? 1.8930+-0.0779 ? might be 1.0269x slower access-fannkuch 10.5576+-0.0693 ? 10.5681+-0.0450 ? access-nbody 5.4380+-0.0805 5.3877+-0.0474 access-nsieve 3.7275+-0.0526 3.7014+-0.0445 bitops-3bit-bits-in-byte 1.1854+-0.0383 1.1732+-0.0324 might be 1.0104x faster bitops-bits-in-byte 4.6396+-0.0472 4.6245+-0.0620 bitops-bitwise-and 4.2584+-0.0474 4.2339+-0.0436 bitops-nsieve-bits 6.7512+-0.0879 6.7050+-0.0850 controlflow-recursive 3.2853+-0.5479 2.8959+-0.0633 might be 1.1344x faster crypto-aes 10.5296+-0.0667 10.5109+-0.0734 crypto-md5 3.6452+-0.0310 ? 3.6682+-0.0425 ? crypto-sha1 3.0064+-0.0570 2.9904+-0.0770 date-format-tofte 11.7823+-0.1143 11.7537+-0.0900 date-format-xparb 11.0766+-0.1094 ? 11.4662+-0.4110 ? might be 1.0352x slower math-cordic 9.2163+-0.0833 ? 9.2179+-0.0949 ? math-partial-sums 13.9619+-0.0751 ? 13.9655+-0.0788 ? math-spectral-norm 2.6618+-0.0692 2.6456+-0.0585 regexp-dna 9.2515+-0.1169 9.1970+-0.1043 string-base64 5.3948+-0.0629 ? 5.3971+-0.0619 ? string-fasta 9.6174+-0.0744 ? 9.6770+-0.0797 ? string-tagcloud 15.7726+-0.0843 15.6713+-0.0877 string-unpack-code 24.6931+-0.0797 24.6056+-0.0981 string-validate-input 7.2508+-0.0784 7.2343+-0.0573 <arithmetic> * 7.9611+-0.0386 7.9520+-0.0259 might be 1.0011x faster <geometric> 6.4309+-0.0476 6.4081+-0.0254 might be 1.0036x faster <harmonic> 4.9437+-0.0469 4.9163+-0.0264 might be 1.0056x faster ToT 74908 V8: crypto 97.3909+-0.3260 ? 97.5333+-0.3836 ? deltablue 167.5801+-0.6006 ? 167.9793+-0.8711 ? earley-boyer 108.7405+-0.2172 108.4502+-0.1726 raytrace 53.0596+-0.4061 52.8163+-0.4649 regexp 128.6210+-0.4573 128.3101+-0.3374 richards 175.4681+-1.0956 ^ 173.8435+-0.3191 ^ definitely 1.0093x faster splay 127.1825+-1.1579 125.7619+-0.3356 might be 1.0113x faster <arithmetic> 122.5775+-0.2861 122.0992+-0.2293 might be 1.0039x faster <geometric> * 115.2596+-0.2390 114.8277+-0.2664 might be 1.0038x faster <harmonic> 106.6765+-0.2646 106.2852+-0.3581 might be 1.0037x faster ToT 74908 Kraken: ai-astar 786.9061+-1.1917 ? 787.0541+-0.6627 ? audio-beat-detection 358.1306+-6.2473 355.5454+-2.8121 audio-dft 348.0289+-2.0739 ? 349.0720+-2.3584 ? audio-fft 216.8900+-0.9561 ? 217.2359+-0.6450 ? audio-oscillator 340.9966+-2.6576 339.1060+-3.3476 imaging-darkroom 388.1350+-9.4523 385.7510+-10.3425 imaging-desaturate 317.7499+-0.7087 317.3586+-1.5265 imaging-gaussian-blur 631.5004+-0.9397 ^ 592.7648+-1.7196 ^ definitely 1.0653x faster json-parse-financial 69.2648+-0.7859 68.9126+-0.5316 json-stringify-tinderbox 101.1172+-0.6247 ? 101.6840+-0.7057 ? stanford-crypto-aes 133.1517+-0.3595 ^ 131.7723+-0.2330 ^ definitely 1.0105x faster stanford-crypto-ccm 128.3865+-0.6013 128.1132+-0.4693 stanford-crypto-pbkdf2 283.4278+-0.7111 ^ 279.5538+-1.4392 ^ definitely 1.0139x faster stanford-crypto-sha256-iterative 108.9552+-0.7702 108.2972+-0.3137 <arithmetic> * 300.9029+-0.9610 ^ 297.3015+-1.1468 ^ definitely 1.0121x faster <geometric> 239.8573+-0.7456 ^ 237.9736+-0.8437 ^ definitely 1.0079x faster <harmonic> 188.8577+-0.6692 187.9468+-0.6201 might be 1.0048x faster ToT 74908 All benchmarks: <arithmetic> 112.2909+-0.3145 ^ 111.1419+-0.3781 ^ definitely 1.0103x faster <geometric> 29.0472+-0.1370 28.9062+-0.0918 might be 1.0049x faster <harmonic> 8.7053+-0.0806 8.6573+-0.0456 might be 1.0055x faster ToT 74908 Geomean of preferred means: <scaled-result> 65.1164+-0.1789 ^ 64.7499+-0.1814 ^ definitely 1.0057x faster
Filip Pizlo
Comment 6 2011-12-19 23:24:22 PST
Comment on attachment 119991 [details] the patch r=me
Yuqiang Xian
Comment 7 2011-12-19 23:32:06 PST
Yuqiang Xian
Comment 8 2011-12-19 23:32:44 PST
Comment on attachment 119991 [details] the patch Clearing flags.
Note You need to log in before you can comment on or make changes to this bug.