RESOLVED FIXED Bug 69717
Fix value profiling in 32_64 JIT
https://bugs.webkit.org/show_bug.cgi?id=69717
Summary Fix value profiling in 32_64 JIT
Yuqiang Xian
Reported 2011-10-09 01:02:13 PDT
Current value profiling for 32_64 JIT is broken and cannot record correct predicated types, which results in many speculation failures in the 32_64 DFG JIT, fallbacks to baseline JIT, and re-optimizations again and again. With this fix 32_64 DFG JIT can demonstrate real performance gains.
Attachments
the patch (8.32 KB, patch)
2011-10-09 01:12 PDT, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2011-10-09 01:12:20 PDT
Created attachment 110292 [details] the patch
Filip Pizlo
Comment 2 2011-10-09 01:29:00 PDT
Comment on attachment 110292 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=110292&action=review It would be great to see the performance impact, if any, on JSVALUE64 and JSVALUE32_64. > Source/JavaScriptCore/bytecode/ValueProfile.cpp:64 > + if (!value) { Have you made sure that these changes don't perturb SunSpider, V8, and Kraken performance on JSVALUE64?
Yuqiang Xian
Comment 3 2011-10-09 02:03:11 PDT
Performance result on Mac 64 - no difference with this change. Kraken TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 4142.9ms +/- 0.2% 4146.3ms +/- 0.4% ============================================================================= ai: - 802.3ms +/- 0.1% 802.7ms +/- 0.1% astar: - 802.3ms +/- 0.1% 802.7ms +/- 0.1% audio: ?? 1091.6ms +/- 0.3% 1093.9ms +/- 0.8% not conclusive: might be *1.002x as slow* beat-detection: - 222.0ms +/- 0.3% 221.3ms +/- 0.2% dft: ?? 440.9ms +/- 0.1% 445.0ms +/- 1.9% not conclusive: might be *1.009x as slow* fft: - 152.0ms +/- 0.2% 152.0ms +/- 0.0% oscillator: - 276.7ms +/- 1.2% 275.6ms +/- 0.3% imaging: ?? 1342.4ms +/- 0.8% 1343.9ms +/- 1.1% not conclusive: might be *1.001x as slow* gaussian-blur: ?? 604.2ms +/- 0.2% 606.8ms +/- 0.6% not conclusive: might be *1.004x as slow* darkroom: - 503.3ms +/- 2.1% 502.1ms +/- 2.6% desaturate: - 234.9ms +/- 0.1% 235.0ms +/- 0.1% json: 1.006x as fast 158.6ms +/- 0.2% 157.7ms +/- 0.3% significant parse-financial: - 69.2ms +/- 0.4% 68.9ms +/- 0.3% stringify-tinderbox: 1.007x as fast 89.4ms +/- 0.4% 88.8ms +/- 0.5% significant stanford: - 748.0ms +/- 0.6% 748.1ms +/- 0.4% crypto-aes: - 153.5ms +/- 0.5% 153.0ms +/- 0.5% crypto-ccm: - 137.7ms +/- 1.1% 137.7ms +/- 1.0% crypto-pbkdf2: ?? 358.0ms +/- 1.1% 358.4ms +/- 0.5% not conclusive: might be *1.001x as slow* crypto-sha256-iterative: ?? 98.8ms +/- 0.5% 99.0ms +/- 0.3% not conclusive: might be *1.002x as slow* SunSpider TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 185.4ms +/- 0.4% 184.6ms +/- 0.4% ============================================================================= 3d: - 27.1ms +/- 1.5% 26.9ms +/- 2.0% cube: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% morph: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% raytrace: - 9.1ms +/- 4.5% 8.9ms +/- 5.9% access: - 17.2ms +/- 3.3% 16.7ms +/- 4.5% binary-trees: - 2.5ms +/- 15.1% 2.5ms +/- 20.2% fannkuch: - 7.5ms +/- 6.7% 7.1ms +/- 3.2% nbody: - 4.2ms +/- 10.8% 4.0ms +/- 0.0% nsieve: ?? 3.0ms +/- 0.0% 3.1ms +/- 7.3% not conclusive: might be *1.033x as slow* bitops: - 14.3ms +/- 2.4% 14.0ms +/- 0.0% 3bit-bits-in-byte: - 1.0ms +/- 0.0% 1.0ms +/- 0.0% bits-in-byte: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% bitwise-and: - 3.2ms +/- 9.4% 3.0ms +/- 0.0% nsieve-bits: - 5.1ms +/- 4.4% 5.0ms +/- 0.0% controlflow: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% recursive: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% crypto: - 12.0ms +/- 0.0% 12.0ms +/- 0.0% aes: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% md5: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: - 22.0ms +/- 0.0% 22.0ms +/- 0.0% format-tofte: - 12.0ms +/- 0.0% 12.0ms +/- 0.0% format-xparb: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% math: - 20.0ms +/- 0.0% 20.0ms +/- 0.0% cordic: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% partial-sums: - 10.0ms +/- 0.0% 10.0ms +/- 0.0% spectral-norm: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% regexp: ?? 12.8ms +/- 2.4% 13.0ms +/- 0.0% not conclusive: might be *1.016x as slow* dna: ?? 12.8ms +/- 2.4% 13.0ms +/- 0.0% not conclusive: might be *1.016x as slow* string: - 58.0ms +/- 0.0% 58.0ms +/- 0.0% base64: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% fasta: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% tagcloud: - 13.0ms +/- 0.0% 13.0ms +/- 0.0% unpack-code: - 25.0ms +/- 0.0% 25.0ms +/- 0.0% validate-input: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% V8 Before Richards: 5786 DeltaBlue: 4463 Crypto: 13952 RayTrace: 7770 EarleyBoyer: 8285 RegExp: 1741 Splay: 5025 ---- Score (version 6): 5730 After Richards: 5952 DeltaBlue: 4450 Crypto: 13942 RayTrace: 8083 EarleyBoyer: 8240 RegExp: 1738 Splay: 5039 ---- Score (version 6): 5779
Yuqiang Xian
Comment 4 2011-10-09 02:06:10 PDT
Performance result on Linux IA32 with DFG JIT turned on. Only part of the kraken result is available. There are still some issues with v8 benchmarks. TEST COMPARISON FROM (ToT) TO (Mod) DETAILS ============================================================================= ** TOTAL **: 1.47x as fast 10246.2ms +/- 0.3% 6970.3ms +/- 0.1% significant ============================================================================= ai: 2.88x as fast 2270.8ms +/- 0.3% 789.7ms +/- 0.1% significant astar: 2.88x as fast 2270.8ms +/- 0.3% 789.7ms +/- 0.1% significant audio: 1.071x as fast 2427.6ms +/- 0.6% 2266.4ms +/- 0.2% significant beat-detection: 1.116x as fast 728.1ms +/- 0.5% 652.5ms +/- 0.5% significant dft: 1.081x as fast 602.8ms +/- 0.8% 557.6ms +/- 0.3% significant fft: 1.164x as fast 552.6ms +/- 0.5% 474.6ms +/- 0.1% significant oscillator: *1.069x as slow* 544.1ms +/- 0.8% 581.7ms +/- 0.1% significant imaging: 1.49x as fast 4376.0ms +/- 0.5% 2929.3ms +/- 0.1% significant gaussian-blur: 2.21x as fast 2843.7ms +/- 1.2% 1284.9ms +/- 0.1% significant darkroom: *1.135x as slow* 634.1ms +/- 1.8% 719.5ms +/- 0.6% significant desaturate: *1.030x as slow* 898.2ms +/- 0.3% 924.9ms +/- 0.2% significant json: - 215.1ms +/- 0.4% 214.8ms +/- 0.5% parse-financial: 1.021x as fast 98.5ms +/- 0.4% 96.5ms +/- 0.5% significant stringify-tinderbox: *1.015x as slow* 116.6ms +/- 0.5% 118.3ms +/- 0.7% significant stanford: 1.24x as fast 956.7ms +/- 0.4% 770.1ms +/- 0.3% significant crypto-aes: 1.105x as fast 189.7ms +/- 1.3% 171.6ms +/- 1.2% significant crypto-pbkdf2: 1.28x as fast 767.0ms +/- 0.4% 598.5ms +/- 0.1% significant
Filip Pizlo
Comment 5 2011-10-09 02:14:44 PDT
This looks great! By the way, you might find it easier to run all of the benchmarks in one go using Tools/Scripts/bencher. You need two checkouts, which you build separately: one for baseline (no changes), one with your changes. You also need ruby installed, and the json gem. And a ~/.bencher file. When you run bencher the first time, it will tell you how to set this file up. Then do: Tools/Scripts/bencher TipOfTree:<path-to-baseline-jsc> MyChanges:<path-to-your-jsc> Example: Tools/Scripts/bencher TipOfTree:/Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc CFA:/Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc I find that this makes it a bit faster to run benchmarks. But the way you ran them is fine, as well.
WebKit Review Bot
Comment 6 2011-10-09 03:18:29 PDT
Comment on attachment 110292 [details] the patch Clearing flags on attachment: 110292 Committed r97025: <http://trac.webkit.org/changeset/97025>
WebKit Review Bot
Comment 7 2011-10-09 03:18:33 PDT
All reviewed patches have been landed. Closing bug.
Yuqiang Xian
Comment 8 2011-10-09 07:09:58 PDT
Thanks for your hints, Filip. I will try the approach you suggested next time. BTW, one thing I think need to clarify is that the performance result of 32-bit DFG I pasted before is for the comparison between DFG w/o and w/ this patch (i.e. both have DFG turned on). Here's the comparsion between w/o DFG and w/ DFG after this change, tested on Linux ia32. We still have two cases failed to run (stanford-crypto-ccm produces wrong results and stanford-crypto-sha256-iterative crashes) and under investigations. TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.39x as fast 9678.1ms +/- 0.0% 6970.3ms +/- 0.1% significant ============================================================================= ai: 2.68x as fast 2114.8ms +/- 0.1% 789.7ms +/- 0.1% significant astar: 2.68x as fast 2114.8ms +/- 0.1% 789.7ms +/- 0.1% significant audio: 1.037x as fast 2349.3ms +/- 0.1% 2266.4ms +/- 0.2% significant beat-detection: 1.090x as fast 711.5ms +/- 0.3% 652.5ms +/- 0.5% significant dft: 1.045x as fast 582.5ms +/- 0.4% 557.6ms +/- 0.3% significant fft: 1.113x as fast 528.4ms +/- 0.2% 474.6ms +/- 0.1% significant oscillator: *1.104x as slow* 526.9ms +/- 0.1% 581.7ms +/- 0.1% significant imaging: 1.39x as fast 4082.4ms +/- 0.1% 2929.3ms +/- 0.1% significant gaussian-blur: 1.98x as fast 2543.3ms +/- 0.1% 1284.9ms +/- 0.1% significant darkroom: *1.142x as slow* 630.0ms +/- 0.1% 719.5ms +/- 0.6% significant desaturate: *1.017x as slow* 909.1ms +/- 0.0% 924.9ms +/- 0.2% significant json: *1.013x as slow* 212.0ms +/- 0.4% 214.8ms +/- 0.5% significant parse-financial: *1.010x as slow* 95.5ms +/- 0.4% 96.5ms +/- 0.5% significant stringify-tinderbox: *1.015x as slow* 116.5ms +/- 0.5% 118.3ms +/- 0.7% significant stanford: 1.194x as fast 919.6ms +/- 0.2% 770.1ms +/- 0.3% significant crypto-aes: ?? 170.1ms +/- 0.3% 171.6ms +/- 1.2% not conclusive: might be *1.009x as slow* crypto-pbkdf2: 1.25x as fast 749.5ms +/- 0.2% 598.5ms +/- 0.1% significant
Filip Pizlo
Comment 9 2011-10-09 12:11:50 PDT
(In reply to comment #8) > Thanks for your hints, Filip. I will try the approach you suggested next time. > > BTW, one thing I think need to clarify is that the performance result of 32-bit DFG I pasted before is for the comparison between DFG w/o and w/ this patch (i.e. both have DFG turned on). > > Here's the comparsion between w/o DFG and w/ DFG after this change, tested on Linux ia32. We still have two cases failed to run (stanford-crypto-ccm produces wrong results and stanford-crypto-sha256-iterative crashes) and under investigations. Ah! Thanks for the clarification. > ** TOTAL **: 1.39x as fast 9678.1ms +/- 0.0% 6970.3ms +/- 0.1% significant This is starting to look good!
Note You need to log in before you can comment on or make changes to this bug.