RESOLVED FIXED 71202
DFG OSR exits should add to value profiles
https://bugs.webkit.org/show_bug.cgi?id=71202
Filip Pizlo
Reported 2011-10-30 21:43:11 PDT
Value profiles are stochastic and imprecise. In return, they are really cheap: even code that never gets optimized (for example, because it is short-running) still runs quickly despite being profiled. But that imprecision often leads to misspeculations and missed opportunities. An easy way to sidestep this: if code frequently fails speculation and performs an OSR exit, the OSR exit code should add the offending value to the value profile associated with it (i.e. the value profile that contained the bad information). This way, if recompilation is triggered due to frequent speculation failures, the DFG will have not just the normal value profiles but also a snapshot of why speculation was failing previously.
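A minimal sketch of the idea, not the actual JavaScriptCore code: every name below (`Kind`, `ToyValueProfile`, `sampleFromBaseline`, `recordFromOSRExit`, `runSpeculatingInt32`) is hypothetical, and the "record every 8th value" rule is only a deterministic stand-in for the real stochastic sampling. It shows how a lossy profile can miss a kind of value entirely, and how feeding the failing value back on exit repairs the profile before recompilation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value kinds, one bit each. The real engine tracks far richer type information.
enum class Kind : std::uint32_t {
    Int32 = 1u << 0,
    Double = 1u << 1,
    String = 1u << 2,
};

// Toy value profile (assumption: illustrative only, not JSC's ValueProfile).
struct ToyValueProfile {
    std::uint32_t seenKinds = 0;     // union of all kinds recorded so far
    std::uint32_t valuesOffered = 0; // number of values baseline code has offered

    // Baseline-tier profiling: cheap and lossy. Only every 8th value is recorded.
    void sampleFromBaseline(Kind kind)
    {
        if (valuesOffered++ % 8 == 0)
            seenKinds |= static_cast<std::uint32_t>(kind);
    }

    // OSR exit path: always record the value that failed speculation.
    void recordFromOSRExit(Kind kind)
    {
        seenKinds |= static_cast<std::uint32_t>(kind);
    }

    bool hasSeen(Kind kind) const
    {
        return (seenKinds & static_cast<std::uint32_t>(kind)) != 0;
    }
};

// Stands in for optimized code that speculated Int32. Returns true if the
// speculation held; otherwise it "exits" and feeds the value back to the profile.
bool runSpeculatingInt32(ToyValueProfile& profile, Kind actual)
{
    if (actual == Kind::Int32)
        return true;
    profile.recordFromOSRExit(actual);
    return false;
}

// Usage: the sampler only ever sees Int32, so the profile is wrong until an exit corrects it.
bool exitMakesMissedKindVisible()
{
    ToyValueProfile profile;
    for (int i = 0; i < 16; ++i)
        profile.sampleFromBaseline(i % 8 == 0 ? Kind::Int32 : Kind::Double);
    assert(profile.hasSeen(Kind::Int32) && !profile.hasSeen(Kind::Double));

    bool speculationHeld = runSpeculatingInt32(profile, Kind::Double);
    return !speculationHeld && profile.hasSeen(Kind::Double);
}
```

In this toy version, a recompilation that consults the profile after the exit would see both Int32 and Double and could avoid repeating the same failed speculation.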
Attachments
the patch (73.53 KB, patch)
2011-10-30 22:02 PDT, Filip Pizlo
no flags
the patch (131.23 KB, patch)
2011-10-31 01:42 PDT, Filip Pizlo
no flags
the patch (130.79 KB, patch)
2011-10-31 01:43 PDT, Filip Pizlo
oliver: review+
Filip Pizlo
Comment 1 2011-10-30 22:02:30 PDT
Created attachment 113012
the patch

This is still a work in progress, but it's doing nice things for performance.

Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"OSRExitProfile" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                     TipOfTree             OSRExitProfile

SunSpider:
   3d-cube                           7.4476+-0.2451    ?   7.4765+-0.3462    ?
   3d-morph                          7.5905+-0.1488    ?   7.7374+-0.1980    ?   might be 1.0194x slower
   3d-raytrace                       7.6253+-0.1928        7.5224+-0.1804        might be 1.0137x faster
   access-binary-trees               1.6272+-0.0670    ?   1.6485+-0.0665    ?   might be 1.0131x slower
   access-fannkuch                   6.6357+-0.1318        6.5111+-0.1041        might be 1.0191x faster
   access-nbody                      3.7730+-0.0934    ?   3.8759+-0.0823    ?   might be 1.0273x slower
   access-nsieve                     2.7499+-0.1359        2.6392+-0.0771        might be 1.0420x faster
   bitops-3bit-bits-in-byte          1.2663+-0.0178    ?   1.3191+-0.0398    ?   might be 1.0417x slower
   bitops-bits-in-byte               2.4506+-0.1165        2.4098+-0.0593        might be 1.0169x faster
   bitops-bitwise-and                3.3116+-0.0798    ?   3.4150+-0.1532    ?   might be 1.0312x slower
   bitops-nsieve-bits                5.4553+-0.1166    ?   5.5435+-0.1914    ?   might be 1.0162x slower
   controlflow-recursive             2.1771+-0.0652        2.1228+-0.0364        might be 1.0256x faster
   crypto-aes                        7.4525+-0.2351    ?   7.4617+-0.2610    ?
   crypto-md5                        2.8494+-0.1006        2.7413+-0.0958        might be 1.0394x faster
   crypto-sha1                       2.4779+-0.0862        2.4676+-0.0633
   date-format-tofte                 10.4422+-0.3701       10.2439+-0.2959       might be 1.0194x faster
   date-format-xparb                 8.9904+-0.2293    ?   9.5272+-0.3653    ?   might be 1.0597x slower
   math-cordic                       6.5545+-0.1336    ?   6.6676+-0.1387    ?   might be 1.0172x slower
   math-partial-sums                 7.3791+-0.1189    ?   7.5362+-0.1761    ?   might be 1.0213x slower
   math-spectral-norm                2.5897+-0.0553    ?   2.6413+-0.0944    ?   might be 1.0199x slower
   regexp-dna                        11.7961+-0.3366       11.6946+-0.2568
   string-base64                     4.2983+-0.1578        4.2516+-0.0844        might be 1.0110x faster
   string-fasta                      6.3607+-0.1389    ?   6.3615+-0.1704    ?
   string-tagcloud                   11.8896+-0.3017   ?   12.1887+-0.5354   ?   might be 1.0252x slower
   string-unpack-code                20.9870+-0.5481   ?   21.0838+-0.5925   ?
   string-validate-input             5.5790+-0.3440        5.3810+-0.2450        might be 1.0368x faster

   <arithmetic> *                    6.2214+-0.0307    ?   6.2488+-0.0329    ?
   <geometric>                       5.0128+-0.0298    ?   5.0252+-0.0299    ?
   <harmonic>                        3.9985+-0.0460    ?   4.0096+-0.0352    ?

                                     TipOfTree             OSRExitProfile

V8:
   crypto                            73.2589+-0.4478   ?   74.0815+-0.5231   ?   might be 1.0112x slower
   deltablue                         167.0489+-1.6830      166.7101+-2.2811
   earley-boyer                      90.8381+-0.6412   ?   91.8894+-1.2159   ?   might be 1.0116x slower
   raytrace                          63.6327+-0.7158   ^   62.3539+-0.4908   ^   definitely 1.0205x faster
   regexp                            105.8577+-0.6788  ?   106.2984+-1.1406  ?
   richards                          124.2458+-0.8623  ?   125.4560+-0.5646  ?
   splay                             92.4507+-0.8355   ?   93.2114+-0.5934   ?

   <arithmetic>                      102.4761+-0.3628  ?   102.8572+-0.3517  ?
   <geometric> *                     97.8644+-0.3566   ?   98.1744+-0.2179   ?
   <harmonic>                        93.7314+-0.3717   ?   93.9123+-0.1826   ?

                                     TipOfTree             OSRExitProfile

Kraken:
   ai-astar                          498.1019+-3.5555      497.8109+-4.3642
   audio-beat-detection              192.9884+-2.0270      192.5105+-1.6015
   audio-dft                         267.9992+-3.7820      264.1323+-2.5511      might be 1.0146x faster
   audio-fft                         125.1567+-1.0422      124.3891+-1.0387
   audio-oscillator                  252.5195+-1.7051      250.8467+-1.2936
   imaging-darkroom                  405.0860+-3.5058  ^   301.0530+-4.2403  ^   definitely 1.3456x faster
   imaging-desaturate                225.8273+-1.8701      225.7868+-0.7781
   imaging-gaussian-blur             557.1729+-2.5246      553.0562+-2.2644
   json-parse-financial              58.0259+-0.4031       57.7457+-0.3248
   json-stringify-tinderbox          68.2285+-0.3516   ?   68.4495+-0.3577   ?
   stanford-crypto-aes               133.0071+-1.8636  ^   96.5019+-0.2555   ^   definitely 1.3783x faster
   stanford-crypto-ccm               100.1896+-0.6561  ?   100.4740+-1.8711  ?
   stanford-crypto-pbkdf2            195.5539+-4.3864      194.4378+-2.4384
   stanford-crypto-sha256-iterative  70.6284+-0.4679   !   80.3949+-0.5279   !   definitely 1.1383x slower

   <arithmetic> *                    225.0347+-0.8813  ^   214.8278+-0.7069  ^   definitely 1.0475x faster
   <geometric>                       177.2422+-0.6686  ^   170.6528+-0.3701  ^   definitely 1.0386x faster
   <harmonic>                        139.6617+-0.4457  ^   136.6990+-0.2430  ^   definitely 1.0217x faster

                                     TipOfTree             OSRExitProfile

All benchmarks:
   <arithmetic>                      85.7356+-0.3072   ^   82.7672+-0.2422   ^   definitely 1.0359x faster
   <geometric>                       22.5702+-0.0881   ^   22.3579+-0.0672   ^   definitely 1.0095x faster
   <harmonic>                        7.0387+-0.0789    ?   7.0555+-0.0602    ?

                                     TipOfTree             OSRExitProfile

Geomean of preferred means:
   <scaled-result>                   51.5526+-0.1653   ^   50.8890+-0.1053   ^   definitely 1.0130x faster
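For readers decoding the remark column: the sketch below reproduces the "definitely" and "might be" remarks from the two reported values. The rule is an inference from the rows in this report, not taken from the benchmark script's source: a remark reads "definitely" when the two 95% confidence intervals do not overlap, "might be" when they overlap but the ratio of the means is at least 1.01, and is empty otherwise. `Measurement` and `remarkFor` are hypothetical names.

```cpp
#include <cstdio>
#include <string>

// One entry of the report: the mean and the half-width of its 95% confidence interval.
struct Measurement {
    double mean;
    double halfWidth;
};

// Returns the remark for one row under the inferred rule (assumption, see above).
std::string remarkFor(const Measurement& baseline, const Measurement& changed)
{
    const bool faster = changed.mean < baseline.mean;
    // The ratio is always reported as a value >= 1, with the direction spelled out.
    const double ratio = faster ? baseline.mean / changed.mean
                                : changed.mean / baseline.mean;

    const bool intervalsOverlap =
        baseline.mean - baseline.halfWidth <= changed.mean + changed.halfWidth
        && changed.mean - changed.halfWidth <= baseline.mean + baseline.halfWidth;

    const char* direction = faster ? "faster" : "slower";
    char buffer[64];
    if (!intervalsOverlap) {
        std::snprintf(buffer, sizeof(buffer), "definitely %.4fx %s", ratio, direction);
        return buffer;
    }
    if (ratio >= 1.01) {
        std::snprintf(buffer, sizeof(buffer), "might be %.4fx %s", ratio, direction);
        return buffer;
    }
    return "";
}
```

For example, the V8 raytrace row (63.6327+-0.7158 versus 62.3539+-0.4908) has non-overlapping intervals, so `remarkFor` yields "definitely 1.0205x faster", matching the report. Because the inputs here are the rounded values printed in the report, the last digit of a ratio could in principle differ from what the script printed.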
WebKit Review Bot
Comment 2 2011-10-30 22:04:08 PDT
Attachment 113012 did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/bytecode/ValueProfile.cpp:82:  Should have a space between // and comment  [whitespace/comments] [4]
Source/JavaScriptCore/bytecode/ValueProfile.cpp:89:  Should have a space between // and comment  [whitespace/comments] [4]
Source/JavaScriptCore/bytecode/ValueProfile.cpp:97:  Should have a space between // and comment  [whitespace/comments] [4]
Total errors found: 3 in 12 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 3 2011-10-31 01:42:04 PDT
Created attachment 113022
the patch

Implemented 32_64. Fixed some pathologies that I found along the way.

Performance on my MBP:

Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"OSRExitProfile" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                     TipOfTree             OSRExitProfile

SunSpider:
   3d-cube                           7.4545+-0.2832        7.2920+-0.2150        might be 1.0223x faster
   3d-morph                          7.6612+-0.1539        7.5698+-0.1604        might be 1.0121x faster
   3d-raytrace                       7.5646+-0.1959        7.5226+-0.2498
   access-binary-trees               1.7036+-0.0889        1.6274+-0.0497        might be 1.0468x faster
   access-fannkuch                   6.5161+-0.1284    ?   6.5336+-0.1356    ?
   access-nbody                      3.7941+-0.0783        3.7882+-0.0742
   access-nsieve                     2.6010+-0.1113    ?   2.6077+-0.0756    ?
   bitops-3bit-bits-in-byte          1.3291+-0.0764        1.2797+-0.0307        might be 1.0386x faster
   bitops-bits-in-byte               2.4330+-0.0468        2.4273+-0.0345
   bitops-bitwise-and                3.5000+-0.3836        3.3709+-0.1231        might be 1.0383x faster
   bitops-nsieve-bits                5.5452+-0.1208        5.5114+-0.1194
   controlflow-recursive             2.1032+-0.0495    ?   2.1054+-0.0467    ?
   crypto-aes                        7.5267+-0.2766    ?   7.6808+-0.2884    ?   might be 1.0205x slower
   crypto-md5                        2.7573+-0.0798        2.6895+-0.0813        might be 1.0252x faster
   crypto-sha1                       2.5158+-0.0961    ?   2.5269+-0.0744    ?
   date-format-tofte                 10.5336+-0.3881       10.0672+-0.3106       might be 1.0463x faster
   date-format-xparb                 9.2902+-0.2910    ?   9.7650+-0.2773    ?   might be 1.0511x slower
   math-cordic                       6.5445+-0.1407        6.4691+-0.1203        might be 1.0117x faster
   math-partial-sums                 7.4331+-0.1180    ?   7.4579+-0.1256    ?
   math-spectral-norm                2.5659+-0.0569    ?   2.6255+-0.0755    ?   might be 1.0232x slower
   regexp-dna                        11.7454+-0.2662       11.5669+-0.2919       might be 1.0154x faster
   string-base64                     4.5259+-0.2209    ^   4.0954+-0.1086    ^   definitely 1.1051x faster
   string-fasta                      6.3696+-0.1226    ?   6.4395+-0.2528    ?   might be 1.0110x slower
   string-tagcloud                   11.8794+-0.5258       11.6764+-0.4317       might be 1.0174x faster
   string-unpack-code                21.0209+-0.5359       21.0046+-0.4777
   string-validate-input             5.3864+-0.1639        5.3293+-0.2266        might be 1.0107x faster

   <arithmetic> *                    6.2423+-0.0458        6.1935+-0.0323
   <geometric>                       5.0308+-0.0482        4.9799+-0.0249        might be 1.0102x faster
   <harmonic>                        4.0224+-0.0589        3.9694+-0.0275        might be 1.0134x faster

                                     TipOfTree             OSRExitProfile

V8:
   crypto                            72.7642+-0.4655   !   73.7216+-0.4294   !   definitely 1.0132x slower
   deltablue                         165.3919+-0.6548  ?   167.2993+-2.5104  ?   might be 1.0115x slower
   earley-boyer                      90.5715+-0.4088   !   91.4635+-0.3694   !   definitely 1.0098x slower
   raytrace                          63.0878+-0.5189   ?   63.1119+-0.4952   ?
   regexp                            104.5092+-0.2850  ?   104.6082+-0.5008  ?
   richards                          125.1410+-0.7511      124.6242+-0.4366
   splay                             92.6263+-0.8803       91.7876+-0.3696

   <arithmetic>                      102.0131+-0.2478  ?   102.3737+-0.3549  ?
   <geometric> *                     97.4202+-0.2436   ?   97.7310+-0.2245   ?
   <harmonic>                        93.2800+-0.2585   ?   93.5759+-0.1745   ?

                                     TipOfTree             OSRExitProfile

Kraken:
   ai-astar                          497.8291+-2.2483      495.2406+-4.7121
   audio-beat-detection              190.5957+-1.5949  ?   191.3128+-2.4672  ?
   audio-dft                         271.3709+-7.4366      264.9552+-2.7768      might be 1.0242x faster
   audio-fft                         124.2120+-1.0364      123.6318+-0.7699
   audio-oscillator                  251.2518+-1.6367  ?   252.5525+-1.8577  ?
   imaging-darkroom                  438.3063+-43.5788 ^   300.6240+-3.8107  ^   definitely 1.4580x faster
   imaging-desaturate                225.0933+-0.9025      224.7551+-1.1130
   imaging-gaussian-blur             553.4834+-2.1758      552.4078+-2.6317
   json-parse-financial              57.6940+-0.2728       57.5327+-0.5092
   json-stringify-tinderbox          68.7863+-0.7813       68.4622+-0.6336
   stanford-crypto-aes               133.4748+-1.5811  ^   96.9603+-1.0838   ^   definitely 1.3766x faster
   stanford-crypto-ccm               100.3808+-1.5548      99.5330+-1.0401
   stanford-crypto-pbkdf2            192.9906+-0.8765  ?   196.4618+-2.8331  ?   might be 1.0180x slower
   stanford-crypto-sha256-iterative  71.0408+-0.7204   !   79.7118+-0.3803   !   definitely 1.1221x slower

   <arithmetic> *                    226.8936+-2.7807  ^   214.5816+-0.4404  ^   definitely 1.0574x faster
   <geometric>                       177.8281+-0.9731  ^   170.3935+-0.4246  ^   definitely 1.0436x faster
   <harmonic>                        139.7862+-0.3273  ^   136.3782+-0.4782  ^   definitely 1.0250x faster

                                     TipOfTree             OSRExitProfile

All benchmarks:
   <arithmetic>                      86.2320+-0.8017   ^   82.5912+-0.1293   ^   definitely 1.0441x faster
   <geometric>                       22.6213+-0.1161   ^   22.2213+-0.0647   ^   definitely 1.0180x faster
   <harmonic>                        7.0792+-0.1007        6.9861+-0.0471        might be 1.0133x faster

                                     TipOfTree             OSRExitProfile

Geomean of preferred means:
   <scaled-result>                   51.6707+-0.1784   ^   50.6426+-0.1111   ^   definitely 1.0203x faster

Performance on my Mac Pro:

Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/fromMiniMe/quinary/OpenSource/WebKitBuild/Release/jsc
"OSRExitProfile" at /Volumes/Data/fromMiniMe/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                     TipOfTree             OSRExitProfile

SunSpider:
   3d-cube                           7.9226+-0.0726        7.9218+-0.0725
   3d-morph                          8.4109+-0.0357    !   8.6122+-0.1463    !   definitely 1.0239x slower
   3d-raytrace                       8.2457+-0.1275        8.1584+-0.0989        might be 1.0107x faster
   access-binary-trees               1.7159+-0.0235        1.7122+-0.0244
   access-fannkuch                   7.7707+-0.0323    !   7.8800+-0.0327    !   definitely 1.0141x slower
   access-nbody                      4.6132+-0.0267    ^   4.5377+-0.0119    ^   definitely 1.0166x faster
   access-nsieve                     3.2429+-0.0318        3.2209+-0.0202
   bitops-3bit-bits-in-byte          1.3253+-0.0088        1.3217+-0.0118
   bitops-bits-in-byte               4.9741+-0.0289    ?   4.9939+-0.0235    ?
   bitops-bitwise-and                3.4683+-0.0677    ?   3.4788+-0.0910    ?
   bitops-nsieve-bits                5.7003+-0.0376        5.6874+-0.0379
   controlflow-recursive             2.3656+-0.0301        2.3521+-0.0241
   crypto-aes                        7.6225+-0.0454    !   7.7965+-0.0962    !   definitely 1.0228x slower
   crypto-md5                        2.8842+-0.0301    ?   2.8846+-0.0220    ?
   crypto-sha1                       2.6554+-0.0292        2.6467+-0.0274
   date-format-tofte                 10.8094+-0.0904       10.6824+-0.1033       might be 1.0119x faster
   date-format-xparb                 10.2739+-0.0636   !   11.2169+-0.4809   !   definitely 1.0918x slower
   math-cordic                       7.2778+-0.0287        7.2759+-0.0338
   math-partial-sums                 10.5738+-0.0555       10.5579+-0.0218
   math-spectral-norm                2.9220+-0.0349        2.8988+-0.0226
   regexp-dna                        13.4306+-0.1902   ?   13.6700+-0.2354   ?   might be 1.0178x slower
   string-base64                     4.5318+-0.0131    ^   4.3737+-0.0480    ^   definitely 1.0361x faster
   string-fasta                      7.1551+-0.0175    ?   7.2128+-0.0580    ?
   string-tagcloud                   13.1124+-0.0961       13.0921+-0.0823
   string-unpack-code                23.1503+-0.2013       23.0892+-0.1106
   string-validate-input             5.8721+-0.0375    ^   5.7439+-0.0192    ^   definitely 1.0223x faster

   <arithmetic> *                    7.0010+-0.0275    ?   7.0392+-0.0385    ?
   <geometric>                       5.6642+-0.0235    ?   5.6740+-0.0273    ?
   <harmonic>                        4.4911+-0.0229        4.4824+-0.0236

                                     TipOfTree             OSRExitProfile

V8:
   crypto                            81.4879+-0.5618   ?   81.6116+-0.5521   ?
   deltablue                         182.8618+-1.5267      182.3407+-1.5476
   earley-boyer                      114.2309+-0.5866  ^   112.4634+-0.4256  ^   definitely 1.0157x faster
   raytrace                          70.4738+-0.2521   ^   69.4215+-0.7369   ^   definitely 1.0152x faster
   regexp                            125.1142+-0.8947      124.7320+-0.4422
   richards                          141.8218+-0.2707  !   144.2976+-0.3103  !   definitely 1.0175x slower
   splay                             122.3796+-0.5153  ^   121.3030+-0.5030  ^   definitely 1.0089x faster

   <arithmetic>                      119.7671+-0.3736      119.4528+-0.2912
   <geometric> *                     114.6531+-0.3189      114.2175+-0.2599
   <harmonic>                        109.5415+-0.2858      108.9818+-0.2788

                                     TipOfTree             OSRExitProfile

Kraken:
   ai-astar                          816.5574+-12.8439 ?   827.0204+-0.7326  ?   might be 1.0128x slower
   audio-beat-detection              213.8757+-0.9533  ^   211.2199+-0.9561  ^   definitely 1.0126x faster
   audio-dft                         269.7655+-7.9556      262.5692+-3.1835      might be 1.0274x faster
   audio-fft                         133.3203+-0.2228  !   137.8996+-0.6059  !   definitely 1.0343x slower
   audio-oscillator                  292.7531+-1.2667  ?   293.1369+-1.3357  ?
   imaging-darkroom                  462.4961+-11.1884 ^   340.0617+-5.4201  ^   definitely 1.3600x faster
   imaging-desaturate                245.5975+-0.4425  ^   241.5062+-0.2790  ^   definitely 1.0169x faster
   imaging-gaussian-blur             622.4428+-0.6371      622.2575+-0.5368
   json-parse-financial              71.9435+-0.6837       71.3530+-0.4389
   json-stringify-tinderbox          79.4496+-0.4179       79.4022+-0.3451
   stanford-crypto-aes               154.8135+-2.3105  ^   116.4911+-0.5211  ^   definitely 1.3290x faster
   stanford-crypto-ccm               116.2768+-1.2510  ?   116.3312+-0.6872  ?
   stanford-crypto-pbkdf2            235.4881+-1.7905  ?   236.9285+-1.1141  ?
   stanford-crypto-sha256-iterative  85.8812+-0.1605   !   98.6669+-0.2702   !   definitely 1.1489x slower

   <arithmetic> *                    271.4758+-1.2765  ^   261.0603+-0.3748  ^   definitely 1.0399x faster
   <geometric>                       207.2575+-0.4542  ^   200.5239+-0.3073  ^   definitely 1.0336x faster
   <harmonic>                        163.1304+-0.2808  ^   160.4491+-0.2487  ^   definitely 1.0167x faster

                                     TipOfTree             OSRExitProfile

All benchmarks:
   <arithmetic>                      102.5757+-0.3941  ^   99.4475+-0.1162   ^   definitely 1.0315x faster
   <geometric>                       25.9042+-0.0727   ^   25.6604+-0.0722   ^   definitely 1.0095x faster
   <harmonic>                        7.9138+-0.0395        7.8966+-0.0406

                                     TipOfTree             OSRExitProfile

Geomean of preferred means:
   <scaled-result>                   60.1758+-0.1473   ^   59.4285+-0.1256   ^   definitely 1.0126x faster
Filip Pizlo
Comment 4 2011-10-31 01:43:07 PDT
Created attachment 113023
the patch

Removed some debugging stuff.
Oliver Hunt
Comment 5 2011-10-31 08:45:23 PDT
Comment on attachment 113023
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113023&action=review

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:642
> -    speculationCheck(m_jit.branchTest32(MacroAssembler::Zero, scratchReg));
> +    speculationCheck(JSValueRegs(), NoNode, m_jit.branchTest32(MacroAssembler::Zero, scratchReg));

Might be nice to be able to record that we're seeing ropes here. If we are, there's no point in planting an inline character access as we currently do.
Filip Pizlo
Comment 6 2011-10-31 13:12:44 PDT
(In reply to comment #5)
> (From update of attachment 113023 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=113023&action=review
>
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:642
> > -    speculationCheck(m_jit.branchTest32(MacroAssembler::Zero, scratchReg));
> > +    speculationCheck(JSValueRegs(), NoNode, m_jit.branchTest32(MacroAssembler::Zero, scratchReg));
>
> Might be nice to be able to record that we're seeing ropes here -- if we are there's no point in planting an inline character access as we currently do

Or better yet, drop a call to deropification and have the CFA realize that after this call, it's no longer a rope.
Filip Pizlo
Comment 7 2011-10-31 16:51:55 PDT