RESOLVED FIXED 68593
DFG JIT should infer which uses of a variable are not aliased
https://bugs.webkit.org/show_bug.cgi?id=68593
Summary DFG JIT should infer which uses of a variable are not aliased
Filip Pizlo
Reported 2011-09-21 23:21:39 PDT
Say you write the following program: var i; for (i = 0; i < n; ++i) { array[i]++; } i = 0.5; Currently, the DFG will assume that 'i' is always both Int and Double, which will cause the first loop to perform very poorly. This problem is made worse by the bytecode generator's reuse of virtual registers for variables that are lexically unrelated. The DFG should be fixed so that it does not get confused by this, particularly since a simple algorithm exists for fixing it. Start by assuming that each GetLocal, SetLocal, and Phi operates on a distinct variable (even if it obviously doesn't). Then for each GetLocal, SetLocal and Phi, union its variable with the variables of its children. At the end, each union-find set will correspond to a disjoint live-range of a variable. In many cases, it will correspond precisely (one-to-one) to a virtual register, while in other cases it will be many-to-one: for any virtual register, each distinct use will have a different "variable". This implies that SetLocal/GetLocal will have two OpInfo slots: the virtual register (slot one, as it is now) and the variable number used for predictions and static analysis.
Attachments
the patch (25.38 KB, patch)
2011-09-22 18:00 PDT, Filip Pizlo
no flags
the patch after merging (26.60 KB, patch)
2011-09-25 21:15 PDT, Filip Pizlo
no flags
more work in progress (42.33 KB, patch)
2011-09-25 22:04 PDT, Filip Pizlo
no flags
more work in progress (47.20 KB, patch)
2011-09-26 16:18 PDT, Filip Pizlo
no flags
starting to work (81.14 KB, patch)
2011-09-26 23:50 PDT, Filip Pizlo
no flags
the patch (82.27 KB, patch)
2011-09-27 14:53 PDT, Filip Pizlo
no flags
the patch (82.12 KB, patch)
2011-09-27 14:59 PDT, Filip Pizlo
no flags
the patch - fix changelog (83.07 KB, patch)
2011-09-27 15:06 PDT, Filip Pizlo
no flags
the patch - attempt to fix windows, merge (84.47 KB, patch)
2011-09-27 21:43 PDT, Filip Pizlo
no flags
the patch - more fixes (85.10 KB, patch)
2011-09-27 22:05 PDT, Filip Pizlo
no flags
the patch (85.64 KB, patch)
2011-09-27 22:31 PDT, Filip Pizlo
no flags
the patch (83.45 KB, patch)
2011-09-27 22:57 PDT, Filip Pizlo
no flags
the patch - add 32_64 (100.25 KB, patch)
2011-09-28 22:14 PDT, Filip Pizlo
oliver: review+
Filip Pizlo
Comment 1 2011-09-22 18:00:52 PDT
Created attachment 108434 [details] the patch Work in progress.
Filip Pizlo
Comment 2 2011-09-25 21:15:23 PDT
Created attachment 108625 [details] the patch after merging Still a work in progress.
Filip Pizlo
Comment 3 2011-09-25 22:04:42 PDT
Created attachment 108626 [details] more work in progress I'm starting to have a story for live range splitting, but it's incomplete and probably broken.
Filip Pizlo
Comment 4 2011-09-26 16:18:22 PDT
Created attachment 108748 [details] more work in progress We can now split live ranges, but we don't use this information for anything, yet.
Filip Pizlo
Comment 5 2011-09-26 23:50:53 PDT
Created attachment 108794 [details] starting to work This has live range splitting using union-find alias analysis. It seems to work, and seems to do the right thing. Still testing it though.
Filip Pizlo
Comment 6 2011-09-27 14:53:29 PDT
Created attachment 108904 [details] the patch This undoes a 2% speed-up that I had seen on my laptop, but that was not seen on arewefastyet.com. I investigated this to death, and found that the DFG does exactly the same thing, in exactly the same order, on the benchmark that has the 14% fluctuation. Compile time is definitely not an issue. I'm guessing that the fluctuation that I had seen (14% win after getting rid of static predictions, and a 14% slow-down with this patch) was just some kind of systematic noise and I'm inclined to ignore it.
WebKit Review Bot
Comment 7 2011-09-27 14:57:43 PDT
Attachment 108904 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGGraph.h:328: The parameter name "variableAccessData" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:167: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4] Total errors found: 2 in 16 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 8 2011-09-27 14:58:01 PDT
It should be noted that this patch significantly ruggedizes our tiered compilation: - Previously OSR entry would verify a prediction at the head of a basic block even if the variable was dead at the head, because it had no way of distinguishing between a variable being dead at the tail or dead at the head (if it was live at either, it would assume it was live at both). - Previously if a virtual register was reused for a different purpose in the middle of a function, we would get mixed predictins between the different uses of the variable. Now we get separate predictions. It also serves as a stepping stone for two other improvements: - Currently we do not consider flow sensitivity for variable predictions. This patch does not do this yet, but gives us a facility for doing so, by separating how a variable is stored (i.e. its virtual register) from how it's predicted. - Currently if we tried to do inlining of two functions that have different types in their variables (which is almost certainly going to be the case if they are different functions) then the predictions will get mixed. This patch ensures that once we implement inlining, we will not mix predictions just because the variables got assigned the same virtual registers.
Filip Pizlo
Comment 9 2011-09-27 14:59:47 PDT
Created attachment 108907 [details] the patch Fixed style.
Filip Pizlo
Comment 10 2011-09-27 15:06:52 PDT
Created attachment 108909 [details] the patch - fix changelog Updated the ChangeLog, since in the previous patches it still said that this was a work in progress (it's not).
Filip Pizlo
Comment 11 2011-09-27 21:43:21 PDT
Created attachment 108960 [details] the patch - attempt to fix windows, merge
Filip Pizlo
Comment 12 2011-09-27 22:05:15 PDT
Created attachment 108961 [details] the patch - more fixes
Filip Pizlo
Comment 13 2011-09-27 22:28:56 PDT
Fixed some stuff. Now it's totally neutral. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "LiveRangeSplit" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree LiveRangeSplit SunSpider: 3d-cube 7.6922+-0.1988 ? 7.7205+-0.2574 ? 3d-morph 7.5029+-0.1592 7.4454+-0.2336 3d-raytrace 8.2500+-0.2146 ? 8.4582+-0.2255 ? might be 1.0252x slower access-binary-trees 2.1528+-0.1100 2.1172+-0.1243 might be 1.0168x faster access-fannkuch 6.2283+-0.0899 ? 6.4395+-0.1506 ? might be 1.0339x slower access-nbody 3.6747+-0.0747 ? 3.6900+-0.1150 ? access-nsieve 2.5620+-0.0497 ? 2.6403+-0.0766 ? might be 1.0306x slower bitops-3bit-bits-in-byte 1.7323+-0.0467 1.7322+-0.0522 bitops-bits-in-byte 2.7092+-0.0460 ? 2.7689+-0.0710 ? might be 1.0221x slower bitops-bitwise-and 3.5607+-0.1322 3.4690+-0.0751 might be 1.0264x faster bitops-nsieve-bits 5.5097+-0.1361 5.4230+-0.1047 might be 1.0160x faster controlflow-recursive 2.0410+-0.0514 ? 2.0848+-0.0459 ? might be 1.0215x slower crypto-aes 6.3633+-0.1242 ? 6.4963+-0.2182 ? might be 1.0209x slower crypto-md5 2.8494+-0.1204 2.7613+-0.0634 might be 1.0319x faster crypto-sha1 2.5110+-0.0591 2.5104+-0.0746 date-format-tofte 10.0627+-0.3360 ? 10.2101+-0.3020 ? might be 1.0147x slower date-format-xparb 9.5168+-0.1626 ? 9.6750+-0.2667 ? might be 1.0166x slower math-cordic 6.3692+-0.0948 6.3406+-0.1316 math-partial-sums 8.0563+-0.1988 7.8165+-0.2004 might be 1.0307x faster math-spectral-norm 2.9357+-0.1066 2.8295+-0.0678 might be 1.0375x faster regexp-dna 10.8642+-0.2494 ? 11.0709+-0.2275 ? might be 1.0190x slower string-base64 5.9855+-0.3214 5.7881+-0.1345 might be 1.0341x faster string-fasta 7.1136+-0.1528 ? 7.2783+-0.2611 ? might be 1.0231x slower string-tagcloud 11.7383+-0.2994 ? 12.2077+-0.3409 ? might be 1.0400x slower string-unpack-code 21.5002+-0.7005 ? 21.5974+-0.3771 ? string-validate-input 6.5260+-0.2503 6.3355+-0.1992 might be 1.0301x faster <arithmetic> 6.3849+-0.0276 ? 6.4195+-0.0282 ? <geometric> 5.2580+-0.0177 ? 5.2671+-0.0246 ? <harmonic> 4.3333+-0.0238 4.3310+-0.0326 TipOfTree LiveRangeSplit V8: crypto 71.7616+-1.0628 71.1205+-0.3344 deltablue 228.1658+-0.7650 ? 231.5297+-2.8410 ? might be 1.0147x slower earley-boyer 89.8243+-0.5228 ? 89.8258+-0.5621 ? raytrace 62.2547+-0.7894 ? 63.6586+-0.9188 ? might be 1.0226x slower regexp 106.3893+-1.0650 105.9907+-1.6466 richards 198.1538+-0.8433 ! 200.9998+-1.5931 ! definitely 1.0144x slower splay 96.5465+-0.8717 ^ 95.0097+-0.4065 ^ definitely 1.0162x faster <arithmetic> 121.8709+-0.3295 ? 122.5907+-0.6098 ? <geometric> 109.2908+-0.4012 ? 109.6402+-0.4618 ? <harmonic> 99.5289+-0.4996 ? 99.7573+-0.4226 ? TipOfTree LiveRangeSplit Kraken: ai-astar 499.2630+-5.3220 495.1316+-4.8329 audio-beat-detection 207.5893+-2.3626 205.4705+-1.9525 might be 1.0103x faster audio-dft 425.4155+-5.1167 ? 439.0691+-16.9925 ? might be 1.0321x slower audio-fft 139.5907+-0.8042 ? 141.1493+-1.4043 ? might be 1.0112x slower audio-oscillator 257.4750+-3.5507 ? 257.8873+-2.2155 ? imaging-darkroom 440.2766+-4.1131 ^ 426.5286+-6.2953 ^ definitely 1.0322x faster imaging-desaturate 224.8910+-1.3828 ? 228.9449+-7.2459 ? might be 1.0180x slower imaging-gaussian-blur 586.4749+-4.4782 586.4513+-4.4580 json-parse-financial 48.5181+-0.6663 ? 48.7977+-0.7105 ? json-stringify-tinderbox 68.7911+-0.6904 ^ 67.5631+-0.5023 ^ definitely 1.0182x faster stanford-crypto-aes 134.0193+-1.6182 ? 135.1391+-0.7355 ? stanford-crypto-ccm 105.1607+-0.7178 ^ 103.1710+-0.7521 ^ definitely 1.0193x faster stanford-crypto-pbkdf2 202.8855+-0.9739 ^ 200.2139+-0.8190 ^ definitely 1.0133x faster stanford-crypto-sha256-iterative 84.8164+-0.9626 ? 85.4529+-1.3834 ? <arithmetic> 244.6548+-0.6791 244.3550+-1.4587 <geometric> 189.0306+-0.6238 188.7703+-0.9134 <harmonic> 143.5915+-0.7115 143.3510+-0.7654 TipOfTree LiveRangeSplit All benchmarks: <arithmetic> 94.5590+-0.2062 ? 94.5960+-0.3798 ? <geometric> 24.0152+-0.0550 ? 24.0395+-0.0798 ? <harmonic> 7.6202+-0.0407 7.6162+-0.0558
Filip Pizlo
Comment 14 2011-09-27 22:31:31 PDT
Created attachment 108964 [details] the patch This is now performance neutral. It turns out ConvertThis had wrong prediction logic.
Filip Pizlo
Comment 15 2011-09-27 22:57:38 PDT
Created attachment 108968 [details] the patch fixed merge
Filip Pizlo
Comment 16 2011-09-28 00:02:25 PDT
Updated numbers after merging with https://bugs.webkit.org/show_bug.cgi?id=68580 Conclusion: it's netural. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "LiveRangeSplit" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 15 samples per benchmark/VM, with 5 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree LiveRangeSplit SunSpider: 3d-cube 7.6787+-0.3039 7.4757+-0.2208 might be 1.0271x faster 3d-morph 7.3829+-0.1321 ? 7.4831+-0.1374 ? might be 1.0136x slower 3d-raytrace 8.2394+-0.2402 ? 8.4595+-0.3011 ? might be 1.0267x slower access-binary-trees 2.0714+-0.0659 ? 2.1538+-0.0820 ? might be 1.0398x slower access-fannkuch 6.4854+-0.1156 6.4094+-0.1069 might be 1.0119x faster access-nbody 3.6724+-0.0718 3.6551+-0.0791 access-nsieve 2.6425+-0.0768 2.6057+-0.0687 might be 1.0141x faster bitops-3bit-bits-in-byte 1.6955+-0.0423 ? 1.7364+-0.0360 ? might be 1.0241x slower bitops-bits-in-byte 2.7285+-0.0482 ? 2.7671+-0.0680 ? might be 1.0141x slower bitops-bitwise-and 3.4507+-0.0774 3.3981+-0.0622 might be 1.0155x faster bitops-nsieve-bits 5.4244+-0.1194 ? 5.4386+-0.1020 ? controlflow-recursive 2.0497+-0.0537 ? 2.0617+-0.0333 ? crypto-aes 6.6114+-0.1880 ? 6.6872+-0.1737 ? might be 1.0115x slower crypto-md5 2.7668+-0.0747 ? 2.8010+-0.0668 ? might be 1.0124x slower crypto-sha1 2.4589+-0.0671 ? 2.4827+-0.0555 ? date-format-tofte 10.4541+-0.4384 10.0867+-0.2541 might be 1.0364x faster date-format-xparb 9.6846+-0.2358 9.6279+-0.2417 math-cordic 6.3572+-0.1142 ? 6.5219+-0.1184 ? might be 1.0259x slower math-partial-sums 7.5899+-0.1042 7.5543+-0.1343 math-spectral-norm 2.8221+-0.0534 ? 2.8413+-0.0906 ? regexp-dna 10.9369+-0.2158 10.8010+-0.2099 might be 1.0126x faster string-base64 6.0534+-0.1700 5.9570+-0.1248 might be 1.0162x faster string-fasta 6.9345+-0.1592 ? 7.1482+-0.2413 ? might be 1.0308x slower string-tagcloud 11.8804+-0.2679 11.8612+-0.2276 string-unpack-code 21.3751+-0.4629 ? 21.3774+-0.3379 ? string-validate-input 6.2531+-0.1936 ? 6.3781+-0.1630 ? might be 1.0200x slower <arithmetic> 6.3731+-0.0272 ? 6.3758+-0.0352 ? <geometric> 5.2286+-0.0153 ? 5.2477+-0.0216 ? <harmonic> 4.2922+-0.0271 ? 4.3243+-0.0223 ? TipOfTree LiveRangeSplit V8: crypto 70.8935+-0.3890 ? 71.2029+-0.2827 ? deltablue 229.9200+-0.9956 ^ 227.5736+-0.7179 ^ definitely 1.0103x faster earley-boyer 89.3518+-0.2992 ! 90.3557+-0.1945 ! definitely 1.0112x slower raytrace 62.5615+-0.4116 ? 62.8549+-0.3808 ? regexp 104.0308+-0.2646 103.7363+-0.3711 richards 198.6203+-0.7546 ? 198.9464+-0.4766 ? splay 90.9935+-0.4642 ? 92.3987+-1.6417 ? might be 1.0154x slower <arithmetic> 120.9102+-0.2689 ? 121.0098+-0.2652 ? <geometric> 107.9908+-0.2128 ? 108.3575+-0.2639 ? <harmonic> 98.2147+-0.2025 ! 98.7076+-0.2400 ! definitely 1.0050x slower TipOfTree LiveRangeSplit Kraken: ai-astar 490.2591+-3.2305 ? 490.4662+-2.2753 ? audio-beat-detection 191.1469+-0.9371 189.4504+-1.0021 audio-dft 278.5623+-2.1923 ? 278.9677+-1.9053 ? audio-fft 127.7110+-0.9664 ? 127.7116+-0.8813 ? audio-oscillator 256.5281+-1.6728 ? 267.7998+-12.2863 ? might be 1.0439x slower imaging-darkroom 419.4694+-1.8256 418.3243+-0.7219 imaging-desaturate 222.7996+-0.6588 ? 223.5182+-0.7306 ? imaging-gaussian-blur 578.0480+-1.4034 ! 582.3310+-2.1179 ! definitely 1.0074x slower json-parse-financial 47.6611+-0.1951 ? 47.8559+-0.2437 ? json-stringify-tinderbox 69.0243+-0.3084 ^ 68.2484+-0.1993 ^ definitely 1.0114x faster stanford-crypto-aes 130.2373+-1.3560 128.9448+-1.1310 might be 1.0100x faster stanford-crypto-ccm 102.6255+-0.4921 ^ 99.9854+-0.7414 ^ definitely 1.0264x faster stanford-crypto-pbkdf2 194.7292+-1.5753 193.4078+-0.6359 stanford-crypto-sha256-iterative 85.3745+-0.4286 84.7869+-0.2958 <arithmetic> 228.1555+-0.4259 ? 228.6999+-0.9140 ? <geometric> 178.6563+-0.2998 178.4558+-0.5838 <harmonic> 138.6943+-0.2527 138.1404+-0.3229 TipOfTree LiveRangeSplit All benchmarks: <arithmetic> 89.4946+-0.1462 ? 89.6731+-0.3029 ? <geometric> 23.4997+-0.0463 ? 23.5510+-0.0669 ? <harmonic> 7.5445+-0.0465 ? 7.5992+-0.0383 ?
Filip Pizlo
Comment 17 2011-09-28 22:14:07 PDT
Created attachment 109121 [details] the patch - add 32_64 Added, and tested, support for 32_64.
Filip Pizlo
Comment 18 2011-09-29 16:17:04 PDT
Landed in r96375.
Note You need to log in before you can comment on or make changes to this bug.