Bug 68593

Summary: DFG JIT should infer which uses of a variable are not aliased
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: sam, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 68677, 68763, 68784    
Bug Blocks:    
Attachments:
Description Flags
the patch
none
the patch after merging
none
more work in progress
none
more work in progress
none
starting to work
none
the patch
none
the patch
none
the patch - fix changelog
none
the patch - attempt to fix windows, merge
none
the patch - more fixes
none
the patch
none
the patch
none
the patch - add 32_64 oliver: review+

Description Filip Pizlo 2011-09-21 23:21:39 PDT
Say you write the following program:

var i;

for (i = 0; i < n; ++i) {
    array[i]++;
}

i = 0.5;

Currently, the DFG will assume that 'i' is always both Int and Double, which will cause the first loop to perform very poorly.  This problem is made worse by the bytecode generator's reuse of virtual registers for variables that are lexically unrelated.

The DFG should be fixed so that it does not get confused by this, particularly since a simple algorithm exists for fixing it.  Start by assuming that each GetLocal, SetLocal, and Phi operates on a distinct variable (even if it obviously doesn't).  Then for each GetLocal, SetLocal and Phi, union its variable with the variables of its children.

At the end, each union-find set will correspond to a disjoint live-range of a variable.  In many cases, it will correspond precisely (one-to-one) to a virtual register, while in other cases it will be many-to-one: for any virtual register, each distinct use will have a different "variable".

This implies that SetLocal/GetLocal will have two OpInfo slots: the virtual register (slot one, as it is now) and the variable number used for predictions and static analysis.
Comment 1 Filip Pizlo 2011-09-22 18:00:52 PDT
Created attachment 108434 [details]
the patch

Work in progress.
Comment 2 Filip Pizlo 2011-09-25 21:15:23 PDT
Created attachment 108625 [details]
the patch after merging

Still a work in progress.
Comment 3 Filip Pizlo 2011-09-25 22:04:42 PDT
Created attachment 108626 [details]
more work in progress

I'm starting to have a story for live range splitting, but it's incomplete and probably broken.
Comment 4 Filip Pizlo 2011-09-26 16:18:22 PDT
Created attachment 108748 [details]
more work in progress

We can now split live ranges, but we don't use this information for anything, yet.
Comment 5 Filip Pizlo 2011-09-26 23:50:53 PDT
Created attachment 108794 [details]
starting to work

This has live range splitting using union-find alias analysis.  It seems to work, and seems to do the right thing.  Still testing it though.
Comment 6 Filip Pizlo 2011-09-27 14:53:29 PDT
Created attachment 108904 [details]
the patch

This undoes a 2% speed-up that I had seen on my laptop, but that was not seen on arewefastyet.com.  I investigated this to death, and found that the DFG does exactly the same thing, in exactly the same order, on the benchmark that has the 14% fluctuation.  Compile time is definitely not an issue.  I'm guessing that the fluctuation that I had seen (14% win after getting rid of static predictions, and a 14% slow-down with this patch) was just some kind of systematic noise and I'm inclined to ignore it.
Comment 7 WebKit Review Bot 2011-09-27 14:57:43 PDT
Attachment 108904 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGGraph.h:328:  The parameter name "variableAccessData" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:167:  An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
Total errors found: 2 in 16 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Filip Pizlo 2011-09-27 14:58:01 PDT
It should be noted that this patch significantly ruggedizes our tiered compilation:

- Previously OSR entry would verify a prediction at the head of a basic block even if the variable was dead at the head, because it had no way of distinguishing between a variable being dead at the tail or dead at the head (if it was live at either, it would assume it was live at both).

- Previously if a virtual register was reused for a different purpose in the middle of a function, we would get mixed predictins between the different uses of the variable.  Now we get separate predictions.

It also serves as a stepping stone for two other improvements:

- Currently we do not consider flow sensitivity for variable predictions.  This patch does not do this yet, but gives us a facility for doing so, by separating how a variable is stored (i.e. its virtual register) from how it's predicted.

- Currently if we tried to do inlining of two functions that have different types in their variables (which is almost certainly going to be the case if they are different functions) then the predictions will get mixed.  This patch ensures that once we implement inlining, we will not mix predictions just because the variables got assigned the same virtual registers.
Comment 9 Filip Pizlo 2011-09-27 14:59:47 PDT
Created attachment 108907 [details]
the patch

Fixed style.
Comment 10 Filip Pizlo 2011-09-27 15:06:52 PDT
Created attachment 108909 [details]
the patch - fix changelog

Updated the ChangeLog, since in the previous patches it still said that this was a work in progress (it's not).
Comment 11 Filip Pizlo 2011-09-27 21:43:21 PDT
Created attachment 108960 [details]
the patch - attempt to fix windows, merge
Comment 12 Filip Pizlo 2011-09-27 22:05:15 PDT
Created attachment 108961 [details]
the patch - more fixes
Comment 13 Filip Pizlo 2011-09-27 22:28:56 PDT
Fixed some stuff.  Now it's totally neutral.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"LiveRangeSplit" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             LiveRangeSplit                                 
SunSpider:
   3d-cube                                7.6922+-0.1988    ?     7.7205+-0.2574       ?
   3d-morph                               7.5029+-0.1592          7.4454+-0.2336       
   3d-raytrace                            8.2500+-0.2146    ?     8.4582+-0.2255       ? might be 1.0252x slower
   access-binary-trees                    2.1528+-0.1100          2.1172+-0.1243         might be 1.0168x faster
   access-fannkuch                        6.2283+-0.0899    ?     6.4395+-0.1506       ? might be 1.0339x slower
   access-nbody                           3.6747+-0.0747    ?     3.6900+-0.1150       ?
   access-nsieve                          2.5620+-0.0497    ?     2.6403+-0.0766       ? might be 1.0306x slower
   bitops-3bit-bits-in-byte               1.7323+-0.0467          1.7322+-0.0522       
   bitops-bits-in-byte                    2.7092+-0.0460    ?     2.7689+-0.0710       ? might be 1.0221x slower
   bitops-bitwise-and                     3.5607+-0.1322          3.4690+-0.0751         might be 1.0264x faster
   bitops-nsieve-bits                     5.5097+-0.1361          5.4230+-0.1047         might be 1.0160x faster
   controlflow-recursive                  2.0410+-0.0514    ?     2.0848+-0.0459       ? might be 1.0215x slower
   crypto-aes                             6.3633+-0.1242    ?     6.4963+-0.2182       ? might be 1.0209x slower
   crypto-md5                             2.8494+-0.1204          2.7613+-0.0634         might be 1.0319x faster
   crypto-sha1                            2.5110+-0.0591          2.5104+-0.0746       
   date-format-tofte                     10.0627+-0.3360    ?    10.2101+-0.3020       ? might be 1.0147x slower
   date-format-xparb                      9.5168+-0.1626    ?     9.6750+-0.2667       ? might be 1.0166x slower
   math-cordic                            6.3692+-0.0948          6.3406+-0.1316       
   math-partial-sums                      8.0563+-0.1988          7.8165+-0.2004         might be 1.0307x faster
   math-spectral-norm                     2.9357+-0.1066          2.8295+-0.0678         might be 1.0375x faster
   regexp-dna                            10.8642+-0.2494    ?    11.0709+-0.2275       ? might be 1.0190x slower
   string-base64                          5.9855+-0.3214          5.7881+-0.1345         might be 1.0341x faster
   string-fasta                           7.1136+-0.1528    ?     7.2783+-0.2611       ? might be 1.0231x slower
   string-tagcloud                       11.7383+-0.2994    ?    12.2077+-0.3409       ? might be 1.0400x slower
   string-unpack-code                    21.5002+-0.7005    ?    21.5974+-0.3771       ?
   string-validate-input                  6.5260+-0.2503          6.3355+-0.1992         might be 1.0301x faster

   <arithmetic>                           6.3849+-0.0276    ?     6.4195+-0.0282       ?
   <geometric>                            5.2580+-0.0177    ?     5.2671+-0.0246       ?
   <harmonic>                             4.3333+-0.0238          4.3310+-0.0326       

                                            TipOfTree             LiveRangeSplit                                 
V8:
   crypto                                71.7616+-1.0628         71.1205+-0.3344       
   deltablue                            228.1658+-0.7650    ?   231.5297+-2.8410       ? might be 1.0147x slower
   earley-boyer                          89.8243+-0.5228    ?    89.8258+-0.5621       ?
   raytrace                              62.2547+-0.7894    ?    63.6586+-0.9188       ? might be 1.0226x slower
   regexp                               106.3893+-1.0650        105.9907+-1.6466       
   richards                             198.1538+-0.8433    !   200.9998+-1.5931       ! definitely 1.0144x slower
   splay                                 96.5465+-0.8717    ^    95.0097+-0.4065       ^ definitely 1.0162x faster

   <arithmetic>                         121.8709+-0.3295    ?   122.5907+-0.6098       ?
   <geometric>                          109.2908+-0.4012    ?   109.6402+-0.4618       ?
   <harmonic>                            99.5289+-0.4996    ?    99.7573+-0.4226       ?

                                            TipOfTree             LiveRangeSplit                                 
Kraken:
   ai-astar                             499.2630+-5.3220        495.1316+-4.8329       
   audio-beat-detection                 207.5893+-2.3626        205.4705+-1.9525         might be 1.0103x faster
   audio-dft                            425.4155+-5.1167    ?   439.0691+-16.9925      ? might be 1.0321x slower
   audio-fft                            139.5907+-0.8042    ?   141.1493+-1.4043       ? might be 1.0112x slower
   audio-oscillator                     257.4750+-3.5507    ?   257.8873+-2.2155       ?
   imaging-darkroom                     440.2766+-4.1131    ^   426.5286+-6.2953       ^ definitely 1.0322x faster
   imaging-desaturate                   224.8910+-1.3828    ?   228.9449+-7.2459       ? might be 1.0180x slower
   imaging-gaussian-blur                586.4749+-4.4782        586.4513+-4.4580       
   json-parse-financial                  48.5181+-0.6663    ?    48.7977+-0.7105       ?
   json-stringify-tinderbox              68.7911+-0.6904    ^    67.5631+-0.5023       ^ definitely 1.0182x faster
   stanford-crypto-aes                  134.0193+-1.6182    ?   135.1391+-0.7355       ?
   stanford-crypto-ccm                  105.1607+-0.7178    ^   103.1710+-0.7521       ^ definitely 1.0193x faster
   stanford-crypto-pbkdf2               202.8855+-0.9739    ^   200.2139+-0.8190       ^ definitely 1.0133x faster
   stanford-crypto-sha256-iterative      84.8164+-0.9626    ?    85.4529+-1.3834       ?

   <arithmetic>                         244.6548+-0.6791        244.3550+-1.4587       
   <geometric>                          189.0306+-0.6238        188.7703+-0.9134       
   <harmonic>                           143.5915+-0.7115        143.3510+-0.7654       

                                            TipOfTree             LiveRangeSplit                                 
All benchmarks:
   <arithmetic>                          94.5590+-0.2062    ?    94.5960+-0.3798       ?
   <geometric>                           24.0152+-0.0550    ?    24.0395+-0.0798       ?
   <harmonic>                             7.6202+-0.0407          7.6162+-0.0558
Comment 14 Filip Pizlo 2011-09-27 22:31:31 PDT
Created attachment 108964 [details]
the patch

This is now performance neutral.  It turns out ConvertThis had wrong prediction logic.
Comment 15 Filip Pizlo 2011-09-27 22:57:38 PDT
Created attachment 108968 [details]
the patch

fixed merge
Comment 16 Filip Pizlo 2011-09-28 00:02:25 PDT
Updated numbers after merging with https://bugs.webkit.org/show_bug.cgi?id=68580

Conclusion: it's netural.


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"LiveRangeSplit" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 15 samples per benchmark/VM, with 5 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             LiveRangeSplit                                 
SunSpider:
   3d-cube                                7.6787+-0.3039          7.4757+-0.2208         might be 1.0271x faster
   3d-morph                               7.3829+-0.1321    ?     7.4831+-0.1374       ? might be 1.0136x slower
   3d-raytrace                            8.2394+-0.2402    ?     8.4595+-0.3011       ? might be 1.0267x slower
   access-binary-trees                    2.0714+-0.0659    ?     2.1538+-0.0820       ? might be 1.0398x slower
   access-fannkuch                        6.4854+-0.1156          6.4094+-0.1069         might be 1.0119x faster
   access-nbody                           3.6724+-0.0718          3.6551+-0.0791       
   access-nsieve                          2.6425+-0.0768          2.6057+-0.0687         might be 1.0141x faster
   bitops-3bit-bits-in-byte               1.6955+-0.0423    ?     1.7364+-0.0360       ? might be 1.0241x slower
   bitops-bits-in-byte                    2.7285+-0.0482    ?     2.7671+-0.0680       ? might be 1.0141x slower
   bitops-bitwise-and                     3.4507+-0.0774          3.3981+-0.0622         might be 1.0155x faster
   bitops-nsieve-bits                     5.4244+-0.1194    ?     5.4386+-0.1020       ?
   controlflow-recursive                  2.0497+-0.0537    ?     2.0617+-0.0333       ?
   crypto-aes                             6.6114+-0.1880    ?     6.6872+-0.1737       ? might be 1.0115x slower
   crypto-md5                             2.7668+-0.0747    ?     2.8010+-0.0668       ? might be 1.0124x slower
   crypto-sha1                            2.4589+-0.0671    ?     2.4827+-0.0555       ?
   date-format-tofte                     10.4541+-0.4384         10.0867+-0.2541         might be 1.0364x faster
   date-format-xparb                      9.6846+-0.2358          9.6279+-0.2417       
   math-cordic                            6.3572+-0.1142    ?     6.5219+-0.1184       ? might be 1.0259x slower
   math-partial-sums                      7.5899+-0.1042          7.5543+-0.1343       
   math-spectral-norm                     2.8221+-0.0534    ?     2.8413+-0.0906       ?
   regexp-dna                            10.9369+-0.2158         10.8010+-0.2099         might be 1.0126x faster
   string-base64                          6.0534+-0.1700          5.9570+-0.1248         might be 1.0162x faster
   string-fasta                           6.9345+-0.1592    ?     7.1482+-0.2413       ? might be 1.0308x slower
   string-tagcloud                       11.8804+-0.2679         11.8612+-0.2276       
   string-unpack-code                    21.3751+-0.4629    ?    21.3774+-0.3379       ?
   string-validate-input                  6.2531+-0.1936    ?     6.3781+-0.1630       ? might be 1.0200x slower

   <arithmetic>                           6.3731+-0.0272    ?     6.3758+-0.0352       ?
   <geometric>                            5.2286+-0.0153    ?     5.2477+-0.0216       ?
   <harmonic>                             4.2922+-0.0271    ?     4.3243+-0.0223       ?

                                            TipOfTree             LiveRangeSplit                                 
V8:
   crypto                                70.8935+-0.3890    ?    71.2029+-0.2827       ?
   deltablue                            229.9200+-0.9956    ^   227.5736+-0.7179       ^ definitely 1.0103x faster
   earley-boyer                          89.3518+-0.2992    !    90.3557+-0.1945       ! definitely 1.0112x slower
   raytrace                              62.5615+-0.4116    ?    62.8549+-0.3808       ?
   regexp                               104.0308+-0.2646        103.7363+-0.3711       
   richards                             198.6203+-0.7546    ?   198.9464+-0.4766       ?
   splay                                 90.9935+-0.4642    ?    92.3987+-1.6417       ? might be 1.0154x slower

   <arithmetic>                         120.9102+-0.2689    ?   121.0098+-0.2652       ?
   <geometric>                          107.9908+-0.2128    ?   108.3575+-0.2639       ?
   <harmonic>                            98.2147+-0.2025    !    98.7076+-0.2400       ! definitely 1.0050x slower

                                            TipOfTree             LiveRangeSplit                                 
Kraken:
   ai-astar                             490.2591+-3.2305    ?   490.4662+-2.2753       ?
   audio-beat-detection                 191.1469+-0.9371        189.4504+-1.0021       
   audio-dft                            278.5623+-2.1923    ?   278.9677+-1.9053       ?
   audio-fft                            127.7110+-0.9664    ?   127.7116+-0.8813       ?
   audio-oscillator                     256.5281+-1.6728    ?   267.7998+-12.2863      ? might be 1.0439x slower
   imaging-darkroom                     419.4694+-1.8256        418.3243+-0.7219       
   imaging-desaturate                   222.7996+-0.6588    ?   223.5182+-0.7306       ?
   imaging-gaussian-blur                578.0480+-1.4034    !   582.3310+-2.1179       ! definitely 1.0074x slower
   json-parse-financial                  47.6611+-0.1951    ?    47.8559+-0.2437       ?
   json-stringify-tinderbox              69.0243+-0.3084    ^    68.2484+-0.1993       ^ definitely 1.0114x faster
   stanford-crypto-aes                  130.2373+-1.3560        128.9448+-1.1310         might be 1.0100x faster
   stanford-crypto-ccm                  102.6255+-0.4921    ^    99.9854+-0.7414       ^ definitely 1.0264x faster
   stanford-crypto-pbkdf2               194.7292+-1.5753        193.4078+-0.6359       
   stanford-crypto-sha256-iterative      85.3745+-0.4286         84.7869+-0.2958       

   <arithmetic>                         228.1555+-0.4259    ?   228.6999+-0.9140       ?
   <geometric>                          178.6563+-0.2998        178.4558+-0.5838       
   <harmonic>                           138.6943+-0.2527        138.1404+-0.3229       

                                            TipOfTree             LiveRangeSplit                                 
All benchmarks:
   <arithmetic>                          89.4946+-0.1462    ?    89.6731+-0.3029       ?
   <geometric>                           23.4997+-0.0463    ?    23.5510+-0.0669       ?
   <harmonic>                             7.5445+-0.0465    ?     7.5992+-0.0383       ?
Comment 17 Filip Pizlo 2011-09-28 22:14:07 PDT
Created attachment 109121 [details]
the patch - add 32_64

Added, and tested, support for 32_64.
Comment 18 Filip Pizlo 2011-09-29 16:17:04 PDT
Landed in r96375.