Bug 69690 - DFG does not have flow-sensitive intraprocedural control flow analysis
Summary: DFG does not have flow-sensitive intraprocedural control flow analysis
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-07 20:36 PDT by Filip Pizlo
Modified: 2011-10-11 19:05 PDT (History)
5 users (show)

See Also:


Attachments
work in progress (118.13 KB, patch)
2011-10-07 20:44 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress - fix style, mostly (117.58 KB, patch)
2011-10-07 20:54 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress - better performance (120.10 KB, patch)
2011-10-07 23:00 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress - it's now pure win (133.94 KB, patch)
2011-10-08 00:33 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (150.64 KB, patch)
2011-10-08 00:45 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (150.66 KB, patch)
2011-10-08 00:49 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (162.27 KB, patch)
2011-10-08 13:00 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - merge with 32_64 changes (162.40 KB, patch)
2011-10-08 13:59 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - more perf improvements (164.39 KB, patch)
2011-10-08 15:30 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - merged with ByVal changes (164.91 KB, patch)
2011-10-09 13:40 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - address Gavin's most recent comments (165.31 KB, patch)
2011-10-10 15:48 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - merged up to ToT (165.99 KB, patch)
2011-10-10 16:21 PDT, Filip Pizlo
barraclough: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-10-07 20:36:48 PDT
Currently the DFG will reperform certain type and safety checks because it forgot that it had already performed them in a previous basic block.

It also sometimes forgets that it had performed checks in the same basic block, if those checks are not captured by either CSE or DataFormat.

One way to ensure that checks are precisely eliminated if all possible paths up to that check already contain that same check (or a stricter check) followed by operations that cannot change the property being checked.  Ascertaining this can be done with a flow-sensitive intraprocedural control flow analysis (CFA), which the DFG currently does not have.
Comment 1 Filip Pizlo 2011-10-07 20:44:35 PDT
Created attachment 110248 [details]
work in progress

Still working on it.  Haven't measured performance yet.  Haven't implemented 32_64 support yet.
Comment 2 WebKit Review Bot 2011-10-07 20:47:56 PDT
Attachment 110248 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGAbstractValue.h:119:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/dfg/DFGAbstractValue.h:204:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/dfg/DFGAbstractValue.h:253:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:95:  The parameter name "codeBlock" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:95:  The parameter name "graph" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:106:  The parameter name "graph" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:111:  The parameter name "basicBlock" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:134:  The parameter name "mergeMode" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:147:  The parameter name "nodeIndex" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:162:  The parameter name "graph" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:162:  The parameter name "basicBlock" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.h:171:  The parameter name "nodeIndex" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGOSREntry.cpp:89:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 14 in 22 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Filip Pizlo 2011-10-07 20:52:12 PDT
This looks like it still needs work, but even now, it's a speed-up on V8.



Benchmark report for V8.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"CFA" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark
iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals
in milliseconds.

                       TipOfTree                  CFA                                       

crypto              71.8876+-0.2979    !    73.8087+-0.2691       ! definitely 1.0267x slower
deltablue          226.0413+-2.3269    ?   226.1569+-2.0996       ?
earley-boyer        90.5814+-0.4154    !    91.8870+-0.3324       ! definitely 1.0144x slower
raytrace            58.4061+-0.3131    !    59.3055+-0.3795       ! definitely 1.0154x slower
regexp             102.7642+-0.2884    ?   103.4826+-0.4524       ?
richards           204.4922+-0.8884    ^   175.1322+-0.3993       ^ definitely 1.1676x faster
splay               95.6563+-0.7585    ?    96.4660+-0.2593       ?

<arithmetic>       121.4042+-0.3672    ^   118.0341+-0.2739       ^ definitely 1.0286x faster
<geometric> *      108.1240+-0.2137    ^   106.8489+-0.1572       ^ definitely 1.0119x faster
<harmonic>          97.7987+-0.1951    ?    97.9595+-0.1685       ?
Comment 4 Filip Pizlo 2011-10-07 20:54:11 PDT
Created attachment 110250 [details]
work in progress - fix style, mostly
Comment 5 WebKit Review Bot 2011-10-07 20:56:48 PDT
Attachment 110250 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 22 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Filip Pizlo 2011-10-07 22:24:29 PDT
> crypto              71.8876+-0.2979    !    73.8087+-0.2691       ! definitely 1.0267x slower

This 2% appears to be entirely due to the CFA itself!  The CFA code needs to be tuned up.  I'll play around with making it faster.
Comment 7 Filip Pizlo 2011-10-07 23:00:21 PDT
Created attachment 110259 [details]
work in progress - better performance

I did two things:

- Fixed OSR entry. If a variable is dead then the abstract value should be TOP. Internally, the abstract state machine treats dead variables as BOTTOM. That's necessary for making the analysis precise.

- Changed the abstract structure value representation. Previously it was basically Vector<Structure*>. Now it's just Structure*, where sets larger than 1 are turned to TOP. This works great, since most structure sets only contain one element anyway.


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"CFA" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree                  CFA                                       
SunSpider:
   3d-cube                                7.3400+-0.2059          7.2108+-0.1398         might be 1.0179x faster
   3d-morph                               7.5851+-0.1400    ?     7.6501+-0.1387       ?
   3d-raytrace                            7.5752+-0.1830          7.5150+-0.1734       
   access-binary-trees                    1.7129+-0.0337    ?     1.7169+-0.0476       ?
   access-fannkuch                        6.2979+-0.0958    ?     6.5286+-0.1364       ? might be 1.0366x slower
   access-nbody                           3.4297+-0.0839    ?     3.5317+-0.0830       ? might be 1.0297x slower
   access-nsieve                          2.5318+-0.0516    ?     2.6022+-0.0786       ? might be 1.0278x slower
   bitops-3bit-bits-in-byte               1.6817+-0.0191    !     1.7452+-0.0250       ! definitely 1.0378x slower
   bitops-bits-in-byte                    2.7081+-0.0537    !     2.9258+-0.1245       ! definitely 1.0804x slower
   bitops-bitwise-and                     3.4372+-0.2532          3.3606+-0.0894         might be 1.0228x faster
   bitops-nsieve-bits                     5.4391+-0.0644          5.3109+-0.1139         might be 1.0241x faster
   controlflow-recursive                  2.0508+-0.0375          2.0316+-0.0499       
   crypto-aes                             6.5082+-0.1513    ?     6.7630+-0.1750       ? might be 1.0392x slower
   crypto-md5                             2.7830+-0.0851    ?     2.8322+-0.0706       ? might be 1.0177x slower
   crypto-sha1                            2.4498+-0.0714    ?     2.5556+-0.0867       ? might be 1.0432x slower
   date-format-tofte                      9.8098+-0.2208    ?     9.9895+-0.1587       ? might be 1.0183x slower
   date-format-xparb                      9.4492+-0.1745    !     9.8804+-0.2001       ! definitely 1.0456x slower
   math-cordic                            6.3737+-0.1416          6.3125+-0.0656       
   math-partial-sums                      7.5554+-0.1169          7.5150+-0.1276       
   math-spectral-norm                     2.7905+-0.0655    ?     2.8601+-0.0784       ? might be 1.0250x slower
   regexp-dna                            10.6186+-0.1482         10.5608+-0.1658       
   string-base64                          5.1063+-0.0828    ?     5.1827+-0.0692       ? might be 1.0150x slower
   string-fasta                           6.3348+-0.1140    ?     6.4154+-0.1134       ? might be 1.0127x slower
   string-tagcloud                       11.1755+-0.2087    ?    11.2236+-0.2715       ?
   string-unpack-code                    21.0133+-0.2257    !    21.6342+-0.2337       ! definitely 1.0295x slower
   string-validate-input                  6.2811+-0.1612          6.2791+-0.1612       

   <arithmetic> *                         6.1553+-0.0194    !     6.2359+-0.0193       ! definitely 1.0131x slower
   <geometric>                            5.0469+-0.0176    !     5.1180+-0.0158       ! definitely 1.0141x slower
   <harmonic>                             4.1371+-0.0257    !     4.2074+-0.0236       ! definitely 1.0170x slower

                                            TipOfTree                  CFA                                       
V8:
   crypto                                71.6115+-0.3106    !    72.6151+-0.5762       ! definitely 1.0140x slower
   deltablue                            227.0336+-3.5653        223.0970+-2.1287         might be 1.0176x faster
   earley-boyer                          90.4309+-0.3425    ?    90.9446+-0.2832       ?
   raytrace                              57.5184+-0.4610         57.4722+-0.2783       
   regexp                               102.7896+-0.3338    ?   102.8251+-0.5877       ?
   richards                             205.1041+-1.1592    ^   175.0092+-0.6119       ^ definitely 1.1720x faster
   splay                                 95.5634+-0.6248    ?    95.6634+-0.7130       ?

   <arithmetic>                         121.4359+-0.5050    ^   116.8038+-0.3613       ^ definitely 1.0397x faster
   <geometric> *                        107.9023+-0.2225    ^   105.5267+-0.2425       ^ definitely 1.0225x faster
   <harmonic>                            97.3732+-0.1490    ^    96.4890+-0.1988       ^ definitely 1.0092x faster

                                            TipOfTree                  CFA                                       
Kraken:
   ai-astar                             493.1064+-3.9254        492.3566+-3.0941       
   audio-beat-detection                 191.3132+-0.6433    ?   191.4514+-0.7850       ?
   audio-dft                            265.4996+-2.7456    ^   260.4941+-1.8597       ^ definitely 1.0192x faster
   audio-fft                            125.3307+-0.5836    ?   126.1825+-1.9849       ?
   audio-oscillator                     251.3514+-1.9030        249.7558+-2.0144       
   imaging-darkroom                     410.4260+-0.8666    !   413.6212+-1.1919       ! definitely 1.0078x slower
   imaging-desaturate                   230.1417+-0.5040    ^   216.3676+-0.4729       ^ definitely 1.0637x faster
   imaging-gaussian-blur                578.7784+-2.2247    ^   568.6758+-2.3888       ^ definitely 1.0178x faster
   json-parse-financial                  54.4313+-0.2379    ?    54.5051+-0.2106       ?
   json-stringify-tinderbox              69.6600+-0.8972    ^    68.1390+-0.2398       ^ definitely 1.0223x faster
   stanford-crypto-aes                  131.3154+-2.2640        130.2981+-1.6271       
   stanford-crypto-ccm                   99.8329+-0.9814    !   101.7389+-0.8432       ! definitely 1.0191x slower
   stanford-crypto-pbkdf2               187.8078+-0.7526    !   191.5040+-1.2973       ! definitely 1.0197x slower
   stanford-crypto-sha256-iterative      69.6199+-0.2190    ?    69.9535+-0.3764       ?

   <arithmetic> *                       225.6153+-0.5117    ^   223.9317+-0.4806       ^ definitely 1.0075x faster
   <geometric>                          176.2934+-0.4303    ^   175.3365+-0.4004       ^ definitely 1.0055x faster
   <harmonic>                           137.8627+-0.3810        137.4240+-0.3734       

                                            TipOfTree                  CFA                                       
All benchmarks:
   <arithmetic>                          88.6958+-0.1882    ^    87.5490+-0.1577       ^ definitely 1.0131x faster
   <geometric>                           22.9502+-0.0535    ?    23.0146+-0.0492       ?
   <harmonic>                             7.2778+-0.0442    !     7.3970+-0.0406       ! definitely 1.0164x slower

                                            TipOfTree                  CFA                                       
Geomean of preferred means:
   <scaled-result>                       53.1147+-0.0841    ^    52.8190+-0.0781       ^ definitely 1.0056x faster
Comment 8 WebKit Review Bot 2011-10-07 23:03:52 PDT
Attachment 110259 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGAbstractValue.h:96:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Source/JavaScriptCore/dfg/DFGAbstractState.cpp:48:  Should have a space between // and comment  [whitespace/comments] [4]
Source/JavaScriptCore/dfg/DFGAbstractState.cpp:65:  Should have a space between // and comment  [whitespace/comments] [4]
Source/JavaScriptCore/dfg/DFGAbstractState.cpp:82:  Should have a space between // and comment  [whitespace/comments] [4]
Source/JavaScriptCore/dfg/DFGAbstractState.cpp:123:  Should have a space between // and comment  [whitespace/comments] [4]
Total errors found: 6 in 22 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Filip Pizlo 2011-10-08 00:33:34 PDT
Created attachment 110263 [details]
work in progress - it's now pure win

This is now pure win.  This required a number of changes:

- CFA merging requires interrogating the successors of basic blocks. This was previously a binary search. Now it's not.

- The CFA is now pseudo-worklist based. It visits basic blocks in bytecode order (which is almost program order, and almost topological order), leading to a high probability that a block will not be visited unless its predecessors have been visited. But it does not visit a block if since the last time it was visited nothing has happened that would change the block's outcome.

- Bunch of little tweaks to reduce the costs of various CFA operations.


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"CFA" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree                  CFA                                       
SunSpider:
   3d-cube                                7.1877+-0.1640    ?     7.2292+-0.1642       ?
   3d-morph                               7.5879+-0.1785          7.5121+-0.1356         might be 1.0101x faster
   3d-raytrace                            7.4148+-0.2026          7.3754+-0.1707       
   access-binary-trees                    1.7784+-0.0808          1.6926+-0.0562         might be 1.0507x faster
   access-fannkuch                        6.3277+-0.1237          6.3013+-0.1268       
   access-nbody                           3.5172+-0.0884          3.4572+-0.0941         might be 1.0174x faster
   access-nsieve                          2.5999+-0.0791          2.5825+-0.0877       
   bitops-3bit-bits-in-byte               1.7346+-0.0357          1.7115+-0.0327         might be 1.0135x faster
   bitops-bits-in-byte                    2.7539+-0.0720          2.7514+-0.0510       
   bitops-bitwise-and                     3.3173+-0.0725    ?     3.3754+-0.1000       ? might be 1.0175x slower
   bitops-nsieve-bits                     5.3589+-0.1009    ?     5.3927+-0.1290       ?
   controlflow-recursive                  2.0357+-0.0392    ?     2.0723+-0.0671       ? might be 1.0180x slower
   crypto-aes                             6.5654+-0.1839    ?     6.6289+-0.1744       ?
   crypto-md5                             2.7721+-0.0633    ?     2.8039+-0.0589       ? might be 1.0114x slower
   crypto-sha1                            2.4790+-0.1072    ?     2.4911+-0.0745       ?
   date-format-tofte                      9.8535+-0.2432    ?     9.9580+-0.2362       ? might be 1.0106x slower
   date-format-xparb                      9.4830+-0.1867    !    10.0138+-0.2298       ! definitely 1.0560x slower
   math-cordic                            6.3866+-0.1093    ?     6.5029+-0.1173       ? might be 1.0182x slower
   math-partial-sums                      7.6234+-0.1297          7.5099+-0.1406         might be 1.0151x faster
   math-spectral-norm                     2.8589+-0.0701          2.8198+-0.0660         might be 1.0139x faster
   regexp-dna                            10.5651+-0.2565         10.4409+-0.1349         might be 1.0119x faster
   string-base64                          5.0891+-0.1039    ?     5.2070+-0.1078       ? might be 1.0232x slower
   string-fasta                           6.4453+-0.1413          6.3143+-0.0977         might be 1.0207x faster
   string-tagcloud                       11.0358+-0.2692    ?    11.1644+-0.1814       ? might be 1.0117x slower
   string-unpack-code                    21.0071+-0.2760    ?    21.4615+-0.5199       ? might be 1.0216x slower
   string-validate-input                  6.2340+-0.1813    ?     6.3609+-0.1603       ? might be 1.0204x slower

   <arithmetic> *                         6.1543+-0.0155    ?     6.1973+-0.0322       ?
   <geometric>                            5.0642+-0.0192    ?     5.0771+-0.0246       ?
   <harmonic>                             4.1736+-0.0342          4.1643+-0.0360       

                                            TipOfTree                  CFA                                       
V8:
   crypto                                71.6260+-0.3327    ?    71.8717+-0.3132       ?
   deltablue                            226.0467+-1.9042    ^   221.1798+-1.0192       ^ definitely 1.0220x faster
   earley-boyer                          89.8473+-0.2783    !    90.9236+-0.3194       ! definitely 1.0120x slower
   raytrace                              57.7508+-0.2079    ?    58.0618+-0.2847       ?
   regexp                               102.4067+-0.2936    ?   103.1329+-0.4744       ?
   richards                             204.6559+-1.0139    ^   175.9648+-1.3114       ^ definitely 1.1631x faster
   splay                                 96.0001+-0.5870         95.7168+-0.2982       

   <arithmetic>                         121.1905+-0.2260    ^   116.6931+-0.2717       ^ definitely 1.0385x faster
   <geometric> *                        107.7836+-0.1350    ^   105.5294+-0.1963       ^ definitely 1.0214x faster
   <harmonic>                            97.3547+-0.1511    ^    96.5707+-0.1726       ^ definitely 1.0081x faster

                                            TipOfTree                  CFA                                       
Kraken:
   ai-astar                             491.8601+-2.3376        489.5730+-2.2160       
   audio-beat-detection                 190.6516+-0.7819    !   193.5275+-1.1763       ! definitely 1.0151x slower
   audio-dft                            268.0732+-3.6325    ^   259.3428+-1.9135       ^ definitely 1.0337x faster
   audio-fft                            124.9162+-0.4877        124.7628+-0.2686       
   audio-oscillator                     252.5326+-2.3092        249.5812+-1.6099         might be 1.0118x faster
   imaging-darkroom                     411.7627+-1.5812    ?   412.7726+-0.8839       ?
   imaging-desaturate                   229.9889+-0.7966    ^   216.8373+-0.8930       ^ definitely 1.0607x faster
   imaging-gaussian-blur                580.7792+-2.4572    ^   565.6115+-1.0252       ^ definitely 1.0268x faster
   json-parse-financial                  55.2379+-0.3316         55.1255+-0.5643       
   json-stringify-tinderbox              68.1584+-0.4261         67.6238+-0.1915       
   stanford-crypto-aes                  130.3048+-1.2905    ?   130.4318+-1.5252       ?
   stanford-crypto-ccm                   99.8189+-0.8939    !   102.0182+-0.9102       ! definitely 1.0220x slower
   stanford-crypto-pbkdf2               187.7075+-0.8696    !   190.3954+-1.0340       ! definitely 1.0143x slower
   stanford-crypto-sha256-iterative      70.1452+-0.4680    ?    70.3815+-0.4656       ?

   <arithmetic> *                       225.8527+-0.5194    ^   223.4275+-0.4581       ^ definitely 1.0109x faster
   <geometric>                          176.3357+-0.4920    ^   175.2288+-0.4434       ^ definitely 1.0063x faster
   <harmonic>                           137.8869+-0.4240        137.5821+-0.4486       

                                            TipOfTree                  CFA                                       
All benchmarks:
   <arithmetic>                          88.7294+-0.1670    ^    87.3610+-0.1625       ^ definitely 1.0157x faster
   <geometric>                           22.9916+-0.0565         22.9087+-0.0767       
   <harmonic>                             7.3403+-0.0588          7.3234+-0.0619       

                                            TipOfTree                  CFA                                       
Geomean of preferred means:
   <scaled-result>                       53.1110+-0.0676    ^    52.6706+-0.1069       ^ definitely 1.0084x faster
Comment 10 WebKit Review Bot 2011-10-08 00:36:03 PDT
Attachment 110263 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGAbstractValue.h:96:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 2 in 26 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Filip Pizlo 2011-10-08 00:45:28 PDT
Created attachment 110264 [details]
the patch

Got it to work with 32_64.
Comment 12 WebKit Review Bot 2011-10-08 00:47:36 PDT
Attachment 110264 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/ChangeLog:3:  Line contains tab character.  [whitespace/tab] [5]
Source/JavaScriptCore/ChangeLog:4:  Line contains tab character.  [whitespace/tab] [5]
Source/JavaScriptCore/ChangeLog:8:  Line contains tab character.  [whitespace/tab] [5]
Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 4 in 26 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Filip Pizlo 2011-10-08 00:49:30 PDT
Created attachment 110265 [details]
the patch

Oh, emacs, you and your tabs, which you like to put into ChangeLogs regardless of how many times I tell you not to.
Comment 14 WebKit Review Bot 2011-10-08 00:53:18 PDT
Attachment 110265 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 26 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Oliver Hunt 2011-10-08 11:28:40 PDT
Comment on attachment 110265 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110265&action=review

Should use a typedef for bool (*predictionCheck)(PredictedType), getBytecodeBegin shouldn't be taking an OwnPtr*, but it looks like that has just  been moved from one place to another.

Other than those style quirks I can't see any obvious problems.

I am slightly concerned with the potential maintenance problems caused by separating type flow logic from codegen, especially given the vast amount of code duplication we currently have :-(

I do think Gavin should also look at this though.

> Source/JavaScriptCore/dfg/DFGBasicBlock.h:59
> +    static inline BlockIndex getBytecodeBegin(OwnPtr<BasicBlock>* block)
> +    {
> +        return (*block)->bytecodeBegin;
> +    }

This seems wrong, there's no transfer of ownership intended here, you should just be passing BasicBlock*

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:350
> +void SpeculativeJIT::compileObjectEquality(Node& node, void* vptr, bool (*predictionCheck)(PredictedType))

Would be nice to have a typedef for bool (*predictionCheck)(PredictedType)
Comment 16 Filip Pizlo 2011-10-08 12:32:44 PDT
(In reply to comment #15)
> (From update of attachment 110265 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110265&action=review
> 
> Should use a typedef for bool (*predictionCheck)(PredictedType), getBytecodeBegin shouldn't be taking an OwnPtr*, but it looks like that has just  been moved from one place to another.
> 
> Other than those style quirks I can't see any obvious problems.
> 
> I am slightly concerned with the potential maintenance problems caused by separating type flow logic from codegen, especially given the vast amount of code duplication we currently have :-(

I know.  I don't think I want to do this in this patch, but I think I've got a solution for both this and register allocation: have named speculations (so just as we have an enum of node types, have an enum of speculation types), have attributes on nodes that describe what speculatiuons should be performed, have one phase in Propagator that deduces what those speculatins ought to be, and then while we're at it, have node attributes that specify what additional registers the codegen for a node will required.  And viola, we'll be able to do global register allocation and get rid of GenerationInfo entirely.

> 
> I do think Gavin should also look at this though.
> 
> > Source/JavaScriptCore/dfg/DFGBasicBlock.h:59
> > +    static inline BlockIndex getBytecodeBegin(OwnPtr<BasicBlock>* block)
> > +    {
> > +        return (*block)->bytecodeBegin;
> > +    }
> 
> This seems wrong, there's no transfer of ownership intended here, you should just be passing BasicBlock*

I think that this code is exactly as it should be.  This is one of those gross stubs that we need for doing binary searches over Vector<OwnPtr<BasicBlock>>.  The WTF::binarySearch() function takes a function pointer for getting the search key; for a vector of the form Vector<T> it takes a function of the form U (*)(T*), where U is the search key.  For Vector<OwnPtr<BasicBlock>> we have T = OwnPtr<BasicBlock>, so the function must be of the form U (*)(OwnPtr<BasicBlock>*).

The only way to change this is to make Vector<OwnPtr<BasicBlock>> be something else, but I quite like the idea of a vector of pointers for basic blocks, since there are few basic blocks and the basic block data structure is rather large.

So while this looks yucky, and you'd probably never want to call it directly, it's necessary to support binary searches of basic blocks by their bytecode index.

> 
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:350
> > +void SpeculativeJIT::compileObjectEquality(Node& node, void* vptr, bool (*predictionCheck)(PredictedType))
> 
> Would be nice to have a typedef for bool (*predictionCheck)(PredictedType)

Agree, will add.
Comment 17 Filip Pizlo 2011-10-08 13:00:39 PDT
Created attachment 110277 [details]
the patch

Added a typedef for bool (*)(PredictedType).  Removed bytecode/ActionablePrediction.h because it's unused.  Updated some build files (though most of them make no reference to either DFG, or to any headers, so did not need to change).
Comment 18 WebKit Review Bot 2011-10-08 13:03:44 PDT
Attachment 110277 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 Filip Pizlo 2011-10-08 13:59:35 PDT
Created attachment 110281 [details]
the patch - merge with 32_64 changes
Comment 20 WebKit Review Bot 2011-10-08 14:01:50 PDT
Attachment 110281 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Filip Pizlo 2011-10-08 15:30:24 PDT
Created attachment 110283 [details]
the patch - more perf improvements

- Did more tuning to the CFA itself. It now calls clobberStructures() less frequently. clobberStructures() is an expensive, but necessary, method. Likely any future work on improving CFA efficiency will focus on that method - how it's called, and what it does internally.

- Added better debugging support. Every graph dump will now print the CFA state in a concise and clear way.

- Fixed a bug where clobberStructures() was not being called for possibly-side-effecting operations like LogicalNot on questionable objects, and CompareXYZ on questionable objects.

- Did some investigation into what is going on in SunSpider. It turns out that date-format-xparb, among others, are still failing speculation like crazy. A lot of its is due to GetByVal not having a string property case. The exponential backoff in recompilation is doing its best, but since the benchmark is so short-running, any increase in DFG runtime will affect such short-running benchmarks that fail speculation systematically.

- Ran performance numbers on a separate machine, to find that the speed-up is smaller on my Mac Pro than it is on my MacBook Pro.  But it's a speed-up on both machines.

Current performance situation:

- Overall 0.5% speed-up in geomean of preferred means.

- Slight reproducible sub-1% slow-down on SunSpider

- Either a 2.4% or a 0.5% win on V8 depending on machine.

- 1% win on Kraken.

Recommendation: we should land this, since: it's already a net win, the losses on SunSpider are due to preexisting conditions for which we are currently uninsured, and this should only become a bigger win as we add more machinery to take advantage of it. (Like, we're not doing anything to take advantage of the always-double propagation that this thing is already doing, and I haven't yet implemented the is-constant propagation, which should be a win on function checks among other things.) Not to mention that classically, CFA is most profitable when combined with inlining.

Latest numbers on MacBook Pro:


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"CFA" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 30 samples per benchmark/VM, with 10 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree                  CFA                                       
SunSpider:
   3d-cube                                7.2512+-0.1027          7.2506+-0.1160       
   3d-morph                               7.6656+-0.0813    ?     7.7124+-0.0978       ?
   3d-raytrace                            7.3843+-0.0984    ?     7.4021+-0.0967       ?
   access-binary-trees                    1.7302+-0.0287          1.6781+-0.0239         might be 1.0310x faster
   access-fannkuch                        6.3504+-0.0672          6.3159+-0.0640       
   access-nbody                           3.4521+-0.0469    ?     3.4932+-0.0531       ? might be 1.0119x slower
   access-nsieve                          2.5600+-0.0356    ?     2.5885+-0.0470       ? might be 1.0111x slower
   bitops-3bit-bits-in-byte               1.7154+-0.0262    ?     1.7434+-0.0319       ? might be 1.0163x slower
   bitops-bits-in-byte                    2.7388+-0.0391    ?     2.7447+-0.0359       ?
   bitops-bitwise-and                     3.3050+-0.0546    ?     3.3606+-0.0589       ? might be 1.0168x slower
   bitops-nsieve-bits                     5.4385+-0.0653          5.4370+-0.0711       
   controlflow-recursive                  2.0667+-0.0212    ?     2.0812+-0.0208       ?
   crypto-aes                             6.6382+-0.1104    ?     6.6709+-0.1017       ?
   crypto-md5                             2.8355+-0.0466          2.8343+-0.0453       
   crypto-sha1                            2.4700+-0.0373    ?     2.5155+-0.0443       ? might be 1.0184x slower
   date-format-tofte                      9.9355+-0.0996    ?     9.9652+-0.1371       ?
   date-format-xparb                      9.5529+-0.1560    !     9.9465+-0.1131       ! definitely 1.0412x slower
   math-cordic                            6.2958+-0.0680    ?     6.3863+-0.0523       ? might be 1.0144x slower
   math-partial-sums                      7.5041+-0.0695    ?     7.5846+-0.0878       ? might be 1.0107x slower
   math-spectral-norm                     2.8198+-0.0398    ?     2.8726+-0.0465       ? might be 1.0187x slower
   regexp-dna                            10.6842+-0.0893         10.6180+-0.1092       
   string-base64                          5.2352+-0.0858    ?     5.2951+-0.0554       ? might be 1.0114x slower
   string-fasta                           6.3476+-0.0765    ?     6.4334+-0.0947       ? might be 1.0135x slower
   string-tagcloud                       11.0721+-0.1259    ?    11.1451+-0.1129       ?
   string-unpack-code                    20.9296+-0.2131    ?    21.3736+-0.3461       ? might be 1.0212x slower
   string-validate-input                  6.3714+-0.1157          6.3281+-0.0869       

   <arithmetic> *                         6.1673+-0.0147    !     6.2222+-0.0173       ! definitely 1.0089x slower
   <geometric>                            5.0674+-0.0157    !     5.1036+-0.0164       ! definitely 1.0072x slower
   <harmonic>                             4.1647+-0.0213    ?     4.1880+-0.0207       ?

                                            TipOfTree                  CFA                                       
V8:
   crypto                                71.9306+-0.2217         71.6923+-0.2342       
   deltablue                            224.0349+-1.1196    ^   220.8591+-0.8745       ^ definitely 1.0144x faster
   earley-boyer                          90.8330+-0.2471    ?    91.0550+-0.6016       ?
   raytrace                              58.6492+-0.1942    ^    57.5860+-0.1636       ^ definitely 1.0185x faster
   regexp                               103.2842+-0.5738    ?   103.6217+-0.3030       ?
   richards                             204.7647+-0.4575    ^   178.8069+-0.4249       ^ definitely 1.1452x faster
   splay                                 93.9206+-0.3154         93.8991+-0.3847       

   <arithmetic>                         121.0596+-0.1853    ^   116.7886+-0.1891       ^ definitely 1.0366x faster
   <geometric> *                        107.9179+-0.1498    ^   105.3890+-0.1611       ^ definitely 1.0240x faster
   <harmonic>                            97.7053+-0.1498    ^    96.2554+-0.1560       ^ definitely 1.0151x faster

                                            TipOfTree                  CFA                                       
Kraken:
   ai-astar                             491.8722+-1.8657        491.6102+-2.1346       
   audio-beat-detection                 191.3896+-0.6444    !   193.1956+-0.5294       ! definitely 1.0094x slower
   audio-dft                            265.5784+-1.5956    ^   261.3193+-2.1896       ^ definitely 1.0163x faster
   audio-fft                            124.9334+-0.1853    ?   125.4338+-0.4450       ?
   audio-oscillator                     251.5153+-1.0876        250.4084+-1.0322       
   imaging-darkroom                     412.2875+-0.8770    ?   412.7251+-0.8786       ?
   imaging-desaturate                   229.8157+-0.4044    ^   216.2189+-0.3764       ^ definitely 1.0629x faster
   imaging-gaussian-blur                579.9277+-0.8575    ^   567.2564+-0.9921       ^ definitely 1.0223x faster
   json-parse-financial                  53.8428+-0.2035    ?    54.0464+-0.2093       ?
   json-stringify-tinderbox              67.6931+-0.2539    ?    67.7358+-0.2734       ?
   stanford-crypto-aes                  130.2188+-0.9518        129.4420+-0.8691       
   stanford-crypto-ccm                   99.4847+-0.3260    !   101.1548+-0.4164       ! definitely 1.0168x slower
   stanford-crypto-pbkdf2               187.6499+-0.8004    ?   188.8759+-1.0071       ?
   stanford-crypto-sha256-iterative      69.9878+-0.1863         69.6543+-0.1668       

   <arithmetic> *                       225.4426+-0.3118    ^   223.5055+-0.3132       ^ definitely 1.0087x faster
   <geometric>                          175.7162+-0.2682    ^   174.8010+-0.2755       ^ definitely 1.0052x faster
   <harmonic>                           136.9772+-0.2368        136.7825+-0.2282       

                                            TipOfTree                  CFA                                       
All benchmarks:
   <arithmetic>                          88.5950+-0.1068    ^    87.4122+-0.1004       ^ definitely 1.0135x faster
   <geometric>                           22.9797+-0.0474         22.9534+-0.0477       
   <harmonic>                             7.3245+-0.0367    ?     7.3629+-0.0354       ?

                                            TipOfTree                  CFA                                       
Geomean of preferred means:
   <scaled-result>                       53.1381+-0.0625    ^    52.7237+-0.0651       ^ definitely 1.0079x faster

Latest numbers on Mac Pro:


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc
"CFA" at /Volumes/Data/fromMiniMe/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                           TipOfTree                  CFA                                       
SunSpider:
  3d-cube                                7.9004+-0.0265    !     7.9902+-0.0335       ! definitely 1.0114x slower
  3d-morph                               8.4615+-0.1076    ?     8.4887+-0.1116       ?
  3d-raytrace                            8.0821+-0.1095          8.0724+-0.0618       
  access-binary-trees                    1.7927+-0.0033          1.7806+-0.0161       
  access-fannkuch                        7.6813+-0.0148    !     7.9859+-0.0254       ! definitely 1.0397x slower
  access-nbody                           4.1641+-0.0068    !     4.1778+-0.0037       ! definitely 1.0033x slower
  access-nsieve                          3.1422+-0.0032    !     3.1584+-0.0066       ! definitely 1.0052x slower
  bitops-3bit-bits-in-byte               1.7320+-0.0103    ?     1.7441+-0.0089       ?
  bitops-bits-in-byte                    5.2152+-0.0453    ?     5.2206+-0.0533       ?
  bitops-bitwise-and                     3.4118+-0.0640          3.3993+-0.0644       
  bitops-nsieve-bits                     5.6763+-0.0392          5.6438+-0.0351       
  controlflow-recursive                  2.2564+-0.0244          2.2509+-0.0028       
  crypto-aes                             6.9132+-0.0689    !     7.0664+-0.0630       ! definitely 1.0222x slower
  crypto-md5                             3.0181+-0.0343    ?     3.0585+-0.0331       ? might be 1.0134x slower
  crypto-sha1                            2.7090+-0.0109    !     2.7557+-0.0108       ! definitely 1.0172x slower
  date-format-tofte                     10.6874+-0.0850    ?    10.7886+-0.1017       ?
  date-format-xparb                     10.7804+-0.2653    ?    11.2082+-0.1793       ? might be 1.0397x slower
  math-cordic                            7.2695+-0.0506          7.2238+-0.0219       
  math-partial-sums                     10.4781+-0.0501         10.4767+-0.0339       
  math-spectral-norm                     3.2288+-0.0168    !     3.2505+-0.0036       ! definitely 1.0067x slower
  regexp-dna                            12.4863+-0.1218    ?    12.5624+-0.1253       ?
  string-base64                          5.4727+-0.0648    ?     5.5304+-0.0697       ? might be 1.0105x slower
  string-fasta                           7.3632+-0.0576          7.3506+-0.0232       
  string-tagcloud                       12.8958+-0.0586         12.8081+-0.0688       
  string-unpack-code                    23.9726+-0.1714    ?    24.2456+-0.1831       ? might be 1.0114x slower
  string-validate-input                  6.8011+-0.1062          6.7479+-0.0643       

  <arithmetic> *                         7.0612+-0.0244    !     7.1149+-0.0199       ! definitely 1.0076x slower
  <geometric>                            5.7812+-0.0155    !     5.8176+-0.0134       ! definitely 1.0063x slower
  <harmonic>                             4.6961+-0.0111    ?     4.7189+-0.0120       ?

                                           TipOfTree                  CFA                                       
V8:
  crypto                                80.3328+-0.2637         79.9557+-0.1580       
  deltablue                            252.4219+-1.5655    ?   253.0487+-1.3899       ?
  earley-boyer                         110.3867+-0.3100    !   111.4466+-0.2599       ! definitely 1.0096x slower
  raytrace                              66.3575+-0.3105    ^    64.6593+-0.1899       ^ definitely 1.0263x faster
  regexp                               123.6966+-0.5882        123.6552+-0.5325       
  richards                             221.5944+-0.9748    ^   215.4157+-1.0238       ^ definitely 1.0287x faster
  splay                                122.9378+-0.7662    ?   124.5312+-0.8843       ? might be 1.0130x slower

  <arithmetic>                         139.6754+-0.4149        138.9589+-0.3738       
  <geometric> *                        125.8663+-0.2769    ^   125.2510+-0.2812       ^ definitely 1.0049x faster
  <harmonic>                           114.2437+-0.1976    ^   113.5287+-0.2205       ^ definitely 1.0063x faster

                                           TipOfTree                  CFA                                       
Kraken:
  ai-astar                             832.8773+-0.4054        825.4636+-11.7787      
  audio-beat-detection                 214.1596+-1.7669        213.3784+-0.9437       
  audio-dft                            271.3054+-4.2497    ^   260.8641+-3.0615       ^ definitely 1.0400x faster
  audio-fft                            135.4391+-0.1603    !   139.2097+-1.1166       ! definitely 1.0278x slower
  audio-oscillator                     293.9338+-3.0579        292.1754+-1.9142       
  imaging-darkroom                     486.8431+-4.6328        484.2544+-3.7257       
  imaging-desaturate                   244.8629+-0.2524    ^   237.8543+-0.0918       ^ definitely 1.0295x faster
  imaging-gaussian-blur                641.8301+-0.2648    ^   611.2323+-0.2040       ^ definitely 1.0501x faster
  json-parse-financial                  68.4385+-0.1365    ?    68.6702+-0.2375       ?
  json-stringify-tinderbox              79.7739+-0.3035    ?    80.1401+-0.2522       ?
  stanford-crypto-aes                  152.6437+-1.2898        151.5830+-1.4340       
  stanford-crypto-ccm                  116.1246+-0.7775    ?   117.1616+-0.5738       ?
  stanford-crypto-pbkdf2               235.4855+-1.7872        232.6827+-1.6888         might be 1.0120x faster
  stanford-crypto-sha256-iterative      85.6205+-0.2548    ?    86.0403+-0.3846       ?

  <arithmetic> *                       275.6670+-0.6405    ^   271.4793+-0.9585       ^ definitely 1.0154x faster
  <geometric>                          208.1917+-0.4363    ^   206.5505+-0.5262       ^ definitely 1.0079x faster
  <harmonic>                           162.2625+-0.2764        162.1470+-0.4291       

                                           TipOfTree                  CFA                                       
All benchmarks:
  <arithmetic>                         106.8225+-0.2224    ^   105.4981+-0.2833       ^ definitely 1.0126x faster
  <geometric>                           26.6011+-0.0546    ?    26.6112+-0.0438       ?
  <harmonic>                             8.2687+-0.0192    ?     8.3073+-0.0206       ?

                                           TipOfTree                  CFA                                       
Geomean of preferred means:
  <scaled-result>                       62.5735+-0.1251    ^    62.3100+-0.0910       ^ definitely 1.0042x faster
Comment 22 WebKit Review Bot 2011-10-08 15:31:50 PDT
Attachment 110283 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 23 Filip Pizlo 2011-10-09 13:40:49 PDT
Created attachment 110310 [details]
the patch - merged with ByVal changes
Comment 24 WebKit Review Bot 2011-10-09 13:43:09 PDT
Attachment 110310 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 25 Gavin Barraclough 2011-10-10 10:03:04 PDT
Comment on attachment 110310 [details]
the patch - merged with ByVal changes

View in context: https://bugs.webkit.org/attachment.cgi?id=110310&action=review

1/2 way through the review!

> Source/JavaScriptCore/bytecode/PredictedType.cpp:47
> +    bool isTop = true;

Could we not just test a mask & early return here?

> Source/JavaScriptCore/dfg/DFGAbstractState.cpp:34
> +namespace JSC { namespace DFG {

We normally only use ENABLE_* definitions in Platform.h (because this is included everywhere, and the way the macro works means a missed include won't result in a compile error).
It would be better to make this a regular define & #if FOO check.

> Source/JavaScriptCore/dfg/DFGAbstractState.cpp:63
> +    PROFILE(17);

Not a big issue, but it would probably be better to define these values rather than use magic numbers through the code.
Comment 26 Gavin Barraclough 2011-10-10 10:03:17 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=110310&action=review

1/2 way through the review!

> Source/JavaScriptCore/bytecode/PredictedType.cpp:47
> +    bool isTop = true;

Could we not just test a mask & early return here?

> Source/JavaScriptCore/dfg/DFGAbstractState.cpp:34
> +namespace JSC { namespace DFG {

We normally only use ENABLE_* definitions in Platform.h (because this is included everywhere, and the way the macro works means a missed include won't result in a compile error).
It would be better to make this a regular define & #if FOO check.

> Source/JavaScriptCore/dfg/DFGAbstractState.cpp:63
> +    PROFILE(17);

Not a big issue, but it would probably be better to define these values rather than use magic numbers through the code.
Comment 27 Filip Pizlo 2011-10-10 11:39:01 PDT
(In reply to comment #26)
> View in context: https://bugs.webkit.org/attachment.cgi?id=110310&action=review
> 
> 1/2 way through the review!
> 
> > Source/JavaScriptCore/bytecode/PredictedType.cpp:47
> > +    bool isTop = true;
> 
> Could we not just test a mask & early return here?

That mask would be ever-changing and we'd have to separately maintain it, as we'd have to construct it manually by saying something like:

PredictFinalObject | PredictArray | PredictObjectOther | PredictString | PredictCellOther | PredictInt32 | PredictDouble | PredictBoolean | PredictOther

The reason why PredictTop doesn't do the job is that currently a top value will not have reserved bits set because nobody sets them, unless they use PredictTop directly (which happens, but rarely).  For example, we leave 0x0004 and 0x0008, and 0x0020 for additional object types.  Yet more bits are reserved if we ever wanted to get more specific about PredictOther (currently it means null-or-undefined but there is no separate null prediction or undefined prediction) or numbers (like int-represented-as-double).

What I'm trying to do is to avoid a situation where the PredictTop mask would have to be changed to include newly used, previously reserved, bits every time we add support for predicting some new type.

Fortunately, it's almost never the case that someone wants to know if a PredictedType is top.  It only comes into play in the ToString function.  That function already has to have an "if (foo & PredictBar)" statement for every type that it's capable of printing.  So it was natural to have an else statement that clears isTop, which gives a more easy to maintain approach to the top madness.

> 
> > Source/JavaScriptCore/dfg/DFGAbstractState.cpp:34
> > +namespace JSC { namespace DFG {
> 
> We normally only use ENABLE_* definitions in Platform.h (because this is included everywhere, and the way the macro works means a missed include won't result in a compile error).
> It would be better to make this a regular define & #if FOO check.

OK.

> 
> > Source/JavaScriptCore/dfg/DFGAbstractState.cpp:63
> > +    PROFILE(17);
> 
> Not a big issue, but it would probably be better to define these values rather than use magic numbers through the code.

Will do.
Comment 28 Geoffrey Garen 2011-10-10 12:26:20 PDT
> I know.  I don't think I want to do this in this patch, but I think I've got a solution for both this and register allocation: have named speculations (so just as we have an enum of node types, have an enum of speculation types), have attributes on nodes that describe what speculatiuons should be performed, have one phase in Propagator that deduces what those speculatins ought to be, and then while we're at it, have node attributes that specify what additional registers the codegen for a node will required.  And viola, we'll be able to do global register allocation and get rid of GenerationInfo entirely.

Not sure if this helps, but another possible option is to add explicit nodes for each speculation. We didn't do this originally because it made the graph not reusable for the non-speculative path, but now there is no non-speculative path.

> > > +    static inline BlockIndex getBytecodeBegin(OwnPtr<BasicBlock>* block)
> > > +    {
> > > +        return (*block)->bytecodeBegin;
> > > +    }

RE this vs binarySearch, perhaps the binarySearch template could be modified to pass T& instead of T* to child functions.
Comment 29 Filip Pizlo 2011-10-10 12:31:43 PDT
(In reply to comment #28)
> > I know.  I don't think I want to do this in this patch, but I think I've got a solution for both this and register allocation: have named speculations (so just as we have an enum of node types, have an enum of speculation types), have attributes on nodes that describe what speculatiuons should be performed, have one phase in Propagator that deduces what those speculatins ought to be, and then while we're at it, have node attributes that specify what additional registers the codegen for a node will required.  And viola, we'll be able to do global register allocation and get rid of GenerationInfo entirely.
> 
> Not sure if this helps, but another possible option is to add explicit nodes for each speculation. We didn't do this originally because it made the graph not reusable for the non-speculative path, but now there is no non-speculative path.

This would be a performance regression.  Our boolean speculations are done as part of testing the boolean, and often without using SpeculateBooleanOperand.  And then we have tests for object-or-undefined-or-null, where the speculation checks are embedded in the control flow of the node itself.

As the DFG gets more powerful, our nodes will have more sophisticated execution semantics, and our optimizations will have to have more sophisticated modeling of those execution semantics.  Better to just live with it.  I've seen compiler hackers go down the path of trying to modularize this aspect of compilation and the results are invariably uglier than what we have.  You either have to introduce new opcodes for each micro operation, but then you get performance regressions unless you also introduce more basic blocks (we could model the object-or-undefined speculation by blowing each node that does that into a control flow diamond).  Or you can start to create a formal specification for your nodes.  But that fails because the formal specification becomes more hairy and complicated than the compiler you were going to write in the first place.

> 
> > > > +    static inline BlockIndex getBytecodeBegin(OwnPtr<BasicBlock>* block)
> > > > +    {
> > > > +        return (*block)->bytecodeBegin;
> > > > +    }
> 
> RE this vs binarySearch, perhaps the binarySearch template could be modified to pass T& instead of T* to child functions.

That would still make the style bot unhappy...
Comment 30 Gavin Barraclough 2011-10-10 15:06:01 PDT
Comment on attachment 110310 [details]
the patch - merged with ByVal changes

View in context: https://bugs.webkit.org/attachment.cgi?id=110310&action=review

couple more comments

> Source/JavaScriptCore/dfg/DFGAbstractValue.h:63
> +            m_structure = reinterpret_cast<Structure*>(1);

The magic value '1' should be named, either behind a static const or an ifdef.

> Source/JavaScriptCore/dfg/DFGAbstractValue.h:235
> +        if (reinterpret_cast<uintptr_t>(m_structure) <= 1)

This should use 'isNeitherClearNorTop'.

> Source/JavaScriptCore/dfg/DFGAbstractValue.h:249
> +    bool isNeitherClearNorTop() const { return reinterpret_cast<uintptr_t>(m_structure) > 1; }

It would probably make the code more understandable to negate this - make this method isClearOrTop(), negate uses.
Comment 31 Filip Pizlo 2011-10-10 15:48:11 PDT
Created attachment 110424 [details]
the patch - address Gavin's most recent comments
Comment 32 Filip Pizlo 2011-10-10 16:21:55 PDT
Created attachment 110430 [details]
the patch - merged up to ToT
Comment 33 WebKit Review Bot 2011-10-10 16:25:55 PDT
Attachment 110430 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGBasicBlock.h:56:  The parameter type should use PassOwnPtr instead of OwnPtr.  [readability/pass_ptr] [5]
Total errors found: 1 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 34 Gavin Barraclough 2011-10-11 16:45:44 PDT
Comment on attachment 110430 [details]
the patch - merged up to ToT

View in context: https://bugs.webkit.org/attachment.cgi?id=110430&action=review

r+.  I still don't think AbstractValue is really descriptive enough, but I don't have a better name to suggest.

> Source/JavaScriptCore/dfg/DFGAbstractValue.h:312
> +struct AbstractValue {

AbstractValue doesn't seem to be the right name for this class.  This struct can tell you a type, not a value - e.g. "int", or "object with properties x, y", not '3' or '"Hello, world"'.
This naming is confusing in it's uses, e.g. in GetLocal.

> Source/JavaScriptCore/dfg/DFGNode.h:715
> +    unsigned takenBytecodeOffsetDuringParsing()

This change could probably have been made as a separate patch, breaking up this review a bit would have helped! - probably no point doing so now though.
It's a bit unfortunate that the meaning of the branch's opInfos changes over time, it would be better if we could guard against error here.
We could consider extra state in debug builds that we can ASSERT on, or we could change reflect this in the node types (have a BranchDuringParse/JumpDuringParse that are updated as the successors are set, and never reach generation).
Comment 35 Filip Pizlo 2011-10-11 19:05:36 PDT
Landed in r97218.