Bug 68316 - DFG JIT does not have full block-local CSE
Summary: DFG JIT does not have full block-local CSE
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-17 14:01 PDT by Filip Pizlo
Modified: 2011-09-17 20:46 PDT (History)
5 users (show)

See Also:


Attachments
work in progress (42.74 KB, patch)
2011-09-17 14:08 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress - fixed clobber detection (44.03 KB, patch)
2011-09-17 14:25 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
fix style (44.04 KB, patch)
2011-09-17 14:30 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress - fixed crypto slow-down (44.52 KB, patch)
2011-09-17 15:42 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - reduced worst case performance a bit (45.35 KB, patch)
2011-09-17 16:31 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - updated changelog (45.26 KB, patch)
2011-09-17 16:55 PDT, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-09-17 14:01:17 PDT
DFG JIT can perform some simple load elimination on array accesses, but other than that it cannot do common subexpression elimination (CSE).  This is unfortunate, since CSE would be of great benefit for global variable accesses, more complex forms of array accesses, and repeated arithmetic.  This is particularly true since these operations in JavaScript are so expensive.  Thus eliminating even a few of them can have a big effect.

The DFG JIT should implement block-local CSE with facilities to eliminate both pure operations (additions on numbers, etc) and heap operations (GetGlobalVar, GetByVal, CheckMethod).  The latter should take care to ensure correctness in the case of conflicting side-effects.  CSE should be OSR-friendly: if it eliminates an operation, it should ensure that everything needed to perform that operation in the old JIT if OSR happens is still intact.
Comment 1 Filip Pizlo 2011-09-17 14:08:42 PDT
Created attachment 107773 [details]
work in progress

Still working on this, but it's pretty awesome for Kraken.



Benchmark report for Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing.
Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                         TipOfTree               PhantomCSE                                   

ai-astar                             634.8768+-5.2150        632.3351+-4.7085       
audio-beat-detection                 467.5151+-1.2716        465.7044+-1.3745       
audio-dft                            423.1598+-2.3857        421.5126+-2.3649       
audio-fft                            364.7906+-1.2717    ?   365.5174+-1.0969       ?
audio-oscillator                     312.1471+-1.5005    ?   331.5295+-21.8849      ? might be 1.0621x slower
imaging-darkroom                     412.6474+-1.6945        411.6372+-1.1734       
imaging-desaturate                   207.4252+-0.7089    !   221.9861+-0.8245       ! definitely 1.0702x slower
imaging-gaussian-blur               1082.7893+-2.2880    ^   591.5093+-0.8721       ^ definitely 1.8306x faster
json-parse-financial                  49.5232+-0.2997    ?    50.3901+-1.3708       ? might be 1.0175x slower
json-stringify-tinderbox              68.0220+-0.2688         67.8016+-0.6565       
stanford-crypto-aes                  144.5714+-0.6423        143.6752+-1.0772       
stanford-crypto-ccm                  110.8207+-0.2111        110.1401+-0.5823       
stanford-crypto-pbkdf2               396.3371+-1.5674        394.0494+-5.6482       
stanford-crypto-sha256-iterative     148.7113+-0.6486    ?   149.4918+-0.3482       ?

<arithmetic>                         344.5241+-0.4476    ^   311.2343+-1.5880       ^ definitely 1.1070x faster
<geometric>                          252.2160+-0.3048    ^   243.4866+-0.8714       ^ definitely 1.0359x faster
<harmonic>                           174.4002+-0.3305        174.1282+-0.7948
Comment 2 Filip Pizlo 2011-09-17 14:25:29 PDT
Created attachment 107774 [details]
work in progress - fixed clobber detection

This fixes a bug where nodes that might clobber the world depending on what the predictions are are not caught as having clobbered the world.  That would have potentially caused a heap access to miss a mutation performed by a valueOf method.  This does not affect performance, we still have a win.



Benchmark report for Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing.
Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                         TipOfTree               PhantomCSE                                   

ai-astar                             633.3743+-5.5077        628.2505+-3.3314       
audio-beat-detection                 467.7312+-1.1106        465.8301+-2.4056       
audio-dft                            425.0346+-3.3409        421.6433+-3.9040       
audio-fft                            364.9765+-1.0920        363.3348+-0.7880       
audio-oscillator                     312.2311+-1.3474    ?   334.1131+-24.5161      ? might be 1.0701x slower
imaging-darkroom                     421.8919+-7.9838        412.6673+-3.6377         might be 1.0224x faster
imaging-desaturate                   207.5554+-0.8215    !   222.0429+-0.8075       ! definitely 1.0698x slower
imaging-gaussian-blur               1080.7974+-3.1483    ^   590.5500+-1.2227       ^ definitely 1.8302x faster
json-parse-financial                  49.4475+-0.2421    ?    49.8471+-0.5680       ?
json-stringify-tinderbox              67.7704+-0.3214         67.3444+-0.3539       
stanford-crypto-aes                  145.3547+-1.7446        144.5298+-1.4606       
stanford-crypto-ccm                  113.2217+-1.2073        111.4091+-1.2598         might be 1.0163x faster
stanford-crypto-pbkdf2               395.2784+-1.8910        393.7265+-1.4519       
stanford-crypto-sha256-iterative     149.5971+-0.5527    ^   148.2928+-0.5524       ^ definitely 1.0088x faster

<arithmetic>                         345.3045+-1.0924    ^   310.9701+-1.5270       ^ definitely 1.1104x faster
<geometric>                          253.0919+-0.7436    ^   243.2712+-1.0140       ^ definitely 1.0404x faster
<harmonic>                           174.9138+-0.5020    ^   173.6795+-0.4600       ^ definitely 1.0071x faster
Comment 3 WebKit Review Bot 2011-09-17 14:28:02 PDT
Attachment 107774 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGPropagator.cpp:537:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Source/JavaScriptCore/dfg/DFGPropagator.cpp:544:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 2 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Filip Pizlo 2011-09-17 14:30:30 PDT
Created attachment 107775 [details]
fix style
Comment 5 Filip Pizlo 2011-09-17 15:22:14 PDT
This appears to cause problems in v8-crypto.  Will investigate.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree               PhantomCSE                                   
SunSpider:
   3d-cube                                7.8284+-0.2287    ?     8.0779+-0.3870       ? might be 1.0319x slower
   3d-morph                               7.6338+-0.1110          7.4934+-0.0868         might be 1.0187x faster
   3d-raytrace                            7.8429+-0.1767          7.7937+-0.1765       
   access-binary-trees                    2.3852+-0.1110          2.3590+-0.0676         might be 1.0111x faster
   access-fannkuch                       11.8849+-0.3509         11.7750+-0.2073       
   access-nbody                           4.2269+-0.0789    ?     4.2728+-0.0942       ? might be 1.0109x slower
   access-nsieve                          2.8027+-0.1699          2.6734+-0.0905         might be 1.0483x faster
   bitops-3bit-bits-in-byte               1.6754+-0.0381    ?     1.7034+-0.0363       ? might be 1.0167x slower
   bitops-bits-in-byte                    2.8888+-0.1243          2.7230+-0.0417         might be 1.0609x faster
   bitops-bitwise-and                     3.5988+-0.0952    ?     3.6492+-0.0804       ? might be 1.0140x slower
   bitops-nsieve-bits                     5.5235+-0.2110    ?     5.5242+-0.1183       ?
   controlflow-recursive                  1.9878+-0.0313    ?     2.0334+-0.0523       ? might be 1.0229x slower
   crypto-aes                             6.8065+-0.2288    ?     6.9506+-0.2345       ? might be 1.0212x slower
   crypto-md5                             2.8490+-0.0740          2.8360+-0.0769       
   crypto-sha1                            2.2570+-0.0533          2.2547+-0.0661       
   date-format-tofte                     10.3381+-0.3317    ?    10.5701+-0.3384       ? might be 1.0225x slower
   date-format-xparb                      8.8419+-0.1870    ?     8.8898+-0.2041       ?
   math-cordic                            6.3064+-0.1460          6.2773+-0.1129       
   math-partial-sums                      7.4684+-0.1287          7.3521+-0.1185         might be 1.0158x faster
   math-spectral-norm                     2.6770+-0.0452    ?     2.7024+-0.0858       ?
   regexp-dna                            11.1097+-0.2560         11.0797+-0.2991       
   string-base64                          5.9207+-0.1820    ?     5.9388+-0.2183       ?
   string-fasta                           7.4196+-0.2309          7.2921+-0.1549         might be 1.0175x faster
   string-tagcloud                       12.2438+-0.3220    ?    12.3827+-0.3968       ? might be 1.0113x slower
   string-unpack-code                    19.0840+-0.4811    ?    19.1473+-0.4783       ?
   string-validate-input                  7.0255+-0.1803          6.7060+-0.1777         might be 1.0477x faster

   <arithmetic>                           6.5626+-0.0592          6.5561+-0.0347       
   <geometric>                            5.4158+-0.0484          5.4012+-0.0282       
   <harmonic>                             4.4173+-0.0425          4.4048+-0.0279       

                                            TipOfTree               PhantomCSE                                   
V8:
   crypto                                85.0254+-1.0501    !    90.1991+-0.9224       ! definitely 1.0608x slower
   deltablue                            241.6628+-3.2628    ?   246.2428+-3.6104       ? might be 1.0190x slower
   earley-boyer                          97.6438+-0.7456    ?    99.2094+-2.1783       ? might be 1.0160x slower
   raytrace                              71.4261+-2.1758         71.2358+-0.9164       
   regexp                               108.6062+-0.7931        107.9489+-1.3720       
   richards                             222.6911+-1.9571    ?   223.6860+-1.7525       ?
   splay                                100.8236+-1.2160        100.5770+-1.1419       

   <arithmetic>                         132.5542+-0.8131    ?   134.1570+-0.8461       ? might be 1.0121x slower
   <geometric>                          119.5471+-0.8401    !   121.0450+-0.6406       ! definitely 1.0125x slower
   <harmonic>                           109.7402+-0.9649    ?   111.1702+-0.4928       ? might be 1.0130x slower

                                            TipOfTree               PhantomCSE                                   
Kraken:
   ai-astar                             649.6561+-4.1369        646.0026+-2.4923       
   audio-beat-detection                 475.6792+-4.4012    ?   483.0049+-4.7150       ? might be 1.0154x slower
   audio-dft                            436.4239+-7.3389    ?   446.6362+-8.3083       ? might be 1.0234x slower
   audio-fft                            373.1857+-1.6543    ^   367.4489+-3.1949       ^ definitely 1.0156x faster
   audio-oscillator                     316.8722+-2.4950    ?   320.6392+-1.8030       ? might be 1.0119x slower
   imaging-darkroom                     422.7289+-3.2483        420.5169+-3.2162       
   imaging-desaturate                   213.9714+-1.3662    !   227.1311+-2.3809       ! definitely 1.0615x slower
   imaging-gaussian-blur               1105.4039+-10.1736   ^   599.5372+-3.7661       ^ definitely 1.8438x faster
   json-parse-financial                  51.4321+-1.0503    ?    52.0258+-1.3815       ? might be 1.0115x slower
   json-stringify-tinderbox              71.3999+-0.7193         70.0470+-1.1840         might be 1.0193x faster
   stanford-crypto-aes                  148.7115+-1.0955        145.8912+-2.0387         might be 1.0193x faster
   stanford-crypto-ccm                  114.4813+-1.0582        112.6201+-1.1088         might be 1.0165x faster
   stanford-crypto-pbkdf2               405.1123+-2.8189        401.1639+-4.8977       
   stanford-crypto-sha256-iterative     153.8989+-1.7272    ^   150.0681+-1.9634       ^ definitely 1.0255x faster

   <arithmetic>                         352.7827+-0.6156    ^   317.3381+-1.0955       ^ definitely 1.1117x faster
   <geometric>                          259.2795+-0.4700    ^   248.2747+-1.0513       ^ definitely 1.0443x faster
   <harmonic>                           180.3846+-0.9365        178.1785+-1.5075         might be 1.0124x faster

                                            TipOfTree               PhantomCSE                                   
All benchmarks:
   <arithmetic>                         128.4567+-0.2710    ^   118.1338+-0.3207       ^ definitely 1.0874x faster
   <geometric>                           27.1805+-0.1378    ^    26.8417+-0.0892       ^ definitely 1.0126x faster
   <harmonic>                             7.7977+-0.0731          7.7760+-0.0484
Comment 6 Filip Pizlo 2011-09-17 15:42:26 PDT
Created attachment 107776 [details]
work in progress - fixed crypto slow-down

Turns out there was a pretty stupid goof in the handling of PutByVal aliasing, which would result in almost no PutByValAlias nodes being created.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree               PhantomCSE                                   
SunSpider:
   3d-cube                                7.9232+-0.2648          7.7525+-0.1162         might be 1.0220x faster
   3d-morph                               7.5163+-0.1542    ?     7.6411+-0.1451       ? might be 1.0166x slower
   3d-raytrace                            7.6198+-0.2213    ?     7.8154+-0.2687       ? might be 1.0257x slower
   access-binary-trees                    2.2927+-0.0631    ?     2.3430+-0.1076       ? might be 1.0219x slower
   access-fannkuch                       11.7955+-0.2221         11.7900+-0.2864       
   access-nbody                           4.1922+-0.1009    ?     4.2398+-0.1195       ? might be 1.0114x slower
   access-nsieve                          2.6116+-0.0884          2.5449+-0.0317         might be 1.0262x faster
   bitops-3bit-bits-in-byte               1.7448+-0.0877          1.6776+-0.0485         might be 1.0401x faster
   bitops-bits-in-byte                    2.8421+-0.1109          2.7431+-0.0741         might be 1.0361x faster
   bitops-bitwise-and                     3.5856+-0.1136          3.5406+-0.0905         might be 1.0127x faster
   bitops-nsieve-bits                     5.4613+-0.1900          5.3684+-0.2327         might be 1.0173x faster
   controlflow-recursive                  2.0266+-0.0641    ?     2.0337+-0.0686       ?
   crypto-aes                             7.0701+-0.2844          6.7988+-0.2883         might be 1.0399x faster
   crypto-md5                             2.8068+-0.0990          2.7402+-0.0907         might be 1.0243x faster
   crypto-sha1                            2.2296+-0.0893    ?     2.2779+-0.0488       ? might be 1.0217x slower
   date-format-tofte                     10.1306+-0.2732    ?    10.2069+-0.3424       ?
   date-format-xparb                      8.7109+-0.2887    ?     8.8241+-0.3383       ? might be 1.0130x slower
   math-cordic                            6.1599+-0.1779          6.1429+-0.0971       
   math-partial-sums                      7.4027+-0.1215    ?     7.5718+-0.2252       ? might be 1.0229x slower
   math-spectral-norm                     2.5749+-0.0525    ?     2.7397+-0.1251       ? might be 1.0640x slower
   regexp-dna                            10.8766+-0.1139         10.8204+-0.1617       
   string-base64                          5.7194+-0.1861    ?     5.8138+-0.1489       ? might be 1.0165x slower
   string-fasta                           6.9102+-0.1286    ?     7.0826+-0.1556       ? might be 1.0249x slower
   string-tagcloud                       12.3084+-0.1789         12.1509+-0.2705         might be 1.0130x faster
   string-unpack-code                    18.6770+-0.4576         18.5139+-0.4159       
   string-validate-input                  6.7260+-0.2239          6.7087+-0.2291       

   <arithmetic>                           6.4583+-0.0381          6.4570+-0.0472       
   <geometric>                            5.3302+-0.0372          5.3297+-0.0292       
   <harmonic>                             4.3574+-0.0437          4.3517+-0.0257       

                                            TipOfTree               PhantomCSE                                   
V8:
   crypto                                82.6909+-0.4717    ?    83.5986+-1.4897       ? might be 1.0110x slower
   deltablue                            240.3774+-2.9172    ?   241.6129+-2.6006       ?
   earley-boyer                          94.9925+-0.3658    ?    95.1899+-0.3365       ?
   raytrace                              69.2022+-0.5321         68.5608+-0.3370       
   regexp                               106.4039+-0.2924    ?   106.8109+-0.3315       ?
   richards                             217.5292+-0.7257    ?   219.1786+-1.2439       ?
   splay                                 98.7563+-0.6588    ?    98.9796+-0.6181       ?

   <arithmetic>                         129.9932+-0.4350    ?   130.5616+-0.4737       ?
   <geometric>                          116.9140+-0.2731    ?   117.2857+-0.3739       ?
   <harmonic>                           107.1039+-0.2477    ?   107.3130+-0.3540       ?

                                            TipOfTree               PhantomCSE                                   
Kraken:
   ai-astar                             633.8644+-4.5250        629.9101+-4.8375       
   audio-beat-detection                 467.7739+-1.8784    !   480.9118+-2.5217       ! definitely 1.0281x slower
   audio-dft                            420.5802+-2.2915    ?   422.7050+-3.7068       ?
   audio-fft                            365.8273+-1.0027    ^   361.6524+-1.0137       ^ definitely 1.0115x faster
   audio-oscillator                     312.5754+-1.2268    ?   313.3259+-1.3247       ?
   imaging-darkroom                     414.3694+-0.6042    ?   417.3312+-6.3912       ?
   imaging-desaturate                   208.8211+-0.4056    !   219.5143+-1.8043       ! definitely 1.0512x slower
   imaging-gaussian-blur               1079.3643+-3.0724    ^   591.0141+-1.4195       ^ definitely 1.8263x faster
   json-parse-financial                  49.3249+-0.2466    ?    49.3504+-0.3346       ?
   json-stringify-tinderbox              68.2024+-0.2941    ^    67.4662+-0.1954       ^ definitely 1.0109x faster
   stanford-crypto-aes                  145.1407+-0.6647        143.8512+-1.0142       
   stanford-crypto-ccm                  111.9583+-0.4591        111.0426+-0.5527       
   stanford-crypto-pbkdf2               393.4172+-1.8730    !   400.2963+-3.0613       ! definitely 1.0175x slower
   stanford-crypto-sha256-iterative     149.7088+-0.5165        149.1306+-0.5542       

   <arithmetic>                         344.3520+-0.4113    ^   311.2502+-0.8385       ^ definitely 1.1064x faster
   <geometric>                          252.5200+-0.3160    ^   242.9262+-0.5994       ^ definitely 1.0395x faster
   <harmonic>                           174.7026+-0.3688    ^   173.0850+-0.4576       ^ definitely 1.0093x faster

                                            TipOfTree               PhantomCSE                                   
All benchmarks:
   <arithmetic>                         125.5063+-0.1475    ^   115.7301+-0.2455       ^ definitely 1.0845x faster
   <geometric>                           26.6428+-0.0985    ^    26.3484+-0.0743       ^ definitely 1.0112x faster
   <harmonic>                             7.6893+-0.0752          7.6787+-0.0442
Comment 7 Filip Pizlo 2011-09-17 16:31:30 PDT
Created attachment 107778 [details]
the patch - reduced worst case performance a bit

This improces the CSE algorithm by ensuring that pure CSE only searches for aliases in the range:

From: Beginning of the basic block, or the maximum index of the children, whichever is bigger.
To: Last occurence of a node with the same opcode.

It searches this range backwards.  This optimization does not apply for non-pure operations because it would miss side effects.

Current numbers with latest changes.

Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PhantomCSE" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 30 samples per benchmark/VM, with 10 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree               PhantomCSE                                   
SunSpider:
   3d-cube                                7.7572+-0.0872    ?     7.7909+-0.1244       ?
   3d-morph                               7.5184+-0.0785    ?     7.5558+-0.0963       ?
   3d-raytrace                            7.7239+-0.1032    ?     7.7468+-0.1156       ?
   access-binary-trees                    2.2740+-0.0315    ?     2.3107+-0.0482       ? might be 1.0161x slower
   access-fannkuch                       11.7960+-0.1183    ^    11.5626+-0.0988       ^ definitely 1.0202x faster
   access-nbody                           4.1633+-0.0454    ?     4.2540+-0.0832       ? might be 1.0218x slower
   access-nsieve                          2.6018+-0.0398    ?     2.6032+-0.0310       ?
   bitops-3bit-bits-in-byte               1.6668+-0.0267    ?     1.7083+-0.0400       ? might be 1.0249x slower
   bitops-bits-in-byte                    2.7696+-0.0319    ?     2.8233+-0.0574       ? might be 1.0194x slower
   bitops-bitwise-and                     3.5817+-0.0621    ?     3.6520+-0.0629       ? might be 1.0196x slower
   bitops-nsieve-bits                     5.2870+-0.0667    ?     5.2977+-0.0682       ?
   controlflow-recursive                  2.0383+-0.0401    ?     2.1312+-0.0930       ? might be 1.0456x slower
   crypto-aes                             7.0860+-0.2158    ?     7.1663+-0.1855       ? might be 1.0113x slower
   crypto-md5                             2.8093+-0.0448    ?     2.8269+-0.0811       ?
   crypto-sha1                            2.2280+-0.0402    ?     2.2414+-0.0501       ?
   date-format-tofte                     10.2269+-0.1716         10.2042+-0.1426       
   date-format-xparb                      8.7081+-0.1255    ?     8.8777+-0.1652       ? might be 1.0195x slower
   math-cordic                            6.1956+-0.0690          6.1295+-0.0601         might be 1.0108x faster
   math-partial-sums                      7.4034+-0.0937          7.3807+-0.0997       
   math-spectral-norm                     2.5948+-0.0337    ?     2.6117+-0.0482       ?
   regexp-dna                            10.8661+-0.1062    ?    10.9373+-0.1211       ?
   string-base64                          5.8322+-0.1108          5.8219+-0.1171       
   string-fasta                           6.9385+-0.1083    ?     7.0151+-0.1314       ? might be 1.0110x slower
   string-tagcloud                       12.1121+-0.1935    ?    12.3550+-0.2062       ? might be 1.0201x slower
   string-unpack-code                    18.8462+-0.3793         18.5825+-0.2283         might be 1.0142x faster
   string-validate-input                  6.6845+-0.1124    ?     6.7245+-0.1514       ?

   <arithmetic>                           6.4504+-0.0253    ?     6.4735+-0.0239       ?
   <geometric>                            5.3125+-0.0181    !     5.3515+-0.0169       ! definitely 1.0073x slower
   <harmonic>                             4.3312+-0.0168    !     4.3786+-0.0162       ! definitely 1.0109x slower

                                            TipOfTree               PhantomCSE                                   
V8:
   crypto                                82.7124+-0.2753    ?    82.9466+-0.3673       ?
   deltablue                            239.2213+-0.8353    ?   239.8840+-1.0250       ?
   earley-boyer                          94.9496+-0.1586    ?    95.3712+-0.4045       ?
   raytrace                              69.0085+-0.2725         68.9221+-0.4887       
   regexp                               107.3699+-0.7440        107.0229+-0.3693       
   richards                             217.7471+-0.5430    !   219.5234+-0.5972       ! definitely 1.0082x slower
   splay                                 98.7746+-0.3300         98.7596+-0.2519       

   <arithmetic>                         129.9691+-0.2003    ?   130.3471+-0.2048       ?
   <geometric>                          116.9542+-0.1749    ?   117.1789+-0.1988       ?
   <harmonic>                           107.1496+-0.1678    ?   107.2769+-0.2259       ?

                                            TipOfTree               PhantomCSE                                   
Kraken:
   ai-astar                             636.9671+-4.2157    ^   629.5812+-1.9895       ^ definitely 1.0117x faster
   audio-beat-detection                 467.7131+-1.0341    !   470.1425+-1.0160       ! definitely 1.0052x slower
   audio-dft                            425.3115+-3.7983        423.8902+-2.1948       
   audio-fft                            364.0817+-0.6708        363.9339+-1.0152       
   audio-oscillator                     312.5008+-0.3736    ?   312.5853+-0.5099       ?
   imaging-darkroom                     413.9106+-0.8494    ^   410.8664+-0.5848       ^ definitely 1.0074x faster
   imaging-desaturate                   207.8968+-0.4679    !   218.0565+-0.4687       ! definitely 1.0489x slower
   imaging-gaussian-blur               1081.8128+-1.7363    ^   589.7422+-1.1866       ^ definitely 1.8344x faster
   json-parse-financial                  49.2932+-0.1803    ?    49.7524+-0.2908       ?
   json-stringify-tinderbox              69.6689+-1.5041    ^    67.6637+-0.3880       ^ definitely 1.0296x faster
   stanford-crypto-aes                  144.3012+-0.4594        144.2128+-0.4487       
   stanford-crypto-ccm                  111.9439+-0.3347        111.7710+-0.3905       
   stanford-crypto-pbkdf2               396.0157+-1.5465    ?   397.0790+-1.5456       ?
   stanford-crypto-sha256-iterative     149.2708+-0.4787    !   150.4953+-0.5297       ! definitely 1.0082x slower

   <arithmetic>                         345.0492+-0.4615    ^   309.9837+-0.3411       ^ definitely 1.1131x faster
   <geometric>                          252.9574+-0.3953    ^   242.5861+-0.2959       ^ definitely 1.0428x faster
   <harmonic>                           175.1667+-0.5736    ^   173.5501+-0.3501       ^ definitely 1.0093x faster

                                            TipOfTree               PhantomCSE                                   
All benchmarks:
   <arithmetic>                         125.7060+-0.1391    ^   115.3301+-0.1168       ^ definitely 1.0900x faster
   <geometric>                           26.6087+-0.0437    ^    26.3931+-0.0411       ^ definitely 1.0082x faster
   <harmonic>                             7.6444+-0.0289    !     7.7252+-0.0277       ! definitely 1.0106x slower
Comment 8 Filip Pizlo 2011-09-17 16:55:22 PDT
Created attachment 107779 [details]
the patch - updated changelog
Comment 9 Oliver Hunt 2011-09-17 17:20:48 PDT
Comment on attachment 107779 [details]
the patch - updated changelog

View in context: https://bugs.webkit.org/attachment.cgi?id=107779&action=review

r=me but i'd like some response to my commentary

> Source/JavaScriptCore/dfg/DFGNode.h:176
> +    macro(PutByVal, NodeMustGenerate | NodeClobbersWorld) \
> +    macro(PutByValAlias, NodeMustGenerate | NodeClobbersWorld) \
> +    macro(GetById, NodeResultJS | NodeMustGenerate | NodeClobbersWorld) \
> +    macro(PutById, NodeMustGenerate | NodeClobbersWorld) \
> +    macro(PutByIdDirect, NodeMustGenerate | NodeClobbersWorld) \

Do we really need to consider the gets to clobber the world?  if we know that the get is a pure object property, then the access is pure

> Source/JavaScriptCore/dfg/DFGPropagator.cpp:732
> +        case ArithAbs:

Don't we support sqrt(), and min/max?

In the longer term it seems we would probably want a generic handler for pure intrinsics.

> Source/JavaScriptCore/dfg/DFGPropagator.cpp:-443
> -#if ENABLE(DFG_DEBUG_VERBOSE)
> -    graph.dump(codeBlock);
> -#endif

is this deliberate?
Comment 10 Filip Pizlo 2011-09-17 17:25:43 PDT
> > Source/JavaScriptCore/dfg/DFGNode.h:176
> > +    macro(PutByVal, NodeMustGenerate | NodeClobbersWorld) \
> > +    macro(PutByValAlias, NodeMustGenerate | NodeClobbersWorld) \
> > +    macro(GetById, NodeResultJS | NodeMustGenerate | NodeClobbersWorld) \
> > +    macro(PutById, NodeMustGenerate | NodeClobbersWorld) \
> > +    macro(PutByIdDirect, NodeMustGenerate | NodeClobbersWorld) \
> 
> Do we really need to consider the gets to clobber the world?  if we know that the get is a pure object property, then the access is pure

Currently we never predict or speculate the purity of GetById.  All GetById's get compiled with a slow path that may call arbitrary JS code for getters, or arbitrary C++ code in WebCore.

This will change once we start doing reasonable things for GetById.  We will still have a GetById that covers the cases where we know nothing (in which case it'll be have as it does now, and we'll still have to assume that it clobbers the world), but we will also have another GetById op that already knows the structure it's accessing and already knows that the access is pure.  That's probably my next task...

> 
> > Source/JavaScriptCore/dfg/DFGPropagator.cpp:732
> > +        case ArithAbs:
> 
> Don't we support sqrt(), and min/max?

Haven't landed that patch yet. :-)

> 
> In the longer term it seems we would probably want a generic handler for pure intrinsics.

pureCSE() is that handler.  So it's just a question of whether we want performNodeCSE() to have one switch statement (which is more efficient) or have an if statement at the top like:

if (node.op & NodeIsPure) {
    setReplacement(pureCSE(node));
    return;
}

I think that would be reasonable if the number of pure arithmetic nodes that are CSE-able grows really big.

> 
> > Source/JavaScriptCore/dfg/DFGPropagator.cpp:-443
> > -#if ENABLE(DFG_DEBUG_VERBOSE)
> > -    graph.dump(codeBlock);
> > -#endif
> 
> is this deliberate?

Yup, since fixpoint() now does dump() calls, including one at the end.
Comment 11 Filip Pizlo 2011-09-17 20:46:39 PDT
Landed in r95389.