Bug 69868 - DFG virtual register allocator should be more aggressive in reusing temporary slots
Summary: DFG virtual register allocator should be more aggressive in reusing temporary...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-11 14:08 PDT by Filip Pizlo
Modified: 2011-10-11 18:36 PDT (History)
2 users (show)

See Also:


Attachments
the patch (13.99 KB, patch)
2011-10-11 14:47 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (20.64 KB, patch)
2011-10-11 15:41 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - fix style (20.64 KB, patch)
2011-10-11 15:48 PDT, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-10-11 14:08:25 PDT
The DFG virtual register allocator currently only allocates slots that are beyond the high watermark of used bytecode temporaries.  But there may be unused bytecode temporaries below that high watermark, and the virtual register allocation should be able to reuse them.

This will reduce the DFG call frame sizes, and will pave the way for more a fluid interaction between bytecode variables and virtual registers.  This is likely to be necessary as we enable inlining of JavaScript code blocks.
Comment 1 Filip Pizlo 2011-10-11 14:44:09 PDT
This does weird, but good, things for performance.


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             BetterVRAlloc                                  
SunSpider:
   3d-cube                                7.3773+-0.1682    ?     7.4579+-0.1385       ? might be 1.0109x slower
   3d-morph                               7.6129+-0.1029    ?     7.6418+-0.1258       ?
   3d-raytrace                            7.4375+-0.1547    ?     7.5778+-0.1489       ? might be 1.0189x slower
   access-binary-trees                    1.7496+-0.0528          1.7096+-0.0679         might be 1.0234x faster
   access-fannkuch                        6.4089+-0.0902    ?     6.5075+-0.1120       ? might be 1.0154x slower
   access-nbody                           3.3459+-0.0701          3.3088+-0.0721         might be 1.0112x faster
   access-nsieve                          2.5818+-0.0483    ?     2.6016+-0.0653       ?
   bitops-3bit-bits-in-byte               1.7004+-0.0294    ?     1.7129+-0.0297       ?
   bitops-bits-in-byte                    2.7519+-0.0609    ?     2.7989+-0.0604       ? might be 1.0171x slower
   bitops-bitwise-and                     3.3616+-0.0921          3.2038+-0.1061         might be 1.0493x faster
   bitops-nsieve-bits                     5.4789+-0.0996          5.4371+-0.1053       
   controlflow-recursive                  2.1314+-0.0493    ^     2.0389+-0.0401       ^ definitely 1.0454x faster
   crypto-aes                             6.6118+-0.1531    ?     6.8585+-0.2216       ? might be 1.0373x slower
   crypto-md5                             2.8384+-0.0509          2.7581+-0.0578         might be 1.0291x faster
   crypto-sha1                            2.4793+-0.0552    ?     2.5161+-0.0564       ? might be 1.0148x slower
   date-format-tofte                     10.1105+-0.1987          9.8553+-0.1352         might be 1.0259x faster
   date-format-xparb                      8.8825+-0.1773    ?     8.9709+-0.1812       ?
   math-cordic                            6.3703+-0.0684    ?     6.4577+-0.1055       ? might be 1.0137x slower
   math-partial-sums                      7.6012+-0.1293    ?     7.6175+-0.1011       ?
   math-spectral-norm                     2.8564+-0.0711    ?     2.8866+-0.0705       ? might be 1.0106x slower
   regexp-dna                            10.7062+-0.1282    ?    10.7601+-0.1004       ?
   string-base64                          5.3347+-0.1450          5.2579+-0.1227         might be 1.0146x faster
   string-fasta                           6.4862+-0.1604          6.4511+-0.0916       
   string-tagcloud                       11.1826+-0.2440    ?    11.3868+-0.2751       ? might be 1.0183x slower
   string-unpack-code                    20.3730+-0.3643    ?    20.7027+-0.5347       ? might be 1.0162x slower
   string-validate-input                  6.3280+-0.1449    ?     6.3823+-0.1091       ?

   <arithmetic> *                         6.1577+-0.0307    ?     6.1869+-0.0297       ?
   <geometric>                            5.0812+-0.0208    ?     5.0825+-0.0289       ?
   <harmonic>                             4.1863+-0.0221          4.1675+-0.0390       

                                            TipOfTree             BetterVRAlloc                                  
V8:
   crypto                                72.6790+-0.4355         72.3419+-0.4505       
   deltablue                            228.8242+-2.0552    ?   231.0917+-2.1679       ?
   earley-boyer                          92.2505+-0.3473    !    97.0999+-0.3843       ! definitely 1.0526x slower
   raytrace                              58.6995+-0.4796    ?    58.9699+-0.4908       ?
   regexp                               104.1287+-0.4657    ?   104.9266+-0.5007       ?
   richards                             207.9427+-1.3704    ^   189.9799+-1.3203       ^ definitely 1.0946x faster
   splay                                 96.5209+-0.5152         95.3950+-0.8332         might be 1.0118x faster

   <arithmetic>                         123.0065+-0.4590    ^   121.4007+-0.3942       ^ definitely 1.0132x faster
   <geometric> *                        109.4509+-0.3131        108.9284+-0.2682       
   <harmonic>                            98.8930+-0.2627    ?    99.0225+-0.2325       ?

                                            TipOfTree             BetterVRAlloc                                  
Kraken:
   ai-astar                             497.5357+-1.0770    ?   499.8479+-1.8509       ?
   audio-beat-detection                 192.0635+-1.4448        191.8720+-0.8279       
   audio-dft                            272.6202+-3.5519        271.3265+-2.0227       
   audio-fft                            124.5866+-0.5697        124.2436+-0.4142       
   audio-oscillator                     255.3471+-1.9788    ?   255.5191+-1.5136       ?
   imaging-darkroom                     418.8741+-0.6484    ?   419.2329+-2.7905       ?
   imaging-desaturate                   232.8519+-1.0063    ^   220.8624+-2.0278       ^ definitely 1.0543x faster
   imaging-gaussian-blur                586.1653+-0.9813        582.0832+-3.6930       
   json-parse-financial                  55.3562+-0.2220    ?    55.7427+-0.2774       ?
   json-stringify-tinderbox              68.6493+-0.2618    ?    69.0735+-0.3286       ?
   stanford-crypto-aes                  133.5733+-1.5222        133.1723+-1.8243       
   stanford-crypto-ccm                  102.3700+-0.3699        102.1013+-0.5759       
   stanford-crypto-pbkdf2               194.2358+-1.1225    ?   196.5417+-2.0086       ? might be 1.0119x slower
   stanford-crypto-sha256-iterative      72.3107+-0.3245    ^    71.6367+-0.2204       ^ definitely 1.0094x faster

   <arithmetic> *                       229.0386+-0.4915        228.0897+-0.6386       
   <geometric>                          178.9605+-0.4238        178.2923+-0.5165       
   <harmonic>                           139.8795+-0.3317        139.6275+-0.3821       

                                            TipOfTree             BetterVRAlloc                                  
All benchmarks:
   <arithmetic>                          89.9507+-0.1980    ^    89.4451+-0.2265       ^ definitely 1.0057x faster
   <geometric>                           23.1890+-0.0668         23.1496+-0.0756       
   <harmonic>                             7.3649+-0.0381          7.3327+-0.0667       

                                            TipOfTree             BetterVRAlloc                                  
Geomean of preferred means:
   <scaled-result>                       53.6428+-0.1329         53.5676+-0.1168
Comment 2 Filip Pizlo 2011-10-11 14:47:28 PDT
Created attachment 110582 [details]
the patch
Comment 3 Filip Pizlo 2011-10-11 14:51:44 PDT
Ran this for a bunch of iterations on V8 to get better significance.  And yes, it's a non-trivial improvement.



Benchmark report for V8.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 60 samples per benchmark/VM, with 20 VM invocations per benchmark. Used 1
benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                       TipOfTree             BetterVRAlloc                                  

crypto              72.8480+-0.1132    ^    72.1891+-0.1859       ^ definitely 1.0091x faster
deltablue          229.7648+-0.9618    ?   230.9335+-0.7912       ?
earley-boyer        92.0746+-0.2574    !    97.1430+-0.1545       ! definitely 1.0550x slower
raytrace            58.6933+-0.1451    !    59.3937+-0.1625       ! definitely 1.0119x slower
regexp             104.4100+-0.1694    ?   104.7309+-0.1666       ?
richards           208.5533+-0.5139    ^   189.0045+-0.3390       ^ definitely 1.1034x faster
splay               96.9219+-0.3767         96.0736+-0.7089       

<arithmetic>       123.3237+-0.1473    ^   121.3526+-0.1864       ^ definitely 1.0162x faster
<geometric> *      109.6712+-0.1097    ^   109.0018+-0.1606       ^ definitely 1.0061x faster
<harmonic>          99.0458+-0.1036    ?    99.1898+-0.1474       ?
Comment 4 WebKit Review Bot 2011-10-11 14:52:52 PDT
Attachment 110582 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/ChangeLog:1:  ChangeLog entry has no bug number  [changelog/bugnumber] [5]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Filip Pizlo 2011-10-11 15:28:31 PDT
It turns out that my previous attempt at a virtual register allocator was a total bust.  I fixed it, and now we have real numbers:



Benchmark report for V8.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark
iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get
microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals
in milliseconds.

                       TipOfTree             BetterVRAlloc                                  

crypto              72.9077+-0.4529    ?    73.1820+-0.3834       ?
deltablue          233.4981+-2.2990    ^   229.2700+-1.5657       ^ definitely 1.0184x faster
earley-boyer        91.4999+-0.3203    ?    91.6503+-0.4134       ?
raytrace            58.7518+-0.4181         58.5479+-0.4437       
regexp             104.2604+-0.2749        104.1496+-0.4870       
richards           207.6681+-1.7385    ^   188.0565+-0.9151       ^ definitely 1.1043x faster
splay               95.9854+-0.3932         95.5227+-0.5338       

<arithmetic>       123.5102+-0.5281    ^   120.0541+-0.3077       ^ definitely 1.0288x faster
<geometric> *      109.6144+-0.2834    ^   107.7297+-0.1889       ^ definitely 1.0175x faster
<harmonic>          98.9001+-0.1905    ^    98.0251+-0.1539       ^ definitely 1.0089x faster
Comment 6 Filip Pizlo 2011-10-11 15:38:18 PDT
Updated numbers for all benchmarks:



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"BetterVRAlloc" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             BetterVRAlloc                                  
SunSpider:
   3d-cube                                7.3791+-0.1438          7.3422+-0.2024       
   3d-morph                               7.6559+-0.1532    ?     7.6660+-0.1369       ?
   3d-raytrace                            7.5526+-0.1831    ?     7.6005+-0.1643       ?
   access-binary-trees                    1.6997+-0.0453    ?     1.7195+-0.0596       ? might be 1.0117x slower
   access-fannkuch                        6.4543+-0.1042          6.4251+-0.0914       
   access-nbody                           3.3612+-0.0839          3.3161+-0.0804         might be 1.0136x faster
   access-nsieve                          2.5566+-0.0616    ?     2.6244+-0.0842       ? might be 1.0265x slower
   bitops-3bit-bits-in-byte               1.7225+-0.0330    ?     1.7510+-0.0330       ? might be 1.0165x slower
   bitops-bits-in-byte                    2.7106+-0.0671    ?     2.7899+-0.0492       ? might be 1.0293x slower
   bitops-bitwise-and                     3.3565+-0.0647          3.3551+-0.1026       
   bitops-nsieve-bits                     5.4728+-0.1269    ?     5.5231+-0.1031       ?
   controlflow-recursive                  2.0593+-0.0302    ?     2.0820+-0.0449       ? might be 1.0110x slower
   crypto-aes                             6.7189+-0.1746    ?     6.7529+-0.1620       ?
   crypto-md5                             2.8141+-0.0487    ?     2.8817+-0.0505       ? might be 1.0240x slower
   crypto-sha1                            2.4218+-0.0621    ?     2.5025+-0.0589       ? might be 1.0333x slower
   date-format-tofte                     10.0579+-0.1816         10.0549+-0.1680       
   date-format-xparb                      9.4010+-0.4992    ?     9.6260+-0.4280       ? might be 1.0239x slower
   math-cordic                            6.5915+-0.1215          6.5062+-0.1217         might be 1.0131x faster
   math-partial-sums                      7.6552+-0.1328    ?     7.7737+-0.1838       ? might be 1.0155x slower
   math-spectral-norm                     2.8287+-0.0347    ?     2.8479+-0.0720       ?
   regexp-dna                            10.7310+-0.1783         10.7082+-0.1497       
   string-base64                          5.3418+-0.1218          5.3307+-0.1068       
   string-fasta                           6.4322+-0.1368    ?     6.4788+-0.2142       ?
   string-tagcloud                       11.3594+-0.1888    ?    11.3987+-0.2564       ?
   string-unpack-code                    20.4972+-0.2142    ?    20.6382+-0.3662       ?
   string-validate-input                  6.2973+-0.1110    ?     6.3350+-0.1157       ?

   <arithmetic> *                         6.1973+-0.0333    ?     6.2319+-0.0329       ?
   <geometric>                            5.0876+-0.0263    ?     5.1265+-0.0274       ?
   <harmonic>                             4.1650+-0.0314    ?     4.2101+-0.0321       ? might be 1.0108x slower

                                            TipOfTree             BetterVRAlloc                                  
V8:
   crypto                                73.4915+-0.5072    ^    72.5717+-0.3088       ^ definitely 1.0127x faster
   deltablue                            229.3694+-2.9698    ?   230.5546+-1.8132       ?
   earley-boyer                          91.6589+-0.2657    ?    92.8696+-1.2849       ? might be 1.0132x slower
   raytrace                              58.4779+-0.3150    ?    59.0484+-0.5281       ?
   regexp                               104.5795+-0.3130        104.2271+-0.4320       
   richards                             207.9325+-1.2140    ^   189.4965+-1.3250       ^ definitely 1.0973x faster
   splay                                 96.3336+-0.5974         95.9125+-0.5882       

   <arithmetic>                         123.1205+-0.4639    ^   120.6686+-0.3824       ^ definitely 1.0203x faster
   <geometric> *                        109.5373+-0.2146    ^   108.2105+-0.3030       ^ definitely 1.0123x faster
   <harmonic>                            98.9614+-0.1761    ^    98.4156+-0.2803       ^ definitely 1.0055x faster

                                            TipOfTree             BetterVRAlloc                                  
Kraken:
   ai-astar                             503.0404+-2.6460        501.6886+-4.6412       
   audio-beat-detection                 194.5291+-1.4049    ^   191.3495+-0.9404       ^ definitely 1.0166x faster
   audio-dft                            272.6498+-2.3990    ?   279.1601+-5.1765       ? might be 1.0239x slower
   audio-fft                            125.3488+-0.6055    ?   125.3593+-0.5099       ?
   audio-oscillator                     253.9799+-2.2824        253.1516+-1.7852       
   imaging-darkroom                     423.5625+-5.1567    ?   425.0566+-8.0295       ?
   imaging-desaturate                   233.2266+-1.3487        232.7642+-1.3199       
   imaging-gaussian-blur                586.1470+-0.9626    ?   587.3914+-1.0103       ?
   json-parse-financial                  55.5272+-0.1996    !    56.1811+-0.3381       ! definitely 1.0118x slower
   json-stringify-tinderbox              69.5921+-1.4267         68.9636+-0.4248       
   stanford-crypto-aes                  133.6110+-1.4540        133.2715+-1.5036       
   stanford-crypto-ccm                  102.0949+-0.4570        101.1988+-0.6843       
   stanford-crypto-pbkdf2               194.6565+-1.3833    ^   192.1962+-0.7880       ^ definitely 1.0128x faster
   stanford-crypto-sha256-iterative      72.4658+-0.2888    ^    71.5203+-0.3084       ^ definitely 1.0132x faster

   <arithmetic> *                       230.0308+-0.5548        229.9466+-0.9315       
   <geometric>                          179.6701+-0.5064        179.2885+-0.6065       
   <harmonic>                           140.4526+-0.5347        140.0540+-0.3782       

                                            TipOfTree             BetterVRAlloc                                  
All benchmarks:
   <arithmetic>                          90.2852+-0.1993         89.9141+-0.2787       
   <geometric>                           23.2351+-0.0815    ?    23.2761+-0.0696       ?
   <harmonic>                             7.3290+-0.0539    ?     7.4054+-0.0549       ? might be 1.0104x slower

                                            TipOfTree             BetterVRAlloc                                  
Geomean of preferred means:
   <scaled-result>                       53.8492+-0.1318         53.7238+-0.1173
Comment 7 Filip Pizlo 2011-10-11 15:41:43 PDT
Created attachment 110589 [details]
the patch
Comment 8 WebKit Review Bot 2011-10-11 15:44:01 PDT
Attachment 110589 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:165:  Missing space after ,  [whitespace/comma] [3]
Total errors found: 1 in 10 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Filip Pizlo 2011-10-11 15:48:24 PDT
Created attachment 110594 [details]
the patch - fix style
Comment 10 Oliver Hunt 2011-10-11 15:50:30 PDT
Comment on attachment 110594 [details]
the patch - fix style

Make sure you run it through at least the fast/js part of webkit tests.
Comment 11 Filip Pizlo 2011-10-11 16:49:55 PDT
Landed in r97197.
Comment 12 Ryosuke Niwa 2011-10-11 18:36:10 PDT
This patch appears to have broken GTK builds (missing stdio.h). Build fix attempted in http://trac.webkit.org/changeset/97197.