Bug 72845 - DFG 32_64 should directly store double virtual registers on SetLocal
Summary: DFG 32_64 should directly store double virtual registers on SetLocal
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
Reported: 2011-11-20 19:05 PST by Filip Pizlo
Modified: 2011-11-20 19:32 PST
CC List: 0 users

See Also:


Attachments
the patch (1.71 KB, patch)
2011-11-20 19:11 PST, Filip Pizlo
oliver: review+

Description Filip Pizlo 2011-11-20 19:05:56 PST
When doing a SetLocal() whose source is in an FPR, the 32_64 DFG performs a complex shuffle to move the FPR into a pair of GPRs. It should just store the FPR directly into memory instead.
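Why the shuffle is redundant can be seen by modeling both strategies as byte-level stores. The sketch below is a hypothetical Python model, not JSC code, and its function names are made up for illustration. It assumes the little-endian JSVALUE32_64 slot layout (32-bit payload word at offset 0, 32-bit tag word at offset 4) and shows that splitting the double into two 32-bit words and storing each one writes exactly the same 8 bytes as a single 64-bit store of the FPR.

```python
import struct

# Assumption: little-endian 32-bit target, 8-byte JSValue slot with the
# payload word at offset 0 and the tag word at offset 4. A double fills
# all 8 bytes, so its high 32 bits land where the tag would go.

def store_via_gpr_pair(value):
    """Hypothetical model of the old path: move the FPR's 64 bits into two
    32-bit 'GPRs' (low word -> payload, high word -> tag), then store each."""
    raw = struct.pack("<d", value)
    payload, tag = struct.unpack("<II", raw)  # the FPR -> GPR pair shuffle
    slot = bytearray(8)
    struct.pack_into("<I", slot, 0, payload)  # store payload word
    struct.pack_into("<I", slot, 4, tag)      # store tag word
    return bytes(slot)

def store_fpr_directly(value):
    """Hypothetical model of the new path: one 64-bit floating-point store."""
    slot = bytearray(8)
    struct.pack_into("<d", slot, 0, value)
    return bytes(slot)

# Both paths leave identical bytes in the slot, including for inf and NaN.
for v in (0.0, -1.5, 3.141592653589793, float("inf"), float("nan")):
    assert store_via_gpr_pair(v) == store_fpr_directly(v)
```

In the real JIT the saving comes from skipping the register moves, not from the memory contents, which this model shows are unchanged.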
Comment 1 Filip Pizlo 2011-11-20 19:11:35 PST
Created attachment 116021
the patch

Benchmark report for SunSpider, V8, and Kraken on nitroflex.local (MacBookPro8,2).

VMs tested:
"TipOfTree32" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r100877)
"DoubleSetLocal" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc (r100877)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                           TipOfTree32            DoubleSetLocal                                 
SunSpider:
   3d-cube                                8.4549+-0.1343    ^     8.0452+-0.2251       ^ definitely 1.0509x faster
   3d-morph                              10.7193+-0.1773         10.4906+-0.1972         might be 1.0218x faster
   3d-raytrace                            9.3581+-0.1933          9.1155+-0.2413         might be 1.0266x faster
   access-binary-trees                    1.8295+-0.0483          1.7555+-0.0480         might be 1.0422x faster
   access-fannkuch                        6.9373+-0.1184    ?     6.9615+-0.0946       ?
   access-nbody                           5.1551+-0.1148          5.0928+-0.0389         might be 1.0122x faster
   access-nsieve                          2.5978+-0.0531          2.5769+-0.0547       
   bitops-3bit-bits-in-byte               1.2894+-0.0129          1.2718+-0.0216         might be 1.0139x faster
   bitops-bits-in-byte                    2.4475+-0.0977    ?     2.4613+-0.0988       ?
   bitops-bitwise-and                     2.9713+-0.1023          2.8787+-0.0825         might be 1.0322x faster
   bitops-nsieve-bits                     6.2704+-0.0946          6.1502+-0.0575         might be 1.0195x faster
   controlflow-recursive                  2.6202+-0.0480    ?     2.6276+-0.0594       ?
   crypto-aes                             8.6539+-0.2166          8.6224+-0.1586       
   crypto-md5                             3.0539+-0.0660    ?     3.1494+-0.1330       ? might be 1.0313x slower
   crypto-sha1                            2.4836+-0.0682          2.4560+-0.0529         might be 1.0113x faster
   date-format-tofte                     10.5511+-0.2150    ?    10.7371+-0.2162       ? might be 1.0176x slower
   date-format-xparb                     10.4321+-0.1995    ?    10.5713+-0.2092       ? might be 1.0133x slower
   math-cordic                            8.0759+-0.0995    ?     8.1393+-0.1471       ?
   math-partial-sums                      9.5702+-0.1078    ?     9.5913+-0.1801       ?
   math-spectral-norm                     2.5604+-0.0429          2.5581+-0.0905       
   regexp-dna                            10.5702+-0.2713         10.4854+-0.1979       
   string-base64                          4.3580+-0.1558          4.3331+-0.0987       
   string-fasta                           8.7332+-0.1923          8.6863+-0.1381       
   string-tagcloud                       13.0627+-0.2919         13.0404+-0.2198       
   string-unpack-code                    21.5516+-0.4510         21.4704+-0.4070       
   string-validate-input                  5.9273+-0.1235    ?     5.9752+-0.1225       ?

   <arithmetic> *                         6.9321+-0.0299          6.8940+-0.0226         might be 1.0055x faster
   <geometric>                            5.5357+-0.0271          5.4989+-0.0271         might be 1.0067x faster
   <harmonic>                             4.3034+-0.0384          4.2653+-0.0455         might be 1.0089x faster

                                           TipOfTree32            DoubleSetLocal                                 
V8:
   crypto                                90.0416+-0.8016    ?    90.6770+-0.6636       ?
   deltablue                            154.9745+-0.6468        154.6830+-2.4361       
   earley-boyer                         143.3423+-1.0763        143.0581+-1.2902       
   raytrace                              61.0435+-0.3575         60.8018+-0.2768       
   regexp                               108.9734+-1.1136        107.7358+-0.5199         might be 1.0115x faster
   richards                             167.4317+-1.4102        165.2445+-0.9657         might be 1.0132x faster
   splay                                 78.8951+-1.1458         78.2908+-0.8136       

   <arithmetic>                         114.9575+-0.3612        114.3559+-0.4613         might be 1.0053x faster
   <geometric> *                        108.3815+-0.3745        107.8706+-0.3324         might be 1.0047x faster
   <harmonic>                           101.8154+-0.4095        101.3808+-0.2502         might be 1.0043x faster

                                           TipOfTree32            DoubleSetLocal                                 
Kraken:
   ai-astar                             521.7866+-4.5492        517.8003+-1.6303       
   audio-beat-detection                 373.5188+-2.9094    ^   368.5723+-1.1785       ^ definitely 1.0134x faster
   audio-dft                            384.2349+-4.8247    ^   367.0053+-2.2540       ^ definitely 1.0469x faster
   audio-fft                            249.2332+-1.9537        249.0107+-0.6502       
   audio-oscillator                     465.0110+-2.5229        461.8881+-3.2125       
   imaging-darkroom                     397.2592+-3.4557        390.9891+-3.9149         might be 1.0160x faster
   imaging-desaturate                   920.9864+-0.9105        917.8887+-3.3876       
   imaging-gaussian-blur                764.3026+-2.0463    ^   698.2806+-2.7788       ^ definitely 1.0945x faster
   json-parse-financial                  58.8100+-0.2344         58.4174+-0.1946       
   json-stringify-tinderbox              98.7484+-0.3706         98.5522+-0.5464       
   stanford-crypto-aes                  107.9467+-0.6432    ?   108.5069+-0.2682       ?
   stanford-crypto-ccm                  110.9496+-0.5222        109.9775+-0.5927       
   stanford-crypto-pbkdf2               216.4592+-1.3129    ^   213.6526+-1.0766       ^ definitely 1.0131x faster
   stanford-crypto-sha256-iterative      91.4980+-0.2973         91.2890+-0.2927       

   <arithmetic> *                       340.0532+-0.5716    ^   332.2736+-0.4668       ^ definitely 1.0234x faster
   <geometric>                          248.0638+-0.3670    ^   244.3442+-0.3037       ^ definitely 1.0152x faster
   <harmonic>                           176.6955+-0.2389    ^   175.3484+-0.2843       ^ definitely 1.0077x faster

                                           TipOfTree32            DoubleSetLocal                                 
All benchmarks:
   <arithmetic>                         122.2485+-0.1486    ^   119.8205+-0.1493       ^ definitely 1.0203x faster
   <geometric>                           26.7588+-0.0696    ^    26.5219+-0.0802       ^ definitely 1.0089x faster
   <harmonic>                             7.5933+-0.0661          7.5265+-0.0785         might be 1.0089x faster

                                           TipOfTree32            DoubleSetLocal                                 
Geomean of preferred means:
   <scaled-result>                       63.4529+-0.0942    ^    62.7510+-0.1008       ^ definitely 1.0112x faster
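The per-row speedup figures can be reproduced from the two mean columns: they appear to be the ratio of the larger mean to the smaller, labeled "faster" or "slower" (this matches the 3d-cube and crypto-md5 rows). The sketch below uses hypothetical helper names. Its interval-overlap check is only a rough heuristic, not necessarily the statistical test the harness uses to print "definitely"; it happens to agree for the two rows checked here.

```python
def speedup(baseline_ms, new_ms):
    """Ratio in the report's style: always >= 1, labeled faster or slower."""
    if new_ms <= baseline_ms:
        return baseline_ms / new_ms, "faster"
    return new_ms / baseline_ms, "slower"

def intervals_overlap(a_mean, a_err, b_mean, b_err):
    """True if the intervals mean +- half-width of the two runs overlap."""
    return a_mean - a_err <= b_mean + b_err and b_mean - b_err <= a_mean + a_err

ratio, label = speedup(8.4549, 8.0452)          # 3d-cube row
print(f"3d-cube: {ratio:.4f}x {label}")          # 3d-cube: 1.0509x faster
ratio, label = speedup(3.0539, 3.1494)          # crypto-md5 row
print(f"crypto-md5: {ratio:.4f}x {label}")       # crypto-md5: 1.0313x slower

print(intervals_overlap(8.4549, 0.1343, 8.0452, 0.2251))  # False (3d-cube)
print(intervals_overlap(1.8295, 0.0483, 1.7555, 0.0480))  # True (access-binary-trees)
```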
Comment 2 Filip Pizlo 2011-11-20 19:32:45 PST
Landed in http://trac.webkit.org/changeset/100878