Bug 126545

Summary: FTL should not use the inputs of an add or sub as the live-at-exit values in an overflow check, if the values aren't live after
Product: WebKit
Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCore
Assignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED
Severity: Normal
CC: atrick, barraclough, ggaren, mark.lam, mhahnenberg, msaboff, nrotem, oliver, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 112840    
Attachments:
   Description          Flags
   work in progress     none
   the patch            none
   the patch            none
   the patch            none
   the patch            oliver: review+

Description Filip Pizlo 2014-01-06 15:42:58 PST
Effectively, make SpeculationRecovery work in the FTL.  Patch forthcoming.
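For readers unfamiliar with the idea, here is a minimal sketch of what speculation recovery means for an overflow-checked add or sub. It is not the FTL code from the patch. The names CheckedResult, speculativeAdd and speculativeSub are hypothetical, and the sketch assumes 32-bit integers with wrapping arithmetic done in unsigned types. The point is that when an operand is dead after the operation, the exit path does not need it kept alive: if the check fails, the original operand can be recomputed from the wrapped result and the other operand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: either the arithmetic result, or (on overflow)
// the left operand as recovered for the exit path.
struct CheckedResult {
    bool overflowed;
    int32_t value; // result if !overflowed, recovered left operand otherwise
};

// Overflow-checked add that behaves as if 'left' were overwritten in place.
inline CheckedResult speculativeAdd(int32_t left, int32_t right)
{
    // Wrapping add in unsigned arithmetic, which is well defined in C++.
    uint32_t wrapped = static_cast<uint32_t>(left) + static_cast<uint32_t>(right);
    int32_t sum = static_cast<int32_t>(wrapped);
    // Signed add overflows iff both operands differ in sign from the result.
    bool overflowed = ((left ^ sum) & (right ^ sum)) < 0;
    if (!overflowed)
        return { false, sum };
    // Recovery: undo the add. Modular arithmetic gives back the exact operand.
    uint32_t recovered = wrapped - static_cast<uint32_t>(right);
    return { true, static_cast<int32_t>(recovered) };
}

// Same idea for sub: recover the left operand by adding 'right' back.
inline CheckedResult speculativeSub(int32_t left, int32_t right)
{
    uint32_t wrapped = static_cast<uint32_t>(left) - static_cast<uint32_t>(right);
    int32_t diff = static_cast<int32_t>(wrapped);
    // Signed sub overflows iff operands differ in sign and the result's sign
    // differs from the left operand's.
    bool overflowed = ((left ^ right) & (left ^ diff)) < 0;
    if (!overflowed)
        return { false, diff };
    uint32_t recovered = wrapped + static_cast<uint32_t>(right);
    return { true, static_cast<int32_t>(recovered) };
}

// Small usage example: the operand survives an overflowing add.
inline void speculationRecoveryExample()
{
    CheckedResult r = speculativeAdd(INT32_MAX, 1);
    assert(r.overflowed && r.value == INT32_MAX);
}
```

In this sketch the recovered value is handed back directly. In a real compiler the recovery would be recorded as a description attached to the exit, so the fast path pays nothing for it.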
Comment 1 Filip Pizlo 2014-01-08 13:27:48 PST
Created attachment 220658 [details]
work in progress
Comment 2 Filip Pizlo 2014-01-08 14:13:21 PST
Created attachment 220663 [details]
the patch
Comment 3 Filip Pizlo 2014-01-08 14:15:03 PST
Created attachment 220664 [details]
the patch
Comment 4 Filip Pizlo 2014-01-08 16:22:34 PST
Created attachment 220673 [details]
the patch
Comment 5 Filip Pizlo 2014-01-08 16:51:53 PST
Created attachment 220676 [details]
the patch
Comment 6 Filip Pizlo 2014-01-08 19:57:51 PST
Here's what the performance looks like.  Note that gbemu is slower with FTL anyway.  We need to investigate that.  So, it's possible that the gbemu result below is a fluke.  If I can't figure it out easily then I'll just ignore it for now and fix gbemu as part of a separate investigation.


Benchmark report for SunSpider, Octane, and Kraken on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/cStack/OpenSource/WebKitBuild/Release/jsc (r161524)
    export JSC_useExperimentalFTL=true
"AddSubRecovery" at /Volumes/Data/fromMiniMe/cStack/OpenSource/WebKitBuild/Release/jsc (r161520)
    export JSC_useExperimentalFTL=true

Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree               AddSubRecovery                                  
SunSpider:
   3d-cube                                    7.7477+-0.0921            7.6919+-0.0624        
   3d-morph                                   8.9602+-0.1306            8.9439+-0.1633        
   3d-raytrace                                9.5637+-0.1792            9.5186+-0.1196        
   access-binary-trees                        2.4490+-0.0225     ?      2.4539+-0.0312        ?
   access-fannkuch                            8.0607+-0.0973            8.0432+-0.1013        
   access-nbody                               4.2694+-0.0749     ?      4.2695+-0.0168        ?
   access-nsieve                              5.5850+-0.0770     ?      5.6194+-0.0248        ?
   bitops-3bit-bits-in-byte                   1.9774+-0.0268     ?      1.9896+-0.0100        ?
   bitops-bits-in-byte                        6.6305+-0.0235            6.6007+-0.0865        
   bitops-bitwise-and                         3.0238+-0.0835     ?      3.0575+-0.0304        ? might be 1.0112x slower
   bitops-nsieve-bits                         5.8492+-0.0190     ?      5.8584+-0.0438        ?
   controlflow-recursive                      4.5934+-0.0648            4.5706+-0.0487        
   crypto-aes                                 5.8817+-0.1081     ?      5.9063+-0.0429        ?
   crypto-md5                                 3.5607+-0.0620     ?      3.5769+-0.0402        ?
   crypto-sha1                                3.6101+-0.0187            3.5884+-0.0552        
   date-format-tofte                         11.5845+-0.1187     ?     11.6822+-0.1167        ?
   date-format-xparb                          8.7830+-0.0958     ^      8.5817+-0.0888        ^ definitely 1.0235x faster
   math-cordic                                4.7315+-0.0493     ?      4.7629+-0.0156        ?
   math-partial-sums                         10.0864+-0.1116     ?     10.1489+-0.1402        ?
   math-spectral-norm                         4.6061+-0.0208            4.4216+-0.3540          might be 1.0417x faster
   regexp-dna                                12.9702+-0.0865     ?     12.9944+-0.1125        ?
   string-base64                              5.5906+-0.0428            5.5418+-0.0193        
   string-fasta                              11.5935+-0.2043     ?     11.6970+-0.2534        ?
   string-tagcloud                           15.0665+-0.1307     ?     15.2615+-0.1236        ? might be 1.0129x slower
   string-unpack-code                        32.5556+-0.2230     ?     32.7959+-0.4159        ?
   string-validate-input                      7.0304+-0.0611     ?      7.0469+-0.0563        ?

   <arithmetic> *                             7.9370+-0.0112     ?      7.9471+-0.0260        ? might be 1.0013x slower
   <geometric>                                6.5435+-0.0148            6.5385+-0.0267          might be 1.0008x faster
   <harmonic>                                 5.5255+-0.0247            5.5191+-0.0302          might be 1.0012x faster

                                                TipOfTree               AddSubRecovery                                  
Octane and V8v7:
   encrypt                                   0.47793+-0.00038    ^     0.43935+-0.00583       ^ definitely 1.0878x faster
   decrypt                                   8.59600+-0.05267    ^     8.13880+-0.09517       ^ definitely 1.0562x faster
   deltablue                        x2       0.55095+-0.00425    ?     0.55502+-0.00560       ?
   earley                                    0.88133+-0.00916    ?     0.89300+-0.01437       ? might be 1.0132x slower
   boyer                                    12.10769+-0.14450         11.99398+-0.02967       
   raytrace                         x2       3.92836+-0.03043    ?     3.94097+-0.04681       ?
   regexp                           x2      30.82303+-0.09634    ?    31.04310+-0.58821       ?
   richards                         x2       0.22442+-0.00251          0.22347+-0.00274       
   splay                            x2       0.62854+-0.00353    ?     0.63194+-0.00737       ?
   navier-stokes                    x2       8.39458+-0.13356    ^     8.22516+-0.01000       ^ definitely 1.0206x faster
   closure                                   0.77800+-0.00791          0.77601+-0.00993       
   jquery                                   10.96835+-0.08423         10.86367+-0.07504       
   gbemu                            x2      82.45033+-0.34407    !    94.26814+-0.64635       ! definitely 1.1433x slower
   mandreel                         x2     103.46800+-0.23636        102.74142+-0.57080       
   pdfjs                            x2      99.56900+-0.26983    ?    99.82254+-0.25272       ?
   box2d                            x2      35.31580+-0.45731         35.14919+-0.12333       

V8v7:
   <arithmetic>                              6.94767+-0.03015          6.91903+-0.07269         might be 1.0041x faster
   <geometric> *                             2.18671+-0.01090          2.16780+-0.01165         might be 1.0087x faster
   <harmonic>                                0.80145+-0.00474          0.79453+-0.00688         might be 1.0087x faster

Octane including V8v7:
   <arithmetic>                             29.40444+-0.04191    !    30.24257+-0.06338       ! definitely 1.0285x slower
   <geometric> *                             6.60793+-0.01984    ?     6.63299+-0.02120       ? might be 1.0038x slower
   <harmonic>                                1.21150+-0.00671          1.20167+-0.00973         might be 1.0082x faster

                                                TipOfTree               AddSubRecovery                                  
Kraken:
   ai-astar                                  493.103+-0.779      ?     493.471+-0.729         ?
   audio-beat-detection                      222.061+-1.313      ?     222.454+-1.470         ?
   audio-dft                                 293.444+-2.965            291.676+-1.039         
   audio-fft                                 129.743+-0.136            129.604+-0.132         
   audio-oscillator                          531.025+-7.483      ?     532.127+-3.620         ?
   imaging-darkroom                          294.288+-1.613            293.047+-1.153         
   imaging-desaturate                        120.759+-0.120      ^     110.162+-1.400         ^ definitely 1.0962x faster
   imaging-gaussian-blur                     199.408+-0.386      ^     190.213+-0.642         ^ definitely 1.0483x faster
   json-parse-financial                       79.736+-0.247      ?      80.204+-0.363         ?
   json-stringify-tinderbox                  106.659+-1.333            106.562+-0.444         
   stanford-crypto-aes                        92.862+-1.503             92.286+-0.383         
   stanford-crypto-ccm                       105.746+-7.114            102.586+-3.351           might be 1.0308x faster
   stanford-crypto-pbkdf2                    261.388+-4.301            259.783+-0.818         
   stanford-crypto-sha256-iterative          113.275+-0.435      ?     114.188+-0.730         ?

   <arithmetic> *                            217.393+-1.180      ^     215.597+-0.380         ^ definitely 1.0083x faster
   <geometric>                               180.389+-1.110      ^     178.179+-0.500         ^ definitely 1.0124x faster
   <harmonic>                                153.694+-1.141      ^     151.589+-0.598         ^ definitely 1.0139x faster

                                                TipOfTree               AddSubRecovery                                  
All benchmarks:
   <arithmetic>                              60.8238+-0.2464           60.7772+-0.0805          might be 1.0008x faster
   <geometric>                               13.2746+-0.0202           13.2558+-0.0263          might be 1.0014x faster
   <harmonic>                                 2.5135+-0.0119            2.4962+-0.0170          might be 1.0070x faster

                                                TipOfTree               AddSubRecovery                                  
Geomean of preferred means:
   <scaled-result>                           22.5070+-0.0437           22.4828+-0.0349          might be 1.0011x faster
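
As a reading aid for the table above, here is a small sketch of how the ratio column can be reproduced from the reported means. The struct and function names are hypothetical. The rule that "definitely" corresponds to non-overlapping 95% confidence intervals is an assumption inferred from the rows in this report, not taken from the benchmark harness itself.

```cpp
#include <cassert>

// One reported cell: mean execution time and the +- half-width of its
// 95% confidence interval, both in milliseconds.
struct Measurement {
    double mean;
    double halfWidth;
};

struct Comparison {
    double ratio;          // always >= 1, read as "ratio x faster/slower"
    bool slower;           // true if the changed build has the larger mean
    bool intervalsOverlap; // assumed to separate "might be" from "definitely"
};

inline Comparison compare(Measurement base, Measurement changed)
{
    bool slower = changed.mean > base.mean;
    double ratio = slower ? changed.mean / base.mean : base.mean / changed.mean;
    bool overlap = base.mean - base.halfWidth <= changed.mean + changed.halfWidth
        && changed.mean - changed.halfWidth <= base.mean + base.halfWidth;
    return { ratio, slower, overlap };
}

// Usage example with the gbemu row: TipOfTree vs AddSubRecovery.
inline void gbemuExample()
{
    Comparison c = compare({ 82.45033, 0.34407 }, { 94.26814, 0.64635 });
    assert(c.slower && !c.intervalsOverlap);
}
```

Applied to the gbemu row, this gives a ratio of about 1.1433 with disjoint intervals, which matches the "definitely 1.1433x slower" annotation. Small differences in the last digit on other rows are expected, since the report prints rounded means.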
Comment 7 Filip Pizlo 2014-01-08 20:06:12 PST
Yes, the gbemu regression is real.  I think it's better to investigate it separately and get this landed.

(In reply to comment #6)
Comment 8 Filip Pizlo 2014-01-08 20:08:33 PST
Landed on branch in http://trac.webkit.org/changeset/161543
Comment 9 Filip Pizlo 2014-01-10 20:26:43 PST
*** Bug 125755 has been marked as a duplicate of this bug. ***
Comment 10 Mark Lam 2014-01-13 18:41:33 PST
Review status updated in r161938: <http://trac.webkit.org/r161938>.