Bug 151128 - Get rid of anonymous stack slots
Summary: Get rid of anonymous stack slots
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks: 150279
  Show dependency treegraph
 
Reported: 2015-11-10 20:23 PST by Filip Pizlo
Modified: 2016-02-02 15:21 PST (History)
11 users (show)

See Also:


Attachments
it's a start (37.67 KB, patch)
2016-01-30 17:51 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
moar (82.38 KB, patch)
2016-01-31 11:15 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
probably done (103.12 KB, patch)
2016-01-31 11:42 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
more (125.17 KB, patch)
2016-02-01 12:19 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
getting close (132.32 KB, patch)
2016-02-01 13:00 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
almost done (156.34 KB, patch)
2016-02-01 19:08 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
rebased patch (156.61 KB, patch)
2016-02-02 09:55 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
more fixes (157.64 KB, patch)
2016-02-02 10:18 PST, Filip Pizlo
mark.lam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2015-11-10 20:23:20 PST
Air::allocateStack assumes that a Def of a StackSlot means that the StackSlot is not live before that instruction.  But this is only true if the instruction Def's the full size of the StackSlot.  This is hard to tell because currently, a Def in Air means that it could write anywhere from 1 to 8 bytes.

There are a lot of possible solutions.

For starters, B3 StackSlots are unlikely to benefit from being allocated based on liveness.  We could simply create a new notion stack slot that has the Use/Def semantics we want.  For example, we could say that the "Anonymous" stack slot kind actually means that any store to the stack slot changes all bytes in the StackSlot.  This would only require a documentation change.

Alternatively, we could incorporate forward flow into the notion of liveness: a Def is only a Def if just prior to it, none of the bytes in the StackSlot have had a value stored into them.
Comment 1 Filip Pizlo 2016-01-30 17:50:14 PST
The easiest way to resolve this issue is to hide the concept of anonymous stack slots.  They should be private to Air.  And Air should just call them "spill slots".
Comment 2 Filip Pizlo 2016-01-30 17:51:15 PST
Created attachment 270331 [details]
it's a start
Comment 3 Filip Pizlo 2016-01-31 11:15:58 PST
Created attachment 270341 [details]
moar
Comment 4 Filip Pizlo 2016-01-31 11:42:58 PST
Created attachment 270342 [details]
probably done

Still need to test it and stuff
Comment 5 Filip Pizlo 2016-02-01 12:19:48 PST
Created attachment 270410 [details]
more
Comment 6 Filip Pizlo 2016-02-01 13:00:34 PST
Created attachment 270413 [details]
getting close
Comment 7 Filip Pizlo 2016-02-01 19:08:20 PST
Created attachment 270464 [details]
almost done
Comment 8 Filip Pizlo 2016-02-02 09:55:39 PST
Created attachment 270493 [details]
rebased patch
Comment 9 WebKit Commit Bot 2016-02-02 09:56:54 PST
Attachment 270493 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/B3VariableValue.h:57:  The parameter name "variable" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/B3ControlValue.h:150:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
Total errors found: 2 in 68 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Filip Pizlo 2016-02-02 10:18:18 PST
Created attachment 270495 [details]
more fixes

Fixed some rebasing issues.
Comment 11 WebKit Commit Bot 2016-02-02 10:20:41 PST
Attachment 270495 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/B3VariableValue.h:57:  The parameter name "variable" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/B3ControlValue.h:150:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
Total errors found: 2 in 68 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Filip Pizlo 2016-02-02 10:43:51 PST
This change was not intended to be a speed-up, but the perf results show that it might be one.


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and AsmBench on shakezilla (MacBookPro11,3).

VMs tested:
"B3ToT" at /Volumes/Data/quartary/OpenSource/WebKitBuild/Release/jsc (r196011)
"FixStackSlot" at /Volumes/Data/quinary/OpenSource/WebKitBuild/Release/jsc (r196011)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                  B3ToT                  FixStackSlot                                   
SunSpider:
   3d-cube                                    4.7480+-0.0486     ?      4.7718+-0.0722        ?
   3d-morph                                   5.4072+-0.2233            5.3054+-0.0569          might be 1.0192x faster
   3d-raytrace                                5.5427+-0.0537     ?      5.5785+-0.0524        ?
   access-binary-trees                        2.2648+-0.1512            2.1749+-0.0858          might be 1.0413x faster
   access-fannkuch                            5.8050+-0.2445     ?      5.8069+-0.1516        ?
   access-nbody                               2.7018+-0.0133     ?      2.7232+-0.0184        ?
   access-nsieve                              3.5011+-0.0693            3.4103+-0.0553          might be 1.0266x faster
   bitops-3bit-bits-in-byte                   1.1917+-0.0505            1.1693+-0.0125          might be 1.0191x faster
   bitops-bits-in-byte                        3.1534+-0.0230     ?      3.2168+-0.0451        ? might be 1.0201x slower
   bitops-bitwise-and                         1.9707+-0.0283            1.9603+-0.0149        
   bitops-nsieve-bits                         3.0369+-0.0511     ?      3.0538+-0.0999        ?
   controlflow-recursive                      2.4565+-0.0714            2.3833+-0.0455          might be 1.0307x faster
   crypto-aes                                 4.0472+-0.0675     ?      4.0738+-0.1430        ?
   crypto-md5                                 2.5238+-0.0458     ?      2.5527+-0.0795        ? might be 1.0114x slower
   crypto-sha1                                2.4445+-0.1741     ?      2.6718+-0.1576        ? might be 1.0930x slower
   date-format-tofte                          6.8427+-0.1864     ?      7.0432+-0.1719        ? might be 1.0293x slower
   date-format-xparb                          4.8087+-0.0853            4.7933+-0.2462        
   math-cordic                                2.9539+-0.0372            2.9361+-0.0225        
   math-partial-sums                          4.8722+-0.0560            4.8697+-0.0491        
   math-spectral-norm                         2.0385+-0.0638     ?      2.1464+-0.0912        ? might be 1.0529x slower
   regexp-dna                                 6.3192+-0.2103            6.1148+-0.1756          might be 1.0334x faster
   string-base64                              4.7965+-0.1614            4.6636+-0.0741          might be 1.0285x faster
   string-fasta                               5.7413+-0.0747     ?      5.7524+-0.0570        ?
   string-tagcloud                            7.9209+-0.1224            7.9159+-0.1177        
   string-unpack-code                        17.9409+-0.1135     ?     18.7487+-1.8518        ? might be 1.0450x slower
   string-validate-input                      4.4277+-0.1414     ?      4.4420+-0.1174        ?

   <arithmetic>                               4.5945+-0.0175     ?      4.6261+-0.0824        ? might be 1.0069x slower

                                                  B3ToT                  FixStackSlot                                   
V8Spider:
   crypto                                    37.6754+-0.2706           37.6636+-0.2515        
   deltablue                                 52.7941+-2.7719     ?     52.9433+-1.6701        ?
   earley-boyer                              41.2561+-0.3886     ?     41.3205+-0.2763        ?
   raytrace                                  20.9548+-0.7038           20.9502+-0.5271        
   regexp                                    62.8147+-0.7418           62.2260+-1.0473        
   richards                                  40.6309+-0.6083           40.5625+-0.7045        
   splay                                     37.1233+-0.8260     ?     37.3645+-0.8885        ?

   <geometric>                               39.9594+-0.4885           39.9580+-0.2804          might be 1.0000x faster

                                                  B3ToT                  FixStackSlot                                   
Octane:
   encrypt                                   0.16140+-0.00368          0.16073+-0.00447       
   decrypt                                   2.79783+-0.01323          2.79723+-0.00774       
   deltablue                        x2       0.14406+-0.02129          0.13550+-0.00170         might be 1.0632x faster
   earley                                    0.28023+-0.00172          0.27850+-0.00286       
   boyer                                     4.55362+-0.18487          4.44199+-0.13664         might be 1.0251x faster
   navier-stokes                    x2       4.81771+-0.02299    ?     4.82065+-0.02585       ?
   raytrace                         x2       0.88177+-0.00452          0.88096+-0.01121       
   richards                         x2       0.08047+-0.00093          0.08017+-0.00127       
   splay                            x2       0.35205+-0.00407          0.35142+-0.00126       
   regexp                           x2      25.01267+-0.51362    ?    25.07954+-0.40898       ?
   pdfjs                            x2      37.47671+-0.29114    ?    37.61198+-0.25680       ?
   mandreel                         x2      42.93871+-0.28512         42.91171+-0.28955       
   gbemu                            x2      24.87135+-0.49699         24.63162+-0.35977       
   closure                                   0.56665+-0.00410          0.56062+-0.00260         might be 1.0108x faster
   jquery                                    7.34220+-0.04309    ^     7.26102+-0.01401       ^ definitely 1.0112x faster
   box2d                            x2       9.08035+-0.05127    ?     9.08730+-0.07989       ?
   zlib                             x2     391.23366+-9.82845        381.67175+-9.67881         might be 1.0251x faster
   typescript                       x2     673.42631+-11.23961   ?   679.24784+-11.82868      ?

   <geometric>                               5.25180+-0.04799          5.21408+-0.01331         might be 1.0072x faster

                                                  B3ToT                  FixStackSlot                                   
Kraken:
   ai-astar                                   95.793+-3.221             95.258+-3.986         
   audio-beat-detection                       55.538+-0.132      ?      55.750+-0.330         ?
   audio-dft                                  96.923+-2.625      ?      96.939+-3.021         ?
   audio-fft                                  43.667+-0.112             43.529+-0.251         
   audio-oscillator                           50.562+-1.938             49.785+-0.619           might be 1.0156x faster
   imaging-darkroom                           59.581+-0.198             59.483+-0.138         
   imaging-desaturate                         43.303+-0.306      ?      43.568+-0.339         ?
   imaging-gaussian-blur                      68.462+-0.842             67.959+-0.250         
   json-parse-financial                       37.898+-1.646      ?      37.919+-0.843         ?
   json-stringify-tinderbox                   22.792+-0.431      ^      22.025+-0.183         ^ definitely 1.0348x faster
   stanford-crypto-aes                        41.315+-1.218             40.813+-0.418           might be 1.0123x faster
   stanford-crypto-ccm                        36.482+-1.726      ?      37.177+-1.361         ? might be 1.0190x slower
   stanford-crypto-pbkdf2                    101.317+-0.466      ?     101.477+-0.588         ?
   stanford-crypto-sha256-iterative           38.681+-0.537             38.481+-0.200         

   <arithmetic>                               56.594+-0.297             56.440+-0.508           might be 1.0027x faster

                                                  B3ToT                  FixStackSlot                                   
AsmBench:
   bigfib.cpp                               424.5592+-1.0756          424.1931+-3.3810        
   cray.c                                   393.0655+-2.5063          391.8980+-0.5336        
   dry.c                                    411.3667+-7.5392          407.1241+-6.5431          might be 1.0104x faster
   FloatMM.c                                699.6375+-0.8075     ?    701.6382+-2.8382        ?
   gcc-loops.cpp                           3506.9248+-9.0975         3504.7858+-8.7061        
   n-body.c                                 854.3085+-2.6624     ?    858.5210+-15.7331       ?
   Quicksort.c                              393.5944+-3.0170     ?    395.8725+-4.3933        ?
   stepanov_container.cpp                  3243.0458+-13.0321        3239.1754+-25.5083       
   Towers.c                                 256.0876+-0.6784     ?    256.3263+-0.8264        ?

   <geometric>                              712.0084+-1.6787          711.8753+-1.5493          might be 1.0002x faster

                                                  B3ToT                  FixStackSlot                                   
Geomean of preferred means:
   <scaled-result>                           32.9517+-0.1368           32.9297+-0.1795          might be 1.0007x faster
Comment 13 Mark Lam 2016-02-02 13:09:15 PST
Comment on attachment 270495 [details]
more fixes

View in context: https://bugs.webkit.org/attachment.cgi?id=270495&action=review

r=me with some comments.

> Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2290
> +                // accurately tha the effects() method does.

typo: tha ==> than.

> Source/JavaScriptCore/b3/B3VariableValue.h:57
> +    VariableValue(Opcode, Origin, Variable* variable);

The "variable" name adds no info.  Please remove.

> Websites/webkit.org/docs/b3/intermediate-representation.html:113
> +      input to B3, it's a good idea to run fixSSA() manually before running the compiler. The

Did you mean to say "before running the optimizer" instead?
Comment 14 Filip Pizlo 2016-02-02 13:34:23 PST
(In reply to comment #13)
> Comment on attachment 270495 [details]
> more fixes
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=270495&action=review
> 
> r=me with some comments.
> 
> > Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2290
> > +                // accurately tha the effects() method does.
> 
> typo: tha ==> than.

Fixed.

> 
> > Source/JavaScriptCore/b3/B3VariableValue.h:57
> > +    VariableValue(Opcode, Origin, Variable* variable);
> 
> The "variable" name adds no info.  Please remove.

Fixed.

> 
> > Websites/webkit.org/docs/b3/intermediate-representation.html:113
> > +      input to B3, it's a good idea to run fixSSA() manually before running the compiler. The
> 
> Did you mean to say "before running the optimizer" instead?

Currently, there is no difference between "running the compiler" and "running the optimizer".  It's not possible to run "just the compiler".  When running the compiler, you can specify an optLevel, and setting it to zero will disable some, but not all, optimizations.
Comment 15 Filip Pizlo 2016-02-02 15:21:59 PST
Landed in http://trac.webkit.org/changeset/196032