Bug 156603 - [JSC] ReduceDoubleToFloat should work accross Phis
Summary: [JSC] ReduceDoubleToFloat should work accross Phis
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-14 15:13 PDT by Benjamin Poulain
Modified: 2016-04-17 22:12 PDT (History)
6 users (show)

See Also:


Attachments
Patch (47.69 KB, patch)
2016-04-14 15:27 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (47.82 KB, patch)
2016-04-14 15:31 PDT, Benjamin Poulain
fpizlo: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-04-14 15:13:25 PDT
[JSC] ReduceDoubleToFloat should work accross Phis
Comment 1 Benjamin Poulain 2016-04-14 15:27:48 PDT
Created attachment 276440 [details]
Patch
Comment 2 WebKit Commit Bot 2016-04-14 15:30:47 PDT
Attachment 276440 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/testb3.cpp:4236:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:4236:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 2 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Benjamin Poulain 2016-04-14 15:31:19 PDT
Created attachment 276441 [details]
Patch
Comment 4 Benjamin Poulain 2016-04-14 15:31:45 PDT
1.5% faster on Unity3D.
2.8% faster on box2d-throughput-f32

                                                  Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                    5.0630+-0.0857     ?      5.1213+-0.0912        ? might be 1.0115x slower
   3d-morph                                   5.1391+-0.0193     ?      5.2100+-0.0776        ? might be 1.0138x slower
   3d-raytrace                                5.5978+-0.0689     ?      5.6257+-0.0604        ?
   access-binary-trees                        2.1249+-0.0241            2.1194+-0.0183        
   access-fannkuch                            5.9132+-0.0966            5.8171+-0.1002          might be 1.0165x faster
   access-nbody                               2.5805+-0.0718            2.5222+-0.0056          might be 1.0231x faster
   access-nsieve                              3.0710+-0.0510            3.0691+-0.0538        
   bitops-3bit-bits-in-byte                   1.1362+-0.0128            1.1247+-0.0095          might be 1.0102x faster
   bitops-bits-in-byte                        2.7992+-0.0283     ?      2.8100+-0.0510        ?
   bitops-bitwise-and                         2.1385+-0.0990            2.0785+-0.0485          might be 1.0288x faster
   bitops-nsieve-bits                         3.1204+-0.0447            3.1090+-0.0156        
   controlflow-recursive                      2.3639+-0.0231            2.3509+-0.0494        
   crypto-aes                                 4.0411+-0.0868            4.0368+-0.0466        
   crypto-md5                                 2.4738+-0.0330     ?      2.5009+-0.0602        ? might be 1.0109x slower
   crypto-sha1                                2.3662+-0.0380            2.3448+-0.0259        
   date-format-tofte                          6.3992+-0.0751     ?      6.5363+-0.1850        ? might be 1.0214x slower
   date-format-xparb                          4.7750+-0.1088     ?      4.8119+-0.0967        ?
   math-cordic                                2.8944+-0.0293            2.8840+-0.0174        
   math-partial-sums                          4.8608+-0.0467     ?      4.9004+-0.0901        ?
   math-spectral-norm                         2.0161+-0.0368            1.9980+-0.0088        
   regexp-dna                                 6.3816+-0.1312     ?      6.4131+-0.1861        ?
   string-base64                              4.9389+-0.1949     ?      5.0019+-0.2586        ? might be 1.0128x slower
   string-fasta                               5.9164+-0.1130            5.8000+-0.0569          might be 1.0201x faster
   string-tagcloud                            8.1603+-0.0850            8.1314+-0.0860        
   string-unpack-code                        19.3697+-0.4996     ?     19.7377+-0.6318        ? might be 1.0190x slower
   string-validate-input                      4.4263+-0.0927            4.4134+-0.1178        

   <arithmetic>                               4.6180+-0.0226     ?      4.6334+-0.0330        ? might be 1.0033x slower

                                                  Conf#1                    Conf#2                                      
Octane:
   encrypt                                   0.16302+-0.00072          0.16235+-0.00062       
   decrypt                                   2.84583+-0.00869          2.84089+-0.00306       
   deltablue                        x2       0.13999+-0.00195          0.13934+-0.00196       
   earley                                    0.28653+-0.00112    ?     0.28696+-0.00135       ?
   boyer                                     5.03678+-0.03691          5.02123+-0.05303       
   navier-stokes                    x2       5.00274+-0.00923    ?     5.01018+-0.01144       ?
   raytrace                         x2       0.79686+-0.00220    ?     0.79716+-0.00250       ?
   richards                         x2       0.08333+-0.00051    ?     0.08348+-0.00042       ?
   splay                            x2       0.34197+-0.00270          0.34191+-0.00165       
   regexp                           x2      15.83075+-0.10910         15.79311+-0.11414       
   pdfjs                            x2      39.62902+-0.28598    ?    39.74836+-0.27570       ?
   mandreel                         x2      42.53394+-0.11733         42.36217+-0.11472       
   gbemu                            x2      24.25185+-0.05847    ?    24.29922+-0.14632       ?
   closure                                   0.53462+-0.00215          0.53338+-0.00160       
   jquery                                    6.83363+-0.02290    ?     6.87418+-0.02429       ?
   box2d                            x2       9.18677+-0.03212    ?     9.26675+-0.05235       ?
   zlib                             x2     357.60263+-4.57913        355.81132+-3.04523       
   typescript                       x2     619.53806+-2.20658        618.44664+-1.75235       

   <geometric>                               5.02325+-0.00644          5.02243+-0.00712         might be 1.0002x faster

                                                  Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                   88.233+-0.805             88.147+-0.868         
   audio-beat-detection                       42.004+-0.068      ?      42.064+-0.090         ?
   audio-dft                                  99.891+-1.072             99.869+-1.198         
   audio-fft                                  32.905+-0.269             32.834+-0.171         
   audio-oscillator                           47.920+-0.127      ?      47.926+-0.049         ?
   imaging-darkroom                           60.327+-0.173      ^      59.968+-0.106         ^ definitely 1.0060x faster
   imaging-desaturate                         45.300+-0.161             45.242+-0.133         
   imaging-gaussian-blur                      64.753+-1.804             64.115+-2.095         
   json-parse-financial                       37.782+-0.091      ^      37.230+-0.232         ^ definitely 1.0148x faster
   json-stringify-tinderbox                   24.799+-1.453             22.849+-0.701           might be 1.0853x faster
   stanford-crypto-aes                        39.370+-0.432             39.369+-0.408         
   stanford-crypto-ccm                        32.723+-0.580             32.631+-0.857         
   stanford-crypto-pbkdf2                     96.461+-0.334      ?      96.972+-0.934         ?
   stanford-crypto-sha256-iterative           37.037+-0.538             36.835+-0.239         

   <arithmetic>                               53.536+-0.193             53.289+-0.238           might be 1.0046x faster

                                                  Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                               443.7581+-1.3842     ?    444.4828+-2.2829        ?
   cray.c                                   364.3141+-1.2121     ?    365.1794+-1.8583        ?
   dry.c                                    443.5284+-23.1532         427.9712+-4.1389          might be 1.0364x faster
   FloatMM.c                                725.7166+-3.5496     ?    726.8204+-3.3913        ?
   gcc-loops.cpp                           3712.3209+-3.7844     ?   3720.0697+-12.8160       ?
   n-body.c                                 808.3756+-2.3792     ?    809.4591+-3.0377        ?
   Quicksort.c                              397.8874+-1.1236     ?    398.7238+-1.4567        ?
   stepanov_container.cpp                  3332.8912+-15.0782        3328.9234+-13.3996       
   Towers.c                                 272.8819+-0.5958          272.7597+-0.2360        

   <geometric>                              726.3510+-3.9406          724.5353+-1.0660          might be 1.0025x faster

                                                  Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                           30.8175+-0.0684           30.7869+-0.0589          might be 1.0010x faster
Comment 5 WebKit Commit Bot 2016-04-14 15:33:23 PDT
Attachment 276441 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/testb3.cpp:4236:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:4236:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 2 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Filip Pizlo 2016-04-14 16:10:22 PDT
Comment on attachment 276441 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276441&action=review

I think I buy this.  I will read it again before r+'ing.  Maybe worth it to get someone else to look at it, too.  Maybe Saam.

> Source/JavaScriptCore/ChangeLog:10
> +        This patch extends B3's ReduceDoubleToFloat phase to work accross
> +        Upsilon-Phis. This is important to optimize loops and some crazy cases.

Awesome.
Comment 7 Saam Barati 2016-04-14 19:40:28 PDT
Comment on attachment 276441 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276441&action=review

I think I understand mostly what's going on. I've added some comments. 
I want to read it and grok it a bit more, but I have to catch a plane now. 
I'll look more later.

> Source/JavaScriptCore/ChangeLog:15
> +            Float @2 = DoubleToPhi(@1)

Typo: DoubleToFloat

> Source/JavaScriptCore/ChangeLog:22
> +        but the user of the value does not do DoubleToPhi.

Typo here too

> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:176
> +                changedPhiState = true;

This is dead code.

> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:216
> +                || (value->type() == Double

Indentation is off here.
This might be easier to read if it were broken up into individual if statements.

> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:346
> +                    if (phi->type() == Float && child->type() == Double

The "child->type() == Double" should always be true, right? Maybe make it an assertion

> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:388
> +                            upsilon->child(0) = newChild;

I have a question about B3 here:
What happens to the old value if we were the only reference of it?
Are all values allocated under some arena and are cleaned up after compilation?
Comment 8 Benjamin Poulain 2016-04-15 10:47:02 PDT
(In reply to comment #7)
> > Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:346
> > +                    if (phi->type() == Float && child->type() == Double
> 
> The "child->type() == Double" should always be true, right? Maybe make it an
> assertion

Remember that we are not in a valid state in simplify().
You can have a child that was already converted to float by an other node.

It would be fine to convert it again, but changedPhi has to tell the truth.

> > Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:388
> > +                            upsilon->child(0) = newChild;
> 
> I have a question about B3 here:
> What happens to the old value if we were the only reference of it?
> Are all values allocated under some arena and are cleaned up after
> compilation?

All the values are owned by the Procedure. The dead values stick around until you call deleteValue().
Comment 9 Filip Pizlo 2016-04-17 12:52:29 PDT
Comment on attachment 276441 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276441&action=review

>> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:216
>> +                || (value->type() == Double
> 
> Indentation is off here.
> This might be easier to read if it were broken up into individual if statements.

I believe that this indentation style is what we're supposed to do, but I agree that this would read easier if it was broken up.

>> Source/JavaScriptCore/b3/B3ReduceDoubleToFloat.cpp:388
>> +                            upsilon->child(0) = newChild;
> 
> I have a question about B3 here:
> What happens to the old value if we were the only reference of it?
> Are all values allocated under some arena and are cleaned up after compilation?

Values are owned by Procedure and it will delete all of them when the Procedure is deleted.

More precisely, Values can be in one of these states:

1) Not owned by any Procedure.  We don't allow this state to be user-visible by always funneling allocations through Procedure.
2) Owned by a Procedure but not in any BasicBlock.  This is not a memory leak since Procedure will delete all values.  This state will fail validation because we require all values to be in basic blocks.  If you know that you're in this state, Procedure has a GC method you can call.  Currently, the FTL calls this method right after lowering.  I don't think that method has any other users.
3) Owned by a Procedure and in one BasicBlock.  This is expected state of all values in between phases (i.e. when we validate).
Comment 10 Benjamin Poulain 2016-04-17 22:12:57 PDT
Committed r199648: <http://trac.webkit.org/changeset/199648>