Bug 156486 - [JSC] B3 can use undefined bits or not defined required bits when spilling
Summary: [JSC] B3 can use undefined bits or not defined required bits when spilling
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-11 18:53 PDT by Benjamin Poulain
Modified: 2016-04-11 23:26 PDT (History)
6 users (show)

See Also:


Attachments
Patch (11.31 KB, patch)
2016-04-11 19:02 PDT, Benjamin Poulain
fpizlo: review+
Details | Formatted Diff | Diff
Patch (12.41 KB, patch)
2016-04-11 19:29 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-04-11 18:53:30 PDT
[JSC] B3 can use undefined bits or not defined required bits when spilling
Comment 1 Benjamin Poulain 2016-04-11 19:02:48 PDT
Created attachment 276197 [details]
Patch
Comment 2 WebKit Commit Bot 2016-04-11 19:04:22 PDT
Attachment 276197 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/testb3.cpp:11430:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11440:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11440:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11481:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11494:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11494:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11495:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11498:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11498:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1465:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 10 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Filip Pizlo 2016-04-11 19:05:26 PDT
Comment on attachment 276197 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276197&action=review

> Source/JavaScriptCore/ChangeLog:17
> +            Op64 %tmp1, %tmp2, %tmp3

I believe that the width analysis should report that we could read 64 bits from %tmp1, so then %tmp1's spill slot would be 64-bit.
Comment 4 Benjamin Poulain 2016-04-11 19:29:10 PDT
Created attachment 276200 [details]
Patch
Comment 5 WebKit Commit Bot 2016-04-11 19:30:42 PDT
Attachment 276200 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/testb3.cpp:11430:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11440:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11440:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11481:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11494:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11494:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11495:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11498:  Consider using CHECK_EQ instead of CHECK(a == b)  [readability/check] [2]
ERROR: Source/JavaScriptCore/b3/testb3.cpp:11498:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1465:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 10 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Filip Pizlo 2016-04-11 21:20:02 PDT
Comment on attachment 276200 [details]
Patch

How does performance look with this change?
Comment 7 Benjamin Poulain 2016-04-11 21:49:37 PDT
Looks like ASM.js is consistently ~1% slower.

                                                  Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                    8.4700+-0.2026     ?      8.7783+-0.3943        ? might be 1.0364x slower
   3d-morph                                   8.0162+-0.1140            7.8483+-0.1866          might be 1.0214x faster
   3d-raytrace                                8.9434+-0.0881     ?      8.9725+-0.4521        ?
   access-binary-trees                        3.1238+-0.1945     ?      3.1438+-0.1957        ?
   access-fannkuch                            9.1057+-0.1984     ?      9.2297+-0.2061        ? might be 1.0136x slower
   access-nbody                               4.2905+-0.2136     ?      4.3686+-0.1271        ? might be 1.0182x slower
   access-nsieve                              4.6182+-0.1408     ?      4.7003+-0.1357        ? might be 1.0178x slower
   bitops-3bit-bits-in-byte                   1.6906+-0.1234            1.6691+-0.1486          might be 1.0129x faster
   bitops-bits-in-byte                        5.0170+-0.1201     ?      5.0589+-0.1450        ?
   bitops-bitwise-and                         2.7761+-0.1008     ?      2.9255+-0.0535        ? might be 1.0538x slower
   bitops-nsieve-bits                         4.6110+-0.1628     ?      4.6262+-0.1005        ?
   controlflow-recursive                      3.4317+-0.1120     ?      3.5907+-0.1227        ? might be 1.0463x slower
   crypto-aes                                 6.1561+-0.4940     ?      6.1772+-0.3405        ?
   crypto-md5                                 3.8763+-0.1746     ?      3.9257+-0.1325        ? might be 1.0128x slower
   crypto-sha1                                3.5192+-0.1567     ?      3.5223+-0.2197        ?
   date-format-tofte                         11.9573+-0.3182           11.8615+-0.1484        
   date-format-xparb                          7.2903+-0.2723            7.2382+-0.2115        
   math-cordic                                4.4269+-0.1337            4.4053+-0.1568        
   math-partial-sums                          9.7850+-0.1278            9.4720+-0.3954          might be 1.0330x faster
   math-spectral-norm                         3.3010+-0.0402            3.2201+-0.1869          might be 1.0251x faster
   regexp-dna                                10.3401+-0.5237           10.0945+-0.3628          might be 1.0243x faster
   string-base64                              6.5437+-0.0766     ?      6.5683+-0.2092        ?
   string-fasta                               9.0342+-0.4598     ?      9.0490+-0.3726        ?
   string-tagcloud                           12.9733+-0.7505           12.6915+-0.3714          might be 1.0222x faster
   string-unpack-code                        26.7853+-1.0230     ?     27.3695+-1.8284        ? might be 1.0218x slower
   string-validate-input                      6.3016+-0.2336            6.2375+-0.1245          might be 1.0103x faster

   <arithmetic>                               7.1686+-0.0851     ?      7.1825+-0.1089        ? might be 1.0019x slower

                                                  Conf#1                    Conf#2                                      
Octane:
   encrypt                                   0.27507+-0.00936          0.27294+-0.00566       
   decrypt                                   4.92433+-0.01932    !     5.00699+-0.04442       ! definitely 1.0168x slower
   deltablue                        x2       0.24382+-0.00785          0.24216+-0.00365       
   earley                                    0.51026+-0.00446    ?     0.51219+-0.00350       ?
   boyer                                     8.59965+-0.01781    !     8.75121+-0.05111       ! definitely 1.0176x slower
   navier-stokes                    x2       6.45040+-0.01609          6.43984+-0.03206       
   raytrace                         x2       1.39355+-0.03777    ?     1.41191+-0.02327       ? might be 1.0132x slower
   richards                         x2       0.14166+-0.00191    ?     0.14179+-0.00215       ?
   splay                            x2       0.53410+-0.00409          0.53074+-0.01105       
   regexp                           x2      26.70413+-0.48799         26.62181+-0.37138       
   pdfjs                            x2      60.75209+-1.63618         60.37284+-0.46014       
   mandreel                         x2      67.10087+-0.27170    !    67.86370+-0.30657       ! definitely 1.0114x slower
   gbemu                            x2      44.55959+-2.38735         42.65322+-0.40096         might be 1.0447x faster
   closure                                   0.83894+-0.00408    !     0.84956+-0.00392       ! definitely 1.0127x slower
   jquery                                   10.92553+-0.25972    ?    10.99053+-0.10177       ?
   box2d                            x2      16.01959+-0.14411    ?    16.03096+-0.27536       ?
   zlib                             x2     550.03451+-13.48751       548.60665+-5.01457       
   typescript                       x2    1069.22711+-7.02785    ?  1072.60455+-11.38620      ?

   <geometric>                               8.25054+-0.05727          8.24164+-0.04554         might be 1.0011x faster

                                                  Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                  132.773+-1.363            132.524+-0.396         
   audio-beat-detection                       68.271+-0.251      ?      68.936+-1.227         ?
   audio-dft                                 127.768+-1.648      ?     129.038+-1.590         ?
   audio-fft                                  54.887+-0.406      ?      54.990+-0.319         ?
   audio-oscillator                           78.777+-0.603             78.288+-0.740         
   imaging-darkroom                           96.976+-2.958             95.803+-0.294           might be 1.0122x faster
   imaging-desaturate                         92.572+-0.285      ?      92.892+-0.557         ?
   imaging-gaussian-blur                     121.156+-1.000            110.865+-15.946          might be 1.0928x faster
   json-parse-financial                       67.831+-0.424      ?      68.075+-0.309         ?
   json-stringify-tinderbox                   40.422+-0.764             40.129+-0.196         
   stanford-crypto-aes                        61.270+-0.715             61.013+-1.304         
   stanford-crypto-ccm                        59.892+-4.987             59.022+-5.200           might be 1.0147x faster
   stanford-crypto-pbkdf2                    149.264+-2.635      ?     149.866+-3.190         ?
   stanford-crypto-sha256-iterative           58.026+-0.545      ?      59.032+-0.555         ? might be 1.0173x slower

   <arithmetic>                               86.420+-0.555             85.748+-0.567           might be 1.0078x faster

                                                  Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                               656.9565+-9.9714     ?    658.5748+-14.1574       ?
   cray.c                                   565.3174+-5.8253     ?    566.1335+-10.5589       ?
   dry.c                                    651.1944+-27.5699    ?    674.1650+-0.6050        ? might be 1.0353x slower
   FloatMM.c                                889.3350+-2.3675     ?    891.9225+-3.2651        ?
   gcc-loops.cpp                           6420.1935+-30.6182    ?   6434.8116+-22.1752       ?
   n-body.c                                1522.2892+-79.2739    ?   1544.3279+-85.1951       ? might be 1.0145x slower
   Quicksort.c                              574.9522+-3.7392     !    590.7017+-2.7602        ! definitely 1.0274x slower
   stepanov_container.cpp                  4460.5098+-27.9670    ?   4487.1917+-49.2854       ?
   Towers.c                                 377.2852+-1.7200     !    385.9769+-0.6723        ! definitely 1.0230x slower

   <geometric>                             1081.9978+-5.5919     !   1095.7885+-2.4173        ! definitely 1.0127x slower

                                                  Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                           48.4940+-0.2464     ?     48.5633+-0.3385        ? might be 1.0014x slower

Second run of ASM.js:
                                  Conf#1                    Conf#2                                      

bigfib.cpp                  654.1125+-6.8197     ?    655.4327+-5.5464        ?
cray.c                      565.2639+-5.8350     ?    571.0610+-8.0384        ? might be 1.0103x slower
dry.c                       657.9476+-30.5735    ?    658.9620+-28.6640       ?
FloatMM.c                   890.0372+-1.6299     ?    891.0305+-3.7049        ?
gcc-loops.cpp              6419.4900+-21.4621        6414.3735+-32.6108       
n-body.c                   1548.6223+-91.1361        1542.5115+-81.2546       
Quicksort.c                 572.1219+-3.2370     !    589.4455+-8.9777        ! definitely 1.0303x slower
stepanov_container.cpp     4463.2445+-60.1400    ?   4490.3182+-59.7972       ?
Towers.c                    376.7356+-0.4260     !    385.5025+-0.7840        ! definitely 1.0233x slower

<geometric>                1084.1483+-11.3733    ?   1092.5004+-9.8962        ? might be 1.0077x slower
Comment 8 Benjamin Poulain 2016-04-11 22:14:06 PDT
                                 Baseline               SmallStackslot            LargeStackslot        LargeStackslot v. Baseline

bigfib.cpp                  655.0576+-11.3429         652.7036+-12.3709    ?    656.0836+-5.9068        ?
cray.c                      565.4435+-8.1634     ?    571.1257+-2.0889          565.9597+-5.1407        ?
dry.c                       650.7029+-25.8607    ?    674.8217+-2.0914          658.0632+-29.7491       ? might be 1.0113x slower
FloatMM.c                   889.3929+-1.2567          889.1890+-1.4387     ?    890.1675+-3.5484        ?
gcc-loops.cpp              6413.8161+-21.2019    ?   6414.8786+-38.3145        6408.4991+-19.4484       
n-body.c                   1574.0956+-69.2270        1550.5068+-42.5719    ?   1569.4171+-78.0803       
Quicksort.c                 574.8629+-6.2179     ?    575.6371+-4.8284     !    594.1012+-1.9979        ! definitely 1.0335x slower
stepanov_container.cpp     4437.3698+-20.8219    ?   4464.8043+-26.3570    ?   4476.6780+-44.9211       ?
Towers.c                    376.1720+-1.6275     ?    377.3234+-2.8622     !    386.1508+-1.9629        ! definitely 1.0265x slower

<geometric>                1084.5183+-8.1783     ?   1089.1745+-3.3137     ?   1094.0357+-9.8083        ? might be 1.0088x slower
Comment 9 Benjamin Poulain 2016-04-11 22:15:09 PDT
Baseline against first patch again:
                                  Conf#1                    Conf#2                                      

bigfib.cpp                  653.0945+-10.4978    ?    669.9697+-30.3487       ? might be 1.0258x slower
cray.c                      564.0093+-6.7668     ?    574.0212+-5.0392        ? might be 1.0178x slower
dry.c                       667.3754+-24.8306         650.4228+-25.3341         might be 1.0261x faster
FloatMM.c                   890.0898+-2.8910     ?    890.6833+-1.4781        ?
gcc-loops.cpp              6434.5238+-43.4264        6414.8804+-17.1803       
n-body.c                   1548.2090+-85.1345    ?   1551.4907+-41.5082       ?
Quicksort.c                 576.8409+-2.9926          575.0517+-2.3609        
stepanov_container.cpp     4438.1675+-30.6399    ?   4491.6426+-88.1739       ? might be 1.0120x slower
Towers.c                    376.4673+-0.9480     ?    377.4253+-2.2056        ?

<geometric>                1085.8985+-6.9108     ?   1089.3462+-5.8747        ? might be 1.0032x slower
Comment 10 Filip Pizlo 2016-04-11 22:23:19 PDT
I think both approaches are sound, and they both appear to deliver good performance.  I like the second one a little better because I think it puts the burden of work on defs (extra store on some defs) instead of uses (extra zext on some uses).
Comment 11 WebKit Commit Bot 2016-04-11 23:16:16 PDT
Comment on attachment 276200 [details]
Patch

Clearing flags on attachment: 276200

Committed r199337: <http://trac.webkit.org/changeset/199337>