Bug 163548 - Air::IRC doesn't need to have a special-case for uncolored Tmps since they all get colored
Summary: Air::IRC doesn't need to have a special-case for uncolored Tmps since they al...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords: InRadar
Depends on: 163509
Blocks:
  Show dependency treegraph
 
Reported: 2016-10-17 09:55 PDT by Filip Pizlo
Modified: 2016-10-17 14:49 PDT (History)
3 users (show)

See Also:


Attachments
the patch (24.90 KB, patch)
2016-10-17 11:42 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (25.55 KB, patch)
2016-10-17 11:51 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (25.71 KB, patch)
2016-10-17 11:58 PDT, Filip Pizlo
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2016-10-17 09:55:34 PDT
IRC used to assume that no-degree Tmps can get any color. That's not true. No degree does not really mean no interference because of coalescable Moves. Bug 163509 was about that. Now we just need to remove the other side of this special case: IRC still has a special case for uncolored Tmps. We don't need that anymore - a truly unused Tmp will get handled naturally by the algorithm.

If we ever find that large numbers of unused Tmps are putting a stress on IRC and making it run longer, then we could just add a Tmp pruning pass.

Also, it turns out that this special case is hiding other bugs.  I'm investigating those now.
Comment 1 Filip Pizlo 2016-10-17 09:58:32 PDT
Here's a horrible bug.  Say you have a Patchpoint that late-clobbers all regs followed by a Patchpoint that wants a scratch.  If you do this, the scratch is uncolorable.

We have this bug right now because we don't have callee-save FPRs on x86.  So if you have a CCall or JS call Patchpoint followed by a Patchpoint that wants FP scratches, then the FP scratch will not get colored and it will be left as %xmm0 at the end of IRC.  That may be wrong, if the return value of the CCall or Patchpoint just before was %xmm0, for example.
Comment 2 Filip Pizlo 2016-10-17 10:17:03 PDT
This bug exists because our strategy to reason about interference at instruction boundaries doesn't capture the essence of what scratch registers are all about.  For any other kind of use, we know that the tmp holding the use will have to have been live across the previous instruction, so it will definitely interfere with the previous instruction's late defs and late clobbers.  So, we can safely conflate the execution of the previous instruction's late actions with the execution of this instruction's early actions.

Reasoning about interference only at instruction boundaries is also imprecise for register clobber sets - but only just. Imagine a pair of consecutive insts where the first has a dead def and the second has an early clobber. In fact, the dead def could use the same register as the early clobber, but we won't let that happen. Perhaps a better example is an inst with a late clobber followed by an inst with an early def. Clearly, the early def could use the same register as the previous instruction's late clobber, but we won't let it. That's actually a serious bug. It would mean that if you early define a floating point value and the thing before you is a CCall, you'll have a really bad time.

Fortunately, the only thing we have to do to ensure that things don't go wrong is to insert Nops to pad between instructions that have incompatible constraints.  Here are some Nop insertion rules that we could apply:

- Insert Nop before any instruction that claims early reg clobber and after any instruction that claims late reg clobber. I think that this would fix all known cases of this bug. This still means that the scratch registers of one inst could affect the colorability of early defs of the next inst, which is pretty weird.

- Insert Nop around scratch users. This wouldn't be enough, since it wouldn't fix the early def problem.

Maybe one way to look at this is that Scratch and EarlyDef are the problem children. Let's consider if this is really true by looking at the four main forms of arg kinds: early versus late crossed with use versus def.

Early Def: We know that this case is problematic, because it's exactly what happens in my current crash. It happens to be an early def triggered by Scratch.

Early Use: Uses cause liveness. Either the thing that is being used is defined by the previous instruction or it's live before the previous instruction - so either way, it interferes with all of the previous instruction's clobbers and defs.  No problem here.

Late Def: This case is also problematic. You could have a late def followed by something that early-clobbers all registers. In that case we'd want the def to use one of those registers and then insert spill code, but we can't do that if the def cannot get any regs at all.

Late Use: This could be a killing use. Imagine a late use followed by something that early-clobbers all registers.

Right now we don't have any clients that early-clobber all registers. But they may early-clobber some registers, and maybe this is enough to worry about.

Note that we can generalize this a bit further. A clobber is like a dead def. So the patterns of interference we have to worry about are:

Late Def or Late Clobber followed by Early Def or Early Clobber
Late Use followed by Early Def or Early Clobber.

Maybe what we should do is have a pre-pass for register allocators that inserts Nops to pad those bad cases.
Comment 3 Radar WebKit Bug Importer 2016-10-17 11:37:11 PDT
<rdar://problem/28804381>
Comment 4 Filip Pizlo 2016-10-17 11:42:15 PDT
Created attachment 291848 [details]
the patch
Comment 5 WebKit Commit Bot 2016-10-17 11:44:41 PDT
Attachment 291848 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:15:  Please consider whether the use of security-sensitive phrasing could help someone exploit WebKit: fuzzing  [changelog/unwantedsecurityterms] [3]
Total errors found: 1 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Filip Pizlo 2016-10-17 11:51:17 PDT
Created attachment 291850 [details]
the patch
Comment 7 WebKit Commit Bot 2016-10-17 11:54:12 PDT
Attachment 291850 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:16:  Please consider whether the use of security-sensitive phrasing could help someone exploit WebKit: fuzzing  [changelog/unwantedsecurityterms] [3]
Total errors found: 1 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Filip Pizlo 2016-10-17 11:58:06 PDT
Created attachment 291852 [details]
the patch
Comment 9 WebKit Commit Bot 2016-10-17 11:59:27 PDT
Attachment 291852 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:16:  Please consider whether the use of security-sensitive phrasing could help someone exploit WebKit: fuzzing  [changelog/unwantedsecurityterms] [3]
Total errors found: 1 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Filip Pizlo 2016-10-17 12:26:24 PDT
Perf is neutral:

Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r207408)
"Things" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r207408)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree                   Things                                      
SunSpider:
   3d-cube                                    4.5635+-0.2077     !      4.8669+-0.0921        ! definitely 1.0665x slower
   3d-morph                                   4.6061+-0.0371     ?      4.7535+-0.2435        ? might be 1.0320x slower
   3d-raytrace                                4.7595+-0.1825     ?      4.8223+-0.1327        ? might be 1.0132x slower
   access-binary-trees                        1.9661+-0.0268     ?      2.0055+-0.0611        ? might be 1.0201x slower
   access-fannkuch                            4.7254+-0.1757     ?      4.7303+-0.1320        ?
   access-nbody                               2.3392+-0.0273     ?      2.3603+-0.0908        ?
   access-nsieve                              3.2064+-0.0675            3.1633+-0.0407          might be 1.0136x faster
   bitops-3bit-bits-in-byte                   1.1117+-0.0726     ?      1.1221+-0.0309        ?
   bitops-bits-in-byte                        2.7420+-0.0654            2.7278+-0.0856        
   bitops-bitwise-and                         1.9352+-0.0342     ?      1.9899+-0.1687        ? might be 1.0283x slower
   bitops-nsieve-bits                         2.9925+-0.0412     ?      3.0083+-0.0516        ?
   controlflow-recursive                      2.2332+-0.0307     ?      2.2554+-0.0644        ?
   crypto-aes                                 4.2394+-0.0621     ?      4.3119+-0.1421        ? might be 1.0171x slower
   crypto-md5                                 2.6076+-0.0831     ?      2.6100+-0.0558        ?
   crypto-sha1                                2.7259+-0.1662            2.6991+-0.0678        
   date-format-tofte                          6.6424+-0.1466     ?      6.6534+-0.1654        ?
   date-format-xparb                          4.3614+-0.0681     ?      4.4875+-0.2865        ? might be 1.0289x slower
   math-cordic                                2.6462+-0.0672            2.6402+-0.0526        
   math-partial-sums                          3.8569+-0.0742            3.8362+-0.1170        
   math-spectral-norm                         1.9379+-0.0141     ?      1.9799+-0.0537        ? might be 1.0217x slower
   regexp-dna                                 6.0219+-0.1280     ?      6.0466+-0.2232        ?
   string-base64                              4.5788+-0.2945            4.5485+-0.1798        
   string-fasta                               5.3533+-0.2661            5.3433+-0.0949        
   string-tagcloud                            8.2626+-0.2509            8.2262+-0.5519        
   string-unpack-code                        18.8714+-0.9339           18.3337+-0.6285          might be 1.0293x faster
   string-validate-input                      4.2263+-0.1359            4.2087+-0.0681        

   <arithmetic>                               4.3659+-0.0488     ?      4.3743+-0.0422        ? might be 1.0019x slower

                                                TipOfTree                   Things                                      
LongSpider:
   3d-cube                                  766.3006+-7.8896     ?    774.9436+-10.7597       ? might be 1.0113x slower
   3d-morph                                 564.8380+-5.7219          560.7331+-2.7138        
   3d-raytrace                              451.1555+-6.5341     ?    452.2502+-4.7283        ?
   access-binary-trees                      785.1498+-3.1568          777.8087+-7.5539        
   access-fannkuch                          236.9313+-15.8096         228.9799+-4.0861          might be 1.0347x faster
   access-nbody                             503.0647+-5.3581          502.2517+-4.3050        
   access-nsieve                            283.2287+-12.0946         281.6157+-7.3657        
   bitops-3bit-bits-in-byte                  31.6905+-1.1717           31.3657+-0.7230          might be 1.0104x faster
   bitops-bits-in-byte                       81.0870+-1.7034     ?     82.3441+-2.7083        ? might be 1.0155x slower
   bitops-nsieve-bits                       368.1612+-5.3194     ?    371.5583+-10.6290       ?
   controlflow-recursive                    431.3577+-4.9020          426.2359+-2.2779          might be 1.0120x faster
   crypto-aes                               536.8144+-7.0910          530.1278+-9.4802          might be 1.0126x faster
   crypto-md5                               454.6825+-11.5408         447.8979+-9.3365          might be 1.0151x faster
   crypto-sha1                              593.3735+-8.6011          588.5035+-7.0299        
   date-format-tofte                        329.7884+-8.0984          325.7576+-3.4504          might be 1.0124x faster
   date-format-xparb                        597.3298+-3.5808     ?    597.3829+-3.6885        ?
   hash-map                                 135.3798+-1.8664          134.5875+-0.5538        
   math-cordic                              406.5844+-3.0753          406.4504+-1.8212        
   math-partial-sums                        280.3720+-1.1693     ?    285.6557+-7.9468        ? might be 1.0188x slower
   math-spectral-norm                       513.7357+-1.8481     ?    514.9935+-3.5529        ?
   string-base64                            465.3550+-2.4841     ?    470.2137+-4.4853        ? might be 1.0104x slower
   string-fasta                             323.7067+-4.7078     ?    329.2609+-7.7219        ? might be 1.0172x slower
   string-tagcloud                          161.2982+-2.1852     ?    162.5371+-0.8447        ?

   <geometric>                              334.5645+-1.6294          334.0034+-0.7251          might be 1.0017x faster

                                                TipOfTree                   Things                                      
V8Spider:
   crypto                                    33.8389+-0.2162     ?     33.9471+-0.4037        ?
   deltablue                                 48.0852+-1.1778           47.8535+-1.7035        
   earley-boyer                              34.2065+-0.4593     ?     34.5249+-0.4456        ?
   raytrace                                  18.6578+-0.3119           18.4515+-0.1656          might be 1.0112x faster
   regexp                                    49.6619+-0.5939           49.5150+-0.2479        
   richards                                  38.9279+-1.2655           37.6948+-0.7162          might be 1.0327x faster
   splay                                     28.6549+-0.9101     ?     29.2582+-0.8115        ? might be 1.0211x slower

   <geometric>                               34.4399+-0.3001           34.3528+-0.2323          might be 1.0025x faster

                                                TipOfTree                   Things                                      
Octane:
   encrypt                                   0.14809+-0.00079    ?     0.14842+-0.00083       ?
   decrypt                                   2.54850+-0.03149          2.54710+-0.00805       
   deltablue                        x2       0.11653+-0.00237    ?     0.11678+-0.00111       ?
   earley                                    0.23928+-0.00170          0.23807+-0.00154       
   boyer                                     4.26517+-0.12756          4.12466+-0.14025         might be 1.0341x faster
   navier-stokes                    x2       4.61390+-0.02118    ?     4.62915+-0.03708       ?
   raytrace                         x2       0.66140+-0.00761          0.65207+-0.00489         might be 1.0143x faster
   richards                         x2       0.07742+-0.00120          0.07704+-0.00025       
   splay                            x2       0.31110+-0.00157          0.30727+-0.00233         might be 1.0124x faster
   regexp                           x2      16.47500+-0.50938         16.23519+-0.56249         might be 1.0148x faster
   pdfjs                            x2      38.62938+-0.52428         38.53210+-0.49054       
   mandreel                         x2      39.78858+-0.31988    ?    39.98960+-0.49391       ?
   gbemu                            x2      28.49413+-0.34866         28.48138+-0.21021       
   closure                                   0.47102+-0.00322    ?     0.47189+-0.00435       ?
   jquery                                    6.42113+-0.03222    ?     6.46318+-0.04181       ?
   box2d                            x2       8.63700+-0.12467    ?     8.65220+-0.07580       ?
   zlib                             x2     336.08184+-12.29731   ?   337.94990+-4.50929       ?
   typescript                       x2     600.42969+-13.20123   ?   602.13529+-14.98861      ?

   <geometric>                               4.68792+-0.02586          4.67470+-0.02102         might be 1.0028x faster

                                                TipOfTree                   Things                                      
Kraken:
   ai-astar                                   88.491+-1.811      ?      89.722+-3.738         ? might be 1.0139x slower
   audio-beat-detection                       35.869+-1.533             35.285+-0.129           might be 1.0165x faster
   audio-dft                                  96.468+-3.494             95.269+-2.450           might be 1.0126x faster
   audio-fft                                  27.391+-0.069      ?      29.140+-4.349         ? might be 1.0639x slower
   audio-oscillator                           42.631+-0.225      ?      42.715+-0.244         ?
   imaging-darkroom                           55.945+-0.700             55.686+-0.531         
   imaging-desaturate                         40.610+-0.153      ?      42.115+-2.123         ? might be 1.0370x slower
   imaging-gaussian-blur                      59.987+-3.584             58.609+-2.727           might be 1.0235x faster
   json-parse-financial                       32.195+-1.510             31.944+-1.557         
   json-stringify-tinderbox                   21.179+-1.261      ?      22.691+-1.995         ? might be 1.0714x slower
   stanford-crypto-aes                        34.114+-0.352             34.014+-0.209         
   stanford-crypto-ccm                        32.386+-2.168             32.163+-0.476         
   stanford-crypto-pbkdf2                     88.909+-0.829      ?      91.229+-1.995         ? might be 1.0261x slower
   stanford-crypto-sha256-iterative           29.280+-0.517      ?      30.207+-2.015         ? might be 1.0317x slower

   <arithmetic>                               48.961+-0.559      ?      49.342+-0.677         ? might be 1.0078x slower

                                                TipOfTree                   Things                                      
AsmBench:
   bigfib.cpp                               409.4784+-7.7076     ?    409.5128+-5.0230        ?
   cray.c                                   354.0447+-5.1482     ?    354.8725+-3.2708        ?
   dry.c                                    386.9092+-4.0625     ?    390.0046+-2.7128        ?
   FloatMM.c                                633.1066+-33.3193         619.0405+-30.2663         might be 1.0227x faster
   gcc-loops.cpp                           3342.0907+-22.3980    ?   3353.3735+-14.7225       ?
   n-body.c                                 747.9452+-2.9462     ?    750.7775+-3.9302        ?
   Quicksort.c                              367.4604+-6.4377     ?    371.8779+-10.8246       ? might be 1.0120x slower
   stepanov_container.cpp                  3161.1853+-19.0877    ?   3163.3820+-22.5073       ?
   Towers.c                                 230.9992+-3.1366     ?    232.0922+-4.7698        ?

   <geometric>                              660.0768+-4.5254     ?    660.9889+-4.0355        ? might be 1.0014x slower

                                                TipOfTree                   Things                                      
Geomean of preferred means:
   <scaled-result>                           44.3607+-0.2115     ?     44.3904+-0.1871        ? might be 1.0007x slower
Comment 11 Geoffrey Garen 2016-10-17 14:22:15 PDT
Comment on attachment 291852 [details]
the patch

r=me
Comment 12 Filip Pizlo 2016-10-17 14:49:13 PDT
Landed in https://trac.webkit.org/changeset/207434