Bug 153535 - IRC should know which Tmps hold constants, and emit spill code accordingly
Summary: IRC should know which Tmps hold constants, and emit spill code accordingly
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks: 150456
  Show dependency treegraph
 
Reported: 2016-01-26 21:58 PST by Filip Pizlo
Modified: 2016-01-26 23:05 PST (History)
12 users (show)

See Also:


Attachments
the patch (34.75 KB, patch)
2016-01-26 22:02 PST, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2016-01-26 21:58:25 PST
Patch forthcoming.
Comment 1 Filip Pizlo 2016-01-26 22:02:14 PST
Created attachment 269981 [details]
the patch
Comment 2 WebKit Commit Bot 2016-01-26 22:03:11 PST
Attachment 269981 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/air/AirTmpSummary.cpp:83:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Oliver Hunt 2016-01-26 22:17:08 PST
Comment on attachment 269981 [details]
the patch

R=me -- do you have full perf output?
Comment 4 Benjamin Poulain 2016-01-26 22:27:45 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=269981&action=review

The idea looks okay but I am not convinced that is safe.

You can have:

Move(Imm, Tmp1)
Move(Tmp1, Tmp2)
Add(const, Tmp2)
Use(Tmp2)

In the first round, you figure that Tmp1 and 2 should really be merged. Due to back luck, Tmp1 is picked as the name.
In the second round, you have to spill Tmp1. You check it's value and figure it is a constant.

You would need some very bad luck but that seems feasible to me.

It seems to me that you have to update the TmpSummary when aliasing. What was before a worse spill selection may end up into a bug now.

> Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1226
> +                    plan.constant = summary->constant;

You can probably get away with less duplication by doing everything bug filling the SlotOrConstant outside the branch.

SlotOrConstant plan;
if ()
   ... constant plan
else
   ... stackslot plan

Give it a temp

> Source/JavaScriptCore/b3/air/AirTmpSummary.h:49
> +        int64_t constant { 0 };

Why not store the Air::Arg?
Comment 5 Filip Pizlo 2016-01-26 22:42:47 PST
(In reply to comment #4)
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269981&action=review
> 
> The idea looks okay but I am not convinced that is safe.
> 
> You can have:
> 
> Move(Imm, Tmp1)
> Move(Tmp1, Tmp2)
> Add(const, Tmp2)
> Use(Tmp2)
> 
> In the first round, you figure that Tmp1 and 2 should really be merged. Due
> to back luck, Tmp1 is picked as the name.
> In the second round, you have to spill Tmp1. You check it's value and figure
> it is a constant.
> 
> You would need some very bad luck but that seems feasible to me.
> 
> It seems to me that you have to update the TmpSummary when aliasing. What
> was before a worse spill selection may end up into a bug now.

Yup, you're right.

> 
> > Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1226
> > +                    plan.constant = summary->constant;
> 
> You can probably get away with less duplication by doing everything bug
> filling the SlotOrConstant outside the branch.
> 
> SlotOrConstant plan;
> if ()
>    ... constant plan
> else
>    ... stackslot plan
> 
> Give it a temp
> 
> > Source/JavaScriptCore/b3/air/AirTmpSummary.h:49
> > +        int64_t constant { 0 };
> 
> Why not store the Air::Arg?
Comment 6 Filip Pizlo 2016-01-26 22:47:52 PST
I'm not seeing a big speed-up from this.  I will retest.  But if implementing this means having to do a lot of analysis updates during spilling, then that may be too much complexity for no speed-up.
Comment 7 Filip Pizlo 2016-01-26 23:05:24 PST
Yeah I'm going to abandon this.  "FixStackSlot" is really this patch.


Benchmark report for SunSpider, V8Spider, Octane, Kraken, and AsmBench on shakezilla (MacBookPro11,3).

VMs tested:
"LLVM" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r195637)
"B3ToT" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r195653)
"FixStackSlot" at /Volumes/Data/quinary/OpenSource/WebKitBuild/Release/jsc (r195653)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark
iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark
execution times with 95% confidence intervals in milliseconds.

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
SunSpider:
   3d-cube                                    4.7894+-0.2559     ?      4.9212+-0.4678            4.6844+-0.0956          might be 1.0224x faster
   3d-morph                                   5.4443+-0.1522            5.4110+-0.2089     ?      5.5740+-0.2290        ? might be 1.0238x slower
   3d-raytrace                                5.6170+-0.2053            5.5291+-0.0769            5.4171+-0.0780          might be 1.0369x faster
   access-binary-trees                        2.2149+-0.1390     ?      2.5319+-0.6268            2.5278+-0.8981        ? might be 1.1413x slower
   access-fannkuch                            5.7083+-0.2275     ?      5.7520+-0.2915            5.6160+-0.0437          might be 1.0164x faster
   access-nbody                               2.8453+-0.2651            2.8114+-0.3230     ?      2.9543+-0.3683        ? might be 1.0383x slower
   access-nsieve                              3.3110+-0.3001     ?      3.7904+-0.4962     ^      3.1191+-0.0811          might be 1.0615x faster
   bitops-3bit-bits-in-byte                   1.1755+-0.0140            1.1733+-0.0181     ?      1.2276+-0.0572        ? might be 1.0444x slower
   bitops-bits-in-byte                        3.1551+-0.0492     ?      3.1640+-0.0737            3.1606+-0.0467        ?
   bitops-bitwise-and                         1.9755+-0.0293     ?      2.1064+-0.1746            1.9745+-0.0246        
   bitops-nsieve-bits                         2.9530+-0.0587     ?      3.0345+-0.0485     ^      2.9362+-0.0125        
   controlflow-recursive                      2.6573+-0.7910            2.4211+-0.0975            2.2872+-0.0419          might be 1.1619x faster
   crypto-aes                                 4.0558+-0.0580            4.0356+-0.0843     ?      4.1029+-0.1861        ? might be 1.0116x slower
   crypto-md5                                 2.5686+-0.1678            2.4742+-0.0215     ?      2.6687+-0.3550        ? might be 1.0390x slower
   crypto-sha1                                2.5059+-0.2783            2.4903+-0.1492            2.4428+-0.1370          might be 1.0258x faster
   date-format-tofte                          7.3312+-0.2966            6.9319+-0.1673            6.8319+-0.0348        ^ definitely 1.0731x faster
   date-format-xparb                          4.6788+-0.1351     ?      5.0529+-0.8775            4.9351+-0.3952        ? might be 1.0548x slower
   math-cordic                                2.9197+-0.0281     ?      3.1418+-0.3785            2.8995+-0.0521        
   math-partial-sums                          4.9575+-0.2886            4.8896+-0.0920            4.8880+-0.0905          might be 1.0142x faster
   math-spectral-norm                         2.0713+-0.0082     ?      2.0731+-0.0459            2.0570+-0.0632        
   regexp-dna                                 6.3435+-0.1982            6.1319+-0.1590     ?      6.2488+-0.2575          might be 1.0152x faster
   string-base64                              4.8879+-0.5220            4.8155+-0.1738     ^      4.5019+-0.0843          might be 1.0857x faster
   string-fasta                               5.8739+-0.0692            5.8568+-0.1465            5.8030+-0.0912          might be 1.0122x faster
   string-tagcloud                            8.0479+-0.3202            8.0140+-0.3554            7.7443+-0.0373          might be 1.0392x faster
   string-unpack-code                        19.2266+-1.7077           18.3458+-0.8107     ?     18.8305+-1.5089          might be 1.0210x faster
   string-validate-input                      4.4493+-0.1844     ?      4.6125+-0.4244            4.3435+-0.2975          might be 1.0244x faster

   <arithmetic>                               4.6833+-0.1094            4.6735+-0.0593            4.6068+-0.0730          might be 1.0166x faster

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
V8Spider:
   crypto                                    50.4202+-1.0291     ^     38.0345+-0.4847           37.6337+-0.6090        ^ definitely 1.3398x faster
   deltablue                                 76.6903+-2.3078     ^     51.4685+-0.8000     ?     53.4559+-1.3808        ^ definitely 1.4346x faster
   earley-boyer                              45.5573+-0.8511     ^     41.9361+-0.8742           41.8469+-0.4467        ^ definitely 1.0887x faster
   raytrace                                  30.8907+-1.3434     ^     20.3018+-0.6365     ?     20.5273+-0.9701        ^ definitely 1.5049x faster
   regexp                                    63.8780+-2.8251     ?     64.8892+-3.2281           64.2653+-2.6902        ?
   richards                                  53.2545+-1.1488     ^     40.1047+-0.5869     ?     47.0843+-16.2073         might be 1.1310x faster
   splay                                     38.9781+-1.9016           36.6893+-1.7494           36.2443+-1.8178          might be 1.0754x faster

   <geometric>                               49.4164+-0.6504     ^     39.8205+-0.3884     ?     40.6534+-1.8265        ^ definitely 1.2156x faster

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
Octane:
   encrypt                                   0.15251+-0.00208    !     0.16297+-0.00296          0.15864+-0.00407       ? might be 1.0402x slower
   decrypt                                   2.90266+-0.00906    ^     2.85430+-0.02266          2.83900+-0.00300       ^ definitely 1.0224x faster
   deltablue                        x2       0.13560+-0.00320          0.13395+-0.00335    ?     0.13412+-0.00481         might be 1.0111x faster
   earley                                    0.28595+-0.00715          0.27764+-0.00259    ?     0.28103+-0.00357         might be 1.0175x faster
   boyer                                     4.25335+-0.02297    ?     4.36977+-0.13231          4.29641+-0.00876       ! definitely 1.0101x slower
   navier-stokes                    x2       4.86960+-0.17996          4.83430+-0.02650          4.80787+-0.01486         might be 1.0128x faster
   raytrace                         x2       0.85306+-0.00544    !     0.86828+-0.00268    !     0.88498+-0.00581       ! definitely 1.0374x slower
   richards                         x2       0.08555+-0.00069    ^     0.07882+-0.00122    ?     0.07907+-0.00081       ^ definitely 1.0819x faster
   splay                            x2       0.35048+-0.00259    ?     0.35344+-0.00386          0.34989+-0.00126       
   regexp                           x2      25.27622+-0.42318    ^    24.27359+-0.15231         24.05678+-0.33669       ^ definitely 1.0507x faster
   pdfjs                            x2      38.03603+-0.31473         37.69347+-0.10104    ?    37.80200+-0.28297       
   mandreel                         x2      42.71740+-0.82776    ?    43.15691+-0.64838    ?    44.03764+-2.64013       ? might be 1.0309x slower
   gbemu                            x2      29.18268+-0.34543    !    32.38942+-2.62173    ?    34.13318+-3.96887       ! definitely 1.1696x slower
   closure                                   0.56746+-0.00255          0.56601+-0.00445          0.56455+-0.00149       
   jquery                                    7.32850+-0.04465          7.32338+-0.03271    ?     7.33013+-0.07065       ?
   box2d                            x2       9.16494+-0.05151    ^     9.02753+-0.08029    !     9.28304+-0.05046       ! definitely 1.0129x slower
   zlib                             x2     376.69156+-16.47696   ?   385.74034+-10.18427       382.34942+-17.89433      ? might be 1.0150x slower
   typescript                       x2     686.20626+-22.49363       675.96883+-17.63547       668.51090+-12.70805        might be 1.0265x faster

   <geometric>                               5.29248+-0.02903    ?     5.29388+-0.03583    ?     5.31485+-0.06388       ? might be 1.0042x slower

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
Kraken:
   ai-astar                                  129.422+-3.409      ^      94.272+-3.070             93.845+-0.878         ^ definitely 1.3791x faster
   audio-beat-detection                       49.567+-0.523      !      55.176+-0.205      ?      55.540+-0.838         ! definitely 1.1205x slower
   audio-dft                                  95.289+-1.367      ?      96.266+-1.702      ?      96.545+-1.788         ? might be 1.0132x slower
   audio-fft                                  35.509+-0.656      !      44.368+-0.817             43.882+-0.801         ! definitely 1.2358x slower
   audio-oscillator                           59.799+-1.799      ^      52.255+-1.896             51.419+-1.543         ^ definitely 1.1630x faster
   imaging-darkroom                           60.026+-0.746      ?      60.150+-0.083      ?      60.180+-0.075         ?
   imaging-desaturate                         49.252+-2.382      !      86.944+-0.573             86.811+-0.858         ! definitely 1.7626x slower
   imaging-gaussian-blur                      89.319+-4.651      ^      68.899+-2.958             68.210+-0.399         ^ definitely 1.3095x faster
   json-parse-financial                       37.175+-0.642      ?      37.700+-1.064      ?      43.903+-15.552        ? might be 1.1810x slower
   json-stringify-tinderbox                   23.588+-0.719             23.321+-1.359      ?      23.795+-3.512         ?
   stanford-crypto-aes                        43.097+-1.236      ^      40.649+-1.166             40.378+-1.087         ^ definitely 1.0673x faster
   stanford-crypto-ccm                        37.085+-1.656      ?      37.134+-1.540      ?      37.901+-1.152         ? might be 1.0220x slower
   stanford-crypto-pbkdf2                     97.625+-0.655      !      98.836+-0.420      !      99.920+-0.597         ! definitely 1.0235x slower
   stanford-crypto-sha256-iterative           37.577+-1.650      ?      38.586+-1.140      ?      39.282+-1.340         ? might be 1.0454x slower

   <arithmetic>                               60.309+-0.264      ^      59.611+-0.375      ?      60.115+-1.551           might be 1.0032x faster

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
AsmBench:
   bigfib.cpp                               446.1048+-7.0531     ?    460.2276+-16.1047         453.6863+-6.0451        ? might be 1.0170x slower
   cray.c                                   390.9513+-1.0508     ^    375.1922+-5.9795     ?    376.4558+-4.4787        ^ definitely 1.0385x faster
   dry.c                                    426.2250+-2.4951     ^    411.9704+-7.2703     ?    415.3663+-11.4569         might be 1.0261x faster
   FloatMM.c                                680.7536+-1.3438     !    734.1637+-2.4712     ?    734.8278+-4.1856        ! definitely 1.0794x slower
   gcc-loops.cpp                           3414.7884+-16.0611    !   3506.4532+-9.9958     ?   3510.0105+-22.5020       ! definitely 1.0279x slower
   n-body.c                                 820.3629+-1.0765     !    852.7295+-2.3957          851.1387+-2.3577        ! definitely 1.0375x slower
   Quicksort.c                              406.3832+-1.0965     ^    397.3653+-4.6427          392.7680+-2.4018        ^ definitely 1.0347x faster
   stepanov_container.cpp                  3478.7534+-26.8152    ^   3220.7915+-31.0237    ?   3245.4424+-15.6376       ^ definitely 1.0719x faster
   Towers.c                                 232.1444+-1.1496     !    250.9630+-2.9381     ?    253.3819+-5.0959        ! definitely 1.0915x slower

   <geometric>                              711.1437+-1.5959     ?    717.0700+-5.3172     ?    717.3056+-2.8316        ! definitely 1.0087x slower

                                                   LLVM                     B3ToT                  FixStackSlot            FixStackSlot v. LLVM   
Geomean of preferred means:
   <scaled-result>                           34.9997+-0.1968     ^     33.4868+-0.1406     ?     33.6097+-0.4412        ^ definitely 1.0414x faster