Bug 71191 - The DFG inliner should not flush the callee
Summary: The DFG inliner should not flush the callee
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-30 12:28 PDT by Filip Pizlo
Modified: 2011-10-30 15:48 PDT (History)
0 users

See Also:


Attachments
the patch (14.72 KB, patch)
2011-10-30 12:32 PDT, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-10-30 12:28:04 PDT
The callee is a known constant once we get to the inline site.  There is no need to emit code to flush its value, since it is known statically.
Comment 1 Filip Pizlo 2011-10-30 12:32:29 PDT
Created attachment 112997 [details]
the patch

Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"LessFlushing" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                            TipOfTree              LessFlushing                                  
SunSpider:
   3d-cube                                7.6774+-0.2933          7.6607+-0.3838       
   3d-morph                               7.8094+-0.2228          7.6638+-0.1546         might be 1.0190x faster
   3d-raytrace                            7.6318+-0.1369    ?     7.8192+-0.3037       ? might be 1.0246x slower
   access-binary-trees                    1.6662+-0.0878          1.5880+-0.0485         might be 1.0492x faster
   access-fannkuch                        6.5485+-0.1799    ?     6.5864+-0.1052       ?
   access-nbody                           3.8464+-0.1466          3.7546+-0.0710         might be 1.0245x faster
   access-nsieve                          2.6408+-0.0810    ?     2.6629+-0.0597       ?
   bitops-3bit-bits-in-byte               1.2968+-0.0490    ?     1.3214+-0.0326       ? might be 1.0190x slower
   bitops-bits-in-byte                    2.3388+-0.0407    ?     2.4127+-0.0610       ? might be 1.0316x slower
   bitops-bitwise-and                     3.3356+-0.0755    ?     3.3868+-0.0625       ? might be 1.0154x slower
   bitops-nsieve-bits                     5.4879+-0.1551    ?     5.6985+-0.1596       ? might be 1.0384x slower
   controlflow-recursive                  2.1144+-0.0506    ?     2.1350+-0.0434       ?
   crypto-aes                             7.7821+-0.2848          7.3630+-0.2184         might be 1.0569x faster
   crypto-md5                             2.8010+-0.0746    ?     2.8239+-0.0653       ?
   crypto-sha1                            2.5037+-0.1033          2.4452+-0.0559         might be 1.0239x faster
   date-format-tofte                     10.7080+-0.3585         10.6284+-0.3882       
   date-format-xparb                      9.3093+-0.3583          9.1810+-0.1518         might be 1.0140x faster
   math-cordic                            6.3960+-0.0707    ?     6.5141+-0.1048       ? might be 1.0185x slower
   math-partial-sums                      7.5764+-0.1904          7.5538+-0.1381       
   math-spectral-norm                     2.6344+-0.0869          2.5553+-0.0489         might be 1.0310x faster
   regexp-dna                            11.8781+-0.2835         11.8478+-0.2854       
   string-base64                          4.7189+-0.3074          4.3514+-0.0887         might be 1.0845x faster
   string-fasta                           6.3242+-0.1923    ?     6.3576+-0.1993       ?
   string-tagcloud                       11.8795+-0.4631    ?    12.0385+-0.4493       ? might be 1.0134x slower
   string-unpack-code                    21.2102+-0.6343         20.9655+-0.6009         might be 1.0117x faster
   string-validate-input                  5.6201+-0.2652    ^     5.2098+-0.0356       ^ definitely 1.0787x faster

   <arithmetic> *                         6.2975+-0.0308          6.2510+-0.0537       
   <geometric>                            5.0577+-0.0298          5.0211+-0.0229       
   <harmonic>                             4.0199+-0.0444          3.9972+-0.0248       

                                            TipOfTree              LessFlushing                                  
V8:
   crypto                                74.1495+-0.5178         73.7049+-0.2992       
   deltablue                            178.5523+-1.2782    ^   174.6592+-0.9780       ^ definitely 1.0223x faster
   earley-boyer                          92.2212+-0.5215         91.5441+-0.8722       
   raytrace                              63.2862+-0.6343         62.8049+-0.5868       
   regexp                               106.2758+-0.2968    ?   106.5172+-0.5921       ?
   richards                             126.4207+-0.3875    ?   126.7569+-0.8468       ?
   splay                                 94.0682+-0.6561         93.3268+-0.5404       

   <arithmetic>                         104.9963+-0.2882    ^   104.1877+-0.2257       ^ definitely 1.0078x faster
   <geometric> *                         99.6581+-0.2726    ^    99.0038+-0.2268       ^ definitely 1.0066x faster
   <harmonic>                            94.9949+-0.2986    ^    94.4175+-0.2496       ^ definitely 1.0061x faster

                                            TipOfTree              LessFlushing                                  
Kraken:
   ai-astar                             507.7860+-5.5031        500.0292+-3.1202         might be 1.0155x faster
   audio-beat-detection                 193.5606+-1.6671        193.3397+-1.0881       
   audio-dft                            270.9824+-2.0267    !   279.2854+-4.6612       ! definitely 1.0306x slower
   audio-fft                            125.5675+-1.0207    ?   125.9343+-1.3975       ?
   audio-oscillator                     251.7554+-1.5405    ?   253.0721+-2.5280       ?
   imaging-darkroom                     403.8683+-0.8796    ?   405.8017+-1.5481       ?
   imaging-desaturate                   225.5569+-0.8329    ?   227.1678+-1.1289       ?
   imaging-gaussian-blur                559.0650+-2.9824        557.9223+-2.4453       
   json-parse-financial                  57.9411+-0.8268         57.6131+-0.4766       
   json-stringify-tinderbox              68.9015+-0.2731    ?    69.5439+-0.6252       ?
   stanford-crypto-aes                  134.0962+-1.3364        133.1493+-1.3010       
   stanford-crypto-ccm                  100.2692+-0.5826    ?   101.1800+-0.7529       ?
   stanford-crypto-pbkdf2               193.9197+-1.1550    ?   196.3040+-3.6841       ? might be 1.0123x slower
   stanford-crypto-sha256-iterative      71.5146+-0.3583         71.0716+-0.4888       

   <arithmetic> *                       226.0560+-0.5869    ?   226.5296+-0.2998       ?
   <geometric>                          177.9366+-0.4431    ?   178.4780+-0.2787       ?
   <harmonic>                           140.2580+-0.4695    ?   140.5373+-0.2454       ?

                                            TipOfTree              LessFlushing                                  
All benchmarks:
   <arithmetic>                          86.4573+-0.2136         86.4522+-0.1007       
   <geometric>                           22.7697+-0.0828         22.6769+-0.0622       
   <harmonic>                             7.0768+-0.0761          7.0378+-0.0427       

                                            TipOfTree              LessFlushing                                  
Geomean of preferred means:
   <scaled-result>                       52.1551+-0.1390         51.9477+-0.1545
Comment 2 Oliver Hunt 2011-10-30 15:21:35 PDT
Comment on attachment 112997 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review

It would be nice is we could elide the argument flushing when we can prove no OSRs will happen

> Source/JavaScriptCore/bytecode/CodeOrigin.h:81
>      unsigned stackOffset;
> -    unsigned calleeVR;
> +    WriteBarrier<JSFunction> callee;
>      CodeOrigin caller;

can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4
Comment 3 Filip Pizlo 2011-10-30 15:43:42 PDT
(In reply to comment #2)
> (From update of attachment 112997 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review
> 
> It would be nice is we could elide the argument flushing when we can prove no OSRs will happen

Most of the argument flushes seem to be:

1) Flushing a constant
2) Flushing a value that is provably equal to the argument to the machine code block
3) Flushing a value that was already flushed because it was provably equal to an argument to something higher up in the inline stack.

I'm planning on doing optimizations for these cases once I write a test suite (well, a suite of LayoutTests) for arguments handling combined with inlining and OSR.  Right now I consider those optimizations too risky since I don't even know if that functionality works.

> 
> > Source/JavaScriptCore/bytecode/CodeOrigin.h:81
> >      unsigned stackOffset;
> > -    unsigned calleeVR;
> > +    WriteBarrier<JSFunction> callee;
> >      CodeOrigin caller;
> 
> can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4

Good point, made that change.
Comment 4 Filip Pizlo 2011-10-30 15:48:34 PDT
Landed in http://trac.webkit.org/changeset/98831