The callee is a known constant once we get to the inline site. There is no need to emit code to flush its value, since it is known statically.
Created attachment 112997 [details] the patch Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "LessFlushing" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree LessFlushing SunSpider: 3d-cube 7.6774+-0.2933 7.6607+-0.3838 3d-morph 7.8094+-0.2228 7.6638+-0.1546 might be 1.0190x faster 3d-raytrace 7.6318+-0.1369 ? 7.8192+-0.3037 ? might be 1.0246x slower access-binary-trees 1.6662+-0.0878 1.5880+-0.0485 might be 1.0492x faster access-fannkuch 6.5485+-0.1799 ? 6.5864+-0.1052 ? access-nbody 3.8464+-0.1466 3.7546+-0.0710 might be 1.0245x faster access-nsieve 2.6408+-0.0810 ? 2.6629+-0.0597 ? bitops-3bit-bits-in-byte 1.2968+-0.0490 ? 1.3214+-0.0326 ? might be 1.0190x slower bitops-bits-in-byte 2.3388+-0.0407 ? 2.4127+-0.0610 ? might be 1.0316x slower bitops-bitwise-and 3.3356+-0.0755 ? 3.3868+-0.0625 ? might be 1.0154x slower bitops-nsieve-bits 5.4879+-0.1551 ? 5.6985+-0.1596 ? might be 1.0384x slower controlflow-recursive 2.1144+-0.0506 ? 2.1350+-0.0434 ? crypto-aes 7.7821+-0.2848 7.3630+-0.2184 might be 1.0569x faster crypto-md5 2.8010+-0.0746 ? 2.8239+-0.0653 ? crypto-sha1 2.5037+-0.1033 2.4452+-0.0559 might be 1.0239x faster date-format-tofte 10.7080+-0.3585 10.6284+-0.3882 date-format-xparb 9.3093+-0.3583 9.1810+-0.1518 might be 1.0140x faster math-cordic 6.3960+-0.0707 ? 6.5141+-0.1048 ? might be 1.0185x slower math-partial-sums 7.5764+-0.1904 7.5538+-0.1381 math-spectral-norm 2.6344+-0.0869 2.5553+-0.0489 might be 1.0310x faster regexp-dna 11.8781+-0.2835 11.8478+-0.2854 string-base64 4.7189+-0.3074 4.3514+-0.0887 might be 1.0845x faster string-fasta 6.3242+-0.1923 ? 6.3576+-0.1993 ? string-tagcloud 11.8795+-0.4631 ? 12.0385+-0.4493 ? might be 1.0134x slower string-unpack-code 21.2102+-0.6343 20.9655+-0.6009 might be 1.0117x faster string-validate-input 5.6201+-0.2652 ^ 5.2098+-0.0356 ^ definitely 1.0787x faster <arithmetic> * 6.2975+-0.0308 6.2510+-0.0537 <geometric> 5.0577+-0.0298 5.0211+-0.0229 <harmonic> 4.0199+-0.0444 3.9972+-0.0248 TipOfTree LessFlushing V8: crypto 74.1495+-0.5178 73.7049+-0.2992 deltablue 178.5523+-1.2782 ^ 174.6592+-0.9780 ^ definitely 1.0223x faster earley-boyer 92.2212+-0.5215 91.5441+-0.8722 raytrace 63.2862+-0.6343 62.8049+-0.5868 regexp 106.2758+-0.2968 ? 106.5172+-0.5921 ? richards 126.4207+-0.3875 ? 126.7569+-0.8468 ? splay 94.0682+-0.6561 93.3268+-0.5404 <arithmetic> 104.9963+-0.2882 ^ 104.1877+-0.2257 ^ definitely 1.0078x faster <geometric> * 99.6581+-0.2726 ^ 99.0038+-0.2268 ^ definitely 1.0066x faster <harmonic> 94.9949+-0.2986 ^ 94.4175+-0.2496 ^ definitely 1.0061x faster TipOfTree LessFlushing Kraken: ai-astar 507.7860+-5.5031 500.0292+-3.1202 might be 1.0155x faster audio-beat-detection 193.5606+-1.6671 193.3397+-1.0881 audio-dft 270.9824+-2.0267 ! 279.2854+-4.6612 ! definitely 1.0306x slower audio-fft 125.5675+-1.0207 ? 125.9343+-1.3975 ? audio-oscillator 251.7554+-1.5405 ? 253.0721+-2.5280 ? imaging-darkroom 403.8683+-0.8796 ? 405.8017+-1.5481 ? imaging-desaturate 225.5569+-0.8329 ? 227.1678+-1.1289 ? imaging-gaussian-blur 559.0650+-2.9824 557.9223+-2.4453 json-parse-financial 57.9411+-0.8268 57.6131+-0.4766 json-stringify-tinderbox 68.9015+-0.2731 ? 69.5439+-0.6252 ? stanford-crypto-aes 134.0962+-1.3364 133.1493+-1.3010 stanford-crypto-ccm 100.2692+-0.5826 ? 101.1800+-0.7529 ? stanford-crypto-pbkdf2 193.9197+-1.1550 ? 196.3040+-3.6841 ? might be 1.0123x slower stanford-crypto-sha256-iterative 71.5146+-0.3583 71.0716+-0.4888 <arithmetic> * 226.0560+-0.5869 ? 226.5296+-0.2998 ? <geometric> 177.9366+-0.4431 ? 178.4780+-0.2787 ? <harmonic> 140.2580+-0.4695 ? 140.5373+-0.2454 ? TipOfTree LessFlushing All benchmarks: <arithmetic> 86.4573+-0.2136 86.4522+-0.1007 <geometric> 22.7697+-0.0828 22.6769+-0.0622 <harmonic> 7.0768+-0.0761 7.0378+-0.0427 TipOfTree LessFlushing Geomean of preferred means: <scaled-result> 52.1551+-0.1390 51.9477+-0.1545
Comment on attachment 112997 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review It would be nice is we could elide the argument flushing when we can prove no OSRs will happen > Source/JavaScriptCore/bytecode/CodeOrigin.h:81 > unsigned stackOffset; > - unsigned calleeVR; > + WriteBarrier<JSFunction> callee; > CodeOrigin caller; can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4
(In reply to comment #2) > (From update of attachment 112997 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review > > It would be nice is we could elide the argument flushing when we can prove no OSRs will happen Most of the argument flushes seem to be: 1) Flushing a constant 2) Flushing a value that is provably equal to the argument to the machine code block 3) Flushing a value that was already flushed because it was provably equal to an argument to something higher up in the inline stack. I'm planning on doing optimizations for these cases once I write a test suite (well, a suite of LayoutTests) for arguments handling combined with inlining and OSR. Right now I consider those optimizations too risky since I don't even know if that functionality works. > > > Source/JavaScriptCore/bytecode/CodeOrigin.h:81 > > unsigned stackOffset; > > - unsigned calleeVR; > > + WriteBarrier<JSFunction> callee; > > CodeOrigin caller; > > can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4 Good point, made that change.
Landed in http://trac.webkit.org/changeset/98831