RESOLVED FIXED 71191
The DFG inliner should not flush the callee
https://bugs.webkit.org/show_bug.cgi?id=71191
Summary The DFG inliner should not flush the callee
Filip Pizlo
Reported 2011-10-30 12:28:04 PDT
The callee is a known constant once we get to the inline site. There is no need to emit code to flush its value, since it is known statically.
Attachments
the patch (14.72 KB, patch)
2011-10-30 12:32 PDT, Filip Pizlo
oliver: review+
Filip Pizlo
Comment 1 2011-10-30 12:32:29 PDT
Created attachment 112997 [details] the patch Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "LessFlushing" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree LessFlushing SunSpider: 3d-cube 7.6774+-0.2933 7.6607+-0.3838 3d-morph 7.8094+-0.2228 7.6638+-0.1546 might be 1.0190x faster 3d-raytrace 7.6318+-0.1369 ? 7.8192+-0.3037 ? might be 1.0246x slower access-binary-trees 1.6662+-0.0878 1.5880+-0.0485 might be 1.0492x faster access-fannkuch 6.5485+-0.1799 ? 6.5864+-0.1052 ? access-nbody 3.8464+-0.1466 3.7546+-0.0710 might be 1.0245x faster access-nsieve 2.6408+-0.0810 ? 2.6629+-0.0597 ? bitops-3bit-bits-in-byte 1.2968+-0.0490 ? 1.3214+-0.0326 ? might be 1.0190x slower bitops-bits-in-byte 2.3388+-0.0407 ? 2.4127+-0.0610 ? might be 1.0316x slower bitops-bitwise-and 3.3356+-0.0755 ? 3.3868+-0.0625 ? might be 1.0154x slower bitops-nsieve-bits 5.4879+-0.1551 ? 5.6985+-0.1596 ? might be 1.0384x slower controlflow-recursive 2.1144+-0.0506 ? 2.1350+-0.0434 ? crypto-aes 7.7821+-0.2848 7.3630+-0.2184 might be 1.0569x faster crypto-md5 2.8010+-0.0746 ? 2.8239+-0.0653 ? crypto-sha1 2.5037+-0.1033 2.4452+-0.0559 might be 1.0239x faster date-format-tofte 10.7080+-0.3585 10.6284+-0.3882 date-format-xparb 9.3093+-0.3583 9.1810+-0.1518 might be 1.0140x faster math-cordic 6.3960+-0.0707 ? 6.5141+-0.1048 ? might be 1.0185x slower math-partial-sums 7.5764+-0.1904 7.5538+-0.1381 math-spectral-norm 2.6344+-0.0869 2.5553+-0.0489 might be 1.0310x faster regexp-dna 11.8781+-0.2835 11.8478+-0.2854 string-base64 4.7189+-0.3074 4.3514+-0.0887 might be 1.0845x faster string-fasta 6.3242+-0.1923 ? 6.3576+-0.1993 ? string-tagcloud 11.8795+-0.4631 ? 12.0385+-0.4493 ? might be 1.0134x slower string-unpack-code 21.2102+-0.6343 20.9655+-0.6009 might be 1.0117x faster string-validate-input 5.6201+-0.2652 ^ 5.2098+-0.0356 ^ definitely 1.0787x faster <arithmetic> * 6.2975+-0.0308 6.2510+-0.0537 <geometric> 5.0577+-0.0298 5.0211+-0.0229 <harmonic> 4.0199+-0.0444 3.9972+-0.0248 TipOfTree LessFlushing V8: crypto 74.1495+-0.5178 73.7049+-0.2992 deltablue 178.5523+-1.2782 ^ 174.6592+-0.9780 ^ definitely 1.0223x faster earley-boyer 92.2212+-0.5215 91.5441+-0.8722 raytrace 63.2862+-0.6343 62.8049+-0.5868 regexp 106.2758+-0.2968 ? 106.5172+-0.5921 ? richards 126.4207+-0.3875 ? 126.7569+-0.8468 ? splay 94.0682+-0.6561 93.3268+-0.5404 <arithmetic> 104.9963+-0.2882 ^ 104.1877+-0.2257 ^ definitely 1.0078x faster <geometric> * 99.6581+-0.2726 ^ 99.0038+-0.2268 ^ definitely 1.0066x faster <harmonic> 94.9949+-0.2986 ^ 94.4175+-0.2496 ^ definitely 1.0061x faster TipOfTree LessFlushing Kraken: ai-astar 507.7860+-5.5031 500.0292+-3.1202 might be 1.0155x faster audio-beat-detection 193.5606+-1.6671 193.3397+-1.0881 audio-dft 270.9824+-2.0267 ! 279.2854+-4.6612 ! definitely 1.0306x slower audio-fft 125.5675+-1.0207 ? 125.9343+-1.3975 ? audio-oscillator 251.7554+-1.5405 ? 253.0721+-2.5280 ? imaging-darkroom 403.8683+-0.8796 ? 405.8017+-1.5481 ? imaging-desaturate 225.5569+-0.8329 ? 227.1678+-1.1289 ? imaging-gaussian-blur 559.0650+-2.9824 557.9223+-2.4453 json-parse-financial 57.9411+-0.8268 57.6131+-0.4766 json-stringify-tinderbox 68.9015+-0.2731 ? 69.5439+-0.6252 ? stanford-crypto-aes 134.0962+-1.3364 133.1493+-1.3010 stanford-crypto-ccm 100.2692+-0.5826 ? 101.1800+-0.7529 ? stanford-crypto-pbkdf2 193.9197+-1.1550 ? 196.3040+-3.6841 ? might be 1.0123x slower stanford-crypto-sha256-iterative 71.5146+-0.3583 71.0716+-0.4888 <arithmetic> * 226.0560+-0.5869 ? 226.5296+-0.2998 ? <geometric> 177.9366+-0.4431 ? 178.4780+-0.2787 ? <harmonic> 140.2580+-0.4695 ? 140.5373+-0.2454 ? TipOfTree LessFlushing All benchmarks: <arithmetic> 86.4573+-0.2136 86.4522+-0.1007 <geometric> 22.7697+-0.0828 22.6769+-0.0622 <harmonic> 7.0768+-0.0761 7.0378+-0.0427 TipOfTree LessFlushing Geomean of preferred means: <scaled-result> 52.1551+-0.1390 51.9477+-0.1545
Oliver Hunt
Comment 2 2011-10-30 15:21:35 PDT
Comment on attachment 112997 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review It would be nice is we could elide the argument flushing when we can prove no OSRs will happen > Source/JavaScriptCore/bytecode/CodeOrigin.h:81 > unsigned stackOffset; > - unsigned calleeVR; > + WriteBarrier<JSFunction> callee; > CodeOrigin caller; can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4
Filip Pizlo
Comment 3 2011-10-30 15:43:42 PDT
(In reply to comment #2) > (From update of attachment 112997 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=112997&action=review > > It would be nice is we could elide the argument flushing when we can prove no OSRs will happen Most of the argument flushes seem to be: 1) Flushing a constant 2) Flushing a value that is provably equal to the argument to the machine code block 3) Flushing a value that was already flushed because it was provably equal to an argument to something higher up in the inline stack. I'm planning on doing optimizations for these cases once I write a test suite (well, a suite of LayoutTests) for arguments handling combined with inlining and OSR. Right now I consider those optimizations too risky since I don't even know if that functionality works. > > > Source/JavaScriptCore/bytecode/CodeOrigin.h:81 > > unsigned stackOffset; > > - unsigned calleeVR; > > + WriteBarrier<JSFunction> callee; > > CodeOrigin caller; > > can we move stackOffset down the struct? changing from an unsigned to a pointer should increase alignment requirements so i think this ends up increasing the struct size by 12 bytes rather than 4 Good point, made that change.
Filip Pizlo
Comment 4 2011-10-30 15:48:34 PDT
Note You need to log in before you can comment on or make changes to this bug.