RESOLVED FIXED 175523
[DFG] Relax arity requirement
https://bugs.webkit.org/show_bug.cgi?id=175523
Summary [DFG] Relax arity requirement
Yusuke Suzuki
Reported 2017-08-13 10:13:28 PDT
When the arity fixup is needed, DFG gives up inlining. We should perform arity fixup in DFG and allow these functions to be inlined!
Attachments
Patch (6.74 KB, patch)
2017-08-13 12:48 PDT, Yusuke Suzuki
no flags
Patch (32.30 KB, patch)
2017-08-16 09:32 PDT, Yusuke Suzuki
no flags
Patch (32.47 KB, patch)
2017-08-16 10:25 PDT, Yusuke Suzuki
no flags
Patch (36.77 KB, patch)
2017-08-16 11:40 PDT, Yusuke Suzuki
no flags
Patch (49.29 KB, patch)
2017-08-17 16:28 PDT, Yusuke Suzuki
no flags
Patch (52.39 KB, patch)
2017-08-17 18:30 PDT, Yusuke Suzuki
no flags
Patch (59.99 KB, patch)
2017-08-18 00:08 PDT, Yusuke Suzuki
no flags
Patch (60.47 KB, patch)
2017-08-29 16:40 PDT, Yusuke Suzuki
no flags
Patch (60.46 KB, patch)
2017-08-30 19:20 PDT, Yusuke Suzuki
no flags
Patch (60.46 KB, patch)
2017-08-30 19:29 PDT, Yusuke Suzuki
saam: review+
Patch for landing (62.16 KB, patch)
2017-09-02 01:07 PDT, Yusuke Suzuki
no flags
Patch for landing (62.16 KB, patch)
2017-09-02 01:19 PDT, Yusuke Suzuki
no flags
Yusuke Suzuki
Comment 1 2017-08-13 12:48:36 PDT
Created attachment 318013 [details] Patch WIP
Yusuke Suzuki
Comment 2 2017-08-16 09:32:28 PDT
Created attachment 318266 [details] Patch WIP
Yusuke Suzuki
Comment 3 2017-08-16 10:25:10 PDT
Created attachment 318270 [details] Patch WIP
Yusuke Suzuki
Comment 4 2017-08-16 11:40:19 PDT
Created attachment 318277 [details] Patch WIP
Build Bot
Comment 5 2017-08-16 11:41:51 PDT
Attachment 318277 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1559: When wrapping a line, only indent 4 spaces. [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5995: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 2 in 18 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 6 2017-08-17 16:28:14 PDT
Created attachment 318436 [details] Patch WIP
Build Bot
Comment 7 2017-08-17 16:31:13 PDT
Attachment 318436 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1559: When wrapping a line, only indent 4 spaces. [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5995: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 2 in 26 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 8 2017-08-17 18:30:15 PDT
Created attachment 318456 [details] Patch WIP
Build Bot
Comment 9 2017-08-17 18:32:14 PDT
Attachment 318456 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1559: When wrapping a line, only indent 4 spaces. [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5995: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 2 in 28 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 10 2017-08-18 00:08:08 PDT
Yusuke Suzuki
Comment 11 2017-08-18 01:02:54 PDT
Performance seems neutral in SunSpider, Octane, and Kraken. Benchmark report for SunSpider, Octane, and Kraken on sakura-trick. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/arity-tot/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/arity/Release/bin/jsc Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched SunSpider: 3d-cube 8.8515+-0.6472 8.3906+-0.5052 might be 1.0549x faster 3d-morph 21.9400+-3.8353 18.8490+-3.6735 might be 1.1640x faster 3d-raytrace 9.6133+-0.5565 9.4410+-0.6365 might be 1.0182x faster access-binary-trees 4.0922+-0.2109 4.0830+-0.2836 access-fannkuch 10.6515+-0.9241 ? 11.3128+-0.8015 ? might be 1.0621x slower access-nbody 4.9356+-0.3269 ? 5.3178+-0.3462 ? might be 1.0774x slower access-nsieve 6.4670+-0.5098 ? 6.7402+-0.5625 ? might be 1.0422x slower bitops-3bit-bits-in-byte 2.2414+-0.1406 ? 2.3213+-0.1477 ? might be 1.0357x slower bitops-bits-in-byte 5.4161+-0.5070 5.3651+-0.4138 bitops-bitwise-and 4.4602+-0.3687 4.3406+-0.3700 might be 1.0276x faster bitops-nsieve-bits 6.1618+-0.4392 ? 6.5085+-0.3864 ? might be 1.0563x slower controlflow-recursive 5.0577+-0.3498 5.0148+-0.2220 crypto-aes 7.9557+-0.4993 7.8757+-0.4579 might be 1.0102x faster crypto-md5 4.8827+-0.3886 4.7478+-0.3323 might be 1.0284x faster crypto-sha1 5.2459+-0.3512 5.1926+-0.3534 might be 1.0103x faster date-format-tofte 12.8260+-0.8681 ? 13.3424+-0.8851 ? might be 1.0403x slower date-format-xparb 10.1109+-0.6211 ? 10.2845+-0.5946 ? might be 1.0172x slower math-cordic 6.0790+-0.4223 5.7125+-0.3813 might be 1.0642x faster math-partial-sums 11.4365+-0.6781 10.7543+-0.7751 might be 1.0634x faster math-spectral-norm 3.9795+-0.2345 ? 4.0927+-0.2551 ? might be 1.0285x slower regexp-dna 12.4586+-0.9572 ? 12.7482+-0.7345 ? might be 1.0232x slower string-base64 7.9979+-0.6506 7.5830+-0.5764 might be 1.0547x faster string-fasta 11.4262+-0.8205 11.3134+-0.6282 string-tagcloud 14.9830+-0.7134 ? 15.1685+-0.8758 ? might be 1.0124x slower string-unpack-code 33.6683+-1.2289 33.2489+-1.4156 might be 1.0126x faster string-validate-input 8.6673+-0.5133 ? 8.8465+-0.4592 ? might be 1.0207x slower <arithmetic> 9.2925+-0.1752 9.1768+-0.1780 might be 1.0126x faster baseline patched Octane: encrypt 0.21697+-0.00250 0.21505+-0.00222 decrypt 3.19592+-0.07732 3.15080+-0.01926 might be 1.0143x faster deltablue x2 0.17943+-0.00286 0.17899+-0.00354 earley 0.38484+-0.00581 0.38385+-0.00563 boyer 6.81272+-0.06503 ? 6.87593+-0.08935 ? navier-stokes x2 5.15583+-0.02783 5.13569+-0.01410 raytrace x2 1.08541+-0.01280 ? 1.09709+-0.02142 ? might be 1.0108x slower richards x2 0.10280+-0.00603 0.10176+-0.00341 might be 1.0102x faster splay x2 0.41644+-0.00297 0.41633+-0.00272 regexp x2 24.80921+-0.88032 ? 24.82124+-0.46774 ? pdfjs x2 53.16431+-1.21299 52.09709+-0.85814 might be 1.0205x faster mandreel x2 59.04636+-0.91564 ? 59.46426+-1.88705 ? gbemu x2 44.27448+-1.53874 ? 46.33595+-1.52696 ? might be 1.0466x slower closure 0.76323+-0.02071 0.75325+-0.01909 might be 1.0133x faster jquery 10.02756+-0.30741 9.93506+-0.29140 box2d x2 13.25011+-0.30650 13.00191+-0.33909 might be 1.0191x faster zlib x2 471.37625+-5.13116 463.36143+-5.39613 might be 1.0173x faster typescript x2 869.03813+-20.31389 ? 871.38597+-25.51990 ? <geometric> 6.77721+-0.04307 6.76760+-0.04416 might be 1.0014x faster baseline patched Kraken: ai-astar 152.440+-4.847 ? 155.918+-4.254 ? might be 1.0228x slower audio-beat-detection 53.706+-2.854 53.309+-2.224 audio-dft 101.823+-2.447 ? 102.626+-2.420 ? audio-fft 38.700+-1.117 ? 38.833+-1.203 ? audio-oscillator 69.622+-2.187 68.690+-3.007 might be 1.0136x faster imaging-darkroom 122.502+-2.888 119.586+-2.688 might be 1.0244x faster imaging-desaturate 72.576+-1.852 72.510+-2.034 imaging-gaussian-blur 109.564+-3.220 ? 111.903+-4.197 ? might be 1.0214x slower json-parse-financial 61.940+-1.840 ? 62.111+-1.752 ? json-stringify-tinderbox 39.797+-1.070 39.068+-1.612 might be 1.0187x faster stanford-crypto-aes 58.412+-2.978 56.556+-2.395 might be 1.0328x faster stanford-crypto-ccm 53.473+-3.680 49.564+-2.570 might be 1.0789x faster stanford-crypto-pbkdf2 93.506+-3.136 93.174+-3.229 stanford-crypto-sha256-iterative 33.539+-0.842 33.075+-1.714 might be 1.0140x faster <arithmetic> 75.829+-0.821 75.495+-0.795 might be 1.0044x faster baseline patched Geomean of preferred means: <scaled-result> 16.8333+-0.1261 16.7306+-0.1341 might be 1.0061x faster
Saam Barati
Comment 12 2017-08-29 15:24:53 PDT
Comment on attachment 318481 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=318481&action=review > Source/JavaScriptCore/bytecode/InlineCallFrame.h:183 > + unsigned argumentCountIncludingThis; // Do not include fixups. "Do not include fixups" =>"Does not include arity fixup" > Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:703 > + safeToGetStack = index < inlineCallFrame->argumentCountIncludingThis - 1; Why does this not include fixups? > Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:808 > + unsigned frameArgumentCount = inlineCallFrame->argumentCountIncludingThis - 1; Why does this not include fixup? > Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:837 > + unsigned frameArgumentCount = inlineCallFrame->argumentCountIncludingThis - 1; ditto > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1544 > + if (extraSlots != paddedStackSpace) > + registerOffsetAfterFixup = registerOffset - (paddedStackSpace - extraSlots); This seems wrong to me. What if you have arity fixup but extraSlots == paddedStackSpace, don't you need to change this? > Source/JavaScriptCore/runtime/CommonSlowPaths.h:53 > +ALWAYS_INLINE int paddedStackSpaceFor(CodeBlock* codeBlock, int argumentCountIncludingThis) This name reads weirdly to me. I expect it to return the total stack space including padding. But it's really just returning the padding needed. So maybe we can call this numberOfStackPaddingSlots or something similar. > Source/JavaScriptCore/runtime/CommonSlowPaths.h:-60 > - ASSERT(argumentCountIncludingThis < newCodeBlock->numParameters()); You should keep this assertion since it dictates that the above function behaves like this code here used to
Yusuke Suzuki
Comment 13 2017-08-29 16:27:38 PDT
Comment on attachment 318481 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=318481&action=review Thank you! >> Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:703 >> + safeToGetStack = index < inlineCallFrame->argumentCountIncludingThis - 1; > > Why does this not include fixups? This is because this phase is considering "whether this arguments[xxx] access is safe". Consider the function, `function t(a, b, c) { }`. And we call it in the form of `t(42)`. Then, b and c will be filled with undefined by arity fixup. But, `arguments.length` should be 1, and `arguments` object should not include fixups. If we access `arguments[1]` in this function, this will not be trapped in the own property phase since fixups are not included. So, we should use `argumentCountIncludingThis - 1` instead of `argumentsWithFixups.size() - 1` here. >> Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:808 >> + unsigned frameArgumentCount = inlineCallFrame->argumentCountIncludingThis - 1; > > Why does this not include fixup? Ditto. >> Source/JavaScriptCore/dfg/DFGArgumentsEliminationPhase.cpp:837 >> + unsigned frameArgumentCount = inlineCallFrame->argumentCountIncludingThis - 1; > > ditto Ditto. >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1544 >> + registerOffsetAfterFixup = registerOffset - (paddedStackSpace - extraSlots); > > This seems wrong to me. What if you have arity fixup but extraSlots == paddedStackSpace, don't you need to change this? Yes. This is because we already have the hole for the extraSlots. Note that this extraSlots operation is the same to what we are doing for the arity fixup in LLInt and Baseline JIT. Consider the case, -> stack bottom [extraSlot for stack alignment][arg2][arg1][arg0][header] At that time, we found that this function requires 4 arguments. Then, we will make it as is the following. [arg3][arg2][arg1][arg0][header] So, extraSlots is used for arg3. In this case, `registerOffset == registerOffsetAfterFixup`. You can see extra slot use in LLInt's LowLevelInterpreter64.asm's functionArityCheck macro and Baseline's ThunkGenerators.cpp's arityFixupGenerator. >> Source/JavaScriptCore/runtime/CommonSlowPaths.h:53 >> +ALWAYS_INLINE int paddedStackSpaceFor(CodeBlock* codeBlock, int argumentCountIncludingThis) > > This name reads weirdly to me. I expect it to return the total stack space including padding. But it's really just returning the padding needed. So maybe we can call this numberOfStackPaddingSlots or something similar. OK, I'll rename it. >> Source/JavaScriptCore/runtime/CommonSlowPaths.h:-60 >> - ASSERT(argumentCountIncludingThis < newCodeBlock->numParameters()); > > You should keep this assertion since it dictates that the above function behaves like this code here used to OK, I've reverted it. `numberOfStackPaddingSlots` should not have this since we always call that function in DFG. I'll keep this assertion in arityCheckFor function.
Yusuke Suzuki
Comment 14 2017-08-29 16:40:07 PDT
Yusuke Suzuki
Comment 15 2017-08-30 19:20:19 PDT
Created attachment 319434 [details] Patch Simple build fix for debug build.
Yusuke Suzuki
Comment 16 2017-08-30 19:29:47 PDT
Created attachment 319435 [details] Patch Simple build fix for debug build.
Yusuke Suzuki
Comment 17 2017-09-01 17:48:12 PDT
Ping?
Saam Barati
Comment 18 2017-09-02 00:02:05 PDT
Comment on attachment 319435 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319435&action=review > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1583 > + for (int index = 0; index < argumentCountIncludingThis; ++index) How does this loop not overwrite itself?
Saam Barati
Comment 19 2017-09-02 00:02:55 PDT
Comment on attachment 319435 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319435&action=review >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1583 >> + for (int index = 0; index < argumentCountIncludingThis; ++index) > > How does this loop not overwrite itself? Ignore this. Yusuke and I already discussed on IRC. The math works out
Yusuke Suzuki
Comment 20 2017-09-02 01:07:46 PDT
Created attachment 319703 [details] Patch for landing
Yusuke Suzuki
Comment 21 2017-09-02 01:19:42 PDT
Created attachment 319707 [details] Patch for landing
Yusuke Suzuki
Comment 22 2017-09-02 01:35:52 PDT
Yusuke Suzuki
Comment 24 2017-09-05 01:23:49 PDT
I guess that this improves Speedometer 2.0 jQuery. https://perf.webkit.org/v3/#/charts?since=1502871591591&paneList=((18-946-(1504326612417.7107-1504454348857.5935)-null-(5-2.5-500))) I think this is because jQuery's function can take optional values as an additional argument. And before this patch, it prevents inlining.
Radar WebKit Bug Importer
Comment 25 2017-09-27 12:51:08 PDT
Note You need to log in before you can comment on or make changes to this bug.