176601 – Turn recursive tail calls into loops

RESOLVED FIXED 176601

Turn recursive tail calls into loops

https://bugs.webkit.org/show_bug.cgi?id=176601

Summary Turn recursive tail calls into loops

Robin Morisset

Reported 2017-09-08 08:16:35 PDT

As indicated in the title, we would like to turn recursive tail calls into simple loop, preferably before optimizations like LICM for optimal performance. We expect this to be a big win on the (admittedly contrived) benchmark in https://bugs.webkit.org/show_bug.cgi?id=175754. There are several difficulties. The main one is the interaction with inlining. Consider the following example: A() calls B() that calls C() that tail calls B (). If the inliner inlines C into B into A, then we want to insert a jump from the end of C to the beginning of B, in A. But currently the beginning of B might be in the middle of a basic block, so we must change the inliner to split the basic block. More surprisingly, the transformation itself must also happen at the same time and not in a later phase (for example after Fixup). Otherwise the intervening passes would not see the control flow edge, and might hoist parts of B() above the entry point of B (into the original code of A). So I will do the basic block splitting in DFGBytecodeParser::inlineCall and the actual transformation in DFGBytecodeParser::handleInlining. In the first time, we will only do the optimization for monomorphic non-varargs calls for simplicity. It should be straightforward (or at least possible) to extend it to the other cases later on.

Attachments
Patch (4.86 KB, patch) 2017-09-15 05:39 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews103 for mac-elcapitan (399.22 KB, application/zip) 2017-09-15 06:33 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews113 for mac-elcapitan (579.73 KB, application/zip) 2017-09-15 06:36 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews106 for mac-elcapitan-wk2 (365.14 KB, application/zip) 2017-09-15 06:37 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (881.88 KB, application/zip) 2017-09-15 07:03 PDT, Build Bot	no flags	Details
Patch (12.70 KB, patch) 2017-09-18 07:41 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP (37.46 KB, patch) 2017-09-22 06:32 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews114 for mac-elcapitan (1.27 MB, application/zip) 2017-09-22 07:14 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (847.37 KB, application/zip) 2017-09-22 07:25 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews103 for mac-elcapitan (422.87 KB, application/zip) 2017-09-22 07:30 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews121 for ios-simulator-wk2 (1.08 MB, application/zip) 2017-09-22 07:41 PDT, Build Bot	no flags	Details
Still WIP, still buggy (51.83 KB, patch) 2017-09-25 03:23 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, stil trying to split the entry block, still buggy (51.79 KB, patch) 2017-09-25 06:13 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, optimisation commented out to test refactoring, and block splitting (51.84 KB, patch) 2017-09-25 07:23 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews100 for mac-elcapitan (504.85 KB, application/zip) 2017-09-25 08:09 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (363.69 KB, application/zip) 2017-09-25 08:16 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews112 for mac-elcapitan (496.05 KB, application/zip) 2017-09-25 08:19 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews126 for ios-simulator-wk2 (1.03 MB, application/zip) 2017-09-25 08:39 PDT, Build Bot	no flags	Details
WIP, frustrating heisenbug (51.63 KB, patch) 2017-09-26 09:36 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, some more bugs (54.17 KB, patch) 2017-09-27 08:37 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (55.11 KB, patch) 2017-09-27 10:52 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, fix another bug (55.22 KB, patch) 2017-09-28 00:32 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, more refactoring, probably still buggy (63.87 KB, patch) 2017-09-29 07:01 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews104 for mac-elcapitan-wk2 (1.47 MB, application/zip) 2017-09-29 08:12 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews112 for mac-elcapitan (2.53 MB, application/zip) 2017-09-29 08:40 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews103 for mac-elcapitan (2.16 MB, application/zip) 2017-09-29 08:55 PDT, Build Bot	no flags	Details
Fix stupid bug in handleInlining, everything optimizing tail calls is commented out, it should hopefully pass ews tests this time (66.86 KB, patch) 2017-10-02 15:18 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Known buggy, submitting to EWS to find more failing tests (66.88 KB, patch) 2017-10-03 05:20 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Commented out the buggy splitting of entry block (66.93 KB, patch) 2017-10-03 10:43 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (11.40 KB, patch) 2017-10-11 13:42 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Fixed, this passes all the stress tests, but I want some more testing before asking for review (11.88 KB, patch) 2017-10-11 15:40 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Fixed a horrible design bug, and the performance regression (13.46 KB, patch) 2017-10-12 13:44 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened (18.65 KB, patch) 2017-10-13 11:17 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews114 for mac-elcapitan (2.00 MB, application/zip) 2017-10-13 18:37 PDT, Build Bot	no flags	Details
Patch (21.41 KB, patch) 2017-10-18 07:44 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (22.67 KB, patch) 2017-10-18 08:57 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (23.66 KB, patch) 2017-10-19 04:00 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
WIP, for sharing with Saam (26.36 KB, patch) 2017-10-31 10:29 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch for landing (25.12 KB, patch) 2017-11-08 12:11 PST, Robin Morisset	no flags	Details Formatted Diff Diff
Show Obsolete (38) View All Add attachment proposed patch, testcase, etc.

Robin Morisset

Comment 1 2017-09-15 05:39:02 PDT

Created attachment 320892 [details] Patch

Robin Morisset

Comment 2 2017-09-15 05:39:53 PDT

This patch is WIP: it compiles but does not goes through tests cleanly yet.

Build Bot

Comment 3 2017-09-15 05:42:00 PDT

Attachment 320892 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ChangeLog:9: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ChangeLog:10: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6282: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 3 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

Build Bot

Comment 4 2017-09-15 06:33:10 PDT

Comment on attachment 320892 [details] Patch Attachment 320892 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4557755 Number of test failures exceeded the failure limit.

Build Bot

Comment 5 2017-09-15 06:33:11 PDT

Created attachment 320897 [details] Archive of layout-test-results from ews103 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 6 2017-09-15 06:36:55 PDT

Comment on attachment 320892 [details] Patch Attachment 320892 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4557737 Number of test failures exceeded the failure limit.

Build Bot

Comment 7 2017-09-15 06:36:56 PDT

Created attachment 320898 [details] Archive of layout-test-results from ews113 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews113 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 8 2017-09-15 06:37:49 PDT

Comment on attachment 320892 [details] Patch Attachment 320892 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4557763 Number of test failures exceeded the failure limit.

Build Bot

Comment 9 2017-09-15 06:37:51 PDT

Created attachment 320899 [details] Archive of layout-test-results from ews106 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Build Bot

Comment 10 2017-09-15 07:02:59 PDT

Comment on attachment 320892 [details] Patch Attachment 320892 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/4557797 Number of test failures exceeded the failure limit.

Build Bot

Comment 11 2017-09-15 07:03:01 PDT

Created attachment 320900 [details] Archive of layout-test-results from ews125 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews125 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.5

Robin Morisset

Comment 12 2017-09-18 07:41:34 PDT

Created attachment 321090 [details] Patch WIP, still crashes, and does not SetLocal arguments

Robin Morisset

Comment 13 2017-09-22 06:32:08 PDT

Created attachment 321539 [details] WIP

Build Bot

Comment 14 2017-09-22 07:14:43 PDT

Comment on attachment 321539 [details] WIP Attachment 321539 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4627359 Number of test failures exceeded the failure limit.

Build Bot

Comment 15 2017-09-22 07:14:44 PDT

Created attachment 321543 [details] Archive of layout-test-results from ews114 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews114 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 16 2017-09-22 07:25:17 PDT

Comment on attachment 321539 [details] WIP Attachment 321539 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4627484 Number of test failures exceeded the failure limit.

Build Bot

Comment 17 2017-09-22 07:25:18 PDT

Created attachment 321544 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Build Bot

Comment 18 2017-09-22 07:30:32 PDT

Comment on attachment 321539 [details] WIP Attachment 321539 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4627548 Number of test failures exceeded the failure limit.

Build Bot

Comment 19 2017-09-22 07:30:33 PDT

Created attachment 321546 [details] Archive of layout-test-results from ews103 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 20 2017-09-22 07:41:10 PDT

Comment on attachment 321539 [details] WIP Attachment 321539 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/4627420 Number of test failures exceeded the failure limit.

Build Bot

Comment 21 2017-09-22 07:41:11 PDT

Created attachment 321547 [details] Archive of layout-test-results from ews121 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews121 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6

Saam Barati

Comment 22 2017-09-22 10:12:04 PDT

Comment on attachment 321539 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=321539&action=review > Source/JavaScriptCore/ChangeLog:15 > + This makes it a lot easier to make sure that they are in the "right" inlineStackEntry, i.e. the one corresponding to the function that their successor comes from. I'm not sure I follow this comment. What's "right" here? What's "successor"? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1652 > + // If the current block is not empty, we must split it so that recursive tail calls in the inlinee have somewhere to jump to. I don't think you mean empty here given assert below. Perhaps you just mean if the block only has code for the op_call? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1661 > + if (m_currentIndex != m_currentBlock->bytecodeBegin) { > + ASSERT(!m_currentBlock->isEmpty()); > + Node* jumpNode = addToGraph(Jump); > + Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, m_currentBlock->executionCount)); > + jumpNode->targetBlock() = block.ptr(); > + m_inlineStackTop->m_blockLinkingTargets.append(block.ptr()); > + m_currentBlock = block.ptr(); > + m_graph.appendBlock(WTFMove(block)); > + } This split needs to happen after we process the setLocalQueue we may have appended to above. Otherwise, if we split, we'll generate invalid IR. I think a simple tail call that needs arity fixup will trigger this. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6352 > + Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, 1)); // TODO: why 1 ? Maybe we just start all blocks at 1? I know we change this value in the StaticExecutionCountEstimationPhase

Robin Morisset

Comment 23 2017-09-25 03:23:09 PDT

Created attachment 321675 [details] Still WIP, still buggy

Robin Morisset

Comment 24 2017-09-25 06:13:34 PDT

Created attachment 321678 [details] WIP, stil trying to split the entry block, still buggy

Robin Morisset

Comment 25 2017-09-25 07:23:07 PDT

Created attachment 321681 [details] WIP, optimisation commented out to test refactoring, and block splitting

Build Bot

Comment 26 2017-09-25 07:24:46 PDT

Attachment 321681 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1311: Should have a space between // and comment [whitespace/comments] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1312: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4136: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4138: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4144: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4] Total errors found: 5 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.

Build Bot

Comment 27 2017-09-25 08:09:42 PDT

Comment on attachment 321681 [details] WIP, optimisation commented out to test refactoring, and block splitting Attachment 321681 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4649225 Number of test failures exceeded the failure limit.

Build Bot

Comment 28 2017-09-25 08:09:43 PDT

Created attachment 321685 [details] Archive of layout-test-results from ews100 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 29 2017-09-25 08:16:37 PDT

Comment on attachment 321681 [details] WIP, optimisation commented out to test refactoring, and block splitting Attachment 321681 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4649246 Number of test failures exceeded the failure limit.

Build Bot

Comment 30 2017-09-25 08:16:38 PDT

Created attachment 321686 [details] Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Build Bot

Comment 31 2017-09-25 08:19:04 PDT

Comment on attachment 321681 [details] WIP, optimisation commented out to test refactoring, and block splitting Attachment 321681 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4649236 Number of test failures exceeded the failure limit.

Build Bot

Comment 32 2017-09-25 08:19:05 PDT

Created attachment 321687 [details] Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 33 2017-09-25 08:39:46 PDT

Comment on attachment 321681 [details] WIP, optimisation commented out to test refactoring, and block splitting Attachment 321681 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/4649342 Number of test failures exceeded the failure limit.

Build Bot

Comment 34 2017-09-25 08:39:47 PDT

Created attachment 321689 [details] Archive of layout-test-results from ews126 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews126 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6

Robin Morisset

Comment 35 2017-09-26 09:36:52 PDT

Created attachment 321826 [details] WIP, frustrating heisenbug

Keith Miller

Comment 36 2017-09-26 09:54:51 PDT

Comment on attachment 321826 [details] WIP, frustrating heisenbug View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:83 > +#define VERBOSE_LOG(...) do { \ > +if (Options::verboseDFGByteCodeParsing()) \ > + dataLog(__VA_ARGS__); \ > +} while (0) Nit: You could make this #define VERBOSE_LOG(...) dataLogIf(Options::verboseDFGByteCodeParsing(), __VA_ARGS__);

Saam Barati

Comment 37 2017-09-26 10:07:09 PDT

Comment on attachment 321826 [details] WIP, frustrating heisenbug View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1278 > + for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i) > + set(virtualRegisterForLocal(i), undefined, ImmediateNakedSet); // TODO: check that it is still reasonable to do this "ImmediateNakedSet" stuff here. This is wrong. You need setDirect here and you need to map the virtual register manually

Robin Morisset

Comment 38 2017-09-27 08:37:41 PDT

Created attachment 321960 [details] WIP, some more bugs

Robin Morisset

Comment 39 2017-09-27 10:52:49 PDT

Created attachment 321974 [details] Patch

Robin Morisset

Comment 40 2017-09-28 00:32:15 PDT

Created attachment 322070 [details] WIP, fix another bug

Robin Morisset

Comment 41 2017-09-29 07:01:05 PDT

Created attachment 322183 [details] WIP, more refactoring, probably still buggy

Build Bot

Comment 42 2017-09-29 07:03:33 PDT

Attachment 322183 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ChangeLog:29: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ChangeLog:30: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ChangeLog:31: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4166: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4168: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4173: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6285: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 9 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.

Build Bot

Comment 43 2017-09-29 07:49:12 PDT

Comment on attachment 322183 [details] WIP, more refactoring, probably still buggy Attachment 322183 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/4697967 New failing tests: stress/v8-deltablue-strict.js.ftl-eager v8-v6/v8-deltablue.js.ftl-eager-no-cjit stress/v8-richards-strict.js.ftl-no-cjit-validate-sampling-profiler wasm.yaml/wasm/function-tests/stack-overflow.js.default-wasm stress/spread-calling.js.ftl-no-cjit-no-put-stack-validate microbenchmarks/delta-blue-try-catch.js.ftl-eager-no-cjit stress/spread-calling.js.ftl-no-cjit-validate-sampling-profiler v8-v6/v8-deltablue.js.default stress/v8-deltablue-strict.js.ftl-no-cjit-b3o1 v8-v6/v8-richards.js.default microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-validate-sampling-profiler stress/v8-deltablue-strict.js.ftl-eager-no-cjit-b3o1 stress/v8-deltablue-strict.js.ftl-no-cjit-validate-sampling-profiler stress/spread-non-array.js.ftl-eager stress/v8-richards-strict.js.ftl-no-cjit-no-put-stack-validate stress/arrowfunction-lexical-bind-arguments-strict.js.ftl-eager-no-cjit-b3o1 v8-v6/v8-deltablue.js.ftl-no-cjit-no-put-stack-validate stress/const-tdz.js.ftl-eager-no-cjit-b3o1 microbenchmarks/delta-blue-try-catch.js.ftl-eager-no-cjit-b3o1 microbenchmarks/getter-richards.js.default basic-tests.yaml/stress-test.js.ftl-eager-no-cjit stress/arrowfunction-lexical-bind-arguments-strict.js.ftl-eager-no-cjit v8-v6/v8-deltablue.js.ftl-eager stress/arrowfunction-lexical-bind-arguments-non-strict-2.js.ftl-eager-no-cjit stress/spread-calling.js.default stress/v8-richards-strict.js.default stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit-b3o1 v8-v6/v8-richards.js.ftl-no-cjit-validate-sampling-profiler v8-v6/v8-richards.js.ftl-eager stress/array-copywithin.js.ftl-eager-no-cjit v8-v6/v8-richards.js.ftl-no-cjit-no-put-stack-validate stress/v8-deltablue-strict.js.default cdjs-tests.yaml/main.js.default v8-v6/v8-deltablue.js.ftl-no-cjit-validate-sampling-profiler wasm.yaml/wasm/function-tests/memory-access-past-4gib.js.wasm-no-cjit-yes-tls-context microbenchmarks/deltablue-for-of.js.ftl-eager microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-no-put-stack-validate wasm.yaml/wasm/js-api/export-arity.js.wasm-no-cjit-yes-tls-context stress/spread-non-array.js.ftl-no-cjit-no-put-stack-validate stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit microbenchmarks/deltablue-for-of.js.ftl-eager-no-cjit microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-eager microbenchmarks/deltablue-for-of.js.ftl-eager-no-cjit-b3o1 stress/const-tdz.js.ftl-eager-no-cjit stress/arrowfunction-lexical-bind-arguments-non-strict-2.js.ftl-eager-no-cjit-b3o1 stress/v8-deltablue-strict.js.ftl-no-cjit-no-put-stack-validate microbenchmarks/delta-blue-try-catch.js.ftl-eager stress/spread-non-array.js.default basic-tests.yaml/stress-test.js.default wasm.yaml/wasm/fuzz/export-function.js.wasm-slow-memory basic-tests.yaml/stress-test.js.ftl-no-cjit microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-validate-sampling-profiler stress/spread-non-array.js.ftl-no-cjit-b3o1 v8-v6/v8-deltablue.js.ftl-eager-no-cjit-b3o1 cdjs-tests.yaml/main.js.ftl-no-cjit wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-no-cjit-yes-tls-context microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-b3o1 v8-v6/v8-deltablue.js.ftl-no-cjit-b3o1 microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-b3o1 wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-no-call-ic microbenchmarks/fake-iterators-that-throw-when-finished.js.default jsc-layout-tests.yaml/js/script-tests/arrowfunction-lexical-bind-arguments-non-strict.js.layout-ftl-eager-no-cjit stress/lexical-let-tdz.js.ftl-eager-no-cjit stress/lexical-let-tdz.js.ftl-eager-no-cjit-b3o1 stress/spread-non-array.js.ftl-no-cjit-validate-sampling-profiler v8-v6/v8-richards.js.ftl-no-cjit-b3o1 stress/spread-calling.js.ftl-no-cjit-b3o1 microbenchmarks/delta-blue-try-catch.js.default wasm.yaml/wasm/js-api/export-arity.js.default-wasm wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-slow-memory stress/v8-richards-strict.js.ftl-no-cjit-b3o1 wasm.yaml/wasm/fuzz/memory.js.wasm-eager-jettison wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-eager-jettison stress/const-loop-semantics.js.ftl-eager-no-cjit-b3o1 microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-no-put-stack-validate jsc-layout-tests.yaml/js/script-tests/arrowfunction-lexical-bind-arguments-strict.js.layout-ftl-eager-no-cjit stress/v8-deltablue-strict.js.ftl-eager-no-cjit stress/spread-non-array.js.ftl-eager-no-cjit-b3o1 wasm.yaml/wasm/js-api/export-arity.js.wasm-slow-memory

Build Bot

Comment 44 2017-09-29 08:12:46 PDT

Comment on attachment 322183 [details] WIP, more refactoring, probably still buggy Attachment 322183 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4698039 Number of test failures exceeded the failure limit.

Build Bot

Comment 45 2017-09-29 08:12:48 PDT

Created attachment 322185 [details] Archive of layout-test-results from ews104 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Build Bot

Comment 46 2017-09-29 08:40:07 PDT

Comment on attachment 322183 [details] WIP, more refactoring, probably still buggy Attachment 322183 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4698098 New failing tests: imported/w3c/web-platform-tests/dom/ranges/Range-mutations-dataChange.html imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html imported/w3c/web-platform-tests/dom/ranges/Range-intersectsNode.html imported/w3c/web-platform-tests/html/syntax/parsing/named-character-references.html imported/w3c/web-platform-tests/WebCryptoAPI/generateKey/test_failures.https.html imported/w3c/web-platform-tests/encoding/textdecoder-labels.html imported/w3c/web-platform-tests/encoding/api-invalid-label.html workers/wasm-long-compile.html imported/w3c/web-platform-tests/html/the-xhtml-syntax/parsing-xhtml-documents/xhtml-mathml-dtd-entity-10.htm

Build Bot

Comment 47 2017-09-29 08:40:09 PDT

Created attachment 322186 [details] Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 48 2017-09-29 08:55:03 PDT

Comment on attachment 322183 [details] WIP, more refactoring, probably still buggy Attachment 322183 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4698189 New failing tests: imported/w3c/web-platform-tests/dom/ranges/Range-mutations-dataChange.html imported/w3c/web-platform-tests/dom/ranges/Range-intersectsNode.html imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html imported/w3c/web-platform-tests/dom/ranges/Range-isPointInRange.html imported/w3c/web-platform-tests/encoding/textdecoder-labels.html imported/w3c/web-platform-tests/dom/ranges/Range-mutations.html imported/w3c/web-platform-tests/html/dom/reflection-grouping.html imported/w3c/web-platform-tests/html/dom/reflection-misc.html imported/w3c/web-platform-tests/dom/ranges/Range-compareBoundaryPoints.html imported/w3c/web-platform-tests/dom/ranges/Range-comparePoint.html imported/w3c/web-platform-tests/WebCryptoAPI/derive_bits_keys/test_hkdf.https.html imported/w3c/web-platform-tests/html/dom/reflection-metadata.html imported/w3c/web-platform-tests/html/dom/reflection-sections.html imported/w3c/web-platform-tests/WebCryptoAPI/generateKey/test_failures.https.html imported/w3c/web-platform-tests/encoding/api-invalid-label.html imported/w3c/web-platform-tests/html/dom/reflection-forms.html imported/w3c/web-platform-tests/html/dom/reflection-text.html imported/w3c/web-platform-tests/dom/ranges/Range-set.html imported/w3c/web-platform-tests/html/dom/reflection-obsolete.html imported/w3c/web-platform-tests/html/dom/reflection-tabular.html imported/w3c/web-platform-tests/html/syntax/parsing/named-character-references.html imported/w3c/web-platform-tests/html/dom/reflection-embedded.html workers/wasm-long-compile.html imported/w3c/web-platform-tests/html/the-xhtml-syntax/parsing-xhtml-documents/xhtml-mathml-dtd-entity-10.htm

Build Bot

Comment 49 2017-09-29 08:55:09 PDT

Created attachment 322187 [details] Archive of layout-test-results from ews103 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Robin Morisset

Comment 50 2017-10-02 15:18:15 PDT

Created attachment 322455 [details] Fix stupid bug in handleInlining, everything optimizing tail calls is commented out, it should hopefully pass ews tests this time

Build Bot

Comment 51 2017-10-02 15:20:32 PDT

Attachment 322455 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6265: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 6 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.

Robin Morisset

Comment 52 2017-10-03 05:20:45 PDT

Created attachment 322511 [details] Known buggy, submitting to EWS to find more failing tests

Build Bot

Comment 53 2017-10-03 05:21:49 PDT

Attachment 322511 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342: Should have a space between // and comment [whitespace/comments] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6264: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 6 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.

Build Bot

Comment 54 2017-10-03 06:00:15 PDT

Comment on attachment 322511 [details] Known buggy, submitting to EWS to find more failing tests Attachment 322511 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/4739483 New failing tests: stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit-b3o1 stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit

Robin Morisset

Comment 55 2017-10-03 10:43:14 PDT

Created attachment 322545 [details] Commented out the buggy splitting of entry block

Build Bot

Comment 56 2017-10-03 10:46:35 PDT

Attachment 322545 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1313: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343: Should have a space between // and comment [whitespace/comments] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4178: WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4180: Should be indented on a separate line, with the colon or comma first on that line. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4185: Wrong number of spaces before statement. (expected: 12) [whitespace/indent] [4] ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6267: Wrong number of spaces before statement. (expected: 8) [whitespace/indent] [4] Total errors found: 7 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.

Keith Miller

Comment 57 2017-10-03 13:24:50 PDT

Comment on attachment 322545 [details] Commented out the buggy splitting of entry block View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review Seems reasonable but there are some things you should clean up. > Source/JavaScriptCore/ChangeLog:17 > + - m_linkingTargets is a dictionnary from bytecode indices to BasicBlock* typo: dictionnary -> dictionary. > Source/JavaScriptCore/ChangeLog:21 > + terminal that need to be taken off m_unlinkedBlocks. typo: needs > Source/JavaScriptCore/ChangeLog:28 > + - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability > + - The 9 argument version was cleaned up and simplified I feel like something is wrong when a function has more than 6 arguments... :( > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236 > + BasicBlock* blockPtr = block.ptr(); Nit: you could probably drop this line. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273 > + Node* callTarget = get(VirtualRegister(pc[2].u.operand)); > + int argumentCountIncludingThis = pc[3].u.operand; > + int registerOffset = -pc[4].u.operand; It would be great if you gave these names in the BytecodeList.json and used them here. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1325 > +bool ByteCodeParser::handleRecursiveTailCall(Node* callTargetNode, VirtualRegister thisArgument, CallLinkStatus callLinkStatus) > +{ > + // We currently only do this optimisation in the simple, non-polymorphic case. > + if (callLinkStatus.couldTakeSlowPath() || callLinkStatus.size() != 1) > + return false; > + > + auto targetExecutable = callLinkStatus[0].executable(); > + InlineStackEntry* stackEntry = m_inlineStackTop; > + do { > + if (targetExecutable != stackEntry->executable()) > + continue; > + > + VERBOSE_LOG(" We found a recursive tail, optimizing it to a jump to block ", *stackEntry->m_callsiteBlockHead, ".\n"); > + // TODO: set the arguments to the right value > + // TODO: shadow chicken log > + > + // We must also add some check that the profiling information was correct and the target of this call is what we thought > + emitFunctionChecks(callLinkStatus[0], callTargetNode, thisArgument); > + > + VERBOSE_LOG(" Repeating the work of op_enter as we are jumping to right after it. \n"); > + Node* undefined = addToGraph(JSConstant, OpInfo(m_constantUndefined)); > + // Initialize all locals to undefined. > + for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i) > + setDirect(stackEntry->remapOperand(virtualRegisterForLocal(i)), undefined, ImmediateSetWithFlush); > + // TODO: check that ImmediateSetWithFlush is the right pick here. > + > + Node* jumpNode = addToGraph(Jump); > + jumpNode->targetBlock() = stackEntry->m_callsiteBlockHead; > + m_currentBlock->didLink(); > + flushForTerminal(); > + return true; > + } while ((stackEntry = stackEntry->m_caller)); > + > + // The tail call was not recursive > + return false; This function isn't used, can you remove it for now? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344 > + //if (op == TailCall && handleRecursiveTailCall(callTarget, thisArgument, callLinkStatus)) > + // return Terminal; Please remove commented code. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052 > + // TODO: we should be able to get rid of this somehow. Make this a FIXME and link a bugzilla. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080 > + // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall. ditto. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4252 > + // to be true. TODO: change this. Ditto. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276 > + if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not. Ditto. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284 > + // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one. Ditto. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257 > + BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter Ditto.

Robin Morisset

Comment 58 2017-10-04 07:55:27 PDT

> > Source/JavaScriptCore/ChangeLog:28 > > + - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability > > + - The 9 argument version was cleaned up and simplified > > I feel like something is wrong when a function has more than 6 arguments... > :( I totally agree, but I saw no easy way to solve this problem, and my refactoring was already pretty big. > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236 > > + BasicBlock* blockPtr = block.ptr(); > > Nit: you could probably drop this line. I am not sure it would remain correct to refer to block.ptr() after I WTFMove(block). > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273 > > + Node* callTarget = get(VirtualRegister(pc[2].u.operand)); > > + int argumentCountIncludingThis = pc[3].u.operand; > > + int registerOffset = -pc[4].u.operand; > > It would be great if you gave these names in the BytecodeList.json and used > them here. Is there a way to do so that correctly negate the registerOffset, and correctly "get"s the callTarget? If not, I am not sure what to call those before the operands are processed.

Saam Barati

Comment 59 2017-10-04 13:09:40 PDT

Comment on attachment 322545 [details] Commented out the buggy splitting of entry block View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review Robin, is the goal of this review to get feedback on the tail loop stuff, or to first check in the refactoring? If the latter, then let's open a new bug with a title for the refactoring. >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052 >> + // TODO: we should be able to get rid of this somehow. > > Make this a FIXME and link a bugzilla. What's your conclusion here? I don't vote for a bugzilla unless we know how we'll get rid of this. >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080 >> + // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall. > > ditto. Seems like this might be worth just doing. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4292 > + if (false) { > + VERBOSE_LOG("Splitting the entry block for potential recursive tail calls.\n"); > + Node* jumpNode = addToGraph(Jump); > + BasicBlock* newBlock = allocateUntargetableBlock(); > + jumpNode->targetBlock() = newBlock; > + m_currentBlock->didLink(); > + m_currentBlock = newBlock; > + } What happens when you run this code? >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276 >> + if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not. > > Ditto. Going against Keith's ditto here: please don't file a bug on you not fully understanding of the code. I'm like 95% sure this has to do w/ setters, but regardless of that, you just figure out why this happens, and remove the TODO >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284 >> + // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one. > > Ditto. Ditto here. This seems like something you need to figure out before landing >> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257 >> + BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter > > Ditto. Ditto here. This is something you should either do or remove your TODO

Robin Morisset

Comment 60 2017-10-06 09:06:56 PDT

I moved the refactoring part of the patch to https://bugs.webkit.org/show_bug.cgi?id=177922 so that it can be reviewed more easily.

Robin Morisset

Comment 61 2017-10-11 13:42:12 PDT

Created attachment 323457 [details] Patch

Robin Morisset

Comment 62 2017-10-11 15:40:12 PDT

Created attachment 323474 [details] Fixed, this passes all the stress tests, but I want some more testing before asking for review

Build Bot

Comment 63 2017-10-11 15:41:24 PDT

Attachment 323474 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.

Robin Morisset

Comment 64 2017-10-11 16:10:46 PDT

Robin Morisset

Comment 65 2017-10-11 17:36:24 PDT

After running a few tests, it is clear that this regression is entirely caused by the splitting of op_enter blocks: in particular it reproduces even if handleRecursiveTailCall is replaced by return false.

Robin Morisset

Comment 66 2017-10-12 11:27:20 PDT

Some more things about this performance regression: - It only shows on Octane/raytrace: neither sun spider, kraken or microbenchmarks have any significant regression (maybe a 1% regression on Kraken, to investigate later) - It is caused purely by the splitting of entry blocks, not by the actual optimisation - It persists even when not doing the splitting for functions at the leaves of the call graph (through testing for the emptiness of callLinkInfoMap). - It persists even when running DFGCFGSimplificationPhase right after the byte code parser (to eliminate unnecessary jumps) - It does not appear to be located in a single JS function: when attempting to bisect with JSC_bytecodeRangeToDFGCompile, the difference with the baseline progressively shrinks instead of having a cliff somewhere.

Robin Morisset

Comment 67 2017-10-12 13:44:10 PDT

Created attachment 323558 [details] Fixed a horrible design bug, and the performance regression

Robin Morisset

Comment 68 2017-10-12 13:45:39 PDT

Benchmark report for Octane on MacBook-Pro-5 (MacBookPro14,3). VMs tested: "Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc "Tail" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/TailCall3/Release/jsc Collected 8 samples per benchmark/VM, with 8 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. Baseline Tail encrypt 0.13207+-0.00161 ? 0.13261+-0.00127 ? decrypt 2.34831+-0.04530 2.33041+-0.01246 deltablue x2 0.12342+-0.00190 ? 0.12351+-0.00322 ? earley 0.24859+-0.00332 0.24785+-0.00275 boyer 4.31797+-0.07807 4.28364+-0.03930 navier-stokes x2 4.87890+-0.06596 4.84418+-0.05741 raytrace x2 0.68536+-0.00741 0.68159+-0.00531 richards x2 0.07856+-0.00083 ? 0.07917+-0.00139 ? splay x2 0.31365+-0.00953 0.30422+-0.00139 might be 1.0310x faster regexp x2 15.52398+-0.16968 15.51919+-0.29346 pdfjs x2 39.02238+-0.21603 ? 39.06900+-0.24751 ? mandreel x2 38.14925+-0.39561 38.03272+-0.60931 gbemu x2 29.17857+-0.55032 ? 29.73977+-1.43515 ? might be 1.0192x slower closure 0.45147+-0.00498 ? 0.45274+-0.01029 ? jquery 6.08769+-0.12487 6.01421+-0.07787 might be 1.0122x faster box2d x2 7.77007+-0.22361 7.56490+-0.05053 might be 1.0271x faster zlib x2 306.73149+-3.07770 ? 307.86919+-2.63745 ? typescript x2 606.15997+-7.28700 599.21710+-6.84371 might be 1.0116x faster <geometric> 4.62426+-0.02182 4.60428+-0.01928 might be 1.0043x faster --------------------- Benchmark report for Kraken on MacBook-Pro-5 (MacBookPro14,3). VMs tested: "Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc "Tail" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/TailCall3/Release/jsc Collected 8 samples per benchmark/VM, with 8 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. Baseline Tail ai-astar 71.707+-3.284 ? 74.090+-3.158 ? might be 1.0332x slower audio-beat-detection 34.792+-0.521 ? 36.688+-3.837 ? might be 1.0545x slower audio-dft 85.256+-3.956 ? 86.756+-2.058 ? might be 1.0176x slower audio-fft 24.700+-1.414 ? 24.726+-1.062 ? audio-oscillator 48.445+-1.414 ? 51.629+-4.553 ? might be 1.0657x slower imaging-darkroom 52.437+-1.327 ? 52.996+-1.902 ? might be 1.0107x slower imaging-desaturate 45.185+-2.413 44.590+-1.823 might be 1.0134x faster imaging-gaussian-blur 57.064+-3.813 ? 58.884+-5.177 ? might be 1.0319x slower json-parse-financial 31.760+-2.089 31.531+-1.447 json-stringify-tinderbox 19.834+-0.519 ? 20.088+-0.547 ? might be 1.0128x slower stanford-crypto-aes 34.609+-1.645 ? 35.127+-1.028 ? might be 1.0150x slower stanford-crypto-ccm 31.612+-2.898 30.550+-2.354 might be 1.0348x faster stanford-crypto-pbkdf2 58.973+-1.840 57.927+-1.497 might be 1.0180x faster stanford-crypto-sha256-iterative 18.413+-0.292 ? 18.814+-0.720 ? might be 1.0218x slower <arithmetic> 43.913+-0.329 ? 44.600+-0.709 ? might be 1.0156x slower So the regression on raytrace is solved, but there might be a 1.5% slowdown on Kraken.

Build Bot

Comment 69 2017-10-12 13:47:03 PDT

Attachment 323558 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/JavaScriptCore/ChangeLog:12: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ChangeLog:13: Line contains tab character. [whitespace/tab] [5] Total errors found: 3 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.

Saam Barati

Comment 70 2017-10-12 13:48:25 PDT

Comment on attachment 323558 [details] Fixed a horrible design bug, and the performance regression View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review > Source/JavaScriptCore/bytecode/PreciseJumpTargets.cpp:68 > + // We need to insert a jump after op_enter, so recursive tail calls have somewhere to jump to. > + // But we only want to pay that price for functions that have at least one tail call. > + CallLinkInfoMap map; > + codeBlock->getCallLinkInfoMap(map); > + for (auto info : map.values()) { > + if (info->callMode() == CallMode::Tail) { > + out.append(bytecodeOffset + opcodeLengths[op_enter]); > + break; > + } > + } > + } You should just set a bit if a CodeBlock has an op_tail_call in it. You can probably do it in bytecode generator via UnlinkedCodeBlock.

Saam Barati

Comment 71 2017-10-12 13:53:49 PDT

Comment on attachment 323558 [details] Fixed a horrible design bug, and the performance regression View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1433 > + if (auto callFrame = stackEntry->m_inlineCallFrame) { nit: I prefer a type here, or at least, auto* instead of auto. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1436 > + if (argumentCountIncludingThis != (int) callFrame->argumentCountIncludingThis) Style: WebKit style uses static_cast instead of C-style casts. > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1465 > + // We want to emit the SetLocals with a code origin that points to the place we are jumping to. Nit: "with a code origin" => "with an exit origin" since the semantic origin won't be there.

Robin Morisset

Comment 72 2017-10-13 11:17:48 PDT

Created attachment 323710 [details] Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened

Build Bot

Comment 73 2017-10-13 11:19:27 PDT

Attachment 323710 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215: The parameter name "stackEntry" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.

Robin Morisset

Comment 74 2017-10-13 11:21:03 PDT

We now see an improvement of TailBench9000 but it is rather disappointing: - n-body-run.js roughly went from 1.05s to 0.95s - merge-sort-run.js roughly went from 1.40s to 1.10s I have not yet ran other benchmarks on this last version of the code.

Robin Morisset

Comment 75 2017-10-13 11:39:21 PDT

It appears that the optimization currently hurts tier-up from DFG to FTL, I am investigating how to fix it.

Build Bot

Comment 76 2017-10-13 18:37:27 PDT

Comment on attachment 323710 [details] Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened Attachment 323710 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4852670 New failing tests: http/tests/loading/basic-auth-resend-wrong-credentials.html

Build Bot

Comment 77 2017-10-13 18:37:29 PDT

Created attachment 323785 [details] Archive of layout-test-results from ews114 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews114 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Robin Morisset

Comment 78 2017-10-17 08:56:05 PDT

Comment on attachment 323710 [details] Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened My experiments with inserting LoopHint have not improved performance so far, so I would like to get this patch landed as is, and make it a separate bug. The build-bot failure looks very much like a flaky test to me, I don't see how my patch could break layout. Is there a simple way to convince the buildbot to try again? Or should I just resubmit the patch?

Sam Weinig

Comment 79 2017-10-17 09:19:35 PDT

(In reply to Robin Morisset from comment #78) Is there a simple way to convince the > buildbot to try again? Or should I just resubmit the patch? I think you just need to re-upload the change unfortunately.

Saam Barati

Comment 80 2017-10-17 12:18:52 PDT

Comment on attachment 323710 [details] Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened View in context: https://bugs.webkit.org/attachment.cgi?id=323710&action=review > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418 > + // We currently only do this optimisation in the simple, non-polymorphic case. Maybe add a FIXME with a bugzilla link here since you told me you wanted to do this and you thought it'd be pretty easy? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429 > + BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock); m_blockLInkingTargets is sorted? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480 > + // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case. Please add tests for this since you didn't find any when running the tests with the earlier broken patch. It'd be good to verify your earlier broken version would fail such a test.

Robin Morisset

Comment 81 2017-10-18 07:44:01 PDT

Created attachment 324116 [details] Patch

Robin Morisset

Comment 82 2017-10-18 07:51:07 PDT

(In reply to Saam Barati from comment #80) > Comment on attachment 323710 [details] > Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly > recognize when it has happened > > View in context: > https://bugs.webkit.org/attachment.cgi?id=323710&action=review > > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418 > > + // We currently only do this optimisation in the simple, non-polymorphic case. > > Maybe add a FIXME with a bugzilla link here since you told me you wanted to > do this and you thought it'd be pretty easy? Done > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429 > > + BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock); > > m_blockLInkingTargets is sorted? Yes. It was already mentioned in a comment; I added some ASSERTs to allocateTargetableBlock and makeBlockTargetable. > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480 > > + // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case. > > Please add tests for this since you didn't find any when running the tests > with the earlier broken patch. It'd be good to verify your earlier broken > version would fail such a test. I added some test, but the buggy code currently appears to be passing the test somehow, I am investigating.

Robin Morisset

Comment 83 2017-10-18 08:57:44 PDT

Created attachment 324120 [details] Patch

Robin Morisset

Comment 84 2017-10-18 09:07:56 PDT

Mystery solved: there was yet another bug (!) that was causing this entire code path not to be exercised by the tests for either the buggy or the "fixed" version of the code. In short, because each code block has a superset of the jump targets, a function with no control-flow was deemed to have no jump targets, even if it had a tail call, inhibiting the splitting of the basic block that includes op_enter. After fixing this bug, we can now correctly detect the other bugs from previous versions of this patch with the test. I also added another test as I realized that one code path was still not tested. The tests now (hopefully) correctly find wrong code generation; but I do not see how to test that the transformation occurred when it should.

Saam Barati

Comment 85 2017-10-18 11:46:56 PDT

Comment on attachment 324120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=324120&action=review r=me with a few comments. > Source/JavaScriptCore/bytecode/CodeBlock.h:928 > + bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();} style nit: this should be written as: { return m_unlinkedCode->hasTailCalls(); } for the spacing to follow webkit style > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321 > + // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block > + // to have somewhere to jump to if there is a recursive tail-call that points to this function. > + m_codeBlock->addJumpTarget(instructions().size()); > + // This disables peephole optimizations when an instruction is a jump target > + m_lastOpcodeID = op_end; Is it worth just emitting a label since that will do this work for us? Or should we keep this as is? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469 > + // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR. This is not an accurate comment. The invariant is that a root in the CFG cannot have a predecessor. An OSR entrypoint *can* have predecessors. For example, loops are often OSR entry points. FWIW, we could totally jump to op_enter, we'd just need to split before op_enter, not after. This may be awkward. I'm totally OK with how you've implemented it.

Robin Morisset

Comment 86 2017-10-19 02:20:25 PDT

(In reply to Saam Barati from comment #85) > Comment on attachment 324120 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=324120&action=review > > r=me with a few comments. > > > Source/JavaScriptCore/bytecode/CodeBlock.h:928 > > + bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();} > > style nit: this should be written as: > { return m_unlinkedCode->hasTailCalls(); } > for the spacing to follow webkit style Done > > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321 > > + // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block > > + // to have somewhere to jump to if there is a recursive tail-call that points to this function. > > + m_codeBlock->addJumpTarget(instructions().size()); > > + // This disables peephole optimizations when an instruction is a jump target > > + m_lastOpcodeID = op_end; > > Is it worth just emitting a label since that will do this work for us? Or > should we keep this as is? That was my first idea. The problem is that we would then have a Ref<Label> with nowhere to put it. It is not a serious problem, it just felt wrong to allocate data that we would never even read back before freeing it. I can change it back if you prefer. > > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469 > > + // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR. > > This is not an accurate comment. The invariant is that a root in the CFG > cannot have a predecessor. An OSR entrypoint *can* have predecessors. For > example, loops are often OSR entry points. FWIW, we could totally jump to > op_enter, we'd just need to split before op_enter, not after. This may be > awkward. I'm totally OK with how you've implemented it. Thanks for the comment, will fix.

Robin Morisset

Comment 87 2017-10-19 04:00:18 PDT

Created attachment 324216 [details] Patch

Robin Morisset

Comment 88 2017-10-19 04:02:29 PDT

This should finally be finished. Diff with the previous version of the patch: - Applied both of Saam's comments - Added more tests, and slight refactoring of the tests. - Added one RELEASE_ASSERT instead of silently failing if for whatever reason the op_enter block is not split when we are expecting it to be.

WebKit Commit Bot

Comment 89 2017-10-19 09:27:48 PDT

Comment on attachment 324216 [details] Patch Clearing flags on attachment: 324216 Committed r223691: <https://trac.webkit.org/changeset/223691>

WebKit Commit Bot

Comment 90 2017-10-19 09:27:51 PDT

All reviewed patches have been landed. Closing bug.

Michael Catanzaro

Comment 91 2017-10-19 14:11:45 PDT

Regression in bug #178543

WebKit Commit Bot

Comment 92 2017-10-25 15:11:49 PDT

Re-opened since this is blocked by bug 178834

Ryosuke Niwa

Comment 93 2017-10-25 15:18:44 PDT

This patch broke Speedometer 2's React-Redux-TodoMVC test case. To reproduce, 1. Go to http://browserbench.org/Speedometer2.0/InteractiveRunner.html 2. Click on "Unselect all" 3. Select "React-Redux-TodoMVC" 4. Click on "Run" Open Web Inspector and observe that there is an exception being thrown. You have to close the inspector & reload the page before running it again since the bug doesn't reproduce when Web Inspector is open.

Robin Morisset

Comment 94 2017-10-26 07:56:39 PDT

Thanks for the bug report, I am looking into it. Here is what I have found about it so far: - The bug appears deep within React code, taken from http://browserbench.org/Speedometer2.0/resources/todomvc/architecture-examples/react-redux/dist/index.html (sadly partially minified). - The bug is that "this" is apparently undefined in "this.moveChild", in _updateChildren in that file. - _updateChildren appears many times in the stack trace, but it does not appear to make any tail call so it should not be optimized. - The bug disappear if we refuse to make jumps to inlined functions, or if we refuse to make jumps across levels of the inline stack stack. So it is a case where we are creating a jump from one inlined function to another (higher in the call stack). - The bug does not disappear if I disable the optimization when the number of arguments is different from the number of parameters of the relevant code block (for inlined frames). - It reproduces fine with JSC_useConcurrentJIT=0 - Removing the repetition of op_enter does not make the bug disappear change. - Removing the setting of extra parameters to undefined also does not make the bug disappear or change.

Ryan Haddad

Comment 95 2017-10-26 09:42:13 PDT

*** Bug 178592 has been marked as a duplicate of this bug. ***

Robin Morisset

Comment 96 2017-10-31 10:29:41 PDT

Created attachment 325457 [details] WIP, for sharing with Saam

Robin Morisset

Comment 97 2017-11-08 12:11:05 PST

Created attachment 326343 [details] Patch for landing

Robin Morisset

Comment 98 2017-11-08 12:13:21 PST

Because of how tricky it was to write a minimized regression test, and speedometer provides some protection against regression, I decided to reland this patch with the fix described in https://bugs.webkit.org/show_bug.cgi and no extra test. I marked it r=sbarati as we discussed this together.

Robin Morisset

Comment 99 2017-11-08 12:14:25 PST

*** Bug 178834 has been marked as a duplicate of this bug. ***

WebKit Commit Bot

Comment 100 2017-11-08 12:42:19 PST

Comment on attachment 326343 [details] Patch for landing Clearing flags on attachment: 326343 Committed r224592: <https://trac.webkit.org/changeset/224592>

WebKit Commit Bot

Comment 101 2017-11-08 12:42:22 PST

All reviewed patches have been landed. Closing bug.

Robin Morisset

Comment 102 2017-11-09 02:29:22 PST

Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. Baseline TailCall n-body 837.5192+-2.5306 ^ 742.2701+-2.1430 ^ definitely 1.1283x faster merge-sort 597.9785+-5.6828 ^ 521.2900+-3.2286 ^ definitely 1.1471x faster bf-interpreter 478.1371+-4.7080 ^ 334.9072+-1.8962 ^ definitely 1.4277x faster <geometric> 620.9622+-3.5729 ^ 506.0356+-1.2795 ^ definitely 1.2271x faster No visible change on other benchmarks that I've tested.

David Kilzer (:ddkilzer)

Comment 103 2017-11-15 03:15:30 PST

<rdar://problem/35556442>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Enhancement

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Robin Morisset

Reported

2017-09-08 08:16 PDT

Modified

2017-11-15 03:15 PST History

CC List

18 users Show

URL

Keywords InRadar

Duplicates (2)

178592 178834 View as bug list

Depends on

177922 178834

Blocks

178389 178390

Dependencies

tree graph