Bug 176601

Summary: Turn recursive tail calls into loops
Product: WebKit Reporter: Robin Morisset <rmorisset>
Component: JavaScriptCoreAssignee: Robin Morisset <rmorisset>
Status: RESOLVED FIXED    
Severity: Enhancement CC: benjamin, buildbot, commit-queue, fpizlo, ggaren, gskachkov, jfbastien, jlewis3, keith_miller, mark.lam, mcatanzaro, msaboff, rniwa, sam, sbarati, ticaiolima, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=178543
Bug Depends on: 177922, 178834    
Bug Blocks: 178389, 178390    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews103 for mac-elcapitan
none
Archive of layout-test-results from ews113 for mac-elcapitan
none
Archive of layout-test-results from ews106 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews125 for ios-simulator-wk2
none
Patch
none
WIP
none
Archive of layout-test-results from ews114 for mac-elcapitan
none
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews103 for mac-elcapitan
none
Archive of layout-test-results from ews121 for ios-simulator-wk2
none
Still WIP, still buggy
none
WIP, stil trying to split the entry block, still buggy
none
WIP, optimisation commented out to test refactoring, and block splitting
none
Archive of layout-test-results from ews100 for mac-elcapitan
none
Archive of layout-test-results from ews107 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews112 for mac-elcapitan
none
Archive of layout-test-results from ews126 for ios-simulator-wk2
none
WIP, frustrating heisenbug
none
WIP, some more bugs
none
Patch
none
WIP, fix another bug
none
WIP, more refactoring, probably still buggy
none
Archive of layout-test-results from ews104 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews112 for mac-elcapitan
none
Archive of layout-test-results from ews103 for mac-elcapitan
none
Fix stupid bug in handleInlining, everything optimizing tail calls is commented out, it should hopefully pass ews tests this time
none
Known buggy, submitting to EWS to find more failing tests
none
Commented out the buggy splitting of entry block
none
Patch
none
Fixed, this passes all the stress tests, but I want some more testing before asking for review
none
Fixed a horrible design bug, and the performance regression
none
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened
none
Archive of layout-test-results from ews114 for mac-elcapitan
none
Patch
none
Patch
none
Patch
none
WIP, for sharing with Saam
none
Patch for landing none

Description Robin Morisset 2017-09-08 08:16:35 PDT
As indicated in the title, we would like to turn recursive tail calls into simple loop, preferably before optimizations like LICM for optimal performance.
We expect this to be a big win on the (admittedly contrived) benchmark in https://bugs.webkit.org/show_bug.cgi?id=175754.

There are several difficulties. The main one is the interaction with inlining.
Consider the following example: A() calls B() that calls C() that tail calls B ().
If the inliner inlines C into B into A, then we want to insert a jump from the end of C to the beginning of B, in A.
But currently the beginning of B might be in the middle of a basic block, so we must change the inliner to split the basic block.

More surprisingly, the transformation itself must also happen at the same time and not in a later phase (for example after Fixup).
Otherwise the intervening passes would not see the control flow edge, and might hoist parts of B() above the entry point of B (into the original code of A).

So I will do the basic block splitting in DFGBytecodeParser::inlineCall and the actual transformation in DFGBytecodeParser::handleInlining.


In the first time, we will only do the optimization for monomorphic non-varargs calls for simplicity. It should be straightforward (or at least possible) to extend it to the other cases later on.
Comment 1 Robin Morisset 2017-09-15 05:39:02 PDT
Created attachment 320892 [details]
Patch
Comment 2 Robin Morisset 2017-09-15 05:39:53 PDT
This patch is WIP: it compiles but does not goes through tests cleanly yet.
Comment 3 Build Bot 2017-09-15 05:42:00 PDT
Attachment 320892 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:9:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:10:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6282:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 3 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Build Bot 2017-09-15 06:33:10 PDT
Comment on attachment 320892 [details]
Patch

Attachment 320892 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4557755

Number of test failures exceeded the failure limit.
Comment 5 Build Bot 2017-09-15 06:33:11 PDT
Created attachment 320897 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 6 Build Bot 2017-09-15 06:36:55 PDT
Comment on attachment 320892 [details]
Patch

Attachment 320892 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4557737

Number of test failures exceeded the failure limit.
Comment 7 Build Bot 2017-09-15 06:36:56 PDT
Created attachment 320898 [details]
Archive of layout-test-results from ews113 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 8 Build Bot 2017-09-15 06:37:49 PDT
Comment on attachment 320892 [details]
Patch

Attachment 320892 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4557763

Number of test failures exceeded the failure limit.
Comment 9 Build Bot 2017-09-15 06:37:51 PDT
Created attachment 320899 [details]
Archive of layout-test-results from ews106 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 10 Build Bot 2017-09-15 07:02:59 PDT
Comment on attachment 320892 [details]
Patch

Attachment 320892 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4557797

Number of test failures exceeded the failure limit.
Comment 11 Build Bot 2017-09-15 07:03:01 PDT
Created attachment 320900 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 12 Robin Morisset 2017-09-18 07:41:34 PDT
Created attachment 321090 [details]
Patch

WIP, still crashes, and does not SetLocal arguments
Comment 13 Robin Morisset 2017-09-22 06:32:08 PDT
Created attachment 321539 [details]
WIP
Comment 14 Build Bot 2017-09-22 07:14:43 PDT
Comment on attachment 321539 [details]
WIP

Attachment 321539 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4627359

Number of test failures exceeded the failure limit.
Comment 15 Build Bot 2017-09-22 07:14:44 PDT
Created attachment 321543 [details]
Archive of layout-test-results from ews114 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 16 Build Bot 2017-09-22 07:25:17 PDT
Comment on attachment 321539 [details]
WIP

Attachment 321539 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4627484

Number of test failures exceeded the failure limit.
Comment 17 Build Bot 2017-09-22 07:25:18 PDT
Created attachment 321544 [details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 18 Build Bot 2017-09-22 07:30:32 PDT
Comment on attachment 321539 [details]
WIP

Attachment 321539 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4627548

Number of test failures exceeded the failure limit.
Comment 19 Build Bot 2017-09-22 07:30:33 PDT
Created attachment 321546 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 20 Build Bot 2017-09-22 07:41:10 PDT
Comment on attachment 321539 [details]
WIP

Attachment 321539 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4627420

Number of test failures exceeded the failure limit.
Comment 21 Build Bot 2017-09-22 07:41:11 PDT
Created attachment 321547 [details]
Archive of layout-test-results from ews121 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews121  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 22 Saam Barati 2017-09-22 10:12:04 PDT
Comment on attachment 321539 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=321539&action=review

> Source/JavaScriptCore/ChangeLog:15
> +            This makes it a lot easier to make sure that they are in the "right" inlineStackEntry, i.e. the one corresponding to the function that their successor comes from.

I'm not sure I follow this comment. What's "right" here? What's "successor"?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1652
> +    // If the current block is not empty, we must split it so that recursive tail calls in the inlinee have somewhere to jump to.

I don't think you mean empty here given assert below. Perhaps you just mean if the block only has code for the op_call?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1661
> +    if (m_currentIndex != m_currentBlock->bytecodeBegin) {
> +        ASSERT(!m_currentBlock->isEmpty());
> +        Node* jumpNode = addToGraph(Jump);
> +        Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, m_currentBlock->executionCount));
> +        jumpNode->targetBlock() = block.ptr();
> +        m_inlineStackTop->m_blockLinkingTargets.append(block.ptr());
> +        m_currentBlock = block.ptr();
> +        m_graph.appendBlock(WTFMove(block));
> +    }

This split needs to happen after we process the setLocalQueue we may have appended to above. Otherwise, if we split, we'll generate invalid IR. I think a simple tail call that needs arity fixup will trigger this.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6352
> +            Ref<BasicBlock> block = adoptRef(*new BasicBlock(m_currentIndex, m_numArguments, m_numLocals, 1)); // TODO: why 1 ?

Maybe we just start all blocks at 1? I know we change this value in the StaticExecutionCountEstimationPhase
Comment 23 Robin Morisset 2017-09-25 03:23:09 PDT
Created attachment 321675 [details]
Still WIP, still buggy
Comment 24 Robin Morisset 2017-09-25 06:13:34 PDT
Created attachment 321678 [details]
WIP, stil trying to split the entry block, still buggy
Comment 25 Robin Morisset 2017-09-25 07:23:07 PDT
Created attachment 321681 [details]
WIP, optimisation commented out to test refactoring, and block splitting
Comment 26 Build Bot 2017-09-25 07:24:46 PDT
Attachment 321681 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1311:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1312:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4136:  WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4138:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4144:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
Total errors found: 5 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 27 Build Bot 2017-09-25 08:09:42 PDT
Comment on attachment 321681 [details]
WIP, optimisation commented out to test refactoring, and block splitting

Attachment 321681 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4649225

Number of test failures exceeded the failure limit.
Comment 28 Build Bot 2017-09-25 08:09:43 PDT
Created attachment 321685 [details]
Archive of layout-test-results from ews100 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 29 Build Bot 2017-09-25 08:16:37 PDT
Comment on attachment 321681 [details]
WIP, optimisation commented out to test refactoring, and block splitting

Attachment 321681 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4649246

Number of test failures exceeded the failure limit.
Comment 30 Build Bot 2017-09-25 08:16:38 PDT
Created attachment 321686 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 31 Build Bot 2017-09-25 08:19:04 PDT
Comment on attachment 321681 [details]
WIP, optimisation commented out to test refactoring, and block splitting

Attachment 321681 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4649236

Number of test failures exceeded the failure limit.
Comment 32 Build Bot 2017-09-25 08:19:05 PDT
Created attachment 321687 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 33 Build Bot 2017-09-25 08:39:46 PDT
Comment on attachment 321681 [details]
WIP, optimisation commented out to test refactoring, and block splitting

Attachment 321681 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4649342

Number of test failures exceeded the failure limit.
Comment 34 Build Bot 2017-09-25 08:39:47 PDT
Created attachment 321689 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 35 Robin Morisset 2017-09-26 09:36:52 PDT
Created attachment 321826 [details]
WIP, frustrating heisenbug
Comment 36 Keith Miller 2017-09-26 09:54:51 PDT
Comment on attachment 321826 [details]
WIP, frustrating heisenbug

View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:83
> +#define VERBOSE_LOG(...) do { \
> +if (Options::verboseDFGByteCodeParsing()) \
> +    dataLog(__VA_ARGS__); \
> +} while (0)

Nit: You could make this 

#define VERBOSE_LOG(...) dataLogIf(Options::verboseDFGByteCodeParsing(), __VA_ARGS__);
Comment 37 Saam Barati 2017-09-26 10:07:09 PDT
Comment on attachment 321826 [details]
WIP, frustrating heisenbug

View in context: https://bugs.webkit.org/attachment.cgi?id=321826&action=review

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1278
> +        for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i)
> +            set(virtualRegisterForLocal(i), undefined, ImmediateNakedSet); // TODO: check that it is still reasonable to do this "ImmediateNakedSet" stuff here.

This is wrong. You need setDirect here and you need to map the virtual register manually
Comment 38 Robin Morisset 2017-09-27 08:37:41 PDT
Created attachment 321960 [details]
WIP, some more bugs
Comment 39 Robin Morisset 2017-09-27 10:52:49 PDT
Created attachment 321974 [details]
Patch
Comment 40 Robin Morisset 2017-09-28 00:32:15 PDT
Created attachment 322070 [details]
WIP, fix another bug
Comment 41 Robin Morisset 2017-09-29 07:01:05 PDT
Created attachment 322183 [details]
WIP, more refactoring, probably still buggy
Comment 42 Build Bot 2017-09-29 07:03:33 PDT
Attachment 322183 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:29:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:30:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:31:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4166:  WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4168:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4173:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6285:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 9 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 43 Build Bot 2017-09-29 07:49:12 PDT
Comment on attachment 322183 [details]
WIP, more refactoring, probably still buggy

Attachment 322183 [details] did not pass jsc-ews (mac):
Output: http://webkit-queues.webkit.org/results/4697967

New failing tests:
stress/v8-deltablue-strict.js.ftl-eager
v8-v6/v8-deltablue.js.ftl-eager-no-cjit
stress/v8-richards-strict.js.ftl-no-cjit-validate-sampling-profiler
wasm.yaml/wasm/function-tests/stack-overflow.js.default-wasm
stress/spread-calling.js.ftl-no-cjit-no-put-stack-validate
microbenchmarks/delta-blue-try-catch.js.ftl-eager-no-cjit
stress/spread-calling.js.ftl-no-cjit-validate-sampling-profiler
v8-v6/v8-deltablue.js.default
stress/v8-deltablue-strict.js.ftl-no-cjit-b3o1
v8-v6/v8-richards.js.default
microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-validate-sampling-profiler
stress/v8-deltablue-strict.js.ftl-eager-no-cjit-b3o1
stress/v8-deltablue-strict.js.ftl-no-cjit-validate-sampling-profiler
stress/spread-non-array.js.ftl-eager
stress/v8-richards-strict.js.ftl-no-cjit-no-put-stack-validate
stress/arrowfunction-lexical-bind-arguments-strict.js.ftl-eager-no-cjit-b3o1
v8-v6/v8-deltablue.js.ftl-no-cjit-no-put-stack-validate
stress/const-tdz.js.ftl-eager-no-cjit-b3o1
microbenchmarks/delta-blue-try-catch.js.ftl-eager-no-cjit-b3o1
microbenchmarks/getter-richards.js.default
basic-tests.yaml/stress-test.js.ftl-eager-no-cjit
stress/arrowfunction-lexical-bind-arguments-strict.js.ftl-eager-no-cjit
v8-v6/v8-deltablue.js.ftl-eager
stress/arrowfunction-lexical-bind-arguments-non-strict-2.js.ftl-eager-no-cjit
stress/spread-calling.js.default
stress/v8-richards-strict.js.default
stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit-b3o1
v8-v6/v8-richards.js.ftl-no-cjit-validate-sampling-profiler
v8-v6/v8-richards.js.ftl-eager
stress/array-copywithin.js.ftl-eager-no-cjit
v8-v6/v8-richards.js.ftl-no-cjit-no-put-stack-validate
stress/v8-deltablue-strict.js.default
cdjs-tests.yaml/main.js.default
v8-v6/v8-deltablue.js.ftl-no-cjit-validate-sampling-profiler
wasm.yaml/wasm/function-tests/memory-access-past-4gib.js.wasm-no-cjit-yes-tls-context
microbenchmarks/deltablue-for-of.js.ftl-eager
microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-no-put-stack-validate
wasm.yaml/wasm/js-api/export-arity.js.wasm-no-cjit-yes-tls-context
stress/spread-non-array.js.ftl-no-cjit-no-put-stack-validate
stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit
microbenchmarks/deltablue-for-of.js.ftl-eager-no-cjit
microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-eager
microbenchmarks/deltablue-for-of.js.ftl-eager-no-cjit-b3o1
stress/const-tdz.js.ftl-eager-no-cjit
stress/arrowfunction-lexical-bind-arguments-non-strict-2.js.ftl-eager-no-cjit-b3o1
stress/v8-deltablue-strict.js.ftl-no-cjit-no-put-stack-validate
microbenchmarks/delta-blue-try-catch.js.ftl-eager
stress/spread-non-array.js.default
basic-tests.yaml/stress-test.js.default
wasm.yaml/wasm/fuzz/export-function.js.wasm-slow-memory
basic-tests.yaml/stress-test.js.ftl-no-cjit
microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-validate-sampling-profiler
stress/spread-non-array.js.ftl-no-cjit-b3o1
v8-v6/v8-deltablue.js.ftl-eager-no-cjit-b3o1
cdjs-tests.yaml/main.js.ftl-no-cjit
wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-no-cjit-yes-tls-context
microbenchmarks/delta-blue-try-catch.js.ftl-no-cjit-b3o1
v8-v6/v8-deltablue.js.ftl-no-cjit-b3o1
microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-b3o1
wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-no-call-ic
microbenchmarks/fake-iterators-that-throw-when-finished.js.default
jsc-layout-tests.yaml/js/script-tests/arrowfunction-lexical-bind-arguments-non-strict.js.layout-ftl-eager-no-cjit
stress/lexical-let-tdz.js.ftl-eager-no-cjit
stress/lexical-let-tdz.js.ftl-eager-no-cjit-b3o1
stress/spread-non-array.js.ftl-no-cjit-validate-sampling-profiler
v8-v6/v8-richards.js.ftl-no-cjit-b3o1
stress/spread-calling.js.ftl-no-cjit-b3o1
microbenchmarks/delta-blue-try-catch.js.default
wasm.yaml/wasm/js-api/export-arity.js.default-wasm
wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-slow-memory
stress/v8-richards-strict.js.ftl-no-cjit-b3o1
wasm.yaml/wasm/fuzz/memory.js.wasm-eager-jettison
wasm.yaml/wasm/function-tests/stack-overflow.js.wasm-eager-jettison
stress/const-loop-semantics.js.ftl-eager-no-cjit-b3o1
microbenchmarks/fake-iterators-that-throw-when-finished.js.ftl-no-cjit-no-put-stack-validate
jsc-layout-tests.yaml/js/script-tests/arrowfunction-lexical-bind-arguments-strict.js.layout-ftl-eager-no-cjit
stress/v8-deltablue-strict.js.ftl-eager-no-cjit
stress/spread-non-array.js.ftl-eager-no-cjit-b3o1
wasm.yaml/wasm/js-api/export-arity.js.wasm-slow-memory
Comment 44 Build Bot 2017-09-29 08:12:46 PDT
Comment on attachment 322183 [details]
WIP, more refactoring, probably still buggy

Attachment 322183 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4698039

Number of test failures exceeded the failure limit.
Comment 45 Build Bot 2017-09-29 08:12:48 PDT
Created attachment 322185 [details]
Archive of layout-test-results from ews104 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 46 Build Bot 2017-09-29 08:40:07 PDT
Comment on attachment 322183 [details]
WIP, more refactoring, probably still buggy

Attachment 322183 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4698098

New failing tests:
imported/w3c/web-platform-tests/dom/ranges/Range-mutations-dataChange.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/dom/ranges/Range-intersectsNode.html
imported/w3c/web-platform-tests/html/syntax/parsing/named-character-references.html
imported/w3c/web-platform-tests/WebCryptoAPI/generateKey/test_failures.https.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
workers/wasm-long-compile.html
imported/w3c/web-platform-tests/html/the-xhtml-syntax/parsing-xhtml-documents/xhtml-mathml-dtd-entity-10.htm
Comment 47 Build Bot 2017-09-29 08:40:09 PDT
Created attachment 322186 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 48 Build Bot 2017-09-29 08:55:03 PDT
Comment on attachment 322183 [details]
WIP, more refactoring, probably still buggy

Attachment 322183 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4698189

New failing tests:
imported/w3c/web-platform-tests/dom/ranges/Range-mutations-dataChange.html
imported/w3c/web-platform-tests/dom/ranges/Range-intersectsNode.html
imported/w3c/web-platform-tests/encoding/textdecoder-fatal-single-byte.html
imported/w3c/web-platform-tests/dom/ranges/Range-isPointInRange.html
imported/w3c/web-platform-tests/encoding/textdecoder-labels.html
imported/w3c/web-platform-tests/dom/ranges/Range-mutations.html
imported/w3c/web-platform-tests/html/dom/reflection-grouping.html
imported/w3c/web-platform-tests/html/dom/reflection-misc.html
imported/w3c/web-platform-tests/dom/ranges/Range-compareBoundaryPoints.html
imported/w3c/web-platform-tests/dom/ranges/Range-comparePoint.html
imported/w3c/web-platform-tests/WebCryptoAPI/derive_bits_keys/test_hkdf.https.html
imported/w3c/web-platform-tests/html/dom/reflection-metadata.html
imported/w3c/web-platform-tests/html/dom/reflection-sections.html
imported/w3c/web-platform-tests/WebCryptoAPI/generateKey/test_failures.https.html
imported/w3c/web-platform-tests/encoding/api-invalid-label.html
imported/w3c/web-platform-tests/html/dom/reflection-forms.html
imported/w3c/web-platform-tests/html/dom/reflection-text.html
imported/w3c/web-platform-tests/dom/ranges/Range-set.html
imported/w3c/web-platform-tests/html/dom/reflection-obsolete.html
imported/w3c/web-platform-tests/html/dom/reflection-tabular.html
imported/w3c/web-platform-tests/html/syntax/parsing/named-character-references.html
imported/w3c/web-platform-tests/html/dom/reflection-embedded.html
workers/wasm-long-compile.html
imported/w3c/web-platform-tests/html/the-xhtml-syntax/parsing-xhtml-documents/xhtml-mathml-dtd-entity-10.htm
Comment 49 Build Bot 2017-09-29 08:55:09 PDT
Created attachment 322187 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 50 Robin Morisset 2017-10-02 15:18:15 PDT
Created attachment 322455 [details]
Fix stupid bug in handleInlining, everything optimizing tail calls is commented  out, it should hopefully pass ews tests this time
Comment 51 Build Bot 2017-10-02 15:20:32 PDT
Attachment 322455 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177:  WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6265:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 6 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 52 Robin Morisset 2017-10-03 05:20:45 PDT
Created attachment 322511 [details]
Known buggy, submitting to EWS to find more failing tests
Comment 53 Build Bot 2017-10-03 05:21:49 PDT
Attachment 322511 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1342:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4177:  WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4179:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4184:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6264:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 6 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 54 Build Bot 2017-10-03 06:00:15 PDT
Comment on attachment 322511 [details]
Known buggy, submitting to EWS to find more failing tests

Attachment 322511 [details] did not pass jsc-ews (mac):
Output: http://webkit-queues.webkit.org/results/4739483

New failing tests:
stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit-b3o1
stress/arguments-elimination-varargs-too-many-args-arg-count.js.ftl-eager-no-cjit
Comment 55 Robin Morisset 2017-10-03 10:43:14 PDT
Created attachment 322545 [details]
Commented out the buggy splitting of entry block
Comment 56 Build Bot 2017-10-03 10:46:35 PDT
Attachment 322545 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1313:  Multi line control clauses should use braces.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1343:  Should have a space between // and comment  [whitespace/comments] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4178:  WTF_CONCAT is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4180:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4185:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6267:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 7 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 57 Keith Miller 2017-10-03 13:24:50 PDT
Comment on attachment 322545 [details]
Commented out the buggy splitting of entry block

View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review

Seems reasonable but there are some things you should clean up.

> Source/JavaScriptCore/ChangeLog:17
> +        - m_linkingTargets is a dictionnary from bytecode indices to BasicBlock*

typo: dictionnary -> dictionary.

> Source/JavaScriptCore/ChangeLog:21
> +        terminal that need to be taken off m_unlinkedBlocks.

typo: needs

> Source/JavaScriptCore/ChangeLog:28
> +        - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability
> +        - The 9 argument version was cleaned up and simplified

I feel like something is wrong when a function has more than 6 arguments... :(

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236
> +    BasicBlock* blockPtr = block.ptr();

Nit: you could probably drop this line.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273
> +    Node* callTarget = get(VirtualRegister(pc[2].u.operand));
> +    int argumentCountIncludingThis = pc[3].u.operand;
> +    int registerOffset = -pc[4].u.operand;

It would be great if you gave these names in the BytecodeList.json and used them here.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1325
> +bool ByteCodeParser::handleRecursiveTailCall(Node* callTargetNode, VirtualRegister thisArgument, CallLinkStatus callLinkStatus)
> +{
> +    // We currently only do this optimisation in the simple, non-polymorphic case.
> +    if (callLinkStatus.couldTakeSlowPath() || callLinkStatus.size() != 1)
> +        return false;
> +
> +    auto targetExecutable = callLinkStatus[0].executable();
> +    InlineStackEntry* stackEntry = m_inlineStackTop;
> +    do {
> +        if (targetExecutable != stackEntry->executable())
> +            continue;
> +
> +        VERBOSE_LOG("   We found a recursive tail, optimizing it to a jump to block ", *stackEntry->m_callsiteBlockHead, ".\n");
> +        // TODO: set the arguments to the right value
> +        // TODO: shadow chicken log
> +
> +        // We must also add some check that the profiling information was correct and the target of this call is what we thought
> +        emitFunctionChecks(callLinkStatus[0], callTargetNode, thisArgument);
> +
> +        VERBOSE_LOG("   Repeating the work of op_enter as we are jumping to right after it. \n");
> +        Node* undefined = addToGraph(JSConstant, OpInfo(m_constantUndefined));
> +        // Initialize all locals to undefined.
> +        for (int i = 0; i < stackEntry->m_codeBlock->m_numVars; ++i)
> +            setDirect(stackEntry->remapOperand(virtualRegisterForLocal(i)), undefined, ImmediateSetWithFlush);
> +            // TODO: check that ImmediateSetWithFlush is the right pick here.
> +
> +        Node* jumpNode = addToGraph(Jump);
> +        jumpNode->targetBlock() = stackEntry->m_callsiteBlockHead;
> +        m_currentBlock->didLink();
> +        flushForTerminal();
> +        return true;
> +    } while ((stackEntry = stackEntry->m_caller));
> +
> +    // The tail call was not recursive
> +    return false;

This function isn't used, can you remove it for now?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1344
> +        //if (op == TailCall && handleRecursiveTailCall(callTarget, thisArgument, callLinkStatus))
> +          //  return Terminal;

Please remove commented code.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052
> +            // TODO: we should be able to get rid of this somehow.

Make this a FIXME and link a bugzilla.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080
> +            // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall.

ditto.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4252
> +            // to be true. TODO: change this.

Ditto.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276
> +            if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not.

Ditto.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284
> +                // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one.

Ditto.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257
> +    BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter

Ditto.
Comment 58 Robin Morisset 2017-10-04 07:55:27 PDT
> > Source/JavaScriptCore/ChangeLog:28
> > +        - The 7 and 8 arguments of handleCall were inlined in its 3 arguments version for readability
> > +        - The 9 argument version was cleaned up and simplified
> 
> I feel like something is wrong when a function has more than 6 arguments...
> :(

I totally agree, but I saw no easy way to solve this problem, and my refactoring was already pretty big.

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1236
> > +    BasicBlock* blockPtr = block.ptr();
> 
> Nit: you could probably drop this line.

I am not sure it would remain correct to refer to block.ptr() after I WTFMove(block).

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1273
> > +    Node* callTarget = get(VirtualRegister(pc[2].u.operand));
> > +    int argumentCountIncludingThis = pc[3].u.operand;
> > +    int registerOffset = -pc[4].u.operand;
> 
> It would be great if you gave these names in the BytecodeList.json and used
> them here.

Is there a way to do so that correctly negate the registerOffset, and correctly "get"s the callTarget? If not, I am not sure what to call those before the operands are processed.
Comment 59 Saam Barati 2017-10-04 13:09:40 PDT
Comment on attachment 322545 [details]
Commented out the buggy splitting of entry block

View in context: https://bugs.webkit.org/attachment.cgi?id=322545&action=review

Robin, is the goal of this review to get feedback on the tail loop stuff, or to first check in the refactoring? If the latter, then let's open a new bug with a title for the refactoring.

>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2052
>> +            // TODO: we should be able to get rid of this somehow.
> 
> Make this a FIXME and link a bugzilla.

What's your conclusion here? I don't vote for a bugzilla unless we know how we'll get rid of this.

>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2080
>> +            // TODO: we can avoid this indirection in the case of normal inlined call by passing the continuation block all the way into inlineStackEntry->m_continuationBlock in inlineCall.
> 
> ditto.

Seems like this might be worth just doing.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:4292
> +            if (false) {
> +            VERBOSE_LOG("Splitting the entry block for potential recursive tail calls.\n");
> +            Node* jumpNode = addToGraph(Jump);
> +            BasicBlock* newBlock = allocateUntargetableBlock();
> +            jumpNode->targetBlock() = newBlock;
> +            m_currentBlock->didLink();
> +            m_currentBlock = newBlock;
> +            }

What happens when you run this code?

>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5276
>> +            if (m_inlineStackTop->m_returnValue.isValid()) // TODO: understand what happens if it is not.
> 
> Ditto.

Going against Keith's ditto here: please don't file a bug on you not fully understanding of the code. I'm like 95% sure this has to do w/ setters, but regardless of that, you just figure out why this happens, and remove the TODO

>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:5284
>> +                // TODO: should there be a special case for when we are returning from the first block? I do not see why, but there was one.
> 
> Ditto.

Ditto here. This seems like something you need to figure out before landing

>> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:6257
>> +    BasicBlock* callsiteBlockHead, // TODO: remove from constructor, since it is set in parseBlock, when dealing with op_enter
> 
> Ditto.

Ditto here. This is something you should either do or remove your TODO
Comment 60 Robin Morisset 2017-10-06 09:06:56 PDT
I moved the refactoring part of the patch to https://bugs.webkit.org/show_bug.cgi?id=177922 so that it can be reviewed more easily.
Comment 61 Robin Morisset 2017-10-11 13:42:12 PDT
Created attachment 323457 [details]
Patch
Comment 62 Robin Morisset 2017-10-11 15:40:12 PDT
Created attachment 323474 [details]
Fixed, this passes all the stress tests, but I want some more testing before asking for review
Comment 63 Build Bot 2017-10-11 15:41:24 PDT
Attachment 323474 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215:  The parameter name "stackEntry" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 64 Robin Morisset 2017-10-11 16:10:46 PDT
Benchmark report for Octane on MacBook-Pro-5 (MacBookPro14,3).

VMs tested:
"Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc
"TailCall" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/TailCall3/Release/jsc

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                            Baseline                  TailCall                                     

encrypt                 0.12516+-0.00200    ?     0.12576+-0.00276       ?
decrypt                 2.20316+-0.02487          2.19586+-0.00988       
deltablue      x2       0.11811+-0.00486          0.11528+-0.00436         might be 1.0245x faster
earley                  0.23454+-0.00292    ?     0.23467+-0.00234       ?
boyer                   4.09504+-0.03587          4.08283+-0.04547       
navier-stokes  x2       4.55212+-0.04389          4.54719+-0.02290       
raytrace       x2       0.63796+-0.01012    !     2.19885+-0.00639       ! definitely 3.4467x slower
richards       x2       0.07406+-0.00089          0.07396+-0.00030       
splay          x2       0.29583+-0.00294          0.29561+-0.00234       
regexp         x2      14.64647+-0.37289    !    15.44706+-0.18671       ! definitely 1.0547x slower
pdfjs          x2      36.56302+-0.18624    ?    36.80115+-0.34733       ?
mandreel       x2      35.58116+-0.36891    ?    35.85318+-0.47004       ?
gbemu          x2      27.57263+-0.34381    ?    27.62819+-0.47350       ?
closure                 0.42556+-0.00235    ?     0.42734+-0.00604       ?
jquery                  5.60274+-0.02243    ?     5.62641+-0.11612       ?
box2d          x2       7.08661+-0.03697    ?     7.10947+-0.09351       ?
zlib           x2     290.65739+-1.10185    ?   291.59003+-3.20236       ?
typescript     x2     582.97498+-17.32713   ?   588.47603+-9.79813       ?

<geometric>             4.34865+-0.01869    !     4.74215+-0.01957       ! definitely 1.0905x slower

So clearly something is very wrong with this patch. Sunspider showed no performance change.
Comment 65 Robin Morisset 2017-10-11 17:36:24 PDT
After running a few tests, it is clear that this regression is entirely caused by the splitting of op_enter blocks: in particular it reproduces even if handleRecursiveTailCall is replaced by return false.
Comment 66 Robin Morisset 2017-10-12 11:27:20 PDT
Some more things about this performance regression:
- It only shows on Octane/raytrace: neither sun spider, kraken or microbenchmarks have any significant regression (maybe a 1% regression on Kraken, to investigate later)
- It is caused purely by the splitting of entry blocks, not by the actual optimisation
- It persists even when not doing the splitting for functions at the leaves of the call graph (through testing for the emptiness of callLinkInfoMap).
- It persists even when running DFGCFGSimplificationPhase right after the byte code parser (to eliminate unnecessary jumps)
- It does not appear to be located in a single JS function: when attempting to bisect with JSC_bytecodeRangeToDFGCompile, the difference with the baseline progressively shrinks instead of having a cliff somewhere.
Comment 67 Robin Morisset 2017-10-12 13:44:10 PDT
Created attachment 323558 [details]
Fixed a horrible design bug, and the performance regression
Comment 68 Robin Morisset 2017-10-12 13:45:39 PDT
Benchmark report for Octane on MacBook-Pro-5 (MacBookPro14,3).

VMs tested:
"Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc
"Tail" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/TailCall3/Release/jsc

Collected 8 samples per benchmark/VM, with 8 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                            Baseline                    Tail                                       

encrypt                 0.13207+-0.00161    ?     0.13261+-0.00127       ?
decrypt                 2.34831+-0.04530          2.33041+-0.01246       
deltablue      x2       0.12342+-0.00190    ?     0.12351+-0.00322       ?
earley                  0.24859+-0.00332          0.24785+-0.00275       
boyer                   4.31797+-0.07807          4.28364+-0.03930       
navier-stokes  x2       4.87890+-0.06596          4.84418+-0.05741       
raytrace       x2       0.68536+-0.00741          0.68159+-0.00531       
richards       x2       0.07856+-0.00083    ?     0.07917+-0.00139       ?
splay          x2       0.31365+-0.00953          0.30422+-0.00139         might be 1.0310x faster
regexp         x2      15.52398+-0.16968         15.51919+-0.29346       
pdfjs          x2      39.02238+-0.21603    ?    39.06900+-0.24751       ?
mandreel       x2      38.14925+-0.39561         38.03272+-0.60931       
gbemu          x2      29.17857+-0.55032    ?    29.73977+-1.43515       ? might be 1.0192x slower
closure                 0.45147+-0.00498    ?     0.45274+-0.01029       ?
jquery                  6.08769+-0.12487          6.01421+-0.07787         might be 1.0122x faster
box2d          x2       7.77007+-0.22361          7.56490+-0.05053         might be 1.0271x faster
zlib           x2     306.73149+-3.07770    ?   307.86919+-2.63745       ?
typescript     x2     606.15997+-7.28700        599.21710+-6.84371         might be 1.0116x faster

<geometric>             4.62426+-0.02182          4.60428+-0.01928         might be 1.0043x faster

---------------------

Benchmark report for Kraken on MacBook-Pro-5 (MacBookPro14,3).

VMs tested:
"Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc
"Tail" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/TailCall3/Release/jsc

Collected 8 samples per benchmark/VM, with 8 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                           Baseline                    Tail                                       

ai-astar                                71.707+-3.284      ?      74.090+-3.158         ? might be 1.0332x slower
audio-beat-detection                    34.792+-0.521      ?      36.688+-3.837         ? might be 1.0545x slower
audio-dft                               85.256+-3.956      ?      86.756+-2.058         ? might be 1.0176x slower
audio-fft                               24.700+-1.414      ?      24.726+-1.062         ?
audio-oscillator                        48.445+-1.414      ?      51.629+-4.553         ? might be 1.0657x slower
imaging-darkroom                        52.437+-1.327      ?      52.996+-1.902         ? might be 1.0107x slower
imaging-desaturate                      45.185+-2.413             44.590+-1.823           might be 1.0134x faster
imaging-gaussian-blur                   57.064+-3.813      ?      58.884+-5.177         ? might be 1.0319x slower
json-parse-financial                    31.760+-2.089             31.531+-1.447         
json-stringify-tinderbox                19.834+-0.519      ?      20.088+-0.547         ? might be 1.0128x slower
stanford-crypto-aes                     34.609+-1.645      ?      35.127+-1.028         ? might be 1.0150x slower
stanford-crypto-ccm                     31.612+-2.898             30.550+-2.354           might be 1.0348x faster
stanford-crypto-pbkdf2                  58.973+-1.840             57.927+-1.497           might be 1.0180x faster
stanford-crypto-sha256-iterative        18.413+-0.292      ?      18.814+-0.720         ? might be 1.0218x slower

<arithmetic>                            43.913+-0.329      ?      44.600+-0.709         ? might be 1.0156x slower

So the regression on raytrace is solved, but there might be a 1.5% slowdown on Kraken.
Comment 69 Build Bot 2017-10-12 13:47:03 PDT
Attachment 323558 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215:  The parameter name "stackEntry" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/ChangeLog:12:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:13:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 3 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 70 Saam Barati 2017-10-12 13:48:25 PDT
Comment on attachment 323558 [details]
Fixed a horrible design bug, and the performance regression

View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review

> Source/JavaScriptCore/bytecode/PreciseJumpTargets.cpp:68
> +        // We need to insert a jump after op_enter, so recursive tail calls have somewhere to jump to.
> +        // But we only want to pay that price for functions that have at least one tail call.
> +        CallLinkInfoMap map;
> +        codeBlock->getCallLinkInfoMap(map);
> +        for (auto info : map.values()) {
> +            if (info->callMode() == CallMode::Tail) {
> +                out.append(bytecodeOffset + opcodeLengths[op_enter]);
> +                break;
> +            }
> +        }
> +    }

You should just set a bit if a CodeBlock has an op_tail_call in it. You can probably do it in bytecode generator via UnlinkedCodeBlock.
Comment 71 Saam Barati 2017-10-12 13:53:49 PDT
Comment on attachment 323558 [details]
Fixed a horrible design bug, and the performance regression

View in context: https://bugs.webkit.org/attachment.cgi?id=323558&action=review

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1433
> +        if (auto callFrame = stackEntry->m_inlineCallFrame) {

nit: I prefer a type here, or at least, auto* instead of auto.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1436
> +            if (argumentCountIncludingThis != (int) callFrame->argumentCountIncludingThis)

Style: WebKit style uses static_cast instead of C-style casts.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1465
> +        // We want to emit the SetLocals with a code origin that points to the place we are jumping to.

Nit: "with a code origin" => "with an exit origin"
since the semantic origin won't be there.
Comment 72 Robin Morisset 2017-10-13 11:17:48 PDT
Created attachment 323710 [details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened
Comment 73 Build Bot 2017-10-13 11:19:27 PDT
Attachment 323710 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:215:  The parameter name "stackEntry" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 74 Robin Morisset 2017-10-13 11:21:03 PDT
We now see an improvement of TailBench9000 but it is rather disappointing:
- n-body-run.js roughly went from 1.05s to 0.95s
- merge-sort-run.js roughly went from 1.40s to 1.10s

I have not yet ran other benchmarks on this last version of the code.
Comment 75 Robin Morisset 2017-10-13 11:39:21 PDT
It appears that the optimization currently hurts tier-up from DFG to FTL, I am investigating how to fix it.
Comment 76 Build Bot 2017-10-13 18:37:27 PDT
Comment on attachment 323710 [details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened

Attachment 323710 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4852670

New failing tests:
http/tests/loading/basic-auth-resend-wrong-credentials.html
Comment 77 Build Bot 2017-10-13 18:37:29 PDT
Created attachment 323785 [details]
Archive of layout-test-results from ews114 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 78 Robin Morisset 2017-10-17 08:56:05 PDT
Comment on attachment 323710 [details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened

My experiments with inserting LoopHint have not improved performance so far, so I would like to get this patch landed as is, and make it a separate bug.

The build-bot failure looks very much like a flaky test to me, I don't see how my patch could break layout. Is there a simple way to convince the buildbot to try again? Or should I just resubmit the patch?
Comment 79 Sam Weinig 2017-10-17 09:19:35 PDT
(In reply to Robin Morisset from comment #78)
Is there a simple way to convince the
> buildbot to try again? Or should I just resubmit the patch?

I think you just need to re-upload the change unfortunately.
Comment 80 Saam Barati 2017-10-17 12:18:52 PDT
Comment on attachment 323710 [details]
Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly recognize when it has happened

View in context: https://bugs.webkit.org/attachment.cgi?id=323710&action=review

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418
> +    // We currently only do this optimisation in the simple, non-polymorphic case.

Maybe add a FIXME with a bugzilla link here since you told me you wanted to do this and you thought it'd be pretty easy?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429
> +        BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock);

m_blockLInkingTargets is sorted?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480
> +        // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case.

Please add tests for this since you didn't find any when running the tests with the earlier broken patch. It'd be good to verify your earlier broken version would fail such a test.
Comment 81 Robin Morisset 2017-10-18 07:44:01 PDT
Created attachment 324116 [details]
Patch
Comment 82 Robin Morisset 2017-10-18 07:51:07 PDT
(In reply to Saam Barati from comment #80)
> Comment on attachment 323710 [details]
> Moved the splitting of op_enter to UnlinkedCodeBlocks, and correctly
> recognize when it has happened
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323710&action=review
> 
> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1418
> > +    // We currently only do this optimisation in the simple, non-polymorphic case.
> 
> Maybe add a FIXME with a bugzilla link here since you told me you wanted to
> do this and you thought it'd be pretty easy?

Done

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1429
> > +        BasicBlock** entryBlockPtr = tryBinarySearch<BasicBlock*, unsigned>(stackEntry->m_blockLinkingTargets, stackEntry->m_blockLinkingTargets.size(), OPCODE_LENGTH(op_enter), getBytecodeBeginForBlock);
> 
> m_blockLInkingTargets is sorted?

Yes. It was already mentioned in a comment; I added some ASSERTs to allocateTargetableBlock and makeBlockTargetable.

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1480
> > +        // It would be unsound to jump over a non-tail call: the "tail" call is not really a tail call in that case.
> 
> Please add tests for this since you didn't find any when running the tests
> with the earlier broken patch. It'd be good to verify your earlier broken
> version would fail such a test.

I added some test, but the buggy code currently appears to be passing the test somehow, I am investigating.
Comment 83 Robin Morisset 2017-10-18 08:57:44 PDT
Created attachment 324120 [details]
Patch
Comment 84 Robin Morisset 2017-10-18 09:07:56 PDT
Mystery solved: there was yet another bug (!) that was causing this entire code path not to be exercised by the tests for either the buggy or the "fixed" version of the code.

In short, because each code block has a superset of the jump targets, a function with no control-flow was deemed to have no jump targets, even if it had a tail call, inhibiting the splitting of the basic block that includes op_enter.

After fixing this bug, we can now correctly detect the other bugs from previous versions of this patch with the test. I also added another test as I realized that one code path was still not tested. The tests now (hopefully) correctly find wrong code generation; but I do not see how to test that the transformation occurred when it should.
Comment 85 Saam Barati 2017-10-18 11:46:56 PDT
Comment on attachment 324120 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=324120&action=review

r=me with a few comments.

> Source/JavaScriptCore/bytecode/CodeBlock.h:928
> +    bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();}

style nit: this should be written as:
{ return m_unlinkedCode->hasTailCalls(); }
for the spacing to follow webkit style

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321
> +    // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block
> +    // to have somewhere to jump to if there is a recursive tail-call that points to this function.
> +    m_codeBlock->addJumpTarget(instructions().size());
> +    // This disables peephole optimizations when an instruction is a jump target
> +    m_lastOpcodeID = op_end;

Is it worth just emitting a label since that will do this work for us? Or should we keep this as is?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469
> +        // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR.

This is not an accurate comment. The invariant is that a root in the CFG cannot have a predecessor. An OSR entrypoint *can* have predecessors. For example, loops are often OSR entry points. FWIW, we could totally jump to op_enter, we'd just need to split before op_enter, not after. This may be awkward. I'm totally OK with how you've implemented it.
Comment 86 Robin Morisset 2017-10-19 02:20:25 PDT
(In reply to Saam Barati from comment #85)
> Comment on attachment 324120 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=324120&action=review
> 
> r=me with a few comments.
> 
> > Source/JavaScriptCore/bytecode/CodeBlock.h:928
> > +    bool hasTailCalls() const {return m_unlinkedCode->hasTailCalls();}
> 
> style nit: this should be written as:
> { return m_unlinkedCode->hasTailCalls(); }
> for the spacing to follow webkit style


Done

> > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:1321
> > +    // We must add the end of op_enter as a potential jump target, because the bytecode parser may decide to split its basic block
> > +    // to have somewhere to jump to if there is a recursive tail-call that points to this function.
> > +    m_codeBlock->addJumpTarget(instructions().size());
> > +    // This disables peephole optimizations when an instruction is a jump target
> > +    m_lastOpcodeID = op_end;
> 
> Is it worth just emitting a label since that will do this work for us? Or
> should we keep this as is?

That was my first idea. The problem is that we would then have a Ref<Label> with nowhere to put it. It is not a serious problem, it just felt wrong to allocate data that we would never even read back before freeing it. I can change it back if you prefer.

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1469
> > +        // We jump right after it and not before it, because of some invariant saying that an OSR entry point cannot have predecessors in the IR.
> 
> This is not an accurate comment. The invariant is that a root in the CFG
> cannot have a predecessor. An OSR entrypoint *can* have predecessors. For
> example, loops are often OSR entry points. FWIW, we could totally jump to
> op_enter, we'd just need to split before op_enter, not after. This may be
> awkward. I'm totally OK with how you've implemented it.

Thanks for the comment, will fix.
Comment 87 Robin Morisset 2017-10-19 04:00:18 PDT
Created attachment 324216 [details]
Patch
Comment 88 Robin Morisset 2017-10-19 04:02:29 PDT
This should finally be finished.
Diff with the previous version of the patch:
- Applied both of Saam's comments
- Added more tests, and slight refactoring of the tests.
- Added one RELEASE_ASSERT instead of silently failing if for whatever reason the op_enter block is not split when we are expecting it to be.
Comment 89 WebKit Commit Bot 2017-10-19 09:27:48 PDT
Comment on attachment 324216 [details]
Patch

Clearing flags on attachment: 324216

Committed r223691: <https://trac.webkit.org/changeset/223691>
Comment 90 WebKit Commit Bot 2017-10-19 09:27:51 PDT
All reviewed patches have been landed.  Closing bug.
Comment 91 Michael Catanzaro 2017-10-19 14:11:45 PDT
Regression in bug #178543
Comment 92 WebKit Commit Bot 2017-10-25 15:11:49 PDT
Re-opened since this is blocked by bug 178834
Comment 93 Ryosuke Niwa 2017-10-25 15:18:44 PDT
This patch broke Speedometer 2's React-Redux-TodoMVC test case.
To reproduce,

1. Go to http://browserbench.org/Speedometer2.0/InteractiveRunner.html
2. Click on "Unselect all"
3. Select "React-Redux-TodoMVC"
4. Click on "Run"

Open Web Inspector and observe that there is an exception being thrown.

You have to close the inspector & reload the page before running it again since the bug doesn't reproduce when Web Inspector is open.
Comment 94 Robin Morisset 2017-10-26 07:56:39 PDT
Thanks for the bug report, I am looking into it.

Here is what I have found about it so far:
- The bug appears deep within React code, taken from http://browserbench.org/Speedometer2.0/resources/todomvc/architecture-examples/react-redux/dist/index.html (sadly partially minified).
- The bug is that "this" is apparently undefined in "this.moveChild", in _updateChildren in that file.
- _updateChildren appears many times in the stack trace, but it does not appear to make any tail call so it should not be optimized.
- The bug disappear if we refuse to make jumps to inlined functions, or if we refuse to make jumps across levels of the inline stack stack. So it is a case where we are creating a jump from one inlined function to another (higher in the call stack).
- The bug does not disappear if I disable the optimization when the number of arguments is different from the number of parameters of the relevant code block (for inlined frames).
- It reproduces fine with JSC_useConcurrentJIT=0
- Removing the repetition of op_enter does not make the bug disappear change.
- Removing the setting of extra parameters to undefined also does not make the bug disappear or change.
Comment 95 Ryan Haddad 2017-10-26 09:42:13 PDT
*** Bug 178592 has been marked as a duplicate of this bug. ***
Comment 96 Robin Morisset 2017-10-31 10:29:41 PDT
Created attachment 325457 [details]
WIP, for sharing with Saam
Comment 97 Robin Morisset 2017-11-08 12:11:05 PST
Created attachment 326343 [details]
Patch for landing
Comment 98 Robin Morisset 2017-11-08 12:13:21 PST
Because of how tricky it was to write a minimized regression test, and speedometer provides some protection against regression, I decided to reland this patch with the fix described in https://bugs.webkit.org/show_bug.cgi and no extra test.

I marked it r=sbarati as we discussed this together.
Comment 99 Robin Morisset 2017-11-08 12:14:25 PST
*** Bug 178834 has been marked as a duplicate of this bug. ***
Comment 100 WebKit Commit Bot 2017-11-08 12:42:19 PST
Comment on attachment 326343 [details]
Patch for landing

Clearing flags on attachment: 326343

Committed r224592: <https://trac.webkit.org/changeset/224592>
Comment 101 WebKit Commit Bot 2017-11-08 12:42:22 PST
All reviewed patches have been landed.  Closing bug.
Comment 102 Robin Morisset 2017-11-09 02:29:22 PST
Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to
gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used
the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark
execution times with 95% confidence intervals in milliseconds.

                         Baseline                  TailCall

n-body              837.5192+-2.5306     ^    742.2701+-2.1430        ^ definitely 1.1283x faster
merge-sort          597.9785+-5.6828     ^    521.2900+-3.2286        ^ definitely 1.1471x faster
bf-interpreter      478.1371+-4.7080     ^    334.9072+-1.8962        ^ definitely 1.4277x faster

<geometric>         620.9622+-3.5729     ^    506.0356+-1.2795        ^ definitely 1.2271x faster

No visible change on other benchmarks that I've tested.
Comment 103 David Kilzer (:ddkilzer) 2017-11-15 03:15:30 PST
<rdar://problem/35556442>