In virtualForThunkGenerator(), there is a fast path at the top of the generated code where we don't emit a function prologue because we might be able to jump directly to the callee. We need to use the stack pointer as the base register and adjust the offset taking into account that the prologue hasn't been executed.
Created attachment 219389 [details] Patch With this fix, it appears that we can run v8-deltablue and v8-richards.
Attachment 219389 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source/JavaScriptCore/jit/AssemblyHelpers.h', u'Source/JavaScriptCore/jit/ThunkGenerators.cpp', '--commit-queue']" exit_code: 1 ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:80: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:82: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 2 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 219389 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review > Source/JavaScriptCore/jit/AssemblyHelpers.h:105 > + // First, the access is via the stack pointer. Second, the address calculation must also take > + // into account that the stack pointer may not have been adjusted down for the return PC and/or > + // caller's frame pointer. On some platforms, the callee is responsible for pushing the > + // "link register" containing the return address in the function prologue. What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.
(In reply to comment #3) > (From update of attachment 219389 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review > > > Source/JavaScriptCore/jit/AssemblyHelpers.h:105 > > + // First, the access is via the stack pointer. Second, the address calculation must also take > > + // into account that the stack pointer may not have been adjusted down for the return PC and/or > > + // caller's frame pointer. On some platforms, the callee is responsible for pushing the > > + // "link register" containing the return address in the function prologue. > > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created.
Committed r160714: <http://trac.webkit.org/changeset/160714 >
> > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. > > The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created. I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU. I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.
(In reply to comment #6) > > > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. > > > > The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created. > > I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU. I will write the emitFunctionPrologue() and emitFunctionEpilogue() functions for all platforms in another patch <https://bugs.webkit.org/show_bug.cgi?id=125861>. For ARM64, we will also need to add the load pair and store pair instructions to the assembler and macro assembler. I suspect that there are two basic flavors, one for CPUs that push the return address as part of the call instruction and another for those that have a link register that the callee pushes. The current virtual thunk already has a custom tail call sequence. The current ToT prologue is preserveReturnAddressAfterCall() followed by emitPutReturnPCToCallFrameHeader(). CPU differences are handled with preserveReturnAddressAfterCall(). The virtual thunk doesn't do that on the fast path, it leaves it for the target function. What is changing here is that the call frame register hasn't been updated to point at the callee frame, that's done in the prologue, so we need to use the stack pointer to access the callee frame using a pre-prologue offset. That is why I introduced emitPutToCallFrameHeaderBeforePrologue, et al, to factor out any CPU differences. > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target?
> > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. > > If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? // destination on the left push lr // if we have an lr push bp move bp, sp // ... body of virtual function call fast path ... jump <arity checking target> > The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target? That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue.
(In reply to comment #8) > > > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. > > > > If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? > > // destination on the left > push lr // if we have an lr > push bp > move bp, sp > > // ... body of virtual function call fast path ... > > jump <arity checking target> I knew that part. I was talking about a second prologue executing in the target. > > The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target? > > That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue. The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct. In that case, we are patching a call and not a jump.
> The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct. In that case, we are patching a call and not a jump. In that case, I would make arity mismatch call through the virtual call thunk. Since arity mismatch already calls out to C++, the extra thunk overhead should not be significant. Another option, which would also work, while resolving the technical debt, would be to move bp by a constant, then do the store to the ScopeChain slot through normal helper functions, then move bp back. Another option, which would also work, while resolving the technical debt, would be for the callee to do the store to the ScopeChain slot. For global functions and DFG create-once functions, the ScopeChain is a constant in the callee, so this would also be a speedup.
Looks like Mac EWS is unhappy about this patch - it's been trying for 8 days now, unable to give a conclusive answer.
Comment on attachment 219389 [details] Patch r=me since we are using this on the branch -- I'd still like to find a better long-term solution.
Comment on attachment 219389 [details] Patch Attachment 219389 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6217341775904768 New failing tests: cssom/cssvalue-comparison.html fast/canvas/canvas-blending-color-over-pattern.html css3/unicode-bidi-isolate-basic.html fast/canvas/canvas-blending-color-over-gradient.html fast/canvas/canvas-blending-fill-style.html cssom/cssstyledeclaration-csstext-final-delimiter.html fast/canvas/webgl/array-unit-tests.html cssom/cssimportrule-media.html editing/execCommand/query-text-alignment.html fast/canvas/canvas-blending-color-over-color.html fast/canvas/canvas-blending-color-over-image.html fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml fast/dom/webidl-operations-on-node-prototype.html fast/canvas/canvas-blending-clipping.html http/tests/canvas/webgl/origin-clean-conformance.html fast/canvas/canvas-blending-gradient-over-color.html http/tests/cookies/third-party-cookie-relaxing.html fast/media/mq-color-index-02.html fast/dom/gc-acid3.html fast/media/w3c/test_media_queries.html fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html fast/canvas/canvas-blending-global-alpha.html fast/canvas/webgl/tex-image-with-format-and-type.html fast/media/mq-js-update-media.html http/tests/misc/acid3.html fast/canvas/webgl/arraybuffer-transfer-of-control.html editing/pasteboard/paste-text-events.html fast/regions/webkit-flow-into-parsing.html fast/parser/fragment-parser.html editing/execCommand/query-command-state.html
Created attachment 220483 [details] Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Comment on attachment 219389 [details] Patch Attachment 219389 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5123605000093696 New failing tests: cssom/cssvalue-comparison.html editing/pasteboard/drop-text-events.html fast/canvas/canvas-blending-color-over-pattern.html css3/unicode-bidi-isolate-basic.html fast/canvas/canvas-blending-color-over-gradient.html fast/canvas/canvas-blending-fill-style.html cssom/cssstyledeclaration-csstext-final-delimiter.html fast/canvas/webgl/array-unit-tests.html cssom/cssimportrule-media.html editing/execCommand/query-text-alignment.html fast/canvas/canvas-blending-color-over-color.html fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml fast/dom/webidl-operations-on-node-prototype.html fast/canvas/canvas-blending-clipping.html http/tests/canvas/webgl/origin-clean-conformance.html fast/canvas/canvas-blending-gradient-over-color.html http/tests/cookies/third-party-cookie-relaxing.html fast/media/mq-color-index-02.html fast/dom/gc-acid3.html fast/media/w3c/test_media_queries.html fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html fast/canvas/canvas-blending-global-alpha.html fast/canvas/webgl/tex-image-with-format-and-type.html fast/media/mq-js-update-media.html http/tests/misc/acid3.html fast/canvas/webgl/arraybuffer-transfer-of-control.html editing/pasteboard/paste-text-events.html fast/regions/webkit-flow-into-parsing.html fast/parser/fragment-parser.html editing/execCommand/query-command-state.html
Created attachment 220600 [details] Archive of layout-test-results from webkit-ews-02 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-02 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Review status updated in r161938: <http://trac.webkit.org/r161938>.