Bug 125828

Summary: CStack Branch: Fix callee frame access in virtualForThunkGenerator when we don't emit prologue code
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, commit-queue, ggaren, mark.lam, ossy, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
ggaren: review+, buildbot: commit-queue-
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2
none
Archive of layout-test-results from webkit-ews-02 for mac-mountainlion none

Michael Saboff
Reported 2013-12-16 20:14:42 PST
In virtualForThunkGenerator(), there is a fast path at the top of the generated code where we don't emit a function prologue because we might be able to jump directly to the callee. We need to use the stack pointer as the base register and adjust the offset taking into account that the prologue hasn't been executed.
Attachments
Patch (4.31 KB, patch)
2013-12-16 20:18 PST, Michael Saboff
ggaren: review+
buildbot: commit-queue-
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (637.66 KB, application/zip)
2014-01-06 19:56 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-02 for mac-mountainlion (524.02 KB, application/zip)
2014-01-07 23:57 PST, Build Bot
no flags
Michael Saboff
Comment 1 2013-12-16 20:18:23 PST
Created attachment 219389 [details] Patch With this fix, it appears that we can run v8-deltablue and v8-richards.
WebKit Commit Bot
Comment 2 2013-12-16 20:21:18 PST
Attachment 219389 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source/JavaScriptCore/jit/AssemblyHelpers.h', u'Source/JavaScriptCore/jit/ThunkGenerators.cpp', '--commit-queue']" exit_code: 1 ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:80: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:82: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 2 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 3 2013-12-16 21:51:21 PST
Comment on attachment 219389 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review > Source/JavaScriptCore/jit/AssemblyHelpers.h:105 > + // First, the access is via the stack pointer. Second, the address calculation must also take > + // into account that the stack pointer may not have been adjusted down for the return PC and/or > + // caller's frame pointer. On some platforms, the callee is responsible for pushing the > + // "link register" containing the return address in the function prologue. What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.
Michael Saboff
Comment 4 2013-12-16 22:11:52 PST
(In reply to comment #3) > (From update of attachment 219389 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review > > > Source/JavaScriptCore/jit/AssemblyHelpers.h:105 > > + // First, the access is via the stack pointer. Second, the address calculation must also take > > + // into account that the stack pointer may not have been adjusted down for the return PC and/or > > + // caller's frame pointer. On some platforms, the callee is responsible for pushing the > > + // "link register" containing the return address in the function prologue. > > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created.
Michael Saboff
Comment 5 2013-12-17 10:23:22 PST
Geoffrey Garen
Comment 6 2013-12-17 10:57:13 PST
> > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. > > The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created. I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU. I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.
Michael Saboff
Comment 7 2013-12-17 11:39:03 PST
(In reply to comment #6) > > > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here. > > > > The current patch is in a #if CPU(X86_64) || CPU(X86) block. I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue. This #if / #endif block also has emitFunctionPrologue(). I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created. > > I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU. I will write the emitFunctionPrologue() and emitFunctionEpilogue() functions for all platforms in another patch <https://bugs.webkit.org/show_bug.cgi?id=125861>. For ARM64, we will also need to add the load pair and store pair instructions to the assembler and macro assembler. I suspect that there are two basic flavors, one for CPUs that push the return address as part of the call instruction and another for those that have a link register that the callee pushes. The current virtual thunk already has a custom tail call sequence. The current ToT prologue is preserveReturnAddressAfterCall() followed by emitPutReturnPCToCallFrameHeader(). CPU differences are handled with preserveReturnAddressAfterCall(). The virtual thunk doesn't do that on the fast path, it leaves it for the target function. What is changing here is that the call frame register hasn't been updated to point at the callee frame, that's done in the prologue, so we need to use the stack pointer to access the callee frame using a pre-prologue offset. That is why I introduced emitPutToCallFrameHeaderBeforePrologue, et al, to factor out any CPU differences. > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target?
Geoffrey Garen
Comment 8 2013-12-17 16:10:49 PST
> > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. > > If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? // destination on the left push lr // if we have an lr push bp move bp, sp // ... body of virtual function call fast path ... jump <arity checking target> > The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target? That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue.
Michael Saboff
Comment 9 2013-12-17 16:38:34 PST
(In reply to comment #8) > > > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function. > > > > If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call? > > // destination on the left > push lr // if we have an lr > push bp > move bp, sp > > // ... body of virtual function call fast path ... > > jump <arity checking target> I knew that part. I was talking about a second prologue executing in the target. > > The target function will also have prologue code. Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target? > > That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue. The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct. In that case, we are patching a call and not a jump.
Geoffrey Garen
Comment 10 2013-12-18 13:13:15 PST
> The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct. In that case, we are patching a call and not a jump. In that case, I would make arity mismatch call through the virtual call thunk. Since arity mismatch already calls out to C++, the extra thunk overhead should not be significant. Another option, which would also work, while resolving the technical debt, would be to move bp by a constant, then do the store to the ScopeChain slot through normal helper functions, then move bp back. Another option, which would also work, while resolving the technical debt, would be for the callee to do the store to the ScopeChain slot. For global functions and DFG create-once functions, the ScopeChain is a constant in the callee, so this would also be a speedup.
Alexey Proskuryakov
Comment 11 2013-12-24 23:32:20 PST
Looks like Mac EWS is unhappy about this patch - it's been trying for 8 days now, unable to give a conclusive answer.
Geoffrey Garen
Comment 12 2014-01-03 12:24:25 PST
Comment on attachment 219389 [details] Patch r=me since we are using this on the branch -- I'd still like to find a better long-term solution.
Build Bot
Comment 13 2014-01-06 19:56:30 PST
Comment on attachment 219389 [details] Patch Attachment 219389 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6217341775904768 New failing tests: cssom/cssvalue-comparison.html fast/canvas/canvas-blending-color-over-pattern.html css3/unicode-bidi-isolate-basic.html fast/canvas/canvas-blending-color-over-gradient.html fast/canvas/canvas-blending-fill-style.html cssom/cssstyledeclaration-csstext-final-delimiter.html fast/canvas/webgl/array-unit-tests.html cssom/cssimportrule-media.html editing/execCommand/query-text-alignment.html fast/canvas/canvas-blending-color-over-color.html fast/canvas/canvas-blending-color-over-image.html fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml fast/dom/webidl-operations-on-node-prototype.html fast/canvas/canvas-blending-clipping.html http/tests/canvas/webgl/origin-clean-conformance.html fast/canvas/canvas-blending-gradient-over-color.html http/tests/cookies/third-party-cookie-relaxing.html fast/media/mq-color-index-02.html fast/dom/gc-acid3.html fast/media/w3c/test_media_queries.html fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html fast/canvas/canvas-blending-global-alpha.html fast/canvas/webgl/tex-image-with-format-and-type.html fast/media/mq-js-update-media.html http/tests/misc/acid3.html fast/canvas/webgl/arraybuffer-transfer-of-control.html editing/pasteboard/paste-text-events.html fast/regions/webkit-flow-into-parsing.html fast/parser/fragment-parser.html editing/execCommand/query-command-state.html
Build Bot
Comment 14 2014-01-06 19:56:31 PST
Created attachment 220483 [details] Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-10 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 15 2014-01-07 23:57:34 PST
Comment on attachment 219389 [details] Patch Attachment 219389 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/5123605000093696 New failing tests: cssom/cssvalue-comparison.html editing/pasteboard/drop-text-events.html fast/canvas/canvas-blending-color-over-pattern.html css3/unicode-bidi-isolate-basic.html fast/canvas/canvas-blending-color-over-gradient.html fast/canvas/canvas-blending-fill-style.html cssom/cssstyledeclaration-csstext-final-delimiter.html fast/canvas/webgl/array-unit-tests.html cssom/cssimportrule-media.html editing/execCommand/query-text-alignment.html fast/canvas/canvas-blending-color-over-color.html fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml fast/dom/webidl-operations-on-node-prototype.html fast/canvas/canvas-blending-clipping.html http/tests/canvas/webgl/origin-clean-conformance.html fast/canvas/canvas-blending-gradient-over-color.html http/tests/cookies/third-party-cookie-relaxing.html fast/media/mq-color-index-02.html fast/dom/gc-acid3.html fast/media/w3c/test_media_queries.html fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html fast/canvas/canvas-blending-global-alpha.html fast/canvas/webgl/tex-image-with-format-and-type.html fast/media/mq-js-update-media.html http/tests/misc/acid3.html fast/canvas/webgl/arraybuffer-transfer-of-control.html editing/pasteboard/paste-text-events.html fast/regions/webkit-flow-into-parsing.html fast/parser/fragment-parser.html editing/execCommand/query-command-state.html
Build Bot
Comment 16 2014-01-07 23:57:36 PST
Created attachment 220600 [details] Archive of layout-test-results from webkit-ews-02 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-02 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Mark Lam
Comment 17 2014-01-13 18:43:26 PST
Review status updated in r161938: <http://trac.webkit.org/r161938>.
Note You need to log in before you can comment on or make changes to this bug.