Bug 125828 - CStack Branch: Fix callee frame access in virtualForThunkGenerator when we don't emit prologue code
Summary: CStack Branch: Fix callee frame access in virtualForThunkGenerator when we do...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-16 20:14 PST by Michael Saboff
Modified: 2015-02-26 06:34 PST (History)
6 users (show)

See Also:


Attachments
Patch (4.31 KB, patch)
2013-12-16 20:18 PST, Michael Saboff
ggaren: review+
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (637.66 KB, application/zip)
2014-01-06 19:56 PST, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-02 for mac-mountainlion (524.02 KB, application/zip)
2014-01-07 23:57 PST, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2013-12-16 20:14:42 PST
In virtualForThunkGenerator(), there is a fast path at the top of the generated code where we don't emit a function prologue because we might be able to jump directly to the callee.  We need to use the stack pointer as the base register and adjust the offset taking into account that the prologue hasn't been executed.
Comment 1 Michael Saboff 2013-12-16 20:18:23 PST
Created attachment 219389 [details]
Patch

With this fix, it appears that we can run v8-deltablue and v8-richards.
Comment 2 WebKit Commit Bot 2013-12-16 20:21:18 PST
Attachment 219389 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source/JavaScriptCore/jit/AssemblyHelpers.h', u'Source/JavaScriptCore/jit/ThunkGenerators.cpp', '--commit-queue']" exit_code: 1
ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:80:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/jit/AssemblyHelpers.h:82:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 2 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Geoffrey Garen 2013-12-16 21:51:21 PST
Comment on attachment 219389 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review

> Source/JavaScriptCore/jit/AssemblyHelpers.h:105
> +    // First, the access is via the stack pointer.  Second, the address calculation must also take
> +    // into account that the stack pointer may not have been adjusted down for the return PC and/or
> +    // caller's frame pointer.  On some platforms, the callee is responsible for pushing the
> +    // "link register" containing the return address in the function prologue.

What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.
Comment 4 Michael Saboff 2013-12-16 22:11:52 PST
(In reply to comment #3)
> (From update of attachment 219389 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=219389&action=review
> 
> > Source/JavaScriptCore/jit/AssemblyHelpers.h:105
> > +    // First, the access is via the stack pointer.  Second, the address calculation must also take
> > +    // into account that the stack pointer may not have been adjusted down for the return PC and/or
> > +    // caller's frame pointer.  On some platforms, the callee is responsible for pushing the
> > +    // "link register" containing the return address in the function prologue.
> 
> What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.

The current patch is in a #if CPU(X86_64) || CPU(X86) block.  I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue.  This #if / #endif block also has emitFunctionPrologue().  I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created.
Comment 5 Michael Saboff 2013-12-17 10:23:22 PST
Committed r160714: <http://trac.webkit.org/changeset/160714 >
Comment 6 Geoffrey Garen 2013-12-17 10:57:13 PST
> > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.
> 
> The current patch is in a #if CPU(X86_64) || CPU(X86) block.  I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue.  This #if / #endif block also has emitFunctionPrologue().  I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created.

I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU.

I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.
Comment 7 Michael Saboff 2013-12-17 11:39:03 PST
(In reply to comment #6)
> > > What is your strategy for taking into account whether the caller has pushed the link register? You don't seem to have done that here.
> > 
> > The current patch is in a #if CPU(X86_64) || CPU(X86) block.  I haven't looked at the other CPU types to see which have a link register that gets pushed in a function prologue.  This #if / #endif block also has emitFunctionPrologue().  I'll create the correct version of these functions at the same time that emitFunctionPrologue() for the other CPU types is created.
> 
> I don't like this strategy. There are two problems with it: (1) It creates technical debt unnecessarily; (2) It introduces a new custom calling and stack convention to each target CPU.

I will write the emitFunctionPrologue() and emitFunctionEpilogue() functions for all platforms in another patch <https://bugs.webkit.org/show_bug.cgi?id=125861>.  For ARM64, we will also need to add the load pair and store pair instructions to the assembler and macro assembler.  I suspect that there are two basic flavors, one for CPUs that push the return address as part of the call instruction and another for those that have a link register that the callee pushes.

The current virtual thunk already has a custom tail call sequence.  The current ToT prologue is preserveReturnAddressAfterCall() followed by emitPutReturnPCToCallFrameHeader().  CPU differences are handled with preserveReturnAddressAfterCall().  The virtual thunk doesn't do that on the fast path, it leaves it for the target function.  What is changing here is that the call frame register hasn't been updated to point at the callee frame, that's done in the prologue, so we need to use the stack pointer to access the callee frame using a pre-prologue offset.  That is why I introduced emitPutToCallFrameHeaderBeforePrologue, et al, to factor out any CPU differences.

> I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.

If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call?  The target function will also have prologue code.  Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target?
Comment 8 Geoffrey Garen 2013-12-17 16:10:49 PST
> > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.
> 
> If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call?

// destination on the left
push lr // if we have an lr
push bp
move bp, sp

// ... body of virtual function call fast path ...

jump <arity checking target>

>  The target function will also have prologue code.  Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target?

That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue.
Comment 9 Michael Saboff 2013-12-17 16:38:34 PST
(In reply to comment #8)
> > > I'd prefer to see the virtual thunk set up a real stack frame, access that frame using our normal helper methods, and then tail call into the target function.
> > 
> > If the virtual thunk properly sets up a stack through emitFunctionPrologue(), how can we tail call?
> 
> // destination on the left
> push lr // if we have an lr
> push bp
> move bp, sp
> 
> // ... body of virtual function call fast path ...
> 
> jump <arity checking target>

I knew that part.  I was talking about a second prologue executing in the target.

> >  The target function will also have prologue code.  Are you suggesting that the virtual thunk contain a prologue, access the callee frame, run the epilogue and then tail call the target?
> 
> That would probably be OK. But I think we can do even better: The so-called arity checking entrypoint (which is really just the virtual call entry point) should just elide the function prologue. It's more efficient to assume that the virtual call thunk has already executed the function prologue.

The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct.  In that case, we are patching a call and not a jump.
Comment 10 Geoffrey Garen 2013-12-18 13:13:15 PST
> The arity checking entry point is also used when the callee doesn't have enough args in operationLinkCall & operationLinkConstruct.  In that case, we are patching a call and not a jump.

In that case, I would make arity mismatch call through the virtual call thunk. Since arity mismatch already calls out to C++, the extra thunk overhead should not be significant.

Another option, which would also work, while resolving the technical debt, would be to move bp by a constant, then do the store to the ScopeChain slot through normal helper functions, then move bp back.

Another option, which would also work, while resolving the technical debt, would be for the callee to do the store to the ScopeChain slot. For global functions and DFG create-once functions, the ScopeChain is a constant in the callee, so this would also be a speedup.
Comment 11 Alexey Proskuryakov 2013-12-24 23:32:20 PST
Looks like Mac EWS is unhappy about this patch - it's been trying for 8 days now, unable to give a conclusive answer.
Comment 12 Geoffrey Garen 2014-01-03 12:24:25 PST
Comment on attachment 219389 [details]
Patch

r=me since we are using this on the branch -- I'd still like to find a better long-term solution.
Comment 13 Build Bot 2014-01-06 19:56:30 PST
Comment on attachment 219389 [details]
Patch

Attachment 219389 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/6217341775904768

New failing tests:
cssom/cssvalue-comparison.html
fast/canvas/canvas-blending-color-over-pattern.html
css3/unicode-bidi-isolate-basic.html
fast/canvas/canvas-blending-color-over-gradient.html
fast/canvas/canvas-blending-fill-style.html
cssom/cssstyledeclaration-csstext-final-delimiter.html
fast/canvas/webgl/array-unit-tests.html
cssom/cssimportrule-media.html
editing/execCommand/query-text-alignment.html
fast/canvas/canvas-blending-color-over-color.html
fast/canvas/canvas-blending-color-over-image.html
fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml
fast/dom/webidl-operations-on-node-prototype.html
fast/canvas/canvas-blending-clipping.html
http/tests/canvas/webgl/origin-clean-conformance.html
fast/canvas/canvas-blending-gradient-over-color.html
http/tests/cookies/third-party-cookie-relaxing.html
fast/media/mq-color-index-02.html
fast/dom/gc-acid3.html
fast/media/w3c/test_media_queries.html
fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html
fast/canvas/canvas-blending-global-alpha.html
fast/canvas/webgl/tex-image-with-format-and-type.html
fast/media/mq-js-update-media.html
http/tests/misc/acid3.html
fast/canvas/webgl/arraybuffer-transfer-of-control.html
editing/pasteboard/paste-text-events.html
fast/regions/webkit-flow-into-parsing.html
fast/parser/fragment-parser.html
editing/execCommand/query-command-state.html
Comment 14 Build Bot 2014-01-06 19:56:31 PST
Created attachment 220483 [details]
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-10  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 15 Build Bot 2014-01-07 23:57:34 PST
Comment on attachment 219389 [details]
Patch

Attachment 219389 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/5123605000093696

New failing tests:
cssom/cssvalue-comparison.html
editing/pasteboard/drop-text-events.html
fast/canvas/canvas-blending-color-over-pattern.html
css3/unicode-bidi-isolate-basic.html
fast/canvas/canvas-blending-color-over-gradient.html
fast/canvas/canvas-blending-fill-style.html
cssom/cssstyledeclaration-csstext-final-delimiter.html
fast/canvas/webgl/array-unit-tests.html
cssom/cssimportrule-media.html
editing/execCommand/query-text-alignment.html
fast/canvas/canvas-blending-color-over-color.html
fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml
fast/dom/webidl-operations-on-node-prototype.html
fast/canvas/canvas-blending-clipping.html
http/tests/canvas/webgl/origin-clean-conformance.html
fast/canvas/canvas-blending-gradient-over-color.html
http/tests/cookies/third-party-cookie-relaxing.html
fast/media/mq-color-index-02.html
fast/dom/gc-acid3.html
fast/media/w3c/test_media_queries.html
fast/dom/Document/CaretRangeFromPoint/hittest-relative-to-viewport.html
fast/canvas/canvas-blending-global-alpha.html
fast/canvas/webgl/tex-image-with-format-and-type.html
fast/media/mq-js-update-media.html
http/tests/misc/acid3.html
fast/canvas/webgl/arraybuffer-transfer-of-control.html
editing/pasteboard/paste-text-events.html
fast/regions/webkit-flow-into-parsing.html
fast/parser/fragment-parser.html
editing/execCommand/query-command-state.html
Comment 16 Build Bot 2014-01-07 23:57:36 PST
Created attachment 220600 [details]
Archive of layout-test-results from webkit-ews-02 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-02  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 17 Mark Lam 2014-01-13 18:43:26 PST
Review status updated in r161938: <http://trac.webkit.org/r161938>.