Bug 130638 - [Win64] ASM LLINT is not enabled.
Summary: [Win64] ASM LLINT is not enabled.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
: 124450 127827 (view as bug list)
Depends on:
Blocks: 130389
  Show dependency treegraph
 
Reported: 2014-03-22 05:31 PDT by peavo
Modified: 2014-07-22 10:54 PDT (History)
15 users (show)

See Also:


Attachments
Patch (42.32 KB, patch)
2014-03-22 05:56 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (42.42 KB, patch)
2014-03-22 06:19 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (42.42 KB, patch)
2014-03-22 10:42 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (46.46 KB, patch)
2014-03-25 11:49 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (46.59 KB, patch)
2014-04-11 15:45 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (46.58 KB, patch)
2014-04-11 16:18 PDT, peavo
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2 (652.71 KB, application/zip)
2014-04-11 17:34 PDT, Build Bot
no flags Details
Patch (43.42 KB, patch)
2014-04-19 03:17 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (43.51 KB, patch)
2014-04-22 09:51 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.42 KB, patch)
2014-05-07 05:30 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.43 KB, patch)
2014-05-15 10:57 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.47 KB, patch)
2014-05-26 07:32 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.31 KB, patch)
2014-05-26 13:29 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.75 KB, patch)
2014-05-30 14:05 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.68 KB, patch)
2014-06-20 13:56 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.50 KB, patch)
2014-06-23 14:10 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (41.09 KB, patch)
2014-06-24 09:31 PDT, peavo
no flags Details | Formatted Diff | Diff
Patch (39.55 KB, patch)
2014-06-25 08:50 PDT, peavo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description peavo 2014-03-22 05:31:12 PDT
It would be nice to have LLINT enabled on Win64, since it's now a requirement for JIT.
Comment 1 peavo 2014-03-22 05:56:20 PDT
Created attachment 227547 [details]
Patch
Comment 2 peavo 2014-03-22 06:19:02 PDT
Created attachment 227548 [details]
Patch
Comment 3 peavo 2014-03-22 06:34:06 PDT
With this patch I get these stress tests results with WinCairo 64-bit:

** The following JSC stress test failures have been introduced:
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases
        stress/ftl-arithcos.js.default
        stress/ftl-arithcos.js.no-llint
        stress/ftl-arithcos.js.always-trigger-copy-phase
        stress/ftl-arithcos.js.no-cjit-validate-phases
        stress/ftl-arithcos.js.dfg-eager
        stress/ftl-arithcos.js.dfg-eager-no-cjit-validate
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout
        jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit

Results for JSC stress tests:
    30 failures found.
Comment 4 peavo 2014-03-22 10:42:47 PDT
Created attachment 227553 [details]
Patch
Comment 5 Geoffrey Garen 2014-03-22 11:27:27 PDT
Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good.
Comment 6 peavo 2014-03-22 12:43:42 PDT
(In reply to comment #5)
> Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good.

Currently, Win64 is using the C loop backend, which also has some failures.
I will run a stress test with the C loop backend, and post those result for comparison.
Comment 7 peavo 2014-03-22 13:18:03 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good.
> 
> Currently, Win64 is using the C loop backend, which also has some failures.
> I will run a stress test with the C loop backend, and post those result for comparison.

These are the results when running with the C loop backend:

** The following JSC stress test failures have been introduced:
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases
        stress/ftl-arithcos.js.default
        stress/ftl-arithcos.js.no-llint
        stress/ftl-arithcos.js.always-trigger-copy-phase
        stress/ftl-arithcos.js.no-cjit-validate-phases
        stress/ftl-arithcos.js.dfg-eager
        stress/ftl-arithcos.js.dfg-eager-no-cjit-validate
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit

Results for JSC stress tests:
    26 failures found.
Comment 8 Geoffrey Garen 2014-03-22 13:41:34 PDT
So, it looks like we've got two issues:

(1) A bunch of tests are failing on Win64.

(2) If you want to enable the LLInt, you need to resolve a regression in function-apply-many-args.
Comment 9 Geoffrey Garen 2014-03-22 13:44:13 PDT
Filed a separate bug for (1): <https://bugs.webkit.org/show_bug.cgi?id=130646>.
Comment 10 peavo 2014-03-22 13:45:25 PDT
(In reply to comment #8)
> So, it looks like we've got two issues:
> 
> (1) A bunch of tests are failing on Win64.
> 
> (2) If you want to enable the LLInt, you need to resolve a regression in function-apply-many-args.

I will have a look at the regression :)
Comment 11 peavo 2014-03-25 11:49:03 PDT
Created attachment 227778 [details]
Patch
Comment 12 peavo 2014-03-25 11:54:07 PDT
(In reply to comment #11)
> Created an attachment (id=227778) [details]
> Patch

Included a fix for the regression in function-apply-many-args.
The reason for the failure was that we got a stack overflow exception.
This is fixed by increasing the default stack size of 1MB to 2MB,
in the project settings for jsc.exe.

Test results with latest patch:

** The following JSC stress test failures have been introduced:
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline
        mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases
        stress/ftl-arithcos.js.default
        stress/ftl-arithcos.js.no-llint
        stress/ftl-arithcos.js.always-trigger-copy-phase
        stress/ftl-arithcos.js.no-cjit-validate-phases
        stress/ftl-arithcos.js.dfg-eager
        stress/ftl-arithcos.js.dfg-eager-no-cjit-validate
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit

Results for JSC stress tests:
    26 failures found.
Comment 13 Brent Fulgham 2014-03-25 14:50:40 PDT
Does this bug supersede Bug 129807? Should I close the other bug, or land the patch on it?
Comment 14 peavo 2014-03-25 15:19:49 PDT
(In reply to comment #13)
> Does this bug supersede Bug 129807? Should I close the other bug, or land the patch on it?

Sorry for the late reply.
I actually think bug 129807 was landed.
I'm not getting JSC compile errors with or without this patch.
Comment 15 peavo 2014-03-25 15:26:12 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > Does this bug supersede Bug 129807? Should I close the other bug, or land the patch on it?
> 
> Sorry for the late reply.
> I actually think bug 129807 was landed.
> I'm not getting JSC compile errors with or without this patch.

I think it was landed in changeset 165296.
Comment 16 Brent Fulgham 2014-04-01 10:14:51 PDT
Could the JSC team review this please? I'd love to see this integrated.
Comment 17 Alex Christensen 2014-04-10 15:01:54 PDT
I just confirmed that there are only 22 failures with or without this patch.  Is there any reason why this patch should not be included as-is?

Here are the 22 failures:
stress/ftl-arithcos.js.default
stress/ftl-arithcos.js.no-llint
stress/ftl-arithcos.js.always-trigger-copy-phase
stress/ftl-arithcos.js.no-cjit-validate-phases
stress/ftl-arithcos.js.dfg-eager
stress/ftl-arithcos.js.dfg-eager-no-cjit-validate
jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout
jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/math.js.layout
jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout
jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout
jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit
Comment 18 Mark Lam 2014-04-10 17:52:05 PDT
Comment on attachment 227778 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=227778&action=review

> Source/JavaScriptCore/jit/Repatch.cpp:836
> -    stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
> +#if USE(JSVALUE64)
> +    uint32_t val = structure->id();
> +#else
> +    uint32_t val = reinterpret_cast<uint32_t>(structure->id());
> +#endif
> +    stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));

Can you please clarify why is this change needed?  Your ChangeLog mentioned a "compile fix".  Can you please provide some details?  Thanks.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387
> +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc).
> +# The size of the SlowPathReturnType is 16 bytes.
> +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value.
> +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right,
> +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument.
> +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1)
> +# since the return value is expected to be split between the two.
> +# See http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> +macro callSlowPathWin64(slowPath, arg1, arg2)
> +    move arg1, t1
> +    move arg2, t6
> +    subp 80, sp
> +    move sp, t2
> +    addp 48, t2
> +    call slowPath
> +    addp 80, sp
> +    move 8[t0], t1
> +    move [t0], t0
> +end

I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395
> -    cCall2(slowPath, cfr, PC)
> +    if X86_64_WIN
> +        callSlowPathWin64(slowPath, cfr, PC)
> +    else
> +        cCall2(slowPath, cfr, PC)
> +    end

Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN?

Same with all the uses of cCall2() below.
Comment 19 peavo 2014-04-11 15:45:01 PDT
Created attachment 229169 [details]
Patch
Comment 20 Mark Lam 2014-04-11 15:59:19 PDT
Comment on attachment 229169 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=229169&action=review

> Source/JavaScriptCore/jit/Repatch.cpp:1048
> -    stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
> +#if USE(JSVALUE64)
> +    uint32_t val = structure->id();
> +#else
> +    uint32_t val = reinterpret_cast<uint32_t>(structure->id());
> +#endif
> +    stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));

Did you explain why this is needed?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:209
> -    cCall2(_llint_throw_stack_overflow_error, vm, protoCallFrame)
> +    cCall2SlowPath(_llint_throw_stack_overflow_error, vm, protoCallFrame)

Not needed.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327
>      else
> -        addp 16, sp 
> +        if X86_64_WIN
> +            subp 32, sp
> +        else
> +            addp 16, sp
> +        end
>          call temp
> -        subp 16, sp
> +        if X86_64_WIN
> +            addp 32, sp
> +        else
> +            subp 16, sp
> +        end
>      end

This will be easier to read and understand if you express it as:

    elsif if X86_64_WIN
        subp 32, sp
        call temp
        addp 32, sp
    else
        addp 16, sp
        call temp
        subp 16, sp
    end

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:383
> +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc).
> +# The size of the SlowPathReturnType is 16 bytes.
> +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value.
> +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right,
> +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument.
> +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1)
> +# since the return value is expected to be split between the two.
> +# See http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> +macro callSlowPathWin64(slowPath, arg1, arg2)
> +    move arg1, t1
> +    move arg2, t6
> +    subp 80, sp
> +    move sp, t2
> +    addp 48, t2
> +    call slowPath
> +    addp 80, sp
> +    move 8[t0], t1
> +    move [t0], t0
> +end

You misunderstood me.  I meant for you to move this inside cCall2.  Is there any reason why that cannot be done?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:391
> +macro cCall2SlowPath(slowPath, arg1, arg2)
> +    if X86_64_WIN
> +        callSlowPathWin64(slowPath, arg1, arg2)
> +    else
> +        cCall2(slowPath, arg1, arg2)
> +    end
> +end

This is not needed.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395
> -    cCall2(slowPath, cfr, PC)
> +    cCall2SlowPath(slowPath, cfr, PC)

No need to change this.  Leave as calling cCall2.  Same with all similar cases below.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:415
> -    cCall2(slowPath, cfr, PC)
> +    cCall2SlowPath(slowPath, cfr, PC)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:422
> -    cCall2(_llint_slow_path_handle_watchdog_timer, cfr, PC)
> +    cCall2SlowPath(_llint_slow_path_handle_watchdog_timer, cfr, PC)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:434
> -            cCall2(_llint_loop_osr, cfr, PC)
> +            cCall2SlowPath(_llint_loop_osr, cfr, PC)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:555
> -    cCall2(slowPath, cfr, PC)   # This slowPath has the protocol: t0 = 0 => no error, t0 != 0 => error
> +    cCall2SlowPath(slowPath, cfr, PC)   # This slowPath has the protocol: t0 = 0 => no error, t0 != 0 => error

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:520
> -        cCall2(osrSlowPath, cfr, PC)
> +        cCall2SlowPath(osrSlowPath, cfr, PC)

Ditto.
Comment 21 peavo 2014-04-11 16:06:03 PDT
(In reply to comment #18)

Thanks for looking into this :)

> (From update of attachment 227778 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=227778&action=review
> 
> > Source/JavaScriptCore/jit/Repatch.cpp:836
> > -    stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
> > +#if USE(JSVALUE64)
> > +    uint32_t val = structure->id();
> > +#else
> > +    uint32_t val = reinterpret_cast<uint32_t>(structure->id());
> > +#endif
> > +    stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
> 
> Can you please clarify why is this change needed?  Your ChangeLog mentioned a "compile fix".  Can you please provide some details?  Thanks.

For some reason, MSVC does not accept a reinterpret_cast from uint32_t to uint32_t.

For example,

uint32_t p = 0;
reinterpret_cast<uint32_t>(p);

will generate the error

error C2440: 'reinterpret_cast' : cannot convert from 'uint32_t' to 'uint32_t'
Conversion is a valid standard conversion, which can be performed implicitly or by use of static_cast, C-style cast or function-style cast

> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387
> > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc).
> > +# The size of the SlowPathReturnType is 16 bytes.
> > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value.
> > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right,
> > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument.
> > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1)
> > +# since the return value is expected to be split between the two.
> > +# See http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> > +macro callSlowPathWin64(slowPath, arg1, arg2)
> > +    move arg1, t1
> > +    move arg2, t6
> > +    subp 80, sp
> > +    move sp, t2
> > +    addp 48, t2
> > +    call slowPath
> > +    addp 80, sp
> > +    move 8[t0], t1
> > +    move [t0], t0
> > +end
> 
> I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this.
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395
> > -    cCall2(slowPath, cfr, PC)
> > +    if X86_64_WIN
> > +        callSlowPathWin64(slowPath, cfr, PC)
> > +    else
> > +        cCall2(slowPath, cfr, PC)
> > +    end
> 
> Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN?
> 
> Same with all the uses of cCall2() below.

I think we need to have a different implementation when the function return
type is larger than 64-bit, which is the case for the slow path call.
Win64 ABI then says that the otherwise first argument register (rcx) should
hold a pointer to a stack allocated area for the return value.

See also http://msdn.microsoft.com/en-us/library/7572ztz4.aspx for further details.

I made a new macro cCall2SlowPath in the latest patch, though, so we can avoid
adding the condition everywhere.
Comment 22 peavo 2014-04-11 16:18:25 PDT
Created attachment 229176 [details]
Patch
Comment 23 peavo 2014-04-11 16:21:56 PDT
Thanks again for reviewing :)

I updated the patch accordingly, but kept the cCall2SlowPath macro,
since I believe it is necessary to have two different implementations,
depending on the return type size of the call.

Maybe there is a more elegant way around this?
Comment 24 Build Bot 2014-04-11 17:34:40 PDT
Comment on attachment 229176 [details]
Patch

Attachment 229176 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5928210986434560

New failing tests:
compositing/iframes/invisible-nested-iframe-hide.html
Comment 25 Build Bot 2014-04-11 17:34:44 PDT
Created attachment 229186 [details]
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-14  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 26 Brent Fulgham 2014-04-17 09:58:28 PDT
Comment on attachment 229176 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=229176&action=review

A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed.

> Source/JavaScriptCore/llint/LLIntData.cpp:129
> +#if CPU(X86_64) && !OS(WINDOWS) || CPU(ARM64) || ENABLE(LLINT_C_LOOP)

Are we short-circuiting here? Maybe we need parentheses around the combined X86_64 and !OS(WINDOWS) test? I think the meaning would be clearer.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:300
> +        # 4 parameter registers, and the call instruction won't put the return address at the correct stack location then.

This should read:
# Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2084
> +        loadp JSFunction::m_executable[arg2], t0

Why was this changed from t1 to t0? Do we need to add an 'arg3' case to the condition windows/non-windows code above?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2089
> +        call executableOffsetToFunction[t0]

Again, why is t1 being changed to t0?
Comment 27 peavo 2014-04-19 03:17:07 PDT
Created attachment 229733 [details]
Patch
Comment 28 peavo 2014-04-19 03:27:23 PDT
(In reply to comment #26)

Thanks for reviewing :)

And sorry for the late reply.

> (From update of attachment 229176 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=229176&action=review
> 
> A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed.

Yes, this is intentional.
t1 (rdx) is used to hold the 2. parameter on Windows, so it's important that is's not overwritten and used as a temp register.
Better use t0 as a temp register, which is available for use on all platforms, it seems.

> 
> > Source/JavaScriptCore/llint/LLIntData.cpp:129
> > +#if CPU(X86_64) && !OS(WINDOWS) || CPU(ARM64) || ENABLE(LLINT_C_LOOP)
> 
> Are we short-circuiting here? Maybe we need parentheses around the combined X86_64 and !OS(WINDOWS) test? I think the meaning would be clearer.

Fixed in latest patch.

> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:300
> > +        # 4 parameter registers, and the call instruction won't put the return address at the correct stack location then.
> 
> This should read:
> # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.
> 

Updated in latest patch.

> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2084
> > +        loadp JSFunction::m_executable[arg2], t0
> 
> Why was this changed from t1 to t0? Do we need to add an 'arg3' case to the condition windows/non-windows code above?

See above comment, avoid using t1 as temp register on Windows, as it is used to hold the 2. argument.

> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2089
> > +        call executableOffsetToFunction[t0]
> 
> Again, why is t1 being changed to t0?

See above comment.
Comment 29 peavo 2014-04-22 09:51:21 PDT
Created attachment 229891 [details]
Patch
Comment 30 peavo 2014-04-22 09:55:17 PDT
(In reply to comment #26)
> (From update of attachment 229176 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=229176&action=review
> 
> A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed.
> 

Added a register constant for this in latest patch, to avoid changing the register usage in nativeCallTrampoline on Mac.
Comment 31 peavo 2014-05-07 05:30:24 PDT
Created attachment 230994 [details]
Patch
Comment 32 peavo 2014-05-07 05:37:16 PDT
(In reply to comment #18)
> (From update of attachment 227778 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=227778&action=review
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387
> > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc).
> > +# The size of the SlowPathReturnType is 16 bytes.
> > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value.
> > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right,
> > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument.
> > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1)
> > +# since the return value is expected to be split between the two.
> > +# See http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> > +macro callSlowPathWin64(slowPath, arg1, arg2)
> > +    move arg1, t1
> > +    move arg2, t6
> > +    subp 80, sp
> > +    move sp, t2
> > +    addp 48, t2
> > +    call slowPath
> > +    addp 80, sp
> > +    move 8[t0], t1
> > +    move [t0], t0
> > +end
> 
> I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this.
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395
> > -    cCall2(slowPath, cfr, PC)
> > +    if X86_64_WIN
> > +        callSlowPathWin64(slowPath, cfr, PC)
> > +    else
> > +        cCall2(slowPath, cfr, PC)
> > +    end
> 
> Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN?
> 
> Same with all the uses of cCall2() below.

Mark, I added a Win64 case to cCall2() as you suggested.
Comment 33 peavo 2014-05-15 10:57:43 PDT
Created attachment 231527 [details]
Patch
Comment 34 peavo 2014-05-15 10:58:33 PDT
(In reply to comment #33)
> Created an attachment (id=231527) [details]
> Patch

Rebased patch.
Comment 35 Alex Christensen 2014-05-16 09:48:39 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=231527&action=review

> Source/JavaScriptCore/JavaScriptCore.vcxproj/jsc/jscCommon.props:19
> +      <StackReserveSize>2097152</StackReserveSize>

Won't increasing the stack reserve size also need to be done in WinLauncher, DumpRenderTree, and any other executable that uses JavaScriptCore?  jsc.exe is the only executable used to run the function-apply-many-args test, but the stack overflow problem will exist in any executable using JavaScriptCore if it runs code similar to the test.
Comment 36 peavo 2014-05-16 10:34:11 PDT
(In reply to comment #35)
> View in context: https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> 
> > Source/JavaScriptCore/JavaScriptCore.vcxproj/jsc/jscCommon.props:19
> > +      <StackReserveSize>2097152</StackReserveSize>
> 
> Won't increasing the stack reserve size also need to be done in WinLauncher, DumpRenderTree, and any other executable that uses JavaScriptCore?  jsc.exe is the only executable used to run the function-apply-many-args test, but the stack overflow problem will exist in any executable using JavaScriptCore if it runs code similar to the test.

Good point, we could do that, but I'm not sure it's required, as I assume it's OK for different platforms/executables to have different stack limits.
A script allocating past the stack limit will get a JS exception, and not a crash, I believe, which can be handled by the script.
Please correct me if I'm wrong :)
Comment 37 Alex Christensen 2014-05-20 10:31:52 PDT
I'm ok with this, then.  Someone who works more with JavaScriptCore will have to review it, though.
Comment 38 peavo 2014-05-21 09:11:00 PDT
(In reply to comment #37)
> I'm ok with this, then.  Someone who works more with JavaScriptCore will have to review it, though.

OK, thanks :)
Comment 39 Alex Christensen 2014-05-21 23:17:12 PDT
This patch still applies, compiles, runs, and only has the 22 failures I noted earlier, introducing no new failures.
Comment 40 Mark Lam 2014-05-23 16:52:59 PDT
Comment on attachment 231527 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231527&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:159
> +        // We also need to allocate space on the stack for the 4 parameter registers.

nit: s/space/the shadow space/ to match MSVC documentation which calls it "shadow space".

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:161
> +        // TODO: Should we also put the return address next to the frame pointer on the stack?
> +        // The call instruction will not put the return address at the expected place because of the 32 byte stack allocation.

I think we don't need this.  We only keep that convention so that tools can walk the stack.  In this case, it doesn't look like MSVC relies on that anyway since the frame pointer is optional.  Let's remove this comment.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162
> +        store64(X86Registers::ebp, Address(X86Registers::esp, -16));

Why do we need to preserve the ebp here?  According to http://msdn.microsoft.com/en-us/library/6t169e9c.aspx, ebp is non-volatile.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:163
> +        sub64(TrustedImm32(32), X86Registers::esp);

nit: can you use 4 * sizeof(int64_t) instead of 32?  I think it's more descriptive that way.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:168
> +        add64(TrustedImm32(32), X86Registers::esp);

This only accounts for the 4 shadow space slows for the args above.  Don't you have to also pop the 16 bytes for the ebp, and restore the ebp?  If we don't need to save the ebp above, then there's no issue here.

Also, I'd prefer it if you use 4 * sizeof(int64_t) instead of 32.

> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35
> +    ; Allocate space for all 4 parameter registers, and align stack pointer to 16 bytes.
> +    sub rsp, 40

I couldn't find any MSVC documentation that describes how the stack must be aligned.  I understand the first 32 bytes is for the arg registers shadow space, but Is the extra 8 byte padding here really appropriate and necessary? Did you encounter an issue without this padding?

> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:37
> +    add rsp, 40

ditto.

> Source/JavaScriptCore/jit/ThunkGenerators.cpp:404
> +    jit.subPtr(JSInterfaceJIT::TrustedImm32(32), JSInterfaceJIT::stackPointerRegister);

nit: In nativeForGenerator() above, you used 4 * sizeof(int64_t), which I thought is more descriptive than a literal 32 here.  Would you mind changing this to say 4 * sizeof(int64_t) instead?

> Source/JavaScriptCore/jit/ThunkGenerators.cpp:413
> +    jit.addPtr(JSInterfaceJIT::TrustedImm32(32), JSInterfaceJIT::stackPointerRegister);

ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:71
> +        subp 80, sp

32 (args shadow space) + 16 (return value) ==> 48.  Why 80?  What am I missing?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:73
> +        addp 48, t2

Why 48?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327
> +    elsif X86_64_WIN
> +        # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit.
> +        # See http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx where it's stated that rbp MAY be used as a frame pointer.
> +        # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space
> +        # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.
> +        move sp, t2
> +        storep cfr, [sp]
> +        call .copyReturnAddressToStack
> +.copyReturnAddressToStack:
> +        pop 8[sp]
> +        # Adjust the return address with the offset to the instruction after the call instruction.
> +        addp 16, 8[sp]

Apart from the "move sp, t2", why do we need to do this?   The rbp is a non-volatile register even if it's not used as a frame pointer.
Comment 41 peavo 2014-05-26 07:32:04 PDT
Created attachment 232079 [details]
Patch
Comment 42 peavo 2014-05-26 07:33:45 PDT
(In reply to comment #40)
> (From update of attachment 231527 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> 

Thanks for reviewing!

> > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162
> > +        store64(X86Registers::ebp, Address(X86Registers::esp, -16));
> 
> Why do we need to preserve the ebp here?  According to http://msdn.microsoft.com/en-us/library/6t169e9c.aspx, ebp is non-volatile.
> 

This is not done to preserve ebp.
As I've understood it, for a host function call, JIT relies on that the frame pointer is put on the stack after the return address.
The first parameter of a host function call (ExecState *), will then point to the value of the frame pointer.
Since MSVC does not maintain the frame pointer on 64-bit, we have to do this manually on Windows.
Without this, I get many crashes/failures running the stress tests.

> 
> > Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35
> > +    ; Allocate space for all 4 parameter registers, and align stack pointer to 16 bytes.
> > +    sub rsp, 40
> 
> I couldn't find any MSVC documentation that describes how the stack must be aligned.  I understand the first 32 bytes is for the arg registers shadow space, but Is the extra 8 byte padding here really appropriate and necessary? Did you encounter an issue without this padding?
> 

Yes, I was consistently getting a crash in the crt library without the extra stack alignment.
It was crashing on a floating point instruction, if I remember correctly.

>  
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:71
> > +        subp 80, sp
> 
> 32 (args shadow space) + 16 (return value) ==> 48.  Why 80?  What am I missing?
> 

Fixed in latest patch.

> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327
> > +    elsif X86_64_WIN
> > +        # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit.
> > +        # See http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx where it's stated that rbp MAY be used as a frame pointer.
> > +        # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space
> > +        # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.
> > +        move sp, t2
> > +        storep cfr, [sp]
> > +        call .copyReturnAddressToStack
> > +.copyReturnAddressToStack:
> > +        pop 8[sp]
> > +        # Adjust the return address with the offset to the instruction after the call instruction.
> > +        addp 16, 8[sp]
> 
> Apart from the "move sp, t2", why do we need to do this?   The rbp is a non-volatile register even if it's not used as a frame pointer.

I have removed the calculation of the return address, since it's not really needed, as you mentioned earlier.
Storing the frame pointer on the stack is needed, I believe, since JIT relies on that the frame pointer is put on the stack, and MSVC does not maintain the frame pointer.
The first parameter of the host function call (ExecState *), will then point to the value of the frame pointer.
The same is also done in the CLOOP case.
Comment 43 peavo 2014-05-26 13:29:15 PDT
Created attachment 232093 [details]
Patch
Comment 44 peavo 2014-05-26 13:33:30 PDT
(In reply to comment #42)
> (In reply to comment #40)
> > (From update of attachment 231527 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> > 
>  
> > > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162
> > > +        store64(X86Registers::ebp, Address(X86Registers::esp, -16));
> > 
> > Why do we need to preserve the ebp here?  According to http://msdn.microsoft.com/en-us/library/6t169e9c.aspx, ebp is non-volatile.
> > 
> 
> This is not done to preserve ebp.
> As I've understood it, for a host function call, JIT relies on that the frame pointer is put on the stack after the return address.
> The first parameter of a host function call (ExecState *), will then point to the value of the frame pointer.
> Since MSVC does not maintain the frame pointer on 64-bit, we have to do this manually on Windows.
> Without this, I get many crashes/failures running the stress tests.
> 

Fixed incorrect comment in latest patch.
It is the CallerFrame (frame pointer) that needs to be stored manually at the correct location on the stack, since we cannot rely on MSVC to do this.

> > 
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327
> > > +    elsif X86_64_WIN
> > > +        # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit.
> > > +        # See http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx where it's stated that rbp MAY be used as a frame pointer.
> > > +        # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space
> > > +        # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.
> > > +        move sp, t2
> > > +        storep cfr, [sp]
> > > +        call .copyReturnAddressToStack
> > > +.copyReturnAddressToStack:
> > > +        pop 8[sp]
> > > +        # Adjust the return address with the offset to the instruction after the call instruction.
> > > +        addp 16, 8[sp]
> > 
> > Apart from the "move sp, t2", why do we need to do this?   The rbp is a non-volatile register even if it's not used as a frame pointer.
> 
> I have removed the calculation of the return address, since it's not really needed, as you mentioned earlier.
> Storing the frame pointer on the stack is needed, I believe, since JIT relies on that the frame pointer is put on the stack, and MSVC does not maintain the frame pointer.
> The first parameter of the host function call (ExecState *), will then point to the value of the frame pointer.
> The same is also done in the CLOOP case.

Fixed incorrect comment in latest patch.
It is the CallerFrame (frame pointer) that needs to be stored manually at the correct location on the stack, since we cannot rely on MSVC to do this.
Comment 45 peavo 2014-05-30 14:05:28 PDT
Created attachment 232302 [details]
Patch
Comment 46 peavo 2014-05-30 14:09:39 PDT
(In reply to comment #45)
> Created an attachment (id=232302) [details]
> Patch

Fixed a problem where the CallerFrame stored on the stack will be overwritten if the called host function has 3 or more parameters.
This happens when the callee copies it's register parameters into it's 32 byte shadow space.
Comment 47 Alex Christensen 2014-06-05 17:02:59 PDT
*** Bug 127827 has been marked as a duplicate of this bug. ***
Comment 48 peavo 2014-06-20 13:56:56 PDT
Created attachment 233450 [details]
Patch
Comment 49 peavo 2014-06-20 13:58:42 PDT
(In reply to comment #48)
> Created an attachment (id=233450) [details]
> Patch

Merged patch with latest after conflicts.
Comment 50 Brent Fulgham 2014-06-23 11:58:46 PDT
JSC guys: Can we land this now?
Comment 51 Mark Lam 2014-06-23 12:06:48 PDT
(In reply to comment #50)
> JSC guys: Can we land this now?

I'm looking at it now.
Comment 52 Mark Lam 2014-06-23 13:12:53 PDT
Comment on attachment 233450 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=233450&action=review

There's a chance that this patch could work and that the errors in stack pointer adjustments only resulted in a some wasted slots (based on my educated guess of what is happening here).  However, I'd rather not approve this patch based on my guesses.  Please fix the math on all the stack pointer adjustments, or explain what I'm missing.  Thanks.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:157
> +        // JIT relies on that the CallerFrame (frame pointer) is put on the stack,

/is put on the stack,/being on the stack./

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166
> +        // We also need to allocate the shadow space on the stack for the 4 parameter registers.
> +        // Also, to avoid the potential problem that the callee will overwrite the stored frame pointer
> +        // when it copies it's parameter registers into the shadow space, we should allocate another 32 bytes.
> +        // Note: POKE_ARGUMENT_OFFSET is adjusted accordingly.
> +        sub64(TrustedImm32(8 * sizeof(int64_t)), X86Registers::esp);

I'm still not sure if I understand what you're trying to do here.  Here's what I see needs to be done:

1. Allocate space for the CallerFrame pointer and return address (though not populated) [2 slots]
2. Allocate space for the shadow space for the 4 parameter registers [4 slots]

Each of these slots are 64-bits.  Hence, the stack pointer is still 64-bit (16 byte) aligned after "pushing" the 6 slots.  It doesn't look like you there's any alignment padding needed.

So, I don't understand the following:
1. why are you allocating 8 slots?
2. You said that the callee may copy the parameter registers into the shadow space.  Isn't that what the 4 slots is for?

There are 2 slots that are unexplained and unaccounted for here.  Can you please explain what you're trying to achieve here (with the 2 slots)?  If I've misunderstood your reasoning in my comments above, please correct me.  Thanks.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:333
> +        # We need to allocate 48 bytes on the stack; 32 bytes for the shadow space, and 8 bytes for the frame pointer we put on the stack,
> +        # and 8 bytes for the function return address (which isn't really needed, so we only make room for it here).
> +        subp 48, sp

This is wrong.  It should be "subp 32, sp".

If you look at the "else" case below, you'll see that it adds 2 slots to the sp.  The reason for this is because the sp is already pointing to the base of a CallFrame, which starts with a CallerFrame pointer followed by a return address.  When we call the host function here, we're expecting the callee to push the return address followed by the CallerFrame pointer.  That is why we add 16 (i.e. 2 slots) to the sp.

In your case, the callee will not store the return address nor the CallerFrame pointer.  Hence, there's no need for the "addp 16, sp" adjustment.  It is sufficient to store the cfr in [sp] as you have done above.  Thereafter, you only need to "sub 32, sp" to make space for the shadow space.  There's no need for another 16 bytes there.  This appears to be the same logic issue with your other adjustments above.  Please correct me if I've misunderstood something here.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:335
> +        addp 48, sp

"addp 32, sp" for reasons stated above.
Comment 53 peavo 2014-06-23 14:10:14 PDT
Created attachment 233634 [details]
Patch
Comment 54 peavo 2014-06-23 14:14:10 PDT
(In reply to comment #52)
> (From update of attachment 233450 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=233450&action=review
> 

Thanks for reviewing :)

> There are 2 slots that are unexplained and unaccounted for here.  Can you please explain what you're trying to achieve here (with the 2 slots)?  If I've misunderstood your reasoning in my comments above, please correct me.  Thanks.
> 

I believe we have to allocate another 2 slots for two more function parameters, since the call can have up to 6 parameters.
Updated the comment in the patch.

> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:333
> > +        # We need to allocate 48 bytes on the stack; 32 bytes for the shadow space, and 8 bytes for the frame pointer we put on the stack,
> > +        # and 8 bytes for the function return address (which isn't really needed, so we only make room for it here).
> > +        subp 48, sp
> 
> This is wrong.  It should be "subp 32, sp".
> 

Thanks, fixed in latest patch.
Comment 55 Mark Lam 2014-06-23 16:16:55 PDT
Comment on attachment 233634 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=233634&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166
> +        // We also need to allocate the shadow space on the stack for the 4 parameter registers.
> +        // Also, we should allocate 16 bytes for the frame pointer, and return address (not populated).
> +        // In addition, we need to allocate 16 bytes for two more parameters, since the call can have up to 6 parameters.
> +        // Note: POKE_ARGUMENT_OFFSET is adjusted accordingly.
> +        sub64(TrustedImm32(8 * sizeof(int64_t)), X86Registers::esp);

I've thought thru this some more, and realized that this is actually wrong.  See my comment on POKE_ARGUMENT_OFFSET below for details.  In summary, the space for the parameter shadow space should have already been reserved in maxFrameExtentForSlowPathCall in assembler/MaxFrameExtentForSlowPathCall.h, and it appears to be already reserved there.  The calls to the various setupArguments() (defined in CCallHelpers.h) will poke arguments into the space that is reserved by maxFrameExtentForSlowPathCall before we store the CallerFrame pointer above.

The issue that we're really trying to solve here is that the callee is not expecting the CallerFrame pointer to be on stack.  Hence, after "pushing" the CallerFrame pointer, we need to replicate the arguments that are being passed to the callee.  For the first 4 arguments, this is a simple matter of reserving some space.  For the 5th and 6th argument, we need to copy them here or the callee will see junk instead.

However, we don't actually know how many arguments are being passed for this call.  Fortunately, we can just be conservative and reserve the worst case amount of space needed.  That said, we still need to conservatively copy the 5th and 6th arguments over as if they exists.  Please add that code.

> Source/JavaScriptCore/jit/CCallHelpers.h:786
> -#if CPU(MIPS) || (OS(WINDOWS) && CPU(X86_64))
> +#if CPU(MIPS)
>  #define POKE_ARGUMENT_OFFSET 4
> +#elif CPU(X86_64) && OS(WINDOWS)
> +#define POKE_ARGUMENT_OFFSET -4
>  #else

This is wrong.  Please undo.

At the moment when the various setupArgument()s are called, the sp should be pointing to a spot on the stack where we're ready to push the callee's return address and previous frame pointer:

     |  prev frame* goes here   |
     |---------------------|
     |  ret addr goes here        |
     |---------------------|
     |                                      |  <---- sp points here
     |         reserved space      |
     |         for max callee       |
     |         args & other          |
     |         stuff.                     |
     |                                      |       ^
     |---------------------|       | Stack grows in this direction.
     |    Current JS frame         |       |

In the Win64 case, this means POKE_ARGUMENT_OFFSET should be 4 to reserve 4 slots for the parameter shadow space.  When we "push" the 5th argument, it gets poke'd to sp + POKE_ARGUMENT_OFFSET i.e. sp[4] as expected.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:77
> +    elsif X86_64_WIN
> +        # Note: this implementation is only correct if the return type size is > 8 bytes.
> +        # See macro cCall2Void for an implementation when the return type <= 8 bytes.
> +        # On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value.
> +        # On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right,
> +        # rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument.
> +        # On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1)
> +        # since the return value is expected to be split between the two.
> +        # See http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> +        move arg1, t1
> +        move arg2, t6
> +        subp 48, sp
> +        move sp, t2
> +        addp 32, t2
> +        call function
> +        addp 48, sp
> +        move 8[t0], t1
> +        move [t0], t0

cCall2() is used to call slow path functions that may recurse and call JS functions again.  Doesn't that mean you also need to write the CallerFrame pointer to the stack here?

As for cCall2Void() below, a cursory scan of its callers tells me that we don't currently use it to call something that can recurse into JS.  But we don't know if we may in the future.  Do you think it needs to store the CallerFrame pointer there too?
Comment 56 peavo 2014-06-24 09:31:21 PDT
Created attachment 233709 [details]
Patch
Comment 57 peavo 2014-06-24 09:38:05 PDT
(In reply to comment #55)
> (From update of attachment 233634 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=233634&action=review
> 

Thanks again for reviewing :)

> However, we don't actually know how many arguments are being passed for this call.  Fortunately, we can just be conservative and reserve the worst case amount of space needed.  That said, we still need to conservatively copy the 5th and 6th arguments over as if they exists.  Please add that code.
> 
> > Source/JavaScriptCore/jit/CCallHelpers.h:786
> > -#if CPU(MIPS) || (OS(WINDOWS) && CPU(X86_64))
> > +#if CPU(MIPS)
> >  #define POKE_ARGUMENT_OFFSET 4
> > +#elif CPU(X86_64) && OS(WINDOWS)
> > +#define POKE_ARGUMENT_OFFSET -4
> >  #else
> 
> This is wrong.  Please undo.
> 

Thanks, updated patch accordingly.

I'm probably missing something, but isn't this roughly the same as setting POKE_ARGUMENT_OFFSET == - 4?
See attempt at drawing below :)


 Slot
     ----------------------
     |  ret addr           |     Put by call instruction.
     |---------------------|
 8   |  param 1            |
     |---------------------|          
 7   |  param 2            | 
     |---------------------|     Shadow space (slot 5-8)
 6   |  param 3            |
     |---------------------|
 5   |  param 4            |
     |---------------------|
 4   |  param 5            |     Arg 5 will be poke'd here when POKE_ARGUMENT_OFFSET == - 4
     |---------------------|
 3   |  param 6            |     Arg 6 will be poke'd here when POKE_ARGUMENT_OFFSET == - 4
     |---------------------|
 2   |  prev frame*        |
     |---------------------|
 1   |  ret addr (empty)   |
     |---------------------|
     |                     |  <---- sp points here
     |  param 1            |
     |---------------------|
     |  param 2            |
     |---------------------|
     |  param 3            |
     |---------------------|
     |  param 4            |
     |---------------------|
     |  param 5            |     Arg 5 will be poke'd here when POKE_ARGUMENT_OFFSET == 4 and copied to slot 4
     |---------------------|
     |  param 6            |     Arg 6 will be poke'd here when POKE_ARGUMENT_OFFSET == 4 and copied to slot 3
     |---------------------|  
     |    Current JS frame |  


> 
> cCall2() is used to call slow path functions that may recurse and call JS functions again.  Doesn't that mean you also need to write the CallerFrame pointer to the stack here?
> 
> As for cCall2Void() below, a cursory scan of its callers tells me that we don't currently use it to call something that can recurse into JS.  But we don't know if we may in the future.  Do you think it needs to store the CallerFrame pointer there too?

I don't think this is needed since the frame pointer will be passed to the function as the first parameter, e.g. cCall2(slowPath, cfr, PC).
In makeHostFunctionCall, on the other hand, we pass a pointer to the frame pointer (on the stack) as the first parameter:

move sp, arg1
storep cfr, [sp]
...
call ...

Please correct me if I'm wrong.
Comment 58 Mark Lam 2014-06-24 17:24:41 PDT
Comment on attachment 233709 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=233709&action=review

I would r+ this patch but first, I'd like you to try undoing the .opPutByIdSlow changes I suggested.  If they are not needed anymore, let's undo them.  Let me know if they are still or not.  Thanks.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166
> +        load64(Address(X86Registers::esp, 32), scratchRegister);
> +        store64(scratchRegister, Address(X86Registers::esp, -32));
> +
> +        load64(Address(X86Registers::esp, 40), scratchRegister);
> +        store64(scratchRegister, Address(X86Registers::esp, -24));

To answer your question about using a POKE_ARGUMENT_OFFSET of -4 to achieve this, I think you made a good point there.  I think it will work (and one might want to go with that solution if performance is an issue here).  However, it's not obvious from reading the code as to how things work since it differs from how all other ports expect it to work.  So, if you were to go with that solution, I would like to see a comment here to describe why the 5th and 6th arguments are already in place and no further work is needed.  I would also want a comment in the definition POKE_ARGUMENT_OFFSET to explain why it has a negative offset.  However, if performance is not an issue, then let's stick with this more obvious solution for now.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1366
> +
> +    .opPutByIdSlow:
> +        callSlowPath(_llint_slow_path_put_by_id)
> +        dispatch(9)
> +

The LLINT code has been re-written recently such that all the bytecode handlers are now local labels inside the same function.  As such, I think this change may no longer be needed (though I had suggested it back when the LLINT was implementing each bytecode handler with a separate global label).  Sorry for the hassle, but can you please try undoing this change and seeing if LLINT on Win64 builds and works without this change?

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:-1307
> -.opPutByIdSlow:
> -    callSlowPath(_llint_slow_path_put_by_id)
> -    dispatch(9)
> -

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1385
> -    additionalChecks(t1, t3, t2)
> +    additionalChecks(t1, t3, t2, .opPutByIdSlow)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1404
> +
> +    .opPutByIdSlow:
> +        callSlowPath(_llint_slow_path_put_by_id)
> +        dispatch(9)
> +

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1407
> -macro noAdditionalChecks(oldStructure, scratch, scratch2)
> +macro noAdditionalChecks(oldStructure, scratch, scratch2, slowPath)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1410
> -macro structureChainChecks(oldStructure, scratch, scratch2)
> +macro structureChainChecks(oldStructure, scratch, scratch2, slowPath)

Ditto.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1421
> -    bpneq oldStructure, [scratch], .opPutByIdSlow
> +    bpneq oldStructure, [scratch], slowPath

Ditto.
Comment 59 peavo 2014-06-25 08:50:22 PDT
Created attachment 233815 [details]
Patch
Comment 60 peavo 2014-06-25 08:56:17 PDT
(In reply to comment #58)
> (From update of attachment 233709 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=233709&action=review
>

Thanks for reviewing :)
 
> I would r+ this patch but first, I'd like you to try undoing the .opPutByIdSlow changes I suggested.  If they are not needed anymore, let's undo them.  Let me know if they are still or not.  Thanks.
> 

They are not needed anymore, undone in latest patch.

> > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166
> > +        load64(Address(X86Registers::esp, 32), scratchRegister);
> > +        store64(scratchRegister, Address(X86Registers::esp, -32));
> > +
> > +        load64(Address(X86Registers::esp, 40), scratchRegister);
> > +        store64(scratchRegister, Address(X86Registers::esp, -24));
> 
> To answer your question about using a POKE_ARGUMENT_OFFSET of -4 to achieve this, I think you made a good point there.  I think it will work (and one might want to go with that solution if performance is an issue here).  However, it's not obvious from reading the code as to how things work since it differs from how all other ports expect it to work.  So, if you were to go with that solution, I would like to see a comment here to describe why the 5th and 6th arguments are already in place and no further work is needed.  I would also want a comment in the definition POKE_ARGUMENT_OFFSET to explain why it has a negative offset.  However, if performance is not an issue, then let's stick with this more obvious solution for now.
> 
Sounds good, I added a comment regarding the copying of argument 5 and 6.
Comment 61 Mark Lam 2014-06-25 09:00:28 PDT
Comment on attachment 233815 [details]
Patch

r=me
Comment 62 WebKit Commit Bot 2014-06-25 09:37:46 PDT
Comment on attachment 233815 [details]
Patch

Clearing flags on attachment: 233815

Committed r170428: <http://trac.webkit.org/changeset/170428>
Comment 63 WebKit Commit Bot 2014-06-25 09:37:55 PDT
All reviewed patches have been landed.  Closing bug.
Comment 64 peavo 2014-06-25 09:56:09 PDT
(In reply to comment #61)
> (From update of attachment 233815 [details])
> r=me

Thank you!
Comment 65 Alex Christensen 2014-06-25 10:39:09 PDT
hooray!
Comment 66 peavo 2014-07-22 10:54:54 PDT
*** Bug 124450 has been marked as a duplicate of this bug. ***