WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
130638
[Win64] ASM LLINT is not enabled.
https://bugs.webkit.org/show_bug.cgi?id=130638
Summary
[Win64] ASM LLINT is not enabled.
peavo
Reported
2014-03-22 05:31:12 PDT
It would be nice to have LLINT enabled on Win64, since it's now a requirement for JIT.
Attachments
Patch
(42.32 KB, patch)
2014-03-22 05:56 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(42.42 KB, patch)
2014-03-22 06:19 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(42.42 KB, patch)
2014-03-22 10:42 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(46.46 KB, patch)
2014-03-25 11:49 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(46.59 KB, patch)
2014-04-11 15:45 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(46.58 KB, patch)
2014-04-11 16:18 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2
(652.71 KB, application/zip)
2014-04-11 17:34 PDT
,
Build Bot
no flags
Details
Patch
(43.42 KB, patch)
2014-04-19 03:17 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(43.51 KB, patch)
2014-04-22 09:51 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.42 KB, patch)
2014-05-07 05:30 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.43 KB, patch)
2014-05-15 10:57 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.47 KB, patch)
2014-05-26 07:32 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.31 KB, patch)
2014-05-26 13:29 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.75 KB, patch)
2014-05-30 14:05 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.68 KB, patch)
2014-06-20 13:56 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.50 KB, patch)
2014-06-23 14:10 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(41.09 KB, patch)
2014-06-24 09:31 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Patch
(39.55 KB, patch)
2014-06-25 08:50 PDT
,
peavo
no flags
Details
Formatted Diff
Diff
Show Obsolete
(16)
View All
Add attachment
proposed patch, testcase, etc.
peavo
Comment 1
2014-03-22 05:56:20 PDT
Created
attachment 227547
[details]
Patch
peavo
Comment 2
2014-03-22 06:19:02 PDT
Created
attachment 227548
[details]
Patch
peavo
Comment 3
2014-03-22 06:34:06 PDT
With this patch I get these stress tests results with WinCairo 64-bit: ** The following JSC stress test failures have been introduced: mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases stress/ftl-arithcos.js.default stress/ftl-arithcos.js.no-llint stress/ftl-arithcos.js.always-trigger-copy-phase stress/ftl-arithcos.js.no-cjit-validate-phases stress/ftl-arithcos.js.dfg-eager stress/ftl-arithcos.js.dfg-eager-no-cjit-validate jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/function-apply-many-args.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit Results for JSC stress tests: 30 failures found.
peavo
Comment 4
2014-03-22 10:42:47 PDT
Created
attachment 227553
[details]
Patch
Geoffrey Garen
Comment 5
2014-03-22 11:27:27 PDT
Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good.
peavo
Comment 6
2014-03-22 12:43:42 PDT
(In reply to
comment #5
)
> Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good.
Currently, Win64 is using the C loop backend, which also has some failures. I will run a stress test with the C loop backend, and post those result for comparison.
peavo
Comment 7
2014-03-22 13:18:03 PDT
(In reply to
comment #6
)
> (In reply to
comment #5
) > > Not sure how to interpret this review request. Are you saying that this patch introduces 30 failures? If so, that's not good. > > Currently, Win64 is using the C loop backend, which also has some failures. > I will run a stress test with the C loop backend, and post those result for comparison.
These are the results when running with the C loop backend: ** The following JSC stress test failures have been introduced: mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases stress/ftl-arithcos.js.default stress/ftl-arithcos.js.no-llint stress/ftl-arithcos.js.always-trigger-copy-phase stress/ftl-arithcos.js.no-cjit-validate-phases stress/ftl-arithcos.js.dfg-eager stress/ftl-arithcos.js.dfg-eager-no-cjit-validate jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit Results for JSC stress tests: 26 failures found.
Geoffrey Garen
Comment 8
2014-03-22 13:41:34 PDT
So, it looks like we've got two issues: (1) A bunch of tests are failing on Win64. (2) If you want to enable the LLInt, you need to resolve a regression in function-apply-many-args.
Geoffrey Garen
Comment 9
2014-03-22 13:44:13 PDT
Filed a separate bug for (1): <
https://bugs.webkit.org/show_bug.cgi?id=130646
>.
peavo
Comment 10
2014-03-22 13:45:25 PDT
(In reply to
comment #8
)
> So, it looks like we've got two issues: > > (1) A bunch of tests are failing on Win64. > > (2) If you want to enable the LLInt, you need to resolve a regression in function-apply-many-args.
I will have a look at the regression :)
peavo
Comment 11
2014-03-25 11:49:03 PDT
Created
attachment 227778
[details]
Patch
peavo
Comment 12
2014-03-25 11:54:07 PDT
(In reply to
comment #11
)
> Created an attachment (id=227778) [details] > Patch
Included a fix for the regression in function-apply-many-args. The reason for the failure was that we got a stack overflow exception. This is fixed by increasing the default stack size of 1MB to 2MB, in the project settings for jsc.exe. Test results with latest patch: ** The following JSC stress test failures have been introduced: mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-llint mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-baseline mozilla-tests.yaml/ecma_3/Date/15.9.5.6.js.mozilla-dfg-eager-no-cjit-validate-phases stress/ftl-arithcos.js.default stress/ftl-arithcos.js.no-llint stress/ftl-arithcos.js.always-trigger-copy-phase stress/ftl-arithcos.js.no-cjit-validate-phases stress/ftl-arithcos.js.dfg-eager stress/ftl-arithcos.js.dfg-eager-no-cjit-validate jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit Results for JSC stress tests: 26 failures found.
Brent Fulgham
Comment 13
2014-03-25 14:50:40 PDT
Does this bug supersede
Bug 129807
? Should I close the other bug, or land the patch on it?
peavo
Comment 14
2014-03-25 15:19:49 PDT
(In reply to
comment #13
)
> Does this bug supersede
Bug 129807
? Should I close the other bug, or land the patch on it?
Sorry for the late reply. I actually think
bug 129807
was landed. I'm not getting JSC compile errors with or without this patch.
peavo
Comment 15
2014-03-25 15:26:12 PDT
(In reply to
comment #14
)
> (In reply to
comment #13
) > > Does this bug supersede
Bug 129807
? Should I close the other bug, or land the patch on it? > > Sorry for the late reply. > I actually think
bug 129807
was landed. > I'm not getting JSC compile errors with or without this patch.
I think it was landed in changeset 165296.
Brent Fulgham
Comment 16
2014-04-01 10:14:51 PDT
Could the JSC team review this please? I'd love to see this integrated.
Alex Christensen
Comment 17
2014-04-10 15:01:54 PDT
I just confirmed that there are only 22 failures with or without this patch. Is there any reason why this patch should not be included as-is? Here are the 22 failures: stress/ftl-arithcos.js.default stress/ftl-arithcos.js.no-llint stress/ftl-arithcos.js.always-trigger-copy-phase stress/ftl-arithcos.js.no-cjit-validate-phases stress/ftl-arithcos.js.dfg-eager stress/ftl-arithcos.js.dfg-eager-no-cjit-validate jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/date-constructor.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/sort-stability.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-llint jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/typedarray-zero-size.js.layout-dfg-eager-no-cjit
Mark Lam
Comment 18
2014-04-10 17:52:05 PDT
Comment on
attachment 227778
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=227778&action=review
> Source/JavaScriptCore/jit/Repatch.cpp:836 > - stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset())); > +#if USE(JSVALUE64) > + uint32_t val = structure->id(); > +#else > + uint32_t val = reinterpret_cast<uint32_t>(structure->id()); > +#endif > + stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
Can you please clarify why is this change needed? Your ChangeLog mentioned a "compile fix". Can you please provide some details? Thanks.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387 > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc). > +# The size of the SlowPathReturnType is 16 bytes. > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value. > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right, > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument. > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1) > +# since the return value is expected to be split between the two. > +# See
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> +macro callSlowPathWin64(slowPath, arg1, arg2) > + move arg1, t1 > + move arg2, t6 > + subp 80, sp > + move sp, t2 > + addp 48, t2 > + call slowPath > + addp 80, sp > + move 8[t0], t1 > + move [t0], t0 > +end
I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395 > - cCall2(slowPath, cfr, PC) > + if X86_64_WIN > + callSlowPathWin64(slowPath, cfr, PC) > + else > + cCall2(slowPath, cfr, PC) > + end
Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN? Same with all the uses of cCall2() below.
peavo
Comment 19
2014-04-11 15:45:01 PDT
Created
attachment 229169
[details]
Patch
Mark Lam
Comment 20
2014-04-11 15:59:19 PDT
Comment on
attachment 229169
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=229169&action=review
> Source/JavaScriptCore/jit/Repatch.cpp:1048 > - stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset())); > +#if USE(JSVALUE64) > + uint32_t val = structure->id(); > +#else > + uint32_t val = reinterpret_cast<uint32_t>(structure->id()); > +#endif > + stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset()));
Did you explain why this is needed?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:209 > - cCall2(_llint_throw_stack_overflow_error, vm, protoCallFrame) > + cCall2SlowPath(_llint_throw_stack_overflow_error, vm, protoCallFrame)
Not needed.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327 > else > - addp 16, sp > + if X86_64_WIN > + subp 32, sp > + else > + addp 16, sp > + end > call temp > - subp 16, sp > + if X86_64_WIN > + addp 32, sp > + else > + subp 16, sp > + end > end
This will be easier to read and understand if you express it as: elsif if X86_64_WIN subp 32, sp call temp addp 32, sp else addp 16, sp call temp subp 16, sp end
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:383 > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc). > +# The size of the SlowPathReturnType is 16 bytes. > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value. > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right, > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument. > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1) > +# since the return value is expected to be split between the two. > +# See
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> +macro callSlowPathWin64(slowPath, arg1, arg2) > + move arg1, t1 > + move arg2, t6 > + subp 80, sp > + move sp, t2 > + addp 48, t2 > + call slowPath > + addp 80, sp > + move 8[t0], t1 > + move [t0], t0 > +end
You misunderstood me. I meant for you to move this inside cCall2. Is there any reason why that cannot be done?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:391 > +macro cCall2SlowPath(slowPath, arg1, arg2) > + if X86_64_WIN > + callSlowPathWin64(slowPath, arg1, arg2) > + else > + cCall2(slowPath, arg1, arg2) > + end > +end
This is not needed.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395 > - cCall2(slowPath, cfr, PC) > + cCall2SlowPath(slowPath, cfr, PC)
No need to change this. Leave as calling cCall2. Same with all similar cases below.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:415 > - cCall2(slowPath, cfr, PC) > + cCall2SlowPath(slowPath, cfr, PC)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:422 > - cCall2(_llint_slow_path_handle_watchdog_timer, cfr, PC) > + cCall2SlowPath(_llint_slow_path_handle_watchdog_timer, cfr, PC)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:434 > - cCall2(_llint_loop_osr, cfr, PC) > + cCall2SlowPath(_llint_loop_osr, cfr, PC)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:555 > - cCall2(slowPath, cfr, PC) # This slowPath has the protocol: t0 = 0 => no error, t0 != 0 => error > + cCall2SlowPath(slowPath, cfr, PC) # This slowPath has the protocol: t0 = 0 => no error, t0 != 0 => error
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:520 > - cCall2(osrSlowPath, cfr, PC) > + cCall2SlowPath(osrSlowPath, cfr, PC)
Ditto.
peavo
Comment 21
2014-04-11 16:06:03 PDT
(In reply to
comment #18
) Thanks for looking into this :)
> (From update of
attachment 227778
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=227778&action=review
> > > Source/JavaScriptCore/jit/Repatch.cpp:836 > > - stubJit.store32(MacroAssembler::TrustedImm32(reinterpret_cast<uint32_t>(structure->id())), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset())); > > +#if USE(JSVALUE64) > > + uint32_t val = structure->id(); > > +#else > > + uint32_t val = reinterpret_cast<uint32_t>(structure->id()); > > +#endif > > + stubJit.store32(MacroAssembler::TrustedImm32(val), MacroAssembler::Address(baseGPR, JSCell::structureIDOffset())); > > Can you please clarify why is this change needed? Your ChangeLog mentioned a "compile fix". Can you please provide some details? Thanks.
For some reason, MSVC does not accept a reinterpret_cast from uint32_t to uint32_t. For example, uint32_t p = 0; reinterpret_cast<uint32_t>(p); will generate the error error C2440: 'reinterpret_cast' : cannot convert from 'uint32_t' to 'uint32_t' Conversion is a valid standard conversion, which can be performed implicitly or by use of static_cast, C-style cast or function-style cast
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387 > > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc). > > +# The size of the SlowPathReturnType is 16 bytes. > > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value. > > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right, > > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument. > > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1) > > +# since the return value is expected to be split between the two. > > +# See
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> > +macro callSlowPathWin64(slowPath, arg1, arg2) > > + move arg1, t1 > > + move arg2, t6 > > + subp 80, sp > > + move sp, t2 > > + addp 48, t2 > > + call slowPath > > + addp 80, sp > > + move 8[t0], t1 > > + move [t0], t0 > > +end > > I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this. > > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395 > > - cCall2(slowPath, cfr, PC) > > + if X86_64_WIN > > + callSlowPathWin64(slowPath, cfr, PC) > > + else > > + cCall2(slowPath, cfr, PC) > > + end > > Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN? > > Same with all the uses of cCall2() below.
I think we need to have a different implementation when the function return type is larger than 64-bit, which is the case for the slow path call. Win64 ABI then says that the otherwise first argument register (rcx) should hold a pointer to a stack allocated area for the return value. See also
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
for further details. I made a new macro cCall2SlowPath in the latest patch, though, so we can avoid adding the condition everywhere.
peavo
Comment 22
2014-04-11 16:18:25 PDT
Created
attachment 229176
[details]
Patch
peavo
Comment 23
2014-04-11 16:21:56 PDT
Thanks again for reviewing :) I updated the patch accordingly, but kept the cCall2SlowPath macro, since I believe it is necessary to have two different implementations, depending on the return type size of the call. Maybe there is a more elegant way around this?
Build Bot
Comment 24
2014-04-11 17:34:40 PDT
Comment on
attachment 229176
[details]
Patch
Attachment 229176
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-queues.appspot.com/results/5928210986434560
New failing tests: compositing/iframes/invisible-nested-iframe-hide.html
Build Bot
Comment 25
2014-04-11 17:34:44 PDT
Created
attachment 229186
[details]
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-14 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Brent Fulgham
Comment 26
2014-04-17 09:58:28 PDT
Comment on
attachment 229176
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=229176&action=review
A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed.
> Source/JavaScriptCore/llint/LLIntData.cpp:129 > +#if CPU(X86_64) && !OS(WINDOWS) || CPU(ARM64) || ENABLE(LLINT_C_LOOP)
Are we short-circuiting here? Maybe we need parentheses around the combined X86_64 and !OS(WINDOWS) test? I think the meaning would be clearer.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:300 > + # 4 parameter registers, and the call instruction won't put the return address at the correct stack location then.
This should read: # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2084 > + loadp JSFunction::m_executable[arg2], t0
Why was this changed from t1 to t0? Do we need to add an 'arg3' case to the condition windows/non-windows code above?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2089 > + call executableOffsetToFunction[t0]
Again, why is t1 being changed to t0?
peavo
Comment 27
2014-04-19 03:17:07 PDT
Created
attachment 229733
[details]
Patch
peavo
Comment 28
2014-04-19 03:27:23 PDT
(In reply to
comment #26
) Thanks for reviewing :) And sorry for the late reply.
> (From update of
attachment 229176
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=229176&action=review
> > A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed.
Yes, this is intentional. t1 (rdx) is used to hold the 2. parameter on Windows, so it's important that is's not overwritten and used as a temp register. Better use t0 as a temp register, which is available for use on all platforms, it seems.
> > > Source/JavaScriptCore/llint/LLIntData.cpp:129 > > +#if CPU(X86_64) && !OS(WINDOWS) || CPU(ARM64) || ENABLE(LLINT_C_LOOP) > > Are we short-circuiting here? Maybe we need parentheses around the combined X86_64 and !OS(WINDOWS) test? I think the meaning would be clearer.
Fixed in latest patch.
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:300 > > + # 4 parameter registers, and the call instruction won't put the return address at the correct stack location then. > > This should read: > # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location. >
Updated in latest patch.
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2084 > > + loadp JSFunction::m_executable[arg2], t0 > > Why was this changed from t1 to t0? Do we need to add an 'arg3' case to the condition windows/non-windows code above?
See above comment, avoid using t1 as temp register on Windows, as it is used to hold the 2. argument.
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:2089 > > + call executableOffsetToFunction[t0] > > Again, why is t1 being changed to t0?
See above comment.
peavo
Comment 29
2014-04-22 09:51:21 PDT
Created
attachment 229891
[details]
Patch
peavo
Comment 30
2014-04-22 09:55:17 PDT
(In reply to
comment #26
)
> (From update of
attachment 229176
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=229176&action=review
> > A few minor corrections. Also, could you confirm that you really wanted to change the 'nativeCallTrampoline' method to use the t0 rather than t1? I don't see why that changed. >
Added a register constant for this in latest patch, to avoid changing the register usage in nativeCallTrampoline on Mac.
peavo
Comment 31
2014-05-07 05:30:24 PDT
Created
attachment 230994
[details]
Patch
peavo
Comment 32
2014-05-07 05:37:16 PDT
(In reply to
comment #18
)
> (From update of
attachment 227778
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=227778&action=review
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:387 > > +# The signature of a slow path call is extern "C" SlowPathReturnType slowpathfunction(ExecState* exec, Instruction* pc). > > +# The size of the SlowPathReturnType is 16 bytes. > > +# On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value. > > +# On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right, > > +# rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument. > > +# On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1) > > +# since the return value is expected to be split between the two. > > +# See
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> > +macro callSlowPathWin64(slowPath, arg1, arg2) > > + move arg1, t1 > > + move arg2, t6 > > + subp 80, sp > > + move sp, t2 > > + addp 48, t2 > > + call slowPath > > + addp 80, sp > > + move 8[t0], t1 > > + move [t0], t0 > > +end > > I suggest moving this into the cCall2() macro as a X86_64_WIN target instead of having a separate function like this. > > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:395 > > - cCall2(slowPath, cfr, PC) > > + if X86_64_WIN > > + callSlowPathWin64(slowPath, cfr, PC) > > + else > > + cCall2(slowPath, cfr, PC) > > + end > > Instead of adding this condition everywhere, is there any reason why you can't just add a case to cCall2() to handle X86_64_WIN? > > Same with all the uses of cCall2() below.
Mark, I added a Win64 case to cCall2() as you suggested.
peavo
Comment 33
2014-05-15 10:57:43 PDT
Created
attachment 231527
[details]
Patch
peavo
Comment 34
2014-05-15 10:58:33 PDT
(In reply to
comment #33
)
> Created an attachment (id=231527) [details] > Patch
Rebased patch.
Alex Christensen
Comment 35
2014-05-16 09:48:39 PDT
View in context:
https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> Source/JavaScriptCore/JavaScriptCore.vcxproj/jsc/jscCommon.props:19 > + <StackReserveSize>2097152</StackReserveSize>
Won't increasing the stack reserve size also need to be done in WinLauncher, DumpRenderTree, and any other executable that uses JavaScriptCore? jsc.exe is the only executable used to run the function-apply-many-args test, but the stack overflow problem will exist in any executable using JavaScriptCore if it runs code similar to the test.
peavo
Comment 36
2014-05-16 10:34:11 PDT
(In reply to
comment #35
)
> View in context:
https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> > > Source/JavaScriptCore/JavaScriptCore.vcxproj/jsc/jscCommon.props:19 > > + <StackReserveSize>2097152</StackReserveSize> > > Won't increasing the stack reserve size also need to be done in WinLauncher, DumpRenderTree, and any other executable that uses JavaScriptCore? jsc.exe is the only executable used to run the function-apply-many-args test, but the stack overflow problem will exist in any executable using JavaScriptCore if it runs code similar to the test.
Good point, we could do that, but I'm not sure it's required, as I assume it's OK for different platforms/executables to have different stack limits. A script allocating past the stack limit will get a JS exception, and not a crash, I believe, which can be handled by the script. Please correct me if I'm wrong :)
Alex Christensen
Comment 37
2014-05-20 10:31:52 PDT
I'm ok with this, then. Someone who works more with JavaScriptCore will have to review it, though.
peavo
Comment 38
2014-05-21 09:11:00 PDT
(In reply to
comment #37
)
> I'm ok with this, then. Someone who works more with JavaScriptCore will have to review it, though.
OK, thanks :)
Alex Christensen
Comment 39
2014-05-21 23:17:12 PDT
This patch still applies, compiles, runs, and only has the 22 failures I noted earlier, introducing no new failures.
Mark Lam
Comment 40
2014-05-23 16:52:59 PDT
Comment on
attachment 231527
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:159 > + // We also need to allocate space on the stack for the 4 parameter registers.
nit: s/space/the shadow space/ to match MSVC documentation which calls it "shadow space".
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:161 > + // TODO: Should we also put the return address next to the frame pointer on the stack? > + // The call instruction will not put the return address at the expected place because of the 32 byte stack allocation.
I think we don't need this. We only keep that convention so that tools can walk the stack. In this case, it doesn't look like MSVC relies on that anyway since the frame pointer is optional. Let's remove this comment.
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162 > + store64(X86Registers::ebp, Address(X86Registers::esp, -16));
Why do we need to preserve the ebp here? According to
http://msdn.microsoft.com/en-us/library/6t169e9c.aspx
, ebp is non-volatile.
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:163 > + sub64(TrustedImm32(32), X86Registers::esp);
nit: can you use 4 * sizeof(int64_t) instead of 32? I think it's more descriptive that way.
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:168 > + add64(TrustedImm32(32), X86Registers::esp);
This only accounts for the 4 shadow space slows for the args above. Don't you have to also pop the 16 bytes for the ebp, and restore the ebp? If we don't need to save the ebp above, then there's no issue here. Also, I'd prefer it if you use 4 * sizeof(int64_t) instead of 32.
> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35 > + ; Allocate space for all 4 parameter registers, and align stack pointer to 16 bytes. > + sub rsp, 40
I couldn't find any MSVC documentation that describes how the stack must be aligned. I understand the first 32 bytes is for the arg registers shadow space, but Is the extra 8 byte padding here really appropriate and necessary? Did you encounter an issue without this padding?
> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:37 > + add rsp, 40
ditto.
> Source/JavaScriptCore/jit/ThunkGenerators.cpp:404 > + jit.subPtr(JSInterfaceJIT::TrustedImm32(32), JSInterfaceJIT::stackPointerRegister);
nit: In nativeForGenerator() above, you used 4 * sizeof(int64_t), which I thought is more descriptive than a literal 32 here. Would you mind changing this to say 4 * sizeof(int64_t) instead?
> Source/JavaScriptCore/jit/ThunkGenerators.cpp:413 > + jit.addPtr(JSInterfaceJIT::TrustedImm32(32), JSInterfaceJIT::stackPointerRegister);
ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:71 > + subp 80, sp
32 (args shadow space) + 16 (return value) ==> 48. Why 80? What am I missing?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:73 > + addp 48, t2
Why 48?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327 > + elsif X86_64_WIN > + # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit. > + # See
http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
where it's stated that rbp MAY be used as a frame pointer. > + # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space > + # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location. > + move sp, t2 > + storep cfr, [sp] > + call .copyReturnAddressToStack > +.copyReturnAddressToStack: > + pop 8[sp] > + # Adjust the return address with the offset to the instruction after the call instruction. > + addp 16, 8[sp]
Apart from the "move sp, t2", why do we need to do this? The rbp is a non-volatile register even if it's not used as a frame pointer.
peavo
Comment 41
2014-05-26 07:32:04 PDT
Created
attachment 232079
[details]
Patch
peavo
Comment 42
2014-05-26 07:33:45 PDT
(In reply to
comment #40
)
> (From update of
attachment 231527
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=231527&action=review
>
Thanks for reviewing!
> > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162 > > + store64(X86Registers::ebp, Address(X86Registers::esp, -16)); > > Why do we need to preserve the ebp here? According to
http://msdn.microsoft.com/en-us/library/6t169e9c.aspx
, ebp is non-volatile. >
This is not done to preserve ebp. As I've understood it, for a host function call, JIT relies on that the frame pointer is put on the stack after the return address. The first parameter of a host function call (ExecState *), will then point to the value of the frame pointer. Since MSVC does not maintain the frame pointer on 64-bit, we have to do this manually on Windows. Without this, I get many crashes/failures running the stress tests.
> > > Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35 > > + ; Allocate space for all 4 parameter registers, and align stack pointer to 16 bytes. > > + sub rsp, 40 > > I couldn't find any MSVC documentation that describes how the stack must be aligned. I understand the first 32 bytes is for the arg registers shadow space, but Is the extra 8 byte padding here really appropriate and necessary? Did you encounter an issue without this padding? >
Yes, I was consistently getting a crash in the crt library without the extra stack alignment. It was crashing on a floating point instruction, if I remember correctly.
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:71 > > + subp 80, sp > > 32 (args shadow space) + 16 (return value) ==> 48. Why 80? What am I missing? >
Fixed in latest patch.
> > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327 > > + elsif X86_64_WIN > > + # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit. > > + # See
http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
where it's stated that rbp MAY be used as a frame pointer. > > + # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space > > + # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location. > > + move sp, t2 > > + storep cfr, [sp] > > + call .copyReturnAddressToStack > > +.copyReturnAddressToStack: > > + pop 8[sp] > > + # Adjust the return address with the offset to the instruction after the call instruction. > > + addp 16, 8[sp] > > Apart from the "move sp, t2", why do we need to do this? The rbp is a non-volatile register even if it's not used as a frame pointer.
I have removed the calculation of the return address, since it's not really needed, as you mentioned earlier. Storing the frame pointer on the stack is needed, I believe, since JIT relies on that the frame pointer is put on the stack, and MSVC does not maintain the frame pointer. The first parameter of the host function call (ExecState *), will then point to the value of the frame pointer. The same is also done in the CLOOP case.
peavo
Comment 43
2014-05-26 13:29:15 PDT
Created
attachment 232093
[details]
Patch
peavo
Comment 44
2014-05-26 13:33:30 PDT
(In reply to
comment #42
)
> (In reply to
comment #40
) > > (From update of
attachment 231527
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=231527&action=review
> > > > > > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:162 > > > + store64(X86Registers::ebp, Address(X86Registers::esp, -16)); > > > > Why do we need to preserve the ebp here? According to
http://msdn.microsoft.com/en-us/library/6t169e9c.aspx
, ebp is non-volatile. > > > > This is not done to preserve ebp. > As I've understood it, for a host function call, JIT relies on that the frame pointer is put on the stack after the return address. > The first parameter of a host function call (ExecState *), will then point to the value of the frame pointer. > Since MSVC does not maintain the frame pointer on 64-bit, we have to do this manually on Windows. > Without this, I get many crashes/failures running the stress tests. >
Fixed incorrect comment in latest patch. It is the CallerFrame (frame pointer) that needs to be stored manually at the correct location on the stack, since we cannot rely on MSVC to do this.
> > > > > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:327 > > > + elsif X86_64_WIN > > > + # On Win64 we need to manually copy the frame pointer to the stack, since MSVC may not maintain a frame pointer on 64-bit. > > > + # See
http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
where it's stated that rbp MAY be used as a frame pointer. > > > + # Also, we need to manually copy the return address to the stack, since before the call instruction we allocated space > > > + # for the 4 parameter registers, and the call instruction won't put the return address at the correct stack location. > > > + move sp, t2 > > > + storep cfr, [sp] > > > + call .copyReturnAddressToStack > > > +.copyReturnAddressToStack: > > > + pop 8[sp] > > > + # Adjust the return address with the offset to the instruction after the call instruction. > > > + addp 16, 8[sp] > > > > Apart from the "move sp, t2", why do we need to do this? The rbp is a non-volatile register even if it's not used as a frame pointer. > > I have removed the calculation of the return address, since it's not really needed, as you mentioned earlier. > Storing the frame pointer on the stack is needed, I believe, since JIT relies on that the frame pointer is put on the stack, and MSVC does not maintain the frame pointer. > The first parameter of the host function call (ExecState *), will then point to the value of the frame pointer. > The same is also done in the CLOOP case.
Fixed incorrect comment in latest patch. It is the CallerFrame (frame pointer) that needs to be stored manually at the correct location on the stack, since we cannot rely on MSVC to do this.
peavo
Comment 45
2014-05-30 14:05:28 PDT
Created
attachment 232302
[details]
Patch
peavo
Comment 46
2014-05-30 14:09:39 PDT
(In reply to
comment #45
)
> Created an attachment (id=232302) [details] > Patch
Fixed a problem where the CallerFrame stored on the stack will be overwritten if the called host function has 3 or more parameters. This happens when the callee copies it's register parameters into it's 32 byte shadow space.
Alex Christensen
Comment 47
2014-06-05 17:02:59 PDT
***
Bug 127827
has been marked as a duplicate of this bug. ***
peavo
Comment 48
2014-06-20 13:56:56 PDT
Created
attachment 233450
[details]
Patch
peavo
Comment 49
2014-06-20 13:58:42 PDT
(In reply to
comment #48
)
> Created an attachment (id=233450) [details] > Patch
Merged patch with latest after conflicts.
Brent Fulgham
Comment 50
2014-06-23 11:58:46 PDT
JSC guys: Can we land this now?
Mark Lam
Comment 51
2014-06-23 12:06:48 PDT
(In reply to
comment #50
)
> JSC guys: Can we land this now?
I'm looking at it now.
Mark Lam
Comment 52
2014-06-23 13:12:53 PDT
Comment on
attachment 233450
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=233450&action=review
There's a chance that this patch could work and that the errors in stack pointer adjustments only resulted in a some wasted slots (based on my educated guess of what is happening here). However, I'd rather not approve this patch based on my guesses. Please fix the math on all the stack pointer adjustments, or explain what I'm missing. Thanks.
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:157 > + // JIT relies on that the CallerFrame (frame pointer) is put on the stack,
/is put on the stack,/being on the stack./
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166 > + // We also need to allocate the shadow space on the stack for the 4 parameter registers. > + // Also, to avoid the potential problem that the callee will overwrite the stored frame pointer > + // when it copies it's parameter registers into the shadow space, we should allocate another 32 bytes. > + // Note: POKE_ARGUMENT_OFFSET is adjusted accordingly. > + sub64(TrustedImm32(8 * sizeof(int64_t)), X86Registers::esp);
I'm still not sure if I understand what you're trying to do here. Here's what I see needs to be done: 1. Allocate space for the CallerFrame pointer and return address (though not populated) [2 slots] 2. Allocate space for the shadow space for the 4 parameter registers [4 slots] Each of these slots are 64-bits. Hence, the stack pointer is still 64-bit (16 byte) aligned after "pushing" the 6 slots. It doesn't look like you there's any alignment padding needed. So, I don't understand the following: 1. why are you allocating 8 slots? 2. You said that the callee may copy the parameter registers into the shadow space. Isn't that what the 4 slots is for? There are 2 slots that are unexplained and unaccounted for here. Can you please explain what you're trying to achieve here (with the 2 slots)? If I've misunderstood your reasoning in my comments above, please correct me. Thanks.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:333 > + # We need to allocate 48 bytes on the stack; 32 bytes for the shadow space, and 8 bytes for the frame pointer we put on the stack, > + # and 8 bytes for the function return address (which isn't really needed, so we only make room for it here). > + subp 48, sp
This is wrong. It should be "subp 32, sp". If you look at the "else" case below, you'll see that it adds 2 slots to the sp. The reason for this is because the sp is already pointing to the base of a CallFrame, which starts with a CallerFrame pointer followed by a return address. When we call the host function here, we're expecting the callee to push the return address followed by the CallerFrame pointer. That is why we add 16 (i.e. 2 slots) to the sp. In your case, the callee will not store the return address nor the CallerFrame pointer. Hence, there's no need for the "addp 16, sp" adjustment. It is sufficient to store the cfr in [sp] as you have done above. Thereafter, you only need to "sub 32, sp" to make space for the shadow space. There's no need for another 16 bytes there. This appears to be the same logic issue with your other adjustments above. Please correct me if I've misunderstood something here.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:335 > + addp 48, sp
"addp 32, sp" for reasons stated above.
peavo
Comment 53
2014-06-23 14:10:14 PDT
Created
attachment 233634
[details]
Patch
peavo
Comment 54
2014-06-23 14:14:10 PDT
(In reply to
comment #52
)
> (From update of
attachment 233450
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=233450&action=review
>
Thanks for reviewing :)
> There are 2 slots that are unexplained and unaccounted for here. Can you please explain what you're trying to achieve here (with the 2 slots)? If I've misunderstood your reasoning in my comments above, please correct me. Thanks. >
I believe we have to allocate another 2 slots for two more function parameters, since the call can have up to 6 parameters. Updated the comment in the patch.
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:333 > > + # We need to allocate 48 bytes on the stack; 32 bytes for the shadow space, and 8 bytes for the frame pointer we put on the stack, > > + # and 8 bytes for the function return address (which isn't really needed, so we only make room for it here). > > + subp 48, sp > > This is wrong. It should be "subp 32, sp". >
Thanks, fixed in latest patch.
Mark Lam
Comment 55
2014-06-23 16:16:55 PDT
Comment on
attachment 233634
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=233634&action=review
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166 > + // We also need to allocate the shadow space on the stack for the 4 parameter registers. > + // Also, we should allocate 16 bytes for the frame pointer, and return address (not populated). > + // In addition, we need to allocate 16 bytes for two more parameters, since the call can have up to 6 parameters. > + // Note: POKE_ARGUMENT_OFFSET is adjusted accordingly. > + sub64(TrustedImm32(8 * sizeof(int64_t)), X86Registers::esp);
I've thought thru this some more, and realized that this is actually wrong. See my comment on POKE_ARGUMENT_OFFSET below for details. In summary, the space for the parameter shadow space should have already been reserved in maxFrameExtentForSlowPathCall in assembler/MaxFrameExtentForSlowPathCall.h, and it appears to be already reserved there. The calls to the various setupArguments() (defined in CCallHelpers.h) will poke arguments into the space that is reserved by maxFrameExtentForSlowPathCall before we store the CallerFrame pointer above. The issue that we're really trying to solve here is that the callee is not expecting the CallerFrame pointer to be on stack. Hence, after "pushing" the CallerFrame pointer, we need to replicate the arguments that are being passed to the callee. For the first 4 arguments, this is a simple matter of reserving some space. For the 5th and 6th argument, we need to copy them here or the callee will see junk instead. However, we don't actually know how many arguments are being passed for this call. Fortunately, we can just be conservative and reserve the worst case amount of space needed. That said, we still need to conservatively copy the 5th and 6th arguments over as if they exists. Please add that code.
> Source/JavaScriptCore/jit/CCallHelpers.h:786 > -#if CPU(MIPS) || (OS(WINDOWS) && CPU(X86_64)) > +#if CPU(MIPS) > #define POKE_ARGUMENT_OFFSET 4 > +#elif CPU(X86_64) && OS(WINDOWS) > +#define POKE_ARGUMENT_OFFSET -4 > #else
This is wrong. Please undo. At the moment when the various setupArgument()s are called, the sp should be pointing to a spot on the stack where we're ready to push the callee's return address and previous frame pointer: | prev frame* goes here | |---------------------| | ret addr goes here | |---------------------| | | <---- sp points here | reserved space | | for max callee | | args & other | | stuff. | | | ^ |---------------------| | Stack grows in this direction. | Current JS frame | | In the Win64 case, this means POKE_ARGUMENT_OFFSET should be 4 to reserve 4 slots for the parameter shadow space. When we "push" the 5th argument, it gets poke'd to sp + POKE_ARGUMENT_OFFSET i.e. sp[4] as expected.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:77 > + elsif X86_64_WIN > + # Note: this implementation is only correct if the return type size is > 8 bytes. > + # See macro cCall2Void for an implementation when the return type <= 8 bytes. > + # On Win64, when the return type is larger than 8 bytes, we need to allocate space on the stack for the return value. > + # On entry rcx (t2), should contain a pointer to this stack space. The other parameters are shifted to the right, > + # rdx (t1) should contain the first argument, and r8 (t6) should contain the second argument. > + # On return, rax contains a pointer to this stack value, and we then need to copy the 16 byte return value into rax (t0) and rdx (t1) > + # since the return value is expected to be split between the two. > + # See
http://msdn.microsoft.com/en-us/library/7572ztz4.aspx
> + move arg1, t1 > + move arg2, t6 > + subp 48, sp > + move sp, t2 > + addp 32, t2 > + call function > + addp 48, sp > + move 8[t0], t1 > + move [t0], t0
cCall2() is used to call slow path functions that may recurse and call JS functions again. Doesn't that mean you also need to write the CallerFrame pointer to the stack here? As for cCall2Void() below, a cursory scan of its callers tells me that we don't currently use it to call something that can recurse into JS. But we don't know if we may in the future. Do you think it needs to store the CallerFrame pointer there too?
peavo
Comment 56
2014-06-24 09:31:21 PDT
Created
attachment 233709
[details]
Patch
peavo
Comment 57
2014-06-24 09:38:05 PDT
(In reply to
comment #55
)
> (From update of
attachment 233634
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=233634&action=review
>
Thanks again for reviewing :)
> However, we don't actually know how many arguments are being passed for this call. Fortunately, we can just be conservative and reserve the worst case amount of space needed. That said, we still need to conservatively copy the 5th and 6th arguments over as if they exists. Please add that code. > > > Source/JavaScriptCore/jit/CCallHelpers.h:786 > > -#if CPU(MIPS) || (OS(WINDOWS) && CPU(X86_64)) > > +#if CPU(MIPS) > > #define POKE_ARGUMENT_OFFSET 4 > > +#elif CPU(X86_64) && OS(WINDOWS) > > +#define POKE_ARGUMENT_OFFSET -4 > > #else > > This is wrong. Please undo. >
Thanks, updated patch accordingly. I'm probably missing something, but isn't this roughly the same as setting POKE_ARGUMENT_OFFSET == - 4? See attempt at drawing below :) Slot ---------------------- | ret addr | Put by call instruction. |---------------------| 8 | param 1 | |---------------------| 7 | param 2 | |---------------------| Shadow space (slot 5-8) 6 | param 3 | |---------------------| 5 | param 4 | |---------------------| 4 | param 5 | Arg 5 will be poke'd here when POKE_ARGUMENT_OFFSET == - 4 |---------------------| 3 | param 6 | Arg 6 will be poke'd here when POKE_ARGUMENT_OFFSET == - 4 |---------------------| 2 | prev frame* | |---------------------| 1 | ret addr (empty) | |---------------------| | | <---- sp points here | param 1 | |---------------------| | param 2 | |---------------------| | param 3 | |---------------------| | param 4 | |---------------------| | param 5 | Arg 5 will be poke'd here when POKE_ARGUMENT_OFFSET == 4 and copied to slot 4 |---------------------| | param 6 | Arg 6 will be poke'd here when POKE_ARGUMENT_OFFSET == 4 and copied to slot 3 |---------------------| | Current JS frame |
> > cCall2() is used to call slow path functions that may recurse and call JS functions again. Doesn't that mean you also need to write the CallerFrame pointer to the stack here? > > As for cCall2Void() below, a cursory scan of its callers tells me that we don't currently use it to call something that can recurse into JS. But we don't know if we may in the future. Do you think it needs to store the CallerFrame pointer there too?
I don't think this is needed since the frame pointer will be passed to the function as the first parameter, e.g. cCall2(slowPath, cfr, PC). In makeHostFunctionCall, on the other hand, we pass a pointer to the frame pointer (on the stack) as the first parameter: move sp, arg1 storep cfr, [sp] ... call ... Please correct me if I'm wrong.
Mark Lam
Comment 58
2014-06-24 17:24:41 PDT
Comment on
attachment 233709
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=233709&action=review
I would r+ this patch but first, I'd like you to try undoing the .opPutByIdSlow changes I suggested. If they are not needed anymore, let's undo them. Let me know if they are still or not. Thanks.
> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166 > + load64(Address(X86Registers::esp, 32), scratchRegister); > + store64(scratchRegister, Address(X86Registers::esp, -32)); > + > + load64(Address(X86Registers::esp, 40), scratchRegister); > + store64(scratchRegister, Address(X86Registers::esp, -24));
To answer your question about using a POKE_ARGUMENT_OFFSET of -4 to achieve this, I think you made a good point there. I think it will work (and one might want to go with that solution if performance is an issue here). However, it's not obvious from reading the code as to how things work since it differs from how all other ports expect it to work. So, if you were to go with that solution, I would like to see a comment here to describe why the 5th and 6th arguments are already in place and no further work is needed. I would also want a comment in the definition POKE_ARGUMENT_OFFSET to explain why it has a negative offset. However, if performance is not an issue, then let's stick with this more obvious solution for now.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1366 > + > + .opPutByIdSlow: > + callSlowPath(_llint_slow_path_put_by_id) > + dispatch(9) > +
The LLINT code has been re-written recently such that all the bytecode handlers are now local labels inside the same function. As such, I think this change may no longer be needed (though I had suggested it back when the LLINT was implementing each bytecode handler with a separate global label). Sorry for the hassle, but can you please try undoing this change and seeing if LLINT on Win64 builds and works without this change?
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:-1307 > -.opPutByIdSlow: > - callSlowPath(_llint_slow_path_put_by_id) > - dispatch(9) > -
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1385 > - additionalChecks(t1, t3, t2) > + additionalChecks(t1, t3, t2, .opPutByIdSlow)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1404 > + > + .opPutByIdSlow: > + callSlowPath(_llint_slow_path_put_by_id) > + dispatch(9) > +
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1407 > -macro noAdditionalChecks(oldStructure, scratch, scratch2) > +macro noAdditionalChecks(oldStructure, scratch, scratch2, slowPath)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1410 > -macro structureChainChecks(oldStructure, scratch, scratch2) > +macro structureChainChecks(oldStructure, scratch, scratch2, slowPath)
Ditto.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1421 > - bpneq oldStructure, [scratch], .opPutByIdSlow > + bpneq oldStructure, [scratch], slowPath
Ditto.
peavo
Comment 59
2014-06-25 08:50:22 PDT
Created
attachment 233815
[details]
Patch
peavo
Comment 60
2014-06-25 08:56:17 PDT
(In reply to
comment #58
)
> (From update of
attachment 233709
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=233709&action=review
> Thanks for reviewing :)
> I would r+ this patch but first, I'd like you to try undoing the .opPutByIdSlow changes I suggested. If they are not needed anymore, let's undo them. Let me know if they are still or not. Thanks. >
They are not needed anymore, undone in latest patch.
> > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:166 > > + load64(Address(X86Registers::esp, 32), scratchRegister); > > + store64(scratchRegister, Address(X86Registers::esp, -32)); > > + > > + load64(Address(X86Registers::esp, 40), scratchRegister); > > + store64(scratchRegister, Address(X86Registers::esp, -24)); > > To answer your question about using a POKE_ARGUMENT_OFFSET of -4 to achieve this, I think you made a good point there. I think it will work (and one might want to go with that solution if performance is an issue here). However, it's not obvious from reading the code as to how things work since it differs from how all other ports expect it to work. So, if you were to go with that solution, I would like to see a comment here to describe why the 5th and 6th arguments are already in place and no further work is needed. I would also want a comment in the definition POKE_ARGUMENT_OFFSET to explain why it has a negative offset. However, if performance is not an issue, then let's stick with this more obvious solution for now. >
Sounds good, I added a comment regarding the copying of argument 5 and 6.
Mark Lam
Comment 61
2014-06-25 09:00:28 PDT
Comment on
attachment 233815
[details]
Patch r=me
WebKit Commit Bot
Comment 62
2014-06-25 09:37:46 PDT
Comment on
attachment 233815
[details]
Patch Clearing flags on attachment: 233815 Committed
r170428
: <
http://trac.webkit.org/changeset/170428
>
WebKit Commit Bot
Comment 63
2014-06-25 09:37:55 PDT
All reviewed patches have been landed. Closing bug.
peavo
Comment 64
2014-06-25 09:56:09 PDT
(In reply to
comment #61
)
> (From update of
attachment 233815
[details]
) > r=me
Thank you!
Alex Christensen
Comment 65
2014-06-25 10:39:09 PDT
hooray!
peavo
Comment 66
2014-07-22 10:54:54 PDT
***
Bug 124450
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug