Bug 145366

Summary: FTL is not working on Windows.
Product: WebKit Reporter: peavo
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: achristensen, basile_clement, beidson, benjamin, bfulgham, cmarcelo, commit-queue, fpizlo, ggaren, gyuyoung.kim, Hironori.Fujii, ian.grunert, mark.lam, msaboff, oliver, ossy, sanxiyn
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
beidson: review-
Patch none

Description peavo 2015-05-25 02:25:14 PDT
The purpose of this patch is to get FTL up and running on Windows.
Comment 1 peavo 2015-05-25 03:27:44 PDT
Created attachment 253682 [details]
Patch
Comment 2 peavo 2015-05-25 04:33:40 PDT
Created attachment 253683 [details]
Patch
Comment 3 Filip Pizlo 2015-05-26 08:16:12 PDT
Comment on attachment 253683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=253683&action=review

> Source/JavaScriptCore/ChangeLog:29
> +        (JSC::FTL::canCompile): llvm does not support ArithPow on Windows.

This is the wrong approach. You're preventing the compilation of an entire function because of an intrinsic. Instead you should implement ArithPow on Windows using a function call to a helper.
Comment 4 peavo 2015-05-26 10:14:31 PDT
(In reply to comment #3)
> Comment on attachment 253683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=253683&action=review
> 
> > Source/JavaScriptCore/ChangeLog:29
> > +        (JSC::FTL::canCompile): llvm does not support ArithPow on Windows.
> 
> This is the wrong approach. You're preventing the compilation of an entire
> function because of an intrinsic. Instead you should implement ArithPow on
> Windows using a function call to a helper.

Ok, thanks for reviewing :) I will look into this.
Comment 5 Filip Pizlo 2015-05-26 10:25:14 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > Comment on attachment 253683 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=253683&action=review
> > 
> > > Source/JavaScriptCore/ChangeLog:29
> > > +        (JSC::FTL::canCompile): llvm does not support ArithPow on Windows.
> > 
> > This is the wrong approach. You're preventing the compilation of an entire
> > function because of an intrinsic. Instead you should implement ArithPow on
> > Windows using a function call to a helper.
> 
> Ok, thanks for reviewing :) I will look into this.

Note that you could literally call the pow() function.  It's possible that if you emit a call to an extern function named "pow" then LLVM might try to turn this into an intrinsic and then fail in the backend if the intrinsic is unsupported (I don't know if this is happening here, but I remember such a LLVM JIT bug with other math functions).  One way to avoid this bug is to grab a pointer to the pow() function, and then emit a call to that pointer constant.  We've done this a lot in other such thorny cases.
Comment 6 peavo 2015-05-26 10:47:58 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Comment on attachment 253683 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=253683&action=review
> > > 
> > > > Source/JavaScriptCore/ChangeLog:29
> > > > +        (JSC::FTL::canCompile): llvm does not support ArithPow on Windows.
> > > 
> > > This is the wrong approach. You're preventing the compilation of an entire
> > > function because of an intrinsic. Instead you should implement ArithPow on
> > > Windows using a function call to a helper.
> > 
> > Ok, thanks for reviewing :) I will look into this.
> 
> Note that you could literally call the pow() function.  It's possible that
> if you emit a call to an extern function named "pow" then LLVM might try to
> turn this into an intrinsic and then fail in the backend if the intrinsic is
> unsupported (I don't know if this is happening here, but I remember such a
> LLVM JIT bug with other math functions).  One way to avoid this bug is to
> grab a pointer to the pow() function, and then emit a call to that pointer
> constant.  We've done this a lot in other such thorny cases.

Ok, thanks for the tip :)
I also found this llvm bug: https://llvm.org/bugs/show_bug.cgi?id=8379
Comment 7 Alex Christensen 2015-05-26 13:35:21 PDT
Be careful: _snprintf does not always null terminate (if the output is the exact length of the buffer), snprintf does.
See https://bugzilla.mozilla.org/show_bug.cgi?id=821003 for more discussion.
Comment 8 peavo 2015-05-26 13:47:07 PDT
Created attachment 253732 [details]
Patch
Comment 9 peavo 2015-05-26 13:53:00 PDT
(In reply to comment #7)
> Be careful: _snprintf does not always null terminate (if the output is the
> exact length of the buffer), snprintf does.
> See https://bugzilla.mozilla.org/show_bug.cgi?id=821003 for more discussion.

Thanks for the heads-up :) I will check these, and see if they're safe.
Comment 10 peavo 2015-05-26 13:56:39 PDT
(In reply to comment #5)
> 
> Note that you could literally call the pow() function.  It's possible that
> if you emit a call to an extern function named "pow" then LLVM might try to
> turn this into an intrinsic and then fail in the backend if the intrinsic is
> unsupported (I don't know if this is happening here, but I remember such a
> LLVM JIT bug with other math functions).  One way to avoid this bug is to
> grab a pointer to the pow() function, and then emit a call to that pointer
> constant.  We've done this a lot in other such thorny cases.

I tried to fix this in llvm. You can see the proposed change in diff.txt. I'm not sure that it is correct, though. SunSpider runs fine. Does "__powidf2" and "pow" have the same signature?
Comment 11 Filip Pizlo 2015-05-26 13:57:36 PDT
(In reply to comment #10)
> (In reply to comment #5)
> > 
> > Note that you could literally call the pow() function.  It's possible that
> > if you emit a call to an extern function named "pow" then LLVM might try to
> > turn this into an intrinsic and then fail in the backend if the intrinsic is
> > unsupported (I don't know if this is happening here, but I remember such a
> > LLVM JIT bug with other math functions).  One way to avoid this bug is to
> > grab a pointer to the pow() function, and then emit a call to that pointer
> > constant.  We've done this a lot in other such thorny cases.
> 
> I tried to fix this in llvm. You can see the proposed change in diff.txt.
> I'm not sure that it is correct, though. SunSpider runs fine. Does
> "__powidf2" and "pow" have the same signature?

You'll need to verify much more than SunSpider. ;-)

Can you run JSC stress tests?

I don't know what __powidf2 is.
Comment 12 peavo 2015-05-26 14:55:07 PDT
(In reply to comment #11)
> 
> You'll need to verify much more than SunSpider. ;-)
> 
> Can you run JSC stress tests?
> 

I get the same stress test results with and without this patch:

** The following JSC stress test failures have been introduced:
        jsc-layout-tests.yaml/js/script-tests/function-apply-aliased.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint

Results for JSC stress tests:
    5 failures found.

How do I know that FTL kicks in as often as it should?

> I don't know what __powidf2 is.

I'm not sure, but I think __powidf2 is a function in the gcc c runtime lib, which llvm is using to implement the powi intrinsic. I tried to replace this with pow on Windows.
Comment 13 Filip Pizlo 2015-05-26 15:07:11 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > 
> > You'll need to verify much more than SunSpider. ;-)
> > 
> > Can you run JSC stress tests?
> > 
> 
> I get the same stress test results with and without this patch:
> 
> ** The following JSC stress test failures have been introduced:
>        
> jsc-layout-tests.yaml/js/script-tests/function-apply-aliased.js.layout-no-
> cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout
>        
> jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
> 
> Results for JSC stress tests:
>     5 failures found.
> 
> How do I know that FTL kicks in as often as it should?

We have FTL-specific tests that follow this pattern:

function foo() {
    test content here
}

noInline(foo)
for (var i = 0; i < 10000; ++i)
    foo();

The test harness ensures that foo will be FTLed.

> 
> > I don't know what __powidf2 is.
> 
> I'm not sure, but I think __powidf2 is a function in the gcc c runtime lib,
> which llvm is using to implement the powi intrinsic. I tried to replace this
> with pow on Windows.

OK.
Comment 14 Csaba Osztrogonác 2015-05-26 15:28:29 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > 
> > You'll need to verify much more than SunSpider. ;-)
> > 
> > Can you run JSC stress tests?
> > 
> 
> I get the same stress test results with and without this patch:
> 
> ** The following JSC stress test failures have been introduced:
>        
> jsc-layout-tests.yaml/js/script-tests/function-apply-aliased.js.layout-no-
> cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout
>        
> jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
> 
> Results for JSC stress tests:
>     5 failures found.
> 
> How do I know that FTL kicks in as often as it should?
> 
> > I don't know what __powidf2 is.
> 
> I'm not sure, but I think __powidf2 is a function in the gcc c runtime lib,
> which llvm is using to implement the powi intrinsic. I tried to replace this
> with pow on Windows.

Have you passed --ftl-jit to run-javascriptcore-tests? 
(Without this option FTL JIT is disabled on non Apple Mac platforms.)

You can set JSC_ftlCrashes=1 environment variable
to check if a test triggers FTL compilation or not.
Comment 15 peavo 2015-05-27 05:14:03 PDT
(In reply to comment #14)
> 
> Have you passed --ftl-jit to run-javascriptcore-tests? 
> (Without this option FTL JIT is disabled on non Apple Mac platforms.)
> 
> You can set JSC_ftlCrashes=1 environment variable
> to check if a test triggers FTL compilation or not.

Ah, ok, thanks, this explains a bit :) I did not pass this option.
When I run the stress tests now, it only makes it to test number ~350 (v8) before it gets 5 failures, and then starts using excessive amounts of memory which stalls the test ...
It looks like more work is required here. I will start looking into it :)
Comment 16 peavo 2015-06-08 09:54:54 PDT
Created attachment 254498 [details]
Patch
Comment 17 peavo 2015-06-08 09:56:41 PDT
I've made a little progress here. The v8 crash is fixed, the next failing test is the v8 raytrace test, which renders incorrectly. I will have a look at that :)
Comment 18 peavo 2015-06-08 10:48:14 PDT
Created attachment 254499 [details]
Patch
Comment 19 peavo 2015-06-16 05:43:05 PDT
Created attachment 254949 [details]
Patch
Comment 20 peavo 2015-06-16 06:40:43 PDT
The v8 raytrace test is passing now. The next failing test is throw-from-ftl-call-ic-slow-path-undefined.js. I will try to debug that next :)
Comment 21 peavo 2015-06-20 04:01:13 PDT
Created attachment 255287 [details]
Patch
Comment 22 peavo 2015-06-20 05:50:23 PDT
Created attachment 255288 [details]
Patch
Comment 23 peavo 2015-06-20 10:04:30 PDT
Created attachment 255293 [details]
Patch
Comment 24 peavo 2015-06-20 10:52:14 PDT
With the latest patch, I don't get any stress test regressions :)

20743/20743 (failed 7)

** The following JSC stress test failures have been introduced:
        jsc-layout-tests.yaml/js/script-tests/math.js.layout
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl-eager-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
        jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint

Results for JSC stress tests:
    7 failures found.
Comment 25 Michael Saboff 2015-06-20 21:20:29 PDT
(In reply to comment #24)
> With the latest patch, I don't get any stress test regressions :)
> 
> 20743/20743 (failed 7)
> 
> ** The following JSC stress test failures have been introduced:
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout
>        
> jsc-layout-tests.yaml/js/script-tests/math.js.layout-dfg-eager-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl
>        
> jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl-eager-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-ftl-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-cjit
>         jsc-layout-tests.yaml/js/script-tests/math.js.layout-no-llint
> 
> Results for JSC stress tests:
>     7 failures found.

What is the output of these tests?  It could be the rounding mode you have set.
Comment 26 peavo 2015-06-20 23:54:37 PDT
Ah, I'm not sure what rounding mode I am using, how do I set it?

--- ../.tests/jsc-layout-tests.yaml/js/math-expected.txt	2015-06-20 10:18:23.243854000 -0700
+++ ../jsc-layout-tests.yaml/js/script-tests/math.js.layout.out	2015-06-20 10:40:02.156854000 -0700
@@ -198,7 +198,7 @@
 PASS Math.sign(-Infinity) is -1
 PASS Math.sin(NaN) is NaN
 PASS Math.sin(0) is 0
-PASS Math.sin(-0) is -0
+FAIL Math.sin(-0) should be -0. Was 0.
 PASS Math.sin(Infinity) is NaN
 PASS Math.sin(-Infinity) is NaN
 PASS Math.sqrt(NaN) is NaN
Comment 27 peavo 2015-06-21 00:20:33 PDT
I can see that Visual Studio has a 'Floating Point Model' setting, which can be set to 'precise', 'strict', or 'fast'. It is currently set to 'precise' for JSC.
I also found a function called _controlfp (https://msdn.microsoft.com/en-us/library/e9b52ceh.aspx), which seem to be able to set the rounding mode. I'm not sure what the default rounding mode is.
Comment 28 peavo 2015-06-21 02:09:51 PDT
I tried to set the rounding mode with _controlfp and the parameters _RC_CHOP, _RC_UP, _RC_DOWN, and _RC_NEAR. All other settings than _RC_NEAR resulted in more regressions. Setting rounding mode to _RC_NEAR gave the same results as when not setting the rounding mode, and did not fix the math.js.layout test. So I assume the default setting is _RC_NEAR.
Comment 29 Michael Saboff 2015-06-21 21:03:15 PDT
(In reply to comment #28)
> I tried to set the rounding mode with _controlfp and the parameters
> _RC_CHOP, _RC_UP, _RC_DOWN, and _RC_NEAR. All other settings than _RC_NEAR
> resulted in more regressions. Setting rounding mode to _RC_NEAR gave the
> same results as when not setting the rounding mode, and did not fix the
> math.js.layout test. So I assume the default setting is _RC_NEAR.

Looks like the default rounding mode is fine.  If your only failure is sin(-0), I'd look at that path and see if there is some windows specific handling of sin that is causing the issue.
Comment 30 peavo 2015-06-22 02:12:54 PDT
(In reply to comment #29)
> 
> Looks like the default rounding mode is fine.  If your only failure is
> sin(-0), I'd look at that path and see if there is some windows specific
> handling of sin that is causing the issue.

Yes, the only failure is sin(-0) (which also fails with FTL disabled).
I investigated the path, and found nothing Windows specific, except for the sin() call itself from the function mathProtoFuncSin().
This makes me think that the issue is caused by differences in the MSVC and GCC C run time libraries.
Comment 31 Basile Clement 2015-06-22 10:08:25 PDT
If MSVC runtime says sin(-0) = 0, this is incorrect (standard enforces sin(-0) = -0).
According to http://stackoverflow.com/questions/30392832/sinminus-zero-does-not-return-the-expected-result-on-visual-studio-2013-64bi , this is the case at least for MSVC 2013 on Windows 7. Is this the MSVC/Windows version combination you are using? Can you confirm it is a runtime problem with a simple C program computing sin(-0)?

(In reply to comment #30)
> (In reply to comment #29)
> > 
> > Looks like the default rounding mode is fine.  If your only failure is
> > sin(-0), I'd look at that path and see if there is some windows specific
> > handling of sin that is causing the issue.
> 
> Yes, the only failure is sin(-0) (which also fails with FTL disabled).
> I investigated the path, and found nothing Windows specific, except for the
> sin() call itself from the function mathProtoFuncSin().
> This makes me think that the issue is caused by differences in the MSVC and
> GCC C run time libraries.
Comment 32 peavo 2015-06-22 13:57:43 PDT
(In reply to comment #31)
> If MSVC runtime says sin(-0) = 0, this is incorrect (standard enforces
> sin(-0) = -0).
> According to
> http://stackoverflow.com/questions/30392832/sinminus-zero-does-not-return-
> the-expected-result-on-visual-studio-2013-64bi , this is the case at least
> for MSVC 2013 on Windows 7. Is this the MSVC/Windows version combination you
> are using? Can you confirm it is a runtime problem with a simple C program
> computing sin(-0)?
> 

Thanks for looking into this :)

Yes, I'm using MSVC 2013 on Windows 7.
I created a Win64 test application with the code from your stackoverflow link, and it does give the incorrect result sin(-0) = 0.
The corresponding Win32 application gives the correct result, so it seems to be a problem with the Win64 C runtime libraries.
Comment 33 peavo 2015-06-26 04:23:45 PDT
Created attachment 255630 [details]
Patch
Comment 34 peavo 2015-06-26 04:27:30 PDT
(In reply to comment #33)
> Created attachment 255630 [details]
> Patch

Rebased.
Comment 35 peavo 2015-07-03 03:19:06 PDT
Created attachment 256090 [details]
Patch
Comment 36 peavo 2015-07-03 03:20:26 PDT
(In reply to comment #35)
> Created attachment 256090 [details]
> Patch

Rebased.
Comment 37 Seo Sanghyeon 2015-08-26 04:53:22 PDT
For snprintf, can't one #include <wtf/StringExtras.h>?
Comment 38 peavo 2015-08-26 08:56:31 PDT
(In reply to comment #37)
> For snprintf, can't one #include <wtf/StringExtras.h>?

Thanks, this is a good suggestion, I will update the patch soon :)
Comment 39 peavo 2015-09-05 11:26:30 PDT
Created attachment 260695 [details]
Patch
Comment 40 peavo 2015-09-05 12:59:09 PDT
Thanks for reviewing! I will upload a rebased patch to check that no builds are broken.
Comment 41 peavo 2015-09-05 13:37:48 PDT
Created attachment 260699 [details]
Patch
Comment 42 peavo 2015-09-05 13:42:16 PDT
(In reply to comment #41)
> Created attachment 260699 [details]
> Patch

Rebased patch.
The prebuilt libllvmForJSC.dll is updated with LLVM revision 241791.
For FTL to work properly on Windows, LLVM revision 241725 or later is required.
Comment 43 peavo 2015-09-05 16:07:13 PDT
Created attachment 260703 [details]
Patch
Comment 44 peavo 2015-09-06 03:33:03 PDT
(In reply to comment #43)
> Created attachment 260703 [details]
> Patch

To make the patch build on all platforms, I deleted the llvm api function BuildLandingPad from the list of api functions, since it has changed signature in recent llvm revisions. I am not sure just deleting it is the best approach, though?
Comment 45 Filip Pizlo 2015-09-06 10:21:51 PDT
(In reply to comment #44)
> (In reply to comment #43)
> > Created attachment 260703 [details]
> > Patch
> 
> To make the patch build on all platforms, I deleted the llvm api function
> BuildLandingPad from the list of api functions, since it has changed
> signature in recent llvm revisions. I am not sure just deleting it is the
> best approach, though?

Yes that's the right approach.
Comment 46 peavo 2015-09-07 03:56:20 PDT
(In reply to comment #45)
> (In reply to comment #44)
> > (In reply to comment #43)
> > > Created attachment 260703 [details]
> > > Patch
> > 
> > To make the patch build on all platforms, I deleted the llvm api function
> > BuildLandingPad from the list of api functions, since it has changed
> > signature in recent llvm revisions. I am not sure just deleting it is the
> > best approach, though?
> 
> Yes that's the right approach.

Ok, thanks :) Do I need another r+, or should I go on and commit?
Comment 47 peavo 2015-09-07 04:05:18 PDT
When rebasing I got a couple of conflicts, you may want to do an additional check of those changes.
Comment 48 peavo 2015-09-09 11:11:05 PDT
Created attachment 260864 [details]
Patch
Comment 49 peavo 2015-09-09 11:15:05 PDT
(In reply to comment #48)
> Created attachment 260864 [details]
> Patch

Rebased, and removed the prebuilt libllvmForJSC.dll. I think it is better to manually build llvm, and avoid adding ~10MB to the source tree :)
Comment 50 peavo 2015-09-26 11:12:47 PDT
Created attachment 261971 [details]
Patch
Comment 51 peavo 2015-09-26 11:59:31 PDT
Created attachment 261972 [details]
Patch
Comment 52 peavo 2015-09-26 13:38:37 PDT
Created attachment 261978 [details]
Patch
Comment 53 peavo 2015-09-26 13:45:47 PDT
(In reply to comment #52)
> Created attachment 261978 [details]
> Patch

Rebased. I also had to make a few changes to the patch due to recent JSC changes.
Comment 54 peavo 2015-10-02 09:09:11 PDT
Created attachment 262334 [details]
Patch
Comment 55 peavo 2015-10-02 10:20:20 PDT
(In reply to comment #54)
> Created attachment 262334 [details]
> Patch

Rebased, and simplified a little bit :)
Comment 56 Alex Christensen 2015-10-02 14:51:11 PDT
Comment on attachment 262334 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=262334&action=review

I have a few nits about the build system changes.

> Source/JavaScriptCore/CMakeLists.txt:846
> +    if (NOT LLVM_STATIC_LIBRARIES AND NOT MSVC)

Please use WIN32 instead of MSVC.  It causes fewer warnings when running cmake and until someone builds webkit with mingw or clang, they're the same thing.

> Source/JavaScriptCore/CMakeLists.txt:876
> +        target_link_libraries(llvmForJSC "LLVMAnalysis.lib;LLVMAsmParser.lib;LLVMAsmPrinter.lib;LLVMBitReader.lib;LLVMBitWriter.lib;LLVMCodeGen.lib;LLVMCore.lib;LLVMDebugInfo.lib;LLVMExecutionEngine.lib;LLVMInstCombine.lib;LLVMInstrumentation.lib;LLVMInterpreter.lib;LLVMipa.lib;LLVMipo.lib;LLVMIRReader.lib;LLVMJIT.lib;LLVMLinker.lib;LLVMLTO.lib;LLVMMC.lib;LLVMMCDisassembler.lib;LLVMMCJIT.lib;LLVMMCParser.lib;LLVMObjCARCOpts.lib;LLVMObject.lib;LLVMOption.lib;LLVMRuntimeDyld.lib;LLVMScalarOpts.lib;LLVMSelectionDAG.lib;LLVMSupport.lib;LLVMTableGen.lib;LLVMTarget.lib;LLVMTransformUtils.lib;LLVMVectorize.lib;LLVMX86AsmParser.lib;LLVMX86AsmPrinter.lib;LLVMX86CodeGen.lib;LLVMX86Desc.lib;LLVMX86Disassembler.lib;LLVMX86Info.lib;LLVMX86Utils.lib;LLVMProfileData.lib")

These should be in LLVM_STATIC_LIBRARIES.

> Source/cmake/OptionsWin.cmake:39
> +WEBKIT_OPTION_DEFAULT_PORT_VALUE(ENABLE_FTL_JIT PRIVATE ON)

Shouldn't this only be enabled for the 64-bit build?

> Tools/Scripts/webkitdirs.pm:1888
> +    if (isWin64()) {

&& !canUseNinja()
Comment 57 peavo 2015-10-03 07:54:07 PDT
Created attachment 262379 [details]
Patch
Comment 58 peavo 2015-10-03 08:37:31 PDT
(In reply to comment #56)
> Comment on attachment 262334 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=262334&action=review
> 

Thanks for reviewing! Updated patch :)
Comment 59 peavo 2015-10-04 09:54:06 PDT
Created attachment 262404 [details]
Patch
Comment 60 peavo 2015-10-07 13:31:58 PDT
Created attachment 262630 [details]
Patch
Comment 61 peavo 2015-10-07 13:57:53 PDT
(In reply to comment #60)
> Created attachment 262630 [details]
> Patch

Rebased and removed a compile fix for VS2013 which is no longer needed for VS2015.
Comment 62 peavo 2015-10-19 07:35:07 PDT
Created attachment 263482 [details]
Patch
Comment 63 Mark Lam 2015-10-19 10:41:33 PDT
Comment on attachment 263482 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263482&action=review

> Source/JavaScriptCore/CMakeLists.txt:952
> +            llvm/LLVMAPI.cpp

Since this file is common to WIN32 and other ports, let's keep it in the common list above.

> Source/JavaScriptCore/CMakeLists.txt:958
> +            llvm/LLVMAPI.cpp

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:722
> +#if CPU(X86_64)
> +    void loadDouble128(ImplicitAddress address, FPRegisterID src)
> +    {
> +        ASSERT(isSSE2Present());
> +        m_assembler.movups_mr(src, address.offset, address.base);
> +    }
> +#endif

Why is this added?  I don't see it used anywhere in this patch.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:747
> +#if CPU(X86_64)
> +    void storeDouble128(FPRegisterID src, ImplicitAddress address)
> +    {
> +        ASSERT(isSSE2Present());
> +        m_assembler.movups_rm(src, address.offset, address.base);
> +    }
> +#endif

Why is this added?  I don't see it used anywhere in this patch.

> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:197
> +    Call callJIT()
> +    {
> +        DataLabelPtr label = moveWithPatch(TrustedImmPtr(0), scratchRegister);
> +        Call result = Call(m_assembler.call(scratchRegister), Call::Linkable);
> +
> +        ASSERT_UNUSED(label, differenceBetween(label, result) == REPTACH_OFFSET_CALL_R11);
> +        return result;
> +    }

Why is this added?  I don't see it used anywhere in this patch.

> Source/JavaScriptCore/assembler/X86Assembler.h:1848
> +    void movups_rm(XMMRegisterID src, int offset, RegisterID base)
> +    {
> +        m_formatter.twoByteOp(OP2_MOVSD_WsdVsd, (RegisterID)src, base, offset);
> +    }
> +
> +    void movups_mr(XMMRegisterID src, int offset, RegisterID base)
> +    {
> +        m_formatter.twoByteOp(OP2_MOVSD_VsdWsd, (RegisterID)src, base, offset);
> +    }

Are these really needed since their only caller in this patch does not appear to be called by anyone.

> Source/JavaScriptCore/ftl/FTLCapabilities.cpp:213
> +    case ArithPow:
> +#if OS(WINDOWS)
> +        // LLVM does not support powi on Windows.
> +        if (node->child2().useKind() == Int32Use)
> +            return CannotCompile;
> +#endif
> +        break;

Is this still necessary?  In compileArithPow() below, you seem to have implemented support for (m_node->child2().useKind() == Int32Use) already (line 1843).  Or am I misreading it?

> Source/JavaScriptCore/ftl/FTLCompile.cpp:680
> +            auto iter = recordMap.find(call.stackmapID());
> +            if (iter != recordMap.end()) {
> +                for (unsigned i = 0; i < iter->value.size(); ++i) {
> +                    StackMaps::Record& record = iter->value[i];
> +                    RegisterSet usedRegisters = usedRegistersFor(record);
> +
> +                    dataLog("Used registers = ", usedRegisters, "\n");
> +                }
> +            }

I think this code will clash with https://bugs.webkit.org/show_bug.cgi?id=149970 (which is fixing a bug in this area).
Comment 64 Geoffrey Garen 2015-10-19 14:58:46 PDT
We shouldn't land a change like this while the Windows 64bit build is still crashing in JavaScriptCore regression tests.
Comment 65 peavo 2015-10-19 15:02:54 PDT
Created attachment 263517 [details]
Patch
Comment 66 peavo 2015-10-19 15:14:00 PDT
(In reply to comment #63)
> Comment on attachment 263482 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=263482&action=review
> 

Thanks for reviewing :)

> 
> > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:722
> > +#if CPU(X86_64)
> > +    void loadDouble128(ImplicitAddress address, FPRegisterID src)
> > +    {
> > +        ASSERT(isSSE2Present());
> > +        m_assembler.movups_mr(src, address.offset, address.base);
> > +    }
> > +#endif
> 
> Why is this added?  I don't see it used anywhere in this patch.
> 

Removed. This was in preparation for saving and restoring all 16 bytes of callee saved xmm registers. I'll create another patch for this soon.

> 
> > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:197
> > +    Call callJIT()
> > +    {
> > +        DataLabelPtr label = moveWithPatch(TrustedImmPtr(0), scratchRegister);
> > +        Call result = Call(m_assembler.call(scratchRegister), Call::Linkable);
> > +
> > +        ASSERT_UNUSED(label, differenceBetween(label, result) == REPTACH_OFFSET_CALL_R11);
> > +        return result;
> > +    }
> 
> Why is this added?  I don't see it used anywhere in this patch.
>

Removed, no longer used.
 
> 
> > Source/JavaScriptCore/ftl/FTLCapabilities.cpp:213
> > +    case ArithPow:
> > +#if OS(WINDOWS)
> > +        // LLVM does not support powi on Windows.
> > +        if (node->child2().useKind() == Int32Use)
> > +            return CannotCompile;
> > +#endif
> > +        break;
> 
> Is this still necessary?  In compileArithPow() below, you seem to have
> implemented support for (m_node->child2().useKind() == Int32Use) already
> (line 1843).  Or am I misreading it?
> 

You're right. Both are not needed. I kept this, since implementing support for powi using doublePow did not produce the exact correct result, causing a stress test failure.
Comment 67 peavo 2015-10-19 15:17:47 PDT
(In reply to comment #64)
> We shouldn't land a change like this while the Windows 64bit build is still
> crashing in JavaScriptCore regression tests.

I was not aware of any crashes in the JSC tests. Which tests are crashing? Do we have a stacktrace? I recently ran the JSC stress tests with FTL disabled without any failures. Maybe it has been introduced recently?
Comment 68 Brent Fulgham 2015-10-19 15:36:26 PDT
I'm seeing 64-bit JSC (Release) crashes on the Apple Windows port. If you are not seeing them in the WinCairo build, it points another finger at an internal library that Mark and I were discussing as the likely culprit.
Comment 69 peavo 2015-10-19 15:47:05 PDT
(In reply to comment #68)
> I'm seeing 64-bit JSC (Release) crashes on the Apple Windows port. If you
> are not seeing them in the WinCairo build, it points another finger at an
> internal library that Mark and I were discussing as the likely culprit.

Ah, ok, I have only been testing WinCairo. Has it been introduced recently?
Comment 70 Brent Fulgham 2015-10-19 15:53:31 PDT
(In reply to comment #69)
> (In reply to comment #68)
> > I'm seeing 64-bit JSC (Release) crashes on the Apple Windows port. If you
> > are not seeing them in the WinCairo build, it points another finger at an
> > internal library that Mark and I were discussing as the likely culprit.
> 
> Ah, ok, I have only been testing WinCairo. Has it been introduced recently?

I'm not sure. I'm trying to get our 64-bit testing infrastructure squared away so we can answer questions like these in the future. :-\
Comment 71 Filip Pizlo 2015-10-19 16:01:05 PDT
Also: I don't think we should land this until we can prove that the Windows 64-bit no-FTL configuration is as fast as - or almost as fast as - the OSX or Linux 64-bit no-FTL configuration.
Comment 72 peavo 2015-10-20 11:12:48 PDT
(In reply to comment #71)
> Also: I don't think we should land this until we can prove that the Windows
> 64-bit no-FTL configuration is as fast as - or almost as fast as - the OSX
> or Linux 64-bit no-FTL configuration.

Unfortunately, I don't currently have access to a Mac or a Linux machine, but maybe someone else has the opportunity to test this? :)

What is the best way to compare two platforms?
Comment 73 peavo 2015-10-28 13:08:10 PDT
Created attachment 264235 [details]
Patch
Comment 74 peavo 2015-10-28 14:37:48 PDT
Created attachment 264249 [details]
Patch
Comment 75 peavo 2015-10-28 14:47:38 PDT
(In reply to comment #74)
> Created attachment 264249 [details]
> Patch

Rebased, and resolved conflicts. I did not enable FTL in the latest patch, since we haven't compared DFG performance with other platforms yet.
Comment 76 peavo 2015-11-11 09:21:04 PST
Created attachment 265292 [details]
Patch
Comment 77 peavo 2015-11-11 09:32:35 PST
Created attachment 265295 [details]
Patch
Comment 78 peavo 2015-11-11 10:59:49 PST
(In reply to comment #77)
> Created attachment 265295 [details]
> Patch

Rebased and fixed a couple of recent stress test regressions.
There are zero jsc stress test failures now. Still on LLVM revision 241791.

28632/28632

Results for JSC stress tests:
    0 failures found.
    OK.
Comment 79 Alex Christensen 2015-12-09 23:07:22 PST
Comment on attachment 265295 [details]
Patch

Let's revisit this in a few weeks/months once b3 is developed enough.  Most of this will still be needed, but the llvm parts will not.
Comment 80 peavo 2015-12-10 12:11:11 PST
(In reply to comment #79)
> Comment on attachment 265295 [details]
> Patch
> 
> Let's revisit this in a few weeks/months once b3 is developed enough.  Most
> of this will still be needed, but the llvm parts will not.

Ok, sounds good, I might rebase from time to time, to avoid the relevant parts getting too stale :)
Comment 81 peavo 2015-12-17 06:41:44 PST
Created attachment 267558 [details]
Patch
Comment 82 peavo 2015-12-17 07:18:11 PST
Created attachment 267560 [details]
Patch
Comment 83 Csaba Osztrogonác 2016-02-18 05:11:44 PST
Comment on attachment 267560 [details]
Patch

It is outdated, now LVM is gone. Could you update it to B3?
Comment 84 peavo 2016-02-18 13:03:05 PST
(In reply to comment #83)
> Comment on attachment 267560 [details]
> Patch
> 
> It is outdated, now LVM is gone. Could you update it to B3?

Thanks, I hope to look more into B3 on Windows soon.
Comment 85 peavo 2016-05-24 12:24:32 PDT
Created attachment 279685 [details]
Patch
Comment 86 Csaba Osztrogonác 2016-05-24 12:51:40 PDT
Comment on attachment 279685 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=279685&action=review

There is no need for X86_64 guards in FTL files, because FTL is 64 bit only by design.

> Source/cmake/OptionsWin.cmake:33
> +WEBKIT_OPTION_DEFAULT_PORT_VALUE(ENABLE_FTL_JIT PRIVATE ON)

Don't Windows ports support 32-bit x86 anymore?
Comment 87 Csaba Osztrogonác 2016-05-24 12:57:01 PDT
What about the conformance and performance results?
Comment 88 Csaba Osztrogonác 2016-05-25 02:11:12 PDT
(In reply to comment #86)
> Comment on attachment 279685 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=279685&action=review
>
> > Source/cmake/OptionsWin.cmake:33
> > +WEBKIT_OPTION_DEFAULT_PORT_VALUE(ENABLE_FTL_JIT PRIVATE ON)
> 
> Don't Windows ports support 32-bit x86 anymore?

Based on https://bugs.webkit.org/show_bug.cgi?id=157931#c8 , it seems
Apple Windows bots build x86 target, so enabling FTL JIT unconditionally
isn't too good. But fortunately Platform.h disable FTL JIT on 32 bit platforms
even if it is enabled by someone.

/* The FTL *does not* work on 32-bit platforms. Disable it even if someone asked us to enable it. */
#if USE(JSVALUE32_64)
#undef ENABLE_FTL_JIT
#define ENABLE_FTL_JIT 0
#endif
Comment 89 Csaba Osztrogonác 2016-05-25 02:21:17 PDT
Ah, I got the goal of this ugly hack. :) It is used by Apple Mac build too.

Apple Mac port enables FTL_JIT unconditionally in 
Source/JavaScriptCore/Configurations/FeatureDefines.xcconfig
And then force disables in Platform.h for 32 bit platforms.

I assume because xcconfig doesn't know if the 
current build is 32 or 64 bit. Am I right?
Comment 90 peavo 2016-05-25 08:37:54 PDT
Created attachment 279774 [details]
Patch
Comment 91 peavo 2016-05-25 08:40:22 PDT
(In reply to comment #86)
> Comment on attachment 279685 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=279685&action=review
> 
> There is no need for X86_64 guards in FTL files, because FTL is 64 bit only
> by design.
> 

Thanks for reviewing :) Updated patch.
Comment 92 peavo 2016-05-25 09:17:17 PDT
(In reply to comment #87)
> What about the conformance and performance results?

There are no stress test regressions (I also applied the patch in https://bugs.webkit.org/show_bug.cgi?id=158073 when running stress tests).

I will also run some performance tests.
Comment 93 peavo 2016-06-01 11:20:24 PDT
Benchmark report for Kraken on \c.

VMs tested:
"DFG" at /cygdrive/c/WebKit/WebKit3/WebKitBuild/ReleaseDFG/bin64/jsc
"FTL" at /cygdrive/c/WebKit/WebKit3/WebKitBuild/Release/bin64/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                             DFG                       FTL

ai-astar                               366.796+-140.328    ^     162.320+-12.679        ^ definitely 2.2597x faster
audio-beat-detection                   117.666+-5.419      ^      89.257+-7.523         ^ definitely 1.3183x faster
audio-dft                              189.045+-11.426     ^     145.779+-17.178        ^ definitely 1.2968x faster
audio-fft                               97.183+-7.117      ^      65.513+-5.417         ^ definitely 1.4834x faster
audio-oscillator                       149.853+-55.328            94.431+-8.717           might be 1.5869x faster
imaging-darkroom                       189.104+-9.191      ^     118.680+-1.601         ^ definitely 1.5934x faster
imaging-desaturate                     153.005+-4.605      ^     102.285+-6.300         ^ definitely 1.4959x faster
imaging-gaussian-blur                  254.369+-5.462      ^     131.986+-13.159        ^ definitely 1.9272x faster
json-parse-financial                    87.245+-4.712      ?      91.573+-1.767         ? might be 1.0496x slower
json-stringify-tinderbox                67.304+-4.712             67.024+-2.299
stanford-crypto-aes                    102.341+-9.824      ^      81.158+-3.177         ^ definitely 1.2610x faster
stanford-crypto-ccm                     75.854+-7.986      ?      81.478+-2.980         ? might be 1.0741x slower
stanford-crypto-pbkdf2                 190.708+-7.233      ?     204.435+-44.800        ? might be 1.0720x slower
stanford-crypto-sha256-iterative        76.860+-9.469             70.705+-3.160           might be 1.0871x faster

<arithmetic>                           151.238+-10.660     ^     107.616+-4.342         ^ definitely 1.4053x faster
Comment 94 peavo 2016-06-01 11:56:56 PDT
Benchmark report for Octane on \c.

VMs tested:
"DFG" at /cygdrive/c/WebKit/WebKit3/WebKitBuild/ReleaseDFG/bin64/jsc
"FTL" at /cygdrive/c/WebKit/WebKit3/WebKitBuild/Release/bin64/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                              DFG                       FTL

encrypt                 0.40643+-0.00614    ^     0.30742+-0.00217       ^ definitely 1.3221x faster
decrypt                 7.42668+-0.06367    ^     5.60406+-0.13478       ^ definitely 1.3252x faster
deltablue      x2       0.58301+-0.01605    ^     0.28249+-0.01155       ^ definitely 2.0638x faster
earley                  0.89649+-0.01630    ^     0.58405+-0.00759       ^ definitely 1.5350x faster
boyer                  12.80865+-0.48265    ^     9.97758+-0.06177       ^ definitely 1.2837x faster
navier-stokes  x2       9.60640+-0.27730    ^     7.11029+-0.21570       ^ definitely 1.3511x faster
raytrace       x2       3.86840+-0.15073    ^     1.54362+-0.03002       ^ definitely 2.5061x faster
richards       x2       0.29908+-0.01145    ^     0.16272+-0.00477       ^ definitely 1.8380x faster
splay          x2       0.89996+-0.04894          0.81134+-0.05202         might be 1.1092x faster
regexp         x2      35.52942+-1.02941    !    39.02314+-1.40534       ! definitely 1.0983x slower
pdfjs          x2      77.58769+-5.01494    ?    79.13132+-2.39736       ? might be 1.0199x slower
mandreel       x2     138.25426+-8.46641    ^   113.13855+-1.95979       ^ definitely 1.2220x faster
gbemu          x2      89.33162+-8.51151    ^    77.26824+-0.84239       ^ definitely 1.1561x faster
closure                 1.03592+-0.05506    ?     1.03644+-0.01954       ?
jquery                 13.46860+-0.42885         13.33433+-0.27978         might be 1.0101x faster
box2d          x2      21.44274+-0.59958    ?    22.37898+-1.28378       ? might be 1.0437x slower
zlib           x2     783.26776+-42.17878   ^   648.50476+-33.98770      ^ definitely 1.2078x faster
typescript     x2    1326.95959+-62.95861   ?  1377.46051+-25.29425      ? might be 1.0381x slower

<geometric>            13.56436+-0.08183    ^    10.65648+-0.06557       ^ definitely 1.2729x faster
Comment 95 peavo 2016-06-15 04:56:04 PDT
Created attachment 281349 [details]
Patch
Comment 96 Brent Fulgham 2016-07-07 13:00:49 PDT
What's the status of this bug? Can a JS person please approve/reject it?
Comment 97 Brady Eidson 2017-08-19 16:01:56 PDT
Comment on attachment 281349 [details]
Patch

r-, as this has been pending review for over a year now. It is near-impossible that this patch still applies to trunk and unlikely to still be relevant in its current form.
Comment 98 Ian Grunert 2023-10-13 09:40:19 PDT
Created attachment 468205 [details]
Patch

Uploading a WIP patch for enabling FTL on Windows. Ran into a couple of problems which I'm currently stuck on.

The patch enables webassembly at build time and disabling at runtime, as FTL makes assumptions that webassembly is enabled. Should be able to make a cleaner patch once https://bugs.webkit.org/show_bug.cgi?id=222315 lands (awaiting PR review).

The BBQ JIT won't work on Windows even after the changes for enabling webassembly on Windows land - the BBQ JIT requires changes to GPR assignment so it can use callee saves. With two fewer GPRs to work with it runs out of GPRs to assign when running benchmarks. This should get separated out to a separate bug.

The two problems I'm stuck on:

Segfaults during JetStream2 when running stanford-crypto-aes (on most but not all runs, it occasionally works). I haven't tracked down the cause, it's a strange one.

FTL is disabled on Windows for the EnumeratorNextUpdateIndexAndMode DFG operation. It crashes on Windows with a segfault on a simple test with JSC_useConcurrentJIT=false. If converted from UGPRPair return type back to JSValue it works - so either something is wrong in the FTL lowering, or the register allocator is having problems due to the extra return register required. Here's the test, the call to EnumeratorNextUpdateIndexAndMode ends up with 0 in the passed baseValue so it fails after attempting to decode and use it:

```
function getMappedArguments() { return arguments; }

function forIn(object) {
    var keys = [];
    for (var key in object)
        keys.push(key);
    return keys;
}

noInline(forIn);

(function() {
    var mappedArguments = getMappedArguments(0, 1, 2);
    for (var i = 0; i < 1e4; ++i) {
        forIn(mappedArguments);
    }
    print('done');
})();
```
Comment 99 Ian Grunert 2023-10-13 16:30:40 PDT
Draft PR: https://github.com/WebKit/WebKit/pull/19073

For those who prefer to look at a PR instead of a patch.