NEW124780
Kraken causes a lot of slow path calls when using Math.pow()
https://bugs.webkit.org/show_bug.cgi?id=124780
Summary Kraken causes a lot of slow path calls when using Math.pow()
Romain Perier
Reported 2013-11-22 09:32:04 PST
Kraken causes a lot of OSR exits when using Math.pow()
Attachments
Patch (4.44 KB, patch)
2013-11-22 09:45 PST, Romain Perier
no flags
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 (471.09 KB, application/zip)
2013-11-22 10:39 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-05 for mac-mountainlion (521.96 KB, application/zip)
2013-11-22 11:28 PST, Build Bot
no flags
Patch (4.45 KB, patch)
2013-11-23 05:30 PST, Romain Perier
no flags
Archive of layout-test-results from webkit-ews-15 for mac-mountainlion-wk2 (471.20 KB, application/zip)
2013-11-23 06:21 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-06 for mac-mountainlion (515.35 KB, application/zip)
2013-11-23 07:07 PST, Build Bot
no flags
Archive of layout-test-results from webkit-ews-01 for mac-mountainlion (525.28 KB, application/zip)
2013-11-23 08:27 PST, Build Bot
no flags
Patch (6.04 KB, patch)
2013-12-07 10:50 PST, Romain Perier
no flags
performance gain (671 bytes, text/plain)
2013-12-08 03:13 PST, Romain Perier
no flags
Patch (5.47 KB, patch)
2013-12-09 02:33 PST, Romain Perier
no flags
Patch (8.14 KB, patch)
2013-12-16 08:17 PST, Romain Perier
no flags
Patch (8.62 KB, patch)
2013-12-26 23:37 PST, Romain Perier
no flags
Patch (8.64 KB, patch)
2013-12-26 23:44 PST, Romain Perier
benjamin: review-
Romain Perier
Comment 1 2013-11-22 09:40:16 PST
The Kraken benchmark produces a lot of OSR exits when you start it from Safari or webkit nightly. If you have a look at Instruments, you can find that's the function " mathProtoFuncPow(ExecState*)" which slows the benchmark significantly. Now take a look at the kraken code source, it call Math.pow() a lot of times with negative integers or floating point numbers as exponents. These cases are not handled in jit/ThunkGenerators.cpp (except when the exponent is equal to 1/2) which causes OSR exits. I propose the following patch which basically improves the performances between 430 and 900%. (Just test it with PerformanceTests with a simple run per seconds). I will probably use the DFG compiler when the code is "hot" , but it will be a future contribution
Romain Perier
Comment 2 2013-11-22 09:45:55 PST
Romain Perier
Comment 3 2013-11-22 09:47:42 PST
As said in the changelog, it also improves the kraken performances (between 1 and 2.5%). Sorry for the first useless comment by the way, I just tested to create the bug using the script Tools/Script/webkit-patch but apparently it did not work completely ;)
Romain Perier
Comment 4 2013-11-22 09:54:14 PST
So the plan here is to use the baselineJIT in order to reduce as most as possible the number of OSR exits caused by kraken. the binary size is also very important, this is why I created a wrapper to the libc function pow. What I have in mind is: we could give types and code informations to the DFG compiler in parallel of this code to improve the performances even better when the code is really "hot". I am still a beginner in JSC so my point of view or my code are not probably perfects.
Filip Pizlo
Comment 5 2013-11-22 10:17:32 PST
Comment on attachment 217695 [details] Patch I'm very confused. The DFG doesn't support have a Math.pow() intrinsic. So, where are the OSR exits?
Romain Perier
Comment 6 2013-11-22 10:23:28 PST
By default the code of Math.pow is produced by the ThunkGenerator "powThunkGenerator" in jit/ThunkGenerators.cpp . The unhandled cases in this ThunkGenerator (when the exponent is a negative integer, or when the exponent is a double != -1/2) causes OSR exits -> mathProtoFuncPow() is called. (it don't know if it's the good technical term for that)
Romain Perier
Comment 7 2013-11-22 10:24:11 PST
The DFG is an idea, for future improvements . Nothing more :)
Build Bot
Comment 8 2013-11-22 10:25:56 PST
Filip Pizlo
Comment 9 2013-11-22 10:26:37 PST
(In reply to comment #6) > By default the code of Math.pow is produced by the ThunkGenerator "powThunkGenerator" in jit/ThunkGenerators.cpp . The unhandled cases in this ThunkGenerator (when the exponent is a negative integer, or when the exponent is a double != -1/2) causes OSR exits -> mathProtoFuncPow() is called. (it don't know if it's the good technical term for that) That's not an OSR exit. ;-) That's a slow path call. OSR implies that a piece of compiled code gets replaced, while on the stack, with a different piece of compiled code that does the same thing. powThunkGenerator has a call to mathProtoFuncPow. It's not an exit. ;-)
Build Bot
Comment 10 2013-11-22 10:39:48 PST
Comment on attachment 217695 [details] Patch Attachment 217695 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/33598087 New failing tests: sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A7.html js/math.html sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A8.html
Build Bot
Comment 11 2013-11-22 10:39:50 PST
Created attachment 217699 [details] Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-09 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 12 2013-11-22 11:28:56 PST
Comment on attachment 217695 [details] Patch Attachment 217695 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/34498050 New failing tests: sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A7.html js/math.html sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A8.html
Build Bot
Comment 13 2013-11-22 11:28:58 PST
Created attachment 217704 [details] Archive of layout-test-results from webkit-ews-05 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-05 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Romain Perier
Comment 14 2013-11-23 05:30:39 PST
Build Bot
Comment 15 2013-11-23 06:10:46 PST
Build Bot
Comment 16 2013-11-23 06:21:00 PST
Comment on attachment 217746 [details] Patch Attachment 217746 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/35308130 New failing tests: sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A7.html js/math.html sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A8.html
Build Bot
Comment 17 2013-11-23 06:21:02 PST
Created attachment 217747 [details] Archive of layout-test-results from webkit-ews-15 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-15 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Build Bot
Comment 18 2013-11-23 07:07:19 PST
Comment on attachment 217746 [details] Patch Attachment 217746 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/35368127 New failing tests: sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A7.html js/math.html sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A8.html
Build Bot
Comment 19 2013-11-23 07:07:26 PST
Created attachment 217748 [details] Archive of layout-test-results from webkit-ews-06 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-06 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Build Bot
Comment 20 2013-11-23 08:27:21 PST
Comment on attachment 217746 [details] Patch Attachment 217746 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/35218136 New failing tests: sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A7.html js/math.html sputnik/Conformance/15_Native_Objects/15.8_Math/15.8.2/15.8.2.13_pow/S15.8.2.13_A8.html
Build Bot
Comment 21 2013-11-23 08:27:24 PST
Created attachment 217749 [details] Archive of layout-test-results from webkit-ews-01 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-01 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Geoffrey Garen
Comment 22 2013-11-23 11:38:52 PST
Comment on attachment 217746 [details] Patch Looks like this is failing some Sputnik tests. Do you know why?
Romain Perier
Comment 23 2013-11-24 02:35:23 PST
I will investigate why
Romain Perier
Comment 24 2013-12-02 05:50:27 PST
The tests fail because the cases where the exponent is NaN and infinity are not handled in my code (nothing really complicated). However, I was wondering how "NaN" is encoded in the VM. If I try to test if the exponent is NaN (exponent != exponent or using std::isnan) it does not work. Any ideas ? I did the following tests: - branchDouble(DoubleNotEqual with the exponent => it does not work - I saved the exponent on the stack (using storeDouble) and used std::isnan => it does work work - branchDouble(DoubleEqual ==> does not make sense
Romain Perier
Comment 25 2013-12-02 23:32:48 PST
branchDouble with DoubleNotEqualOrUnordered works ;)
Romain Perier
Comment 26 2013-12-07 10:50:37 PST
Build Bot
Comment 27 2013-12-07 11:32:22 PST
Romain Perier
Comment 28 2013-12-08 03:13:53 PST
Created attachment 218679 [details] performance gain See performance gain above
Romain Perier
Comment 29 2013-12-09 02:33:03 PST
Build Bot
Comment 30 2013-12-09 03:13:37 PST
Benjamin Poulain
Comment 31 2013-12-10 16:25:52 PST
Comment on attachment 218741 [details] Patch Please add a ARMv7 wrapper as well. The "jit.sqrtDouble" generator should be behind jit.supportsFloatingPointSqrt. And Windows does not build.
Romain Perier
Comment 32 2013-12-16 08:17:10 PST
Romain Perier
Comment 33 2013-12-16 08:18:35 PST
Could you check and test this patch on ARM ? Unfortunately I have not ARM devices. Thanks
Romain Perier
Comment 34 2013-12-17 23:54:52 PST
Ping
Benjamin Poulain
Comment 35 2013-12-20 14:05:25 PST
I just tried on ARM device. I get 2.5% better timing on Kraken.
Filip Pizlo
Comment 36 2013-12-20 14:35:35 PST
Comment on attachment 219320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=219320&action=review > Source/JavaScriptCore/ChangeLog:12 > + calls significantly and improves performance between 15% and 185%, > + also, it improves the kraken benchmark (5%). Can you be more specific about which benchmarks are so affected? If any of those benchmarks are available under open source terms, can you put them in LayoutTests/js/regress? > Source/JavaScriptCore/jit/ThunkGenerators.cpp:669 > +#define defineBinaryDoubleOpWrapper(function) \ > + asm( \ > + ".text\n" \ > + ".globl " SYMBOL_STRING(function##Thunk) "\n" \ > + HIDE_SYMBOL(function##Thunk) "\n" \ > + SYMBOL_STRING(function##Thunk) ":" "\n" \ > + "subl $8, %esp\n" \ > + "movsd %xmm0, (%esp) \n" \ > + "subl $8, %esp\n" \ > + "movsd %xmm1, (%esp) \n" \ > + "call " GLOBAL_REFERENCE(function) "\n" \ > + "fstpl (%esp) \n" \ > + "addl $8, %esp\n" \ > + "movsd (%esp), %xmm0 \n" \ > + "addl $8, %esp\n" \ > + "ret\n" \ > + ); \ > + extern "C" { \ > + MathThunkCallingConvention function##Thunk(MathThunkCallingConvention); \ > + } \ > + static MathThunk UnaryDoubleOpWrapper(function) = &function##Thunk; Can you explain this change? > Source/JavaScriptCore/jit/ThunkGenerators.cpp:719 > +#define defineBinaryDoubleOpWrapper(function) \ > + asm( \ > + ".text\n" \ > + ".align 2\n" \ > + ".globl " SYMBOL_STRING(function##Thunk) "\n" \ > + HIDE_SYMBOL(function##Thunk) "\n" \ > + ".thumb\n" \ > + ".thumb_func " THUMB_FUNC_PARAM(function##Thunk) "\n" \ > + SYMBOL_STRING(function##Thunk) ":" "\n" \ > + "push {lr}\n" \ > + "vmov r0, r1, d0\n" \ > + "vmov r2, r3, d1\n" \ > + "blx " GLOBAL_REFERENCE(function) "\n" \ > + "vmov d0, r0, r1\n" \ > + "pop {lr}\n" \ > + "bx lr\n" \ > + ); \ > + extern "C" { \ > + MathThunkCallingConvention function##Thunk(MathThunkCallingConvention); \ > + } \ > + static MathThunk UnaryDoubleOpWrapper(function) = &function##Thunk; > + > +#elif CPU(ARM64) > +#define BinaryDoubleOpWrapper(function) UnaryDoubleOpWrapper(function) > +#define defineBinaryDoubleOpWrapper(function) defineUnaryDoubleOpWrapper(function) And this?
Romain Perier
Comment 37 2013-12-26 23:37:59 PST
Romain Perier
Comment 38 2013-12-26 23:44:12 PST
Romain Perier
Comment 39 2013-12-26 23:52:06 PST
@Benjamin: thanks for your tests @Filip: For PerformanceTests see the attached file "performance gain" (the unit test is at the top of the file) As I said in the changelog, this patch improves the global score of kraken. The main difference is for imaging tests. I am not sure that it does make sense to import Kraken in LayoutTests/js/regress. The kraken benchmark is a big project... (however I did not find informations about the licenses terms)
Michael Saboff
Comment 40 2014-01-15 15:09:13 PST
Comment on attachment 220051 [details] Patch I looked at this specifically for ARM64 correctness. Given that the arguments are already in V registers, this change looks good to me. The 32 bit wrappers look good as well.
Benjamin Poulain
Comment 41 2014-01-27 19:42:01 PST
Comment on attachment 220051 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=220051&action=review > Source/JavaScriptCore/jit/ThunkGenerators.cpp:915 > + jit.neg32(SpecializedThunkJIT::regT0); > + jit.divDouble(SpecializedThunkJIT::fpRegT0, SpecializedThunkJIT::fpRegT1); > + jit.moveDouble(SpecializedThunkJIT::fpRegT1, SpecializedThunkJIT::fpRegT0); > + jit.loadDouble(&oneConstant, SpecializedThunkJIT::fpRegT1); > + exponentIsPositive.link(&jit); I have the feeling jit.neg32(SpecializedThunkJIT::regT0); does not work for INT_MIN (since -INT_MIN > INT_MAX). Am I missing something here? > Source/JavaScriptCore/jit/ThunkGenerators.cpp:947 > + nonIntExponent.link(&jit); > + jit.loadDoubleArgument(1, SpecializedThunkJIT::fpRegT1, SpecializedThunkJIT::regT0); > + jit.loadDouble(&negativeHalfConstant, SpecializedThunkJIT::fpRegT2); > + MacroAssembler::Jump exponentIsInverseSqrt = jit.branchDouble(MacroAssembler::DoubleEqual, SpecializedThunkJIT::fpRegT1, SpecializedThunkJIT::fpRegT2); > > - SpecializedThunkJIT::JumpList doubleResult; > - jit.branchConvertDoubleToInt32(SpecializedThunkJIT::fpRegT1, SpecializedThunkJIT::regT0, doubleResult, SpecializedThunkJIT::fpRegT0); > - jit.returnInt32(SpecializedThunkJIT::regT0); > - doubleResult.link(&jit); > - jit.returnDouble(SpecializedThunkJIT::fpRegT1); > - } else > - jit.appendFailure(nonIntExponent); > + jit.callDoubleToDoublePreservingReturn(BinaryDoubleOpWrapper(jsPow)); > + jit.returnDouble(SpecializedThunkJIT::fpRegT0); > > + exponentIsInverseSqrt.link(&jit); > + jit.loadDouble(&oneConstant, SpecializedThunkJIT::fpRegT1); > + jit.sqrtDouble(SpecializedThunkJIT::fpRegT0, SpecializedThunkJIT::fpRegT0); > + jit.divDouble(SpecializedThunkJIT::fpRegT0, SpecializedThunkJIT::fpRegT1); > + jit.returnDouble(SpecializedThunkJIT::fpRegT1); The branch using jit.sqrtDouble should be behind test for jit.supportsFloatingPointSqrt().
Note You need to log in before you can comment on or make changes to this bug.