158619 – [JSC] Inline JSC::toInt32 to improve kraken

Yusuke Suzuki

Reported 2016-06-10 10:00:33 PDT

Linux perf tool analysis on several kraken benchmarks shows that JSC::toInt32 is frequently called. But, the function JSC::toInt32 itself is small one. It's worth inlining.

Yusuke Suzuki

Comment 1 2016-06-10 10:10:24 PDT

Seems good. I'll check Octane, SunSpider, ASMBench and upload the patch. Benchmark report for Kraken on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched ai-astar 96.084+-0.548 96.015+-0.523 audio-beat-detection 48.621+-1.138 48.357+-1.020 audio-dft 123.803+-0.373 123.549+-0.109 audio-fft 37.399+-0.050 37.381+-0.039 audio-oscillator 53.682+-0.071 ? 53.718+-0.082 ? imaging-darkroom 87.673+-0.063 ? 87.724+-0.060 ? imaging-desaturate 56.400+-0.064 56.333+-0.041 imaging-gaussian-blur 79.772+-1.849 ? 79.936+-1.394 ? json-parse-financial 44.649+-0.208 ? 44.997+-0.302 ? json-stringify-tinderbox 27.110+-0.056 26.812+-0.428 might be 1.0111x faster stanford-crypto-aes 43.989+-0.157 ^ 43.001+-0.078 ^ definitely 1.0230x faster stanford-crypto-ccm 44.338+-0.688 ? 44.588+-0.686 ? stanford-crypto-pbkdf2 108.420+-1.044 ^ 106.231+-0.437 ^ definitely 1.0206x faster stanford-crypto-sha256-iterative 35.538+-0.077 ^ 35.314+-0.037 ^ definitely 1.0063x faster <arithmetic> 63.391+-0.168 63.140+-0.140 might be 1.0040x faster

Yusuke Suzuki

Comment 2 2016-06-10 21:13:33 PDT

Created attachment 281075 [details] Patch

Yusuke Suzuki

Comment 3 2016-06-10 21:16:47 PDT

Performance results. mandreel seems noise. To ensure that, we retake the mandreel 100 times, and the result is the following. Benchmark report for Octane on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched mandreel x2 49.08879+-0.28178 48.86048+-0.10656 <geometric> 49.08879+-0.28178 48.86048+-0.10656 might be 1.0047x faster Benchmark report for SunSpider, Octane, Kraken, and AsmBench on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched SunSpider: 3d-cube 6.1564+-0.2165 6.0395+-0.0198 might be 1.0194x faster 3d-morph 25.7981+-0.1100 ? 25.9357+-0.1825 ? 3d-raytrace 7.4267+-0.5293 6.8834+-0.2116 might be 1.0789x faster access-binary-trees 2.3713+-0.0980 ? 2.4013+-0.1151 ? might be 1.0127x slower access-fannkuch 6.9854+-0.3202 6.7167+-0.2763 might be 1.0400x faster access-nbody 2.8520+-0.0103 ? 2.9024+-0.0567 ? might be 1.0177x slower access-nsieve 3.2326+-0.0544 3.2281+-0.0483 bitops-3bit-bits-in-byte 1.2229+-0.0422 ? 1.2335+-0.0501 ? bitops-bits-in-byte 2.8118+-0.1010 2.7639+-0.0888 might be 1.0173x faster bitops-bitwise-and 2.0064+-0.0246 1.9841+-0.0082 might be 1.0112x faster bitops-nsieve-bits 3.5306+-0.2168 3.3985+-0.1914 might be 1.0389x faster controlflow-recursive 2.8202+-0.0975 ? 2.8671+-0.1323 ? might be 1.0166x slower crypto-aes 5.2740+-0.1032 ? 5.3000+-0.2027 ? crypto-md5 2.7913+-0.1429 2.7811+-0.1261 crypto-sha1 2.5246+-0.0264 ! 2.6164+-0.0640 ! definitely 1.0364x slower date-format-tofte 10.3640+-0.1394 ? 10.4484+-0.1300 ? date-format-xparb 6.0335+-0.1035 ? 6.0831+-0.1376 ? math-cordic 3.0874+-0.0865 ? 3.1688+-0.1189 ? might be 1.0264x slower math-partial-sums 10.5272+-0.0396 ? 10.5978+-0.1326 ? math-spectral-norm 2.1891+-0.0521 ? 2.2340+-0.0519 ? might be 1.0205x slower regexp-dna 7.3567+-0.0223 ? 7.3860+-0.0869 ? string-base64 3.9486+-0.0283 ? 4.0317+-0.0774 ? might be 1.0211x slower string-fasta 6.6015+-0.2154 6.4747+-0.1874 might be 1.0196x faster string-tagcloud 9.7379+-0.0710 9.6021+-0.1486 might be 1.0141x faster string-unpack-code 19.7744+-0.0478 ? 19.8322+-0.2298 ? string-validate-input 4.2980+-0.1012 ? 4.3578+-0.1089 ? might be 1.0139x slower <arithmetic> 6.2201+-0.0308 6.2026+-0.0274 might be 1.0028x faster baseline patched Octane: encrypt 0.17962+-0.00241 ? 0.18033+-0.00120 ? decrypt 2.98068+-0.01640 ? 3.00958+-0.04567 ? deltablue x2 0.15236+-0.00144 0.15025+-0.00117 might be 1.0141x faster earley 0.33712+-0.00074 ? 0.33722+-0.00079 ? boyer 5.32422+-0.00564 ? 5.32558+-0.00841 ? navier-stokes x2 4.78881+-0.00693 ? 4.79320+-0.01528 ? raytrace x2 0.93551+-0.00362 0.93233+-0.00309 richards x2 0.09723+-0.00092 ? 0.09801+-0.00167 ? splay x2 0.39334+-0.00123 ? 0.39364+-0.00119 ? regexp x2 18.71392+-0.06857 18.65126+-0.05944 pdfjs x2 42.39686+-0.35091 42.31642+-0.30435 mandreel x2 48.68664+-0.15582 ! 49.25694+-0.29339 ! definitely 1.0117x slower gbemu x2 39.31304+-1.31894 38.79220+-1.50068 might be 1.0134x faster closure 0.60327+-0.00395 0.60178+-0.00309 jquery 8.02553+-0.02883 ? 8.08426+-0.14237 ? box2d x2 14.63042+-0.09236 ? 14.69277+-0.19818 ? zlib x2 368.45610+-5.24313 364.03090+-2.28827 might be 1.0122x faster typescript x2 810.87181+-4.19509 810.07914+-4.30712 <geometric> 5.90654+-0.01651 5.90027+-0.01901 might be 1.0011x faster baseline patched Kraken: ai-astar 95.966+-1.109 ? 96.090+-0.925 ? audio-beat-detection 47.492+-1.701 46.657+-1.232 might be 1.0179x faster audio-dft 123.721+-0.402 ? 124.373+-0.666 ? audio-fft 37.322+-0.036 37.276+-0.015 audio-oscillator 53.789+-0.285 53.722+-0.157 imaging-darkroom 88.159+-0.917 87.730+-0.094 imaging-desaturate 56.332+-0.064 ? 56.345+-0.104 ? imaging-gaussian-blur 80.524+-2.687 79.091+-2.555 might be 1.0181x faster json-parse-financial 44.610+-0.441 ? 44.791+-0.325 ? json-stringify-tinderbox 27.087+-0.035 ^ 26.609+-0.144 ^ definitely 1.0180x faster stanford-crypto-aes 43.669+-0.210 ^ 42.862+-0.115 ^ definitely 1.0188x faster stanford-crypto-ccm 45.213+-1.424 44.490+-1.290 might be 1.0162x faster stanford-crypto-pbkdf2 107.665+-0.581 ^ 106.229+-0.807 ^ definitely 1.0135x faster stanford-crypto-sha256-iterative 35.470+-0.112 35.351+-0.078 <arithmetic> 63.359+-0.230 62.972+-0.281 might be 1.0061x faster baseline patched AsmBench: towers.c 280.7607+-10.5500 276.5852+-2.2406 might be 1.0151x faster n-body.c 930.6878+-6.5519 924.9891+-6.5493 float-mm.c 718.3752+-4.3448 ? 723.1201+-5.6769 ? container.cpp 3029.7668+-37.4858 ? 3048.7741+-25.0556 ? quicksort.c 439.4318+-1.6382 439.2427+-1.8137 gcc-loops.cpp 4163.1676+-18.3908 ? 4175.9266+-52.8059 ? bigfib.cpp 456.7496+-6.6840 ? 457.0188+-5.5533 ? hash-map 153.3622+-1.5466 ? 153.5558+-1.7462 ? dry.c 478.1047+-12.4209 477.8930+-9.9328 <geometric> 688.1226+-3.0102 688.0901+-2.4242 might be 1.0000x faster baseline patched Geomean of preferred means: <scaled-result> 35.5738+-0.0610 35.4851+-0.0812 might be 1.0025x faster

Yusuke Suzuki

Comment 4 2016-06-10 21:33:39 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review > Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1062 > +ALWAYS_INLINE int32_t toInt32(double number) It may be good to change this ALWAYS_INLINE to inline to rely on the compiler's decision. I'll check this.

Yusuke Suzuki

Comment 5 2016-06-10 21:57:48 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review >> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1062 >> +ALWAYS_INLINE int32_t toInt32(double number) > > It may be good to change this ALWAYS_INLINE to inline to rely on the compiler's decision. I'll check this. Performance results recommend us to use ALWAYS_INLINE for this toInt32 function. So use ALWAYS_INLINE. Benchmark report for Kraken on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc "always" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf2/Release/bin/jsc Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched always always v. baseline stanford-crypto-pbkdf2 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster <arithmetic> 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster

Mark Lam

Comment 6 2016-06-10 22:06:46 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review r=me with fix. >>> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1062 >>> +ALWAYS_INLINE int32_t toInt32(double number) >> >> It may be good to change this ALWAYS_INLINE to inline to rely on the compiler's decision. I'll check this. > > Performance results recommend us to use ALWAYS_INLINE for this toInt32 function. So use ALWAYS_INLINE. > > Benchmark report for Kraken on hanayamata. > > VMs tested: > "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc > "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc > "always" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf2/Release/bin/jsc > > Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. > Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level > timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. > > baseline patched always always v. baseline > > stanford-crypto-pbkdf2 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster > > <arithmetic> 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster I know that this function came from JSCJSValue.cpp, but I think a better place to put it is in runtime/MathCommon.h since it doesn't have anything to do with JSValues.

Yusuke Suzuki

Comment 7 2016-06-10 22:10:02 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review >>>> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1062 >>>> +ALWAYS_INLINE int32_t toInt32(double number) >>> >>> It may be good to change this ALWAYS_INLINE to inline to rely on the compiler's decision. I'll check this. >> >> Performance results recommend us to use ALWAYS_INLINE for this toInt32 function. So use ALWAYS_INLINE. >> >> Benchmark report for Kraken on hanayamata. >> >> VMs tested: >> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc >> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc >> "always" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf2/Release/bin/jsc >> >> Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. >> Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level >> timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. >> >> baseline patched always always v. baseline >> >> stanford-crypto-pbkdf2 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster >> >> <arithmetic> 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster > > I know that this function came from JSCJSValue.cpp, but I think a better place to put it is in runtime/MathCommon.h since it doesn't have anything to do with JSValues. Right! I'll move this to MathCommon.h (And that is also the better place to put operationToInt32 since operationMathPow resides there).

Yusuke Suzuki

Comment 8 2016-06-10 22:29:10 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review >>>>> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1062 >>>>> +ALWAYS_INLINE int32_t toInt32(double number) >>>> >>>> It may be good to change this ALWAYS_INLINE to inline to rely on the compiler's decision. I'll check this. >>> >>> Performance results recommend us to use ALWAYS_INLINE for this toInt32 function. So use ALWAYS_INLINE. >>> >>> Benchmark report for Kraken on hanayamata. >>> >>> VMs tested: >>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf-master/Release/bin/jsc >>> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf/Release/bin/jsc >>> "always" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/perf2/Release/bin/jsc >>> >>> Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. >>> Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level >>> timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. >>> >>> baseline patched always always v. baseline >>> >>> stanford-crypto-pbkdf2 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster >>> >>> <arithmetic> 107.863+-0.376 ^ 106.828+-0.409 106.087+-0.450 ^ definitely 1.0167x faster >> >> I know that this function came from JSCJSValue.cpp, but I think a better place to put it is in runtime/MathCommon.h since it doesn't have anything to do with JSValues. > > Right! I'll move this to MathCommon.h (And that is also the better place to put operationToInt32 since operationMathPow resides there). Looking into MathCommon, it includes several headers to gain some define flags. But I think these flags should be moved to WTF/Compiler.h. I'll create a new patch later.

Yusuke Suzuki

Comment 9 2016-06-10 22:30:20 PDT

Committed r201964: <http://trac.webkit.org/changeset/201964>

Yusuke Suzuki

Comment 10 2016-06-10 23:00:20 PDT

Committed r201966: <http://trac.webkit.org/changeset/201966>

Darin Adler

Comment 11 2016-06-11 12:35:02 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review > Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1057 > +// of the resulting bit-pattern (as such this metod is also called to implement Typo here "metod". > Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1071 > + // Note this case handles 0, -0, and all infinte, NaN, & denormal value. Typo here "infinte"

Yusuke Suzuki

Comment 12 2016-06-13 09:47:28 PDT

Comment on attachment 281075 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281075&action=review Thanks! I'll land the follow up patch that fixes pointed typos. >> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1057 >> +// of the resulting bit-pattern (as such this metod is also called to implement > > Typo here "metod". Thanks! Fixed in the follow up patch. >> Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1071 >> + // Note this case handles 0, -0, and all infinte, NaN, & denormal value. > > Typo here "infinte" Fixed.

Yusuke Suzuki

Comment 13 2016-06-13 09:51:05 PDT

Committed r201993: <http://trac.webkit.org/changeset/201993>

Attachments
Patch (13.29 KB, patch) 2016-06-10 21:13 PDT, Yusuke Suzuki	mark.lam: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.