RESOLVED FIXED 158416
[JSC] Do not allocate unnecessary UTF-8 string for encodeXXX functions
https://bugs.webkit.org/show_bug.cgi?id=158416
Summary [JSC] Do not allocate unnecessary UTF-8 string for encodeXXX functions
Yusuke Suzuki
Reported 2016-06-06 05:18:41 PDT
...
Attachments
Patch (15.90 KB, patch)
2016-06-06 11:34 PDT, Yusuke Suzuki
no flags
Patch (23.07 KB, patch)
2016-06-06 12:12 PDT, Yusuke Suzuki
no flags
Patch (31.27 KB, patch)
2016-06-07 03:59 PDT, Yusuke Suzuki
ggaren: review+
Yusuke Suzuki
Comment 1 2016-06-06 08:15:36 PDT
Let's make kraken's stanford-crypto-sha256-iterative fast!
Yusuke Suzuki
Comment 2 2016-06-06 11:34:49 PDT
Yusuke Suzuki
Comment 3 2016-06-06 11:42:00 PDT
Ah, I'll fix the patch slightly.
Mark Lam
Comment 4 2016-06-06 11:42:12 PDT
Comment on attachment 280611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:59 > + static_assert(charactersCount > 0, "Since string literal is null terminated, characterCount is always lager than 0"); typo: /lager/larger/.
Yusuke Suzuki
Comment 5 2016-06-06 11:51:18 PDT
Comment on attachment 280611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:59 >> + static_assert(charactersCount > 0, "Since string literal is null terminated, characterCount is always lager than 0"); > > typo: /lager/larger/. Thanks. I'll fix this and upload the patch with build fix (in clang env) & perf results soon :)
Yusuke Suzuki
Comment 6 2016-06-06 12:12:25 PDT
Yusuke Suzuki
Comment 7 2016-06-06 12:14:48 PDT
Benchmark results. stanford-crypto-sha256-iterative shows significant perf improvements. (6.8% win) Octane splay and Kraken imaging-gaussian-blur are noise since they do not execute any of encodeXXX, escape, etc. Benchmark report for SunSpider, Octane, Kraken, and AsmBench on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode/Release/bin/jsc Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched SunSpider: 3d-cube 6.0684+-0.1519 ? 6.1302+-0.2071 ? might be 1.0102x slower 3d-morph 26.1241+-0.5836 26.0951+-0.3270 3d-raytrace 7.0482+-0.3035 ? 7.0791+-0.3320 ? access-binary-trees 2.3861+-0.1070 ? 2.4146+-0.1097 ? might be 1.0119x slower access-fannkuch 7.1236+-0.3433 7.1217+-0.3383 access-nbody 3.0435+-0.3104 2.8864+-0.0151 might be 1.0544x faster access-nsieve 3.2426+-0.0435 ? 3.2448+-0.0426 ? bitops-3bit-bits-in-byte 1.1916+-0.0382 ? 1.2517+-0.0536 ? might be 1.0504x slower bitops-bits-in-byte 2.7368+-0.0837 ? 2.7787+-0.1000 ? might be 1.0153x slower bitops-bitwise-and 1.9845+-0.0075 ? 2.0047+-0.0306 ? might be 1.0102x slower bitops-nsieve-bits 3.3287+-0.1575 ? 3.3993+-0.1863 ? might be 1.0212x slower controlflow-recursive 2.8508+-0.1212 2.7444+-0.1061 might be 1.0388x faster crypto-aes 5.2183+-0.0471 ? 5.2989+-0.1521 ? might be 1.0154x slower crypto-md5 2.7435+-0.1301 ? 2.8538+-0.1488 ? might be 1.0402x slower crypto-sha1 2.5278+-0.0131 ? 2.5597+-0.0562 ? might be 1.0126x slower date-format-tofte 10.8944+-0.0941 ? 11.0237+-0.1755 ? might be 1.0119x slower date-format-xparb 6.1118+-0.3347 5.9302+-0.0388 might be 1.0306x faster math-cordic 3.2625+-0.1810 3.1901+-0.0993 might be 1.0227x faster math-partial-sums 10.5061+-0.0899 ? 10.6684+-0.2468 ? might be 1.0155x slower math-spectral-norm 2.2008+-0.0501 2.1712+-0.0436 might be 1.0136x faster regexp-dna 7.4185+-0.0994 ? 7.5740+-0.4361 ? might be 1.0210x slower string-base64 3.9481+-0.0232 ? 3.9552+-0.0182 ? string-fasta 6.5388+-0.1743 ? 6.5890+-0.2237 ? string-tagcloud 9.7219+-0.0471 ? 9.7571+-0.0502 ? string-unpack-code 19.3886+-0.0646 ? 19.5157+-0.1861 ? string-validate-input 4.2523+-0.0357 ? 4.3758+-0.1318 ? might be 1.0291x slower <arithmetic> 6.2255+-0.0406 ? 6.2544+-0.0353 ? might be 1.0046x slower baseline patched Octane: encrypt 0.17882+-0.00088 ? 0.18106+-0.00170 ? might be 1.0125x slower decrypt 3.04865+-0.01341 ? 3.06151+-0.02088 ? deltablue x2 0.14966+-0.00199 ? 0.15034+-0.00148 ? earley 0.33620+-0.00072 ? 0.33678+-0.00087 ? boyer 5.32185+-0.00853 5.31366+-0.00566 navier-stokes x2 4.78919+-0.01036 ? 4.79750+-0.01353 ? raytrace x2 0.93351+-0.00266 ? 0.93496+-0.00388 ? richards x2 0.09707+-0.00063 ? 0.09737+-0.00076 ? splay x2 0.39284+-0.00116 ! 0.40022+-0.00175 ! definitely 1.0188x slower regexp x2 18.68103+-0.06739 ? 18.81026+-0.07202 ? pdfjs x2 42.34719+-0.29505 ? 42.59288+-0.34460 ? mandreel x2 48.95736+-0.19510 ? 49.30969+-0.47229 ? gbemu x2 38.37419+-1.43813 ? 39.22959+-1.73683 ? might be 1.0223x slower closure 0.60627+-0.00159 ? 0.60960+-0.00346 ? jquery 8.02892+-0.02616 ? 8.03566+-0.06614 ? box2d x2 14.59820+-0.09158 14.56639+-0.08077 zlib x2 365.46616+-4.24645 ? 365.89304+-6.19792 ? typescript x2 808.09541+-4.24194 ? 811.27535+-9.85035 ? <geometric> 5.88747+-0.01472 ? 5.92034+-0.02222 ? might be 1.0056x slower baseline patched Kraken: ai-astar 96.012+-1.005 95.992+-1.086 audio-beat-detection 47.485+-1.704 ? 47.850+-1.849 ? audio-dft 123.378+-0.083 123.305+-0.099 audio-fft 37.455+-0.299 ? 37.527+-0.418 ? audio-oscillator 53.709+-0.248 ? 53.717+-0.149 ? imaging-darkroom 87.996+-0.409 ? 88.391+-1.463 ? imaging-desaturate 56.423+-0.106 ? 56.426+-0.120 ? imaging-gaussian-blur 78.804+-2.630 ? 82.774+-2.971 ? might be 1.0504x slower json-parse-financial 45.491+-0.902 44.573+-0.379 might be 1.0206x faster json-stringify-tinderbox 27.090+-0.215 ^ 26.380+-0.018 ^ definitely 1.0269x faster stanford-crypto-aes 43.428+-0.345 ? 43.688+-0.202 ? stanford-crypto-ccm 44.088+-0.955 43.030+-1.039 might be 1.0246x faster stanford-crypto-pbkdf2 110.344+-1.646 108.544+-0.663 might be 1.0166x faster stanford-crypto-sha256-iterative 37.800+-0.070 ^ 35.374+-0.059 ^ definitely 1.0686x faster <arithmetic> 63.536+-0.331 63.398+-0.345 might be 1.0022x faster baseline patched AsmBench: towers.c 279.8851+-6.9314 276.4902+-0.3251 might be 1.0123x faster n-body.c 931.2372+-6.7748 925.7138+-7.9955 float-mm.c 715.8945+-3.8811 ? 718.8219+-3.6709 ? container.cpp 3001.1172+-7.7309 ? 3041.1363+-61.5339 ? might be 1.0133x slower quicksort.c 442.1756+-0.7509 441.4594+-1.0854 gcc-loops.cpp 4187.1696+-34.3155 ? 4212.1413+-120.0717 ? bigfib.cpp 457.8382+-5.4623 454.7133+-4.0568 hash-map 155.6219+-1.4812 ? 156.7201+-1.7656 ? dry.c 485.8583+-9.8037 474.3855+-9.5142 might be 1.0242x faster <geometric> 690.6315+-2.4262 688.9938+-3.2615 might be 1.0024x faster baseline patched Geomean of preferred means: <scaled-result> 35.6095+-0.0759 ? 35.6597+-0.0882 ? might be 1.0014x slower
Darin Adler
Comment 8 2016-06-06 12:22:57 PDT
Comment on attachment 280611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:72 > + auto throwExeption = [exec] { Typo here: "throwExceptoin" has a "c" in it. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:80 > + const CharacterType* end = characters + length; I think auto* would be good here. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:81 > + for (const CharacterType* cursor = characters; cursor != end; ++cursor) { I think auto* would be good here. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:82 > + CharacterType character = *cursor; I think auto would be good here. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:85 > + if (character < doNotEscape.size() && doNotEscape.get(static_cast<LChar>(character))) { The cast to LChar isn’t needed here. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:147 > + JSString::SafeView view = exec->argument(0).toString(exec)->view(exec); > + if (exec->hadException()) > + return jsUndefined(); Can do the exception check even faster by using toStringOrNull.
Yusuke Suzuki
Comment 9 2016-06-07 03:59:55 PDT
Yusuke Suzuki
Comment 10 2016-06-07 04:01:52 PDT
Comment on attachment 280611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review Thanks! >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:72 >> + auto throwExeption = [exec] { > > Typo here: "throwExceptoin" has a "c" in it. Thanks. Fixed. >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:80 >> + const CharacterType* end = characters + length; > > I think auto* would be good here. OK, changed. >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:81 >> + for (const CharacterType* cursor = characters; cursor != end; ++cursor) { > > I think auto* would be good here. Changed. >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:82 >> + CharacterType character = *cursor; > > I think auto would be good here. Changed. >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:85 >> + if (character < doNotEscape.size() && doNotEscape.get(static_cast<LChar>(character))) { > > The cast to LChar isn’t needed here. Right. Dropped it. > Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:111 > + CharacterType trail = *cursor; Also changed here. >> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:147 >> + return jsUndefined(); > > Can do the exception check even faster by using toStringOrNull. OK, I've introduced helper function toSafeView helper function using toStringOrNull, and use it.
Yusuke Suzuki
Comment 11 2016-06-07 04:04:09 PDT
And updated perf results. Perf-neutral and only kraken stanford-crypto-sha256-iterative poses significant improvement! (6.95% win). Benchmark report for SunSpider, Octane, Kraken, and AsmBench on hanayamata. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode-master/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode/Release/bin/jsc Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched SunSpider: 3d-cube 6.0057+-0.0198 ? 6.1057+-0.1744 ? might be 1.0167x slower 3d-morph 25.8485+-0.1622 25.7666+-0.0607 3d-raytrace 6.8788+-0.2435 ? 6.9953+-0.2588 ? might be 1.0169x slower access-binary-trees 2.3389+-0.0948 ? 2.3906+-0.1032 ? might be 1.0221x slower access-fannkuch 7.1196+-0.3758 6.7184+-0.3016 might be 1.0597x faster access-nbody 2.8874+-0.0546 2.8483+-0.0092 might be 1.0137x faster access-nsieve 3.2575+-0.0572 3.2037+-0.0355 might be 1.0168x faster bitops-3bit-bits-in-byte 1.1926+-0.0412 ? 1.2466+-0.0567 ? might be 1.0452x slower bitops-bits-in-byte 2.8321+-0.1273 2.7691+-0.0905 might be 1.0228x faster bitops-bitwise-and 1.9807+-0.0186 ? 1.9892+-0.0160 ? bitops-nsieve-bits 3.3687+-0.1826 3.2547+-0.1464 might be 1.0350x faster controlflow-recursive 2.8151+-0.1090 2.7681+-0.1113 might be 1.0170x faster crypto-aes 5.2213+-0.0239 5.2048+-0.0263 crypto-md5 2.7792+-0.1280 ? 2.8774+-0.1670 ? might be 1.0353x slower crypto-sha1 2.6602+-0.1036 2.5958+-0.0928 might be 1.0248x faster date-format-tofte 10.8588+-0.1330 10.7968+-0.1359 date-format-xparb 5.9977+-0.0626 ? 6.0072+-0.0642 ? math-cordic 3.2139+-0.1355 ? 3.3180+-0.1822 ? might be 1.0324x slower math-partial-sums 10.4896+-0.0292 ? 10.5342+-0.0272 ? math-spectral-norm 2.1908+-0.0461 2.1884+-0.0515 regexp-dna 7.5171+-0.3227 7.5046+-0.1887 string-base64 3.9660+-0.0404 ? 3.9831+-0.0917 ? string-fasta 6.6948+-0.2576 ? 6.7119+-0.2643 ? string-tagcloud 9.6835+-0.1392 ? 10.0230+-0.4847 ? might be 1.0351x slower string-unpack-code 19.5754+-0.6329 19.2873+-0.2130 might be 1.0149x faster string-validate-input 4.3082+-0.0506 4.3059+-0.1014 <arithmetic> 6.2185+-0.0411 6.2075+-0.0338 might be 1.0018x faster baseline patched Octane: encrypt 0.17891+-0.00163 ? 0.17985+-0.00090 ? decrypt 3.05679+-0.01532 3.05187+-0.02649 deltablue x2 0.14918+-0.00115 0.14914+-0.00097 earley 0.33676+-0.00077 ? 0.33696+-0.00075 ? boyer 5.32338+-0.00652 ? 5.33086+-0.00826 ? navier-stokes x2 4.79202+-0.01165 4.78352+-0.00324 raytrace x2 0.93905+-0.00395 0.93443+-0.00244 richards x2 0.09786+-0.00098 0.09734+-0.00077 splay x2 0.39187+-0.00064 ! 0.39528+-0.00164 ! definitely 1.0087x slower regexp x2 18.67934+-0.04428 ? 18.68563+-0.06320 ? pdfjs x2 42.37457+-0.22313 ? 42.53163+-0.23577 ? mandreel x2 49.18076+-0.30401 49.03960+-0.26038 gbemu x2 39.58926+-1.46815 39.42399+-1.35211 closure 0.59924+-0.00202 0.59759+-0.00236 jquery 8.03726+-0.02229 ^ 7.97781+-0.01915 ^ definitely 1.0075x faster box2d x2 14.77491+-0.14279 ? 14.79194+-0.39388 ? zlib x2 366.10334+-1.85236 ? 370.49943+-8.25235 ? might be 1.0120x slower typescript x2 809.89431+-7.48454 806.59772+-4.38135 <geometric> 5.91064+-0.01874 5.90975+-0.01701 might be 1.0002x faster baseline patched Kraken: ai-astar 96.584+-1.312 95.779+-0.907 audio-beat-detection 47.466+-1.695 ? 47.878+-1.842 ? audio-dft 123.710+-0.387 123.449+-0.240 audio-fft 37.287+-0.019 ? 37.391+-0.214 ? audio-oscillator 53.816+-0.121 53.725+-0.133 imaging-darkroom 87.655+-0.095 ? 88.079+-0.886 ? imaging-desaturate 56.279+-0.059 ? 56.297+-0.059 ? imaging-gaussian-blur 78.756+-2.797 ? 81.308+-5.269 ? might be 1.0324x slower json-parse-financial 44.320+-0.346 ? 44.692+-0.328 ? json-stringify-tinderbox 26.466+-0.078 ? 26.587+-0.063 ? stanford-crypto-aes 43.466+-0.218 ? 43.718+-0.276 ? stanford-crypto-ccm 43.445+-0.793 ? 43.454+-0.768 ? stanford-crypto-pbkdf2 110.258+-0.638 109.057+-1.297 might be 1.0110x faster stanford-crypto-sha256-iterative 37.952+-0.155 ^ 35.484+-0.265 ^ definitely 1.0695x faster <arithmetic> 63.390+-0.289 63.350+-0.409 might be 1.0006x faster baseline patched AsmBench: towers.c 275.9903+-0.5072 275.0513+-0.9836 n-body.c 930.4769+-10.5416 923.1822+-5.2754 float-mm.c 724.1009+-7.4184 721.0275+-3.6279 container.cpp 3022.6193+-19.0074 3018.9513+-13.0584 quicksort.c 441.5230+-2.3981 441.2127+-1.4556 gcc-loops.cpp 4136.4039+-17.9790 ? 4216.1011+-108.8173 ? might be 1.0193x slower bigfib.cpp 452.7611+-3.4175 ? 453.3807+-4.4878 ? hash-map 160.4808+-3.1137 ? 161.1450+-3.3633 ? dry.c 484.4161+-10.0337 ? 492.7318+-27.3824 ? might be 1.0172x slower <geometric> 691.1976+-3.2104 ? 692.5322+-5.3524 ? might be 1.0019x slower baseline patched Geomean of preferred means: <scaled-result> 35.6211+-0.0794 35.6144+-0.0890 might be 1.0002x faster
Geoffrey Garen
Comment 12 2016-06-07 09:54:50 PDT
Comment on attachment 280691 [details] Patch r=me
Yusuke Suzuki
Comment 13 2016-06-07 10:21:48 PDT
Note You need to log in before you can comment on or make changes to this bug.