Bug 158416

Summary: [JSC] Do not allocate unnecessary UTF-8 string for encodeXXX functions
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, commit-queue, fpizlo, ggaren, keith_miller, mark.lam, msaboff, saam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch ggaren: review+

Description Yusuke Suzuki 2016-06-06 05:18:41 PDT
...
Comment 1 Yusuke Suzuki 2016-06-06 08:15:36 PDT
Let's make kraken's stanford-crypto-sha256-iterative fast!
Comment 2 Yusuke Suzuki 2016-06-06 11:34:49 PDT
Created attachment 280611 [details]
Patch
Comment 3 Yusuke Suzuki 2016-06-06 11:42:00 PDT
Ah, I'll fix the patch slightly.
Comment 4 Mark Lam 2016-06-06 11:42:12 PDT
Comment on attachment 280611 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:59
> +    static_assert(charactersCount > 0, "Since string literal is null terminated, characterCount is always lager than 0");

typo: /lager/larger/.
Comment 5 Yusuke Suzuki 2016-06-06 11:51:18 PDT
Comment on attachment 280611 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:59
>> +    static_assert(charactersCount > 0, "Since string literal is null terminated, characterCount is always lager than 0");
> 
> typo: /lager/larger/.

Thanks. I'll fix this and upload the patch with build fix (in clang env) & perf results soon :)
Comment 6 Yusuke Suzuki 2016-06-06 12:12:25 PDT
Created attachment 280616 [details]
Patch
Comment 7 Yusuke Suzuki 2016-06-06 12:14:48 PDT
Benchmark results. stanford-crypto-sha256-iterative shows significant perf improvements. (6.8% win)
Octane splay and Kraken imaging-gaussian-blur are noise since they do not execute any of encodeXXX, escape, etc.


Benchmark report for SunSpider, Octane, Kraken, and AsmBench on hanayamata.

VMs tested:
"baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode-master/Release/bin/jsc
"patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode/Release/bin/jsc

Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                 baseline                  patched                                      
SunSpider:
   3d-cube                                    6.0684+-0.1519     ?      6.1302+-0.2071        ? might be 1.0102x slower
   3d-morph                                  26.1241+-0.5836           26.0951+-0.3270        
   3d-raytrace                                7.0482+-0.3035     ?      7.0791+-0.3320        ?
   access-binary-trees                        2.3861+-0.1070     ?      2.4146+-0.1097        ? might be 1.0119x slower
   access-fannkuch                            7.1236+-0.3433            7.1217+-0.3383        
   access-nbody                               3.0435+-0.3104            2.8864+-0.0151          might be 1.0544x faster
   access-nsieve                              3.2426+-0.0435     ?      3.2448+-0.0426        ?
   bitops-3bit-bits-in-byte                   1.1916+-0.0382     ?      1.2517+-0.0536        ? might be 1.0504x slower
   bitops-bits-in-byte                        2.7368+-0.0837     ?      2.7787+-0.1000        ? might be 1.0153x slower
   bitops-bitwise-and                         1.9845+-0.0075     ?      2.0047+-0.0306        ? might be 1.0102x slower
   bitops-nsieve-bits                         3.3287+-0.1575     ?      3.3993+-0.1863        ? might be 1.0212x slower
   controlflow-recursive                      2.8508+-0.1212            2.7444+-0.1061          might be 1.0388x faster
   crypto-aes                                 5.2183+-0.0471     ?      5.2989+-0.1521        ? might be 1.0154x slower
   crypto-md5                                 2.7435+-0.1301     ?      2.8538+-0.1488        ? might be 1.0402x slower
   crypto-sha1                                2.5278+-0.0131     ?      2.5597+-0.0562        ? might be 1.0126x slower
   date-format-tofte                         10.8944+-0.0941     ?     11.0237+-0.1755        ? might be 1.0119x slower
   date-format-xparb                          6.1118+-0.3347            5.9302+-0.0388          might be 1.0306x faster
   math-cordic                                3.2625+-0.1810            3.1901+-0.0993          might be 1.0227x faster
   math-partial-sums                         10.5061+-0.0899     ?     10.6684+-0.2468        ? might be 1.0155x slower
   math-spectral-norm                         2.2008+-0.0501            2.1712+-0.0436          might be 1.0136x faster
   regexp-dna                                 7.4185+-0.0994     ?      7.5740+-0.4361        ? might be 1.0210x slower
   string-base64                              3.9481+-0.0232     ?      3.9552+-0.0182        ?
   string-fasta                               6.5388+-0.1743     ?      6.5890+-0.2237        ?
   string-tagcloud                            9.7219+-0.0471     ?      9.7571+-0.0502        ?
   string-unpack-code                        19.3886+-0.0646     ?     19.5157+-0.1861        ?
   string-validate-input                      4.2523+-0.0357     ?      4.3758+-0.1318        ? might be 1.0291x slower

   <arithmetic>                               6.2255+-0.0406     ?      6.2544+-0.0353        ? might be 1.0046x slower

                                                 baseline                  patched                                      
Octane:
   encrypt                                   0.17882+-0.00088    ?     0.18106+-0.00170       ? might be 1.0125x slower
   decrypt                                   3.04865+-0.01341    ?     3.06151+-0.02088       ?
   deltablue                        x2       0.14966+-0.00199    ?     0.15034+-0.00148       ?
   earley                                    0.33620+-0.00072    ?     0.33678+-0.00087       ?
   boyer                                     5.32185+-0.00853          5.31366+-0.00566       
   navier-stokes                    x2       4.78919+-0.01036    ?     4.79750+-0.01353       ?
   raytrace                         x2       0.93351+-0.00266    ?     0.93496+-0.00388       ?
   richards                         x2       0.09707+-0.00063    ?     0.09737+-0.00076       ?
   splay                            x2       0.39284+-0.00116    !     0.40022+-0.00175       ! definitely 1.0188x slower
   regexp                           x2      18.68103+-0.06739    ?    18.81026+-0.07202       ?
   pdfjs                            x2      42.34719+-0.29505    ?    42.59288+-0.34460       ?
   mandreel                         x2      48.95736+-0.19510    ?    49.30969+-0.47229       ?
   gbemu                            x2      38.37419+-1.43813    ?    39.22959+-1.73683       ? might be 1.0223x slower
   closure                                   0.60627+-0.00159    ?     0.60960+-0.00346       ?
   jquery                                    8.02892+-0.02616    ?     8.03566+-0.06614       ?
   box2d                            x2      14.59820+-0.09158         14.56639+-0.08077       
   zlib                             x2     365.46616+-4.24645    ?   365.89304+-6.19792       ?
   typescript                       x2     808.09541+-4.24194    ?   811.27535+-9.85035       ?

   <geometric>                               5.88747+-0.01472    ?     5.92034+-0.02222       ? might be 1.0056x slower

                                                 baseline                  patched                                      
Kraken:
   ai-astar                                   96.012+-1.005             95.992+-1.086         
   audio-beat-detection                       47.485+-1.704      ?      47.850+-1.849         ?
   audio-dft                                 123.378+-0.083            123.305+-0.099         
   audio-fft                                  37.455+-0.299      ?      37.527+-0.418         ?
   audio-oscillator                           53.709+-0.248      ?      53.717+-0.149         ?
   imaging-darkroom                           87.996+-0.409      ?      88.391+-1.463         ?
   imaging-desaturate                         56.423+-0.106      ?      56.426+-0.120         ?
   imaging-gaussian-blur                      78.804+-2.630      ?      82.774+-2.971         ? might be 1.0504x slower
   json-parse-financial                       45.491+-0.902             44.573+-0.379           might be 1.0206x faster
   json-stringify-tinderbox                   27.090+-0.215      ^      26.380+-0.018         ^ definitely 1.0269x faster
   stanford-crypto-aes                        43.428+-0.345      ?      43.688+-0.202         ?
   stanford-crypto-ccm                        44.088+-0.955             43.030+-1.039           might be 1.0246x faster
   stanford-crypto-pbkdf2                    110.344+-1.646            108.544+-0.663           might be 1.0166x faster
   stanford-crypto-sha256-iterative           37.800+-0.070      ^      35.374+-0.059         ^ definitely 1.0686x faster

   <arithmetic>                               63.536+-0.331             63.398+-0.345           might be 1.0022x faster

                                                 baseline                  patched                                      
AsmBench:
   towers.c                                 279.8851+-6.9314          276.4902+-0.3251          might be 1.0123x faster
   n-body.c                                 931.2372+-6.7748          925.7138+-7.9955        
   float-mm.c                               715.8945+-3.8811     ?    718.8219+-3.6709        ?
   container.cpp                           3001.1172+-7.7309     ?   3041.1363+-61.5339       ? might be 1.0133x slower
   quicksort.c                              442.1756+-0.7509          441.4594+-1.0854        
   gcc-loops.cpp                           4187.1696+-34.3155    ?   4212.1413+-120.0717      ?
   bigfib.cpp                               457.8382+-5.4623          454.7133+-4.0568        
   hash-map                                 155.6219+-1.4812     ?    156.7201+-1.7656        ?
   dry.c                                    485.8583+-9.8037          474.3855+-9.5142          might be 1.0242x faster

   <geometric>                              690.6315+-2.4262          688.9938+-3.2615          might be 1.0024x faster

                                                 baseline                  patched                                      
Geomean of preferred means:
   <scaled-result>                           35.6095+-0.0759     ?     35.6597+-0.0882        ? might be 1.0014x slower
Comment 8 Darin Adler 2016-06-06 12:22:57 PDT
Comment on attachment 280611 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:72
> +    auto throwExeption = [exec] {

Typo here: "throwExceptoin" has a "c" in it.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:80
> +    const CharacterType* end = characters + length;

I think auto* would be good here.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:81
> +    for (const CharacterType* cursor = characters; cursor != end; ++cursor) {

I think auto* would be good here.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:82
> +        CharacterType character = *cursor;

I think auto would be good here.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:85
> +        if (character < doNotEscape.size() && doNotEscape.get(static_cast<LChar>(character))) {

The cast to LChar isn’t needed here.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:147
> +    JSString::SafeView view = exec->argument(0).toString(exec)->view(exec);
> +    if (exec->hadException())
> +        return jsUndefined();

Can do the exception check even faster by using toStringOrNull.
Comment 9 Yusuke Suzuki 2016-06-07 03:59:55 PDT
Created attachment 280691 [details]
Patch
Comment 10 Yusuke Suzuki 2016-06-07 04:01:52 PDT
Comment on attachment 280611 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280611&action=review

Thanks!

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:72
>> +    auto throwExeption = [exec] {
> 
> Typo here: "throwExceptoin" has a "c" in it.

Thanks. Fixed.

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:80
>> +    const CharacterType* end = characters + length;
> 
> I think auto* would be good here.

OK, changed.

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:81
>> +    for (const CharacterType* cursor = characters; cursor != end; ++cursor) {
> 
> I think auto* would be good here.

Changed.

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:82
>> +        CharacterType character = *cursor;
> 
> I think auto would be good here.

Changed.

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:85
>> +        if (character < doNotEscape.size() && doNotEscape.get(static_cast<LChar>(character))) {
> 
> The cast to LChar isn’t needed here.

Right. Dropped it.

> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:111
> +            CharacterType trail = *cursor;

Also changed here.

>> Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:147
>> +        return jsUndefined();
> 
> Can do the exception check even faster by using toStringOrNull.

OK, I've introduced helper function toSafeView helper function using toStringOrNull, and use it.
Comment 11 Yusuke Suzuki 2016-06-07 04:04:09 PDT
And updated perf results. Perf-neutral and only kraken stanford-crypto-sha256-iterative poses significant improvement! (6.95% win).

Benchmark report for SunSpider, Octane, Kraken, and AsmBench on hanayamata.

VMs tested:
"baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode-master/Release/bin/jsc
"patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/encode/Release/bin/jsc

Collected 30 samples per benchmark/VM, with 30 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                 baseline                  patched
SunSpider:
   3d-cube                                    6.0057+-0.0198     ?      6.1057+-0.1744        ? might be 1.0167x slower
   3d-morph                                  25.8485+-0.1622           25.7666+-0.0607
   3d-raytrace                                6.8788+-0.2435     ?      6.9953+-0.2588        ? might be 1.0169x slower
   access-binary-trees                        2.3389+-0.0948     ?      2.3906+-0.1032        ? might be 1.0221x slower
   access-fannkuch                            7.1196+-0.3758            6.7184+-0.3016          might be 1.0597x faster
   access-nbody                               2.8874+-0.0546            2.8483+-0.0092          might be 1.0137x faster
   access-nsieve                              3.2575+-0.0572            3.2037+-0.0355          might be 1.0168x faster
   bitops-3bit-bits-in-byte                   1.1926+-0.0412     ?      1.2466+-0.0567        ? might be 1.0452x slower
   bitops-bits-in-byte                        2.8321+-0.1273            2.7691+-0.0905          might be 1.0228x faster
   bitops-bitwise-and                         1.9807+-0.0186     ?      1.9892+-0.0160        ?
   bitops-nsieve-bits                         3.3687+-0.1826            3.2547+-0.1464          might be 1.0350x faster
   controlflow-recursive                      2.8151+-0.1090            2.7681+-0.1113          might be 1.0170x faster
   crypto-aes                                 5.2213+-0.0239            5.2048+-0.0263
   crypto-md5                                 2.7792+-0.1280     ?      2.8774+-0.1670        ? might be 1.0353x slower
   crypto-sha1                                2.6602+-0.1036            2.5958+-0.0928          might be 1.0248x faster
   date-format-tofte                         10.8588+-0.1330           10.7968+-0.1359
   date-format-xparb                          5.9977+-0.0626     ?      6.0072+-0.0642        ?
   math-cordic                                3.2139+-0.1355     ?      3.3180+-0.1822        ? might be 1.0324x slower
   math-partial-sums                         10.4896+-0.0292     ?     10.5342+-0.0272        ?
   math-spectral-norm                         2.1908+-0.0461            2.1884+-0.0515
   regexp-dna                                 7.5171+-0.3227            7.5046+-0.1887
   string-base64                              3.9660+-0.0404     ?      3.9831+-0.0917        ?
   string-fasta                               6.6948+-0.2576     ?      6.7119+-0.2643        ?
   string-tagcloud                            9.6835+-0.1392     ?     10.0230+-0.4847        ? might be 1.0351x slower
   string-unpack-code                        19.5754+-0.6329           19.2873+-0.2130          might be 1.0149x faster
   string-validate-input                      4.3082+-0.0506            4.3059+-0.1014

   <arithmetic>                               6.2185+-0.0411            6.2075+-0.0338          might be 1.0018x faster

                                                 baseline                  patched
Octane:
   encrypt                                   0.17891+-0.00163    ?     0.17985+-0.00090       ?
   decrypt                                   3.05679+-0.01532          3.05187+-0.02649
   deltablue                        x2       0.14918+-0.00115          0.14914+-0.00097
   earley                                    0.33676+-0.00077    ?     0.33696+-0.00075       ?
   boyer                                     5.32338+-0.00652    ?     5.33086+-0.00826       ?
   navier-stokes                    x2       4.79202+-0.01165          4.78352+-0.00324
   raytrace                         x2       0.93905+-0.00395          0.93443+-0.00244
   richards                         x2       0.09786+-0.00098          0.09734+-0.00077
   splay                            x2       0.39187+-0.00064    !     0.39528+-0.00164       ! definitely 1.0087x slower
   regexp                           x2      18.67934+-0.04428    ?    18.68563+-0.06320       ?
   pdfjs                            x2      42.37457+-0.22313    ?    42.53163+-0.23577       ?
   mandreel                         x2      49.18076+-0.30401         49.03960+-0.26038
   gbemu                            x2      39.58926+-1.46815         39.42399+-1.35211
   closure                                   0.59924+-0.00202          0.59759+-0.00236
   jquery                                    8.03726+-0.02229    ^     7.97781+-0.01915       ^ definitely 1.0075x faster
   box2d                            x2      14.77491+-0.14279    ?    14.79194+-0.39388       ?
   zlib                             x2     366.10334+-1.85236    ?   370.49943+-8.25235       ? might be 1.0120x slower
   typescript                       x2     809.89431+-7.48454        806.59772+-4.38135

   <geometric>                               5.91064+-0.01874          5.90975+-0.01701         might be 1.0002x faster

                                                 baseline                  patched
Kraken:
   ai-astar                                   96.584+-1.312             95.779+-0.907
   audio-beat-detection                       47.466+-1.695      ?      47.878+-1.842         ?
   audio-dft                                 123.710+-0.387            123.449+-0.240
   audio-fft                                  37.287+-0.019      ?      37.391+-0.214         ?
   audio-oscillator                           53.816+-0.121             53.725+-0.133
   imaging-darkroom                           87.655+-0.095      ?      88.079+-0.886         ?
   imaging-desaturate                         56.279+-0.059      ?      56.297+-0.059         ?
   imaging-gaussian-blur                      78.756+-2.797      ?      81.308+-5.269         ? might be 1.0324x slower
   json-parse-financial                       44.320+-0.346      ?      44.692+-0.328         ?
   json-stringify-tinderbox                   26.466+-0.078      ?      26.587+-0.063         ?
   stanford-crypto-aes                        43.466+-0.218      ?      43.718+-0.276         ?
   stanford-crypto-ccm                        43.445+-0.793      ?      43.454+-0.768         ?
   stanford-crypto-pbkdf2                    110.258+-0.638            109.057+-1.297           might be 1.0110x faster
   stanford-crypto-sha256-iterative           37.952+-0.155      ^      35.484+-0.265         ^ definitely 1.0695x faster

   <arithmetic>                               63.390+-0.289             63.350+-0.409           might be 1.0006x faster

                                                 baseline                  patched
AsmBench:
   towers.c                                 275.9903+-0.5072          275.0513+-0.9836
   n-body.c                                 930.4769+-10.5416         923.1822+-5.2754
   float-mm.c                               724.1009+-7.4184          721.0275+-3.6279
   container.cpp                           3022.6193+-19.0074        3018.9513+-13.0584
   quicksort.c                              441.5230+-2.3981          441.2127+-1.4556
   gcc-loops.cpp                           4136.4039+-17.9790    ?   4216.1011+-108.8173      ? might be 1.0193x slower
   bigfib.cpp                               452.7611+-3.4175     ?    453.3807+-4.4878        ?
   hash-map                                 160.4808+-3.1137     ?    161.1450+-3.3633        ?
   dry.c                                    484.4161+-10.0337    ?    492.7318+-27.3824       ? might be 1.0172x slower

   <geometric>                              691.1976+-3.2104     ?    692.5322+-5.3524        ? might be 1.0019x slower

                                                 baseline                  patched
Geomean of preferred means:
   <scaled-result>                           35.6211+-0.0794           35.6144+-0.0890          might be 1.0002x faster
Comment 12 Geoffrey Garen 2016-06-07 09:54:50 PDT
Comment on attachment 280691 [details]
Patch

r=me
Comment 13 Yusuke Suzuki 2016-06-07 10:21:48 PDT
Committed r201756: <http://trac.webkit.org/changeset/201756>