Bug 146260 - Array.concat should be fast for integer or double arrays
Summary: Array.concat should be fast for integer or double arrays
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-23 16:41 PDT by Ryosuke Niwa
Modified: 2015-07-06 13:20 PDT (History)
7 users (show)

See Also:


Attachments
WIP (6.45 KB, patch)
2015-06-23 16:44 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Patch (5.83 KB, patch)
2015-07-06 01:44 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2015-06-23 16:41:52 PDT
Make it faster!
Comment 1 Ryosuke Niwa 2015-06-23 16:44:05 PDT
Created attachment 255449 [details]
WIP
Comment 2 Ryosuke Niwa 2015-07-06 01:44:29 PDT
Created attachment 256202 [details]
Patch
Comment 3 Ryosuke Niwa 2015-07-06 01:50:09 PDT
                                                           ToT                     Patched                                      
SunSpider:
   3d-cube                                            4.2256+-0.2080            4.1486+-0.1228          might be 1.0186x faster
   3d-morph                                           5.1026+-0.5274            4.8358+-0.1522          might be 1.0552x faster
   3d-raytrace                                        5.3484+-0.6021            4.9348+-0.1349          might be 1.0838x faster
   access-binary-trees                                2.7130+-1.6508            2.1113+-0.2080          might be 1.2850x faster
   access-fannkuch                                    5.0765+-0.1759     ?      5.1047+-0.1920        ?
   access-nbody                                       2.4000+-0.1861            2.3014+-0.0689          might be 1.0428x faster
   access-nsieve                                      3.2500+-0.2123            3.1671+-0.1682          might be 1.0262x faster
   bitops-3bit-bits-in-byte                           1.3497+-0.0580     ?      1.3560+-0.0258        ?
   bitops-bits-in-byte                                3.2062+-0.2491     ?      3.3033+-0.3411        ? might be 1.0303x slower
   bitops-bitwise-and                                 1.9884+-0.3176            1.9120+-0.0617          might be 1.0400x faster
   bitops-nsieve-bits                                 2.9948+-0.6010            2.7932+-0.0961          might be 1.0722x faster
   controlflow-recursive                              2.4005+-0.3053            2.2630+-0.2697          might be 1.0608x faster
   crypto-aes                                         3.8080+-0.4321            3.6125+-0.1643          might be 1.0541x faster
   crypto-md5                                         2.4933+-0.6524            2.3353+-0.1614          might be 1.0677x faster
   crypto-sha1                                        2.9265+-0.7769            2.6860+-0.2184          might be 1.0896x faster
   date-format-tofte                                  6.5455+-0.8765            6.1035+-0.2283          might be 1.0724x faster
   date-format-xparb                                  4.5027+-0.1354     ?      4.6120+-0.9595        ? might be 1.0243x slower
   math-cordic                                        2.5932+-0.1076     ?      2.7600+-0.4219        ? might be 1.0643x slower
   math-partial-sums                                  4.2813+-0.2447     ?      4.3400+-0.2924        ? might be 1.0137x slower
   math-spectral-norm                                 1.8025+-0.1362            1.7240+-0.0404          might be 1.0455x faster
   regexp-dna                                         6.3435+-0.3175            6.2752+-0.5174          might be 1.0109x faster
   string-base64                                      4.2049+-0.1331     ?      4.2401+-0.1050        ?
   string-fasta                                       5.8904+-0.7815            5.8338+-0.5782        
   string-tagcloud                                    8.7347+-1.6620            8.3389+-0.9243          might be 1.0475x faster
   string-unpack-code                                18.7471+-1.1970     ?     19.2507+-2.1459        ? might be 1.0269x slower
   string-validate-input                              4.9433+-0.2834     ?      5.1698+-0.2136        ? might be 1.0458x slower

   <arithmetic>                                       4.5336+-0.1186            4.4428+-0.1392          might be 1.0204x faster

                                                           ToT                     Patched                                      
LongSpider:
   3d-cube                                          732.4911+-17.6962         732.0290+-10.2143       
   3d-morph                                        1430.1607+-18.0872    ?   1434.8452+-26.4035       ?
   3d-raytrace                                      596.0754+-7.6586     ?    599.5426+-1.8276        ?
   access-binary-trees                              781.2598+-2.7399     ?    784.9316+-24.1927       ?
   access-fannkuch                                  271.4528+-24.5880         262.6239+-3.3812          might be 1.0336x faster
   access-nbody                                     485.5085+-4.3272     ?    499.2198+-45.4226       ? might be 1.0282x slower
   access-nsieve                                    353.0665+-5.4384     ?    370.4830+-24.7586       ? might be 1.0493x slower
   bitops-3bit-bits-in-byte                          41.5132+-6.6918           38.5641+-2.4091          might be 1.0765x faster
   bitops-bits-in-byte                               74.2993+-8.7717           74.0218+-6.2145        
   bitops-nsieve-bits                               392.9969+-7.6091          389.0017+-9.2592          might be 1.0103x faster
   controlflow-recursive                            392.9302+-0.8604     ?    409.9299+-18.8462       ? might be 1.0433x slower
   crypto-aes                                       553.3612+-44.8798         552.0107+-6.3426        
   crypto-md5                                       457.0887+-35.6345         444.5355+-2.4949          might be 1.0282x faster
   crypto-sha1                                      603.7267+-1.3899          602.5683+-3.4080        
   date-format-tofte                                475.3430+-6.6626     ?    502.3791+-39.1386       ? might be 1.0569x slower
   date-format-xparb                                606.6652+-17.4584    ?    610.6105+-32.9028       ?
   hash-map                                         156.7812+-2.5357          154.7148+-2.6661          might be 1.0134x faster
   math-cordic                                      463.6342+-11.6011         462.2351+-5.8737        
   math-partial-sums                                394.3937+-3.7936     ?    396.2653+-9.3538        ?
   math-spectral-norm                               539.7730+-43.9003         523.9571+-5.8828          might be 1.0302x faster
   string-base64                                    327.7774+-4.6003          324.7231+-5.5633        
   string-fasta                                     352.8232+-35.0304         344.6918+-3.7037          might be 1.0236x faster
   string-tagcloud                                  173.8855+-4.5186          171.7451+-5.2591          might be 1.0125x faster

   <geometric>                                      373.6247+-6.0726          372.9615+-1.9751          might be 1.0018x faster

                                                           ToT                     Patched                                      
V8Spider:
   crypto                                            48.7357+-4.2400           48.6815+-3.2025        
   deltablue                                         79.6133+-21.4432          78.2777+-7.8502          might be 1.0171x faster
   earley-boyer                                      38.2008+-1.6078     !     40.7530+-0.7233        ! definitely 1.0668x slower
   raytrace                                          34.3981+-12.4652          30.8655+-2.6446          might be 1.1144x faster
   regexp                                            63.7994+-6.6769           60.8007+-4.2471          might be 1.0493x faster
   richards                                          68.0728+-7.1028     ?     68.4914+-5.8730        ?
   splay                                             33.2208+-2.1391     ?     34.9760+-5.7262        ? might be 1.0528x slower

   <geometric>                                       49.3750+-3.5622           49.1452+-1.2383          might be 1.0047x faster

                                                           ToT                     Patched                                      
Octane:
   encrypt                                           0.19499+-0.00326          0.19406+-0.00144       
   decrypt                                           3.20066+-0.02016          3.16506+-0.05610         might be 1.0112x faster
   deltablue                                x2       0.14692+-0.00567    ?     0.15166+-0.00693       ? might be 1.0323x slower
   earley                                            0.27452+-0.04017          0.26486+-0.00502         might be 1.0365x faster
   boyer                                             4.08647+-0.41190          3.98210+-0.07456         might be 1.0262x faster
   navier-stokes                            x2       4.72553+-0.05136          4.69365+-0.01734       
   raytrace                                 x2       1.02673+-0.18760          0.94837+-0.05858         might be 1.0826x faster
   richards                                 x2       0.09792+-0.00469          0.09715+-0.00302       
   splay                                    x2       0.33353+-0.00701    ?     0.33555+-0.00657       ?
   regexp                                   x2      23.97925+-0.36176    ?    24.29159+-2.46113       ? might be 1.0130x slower
   pdfjs                                    x2      36.43494+-1.38061         36.38408+-0.47574       
   mandreel                                 x2      41.94183+-0.89732         41.93924+-0.97913       
   gbemu                                    x2      32.38843+-0.35382    ?    34.03859+-3.46676       ? might be 1.0509x slower
   closure                                           0.52639+-0.01189          0.52277+-0.00248       
   jquery                                            6.64410+-0.07917          6.61718+-0.06709       
   box2d                                    x2       9.43033+-0.09989    ?     9.43754+-0.19752       ?
   zlib                                     x2     368.38127+-17.47167       354.91610+-33.68531        might be 1.0379x faster
   typescript                               x2     627.64178+-28.46865   ?   706.43146+-206.08316     ? might be 1.1255x slower

   <geometric>                                       5.38737+-0.03327    ?     5.40005+-0.13357       ? might be 1.0024x slower

                                                           ToT                     Patched                                      
Kraken:
   ai-astar                                          227.345+-1.622      ^     212.479+-2.620         ^ definitely 1.0700x faster
   audio-beat-detection                               69.711+-1.397             69.674+-0.514         
   audio-dft                                         106.789+-8.551            102.338+-2.838           might be 1.0435x faster
   audio-fft                                          60.339+-9.890      ?      62.430+-1.107         ? might be 1.0347x slower
   audio-oscillator                                   59.859+-3.140      ?      60.376+-3.963         ?
   imaging-darkroom                                   87.248+-2.666      ?      89.140+-5.429         ? might be 1.0217x slower
   imaging-desaturate                                 54.181+-3.706             50.404+-5.320           might be 1.0749x faster
   imaging-gaussian-blur                              79.132+-0.749      ?      79.543+-0.762         ?
   json-parse-financial                               38.369+-2.436      ?      38.392+-2.100         ?
   json-stringify-tinderbox                           23.974+-2.558             23.196+-0.687           might be 1.0335x faster
   stanford-crypto-aes                                51.694+-1.405      ?      51.740+-3.546         ?
   stanford-crypto-ccm                                39.247+-3.695             37.200+-0.773           might be 1.0550x faster
   stanford-crypto-pbkdf2                             93.993+-1.661             90.963+-1.474           might be 1.0333x faster
   stanford-crypto-sha256-iterative                   36.762+-4.860             35.634+-2.017           might be 1.0317x faster

   <arithmetic>                                       73.475+-1.349             71.679+-0.583           might be 1.0250x faster

                                                           ToT                     Patched                                      
JSRegress:
...
   <geometric>                                        7.5597+-0.0689            7.5557+-0.0360          might be 1.0005x faster

                                                           ToT                     Patched                                      
AsmBench:
   bigfib.cpp                                       434.4980+-3.7348     ?    437.9674+-6.1751        ?
   cray.c                                           389.1498+-21.8900         380.0905+-3.2306          might be 1.0238x faster
   dry.c                                            414.1210+-4.7908          411.1771+-1.1559        
   FloatMM.c                                        714.1605+-178.6274        655.5754+-11.8194         might be 1.0894x faster
   gcc-loops.cpp                                   3272.2238+-27.8167    ?   3290.7650+-73.6878       ?
   n-body.c                                         791.6000+-3.8892          787.7072+-2.7899        
   Quicksort.c                                      387.2892+-8.8518     ?    390.2397+-1.9515        ?
   stepanov_container.cpp                          3498.7477+-65.9944    ?   3678.1403+-418.7784      ? might be 1.0513x slower
   Towers.c                                         229.7942+-18.4149         225.6820+-3.1262          might be 1.0182x faster

   <geometric>                                      699.2493+-18.2802         694.5520+-10.5689         might be 1.0068x faster

                                                           ToT                     Patched                                      
CompressionBench:
   huffman                                          206.4354+-4.3874     ^     55.4592+-2.2574        ^ definitely 3.7223x faster
   arithmetic-simple                                253.7523+-3.9196          253.3229+-2.7846        
   arithmetic-precise                               233.4282+-7.2750          230.3280+-3.9559          might be 1.0135x faster
   arithmetic-complex-precise                       230.4389+-2.1971          227.9658+-2.8130          might be 1.0108x faster
   arithmetic-precise-order-0                       267.6912+-26.2054         261.8588+-7.4961          might be 1.0223x faster
   arithmetic-precise-order-1                       306.4816+-43.2585         289.4307+-10.6825         might be 1.0589x faster
   arithmetic-precise-order-2                       325.4543+-5.1850          323.8975+-1.3957        
   arithmetic-simple-order-1                        305.2067+-3.6842     ?    305.6675+-2.6961        ?
   arithmetic-simple-order-2                        353.4729+-4.4366     ?    359.8098+-7.7455        ? might be 1.0179x slower
   lz-string                                        311.3513+-2.5801     ?    318.9263+-24.8698       ? might be 1.0243x slower

   <geometric>                                      275.4657+-2.3287     ^    240.0061+-1.4349        ^ definitely 1.1477x faster

                                                           ToT                     Patched                                      
Geomean of preferred means:
   <scaled-result>                                   51.3268+-0.7692     ^     50.0994+-0.3844        ^ definitely 1.0245x faster

rniwa-rmbp:safari rniwa$
Comment 4 Darin Adler 2015-07-06 09:58:06 PDT
Comment on attachment 256202 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=256202&action=review

> Source/JavaScriptCore/runtime/ArrayPrototype.cpp:529
> +    if (argCount == 1 && previousArray && currentArray && finalArraySize.unsafeGet() < MIN_SPARSE_ARRAY_INDEX) {

I’m not sure why we would limit ourselves to doing this optimization when there are exactly two arrays being concatenated. I think we could easily to do the optimization any time we are concatenating a large number of arrays with compatible types.

I’m not entirely sure why we can’t do this for arrays that end up longer than MIN_SPARSE_ARRAY_INDEX.

> Source/JavaScriptCore/runtime/ArrayPrototype.cpp:532
> +        IndexingType type = JSArray::fastConcatType(exec->vm(), *previousArray, *currentArray);
> +        if (type != NonArray)
> +            return previousArray->fastConcatWith(*exec, *currentArray);

It’s not a great pattern to make all these optimized functions be members of the JSArray class. I believe it’s excessively specific to limit this optimization to JSArray objects. I’m pretty sure that the same optimization would work with two array-like objects.

I also think it’s a shame that the entire fastConcatType function ends up in a header because of this. It should be here, in this cpp file.

An advantage of keeping more of the code here in ArrayPrototype.cpp is that we can write various helper functions here that make these optimized code paths easier to read with less written out.

> Source/JavaScriptCore/runtime/JSArray.cpp:718
> +    ASSERT(thisArraySize + otherArraySize < MIN_SPARSE_ARRAY_INDEX);

Why is this assertion needed?

> Source/JavaScriptCore/runtime/JSArray.cpp:734
> +        auto buffer = resultButterfly.contiguous().data();
> +        memcpy(buffer, m_butterfly->contiguous().data(), sizeof(JSValue) * thisArraySize);
> +        memcpy(buffer + thisArraySize, otherButterfly.contiguous().data(), sizeof(JSValue) * otherArraySize);

Seems icky to merge the Int32 case and the contiguous() case like this. Not at all sure why the double case needs to be separate and the integer one doesn’t.
Comment 5 WebKit Commit Bot 2015-07-06 10:46:32 PDT
Comment on attachment 256202 [details]
Patch

Clearing flags on attachment: 256202

Committed r186358: <http://trac.webkit.org/changeset/186358>
Comment 6 WebKit Commit Bot 2015-07-06 10:46:38 PDT
All reviewed patches have been landed.  Closing bug.
Comment 7 Alex Christensen 2015-07-06 11:15:48 PDT
build fix in http://trac.webkit.org/changeset/186363
Comment 8 Chris Dumez 2015-07-06 13:17:44 PDT
Looks like this broke the GTK build?

[355/2761] Building CXX object Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArrayBufferPrototype.cpp.o
FAILED: /usr/lib/ccache/g++-4.9   -DBUILDING_GTK__=1 -DBUILDING_JavaScriptCore -DBUILDING_WITH_CMAKE=1 -DDATA_DIR=\"share\" -DGETTEXT_PACKAGE=\"WebKit2GTK-4.0\" -DHAVE_CONFIG_H=1 -DJavaScriptCore_EXPORTS -DSTATICALLY_LINKED_WITH_WTF -DUSER_AGENT_GTK_MAJOR_VERSION=601 -DUSER_AGENT_GTK_MINOR_VERSION=1 -DWEBKITGTK_API_VERSION_STRING=\"4.0\" -pipe  -std=c++11 -g -fPIC -I. -I../../Source/JavaScriptCore -I../../Source/JavaScriptCore/API -I../../Source/JavaScriptCore/ForwardingHeaders -I../../Source/JavaScriptCore/assembler -I../../Source/JavaScriptCore/bindings -I../../Source/JavaScriptCore/builtins -I../../Source/JavaScriptCore/bytecode -I../../Source/JavaScriptCore/bytecompiler -I../../Source/JavaScriptCore/dfg -I../../Source/JavaScriptCore/disassembler -I../../Source/JavaScriptCore/ftl -I../../Source/JavaScriptCore/heap -I../../Source/JavaScriptCore/debugger -I../../Source/JavaScriptCore/inspector -I../../Source/JavaScriptCore/inspector/agents -I../../Source/JavaScriptCore/inspector/augmentable -I../../Source/JavaScriptCore/inspector/remote -I../../Source/JavaScriptCore/interpreter -I../../Source/JavaScriptCore/jit -I../../Source/JavaScriptCore/llint -I../../Source/JavaScriptCore/llvm -I../../Source/JavaScriptCore/parser -I../../Source/JavaScriptCore/profiler -I../../Source/JavaScriptCore/replay -I../../Source/JavaScriptCore/runtime -I../../Source/JavaScriptCore/tools -I../../Source/JavaScriptCore/yarr -I../../Source/WTF -IDerivedSources -IDerivedSources/ForwardingHeaders -IDerivedSources/JavaScriptCore -IDerivedSources/JavaScriptCore/inspector -I../../Source -I../../Source/JavaScriptCore/disassembler/udis86 -isystem /usr/include/x86_64-linux-gnu -isystem ../DependenciesGTK/Root/include/glib-2.0 -isystem ../DependenciesGTK/Root/lib64/glib-2.0/include    -Wall -Wextra -Wcast-align -Wformat-security -Wmissing-format-attribute -Wpointer-arith -Wundef -Wwrite-strings -MMD -MT Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -MF Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o.d -o Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -c ../../Source/JavaScriptCore/runtime/JSArray.cpp
../../Source/JavaScriptCore/runtime/JSArray.cpp: In member function ‘JSC::EncodedJSValue JSC::JSArray::fastConcatWith(JSC::ExecState&, JSC::JSArray&)’:
../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: error: no matching function for call to ‘JSC::JSArray::fastConcatType(JSC::JSArray&, JSC::JSArray&)’
     ASSERT(newArrayType == fastConcatType(*this, otherArray));
                                                        ^
../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: note: candidate is:
In file included from ../../Source/JavaScriptCore/runtime/JSArray.cpp:24:0:
../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note: static JSC::IndexingType JSC::JSArray::fastConcatType(JSC::VM&, JSC::JSArray&, JSC::JSArray&)
     static IndexingType fastConcatType(VM& vm, JSArray& firstArray, JSArray& secondArray)
                         ^
../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note:   candidate expects 3 arguments, 2 provided
ninja: build stopped: subcommand failed.
Comment 9 Alex Christensen 2015-07-06 13:20:03 PDT
Just the GTK debug build.  My fix fixes that, too