RESOLVED FIXED 146260
Array.concat should be fast for integer or double arrays
https://bugs.webkit.org/show_bug.cgi?id=146260
Summary Array.concat should be fast for integer or double arrays
Ryosuke Niwa
Reported 2015-06-23 16:41:52 PDT
Make it faster!
Attachments
WIP (6.45 KB, patch)
2015-06-23 16:44 PDT, Ryosuke Niwa
no flags
Patch (5.83 KB, patch)
2015-07-06 01:44 PDT, Ryosuke Niwa
no flags
Ryosuke Niwa
Comment 1 2015-06-23 16:44:05 PDT
Ryosuke Niwa
Comment 2 2015-07-06 01:44:29 PDT
Ryosuke Niwa
Comment 3 2015-07-06 01:50:09 PDT
ToT Patched SunSpider: 3d-cube 4.2256+-0.2080 4.1486+-0.1228 might be 1.0186x faster 3d-morph 5.1026+-0.5274 4.8358+-0.1522 might be 1.0552x faster 3d-raytrace 5.3484+-0.6021 4.9348+-0.1349 might be 1.0838x faster access-binary-trees 2.7130+-1.6508 2.1113+-0.2080 might be 1.2850x faster access-fannkuch 5.0765+-0.1759 ? 5.1047+-0.1920 ? access-nbody 2.4000+-0.1861 2.3014+-0.0689 might be 1.0428x faster access-nsieve 3.2500+-0.2123 3.1671+-0.1682 might be 1.0262x faster bitops-3bit-bits-in-byte 1.3497+-0.0580 ? 1.3560+-0.0258 ? bitops-bits-in-byte 3.2062+-0.2491 ? 3.3033+-0.3411 ? might be 1.0303x slower bitops-bitwise-and 1.9884+-0.3176 1.9120+-0.0617 might be 1.0400x faster bitops-nsieve-bits 2.9948+-0.6010 2.7932+-0.0961 might be 1.0722x faster controlflow-recursive 2.4005+-0.3053 2.2630+-0.2697 might be 1.0608x faster crypto-aes 3.8080+-0.4321 3.6125+-0.1643 might be 1.0541x faster crypto-md5 2.4933+-0.6524 2.3353+-0.1614 might be 1.0677x faster crypto-sha1 2.9265+-0.7769 2.6860+-0.2184 might be 1.0896x faster date-format-tofte 6.5455+-0.8765 6.1035+-0.2283 might be 1.0724x faster date-format-xparb 4.5027+-0.1354 ? 4.6120+-0.9595 ? might be 1.0243x slower math-cordic 2.5932+-0.1076 ? 2.7600+-0.4219 ? might be 1.0643x slower math-partial-sums 4.2813+-0.2447 ? 4.3400+-0.2924 ? might be 1.0137x slower math-spectral-norm 1.8025+-0.1362 1.7240+-0.0404 might be 1.0455x faster regexp-dna 6.3435+-0.3175 6.2752+-0.5174 might be 1.0109x faster string-base64 4.2049+-0.1331 ? 4.2401+-0.1050 ? string-fasta 5.8904+-0.7815 5.8338+-0.5782 string-tagcloud 8.7347+-1.6620 8.3389+-0.9243 might be 1.0475x faster string-unpack-code 18.7471+-1.1970 ? 19.2507+-2.1459 ? might be 1.0269x slower string-validate-input 4.9433+-0.2834 ? 5.1698+-0.2136 ? might be 1.0458x slower <arithmetic> 4.5336+-0.1186 4.4428+-0.1392 might be 1.0204x faster ToT Patched LongSpider: 3d-cube 732.4911+-17.6962 732.0290+-10.2143 3d-morph 1430.1607+-18.0872 ? 1434.8452+-26.4035 ? 3d-raytrace 596.0754+-7.6586 ? 599.5426+-1.8276 ? access-binary-trees 781.2598+-2.7399 ? 784.9316+-24.1927 ? access-fannkuch 271.4528+-24.5880 262.6239+-3.3812 might be 1.0336x faster access-nbody 485.5085+-4.3272 ? 499.2198+-45.4226 ? might be 1.0282x slower access-nsieve 353.0665+-5.4384 ? 370.4830+-24.7586 ? might be 1.0493x slower bitops-3bit-bits-in-byte 41.5132+-6.6918 38.5641+-2.4091 might be 1.0765x faster bitops-bits-in-byte 74.2993+-8.7717 74.0218+-6.2145 bitops-nsieve-bits 392.9969+-7.6091 389.0017+-9.2592 might be 1.0103x faster controlflow-recursive 392.9302+-0.8604 ? 409.9299+-18.8462 ? might be 1.0433x slower crypto-aes 553.3612+-44.8798 552.0107+-6.3426 crypto-md5 457.0887+-35.6345 444.5355+-2.4949 might be 1.0282x faster crypto-sha1 603.7267+-1.3899 602.5683+-3.4080 date-format-tofte 475.3430+-6.6626 ? 502.3791+-39.1386 ? might be 1.0569x slower date-format-xparb 606.6652+-17.4584 ? 610.6105+-32.9028 ? hash-map 156.7812+-2.5357 154.7148+-2.6661 might be 1.0134x faster math-cordic 463.6342+-11.6011 462.2351+-5.8737 math-partial-sums 394.3937+-3.7936 ? 396.2653+-9.3538 ? math-spectral-norm 539.7730+-43.9003 523.9571+-5.8828 might be 1.0302x faster string-base64 327.7774+-4.6003 324.7231+-5.5633 string-fasta 352.8232+-35.0304 344.6918+-3.7037 might be 1.0236x faster string-tagcloud 173.8855+-4.5186 171.7451+-5.2591 might be 1.0125x faster <geometric> 373.6247+-6.0726 372.9615+-1.9751 might be 1.0018x faster ToT Patched V8Spider: crypto 48.7357+-4.2400 48.6815+-3.2025 deltablue 79.6133+-21.4432 78.2777+-7.8502 might be 1.0171x faster earley-boyer 38.2008+-1.6078 ! 40.7530+-0.7233 ! definitely 1.0668x slower raytrace 34.3981+-12.4652 30.8655+-2.6446 might be 1.1144x faster regexp 63.7994+-6.6769 60.8007+-4.2471 might be 1.0493x faster richards 68.0728+-7.1028 ? 68.4914+-5.8730 ? splay 33.2208+-2.1391 ? 34.9760+-5.7262 ? might be 1.0528x slower <geometric> 49.3750+-3.5622 49.1452+-1.2383 might be 1.0047x faster ToT Patched Octane: encrypt 0.19499+-0.00326 0.19406+-0.00144 decrypt 3.20066+-0.02016 3.16506+-0.05610 might be 1.0112x faster deltablue x2 0.14692+-0.00567 ? 0.15166+-0.00693 ? might be 1.0323x slower earley 0.27452+-0.04017 0.26486+-0.00502 might be 1.0365x faster boyer 4.08647+-0.41190 3.98210+-0.07456 might be 1.0262x faster navier-stokes x2 4.72553+-0.05136 4.69365+-0.01734 raytrace x2 1.02673+-0.18760 0.94837+-0.05858 might be 1.0826x faster richards x2 0.09792+-0.00469 0.09715+-0.00302 splay x2 0.33353+-0.00701 ? 0.33555+-0.00657 ? regexp x2 23.97925+-0.36176 ? 24.29159+-2.46113 ? might be 1.0130x slower pdfjs x2 36.43494+-1.38061 36.38408+-0.47574 mandreel x2 41.94183+-0.89732 41.93924+-0.97913 gbemu x2 32.38843+-0.35382 ? 34.03859+-3.46676 ? might be 1.0509x slower closure 0.52639+-0.01189 0.52277+-0.00248 jquery 6.64410+-0.07917 6.61718+-0.06709 box2d x2 9.43033+-0.09989 ? 9.43754+-0.19752 ? zlib x2 368.38127+-17.47167 354.91610+-33.68531 might be 1.0379x faster typescript x2 627.64178+-28.46865 ? 706.43146+-206.08316 ? might be 1.1255x slower <geometric> 5.38737+-0.03327 ? 5.40005+-0.13357 ? might be 1.0024x slower ToT Patched Kraken: ai-astar 227.345+-1.622 ^ 212.479+-2.620 ^ definitely 1.0700x faster audio-beat-detection 69.711+-1.397 69.674+-0.514 audio-dft 106.789+-8.551 102.338+-2.838 might be 1.0435x faster audio-fft 60.339+-9.890 ? 62.430+-1.107 ? might be 1.0347x slower audio-oscillator 59.859+-3.140 ? 60.376+-3.963 ? imaging-darkroom 87.248+-2.666 ? 89.140+-5.429 ? might be 1.0217x slower imaging-desaturate 54.181+-3.706 50.404+-5.320 might be 1.0749x faster imaging-gaussian-blur 79.132+-0.749 ? 79.543+-0.762 ? json-parse-financial 38.369+-2.436 ? 38.392+-2.100 ? json-stringify-tinderbox 23.974+-2.558 23.196+-0.687 might be 1.0335x faster stanford-crypto-aes 51.694+-1.405 ? 51.740+-3.546 ? stanford-crypto-ccm 39.247+-3.695 37.200+-0.773 might be 1.0550x faster stanford-crypto-pbkdf2 93.993+-1.661 90.963+-1.474 might be 1.0333x faster stanford-crypto-sha256-iterative 36.762+-4.860 35.634+-2.017 might be 1.0317x faster <arithmetic> 73.475+-1.349 71.679+-0.583 might be 1.0250x faster ToT Patched JSRegress: ... <geometric> 7.5597+-0.0689 7.5557+-0.0360 might be 1.0005x faster ToT Patched AsmBench: bigfib.cpp 434.4980+-3.7348 ? 437.9674+-6.1751 ? cray.c 389.1498+-21.8900 380.0905+-3.2306 might be 1.0238x faster dry.c 414.1210+-4.7908 411.1771+-1.1559 FloatMM.c 714.1605+-178.6274 655.5754+-11.8194 might be 1.0894x faster gcc-loops.cpp 3272.2238+-27.8167 ? 3290.7650+-73.6878 ? n-body.c 791.6000+-3.8892 787.7072+-2.7899 Quicksort.c 387.2892+-8.8518 ? 390.2397+-1.9515 ? stepanov_container.cpp 3498.7477+-65.9944 ? 3678.1403+-418.7784 ? might be 1.0513x slower Towers.c 229.7942+-18.4149 225.6820+-3.1262 might be 1.0182x faster <geometric> 699.2493+-18.2802 694.5520+-10.5689 might be 1.0068x faster ToT Patched CompressionBench: huffman 206.4354+-4.3874 ^ 55.4592+-2.2574 ^ definitely 3.7223x faster arithmetic-simple 253.7523+-3.9196 253.3229+-2.7846 arithmetic-precise 233.4282+-7.2750 230.3280+-3.9559 might be 1.0135x faster arithmetic-complex-precise 230.4389+-2.1971 227.9658+-2.8130 might be 1.0108x faster arithmetic-precise-order-0 267.6912+-26.2054 261.8588+-7.4961 might be 1.0223x faster arithmetic-precise-order-1 306.4816+-43.2585 289.4307+-10.6825 might be 1.0589x faster arithmetic-precise-order-2 325.4543+-5.1850 323.8975+-1.3957 arithmetic-simple-order-1 305.2067+-3.6842 ? 305.6675+-2.6961 ? arithmetic-simple-order-2 353.4729+-4.4366 ? 359.8098+-7.7455 ? might be 1.0179x slower lz-string 311.3513+-2.5801 ? 318.9263+-24.8698 ? might be 1.0243x slower <geometric> 275.4657+-2.3287 ^ 240.0061+-1.4349 ^ definitely 1.1477x faster ToT Patched Geomean of preferred means: <scaled-result> 51.3268+-0.7692 ^ 50.0994+-0.3844 ^ definitely 1.0245x faster rniwa-rmbp:safari rniwa$
Darin Adler
Comment 4 2015-07-06 09:58:06 PDT
Comment on attachment 256202 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=256202&action=review > Source/JavaScriptCore/runtime/ArrayPrototype.cpp:529 > + if (argCount == 1 && previousArray && currentArray && finalArraySize.unsafeGet() < MIN_SPARSE_ARRAY_INDEX) { I’m not sure why we would limit ourselves to doing this optimization when there are exactly two arrays being concatenated. I think we could easily to do the optimization any time we are concatenating a large number of arrays with compatible types. I’m not entirely sure why we can’t do this for arrays that end up longer than MIN_SPARSE_ARRAY_INDEX. > Source/JavaScriptCore/runtime/ArrayPrototype.cpp:532 > + IndexingType type = JSArray::fastConcatType(exec->vm(), *previousArray, *currentArray); > + if (type != NonArray) > + return previousArray->fastConcatWith(*exec, *currentArray); It’s not a great pattern to make all these optimized functions be members of the JSArray class. I believe it’s excessively specific to limit this optimization to JSArray objects. I’m pretty sure that the same optimization would work with two array-like objects. I also think it’s a shame that the entire fastConcatType function ends up in a header because of this. It should be here, in this cpp file. An advantage of keeping more of the code here in ArrayPrototype.cpp is that we can write various helper functions here that make these optimized code paths easier to read with less written out. > Source/JavaScriptCore/runtime/JSArray.cpp:718 > + ASSERT(thisArraySize + otherArraySize < MIN_SPARSE_ARRAY_INDEX); Why is this assertion needed? > Source/JavaScriptCore/runtime/JSArray.cpp:734 > + auto buffer = resultButterfly.contiguous().data(); > + memcpy(buffer, m_butterfly->contiguous().data(), sizeof(JSValue) * thisArraySize); > + memcpy(buffer + thisArraySize, otherButterfly.contiguous().data(), sizeof(JSValue) * otherArraySize); Seems icky to merge the Int32 case and the contiguous() case like this. Not at all sure why the double case needs to be separate and the integer one doesn’t.
WebKit Commit Bot
Comment 5 2015-07-06 10:46:32 PDT
Comment on attachment 256202 [details] Patch Clearing flags on attachment: 256202 Committed r186358: <http://trac.webkit.org/changeset/186358>
WebKit Commit Bot
Comment 6 2015-07-06 10:46:38 PDT
All reviewed patches have been landed. Closing bug.
Alex Christensen
Comment 7 2015-07-06 11:15:48 PDT
Chris Dumez
Comment 8 2015-07-06 13:17:44 PDT
Looks like this broke the GTK build? [355/2761] Building CXX object Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArrayBufferPrototype.cpp.o FAILED: /usr/lib/ccache/g++-4.9 -DBUILDING_GTK__=1 -DBUILDING_JavaScriptCore -DBUILDING_WITH_CMAKE=1 -DDATA_DIR=\"share\" -DGETTEXT_PACKAGE=\"WebKit2GTK-4.0\" -DHAVE_CONFIG_H=1 -DJavaScriptCore_EXPORTS -DSTATICALLY_LINKED_WITH_WTF -DUSER_AGENT_GTK_MAJOR_VERSION=601 -DUSER_AGENT_GTK_MINOR_VERSION=1 -DWEBKITGTK_API_VERSION_STRING=\"4.0\" -pipe -std=c++11 -g -fPIC -I. -I../../Source/JavaScriptCore -I../../Source/JavaScriptCore/API -I../../Source/JavaScriptCore/ForwardingHeaders -I../../Source/JavaScriptCore/assembler -I../../Source/JavaScriptCore/bindings -I../../Source/JavaScriptCore/builtins -I../../Source/JavaScriptCore/bytecode -I../../Source/JavaScriptCore/bytecompiler -I../../Source/JavaScriptCore/dfg -I../../Source/JavaScriptCore/disassembler -I../../Source/JavaScriptCore/ftl -I../../Source/JavaScriptCore/heap -I../../Source/JavaScriptCore/debugger -I../../Source/JavaScriptCore/inspector -I../../Source/JavaScriptCore/inspector/agents -I../../Source/JavaScriptCore/inspector/augmentable -I../../Source/JavaScriptCore/inspector/remote -I../../Source/JavaScriptCore/interpreter -I../../Source/JavaScriptCore/jit -I../../Source/JavaScriptCore/llint -I../../Source/JavaScriptCore/llvm -I../../Source/JavaScriptCore/parser -I../../Source/JavaScriptCore/profiler -I../../Source/JavaScriptCore/replay -I../../Source/JavaScriptCore/runtime -I../../Source/JavaScriptCore/tools -I../../Source/JavaScriptCore/yarr -I../../Source/WTF -IDerivedSources -IDerivedSources/ForwardingHeaders -IDerivedSources/JavaScriptCore -IDerivedSources/JavaScriptCore/inspector -I../../Source -I../../Source/JavaScriptCore/disassembler/udis86 -isystem /usr/include/x86_64-linux-gnu -isystem ../DependenciesGTK/Root/include/glib-2.0 -isystem ../DependenciesGTK/Root/lib64/glib-2.0/include -Wall -Wextra -Wcast-align -Wformat-security -Wmissing-format-attribute -Wpointer-arith -Wundef -Wwrite-strings -MMD -MT Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -MF Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o.d -o Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -c ../../Source/JavaScriptCore/runtime/JSArray.cpp ../../Source/JavaScriptCore/runtime/JSArray.cpp: In member function ‘JSC::EncodedJSValue JSC::JSArray::fastConcatWith(JSC::ExecState&, JSC::JSArray&)’: ../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: error: no matching function for call to ‘JSC::JSArray::fastConcatType(JSC::JSArray&, JSC::JSArray&)’ ASSERT(newArrayType == fastConcatType(*this, otherArray)); ^ ../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: note: candidate is: In file included from ../../Source/JavaScriptCore/runtime/JSArray.cpp:24:0: ../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note: static JSC::IndexingType JSC::JSArray::fastConcatType(JSC::VM&, JSC::JSArray&, JSC::JSArray&) static IndexingType fastConcatType(VM& vm, JSArray& firstArray, JSArray& secondArray) ^ ../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note: candidate expects 3 arguments, 2 provided ninja: build stopped: subcommand failed.
Alex Christensen
Comment 9 2015-07-06 13:20:03 PDT
Just the GTK debug build. My fix fixes that, too
Note You need to log in before you can comment on or make changes to this bug.