| Summary: | Array.concat should be fast for integer or double arrays | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Ryosuke Niwa <rniwa> | ||||||
| Component: | JavaScriptCore | Assignee: | Ryosuke Niwa <rniwa> | ||||||
| Status: | RESOLVED FIXED | ||||||||
| Severity: | Normal | CC: | achristensen, benjamin, cdumez, commit-queue, darin, fpizlo, kling | ||||||
| Priority: | P2 | ||||||||
| Version: | 528+ (Nightly build) | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Attachments: |
|
||||||||
|
Description
Ryosuke Niwa
2015-06-23 16:41:52 PDT
Created attachment 255449 [details]
WIP
Created attachment 256202 [details]
Patch
ToT Patched
SunSpider:
3d-cube 4.2256+-0.2080 4.1486+-0.1228 might be 1.0186x faster
3d-morph 5.1026+-0.5274 4.8358+-0.1522 might be 1.0552x faster
3d-raytrace 5.3484+-0.6021 4.9348+-0.1349 might be 1.0838x faster
access-binary-trees 2.7130+-1.6508 2.1113+-0.2080 might be 1.2850x faster
access-fannkuch 5.0765+-0.1759 ? 5.1047+-0.1920 ?
access-nbody 2.4000+-0.1861 2.3014+-0.0689 might be 1.0428x faster
access-nsieve 3.2500+-0.2123 3.1671+-0.1682 might be 1.0262x faster
bitops-3bit-bits-in-byte 1.3497+-0.0580 ? 1.3560+-0.0258 ?
bitops-bits-in-byte 3.2062+-0.2491 ? 3.3033+-0.3411 ? might be 1.0303x slower
bitops-bitwise-and 1.9884+-0.3176 1.9120+-0.0617 might be 1.0400x faster
bitops-nsieve-bits 2.9948+-0.6010 2.7932+-0.0961 might be 1.0722x faster
controlflow-recursive 2.4005+-0.3053 2.2630+-0.2697 might be 1.0608x faster
crypto-aes 3.8080+-0.4321 3.6125+-0.1643 might be 1.0541x faster
crypto-md5 2.4933+-0.6524 2.3353+-0.1614 might be 1.0677x faster
crypto-sha1 2.9265+-0.7769 2.6860+-0.2184 might be 1.0896x faster
date-format-tofte 6.5455+-0.8765 6.1035+-0.2283 might be 1.0724x faster
date-format-xparb 4.5027+-0.1354 ? 4.6120+-0.9595 ? might be 1.0243x slower
math-cordic 2.5932+-0.1076 ? 2.7600+-0.4219 ? might be 1.0643x slower
math-partial-sums 4.2813+-0.2447 ? 4.3400+-0.2924 ? might be 1.0137x slower
math-spectral-norm 1.8025+-0.1362 1.7240+-0.0404 might be 1.0455x faster
regexp-dna 6.3435+-0.3175 6.2752+-0.5174 might be 1.0109x faster
string-base64 4.2049+-0.1331 ? 4.2401+-0.1050 ?
string-fasta 5.8904+-0.7815 5.8338+-0.5782
string-tagcloud 8.7347+-1.6620 8.3389+-0.9243 might be 1.0475x faster
string-unpack-code 18.7471+-1.1970 ? 19.2507+-2.1459 ? might be 1.0269x slower
string-validate-input 4.9433+-0.2834 ? 5.1698+-0.2136 ? might be 1.0458x slower
<arithmetic> 4.5336+-0.1186 4.4428+-0.1392 might be 1.0204x faster
ToT Patched
LongSpider:
3d-cube 732.4911+-17.6962 732.0290+-10.2143
3d-morph 1430.1607+-18.0872 ? 1434.8452+-26.4035 ?
3d-raytrace 596.0754+-7.6586 ? 599.5426+-1.8276 ?
access-binary-trees 781.2598+-2.7399 ? 784.9316+-24.1927 ?
access-fannkuch 271.4528+-24.5880 262.6239+-3.3812 might be 1.0336x faster
access-nbody 485.5085+-4.3272 ? 499.2198+-45.4226 ? might be 1.0282x slower
access-nsieve 353.0665+-5.4384 ? 370.4830+-24.7586 ? might be 1.0493x slower
bitops-3bit-bits-in-byte 41.5132+-6.6918 38.5641+-2.4091 might be 1.0765x faster
bitops-bits-in-byte 74.2993+-8.7717 74.0218+-6.2145
bitops-nsieve-bits 392.9969+-7.6091 389.0017+-9.2592 might be 1.0103x faster
controlflow-recursive 392.9302+-0.8604 ? 409.9299+-18.8462 ? might be 1.0433x slower
crypto-aes 553.3612+-44.8798 552.0107+-6.3426
crypto-md5 457.0887+-35.6345 444.5355+-2.4949 might be 1.0282x faster
crypto-sha1 603.7267+-1.3899 602.5683+-3.4080
date-format-tofte 475.3430+-6.6626 ? 502.3791+-39.1386 ? might be 1.0569x slower
date-format-xparb 606.6652+-17.4584 ? 610.6105+-32.9028 ?
hash-map 156.7812+-2.5357 154.7148+-2.6661 might be 1.0134x faster
math-cordic 463.6342+-11.6011 462.2351+-5.8737
math-partial-sums 394.3937+-3.7936 ? 396.2653+-9.3538 ?
math-spectral-norm 539.7730+-43.9003 523.9571+-5.8828 might be 1.0302x faster
string-base64 327.7774+-4.6003 324.7231+-5.5633
string-fasta 352.8232+-35.0304 344.6918+-3.7037 might be 1.0236x faster
string-tagcloud 173.8855+-4.5186 171.7451+-5.2591 might be 1.0125x faster
<geometric> 373.6247+-6.0726 372.9615+-1.9751 might be 1.0018x faster
ToT Patched
V8Spider:
crypto 48.7357+-4.2400 48.6815+-3.2025
deltablue 79.6133+-21.4432 78.2777+-7.8502 might be 1.0171x faster
earley-boyer 38.2008+-1.6078 ! 40.7530+-0.7233 ! definitely 1.0668x slower
raytrace 34.3981+-12.4652 30.8655+-2.6446 might be 1.1144x faster
regexp 63.7994+-6.6769 60.8007+-4.2471 might be 1.0493x faster
richards 68.0728+-7.1028 ? 68.4914+-5.8730 ?
splay 33.2208+-2.1391 ? 34.9760+-5.7262 ? might be 1.0528x slower
<geometric> 49.3750+-3.5622 49.1452+-1.2383 might be 1.0047x faster
ToT Patched
Octane:
encrypt 0.19499+-0.00326 0.19406+-0.00144
decrypt 3.20066+-0.02016 3.16506+-0.05610 might be 1.0112x faster
deltablue x2 0.14692+-0.00567 ? 0.15166+-0.00693 ? might be 1.0323x slower
earley 0.27452+-0.04017 0.26486+-0.00502 might be 1.0365x faster
boyer 4.08647+-0.41190 3.98210+-0.07456 might be 1.0262x faster
navier-stokes x2 4.72553+-0.05136 4.69365+-0.01734
raytrace x2 1.02673+-0.18760 0.94837+-0.05858 might be 1.0826x faster
richards x2 0.09792+-0.00469 0.09715+-0.00302
splay x2 0.33353+-0.00701 ? 0.33555+-0.00657 ?
regexp x2 23.97925+-0.36176 ? 24.29159+-2.46113 ? might be 1.0130x slower
pdfjs x2 36.43494+-1.38061 36.38408+-0.47574
mandreel x2 41.94183+-0.89732 41.93924+-0.97913
gbemu x2 32.38843+-0.35382 ? 34.03859+-3.46676 ? might be 1.0509x slower
closure 0.52639+-0.01189 0.52277+-0.00248
jquery 6.64410+-0.07917 6.61718+-0.06709
box2d x2 9.43033+-0.09989 ? 9.43754+-0.19752 ?
zlib x2 368.38127+-17.47167 354.91610+-33.68531 might be 1.0379x faster
typescript x2 627.64178+-28.46865 ? 706.43146+-206.08316 ? might be 1.1255x slower
<geometric> 5.38737+-0.03327 ? 5.40005+-0.13357 ? might be 1.0024x slower
ToT Patched
Kraken:
ai-astar 227.345+-1.622 ^ 212.479+-2.620 ^ definitely 1.0700x faster
audio-beat-detection 69.711+-1.397 69.674+-0.514
audio-dft 106.789+-8.551 102.338+-2.838 might be 1.0435x faster
audio-fft 60.339+-9.890 ? 62.430+-1.107 ? might be 1.0347x slower
audio-oscillator 59.859+-3.140 ? 60.376+-3.963 ?
imaging-darkroom 87.248+-2.666 ? 89.140+-5.429 ? might be 1.0217x slower
imaging-desaturate 54.181+-3.706 50.404+-5.320 might be 1.0749x faster
imaging-gaussian-blur 79.132+-0.749 ? 79.543+-0.762 ?
json-parse-financial 38.369+-2.436 ? 38.392+-2.100 ?
json-stringify-tinderbox 23.974+-2.558 23.196+-0.687 might be 1.0335x faster
stanford-crypto-aes 51.694+-1.405 ? 51.740+-3.546 ?
stanford-crypto-ccm 39.247+-3.695 37.200+-0.773 might be 1.0550x faster
stanford-crypto-pbkdf2 93.993+-1.661 90.963+-1.474 might be 1.0333x faster
stanford-crypto-sha256-iterative 36.762+-4.860 35.634+-2.017 might be 1.0317x faster
<arithmetic> 73.475+-1.349 71.679+-0.583 might be 1.0250x faster
ToT Patched
JSRegress:
...
<geometric> 7.5597+-0.0689 7.5557+-0.0360 might be 1.0005x faster
ToT Patched
AsmBench:
bigfib.cpp 434.4980+-3.7348 ? 437.9674+-6.1751 ?
cray.c 389.1498+-21.8900 380.0905+-3.2306 might be 1.0238x faster
dry.c 414.1210+-4.7908 411.1771+-1.1559
FloatMM.c 714.1605+-178.6274 655.5754+-11.8194 might be 1.0894x faster
gcc-loops.cpp 3272.2238+-27.8167 ? 3290.7650+-73.6878 ?
n-body.c 791.6000+-3.8892 787.7072+-2.7899
Quicksort.c 387.2892+-8.8518 ? 390.2397+-1.9515 ?
stepanov_container.cpp 3498.7477+-65.9944 ? 3678.1403+-418.7784 ? might be 1.0513x slower
Towers.c 229.7942+-18.4149 225.6820+-3.1262 might be 1.0182x faster
<geometric> 699.2493+-18.2802 694.5520+-10.5689 might be 1.0068x faster
ToT Patched
CompressionBench:
huffman 206.4354+-4.3874 ^ 55.4592+-2.2574 ^ definitely 3.7223x faster
arithmetic-simple 253.7523+-3.9196 253.3229+-2.7846
arithmetic-precise 233.4282+-7.2750 230.3280+-3.9559 might be 1.0135x faster
arithmetic-complex-precise 230.4389+-2.1971 227.9658+-2.8130 might be 1.0108x faster
arithmetic-precise-order-0 267.6912+-26.2054 261.8588+-7.4961 might be 1.0223x faster
arithmetic-precise-order-1 306.4816+-43.2585 289.4307+-10.6825 might be 1.0589x faster
arithmetic-precise-order-2 325.4543+-5.1850 323.8975+-1.3957
arithmetic-simple-order-1 305.2067+-3.6842 ? 305.6675+-2.6961 ?
arithmetic-simple-order-2 353.4729+-4.4366 ? 359.8098+-7.7455 ? might be 1.0179x slower
lz-string 311.3513+-2.5801 ? 318.9263+-24.8698 ? might be 1.0243x slower
<geometric> 275.4657+-2.3287 ^ 240.0061+-1.4349 ^ definitely 1.1477x faster
ToT Patched
Geomean of preferred means:
<scaled-result> 51.3268+-0.7692 ^ 50.0994+-0.3844 ^ definitely 1.0245x faster
rniwa-rmbp:safari rniwa$
Comment on attachment 256202 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=256202&action=review > Source/JavaScriptCore/runtime/ArrayPrototype.cpp:529 > + if (argCount == 1 && previousArray && currentArray && finalArraySize.unsafeGet() < MIN_SPARSE_ARRAY_INDEX) { I’m not sure why we would limit ourselves to doing this optimization when there are exactly two arrays being concatenated. I think we could easily to do the optimization any time we are concatenating a large number of arrays with compatible types. I’m not entirely sure why we can’t do this for arrays that end up longer than MIN_SPARSE_ARRAY_INDEX. > Source/JavaScriptCore/runtime/ArrayPrototype.cpp:532 > + IndexingType type = JSArray::fastConcatType(exec->vm(), *previousArray, *currentArray); > + if (type != NonArray) > + return previousArray->fastConcatWith(*exec, *currentArray); It’s not a great pattern to make all these optimized functions be members of the JSArray class. I believe it’s excessively specific to limit this optimization to JSArray objects. I’m pretty sure that the same optimization would work with two array-like objects. I also think it’s a shame that the entire fastConcatType function ends up in a header because of this. It should be here, in this cpp file. An advantage of keeping more of the code here in ArrayPrototype.cpp is that we can write various helper functions here that make these optimized code paths easier to read with less written out. > Source/JavaScriptCore/runtime/JSArray.cpp:718 > + ASSERT(thisArraySize + otherArraySize < MIN_SPARSE_ARRAY_INDEX); Why is this assertion needed? > Source/JavaScriptCore/runtime/JSArray.cpp:734 > + auto buffer = resultButterfly.contiguous().data(); > + memcpy(buffer, m_butterfly->contiguous().data(), sizeof(JSValue) * thisArraySize); > + memcpy(buffer + thisArraySize, otherButterfly.contiguous().data(), sizeof(JSValue) * otherArraySize); Seems icky to merge the Int32 case and the contiguous() case like this. Not at all sure why the double case needs to be separate and the integer one doesn’t. Comment on attachment 256202 [details] Patch Clearing flags on attachment: 256202 Committed r186358: <http://trac.webkit.org/changeset/186358> All reviewed patches have been landed. Closing bug. build fix in http://trac.webkit.org/changeset/186363 Looks like this broke the GTK build?
[355/2761] Building CXX object Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArrayBufferPrototype.cpp.o
FAILED: /usr/lib/ccache/g++-4.9 -DBUILDING_GTK__=1 -DBUILDING_JavaScriptCore -DBUILDING_WITH_CMAKE=1 -DDATA_DIR=\"share\" -DGETTEXT_PACKAGE=\"WebKit2GTK-4.0\" -DHAVE_CONFIG_H=1 -DJavaScriptCore_EXPORTS -DSTATICALLY_LINKED_WITH_WTF -DUSER_AGENT_GTK_MAJOR_VERSION=601 -DUSER_AGENT_GTK_MINOR_VERSION=1 -DWEBKITGTK_API_VERSION_STRING=\"4.0\" -pipe -std=c++11 -g -fPIC -I. -I../../Source/JavaScriptCore -I../../Source/JavaScriptCore/API -I../../Source/JavaScriptCore/ForwardingHeaders -I../../Source/JavaScriptCore/assembler -I../../Source/JavaScriptCore/bindings -I../../Source/JavaScriptCore/builtins -I../../Source/JavaScriptCore/bytecode -I../../Source/JavaScriptCore/bytecompiler -I../../Source/JavaScriptCore/dfg -I../../Source/JavaScriptCore/disassembler -I../../Source/JavaScriptCore/ftl -I../../Source/JavaScriptCore/heap -I../../Source/JavaScriptCore/debugger -I../../Source/JavaScriptCore/inspector -I../../Source/JavaScriptCore/inspector/agents -I../../Source/JavaScriptCore/inspector/augmentable -I../../Source/JavaScriptCore/inspector/remote -I../../Source/JavaScriptCore/interpreter -I../../Source/JavaScriptCore/jit -I../../Source/JavaScriptCore/llint -I../../Source/JavaScriptCore/llvm -I../../Source/JavaScriptCore/parser -I../../Source/JavaScriptCore/profiler -I../../Source/JavaScriptCore/replay -I../../Source/JavaScriptCore/runtime -I../../Source/JavaScriptCore/tools -I../../Source/JavaScriptCore/yarr -I../../Source/WTF -IDerivedSources -IDerivedSources/ForwardingHeaders -IDerivedSources/JavaScriptCore -IDerivedSources/JavaScriptCore/inspector -I../../Source -I../../Source/JavaScriptCore/disassembler/udis86 -isystem /usr/include/x86_64-linux-gnu -isystem ../DependenciesGTK/Root/include/glib-2.0 -isystem ../DependenciesGTK/Root/lib64/glib-2.0/include -Wall -Wextra -Wcast-align -Wformat-security -Wmissing-format-attribute -Wpointer-arith -Wundef -Wwrite-strings -MMD -MT Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -MF Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o.d -o Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/runtime/JSArray.cpp.o -c ../../Source/JavaScriptCore/runtime/JSArray.cpp
../../Source/JavaScriptCore/runtime/JSArray.cpp: In member function ‘JSC::EncodedJSValue JSC::JSArray::fastConcatWith(JSC::ExecState&, JSC::JSArray&)’:
../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: error: no matching function for call to ‘JSC::JSArray::fastConcatType(JSC::JSArray&, JSC::JSArray&)’
ASSERT(newArrayType == fastConcatType(*this, otherArray));
^
../../Source/JavaScriptCore/runtime/JSArray.cpp:713:56: note: candidate is:
In file included from ../../Source/JavaScriptCore/runtime/JSArray.cpp:24:0:
../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note: static JSC::IndexingType JSC::JSArray::fastConcatType(JSC::VM&, JSC::JSArray&, JSC::JSArray&)
static IndexingType fastConcatType(VM& vm, JSArray& firstArray, JSArray& secondArray)
^
../../Source/JavaScriptCore/runtime/JSArray.h:81:25: note: candidate expects 3 arguments, 2 provided
ninja: build stopped: subcommand failed.
Just the GTK debug build. My fix fixes that, too |