The DFG should implement the same by-offset optimizations for put_by_id as it already performs for get_by_id.
Created attachment 109236 [details] work in progress This seems to sort of work. Still doing more tests.
Nice speed-up on V8. Benchmark report for V8. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PutByOffset crypto 70.8626+-0.3336 ! 72.0406+-0.3267 ! definitely 1.0166x slower deltablue 227.4823+-1.2145 225.5072+-2.2500 earley-boyer 90.0359+-0.8468 ^ 87.4768+-0.3990 ^ definitely 1.0293x faster raytrace 62.8978+-0.2882 ^ 60.9215+-0.5096 ^ definitely 1.0324x faster regexp 103.5795+-0.4278 102.7796+-0.4376 richards 198.4522+-0.8880 ^ 186.9556+-0.6424 ^ definitely 1.0615x faster splay 90.2233+-0.3854 ? 91.0759+-0.7460 ? <arithmetic> 120.5048+-0.2634 ^ 118.1082+-0.3645 ^ definitely 1.0203x faster <geometric> 107.8085+-0.1881 ^ 106.1108+-0.2448 ^ definitely 1.0160x faster <harmonic> 98.1841+-0.1934 ^ 96.9183+-0.2551 ^ definitely 1.0131x faster
Looks like it's only a win on V8. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PutByOffset SunSpider: 3d-cube 7.4733+-0.2370 ? 7.4909+-0.2434 ? 3d-morph 7.3221+-0.1314 ? 7.4902+-0.1818 ? might be 1.0230x slower 3d-raytrace 8.2079+-0.2675 ? 8.2614+-0.2639 ? access-binary-trees 2.0855+-0.0733 ^ 1.7651+-0.0578 ^ definitely 1.1816x faster access-fannkuch 6.5079+-0.1798 6.3323+-0.0819 might be 1.0277x faster access-nbody 3.6268+-0.0917 3.4877+-0.0829 might be 1.0399x faster access-nsieve 2.6092+-0.0605 ? 2.6369+-0.0700 ? might be 1.0106x slower bitops-3bit-bits-in-byte 1.7039+-0.0381 ? 1.7147+-0.0353 ? bitops-bits-in-byte 2.7511+-0.0767 ? 2.7978+-0.0933 ? might be 1.0170x slower bitops-bitwise-and 3.4511+-0.0848 ^ 3.2750+-0.0814 ^ definitely 1.0538x faster bitops-nsieve-bits 5.5331+-0.1009 5.4708+-0.0805 might be 1.0114x faster controlflow-recursive 2.0846+-0.0539 2.0461+-0.0608 might be 1.0188x faster crypto-aes 6.6652+-0.2079 ? 6.7579+-0.1919 ? might be 1.0139x slower crypto-md5 2.8164+-0.0746 ? 2.8615+-0.0950 ? might be 1.0160x slower crypto-sha1 2.5247+-0.0600 2.4973+-0.0644 might be 1.0110x faster date-format-tofte 10.0118+-0.2674 ? 10.1345+-0.2735 ? might be 1.0123x slower date-format-xparb 9.2736+-0.3044 ? 9.5957+-0.3815 ? might be 1.0347x slower math-cordic 6.5799+-0.2671 6.4915+-0.1929 might be 1.0136x faster math-partial-sums 7.5752+-0.1633 ? 7.6859+-0.1510 ? might be 1.0146x slower math-spectral-norm 2.8814+-0.0999 2.8715+-0.0738 regexp-dna 10.8292+-0.1427 ? 10.8850+-0.1618 ? string-base64 6.0760+-0.2190 ? 6.1591+-0.2329 ? might be 1.0137x slower string-fasta 7.0525+-0.2171 ? 7.1913+-0.3145 ? might be 1.0197x slower string-tagcloud 11.8762+-0.3947 ? 11.9610+-0.3092 ? string-unpack-code 21.3559+-0.4601 21.2620+-0.4328 string-validate-input 6.2368+-0.1467 ? 6.3990+-0.2173 ? might be 1.0260x slower <arithmetic> 6.3504+-0.0324 ? 6.3662+-0.0315 ? <geometric> 5.2336+-0.0294 5.2091+-0.0221 <harmonic> 4.3158+-0.0355 ^ 4.2486+-0.0303 ^ definitely 1.0158x faster TipOfTree PutByOffset V8: crypto 71.1752+-0.4499 ! 72.1471+-0.3928 ! definitely 1.0137x slower deltablue 228.0470+-1.1437 ? 228.2657+-2.3416 ? earley-boyer 89.8495+-0.6176 ^ 88.1906+-0.5496 ^ definitely 1.0188x faster raytrace 62.3209+-0.3664 ^ 61.1129+-0.4345 ^ definitely 1.0198x faster regexp 104.0114+-0.7839 103.5840+-0.6265 richards 199.0179+-1.0726 ^ 187.5831+-1.0648 ^ definitely 1.0610x faster splay 91.0199+-0.7136 ? 91.4576+-0.6711 ? <arithmetic> 120.7774+-0.3729 ^ 118.9059+-0.3800 ^ definitely 1.0157x faster <geometric> 107.9834+-0.3348 ^ 106.7217+-0.2557 ^ definitely 1.0118x faster <harmonic> 98.2570+-0.3084 ^ 97.3992+-0.2199 ^ definitely 1.0088x faster TipOfTree PutByOffset Kraken: ai-astar 493.2977+-4.6697 491.8601+-3.7826 audio-beat-detection 190.4707+-1.7293 ? 190.7607+-1.5873 ? audio-dft 280.9297+-2.4491 279.2958+-2.6566 audio-fft 128.3063+-1.0409 ? 129.6684+-1.1003 ? might be 1.0106x slower audio-oscillator 256.1838+-1.9104 255.2704+-2.2988 imaging-darkroom 422.3094+-1.3344 ^ 418.5479+-1.3490 ^ definitely 1.0090x faster imaging-desaturate 223.5703+-0.8074 ? 224.3784+-1.7922 ? imaging-gaussian-blur 582.5605+-2.3726 ? 583.2726+-2.1460 ? json-parse-financial 48.5278+-0.8188 47.7118+-0.3265 might be 1.0171x faster json-stringify-tinderbox 68.5631+-0.5214 68.4865+-0.3847 stanford-crypto-aes 130.9508+-1.2565 130.0566+-1.3290 stanford-crypto-ccm 101.1984+-0.9027 ? 102.0392+-0.9587 ? stanford-crypto-pbkdf2 193.0240+-1.6461 ! 196.9612+-1.7595 ! definitely 1.0204x slower stanford-crypto-sha256-iterative 83.5403+-0.4886 ! 85.1920+-0.5650 ! definitely 1.0198x slower <arithmetic> 228.8166+-0.5134 ? 228.8215+-0.4248 ? <geometric> 178.7126+-0.6271 ? 178.9531+-0.3788 ? <harmonic> 138.6559+-0.7664 ? 138.7200+-0.3518 ? TipOfTree PutByOffset All benchmarks: <arithmetic> 89.6593+-0.1651 89.3907+-0.1544 <geometric> 23.5139+-0.0709 23.4213+-0.0565 <harmonic> 7.5847+-0.0606 ^ 7.4693+-0.0520 ^ definitely 1.0155x faster
Created attachment 109243 [details] the patch - works on 64-bit Still need to do the 32_64 work.
Created attachment 109347 [details] the patch - 32_64 and 64 work
Attachment 109347 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGPropagator.cpp:1034: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 109353 [details] the patch - fix style and merge
Latest numbers, after merging and such. It appears that there are slow-downs in some benchmarks, though the slow-downs are smaller than the speed-ups. The slow-downs are all because PutById has no polymorphic case, and in some (rare!) cases the structure that it's patched on is no longer live. We should fix that by blowing away all property access patches (and maybe all patches, period) whenever we trigger recompilation. For now, I recommend landing this, because it's a bigger win (1.6% on V8) than it is a loss (0.6% on Kraken). Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree PutByOffset SunSpider: 3d-cube 7.6548+-0.3226 7.3917+-0.1907 might be 1.0356x faster 3d-morph 7.4151+-0.1395 ? 7.4507+-0.1249 ? 3d-raytrace 8.4812+-0.2948 8.3150+-0.2184 might be 1.0200x faster access-binary-trees 2.1094+-0.0995 ^ 1.8069+-0.0613 ^ definitely 1.1674x faster access-fannkuch 6.5808+-0.1922 ? 6.7499+-0.3102 ? might be 1.0257x slower access-nbody 3.6466+-0.0806 3.5337+-0.0536 might be 1.0320x faster access-nsieve 2.6069+-0.0616 ? 2.6798+-0.0636 ? might be 1.0280x slower bitops-3bit-bits-in-byte 1.7354+-0.0394 1.7119+-0.0337 might be 1.0137x faster bitops-bits-in-byte 2.7756+-0.0751 2.7511+-0.0885 bitops-bitwise-and 3.4489+-0.0767 ^ 3.2602+-0.0614 ^ definitely 1.0579x faster bitops-nsieve-bits 5.5177+-0.1257 ? 5.5852+-0.2135 ? might be 1.0122x slower controlflow-recursive 2.1451+-0.0706 2.1262+-0.1009 crypto-aes 6.8211+-0.1833 6.7461+-0.2125 might be 1.0111x faster crypto-md5 2.8203+-0.0738 2.7874+-0.0739 might be 1.0118x faster crypto-sha1 2.5016+-0.0660 ? 2.5311+-0.0773 ? might be 1.0118x slower date-format-tofte 10.0443+-0.2708 ? 10.2338+-0.2693 ? might be 1.0189x slower date-format-xparb 10.0244+-0.4041 9.7601+-0.2739 might be 1.0271x faster math-cordic 6.3638+-0.2144 6.3315+-0.0620 math-partial-sums 7.7000+-0.0621 ? 7.7615+-0.1886 ? math-spectral-norm 3.0138+-0.3396 2.8333+-0.0575 might be 1.0637x faster regexp-dna 10.8769+-0.1672 ? 10.9393+-0.2064 ? string-base64 6.2164+-0.1799 6.0559+-0.2046 might be 1.0265x faster string-fasta 7.0881+-0.1568 6.8526+-0.1197 might be 1.0344x faster string-tagcloud 12.1531+-0.2665 11.9861+-0.3321 might be 1.0139x faster string-unpack-code 21.3522+-0.4916 ? 21.4023+-0.4834 ? string-validate-input 6.3879+-0.2148 ? 6.4203+-0.2185 ? <arithmetic> 6.4416+-0.0494 6.3848+-0.0395 <geometric> 5.3031+-0.0494 5.2245+-0.0305 might be 1.0150x faster <harmonic> 4.3660+-0.0548 ^ 4.2681+-0.0382 ^ definitely 1.0229x faster TipOfTree PutByOffset V8: crypto 72.1369+-0.7540 ? 73.1136+-0.4965 ? might be 1.0135x slower deltablue 230.9621+-0.8870 ^ 226.2957+-1.9236 ^ definitely 1.0206x faster earley-boyer 91.2661+-0.8792 ^ 87.8639+-0.2933 ^ definitely 1.0387x faster raytrace 63.6002+-0.4248 ^ 62.0997+-1.0089 ^ definitely 1.0242x faster regexp 104.6459+-0.6469 ? 106.7863+-1.6047 ? might be 1.0205x slower richards 203.1269+-2.4756 ^ 188.0540+-1.1212 ^ definitely 1.0802x faster splay 91.4723+-0.7206 ? 92.3211+-0.9555 ? <arithmetic> 122.4586+-0.3488 ^ 119.5049+-0.5252 ^ definitely 1.0247x faster <geometric> 109.4324+-0.3007 ^ 107.6235+-0.4169 ^ definitely 1.0168x faster <harmonic> 99.5829+-0.3574 ^ 98.4368+-0.4103 ^ definitely 1.0116x faster TipOfTree PutByOffset Kraken: ai-astar 492.2343+-4.4177 ? 500.2010+-5.8382 ? might be 1.0162x slower audio-beat-detection 190.9514+-0.5210 ? 193.0569+-1.7599 ? might be 1.0110x slower audio-dft 287.1908+-2.8916 284.1722+-4.7994 might be 1.0106x faster audio-fft 128.8452+-1.2426 ? 128.9132+-1.2200 ? audio-oscillator 260.5544+-1.9829 259.4385+-2.5186 imaging-darkroom 420.9855+-1.0263 ? 421.8993+-1.7584 ? imaging-desaturate 225.8952+-2.0593 ? 228.8723+-2.6980 ? might be 1.0132x slower imaging-gaussian-blur 586.2501+-4.6554 ? 589.6864+-4.7268 ? json-parse-financial 48.4939+-0.2546 ? 48.7435+-1.2599 ? json-stringify-tinderbox 68.6375+-0.3577 ! 70.0573+-0.9318 ! definitely 1.0207x slower stanford-crypto-aes 133.3343+-1.9512 132.8290+-2.0660 stanford-crypto-ccm 102.4281+-0.6916 102.1125+-1.6788 stanford-crypto-pbkdf2 196.7817+-1.7446 ? 201.3275+-4.3554 ? might be 1.0231x slower stanford-crypto-sha256-iterative 85.1036+-0.6282 ? 85.6230+-0.8534 ? <arithmetic> 230.5490+-0.5868 ! 231.9238+-0.7332 ! definitely 1.0060x slower <geometric> 180.3196+-0.3844 ! 181.3460+-0.6401 ! definitely 1.0057x slower <harmonic> 139.7823+-0.3152 ? 140.6246+-0.7808 ? TipOfTree PutByOffset All benchmarks: <arithmetic> 90.4761+-0.2161 90.4143+-0.2503 <geometric> 23.7963+-0.1389 23.5822+-0.0897 <harmonic> 7.6727+-0.0940 ^ 7.5051+-0.0658 ^ definitely 1.0223x faster
Comment on attachment 109353 [details] the patch - fix style and merge View in context: https://bugs.webkit.org/attachment.cgi?id=109353&action=review r=me, but you should probably drop the write barrier if the cell is already in the old generation > Source/JavaScriptCore/dfg/DFGJITCodeGenerator.cpp:270 > +void JITCodeGenerator::writeBarrier(GPRReg ownerGPR, JSCell* value, WriteBarrierUseKind useKind, GPRReg scratch1, GPRReg scratch2) if value is a cell you can do a generation check (Heap::isMarked()) you can avoid planting this write barrier entirely
(In reply to comment #9) > (From update of attachment 109353 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=109353&action=review > > r=me, but you should probably drop the write barrier if the cell is already in the old generation > > > Source/JavaScriptCore/dfg/DFGJITCodeGenerator.cpp:270 > > +void JITCodeGenerator::writeBarrier(GPRReg ownerGPR, JSCell* value, WriteBarrierUseKind useKind, GPRReg scratch1, GPRReg scratch2) > > if value is a cell you can do a generation check (Heap::isMarked()) you can avoid planting this write barrier entirely Right, good point! I made that change, will land.
Landed in r96443.