The register spill order in DFG needs to be adjusted to reflect current spilling policy that no primitive value is boxed. Patch forthcoming.
Created attachment 120731 [details] the patch
Performance result on 32bit (Core i7 Nehalem Linux): Some cases get obvious improvement while some get slowdown (needs further investigation). Overall almost neutral. VMs tested: "ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl "SpillOrder" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT SpillOrder SunSpider: 3d-cube 7.1117+-0.0836 ! 7.4183+-0.0941 ! definitely 1.0431x slower 3d-morph 10.6417+-0.0857 10.6191+-0.1000 3d-raytrace 9.5057+-0.0722 ? 9.5482+-0.0750 ? access-binary-trees 1.8021+-0.0360 ? 1.8418+-0.0472 ? might be 1.0220x slower access-fannkuch 10.7467+-0.0628 ! 10.8864+-0.0622 ! definitely 1.0130x slower access-nbody 5.4244+-0.0447 ^ 4.3335+-0.0586 ^ definitely 1.2517x faster access-nsieve 3.7283+-0.0538 3.6961+-0.0432 bitops-3bit-bits-in-byte 1.1868+-0.0384 1.1696+-0.0299 might be 1.0147x faster bitops-bits-in-byte 4.6701+-0.0406 ? 4.6898+-0.0489 ? bitops-bitwise-and 4.2241+-0.0366 4.2229+-0.0515 bitops-nsieve-bits 6.8010+-0.1106 6.7109+-0.1390 might be 1.0134x faster controlflow-recursive 2.7756+-0.0994 2.7524+-0.0462 crypto-aes 9.8318+-0.1618 9.7819+-0.1195 crypto-md5 3.0158+-0.0298 ? 3.0516+-0.0669 ? might be 1.0118x slower crypto-sha1 2.6072+-0.0896 2.5776+-0.0408 might be 1.0115x faster date-format-tofte 11.6659+-0.1436 ? 11.7558+-0.1014 ? date-format-xparb 11.3672+-0.1453 11.1826+-0.1065 might be 1.0165x faster math-cordic 9.1636+-0.0433 ? 9.1973+-0.0679 ? math-partial-sums 13.8785+-0.0619 13.8386+-0.0674 math-spectral-norm 2.5840+-0.0408 ? 2.6044+-0.0486 ? regexp-dna 9.0942+-0.0692 ? 9.1036+-0.0740 ? string-base64 5.3639+-0.0457 5.3459+-0.0590 string-fasta 9.4544+-0.0840 ? 9.4748+-0.0962 ? string-tagcloud 15.3900+-0.0881 ? 15.5524+-0.1076 ? might be 1.0106x slower string-unpack-code 24.4761+-0.1017 ! 24.7848+-0.2017 ! definitely 1.0126x slower string-validate-input 7.3702+-0.0667 ? 7.4342+-0.0782 ? <arithmetic> * 7.8416+-0.0215 7.8298+-0.0235 might be 1.0015x faster <geometric> 6.2691+-0.0241 6.2310+-0.0176 might be 1.0061x faster <harmonic> 4.7724+-0.0372 4.7361+-0.0222 might be 1.0077x faster ToT SpillOrder V8: crypto 97.1258+-1.0538 ? 99.0318+-1.2908 ? might be 1.0196x slower deltablue 167.9481+-0.6349 ? 168.0176+-0.8081 ? earley-boyer 107.8401+-0.1878 ! 108.4812+-0.4099 ! definitely 1.0059x slower raytrace 53.0712+-0.4534 52.6155+-0.5008 regexp 129.2446+-1.0419 128.8020+-0.5767 richards 173.4236+-0.3544 ^ 171.5025+-1.5316 ^ definitely 1.0112x faster splay 125.3138+-0.4902 124.7199+-0.5666 <arithmetic> 121.9953+-0.3170 121.8815+-0.3005 might be 1.0009x faster <geometric> * 114.7610+-0.3722 114.7226+-0.2784 might be 1.0003x faster <harmonic> 106.2956+-0.4520 106.2297+-0.3151 might be 1.0006x faster ToT SpillOrder Kraken: ai-astar 785.4906+-1.0815 ? 786.9144+-1.8163 ? audio-beat-detection 248.5505+-1.2685 ! 256.1680+-0.2308 ! definitely 1.0306x slower audio-dft 355.2405+-1.9364 ? 358.1129+-2.4869 ? audio-fft 155.0614+-0.0526 ! 163.8682+-0.6396 ! definitely 1.0568x slower audio-oscillator 335.0814+-1.3393 ! 341.2124+-1.6898 ! definitely 1.0183x slower imaging-darkroom 380.4842+-9.7790 ? 382.8186+-10.1730 ? imaging-desaturate 317.8581+-1.0630 317.4988+-0.4798 imaging-gaussian-blur 596.2950+-1.4765 ^ 537.7586+-0.9815 ^ definitely 1.1089x faster json-parse-financial 69.5950+-0.4176 ? 70.1253+-0.5553 ? json-stringify-tinderbox 105.5944+-0.8867 104.1428+-1.3432 might be 1.0139x faster stanford-crypto-aes 132.9845+-0.3843 ? 133.6331+-0.3833 ? stanford-crypto-ccm 127.3089+-0.5082 ? 127.8093+-0.4991 ? stanford-crypto-pbkdf2 281.0584+-0.6893 ! 283.0713+-0.8316 ! definitely 1.0072x slower stanford-crypto-sha256-iterative 108.2076+-0.2986 108.1264+-0.3654 <arithmetic> * 285.6293+-0.8992 ^ 283.6614+-0.8614 ^ definitely 1.0069x faster <geometric> 227.2966+-0.6312 ? 227.6839+-0.7165 ? might be 1.0017x slower <harmonic> 181.7799+-0.4415 ? 182.8114+-0.6295 ? might be 1.0057x slower ToT SpillOrder All benchmarks: <arithmetic> 107.5885+-0.3070 ^ 106.9788+-0.2664 ^ definitely 1.0057x faster <geometric> 28.1674+-0.0808 28.0854+-0.0606 might be 1.0029x faster <harmonic> 8.4065+-0.0638 8.3448+-0.0381 might be 1.0074x faster ToT SpillOrder Geomean of preferred means: <scaled-result> 63.5818+-0.1712 63.3964+-0.1339 might be 1.0029x faster
Performance result on 64bit (Mac OS): Overall almost neutral while a obvious slowdown (8%) on imaging-gaussian-blur is observed (on 32bit there's a 11% improvement for this case), which may be further investigated in the future. VMs tested: "ToT" at /webkit/WebKitBuild-orig/Release/jsc "SpillOrder" at /webkit/WebKitBuild-64/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT SpillOrder SunSpider: 3d-cube 6.5219+-0.0772 ? 6.5384+-0.0815 ? 3d-morph 8.0982+-0.0638 ? 8.1566+-0.0522 ? 3d-raytrace 8.3841+-0.8049 7.8268+-0.1195 might be 1.0712x faster access-binary-trees 1.7863+-0.0935 1.7486+-0.0512 might be 1.0215x faster access-fannkuch 7.3589+-0.0960 7.3325+-0.0762 access-nbody 3.8475+-0.0496 ? 3.8839+-0.0597 ? access-nsieve 3.1162+-0.0518 ? 3.1720+-0.0791 ? might be 1.0179x slower bitops-3bit-bits-in-byte 1.2023+-0.0080 ? 1.2062+-0.0125 ? bitops-bits-in-byte 4.8450+-0.0564 ? 4.8767+-0.0639 ? bitops-bitwise-and 3.1650+-0.0216 3.1500+-0.0052 bitops-nsieve-bits 5.4344+-0.0415 ? 5.4718+-0.0413 ? controlflow-recursive 2.2807+-0.0339 2.2514+-0.0101 might be 1.0130x faster crypto-aes 8.5308+-0.1246 8.4498+-0.1046 crypto-md5 2.4437+-0.0741 2.4285+-0.0325 crypto-sha1 2.2442+-0.0703 ? 2.2763+-0.1707 ? might be 1.0143x slower date-format-tofte 11.0530+-0.0630 ? 11.1376+-0.0764 ? date-format-xparb 9.5936+-0.0409 ^ 9.4306+-0.0888 ^ definitely 1.0173x faster math-cordic 7.0268+-0.0950 6.9927+-0.0693 math-partial-sums 10.3018+-0.0443 10.2964+-0.0644 math-spectral-norm 2.5789+-0.0124 2.5767+-0.0079 regexp-dna 9.0382+-0.1357 ? 9.1914+-0.1509 ? might be 1.0169x slower string-base64 4.6544+-0.0123 ? 4.6783+-0.0162 ? string-fasta 7.8880+-0.1092 ? 7.8938+-0.1069 ? string-tagcloud 12.8570+-0.1142 ? 12.9316+-0.0944 ? string-unpack-code 21.8200+-0.2621 21.6093+-0.1502 string-validate-input 6.0590+-0.0585 ? 6.0633+-0.2153 ? <arithmetic> * 6.6204+-0.0354 6.5989+-0.0214 might be 1.0033x faster <geometric> 5.3390+-0.0330 5.3287+-0.0287 might be 1.0019x faster <harmonic> 4.1997+-0.0354 4.1934+-0.0359 might be 1.0015x faster ToT SpillOrder V8: crypto 75.4973+-0.2389 75.0776+-0.3059 deltablue 161.0954+-0.4027 ! 163.9173+-0.4714 ! definitely 1.0175x slower earley-boyer 99.6115+-1.5928 ? 100.1018+-1.4401 ? raytrace 50.3873+-0.2048 ^ 49.8641+-0.0921 ^ definitely 1.0105x faster regexp 121.3851+-0.1344 ! 122.4113+-0.3352 ! definitely 1.0085x slower richards 135.4591+-0.2543 134.3140+-1.1460 splay 97.6467+-0.9025 96.1546+-1.4374 might be 1.0155x faster <arithmetic> 105.8689+-0.2327 ? 105.9773+-0.2850 ? might be 1.0010x slower <geometric> * 99.7080+-0.2394 99.5747+-0.2911 might be 1.0013x faster <harmonic> 93.0755+-0.2284 92.7282+-0.2595 might be 1.0037x faster ToT SpillOrder Kraken: ai-astar 837.8626+-49.6049 796.1270+-0.4181 might be 1.0524x faster audio-beat-detection 186.1555+-0.9161 ? 187.6734+-1.2043 ? audio-dft 446.1301+-1.5645 ? 453.0984+-8.1987 ? might be 1.0156x slower audio-fft 122.8065+-7.3403 118.5203+-3.1245 might be 1.0362x faster audio-oscillator 256.0711+-1.2247 255.9696+-1.2486 imaging-darkroom 290.8491+-5.3218 ? 291.6089+-5.3305 ? imaging-desaturate 220.6728+-0.0440 ? 220.7556+-0.0591 ? imaging-gaussian-blur 497.9859+-0.9100 ! 540.6406+-0.0741 ! definitely 1.0857x slower json-parse-financial 62.6418+-1.6843 61.9968+-0.0362 might be 1.0104x faster json-stringify-tinderbox 75.5961+-0.1585 75.2695+-0.1742 stanford-crypto-aes 111.9879+-0.3878 111.9130+-0.2919 stanford-crypto-ccm 109.2299+-0.8334 ? 109.3411+-0.8627 ? stanford-crypto-pbkdf2 213.3721+-0.1806 ! 214.8904+-0.2552 ! definitely 1.0071x slower stanford-crypto-sha256-iterative 104.4605+-0.1416 ? 104.5802+-0.1995 ? <arithmetic> * 252.5587+-3.6387 ? 253.0275+-0.8441 ? might be 1.0019x slower <geometric> 191.5593+-1.3249 ? 191.8617+-0.5199 ? might be 1.0016x slower <harmonic> 151.4256+-1.0019 151.0671+-0.3450 might be 1.0024x faster ToT SpillOrder All benchmarks: <arithmetic> 94.6603+-1.1013 ? 94.8042+-0.2741 ? might be 1.0015x slower <geometric> 23.9847+-0.1147 23.9658+-0.0879 might be 1.0008x faster <harmonic> 7.3915+-0.0609 7.3801+-0.0618 might be 1.0015x faster ToT SpillOrder Geomean of preferred means: <scaled-result> 55.0351+-0.3552 54.9871+-0.1370 might be 1.0009x faster
Comment on attachment 120731 [details] the patch Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11047213 New failing tests: http/tests/inspector/resource-tree/resource-tree-document-url.html
(In reply to comment #4) > (From update of attachment 120731 [details]) > Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/11047213 > > New failing tests: > http/tests/inspector/resource-tree/resource-tree-document-url.html I think this should be a fake failure as JSC has nothing to do with Chromium.
> I think this should be a fake failure as JSC has nothing to do with Chromium. Thanks. It looks like that test is somewhat flaky.
Created attachment 121216 [details] patch updated Update the patch to _not_ adjust the spill order for 64bit, i.e. still Integers have lower priority to be spilled as there may be additional operations - in some cases the spilled integers are boxed as JS integers when being filled (btw, is it necessary?). The regression of the last patch on imaging-gaussian-blur should be caused by this. So this patch keeps the spill order for 64bit as is, but simply modifies the comments a bit to better describe the reason, and also removes SpillOrderBoolean for 64bit as it's never used.
Comment on attachment 121216 [details] patch updated View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:169 > enum SpillOrder { > SpillOrderConstant = 1, // no spill, and cheap fill > SpillOrderSpilled = 2, // no spill > - SpillOrderJS = 4, // needs spill > - SpillOrderCell = 4, // needs spill > - SpillOrderStorage = 4, // needs spill > - SpillOrderInteger = 5, // needs spill and box > - SpillOrderBoolean = 5, // needs spill and box > - SpillOrderDouble = 6, // needs spill and convert > + SpillOrderStorage = 3, // needs spill > + SpillOrderCell = 3, // needs spill > + SpillOrderJS = 3, // needs spill > + SpillOrderInteger = 4, // needs spill, and complex fill > + SpillOrderDouble = 5, // needs spill > }; > #elif USE(JSVALUE32_64) > enum SpillOrder { > SpillOrderConstant = 1, // no spill, and cheap fill > SpillOrderSpilled = 2, // no spill > - SpillOrderJS = 4, // needs spill > - SpillOrderStorage = 4, // needs spill > - SpillOrderDouble = 4, // needs spill > - SpillOrderInteger = 5, // needs spill and box > - SpillOrderCell = 5, // needs spill and box > - SpillOrderBoolean = 5, // needs spill and box > + SpillOrderStorage = 3, // spill involving one GPR > + SpillOrderCell = 3, // spill involving one GPR > + SpillOrderInteger = 3, // spill involving one GPR > + SpillOrderBoolean = 3, // spill involving one GPR > + SpillOrderJS = 4, // spill involving two GPRs > + SpillOrderDouble = 5, // spill involving one FPR > }; > #endif Does changing any of this have any effect on performance? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674 > - // We need to box int32 and cell values ... > - // but on JSVALUE64 boxing a cell is a no-op! > - if (spillFormat == DataFormatInteger) > - m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg); > - Why is this removed? Seems wrong.
Thanks for taking a look at this "nearly two years old" patch. :) (In reply to comment #8) > (From update of attachment 121216 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review > > Does changing any of this have any effect on performance? The performance data was presented in comment #2. The second patch doesn't change the order for JSVALUE64 and as I can remember there's no performance impact on 64bit platforms. > > > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674 > > - // We need to box int32 and cell values ... > > - // but on JSVALUE64 boxing a cell is a no-op! > > - if (spillFormat == DataFormatInteger) > > - m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg); > > - > > Why is this removed? Seems wrong. This can be removed. Actually we already captured the DataFormatInteger (now DataFormatInt32) case in a previous "case", and when we reached here we actually only have the following cases: RELEASE_ASSERT(spillFormat == DataFormatCell || spillFormat & DataFormatJS);
Comment on attachment 121216 [details] patch updated Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.