NEW 75342
Adjust spill order in DFG
https://bugs.webkit.org/show_bug.cgi?id=75342
Summary Adjust spill order in DFG
Yuqiang Xian
Reported 2011-12-29 04:39:16 PST
The register spill order in DFG needs to be adjusted to reflect current spilling policy that no primitive value is boxed. Patch forthcoming.
Attachments
the patch (4.93 KB, patch)
2011-12-29 04:46 PST, Yuqiang Xian
webkit.review.bot: commit-queue-
patch updated (4.98 KB, patch)
2012-01-04 22:18 PST, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2011-12-29 04:46:16 PST
Created attachment 120731 [details] the patch
Yuqiang Xian
Comment 2 2011-12-29 04:52:23 PST
Performance result on 32bit (Core i7 Nehalem Linux): Some cases get obvious improvement while some get slowdown (needs further investigation). Overall almost neutral. VMs tested: "ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl "SpillOrder" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT SpillOrder SunSpider: 3d-cube 7.1117+-0.0836 ! 7.4183+-0.0941 ! definitely 1.0431x slower 3d-morph 10.6417+-0.0857 10.6191+-0.1000 3d-raytrace 9.5057+-0.0722 ? 9.5482+-0.0750 ? access-binary-trees 1.8021+-0.0360 ? 1.8418+-0.0472 ? might be 1.0220x slower access-fannkuch 10.7467+-0.0628 ! 10.8864+-0.0622 ! definitely 1.0130x slower access-nbody 5.4244+-0.0447 ^ 4.3335+-0.0586 ^ definitely 1.2517x faster access-nsieve 3.7283+-0.0538 3.6961+-0.0432 bitops-3bit-bits-in-byte 1.1868+-0.0384 1.1696+-0.0299 might be 1.0147x faster bitops-bits-in-byte 4.6701+-0.0406 ? 4.6898+-0.0489 ? bitops-bitwise-and 4.2241+-0.0366 4.2229+-0.0515 bitops-nsieve-bits 6.8010+-0.1106 6.7109+-0.1390 might be 1.0134x faster controlflow-recursive 2.7756+-0.0994 2.7524+-0.0462 crypto-aes 9.8318+-0.1618 9.7819+-0.1195 crypto-md5 3.0158+-0.0298 ? 3.0516+-0.0669 ? might be 1.0118x slower crypto-sha1 2.6072+-0.0896 2.5776+-0.0408 might be 1.0115x faster date-format-tofte 11.6659+-0.1436 ? 11.7558+-0.1014 ? date-format-xparb 11.3672+-0.1453 11.1826+-0.1065 might be 1.0165x faster math-cordic 9.1636+-0.0433 ? 9.1973+-0.0679 ? math-partial-sums 13.8785+-0.0619 13.8386+-0.0674 math-spectral-norm 2.5840+-0.0408 ? 2.6044+-0.0486 ? regexp-dna 9.0942+-0.0692 ? 9.1036+-0.0740 ? string-base64 5.3639+-0.0457 5.3459+-0.0590 string-fasta 9.4544+-0.0840 ? 9.4748+-0.0962 ? string-tagcloud 15.3900+-0.0881 ? 15.5524+-0.1076 ? might be 1.0106x slower string-unpack-code 24.4761+-0.1017 ! 24.7848+-0.2017 ! definitely 1.0126x slower string-validate-input 7.3702+-0.0667 ? 7.4342+-0.0782 ? <arithmetic> * 7.8416+-0.0215 7.8298+-0.0235 might be 1.0015x faster <geometric> 6.2691+-0.0241 6.2310+-0.0176 might be 1.0061x faster <harmonic> 4.7724+-0.0372 4.7361+-0.0222 might be 1.0077x faster ToT SpillOrder V8: crypto 97.1258+-1.0538 ? 99.0318+-1.2908 ? might be 1.0196x slower deltablue 167.9481+-0.6349 ? 168.0176+-0.8081 ? earley-boyer 107.8401+-0.1878 ! 108.4812+-0.4099 ! definitely 1.0059x slower raytrace 53.0712+-0.4534 52.6155+-0.5008 regexp 129.2446+-1.0419 128.8020+-0.5767 richards 173.4236+-0.3544 ^ 171.5025+-1.5316 ^ definitely 1.0112x faster splay 125.3138+-0.4902 124.7199+-0.5666 <arithmetic> 121.9953+-0.3170 121.8815+-0.3005 might be 1.0009x faster <geometric> * 114.7610+-0.3722 114.7226+-0.2784 might be 1.0003x faster <harmonic> 106.2956+-0.4520 106.2297+-0.3151 might be 1.0006x faster ToT SpillOrder Kraken: ai-astar 785.4906+-1.0815 ? 786.9144+-1.8163 ? audio-beat-detection 248.5505+-1.2685 ! 256.1680+-0.2308 ! definitely 1.0306x slower audio-dft 355.2405+-1.9364 ? 358.1129+-2.4869 ? audio-fft 155.0614+-0.0526 ! 163.8682+-0.6396 ! definitely 1.0568x slower audio-oscillator 335.0814+-1.3393 ! 341.2124+-1.6898 ! definitely 1.0183x slower imaging-darkroom 380.4842+-9.7790 ? 382.8186+-10.1730 ? imaging-desaturate 317.8581+-1.0630 317.4988+-0.4798 imaging-gaussian-blur 596.2950+-1.4765 ^ 537.7586+-0.9815 ^ definitely 1.1089x faster json-parse-financial 69.5950+-0.4176 ? 70.1253+-0.5553 ? json-stringify-tinderbox 105.5944+-0.8867 104.1428+-1.3432 might be 1.0139x faster stanford-crypto-aes 132.9845+-0.3843 ? 133.6331+-0.3833 ? stanford-crypto-ccm 127.3089+-0.5082 ? 127.8093+-0.4991 ? stanford-crypto-pbkdf2 281.0584+-0.6893 ! 283.0713+-0.8316 ! definitely 1.0072x slower stanford-crypto-sha256-iterative 108.2076+-0.2986 108.1264+-0.3654 <arithmetic> * 285.6293+-0.8992 ^ 283.6614+-0.8614 ^ definitely 1.0069x faster <geometric> 227.2966+-0.6312 ? 227.6839+-0.7165 ? might be 1.0017x slower <harmonic> 181.7799+-0.4415 ? 182.8114+-0.6295 ? might be 1.0057x slower ToT SpillOrder All benchmarks: <arithmetic> 107.5885+-0.3070 ^ 106.9788+-0.2664 ^ definitely 1.0057x faster <geometric> 28.1674+-0.0808 28.0854+-0.0606 might be 1.0029x faster <harmonic> 8.4065+-0.0638 8.3448+-0.0381 might be 1.0074x faster ToT SpillOrder Geomean of preferred means: <scaled-result> 63.5818+-0.1712 63.3964+-0.1339 might be 1.0029x faster
Yuqiang Xian
Comment 3 2011-12-29 04:59:52 PST
Performance result on 64bit (Mac OS): Overall almost neutral while a obvious slowdown (8%) on imaging-gaussian-blur is observed (on 32bit there's a 11% improvement for this case), which may be further investigated in the future. VMs tested: "ToT" at /webkit/WebKitBuild-orig/Release/jsc "SpillOrder" at /webkit/WebKitBuild-64/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. ToT SpillOrder SunSpider: 3d-cube 6.5219+-0.0772 ? 6.5384+-0.0815 ? 3d-morph 8.0982+-0.0638 ? 8.1566+-0.0522 ? 3d-raytrace 8.3841+-0.8049 7.8268+-0.1195 might be 1.0712x faster access-binary-trees 1.7863+-0.0935 1.7486+-0.0512 might be 1.0215x faster access-fannkuch 7.3589+-0.0960 7.3325+-0.0762 access-nbody 3.8475+-0.0496 ? 3.8839+-0.0597 ? access-nsieve 3.1162+-0.0518 ? 3.1720+-0.0791 ? might be 1.0179x slower bitops-3bit-bits-in-byte 1.2023+-0.0080 ? 1.2062+-0.0125 ? bitops-bits-in-byte 4.8450+-0.0564 ? 4.8767+-0.0639 ? bitops-bitwise-and 3.1650+-0.0216 3.1500+-0.0052 bitops-nsieve-bits 5.4344+-0.0415 ? 5.4718+-0.0413 ? controlflow-recursive 2.2807+-0.0339 2.2514+-0.0101 might be 1.0130x faster crypto-aes 8.5308+-0.1246 8.4498+-0.1046 crypto-md5 2.4437+-0.0741 2.4285+-0.0325 crypto-sha1 2.2442+-0.0703 ? 2.2763+-0.1707 ? might be 1.0143x slower date-format-tofte 11.0530+-0.0630 ? 11.1376+-0.0764 ? date-format-xparb 9.5936+-0.0409 ^ 9.4306+-0.0888 ^ definitely 1.0173x faster math-cordic 7.0268+-0.0950 6.9927+-0.0693 math-partial-sums 10.3018+-0.0443 10.2964+-0.0644 math-spectral-norm 2.5789+-0.0124 2.5767+-0.0079 regexp-dna 9.0382+-0.1357 ? 9.1914+-0.1509 ? might be 1.0169x slower string-base64 4.6544+-0.0123 ? 4.6783+-0.0162 ? string-fasta 7.8880+-0.1092 ? 7.8938+-0.1069 ? string-tagcloud 12.8570+-0.1142 ? 12.9316+-0.0944 ? string-unpack-code 21.8200+-0.2621 21.6093+-0.1502 string-validate-input 6.0590+-0.0585 ? 6.0633+-0.2153 ? <arithmetic> * 6.6204+-0.0354 6.5989+-0.0214 might be 1.0033x faster <geometric> 5.3390+-0.0330 5.3287+-0.0287 might be 1.0019x faster <harmonic> 4.1997+-0.0354 4.1934+-0.0359 might be 1.0015x faster ToT SpillOrder V8: crypto 75.4973+-0.2389 75.0776+-0.3059 deltablue 161.0954+-0.4027 ! 163.9173+-0.4714 ! definitely 1.0175x slower earley-boyer 99.6115+-1.5928 ? 100.1018+-1.4401 ? raytrace 50.3873+-0.2048 ^ 49.8641+-0.0921 ^ definitely 1.0105x faster regexp 121.3851+-0.1344 ! 122.4113+-0.3352 ! definitely 1.0085x slower richards 135.4591+-0.2543 134.3140+-1.1460 splay 97.6467+-0.9025 96.1546+-1.4374 might be 1.0155x faster <arithmetic> 105.8689+-0.2327 ? 105.9773+-0.2850 ? might be 1.0010x slower <geometric> * 99.7080+-0.2394 99.5747+-0.2911 might be 1.0013x faster <harmonic> 93.0755+-0.2284 92.7282+-0.2595 might be 1.0037x faster ToT SpillOrder Kraken: ai-astar 837.8626+-49.6049 796.1270+-0.4181 might be 1.0524x faster audio-beat-detection 186.1555+-0.9161 ? 187.6734+-1.2043 ? audio-dft 446.1301+-1.5645 ? 453.0984+-8.1987 ? might be 1.0156x slower audio-fft 122.8065+-7.3403 118.5203+-3.1245 might be 1.0362x faster audio-oscillator 256.0711+-1.2247 255.9696+-1.2486 imaging-darkroom 290.8491+-5.3218 ? 291.6089+-5.3305 ? imaging-desaturate 220.6728+-0.0440 ? 220.7556+-0.0591 ? imaging-gaussian-blur 497.9859+-0.9100 ! 540.6406+-0.0741 ! definitely 1.0857x slower json-parse-financial 62.6418+-1.6843 61.9968+-0.0362 might be 1.0104x faster json-stringify-tinderbox 75.5961+-0.1585 75.2695+-0.1742 stanford-crypto-aes 111.9879+-0.3878 111.9130+-0.2919 stanford-crypto-ccm 109.2299+-0.8334 ? 109.3411+-0.8627 ? stanford-crypto-pbkdf2 213.3721+-0.1806 ! 214.8904+-0.2552 ! definitely 1.0071x slower stanford-crypto-sha256-iterative 104.4605+-0.1416 ? 104.5802+-0.1995 ? <arithmetic> * 252.5587+-3.6387 ? 253.0275+-0.8441 ? might be 1.0019x slower <geometric> 191.5593+-1.3249 ? 191.8617+-0.5199 ? might be 1.0016x slower <harmonic> 151.4256+-1.0019 151.0671+-0.3450 might be 1.0024x faster ToT SpillOrder All benchmarks: <arithmetic> 94.6603+-1.1013 ? 94.8042+-0.2741 ? might be 1.0015x slower <geometric> 23.9847+-0.1147 23.9658+-0.0879 might be 1.0008x faster <harmonic> 7.3915+-0.0609 7.3801+-0.0618 might be 1.0015x faster ToT SpillOrder Geomean of preferred means: <scaled-result> 55.0351+-0.3552 54.9871+-0.1370 might be 1.0009x faster
WebKit Review Bot
Comment 4 2011-12-29 05:33:10 PST
Comment on attachment 120731 [details] the patch Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11047213 New failing tests: http/tests/inspector/resource-tree/resource-tree-document-url.html
Yuqiang Xian
Comment 5 2011-12-29 16:53:23 PST
(In reply to comment #4) > (From update of attachment 120731 [details]) > Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/11047213 > > New failing tests: > http/tests/inspector/resource-tree/resource-tree-document-url.html I think this should be a fake failure as JSC has nothing to do with Chromium.
Adam Barth
Comment 6 2011-12-29 16:59:01 PST
> I think this should be a fake failure as JSC has nothing to do with Chromium. Thanks. It looks like that test is somewhat flaky.
Yuqiang Xian
Comment 7 2012-01-04 22:18:30 PST
Created attachment 121216 [details] patch updated Update the patch to _not_ adjust the spill order for 64bit, i.e. still Integers have lower priority to be spilled as there may be additional operations - in some cases the spilled integers are boxed as JS integers when being filled (btw, is it necessary?). The regression of the last patch on imaging-gaussian-blur should be caused by this. So this patch keeps the spill order for 64bit as is, but simply modifies the comments a bit to better describe the reason, and also removes SpillOrderBoolean for 64bit as it's never used.
Filip Pizlo
Comment 8 2013-10-31 11:32:30 PDT
Comment on attachment 121216 [details] patch updated View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:169 > enum SpillOrder { > SpillOrderConstant = 1, // no spill, and cheap fill > SpillOrderSpilled = 2, // no spill > - SpillOrderJS = 4, // needs spill > - SpillOrderCell = 4, // needs spill > - SpillOrderStorage = 4, // needs spill > - SpillOrderInteger = 5, // needs spill and box > - SpillOrderBoolean = 5, // needs spill and box > - SpillOrderDouble = 6, // needs spill and convert > + SpillOrderStorage = 3, // needs spill > + SpillOrderCell = 3, // needs spill > + SpillOrderJS = 3, // needs spill > + SpillOrderInteger = 4, // needs spill, and complex fill > + SpillOrderDouble = 5, // needs spill > }; > #elif USE(JSVALUE32_64) > enum SpillOrder { > SpillOrderConstant = 1, // no spill, and cheap fill > SpillOrderSpilled = 2, // no spill > - SpillOrderJS = 4, // needs spill > - SpillOrderStorage = 4, // needs spill > - SpillOrderDouble = 4, // needs spill > - SpillOrderInteger = 5, // needs spill and box > - SpillOrderCell = 5, // needs spill and box > - SpillOrderBoolean = 5, // needs spill and box > + SpillOrderStorage = 3, // spill involving one GPR > + SpillOrderCell = 3, // spill involving one GPR > + SpillOrderInteger = 3, // spill involving one GPR > + SpillOrderBoolean = 3, // spill involving one GPR > + SpillOrderJS = 4, // spill involving two GPRs > + SpillOrderDouble = 5, // spill involving one FPR > }; > #endif Does changing any of this have any effect on performance? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674 > - // We need to box int32 and cell values ... > - // but on JSVALUE64 boxing a cell is a no-op! > - if (spillFormat == DataFormatInteger) > - m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg); > - Why is this removed? Seems wrong.
Yuqiang Xian
Comment 9 2013-10-31 18:00:27 PDT
Thanks for taking a look at this "nearly two years old" patch. :) (In reply to comment #8) > (From update of attachment 121216 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review > > Does changing any of this have any effect on performance? The performance data was presented in comment #2. The second patch doesn't change the order for JSVALUE64 and as I can remember there's no performance impact on 64bit platforms. > > > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674 > > - // We need to box int32 and cell values ... > > - // but on JSVALUE64 boxing a cell is a no-op! > > - if (spillFormat == DataFormatInteger) > > - m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg); > > - > > Why is this removed? Seems wrong. This can be removed. Actually we already captured the DataFormatInteger (now DataFormatInt32) case in a previous "case", and when we reached here we actually only have the following cases: RELEASE_ASSERT(spillFormat == DataFormatCell || spillFormat & DataFormatJS);
Anders Carlsson
Comment 10 2014-02-05 10:51:29 PST
Comment on attachment 121216 [details] patch updated Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Note You need to log in before you can comment on or make changes to this bug.