RESOLVED FIXED 162316
The write barrier should be down with TSO
https://bugs.webkit.org/show_bug.cgi?id=162316
Summary The write barrier should be down with TSO
Filip Pizlo
Reported 2016-09-20 13:56:21 PDT
Patch forthcoming.
Attachments
work in progress (29.60 KB, patch)
2016-09-23 09:59 PDT, Filip Pizlo
no flags
TSO barrier (51.33 KB, patch)
2016-09-23 17:47 PDT, Filip Pizlo
no flags
a different approach (62.29 KB, patch)
2016-09-24 10:54 PDT, Filip Pizlo
no flags
possibly cheaper and less wrong TSO barrier (70.57 KB, patch)
2016-09-25 15:36 PDT, Filip Pizlo
no flags
the patch (80.65 KB, patch)
2016-09-25 17:17 PDT, Filip Pizlo
no flags
the patch (80.37 KB, patch)
2016-09-25 17:22 PDT, Filip Pizlo
no flags
the patch (80.38 KB, patch)
2016-09-25 17:31 PDT, Filip Pizlo
no flags
the patch (78.67 KB, patch)
2016-09-25 18:06 PDT, Filip Pizlo
no flags
the patch (78.67 KB, patch)
2016-09-25 18:11 PDT, Filip Pizlo
no flags
the patch (83.80 KB, patch)
2016-09-25 22:54 PDT, Filip Pizlo
no flags
trying to get around the regexp regression (100.44 KB, patch)
2016-09-25 23:57 PDT, Filip Pizlo
no flags
the patch (122.92 KB, patch)
2016-09-26 12:37 PDT, Filip Pizlo
no flags
the patch (123.06 KB, patch)
2016-09-26 12:49 PDT, Filip Pizlo
no flags
the patch (123.16 KB, patch)
2016-09-26 14:27 PDT, Filip Pizlo
no flags
the patch (123.16 KB, patch)
2016-09-26 14:45 PDT, Filip Pizlo
ggaren: review+
patch for landing (123.74 KB, patch)
2016-09-27 12:39 PDT, Filip Pizlo
no flags
rebased patch (123.61 KB, patch)
2016-09-28 09:51 PDT, Filip Pizlo
no flags
rebased patch (123.56 KB, patch)
2016-09-28 13:34 PDT, Filip Pizlo
no flags
Filip Pizlo
Comment 1 2016-09-20 13:57:24 PDT
It needs an mfence/dmbish. We can either make this unconditional, if it's fast enough, or we can do hacks. I'll try unconditional first.
Filip Pizlo
Comment 2 2016-09-20 14:19:50 PDT
Looks like putting an mfence into the barrier is an enormous slowdown. We can't do it.
Filip Pizlo
Comment 3 2016-09-22 21:25:41 PDT
Here's what happens if we put the ortop store-load fence before every store barrier.

Biggest slow down on any benchmark: 2x slower on richards
Biggest suite slow down: 12% slower on Octane

It's awfully tempting to say that the barrier is ortop behind a branch.

Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206274)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206274)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

TipOfTree Things
SunSpider:
   3d-cube 4.8347+-0.2649 ? 4.8454+-0.2665 ?
   3d-morph 5.0168+-0.4898 4.8785+-0.3135 might be 1.0283x faster
   3d-raytrace 4.8442+-0.1807 ? 4.9115+-0.2780 ? might be 1.0139x slower
   access-binary-trees 1.9121+-0.1006 ? 2.0883+-0.0962 ? might be 1.0922x slower
   access-fannkuch 4.8294+-0.2304 4.7858+-0.0596
   access-nbody 2.4893+-0.2680 ! 3.4361+-0.0847 ! definitely 1.3804x slower
   access-nsieve 3.3854+-0.3455 3.1194+-0.1590 might be 1.0853x faster
   bitops-3bit-bits-in-byte 1.0935+-0.0915 ? 1.1083+-0.0736 ? might be 1.0135x slower
   bitops-bits-in-byte 2.6328+-0.0724 ? 2.8829+-0.5376 ? might be 1.0950x slower
   bitops-bitwise-and 2.0046+-0.1978 1.9007+-0.0246 might be 1.0546x faster
   bitops-nsieve-bits 3.0671+-0.1410 ? 3.1798+-0.3074 ? might be 1.0367x slower
   controlflow-recursive 2.2067+-0.0854 ? 2.4131+-0.2221 ? might be 1.0935x slower
   crypto-aes 4.2751+-0.2712 4.1564+-0.1486 might be 1.0286x faster
   crypto-md5 2.5822+-0.0850 ? 2.5839+-0.0848 ?
   crypto-sha1 2.7965+-0.1831 2.7643+-0.2527 might be 1.0116x faster
   date-format-tofte 6.7047+-0.4060 ! 7.9941+-0.1243 ! definitely 1.1923x slower
   date-format-xparb 4.5741+-0.2277 4.5505+-0.1035
   math-cordic 2.8048+-0.3212 2.7110+-0.0609 might be 1.0346x faster
   math-partial-sums 3.9242+-0.3053 ? 4.2121+-0.0486 ? might be 1.0734x slower
   math-spectral-norm 2.0118+-0.1842 1.9723+-0.0992 might be 1.0200x faster
   regexp-dna 6.0520+-0.1355 ? 6.3698+-0.2368 ? might be 1.0525x slower
   string-base64 4.4180+-0.2117 ? 4.6267+-0.1075 ? might be 1.0472x slower
   string-fasta 5.2382+-0.0949 ! 5.6356+-0.1404 ! definitely 1.0759x slower
   string-tagcloud 8.0541+-0.3432 ? 8.2793+-0.2468 ? might be 1.0280x slower
   string-unpack-code 17.7172+-0.5782 ? 18.2439+-0.6110 ? might be 1.0297x slower
   string-validate-input 4.2480+-0.3982 ? 4.4784+-0.0904 ? might be 1.0542x slower
   <arithmetic> 4.3737+-0.0502 ! 4.5434+-0.0438 ! definitely 1.0388x slower

TipOfTree Things
LongSpider:
   3d-cube 783.7228+-11.7323 ! 855.6619+-10.6588 ! definitely 1.0918x slower
   3d-morph 568.6765+-11.3501 567.2830+-9.1173
   3d-raytrace 450.9052+-10.1626 ? 455.0590+-10.0853 ?
   access-binary-trees 785.3363+-6.8210 ? 786.5133+-11.4327 ?
   access-fannkuch 237.3588+-2.6353 ? 242.5754+-15.0498 ? might be 1.0220x slower
   access-nbody 503.9350+-7.6098 503.7887+-5.6037
   access-nsieve 285.1292+-5.5067 ? 287.2666+-4.9013 ?
   bitops-3bit-bits-in-byte 33.3164+-2.5575 31.4300+-0.5543 might be 1.0600x faster
   bitops-bits-in-byte 93.4222+-3.6022 ? 95.9135+-5.4419 ? might be 1.0267x slower
   bitops-nsieve-bits 379.5902+-7.2816 375.7859+-1.9959 might be 1.0101x faster
   controlflow-recursive 441.9387+-8.8400 441.6835+-13.3431
   crypto-aes 528.3320+-9.1814 ? 531.1236+-7.3378 ?
   crypto-md5 468.6188+-10.8985 466.1707+-7.9897
   crypto-sha1 624.2290+-5.5675 ? 638.6044+-10.4121 ? might be 1.0230x slower
   date-format-tofte 334.0788+-6.0447 ! 384.7257+-6.9758 ! definitely 1.1516x slower
   date-format-xparb 617.1805+-9.4164 ? 653.0618+-32.5480 ? might be 1.0581x slower
   hash-map 139.1575+-3.9678 ! 152.4310+-4.8154 ! definitely 1.0954x slower
   math-cordic 452.1195+-9.9727 ? 455.3107+-7.3447 ?
   math-partial-sums 285.9348+-7.1712 ! 301.3704+-5.6017 ! definitely 1.0540x slower
   math-spectral-norm 521.5851+-9.2105 ? 525.1613+-7.4278 ?
   string-base64 442.5808+-3.0804 441.8303+-8.4247
   string-fasta 339.3115+-5.9100 ! 369.7477+-11.3455 ! definitely 1.0897x slower
   string-tagcloud 162.7723+-1.7795 ! 170.8195+-4.0981 ! definitely 1.0494x slower
   <geometric> 342.9567+-2.4812 ! 351.8905+-1.5834 ! definitely 1.0260x slower

TipOfTree Things
Octane:
   encrypt 0.15084+-0.00551 0.14965+-0.00353
   decrypt 2.72931+-0.03442 2.72685+-0.03630
   deltablue x2 0.12096+-0.00304 ! 0.15105+-0.00243 ! definitely 1.2488x slower
   earley 0.24457+-0.00166 ! 0.26245+-0.00300 ! definitely 1.0731x slower
   boyer 4.37421+-0.11332 ? 4.45275+-0.12217 ? might be 1.0180x slower
   navier-stokes x2 4.65331+-0.03200 ? 4.68742+-0.08098 ?
   raytrace x2 0.67654+-0.00346 ! 0.77305+-0.01442 ! definitely 1.1426x slower
   richards x2 0.07809+-0.00113 ! 0.15307+-0.00258 ! definitely 1.9601x slower
   splay x2 0.31960+-0.00431 ! 0.43687+-0.00644 ! definitely 1.3669x slower
   regexp x2 16.43119+-0.62910 ! 18.14595+-0.55471 ! definitely 1.1044x slower
   pdfjs x2 39.15434+-0.63255 ? 39.90852+-0.85460 ? might be 1.0193x slower
   mandreel x2 40.01166+-0.75115 ? 40.23776+-0.38598 ?
   gbemu x2 29.55179+-0.39599 ! 31.55389+-0.20267 ! definitely 1.0677x slower
   closure 0.47107+-0.00548 ? 0.47698+-0.00745 ? might be 1.0125x slower
   jquery 6.45894+-0.07439 6.42631+-0.05643
   box2d x2 8.81335+-0.08313 ! 9.69109+-0.06784 ! definitely 1.0996x slower
   zlib x2 343.59926+-3.70208 336.96984+-8.18467 might be 1.0197x faster
   typescript x2 610.15072+-15.15347 ? 627.17623+-9.47707 ? might be 1.0279x slower
   <geometric> 4.77782+-0.01863 ! 5.34574+-0.02589 ! definitely 1.1189x slower

TipOfTree Things
Kraken:
   ai-astar 90.767+-3.093 ? 90.862+-1.908 ?
   audio-beat-detection 35.682+-0.328 ? 37.017+-1.870 ? might be 1.0374x slower
   audio-dft 95.836+-3.737 95.506+-3.921
   audio-fft 30.314+-4.939 27.592+-0.170 might be 1.0986x faster
   audio-oscillator 45.864+-2.911 44.036+-1.469 might be 1.0415x faster
   imaging-darkroom 57.539+-1.431 57.318+-0.417
   imaging-desaturate 41.086+-0.372 40.929+-0.503
   imaging-gaussian-blur 57.484+-1.655 56.516+-2.836 might be 1.0171x faster
   json-parse-financial 31.456+-0.733 ? 32.994+-1.105 ? might be 1.0489x slower
   json-stringify-tinderbox 20.824+-0.872 ? 21.490+-1.169 ? might be 1.0320x slower
   stanford-crypto-aes 34.984+-0.470 ? 36.128+-0.847 ? might be 1.0327x slower
   stanford-crypto-ccm 35.019+-2.087 33.424+-2.722 might be 1.0477x faster
   stanford-crypto-pbkdf2 89.472+-2.999 ? 90.649+-1.593 ? might be 1.0132x slower
   stanford-crypto-sha256-iterative 29.205+-0.226 ? 30.100+-1.818 ? might be 1.0306x slower
   <arithmetic> 49.681+-0.293 49.612+-0.273 might be 1.0014x faster

TipOfTree Things
AsmBench:
   bigfib.cpp 412.8313+-2.9297 ? 418.5986+-12.6721 ? might be 1.0140x slower
   cray.c 362.4455+-6.5672 ? 363.2441+-5.6846 ?
   dry.c 411.2849+-14.0614 400.3989+-12.6636 might be 1.0272x faster
   FloatMM.c 701.0242+-32.9905 680.5246+-11.4577 might be 1.0301x faster
   gcc-loops.cpp 3417.7974+-21.9542 ? 3420.8556+-46.6906 ?
   n-body.c 763.4065+-10.1276 760.3860+-12.8855
   Quicksort.c 376.6666+-6.1638 374.2165+-4.0621
   stepanov_container.cpp 3192.4034+-38.6218 ? 3253.7346+-132.5702 ? might be 1.0192x slower
   Towers.c 252.7060+-3.2718 ? 255.1564+-2.6056 ?
   <geometric> 687.1447+-5.3013 685.5014+-3.4682 might be 1.0024x faster

TipOfTree Things
Geomean of preferred means:
   <scaled-result> 47.6106+-0.1527 ! 49.2801+-0.0711 ! definitely 1.0351x slower
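For readers checking the headline numbers against the table, the "Nx slower" column appears to be the ratio of the two reported means. The sketch below shows that arithmetic; the helper names slowdownFactor and slowdownPercent are made up for illustration and are not part of the benchmark harness.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of the patched mean to the baseline mean.
// A value above 1 means the patched build is slower.
double slowdownFactor(double baselineMean, double patchedMean)
{
    assert(baselineMean > 0);
    return patchedMean / baselineMean;
}

// Hypothetical helper: the same ratio expressed as a percentage slow-down.
double slowdownPercent(double baselineMean, double patchedMean)
{
    return (slowdownFactor(baselineMean, patchedMean) - 1.0) * 100.0;
}
```

Applied to the Octane <geometric> row (4.77782 versus 5.34574), this gives roughly 1.119, which is the "12% slower on Octane" figure; the richards row (0.07809 versus 0.15307) gives roughly 1.96, the "2x slower" figure. Small differences from the printed ratios are expected because the table shows rounded means.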
Filip Pizlo
Comment 4 2016-09-23 09:59:59 PDT
Created attachment 289686
work in progress

I realized that I can be a lot more aggressive about removing barriers, so I'm wiring this through the compiler now.
Filip Pizlo
Comment 5 2016-09-23 16:38:43 PDT
I'm getting some really interesting data. Basically, just putting the barrier after the store, so that this:

    o.f = v

gets turned into this:

    o.f = v
    barrier

instead of this:

    barrier
    o.f = v

Right now, we put the barrier before the store. Putting the barrier after the store appears to be a 10% slow-down on Octane/richards. This causes a 0.7% slow-down on Octane. This is an enormous slow-down from what seems like an innocent change, but it was very easy to narrow it down: putting the barriers before the stores and

On top of this, there is a slow-down from putting in the fence. Right now this means putting the fence behind a conditional. So in the end we get:

    o.f = v
    if (marking)
        fence
    barrier

Putting this conditional fence everywhere causes another 0.9% slow-down. Together it's a 1.6% slow-down on Octane.

So, do we really want to do it this way? Probably not. We probably want to find some way of doing this that does not result in any Octane slow-down. One possible approach would be to leave the barrier before the store and force the fast path to fail when marking is enabled, for example by making it be this:

    if o->cellState > blackThreshold

where blackThreshold is a global variable that the barrier loads from rather than an inline constant. Then, the slow path would buffer the object. When we pop objects off the buffer, we can run the full barrier.

But, I think I'll run some more tests. It's worthwhile to get a complete performance picture, in case the other approach starts having problems, and we want to make trade-offs. Also, I want to get a better understanding of why moving the barrier to after the store makes such a big difference.
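To make the "store, then conditional fence, then barrier" shape above concrete, here is a minimal C++ sketch. Everything in it is a hypothetical stand-in rather than JavaScriptCore's actual code: the Object type, the marking flag, the numeric cell states, the value of blackThreshold, and barrierSlowPath are all assumptions. std::atomic_thread_fence(std::memory_order_seq_cst) stands in for the mfence/dmbish-style store-load fence.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not JavaScriptCore's real types.
struct Object {
    std::atomic<uint8_t> cellState{1};   // numeric encoding is an assumption
    std::atomic<Object*> f{nullptr};     // the field being stored to
};

constexpr uint8_t blackThreshold = 0;    // assumed value
std::atomic<bool> marking{false};        // assumed flag: true while the collector is marking
std::vector<Object*> rememberedObjects;  // stand-in for whatever the slow path records

// Stand-in slow path: just remember the object (single mutator thread assumed).
void barrierSlowPath(Object* o) { rememberedObjects.push_back(o); }

// o.f = v; if (marking) fence; barrier
void storeThenBarrier(Object* o, Object* v)
{
    assert(o);
    o->f.store(v, std::memory_order_relaxed);
    // On TSO hardware the one reordering to rule out is the later cellState
    // load passing the earlier store, hence a store-load fence.
    if (marking.load(std::memory_order_relaxed))
        std::atomic_thread_fence(std::memory_order_seq_cst);
    if (o->cellState.load(std::memory_order_relaxed) <= blackThreshold)
        barrierSlowPath(o);
}
```

With marking false, the only extra work on the fast path is the branch on the flag, which is the cost measured as the conditional-fence slow-down in this comment.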
Filip Pizlo
Comment 6 2016-09-23 16:42:05 PDT
It looks like this performance pathology only shows up in the FTL JIT.
Filip Pizlo
Comment 7 2016-09-23 16:46:54 PDT
Looks like there are some suspicious issues with register allocation in schedule() in richards.
Filip Pizlo
Comment 8 2016-09-23 16:52:58 PDT
I think I see what is going on! The store barrier slow path looks like it clobbers the world, so it blocks B3's load elimination from working right. That's actually an easy fix.
Filip Pizlo
Comment 9 2016-09-23 17:47:49 PDT
Created attachment 289728
TSO barrier

I fixed one of the perf problems. Now I need to test all of the things.
Filip Pizlo
Comment 10 2016-09-24 10:52:40 PDT
Perf: Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5). VMs tested: "TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206314) "Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314) export JSC_useConcurrentBarriers=false export JSC_forceFencedBarrier=false "FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314) export JSC_useConcurrentBarriers=true export JSC_forceFencedBarrier=false "FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314) export JSC_useConcurrentBarriers=true export JSC_forceFencedBarrier=true Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree SunSpider: 3d-cube 4.8855+-0.3437 4.7451+-0.2015 ? 4.7654+-0.0585 ? 5.0327+-0.2626 ? might be 1.0301x slower 3d-morph 4.7341+-0.2022 ? 4.8443+-0.4272 ? 4.8904+-0.2973 4.6849+-0.0377 might be 1.0105x faster 3d-raytrace 4.8137+-0.3456 ? 4.9133+-0.3841 4.7564+-0.0779 ? 4.8594+-0.4431 ? access-binary-trees 1.9301+-0.0335 1.9289+-0.0318 ? 2.1017+-0.2853 2.0540+-0.0671 ! definitely 1.0642x slower access-fannkuch 4.7689+-0.0938 ? 5.0267+-0.3571 4.7893+-0.1429 4.7772+-0.0541 ? access-nbody 2.3896+-0.2317 ? 2.5207+-0.5531 2.3719+-0.2189 ? 2.5851+-0.1483 ? might be 1.0818x slower access-nsieve 3.1248+-0.2647 ? 3.2107+-0.1061 3.1423+-0.1661 ? 3.2327+-0.2379 ? might be 1.0345x slower bitops-3bit-bits-in-byte 1.0614+-0.0761 ? 1.0975+-0.0420 1.0621+-0.0598 ? 1.0815+-0.0585 ? might be 1.0190x slower bitops-bits-in-byte 2.6108+-0.0640 ? 2.6396+-0.0615 ? 2.7493+-0.0688 2.7318+-0.0443 ! 
definitely 1.0463x slower bitops-bitwise-and 1.8669+-0.0144 ? 1.9815+-0.2187 1.9395+-0.0935 ? 1.9695+-0.1221 ? might be 1.0550x slower bitops-nsieve-bits 3.0007+-0.0639 ? 3.0479+-0.1050 2.9696+-0.0364 ? 2.9954+-0.0258 controlflow-recursive 2.2263+-0.0560 2.2260+-0.0224 2.2219+-0.0388 ! 2.3564+-0.0475 ! definitely 1.0585x slower crypto-aes 4.2644+-0.1752 4.1656+-0.0653 ? 4.1970+-0.0317 4.1866+-0.0229 might be 1.0186x faster crypto-md5 2.6059+-0.0477 2.6038+-0.0802 ? 2.6512+-0.1150 ? 2.8451+-0.3306 ? might be 1.0918x slower crypto-sha1 2.6635+-0.0809 ? 2.7002+-0.0838 2.6753+-0.0384 ? 2.7194+-0.0890 ? might be 1.0210x slower date-format-tofte 6.5990+-0.1135 6.5070+-0.1606 ? 7.1180+-0.6053 ! 7.9360+-0.0940 ! definitely 1.2026x slower date-format-xparb 4.3823+-0.0682 ? 4.4602+-0.1820 ? 4.5015+-0.2546 ? 4.5329+-0.0812 ! definitely 1.0344x slower math-cordic 2.6175+-0.0460 ? 2.7590+-0.2880 2.6494+-0.0492 ? 2.8150+-0.2690 ? might be 1.0755x slower math-partial-sums 3.9207+-0.1780 3.7919+-0.0697 ? 3.8057+-0.0458 3.8049+-0.0539 might be 1.0304x faster math-spectral-norm 1.9714+-0.0703 1.9311+-0.0218 ? 1.9405+-0.0335 ? 1.9581+-0.0354 regexp-dna 6.1188+-0.0952 6.0387+-0.1454 ? 6.5048+-0.6697 6.0741+-0.2002 string-base64 4.3668+-0.0441 ? 4.4353+-0.1003 ? 4.6130+-0.4051 ? 4.6448+-0.0466 ! definitely 1.0637x slower string-fasta 5.3062+-0.1170 ? 5.3205+-0.1398 ? 5.6424+-0.5966 5.6309+-0.0922 ! definitely 1.0612x slower string-tagcloud 7.9797+-0.1514 ? 8.1622+-0.5594 7.9825+-0.1638 ? 8.7412+-0.8858 ? might be 1.0954x slower string-unpack-code 19.2155+-0.8224 18.8134+-1.7017 ? 18.9829+-0.8186 18.5221+-0.5099 might be 1.0374x faster string-validate-input 4.0581+-0.0559 ! 4.1686+-0.0525 ? 4.1766+-0.0505 ! 4.3840+-0.1073 ! definitely 1.0803x slower <arithmetic> 4.3647+-0.0593 ? 4.3861+-0.1047 ? 4.4308+-0.0941 ? 4.5060+-0.0705 ! definitely 1.0324x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree LongSpider: 3d-cube 790.3628+-14.4636 ? 
793.4216+-9.2169 791.1963+-8.9537 ! 819.6706+-9.1970 ! definitely 1.0371x slower 3d-morph 560.1136+-2.6505 557.3622+-3.2403 ? 559.6033+-4.2366 556.9900+-2.2083 3d-raytrace 443.4788+-5.0786 ? 445.9079+-3.1484 443.5669+-5.7357 441.3468+-1.9866 access-binary-trees 776.8875+-4.6755 775.5487+-2.5747 ? 782.2760+-4.2171 777.6500+-8.6042 ? access-fannkuch 241.7314+-9.6265 ^ 226.0846+-3.6940 ? 233.5190+-12.2351 ? 234.5291+-10.8571 might be 1.0307x faster access-nbody 497.8625+-7.9337 492.4945+-5.1109 491.0199+-1.6968 ? 491.6904+-2.4094 might be 1.0126x faster access-nsieve 275.3899+-8.6057 ? 282.0843+-10.3947 279.8867+-9.4489 ? 280.7753+-7.8075 ? might be 1.0196x slower bitops-3bit-bits-in-byte 31.5917+-0.5577 ? 32.1492+-1.3468 31.7826+-0.4859 ? 32.4253+-1.6604 ? might be 1.0264x slower bitops-bits-in-byte 90.5475+-1.6808 ? 91.0194+-0.6494 90.2509+-0.5593 ? 90.5641+-2.5394 ? bitops-nsieve-bits 373.0286+-7.8153 366.0847+-3.0362 ? 367.7728+-1.9961 ? 372.6866+-8.1861 controlflow-recursive 432.2858+-4.7629 ? 436.5959+-4.9231 434.4041+-2.4137 ? 436.3765+-2.4653 ? crypto-aes 527.0315+-3.5424 526.9155+-5.7155 525.7543+-3.7497 ? 526.3824+-2.8114 crypto-md5 462.0101+-2.3680 461.5408+-3.2257 453.5203+-6.2378 ? 455.1294+-5.8330 might be 1.0151x faster crypto-sha1 619.0815+-3.5727 ? 667.3594+-114.0927 617.0648+-6.5730 612.4217+-3.5069 might be 1.0109x faster date-format-tofte 335.7357+-3.8220 ? 343.8285+-5.4068 ? 346.6802+-29.3892 ? 378.6551+-10.3137 ! definitely 1.1278x slower date-format-xparb 604.3572+-3.2837 ? 615.7940+-15.3735 ? 618.9718+-38.0003 ? 625.1016+-2.5524 ! definitely 1.0343x slower hash-map 138.0494+-4.4299 134.0581+-3.8635 ? 136.2216+-2.9644 ! 145.2815+-4.0119 ? might be 1.0524x slower math-cordic 440.6676+-3.4782 ? 444.4087+-5.2941 442.6868+-5.2303 ? 443.8083+-3.1154 ? math-partial-sums 282.3561+-4.5102 ? 282.8031+-4.0082 ? 283.3629+-3.1374 ? 283.7110+-2.5464 ? math-spectral-norm 518.3130+-4.2663 514.2906+-2.3935 ? 514.5655+-2.4697 ? 
514.8337+-2.1327 string-base64 491.0991+-5.6191 ^ 475.4124+-2.6904 ? 477.6324+-6.1856 476.0622+-3.8939 ^ definitely 1.0316x faster string-fasta 332.5522+-2.8762 328.0060+-4.0587 ? 331.4103+-3.7413 ! 363.7319+-2.8054 ! definitely 1.0938x slower string-tagcloud 163.5163+-2.0491 163.0950+-2.3055 ? 163.6358+-2.4436 ? 167.2279+-2.8409 ? might be 1.0227x slower <geometric> 340.1453+-0.6782 340.0635+-3.1573 339.4599+-1.2458 ! 344.6359+-1.8147 ! definitely 1.0132x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Octane: encrypt 0.14871+-0.00281 ? 0.14898+-0.00394 ? 0.15280+-0.00594 0.14909+-0.00401 ? decrypt 2.65943+-0.02182 2.65544+-0.01654 ? 2.65951+-0.01500 ? 2.67283+-0.01927 ? deltablue x2 0.11945+-0.00161 ERROR 0.11842+-0.00251 ! 0.13622+-0.00272 ! definitely 1.1404x slower earley 0.23897+-0.00092 ? 0.23939+-0.00386 ? 0.24030+-0.00175 ! 0.26136+-0.00266 ! definitely 1.0937x slower boyer 4.42934+-0.09664 4.38369+-0.12234 ? 4.44571+-0.10589 ? 4.58476+-0.04672 ! definitely 1.0351x slower navier-stokes x2 4.62985+-0.04080 4.61322+-0.01365 4.60662+-0.00561 ? 4.60704+-0.05048 raytrace x2 0.66588+-0.00658 ? 0.66680+-0.00711 ? 0.66809+-0.00307 ! 0.69510+-0.00904 ! definitely 1.0439x slower richards x2 0.07768+-0.00056 0.07736+-0.00135 ! 0.08119+-0.00088 ! 0.15372+-0.00039 ! definitely 1.9790x slower splay x2 0.31131+-0.00229 0.30998+-0.00279 ? 0.31254+-0.00342 ! 0.39484+-0.00156 ! definitely 1.2683x slower regexp x2 16.56341+-0.81563 ? 17.68462+-0.36190 17.40606+-0.84950 ? 18.28816+-0.43341 ! definitely 1.1041x slower pdfjs x2 39.17220+-0.27277 38.63765+-0.46641 ? 38.79572+-0.44520 ! 39.94609+-0.26545 ! definitely 1.0198x slower mandreel x2 39.32074+-0.28400 ? 39.66557+-0.32521 39.48684+-0.28639 39.37410+-0.21706 ? gbemu x2 28.85909+-0.08143 28.69655+-0.25908 ? 29.16622+-0.30302 ? 32.14633+-2.92521 ! definitely 1.1139x slower closure 0.46391+-0.00424 0.46188+-0.00404 ? 0.46291+-0.00433 ? 0.46500+-0.00235 ? jquery 6.33970+-0.04624 ? 
6.36350+-0.08610 6.32101+-0.04921 ? 6.35837+-0.04818 ? box2d x2 8.66160+-0.05203 8.61802+-0.05914 ! 8.72656+-0.03259 ! 8.96710+-0.03578 ! definitely 1.0353x slower zlib x2 336.62914+-2.38209 ? 338.11414+-4.19177 335.33173+-11.15128 ? 339.51557+-3.51460 ? typescript x2 610.39062+-13.85539 608.59302+-9.70143 608.05721+-13.77993 ! 635.74292+-10.20850 ! definitely 1.0415x slower <geometric> 4.72171+-0.01352 ERROR 4.75560+-0.01561 ! 5.20716+-0.03504 ! definitely 1.1028x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Kraken: ai-astar 88.745+-0.755 88.128+-1.606 ? 90.754+-4.183 88.752+-1.947 ? audio-beat-detection 35.425+-0.221 ? 35.951+-1.556 35.536+-0.544 ? 35.848+-0.265 ? might be 1.0119x slower audio-dft 93.684+-1.430 ? 95.182+-2.398 95.145+-1.866 93.880+-1.503 ? audio-fft 27.499+-0.495 ? 27.692+-1.012 27.321+-0.080 27.295+-0.052 audio-oscillator 43.931+-1.493 43.106+-0.156 ? 43.232+-0.398 ? 43.987+-1.585 ? imaging-darkroom 57.659+-2.017 ? 57.733+-2.253 ? 57.922+-1.756 56.847+-0.177 might be 1.0143x faster imaging-desaturate 41.764+-2.569 40.709+-0.322 ? 40.842+-0.481 ? 41.341+-0.765 might be 1.0102x faster imaging-gaussian-blur 56.245+-1.997 ? 56.403+-1.431 ? 56.644+-3.055 ? 57.091+-2.905 ? might be 1.0150x slower json-parse-financial 31.026+-0.461 30.641+-0.354 ? 31.265+-1.270 ? 34.503+-3.230 ? might be 1.1120x slower json-stringify-tinderbox 20.692+-0.805 ? 21.898+-1.977 21.545+-1.670 21.280+-0.950 ? might be 1.0284x slower stanford-crypto-aes 35.303+-1.064 34.637+-0.380 ? 35.438+-1.000 ? 35.551+-0.240 ? stanford-crypto-ccm 32.985+-0.406 ? 33.687+-1.508 33.332+-2.713 ? 34.518+-4.394 ? might be 1.0465x slower stanford-crypto-pbkdf2 89.313+-1.343 ? 89.358+-2.444 ? 89.626+-3.292 89.037+-1.876 stanford-crypto-sha256-iterative 28.878+-0.245 ? 30.131+-2.089 29.116+-0.280 ? 29.451+-0.801 ? might be 1.0198x slower <arithmetic> 48.796+-0.401 ? 48.947+-0.607 ? 49.123+-0.338 ? 49.242+-0.401 ? 
might be 1.0091x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree AsmBench: bigfib.cpp 407.1147+-5.3258 401.8814+-2.2765 ? 404.9376+-4.0144 ? 407.4121+-4.7777 ? cray.c 361.7572+-3.7513 359.6945+-3.0764 ? 361.8408+-2.4479 357.4190+-2.2745 might be 1.0121x faster dry.c 456.1795+-93.4875 404.4752+-11.0078 ? 431.2439+-84.5160 427.5564+-73.1661 might be 1.0669x faster FloatMM.c 702.1463+-40.4966 672.1162+-6.5733 ? 684.6254+-31.0437 684.0195+-32.2109 might be 1.0265x faster gcc-loops.cpp 3350.7568+-10.4230 ? 3369.8304+-17.8820 3346.1816+-10.2480 ? 3353.3887+-12.6830 ? n-body.c 748.5625+-6.1192 ? 750.1156+-3.8622 746.0982+-4.0725 ? 749.2850+-1.9416 ? Quicksort.c 368.8832+-4.4032 ? 369.6404+-5.3791 ? 372.3699+-3.4186 369.6694+-6.2943 ? stepanov_container.cpp 3122.4003+-27.4178 ? 3150.5615+-50.5682 3142.3111+-26.2512 ? 3150.6047+-29.8139 ? Towers.c 250.4936+-3.4886 ? 251.6215+-4.0640 248.2078+-1.4313 ? 249.3890+-2.4550 <geometric> 685.9817+-14.9669 675.0062+-3.2894 ? 679.7620+-12.0832 679.2585+-11.6418 might be 1.0099x faster TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Geomean of preferred means: <scaled-result> 47.2131+-0.1939 ERROR 47.3805+-0.2497 ! 48.5743+-0.2830 ! definitely 1.0288x slower
Filip Pizlo
Comment 11 2016-09-24 10:54:06 PDT
Created attachment 289747
a different approach

I'm trying a different approach, where instead of emitting:

    if (needsFence)
        fence
    if (o->cellState <= threshold)
        slowPath()

I just do:

    if (o->cellState <= sneakyThreshold)
        slowPath()

where sneakyThreshold is a global variable, not a constant. We can raise it to a tautological level when we need fences.
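The sneakyThreshold idea can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the Cell type, the numeric states, the values of blackThreshold and tautologicalThreshold, the setFencesNeeded helper, and the behavior of slowPath (fence, then re-check against the real threshold) are all hypothetical and not taken from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in; not JavaScriptCore's real cell type.
struct Cell {
    std::atomic<uint8_t> cellState{1};   // numeric encoding is an assumption
};

constexpr uint8_t blackThreshold = 0;           // assumed normal threshold
constexpr uint8_t tautologicalThreshold = 255;  // assumed: every state compares <= this

// Global that the fast path loads instead of using an inline constant.
std::atomic<uint8_t> sneakyThreshold{blackThreshold};

int slowPathCalls = 0;              // bookkeeping for the example only
std::vector<Cell*> remembered;      // stand-in for the remembered set

// Assumed slow path: do the fence here, then run the real check.
void slowPath(Cell* o)
{
    ++slowPathCalls;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (o->cellState.load(std::memory_order_relaxed) <= blackThreshold)
        remembered.push_back(o);
}

// Fast path: a single compare against the global, with no separate needsFence branch.
void writeBarrier(Cell* o)
{
    assert(o);
    if (o->cellState.load(std::memory_order_relaxed)
        <= sneakyThreshold.load(std::memory_order_relaxed))
        slowPath(o);
}

// Raising the threshold forces every barrier onto the slow path, where the fence lives.
void setFencesNeeded(bool needed)
{
    sneakyThreshold.store(needed ? tautologicalThreshold : blackThreshold,
                          std::memory_order_relaxed);
}
```

The design trade-off is that the fast path pays for a load of a global rather than a compare against an immediate, in exchange for not needing a second branch when fences are off.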
Filip Pizlo
Comment 12 2016-09-25 15:36:50 PDT
Created attachment 289785
possibly cheaper and less wrong TSO barrier

Still testing it.
Filip Pizlo
Comment 13 2016-09-25 16:58:01 PDT
OK, got some perf. I'm testing these configurations:

TipOfTree = tip of tree
Things = all of the changes in this patch, but the JIT uses a store barrier without any TSO fence
FenceBarrierOff = the fence is dynamically turned off but can be turned on in O(1) time. This is the configuration that code would run in when the concurrent GC is not running.
FenceBarrierOn = the fence is dynamically turned on. This is the configuration that code would run in when the concurrent GC is running.

Summary of results:

SunSpider: FenceBarrierOff is neutral, FenceBarrierOn is 10.6% slower.
LongSpider: FenceBarrierOff is neutral, FenceBarrierOn is 3.5% slower.
Octane: FenceBarrierOff is maybe 0.43% slower, FenceBarrierOn is 14.7% slower.
Kraken: FenceBarrierOff is maybe 0.7% slower, FenceBarrierOn is 3.1% slower.
AsmBench: FenceBarrierOff is maybe 2% slower, FenceBarrierOn is neutral.

Note that I reran SunSpider separately after at first seeing really strange results. Here's all the data.

Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206363)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=false
    export JSC_forceFencedBarrier=false
"FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=false
"FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=true

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing.
Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree SunSpider: 3d-cube 4.6873+-0.2464 4.6723+-0.2168 ? 4.8192+-0.1976 ? 5.1078+-0.1400 ! definitely 1.0897x slower 3d-morph 4.7399+-0.1793 4.7051+-0.1343 ? 4.7362+-0.2140 4.6970+-0.0654 3d-raytrace 4.7384+-0.2259 ? 4.7537+-0.1927 ? 4.7561+-0.1646 ? 4.8793+-0.1036 ? might be 1.0297x slower access-binary-trees 1.9175+-0.0589 ? 1.9902+-0.0583 1.9586+-0.0890 ! 2.2765+-0.0678 ! definitely 1.1872x slower access-fannkuch 4.8763+-0.2529 4.8594+-0.3502 ? 4.8758+-0.1839 4.8537+-0.2047 access-nbody 2.3629+-0.0772 ? 2.4242+-0.1369 2.3947+-0.0988 ! 3.8912+-0.0881 ! definitely 1.6468x slower access-nsieve 3.3562+-0.3997 3.2106+-0.0809 3.1377+-0.1225 ? 3.1865+-0.1666 might be 1.0532x faster bitops-3bit-bits-in-byte 1.0755+-0.0372 1.0734+-0.0248 1.0601+-0.0196 ! 1.1551+-0.0520 ? might be 1.0740x slower bitops-bits-in-byte 2.6095+-0.0462 ! 2.7709+-0.0739 2.7496+-0.0328 ? 2.7807+-0.0526 ! definitely 1.0656x slower bitops-bitwise-and 1.8820+-0.0450 ? 2.0783+-0.3709 1.9802+-0.1080 ? 2.0349+-0.1611 ? might be 1.0813x slower bitops-nsieve-bits 2.9371+-0.0603 ? 3.0051+-0.0388 ? 3.2958+-0.4755 3.0335+-0.0485 ? might be 1.0328x slower controlflow-recursive 2.2253+-0.0141 ? 2.2607+-0.0217 ? 2.3227+-0.1296 ? 2.5735+-0.1562 ! definitely 1.1565x slower crypto-aes 4.1389+-0.0501 ? 4.1825+-0.1189 4.1487+-0.1162 ? 4.3077+-0.0710 ! definitely 1.0408x slower crypto-md5 2.5885+-0.0409 2.5629+-0.0412 ? 2.8102+-0.4482 ? 2.8254+-0.0653 ! definitely 1.0915x slower crypto-sha1 2.6435+-0.0945 ? 2.7382+-0.2091 2.6968+-0.1981 ? 2.7872+-0.0651 ? might be 1.0544x slower date-format-tofte 6.4510+-0.3090 ? 7.0450+-0.7715 6.8742+-0.5163 ! 9.1250+-0.1661 ! definitely 1.4145x slower date-format-xparb 4.3540+-0.1031 ? 4.4781+-0.0952 4.3957+-0.0597 ! 4.8894+-0.3077 ! definitely 1.1229x slower math-cordic 2.6507+-0.0538 2.6259+-0.0568 ? 2.6843+-0.1424 ? 
2.7441+-0.0910 ? might be 1.0353x slower math-partial-sums 3.9649+-0.2047 3.9623+-0.3220 3.8153+-0.0521 ! 4.4985+-0.1553 ! definitely 1.1346x slower math-spectral-norm 1.9359+-0.0580 ? 2.0052+-0.1224 1.9846+-0.0432 ? 2.0085+-0.1015 ? might be 1.0375x slower regexp-dna 6.2087+-0.1649 ? 6.9290+-0.9818 6.2836+-0.2674 ? 6.3951+-0.5061 ? might be 1.0300x slower string-base64 4.3678+-0.0718 ? 4.6694+-0.3538 4.4497+-0.0878 ! 4.9917+-0.1363 ! definitely 1.1428x slower string-fasta 5.2428+-0.0448 5.2300+-0.0742 ? 5.3259+-0.1634 ! 5.7222+-0.0634 ! definitely 1.0914x slower string-tagcloud 7.9255+-0.1528 ? 8.2115+-0.5876 ? 8.5007+-0.6124 ? 8.8456+-0.3920 ! definitely 1.1161x slower string-unpack-code 18.2529+-0.5251 ? 18.7590+-0.3757 18.5401+-0.6173 ! 20.4280+-0.8089 ! definitely 1.1192x slower string-validate-input 4.1118+-0.1167 ? 4.2775+-0.2347 4.1448+-0.0659 ! 4.8023+-0.1121 ! definitely 1.1679x slower <arithmetic> 4.3171+-0.0202 ! 4.4415+-0.0572 4.4131+-0.0433 ! 4.8016+-0.0313 ! definitely 1.1122x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree LongSpider: 3d-cube 790.8875+-11.1969 ^ 772.9956+-5.9998 ? 788.2658+-10.4984 ! 853.4161+-12.1295 ! definitely 1.0791x slower 3d-morph 563.8531+-14.6256 556.9321+-4.7208 556.1187+-3.2730 555.9510+-4.1080 might be 1.0142x faster 3d-raytrace 443.4629+-4.5655 ? 444.7491+-3.3469 443.6145+-3.6014 ! 453.6922+-5.2530 ! definitely 1.0231x slower access-binary-trees 780.3031+-4.2928 ? 783.7270+-15.1960 779.8463+-5.3910 ? 786.0037+-6.8504 ? access-fannkuch 233.6416+-9.2029 ? 239.8330+-14.0239 228.8607+-8.1681 ? 230.9484+-10.9864 might be 1.0117x faster access-nbody 496.3149+-9.3575 496.1256+-7.7534 491.3464+-3.2813 ? 495.5094+-3.6910 access-nsieve 277.9156+-8.0497 ? 289.3253+-9.1517 283.2106+-7.9060 275.9535+-4.6446 bitops-3bit-bits-in-byte 31.2857+-0.7790 ? 31.9063+-1.1519 ? 32.7069+-1.5389 31.7387+-0.6159 ? might be 1.0145x slower bitops-bits-in-byte 90.3698+-1.5036 ? 90.8577+-2.0864 ? 
91.5820+-2.8605 90.2810+-1.5695 bitops-nsieve-bits 369.9926+-6.0209 366.3011+-4.9484 ? 371.0743+-3.2820 370.2164+-5.3564 ? controlflow-recursive 433.7122+-3.5714 ? 434.0434+-1.9258 ? 435.1054+-2.1277 ? 435.8985+-2.8466 ? crypto-aes 526.2620+-4.3421 524.7892+-10.0911 ? 526.5247+-2.5579 ? 530.3883+-3.2583 ? crypto-md5 464.9832+-5.9635 458.1338+-6.0737 ? 458.4068+-7.5515 ? 465.0562+-12.3124 ? crypto-sha1 620.5447+-8.6803 ? 621.0418+-12.7585 617.4234+-13.8790 614.0472+-3.1212 might be 1.0106x faster date-format-tofte 325.0318+-3.8549 ! 342.0714+-11.5045 ? 342.5265+-9.7111 ! 401.3534+-3.3674 ! definitely 1.2348x slower date-format-xparb 602.9838+-6.1904 ? 613.9401+-11.7384 607.4906+-2.4960 ! 670.5593+-24.0152 ! definitely 1.1121x slower hash-map 140.2442+-6.7184 135.1236+-3.7929 134.2746+-3.0869 ! 154.9412+-4.6041 ! definitely 1.1048x slower math-cordic 446.4979+-8.5427 438.1545+-2.8312 ? 442.4903+-3.4614 ? 442.9231+-5.5418 math-partial-sums 281.4023+-2.4358 ? 282.8702+-2.4484 ? 283.3649+-2.0861 ! 303.6508+-2.9016 ! definitely 1.0791x slower math-spectral-norm 515.4920+-4.0498 ? 519.0566+-6.6786 514.0352+-1.9308 ? 515.4438+-2.4737 string-base64 489.9944+-4.9233 ^ 477.8973+-3.2024 474.4139+-2.3064 ? 484.0563+-12.3825 might be 1.0123x faster string-fasta 329.9166+-4.8052 ? 330.5812+-4.9300 ? 332.0518+-3.0716 ! 372.3283+-7.4094 ! definitely 1.1286x slower string-tagcloud 162.3110+-3.1199 ? 163.8420+-1.6578 163.6790+-2.7616 ! 181.6792+-10.0646 ! definitely 1.1193x slower <geometric> 339.2783+-1.1574 ? 339.9833+-1.1491 339.4282+-1.2724 ! 351.1887+-0.7784 ! definitely 1.0351x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Octane: encrypt 0.15209+-0.00578 0.14743+-0.00282 ? 0.14815+-0.00272 ? 0.14910+-0.00369 might be 1.0200x faster decrypt 2.66624+-0.01183 2.65886+-0.01637 ? 2.66809+-0.01365 ? 2.68677+-0.01892 ? deltablue x2 0.11968+-0.00238 0.11904+-0.00267 ? 0.11941+-0.00167 ! 0.14205+-0.00093 ! 
definitely 1.1869x slower earley 0.23867+-0.00097 0.23807+-0.00244 ? 0.24054+-0.00251 ! 0.26432+-0.00188 ! definitely 1.1075x slower boyer 4.38041+-0.10067 ? 4.40628+-0.14514 ? 4.48629+-0.07904 ? 4.54969+-0.07993 ? might be 1.0386x slower navier-stokes x2 4.60619+-0.01825 ? 4.61248+-0.03914 ? 4.61372+-0.02039 4.60672+-0.02478 ? raytrace x2 0.66279+-0.00402 0.66272+-0.00781 ? 0.66651+-0.00862 ! 0.70993+-0.00683 ! definitely 1.0711x slower richards x2 0.07788+-0.00155 0.07699+-0.00094 ? 0.07883+-0.00192 ! 0.16022+-0.00095 ! definitely 2.0571x slower splay x2 0.31497+-0.00953 ? 0.31564+-0.00731 0.31335+-0.01110 ! 0.40770+-0.00237 ! definitely 1.2944x slower regexp x2 16.44196+-0.84384 ? 16.84465+-0.81112 16.63412+-1.07471 ! 22.06028+-0.20795 ! definitely 1.3417x slower pdfjs x2 38.99352+-0.45848 38.65428+-0.61341 ? 39.13356+-0.25845 ! 40.73753+-0.33209 ! definitely 1.0447x slower mandreel x2 39.76992+-0.30873 39.57649+-0.49918 39.54078+-0.20670 ? 39.74957+-0.35685 gbemu x2 28.83507+-0.18851 ? 29.24355+-0.51371 28.96884+-0.13519 ! 34.58463+-0.25396 ! definitely 1.1994x slower closure 0.46649+-0.00371 ? 0.46776+-0.00425 0.46674+-0.00259 ! 0.47910+-0.00387 ! definitely 1.0270x slower jquery 6.39595+-0.03250 6.39522+-0.03984 ? 6.40099+-0.05028 ? 6.46304+-0.04862 ? might be 1.0105x slower box2d x2 8.82091+-0.30222 8.72143+-0.09720 ? 8.82028+-0.10147 ! 9.90167+-0.05397 ! definitely 1.1225x slower zlib x2 335.06905+-9.72222 ? 335.59928+-11.71526 ? 340.75012+-3.20133 339.90800+-3.43031 ? might be 1.0144x slower typescript x2 598.06108+-18.86401 ? 606.66683+-4.50645 ? 609.35295+-9.43489 ! 661.13749+-12.64584 ! definitely 1.1055x slower <geometric> 4.72498+-0.03139 ? 4.72573+-0.03047 ? 4.74564+-0.02170 ! 5.41825+-0.01486 ! definitely 1.1467x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Kraken: ai-astar 88.782+-2.703 ? 90.374+-4.480 89.317+-1.380 89.230+-1.301 ? audio-beat-detection 35.735+-0.641 ? 36.148+-1.872 35.479+-0.298 ! 
35.973+-0.185 ? audio-dft 94.264+-1.122 ? 96.113+-3.379 93.799+-1.145 ? 96.438+-2.629 ? might be 1.0231x slower audio-fft 27.350+-0.079 ? 27.380+-0.078 ? 27.826+-1.384 27.361+-0.065 ? audio-oscillator 44.081+-1.633 43.626+-0.269 ? 43.694+-0.863 43.495+-0.584 might be 1.0135x faster imaging-darkroom 56.776+-0.209 56.742+-0.115 ? 57.027+-0.822 ? 58.320+-2.634 ? might be 1.0272x slower imaging-desaturate 41.200+-0.607 40.952+-0.514 ? 44.124+-3.302 41.717+-1.615 ? might be 1.0126x slower imaging-gaussian-blur 55.944+-1.432 ? 56.385+-2.696 ? 57.228+-2.885 57.151+-0.780 ? might be 1.0216x slower json-parse-financial 31.049+-0.741 ? 32.910+-1.712 31.516+-0.293 ! 38.881+-0.168 ! definitely 1.2522x slower json-stringify-tinderbox 21.655+-1.634 ? 22.725+-2.107 21.565+-1.798 ? 22.370+-2.113 ? might be 1.0330x slower stanford-crypto-aes 34.807+-0.334 34.550+-0.242 ? 35.120+-0.539 ! 37.516+-0.371 ! definitely 1.0778x slower stanford-crypto-ccm 32.937+-0.826 32.452+-1.573 ? 33.040+-2.156 ? 34.621+-2.412 ? might be 1.0512x slower stanford-crypto-pbkdf2 88.481+-1.268 ? 89.278+-1.636 88.951+-1.689 ? 91.512+-1.811 ? might be 1.0343x slower stanford-crypto-sha256-iterative 29.815+-2.254 29.390+-0.443 29.191+-0.386 ? 29.367+-0.308 might be 1.0153x faster <arithmetic> 48.777+-0.489 ? 49.216+-0.809 49.134+-0.225 ! 50.282+-0.259 ! definitely 1.0309x slower TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree AsmBench: bigfib.cpp 405.6804+-2.9668 ? 408.3448+-7.4028 408.0582+-7.0837 404.1593+-1.2290 cray.c 359.2984+-1.4593 ? 360.3912+-4.6435 ? 361.3141+-4.2090 360.1034+-2.2630 ? dry.c 429.1470+-73.7268 423.8395+-61.3341 ? 492.2854+-108.5908 399.6511+-8.6202 might be 1.0738x faster FloatMM.c 666.9849+-26.2727 ? 706.7268+-46.6179 689.2771+-33.2839 686.3002+-30.3009 ? might be 1.0290x slower gcc-loops.cpp 3356.6525+-9.4942 ? 3369.9730+-11.1756 3362.1044+-8.2965 3359.0931+-15.4549 ? n-body.c 748.1152+-5.8446 ? 749.5008+-4.2718 ? 749.9667+-7.4312 ? 750.1257+-1.8834 ? 
Quicksort.c 370.8504+-4.4654 ? 373.2508+-3.5937 ? 375.8099+-5.5369 375.5684+-6.9777 ? might be 1.0127x slower stepanov_container.cpp 3124.3299+-15.4316 ? 3131.8535+-16.2223 ? 3135.3221+-13.0599 ? 3163.1947+-26.3912 ? might be 1.0124x slower Towers.c 248.8219+-2.2847 ? 251.0608+-4.2427 250.3250+-3.9612 ? 250.6464+-2.3194 ? <geometric> 677.1430+-13.1909 ? 683.2071+-14.3882 ? 692.3095+-15.5077 677.1201+-4.4223 might be 1.0000x faster TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree Geomean of preferred means: <scaled-result> 46.9670+-0.1595 ! 47.4242+-0.2858 ? 47.4981+-0.2041 ! 49.9539+-0.1350 ! definitely 1.0636x slower Benchmark report for SunSpider on murderface (MacBookPro11,5). VMs tested: "TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206363) "Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363) export JSC_useConcurrentBarriers=false export JSC_forceFencedBarrier=false "FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363) export JSC_useConcurrentBarriers=true export JSC_forceFencedBarrier=false "FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363) export JSC_useConcurrentBarriers=true export JSC_forceFencedBarrier=true Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree Things FenceBarrierOff FenceBarrierOn FenceBarrierOn v. TipOfTree 3d-cube 4.7798+-0.0733 4.7731+-0.0776 ? 4.7969+-0.0682 ! 5.1575+-0.0735 ! definitely 1.0790x slower 3d-morph 4.9877+-0.1557 4.8620+-0.0797 4.8079+-0.0678 ? 4.8669+-0.0731 might be 1.0248x faster 3d-raytrace 4.7818+-0.0820 ? 4.8683+-0.1002 4.7576+-0.0745 ! 5.1002+-0.1022 ! 
definitely 1.0666x slower access-binary-trees 2.0373+-0.0690 2.0250+-0.0641 ? 2.0319+-0.0647 ! 2.3336+-0.0758 ! definitely 1.1454x slower access-fannkuch 5.0868+-0.0936 5.0123+-0.0779 ? 5.0268+-0.0929 ? 5.1237+-0.0872 ? access-nbody 2.4235+-0.0530 2.4070+-0.0554 ? 2.4074+-0.0393 ! 4.0332+-0.0752 ! definitely 1.6642x slower access-nsieve 3.3544+-0.1123 3.2524+-0.0728 ? 3.2646+-0.0741 ? 3.2886+-0.0592 might be 1.0200x faster bitops-3bit-bits-in-byte 1.1191+-0.0478 1.0958+-0.0320 1.0884+-0.0286 ! 1.1667+-0.0302 ? might be 1.0426x slower bitops-bits-in-byte 2.6866+-0.0639 ! 2.8382+-0.0810 2.8290+-0.0718 ? 2.8594+-0.0656 ! definitely 1.0643x slower bitops-bitwise-and 2.0123+-0.0615 ? 2.0130+-0.0460 ? 2.0272+-0.0553 ? 2.0289+-0.0352 ? bitops-nsieve-bits 3.0680+-0.0560 ? 3.1273+-0.0764 3.0812+-0.0607 ! 3.2384+-0.0794 ! definitely 1.0555x slower controlflow-recursive 2.3504+-0.0681 2.3493+-0.0660 2.2999+-0.0489 ! 2.5551+-0.0394 ! definitely 1.0871x slower crypto-aes 4.2491+-0.0804 ? 4.3124+-0.1037 4.3036+-0.0704 ! 4.4892+-0.1032 ! definitely 1.0565x slower crypto-md5 2.6509+-0.0573 ? 2.6630+-0.0657 ? 2.6635+-0.0614 ! 2.9365+-0.0844 ! definitely 1.1077x slower crypto-sha1 2.7601+-0.0670 2.6642+-0.0375 ? 2.7612+-0.0703 ? 2.8800+-0.0746 ? might be 1.0434x slower date-format-tofte 6.5826+-0.1182 ? 6.7540+-0.1027 6.6871+-0.0935 ! 9.3928+-0.1566 ! definitely 1.4269x slower date-format-xparb 4.4663+-0.0830 ? 4.4911+-0.0612 4.4730+-0.0506 ! 4.9996+-0.0873 ! definitely 1.1194x slower math-cordic 2.6905+-0.0409 ? 2.7427+-0.0657 2.7222+-0.0514 ? 2.7710+-0.0406 ? might be 1.0299x slower math-partial-sums 4.0119+-0.0692 3.9269+-0.0461 ? 4.0258+-0.0746 ! 4.6097+-0.0818 ! definitely 1.1490x slower math-spectral-norm 1.9718+-0.0325 ? 2.0352+-0.0585 ? 2.0393+-0.0652 2.0289+-0.0404 ? might be 1.0289x slower regexp-dna 6.3970+-0.1175 6.3011+-0.0780 ? 6.3835+-0.1095 ? 6.4992+-0.1180 ? might be 1.0160x slower string-base64 4.4889+-0.0816 ? 4.5403+-0.0753 ? 4.5546+-0.0830 ! 5.1134+-0.0853 ! 
definitely 1.1391x slower string-fasta 5.4806+-0.0922 5.4062+-0.1009 ? 5.4205+-0.0723 ! 5.8931+-0.0713 ! definitely 1.0752x slower string-tagcloud 8.2230+-0.0988 ? 8.2351+-0.0956 8.2001+-0.0865 ! 9.0668+-0.1247 ! definitely 1.1026x slower string-unpack-code 18.1459+-0.1789 ? 18.2796+-0.2175 18.2104+-0.1930 ! 19.8778+-0.2530 ! definitely 1.0954x slower string-validate-input 4.2079+-0.0748 ? 4.2927+-0.0884 4.2476+-0.0599 ! 4.8576+-0.0471 ! definitely 1.1544x slower <arithmetic> 4.4236+-0.0190 ? 4.4334+-0.0149 4.4273+-0.0162 ! 4.8911+-0.0188 ! definitely 1.1057x slower
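For readers checking the last column by hand, here is a minimal sketch of how the "1.1057x slower" figure on the `<arithmetic>` row can be reproduced from the TipOfTree and FenceBarrierOn means. The struct and function names are hypothetical. The rule that "definitely" corresponds to non-overlapping 95% confidence intervals is an assumption about the report format, though it agrees with the rows shown above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: a mean and the half-width of its 95% confidence
// interval, i.e. the two numbers written as "mean+-halfWidth" in the report.
struct Measurement {
    double mean;
    double halfWidth;
};

// Ratio of candidate time to baseline time; values above 1 mean "slower".
inline double slowdownRatio(const Measurement& baseline, const Measurement& candidate)
{
    return candidate.mean / baseline.mean;
}

// True if the two confidence intervals share at least one point.
// Assumption: the report says "definitely" only when this returns false.
inline bool intervalsOverlap(const Measurement& a, const Measurement& b)
{
    return a.mean - a.halfWidth <= b.mean + b.halfWidth
        && b.mean - b.halfWidth <= a.mean + a.halfWidth;
}
```

With the `<arithmetic>` row, 4.8911 / 4.4236 is about 1.1057 and the intervals [4.4046, 4.4426] and [4.8723, 4.9099] do not overlap. With the 3d-morph row the intervals do overlap, which matches the weaker "might be" wording there.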
Filip Pizlo
Comment 14 2016-09-25 17:00:32 PDT
*** Bug 162318 has been marked as a duplicate of this bug. ***
Filip Pizlo
Comment 15 2016-09-25 17:17:07 PDT
Created attachment 289786 [details] the patch
WebKit Commit Bot
Comment 16 2016-09-25 17:19:49 PDT
Attachment 289786 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 48 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 17 2016-09-25 17:20:50 PDT
Comment on attachment 289786 [details] the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289786&action=review

> Source/JavaScriptCore/ChangeLog:43
> + store-load fence on any kind before store barriers, because that causes enormous slow

*of any kind

> Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.h:87
> +// FIXME: If we sink barriers, we need to make sure that we execute barriers upon OSR exit.

I fixed this with the mayExit thing.
Filip Pizlo
Comment 18 2016-09-25 17:22:23 PDT
Created attachment 289787 [details] the patch Fixes some build issues and small goofs.
WebKit Commit Bot
Comment 19 2016-09-25 17:25:05 PDT
Attachment 289787 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 48 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 20 2016-09-25 17:31:26 PDT
Created attachment 289788 [details] the patch More build fixes!
WebKit Commit Bot
Comment 21 2016-09-25 17:34:13 PDT
Attachment 289788 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 48 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 22 2016-09-25 18:06:46 PDT
Created attachment 289789 [details] the patch Turns out that all of my ARM assembler changes were not necessary.
WebKit Commit Bot
Comment 23 2016-09-25 18:08:03 PDT
Attachment 289789 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 46 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 24 2016-09-25 18:11:48 PDT
Created attachment 289790 [details] the patch
WebKit Commit Bot
Comment 25 2016-09-25 18:13:06 PDT
Attachment 289790 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 46 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 26 2016-09-25 19:13:57 PDT
This looks like a 0.7% slow-down on JetStream with 91% confidence. I think I know why, so I'm going to try things.
Filip Pizlo
Comment 27 2016-09-25 22:30:48 PDT
It looks like a 0.68% slow-down on JetStream with 90.4% confidence. I analyzed the per-benchmark data and found that the following benchmarks are regressed:

- richards, by about 4%
- deltablue, by about 3%
- splay, by about 2%
- regexp-2010, by about 5%

Out of these, the regexp regression seems the most pronounced. Also, it's the only one confirmed by run-jsc-benchmarks.

If the other benchmarks become bottlenecks, then I'm starting to think that maybe for super hot code, we should emit a patchable nop sled for a fence, and then blast over it with fences at the start of GC and then reset it back at the end. This would lead to amazing perf, provided that we could zap those fences quickly enough. So, how quickly can we modify such code? I'll look into all of that after I fix regexp.
Filip Pizlo
Comment 28 2016-09-25 22:54:28 PDT
Created attachment 289799 [details] the patch The JetStream regression is small, but I'm making it smaller!
WebKit Commit Bot
Comment 29 2016-09-25 22:57:21 PDT
Attachment 289799 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 48 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 30 2016-09-25 23:57:52 PDT
Created attachment 289800 [details] trying to get around the regexp regression I'm still trying to figure out how best to do it.
Filip Pizlo
Comment 31 2016-09-26 12:37:55 PDT
Created attachment 289846 [details] the patch More performant version.
WebKit Commit Bot
Comment 32 2016-09-26 12:39:40 PDT
Attachment 289846 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 61 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 33 2016-09-26 12:49:56 PDT
Created attachment 289847 [details] the patch Fixing debug build.
WebKit Commit Bot
Comment 34 2016-09-26 12:51:15 PDT
Attachment 289847 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 61 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 35 2016-09-26 14:27:19 PDT
Created attachment 289869 [details] the patch Attempting to fix Windows.
WebKit Commit Bot
Comment 36 2016-09-26 14:28:43 PDT
Attachment 289869 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 61 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 37 2016-09-26 14:45:30 PDT
Created attachment 289875 [details] the patch trying to get windows to work again
WebKit Commit Bot
Comment 38 2016-09-26 14:48:39 PDT
Attachment 289875 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 61 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 39 2016-09-26 14:59:36 PDT
r=me

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> + m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));

I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or "magic". Usually, algorithms do something explainable, and we should strive to explain them.

How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?

Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)

Or "jumpIfNotCollectingAndIsRememberedOrInEden".

> Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> + // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that

to barrier that value?

> Source/JavaScriptCore/heap/CellState.h:58
> +inline bool isWithinThreshold(CellState cellState, unsigned threshold)

Maybe isBelowThreshold?

> Source/JavaScriptCore/heap/DeferralContext.h:33
> +class DeferralContext {

How about GCDeferralContext or DeferGCContext (to match DeferGC)?

> Source/JavaScriptCore/heap/Heap.cpp:1555
> + // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from

is a

I think you meant that the sneakyBlackThreshold *could be* the tautological threshold -- not that it is guaranteed to be.

> Source/JavaScriptCore/heap/Heap.h:269
> + unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> + const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }

Let's just call this blackThreshold.
Geoffrey Garen
Comment 40 2016-09-26 14:59:49 PDT
Comment on attachment 289875 [details] the patch r=me
Filip Pizlo
Comment 41 2016-09-26 15:13:11 PDT
(In reply to comment #39)
> r=me
>
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> > + m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));
>
> I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or
> "magic". Usually, algorithms do something explainable, and we should strive
> to explain them.
>
> How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
>
> Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
>
> Or "jumpIfNotCollectingAndIsRememberedOrInEden".

Those names have so many words!

How about I use these names:

sneakyJumpIfIsRememberedOrInEden -> barrierJump
jumpIfIsRememberedOrInEden -> barrierJumpWithFence
jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

> > Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> > + // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that
>
> to barrier that value?

Yes.

> > Source/JavaScriptCore/heap/CellState.h:58
> > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
>
> Maybe isBelowThreshold?

But it's not below threshold. Below means "<". It's below or equal to the threshold. Rather than using BelowOrEqual, I thought "Within" was shorter.

> > Source/JavaScriptCore/heap/DeferralContext.h:33
> > +class DeferralContext {
>
> How about GCDeferralContext or DeferGCContext (to match DeferGC)?

OK.

> > Source/JavaScriptCore/heap/Heap.cpp:1555
> > + // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from
>
> is a
>
> I think you meant that the sneakyBlackThreshold *could be* the tautological
> threshold -- not that it is guaranteed to be.

It is guaranteed to be if barrierShouldBeFenced() is true, which is the condition that guards this comment.

> > Source/JavaScriptCore/heap/Heap.h:269
> > + unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > + const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
>
> Let's just call this blackThreshold.

I don't think that really captures it. This threshold isn't really to do with being black, it's to do with taking the barrier slow path. I'll call it barrierThreshold.
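To make the naming discussion concrete, here is a rough sketch of the idea as described in this thread: the barrier fast path is a single "within threshold" compare, and the store-load fence lives behind that branch. Only `isWithinThreshold`, `barrierThreshold`, and `barrierShouldBeFenced` come from the review. The enum values, the two threshold constants, `SketchHeap`, and `storeBarrier` are hypothetical stand-ins, and the slow path here only counts invocations, so this should not be read as the code in the patch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical cell states and numeric values, chosen only so that the
// "<=" comparison below has something to order; they are not from the patch.
enum class CellState : unsigned {
    PossiblyBlack = 0, // may already have been scanned by the collector
    White = 1,         // not scanned yet, or newly allocated
};

// Assumed threshold values (illustrative only).
constexpr unsigned blackThreshold = 0;          // only PossiblyBlack is within it
constexpr unsigned tautologicalThreshold = 100; // every state is within it

// "Within" means below or equal, as discussed in the review.
inline bool isWithinThreshold(CellState cellState, unsigned threshold)
{
    return static_cast<unsigned>(cellState) <= threshold;
}

// Hypothetical stand-in for the heap state that the barrier consults.
struct SketchHeap {
    unsigned barrierThreshold = blackThreshold;
    bool barrierShouldBeFenced = false;
    unsigned slowPathCount = 0;

    // Assumed to be flipped when concurrent marking starts and stops.
    void setBarrierShouldBeFenced(bool fenced)
    {
        barrierShouldBeFenced = fenced;
        barrierThreshold = fenced ? tautologicalThreshold : blackThreshold;
    }
};

// Returns true if the store barrier took its slow path.
inline bool storeBarrier(SketchHeap& heap, CellState ownerState)
{
    // Fast path: one compare and branch, with no fence.
    if (!isWithinThreshold(ownerState, heap.barrierThreshold))
        return false;

    // Slow path: the store-load fence sits behind the branch.
    if (heap.barrierShouldBeFenced)
        std::atomic_thread_fence(std::memory_order_seq_cst);
    ++heap.slowPathCount;
    return true;
}
```

In this sketch, a white owner skips the slow path while the threshold is `blackThreshold`. Once `setBarrierShouldBeFenced(true)` installs the tautological threshold, every owner takes the slow path, which is where the fence executes.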
Geoffrey Garen
Comment 42 2016-09-26 15:21:49 PDT
> > How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
> >
> > Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
> >
> > Or "jumpIfNotCollectingAndIsRememberedOrInEden".
>
> Those names have so many words!
>
> How about I use these names:
>
> sneakyJumpIfIsRememberedOrInEden -> barrierJump
> jumpIfIsRememberedOrInEden -> barrierJumpWithFence
> jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

Yeah, that sounds pretty good.

> > > Source/JavaScriptCore/heap/CellState.h:58
> > > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
> >
> > Maybe isBelowThreshold?
>
> But it's not below threshold. Below means "<". It's below or equal to the
> threshold. Rather than using BelowOrEqual, I thought "Within" was shorter.

OK.

> > I think you meant that the sneakyBlackThreshold *could be* the tautological
> > threshold -- not that it is guaranteed to be.
>
> It is guaranteed to be if barrierShouldBeFenced() is true, which is the
> condition that guards this comment.

OK.

> > > Source/JavaScriptCore/heap/Heap.h:269
> > > + unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > > + const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
> >
> > Let's just call this blackThreshold.
>
> I don't think that really captures it. This threshold isn't really to do
> with being black, it's to do with taking the barrier slow path. I'll call
> it barrierThreshold.

Sounds good.
Filip Pizlo
Comment 43 2016-09-27 12:33:36 PDT
(In reply to comment #41)
> (In reply to comment #39)
> > r=me
> >
> > > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> > > + m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));
> >
> > I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or
> > "magic". Usually, algorithms do something explainable, and we should strive
> > to explain them.
> >
> > How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
> >
> > Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
> >
> > Or "jumpIfNotCollectingAndIsRememberedOrInEden".
>
> Those names have so many words!
>
> How about I use these names:
>
> sneakyJumpIfIsRememberedOrInEden -> barrierJump
> jumpIfIsRememberedOrInEden -> barrierJumpWithFence
> jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

I ended up removing barrierJumpWithFence.

> > > Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> > > + // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that
> >
> > to barrier that value?
>
> Yes.
>
> > > Source/JavaScriptCore/heap/CellState.h:58
> > > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
> >
> > Maybe isBelowThreshold?
>
> But it's not below threshold. Below means "<". It's below or equal to the
> threshold. Rather than using BelowOrEqual, I thought "Within" was shorter.
>
> > > Source/JavaScriptCore/heap/DeferralContext.h:33
> > > +class DeferralContext {
> >
> > How about GCDeferralContext or DeferGCContext (to match DeferGC)?
>
> OK.

Fixed.

> > > Source/JavaScriptCore/heap/Heap.cpp:1555
> > > + // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from
> >
> > is a
> >
> > I think you meant that the sneakyBlackThreshold *could be* the tautological
> > threshold -- not that it is guaranteed to be.
>
> It is guaranteed to be if barrierShouldBeFenced() is true, which is the
> condition that guards this comment.
>
> > > Source/JavaScriptCore/heap/Heap.h:269
> > > + unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > > + const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
> >
> > Let's just call this blackThreshold.
>
> I don't think that really captures it. This threshold isn't really to do
> with being black, it's to do with taking the barrier slow path. I'll call
> it barrierThreshold.

Fixed.
Filip Pizlo
Comment 44 2016-09-27 12:39:01 PDT
Created attachment 289998 [details] patch for landing
WebKit Commit Bot
Comment 45 2016-09-27 12:42:29 PDT
Attachment 289998 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 62 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 46 2016-09-28 09:51:02 PDT
Created attachment 290096 [details] rebased patch I just have to test on ARM and then I'll land.
WebKit Commit Bot
Comment 47 2016-09-28 09:52:31 PDT
Attachment 290096 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 62 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 48 2016-09-28 13:33:57 PDT
Oh boy! Now I just need to benchmark this on ARM and I'm done!
Filip Pizlo
Comment 49 2016-09-28 13:34:57 PDT
Created attachment 290112 [details] rebased patch
WebKit Commit Bot
Comment 50 2016-09-28 13:36:37 PDT
Attachment 290112 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 62 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 51 2016-09-28 14:58:34 PDT
Landed in https://trac.webkit.org/changeset/206555
Csaba Osztrogonác
Comment 52 2016-09-29 05:45:08 PDT
(In reply to comment #51)
> Landed in https://trac.webkit.org/changeset/206555

It made Dromaeo/jslib-style-jquery.html crash on performance bots, see bug 162721 for details.