Bug 147545

Summary: Lightweight locks should be adaptive
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: Web Template FrameworkAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, barraclough, basile_clement, benjamin, cdumez, commit-queue, ggaren, kling, mark.lam, mhahnenb, mmirman, msaboff, nrotem, oliver, ossy, saam, sam, simon.fraser
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 147675, 147776    
Bug Blocks: 147665    
Attachments:
Description Flags
work in progress
none
work in progress
none
the patch
none
the patch
none
the patch
none
fix build
none
patch for landing
fpizlo: review+
fix most style
fpizlo: review+
patch for relanding fpizlo: review+

Filip Pizlo
Reported 2015-08-02 14:04:52 PDT
Current WebKit uses a lot of spinlocks because they are cheap - they occupy little memory and they are inexpensive to acquire. But they don't adapt to contention - instead they just spin. We should make them adaptive.
Attachments
work in progress (16.66 KB, patch)
2015-08-02 14:14 PDT, Filip Pizlo
no flags
work in progress (47.20 KB, patch)
2015-08-04 11:55 PDT, Filip Pizlo
no flags
the patch (63.63 KB, patch)
2015-08-05 14:08 PDT, Filip Pizlo
no flags
the patch (65.50 KB, patch)
2015-08-05 16:19 PDT, Filip Pizlo
no flags
the patch (74.43 KB, patch)
2015-08-06 12:43 PDT, Filip Pizlo
no flags
fix build (74.40 KB, patch)
2015-08-06 13:37 PDT, Filip Pizlo
no flags
patch for landing (73.91 KB, patch)
2015-08-06 14:56 PDT, Filip Pizlo
fpizlo: review+
fix most style (74.20 KB, patch)
2015-08-06 15:04 PDT, Filip Pizlo
fpizlo: review+
patch for relanding (77.47 KB, patch)
2015-08-07 13:10 PDT, Filip Pizlo
fpizlo: review+
Filip Pizlo
Comment 1 2015-08-02 14:14:27 PDT
Created attachment 258040 [details] work in progress
Filip Pizlo
Comment 2 2015-08-04 11:55:23 PDT
Created attachment 258193 [details] work in progress Replacing all SpinLock uses with Lock. Added a benchmark, and documentation showing the performance results.
Anders Carlsson
Comment 3 2015-08-04 12:12:15 PDT
Interesting. Can we come up with a better name than "Lock"?
Filip Pizlo
Comment 4 2015-08-04 12:27:43 PDT
(In reply to comment #3) > Interesting. Can we come up with a better name than "Lock"? I was torn about this, and am open to suggestions. Ggaren suggested "Lock" because a simple name will encourage people to use it.
Anders Carlsson
Comment 5 2015-08-04 12:41:50 PDT
(In reply to comment #4) > (In reply to comment #3) > > Interesting. Can we come up with a better name than "Lock"? > > I was torn about this, and am open to suggestions. > > Ggaren suggested "Lock" because a simple name will encourage people to use > it. So this lock will spin for a while and then fall back to a sleeping "system" lock?
Filip Pizlo
Comment 6 2015-08-04 12:47:02 PDT
(In reply to comment #5) > (In reply to comment #4) > > (In reply to comment #3) > > > Interesting. Can we come up with a better name than "Lock"? > > > > I was torn about this, and am open to suggestions. > > > > Ggaren suggested "Lock" because a simple name will encourage people to use > > it. > > So this lock will spin for a while and then fall back to a sleeping "system" > lock? Basically. 40 spins where we call yield, followed by enqueueing and sleeping. We don't really fall back on the system lock - instead a system lock is used for "parking" - i.e. putting the thread to sleep while it waits for some other thread to wake it up. So, when a thread is sleeping, the thread will appear to be wait()'ing on a std::condition_variable while the data structures in Lock.cpp track that the thread is on the queue for that lock.
Geoffrey Garen
Comment 7 2015-08-04 12:53:37 PDT
On second thought, I guess the C++-consistent name would be "Mutex" rather than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a mutex", and std::mutex to mean "I am the thing that does the testing and sleeping". So, this is really a replacement for std::mutex and not std::*lock*.
Anders Carlsson
Comment 8 2015-08-04 12:55:56 PDT
(In reply to comment #7) > On second thought, I guess the C++-consistent name would be "Mutex" rather > than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a > mutex", and std::mutex to mean "I am the thing that does the testing and > sleeping". So, this is really a replacement for std::mutex and not > std::*lock*. Critical Section 😑
Filip Pizlo
Comment 9 2015-08-04 13:14:07 PDT
(In reply to comment #8) > (In reply to comment #7) > > On second thought, I guess the C++-consistent name would be "Mutex" rather > > than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a > > mutex", and std::mutex to mean "I am the thing that does the testing and > > sleeping". So, this is really a replacement for std::mutex and not > > std::*lock*. > > Critical Section 😑 Noooo! :-P I agree that Mutex is the correct name, but then we'd first have to rename the current WTF::Mutex to something like WTF::SystemMutex. That seems annoying, but it might be OK.
Geoffrey Garen
Comment 10 2015-08-04 13:22:23 PDT
> I agree that Mutex is the correct name, but then we'd first have to rename > the current WTF::Mutex to something like WTF::SystemMutex. That seems > annoying, but it might be OK. If the thesis of this patch is correct, then the current WTF::Mutex should become WTF::DeprecatedMutex, since there is no good reason to use the old style, and we should migrate away from it over time. Another reason to migrate away from WTF::Mutex is that it is just a porting holdover from before the C++ thread support library existed. The same goes for WTF::Locker, WTF::ThreadCondition, etc.
Filip Pizlo
Comment 11 2015-08-04 13:25:52 PDT
(In reply to comment #10) > > I agree that Mutex is the correct name, but then we'd first have to rename > > the current WTF::Mutex to something like WTF::SystemMutex. That seems > > annoying, but it might be OK. > > If the thesis of this patch is correct, then the current WTF::Mutex should > become WTF::DeprecatedMutex, since there is no good reason to use the old > style, and we should migrate away from it over time. > > Another reason to migrate away from WTF::Mutex is that it is just a porting > holdover from before the C++ thread support library existed. The same goes > for WTF::Locker, WTF::ThreadCondition, etc. Well, WTF::Mutex currently has the ability to support condition variables. One approach would be to say that all users of WTF::Mutex should move to std::mutex. But I prefer a different approach: implement a WTF::Condition using the same style as this patch - one word in the common case, etc. Then all users of std::mutex should move to the new WTF::Mutex. Even better, we could probably implement a one-word WTF::Monitor that is a combo of Mutex and Condition. This works pretty well for a lot of our code since we often just have one Condition per Mutex. I think that I like WTF::Lock - or some other sufficiently unique name - for the interim before we have a fast WTF::Condition that interoperates with WTF::Lock. Once we have that, maybe we can do the mass renamings?
Mark Lam
Comment 12 2015-08-04 13:27:53 PDT
(In reply to comment #10) > > I agree that Mutex is the correct name, but then we'd first have to rename > > the current WTF::Mutex to something like WTF::SystemMutex. That seems > > annoying, but it might be OK. > > If the thesis of this patch is correct, then the current WTF::Mutex should > become WTF::DeprecatedMutex, since there is no good reason to use the old > style, and we should migrate away from it over time. > > Another reason to migrate away from WTF::Mutex is that it is just a porting > holdover from before the C++ thread support library existed. The same goes > for WTF::Locker, WTF::ThreadCondition, etc. From what I can tell, Fil's implementation does not support priority inheritance. If all a low priority is thread is holding the Lock/Mutex and high priority threads are waiting on the ConditionVar, the owner thread of the Lock will continue to run at a low priority relative to other high priority threads. Does WebKit rely on priority inheritance anywhere?
Anders Carlsson
Comment 13 2015-08-04 13:41:19 PDT
We could just rename current WTF::Mutex to WTF::DeprecatedMutex. Also, I think you should put this class inside wtf/threading where we have (admittedly only one at the moment) other threading classes.
Filip Pizlo
Comment 14 2015-08-04 13:58:07 PDT
(In reply to comment #12) > (In reply to comment #10) > > > I agree that Mutex is the correct name, but then we'd first have to rename > > > the current WTF::Mutex to something like WTF::SystemMutex. That seems > > > annoying, but it might be OK. > > > > If the thesis of this patch is correct, then the current WTF::Mutex should > > become WTF::DeprecatedMutex, since there is no good reason to use the old > > style, and we should migrate away from it over time. > > > > Another reason to migrate away from WTF::Mutex is that it is just a porting > > holdover from before the C++ thread support library existed. The same goes > > for WTF::Locker, WTF::ThreadCondition, etc. > > From what I can tell, Fil's implementation does not support priority > inheritance. If all a low priority is thread is holding the Lock/Mutex and > high priority threads are waiting on the ConditionVar, the owner thread of > the Lock will continue to run at a low priority relative to other high > priority threads. Does WebKit rely on priority inheritance anywhere? True. Our SpinLock also doesn't support priority inheritance, and this currently replaces SpinLock. So, we should be fine.
Mark Lam
Comment 15 2015-08-04 14:01:32 PDT
(In reply to comment #14) > (In reply to comment #12) > > (In reply to comment #10) > > > > I agree that Mutex is the correct name, but then we'd first have to rename > > > > the current WTF::Mutex to something like WTF::SystemMutex. That seems > > > > annoying, but it might be OK. > > > > > > If the thesis of this patch is correct, then the current WTF::Mutex should > > > become WTF::DeprecatedMutex, since there is no good reason to use the old > > > style, and we should migrate away from it over time. > > > > > > Another reason to migrate away from WTF::Mutex is that it is just a porting > > > holdover from before the C++ thread support library existed. The same goes > > > for WTF::Locker, WTF::ThreadCondition, etc. > > > > From what I can tell, Fil's implementation does not support priority > > inheritance. If all a low priority is thread is holding the Lock/Mutex and > > high priority threads are waiting on the ConditionVar, the owner thread of > > the Lock will continue to run at a low priority relative to other high > > priority threads. Does WebKit rely on priority inheritance anywhere? > > True. Our SpinLock also doesn't support priority inheritance, and this > currently replaces SpinLock. So, we should be fine. I'm ok with it replacing SpinLock. My concern was with Geoff's consideration of replacing all uses of WTF::Mutex with this Lock implementation. WTF::Mutex does support priority inheritance. I just want to make sure that we're not replacing the current WTF::Mutex without due consideration.
Filip Pizlo
Comment 16 2015-08-04 14:11:45 PDT
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #12) > > > (In reply to comment #10) > > > > > I agree that Mutex is the correct name, but then we'd first have to rename > > > > > the current WTF::Mutex to something like WTF::SystemMutex. That seems > > > > > annoying, but it might be OK. > > > > > > > > If the thesis of this patch is correct, then the current WTF::Mutex should > > > > become WTF::DeprecatedMutex, since there is no good reason to use the old > > > > style, and we should migrate away from it over time. > > > > > > > > Another reason to migrate away from WTF::Mutex is that it is just a porting > > > > holdover from before the C++ thread support library existed. The same goes > > > > for WTF::Locker, WTF::ThreadCondition, etc. > > > > > > From what I can tell, Fil's implementation does not support priority > > > inheritance. If all a low priority is thread is holding the Lock/Mutex and > > > high priority threads are waiting on the ConditionVar, the owner thread of > > > the Lock will continue to run at a low priority relative to other high > > > priority threads. Does WebKit rely on priority inheritance anywhere? > > > > True. Our SpinLock also doesn't support priority inheritance, and this > > currently replaces SpinLock. So, we should be fine. > > I'm ok with it replacing SpinLock. My concern was with Geoff's > consideration of replacing all uses of WTF::Mutex with this Lock > implementation. WTF::Mutex does support priority inheritance. I just want > to make sure that we're not replacing the current WTF::Mutex without due > consideration. Considering how much we use SpinLock, whatever priority inheritance Mutex does have is ineffective. It only takes one lock that doesn't know about priorities to cause a priority inversion, and I think we use SpinLock much more than Mutex.
Gavin Barraclough
Comment 17 2015-08-04 17:38:33 PDT
> > From what I can tell, Fil's implementation does not support priority > > inheritance. If all a low priority is thread is holding the Lock/Mutex and > > high priority threads are waiting on the ConditionVar, the owner thread of > > the Lock will continue to run at a low priority relative to other high > > priority threads. Does WebKit rely on priority inheritance anywhere? > > True. Our SpinLock also doesn't support priority inheritance, and this > currently replaces SpinLock. So, we should be fine. This line of reasoning make me like the name WTF::SystemMutex for the old mutex. If you explicitly need all the system mutex behaviors (i.e. priority inheritance) you should use WTF::SystemMutex. If you don't (i.e. most use cases) then just use WTF::Mutex.
Filip Pizlo
Comment 18 2015-08-04 18:06:03 PDT
This is totally neutral on JSC benchmarks, as expected. Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, JSRegress, AsmBench, and CompressionBench on shakezilla (MacBookPro11,3). VMs tested: "TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r187865) "Locks" at /Volumes/Data/quartary/OpenSource/WebKitBuild/Release/jsc (r187865) Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree Locks SunSpider: 3d-cube 4.6188+-0.3398 4.5114+-0.3606 might be 1.0238x faster 3d-morph 5.2458+-0.1575 5.1527+-0.1167 might be 1.0181x faster 3d-raytrace 5.1114+-0.1005 ? 5.1558+-0.2675 ? access-binary-trees 2.1410+-0.1643 ? 2.2280+-0.3079 ? might be 1.0407x slower access-fannkuch 5.1596+-0.0585 ? 5.3521+-0.2510 ? might be 1.0373x slower access-nbody 2.4100+-0.0680 ? 2.4127+-0.0940 ? access-nsieve 3.0173+-0.1354 ? 3.0187+-0.0812 ? bitops-3bit-bits-in-byte 1.4736+-0.0371 1.4592+-0.0414 bitops-bits-in-byte 3.2611+-0.0897 3.2560+-0.0809 bitops-bitwise-and 1.9926+-0.0351 ? 2.0106+-0.0718 ? bitops-nsieve-bits 2.9257+-0.0965 2.8903+-0.1189 might be 1.0122x faster controlflow-recursive 2.1397+-0.2957 2.0040+-0.0825 might be 1.0677x faster crypto-aes 3.8312+-0.0946 ? 3.9472+-0.2965 ? might be 1.0303x slower crypto-md5 2.4287+-0.0536 ? 2.4761+-0.0776 ? might be 1.0195x slower crypto-sha1 2.3803+-0.1830 ? 2.7833+-0.9070 ? might be 1.1693x slower date-format-tofte 6.7365+-0.5026 6.6499+-0.2602 might be 1.0130x faster date-format-xparb 4.5785+-0.4290 ? 4.6176+-0.2775 ? math-cordic 2.7206+-0.0705 ? 2.7586+-0.0657 ? might be 1.0140x slower math-partial-sums 5.3063+-0.3196 5.1866+-0.1879 might be 1.0231x faster math-spectral-norm 1.8261+-0.0803 1.8233+-0.0812 regexp-dna 6.3188+-0.3528 6.2596+-0.5123 string-base64 4.2675+-0.1632 ? 4.3479+-0.4265 ? might be 1.0189x slower string-fasta 5.9072+-0.6528 5.6984+-0.0657 might be 1.0366x faster string-tagcloud 7.9675+-0.1336 ? 8.4041+-0.7230 ? might be 1.0548x slower string-unpack-code 19.9951+-0.5403 19.5374+-0.3111 might be 1.0234x faster string-validate-input 5.0745+-1.3733 4.6250+-0.2041 might be 1.0972x faster <arithmetic> 4.5706+-0.1171 4.5602+-0.0625 might be 1.0023x faster TipOfTree Locks LongSpider: 3d-cube 771.8095+-7.7938 ? 773.0779+-9.1282 ? 3d-morph 1494.4234+-7.5328 1490.1073+-5.8303 3d-raytrace 619.9231+-3.2634 ? 621.4335+-3.9142 ? access-binary-trees 812.5942+-5.3948 ? 812.7593+-10.7651 ? access-fannkuch 274.1295+-1.0160 ? 274.3002+-1.8077 ? access-nbody 514.1676+-1.9901 ? 517.4148+-5.7030 ? access-nsieve 365.8359+-11.6282 ? 375.9618+-16.6471 ? might be 1.0277x slower bitops-3bit-bits-in-byte 39.9388+-0.4853 ? 40.1216+-1.2031 ? bitops-bits-in-byte 81.0704+-2.1803 79.5781+-2.1317 might be 1.0188x faster bitops-nsieve-bits 406.6110+-7.8635 403.8947+-5.2638 controlflow-recursive 418.9563+-2.0375 ? 424.5520+-9.2078 ? might be 1.0134x slower crypto-aes 576.9404+-5.4632 ? 578.2195+-5.4360 ? crypto-md5 474.6021+-7.0047 ? 482.6179+-17.4491 ? might be 1.0169x slower crypto-sha1 636.2938+-4.0944 ? 640.3953+-3.1385 ? date-format-tofte 498.8580+-11.7161 ? 503.7063+-24.5992 ? date-format-xparb 627.6571+-11.5439 627.0872+-6.2736 hash-map 156.8577+-1.5164 155.9521+-1.5994 math-cordic 482.5645+-3.3998 ? 483.8471+-2.1519 ? math-partial-sums 464.8508+-3.5575 462.0233+-2.0481 math-spectral-norm 548.6165+-4.3437 ? 551.7510+-6.7575 ? string-base64 346.6176+-4.4541 ? 346.6691+-4.1204 ? string-fasta 357.2021+-3.0032 ? 357.4994+-2.6034 ? string-tagcloud 199.2832+-57.4505 177.5579+-1.5202 might be 1.1224x faster <geometric> 391.0695+-4.6109 390.4987+-1.7559 might be 1.0015x faster TipOfTree Locks V8Spider: crypto 49.1598+-1.7482 48.9808+-1.9084 deltablue 79.0791+-2.6777 ? 79.7299+-4.5284 ? earley-boyer 39.5946+-0.5930 39.2212+-1.0414 raytrace 28.7557+-0.7918 ? 31.6886+-6.1090 ? might be 1.1020x slower regexp 61.1023+-0.5778 ? 61.6135+-0.7153 ? richards 70.9105+-1.7815 ? 71.5622+-3.4116 ? splay 34.1810+-2.0900 34.0174+-2.3288 <geometric> 48.7429+-0.4757 ? 49.3850+-0.8291 ? might be 1.0132x slower TipOfTree Locks Octane: encrypt 0.20265+-0.00074 0.20230+-0.00301 decrypt 3.56572+-0.47534 3.33499+-0.01607 might be 1.0692x faster deltablue x2 0.15417+-0.00137 0.15307+-0.00180 earley 0.30291+-0.00540 0.30110+-0.00649 boyer 4.15908+-0.02376 4.13385+-0.02584 navier-stokes x2 4.92411+-0.02893 4.91718+-0.02410 raytrace x2 1.04270+-0.07241 1.01884+-0.07396 might be 1.0234x faster richards x2 0.10021+-0.00161 0.10017+-0.00196 splay x2 0.33619+-0.00219 ! 0.34187+-0.00215 ! definitely 1.0169x slower regexp x2 24.68531+-0.30351 ? 24.83500+-0.51691 ? pdfjs x2 37.33105+-0.63204 ? 37.49453+-0.34998 ? mandreel x2 43.63864+-0.22603 ? 44.33171+-0.78447 ? might be 1.0159x slower gbemu x2 34.33502+-1.71199 ? 34.35153+-1.03796 ? closure 0.55893+-0.00571 ? 0.56102+-0.00125 ? jquery 7.11598+-0.06968 ? 7.13456+-0.03945 ? box2d x2 10.03658+-0.06629 ? 10.05517+-0.12237 ? zlib x2 389.75605+-2.90554 385.79913+-20.33314 might be 1.0103x faster typescript x2 658.92818+-8.09899 656.12842+-11.54885 <geometric> 5.62700+-0.04596 5.61331+-0.04365 might be 1.0024x faster TipOfTree Locks Kraken: ai-astar 230.285+-8.356 ? 236.064+-1.856 ? might be 1.0251x slower audio-beat-detection 58.771+-0.669 ? 58.976+-0.471 ? audio-dft 95.483+-2.067 ? 95.895+-1.841 ? audio-fft 34.981+-0.570 ? 35.125+-0.601 ? audio-oscillator 62.665+-3.990 ? 62.940+-2.650 ? imaging-darkroom 58.677+-0.334 ? 59.003+-0.752 ? imaging-desaturate 54.471+-2.943 ? 55.475+-0.933 ? might be 1.0184x slower imaging-gaussian-blur 82.631+-0.555 82.590+-0.261 json-parse-financial 37.789+-0.731 37.238+-0.481 might be 1.0148x faster json-stringify-tinderbox 22.592+-1.490 ? 22.814+-1.245 ? stanford-crypto-aes 42.903+-1.715 41.604+-0.927 might be 1.0312x faster stanford-crypto-ccm 35.273+-1.648 ? 36.158+-1.537 ? might be 1.0251x slower stanford-crypto-pbkdf2 94.467+-1.003 94.257+-0.305 stanford-crypto-sha256-iterative 36.312+-0.559 ? 36.499+-0.671 ? <arithmetic> 67.664+-0.939 ? 68.188+-0.477 ? might be 1.0077x slower TipOfTree Locks JSRegress: abc-forward-loop-equal 29.4311+-0.6455 ? 29.5404+-0.5091 ? abc-postfix-backward-loop 29.4591+-0.6159 29.0017+-0.7659 might be 1.0158x faster abc-simple-backward-loop 28.9969+-0.5550 28.8738+-0.3935 abc-simple-forward-loop 29.2154+-0.9575 29.0029+-0.8270 abc-skippy-loop 21.1214+-0.6533 ? 21.4778+-1.1481 ? might be 1.0169x slower abs-boolean 2.4264+-0.0860 2.3811+-0.0347 might be 1.0190x faster adapt-to-double-divide 16.0105+-0.2168 ? 16.2480+-0.5204 ? might be 1.0148x slower aliased-arguments-getbyval 1.1665+-0.0399 ? 1.4693+-0.4157 ? might be 1.2596x slower allocate-big-object 2.5751+-0.5480 ? 2.6128+-0.2639 ? might be 1.0146x slower arguments-named-and-reflective 10.9051+-0.4144 10.8686+-0.3554 arguments-out-of-bounds 10.0490+-0.9350 9.8034+-0.4199 might be 1.0251x faster arguments-strict-mode 9.5454+-0.2523 ? 9.5959+-0.2316 ? arguments 8.1871+-0.2263 ? 8.4177+-0.3306 ? might be 1.0282x slower arity-mismatch-inlining 0.8488+-0.1486 0.8202+-0.0454 might be 1.0348x faster array-access-polymorphic-structure 6.3090+-0.5862 5.9623+-0.1803 might be 1.0581x faster array-nonarray-polymorhpic-access 24.5815+-0.6221 ? 25.1122+-1.2537 ? might be 1.0216x slower array-prototype-every 74.7804+-1.0854 ? 75.6542+-1.7074 ? might be 1.0117x slower array-prototype-forEach 74.7113+-0.5646 74.4927+-1.0822 array-prototype-map 83.1633+-1.0930 ? 83.7856+-0.6654 ? array-prototype-reduce 70.6877+-1.6114 ? 70.9931+-1.8866 ? array-prototype-reduceRight 70.8095+-2.1000 70.2520+-0.9904 array-prototype-some 74.1130+-0.4029 ! 75.4754+-0.7724 ! definitely 1.0184x slower array-splice-contiguous 20.3935+-0.7524 ? 20.4006+-1.1547 ? array-with-double-add 3.9681+-0.1761 3.9156+-0.0583 might be 1.0134x faster array-with-double-increment 3.0434+-0.1339 2.9893+-0.0405 might be 1.0181x faster array-with-double-mul-add 5.0186+-0.2781 4.9191+-0.0308 might be 1.0202x faster array-with-double-sum 3.3228+-0.0785 3.2970+-0.0521 array-with-int32-add-sub 7.2165+-0.0702 7.1890+-0.0923 array-with-int32-or-double-sum 3.5155+-0.1898 3.3524+-0.0437 might be 1.0487x faster ArrayBuffer-DataView-alloc-large-long-lived 28.9523+-0.3013 ? 29.1951+-0.4087 ? ArrayBuffer-DataView-alloc-long-lived 13.0372+-0.5093 12.8317+-0.4350 might be 1.0160x faster ArrayBuffer-Int32Array-byteOffset 3.8682+-0.8988 3.5271+-0.0517 might be 1.0967x faster ArrayBuffer-Int8Array-alloc-large-long-lived 35.7914+-12.1571 30.9712+-0.6537 might be 1.1556x faster ArrayBuffer-Int8Array-alloc-long-lived-buffer 20.1246+-0.2806 ? 20.2460+-0.4632 ? ArrayBuffer-Int8Array-alloc-long-lived 12.1322+-0.3063 ? 12.8298+-1.5482 ? might be 1.0575x slower ArrayBuffer-Int8Array-alloc 9.8477+-0.6430 9.5338+-0.5052 might be 1.0329x faster asmjs_bool_bug 7.0368+-0.0900 ? 7.1530+-0.2484 ? might be 1.0165x slower assign-custom-setter-polymorphic 2.4277+-0.0677 ? 2.4716+-0.1799 ? might be 1.0181x slower assign-custom-setter 3.3452+-0.1044 ? 3.3477+-0.2979 ? basic-set 7.8219+-0.3594 7.6944+-0.3537 might be 1.0166x faster big-int-mul 3.4474+-0.1785 3.3638+-0.0508 might be 1.0249x faster boolean-test 2.8345+-0.0976 ? 2.8393+-0.0753 ? branch-fold 3.5478+-0.0544 3.5389+-0.0350 branch-on-string-as-boolean 16.1193+-0.6798 ? 16.3823+-0.8719 ? might be 1.0163x slower by-val-generic 7.1298+-0.2748 7.0569+-0.2293 might be 1.0103x faster call-spread-apply 27.1528+-1.7094 ? 27.5428+-1.2267 ? might be 1.0144x slower call-spread-call 22.6948+-1.5555 ? 23.6428+-1.8490 ? might be 1.0418x slower captured-assignments 0.4191+-0.0469 0.4184+-0.0378 cast-int-to-double 4.9904+-0.0862 ? 5.0358+-0.2751 ? cell-argument 6.2188+-0.1947 ? 6.4216+-0.3847 ? might be 1.0326x slower cfg-simplify 2.7042+-0.0428 ? 2.7358+-0.0776 ? might be 1.0117x slower chain-getter-access 8.3185+-0.5494 8.2552+-0.7248 cmpeq-obj-to-obj-other 11.4144+-1.1263 11.0760+-1.2552 might be 1.0306x faster constant-test 4.7964+-0.0957 4.7922+-0.0353 create-lots-of-functions 9.9573+-0.3876 9.4965+-0.5592 might be 1.0485x faster cse-new-array-buffer 2.2085+-0.1138 ? 2.2989+-0.1539 ? might be 1.0409x slower cse-new-array 2.3505+-0.1683 ? 2.4184+-0.4198 ? might be 1.0289x slower DataView-custom-properties 35.4969+-0.8371 35.2449+-0.9654 delay-tear-off-arguments-strictmode 13.1044+-0.8228 12.6542+-1.2472 might be 1.0356x faster deltablue-varargs 147.7344+-3.3833 ? 149.5223+-2.2763 ? might be 1.0121x slower destructuring-arguments 163.4681+-2.8567 162.2511+-2.1923 destructuring-parameters-overridden-by-function 0.5030+-0.0408 ? 0.5389+-0.0284 ? might be 1.0714x slower destructuring-swap 4.5915+-0.0088 ? 4.6814+-0.2070 ? might be 1.0196x slower direct-arguments-getbyval 1.1461+-0.0686 ? 1.1761+-0.0480 ? might be 1.0262x slower div-boolean-double 5.6390+-0.6562 5.2838+-0.1281 might be 1.0672x faster div-boolean 8.1726+-0.1546 ? 8.1959+-0.2375 ? double-get-by-val-out-of-bounds 4.2944+-0.2925 4.1179+-0.1729 might be 1.0429x faster double-pollution-getbyval 8.5732+-0.0294 8.5477+-0.0409 double-pollution-putbyoffset 3.8433+-0.0861 3.7512+-0.0730 might be 1.0245x faster double-real-use 27.3352+-2.4386 ? 27.3402+-3.0593 ? double-to-int32-typed-array-no-inline 2.0493+-0.0625 ? 2.0610+-0.0487 ? double-to-int32-typed-array 1.7634+-0.0720 ? 1.8650+-0.1075 ? might be 1.0576x slower double-to-uint32-typed-array-no-inline 2.0874+-0.0438 ? 2.1016+-0.0465 ? double-to-uint32-typed-array 1.9541+-0.1746 1.8213+-0.0600 might be 1.0729x faster elidable-new-object-dag 33.5557+-0.6089 ? 33.7206+-0.7661 ? elidable-new-object-roflcopter 33.2937+-0.8281 33.1740+-0.5020 elidable-new-object-then-call 30.6334+-1.1906 ? 32.1959+-1.5218 ? might be 1.0510x slower elidable-new-object-tree 36.2391+-1.0350 ? 36.5379+-1.2132 ? empty-string-plus-int 4.7792+-0.2019 4.6746+-0.3033 might be 1.0224x faster emscripten-cube2hash 26.7148+-1.5520 ? 27.0872+-0.5889 ? might be 1.0139x slower exit-length-on-plain-object 13.0363+-0.4496 12.7657+-0.1889 might be 1.0212x faster external-arguments-getbyval 1.1845+-0.0618 1.1817+-0.0645 external-arguments-putbyval 2.4150+-0.3808 2.1960+-0.1059 might be 1.0997x faster fixed-typed-array-storage-var-index 1.2191+-0.1174 1.1706+-0.0529 might be 1.0414x faster fixed-typed-array-storage 0.8485+-0.0301 ? 0.9090+-0.0831 ? might be 1.0713x slower Float32Array-matrix-mult 3.8852+-0.1154 ? 4.1158+-0.4322 ? might be 1.0594x slower Float32Array-to-Float64Array-set 46.2218+-0.4158 ! 47.5546+-0.7306 ! definitely 1.0288x slower Float64Array-alloc-long-lived 73.5778+-0.8413 ? 74.4862+-1.3166 ? might be 1.0123x slower Float64Array-to-Int16Array-set 57.5435+-1.4247 ? 57.8420+-2.1581 ? fold-double-to-int 12.3702+-0.2197 ? 12.7174+-0.4438 ? might be 1.0281x slower fold-get-by-id-to-multi-get-by-offset-rare-int 10.4351+-0.3672 10.3497+-0.3431 fold-get-by-id-to-multi-get-by-offset 8.6943+-0.2393 ? 9.0524+-0.4835 ? might be 1.0412x slower fold-multi-get-by-offset-to-get-by-offset 8.5268+-0.5141 7.7818+-0.9843 might be 1.0957x faster fold-multi-get-by-offset-to-poly-get-by-offset 7.3630+-1.2768 ? 8.1326+-0.7635 ? might be 1.1045x slower fold-multi-put-by-offset-to-poly-put-by-offset 7.7292+-0.3083 7.7029+-0.2045 fold-multi-put-by-offset-to-put-by-offset 4.6861+-0.6057 ? 5.1802+-0.6778 ? might be 1.1054x slower fold-multi-put-by-offset-to-replace-or-transition-put-by-offset 9.9001+-1.1068 9.2662+-1.0821 might be 1.0684x faster fold-put-by-id-to-multi-put-by-offset 8.6160+-0.3561 ? 8.8826+-0.9294 ? might be 1.0310x slower fold-put-structure 4.7251+-0.8216 ? 5.1494+-0.6639 ? might be 1.0898x slower for-of-iterate-array-entries 11.2489+-0.4281 ? 11.2823+-0.5164 ? for-of-iterate-array-keys 3.3832+-0.2045 ? 3.4946+-0.1935 ? might be 1.0329x slower for-of-iterate-array-values 3.5361+-0.4255 3.4215+-0.2286 might be 1.0335x faster fround 18.6280+-0.8883 ? 18.9003+-0.8606 ? might be 1.0146x slower ftl-library-inlining-dataview 56.3653+-0.8020 ? 57.1455+-0.5690 ? might be 1.0138x slower ftl-library-inlining 108.5827+-3.1158 107.3868+-0.8126 might be 1.0111x faster function-dot-apply 2.0738+-0.2045 1.9975+-0.1145 might be 1.0382x faster function-test 2.6981+-0.0479 ? 2.7323+-0.1454 ? might be 1.0127x slower function-with-eval 101.7899+-27.8880 90.6862+-0.8463 might be 1.1224x faster gcse-poly-get-less-obvious 14.1600+-0.1831 ? 14.5827+-1.3847 ? might be 1.0299x slower gcse-poly-get 14.4560+-1.0900 14.1248+-0.7286 might be 1.0234x faster gcse 3.7415+-0.0300 ? 3.7749+-0.1158 ? get-by-id-bimorphic-check-structure-elimination-simple 2.5618+-0.0848 ? 2.5723+-0.0952 ? get-by-id-bimorphic-check-structure-elimination 5.5898+-0.0290 ? 5.6350+-0.1315 ? get-by-id-chain-from-try-block 5.5067+-0.1796 ? 5.7276+-0.3959 ? might be 1.0401x slower get-by-id-check-structure-elimination 4.4624+-0.4398 4.2795+-0.0566 might be 1.0427x faster get-by-id-proto-or-self 14.7878+-1.1722 14.7739+-0.8649 get-by-id-quadmorphic-check-structure-elimination-simple 2.8392+-0.0156 ? 2.9611+-0.3184 ? might be 1.0430x slower get-by-id-self-or-proto 14.8367+-1.2603 14.6765+-0.6055 might be 1.0109x faster get-by-val-out-of-bounds 4.0387+-0.1936 3.7887+-0.1236 might be 1.0660x faster get_callee_monomorphic 2.2247+-0.1005 2.2176+-0.0985 get_callee_polymorphic 3.1812+-0.1324 ? 3.1878+-0.1611 ? getter-no-activation 4.8024+-0.2533 ? 4.9355+-0.4116 ? might be 1.0277x slower getter-prototype 9.7732+-0.2517 ? 9.8181+-0.3455 ? getter-richards 111.2248+-2.9748 106.5098+-4.3630 might be 1.0443x faster getter 5.3310+-0.3423 ? 5.5077+-0.6742 ? might be 1.0331x slower global-object-access-with-mutating-structure 5.3385+-0.3275 ? 5.6608+-0.7679 ? might be 1.0604x slower global-var-const-infer-fire-from-opt 0.9130+-0.0848 0.9003+-0.1176 might be 1.0141x faster global-var-const-infer 0.8393+-0.2034 0.7262+-0.0740 might be 1.1557x faster HashMap-put-get-iterate-keys 26.1661+-1.1985 ? 26.6242+-2.0088 ? might be 1.0175x slower HashMap-put-get-iterate 26.0252+-1.0754 26.0108+-1.2974 HashMap-string-put-get-iterate 24.8424+-1.3729 24.0193+-1.0171 might be 1.0343x faster hoist-make-rope 7.9203+-0.3836 7.6068+-0.4711 might be 1.0412x faster hoist-poly-check-structure-effectful-loop 4.7454+-1.2376 4.2230+-0.0790 might be 1.1237x faster hoist-poly-check-structure 3.2236+-0.0974 ? 3.2330+-0.0852 ? imul-double-only 6.7386+-0.3489 6.6762+-0.3878 imul-int-only 7.9030+-0.5001 ? 8.1891+-0.6864 ? might be 1.0362x slower imul-mixed 7.2461+-1.7053 6.9045+-0.9320 might be 1.0495x faster in-four-cases 17.2359+-0.4854 ? 17.3593+-0.5640 ? in-one-case-false 9.0988+-0.1774 ? 9.1637+-0.3132 ? in-one-case-true 9.0683+-0.2109 ? 9.1772+-0.1992 ? might be 1.0120x slower in-two-cases 9.5544+-0.2156 9.3526+-0.2068 might be 1.0216x faster indexed-properties-in-objects 2.7566+-0.0528 ? 2.8378+-0.3293 ? might be 1.0295x slower infer-closure-const-then-mov-no-inline 3.0016+-0.0663 2.9661+-0.0317 might be 1.0119x faster infer-closure-const-then-mov 17.2023+-0.6241 16.9183+-0.8782 might be 1.0168x faster infer-closure-const-then-put-to-scope-no-inline 11.4705+-0.1600 ? 11.5735+-0.3966 ? infer-closure-const-then-put-to-scope 20.5011+-0.9456 ? 21.2447+-0.4627 ? might be 1.0363x slower infer-closure-const-then-reenter-no-inline 49.5101+-0.4197 ? 49.6235+-0.8381 ? infer-closure-const-then-reenter 22.0381+-0.8619 21.4801+-0.5591 might be 1.0260x faster infer-constant-global-property 3.3746+-0.0691 3.3238+-0.0690 might be 1.0153x faster infer-constant-property 2.5870+-0.0767 ? 2.6871+-0.3098 ? might be 1.0387x slower infer-one-time-closure-ten-vars 8.5613+-0.5191 8.4368+-0.2019 might be 1.0148x faster infer-one-time-closure-two-vars 8.6231+-1.1874 7.8493+-0.4357 might be 1.0986x faster infer-one-time-closure 7.7193+-0.3155 7.7172+-0.2470 infer-one-time-deep-closure 12.5498+-0.2206 12.5320+-0.2301 inline-arguments-access 3.6872+-0.1390 3.6535+-0.2597 inline-arguments-aliased-access 3.6155+-0.0859 3.5938+-0.0906 inline-arguments-local-escape 3.7382+-0.2183 ? 3.9161+-0.1768 ? might be 1.0476x slower inline-get-scoped-var 4.8607+-0.6836 4.5531+-0.0968 might be 1.0675x faster inlined-put-by-id-transition 10.5071+-0.9659 10.4994+-0.5332 int-or-other-abs-then-get-by-val 4.7192+-0.0745 ? 4.7309+-0.0649 ? int-or-other-abs-zero-then-get-by-val 16.0094+-0.6580 ? 16.5340+-1.8372 ? might be 1.0328x slower int-or-other-add-then-get-by-val 3.9679+-0.0738 ? 3.9836+-0.1303 ? int-or-other-add 4.8459+-0.1881 4.7853+-0.0755 might be 1.0127x faster int-or-other-div-then-get-by-val 3.6446+-0.0483 ? 4.2713+-1.4209 ? might be 1.1720x slower int-or-other-max-then-get-by-val 3.9763+-0.1104 3.9296+-0.1083 might be 1.0119x faster int-or-other-min-then-get-by-val 3.9684+-0.2585 ? 4.0075+-0.1882 ? int-or-other-mod-then-get-by-val 3.4517+-0.1176 3.3933+-0.0344 might be 1.0172x faster int-or-other-mul-then-get-by-val 3.5400+-0.1007 3.4863+-0.0687 might be 1.0154x faster int-or-other-neg-then-get-by-val 4.4394+-0.1172 4.4148+-0.0986 int-or-other-neg-zero-then-get-by-val 15.7043+-0.2769 ? 16.0299+-0.5322 ? might be 1.0207x slower int-or-other-sub-then-get-by-val 4.0084+-0.0709 ? 4.0190+-0.0769 ? int-or-other-sub 3.4205+-0.1129 3.3710+-0.2498 might be 1.0147x faster int-overflow-local 4.2570+-0.2466 4.1997+-0.0917 might be 1.0136x faster Int16Array-alloc-long-lived 46.5350+-0.8865 46.3805+-0.8759 Int16Array-bubble-sort-with-byteLength 17.9331+-1.4371 ? 18.1070+-1.4030 ? Int16Array-bubble-sort 19.5194+-4.6858 17.1704+-0.2678 might be 1.1368x faster Int16Array-load-int-mul 1.4073+-0.0567 ? 1.4139+-0.0512 ? Int16Array-to-Int32Array-set 44.5347+-0.9270 44.3029+-0.8960 Int32Array-alloc-large 12.2661+-0.7776 ? 12.2882+-0.5082 ? Int32Array-alloc-long-lived 55.9569+-1.0172 ? 56.8643+-0.7153 ? might be 1.0162x slower Int32Array-alloc 3.1479+-0.5080 2.9006+-0.1494 might be 1.0853x faster Int32Array-Int8Array-view-alloc 5.9994+-0.1947 ? 6.0868+-0.4361 ? might be 1.0146x slower int52-spill 4.4820+-0.1097 ? 4.6136+-0.2119 ? might be 1.0294x slower Int8Array-alloc-long-lived 41.5892+-0.6479 41.5321+-1.3298 Int8Array-load-with-byteLength 3.3646+-0.1132 3.3544+-0.0882 Int8Array-load 3.3743+-0.1382 3.3084+-0.0218 might be 1.0199x faster integer-divide 10.3449+-0.2948 10.2629+-0.2195 integer-modulo 1.6921+-0.0682 1.6443+-0.0262 might be 1.0291x faster is-boolean-fold-tricky 3.7695+-0.2507 3.6943+-0.0937 might be 1.0204x faster is-boolean-fold 2.6397+-0.0719 2.6315+-0.2307 is-function-fold-tricky-internal-function 9.7115+-0.1852 ? 9.9187+-0.3937 ? might be 1.0213x slower is-function-fold-tricky 4.1689+-0.2322 4.1405+-0.3524 is-function-fold 2.5895+-0.0308 ? 2.6680+-0.0841 ? might be 1.0303x slower is-number-fold-tricky 4.0082+-0.1464 3.9374+-0.0511 might be 1.0180x faster is-number-fold 2.7183+-0.2368 2.6349+-0.1407 might be 1.0317x faster is-object-or-null-fold-functions 2.6198+-0.0177 ? 2.6492+-0.0771 ? might be 1.0112x slower is-object-or-null-fold-less-tricky 4.1205+-0.1828 3.9911+-0.0580 might be 1.0324x faster is-object-or-null-fold-tricky 5.2237+-0.2759 5.1108+-0.1936 might be 1.0221x faster is-object-or-null-fold 2.5963+-0.0162 ? 2.6821+-0.1837 ? might be 1.0331x slower is-object-or-null-trickier-function 4.0523+-0.0857 ? 4.1052+-0.1356 ? might be 1.0131x slower is-object-or-null-trickier-internal-function 10.4813+-0.5467 10.3479+-0.3109 might be 1.0129x faster is-object-or-null-tricky-function 4.1650+-0.3293 4.0823+-0.0774 might be 1.0203x faster is-object-or-null-tricky-internal-function 7.6849+-0.2710 7.5968+-0.1495 might be 1.0116x faster is-string-fold-tricky 4.0859+-0.3097 3.9436+-0.0668 might be 1.0361x faster is-string-fold 2.6167+-0.0317 2.5723+-0.0478 might be 1.0173x faster is-undefined-fold-tricky 3.2343+-0.0131 ? 3.2968+-0.0941 ? might be 1.0193x slower is-undefined-fold 2.6873+-0.2312 2.5906+-0.0400 might be 1.0373x faster large-int-captured 3.8054+-0.1124 ? 3.8549+-0.1177 ? might be 1.0130x slower large-int-neg 13.5714+-0.1696 ? 13.7554+-0.2425 ? might be 1.0136x slower large-int 13.4234+-0.4982 13.3300+-0.2232 load-varargs-elimination 23.0448+-2.0188 21.2191+-0.5203 might be 1.0860x faster logical-not-weird-types 2.8005+-0.0471 ? 2.8770+-0.0920 ? might be 1.0273x slower logical-not 4.1720+-0.0488 ? 4.1772+-0.0324 ? lots-of-fields 9.2710+-0.3436 9.2445+-0.3625 make-indexed-storage 2.7332+-0.2411 ? 2.9135+-0.0432 ? might be 1.0660x slower make-rope-cse 3.5614+-0.1117 3.4460+-0.0686 might be 1.0335x faster marsaglia-larger-ints 32.5627+-1.4280 32.3544+-1.0415 marsaglia-osr-entry 21.2875+-0.9930 ? 21.5020+-1.0563 ? might be 1.0101x slower math-with-out-of-bounds-array-values 21.1706+-0.3917 ? 22.0400+-1.3890 ? might be 1.0411x slower max-boolean 2.9205+-0.4292 2.6804+-0.0474 might be 1.0896x faster method-on-number 16.3499+-2.2298 15.7264+-0.3308 might be 1.0396x faster min-boolean 2.7332+-0.1508 2.6893+-0.0589 might be 1.0163x faster minus-boolean-double 3.0157+-0.0560 ? 3.0502+-0.0305 ? might be 1.0114x slower minus-boolean 2.4617+-0.3589 2.3305+-0.0473 might be 1.0563x faster misc-strict-eq 30.6582+-1.4321 ? 31.0740+-1.2283 ? might be 1.0136x slower mod-boolean-double 10.9356+-0.0236 ? 11.3243+-0.6947 ? might be 1.0355x slower mod-boolean 8.2232+-0.0381 ? 8.3283+-0.2901 ? might be 1.0128x slower mul-boolean-double 3.5007+-0.0482 ? 3.6218+-0.1176 ? might be 1.0346x slower mul-boolean 2.7504+-0.0337 ? 2.7784+-0.0829 ? might be 1.0102x slower neg-boolean 3.0803+-0.0509 3.0783+-0.0755 negative-zero-divide 0.3402+-0.0201 0.3372+-0.0260 negative-zero-modulo 0.3317+-0.0181 0.3227+-0.0090 might be 1.0280x faster negative-zero-negate 0.2969+-0.0079 ? 0.3120+-0.0242 ? might be 1.0508x slower nested-function-parsing 43.9893+-0.7904 ? 45.3918+-1.5144 ? might be 1.0319x slower new-array-buffer-dead 88.6620+-0.6082 88.2945+-0.8151 new-array-buffer-push 5.9692+-0.1872 ? 6.0770+-0.1910 ? might be 1.0181x slower new-array-dead 13.7745+-0.8925 ? 14.3691+-1.5777 ? might be 1.0432x slower new-array-push 6.2713+-0.1677 ? 6.3263+-0.3784 ? no-inline-constructor 30.6511+-0.5837 ? 30.6954+-0.4519 ? number-test 2.9928+-0.5127 ? 3.1079+-0.5614 ? might be 1.0384x slower object-closure-call 5.5295+-1.4609 5.0150+-0.2569 might be 1.1026x faster object-get-own-property-symbols-on-large-array 4.3228+-0.2826 3.9572+-0.2154 might be 1.0924x faster object-test 2.5636+-0.0436 ? 2.5942+-0.1060 ? might be 1.0119x slower obvious-sink-pathology-taken 98.1758+-0.5442 ? 98.9768+-1.0129 ? obvious-sink-pathology 94.4482+-0.7450 93.9732+-0.5454 obviously-elidable-new-object 29.0589+-1.9294 27.9064+-1.2521 might be 1.0413x faster plus-boolean-arith 2.3861+-0.1421 2.3766+-0.1183 plus-boolean-double 3.0127+-0.0298 ? 3.0751+-0.0842 ? might be 1.0207x slower plus-boolean 2.5045+-0.0434 2.4974+-0.0642 poly-chain-access-different-prototypes-simple 3.1909+-0.0387 ? 3.3320+-0.2930 ? might be 1.0442x slower poly-chain-access-different-prototypes 2.8161+-0.0582 ? 2.9920+-0.3539 ? might be 1.0625x slower poly-chain-access-simpler 3.1650+-0.0639 ? 3.4338+-0.4778 ? might be 1.0849x slower poly-chain-access 3.2872+-0.0415 3.2510+-0.0889 might be 1.0111x faster poly-stricteq 51.1250+-2.3018 50.3325+-1.0605 might be 1.0157x faster polymorphic-array-call 1.2576+-0.0962 1.2429+-0.0791 might be 1.0119x faster polymorphic-get-by-id 2.8027+-0.1118 ? 2.8370+-0.1066 ? might be 1.0122x slower polymorphic-put-by-id 25.0592+-1.9886 25.0528+-0.9222 polymorphic-structure 13.2218+-0.4555 13.0255+-0.2717 might be 1.0151x faster polyvariant-monomorphic-get-by-id 6.4676+-0.6409 ? 6.5570+-0.5297 ? might be 1.0138x slower proto-getter-access 8.2242+-0.6557 8.0768+-0.2657 might be 1.0182x faster prototype-access-with-mutating-prototype 5.1840+-0.1869 ? 5.2817+-0.2326 ? might be 1.0188x slower put-by-id-replace-and-transition 8.0203+-0.9125 7.8495+-0.7383 might be 1.0218x faster put-by-id-slightly-polymorphic 2.6041+-0.0615 ? 2.6045+-0.0390 ? put-by-id 10.1289+-0.5885 9.7128+-0.3381 might be 1.0428x faster put-by-val-direct 0.3269+-0.0392 ? 0.3350+-0.0427 ? might be 1.0248x slower put-by-val-large-index-blank-indexing-type 12.2990+-0.5069 12.1674+-0.6289 might be 1.0108x faster put-by-val-machine-int 2.4688+-0.0919 ? 2.4921+-0.0649 ? rare-osr-exit-on-local 14.2319+-0.1034 ? 14.2503+-0.2043 ? register-pressure-from-osr 16.2970+-0.4292 16.2575+-0.5536 repeat-multi-get-by-offset 21.3699+-0.4236 21.3483+-0.4161 setter-prototype 7.4680+-0.5662 7.2887+-0.1055 might be 1.0246x faster setter 5.5324+-0.4482 5.2751+-0.2463 might be 1.0488x faster simple-activation-demo 24.7349+-1.2257 24.1342+-0.3658 might be 1.0249x faster simple-getter-access 10.3615+-0.3310 ? 10.5888+-0.4820 ? might be 1.0219x slower simple-poly-call-nested 8.4572+-0.1856 ? 8.5163+-0.5092 ? simple-poly-call 1.2171+-0.0511 ? 1.2476+-0.0236 ? might be 1.0250x slower sin-boolean 17.6104+-0.7849 ? 18.1829+-0.5633 ? might be 1.0325x slower singleton-scope 57.3776+-1.3370 57.2948+-0.7073 sink-function 9.1988+-0.8236 ? 9.5696+-0.8152 ? might be 1.0403x slower sink-huge-activation 16.3401+-0.8877 ? 16.4679+-1.0364 ? sinkable-new-object-dag 56.0027+-2.0327 ? 56.4583+-1.7138 ? sinkable-new-object-taken 42.8366+-0.3905 41.5056+-1.1038 might be 1.0321x faster sinkable-new-object 29.6855+-0.7479 29.5966+-0.8989 slow-array-profile-convergence 2.5593+-0.0700 2.4167+-0.1560 might be 1.0590x faster slow-convergence 2.3380+-0.0639 ? 2.4402+-0.2639 ? might be 1.0437x slower slow-ternaries 17.0802+-1.1567 16.3400+-0.5025 might be 1.0453x faster sorting-benchmark 16.7271+-0.7309 ? 17.2496+-1.1350 ? might be 1.0312x slower sparse-conditional 1.1231+-0.0685 ? 1.1281+-0.0984 ? splice-to-remove 12.7288+-0.4852 ? 13.4135+-0.8492 ? might be 1.0538x slower string-char-code-at 14.0789+-0.2012 ? 14.5408+-0.6210 ? might be 1.0328x slower string-concat-object 2.0315+-0.1222 ? 2.0942+-0.2208 ? might be 1.0308x slower string-concat-pair-object 2.1005+-0.0973 2.0427+-0.0998 might be 1.0283x faster string-concat-pair-simple 9.1489+-0.2608 ? 9.4525+-0.6427 ? might be 1.0332x slower string-concat-simple 9.3206+-0.3588 9.1401+-0.2679 might be 1.0197x faster string-cons-repeat 6.5197+-0.3724 6.2753+-0.0930 might be 1.0389x faster string-cons-tower 6.8563+-0.2696 6.8459+-0.4448 string-equality 14.9040+-0.2879 ? 16.8920+-4.9129 ? might be 1.1334x slower string-get-by-val-big-char 6.4945+-0.2081 ? 6.6141+-0.3633 ? might be 1.0184x slower string-get-by-val-out-of-bounds-insane 3.2140+-0.1606 3.1789+-0.1510 might be 1.0110x faster string-get-by-val-out-of-bounds 4.0335+-0.0897 4.0117+-0.1012 string-get-by-val 2.7732+-0.0312 ? 2.8350+-0.1523 ? might be 1.0223x slower string-hash 1.8011+-0.0503 ? 1.8518+-0.1423 ? might be 1.0281x slower string-long-ident-equality 12.5070+-0.3099 12.4930+-0.1623 string-out-of-bounds 10.2957+-0.2414 ? 10.3895+-0.5651 ? string-repeat-arith 28.3127+-2.4203 27.6652+-1.9970 might be 1.0234x faster string-sub 52.2406+-1.7343 ? 53.7339+-2.7587 ? might be 1.0286x slower string-test 2.7150+-0.1090 2.7052+-0.1575 string-var-equality 26.0236+-0.8814 25.8988+-0.1012 structure-hoist-over-transitions 2.5518+-0.3256 2.3871+-0.1026 might be 1.0690x faster substring-concat-weird 36.3336+-1.0488 36.0645+-0.9566 substring-concat 40.4528+-1.1500 39.6150+-1.0643 might be 1.0211x faster substring 44.2067+-0.6149 ? 45.1627+-1.7261 ? might be 1.0216x slower switch-char-constant 2.6517+-0.0763 2.6501+-0.0475 switch-char 5.3918+-0.4063 ? 6.1407+-0.7219 ? might be 1.1389x slower switch-constant 9.1096+-0.5317 8.6955+-0.7775 might be 1.0476x faster switch-string-basic-big-var 15.8295+-0.6737 15.5594+-0.3867 might be 1.0174x faster switch-string-basic-big 14.6224+-0.0502 ? 14.9059+-0.3785 ? might be 1.0194x slower switch-string-basic-var 12.9771+-0.2716 ? 13.0095+-0.2295 ? switch-string-basic 12.9408+-0.5119 ? 13.0186+-0.4379 ? switch-string-big-length-tower-var 18.5831+-0.7327 17.9133+-0.4198 might be 1.0374x faster switch-string-length-tower-var 13.5752+-0.2019 ? 13.7209+-0.5233 ? might be 1.0107x slower switch-string-length-tower 11.9496+-0.5654 ? 12.0768+-0.7585 ? might be 1.0106x slower switch-string-short 11.7242+-0.4068 11.6760+-0.1826 switch 13.5850+-0.4935 13.5618+-1.2831 tear-off-arguments-simple 3.0074+-0.3629 ? 3.1211+-0.6195 ? might be 1.0378x slower tear-off-arguments 3.9759+-0.5293 3.8374+-0.1186 might be 1.0361x faster temporal-structure 11.8560+-0.1123 ? 12.1626+-0.4011 ? might be 1.0259x slower to-int32-boolean 12.9554+-0.1760 ? 12.9683+-0.3637 ? try-catch-get-by-val-cloned-arguments 15.0916+-1.5418 ? 15.8611+-2.0070 ? might be 1.0510x slower try-catch-get-by-val-direct-arguments 6.3882+-0.1744 6.3547+-0.1022 try-catch-get-by-val-scoped-arguments 7.6905+-0.8553 7.4753+-0.2672 might be 1.0288x faster typed-array-get-set-by-val-profiling 27.8927+-1.0758 26.7041+-0.4257 might be 1.0445x faster undefined-property-access 222.5137+-2.2166 ? 224.9151+-9.8847 ? might be 1.0108x slower undefined-test 2.8036+-0.0765 ? 2.9485+-0.2998 ? might be 1.0517x slower unprofiled-licm 14.4114+-0.7128 13.8948+-0.3951 might be 1.0372x faster varargs-call 13.8079+-0.1498 ? 13.9062+-0.2306 ? varargs-construct-inline 22.4495+-1.1794 ? 22.5368+-0.8581 ? varargs-construct 20.5671+-1.8610 19.7297+-0.7125 might be 1.0424x faster varargs-inline 8.3321+-0.3159 ? 8.4739+-0.4129 ? might be 1.0170x slower varargs-strict-mode 10.2015+-3.0752 8.9217+-0.0706 might be 1.1434x faster varargs 9.0400+-0.1417 ? 9.2693+-0.3876 ? might be 1.0254x slower weird-inlining-const-prop 2.7458+-0.1319 2.7002+-0.2400 might be 1.0169x faster <geometric> 7.7854+-0.0322 7.7793+-0.0177 might be 1.0008x faster TipOfTree Locks AsmBench: bigfib.cpp 451.7339+-2.7386 449.9489+-4.0480 cray.c 396.9018+-1.6414 396.5612+-3.0379 dry.c 426.3335+-14.6658 418.8844+-17.7420 might be 1.0178x faster FloatMM.c 684.4924+-3.5659 682.9594+-4.2914 gcc-loops.cpp 3417.9097+-20.6120 ? 3422.5210+-10.5589 ? n-body.c 825.2303+-6.7249 824.3636+-6.6153 Quicksort.c 405.9477+-2.4087 403.7581+-3.2103 stepanov_container.cpp 3570.5713+-53.9702 ? 3580.5184+-46.7172 ? Towers.c 234.2159+-1.0423 ? 236.5055+-1.6307 ? <geometric> 716.9863+-3.2946 715.5897+-3.1498 might be 1.0020x faster TipOfTree Locks CompressionBench: huffman 60.2603+-1.2999 59.1404+-1.9187 might be 1.0189x faster arithmetic-simple 269.4722+-0.9750 ? 270.8483+-1.9146 ? arithmetic-precise 245.4895+-1.8101 ? 246.4168+-2.1057 ? arithmetic-complex-precise 245.1629+-0.8891 ! 246.9762+-0.8129 ! definitely 1.0074x slower arithmetic-precise-order-0 277.8969+-1.8417 ? 280.1409+-1.1917 ? arithmetic-precise-order-1 303.8276+-1.6805 ? 304.0233+-1.3715 ? arithmetic-precise-order-2 358.0378+-8.1948 357.3034+-1.4944 arithmetic-simple-order-1 322.2326+-1.9211 ? 322.4513+-1.4659 ? arithmetic-simple-order-2 385.6448+-9.7185 384.8462+-4.3312 lz-string 308.5616+-4.8886 307.8201+-5.6033 <geometric> 254.3239+-1.1574 ? 254.3342+-0.9678 ? might be 1.0000x slower TipOfTree Locks Geomean of preferred means: <scaled-result> 51.1798+-0.2764 ? 51.2575+-0.1970 ? might be 1.0015x slower
Filip Pizlo
Comment 19 2015-08-04 18:10:00 PDT
(In reply to comment #17) > > > From what I can tell, Fil's implementation does not support priority > > > inheritance. If all a low priority is thread is holding the Lock/Mutex and > > > high priority threads are waiting on the ConditionVar, the owner thread of > > > the Lock will continue to run at a low priority relative to other high > > > priority threads. Does WebKit rely on priority inheritance anywhere? > > > > True. Our SpinLock also doesn't support priority inheritance, and this > > currently replaces SpinLock. So, we should be fine. > > This line of reasoning make me like the name WTF::SystemMutex for the old > mutex. > > If you explicitly need all the system mutex behaviors (i.e. priority > inheritance) you should use WTF::SystemMutex. If you don't (i.e. most use > cases) then just use WTF::Mutex. But std::mutex gives you those things as well. Therefore, you should use std::mutex instead of WTF::Mutex. That means that WTF::Mutex should be called WTF::DeprecatedMutex.
Filip Pizlo
Comment 20 2015-08-05 14:08:10 PDT
Created attachment 258299 [details] the patch
WebKit Commit Bot
Comment 21 2015-08-05 14:11:23 PDT
Attachment 258299 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/Mutex.cpp:68: Missing space before { [whitespace/braces] [5] ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 8 in 34 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 22 2015-08-05 16:19:43 PDT
Created attachment 258316 [details] the patch
WebKit Commit Bot
Comment 23 2015-08-05 16:24:32 PDT
Attachment 258316 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/Mutex.cpp:68: Missing space before { [whitespace/braces] [5] ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 8 in 36 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 24 2015-08-06 12:43:34 PDT
Created attachment 258385 [details] the patch
WebKit Commit Bot
Comment 25 2015-08-06 12:46:04 PDT
Attachment 258385 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting problem. [build/include_order] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/Lock.cpp:68: Missing space before { [whitespace/braces] [5] Total errors found: 9 in 44 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 26 2015-08-06 13:37:28 PDT
Created attachment 258392 [details] fix build
WebKit Commit Bot
Comment 27 2015-08-06 13:39:24 PDT
Attachment 258392 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting problem. [build/include_order] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/Lock.cpp:68: Missing space before { [whitespace/braces] [5] Total errors found: 9 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 28 2015-08-06 14:04:28 PDT
Comment on attachment 258385 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=258385&action=review r=me I think this is all correct code, but I have some substantial comments below. Please fix style warnings. > Source/JavaScriptCore/heap/CopiedSpace.cpp:194 > - SpinLockHolder locker(&m_toSpaceLock); > + LockHolder locker(&m_toSpaceLock); It would be nice just to use std::lock_guard, and get rid of our hand-rolled holders. > Source/WTF/wtf/Lock.cpp:137 > + // Release the queue lock. > + for (;;) { > + currentWordValue = m_word.load(); > + ASSERT(currentWordValue & ~mask); > + ASSERT(currentWordValue & lockedQueueBit); > + bool result = m_word.compareExchangeWeak( > + currentWordValue, currentWordValue & ~lockedQueueBit); Given that another thread holds the lock, that we hold the queue lock, and that the other thread can't drop the lock until we drop the queue lock, it should be impossible at this point for any other thread to store to m_word. Therefore, we can do a plain store here without a looping compare-and-exchange. If my reasoning were incorrect, then it would seem that queue modification would be non-atomic, which would probably be broken (or at least scary). Therefore, I think it is clearer to remove this looping compare-and-exchange. > Source/WTF/wtf/Lock.cpp:139 > + if (result) > + break; I would say return here. You have to read pretty far to see what break means. > Source/WTF/wtf/Lock.cpp:148 > + // We need a CAS loop to install the queue head. This CAS loop is just to protect against > + // isHeldBit changing. We also release the queue lock here. > + for (;;) { Again, I find this loop confusing, because it seems to assert that it need not loop, and I believe the assertion. > Source/WTF/wtf/Lock.cpp:229 > + // Change the queue head, possibly removing it if newQueueHead is null. This is a CAS loop > + // out of paranoia. It doesn't have to be since the state of the lock cannot change right now. > + // We would get no performance win from making this anything but a CAS loop, and it would make > + // it confusing to add more bits to the lock. > + for (;;) { It helps that you explain in a comment that this loop is not needed, but I really don't like reading and debugging code that says "this code shouldn't be here" :(. I would just remove this loop, and perform a naked store. I don't understand what paranoia in this loop buys me. I guess it ensures, at a base level, that the queueHead bits will be valid, but if another thread is illegally modifying the queue or the lock while I hold the lock and the queue lock, I don't think a coherent set of queueHead bits will save me from the other incoherencies that will result. > Source/WTF/wtf/Lock.cpp:255 > + std::unique_lock<std::mutex> locker(queueHead->parkingLock); This should be std::lock_guard since it doesn't need to unlock manually. Also, you should scope the lock_guard so that you release parkingLock right before calling notify. Otherwise, you create a pathology where the awoken thread immediately re-sleeps due to contention on parkingLock. > Source/WTF/wtf/Lock.cpp:258 > + // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on > + // this condition variable. I would call this convenience or expedience rather than paranoia. We really do need to notify all threads, since we have no API for notifying just the head of the queue. > Source/WTF/wtf/Lock.h:136 > + bool isHeld() const > + { > + return m_word.load(std::memory_order_acquire) & isHeldBit; > + } > + > + bool isLocked() const > + { > + return isHeld(); > + } Can we use just one name for this? > Source/WTF/wtf/Lock.h:138 > + // Everything below here should be considered private. It's not private because of POD rules. Can we say protected here, or perhaps private and make Lock a friend? POD structs can have private stuff. > Source/WTF/wtf/Lock.h:141 > + static const uintptr_t isHeldBit = 1; > + static const uintptr_t lockedQueueBit = 2; > + static const uintptr_t mask = 3; It would be nice to have helper functions for dealing with these bits: bool isLocked(uintptr_t); uintptr_t lock(uintptr_t); uintptr_t unlock(uintptr_t); bool isQueueLocked(uintptr_t); uintptr_t lockQueue(uintptr_t); uintptr_t unlockQueue(uintptr_t); ThreadData* queue(uintptr_t word); uintptr_t setQueue(uintptr_t, ThreadData*); In addition to being more readable at the call site, these functions can help document and verify preconditions. For example, unlock can assert that the queue is not locked, lockQueue can assert that the lock is locked, and so on.
Filip Pizlo
Comment 29 2015-08-06 14:43:41 PDT
(In reply to comment #28) > Comment on attachment 258385 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=258385&action=review > > r=me > > I think this is all correct code, but I have some substantial comments below. > > Please fix style warnings. OK, but I'll leave the ones regarding lambdas alone since what the style bot thinks is the opposite of what everyone does. > > > Source/JavaScriptCore/heap/CopiedSpace.cpp:194 > > - SpinLockHolder locker(&m_toSpaceLock); > > + LockHolder locker(&m_toSpaceLock); > > It would be nice just to use std::lock_guard, and get rid of our hand-rolled > holders. Maybe, but that seems like a different patch. Also, are you advocating having to always say std::lock_guard<Lock> locker(...) instead of LockHolder locker(...)? The latter is more concise. > > > Source/WTF/wtf/Lock.cpp:137 > > + // Release the queue lock. > > + for (;;) { > > + currentWordValue = m_word.load(); > > + ASSERT(currentWordValue & ~mask); > > + ASSERT(currentWordValue & lockedQueueBit); > > + bool result = m_word.compareExchangeWeak( > > + currentWordValue, currentWordValue & ~lockedQueueBit); > > Given that another thread holds the lock, that we hold the queue lock, and > that the other thread can't drop the lock until we drop the queue lock, it > should be impossible at this point for any other thread to store to m_word. > Therefore, we can do a plain store here without a looping > compare-and-exchange. This is true. Using a CAS loop here makes it easier to assert that it's true. But, I'm fine with removing the loop. > > If my reasoning were incorrect, then it would seem that queue modification > would be non-atomic, which would probably be broken (or at least scary). > Therefore, I think it is clearer to remove this looping compare-and-exchange. > > > Source/WTF/wtf/Lock.cpp:139 > > + if (result) > > + break; > > I would say return here. You have to read pretty far to see what break means. OK. > > > Source/WTF/wtf/Lock.cpp:148 > > + // We need a CAS loop to install the queue head. This CAS loop is just to protect against > > + // isHeldBit changing. We also release the queue lock here. > > + for (;;) { > > Again, I find this loop confusing, because it seems to assert that it need > not loop, and I believe the assertion. You're right, isHeldBit cannot change here. It's true that making this a loop makes it easier to assert things. > > > Source/WTF/wtf/Lock.cpp:229 > > + // Change the queue head, possibly removing it if newQueueHead is null. This is a CAS loop > > + // out of paranoia. It doesn't have to be since the state of the lock cannot change right now. > > + // We would get no performance win from making this anything but a CAS loop, and it would make > > + // it confusing to add more bits to the lock. > > + for (;;) { > > It helps that you explain in a comment that this loop is not needed, but I > really don't like reading and debugging code that says "this code shouldn't > be here" :(. I would just remove this loop, and perform a naked store. OK. > > I don't understand what paranoia in this loop buys me. I guess it ensures, > at a base level, that the queueHead bits will be valid, but if another > thread is illegally modifying the queue or the lock while I hold the lock > and the queue lock, I don't think a coherent set of queueHead bits will save > me from the other incoherencies that will result. Loading, asserting, and then CASing to change state ensures that the assert held right up until the moment when you unlock. If you assert that something is true about something you load(), and then you do a store(), then the bad state could have arisen after the load() but before the store(). > > > Source/WTF/wtf/Lock.cpp:255 > > + std::unique_lock<std::mutex> locker(queueHead->parkingLock); > > This should be std::lock_guard since it doesn't need to unlock manually. > Also, you should scope the lock_guard so that you release parkingLock right > before calling notify. Otherwise, you create a pathology where the awoken > thread immediately re-sleeps due to contention on parkingLock. You're right that the style I'm using can cause this rare pathology. My performance testing seems to imply that the pathology isn't happening, in the sense that when I have enough contention to cause parking, my locks perform as well as a WTF::Mutex - so it's unlikely that the awoken thread is paying a steeper price than just waking up out of the condition variable and then successfully acquiring the parking lock. I'm fine with changing this, but the counterpoint here is that the style I'm using is less error-prone. It prevents a large class of bugs where someone calls notify after having changed some state but without ever holding the lock. For example, if unlock() never acquired the parking lock, unlock() could end up setting shouldPark just after lock() checks that shouldPark is still true and just before lock() calls wait(). The way that I like to mentally steer myself away from making that mistake is to add "notify should be called while the lock is held" to my mental checklist. That's a good way of ensuring that you don't mess up, since: shouldPark = false; lock { notify(); } is just as correct as either of these: lock { shouldPark = false; notify(); } shouldPark = false; lock { } notify(); While this is definitely wrong: shouldPark = false; notify(); So, forcing the lock to be held while calling notify() is often enough to compensate for a race condition involving the state variable (in this case, shouldPark). I'd say that's a pretty good deal. But for this code I'm down with being precise and efficient, since it's fine to spend more time thinking about this code. But as a general rule, I guess I'd want to be able to continue to use my "notify while locked" mental checklist item; I'd even prefer if our own API for notify ASSERTed that the lock was held. I'm curious if the "stuck in unlock" pathology affects WebKit. Are you bringing it up as a hypothetical or do we have experience with it? > > > Source/WTF/wtf/Lock.cpp:258 > > + // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on > > + // this condition variable. > > I would call this convenience or expedience rather than paranoia. We really > do need to notify all threads, since we have no API for notifying just the > head of the queue. No, it's paranoia. There is at most one thread waiting on the parking condition. Each thread has its own parking condition. The parking condition is only waited on by the thread that owns it. Therefore, notify_all and notify_one are exactly the same here. I have two reasons for preferring notify_all pretty much all the time: - If you allow yourself to use notify_one, you will probably eventually introduce a hard-to-catch deadlock due to there being more than one thread that needed to be notified. - In my Jikes RVM days, I played around with making Java's notify() just do notifyAll() behind your back. There was no change in performance on the benchmarks I had available. I suspect that you could easily craft a microbenchmark that gets slower if you force yourself to use notify_all() even though you could have used notify_one(). But I've never seen notify_one() turn into a speed-up, and I have seen it turn into a deadlock. That seems like a rotten deal to me. For that reason, I prefer notify_all(). In this code, I'm fine with using notify_one() if you believe that the proper style is to be precise about notifications. But as a general style, I like notify_all() better. OTOH, in this code, it totally doesn't matter since it would be a bug if more than one thread waited on the same parking condition. > > > Source/WTF/wtf/Lock.h:136 > > + bool isHeld() const > > + { > > + return m_word.load(std::memory_order_acquire) & isHeldBit; > > + } > > + > > + bool isLocked() const > > + { > > + return isHeld(); > > + } > > Can we use just one name for this? I'd rather save that for another patch. I was trying to minimize code changes due to existing inconsistency. > > > Source/WTF/wtf/Lock.h:138 > > + // Everything below here should be considered private. It's not private because of POD rules. > > Can we say protected here, or perhaps private and make Lock a friend? POD > structs can have private stuff. The old rules definitely said otherwise, and we seem to still obey those old rules in SpinLockBase. I guess maybe the rules changed: http://stackoverflow.com/questions/4762788/can-a-class-with-all-private-members-be-a-pod-class So I guess in new C++, LockBase would be POD for our purposes if all data members are private, which is easy to ensure. I'm going to try to use protected, but I might revert that if one of our various compilers doesn't like this. > > > Source/WTF/wtf/Lock.h:141 > > + static const uintptr_t isHeldBit = 1; > > + static const uintptr_t lockedQueueBit = 2; > > + static const uintptr_t mask = 3; > > It would be nice to have helper functions for dealing with these bits: > > bool isLocked(uintptr_t); > uintptr_t lock(uintptr_t); > uintptr_t unlock(uintptr_t); > > bool isQueueLocked(uintptr_t); > uintptr_t lockQueue(uintptr_t); > uintptr_t unlockQueue(uintptr_t); > > ThreadData* queue(uintptr_t word); > uintptr_t setQueue(uintptr_t, ThreadData*); > > In addition to being more readable at the call site, these functions can > help document and verify preconditions. For example, unlock can assert that > the queue is not locked, lockQueue can assert that the lock is locked, and > so on. I actually disagree with this. To me, if I had the option of: 1) word = lock(word); 2) word |= isHeldBit; I would take (2) any day, because the intent is just as clear as (1) and the implementation is strictly more clear. With (1), I'd have to look to see how the act of locking was represented.
Filip Pizlo
Comment 30 2015-08-06 14:56:09 PDT
Created attachment 258402 [details] patch for landing This addresses all of ggaren's comments except for check-style fixes and using helpers for setting and clearing the m_word bits.
WebKit Commit Bot
Comment 31 2015-08-06 14:58:15 PDT
Attachment 258402 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting problem. [build/include_order] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/WTF/wtf/Lock.cpp:69: Missing space before { [whitespace/braces] [5] Total errors found: 9 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 32 2015-08-06 15:02:31 PDT
(In reply to comment #31) > Attachment 258402 [details] did not pass style-queue: > > > ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26: Alphabetical > sorting problem. [build/include_order] [4] > ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own > line for function definitions. [whitespace/braces] [4] > ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own > line for function definitions. [whitespace/braces] [4] > ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37: Alphabetical > sorting problem. [build/include_order] [4] > ERROR: Source/WTF/wtf/MetaAllocator.h:40: Alphabetical sorting problem. > [build/include_order] [4] > ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38: Alphabetical sorting > problem. [build/include_order] [4] > ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own > line for function definitions. [whitespace/braces] [4] > ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: > Alphabetical sorting problem. [build/include_order] [4] > ERROR: Source/WTF/wtf/Lock.cpp:69: Missing space before { > [whitespace/braces] [5] > Total errors found: 9 in 43 files > > > If any of these errors are false positives, please file a bug against > check-webkit-style. Fixed most of this locally. The "Place brace" and "Missing space" errors are because check-webkit-style doesn't grok lambdas.
Filip Pizlo
Comment 33 2015-08-06 15:04:06 PDT
Created attachment 258405 [details] fix most style
WebKit Commit Bot
Comment 34 2015-08-06 15:07:29 PDT
Attachment 258405 [details] did not pass style-queue: ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 3 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 35 2015-08-06 15:43:22 PDT
> > > Source/JavaScriptCore/heap/CopiedSpace.cpp:194 > > > - SpinLockHolder locker(&m_toSpaceLock); > > > + LockHolder locker(&m_toSpaceLock); > > > > It would be nice just to use std::lock_guard, and get rid of our hand-rolled > > holders. > > Maybe, but that seems like a different patch. Also, are you advocating > having to always say std::lock_guard<Lock> locker(...) instead of LockHolder > locker(...)? The latter is more concise. Yeah, I guess it's a different patch. I'm primarily saying that it's nice not to reinvent things. If we like brevity, I'd prefer to typedef std::lock_guard<Lock> to LockHolder or LockGuard rather than write a class. The lock_guard has the slight downside that we would need to obey the C++ naming scheme. > So, forcing the lock to be held while calling notify() is often enough to > compensate for a race condition involving the state variable (in this case, > shouldPark). I'd say that's a pretty good deal. > > But for this code I'm down with being precise and efficient, since it's fine > to spend more time thinking about this code. But as a general rule, I guess > I'd want to be able to continue to use my "notify while locked" mental > checklist item; I'd even prefer if our own API for notify ASSERTed that the > lock was held. I'm curious if the "stuck in unlock" pathology affects > WebKit. Are you bringing it up as a hypothetical or do we have experience > with it? I think you have a reasonable counterpoint for non-performance-critical code -- and, indeed, it took me a while to understand the subtlety here. I'm bringing up lock/notify performance as a hypothetical, based on reading C++ documentation. I haven't measured a performance problem caused by lock/notify. Perhaps we want a custom version of notify that requires a lock, and then unlocks it for you before notifying, as a kind of reminder that you should have locked before notifying, but without the performance penalty. For my part, I find it easy enough to remember the mental rule "unlock right before notify" -- which implies the same safety as "unlock right after notify", since both require having previously locked. Maybe we should bring this up on webkit-dev to see how other people feel. > > > Source/WTF/wtf/Lock.cpp:258 > > > + // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on > > > + // this condition variable. > > > > I would call this convenience or expedience rather than paranoia. We really > > do need to notify all threads, since we have no API for notifying just the > > head of the queue. > > No, it's paranoia. There is at most one thread waiting on the parking > condition. Each thread has its own parking condition. Oh, I see -- because we guard the parking condition with the parking mutex. > The parking > condition is only waited on by the thread that owns it. Therefore, > notify_all and notify_one are exactly the same here. > > I have two reasons for preferring notify_all pretty much all the time: > > - If you allow yourself to use notify_one, you will probably eventually > introduce a hard-to-catch deadlock due to there being more than one thread > that needed to be notified. > > - In my Jikes RVM days, I played around with making Java's notify() just do > notifyAll() behind your back. There was no change in performance on the > benchmarks I had available. > > I suspect that you could easily craft a microbenchmark that gets slower if > you force yourself to use notify_all() even though you could have used > notify_one(). But I've never seen notify_one() turn into a speed-up, and I > have seen it turn into a deadlock. That seems like a rotten deal to me. > For that reason, I prefer notify_all(). > > In this code, I'm fine with using notify_one() if you believe that the > proper style is to be precise about notifications. But as a general style, > I like notify_all() better. OTOH, in this code, it totally doesn't matter > since it would be a bug if more than one thread waited on the same parking > condition. Hmmm... I haven't written enough wait/notify code to have a strong intuition about notify_one vs notify_all. I'm happy with either in this case. > > > Source/WTF/wtf/Lock.h:141 > > > + static const uintptr_t isHeldBit = 1; > > > + static const uintptr_t lockedQueueBit = 2; > > > + static const uintptr_t mask = 3; > > > > It would be nice to have helper functions for dealing with these bits: > > > > bool isLocked(uintptr_t); > > uintptr_t lock(uintptr_t); > > uintptr_t unlock(uintptr_t); > > > > bool isQueueLocked(uintptr_t); > > uintptr_t lockQueue(uintptr_t); > > uintptr_t unlockQueue(uintptr_t); > > > > ThreadData* queue(uintptr_t word); > > uintptr_t setQueue(uintptr_t, ThreadData*); > > > > In addition to being more readable at the call site, these functions can > > help document and verify preconditions. For example, unlock can assert that > > the queue is not locked, lockQueue can assert that the lock is locked, and > > so on. > > I actually disagree with this. To me, if I had the option of: > > 1) word = lock(word); > 2) word |= isHeldBit; > > I would take (2) any day, because the intent is just as clear as (1) and the > implementation is strictly more clear. With (1), I'd have to look to see > how the act of locking was represented. In that case, how about these name changes: isLockedBit isQueueLockedBit queueHeadMask
Geoffrey Garen
Comment 36 2015-08-06 15:51:19 PDT
> > No, it's paranoia. There is at most one thread waiting on the parking > > condition. Each thread has its own parking condition. > > Oh, I see -- because we guard the parking condition with the parking mutex. LOL, no, that was wrong -- it's because there's one condition variable per thread! I guess in the case of one condition variable per thread, you'd be taking the "always notify_all()" idiom to a bit of an extreme, but I still don't have a strong negative feeling about it.
Filip Pizlo
Comment 37 2015-08-06 15:52:30 PDT
(In reply to comment #35) > > > > Source/JavaScriptCore/heap/CopiedSpace.cpp:194 > > > > - SpinLockHolder locker(&m_toSpaceLock); > > > > + LockHolder locker(&m_toSpaceLock); > > > > > > It would be nice just to use std::lock_guard, and get rid of our hand-rolled > > > holders. > > > > Maybe, but that seems like a different patch. Also, are you advocating > > having to always say std::lock_guard<Lock> locker(...) instead of LockHolder > > locker(...)? The latter is more concise. > > Yeah, I guess it's a different patch. > > I'm primarily saying that it's nice not to reinvent things. If we like > brevity, I'd prefer to typedef std::lock_guard<Lock> to LockHolder or > LockGuard rather than write a class. > > The lock_guard has the slight downside that we would need to obey the C++ > naming scheme. > > > So, forcing the lock to be held while calling notify() is often enough to > > compensate for a race condition involving the state variable (in this case, > > shouldPark). I'd say that's a pretty good deal. > > > > But for this code I'm down with being precise and efficient, since it's fine > > to spend more time thinking about this code. But as a general rule, I guess > > I'd want to be able to continue to use my "notify while locked" mental > > checklist item; I'd even prefer if our own API for notify ASSERTed that the > > lock was held. I'm curious if the "stuck in unlock" pathology affects > > WebKit. Are you bringing it up as a hypothetical or do we have experience > > with it? > > I think you have a reasonable counterpoint for non-performance-critical code > -- and, indeed, it took me a while to understand the subtlety here. > > I'm bringing up lock/notify performance as a hypothetical, based on reading > C++ documentation. I haven't measured a performance problem caused by > lock/notify. > > Perhaps we want a custom version of notify that requires a lock, and then > unlocks it for you before notifying, as a kind of reminder that you should > have locked before notifying, but without the performance penalty. How do you do this with lexically scoped locking? The "notify while locked" rule is easier to enforce. I guess we could have an API that does the notify as part of ~LockHolder, after it calls unlock. That strikes me as the best: notify() takes a special LockHolder that has Vector<Condition*, 1> in it. This can be used to both assert the safety property and ensure that the notify() comes after unlock(). > > For my part, I find it easy enough to remember the mental rule "unlock right > before notify" -- which implies the same safety as "unlock right after > notify", since both require having previously locked. Maybe we should bring > this up on webkit-dev to see how other people feel. I'd probably want to bring this up once I start writing a replacement for ThreadCondition/std::condition_variable using the ParkingLot (https://bugs.webkit.org/show_bug.cgi?id=147665). The API I propose above seems the nicest, so I'll propose that. > > > > > Source/WTF/wtf/Lock.cpp:258 > > > > + // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on > > > > + // this condition variable. > > > > > > I would call this convenience or expedience rather than paranoia. We really > > > do need to notify all threads, since we have no API for notifying just the > > > head of the queue. > > > > No, it's paranoia. There is at most one thread waiting on the parking > > condition. Each thread has its own parking condition. > > Oh, I see -- because we guard the parking condition with the parking mutex. > > > The parking > > condition is only waited on by the thread that owns it. Therefore, > > notify_all and notify_one are exactly the same here. > > > > I have two reasons for preferring notify_all pretty much all the time: > > > > - If you allow yourself to use notify_one, you will probably eventually > > introduce a hard-to-catch deadlock due to there being more than one thread > > that needed to be notified. > > > > - In my Jikes RVM days, I played around with making Java's notify() just do > > notifyAll() behind your back. There was no change in performance on the > > benchmarks I had available. > > > > I suspect that you could easily craft a microbenchmark that gets slower if > > you force yourself to use notify_all() even though you could have used > > notify_one(). But I've never seen notify_one() turn into a speed-up, and I > > have seen it turn into a deadlock. That seems like a rotten deal to me. > > For that reason, I prefer notify_all(). > > > > In this code, I'm fine with using notify_one() if you believe that the > > proper style is to be precise about notifications. But as a general style, > > I like notify_all() better. OTOH, in this code, it totally doesn't matter > > since it would be a bug if more than one thread waited on the same parking > > condition. > > Hmmm... I haven't written enough wait/notify code to have a strong intuition > about notify_one vs notify_all. I'm happy with either in this case. > > > > > Source/WTF/wtf/Lock.h:141 > > > > + static const uintptr_t isHeldBit = 1; > > > > + static const uintptr_t lockedQueueBit = 2; > > > > + static const uintptr_t mask = 3; > > > > > > It would be nice to have helper functions for dealing with these bits: > > > > > > bool isLocked(uintptr_t); > > > uintptr_t lock(uintptr_t); > > > uintptr_t unlock(uintptr_t); > > > > > > bool isQueueLocked(uintptr_t); > > > uintptr_t lockQueue(uintptr_t); > > > uintptr_t unlockQueue(uintptr_t); > > > > > > ThreadData* queue(uintptr_t word); > > > uintptr_t setQueue(uintptr_t, ThreadData*); > > > > > > In addition to being more readable at the call site, these functions can > > > help document and verify preconditions. For example, unlock can assert that > > > the queue is not locked, lockQueue can assert that the lock is locked, and > > > so on. > > > > I actually disagree with this. To me, if I had the option of: > > > > 1) word = lock(word); > > 2) word |= isHeldBit; > > > > I would take (2) any day, because the intent is just as clear as (1) and the > > implementation is strictly more clear. With (1), I'd have to look to see > > how the act of locking was represented. > > In that case, how about these name changes: > isLockedBit > isQueueLockedBit > queueHeadMask Yeah, those are better names. I'll make the change.
Filip Pizlo
Comment 38 2015-08-06 15:53:42 PDT
(In reply to comment #36) > > > No, it's paranoia. There is at most one thread waiting on the parking > > > condition. Each thread has its own parking condition. > > > > Oh, I see -- because we guard the parking condition with the parking mutex. > > LOL, no, that was wrong -- it's because there's one condition variable per > thread! > > I guess in the case of one condition variable per thread, you'd be taking > the "always notify_all()" idiom to a bit of an extreme, but I still don't > have a strong negative feeling about it. Yup, it's an extreme alright! I've changed the code to use notify_one(). I think that these kinds of safety idioms make more sense in higher-level code where we don't scrutinize it as much.
Geoffrey Garen
Comment 39 2015-08-06 15:58:50 PDT
> > Perhaps we want a custom version of notify that requires a lock, and then > > unlocks it for you before notifying, as a kind of reminder that you should > > have locked before notifying, but without the performance penalty. > > How do you do this with lexically scoped locking? The "notify while locked" > rule is easier to enforce. Either you manually close the scope and then notify (as you did in this patch), or the notify helper function takes an std::unique_lock, which allows it to manually unlock, and which notices at the end of scope that it has already been unlocked, and you call the helper function right before the scope closes. That said, I agree that it can be annoying to manually close a scope early and move the notify out of it (as you did in this patch). > I guess we could have an API that does the notify as part of ~LockHolder, > after it calls unlock. That strikes me as the best: notify() takes a > special LockHolder that has Vector<Condition*, 1> in it. This can be used > to both assert the safety property and ensure that the notify() comes after > unlock(). Seems pretty good to me.
Geoffrey Garen
Comment 40 2015-08-06 16:07:58 PDT
> That said, I agree that it can be annoying to manually close a scope early > and move the notify out of it (as you did in this patch). An argument in favor of your "hold like while notify" proposal is the degenerate case where your predicate is guarded by some other mutex or lockless design, and so you have a mutex solely for your condition variable. In such a case, you might be tempted to do this: Waiter: while (!predicate()) { unique_lock<mutex> lock(m_conditionMutex); m_condition.wait(lock...); } Signaler: // I happen to know that predicate() is true now, so I'll signal. m_condition.notify_all(); But this code is incorrect! In order to make it correct, you must acquire the lock before you signal, like so: Signaler: { lock_guard<mutex> lock(m_conditionMutex); } m_condition.notify_all(); It is indeed a simpler mental model to think, "The lock guards the condition variable" -- even though it's not quite true.
Filip Pizlo
Comment 41 2015-08-06 16:59:44 PDT
(In reply to comment #40) > > That said, I agree that it can be annoying to manually close a scope early > > and move the notify out of it (as you did in this patch). > > An argument in favor of your "hold like while notify" proposal is the > degenerate case where your predicate is guarded by some other mutex or > lockless design, and so you have a mutex solely for your condition variable. > > In such a case, you might be tempted to do this: > > Waiter: > while (!predicate()) { > unique_lock<mutex> lock(m_conditionMutex); > m_condition.wait(lock...); > } > > Signaler: > // I happen to know that predicate() is true now, so I'll signal. > m_condition.notify_all(); > > But this code is incorrect! In order to make it correct, you must acquire > the lock before you signal, like so: > > Signaler: > { lock_guard<mutex> lock(m_conditionMutex); } > m_condition.notify_all(); > > It is indeed a simpler mental model to think, "The lock guards the condition > variable" -- even though it's not quite true. Interesting, it's been a while since I've seen that kind of code! I probably got bit by this when I first learned how to threads. Simple mental models are very powerful for condition-variable-heavy code because the performance benefits of more complex approaches are low while the risks are high. I've yet to see a legit program (not a pathological microbenchmark) where notify_one was better than notify_all or where calling notify while locked caused a performance problem, or indeed where the speed of the wait/notify algorithm even mattered that much. I've also yet to see an example where using one condition variable per lock ever caused a performance problem. So, I prefer to simplify the model down to: - 1 condition variable per lock, in the style of Monitors in Java. - only one kind of notify: notifyAll(). - notifyAll() must be called while the lock is held. The only problem that this could introduce is some thread waking up too soon and then realizing that it should go back to sleep. This would be a troubling issue if we had a lot of threads - but we don't. We rely on threads mostly for concurrency rather than parallelism. I think a lot of why sophisticated wait/notify tricks don't pan out is that all they can do is reduce the probability of spurious thread wake-ups. The probability of spurious thread wake-ups is low to begin with. The cost of spurious thread wake-up is small - it's just an extra context switch. You'd have to spuriously wake up a lot of threads for it to matter. In WebKit, probably the maximum number of threads you could wake up with any condition variable is bounded by NUM_CPU. Thinking more about it, I'd be *really* surprised if the notify-before-unlock issue ever arose in real code because if it did, then it would contradict the traditional greediness assumption in locking algorithms: the currently running thread has effectively higher priority than any thread we could wake up because in most schedulers, if a thread wakes up another thread, it doesn't cause an immediate context switch. Instead, the thread that was previously running will continue to run - at full speed, on a hot cache. The woken-up thread is at lower priority because either it: (1) is slated to run on the current CPU but cannot because the thread that did notify() is still running, (2) is slated to run on another CPU but that CPU is still doing something else, or (3) is slated to run on another CPU and that CPU is idle, and so it needs to start up. In (3), there will be a delay since everything will be running on a cold cache. Also, OS schedulers sometimes delay waking up a CPU to run a thread under the optimistic assumption that the current CPU will go idle before the other one will wake up. The notion of greed is known in the literature (see for example http://janvitek.github.io/vitekj/490s12/Schedule_files/ols2002-pages-479-495.pdf, page 3) and is thought to be a key quality of barging locks. I witnessed it when I was writing benchmarks of lock fairness, years ago - if you have many threads all doing "while (true) lock { cnt++; }" then one thread will usually spend some time repeatedly locking and unlocking the lock until it loses its timeslice, and only then will another thread have a chance at acquiring the lock. This property is what makes barging locks faster than FIFO locks - you get fewer context switches. Because of how fundamental greed is to scheduling, it seems super unlikely that a call to notify() will cause a thread to run before you get around to calling unlock(). Most likely, the notify() will not cause an immediate context switch, and most likely, the current thread will not run out of its timeslice just as it is about to call unlock(). I want to think about this more before proposing a style change, and probably, instead of proposing a style change the best approach would be to write a patch that imposes the style globally and then prove that this patch is not a slow-down. That, I think should be the thing we use to determine our style: the simplest and most fool-proof style that still maximizes performance.
Filip Pizlo
Comment 42 2015-08-06 17:50:34 PDT
Filip Pizlo
Comment 43 2015-08-06 19:38:14 PDT
I'm seeing some timeouts on the WK2 bots. I'm investigating.
Filip Pizlo
Comment 44 2015-08-06 21:18:00 PDT
Fix some test gardening in http://trac.webkit.org/changeset/188113
Simon Fraser (smfr)
Comment 45 2015-08-06 22:46:44 PDT
Filip Pizlo
Comment 46 2015-08-06 23:16:32 PDT
(In reply to comment #45) > Window is pretty crashy: > https://build.webkit.org/builders/ > Apple%20Win%207%20Debug%20%28Tests%29?numbuilds=100 Oh man, Windows makes weird assertions in compare_exchange_weak and friends. I've got a fix.
Filip Pizlo
Comment 47 2015-08-06 23:18:07 PDT
(In reply to comment #46) > (In reply to comment #45) > > Window is pretty crashy: > > https://build.webkit.org/builders/ > > Apple%20Win%207%20Debug%20%28Tests%29?numbuilds=100 > > Oh man, Windows makes weird assertions in compare_exchange_weak and friends. > I've got a fix. I think this fixes it: http://trac.webkit.org/changeset/188117
Csaba Osztrogonác
Comment 48 2015-08-07 02:18:58 PDT
(In reply to comment #44) > Fix some test gardening in http://trac.webkit.org/changeset/188113 It made ~500-1000 tests fail on ARM Linux bots. Could you check it? See bug147776 for details.
Chris Dumez
Comment 49 2015-08-07 11:37:05 PDT
Completely broke iOS: rdar://problem/22191332
Chris Dumez
Comment 50 2015-08-07 11:52:25 PDT
(In reply to comment #49) > Completely broke iOS: rdar://problem/22191332 Rolled out in <http://trac.webkit.org/changeset/188144>.
Chris Dumez
Comment 51 2015-08-07 11:56:49 PDT
(In reply to comment #50) > (In reply to comment #49) > > Completely broke iOS: rdar://problem/22191332 > > Rolled out in <http://trac.webkit.org/changeset/188144>. Sorry, it looks like it messed up my commit message somehow.
Filip Pizlo
Comment 52 2015-08-07 13:05:28 PDT
(In reply to comment #51) > (In reply to comment #50) > > (In reply to comment #49) > > > Completely broke iOS: rdar://problem/22191332 > > > > Rolled out in <http://trac.webkit.org/changeset/188144>. > > Sorry, it looks like it messed up my commit message somehow. The bug was that the unlock fast path was using a weak CAS, which could cause the unlock slow path to run even when there is nobody on the lock's queue. But the unlock slow path assumes that once the queue lock is acquired, the queue will be non-empty - and it dereferences the queue head unconditionally. That null dereference was crashing every time that the weak CAS failed spuriously. This is an ARM-only issue because X86 has no notion of weak CAS; all CASes are strong and so they never fail spuriously. I'll reland shortly with the fix.
Filip Pizlo
Comment 53 2015-08-07 13:06:07 PDT
This got rolled out so reopening.
Filip Pizlo
Comment 54 2015-08-07 13:10:45 PDT
Created attachment 258522 [details] patch for relanding
WebKit Commit Bot
Comment 55 2015-08-07 13:13:55 PDT
Attachment 258522 [details] did not pass style-queue: ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 3 in 44 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 56 2015-08-07 15:40:01 PDT
Note You need to log in before you can comment on or make changes to this bug.