RESOLVED FIXED 147999
Use WTF::Lock and WTF::Condition instead of WTF::Mutex, WTF::ThreadCondition, std::mutex, and std::condition_variable
https://bugs.webkit.org/show_bug.cgi?id=147999
Summary Use WTF::Lock and WTF::Condition instead of WTF::Mutex, WTF::ThreadCondition,...
Filip Pizlo
Reported 2015-08-13 16:09:32 PDT
WTF::Lock and WTF::Condition are 64x smaller than those other things, and WTF::Lock is 10x-100x faster for short critical sections. There's no good reason to use those other things.
Attachments
the patch (59.13 KB, patch)
2015-08-13 16:14 PDT, Filip Pizlo
ggaren: review+
Filip Pizlo
Comment 1 2015-08-13 16:14:36 PDT
Created attachment 258949 [details] the patch
WebKit Commit Bot
Comment 2 2015-08-13 16:16:45 PDT
Attachment 258949 [details] did not pass style-queue: ERROR: Source/WTF/wtf/Lock.h:71: try_lock is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4] Total errors found: 1 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 3 2015-08-13 16:45:42 PDT
Comment on attachment 258949 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=258949&action=review r=me What could possibly go wrong? > Source/WTF/wtf/Lock.h:66 > + if (m_byte.compareExchangeWeak(currentByteValue, currentByteValue | isHeldBit)) > + return true; Why try again instead of just returning the result of compareExchangeWeak? It seems contrary to std::mutex, and to expectation, for tryLock to loop a lot under unlucky circumstances.
Filip Pizlo
Comment 4 2015-08-13 19:38:50 PDT
(In reply to comment #3) > Comment on attachment 258949 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=258949&action=review > > r=me > > What could possibly go wrong? > > > Source/WTF/wtf/Lock.h:66 > > + if (m_byte.compareExchangeWeak(currentByteValue, currentByteValue | isHeldBit)) > > + return true; > > Why try again instead of just returning the result of compareExchangeWeak? > It seems contrary to std::mutex, and to expectation, for tryLock to loop a > lot under unlucky circumstances. I think that this is consistent with expectation, and I don't believe it will loop a lot. The loop is necessary to satisfy what I believe to be a requirement of tryLock: if the caller knows for sure that the lock is not held then tryLock can only return true. It is possible for the caller to know for sure that the lock is not held: for example, the lock may only have been ever acquired by the current thread and is not held by the current thread right now. I also believe that the converse is a requirement: if the current thread holds the lock, then a call to tryLock must return false. Because this requirement is so easily testable, I've always believed it to be part of the expected behavior of tryLock. This definition appears to be consistent with http://linux.die.net/man/3/pthread_mutex_trylock: "The pthread_mutex_trylock() function shall be equivalent to pthread_mutex_lock(), except that if the mutex object referenced by mutex is currently locked (by any thread, including the current thread), the call shall return immediately." We could ensure this behavior if we had a magical primitive called STRONG_CAS_BIT, which was a strong CAS on a bit in memory. Then we would STRONG_CAS_BIT(&m_byte.isHeldBit, false, true). That's exactly what this loop implements, by using a CAS over the whole byte. I agree that implementing these semantics would be of questionable utility if it meant tryLock() looped a lot. This loop will return false immediately if it sees that the lock is held, so you won't loop waiting to acquire the lock. It will only loop around if the lock byte changed between when we loaded it (and found it to not be held) and when we executed the CAS. I believe that this is a lock-free algorithm, in the sense that the only way for tryLock to never return is if the OS scheduler decided to never let us run. When we promise something to be "non-blocking" or being O(1), as we expect for tryLock(), I believe that this should allow for a CAS loop if that CAS loop is lock-free.
Filip Pizlo
Comment 5 2015-08-13 22:27:43 PDT
This appears to be a speed-up: Benchmark report for SunSpider, Octane, Kraken, and AsmBench on shakezilla (MacBookPro11,3). VMs tested: "TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r188431) "NewLocks" at /Volumes/Data/quinary/OpenSource/WebKitBuild/Release/jsc (r188431) Collected 20 samples per benchmark/VM, with 20 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree NewLocks SunSpider: 3d-cube 4.7079+-0.2427 4.5150+-0.1483 might be 1.0427x faster 3d-morph 5.1893+-0.0810 ? 5.3729+-0.1713 ? might be 1.0354x slower 3d-raytrace 5.2725+-0.1242 ? 5.2941+-0.2346 ? access-binary-trees 2.2193+-0.1509 2.1069+-0.1479 might be 1.0533x faster access-fannkuch 5.4121+-0.1323 ^ 5.2019+-0.0416 ^ definitely 1.0404x faster access-nbody 2.4719+-0.0801 ? 2.4907+-0.1116 ? access-nsieve 3.2560+-0.2458 3.0751+-0.1189 might be 1.0588x faster bitops-3bit-bits-in-byte 1.4745+-0.0252 1.4739+-0.1490 bitops-bits-in-byte 3.2527+-0.0365 3.2430+-0.0786 bitops-bitwise-and 1.9920+-0.0667 1.9302+-0.0136 might be 1.0321x faster bitops-nsieve-bits 2.9597+-0.0492 2.9243+-0.0421 might be 1.0121x faster controlflow-recursive 2.0927+-0.0698 2.0397+-0.0809 might be 1.0260x faster crypto-aes 3.8336+-0.0783 3.7728+-0.0942 might be 1.0161x faster crypto-md5 2.5629+-0.1323 2.4219+-0.2512 might be 1.0582x faster crypto-sha1 2.3848+-0.0672 2.2829+-0.1008 might be 1.0447x faster date-format-tofte 6.8657+-0.3063 6.5775+-0.1161 might be 1.0438x faster date-format-xparb 4.7056+-0.2635 4.4479+-0.1093 might be 1.0579x faster math-cordic 2.8090+-0.0551 2.7671+-0.0661 might be 1.0151x faster math-partial-sums 5.1786+-0.0778 5.1570+-0.0624 math-spectral-norm 1.8258+-0.0362 1.7980+-0.0536 might be 1.0155x faster regexp-dna 6.4885+-0.0923 ? 6.5088+-0.1748 ? string-base64 4.3984+-0.1389 4.2599+-0.0496 might be 1.0325x faster string-fasta 5.7740+-0.1201 ? 5.8762+-0.1522 ? might be 1.0177x slower string-tagcloud 8.0202+-0.1302 7.8639+-0.1237 might be 1.0199x faster string-unpack-code 19.8845+-0.4345 19.0275+-0.5090 might be 1.0450x faster string-validate-input 4.5067+-0.0933 ? 4.5130+-0.1855 ? <arithmetic> 4.5977+-0.0334 ^ 4.4978+-0.0313 ^ definitely 1.0222x faster TipOfTree NewLocks Octane: encrypt 0.17180+-0.00064 0.17171+-0.00055 decrypt 3.17979+-0.03345 ? 3.19699+-0.00981 ? deltablue x2 0.15622+-0.00148 0.15415+-0.00111 might be 1.0135x faster earley 0.27770+-0.00162 0.27502+-0.00142 boyer 4.17854+-0.04413 4.14484+-0.01672 navier-stokes x2 4.82205+-0.01087 ? 4.82440+-0.01086 ? raytrace x2 1.00090+-0.02372 ? 1.04802+-0.07031 ? might be 1.0471x slower richards x2 0.10914+-0.00077 ? 0.11074+-0.00288 ? might be 1.0146x slower splay x2 0.33964+-0.00177 ^ 0.32765+-0.00172 ^ definitely 1.0366x faster regexp x2 25.07020+-0.25610 24.74069+-0.11904 might be 1.0133x faster pdfjs x2 37.38952+-0.22937 ^ 36.91349+-0.20178 ^ definitely 1.0129x faster mandreel x2 43.71060+-0.32385 ? 43.91897+-0.25742 ? gbemu x2 34.24095+-0.61294 34.13445+-0.42616 closure 0.56098+-0.00213 ^ 0.55702+-0.00174 ^ definitely 1.0071x faster jquery 7.13931+-0.03051 ^ 7.07964+-0.02523 ^ definitely 1.0084x faster box2d x2 9.98853+-0.05717 9.96303+-0.05108 zlib x2 383.68757+-6.52357 383.53414+-5.50625 typescript x2 664.18752+-5.83347 657.56748+-5.45786 might be 1.0101x faster <geometric> 5.58035+-0.01457 5.56337+-0.02765 might be 1.0031x faster TipOfTree NewLocks Kraken: ai-astar 235.239+-3.242 ^ 220.327+-1.849 ^ definitely 1.0677x faster audio-beat-detection 58.684+-0.243 ? 58.728+-0.212 ? audio-dft 97.335+-1.130 96.569+-1.127 audio-fft 35.248+-0.320 ? 35.276+-0.352 ? audio-oscillator 60.735+-0.552 60.396+-0.699 imaging-darkroom 61.410+-1.069 60.850+-0.466 imaging-desaturate 54.811+-1.006 ^ 49.281+-0.200 ^ definitely 1.1122x faster imaging-gaussian-blur 84.235+-1.294 83.682+-0.660 json-parse-financial 37.743+-0.514 ! 38.800+-0.401 ! definitely 1.0280x slower json-stringify-tinderbox 24.319+-0.386 ^ 23.019+-0.640 ^ definitely 1.0565x faster stanford-crypto-aes 42.616+-0.668 ? 42.940+-0.704 ? stanford-crypto-ccm 35.328+-0.870 ? 35.828+-0.585 ? might be 1.0142x slower stanford-crypto-pbkdf2 94.138+-0.556 ? 95.347+-1.548 ? might be 1.0128x slower stanford-crypto-sha256-iterative 36.668+-0.564 ? 36.812+-0.692 ? <arithmetic> 68.465+-0.296 ^ 66.990+-0.290 ^ definitely 1.0220x faster TipOfTree NewLocks AsmBench: bigfib.cpp 454.8276+-2.4879 ? 455.0060+-3.7398 ? cray.c 400.2721+-1.2841 ? 402.3261+-2.5293 ? dry.c 425.5744+-4.6141 ? 426.6179+-3.1838 ? FloatMM.c 683.7366+-1.3617 ? 684.9314+-1.8365 ? gcc-loops.cpp 3424.1166+-11.3836 3419.6855+-5.2691 n-body.c 825.6690+-3.4029 823.6618+-1.8188 Quicksort.c 406.8926+-2.2691 ? 406.9406+-2.5490 ? stepanov_container.cpp 3545.8981+-10.3444 ? 3549.0221+-7.9672 ? Towers.c 233.9338+-1.4901 233.6589+-0.7130 <geometric> 717.7007+-1.2120 ? 718.1702+-1.2331 ? might be 1.0007x slower TipOfTree NewLocks Geomean of preferred means: <scaled-result> 33.5071+-0.0722 ^ 33.1225+-0.0528 ^ definitely 1.0116x faster
Filip Pizlo
Comment 6 2015-08-13 23:50:13 PDT
Philippe Normand
Comment 7 2015-08-14 09:34:23 PDT
This patch broke GTK and EFL, see https://bugs.webkit.org/show_bug.cgi?id=148027
WebKit Commit Bot
Comment 8 2015-08-14 09:51:03 PDT
Re-opened since this is blocked by bug 148029
Filip Pizlo
Comment 9 2015-08-14 17:17:23 PDT
Note You need to log in before you can comment on or make changes to this bug.