Bug 147999

Summary: Use WTF::Lock and WTF::Condition instead of WTF::Mutex, WTF::ThreadCondition, std::mutex, and std::condition_variable
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, basile_clement, benjamin, cmarcelo, commit-queue, ggaren, mark.lam, mhahnenb, mmirman, msaboff, nrotem, oliver, pnormand, saam, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 148029    
Bug Blocks:    
Attachments:
Description Flags
the patch ggaren: review+

Description Filip Pizlo 2015-08-13 16:09:32 PDT
WTF::Lock and WTF::Condition are 64x smaller than those other things, and WTF::Lock is 10x-100x faster for short critical sections.

There's no good reason to use those other things.
Comment 1 Filip Pizlo 2015-08-13 16:14:36 PDT
Created attachment 258949 [details]
the patch
Comment 2 WebKit Commit Bot 2015-08-13 16:16:45 PDT
Attachment 258949 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/Lock.h:71:  try_lock is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 32 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Geoffrey Garen 2015-08-13 16:45:42 PDT
Comment on attachment 258949 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=258949&action=review

r=me

What could possibly go wrong?

> Source/WTF/wtf/Lock.h:66
> +            if (m_byte.compareExchangeWeak(currentByteValue, currentByteValue | isHeldBit))
> +                return true;

Why try again instead of just returning the result of compareExchangeWeak? It seems contrary to std::mutex, and to expectation, for tryLock to loop a lot under unlucky circumstances.
Comment 4 Filip Pizlo 2015-08-13 19:38:50 PDT
(In reply to comment #3)
> Comment on attachment 258949 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=258949&action=review
> 
> r=me
> 
> What could possibly go wrong?
> 
> > Source/WTF/wtf/Lock.h:66
> > +            if (m_byte.compareExchangeWeak(currentByteValue, currentByteValue | isHeldBit))
> > +                return true;
> 
> Why try again instead of just returning the result of compareExchangeWeak?
> It seems contrary to std::mutex, and to expectation, for tryLock to loop a
> lot under unlucky circumstances.

I think that this is consistent with expectation, and I don't believe it will loop a lot.

The loop is necessary to satisfy what I believe to be a requirement of tryLock: if the caller knows for sure that the lock is not held then tryLock can only return true. It is possible for the caller to know for sure that the lock is not held: for example, the lock may only have been ever acquired by the current thread and is not held by the current thread right now. I also believe that the converse is a requirement: if the current thread holds the lock, then a call to tryLock must return false. Because this requirement is so easily testable, I've always believed it to be part of the expected behavior of tryLock. This definition appears to be consistent with http://linux.die.net/man/3/pthread_mutex_trylock: "The pthread_mutex_trylock() function shall be equivalent to pthread_mutex_lock(), except that if the mutex object referenced by mutex is currently locked (by any thread, including the current thread), the call shall return immediately." We could ensure this behavior if we had a magical primitive called STRONG_CAS_BIT, which was a strong CAS on a bit in memory. Then we would STRONG_CAS_BIT(&m_byte.isHeldBit, false, true). That's exactly what this loop implements, by using a CAS over the whole byte.

I agree that implementing these semantics would be of questionable utility if it meant tryLock() looped a lot. This loop will return false immediately if it sees that the lock is held, so you won't loop waiting to acquire the lock. It will only loop around if the lock byte changed between when we loaded it (and found it to not be held) and when we executed the CAS. I believe that this is a lock-free algorithm, in the sense that the only way for tryLock to never return is if the OS scheduler decided to never let us run. When we promise something to be "non-blocking" or being O(1), as we expect for tryLock(), I believe that this should allow for a CAS loop if that CAS loop is lock-free.
Comment 5 Filip Pizlo 2015-08-13 22:27:43 PDT
This appears to be a speed-up:


Benchmark report for SunSpider, Octane, Kraken, and AsmBench on shakezilla (MacBookPro11,3).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r188431)
"NewLocks" at /Volumes/Data/quinary/OpenSource/WebKitBuild/Release/jsc (r188431)

Collected 20 samples per benchmark/VM, with 20 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree                  NewLocks                                     
SunSpider:
   3d-cube                                    4.7079+-0.2427            4.5150+-0.1483          might be 1.0427x faster
   3d-morph                                   5.1893+-0.0810     ?      5.3729+-0.1713        ? might be 1.0354x slower
   3d-raytrace                                5.2725+-0.1242     ?      5.2941+-0.2346        ?
   access-binary-trees                        2.2193+-0.1509            2.1069+-0.1479          might be 1.0533x faster
   access-fannkuch                            5.4121+-0.1323     ^      5.2019+-0.0416        ^ definitely 1.0404x faster
   access-nbody                               2.4719+-0.0801     ?      2.4907+-0.1116        ?
   access-nsieve                              3.2560+-0.2458            3.0751+-0.1189          might be 1.0588x faster
   bitops-3bit-bits-in-byte                   1.4745+-0.0252            1.4739+-0.1490        
   bitops-bits-in-byte                        3.2527+-0.0365            3.2430+-0.0786        
   bitops-bitwise-and                         1.9920+-0.0667            1.9302+-0.0136          might be 1.0321x faster
   bitops-nsieve-bits                         2.9597+-0.0492            2.9243+-0.0421          might be 1.0121x faster
   controlflow-recursive                      2.0927+-0.0698            2.0397+-0.0809          might be 1.0260x faster
   crypto-aes                                 3.8336+-0.0783            3.7728+-0.0942          might be 1.0161x faster
   crypto-md5                                 2.5629+-0.1323            2.4219+-0.2512          might be 1.0582x faster
   crypto-sha1                                2.3848+-0.0672            2.2829+-0.1008          might be 1.0447x faster
   date-format-tofte                          6.8657+-0.3063            6.5775+-0.1161          might be 1.0438x faster
   date-format-xparb                          4.7056+-0.2635            4.4479+-0.1093          might be 1.0579x faster
   math-cordic                                2.8090+-0.0551            2.7671+-0.0661          might be 1.0151x faster
   math-partial-sums                          5.1786+-0.0778            5.1570+-0.0624        
   math-spectral-norm                         1.8258+-0.0362            1.7980+-0.0536          might be 1.0155x faster
   regexp-dna                                 6.4885+-0.0923     ?      6.5088+-0.1748        ?
   string-base64                              4.3984+-0.1389            4.2599+-0.0496          might be 1.0325x faster
   string-fasta                               5.7740+-0.1201     ?      5.8762+-0.1522        ? might be 1.0177x slower
   string-tagcloud                            8.0202+-0.1302            7.8639+-0.1237          might be 1.0199x faster
   string-unpack-code                        19.8845+-0.4345           19.0275+-0.5090          might be 1.0450x faster
   string-validate-input                      4.5067+-0.0933     ?      4.5130+-0.1855        ?

   <arithmetic>                               4.5977+-0.0334     ^      4.4978+-0.0313        ^ definitely 1.0222x faster

                                                TipOfTree                  NewLocks                                     
Octane:
   encrypt                                   0.17180+-0.00064          0.17171+-0.00055       
   decrypt                                   3.17979+-0.03345    ?     3.19699+-0.00981       ?
   deltablue                        x2       0.15622+-0.00148          0.15415+-0.00111         might be 1.0135x faster
   earley                                    0.27770+-0.00162          0.27502+-0.00142       
   boyer                                     4.17854+-0.04413          4.14484+-0.01672       
   navier-stokes                    x2       4.82205+-0.01087    ?     4.82440+-0.01086       ?
   raytrace                         x2       1.00090+-0.02372    ?     1.04802+-0.07031       ? might be 1.0471x slower
   richards                         x2       0.10914+-0.00077    ?     0.11074+-0.00288       ? might be 1.0146x slower
   splay                            x2       0.33964+-0.00177    ^     0.32765+-0.00172       ^ definitely 1.0366x faster
   regexp                           x2      25.07020+-0.25610         24.74069+-0.11904         might be 1.0133x faster
   pdfjs                            x2      37.38952+-0.22937    ^    36.91349+-0.20178       ^ definitely 1.0129x faster
   mandreel                         x2      43.71060+-0.32385    ?    43.91897+-0.25742       ?
   gbemu                            x2      34.24095+-0.61294         34.13445+-0.42616       
   closure                                   0.56098+-0.00213    ^     0.55702+-0.00174       ^ definitely 1.0071x faster
   jquery                                    7.13931+-0.03051    ^     7.07964+-0.02523       ^ definitely 1.0084x faster
   box2d                            x2       9.98853+-0.05717          9.96303+-0.05108       
   zlib                             x2     383.68757+-6.52357        383.53414+-5.50625       
   typescript                       x2     664.18752+-5.83347        657.56748+-5.45786         might be 1.0101x faster

   <geometric>                               5.58035+-0.01457          5.56337+-0.02765         might be 1.0031x faster

                                                TipOfTree                  NewLocks                                     
Kraken:
   ai-astar                                  235.239+-3.242      ^     220.327+-1.849         ^ definitely 1.0677x faster
   audio-beat-detection                       58.684+-0.243      ?      58.728+-0.212         ?
   audio-dft                                  97.335+-1.130             96.569+-1.127         
   audio-fft                                  35.248+-0.320      ?      35.276+-0.352         ?
   audio-oscillator                           60.735+-0.552             60.396+-0.699         
   imaging-darkroom                           61.410+-1.069             60.850+-0.466         
   imaging-desaturate                         54.811+-1.006      ^      49.281+-0.200         ^ definitely 1.1122x faster
   imaging-gaussian-blur                      84.235+-1.294             83.682+-0.660         
   json-parse-financial                       37.743+-0.514      !      38.800+-0.401         ! definitely 1.0280x slower
   json-stringify-tinderbox                   24.319+-0.386      ^      23.019+-0.640         ^ definitely 1.0565x faster
   stanford-crypto-aes                        42.616+-0.668      ?      42.940+-0.704         ?
   stanford-crypto-ccm                        35.328+-0.870      ?      35.828+-0.585         ? might be 1.0142x slower
   stanford-crypto-pbkdf2                     94.138+-0.556      ?      95.347+-1.548         ? might be 1.0128x slower
   stanford-crypto-sha256-iterative           36.668+-0.564      ?      36.812+-0.692         ?

   <arithmetic>                               68.465+-0.296      ^      66.990+-0.290         ^ definitely 1.0220x faster

                                                TipOfTree                  NewLocks                                     
AsmBench:
   bigfib.cpp                               454.8276+-2.4879     ?    455.0060+-3.7398        ?
   cray.c                                   400.2721+-1.2841     ?    402.3261+-2.5293        ?
   dry.c                                    425.5744+-4.6141     ?    426.6179+-3.1838        ?
   FloatMM.c                                683.7366+-1.3617     ?    684.9314+-1.8365        ?
   gcc-loops.cpp                           3424.1166+-11.3836        3419.6855+-5.2691        
   n-body.c                                 825.6690+-3.4029          823.6618+-1.8188        
   Quicksort.c                              406.8926+-2.2691     ?    406.9406+-2.5490        ?
   stepanov_container.cpp                  3545.8981+-10.3444    ?   3549.0221+-7.9672        ?
   Towers.c                                 233.9338+-1.4901          233.6589+-0.7130        

   <geometric>                              717.7007+-1.2120     ?    718.1702+-1.2331        ? might be 1.0007x slower

                                                TipOfTree                  NewLocks                                     
Geomean of preferred means:
   <scaled-result>                           33.5071+-0.0722     ^     33.1225+-0.0528        ^ definitely 1.0116x faster
Comment 6 Filip Pizlo 2015-08-13 23:50:13 PDT
Landed in http://trac.webkit.org/changeset/188444
Comment 7 Philippe Normand 2015-08-14 09:34:23 PDT
This patch broke GTK and EFL, see https://bugs.webkit.org/show_bug.cgi?id=148027
Comment 8 WebKit Commit Bot 2015-08-14 09:51:03 PDT
Re-opened since this is blocked by bug 148029
Comment 9 Filip Pizlo 2015-08-14 17:17:23 PDT
Landed in http://trac.webkit.org/changeset/188499