Bug 147545

Summary: Lightweight locks should be adaptive
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: Web Template FrameworkAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, barraclough, basile_clement, benjamin, cdumez, commit-queue, ggaren, kling, mark.lam, mhahnenb, mmirman, msaboff, nrotem, oliver, ossy, sam, sbarati, simon.fraser
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 147675, 147776    
Bug Blocks: 147665    
Attachments:
Description Flags
work in progress
none
work in progress
none
the patch
none
the patch
none
the patch
none
fix build
none
patch for landing
fpizlo: review+
fix most style
fpizlo: review+
patch for relanding fpizlo: review+

Description Filip Pizlo 2015-08-02 14:04:52 PDT
Current WebKit uses a lot of spinlocks because they are cheap - they occupy little memory and they are inexpensive to acquire.  But they don't adapt to contention - instead they just spin.  We should make them adaptive.
Comment 1 Filip Pizlo 2015-08-02 14:14:27 PDT
Created attachment 258040 [details]
work in progress
Comment 2 Filip Pizlo 2015-08-04 11:55:23 PDT
Created attachment 258193 [details]
work in progress

Replacing all SpinLock uses with Lock.  Added a benchmark, and documentation showing the performance results.
Comment 3 Anders Carlsson 2015-08-04 12:12:15 PDT
Interesting. Can we come up with a better name than "Lock"?
Comment 4 Filip Pizlo 2015-08-04 12:27:43 PDT
(In reply to comment #3)
> Interesting. Can we come up with a better name than "Lock"?

I was torn about this, and am open to suggestions.

Ggaren suggested "Lock" because a simple name will encourage people to use it.
Comment 5 Anders Carlsson 2015-08-04 12:41:50 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > Interesting. Can we come up with a better name than "Lock"?
> 
> I was torn about this, and am open to suggestions.
> 
> Ggaren suggested "Lock" because a simple name will encourage people to use
> it.

So this lock will spin for a while and then fall back to a sleeping "system" lock?
Comment 6 Filip Pizlo 2015-08-04 12:47:02 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Interesting. Can we come up with a better name than "Lock"?
> > 
> > I was torn about this, and am open to suggestions.
> > 
> > Ggaren suggested "Lock" because a simple name will encourage people to use
> > it.
> 
> So this lock will spin for a while and then fall back to a sleeping "system"
> lock?

Basically.  40 spins where we call yield, followed by enqueueing and sleeping.

We don't really fall back on the system lock - instead a system lock is used for "parking" - i.e. putting the thread to sleep while it waits for some other thread to wake it up.  So, when a thread is sleeping, the thread will appear to be wait()'ing on a std::condition_variable while the data structures in Lock.cpp track that the thread is on the queue for that lock.
Comment 7 Geoffrey Garen 2015-08-04 12:53:37 PDT
On second thought, I guess the C++-consistent name would be "Mutex" rather than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a mutex", and std::mutex to mean "I am the thing that does the testing and sleeping". So, this is really a replacement for std::mutex and not std::*lock*.
Comment 8 Anders Carlsson 2015-08-04 12:55:56 PDT
(In reply to comment #7)
> On second thought, I guess the C++-consistent name would be "Mutex" rather
> than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a
> mutex", and std::mutex to mean "I am the thing that does the testing and
> sleeping". So, this is really a replacement for std::mutex and not
> std::*lock*.

Critical Section 😑
Comment 9 Filip Pizlo 2015-08-04 13:14:07 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > On second thought, I guess the C++-consistent name would be "Mutex" rather
> > than "Lock". C++ uses std::unique_lock and std::lock_guard to mean "I lock a
> > mutex", and std::mutex to mean "I am the thing that does the testing and
> > sleeping". So, this is really a replacement for std::mutex and not
> > std::*lock*.
> 
> Critical Section 😑

Noooo! :-P

I agree that Mutex is the correct name, but then we'd first have to rename the current WTF::Mutex to something like WTF::SystemMutex.  That seems annoying, but it might be OK.
Comment 10 Geoffrey Garen 2015-08-04 13:22:23 PDT
> I agree that Mutex is the correct name, but then we'd first have to rename
> the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> annoying, but it might be OK.

If the thesis of this patch is correct, then the current WTF::Mutex should become WTF::DeprecatedMutex, since there is no good reason to use the old style, and we should migrate away from it over time.

Another reason to migrate away from WTF::Mutex is that it is just a porting holdover from before the C++ thread support library existed. The same goes for WTF::Locker, WTF::ThreadCondition, etc.
Comment 11 Filip Pizlo 2015-08-04 13:25:52 PDT
(In reply to comment #10)
> > I agree that Mutex is the correct name, but then we'd first have to rename
> > the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> > annoying, but it might be OK.
> 
> If the thesis of this patch is correct, then the current WTF::Mutex should
> become WTF::DeprecatedMutex, since there is no good reason to use the old
> style, and we should migrate away from it over time.
> 
> Another reason to migrate away from WTF::Mutex is that it is just a porting
> holdover from before the C++ thread support library existed. The same goes
> for WTF::Locker, WTF::ThreadCondition, etc.

Well, WTF::Mutex currently has the ability to support condition variables.  One approach would be to say that all users of WTF::Mutex should move to std::mutex.  But I prefer a different approach: implement a WTF::Condition using the same style as this patch - one word in the common case, etc.  Then all users of std::mutex should move to the new WTF::Mutex.  Even better, we could probably implement a one-word WTF::Monitor that is a combo of Mutex and Condition.  This works pretty well for a lot of our code since we often just have one Condition per Mutex.

I think that I like WTF::Lock - or some other sufficiently unique name - for the interim before we have a fast WTF::Condition that interoperates with WTF::Lock.  Once we have that, maybe we can do the mass renamings?
Comment 12 Mark Lam 2015-08-04 13:27:53 PDT
(In reply to comment #10)
> > I agree that Mutex is the correct name, but then we'd first have to rename
> > the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> > annoying, but it might be OK.
> 
> If the thesis of this patch is correct, then the current WTF::Mutex should
> become WTF::DeprecatedMutex, since there is no good reason to use the old
> style, and we should migrate away from it over time.
> 
> Another reason to migrate away from WTF::Mutex is that it is just a porting
> holdover from before the C++ thread support library existed. The same goes
> for WTF::Locker, WTF::ThreadCondition, etc.

From what I can tell, Fil's implementation does not support priority inheritance.  If all a low priority is thread is holding the Lock/Mutex and high priority threads are waiting on the ConditionVar, the owner thread of the Lock will continue to run at a low priority relative to other high priority threads.  Does WebKit rely on priority inheritance anywhere?
Comment 13 Anders Carlsson 2015-08-04 13:41:19 PDT
We could just rename current WTF::Mutex to WTF::DeprecatedMutex.

Also, I think you should put this class inside wtf/threading where we have (admittedly only one at the moment) other threading classes.
Comment 14 Filip Pizlo 2015-08-04 13:58:07 PDT
(In reply to comment #12)
> (In reply to comment #10)
> > > I agree that Mutex is the correct name, but then we'd first have to rename
> > > the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> > > annoying, but it might be OK.
> > 
> > If the thesis of this patch is correct, then the current WTF::Mutex should
> > become WTF::DeprecatedMutex, since there is no good reason to use the old
> > style, and we should migrate away from it over time.
> > 
> > Another reason to migrate away from WTF::Mutex is that it is just a porting
> > holdover from before the C++ thread support library existed. The same goes
> > for WTF::Locker, WTF::ThreadCondition, etc.
> 
> From what I can tell, Fil's implementation does not support priority
> inheritance.  If all a low priority is thread is holding the Lock/Mutex and
> high priority threads are waiting on the ConditionVar, the owner thread of
> the Lock will continue to run at a low priority relative to other high
> priority threads.  Does WebKit rely on priority inheritance anywhere?

True.  Our SpinLock also doesn't support priority inheritance, and this currently replaces SpinLock.  So, we should be fine.
Comment 15 Mark Lam 2015-08-04 14:01:32 PDT
(In reply to comment #14)
> (In reply to comment #12)
> > (In reply to comment #10)
> > > > I agree that Mutex is the correct name, but then we'd first have to rename
> > > > the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> > > > annoying, but it might be OK.
> > > 
> > > If the thesis of this patch is correct, then the current WTF::Mutex should
> > > become WTF::DeprecatedMutex, since there is no good reason to use the old
> > > style, and we should migrate away from it over time.
> > > 
> > > Another reason to migrate away from WTF::Mutex is that it is just a porting
> > > holdover from before the C++ thread support library existed. The same goes
> > > for WTF::Locker, WTF::ThreadCondition, etc.
> > 
> > From what I can tell, Fil's implementation does not support priority
> > inheritance.  If all a low priority is thread is holding the Lock/Mutex and
> > high priority threads are waiting on the ConditionVar, the owner thread of
> > the Lock will continue to run at a low priority relative to other high
> > priority threads.  Does WebKit rely on priority inheritance anywhere?
> 
> True.  Our SpinLock also doesn't support priority inheritance, and this
> currently replaces SpinLock.  So, we should be fine.

I'm ok with it replacing SpinLock.  My concern was with Geoff's consideration of replacing all uses of WTF::Mutex with this Lock implementation.  WTF::Mutex does support priority inheritance.  I just want to make sure that we're not replacing the current WTF::Mutex without due consideration.
Comment 16 Filip Pizlo 2015-08-04 14:11:45 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #12)
> > > (In reply to comment #10)
> > > > > I agree that Mutex is the correct name, but then we'd first have to rename
> > > > > the current WTF::Mutex to something like WTF::SystemMutex.  That seems
> > > > > annoying, but it might be OK.
> > > > 
> > > > If the thesis of this patch is correct, then the current WTF::Mutex should
> > > > become WTF::DeprecatedMutex, since there is no good reason to use the old
> > > > style, and we should migrate away from it over time.
> > > > 
> > > > Another reason to migrate away from WTF::Mutex is that it is just a porting
> > > > holdover from before the C++ thread support library existed. The same goes
> > > > for WTF::Locker, WTF::ThreadCondition, etc.
> > > 
> > > From what I can tell, Fil's implementation does not support priority
> > > inheritance.  If all a low priority is thread is holding the Lock/Mutex and
> > > high priority threads are waiting on the ConditionVar, the owner thread of
> > > the Lock will continue to run at a low priority relative to other high
> > > priority threads.  Does WebKit rely on priority inheritance anywhere?
> > 
> > True.  Our SpinLock also doesn't support priority inheritance, and this
> > currently replaces SpinLock.  So, we should be fine.
> 
> I'm ok with it replacing SpinLock.  My concern was with Geoff's
> consideration of replacing all uses of WTF::Mutex with this Lock
> implementation.  WTF::Mutex does support priority inheritance.  I just want
> to make sure that we're not replacing the current WTF::Mutex without due
> consideration.

Considering how much we use SpinLock, whatever priority inheritance Mutex does have is ineffective.  It only takes one lock that doesn't know about priorities to cause a priority inversion, and I think we use SpinLock much more than Mutex.
Comment 17 Gavin Barraclough 2015-08-04 17:38:33 PDT
> > From what I can tell, Fil's implementation does not support priority
> > inheritance.  If all a low priority is thread is holding the Lock/Mutex and
> > high priority threads are waiting on the ConditionVar, the owner thread of
> > the Lock will continue to run at a low priority relative to other high
> > priority threads.  Does WebKit rely on priority inheritance anywhere?
> 
> True.  Our SpinLock also doesn't support priority inheritance, and this
> currently replaces SpinLock.  So, we should be fine.

This line of reasoning make me like the name WTF::SystemMutex for the old mutex.

If you explicitly need all the system mutex behaviors (i.e. priority inheritance) you should use WTF::SystemMutex. If you don't (i.e. most use cases) then just use WTF::Mutex.
Comment 18 Filip Pizlo 2015-08-04 18:06:03 PDT
This is totally neutral on JSC benchmarks, as expected.


Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, JSRegress, AsmBench, and CompressionBench on shakezilla (MacBookPro11,3).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r187865)
"Locks" at /Volumes/Data/quartary/OpenSource/WebKitBuild/Release/jsc (r187865)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements.
Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level
timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                        TipOfTree                   Locks                                       
SunSpider:
   3d-cube                                            4.6188+-0.3398            4.5114+-0.3606          might be 1.0238x faster
   3d-morph                                           5.2458+-0.1575            5.1527+-0.1167          might be 1.0181x faster
   3d-raytrace                                        5.1114+-0.1005     ?      5.1558+-0.2675        ?
   access-binary-trees                                2.1410+-0.1643     ?      2.2280+-0.3079        ? might be 1.0407x slower
   access-fannkuch                                    5.1596+-0.0585     ?      5.3521+-0.2510        ? might be 1.0373x slower
   access-nbody                                       2.4100+-0.0680     ?      2.4127+-0.0940        ?
   access-nsieve                                      3.0173+-0.1354     ?      3.0187+-0.0812        ?
   bitops-3bit-bits-in-byte                           1.4736+-0.0371            1.4592+-0.0414        
   bitops-bits-in-byte                                3.2611+-0.0897            3.2560+-0.0809        
   bitops-bitwise-and                                 1.9926+-0.0351     ?      2.0106+-0.0718        ?
   bitops-nsieve-bits                                 2.9257+-0.0965            2.8903+-0.1189          might be 1.0122x faster
   controlflow-recursive                              2.1397+-0.2957            2.0040+-0.0825          might be 1.0677x faster
   crypto-aes                                         3.8312+-0.0946     ?      3.9472+-0.2965        ? might be 1.0303x slower
   crypto-md5                                         2.4287+-0.0536     ?      2.4761+-0.0776        ? might be 1.0195x slower
   crypto-sha1                                        2.3803+-0.1830     ?      2.7833+-0.9070        ? might be 1.1693x slower
   date-format-tofte                                  6.7365+-0.5026            6.6499+-0.2602          might be 1.0130x faster
   date-format-xparb                                  4.5785+-0.4290     ?      4.6176+-0.2775        ?
   math-cordic                                        2.7206+-0.0705     ?      2.7586+-0.0657        ? might be 1.0140x slower
   math-partial-sums                                  5.3063+-0.3196            5.1866+-0.1879          might be 1.0231x faster
   math-spectral-norm                                 1.8261+-0.0803            1.8233+-0.0812        
   regexp-dna                                         6.3188+-0.3528            6.2596+-0.5123        
   string-base64                                      4.2675+-0.1632     ?      4.3479+-0.4265        ? might be 1.0189x slower
   string-fasta                                       5.9072+-0.6528            5.6984+-0.0657          might be 1.0366x faster
   string-tagcloud                                    7.9675+-0.1336     ?      8.4041+-0.7230        ? might be 1.0548x slower
   string-unpack-code                                19.9951+-0.5403           19.5374+-0.3111          might be 1.0234x faster
   string-validate-input                              5.0745+-1.3733            4.6250+-0.2041          might be 1.0972x faster

   <arithmetic>                                       4.5706+-0.1171            4.5602+-0.0625          might be 1.0023x faster

                                                        TipOfTree                   Locks                                       
LongSpider:
   3d-cube                                          771.8095+-7.7938     ?    773.0779+-9.1282        ?
   3d-morph                                        1494.4234+-7.5328         1490.1073+-5.8303        
   3d-raytrace                                      619.9231+-3.2634     ?    621.4335+-3.9142        ?
   access-binary-trees                              812.5942+-5.3948     ?    812.7593+-10.7651       ?
   access-fannkuch                                  274.1295+-1.0160     ?    274.3002+-1.8077        ?
   access-nbody                                     514.1676+-1.9901     ?    517.4148+-5.7030        ?
   access-nsieve                                    365.8359+-11.6282    ?    375.9618+-16.6471       ? might be 1.0277x slower
   bitops-3bit-bits-in-byte                          39.9388+-0.4853     ?     40.1216+-1.2031        ?
   bitops-bits-in-byte                               81.0704+-2.1803           79.5781+-2.1317          might be 1.0188x faster
   bitops-nsieve-bits                               406.6110+-7.8635          403.8947+-5.2638        
   controlflow-recursive                            418.9563+-2.0375     ?    424.5520+-9.2078        ? might be 1.0134x slower
   crypto-aes                                       576.9404+-5.4632     ?    578.2195+-5.4360        ?
   crypto-md5                                       474.6021+-7.0047     ?    482.6179+-17.4491       ? might be 1.0169x slower
   crypto-sha1                                      636.2938+-4.0944     ?    640.3953+-3.1385        ?
   date-format-tofte                                498.8580+-11.7161    ?    503.7063+-24.5992       ?
   date-format-xparb                                627.6571+-11.5439         627.0872+-6.2736        
   hash-map                                         156.8577+-1.5164          155.9521+-1.5994        
   math-cordic                                      482.5645+-3.3998     ?    483.8471+-2.1519        ?
   math-partial-sums                                464.8508+-3.5575          462.0233+-2.0481        
   math-spectral-norm                               548.6165+-4.3437     ?    551.7510+-6.7575        ?
   string-base64                                    346.6176+-4.4541     ?    346.6691+-4.1204        ?
   string-fasta                                     357.2021+-3.0032     ?    357.4994+-2.6034        ?
   string-tagcloud                                  199.2832+-57.4505         177.5579+-1.5202          might be 1.1224x faster

   <geometric>                                      391.0695+-4.6109          390.4987+-1.7559          might be 1.0015x faster

                                                        TipOfTree                   Locks                                       
V8Spider:
   crypto                                            49.1598+-1.7482           48.9808+-1.9084        
   deltablue                                         79.0791+-2.6777     ?     79.7299+-4.5284        ?
   earley-boyer                                      39.5946+-0.5930           39.2212+-1.0414        
   raytrace                                          28.7557+-0.7918     ?     31.6886+-6.1090        ? might be 1.1020x slower
   regexp                                            61.1023+-0.5778     ?     61.6135+-0.7153        ?
   richards                                          70.9105+-1.7815     ?     71.5622+-3.4116        ?
   splay                                             34.1810+-2.0900           34.0174+-2.3288        

   <geometric>                                       48.7429+-0.4757     ?     49.3850+-0.8291        ? might be 1.0132x slower

                                                        TipOfTree                   Locks                                       
Octane:
   encrypt                                           0.20265+-0.00074          0.20230+-0.00301       
   decrypt                                           3.56572+-0.47534          3.33499+-0.01607         might be 1.0692x faster
   deltablue                                x2       0.15417+-0.00137          0.15307+-0.00180       
   earley                                            0.30291+-0.00540          0.30110+-0.00649       
   boyer                                             4.15908+-0.02376          4.13385+-0.02584       
   navier-stokes                            x2       4.92411+-0.02893          4.91718+-0.02410       
   raytrace                                 x2       1.04270+-0.07241          1.01884+-0.07396         might be 1.0234x faster
   richards                                 x2       0.10021+-0.00161          0.10017+-0.00196       
   splay                                    x2       0.33619+-0.00219    !     0.34187+-0.00215       ! definitely 1.0169x slower
   regexp                                   x2      24.68531+-0.30351    ?    24.83500+-0.51691       ?
   pdfjs                                    x2      37.33105+-0.63204    ?    37.49453+-0.34998       ?
   mandreel                                 x2      43.63864+-0.22603    ?    44.33171+-0.78447       ? might be 1.0159x slower
   gbemu                                    x2      34.33502+-1.71199    ?    34.35153+-1.03796       ?
   closure                                           0.55893+-0.00571    ?     0.56102+-0.00125       ?
   jquery                                            7.11598+-0.06968    ?     7.13456+-0.03945       ?
   box2d                                    x2      10.03658+-0.06629    ?    10.05517+-0.12237       ?
   zlib                                     x2     389.75605+-2.90554        385.79913+-20.33314        might be 1.0103x faster
   typescript                               x2     658.92818+-8.09899        656.12842+-11.54885      

   <geometric>                                       5.62700+-0.04596          5.61331+-0.04365         might be 1.0024x faster

                                                        TipOfTree                   Locks                                       
Kraken:
   ai-astar                                          230.285+-8.356      ?     236.064+-1.856         ? might be 1.0251x slower
   audio-beat-detection                               58.771+-0.669      ?      58.976+-0.471         ?
   audio-dft                                          95.483+-2.067      ?      95.895+-1.841         ?
   audio-fft                                          34.981+-0.570      ?      35.125+-0.601         ?
   audio-oscillator                                   62.665+-3.990      ?      62.940+-2.650         ?
   imaging-darkroom                                   58.677+-0.334      ?      59.003+-0.752         ?
   imaging-desaturate                                 54.471+-2.943      ?      55.475+-0.933         ? might be 1.0184x slower
   imaging-gaussian-blur                              82.631+-0.555             82.590+-0.261         
   json-parse-financial                               37.789+-0.731             37.238+-0.481           might be 1.0148x faster
   json-stringify-tinderbox                           22.592+-1.490      ?      22.814+-1.245         ?
   stanford-crypto-aes                                42.903+-1.715             41.604+-0.927           might be 1.0312x faster
   stanford-crypto-ccm                                35.273+-1.648      ?      36.158+-1.537         ? might be 1.0251x slower
   stanford-crypto-pbkdf2                             94.467+-1.003             94.257+-0.305         
   stanford-crypto-sha256-iterative                   36.312+-0.559      ?      36.499+-0.671         ?

   <arithmetic>                                       67.664+-0.939      ?      68.188+-0.477         ? might be 1.0077x slower

                                                        TipOfTree                   Locks                                       
JSRegress:
   abc-forward-loop-equal                            29.4311+-0.6455     ?     29.5404+-0.5091        ?
   abc-postfix-backward-loop                         29.4591+-0.6159           29.0017+-0.7659          might be 1.0158x faster
   abc-simple-backward-loop                          28.9969+-0.5550           28.8738+-0.3935        
   abc-simple-forward-loop                           29.2154+-0.9575           29.0029+-0.8270        
   abc-skippy-loop                                   21.1214+-0.6533     ?     21.4778+-1.1481        ? might be 1.0169x slower
   abs-boolean                                        2.4264+-0.0860            2.3811+-0.0347          might be 1.0190x faster
   adapt-to-double-divide                            16.0105+-0.2168     ?     16.2480+-0.5204        ? might be 1.0148x slower
   aliased-arguments-getbyval                         1.1665+-0.0399     ?      1.4693+-0.4157        ? might be 1.2596x slower
   allocate-big-object                                2.5751+-0.5480     ?      2.6128+-0.2639        ? might be 1.0146x slower
   arguments-named-and-reflective                    10.9051+-0.4144           10.8686+-0.3554        
   arguments-out-of-bounds                           10.0490+-0.9350            9.8034+-0.4199          might be 1.0251x faster
   arguments-strict-mode                              9.5454+-0.2523     ?      9.5959+-0.2316        ?
   arguments                                          8.1871+-0.2263     ?      8.4177+-0.3306        ? might be 1.0282x slower
   arity-mismatch-inlining                            0.8488+-0.1486            0.8202+-0.0454          might be 1.0348x faster
   array-access-polymorphic-structure                 6.3090+-0.5862            5.9623+-0.1803          might be 1.0581x faster
   array-nonarray-polymorhpic-access                 24.5815+-0.6221     ?     25.1122+-1.2537        ? might be 1.0216x slower
   array-prototype-every                             74.7804+-1.0854     ?     75.6542+-1.7074        ? might be 1.0117x slower
   array-prototype-forEach                           74.7113+-0.5646           74.4927+-1.0822        
   array-prototype-map                               83.1633+-1.0930     ?     83.7856+-0.6654        ?
   array-prototype-reduce                            70.6877+-1.6114     ?     70.9931+-1.8866        ?
   array-prototype-reduceRight                       70.8095+-2.1000           70.2520+-0.9904        
   array-prototype-some                              74.1130+-0.4029     !     75.4754+-0.7724        ! definitely 1.0184x slower
   array-splice-contiguous                           20.3935+-0.7524     ?     20.4006+-1.1547        ?
   array-with-double-add                              3.9681+-0.1761            3.9156+-0.0583          might be 1.0134x faster
   array-with-double-increment                        3.0434+-0.1339            2.9893+-0.0405          might be 1.0181x faster
   array-with-double-mul-add                          5.0186+-0.2781            4.9191+-0.0308          might be 1.0202x faster
   array-with-double-sum                              3.3228+-0.0785            3.2970+-0.0521        
   array-with-int32-add-sub                           7.2165+-0.0702            7.1890+-0.0923        
   array-with-int32-or-double-sum                     3.5155+-0.1898            3.3524+-0.0437          might be 1.0487x faster
   ArrayBuffer-DataView-alloc-large-long-lived   
                                                     28.9523+-0.3013     ?     29.1951+-0.4087        ?
   ArrayBuffer-DataView-alloc-long-lived             13.0372+-0.5093           12.8317+-0.4350          might be 1.0160x faster
   ArrayBuffer-Int32Array-byteOffset                  3.8682+-0.8988            3.5271+-0.0517          might be 1.0967x faster
   ArrayBuffer-Int8Array-alloc-large-long-lived   
                                                     35.7914+-12.1571          30.9712+-0.6537          might be 1.1556x faster
   ArrayBuffer-Int8Array-alloc-long-lived-buffer   
                                                     20.1246+-0.2806     ?     20.2460+-0.4632        ?
   ArrayBuffer-Int8Array-alloc-long-lived            12.1322+-0.3063     ?     12.8298+-1.5482        ? might be 1.0575x slower
   ArrayBuffer-Int8Array-alloc                        9.8477+-0.6430            9.5338+-0.5052          might be 1.0329x faster
   asmjs_bool_bug                                     7.0368+-0.0900     ?      7.1530+-0.2484        ? might be 1.0165x slower
   assign-custom-setter-polymorphic                   2.4277+-0.0677     ?      2.4716+-0.1799        ? might be 1.0181x slower
   assign-custom-setter                               3.3452+-0.1044     ?      3.3477+-0.2979        ?
   basic-set                                          7.8219+-0.3594            7.6944+-0.3537          might be 1.0166x faster
   big-int-mul                                        3.4474+-0.1785            3.3638+-0.0508          might be 1.0249x faster
   boolean-test                                       2.8345+-0.0976     ?      2.8393+-0.0753        ?
   branch-fold                                        3.5478+-0.0544            3.5389+-0.0350        
   branch-on-string-as-boolean                       16.1193+-0.6798     ?     16.3823+-0.8719        ? might be 1.0163x slower
   by-val-generic                                     7.1298+-0.2748            7.0569+-0.2293          might be 1.0103x faster
   call-spread-apply                                 27.1528+-1.7094     ?     27.5428+-1.2267        ? might be 1.0144x slower
   call-spread-call                                  22.6948+-1.5555     ?     23.6428+-1.8490        ? might be 1.0418x slower
   captured-assignments                               0.4191+-0.0469            0.4184+-0.0378        
   cast-int-to-double                                 4.9904+-0.0862     ?      5.0358+-0.2751        ?
   cell-argument                                      6.2188+-0.1947     ?      6.4216+-0.3847        ? might be 1.0326x slower
   cfg-simplify                                       2.7042+-0.0428     ?      2.7358+-0.0776        ? might be 1.0117x slower
   chain-getter-access                                8.3185+-0.5494            8.2552+-0.7248        
   cmpeq-obj-to-obj-other                            11.4144+-1.1263           11.0760+-1.2552          might be 1.0306x faster
   constant-test                                      4.7964+-0.0957            4.7922+-0.0353        
   create-lots-of-functions                           9.9573+-0.3876            9.4965+-0.5592          might be 1.0485x faster
   cse-new-array-buffer                               2.2085+-0.1138     ?      2.2989+-0.1539        ? might be 1.0409x slower
   cse-new-array                                      2.3505+-0.1683     ?      2.4184+-0.4198        ? might be 1.0289x slower
   DataView-custom-properties                        35.4969+-0.8371           35.2449+-0.9654        
   delay-tear-off-arguments-strictmode               13.1044+-0.8228           12.6542+-1.2472          might be 1.0356x faster
   deltablue-varargs                                147.7344+-3.3833     ?    149.5223+-2.2763        ? might be 1.0121x slower
   destructuring-arguments                          163.4681+-2.8567          162.2511+-2.1923        
   destructuring-parameters-overridden-by-function   
                                                      0.5030+-0.0408     ?      0.5389+-0.0284        ? might be 1.0714x slower
   destructuring-swap                                 4.5915+-0.0088     ?      4.6814+-0.2070        ? might be 1.0196x slower
   direct-arguments-getbyval                          1.1461+-0.0686     ?      1.1761+-0.0480        ? might be 1.0262x slower
   div-boolean-double                                 5.6390+-0.6562            5.2838+-0.1281          might be 1.0672x faster
   div-boolean                                        8.1726+-0.1546     ?      8.1959+-0.2375        ?
   double-get-by-val-out-of-bounds                    4.2944+-0.2925            4.1179+-0.1729          might be 1.0429x faster
   double-pollution-getbyval                          8.5732+-0.0294            8.5477+-0.0409        
   double-pollution-putbyoffset                       3.8433+-0.0861            3.7512+-0.0730          might be 1.0245x faster
   double-real-use                                   27.3352+-2.4386     ?     27.3402+-3.0593        ?
   double-to-int32-typed-array-no-inline              2.0493+-0.0625     ?      2.0610+-0.0487        ?
   double-to-int32-typed-array                        1.7634+-0.0720     ?      1.8650+-0.1075        ? might be 1.0576x slower
   double-to-uint32-typed-array-no-inline             2.0874+-0.0438     ?      2.1016+-0.0465        ?
   double-to-uint32-typed-array                       1.9541+-0.1746            1.8213+-0.0600          might be 1.0729x faster
   elidable-new-object-dag                           33.5557+-0.6089     ?     33.7206+-0.7661        ?
   elidable-new-object-roflcopter                    33.2937+-0.8281           33.1740+-0.5020        
   elidable-new-object-then-call                     30.6334+-1.1906     ?     32.1959+-1.5218        ? might be 1.0510x slower
   elidable-new-object-tree                          36.2391+-1.0350     ?     36.5379+-1.2132        ?
   empty-string-plus-int                              4.7792+-0.2019            4.6746+-0.3033          might be 1.0224x faster
   emscripten-cube2hash                              26.7148+-1.5520     ?     27.0872+-0.5889        ? might be 1.0139x slower
   exit-length-on-plain-object                       13.0363+-0.4496           12.7657+-0.1889          might be 1.0212x faster
   external-arguments-getbyval                        1.1845+-0.0618            1.1817+-0.0645        
   external-arguments-putbyval                        2.4150+-0.3808            2.1960+-0.1059          might be 1.0997x faster
   fixed-typed-array-storage-var-index                1.2191+-0.1174            1.1706+-0.0529          might be 1.0414x faster
   fixed-typed-array-storage                          0.8485+-0.0301     ?      0.9090+-0.0831        ? might be 1.0713x slower
   Float32Array-matrix-mult                           3.8852+-0.1154     ?      4.1158+-0.4322        ? might be 1.0594x slower
   Float32Array-to-Float64Array-set                  46.2218+-0.4158     !     47.5546+-0.7306        ! definitely 1.0288x slower
   Float64Array-alloc-long-lived                     73.5778+-0.8413     ?     74.4862+-1.3166        ? might be 1.0123x slower
   Float64Array-to-Int16Array-set                    57.5435+-1.4247     ?     57.8420+-2.1581        ?
   fold-double-to-int                                12.3702+-0.2197     ?     12.7174+-0.4438        ? might be 1.0281x slower
   fold-get-by-id-to-multi-get-by-offset-rare-int   
                                                     10.4351+-0.3672           10.3497+-0.3431        
   fold-get-by-id-to-multi-get-by-offset              8.6943+-0.2393     ?      9.0524+-0.4835        ? might be 1.0412x slower
   fold-multi-get-by-offset-to-get-by-offset   
                                                      8.5268+-0.5141            7.7818+-0.9843          might be 1.0957x faster
   fold-multi-get-by-offset-to-poly-get-by-offset   
                                                      7.3630+-1.2768     ?      8.1326+-0.7635        ? might be 1.1045x slower
   fold-multi-put-by-offset-to-poly-put-by-offset   
                                                      7.7292+-0.3083            7.7029+-0.2045        
   fold-multi-put-by-offset-to-put-by-offset   
                                                      4.6861+-0.6057     ?      5.1802+-0.6778        ? might be 1.1054x slower
   fold-multi-put-by-offset-to-replace-or-transition-put-by-offset   
                                                      9.9001+-1.1068            9.2662+-1.0821          might be 1.0684x faster
   fold-put-by-id-to-multi-put-by-offset              8.6160+-0.3561     ?      8.8826+-0.9294        ? might be 1.0310x slower
   fold-put-structure                                 4.7251+-0.8216     ?      5.1494+-0.6639        ? might be 1.0898x slower
   for-of-iterate-array-entries                      11.2489+-0.4281     ?     11.2823+-0.5164        ?
   for-of-iterate-array-keys                          3.3832+-0.2045     ?      3.4946+-0.1935        ? might be 1.0329x slower
   for-of-iterate-array-values                        3.5361+-0.4255            3.4215+-0.2286          might be 1.0335x faster
   fround                                            18.6280+-0.8883     ?     18.9003+-0.8606        ? might be 1.0146x slower
   ftl-library-inlining-dataview                     56.3653+-0.8020     ?     57.1455+-0.5690        ? might be 1.0138x slower
   ftl-library-inlining                             108.5827+-3.1158          107.3868+-0.8126          might be 1.0111x faster
   function-dot-apply                                 2.0738+-0.2045            1.9975+-0.1145          might be 1.0382x faster
   function-test                                      2.6981+-0.0479     ?      2.7323+-0.1454        ? might be 1.0127x slower
   function-with-eval                               101.7899+-27.8880          90.6862+-0.8463          might be 1.1224x faster
   gcse-poly-get-less-obvious                        14.1600+-0.1831     ?     14.5827+-1.3847        ? might be 1.0299x slower
   gcse-poly-get                                     14.4560+-1.0900           14.1248+-0.7286          might be 1.0234x faster
   gcse                                               3.7415+-0.0300     ?      3.7749+-0.1158        ?
   get-by-id-bimorphic-check-structure-elimination-simple   
                                                      2.5618+-0.0848     ?      2.5723+-0.0952        ?
   get-by-id-bimorphic-check-structure-elimination   
                                                      5.5898+-0.0290     ?      5.6350+-0.1315        ?
   get-by-id-chain-from-try-block                     5.5067+-0.1796     ?      5.7276+-0.3959        ? might be 1.0401x slower
   get-by-id-check-structure-elimination              4.4624+-0.4398            4.2795+-0.0566          might be 1.0427x faster
   get-by-id-proto-or-self                           14.7878+-1.1722           14.7739+-0.8649        
   get-by-id-quadmorphic-check-structure-elimination-simple   
                                                      2.8392+-0.0156     ?      2.9611+-0.3184        ? might be 1.0430x slower
   get-by-id-self-or-proto                           14.8367+-1.2603           14.6765+-0.6055          might be 1.0109x faster
   get-by-val-out-of-bounds                           4.0387+-0.1936            3.7887+-0.1236          might be 1.0660x faster
   get_callee_monomorphic                             2.2247+-0.1005            2.2176+-0.0985        
   get_callee_polymorphic                             3.1812+-0.1324     ?      3.1878+-0.1611        ?
   getter-no-activation                               4.8024+-0.2533     ?      4.9355+-0.4116        ? might be 1.0277x slower
   getter-prototype                                   9.7732+-0.2517     ?      9.8181+-0.3455        ?
   getter-richards                                  111.2248+-2.9748          106.5098+-4.3630          might be 1.0443x faster
   getter                                             5.3310+-0.3423     ?      5.5077+-0.6742        ? might be 1.0331x slower
   global-object-access-with-mutating-structure   
                                                      5.3385+-0.3275     ?      5.6608+-0.7679        ? might be 1.0604x slower
   global-var-const-infer-fire-from-opt               0.9130+-0.0848            0.9003+-0.1176          might be 1.0141x faster
   global-var-const-infer                             0.8393+-0.2034            0.7262+-0.0740          might be 1.1557x faster
   HashMap-put-get-iterate-keys                      26.1661+-1.1985     ?     26.6242+-2.0088        ? might be 1.0175x slower
   HashMap-put-get-iterate                           26.0252+-1.0754           26.0108+-1.2974        
   HashMap-string-put-get-iterate                    24.8424+-1.3729           24.0193+-1.0171          might be 1.0343x faster
   hoist-make-rope                                    7.9203+-0.3836            7.6068+-0.4711          might be 1.0412x faster
   hoist-poly-check-structure-effectful-loop   
                                                      4.7454+-1.2376            4.2230+-0.0790          might be 1.1237x faster
   hoist-poly-check-structure                         3.2236+-0.0974     ?      3.2330+-0.0852        ?
   imul-double-only                                   6.7386+-0.3489            6.6762+-0.3878        
   imul-int-only                                      7.9030+-0.5001     ?      8.1891+-0.6864        ? might be 1.0362x slower
   imul-mixed                                         7.2461+-1.7053            6.9045+-0.9320          might be 1.0495x faster
   in-four-cases                                     17.2359+-0.4854     ?     17.3593+-0.5640        ?
   in-one-case-false                                  9.0988+-0.1774     ?      9.1637+-0.3132        ?
   in-one-case-true                                   9.0683+-0.2109     ?      9.1772+-0.1992        ? might be 1.0120x slower
   in-two-cases                                       9.5544+-0.2156            9.3526+-0.2068          might be 1.0216x faster
   indexed-properties-in-objects                      2.7566+-0.0528     ?      2.8378+-0.3293        ? might be 1.0295x slower
   infer-closure-const-then-mov-no-inline             3.0016+-0.0663            2.9661+-0.0317          might be 1.0119x faster
   infer-closure-const-then-mov                      17.2023+-0.6241           16.9183+-0.8782          might be 1.0168x faster
   infer-closure-const-then-put-to-scope-no-inline   
                                                     11.4705+-0.1600     ?     11.5735+-0.3966        ?
   infer-closure-const-then-put-to-scope             20.5011+-0.9456     ?     21.2447+-0.4627        ? might be 1.0363x slower
   infer-closure-const-then-reenter-no-inline   
                                                     49.5101+-0.4197     ?     49.6235+-0.8381        ?
   infer-closure-const-then-reenter                  22.0381+-0.8619           21.4801+-0.5591          might be 1.0260x faster
   infer-constant-global-property                     3.3746+-0.0691            3.3238+-0.0690          might be 1.0153x faster
   infer-constant-property                            2.5870+-0.0767     ?      2.6871+-0.3098        ? might be 1.0387x slower
   infer-one-time-closure-ten-vars                    8.5613+-0.5191            8.4368+-0.2019          might be 1.0148x faster
   infer-one-time-closure-two-vars                    8.6231+-1.1874            7.8493+-0.4357          might be 1.0986x faster
   infer-one-time-closure                             7.7193+-0.3155            7.7172+-0.2470        
   infer-one-time-deep-closure                       12.5498+-0.2206           12.5320+-0.2301        
   inline-arguments-access                            3.6872+-0.1390            3.6535+-0.2597        
   inline-arguments-aliased-access                    3.6155+-0.0859            3.5938+-0.0906        
   inline-arguments-local-escape                      3.7382+-0.2183     ?      3.9161+-0.1768        ? might be 1.0476x slower
   inline-get-scoped-var                              4.8607+-0.6836            4.5531+-0.0968          might be 1.0675x faster
   inlined-put-by-id-transition                      10.5071+-0.9659           10.4994+-0.5332        
   int-or-other-abs-then-get-by-val                   4.7192+-0.0745     ?      4.7309+-0.0649        ?
   int-or-other-abs-zero-then-get-by-val             16.0094+-0.6580     ?     16.5340+-1.8372        ? might be 1.0328x slower
   int-or-other-add-then-get-by-val                   3.9679+-0.0738     ?      3.9836+-0.1303        ?
   int-or-other-add                                   4.8459+-0.1881            4.7853+-0.0755          might be 1.0127x faster
   int-or-other-div-then-get-by-val                   3.6446+-0.0483     ?      4.2713+-1.4209        ? might be 1.1720x slower
   int-or-other-max-then-get-by-val                   3.9763+-0.1104            3.9296+-0.1083          might be 1.0119x faster
   int-or-other-min-then-get-by-val                   3.9684+-0.2585     ?      4.0075+-0.1882        ?
   int-or-other-mod-then-get-by-val                   3.4517+-0.1176            3.3933+-0.0344          might be 1.0172x faster
   int-or-other-mul-then-get-by-val                   3.5400+-0.1007            3.4863+-0.0687          might be 1.0154x faster
   int-or-other-neg-then-get-by-val                   4.4394+-0.1172            4.4148+-0.0986        
   int-or-other-neg-zero-then-get-by-val             15.7043+-0.2769     ?     16.0299+-0.5322        ? might be 1.0207x slower
   int-or-other-sub-then-get-by-val                   4.0084+-0.0709     ?      4.0190+-0.0769        ?
   int-or-other-sub                                   3.4205+-0.1129            3.3710+-0.2498          might be 1.0147x faster
   int-overflow-local                                 4.2570+-0.2466            4.1997+-0.0917          might be 1.0136x faster
   Int16Array-alloc-long-lived                       46.5350+-0.8865           46.3805+-0.8759        
   Int16Array-bubble-sort-with-byteLength            17.9331+-1.4371     ?     18.1070+-1.4030        ?
   Int16Array-bubble-sort                            19.5194+-4.6858           17.1704+-0.2678          might be 1.1368x faster
   Int16Array-load-int-mul                            1.4073+-0.0567     ?      1.4139+-0.0512        ?
   Int16Array-to-Int32Array-set                      44.5347+-0.9270           44.3029+-0.8960        
   Int32Array-alloc-large                            12.2661+-0.7776     ?     12.2882+-0.5082        ?
   Int32Array-alloc-long-lived                       55.9569+-1.0172     ?     56.8643+-0.7153        ? might be 1.0162x slower
   Int32Array-alloc                                   3.1479+-0.5080            2.9006+-0.1494          might be 1.0853x faster
   Int32Array-Int8Array-view-alloc                    5.9994+-0.1947     ?      6.0868+-0.4361        ? might be 1.0146x slower
   int52-spill                                        4.4820+-0.1097     ?      4.6136+-0.2119        ? might be 1.0294x slower
   Int8Array-alloc-long-lived                        41.5892+-0.6479           41.5321+-1.3298        
   Int8Array-load-with-byteLength                     3.3646+-0.1132            3.3544+-0.0882        
   Int8Array-load                                     3.3743+-0.1382            3.3084+-0.0218          might be 1.0199x faster
   integer-divide                                    10.3449+-0.2948           10.2629+-0.2195        
   integer-modulo                                     1.6921+-0.0682            1.6443+-0.0262          might be 1.0291x faster
   is-boolean-fold-tricky                             3.7695+-0.2507            3.6943+-0.0937          might be 1.0204x faster
   is-boolean-fold                                    2.6397+-0.0719            2.6315+-0.2307        
   is-function-fold-tricky-internal-function   
                                                      9.7115+-0.1852     ?      9.9187+-0.3937        ? might be 1.0213x slower
   is-function-fold-tricky                            4.1689+-0.2322            4.1405+-0.3524        
   is-function-fold                                   2.5895+-0.0308     ?      2.6680+-0.0841        ? might be 1.0303x slower
   is-number-fold-tricky                              4.0082+-0.1464            3.9374+-0.0511          might be 1.0180x faster
   is-number-fold                                     2.7183+-0.2368            2.6349+-0.1407          might be 1.0317x faster
   is-object-or-null-fold-functions                   2.6198+-0.0177     ?      2.6492+-0.0771        ? might be 1.0112x slower
   is-object-or-null-fold-less-tricky                 4.1205+-0.1828            3.9911+-0.0580          might be 1.0324x faster
   is-object-or-null-fold-tricky                      5.2237+-0.2759            5.1108+-0.1936          might be 1.0221x faster
   is-object-or-null-fold                             2.5963+-0.0162     ?      2.6821+-0.1837        ? might be 1.0331x slower
   is-object-or-null-trickier-function                4.0523+-0.0857     ?      4.1052+-0.1356        ? might be 1.0131x slower
   is-object-or-null-trickier-internal-function   
                                                     10.4813+-0.5467           10.3479+-0.3109          might be 1.0129x faster
   is-object-or-null-tricky-function                  4.1650+-0.3293            4.0823+-0.0774          might be 1.0203x faster
   is-object-or-null-tricky-internal-function   
                                                      7.6849+-0.2710            7.5968+-0.1495          might be 1.0116x faster
   is-string-fold-tricky                              4.0859+-0.3097            3.9436+-0.0668          might be 1.0361x faster
   is-string-fold                                     2.6167+-0.0317            2.5723+-0.0478          might be 1.0173x faster
   is-undefined-fold-tricky                           3.2343+-0.0131     ?      3.2968+-0.0941        ? might be 1.0193x slower
   is-undefined-fold                                  2.6873+-0.2312            2.5906+-0.0400          might be 1.0373x faster
   large-int-captured                                 3.8054+-0.1124     ?      3.8549+-0.1177        ? might be 1.0130x slower
   large-int-neg                                     13.5714+-0.1696     ?     13.7554+-0.2425        ? might be 1.0136x slower
   large-int                                         13.4234+-0.4982           13.3300+-0.2232        
   load-varargs-elimination                          23.0448+-2.0188           21.2191+-0.5203          might be 1.0860x faster
   logical-not-weird-types                            2.8005+-0.0471     ?      2.8770+-0.0920        ? might be 1.0273x slower
   logical-not                                        4.1720+-0.0488     ?      4.1772+-0.0324        ?
   lots-of-fields                                     9.2710+-0.3436            9.2445+-0.3625        
   make-indexed-storage                               2.7332+-0.2411     ?      2.9135+-0.0432        ? might be 1.0660x slower
   make-rope-cse                                      3.5614+-0.1117            3.4460+-0.0686          might be 1.0335x faster
   marsaglia-larger-ints                             32.5627+-1.4280           32.3544+-1.0415        
   marsaglia-osr-entry                               21.2875+-0.9930     ?     21.5020+-1.0563        ? might be 1.0101x slower
   math-with-out-of-bounds-array-values              21.1706+-0.3917     ?     22.0400+-1.3890        ? might be 1.0411x slower
   max-boolean                                        2.9205+-0.4292            2.6804+-0.0474          might be 1.0896x faster
   method-on-number                                  16.3499+-2.2298           15.7264+-0.3308          might be 1.0396x faster
   min-boolean                                        2.7332+-0.1508            2.6893+-0.0589          might be 1.0163x faster
   minus-boolean-double                               3.0157+-0.0560     ?      3.0502+-0.0305        ? might be 1.0114x slower
   minus-boolean                                      2.4617+-0.3589            2.3305+-0.0473          might be 1.0563x faster
   misc-strict-eq                                    30.6582+-1.4321     ?     31.0740+-1.2283        ? might be 1.0136x slower
   mod-boolean-double                                10.9356+-0.0236     ?     11.3243+-0.6947        ? might be 1.0355x slower
   mod-boolean                                        8.2232+-0.0381     ?      8.3283+-0.2901        ? might be 1.0128x slower
   mul-boolean-double                                 3.5007+-0.0482     ?      3.6218+-0.1176        ? might be 1.0346x slower
   mul-boolean                                        2.7504+-0.0337     ?      2.7784+-0.0829        ? might be 1.0102x slower
   neg-boolean                                        3.0803+-0.0509            3.0783+-0.0755        
   negative-zero-divide                               0.3402+-0.0201            0.3372+-0.0260        
   negative-zero-modulo                               0.3317+-0.0181            0.3227+-0.0090          might be 1.0280x faster
   negative-zero-negate                               0.2969+-0.0079     ?      0.3120+-0.0242        ? might be 1.0508x slower
   nested-function-parsing                           43.9893+-0.7904     ?     45.3918+-1.5144        ? might be 1.0319x slower
   new-array-buffer-dead                             88.6620+-0.6082           88.2945+-0.8151        
   new-array-buffer-push                              5.9692+-0.1872     ?      6.0770+-0.1910        ? might be 1.0181x slower
   new-array-dead                                    13.7745+-0.8925     ?     14.3691+-1.5777        ? might be 1.0432x slower
   new-array-push                                     6.2713+-0.1677     ?      6.3263+-0.3784        ?
   no-inline-constructor                             30.6511+-0.5837     ?     30.6954+-0.4519        ?
   number-test                                        2.9928+-0.5127     ?      3.1079+-0.5614        ? might be 1.0384x slower
   object-closure-call                                5.5295+-1.4609            5.0150+-0.2569          might be 1.1026x faster
   object-get-own-property-symbols-on-large-array   
                                                      4.3228+-0.2826            3.9572+-0.2154          might be 1.0924x faster
   object-test                                        2.5636+-0.0436     ?      2.5942+-0.1060        ? might be 1.0119x slower
   obvious-sink-pathology-taken                      98.1758+-0.5442     ?     98.9768+-1.0129        ?
   obvious-sink-pathology                            94.4482+-0.7450           93.9732+-0.5454        
   obviously-elidable-new-object                     29.0589+-1.9294           27.9064+-1.2521          might be 1.0413x faster
   plus-boolean-arith                                 2.3861+-0.1421            2.3766+-0.1183        
   plus-boolean-double                                3.0127+-0.0298     ?      3.0751+-0.0842        ? might be 1.0207x slower
   plus-boolean                                       2.5045+-0.0434            2.4974+-0.0642        
   poly-chain-access-different-prototypes-simple   
                                                      3.1909+-0.0387     ?      3.3320+-0.2930        ? might be 1.0442x slower
   poly-chain-access-different-prototypes             2.8161+-0.0582     ?      2.9920+-0.3539        ? might be 1.0625x slower
   poly-chain-access-simpler                          3.1650+-0.0639     ?      3.4338+-0.4778        ? might be 1.0849x slower
   poly-chain-access                                  3.2872+-0.0415            3.2510+-0.0889          might be 1.0111x faster
   poly-stricteq                                     51.1250+-2.3018           50.3325+-1.0605          might be 1.0157x faster
   polymorphic-array-call                             1.2576+-0.0962            1.2429+-0.0791          might be 1.0119x faster
   polymorphic-get-by-id                              2.8027+-0.1118     ?      2.8370+-0.1066        ? might be 1.0122x slower
   polymorphic-put-by-id                             25.0592+-1.9886           25.0528+-0.9222        
   polymorphic-structure                             13.2218+-0.4555           13.0255+-0.2717          might be 1.0151x faster
   polyvariant-monomorphic-get-by-id                  6.4676+-0.6409     ?      6.5570+-0.5297        ? might be 1.0138x slower
   proto-getter-access                                8.2242+-0.6557            8.0768+-0.2657          might be 1.0182x faster
   prototype-access-with-mutating-prototype           5.1840+-0.1869     ?      5.2817+-0.2326        ? might be 1.0188x slower
   put-by-id-replace-and-transition                   8.0203+-0.9125            7.8495+-0.7383          might be 1.0218x faster
   put-by-id-slightly-polymorphic                     2.6041+-0.0615     ?      2.6045+-0.0390        ?
   put-by-id                                         10.1289+-0.5885            9.7128+-0.3381          might be 1.0428x faster
   put-by-val-direct                                  0.3269+-0.0392     ?      0.3350+-0.0427        ? might be 1.0248x slower
   put-by-val-large-index-blank-indexing-type   
                                                     12.2990+-0.5069           12.1674+-0.6289          might be 1.0108x faster
   put-by-val-machine-int                             2.4688+-0.0919     ?      2.4921+-0.0649        ?
   rare-osr-exit-on-local                            14.2319+-0.1034     ?     14.2503+-0.2043        ?
   register-pressure-from-osr                        16.2970+-0.4292           16.2575+-0.5536        
   repeat-multi-get-by-offset                        21.3699+-0.4236           21.3483+-0.4161        
   setter-prototype                                   7.4680+-0.5662            7.2887+-0.1055          might be 1.0246x faster
   setter                                             5.5324+-0.4482            5.2751+-0.2463          might be 1.0488x faster
   simple-activation-demo                            24.7349+-1.2257           24.1342+-0.3658          might be 1.0249x faster
   simple-getter-access                              10.3615+-0.3310     ?     10.5888+-0.4820        ? might be 1.0219x slower
   simple-poly-call-nested                            8.4572+-0.1856     ?      8.5163+-0.5092        ?
   simple-poly-call                                   1.2171+-0.0511     ?      1.2476+-0.0236        ? might be 1.0250x slower
   sin-boolean                                       17.6104+-0.7849     ?     18.1829+-0.5633        ? might be 1.0325x slower
   singleton-scope                                   57.3776+-1.3370           57.2948+-0.7073        
   sink-function                                      9.1988+-0.8236     ?      9.5696+-0.8152        ? might be 1.0403x slower
   sink-huge-activation                              16.3401+-0.8877     ?     16.4679+-1.0364        ?
   sinkable-new-object-dag                           56.0027+-2.0327     ?     56.4583+-1.7138        ?
   sinkable-new-object-taken                         42.8366+-0.3905           41.5056+-1.1038          might be 1.0321x faster
   sinkable-new-object                               29.6855+-0.7479           29.5966+-0.8989        
   slow-array-profile-convergence                     2.5593+-0.0700            2.4167+-0.1560          might be 1.0590x faster
   slow-convergence                                   2.3380+-0.0639     ?      2.4402+-0.2639        ? might be 1.0437x slower
   slow-ternaries                                    17.0802+-1.1567           16.3400+-0.5025          might be 1.0453x faster
   sorting-benchmark                                 16.7271+-0.7309     ?     17.2496+-1.1350        ? might be 1.0312x slower
   sparse-conditional                                 1.1231+-0.0685     ?      1.1281+-0.0984        ?
   splice-to-remove                                  12.7288+-0.4852     ?     13.4135+-0.8492        ? might be 1.0538x slower
   string-char-code-at                               14.0789+-0.2012     ?     14.5408+-0.6210        ? might be 1.0328x slower
   string-concat-object                               2.0315+-0.1222     ?      2.0942+-0.2208        ? might be 1.0308x slower
   string-concat-pair-object                          2.1005+-0.0973            2.0427+-0.0998          might be 1.0283x faster
   string-concat-pair-simple                          9.1489+-0.2608     ?      9.4525+-0.6427        ? might be 1.0332x slower
   string-concat-simple                               9.3206+-0.3588            9.1401+-0.2679          might be 1.0197x faster
   string-cons-repeat                                 6.5197+-0.3724            6.2753+-0.0930          might be 1.0389x faster
   string-cons-tower                                  6.8563+-0.2696            6.8459+-0.4448        
   string-equality                                   14.9040+-0.2879     ?     16.8920+-4.9129        ? might be 1.1334x slower
   string-get-by-val-big-char                         6.4945+-0.2081     ?      6.6141+-0.3633        ? might be 1.0184x slower
   string-get-by-val-out-of-bounds-insane             3.2140+-0.1606            3.1789+-0.1510          might be 1.0110x faster
   string-get-by-val-out-of-bounds                    4.0335+-0.0897            4.0117+-0.1012        
   string-get-by-val                                  2.7732+-0.0312     ?      2.8350+-0.1523        ? might be 1.0223x slower
   string-hash                                        1.8011+-0.0503     ?      1.8518+-0.1423        ? might be 1.0281x slower
   string-long-ident-equality                        12.5070+-0.3099           12.4930+-0.1623        
   string-out-of-bounds                              10.2957+-0.2414     ?     10.3895+-0.5651        ?
   string-repeat-arith                               28.3127+-2.4203           27.6652+-1.9970          might be 1.0234x faster
   string-sub                                        52.2406+-1.7343     ?     53.7339+-2.7587        ? might be 1.0286x slower
   string-test                                        2.7150+-0.1090            2.7052+-0.1575        
   string-var-equality                               26.0236+-0.8814           25.8988+-0.1012        
   structure-hoist-over-transitions                   2.5518+-0.3256            2.3871+-0.1026          might be 1.0690x faster
   substring-concat-weird                            36.3336+-1.0488           36.0645+-0.9566        
   substring-concat                                  40.4528+-1.1500           39.6150+-1.0643          might be 1.0211x faster
   substring                                         44.2067+-0.6149     ?     45.1627+-1.7261        ? might be 1.0216x slower
   switch-char-constant                               2.6517+-0.0763            2.6501+-0.0475        
   switch-char                                        5.3918+-0.4063     ?      6.1407+-0.7219        ? might be 1.1389x slower
   switch-constant                                    9.1096+-0.5317            8.6955+-0.7775          might be 1.0476x faster
   switch-string-basic-big-var                       15.8295+-0.6737           15.5594+-0.3867          might be 1.0174x faster
   switch-string-basic-big                           14.6224+-0.0502     ?     14.9059+-0.3785        ? might be 1.0194x slower
   switch-string-basic-var                           12.9771+-0.2716     ?     13.0095+-0.2295        ?
   switch-string-basic                               12.9408+-0.5119     ?     13.0186+-0.4379        ?
   switch-string-big-length-tower-var                18.5831+-0.7327           17.9133+-0.4198          might be 1.0374x faster
   switch-string-length-tower-var                    13.5752+-0.2019     ?     13.7209+-0.5233        ? might be 1.0107x slower
   switch-string-length-tower                        11.9496+-0.5654     ?     12.0768+-0.7585        ? might be 1.0106x slower
   switch-string-short                               11.7242+-0.4068           11.6760+-0.1826        
   switch                                            13.5850+-0.4935           13.5618+-1.2831        
   tear-off-arguments-simple                          3.0074+-0.3629     ?      3.1211+-0.6195        ? might be 1.0378x slower
   tear-off-arguments                                 3.9759+-0.5293            3.8374+-0.1186          might be 1.0361x faster
   temporal-structure                                11.8560+-0.1123     ?     12.1626+-0.4011        ? might be 1.0259x slower
   to-int32-boolean                                  12.9554+-0.1760     ?     12.9683+-0.3637        ?
   try-catch-get-by-val-cloned-arguments             15.0916+-1.5418     ?     15.8611+-2.0070        ? might be 1.0510x slower
   try-catch-get-by-val-direct-arguments              6.3882+-0.1744            6.3547+-0.1022        
   try-catch-get-by-val-scoped-arguments              7.6905+-0.8553            7.4753+-0.2672          might be 1.0288x faster
   typed-array-get-set-by-val-profiling              27.8927+-1.0758           26.7041+-0.4257          might be 1.0445x faster
   undefined-property-access                        222.5137+-2.2166     ?    224.9151+-9.8847        ? might be 1.0108x slower
   undefined-test                                     2.8036+-0.0765     ?      2.9485+-0.2998        ? might be 1.0517x slower
   unprofiled-licm                                   14.4114+-0.7128           13.8948+-0.3951          might be 1.0372x faster
   varargs-call                                      13.8079+-0.1498     ?     13.9062+-0.2306        ?
   varargs-construct-inline                          22.4495+-1.1794     ?     22.5368+-0.8581        ?
   varargs-construct                                 20.5671+-1.8610           19.7297+-0.7125          might be 1.0424x faster
   varargs-inline                                     8.3321+-0.3159     ?      8.4739+-0.4129        ? might be 1.0170x slower
   varargs-strict-mode                               10.2015+-3.0752            8.9217+-0.0706          might be 1.1434x faster
   varargs                                            9.0400+-0.1417     ?      9.2693+-0.3876        ? might be 1.0254x slower
   weird-inlining-const-prop                          2.7458+-0.1319            2.7002+-0.2400          might be 1.0169x faster

   <geometric>                                        7.7854+-0.0322            7.7793+-0.0177          might be 1.0008x faster

                                                        TipOfTree                   Locks                                       
AsmBench:
   bigfib.cpp                                       451.7339+-2.7386          449.9489+-4.0480        
   cray.c                                           396.9018+-1.6414          396.5612+-3.0379        
   dry.c                                            426.3335+-14.6658         418.8844+-17.7420         might be 1.0178x faster
   FloatMM.c                                        684.4924+-3.5659          682.9594+-4.2914        
   gcc-loops.cpp                                   3417.9097+-20.6120    ?   3422.5210+-10.5589       ?
   n-body.c                                         825.2303+-6.7249          824.3636+-6.6153        
   Quicksort.c                                      405.9477+-2.4087          403.7581+-3.2103        
   stepanov_container.cpp                          3570.5713+-53.9702    ?   3580.5184+-46.7172       ?
   Towers.c                                         234.2159+-1.0423     ?    236.5055+-1.6307        ?

   <geometric>                                      716.9863+-3.2946          715.5897+-3.1498          might be 1.0020x faster

                                                        TipOfTree                   Locks                                       
CompressionBench:
   huffman                                           60.2603+-1.2999           59.1404+-1.9187          might be 1.0189x faster
   arithmetic-simple                                269.4722+-0.9750     ?    270.8483+-1.9146        ?
   arithmetic-precise                               245.4895+-1.8101     ?    246.4168+-2.1057        ?
   arithmetic-complex-precise                       245.1629+-0.8891     !    246.9762+-0.8129        ! definitely 1.0074x slower
   arithmetic-precise-order-0                       277.8969+-1.8417     ?    280.1409+-1.1917        ?
   arithmetic-precise-order-1                       303.8276+-1.6805     ?    304.0233+-1.3715        ?
   arithmetic-precise-order-2                       358.0378+-8.1948          357.3034+-1.4944        
   arithmetic-simple-order-1                        322.2326+-1.9211     ?    322.4513+-1.4659        ?
   arithmetic-simple-order-2                        385.6448+-9.7185          384.8462+-4.3312        
   lz-string                                        308.5616+-4.8886          307.8201+-5.6033        

   <geometric>                                      254.3239+-1.1574     ?    254.3342+-0.9678        ? might be 1.0000x slower

                                                        TipOfTree                   Locks                                       
Geomean of preferred means:
   <scaled-result>                                   51.1798+-0.2764     ?     51.2575+-0.1970        ? might be 1.0015x slower
Comment 19 Filip Pizlo 2015-08-04 18:10:00 PDT
(In reply to comment #17)
> > > From what I can tell, Fil's implementation does not support priority
> > > inheritance.  If all a low priority is thread is holding the Lock/Mutex and
> > > high priority threads are waiting on the ConditionVar, the owner thread of
> > > the Lock will continue to run at a low priority relative to other high
> > > priority threads.  Does WebKit rely on priority inheritance anywhere?
> > 
> > True.  Our SpinLock also doesn't support priority inheritance, and this
> > currently replaces SpinLock.  So, we should be fine.
> 
> This line of reasoning make me like the name WTF::SystemMutex for the old
> mutex.
> 
> If you explicitly need all the system mutex behaviors (i.e. priority
> inheritance) you should use WTF::SystemMutex. If you don't (i.e. most use
> cases) then just use WTF::Mutex.

But std::mutex gives you those things as well.  Therefore, you should use std::mutex instead of WTF::Mutex.  That means that WTF::Mutex should be called WTF::DeprecatedMutex.
Comment 20 Filip Pizlo 2015-08-05 14:08:10 PDT
Created attachment 258299 [details]
the patch
Comment 21 WebKit Commit Bot 2015-08-05 14:11:23 PDT
Attachment 258299 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/Mutex.cpp:68:  Missing space before {  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 8 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 Filip Pizlo 2015-08-05 16:19:43 PDT
Created attachment 258316 [details]
the patch
Comment 23 WebKit Commit Bot 2015-08-05 16:24:32 PDT
Attachment 258316 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/Mutex.cpp:68:  Missing space before {  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 8 in 36 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 24 Filip Pizlo 2015-08-06 12:43:34 PDT
Created attachment 258385 [details]
the patch
Comment 25 WebKit Commit Bot 2015-08-06 12:46:04 PDT
Attachment 258385 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/Lock.cpp:68:  Missing space before {  [whitespace/braces] [5]
Total errors found: 9 in 44 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 Filip Pizlo 2015-08-06 13:37:28 PDT
Created attachment 258392 [details]
fix build
Comment 27 WebKit Commit Bot 2015-08-06 13:39:24 PDT
Attachment 258392 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/Lock.cpp:68:  Missing space before {  [whitespace/braces] [5]
Total errors found: 9 in 43 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 28 Geoffrey Garen 2015-08-06 14:04:28 PDT
Comment on attachment 258385 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=258385&action=review

r=me

I think this is all correct code, but I have some substantial comments below.

Please fix style warnings.

> Source/JavaScriptCore/heap/CopiedSpace.cpp:194
> -        SpinLockHolder locker(&m_toSpaceLock);
> +        LockHolder locker(&m_toSpaceLock);

It would be nice just to use std::lock_guard, and get rid of our hand-rolled holders.

> Source/WTF/wtf/Lock.cpp:137
> +            // Release the queue lock.
> +            for (;;) {
> +                currentWordValue = m_word.load();
> +                ASSERT(currentWordValue & ~mask);
> +                ASSERT(currentWordValue & lockedQueueBit);
> +                bool result = m_word.compareExchangeWeak(
> +                    currentWordValue, currentWordValue & ~lockedQueueBit);

Given that another thread holds the lock, that we hold the queue lock, and that the other thread can't drop the lock until we drop the queue lock, it should be impossible at this point for any other thread to store to m_word. Therefore, we can do a plain store here without a looping compare-and-exchange.

If my reasoning were incorrect, then it would seem that queue modification would be non-atomic, which would probably be broken (or at least scary). Therefore, I think it is clearer to remove this looping compare-and-exchange.

> Source/WTF/wtf/Lock.cpp:139
> +                if (result)
> +                    break;

I would say return here. You have to read pretty far to see what break means.

> Source/WTF/wtf/Lock.cpp:148
> +            // We need a CAS loop to install the queue head. This CAS loop is just to protect against
> +            // isHeldBit changing. We also release the queue lock here.
> +            for (;;) {

Again, I find this loop confusing, because it seems to assert that it need not loop, and I believe the assertion.

> Source/WTF/wtf/Lock.cpp:229
> +    // Change the queue head, possibly removing it if newQueueHead is null. This is a CAS loop
> +    // out of paranoia. It doesn't have to be since the state of the lock cannot change right now.
> +    // We would get no performance win from making this anything but a CAS loop, and it would make
> +    // it confusing to add more bits to the lock.
> +    for (;;) {

It helps that you explain in a comment that this loop is not needed, but I really don't like reading and debugging code that says "this code shouldn't be here" :(. I would just remove this loop, and perform a naked store.

I don't understand what paranoia in this loop buys me. I guess it ensures, at a base level, that the queueHead bits will be valid, but if another thread is illegally modifying the queue or the lock while I hold the lock and the queue lock, I don't think a coherent set of queueHead bits will save me from the other incoherencies that will result.

> Source/WTF/wtf/Lock.cpp:255
> +        std::unique_lock<std::mutex> locker(queueHead->parkingLock);

This should be std::lock_guard since it doesn't need to unlock manually. Also, you should scope the lock_guard so that you release parkingLock right before calling notify. Otherwise, you create a pathology where the awoken thread immediately re-sleeps due to contention on parkingLock.

> Source/WTF/wtf/Lock.cpp:258
> +        // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on
> +        // this condition variable.

I would call this convenience or expedience rather than paranoia. We really do need to notify all threads, since we have no API for notifying just the head of the queue.

> Source/WTF/wtf/Lock.h:136
> +    bool isHeld() const
> +    {
> +        return m_word.load(std::memory_order_acquire) & isHeldBit;
> +    }
> +
> +    bool isLocked() const
> +    {
> +        return isHeld();
> +    }

Can we use just one name for this?

> Source/WTF/wtf/Lock.h:138
> +    // Everything below here should be considered private. It's not private because of POD rules.

Can we say protected here, or perhaps private and make Lock a friend? POD structs can have private stuff.

> Source/WTF/wtf/Lock.h:141
> +    static const uintptr_t isHeldBit = 1;
> +    static const uintptr_t lockedQueueBit = 2;
> +    static const uintptr_t mask = 3;

It would be nice to have helper functions for dealing with these bits:

bool isLocked(uintptr_t);
uintptr_t lock(uintptr_t);
uintptr_t unlock(uintptr_t);

bool isQueueLocked(uintptr_t);
uintptr_t lockQueue(uintptr_t);
uintptr_t unlockQueue(uintptr_t);

ThreadData* queue(uintptr_t word);
uintptr_t setQueue(uintptr_t, ThreadData*);

In addition to being more readable at the call site, these functions can help document and verify preconditions. For example, unlock can assert that the queue is not locked, lockQueue can assert that the lock is locked, and so on.
Comment 29 Filip Pizlo 2015-08-06 14:43:41 PDT
(In reply to comment #28)
> Comment on attachment 258385 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=258385&action=review
> 
> r=me
> 
> I think this is all correct code, but I have some substantial comments below.
> 
> Please fix style warnings.

OK, but I'll leave the ones regarding lambdas alone since what the style bot thinks is the opposite of what everyone does.

> 
> > Source/JavaScriptCore/heap/CopiedSpace.cpp:194
> > -        SpinLockHolder locker(&m_toSpaceLock);
> > +        LockHolder locker(&m_toSpaceLock);
> 
> It would be nice just to use std::lock_guard, and get rid of our hand-rolled
> holders.

Maybe, but that seems like a different patch.  Also, are you advocating having to always say std::lock_guard<Lock> locker(...) instead of LockHolder locker(...)?  The latter is more concise.

> 
> > Source/WTF/wtf/Lock.cpp:137
> > +            // Release the queue lock.
> > +            for (;;) {
> > +                currentWordValue = m_word.load();
> > +                ASSERT(currentWordValue & ~mask);
> > +                ASSERT(currentWordValue & lockedQueueBit);
> > +                bool result = m_word.compareExchangeWeak(
> > +                    currentWordValue, currentWordValue & ~lockedQueueBit);
> 
> Given that another thread holds the lock, that we hold the queue lock, and
> that the other thread can't drop the lock until we drop the queue lock, it
> should be impossible at this point for any other thread to store to m_word.
> Therefore, we can do a plain store here without a looping
> compare-and-exchange.

This is true.  Using a CAS loop here makes it easier to assert that it's true.  But, I'm fine with removing the loop.

> 
> If my reasoning were incorrect, then it would seem that queue modification
> would be non-atomic, which would probably be broken (or at least scary).
> Therefore, I think it is clearer to remove this looping compare-and-exchange.
> 
> > Source/WTF/wtf/Lock.cpp:139
> > +                if (result)
> > +                    break;
> 
> I would say return here. You have to read pretty far to see what break means.

OK.

> 
> > Source/WTF/wtf/Lock.cpp:148
> > +            // We need a CAS loop to install the queue head. This CAS loop is just to protect against
> > +            // isHeldBit changing. We also release the queue lock here.
> > +            for (;;) {
> 
> Again, I find this loop confusing, because it seems to assert that it need
> not loop, and I believe the assertion.

You're right, isHeldBit cannot change here.  It's true that making this a loop makes it easier to assert things.

> 
> > Source/WTF/wtf/Lock.cpp:229
> > +    // Change the queue head, possibly removing it if newQueueHead is null. This is a CAS loop
> > +    // out of paranoia. It doesn't have to be since the state of the lock cannot change right now.
> > +    // We would get no performance win from making this anything but a CAS loop, and it would make
> > +    // it confusing to add more bits to the lock.
> > +    for (;;) {
> 
> It helps that you explain in a comment that this loop is not needed, but I
> really don't like reading and debugging code that says "this code shouldn't
> be here" :(. I would just remove this loop, and perform a naked store.

OK.

> 
> I don't understand what paranoia in this loop buys me. I guess it ensures,
> at a base level, that the queueHead bits will be valid, but if another
> thread is illegally modifying the queue or the lock while I hold the lock
> and the queue lock, I don't think a coherent set of queueHead bits will save
> me from the other incoherencies that will result.

Loading, asserting, and then CASing to change state ensures that the assert held right up until the moment when you unlock.

If you assert that something is true about something you load(), and then you do a store(), then the bad state could have arisen after the load() but before the store().

> 
> > Source/WTF/wtf/Lock.cpp:255
> > +        std::unique_lock<std::mutex> locker(queueHead->parkingLock);
> 
> This should be std::lock_guard since it doesn't need to unlock manually.
> Also, you should scope the lock_guard so that you release parkingLock right
> before calling notify. Otherwise, you create a pathology where the awoken
> thread immediately re-sleeps due to contention on parkingLock.

You're right that the style I'm using can cause this rare pathology.  My performance testing seems to imply that the pathology isn't happening, in the sense that when I have enough contention to cause parking, my locks perform as well as a WTF::Mutex - so it's unlikely that the awoken thread is paying a steeper price than just waking up out of the condition variable and then successfully acquiring the parking lock.

I'm fine with changing this, but the counterpoint here is that the style I'm using is less error-prone. It prevents a large class of bugs where someone calls notify after having changed some state but without ever holding the lock.  For example, if unlock() never acquired the parking lock, unlock() could end up setting shouldPark just after lock() checks that shouldPark is still true and just before lock() calls wait().  The way that I like to mentally steer myself away from making that mistake is to add "notify should be called while the lock is held" to my mental checklist.  That's a good way of ensuring that you don't mess up, since:

shouldPark = false;
lock {
    notify();
}

is just as correct as either of these:

lock {
    shouldPark = false;
    notify();
}

shouldPark = false;
lock { }
notify();

While this is definitely wrong:

shouldPark = false;
notify();

So, forcing the lock to be held while calling notify() is often enough to compensate for a race condition involving the state variable (in this case, shouldPark).  I'd say that's a pretty good deal.

But for this code I'm down with being precise and efficient, since it's fine to spend more time thinking about this code.  But as a general rule, I guess I'd want to be able to continue to use my "notify while locked" mental checklist item; I'd even prefer if our own API for notify ASSERTed that the lock was held.  I'm curious if the "stuck in unlock" pathology affects WebKit.  Are you bringing it up as a hypothetical or do we have experience with it?

> 
> > Source/WTF/wtf/Lock.cpp:258
> > +        // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on
> > +        // this condition variable.
> 
> I would call this convenience or expedience rather than paranoia. We really
> do need to notify all threads, since we have no API for notifying just the
> head of the queue.

No, it's paranoia.  There is at most one thread waiting on the parking condition.  Each thread has its own parking condition.  The parking condition is only waited on by the thread that owns it.  Therefore, notify_all and notify_one are exactly the same here.

I have two reasons for preferring notify_all pretty much all the time:

- If you allow yourself to use notify_one, you will probably eventually introduce a hard-to-catch deadlock due to there being more than one thread that needed to be notified.

- In my Jikes RVM days, I played around with making Java's notify() just do notifyAll() behind your back.  There was no change in performance on the benchmarks I had available.

I suspect that you could easily craft a microbenchmark that gets slower if you force yourself to use notify_all() even though you could have used notify_one().  But I've never seen notify_one() turn into a speed-up, and I have seen it turn into a deadlock.  That seems like a rotten deal to me.  For that reason, I prefer notify_all().

In this code, I'm fine with using notify_one() if you believe that the proper style is to be precise about notifications.  But as a general style, I like notify_all() better.  OTOH, in this code, it totally doesn't matter since it would be a bug if more than one thread waited on the same parking condition.

> 
> > Source/WTF/wtf/Lock.h:136
> > +    bool isHeld() const
> > +    {
> > +        return m_word.load(std::memory_order_acquire) & isHeldBit;
> > +    }
> > +
> > +    bool isLocked() const
> > +    {
> > +        return isHeld();
> > +    }
> 
> Can we use just one name for this?

I'd rather save that for another patch.  I was trying to minimize code changes due to existing inconsistency.

> 
> > Source/WTF/wtf/Lock.h:138
> > +    // Everything below here should be considered private. It's not private because of POD rules.
> 
> Can we say protected here, or perhaps private and make Lock a friend? POD
> structs can have private stuff.

The old rules definitely said otherwise, and we seem to still obey those old rules in SpinLockBase.  I guess maybe the rules changed: http://stackoverflow.com/questions/4762788/can-a-class-with-all-private-members-be-a-pod-class

So I guess in new C++, LockBase would be POD for our purposes if all data members are private, which is easy to ensure.

I'm going to try to use protected, but I might revert that if one of our various compilers doesn't like this.

> 
> > Source/WTF/wtf/Lock.h:141
> > +    static const uintptr_t isHeldBit = 1;
> > +    static const uintptr_t lockedQueueBit = 2;
> > +    static const uintptr_t mask = 3;
> 
> It would be nice to have helper functions for dealing with these bits:
> 
> bool isLocked(uintptr_t);
> uintptr_t lock(uintptr_t);
> uintptr_t unlock(uintptr_t);
> 
> bool isQueueLocked(uintptr_t);
> uintptr_t lockQueue(uintptr_t);
> uintptr_t unlockQueue(uintptr_t);
> 
> ThreadData* queue(uintptr_t word);
> uintptr_t setQueue(uintptr_t, ThreadData*);
> 
> In addition to being more readable at the call site, these functions can
> help document and verify preconditions. For example, unlock can assert that
> the queue is not locked, lockQueue can assert that the lock is locked, and
> so on.

I actually disagree with this.  To me, if I had the option of:

1) word = lock(word);
2) word |= isHeldBit;

I would take (2) any day, because the intent is just as clear as (1) and the implementation is strictly more clear.  With (1), I'd have to look to see how the act of locking was represented.
Comment 30 Filip Pizlo 2015-08-06 14:56:09 PDT
Created attachment 258402 [details]
patch for landing

This addresses all of ggaren's comments except for check-style fixes and using helpers for setting and clearing the m_word bits.
Comment 31 WebKit Commit Bot 2015-08-06 14:58:15 PDT
Attachment 258402 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WTF/wtf/Lock.cpp:69:  Missing space before {  [whitespace/braces] [5]
Total errors found: 9 in 43 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 32 Filip Pizlo 2015-08-06 15:02:31 PDT
(In reply to comment #31)
> Attachment 258402 [details] did not pass style-queue:
> 
> 
> ERROR: Source/JavaScriptCore/heap/ListableHandler.h:26:  Alphabetical
> sorting problem.  [build/include_order] [4]
> ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own
> line for function definitions.  [whitespace/braces] [4]
> ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own
> line for function definitions.  [whitespace/braces] [4]
> ERROR: Source/WebKit2/WebProcess/WebPage/EventDispatcher.h:37:  Alphabetical
> sorting problem.  [build/include_order] [4]
> ERROR: Source/WTF/wtf/MetaAllocator.h:40:  Alphabetical sorting problem. 
> [build/include_order] [4]
> ERROR: Source/JavaScriptCore/heap/CopiedSpace.h:38:  Alphabetical sorting
> problem.  [build/include_order] [4]
> ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own
> line for function definitions.  [whitespace/braces] [4]
> ERROR: Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h:34: 
> Alphabetical sorting problem.  [build/include_order] [4]
> ERROR: Source/WTF/wtf/Lock.cpp:69:  Missing space before { 
> [whitespace/braces] [5]
> Total errors found: 9 in 43 files
> 
> 
> If any of these errors are false positives, please file a bug against
> check-webkit-style.

Fixed most of this locally.  The "Place brace" and "Missing space" errors are because check-webkit-style doesn't grok lambdas.
Comment 33 Filip Pizlo 2015-08-06 15:04:06 PDT
Created attachment 258405 [details]
fix most style
Comment 34 WebKit Commit Bot 2015-08-06 15:07:29 PDT
Attachment 258405 [details] did not pass style-queue:


ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 43 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 35 Geoffrey Garen 2015-08-06 15:43:22 PDT
> > > Source/JavaScriptCore/heap/CopiedSpace.cpp:194
> > > -        SpinLockHolder locker(&m_toSpaceLock);
> > > +        LockHolder locker(&m_toSpaceLock);
> > 
> > It would be nice just to use std::lock_guard, and get rid of our hand-rolled
> > holders.
> 
> Maybe, but that seems like a different patch.  Also, are you advocating
> having to always say std::lock_guard<Lock> locker(...) instead of LockHolder
> locker(...)?  The latter is more concise.

Yeah, I guess it's a different patch.

I'm primarily saying that it's nice not to reinvent things. If we like brevity, I'd prefer to typedef std::lock_guard<Lock> to LockHolder or LockGuard rather than write a class.

The lock_guard has the slight downside that we would need to obey the C++ naming scheme.

> So, forcing the lock to be held while calling notify() is often enough to
> compensate for a race condition involving the state variable (in this case,
> shouldPark).  I'd say that's a pretty good deal.
> 
> But for this code I'm down with being precise and efficient, since it's fine
> to spend more time thinking about this code.  But as a general rule, I guess
> I'd want to be able to continue to use my "notify while locked" mental
> checklist item; I'd even prefer if our own API for notify ASSERTed that the
> lock was held.  I'm curious if the "stuck in unlock" pathology affects
> WebKit.  Are you bringing it up as a hypothetical or do we have experience
> with it?

I think you have a reasonable counterpoint for non-performance-critical code -- and, indeed, it took me a while to understand the subtlety here.

I'm bringing up lock/notify performance as a hypothetical, based on reading C++ documentation. I haven't measured a performance problem caused by lock/notify.

Perhaps we want a custom version of notify that requires a lock, and then unlocks it for you before notifying, as a kind of reminder that you should have locked before notifying, but without the performance penalty.

For my part, I find it easy enough to remember the mental rule "unlock right before notify" -- which implies the same safety as "unlock right after notify", since both require having previously locked. Maybe we should bring this up on webkit-dev to see how other people feel.

> > > Source/WTF/wtf/Lock.cpp:258
> > > +        // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on
> > > +        // this condition variable.
> > 
> > I would call this convenience or expedience rather than paranoia. We really
> > do need to notify all threads, since we have no API for notifying just the
> > head of the queue.
> 
> No, it's paranoia.  There is at most one thread waiting on the parking
> condition.  Each thread has its own parking condition.  

Oh, I see -- because we guard the parking condition with the parking mutex.

> The parking
> condition is only waited on by the thread that owns it.  Therefore,
> notify_all and notify_one are exactly the same here.
> 
> I have two reasons for preferring notify_all pretty much all the time:
> 
> - If you allow yourself to use notify_one, you will probably eventually
> introduce a hard-to-catch deadlock due to there being more than one thread
> that needed to be notified.
> 
> - In my Jikes RVM days, I played around with making Java's notify() just do
> notifyAll() behind your back.  There was no change in performance on the
> benchmarks I had available.
> 
> I suspect that you could easily craft a microbenchmark that gets slower if
> you force yourself to use notify_all() even though you could have used
> notify_one().  But I've never seen notify_one() turn into a speed-up, and I
> have seen it turn into a deadlock.  That seems like a rotten deal to me. 
> For that reason, I prefer notify_all().
> 
> In this code, I'm fine with using notify_one() if you believe that the
> proper style is to be precise about notifications.  But as a general style,
> I like notify_all() better.  OTOH, in this code, it totally doesn't matter
> since it would be a bug if more than one thread waited on the same parking
> condition.

Hmmm... I haven't written enough wait/notify code to have a strong intuition about notify_one vs notify_all. I'm happy with either in this case.

> > > Source/WTF/wtf/Lock.h:141
> > > +    static const uintptr_t isHeldBit = 1;
> > > +    static const uintptr_t lockedQueueBit = 2;
> > > +    static const uintptr_t mask = 3;
> > 
> > It would be nice to have helper functions for dealing with these bits:
> > 
> > bool isLocked(uintptr_t);
> > uintptr_t lock(uintptr_t);
> > uintptr_t unlock(uintptr_t);
> > 
> > bool isQueueLocked(uintptr_t);
> > uintptr_t lockQueue(uintptr_t);
> > uintptr_t unlockQueue(uintptr_t);
> > 
> > ThreadData* queue(uintptr_t word);
> > uintptr_t setQueue(uintptr_t, ThreadData*);
> > 
> > In addition to being more readable at the call site, these functions can
> > help document and verify preconditions. For example, unlock can assert that
> > the queue is not locked, lockQueue can assert that the lock is locked, and
> > so on.
> 
> I actually disagree with this.  To me, if I had the option of:
> 
> 1) word = lock(word);
> 2) word |= isHeldBit;
> 
> I would take (2) any day, because the intent is just as clear as (1) and the
> implementation is strictly more clear.  With (1), I'd have to look to see
> how the act of locking was represented.

In that case, how about these name changes: 
isLockedBit
isQueueLockedBit
queueHeadMask
Comment 36 Geoffrey Garen 2015-08-06 15:51:19 PDT
> > No, it's paranoia.  There is at most one thread waiting on the parking
> > condition.  Each thread has its own parking condition.  
> 
> Oh, I see -- because we guard the parking condition with the parking mutex.

LOL, no, that was wrong -- it's because there's one condition variable per thread!

I guess in the case of one condition variable per thread, you'd be taking the "always notify_all()" idiom to a bit of an extreme, but I still don't have a strong negative feeling about it.
Comment 37 Filip Pizlo 2015-08-06 15:52:30 PDT
(In reply to comment #35)
> > > > Source/JavaScriptCore/heap/CopiedSpace.cpp:194
> > > > -        SpinLockHolder locker(&m_toSpaceLock);
> > > > +        LockHolder locker(&m_toSpaceLock);
> > > 
> > > It would be nice just to use std::lock_guard, and get rid of our hand-rolled
> > > holders.
> > 
> > Maybe, but that seems like a different patch.  Also, are you advocating
> > having to always say std::lock_guard<Lock> locker(...) instead of LockHolder
> > locker(...)?  The latter is more concise.
> 
> Yeah, I guess it's a different patch.
> 
> I'm primarily saying that it's nice not to reinvent things. If we like
> brevity, I'd prefer to typedef std::lock_guard<Lock> to LockHolder or
> LockGuard rather than write a class.
> 
> The lock_guard has the slight downside that we would need to obey the C++
> naming scheme.
> 
> > So, forcing the lock to be held while calling notify() is often enough to
> > compensate for a race condition involving the state variable (in this case,
> > shouldPark).  I'd say that's a pretty good deal.
> > 
> > But for this code I'm down with being precise and efficient, since it's fine
> > to spend more time thinking about this code.  But as a general rule, I guess
> > I'd want to be able to continue to use my "notify while locked" mental
> > checklist item; I'd even prefer if our own API for notify ASSERTed that the
> > lock was held.  I'm curious if the "stuck in unlock" pathology affects
> > WebKit.  Are you bringing it up as a hypothetical or do we have experience
> > with it?
> 
> I think you have a reasonable counterpoint for non-performance-critical code
> -- and, indeed, it took me a while to understand the subtlety here.
> 
> I'm bringing up lock/notify performance as a hypothetical, based on reading
> C++ documentation. I haven't measured a performance problem caused by
> lock/notify.
> 
> Perhaps we want a custom version of notify that requires a lock, and then
> unlocks it for you before notifying, as a kind of reminder that you should
> have locked before notifying, but without the performance penalty.

How do you do this with lexically scoped locking?  The "notify while locked" rule is easier to enforce.

I guess we could have an API that does the notify as part of ~LockHolder, after it calls unlock.  That strikes me as the best: notify() takes a special LockHolder that has Vector<Condition*, 1> in it.  This can be used to both assert the safety property and ensure that the notify() comes after unlock().

> 
> For my part, I find it easy enough to remember the mental rule "unlock right
> before notify" -- which implies the same safety as "unlock right after
> notify", since both require having previously locked. Maybe we should bring
> this up on webkit-dev to see how other people feel.

I'd probably want to bring this up once I start writing a replacement for ThreadCondition/std::condition_variable using the ParkingLot (https://bugs.webkit.org/show_bug.cgi?id=147665).  The API I propose above seems the nicest, so I'll propose that.

> 
> > > > Source/WTF/wtf/Lock.cpp:258
> > > > +        // Use notify_all() out of paranoia. In reality, only the blocked thread is waiting on
> > > > +        // this condition variable.
> > > 
> > > I would call this convenience or expedience rather than paranoia. We really
> > > do need to notify all threads, since we have no API for notifying just the
> > > head of the queue.
> > 
> > No, it's paranoia.  There is at most one thread waiting on the parking
> > condition.  Each thread has its own parking condition.  
> 
> Oh, I see -- because we guard the parking condition with the parking mutex.
> 
> > The parking
> > condition is only waited on by the thread that owns it.  Therefore,
> > notify_all and notify_one are exactly the same here.
> > 
> > I have two reasons for preferring notify_all pretty much all the time:
> > 
> > - If you allow yourself to use notify_one, you will probably eventually
> > introduce a hard-to-catch deadlock due to there being more than one thread
> > that needed to be notified.
> > 
> > - In my Jikes RVM days, I played around with making Java's notify() just do
> > notifyAll() behind your back.  There was no change in performance on the
> > benchmarks I had available.
> > 
> > I suspect that you could easily craft a microbenchmark that gets slower if
> > you force yourself to use notify_all() even though you could have used
> > notify_one().  But I've never seen notify_one() turn into a speed-up, and I
> > have seen it turn into a deadlock.  That seems like a rotten deal to me. 
> > For that reason, I prefer notify_all().
> > 
> > In this code, I'm fine with using notify_one() if you believe that the
> > proper style is to be precise about notifications.  But as a general style,
> > I like notify_all() better.  OTOH, in this code, it totally doesn't matter
> > since it would be a bug if more than one thread waited on the same parking
> > condition.
> 
> Hmmm... I haven't written enough wait/notify code to have a strong intuition
> about notify_one vs notify_all. I'm happy with either in this case.
> 
> > > > Source/WTF/wtf/Lock.h:141
> > > > +    static const uintptr_t isHeldBit = 1;
> > > > +    static const uintptr_t lockedQueueBit = 2;
> > > > +    static const uintptr_t mask = 3;
> > > 
> > > It would be nice to have helper functions for dealing with these bits:
> > > 
> > > bool isLocked(uintptr_t);
> > > uintptr_t lock(uintptr_t);
> > > uintptr_t unlock(uintptr_t);
> > > 
> > > bool isQueueLocked(uintptr_t);
> > > uintptr_t lockQueue(uintptr_t);
> > > uintptr_t unlockQueue(uintptr_t);
> > > 
> > > ThreadData* queue(uintptr_t word);
> > > uintptr_t setQueue(uintptr_t, ThreadData*);
> > > 
> > > In addition to being more readable at the call site, these functions can
> > > help document and verify preconditions. For example, unlock can assert that
> > > the queue is not locked, lockQueue can assert that the lock is locked, and
> > > so on.
> > 
> > I actually disagree with this.  To me, if I had the option of:
> > 
> > 1) word = lock(word);
> > 2) word |= isHeldBit;
> > 
> > I would take (2) any day, because the intent is just as clear as (1) and the
> > implementation is strictly more clear.  With (1), I'd have to look to see
> > how the act of locking was represented.
> 
> In that case, how about these name changes: 
> isLockedBit
> isQueueLockedBit
> queueHeadMask

Yeah, those are better names.  I'll make the change.
Comment 38 Filip Pizlo 2015-08-06 15:53:42 PDT
(In reply to comment #36)
> > > No, it's paranoia.  There is at most one thread waiting on the parking
> > > condition.  Each thread has its own parking condition.  
> > 
> > Oh, I see -- because we guard the parking condition with the parking mutex.
> 
> LOL, no, that was wrong -- it's because there's one condition variable per
> thread!
> 
> I guess in the case of one condition variable per thread, you'd be taking
> the "always notify_all()" idiom to a bit of an extreme, but I still don't
> have a strong negative feeling about it.

Yup, it's an extreme alright!  I've changed the code to use notify_one().  I think that these kinds of safety idioms make more sense in higher-level code where we don't scrutinize it as much.
Comment 39 Geoffrey Garen 2015-08-06 15:58:50 PDT
> > Perhaps we want a custom version of notify that requires a lock, and then
> > unlocks it for you before notifying, as a kind of reminder that you should
> > have locked before notifying, but without the performance penalty.
> 
> How do you do this with lexically scoped locking?  The "notify while locked"
> rule is easier to enforce.

Either you manually close the scope and then notify (as you did in this patch), or the notify helper function takes an std::unique_lock, which allows it to manually unlock, and which notices at the end of scope that it has already been unlocked, and you call the helper function right before the scope closes.

That said, I agree that it can be annoying to manually close a scope early and move the notify out of it (as you did in this patch).

> I guess we could have an API that does the notify as part of ~LockHolder,
> after it calls unlock.  That strikes me as the best: notify() takes a
> special LockHolder that has Vector<Condition*, 1> in it.  This can be used
> to both assert the safety property and ensure that the notify() comes after
> unlock().

Seems pretty good to me.
Comment 40 Geoffrey Garen 2015-08-06 16:07:58 PDT
> That said, I agree that it can be annoying to manually close a scope early
> and move the notify out of it (as you did in this patch).

An argument in favor of your "hold like while notify" proposal is the degenerate case where your predicate is guarded by some other mutex or lockless design, and so you have a mutex solely for your condition variable.

In such a case, you might be tempted to do this:

Waiter:
while (!predicate()) {
    unique_lock<mutex> lock(m_conditionMutex);
    m_condition.wait(lock...);
}

Signaler:
// I happen to know that predicate() is true now, so I'll signal.
m_condition.notify_all();

But this code is incorrect! In order to make it correct, you must acquire the lock before you signal, like so:

Signaler:
{ lock_guard<mutex> lock(m_conditionMutex); }
m_condition.notify_all();

It is indeed a simpler mental model to think, "The lock guards the condition variable" -- even though it's not quite true.
Comment 41 Filip Pizlo 2015-08-06 16:59:44 PDT
(In reply to comment #40)
> > That said, I agree that it can be annoying to manually close a scope early
> > and move the notify out of it (as you did in this patch).
> 
> An argument in favor of your "hold like while notify" proposal is the
> degenerate case where your predicate is guarded by some other mutex or
> lockless design, and so you have a mutex solely for your condition variable.
> 
> In such a case, you might be tempted to do this:
> 
> Waiter:
> while (!predicate()) {
>     unique_lock<mutex> lock(m_conditionMutex);
>     m_condition.wait(lock...);
> }
> 
> Signaler:
> // I happen to know that predicate() is true now, so I'll signal.
> m_condition.notify_all();
> 
> But this code is incorrect! In order to make it correct, you must acquire
> the lock before you signal, like so:
> 
> Signaler:
> { lock_guard<mutex> lock(m_conditionMutex); }
> m_condition.notify_all();
> 
> It is indeed a simpler mental model to think, "The lock guards the condition
> variable" -- even though it's not quite true.

Interesting, it's been a while since I've seen that kind of code!  I probably got bit by this when I first learned how to threads.

Simple mental models are very powerful for condition-variable-heavy code because the performance benefits of more complex approaches are low while the risks are high.  I've yet to see a legit program (not a pathological microbenchmark) where notify_one was better than notify_all or where calling notify while locked caused a performance problem, or indeed where the speed of the wait/notify algorithm even mattered that much.  I've also yet to see an example where using one condition variable per lock ever caused a performance problem.  So, I prefer to simplify the model down to:

- 1 condition variable per lock, in the style of Monitors in Java.
- only one kind of notify: notifyAll().
- notifyAll() must be called while the lock is held.

The only problem that this could introduce is some thread waking up too soon and then realizing that it should go back to sleep.  This would be a troubling issue if we had a lot of threads - but we don't.  We rely on threads mostly for concurrency rather than parallelism.

I think a lot of why sophisticated wait/notify tricks don't pan out is that all they can do is reduce the probability of spurious thread wake-ups.  The probability of spurious thread wake-ups is low to begin with.  The cost of spurious thread wake-up is small - it's just an extra context switch.  You'd have to spuriously wake up a lot of threads for it to matter.  In WebKit, probably the maximum number of threads you could wake up with any condition variable is bounded by NUM_CPU.

Thinking more about it, I'd be *really* surprised if the notify-before-unlock issue ever arose in real code because if it did, then it would contradict the traditional greediness assumption in locking algorithms: the currently running thread has effectively higher priority than any thread we could wake up because in most schedulers, if a thread wakes up another thread, it doesn't cause an immediate context switch.  Instead, the thread that was previously running will continue to run - at full speed, on a hot cache.  The woken-up thread is at lower priority because either it: (1) is slated to run on the current CPU but cannot because the thread that did notify() is still running, (2) is slated to run on another CPU but that CPU is still doing something else, or (3) is slated to run on another CPU and that CPU is idle, and so it needs to start up.  In (3), there will be a delay since everything will be running on a cold cache.  Also, OS schedulers sometimes delay waking up a CPU to run a thread under the optimistic assumption that the current CPU will go idle before the other one will wake up.  The notion of greed is known in the literature (see for example http://janvitek.github.io/vitekj/490s12/Schedule_files/ols2002-pages-479-495.pdf, page 3) and is thought to be a key quality of barging locks.  I witnessed it when I was writing benchmarks of lock fairness, years ago - if you have many threads all doing "while (true) lock { cnt++; }" then one thread will usually spend some time repeatedly locking and unlocking the lock until it loses its timeslice, and only then will another thread have a chance at acquiring the lock. This property is what makes barging locks faster than FIFO locks - you get fewer context switches.  Because of how fundamental greed is to scheduling, it seems super unlikely that a call to notify() will cause a thread to run before you get around to calling unlock().  Most likely, the notify() will not cause an immediate context switch, and most likely, the current thread will not run out of its timeslice just as it is about to call unlock().

I want to think about this more before proposing a style change, and probably, instead of proposing a style change the best approach would be to write a patch that imposes the style globally and then prove that this patch is not a slow-down.  That, I think should be the thing we use to determine our style: the simplest and most fool-proof style that still maximizes performance.
Comment 42 Filip Pizlo 2015-08-06 17:50:34 PDT
Landed in http://trac.webkit.org/changeset/188100
Comment 43 Filip Pizlo 2015-08-06 19:38:14 PDT
I'm seeing some timeouts on the WK2 bots.  I'm investigating.
Comment 44 Filip Pizlo 2015-08-06 21:18:00 PDT
Fix some test gardening in http://trac.webkit.org/changeset/188113
Comment 45 Simon Fraser (smfr) 2015-08-06 22:46:44 PDT
Window is pretty crashy:
https://build.webkit.org/builders/Apple%20Win%207%20Debug%20%28Tests%29?numbuilds=100
Comment 46 Filip Pizlo 2015-08-06 23:16:32 PDT
(In reply to comment #45)
> Window is pretty crashy:
> https://build.webkit.org/builders/
> Apple%20Win%207%20Debug%20%28Tests%29?numbuilds=100

Oh man, Windows makes weird assertions in compare_exchange_weak and friends.  I've got a fix.
Comment 47 Filip Pizlo 2015-08-06 23:18:07 PDT
(In reply to comment #46)
> (In reply to comment #45)
> > Window is pretty crashy:
> > https://build.webkit.org/builders/
> > Apple%20Win%207%20Debug%20%28Tests%29?numbuilds=100
> 
> Oh man, Windows makes weird assertions in compare_exchange_weak and friends.
> I've got a fix.

I think this fixes it: http://trac.webkit.org/changeset/188117
Comment 48 Csaba Osztrogonác 2015-08-07 02:18:58 PDT
(In reply to comment #44)
> Fix some test gardening in http://trac.webkit.org/changeset/188113

It made ~500-1000 tests fail on ARM Linux bots. Could you check it?
See bug147776 for details.
Comment 49 Chris Dumez 2015-08-07 11:37:05 PDT
Completely broke iOS: rdar://problem/22191332
Comment 50 Chris Dumez 2015-08-07 11:52:25 PDT
(In reply to comment #49)
> Completely broke iOS: rdar://problem/22191332

Rolled out in <http://trac.webkit.org/changeset/188144>.
Comment 51 Chris Dumez 2015-08-07 11:56:49 PDT
(In reply to comment #50)
> (In reply to comment #49)
> > Completely broke iOS: rdar://problem/22191332
> 
> Rolled out in <http://trac.webkit.org/changeset/188144>.

Sorry, it looks like it messed up my commit message somehow.
Comment 52 Filip Pizlo 2015-08-07 13:05:28 PDT
(In reply to comment #51)
> (In reply to comment #50)
> > (In reply to comment #49)
> > > Completely broke iOS: rdar://problem/22191332
> > 
> > Rolled out in <http://trac.webkit.org/changeset/188144>.
> 
> Sorry, it looks like it messed up my commit message somehow.

The bug was that the unlock fast path was using a weak CAS, which could cause the unlock slow path to run even when there is nobody on the lock's queue.  But the unlock slow path assumes that once the queue lock is acquired, the queue will be non-empty - and it dereferences the queue head unconditionally.  That null dereference was crashing every time that the weak CAS failed spuriously.  This is an ARM-only issue because X86 has no notion of weak CAS; all CASes are strong and so they never fail spuriously.

I'll reland shortly with the fix.
Comment 53 Filip Pizlo 2015-08-07 13:06:07 PDT
This got rolled out so reopening.
Comment 54 Filip Pizlo 2015-08-07 13:10:45 PDT
Created attachment 258522 [details]
patch for relanding
Comment 55 WebKit Commit Bot 2015-08-07 13:13:55 PDT
Attachment 258522 [details] did not pass style-queue:


ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:64:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/benchmarks/LockSpeedTest.cpp:78:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Tools/TestWebKitAPI/Tests/WTF/Lock.cpp:48:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 3 in 44 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 56 Filip Pizlo 2015-08-07 15:40:01 PDT
Landed in http://trac.webkit.org/changeset/188169