Bug 162316 - The write barrier should be down with TSO
Summary: The write barrier should be down with TSO
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: All All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
: 162318
Depends on: 162342 162343 162354 162417 162461 162721
Blocks: 149432
Reported: 2016-09-20 13:56 PDT by Filip Pizlo
Modified: 2016-09-29 05:45 PDT
CC List: 14 users

See Also:


Attachments
work in progress (29.60 KB, patch)
2016-09-23 09:59 PDT, Filip Pizlo
no flags
TSO barrier (51.33 KB, patch)
2016-09-23 17:47 PDT, Filip Pizlo
no flags
a different approach (62.29 KB, patch)
2016-09-24 10:54 PDT, Filip Pizlo
no flags
possibly cheaper and less wrong TSO barrier (70.57 KB, patch)
2016-09-25 15:36 PDT, Filip Pizlo
no flags
the patch (80.65 KB, patch)
2016-09-25 17:17 PDT, Filip Pizlo
no flags
the patch (80.37 KB, patch)
2016-09-25 17:22 PDT, Filip Pizlo
no flags
the patch (80.38 KB, patch)
2016-09-25 17:31 PDT, Filip Pizlo
no flags
the patch (78.67 KB, patch)
2016-09-25 18:06 PDT, Filip Pizlo
no flags
the patch (78.67 KB, patch)
2016-09-25 18:11 PDT, Filip Pizlo
no flags
the patch (83.80 KB, patch)
2016-09-25 22:54 PDT, Filip Pizlo
no flags
trying to get around the regexp regression (100.44 KB, patch)
2016-09-25 23:57 PDT, Filip Pizlo
no flags
the patch (122.92 KB, patch)
2016-09-26 12:37 PDT, Filip Pizlo
no flags
the patch (123.06 KB, patch)
2016-09-26 12:49 PDT, Filip Pizlo
no flags
the patch (123.16 KB, patch)
2016-09-26 14:27 PDT, Filip Pizlo
no flags
the patch (123.16 KB, patch)
2016-09-26 14:45 PDT, Filip Pizlo
ggaren: review+
patch for landing (123.74 KB, patch)
2016-09-27 12:39 PDT, Filip Pizlo
no flags
rebased patch (123.61 KB, patch)
2016-09-28 09:51 PDT, Filip Pizlo
no flags
rebased patch (123.56 KB, patch)
2016-09-28 13:34 PDT, Filip Pizlo
no flags

Description Filip Pizlo 2016-09-20 13:56:21 PDT
Patch forthcoming.
Comment 1 Filip Pizlo 2016-09-20 13:57:24 PDT
It needs an mfence (x86) or dmb ish (ARM).  We can either make this unconditional, if it's fast enough, or we can do hacks.

I'll try unconditional first.
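Below is a minimal C++ sketch of why a store-load fence is needed. It assumes a simplified barrier that runs after the store. The cell layout, the `CellState` encoding, and the `rememberedCells` buffer are hypothetical and are not JSC's real types.

Under TSO, a store can sit in the store buffer while a later load executes. Without the fence, the following interleaving is possible:

- The mutator reads a stale white `cellState` and skips the barrier.
- The collector blackens the object and scans it, still seeing the old field value.
- The new value `v` is never marked.

A seq_cst fence on both sides (the Dekker pattern) rules that interleaving out.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical encoding for illustration only.
enum CellState : uint8_t { Black = 0, White = 1 };

struct Cell {
    std::atomic<Cell*> field{nullptr};
    std::atomic<uint8_t> cellState{White};
};

// Hypothetical slow-path buffer; single mutator assumed, so no locking here.
std::vector<Cell*> rememberedCells;

// Mutator: store first, then fence, then check whether the collector already
// blackened (and therefore may already have scanned) the object.
void storeWithBarrier(Cell* o, Cell* v) {
    o->field.store(v, std::memory_order_relaxed);
    // Store-load fence: typically mfence or a locked RMW on x86, dmb ish on ARM64.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (o->cellState.load(std::memory_order_relaxed) == Black)
        rememberedCells.push_back(o);  // collector must rescan o
}

// Collector: blacken, fence, then scan. Given both fences, either the mutator
// sees Black (and remembers o) or this load sees the mutator's new value.
Cell* collectorVisit(Cell* o) {
    o->cellState.store(Black, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return o->field.load(std::memory_order_relaxed);
}
```

The expensive part is that the fence sits on every barrier. That cost is what the following comments measure.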
Comment 2 Filip Pizlo 2016-09-20 14:19:50 PDT
Looks like putting an mfence into the barrier is an enormous slowdown.  We can't do it.
Comment 3 Filip Pizlo 2016-09-22 21:25:41 PDT
Here's what happens if we put the ortop store-load fence before every store barrier.

Biggest slowdown on any benchmark: 2x slower on richards
Biggest suite slowdown: 12% slower on Octane

It's awfully tempting to put the barrier's ortop fence behind a branch.
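"ortop" here appears to refer to the x86 idiom of a locked `or` of zero into the top of the stack. That orders earlier stores before later loads, as `mfence` does, but is often cheaper.

Below is a hedged C++ sketch of such a fence, with an AArch64 `dmb ish` variant and a portable fallback. It also shows the fence-behind-a-branch barrier being considered. `gMutatorShouldFence`, `Object`, and `slowPathCount` are hypothetical names. The sketch ignores how the collector flips the flag safely.

```cpp
#include <atomic>

// Store-load fence. On x86-64 (GCC/Clang) OR zero into the word at the top of
// the stack: the value is unchanged, but the locked RMW drains the store buffer.
inline void storeLoadFence() {
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    asm volatile("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__aarch64__)
    asm volatile("dmb ish" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical flag the collector would set while concurrent marking runs.
std::atomic<bool> gMutatorShouldFence{false};

// Hypothetical object: cellState 0 = black, 1 = white.
struct Object {
    std::atomic<int> field{0};
    std::atomic<unsigned char> cellState{1};
};

int slowPathCount = 0;  // stands in for the real barrier slow path

void storeThenBarrier(Object& o, int value) {
    o.field.store(value, std::memory_order_relaxed);
    // Only pay for the fence while marking; otherwise this is just a branch.
    if (gMutatorShouldFence.load(std::memory_order_relaxed))
        storeLoadFence();
    if (o.cellState.load(std::memory_order_relaxed) == 0)
        ++slowPathCount;
}
```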


Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206274)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206274)

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree                   Things                                      
SunSpider:
   3d-cube                                    4.8347+-0.2649     ?      4.8454+-0.2665        ?
   3d-morph                                   5.0168+-0.4898            4.8785+-0.3135          might be 1.0283x faster
   3d-raytrace                                4.8442+-0.1807     ?      4.9115+-0.2780        ? might be 1.0139x slower
   access-binary-trees                        1.9121+-0.1006     ?      2.0883+-0.0962        ? might be 1.0922x slower
   access-fannkuch                            4.8294+-0.2304            4.7858+-0.0596        
   access-nbody                               2.4893+-0.2680     !      3.4361+-0.0847        ! definitely 1.3804x slower
   access-nsieve                              3.3854+-0.3455            3.1194+-0.1590          might be 1.0853x faster
   bitops-3bit-bits-in-byte                   1.0935+-0.0915     ?      1.1083+-0.0736        ? might be 1.0135x slower
   bitops-bits-in-byte                        2.6328+-0.0724     ?      2.8829+-0.5376        ? might be 1.0950x slower
   bitops-bitwise-and                         2.0046+-0.1978            1.9007+-0.0246          might be 1.0546x faster
   bitops-nsieve-bits                         3.0671+-0.1410     ?      3.1798+-0.3074        ? might be 1.0367x slower
   controlflow-recursive                      2.2067+-0.0854     ?      2.4131+-0.2221        ? might be 1.0935x slower
   crypto-aes                                 4.2751+-0.2712            4.1564+-0.1486          might be 1.0286x faster
   crypto-md5                                 2.5822+-0.0850     ?      2.5839+-0.0848        ?
   crypto-sha1                                2.7965+-0.1831            2.7643+-0.2527          might be 1.0116x faster
   date-format-tofte                          6.7047+-0.4060     !      7.9941+-0.1243        ! definitely 1.1923x slower
   date-format-xparb                          4.5741+-0.2277            4.5505+-0.1035        
   math-cordic                                2.8048+-0.3212            2.7110+-0.0609          might be 1.0346x faster
   math-partial-sums                          3.9242+-0.3053     ?      4.2121+-0.0486        ? might be 1.0734x slower
   math-spectral-norm                         2.0118+-0.1842            1.9723+-0.0992          might be 1.0200x faster
   regexp-dna                                 6.0520+-0.1355     ?      6.3698+-0.2368        ? might be 1.0525x slower
   string-base64                              4.4180+-0.2117     ?      4.6267+-0.1075        ? might be 1.0472x slower
   string-fasta                               5.2382+-0.0949     !      5.6356+-0.1404        ! definitely 1.0759x slower
   string-tagcloud                            8.0541+-0.3432     ?      8.2793+-0.2468        ? might be 1.0280x slower
   string-unpack-code                        17.7172+-0.5782     ?     18.2439+-0.6110        ? might be 1.0297x slower
   string-validate-input                      4.2480+-0.3982     ?      4.4784+-0.0904        ? might be 1.0542x slower

   <arithmetic>                               4.3737+-0.0502     !      4.5434+-0.0438        ! definitely 1.0388x slower

                                                TipOfTree                   Things                                      
LongSpider:
   3d-cube                                  783.7228+-11.7323    !    855.6619+-10.6588       ! definitely 1.0918x slower
   3d-morph                                 568.6765+-11.3501         567.2830+-9.1173        
   3d-raytrace                              450.9052+-10.1626    ?    455.0590+-10.0853       ?
   access-binary-trees                      785.3363+-6.8210     ?    786.5133+-11.4327       ?
   access-fannkuch                          237.3588+-2.6353     ?    242.5754+-15.0498       ? might be 1.0220x slower
   access-nbody                             503.9350+-7.6098          503.7887+-5.6037        
   access-nsieve                            285.1292+-5.5067     ?    287.2666+-4.9013        ?
   bitops-3bit-bits-in-byte                  33.3164+-2.5575           31.4300+-0.5543          might be 1.0600x faster
   bitops-bits-in-byte                       93.4222+-3.6022     ?     95.9135+-5.4419        ? might be 1.0267x slower
   bitops-nsieve-bits                       379.5902+-7.2816          375.7859+-1.9959          might be 1.0101x faster
   controlflow-recursive                    441.9387+-8.8400          441.6835+-13.3431       
   crypto-aes                               528.3320+-9.1814     ?    531.1236+-7.3378        ?
   crypto-md5                               468.6188+-10.8985         466.1707+-7.9897        
   crypto-sha1                              624.2290+-5.5675     ?    638.6044+-10.4121       ? might be 1.0230x slower
   date-format-tofte                        334.0788+-6.0447     !    384.7257+-6.9758        ! definitely 1.1516x slower
   date-format-xparb                        617.1805+-9.4164     ?    653.0618+-32.5480       ? might be 1.0581x slower
   hash-map                                 139.1575+-3.9678     !    152.4310+-4.8154        ! definitely 1.0954x slower
   math-cordic                              452.1195+-9.9727     ?    455.3107+-7.3447        ?
   math-partial-sums                        285.9348+-7.1712     !    301.3704+-5.6017        ! definitely 1.0540x slower
   math-spectral-norm                       521.5851+-9.2105     ?    525.1613+-7.4278        ?
   string-base64                            442.5808+-3.0804          441.8303+-8.4247        
   string-fasta                             339.3115+-5.9100     !    369.7477+-11.3455       ! definitely 1.0897x slower
   string-tagcloud                          162.7723+-1.7795     !    170.8195+-4.0981        ! definitely 1.0494x slower

   <geometric>                              342.9567+-2.4812     !    351.8905+-1.5834        ! definitely 1.0260x slower

                                                TipOfTree                   Things                                      
Octane:
   encrypt                                   0.15084+-0.00551          0.14965+-0.00353       
   decrypt                                   2.72931+-0.03442          2.72685+-0.03630       
   deltablue                        x2       0.12096+-0.00304    !     0.15105+-0.00243       ! definitely 1.2488x slower
   earley                                    0.24457+-0.00166    !     0.26245+-0.00300       ! definitely 1.0731x slower
   boyer                                     4.37421+-0.11332    ?     4.45275+-0.12217       ? might be 1.0180x slower
   navier-stokes                    x2       4.65331+-0.03200    ?     4.68742+-0.08098       ?
   raytrace                         x2       0.67654+-0.00346    !     0.77305+-0.01442       ! definitely 1.1426x slower
   richards                         x2       0.07809+-0.00113    !     0.15307+-0.00258       ! definitely 1.9601x slower
   splay                            x2       0.31960+-0.00431    !     0.43687+-0.00644       ! definitely 1.3669x slower
   regexp                           x2      16.43119+-0.62910    !    18.14595+-0.55471       ! definitely 1.1044x slower
   pdfjs                            x2      39.15434+-0.63255    ?    39.90852+-0.85460       ? might be 1.0193x slower
   mandreel                         x2      40.01166+-0.75115    ?    40.23776+-0.38598       ?
   gbemu                            x2      29.55179+-0.39599    !    31.55389+-0.20267       ! definitely 1.0677x slower
   closure                                   0.47107+-0.00548    ?     0.47698+-0.00745       ? might be 1.0125x slower
   jquery                                    6.45894+-0.07439          6.42631+-0.05643       
   box2d                            x2       8.81335+-0.08313    !     9.69109+-0.06784       ! definitely 1.0996x slower
   zlib                             x2     343.59926+-3.70208        336.96984+-8.18467         might be 1.0197x faster
   typescript                       x2     610.15072+-15.15347   ?   627.17623+-9.47707       ? might be 1.0279x slower

   <geometric>                               4.77782+-0.01863    !     5.34574+-0.02589       ! definitely 1.1189x slower

                                                TipOfTree                   Things                                      
Kraken:
   ai-astar                                   90.767+-3.093      ?      90.862+-1.908         ?
   audio-beat-detection                       35.682+-0.328      ?      37.017+-1.870         ? might be 1.0374x slower
   audio-dft                                  95.836+-3.737             95.506+-3.921         
   audio-fft                                  30.314+-4.939             27.592+-0.170           might be 1.0986x faster
   audio-oscillator                           45.864+-2.911             44.036+-1.469           might be 1.0415x faster
   imaging-darkroom                           57.539+-1.431             57.318+-0.417         
   imaging-desaturate                         41.086+-0.372             40.929+-0.503         
   imaging-gaussian-blur                      57.484+-1.655             56.516+-2.836           might be 1.0171x faster
   json-parse-financial                       31.456+-0.733      ?      32.994+-1.105         ? might be 1.0489x slower
   json-stringify-tinderbox                   20.824+-0.872      ?      21.490+-1.169         ? might be 1.0320x slower
   stanford-crypto-aes                        34.984+-0.470      ?      36.128+-0.847         ? might be 1.0327x slower
   stanford-crypto-ccm                        35.019+-2.087             33.424+-2.722           might be 1.0477x faster
   stanford-crypto-pbkdf2                     89.472+-2.999      ?      90.649+-1.593         ? might be 1.0132x slower
   stanford-crypto-sha256-iterative           29.205+-0.226      ?      30.100+-1.818         ? might be 1.0306x slower

   <arithmetic>                               49.681+-0.293             49.612+-0.273           might be 1.0014x faster

                                                TipOfTree                   Things                                      
AsmBench:
   bigfib.cpp                               412.8313+-2.9297     ?    418.5986+-12.6721       ? might be 1.0140x slower
   cray.c                                   362.4455+-6.5672     ?    363.2441+-5.6846        ?
   dry.c                                    411.2849+-14.0614         400.3989+-12.6636         might be 1.0272x faster
   FloatMM.c                                701.0242+-32.9905         680.5246+-11.4577         might be 1.0301x faster
   gcc-loops.cpp                           3417.7974+-21.9542    ?   3420.8556+-46.6906       ?
   n-body.c                                 763.4065+-10.1276         760.3860+-12.8855       
   Quicksort.c                              376.6666+-6.1638          374.2165+-4.0621        
   stepanov_container.cpp                  3192.4034+-38.6218    ?   3253.7346+-132.5702      ? might be 1.0192x slower
   Towers.c                                 252.7060+-3.2718     ?    255.1564+-2.6056        ?

   <geometric>                              687.1447+-5.3013          685.5014+-3.4682          might be 1.0024x faster

                                                TipOfTree                   Things                                      
Geomean of preferred means:
   <scaled-result>                           47.6106+-0.1527     !     49.2801+-0.0711        ! definitely 1.0351x slower
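Each entry in the report is a mean with a 95% confidence interval. The verdict column is a ratio of means; for example, 4.5434 / 4.3737 ≈ 1.0388 for the SunSpider arithmetic mean.

Below is a hedged C++ sketch of those computations. It assumes a two-sided Student-t interval over the samples and an unweighted geometric mean for the `<geometric>` rows. The benchmark script's exact method, including how it chooses between "might be" and "definitely", is not shown in this bug and may differ.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Summary {
    double mean;
    double halfWidth95;  // reported as mean +- halfWidth95
};

// Two-sided 95% Student-t critical values for 1..10 degrees of freedom.
constexpr std::array<double, 10> kT95 = {12.706, 4.303, 3.182, 2.776, 2.571,
                                         2.447, 2.365, 2.306, 2.262, 2.228};

Summary summarize(const std::vector<double>& xs) {
    const std::size_t n = xs.size();
    if (n < 2 || n - 1 > kT95.size())
        throw std::invalid_argument("need between 2 and 11 samples");
    double sum = 0;
    for (double x : xs) sum += x;
    const double mean = sum / n;
    double squares = 0;
    for (double x : xs) squares += (x - mean) * (x - mean);
    const double stddev = std::sqrt(squares / (n - 1));  // sample standard deviation
    return {mean, kT95[n - 2] * stddev / std::sqrt(static_cast<double>(n))};
}

// Ratio > 1 means the candidate VM is slower (times are in milliseconds).
double slowdown(double baselineMean, double candidateMean) {
    return candidateMean / baselineMean;
}

double geometricMean(const std::vector<double>& xs) {
    if (xs.empty()) throw std::invalid_argument("empty input");
    double logSum = 0;
    for (double x : xs) logSum += std::log(x);
    return std::exp(logSum / xs.size());
}
```

With 6 samples per benchmark, as in these runs, the interval uses 5 degrees of freedom.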
Comment 4 Filip Pizlo 2016-09-23 09:59:59 PDT
Created attachment 289686 [details]
work in progress

I realized that I can be a lot more aggressive about removing barriers.  So I'm wiring this through the compiler now.
Comment 5 Filip Pizlo 2016-09-23 16:38:43 PDT
I'm getting some really interesting data.  Basically, just putting the barrier after the store, so that this:

    o.f = v

gets turned into this:

    o.f = v
    barrier

instead of this:

    barrier
    o.f = v

Right now, we put the barrier before the store.  Putting the barrier after the store appears to be a 10% slow-down on Octane/richards.  This causes a 0.7% slow-down on Octane.  This is an enormous slow-down from what seems like an innocent change, but it was very easy to narrow it down: putting the barriers before the stores and 

On top of this, there is a slow-down from putting in the fence.  Right now this means putting the fence behind a conditional.  So in the end we get:

    o.f = v
    if (marking) fence
    barrier

Putting this conditional fence everywhere causes another 0.9% slow-down.  Together it's a 1.6% slow-down on Octane.

So, do we really want to do it this way?  Probably not.  We probably want to find some way of doing this that does not result in any Octane slow-down.  One possible approach would be to leave the barrier before the store and force the fast path to fail when marking is enabled, for example by making it be this:

    if o->cellState > blackThreshold

where blackThreshold is a global variable that the barrier loads from rather than an inline constant.  Then, the slow path would buffer the object.  When we pop objects off the buffer, we can run the full barrier.
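Below is a hedged sketch of the alternative described above. It makes four changes:

- It keeps the check before the store.
- It compares `cellState` against a threshold loaded from a global instead of an immediate.
- The collector raises that threshold while marking, so the fast path always fails.
- The slow path only buffers the object. Draining the buffer pays for one fence and then runs the full black check on every buffered object.

The encoding (0 = black, larger = not black), `gBarrierThreshold`, and both buffers are illustrative assumptions. The collector is assumed to drain every buffer before it finishes marking.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical encoding: 0 means black; anything larger is not black.
constexpr uint8_t kBlack = 0;
constexpr uint8_t kWhite = 1;
constexpr uint8_t kBlackThreshold = 0;           // normal mode: only black cells go slow
constexpr uint8_t kTautologicalThreshold = 255;  // marking: every cell goes slow

// Loaded by every barrier instead of being an inline constant.
std::atomic<uint8_t> gBarrierThreshold{kBlackThreshold};

struct Cell {
    std::atomic<uint8_t> cellState{kWhite};
    std::atomic<Cell*> field{nullptr};
};

std::vector<Cell*> gBarrierBuffer;  // per-mutator buffer (single-threaded here)
std::vector<Cell*> gRememberedSet;  // objects the collector must rescan

void writeBarrier(Cell* o) {
    // Fast path: "if o->cellState > blackThreshold" means nothing to do.
    if (o->cellState.load(std::memory_order_relaxed) >
        gBarrierThreshold.load(std::memory_order_relaxed))
        return;
    gBarrierBuffer.push_back(o);  // slow path: just remember the object
}

void storeField(Cell* o, Cell* v) {
    writeBarrier(o);  // barrier stays before the store
    o->field.store(v, std::memory_order_relaxed);
}

// One fence orders all earlier stores before the cellState loads below,
// then the full barrier runs on each buffered object.
void drainBarrierBuffer() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    for (Cell* o : gBarrierBuffer) {
        if (o->cellState.load(std::memory_order_relaxed) == kBlack)
            gRememberedSet.push_back(o);
    }
    gBarrierBuffer.clear();
}
```

The appeal of this design is that the inline fast path never contains a fence. The fence cost is amortized over a whole buffer and is only paid while marking.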

But, I think I'll run some more tests.  It's worthwhile to get a complete performance picture, in case the other approach starts having problems, and we want to make trade-offs.  Also, I want to get a better understanding of why moving the barrier to after the store makes such a big difference.
Comment 6 Filip Pizlo 2016-09-23 16:42:05 PDT
It looks like this performance pathology only shows up in the FTL JIT.
Comment 7 Filip Pizlo 2016-09-23 16:46:54 PDT
Looks like there are some suspicious issues with register allocation in schedule() in richards.
Comment 8 Filip Pizlo 2016-09-23 16:52:58 PDT
I think I see what is going on!  The store barrier slow path looks like it clobbers the world, so it blocks B3's load elimination from working right.

That's actually an easy fix.
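Below is a deliberately tiny model of effect-based load elimination that illustrates the failure mode. It is loosely inspired by B3's abstract heap ranges but is not its real API; `HeapRange`, `Effects`, and `canReuseLoad` here are hypothetical.

- If the barrier slow-path call is summarized as writing "top", it invalidates every earlier load, so `o.f` must be reloaded after each barrier.
- If the slow path is described as writing only the barrier's own state, loads from unrelated heaps survive.

```cpp
#include <cstdint>

// Half-open range [begin, end) of abstract heap identifiers.
struct HeapRange {
    uint32_t begin;
    uint32_t end;

    static HeapRange top() { return {0, UINT32_MAX}; }

    bool overlaps(const HeapRange& other) const {
        return begin < end && other.begin < other.end &&
               begin < other.end && other.begin < end;
    }
};

// Summary of what an operation between two loads may write.
struct Effects {
    HeapRange writes;
};

// A load from `loaded` can be reused across `between` only if nothing in
// between may write an overlapping heap.
bool canReuseLoad(const HeapRange& loaded, const Effects& between) {
    return !loaded.overlaps(between.writes);
}

// Two ways of describing the barrier slow-path call.
Effects clobbersTheWorld() { return {HeapRange::top()}; }
Effects writesOnlyBarrierState(HeapRange barrierState) { return {barrierState}; }
```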
Comment 9 Filip Pizlo 2016-09-23 17:47:49 PDT
Created attachment 289728 [details]
TSO barrier

I fixed one of the perf problems.  Now I need to test all of the things.
Comment 10 Filip Pizlo 2016-09-24 10:52:40 PDT
Perf:

Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206314)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314)
    export JSC_useConcurrentBarriers=false
    export JSC_forceFencedBarrier=false
"FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=false
"FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206314)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=true

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation
for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
SunSpider:
   3d-cube                                    4.8855+-0.3437            4.7451+-0.2015     ?      4.7654+-0.0585     ?      5.0327+-0.2626        ? might be 1.0301x slower
   3d-morph                                   4.7341+-0.2022     ?      4.8443+-0.4272     ?      4.8904+-0.2973            4.6849+-0.0377          might be 1.0105x faster
   3d-raytrace                                4.8137+-0.3456     ?      4.9133+-0.3841            4.7564+-0.0779     ?      4.8594+-0.4431        ?
   access-binary-trees                        1.9301+-0.0335            1.9289+-0.0318     ?      2.1017+-0.2853            2.0540+-0.0671        ! definitely 1.0642x slower
   access-fannkuch                            4.7689+-0.0938     ?      5.0267+-0.3571            4.7893+-0.1429            4.7772+-0.0541        ?
   access-nbody                               2.3896+-0.2317     ?      2.5207+-0.5531            2.3719+-0.2189     ?      2.5851+-0.1483        ? might be 1.0818x slower
   access-nsieve                              3.1248+-0.2647     ?      3.2107+-0.1061            3.1423+-0.1661     ?      3.2327+-0.2379        ? might be 1.0345x slower
   bitops-3bit-bits-in-byte                   1.0614+-0.0761     ?      1.0975+-0.0420            1.0621+-0.0598     ?      1.0815+-0.0585        ? might be 1.0190x slower
   bitops-bits-in-byte                        2.6108+-0.0640     ?      2.6396+-0.0615     ?      2.7493+-0.0688            2.7318+-0.0443        ! definitely 1.0463x slower
   bitops-bitwise-and                         1.8669+-0.0144     ?      1.9815+-0.2187            1.9395+-0.0935     ?      1.9695+-0.1221        ? might be 1.0550x slower
   bitops-nsieve-bits                         3.0007+-0.0639     ?      3.0479+-0.1050            2.9696+-0.0364     ?      2.9954+-0.0258        
   controlflow-recursive                      2.2263+-0.0560            2.2260+-0.0224            2.2219+-0.0388     !      2.3564+-0.0475        ! definitely 1.0585x slower
   crypto-aes                                 4.2644+-0.1752            4.1656+-0.0653     ?      4.1970+-0.0317            4.1866+-0.0229          might be 1.0186x faster
   crypto-md5                                 2.6059+-0.0477            2.6038+-0.0802     ?      2.6512+-0.1150     ?      2.8451+-0.3306        ? might be 1.0918x slower
   crypto-sha1                                2.6635+-0.0809     ?      2.7002+-0.0838            2.6753+-0.0384     ?      2.7194+-0.0890        ? might be 1.0210x slower
   date-format-tofte                          6.5990+-0.1135            6.5070+-0.1606     ?      7.1180+-0.6053     !      7.9360+-0.0940        ! definitely 1.2026x slower
   date-format-xparb                          4.3823+-0.0682     ?      4.4602+-0.1820     ?      4.5015+-0.2546     ?      4.5329+-0.0812        ! definitely 1.0344x slower
   math-cordic                                2.6175+-0.0460     ?      2.7590+-0.2880            2.6494+-0.0492     ?      2.8150+-0.2690        ? might be 1.0755x slower
   math-partial-sums                          3.9207+-0.1780            3.7919+-0.0697     ?      3.8057+-0.0458            3.8049+-0.0539          might be 1.0304x faster
   math-spectral-norm                         1.9714+-0.0703            1.9311+-0.0218     ?      1.9405+-0.0335     ?      1.9581+-0.0354        
   regexp-dna                                 6.1188+-0.0952            6.0387+-0.1454     ?      6.5048+-0.6697            6.0741+-0.2002        
   string-base64                              4.3668+-0.0441     ?      4.4353+-0.1003     ?      4.6130+-0.4051     ?      4.6448+-0.0466        ! definitely 1.0637x slower
   string-fasta                               5.3062+-0.1170     ?      5.3205+-0.1398     ?      5.6424+-0.5966            5.6309+-0.0922        ! definitely 1.0612x slower
   string-tagcloud                            7.9797+-0.1514     ?      8.1622+-0.5594            7.9825+-0.1638     ?      8.7412+-0.8858        ? might be 1.0954x slower
   string-unpack-code                        19.2155+-0.8224           18.8134+-1.7017     ?     18.9829+-0.8186           18.5221+-0.5099          might be 1.0374x faster
   string-validate-input                      4.0581+-0.0559     !      4.1686+-0.0525     ?      4.1766+-0.0505     !      4.3840+-0.1073        ! definitely 1.0803x slower

   <arithmetic>                               4.3647+-0.0593     ?      4.3861+-0.1047     ?      4.4308+-0.0941     ?      4.5060+-0.0705        ! definitely 1.0324x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
LongSpider:
   3d-cube                                  790.3628+-14.4636    ?    793.4216+-9.2169          791.1963+-8.9537     !    819.6706+-9.1970        ! definitely 1.0371x slower
   3d-morph                                 560.1136+-2.6505          557.3622+-3.2403     ?    559.6033+-4.2366          556.9900+-2.2083        
   3d-raytrace                              443.4788+-5.0786     ?    445.9079+-3.1484          443.5669+-5.7357          441.3468+-1.9866        
   access-binary-trees                      776.8875+-4.6755          775.5487+-2.5747     ?    782.2760+-4.2171          777.6500+-8.6042        ?
   access-fannkuch                          241.7314+-9.6265     ^    226.0846+-3.6940     ?    233.5190+-12.2351    ?    234.5291+-10.8571         might be 1.0307x faster
   access-nbody                             497.8625+-7.9337          492.4945+-5.1109          491.0199+-1.6968     ?    491.6904+-2.4094          might be 1.0126x faster
   access-nsieve                            275.3899+-8.6057     ?    282.0843+-10.3947         279.8867+-9.4489     ?    280.7753+-7.8075        ? might be 1.0196x slower
   bitops-3bit-bits-in-byte                  31.5917+-0.5577     ?     32.1492+-1.3468           31.7826+-0.4859     ?     32.4253+-1.6604        ? might be 1.0264x slower
   bitops-bits-in-byte                       90.5475+-1.6808     ?     91.0194+-0.6494           90.2509+-0.5593     ?     90.5641+-2.5394        ?
   bitops-nsieve-bits                       373.0286+-7.8153          366.0847+-3.0362     ?    367.7728+-1.9961     ?    372.6866+-8.1861        
   controlflow-recursive                    432.2858+-4.7629     ?    436.5959+-4.9231          434.4041+-2.4137     ?    436.3765+-2.4653        ?
   crypto-aes                               527.0315+-3.5424          526.9155+-5.7155          525.7543+-3.7497     ?    526.3824+-2.8114        
   crypto-md5                               462.0101+-2.3680          461.5408+-3.2257          453.5203+-6.2378     ?    455.1294+-5.8330          might be 1.0151x faster
   crypto-sha1                              619.0815+-3.5727     ?    667.3594+-114.0927        617.0648+-6.5730          612.4217+-3.5069          might be 1.0109x faster
   date-format-tofte                        335.7357+-3.8220     ?    343.8285+-5.4068     ?    346.6802+-29.3892    ?    378.6551+-10.3137       ! definitely 1.1278x slower
   date-format-xparb                        604.3572+-3.2837     ?    615.7940+-15.3735    ?    618.9718+-38.0003    ?    625.1016+-2.5524        ! definitely 1.0343x slower
   hash-map                                 138.0494+-4.4299          134.0581+-3.8635     ?    136.2216+-2.9644     !    145.2815+-4.0119        ? might be 1.0524x slower
   math-cordic                              440.6676+-3.4782     ?    444.4087+-5.2941          442.6868+-5.2303     ?    443.8083+-3.1154        ?
   math-partial-sums                        282.3561+-4.5102     ?    282.8031+-4.0082     ?    283.3629+-3.1374     ?    283.7110+-2.5464        ?
   math-spectral-norm                       518.3130+-4.2663          514.2906+-2.3935     ?    514.5655+-2.4697     ?    514.8337+-2.1327        
   string-base64                            491.0991+-5.6191     ^    475.4124+-2.6904     ?    477.6324+-6.1856          476.0622+-3.8939        ^ definitely 1.0316x faster
   string-fasta                             332.5522+-2.8762          328.0060+-4.0587     ?    331.4103+-3.7413     !    363.7319+-2.8054        ! definitely 1.0938x slower
   string-tagcloud                          163.5163+-2.0491          163.0950+-2.3055     ?    163.6358+-2.4436     ?    167.2279+-2.8409        ? might be 1.0227x slower

   <geometric>                              340.1453+-0.6782          340.0635+-3.1573          339.4599+-1.2458     !    344.6359+-1.8147        ! definitely 1.0132x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Octane:
   encrypt                                   0.14871+-0.00281    ?     0.14898+-0.00394    ?     0.15280+-0.00594          0.14909+-0.00401       ?
   decrypt                                   2.65943+-0.02182          2.65544+-0.01654    ?     2.65951+-0.01500    ?     2.67283+-0.01927       ?
   deltablue                        x2       0.11945+-0.00161               ERROR                0.11842+-0.00251    !     0.13622+-0.00272       ! definitely 1.1404x slower
   earley                                    0.23897+-0.00092    ?     0.23939+-0.00386    ?     0.24030+-0.00175    !     0.26136+-0.00266       ! definitely 1.0937x slower
   boyer                                     4.42934+-0.09664          4.38369+-0.12234    ?     4.44571+-0.10589    ?     4.58476+-0.04672       ! definitely 1.0351x slower
   navier-stokes                    x2       4.62985+-0.04080          4.61322+-0.01365          4.60662+-0.00561    ?     4.60704+-0.05048       
   raytrace                         x2       0.66588+-0.00658    ?     0.66680+-0.00711    ?     0.66809+-0.00307    !     0.69510+-0.00904       ! definitely 1.0439x slower
   richards                         x2       0.07768+-0.00056          0.07736+-0.00135    !     0.08119+-0.00088    !     0.15372+-0.00039       ! definitely 1.9790x slower
   splay                            x2       0.31131+-0.00229          0.30998+-0.00279    ?     0.31254+-0.00342    !     0.39484+-0.00156       ! definitely 1.2683x slower
   regexp                           x2      16.56341+-0.81563    ?    17.68462+-0.36190         17.40606+-0.84950    ?    18.28816+-0.43341       ! definitely 1.1041x slower
   pdfjs                            x2      39.17220+-0.27277         38.63765+-0.46641    ?    38.79572+-0.44520    !    39.94609+-0.26545       ! definitely 1.0198x slower
   mandreel                         x2      39.32074+-0.28400    ?    39.66557+-0.32521         39.48684+-0.28639         39.37410+-0.21706       ?
   gbemu                            x2      28.85909+-0.08143         28.69655+-0.25908    ?    29.16622+-0.30302    ?    32.14633+-2.92521       ! definitely 1.1139x slower
   closure                                   0.46391+-0.00424          0.46188+-0.00404    ?     0.46291+-0.00433    ?     0.46500+-0.00235       ?
   jquery                                    6.33970+-0.04624    ?     6.36350+-0.08610          6.32101+-0.04921    ?     6.35837+-0.04818       ?
   box2d                            x2       8.66160+-0.05203          8.61802+-0.05914    !     8.72656+-0.03259    !     8.96710+-0.03578       ! definitely 1.0353x slower
   zlib                             x2     336.62914+-2.38209    ?   338.11414+-4.19177        335.33173+-11.15128   ?   339.51557+-3.51460       ?
   typescript                       x2     610.39062+-13.85539       608.59302+-9.70143        608.05721+-13.77993   !   635.74292+-10.20850      ! definitely 1.0415x slower

   <geometric>                               4.72171+-0.01352               ERROR                4.75560+-0.01561    !     5.20716+-0.03504       ! definitely 1.1028x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Kraken:
   ai-astar                                   88.745+-0.755             88.128+-1.606      ?      90.754+-4.183             88.752+-1.947         ?
   audio-beat-detection                       35.425+-0.221      ?      35.951+-1.556             35.536+-0.544      ?      35.848+-0.265         ? might be 1.0119x slower
   audio-dft                                  93.684+-1.430      ?      95.182+-2.398             95.145+-1.866             93.880+-1.503         ?
   audio-fft                                  27.499+-0.495      ?      27.692+-1.012             27.321+-0.080             27.295+-0.052         
   audio-oscillator                           43.931+-1.493             43.106+-0.156      ?      43.232+-0.398      ?      43.987+-1.585         ?
   imaging-darkroom                           57.659+-2.017      ?      57.733+-2.253      ?      57.922+-1.756             56.847+-0.177           might be 1.0143x faster
   imaging-desaturate                         41.764+-2.569             40.709+-0.322      ?      40.842+-0.481      ?      41.341+-0.765           might be 1.0102x faster
   imaging-gaussian-blur                      56.245+-1.997      ?      56.403+-1.431      ?      56.644+-3.055      ?      57.091+-2.905         ? might be 1.0150x slower
   json-parse-financial                       31.026+-0.461             30.641+-0.354      ?      31.265+-1.270      ?      34.503+-3.230         ? might be 1.1120x slower
   json-stringify-tinderbox                   20.692+-0.805      ?      21.898+-1.977             21.545+-1.670             21.280+-0.950         ? might be 1.0284x slower
   stanford-crypto-aes                        35.303+-1.064             34.637+-0.380      ?      35.438+-1.000      ?      35.551+-0.240         ?
   stanford-crypto-ccm                        32.985+-0.406      ?      33.687+-1.508             33.332+-2.713      ?      34.518+-4.394         ? might be 1.0465x slower
   stanford-crypto-pbkdf2                     89.313+-1.343      ?      89.358+-2.444      ?      89.626+-3.292             89.037+-1.876         
   stanford-crypto-sha256-iterative           28.878+-0.245      ?      30.131+-2.089             29.116+-0.280      ?      29.451+-0.801         ? might be 1.0198x slower

   <arithmetic>                               48.796+-0.401      ?      48.947+-0.607      ?      49.123+-0.338      ?      49.242+-0.401         ? might be 1.0091x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
AsmBench:
   bigfib.cpp                               407.1147+-5.3258          401.8814+-2.2765     ?    404.9376+-4.0144     ?    407.4121+-4.7777        ?
   cray.c                                   361.7572+-3.7513          359.6945+-3.0764     ?    361.8408+-2.4479          357.4190+-2.2745          might be 1.0121x faster
   dry.c                                    456.1795+-93.4875         404.4752+-11.0078    ?    431.2439+-84.5160         427.5564+-73.1661         might be 1.0669x faster
   FloatMM.c                                702.1463+-40.4966         672.1162+-6.5733     ?    684.6254+-31.0437         684.0195+-32.2109         might be 1.0265x faster
   gcc-loops.cpp                           3350.7568+-10.4230    ?   3369.8304+-17.8820        3346.1816+-10.2480    ?   3353.3887+-12.6830       ?
   n-body.c                                 748.5625+-6.1192     ?    750.1156+-3.8622          746.0982+-4.0725     ?    749.2850+-1.9416        ?
   Quicksort.c                              368.8832+-4.4032     ?    369.6404+-5.3791     ?    372.3699+-3.4186          369.6694+-6.2943        ?
   stepanov_container.cpp                  3122.4003+-27.4178    ?   3150.5615+-50.5682        3142.3111+-26.2512    ?   3150.6047+-29.8139       ?
   Towers.c                                 250.4936+-3.4886     ?    251.6215+-4.0640          248.2078+-1.4313     ?    249.3890+-2.4550        

   <geometric>                              685.9817+-14.9669         675.0062+-3.2894     ?    679.7620+-12.0832         679.2585+-11.6418         might be 1.0099x faster

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Geomean of preferred means:
   <scaled-result>                           47.2131+-0.1939                ERROR                47.3805+-0.2497     !     48.5743+-0.2830        ! definitely 1.0288x slower
Comment 11 Filip Pizlo 2016-09-24 10:54:06 PDT
Created attachment 289747
a different approach

I'm trying a different approach, where instead of emitting:

    if (needsFence)
        fence
    if (o->cellState <= threshold)
        slowPath()

I just do:

    if (o->cellState <= sneakyThreshold)
        slowPath()

Where sneakyThreshold is a global variable, not a constant. When we need fences, we can raise it to a tautological level, so the comparison always succeeds and every barrier takes the slow path.
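To make the trade-off concrete, here is a minimal C++ model of the two barrier shapes above. It is a hypothetical sketch, not JavaScriptCore code: `Cell`, `kBarrierThreshold`, `g_needsFence`, `g_sneakyThreshold`, `slowPath` and `setFencing` are illustrative names, and `std::atomic_thread_fence(std::memory_order_seq_cst)` stands in for whatever store-load fence the JIT would actually emit. The second shape keeps the fast path to one load, compare, and branch. Flipping the global threshold (an O(1) store) pushes every barrier into the slow path, which fences and then rechecks the state against the real threshold.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical cell header: a one-byte state that the barrier inspects.
struct Cell {
    std::atomic<uint8_t> cellState{0};
};

// Hypothetical: cells whose state is <= this value must take the slow path.
constexpr uint8_t kBarrierThreshold = 0;

// Shape 1 state: a separate flag says whether to fence before the check.
std::atomic<bool> g_needsFence{false};

// Shape 2 state: the threshold itself is a global. Raising it to UINT8_MAX
// makes "state <= threshold" true for every cell.
std::atomic<uint8_t> g_sneakyThreshold{kBarrierThreshold};

// Counters so the behavior can be observed.
int g_slowPathEntries = 0;
int g_remembered = 0;

void slowPath(Cell& cell) {
    ++g_slowPathEntries;
    // Shape 2 relies on this fence; for shape 1 it is merely redundant.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    // Recheck against the real threshold; only these cells get remembered.
    if (cell.cellState.load(std::memory_order_relaxed) <= kBarrierThreshold)
        ++g_remembered;
}

// Shape 1: branch on the flag, maybe fence, then compare with a constant.
void barrierWithFlag(Cell& cell) {
    if (g_needsFence.load(std::memory_order_relaxed))
        std::atomic_thread_fence(std::memory_order_seq_cst);
    if (cell.cellState.load(std::memory_order_relaxed) <= kBarrierThreshold)
        slowPath(cell);
}

// Shape 2: a single compare against the global threshold, no fence branch.
void barrierWithSneakyThreshold(Cell& cell) {
    if (cell.cellState.load(std::memory_order_relaxed)
        <= g_sneakyThreshold.load(std::memory_order_relaxed))
        slowPath(cell);
}

// Turning fences on or off is O(1): one store per global.
void setFencing(bool on) {
    g_needsFence.store(on, std::memory_order_relaxed);
    g_sneakyThreshold.store(
        static_cast<uint8_t>(on ? UINT8_MAX : kBarrierThreshold),
        std::memory_order_relaxed);
}
```

With fencing off, a cell above the threshold skips the slow path entirely. After `setFencing(true)` the same cell enters the slow path, fences, and is filtered out by the recheck.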
Comment 12 Filip Pizlo 2016-09-25 15:36:50 PDT
Created attachment 289785
possibly cheaper and less wrong TSO barrier

Still testing it.
Comment 13 Filip Pizlo 2016-09-25 16:58:01 PDT
OK, got some perf.  I'm testing these configurations:

TipOfTree = tip of tree
Things = all of the changes in this patch, but the JIT uses a store barrier without any TSO fence
FenceBarrierOff = the fence is dynamically turned off but can be turned on in O(1) time. This is the configuration that code would run in when the concurrent GC is not running.
FenceBarrierOn = the fence is dynamically turned on. This is the configuration that code would run in when the concurrent GC is running.

Summary of results:

SunSpider: FenceBarrierOff is neutral, FenceBarrierOn is 10.6% slower.
LongSpider: FenceBarrierOff is neutral, FenceBarrierOn is 3.5% slower.
Octane: FenceBarrierOff is maybe 0.43% slower, FenceBarrierOn is 14.7% slower.
Kraken: FenceBarrierOff is maybe 0.7% slower, FenceBarrierOn is 3.1% slower.
AsmBench: FenceBarrierOff is maybe 2% slower, FenceBarrierOn is neutral.
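The percentages in this summary are ratios of per-suite means from the reports below: a configuration's mean divided by the TipOfTree mean, minus one. A minimal sketch of that arithmetic, using the FenceBarrierOn means as printed (the 10.6% SunSpider figure matches the 100-sample rerun rather than the first run); `slowdownPercent`, `SuiteMeans` and `kSuites` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Percent slowdown of `candidate` relative to `baseline` for execution
// times, where lower is better. Positive means slower.
double slowdownPercent(double baseline, double candidate) {
    return (candidate / baseline - 1.0) * 100.0;
}

struct SuiteMeans {
    const char* name;
    double tipOfTree;      // ms
    double fenceBarrierOn; // ms
};

// Suite-level means copied from the reports below.
constexpr SuiteMeans kSuites[] = {
    {"SunSpider (100-sample rerun)", 4.4236, 4.8911},
    {"LongSpider", 339.2783, 351.1887},
    {"Octane", 4.72498, 5.41825},
    {"Kraken", 48.777, 50.282},
};

void printFenceBarrierOnSummary() {
    for (const SuiteMeans& suite : kSuites)
        std::printf("%s: FenceBarrierOn is %.1f%% slower\n", suite.name,
                    slowdownPercent(suite.tipOfTree, suite.fenceBarrierOn));
}
```

This prints 10.6%, 3.5%, 14.7% and 3.1%, matching the summary. For example, 4.8911 / 4.4236 is about 1.1057, which is the "definitely 1.1057x slower" entry in the SunSpider rerun.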

Note that I reran SunSpider separately because the first run gave really strange results.  Here's all the data.

Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206363)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=false
    export JSC_forceFencedBarrier=false
"FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=false
"FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=true

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation
for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
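The report does not say how its 95% intervals are derived. As an assumption only, the sketch below uses a common textbook method, a Student's t interval (mean ± t·s/√n, where s is the sample standard deviation); with 6 samples per benchmark there are 5 degrees of freedom and t ≈ 2.5706. `Interval` and `confidenceInterval95SixSamples` are hypothetical names, and run-jsc-benchmarks may compute its intervals differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Interval {
    double mean;
    double halfWidth; // printed as mean+-halfWidth
};

// Student's t interval at 95% for exactly six samples (5 degrees of freedom).
// Assumption: this may differ from how run-jsc-benchmarks computes intervals.
Interval confidenceInterval95SixSamples(const std::vector<double>& xs) {
    constexpr double kT975Df5 = 2.5706; // two-sided 95% critical value, df = 5
    assert(xs.size() == 6);

    double sum = 0;
    for (double x : xs)
        sum += x;
    const double mean = sum / xs.size();

    double squares = 0;
    for (double x : xs)
        squares += (x - mean) * (x - mean);
    const double stddev = std::sqrt(squares / (xs.size() - 1)); // sample std dev

    const double n = static_cast<double>(xs.size());
    return {mean, kT975Df5 * stddev / std::sqrt(n)};
}
```

For the hypothetical samples 1 through 6, this gives 3.5 ± 1.963.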

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
SunSpider:
   3d-cube                                    4.6873+-0.2464            4.6723+-0.2168     ?      4.8192+-0.1976     ?      5.1078+-0.1400        ! definitely 1.0897x slower
   3d-morph                                   4.7399+-0.1793            4.7051+-0.1343     ?      4.7362+-0.2140            4.6970+-0.0654        
   3d-raytrace                                4.7384+-0.2259     ?      4.7537+-0.1927     ?      4.7561+-0.1646     ?      4.8793+-0.1036        ? might be 1.0297x slower
   access-binary-trees                        1.9175+-0.0589     ?      1.9902+-0.0583            1.9586+-0.0890     !      2.2765+-0.0678        ! definitely 1.1872x slower
   access-fannkuch                            4.8763+-0.2529            4.8594+-0.3502     ?      4.8758+-0.1839            4.8537+-0.2047        
   access-nbody                               2.3629+-0.0772     ?      2.4242+-0.1369            2.3947+-0.0988     !      3.8912+-0.0881        ! definitely 1.6468x slower
   access-nsieve                              3.3562+-0.3997            3.2106+-0.0809            3.1377+-0.1225     ?      3.1865+-0.1666          might be 1.0532x faster
   bitops-3bit-bits-in-byte                   1.0755+-0.0372            1.0734+-0.0248            1.0601+-0.0196     !      1.1551+-0.0520        ? might be 1.0740x slower
   bitops-bits-in-byte                        2.6095+-0.0462     !      2.7709+-0.0739            2.7496+-0.0328     ?      2.7807+-0.0526        ! definitely 1.0656x slower
   bitops-bitwise-and                         1.8820+-0.0450     ?      2.0783+-0.3709            1.9802+-0.1080     ?      2.0349+-0.1611        ? might be 1.0813x slower
   bitops-nsieve-bits                         2.9371+-0.0603     ?      3.0051+-0.0388     ?      3.2958+-0.4755            3.0335+-0.0485        ? might be 1.0328x slower
   controlflow-recursive                      2.2253+-0.0141     ?      2.2607+-0.0217     ?      2.3227+-0.1296     ?      2.5735+-0.1562        ! definitely 1.1565x slower
   crypto-aes                                 4.1389+-0.0501     ?      4.1825+-0.1189            4.1487+-0.1162     ?      4.3077+-0.0710        ! definitely 1.0408x slower
   crypto-md5                                 2.5885+-0.0409            2.5629+-0.0412     ?      2.8102+-0.4482     ?      2.8254+-0.0653        ! definitely 1.0915x slower
   crypto-sha1                                2.6435+-0.0945     ?      2.7382+-0.2091            2.6968+-0.1981     ?      2.7872+-0.0651        ? might be 1.0544x slower
   date-format-tofte                          6.4510+-0.3090     ?      7.0450+-0.7715            6.8742+-0.5163     !      9.1250+-0.1661        ! definitely 1.4145x slower
   date-format-xparb                          4.3540+-0.1031     ?      4.4781+-0.0952            4.3957+-0.0597     !      4.8894+-0.3077        ! definitely 1.1229x slower
   math-cordic                                2.6507+-0.0538            2.6259+-0.0568     ?      2.6843+-0.1424     ?      2.7441+-0.0910        ? might be 1.0353x slower
   math-partial-sums                          3.9649+-0.2047            3.9623+-0.3220            3.8153+-0.0521     !      4.4985+-0.1553        ! definitely 1.1346x slower
   math-spectral-norm                         1.9359+-0.0580     ?      2.0052+-0.1224            1.9846+-0.0432     ?      2.0085+-0.1015        ? might be 1.0375x slower
   regexp-dna                                 6.2087+-0.1649     ?      6.9290+-0.9818            6.2836+-0.2674     ?      6.3951+-0.5061        ? might be 1.0300x slower
   string-base64                              4.3678+-0.0718     ?      4.6694+-0.3538            4.4497+-0.0878     !      4.9917+-0.1363        ! definitely 1.1428x slower
   string-fasta                               5.2428+-0.0448            5.2300+-0.0742     ?      5.3259+-0.1634     !      5.7222+-0.0634        ! definitely 1.0914x slower
   string-tagcloud                            7.9255+-0.1528     ?      8.2115+-0.5876     ?      8.5007+-0.6124     ?      8.8456+-0.3920        ! definitely 1.1161x slower
   string-unpack-code                        18.2529+-0.5251     ?     18.7590+-0.3757           18.5401+-0.6173     !     20.4280+-0.8089        ! definitely 1.1192x slower
   string-validate-input                      4.1118+-0.1167     ?      4.2775+-0.2347            4.1448+-0.0659     !      4.8023+-0.1121        ! definitely 1.1679x slower

   <arithmetic>                               4.3171+-0.0202     !      4.4415+-0.0572            4.4131+-0.0433     !      4.8016+-0.0313        ! definitely 1.1122x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
LongSpider:
   3d-cube                                  790.8875+-11.1969    ^    772.9956+-5.9998     ?    788.2658+-10.4984    !    853.4161+-12.1295       ! definitely 1.0791x slower
   3d-morph                                 563.8531+-14.6256         556.9321+-4.7208          556.1187+-3.2730          555.9510+-4.1080          might be 1.0142x faster
   3d-raytrace                              443.4629+-4.5655     ?    444.7491+-3.3469          443.6145+-3.6014     !    453.6922+-5.2530        ! definitely 1.0231x slower
   access-binary-trees                      780.3031+-4.2928     ?    783.7270+-15.1960         779.8463+-5.3910     ?    786.0037+-6.8504        ?
   access-fannkuch                          233.6416+-9.2029     ?    239.8330+-14.0239         228.8607+-8.1681     ?    230.9484+-10.9864         might be 1.0117x faster
   access-nbody                             496.3149+-9.3575          496.1256+-7.7534          491.3464+-3.2813     ?    495.5094+-3.6910        
   access-nsieve                            277.9156+-8.0497     ?    289.3253+-9.1517          283.2106+-7.9060          275.9535+-4.6446        
   bitops-3bit-bits-in-byte                  31.2857+-0.7790     ?     31.9063+-1.1519     ?     32.7069+-1.5389           31.7387+-0.6159        ? might be 1.0145x slower
   bitops-bits-in-byte                       90.3698+-1.5036     ?     90.8577+-2.0864     ?     91.5820+-2.8605           90.2810+-1.5695        
   bitops-nsieve-bits                       369.9926+-6.0209          366.3011+-4.9484     ?    371.0743+-3.2820          370.2164+-5.3564        ?
   controlflow-recursive                    433.7122+-3.5714     ?    434.0434+-1.9258     ?    435.1054+-2.1277     ?    435.8985+-2.8466        ?
   crypto-aes                               526.2620+-4.3421          524.7892+-10.0911    ?    526.5247+-2.5579     ?    530.3883+-3.2583        ?
   crypto-md5                               464.9832+-5.9635          458.1338+-6.0737     ?    458.4068+-7.5515     ?    465.0562+-12.3124       ?
   crypto-sha1                              620.5447+-8.6803     ?    621.0418+-12.7585         617.4234+-13.8790         614.0472+-3.1212          might be 1.0106x faster
   date-format-tofte                        325.0318+-3.8549     !    342.0714+-11.5045    ?    342.5265+-9.7111     !    401.3534+-3.3674        ! definitely 1.2348x slower
   date-format-xparb                        602.9838+-6.1904     ?    613.9401+-11.7384         607.4906+-2.4960     !    670.5593+-24.0152       ! definitely 1.1121x slower
   hash-map                                 140.2442+-6.7184          135.1236+-3.7929          134.2746+-3.0869     !    154.9412+-4.6041        ! definitely 1.1048x slower
   math-cordic                              446.4979+-8.5427          438.1545+-2.8312     ?    442.4903+-3.4614     ?    442.9231+-5.5418        
   math-partial-sums                        281.4023+-2.4358     ?    282.8702+-2.4484     ?    283.3649+-2.0861     !    303.6508+-2.9016        ! definitely 1.0791x slower
   math-spectral-norm                       515.4920+-4.0498     ?    519.0566+-6.6786          514.0352+-1.9308     ?    515.4438+-2.4737        
   string-base64                            489.9944+-4.9233     ^    477.8973+-3.2024          474.4139+-2.3064     ?    484.0563+-12.3825         might be 1.0123x faster
   string-fasta                             329.9166+-4.8052     ?    330.5812+-4.9300     ?    332.0518+-3.0716     !    372.3283+-7.4094        ! definitely 1.1286x slower
   string-tagcloud                          162.3110+-3.1199     ?    163.8420+-1.6578          163.6790+-2.7616     !    181.6792+-10.0646       ! definitely 1.1193x slower

   <geometric>                              339.2783+-1.1574     ?    339.9833+-1.1491          339.4282+-1.2724     !    351.1887+-0.7784        ! definitely 1.0351x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Octane:
   encrypt                                   0.15209+-0.00578          0.14743+-0.00282    ?     0.14815+-0.00272    ?     0.14910+-0.00369         might be 1.0200x faster
   decrypt                                   2.66624+-0.01183          2.65886+-0.01637    ?     2.66809+-0.01365    ?     2.68677+-0.01892       ?
   deltablue                        x2       0.11968+-0.00238          0.11904+-0.00267    ?     0.11941+-0.00167    !     0.14205+-0.00093       ! definitely 1.1869x slower
   earley                                    0.23867+-0.00097          0.23807+-0.00244    ?     0.24054+-0.00251    !     0.26432+-0.00188       ! definitely 1.1075x slower
   boyer                                     4.38041+-0.10067    ?     4.40628+-0.14514    ?     4.48629+-0.07904    ?     4.54969+-0.07993       ? might be 1.0386x slower
   navier-stokes                    x2       4.60619+-0.01825    ?     4.61248+-0.03914    ?     4.61372+-0.02039          4.60672+-0.02478       ?
   raytrace                         x2       0.66279+-0.00402          0.66272+-0.00781    ?     0.66651+-0.00862    !     0.70993+-0.00683       ! definitely 1.0711x slower
   richards                         x2       0.07788+-0.00155          0.07699+-0.00094    ?     0.07883+-0.00192    !     0.16022+-0.00095       ! definitely 2.0571x slower
   splay                            x2       0.31497+-0.00953    ?     0.31564+-0.00731          0.31335+-0.01110    !     0.40770+-0.00237       ! definitely 1.2944x slower
   regexp                           x2      16.44196+-0.84384    ?    16.84465+-0.81112         16.63412+-1.07471    !    22.06028+-0.20795       ! definitely 1.3417x slower
   pdfjs                            x2      38.99352+-0.45848         38.65428+-0.61341    ?    39.13356+-0.25845    !    40.73753+-0.33209       ! definitely 1.0447x slower
   mandreel                         x2      39.76992+-0.30873         39.57649+-0.49918         39.54078+-0.20670    ?    39.74957+-0.35685       
   gbemu                            x2      28.83507+-0.18851    ?    29.24355+-0.51371         28.96884+-0.13519    !    34.58463+-0.25396       ! definitely 1.1994x slower
   closure                                   0.46649+-0.00371    ?     0.46776+-0.00425          0.46674+-0.00259    !     0.47910+-0.00387       ! definitely 1.0270x slower
   jquery                                    6.39595+-0.03250          6.39522+-0.03984    ?     6.40099+-0.05028    ?     6.46304+-0.04862       ? might be 1.0105x slower
   box2d                            x2       8.82091+-0.30222          8.72143+-0.09720    ?     8.82028+-0.10147    !     9.90167+-0.05397       ! definitely 1.1225x slower
   zlib                             x2     335.06905+-9.72222    ?   335.59928+-11.71526   ?   340.75012+-3.20133        339.90800+-3.43031       ? might be 1.0144x slower
   typescript                       x2     598.06108+-18.86401   ?   606.66683+-4.50645    ?   609.35295+-9.43489    !   661.13749+-12.64584      ! definitely 1.1055x slower

   <geometric>                               4.72498+-0.03139    ?     4.72573+-0.03047    ?     4.74564+-0.02170    !     5.41825+-0.01486       ! definitely 1.1467x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Kraken:
   ai-astar                                   88.782+-2.703      ?      90.374+-4.480             89.317+-1.380             89.230+-1.301         ?
   audio-beat-detection                       35.735+-0.641      ?      36.148+-1.872             35.479+-0.298      !      35.973+-0.185         ?
   audio-dft                                  94.264+-1.122      ?      96.113+-3.379             93.799+-1.145      ?      96.438+-2.629         ? might be 1.0231x slower
   audio-fft                                  27.350+-0.079      ?      27.380+-0.078      ?      27.826+-1.384             27.361+-0.065         ?
   audio-oscillator                           44.081+-1.633             43.626+-0.269      ?      43.694+-0.863             43.495+-0.584           might be 1.0135x faster
   imaging-darkroom                           56.776+-0.209             56.742+-0.115      ?      57.027+-0.822      ?      58.320+-2.634         ? might be 1.0272x slower
   imaging-desaturate                         41.200+-0.607             40.952+-0.514      ?      44.124+-3.302             41.717+-1.615         ? might be 1.0126x slower
   imaging-gaussian-blur                      55.944+-1.432      ?      56.385+-2.696      ?      57.228+-2.885             57.151+-0.780         ? might be 1.0216x slower
   json-parse-financial                       31.049+-0.741      ?      32.910+-1.712             31.516+-0.293      !      38.881+-0.168         ! definitely 1.2522x slower
   json-stringify-tinderbox                   21.655+-1.634      ?      22.725+-2.107             21.565+-1.798      ?      22.370+-2.113         ? might be 1.0330x slower
   stanford-crypto-aes                        34.807+-0.334             34.550+-0.242      ?      35.120+-0.539      !      37.516+-0.371         ! definitely 1.0778x slower
   stanford-crypto-ccm                        32.937+-0.826             32.452+-1.573      ?      33.040+-2.156      ?      34.621+-2.412         ? might be 1.0512x slower
   stanford-crypto-pbkdf2                     88.481+-1.268      ?      89.278+-1.636             88.951+-1.689      ?      91.512+-1.811         ? might be 1.0343x slower
   stanford-crypto-sha256-iterative           29.815+-2.254             29.390+-0.443             29.191+-0.386      ?      29.367+-0.308           might be 1.0153x faster

   <arithmetic>                               48.777+-0.489      ?      49.216+-0.809             49.134+-0.225      !      50.282+-0.259         ! definitely 1.0309x slower

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
AsmBench:
   bigfib.cpp                               405.6804+-2.9668     ?    408.3448+-7.4028          408.0582+-7.0837          404.1593+-1.2290        
   cray.c                                   359.2984+-1.4593     ?    360.3912+-4.6435     ?    361.3141+-4.2090          360.1034+-2.2630        ?
   dry.c                                    429.1470+-73.7268         423.8395+-61.3341    ?    492.2854+-108.5908        399.6511+-8.6202          might be 1.0738x faster
   FloatMM.c                                666.9849+-26.2727    ?    706.7268+-46.6179         689.2771+-33.2839         686.3002+-30.3009       ? might be 1.0290x slower
   gcc-loops.cpp                           3356.6525+-9.4942     ?   3369.9730+-11.1756        3362.1044+-8.2965         3359.0931+-15.4549       ?
   n-body.c                                 748.1152+-5.8446     ?    749.5008+-4.2718     ?    749.9667+-7.4312     ?    750.1257+-1.8834        ?
   Quicksort.c                              370.8504+-4.4654     ?    373.2508+-3.5937     ?    375.8099+-5.5369          375.5684+-6.9777        ? might be 1.0127x slower
   stepanov_container.cpp                  3124.3299+-15.4316    ?   3131.8535+-16.2223    ?   3135.3221+-13.0599    ?   3163.1947+-26.3912       ? might be 1.0124x slower
   Towers.c                                 248.8219+-2.2847     ?    251.0608+-4.2427          250.3250+-3.9612     ?    250.6464+-2.3194        ?

   <geometric>                              677.1430+-13.1909    ?    683.2071+-14.3882    ?    692.3095+-15.5077         677.1201+-4.4223          might be 1.0000x faster

                                                TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree
Geomean of preferred means:
   <scaled-result>                           46.9670+-0.1595     !     47.4242+-0.2858     ?     47.4981+-0.2041     !     49.9539+-0.1350        ! definitely 1.0636x slower

Benchmark report for SunSpider on murderface (MacBookPro11,5).

VMs tested:
"TipOfTree" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (r206363)
"Things" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=false
    export JSC_forceFencedBarrier=false
"FenceBarrierOff" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=false
"FenceBarrierOn" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (r206363)
    export JSC_useConcurrentBarriers=true
    export JSC_forceFencedBarrier=true

Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration
per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                                  TipOfTree                   Things               FenceBarrierOff            FenceBarrierOn        FenceBarrierOn v. TipOfTree

3d-cube                         4.7798+-0.0733            4.7731+-0.0776     ?      4.7969+-0.0682     !      5.1575+-0.0735        ! definitely 1.0790x slower
3d-morph                        4.9877+-0.1557            4.8620+-0.0797            4.8079+-0.0678     ?      4.8669+-0.0731          might be 1.0248x faster
3d-raytrace                     4.7818+-0.0820     ?      4.8683+-0.1002            4.7576+-0.0745     !      5.1002+-0.1022        ! definitely 1.0666x slower
access-binary-trees             2.0373+-0.0690            2.0250+-0.0641     ?      2.0319+-0.0647     !      2.3336+-0.0758        ! definitely 1.1454x slower
access-fannkuch                 5.0868+-0.0936            5.0123+-0.0779     ?      5.0268+-0.0929     ?      5.1237+-0.0872        ?
access-nbody                    2.4235+-0.0530            2.4070+-0.0554     ?      2.4074+-0.0393     !      4.0332+-0.0752        ! definitely 1.6642x slower
access-nsieve                   3.3544+-0.1123            3.2524+-0.0728     ?      3.2646+-0.0741     ?      3.2886+-0.0592          might be 1.0200x faster
bitops-3bit-bits-in-byte        1.1191+-0.0478            1.0958+-0.0320            1.0884+-0.0286     !      1.1667+-0.0302        ? might be 1.0426x slower
bitops-bits-in-byte             2.6866+-0.0639     !      2.8382+-0.0810            2.8290+-0.0718     ?      2.8594+-0.0656        ! definitely 1.0643x slower
bitops-bitwise-and              2.0123+-0.0615     ?      2.0130+-0.0460     ?      2.0272+-0.0553     ?      2.0289+-0.0352        ?
bitops-nsieve-bits              3.0680+-0.0560     ?      3.1273+-0.0764            3.0812+-0.0607     !      3.2384+-0.0794        ! definitely 1.0555x slower
controlflow-recursive           2.3504+-0.0681            2.3493+-0.0660            2.2999+-0.0489     !      2.5551+-0.0394        ! definitely 1.0871x slower
crypto-aes                      4.2491+-0.0804     ?      4.3124+-0.1037            4.3036+-0.0704     !      4.4892+-0.1032        ! definitely 1.0565x slower
crypto-md5                      2.6509+-0.0573     ?      2.6630+-0.0657     ?      2.6635+-0.0614     !      2.9365+-0.0844        ! definitely 1.1077x slower
crypto-sha1                     2.7601+-0.0670            2.6642+-0.0375     ?      2.7612+-0.0703     ?      2.8800+-0.0746        ? might be 1.0434x slower
date-format-tofte               6.5826+-0.1182     ?      6.7540+-0.1027            6.6871+-0.0935     !      9.3928+-0.1566        ! definitely 1.4269x slower
date-format-xparb               4.4663+-0.0830     ?      4.4911+-0.0612            4.4730+-0.0506     !      4.9996+-0.0873        ! definitely 1.1194x slower
math-cordic                     2.6905+-0.0409     ?      2.7427+-0.0657            2.7222+-0.0514     ?      2.7710+-0.0406        ? might be 1.0299x slower
math-partial-sums               4.0119+-0.0692            3.9269+-0.0461     ?      4.0258+-0.0746     !      4.6097+-0.0818        ! definitely 1.1490x slower
math-spectral-norm              1.9718+-0.0325     ?      2.0352+-0.0585     ?      2.0393+-0.0652            2.0289+-0.0404        ? might be 1.0289x slower
regexp-dna                      6.3970+-0.1175            6.3011+-0.0780     ?      6.3835+-0.1095     ?      6.4992+-0.1180        ? might be 1.0160x slower
string-base64                   4.4889+-0.0816     ?      4.5403+-0.0753     ?      4.5546+-0.0830     !      5.1134+-0.0853        ! definitely 1.1391x slower
string-fasta                    5.4806+-0.0922            5.4062+-0.1009     ?      5.4205+-0.0723     !      5.8931+-0.0713        ! definitely 1.0752x slower
string-tagcloud                 8.2230+-0.0988     ?      8.2351+-0.0956            8.2001+-0.0865     !      9.0668+-0.1247        ! definitely 1.1026x slower
string-unpack-code             18.1459+-0.1789     ?     18.2796+-0.2175           18.2104+-0.1930     !     19.8778+-0.2530        ! definitely 1.0954x slower
string-validate-input           4.2079+-0.0748     ?      4.2927+-0.0884            4.2476+-0.0599     !      4.8576+-0.0471        ! definitely 1.1544x slower

<arithmetic>                    4.4236+-0.0190     ?      4.4334+-0.0149            4.4273+-0.0162     !      4.8911+-0.0188        ! definitely 1.1057x slower
Comment 14 Filip Pizlo 2016-09-25 17:00:32 PDT
*** Bug 162318 has been marked as a duplicate of this bug. ***
Comment 15 Filip Pizlo 2016-09-25 17:17:07 PDT
Created attachment 289786
the patch
Comment 16 WebKit Commit Bot 2016-09-25 17:19:49 PDT
Attachment 289786 did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 48 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 17 Filip Pizlo 2016-09-25 17:20:50 PDT
Comment on attachment 289786
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289786&action=review

> Source/JavaScriptCore/ChangeLog:43
> +        store-load fence on any kind before store barriers, because that causes enormous slow

*of any kind

> Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.h:87
> +// FIXME: If we sink barriers, we need to make sure that we execute barriers upon OSR exit.

I fixed this with the mayExit thing.
Comment 18 Filip Pizlo 2016-09-25 17:22:23 PDT
Created attachment 289787
the patch

Fixes some build issues and small goofs.
Comment 19 WebKit Commit Bot 2016-09-25 17:25:05 PDT
Attachment 289787 did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 48 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 20 Filip Pizlo 2016-09-25 17:31:26 PDT
Created attachment 289788
the patch

More build fixes!
Comment 21 WebKit Commit Bot 2016-09-25 17:34:13 PDT
Attachment 289788 did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 48 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 Filip Pizlo 2016-09-25 18:06:46 PDT
Created attachment 289789
the patch

Turns out that none of my ARM assembler changes were necessary.
Comment 23 WebKit Commit Bot 2016-09-25 18:08:03 PDT
Attachment 289789 did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 46 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 24 Filip Pizlo 2016-09-25 18:11:48 PDT
Created attachment 289790 [details]
the patch
Comment 25 WebKit Commit Bot 2016-09-25 18:13:06 PDT
Attachment 289790 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:128:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 46 files
Comment 26 Filip Pizlo 2016-09-25 19:13:57 PDT
This looks like a 0.7% slow-down on JetStream with 91% confidence. I think I know why, so I'm going to try things.
Comment 27 Filip Pizlo 2016-09-25 22:30:48 PDT
It looks like a 0.68% slow-down on JetStream with 90.4% confidence.  I analyzed the per-benchmark data and found that the following benchmarks regressed:

- richards, by about 4%
- deltablue, by about 3%
- splay, by about 2%
- regexp-2010, by about 5%

Out of these, the regexp regression seems the most pronounced.  Also, it's the only one confirmed by run-jsc-benchmarks.

If the other benchmarks become bottlenecks, then I'm starting to think that maybe, for super hot code, we should emit a patchable nop sled where the fence would go, blast fences over it at the start of GC, and reset it back to nops at the end.  This would lead to amazing perf, provided that we could zap those fences quickly enough.

So, how quickly can we modify such code?  I'll look into all of that after I fix regexp.
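The patchable-fence idea can be modeled without touching real executable memory. This is a hypothetical illustration, not anything in the patch: it treats JIT code as a byte buffer, records where 3-byte fence slots were emitted, and toggles each slot between the x86 3-byte NOP (0F 1F 00) and MFENCE (0F AE F0). The byte patterns are the standard x86 encodings; the names PatchableFences, gcStarted, and gcFinished are made up for the sketch. A real version would also need writable code pages, safe patching while other threads run the code, and cache flushing on non-x86 targets, which is exactly the "how quickly can we modify such code" question.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Insn3 = std::array<std::uint8_t, 3>;

constexpr Insn3 x86Nop3 = {0x0F, 0x1F, 0x00};   // 3-byte NOP: nop dword ptr [rax]
constexpr Insn3 x86Mfence = {0x0F, 0xAE, 0xF0}; // MFENCE (store-load fence)

// Hypothetical model: a byte buffer standing in for JIT code, plus the
// offsets at which fence-sized NOP sleds were emitted.
template<std::size_t CodeSize, std::size_t SlotCount>
struct PatchableFences {
    std::array<std::uint8_t, CodeSize> code {};
    std::array<std::size_t, SlotCount> slots {};

    // Overwrite every recorded slot with the given 3-byte instruction.
    constexpr void fill(const Insn3& insn)
    {
        for (std::size_t slot : slots) {
            for (std::size_t i = 0; i < insn.size(); ++i)
                code[slot + i] = insn[i];
        }
    }

    constexpr void gcStarted() { fill(x86Mfence); } // barriers need fences now
    constexpr void gcFinished() { fill(x86Nop3); }  // back to the cheap sled

    // True if slot number 'index' currently contains exactly 'insn'.
    constexpr bool slotHolds(std::size_t index, const Insn3& insn) const
    {
        for (std::size_t i = 0; i < insn.size(); ++i) {
            if (code[slots[index] + i] != insn[i])
                return false;
        }
        return true;
    }
};
```

In the model, flipping every fence is just a loop over recorded offsets; the real cost lives in everything the model leaves out.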
Comment 28 Filip Pizlo 2016-09-25 22:54:28 PDT
Created attachment 289799 [details]
the patch

The JetStream regression is small, but I'm making it smaller!
Comment 29 WebKit Commit Bot 2016-09-25 22:57:21 PDT
Attachment 289799 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 48 files
Comment 30 Filip Pizlo 2016-09-25 23:57:52 PDT
Created attachment 289800 [details]
trying to get around the regexp regression

I'm still trying to figure out how best to do it.
Comment 31 Filip Pizlo 2016-09-26 12:37:55 PDT
Created attachment 289846 [details]
the patch

More performant version.
Comment 32 WebKit Commit Bot 2016-09-26 12:39:40 PDT
Attachment 289846 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 61 files
Comment 33 Filip Pizlo 2016-09-26 12:49:56 PDT
Created attachment 289847 [details]
the patch

Fixing debug build.
Comment 34 WebKit Commit Bot 2016-09-26 12:51:15 PDT
Attachment 289847 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 61 files
Comment 35 Filip Pizlo 2016-09-26 14:27:19 PDT
Created attachment 289869 [details]
the patch

Attempting to fix Windows.
Comment 36 WebKit Commit Bot 2016-09-26 14:28:43 PDT
Attachment 289869 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 61 files
Comment 37 Filip Pizlo 2016-09-26 14:45:30 PDT
Created attachment 289875 [details]
the patch

Trying to get Windows to work again.
Comment 38 WebKit Commit Bot 2016-09-26 14:48:39 PDT
Attachment 289875 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 61 files
Comment 39 Geoffrey Garen 2016-09-26 14:59:36 PDT
r=me

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> +            m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));

I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or "magic". Usually, algorithms do something explainable, and we should strive to explain them.

How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?

Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)

Or "jumpIfNotCollectingAndIsRememberedOrInEden".

> Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> +            // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that

to barrier that value?

> Source/JavaScriptCore/heap/CellState.h:58
> +inline bool isWithinThreshold(CellState cellState, unsigned threshold)

Maybe isBelowThreshold?

> Source/JavaScriptCore/heap/DeferralContext.h:33
> +class DeferralContext {

How about GCDeferralContext or DeferGCContext (to match DeferGC)?

> Source/JavaScriptCore/heap/Heap.cpp:1555
> +        // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from

is a

I think you meant that the sneakyBlackThreshold *could be* the tautological threshold -- not that it is guaranteed to be.

> Source/JavaScriptCore/heap/Heap.h:269
> +    unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> +    const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }

Let's just call this blackThreshold.
Comment 40 Geoffrey Garen 2016-09-26 14:59:49 PDT
Comment on attachment 289875 [details]
the patch

r=me
Comment 41 Filip Pizlo 2016-09-26 15:13:11 PDT
(In reply to comment #39)
> r=me
> 
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> > +            m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));
> 
> I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or
> "magic". Usually, algorithms do something explainable, and we should strive
> to explain them.
> 
> How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
> 
> Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
> 
> Or "jumpIfNotCollectingAndIsRememberedOrInEden".

Those names have so many words!

How about I use these names:

sneakyJumpIfIsRememberedOrInEden -> barrierJump
jumpIfIsRememberedOrInEden -> barrierJumpWithFence
jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

> 
> > Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> > +            // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that
> 
> to barrier that value?

Yes.

> 
> > Source/JavaScriptCore/heap/CellState.h:58
> > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
> 
> Maybe isBelowThreshold?

But it's not below threshold.  Below means "<".  It's below or equal to the threshold.  Rather than using BelowOrEqual, I thought "Within" was shorter.
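A minimal sketch of the predicate being named, to show why "Within" rather than "Below". It assumes CellState is a small unsigned enum compared numerically against the threshold; the enumerator names and values are hypothetical, not copied from heap/CellState.h, and isStrictlyBelowThreshold exists only for contrast.

```cpp
#include <cstdint>

// Hypothetical cell-state encoding; the real enumerators and values in
// heap/CellState.h may differ. Only the numeric ordering matters here.
enum class CellState : std::uint8_t {
    OldBlack = 0, // assumed: old cell that a store barrier may need to remember
    NewWhite = 1, // assumed: eden cell, barrier can be skipped
    OldGrey = 2,  // assumed: already remembered, barrier can be skipped
};

// "Within" means below OR EQUAL to the threshold (<=), which is why
// "isBelowThreshold", suggesting a strict <, was pushed back on.
constexpr bool isWithinThreshold(CellState cellState, unsigned threshold)
{
    return static_cast<unsigned>(cellState) <= threshold;
}

// For contrast only: a strict "below" test.
constexpr bool isStrictlyBelowThreshold(CellState cellState, unsigned threshold)
{
    return static_cast<unsigned>(cellState) < threshold;
}
```

The two predicates disagree only when the cell state equals the threshold, which is precisely the case the name has to cover.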

> 
> > Source/JavaScriptCore/heap/DeferralContext.h:33
> > +class DeferralContext {
> 
> How about GCDeferralContext or DeferGCContext (to match DeferGC)?

OK.

> 
> > Source/JavaScriptCore/heap/Heap.cpp:1555
> > +        // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from
> 
> is a
> 
> I think you meant that the sneakyBlackThreshold *could be* the tautological
> threshold -- not that it is guaranteed to be.

It is guaranteed to be if barrierShouldBeFenced() is true, which is the condition that guards this comment.
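To illustrate why the threshold is guaranteed to be tautological under barrierShouldBeFenced(), here is a simplified, single-threaded model of the scheme as described in this thread: the fast path compares the cell state against a threshold read from the heap, and while fencing is required that threshold is set to a value every state satisfies, so every store reaches the slow path. The slow path's fence-then-recheck step is an assumption about how the truncated Heap.cpp comment continues. ModelHeap, setFenced, the enumerators, and both threshold values are hypothetical, and a seq_cst std::atomic_thread_fence stands in for the store-load fence.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding (assumption): only OldBlack cells need remembering.
enum class CellState : std::uint8_t { OldBlack = 0, NewWhite = 1, OldGrey = 2 };

struct Cell {
    std::atomic<CellState> state { CellState::NewWhite };
};

constexpr unsigned blackThreshold = 0;          // assumed: matches OldBlack only
constexpr unsigned tautologicalThreshold = 100; // assumed: above every CellState

struct ModelHeap {
    bool barrierShouldBeFenced = false; // true while the collector runs concurrently
    unsigned barrierThreshold = blackThreshold;
    std::vector<const Cell*> rememberedSet;

    void setFenced(bool fenced)
    {
        barrierShouldBeFenced = fenced;
        barrierThreshold = fenced ? tautologicalThreshold : blackThreshold;
    }

    static bool isWithinThreshold(CellState s, unsigned threshold)
    {
        return static_cast<unsigned>(s) <= threshold;
    }

    // Fast path: one load and one compare, no fence.
    void writeBarrier(const Cell& from)
    {
        if (!isWithinThreshold(from.state.load(std::memory_order_relaxed), barrierThreshold))
            return;
        writeBarrierSlowPath(from);
    }

    void writeBarrierSlowPath(const Cell& from)
    {
        if (barrierShouldBeFenced) {
            // The threshold was tautological, so 'from' may not be black.
            // Fence (stand-in for a store-load fence), then re-read the state.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            if (!isWithinThreshold(from.state.load(std::memory_order_relaxed), blackThreshold))
                return;
        }
        // Unfenced: the fast path already compared against blackThreshold.
        assert(barrierShouldBeFenced
            || isWithinThreshold(from.state.load(std::memory_order_relaxed), blackThreshold));
        rememberedSet.push_back(&from);
    }
};
```

The design choice this models: the common, non-collecting case pays only a load and a compare, and the fence cost is confined to stores made while the collector is running.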

> 
> > Source/JavaScriptCore/heap/Heap.h:269
> > +    unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > +    const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
> 
> Let's just call this blackThreshold.

I don't think that really captures it.  This threshold isn't really to do with being black, it's to do with taking the barrier slow path.  I'll call it barrierThreshold.
Comment 42 Geoffrey Garen 2016-09-26 15:21:49 PDT
> > How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
> > 
> > Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
> > 
> > Or "jumpIfNotCollectingAndIsRememberedOrInEden".
> 
> Those names have so many words!
> 
> How about I use these names:
> 
> sneakyJumpIfIsRememberedOrInEden -> barrierJump
> jumpIfIsRememberedOrInEden -> barrierJumpWithFence
> jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

Yeah, that sounds pretty good.

> > > Source/JavaScriptCore/heap/CellState.h:58
> > > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
> > 
> > Maybe isBelowThreshold?
> 
> But it's not below threshold.  Below means "<".  It's below or equal to the
> threshold.  Rather than using BelowOrEqual, I thought "Within" was shorter.

OK.

> > I think you meant that the sneakyBlackThreshold *could be* the tautological
> > threshold -- not that it is guaranteed to be.
> 
> It is guaranteed to be if barrierShouldBeFenced() is true, which is the
> condition that guards this comment.

OK.

> > > Source/JavaScriptCore/heap/Heap.h:269
> > > +    unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > > +    const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
> > 
> > Let's just call this blackThreshold.
> 
> I don't think that really captures it.  This threshold isn't really to do
> with being black, it's to do with taking the barrier slow path.  I'll call
> it barrierThreshold.

Sounds good.
Comment 43 Filip Pizlo 2016-09-27 12:33:36 PDT
(In reply to comment #41)
> (In reply to comment #39)
> > r=me
> > 
> > > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:8321
> > > +            m_jit.sneakyJumpIfIsRememberedOrInEden(baseGPR, scratch1GPR));
> > 
> > I'm not sure I like the trend of labeling concurrent algorithms "sneaky" or
> > "magic". Usually, algorithms do something explainable, and we should strive
> > to explain them.
> > 
> > How about "jumpIfIsRememberedOrInEdenOrMightNeedFence"?
> > 
> > Or "jumpIfBelowBlackThreshold". (That's what we call this concept in C++.)
> > 
> > Or "jumpIfNotCollectingAndIsRememberedOrInEden".
> 
> Those names have so many words!
> 
> How about I use these names:
> 
> sneakyJumpIfIsRememberedOrInEden -> barrierJump
> jumpIfIsRememberedOrInEden -> barrierJumpWithFence
> jumpIfIsRememberedOrInEdenWithoutFence -> barrierJumpWithoutFence

I ended up removing barrierJumpWithFence.

> 
> > 
> > > Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:96
> > > +            // requirement by either (1) having a StoreBarrierHint that tells OSR exit to OSR that
> > 
> > to barrier that value?
> 
> Yes.
> 
> > 
> > > Source/JavaScriptCore/heap/CellState.h:58
> > > +inline bool isWithinThreshold(CellState cellState, unsigned threshold)
> > 
> > Maybe isBelowThreshold?
> 
> But it's not below threshold.  Below means "<".  It's below or equal to the
> threshold.  Rather than using BelowOrEqual, I thought "Within" was shorter.
> 
> > 
> > > Source/JavaScriptCore/heap/DeferralContext.h:33
> > > +class DeferralContext {
> > 
> > How about GCDeferralContext or DeferGCContext (to match DeferGC)?
> 
> OK.

Fixed.

> 
> > 
> > > Source/JavaScriptCore/heap/Heap.cpp:1555
> > > +        // In this case, the sneakyBlackThreshold is tautological impossible threshold, so from
> > 
> > is a
> > 
> > I think you meant that the sneakyBlackThreshold *could be* the tautological
> > threshold -- not that it is guaranteed to be.
> 
> It is guaranteed to be if barrierShouldBeFenced() is true, which is the
> condition that guards this comment.
> 
> > 
> > > Source/JavaScriptCore/heap/Heap.h:269
> > > +    unsigned sneakyBlackThreshold() const { return m_sneakyBlackThreshold; }
> > > +    const unsigned* addressOfSneakyBlackThreshold() const { return &m_sneakyBlackThreshold; }
> > 
> > Let's just call this blackThreshold.
> 
> I don't think that really captures it.  This threshold isn't really to do
> with being black, it's to do with taking the barrier slow path.  I'll call
> it barrierThreshold.

Fixed.
Comment 44 Filip Pizlo 2016-09-27 12:39:01 PDT
Created attachment 289998 [details]
patch for landing
Comment 45 WebKit Commit Bot 2016-09-27 12:42:29 PDT
Attachment 289998 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 62 files
Comment 46 Filip Pizlo 2016-09-28 09:51:02 PDT
Created attachment 290096 [details]
rebased patch

I just have to test on ARM and then I'll land.
Comment 47 WebKit Commit Bot 2016-09-28 09:52:31 PDT
Attachment 290096 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 62 files
Comment 48 Filip Pizlo 2016-09-28 13:33:57 PDT
Oh boy!  Now I just need to benchmark this on ARM and I'm done!
Comment 49 Filip Pizlo 2016-09-28 13:34:57 PDT
Created attachment 290112 [details]
rebased patch
Comment 50 WebKit Commit Bot 2016-09-28 13:36:37 PDT
Attachment 290112 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGStoreBarrierClusteringPhase.cpp:135:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 62 files
Comment 51 Filip Pizlo 2016-09-28 14:58:34 PDT
Landed in https://trac.webkit.org/changeset/206555
Comment 52 Csaba Osztrogonác 2016-09-29 05:45:08 PDT
(In reply to comment #51)
> Landed in https://trac.webkit.org/changeset/206555

It made Dromaeo/jslib-style-jquery.html crash on the performance bots; see bug 162721 for details.