Bug 132137

Summary: Make slowPathAllocsBetweenGCs a runtime option
Product: WebKit
Component: JavaScriptCore
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Reporter: Mark Lam <mark.lam>
Assignee: Mark Lam <mark.lam>
CC: fpizlo, ggaren, mhahnenberg, mmirman, msaboff, oliver
Attachments:
Description            Flags
the patch              mhahnenberg: review+
revised patch          mhahnenberg: review+

Description Mark Lam 2014-04-24 11:26:22 PDT
This will make it easier to more casually run tests with this configuration as well as to reproduce issues (instead of requiring a code mod and rebuild).  We will now take --collectOnEveryAllocation=N where N is the number of allocations (which go through that slow path) before we trigger a collection.  The option defaults to 0 which is reserved to mean that we will not trigger any collections there.
Comment 1 Mark Lam 2014-04-24 11:37:18 PDT
Created attachment 230097 [details]
the patch
Comment 2 Mark Hahnenberg 2014-04-24 11:41:56 PDT
Comment on attachment 230097 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230097&action=review

I'd like to see performance numbers for this change.

> Source/JavaScriptCore/heap/MarkedAllocator.cpp:159
> +        static unsigned allocationCount = 0;
> +        if (!allocationCount) {
> +            if (!m_heap->isDeferred())
> +                m_heap->collectAllGarbage();
> +            ASSERT(m_heap->m_operationInProgress == NoOperation);
> +        }
> +        if (++allocationCount >= Options::collectOnEveryAllocation())
> +            allocationCount = 0;

This is sort of an odd way to write this. Why not trigger a GC when you exceed the limit rather than when you hit 0?
Comment 3 Mark Lam 2014-04-24 11:45:09 PDT
(In reply to comment #2)
> (From update of attachment 230097 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=230097&action=review
> 
> I'd like to see performance numbers for this change.
> 
> > Source/JavaScriptCore/heap/MarkedAllocator.cpp:159
> > +        static unsigned allocationCount = 0;
> > +        if (!allocationCount) {
> > +            if (!m_heap->isDeferred())
> > +                m_heap->collectAllGarbage();
> > +            ASSERT(m_heap->m_operationInProgress == NoOperation);
> > +        }
> > +        if (++allocationCount >= Options::collectOnEveryAllocation())
> > +            allocationCount = 0;
> 
> This is sort of an odd way to write this. Why not trigger a GC when you exceed the limit rather than when you hit 0?

Just a heuristic based on my experience of testing with collections on every 100 allocations.  I found that collecting on the first allocation rather than the last makes it a lot more likely that I’ll see issues (based on our regression tests as the workload).  For some short running tests, they may not get to the 100th slow path allocation before the test ends.
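
To make the difference between the two counting strategies concrete, here is a minimal standalone sketch. It is not JavaScriptCore code: both function names are hypothetical, and each one only counts how many collections would be triggered over a given number of slow-path allocations, assuming a period of 0 disables the feature as the description states.

```cpp
// Hypothetical helpers for illustration only; not part of the patch.

// Mirrors the counter in the patch: collect on the FIRST slow-path
// allocation of each period of n allocations (n == 0 disables collection).
unsigned collectionsWhenCollectingFirst(unsigned slowPathAllocs, unsigned n)
{
    if (!n)
        return 0;
    unsigned collections = 0;
    unsigned allocationCount = 0;
    for (unsigned i = 0; i < slowPathAllocs; ++i) {
        if (!allocationCount)
            ++collections; // stands in for collectAllGarbage()
        if (++allocationCount >= n)
            allocationCount = 0;
    }
    return collections;
}

// The alternative raised in review: collect on the LAST allocation of
// each period, i.e. only once the limit has been reached.
unsigned collectionsWhenCollectingLast(unsigned slowPathAllocs, unsigned n)
{
    if (!n)
        return 0;
    unsigned collections = 0;
    unsigned allocationCount = 0;
    for (unsigned i = 0; i < slowPathAllocs; ++i) {
        if (++allocationCount >= n) {
            ++collections; // stands in for collectAllGarbage()
            allocationCount = 0;
        }
    }
    return collections;
}
```

With n = 100, a short test that makes only 60 slow-path allocations gets one collection under the first strategy and none under the second, which is the case described above.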
Comment 4 Mark Hahnenberg 2014-04-24 11:48:35 PDT
Also, collectOnEveryAllocation is a less than ideal name for this because that's not what it does. Perhaps a better name would be numberOfAllocSlowPathsBeforeCollect or something like that? Less verbose would be good, but an accurate name is more important.
Comment 5 Mark Hahnenberg 2014-04-24 11:49:32 PDT
It also might be helpful to add something like this to CopiedSpace as well.
Comment 6 Mark Hahnenberg 2014-04-24 11:52:48 PDT
Comment on attachment 230097 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230097&action=review

r=me with good perf numbers

>>> Source/JavaScriptCore/heap/MarkedAllocator.cpp:159
>>> +            allocationCount = 0;
>> 
>> This is sort of an odd way to write this. Why not trigger a GC when you exceed the limit rather than when you hit 0?
> 
> Just a heuristic based on my experience of testing with collections on every 100 allocations.  I found that collecting on the first allocation rather than the last makes it a lot more likely that I’ll see issues (based on our regression tests as the workload).  For some short running tests, they may not get to the 100th slow path allocation before the test ends.

Fair enough.

You should factor this out into an ALWAYS_INLINE method so that we keep the main method relatively clean. I know you're just replacing old crufty code, but let's improve things while we're here :-)
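
A rough sketch of what that factoring could look like, assuming stand-in types so that it compiles on its own. `FakeHeap`, `FakeAllocator`, and `collectForTestingIfDue` are hypothetical names, not the ones in the revised patch, and the fallback `ALWAYS_INLINE` definition only approximates the WTF macro. Unlike the patch under review, the counter here is a member rather than a function-local static.

```cpp
// Fallback so the sketch builds outside WebKit; in WebKit the macro comes from WTF.
#ifndef ALWAYS_INLINE
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif
#endif

// Hypothetical stand-in for the heap; it only records how often it was asked to collect.
struct FakeHeap {
    bool deferred = false;
    unsigned collections = 0;
    bool isDeferred() const { return deferred; }
    void collectAllGarbage() { ++collections; }
};

// Hypothetical stand-in for the allocator's slow path.
class FakeAllocator {
public:
    FakeAllocator(FakeHeap& heap, unsigned period)
        : m_heap(heap)
        , m_period(period)
    {
    }

    void* allocateSlowCase()
    {
        collectForTestingIfDue(); // the main method stays short
        return nullptr; // the real allocation work is elided in this sketch
    }

private:
    // The test-only collection logic, factored out of the slow path.
    ALWAYS_INLINE void collectForTestingIfDue()
    {
        if (!m_period)
            return; // 0 means the option is off
        if (!m_allocationCount && !m_heap.isDeferred())
            m_heap.collectAllGarbage();
        if (++m_allocationCount >= m_period)
            m_allocationCount = 0;
    }

    FakeHeap& m_heap;
    unsigned m_period;
    unsigned m_allocationCount = 0;
};
```

Keeping the helper always-inlined is meant to leave the generated code for the slow path the same as before while making the main method easier to read.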
Comment 7 Mark Hahnenberg 2014-04-24 11:55:39 PDT
Oh, one more thing I forgot! It would be cool to add a mode to run-jsc-stress-tests with this enabled on some reasonable setting so we get more GC coverage during our normal test harness runs.
Comment 8 Mark Lam 2014-04-24 11:58:53 PDT
So many demands. =)  Ok, to summarize:
1. collect on allocations in CopiedSpace.
2. run-jsc-stress-tests case to stress some GC action.

Would other folks agree with adding this case to the stress tests given that the nature of this test is slow?
Comment 9 Mark Hahnenberg 2014-04-24 12:02:02 PDT
(In reply to comment #8)
> So many demands. =)  Ok, to summarize:
> 1. collect on allocations in CopiedSpace.
> 2. run-jsc-stress-tests case to stress some GC action.
These can be followup bugs if you want to pad your stats :-) In fact, it's probably better if they are since they're orthogonal to this bug.

> 
> Would other folks agree with adding this case to the stress tests given that the nature of this test is slow?
You could avoid making it part of the default run and instead just make it part of a "very stressful" mode. I know we've talked about something like this for a while.
Comment 10 Mark Lam 2014-04-24 14:01:02 PDT
Benchmark results say we are neutral.  Will upload the updated patch and hopefully land shortly.
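
For readers unfamiliar with this table format, the sketch below shows one way to read a row. It rests on assumptions inferred from the rows themselves, not on the benchmark script: the reported factor is taken to be the ratio of the two means, and "definitely" is taken to mean that the two mean+-error intervals do not overlap. `Measurement`, `speedRatio`, and `intervalsOverlap` are hypothetical names.

```cpp
#include <algorithm>

// One cell of the table, e.g. 7.7817+-0.3281.
struct Measurement {
    double mean;
    double plusMinus;
};

// Ratio of the slower mean to the faster mean, e.g. the "1.0550x" in a row.
double speedRatio(const Measurement& a, const Measurement& b)
{
    return std::max(a.mean, b.mean) / std::min(a.mean, b.mean);
}

// True when the two intervals [mean - plusMinus, mean + plusMinus] share any point.
// Assumption: overlapping intervals correspond to "might be", disjoint ones to "definitely".
bool intervalsOverlap(const Measurement& a, const Measurement& b)
{
    return a.mean - a.plusMinus <= b.mean + b.plusMinus
        && b.mean - b.plusMinus <= a.mean + a.plusMinus;
}
```

Applied to the SunSpider 3d-cube row (7.7817+-0.3281 vs 8.2096+-1.3606), the ratio is about 1.0550 and the intervals overlap, matching "might be 1.0550x slower". For Float32Array-to-Float64Array-set (80.3307+-0.6767 vs 71.7085+-3.4909), the ratio is about 1.1202 and the intervals are disjoint, matching the one "definitely" row shown here.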


                                                          Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                            7.7817+-0.3281     ?      8.2096+-1.3606        ? might be 1.0550x slower
   3d-morph                                           8.4812+-1.0369     ?      9.6898+-4.6944        ? might be 1.1425x slower
   3d-raytrace                                       11.5082+-2.8843            9.6127+-1.2812          might be 1.1972x faster
   access-binary-trees                                2.6550+-0.6632     ?      2.7865+-0.7214        ? might be 1.0495x slower
   access-fannkuch                                    7.8917+-0.8625            7.8015+-0.6520          might be 1.0116x faster
   access-nbody                                       4.4300+-1.0925     ?      4.8876+-1.0792        ? might be 1.1033x slower
   access-nsieve                                      4.9244+-1.3837            4.4075+-0.0423          might be 1.1173x faster
   bitops-3bit-bits-in-byte                           2.3371+-0.5959            2.0787+-0.0116          might be 1.1243x faster
   bitops-bits-in-byte                                3.6168+-0.1941            3.5460+-0.0218          might be 1.0200x faster
   bitops-bitwise-and                                 3.1810+-0.6017            2.9653+-0.0650          might be 1.0727x faster
   bitops-nsieve-bits                                 5.0885+-0.0589     ?      5.1084+-0.0839        ?
   controlflow-recursive                              2.5657+-0.0380     ?      3.3945+-2.5128        ? might be 1.3230x slower
   crypto-aes                                         6.9799+-2.7119            6.5001+-2.6254          might be 1.0738x faster
   crypto-md5                                         3.6060+-0.6929     ?      4.0301+-1.0396        ? might be 1.1176x slower
   crypto-sha1                                        3.7060+-1.4035            3.6785+-0.7393        
   date-format-tofte                                 11.4237+-0.4750     ?     12.4314+-1.3862        ? might be 1.0882x slower
   date-format-xparb                                  9.1884+-0.8705            9.1500+-0.9057        
   math-cordic                                        4.6897+-1.9326            4.3546+-0.9816          might be 1.0770x faster
   math-partial-sums                                  7.5208+-1.7282     ?      8.1153+-1.3507        ? might be 1.0790x slower
   math-spectral-norm                                 2.7354+-0.2828     ?      3.2899+-1.0157        ? might be 1.2027x slower
   regexp-dna                                        10.6552+-0.8363           10.6185+-0.7766        
   string-base64                                      6.4170+-1.7555            5.9673+-0.2326          might be 1.0754x faster
   string-fasta                                       9.4849+-0.2355     ?      9.9951+-0.4049        ? might be 1.0538x slower
   string-tagcloud                                   14.0712+-1.3022     ?     14.9808+-2.3676        ? might be 1.0646x slower
   string-unpack-code                                30.7645+-3.9564     ?     31.5425+-4.0303        ? might be 1.0253x slower
   string-validate-input                              6.7678+-0.5161     ?      6.8790+-1.1584        ? might be 1.0164x slower

   <arithmetic> *                                     7.4028+-0.3936     ?      7.5393+-0.2117        ? might be 1.0184x slower
   <geometric>                                        6.0253+-0.2874     ?      6.1280+-0.3136        ? might be 1.0170x slower
   <harmonic>                                         5.0649+-0.2447     ?      5.1580+-0.3539        ? might be 1.0184x slower

                                                          Conf#1                    Conf#2                                      
LongSpider:
   3d-cube                                         1857.9495+-15.0811    ?   1866.6677+-32.3299       ?
   3d-morph                                        1238.3037+-13.8758    ?   1275.7288+-128.9616      ? might be 1.0302x slower
   3d-raytrace                                     1222.5578+-30.8440    ?   1293.4274+-280.3974      ? might be 1.0580x slower
   access-binary-trees                             1329.3123+-28.3387    ?   1389.5810+-135.1811      ? might be 1.0453x slower
   access-fannkuch                                  543.8162+-2.4885     ?    543.8214+-1.9855        ?
   access-nbody                                    1168.7630+-5.0817     ?   1173.0695+-9.3087        ?
   access-nsieve                                   1237.9380+-33.6346    ?   1261.2711+-7.8720        ? might be 1.0188x slower
   bitops-3bit-bits-in-byte                         134.8546+-2.1557     ?    135.4371+-1.8951        ?
   bitops-bits-in-byte                              215.7848+-11.8690    ?    222.2466+-36.8085       ? might be 1.0299x slower
   bitops-nsieve-bits                              1133.7550+-20.8954        1128.2193+-32.8758       
   controlflow-recursive                            604.7928+-3.6893     ?    605.1859+-7.8722        ?
   crypto-aes                                      1424.5351+-15.7268    ?   1434.2581+-6.1408        ?
   crypto-md5                                      1178.8233+-15.0178    ?   1182.0520+-20.7265       ?
   crypto-sha1                                     1416.6429+-12.7029        1414.4310+-3.3047        
   date-format-tofte                               1143.2686+-256.8228       1073.5927+-29.6544         might be 1.0649x faster
   date-format-xparb                               1396.4704+-48.5130    ?   1432.3767+-97.5210       ? might be 1.0257x slower
   math-cordic                                     1423.0532+-28.5076        1408.4755+-6.7751          might be 1.0103x faster
   math-partial-sums                                848.6508+-4.8391          847.5743+-11.9581       
   math-spectral-norm                              1376.6771+-330.2995       1293.3749+-53.8112         might be 1.0644x faster
   string-base64                                    542.2126+-117.0719   ?    545.4697+-146.3643      ?
   string-fasta                                     926.9550+-22.1414         914.6357+-16.3680         might be 1.0135x faster
   string-tagcloud                                  338.2670+-7.5392     ?    342.1763+-13.2518       ? might be 1.0116x slower

   <arithmetic>                                    1031.9720+-19.0559    ?   1035.5942+-6.3347        ? might be 1.0035x slower
   <geometric> *                                    881.0723+-16.3946    ?    884.4750+-7.7875        ? might be 1.0039x slower
   <harmonic>                                       658.9682+-11.7429    ?    662.8566+-15.8704       ? might be 1.0059x slower

                                                          Conf#1                    Conf#2                                      
V8Spider:
   crypto                                            69.9048+-4.1645     ?     70.7333+-3.4173        ? might be 1.0119x slower
   deltablue                                        101.1406+-39.7781          91.2475+-5.0083          might be 1.1084x faster
   earley-boyer                                      65.9218+-3.0490     ?     66.0215+-4.6824        ?
   raytrace                                          57.0201+-23.3816    ?     57.5493+-12.6118       ?
   regexp                                            97.2420+-25.5873          88.6986+-1.1290          might be 1.0963x faster
   richards                                          98.4443+-2.7928     ?    115.5652+-46.9034       ? might be 1.1739x slower
   splay                                             48.7642+-1.9023           48.3143+-3.0282        

   <arithmetic>                                      76.9197+-3.3334           76.8757+-6.2474          might be 1.0006x faster
   <geometric> *                                     73.6663+-1.9546     ?     73.7040+-4.2066        ? might be 1.0005x slower
   <harmonic>                                        70.5878+-2.6731     ?     70.8193+-3.7366        ? might be 1.0033x slower

                                                          Conf#1                    Conf#2                                      
Octane and V8v7:
   encrypt                                           0.40158+-0.10244          0.36916+-0.00114         might be 1.0878x faster
   decrypt                                           6.71789+-0.03932          6.71385+-0.06757       
   deltablue                                x2       0.46008+-0.00552    ?     0.46483+-0.00628       ? might be 1.0103x slower
   earley                                            0.70934+-0.00493    ?     0.71613+-0.00964       ?
   boyer                                             9.31738+-0.13290    ?     9.38592+-0.10754       ?
   navier-stokes                            x2       9.51646+-0.03267    ?     9.54048+-0.08468       ?
   raytrace                                 x2       3.70691+-0.27539          3.52496+-0.07128         might be 1.0516x faster
   regexp                                   x2      27.47273+-0.22018         27.28569+-0.30541       
   richards                                 x2       0.25877+-0.00362    ?     0.26166+-0.00347       ? might be 1.0112x slower
   splay                                    x2       0.65997+-0.02005          0.65940+-0.00578       
   pdfjs                                    x2      83.53057+-0.99396    ?    84.59044+-1.17326       ? might be 1.0127x slower
   mandreel                                 x2     128.86821+-7.11868        128.43458+-3.74882       
   gbemu                                    x2      74.22887+-11.32429   ?    74.48753+-11.22750      ?
   closure                                           0.78517+-0.00624    ?     0.78672+-0.01304       ?
   jquery                                           11.67210+-2.83831         10.84885+-0.31685         might be 1.0759x faster
   box2d                                    x2      24.65797+-0.46701         24.51883+-0.07576       
   zlib                                     x2     843.10042+-33.68579       836.15651+-73.20082      
   typescript                               x2    1144.93622+-50.43383   ?  1165.61444+-45.09676      ? might be 1.0181x slower

V8v7:
   <arithmetic>                                      6.33100+-0.05173          6.29119+-0.02317         might be 1.0063x faster
   <geometric> *                                     2.05649+-0.03423          2.04033+-0.00605         might be 1.0079x faster
   <harmonic>                                        0.79500+-0.01966          0.79244+-0.00471         might be 1.0032x faster

Octane including V8v7:
   <arithmetic>                                    157.07993+-2.82761    ?   157.99664+-8.35162       ? might be 1.0058x slower
   <geometric> *                                    12.13662+-0.15402         12.07393+-0.23821         might be 1.0052x faster
   <harmonic>                                        1.38641+-0.03171          1.38210+-0.00749         might be 1.0031x faster

                                                          Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                          341.860+-2.798            340.678+-3.426         
   audio-beat-detection                              184.256+-1.739      ?     185.440+-3.195         ?
   audio-dft                                         308.765+-13.410           305.883+-15.414        
   audio-fft                                         122.492+-49.100           106.389+-0.717           might be 1.1514x faster
   audio-oscillator                                  218.122+-5.223      ?     218.203+-4.089         ?
   imaging-darkroom                                  256.919+-4.533            255.501+-1.907         
   imaging-desaturate                                139.108+-0.574            138.844+-3.743         
   imaging-gaussian-blur                             326.481+-176.703          268.797+-3.180           might be 1.2146x faster
   json-parse-financial                               65.803+-3.706      ?      70.209+-3.340         ? might be 1.0670x slower
   json-stringify-tinderbox                           89.832+-10.574     ?     101.337+-30.933        ? might be 1.1281x slower
   stanford-crypto-aes                                82.302+-33.081     ?      83.899+-40.824        ? might be 1.0194x slower
   stanford-crypto-ccm                                87.922+-2.415             86.757+-4.534           might be 1.0134x faster
   stanford-crypto-pbkdf2                            240.082+-94.425           209.218+-2.996           might be 1.1475x faster
   stanford-crypto-sha256-iterative                   77.190+-1.074      ?      87.154+-25.380        ? might be 1.1291x slower

   <arithmetic> *                                    181.510+-10.614           175.594+-2.702           might be 1.0337x faster
   <geometric>                                       154.345+-5.817            152.932+-5.366           might be 1.0092x faster
   <harmonic>                                        131.274+-5.395      ?     132.886+-7.138         ? might be 1.0123x slower

                                                          Conf#1                    Conf#2                                      
JSRegress:
   adapt-to-double-divide                            26.5853+-1.3726           25.9360+-1.3263          might be 1.0250x faster
   aliased-arguments-getbyval                         1.3508+-0.4089            1.3353+-0.1187          might be 1.0116x faster
   allocate-big-object                                3.3343+-1.0991            3.0991+-0.4356          might be 1.0759x faster
   arity-mismatch-inlining                            1.1399+-0.1252     ?      1.1502+-0.2081        ?
   array-access-polymorphic-structure                10.1639+-4.0119            8.8299+-0.2458          might be 1.1511x faster
   array-nonarray-polymorhpic-access                 50.8994+-2.6047     ?     51.4412+-2.1470        ? might be 1.0106x slower
   array-prototype-every                            111.0372+-2.1073          110.7119+-2.1238        
   array-prototype-forEach                          109.1503+-1.6150     ?    113.6860+-4.7336        ? might be 1.0416x slower
   array-prototype-map                              135.9296+-6.8767          135.4051+-4.4987        
   array-prototype-some                             110.5018+-3.1442          110.4952+-1.7417        
   array-with-double-add                              5.1561+-0.0429     ?      5.4432+-0.9607        ? might be 1.0557x slower
   array-with-double-increment                        4.3077+-0.5172            4.0900+-0.1127          might be 1.0532x faster
   array-with-double-mul-add                          6.4768+-2.6226            6.3420+-1.3743          might be 1.0212x faster
   array-with-double-sum                              5.4218+-2.8657            4.4393+-0.2268          might be 1.2213x faster
   array-with-int32-add-sub                           9.7018+-1.7519            9.6859+-1.0100        
   array-with-int32-or-double-sum                     4.5615+-0.3718            4.4307+-0.0717          might be 1.0295x faster
   ArrayBuffer-DataView-alloc-large-long-lived       92.7383+-21.1111    ?     94.5363+-18.2296       ? might be 1.0194x slower
   ArrayBuffer-DataView-alloc-long-lived             27.3530+-0.4370     ?     27.4035+-0.4026        ?
   ArrayBuffer-Int32Array-byteOffset                  5.3277+-0.9605            5.0772+-1.1157          might be 1.0493x faster
   ArrayBuffer-Int8Array-alloc-large-long-lived      87.8235+-3.7037     ?     89.3667+-2.8378        ? might be 1.0176x slower
   ArrayBuffer-Int8Array-alloc-long-lived-buffer     42.4662+-1.8608     ?     44.5422+-3.2409        ? might be 1.0489x slower
   ArrayBuffer-Int8Array-alloc-long-lived            27.2574+-1.2893     ?     40.7502+-26.2666       ? might be 1.4950x slower
   ArrayBuffer-Int8Array-alloc                       24.3102+-1.8089           23.4233+-0.3014          might be 1.0379x faster
   asmjs_bool_bug                                    10.2275+-1.6232            9.9374+-0.9731          might be 1.0292x faster
   assign-custom-setter-polymorphic                   5.3958+-2.8372            4.2377+-0.0790          might be 1.2733x faster
   assign-custom-setter                               6.7116+-1.8186            5.7207+-0.1553          might be 1.1732x faster
   basic-set                                         15.2944+-1.4699     ?     16.8418+-1.6224        ? might be 1.1012x slower
   big-int-mul                                        5.3982+-2.3876            4.8770+-0.5635          might be 1.1069x faster
   boolean-test                                       3.9315+-0.0601            3.9216+-0.0528        
   branch-fold                                        5.3407+-0.3900            5.2039+-0.1548          might be 1.0263x faster
   by-val-generic                                    14.1953+-3.0894           12.2887+-0.4783          might be 1.1551x faster
   call-spread-apply                                 21.1329+-2.0698           19.9612+-1.6778          might be 1.0587x faster
   call-spread-call                                   9.4270+-2.2975     ?     10.0299+-3.5196        ? might be 1.0640x slower
   captured-assignments                               0.7473+-0.2279            0.6880+-0.0339          might be 1.0861x faster
   cast-int-to-double                                13.8560+-3.3016           12.7221+-1.0309          might be 1.0891x faster
   cell-argument                                     16.7746+-0.7704     ?     16.8671+-1.9028        ?
   cfg-simplify                                       3.9363+-0.1383            3.8776+-0.2038          might be 1.0151x faster
   chain-getter-access                               32.0134+-0.7059     ?     42.8976+-20.4539       ? might be 1.3400x slower
   cmpeq-obj-to-obj-other                            12.2512+-1.0254     ?     12.3747+-0.9741        ? might be 1.0101x slower
   constant-test                                      6.7241+-1.0113     ?      7.2228+-1.5912        ? might be 1.0742x slower
   DataView-custom-properties                       102.6498+-28.4165          95.2300+-3.8282          might be 1.0779x faster
   delay-tear-off-arguments-strictmode                3.8663+-0.6316            3.6145+-0.1341          might be 1.0697x faster
   destructuring-arguments                            7.5680+-1.1250            7.1902+-0.2746          might be 1.0525x faster
   destructuring-swap                                 7.2413+-1.4801            6.7482+-0.3006          might be 1.0731x faster
   direct-arguments-getbyval                          1.0186+-0.0260     ?      1.1976+-0.5724        ? might be 1.1758x slower
   double-get-by-val-out-of-bounds                    8.1046+-0.9512            8.0263+-1.7558        
   double-pollution-getbyval                         13.4520+-1.4616           13.4288+-1.9079        
   double-pollution-putbyoffset                       6.7491+-1.6974            5.6722+-0.0553          might be 1.1899x faster
   double-to-int32-typed-array-no-inline              3.2407+-1.1429     ?      3.2752+-0.9846        ? might be 1.0106x slower
   double-to-int32-typed-array                        2.4253+-0.0577     ?      2.4659+-0.0630        ? might be 1.0168x slower
   double-to-uint32-typed-array-no-inline             3.2659+-0.7862            2.8707+-0.0955          might be 1.1377x faster
   double-to-uint32-typed-array                       3.0692+-1.3075            2.6465+-0.2561          might be 1.1597x faster
   empty-string-plus-int                              9.8101+-1.5952            9.5392+-0.4874          might be 1.0284x faster
   emscripten-cube2hash                              66.7296+-10.0378          63.3163+-12.9618         might be 1.0539x faster
   external-arguments-getbyval                        1.9769+-0.0874     ?      2.3238+-1.0649        ? might be 1.1755x slower
   external-arguments-putbyval                        2.7413+-0.2124     ?      2.7703+-0.1554        ? might be 1.0106x slower
   fixed-typed-array-storage-var-index                1.8513+-0.3844            1.8207+-0.4535          might be 1.0168x faster
   fixed-typed-array-storage                          1.1812+-0.0593            1.1570+-0.0289          might be 1.0209x faster
   Float32Array-matrix-mult                           8.2892+-3.3549            6.9694+-1.0974          might be 1.1894x faster
   Float32Array-to-Float64Array-set                  80.3307+-0.6767     ^     71.7085+-3.4909        ^ definitely 1.1202x faster
   Float64Array-alloc-long-lived                     94.0060+-3.4542     ?     94.2178+-4.4826        ?
   Float64Array-to-Int16Array-set                   122.9407+-75.6291          93.7740+-1.1519          might be 1.3110x faster
   fold-double-to-int                                20.7211+-3.5298     ?     24.0407+-12.4710       ? might be 1.1602x slower
   for-of-iterate-array-entries                       9.3400+-0.9179            8.6447+-0.2646          might be 1.0804x faster
   for-of-iterate-array-keys                          3.8580+-1.0004            3.6834+-0.2588          might be 1.0474x faster
   for-of-iterate-array-values                        3.4532+-0.9667            3.1402+-0.1228          might be 1.0997x faster
   fround                                            32.1973+-0.2235     ?     33.4987+-1.3024        ? might be 1.0404x slower
   function-dot-apply                                 1.8180+-0.1370     ?      2.0289+-0.4967        ? might be 1.1160x slower
   function-test                                      4.5516+-0.3610            4.4225+-0.1952          might be 1.0292x faster
   function-with-eval                                31.7252+-3.1305           30.0603+-2.1529          might be 1.0554x faster
   get-by-id-chain-from-try-block                     8.1958+-0.5394            8.0446+-0.1962          might be 1.0188x faster
   get-by-id-proto-or-self                           20.4640+-0.5515     ?     21.5950+-3.2453        ? might be 1.0553x slower
   get-by-id-self-or-proto                           21.5588+-1.3150     ?     22.1125+-1.0984        ? might be 1.0257x slower
   get-by-val-out-of-bounds                           7.4294+-0.9574     ?      8.6473+-4.1948        ? might be 1.1639x slower
   get_callee_monomorphic                             5.1578+-0.7696     ?      5.4516+-1.8727        ? might be 1.0570x slower
   get_callee_polymorphic                             4.8195+-0.8840            4.5848+-0.2326          might be 1.0512x faster
   getter                                            17.4755+-2.2433           16.7350+-0.5555          might be 1.0442x faster
   global-var-const-infer-fire-from-opt               1.3748+-0.2713            1.3028+-0.1516          might be 1.0552x faster
   global-var-const-infer                             1.0248+-0.1205            0.9642+-0.0273          might be 1.0628x faster
   HashMap-put-get-iterate-keys                      41.7585+-4.0984           41.5412+-2.8311        
   HashMap-put-get-iterate                           41.8647+-1.0272     ?     50.3089+-25.3522       ? might be 1.2017x slower
   HashMap-string-put-get-iterate                    44.7075+-6.2275     ?     45.0616+-6.5395        ?
   imul-double-only                                  15.8456+-2.5819     ?     18.5297+-10.3589       ? might be 1.1694x slower
   imul-int-only                                     14.7005+-1.7614     ?     14.9564+-1.4794        ? might be 1.0174x slower
   imul-mixed                                        19.4137+-1.9534           19.2422+-2.5853        
   in-four-cases                                     24.2115+-10.6961          21.1573+-0.6499          might be 1.1444x faster
   in-one-case-false                                 11.1131+-0.8385     ?     11.3940+-1.1181        ? might be 1.0253x slower
   in-one-case-true                                  13.3086+-6.5538           11.9175+-2.5792          might be 1.1167x faster
   in-two-cases                                      11.4745+-0.5842     ?     12.0227+-0.9180        ? might be 1.0478x slower
   indexed-properties-in-objects                      4.3527+-0.7218            4.1464+-0.9367          might be 1.0498x faster
   infer-closure-const-then-mov-no-inline             5.4289+-3.2787            4.2347+-0.0646          might be 1.2820x faster
   infer-closure-const-then-mov                      26.8688+-2.7559     ?     27.4850+-1.8953        ? might be 1.0229x slower
   infer-closure-const-then-put-to-scope-no-inline   17.7740+-1.6431     ?     17.8256+-1.8367        ?
   infer-closure-const-then-put-to-scope             30.9261+-0.9087           30.1460+-0.8932          might be 1.0259x faster
   infer-closure-const-then-reenter-no-inline        84.4408+-3.2891     ?     86.1025+-5.5076        ? might be 1.0197x slower
   infer-closure-const-then-reenter                  32.0818+-2.8805     ?     34.7264+-12.0811       ? might be 1.0824x slower
   infer-one-time-closure-ten-vars                   26.1993+-3.0638           25.5443+-1.6031          might be 1.0256x faster
   infer-one-time-closure-two-vars                   25.0657+-0.8285     ?     25.5281+-1.4406        ? might be 1.0184x slower
   infer-one-time-closure                            25.3208+-1.0584     ?     25.3891+-1.6030        ?
   infer-one-time-deep-closure                       50.5080+-2.0417           50.0031+-3.1110          might be 1.0101x faster
   inline-arguments-access                            1.8079+-0.2869     ?      1.8617+-0.5070        ? might be 1.0298x slower
   inline-arguments-aliased-access                    1.8713+-0.0511     ?      2.0662+-0.5325        ? might be 1.1042x slower
   inline-arguments-local-escape                     16.8954+-0.7978     ?     17.5120+-0.3287        ? might be 1.0365x slower
   inline-get-scoped-var                              6.0897+-1.5772            5.5700+-0.5221          might be 1.0933x faster
   inlined-put-by-id-transition                      11.4883+-0.3000     ?     12.2261+-0.7939        ? might be 1.0642x slower
   int-or-other-abs-then-get-by-val                   9.3752+-5.3183            7.9805+-0.4666          might be 1.1748x faster
   int-or-other-abs-zero-then-get-by-val             39.3630+-18.8871          32.8015+-4.1172          might be 1.2000x faster
   int-or-other-add-then-get-by-val                  11.7188+-2.0976           10.9802+-0.4705          might be 1.0673x faster
   int-or-other-add                                   9.9643+-0.8790     ?     10.3979+-1.6176        ? might be 1.0435x slower
   int-or-other-div-then-get-by-val                   7.6581+-3.5966            7.1395+-1.2478          might be 1.0726x faster
   int-or-other-max-then-get-by-val                   7.9525+-1.5630     ?      8.7600+-4.9233        ? might be 1.1015x slower
   int-or-other-min-then-get-by-val                   7.8059+-0.9442            7.7828+-1.2924        
   int-or-other-mod-then-get-by-val                   6.9943+-1.1702     ?      8.1813+-2.8848        ? might be 1.1697x slower
   int-or-other-mul-then-get-by-val                   6.6740+-0.7408     ?      7.6552+-1.8065        ? might be 1.1470x slower
   int-or-other-neg-then-get-by-val                   7.8483+-2.8591     ?      8.3329+-1.2615        ? might be 1.0618x slower
   int-or-other-neg-zero-then-get-by-val             31.6415+-1.7283           29.8420+-2.5352          might be 1.0603x faster
   int-or-other-sub-then-get-by-val                  10.6354+-0.1073     ?     10.6717+-0.3674        ?
   int-or-other-sub                                   7.9955+-1.3759            7.9533+-1.2905        
   int-overflow-local                                 7.0939+-2.2971            5.8370+-0.0554          might be 1.2153x faster
   Int16Array-alloc-long-lived                       69.2947+-0.8772           67.5840+-1.7406          might be 1.0253x faster
   Int16Array-bubble-sort-with-byteLength            32.6938+-2.3920     ?     34.3080+-1.7015        ? might be 1.0494x slower
   Int16Array-bubble-sort                            28.5263+-4.0900     ?     28.8425+-4.5226        ? might be 1.0111x slower
   Int16Array-load-int-mul                            1.9990+-0.2565            1.9376+-0.0578          might be 1.0317x faster
   Int16Array-to-Int32Array-set                      89.5992+-39.9506          70.7845+-1.9310          might be 1.2658x faster
   Int32Array-alloc-large                            31.4408+-2.0799           30.5198+-1.9677          might be 1.0302x faster
   Int32Array-alloc-long-lived                       75.9565+-2.9072           74.9927+-1.1918          might be 1.0129x faster
   Int32Array-alloc                                   4.6470+-1.1440            4.1216+-0.4228          might be 1.1275x faster
   Int32Array-Int8Array-view-alloc                   12.7440+-0.8115     ?     13.3265+-1.0827        ? might be 1.0457x slower
   int52-spill                                       10.8069+-0.7233     ?     11.5554+-2.0249        ? might be 1.0693x slower
   Int8Array-alloc-long-lived                        63.8255+-1.1077     ?     63.8987+-2.1864        ?
   Int8Array-load-with-byteLength                     4.7138+-0.7073            4.5664+-0.5640          might be 1.0323x faster
   Int8Array-load                                     4.9595+-1.2430            4.5480+-0.3524          might be 1.0905x faster
   integer-divide                                    17.4471+-1.7175           16.5241+-0.6360          might be 1.0559x faster
   integer-modulo                                     2.5934+-1.6519            2.2083+-0.4094          might be 1.1744x faster
   large-int-captured                                 9.1987+-1.0069            8.9847+-0.4825          might be 1.0238x faster
   large-int-neg                                     23.6665+-3.1706     ?     23.7142+-2.5426        ?
   large-int                                         26.3954+-15.4044          21.3854+-1.0682          might be 1.2343x faster
   logical-not                                        6.4646+-1.0448            6.0062+-0.4903          might be 1.0763x faster
   lots-of-fields                                    13.9201+-2.3576           13.5471+-2.7176          might be 1.0275x faster
   make-indexed-storage                               3.8641+-0.2683     ?      4.4921+-1.0177        ? might be 1.1625x slower
   make-rope-cse                                      6.4275+-1.4481     ?      6.6870+-1.5049        ? might be 1.0404x slower
   marsaglia-larger-ints                             98.9485+-2.5556     ?     99.3793+-2.2440        ?
   marsaglia-osr-entry                               41.8600+-0.8444     ?     43.2228+-5.4257        ? might be 1.0326x slower
   method-on-number                                  27.2859+-1.1045           26.7730+-2.6778          might be 1.0192x faster
   misc-strict-eq                                    64.6840+-7.1466           62.0667+-3.4867          might be 1.0422x faster
   negative-zero-divide                               0.5828+-0.1612     ?      0.6077+-0.2390        ? might be 1.0427x slower
   negative-zero-modulo                               0.5094+-0.0130     ?      0.5704+-0.1834        ? might be 1.1198x slower
   negative-zero-negate                               0.5148+-0.0159     ?      0.5204+-0.0148        ? might be 1.0110x slower
   nested-function-parsing                           38.3182+-2.5144           37.9318+-1.4442          might be 1.0102x faster
   new-array-buffer-dead                              4.0557+-0.0164     ?      4.0897+-0.1318        ?
   new-array-buffer-push                              9.8289+-1.6838            8.9083+-0.2580          might be 1.1033x faster
   new-array-dead                                    31.1060+-1.6179           30.1218+-0.5609          might be 1.0327x faster
   new-array-push                                     6.4183+-0.0564     ?      6.7667+-0.8311        ? might be 1.0543x slower
   number-test                                        4.2139+-1.0026            3.8815+-0.0832          might be 1.0856x faster
   object-closure-call                                9.1504+-1.1262     ?      9.5311+-1.2198        ? might be 1.0416x slower
   object-test                                        4.2000+-0.2132            4.1095+-0.0541          might be 1.0220x faster
   poly-stricteq                                     84.4606+-4.4393           84.4077+-2.9810        
   polymorphic-get-by-id                              3.9060+-0.1554     ?      3.9498+-0.2273        ? might be 1.0112x slower
   polymorphic-put-by-id                             72.8988+-68.5369    ?     95.8370+-80.6429       ? might be 1.3147x slower
   polymorphic-structure                             28.0064+-18.0453    ?     28.0096+-17.5880       ?
   polyvariant-monomorphic-get-by-id                 12.1135+-2.5919           10.7445+-0.5947          might be 1.1274x faster
   proto-getter-access                               33.6077+-2.3691           33.0453+-3.2087          might be 1.0170x faster
   put-by-id                                         16.5244+-1.2533           16.3435+-0.9697          might be 1.0111x faster
   put-by-val-large-index-blank-indexing-type   
                                                      9.4885+-0.3463     ?      9.5186+-0.7328        ?
   put-by-val-machine-int                             3.3607+-0.2915     ?      3.4525+-0.2176        ? might be 1.0273x slower
   rare-osr-exit-on-local                            19.3826+-0.8860     ?     21.5028+-6.3982        ? might be 1.1094x slower
   register-pressure-from-osr                        29.9227+-1.1443     ?     30.7917+-2.4698        ? might be 1.0290x slower
   setter                                            19.2878+-2.5324           19.0494+-3.5336          might be 1.0125x faster
   simple-activation-demo                            34.5583+-1.8805     ?     36.2892+-2.1257        ? might be 1.0501x slower
   simple-getter-access                              50.9291+-1.9527     ?     51.6914+-3.5013        ? might be 1.0150x slower
   slow-array-profile-convergence                     3.9481+-0.1939            3.8962+-0.0614          might be 1.0133x faster
   slow-convergence                                   5.0742+-0.8411            4.3039+-0.2412          might be 1.1790x faster
   sparse-conditional                                 1.5156+-0.0270     ?      1.5833+-0.2322        ? might be 1.0447x slower
   splice-to-remove                                  63.1953+-1.2919           62.7690+-3.2119        
   string-char-code-at                               26.8528+-1.9617           24.6282+-4.4934          might be 1.0903x faster
   string-concat-object                               2.8815+-0.0217     ?      3.1141+-0.6523        ? might be 1.0807x slower
   string-concat-pair-object                          3.4343+-0.8053            3.1948+-1.0026          might be 1.0749x faster
   string-concat-pair-simple                         14.0681+-1.8277     ?     14.8300+-2.1296        ? might be 1.0542x slower
   string-concat-simple                              13.8769+-0.2379     ?     13.9775+-0.1005        ?
   string-cons-repeat                                10.4382+-1.9957     ?     10.6436+-1.8100        ? might be 1.0197x slower
   string-cons-tower                                 10.6697+-1.5501     ?     10.7988+-2.1100        ? might be 1.0121x slower
   string-equality                                   39.2385+-0.5030     ?     40.8249+-3.0975        ? might be 1.0404x slower
   string-get-by-val-big-char                        11.9070+-1.3326     ?     12.2754+-1.6915        ? might be 1.0309x slower
   string-get-by-val-out-of-bounds-insane             6.5706+-1.6813     ?      7.6985+-1.5752        ? might be 1.1717x slower
   string-get-by-val-out-of-bounds                    5.9410+-1.9276            5.5321+-0.5011          might be 1.0739x faster
   string-get-by-val                                  3.9114+-0.0683     ?      4.3478+-1.4484        ? might be 1.1116x slower
   string-hash                                        2.6592+-0.2756            2.5626+-0.0183          might be 1.0377x faster
   string-long-ident-equality                        34.9924+-1.0943     ?     37.8195+-2.5732        ? might be 1.0808x slower
   string-repeat-arith                               47.6771+-3.3100           43.0648+-1.5542          might be 1.1071x faster
   string-sub                                        89.9006+-4.2412           86.5112+-1.9659          might be 1.0392x faster
   string-test                                        3.7381+-0.0997     ?      3.7490+-0.1849        ?
   string-var-equality                               59.9330+-3.7681     ?     60.0725+-1.7292        ?
   structure-hoist-over-transitions                   3.5838+-1.0085            3.3077+-0.0441          might be 1.0835x faster
   switch-char-constant                               3.1700+-0.2602            3.1155+-0.0355          might be 1.0175x faster
   switch-char                                        8.3749+-2.3614            8.3381+-1.2628        
   switch-constant                                   10.0244+-1.6401            9.6490+-0.7354          might be 1.0389x faster
   switch-string-basic-big-var                       20.6093+-0.4876     ?     21.2503+-1.4227        ? might be 1.0311x slower
   switch-string-basic-big                           20.2893+-1.1519           20.0833+-0.9829          might be 1.0103x faster
   switch-string-basic-var                           21.6278+-0.8927           20.5326+-1.3016          might be 1.0533x faster
   switch-string-basic                               18.0966+-0.8079           18.0683+-0.5337        
   switch-string-big-length-tower-var                27.5045+-0.6520     ?     28.0016+-0.8188        ? might be 1.0181x slower
   switch-string-length-tower-var                    21.0065+-0.5750     ?     21.9965+-2.8253        ? might be 1.0471x slower
   switch-string-length-tower                        18.1628+-1.4607     ?     18.4920+-2.3722        ? might be 1.0181x slower
   switch-string-short                               19.5737+-3.8429           18.6371+-1.0606          might be 1.0503x faster
   switch                                            17.0074+-2.2226           16.3795+-0.3643          might be 1.0383x faster
   tear-off-arguments-simple                          2.6013+-0.0840     ?      2.7831+-0.3933        ? might be 1.0699x slower
   tear-off-arguments                                 3.9955+-0.4599            3.9250+-0.2602          might be 1.0180x faster
   temporal-structure                                18.2529+-1.6863     ?     19.6742+-3.8070        ? might be 1.0779x slower
   to-int32-boolean                                  21.1107+-4.6526           19.9608+-3.0242          might be 1.0576x faster
   undefined-test                                     4.3229+-1.1569            4.1130+-0.6682          might be 1.0510x faster
   unprofiled-licm                                   59.8765+-6.3399           58.6620+-3.3857          might be 1.0207x faster
   weird-inlining-const-prop                          2.6855+-0.6112            2.5662+-0.1349          might be 1.0465x faster

   <arithmetic>                                      22.1598+-0.6929           22.0114+-0.5846          might be 1.0067x faster
   <geometric> *                                     11.4939+-0.2032           11.3916+-0.1575          might be 1.0090x faster
   <harmonic>                                         5.5682+-0.1129            5.5424+-0.0724          might be 1.0047x faster

                                                          Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                                      1466.8200+-14.3095        1463.6599+-23.9301       
   cray.c                                            53.6849+-4.1114     ?     55.8937+-1.8710        ? might be 1.0411x slower
   dry.c                                           1278.2316+-41.4835        1261.2917+-8.1168          might be 1.0134x faster
   FloatMM.c                                       1828.5541+-11.5089    ?   1833.6780+-8.2449        ?
   gcc-loops.cpp                                   3165.5955+-50.1891    ?   3270.8739+-329.1033      ? might be 1.0333x slower
   n-body.c                                        2123.0682+-14.5961    ?   2126.6964+-3.6637        ?
   Quicksort.c                                      104.7718+-0.4877     ?    120.3850+-26.7873       ? might be 1.1490x slower
   stepanov_container.cpp                          7294.9308+-152.7661       7220.3734+-97.6040         might be 1.0103x faster
   Towers.c                                          74.0829+-3.5863     ?     74.0936+-3.4701        ?

   <arithmetic>                                    1932.1933+-22.3593    ?   1936.3273+-32.3755       ? might be 1.0021x slower
   <geometric> *                                    744.1055+-4.0407     ?    759.4225+-13.9067       ? might be 1.0206x slower
   <harmonic>                                       201.6047+-4.2082     ?    210.5416+-8.7023        ? might be 1.0443x slower

                                                          Conf#1                    Conf#2                                      
All benchmarks:
   <arithmetic>                                     164.9232+-1.3712     ?    165.0275+-1.5218        ? might be 1.0006x slower
   <geometric>                                       19.3808+-0.2470           19.2919+-0.1689          might be 1.0046x faster
   <harmonic>                                         4.9128+-0.0795            4.9015+-0.0209          might be 1.0023x faster

                                                          Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                                   70.9432+-1.0831           70.9029+-0.7597          might be 1.0006x faster
Comment 11 Mark Lam 2014-04-24 14:02:42 PDT
Created attachment 230103 [details]
revised patch
Comment 12 Mark Hahnenberg 2014-04-24 14:10:06 PDT
Comment on attachment 230103 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230103&action=review

r=me

> Source/JavaScriptCore/heap/MarkedAllocator.h:53
> +    ALWAYS_INLINE void doTestCollectionsIfNeeded();

Do you need ALWAYS_INLINE here? I've never seen it in a header before.
Comment 13 Mark Lam 2014-04-24 14:14:19 PDT
(In reply to comment #12)
> > Source/JavaScriptCore/heap/MarkedAllocator.h:53
> > +    ALWAYS_INLINE void doTestCollectionsIfNeeded();
> 
> Do you need ALWAYS_INLINE here? I've never seen it in a header before.

ALWAYS_INLINE is used in headers everywhere.  Just grep for it in header files and you’ll see.

Thanks for the review.  Landed in r167772: <http://trac.webkit.org/r167772>.
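
For reference, a minimal sketch of the pattern discussed in comments 12 and 13: a member function declared with an always-inline macro in a header and defined before its callers are compiled. This is not the landed code. `SKETCH_ALWAYS_INLINE` is a simplified stand-in for WTF's ALWAYS_INLINE, `SketchAllocator` and `collectionsAfter` are hypothetical, and the counting logic is only a guess at the behavior described in the bug (collect after every N slow-path allocations, with 0 meaning never).

```cpp
// Assumption: simplified stand-in for WTF's ALWAYS_INLINE macro.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((__always_inline__))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Hypothetical allocator. The method name mirrors the patch, but the
// members and logic are illustrative only.
class SketchAllocator {
public:
    explicit SketchAllocator(unsigned allocsBetweenGCs)
        : m_allocsBetweenGCs(allocsBetweenGCs) { }

    void allocateSlowCase();
    unsigned collections() const { return m_collections; }

private:
    // Declared in the "header" part with the always-inline macro.
    SKETCH_ALWAYS_INLINE void doTestCollectionsIfNeeded();

    unsigned m_allocsBetweenGCs;
    unsigned m_allocCount { 0 };
    unsigned m_collections { 0 };
};

// Defined before its only caller so the compiler has the body to inline.
SKETCH_ALWAYS_INLINE void SketchAllocator::doTestCollectionsIfNeeded()
{
    if (!m_allocsBetweenGCs)
        return; // 0 is reserved to mean "never collect here".
    if (++m_allocCount >= m_allocsBetweenGCs) {
        ++m_collections; // Stand-in for triggering a real collection.
        m_allocCount = 0;
    }
}

void SketchAllocator::allocateSlowCase()
{
    doTestCollectionsIfNeeded();
}

// Helper for experimenting: number of collections after `allocations`
// slow-path allocations with the given interval.
unsigned collectionsAfter(unsigned interval, unsigned allocations)
{
    SketchAllocator allocator(interval);
    for (unsigned i = 0; i < allocations; ++i)
        allocator.allocateSlowCase();
    return allocator.collections();
}
```

With an interval of 3, ten slow-path allocations trigger three collections in this sketch (at the 3rd, 6th, and 9th allocation); with an interval of 0 none are triggered.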