Bug 155110 - [JSC] addStaticGlobals should emit SymbolTableEntry watchpoints to encourage constant folding in DFG
Summary: [JSC] addStaticGlobals should emit SymbolTableEntry watchpoints to encourage ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-07 07:16 PST by Yusuke Suzuki
Modified: 2016-04-12 01:25 PDT (History)
8 users (show)

See Also:


Attachments
Patch (13.05 KB, patch)
2016-03-07 07:37 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (13.02 KB, patch)
2016-03-07 07:41 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (17.36 KB, patch)
2016-03-11 13:52 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (18.13 KB, patch)
2016-04-04 04:49 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews114 for mac-yosemite (366.58 KB, application/zip)
2016-04-04 05:37 PDT, Build Bot
no flags Details
Patch (13.77 KB, patch)
2016-04-11 04:30 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (14.45 KB, patch)
2016-04-11 06:06 PDT, Yusuke Suzuki
sbarati: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2016-03-07 07:16:50 PST
Now, (maybe, accidentally), addStaticGlobals does not emit SymbolTableEntry watchpoints for the added entries.
So, all the global variable lookups pointing to these static globals are not converted into constants in BytecodeGenerator: this fact leaves these lookups as GetGlobalVar.
But, static globals are not configurable. And they are typically non-writable. So they are constant in almost all the cases.

This patch initializes watchpoints for these static globals. These watchpoints allow DFG to convert these nodes into constants in DFG BytecodeParser.
These watchpoints includes some builtin operations and `undefined`!
Comment 1 Yusuke Suzuki 2016-03-07 07:37:42 PST
Created attachment 273178 [details]
Patch

part 1
Comment 2 Yusuke Suzuki 2016-03-07 07:41:02 PST
Created attachment 273179 [details]
Patch

part 2
Comment 3 Yusuke Suzuki 2016-03-11 13:52:05 PST
Created attachment 273763 [details]
Patch
Comment 4 Yusuke Suzuki 2016-03-12 03:48:33 PST
After applying this change, we measure some performance regression on array-prototype-reduceRight test in JSRegress.
IIRC, https://bugs.webkit.org/show_bug.cgi?id=154844 can remove this regression, so I'll first do https://bugs.webkit.org/show_bug.cgi?id=154844.
Comment 5 Yusuke Suzuki 2016-04-04 04:49:05 PDT
Created attachment 275545 [details]
Patch
Comment 6 Build Bot 2016-04-04 05:37:26 PDT
Comment on attachment 275545 [details]
Patch

Attachment 275545 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1097537

Number of test failures exceeded the failure limit.
Comment 7 Build Bot 2016-04-04 05:37:28 PDT
Created attachment 275546 [details]
Archive of layout-test-results from ews114 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 8 Yusuke Suzuki 2016-04-04 05:54:59 PDT
Comment on attachment 275545 [details]
Patch

Still WIP.
Comment 9 Yusuke Suzuki 2016-04-11 04:30:39 PDT
Created attachment 276137 [details]
Patch
Comment 10 Yusuke Suzuki 2016-04-11 04:33:16 PDT
Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on dandelion (MacBookPro10,1).

VMs tested:
"Baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/globals-master/Release/jsc
"Globals" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/g5/Release/jsc

Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                 Baseline                  Globals                                      
SunSpider:
   3d-cube                                    6.1136+-0.0935            6.0450+-0.0526          might be 1.0113x faster
   3d-morph                                   5.8109+-0.0442     ^      5.7021+-0.0318        ^ definitely 1.0191x faster
   3d-raytrace                                6.6403+-0.1148            6.5815+-0.0498        
   access-binary-trees                        2.3595+-0.0661     ?      2.4184+-0.1082        ? might be 1.0250x slower
   access-fannkuch                            6.6221+-0.0225     ?      6.6243+-0.0314        ?
   access-nbody                               3.1819+-0.2342            3.0983+-0.2441          might be 1.0270x faster
   access-nsieve                              3.1227+-0.0988     ?      3.1822+-0.1108        ? might be 1.0190x slower
   bitops-3bit-bits-in-byte                   1.4233+-0.1350            1.2878+-0.0750          might be 1.1052x faster
   bitops-bits-in-byte                        3.2470+-0.2376            3.1207+-0.1591          might be 1.0405x faster
   bitops-bitwise-and                         2.2525+-0.1753            2.1996+-0.1571          might be 1.0241x faster
   bitops-nsieve-bits                         3.3694+-0.0685     ?      3.4281+-0.1762        ? might be 1.0174x slower
   controlflow-recursive                      2.6635+-0.0883            2.6492+-0.0836        
   crypto-aes                                 4.7409+-0.1255     ?      4.8404+-0.1462        ? might be 1.0210x slower
   crypto-md5                                 2.9343+-0.1313     ?      2.9436+-0.1573        ?
   crypto-sha1                                2.7653+-0.1314            2.7324+-0.1474          might be 1.0120x faster
   date-format-tofte                          9.1097+-0.1255     ?      9.1619+-0.1676        ?
   date-format-xparb                          5.8352+-0.4340            5.8310+-0.5341        
   math-cordic                                3.1780+-0.2292            3.0309+-0.0762          might be 1.0485x faster
   math-partial-sums                          5.5181+-0.0985     ?      5.5448+-0.1357        ?
   math-spectral-norm                         2.3109+-0.1936            2.2401+-0.1435          might be 1.0316x faster
   regexp-dna                                 7.3235+-0.0835     ?      7.3400+-0.0769        ?
   string-base64                              5.4904+-0.0171            5.4526+-0.0898        
   string-fasta                               6.6854+-0.3892            6.3913+-0.1397          might be 1.0460x faster
   string-tagcloud                            9.0543+-0.0971            9.0055+-0.0667        
   string-unpack-code                        19.3328+-0.1389           19.2624+-0.0763        
   string-validate-input                      4.9182+-0.1033            4.8903+-0.0306        

   <arithmetic>                               5.2309+-0.0322            5.1925+-0.0267          might be 1.0074x faster

                                                 Baseline                  Globals                                      
LongSpider:
   3d-cube                                  916.2048+-5.8220     ?    920.5600+-10.9478       ?
   3d-morph                                 673.4544+-4.7696          672.2407+-3.8510        
   3d-raytrace                              722.4553+-4.9574          717.3063+-6.0604        
   access-binary-trees                      973.2210+-5.9480     ?    977.3012+-3.8647        ?
   access-fannkuch                          304.2281+-8.7583          304.1552+-9.3859        
   access-nbody                             576.4872+-7.1122          574.8656+-4.3767        
   access-nsieve                            399.1779+-3.9584     ?    401.0313+-3.9492        ?
   bitops-3bit-bits-in-byte                  33.1424+-1.1743           33.1211+-1.1888        
   bitops-bits-in-byte                      130.2072+-2.8976          126.8325+-1.1433          might be 1.0266x faster
   bitops-nsieve-bits                       394.6471+-2.0726          394.3495+-2.1760        
   controlflow-recursive                    517.0124+-4.6390     ?    518.6992+-6.4181        ?
   crypto-aes                               721.2280+-5.6501          716.8040+-5.0910        
   crypto-md5                               516.4913+-4.6946          515.4666+-3.9111        
   crypto-sha1                              694.2994+-5.3092     ?    698.8859+-5.9099        ?
   date-format-tofte                        743.7155+-12.2938    ?    750.8892+-8.5127        ?
   date-format-xparb                        740.2295+-18.3272    ?    742.1942+-12.7894       ?
   hash-map                                 165.4574+-1.9673          163.7309+-1.7109          might be 1.0105x faster
   math-cordic                              504.1569+-2.1161     ?    505.9414+-4.1270        ?
   math-partial-sums                        484.8505+-4.3969          483.7533+-1.5952        
   math-spectral-norm                       558.7908+-5.2618          557.7940+-4.2058        
   string-base64                            484.0460+-4.9470          482.8081+-4.0482        
   string-fasta                             387.0775+-3.4383     ?    389.6418+-4.8587        ?
   string-tagcloud                          194.7116+-1.9728          191.5824+-1.2712          might be 1.0163x faster

   <geometric>                              426.7983+-1.2807          426.1545+-1.0405          might be 1.0015x faster

                                                 Baseline                  Globals                                      
Octane:
   encrypt                                   0.18655+-0.00213          0.18572+-0.00093       
   decrypt                                   3.25772+-0.00609    ?     3.26802+-0.02063       ?
   deltablue                        x2       0.15764+-0.00157          0.15661+-0.00158       
   earley                                    0.34864+-0.00180          0.34770+-0.00155       
   boyer                                     5.39879+-0.01216    !     5.45084+-0.02442       ! definitely 1.0096x slower
   navier-stokes                    x2       5.03613+-0.01319          5.02649+-0.01372       
   raytrace                         x2       0.99191+-0.00767    ?     1.00541+-0.01106       ? might be 1.0136x slower
   richards                         x2       0.09991+-0.00106    ?     0.10037+-0.00109       ?
   splay                            x2       0.40398+-0.00392    ^     0.39714+-0.00219       ^ definitely 1.0172x faster
   regexp                           x2      19.50037+-0.24156         19.39496+-0.16158       
   pdfjs                            x2      47.01394+-0.42742         46.99978+-0.28597       
   mandreel                         x2      47.77950+-0.39967         47.26620+-0.17380         might be 1.0109x faster
   gbemu                            x2      29.42102+-0.35691         29.37393+-0.36826       
   closure                                   0.61575+-0.00318          0.61095+-0.00229       
   jquery                                    7.95908+-0.02400    ?     7.97821+-0.02185       ?
   box2d                            x2      10.77104+-0.04706    ?    10.77658+-0.02816       ?
   zlib                             x2     386.02963+-10.63456       379.82726+-11.12266        might be 1.0163x faster
   typescript                       x2     798.93118+-1.83252        796.33325+-4.18360       

   <geometric>                               5.85009+-0.00769          5.83292+-0.02200         might be 1.0029x faster

                                                 Baseline                  Globals                                      
Kraken:
   ai-astar                                   91.885+-0.277      ?      92.087+-0.439         ?
   audio-beat-detection                       48.918+-0.672             48.688+-0.416         
   audio-dft                                 106.372+-0.885            105.973+-0.750         
   audio-fft                                  38.929+-0.767             38.685+-0.235         
   audio-oscillator                           54.560+-1.041             54.061+-0.149         
   imaging-darkroom                           70.040+-1.250             69.289+-1.103           might be 1.0108x faster
   imaging-desaturate                         58.319+-1.481             56.835+-0.230           might be 1.0261x faster
   imaging-gaussian-blur                      78.203+-4.005             77.683+-3.446         
   json-parse-financial                       45.561+-0.250      ^      44.140+-0.297         ^ definitely 1.0322x faster
   json-stringify-tinderbox                   27.231+-0.099      ^      26.747+-0.091         ^ definitely 1.0181x faster
   stanford-crypto-aes                        43.270+-1.300             42.567+-0.867           might be 1.0165x faster
   stanford-crypto-ccm                        41.517+-1.232             41.516+-2.300         
   stanford-crypto-pbkdf2                    107.880+-1.962      ?     109.492+-1.358         ? might be 1.0149x slower
   stanford-crypto-sha256-iterative           42.476+-0.132             42.365+-0.136         

   <arithmetic>                               61.083+-0.390             60.723+-0.350           might be 1.0059x faster

                                                 Baseline                  Globals                                      
AsmBench:
   bigfib.cpp                               496.3222+-3.4619          496.2094+-2.2564        
   container.cpp                           3237.5482+-19.3035        3221.4546+-7.1316        
   dry.c                                    478.5976+-13.6448    ?    485.1730+-17.6073       ? might be 1.0137x slower
   float-mm.c                               731.2737+-5.5353     ?    733.5684+-3.4189        ?
   gcc-loops.cpp                           4310.5413+-17.6605        4309.5690+-8.7882        
   hash-map                                 164.6410+-1.8104     ?    165.1901+-2.6786        ?
   n-body.c                                 890.8668+-3.5407     ?    891.5270+-3.9239        ?
   quicksort.c                              410.7419+-4.2702     ?    411.4000+-4.7954        ?
   towers.c                                 282.9170+-4.7584          280.5052+-3.2118        

   <geometric>                              701.7434+-1.9630     ?    702.3848+-3.6333        ? might be 1.0009x slower

                                                 Baseline                  Globals                                      
Geomean of preferred means:
   <scaled-result>                           56.1831+-0.0888           55.9945+-0.1111          might be 1.0034x faster
Comment 11 Yusuke Suzuki 2016-04-11 04:35:55 PDT
Comment on attachment 276137 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276137&action=review

Added some comments.

> Source/JavaScriptCore/runtime/SymbolTable.h:217
> +    }

Seeing SymbolTableEntry, I found that FatEntry has costly copying path. And it does not have any move constructors (since we define copy constructors, implicit move constructors are not defined).
To alleviate this situation, I added move constructors.
Comment 12 Yusuke Suzuki 2016-04-11 06:06:36 PDT
Created attachment 276144 [details]
Patch

Windows build fix
Comment 13 Saam Barati 2016-04-11 08:41:29 PDT
Comment on attachment 276144 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review

> Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162
> +    Invalidate

Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.)

> Source/JavaScriptCore/runtime/SymbolTable.h:572
> +    template<typename Entry>

Why the templates for this method?
Comment 14 Yusuke Suzuki 2016-04-12 01:22:28 PDT
Comment on attachment 276144 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review

Thanks!

>> Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162
>> +    Invalidate
> 
> Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.)

4 spaces here :)

>> Source/JavaScriptCore/runtime/SymbolTable.h:572
>> +    template<typename Entry>
> 
> Why the templates for this method?

This is because we don't want to create duplicate methods.

void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry&& entry);

and

void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry& entry);

C++11 has an interesting rule.
When you define a templatized `T&&` and when you pass the lvalue, it becomes `T&`.
So, you can pass the lvalue even if you just define the method with `T&&`.

For example,

void test(int&& value);

int value = 20;
test(value);  // compile error. value is lvalue ref.
test(20);  // compiling is OK.

But,

template<typename T>
void test(T&& value);

int value = 20;
test(value);  // OK!
test(20);  // OK!



And we should use std::forward instead of std::move.
std::forward is used for this cases.

std::move always converts to T&&.
But std::forward is not the same. It converts the rvalue to T&&, but it converts lvalue to T&.
Comment 15 Yusuke Suzuki 2016-04-12 01:25:45 PDT
Committed r199342: <http://trac.webkit.org/changeset/199342>