Bug 155110

Summary: [JSC] addStaticGlobals should emit SymbolTableEntry watchpoints to encourage constant folding in DFG
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, fpizlo, ggaren, keith_miller, mark.lam, msaboff, saam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews114 for mac-yosemite
none
Patch
none
Patch saam: review+

Yusuke Suzuki
Reported 2016-03-07 07:16:50 PST
Now, (maybe, accidentally), addStaticGlobals does not emit SymbolTableEntry watchpoints for the added entries. So, all the global variable lookups pointing to these static globals are not converted into constants in BytecodeGenerator: this fact leaves these lookups as GetGlobalVar. But, static globals are not configurable. And they are typically non-writable. So they are constant in almost all the cases. This patch initializes watchpoints for these static globals. These watchpoints allow DFG to convert these nodes into constants in DFG BytecodeParser. These watchpoints includes some builtin operations and `undefined`!
Attachments
Patch (13.05 KB, patch)
2016-03-07 07:37 PST, Yusuke Suzuki
no flags
Patch (13.02 KB, patch)
2016-03-07 07:41 PST, Yusuke Suzuki
no flags
Patch (17.36 KB, patch)
2016-03-11 13:52 PST, Yusuke Suzuki
no flags
Patch (18.13 KB, patch)
2016-04-04 04:49 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews114 for mac-yosemite (366.58 KB, application/zip)
2016-04-04 05:37 PDT, Build Bot
no flags
Patch (13.77 KB, patch)
2016-04-11 04:30 PDT, Yusuke Suzuki
no flags
Patch (14.45 KB, patch)
2016-04-11 06:06 PDT, Yusuke Suzuki
saam: review+
Yusuke Suzuki
Comment 1 2016-03-07 07:37:42 PST
Created attachment 273178 [details] Patch part 1
Yusuke Suzuki
Comment 2 2016-03-07 07:41:02 PST
Created attachment 273179 [details] Patch part 2
Yusuke Suzuki
Comment 3 2016-03-11 13:52:05 PST
Yusuke Suzuki
Comment 4 2016-03-12 03:48:33 PST
After applying this change, we measure some performance regression on array-prototype-reduceRight test in JSRegress. IIRC, https://bugs.webkit.org/show_bug.cgi?id=154844 can remove this regression, so I'll first do https://bugs.webkit.org/show_bug.cgi?id=154844.
Yusuke Suzuki
Comment 5 2016-04-04 04:49:05 PDT
Build Bot
Comment 6 2016-04-04 05:37:26 PDT
Comment on attachment 275545 [details] Patch Attachment 275545 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1097537 Number of test failures exceeded the failure limit.
Build Bot
Comment 7 2016-04-04 05:37:28 PDT
Created attachment 275546 [details] Archive of layout-test-results from ews114 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews114 Port: mac-yosemite Platform: Mac OS X 10.10.5
Yusuke Suzuki
Comment 8 2016-04-04 05:54:59 PDT
Comment on attachment 275545 [details] Patch Still WIP.
Yusuke Suzuki
Comment 9 2016-04-11 04:30:39 PDT
Yusuke Suzuki
Comment 10 2016-04-11 04:33:16 PDT
Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on dandelion (MacBookPro10,1). VMs tested: "Baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/globals-master/Release/jsc "Globals" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/g5/Release/jsc Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. Baseline Globals SunSpider: 3d-cube 6.1136+-0.0935 6.0450+-0.0526 might be 1.0113x faster 3d-morph 5.8109+-0.0442 ^ 5.7021+-0.0318 ^ definitely 1.0191x faster 3d-raytrace 6.6403+-0.1148 6.5815+-0.0498 access-binary-trees 2.3595+-0.0661 ? 2.4184+-0.1082 ? might be 1.0250x slower access-fannkuch 6.6221+-0.0225 ? 6.6243+-0.0314 ? access-nbody 3.1819+-0.2342 3.0983+-0.2441 might be 1.0270x faster access-nsieve 3.1227+-0.0988 ? 3.1822+-0.1108 ? might be 1.0190x slower bitops-3bit-bits-in-byte 1.4233+-0.1350 1.2878+-0.0750 might be 1.1052x faster bitops-bits-in-byte 3.2470+-0.2376 3.1207+-0.1591 might be 1.0405x faster bitops-bitwise-and 2.2525+-0.1753 2.1996+-0.1571 might be 1.0241x faster bitops-nsieve-bits 3.3694+-0.0685 ? 3.4281+-0.1762 ? might be 1.0174x slower controlflow-recursive 2.6635+-0.0883 2.6492+-0.0836 crypto-aes 4.7409+-0.1255 ? 4.8404+-0.1462 ? might be 1.0210x slower crypto-md5 2.9343+-0.1313 ? 2.9436+-0.1573 ? crypto-sha1 2.7653+-0.1314 2.7324+-0.1474 might be 1.0120x faster date-format-tofte 9.1097+-0.1255 ? 9.1619+-0.1676 ? date-format-xparb 5.8352+-0.4340 5.8310+-0.5341 math-cordic 3.1780+-0.2292 3.0309+-0.0762 might be 1.0485x faster math-partial-sums 5.5181+-0.0985 ? 5.5448+-0.1357 ? math-spectral-norm 2.3109+-0.1936 2.2401+-0.1435 might be 1.0316x faster regexp-dna 7.3235+-0.0835 ? 7.3400+-0.0769 ? string-base64 5.4904+-0.0171 5.4526+-0.0898 string-fasta 6.6854+-0.3892 6.3913+-0.1397 might be 1.0460x faster string-tagcloud 9.0543+-0.0971 9.0055+-0.0667 string-unpack-code 19.3328+-0.1389 19.2624+-0.0763 string-validate-input 4.9182+-0.1033 4.8903+-0.0306 <arithmetic> 5.2309+-0.0322 5.1925+-0.0267 might be 1.0074x faster Baseline Globals LongSpider: 3d-cube 916.2048+-5.8220 ? 920.5600+-10.9478 ? 3d-morph 673.4544+-4.7696 672.2407+-3.8510 3d-raytrace 722.4553+-4.9574 717.3063+-6.0604 access-binary-trees 973.2210+-5.9480 ? 977.3012+-3.8647 ? access-fannkuch 304.2281+-8.7583 304.1552+-9.3859 access-nbody 576.4872+-7.1122 574.8656+-4.3767 access-nsieve 399.1779+-3.9584 ? 401.0313+-3.9492 ? bitops-3bit-bits-in-byte 33.1424+-1.1743 33.1211+-1.1888 bitops-bits-in-byte 130.2072+-2.8976 126.8325+-1.1433 might be 1.0266x faster bitops-nsieve-bits 394.6471+-2.0726 394.3495+-2.1760 controlflow-recursive 517.0124+-4.6390 ? 518.6992+-6.4181 ? crypto-aes 721.2280+-5.6501 716.8040+-5.0910 crypto-md5 516.4913+-4.6946 515.4666+-3.9111 crypto-sha1 694.2994+-5.3092 ? 698.8859+-5.9099 ? date-format-tofte 743.7155+-12.2938 ? 750.8892+-8.5127 ? date-format-xparb 740.2295+-18.3272 ? 742.1942+-12.7894 ? hash-map 165.4574+-1.9673 163.7309+-1.7109 might be 1.0105x faster math-cordic 504.1569+-2.1161 ? 505.9414+-4.1270 ? math-partial-sums 484.8505+-4.3969 483.7533+-1.5952 math-spectral-norm 558.7908+-5.2618 557.7940+-4.2058 string-base64 484.0460+-4.9470 482.8081+-4.0482 string-fasta 387.0775+-3.4383 ? 389.6418+-4.8587 ? string-tagcloud 194.7116+-1.9728 191.5824+-1.2712 might be 1.0163x faster <geometric> 426.7983+-1.2807 426.1545+-1.0405 might be 1.0015x faster Baseline Globals Octane: encrypt 0.18655+-0.00213 0.18572+-0.00093 decrypt 3.25772+-0.00609 ? 3.26802+-0.02063 ? deltablue x2 0.15764+-0.00157 0.15661+-0.00158 earley 0.34864+-0.00180 0.34770+-0.00155 boyer 5.39879+-0.01216 ! 5.45084+-0.02442 ! definitely 1.0096x slower navier-stokes x2 5.03613+-0.01319 5.02649+-0.01372 raytrace x2 0.99191+-0.00767 ? 1.00541+-0.01106 ? might be 1.0136x slower richards x2 0.09991+-0.00106 ? 0.10037+-0.00109 ? splay x2 0.40398+-0.00392 ^ 0.39714+-0.00219 ^ definitely 1.0172x faster regexp x2 19.50037+-0.24156 19.39496+-0.16158 pdfjs x2 47.01394+-0.42742 46.99978+-0.28597 mandreel x2 47.77950+-0.39967 47.26620+-0.17380 might be 1.0109x faster gbemu x2 29.42102+-0.35691 29.37393+-0.36826 closure 0.61575+-0.00318 0.61095+-0.00229 jquery 7.95908+-0.02400 ? 7.97821+-0.02185 ? box2d x2 10.77104+-0.04706 ? 10.77658+-0.02816 ? zlib x2 386.02963+-10.63456 379.82726+-11.12266 might be 1.0163x faster typescript x2 798.93118+-1.83252 796.33325+-4.18360 <geometric> 5.85009+-0.00769 5.83292+-0.02200 might be 1.0029x faster Baseline Globals Kraken: ai-astar 91.885+-0.277 ? 92.087+-0.439 ? audio-beat-detection 48.918+-0.672 48.688+-0.416 audio-dft 106.372+-0.885 105.973+-0.750 audio-fft 38.929+-0.767 38.685+-0.235 audio-oscillator 54.560+-1.041 54.061+-0.149 imaging-darkroom 70.040+-1.250 69.289+-1.103 might be 1.0108x faster imaging-desaturate 58.319+-1.481 56.835+-0.230 might be 1.0261x faster imaging-gaussian-blur 78.203+-4.005 77.683+-3.446 json-parse-financial 45.561+-0.250 ^ 44.140+-0.297 ^ definitely 1.0322x faster json-stringify-tinderbox 27.231+-0.099 ^ 26.747+-0.091 ^ definitely 1.0181x faster stanford-crypto-aes 43.270+-1.300 42.567+-0.867 might be 1.0165x faster stanford-crypto-ccm 41.517+-1.232 41.516+-2.300 stanford-crypto-pbkdf2 107.880+-1.962 ? 109.492+-1.358 ? might be 1.0149x slower stanford-crypto-sha256-iterative 42.476+-0.132 42.365+-0.136 <arithmetic> 61.083+-0.390 60.723+-0.350 might be 1.0059x faster Baseline Globals AsmBench: bigfib.cpp 496.3222+-3.4619 496.2094+-2.2564 container.cpp 3237.5482+-19.3035 3221.4546+-7.1316 dry.c 478.5976+-13.6448 ? 485.1730+-17.6073 ? might be 1.0137x slower float-mm.c 731.2737+-5.5353 ? 733.5684+-3.4189 ? gcc-loops.cpp 4310.5413+-17.6605 4309.5690+-8.7882 hash-map 164.6410+-1.8104 ? 165.1901+-2.6786 ? n-body.c 890.8668+-3.5407 ? 891.5270+-3.9239 ? quicksort.c 410.7419+-4.2702 ? 411.4000+-4.7954 ? towers.c 282.9170+-4.7584 280.5052+-3.2118 <geometric> 701.7434+-1.9630 ? 702.3848+-3.6333 ? might be 1.0009x slower Baseline Globals Geomean of preferred means: <scaled-result> 56.1831+-0.0888 55.9945+-0.1111 might be 1.0034x faster
Yusuke Suzuki
Comment 11 2016-04-11 04:35:55 PDT
Comment on attachment 276137 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276137&action=review Added some comments. > Source/JavaScriptCore/runtime/SymbolTable.h:217 > + } Seeing SymbolTableEntry, I found that FatEntry has costly copying path. And it does not have any move constructors (since we define copy constructors, implicit move constructors are not defined). To alleviate this situation, I added move constructors.
Yusuke Suzuki
Comment 12 2016-04-11 06:06:36 PDT
Created attachment 276144 [details] Patch Windows build fix
Saam Barati
Comment 13 2016-04-11 08:41:29 PDT
Comment on attachment 276144 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review > Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162 > + Invalidate Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.) > Source/JavaScriptCore/runtime/SymbolTable.h:572 > + template<typename Entry> Why the templates for this method?
Yusuke Suzuki
Comment 14 2016-04-12 01:22:28 PDT
Comment on attachment 276144 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review Thanks! >> Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162 >> + Invalidate > > Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.) 4 spaces here :) >> Source/JavaScriptCore/runtime/SymbolTable.h:572 >> + template<typename Entry> > > Why the templates for this method? This is because we don't want to create duplicate methods. void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry&& entry); and void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry& entry); C++11 has an interesting rule. When you define a templatized `T&&` and when you pass the lvalue, it becomes `T&`. So, you can pass the lvalue even if you just define the method with `T&&`. For example, void test(int&& value); int value = 20; test(value); // compile error. value is lvalue ref. test(20); // compiling is OK. But, template<typename T> void test(T&& value); int value = 20; test(value); // OK! test(20); // OK! And we should use std::forward instead of std::move. std::forward is used for this cases. std::move always converts to T&&. But std::forward is not the same. It converts the rvalue to T&&, but it converts lvalue to T&.
Yusuke Suzuki
Comment 15 2016-04-12 01:25:45 PDT
Note You need to log in before you can comment on or make changes to this bug.