Summary: | [JSC] addStaticGlobals should emit SymbolTableEntry watchpoints to encourage constant folding in DFG | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Yusuke Suzuki <ysuzuki> | ||||||||||||||||
Component: | JavaScriptCore | Assignee: | Yusuke Suzuki <ysuzuki> | ||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||
Severity: | Normal | CC: | benjamin, commit-queue, fpizlo, ggaren, keith_miller, mark.lam, msaboff, saam | ||||||||||||||||
Priority: | P2 | ||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||
Attachments: |
|
Description
Yusuke Suzuki
2016-03-07 07:16:50 PST
Created attachment 273178 [details]
Patch
part 1
Created attachment 273179 [details]
Patch
part 2
Created attachment 273763 [details]
Patch
After applying this change, we measure some performance regression on array-prototype-reduceRight test in JSRegress. IIRC, https://bugs.webkit.org/show_bug.cgi?id=154844 can remove this regression, so I'll first do https://bugs.webkit.org/show_bug.cgi?id=154844. Created attachment 275545 [details]
Patch
Comment on attachment 275545 [details] Patch Attachment 275545 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1097537 Number of test failures exceeded the failure limit. Created attachment 275546 [details]
Archive of layout-test-results from ews114 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114 Port: mac-yosemite Platform: Mac OS X 10.10.5
Comment on attachment 275545 [details]
Patch
Still WIP.
Created attachment 276137 [details]
Patch
Benchmark report for SunSpider, LongSpider, Octane, Kraken, and AsmBench on dandelion (MacBookPro10,1). VMs tested: "Baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/globals-master/Release/jsc "Globals" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/g5/Release/jsc Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. Baseline Globals SunSpider: 3d-cube 6.1136+-0.0935 6.0450+-0.0526 might be 1.0113x faster 3d-morph 5.8109+-0.0442 ^ 5.7021+-0.0318 ^ definitely 1.0191x faster 3d-raytrace 6.6403+-0.1148 6.5815+-0.0498 access-binary-trees 2.3595+-0.0661 ? 2.4184+-0.1082 ? might be 1.0250x slower access-fannkuch 6.6221+-0.0225 ? 6.6243+-0.0314 ? access-nbody 3.1819+-0.2342 3.0983+-0.2441 might be 1.0270x faster access-nsieve 3.1227+-0.0988 ? 3.1822+-0.1108 ? might be 1.0190x slower bitops-3bit-bits-in-byte 1.4233+-0.1350 1.2878+-0.0750 might be 1.1052x faster bitops-bits-in-byte 3.2470+-0.2376 3.1207+-0.1591 might be 1.0405x faster bitops-bitwise-and 2.2525+-0.1753 2.1996+-0.1571 might be 1.0241x faster bitops-nsieve-bits 3.3694+-0.0685 ? 3.4281+-0.1762 ? might be 1.0174x slower controlflow-recursive 2.6635+-0.0883 2.6492+-0.0836 crypto-aes 4.7409+-0.1255 ? 4.8404+-0.1462 ? might be 1.0210x slower crypto-md5 2.9343+-0.1313 ? 2.9436+-0.1573 ? crypto-sha1 2.7653+-0.1314 2.7324+-0.1474 might be 1.0120x faster date-format-tofte 9.1097+-0.1255 ? 9.1619+-0.1676 ? date-format-xparb 5.8352+-0.4340 5.8310+-0.5341 math-cordic 3.1780+-0.2292 3.0309+-0.0762 might be 1.0485x faster math-partial-sums 5.5181+-0.0985 ? 5.5448+-0.1357 ? math-spectral-norm 2.3109+-0.1936 2.2401+-0.1435 might be 1.0316x faster regexp-dna 7.3235+-0.0835 ? 7.3400+-0.0769 ? string-base64 5.4904+-0.0171 5.4526+-0.0898 string-fasta 6.6854+-0.3892 6.3913+-0.1397 might be 1.0460x faster string-tagcloud 9.0543+-0.0971 9.0055+-0.0667 string-unpack-code 19.3328+-0.1389 19.2624+-0.0763 string-validate-input 4.9182+-0.1033 4.8903+-0.0306 <arithmetic> 5.2309+-0.0322 5.1925+-0.0267 might be 1.0074x faster Baseline Globals LongSpider: 3d-cube 916.2048+-5.8220 ? 920.5600+-10.9478 ? 3d-morph 673.4544+-4.7696 672.2407+-3.8510 3d-raytrace 722.4553+-4.9574 717.3063+-6.0604 access-binary-trees 973.2210+-5.9480 ? 977.3012+-3.8647 ? access-fannkuch 304.2281+-8.7583 304.1552+-9.3859 access-nbody 576.4872+-7.1122 574.8656+-4.3767 access-nsieve 399.1779+-3.9584 ? 401.0313+-3.9492 ? bitops-3bit-bits-in-byte 33.1424+-1.1743 33.1211+-1.1888 bitops-bits-in-byte 130.2072+-2.8976 126.8325+-1.1433 might be 1.0266x faster bitops-nsieve-bits 394.6471+-2.0726 394.3495+-2.1760 controlflow-recursive 517.0124+-4.6390 ? 518.6992+-6.4181 ? crypto-aes 721.2280+-5.6501 716.8040+-5.0910 crypto-md5 516.4913+-4.6946 515.4666+-3.9111 crypto-sha1 694.2994+-5.3092 ? 698.8859+-5.9099 ? date-format-tofte 743.7155+-12.2938 ? 750.8892+-8.5127 ? date-format-xparb 740.2295+-18.3272 ? 742.1942+-12.7894 ? hash-map 165.4574+-1.9673 163.7309+-1.7109 might be 1.0105x faster math-cordic 504.1569+-2.1161 ? 505.9414+-4.1270 ? math-partial-sums 484.8505+-4.3969 483.7533+-1.5952 math-spectral-norm 558.7908+-5.2618 557.7940+-4.2058 string-base64 484.0460+-4.9470 482.8081+-4.0482 string-fasta 387.0775+-3.4383 ? 389.6418+-4.8587 ? string-tagcloud 194.7116+-1.9728 191.5824+-1.2712 might be 1.0163x faster <geometric> 426.7983+-1.2807 426.1545+-1.0405 might be 1.0015x faster Baseline Globals Octane: encrypt 0.18655+-0.00213 0.18572+-0.00093 decrypt 3.25772+-0.00609 ? 3.26802+-0.02063 ? deltablue x2 0.15764+-0.00157 0.15661+-0.00158 earley 0.34864+-0.00180 0.34770+-0.00155 boyer 5.39879+-0.01216 ! 5.45084+-0.02442 ! definitely 1.0096x slower navier-stokes x2 5.03613+-0.01319 5.02649+-0.01372 raytrace x2 0.99191+-0.00767 ? 1.00541+-0.01106 ? might be 1.0136x slower richards x2 0.09991+-0.00106 ? 0.10037+-0.00109 ? splay x2 0.40398+-0.00392 ^ 0.39714+-0.00219 ^ definitely 1.0172x faster regexp x2 19.50037+-0.24156 19.39496+-0.16158 pdfjs x2 47.01394+-0.42742 46.99978+-0.28597 mandreel x2 47.77950+-0.39967 47.26620+-0.17380 might be 1.0109x faster gbemu x2 29.42102+-0.35691 29.37393+-0.36826 closure 0.61575+-0.00318 0.61095+-0.00229 jquery 7.95908+-0.02400 ? 7.97821+-0.02185 ? box2d x2 10.77104+-0.04706 ? 10.77658+-0.02816 ? zlib x2 386.02963+-10.63456 379.82726+-11.12266 might be 1.0163x faster typescript x2 798.93118+-1.83252 796.33325+-4.18360 <geometric> 5.85009+-0.00769 5.83292+-0.02200 might be 1.0029x faster Baseline Globals Kraken: ai-astar 91.885+-0.277 ? 92.087+-0.439 ? audio-beat-detection 48.918+-0.672 48.688+-0.416 audio-dft 106.372+-0.885 105.973+-0.750 audio-fft 38.929+-0.767 38.685+-0.235 audio-oscillator 54.560+-1.041 54.061+-0.149 imaging-darkroom 70.040+-1.250 69.289+-1.103 might be 1.0108x faster imaging-desaturate 58.319+-1.481 56.835+-0.230 might be 1.0261x faster imaging-gaussian-blur 78.203+-4.005 77.683+-3.446 json-parse-financial 45.561+-0.250 ^ 44.140+-0.297 ^ definitely 1.0322x faster json-stringify-tinderbox 27.231+-0.099 ^ 26.747+-0.091 ^ definitely 1.0181x faster stanford-crypto-aes 43.270+-1.300 42.567+-0.867 might be 1.0165x faster stanford-crypto-ccm 41.517+-1.232 41.516+-2.300 stanford-crypto-pbkdf2 107.880+-1.962 ? 109.492+-1.358 ? might be 1.0149x slower stanford-crypto-sha256-iterative 42.476+-0.132 42.365+-0.136 <arithmetic> 61.083+-0.390 60.723+-0.350 might be 1.0059x faster Baseline Globals AsmBench: bigfib.cpp 496.3222+-3.4619 496.2094+-2.2564 container.cpp 3237.5482+-19.3035 3221.4546+-7.1316 dry.c 478.5976+-13.6448 ? 485.1730+-17.6073 ? might be 1.0137x slower float-mm.c 731.2737+-5.5353 ? 733.5684+-3.4189 ? gcc-loops.cpp 4310.5413+-17.6605 4309.5690+-8.7882 hash-map 164.6410+-1.8104 ? 165.1901+-2.6786 ? n-body.c 890.8668+-3.5407 ? 891.5270+-3.9239 ? quicksort.c 410.7419+-4.2702 ? 411.4000+-4.7954 ? towers.c 282.9170+-4.7584 280.5052+-3.2118 <geometric> 701.7434+-1.9630 ? 702.3848+-3.6333 ? might be 1.0009x slower Baseline Globals Geomean of preferred means: <scaled-result> 56.1831+-0.0888 55.9945+-0.1111 might be 1.0034x faster Comment on attachment 276137 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276137&action=review Added some comments. > Source/JavaScriptCore/runtime/SymbolTable.h:217 > + } Seeing SymbolTableEntry, I found that FatEntry has costly copying path. And it does not have any move constructors (since we define copy constructors, implicit move constructors are not defined). To alleviate this situation, I added move constructors. Created attachment 276144 [details]
Patch
Windows build fix
Comment on attachment 276144 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review > Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162 > + Invalidate Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.) > Source/JavaScriptCore/runtime/SymbolTable.h:572 > + template<typename Entry> Why the templates for this method? Comment on attachment 276144 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=276144&action=review Thanks! >> Source/JavaScriptCore/runtime/JSSymbolTableObject.h:162 >> + Invalidate > > Style: spacing looks off here. (I'm on my phone so maybe I'm wrong and bugzilla is showing it weirdly.) 4 spaces here :) >> Source/JavaScriptCore/runtime/SymbolTable.h:572 >> + template<typename Entry> > > Why the templates for this method? This is because we don't want to create duplicate methods. void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry&& entry); and void add(const ConcurrentJITLocker&, UniquedStringImpl* key, Entry& entry); C++11 has an interesting rule. When you define a templatized `T&&` and when you pass the lvalue, it becomes `T&`. So, you can pass the lvalue even if you just define the method with `T&&`. For example, void test(int&& value); int value = 20; test(value); // compile error. value is lvalue ref. test(20); // compiling is OK. But, template<typename T> void test(T&& value); int value = 20; test(value); // OK! test(20); // OK! And we should use std::forward instead of std::move. std::forward is used for this cases. std::move always converts to T&&. But std::forward is not the same. It converts the rvalue to T&&, but it converts lvalue to T&. Committed r199342: <http://trac.webkit.org/changeset/199342> |