RESOLVED FIXED Bug 152723
[DFG][FTL] Implement ES6 Generators in DFG / FTL
https://bugs.webkit.org/show_bug.cgi?id=152723
Yusuke Suzuki
Reported 2016-01-04 18:16:28 PST
...
Attachments
Patch (53.44 KB, patch)
2016-01-11 02:57 PST, Yusuke Suzuki
no flags
Patch (57.41 KB, patch)
2016-01-11 04:31 PST, Yusuke Suzuki
no flags
Patch (72.42 KB, patch)
2016-01-14 07:25 PST, Yusuke Suzuki
no flags
Patch (79.77 KB, patch)
2016-01-14 10:06 PST, Yusuke Suzuki
no flags
Patch (84.26 KB, patch)
2016-01-15 04:44 PST, Yusuke Suzuki
no flags
Patch (88.58 KB, patch)
2016-01-15 07:17 PST, Yusuke Suzuki
no flags
Patch (87.79 KB, patch)
2016-01-15 08:11 PST, Yusuke Suzuki
no flags
Patch (92.06 KB, patch)
2016-01-15 10:19 PST, Yusuke Suzuki
no flags
Patch (93.10 KB, patch)
2016-01-15 23:01 PST, Yusuke Suzuki
no flags
Patch (96.69 KB, patch)
2016-01-16 00:13 PST, Yusuke Suzuki
no flags
Patch (97.53 KB, patch)
2016-01-16 02:11 PST, Yusuke Suzuki
no flags
Patch (99.50 KB, patch)
2016-01-16 03:00 PST, Yusuke Suzuki
no flags
Patch (103.33 KB, patch)
2016-01-16 07:00 PST, Yusuke Suzuki
no flags
Patch (107.12 KB, patch)
2016-01-16 08:15 PST, Yusuke Suzuki
no flags
Patch (107.12 KB, patch)
2016-01-16 08:16 PST, Yusuke Suzuki
no flags
Patch (112.59 KB, patch)
2016-01-16 09:03 PST, Yusuke Suzuki
no flags
Patch (113.00 KB, patch)
2016-01-16 09:20 PST, Yusuke Suzuki
no flags
Patch (124.35 KB, patch)
2016-01-19 12:18 PST, Yusuke Suzuki
no flags
Patch (124.29 KB, patch)
2016-01-19 12:29 PST, Yusuke Suzuki
no flags
Patch (128.14 KB, patch)
2016-01-20 22:53 PST, Yusuke Suzuki
no flags
Patch (121.88 KB, patch)
2016-04-27 21:23 PDT, Yusuke Suzuki
no flags
Patch (138.86 KB, patch)
2016-05-05 09:36 PDT, Yusuke Suzuki
no flags
Patch (136.99 KB, patch)
2016-05-18 20:14 PDT, Yusuke Suzuki
no flags
Patch (180.58 KB, patch)
2016-07-03 12:50 PDT, Yusuke Suzuki
no flags
Patch (195.30 KB, patch)
2016-07-04 04:48 PDT, Yusuke Suzuki
no flags
Patch (180.01 KB, patch)
2016-07-04 06:33 PDT, Yusuke Suzuki
no flags
Patch (182.91 KB, patch)
2016-07-04 07:47 PDT, Yusuke Suzuki
no flags
Patch (145.20 KB, patch)
2016-07-11 05:14 PDT, Yusuke Suzuki
no flags
Patch (150.08 KB, patch)
2016-07-12 12:21 PDT, Yusuke Suzuki
no flags
Patch (153.66 KB, patch)
2016-07-14 23:33 PDT, Yusuke Suzuki
no flags
Patch (165.48 KB, patch)
2016-07-15 04:47 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews112 for mac-yosemite (1.32 MB, application/zip)
2016-07-15 06:01 PDT, Build Bot
no flags
Patch (169.13 KB, patch)
2016-07-15 09:36 PDT, Yusuke Suzuki
no flags
Patch (182.12 KB, patch)
2016-07-19 04:24 PDT, Yusuke Suzuki
no flags
Patch (185.10 KB, patch)
2016-07-19 06:46 PDT, Yusuke Suzuki
no flags
Patch (185.16 KB, patch)
2016-07-20 05:36 PDT, Yusuke Suzuki
no flags
Patch (186.27 KB, patch)
2016-07-21 06:38 PDT, Yusuke Suzuki
no flags
Patch (186.29 KB, patch)
2016-07-21 06:48 PDT, Yusuke Suzuki
no flags
Patch (187.15 KB, patch)
2016-07-21 07:07 PDT, Yusuke Suzuki
no flags
Patch (187.32 KB, patch)
2016-07-21 07:57 PDT, Yusuke Suzuki
no flags
Patch (191.12 KB, patch)
2016-07-21 10:26 PDT, Yusuke Suzuki
no flags
Patch (193.45 KB, patch)
2016-07-25 07:40 PDT, Yusuke Suzuki
no flags
Patch (195.06 KB, patch)
2016-07-25 08:50 PDT, Yusuke Suzuki
no flags
Patch (195.02 KB, patch)
2016-07-25 08:57 PDT, Yusuke Suzuki
no flags
Patch (238.43 KB, patch)
2016-08-08 07:57 PDT, Yusuke Suzuki
no flags
Patch (216.74 KB, patch)
2016-08-09 01:17 PDT, Yusuke Suzuki
no flags
Patch (217.19 KB, patch)
2016-08-19 22:10 PDT, Yusuke Suzuki
no flags
Patch (217.49 KB, patch)
2016-08-23 11:58 PDT, Yusuke Suzuki
no flags
Patch (217.72 KB, patch)
2016-08-23 13:30 PDT, Yusuke Suzuki
no flags
Patch (227.27 KB, patch)
2016-08-23 20:02 PDT, Yusuke Suzuki
no flags
Patch (230.02 KB, patch)
2016-08-24 10:28 PDT, Yusuke Suzuki
no flags
Patch (186.50 KB, patch)
2016-08-24 12:55 PDT, Yusuke Suzuki
no flags
Patch (185.15 KB, patch)
2016-08-24 14:51 PDT, Yusuke Suzuki
no flags
Patch (188.03 KB, patch)
2016-08-24 16:01 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews102 for mac-yosemite (860.69 KB, application/zip)
2016-08-24 16:53 PDT, Build Bot
no flags
Archive of layout-test-results from ews106 for mac-yosemite-wk2 (560.51 KB, application/zip)
2016-08-24 17:03 PDT, Build Bot
no flags
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2 (386.18 KB, application/zip)
2016-08-24 17:19 PDT, Build Bot
no flags
Patch (188.06 KB, patch)
2016-08-24 19:49 PDT, Yusuke Suzuki
no flags
Patch (187.14 KB, patch)
2016-08-24 19:58 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews121 for ios-simulator-elcapitan-wk2 (724.89 KB, application/zip)
2016-08-25 01:18 PDT, Build Bot
no flags
Patch (196.46 KB, patch)
2016-08-25 11:34 PDT, Yusuke Suzuki
no flags
Patch (207.73 KB, patch)
2016-08-25 13:44 PDT, Yusuke Suzuki
no flags
Patch (199.87 KB, patch)
2016-08-25 14:18 PDT, Yusuke Suzuki
fpizlo: review+
Patch (213.96 KB, patch)
2016-08-25 15:51 PDT, Yusuke Suzuki
no flags
Patch for landing (204.49 KB, patch)
2016-08-25 15:54 PDT, Yusuke Suzuki
no flags
Yusuke Suzuki
Comment 1 2016-01-11 02:57:21 PST
Created attachment 268682 [details] Patch WIP
Yusuke Suzuki
Comment 2 2016-01-11 04:31:21 PST
Created attachment 268684 [details] Patch WIP part2. Planning (1) transferring type speculation between save/resume and (2) FTL support
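(To illustrate what (1) is about: a local saved at a yield and restored at the corresponding resume should keep its profiled type across the round trip. A hypothetical example, with illustrative names only:)

    // Illustrative only: acc is saved at each yield and restored at each
    // resume; for the DFG to speculate well, the type observed at the save
    // point (int32 vs. double) should carry over to the resume point.
    function* sum(n) {
        let acc = 0;
        for (let i = 0; i < n; ++i)
            acc += yield acc;
        return acc;
    }

    const it = sum(3);
    it.next();                       // prime the generator, yields 0
    console.log(it.next(1.5).value); // feeding doubles flips acc's profile: 1.5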
Yusuke Suzuki
Comment 3 2016-01-14 07:25:41 PST
Created attachment 268960 [details] Patch WIP part3
Yusuke Suzuki
Comment 4 2016-01-14 10:06:45 PST
Created attachment 268979 [details] Patch WIP part4
Yusuke Suzuki
Comment 5 2016-01-15 04:44:18 PST
Created attachment 269045 [details] Patch WIP part5
Yusuke Suzuki
Comment 6 2016-01-15 04:45:49 PST
Still WIP; there are probably still a lot of bugs and I'm still implementing. But it passes the JSC tests and benchmarks:

generator-fib                      121.5236+-6.1317     ^     44.7489+-2.9932        ^ definitely 2.7157x faster
generator-with-several-types       392.1553+-15.2586    ^    156.1691+-20.0224       ^ definitely 2.5111x faster
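(For a sense of what these two lines measure: generator-fib repeatedly resumes a Fibonacci generator, so the save/resume path dominates. A minimal sketch of such a microbenchmark — illustrative only; the actual JSRegress test may differ:)

    // Illustrative generator-fib-style microbenchmark: the hot loop is the
    // generator save/resume machinery that this patch compiles in DFG/FTL.
    function* fib() {
        let a = 1, b = 1;       // saved generator locals, live across every yield
        while (true) {
            yield a;
            const next = a + b;
            a = b;
            b = next;
        }
    }

    let result = 0;
    const iterator = fib();
    for (let i = 0; i < 1e5; ++i)
        result = iterator.next().value;
    console.log(result);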
Yusuke Suzuki
Comment 7 2016-01-15 07:17:58 PST
Created attachment 269049 [details] Patch WIP part6
Yusuke Suzuki
Comment 8 2016-01-15 08:11:15 PST
Created attachment 269053 [details] Patch WIP part7
Yusuke Suzuki
Comment 9 2016-01-15 10:19:42 PST
Created attachment 269064 [details] Patch WIP part9
Yusuke Suzuki
Comment 10 2016-01-15 23:01:55 PST
Created attachment 269155 [details] Patch WIP part10
Yusuke Suzuki
Comment 11 2016-01-16 00:13:41 PST
Created attachment 269160 [details] Patch WIP part11
Yusuke Suzuki
Comment 12 2016-01-16 02:11:21 PST
Created attachment 269163 [details] Patch WIP part12
Yusuke Suzuki
Comment 13 2016-01-16 03:00:26 PST
Created attachment 269164 [details] Patch WIP part13
WebKit Commit Bot
Comment 14 2016-01-16 03:03:14 PST
Attachment 269164 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1791: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 51 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 15 2016-01-16 07:00:34 PST
Created attachment 269166 [details] Patch WIP part14
Yusuke Suzuki
Comment 16 2016-01-16 08:15:10 PST
Yusuke Suzuki
Comment 17 2016-01-16 08:16:52 PST
Created attachment 269170 [details] Patch WIP part15
Yusuke Suzuki
Comment 18 2016-01-16 09:03:08 PST
Created attachment 269171 [details] Patch Now ready for review! I'll paste the benchmark results soon.
WebKit Commit Bot
Comment 19 2016-01-16 09:04:47 PST
Attachment 269171 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1791: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 59 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 20 2016-01-16 09:16:05 PST
Comment on attachment 269171 [details] Patch Fixing 32bit env...
Yusuke Suzuki
Comment 21 2016-01-16 09:20:08 PST
Created attachment 269174 [details] Patch Fix 32bit
WebKit Commit Bot
Comment 22 2016-01-16 09:22:59 PST
Attachment 269174 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1791: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 59 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 23 2016-01-16 09:40:29 PST
Took benchmark results on my OS X laptop. Before taking these benchmark results, I rebooted the laptop and closed all apps except for the terminal. The change only affects generators, so tests that do not include generators should see no effect. Highlighted changes:

                                           Baseline                  Mine
generator-create                          1.2200+-0.5200            0.9627+-0.1419           might be 1.2673x faster
generator-fib                           118.8876+-0.5708     ^     43.5643+-0.3838         ^ definitely 2.7290x faster
generator-function-create                10.0336+-0.3390            9.7493+-1.4768           might be 1.0292x faster
generator-sunspider-access-nsieve         7.8635+-2.5581            5.3915+-0.1714           might be 1.4585x faster
generator-with-several-types            416.8899+-22.3979    ^    164.9109+-24.5890        ^ definitely 2.5280x faster

Benchmark report for SunSpider, LongSpider, V8Spider, Octane, and JSRegress on yusuke (MacBookPro8,2).

VMs tested:
"Baseline" at /Users/yusuke/dev/WebKit/WebKitBuild/Release/jsc
"Mine" at /Users/yusuke/dev/WebKit/WebKitBuild/generator-ftl/Release/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

Outside the generator tests, every individual benchmark in the five suites showed only "might be" differences (noise in both directions), except these four:

string-tagcloud (LongSpider)            221.1089+-3.4552     ^    210.9597+-0.9724         ^ definitely 1.0481x faster
call-spread-call (JSRegress)             26.5223+-0.0734     !     27.0713+-0.3330         ! definitely 1.0207x slower
string-out-of-bounds (JSRegress)         14.3625+-0.1016     !     14.6243+-0.0954         ! definitely 1.0182x slower
string-sub (JSRegress)                   43.5569+-0.3216     !     45.1613+-0.3264         ! definitely 1.0368x slower

Suite summaries:

SunSpider <arithmetic>                    6.0428+-0.2031     ?      6.0525+-0.0963         ? might be 1.0016x slower
LongSpider <geometric>                  509.3734+-4.6004          508.3385+-1.0855           might be 1.0020x faster
V8Spider <geometric>                     62.0024+-1.2359     ?     62.6703+-2.8512         ? might be 1.0108x slower
Octane <geometric>                        6.94957+-0.04246          6.93233+-0.09327         might be 1.0025x faster
JSRegress <geometric>                    11.3749+-0.0582           11.3071+-0.0545           might be 1.0060x faster

Geomean of preferred means:
<scaled-result>                          43.2217+-0.4261     ?     43.2363+-0.3838         ? might be 1.0003x slower
Saam Barati
Comment 24 2016-01-18 19:30:22 PST
Comment on attachment 269174 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269174&action=review

Most of the patch looks good to me. I'm mostly interested in why you chose *_hint opcodes + op_nop. I don't think this is the best way to go about things and it seems like an anti-pattern. More questions about that below:

> Source/JavaScriptCore/bytecode/BytecodeList.json:139
> +            { "name" : "op_save_hint", "length" : 4 },
> +            { "name" : "op_resume_hint", "length" : 4 },

These opcodes seem like anti-patterns to what we normally do with bytecode (i.e. these opcodes are basically used as flags and never even make it to the point where they're executed). I can't see a good reason why we really need them. It's nice being able to compile op_resume into a linear stream of stores, but this can be done easily inside the baseline JIT just by consulting the liveness data. It would seem like we wouldn't get this property in the LLInt, but I don't think that matters.

Is there a reason I'm overlooking for why we really need these opcodes? It seems like we could solve their use cases by other means.

> Source/JavaScriptCore/bytecode/BytecodeList.json:141
> +            { "name" : "op_clear_generator_frame", "length" : 2 },

I'm not against having this opcode, but it doesn't seem like it's strictly necessary if you no longer use resume_hint + resume_from_generator_frame. It's always emitted right after op_resume, so we could just bake it into that opcode for the LLInt/baseline implementation. And then we could still lower it to a separate node inside the DFG. (Again, if we don't use the resume_hint approach.)

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2378
> +        for (size_t i = 0; i < 4; ++i)

We still should have a discussion about whether we want this approach of having the *_hint opcodes and the nop opcode, but, if we do decide we want that approach, we should use OPCODE_LENGTH here instead of "4" anywhere the length is hard-coded.

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:2524
> +        forNode(node).makeHeapTop();

This should be bytecode top if GetGeneratorLocal could give us the TDZ value.

> Source/JavaScriptCore/dfg/DFGClobberize.h:1117
> +        read(GeneratorLocalCount);

Why is this true?

> Source/JavaScriptCore/dfg/DFGClobberize.h:1123
> +        read(GeneratorLocalCount);

Why is this true?

> Source/JavaScriptCore/parser/Parser.cpp:255
> +        functionInfo.parameters = createGeneratorParameters(context, parameterCount);

Shouldn't this be passing in info.parameterCount?
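(To make the liveness point concrete: only locals live across a given yield need a store on save and a load on resume, so the baseline JIT can emit exactly that linear stream of stores by consulting the liveness data. A hypothetical example; names are illustrative, not from the patch:)

    // Illustrative only. Liveness at each yield decides what must be stored:
    function* example() {
        let a = 1;        // live across both yields: must be saved and restored
        let t = a * 2;    // not used after the first yield: no save/restore needed
        yield t;
        let b = a + 1;    // live across the second yield
        yield b;
        return a + b;
    }

    const it = example();
    console.log(it.next().value);  // 2
    console.log(it.next().value);  // 2
    console.log(it.next().value);  // 3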
Yusuke Suzuki
Comment 25 2016-01-19 04:35:54 PST
Comment on attachment 269174 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269174&action=review

Thank you for your review!

>> Source/JavaScriptCore/bytecode/BytecodeList.json:139
>> +            { "name" : "op_resume_hint", "length" : 4 },
>
> These nodes seem like anti-patterns to what we normally do with byte code (i.e these opcodes are basically
> used as flags and never even make it to the point where they're executed). I can't see a good reason
> why we really need them. It's nice being able to compile op_resume into a linear stream of stores,
> but this can be done easily inside the baseline JIT just by consulting the liveness data.
> It would seem like we wouldn't get this property in the LLInt, but I don't think that matters.
>
> Is there a reason I'm overlooking for why we really need these opcodes?
> It seems like we could solve their use cases by other means.

The reason I took the resume_hint approach is to align the bytecode with the DFG's `set` operations. The old `op_resume` can define multiple locals in one bytecode, while (except for op_enter) every other bytecode defines zero or one local at a time; the hint approach lets each DFG node emitted during op_resume keep m_exitOK = true. But currently op_resume does not cause any problems with m_exitOK = false, so just using op_resume is fine with me. I'll change this.

>> Source/JavaScriptCore/bytecode/BytecodeList.json:141
>> +            { "name" : "op_clear_generator_frame", "length" : 2 },
>
> I'm not against having this opcode, but it doesn't seem like its strictly necessary if you no
> longer use resume_hint + resume_from_generator_frame.
> It's always emitted right after op_resume, so we could just bake it into that opcode
> for LLInt/baseline implementation. And then we could still lower it to a separate node
> inside the DFG. (Again, if we don't use the resume_hint approach).

Right. Since I'll change the design, I'll remove this opcode as well :)

>> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2378
>> +        for (size_t i = 0; i < 4; ++i)
>
> We still should have a discussion if we want this approach of having the *_hint opcodes
> and the nop opcode, but, if we do decide we want that approach, we should use OPCODE_LENGTH
> here instead of "4" anywhere the length is hard coded.

Oops! Thanks, I missed this. I'll change the design, so this part will be removed as well :)

>> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:2524
>> +        forNode(node).makeHeapTop();
>
> This should be bytecode top if GetGeneratorLocal could give us TDZ value.

Nice! You're right. I'll fix this.

>> Source/JavaScriptCore/dfg/DFGClobberize.h:1117
>> +        read(GeneratorLocalCount);
>
> why is this true?

SetGeneratorLocal must not execute before ResizeGeneratorLocals has executed. GeneratorLocalCount represents the slots allocated in the GeneratorFrame, and SetGeneratorLocal depends on those slots.

>> Source/JavaScriptCore/dfg/DFGClobberize.h:1123
>> +        read(GeneratorLocalCount);
>
> why is this true?

Ditto.

> Source/JavaScriptCore/dfg/DFGClobberize.h:1129
> +        write(GeneratorLocalCount);

This means allocating/deallocating the slots in the generator frame, which is what ResizeGeneratorLocals does.

>> Source/JavaScriptCore/parser/Parser.cpp:255
>> +            functionInfo.parameters = createGeneratorParameters(context, parameterCount);
>
> shouldn't this be passing in info.parameterCount?

Passing `info.parameterCount` seems preferable!
Yusuke Suzuki
Comment 26 2016-01-19 11:52:16 PST
Gave saamyjoon some more detailed comments on IRC :)
Yusuke Suzuki
Comment 27 2016-01-19 12:18:37 PST
Created attachment 269279 [details] Patch Revised and redesigned
WebKit Commit Bot
Comment 28 2016-01-19 12:20:20 PST
Attachment 269279 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1833: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 63 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 29 2016-01-19 12:29:25 PST
WebKit Commit Bot
Comment 30 2016-01-19 12:31:32 PST
Attachment 269280 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1833: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 63 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 31 2016-01-20 22:53:56 PST
WebKit Commit Bot
Comment 32 2016-01-20 22:56:27 PST
Attachment 269434 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/parser/Parser.cpp:1833: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 64 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 33 2016-01-22 15:13:37 PST
Comment on attachment 269434 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review

> Source/JavaScriptCore/ChangeLog:33
> +        [entry] -+--> [save block] -> op_save (it is branching bytecode) ------> [merge block] -> ...
> +                 |                        |                                           ^
> +                 |                        +-> op_ret                                  |
> +                 +------------------> op_resume --------------------------------------+

Can you show an example bytecode sequence that creates this control flow?

> Source/JavaScriptCore/ChangeLog:48
> +        - NewGeneratorFrame
> +
> +          This just allocates GeneratorFrame.

How is this different from allocating an activation? Why can't they share the same opcodes and classes?

How is this different from allocating a vanilla JS object? Why can't you use that?

> Source/JavaScriptCore/ChangeLog:53
> +        - ResizeGeneratorLocals
> +
> +          This sets GeneratorFrame::m_numberOfSavedCalleeLocals to *N*. This node is used to (1) allocate space for locals in GeneratorFrame and (2) deallocate space in
> +          GeneratorFrame by setting it to 0.

This sounds like functionality that JSObject already has. It doesn't even need to be warned of a resize. Why can't you just use JSObject?

> Source/JavaScriptCore/ChangeLog:58
> +        - GetGeneratorLocal
> +
> +          Get a saved local variable from a generator frame. GetGeneratorLocal and SetGeneratorLocal are linked with GeneratorLocalData, so type prediction and flags
> +          are correctly propagated beyond the generator frame.

How is this different from GetFromScope or GetByOffset? Why do we need something new?

> Source/JavaScriptCore/ChangeLog:62
> +        - SetGeneratorLocal
> +
> +          Store a local variable to a generator frame.

How is this different from PutToScope or PutByOffset?

> Source/JavaScriptCore/ChangeLog:66
> +        1. Support transferring AbstractValue between SetGeneratorLocal and GetGeneratorLocal. https://bugs.webkit.org/show_bug.cgi?id=153183

We don't do this for PutByOffset->GetByOffset or PutToScope->GetFromScope. That's intentional. We don't need it because if you find yourself in a situation where the load is proved to be a constant, then most likely you can also prove the set of stores that reach that load. And if you can do that, then you can get rid of the load. That's a more general optimization.

> Source/JavaScriptCore/ChangeLog:72
> +        3. And once we exploit that GeneratorFrame does not escape from the given FTL-ed function, we will eliminate the GeneratorFrame allocation and
> +           convert GeneratorLocal operations into get/set local operations.

It's a lot easier to get escape analysis if you're not inventing new types!
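(The "why not an activation" question has a source-level intuition: a generator's saved locals behave like variables captured by a closure, and captured variables live in a lexical environment, i.e. an activation. A hypothetical hand-desugaring, not JSC's actual lowering:)

    // The generator's saved local 'i' plays the same role as the closure's
    // captured 'i', which lives in an activation -- the sense in which an
    // activation suffices as storage for saved generator locals.
    function* counterGen() {
        let i = 0;            // saved/restored across every yield
        while (true)
            yield i++;
    }

    function counterClosure() {
        let i = 0;            // captured by the closure: lives in an activation
        return { next: () => ({ value: i++, done: false }) };
    }

    const g = counterGen(), c = counterClosure();
    console.log(g.next().value, c.next().value);  // 0 0
    console.log(g.next().value, c.next().value);  // 1 1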
Filip Pizlo
Comment 34 2016-01-22 15:24:17 PST
Comment on attachment 269434 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review

> Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:41
> +static bool isBranch(OpcodeID opcodeID, BytecodeAnalysisMode bytecodeAnalysisMode)

I really don't like adding modes like this. It makes the code dirty, and less performant.

It's common in a compiler to write many different kinds of analyses. Why can't this just be a separate analysis? I think that bytecode analyses could be made a lot easier to write if we just make the existing bytecode basic block and use/def machinery a bit more user-friendly. And maybe it's already user-friendly enough.
Filip Pizlo
Comment 35 2016-01-22 15:33:37 PST
Comment on attachment 269434 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:2523
> +        // FIXME: We should attempt to convert it to constant.
> +        // https://bugs.webkit.org/show_bug.cgi?id=153183

Please remove this FIXME.
Yusuke Suzuki
Comment 36 2016-01-23 12:34:24 PST
Comment on attachment 269434 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review

Thanks for your review :D

>> Source/JavaScriptCore/ChangeLog:33
>> + +------------------> op_resume --------------------------------------+
>
> Can you show an example bytecode sequence that creates this control flow?

When we use `yield`, we currently create the above control flow.

function *gen() {
    yield 1;
    yield 2;
}

will create

switch (genStatus) {
case 0:
    // initial sequence
    ...
    op_save 1  => pseudo jump to case 1's merge block
    op_ret 1

case 1:
    op_resume 1
    // merge block
    ...
    op_save 2  => pseudo jump to case 2's merge block

case 2:
    op_resume 2
    ...
}

>> Source/JavaScriptCore/ChangeLog:48
>> + This just allocates GeneratorFrame.
>
> How is this different from allocating an activation? Why can't they share the same opcodes and classes?
>
> How is this different from allocating a vanilla JS object? Why can't you use that?

Good point. Linking generator locals can be achieved by GetGeneratorLocal and SetGeneratorLocal. So we can use JSLexicalEnvironment instead of GeneratorFrame.

>> Source/JavaScriptCore/ChangeLog:53
>> + GeneratorFrame by setting it to 0.
>
> This sounds like functionality that JSObject already has. It doesn't even need to be warned of a resize. Why can't you just use JSObject?

It just avoids marking values that are not used anymore. If holding and marking dead (no longer used) values is OK, we can remove this op.

>> Source/JavaScriptCore/ChangeLog:58
>> + are correctly propagated beyond the generator frame.
>
> How is this different from GetFromScope or GetByOffset? Why do we need something new?

One advantage of the GetGeneratorLocal approach is that we can link SetGeneratorLocal and GetGeneratorLocal with DFGGeneratorLocalData. (This is like GetLocal / SetLocal's VariableAccessData.) It enables us to transfer prediction data from the SetGeneratorLocal position to the GetGeneratorLocal position in DFGPredictionPropagationPhase. I haven't done so yet, but we could also do backward propagation through this link.

What do you think?

>> Source/JavaScriptCore/ChangeLog:66
>> + 1. Support transferring AbstractValue between SetGeneratorLocal and GetGeneratorLocal. https://bugs.webkit.org/show_bug.cgi?id=153183
>
> We don't do this for PutByOffset->GetByOffset or PutToScope->GetFromScope. That's intentional. We don't need it because if you find yourself in a situation where the load is proved to be a constant, then most likely you can also prove the set of stores that reach that load. And if you can do that, then you can get rid of the load. That's a more general optimization.

Thanks! OK.

>> Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:41
>> +static bool isBranch(OpcodeID opcodeID, BytecodeAnalysisMode bytecodeAnalysisMode)
>
> I really don't like adding modes like this. It makes the code dirty and less performant.
>
> It's common in a compiler to write many different kinds of analyses. Why can't this just be a separate analysis? I think that bytecode analyses could be made a lot easier to write if we just made the existing bytecode basic block and use/def machinery a bit more user-friendly. And maybe it's already user-friendly enough.

Makes sense. Splitting the bytecode analysis in two sounds like a nice idea.

>> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:2523
>> + // https://bugs.webkit.org/show_bug.cgi?id=153183
>
> Please remove this FIXME.

Dropped. Thanks for your advice at https://bugs.webkit.org/show_bug.cgi?id=153183.
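For reference, here is the observable behavior this lowering must preserve (plain ES6, runnable in any engine; the case numbers refer to the pseudo-bytecode above):

function *gen() {
    yield 1;
    yield 2;
}
var g = gen();
console.log(g.next()); // { value: 1, done: false } -- case 0 runs, then op_save 1 / op_ret
console.log(g.next()); // { value: 2, done: false } -- case 1: op_resume 1, then op_save 2
console.log(g.next()); // { value: undefined, done: true }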
Filip Pizlo
Comment 37 2016-01-23 13:36:54 PST
(In reply to comment #36) > Comment on attachment 269434 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269434&action=review > > Thanks for your review :D > > >> Source/JavaScriptCore/ChangeLog:33 > >> + +------------------> op_resume --------------------------------------+ > > > > Can you show an example bytecode sequence that creates this control flow? > > When we use `yield`, now we create the above control flow. > > function *gen() { > yield 1; > yield 2; > } > > will creates > > switch (genStatus) { > case 0: > // initial seq > ... > > op_save 1, => pseudo jump to case 1's merge block > op_ret 1 > > case 1: > op_resume 1 > // merge block > ... > op_save 2, => pseudo jump to case 2's merge block > > > case 2: > op_resume 2 > ... > } This sort of makes sense as an intermediate representation, but it has problems. I think it's an antipattern to have an IR opcode that changes meaning depending on mode. The usual way that compilers handle this is having multiple opcodes. For example, you could use op_resume and op_save before the transformation of the program into a switch statement, and then use normal scoped variable stores and loads after the transformation. > > >> Source/JavaScriptCore/ChangeLog:48 > >> + This just allocates GeneratorFrame. > > > > How is this different from allocating an activation? Why can't they share the same opcodes and classes? > > > > How is this different from allocating a vanilla JS object? Why can't you use that? > > Good pointing. Linking generator locals can be achieved by GetGeneratorLocal > and SetGeneratorLocal. > So we can use JSLexicalEnvironment instead of GeneratorFrame. Why not just use existing PutClosureVar/GetClosureVar opcodes instead of adding new opcodes? Adding new opcodes to DFG IR takes a lot of work. It also makes the entire compiler harder to work on. We shouldn't add new opcodes unless there is a really powerful optimization that this enables. I don't believe that GetGeneratorLocal/SetGeneratorLocal enables any optimizations. > > >> Source/JavaScriptCore/ChangeLog:53 > >> + GeneratorFrame by setting it as 0. > > > > This sounds like functionality that JSObject already has. It doesn't even need to be warned of a resize. Why can't you just use JSObject? > > It just avoids marking values that is not used anymore. > If holding and marking dead (not used anymore) values is OK, we can remove > this op. We shouldn't add this optimization until we know that we need it. Probably, the way we would do this is to just store undefined into known-dead variables. > > >> Source/JavaScriptCore/ChangeLog:58 > >> + are correctly propagated beyond the generator frame. > > > > How is this different from GetFromScope or GetByOffset? Why do we need something new? > > One advantage of GetGeneratorLocal etc. approach is that we can link > SetGeneratorLocal and GetGeneratorLocal with DFGGeneratorLocalData. (This is > like a GetLocal / SetLocal's VariableAccessData). > It enables us to transfer predication data from SetGeneratorLocal position > to GetGeneratorLocal position in DFGPredictionPropagationPhase. > And currently I didn't, but, we can also do backward propagation with this > link. > > What do you think of? I don't think any of that is a good idea. First of all, I've worked super hard for the last three years to steadily eradicate VariableAccessData from the DFG. I'm surprised that you like it or that you want to add more things like it. It's a rotten mess. 
We only keep it around because it's cheaper than SSA conversion, and it closes holes in our value profiling. But once we convert to SSA in the FTL, we drop VariableAccessData.

Moreover, the fact that you're proposing to create this feature in such a way that you need to add new data structures to prediction propagation is troubling. Prediction propagation is an extremely subtle algorithm. We don't want to have to change it in big ways every time we implement a new language feature. Instead, we want to be able to reduce new language features to existing compiler capabilities so as to minimize the number of DFG IR changes.

It's pretty clear to me that for generator properties, you should emit get_from_scope/put_to_scope bytecodes at each point of use rather than doing wholesale storing and loading at resume/save. Fine-grained accesses would give you detailed type profiling at every load, which would obviate the need for any prediction propagation changes. You would have no need for any VariableAccessData-like hacks. All of the optimizations that you're imagining getting, you would get for free.

And if you did add GeneratorLocalData, then I have no idea how you'd propose to make it work in SSA. SSA drops VariableAccessData because having objects that somehow link live ranges of variables is antithetical to how SSA works.

Some more notes:

- If GeneratorLocalData were a good idea, then it would also be a good idea for *any* kind of object allocation. We need to apply this rule to every feature we add to the IR. It can't be the case that each new kind of object that people invent gets its own completely novel optimization architecture in the DFG. If you can think of an optimization to do for your new object kind, then I bet you that this same optimization is either already implemented for other kinds of objects or it's one that we should implement for all objects.

- Currently the compiler's handling of object allocations is powerful enough that all of the optimizations you want to special-case are things you would get for free if you just used some existing object model (either JSObject or activation) and fine-grained loads and stores to the variable properties at every variable use.

- Don't worry about backwards propagation. It's not sufficiently profitable for us to worry about it too much. And if we did need to worry about it, we would not implement an optimization that only works for one kind of object. We'd implement an optimization that worked for accesses to all objects!

> >> Source/JavaScriptCore/ChangeLog:66
> >> + 1. Support transferring AbstractValue between SetGeneratorLocal and GetGeneratorLocal. https://bugs.webkit.org/show_bug.cgi?id=153183
> >
> > We don't do this for PutByOffset->GetByOffset or PutToScope->GetFromScope. That's intentional. We don't need it because if you find yourself in a situation where the load is proved to be a constant, then most likely you can also prove the set of stores that reach that load. And if you can do that, then you can get rid of the load. That's a more general optimization.
>
> Thanks! OK.
>
> >> Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:41
> >> +static bool isBranch(OpcodeID opcodeID, BytecodeAnalysisMode bytecodeAnalysisMode)
> >
> > I really don't like adding modes like this. It makes the code dirty and less performant.
> >
> > It's common in a compiler to write many different kinds of analyses. Why can't this just be a separate analysis?
I think that bytecode analyses could be made a lot easier to write if we just make the existing bytecode basic block and use/def machinery a bit more user-friendly. And maybe it's already user-friendly enough. > > Make sense. Splitting bytecode analysis into 2 sounds nice idea. > > >> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:2523 > >> + // https://bugs.webkit.org/show_bug.cgi?id=153183 > > > > Please remove this FIXME. > > Dropped. Thanks for your advice at > https://bugs.webkit.org/show_bug.cgi?id=153183.
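To make the fine-grained-access suggestion concrete, here is a source-level sketch of the shape I mean (illustrative JS only, hypothetical names, not the actual lowering): if each saved local behaves like a closure variable, then every read is a profiled scope load, and all of the existing machinery applies.

function makeNext() {
    var state = 0;
    var x;                                          // a saved generator local
    return function next() {
        switch (state) {
        case 0:
            x = 1;                                  // put_to_scope x
            state = 1;                              // put_to_scope state
            return { value: x, done: false };       // get_from_scope x
        case 1:
            state = 2;
            return { value: x + 1, done: false };   // get_from_scope x
        default:
            return { value: undefined, done: true };
        }
    };
}
var next = makeNext();
console.log(next().value); // 1
console.log(next().value); // 2
console.log(next().done);  // true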
Filip Pizlo
Comment 38 2016-01-23 19:29:55 PST
Here are some more notes that might be useful for getting the generator object model right.

Currently we have three primary object models in JSC. By "primary" I mean that they will end up on the hot path of just about any idiomatic JavaScript program, and our compilers go to great lengths to make them fast. I'm only considering object models for storing named fields, where the name is supplied by the program, and those fields can hold any JS value.

1) JSObject
2) JSEnvironmentRecord (i.e. what activation uses)
3) DirectArguments

What distinguishes (1) is that it has type inference, constant inference, and O(1) direct accesses provided that you lay out the structure carefully enough. You risk having to do extra indirections to access properties if the structure doesn't make them inline. The fact that JSObject has the possibility of this kind of pessimisation is partly why we have (2) at all.

If you know the set of properties ahead of time, you can use (2). It has constant inference, but it does not have type inference (a program point that loads from a property in an activation will still have local inference to predict what type of values will be loaded there, but unlike JSObject, they won't also get per-property guarantees about the type). Accesses are O(1) and always direct; you don't have to do anything special to ensure that.

Finally, we have (3) for sloppy arguments objects. They are popular enough, and have sufficiently magical semantics, that we need a separate object model here. (3) does not have either constant inference or type inference.

Oddly, we created an object model, (2), for the purpose of improving performance over (1). But right now, we are in this strange situation that (1) has a performance feature, type inference, which we still have not ported to (2). That's really unfortunate and I hope we do it soon. Neither type inference nor constant inference has made it into DirectArguments. It would be harder to add those features there, but not impossible.

You could argue that having JSEnvironmentRecord be separate from JSObject, rather than using JSObject's object model, is an overall architecture win because when you have a JSEnvironmentRecord* it gives you certain guarantees that JSObject cannot give you (accesses are direct, properties don't get deleted, there are no accessors, etc). But from the standpoint of how to evolve the VM and add features, it's really unfortunate that it's a separate object model. We have implemented constant inference twice now - once for JSEnvironmentRecords and then again for JSObjects. It's unfortunate that we had to spend 2x the time on that feature. We have only implemented type inference once (JSObject), but we will have to implement it a second time soon (JSEnvironmentRecord).

DirectArguments is even sadder than JSEnvironmentRecord, but it probably has to be a separate thing because of the crazy semantics.

Note that right now, both JSObject and JSEnvironmentRecord enjoy the following optimizations in DFG and FTL:
- Local CSE in tier 3 and global CSE in tier 4.
- Field-name-based alias analysis.
- Escape analysis in tier 4.
- Inline allocation in all tiers.

A generator frame, on the other hand, feels like it satisfies all of the requirements for using JSEnvironmentRecord: you know the set of things that go into it ahead of time and you want accesses to them to be as fast as possible.
If you don't use it, then you'll have to implement all of those DFG and FTL optimizations a third time, and then you'll either land it without constant or type inference, or you'll have to do a ton of work to reimplement those features.
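For concreteness, here is which source construct maps to which of the three object models (the mapping is engine-internal; the JS below just names the shapes):

var o = { x: 1, y: 2 };            // (1) JSObject: named fields with type and
console.log(o.x);                  //     constant inference, O(1) if inline

function counter() {
    var n = 0;                     // (2) JSEnvironmentRecord: the captured `n`
    return function () {           //     lives in the activation once the
        return ++n;                //     closure escapes
    };
}
console.log(counter()());          // 1

function sloppy(a) {
    return arguments[0] + a;       // (3) DirectArguments: sloppy-mode arguments,
}                                  //     neither constant nor type inference
console.log(sloppy(21));           // 42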
Filip Pizlo
Comment 39 2016-01-23 19:33:34 PST
Comment on attachment 269434 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review > Source/JavaScriptCore/dfg/DFGClobbersExitState.cpp:60 > + case ResizeGeneratorLocals: > + case SetGeneratorLocal: > + case GetGeneratorLocal: > + return false; > + Really? Are you saying that it's valid to re-execute a bytecode operation that was lowered to these nodes?
Filip Pizlo
Comment 40 2016-01-23 19:49:45 PST
Now the other question is how to compile generators. One approach is to turn save/restore into a sequence of put_to_scope and then get_from_scope operations. You could do this as a bytecode->bytecode transform, or you could somehow make "save" and "restore" imply such a sequence.

If you used a bona fide sequence of get_from_scope instructions for restore, then each one would have value profiling. This is all you'd need to ensure that DFG prediction propagation is happy.

I think that this would be pretty good, but you run the risk that each save and each restore does more work than strictly necessary. It's possible that the program will have more live state at save/restore points than what is needed for any particular execution of the generator. It seems like the generator should load and save properties of the frame only when it needs them.

Another approach would be to do a bytecode analysis that splits bytecode variables into disjoint live ranges, kind of like VariableAccessData, and then checks which of those are live at save/restore. Any live range that is live at save/restore has all of its uses turned into a load from the frame and all of its defs turned into a store into the frame. This means that if loc42 is used for a temporary at a save point, then this temporary doesn't get completely demoted to the heap - it's most likely only demoted for the uses after the restore and the defs before the save.

I think that so long as this issue is encapsulated inside a phase that transforms the save/restore form into a form that uses put_to_scope/get_from_scope instructions, then it's fine to have the initial implementation emit a get_from_scope for every live-in-frame local at the restore point and a put_to_scope for each such local at every save. We can always play with different strategies later!

This approach implies really beefing up the available machinery for editing bytecode - along with all of the necessary renumbering of jump targets and the like. It's a fun algorithm to implement, and if you create a nice API for it then we'll be able to reuse it for other features that benefit from bytecode->bytecode transformation.
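A tiny example of the over-saving risk (illustrative, runnable):

function expensive() { return 'x'.repeat(1000); }
function *g() {
    var big = expensive();
    yield big.length; // `big` is live across this yield, so it must go to the frame
    yield 42;         // `big` is dead by here; a bulk save would still store it,
}                     // while the live-range approach would not
var it = g();
console.log(it.next().value); // 1000
console.log(it.next().value); // 42
console.log(it.next().done);  // true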
Yusuke Suzuki
Comment 41 2016-01-25 00:12:28 PST
Comment on attachment 269434 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review Thanks! >>>> Source/JavaScriptCore/ChangeLog:53 >>>> + GeneratorFrame by setting it as 0. >>> >>> This sounds like functionality that JSObject already has. It doesn't even need to be warned of a resize. Why can't you just use JSObject? >> >> It just avoids marking values that is not used anymore. >> If holding and marking dead (not used anymore) values is OK, we can remove this op. > > We shouldn't add this optimization until we know that we need it. > > Probably, the way we would do this is to just store undefined into known-dead variables. OK, dropping this op. >>>> Source/JavaScriptCore/ChangeLog:58 >>>> + are correctly propagated beyond the generator frame. >>> >>> How is this different from GetFromScope or GetByOffset? Why do we need something new? >> >> One advantage of GetGeneratorLocal etc. approach is that we can link SetGeneratorLocal and GetGeneratorLocal with DFGGeneratorLocalData. (This is like a GetLocal / SetLocal's VariableAccessData). >> It enables us to transfer predication data from SetGeneratorLocal position to GetGeneratorLocal position in DFGPredictionPropagationPhase. >> And currently I didn't, but, we can also do backward propagation with this link. >> >> What do you think of? > > I don't think any of that is a good idea. > > First of all, I've worked super hard for the last three years to steadily eradicate VariableAccessData from the DFG. I'm surprised that you like it or that you want to add more things like it. It's a rotten mess. We only keep it around because it's cheaper than SSA conversion, and it closes holes in our value profiling. But once we convert to SSA in the FTL, we drop VariableAccessData. > > Moreover, the fact that you're proposing to create this feature in such a way that you need to add new data structures to prediction propagation is troubling. Prediction propagation is an extremely subtle algorithm. We don't want to have to change it in big ways every time we implement a new language feature. Instead, we want to be able to reduce new language features to existing compiler capabilities so as to minimize the number of DFG IR changes. > > It's pretty clear to me that for generator properties, you should emit get_from_scope/put_to_scope bytecodes at each point of use rather than doing wholesale storing and loading at resume/save. Fine-grained accesses would give you detailed type profiling at every load, which would obviate the need for any prediction propagation changes. You would have no need for any VariableAccessData-like hacks. All of the optimizations that you're imagining getting, you would get them for free. > > And if you did add GeneratorLocalData, then I have no idea how you'd propose to make it work in SSA. SSA drops VariableAccessData because having objects that somehow link live ranges of variables is antithetical to how SSA works. > > Some more notes: > > - If GeneratorLocalData was a good idea, then it would also be a good idea for *any* kind of object allocation. We need to apply this rule for all features to IR. It can't be the case that each new kind of object that people invent gets its own completely novel optimization architecture in the DFG. If you can think of an optimization to do for your new object kind, then I bet you that this same optimization is either already implemented for other kinds of objects or it's one that we should implement for all objects. 
> > - Currently the compiler's handling of object allocations is powerful enough that all of the optimizations you want to special-case are things you would get for free if you just used some existing object model (either JSObject or activation) and fine-grained loads and stores to the variable properties at every variable use.
> >
> > - Don't worry about backwards propagation. It's not sufficiently profitable for us to worry about it too much. And if we did need to worry about it, we would not implement an optimization that only works for one kind of object. We'd implement an optimization that worked for accesses to all objects!

Thanks for your clarification! I modeled GetGeneratorLocal/SetGeneratorLocal's type prediction part after GetLocal/SetLocal, but reading your description, we should use GetClosureVar/PutClosureVar instead: it covers GetGeneratorLocal's benefits (type prediction will be provided by value profiling in get_from_scope).

Your suggestion,

1. add a bytecode->bytecode transformation pass that turns op_save/op_resume into a sequence of ops,
2. use GetClosureVar/PutClosureVar,
3. use JSLexicalEnvironment,

looks fine to me too.

One thing I don't understand yet is the "And if you did add GeneratorLocalData, then I have no idea how you'd propose to make it work in SSA. SSA drops VariableAccessData because having objects that somehow link live ranges of variables is antithetical to how SSA works." part. GeneratorLocalData is now just used to transfer type prediction and node flags to the linked node, and that is done in DFGPredictionPropagationPhase and DFGBackwardsPropagationPhase. GetGeneratorLocal / SetGeneratorLocal do not touch or modify local variables directly; they just do something similar to what GetClosureVar/PutClosureVar do in this patch. So I think SSA is not related to that (in the same way that GetClosureVar/PutClosureVar are not related to SSA). Hmm, maybe I missed something; could you elaborate on this?

Anyway, I'll change the design to follow your suggestion :)

>> Source/JavaScriptCore/dfg/DFGClobbersExitState.cpp:60
>> +
>
> Really? Are you saying that it's valid to re-execute a bytecode operation that was lowered to these nodes?

Yes, op_save and op_resume are idempotent in this patch.
Yusuke Suzuki
Comment 42 2016-01-25 01:21:45 PST
Still writing comments; once I return home, I'll add more. :D
Yusuke Suzuki
Comment 43 2016-01-25 10:52:17 PST
(In reply to comment #38) > Here are some more notes that might be useful for getting the generator > object model right. > > Currently we have three primary object models in JSC. By "primary" I mean > that they will end up on the hot path of just about any idiomatic JavaScript > program, and our compilers got to great lengths to make them fast. I'm only > considering object models for storing named fields, where the name is > supplied by the program, and those fields can hold any JS value. > > 1) JSObject > 2) JSEnvironmentRecord (i.e. what activation uses) > 3) DirectArguments > > What distinguishes (1) is that it has type inference, constant inference, > O(1) direct accesses provided that you lay out the structure carefully > enough. You risk having to do extra indirections to access properties if > the structure doesn't make them inline. The fact that JSObject has the > possibility of this kind of pessimisation is partly why we have (2) at all. > > If you know the set of properties ahead of time, you can use (2). It has > constant inference, but it does not have type inference (a program point > that loads from a property in an activation will still have local inference > to predict what type of values will be loaded there, but unlike JSObject, > they won't also get per-property guarantees about the type). Accesses are > O(1) and always direct; you don't have to do anything special to ensure that. > > Finally, we have (3) for sloppy arguments objects. They are popular enough, > and have sufficiently magical semantics, that we need a separate object > model here. (3) does not have either constant inference or type inference. > > Oddly, we created an object model, (2), for the purpose of improving > performance over (1). But right now, we are in this strange situation that > (1) has a performane feature, type inference, which we still have not ported > to (2). That's really unfortunate and I hope we do it soon. Neither type > inference nor constant inference have made it into DirectArguments. It > would be harder to add those features there, but not impossible. > > You could argue that having JSEnvironmentRecord be separate from JSObject, > rather than using JSObject's object model, is an overall architecture win > because when you have a JSEnvironmentRecord* it gives you certain guarantees > that JSObject cannot give you (accesses are direct, properties don't get > deleted, there are no accessors, etc). But from the standpoint of how to > evolve the VM and add features, it's really unfortunate that it's a separate > object model. We have implemented constant inference twice now - once for > JSEnvironmentRecords and then again for JSObjects. It's unfortunate that we > had to spend 2x the time on that feature. We have only implemented type > inference once (JSObject), but we will have to implement it a second time > soon (JSEnvironmentRecord). > > DirectArguments is even sadder than JSEnvironmentRecord, but it probably has > to be a separate thing because of the crazy semantics. > > Note that right now, both JSObject and JSEnvironmentRecord enjoy the > following optimizations in DFG and FTL: > - Local CSE in tier 3 and global CSE in tier 4. > - Field-name-based alias analysis. > - Escape analysis in tier 4. > - Inline allocation in all tiers. > > A generator frame, on the other hand, feels like it satisfies all of the > requirements for using JSEnvironmentRecord: you know the set of things that > go into it ahead of time and you want accesses to them to be as fast as > possible. 
> I would strongly urge you to use JSEnvironmentRecord. If you
> don't use it, then you'll have to implement all of those DFG and FTL
> optimizations a third time, and then you'll either land it without constant
> or type inference, or you'll have to do a ton of work to reimplement those
> features.

Thanks for your really nice description! It's so helpful. Your suggestion (using JSEnvironmentRecord in this case) sounds pretty reasonable.

One thing we need to take care of is that the generator also saves temporary locals. For example, we can write:

    (1 + a) + yield b

The (1 + a) temporary local will be saved and resumed by the generator. Assume that this temporary is `loc42`. Since this is just a temporary local, loc42 will be reused for unrelated temporaries elsewhere. So, ideally, we should not use `loc42` as the JSLexicalEnvironment's field identifier. But I think we should start from the simplest version, so in the meantime, just using the local's location as a field name seems OK to me.
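Concretely (runnable; note that yield needs parentheses when used as an operand):

function *g(a, b) {
    // The temporary holding (1 + a) is live across the yield below,
    // so it must survive suspension in the generator frame.
    return (1 + a) + (yield b);
}
var it = g(2, 'question');
console.log(it.next());   // { value: 'question', done: false }
console.log(it.next(10)); // { value: 13, done: true }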
Yusuke Suzuki
Comment 44 2016-01-25 10:54:22 PST
Comment on attachment 269434 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269434&action=review

>>>> Source/JavaScriptCore/ChangeLog:48
>>>> + This just allocates GeneratorFrame.
>>>
>>> How is this different from allocating an activation? Why can't they share the same opcodes and classes?
>>>
>>> How is this different from allocating a vanilla JS object? Why can't you use that?
>>
>> Good point. Linking generator locals can be achieved by GetGeneratorLocal and SetGeneratorLocal.
>> So we can use JSLexicalEnvironment instead of GeneratorFrame.
>
> Why not just use existing PutClosureVar/GetClosureVar opcodes instead of adding new opcodes?
>
> Adding new opcodes to DFG IR takes a lot of work. It also makes the entire compiler harder to work on. We shouldn't add new opcodes unless there is a really powerful optimization that this enables.
>
> I don't believe that GetGeneratorLocal/SetGeneratorLocal enables any optimizations.

Transforming op_save / op_resume into an actual op sequence looks reasonable to me (e.g., transforming them into get_from_scope / put_to_scope ops). It will also drop BytecodeAnalysis's mode flag, which is fragile.
Yusuke Suzuki
Comment 45 2016-01-25 11:21:39 PST
(In reply to comment #40)
> Now the other question is how to compile generators. One approach is to
> turn save/restore into a sequence of put_to_scope and then get_from_scope
> operations. You could do this as a bytecode->bytecode transform, or you
> could somehow make "save" and "restore" imply such a sequence.
>
> If you used a bona fide sequence of get_from_scope instructions for restore,
> then each one would have value profiling. This is all you'd need to ensure
> that DFG prediction propagation is happy.

It looks reasonable :)

> I think that this would be pretty good, but you run the risk that each save
> and each restore does more work than strictly necessary. It's possible that
> the program will have more live state at save/restore points than what is
> needed for any particular execution of the generator. It seems like the
> generator should load and save properties of the frame only when it needs
> them.

Interesting!

> Another approach would be to do a bytecode analysis that splits bytecode
> variables into disjoint live ranges, kind of like VariableAccessData, and
> then checks which of those are live at save/restore. Any live range that is
> live at save/restore has all of its uses turned into a load from the frame
> and all of its defs turned into a store into the frame. This means that if
> loc42 is used for a temporary at a save point, then this temporary doesn't
> get completely demoted to the heap - it's most likely only demoted for the
> uses after the restore and the defs before the save.

Interesting! My understanding is as follows:

1. First, construct live ranges to identify the meaningful locals (we should not use the name loc42 to identify a local, because loc42 will be reused elsewhere for unrelated temporaries).
2. Then, if a range is live at the save point, make it a heap variable; that means converting each of its uses to get_from_scope and each of its defs to put_to_scope.

loc42 : [def] -> [save] -> [use]    [def] -> [use]    [save]
        `------- range1 --------'   `---- range2 ---'
         => converted to heap        => not converted

> I think that so long as this issue is encapsulated inside a phase that
> transforms the save/restore form into a form that uses
> put_to_scope/get_from_scope instructions, then it's fine to have the initial
> implementation emit a get_from_scope for every live-in-frame local at the
> restore point and a put_to_scope for each such local at every save. We can
> always play with different strategies later!
>
> This approach implies really beefing up the available machinery for editing
> bytecode - along with all of the necessary renumbering of jump targets and
> the like. It's a fun algorithm to implement, and if you create a nice API
> for it then we'll be able to reuse it for other features that benefit from
> bytecode->bytecode transformation.

It sounds very nice! OK, I'll redesign the implementation based on Filip's suggestions, thank you :D As a first step, I'll just implement the naive version, which emits get_from_scope/put_to_scope ops for all the live locals at each save/restore.
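A minimal sketch of the liveness check in step 2, over straight-line code (hypothetical op shape; real bytecode needs a fixpoint over the whole CFG, and a range-splitting version would track ranges rather than names):

// Backward liveness: whatever is live-out at a save point must be demoted
// to the generator frame. Ops are { uses: [], defs: [], isSave: bool }.
function localsToDemote(ops) {
    var live = new Set();
    var demoted = new Set();
    for (var i = ops.length - 1; i >= 0; i--) {
        var op = ops[i];
        if (op.isSave)
            live.forEach(function (local) { demoted.add(local); });
        op.defs.forEach(function (d) { live.delete(d); });
        op.uses.forEach(function (u) { live.add(u); });
    }
    return demoted;
}
// loc42's first range crosses the save point; the second does not.
console.log(localsToDemote([
    { uses: [],        defs: ['loc42'], isSave: false },
    { uses: [],        defs: [],        isSave: true  },
    { uses: ['loc42'], defs: [],        isSave: false },
    { uses: [],        defs: ['loc42'], isSave: false },
    { uses: ['loc42'], defs: [],        isSave: false },
    { uses: [],        defs: [],        isSave: true  },
])); // Set { 'loc42' } -- triggered only by the first range; name-based
     // demotion over-approximates, which is exactly why range splitting helps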
Yusuke Suzuki
Comment 46 2016-03-06 04:39:32 PST
None of the current bytecode analyses and modifications inject new basic blocks, and only generators require bytecode modification. So I think a good first step is to introduce a bytecode rewriter that rewrites bytecodes and adjusts the jump offsets, but does not have the ability to add new jumps or new basic blocks.
Yusuke Suzuki
Comment 47 2016-04-27 21:23:47 PDT
Created attachment 277585 [details] Patch Rebaseline and start reworking
Yusuke Suzuki
Comment 48 2016-05-05 09:36:22 PDT
Created attachment 278168 [details] Patch WIP
Yusuke Suzuki
Comment 49 2016-05-18 20:14:05 PDT
Created attachment 279342 [details] Patch WIP: rebaseline. Need to add liveness analysis in UnlinkedCodeBlock phase
Filip Pizlo
Comment 50 2016-05-18 21:35:05 PDT
Comment on attachment 279342 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=279342&action=review

> Source/JavaScriptCore/ChangeLog:13
> + - op_new_generator_frame
> +
> + Allocate GeneratorFrame in the bytecode instead of allocating it in the op_save.

Are you still going to change this to either JSObject or JSLexicalEnvironment?

> Source/JavaScriptCore/ChangeLog:25
> + - op_save
> +
> + Now BytecodeLiveness analysis and PreciseJumpTarget analysis have 2 modes; Standard and ExtractGeneratorLiveCalleeLocals.
> + Under ExtractGeneratorLiveCalleeLocals mode, op_save has a jump to the merge point. This jump is just used by the use-def analysis to extract
> + live callee locals at the merge point. These live callee locals should be saved and restored to suspend and resume generators.
> + The merge point is the point that merges the 2 basic blocks, saving locals to the generator frame and resuming locals from the generator frame.
> +
> + - op_resume
> +
> + This retrieves local variables from the generator frame. The ExtractGeneratorLiveCalleeLocals analysis will discover live locals at the merge point and record liveness
> + information to op_save and op_resume. On the basis of this information, op_resume will retrieve saved locals from the generator frame.

I still think that the cleanest way to make this work is to add a bytecode rewriter to JSC: i.e. a thing that allows us to insert or remove bytecode operations and then do a fixup over the jump targets. This would be something like the bytecode rewriters that people wrote for Java bytecode, like BCEL (https://commons.apache.org/proper/commons-bcel/) and asm (http://asm.ow2.org), but a heck of a lot simpler since our bytecode is much less sophisticated. Both of those frameworks do this by first making the bytecode object-oriented - one object per bytecode instruction, with a linked list connecting the objects. I think we want to take a somewhat different approach, probably using something like the DFG::InsertionSet: build up a vector of intended insertions and deletions and then "execute" it at the end, which would allow us to fix up all jump destinations in one go.

If you had this, then you could have the BytecodeGenerator flag UnlinkedCodeBlocks that have op_save/resume, and for those codeblocks, we would immediately run the "generatorification" phase that uses liveness to lower save/resume to sequences of put_to_scope/get_from_scope operations plus the control flow. After this phase, nobody has to know about generators anymore.

But this immediately reveals a question: is it clear that the generator function has to be a function and not multiple functions? Why can't we have a separate function for each resume entrypoint? This would likely be the optimal implementation. That would require "breeding" multiple UnlinkedCodeBlocks as part of generatorification. Is this feasible according to the spec, or does the spec require the generator to behave as a single function object?
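To sketch the insertion-set idea above (hypothetical shapes, nothing like JSC's actual types; assumes the inserted ops themselves contain no jumps):

// Record the intended insertions, then "execute" them in one pass,
// remapping every jump target by the amount inserted before it.
function executeInsertions(bytecode, insertions) { // insertions sorted by index
    var out = [];
    var newIndexOf = [];
    var cursor = 0;
    for (var i = 0; i < bytecode.length; i++) {
        while (cursor < insertions.length && insertions[cursor].index === i)
            out.push.apply(out, insertions[cursor++].ops);
        newIndexOf[i] = out.length;
        out.push(bytecode[i]);
    }
    out.forEach(function (op) {       // fix up all jump destinations in one go
        if (op.target !== undefined)
            op.target = newIndexOf[op.target];
    });
    return out;
}
// Inserting two loads before index 1 turns "jump -> 2" into "jump -> 4".
console.log(executeInsertions(
    [{ op: 'enter' }, { op: 'jump', target: 2 }, { op: 'ret' }],
    [{ index: 1, ops: [{ op: 'get_from_scope' }, { op: 'get_from_scope' }] }]));

The point of batching is that every jump gets fixed exactly once, no matter how many insertions land between it and its label.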
Yusuke Suzuki
Comment 51 2016-05-19 13:11:18 PDT
Comment on attachment 279342 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=279342&action=review

>> Source/JavaScriptCore/ChangeLog:13
>> + Allocate GeneratorFrame in the bytecode instead of allocating it in the op_save.
>
> Are you still going to change this to either JSObject or JSLexicalEnvironment?

Yes! I'm now going to make this a JSLexicalEnvironment. But first, I'm working on implementing the bytecode rewriting system.

Sorry for my confusing change. This is still WIP, and this changelog comment is not updated yet...

>> Source/JavaScriptCore/ChangeLog:25
>> + information to op_save and op_resume. On the basis of this information, op_resume will retrieve saved locals from the generator frame.
>
> I still think that the cleanest way to make this work is to add a bytecode rewriter to JSC: i.e. a thing that allows us to insert or remove bytecode operations and then do a fixup over the jump targets. This would be something like the bytecode rewriters that people wrote for Java bytecode, like BCEL (https://commons.apache.org/proper/commons-bcel/) and asm (http://asm.ow2.org), but a heck of a lot simpler since our bytecode is much less sophisticated. Both of those frameworks do this by first making the bytecode object-oriented - one object per bytecode instruction, with a linked list connecting the objects. I think we want to take a somewhat different approach, probably using something like the DFG::InsertionSet: build up a vector of intended insertions and deletions and then "execute" it at the end, which would allow us to fix up all jump destinations in one go.
>
> If you had this, then you could have the BytecodeGenerator flag UnlinkedCodeBlocks that have op_save/resume, and for those codeblocks, we would immediately run the "generatorification" phase that uses liveness to lower save/resume to sequences of put_to_scope/get_from_scope operations plus the control flow. After this phase, nobody has to know about generators anymore.
>
> But this immediately reveals a question: is it clear that the generator function has to be a function and not multiple functions? Why can't we have a separate function for each resume entrypoint? This would likely be the optimal implementation. That would require "breeding" multiple UnlinkedCodeBlocks as part of generatorification. Is this feasible according to the spec, or does the spec require the generator to behave as a single function object?

Bytecode rewriting seems like a clean approach. First, I attempted to build bytecode rewriting as follows:

(1). Construct a bytecode operation object (such as BytecodeOperation) as you described. This is similar to DFGNode: it is constructed per bytecode operation (op_get_by_id etc.).
(2). Represent a bytecode BasicBlock as a Vector<BytecodeOperation>.
(3). Modify this, and generate the modified bytecode from the graph.

But I think it's too heavy for the "generator" modification... Since we don't need to modify the jumps of bytecode basic blocks (if we don't want to discard op_save's pseudo jumps), an insertion set is good enough. That's why I'm now attempting to introduce BytecodeInsertionSet.h in this patch (not implemented yet, it's just a stub).

To do so (and to make the generator code ready when linking the CodeBlock), this modification should be done while the code is an UnlinkedCodeBlock. So now I'm working on making BytecodeLivenessAnalysis applicable to both UnlinkedCodeBlock and CodeBlock; I'm considering a BytecodeLivenessAnalysis<BytecodeGraph<UnlinkedCodeBlock>> style.
And I'm now considering applying this change just after BytecodeGenerator::generate. Maybe I'll insert a bytecode modification phase after that; in this phase, if the UnlinkedCodeBlock is a generator's code, we will apply this "generatorification". By doing so, when linking the CodeBlock, the generator code's JSLexicalEnvironment-based op_save / op_resume is ready (it will have become a sequence of op_put_to_scope* / op_get_from_scope*).

Your last question is interesting. The spec allows us to create the generator function as a group of small functions. The spec requires us to use the same Generator.prototype.next() function, but the subsequent sequence (such as how the generator works after a Generator.prototype.next() call) is not defined. But is that so much more efficient than the existing switch-based implementation? Is the case that benefits something like this?

function *() {
    small processing... (it should not be long; if it is, this function fragment cannot be inlined anyway)
    yield
    small processing...
    yield
    small processing...
    yield
}

And if we do that, I think the implementation would become the following.

1. First, create the whole function and analyze it.
2. When execution reaches a resume point (yield), we assign the function that corresponds to that resume point.
3. When that function is called, we pass the whole function's information and create the corresponding UnlinkedCodeBlock at that time.
Filip Pizlo
Comment 52 2016-05-19 13:18:01 PDT
(In reply to comment #51) > Comment on attachment 279342 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=279342&action=review > > >> Source/JavaScriptCore/ChangeLog:13 > >> + Allocate GeneratorFrame in the bytecode instead of allocating it in the op_save. > > > > Are you still going to change this to either JSObjects or JSLexicalEnvironment? > > Yes! I'm now going to make this JSLexicalEnvironment. > But before that, first, I'm now working on implementing Bytecode rewriting > system. > > Sorry for my confusing change. This is still WIP, and this changelog comment > is not updated yet... > > >> Source/JavaScriptCore/ChangeLog:25 > >> + information to op_save and op_resume. On the basis of this information, op_resume will retrieve saved locals from the generator frame. > > > > I still think that the cleanest way to make this work is to add a bytecode rewriter to JSC: i.e. a thing that allows us to insert or remove bytecode operations and then do a fixup over the jump targets. This would be something like the bytecode rewriters that people wrote for Java bytecode, like BCEL (https://commons.apache.org/proper/commons-bcel/) and asm (http://asm.ow2.org), but a heck of a lot simpler since our bytecode is much less sophisticated. Both of those frameworks do this by first making the bytecode object-oriented - one object per bytecode instruction, with a linked list connecting the objects. I think we want to take a somewhat different approach, probably using something like the DFG::InsertionSet: build up a vector of intended insertions and deletions and then "execute" it at the end, which would allow us to fixup all jump destinations in one go. > > > > If you had this, then you could have the BytecodeGenerator flag UnlinkedCodeBlocks that have op_save/resume, and for those codeblocks, we would immediately run the "generatorification" phase that uses liveness to lower save/resume to sequences of put_to_scope/get_from_scope operations plus the control flow. After this phase, nobody has to know about generators anymore. > > > > But this immediately reveals a question: is it clear that the generator function has to be a function and not multiple functions? Why can't we have a separate function for each resume entrypoint? This would likely be the most optimal implementation. That would require "breeding" multiple UnlinkedCodeBlocks as part of generatorification. Is this feasible according to the spec, or does the spec require the generator to behave as a single function object? > > Bytecode rewriting seems clean approach. > First, I attempted to create bytecode rewriting as follows, > (1). Construct bytecode operation object (such as BytecodeOperation) as you > described. This is similar to DFGNode: it is constructed per bytecode > operation (op_get_by_id etc.). > (2). Represent bytecode BasicBlock as the Vector<BytecodeOperation> > (3). And modifying this, and generate the modified bytecode from this graph. > > But I think it's too heavy for "generator" modification... Since we don't > need to modify the jumps of bytecode basic blocks (if we don't want to > discard op_save's pseudo jumps), insertion set is good enough. I think you'll still need to adjust the jump targets. Inserting code between a jump and the label it jumps to means that the jump's "offset" will have to be increased. I think that this should be easy to do if you have an insertion set, though. 
> That's why I'm now attempting to introduce BytecodeInsertionSet.h in this patch (not implemented yet, it's just a stub).
> To do so (and to make the generator code ready when linking the CodeBlock), this modification should be done while the code is an UnlinkedCodeBlock. So now I'm working on making BytecodeLivenessAnalysis applicable to both UnlinkedCodeBlock and CodeBlock; I'm considering a BytecodeLivenessAnalysis<BytecodeGraph<UnlinkedCodeBlock>> style.
> And I'm now considering applying this change just after BytecodeGenerator::generate. Maybe I'll insert a bytecode modification phase after that; in this phase, if the UnlinkedCodeBlock is a generator's code, we will apply this "generatorification". By doing so, when linking the CodeBlock, the generator code's JSLexicalEnvironment-based op_save / op_resume is ready (it will have become a sequence of op_put_to_scope* / op_get_from_scope*).
>
> Your last question is interesting. The spec allows us to create the generator function as a group of small functions. The spec requires us to use the same Generator.prototype.next() function, but the subsequent sequence (such as how the generator works after a Generator.prototype.next() call) is not defined.

Interesting! In that case, your current approach is optimal. I don't think that breeding a new function for each case would help at all.

> But is that so much more efficient than the existing switch-based implementation? Is the case that benefits something like this?
>
> function *() {
>     small processing... (it should not be long; if it is, this function fragment cannot be inlined anyway)
>     yield
>     small processing...
>     yield
>     small processing...
>     yield
> }
>
> And if we do that, I think the implementation would become the following.
> 1. First, create the whole function and analyze it.
> 2. When execution reaches a resume point (yield), we assign the function that corresponds to that resume point.
> 3. When that function is called, we pass the whole function's information and create the corresponding UnlinkedCodeBlock at that time.

Right. But I think that this would still lead to a virtual dispatch. The VM isn't going to do virtual dispatch any better than a switch statement. So, having the generator function use a switch statement is optimal. I'm assuming that this is what you do now.
Yusuke Suzuki
Comment 53 2016-05-19 14:12:55 PDT
(In reply to comment #52) > (In reply to comment #51) > > Comment on attachment 279342 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=279342&action=review > > > > >> Source/JavaScriptCore/ChangeLog:13 > > >> + Allocate GeneratorFrame in the bytecode instead of allocating it in the op_save. > > > > > > Are you still going to change this to either JSObjects or JSLexicalEnvironment? > > > > Yes! I'm now going to make this JSLexicalEnvironment. > > But before that, first, I'm now working on implementing Bytecode rewriting > > system. > > > > Sorry for my confusing change. This is still WIP, and this changelog comment > > is not updated yet... > > > > >> Source/JavaScriptCore/ChangeLog:25 > > >> + information to op_save and op_resume. On the basis of this information, op_resume will retrieve saved locals from the generator frame. > > > > > > I still think that the cleanest way to make this work is to add a bytecode rewriter to JSC: i.e. a thing that allows us to insert or remove bytecode operations and then do a fixup over the jump targets. This would be something like the bytecode rewriters that people wrote for Java bytecode, like BCEL (https://commons.apache.org/proper/commons-bcel/) and asm (http://asm.ow2.org), but a heck of a lot simpler since our bytecode is much less sophisticated. Both of those frameworks do this by first making the bytecode object-oriented - one object per bytecode instruction, with a linked list connecting the objects. I think we want to take a somewhat different approach, probably using something like the DFG::InsertionSet: build up a vector of intended insertions and deletions and then "execute" it at the end, which would allow us to fixup all jump destinations in one go. > > > > > > If you had this, then you could have the BytecodeGenerator flag UnlinkedCodeBlocks that have op_save/resume, and for those codeblocks, we would immediately run the "generatorification" phase that uses liveness to lower save/resume to sequences of put_to_scope/get_from_scope operations plus the control flow. After this phase, nobody has to know about generators anymore. > > > > > > But this immediately reveals a question: is it clear that the generator function has to be a function and not multiple functions? Why can't we have a separate function for each resume entrypoint? This would likely be the most optimal implementation. That would require "breeding" multiple UnlinkedCodeBlocks as part of generatorification. Is this feasible according to the spec, or does the spec require the generator to behave as a single function object? > > > > Bytecode rewriting seems clean approach. > > First, I attempted to create bytecode rewriting as follows, > > (1). Construct bytecode operation object (such as BytecodeOperation) as you > > described. This is similar to DFGNode: it is constructed per bytecode > > operation (op_get_by_id etc.). > > (2). Represent bytecode BasicBlock as the Vector<BytecodeOperation> > > (3). And modifying this, and generate the modified bytecode from this graph. > > > > But I think it's too heavy for "generator" modification... Since we don't > > need to modify the jumps of bytecode basic blocks (if we don't want to > > discard op_save's pseudo jumps), insertion set is good enough. > > I think you'll still need to adjust the jump targets. Inserting code > between a jump and the label it jumps to means that the jump's "offset" will > have to be increased. 
> I think that this should be easy to do if you have an insertion set, though.

Yeah, right, adjusting the offsets when inserting bytecodes is OK. What I meant is that we don't need to create or drop basic blocks, which makes an insertion set sufficient.

> > That's why I'm now attempting to introduce BytecodeInsertionSet.h in this patch (not implemented yet, it's just a stub).
> > To do so (and to make the generator code ready when linking the CodeBlock), this modification should be done while the code is an UnlinkedCodeBlock. So now I'm working on making BytecodeLivenessAnalysis applicable to both UnlinkedCodeBlock and CodeBlock; I'm considering a BytecodeLivenessAnalysis<BytecodeGraph<UnlinkedCodeBlock>> style.
> > And I'm now considering applying this change just after BytecodeGenerator::generate. Maybe I'll insert a bytecode modification phase after that; in this phase, if the UnlinkedCodeBlock is a generator's code, we will apply this "generatorification". By doing so, when linking the CodeBlock, the generator code's JSLexicalEnvironment-based op_save / op_resume is ready (it will have become a sequence of op_put_to_scope* / op_get_from_scope*).
> >
> > Your last question is interesting. The spec allows us to create the generator function as a group of small functions. The spec requires us to use the same Generator.prototype.next() function, but the subsequent sequence (such as how the generator works after a Generator.prototype.next() call) is not defined.
>
> Interesting! In that case, your current approach is optimal. I don't think that breeding a new function for each case would help at all.
>
> > But is that so much more efficient than the existing switch-based implementation? Is the case that benefits something like this?
> >
> > function *() {
> >     small processing... (it should not be long; if it is, this function fragment cannot be inlined anyway)
> >     yield
> >     small processing...
> >     yield
> >     small processing...
> >     yield
> > }
> >
> > And if we do that, I think the implementation would become the following.
> > 1. First, create the whole function and analyze it.
> > 2. When execution reaches a resume point (yield), we assign the function that corresponds to that resume point.
> > 3. When that function is called, we pass the whole function's information and create the corresponding UnlinkedCodeBlock at that time.
>
> Right. But I think that this would still lead to a virtual dispatch. The VM isn't going to do virtual dispatch any better than a switch statement. So, having the generator function use a switch statement is optimal. I'm assuming that this is what you do now.

OK. Maybe, by using the group of functions, we can remove one virtual dispatch point.

The current design:
Generator.prototype.next() (this function is required to be the same for all generators) dispatches => each generator function => switch

Your proposal:
Generator.prototype.next() dispatches => each function fragment

But I think the "Generator.prototype.next() dispatches => each generator function" step has a chance to be optimized with inlining if we can figure out that the generator object is a JSConstant.
Filip Pizlo
Comment 54 2016-05-19 14:17:41 PDT
(In reply to comment #53)
> [...]
> But I think the "Generator.prototype.next()'s dispatch => each generator
> function" step has a chance to be optimized with inlining if we can figure
> out that the generator object is a JSConstant.

Aha!
OK, now I understand it even better. Splitting a generator into multiple functions is definitely the way to go. Then loading from the generator frame is just like loading your arguments, and storing back to it at the end is just like populating a returned tuple (if you return a struct in C the best the compiler can do is have the caller pass in the address of where to put the struct and the callee returns the struct by storing to this address). This would be a better approach. It would be cool if JSC had the fastest implementation of generators, so it's worth spending a good deal of time thinking about what the best approach would look like. Do you think it's worth it to try to rework towards this approach now? It might be cheaper than finishing the current implementation and then reworking it, but I'll let you make the call.
Geoffrey Garen
Comment 55 2016-05-19 14:38:43 PDT
Is there a risk that dispatch to more than one target function will harm inlining? At the limit, it would be nice to inline generators, so that you could write iterators as generators without paying a high cost.
Yusuke Suzuki
Comment 56 2016-05-19 14:51:14 PDT
(In reply to comment #54)
> [...]
> Do you think it's worth it to try to rework towards this approach now? It
> might be cheaper than finishing the current implementation and then
> reworking it, but I'll let you make the call.

Yes! Making JSC's generators much faster than the other engines' would be a very nice thing, and I would like to choose the fastest approach! So, before doing that, I would like to share and discuss one performance concern with you. I'll describe it below.

The current implementation (simplified):

function next(generator)
{
    // dispatching site. (CallSite)
    let result = generator.@body(generator, generator.@status);
    return result;
}

function body(generator, status)
{
    switch (status)
    {
    ...
    generator.@status = updatedStatus;
    return value;
    }
}

And the fragmented version:

function next(generator)
{
    // dispatching site. (CallSite)
    let result = generator.@body(generator);
    return result;
}

function body1(generator)
{
    ...
    generator.@body = body2;
    return value;
}

function body2(generator)
{
    ...
}

My concern is about inlining. A generator can be driven by a for-of loop, and inlining in that case would be super beneficial.

for (let value of generator()) {
    ...
}

This calls Generator.prototype.next() repeatedly. In that case, if we take the current version and the generator is figured out to be a JSConstant during these iterations, we can retrieve the body function from the JSConstant generator and inline the whole chain of callees. (I'm not sure this can be done in the current ToT, but I believe that using a pure get_by_id in the next function would let us implement it.)

But in the fragmented-function version, the dispatch causes a megamorphic call site in the next() function. Considering the case where there are many kinds of generators, this call site has too many call variants. This is the same in the current version, but in the current version we have a chance to determine one possible function by using the constant generator and the `generator.@body` lookup. (And if we get one possible function, we have a chance to inline it; DFGByteCodeParser::refineStatistically allows us to do so.) But in the fragmented-function version, `generator.@body` also has many variants, and I think we cannot determine the target of inlining.

This is my concern, and I would like to ask what you think. Maybe I misunderstand many things; please point them out when you find them.
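As context for why this loop matters, here is roughly what the for-of above desugars to (a sketch of the iteration protocol, assuming `generator` is the generator function from the example; the full for-of semantics also involve return()/throw() handling on abrupt completion):

const iterable = generator();
const iterator = iterable[Symbol.iterator](); // a generator object returns itself here
let step;
while (!(step = iterator.next()).done) {
    const value = step.value;
    // loop body: one next() call, and therefore one trip through the
    // dispatching site, per produced value
}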
Yusuke Suzuki
Comment 57 2016-06-07 22:50:53 PDT
Ping?

For the reason above, I'm planning to go with the switch-based implementation in the meantime.
Filip Pizlo
Comment 58 2016-06-07 22:54:01 PDT
(In reply to comment #56)
> [...]
> This is my concern, and I would like to ask what you think.

I don't think this is a valid concern. I believe that virtual dispatch on a next call is going to exhibit higher performance than a switch statement because:

- If the switch statement gets large, then it won't get inlined.
- If the next() call is polymorphic, it will still be inlined so long as the total size of the inlinees is smaller than the inlining size limit. Therefore, inlining will happen in exactly the same cases regardless. But if it doesn't happen, then the switch approach is strictly worse: first you pay the price for a call and then you pay the price for a switch, as opposed to having the call do the switch for you.
Filip Pizlo
Comment 59 2016-06-07 22:55:02 PDT
(In reply to comment #57)
> Ping?
>
> For the reason above, I'm planning to go with the switch-based
> implementation in the meantime.

Sorry for not responding. :-( I sometimes need to be pinged by e-mail if I really need to comment on something.
Yusuke Suzuki
Comment 60 2016-06-08 01:16:15 PDT
(In reply to comment #58)
> [...]
> Therefore, inlining will happen in exactly the same cases regardless. But
> if it doesn't happen, then the switch approach is strictly worse: first you
> pay the price for a call and then you pay the price for a switch, as opposed
> to having the call do the switch for you.

Thanks. The number of call variants at the next() function's dispatch site becomes large in both implementations if there are many kinds of generators. But I thought that in the switch case we would have a chance to pick one possible inlinee out of the many call variants by using refineStatistically() (in the current ToT implementation, this is not possible).

But the number of call variants depends on the benchmark (how generators are used), and maybe, before it becomes very large, the call site will be cleared by the GC's weak visiting. So if this becomes a problem, we can revisit the design.

Also, the switch-based implementation makes the function larger, which prevents inlining.

Currently, I'm considering the following rough design:

1. First, create an UnlinkedCodeBlock for the whole generator code, with a suspend marker opcode inserted. (Maybe suspend becomes a simple jump, to split basic blocks.)
2. Perform liveness analysis and store the liveness info into each suspend marker.
3. When creating a function fragment (at CodeBlock linking time), we retrieve from the above unlinked code block the BB subgraph consisting of the BBs dominated by the target suspend BB, and create the partial function from it.

The above is the simplest design, and I think it is good enough for the first implementation. Several interesting refinements can be considered. If there's a function like

function *gen()
{
    for (...) {
        yield;
    }

    super long code!
}

this will produce two fragmented functions,

function g1() {
    if (...) {
        // suspend
        return;
    }

    super long code!
}

and

function g2() {
    // resume
    increment;
    if (...) {
        // suspend
        return;
    }

    super long code!
}

The duplication in these functions can be removed by using tail calls:

function g1() {
    if (...) {
        // suspend
        return;
    }
    return mergePoint();
}

function g2() {
    // resume
    increment;
    if (...) {
        // suspend
        return;
    }
    return mergePoint();
}

function mergePoint()
{
    super long code!
}
Yusuke Suzuki
Comment 61 2016-06-08 03:36:44 PDT
(In reply to comment #59)
> Sorry for not responding. :-( I sometimes need to be pinged by e-mail if I
> really need to comment on something.

No problem. I'll mail you when I need your comment :D
Filip Pizlo
Comment 62 2016-06-08 07:12:54 PDT
(In reply to comment #60)
> [...]
> 3. When creating a function fragment (at CodeBlock linking time), we
> retrieve from the above unlinked code block the BB subgraph consisting of
> the BBs dominated by the target suspend BB, and create the partial function
> from it.

Note that the property you want is reachability, not dominance.

> [...]
> The duplication in these functions can be removed by using tail calls:
> [...]
> function mergePoint()
> {
>     super long code!
> }

Interesting. Duplicating super long code is definitely worth avoiding. Do you have an algorithm in mind for identifying the mergePoint?
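To illustrate the distinction (a made-up four-block CFG, not from the patch): a block can be reachable from a resume point without being dominated by it, and fragment construction needs the former.

// Hypothetical CFG: A is the function entry, C is a resume point (entered
// externally, so it has no in-edges), and D is the code after the loop.
// D is reachable from C but not dominated by it (the path A -> D bypasses C),
// so selecting blocks by dominance from C would wrongly exclude D. D must
// either be included in both fragments or become a shared mergePoint.
const successors = { A: ["B", "D"], B: ["D"], C: ["B"], D: [] };

function reachableFrom(start) {
    const seen = new Set([start]);
    const work = [start];
    while (work.length) {
        for (const next of successors[work.pop()]) {
            if (!seen.has(next)) {
                seen.add(next);
                work.push(next);
            }
        }
    }
    return seen;
}

console.log([...reachableFrom("A")]); // [ 'A', 'B', 'D' ]
console.log([...reachableFrom("C")]); // [ 'C', 'B', 'D' ]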
Yusuke Suzuki
Comment 63 2016-06-08 08:05:09 PDT
(In reply to comment #62)
> Note that the property you want is reachability, not dominance.

Oops, right.

> Interesting. Duplicating super long code is definitely worth avoiding. Do
> you have an algorithm in mind for identifying the mergePoint?

The super simple idea I've come up with is the following:

1. First, we calculate the reachability of each BB from the suspending BBs and the entry BB.
2. We investigate the edges, and if the reachability set of the src BB is different from the reachability set of the dst BB, this edge should be represented as a tail-call jump to a function. Mark the dst BB as an entry to a new function. And for this new entry BB, we additionally perform 1 and 2 repeatedly.

Maybe, it takes much time tough.

https://drive.google.com/file/d/177tgP4pZgE7I68L2McYbA3GpUzjwK98z-w/view?usp=sharing
https://drive.google.com/file/d/1V1tseE3MQrEg0Cemq0WpG5bR0KO0N889dQ/view?usp=sharing
Yusuke Suzuki
Comment 64 2016-06-08 08:10:14 PDT
(In reply to comment #63) > > Maybe, it takes much time tough. > https://drive.google.com/file/d/177tgP4pZgE7I68L2McYbA3GpUzjwK98z-w/ > view?usp=sharing > https://drive.google.com/file/d/1V1tseE3MQrEg0Cemq0WpG5bR0KO0N889dQ/ > view?usp=sharing /tough/though/
Yusuke Suzuki
Comment 65 2016-06-08 09:00:21 PDT
Like this. (Currently, not considered deeply yet.)

// functionEntryPoint is the entry BB of this whole function.
// resumeN is the BB where the generator will resume.
generatorEntries = Set([ functionEntryPoint, resume1, resume2, resume3, ... resumeN ])
generatorizeTasks = [ functionEntryPoint, resume1, resume2, resume3, ... resumeN ]
while (!generatorizeTasks.isEmpty()) {
    entry = generatorizeTasks.take();

    reachabilityTasks = [ entry ];
    entry.markReachableFrom(entry);
    while (!reachabilityTasks.isEmpty()) {
        node = reachabilityTasks.take();
        for (successor of node.successors()) {
            if (!successor.isReachableFrom(entry)) {
                successor.markReachableFrom(entry);
                reachabilityTasks.append(successor);

                if (successor.reachabilitySet != node.reachabilitySet) {
                    if (!generatorEntries.contain(successor)) {
                        // This is the new entry to the new function.
                        generatorEntries.add(successor);
                        generatorizeTasks.append(successor);
                    }
                }
            }
        }
    }
}
Filip Pizlo
Comment 66 2016-06-08 09:23:48 PDT
(In reply to comment #65)
> [...]

I just realized that we are trying super hard to invent multiple entry points. If a JSC CodeBlock could have a vector of entry points, where each entry point was a different bytecode offset, then your generator lowering could use this. Imagine if you could have a SideEntryExecutable, which pointed to another FunctionExecutable and just acted as a proxy for calls into that function at a different bytecode entry point. Then you would repoint the next function at different entry-point functions, which had different SideEntryExecutables that all shared the same underlying ScriptExecutable.

This would save us from having the switch dispatch problem. It would also save us from code duplication. And there wouldn't be any tail call overhead.

The easiest incremental first step is to use your original switch lowering, and then later replace the switch with the multiple entry points. Saam and I were talking about multiple entry points for solving other problems, so we will probably eventually have the infrastructure to do this.

I started thinking about this once I realized what the duplication would mean for some obvious code patterns. This would have quadratic explosion:

If (a)
    Yield
If (b)
    Yield
If (c)
    Yield
...

By the way, if you did want to go with the merge point approach, I think that the most precise algorithm would be iterated dominance frontiers, like for picking Phis.
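To make the explosion concrete (a hypothetical instance of the pattern above): with naive per-resume-point splitting and no merge points, each of the k resume fragments duplicates the entire remaining suffix of the body, so the total generated code grows as O(k^2) in the number of yields.

// Entry fragment: all three ifs. The fragment resuming after `yield 1`
// duplicates the last two ifs; the one after `yield 2` duplicates the last
// if. Total duplicated code is roughly 3 + 2 + 1 units for 3 yields.
function* quadratic(a, b, c) {
    if (a)
        yield 1;
    if (b)
        yield 2;
    if (c)
        yield 3;
}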
Yusuke Suzuki
Comment 67 2016-06-08 11:13:10 PDT
(In reply to comment #66)
> I just realized that we are trying super hard to invent multiple entry
> points. If a JSC CodeBlock could have a vector of entry points, where each
> entry point was a different bytecode offset, then your generator lowering
> could use this. Imagine if you could have a SideEntryExecutable, which
> pointed to another FunctionExecutable and just acted as a proxy for calls
> into that function at a different bytecode entry point. Then you would
> repoint the next function at different entry-point functions, which had
> different SideEntryExecutables that all shared the same underlying
> ScriptExecutable.

That sounds great.

> This would save us from having the switch dispatch problem. It would also
> save us from code duplication. And there wouldn't be any tail call overhead.

Yeah, nice. Still, there is one downside compared to the fragmented-function version: that version has a chance to split a generator into small functions. But I think that's OK; it is natural for large generator code to be represented as a large function.

> The easiest incremental first step is to use your original switch lowering,
> and then later replace the switch with the multiple entry points. Saam and I
> were talking about multiple entry points for solving other problems, so we
> will probably eventually have the infrastructure to do this.

Makes sense. So I'll go with the switch-based implementation first.

> I started thinking about this once I realized what the duplication would
> mean for some obvious code patterns. This would have quadratic explosion:
> [...]
> By the way, if you did want to go with the merge point approach, I think
> that the most precise algorithm would be iterated dominance frontiers, like
> for picking Phis.

Oh, right! The nodes we should treat as entry points are (function-entry U resume-points) U DF+(function-entry U resume-points), since the newly created entry points are themselves entries to functions, just like the function entry and the resume points.
Yusuke Suzuki
Comment 68 2016-07-03 12:50:29 PDT
Created attachment 282667 [details] Patch

WIP: just moves the generatorification code to the unlinked-code-block phase. It still uses the old op_save / op_resume implementations. We will drop the op_save / op_resume related code and introduce a transformation that changes op_save / op_resume into scope operations. And currently there is a lot of duplicated code; we will clean it up.
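For intuition, here is a JS-level sketch of what lowering op_save / op_resume to scope operations amounts to (hypothetical shapes only; the real transformation rewrites bytecode, emitting op_put_to_scope / op_get_from_scope against the generator's JSLexicalEnvironment, not JS source):

// One resume-to-yield step. `frame` stands in for the generator frame; its
// field names (i, total) are this sketch's invention.
function step(frame) {
    let i = frame.i;            // get_from_scope at the resume point
    let total = frame.total;    // (one load per live variable)

    total += i;                 // ...original body between resume and yield...
    i += 1;

    frame.i = i;                // put_to_scope at the save point
    frame.total = total;        // (one store per live variable)
    return { value: total, done: false };
}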
Yusuke Suzuki
Comment 69 2016-07-04 04:48:24 PDT
Created attachment 282707 [details] Patch WIP
Yusuke Suzuki
Comment 70 2016-07-04 06:33:14 PDT
Created attachment 282713 [details] Patch WIP
Yusuke Suzuki
Comment 71 2016-07-04 07:47:59 PDT
Created attachment 282719 [details] Patch WIP
Yusuke Suzuki
Comment 72 2016-07-11 05:14:50 PDT
Created attachment 283310 [details] Patch WIP: not refactored yet
WebKit Commit Bot
Comment 73 2016-07-11 05:16:51 PDT
Attachment 283310 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:31: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:176: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:183: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:201: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:205: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:342: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:343: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:344: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:347: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:354: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:355: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:356: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:358: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:359: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:360: One space before end of line comments [whitespace/comments] [5] Total errors found: 15 in 37 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 74 2016-07-12 12:21:22 PDT
Created attachment 283438 [details] Patch WIP
Yusuke Suzuki
Comment 75 2016-07-14 23:33:33 PDT
Created attachment 283739 [details] Patch WIP: OK, it works as expected! Still doing refactoring and testing
Yusuke Suzuki
Comment 76 2016-07-15 04:47:46 PDT
Created attachment 283753 [details] Patch WIP: adding OS X changes (xcodeproj)
WebKit Commit Bot
Comment 77 2016-07-15 04:52:54 PDT
Attachment 283753 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:122: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:77: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:84: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:142: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:149: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:177: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:181: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:278: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 8 in 38 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 78 2016-07-15 06:01:03 PDT
Comment on attachment 283753 [details] Patch Attachment 283753 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1685426 New failing tests: storage/indexeddb/modern/blob-cursor.html js/regress/generator-sunspider-access-nsieve.html
Build Bot
Comment 79 2016-07-15 06:01:09 PDT
Created attachment 283754 [details] Archive of layout-test-results from ews112 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-yosemite Platform: Mac OS X 10.10.5
Yusuke Suzuki
Comment 80 2016-07-15 09:36:51 PDT
Created attachment 283765 [details] Patch WIP: Update, more and more refactoring
WebKit Commit Bot
Comment 81 2016-07-15 09:45:22 PDT
Attachment 283765 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:121: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:77: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:84: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:144: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:151: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:179: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:183: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:280: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 8 in 38 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 82 2016-07-19 04:24:47 PDT
Created attachment 283991 [details] Patch WIP
WebKit Commit Bot
Comment 83 2016-07-19 04:27:38 PDT
Attachment 283991 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:138: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:145: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:173: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:177: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:275: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 84 2016-07-19 06:46:11 PDT
Created attachment 284001 [details] Patch

OK, ready for review. EWS may fail to build this, since Bytecodes.h is not regenerated even though generate-bytecode-files has been updated.
WebKit Commit Bot
Comment 85 2016-07-19 06:49:25 PDT
Attachment 284001 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:207: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:214: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:242: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:246: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:418: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Keith Miller
Comment 86 2016-07-19 09:48:42 PDT
Comment on attachment 284001 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284001&action=review

I have a couple of grammar fixes for the ChangeLog, but it looks like EWS is sad. I'll take a deeper look later.

> Source/JavaScriptCore/ChangeLog:29
> + 3. Second, we perform the liveness analysis again, but at that time, we explicitly ignore the edges between the save points and the merge points. This analysis,

but at that time => but this time.

> Source/JavaScriptCore/ChangeLog:30
> + referred to as the resumed variables analysis, extracts the liveness at merge point. This liveness information only considers the reachable blocks from the

at each merge point.
Yusuke Suzuki
Comment 87 2016-07-20 05:34:30 PDT
Comment on attachment 284001 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284001&action=review

Super thanks! EWS may fail to build this, since Bytecodes.h is not regenerated even though generate-bytecode-files has been updated...

>> Source/JavaScriptCore/ChangeLog:29
>> + 3. Second, we perform the liveness analysis again, but at that time, we explicitly ignore the edges between the save points and the merge points. This analysis,
>
> but at that time => but this time.

Fixed.

>> Source/JavaScriptCore/ChangeLog:30
>> + referred to as the resumed variables analysis, extracts the liveness at merge point. This liveness information only considers the reachable blocks from the
>
> at each merge point.

Fixed.
Yusuke Suzuki
Comment 88 2016-07-20 05:36:07 PDT
Created attachment 284097 [details] Patch Fixed according to the review
WebKit Commit Bot
Comment 89 2016-07-20 05:37:43 PDT
Attachment 284097 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:207: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:214: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:242: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:246: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:418: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 90 2016-07-21 06:38:46 PDT
Created attachment 284211 [details] Patch Update Bytecodes.h dependency
WebKit Commit Bot
Comment 91 2016-07-21 06:41:00 PDT
Attachment 284211 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:207: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:214: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:242: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:246: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:418: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 92 2016-07-21 06:48:04 PDT
Created attachment 284212 [details] Patch Update Bytecodes.h dependency part 2
Yusuke Suzuki
Comment 93 2016-07-21 06:55:57 PDT
Ah, OK. Bytecodes.h is sometimes not regenerated because generate-bytecode-files checks a SHA1 hash, and only the input files are included in that hash. This patch changes generate-bytecode-files itself, so the change is not reflected in the SHA1 hash.
Yusuke Suzuki
Comment 94 2016-07-21 07:07:10 PDT
Created attachment 284213 [details] Patch Update Bytecodes.h dependency
WebKit Commit Bot
Comment 95 2016-07-21 07:09:44 PDT
Attachment 284213 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:207: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:214: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:242: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:246: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:418: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 96 2016-07-21 07:10:39 PDT
Comment on attachment 284213 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284213&action=review

Maybe it will pass EWS now. I've just added generate-bytecode-files itself to the SHA1 of the output.

> Source/JavaScriptCore/bytecode/BytecodeGraph.h:27
> +#define BytecodeGraph_h

I'll update this to #pragma once.
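The staleness check being described works roughly like the following sketch (illustrative Node.js with a hypothetical input file name; the real generate-bytecode-files is a build script in the JSC tree, and its exact inputs differ):

const crypto = require("crypto");
const fs = require("fs");

// Hash every input that should invalidate the generated Bytecodes.h.
function sha1Of(paths) {
    const hash = crypto.createHash("sha1");
    for (const path of paths)
        hash.update(fs.readFileSync(path));
    return hash.digest("hex");
}

// Before the fix, only the declared inputs were hashed, so editing the
// generator script itself did not change the hash and stale output survived.
// The fix adds the script to its own input set:
const inputs = ["BytecodeList.json", __filename];
const stamp = sha1Of(inputs);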
Yusuke Suzuki
Comment 97 2016-07-21 07:57:10 PDT
Created attachment 284215 [details] Patch Windows build fix
WebKit Commit Bot
Comment 98 2016-07-21 08:00:20 PDT
Attachment 284215 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:207: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:214: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:242: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:246: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:418: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 99 2016-07-21 08:27:40 PDT
Comment on attachment 284215 [details] Patch

Oh, I'll fix the patch slightly.
Yusuke Suzuki
Comment 100 2016-07-21 10:26:11 PDT
WebKit Commit Bot
Comment 101 2016-07-21 10:28:39 PDT
Attachment 284224 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:210: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:215: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:224: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:231: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:398: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 42 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 102 2016-07-25 07:40:15 PDT
Created attachment 284482 [details] Patch rebaseline
WebKit Commit Bot
Comment 103 2016-07-25 07:43:41 PDT
Attachment 284482 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysis.cpp:76: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:211: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:216: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:225: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:232: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:399: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 9 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 104 2016-07-25 07:46:02 PDT
Ping? :)
Yusuke Suzuki
Comment 105 2016-07-25 07:48:01 PDT
Comment on attachment 284482 [details] Patch

I'll perform a slight cleanup and upload the patch soon.
Yusuke Suzuki
Comment 106 2016-07-25 08:50:55 PDT
WebKit Commit Bot
Comment 107 2016-07-25 08:54:31 PDT
Attachment 284487 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:292: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:91: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:98: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:118: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:122: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 7 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 108 2016-07-25 08:57:05 PDT
WebKit Commit Bot
Comment 109 2016-07-25 08:59:40 PDT
Attachment 284488 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:291: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:91: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:98: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:118: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:122: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 7 in 43 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 110 2016-07-25 09:05:00 PDT
OK, ping? :)
Saam Barati
Comment 111 2016-08-03 01:01:28 PDT
Comment on attachment 284488 [details] Patch I started reading this. I'll continue tmrw.
Saam Barati
Comment 112 2016-08-07 17:54:04 PDT
Comment on attachment 284488 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284488&action=review

I'll keep reviewing. I'm still trying to fully understand what all the code does, but I've reviewed some of it, so I'm just posting initial comments.

> Source/JavaScriptCore/ChangeLog:30
> + At op_save, we should save the variables that (i) will be used after the merge point AND (ii) the newly defined variables from the previous op_resume. So, we propagate

If a variable is both def'd and killed between merge points, will we not save it?

> Source/JavaScriptCore/ChangeLog:40
> + the variables that is saved conservatively in (3).

Why is this needed if the variable is dead?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:212
> + // This produces the conservative results for the question, "which variables should be saved and resumed?".

What causes it to be conservative?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:238
> + changed |= outDidChange;

Could this cause the fix point to end early? Is it possible that processing the block would cause the "in" to change, but that we won't ever go over it again visiting that block as a successor? Maybe we also need to see if the block's "in" changes?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:267
> + if (opcodeID == op_resume) {

Are there not some registers before the switch statement that jumps to the op_resume that def things of value?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:330
> + // no-jumps & no r1 defs.

No jumps?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:336
> + // We consider the basic blocks within the try range are considered as the implicit predecessors of the handlers.

By handler do you mean catch handler?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:342
> + // This is a little bit conservative: the try range may not cover all the basic block. But since the out set status at
> + // the tail of the basic block is always larger than the out set status in the middle of the basic blocks, this does not

What's the "out set status" here? Is this live-out data? If so, I don't think this assertion holds. There may be more live variables in the middle of a block than at the tail, right? What would stop that from being true?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:354
> + FastBitVector predecessors;

I wonder if B3's IndexSet is a more apt data structure?

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:384
> + do {

So this analysis is just computing the defs that reach any block, correct? (except if the predecessor block contained a save)

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:391
> + // Merge the out set of the incoming predecessors unless the current block starts
> + // with the merge point and the predecessor ends with the save point.

I see that the code is doing this; the comment should state *why* it's doing this.

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:393
> + if (!(mergeBlock && m_generatorification.isSaveBlock(predecessor)))
> + newIn.merge(predecessor->out());

I have a question: let's say that mergeBlock is true. Isn't this block only reachable through two paths? The "save" path and the "resume" path. We don't account for the "save" path, and the "resume" path will clear all the bits. So shouldn't newIn just be the empty set here?
> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:453
> + // Let's declare the uses based on the saved variables analysis information.

I see that the code is doing this. It'd be good for the comment to state why, etc.
Yusuke Suzuki
Comment 113 2016-08-08 07:53:12 PDT
Comment on attachment 284488 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284488&action=review

Thanks! I'll upload the patch with the corresponding fixes.

>> Source/JavaScriptCore/ChangeLog:30
>> + At op_save, we should save the variables that (i) will be used after the merge point AND (ii) the newly defined variables from the previous op_resume. So, we propagate
>
> If a variable is both def'd and killed between merge points, will we not save it?

Yes, we don't save it. This is handled by taking the bitwise AND of the (2) and (3) results. The (2) bit mask tells us which variables are live, and if a variable is killed, (2) does not include it. This is safe because (2) already tells us the variable is killed, so it won't be used.

>> Source/JavaScriptCore/ChangeLog:40
>> + the variables that is saved conservatively in (3).
>
> Why is this needed if the variable is dead?

Performing the (4) phase without the (3) information extracts the exact set of resumed variables, which would not contain the variables that are saved conservatively in (3). So, "resume r1, r2, r3 without an r4 def ... save r4" could occur. If the op_save tells us it uses r4, the resume op will def r4 for that.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:212
>> + // This produces the conservative results for the question, "which variables should be saved and resumed?".
>
> What causes it to be conservative?

Let's consider this example:

function *gen() {
    var target = 42;
    yield 30;     // Here, we should save "target". That is OK. But at the resume point, we unnecessarily resume "target".
    yield 40;     // And here, we unnecessarily save "target", since it is *def*ined by the previous resume.
    yield target; // And reference "target" here.
}

ToT (without this patch) compiles it into the following bytecodes. Since ToT just uses this liveness analysis, this bytecode shows why I say it is conservative.
[ 0] enter
[ 1] get_scope loc3
[ 3] mov loc4, loc3
[ 6] switch_imm 0, 145(->151), arg2
[ 10] mov loc6, Undefined(const0)
[ 13] stricteq loc7, arg4, Int32: 0(const1)
[ 17] jtrue loc7, 14(->31)
[ 20] stricteq loc7, arg4, Int32: 2(const2)
[ 24] jtrue loc7, 5(->29)
[ 27] ret arg3
[ 29] throw arg3
[ 31] mov loc5, Int32: 42(const3)
[ 34] mov loc7, Int32: 30(const4)
[ 37] put_by_id arg1, PrivateSymbol.generatorState(@id0), Int32: 1(const5), Bottom
[ 46] save arg1, -----1----(@live0), 9(->55)
[ 50] ret loc7
[ 52] resume arg1, -----1----(@live0)
[ 55] stricteq loc8, arg4, Int32: 0(const1)
[ 59] jtrue loc8, 14(->73)
[ 62] stricteq loc8, arg4, Int32: 2(const2)
[ 66] jtrue loc8, 5(->71)
[ 69] ret arg3
[ 71] throw arg3
[ 73] mov loc7, Int32: 40(const6)
[ 76] put_by_id arg1, PrivateSymbol.generatorState(@id0), Int32: 2(const2), Bottom
[ 85] save arg1, -----1----(@live1), 9(->94)
[ 89] ret loc7
[ 91] resume arg1, -----1----(@live1)
[ 94] stricteq loc8, arg4, Int32: 0(const1)
[ 98] jtrue loc8, 14(->112)
[ 101] stricteq loc8, arg4, Int32: 2(const2)
[ 105] jtrue loc8, 5(->110)
[ 108] ret arg3
[ 110] throw arg3
[ 112] mov loc7, loc5
[ 115] put_by_id arg1, PrivateSymbol.generatorState(@id0), Int32: 3(const7), Bottom
[ 124] save arg1, ----------(@live2), 9(->133)
[ 128] ret loc7
[ 130] resume arg1, ----------(@live2)
[ 133] stricteq loc8, arg4, Int32: 0(const1)
[ 137] jtrue loc8, 14(->151)
[ 140] stricteq loc8, arg4, Int32: 2(const2)
[ 144] jtrue loc8, 5(->149)
[ 147] ret arg3
[ 149] throw arg3
[ 151] ret Undefined(const0)

Focusing on the part corresponding to `yield 40;`:

[ 52] resume arg1, -----1----(@live0)
[ 55] stricteq loc8, arg4, Int32: 0(const1)
[ 59] jtrue loc8, 14(->73)
[ 62] stricteq loc8, arg4, Int32: 2(const2)
[ 66] jtrue loc8, 5(->71)
[ 69] ret arg3
[ 71] throw arg3
[ 73] mov loc7, Int32: 40(const6)
[ 76] put_by_id arg1, PrivateSymbol.generatorState(@id0), Int32: 2(const2), Bottom
[ 85] save arg1, -----1----(@live1), 9(->94)
[ 89] ret loc7

Looking into it, we can see that while op_resume resumes r5, it is not used except by op_save. This is the problem: if a variable is used later in the generator code, it is repeatedly resumed and saved even where it is not actually used. Saving and resuming the variables that this liveness analysis reports just works, but it unnecessarily saves and resumes variables. That's why I said "conservative" here. The comment in BytecodeGeneratorification::run() also mentions this.

NOTE: The current ToT code allows get_scope and mov before switch_imm; this patch moves switch_imm to just after op_enter.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:238
>> + changed |= outDidChange;
>
> Could this cause the fix point to end early?
> Is it possible that processing the block would cause the "in" to change, but that we won't ever go over it again visiting that block as a successor?
> Maybe we also need to see if the block's "in" changes?

This is the same logic as the existing BytecodeLivenessAnalysis, but you're right: I think we should use the result of `computeLocalLivenessForBlock`, as I do in the forward analysis in L403. Now I think we have a bug here (and in the current BytecodeLivenessAnalysis): if in() is changed, we should recalculate the graph. So the code should be

block->out().set(newOut);
changed |= computeLocalLivenessForBlock(m_generatorification.graph(), block.get()); // Returns true if block->in() is changed.
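For what it's worth, here is a minimal self-contained sketch of the fixpoint shape being discussed, where the per-block transfer function reports whether in() changed. All types and names are simplified stand-ins for the real JSC classes (BytecodeBasicBlock, FastBitVector), not the actual API:

#include <cstddef>
#include <vector>

// Simplified stand-ins for the real JSC types (illustration only).
struct Block {
    std::vector<bool> in, out, use, def; // One bit per variable.
    std::vector<Block*> successors;
};

// Recompute block.in from block.out: in = use | (out & ~def).
// Returning whether in() changed is the point under discussion: the outer
// fixpoint must observe changes to in(), not only changes to out().
static bool computeLocalLivenessForBlock(Block& block, size_t numVars)
{
    bool changed = false;
    for (size_t i = 0; i < numVars; ++i) {
        bool newIn = block.use[i] || (block.out[i] && !block.def[i]);
        if (newIn != block.in[i]) {
            block.in[i] = newIn;
            changed = true;
        }
    }
    return changed;
}

// Backward liveness fixpoint over the whole graph.
static void computeLiveness(std::vector<Block*>& blocksInPostOrder, size_t numVars)
{
    bool changed;
    do {
        changed = false;
        for (Block* block : blocksInPostOrder) {
            for (Block* successor : block->successors) {
                for (size_t i = 0; i < numVars; ++i)
                    block->out[i] = block->out[i] || successor->in[i]; // out = union of successors' in.
            }
            changed |= computeLocalLivenessForBlock(*block, numVars);
        }
    } while (changed);
}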
>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:267
>> + if (opcodeID == op_resume) {
>
> Are there not some registers before the switch statement that jumps to the op_resume that def things of value?

We guarantee that there are no defs that should be saved: op_switch_imm is placed just after op_enter, and the jump target is op_resume.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:330
>> + // no-jumps & no r1 defs.
>
> No jumps?

Ah, "no-jumps" is not necessary. If there are no reachable r1 defs at the op_save, the op_save need not save r1.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:336
>> + // We consider the basic blocks within the try range are considered as the implicit predecessors of the handlers.
>
> By handler do you mean catch handler?

Right. The catch and finally handlers.

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:341
> + // This is a little bit conservative: the try range may not cover all the basic block. But since the out set status at

This "conservative" in the saved variables analysis has a different cause from the conservativeness in the liveness analysis.

start of BB1
...
def r1  ^
def r2  |  => Jump to the handler BB2 (try-range). This range does not cover `def r4`.
def r3  v
def r4
end of BB1 (defs: r1, r2, r3, r4)

start of BB2

(Here, we use the whole BB1 as the implicit predecessor BB of BB2. So, we consider r1, r2, r3, r4 as def'd in the predecessor. However, precisely, only r1, r2, r3 are def'd; we treat r4 as def'd too for simplicity of the analysis. This is what I called "conservative" in this saved variables analysis.)

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:342
>> + // the tail of the basic block is always larger than the out set status in the middle of the basic blocks, this does not
>
> What's the "out set status" here?
> Is this live-out data? If so, I don't think this assertion holds. There may be more live variables in the middle of a block than at the tail, right? What would stop that from being true?

No, it's not live-out data. It's the set of defs that may reach the end of the BB. We don't kill defs during this analysis, except at op_resume.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:354
>> + FastBitVector predecessors;
>
> I wonder if B3's IndexSet is a more apt data structure?

Looks nice. Moving B3's IndexSet and IndexMap to WTF and using them here would be good. I'll do that.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:384
>> + do {
>
> So this analysis is just computing the defs that reach any block, correct? (Except if the predecessor block contained a save.)

Right. This information allows op_save to only save reached defs.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:391
>> + // with the merge point and the predecessor ends with the save point.
>
> I see that the code is doing this; the comment should state *why* it's doing this.

I've described why we would like to drop the edge here.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:393
>> + newIn.merge(predecessor->out());
>
> I have a question: let's say that mergeBlock is true.
> Isn't this block only reachable through two paths, the "save" path and the "resume" path?
> We don't account for the "save" path, and the "resume" path will clear all the bits. So shouldn't
> newIn just be the empty set here?

Ah, right!
In the merge block, we first need to check the resume mode, so the BB starting with the merge point is only used for the generator-resume case, and there is no incoming edge into this merge block except for the resume and save edges.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:453
>> + // Let's declare the uses based on the saved variables analysis information.
>
> I see that the code is doing this. It'd be good for the comment to state why, etc.

I've fixed the comment :)

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:322
> + // Before emitting any of the actual code except for op_enter, emit a generator prologue that contains jump based on a generator's state.

NOTE: Just after we emit op_enter, we emit op_switch_imm for the generator labels. It jumps to op_resume / the initial basic block.
Yusuke Suzuki
Comment 114 2016-08-08 07:57:14 PDT
WebKit Commit Bot
Comment 115 2016-08-08 07:58:55 PDT
Attachment 285565 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:91: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:98: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:118: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:122: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:289: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/WTF/wtf/IndexedContainerIterator.h:48: This { should be at the end of the previous line [whitespace/braces] [4]

Total errors found: 8 in 69 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 116 2016-08-08 08:41:54 PDT
Comment on attachment 285565 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285565&action=review

> Source/JavaScriptCore/ChangeLog:28
> + to be saved and resumed at each point.

NOTE: If we ignore the inefficiency of LLInt and the baseline JIT, we can just use (2)'s result, since (2) gives conservative (just works) liveness data. But if we would like to get the same effect in DFG as (3) & (4), we need to annotate the inserted get_from_scope/put_to_scope as generator-variable related. This is because, given

1: GetClosureVar("g1")
2: Call()
3: PutClosureVar("g1", @1)

we cannot drop @1 and @2 without the "generator-variables" annotation, since the @2 call may change the ClosureVars if @1 and @2 are not related to generator variables. Currently, I take the approach of analyzing this in the generatorification phase.
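To illustrate the aliasing problem in C++ terms (this is only an analogy for the GetClosureVar / Call / PutClosureVar sequence above, not JSC code; all names are hypothetical):

int g1;      // Stands in for the closure variable "g1".
void call(); // Opaque call: without further information, it may read or write g1.

int example()
{
    int t = g1; // @1 GetClosureVar
    call();     // @2 Call
    g1 = t;     // @3 PutClosureVar
    // Without a "restrict"-like guarantee that call() cannot touch g1,
    // neither the load nor the store-back can be eliminated, because
    // call() might have overwritten g1 in between.
    return t;
}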
Filip Pizlo
Comment 117 2016-08-08 09:41:03 PDT
(In reply to comment #116)
> NOTE: If we ignore the inefficiency of LLInt and the baseline JIT, we can just use (2)'s result, since (2) gives conservative (just works) liveness data. [...]
> Currently, I take the approach of analyzing this in the generatorification phase.

Does the current patch do this or is this a future feature?

In my experience it's best to implement CSE improvements after everything is already working. Also, I don't think that annotating these as "closure" vars is really the right thing. In reality, the annotation you're looking for is something like the C++ "restrict" annotation, which means: the thing I point to cannot be pointed to by anyone else. Maybe we can even call it "restrict" since that's already a term of art.
Filip Pizlo
Comment 118 2016-08-08 09:52:27 PDT
Comment on attachment 285565 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285565&action=review

> Source/JavaScriptCore/ChangeLog:17
> + entry -> generator switch ----> op_save -> merge point -> ...
> +              |                                  ^
> +              +--------------> op_resume -------+

Is it really necessary to have separate op_save/op_resume instructions in the initial generation? It seems like this would work just as well:

function* foo()
{
    fizz();
    yield;
    buzz();
}

becomes:

op_call fizz
op_yield
op_call buzz

Then you can find the set of live variables at any op_yield. I think this would reduce the number of analyses you have to do. Right now, instead of op_yield you have a complex control flow structure that is a union of the control flow before generatorification and the control flow after generatorification, and so your analyses have to reason about which edges to ignore.

Then, once you've analyzed everything, you can turn the op_yield into a ret followed by a jump target, and convert the top of the function to be a switch.

It's hard for me to review this patch without understanding why op_save/op_resume are part of the IR. To me, they seem to just make everything more complicated.

> Source/JavaScriptCore/ChangeLog:30
> + At op_save, we should save the variables that (i) will be used after the merge point AND (ii) the newly defined variables from the previous op_resume. So, we propagate

Why are the newly defined variables at the previous op_resume relevant? Intuitively, liveness should suffice.

> Source/JavaScriptCore/ChangeLog:78
> + Basic 38.61 ms ± 1.65 ms 23.97 ms ± 0.38 ms 3262.46 ms ± 74.12 ms

NICE!
Yusuke Suzuki
Comment 119 2016-08-08 11:15:04 PDT
Comment on attachment 285565 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285565&action=review

Thanks!

>> Source/JavaScriptCore/ChangeLog:17
>> + +--------------> op_resume -----+
>
> Is it really necessary to have separate op_save/op_resume instructions in the initial generation? [...]
> It's hard for me to review this patch without understanding why op_save/op_resume are part of the IR. To me, they seem to just make everything more complicated.

Nice. It looks like this op_yield approach simplifies the code. If we take this approach, instead of ignoring the edge, the use & def declaration of op_yield would be changed in each analysis. (That is, if we take the (3) approach described below. If we take the (1) or (2) approach, the code here becomes simpler, but LLInt / Baseline get inefficient implementations.)

>>> Source/JavaScriptCore/ChangeLog:28
>>> + to be saved and resumed at each point.
>>
>> NOTE: If we ignore the inefficiency of LLInt and the baseline JIT, we can just use (2)'s result, since (2) gives conservative (just works) liveness data. [...]
>
> Does the current patch do this or is this a future feature?
>
> In my experience it's best to implement CSE improvements after everything is already working. Also, I don't think that annotating these as "closure" vars is really the right thing. In reality, the annotation you're looking for is something like the C++ "restrict" annotation, which means: the thing I point to cannot be pointed to by anyone else. Maybe we can even call it "restrict" since that's already a term of art.

The above DFG handling is not included in this patch. There are roughly 3 design choices (and the first one is a transient step toward the final design). Currently, this patch takes the third (3) option, but I think the first choice would be good for the first implementation. Fortunately, the (3) implementation already exists here, so if we implement (2) in a later patch, we can compare the performance.

1. Use liveness data

Just use the liveness data. It's the simplest (and what current ToT does), but it easily causes many unnecessary saves and resumes. See the sketch after this list.

function *gen() {
    def r1
    def r2
    def r3
    yield 42 // r1, r2, r3 are live => saved and resumed
    yield 43 // r1, r2, r3 are live => saved and resumed
    yield 44 // r1, r2, r3 are live => saved and resumed
    ...      // if we add more yields here, you can see more unnecessary saves and resumes!
    yield 45 // r1, r2, r3 are live => saved and resumed
    use r1
    use r2
    use r3
}

In the above example, we repeatedly save and resume r1, r2, r3. Except for the first save and the last resume, they are completely unnecessary.

2. Use (1) in LLInt and Baseline and drop unnecessary saves and resumes in DFG and later

This is the method explained in the above comment. In that case, we annotate the get_from_scope/put_to_scope *bytecodes* and convert them to DFG nodes in DFGByteCodeParser with a "restrict" annotation. (Maybe the simplest way is producing GetClosureVar / PutClosureVar with "restrict" opinfo. In that case, we should carefully load the values from the scope when an OSR exit occurs.)

Pros:
+ Bytecode generation is a little bit simplified. (2 of the 3 analyses are dropped.)
Cons:
+ DFG needs to handle "restrict" nodes additionally.
+ LLInt and Baseline must work with unnecessary saves and resumes.

3. Drop unnecessary saves and resumes in the generatorification phase (bytecode level)

This is the method currently used in this patch. In addition to the liveness analysis, we drive one forward analysis and one backward analysis to extract which saves and resumes are unnecessary, and do not emit them in the generatorification phase.

Pros:
+ We need to do nothing in the DFG phase.
+ LLInt and Baseline also receive the benefit.
Cons:
+ We need to drive 2 more analyses in the bytecode generatorification phase.

>> Source/JavaScriptCore/ChangeLog:30
>> + At op_save, we should save the variables that (i) will be used after the merge point AND (ii) the newly defined variables from the previous op_resume. So, we propagate
>
> Why are the newly defined variables at the previous op_resume relevant? Intuitively, liveness should suffice.

In order to filter out the unnecessary saves (explained in the above comment) from the liveness.
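As a rough self-contained sketch of the forward half of option 3 — propagating which defs reach each block, while ignoring the save-to-merge edge — consider the following. All names and types are simplified stand-ins for the real pass (which works on JSC bytecode basic blocks and FastBitVector), not the actual API:

#include <cstddef>
#include <vector>

// Simplified stand-ins (illustration only).
struct GenBlock {
    std::vector<bool> in, out, defs; // One bit per variable.
    std::vector<GenBlock*> predecessors;
    bool endsWithSave = false;       // Block ends with op_save.
    bool startsAtMergePoint = false; // Block starts at a save/resume merge point.
};

// Forward "reaching defs" pass: a variable needs to be stored at an op_save
// only if some def of it reaches that save within the current next()
// invocation. The edge from a save block into a merge point is dropped,
// because crossing it means the generator was suspended and resumed.
static void computeReachingDefs(std::vector<GenBlock*>& blocksInReversePostOrder, size_t numVars)
{
    bool changed;
    do {
        changed = false;
        for (GenBlock* block : blocksInReversePostOrder) {
            std::vector<bool> newIn(numVars, false);
            for (GenBlock* predecessor : block->predecessors) {
                if (block->startsAtMergePoint && predecessor->endsWithSave)
                    continue; // Ignore the save -> merge edge.
                for (size_t i = 0; i < numVars; ++i)
                    newIn[i] = newIn[i] || predecessor->out[i];
            }
            if (newIn != block->in) {
                block->in = newIn;
                changed = true;
            }
            for (size_t i = 0; i < numVars; ++i)
                block->out[i] = block->in[i] || block->defs[i]; // Defs accumulate; nothing is killed here (op_resume handling omitted).
        }
    } while (changed);
}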
Filip Pizlo
Comment 120 2016-08-08 11:22:53 PDT
(In reply to comment #119)
> Nice. It looks like this op_yield approach simplifies the code.
> If we take this approach, instead of ignoring the edge, the use & def declaration of op_yield would be changed in each analysis.
> [...]

Actually, yield does not use or def anything, and I think it would be very confusing to have any analysis say otherwise.

My understanding is that you're really doing this:

1) Determine what is live at each yield and prepare to emit put_to_scope/get_from_scope around it.
2) Perform a CSE over the put_to_scope/get_from_scope that you would have inserted. Usually, CSE is a backwards analysis.

> [...]
> In order to filter out the unnecessary saves (explained in the above comment) from the liveness.

Gotcha.
Yusuke Suzuki
Comment 121 2016-08-08 19:54:32 PDT
Comment on attachment 285565 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285565&action=review

> Actually, yield does not use or def anything, and I think it would be very confusing to have any analysis say otherwise.
>
> My understanding is that you're really doing this:
>
> 1) Determine what is live at each yield and prepare to emit put_to_scope/get_from_scope around it.
> 2) Perform a CSE over the put_to_scope/get_from_scope that you would have inserted. Usually, CSE is a backwards analysis.

1 is correct, but 2 is different. I'll explain it here. After considering it, I think the current op_save / op_resume design is still nice; let me describe why.

Yup, as you said, yield does not use or def anything. (To be precise, yield uses the returned value (`yield v`) and defs the next(...) value (`v = yield a`), but that is not so related to the current topic.) But if I took this op_yield design, we would still need to change the use-def declaration of op_yield, which is tricky.

In this patch, we took the (3) option from the previous comment. It drives 3 analyses: the liveness analysis, the saved variables analysis, and the resumed variables analysis. The first, the liveness analysis, assumes that the graph is something like the op_yield approach. But the other 2 analyses assume that the graph is the final generator graph (switch etc.). Currently, we achieve this by dropping the merge-save edge in the latter 2 analyses. This easily changes the graph from the op_yield-like one to the final graph shape. If we used the op_yield approach and wanted to change the graph without reconstructing the whole BBs, the easiest way would be changing op_yield's use-def: clear all the defs at op_yield. Comparing the above 2 approaches, I think the current union-of-graphs approach is rather simple, since we achieve the result by just dropping merge-save edges.

I would also like to describe why we need 2 graphs. This is because of the core concept of the unnecessary saved and resumed variables: a given live variable may be used beyond the generator-next-call, and if so, this variable may not be live in the current generator-next-call. While the liveness can be calculated in the op_yield graph, the liveness in the generator-next-call-split graph must be calculated in the final graph.

So now, this patch performs:

1. Determine what is live at each yield and prepare to emit put_to_scope/get_from_scope around it.
2. Change the graph shape from the op_yield style to the final graph style.
3. Calculate saves in each generator-next-call's subgraph.
4. Calculate resumes in each generator-next-call's subgraph.

For example,

function *gen() {
    def r1
    def r2
    def r3
    yield 42 // r1, r2, r3 are live => saved and resumed
    yield 43 // r1, r2, r3 are live => saved and resumed
    yield 44 // r1, r2, r3 are live => saved and resumed
    yield 45 // r1, r2, r3 are live => saved and resumed
    use r1
    use r2
    use r3
}

First, we calculate the liveness in gen. Next, we change the graph to the following and extract the variables that should be resumed and saved. (This graph change is done just by dropping the merge-save edge.)

function *gen() {
    #BB0
    switch -> BB1, BB2, BB3, BB4, BB5

    #BB1
    def r1
    def r2
    def r3
    yield_ret 42 # r1, r2, r3 should be saved (by propagating r1, r2, r3 defs)

    #BB2
    yield_def    # no!
    yield_ret 43 # no!

    #BB3
    yield_def    # no!
    yield_ret 44 # no!

    #BB4
    yield_def    # no!
    yield_ret 45 # no!

    #BB5
    yield_def    # r1, r2, r3 should be resumed
    use r1
    use r2
    use r3
}
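Tying the analyses together: as discussed above for the ChangeLog, the set an op_save actually stores is the bitwise AND of the liveness result (2) with the reaching-defs result (3). A minimal sketch (hypothetical names; the real code uses FastBitVector):

#include <cstddef>
#include <vector>

// A variable is stored at an op_save only if it is live after the merge
// point (2) AND some def of it reaches this save in the current next()
// invocation (3). A variable that is live across the yield but has no def
// reaching this save was already saved earlier and need not be saved again.
static std::vector<bool> variablesToSave(const std::vector<bool>& liveAfterMergePoint, const std::vector<bool>& defsReachingSave)
{
    std::vector<bool> result(liveAfterMergePoint.size(), false);
    for (size_t i = 0; i < result.size(); ++i)
        result[i] = liveAfterMergePoint[i] && defsReachingSave[i];
    return result;
}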
Filip Pizlo
Comment 122 2016-08-08 19:59:43 PDT
(In reply to comment #121)
> [...]
> First, we calculate the liveness in gen. Next, we change the graph to the following and extract the variables that should be resumed and saved. (This graph change is done just by dropping the merge-save edge.)
> [...]

I see. I think that your patch is implementing two things at once:

1) A simplification to the current generator implementation that more naturally permits the DFG to compile generators.

2) An optimization to the current generator implementation that removes some of the need to save/restore variables.

Do you know if (2) is needed? If each op_yield emits put_to_scope/get_from_scope instructions for each live variable even when it's not needed, then what happens to performance on the various benchmarks?
Yusuke Suzuki
Comment 123 2016-08-08 20:15:14 PDT
Comment on attachment 285565 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285565&action=review

> I see. I think that your patch is implementing two things at once:
>
> 1) A simplification to the current generator implementation that more naturally permits the DFG to compile generators.
>
> 2) An optimization to the current generator implementation that removes some of the need to save/restore variables.
>
> Do you know if (2) is needed? If each op_yield emits put_to_scope/get_from_scope instructions for each live variable even when it's not needed, then what happens to performance on the various benchmarks?

I'll measure this effect, and once it is measured, I'll paste the numbers here! (Initially, I introduced this optimization because I saw so many unnecessary saves and resumes in ES6SampleBench/Basic.)
Filip Pizlo
Comment 124 2016-08-08 20:59:40 PDT
Just some context for why I'm asking for those numbers: my intuition is that most functions have a small number of long-lived locals and that most calls to yield will not be one right after the other. Now, it's possible to occasionally see functions with a large number of long-lived locals, and it's also possible to see lots of calls to yield in a row. But unless *both* of these things happen, your optimization may not have a large effect.

For example, if you have two calls to yield that are far apart, then even with no changes to locals in between them, your optimization simply won't matter: it's likely that the execution time will be dominated by something other than those load->store combos. I would expect that you would need lots of locals, and very little work between yields, for this to make a big difference.

But separately from that, let's assume that this optimization is profitable for common benchmarks, and consider what the simplest mechanism is for achieving the optimization. Remember, the game with writing optimizations is *not* to write the most comprehensive version of the optimization - but one that succeeds in those cases that are likely to arise in real code.

You argue that op_yield does not permit reasoning about the control flow in the way that your analyses need. That's only partly true. An analysis can simulate whatever control flow it likes. So, if you need an analysis to "see" the control flow as if the op_yield had already been lowered, then you can simply modify the analysis to do this. You don't actually need to make the bytecode's control flow resemble the shape that the analysis would prefer to consume. Your generator code will not be the only client of this bytecode. It matters that the bytecode before generator conversion is easy to understand, and I think that op_yield is easier to understand than op_save/op_resume.

But that's not my only concern about your conversion strategy. You appear to be using a lot of flow analyses. You need to run three flow analyses, two of which are for your optimization. This seems like it could adversely impact page load times when pages start to use generators in anger. I think we should aim to not do a lot of expensive things before we even know that a function is hot.

This suggests that your benchmarks should also investigate the effect on parsing performance from your optimization. I bet it makes parsing take longer. To benchmark this you'd probably have to create a microbenchmark that just consists of a few large generator functions that are executed exactly once.

If there is indeed a speed-up from your optimization on long-running things and a slow-down on short-running things, then we'd obviously want to switch to a design that elides the unnecessary get_from_scope/put_to_scope operations in DFG and FTL but not in lower tiers.
Yusuke Suzuki
Comment 125 2016-08-08 22:46:26 PDT
(In reply to comment #124)
> Just some context for why I'm asking for those numbers: my intuition is that most functions have a small number of long-lived locals and that most calls to yield will not be one right after the other. [...]
> If there is indeed a speed-up from your optimization on long-running things and a slow-down on short-running things, then we'd obviously want to switch to a design that elides the unnecessary get_from_scope/put_to_scope operations in DFG and FTL but not in lower tiers.

I agree with your opinion on the optimization phase. But even if we don't use the optimization, I still think the op_save / op_resume approach is simple and efficient. Let me describe why!

Yup, op_yield makes my analyses a little bit complicated, and this becomes a problem if we use the optimization. But on the other hand, op_yield makes the bytecode modification more complicated than op_save / op_resume does. To put it precisely: additionally inserting the switch and jump targets is more complicated than just converting op_save / op_resume.

Currently, we don't take an approach similar to the DFG Graph: we change the bytecode instructions in place, and we don't create any BytecodeNode or the like. That is space- and time-efficient: it does not allocate an additional sequence of nodes, and it does not take an additional pass converting bytecodes into a BytecodeNode graph and then converting the graph back into bytecode. It's nice for bytecode analysis. However, it makes inserting jumps take more operations. For example, if you would like to additionally insert op_switch_imm first, we need a 2-phase bytecode modification (see the sketch after these lists):

1. Before inserting op_switch_imm, adjust the jumps in UnlinkedCodeBlock, since inserting a bytecode changes the jump offsets.
2. Insert op_switch_imm just after the op_enter.
3. Adjust the inserted op_switch_imm's jump targets based on it.
4. Adjust UnlinkedCodeBlock *again* for the subsequent additions / removals.
5. Insert/remove the other bytecode opcodes (get_from_scope/put_to_scope).
6. Re-calculate the jump targets.

Or, a slightly more efficient version is the following:

1. Adjust UnlinkedCodeBlock for the subsequent additions / removals.
2. Insert/remove bytecode opcodes (get_from_scope/put_to_scope), including the switch_imm insertion.
3. Specially adjust the inserted switch_imm's jump offsets (this is a special step before calculating jump targets).
4. Re-calculate the jump targets.

But if we have already inserted op_switch_imm, the above sequence is simplified further (and this is just what is done in this patch's modification phase):

1. Adjust UnlinkedCodeBlock for the subsequent additions / removals.
2. Insert/remove bytecode opcodes (get_from_scope/put_to_scope).
3. Re-calculate the jump targets.

We don't need to pay special attention to the inserted switch_imm.

Since dropping the merge-save edge is only required if we perform the latter optimizations, if we don't do that, things become super easy: we just perform the liveness analysis, and it does not consider anything about the save-merge edge. In that case, we don't need to pay any attention to the graph's shape. So if we don't use the optimization pass, op_yield does not simplify things.

So even if we do not use the optimization, I still prefer the op_save / op_resume approach. To be precise, the approach of inserting the switch in the bytecode generator makes things simple.
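For a concrete picture of what "adjust the jumps" means in the lists above, here is a tiny self-contained sketch of offset adjustment for an in-place insertion. The types and names are hypothetical simplifications, not the UnlinkedCodeBlock API:

#include <vector>

// Hypothetical, simplified model: a jump stores a relative offset from its
// own instruction index to its target instruction index.
struct Jump {
    int location;       // Index of the jump instruction.
    int relativeOffset; // Target = location + relativeOffset.
};

// After inserting `insertedLength` instruction slots at `insertionPoint`:
// a jump that crosses the insertion point must have its offset widened,
// and a jump located at or after the insertion point shifts with the code.
static void adjustJumpsForInsertion(std::vector<Jump>& jumps, int insertionPoint, int insertedLength)
{
    for (Jump& jump : jumps) {
        int target = jump.location + jump.relativeOffset;
        int newLocation = jump.location >= insertionPoint ? jump.location + insertedLength : jump.location;
        int newTarget = target >= insertionPoint ? target + insertedLength : target;
        jump.location = newLocation;
        jump.relativeOffset = newTarget - newLocation;
    }
}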
Yusuke Suzuki
Comment 126 2016-08-08 23:04:11 PDT
(In reply to comment #125)
> [...]
> So even if we do not use the optimization, I still prefer the op_save / op_resume approach.
> To be precise, the approach of inserting the switch in the bytecode generator makes things simple.

In addition, if we use the op_yield approach, we also need to handle the try-range split. This patch splits the try-range at op_save-op_resume; as a result, the try-range does not cover the op_resume part. This is super nice, since it removes the https://bugs.webkit.org/show_bug.cgi?id=159279 hack: the catch variable's use is enabled only after the resume's get_from_scope is done. But if there is only op_yield, we need to do one of the following:

1. split the try-range here, or
2. make BytecodeLivenessAnalysis handle op_get_from_scope with an annotation specially, as we currently do for op_resume (which is super bad).
Yusuke Suzuki
Comment 127 2016-08-08 23:26:42 PDT
Geometric Mean Result: 166.51 ms ± 4.06 ms

Benchmark   First Iteration      Worst 2%             Steady State
Air         70.89 ms ± 7.25 ms   34.67 ms ± 0.72 ms   3032.77 ms ± 23.28 ms
Basic       39.44 ms ± 1.94 ms   23.21 ms ± 0.55 ms   3148.57 ms ± 76.40 ms

Do optimization:

Geometric Mean Result: 166.51 ms ± 4.32 ms

Benchmark   First Iteration      Worst 2%             Steady State
Air         70.94 ms ± 7.90 ms   35.63 ms ± 1.20 ms   2999.62 ms ± 32.52 ms
Basic       39.55 ms ± 2.08 ms   22.91 ms ± 0.59 ms   3131.43 ms ± 54.98 ms

It seems slightly better, but the impact is not large. For the first patch, I just perform the liveness analysis. For the reasons described above, the updated patch still uses op_save / op_resume.
Yusuke Suzuki
Comment 128 2016-08-09 00:45:12 PDT
(In reply to comment #126)

After some consideration, I think the following design is good:

1. op_ret should be inserted by replacing op_save.
2. But keep op_resume, to handle the try-range split easily.
3. switch_imm should be inserted before the generatorification. Inserting jumps during the generatorification brings much more complication!

I'll prototype the design. This makes the graph a little bit simpler while we still use op_save and op_resume.
Yusuke Suzuki
Comment 129 2016-08-09 00:55:19 PDT
(In reply to comment #128)
> I'll prototype the design. This makes the graph a little bit simpler while
> we still use op_save and op_resume.

Ah, after prototyping, I think the current one is still simpler. If we introduce the switch_imm before the generatorification phase, the current design works super fine: op_resume absorbs all the uses, while the use information is propagated to the op_save block through the pseudo branch. If we don't introduce this op_resume block, we still need to handle these mis-propagated uses at the global generator switch. To avoid that situation we would have to avoid inserting the switch before the generatorification, but that makes the generatorification far more complicated than this op_save / op_resume design. Anyway, I'll update the patch. It drops the optimization phases, and as a result, the patch itself becomes super simple.
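For context, the "global generator switch" is the dispatch at the top of the generatorified function that routes each re-entry to the saved resume point. A rough model in plain C++ (the enum and function are illustrative only; the real thing is emitted as op_switch_imm bytecode):

    #include <cstdio>

    // Illustrative resume states; the real table has one case per op_yield.
    enum class ResumePoint { Start = 0, AfterYield1 = 1, AfterYield2 = 2 };

    // Every call into the generator re-enters at the top of the function;
    // the switch routes control to the saved resume point, where the live
    // locals are then reloaded from the generator frame.
    void reenter(ResumePoint state)
    {
        switch (state) {
        case ResumePoint::Start:
            std::printf("run the body from the beginning\n");
            break;
        case ResumePoint::AfterYield1:
            std::printf("restore locals, continue after the first yield\n");
            break;
        case ResumePoint::AfterYield2:
            std::printf("restore locals, continue after the second yield\n");
            break;
        }
    }

    int main()
    {
        reenter(ResumePoint::AfterYield1);
        return 0;
    }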
Yusuke Suzuki
Comment 130 2016-08-09 01:17:17 PDT
Yusuke Suzuki
Comment 131 2016-08-09 01:18:54 PDT
Comment on attachment 285633 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285633&action=review

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:4507
> + // Restore the try contexts, which start offset is updated to the merge point.

This is super important! It drops the hack introduced in https://bugs.webkit.org/show_bug.cgi?id=159279. The important thing is that this try-range does not include op_resume!
WebKit Commit Bot
Comment 132 2016-08-09 01:20:41 PDT
Attachment 285633 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:115: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:120: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:91: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:98: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:118: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/bytecode/BytecodeLivenessAnalysisInlines.h:122: Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/WTF/wtf/IndexedContainerIterator.h:48: This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 7 in 69 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 133 2016-08-09 04:57:34 PDT
Comment on attachment 285633 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285633&action=review

> Source/JavaScriptCore/ChangeLog:17
> + +--------------> op_resume -----+

If we keep the switch while we replace op_save and op_resume with op_yield, we accidentally propagate the use information to the root block through the switch's edge.

entry -> generator switch -> op_yield -> merge point ->
                |                            ^
                +----------------------------+
                 (use is propagated through this edge)

I don't think this is a good thing. In addition to the ease of maintaining the try-range (described in the later remark below), I think op_save / op_resume seems simple. Look at the BytecodeGeneratorification phase: since the optimization is dropped, there is no edge dropping! And I think inserting the switch jump during the generatorification is a super complicated thing; I would like to avoid that situation. Jumps should be inserted before the generatorification, and the generatorification's modification should focus on just inserting get_from_scope/put_to_scope and dropping op_save & op_resume.

>> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:4507
>> + // Restore the try contexts, which start offset is updated to the merge point.
>
> This is super important! It drops the hack introduced in https://bugs.webkit.org/show_bug.cgi?id=159279.
> The important thing is that this try-range does not include op_resume!

Putting op_resume here makes things easy: the merge point label is not the same as the generator's jump target. So we can easily keep the following conditions through the generatorification:

1. the merge point and everything after it is covered by the split try-range
2. but the switch jump target and op_resume are not covered by the try-range
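The mis-propagation above is exactly what a textbook backward liveness pass does: a block's live-out set is the union of its successors' live-in sets, so an edge kept from the entry switch straight to the merge point drags the merge point's uses into the root block. A self-contained sketch of that step (the types here are illustrative, not JSC's):

    #include <bitset>
    #include <vector>

    using LiveSet = std::bitset<64>; // one bit per local

    struct Block {
        LiveSet uses, defs, liveIn, liveOut;
        std::vector<int> successors;
    };

    // One backward dataflow step:
    //   out[b] = union of in[s] over successors s
    //   in[b]  = uses[b] | (out[b] & ~defs[b])
    void step(std::vector<Block>& blocks, int b)
    {
        LiveSet out;
        for (int s : blocks[b].successors)
            out |= blocks[s].liveIn;
        blocks[b].liveOut = out;
        blocks[b].liveIn = blocks[b].uses | (out & ~blocks[b].defs);
    }

    int main()
    {
        std::vector<Block> blocks(2);
        blocks[1].uses.set(3); // the merge point reads local #3
        blocks[1].liveIn = blocks[1].uses;
        blocks[0].successors = { 1 }; // the kept switch -> merge point edge

        step(blocks, 0);
        // Local #3 is now live-in at the entry block, even though at runtime
        // it would be reloaded from the generator frame before the merge point.
        return blocks[0].liveIn.test(3) ? 0 : 1;
    }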
Yusuke Suzuki
Comment 134 2016-08-09 09:14:19 PDT
Perf results!

                                    Baseline              Patched

generator-fib                       111.1883+-1.1031   ^  38.3085+-0.2268    ^ definitely 2.9024x faster
generator-sunspider-access-nsieve   6.0454+-0.0903     ^  4.9577+-0.1906     ^ definitely 1.2194x faster
generator-with-several-types        349.9097+-8.9055   ^  152.6911+-2.1553   ^ definitely 2.2916x faster

<geometric>                         61.7180+-0.7429    ^  30.7132+-0.4180    ^ definitely 2.0095x faster

Baseline:
Geometric Mean Result: 181.40 ms +- 1.96 ms

Benchmark   First Iteration       Worst 2%              Steady State
Air         68.18 ms +- 1.03 ms   35.07 ms +- 0.66 ms   3046.77 ms +- 51.49 ms
Basic       39.17 ms +- 1.67 ms   28.45 ms +- 0.43 ms   4397.31 ms +- 72.35 ms

Patched:
Geometric Mean Result: 167.70 ms +- 1.22 ms

Benchmark   First Iteration       Worst 2%              Steady State
Air         67.86 ms +- 0.86 ms   36.05 ms +- 0.54 ms   3032.62 ms +- 26.96 ms
Basic       38.85 ms +- 0.58 ms   24.66 ms +- 0.70 ms   3134.15 ms +- 54.65 ms
Filip Pizlo
Comment 135 2016-08-09 11:46:07 PDT
(In reply to comment #125)
> I agree with your opinion on the optimization phase.
> But even if we don't use the optimization, I still think the op_save /
> op_resume approach is simple and efficient.
> [...]
> Currently, we don't take an approach similar to the DFG Graph: we change the
> bytecode instructions in-place, and we don't create any BytecodeNode or so.
> It's space/time-efficient. It does not allocate an additional sequence of
> nodes, and it does not take an additional pass converting bytecodes into a
> BytecodeNode graph and turning the graph back into bytecode. It's nice for
> bytecode analysis.

I think you're misunderstanding what I've been suggesting with respect to op_yield. I'm *not* suggesting creating a bytecode IR in which each instruction gets its own separately allocated node. I want bytecode to stay the way it is: a linear sequence of instructions.

A common feature of other bytecode VMs, which JSC still lacks, is a bytecode rewriter. A bytecode rewriter is a module that can be asked to insert instructions, remove instructions, insert branches, remove branches, etc. The bytecode rewriter can handle rewiring jump offsets, and supports inserting branches both to existing code and to new code that was just inserted.

Some bytecode rewriters, like asm (http://asm.ow2.org), provide a rewriting facility based on first allocating an object for each bytecode instruction. Others, like the one we had in OpenVM (no link, dead project), only require you to allocate memory for the fragments of new instructions you would insert. So each slab of put_to_scope's would be a "fragment".

I understand that writing such a rewriter is harder than what you have done. But IR design *must not* be driven by the needs of just one optimization. You have created a new IR, which is very different from bytecode. Prior to your change, our bytecode is very conventional. Control flow is explicit and unambiguous. This is a useful property, because this is what everyone will expect from bytecode. You have changed bytecode to have entirely new semantics that nobody would expect: a branch edge can sometimes magically disappear.

We have some experience designing IRs in WebKit. Some IRs, like our existing bytecode, B3, and Air, follow textbook approaches and are entirely unsurprising. Other IRs, like DFG CPS and to a lesser extent DFG SSA, are exotic. Also, our worst bugs are in phases that try to analyze or modify DFG CPS. Any change we make to anything that touches DFG CPS takes much longer than changes that modify other parts of the system. I believe this is caused by DFG CPS's exotic nature: it has a lot of sneaky features that people don't expect, and they forget to handle those features when they make changes.

Because you have introduced a new IR - a bytecode with phantom control flow - that is used only between bytecode generation and generatorification, you have created a situation where either this IR can *only* be used for pre-generatorification and cannot be used for any other purpose, or you have created a situation where any attempt to add other lowering phases before generatorification will be super error-prone because people will forget how to handle the semantics of phantom control flow (or they'll simply not even know that the IR has phantom control flow).

It's extremely important to design IRs that are not surprising, and this is exactly what we've worked hard to do in JSC ever since we realized how much of a disaster DFG CPS was. I could be convinced that we need to use a special IR here, but so far your argument is primarily that it would make your phase easier. I agree that it would make your phase easier, but this is not a scalable approach to designing IRs. If we let people make bizarre changes to DFG IR, B3 IR, or Air just because it would make their phase easier, then we would end up with an unmanageable engine.

> However, this makes inserting jumps take more operations. For example, if
> you would like to additionally insert op_switch_imm at first, we need a
> two-phase bytecode modification:
> [...]

This doesn't seem that bad. Do you think that this is bad enough that we need a special IR?

> So even if we do not use the optimization, I still prefer the op_save /
> op_resume approach.
> To be precise, the approach of inserting the switch in the bytecode
> generator makes things simple.

I would be convinced that op_save/op_resume is better if it makes the parse->generate->generatorify pipeline run faster than it would otherwise. This is a very convincing argument. Is this what you're saying, or are you saying something else?

If it's just that your phase would have less code as a result of having a quirky IR preceding that phase, then it's not a convincing argument.
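A condensed sketch of the fragment-style rewriting being described (every type below is a simplified stand-in; the shape merely anticipates the BytecodeRewriter API that shows up later in this thread): insertions are recorded as fragments keyed by the original offset, then applied in a single pass. The elided part is the jump rewiring, which is the bulk of the real work.

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    struct Rewriter {
        struct Fragment {
            std::vector<uint32_t> code; // encoded instruction words to splice in
            void append(uint32_t word) { code.push_back(word); }
        };

        struct Insertion {
            uint32_t offset; // original offset the fragment is inserted before
            Fragment fragment;
        };

        std::vector<Insertion> insertions;

        void insertFragmentBefore(uint32_t offset, const std::function<void(Fragment&)>& build)
        {
            Insertion insertion { offset, {} };
            build(insertion.fragment);
            insertions.push_back(insertion);
        }

        // Applies all recorded fragments in one pass over the instruction
        // stream. Jump rewiring is elided: the real work would build a map
        // from old offsets to new offsets and adjust every jump through it.
        std::vector<uint32_t> execute(const std::vector<uint32_t>& original)
        {
            std::stable_sort(insertions.begin(), insertions.end(),
                [](const Insertion& a, const Insertion& b) { return a.offset < b.offset; });
            std::vector<uint32_t> result;
            size_t next = 0;
            for (uint32_t offset = 0; offset < original.size(); ++offset) {
                while (next < insertions.size() && insertions[next].offset == offset) {
                    const auto& code = insertions[next++].fragment.code;
                    result.insert(result.end(), code.begin(), code.end());
                }
                result.push_back(original[offset]);
            }
            while (next < insertions.size()) { // fragments at the very end
                const auto& code = insertions[next++].fragment.code;
                result.insert(result.end(), code.begin(), code.end());
            }
            return result;
        }
    };

Usage would mirror the insertFragmentBefore(point, [&](Fragment&) { ... }) shape that the patch's BytecodeRewriter exposes later in this thread, followed by a single execute() that applies everything at once.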
Yusuke Suzuki
Comment 136 2016-08-10 05:43:51 PDT
Thank you!!

Aha! Maybe I've just understood why you made this suggestion. The design difference between mine and yours comes from how we think about this pre-generatorification IR. My current design sees the pre-generatorification bytecode as a half-baked one, and this patch assumes the generatorification is the only user of this IR. So everything is designed for this generatorification pass.

I think your op_yield design is good. I'm now exploring the way to use the consistent IR during these phases. However, I'm still not sure about the situation you assume. So, can I ask you about the IR? :)

(In reply to comment #135)
> A common feature of other bytecode VMs, which JSC still lacks, is a bytecode
> rewriter. [...] The bytecode rewriter can handle rewiring jump offsets, and
> supports inserting branches both to existing code and to new code that was
> just inserted.

It would be good to implement a rewriter that can insert jumps.

> I understand that writing such a rewriter is harder than what you have done.
> But IR design *must not* be driven by the needs of just one optimization.

I'm not sure about this. The IR after the generatorification is the same as the current one, so all the other passes can enjoy the existing, unsurprising IR, right?

The problematic pattern happens only when we would like to insert a pass before the generatorification. But wait: what is an actual example of such a pass? Why not insert the pass after the generatorification?

The beneficial case seems to be only one: inserting the op_yield. Other passes should be inserted after the generatorification, and they work without problems. But, in that case, why don't we insert the op_yield in the BytecodeGenerator?

> You have changed bytecode to have entirely new semantics that nobody would
> expect: a branch edge can sometimes magically disappear.

If we don't insert anything before the generatorification phase, this IR can only be seen by the generatorification. Is there any user that can see op_save / op_resume? The current conventional bytecode is completely kept for the passes after the generatorification.

> Because you have introduced a new IR - a bytecode with phantom control flow
> - that is used only between bytecode generation and generatorification, you
> have created a situation where either this IR can *only* be used for
> pre-generatorification and cannot be used for any other purpose, or you have

This is right. It is convincing if this is the reason why we need to avoid op_save's phantom jump. Having a consistent IR throughout all the passes is nice.

> created a situation where any attempt to add other lowering phases before
> generatorification will be super error-prone because people will forget how
> to handle the semantics of phantom control flow (or they'll simply not even
> know that the IR has phantom control flow).

Yup, using the consistent IR is good. (And now I'm exploring changing the current implementation to that.) While it is right, I don't think such a situation will occur even if we don't use the consistent IR. Again, is there any case where we would want to insert a pass before the generatorification?

My current design sees the pre-generatorification bytecode as a half-baked one. It is something like bytecode whose generation has not finished yet. For example, the bytecode may be meaningless for analysis if its handlers are not inserted yet. And that's why the generatorification is placed just after inserting the handlers: the generatorification is part of finishing the bytecode generation.

I didn't consider performing other analyses on this half-baked IR. When we perform a flow analysis, the bytecode IR should already be generatorified. So there are no op_save / op_resume opcodes; no jumps are phantom ones, and the data flow is explicitly described by get_from_scope/put_to_scope.

> It's extremely important to design IRs that are not surprising [...] If we
> let people make bizarre changes to DFG IR, B3 IR, or Air just because it
> would make their phase easier, then we would end up with an unmanageable
> engine.

Right. Currently, this patch assumes that the generatorification is the only user of this half-baked IR. So this patch focuses on the ease of the generatorification phase, since only the generatorification uses it.

> This doesn't seem that bad. Do you think that this is bad enough that we
> need a special IR?

BTW, I'm currently exploring this way.

> I would be convinced that op_save/op_resume is better if it makes the
> parse->generate->generatorify pipeline run faster than it would otherwise.
> This is a very convincing argument. Is this what you're saying, or are you
> saying something else?
>
> If it's just that your phase would have less code as a result of having a
> quirky IR preceding that phase, then it's not a convincing argument.

It's mainly the ease of the generatorification, as described in the rewriting sequences above. This is because I saw the pre-generatorification IR as a half-baked one whose only user is the generatorification.
Yusuke Suzuki
Comment 137 2016-08-10 05:56:11 PDT
Comment on attachment 285633 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285633&action=review

> Source/JavaScriptCore/bytecode/BytecodeModificationSet.h:35
> +class BytecodeModificationSet {

Let's implement a more powerful utility.
Filip Pizlo
Comment 138 2016-08-10 12:49:25 PDT
(In reply to comment #136)
> The problematic pattern happens only when we would like to insert a pass
> before the generatorification. But wait: what is an actual example of such a
> pass? Why not insert the pass after the generatorification?
> [...]
> But, in that case, why don't we insert the op_yield in the BytecodeGenerator?

It's true that we could simply say that generatorification must run first.

But what if TC39 adds another feature to the language that also requires a desugaring, and we also decide to implement it as a bytecode rewrite? If generatorification must run first, then it would mean that generatorification would have to be taught to understand the pre-desugared version of the new desugaring, and the new desugaring would have to run after generatorification.

That seems like a nasty constraint, and I don't think this would scale to more than a handful of desugarings.

In DFG SSA and B3 we resolve the scalability problem of inter-phase dependencies by requiring that the input and output of any phase is valid IR that obeys the same rules. This way, even if you don't understand what (for example) B3's LowerMacros does to ChillDiv, Switch, and other opcodes, you can still write phases that run either before LowerMacros, after LowerMacros, or both.

> If we don't insert anything before the generatorification phase, this IR can
> only be seen by the generatorification. Is there any user that can see
> op_save / op_resume?

I'm just thinking that if we decided to use bytecode rewriting to handle generators, then this suggests that we will also want to use bytecode rewriting for other things in the future. If generatorification has the special status that it must be the first bytecode rewrite, then this places a constraint on any future bytecode rewrite: generatorification will now have to be taught the un-lowered pseudo-opcodes that are inputs to any future bytecode rewrite, and no future bytecode rewrite will be able to take first place.

On the other hand, the op_yield approach means that when we do write a new rewriter, we have the freedom of putting it either before generatorification or after. I like having that kind of freedom, because we've found it to be handy in DFG and B3. Certainly other compilers, like LLVM, strive to ensure that each phase takes valid IR as input and produces valid IR as output, and the semantics of the IR don't change.

> It's mainly the ease of the generatorification, as described in the
> rewriting sequences above. This is because I saw the pre-generatorification
> IR as a half-baked one whose only user is the generatorification.

I've been thinking more about the performance argument. I can totally buy that your current approach, however half-baked from a philosophical standpoint, is actually optimal because it conserves total work. The bytecode generator is already well-equipped to emit things like switches, which then reduces the amount of work that the generatorificator has to do.

Now that we've explored a lot of these ideas, I'll leave it up to you to decide what you think is best. I'm happy to review the op_save/resume version in its current form, on the grounds that it *might* be the fastest approach. You could even try the op_yield version after this lands, and then we will be able to do a head-to-head perf comparison.
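The contract described here, where every phase takes valid IR and produces valid IR, can be stated as a tiny pipeline harness (a sketch only; all names below are placeholders, not JSC code):

    #include <cassert>
    #include <initializer_list>

    struct Bytecode { /* a linear instruction stream with explicit control flow */ };

    bool validate(const Bytecode&) { return true; } // placeholder IR checker

    using Pass = void (*)(Bytecode&);

    // The invariant: the output of every pass is valid input for any other
    // pass, so passes can be inserted or reordered freely.
    void runPipeline(Bytecode& code, std::initializer_list<Pass> passes)
    {
        assert(validate(code));
        for (Pass pass : passes) {
            pass(code);
            assert(validate(code));
        }
    }

    void generatorify(Bytecode&) {} // lowers op_yield to loads/stores and jumps
    void someFutureDesugaring(Bytecode&) {} // a hypothetical later rewrite

    int main()
    {
        Bytecode code;
        // Either order works; neither pass sees a special half-baked IR.
        runPipeline(code, { someFutureDesugaring, generatorify });
        return 0;
    }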
Yusuke Suzuki
Comment 139 2016-08-12 07:59:23 PDT
Thanks!

(In reply to comment #138)
> In DFG SSA and B3 we resolve the scalability problem of inter-phase
> dependencies by requiring that the input and output of any phase is valid IR
> that obeys the same rules. This way, even if you don't understand what (for
> example) B3's LowerMacros does to ChillDiv, Switch, and other opcodes, you
> can still write phases that run either before LowerMacros, after
> LowerMacros, or both.

Makes sense.

> [...]
> Now that we've explored a lot of these ideas, I'll leave it up to you to
> decide what you think is best. I'm happy to review the op_save/resume
> version in its current form, on the grounds that it *might* be the fastest
> approach. You could even try the op_yield version after this lands, and
> then we will be able to do a head-to-head perf comparison.

Interesting. OK, I'll explore using the single consistent IR (op_yield, with the jumps inserted in the generatorification phase) and check the performance change :)
Yusuke Suzuki
Comment 140 2016-08-19 22:10:53 PDT
Created attachment 286526 [details] Patch WIP: rebaseline
Yusuke Suzuki
Comment 141 2016-08-23 11:58:27 PDT
Created attachment 286753 [details] Patch WIP
Yusuke Suzuki
Comment 142 2016-08-23 13:30:05 PDT
Created attachment 286766 [details] Patch WIP
Yusuke Suzuki
Comment 143 2016-08-23 20:02:41 PDT
Created attachment 286826 [details]
Patch

WIP: OK, the op_yield approach just works! We need to refactor it.
WebKit Commit Bot
Comment 144 2016-08-23 20:05:28 PDT
Attachment 286826 [details] did not pass style-queue:

ERROR: Source/WTF/wtf/IndexedContainerIterator.h:48: This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 1 in 72 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 145 2016-08-24 10:28:11 PDT
Created attachment 286863 [details]
Patch

WIP: OK! An almost complete patch. The rewriter can insert switches and fragments, and if we extend it, we can insert any jumps!

The patch includes moving some good data structures from B3 to WTF. They were used in the previous version of this patch but are not used now. While unused here, they are still beneficial, since they can be used for bytecode basic blocks. So I'll split them out of this patch and create a simple patch that moves the data structures out of B3.

Simplifying...
WebKit Commit Bot
Comment 146 2016-08-24 10:33:16 PDT
Attachment 286863 [details] did not pass style-queue:

ERROR: Source/WTF/wtf/IndexedContainerIterator.h:48: This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 1 in 72 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Yusuke Suzuki
Comment 147 2016-08-24 10:40:35 PDT
Comment on attachment 286863 [details] Patch

Oops. Still simplifying...
Yusuke Suzuki
Comment 148 2016-08-24 12:55:06 PDT
Created attachment 286882 [details] Patch WIP
Yusuke Suzuki
Comment 149 2016-08-24 14:51:12 PDT
Created attachment 286894 [details] Patch WIP: more cleanup part 2
Yusuke Suzuki
Comment 150 2016-08-24 16:01:17 PDT
Created attachment 286902 [details] Patch WIP
WebKit Commit Bot
Comment 151 2016-08-24 16:11:05 PDT
Attachment 286902 [details] did not pass style-queue:

ERROR: Source/JavaScriptCore/bytecode/BytecodeGraph.h:48: reverse_iterator is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 45 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 152 2016-08-24 16:53:21 PDT
Comment on attachment 286902 [details] Patch

Attachment 286902 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1936568

Number of test failures exceeded the failure limit.
Build Bot
Comment 153 2016-08-24 16:53:28 PDT
Created attachment 286911 [details]
Archive of layout-test-results from ews102 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Build Bot
Comment 154 2016-08-24 17:03:36 PDT
Comment on attachment 286902 [details] Patch

Attachment 286902 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1936614

Number of test failures exceeded the failure limit.
Build Bot
Comment 155 2016-08-24 17:03:43 PDT
Created attachment 286913 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Build Bot
Comment 156 2016-08-24 17:19:45 PDT
Comment on attachment 286902 [details] Patch

Attachment 286902 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1936622

Number of test failures exceeded the failure limit.
Build Bot
Comment 157 2016-08-24 17:19:52 PDT
Created attachment 286916 [details]
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122  Port: ios-simulator-elcapitan-wk2  Platform: Mac OS X 10.11.5
Yusuke Suzuki
Comment 158 2016-08-24 19:49:13 PDT
Created attachment 286933 [details] Patch WIP
Yusuke Suzuki
Comment 159 2016-08-24 19:58:52 PDT
Created attachment 286935 [details] Patch OK! Patch is ready
Build Bot
Comment 160 2016-08-25 01:18:27 PDT
Comment on attachment 286935 [details] Patch

Attachment 286935 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1938695

New failing tests:
fast/viewport/ios/width-is-device-width-overflowing.html
Build Bot
Comment 161 2016-08-25 01:18:35 PDT
Created attachment 286951 [details]
Archive of layout-test-results from ews121 for ios-simulator-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews121  Port: ios-simulator-elcapitan-wk2  Platform: Mac OS X 10.11.5
Yusuke Suzuki
Comment 162 2016-08-25 08:32:33 PDT
Comment on attachment 286935 [details] Patch

A further simplification has just come to mind. I'll do that and upload the patch with r?.
Yusuke Suzuki
Comment 163 2016-08-25 11:34:43 PDT
Created attachment 286991 [details]
Patch

WIP: OK! A super generic bytecode rewriter has arrived. You can add any jumps anytime you want!
Yusuke Suzuki
Comment 164 2016-08-25 13:44:20 PDT
Created attachment 287004 [details] Patch WIP
Yusuke Suzuki
Comment 165 2016-08-25 14:18:29 PDT
Created attachment 287015 [details] Patch
Filip Pizlo
Comment 166 2016-08-25 14:27:34 PDT
Comment on attachment 287015 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=287015&action=review

> Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:182
> +void BytecodeBasicBlock::computeBytecodeBasicBlocks(CodeBlock* codeBlock, Instruction* instructionsBegin, unsigned instructionCount, Vector<std::unique_ptr<BytecodeBasicBlock>>& basicBlocks)

Nit: Since the name of the class is "BytecodeBasicBlock", any time a method name has the word "bytecode" in it, it's a tautology. Based on this, I would name this method computeBasicBlocks(), for example. But I might even go further: since the class is already named BytecodeBasicBlock, maybe this method should just be called "compute"? Ditto for the other methods.

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:208
> + rewriter.insertFragmentBefore(nextToEnterPoint, [&](BytecodeRewriter::Fragment& fragment) {
> +     fragment.appendInstruction(op_switch_imm, switchTableIndex, nextToEnterPoint, state.offset());
> + });

I like this syntax!

> Source/JavaScriptCore/bytecode/BytecodeRewriter.h:79
> +class BytecodeRewriter {

This is great!
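[Annotation: for readers following along, the shape of the API being praised here is a rewriter that hands a callback a Fragment to fill in, then remembers where to splice the result. A minimal self-contained sketch of that shape follows; every name and detail in it is a hypothetical simplification, not JSC's actual BytecodeRewriter.]

    #include <functional>
    #include <utility>
    #include <vector>

    using Instruction = unsigned; // toy stand-in for JSC's Instruction slots

    class RewriterSketch {
    public:
        class Fragment {
        public:
            // Append an opcode and its operands as raw instruction slots.
            template<typename... Operands>
            void appendInstruction(Instruction opcode, Operands... operands)
            {
                m_instructions.push_back(opcode);
                (m_instructions.push_back(static_cast<Instruction>(operands)), ...);
            }

        private:
            friend class RewriterSketch;
            std::vector<Instruction> m_instructions;
        };

        explicit RewriterSketch(std::vector<Instruction> stream)
            : m_stream(std::move(stream)) { }

        // Build a fragment via the callback and record that it should be
        // spliced in immediately before the instruction at `point`. A real
        // rewriter would apply all recorded insertions in one later pass,
        // fixing up jump targets and exception ranges as it goes.
        void insertFragmentBefore(unsigned point, const std::function<void(Fragment&)>& build)
        {
            Fragment fragment;
            build(fragment);
            m_insertions.push_back({ point, std::move(fragment.m_instructions) });
        }

    private:
        struct Insertion {
            unsigned point;
            std::vector<Instruction> instructions;
        };

        std::vector<Instruction> m_stream;
        std::vector<Insertion> m_insertions;
    };

The design point the review calls out is the lambda: the caller never manages fragment storage or offsets directly, only appends instructions.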
Yusuke Suzuki
Comment 167 2016-08-25 14:49:04 PDT
Comment on attachment 287015 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=287015&action=review

>> Source/JavaScriptCore/bytecode/BytecodeBasicBlock.cpp:182
>> +void BytecodeBasicBlock::computeBytecodeBasicBlocks(CodeBlock* codeBlock, Instruction* instructionsBegin, unsigned instructionCount, Vector<std::unique_ptr<BytecodeBasicBlock>>& basicBlocks)
>
> Nit: Since the name of the class is "BytecodeBasicBlock", any time a method name has the word "bytecode" in it, it's a tautology. Based on this, I would name this method computeBasicBlocks(), for example. But I might even go further: since the class is already named BytecodeBasicBlock, maybe this method should just be called "compute"? Ditto for the other methods.

The caller then sees "BytecodeBasicBlock::compute(...)", which looks nice, so I'll use "compute". And you're right: many methods include the word "bytecode" even though no non-"bytecode" version exists. I've dropped it from them.

>> Source/JavaScriptCore/bytecode/BytecodeRewriter.h:79
>> +class BytecodeRewriter {
>
> This is great!

JSC now has the ability to rewrite bytecodes!
Saam Barati
Comment 168 2016-08-25 15:13:26 PDT
Comment on attachment 287015 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=287015&action=review

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:229
> + rewriter.insertFragmentBefore(data.point, [&](BytecodeRewriter::Fragment& fragment) {
> +     data.liveness.forEachSetBit([&](size_t index) {
> +         VirtualRegister operand = virtualRegisterForLocal(index);
> +         Storage storage = storageForGeneratorLocal(index);
> +
> +         fragment.appendInstruction(
> +             op_put_to_scope,
> +             scope.offset(), // scope
> +             storage.identifierIndex, // identifier
> +             operand.offset(), // value
> +             GetPutInfo(DoNotThrowIfNotFound, LocalClosureVar, InitializationMode::NotInitialization).operand(), // info
> +             m_generatorFrameSymbolTableIndex, // symbol table constant index
> +             storage.scopeOffset.offset() // scope offset
> +         );
> +     });

I agree with Fil, this is a nice API.

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:236
> + rewriter.insertFragmentBefore(data.point + opcodeLength(op_yield), [&](BytecodeRewriter::Fragment& fragment) {

What about making insertFragmentAfter do the opcodeLength() add for you, since it's not used in any other way right now? (Or maybe it's worth adding at a later point.) It also seems like it would be nice to assert that the opcode at that point in the bytecode is what you expect it to be.

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:243
> + op_get_from_scope,

Why is the scope loaded at this point?
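[Annotation: Saam's suggestion could be realized by adding members like the following to the hypothetical RewriterSketch above. Again, this is an assumption-laden sketch, not JSC's actual implementation; opcodeLength here is a toy stand-in for the real opcode-length table.]

    #include <cassert> // for the offset sanity check below

    // Toy stand-in for JSC's opcodeLength(): the slot count of an opcode,
    // including the opcode slot itself.
    static unsigned opcodeLength(Instruction opcode)
    {
        return opcode == 1 /* toy op_yield */ ? 3 : 1;
    }

    // Insert after the *instruction* at `point`: the opcodeLength()
    // arithmetic is hidden here, and the opcode actually found at `point`
    // is asserted to be the expected one, catching stale offsets early.
    void insertFragmentAfter(unsigned point, Instruction expectedOpcode,
        const std::function<void(Fragment&)>& build)
    {
        assert(m_stream.at(point) == expectedOpcode);
        insertFragmentBefore(point + opcodeLength(expectedOpcode), build);
    }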
Yusuke Suzuki
Comment 169 2016-08-25 15:28:44 PDT
Comment on attachment 287015 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=287015&action=review

> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:212
> + VirtualRegister scope = virtualRegisterForArgument(static_cast<int32_t>(JSGeneratorFunction::GeneratorArgument::Frame));

This is the passed scope.

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:236
>> + rewriter.insertFragmentBefore(data.point + opcodeLength(op_yield), [&](BytecodeRewriter::Fragment& fragment) {
>
> What about making insertFragmentAfter do the opcodeLength() add for you, since it's not used in any other way right now? (Or maybe it's worth adding at a later point.) It also seems like it would be nice to assert that the opcode at that point in the bytecode is what you expect it to be.

In generatorification we don't currently use insertFragmentAfter, but I think it is useful. When we say before/after, we mean before and after the label (the original bytecode offset). For this place, I think insertFragmentAfter(data.point, [&](BytecodeRewriter::Fragment& fragment) { ... }) just works, so I'll change it to use insertFragmentAfter.

    [ ] | [op_yield] | [ ]
        A            B
        <------------> try-range

    A: data.point
    B: data.point + opcodeLength(op_yield)

Insert bytecodes [*] before A.

    [ ] [*] | [op_yield] | [ ]
            A            B
            <------------> try-range

Insert bytecodes [*] after A.

    [ ] | [*] [op_yield] | [ ]
        A                B
        <----------------> try-range

Insert bytecodes [*] before B.

    [ ] | [op_yield] [*] | [ ]
        A                B
        <----------------> try-range

Insert bytecodes [*] after B.

    [ ] | [op_yield] | [*] [ ]
        A            B
        <------------> try-range

>> Source/JavaScriptCore/bytecode/BytecodeGeneratorification.cpp:243
>> + op_get_from_scope,
>
> Why is the scope loaded at this point?

The scope holding the generator's variables is passed as an argument.
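[Annotation: tying the pieces together, the save/restore pair around a yield could be written against the hypothetical sketch above as follows. All opcode values and operands are toy stand-ins, and the sketch ignores the label-placement subtleties that the diagrams in this comment describe.]

    // Toy opcode values for the sketch (not JSC's real encodings).
    constexpr Instruction op_yield = 1;
    constexpr Instruction op_put_to_scope = 2;
    constexpr Instruction op_get_from_scope = 3;

    void insertSaveRestoreAroundYield(RewriterSketch& rewriter, unsigned yieldPoint)
    {
        // Before the yield: spill a (single, toy) live local into the frame.
        rewriter.insertFragmentBefore(yieldPoint, [&](RewriterSketch::Fragment& fragment) {
            fragment.appendInstruction(op_put_to_scope, /* toy operands */ 0, 0);
        });
        // After the yield: reload it. insertFragmentAfter hides the
        // `yieldPoint + opcodeLength(op_yield)` arithmetic discussed above.
        rewriter.insertFragmentAfter(yieldPoint, op_yield, [&](RewriterSketch::Fragment& fragment) {
            fragment.appendInstruction(op_get_from_scope, /* toy operands */ 0, 0);
        });
    }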
Yusuke Suzuki
Comment 170 2016-08-25 15:51:01 PDT
Created attachment 287035 [details] Patch Patch for landing
Yusuke Suzuki
Comment 171 2016-08-25 15:54:01 PDT
Created attachment 287036 [details] Patch for landing
Yusuke Suzuki
Comment 172 2016-08-25 15:57:03 PDT