RESOLVED FIXED 67176
JavaScriptCore does not have tiered compilation
https://bugs.webkit.org/show_bug.cgi?id=67176
Summary JavaScriptCore does not have tiered compilation
Filip Pizlo
Reported 2011-08-29 18:53:37 PDT
JSC does not support the notion of a piece of JS code being recompiled. This prevents the implementation of a number of interesting features: 1) Profile-driven optimization in DFG JIT. 2) JITing code only after the code is hot. 3) Having a JIT so powerful that it would be prohibitive to run it for every function.
Attachments
work in progress (66.70 KB, patch)
2011-08-29 19:43 PDT, Filip Pizlo
no flags
the patch (fix style) (66.61 KB, patch)
2011-08-29 20:04 PDT, Filip Pizlo
no flags
the patch - fix review (58.94 KB, patch)
2011-08-30 20:37 PDT, Filip Pizlo
no flags
work in progress (132.56 KB, patch)
2011-09-02 19:50 PDT, Filip Pizlo
gyuyoung.kim: commit-queue-
the patch (117.56 KB, patch)
2011-09-03 01:12 PDT, Filip Pizlo
no flags
the patch - fix conflicts, style (117.61 KB, patch)
2011-09-03 01:19 PDT, Filip Pizlo
no flags
the patch - removed spurious debug nonsense from v8-crypto (117.13 KB, patch)
2011-09-03 01:21 PDT, Filip Pizlo
no flags
the patch - fix windows (117.67 KB, patch)
2011-09-03 02:01 PDT, Filip Pizlo
no flags
the patch - another attempt to fix windows (119.41 KB, patch)
2011-09-03 02:53 PDT, Filip Pizlo
gustavo.noronha: commit-queue-
the patch - fix GTK (120.03 KB, patch)
2011-09-03 21:38 PDT, Filip Pizlo
barraclough: review-
barraclough: commit-queue-
the patch - partial fix review (130.88 KB, patch)
2011-09-05 17:49 PDT, Filip Pizlo
webkit-ews: commit-queue-
the patch - partial fix review, fix Qt build (130.86 KB, patch)
2011-09-05 18:07 PDT, Filip Pizlo
barraclough: review+
barraclough: commit-queue-
the patch - more fix review, sort of (130.12 KB, patch)
2011-09-05 21:16 PDT, Filip Pizlo
no flags
the patch (130.66 KB, patch)
2011-09-05 23:45 PDT, Filip Pizlo
no flags
Filip Pizlo
Comment 1 2011-08-29 19:21:48 PDT
Here is the performance of tiered compilation, without any optimizations that would benefit from tiered compilation. As in: 1) There is no profile-driven compilation. 2) We always JIT code. The first tier JIT is the old JIT, and the second tier JIT is the DFG. 3) The DFG JIT is not so much more expensive than the old JIT that tiering would reduce compile times significantly. As in, this is an worst-case scenario measurement - it gives us the worst-case pathological cost of not immediately compiling with the DFG. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "TieringEnabled" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree TieringEnabled SunSpider: 3d-cube 7.5697+-0.1302 ! 8.1947+-0.1789 ! definitely 1.0826x slower 3d-morph 7.3310+-0.1581 ! 7.6346+-0.1198 ! definitely 1.0414x slower 3d-raytrace 7.5872+-0.1801 ! 8.6595+-0.2115 ! definitely 1.1413x slower access-binary-trees 2.3936+-0.1317 ? 2.4144+-0.0996 ? access-fannkuch 11.7383+-0.1824 ? 12.0820+-0.2585 ? might be 1.0293x slower access-nbody 4.2421+-0.0683 ? 4.3795+-0.1220 ? might be 1.0324x slower access-nsieve 2.4851+-0.0591 ! 2.6592+-0.0485 ! definitely 1.0701x slower bitops-3bit-bits-in-byte 1.7253+-0.0628 ? 1.8459+-0.0681 ? might be 1.0699x slower bitops-bits-in-byte 4.5103+-0.2142 ! 4.9207+-0.0933 ! definitely 1.0910x slower bitops-bitwise-and 3.6722+-0.0712 ! 4.0303+-0.1294 ! definitely 1.0975x slower bitops-nsieve-bits 5.4084+-0.1234 ? 5.4185+-0.0880 ? controlflow-recursive 2.0106+-0.0685 ? 2.0777+-0.0619 ? might be 1.0334x slower crypto-aes 6.4807+-0.1584 ! 7.1072+-0.2762 ! definitely 1.0967x slower crypto-md5 2.8333+-0.0827 ! 3.0637+-0.0605 ! definitely 1.0813x slower crypto-sha1 2.1963+-0.0656 ! 2.4276+-0.0576 ! definitely 1.1053x slower date-format-tofte 10.1948+-0.3012 ? 10.3135+-0.3361 ? might be 1.0116x slower date-format-xparb 8.3079+-0.1122 ! 9.1526+-0.2566 ! definitely 1.1017x slower math-cordic 6.4793+-0.3005 ? 6.5032+-0.1603 ? math-partial-sums 7.7221+-0.1542 7.4640+-0.1467 might be 1.0346x faster math-spectral-norm 2.4993+-0.0623 ? 2.6149+-0.0656 ? might be 1.0463x slower regexp-dna 10.2265+-0.2753 10.0736+-0.2134 might be 1.0152x faster string-base64 6.0017+-0.2027 6.0012+-0.0787 string-fasta 7.4792+-0.2292 7.1326+-0.1780 might be 1.0486x faster string-tagcloud 13.4894+-0.2745 13.0466+-0.2483 might be 1.0339x faster string-unpack-code 18.3762+-0.3810 ? 19.0636+-0.4548 ? might be 1.0374x slower string-validate-input 6.9764+-0.2309 6.8614+-0.2039 might be 1.0168x faster <arithmetic> 6.5360+-0.0434 ! 6.7363+-0.0402 ! definitely 1.0306x slower <geometric> 5.4317+-0.0374 ! 5.6435+-0.0279 ! definitely 1.0390x slower <harmonic> 4.4484+-0.0301 ! 4.6648+-0.0340 ! definitely 1.0486x slower TipOfTree TieringEnabled V8: crypto 90.1561+-0.4657 ! 92.6007+-0.5567 ! definitely 1.0271x slower deltablue 261.2825+-1.2059 ! 265.1254+-1.7597 ! definitely 1.0147x slower earley-boyer 101.1890+-0.4968 ! 104.4750+-0.2499 ! definitely 1.0325x slower raytrace 77.7841+-0.3200 ? 77.9236+-0.4897 ? regexp 108.4768+-0.6381 ! 111.4160+-0.3292 ! definitely 1.0271x slower richards 241.2977+-1.2640 ! 249.0259+-2.2002 ! definitely 1.0320x slower splay 109.6627+-0.5339 ^ 108.3732+-0.5277 ^ definitely 1.0119x faster <arithmetic> 141.4070+-0.2960 ! 144.1343+-0.6120 ! definitely 1.0193x slower <geometric> 126.9741+-0.2363 ! 129.1974+-0.3898 ! definitely 1.0175x slower <harmonic> 116.3908+-0.2368 ! 118.2490+-0.2897 ! definitely 1.0160x slower TipOfTree TieringEnabled Kraken: ai-astar 1099.4152+-8.6736 ^ 1081.0538+-7.3193 ^ definitely 1.0170x faster audio-beat-detection 464.7362+-1.6339 ! 475.2486+-2.9215 ! definitely 1.0226x slower audio-dft 421.8051+-3.9788 ? 423.8868+-3.3273 ? audio-fft 374.3489+-3.1956 ^ 369.9372+-0.9839 ^ definitely 1.0119x faster audio-oscillator 375.2353+-0.4462 ^ 357.1955+-3.6390 ^ definitely 1.0505x faster imaging-darkroom 534.1020+-7.4953 ? 536.1056+-3.3782 ? imaging-desaturate 591.7430+-6.5169 ? 596.3413+-6.6589 ? imaging-gaussian-blur 1727.0032+-7.2924 ! 2019.4240+-18.6047 ! definitely 1.1693x slower json-parse-financial 49.2286+-1.2706 48.5747+-0.3344 might be 1.0135x faster json-stringify-tinderbox 61.5364+-0.3364 ? 62.2929+-0.4507 ? might be 1.0123x slower stanford-crypto-aes 145.0281+-0.7231 ? 147.0604+-1.3901 ? might be 1.0140x slower stanford-crypto-ccm 110.8017+-0.2247 ! 113.3705+-0.4937 ! definitely 1.0232x slower stanford-crypto-pbkdf2 337.3557+-1.4168 ? 339.8807+-3.2412 ? stanford-crypto-sha256-iterative 129.9748+-0.6521 ? 130.0296+-0.8055 ? <arithmetic> 458.7367+-1.0545 ! 478.6001+-1.4694 ! definitely 1.0433x slower <geometric> 293.5137+-0.6624 ! 296.9148+-0.5346 ! definitely 1.0116x slower <harmonic> 179.3067+-1.1042 ? 179.8807+-0.5467 ? TipOfTree TieringEnabled All benchmarks: <arithmetic> 161.3213+-0.3186 ! 167.7550+-0.4560 ! definitely 1.0399x slower <geometric> 28.5043+-0.0949 ! 29.2899+-0.0912 ! definitely 1.0276x slower <harmonic> 7.8556+-0.0515 ! 8.2301+-0.0588 ! definitely 1.0477x slower
Filip Pizlo
Comment 2 2011-08-29 19:38:00 PDT
Here's the performance with the patch (that I'm about to put up) with TIERED_COMPILATION disabled. It appears to be neutral. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "TieringDisabled" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree TieringDisabled SunSpider: 3d-cube 7.5376+-0.1175 ? 7.7058+-0.3003 ? might be 1.0223x slower 3d-morph 7.3861+-0.1502 ? 7.4441+-0.1638 ? 3d-raytrace 7.9061+-0.1573 7.7273+-0.1972 might be 1.0231x faster access-binary-trees 2.2056+-0.0434 ? 2.3275+-0.1729 ? might be 1.0553x slower access-fannkuch 11.8176+-0.2460 ? 11.8344+-0.2259 ? access-nbody 4.3464+-0.1199 4.2083+-0.0448 might be 1.0328x faster access-nsieve 2.5201+-0.0780 2.4449+-0.0236 might be 1.0308x faster bitops-3bit-bits-in-byte 1.7132+-0.0409 ? 1.7336+-0.0583 ? might be 1.0119x slower bitops-bits-in-byte 4.4561+-0.2215 ? 4.4569+-0.2233 ? bitops-bitwise-and 3.7906+-0.1316 3.6795+-0.0828 might be 1.0302x faster bitops-nsieve-bits 5.4563+-0.1567 5.4409+-0.1175 controlflow-recursive 2.0028+-0.0456 ? 2.0519+-0.0506 ? might be 1.0245x slower crypto-aes 6.5796+-0.1957 6.5665+-0.2603 crypto-md5 2.7610+-0.0811 ? 2.7837+-0.1032 ? crypto-sha1 2.2108+-0.0583 ? 2.2621+-0.0746 ? might be 1.0232x slower date-format-tofte 10.0549+-0.2862 ? 10.0583+-0.2622 ? date-format-xparb 8.5015+-0.2033 ? 8.6012+-0.2057 ? might be 1.0117x slower math-cordic 6.3485+-0.1002 6.2499+-0.0975 might be 1.0158x faster math-partial-sums 7.8015+-0.1462 7.6821+-0.1618 might be 1.0155x faster math-spectral-norm 2.4839+-0.0601 ? 2.5007+-0.0551 ? regexp-dna 10.2690+-0.2139 ? 10.4431+-0.1691 ? might be 1.0169x slower string-base64 5.9300+-0.1266 ? 6.0768+-0.1648 ? might be 1.0248x slower string-fasta 7.5471+-0.2306 ? 7.6356+-0.2971 ? might be 1.0117x slower string-tagcloud 13.1309+-0.2623 ? 13.2849+-0.2355 ? might be 1.0117x slower string-unpack-code 18.4339+-0.3087 ? 18.6805+-0.4475 ? might be 1.0134x slower string-validate-input 6.9646+-0.2545 ? 7.2475+-0.1046 ? might be 1.0406x slower <arithmetic> 6.5445+-0.0258 ? 6.5818+-0.0279 ? <geometric> 5.4303+-0.0209 ? 5.4580+-0.0307 ? <harmonic> 4.4312+-0.0178 ? 4.4605+-0.0364 ? TipOfTree TieringDisabled V8: crypto 89.7999+-0.4161 ? 90.5472+-0.3796 ? deltablue 261.8698+-0.8554 ! 267.0150+-1.3306 ! definitely 1.0196x slower earley-boyer 101.2657+-0.6838 ? 102.4584+-1.3149 ? might be 1.0118x slower raytrace 77.6936+-0.2382 ? 77.9529+-1.3854 ? regexp 109.3492+-0.6109 108.4792+-0.4163 richards 240.9417+-1.7815 240.3986+-0.6354 splay 108.7129+-0.9346 ? 108.9099+-0.8099 ? <arithmetic> 141.3761+-0.3684 ! 142.2516+-0.3139 ! definitely 1.0062x slower <geometric> 126.8950+-0.2565 ? 127.5128+-0.3834 ? <harmonic> 116.2815+-0.1985 ? 116.7642+-0.4657 ? TipOfTree TieringDisabled Kraken: ai-astar 1103.8657+-8.3351 ^ 1087.3269+-2.9530 ^ definitely 1.0152x faster audio-beat-detection 467.9644+-1.6541 ? 469.2818+-1.6212 ? audio-dft 419.3480+-5.3797 415.8305+-3.9171 audio-fft 370.3477+-0.7266 ! 372.8229+-0.6572 ! definitely 1.0067x slower audio-oscillator 377.0762+-3.1503 ! 384.9752+-2.9895 ! definitely 1.0209x slower imaging-darkroom 529.8054+-1.8055 ? 533.5978+-5.9939 ? imaging-desaturate 588.5433+-6.8542 587.1311+-6.6282 imaging-gaussian-blur 1736.0974+-10.3083 1722.0121+-6.5547 json-parse-financial 48.0002+-0.3510 ! 49.2782+-0.7477 ! definitely 1.0266x slower json-stringify-tinderbox 61.6009+-0.2921 ! 62.3810+-0.3903 ! definitely 1.0127x slower stanford-crypto-aes 144.3355+-0.5705 ? 145.9901+-2.8725 ? might be 1.0115x slower stanford-crypto-ccm 111.3489+-0.3694 ! 113.0364+-1.1566 ! definitely 1.0152x slower stanford-crypto-pbkdf2 336.8550+-1.3959 ? 336.9365+-1.4817 ? stanford-crypto-sha256-iterative 130.2361+-0.6723 ^ 128.8800+-0.4144 ^ definitely 1.0105x faster <arithmetic> 458.9589+-1.5070 457.8200+-1.0110 <geometric> 292.8484+-0.6801 ? 294.0579+-0.7532 ? <harmonic> 178.1834+-0.3880 ! 180.2808+-1.0294 ! definitely 1.0118x slower TipOfTree TieringDisabled All benchmarks: <arithmetic> 161.3875+-0.4462 161.1994+-0.3013 <geometric> 28.4787+-0.0634 ? 28.6145+-0.0824 ? <harmonic> 7.8251+-0.0308 ? 7.8771+-0.0627 ?
Filip Pizlo
Comment 3 2011-08-29 19:43:10 PDT
Created attachment 105564 [details] work in progress Not yet ready for review (tests still running, some asm work not yet ported to other architectures).
WebKit Review Bot
Comment 4 2011-08-29 19:45:01 PDT
Attachment 105564 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/bytecode/CodeBlock.h:217: The parameter name "symbolTable" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:311: The parameter name "exec" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:311: The parameter name "scopeChainNode" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:374: The parameter name "exec" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:374: The parameter name "scopeChainNode" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:455: The parameter name "exec" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:455: The parameter name "scopeChainNode" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:478: The parameter name "exec" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/runtime/Executable.h:478: The parameter name "scopeChainNode" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/bytecode/CodeBlock.cpp:1454: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/bytecode/CodeBlock.cpp:1814: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 11 in 20 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 5 2011-08-29 20:04:05 PDT
Created attachment 105565 [details] the patch (fix style)
Filip Pizlo
Comment 6 2011-08-29 20:04:55 PDT
(In reply to comment #5) > Created an attachment (id=105565) [details] > the patch (fix style) This now passes tests with tiering disabled, and has style fixes. Now I just want to test again, to make sure that the tiering-disabled fixes don't break tiering-enabled.
Filip Pizlo
Comment 7 2011-08-29 20:34:27 PDT
(In reply to comment #6) > (In reply to comment #5) > > Created an attachment (id=105565) [details] [details] > > the patch (fix style) > > This now passes tests with tiering disabled, and has style fixes. Now I just want to test again, to make sure that the tiering-disabled fixes don't break tiering-enabled. Tests pass with and without tiering enabled. The web is usable with tiering enabled. Now checking if the web is usable with tiering disabled...
Filip Pizlo
Comment 8 2011-08-29 21:11:47 PDT
Comment on attachment 105565 [details] the patch (fix style) All tests pass. The web is usable with and without tiered compilation enabled.
Geoffrey Garen
Comment 9 2011-08-30 13:36:49 PDT
> wtf/DoublyLinkedSentinelList.h: Added. I think WTF::SentinelLinkedList can work for your purposes. Would be nice to only have one such class.
Filip Pizlo
Comment 10 2011-08-30 13:41:05 PDT
(In reply to comment #9) > > wtf/DoublyLinkedSentinelList.h: Added. > > I think WTF::SentinelLinkedList can work for your purposes. Would be nice to only have one such class. Note quite: SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock. CallLinkInfo is a big-ish structure. I'm totally fine with doing that except for the increased mem usage. So what do you think? Have one such class but more space usage, or have two and win back ~10 words or so per CodeBlock?
Geoffrey Garen
Comment 11 2011-08-30 14:35:57 PDT
+// Finally, it differs from SentinelLinkedList by ensuring that the +// sentinels only have next/prev pointers instead of the full contents of +// the node structure. This makes DoublyLinkedSentinelList great for cases +// where the nodes are large. Sorry, didn't finish reading and see this comment. > SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock. What we definitely don't want is two classes with similar-sounding names that don't distinguish what's different between them. "DoublyLinkedSentinelList" isn't a very good way to distinguish vs "SentinelLinkedList", since they're both doubly linked, and it doesn't say what's actually different between them (the inlining of the full node vs just prev/next pointers). We also probably don't want two different implementations of very similar algorithms, if we can avoid it. To solve this problem, how about templatizing the existing SentinelLinkedList to be SentinelLinkedList<T, Node=SentinelLinkedListNode<T> >, to allow clients to specialize as necessary for a class that doesn't inherit from SentinelLinkedListNode? (Internally, SentinelLinkedList would only deal in Node*, but when returning a Node* to the client, it would static_cast to T*.) Then, HandleHeap would use SentinelLinkedList<HandleHeap::Node, HandleHeap::Node>, and this patch would use SentinelLinkedList<CallLinkInfo>.
Filip Pizlo
Comment 12 2011-08-30 14:37:36 PDT
(In reply to comment #11) > +// Finally, it differs from SentinelLinkedList by ensuring that the > +// sentinels only have next/prev pointers instead of the full contents of > +// the node structure. This makes DoublyLinkedSentinelList great for cases > +// where the nodes are large. > > Sorry, didn't finish reading and see this comment. > > > SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock. > > What we definitely don't want is two classes with similar-sounding names that don't distinguish what's different between them. "DoublyLinkedSentinelList" isn't a very good way to distinguish vs "SentinelLinkedList", since they're both doubly linked, and it doesn't say what's actually different between them (the inlining of the full node vs just prev/next pointers). > > We also probably don't want two different implementations of very similar algorithms, if we can avoid it. > > To solve this problem, how about templatizing the existing SentinelLinkedList to be SentinelLinkedList<T, Node=SentinelLinkedListNode<T> >, to allow clients to specialize as necessary for a class that doesn't inherit from SentinelLinkedListNode? (Internally, SentinelLinkedList would only deal in Node*, but when returning a Node* to the client, it would static_cast to T*.) > > Then, HandleHeap would use SentinelLinkedList<HandleHeap::Node, HandleHeap::Node>, and this patch would use SentinelLinkedList<CallLinkInfo>. This sounds good, I'll give it a shot.
Filip Pizlo
Comment 13 2011-08-30 20:37:04 PDT
Created attachment 105733 [details] the patch - fix review
Filip Pizlo
Comment 14 2011-08-31 14:12:21 PDT
The latest performance of this patch: Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc "TieringDisabled" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 18 samples per benchmark/VM, with 6 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree TieringDisabled SunSpider: 3d-cube 8.2834+-0.0364 8.2577+-0.0268 3d-morph 8.1428+-0.0572 8.1077+-0.0321 3d-raytrace 8.3447+-0.0734 8.2389+-0.0536 might be 1.0128x faster access-binary-trees 2.5759+-0.0295 2.5376+-0.0195 might be 1.0151x faster access-fannkuch 13.0586+-0.0459 ? 13.0897+-0.0386 ? access-nbody 4.9319+-0.0224 ? 4.9500+-0.0322 ? access-nsieve 3.0545+-0.0191 ? 3.0591+-0.0254 ? bitops-3bit-bits-in-byte 1.8822+-0.0140 1.8570+-0.0186 might be 1.0136x faster bitops-bits-in-byte 5.6499+-0.0987 5.6485+-0.0961 bitops-bitwise-and 4.1460+-0.0288 4.1221+-0.0225 bitops-nsieve-bits 5.7399+-0.0354 5.7095+-0.0371 controlflow-recursive 2.2823+-0.0281 ? 2.3029+-0.0331 ? crypto-aes 6.8808+-0.0355 6.8423+-0.0293 crypto-md5 3.0541+-0.0535 3.0423+-0.0357 crypto-sha1 2.4634+-0.0340 2.4452+-0.0260 date-format-tofte 11.1781+-0.1704 11.0855+-0.1969 date-format-xparb 9.3850+-0.0766 ! 9.7049+-0.0942 ! definitely 1.0341x slower math-cordic 7.0896+-0.0331 7.0832+-0.0353 math-partial-sums 10.7382+-0.0415 ^ 10.6581+-0.0225 ^ definitely 1.0075x faster math-spectral-norm 2.7366+-0.0245 ? 2.7499+-0.0275 ? regexp-dna 12.0810+-0.0976 ? 12.1257+-0.0889 ? string-base64 6.5605+-0.0861 6.5471+-0.0790 string-fasta 8.3149+-0.0272 ? 8.3204+-0.0309 ? string-tagcloud 15.0926+-0.1070 ? 15.2919+-0.1229 ? might be 1.0132x slower string-unpack-code 20.6676+-0.0894 ? 20.6991+-0.1211 ? string-validate-input 7.5367+-0.2107 ? 7.6801+-0.2017 ? might be 1.0190x slower <arithmetic> 7.3797+-0.0296 ? 7.3906+-0.0288 ? <geometric> 6.1216+-0.0265 6.1210+-0.0254 <harmonic> 5.0008+-0.0258 4.9910+-0.0263 TipOfTree TieringDisabled V8: crypto 102.7261+-0.2975 ? 102.8874+-0.2085 ? deltablue 298.8926+-1.6919 298.6987+-1.9454 earley-boyer 122.4942+-0.3332 ? 122.6251+-0.2529 ? raytrace 87.8440+-0.5329 87.6053+-0.7894 regexp 131.1448+-0.5178 ! 132.5521+-0.3833 ! definitely 1.0107x slower richards 301.1331+-0.5884 300.9316+-1.3004 splay 156.6207+-1.4762 156.5469+-0.9595 <arithmetic> 171.5508+-0.3431 ? 171.6924+-0.3451 ? <geometric> 153.8781+-0.2491 ? 154.0718+-0.2973 ? <harmonic> 140.1027+-0.2176 ? 140.2896+-0.3298 ? TipOfTree TieringDisabled Kraken: ai-astar 1662.2271+-14.8888 ? 1662.3195+-15.2833 ? audio-beat-detection 541.9177+-3.5377 538.8129+-1.4198 audio-dft 451.6485+-2.3367 449.9374+-2.8209 audio-fft 422.3728+-2.3541 419.6235+-0.6314 audio-oscillator 408.6592+-3.3316 ^ 404.3170+-0.6803 ^ definitely 1.0107x faster imaging-darkroom 586.6719+-1.3957 ? 594.6635+-7.4560 ? might be 1.0136x slower imaging-desaturate 635.8691+-13.4831 ? 636.3929+-13.6931 ? imaging-gaussian-blur 1854.6785+-2.7334 1850.4129+-2.9003 json-parse-financial 62.4728+-0.4730 ! 63.5460+-0.1337 ! definitely 1.0172x slower json-stringify-tinderbox 75.8496+-0.1552 75.5888+-0.1383 stanford-crypto-aes 166.6060+-0.4238 166.5077+-0.4959 stanford-crypto-ccm 130.7967+-0.3259 ? 130.9919+-0.4046 ? stanford-crypto-pbkdf2 376.4091+-1.3560 ^ 371.4304+-1.1870 ^ definitely 1.0134x faster stanford-crypto-sha256-iterative 144.4347+-0.3728 ? 144.4837+-0.5826 ? <arithmetic> 537.1867+-1.4454 536.3592+-1.7807 <geometric> 339.6910+-0.7336 339.3740+-0.9549 <harmonic> 213.0308+-0.4558 ? 213.5641+-0.4034 ? TipOfTree TieringDisabled All benchmarks: <arithmetic> 189.6455+-0.4455 189.4262+-0.5564 <geometric> 32.7321+-0.0950 32.7274+-0.0929 <harmonic> 8.8432+-0.0447 8.8266+-0.0455
Filip Pizlo
Comment 15 2011-09-02 19:50:03 PDT
Created attachment 106240 [details] work in progress Decided to combine the tiering patch with implementing dynamic optimization, because it turns out that my efforts to get dynamic optimization working revealed the need to make some changes to tiering heuristics. This patch is not in reviewable state yet, because it depends on a value profiling patch that is still in the commit queue.
WebKit Review Bot
Comment 16 2011-09-02 19:53:52 PDT
Attachment 106240 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'PerformanceTests/SunSpider/tests/v8-v6/v8-..." exit_code: 1 Source/JavaScriptCore/dfg/DFGPropagation.cpp:62: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/bytecode/ValueProfile.h:168: One line control clauses should not use braces. [whitespace/braces] [4] Source/JavaScriptCore/bytecode/ValueProfile.h:172: One line control clauses should not use braces. [whitespace/braces] [4] Source/JavaScriptCore/dfg/DFGGraph.h:261: The parameter name "codeBlock" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 4 in 39 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gyuyoung Kim
Comment 17 2011-09-02 20:11:02 PDT
Comment on attachment 106240 [details] work in progress Attachment 106240 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9586660
Filip Pizlo
Comment 18 2011-09-02 22:23:57 PDT
Tiered compilation - even in this current fairly dumb form - is starting to show signs of goodness. When combined with https://bugs.webkit.org/show_bug.cgi?id=67553, we get a 10% speed-up on v8-crypto. Other benchmarks are still hurting because of a lack of OSR (so many benchmarks never tier up) and the fact that the DFG back-end still doesn't properly incorporate the rich set of numeric predictions we now have (bottom, int-static, int, double, number, top). Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "DynamicOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc "DynamicOpt2" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree DynamicOpt DynamicOpt2 DynamicOpt2 v. TipOfTree SunSpider: 3d-cube 7.9698+-0.2087 ! 12.3583+-0.2144 ? 12.4259+-0.2686 ! definitely 1.5591x slower 3d-morph 7.7744+-0.1793 ! 8.1831+-0.1140 ? 8.1930+-0.2560 ? might be 1.0538x slower 3d-raytrace 7.6502+-0.1953 ! 8.4078+-0.1219 ? 8.7114+-0.3048 ! definitely 1.1387x slower access-binary-trees 2.3893+-0.0906 ? 2.4728+-0.0687 2.4634+-0.0682 ? might be 1.0310x slower access-fannkuch 11.9543+-0.2186 ! 13.0364+-0.2499 13.0251+-0.2684 ! definitely 1.0896x slower access-nbody 4.4484+-0.1560 4.3845+-0.0838 4.3802+-0.0679 might be 1.0156x faster access-nsieve 2.5408+-0.0307 ! 2.7409+-0.0567 ? 2.7937+-0.1355 ! definitely 1.0995x slower bitops-3bit-bits-in-byte 1.7154+-0.0824 ! 1.9264+-0.0602 1.9255+-0.0443 ! definitely 1.1224x slower bitops-bits-in-byte 4.5476+-0.2678 ! 5.3181+-0.2523 ? 5.5533+-0.3096 ! definitely 1.2211x slower bitops-bitwise-and 3.6850+-0.0556 ! 4.0393+-0.1645 ? 4.0762+-0.0869 ! definitely 1.1062x slower bitops-nsieve-bits 5.5807+-0.2016 ! 5.9615+-0.1491 ? 5.9727+-0.1217 ! definitely 1.0702x slower controlflow-recursive 2.0337+-0.0349 ? 2.0952+-0.0382 2.0585+-0.0367 ? might be 1.0122x slower crypto-aes 6.8239+-0.1930 ! 8.0399+-0.3227 7.8494+-0.3143 ! definitely 1.1503x slower crypto-md5 2.8098+-0.0809 ! 2.9741+-0.0799 2.9550+-0.0900 ? might be 1.0517x slower crypto-sha1 2.2982+-0.0805 ! 2.4675+-0.0638 ? 2.5981+-0.1117 ! definitely 1.1305x slower date-format-tofte 10.2719+-0.2905 ! 10.8794+-0.2106 10.5788+-0.2598 ? might be 1.0299x slower date-format-xparb 8.7886+-0.1802 ! 9.7157+-0.2797 9.5360+-0.2017 ! definitely 1.0850x slower math-cordic 6.4300+-0.1255 ! 6.8179+-0.0991 6.8140+-0.1386 ! definitely 1.0597x slower math-partial-sums 7.7420+-0.1017 7.6207+-0.1555 ? 7.7644+-0.1373 ? math-spectral-norm 2.5527+-0.0737 ? 2.6611+-0.0927 ? 2.7136+-0.0870 ! definitely 1.0630x slower regexp-dna 10.5863+-0.2935 10.4799+-0.1675 10.3336+-0.1690 might be 1.0245x faster string-base64 6.1758+-0.1848 ? 6.4751+-0.1442 ? 6.4844+-0.2474 ? might be 1.0500x slower string-fasta 7.6886+-0.2365 7.4774+-0.1863 7.4341+-0.1208 might be 1.0342x faster string-tagcloud 12.4531+-0.4789 ? 12.5956+-0.4655 12.5680+-0.3960 ? string-unpack-code 18.8142+-0.2687 ? 19.4565+-0.5111 19.2330+-0.4460 ? might be 1.0223x slower string-validate-input 7.2405+-0.3145 7.0192+-0.2957 6.8666+-0.1451 might be 1.0545x faster <arithmetic> 6.6525+-0.0370 ! 7.1386+-0.0312 7.1272+-0.0400 ! definitely 1.0714x slower <geometric> 5.5352+-0.0272 ! 5.9295+-0.0265 ? 5.9402+-0.0369 ! definitely 1.0732x slower <harmonic> 4.5247+-0.0275 ! 4.8366+-0.0343 ? 4.8590+-0.0477 ! definitely 1.0739x slower TipOfTree DynamicOpt DynamicOpt2 DynamicOpt2 v. TipOfTree V8: crypto 92.1294+-0.3719 ! 97.0536+-1.8706 ^ 83.3741+-0.6893 ^ definitely 1.1050x faster deltablue 143.2966+-1.3033 142.6860+-1.3562 ^ 138.8982+-0.6696 ^ definitely 1.0317x faster earley-boyer 100.6071+-0.4585 ! 108.4473+-0.9126 108.2658+-1.2111 ! definitely 1.0761x slower raytrace 52.3645+-0.4948 ! 55.0936+-0.4014 ? 55.6850+-0.4855 ! definitely 1.0634x slower regexp 110.4607+-0.8697 ? 110.7716+-0.3791 ! 112.5372+-1.1640 ! definitely 1.0188x slower richards 247.5590+-1.0163 ^ 235.0075+-1.6766 234.1091+-0.7328 ^ definitely 1.0575x faster splay 104.8805+-0.7566 ? 106.2095+-0.5770 ^ 104.7761+-0.6962 <arithmetic> 121.6140+-0.2401 ? 122.1813+-0.3841 ^ 119.6636+-0.2768 ^ definitely 1.0163x faster <geometric> 110.3617+-0.2388 ! 112.5381+-0.4232 ^ 109.8188+-0.2973 ^ definitely 1.0049x faster <harmonic> 100.7142+-0.3088 ! 103.8665+-0.4832 ^ 101.3032+-0.3159 ? TipOfTree DynamicOpt DynamicOpt2 DynamicOpt2 v. TipOfTree Kraken: ai-astar 1105.0592+-7.7066 ! 1130.2690+-4.0420 ! 1147.0835+-6.9637 ! definitely 1.0380x slower audio-beat-detection 480.9968+-2.2409 ! 510.7421+-2.2406 ? 516.0047+-4.2511 ! definitely 1.0728x slower audio-dft 443.7356+-16.1293 ! 477.5576+-6.2779 ? 479.8115+-6.7655 ! definitely 1.0813x slower audio-fft 378.2326+-4.0596 ! 394.4044+-3.0770 ! 401.4458+-2.2673 ! definitely 1.0614x slower audio-oscillator 381.0797+-1.3206 ^ 355.8970+-2.5926 355.7188+-2.0398 ^ definitely 1.0713x faster imaging-darkroom 540.2286+-3.5820 ? 550.3868+-7.1062 ^ 540.4649+-2.3819 ? imaging-desaturate 606.5337+-7.1713 ? 608.6832+-5.2899 606.8448+-1.6518 ? imaging-gaussian-blur 1753.3311+-11.1394 ! 2310.2974+-5.2848 ? 2320.8847+-16.6530 ! definitely 1.3237x slower json-parse-financial 48.0398+-0.4362 ^ 47.3302+-0.2202 ! 48.6760+-0.1937 ! definitely 1.0132x slower json-stringify-tinderbox 75.2160+-1.0468 74.0179+-0.2819 ? 74.3878+-0.9351 might be 1.0111x faster stanford-crypto-aes 148.1361+-0.7944 ! 163.0568+-0.9930 162.7343+-0.9379 ! definitely 1.0985x slower stanford-crypto-ccm 113.2673+-1.2877 ! 126.0253+-1.6922 ? 126.2694+-1.5381 ! definitely 1.1148x slower stanford-crypto-pbkdf2 343.6412+-1.8972 ! 367.0378+-2.6543 363.8759+-2.2831 ! definitely 1.0589x slower stanford-crypto-sha256-iterative 133.1163+-0.9564 ! 140.6644+-1.3553 ? 143.2696+-1.3289 ! definitely 1.0763x slower <arithmetic> 467.9010+-1.5248 ! 518.3121+-1.0232 ! 520.5337+-0.8504 ! definitely 1.1125x slower <geometric> 302.6273+-0.8312 ! 318.6361+-0.7012 ! 320.3008+-0.4731 ! definitely 1.0584x slower <harmonic> 187.3953+-0.5625 ! 192.5117+-0.5001 ! 194.6711+-0.4082 ! definitely 1.0388x slower TipOfTree DynamicOpt DynamicOpt2 DynamicOpt2 v. TipOfTree All benchmarks: <arithmetic> 161.1676+-0.4418 ! 176.5371+-0.3106 ? 176.8176+-0.2616 ! definitely 1.0971x slower <geometric> 28.4665+-0.0678 ! 30.1160+-0.0719 30.0832+-0.1083 ! definitely 1.0568x slower <harmonic> 7.9791+-0.0471 ! 8.5209+-0.0588 ? 8.5580+-0.0820 ! definitely 1.0726x slower
Filip Pizlo
Comment 19 2011-09-02 22:24:40 PDT
> VMs tested: > "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc > "DynamicOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc > "DynamicOpt2" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc Clarification: DynamicOpt is without https://bugs.webkit.org/show_bug.cgi?id=67553, while DynamicOpt2 includes it.
Filip Pizlo
Comment 20 2011-09-03 00:57:34 PDT
This now passes tests on Mac with tiering turned on or off, and is performance neutral: Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "DynamicOptOff" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree DynamicOptOff SunSpider: 3d-cube 7.8026+-0.1319 ? 8.0115+-0.2923 ? might be 1.0268x slower 3d-morph 7.7991+-0.1005 7.7622+-0.1117 3d-raytrace 7.5590+-0.1870 ? 7.7149+-0.1663 ? might be 1.0206x slower access-binary-trees 2.2853+-0.0465 ? 2.3733+-0.0971 ? might be 1.0385x slower access-fannkuch 11.9772+-0.1921 ? 12.0523+-0.2001 ? access-nbody 4.2695+-0.0637 ? 4.3550+-0.0838 ? might be 1.0200x slower access-nsieve 2.5857+-0.0623 2.5259+-0.0335 might be 1.0237x faster bitops-3bit-bits-in-byte 1.7484+-0.0610 1.7483+-0.0627 bitops-bits-in-byte 4.5852+-0.2422 ? 4.6009+-0.2260 ? bitops-bitwise-and 3.9057+-0.1456 ^ 3.6829+-0.0259 ^ definitely 1.0605x faster bitops-nsieve-bits 5.5786+-0.1629 5.5126+-0.1170 might be 1.0120x faster controlflow-recursive 2.0641+-0.0366 2.0464+-0.0442 crypto-aes 6.6893+-0.1335 6.6408+-0.1453 crypto-md5 2.8474+-0.0735 2.8030+-0.0486 might be 1.0158x faster crypto-sha1 2.2725+-0.0441 2.2244+-0.0288 might be 1.0216x faster date-format-tofte 10.4639+-0.3802 ? 10.5192+-0.3720 ? date-format-xparb 8.7383+-0.2265 ? 8.7740+-0.2847 ? math-cordic 6.4347+-0.0758 6.4201+-0.1139 math-partial-sums 7.7395+-0.1153 ? 7.7764+-0.1624 ? math-spectral-norm 2.5587+-0.0453 ? 2.6333+-0.0696 ? might be 1.0292x slower regexp-dna 10.4553+-0.1850 ? 10.4958+-0.3009 ? string-base64 6.0972+-0.1483 6.0229+-0.0849 might be 1.0123x faster string-fasta 7.6165+-0.1736 7.5390+-0.0975 might be 1.0103x faster string-tagcloud 12.1907+-0.3555 ? 12.2377+-0.3733 ? string-unpack-code 18.9763+-0.3913 ? 18.9901+-0.4418 ? string-validate-input 7.3371+-0.2900 ? 7.4599+-0.3495 ? might be 1.0167x slower <arithmetic> 6.6376+-0.0251 ? 6.6509+-0.0409 ? <geometric> 5.5297+-0.0173 ? 5.5308+-0.0351 ? <harmonic> 4.5317+-0.0182 4.5257+-0.0334 TipOfTree DynamicOptOff V8: crypto 92.2392+-0.3275 91.8243+-0.5196 deltablue 142.5229+-0.8982 142.2090+-1.0506 earley-boyer 100.9745+-0.8094 ? 101.6080+-0.3983 ? raytrace 52.8335+-0.3815 ? 53.4131+-0.4945 ? might be 1.0110x slower regexp 111.2927+-0.2813 ^ 110.2772+-0.3147 ^ definitely 1.0092x faster richards 247.2330+-0.5759 ? 249.2783+-2.2003 ? splay 105.6152+-0.3719 104.7256+-0.7142 <arithmetic> 121.8159+-0.2410 ? 121.9051+-0.3111 ? <geometric> 110.7040+-0.2602 ? 110.7184+-0.2196 ? <harmonic> 101.1695+-0.2987 ? 101.2719+-0.2621 ? TipOfTree DynamicOptOff Kraken: ai-astar 1114.1983+-9.0038 1113.4397+-6.8018 audio-beat-detection 482.8746+-3.3869 482.4078+-3.1902 audio-dft 438.5291+-5.3478 ? 443.8677+-8.3113 ? might be 1.0122x slower audio-fft 371.6724+-0.6381 ! 378.7265+-0.3997 ! definitely 1.0190x slower audio-oscillator 386.6817+-0.6754 ^ 381.2110+-0.8942 ^ definitely 1.0144x faster imaging-darkroom 560.0890+-1.2229 ^ 539.1056+-1.3278 ^ definitely 1.0389x faster imaging-desaturate 601.8699+-6.3995 600.7752+-6.6430 imaging-gaussian-blur 1768.8532+-6.7428 ^ 1754.3062+-2.6500 ^ definitely 1.0083x faster json-parse-financial 48.5644+-0.2174 ? 48.5948+-0.3096 ? json-stringify-tinderbox 74.0839+-0.5558 ? 75.0791+-0.4754 ? might be 1.0134x slower stanford-crypto-aes 147.1298+-1.2234 ? 148.3481+-0.5196 ? stanford-crypto-ccm 112.6304+-0.2754 ! 113.9235+-0.3058 ! definitely 1.0115x slower stanford-crypto-pbkdf2 349.9624+-3.3126 347.6330+-1.4456 stanford-crypto-sha256-iterative 133.9685+-0.4310 ^ 133.0874+-0.4124 ^ definitely 1.0066x faster <arithmetic> 470.7934+-0.5195 ^ 468.6075+-0.9538 ^ definitely 1.0047x faster <geometric> 303.5829+-0.5037 303.3028+-0.6546 <harmonic> 187.5942+-0.5320 ? 188.1997+-0.4789 ? TipOfTree DynamicOptOff All benchmarks: <arithmetic> 162.0510+-0.1704 ^ 161.4205+-0.2913 ^ definitely 1.0039x faster <geometric> 28.4908+-0.0604 28.4865+-0.1037 <harmonic> 7.9916+-0.0315 7.9817+-0.0575
Filip Pizlo
Comment 21 2011-09-03 01:12:41 PDT
Created attachment 106254 [details] the patch This might be the one, if the bots are happy.
Filip Pizlo
Comment 22 2011-09-03 01:19:52 PDT
Created attachment 106255 [details] the patch - fix conflicts, style
Filip Pizlo
Comment 23 2011-09-03 01:21:30 PDT
Created attachment 106256 [details] the patch - removed spurious debug nonsense from v8-crypto
Filip Pizlo
Comment 24 2011-09-03 01:43:14 PDT
Comment on attachment 106256 [details] the patch - removed spurious debug nonsense from v8-crypto All tests pass after merge.
Filip Pizlo
Comment 25 2011-09-03 02:01:17 PDT
Created attachment 106257 [details] the patch - fix windows
Filip Pizlo
Comment 26 2011-09-03 02:53:37 PDT
Created attachment 106259 [details] the patch - another attempt to fix windows
Collabora GTK+ EWS bot
Comment 27 2011-09-03 04:49:19 PDT
Comment on attachment 106259 [details] the patch - another attempt to fix windows Attachment 106259 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/9591065
Filip Pizlo
Comment 28 2011-09-03 21:38:58 PDT
Created attachment 106276 [details] the patch - fix GTK
Gavin Barraclough
Comment 29 2011-09-05 16:21:54 PDT
Comment on attachment 106276 [details] the patch - fix GTK View in context: https://bugs.webkit.org/attachment.cgi?id=106276&action=review r- for a few comments to look at, basically all looks good though. > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1006 > +#if CPU(X86_64) This should be using 'scratchRegister', not 'X86Registers::r11'. > Source/JavaScriptCore/bytecode/CodeBlock.h:227 > + PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); } I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so. The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails. Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds. > Source/JavaScriptCore/dfg/DFGPropagation.cpp:35 > +class Propagator { Sorry to be a pain, but this .cpp & .h should really probably be DFGPropagator.cpp/.h. > Source/JavaScriptCore/dfg/DFGPropagation.cpp:138 > + setPrediction(makePrediction(PredictInt32, DynamicPrediction)); If I understand this correctly, the value 'DynamicPrediction' seems to be being used here to cover two different types of static predictions. In the case of bitops & valuetoint32, we can statically infer with absolute certainty that the result is an int32. In the case of ArithMod & UInt32ToNumber I guess you are assuming the results are highly likely to be int32, and as such want to give a string prediction rather than a weak one, despite this being static & not dynamic? If so, I'd suggest adding a couple of new enum values, say StrongStaticPrediction & StaticInference. You could make their values equal to DynamicPrediction such that they will currently be handled the same for now, but give us the option to change how these predictions are merged with each other in the future. > Source/JavaScriptCore/jit/JITCode.h:52 > + It might be more helpful to have a separate type indicating the compilation kind. Using the JITType here artificially makes a tie between a policy (tiered compilation), and a set of particular mechanisms (BaselineJIT, DFGJIT). Using the same enum to select may make it more difficult to reconfigure this in the future. E.g. should we want to use the non-spec path of the DFG-JIT with optimization hooks as our low tier JIT & a speculating compile from the DFG-JIT exploiting value profiling. In such a case, I guess we can just replace DFGJIT with two new enum entries, & deal with working out in all cases whether DFGJIT checks were checking for top tier or the new compiler. But we may regret not having an explicit type for the tiers. > Source/JavaScriptCore/wtf/Platform.h:978 > +#endif Slightly verbose set of config, & requires two switches setting - could just make this: #if !defined(ENABLE_VALUE_PROFILER) #define ENABLE_VALUE_PROFILER ENABLE_TIERED_COMPILATION #endif #if !defined(ENABLE_DYNAMIC_OPTIMIZATION) #define ENABLE_DYNAMIC_OPTIMIZATION ENABLE_TIERED_COMPILATION #endif So by default both follow the ENABLE_TIERED_COMPILATION setting.
Filip Pizlo
Comment 30 2011-09-05 17:23:45 PDT
> > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1006 > > +#if CPU(X86_64) > > This should be using 'scratchRegister', not 'X86Registers::r11'. I wanted to do that, but that won't work. :-( scratchRegister is in MacroAssemblerX86_64.h. I also can't move this branchAdd32 overload to the subclasses (MacroAssemblerX86_64 and MacroAssemblerX86) because if you have a subclass overloading something in a superclass, the compiler gets confused. Basically if I have this particular overload of branchAdd32 in MacroAssemblerX86_64 then the branchAdd32's in MacroAssemblerX86Common get hidden. So we have a number of choices for workarounds: 1) Keep it as is, and accept this as local ugliness. 2) Move the scratchRegister declaration to MacroAssemblerX86Common. 3) Move all branchAdd32 overloads to the MacroAssemblerX86_64 and MacroAssemblerX86. 4) Rename this particular overload of branchAdd32 and put it in MacroAssemblerX86_64. Which do you prefer? I'm starting to like (4) best. > > > Source/JavaScriptCore/bytecode/CodeBlock.h:227 > > + PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); } > > I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so. > > The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails. Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds. +1 > > > Source/JavaScriptCore/dfg/DFGPropagation.cpp:35 > > +class Propagator { > > Sorry to be a pain, but this .cpp & .h should really probably be DFGPropagator.cpp/.h. +1 > > > Source/JavaScriptCore/dfg/DFGPropagation.cpp:138 > > + setPrediction(makePrediction(PredictInt32, DynamicPrediction)); > > If I understand this correctly, the value 'DynamicPrediction' seems to be being used here to cover two different types of static predictions. > In the case of bitops & valuetoint32, we can statically infer with absolute certainty that the result is an int32. > In the case of ArithMod & UInt32ToNumber I guess you are assuming the results are highly likely to be int32, and as such want to give a string prediction rather than a weak one, despite this being static & not dynamic? > > If so, I'd suggest adding a couple of new enum values, say StrongStaticPrediction & StaticInference. You could make their values equal to DynamicPrediction such that they will currently be handled the same for now, but give us the option to change how these predictions are merged with each other in the future. What about just renaming DynamicPrediction to StrongPrediction and StaticPrediction to WeakPrediction? > > > Source/JavaScriptCore/jit/JITCode.h:52 > > + > > It might be more helpful to have a separate type indicating the compilation kind. Using the JITType here artificially makes a tie between a policy (tiered compilation), and a set of particular mechanisms (BaselineJIT, DFGJIT). Using the same enum to select may make it more difficult to reconfigure this in the future. E.g. should we want to use the non-spec path of the DFG-JIT with optimization hooks as our low tier JIT & a speculating compile from the DFG-JIT exploiting value profiling. > > In such a case, I guess we can just replace DFGJIT with two new enum entries, & deal with working out in all cases whether DFGJIT checks were checking for top tier or the new compiler. But we may regret not having an explicit type for the tiers. There isn't a lot of code that relies on this enum. So do we really need to future-proof this? Right now we do indeed have a tie between mechanism and policy, because we're using the BaselineJIT as our bottom tier, and the DFGJIT for the top tier. So to me it makes sense to have one enum for both policy and mechanism, does it not? > > > Source/JavaScriptCore/wtf/Platform.h:978 > > +#endif > > Slightly verbose set of config, & requires two switches setting - could just make this: > > #if !defined(ENABLE_VALUE_PROFILER) > #define ENABLE_VALUE_PROFILER ENABLE_TIERED_COMPILATION > #endif > #if !defined(ENABLE_DYNAMIC_OPTIMIZATION) > #define ENABLE_DYNAMIC_OPTIMIZATION ENABLE_TIERED_COMPILATION > #endif > > So by default both follow the ENABLE_TIERED_COMPILATION setting. +1
Filip Pizlo
Comment 31 2011-09-05 17:31:39 PDT
> > Source/JavaScriptCore/bytecode/CodeBlock.h:227 > > + PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); } > > I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so. > > The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails. Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds. Just remembered why I did it this way: CodeBlock is not ref counted. But do we really want to make it ref counted just for this case?
Gavin Barraclough
Comment 32 2011-09-05 17:47:20 PDT
> Just remembered why I did it this way: CodeBlock is not ref counted. But do we really want to make it ref counted just for this case? Gah, sorry, meant OwnPtr rather than RefPtr. I think the suggestion still stands? - code block could be a local OwnPtr, released & assigned to the executable only if compilation is successful.
Filip Pizlo
Comment 33 2011-09-05 17:49:31 PDT
Created attachment 106369 [details] the patch - partial fix review For the more subtle parts of the review I proceeded as follows. For the last two bullets, I kept the code as-is, because AFAIK that's the lesser of the various evils right now. branchAdd32 should use scratchRegister not r11: I renamed it to branchAdd32Absolute and moved it to the subclasses. DynamicPrediction: I renamed DynamicPrediction to StrongPrediction and StaticPrediction to WeakPrediction. setting m_fooBarCodeBlock before we know that optimizing will succeed: I kept this as-is, because CodeBlock is not ref-counted, and the CodeBlock::m_alternative pointer is an OwnPtr that needs to be initialized prior to compiling. So we need a releaseAlternative() method, and we need to call this method in Executable.cpp in a number of places, if we want to back out of optimization. We can either (1) make CodeBlock ref-counted just to satisfy this case and get rid of releaseAlternative(), or (2) keep it as-is. JITType enum: I kept this as-is, because currently having two enums (policy and mechanism) would just mean having two enums that say the same thing.
Filip Pizlo
Comment 34 2011-09-05 17:51:26 PDT
(In reply to comment #32) > > Just remembered why I did it this way: CodeBlock is not ref counted. But do we really want to make it ref counted just for this case? > > Gah, sorry, meant OwnPtr rather than RefPtr. I think the suggestion still stands? - code block could be a local OwnPtr, released & assigned to the executable only if compilation is successful. But then we still need releaseAlternative(). We need CodeBlock::m_alternative to be set prior to invoking optimized compilation, because optimized compilation reads this field. So if optimized compilation fails, we need releaseAlternative(). I know, it's nasty. But that leaves us with these choices: 1) Leave it as-is. 2) Make CodeBlock ref-counted to satisfy this case. 3) Don't set m_fooBarCodeBlock, but still release() it, prior to optimized compilation. That might make the code make a bit more sense, but it wouldn't be too different from what I have right now.
Early Warning System Bot
Comment 35 2011-09-05 18:05:38 PDT
Comment on attachment 106369 [details] the patch - partial fix review Attachment 106369 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9592670
Filip Pizlo
Comment 36 2011-09-05 18:07:39 PDT
Created attachment 106370 [details] the patch - partial fix review, fix Qt build
Gavin Barraclough
Comment 37 2011-09-05 20:51:13 PDT
> branchAdd32 should use scratchRegister not r11: > I renamed it to branchAdd32Absolute and moved it to the subclasses. You forgot to remove branchAdd32 from MacroAssemblerX86Common.h! Also branchAdd32Absolute should be named branchAdd32, other AbsoluteAddress'ed methods aren't named in this fashion. I'm fine with the other changes / review push-back, r+ with MacroAssemblerX86Common::branchAdd32 removed, branchAdd32Absolute renamed to branchAdd32.
Filip Pizlo
Comment 38 2011-09-05 21:09:18 PDT
(In reply to comment #37) > > branchAdd32 should use scratchRegister not r11: > > I renamed it to branchAdd32Absolute and moved it to the subclasses. > > You forgot to remove branchAdd32 from MacroAssemblerX86Common.h! > Also branchAdd32Absolute should be named branchAdd32, other AbsoluteAddress'ed methods aren't named in this fashion. If I rename it to branchAdd32() then I get overload hiding madness. If I just change the name to branchAdd32() but leave it in the subclasses, I get the following errors for uses of the other branchAdd32() overloads: /Volumes/Data/pizlo/quartary/OpenSource/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:686:102:{686:34-686:51}{686:102-686:114}: error: too many arguments to function call, expected 3, have 4 [2] speculationCheck(m_jit.branchAdd32(MacroAssembler::Overflow, op2.gpr(), Imm32(imm1), result.gpr())); ~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~ Notice that here we're trying to call MacroAssemblerX86Common::branchAdd32(ResultCondition, RegisterID, TrustedImm32, RegisterID). But the compiler assumes that MacroAssemblerX86_64::branchAdd32(ResultCondition, RegisterID, AbsoluteAddress) is hiding that method, along with the other overloads. This is because C++ apparently dictates that name resolution is done before overload resolution, so as soon as MacroAssemblerX86_64::branchAdd32 is found, none of the overloads of MacroAssemblerX86Common::branchAdd32are visible. > I'm fine with the other changes / review push-back, r+ with MacroAssemblerX86Common::branchAdd32 removed, branchAdd32Absolute renamed to branchAdd32. OK, what would you prefer? :-) Once again, the options are: 1) Do as I did before. Move branchAdd32Absolute to MacroAssemblerX86Common, but then I have to use r11 directly instead of via the scratchRegister alias. 2) As with (1) but move the scratchRegister declaration to MacroAssemblerX86Common. 3) Move all branchAdd32 overloads to the MacroAssemblerX86_64 and MacroAssemblerX86. I don't like this option because that's a lot of code duplication. 4) Rename this particular overload of branchAdd32 to branchAdd32Absolute and put it in MacroAssemblerX86_64, which is what I had done for this patch. I tend to think that (4) is the lesser of the various evils, though I'm happy with any of the other approaches. I'm not trying to advocate styles here ... I totally agree that branchAdd32Absolute "should" be called branchAdd32, and that we "should" refer to scratchRegister instead of directly to r11, but we're pushing up against bizarre corner cases of C++ here.
Filip Pizlo
Comment 39 2011-09-05 21:16:27 PDT
Created attachment 106384 [details] the patch - more fix review, sort of This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules. I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil.
Gavin Barraclough
Comment 40 2011-09-05 23:32:18 PDT
(In reply to comment #39) > Created an attachment (id=106384) [details] > the patch - more fix review, sort of > > This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules. I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil. Ah! - I see - you just need to add a using directive in the MacroAssemblerX86/MacroAssemblerX86_64 classes: using MacroAssemblerX86Common::branchAdd32; You should see a bunch of them at these head of these classes already, around lines 45-55 of both files.
Filip Pizlo
Comment 41 2011-09-05 23:40:31 PDT
(In reply to comment #40) > (In reply to comment #39) > > Created an attachment (id=106384) [details] [details] > > the patch - more fix review, sort of > > > > This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules. I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil. > > Ah! - I see - you just need to add a using directive in the MacroAssemblerX86/MacroAssemblerX86_64 classes: > using MacroAssemblerX86Common::branchAdd32; > > You should see a bunch of them at these head of these classes already, around lines 45-55 of both files. Ah, that's pretty cool. I didn't know using was that powerful! :-) OK, I've fixed this and will commit with this change.
Filip Pizlo
Comment 42 2011-09-05 23:45:00 PDT
Created attachment 106393 [details] the patch Will run a final batch of tests and then cq+.
Gavin Barraclough
Comment 43 2011-09-06 00:02:08 PDT
Comment on attachment 106393 [details] the patch preemptive r+ pending cq+ on your testing :-)
WebKit Review Bot
Comment 44 2011-09-06 02:23:04 PDT
Comment on attachment 106393 [details] the patch Clearing flags on attachment: 106393 Committed r94559: <http://trac.webkit.org/changeset/94559>
WebKit Review Bot
Comment 45 2011-09-06 02:23:13 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.