Patch forthcoming
<rdar://problem/10698060>
Created attachment 122589 [details] work in progress Getting there. Not there yet though.
Created attachment 122591 [details] work in progress It's starting to work, but it needs quite a bit more testing.
Attachment 122591 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:877: The parameter name "codeOrigin" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:878: The parameter name "codeOrigin" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:880: The parameter name "codeOrigin" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:881: The parameter name "codeOrigin" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 4 in 16 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 122591 [details] work in progress Attachment 122591 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11113970
Created attachment 122678 [details] the patch
Comment on attachment 122678 [details] the patch Attachment 122678 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/11254305
Comment on attachment 122678 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=122678&action=review r=me Qt build not happy. > Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:131 > + // FIXME: this is absurdly wrong. (1) Yikes. (2) Why don't you file a bug about what's wrong, and remove the FIXME. > Source/JavaScriptCore/dfg/DFGOperations.cpp:835 > + CodeOrigin codeOrigin = stubInfo->codeOrigin; > + while (codeOrigin.inlineCallFrame) > + codeOrigin = codeOrigin.inlineCallFrame->caller; Is this loop also absurdly wrong, as the loop above is? > Source/JavaScriptCore/dfg/DFGRepatch.cpp:283 > + // Need extra checks. A "why" comment would be better here, explaining that non-flushing gets must be trivial values, because non-trivial values might call to C++ code and stomp on our non-flushed registers.
This is perf-neutral on the major benchmarks. Benchmark report for SunSpider, V8, and Kraken on bigmac (MacPro5,1). VMs tested: "TipOfTree" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082) "OldJIT" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082) "GetterCache" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082) Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the portable Date.now() method to get millisecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree OldJIT GetterCache GetterCache v. TipOfTree SunSpider: 3d-cube 7.0833+-0.8332 ! 9.5833+-0.6885 ^ 7.2500+-0.7722 ? might be 1.0235x slower 3d-morph 8.8333+-0.2473 8.3333+-0.3128 ? 8.7500+-0.2874 3d-raytrace 8.5833+-0.5720 ? 9.5000+-0.5069 8.2500+-0.7722 might be 1.0404x faster access-binary-trees 2.5000+-0.6354 ? 3.0833+-0.6330 2.4167+-0.4248 might be 1.0345x faster access-fannkuch 7.5833+-0.4248 ! 13.5833+-0.4248 ^ 7.6667+-0.4138 ? might be 1.0110x slower access-nbody 4.0833+-0.1834 ! 7.5833+-0.3272 ^ 4.2500+-0.2874 ? might be 1.0408x slower access-nsieve 3.3333+-0.3128 ? 3.9167+-0.3272 3.4167+-0.3272 ? might be 1.0250x slower bitops-3bit-bits-in-byte 1.2500+-0.2874 ! 2.5833+-0.3272 ^ 1.4167+-0.3272 ? might be 1.1333x slower bitops-bits-in-byte 5.5833+-0.3272 ! 6.0000+-0.0000 ^ 5.6667+-0.3128 ? might be 1.0149x slower bitops-bitwise-and 3.3333+-0.3128 ! 4.0833+-0.1834 ^ 3.2500+-0.2874 might be 1.0256x faster bitops-nsieve-bits 5.7500+-0.2874 ? 5.8333+-0.2473 5.7500+-0.2874 controlflow-recursive 2.5833+-0.3272 2.2500+-0.2874 ? 2.5000+-0.3318 might be 1.0333x faster crypto-aes 8.5000+-0.8779 7.5000+-0.7420 ? 8.1667+-1.0076 might be 1.0408x faster crypto-md5 2.5000+-0.3318 ! 3.0000+-0.0000 2.7500+-0.2874 ? might be 1.1000x slower crypto-sha1 2.2500+-0.2874 ? 2.6667+-0.3128 2.4167+-0.3272 ? might be 1.0741x slower date-format-tofte 13.0000+-1.4336 ? 13.6667+-1.3636 12.9167+-1.4704 date-format-xparb 12.1667+-1.4798 11.5000+-1.3133 ? 11.9167+-1.4704 might be 1.0210x faster math-cordic 7.4167+-0.3272 7.3333+-0.3128 7.3333+-0.3128 might be 1.0114x faster math-partial-sums 10.6667+-0.3128 10.6667+-0.3128 10.6667+-0.3128 math-spectral-norm 2.6667+-0.3128 ! 4.7500+-0.2874 ^ 2.6667+-0.3128 regexp-dna 9.8333+-0.5956 ? 9.9167+-0.5038 9.9167+-0.5720 ? string-base64 5.6667+-0.9121 5.6667+-0.9121 ? 5.7500+-0.9815 ? might be 1.0147x slower string-fasta 8.0000+-0.7663 ? 8.2500+-0.7722 8.0000+-0.8567 string-tagcloud 13.2500+-0.3949 13.1667+-0.4560 ? 13.5833+-0.5720 ? might be 1.0252x slower string-unpack-code 23.0000+-1.3272 22.3333+-1.2513 ? 23.1667+-1.2366 ? string-validate-input 7.0833+-1.0991 7.0833+-0.9171 7.0000+-0.9385 might be 1.0119x faster <arithmetic> * 7.1731+-0.4415 ? 7.8397+-0.4109 7.1859+-0.4386 ? might be 1.0018x slower <geometric> 5.7494+-0.3382 ! 6.6279+-0.3450 ^ 5.8054+-0.3348 ? might be 1.0097x slower <harmonic> 4.4182+-0.2477 ! 5.5129+-0.2933 ^ 4.5538+-0.2810 ? might be 1.0307x slower TipOfTree OldJIT GetterCache GetterCache v. TipOfTree V8: crypto 77.5000+-0.8779 ! 200.5833+-0.5038 ^ 77.5833+-0.7399 ? deltablue 174.9167+-2.2416 ! 280.5000+-3.4958 ^ 174.4167+-2.9858 earley-boyer 98.3333+-1.3364 ! 122.2500+-1.0182 ^ 98.2500+-1.3848 raytrace 53.5833+-0.8332 ! 71.3333+-0.7821 ^ 53.2500+-0.9815 regexp 102.0833+-0.8761 ? 102.1667+-0.5956 101.5000+-0.5069 richards 141.4167+-1.8664 ! 274.7500+-2.0832 ^ 139.5000+-0.8350 might be 1.0137x faster splay 121.1667+-45.3339 ? 131.9167+-45.9184 121.5833+-43.9626 ? <arithmetic> 109.8571+-7.1039 ! 169.0714+-7.3637 ^ 109.4405+-7.0731 might be 1.0038x faster <geometric> * 101.4766+-5.7549 ! 149.0634+-7.4459 ^ 101.2528+-5.7107 might be 1.0022x faster <harmonic> 93.7618+-4.4638 ! 131.1242+-6.4855 ^ 93.6635+-4.4156 might be 1.0010x faster TipOfTree OldJIT GetterCache GetterCache v. TipOfTree Kraken: ai-astar 820.7500+-12.3355 ! 2011.6667+-26.5260 ^ 820.0833+-12.2019 audio-beat-detection 202.0000+-2.7761 ! 555.0000+-3.5634 ^ 204.2500+-1.5356 ? might be 1.0111x slower audio-dft 276.6667+-1.5165 ! 447.5833+-2.3690 ^ 278.6667+-1.6554 ? audio-fft 123.0833+-0.8761 ! 426.4167+-2.5481 ^ 122.2500+-0.5502 audio-oscillator 279.7500+-2.5720 ! 383.2500+-1.4870 ^ 279.5833+-0.9939 imaging-darkroom 308.5000+-0.6907 ! 593.6667+-11.0526 ^ 308.1667+-1.1757 imaging-desaturate 230.7500+-0.2874 ! 614.0000+-1.1170 ^ 230.3333+-0.3128 imaging-gaussian-blur 507.5000+-2.6197 ! 2127.9167+-3.4425 ^ 506.8333+-2.6103 json-parse-financial 73.2500+-5.0476 72.8333+-5.1321 72.6667+-5.4612 json-stringify-tinderbox 84.5833+-0.5720 ^ 83.0000+-0.3831 ! 84.0833+-0.6330 stanford-crypto-aes 118.9167+-2.1066 ! 166.5833+-1.3936 ^ 119.8333+-1.5759 ? stanford-crypto-ccm 122.8333+-1.2946 121.0833+-0.9563 ! 123.5000+-0.9579 ? stanford-crypto-pbkdf2 227.2500+-2.0832 ! 415.5833+-2.9611 ^ 226.0833+-1.1951 stanford-crypto-sha256-iterative 103.3333+-3.1517 ! 161.9167+-3.5784 ^ 104.0833+-3.1532 ? <arithmetic> * 248.5119+-1.0005 ! 584.3214+-1.6714 ^ 248.6012+-1.2898 ? might be 1.0004x slower <geometric> 195.6625+-1.9151 ! 355.5735+-2.1038 ^ 195.7577+-1.9882 ? might be 1.0005x slower <harmonic> 160.4673+-2.7914 ! 225.7891+-4.0986 ^ 160.4073+-2.8873 might be 1.0004x faster TipOfTree OldJIT GetterCache GetterCache v. TipOfTree All benchmarks: <arithmetic> 94.3546+-1.5450 ! 203.5709+-1.3371 ^ 94.3262+-1.5735 might be 1.0003x faster <geometric> 25.1982+-1.0981 ! 34.4913+-1.3017 ^ 25.3294+-1.0832 ? might be 1.0052x slower <harmonic> 7.7723+-0.4299 ! 9.7273+-0.5124 ^ 8.0038+-0.4855 ? might be 1.0298x slower TipOfTree OldJIT GetterCache GetterCache v. TipOfTree Geomean of preferred means: <scaled-result> 56.5109+-2.2661 ! 88.0008+-2.9640 ^ 56.5109+-2.2584 ? might be 1.0000x slower
Performance on Dromaeo DOM Traversal nextSibling: Old JIT ToT: 644.40runs/s ±7.18% DFG ToT: 377.60runs/s ±0.50% DFG ToT + this patch: 767.80runs/s ±6.80% So a 2x speed-up in the DFG alone, and a 20% speed-up over where we would have been pre-DFG.
Landed in http://trac.webkit.org/changeset/105107
> Qt build not happy. Should be fixed. Turns out that the 32-bit build was not happy in general. > > > Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:131 > > + // FIXME: this is absurdly wrong. > > (1) Yikes. > > (2) Why don't you file a bug about what's wrong, and remove the FIXME. Filed a radar. > > > Source/JavaScriptCore/dfg/DFGOperations.cpp:835 > > + CodeOrigin codeOrigin = stubInfo->codeOrigin; > > + while (codeOrigin.inlineCallFrame) > > + codeOrigin = codeOrigin.inlineCallFrame->caller; > > Is this loop also absurdly wrong, as the loop above is? Yup. > > > Source/JavaScriptCore/dfg/DFGRepatch.cpp:283 > > + // Need extra checks. > > A "why" comment would be better here, explaining that non-flushing gets must be trivial values, because non-trivial values might call to C++ code and stomp on our non-flushed registers. Added a better comment.
Reopen, because it broke many tests on 32 bit: http://build.webkit.org/results/Qt%20Linux%20Release/r105107%20%2842168%29/results.html
Created attachment 122724 [details] the patch to fix 32-bit Fixed JSVALUE32_64 reversal of tag and payload when handling stub call return. Fixed double use of GPRResult. (Second use should always use the GPRResult2 class.)
Comment on attachment 122724 [details] the patch to fix 32-bit r=me, I tested on 32 bit Qt and it works.
Comment on attachment 122724 [details] the patch to fix 32-bit Clearing flags because I just landed.
Landed a fix for 32-bit in http://trac.webkit.org/changeset/105131
(In reply to comment #17) > Landed a fix for 32-bit in http://trac.webkit.org/changeset/105131