Summary: | DFG JIT does not speculate aggressively enough on GetById | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> | ||||||||||||
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | aroben, fpizlo, oliver, webkit.review.bot | ||||||||||||
Priority: | P2 | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||
Hardware: | All | ||||||||||||||
OS: | All | ||||||||||||||
Attachments: |
|
Description
Filip Pizlo
2011-09-17 22:56:31 PDT
Created attachment 107783 [details]
work in progress
This is still incomplete. Putting this up here as back-up.
Created attachment 107786 [details]
work in progress
Added DataFormatStorage, StorageOperand, and implemented CheckStructure and GetByOffset. Still totally untested.
Created attachment 107791 [details]
work in progress
Still testing this, but it's looking pretty good - 76% speed-up on property accesses. Consider this simple program:
function foo(o, n) {
var result = 0;
while(n-->0) {
result += o.a + o.b + o.c;
}
return result;
}
var object = {a:1, b:2, c:3};
var before = preciseTime();
var result = foo(object, 10000000);
var after = preciseTime();
print("Result: "+result+" Time: "+((after-before)*1000));
Here's the before/after:
BEFORE:
Result: 60000000 Time: 57.630062103271484
AFTER:
Result: 60000000 Time: 32.59611129760742
I still need to test this on real programs.
It's a 4% speed-up on V8. Benchmark report for V8. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree GetByOffset crypto 83.8914+-1.1391 82.4519+-1.3381 might be 1.0175x faster deltablue 243.3592+-1.9683 ^ 228.2100+-2.6421 ^ definitely 1.0664x faster earley-boyer 96.1566+-0.2312 ^ 92.6819+-0.4199 ^ definitely 1.0375x faster raytrace 69.7617+-0.6717 ^ 63.8253+-0.8824 ^ definitely 1.0930x faster regexp 106.6008+-0.6734 105.8089+-0.3464 richards 217.7843+-0.6560 ^ 203.0919+-1.3343 ^ definitely 1.0723x faster splay 98.1221+-0.4274 ? 99.5908+-1.1138 ? might be 1.0150x slower <arithmetic> 130.8109+-0.2346 ^ 125.0944+-0.5873 ^ definitely 1.0457x faster <geometric> 117.6414+-0.1960 ^ 113.1895+-0.5083 ^ definitely 1.0393x faster <harmonic> 107.7976+-0.2612 ^ 103.8596+-0.5277 ^ definitely 1.0379x faster This looks like it's doing some badness for Kraken. Will investigate. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree GetByOffset SunSpider: 3d-cube 7.5804+-0.1025 ? 7.6657+-0.1333 ? might be 1.0113x slower 3d-morph 7.4365+-0.1415 7.3918+-0.1206 3d-raytrace 7.6661+-0.1294 ? 7.7235+-0.0875 ? access-binary-trees 2.2694+-0.0512 2.2500+-0.0551 access-fannkuch 11.5251+-0.2310 11.4352+-0.1252 access-nbody 4.2127+-0.0956 ^ 3.7493+-0.0932 ^ definitely 1.1236x faster access-nsieve 2.5927+-0.0339 2.5845+-0.0651 bitops-3bit-bits-in-byte 1.6769+-0.0350 ? 1.6862+-0.0559 ? bitops-bits-in-byte 2.7056+-0.0484 2.6958+-0.0434 bitops-bitwise-and 3.6031+-0.1254 ? 3.6406+-0.1198 ? might be 1.0104x slower bitops-nsieve-bits 5.2817+-0.1050 ? 5.3470+-0.0798 ? might be 1.0124x slower controlflow-recursive 1.9900+-0.0507 ? 2.0723+-0.0604 ? might be 1.0414x slower crypto-aes 7.0620+-0.2252 6.9372+-0.2120 might be 1.0180x faster crypto-md5 2.7704+-0.0604 2.7578+-0.0877 crypto-sha1 2.1668+-0.0376 ? 2.1892+-0.0495 ? might be 1.0104x slower date-format-tofte 9.9513+-0.1459 ? 10.0977+-0.1961 ? might be 1.0147x slower date-format-xparb 8.8571+-0.1317 8.7075+-0.1660 might be 1.0172x faster math-cordic 6.1409+-0.1022 ? 6.2348+-0.2257 ? might be 1.0153x slower math-partial-sums 7.4173+-0.1544 7.3043+-0.1211 might be 1.0155x faster math-spectral-norm 2.5619+-0.0318 ? 2.5942+-0.0542 ? might be 1.0126x slower regexp-dna 10.8761+-0.2557 10.7495+-0.1449 might be 1.0118x faster string-base64 5.7409+-0.1239 5.6667+-0.1387 might be 1.0131x faster string-fasta 7.0083+-0.1162 ? 7.0727+-0.1328 ? string-tagcloud 11.8651+-0.1467 11.7426+-0.2059 might be 1.0104x faster string-unpack-code 18.8426+-0.3420 18.6557+-0.2536 might be 1.0100x faster string-validate-input 6.6370+-0.2098 6.4269+-0.1164 might be 1.0327x faster <arithmetic> 6.4015+-0.0389 6.3607+-0.0311 <geometric> 5.2729+-0.0289 5.2464+-0.0233 <harmonic> 4.2973+-0.0306 4.2888+-0.0286 TipOfTree GetByOffset V8: crypto 83.5475+-0.8616 ? 85.2520+-1.0487 ? might be 1.0204x slower deltablue 242.3931+-2.8106 ^ 226.9906+-1.2293 ^ definitely 1.0679x faster earley-boyer 96.2478+-0.4171 ^ 93.4152+-0.6641 ^ definitely 1.0303x faster raytrace 69.6915+-0.4099 ^ 64.1450+-0.3871 ^ definitely 1.0865x faster regexp 107.7330+-1.6024 106.3051+-0.5100 might be 1.0134x faster richards 217.4246+-1.6886 ^ 201.0345+-1.6029 ^ definitely 1.0815x faster splay 100.1975+-2.4578 98.5484+-0.4887 might be 1.0167x faster <arithmetic> 131.0336+-0.9944 ^ 125.0987+-0.2380 ^ definitely 1.0474x faster <geometric> 117.9991+-0.8652 ^ 113.5980+-0.2332 ^ definitely 1.0387x faster <harmonic> 108.1704+-0.7331 ^ 104.5304+-0.2575 ^ definitely 1.0348x faster TipOfTree GetByOffset Kraken: ai-astar 639.2052+-9.5529 ^ 617.7010+-5.2686 ^ definitely 1.0348x faster audio-beat-detection 469.1687+-4.6644 ? 475.5938+-5.4870 ? might be 1.0137x slower audio-dft 427.3820+-6.7772 419.8933+-4.6365 might be 1.0178x faster audio-fft 364.7972+-1.1347 ! 371.4988+-2.9430 ! definitely 1.0184x slower audio-oscillator 313.8251+-0.8023 ! 388.3917+-2.8457 ! definitely 1.2376x slower imaging-darkroom 411.9896+-0.7587 ? 420.0720+-8.4754 ? might be 1.0196x slower imaging-desaturate 218.3297+-0.3545 217.8529+-0.5346 imaging-gaussian-blur 587.5906+-2.2849 ? 590.5179+-1.0733 ? json-parse-financial 49.8097+-0.4280 ^ 48.8371+-0.2607 ^ definitely 1.0199x faster json-stringify-tinderbox 68.3106+-0.4785 ? 68.9709+-0.6994 ? stanford-crypto-aes 144.6621+-1.6741 143.5692+-1.5384 stanford-crypto-ccm 111.8888+-0.6187 ? 112.4789+-1.2441 ? stanford-crypto-pbkdf2 388.5531+-2.2593 ! 396.6219+-2.2329 ! definitely 1.0208x slower stanford-crypto-sha256-iterative 147.8469+-0.6880 ? 149.7923+-2.0456 ? might be 1.0132x slower <arithmetic> 310.2399+-0.5340 ! 315.8423+-1.0812 ! definitely 1.0181x slower <geometric> 242.6350+-0.3027 ! 246.7610+-0.7284 ! definitely 1.0170x slower <harmonic> 173.7287+-0.4609 ! 174.8320+-0.4895 ! definitely 1.0064x slower TipOfTree GetByOffset All benchmarks: <arithmetic> 115.4688+-0.1885 ! 116.2311+-0.3077 ! definitely 1.0066x slower <geometric> 26.2069+-0.0884 26.1173+-0.0668 <harmonic> 7.5860+-0.0528 7.5692+-0.0494 It looks like the reason for the slow-down in Kraken is just because of a lack of recompilation. On the first ~5000 executions of this function, the get_by_id self cache is successful. It then falls apart, and the get_by_id's go polymorphic. But by then the DFG has already done its compilation. Recommendation: since the speed-up on V8 is bigger than the slow-down on Kraken, I suggest landing this, but adding this to the list of reasons why we need recompilation. Created attachment 107796 [details]
the patch
Attachment 107796 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/JavaScriptCore/dfg/DFGJITCodeGenerator.h:66: One space before end of line comments [whitespace/comments] [5]
Source/JavaScriptCore/dfg/DFGNode.h:234: Missing space inside { }. [whitespace/braces] [5]
Source/JavaScriptCore/dfg/DFGNode.h:236: Missing space inside { }. [whitespace/braces] [5]
Source/JavaScriptCore/dfg/DFGNode.h:237: Missing space inside { }. [whitespace/braces] [5]
Source/JavaScriptCore/bytecode/PredictedType.h:155: The parameter name "classInfo" adds no information, so it should be removed. [readability/parameter_name] [5]
Total errors found: 5 in 12 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 107797 [details]
the patch
Comment on attachment 107797 [details]
the patch
safeCast isn't actually safe -- you should use Checked<T> if you want overflow safe arithmetic, or at least make safeCast use isInBounds from CheckedArithmetic.h so that your overflow detection is correct.
Comment on attachment 107797 [details]
the patch
r+ with my suggested change to safeCast
Looks like this isn't ready yet. It breaks bing.com maps. (In reply to comment #12) > Looks like this isn't ready yet. It breaks bing.com maps. Never mind. It's not a regression, and the fix is here: https://bugs.webkit.org/show_bug.cgi?id=68430 Landed in r95523. This broke the Windows build <http://build.webkit.org/builders/Windows%20Release%20%28Build%29/builds/21118/steps/compile-webkit/logs/stdio>. I'm not sure why it passed the EWS bot; maybe some changes were made before landing? In any case, I landed http://trac.webkit.org/changeset/95540 to try to fix it. |