Bug 68320

Summary:

DFG JIT does not speculate aggressively enough on GetById

Product:

WebKit

Reporter:

Filip Pizlo <fpizlo>

Component:

JavaScriptCore

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

aroben, fpizlo, oliver, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

All

OS:

All

Attachments:

Description	Flags
work in progress	none
work in progress	none
work in progress	none
the patch	none
the patch	oliver: review+

Filip Pizlo

Reported 2011-09-17 22:56:31 PDT

The DFG JIT uses the same optimization strategy as the old JIT for GetById: it assumes nothing a priori and lets the code get repatched. But this results in a large amount of slow-path code, which takes up space and increases i-cache misses. It also means that repeated GetById's on the same base do repeated structure checks, and repeated loads of the storage pointer. But the DFG JIT has complete information available to it about how the old JIT patched its get_by_id's, and whether or not that patching was the right thing to do (if the slow path is ever taken then the get_by_id turns into a polymorphic list access). It should take advantage of this to reduce the amount of slow path code that it emits, and eliminate redundant structure checks and redundant storage loads. This will also allow the DFG to treat these optimized GetById's as being pure. This will enable load elimination on GetById, and make it so that CSE no longer has to assume that GetById clobbers the world. That being said, the current GetById strategy should still be used as a fall-back if the old JIT's profiling tells us that it would be unwise to speculate on the structure.

Attachments
work in progress (15.40 KB, patch) 2011-09-17 22:59 PDT, Filip Pizlo	no flags	Details Formatted Diff Diff
work in progress (27.48 KB, patch) 2011-09-18 00:34 PDT, Filip Pizlo	no flags	Details Formatted Diff Diff
work in progress (31.64 KB, patch) 2011-09-18 12:54 PDT, Filip Pizlo	no flags	Details Formatted Diff Diff
the patch (32.32 KB, patch) 2011-09-18 14:28 PDT, Filip Pizlo	no flags	Details Formatted Diff Diff
the patch (32.82 KB, patch) 2011-09-18 14:37 PDT, Filip Pizlo	oliver: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Filip Pizlo

Comment 1 2011-09-17 22:59:25 PDT

Created attachment 107783 [details] work in progress This is still incomplete. Putting this up here as back-up.

Filip Pizlo

Comment 2 2011-09-18 00:34:25 PDT

Created attachment 107786 [details] work in progress Added DataFormatStorage, StorageOperand, and implemented CheckStructure and GetByOffset. Still totally untested.

Filip Pizlo

Comment 3 2011-09-18 12:54:54 PDT

Created attachment 107791 [details] work in progress Still testing this, but it's looking pretty good - 76% speed-up on property accesses. Consider this simple program: function foo(o, n) { var result = 0; while(n-->0) { result += o.a + o.b + o.c; } return result; } var object = {a:1, b:2, c:3}; var before = preciseTime(); var result = foo(object, 10000000); var after = preciseTime(); print("Result: "+result+" Time: "+((after-before)*1000)); Here's the before/after: BEFORE: Result: 60000000 Time: 57.630062103271484 AFTER: Result: 60000000 Time: 32.59611129760742 I still need to test this on real programs.

Filip Pizlo

Comment 4 2011-09-18 13:04:01 PDT

It's a 4% speed-up on V8. Benchmark report for V8. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree GetByOffset crypto 83.8914+-1.1391 82.4519+-1.3381 might be 1.0175x faster deltablue 243.3592+-1.9683 ^ 228.2100+-2.6421 ^ definitely 1.0664x faster earley-boyer 96.1566+-0.2312 ^ 92.6819+-0.4199 ^ definitely 1.0375x faster raytrace 69.7617+-0.6717 ^ 63.8253+-0.8824 ^ definitely 1.0930x faster regexp 106.6008+-0.6734 105.8089+-0.3464 richards 217.7843+-0.6560 ^ 203.0919+-1.3343 ^ definitely 1.0723x faster splay 98.1221+-0.4274 ? 99.5908+-1.1138 ? might be 1.0150x slower <arithmetic> 130.8109+-0.2346 ^ 125.0944+-0.5873 ^ definitely 1.0457x faster <geometric> 117.6414+-0.1960 ^ 113.1895+-0.5083 ^ definitely 1.0393x faster <harmonic> 107.7976+-0.2612 ^ 103.8596+-0.5277 ^ definitely 1.0379x faster

Filip Pizlo

Comment 5 2011-09-18 13:08:31 PDT

This looks like it's doing some badness for Kraken. Will investigate. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree GetByOffset SunSpider: 3d-cube 7.5804+-0.1025 ? 7.6657+-0.1333 ? might be 1.0113x slower 3d-morph 7.4365+-0.1415 7.3918+-0.1206 3d-raytrace 7.6661+-0.1294 ? 7.7235+-0.0875 ? access-binary-trees 2.2694+-0.0512 2.2500+-0.0551 access-fannkuch 11.5251+-0.2310 11.4352+-0.1252 access-nbody 4.2127+-0.0956 ^ 3.7493+-0.0932 ^ definitely 1.1236x faster access-nsieve 2.5927+-0.0339 2.5845+-0.0651 bitops-3bit-bits-in-byte 1.6769+-0.0350 ? 1.6862+-0.0559 ? bitops-bits-in-byte 2.7056+-0.0484 2.6958+-0.0434 bitops-bitwise-and 3.6031+-0.1254 ? 3.6406+-0.1198 ? might be 1.0104x slower bitops-nsieve-bits 5.2817+-0.1050 ? 5.3470+-0.0798 ? might be 1.0124x slower controlflow-recursive 1.9900+-0.0507 ? 2.0723+-0.0604 ? might be 1.0414x slower crypto-aes 7.0620+-0.2252 6.9372+-0.2120 might be 1.0180x faster crypto-md5 2.7704+-0.0604 2.7578+-0.0877 crypto-sha1 2.1668+-0.0376 ? 2.1892+-0.0495 ? might be 1.0104x slower date-format-tofte 9.9513+-0.1459 ? 10.0977+-0.1961 ? might be 1.0147x slower date-format-xparb 8.8571+-0.1317 8.7075+-0.1660 might be 1.0172x faster math-cordic 6.1409+-0.1022 ? 6.2348+-0.2257 ? might be 1.0153x slower math-partial-sums 7.4173+-0.1544 7.3043+-0.1211 might be 1.0155x faster math-spectral-norm 2.5619+-0.0318 ? 2.5942+-0.0542 ? might be 1.0126x slower regexp-dna 10.8761+-0.2557 10.7495+-0.1449 might be 1.0118x faster string-base64 5.7409+-0.1239 5.6667+-0.1387 might be 1.0131x faster string-fasta 7.0083+-0.1162 ? 7.0727+-0.1328 ? string-tagcloud 11.8651+-0.1467 11.7426+-0.2059 might be 1.0104x faster string-unpack-code 18.8426+-0.3420 18.6557+-0.2536 might be 1.0100x faster string-validate-input 6.6370+-0.2098 6.4269+-0.1164 might be 1.0327x faster <arithmetic> 6.4015+-0.0389 6.3607+-0.0311 <geometric> 5.2729+-0.0289 5.2464+-0.0233 <harmonic> 4.2973+-0.0306 4.2888+-0.0286 TipOfTree GetByOffset V8: crypto 83.5475+-0.8616 ? 85.2520+-1.0487 ? might be 1.0204x slower deltablue 242.3931+-2.8106 ^ 226.9906+-1.2293 ^ definitely 1.0679x faster earley-boyer 96.2478+-0.4171 ^ 93.4152+-0.6641 ^ definitely 1.0303x faster raytrace 69.6915+-0.4099 ^ 64.1450+-0.3871 ^ definitely 1.0865x faster regexp 107.7330+-1.6024 106.3051+-0.5100 might be 1.0134x faster richards 217.4246+-1.6886 ^ 201.0345+-1.6029 ^ definitely 1.0815x faster splay 100.1975+-2.4578 98.5484+-0.4887 might be 1.0167x faster <arithmetic> 131.0336+-0.9944 ^ 125.0987+-0.2380 ^ definitely 1.0474x faster <geometric> 117.9991+-0.8652 ^ 113.5980+-0.2332 ^ definitely 1.0387x faster <harmonic> 108.1704+-0.7331 ^ 104.5304+-0.2575 ^ definitely 1.0348x faster TipOfTree GetByOffset Kraken: ai-astar 639.2052+-9.5529 ^ 617.7010+-5.2686 ^ definitely 1.0348x faster audio-beat-detection 469.1687+-4.6644 ? 475.5938+-5.4870 ? might be 1.0137x slower audio-dft 427.3820+-6.7772 419.8933+-4.6365 might be 1.0178x faster audio-fft 364.7972+-1.1347 ! 371.4988+-2.9430 ! definitely 1.0184x slower audio-oscillator 313.8251+-0.8023 ! 388.3917+-2.8457 ! definitely 1.2376x slower imaging-darkroom 411.9896+-0.7587 ? 420.0720+-8.4754 ? might be 1.0196x slower imaging-desaturate 218.3297+-0.3545 217.8529+-0.5346 imaging-gaussian-blur 587.5906+-2.2849 ? 590.5179+-1.0733 ? json-parse-financial 49.8097+-0.4280 ^ 48.8371+-0.2607 ^ definitely 1.0199x faster json-stringify-tinderbox 68.3106+-0.4785 ? 68.9709+-0.6994 ? stanford-crypto-aes 144.6621+-1.6741 143.5692+-1.5384 stanford-crypto-ccm 111.8888+-0.6187 ? 112.4789+-1.2441 ? stanford-crypto-pbkdf2 388.5531+-2.2593 ! 396.6219+-2.2329 ! definitely 1.0208x slower stanford-crypto-sha256-iterative 147.8469+-0.6880 ? 149.7923+-2.0456 ? might be 1.0132x slower <arithmetic> 310.2399+-0.5340 ! 315.8423+-1.0812 ! definitely 1.0181x slower <geometric> 242.6350+-0.3027 ! 246.7610+-0.7284 ! definitely 1.0170x slower <harmonic> 173.7287+-0.4609 ! 174.8320+-0.4895 ! definitely 1.0064x slower TipOfTree GetByOffset All benchmarks: <arithmetic> 115.4688+-0.1885 ! 116.2311+-0.3077 ! definitely 1.0066x slower <geometric> 26.2069+-0.0884 26.1173+-0.0668 <harmonic> 7.5860+-0.0528 7.5692+-0.0494

Filip Pizlo

Comment 6 2011-09-18 14:18:55 PDT

It looks like the reason for the slow-down in Kraken is just because of a lack of recompilation. On the first ~5000 executions of this function, the get_by_id self cache is successful. It then falls apart, and the get_by_id's go polymorphic. But by then the DFG has already done its compilation. Recommendation: since the speed-up on V8 is bigger than the slow-down on Kraken, I suggest landing this, but adding this to the list of reasons why we need recompilation.

Filip Pizlo

Comment 7 2011-09-18 14:28:49 PDT

Created attachment 107796 [details] the patch

WebKit Review Bot

Comment 8 2011-09-18 14:31:22 PDT

Attachment 107796 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/dfg/DFGJITCodeGenerator.h:66: One space before end of line comments [whitespace/comments] [5] Source/JavaScriptCore/dfg/DFGNode.h:234: Missing space inside { }. [whitespace/braces] [5] Source/JavaScriptCore/dfg/DFGNode.h:236: Missing space inside { }. [whitespace/braces] [5] Source/JavaScriptCore/dfg/DFGNode.h:237: Missing space inside { }. [whitespace/braces] [5] Source/JavaScriptCore/bytecode/PredictedType.h:155: The parameter name "classInfo" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 5 in 12 files If any of these errors are false positives, please file a bug against check-webkit-style.

Filip Pizlo

Comment 9 2011-09-18 14:37:18 PDT

Created attachment 107797 [details] the patch

Oliver Hunt

Comment 10 2011-09-19 14:33:55 PDT

Comment on attachment 107797 [details] the patch safeCast isn't actually safe -- you should use Checked<T> if you want overflow safe arithmetic, or at least make safeCast use isInBounds from CheckedArithmetic.h so that your overflow detection is correct.

Oliver Hunt

Comment 11 2011-09-19 15:28:20 PDT

Comment on attachment 107797 [details] the patch r+ with my suggested change to safeCast

Filip Pizlo

Comment 12 2011-09-20 02:30:16 PDT

Looks like this isn't ready yet. It breaks bing.com maps.

Filip Pizlo

Comment 13 2011-09-20 02:38:20 PDT

(In reply to comment #12) > Looks like this isn't ready yet. It breaks bing.com maps. Never mind. It's not a regression, and the fix is here: https://bugs.webkit.org/show_bug.cgi?id=68430

Filip Pizlo

Comment 14 2011-09-20 02:41:31 PDT

Landed in r95523.

Adam Roben (:aroben)

Comment 15 2011-09-20 07:08:47 PDT

This broke the Windows build <http://build.webkit.org/builders/Windows%20Release%20%28Build%29/builds/21118/steps/compile-webkit/logs/stdio>. I'm not sure why it passed the EWS bot; maybe some changes were made before landing? In any case, I landed http://trac.webkit.org/changeset/95540 to try to fix it.

Note You need to log in before you can comment on or make changes to this bug.