Bug 68320 - DFG JIT does not speculate aggressively enough on GetById
Summary: DFG JIT does not speculate aggressively enough on GetById
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-17 22:56 PDT by Filip Pizlo
Modified: 2011-09-20 07:08 PDT (History)
4 users (show)

See Also:


Attachments
work in progress (15.40 KB, patch)
2011-09-17 22:59 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress (27.48 KB, patch)
2011-09-18 00:34 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress (31.64 KB, patch)
2011-09-18 12:54 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (32.32 KB, patch)
2011-09-18 14:28 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (32.82 KB, patch)
2011-09-18 14:37 PDT, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-09-17 22:56:31 PDT
The DFG JIT uses the same optimization strategy as the old JIT for GetById: it assumes nothing a priori and lets the code get repatched.  But this results in a large amount of slow-path code, which takes up space and increases i-cache misses.  It also means that repeated GetById's on the same base do repeated structure checks, and repeated loads of the storage pointer.  But the DFG JIT has complete information available to it about how the old JIT patched its get_by_id's, and whether or not that patching was the right thing to do (if the slow path is ever taken then the get_by_id turns into a polymorphic list access).  It should take advantage of this to reduce the amount of slow path code that it emits, and eliminate redundant structure checks and redundant storage loads.

This will also allow the DFG to treat these optimized GetById's as being pure.  This will enable load elimination on GetById, and make it so that CSE no longer has to assume that GetById clobbers the world.

That being said, the current GetById strategy should still be used as a fall-back if the old JIT's profiling tells us that it would be unwise to speculate on the structure.
Comment 1 Filip Pizlo 2011-09-17 22:59:25 PDT
Created attachment 107783 [details]
work in progress

This is still incomplete.  Putting this up here as back-up.
Comment 2 Filip Pizlo 2011-09-18 00:34:25 PDT
Created attachment 107786 [details]
work in progress

Added DataFormatStorage, StorageOperand, and implemented CheckStructure and GetByOffset.  Still totally untested.
Comment 3 Filip Pizlo 2011-09-18 12:54:54 PDT
Created attachment 107791 [details]
work in progress

Still testing this, but it's looking pretty good - 76% speed-up on property accesses.  Consider this simple program:

function foo(o, n) {
    var result = 0;
    while(n-->0) {
        result += o.a + o.b + o.c;
    }
    return result;
}

var object = {a:1, b:2, c:3};
var before = preciseTime();
var result = foo(object, 10000000);
var after = preciseTime();
print("Result: "+result+"  Time: "+((after-before)*1000));

Here's the before/after:

BEFORE: 
Result: 60000000  Time: 57.630062103271484

AFTER:
Result: 60000000  Time: 32.59611129760742

I still need to test this on real programs.
Comment 4 Filip Pizlo 2011-09-18 13:04:01 PDT
It's a 4% speed-up on V8.



Benchmark report for V8.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1
benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                     TipOfTree              GetByOffset                                   

crypto            83.8914+-1.1391         82.4519+-1.3381         might be 1.0175x faster
deltablue        243.3592+-1.9683    ^   228.2100+-2.6421       ^ definitely 1.0664x faster
earley-boyer      96.1566+-0.2312    ^    92.6819+-0.4199       ^ definitely 1.0375x faster
raytrace          69.7617+-0.6717    ^    63.8253+-0.8824       ^ definitely 1.0930x faster
regexp           106.6008+-0.6734        105.8089+-0.3464       
richards         217.7843+-0.6560    ^   203.0919+-1.3343       ^ definitely 1.0723x faster
splay             98.1221+-0.4274    ?    99.5908+-1.1138       ? might be 1.0150x slower

<arithmetic>     130.8109+-0.2346    ^   125.0944+-0.5873       ^ definitely 1.0457x faster
<geometric>      117.6414+-0.1960    ^   113.1895+-0.5083       ^ definitely 1.0393x faster
<harmonic>       107.7976+-0.2612    ^   103.8596+-0.5277       ^ definitely 1.0379x faster
Comment 5 Filip Pizlo 2011-09-18 13:08:31 PDT
This looks like it's doing some badness for Kraken.  Will investigate.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"GetByOffset" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree              GetByOffset                                   
SunSpider:
   3d-cube                                7.5804+-0.1025    ?     7.6657+-0.1333       ? might be 1.0113x slower
   3d-morph                               7.4365+-0.1415          7.3918+-0.1206       
   3d-raytrace                            7.6661+-0.1294    ?     7.7235+-0.0875       ?
   access-binary-trees                    2.2694+-0.0512          2.2500+-0.0551       
   access-fannkuch                       11.5251+-0.2310         11.4352+-0.1252       
   access-nbody                           4.2127+-0.0956    ^     3.7493+-0.0932       ^ definitely 1.1236x faster
   access-nsieve                          2.5927+-0.0339          2.5845+-0.0651       
   bitops-3bit-bits-in-byte               1.6769+-0.0350    ?     1.6862+-0.0559       ?
   bitops-bits-in-byte                    2.7056+-0.0484          2.6958+-0.0434       
   bitops-bitwise-and                     3.6031+-0.1254    ?     3.6406+-0.1198       ? might be 1.0104x slower
   bitops-nsieve-bits                     5.2817+-0.1050    ?     5.3470+-0.0798       ? might be 1.0124x slower
   controlflow-recursive                  1.9900+-0.0507    ?     2.0723+-0.0604       ? might be 1.0414x slower
   crypto-aes                             7.0620+-0.2252          6.9372+-0.2120         might be 1.0180x faster
   crypto-md5                             2.7704+-0.0604          2.7578+-0.0877       
   crypto-sha1                            2.1668+-0.0376    ?     2.1892+-0.0495       ? might be 1.0104x slower
   date-format-tofte                      9.9513+-0.1459    ?    10.0977+-0.1961       ? might be 1.0147x slower
   date-format-xparb                      8.8571+-0.1317          8.7075+-0.1660         might be 1.0172x faster
   math-cordic                            6.1409+-0.1022    ?     6.2348+-0.2257       ? might be 1.0153x slower
   math-partial-sums                      7.4173+-0.1544          7.3043+-0.1211         might be 1.0155x faster
   math-spectral-norm                     2.5619+-0.0318    ?     2.5942+-0.0542       ? might be 1.0126x slower
   regexp-dna                            10.8761+-0.2557         10.7495+-0.1449         might be 1.0118x faster
   string-base64                          5.7409+-0.1239          5.6667+-0.1387         might be 1.0131x faster
   string-fasta                           7.0083+-0.1162    ?     7.0727+-0.1328       ?
   string-tagcloud                       11.8651+-0.1467         11.7426+-0.2059         might be 1.0104x faster
   string-unpack-code                    18.8426+-0.3420         18.6557+-0.2536         might be 1.0100x faster
   string-validate-input                  6.6370+-0.2098          6.4269+-0.1164         might be 1.0327x faster

   <arithmetic>                           6.4015+-0.0389          6.3607+-0.0311       
   <geometric>                            5.2729+-0.0289          5.2464+-0.0233       
   <harmonic>                             4.2973+-0.0306          4.2888+-0.0286       

                                            TipOfTree              GetByOffset                                   
V8:
   crypto                                83.5475+-0.8616    ?    85.2520+-1.0487       ? might be 1.0204x slower
   deltablue                            242.3931+-2.8106    ^   226.9906+-1.2293       ^ definitely 1.0679x faster
   earley-boyer                          96.2478+-0.4171    ^    93.4152+-0.6641       ^ definitely 1.0303x faster
   raytrace                              69.6915+-0.4099    ^    64.1450+-0.3871       ^ definitely 1.0865x faster
   regexp                               107.7330+-1.6024        106.3051+-0.5100         might be 1.0134x faster
   richards                             217.4246+-1.6886    ^   201.0345+-1.6029       ^ definitely 1.0815x faster
   splay                                100.1975+-2.4578         98.5484+-0.4887         might be 1.0167x faster

   <arithmetic>                         131.0336+-0.9944    ^   125.0987+-0.2380       ^ definitely 1.0474x faster
   <geometric>                          117.9991+-0.8652    ^   113.5980+-0.2332       ^ definitely 1.0387x faster
   <harmonic>                           108.1704+-0.7331    ^   104.5304+-0.2575       ^ definitely 1.0348x faster

                                            TipOfTree              GetByOffset                                   
Kraken:
   ai-astar                             639.2052+-9.5529    ^   617.7010+-5.2686       ^ definitely 1.0348x faster
   audio-beat-detection                 469.1687+-4.6644    ?   475.5938+-5.4870       ? might be 1.0137x slower
   audio-dft                            427.3820+-6.7772        419.8933+-4.6365         might be 1.0178x faster
   audio-fft                            364.7972+-1.1347    !   371.4988+-2.9430       ! definitely 1.0184x slower
   audio-oscillator                     313.8251+-0.8023    !   388.3917+-2.8457       ! definitely 1.2376x slower
   imaging-darkroom                     411.9896+-0.7587    ?   420.0720+-8.4754       ? might be 1.0196x slower
   imaging-desaturate                   218.3297+-0.3545        217.8529+-0.5346       
   imaging-gaussian-blur                587.5906+-2.2849    ?   590.5179+-1.0733       ?
   json-parse-financial                  49.8097+-0.4280    ^    48.8371+-0.2607       ^ definitely 1.0199x faster
   json-stringify-tinderbox              68.3106+-0.4785    ?    68.9709+-0.6994       ?
   stanford-crypto-aes                  144.6621+-1.6741        143.5692+-1.5384       
   stanford-crypto-ccm                  111.8888+-0.6187    ?   112.4789+-1.2441       ?
   stanford-crypto-pbkdf2               388.5531+-2.2593    !   396.6219+-2.2329       ! definitely 1.0208x slower
   stanford-crypto-sha256-iterative     147.8469+-0.6880    ?   149.7923+-2.0456       ? might be 1.0132x slower

   <arithmetic>                         310.2399+-0.5340    !   315.8423+-1.0812       ! definitely 1.0181x slower
   <geometric>                          242.6350+-0.3027    !   246.7610+-0.7284       ! definitely 1.0170x slower
   <harmonic>                           173.7287+-0.4609    !   174.8320+-0.4895       ! definitely 1.0064x slower

                                            TipOfTree              GetByOffset                                   
All benchmarks:
   <arithmetic>                         115.4688+-0.1885    !   116.2311+-0.3077       ! definitely 1.0066x slower
   <geometric>                           26.2069+-0.0884         26.1173+-0.0668       
   <harmonic>                             7.5860+-0.0528          7.5692+-0.0494
Comment 6 Filip Pizlo 2011-09-18 14:18:55 PDT
It looks like the reason for the slow-down in Kraken is just because of a lack of recompilation.  On the first ~5000 executions of this function, the get_by_id self cache is successful.  It then falls apart, and the get_by_id's go polymorphic.  But by then the DFG has already done its compilation.

Recommendation: since the speed-up on V8 is bigger than the slow-down on Kraken, I suggest landing this, but adding this to the list of reasons why we need recompilation.
Comment 7 Filip Pizlo 2011-09-18 14:28:49 PDT
Created attachment 107796 [details]
the patch
Comment 8 WebKit Review Bot 2011-09-18 14:31:22 PDT
Attachment 107796 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGJITCodeGenerator.h:66:  One space before end of line comments  [whitespace/comments] [5]
Source/JavaScriptCore/dfg/DFGNode.h:234:  Missing space inside { }.  [whitespace/braces] [5]
Source/JavaScriptCore/dfg/DFGNode.h:236:  Missing space inside { }.  [whitespace/braces] [5]
Source/JavaScriptCore/dfg/DFGNode.h:237:  Missing space inside { }.  [whitespace/braces] [5]
Source/JavaScriptCore/bytecode/PredictedType.h:155:  The parameter name "classInfo" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 5 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Filip Pizlo 2011-09-18 14:37:18 PDT
Created attachment 107797 [details]
the patch
Comment 10 Oliver Hunt 2011-09-19 14:33:55 PDT
Comment on attachment 107797 [details]
the patch

safeCast isn't actually safe -- you should use Checked<T> if you want overflow safe arithmetic, or at least make safeCast use isInBounds from CheckedArithmetic.h so that your overflow detection is correct.
Comment 11 Oliver Hunt 2011-09-19 15:28:20 PDT
Comment on attachment 107797 [details]
the patch

r+ with my suggested change to safeCast
Comment 12 Filip Pizlo 2011-09-20 02:30:16 PDT
Looks like this isn't ready yet.  It breaks bing.com maps.
Comment 13 Filip Pizlo 2011-09-20 02:38:20 PDT
(In reply to comment #12)
> Looks like this isn't ready yet.  It breaks bing.com maps.

Never mind.  It's not a regression, and the fix is here: https://bugs.webkit.org/show_bug.cgi?id=68430
Comment 14 Filip Pizlo 2011-09-20 02:41:31 PDT
Landed in r95523.
Comment 15 Adam Roben (:aroben) 2011-09-20 07:08:47 PDT
This broke the Windows build <http://build.webkit.org/builders/Windows%20Release%20%28Build%29/builds/21118/steps/compile-webkit/logs/stdio>. I'm not sure why it passed the EWS bot; maybe some changes were made before landing?

In any case, I landed http://trac.webkit.org/changeset/95540 to try to fix it.