Bug 74170 - DFG's interpretation of rare case profiles should be frequency-based not count-based
Summary: DFG's interpretation of rare case profiles should be frequency-based not count-based
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-08 23:44 PST by Filip Pizlo
Modified: 2011-12-09 16:10 PST
CC List: 0 users

See Also:


Attachments
the patch (14.56 KB, patch)
2011-12-09 13:31 PST, Filip Pizlo
no flags
the patch (14.60 KB, patch)
2011-12-09 13:48 PST, Filip Pizlo
no flags

Description Filip Pizlo 2011-12-08 23:44:28 PST
Consider the following code:

function foo(...) {
    a = b / c; // b and c are integers, but b / c returns a fractional number
    for (var i = 0; i < 100000; ++i) { ... }
}

This will get optimized while in the loop, so the rare case counter for b / c will be 1 - not enough to force b / c to be a double division.  foo() will then fail speculation the next time it is called, but even then, the rare case counter will just be at 2 - still not enough.  The threshold is currently around 100 before b / c would be viewed by the DFG as a double division - but in this case, that won't happen for a very long time, so the loop will never benefit from DFG optimization due to this one overzealous speculation.

The solution is to decide that b / c is a double division based on the frequency of slow path execution, rather than on the absolute count of how many times the slow path was taken.  This requires knowing the total number of times that an instruction executed, which in turn requires a bit more profiling in the old JIT.
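
As a rough sketch (all names here are illustrative, not actual JSC code), the intended test would look something like:

// Illustrative only: treat b / c as a double division when the slow path
// fires often relative to total executions, not often in absolute terms.
bool shouldCompileAsDoubleDivision(unsigned slowCaseCount, unsigned executionCount)
{
    const double threshold = 0.1; // hypothetical cutoff
    return executionCount
        && static_cast<double>(slowCaseCount) / executionCount >= threshold;
}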
Comment 1 Filip Pizlo 2011-12-08 23:45:21 PST
The easiest way to get a rate rather than an absolute count is to count the number of times each basic block executes.  This requires counters only at jump targets and immediately after branches.  I implemented this, and it's really cheap - I can't see any performance degradation.
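
A minimal sketch of where such counters would go, assuming a MacroAssembler-style JIT on a port that supports add32 to an absolute address (the helper itself is hypothetical):

#include "MacroAssembler.h" // assumes JSC's macro assembler

// Hypothetical helper: bump a basic block's execution counter. Emitting this
// at every jump target and immediately after every branch is sufficient,
// because straight-line code executes exactly as often as its enclosing block.
static void emitBlockCounter(JSC::MacroAssembler& jit, uint32_t* counter)
{
    jit.add32(JSC::MacroAssembler::TrustedImm32(1),
              JSC::MacroAssembler::AbsoluteAddress(counter));
}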


Benchmark report for SunSpider, V8, and Kraken on oldmac.local (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r102385)
"CountBB" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r102385)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                            TipOfTree                CountBB                                     
SunSpider:
   3d-cube                                8.8183+-0.0418    ?     8.8363+-0.0535       ?
   3d-morph                              10.1214+-0.0387    ?    10.1393+-0.0775       ?
   3d-raytrace                            9.3476+-0.1096          9.2083+-0.0700         might be 1.0151x faster
   access-binary-trees                    1.9328+-0.0373    ?     1.9617+-0.0373       ? might be 1.0150x slower
   access-fannkuch                        8.9966+-0.0166    !     9.1016+-0.0566       ! definitely 1.0117x slower
   access-nbody                           4.7551+-0.0418          4.7248+-0.0083       
   access-nsieve                          3.7196+-0.0121    ?     3.7310+-0.0119       ?
   bitops-3bit-bits-in-byte               1.5102+-0.0225    ?     1.5232+-0.0294       ?
   bitops-bits-in-byte                    5.9291+-0.0192    !     5.9748+-0.0233       ! definitely 1.0077x slower
   bitops-bitwise-and                     3.9838+-0.0191    ?     4.0024+-0.0467       ?
   bitops-nsieve-bits                     6.8696+-0.0404          6.8539+-0.0441       
   controlflow-recursive                  2.7657+-0.0195    ?     2.7738+-0.0122       ?
   crypto-aes                             8.7318+-0.0918          8.7155+-0.0921       
   crypto-md5                             2.9179+-0.0189    ?     2.9627+-0.0286       ? might be 1.0154x slower
   crypto-sha1                            2.5999+-0.0151    ?     2.6182+-0.0207       ?
   date-format-tofte                     13.3252+-0.1260         13.1250+-0.1467         might be 1.0153x faster
   date-format-xparb                     11.9514+-0.0594    ?    12.0433+-0.0878       ?
   math-cordic                            8.6429+-0.0445    ?     8.6963+-0.0691       ?
   math-partial-sums                     12.6434+-0.0336         12.6069+-0.0228       
   math-spectral-norm                     3.1378+-0.0201    ?     3.1393+-0.0184       ?
   regexp-dna                            15.8619+-0.1243         15.7403+-0.0863       
   string-base64                          4.7747+-0.0419    ?     4.8000+-0.0489       ?
   string-fasta                           8.8949+-0.0507          8.8560+-0.0287       
   string-tagcloud                       15.0438+-0.0765    ?    15.0568+-0.0822       ?
   string-unpack-code                    25.9398+-0.0298    !    26.5210+-0.2463       ! definitely 1.0224x slower
   string-validate-input                  6.7550+-0.0779    ?     6.7989+-0.0832       ?

   <arithmetic> *                         8.0758+-0.0236    ?     8.0966+-0.0343       ? might be 1.0026x slower
   <geometric>                            6.4493+-0.0230    ?     6.4674+-0.0271       ? might be 1.0028x slower
   <harmonic>                             5.0302+-0.0242    ?     5.0551+-0.0246       ? might be 1.0050x slower

                                            TipOfTree                CountBB                                     
V8:
   crypto                                92.3103+-0.3728    ?    92.5307+-0.4690       ?
   deltablue                            204.9305+-1.6018        204.4549+-1.6621       
   earley-boyer                         120.8883+-1.4748    ?   121.7168+-1.2146       ?
   raytrace                              69.9284+-0.2411         69.8041+-0.2873       
   regexp                               149.0791+-0.7718        148.3416+-0.5401       
   richards                             171.1122+-1.6146    ^   167.8128+-1.0429       ^ definitely 1.0197x faster
   splay                                107.8526+-1.1289    ?   108.0241+-1.2968       ?

   <arithmetic>                         130.8716+-0.5034        130.3836+-0.3249         might be 1.0037x faster
   <geometric> *                        123.5641+-0.4652        123.2515+-0.2999         might be 1.0025x faster
   <harmonic>                           116.4428+-0.4195        116.2714+-0.2818         might be 1.0015x faster

                                            TipOfTree                CountBB                                     
Kraken:
   ai-astar                             900.4571+-1.3743    ?   900.4581+-0.8574       ?
   audio-beat-detection                 247.8751+-0.5911    ?   248.3219+-0.8955       ?
   audio-dft                            320.1299+-3.1069        319.5416+-3.4951       
   audio-fft                            160.9397+-0.4120        160.5048+-0.2567       
   audio-oscillator                     343.0422+-7.1926    ?   352.6219+-6.6597       ? might be 1.0279x slower
   imaging-darkroom                     403.1950+-5.5486    ?   404.6109+-5.9675       ?
   imaging-desaturate                   288.5352+-0.3384    ?   288.7382+-0.3635       ?
   imaging-gaussian-blur                740.5361+-2.3800        738.8561+-0.5093       
   json-parse-financial                  86.2767+-0.2412    !    87.2189+-0.2078       ! definitely 1.0109x slower
   json-stringify-tinderbox              99.1519+-0.2664         99.1123+-0.2535       
   stanford-crypto-aes                  142.6808+-0.6472    ?   145.4456+-3.9557       ? might be 1.0194x slower
   stanford-crypto-ccm                  138.1201+-1.0647        137.1065+-1.0269       
   stanford-crypto-pbkdf2               281.7826+-2.0743    ?   283.1335+-2.1555       ?
   stanford-crypto-sha256-iterative     114.8826+-0.2872    ?   115.0961+-0.5195       ?

   <arithmetic> *                       304.8289+-0.5537    ?   305.7690+-0.7952       ? might be 1.0031x slower
   <geometric>                          237.8354+-0.5344    ?   238.7712+-0.7594       ? might be 1.0039x slower
   <harmonic>                           192.1020+-0.4203    ?   192.8865+-0.6767       ? might be 1.0041x slower

                                            TipOfTree                CountBB                                     
All benchmarks:
   <arithmetic>                         114.7591+-0.2254    ?   114.9779+-0.2364       ? might be 1.0019x slower
   <geometric>                           29.3225+-0.0838    ?    29.3910+-0.0855       ? might be 1.0023x slower
   <harmonic>                             8.8650+-0.0420    ?     8.9081+-0.0426       ? might be 1.0049x slower

                                            TipOfTree                CountBB                                     
Geomean of preferred means:
   <scaled-result>                       67.2527+-0.1669    ?    67.3225+-0.1479       ? might be 1.0010x slower
Comment 2 Filip Pizlo 2011-12-09 02:11:11 PST
Well, this is interesting.  What I was trying to say is that we want a test like:

rateOfFailure / rateOfExecution >= threshold

to determine if we should assume the worst for this instruction.  But that's not quite right.  Consider the following code:

for (var i = 0; i < 10000; ++i)
    doSomethingThatTakesSlowPathOneInEveryLoopIterations()

In this case, rateOfFailure = rateOfExecution/10.  So it seems like we can assume that it rarely fails and then compile the code as such.

But this would be wrong!  Our goal is not to maximize the probability of a single speculation check succeeding; it is instead to maximize the probability that the function completes without any speculation failures.  Thus if we have a 1/10 chance that any iteration of the loop fails, and the loop executes 10,000 times, then the probability that we will fail at some point during the execution of this loop is essentially 1.  Formally, it is something like:

1 - (1 - rateOfFailure / rateOfExecution) ^ 10000

or more broadly:

1 - (1 - rateOfFailure / rateOfExecution) ^ (rateOfExecution / numberOfFunctionInvocations)
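
To make that concrete, here is a hedged arithmetic check of the loop example above (plain C++, nothing JSC-specific):

#include <cmath>
#include <cstdio>

int main()
{
    double perCheckFailure = 0.1; // slow path taken on 1 in 10 iterations
    double iterations = 10000;    // executions per function invocation
    // Probability that at least one iteration fails speculation:
    double anyFailure = 1 - std::pow(1 - perCheckFailure, iterations);
    std::printf("%.15f\n", anyFailure); // indistinguishable from 1.0
    return 0;
}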

We have two choices: we can either perform the formal computation, or we can be clever.  Suppose that we do not count the total number of executions of an operation (rateOfExecution), but we do count the number of times the function executes, and we test:

rateOfFailure / numberOfFunctionExecutions >= threshold

then oddly, we get something much closer to the more general formulation, in terms of its behavior when you plot it for varying inputs.  This is actually great news, because it means that we don't even have to track the execution rate of basic blocks - all we need is the total number of times that a code block executed.
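
A sketch of the two predicates side by side (names mirror the formulas above; nothing here is actual JSC code):

#include <cmath>

// Formal computation: probability that some check fails during one invocation.
double failureProbabilityPerInvocation(double slowCount, double execCount,
                                       double invocations)
{
    double perCheck = slowCount / execCount;  // rateOfFailure / rateOfExecution
    double checksPerInvocation = execCount / invocations;
    return 1 - std::pow(1 - perCheck, checksPerInvocation);
}

// Clever approximation: needs no per-operation execution counts, only the
// number of times the code block was invoked.
bool likelyToFail(double slowCount, double invocations, double threshold)
{
    return slowCount / invocations >= threshold;
}

The approximation tracks the formal computation because slowCount / invocations equals perCheck * checksPerInvocation, which is exactly the first-order expansion of 1 - (1 - perCheck) ^ checksPerInvocation for small failure rates.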
Comment 3 Filip Pizlo 2011-12-09 02:21:48 PST
> for (var i = 0; i < 10000; ++i)
>     doSomethingThatTakesSlowPathOneInEveryLoopIterations()

This was meant to be doSomethingThatTakesSlowPathOneInEveryTenLoopIterations().  As in s/OneInEveryLoop/OneInEveryTenLoop/.
Comment 4 Filip Pizlo 2011-12-09 13:24:56 PST
It turns out that this is necessary, but insufficient to precisely catch operations that often take the slow path.  The early phase of a program's execution is often not representative of how the program will behave in steady state.  Hence if we make an operation safe - by switching it to use a double addition instead of an integer addition, for example - just because 1/2 of the initial executions took the slow path, we run the risk that in steady state that fraction will actually be <1/10.

So we need a rule like:

Make the operation safe only when:
(it has taken the slow path some minimum number of times (probably close to 100),
 and it has taken the slow path on more than some fraction of total executions (probably 1/10)),
or the speculation failures that triggered recompilation were caused by this operation not being safe.
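
Sketched in code, the first two conditions might look like this (likelyToTakeSlowCaseMinimumCount is the name that appears in the patch; the threshold name and the exact comparison are assumptions):

// Assumed shape of the check; only the minimum count of 100 is confirmed by
// the patch review below, the rest is illustrative.
bool couldTakeSlowCase(unsigned slowCaseCount, unsigned codeBlockExecutions)
{
    const unsigned minimumCount = 100; // likelyToTakeSlowCaseMinimumCount
    const double threshold = 0.1;      // "more than some fraction" (~1/10)
    return slowCaseCount >= minimumCount
        && slowCaseCount >= threshold * codeBlockExecutions;
}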

In this bug, I'll only solve the first two parts, and fix the last part in another bug.
Comment 5 Filip Pizlo 2011-12-09 13:31:52 PST
Created attachment 118623 [details]
the patch
Comment 6 Filip Pizlo 2011-12-09 13:48:32 PST
Created attachment 118627 [details]
the patch

Fixed some #if's.
Comment 7 Geoffrey Garen 2011-12-09 13:49:51 PST
Comment on attachment 118627 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118627&action=review

r=me

> Source/JavaScriptCore/runtime/Heuristics.cpp:148
> +    SET(likelyToTakeSlowCaseMinimumCount, 100);

It seems like this 100 limit is still problematic in the first example you gave.
Comment 8 Filip Pizlo 2011-12-09 13:54:29 PST
(In reply to comment #7)
> (From update of attachment 118627 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=118627&action=review
> 
> r=me
> 
> > Source/JavaScriptCore/runtime/Heuristics.cpp:148
> > +    SET(likelyToTakeSlowCaseMinimumCount, 100);
> 
> It seems like this 100 limit is still problematic in the first example you gave.

Yup.  But making it lower will cause us to become overly pessimistic for code that takes many slow cases in the early part of its execution.  That's why we also need a story for recognizing that recompilation was triggered because of some slow case in the DFG.  I'll do that in a follow-on patch, since this one is big enough already.
Comment 9 Filip Pizlo 2011-12-09 16:10:15 PST
Landed in http://trac.webkit.org/changeset/102489