Bug 177925 - Evaluate the benefit of skipping dead code in the DFGByteCodeParser when a function returns in its first block
Summary: Evaluate the benefit of skipping dead code in the DFGByteCodeParser when a fu...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Robin Morisset
URL:
Keywords: InRadar
Depends on: 177922
Blocks:
  Show dependency treegraph
 
Reported: 2017-10-05 02:38 PDT by Robin Morisset
Modified: 2017-10-09 18:31 PDT (History)
8 users (show)

See Also:


Attachments
benchmarks results (103.90 KB, text/plain)
2017-10-09 10:02 PDT, Robin Morisset
no flags Details
Patch (6.14 KB, patch)
2017-10-09 17:26 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robin Morisset 2017-10-05 02:38:22 PDT
There is code in the treatment of op_ret in DFGByteCodeParser.cpp::parseBlock() that checks if an early return is from the first block.
In that case, it skips the rest of the function, knowing that all other blocks are dead code.
I would like to measure whether this is something that ever happens in actual code, because I do not see why it would happen, and if it does not, we could get rid of this code, and of everything that touches the shouldContinueParsing variable.
Comment 1 Filip Pizlo 2017-10-06 14:58:27 PDT
(In reply to Robin Morisset from comment #0)
> There is code in the treatment of op_ret in
> DFGByteCodeParser.cpp::parseBlock() that checks if an early return is from
> the first block.
> In that case, it skips the rest of the function, knowing that all other
> blocks are dead code.
> I would like to measure whether this is something that ever happens in
> actual code, because I do not see why it would happen, and if it does not,
> we could get rid of this code, and of everything that touches the
> shouldContinueParsing variable.

Lol.  I think I remember why.

Say you do:

function foo()
{
    return 42;
}

This will turn into this bytecode:

    op_enter
    op_ret 42
    op_ret undefined

This is because the parser views "return" as just a statement, and then emits a "op_ret undefined" at the end of foo because if foo did fall through, it would have returned undefined.

The original DFG only supported single-basic-block functions. I think that was before I even started working on WebKit. So, probably, that check was there to even make it possible for the DFG to compile single basic block functions as single basic block functions.

Also, if there is anything in the DFG that optimizes for single basic block, those optimizations would fail if you disabled this.

Anyway, enough history. :-)

I think it's a good idea to test removing these silly "optimizations".  If it is performance-neutral, then it's good for the sanity of the code to just remove it.
Comment 2 Robin Morisset 2017-10-09 10:02:01 PDT
Created attachment 323182 [details]
benchmarks results

I ran benchmarks with --inner 5 --outer 5 (I was not sure of what parameters to use), comparing top of tree ("Baseline"), with my patch from https://bugs.webkit.org/show_bug.cgi?id=177922 ("Refactor") and with a combination of that patch + removing the optimisation that stops parsing when there is a return in the first block of the function ("Return").

I am not sure exactly how to read the results: even after having disabled spotlight, wifi, inSync and every other application, it was still quite noisy. Overall, there does not seem to be a large difference on average, but a few micro-benchmarks have significant performance changes.
Comment 3 Robin Morisset 2017-10-09 17:03:55 PDT
I ran some basic benchmarks in which I removed this "optimisation":

Benchmark report for Octane on MacBook-Pro-4 (MacBookPro14,3).

VMs tested:
"Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc
"Return" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/ReturnSimpl/Release/jsc

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                            Baseline                   Return                                      

encrypt                 0.12978+-0.00274    ?     0.12990+-0.00180       ?
decrypt                 2.27387+-0.02004          2.26779+-0.01558       
deltablue      x2       0.12433+-0.00526          0.12070+-0.00666         might be 1.0300x faster
earley                  0.23988+-0.00299          0.23789+-0.00510       
boyer                   4.14790+-0.04999    ?     4.17062+-0.03407       ?
navier-stokes  x2       4.75290+-0.03824    ?     4.78339+-0.08019       ?
raytrace       x2       0.65215+-0.00778          0.65068+-0.00691       
richards       x2       0.07699+-0.00072          0.07680+-0.00075       
splay          x2       0.30160+-0.00452          0.29970+-0.00216       
regexp         x2      14.76220+-0.28300    ?    14.89334+-0.18142       ?
pdfjs          x2      37.43778+-0.94895    ?    37.52657+-0.59151       ?
mandreel       x2      36.69220+-0.56194    ?    37.27994+-1.44801       ? might be 1.0160x slower
gbemu          x2      27.81859+-0.16985         27.81324+-0.31214       
closure                 0.43844+-0.00260    ?     0.43908+-0.00490       ?
jquery                  5.70626+-0.06706          5.65665+-0.04046       
box2d          x2       7.13657+-0.06160    ?     7.18821+-0.06391       ?
zlib           x2     296.23094+-9.83051    ?   296.76506+-2.64597       ?
typescript     x2     595.48682+-5.07882        588.20536+-6.52180         might be 1.0124x faster

<geometric>             4.45677+-0.01361          4.45171+-0.02024         might be 1.0011x faster

And
Benchmark report for Octane on MacBook-Pro-4 (MacBookPro14,3).

VMs tested:
"Baseline" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/Baseline/Release/jsc
"Return" at /Users/rmorisset/Webkit2/OpenSource/WebKitBuild/ReturnSimpl/Release/jsc

Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                            Baseline                   Return                                      

encrypt                 0.12978+-0.00274    ?     0.12990+-0.00180       ?
decrypt                 2.27387+-0.02004          2.26779+-0.01558       
deltablue      x2       0.12433+-0.00526          0.12070+-0.00666         might be 1.0300x faster
earley                  0.23988+-0.00299          0.23789+-0.00510       
boyer                   4.14790+-0.04999    ?     4.17062+-0.03407       ?
navier-stokes  x2       4.75290+-0.03824    ?     4.78339+-0.08019       ?
raytrace       x2       0.65215+-0.00778          0.65068+-0.00691       
richards       x2       0.07699+-0.00072          0.07680+-0.00075       
splay          x2       0.30160+-0.00452          0.29970+-0.00216       
regexp         x2      14.76220+-0.28300    ?    14.89334+-0.18142       ?
pdfjs          x2      37.43778+-0.94895    ?    37.52657+-0.59151       ?
mandreel       x2      36.69220+-0.56194    ?    37.27994+-1.44801       ? might be 1.0160x slower
gbemu          x2      27.81859+-0.16985         27.81324+-0.31214       
closure                 0.43844+-0.00260    ?     0.43908+-0.00490       ?
jquery                  5.70626+-0.06706          5.65665+-0.04046       
box2d          x2       7.13657+-0.06160    ?     7.18821+-0.06391       ?
zlib           x2     296.23094+-9.83051    ?   296.76506+-2.64597       ?
typescript     x2     595.48682+-5.07882        588.20536+-6.52180         might be 1.0124x faster

<geometric>             4.45677+-0.01361          4.45171+-0.02024         might be 1.0011x faster


I think it shows pretty clearly that this "optimisation" does not bring anything in practice. So I will upload a patch that removes it (as soon as https://bugs.webkit.org/show_bug.cgi?id=177922 lands).
Comment 4 Robin Morisset 2017-10-09 17:26:47 PDT
Created attachment 323253 [details]
Patch
Comment 5 Robin Morisset 2017-10-09 17:29:32 PDT
Just realized I had some copy-paste failure in my comment on benchmarks results above.
One of the two benchmarks run should have shown the results of sunspider instead of octane. It also had no statistically significant regression.
Comment 6 Saam Barati 2017-10-09 17:50:16 PDT
Comment on attachment 323253 [details]
Patch

r=me
Comment 7 WebKit Commit Bot 2017-10-09 18:30:06 PDT
Comment on attachment 323253 [details]
Patch

Clearing flags on attachment: 323253

Committed r223112: <http://trac.webkit.org/changeset/223112>
Comment 8 WebKit Commit Bot 2017-10-09 18:30:07 PDT
All reviewed patches have been landed.  Closing bug.
Comment 9 Radar WebKit Bug Importer 2017-10-09 18:31:03 PDT
<rdar://problem/34900752>