Bug 144787 - REGRESSION(r180595): same-callee profiling no longer works
Summary: REGRESSION(r180595): same-callee profiling no longer works
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on: 144899
Blocks:
  Show dependency treegraph
 
Reported: 2015-05-07 23:46 PDT by Ryosuke Niwa
Modified: 2015-05-13 21:20 PDT (History)
7 users (show)

See Also:


Attachments
Prototype (12.00 KB, patch)
2015-05-07 23:46 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Benchmark results (summary) (102.30 KB, text/plain)
2015-05-08 21:31 PDT, Ryosuke Niwa
no flags Details
WIP2 (13.93 KB, patch)
2015-05-09 00:20 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Implements the optimization (17.91 KB, patch)
2015-05-09 02:05 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
New benchmark results (51.12 KB, text/plain)
2015-05-09 15:53 PDT, Ryosuke Niwa
no flags Details
Fixed the crash and addressed Phil's comments (16.97 KB, patch)
2015-05-13 20:38 PDT, Ryosuke Niwa
fpizlo: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2015-05-07 23:46:07 PDT
3/2/15, 6:56 PM Phil Pizlo:
It used to be the case that if you had a constructor that wasn't inlined, we'd be able to infer that the constructor's callee was a constant if it always was the same.  This was an important optimization because it allowed us to strength-reduce CreateThis into NewObject for constructors that couldn't be inlined for whatever reason.

r180595 broke this, because it removed the use of op_get_callee.  op_get_callee was primarily there to profile whether the callee happened to always be the same.  Now, because we get the callee from "this", we have no such profiling.

Worse, that change left op_get_callee in the codebase despite it being unused.

The right solution is probably to have create_this profile whether it always sees the same callee or not.

<rdar://problem/20016842>
Comment 1 Ryosuke Niwa 2015-05-07 23:46:52 PDT
Created attachment 252691 [details]
Prototype
Comment 2 Ryosuke Niwa 2015-05-08 21:31:07 PDT
Created attachment 252768 [details]
Benchmark results (summary)

Here's the output from two runs of run-jsc-benchmarks. Signinificantly, I'm seeing Octane to be 1.5 - 2.0% faster:

Octane:
   encrypt                                           0.19000+-0.00046    ?     0.19081+-0.00221       ?
   decrypt                                           3.17850+-0.03612    ?     3.22190+-0.18735       ? might be 1.0137x slower
   deltablue                                x2       0.15496+-0.00104          0.15385+-0.00452       
   earley                                            0.39098+-0.00314    ^     0.36285+-0.00449       ^ definitely 1.0775x faster
   boyer                                             5.33157+-0.17280    ^     4.00897+-0.01914       ^ definitely 1.3299x faster
   navier-stokes                            x2       4.79455+-0.06503          4.78718+-0.05907       
   raytrace                                 x2       0.96876+-0.06991    ?     0.97763+-0.06643       ?
   richards                                 x2       0.08811+-0.00207    ?     0.08890+-0.00181       ?
   splay                                    x2       0.33256+-0.00298    ?     0.33877+-0.01250       ? might be 1.0187x slower
   regexp                                   x2      26.54050+-0.47729         26.15802+-1.54154         might be 1.0146x faster
   pdfjs                                    x2      37.21557+-0.79409         36.78595+-1.51987         might be 1.0117x faster
   mandreel                                 x2      42.97181+-0.99295    ?    43.05130+-1.28202       ?
   gbemu                                    x2      32.85735+-3.09244         31.79627+-0.51182         might be 1.0334x faster
   closure                                           0.47578+-0.00959          0.47554+-0.00437       
   jquery                                            5.86110+-0.10876    ?     5.89380+-0.09717       ?
   box2d                                    x2      10.24121+-0.23015         10.11899+-0.35676         might be 1.0121x faster
   zlib                                     x2     361.80455+-3.84013        338.13940+-36.79902        might be 1.0700x faster
   typescript                               x2     634.92981+-20.34847       632.39490+-14.44438      

   <geometric> *                                     5.50521+-0.04661          5.40241+-0.07164         might be 1.0190x faster


Octane:
   encrypt                                           0.19202+-0.00500          0.18906+-0.00097         might be 1.0157x faster
   decrypt                                           3.18267+-0.02299    ?     3.23126+-0.04715       ? might be 1.0153x slower
   deltablue                                x2       0.15525+-0.00097    ^     0.15088+-0.00121       ^ definitely 1.0290x faster
   earley                                            0.39824+-0.00512    ^     0.36348+-0.00596       ^ definitely 1.0956x faster
   boyer                                             5.30302+-0.08364    ^     3.97578+-0.03324       ^ definitely 1.3338x faster
   navier-stokes                            x2       4.86913+-0.26404          4.81151+-0.09188         might be 1.0120x faster
   raytrace                                 x2       0.99406+-0.06960          0.98159+-0.05859         might be 1.0127x faster
   richards                                 x2       0.08730+-0.00150          0.08704+-0.00248       
   splay                                    x2       0.33225+-0.00364    ?     0.33411+-0.00372       ?
   regexp                                   x2      26.20238+-1.44534    ?    26.47932+-0.62304       ? might be 1.0106x slower
   pdfjs                                    x2      37.22585+-0.36108         36.85741+-1.11554       
   mandreel                                 x2      43.15461+-1.84598    ?    43.64833+-1.23361       ? might be 1.0114x slower
   gbemu                                    x2      31.74233+-0.42850         31.55462+-0.28688       
   closure                                           0.47360+-0.00342    ?     0.47582+-0.00683       ?
   jquery                                            5.85984+-0.09226    ?     5.88440+-0.06700       ?
   box2d                                    x2       9.95725+-0.09875    ?    10.00289+-0.23351       ?
   zlib                                     x2     357.65324+-12.99726       349.95835+-27.65437        might be 1.0220x faster
   typescript                               x2     645.34644+-18.24724       643.10883+-7.71424       

   <geometric> *                                     5.49677+-0.01248    ^     5.40585+-0.04412       ^ definitely 1.0168x faster
Comment 3 Ryosuke Niwa 2015-05-09 00:20:07 PDT
Created attachment 252770 [details]
WIP2
Comment 4 Ryosuke Niwa 2015-05-09 02:05:55 PDT
Created attachment 252772 [details]
Implements the optimization
Comment 5 Ryosuke Niwa 2015-05-09 02:14:06 PDT
Comment on attachment 252772 [details]
Implements the optimization

View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review

> Source/JavaScriptCore/jit/JITOpcodes32_64.cpp:941
> -    RegisterID rareDataReg = regT0;
> +    RegisterID rareDataReg = regT4;

This change is needed since rareDataReg is trashed in line 949 before we use calleeReg in line 957.
Comment 6 Ryosuke Niwa 2015-05-09 15:53:21 PDT
Created attachment 252786 [details]
New benchmark results

The previous patch had a bug that ended up in more slow path code to run. With the latest patch up for review, I'm seeing as much as 30% improvement on Octane/boyer and 2.5% on the total score:

Octane:
   encrypt                                           0.18998+-0.00250          0.18837+-0.00048       
   decrypt                                           3.16326+-0.02920    ?     3.26929+-0.13545       ? might be 1.0335x slower
   deltablue                                x2       0.15635+-0.00154    ^     0.15126+-0.00193       ^ definitely 1.0336x faster
   earley                                            0.39484+-0.00553    ^     0.36266+-0.00361       ^ definitely 1.0887x faster
   boyer                                             5.25573+-0.04004    ^     3.99607+-0.03167       ^ definitely 1.3152x faster
   navier-stokes                            x2       4.77331+-0.04282          4.75211+-0.03096       
   raytrace                                 x2       0.96809+-0.06535    ?     0.99300+-0.06863       ? might be 1.0257x slower
   richards                                 x2       0.08766+-0.00282          0.08751+-0.00237       
   splay                                    x2       0.33253+-0.00374          0.33101+-0.00229       
   regexp                                   x2      26.89484+-0.58181         26.07440+-0.25150         might be 1.0315x faster
   pdfjs                                    x2      36.76267+-0.79859    ?    37.64845+-0.47423       ? might be 1.0241x slower
   mandreel                                 x2      42.92566+-1.10883         42.30633+-0.49463         might be 1.0146x faster
   gbemu                                    x2      35.97368+-13.58311        31.57536+-0.10342         might be 1.1393x faster
   closure                                           0.46977+-0.00467    ?     0.47123+-0.00431       ?
   jquery                                            5.89873+-0.13058          5.85201+-0.05856       
   box2d                                    x2       9.95036+-0.09179    ?    10.01234+-0.17866       ?
   zlib                                     x2     364.39117+-15.92980       346.96222+-32.88104        might be 1.0502x faster
   typescript                               x2     630.20581+-15.89247   ?   630.27069+-12.07680      ?

   <geometric> *                                     5.51835+-0.12695          5.38519+-0.02228         might be 1.0247x faster

I wonder if r180595 caused the same magnitude of regression on Octane, or if there's something we're doing better this time.
Comment 7 Michael Saboff 2015-05-11 13:42:56 PDT
Comment on attachment 252772 [details]
Implements the optimization

View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review

r=me

> Source/JavaScriptCore/ChangeLog:22
> +        (JSC::computeUsesForBytecodeOffset): op_create_this uses 2nd (constrcutor) and 4th (callee cache)

Spelling "constructor".
Comment 8 Ryosuke Niwa 2015-05-11 14:22:40 PDT
Committed r184123: <http://trac.webkit.org/changeset/184123>
Comment 9 Csaba Osztrogonác 2015-05-11 21:53:57 PDT
(In reply to comment #8)
> Committed r184123: <http://trac.webkit.org/changeset/184123>

It made zillion tests fail on the x86 and x86_64 Linux bots.
Comment 10 Filip Pizlo 2015-05-11 22:02:29 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > Committed r184123: <http://trac.webkit.org/changeset/184123>
> 
> It made zillion tests fail on the x86 and x86_64 Linux bots.

Link?
Comment 11 Filip Pizlo 2015-05-11 22:29:34 PDT
Comment on attachment 252772 [details]
Implements the optimization

View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review

> Source/JavaScriptCore/bytecode/BytecodeUseDef.h:189
> +    case op_create_this: {
> +        functor(codeBlock, instruction, opcodeID, instruction[2].u.operand);
> +        functor(codeBlock, instruction, opcodeID, instruction[4].u.operand);
> +        return;
> +    }

I don't understand this change.  Isn't operand 4 the cached callee?  In that case, it's not an operand use since it's not a virtual register operand.  I believe that this only uses [2].  Can you roll a patch to fix this?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2676
> +                RELEASE_ASSERT(cachedFunction); // LLint and BaselineJIT always set it to a JSFunction* or seenMultipleCalleeObjects().

This assert is wrong.  They don't always set it.  We could be compiling code that had never run in LLInt or Baseline.  That's rare, but it does happen.  Can you roll a patch to remove this assert?

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2677
> +                if (cachedFunction != JSCell::seenMultipleCalleeObjects()) {

Can you also add a check that we didn't see the BadCell exit kind?  This will ensure that we don't perform the optimization if we ever exited here due to the CheckCell failing on a previous compile.  That's a bit stronger than relying on profiling.  It's a bit redundant here, but we like to have the DFG use both forms of profiling for completeness and to reduce the chances of recompiling if the profiling gets goofed up.  This would mean doing a check like:

if (cachedFunction != JSCell::seenMultipleCalleeObjects() && !m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadCell)) {
Comment 12 Alexey Proskuryakov 2015-05-11 22:34:54 PDT
> Link?

rdar://problem/20911096
Comment 13 WebKit Commit Bot 2015-05-11 22:46:44 PDT
Re-opened since this is blocked by bug 144899
Comment 14 Csaba Osztrogonác 2015-05-11 23:30:02 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > Committed r184123: <http://trac.webkit.org/changeset/184123>
> > 
> > It made zillion tests fail on the x86 and x86_64 Linux bots.
> 
> Link?

build.webkit.org
Comment 15 Filip Pizlo 2015-05-11 23:52:46 PDT
(In reply to comment #14)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > (In reply to comment #8)
> > > > Committed r184123: <http://trac.webkit.org/changeset/184123>
> > > 
> > > It made zillion tests fail on the x86 and x86_64 Linux bots.
> > 
> > Link?
> 
> build.webkit.org

Don't be ridiculous.  I meant, a link to the failing results.  At the time I asked, the GTK+ console showed no regressions due to this changeset.
Comment 16 Csaba Osztrogonác 2015-05-12 01:49:58 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #10)
> > > (In reply to comment #9)
> > > > (In reply to comment #8)
> > > > > Committed r184123: <http://trac.webkit.org/changeset/184123>
> > > > 
> > > > It made zillion tests fail on the x86 and x86_64 Linux bots.
> > > 
> > > Link?
> > 
> > build.webkit.org
> 
> Don't be ridiculous.  I meant, a link to the failing results.  At the time I
> asked, the GTK+ console showed no regressions due to this changeset.

It's hard to believe that you can't find the detailed results on 
build.webkit.org's waterfall. It's super easy to handle it.

Buildbots console view isn't useful at all, because GTK and EFL bots
are always red. The console can show you if the bot turns from green
to red, but not the increasing number of failing tests.

But here you are, these are the first failing builds:
(previously there were no failing JSC tests, only layout tests)
- https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2/builds/21822
- https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/52623
- https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug%20%28Tests%29/builds/4539

(Otherwise I think the patch author and reviewer are 
responsibles to watch the bots after landing, not me.)
Comment 17 Alexey Proskuryakov 2015-05-12 13:23:41 PDT
I hope that the Radar link is helpful enough. We hit the problem a lot more reliably on certain internal bots.
Comment 18 Ryosuke Niwa 2015-05-13 20:38:11 PDT
Created attachment 253089 [details]
Fixed the crash and addressed Phil's comments
Comment 19 Ryosuke Niwa 2015-05-13 21:20:44 PDT
Committed r184328: <http://trac.webkit.org/changeset/184328>