144787 – REGRESSION(r180595): same-callee profiling no longer works

RESOLVED FIXED144787

REGRESSION(r180595): same-callee profiling no longer works

https://bugs.webkit.org/show_bug.cgi?id=144787

Summary REGRESSION(r180595): same-callee profiling no longer works

Ryosuke Niwa

Reported 2015-05-07 23:46:07 PDT

3/2/15, 6:56 PM Phil Pizlo: It used to be the case that if you had a constructor that wasn't inlined, we'd be able to infer that the constructor's callee was a constant if it always was the same. This was an important optimization because it allowed us to strength-reduce CreateThis into NewObject for constructors that couldn't be inlined for whatever reason. r180595 broke this, because it removed the use of op_get_callee. op_get_callee was primarily there to profile whether the callee happened to always be the same. Now, because we get the callee from "this", we have no such profiling. Worse, that change left op_get_callee in the codebase despite it being unused. The right solution is probably to have create_this profile whether it always sees the same callee or not. <rdar://problem/20016842>

Attachments
Prototype (12.00 KB, patch) 2015-05-07 23:46 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Benchmark results (summary) (102.30 KB, text/plain) 2015-05-08 21:31 PDT, Ryosuke Niwa	no flags	Details
WIP2 (13.93 KB, patch) 2015-05-09 00:20 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Implements the optimization (17.91 KB, patch) 2015-05-09 02:05 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
New benchmark results (51.12 KB, text/plain) 2015-05-09 15:53 PDT, Ryosuke Niwa	no flags	Details
Fixed the crash and addressed Phil's comments (16.97 KB, patch) 2015-05-13 20:38 PDT, Ryosuke Niwa	fpizlo: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2015-05-07 23:46:52 PDT

Created attachment 252691 [details] Prototype

Ryosuke Niwa

Comment 2 2015-05-08 21:31:07 PDT

Created attachment 252768 [details] Benchmark results (summary) Here's the output from two runs of run-jsc-benchmarks. Signinificantly, I'm seeing Octane to be 1.5 - 2.0% faster: Octane: encrypt 0.19000+-0.00046 ? 0.19081+-0.00221 ? decrypt 3.17850+-0.03612 ? 3.22190+-0.18735 ? might be 1.0137x slower deltablue x2 0.15496+-0.00104 0.15385+-0.00452 earley 0.39098+-0.00314 ^ 0.36285+-0.00449 ^ definitely 1.0775x faster boyer 5.33157+-0.17280 ^ 4.00897+-0.01914 ^ definitely 1.3299x faster navier-stokes x2 4.79455+-0.06503 4.78718+-0.05907 raytrace x2 0.96876+-0.06991 ? 0.97763+-0.06643 ? richards x2 0.08811+-0.00207 ? 0.08890+-0.00181 ? splay x2 0.33256+-0.00298 ? 0.33877+-0.01250 ? might be 1.0187x slower regexp x2 26.54050+-0.47729 26.15802+-1.54154 might be 1.0146x faster pdfjs x2 37.21557+-0.79409 36.78595+-1.51987 might be 1.0117x faster mandreel x2 42.97181+-0.99295 ? 43.05130+-1.28202 ? gbemu x2 32.85735+-3.09244 31.79627+-0.51182 might be 1.0334x faster closure 0.47578+-0.00959 0.47554+-0.00437 jquery 5.86110+-0.10876 ? 5.89380+-0.09717 ? box2d x2 10.24121+-0.23015 10.11899+-0.35676 might be 1.0121x faster zlib x2 361.80455+-3.84013 338.13940+-36.79902 might be 1.0700x faster typescript x2 634.92981+-20.34847 632.39490+-14.44438 <geometric> * 5.50521+-0.04661 5.40241+-0.07164 might be 1.0190x faster Octane: encrypt 0.19202+-0.00500 0.18906+-0.00097 might be 1.0157x faster decrypt 3.18267+-0.02299 ? 3.23126+-0.04715 ? might be 1.0153x slower deltablue x2 0.15525+-0.00097 ^ 0.15088+-0.00121 ^ definitely 1.0290x faster earley 0.39824+-0.00512 ^ 0.36348+-0.00596 ^ definitely 1.0956x faster boyer 5.30302+-0.08364 ^ 3.97578+-0.03324 ^ definitely 1.3338x faster navier-stokes x2 4.86913+-0.26404 4.81151+-0.09188 might be 1.0120x faster raytrace x2 0.99406+-0.06960 0.98159+-0.05859 might be 1.0127x faster richards x2 0.08730+-0.00150 0.08704+-0.00248 splay x2 0.33225+-0.00364 ? 0.33411+-0.00372 ? regexp x2 26.20238+-1.44534 ? 26.47932+-0.62304 ? might be 1.0106x slower pdfjs x2 37.22585+-0.36108 36.85741+-1.11554 mandreel x2 43.15461+-1.84598 ? 43.64833+-1.23361 ? might be 1.0114x slower gbemu x2 31.74233+-0.42850 31.55462+-0.28688 closure 0.47360+-0.00342 ? 0.47582+-0.00683 ? jquery 5.85984+-0.09226 ? 5.88440+-0.06700 ? box2d x2 9.95725+-0.09875 ? 10.00289+-0.23351 ? zlib x2 357.65324+-12.99726 349.95835+-27.65437 might be 1.0220x faster typescript x2 645.34644+-18.24724 643.10883+-7.71424 <geometric> * 5.49677+-0.01248 ^ 5.40585+-0.04412 ^ definitely 1.0168x faster

Ryosuke Niwa

Comment 3 2015-05-09 00:20:07 PDT

Created attachment 252770 [details] WIP2

Ryosuke Niwa

Comment 4 2015-05-09 02:05:55 PDT

Created attachment 252772 [details] Implements the optimization

Ryosuke Niwa

Comment 5 2015-05-09 02:14:06 PDT

Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review > Source/JavaScriptCore/jit/JITOpcodes32_64.cpp:941 > - RegisterID rareDataReg = regT0; > + RegisterID rareDataReg = regT4; This change is needed since rareDataReg is trashed in line 949 before we use calleeReg in line 957.

Ryosuke Niwa

Comment 6 2015-05-09 15:53:21 PDT

Created attachment 252786 [details] New benchmark results The previous patch had a bug that ended up in more slow path code to run. With the latest patch up for review, I'm seeing as much as 30% improvement on Octane/boyer and 2.5% on the total score: Octane: encrypt 0.18998+-0.00250 0.18837+-0.00048 decrypt 3.16326+-0.02920 ? 3.26929+-0.13545 ? might be 1.0335x slower deltablue x2 0.15635+-0.00154 ^ 0.15126+-0.00193 ^ definitely 1.0336x faster earley 0.39484+-0.00553 ^ 0.36266+-0.00361 ^ definitely 1.0887x faster boyer 5.25573+-0.04004 ^ 3.99607+-0.03167 ^ definitely 1.3152x faster navier-stokes x2 4.77331+-0.04282 4.75211+-0.03096 raytrace x2 0.96809+-0.06535 ? 0.99300+-0.06863 ? might be 1.0257x slower richards x2 0.08766+-0.00282 0.08751+-0.00237 splay x2 0.33253+-0.00374 0.33101+-0.00229 regexp x2 26.89484+-0.58181 26.07440+-0.25150 might be 1.0315x faster pdfjs x2 36.76267+-0.79859 ? 37.64845+-0.47423 ? might be 1.0241x slower mandreel x2 42.92566+-1.10883 42.30633+-0.49463 might be 1.0146x faster gbemu x2 35.97368+-13.58311 31.57536+-0.10342 might be 1.1393x faster closure 0.46977+-0.00467 ? 0.47123+-0.00431 ? jquery 5.89873+-0.13058 5.85201+-0.05856 box2d x2 9.95036+-0.09179 ? 10.01234+-0.17866 ? zlib x2 364.39117+-15.92980 346.96222+-32.88104 might be 1.0502x faster typescript x2 630.20581+-15.89247 ? 630.27069+-12.07680 ? <geometric> * 5.51835+-0.12695 5.38519+-0.02228 might be 1.0247x faster I wonder if r180595 caused the same magnitude of regression on Octane, or if there's something we're doing better this time.

Michael Saboff

Comment 7 2015-05-11 13:42:56 PDT

Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review r=me > Source/JavaScriptCore/ChangeLog:22 > + (JSC::computeUsesForBytecodeOffset): op_create_this uses 2nd (constrcutor) and 4th (callee cache) Spelling "constructor".

Ryosuke Niwa

Comment 8 2015-05-11 14:22:40 PDT

Committed r184123: <http://trac.webkit.org/changeset/184123>

Csaba Osztrogonác

Comment 9 2015-05-11 21:53:57 PDT

(In reply to comment #8) > Committed r184123: <http://trac.webkit.org/changeset/184123> It made zillion tests fail on the x86 and x86_64 Linux bots.

Filip Pizlo

Comment 10 2015-05-11 22:02:29 PDT

(In reply to comment #9) > (In reply to comment #8) > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > It made zillion tests fail on the x86 and x86_64 Linux bots. Link?

Filip Pizlo

Comment 11 2015-05-11 22:29:34 PDT

Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review > Source/JavaScriptCore/bytecode/BytecodeUseDef.h:189 > + case op_create_this: { > + functor(codeBlock, instruction, opcodeID, instruction[2].u.operand); > + functor(codeBlock, instruction, opcodeID, instruction[4].u.operand); > + return; > + } I don't understand this change. Isn't operand 4 the cached callee? In that case, it's not an operand use since it's not a virtual register operand. I believe that this only uses [2]. Can you roll a patch to fix this? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2676 > + RELEASE_ASSERT(cachedFunction); // LLint and BaselineJIT always set it to a JSFunction* or seenMultipleCalleeObjects(). This assert is wrong. They don't always set it. We could be compiling code that had never run in LLInt or Baseline. That's rare, but it does happen. Can you roll a patch to remove this assert? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2677 > + if (cachedFunction != JSCell::seenMultipleCalleeObjects()) { Can you also add a check that we didn't see the BadCell exit kind? This will ensure that we don't perform the optimization if we ever exited here due to the CheckCell failing on a previous compile. That's a bit stronger than relying on profiling. It's a bit redundant here, but we like to have the DFG use both forms of profiling for completeness and to reduce the chances of recompiling if the profiling gets goofed up. This would mean doing a check like: if (cachedFunction != JSCell::seenMultipleCalleeObjects() && !m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadCell)) {

Alexey Proskuryakov

Comment 12 2015-05-11 22:34:54 PDT

> Link? rdar://problem/20911096

WebKit Commit Bot

Comment 13 2015-05-11 22:46:44 PDT

Re-opened since this is blocked by bug 144899

Csaba Osztrogonác

Comment 14 2015-05-11 23:30:02 PDT

(In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #8) > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > Link? build.webkit.org

Filip Pizlo

Comment 15 2015-05-11 23:52:46 PDT

(In reply to comment #14) > (In reply to comment #10) > > (In reply to comment #9) > > > (In reply to comment #8) > > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > > > Link? > > build.webkit.org Don't be ridiculous. I meant, a link to the failing results. At the time I asked, the GTK+ console showed no regressions due to this changeset.

Csaba Osztrogonác

Comment 16 2015-05-12 01:49:58 PDT

(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #10) > > > (In reply to comment #9) > > > > (In reply to comment #8) > > > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > > > > > Link? > > > > build.webkit.org > > Don't be ridiculous. I meant, a link to the failing results. At the time I > asked, the GTK+ console showed no regressions due to this changeset. It's hard to believe that you can't find the detailed results on build.webkit.org's waterfall. It's super easy to handle it. Buildbots console view isn't useful at all, because GTK and EFL bots are always red. The console can show you if the bot turns from green to red, but not the increasing number of failing tests. But here you are, these are the first failing builds: (previously there were no failing JSC tests, only layout tests) - https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2/builds/21822 - https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/52623 - https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug%20%28Tests%29/builds/4539 (Otherwise I think the patch author and reviewer are responsibles to watch the bots after landing, not me.)

Alexey Proskuryakov

Comment 17 2015-05-12 13:23:41 PDT

I hope that the Radar link is helpful enough. We hit the problem a lot more reliably on certain internal bots.

Ryosuke Niwa

Comment 18 2015-05-13 20:38:11 PDT

Created attachment 253089 [details] Fixed the crash and addressed Phil's comments

Ryosuke Niwa

Comment 19 2015-05-13 21:20:44 PDT

Committed r184328: <http://trac.webkit.org/changeset/184328>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Ryosuke Niwa

Reported

2015-05-07 23:46 PDT

Modified

2015-05-13 21:20 PDT History

CC List

7 users Show

URL

Keywords InRadar

Depends on

144899

Blocks

Dependencies

tree graph