3/2/15, 6:56 PM Phil Pizlo: It used to be the case that if you had a constructor that wasn't inlined, we'd be able to infer that the constructor's callee was a constant if it always was the same. This was an important optimization because it allowed us to strength-reduce CreateThis into NewObject for constructors that couldn't be inlined for whatever reason. r180595 broke this, because it removed the use of op_get_callee. op_get_callee was primarily there to profile whether the callee happened to always be the same. Now, because we get the callee from "this", we have no such profiling. Worse, that change left op_get_callee in the codebase despite it being unused. The right solution is probably to have create_this profile whether it always sees the same callee or not. <rdar://problem/20016842>
Created attachment 252691 [details] Prototype
Created attachment 252768 [details] Benchmark results (summary) Here's the output from two runs of run-jsc-benchmarks. Signinificantly, I'm seeing Octane to be 1.5 - 2.0% faster: Octane: encrypt 0.19000+-0.00046 ? 0.19081+-0.00221 ? decrypt 3.17850+-0.03612 ? 3.22190+-0.18735 ? might be 1.0137x slower deltablue x2 0.15496+-0.00104 0.15385+-0.00452 earley 0.39098+-0.00314 ^ 0.36285+-0.00449 ^ definitely 1.0775x faster boyer 5.33157+-0.17280 ^ 4.00897+-0.01914 ^ definitely 1.3299x faster navier-stokes x2 4.79455+-0.06503 4.78718+-0.05907 raytrace x2 0.96876+-0.06991 ? 0.97763+-0.06643 ? richards x2 0.08811+-0.00207 ? 0.08890+-0.00181 ? splay x2 0.33256+-0.00298 ? 0.33877+-0.01250 ? might be 1.0187x slower regexp x2 26.54050+-0.47729 26.15802+-1.54154 might be 1.0146x faster pdfjs x2 37.21557+-0.79409 36.78595+-1.51987 might be 1.0117x faster mandreel x2 42.97181+-0.99295 ? 43.05130+-1.28202 ? gbemu x2 32.85735+-3.09244 31.79627+-0.51182 might be 1.0334x faster closure 0.47578+-0.00959 0.47554+-0.00437 jquery 5.86110+-0.10876 ? 5.89380+-0.09717 ? box2d x2 10.24121+-0.23015 10.11899+-0.35676 might be 1.0121x faster zlib x2 361.80455+-3.84013 338.13940+-36.79902 might be 1.0700x faster typescript x2 634.92981+-20.34847 632.39490+-14.44438 <geometric> * 5.50521+-0.04661 5.40241+-0.07164 might be 1.0190x faster Octane: encrypt 0.19202+-0.00500 0.18906+-0.00097 might be 1.0157x faster decrypt 3.18267+-0.02299 ? 3.23126+-0.04715 ? might be 1.0153x slower deltablue x2 0.15525+-0.00097 ^ 0.15088+-0.00121 ^ definitely 1.0290x faster earley 0.39824+-0.00512 ^ 0.36348+-0.00596 ^ definitely 1.0956x faster boyer 5.30302+-0.08364 ^ 3.97578+-0.03324 ^ definitely 1.3338x faster navier-stokes x2 4.86913+-0.26404 4.81151+-0.09188 might be 1.0120x faster raytrace x2 0.99406+-0.06960 0.98159+-0.05859 might be 1.0127x faster richards x2 0.08730+-0.00150 0.08704+-0.00248 splay x2 0.33225+-0.00364 ? 0.33411+-0.00372 ? regexp x2 26.20238+-1.44534 ? 26.47932+-0.62304 ? might be 1.0106x slower pdfjs x2 37.22585+-0.36108 36.85741+-1.11554 mandreel x2 43.15461+-1.84598 ? 43.64833+-1.23361 ? might be 1.0114x slower gbemu x2 31.74233+-0.42850 31.55462+-0.28688 closure 0.47360+-0.00342 ? 0.47582+-0.00683 ? jquery 5.85984+-0.09226 ? 5.88440+-0.06700 ? box2d x2 9.95725+-0.09875 ? 10.00289+-0.23351 ? zlib x2 357.65324+-12.99726 349.95835+-27.65437 might be 1.0220x faster typescript x2 645.34644+-18.24724 643.10883+-7.71424 <geometric> * 5.49677+-0.01248 ^ 5.40585+-0.04412 ^ definitely 1.0168x faster
Created attachment 252770 [details] WIP2
Created attachment 252772 [details] Implements the optimization
Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review > Source/JavaScriptCore/jit/JITOpcodes32_64.cpp:941 > - RegisterID rareDataReg = regT0; > + RegisterID rareDataReg = regT4; This change is needed since rareDataReg is trashed in line 949 before we use calleeReg in line 957.
Created attachment 252786 [details] New benchmark results The previous patch had a bug that ended up in more slow path code to run. With the latest patch up for review, I'm seeing as much as 30% improvement on Octane/boyer and 2.5% on the total score: Octane: encrypt 0.18998+-0.00250 0.18837+-0.00048 decrypt 3.16326+-0.02920 ? 3.26929+-0.13545 ? might be 1.0335x slower deltablue x2 0.15635+-0.00154 ^ 0.15126+-0.00193 ^ definitely 1.0336x faster earley 0.39484+-0.00553 ^ 0.36266+-0.00361 ^ definitely 1.0887x faster boyer 5.25573+-0.04004 ^ 3.99607+-0.03167 ^ definitely 1.3152x faster navier-stokes x2 4.77331+-0.04282 4.75211+-0.03096 raytrace x2 0.96809+-0.06535 ? 0.99300+-0.06863 ? might be 1.0257x slower richards x2 0.08766+-0.00282 0.08751+-0.00237 splay x2 0.33253+-0.00374 0.33101+-0.00229 regexp x2 26.89484+-0.58181 26.07440+-0.25150 might be 1.0315x faster pdfjs x2 36.76267+-0.79859 ? 37.64845+-0.47423 ? might be 1.0241x slower mandreel x2 42.92566+-1.10883 42.30633+-0.49463 might be 1.0146x faster gbemu x2 35.97368+-13.58311 31.57536+-0.10342 might be 1.1393x faster closure 0.46977+-0.00467 ? 0.47123+-0.00431 ? jquery 5.89873+-0.13058 5.85201+-0.05856 box2d x2 9.95036+-0.09179 ? 10.01234+-0.17866 ? zlib x2 364.39117+-15.92980 346.96222+-32.88104 might be 1.0502x faster typescript x2 630.20581+-15.89247 ? 630.27069+-12.07680 ? <geometric> * 5.51835+-0.12695 5.38519+-0.02228 might be 1.0247x faster I wonder if r180595 caused the same magnitude of regression on Octane, or if there's something we're doing better this time.
Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review r=me > Source/JavaScriptCore/ChangeLog:22 > + (JSC::computeUsesForBytecodeOffset): op_create_this uses 2nd (constrcutor) and 4th (callee cache) Spelling "constructor".
Committed r184123: <http://trac.webkit.org/changeset/184123>
(In reply to comment #8) > Committed r184123: <http://trac.webkit.org/changeset/184123> It made zillion tests fail on the x86 and x86_64 Linux bots.
(In reply to comment #9) > (In reply to comment #8) > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > It made zillion tests fail on the x86 and x86_64 Linux bots. Link?
Comment on attachment 252772 [details] Implements the optimization View in context: https://bugs.webkit.org/attachment.cgi?id=252772&action=review > Source/JavaScriptCore/bytecode/BytecodeUseDef.h:189 > + case op_create_this: { > + functor(codeBlock, instruction, opcodeID, instruction[2].u.operand); > + functor(codeBlock, instruction, opcodeID, instruction[4].u.operand); > + return; > + } I don't understand this change. Isn't operand 4 the cached callee? In that case, it's not an operand use since it's not a virtual register operand. I believe that this only uses [2]. Can you roll a patch to fix this? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2676 > + RELEASE_ASSERT(cachedFunction); // LLint and BaselineJIT always set it to a JSFunction* or seenMultipleCalleeObjects(). This assert is wrong. They don't always set it. We could be compiling code that had never run in LLInt or Baseline. That's rare, but it does happen. Can you roll a patch to remove this assert? > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:2677 > + if (cachedFunction != JSCell::seenMultipleCalleeObjects()) { Can you also add a check that we didn't see the BadCell exit kind? This will ensure that we don't perform the optimization if we ever exited here due to the CheckCell failing on a previous compile. That's a bit stronger than relying on profiling. It's a bit redundant here, but we like to have the DFG use both forms of profiling for completeness and to reduce the chances of recompiling if the profiling gets goofed up. This would mean doing a check like: if (cachedFunction != JSCell::seenMultipleCalleeObjects() && !m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadCell)) {
> Link? rdar://problem/20911096
Re-opened since this is blocked by bug 144899
(In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #8) > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > Link? build.webkit.org
(In reply to comment #14) > (In reply to comment #10) > > (In reply to comment #9) > > > (In reply to comment #8) > > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > > > Link? > > build.webkit.org Don't be ridiculous. I meant, a link to the failing results. At the time I asked, the GTK+ console showed no regressions due to this changeset.
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #10) > > > (In reply to comment #9) > > > > (In reply to comment #8) > > > > > Committed r184123: <http://trac.webkit.org/changeset/184123> > > > > > > > > It made zillion tests fail on the x86 and x86_64 Linux bots. > > > > > > Link? > > > > build.webkit.org > > Don't be ridiculous. I meant, a link to the failing results. At the time I > asked, the GTK+ console showed no regressions due to this changeset. It's hard to believe that you can't find the detailed results on build.webkit.org's waterfall. It's super easy to handle it. Buildbots console view isn't useful at all, because GTK and EFL bots are always red. The console can show you if the bot turns from green to red, but not the increasing number of failing tests. But here you are, these are the first failing builds: (previously there were no failing JSC tests, only layout tests) - https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2/builds/21822 - https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/52623 - https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug%20%28Tests%29/builds/4539 (Otherwise I think the patch author and reviewer are responsibles to watch the bots after landing, not me.)
I hope that the Radar link is helpful enough. We hit the problem a lot more reliably on certain internal bots.
Created attachment 253089 [details] Fixed the crash and addressed Phil's comments
Committed r184328: <http://trac.webkit.org/changeset/184328>