Summary: OSR entry: delay outer-loop compilation when at inner-loop
Product: WebKit
Component: JavaScriptCore
Reporter: JF Bastien <jfbastien>
Assignee: JF Bastien <jfbastien>
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Keywords: InRadar
CC: commit-queue, fpizlo, ggaren, jfbastien, keith_miller, mark.lam, msaboff, saam, webkit-bug-importer
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Bug Depends on: 167479, 167721
Bug Blocks: 167741, 167811, 167460
Description
JF Bastien
2017-01-17 18:39:58 PST
Created attachment 299107 [details]
patch

Should address <rdar://problem/29144126>

Here are some benchmark results:

$ run-jsc-benchmarks original:original/bin/jsc delay:current-release/bin/jsc --kraken --outer 20

                                     original          delay
ai-astar                          67.957+-1.211     67.896+-1.411
audio-beat-detection              28.329+-1.439 ?   28.507+-1.279 ?
audio-dft                         59.496+-0.586 ?   59.804+-0.611 ?
audio-fft                         21.121+-0.106 ?   21.168+-0.115 ?
audio-oscillator                  45.449+-1.932     44.702+-0.187     might be 1.0167x faster
imaging-darkroom                  49.720+-0.295 ?   49.832+-0.357 ?
imaging-desaturate                43.173+-0.864     42.325+-0.775     might be 1.0200x faster
imaging-gaussian-blur             64.513+-9.577     60.548+-7.272     might be 1.0655x faster
json-parse-financial              29.491+-0.126 ?   29.521+-0.132 ?
json-stringify-tinderbox          18.723+-0.186     18.607+-0.184
stanford-crypto-aes               30.532+-0.240     30.424+-0.124
stanford-crypto-ccm               28.714+-0.854     28.599+-0.896
stanford-crypto-pbkdf2            95.847+-0.485     95.605+-0.445
stanford-crypto-sha256-iterative  30.506+-0.080 ?   30.534+-0.131 ?

<arithmetic>                      43.827+-0.718     43.434+-0.557     might be 1.0090x faster

$ run-jsc-benchmarks original:original/bin/jsc delay:current-release/bin/jsc --kraken --benchmark gaussian-blur --outer 200

                                     original          delay
imaging-gaussian-blur             63.372+-2.660     61.678+-2.618     might be 1.0275x faster
<arithmetic>                      63.372+-2.660     61.678+-2.618     might be 1.0275x faster

$ run-jsc-benchmarks original:original/bin/jsc delay:current-release/bin/jsc --benchmark marsaglia --outer 200

                                     original          delay
marsaglia-larger-ints             71.3672+-0.0630 ? 71.4541+-0.1207 ?
marsaglia-osr-entry               15.8131+-0.0794 ? 15.8814+-0.0861 ?
<geometric>                       33.5883+-0.0823 ? 33.6799+-0.0910 ? might be 1.0027x slower

$ run-jsc-benchmarks original:original/bin/jsc delay:current-release/bin/jsc --benchmark mandelbrot --outer 20

                                     original           delay
mandelbrot                        376.3381+-2.3982  374.4935+-0.6074
<geometric>                       376.3381+-2.3982  374.4935+-0.6074  might be 1.0049x faster

$ run-jsc-benchmarks original:original/bin/jsc delay:current-release/bin/jsc --benchmark nonude --outer 60

                                     original          delay
nonude                            67.6165+-0.6550 ? 67.9252+-0.7012 ?
<geometric>                       67.6165+-0.6550 ? 67.9252+-0.7012 ? might be 1.0046x slower

In the mandelbrot.js test I've got Unicode characters that display a fancy Mandelbrot image. Bugzilla doesn't like Unicode, but my terminal does (you can also print on each iteration to see a neat animation!). nonude.js has a baked-in constant array; I figure that's more reliable than randomly generated data. I could avoid line-wrap and put everything on a single line if you think that would be better. It needs to be about this size to tier up.

Comment on attachment 299107 [details]
patch
I'd like to provide feedback on this patch, but I can't because the mandelbrot microbenchmark is too big for the patch review tool.
Also, it seems to be fairly long-running. We don't want microbenchmarks that run for 100ms or more.
I don't think this is the right approach. It seems to defeat the purpose of having a target loop trigger. When you try to compile and you're not at the loop header, you just try again. The whole point of the trigger is so that we don't have to wait and hope that we enter the loop.

Created attachment 299313 [details]
patch
Update with a different approach: mark the outer loop for entry, and when we force its compilation then unmark it.
I still need to handle the cases where:
- outer-loop compilation fails (can it fail for outer but not inner? Do we care?)
- outer-loop doesn't trigger for too long (maybe we should try to trigger one loop in, and so on, until we reach the innermost loop)
I ran performance numbers but they look weird. I'll close things and re-run them.
Created attachment 299364 [details]
patch
Here's a sketch of what I think backoff could look like.
I still need to:
- Verify benchmarks, and tune this backoff thing.
- Make new tests shorter.
- Update changelog.
(In reply to comment #6)
> Created attachment 299364 [details]
> patch
>
> Here's a sketch of what I think back off could be like.
>
> I still need to:
> - Verify benchmarks, and tune this backoff thing.
> - Make new tests shorter.
> - Update changelog.

Can you please upload a smaller patch? You should remove the benchmarks from the patch and land them anyway when you commit. We don't review the benchmark contents usually.

Comment on attachment 299364 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299364&action=review

> Source/JavaScriptCore/dfg/DFGTierUpEntryTrigger.h:35
> +    DontEnter = 0, // Must stay zero: JIT-compiled code enters when non-zero.
> +    TryToCompileCode,
> +    CodeCompiledSuccessfully,

s/enters/takes the triggerOSREntryNow slow path

Maybe need to rename triggerOSREntryNow to triggerTierUpNow

Maybe: None, Compile, OSREnter

Created attachment 299381 [details]
patch
Update patch with Geoff's comments, and drop tests for now.
Created attachment 299395 [details]
patch
Add some logging, and clean up a minor issue with entry location.
Comment on attachment 299395 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299395&action=review

> Source/JavaScriptCore/bytecode/ExecutionCounter.cpp:75
> template<CountingVariant countingVariant>
> +void ExecutionCounter<countingVariant>::deferForAWhile()
> +{
> +    ASSERT(m_activeThreshold != std::numeric_limits<int32_t>::max());
> +    m_totalCount = count();
> +    m_counter = -static_cast<int32_t>(m_activeThreshold);
> +}

This function is kinda weird. It's totally different from how we usually ask the execution counter to trigger soon.

> Source/JavaScriptCore/dfg/DFGJITCode.cpp:144
> +void JITCode::dontOptimizeForAWhile(CodeBlock* codeBlock)
> +{
> +    ASSERT(codeBlock->jitType() == JITCode::DFGJIT);
> +    if (Options::verboseOSR())
> +        dataLog(*codeBlock, ": Not FTL-optimizing for a while.\n");
> +    tierUpCounter.deferForAWhile();
> +}

This seems like it's almost like optimizeAfterWarmUp or optimizeSoon. You should preferably choose one of those rather than introduce a new thing.

Created attachment 299639 [details]
patch
Substantially change the logic after an in-person talk with pizlo.
I still need to:
- Test this a bunch.
- Make the tests run faster.
The basic idea should be there though.
Comment on attachment 299639 [details]
patch
This looks pretty solid to me.
Created attachment 299676 [details]
patch
Fixed a few bugs found by tests. They're still running, I'll then benchmark and shorten the two new tests.
Tests seem happy. First run of the benchmarks seems OK: pretty much perf-neutral without regressing variance on kraken/imaging-gaussian-blur. I'm kicking off a longer benchmark run and signing out for today; I'll look at the results tomorrow and report back.

Created attachment 299713 [details]
patch
Typo fix.
Benchmarks finished, ruby is crunching on them.
Created attachment 299774 [details]
patch
This looks good to go. I rebased the changes, made the new benchmarks faster, and ran benchmarks (everything looks pretty neutral).
I'll file a separate bug to investigate mandelbrot's lack of FTL-ing. It's important to fix, but not as much as other things.
Attachment 299774 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGOperations.cpp:2431: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
ERROR: Source/JavaScriptCore/dfg/DFGOperations.cpp:2613: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
Total errors found: 2 in 18 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 299774 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299774&action=review

> JSTests/microbenchmarks/nonude.js:1
> +const makeFaster = 8;

Does this test actually exercise the crash?

> Source/JavaScriptCore/ChangeLog:29
> +        right JSValues, otherwise we're confusing the compiler and giving
> +        it inaccurate JSValues which can lead it to predict the wrong
> +        things, leading to suboptimal code or recompilation due to
> +        misprediction.

Also, crash

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2390
> +    // We're already compiling, and can't OSR enter. We therefore must have set our slow-path trigger as well as another slow path trigger, hoping one of these would kick off a compilation. Sure enough another bytecode location did kick off a compilation before this one got a chance to do so. There's nothing for us to do but wait.

Nit: make this multiline.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:5787
> +    static_assert(!static_cast<int>(TierUpEntryTrigger::None), "NonZero test below depends on this");

nit: I'd use uint8_t instead of int

Comment on attachment 299774 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299774&action=review

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2625
> +    switch (osrEntryPreparation.error()) {
> +    case FTL::OSREntryFail::StackGrowthFailed:
> +        // It's not clear what we should do. We signal to try again after a while if that happens.
> +        return nullptr;
> +    case FTL::OSREntryFail::WrongBytecode:
> +        RELEASE_ASSERT_NOT_REACHED();
> +    }

Why not just RELEASE_ASSERT(error == StackGrowthFailed)?

Also, we tend not to have "else" statements for "if"s that end in returns.

Comment on attachment 299774 [details]
patch
LGTM, but I think you should address Saam's advice.
Created attachment 299816 [details]
patch

Address Saam's comments except one, as noted below.

Fix GTK build; it was sad about a missing return because it wasn't smart enough to see there was no missing return (or spoon, for that matter).

> > JSTests/microbenchmarks/nonude.js:1
> > +const makeFaster = 8;
>
> Does this test actually exercise the crash?

It's complicated... The code is super finicky, and I can add asserts at specific spots to observe unwanted things that turn out to be innocuous. As-is the benchmark pretty much doesn't exhibit bad behavior.

I was thinking about adding a few asserts as sanity checks, but I'd do that separately.

> > Source/JavaScriptCore/ChangeLog:29
> > +        right JSValues, otherwise we're confusing the compiler and giving
> > +        it inaccurate JSValues which can lead it to predict the wrong
> > +        things, leading to suboptimal code or recompilation due to
> > +        misprediction.
>
> Also, crash

I was purposefully not using that word, but pizlo says it's fine.

> View in context:
> https://bugs.webkit.org/attachment.cgi?id=299774&action=review
>
> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2625
> > +    switch (osrEntryPreparation.error()) {
> > +    case FTL::OSREntryFail::StackGrowthFailed:
> > +        // It's not clear what we should do. We signal to try again after a while if that happens.
> > +        return nullptr;
> > +    case FTL::OSREntryFail::WrongBytecode:
> > +        RELEASE_ASSERT_NOT_REACHED();
> > +    }
>
> Why not just RELEASE_ASSERT(error == StackGrowthFailed)?
> Also, we tend not to have "else" statements for "if"s that end in returns.

I like this approach because if / when we add new reasons for OSR entry failure we *have* to edit this code (because the switch must cover all enum cases, otherwise the compiler warns), think about what it means, and document it as above (what do we do about stack growth? Dunno... But wrong bytecode definitely can't happen here!).
Removing the "else" would make this code invalid: osrEntryPreparation wouldn't be in scope anymore (it's only in scope in the if / else). I can hoist it above the "if", but then IMO the code is ugly :-)

Happy to change it if you want; I just like this style quite a bit because IMO it's less likely to break in the future.

Attachment 299816 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGOperations.cpp:2439: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
ERROR: Source/JavaScriptCore/dfg/DFGOperations.cpp:2637: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
Total errors found: 2 in 18 files
If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to comment #22)
> Created attachment 299816 [details]
> patch
>
> Address Saam's comments except one as noted below.
> [...]
> I like this approach because if / when we add new reasons for OSR entry
> failure we *have* to edit this code (because the switch must cover all enum
> cases, otherwise compiler warns), think about what it means, and document it
> as above (what do we do about stack growth? Dunno... But wrong bytecode
> definitely can't happen here!).

I agree with that approach. I also like to use switch statements this way. We do it all the time in the compiler guts.

> Removing the "else" would make this code invalid: osrEntryPreparation
> wouldn't be in scope anymore (it's only in scope in the if / else). I can
> hoist it above the "if", but then IMO the code is ugly :-)

That would be the WebKit style.

> Happy to change it if you want, I just like this style quite a bit because
> IMO it's less likely to break in the future.

We're all very used to reading code where else is only used when 'then' might fall through. It's annoying when code breaks this tradition.

Created attachment 299823 [details]
patch
Un-else the code, pizlo makes a good case that it's WebKit style.
CQ+
Comment on attachment 299823 [details]
patch

Clearing flags on attachment: 299823

Committed r211224: <http://trac.webkit.org/changeset/211224>

All reviewed patches have been landed. Closing bug.

I'll follow up on mandelbrot not FTL-ing here: https://bugs.webkit.org/show_bug.cgi?id=167460. Not now though, it's not super high priority.

Re-opened since this is blocked by bug 167479

Created attachment 300231 [details]
patch
This patch does away with tierUpEntrySeen, which seems to have been the cause of the regression on kraken/ai-astar. That benchmark has nested loops through inlining, and the innermost loop enters through triggerTierUpNowInLoop, which wasn't tracking tierUpEntrySeen, thereby preventing the trigger loop from setting any enclosing loop's trigger.
Interestingly, my local build didn't spot this as a regression because the generated code was perf-neutral but had significantly different Base/DFG/FTL/FTLOSR profile numbers. The same was true of other builds, on multiple machines (different profiles, but perf-neutral). In the end the performance regression only showed up on builds using older SDKs, which had different profiles *and* significantly worse runtime performance (~80% regression on kraken/ai-astar). This is a troubling sign of this code's brittleness (which I'll ignore for now and look into later; better to fix this bug now).
For this second fix I benchmarked approaches for the following combinations:
* Whether to drop tierUpEntrySeen for triggerTierUpNowInLoop only, or for checkTierUpAndOSREnterNow as well.
* Whether to set the triggers from outer-loop to inner-loop, or the reverse (again, trying both approaches for triggerTierUpNowInLoop and checkTierUpAndOSREnterNow).
The benchmarks tell me that dropping tierUpEntrySeen entirely and always doing outer-loop to inner-loop is a win. This patch is exactly the same as the last one, except for dropping tierUpEntrySeen.
Results (and profiles) attached separately.
Created attachment 300232 [details]
Results + profiles
Results + profiles
Comment on attachment 300231 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300231&action=review

> Source/JavaScriptCore/ChangeLog:9
> +        Re-land of r211224, but without tierUpEntrySeen which seems to
> +        have been the cause of the kraken/ai-astar regression.

tierUpEntrySeen has been around for a while. Can you provide a lot more details about why that's not needed anymore, and why it caused the regression in ai-astar?

I think that if we can't answer that question, then wrapping this fix up together with a rewrite of a large chunk of DFG->FTL OSR entry is not the right way to go. We need to find what is the minimal change to the code that fixes the bug.

(In reply to comment #32)
> tierUpEntrySeen has been around for a while. Can you provide a lot more
> details about why that's not needed anymore, and why it caused the
> regression in ai-astar?
> [...]

The following code is in triggerOSREntryNow but not in triggerTierUpNowInLoop:

    jitCode->tierUpEntrySeen.add(bytecodeIndex);

The loop-traversal code then does:

    // Traverse the loop hierarchy from the outer-most loop, to the inner-most one.
    for (auto iterator = tierUpHierarchyEntry->value.rbegin(), end = tierUpHierarchyEntry->value.rend(); iterator != end; ++iterator) {
        unsigned osrEntryCandidate = *iterator;
        if (jitCode->tierUpEntrySeen.contains(osrEntryCandidate)) {
            ....
That's the source of the bug in the patch which was previously committed and then reverted.

I agree with your suggestion: I'll create a minimally-disruptive fix, upload + commit after review, and then submit this code as a bigger refactoring in a separate patch. That'll make sure I separate the bug fix from the performance + refactoring fix.

Created attachment 300271 [details]
patch
Here's a very minimal patch as discussed. If it sticks I'll commit the bigger refactor+tuning in a follow-up.
Created attachment 300272 [details]
Benchmark results
Benchmark.
Attachment 300271 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/dfg/DFGOperations.cpp:2431: Multi line control clauses should use braces. [whitespace/braces] [4]
Total errors found: 1 in 2 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 300271 [details]
patch
I don't understand this change. Why does every conditional do nothing when triggeredSlowPath? Doesn't that mean that when you do get force triggered at the outer loop at which you could compile, then you'll end up not compiling? Shouldn't you be predicating on canOSRFromHere?
Comment on attachment 300271 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300271&action=review

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2419
> +    if (!triggeredSlowPath) {

Not sure if this should be here. If we already have an osrEntryBlock, as the code this is guarding checks, then we have to count retry.

Comment on attachment 300271 [details]
patch
OK - the only part of this that I'm uncomfortable about is not doing the existing-osrEntryBlock dance when triggeredSlowPath. What do you think?
(In reply to comment #37)
> Comment on attachment 300271 [details]
> patch
>
> I don't understand this change. Why does every conditional do nothing when
> triggeredSlowPath? Doesn't that mean that when you do get force triggered
> at the outer loop at which you could compile, then you'll end up not
> compiling? Shouldn't you be predicating on canOSRFromHere?

(Reposting here from IRC discussion.)

canOSRFromHere only happens when entering from triggerTierUpNowInLoop, which gets generated by CheckTierUpInLoop, which itself gets generated by performTierUpCheckInjection when canOSREnterAtLoopHint(level, block, nodeIndex) says "can't". That's mostly orthogonal to triggeredSlowPath.

triggeredSlowPath basically skips all the places we can return from, and goes directly to the compile step instead. That's why it does all the shortcuts (but only after testing the condition, which may have side-effects). The intent is that triggeredSlowPath does all the required bookkeeping, and takes you straight down to jitCode->reconstruct, which then triggers a compile.

Separately though, Fil pointed out that the entire section under "It's time to try to compile code for OSR entry" didn't need to be guarded by `!triggeredSlowPath`. I'll revert that part of the change, benchmark, and upload a new patch.

Created attachment 300280 [details]
patch
Updated patch. Perf numbers seem to be the same.
Comment on attachment 300280 [details]
patch

Clearing flags on attachment: 300280

Committed r211461: <http://trac.webkit.org/changeset/211461>

All reviewed patches have been landed. Closing bug.

This seems to regress *some* kraken/ai-astar builds again by ~80%. I'm looking at why, may revert again.

This is odd. In the previous patch I had perf-neutral results with different tiering behavior because of a bug. This patch has the same tiering behavior as far as I can tell. I ran the following to create a csv file, before and after my change:

    for i in `seq 1 100`; do ./current/bin/jsc ./JSTests/ChakraCore/test/benchmarks/Kraken/ai-astar.js -p a 2>&1 > /dev/null && printf "f\nq" | ./Tools/Scripts/display-profiler-output a 2>&1 | grep findGraphNode | sed 's| *[^ ]* *[0-9]* *\([^ ]*\).*|\1|' | sed 's|/|,|g' ; done

Before my change:

            Base   DFG        FTL      FTLOSR
    Average 1360   39254117   2175101  5277488
    stddev  392    11810751   233459   11788088

After my change:

            Base   DFG        FTL      FTLOSR
    Average 1279   40142662   2128971  4435185
    stddev  242    10385349   229923   10371637

There are two modes when running the code: sometimes we hit straight FTL for findGraphNode, and sometimes we hit FTLOSR. Both before and after, straight FTL occurs about 80% of the time, and FTLOSR 20%. Hitting FTLOSR is a 10%-20% perf uptick. But this is all with my local build, not with the builds on the bots that show a slowdown. I need to dig some more.

Interesting data point: using an older compiler and comparing before / after my change (again 100 profiled runs) I get the following results:

Before:

            Runtime  Base  DFG       FTL      FTLOSR
    Average 183      1424  4002246   2317610  40386834
    Stddev  5        456   452149    310826   594810

After:

            Runtime  Base  DFG       FTL       FTLOSR
    Average 268      1525  40450595  2265327   3990661
    Stddev  28       1065  10648728  471780    10704523

Somehow, using an older C++ compiler drastically changes jsc's behavior. It used to go out of the DFG pretty quickly, and each run did *both* FTL and FTLOSR.
After my change it's modal: most runs do FTL only, and a few do FTL+FTLOSR. Even the few that do FTL+FTLOSR stay in the DFG for longer, so their performance is worse than before my change (and the FTL-only ones have far worse performance). I'll try out a build with a newer compiler (instead of whatever I have locally), and compare to this.

Odd: a bot build with a newer compiler has similar results! I'm probably perturbing compilation enough (by waiting until we hit the outer loop) that we enter the FTL from the function instead of through the OSR entry. Digging some more...

OK, the data problem stems from my cmake build *not* exhibiting this issue (before / after my patch, performance and behavior are the same), while the make build does exhibit it (behavior and performance change; code becomes modal, FTL mostly and FTLOSR sometimes). I'm not sure *why* there's a difference, but now I know *where* the difference is.

Re-opened since this is blocked by bug 167721

Created attachment 300434 [details]
patch
This patch tries to only create mustHandleValues when the bytecode location allows it. Contrary to what I'd measured before (with a different approach), it's a disaster:
ai-astar 67.960+-0.933 ! 78.297+-1.001 ! definitely 1.1521x slower
audio-beat-detection 29.127+-0.044 ? 29.759+-0.616 ? might be 1.0217x slower
audio-dft 60.107+-1.480 59.898+-0.281
audio-fft 21.444+-0.360 21.253+-0.086
audio-oscillator 44.575+-0.124 ? 44.627+-0.072 ?
imaging-darkroom 49.453+-0.107 49.391+-0.095
imaging-desaturate 42.325+-0.314 ? 42.503+-0.343 ?
imaging-gaussian-blur 52.743+-0.439 ! 196.948+-48.231 ! definitely 3.7341x slower
json-parse-financial 28.857+-0.080 ? 29.336+-0.467 ? might be 1.0166x slower
json-stringify-tinderbox 18.620+-0.198 18.599+-0.195
stanford-crypto-aes 30.074+-0.094 ? 30.164+-0.367 ?
stanford-crypto-ccm 26.892+-0.281 ? 26.936+-0.524 ?
stanford-crypto-pbkdf2 92.042+-0.862 91.473+-0.273
stanford-crypto-sha256-iterative 29.186+-0.077 ? 29.214+-0.126 ?
<arithmetic> 42.386+-0.150 ! 53.457+-3.464 ! definitely 1.2612x slower
Looking at ai-astar, there's more FTLOSR exit going on:
CodeBlock #Instr Source Counts Machine Counts #Compil Inlines #Exits Last Opts
Base/DFG/FTL/FTLOSR Base/DFG/FTL/FTLOSR Src/Total Get/Put/Call
findGraphNode#EBmwVP 83 140738/8742609/37824773/15 140738/1803397/37824773/0 7 1/9 15 2/0/0
search#DTtNtq 686 253327/49301/0/15 253327/6873500/0/15 10 0/0 0 33/4/9
removeGraphNode#DqCfaQ 102 8094/155404/134382/0 8094/106372/134382/0 3 1/6 0 3/0/0
I'll explore other options.
Created attachment 300437 [details]
patch
This patch goes back to one of my original ideas of imprecise recovery. It seems to dampen the pain a bit:
original new
ai-astar 67.288+-1.586 ! 74.064+-1.935 ! definitely 1.1007x slower
audio-beat-detection 29.172+-0.201 ? 29.256+-0.248 ?
audio-dft 60.255+-1.880 ? 61.046+-1.703 ? might be 1.0131x slower
audio-fft 21.212+-0.124 ? 21.370+-0.147 ?
audio-oscillator 44.652+-0.222 ? 44.848+-0.159 ?
imaging-darkroom 49.880+-0.140 ? 50.416+-0.816 ? might be 1.0107x slower
imaging-desaturate 43.052+-1.005 43.049+-1.091
imaging-gaussian-blur 52.471+-1.869 ? 53.245+-1.872 ? might be 1.0148x slower
json-parse-financial 29.343+-0.346 29.313+-0.239
json-stringify-tinderbox 18.963+-0.167 ? 18.988+-0.232 ?
stanford-crypto-aes 30.038+-0.175 ? 30.156+-0.212 ?
stanford-crypto-ccm 26.529+-0.861 ? 27.998+-0.710 ? might be 1.0553x slower
stanford-crypto-pbkdf2 91.727+-0.497 ? 92.433+-0.530 ?
stanford-crypto-sha256-iterative 29.734+-0.982 29.671+-0.279
<arithmetic> 42.451+-0.264 ! 43.275+-0.223 ! definitely 1.0194x slower
CodeBlock #Instr Source Counts Machine Counts #Compil Inlines #Exits Last Opts
Base/DFG/FTL/FTLOSR Base/DFG/FTL/FTLOSR Src/Total Get/Put/Call
findGraphNode#EBmwVP 83 161883/7457230/39088698/19 161883/3149734/39088698/0 7 1/9 19 2/0/0
search#DTtNtq 686 264380/37702/0/19 264380/4255896/0/19 10 0/0 0 33/4/9
removeGraphNode#DqCfaQ 102 4214/142140/151396/0 4214/104520/151396/0 3 1/6 0 3/0/0
I'll science this some more.
I just realized why I couldn't figure out the exit reasons: `findGraphNode` gets inlined, so the exit attributions go to it but the exit events go to `search`. With this current patch (imprecise recovery), exits in kraken/ai-astar are due to BadType. With the one just before (no mustHandleValues when the bytecode location is bad), they're due to InadequateCoverage.

Created attachment 300445 [details]
benchmark results for imprecise mustHandleValues
benchmark results for imprecise mustHandleValues when the bytecode location isn't suitable for entry. This looks mostly neutral, though kraken is a slight regression.
We can't land a statistically significant Kraken regression, no matter how small. Also, we consider 2% on Kraken to be a medium-to-big regression; it takes a big amount of work to get a 2% Kraken speed-up.

Comment on attachment 300437 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300437&action=review

> Source/JavaScriptCore/bytecode/ValueRecovery.cpp:76
> +        return jsUndefined();

This is probably your problem. If you tell the DFG that something could be Undefined and there is no other signal telling the DFG otherwise, then the DFG will assume that it should speculate that it is Undefined. It's not clear to me what you could return here that would be safe. Our type inference system is not designed to allow for this kind of imprecision.

Comment on attachment 300437 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300437&action=review

> Source/JavaScriptCore/bytecode/ValueRecovery.cpp:75
> +    if (v.isInt32())
> +        return JSValue(static_cast<int32_t>(v.asInt32()));
> +    if (v.isDouble())
> +        return jsNumber(purifyNaN(v.asDouble()));

This is a possible security vulnerability. Say that `v` was actually a pointer. Now the compiler is constant-propagating a constant. Probably someone could extract this integer. I think that we want to actively defend against leaking pointers.

Created attachment 300476 [details]
collect mustHandleValues eagerly
Here's a separate approach where we eagerly keep a side-collection of mustHandleValues. Anything which ends up in tierUpCommon will reconstruct the mustHandleValues for this bytecode, and store them in a hash table which lives on jitCode. I tried to do a few different things (see below).
Neither ai-astar nor imaging-gaussian-blur is exiting.
Here's imaging-gaussian-blur:
Tip of tree:
CodeBlock #Instr Source Counts Machine Counts #Compil Inlines #Exits Last Opts
Base/DFG/FTL/FTLOSR Base/DFG/FTL/FTLOSR Src/Total Get/Put/Call
gaussianBlur#AVtXjn 1335 36827/629333/0/17161183 36827/629333/0/17161183 4 0/0 0 8/0/8
buildKernel#BJnJ58 758 0/0/0/0 0/0/0/0 0 0/0 0 N/A
<global>#AzIPSP 382 0/0/0/0 0/0/0/0 0 0/0 0 N/A
Snapshot on every entry:
gaussianBlur#AVtXjn 1335 40170/4244885/0/13542113 40170/4244885/0/13542113 4 0/0 0 8/0/8
buildKernel#BJnJ58 758 0/0/0/0 0/0/0/0 0 0/0 0 N/A
<global>#AzIPSP 382 0/0/0/0 0/0/0/0 0 0/0 0 N/A
Here's ai-astar:
Tip of tree:
CodeBlock #Instr Source Counts Machine Counts #Compil Inlines #Exits Last Opts
Base/DFG/FTL/FTLOSR Base/DFG/FTL/FTLOSR Src/Total Get/Put/Call
findGraphNode#EBmwVP 83 1223/4263839/2462385/39980673 1223/381831/2462385/0 3 1/3 0 2/0/0
search#DTtNtq 686 44722/29270/0/228626 44722/3848207/0/39679633 4 0/0 1 33/4/9
removeGraphNode#DqCfaQ 102 8168/64194/0/225518 8168/35557/0/0 2 1/3 0 3/0/0
isWall#ESQBmN 21 328/4319/0/7717 328/2689/0/0 2 1/3 0 1/0/0
GraphNode#BmXric 86 116/9411/0/0 116/9411/0/0 2 0/0 0 0/7/0
init#Ct2Vq3 168 9511/0/0/0 9511/0/0/0 2 0/0 0 0/4/0
heuristic#DCkysN 125 374/1472/0/3127 374/847/0/0 2 1/3 0 6/0/2
neighbors#CZggLo 349 1335/3531/0/0 1335/3531/0/0 2 0/0 50 8/0/4
go#A9jRvV 100 0/0/0/0 0/0/0/0 0 0/0 0 N/A
getEndingCell#EIKyLA 113 0/0/0/0 0/0/0/0 0 0/0 0 N/A
getStartingCell#DPYZfh 133 0/0/0/0 0/0/0/0 0 0/0 0 N/A
<global>#CnBI62 311876 0/0/0/0 0/0/0/0 0 0/0 0 N/A
Snapshot on every entry:
findGraphNode#EBmwVP 83 2192/40538040/1801301/4366274 2192/620297/1801301/0 3 1/7 0 2/0/0
search#DTtNtq 686 36652/228678/0/35763 36652/39611751/0/4329914 8 0/0 1 33/4/9
removeGraphNode#DqCfaQ 102 1441/260938/0/35371 1441/35700/0/0 2 1/7 0 3/0/0
isWall#ESQBmN 21 236/11525/0/602 236/2536/0/0 2 1/7 0 1/0/0
<global>#CnBI62 311876 9511/0/0/0 9511/0/0/0 1 0/0 0 0/0/0
GraphNode#BmXric 86 8805/701/0/0 8805/701/0/0 2 0/0 0 0/7/0
init#Ct2Vq3 168 8605/0/0/0 8605/0/0/0 2 0/0 0 0/4/0
heuristic#DCkysN 125 280/4465/0/213 280/837/0/0 2 1/7 0 6/0/2
neighbors#CZggLo 349 1266/3614/0/0 1266/3614/0/0 2 0/0 65 8/0/4
getStartingCell#DPYZfh 133 0/0/0/0 0/0/0/0 0 0/0 0 N/A
getEndingCell#EIKyLA 113 0/0/0/0 0/0/0/0 0 0/0 0 N/A
go#A9jRvV 100 0/0/0/0 0/0/0/0 0 0/0 0 N/A
Snapshot on every entry that canOSRFromHere (attached patch):
original new
ai-astar 68.971+-0.810 ! 122.015+-10.873 ! definitely 1.7691x slower
audio-beat-detection 30.100+-0.494 29.857+-0.234
audio-dft 70.446+-0.501 ? 71.522+-0.954 ? might be 1.0153x slower
audio-fft 22.214+-1.279 21.557+-0.058 might be 1.0305x faster
audio-oscillator 44.986+-0.178 ? 45.233+-0.140 ?
imaging-darkroom 50.246+-0.178 ? 50.995+-1.570 ? might be 1.0149x slower
imaging-desaturate 46.750+-0.492 ? 46.863+-0.725 ?
imaging-gaussian-blur 54.106+-0.972 ! 85.167+-13.275 ! definitely 1.5741x slower
json-parse-financial 29.485+-0.062 ^ 29.339+-0.049 ^ definitely 1.0050x faster
json-stringify-tinderbox 19.100+-0.076 ? 19.238+-0.089 ?
stanford-crypto-aes 30.627+-0.137 ? 30.683+-0.146 ?
stanford-crypto-ccm 28.089+-0.631 26.967+-0.564 might be 1.0416x faster
stanford-crypto-pbkdf2 93.139+-0.430 ? 94.900+-3.599 ? might be 1.0189x slower
stanford-crypto-sha256-iterative 29.866+-0.079 ? 30.381+-0.875 ? might be 1.0172x slower
<arithmetic> 44.152+-0.128 ! 50.337+-1.148 ! definitely 1.1401x slower
With snapshot refresh every time tierUpEntryTriggers->value is non-zero (not when the threshold causes entry):
original new
ai-astar 68.432+-0.934 ! 129.185+-8.533 ! definitely 1.8878x slower
audio-beat-detection 29.719+-0.078 ? 30.212+-0.435 ? might be 1.0166x slower
audio-dft 71.023+-0.540 70.820+-0.490
audio-fft 21.629+-0.052 21.588+-0.062
audio-oscillator 45.109+-0.110 ? 45.236+-0.182 ?
imaging-darkroom 50.273+-0.223 50.272+-0.220
imaging-desaturate 47.295+-0.650 ? 47.364+-0.885 ?
imaging-gaussian-blur 54.633+-1.209 ! 94.046+-13.214 ! definitely 1.7214x slower
json-parse-financial 29.578+-0.079 29.514+-0.149
json-stringify-tinderbox 19.259+-0.078 19.145+-0.068
stanford-crypto-aes 30.612+-0.113 ? 31.585+-2.217 ? might be 1.0318x slower
stanford-crypto-ccm 27.001+-0.822 26.943+-0.737
stanford-crypto-pbkdf2 94.911+-2.870 93.123+-0.145 might be 1.0192x faster
stanford-crypto-sha256-iterative 30.470+-1.105 30.468+-1.097
<arithmetic> 44.282+-0.237 ! 51.393+-1.039 ! definitely 1.1606x slower
With snapshot creation only on first entry (not on every trigger, and not when the threshold causes entry):
original new
ai-astar 69.160+-0.928 ! 131.462+-13.933 ! definitely 1.9008x slower
audio-beat-detection 29.831+-0.121 ? 29.856+-0.129 ?
audio-dft 71.850+-1.703 70.949+-0.512 might be 1.0127x faster
audio-fft 21.573+-0.047 ? 21.635+-0.072 ?
audio-oscillator 45.208+-0.140 ? 45.211+-0.128 ?
imaging-darkroom 50.431+-0.221 ? 50.452+-0.172 ?
imaging-desaturate 51.604+-9.816 46.967+-0.597 might be 1.0987x faster
imaging-gaussian-blur 53.568+-1.192 ! 88.972+-13.724 ! definitely 1.6609x slower
json-parse-financial 29.618+-0.087 ^ 28.988+-0.102 ^ definitely 1.0217x faster
json-stringify-tinderbox 19.201+-0.096 ^ 18.511+-0.101 ^ definitely 1.0373x faster
stanford-crypto-aes 30.747+-0.260 ? 31.444+-1.533 ? might be 1.0227x slower
stanford-crypto-ccm 27.170+-0.658 ? 27.527+-0.765 ? might be 1.0131x slower
stanford-crypto-pbkdf2 93.346+-0.389 93.088+-0.147
stanford-crypto-sha256-iterative 29.946+-0.084 29.930+-0.103
<arithmetic> 44.518+-0.736 ! 51.071+-1.430 ! definitely 1.1472x slower
Created attachment 300546 [details]
patch
This patch is perf-neutral, and is minimally disruptive. I think this will stick! 🎉
Created attachment 300547 [details]
Performance results
Of special interest are the Kraken results:
Kraken:
ai-astar 68.933+-1.456 68.404+-1.343
audio-beat-detection 29.792+-0.137 ? 29.965+-0.765 ?
audio-dft 71.272+-0.797 70.672+-0.700
audio-fft 23.429+-2.957 21.546+-0.103 might be 1.0874x faster
audio-oscillator 44.972+-0.203 ? 45.022+-0.296 ?
imaging-darkroom 50.153+-0.296 50.057+-0.407
imaging-desaturate 46.187+-0.685 46.068+-1.034
imaging-gaussian-blur 54.450+-1.002 53.633+-1.700 might be 1.0152x faster
json-parse-financial 30.566+-1.894 28.952+-0.161 might be 1.0557x faster
json-stringify-tinderbox 19.054+-0.142 ^ 18.637+-0.167 ^ definitely 1.0224x faster
stanford-crypto-aes 31.273+-1.453 30.708+-0.460 might be 1.0184x faster
stanford-crypto-ccm 28.895+-2.198 27.556+-0.912 might be 1.0486x faster
stanford-crypto-pbkdf2 92.832+-0.457 ? 92.892+-0.516 ?
stanford-crypto-sha256-iterative 29.809+-0.130 ? 30.369+-1.425 ? might be 1.0188x slower
<arithmetic> 44.401+-0.297 43.892+-0.216 might be 1.0116x faster
Comment on attachment 300546 [details] patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300546&action=review

Seems mostly good to me. Just some questions, some of which may stem from my shallow understanding of this code.

> Source/JavaScriptCore/ChangeLog:23
> +    entering. It therefore makes the trigger a tri-state (0: don't
> +    enter, 1: compilation complete, 2: please compile). Only 2 gets
> +    reset to zero, 1 does not.

Does it get reset to zero if the OSR entry compilation is jettisoned?

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2359
> +    bool triggeredSlowPath = false;

This name confuses me. I think it really means that we want a compilation elsewhere.

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2363
> +    case 0:

Maybe it's worth giving these numbers some names in an enum?

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2470
> +    jitCode->dontOptimizeAnytimeSoon(codeBlock);

Why this?

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2479
> +    for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {

Don't you want to try to compile the outermost loop first? If so, I think you want to iterate this vector backwards.

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2485
> +    jitCode->dontOptimizeAnytimeSoon(codeBlock);

Why this?

Fil asked me to check what the FTL compilation is doing to see when we FTL versus FTLOSR compile.

$ VM=./WebKitBuild/Release/ && DYLD_FRAMEWORK_PATH=$VM $VM/jsc ./JSTests/ChakraCore/test/benchmarks/Kraken/ai-astar.js --reportFTLCompileTimes=1

Before my patch:

Optimized findGraphNode#EBmwVP:[0x10756ac60->0x107569ae0->0x1075b7480, NoneFunctionCall, 83 (DidTryToEnterInLoop)] using FTLMode with FTL into 731 bytes in 18.484423 ms (DFG: 6.894247, B3: 11.590176).
Optimized search#DTtNtq:[0x10756ae90->0x107568ff0->0x1075b7660, NoneFunctionCall, 686 (DidTryToEnterInLoop)] using FTLForOSREntryMode with FTL into 3318 bytes in 16.226048 ms (DFG: 6.966971, B3: 9.259077).
Optimized search#DTtNtq:[0x10756b0c0->0x107568ff0->0x1075b7660, NoneFunctionCall, 686 (DidTryToEnterInLoop)] using FTLMode with FTL into 4111 bytes in 16.218263 ms (DFG: 7.276756, B3: 8.941507).

After my patch:

Optimized GraphNode#BmXric:[0x106568730->0x1065682d0->0x1065b6e40, NoneFunctionConstruct, 86] using FTLMode with FTL into 981 bytes in 1.618677 ms (DFG: 0.628264, B3: 0.990413).
Optimized findGraphNode#EBmwVP:[0x10656ae90->0x106569d10->0x1065b7480, NoneFunctionCall, 83] using FTLMode with FTL into 731 bytes in 1.043065 ms (DFG: 0.535857, B3: 0.507208).
Optimized search#DTtNtq:[0x10656b0c0->0x106569220->0x1065b7660, NoneFunctionCall, 686 (DidTryToEnterInLoop)] using FTLForOSREntryMode with FTL into 3433 bytes in 11.175271 ms (DFG: 6.220067, B3: 4.955204).
Optimized search#DTtNtq:[0x10656b2f0->0x106569220->0x1065b7660, NoneFunctionCall, 686 (DidTryToEnterInLoop)] using FTLMode with FTL into 4261 bytes in 11.527336 ms (DFG: 6.404499, B3: 5.122837).

So search starts with FTLOSR before FTL. The window between the two (which is where unsetting the trigger was causing perf to tank, because we didn't enter DFG->FTLOSR more than once) is pretty small. Adding more printing, we start FTL and FTLOSR compilations within 0.01 ms of each other, but FTLOSR finishes 0.5 ms earlier. There's just a tiny window, considering the benchmark itself runs for ~160 ms.

Inlining looks like this:

Compilation search#DTtNtq-2-DFG:
    bc#297 --> removeGraphNode#DqCfaQ
    bc#407 --> findGraphNode#EBmwVP
    bc#431 --> isWall#ESQBmN
    bc#475 --> findGraphNode#EBmwVP
    bc#535 --> heuristic#DCkysN
Compilation search#DTtNtq-3-FTLForOSREntry:
    bc#297 --> removeGraphNode#DqCfaQ
    bc#297 --> removeGraphNode#DqCfaQ
    bc#407 --> findGraphNode#EBmwVP
    bc#431 --> isWall#ESQBmN
    bc#475 --> findGraphNode#EBmwVP
    bc#535 --> heuristic#DCkysN

Created attachment 300563 [details]
patch

Patch updated.

> > Source/JavaScriptCore/ChangeLog:23
> > +    entering. It therefore makes the trigger a tri-state (0: don't
> > +    enter, 1: compilation complete, 2: please compile). Only 2 gets
> > +    reset to zero, 1 does not.
>
> Does it get reset to zero if the OSR entry compilation is jettisoned?

Oh yeah, I should mention that!

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2359
> > +    bool triggeredSlowPath = false;
>
> This name confuses me. I think it really means that we want a compilation elsewhere.

True, I renamed it. My refactoring will also make that much clearer.

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2363
> > +    case 0:
>
> Maybe it's worth giving these numbers some names in an enum?

Definitely! The refactoring patch did this. I was trying to keep this patch as small as I can (using an enum touches other files). But since it's not changing any behavior I'll add the enum here.

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2470
> > +    jitCode->dontOptimizeAnytimeSoon(codeBlock);
>
> Why this?

The idea here is to bump the threshold so that this function's loops stop entering this OSR handling code. We decided that another loop is a good place to optimize, and we've set its trigger accordingly, so it'll come into this OSR handling code, but we don't want the counter to do so as well.

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2479
> > +    for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {
>
> Don't you want to try to compile the outermost loop first? If so, I think you want to iterate this vector backwards.

Correct, that's what yields better perf IIUC, but I'm leaving the old code as-is to not change its behavior. I'm just delaying the compile for correctness, not changing which loop gets compiled (as I did in my refactoring). I'll validate that going the other way is better separately.

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2485
> > +    jitCode->dontOptimizeAnytimeSoon(codeBlock);
>
> Why this?

Same as above.
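For what it's worth, the tri-state trigger described in the ChangeLog excerpt could be named roughly like this (illustrative names and a simplified consume rule, not the enum that actually landed):

```cpp
#include <cstdint>

// Illustrative names for the three trigger states quoted from the
// ChangeLog; the enum in the real patch may use different names.
enum class TriggerReason : uint8_t {
    DontTrigger = 0,      // 0: don't enter
    CompilationDone = 1,  // 1: compilation complete (sticky)
    StartCompilation = 2, // 2: please compile
};

// Simplified consume rule: only StartCompilation is reset to DontTrigger;
// CompilationDone persists so the loop keeps attempting OSR entry.
inline TriggerReason consumeTrigger(TriggerReason& trigger) {
    TriggerReason observed = trigger;
    if (observed == TriggerReason::StartCompilation)
        trigger = TriggerReason::DontTrigger;
    return observed;
}
```

The sticky `CompilationDone` state is what keeps the DFG loop re-entering the OSR path after the FTL-for-OSR-entry code is ready, instead of triggering a fresh compile each time.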
Comment on attachment 300563 [details] patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300563&action=review

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2470
> +    jitCode->dontOptimizeAnytimeSoon(codeBlock);

This should be optimizeSoon.

> Source/JavaScriptCore/dfg/DFGOperations.cpp:2485
> +    jitCode->dontOptimizeAnytimeSoon(codeBlock);

This should be optimizeSoon.

Comment on attachment 300563 [details] patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300563&action=review

> > Source/JavaScriptCore/dfg/DFGOperations.cpp:2470
> > +    jitCode->dontOptimizeAnytimeSoon(codeBlock);
>
> This should be optimizeSoon.

Make that setOptimizationBlahBlah(Deferred)

Created attachment 300574 [details]
patch
As we discussed on IRC, I tried the three different approaches. setOptimizationThresholdBasedOnCompilationResult is the least different from existing code (it currently compiles and calls that function; we now set the trigger and then want to back off a bit). The worry Saam expressed was that I was delaying the regular FTL compile too much. Benchmarks don't show a difference, but we may just be getting lucky.
This patch calls setOptimizationThresholdBasedOnCompilationResult.
Here are kraken results. I'll compare setOptimizationThresholdBasedOnCompilationResult in a full benchmark run and post those results.
$ run-jsc-benchmarks original:./WebKitBuild/current/jsc dontSoon:./WebKitBuild/dontOptimizeAnytimeSoon/jsc optimizeSoon:./WebKitBuild/optimizeSoon/jsc setBlah:./WebKitBuild/setOptimizationThresholdBasedOnCompilationResult/jsc --kraken --outer 20
original dontSoon optimizeSoon setBlah setBlah v. original
ai-astar 67.627+-1.053 67.544+-0.798 ? 67.953+-0.978 67.816+-0.893 ?
audio-beat-detection 29.133+-0.075 ? 29.167+-0.195 29.132+-0.082 ? 29.444+-0.602 ? might be 1.0107x slower
audio-dft 60.809+-0.943 60.250+-0.500 ? 60.580+-0.707 60.182+-0.612 might be 1.0104x faster
audio-fft 21.700+-0.663 21.397+-0.116 ? 21.871+-0.689 21.791+-0.882 ?
audio-oscillator 44.702+-0.137 ? 44.817+-0.178 ? 44.823+-0.352 ? 44.930+-0.291 ?
imaging-darkroom 50.023+-0.183 ? 50.038+-0.235 49.932+-0.170 49.903+-0.215
imaging-desaturate 42.867+-0.705 ? 44.027+-0.812 43.403+-0.818 43.248+-0.686 ?
imaging-gaussian-blur 53.495+-0.845 ? 53.639+-0.928 53.342+-0.993 52.668+-1.117 might be 1.0157x faster
json-parse-financial 29.381+-0.133 ^ 27.807+-0.093 27.804+-0.128 ! 28.143+-0.180 ^ definitely 1.0440x faster
json-stringify-tinderbox 17.856+-0.060 ! 18.469+-0.080 ? 18.482+-0.116 18.456+-0.134 ! definitely 1.0336x slower
stanford-crypto-aes 30.170+-0.145 30.113+-0.180 ? 30.128+-0.187 29.999+-0.192
stanford-crypto-ccm 26.464+-0.905 ? 26.932+-0.658 ? 27.611+-1.037 26.562+-0.881 ?
stanford-crypto-pbkdf2 94.567+-3.453 92.140+-0.546 ? 92.339+-0.588 92.136+-0.507 might be 1.0264x faster
stanford-crypto-sha256-iterative 29.667+-0.616 ? 29.714+-0.638 ? 30.296+-1.051 29.510+-0.134
<arithmetic> 42.747+-0.281 42.575+-0.110 ? 42.693+-0.165 42.485+-0.168 might be 1.0062x faster
Comment on attachment 300574 [details]
patch
r=me too
I ran the full benchmark suite. Everything looks perf-neutral, including SunSpider and Octane!

Comment on attachment 300574 [details] patch

Clearing flags on attachment: 300574

Committed r211658: <http://trac.webkit.org/changeset/211658>

All reviewed patches have been landed. Closing bug.