WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 154575
[JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used for OSR Entry itself
https://bugs.webkit.org/show_bug.cgi?id=154575
Summary
[JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used f...
Benjamin Poulain
Reported
2016-02-23 00:34:43 PST
[JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used for OSR Entry itself
Attachments
Patch
(13.52 KB, patch)
2016-02-23 01:08 PST
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(14.06 KB, patch)
2016-02-23 16:03 PST
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ews117 for mac-yosemite
(1.55 MB, application/zip)
2016-02-23 17:21 PST
,
Build Bot
no flags
Details
Patch for landing
(13.30 KB, patch)
2016-02-25 21:01 PST
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Benjamin Poulain
Comment 1
2016-02-23 01:08:31 PST
Created
attachment 271995
[details]
Patch
Benjamin Poulain
Comment 2
2016-02-23 01:16:36 PST
Hey Filip, can you have a look and give me your opinion? That little trick works very well: Conf#1 Conf#2 SunSpider: 3d-cube 4.8309+-0.0998 4.8237+-0.1627 3d-morph 5.4340+-0.1700 5.2512+-0.0494 might be 1.0348x faster 3d-raytrace 5.7331+-0.1216 5.7299+-0.1765 access-binary-trees 2.2273+-0.1203 ? 2.2488+-0.0972 ? access-fannkuch 6.0140+-0.2546 5.9969+-0.0787 access-nbody 2.7809+-0.2294 2.7195+-0.0924 might be 1.0226x faster access-nsieve 3.3097+-0.0783 ^ 3.0451+-0.0695 ^ definitely 1.0869x faster bitops-3bit-bits-in-byte 1.1572+-0.0170 ? 1.1713+-0.0245 ? might be 1.0122x slower bitops-bits-in-byte 3.1884+-0.0471 ? 3.2759+-0.1029 ? might be 1.0275x slower bitops-bitwise-and 2.0665+-0.0687 ? 2.1054+-0.1248 ? might be 1.0188x slower bitops-nsieve-bits 3.0493+-0.0936 2.9679+-0.0313 might be 1.0274x faster controlflow-recursive 2.3688+-0.0154 ? 2.3803+-0.0244 ? crypto-aes 4.0833+-0.0675 4.0278+-0.0443 might be 1.0138x faster crypto-md5 2.6064+-0.0138 2.5866+-0.0259 crypto-sha1 2.3346+-0.0386 ? 2.3821+-0.1082 ? might be 1.0203x slower date-format-tofte 7.0051+-0.1386 6.8112+-0.1304 might be 1.0285x faster date-format-xparb 4.8731+-0.2053 4.7642+-0.1009 might be 1.0229x faster math-cordic 2.9680+-0.0175 ? 3.0015+-0.1107 ? might be 1.0113x slower math-partial-sums 4.9964+-0.1935 4.8825+-0.0865 might be 1.0233x faster math-spectral-norm 2.0783+-0.0989 2.0186+-0.0157 might be 1.0296x faster regexp-dna 6.1427+-0.1263 6.1332+-0.1810 string-base64 4.5708+-0.1230 4.5161+-0.0304 might be 1.0121x faster string-fasta 5.9487+-0.1408 5.9158+-0.1049 string-tagcloud 8.1149+-0.0482 ? 8.1798+-0.1746 ? string-unpack-code 19.5484+-1.5039 19.4468+-0.8064 string-validate-input 4.5427+-0.1467 4.4092+-0.0482 might be 1.0303x faster <arithmetic> 4.6913+-0.0685 4.6458+-0.0477 might be 1.0098x faster Conf#1 Conf#2 Octane: encrypt 0.15762+-0.00148 0.15729+-0.00191 decrypt 2.81182+-0.01771 2.80344+-0.00966 deltablue x2 0.14028+-0.00574 0.13679+-0.00104 might be 1.0255x faster earley 0.28466+-0.00142 ? 0.28473+-0.00136 ? boyer 4.75015+-0.07751 4.66607+-0.09973 might be 1.0180x faster navier-stokes x2 4.97927+-0.06056 4.95214+-0.01942 raytrace x2 0.89419+-0.00406 ? 0.89631+-0.00544 ? richards x2 0.08224+-0.00067 0.08112+-0.00061 might be 1.0138x faster splay x2 0.34324+-0.00214 0.34075+-0.00168 regexp x2 25.14563+-0.53362 24.88729+-0.38187 might be 1.0104x faster pdfjs x2 38.77979+-0.18101 ? 38.89784+-0.16883 ? mandreel x2 43.96661+-0.11696 ? 43.98979+-0.31191 ? gbemu x2 24.78282+-0.11070 ? 24.83071+-0.11930 ? closure 0.56542+-0.00223 ? 0.56736+-0.00172 ? jquery 7.43416+-0.03158 7.41405+-0.06601 box2d x2 9.35055+-0.08793 ? 9.40528+-0.06360 ? zlib x2 390.20225+-3.35458 382.76264+-8.50273 might be 1.0194x faster typescript x2 665.50718+-5.42229 ? 671.65210+-5.75402 ? <geometric> 5.29438+-0.01384 5.27045+-0.01076 might be 1.0045x faster Conf#1 Conf#2 Kraken: ai-astar 96.835+-1.794 95.968+-1.847 audio-beat-detection 47.489+-0.149 ? 47.619+-0.489 ? audio-dft 97.756+-1.274 97.345+-1.540 audio-fft 36.054+-0.615 36.014+-0.597 audio-oscillator 48.688+-0.334 48.401+-0.070 imaging-darkroom 60.075+-0.078 ? 60.154+-0.146 ? imaging-desaturate 44.236+-0.113 ? 44.794+-0.478 ? might be 1.0126x slower imaging-gaussian-blur 77.570+-2.137 ^ 71.610+-2.005 ^ definitely 1.0832x faster json-parse-financial 37.282+-0.756 36.592+-0.363 might be 1.0189x faster json-stringify-tinderbox 24.363+-0.741 ? 25.270+-0.595 ? might be 1.0372x slower stanford-crypto-aes 39.551+-0.182 ? 39.837+-0.392 ? stanford-crypto-ccm 37.334+-1.214 36.860+-0.978 might be 1.0128x faster stanford-crypto-pbkdf2 103.014+-1.560 101.889+-0.458 might be 1.0110x faster stanford-crypto-sha256-iterative 38.691+-0.396 ? 38.947+-0.641 ? <arithmetic> 56.353+-0.348 55.807+-0.203 might be 1.0098x faster Conf#1 Conf#2 AsmBench: bigfib.cpp 431.5021+-1.5016 ? 439.3357+-6.5325 ? might be 1.0182x slower cray.c 370.1062+-1.5145 ^ 365.4786+-1.4808 ^ definitely 1.0127x faster dry.c 483.1682+-45.2844 432.2826+-28.2020 might be 1.1177x faster FloatMM.c 714.6547+-6.7181 ? 719.8654+-2.8353 ? gcc-loops.cpp 3638.1944+-38.9169 3630.2125+-10.2586 n-body.c 813.7874+-3.7919 ? 815.4070+-4.8202 ? Quicksort.c 403.4789+-3.7404 399.3948+-2.9873 might be 1.0102x faster stepanov_container.cpp 3307.9732+-12.8669 3293.9502+-14.2785 Towers.c 273.2829+-5.2717 268.3912+-1.1232 might be 1.0182x faster <geometric> 730.1670+-7.7325 720.0118+-4.3914 might be 1.0141x faster Conf#1 Conf#2 Geomean of preferred means: <scaled-result> 31.7928+-0.1591 ^ 31.4935+-0.1016 ^ definitely 1.0095x faster
Filip Pizlo
Comment 3
2016-02-23 10:09:22 PST
Comment on
attachment 271995
[details]
Patch This seems like a really cool trick to me! The patch looks good to me.
Benjamin Poulain
Comment 4
2016-02-23 16:03:23 PST
Created
attachment 272061
[details]
Patch
Filip Pizlo
Comment 5
2016-02-23 16:24:48 PST
Comment on
attachment 272061
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> Source/JavaScriptCore/dfg/DFGJITCode.h:142 > + uint8_t neverExecutedEntry { 0xff };
Why 0xff and not just 1?
Benjamin Poulain
Comment 6
2016-02-23 16:25:37 PST
(In reply to
comment #5
)
> Comment on
attachment 272061
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> > > Source/JavaScriptCore/dfg/DFGJITCode.h:142 > > + uint8_t neverExecutedEntry { 0xff }; > > Why 0xff and not just 1?
That was arbitrary. I'll change to 1.
Build Bot
Comment 7
2016-02-23 17:21:52 PST
Comment on
attachment 272061
[details]
Patch
Attachment 272061
[details]
did not pass mac-debug-ews (mac): Output:
http://webkit-queues.webkit.org/results/874012
New failing tests: webaudio/oscillator-square.html js/regress/get-by-val-with-string-self-or-proto.html js/regress/string-get-by-val-big-char.html webaudio/oscillator-custom.html js/regress/Int16Array-to-Int32Array-set.html js/regress/Float32Array-to-Float64Array-set.html webaudio/audiobuffersource-playbackrate.html webaudio/oscillator-triangle.html js/regress/rare-osr-exit-on-local.html js/regress/Float64Array-to-Int16Array-set.html js/regress/adapt-to-double-divide.html webaudio/oscillator-sine.html webaudio/oscillator-sawtooth.html
Build Bot
Comment 8
2016-02-23 17:21:55 PST
Created
attachment 272069
[details]
Archive of layout-test-results from ews117 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5
Benjamin Poulain
Comment 9
2016-02-23 17:38:01 PST
Looks like the new assertion was useful :) I'll fix that before landing.
Geoffrey Garen
Comment 10
2016-02-23 22:14:07 PST
Comment on
attachment 272061
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:118 > + store8(TrustedImm32(0), &m_jitCode->neverExecutedEntry);
FWIW, I think this would be clearer as a store of true to a value named didExecuteEntry. The way this code reads now, it says "I did not not enter", which hurts my brain.
Saam Barati
Comment 11
2016-02-23 23:36:01 PST
(In reply to
comment #10
)
> Comment on
attachment 272061
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> > > Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:118 > > + store8(TrustedImm32(0), &m_jitCode->neverExecutedEntry); > > FWIW, I think this would be clearer as a store of true to a value named > didExecuteEntry. > > The way this code reads now, it says "I did not not enter", which hurts my > brain.
I suggested the same thing but as Ben pointed out to me this is more efficient to do because we can always store 0 using the zero register on arm64. Maybe it's worth adding a function "didExecuteEntry()" into jitCode to make most code concerning this field more readable?
Benjamin Poulain
Comment 12
2016-02-25 21:01:29 PST
Created
attachment 272289
[details]
Patch for landing
WebKit Commit Bot
Comment 13
2016-02-25 21:59:43 PST
Comment on
attachment 272289
[details]
Patch for landing Clearing flags on attachment: 272289 Committed
r197159
: <
http://trac.webkit.org/changeset/197159
>
WebKit Commit Bot
Comment 14
2016-02-25 21:59:47 PST
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 15
2016-02-27 12:06:14 PST
This broke a test, filed
bug 154778
.
Filip Pizlo
Comment 16
2016-03-01 21:32:06 PST
Comment on
attachment 272289
[details]
Patch for landing View in context:
https://bugs.webkit.org/attachment.cgi?id=272289&action=review
> Source/JavaScriptCore/ChangeLog:22 > + What this patch does is set a flag when a DFG function is entered. > + If we try to triggerOSREntryNow() and the flag was never set, > + we start compiling both the full function and the one for OSR Entry.
I don't think this is an accurate description of what your change does...
> Source/JavaScriptCore/dfg/DFGOperations.cpp:1562 > - if (!codeBlock->hasOptimizedReplacement()) > - return 0; > - > - if (jitCode->osrEntryRetry < Options::ftlOSREntryRetryThreshold()) { > - jitCode->osrEntryRetry++; > - return 0; > + if (!shouldTriggerFTLCompile(codeBlock, jitCode)) > + return nullptr; > + > + if (!jitCode->neverExecutedEntry) { > + triggerFTLReplacementCompile(vm, codeBlock, jitCode); > + > + if (!codeBlock->hasOptimizedReplacement()) > + return nullptr;
Prior to your change, we would require an optimized replacement and ftlOSREntryRetryThreshold OSR entry attempts before we'd initiate an OSR entry compile. After your change, a code block that had never executed entry, we will no longer require ftlOSREntryRetryThreshold OSR entry attempts. Also, your ChangeLog claims that we will trigger both compilations simultaneously. I don't think that's true. It looks like if neverExecutedEntry is true, then we won't trigger an entry compile until we actually enter the function. I actually like the new policy, but maybe it would be good to document it somewhere.
Filip Pizlo
Comment 17
2016-03-01 21:41:39 PST
(In reply to
comment #16
)
> Comment on
attachment 272289
[details]
> Patch for landing > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=272289&action=review
> > > Source/JavaScriptCore/ChangeLog:22 > > + What this patch does is set a flag when a DFG function is entered. > > + If we try to triggerOSREntryNow() and the flag was never set, > > + we start compiling both the full function and the one for OSR Entry. > > I don't think this is an accurate description of what your change does... > > > Source/JavaScriptCore/dfg/DFGOperations.cpp:1562 > > - if (!codeBlock->hasOptimizedReplacement()) > > - return 0; > > - > > - if (jitCode->osrEntryRetry < Options::ftlOSREntryRetryThreshold()) { > > - jitCode->osrEntryRetry++; > > - return 0; > > + if (!shouldTriggerFTLCompile(codeBlock, jitCode)) > > + return nullptr; > > + > > + if (!jitCode->neverExecutedEntry) { > > + triggerFTLReplacementCompile(vm, codeBlock, jitCode); > > + > > + if (!codeBlock->hasOptimizedReplacement()) > > + return nullptr; > > Prior to your change, we would require an optimized replacement and > ftlOSREntryRetryThreshold OSR entry attempts before we'd initiate an OSR > entry compile. > > After your change, a code block that had never executed entry, we will no > longer require ftlOSREntryRetryThreshold OSR entry attempts. > > Also, your ChangeLog claims that we will trigger both compilations > simultaneously. I don't think that's true. It looks like if > neverExecutedEntry is true, then we won't trigger an entry compile until we > actually enter the function. > > I actually like the new policy, but maybe it would be good to document it > somewhere.
Oh wow, but that first part - the fact that we no longer do ftlOSREntryRetryThreshold attempts after compiling the replacement - is a huge loss on Octane/regexp after I enable it to run FTL. I'm going to try to add that heuristic back in. The point is: if the function is known to have executed entry in DFG, and we have a replacement compile, but we're still in the DFG loop, then it's better to try to wait a bit instead of immediately firing off an OSR entry compile. That's a good rule in case the function is large and expensive to compile.
Benjamin Poulain
Comment 18
2016-03-01 22:10:11 PST
> Also, your ChangeLog claims that we will trigger both compilations > simultaneously. I don't think that's true. It looks like if > neverExecutedEntry is true, then we won't trigger an entry compile until we > actually enter the function.
I think I did fuck that up on some patches. The patch that landed should have the right thing:
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations.cpp#L1622
> Oh wow, but that first part - the fact that we no longer do > ftlOSREntryRetryThreshold attempts after compiling the replacement - is a > huge loss on Octane/regexp after I enable it to run FTL.
Okay. I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch of benchmark and being more aggressive was worth a few percents on a few tests. Please test everything before adding the threshold.
Filip Pizlo
Comment 19
2016-03-02 12:10:26 PST
(In reply to
comment #18
)
> > Also, your ChangeLog claims that we will trigger both compilations > > simultaneously. I don't think that's true. It looks like if > > neverExecutedEntry is true, then we won't trigger an entry compile until we > > actually enter the function. > > I think I did fuck that up on some patches. > The patch that landed should have the right thing: >
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations
. > cpp#L1622 > > > Oh wow, but that first part - the fact that we no longer do > > ftlOSREntryRetryThreshold attempts after compiling the replacement - is a > > huge loss on Octane/regexp after I enable it to run FTL. > > Okay. > > I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch > of benchmark and being more aggressive was worth a few percents on a few > tests. > > Please test everything before adding the threshold.
I tested everything! It looks OK. Note that I kept all other aspects of your heuristics, including only triggering an OSR entry compile if entry never executed.
Filip Pizlo
Comment 20
2016-03-02 12:11:20 PST
(In reply to
comment #19
)
> (In reply to
comment #18
) > > > Also, your ChangeLog claims that we will trigger both compilations > > > simultaneously. I don't think that's true. It looks like if > > > neverExecutedEntry is true, then we won't trigger an entry compile until we > > > actually enter the function. > > > > I think I did fuck that up on some patches. > > The patch that landed should have the right thing: > >
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations
. > > cpp#L1622 > > > > > Oh wow, but that first part - the fact that we no longer do > > > ftlOSREntryRetryThreshold attempts after compiling the replacement - is a > > > huge loss on Octane/regexp after I enable it to run FTL. > > > > Okay. > > > > I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch > > of benchmark and being more aggressive was worth a few percents on a few > > tests. > > > > Please test everything before adding the threshold. > > I tested everything! It looks OK. Note that I kept all other aspects of > your heuristics, including only triggering an OSR entry compile if entry > never executed.
See
https://bugs.webkit.org/show_bug.cgi?id=154901#c2
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug