Bug 154575 - [JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used for OSR Entry itself
Summary: [JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used f...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-23 00:34 PST by Benjamin Poulain
Modified: 2016-03-02 12:11 PST (History)
6 users (show)

See Also:


Attachments
Patch (13.52 KB, patch)
2016-02-23 01:08 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (14.06 KB, patch)
2016-02-23 16:03 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews117 for mac-yosemite (1.55 MB, application/zip)
2016-02-23 17:21 PST, Build Bot
no flags Details
Patch for landing (13.30 KB, patch)
2016-02-25 21:01 PST, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-02-23 00:34:43 PST
[JSC] Be aggressive with OSR Entry to FTL if the DFG function was only used for OSR Entry itself
Comment 1 Benjamin Poulain 2016-02-23 01:08:31 PST
Created attachment 271995 [details]
Patch
Comment 2 Benjamin Poulain 2016-02-23 01:16:36 PST
Hey Filip, can you have a look and give me your opinion?

That little trick works very well:

                                                  Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                    4.8309+-0.0998            4.8237+-0.1627        
   3d-morph                                   5.4340+-0.1700            5.2512+-0.0494          might be 1.0348x faster
   3d-raytrace                                5.7331+-0.1216            5.7299+-0.1765        
   access-binary-trees                        2.2273+-0.1203     ?      2.2488+-0.0972        ?
   access-fannkuch                            6.0140+-0.2546            5.9969+-0.0787        
   access-nbody                               2.7809+-0.2294            2.7195+-0.0924          might be 1.0226x faster
   access-nsieve                              3.3097+-0.0783     ^      3.0451+-0.0695        ^ definitely 1.0869x faster
   bitops-3bit-bits-in-byte                   1.1572+-0.0170     ?      1.1713+-0.0245        ? might be 1.0122x slower
   bitops-bits-in-byte                        3.1884+-0.0471     ?      3.2759+-0.1029        ? might be 1.0275x slower
   bitops-bitwise-and                         2.0665+-0.0687     ?      2.1054+-0.1248        ? might be 1.0188x slower
   bitops-nsieve-bits                         3.0493+-0.0936            2.9679+-0.0313          might be 1.0274x faster
   controlflow-recursive                      2.3688+-0.0154     ?      2.3803+-0.0244        ?
   crypto-aes                                 4.0833+-0.0675            4.0278+-0.0443          might be 1.0138x faster
   crypto-md5                                 2.6064+-0.0138            2.5866+-0.0259        
   crypto-sha1                                2.3346+-0.0386     ?      2.3821+-0.1082        ? might be 1.0203x slower
   date-format-tofte                          7.0051+-0.1386            6.8112+-0.1304          might be 1.0285x faster
   date-format-xparb                          4.8731+-0.2053            4.7642+-0.1009          might be 1.0229x faster
   math-cordic                                2.9680+-0.0175     ?      3.0015+-0.1107        ? might be 1.0113x slower
   math-partial-sums                          4.9964+-0.1935            4.8825+-0.0865          might be 1.0233x faster
   math-spectral-norm                         2.0783+-0.0989            2.0186+-0.0157          might be 1.0296x faster
   regexp-dna                                 6.1427+-0.1263            6.1332+-0.1810        
   string-base64                              4.5708+-0.1230            4.5161+-0.0304          might be 1.0121x faster
   string-fasta                               5.9487+-0.1408            5.9158+-0.1049        
   string-tagcloud                            8.1149+-0.0482     ?      8.1798+-0.1746        ?
   string-unpack-code                        19.5484+-1.5039           19.4468+-0.8064        
   string-validate-input                      4.5427+-0.1467            4.4092+-0.0482          might be 1.0303x faster

   <arithmetic>                               4.6913+-0.0685            4.6458+-0.0477          might be 1.0098x faster

                                                  Conf#1                    Conf#2                                      
Octane:
   encrypt                                   0.15762+-0.00148          0.15729+-0.00191       
   decrypt                                   2.81182+-0.01771          2.80344+-0.00966       
   deltablue                        x2       0.14028+-0.00574          0.13679+-0.00104         might be 1.0255x faster
   earley                                    0.28466+-0.00142    ?     0.28473+-0.00136       ?
   boyer                                     4.75015+-0.07751          4.66607+-0.09973         might be 1.0180x faster
   navier-stokes                    x2       4.97927+-0.06056          4.95214+-0.01942       
   raytrace                         x2       0.89419+-0.00406    ?     0.89631+-0.00544       ?
   richards                         x2       0.08224+-0.00067          0.08112+-0.00061         might be 1.0138x faster
   splay                            x2       0.34324+-0.00214          0.34075+-0.00168       
   regexp                           x2      25.14563+-0.53362         24.88729+-0.38187         might be 1.0104x faster
   pdfjs                            x2      38.77979+-0.18101    ?    38.89784+-0.16883       ?
   mandreel                         x2      43.96661+-0.11696    ?    43.98979+-0.31191       ?
   gbemu                            x2      24.78282+-0.11070    ?    24.83071+-0.11930       ?
   closure                                   0.56542+-0.00223    ?     0.56736+-0.00172       ?
   jquery                                    7.43416+-0.03158          7.41405+-0.06601       
   box2d                            x2       9.35055+-0.08793    ?     9.40528+-0.06360       ?
   zlib                             x2     390.20225+-3.35458        382.76264+-8.50273         might be 1.0194x faster
   typescript                       x2     665.50718+-5.42229    ?   671.65210+-5.75402       ?

   <geometric>                               5.29438+-0.01384          5.27045+-0.01076         might be 1.0045x faster

                                                  Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                   96.835+-1.794             95.968+-1.847         
   audio-beat-detection                       47.489+-0.149      ?      47.619+-0.489         ?
   audio-dft                                  97.756+-1.274             97.345+-1.540         
   audio-fft                                  36.054+-0.615             36.014+-0.597         
   audio-oscillator                           48.688+-0.334             48.401+-0.070         
   imaging-darkroom                           60.075+-0.078      ?      60.154+-0.146         ?
   imaging-desaturate                         44.236+-0.113      ?      44.794+-0.478         ? might be 1.0126x slower
   imaging-gaussian-blur                      77.570+-2.137      ^      71.610+-2.005         ^ definitely 1.0832x faster
   json-parse-financial                       37.282+-0.756             36.592+-0.363           might be 1.0189x faster
   json-stringify-tinderbox                   24.363+-0.741      ?      25.270+-0.595         ? might be 1.0372x slower
   stanford-crypto-aes                        39.551+-0.182      ?      39.837+-0.392         ?
   stanford-crypto-ccm                        37.334+-1.214             36.860+-0.978           might be 1.0128x faster
   stanford-crypto-pbkdf2                    103.014+-1.560            101.889+-0.458           might be 1.0110x faster
   stanford-crypto-sha256-iterative           38.691+-0.396      ?      38.947+-0.641         ?

   <arithmetic>                               56.353+-0.348             55.807+-0.203           might be 1.0098x faster

                                                  Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                               431.5021+-1.5016     ?    439.3357+-6.5325        ? might be 1.0182x slower
   cray.c                                   370.1062+-1.5145     ^    365.4786+-1.4808        ^ definitely 1.0127x faster
   dry.c                                    483.1682+-45.2844         432.2826+-28.2020         might be 1.1177x faster
   FloatMM.c                                714.6547+-6.7181     ?    719.8654+-2.8353        ?
   gcc-loops.cpp                           3638.1944+-38.9169        3630.2125+-10.2586       
   n-body.c                                 813.7874+-3.7919     ?    815.4070+-4.8202        ?
   Quicksort.c                              403.4789+-3.7404          399.3948+-2.9873          might be 1.0102x faster
   stepanov_container.cpp                  3307.9732+-12.8669        3293.9502+-14.2785       
   Towers.c                                 273.2829+-5.2717          268.3912+-1.1232          might be 1.0182x faster

   <geometric>                              730.1670+-7.7325          720.0118+-4.3914          might be 1.0141x faster

                                                  Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                           31.7928+-0.1591     ^     31.4935+-0.1016        ^ definitely 1.0095x faster
Comment 3 Filip Pizlo 2016-02-23 10:09:22 PST
Comment on attachment 271995 [details]
Patch

This seems like a really cool trick to me!  The patch looks good to me.
Comment 4 Benjamin Poulain 2016-02-23 16:03:23 PST
Created attachment 272061 [details]
Patch
Comment 5 Filip Pizlo 2016-02-23 16:24:48 PST
Comment on attachment 272061 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=272061&action=review

> Source/JavaScriptCore/dfg/DFGJITCode.h:142
> +    uint8_t neverExecutedEntry { 0xff };

Why 0xff and not just 1?
Comment 6 Benjamin Poulain 2016-02-23 16:25:37 PST
(In reply to comment #5)
> Comment on attachment 272061 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> 
> > Source/JavaScriptCore/dfg/DFGJITCode.h:142
> > +    uint8_t neverExecutedEntry { 0xff };
> 
> Why 0xff and not just 1?

That was arbitrary. I'll change to 1.
Comment 7 Build Bot 2016-02-23 17:21:52 PST
Comment on attachment 272061 [details]
Patch

Attachment 272061 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/874012

New failing tests:
webaudio/oscillator-square.html
js/regress/get-by-val-with-string-self-or-proto.html
js/regress/string-get-by-val-big-char.html
webaudio/oscillator-custom.html
js/regress/Int16Array-to-Int32Array-set.html
js/regress/Float32Array-to-Float64Array-set.html
webaudio/audiobuffersource-playbackrate.html
webaudio/oscillator-triangle.html
js/regress/rare-osr-exit-on-local.html
js/regress/Float64Array-to-Int16Array-set.html
js/regress/adapt-to-double-divide.html
webaudio/oscillator-sine.html
webaudio/oscillator-sawtooth.html
Comment 8 Build Bot 2016-02-23 17:21:55 PST
Created attachment 272069 [details]
Archive of layout-test-results from ews117 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 9 Benjamin Poulain 2016-02-23 17:38:01 PST
Looks like the new assertion was useful :)
I'll fix that before landing.
Comment 10 Geoffrey Garen 2016-02-23 22:14:07 PST
Comment on attachment 272061 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=272061&action=review

> Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:118
> +        store8(TrustedImm32(0), &m_jitCode->neverExecutedEntry);

FWIW, I think this would be clearer as a store of true to a value named didExecuteEntry.

The way this code reads now, it says "I did not not enter", which hurts my brain.
Comment 11 Saam Barati 2016-02-23 23:36:01 PST
(In reply to comment #10)
> Comment on attachment 272061 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=272061&action=review
> 
> > Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:118
> > +        store8(TrustedImm32(0), &m_jitCode->neverExecutedEntry);
> 
> FWIW, I think this would be clearer as a store of true to a value named
> didExecuteEntry.
> 
> The way this code reads now, it says "I did not not enter", which hurts my
> brain.

I suggested the same thing but as Ben pointed out
to me this is more efficient to do because we 
can always store 0 using the zero register on arm64.

Maybe it's worth adding a function "didExecuteEntry()"
into jitCode to make most code concerning this
field more readable?
Comment 12 Benjamin Poulain 2016-02-25 21:01:29 PST
Created attachment 272289 [details]
Patch for landing
Comment 13 WebKit Commit Bot 2016-02-25 21:59:43 PST
Comment on attachment 272289 [details]
Patch for landing

Clearing flags on attachment: 272289

Committed r197159: <http://trac.webkit.org/changeset/197159>
Comment 14 WebKit Commit Bot 2016-02-25 21:59:47 PST
All reviewed patches have been landed.  Closing bug.
Comment 15 Alexey Proskuryakov 2016-02-27 12:06:14 PST
This broke a test, filed bug 154778.
Comment 16 Filip Pizlo 2016-03-01 21:32:06 PST
Comment on attachment 272289 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=272289&action=review

> Source/JavaScriptCore/ChangeLog:22
> +        What this patch does is set a flag when a DFG function is entered.
> +        If we try to triggerOSREntryNow() and the flag was never set,
> +        we start compiling both the full function and the one for OSR Entry.

I don't think this is an accurate description of what your change does...

> Source/JavaScriptCore/dfg/DFGOperations.cpp:1562
> -    if (!codeBlock->hasOptimizedReplacement())
> -        return 0;
> -    
> -    if (jitCode->osrEntryRetry < Options::ftlOSREntryRetryThreshold()) {
> -        jitCode->osrEntryRetry++;
> -        return 0;
> +    if (!shouldTriggerFTLCompile(codeBlock, jitCode))
> +        return nullptr;
> +
> +    if (!jitCode->neverExecutedEntry) {
> +        triggerFTLReplacementCompile(vm, codeBlock, jitCode);
> +
> +        if (!codeBlock->hasOptimizedReplacement())
> +            return nullptr;

Prior to your change, we would require an optimized replacement and ftlOSREntryRetryThreshold OSR entry attempts before we'd initiate an OSR entry compile.

After your change, a code block that had never executed entry, we will no longer require ftlOSREntryRetryThreshold OSR entry attempts.

Also, your ChangeLog claims that we will trigger both compilations simultaneously.  I don't think that's true.  It looks like if neverExecutedEntry is true, then we won't trigger an entry compile until we actually enter the function.

I actually like the new policy, but maybe it would be good to document it somewhere.
Comment 17 Filip Pizlo 2016-03-01 21:41:39 PST
(In reply to comment #16)
> Comment on attachment 272289 [details]
> Patch for landing
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=272289&action=review
> 
> > Source/JavaScriptCore/ChangeLog:22
> > +        What this patch does is set a flag when a DFG function is entered.
> > +        If we try to triggerOSREntryNow() and the flag was never set,
> > +        we start compiling both the full function and the one for OSR Entry.
> 
> I don't think this is an accurate description of what your change does...
> 
> > Source/JavaScriptCore/dfg/DFGOperations.cpp:1562
> > -    if (!codeBlock->hasOptimizedReplacement())
> > -        return 0;
> > -    
> > -    if (jitCode->osrEntryRetry < Options::ftlOSREntryRetryThreshold()) {
> > -        jitCode->osrEntryRetry++;
> > -        return 0;
> > +    if (!shouldTriggerFTLCompile(codeBlock, jitCode))
> > +        return nullptr;
> > +
> > +    if (!jitCode->neverExecutedEntry) {
> > +        triggerFTLReplacementCompile(vm, codeBlock, jitCode);
> > +
> > +        if (!codeBlock->hasOptimizedReplacement())
> > +            return nullptr;
> 
> Prior to your change, we would require an optimized replacement and
> ftlOSREntryRetryThreshold OSR entry attempts before we'd initiate an OSR
> entry compile.
> 
> After your change, a code block that had never executed entry, we will no
> longer require ftlOSREntryRetryThreshold OSR entry attempts.
> 
> Also, your ChangeLog claims that we will trigger both compilations
> simultaneously.  I don't think that's true.  It looks like if
> neverExecutedEntry is true, then we won't trigger an entry compile until we
> actually enter the function.
> 
> I actually like the new policy, but maybe it would be good to document it
> somewhere.

Oh wow, but that first part - the fact that we no longer do ftlOSREntryRetryThreshold attempts after compiling the replacement - is a huge loss on Octane/regexp after I enable it to run FTL.

I'm going to try to add that heuristic back in.  The point is: if the function is known to have executed entry in DFG, and we have a replacement compile, but we're still in the DFG loop, then it's better to try to wait a bit instead of immediately firing off an OSR entry compile.  That's a good rule in case the function is large and expensive to compile.
Comment 18 Benjamin Poulain 2016-03-01 22:10:11 PST
> Also, your ChangeLog claims that we will trigger both compilations
> simultaneously.  I don't think that's true.  It looks like if
> neverExecutedEntry is true, then we won't trigger an entry compile until we
> actually enter the function.

I think I did fuck that up on some patches.
The patch that landed should have the right thing: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations.cpp#L1622

> Oh wow, but that first part - the fact that we no longer do
> ftlOSREntryRetryThreshold attempts after compiling the replacement - is a
> huge loss on Octane/regexp after I enable it to run FTL.

Okay.

I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch of benchmark and being more aggressive was worth a few percents on a few tests.

Please test everything before adding the threshold.
Comment 19 Filip Pizlo 2016-03-02 12:10:26 PST
(In reply to comment #18)
> > Also, your ChangeLog claims that we will trigger both compilations
> > simultaneously.  I don't think that's true.  It looks like if
> > neverExecutedEntry is true, then we won't trigger an entry compile until we
> > actually enter the function.
> 
> I think I did fuck that up on some patches.
> The patch that landed should have the right thing:
> http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations.
> cpp#L1622
> 
> > Oh wow, but that first part - the fact that we no longer do
> > ftlOSREntryRetryThreshold attempts after compiling the replacement - is a
> > huge loss on Octane/regexp after I enable it to run FTL.
> 
> Okay.
> 
> I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch
> of benchmark and being more aggressive was worth a few percents on a few
> tests.
> 
> Please test everything before adding the threshold.

I tested everything!  It looks OK.  Note that I kept all other aspects of your heuristics, including only triggering an OSR entry compile if entry never executed.
Comment 20 Filip Pizlo 2016-03-02 12:11:20 PST
(In reply to comment #19)
> (In reply to comment #18)
> > > Also, your ChangeLog claims that we will trigger both compilations
> > > simultaneously.  I don't think that's true.  It looks like if
> > > neverExecutedEntry is true, then we won't trigger an entry compile until we
> > > actually enter the function.
> > 
> > I think I did fuck that up on some patches.
> > The patch that landed should have the right thing:
> > http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/dfg/DFGOperations.
> > cpp#L1622
> > 
> > > Oh wow, but that first part - the fact that we no longer do
> > > ftlOSREntryRetryThreshold attempts after compiling the replacement - is a
> > > huge loss on Octane/regexp after I enable it to run FTL.
> > 
> > Okay.
> > 
> > I did that because ftlOSREntryRetryThreshold() was delaying entry on a bunch
> > of benchmark and being more aggressive was worth a few percents on a few
> > tests.
> > 
> > Please test everything before adding the threshold.
> 
> I tested everything!  It looks OK.  Note that I kept all other aspects of
> your heuristics, including only triggering an OSR entry compile if entry
> never executed.

See https://bugs.webkit.org/show_bug.cgi?id=154901#c2