Bug 169914 - [JSC][DFG] Propagate AnyIntAsDouble information carefully to utilize it in fixup
Summary: [JSC][DFG] Propagate AnyIntAsDouble information carefully to utilize it in fixup
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-21 06:22 PDT by Yusuke Suzuki
Modified: 2017-03-23 00:51 PDT (History)
6 users (show)

See Also:


Attachments
Patch (5.39 KB, patch)
2017-03-21 06:23 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (5.08 KB, patch)
2017-03-21 07:57 PDT, Yusuke Suzuki
buildbot: commit-queue-
Details | Formatted Diff | Diff
WIP (10.72 KB, patch)
2017-03-21 13:43 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
WIP (9.87 KB, patch)
2017-03-21 14:06 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
WIP: Patch becomes much solid (6.37 KB, patch)
2017-03-21 15:41 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
WIP: Cleaning up right now (9.71 KB, patch)
2017-03-21 18:04 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (11.18 KB, patch)
2017-03-22 13:50 PDT, Yusuke Suzuki
sbarati: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2017-03-21 06:22:54 PDT
[JSC][DFG] Propagate AnyIntAsDouble information carefully to utilize it in fixup
Comment 1 Yusuke Suzuki 2017-03-21 06:23:32 PDT
Created attachment 305000 [details]
Patch

WIP
Comment 2 Yusuke Suzuki 2017-03-21 06:31:49 PDT
We should investigate the heap prediction to offer proper Double speculation for the result of GetByVal. By doing so, we can propagate AnyIntAsDouble information in prediction propagation phase.
And later, in fixup, we can select ArithAdd(Int52, Int52) for such integers.

The current implementation is sooooooooo fragile, it's WIP. We should select appropriate ValueAdd fixup super carefully. For example, when both values are AnyIntAsDouble, we have a chance to make it to ArithAdd(Int52, Int52). But if it always overflows, we should select ArithAdd(Double, Double).

Good news: With the current WIP, we can gain perf improvement in kraken crypto benchmarks.

                                                       baseline                  patched

            stanford-crypto-aes                     58.704+-2.227             56.238+-2.149           might be 1.0438x faster
            stanford-crypto-ccm                     52.842+-3.927             51.027+-2.285           might be 1.0356x faster
            stanford-crypto-pbkdf2                 130.434+-4.405      ^      94.416+-2.397         ^ definitely 1.3815x faster
            stanford-crypto-sha256-iterative        44.315+-0.970      ^      33.151+-0.708         ^ definitely 1.3368x faster

            <arithmetic>                            71.574+-1.483      ^      58.708+-0.960         ^ definitely 1.2191x faster
Comment 3 Yusuke Suzuki 2017-03-21 07:57:56 PDT
Created attachment 305003 [details]
Patch

WIP
Comment 4 Yusuke Suzuki 2017-03-21 08:35:41 PDT
kraken-imaging-darkroom shows 20% regression. Nice catch, let's investigate what happens.
Comment 5 Filip Pizlo 2017-03-21 08:48:24 PDT
Comment on attachment 305003 [details]
Patch

This is super interesting. It's a good idea if it becomes not-fragile!
Comment 6 Build Bot 2017-03-21 09:09:17 PDT
Comment on attachment 305003 [details]
Patch

Attachment 305003 [details] did not pass jsc-ews (mac):
Output: http://webkit-queues.webkit.org/results/3376907

New failing tests:
stress/exit-from-getter.js.ftl-no-cjit-no-inline-validate
stress/exit-from-getter.js.no-cjit-collect-continuously
jsc-layout-tests.yaml/js/script-tests/dfg-int-overflow-in-loop.js.layout-dfg-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/dfg-int-overflow-in-loop.js.layout-no-ftl
stress/exit-from-getter.js.dfg-eager-no-cjit-validate
jsc-layout-tests.yaml/js/script-tests/dfg-int-overflow-in-loop.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/dfg-int-overflow-in-loop.js.layout-no-cjit
stress/exit-from-getter.js.ftl-eager
stress/exit-from-getter.js.ftl-no-cjit-validate-sampling-profiler
stress/exit-from-getter.js.ftl-no-cjit-no-put-stack-validate
stress/exit-from-getter.js.no-cjit-validate-phases
Comment 7 Yusuke Suzuki 2017-03-21 09:14:23 PDT
(In reply to comment #4)
> kraken-imaging-darkroom shows 20% regression. Nice catch, let's investigate
> what happens.

We should honor DoubleConstant. It is offered by SourceCodeRepresentation::Double. It means that users explicitly write 1.0 instead of 1. In that case, handling it as double should be preferable.
Comment 8 Yusuke Suzuki 2017-03-21 13:43:54 PDT
Created attachment 305024 [details]
WIP
Comment 9 Yusuke Suzuki 2017-03-21 14:06:59 PDT
Created attachment 305026 [details]
WIP
Comment 10 Yusuke Suzuki 2017-03-21 14:32:00 PDT
Comment on attachment 305026 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=305026&action=review

Haha, we found bunch of related issues!

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:478
> +        forNode(node).setType(SpecAnyInt);

Note: should be SpecAnyInt since the above constant value can be any int.

> Source/JavaScriptCore/dfg/DFGPredictionPropagationPhase.cpp:408
> +                    changed |= mergePrediction(SpecBytecodeNumber);

Note: Oops!

> Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp:2568
> +                speculate(NegativeZero, noValue(), 0, m_out.isZero64(result));

Note: Oops! part 2.
Comment 11 Yusuke Suzuki 2017-03-21 15:41:27 PDT
Created attachment 305038 [details]
WIP: Patch becomes much solid
Comment 12 Yusuke Suzuki 2017-03-21 16:49:58 PDT
(In reply to Yusuke Suzuki from comment #7)
> (In reply to comment #4)
> > kraken-imaging-darkroom shows 20% regression. Nice catch, let's investigate
> > what happens.
> 
> We should honor DoubleConstant. It is offered by
> SourceCodeRepresentation::Double. It means that users explicitly write 1.0
> instead of 1. In that case, handling it as double should be preferable.

That's rather caused by Int52Rep's incorrect AI result.
Comment 13 Yusuke Suzuki 2017-03-21 18:04:29 PDT
Created attachment 305055 [details]
WIP: Cleaning up right now
Comment 14 Yusuke Suzuki 2017-03-21 18:06:38 PDT
Performance numbers. Seems good.

Benchmark report for SunSpider, Octane, and Kraken on sakura-trick.

VMs tested:
"baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/ic-dfg-tot/Release/bin/jsc
"patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/ic-dfg-value-add/Release/bin/jsc

Collected 50 samples per benchmark/VM, with 50 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                 baseline                  patched
SunSpider:
   3d-cube                                    9.0468+-0.4192            8.8036+-0.4336          might be 1.0276x faster
   3d-morph                                  25.1012+-3.2549     ?     26.3637+-3.4597        ? might be 1.0503x slower
   3d-raytrace                                8.9271+-0.4894     ?      9.1181+-0.4437        ? might be 1.0214x slower
   access-binary-trees                        4.1115+-0.2068     ?      4.1644+-0.2009        ? might be 1.0129x slower
   access-fannkuch                           11.3397+-0.6874           10.7369+-0.6946          might be 1.0561x faster
   access-nbody                               5.1591+-0.2796     ?      5.1814+-0.2438        ?
   access-nsieve                              6.6204+-0.3874            6.3539+-0.3811          might be 1.0419x faster
   bitops-3bit-bits-in-byte                   2.3271+-0.1316     ?      2.4111+-0.1198        ? might be 1.0361x slower
   bitops-bits-in-byte                        5.7656+-0.2966            5.4785+-0.3563          might be 1.0524x faster
   bitops-bitwise-and                         4.2864+-0.2719     ?      4.5494+-0.2811        ? might be 1.0614x slower
   bitops-nsieve-bits                         6.1865+-0.3359            6.1447+-0.3429
   controlflow-recursive                      5.0490+-0.2080            5.0349+-0.2069
   crypto-aes                                 7.7566+-0.4096     ?      8.2666+-0.3546        ? might be 1.0657x slower
   crypto-md5                                 5.0399+-0.2592            4.8526+-0.2656          might be 1.0386x faster
   crypto-sha1                                5.0899+-0.2812     ?      5.3179+-0.3035        ? might be 1.0448x slower
   date-format-tofte                         12.7879+-0.6442     ?     13.2374+-0.6718        ? might be 1.0352x slower
   date-format-xparb                          9.6938+-0.4552     ?      9.7779+-0.4665        ?
   math-cordic                                5.5725+-0.2822     ?      5.7865+-0.2725        ? might be 1.0384x slower
   math-partial-sums                         11.1509+-0.5344     ?     11.3622+-0.5034        ? might be 1.0189x slower
   math-spectral-norm                         4.1114+-0.2003     ?      4.1346+-0.1848        ?
   regexp-dna                                12.4260+-0.7087     ?     12.6531+-0.6848        ? might be 1.0183x slower
   string-base64                              8.0524+-0.3953            7.7479+-0.4451          might be 1.0393x faster
   string-fasta                              10.7088+-0.5774           10.5849+-0.6265          might be 1.0117x faster
   string-tagcloud                           15.0000+-0.6955     ?     15.8826+-0.5997        ? might be 1.0588x slower
   string-unpack-code                        33.6959+-1.1267     ^     31.2100+-1.0928        ^ definitely 1.0796x faster
   string-validate-input                      7.7340+-0.3487            7.5341+-0.3061          might be 1.0265x faster

   <arithmetic>                               9.3362+-0.1705            9.3342+-0.1536          might be 1.0002x faster

                                                 baseline                  patched
Octane:
   encrypt                                   0.22284+-0.00726          0.21788+-0.00208         might be 1.0228x faster
   decrypt                                   3.19350+-0.03107          3.15163+-0.01135         might be 1.0133x faster
   deltablue                        x2       0.19005+-0.00489          0.18991+-0.00378
   earley                                    0.38626+-0.00634          0.38608+-0.00754
   boyer                                     6.81411+-0.05849          6.78422+-0.04951
   navier-stokes                    x2       5.13978+-0.02146    ?     5.14224+-0.02263       ?
   raytrace                         x2       1.07106+-0.00759    ?     1.07823+-0.00639       ?
   richards                         x2       0.10447+-0.00262          0.10354+-0.00372
   splay                            x2       0.41359+-0.00187    ?     0.41372+-0.00254       ?
   regexp                           x2      24.17902+-0.57214         23.88639+-0.34467         might be 1.0123x faster
   pdfjs                            x2      52.99913+-1.12541         51.55391+-0.97258         might be 1.0280x faster
   mandreel                         x2      61.73522+-2.28842         59.43819+-0.62873         might be 1.0386x faster
   gbemu                            x2      47.52163+-1.43188         47.43213+-1.47181
   closure                                   0.76502+-0.01333          0.75352+-0.01460         might be 1.0153x faster
   jquery                                    9.76510+-0.24014    ?     9.84420+-0.24252       ?
   box2d                            x2      13.43593+-0.27014         13.22219+-0.36775         might be 1.0162x faster
   zlib                             x2     467.78036+-4.54568        466.03177+-5.53952
   typescript                       x2     874.60354+-18.01238       856.91126+-17.50218        might be 1.0206x faster

   <geometric>                               6.84285+-0.03025          6.77972+-0.03310         might be 1.0093x faster

                                                 baseline                  patched
Kraken:
   ai-astar                                  156.991+-4.220      ?     160.604+-3.616         ? might be 1.0230x slower
   audio-beat-detection                       52.085+-2.036      ?      54.572+-2.393         ? might be 1.0478x slower
   audio-dft                                 104.422+-1.980            102.996+-2.037           might be 1.0138x faster
   audio-fft                                  40.316+-0.724             40.158+-0.751
   audio-oscillator                           71.545+-1.686             70.198+-2.657           might be 1.0192x faster
   imaging-darkroom                          117.680+-2.308      ?     120.226+-2.270         ? might be 1.0216x slower
   imaging-desaturate                         73.859+-3.123             73.527+-1.867
   imaging-gaussian-blur                     107.189+-2.530            107.122+-2.535
   json-parse-financial                       62.838+-1.349             60.334+-1.798           might be 1.0415x faster
   json-stringify-tinderbox                   39.174+-1.706      ?      39.392+-0.957         ?
   stanford-crypto-aes                        58.237+-1.560             56.834+-1.717           might be 1.0247x faster
   stanford-crypto-ccm                        48.910+-2.176      ?      50.467+-1.892         ? might be 1.0318x slower
   stanford-crypto-pbkdf2                    128.624+-2.491      ^      91.417+-1.972         ^ definitely 1.4070x faster
   stanford-crypto-sha256-iterative           43.015+-0.876      ^      31.761+-0.818         ^ definitely 1.3543x faster

   <arithmetic>                               78.920+-0.666      ^      75.686+-0.493         ^ definitely 1.0427x faster

                                                 baseline                  patched
Geomean of preferred means:
   <scaled-result>                           17.1374+-0.1164     ^     16.8491+-0.1070        ^ definitely 1.0171x faster
Comment 15 Saam Barati 2017-03-22 12:06:46 PDT
Comment on attachment 305055 [details]
WIP: Cleaning up right now

View in context: https://bugs.webkit.org/attachment.cgi?id=305055&action=review

> Source/JavaScriptCore/bytecode/SpeculatedType.h:439
> +inline bool isAnyIntSpeculationForAdd(SpeculatedType value)
> +{
> +    return !!value && (value & (SpecAnyInt | SpecAnyIntAsDouble)) == value;
> +}

Style nit: This seems specialized enough to the DFG's use case that I'd probably make this a lambda inside the function that calls this. The reason is that the name implies to me that this might definitely be the case, but what's really happening is we're making an optimistic assumption.
Comment 16 Yusuke Suzuki 2017-03-22 13:08:38 PDT
Comment on attachment 305055 [details]
WIP: Cleaning up right now

View in context: https://bugs.webkit.org/attachment.cgi?id=305055&action=review

>> Source/JavaScriptCore/bytecode/SpeculatedType.h:439
>> +}
> 
> Style nit: This seems specialized enough to the DFG's use case that I'd probably make this a lambda inside the function that calls this. The reason is that the name implies to me that this might definitely be the case, but what's really happening is we're making an optimistic assumption.

Thanks. OK, I've moved this to DFGGraph::addShouldSpeculateAnyInt.
Comment 17 Yusuke Suzuki 2017-03-22 13:50:52 PDT
Created attachment 305124 [details]
Patch
Comment 18 Yusuke Suzuki 2017-03-22 13:51:08 PDT
OK, ready.
Comment 19 Saam Barati 2017-03-22 18:17:04 PDT
Comment on attachment 305124 [details]
Patch

r=me
Comment 20 Yusuke Suzuki 2017-03-22 23:01:51 PDT
Committed r214296: <http://trac.webkit.org/changeset/214296>
Comment 21 Yusuke Suzuki 2017-03-23 00:51:01 PDT
Track the darkroom regression in https://bugs.webkit.org/show_bug.cgi?id=169998