Bug 160539 - [JSC] Speed up InPlaceAbstractState::endBasicBlock()
Summary: [JSC] Speed up InPlaceAbstractState::endBasicBlock()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-04 00:09 PDT by Benjamin Poulain
Modified: 2016-08-04 12:34 PDT (History)
5 users (show)

See Also:


Attachments
Patch (15.18 KB, patch)
2016-08-04 00:23 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch for landing (14.90 KB, patch)
2016-08-04 11:46 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-08-04 00:09:26 PDT
[JSC] Speed up InPlaceAbstractState::endBasicBlock()
Comment 1 Benjamin Poulain 2016-08-04 00:23:39 PDT
Created attachment 285305 [details]
Patch
Comment 2 Benjamin Poulain 2016-08-04 00:24:25 PDT
24 runs:

                                                  Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                    4.7019+-0.0445     ?      4.7260+-0.0561        ?
   3d-morph                                   5.0836+-0.0799     ^      4.9649+-0.0244        ^ definitely 1.0239x faster
   3d-raytrace                                4.9661+-0.1071            4.9161+-0.0559          might be 1.0102x faster
   access-binary-trees                        2.0023+-0.0263            1.9813+-0.0175          might be 1.0106x faster
   access-fannkuch                            5.7993+-0.1762            5.6978+-0.0431          might be 1.0178x faster
   access-nbody                               2.3969+-0.0276            2.3558+-0.0251          might be 1.0174x faster
   access-nsieve                              2.9599+-0.0235            2.9555+-0.0190        
   bitops-3bit-bits-in-byte                   1.0751+-0.0229     ?      1.0854+-0.0299        ?
   bitops-bits-in-byte                        2.5775+-0.0164     ?      2.5817+-0.0210        ?
   bitops-bitwise-and                         1.9970+-0.0298            1.9916+-0.0243        
   bitops-nsieve-bits                         3.0999+-0.0508     ?      3.1225+-0.1006        ?
   controlflow-recursive                      2.3036+-0.0123     ?      2.3185+-0.0339        ?
   crypto-aes                                 4.4593+-0.0247            4.4255+-0.0157        
   crypto-md5                                 2.6770+-0.0292            2.6587+-0.0351        
   crypto-sha1                                2.7302+-0.0260     ?      2.7562+-0.0388        ?
   date-format-tofte                          6.5233+-0.0668            6.4830+-0.0387        
   date-format-xparb                          4.7152+-0.0269     ?      4.7308+-0.0337        ?
   math-cordic                                2.7434+-0.0169     ?      2.7471+-0.0198        ?
   math-partial-sums                          3.9862+-0.0359            3.9834+-0.0454        
   math-spectral-norm                         2.0678+-0.0431            2.0581+-0.0369        
   regexp-dna                                 6.4471+-0.1295            6.4428+-0.0842        
   string-base64                              4.0172+-0.0486            3.9858+-0.0242        
   string-fasta                               5.5386+-0.0332            5.5166+-0.0451        
   string-tagcloud                            8.4564+-0.1596            8.4525+-0.1690        
   string-unpack-code                        18.2313+-0.2031           18.1489+-0.2162        
   string-validate-input                      4.0572+-0.0358     ?      4.0676+-0.0298        ?

   <arithmetic>                               4.4467+-0.0138            4.4290+-0.0144          might be 1.0040x faster

                                                  Conf#1                    Conf#2                                      
Octane:
   encrypt                                   0.15532+-0.00134    ?     0.15603+-0.00141       ?
   decrypt                                   2.76892+-0.02251          2.75079+-0.00556       
   deltablue                        x2       0.13315+-0.00364          0.13298+-0.00372       
   earley                                    0.28542+-0.00091    ?     0.28626+-0.00112       ?
   boyer                                     4.94042+-0.04671    ?     4.97826+-0.03856       ?
   navier-stokes                    x2       4.94975+-0.01413          4.94368+-0.00819       
   raytrace                         x2       0.79568+-0.00331          0.79556+-0.00277       
   richards                         x2       0.08226+-0.00068    ?     0.08235+-0.00046       ?
   splay                            x2       0.33844+-0.00093    ?     0.33932+-0.00111       ?
   regexp                           x2      16.62279+-0.26727    ?    16.70929+-0.29884       ?
   pdfjs                            x2      38.90697+-0.17084    ^    38.37963+-0.14737       ^ definitely 1.0137x faster
   mandreel                         x2      42.89519+-0.23220         42.77201+-0.13249       
   gbemu                            x2      29.87420+-0.47615         29.46608+-0.09044         might be 1.0139x faster
   closure                                   0.48411+-0.00113          0.48294+-0.00134       
   jquery                                    6.45182+-0.01299          6.44828+-0.01829       
   box2d                            x2       9.24477+-0.02668    ?     9.24608+-0.03331       ?
   zlib                             x2     365.30700+-4.35224        357.26474+-5.09561         might be 1.0225x faster
   typescript                       x2     604.52266+-3.40624        597.89421+-3.47951         might be 1.0111x faster

   <geometric>                               5.03481+-0.01501          5.01664+-0.01060         might be 1.0036x faster

                                                  Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                   86.698+-0.545      ?      86.725+-0.791         ?
   audio-beat-detection                       38.933+-0.145             38.907+-0.161         
   audio-dft                                  98.381+-1.161             97.826+-0.890         
   audio-fft                                  30.242+-0.024      ?      30.326+-0.109         ?
   audio-oscillator                           49.023+-0.509             48.819+-0.436         
   imaging-darkroom                           61.247+-0.278      ?      61.384+-0.509         ?
   imaging-desaturate                         44.062+-0.566             43.958+-0.326         
   imaging-gaussian-blur                      59.879+-0.743      ?      60.623+-0.889         ? might be 1.0124x slower
   json-parse-financial                       34.682+-0.344             34.153+-0.319           might be 1.0155x faster
   json-stringify-tinderbox                   23.239+-0.199      ^      22.415+-0.249         ^ definitely 1.0367x faster
   stanford-crypto-aes                        37.081+-0.433      ?      37.185+-0.300         ?
   stanford-crypto-ccm                        33.517+-0.969      ?      34.784+-0.780         ? might be 1.0378x slower
   stanford-crypto-pbkdf2                     93.333+-0.490             92.594+-0.319         
   stanford-crypto-sha256-iterative           30.490+-0.119      ?      30.523+-0.121         ?

   <arithmetic>                               51.486+-0.145             51.445+-0.130           might be 1.0008x faster

                                                  Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                               425.7745+-0.8215          425.4747+-1.2422        
   cray.c                                   387.3546+-1.3674          384.7876+-1.4579        
   dry.c                                    466.1051+-33.8684    ^    428.9013+-3.2271        ^ definitely 1.0867x faster
   FloatMM.c                                735.3686+-14.8127         721.2791+-7.1189          might be 1.0195x faster
   gcc-loops.cpp                           3594.9277+-7.0192     ?   3600.2280+-9.0302        ?
   n-body.c                                 800.5333+-1.3423          799.7137+-1.3151        
   Quicksort.c                              398.6741+-2.8060          395.8523+-1.9722        
   stepanov_container.cpp                  3295.6699+-10.0760        3279.1960+-10.5901       
   Towers.c                                 264.8358+-0.5208     ?    265.1645+-0.8941        ?

   <geometric>                              725.8621+-6.1604     ^    717.1344+-1.2986        ^ definitely 1.0122x faster

                                                  Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                           30.2426+-0.0712     ^     30.0891+-0.0366        ^ definitely 1.0051x faster
Comment 3 Mark Lam 2016-08-04 10:15:56 PDT
Comment on attachment 285305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285305&action=review

r=me.

> Source/JavaScriptCore/ChangeLog:17
> +        One key insight is that using HashMap to map Nodes
> +        to Value in valuesAtTail is too inefficient at the scale
> +        we use it.
> +
> +        We need some mapping from node to value, but HashMap
> +        does not cut it. Instead, I reuse our existing mapping
> +        from every Node to its value, abstracted by forNode().

The "We need some mapping from node to value, but HashMap does not cut it." part is redundant with (and not as informative as) the previous sentence.  I suggest removing it

> Source/JavaScriptCore/ChangeLog:19
> +        Since we are not gonna use the mapping after endBasicBlock()

nit: /gonna/going to/.

> Source/JavaScriptCore/ChangeLog:23
> +        In endBasicBlock(), valuesAtTail is now a vector all values live

typo: /vector all/vector of all/?

Also, you capitalize Vector below (as in a reference to the type).  Do you want to capitalize it here?

> Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:179
> -    bool changed = checkAndSet(block->cfaStructureClobberStateAtTail, m_structureClobberState);
> +    checkAndSet(block->cfaStructureClobberStateAtTail, m_structureClobberState);

You're changing this to not use the "check" part (because the changed bool is unused).  Why not just change this to an assignment?
    block->cfaStructureClobberStateAtTail = m_structureClobberState;

> Source/JavaScriptCore/dfg/DFGInPlaceAbstractState.cpp:295
> +            changed |= entry.value.merge(forNode(node));

Above this line, can you add the following?
    ASSERT(from->ssa->valuesAtTail.find(node)->value == forNode(node));

If I read your patch correctly, this is the crux of the patch i.e. you also stash the from->ssa->valuesAtTail.find(node)->value in forNode(node) back in InPlaceAbstractState::endBasicBlock() when you computed from->ssa->valuesAtTail.find(node)->value.  This assertion documents this expectation of equivalence and ensures that nothing breaks it silently.
Comment 4 Benjamin Poulain 2016-08-04 11:46:40 PDT
Created attachment 285342 [details]
Patch for landing
Comment 5 WebKit Commit Bot 2016-08-04 12:34:43 PDT
Comment on attachment 285342 [details]
Patch for landing

Clearing flags on attachment: 285342

Committed r204130: <http://trac.webkit.org/changeset/204130>
Comment 6 WebKit Commit Bot 2016-08-04 12:34:47 PDT
All reviewed patches have been landed.  Closing bug.