Bug 155594 - [JSC] Make CSE's ImpureData faster when dealing with large blocks
Summary: [JSC] Make CSE's ImpureData faster when dealing with large blocks
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-17 12:48 PDT by Benjamin Poulain
Modified: 2016-03-17 21:07 PDT (History)
8 users (show)

See Also:


Attachments
Patch (16.70 KB, patch)
2016-03-17 16:14 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (16.68 KB, patch)
2016-03-17 16:21 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (18.63 KB, patch)
2016-03-17 18:43 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (20.00 KB, patch)
2016-03-17 19:22 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (20.01 KB, patch)
2016-03-17 19:23 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2016-03-17 12:48:16 PDT
[JSC] Make CSE's ImpureData faster when dealing with large blocks
Comment 1 Benjamin Poulain 2016-03-17 16:14:30 PDT
Created attachment 274337 [details]
Patch
Comment 2 Benjamin Poulain 2016-03-17 16:21:36 PDT
Created attachment 274338 [details]
Patch
Comment 3 Benjamin Poulain 2016-03-17 17:15:33 PDT
Worth about 1% on Sunspider on my Haswell MBP:

                                                  Conf#1                    Conf#2                                      
SunSpider:
   3d-cube                                    4.7620+-0.1528     ?      5.0014+-0.2815        ? might be 1.0503x slower
   3d-morph                                   5.6442+-0.6014            5.1633+-0.0740          might be 1.0931x faster
   3d-raytrace                                5.6119+-0.0455     ?      5.6422+-0.1120        ?
   access-binary-trees                        2.1840+-0.0982     ?      2.2405+-0.4026        ? might be 1.0259x slower
   access-fannkuch                            5.9225+-0.6964     ?      6.0875+-0.0801        ? might be 1.0279x slower
   access-nbody                               2.6172+-0.0474            2.5616+-0.1011          might be 1.0217x faster
   access-nsieve                              3.0339+-0.0196     ?      3.0972+-0.0746        ? might be 1.0209x slower
   bitops-3bit-bits-in-byte                   1.1820+-0.0544            1.1519+-0.0194          might be 1.0261x faster
   bitops-bits-in-byte                        2.8886+-0.3271            2.7882+-0.0511          might be 1.0360x faster
   bitops-bitwise-and                         2.0168+-0.0254     ?      2.0743+-0.1251        ? might be 1.0285x slower
   bitops-nsieve-bits                         3.2024+-0.1270            3.1933+-0.0690        
   controlflow-recursive                      2.3562+-0.0547            2.3550+-0.0283        
   crypto-aes                                 4.2165+-0.4866            4.0472+-0.0830          might be 1.0418x faster
   crypto-md5                                 2.5848+-0.0138     ^      2.5099+-0.0507        ^ definitely 1.0298x faster
   crypto-sha1                                2.3505+-0.0463     ?      2.4702+-0.3803        ? might be 1.0509x slower
   date-format-tofte                          7.0665+-0.1434            6.7672+-0.2004          might be 1.0442x faster
   date-format-xparb                          5.0988+-0.4837            4.7952+-0.0534          might be 1.0633x faster
   math-cordic                                2.8953+-0.0496     ?      2.9280+-0.1650        ? might be 1.0113x slower
   math-partial-sums                          4.9780+-0.3295     ?      5.0692+-0.5964        ? might be 1.0183x slower
   math-spectral-norm                         2.1288+-0.3209            2.0122+-0.0190          might be 1.0580x faster
   regexp-dna                                 6.2723+-1.3781            6.0810+-0.7191          might be 1.0315x faster
   string-base64                              4.5028+-0.1108     ?      4.7388+-0.9072        ? might be 1.0524x slower
   string-fasta                               6.1963+-0.3864            5.9904+-0.1745          might be 1.0344x faster
   string-tagcloud                            8.5071+-1.0164            8.0998+-0.1173          might be 1.0503x faster
   string-unpack-code                        18.6016+-0.8419     ?     18.9164+-0.9788        ? might be 1.0169x slower
   string-validate-input                      4.2703+-0.2170            4.2188+-0.1649          might be 1.0122x faster

   <arithmetic>                               4.6574+-0.1135            4.6154+-0.1004          might be 1.0091x faster

                                                  Conf#1                    Conf#2                                      
Octane:
   encrypt                                   0.15853+-0.00595    ?     0.15979+-0.00351       ?
   decrypt                                   2.89425+-0.07303          2.83491+-0.02284         might be 1.0209x faster
   deltablue                        x2       0.13968+-0.00411          0.13907+-0.00364       
   earley                                    0.28603+-0.00424          0.28510+-0.00099       
   boyer                                     5.00685+-0.03148          4.90460+-0.29752         might be 1.0208x faster
   navier-stokes                    x2       4.93707+-0.03806    ?     4.94465+-0.00570       ?
   raytrace                         x2       0.89199+-0.00800          0.89028+-0.01185       
   richards                         x2       0.08227+-0.00105          0.08128+-0.00171         might be 1.0121x faster
   splay                            x2       0.34863+-0.00383          0.34622+-0.00583       
   regexp                           x2      19.92103+-0.05291         19.89375+-0.63229       
   pdfjs                            x2      39.68920+-1.64384         39.39607+-1.54674       
   mandreel                         x2      42.47091+-0.56361         42.23536+-0.30597       
   gbemu                            x2      24.69891+-0.19513         24.56994+-0.18549       
   closure                                   0.56687+-0.00601    ?     0.57261+-0.01139       ? might be 1.0101x slower
   jquery                                    7.45649+-0.12023          7.42992+-0.12502       
   box2d                            x2       9.31075+-0.09297          9.29540+-0.02329       
   zlib                             x2     377.59475+-29.33945       363.35997+-5.49547         might be 1.0392x faster
   typescript                       x2     665.67371+-11.03509       643.76984+-11.76872        might be 1.0340x faster

   <geometric>                               5.21253+-0.03424          5.16709+-0.03924         might be 1.0088x faster

                                                  Conf#1                    Conf#2                                      
Kraken:
   ai-astar                                   88.362+-1.991             87.599+-0.827         
   audio-beat-detection                       45.229+-0.313             45.172+-0.230         
   audio-dft                                 100.450+-6.442             97.193+-2.396           might be 1.0335x faster
   audio-fft                                  35.810+-0.112             35.743+-0.116         
   audio-oscillator                           47.443+-0.509      ?      47.564+-0.390         ?
   imaging-darkroom                           60.868+-2.461             60.726+-1.790         
   imaging-desaturate                         44.586+-0.788      ?      45.959+-5.253         ? might be 1.0308x slower
   imaging-gaussian-blur                      60.520+-7.612             60.275+-7.601         
   json-parse-financial                       36.993+-0.137             36.739+-0.225         
   json-stringify-tinderbox                   23.397+-1.573             23.249+-2.266         
   stanford-crypto-aes                        40.144+-0.079      ?      40.335+-0.540         ?
   stanford-crypto-ccm                        35.667+-2.626             35.213+-1.648           might be 1.0129x faster
   stanford-crypto-pbkdf2                    101.108+-0.954      ?     101.879+-1.316         ?
   stanford-crypto-sha256-iterative           38.750+-0.506      ?      39.067+-1.336         ?

   <arithmetic>                               54.238+-1.117             54.051+-0.758           might be 1.0035x faster

                                                  Conf#1                    Conf#2                                      
AsmBench:
   bigfib.cpp                               437.6711+-19.6185    ?    440.0609+-12.9610       ?
   cray.c                                   358.5124+-5.3609     ?    361.3328+-9.4286        ?
   dry.c                                    417.5919+-6.3117     ?    486.2893+-128.8717      ? might be 1.1645x slower
   FloatMM.c                                715.5072+-8.4764          712.2838+-10.9938       
   gcc-loops.cpp                           3666.7940+-24.0637    ?   3694.4663+-37.0843       ?
   n-body.c                                 807.5378+-5.3084     ?    815.3618+-11.8043       ?
   Quicksort.c                              390.3484+-4.3191     ?    390.7597+-1.3254        ?
   stepanov_container.cpp                  3314.4139+-54.8842        3257.5195+-19.6485         might be 1.0175x faster
   Towers.c                                 267.8196+-0.9613     ?    269.1882+-2.7965        ?

   <geometric>                              713.7646+-4.6354     ?    726.4106+-21.4725       ? might be 1.0177x slower

                                                  Conf#1                    Conf#2                                      
Geomean of preferred means:
   <scaled-result>                           31.1351+-0.3067           31.1058+-0.3086          might be 1.0009x faster
Comment 4 Filip Pizlo 2016-03-17 17:28:11 PDT
Comment on attachment 274338 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=274338&action=review

I suspect that the SideState map is not needed.

> Source/JavaScriptCore/ChangeLog:36
> +        -SideState

Which heap locations list SideState or World as their heap?
Comment 5 Benjamin Poulain 2016-03-17 17:33:51 PDT
(In reply to comment #4)
> Comment on attachment 274338 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=274338&action=review
> 
> I suspect that the SideState map is not needed.
> 
> > Source/JavaScriptCore/ChangeLog:36
> > +        -SideState
> 
> Which heap locations list SideState or World as their heap?

I don't think any. It is just to be complete if someone where to do that.
Since the maps are empty on clobber(), that operation does not do much.
Comment 6 Filip Pizlo 2016-03-17 17:44:13 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > Comment on attachment 274338 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=274338&action=review
> > 
> > I suspect that the SideState map is not needed.
> > 
> > > Source/JavaScriptCore/ChangeLog:36
> > > +        -SideState
> > 
> > Which heap locations list SideState or World as their heap?
> 
> I don't think any. It is just to be complete if someone where to do that.
> Since the maps are empty on clobber(), that operation does not do much.

I believe that it would be wrong to use SideState as the heap of a HeapLocation.  It would also be wrong to use World or Stack directly.

The reason is that a HeapLocation describes some place in the heap that you've already precisely resolved - i.e. it's a global variable and you know which one, or you know it's a field and you know the field's name and the base object Node, or you know it's an array element and you know the kind of array, the base object Node, and the index Node.

If you have resolved something so precisely, then you have no need to use a super abstract catch-all heap like World.

Also, SideState is write-only, so you'd never want to track it for CSE.  It's for writes to things we will read on OSR for example.  It would be wrong to ever read(SideState).  It would be wrong to def using an abstract heap that is never read.  Therefore, SideState is incorrect for HeapLocation.

I think it would be best to actually add a validation rule that rules clobberize() and checks that def() never creates a HeapLocation that refers directly to:
- SideState
- World
- Heap

I think it's OK to have a HeapLocation that refers to Stack, because you might be indexing into the stack.  But if you have a HeapLocation that wants to use World or Heap, then you really should have been more specific.
Comment 7 Benjamin Poulain 2016-03-17 17:47:07 PDT
Comment on attachment 274338 [details]
Patch

I'll make the patch stricter. That's probably be faster too.
Comment 8 Benjamin Poulain 2016-03-17 18:43:30 PDT
Created attachment 274353 [details]
Patch
Comment 9 Filip Pizlo 2016-03-17 18:52:01 PDT
Comment on attachment 274353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=274353&action=review

> Source/JavaScriptCore/dfg/DFGCSEPhase.cpp:194
> +            auto addResult = m_abstractHeapStackMap.add(abstractHeap, nullptr);

Don't you want this to just be an integer-indexed HashMap?  Once you know that you have Stack, the only thing that matters is the payload, which is an int64_t.  But really, it will contain a VirtualRegister, which is just an int32.  That means you can use INT_MIN/INT_MAX for the empty/deleted values, for example.
Comment 10 Benjamin Poulain 2016-03-17 18:57:16 PDT
(In reply to comment #9)
> Comment on attachment 274353 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=274353&action=review
> 
> > Source/JavaScriptCore/dfg/DFGCSEPhase.cpp:194
> > +            auto addResult = m_abstractHeapStackMap.add(abstractHeap, nullptr);
> 
> Don't you want this to just be an integer-indexed HashMap?  Once you know
> that you have Stack, the only thing that matters is the payload, which is an
> int64_t.  But really, it will contain a VirtualRegister, which is just an
> int32.  That means you can use INT_MIN/INT_MAX for the empty/deleted values,
> for example.

That's a good idea, I did not realize the payload was int32.
Comment 11 Benjamin Poulain 2016-03-17 19:22:44 PDT
Created attachment 274357 [details]
Patch
Comment 12 Benjamin Poulain 2016-03-17 19:23:38 PDT
Created attachment 274358 [details]
Patch
Comment 13 WebKit Commit Bot 2016-03-17 21:07:05 PDT
Comment on attachment 274358 [details]
Patch

Clearing flags on attachment: 274358

Committed r198376: <http://trac.webkit.org/changeset/198376>
Comment 14 WebKit Commit Bot 2016-03-17 21:07:09 PDT
All reviewed patches have been landed.  Closing bug.