Bug 75342 - Adjust spill order in DFG
Summary: Adjust spill order in DFG
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-29 04:39 PST by Yuqiang Xian
Modified: 2014-02-05 10:51 PST (History)
5 users (show)

See Also:


Attachments
the patch (4.93 KB, patch)
2011-12-29 04:46 PST, Yuqiang Xian
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
patch updated (4.98 KB, patch)
2012-01-04 22:18 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuqiang Xian 2011-12-29 04:39:16 PST
The register spill order in DFG needs to be adjusted to reflect current spilling policy that no primitive value is boxed. 

Patch forthcoming.
Comment 1 Yuqiang Xian 2011-12-29 04:46:16 PST
Created attachment 120731 [details]
the patch
Comment 2 Yuqiang Xian 2011-12-29 04:52:23 PST
Performance result on 32bit (Core i7 Nehalem Linux): Some cases get obvious improvement while some get slowdown (needs further investigation). Overall almost neutral.

VMs tested:
"ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl
"SpillOrder" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                               ToT                  SpillOrder
SunSpider:
   3d-cube                                7.1117+-0.0836    !     7.4183+-0.0941       ! definitely 1.0431x slower
   3d-morph                              10.6417+-0.0857         10.6191+-0.1000
   3d-raytrace                            9.5057+-0.0722    ?     9.5482+-0.0750       ?
   access-binary-trees                    1.8021+-0.0360    ?     1.8418+-0.0472       ? might be 1.0220x slower
   access-fannkuch                       10.7467+-0.0628    !    10.8864+-0.0622       ! definitely 1.0130x slower
   access-nbody                           5.4244+-0.0447    ^     4.3335+-0.0586       ^ definitely 1.2517x faster
   access-nsieve                          3.7283+-0.0538          3.6961+-0.0432
   bitops-3bit-bits-in-byte               1.1868+-0.0384          1.1696+-0.0299         might be 1.0147x faster
   bitops-bits-in-byte                    4.6701+-0.0406    ?     4.6898+-0.0489       ?
   bitops-bitwise-and                     4.2241+-0.0366          4.2229+-0.0515
   bitops-nsieve-bits                     6.8010+-0.1106          6.7109+-0.1390         might be 1.0134x faster
   controlflow-recursive                  2.7756+-0.0994          2.7524+-0.0462
   crypto-aes                             9.8318+-0.1618          9.7819+-0.1195
   crypto-md5                             3.0158+-0.0298    ?     3.0516+-0.0669       ? might be 1.0118x slower
   crypto-sha1                            2.6072+-0.0896          2.5776+-0.0408         might be 1.0115x faster
   date-format-tofte                     11.6659+-0.1436    ?    11.7558+-0.1014       ?
   date-format-xparb                     11.3672+-0.1453         11.1826+-0.1065         might be 1.0165x faster
   math-cordic                            9.1636+-0.0433    ?     9.1973+-0.0679       ?
   math-partial-sums                     13.8785+-0.0619         13.8386+-0.0674
   math-spectral-norm                     2.5840+-0.0408    ?     2.6044+-0.0486       ?
   regexp-dna                             9.0942+-0.0692    ?     9.1036+-0.0740       ?
   string-base64                          5.3639+-0.0457          5.3459+-0.0590
   string-fasta                           9.4544+-0.0840    ?     9.4748+-0.0962       ?
   string-tagcloud                       15.3900+-0.0881    ?    15.5524+-0.1076       ? might be 1.0106x slower
   string-unpack-code                    24.4761+-0.1017    !    24.7848+-0.2017       ! definitely 1.0126x slower
   string-validate-input                  7.3702+-0.0667    ?     7.4342+-0.0782       ?

   <arithmetic> *                         7.8416+-0.0215          7.8298+-0.0235         might be 1.0015x faster
   <geometric>                            6.2691+-0.0241          6.2310+-0.0176         might be 1.0061x faster
   <harmonic>                             4.7724+-0.0372          4.7361+-0.0222         might be 1.0077x faster

                                               ToT                  SpillOrder
V8:
   crypto                                97.1258+-1.0538    ?    99.0318+-1.2908       ? might be 1.0196x slower
   deltablue                            167.9481+-0.6349    ?   168.0176+-0.8081       ?
   earley-boyer                         107.8401+-0.1878    !   108.4812+-0.4099       ! definitely 1.0059x slower
   raytrace                              53.0712+-0.4534         52.6155+-0.5008
   regexp                               129.2446+-1.0419        128.8020+-0.5767
   richards                             173.4236+-0.3544    ^   171.5025+-1.5316       ^ definitely 1.0112x faster
   splay                                125.3138+-0.4902        124.7199+-0.5666

   <arithmetic>                         121.9953+-0.3170        121.8815+-0.3005         might be 1.0009x faster
   <geometric> *                        114.7610+-0.3722        114.7226+-0.2784         might be 1.0003x faster
   <harmonic>                           106.2956+-0.4520        106.2297+-0.3151         might be 1.0006x faster

                                               ToT                  SpillOrder
Kraken:
   ai-astar                             785.4906+-1.0815    ?   786.9144+-1.8163       ?
   audio-beat-detection                 248.5505+-1.2685    !   256.1680+-0.2308       ! definitely 1.0306x slower
   audio-dft                            355.2405+-1.9364    ?   358.1129+-2.4869       ?
   audio-fft                            155.0614+-0.0526    !   163.8682+-0.6396       ! definitely 1.0568x slower
   audio-oscillator                     335.0814+-1.3393    !   341.2124+-1.6898       ! definitely 1.0183x slower
   imaging-darkroom                     380.4842+-9.7790    ?   382.8186+-10.1730      ?
   imaging-desaturate                   317.8581+-1.0630        317.4988+-0.4798
   imaging-gaussian-blur                596.2950+-1.4765    ^   537.7586+-0.9815       ^ definitely 1.1089x faster
   json-parse-financial                  69.5950+-0.4176    ?    70.1253+-0.5553       ?
   json-stringify-tinderbox             105.5944+-0.8867        104.1428+-1.3432         might be 1.0139x faster
   stanford-crypto-aes                  132.9845+-0.3843    ?   133.6331+-0.3833       ?
   stanford-crypto-ccm                  127.3089+-0.5082    ?   127.8093+-0.4991       ?
   stanford-crypto-pbkdf2               281.0584+-0.6893    !   283.0713+-0.8316       ! definitely 1.0072x slower
   stanford-crypto-sha256-iterative     108.2076+-0.2986        108.1264+-0.3654

   <arithmetic> *                       285.6293+-0.8992    ^   283.6614+-0.8614       ^ definitely 1.0069x faster
   <geometric>                          227.2966+-0.6312    ?   227.6839+-0.7165       ? might be 1.0017x slower
   <harmonic>                           181.7799+-0.4415    ?   182.8114+-0.6295       ? might be 1.0057x slower

                                               ToT                  SpillOrder
All benchmarks:
   <arithmetic>                         107.5885+-0.3070    ^   106.9788+-0.2664       ^ definitely 1.0057x faster
   <geometric>                           28.1674+-0.0808         28.0854+-0.0606         might be 1.0029x faster
   <harmonic>                             8.4065+-0.0638          8.3448+-0.0381         might be 1.0074x faster

                                               ToT                  SpillOrder
Geomean of preferred means:
   <scaled-result>                       63.5818+-0.1712         63.3964+-0.1339         might be 1.0029x faster
Comment 3 Yuqiang Xian 2011-12-29 04:59:52 PST
Performance result on 64bit (Mac OS): Overall almost neutral while a obvious slowdown (8%) on imaging-gaussian-blur is observed (on 32bit there's a 11% improvement for this case), which may be further investigated in the future.

VMs tested:
"ToT" at /webkit/WebKitBuild-orig/Release/jsc
"SpillOrder" at /webkit/WebKitBuild-64/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                               ToT                  SpillOrder
SunSpider:
   3d-cube                                6.5219+-0.0772    ?     6.5384+-0.0815       ?
   3d-morph                               8.0982+-0.0638    ?     8.1566+-0.0522       ?
   3d-raytrace                            8.3841+-0.8049          7.8268+-0.1195         might be 1.0712x faster
   access-binary-trees                    1.7863+-0.0935          1.7486+-0.0512         might be 1.0215x faster
   access-fannkuch                        7.3589+-0.0960          7.3325+-0.0762
   access-nbody                           3.8475+-0.0496    ?     3.8839+-0.0597       ?
   access-nsieve                          3.1162+-0.0518    ?     3.1720+-0.0791       ? might be 1.0179x slower
   bitops-3bit-bits-in-byte               1.2023+-0.0080    ?     1.2062+-0.0125       ?
   bitops-bits-in-byte                    4.8450+-0.0564    ?     4.8767+-0.0639       ?
   bitops-bitwise-and                     3.1650+-0.0216          3.1500+-0.0052
   bitops-nsieve-bits                     5.4344+-0.0415    ?     5.4718+-0.0413       ?
   controlflow-recursive                  2.2807+-0.0339          2.2514+-0.0101         might be 1.0130x faster
   crypto-aes                             8.5308+-0.1246          8.4498+-0.1046
   crypto-md5                             2.4437+-0.0741          2.4285+-0.0325
   crypto-sha1                            2.2442+-0.0703    ?     2.2763+-0.1707       ? might be 1.0143x slower
   date-format-tofte                     11.0530+-0.0630    ?    11.1376+-0.0764       ?
   date-format-xparb                      9.5936+-0.0409    ^     9.4306+-0.0888       ^ definitely 1.0173x faster
   math-cordic                            7.0268+-0.0950          6.9927+-0.0693
   math-partial-sums                     10.3018+-0.0443         10.2964+-0.0644
   math-spectral-norm                     2.5789+-0.0124          2.5767+-0.0079
   regexp-dna                             9.0382+-0.1357    ?     9.1914+-0.1509       ? might be 1.0169x slower
   string-base64                          4.6544+-0.0123    ?     4.6783+-0.0162       ?
   string-fasta                           7.8880+-0.1092    ?     7.8938+-0.1069       ?
   string-tagcloud                       12.8570+-0.1142    ?    12.9316+-0.0944       ?
   string-unpack-code                    21.8200+-0.2621         21.6093+-0.1502
   string-validate-input                  6.0590+-0.0585    ?     6.0633+-0.2153       ?

   <arithmetic> *                         6.6204+-0.0354          6.5989+-0.0214         might be 1.0033x faster
   <geometric>                            5.3390+-0.0330          5.3287+-0.0287         might be 1.0019x faster
   <harmonic>                             4.1997+-0.0354          4.1934+-0.0359         might be 1.0015x faster

                                               ToT                  SpillOrder
V8:
   crypto                                75.4973+-0.2389         75.0776+-0.3059
   deltablue                            161.0954+-0.4027    !   163.9173+-0.4714       ! definitely 1.0175x slower
   earley-boyer                          99.6115+-1.5928    ?   100.1018+-1.4401       ?
   raytrace                              50.3873+-0.2048    ^    49.8641+-0.0921       ^ definitely 1.0105x faster
   regexp                               121.3851+-0.1344    !   122.4113+-0.3352       ! definitely 1.0085x slower
   richards                             135.4591+-0.2543        134.3140+-1.1460
   splay                                 97.6467+-0.9025         96.1546+-1.4374         might be 1.0155x faster

   <arithmetic>                         105.8689+-0.2327    ?   105.9773+-0.2850       ? might be 1.0010x slower
   <geometric> *                         99.7080+-0.2394         99.5747+-0.2911         might be 1.0013x faster
   <harmonic>                            93.0755+-0.2284         92.7282+-0.2595         might be 1.0037x faster

                                               ToT                  SpillOrder
Kraken:
   ai-astar                             837.8626+-49.6049       796.1270+-0.4181         might be 1.0524x faster
   audio-beat-detection                 186.1555+-0.9161    ?   187.6734+-1.2043       ?
   audio-dft                            446.1301+-1.5645    ?   453.0984+-8.1987       ? might be 1.0156x slower
   audio-fft                            122.8065+-7.3403        118.5203+-3.1245         might be 1.0362x faster
   audio-oscillator                     256.0711+-1.2247        255.9696+-1.2486
   imaging-darkroom                     290.8491+-5.3218    ?   291.6089+-5.3305       ?
   imaging-desaturate                   220.6728+-0.0440    ?   220.7556+-0.0591       ?
   imaging-gaussian-blur                497.9859+-0.9100    !   540.6406+-0.0741       ! definitely 1.0857x slower
   json-parse-financial                  62.6418+-1.6843         61.9968+-0.0362         might be 1.0104x faster
   json-stringify-tinderbox              75.5961+-0.1585         75.2695+-0.1742
   stanford-crypto-aes                  111.9879+-0.3878        111.9130+-0.2919
   stanford-crypto-ccm                  109.2299+-0.8334    ?   109.3411+-0.8627       ?
   stanford-crypto-pbkdf2               213.3721+-0.1806    !   214.8904+-0.2552       ! definitely 1.0071x slower
   stanford-crypto-sha256-iterative     104.4605+-0.1416    ?   104.5802+-0.1995       ?

   <arithmetic> *                       252.5587+-3.6387    ?   253.0275+-0.8441       ? might be 1.0019x slower
   <geometric>                          191.5593+-1.3249    ?   191.8617+-0.5199       ? might be 1.0016x slower
   <harmonic>                           151.4256+-1.0019        151.0671+-0.3450         might be 1.0024x faster

                                               ToT                  SpillOrder
All benchmarks:
   <arithmetic>                          94.6603+-1.1013    ?    94.8042+-0.2741       ? might be 1.0015x slower
   <geometric>                           23.9847+-0.1147         23.9658+-0.0879         might be 1.0008x faster
   <harmonic>                             7.3915+-0.0609          7.3801+-0.0618         might be 1.0015x faster

                                               ToT                  SpillOrder
Geomean of preferred means:
   <scaled-result>                       55.0351+-0.3552         54.9871+-0.1370         might be 1.0009x faster
Comment 4 WebKit Review Bot 2011-12-29 05:33:10 PST
Comment on attachment 120731 [details]
the patch

Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/11047213

New failing tests:
http/tests/inspector/resource-tree/resource-tree-document-url.html
Comment 5 Yuqiang Xian 2011-12-29 16:53:23 PST
(In reply to comment #4)
> (From update of attachment 120731 [details])
> Attachment 120731 [details] did not pass chromium-ews (chromium-xvfb):
> Output: http://queues.webkit.org/results/11047213
> 
> New failing tests:
> http/tests/inspector/resource-tree/resource-tree-document-url.html

I think this should be a fake failure as JSC has nothing to do with Chromium.
Comment 6 Adam Barth 2011-12-29 16:59:01 PST
> I think this should be a fake failure as JSC has nothing to do with Chromium.

Thanks.  It looks like that test is somewhat flaky.
Comment 7 Yuqiang Xian 2012-01-04 22:18:30 PST
Created attachment 121216 [details]
patch updated

Update the patch to _not_ adjust the spill order for 64bit, i.e. still Integers have lower priority to be spilled as there may be additional operations - in some cases the spilled integers are boxed as JS integers when being filled (btw, is it necessary?). The regression of the last patch on imaging-gaussian-blur should be caused by this.
So this patch keeps the spill order for 64bit as is, but simply modifies the comments a bit to better describe the reason, and also removes SpillOrderBoolean for 64bit as it's never used.
Comment 8 Filip Pizlo 2013-10-31 11:32:30 PDT
Comment on attachment 121216 [details]
patch updated

View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:169
>      enum SpillOrder {
>          SpillOrderConstant = 1, // no spill, and cheap fill
>          SpillOrderSpilled  = 2, // no spill
> -        SpillOrderJS       = 4, // needs spill
> -        SpillOrderCell     = 4, // needs spill
> -        SpillOrderStorage  = 4, // needs spill
> -        SpillOrderInteger  = 5, // needs spill and box
> -        SpillOrderBoolean  = 5, // needs spill and box
> -        SpillOrderDouble   = 6, // needs spill and convert
> +        SpillOrderStorage  = 3, // needs spill
> +        SpillOrderCell     = 3, // needs spill
> +        SpillOrderJS       = 3, // needs spill
> +        SpillOrderInteger  = 4, // needs spill, and complex fill
> +        SpillOrderDouble   = 5, // needs spill
>      };
>  #elif USE(JSVALUE32_64)
>      enum SpillOrder {
>          SpillOrderConstant = 1, // no spill, and cheap fill
>          SpillOrderSpilled  = 2, // no spill
> -        SpillOrderJS       = 4, // needs spill
> -        SpillOrderStorage  = 4, // needs spill
> -        SpillOrderDouble   = 4, // needs spill
> -        SpillOrderInteger  = 5, // needs spill and box
> -        SpillOrderCell     = 5, // needs spill and box
> -        SpillOrderBoolean  = 5, // needs spill and box
> +        SpillOrderStorage  = 3, // spill involving one GPR
> +        SpillOrderCell     = 3, // spill involving one GPR
> +        SpillOrderInteger  = 3, // spill involving one GPR
> +        SpillOrderBoolean  = 3, // spill involving one GPR
> +        SpillOrderJS       = 4, // spill involving two GPRs
> +        SpillOrderDouble   = 5, // spill involving one FPR
>      };
>  #endif

Does changing any of this have any effect on performance?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674
> -            // We need to box int32 and cell values ...
> -            // but on JSVALUE64 boxing a cell is a no-op!
> -            if (spillFormat == DataFormatInteger)
> -                m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg);
> -            

Why is this removed?  Seems wrong.
Comment 9 Yuqiang Xian 2013-10-31 18:00:27 PDT
Thanks for taking a look at this "nearly two years old" patch. :)

(In reply to comment #8)
> (From update of attachment 121216 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=121216&action=review
> 
> Does changing any of this have any effect on performance?

The performance data was presented in comment #2. The second patch doesn't change the order for JSVALUE64 and as I can remember there's no performance impact on 64bit platforms.

> 
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:-674
> > -            // We need to box int32 and cell values ...
> > -            // but on JSVALUE64 boxing a cell is a no-op!
> > -            if (spillFormat == DataFormatInteger)
> > -                m_jit.orPtr(GPRInfo::tagTypeNumberRegister, reg);
> > -            
> 
> Why is this removed?  Seems wrong.

This can be removed. Actually we already captured the DataFormatInteger (now DataFormatInt32) case in a previous "case", and when we reached here we actually only have the following cases:

RELEASE_ASSERT(spillFormat == DataFormatCell || spillFormat & DataFormatJS);
Comment 10 Anders Carlsson 2014-02-05 10:51:29 PST
Comment on attachment 121216 [details]
patch updated

Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.