Bug 74908 - Temporary GPR should not be lazily allocated in DFG JIT on X86
Summary: Temporary GPR should not be lazily allocated in DFG JIT on X86
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-19 20:51 PST by Yuqiang Xian
Modified: 2011-12-19 23:32 PST (History)
2 users (show)

See Also:


Attachments
the patch (2.50 KB, patch)
2011-12-19 21:10 PST, Yuqiang Xian
fpizlo: review+
Details | Formatted Diff | Diff
the patch (3.85 KB, patch)
2011-12-19 23:13 PST, Yuqiang Xian
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuqiang Xian 2011-12-19 20:51:29 PST
On X86, we used to allocate a temporary GPR lazily when it's really used rather than defined. This may cause potential issues of allocating registers inside control flow and result in problems in subsequent code generation, for example the DFG JIT may think an operand already being spilled (to satisfy the allocation request) and generate code to read the data from memory, but the allocation and spilling are in a branch which is not taken at runtime, so the generated code is incorrect.

Although current DFG JIT code doesn't have this problematic pattern, it's better to cut-off the root to avoid any potential issues in the future.

Patch forthcoming.
Comment 1 Yuqiang Xian 2011-12-19 21:10:28 PST
Created attachment 119984 [details]
the patch
Comment 2 Filip Pizlo 2011-12-19 21:40:12 PST
I like this change. But I would like to see what performance effect this has. Can you post some performance comparisons, to show that this is either neutral or a win?
Comment 3 Yuqiang Xian 2011-12-19 22:18:22 PST
Seems some cases (especially, Kraken) are impacted by this change, on Linux IA32 (Core i7 Nehalem), possibly caused by some instruction ordering changes, which may need deeper investigation.

The performance result:

Benchmark report for SunSpider, V8, and Kraken on \c.

VMs tested:
"ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl
"74908" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                               ToT                    74908
SunSpider:
   3d-cube                                8.1819+-1.1747          7.4230+-0.0596         might be 1.1022x faster
   3d-morph                              10.3083+-0.0440    ?    10.3085+-0.0402       ?
   3d-raytrace                            9.5854+-0.0422    ?     9.6253+-0.0472       ?
   access-binary-trees                    2.0577+-0.4597          1.8671+-0.0448         might be 1.1021x faster
   access-fannkuch                       10.5143+-0.0494    ?    10.5700+-0.0385       ?
   access-nbody                           5.4011+-0.0447          5.3716+-0.0419
   access-nsieve                          3.7123+-0.0302          3.6787+-0.0179
   bitops-3bit-bits-in-byte               1.1956+-0.0357          1.1731+-0.0326         might be 1.0192x faster
   bitops-bits-in-byte                    4.6542+-0.0578          4.5883+-0.0299         might be 1.0144x faster
   bitops-bitwise-and                     4.2350+-0.0416          4.2235+-0.0216
   bitops-nsieve-bits                     6.7435+-0.1197          6.7103+-0.0849
   controlflow-recursive                  2.8677+-0.0536          2.8480+-0.0474
   crypto-aes                            10.5288+-0.0560    !    10.7119+-0.0832       ! definitely 1.0174x slower
   crypto-md5                             3.6444+-0.0481          3.6160+-0.0533
   crypto-sha1                            3.0002+-0.0659          2.9547+-0.0457         might be 1.0154x faster
   date-format-tofte                     11.7048+-0.0871    ?    12.4378+-1.5467       ? might be 1.0626x slower
   date-format-xparb                     11.1803+-0.0719    ?    11.1967+-0.0848       ?
   math-cordic                            9.1706+-0.0320    ?     9.1716+-0.0408       ?
   math-partial-sums                     14.5324+-1.4646         13.8742+-0.0552         might be 1.0474x faster
   math-spectral-norm                     2.5975+-0.0438          2.5914+-0.0397
   regexp-dna                             9.1554+-0.0716          9.1394+-0.0621
   string-base64                          5.3683+-0.0487    ?     5.3807+-0.0720       ?
   string-fasta                           9.6004+-0.0515          9.5828+-0.0477
   string-tagcloud                       15.6370+-0.0537    ?    15.6695+-0.0588       ?
   string-unpack-code                    24.9733+-0.2835    ^    24.5256+-0.0843       ^ definitely 1.0183x faster
   string-validate-input                  7.1826+-0.0475    ?     7.1969+-0.0606       ?

   <arithmetic> *                         7.9897+-0.0725          7.9399+-0.0647         might be 1.0063x faster
   <geometric>                            6.4304+-0.0505          6.3797+-0.0265         might be 1.0079x faster
   <harmonic>                             4.9402+-0.0567          4.8836+-0.0293         might be 1.0116x faster

                                               ToT                    74908
V8:
   crypto                                97.0162+-0.4236    ?    97.1926+-0.3836       ?
   deltablue                            166.9306+-0.6094    ?   168.1061+-0.6851       ?
   earley-boyer                         107.9100+-0.2276    ?   108.3725+-0.6809       ?
   raytrace                              52.6538+-0.4061         52.5592+-0.4340
   regexp                               128.7707+-0.7285        128.2945+-0.2053
   richards                             173.4722+-0.6191    !   176.2666+-1.7734       ! definitely 1.0161x slower
   splay                                126.8781+-0.8428    ?   127.1935+-1.4026       ?

   <arithmetic>                         121.9474+-0.2155    !   122.5693+-0.2931       ! definitely 1.0051x slower
   <geometric> *                        114.6731+-0.2402    ?   115.0977+-0.2633       ? might be 1.0037x slower
   <harmonic>                           106.1049+-0.3105    ?   106.3385+-0.2966       ? might be 1.0022x slower

                                               ToT                    74908
Kraken:
   ai-astar                             785.4775+-1.8174        784.4535+-5.3564
   audio-beat-detection                 352.6068+-2.4209    !   362.4537+-1.0279       ! definitely 1.0279x slower
   audio-dft                            344.7694+-1.9229    !   371.6174+-3.1400       ! definitely 1.0779x slower
   audio-fft                            215.2775+-0.1940    !   224.9574+-0.7567       ! definitely 1.0450x slower
   audio-oscillator                     340.3142+-4.0461    ?   343.4859+-2.6913       ?
   imaging-darkroom                     386.6637+-10.3607   ?   387.8107+-10.2604      ?
   imaging-desaturate                   316.4435+-0.3136    ^   289.5850+-0.1891       ^ definitely 1.0927x faster
   imaging-gaussian-blur                628.6579+-1.5163    ^   626.8492+-0.2748       ^ definitely 1.0029x faster
   json-parse-financial                  68.8907+-0.6046    ?    69.0699+-0.5954       ?
   json-stringify-tinderbox              99.8946+-0.3831    !   107.0620+-2.1414       ! definitely 1.0717x slower
   stanford-crypto-aes                  132.6336+-0.5289    ?   134.1565+-1.3846       ? might be 1.0115x slower
   stanford-crypto-ccm                  128.3499+-0.9291    ?   128.4467+-0.4705       ?
   stanford-crypto-pbkdf2               282.3136+-2.1761        282.2923+-0.8014
   stanford-crypto-sha256-iterative     108.4356+-0.2734    !   109.9840+-0.5452       ! definitely 1.0143x slower

   <arithmetic> *                       299.3377+-0.9721    !   301.5874+-1.0977       ! definitely 1.0075x slower
   <geometric>                          238.5068+-0.7272    !   241.3118+-0.8654       ! definitely 1.0118x slower
   <harmonic>                           187.7669+-0.5586    !   190.6914+-0.7903       ! definitely 1.0156x slower

                                               ToT                    74908
All benchmarks:
   <arithmetic>                         111.7467+-0.3133    !   112.4818+-0.3473       ! definitely 1.0066x slower
   <geometric>                           28.9750+-0.1359         28.9653+-0.0887         might be 1.0003x faster
   <harmonic>                             8.6980+-0.0973          8.6030+-0.0503         might be 1.0110x faster

                                               ToT                    74908
Geomean of preferred means:
   <scaled-result>                       64.9695+-0.2435    ?    65.0765+-0.2367       ? might be 1.0016x slower
Comment 4 Filip Pizlo 2011-12-19 22:20:31 PST
Comment on attachment 119984 [details]
the patch

Thanks for running the tests!  The overall performance looks OK to me.  I'll leave it up to you if you want to investigate the Kraken issue more, or if you want to land as-is.  R=me.
Comment 5 Yuqiang Xian 2011-12-19 23:13:19 PST
Created attachment 119991 [details]
the patch

Found the performance root cause and updated the patch.

The new performance result is:

Benchmark report for SunSpider, V8, and Kraken on \c.

VMs tested:
"ToT" at /home/yxian/WebKit_orig/WebKitBuild/Release/Source/JavaScriptCore/shell/jsc_efl
"74908" at /mnt/supplement/WebKit/WebKitBuild/Release_efl/Source/JavaScriptCore/shell/jsc_efl

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.

                                               ToT                    74908
SunSpider:
   3d-cube                                7.4440+-0.0813    ?     7.4742+-0.0635       ?
   3d-morph                              10.3584+-0.0856    ?    10.3893+-0.0923       ?
   3d-raytrace                            9.6399+-0.1474    ?     9.7055+-0.1280       ?
   access-binary-trees                    1.8435+-0.0381    ?     1.8930+-0.0779       ? might be 1.0269x slower
   access-fannkuch                       10.5576+-0.0693    ?    10.5681+-0.0450       ?
   access-nbody                           5.4380+-0.0805          5.3877+-0.0474
   access-nsieve                          3.7275+-0.0526          3.7014+-0.0445
   bitops-3bit-bits-in-byte               1.1854+-0.0383          1.1732+-0.0324         might be 1.0104x faster
   bitops-bits-in-byte                    4.6396+-0.0472          4.6245+-0.0620
   bitops-bitwise-and                     4.2584+-0.0474          4.2339+-0.0436
   bitops-nsieve-bits                     6.7512+-0.0879          6.7050+-0.0850
   controlflow-recursive                  3.2853+-0.5479          2.8959+-0.0633         might be 1.1344x faster
   crypto-aes                            10.5296+-0.0667         10.5109+-0.0734
   crypto-md5                             3.6452+-0.0310    ?     3.6682+-0.0425       ?
   crypto-sha1                            3.0064+-0.0570          2.9904+-0.0770
   date-format-tofte                     11.7823+-0.1143         11.7537+-0.0900
   date-format-xparb                     11.0766+-0.1094    ?    11.4662+-0.4110       ? might be 1.0352x slower
   math-cordic                            9.2163+-0.0833    ?     9.2179+-0.0949       ?
   math-partial-sums                     13.9619+-0.0751    ?    13.9655+-0.0788       ?
   math-spectral-norm                     2.6618+-0.0692          2.6456+-0.0585
   regexp-dna                             9.2515+-0.1169          9.1970+-0.1043
   string-base64                          5.3948+-0.0629    ?     5.3971+-0.0619       ?
   string-fasta                           9.6174+-0.0744    ?     9.6770+-0.0797       ?
   string-tagcloud                       15.7726+-0.0843         15.6713+-0.0877
   string-unpack-code                    24.6931+-0.0797         24.6056+-0.0981
   string-validate-input                  7.2508+-0.0784          7.2343+-0.0573

   <arithmetic> *                         7.9611+-0.0386          7.9520+-0.0259         might be 1.0011x faster
   <geometric>                            6.4309+-0.0476          6.4081+-0.0254         might be 1.0036x faster
   <harmonic>                             4.9437+-0.0469          4.9163+-0.0264         might be 1.0056x faster

                                               ToT                    74908
V8:
   crypto                                97.3909+-0.3260    ?    97.5333+-0.3836       ?
   deltablue                            167.5801+-0.6006    ?   167.9793+-0.8711       ?
   earley-boyer                         108.7405+-0.2172        108.4502+-0.1726
   raytrace                              53.0596+-0.4061         52.8163+-0.4649
   regexp                               128.6210+-0.4573        128.3101+-0.3374
   richards                             175.4681+-1.0956    ^   173.8435+-0.3191       ^ definitely 1.0093x faster
   splay                                127.1825+-1.1579        125.7619+-0.3356         might be 1.0113x faster

   <arithmetic>                         122.5775+-0.2861        122.0992+-0.2293         might be 1.0039x faster
   <geometric> *                        115.2596+-0.2390        114.8277+-0.2664         might be 1.0038x faster
   <harmonic>                           106.6765+-0.2646        106.2852+-0.3581         might be 1.0037x faster

                                               ToT                    74908
Kraken:
   ai-astar                             786.9061+-1.1917    ?   787.0541+-0.6627       ?
   audio-beat-detection                 358.1306+-6.2473        355.5454+-2.8121
   audio-dft                            348.0289+-2.0739    ?   349.0720+-2.3584       ?
   audio-fft                            216.8900+-0.9561    ?   217.2359+-0.6450       ?
   audio-oscillator                     340.9966+-2.6576        339.1060+-3.3476
   imaging-darkroom                     388.1350+-9.4523        385.7510+-10.3425
   imaging-desaturate                   317.7499+-0.7087        317.3586+-1.5265
   imaging-gaussian-blur                631.5004+-0.9397    ^   592.7648+-1.7196       ^ definitely 1.0653x faster
   json-parse-financial                  69.2648+-0.7859         68.9126+-0.5316
   json-stringify-tinderbox             101.1172+-0.6247    ?   101.6840+-0.7057       ?
   stanford-crypto-aes                  133.1517+-0.3595    ^   131.7723+-0.2330       ^ definitely 1.0105x faster
   stanford-crypto-ccm                  128.3865+-0.6013        128.1132+-0.4693
   stanford-crypto-pbkdf2               283.4278+-0.7111    ^   279.5538+-1.4392       ^ definitely 1.0139x faster
   stanford-crypto-sha256-iterative     108.9552+-0.7702        108.2972+-0.3137

   <arithmetic> *                       300.9029+-0.9610    ^   297.3015+-1.1468       ^ definitely 1.0121x faster
   <geometric>                          239.8573+-0.7456    ^   237.9736+-0.8437       ^ definitely 1.0079x faster
   <harmonic>                           188.8577+-0.6692        187.9468+-0.6201         might be 1.0048x faster

                                               ToT                    74908
All benchmarks:
   <arithmetic>                         112.2909+-0.3145    ^   111.1419+-0.3781       ^ definitely 1.0103x faster
   <geometric>                           29.0472+-0.1370         28.9062+-0.0918         might be 1.0049x faster
   <harmonic>                             8.7053+-0.0806          8.6573+-0.0456         might be 1.0055x faster

                                               ToT                    74908
Geomean of preferred means:
   <scaled-result>                       65.1164+-0.1789    ^    64.7499+-0.1814       ^ definitely 1.0057x faster
Comment 6 Filip Pizlo 2011-12-19 23:24:22 PST
Comment on attachment 119991 [details]
the patch

r=me
Comment 7 Yuqiang Xian 2011-12-19 23:32:06 PST
Landed as r103306: http://trac.webkit.org/changeset/103306
Comment 8 Yuqiang Xian 2011-12-19 23:32:44 PST
Comment on attachment 119991 [details]
the patch

Clearing flags.