Bug 69314

Summary: DFG should inline Array.push and Array.pop
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
work in progress
none
the patch
oliver: review+
the patch to fix Geoff's concerns ggaren: review+

Description Filip Pizlo 2011-10-03 19:10:03 PDT
Array.push and pop have a common case that is trivial to inline.  They are both easy to mark intrinsic and recognize in the DFG parser.
Comment 1 Filip Pizlo 2011-10-03 19:12:52 PDT
Created attachment 109571 [details]
work in progress
Comment 2 Filip Pizlo 2011-10-03 19:13:08 PDT
Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"ArrayPushPop" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree              ArrayPushPop                                  
SunSpider:
   3d-cube                                7.5292+-0.2685          7.4916+-0.2162       
   3d-morph                               7.3690+-0.1498          7.3064+-0.1296       
   3d-raytrace                            7.4644+-0.1651          7.4570+-0.2123       
   access-binary-trees                    1.7149+-0.0517    ?     1.7663+-0.0731       ? might be 1.0300x slower
   access-fannkuch                        6.4593+-0.2204          6.3815+-0.1350         might be 1.0122x faster
   access-nbody                           3.4865+-0.0837    ?     3.5506+-0.0637       ? might be 1.0184x slower
   access-nsieve                          2.6700+-0.0462          2.6154+-0.0684         might be 1.0209x faster
   bitops-3bit-bits-in-byte               1.7404+-0.0418          1.7285+-0.0722       
   bitops-bits-in-byte                    2.7215+-0.0405    ?     2.7849+-0.0656       ? might be 1.0233x slower
   bitops-bitwise-and                     3.3357+-0.0719    ?     3.3819+-0.0840       ? might be 1.0139x slower
   bitops-nsieve-bits                     5.4838+-0.1237          5.4370+-0.1371       
   controlflow-recursive                  2.1070+-0.0606          2.0732+-0.0486         might be 1.0163x faster
   crypto-aes                             6.5896+-0.2535    ?     6.6451+-0.2620       ?
   crypto-md5                             2.8113+-0.0864    ?     2.8687+-0.1216       ? might be 1.0204x slower
   crypto-sha1                            2.5359+-0.0903          2.4997+-0.0786         might be 1.0145x faster
   date-format-tofte                     10.3008+-0.3695         10.0081+-0.2452         might be 1.0292x faster
   date-format-xparb                      9.2723+-0.2153    ?     9.6544+-0.3292       ? might be 1.0412x slower
   math-cordic                            6.4975+-0.1773          6.2694+-0.0924         might be 1.0364x faster
   math-partial-sums                      7.6509+-0.1885          7.5679+-0.1814         might be 1.0110x faster
   math-spectral-norm                     2.8249+-0.0756          2.8108+-0.0630       
   regexp-dna                            10.6957+-0.2133    ?    10.8980+-0.2347       ? might be 1.0189x slower
   string-base64                          5.6088+-0.2108          5.4974+-0.1223         might be 1.0203x faster
   string-fasta                           6.7084+-0.2026          6.5720+-0.1958         might be 1.0208x faster
   string-tagcloud                       11.8739+-0.4024         11.7775+-0.3457       
   string-unpack-code                    21.2762+-0.3991         21.1529+-0.5638       
   string-validate-input                  6.3806+-0.2175          6.3697+-0.1696       

   <arithmetic> *                         6.2734+-0.0391          6.2525+-0.0317       
   <geometric>                            5.1439+-0.0283          5.1324+-0.0220       
   <harmonic>                             4.2177+-0.0296          4.2156+-0.0345       

                                            TipOfTree              ArrayPushPop                                  
V8:
   crypto                                71.9777+-0.3315         71.9465+-0.4020       
   deltablue                            226.6944+-1.3479    ^   213.3755+-1.1471       ^ definitely 1.0624x faster
   earley-boyer                          87.8452+-1.3144         87.5402+-0.2419       
   raytrace                              61.2115+-0.4091    !    61.9857+-0.3375       ! definitely 1.0126x slower
   regexp                               103.9358+-0.7486        103.5198+-0.4380       
   richards                             187.2110+-0.4576        186.3760+-0.7210       
   splay                                 91.3657+-0.6363         90.6163+-0.6436       

   <arithmetic>                         118.6059+-0.2335    ^   116.4800+-0.2288       ^ definitely 1.0183x faster
   <geometric> *                        106.5501+-0.2310    ^   105.5149+-0.1960       ^ definitely 1.0098x faster
   <harmonic>                            97.3001+-0.2407         96.9440+-0.1968       

                                            TipOfTree              ArrayPushPop                                  
Kraken:
   ai-astar                             490.0243+-3.5827        488.5005+-2.0491       
   audio-beat-detection                 189.5874+-0.9309    ?   190.0763+-1.3708       ?
   audio-dft                            269.2471+-2.7212    ?   269.3562+-2.5368       ?
   audio-fft                            125.3247+-0.8031        125.3164+-0.2451       
   audio-oscillator                     245.6467+-1.8417        244.7928+-1.9703       
   imaging-darkroom                     418.5553+-1.0923    ?   419.8428+-1.6280       ?
   imaging-desaturate                   224.0831+-0.4321        223.2770+-0.8439       
   imaging-gaussian-blur                583.0219+-0.8408        582.1372+-0.7914       
   json-parse-financial                  48.3008+-0.4007    ?    48.9651+-0.2916       ? might be 1.0138x slower
   json-stringify-tinderbox              67.4519+-0.2791    !    68.7807+-0.6156       ! definitely 1.0197x slower
   stanford-crypto-aes                  129.2170+-1.4831    ?   130.8275+-2.3204       ? might be 1.0125x slower
   stanford-crypto-ccm                  100.7769+-0.5569        100.6993+-0.7224       
   stanford-crypto-pbkdf2               194.2613+-0.7755    ^   192.3477+-0.8872       ^ definitely 1.0099x faster
   stanford-crypto-sha256-iterative      76.3842+-0.2803    ^    75.0738+-0.4485       ^ definitely 1.0175x faster

   <arithmetic> *                       225.8488+-0.4877        225.7138+-0.5722       
   <geometric>                          175.5947+-0.4593    ?   175.7403+-0.4814       ?
   <harmonic>                           135.8117+-0.4880    ?   136.2780+-0.4244       ?

                                            TipOfTree              ArrayPushPop                                  
All benchmarks:
   <arithmetic>                          88.4092+-0.1537    ^    88.0408+-0.1957       ^ definitely 1.0042x faster
   <geometric>                           23.1224+-0.0772         23.0659+-0.0677       
   <harmonic>                             7.4137+-0.0508          7.4102+-0.0592       

                                            TipOfTree              ArrayPushPop                                  
Geomean of preferred means:
   <scaled-result>                       53.2459+-0.1127    ^    53.0037+-0.1270       ^ definitely 1.0046x faster
Comment 3 Filip Pizlo 2011-10-03 19:32:47 PDT
Created attachment 109574 [details]
the patch
Comment 4 Oliver Hunt 2011-10-03 19:36:16 PDT
Comment on attachment 109574 [details]
the patch

So much sadness in seeing this code written twice :-/
Comment 5 Filip Pizlo 2011-10-03 19:40:28 PDT
Updated performance numbers after implementing 32-bit.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"ArrayPushPop" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree              ArrayPushPop                                  
SunSpider:
   3d-cube                                7.4501+-0.2440          7.3643+-0.1557         might be 1.0117x faster
   3d-morph                               7.5153+-0.1409          7.3244+-0.1445         might be 1.0261x faster
   3d-raytrace                            7.7931+-0.3258          7.4860+-0.1856         might be 1.0410x faster
   access-binary-trees                    1.7623+-0.0869          1.7397+-0.0671         might be 1.0130x faster
   access-fannkuch                        6.3620+-0.1525          6.3063+-0.1004       
   access-nbody                           3.5119+-0.0704    ?     3.5548+-0.1139       ? might be 1.0122x slower
   access-nsieve                          2.6261+-0.0668    ?     2.6422+-0.0647       ?
   bitops-3bit-bits-in-byte               1.7484+-0.0531    ?     1.7505+-0.0409       ?
   bitops-bits-in-byte                    2.7235+-0.0699          2.7132+-0.0929       
   bitops-bitwise-and                     3.3628+-0.0731          3.3303+-0.0849       
   bitops-nsieve-bits                     5.3552+-0.1130    ?     5.5315+-0.1475       ? might be 1.0329x slower
   controlflow-recursive                  2.0652+-0.0315    ?     2.0902+-0.0544       ? might be 1.0121x slower
   crypto-aes                             6.7184+-0.2505    ?     6.7252+-0.2908       ?
   crypto-md5                             2.7716+-0.0827    ?     2.8147+-0.0787       ? might be 1.0155x slower
   crypto-sha1                            2.4449+-0.0748    ?     2.5473+-0.0594       ? might be 1.0419x slower
   date-format-tofte                     10.0495+-0.2735    ?    10.1968+-0.2721       ? might be 1.0147x slower
   date-format-xparb                      9.2339+-0.2371    ?     9.4363+-0.1953       ? might be 1.0219x slower
   math-cordic                            6.4047+-0.1813    ?     6.4916+-0.1603       ? might be 1.0136x slower
   math-partial-sums                      7.6492+-0.1104          7.5506+-0.1539         might be 1.0130x faster
   math-spectral-norm                     2.8396+-0.0693    ?     2.8464+-0.0796       ?
   regexp-dna                            10.8828+-0.2515         10.8588+-0.2314       
   string-base64                          5.3824+-0.0973    ?     5.5847+-0.1752       ? might be 1.0376x slower
   string-fasta                           6.6755+-0.1671          6.5792+-0.1215         might be 1.0146x faster
   string-tagcloud                       11.8123+-0.3088         11.5394+-0.3608         might be 1.0236x faster
   string-unpack-code                    21.2662+-0.4159    ?    21.2771+-0.4652       ?
   string-validate-input                  6.4247+-0.1741    ?     6.5574+-0.1890       ? might be 1.0207x slower

   <arithmetic> *                         6.2627+-0.0328    ?     6.2630+-0.0282       ?
   <geometric>                            5.1298+-0.0249    ?     5.1429+-0.0272       ?
   <harmonic>                             4.2046+-0.0371    ?     4.2236+-0.0369       ?

                                            TipOfTree              ArrayPushPop                                  
V8:
   crypto                                72.2659+-0.5199         72.0394+-0.2829       
   deltablue                            226.7959+-2.5189    ^   212.2174+-1.2944       ^ definitely 1.0687x faster
   earley-boyer                          87.3534+-0.2806         87.2338+-0.2535       
   raytrace                              61.2481+-0.2701    ?    61.6174+-0.6874       ?
   regexp                               103.8244+-0.4626    ?   104.3105+-1.5225       ?
   richards                             187.4970+-1.4137        185.9868+-0.7538       
   splay                                 91.2620+-0.4211    ?    92.0918+-0.6788       ?

   <arithmetic>                         118.6067+-0.4224    ^   116.4996+-0.5168       ^ definitely 1.0181x faster
   <geometric> *                        106.5333+-0.2654    ^   105.6339+-0.5156       ^ definitely 1.0085x faster
   <harmonic>                            97.2911+-0.2082         97.0665+-0.5095       

                                            TipOfTree              ArrayPushPop                                  
Kraken:
   ai-astar                             499.3114+-1.7404    ^   495.2293+-1.4996       ^ definitely 1.0082x faster
   audio-beat-detection                 189.2983+-1.3873    ?   190.5103+-0.8675       ?
   audio-dft                            270.4937+-2.4771    ?   279.0681+-8.9224       ? might be 1.0317x slower
   audio-fft                            125.6829+-0.8037        125.4832+-0.4322       
   audio-oscillator                     247.0646+-1.6372        245.4810+-1.8741       
   imaging-darkroom                     418.2131+-1.3020    ?   420.0575+-1.9099       ?
   imaging-desaturate                   223.7626+-0.6231    ?   224.0123+-0.9501       ?
   imaging-gaussian-blur                583.4748+-1.6843        580.2892+-1.7409       
   json-parse-financial                  48.2553+-0.2874    ?    48.6114+-0.3813       ?
   json-stringify-tinderbox              69.0173+-0.9414         67.9909+-0.2805         might be 1.0151x faster
   stanford-crypto-aes                  130.2836+-1.5283    ?   130.7792+-1.7017       ?
   stanford-crypto-ccm                  101.0237+-0.4357    ^    99.8091+-0.5694       ^ definitely 1.0122x faster
   stanford-crypto-pbkdf2               194.9367+-1.6563        192.7134+-1.9841         might be 1.0115x faster
   stanford-crypto-sha256-iterative      76.9176+-0.2668    ?    77.6517+-2.6786       ?

   <arithmetic> *                       226.9811+-0.3435        226.9776+-0.8794       
   <geometric>                          176.4986+-0.3526    ?   176.5179+-0.7770       ?
   <harmonic>                           136.5980+-0.3590        136.5628+-0.7900       

                                            TipOfTree              ArrayPushPop                                  
All benchmarks:
   <arithmetic>                          88.7407+-0.1208         88.4260+-0.3158       
   <geometric>                           23.1221+-0.0698    ?    23.1261+-0.0792       ?
   <harmonic>                             7.3920+-0.0636    ?     7.4244+-0.0632       ?

                                            TipOfTree              ArrayPushPop                                  
Geomean of preferred means:
   <scaled-result>                       53.3020+-0.1172         53.1518+-0.1505
Comment 6 Filip Pizlo 2011-10-03 19:55:19 PDT
Landed in r96567.
Comment 7 Geoffrey Garen 2011-10-03 20:01:30 PDT
Looks good overall, but I found some potential problems.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1284
> +        speculationCheck(m_jit.branch32(MacroAssembler::Above, storageLengthGPR, TrustedImm32(0x7ffffffe)));

Why is 0x7ffffffe special? I think we only need to guard unsigned overflow here. Could this just turn into a jo after the increment of storageLengthGPR?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1346
> +        MacroAssembler::Jump notHole = m_jit.branchTestPtr(MacroAssembler::NonZero, valueTagGPR);
> +        MacroAssembler::Jump holeCase = m_jit.branchTestPtr(MacroAssembler::Zero, valuePayloadGPR);

The whole value test in 32_64 should test the tag for equality to JSValue::EmptyValueTag. I think this is one more branch than necessary, and not correct.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1351
> +        m_jit.store32(storageLengthGPR, MacroAssembler::BaseIndex(storageGPR, storageLengthGPR, MacroAssembler::ScalePtr, OBJECT_OFFSETOF(ArrayStorage, m_vector[0]) + OBJECT_OFFSETOF(JSValue, u.asBits.tag)));
> +        m_jit.store32(storageLengthGPR, MacroAssembler::BaseIndex(storageGPR, storageLengthGPR, MacroAssembler::ScalePtr, OBJECT_OFFSETOF(ArrayStorage, m_vector[0]) + OBJECT_OFFSETOF(JSValue, u.asBits.payload)));

I think you're trying to leave JSValue() behind in the popped location. Storing zero is not the right way to do that. (See comment above.)

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1371
> +        silentSpillAllRegisters(valueTagGPR, valuePayloadGPR);
> +        m_jit.push(baseGPR);
> +        m_jit.push(GPRInfo::callFrameRegister);
> +        appendCallWithExceptionCheck(operationArrayPop);
> +        setupResults(valueTagGPR, valuePayloadGPR);
> +        silentFillAllRegisters(valueTagGPR, valuePayloadGPR);

Is inlining pop on a discontiguous array worth it? I'd expect the operation to be very rare.
Comment 8 Filip Pizlo 2011-10-03 20:10:06 PDT
Created attachment 109576 [details]
the patch to fix Geoff's concerns
Comment 9 Geoffrey Garen 2011-10-03 20:17:44 PDT
Comment on attachment 109576 [details]
the patch to fix Geoff's concerns

r=me
Comment 10 Filip Pizlo 2011-10-03 20:21:40 PDT
32-bit fix landed in r96569.