RESOLVED FIXED Bug 69314
DFG should inline Array.push and Array.pop
https://bugs.webkit.org/show_bug.cgi?id=69314
Summary DFG should inline Array.push and Array.pop
Filip Pizlo
Reported 2011-10-03 19:10:03 PDT
Array.push and pop have a common case that is trivial to inline. They are both easy to mark intrinsic and recognize in the DFG parser.
Attachments
work in progress (18.90 KB, patch)
2011-10-03 19:12 PDT, Filip Pizlo
no flags
the patch (25.54 KB, patch)
2011-10-03 19:32 PDT, Filip Pizlo
oliver: review+
the patch to fix Geoff's concerns (2.14 KB, patch)
2011-10-03 20:10 PDT, Filip Pizlo
ggaren: review+
Filip Pizlo
Comment 1 2011-10-03 19:12:52 PDT
Created attachment 109571 [details] work in progress
Filip Pizlo
Comment 2 2011-10-03 19:13:08 PDT
Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "ArrayPushPop" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree ArrayPushPop SunSpider: 3d-cube 7.5292+-0.2685 7.4916+-0.2162 3d-morph 7.3690+-0.1498 7.3064+-0.1296 3d-raytrace 7.4644+-0.1651 7.4570+-0.2123 access-binary-trees 1.7149+-0.0517 ? 1.7663+-0.0731 ? might be 1.0300x slower access-fannkuch 6.4593+-0.2204 6.3815+-0.1350 might be 1.0122x faster access-nbody 3.4865+-0.0837 ? 3.5506+-0.0637 ? might be 1.0184x slower access-nsieve 2.6700+-0.0462 2.6154+-0.0684 might be 1.0209x faster bitops-3bit-bits-in-byte 1.7404+-0.0418 1.7285+-0.0722 bitops-bits-in-byte 2.7215+-0.0405 ? 2.7849+-0.0656 ? might be 1.0233x slower bitops-bitwise-and 3.3357+-0.0719 ? 3.3819+-0.0840 ? might be 1.0139x slower bitops-nsieve-bits 5.4838+-0.1237 5.4370+-0.1371 controlflow-recursive 2.1070+-0.0606 2.0732+-0.0486 might be 1.0163x faster crypto-aes 6.5896+-0.2535 ? 6.6451+-0.2620 ? crypto-md5 2.8113+-0.0864 ? 2.8687+-0.1216 ? might be 1.0204x slower crypto-sha1 2.5359+-0.0903 2.4997+-0.0786 might be 1.0145x faster date-format-tofte 10.3008+-0.3695 10.0081+-0.2452 might be 1.0292x faster date-format-xparb 9.2723+-0.2153 ? 9.6544+-0.3292 ? might be 1.0412x slower math-cordic 6.4975+-0.1773 6.2694+-0.0924 might be 1.0364x faster math-partial-sums 7.6509+-0.1885 7.5679+-0.1814 might be 1.0110x faster math-spectral-norm 2.8249+-0.0756 2.8108+-0.0630 regexp-dna 10.6957+-0.2133 ? 10.8980+-0.2347 ? might be 1.0189x slower string-base64 5.6088+-0.2108 5.4974+-0.1223 might be 1.0203x faster string-fasta 6.7084+-0.2026 6.5720+-0.1958 might be 1.0208x faster string-tagcloud 11.8739+-0.4024 11.7775+-0.3457 string-unpack-code 21.2762+-0.3991 21.1529+-0.5638 string-validate-input 6.3806+-0.2175 6.3697+-0.1696 <arithmetic> * 6.2734+-0.0391 6.2525+-0.0317 <geometric> 5.1439+-0.0283 5.1324+-0.0220 <harmonic> 4.2177+-0.0296 4.2156+-0.0345 TipOfTree ArrayPushPop V8: crypto 71.9777+-0.3315 71.9465+-0.4020 deltablue 226.6944+-1.3479 ^ 213.3755+-1.1471 ^ definitely 1.0624x faster earley-boyer 87.8452+-1.3144 87.5402+-0.2419 raytrace 61.2115+-0.4091 ! 61.9857+-0.3375 ! definitely 1.0126x slower regexp 103.9358+-0.7486 103.5198+-0.4380 richards 187.2110+-0.4576 186.3760+-0.7210 splay 91.3657+-0.6363 90.6163+-0.6436 <arithmetic> 118.6059+-0.2335 ^ 116.4800+-0.2288 ^ definitely 1.0183x faster <geometric> * 106.5501+-0.2310 ^ 105.5149+-0.1960 ^ definitely 1.0098x faster <harmonic> 97.3001+-0.2407 96.9440+-0.1968 TipOfTree ArrayPushPop Kraken: ai-astar 490.0243+-3.5827 488.5005+-2.0491 audio-beat-detection 189.5874+-0.9309 ? 190.0763+-1.3708 ? audio-dft 269.2471+-2.7212 ? 269.3562+-2.5368 ? audio-fft 125.3247+-0.8031 125.3164+-0.2451 audio-oscillator 245.6467+-1.8417 244.7928+-1.9703 imaging-darkroom 418.5553+-1.0923 ? 419.8428+-1.6280 ? imaging-desaturate 224.0831+-0.4321 223.2770+-0.8439 imaging-gaussian-blur 583.0219+-0.8408 582.1372+-0.7914 json-parse-financial 48.3008+-0.4007 ? 48.9651+-0.2916 ? might be 1.0138x slower json-stringify-tinderbox 67.4519+-0.2791 ! 68.7807+-0.6156 ! definitely 1.0197x slower stanford-crypto-aes 129.2170+-1.4831 ? 130.8275+-2.3204 ? might be 1.0125x slower stanford-crypto-ccm 100.7769+-0.5569 100.6993+-0.7224 stanford-crypto-pbkdf2 194.2613+-0.7755 ^ 192.3477+-0.8872 ^ definitely 1.0099x faster stanford-crypto-sha256-iterative 76.3842+-0.2803 ^ 75.0738+-0.4485 ^ definitely 1.0175x faster <arithmetic> * 225.8488+-0.4877 225.7138+-0.5722 <geometric> 175.5947+-0.4593 ? 175.7403+-0.4814 ? <harmonic> 135.8117+-0.4880 ? 136.2780+-0.4244 ? TipOfTree ArrayPushPop All benchmarks: <arithmetic> 88.4092+-0.1537 ^ 88.0408+-0.1957 ^ definitely 1.0042x faster <geometric> 23.1224+-0.0772 23.0659+-0.0677 <harmonic> 7.4137+-0.0508 7.4102+-0.0592 TipOfTree ArrayPushPop Geomean of preferred means: <scaled-result> 53.2459+-0.1127 ^ 53.0037+-0.1270 ^ definitely 1.0046x faster
Filip Pizlo
Comment 3 2011-10-03 19:32:47 PDT
Created attachment 109574 [details] the patch
Oliver Hunt
Comment 4 2011-10-03 19:36:16 PDT
Comment on attachment 109574 [details] the patch So much sadness in seeing this code written twice :-/
Filip Pizlo
Comment 5 2011-10-03 19:40:28 PDT
Updated performance numbers after implementing 32-bit. Benchmark report for SunSpider, V8, and Kraken. VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc "ArrayPushPop" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree ArrayPushPop SunSpider: 3d-cube 7.4501+-0.2440 7.3643+-0.1557 might be 1.0117x faster 3d-morph 7.5153+-0.1409 7.3244+-0.1445 might be 1.0261x faster 3d-raytrace 7.7931+-0.3258 7.4860+-0.1856 might be 1.0410x faster access-binary-trees 1.7623+-0.0869 1.7397+-0.0671 might be 1.0130x faster access-fannkuch 6.3620+-0.1525 6.3063+-0.1004 access-nbody 3.5119+-0.0704 ? 3.5548+-0.1139 ? might be 1.0122x slower access-nsieve 2.6261+-0.0668 ? 2.6422+-0.0647 ? bitops-3bit-bits-in-byte 1.7484+-0.0531 ? 1.7505+-0.0409 ? bitops-bits-in-byte 2.7235+-0.0699 2.7132+-0.0929 bitops-bitwise-and 3.3628+-0.0731 3.3303+-0.0849 bitops-nsieve-bits 5.3552+-0.1130 ? 5.5315+-0.1475 ? might be 1.0329x slower controlflow-recursive 2.0652+-0.0315 ? 2.0902+-0.0544 ? might be 1.0121x slower crypto-aes 6.7184+-0.2505 ? 6.7252+-0.2908 ? crypto-md5 2.7716+-0.0827 ? 2.8147+-0.0787 ? might be 1.0155x slower crypto-sha1 2.4449+-0.0748 ? 2.5473+-0.0594 ? might be 1.0419x slower date-format-tofte 10.0495+-0.2735 ? 10.1968+-0.2721 ? might be 1.0147x slower date-format-xparb 9.2339+-0.2371 ? 9.4363+-0.1953 ? might be 1.0219x slower math-cordic 6.4047+-0.1813 ? 6.4916+-0.1603 ? might be 1.0136x slower math-partial-sums 7.6492+-0.1104 7.5506+-0.1539 might be 1.0130x faster math-spectral-norm 2.8396+-0.0693 ? 2.8464+-0.0796 ? regexp-dna 10.8828+-0.2515 10.8588+-0.2314 string-base64 5.3824+-0.0973 ? 5.5847+-0.1752 ? might be 1.0376x slower string-fasta 6.6755+-0.1671 6.5792+-0.1215 might be 1.0146x faster string-tagcloud 11.8123+-0.3088 11.5394+-0.3608 might be 1.0236x faster string-unpack-code 21.2662+-0.4159 ? 21.2771+-0.4652 ? string-validate-input 6.4247+-0.1741 ? 6.5574+-0.1890 ? might be 1.0207x slower <arithmetic> * 6.2627+-0.0328 ? 6.2630+-0.0282 ? <geometric> 5.1298+-0.0249 ? 5.1429+-0.0272 ? <harmonic> 4.2046+-0.0371 ? 4.2236+-0.0369 ? TipOfTree ArrayPushPop V8: crypto 72.2659+-0.5199 72.0394+-0.2829 deltablue 226.7959+-2.5189 ^ 212.2174+-1.2944 ^ definitely 1.0687x faster earley-boyer 87.3534+-0.2806 87.2338+-0.2535 raytrace 61.2481+-0.2701 ? 61.6174+-0.6874 ? regexp 103.8244+-0.4626 ? 104.3105+-1.5225 ? richards 187.4970+-1.4137 185.9868+-0.7538 splay 91.2620+-0.4211 ? 92.0918+-0.6788 ? <arithmetic> 118.6067+-0.4224 ^ 116.4996+-0.5168 ^ definitely 1.0181x faster <geometric> * 106.5333+-0.2654 ^ 105.6339+-0.5156 ^ definitely 1.0085x faster <harmonic> 97.2911+-0.2082 97.0665+-0.5095 TipOfTree ArrayPushPop Kraken: ai-astar 499.3114+-1.7404 ^ 495.2293+-1.4996 ^ definitely 1.0082x faster audio-beat-detection 189.2983+-1.3873 ? 190.5103+-0.8675 ? audio-dft 270.4937+-2.4771 ? 279.0681+-8.9224 ? might be 1.0317x slower audio-fft 125.6829+-0.8037 125.4832+-0.4322 audio-oscillator 247.0646+-1.6372 245.4810+-1.8741 imaging-darkroom 418.2131+-1.3020 ? 420.0575+-1.9099 ? imaging-desaturate 223.7626+-0.6231 ? 224.0123+-0.9501 ? imaging-gaussian-blur 583.4748+-1.6843 580.2892+-1.7409 json-parse-financial 48.2553+-0.2874 ? 48.6114+-0.3813 ? json-stringify-tinderbox 69.0173+-0.9414 67.9909+-0.2805 might be 1.0151x faster stanford-crypto-aes 130.2836+-1.5283 ? 130.7792+-1.7017 ? stanford-crypto-ccm 101.0237+-0.4357 ^ 99.8091+-0.5694 ^ definitely 1.0122x faster stanford-crypto-pbkdf2 194.9367+-1.6563 192.7134+-1.9841 might be 1.0115x faster stanford-crypto-sha256-iterative 76.9176+-0.2668 ? 77.6517+-2.6786 ? <arithmetic> * 226.9811+-0.3435 226.9776+-0.8794 <geometric> 176.4986+-0.3526 ? 176.5179+-0.7770 ? <harmonic> 136.5980+-0.3590 136.5628+-0.7900 TipOfTree ArrayPushPop All benchmarks: <arithmetic> 88.7407+-0.1208 88.4260+-0.3158 <geometric> 23.1221+-0.0698 ? 23.1261+-0.0792 ? <harmonic> 7.3920+-0.0636 ? 7.4244+-0.0632 ? TipOfTree ArrayPushPop Geomean of preferred means: <scaled-result> 53.3020+-0.1172 53.1518+-0.1505
Filip Pizlo
Comment 6 2011-10-03 19:55:19 PDT
Landed in r96567.
Geoffrey Garen
Comment 7 2011-10-03 20:01:30 PDT
Looks good overall, but I found some potential problems. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1284 > + speculationCheck(m_jit.branch32(MacroAssembler::Above, storageLengthGPR, TrustedImm32(0x7ffffffe))); Why is 0x7ffffffe special? I think we only need to guard unsigned overflow here. Could this just turn into a jo after the increment of storageLengthGPR? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1346 > + MacroAssembler::Jump notHole = m_jit.branchTestPtr(MacroAssembler::NonZero, valueTagGPR); > + MacroAssembler::Jump holeCase = m_jit.branchTestPtr(MacroAssembler::Zero, valuePayloadGPR); The whole value test in 32_64 should test the tag for equality to JSValue::EmptyValueTag. I think this is one more branch than necessary, and not correct. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1351 > + m_jit.store32(storageLengthGPR, MacroAssembler::BaseIndex(storageGPR, storageLengthGPR, MacroAssembler::ScalePtr, OBJECT_OFFSETOF(ArrayStorage, m_vector[0]) + OBJECT_OFFSETOF(JSValue, u.asBits.tag))); > + m_jit.store32(storageLengthGPR, MacroAssembler::BaseIndex(storageGPR, storageLengthGPR, MacroAssembler::ScalePtr, OBJECT_OFFSETOF(ArrayStorage, m_vector[0]) + OBJECT_OFFSETOF(JSValue, u.asBits.payload))); I think you're trying to leave JSValue() behind in the popped location. Storing zero is not the right way to do that. (See comment above.) > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:1371 > + silentSpillAllRegisters(valueTagGPR, valuePayloadGPR); > + m_jit.push(baseGPR); > + m_jit.push(GPRInfo::callFrameRegister); > + appendCallWithExceptionCheck(operationArrayPop); > + setupResults(valueTagGPR, valuePayloadGPR); > + silentFillAllRegisters(valueTagGPR, valuePayloadGPR); Is inlining pop on a discontiguous array worth it? I'd expect the operation to be very rare.
Filip Pizlo
Comment 8 2011-10-03 20:10:06 PDT
Created attachment 109576 [details] the patch to fix Geoff's concerns
Geoffrey Garen
Comment 9 2011-10-03 20:17:44 PDT
Comment on attachment 109576 [details] the patch to fix Geoff's concerns r=me
Filip Pizlo
Comment 10 2011-10-03 20:21:40 PDT
32-bit fix landed in r96569.
Note You need to log in before you can comment on or make changes to this bug.