There's some FIXME code in ARMv7 that claims that VFP is not usable in the current conditions for the truncate and division operations. Nevertheless the ARM traditional code uses VFP just fine while sharing the same MacroAssembler APIs, so it stands to reason that either the comment is somewhat obsolete or there is some confusion somewhere (maybe on my side!). In order to test this, I took the VFP truncate stuff and ported it to ARM Thumb2; all tests seem to pass, and I get a 0.6% win in SunSpider. This patch depends on the patch in https://bugs.webkit.org/show_bug.cgi?id=55046, since now we need the VFP detection mechanism in ARMv7 too.
Created attachment 83675 [details] vcvtr.diff
Two notes: Someone suggested to check whether using FPSCR actually worked better. I tried doing something like: #if 0 m_assembler.vcvtr_S32_F64(fpTempRegisterAsSingle(), src); m_assembler.vmov(dest, fpTempRegisterAsSingle()); m_assembler.vmrs(); return makeBranch(ARMv7Assembler::ConditionVS); #endif but 3 tests fail. Is VS checking for all kinds of over and underflows? Is there something wrong in the code? Also, here's the SunSpider compare for reference: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.006x as fast 2037.0ms +/- 0.2% 2024.6ms +/- 0.2% significant ============================================================================= 3d: ?? 278.9ms +/- 1.0% 282.1ms +/- 1.4% not conclusive: might be *1.011x as slow* cube: *1.006x as slow* 93.5ms +/- 0.2% 94.1ms +/- 0.4% significant morph: ?? 65.8ms +/- 1.1% 66.9ms +/- 2.1% not conclusive: might be *1.017x as slow* raytrace: ?? 119.6ms +/- 1.7% 121.1ms +/- 2.0% not conclusive: might be *1.013x as slow* access: - 274.2ms +/- 0.8% 273.8ms +/- 0.6% binary-trees: - 59.2ms +/- 1.7% 59.2ms +/- 1.6% fannkuch: - 103.4ms +/- 1.2% 103.1ms +/- 0.8% nbody: ?? 73.8ms +/- 0.2% 73.9ms +/- 0.1% not conclusive: might be *1.002x as slow* nsieve: - 37.8ms +/- 0.6% 37.5ms +/- 0.5% bitops: ?? 146.9ms +/- 0.1% 147.2ms +/- 0.2% not conclusive: might be *1.002x as slow* 3bit-bits-in-byte: - 23.0ms +/- 0.0% 23.0ms +/- 0.0% bits-in-byte: - 32.0ms +/- 0.1% 32.0ms +/- 0.0% bitwise-and: *1.027x as slow* 39.8ms +/- 0.4% 40.9ms +/- 0.5% significant nsieve-bits: 1.016x as fast 52.1ms +/- 0.2% 51.3ms +/- 0.3% significant controlflow: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% recursive: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% crypto: 1.004x as fast 152.1ms +/- 0.2% 151.6ms +/- 0.1% significant aes: ?? 86.4ms +/- 0.2% 86.5ms +/- 0.2% not conclusive: might be *1.001x as slow* md5: - 39.0ms +/- 0.0% 39.0ms +/- 0.0% sha1: 1.027x as fast 26.7ms +/- 0.5% 26.0ms +/- 0.2% significant date: 1.003x as fast 251.1ms +/- 0.2% 250.3ms +/- 0.2% significant format-tofte: - 124.5ms +/- 0.2% 124.3ms +/- 0.2% format-xparb: 1.005x as fast 126.6ms +/- 0.3% 125.9ms +/- 0.3% significant math: 1.083x as fast 194.5ms +/- 0.1% 179.5ms +/- 0.1% significant cordic: 1.32x as fast 61.4ms +/- 0.2% 46.4ms +/- 0.3% significant partial-sums: - 92.1ms +/- 0.1% 92.1ms +/- 0.1% spectral-norm: - 41.0ms +/- 0.1% 41.1ms +/- 0.2% regexp: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% dna: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% string: *1.002x as slow* 634.4ms +/- 0.1% 635.5ms +/- 0.1% significant base64: *1.006x as slow* 79.0ms +/- 0.2% 79.5ms +/- 0.3% significant fasta: - 76.1ms +/- 0.1% 76.1ms +/- 0.1% tagcloud: *1.003x as slow* 146.6ms +/- 0.2% 147.0ms +/- 0.2% significant unpack-code: ?? 240.3ms +/- 0.1% 240.6ms +/- 0.2% not conclusive: might be *1.001x as slow* validate-input: - 92.3ms +/- 0.2% 92.3ms +/- 0.2%
This should be easy for Gavin to review. But I don't speak ARM.
Comment on attachment 83675 [details] vcvtr.diff r=me
Xan, This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). r=me with this cleanup. cheers, G.
Comment on attachment 83675 [details] vcvtr.diff Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!
(In reply to comment #6) > (From update of attachment 83675 [details]) > Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions! Turns out I forgot the Mac build stuff in https://bugs.webkit.org/show_bug.cgi?id=55046 (which this patch needs), so I'll need to go through everything again before landing. Feel free to r- for now and I can apply your suggestions and upload again.
(In reply to comment #5) > Xan, > > This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). > It seems MIPS considers a case where it can support FP in general but not FP truncate, since the methods are not identical, so perhaps we cannot completely remove it (or we should make it MIPS only somehow). CCing MIPS people according to the logs.
Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant! As such, it doesn't seem unreasonable to unify the MIPS floating point support on: #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.
(In reply to comment #9) > Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. > > Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant! I know only Maciej W. Rozycki uses MIPS1. But I don't know if he runs WebKit. Other people should be ok. > > As such, it doesn't seem unreasonable to unify the MIPS floating point support on: > #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) > so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. > > Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses. I am fine to unify the floating-point support by adding WTF_MIPS_ISA_AT_LEAST(2). Thanks! Regards, Chao-ying
Created attachment 91715 [details] fptruncate.diff Thanks for the quick reply, here's the patch with the suggested cleanup. I have another patch that enables fp sqrt for ARMv7, I guess I can do the same cleanup there too. I'll push it to bugzilla ASAP.
This builds on bug #55046, marking the dependency.
Comment on attachment 83675 [details] vcvtr.diff Clearing review on the pre-cleanup patch.
Gaving: ping about this? We only need to land bug #55046 and then we can land this.
Comment on attachment 91715 [details] fptruncate.diff View in context: https://bugs.webkit.org/attachment.cgi?id=91715&action=review Looks great xan. > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:676 > + m_assembler.sub_S(dataTempRegister, dest, ARMThumbImmediate::makeEncodedImm(0x80000000)); Perhaps we should change the signature of this method to take a JumpList, to avoid the branch-EQ to branch-EQ. Still, this is a second order issue to getting branchTruncateDoubleToInt32 doing anything at all!
Comment on attachment 91715 [details] fptruncate.diff r- re comments in 55046, lets not make supportsFloatingPoint() on ARMv7 perform a dynamic check.