55158 – Use VFP for double to int truncate in ARMv7

NEW55158

Use VFP for double to int truncate in ARMv7

https://bugs.webkit.org/show_bug.cgi?id=55158

Summary Use VFP for double to int truncate in ARMv7

Xan Lopez

Reported 2011-02-24 09:57:28 PST

There's some FIXME code in ARMv7 that claims that VFP is not usable in the current conditions for the truncate and division operations. Nevertheless the ARM traditional code uses VFP just fine while sharing the same MacroAssembler APIs, so it stands to reason that either the comment is somewhat obsolete or there is some confusion somewhere (maybe on my side!). In order to test this, I took the VFP truncate stuff and ported it to ARM Thumb2; all tests seem to pass, and I get a 0.6% win in SunSpider. This patch depends on the patch in https://bugs.webkit.org/show_bug.cgi?id=55046, since now we need the VFP detection mechanism in ARMv7 too.

Attachments
vcvtr.diff (4.48 KB, patch) 2011-02-24 09:59 PST, Xan Lopez	no flags	Details Formatted Diff Diff
fptruncate.diff (12.93 KB, patch) 2011-04-29 12:08 PDT, Xan Lopez	barraclough: review-	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Xan Lopez

Comment 1 2011-02-24 09:59:53 PST

Created attachment 83675 [details] vcvtr.diff

Xan Lopez

Comment 2 2011-02-24 10:01:41 PST

Two notes: Someone suggested to check whether using FPSCR actually worked better. I tried doing something like: #if 0 m_assembler.vcvtr_S32_F64(fpTempRegisterAsSingle(), src); m_assembler.vmov(dest, fpTempRegisterAsSingle()); m_assembler.vmrs(); return makeBranch(ARMv7Assembler::ConditionVS); #endif but 3 tests fail. Is VS checking for all kinds of over and underflows? Is there something wrong in the code? Also, here's the SunSpider compare for reference: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.006x as fast 2037.0ms +/- 0.2% 2024.6ms +/- 0.2% significant ============================================================================= 3d: ?? 278.9ms +/- 1.0% 282.1ms +/- 1.4% not conclusive: might be *1.011x as slow* cube: *1.006x as slow* 93.5ms +/- 0.2% 94.1ms +/- 0.4% significant morph: ?? 65.8ms +/- 1.1% 66.9ms +/- 2.1% not conclusive: might be *1.017x as slow* raytrace: ?? 119.6ms +/- 1.7% 121.1ms +/- 2.0% not conclusive: might be *1.013x as slow* access: - 274.2ms +/- 0.8% 273.8ms +/- 0.6% binary-trees: - 59.2ms +/- 1.7% 59.2ms +/- 1.6% fannkuch: - 103.4ms +/- 1.2% 103.1ms +/- 0.8% nbody: ?? 73.8ms +/- 0.2% 73.9ms +/- 0.1% not conclusive: might be *1.002x as slow* nsieve: - 37.8ms +/- 0.6% 37.5ms +/- 0.5% bitops: ?? 146.9ms +/- 0.1% 147.2ms +/- 0.2% not conclusive: might be *1.002x as slow* 3bit-bits-in-byte: - 23.0ms +/- 0.0% 23.0ms +/- 0.0% bits-in-byte: - 32.0ms +/- 0.1% 32.0ms +/- 0.0% bitwise-and: *1.027x as slow* 39.8ms +/- 0.4% 40.9ms +/- 0.5% significant nsieve-bits: 1.016x as fast 52.1ms +/- 0.2% 51.3ms +/- 0.3% significant controlflow: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% recursive: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% crypto: 1.004x as fast 152.1ms +/- 0.2% 151.6ms +/- 0.1% significant aes: ?? 86.4ms +/- 0.2% 86.5ms +/- 0.2% not conclusive: might be *1.001x as slow* md5: - 39.0ms +/- 0.0% 39.0ms +/- 0.0% sha1: 1.027x as fast 26.7ms +/- 0.5% 26.0ms +/- 0.2% significant date: 1.003x as fast 251.1ms +/- 0.2% 250.3ms +/- 0.2% significant format-tofte: - 124.5ms +/- 0.2% 124.3ms +/- 0.2% format-xparb: 1.005x as fast 126.6ms +/- 0.3% 125.9ms +/- 0.3% significant math: 1.083x as fast 194.5ms +/- 0.1% 179.5ms +/- 0.1% significant cordic: 1.32x as fast 61.4ms +/- 0.2% 46.4ms +/- 0.3% significant partial-sums: - 92.1ms +/- 0.1% 92.1ms +/- 0.1% spectral-norm: - 41.0ms +/- 0.1% 41.1ms +/- 0.2% regexp: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% dna: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% string: *1.002x as slow* 634.4ms +/- 0.1% 635.5ms +/- 0.1% significant base64: *1.006x as slow* 79.0ms +/- 0.2% 79.5ms +/- 0.3% significant fasta: - 76.1ms +/- 0.1% 76.1ms +/- 0.1% tagcloud: *1.003x as slow* 146.6ms +/- 0.2% 147.0ms +/- 0.2% significant unpack-code: ?? 240.3ms +/- 0.1% 240.6ms +/- 0.2% not conclusive: might be *1.001x as slow* validate-input: - 92.3ms +/- 0.2% 92.3ms +/- 0.2%

Eric Seidel (no email)

Comment 3 2011-04-26 15:38:53 PDT

This should be easy for Gavin to review. But I don't speak ARM.

Geoffrey Garen

Comment 4 2011-04-26 16:49:16 PDT

Comment on attachment 83675 [details] vcvtr.diff r=me

Gavin Barraclough

Comment 5 2011-04-26 17:03:54 PDT

Xan, This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). r=me with this cleanup. cheers, G.

Gavin Barraclough

Comment 6 2011-04-26 17:05:19 PDT

Comment on attachment 83675 [details] vcvtr.diff Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!

Xan Lopez

Comment 7 2011-04-26 17:13:24 PDT

(In reply to comment #6) > (From update of attachment 83675 [details]) > Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions! Turns out I forgot the Mac build stuff in https://bugs.webkit.org/show_bug.cgi?id=55046 (which this patch needs), so I'll need to go through everything again before landing. Feel free to r- for now and I can apply your suggestions and upload again.

Xan Lopez

Comment 8 2011-04-28 16:27:44 PDT

(In reply to comment #5) > Xan, > > This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). > It seems MIPS considers a case where it can support FP in general but not FP truncate, since the methods are not identical, so perhaps we cannot completely remove it (or we should make it MIPS only somehow). CCing MIPS people according to the logs.

Gavin Barraclough

Comment 9 2011-04-29 00:11:00 PDT

Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant! As such, it doesn't seem unreasonable to unify the MIPS floating point support on: #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.

Chao-ying Fu

Comment 10 2011-04-29 11:12:37 PDT

(In reply to comment #9) > Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. > > Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant! I know only Maciej W. Rozycki uses MIPS1. But I don't know if he runs WebKit. Other people should be ok. > > As such, it doesn't seem unreasonable to unify the MIPS floating point support on: > #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) > so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. > > Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses. I am fine to unify the floating-point support by adding WTF_MIPS_ISA_AT_LEAST(2). Thanks! Regards, Chao-ying

Xan Lopez

Comment 11 2011-04-29 12:08:03 PDT

Created attachment 91715 [details] fptruncate.diff Thanks for the quick reply, here's the patch with the suggested cleanup. I have another patch that enables fp sqrt for ARMv7, I guess I can do the same cleanup there too. I'll push it to bugzilla ASAP.

Xan Lopez

Comment 12 2011-04-29 12:10:00 PDT

This builds on bug #55046, marking the dependency.

Darin Adler

Comment 13 2011-06-18 12:15:27 PDT

Comment on attachment 83675 [details] vcvtr.diff Clearing review on the pre-cleanup patch.

Xan Lopez

Comment 14 2011-07-08 04:41:02 PDT

Gaving: ping about this? We only need to land bug #55046 and then we can land this.

Gavin Barraclough

Comment 15 2011-07-12 23:17:02 PDT

Comment on attachment 91715 [details] fptruncate.diff View in context: https://bugs.webkit.org/attachment.cgi?id=91715&action=review Looks great xan. > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:676 > + m_assembler.sub_S(dataTempRegister, dest, ARMThumbImmediate::makeEncodedImm(0x80000000)); Perhaps we should change the signature of this method to take a JumpList, to avoid the branch-EQ to branch-EQ. Still, this is a second order issue to getting branchTruncateDoubleToInt32 doing anything at all!

Gavin Barraclough

Comment 16 2011-07-13 11:30:23 PDT

Comment on attachment 91715 [details] fptruncate.diff r- re comments in 55046, lets not make supportsFloatingPoint() on ARMv7 perform a dynamic check.

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component JavaScriptCore

Assignee

Nobody

Reported

2011-02-24 09:57 PST

Modified

2011-07-13 11:30 PDT History

CC List

5 users Show

URL

Keywords

Depends on

55046

Blocks

Dependencies

tree graph