WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
55158
Use VFP for double to int truncate in ARMv7
https://bugs.webkit.org/show_bug.cgi?id=55158
Summary
Use VFP for double to int truncate in ARMv7
Xan Lopez
Reported
2011-02-24 09:57:28 PST
There's some FIXME code in ARMv7 that claims that VFP is not usable in the current conditions for the truncate and division operations. Nevertheless the ARM traditional code uses VFP just fine while sharing the same MacroAssembler APIs, so it stands to reason that either the comment is somewhat obsolete or there is some confusion somewhere (maybe on my side!). In order to test this, I took the VFP truncate stuff and ported it to ARM Thumb2; all tests seem to pass, and I get a 0.6% win in SunSpider. This patch depends on the patch in
https://bugs.webkit.org/show_bug.cgi?id=55046
, since now we need the VFP detection mechanism in ARMv7 too.
Attachments
vcvtr.diff
(4.48 KB, patch)
2011-02-24 09:59 PST
,
Xan Lopez
no flags
Details
Formatted Diff
Diff
fptruncate.diff
(12.93 KB, patch)
2011-04-29 12:08 PDT
,
Xan Lopez
barraclough
: review-
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Xan Lopez
Comment 1
2011-02-24 09:59:53 PST
Created
attachment 83675
[details]
vcvtr.diff
Xan Lopez
Comment 2
2011-02-24 10:01:41 PST
Two notes: Someone suggested to check whether using FPSCR actually worked better. I tried doing something like: #if 0 m_assembler.vcvtr_S32_F64(fpTempRegisterAsSingle(), src); m_assembler.vmov(dest, fpTempRegisterAsSingle()); m_assembler.vmrs(); return makeBranch(ARMv7Assembler::ConditionVS); #endif but 3 tests fail. Is VS checking for all kinds of over and underflows? Is there something wrong in the code? Also, here's the SunSpider compare for reference: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.006x as fast 2037.0ms +/- 0.2% 2024.6ms +/- 0.2% significant ============================================================================= 3d: ?? 278.9ms +/- 1.0% 282.1ms +/- 1.4% not conclusive: might be *1.011x as slow* cube: *1.006x as slow* 93.5ms +/- 0.2% 94.1ms +/- 0.4% significant morph: ?? 65.8ms +/- 1.1% 66.9ms +/- 2.1% not conclusive: might be *1.017x as slow* raytrace: ?? 119.6ms +/- 1.7% 121.1ms +/- 2.0% not conclusive: might be *1.013x as slow* access: - 274.2ms +/- 0.8% 273.8ms +/- 0.6% binary-trees: - 59.2ms +/- 1.7% 59.2ms +/- 1.6% fannkuch: - 103.4ms +/- 1.2% 103.1ms +/- 0.8% nbody: ?? 73.8ms +/- 0.2% 73.9ms +/- 0.1% not conclusive: might be *1.002x as slow* nsieve: - 37.8ms +/- 0.6% 37.5ms +/- 0.5% bitops: ?? 146.9ms +/- 0.1% 147.2ms +/- 0.2% not conclusive: might be *1.002x as slow* 3bit-bits-in-byte: - 23.0ms +/- 0.0% 23.0ms +/- 0.0% bits-in-byte: - 32.0ms +/- 0.1% 32.0ms +/- 0.0% bitwise-and: *1.027x as slow* 39.8ms +/- 0.4% 40.9ms +/- 0.5% significant nsieve-bits: 1.016x as fast 52.1ms +/- 0.2% 51.3ms +/- 0.3% significant controlflow: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% recursive: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% crypto: 1.004x as fast 152.1ms +/- 0.2% 151.6ms +/- 0.1% significant aes: ?? 86.4ms +/- 0.2% 86.5ms +/- 0.2% not conclusive: might be *1.001x as slow* md5: - 39.0ms +/- 0.0% 39.0ms +/- 0.0% sha1: 1.027x as fast 26.7ms +/- 0.5% 26.0ms +/- 0.2% significant date: 1.003x as fast 251.1ms +/- 0.2% 250.3ms +/- 0.2% significant format-tofte: - 124.5ms +/- 0.2% 124.3ms +/- 0.2% format-xparb: 1.005x as fast 126.6ms +/- 0.3% 125.9ms +/- 0.3% significant math: 1.083x as fast 194.5ms +/- 0.1% 179.5ms +/- 0.1% significant cordic: 1.32x as fast 61.4ms +/- 0.2% 46.4ms +/- 0.3% significant partial-sums: - 92.1ms +/- 0.1% 92.1ms +/- 0.1% spectral-norm: - 41.0ms +/- 0.1% 41.1ms +/- 0.2% regexp: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% dna: - 88.8ms +/- 0.4% 88.7ms +/- 0.2% string: *1.002x as slow* 634.4ms +/- 0.1% 635.5ms +/- 0.1% significant base64: *1.006x as slow* 79.0ms +/- 0.2% 79.5ms +/- 0.3% significant fasta: - 76.1ms +/- 0.1% 76.1ms +/- 0.1% tagcloud: *1.003x as slow* 146.6ms +/- 0.2% 147.0ms +/- 0.2% significant unpack-code: ?? 240.3ms +/- 0.1% 240.6ms +/- 0.2% not conclusive: might be *1.001x as slow* validate-input: - 92.3ms +/- 0.2% 92.3ms +/- 0.2%
Eric Seidel (no email)
Comment 3
2011-04-26 15:38:53 PDT
This should be easy for Gavin to review. But I don't speak ARM.
Geoffrey Garen
Comment 4
2011-04-26 16:49:16 PDT
Comment on
attachment 83675
[details]
vcvtr.diff r=me
Gavin Barraclough
Comment 5
2011-04-26 17:03:54 PDT
Xan, This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). r=me with this cleanup. cheers, G.
Gavin Barraclough
Comment 6
2011-04-26 17:05:19 PDT
Comment on
attachment 83675
[details]
vcvtr.diff Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!
Xan Lopez
Comment 7
2011-04-26 17:13:24 PDT
(In reply to
comment #6
)
> (From update of
attachment 83675
[details]
) > Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!
Turns out I forgot the Mac build stuff in
https://bugs.webkit.org/show_bug.cgi?id=55046
(which this patch needs), so I'll need to go through everything again before landing. Feel free to r- for now and I can apply your suggestions and upload again.
Xan Lopez
Comment 8
2011-04-28 16:27:44 PDT
(In reply to
comment #5
)
> Xan, > > This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it. I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint). >
It seems MIPS considers a case where it can support FP in general but not FP truncate, since the methods are not identical, so perhaps we cannot completely remove it (or we should make it MIPS only somehow). CCing MIPS people according to the logs.
Gavin Barraclough
Comment 9
2011-04-29 00:11:00 PDT
Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant! As such, it doesn't seem unreasonable to unify the MIPS floating point support on: #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.
Chao-ying Fu
Comment 10
2011-04-29 11:12:37 PDT
(In reply to
comment #9
)
> Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint. > > Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz. If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant!
I know only Maciej W. Rozycki uses MIPS1. But I don't know if he runs WebKit. Other people should be ok.
> > As such, it doesn't seem unreasonable to unify the MIPS floating point support on: > #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2) > so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods. > > Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.
I am fine to unify the floating-point support by adding WTF_MIPS_ISA_AT_LEAST(2). Thanks! Regards, Chao-ying
Xan Lopez
Comment 11
2011-04-29 12:08:03 PDT
Created
attachment 91715
[details]
fptruncate.diff Thanks for the quick reply, here's the patch with the suggested cleanup. I have another patch that enables fp sqrt for ARMv7, I guess I can do the same cleanup there too. I'll push it to bugzilla ASAP.
Xan Lopez
Comment 12
2011-04-29 12:10:00 PDT
This builds on
bug #55046
, marking the dependency.
Darin Adler
Comment 13
2011-06-18 12:15:27 PDT
Comment on
attachment 83675
[details]
vcvtr.diff Clearing review on the pre-cleanup patch.
Xan Lopez
Comment 14
2011-07-08 04:41:02 PDT
Gaving: ping about this? We only need to land
bug #55046
and then we can land this.
Gavin Barraclough
Comment 15
2011-07-12 23:17:02 PDT
Comment on
attachment 91715
[details]
fptruncate.diff View in context:
https://bugs.webkit.org/attachment.cgi?id=91715&action=review
Looks great xan.
> Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:676 > + m_assembler.sub_S(dataTempRegister, dest, ARMThumbImmediate::makeEncodedImm(0x80000000));
Perhaps we should change the signature of this method to take a JumpList, to avoid the branch-EQ to branch-EQ. Still, this is a second order issue to getting branchTruncateDoubleToInt32 doing anything at all!
Gavin Barraclough
Comment 16
2011-07-13 11:30:23 PDT
Comment on
attachment 91715
[details]
fptruncate.diff r- re comments in 55046, lets not make supportsFloatingPoint() on ARMv7 perform a dynamic check.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug