Bug 55158 - Use VFP for double to int truncate in ARMv7
Summary: Use VFP for double to int truncate in ARMv7
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 55046
Blocks:
  Show dependency treegraph
 
Reported: 2011-02-24 09:57 PST by Xan Lopez
Modified: 2011-07-13 11:30 PDT (History)
5 users (show)

See Also:


Attachments
vcvtr.diff (4.48 KB, patch)
2011-02-24 09:59 PST, Xan Lopez
no flags Details | Formatted Diff | Diff
fptruncate.diff (12.93 KB, patch)
2011-04-29 12:08 PDT, Xan Lopez
barraclough: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xan Lopez 2011-02-24 09:57:28 PST
There's some FIXME code in ARMv7 that claims that VFP is not usable in the current conditions for the truncate and division operations. Nevertheless the ARM traditional code uses VFP just fine while sharing the same MacroAssembler APIs, so it stands to reason that either the comment is somewhat obsolete or there is some confusion somewhere (maybe on my side!). In order to test this, I took the VFP truncate stuff and ported it to ARM Thumb2; all tests seem to pass, and I get a 0.6% win in SunSpider. This patch depends on the patch in https://bugs.webkit.org/show_bug.cgi?id=55046, since now we need the VFP detection mechanism in ARMv7 too.
Comment 1 Xan Lopez 2011-02-24 09:59:53 PST
Created attachment 83675 [details]
vcvtr.diff
Comment 2 Xan Lopez 2011-02-24 10:01:41 PST
Two notes:

Someone suggested to check whether using FPSCR actually worked better. I tried doing something like:


#if 0
        m_assembler.vcvtr_S32_F64(fpTempRegisterAsSingle(), src);
        m_assembler.vmov(dest, fpTempRegisterAsSingle());
        m_assembler.vmrs();
        return makeBranch(ARMv7Assembler::ConditionVS);
#endif

but 3 tests fail. Is VS checking for all kinds of over and underflows? Is there something wrong in the code?

Also, here's the SunSpider compare for reference:

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.006x as fast    2037.0ms +/- 0.2%   2024.6ms +/- 0.2%     significant

=============================================================================

  3d:                  ??                 278.9ms +/- 1.0%    282.1ms +/- 1.4%     not conclusive: might be *1.011x as slow*
    cube:              *1.006x as slow*    93.5ms +/- 0.2%     94.1ms +/- 0.4%     significant
    morph:             ??                  65.8ms +/- 1.1%     66.9ms +/- 2.1%     not conclusive: might be *1.017x as slow*
    raytrace:          ??                 119.6ms +/- 1.7%    121.1ms +/- 2.0%     not conclusive: might be *1.013x as slow*

  access:              -                  274.2ms +/- 0.8%    273.8ms +/- 0.6% 
    binary-trees:      -                   59.2ms +/- 1.7%     59.2ms +/- 1.6% 
    fannkuch:          -                  103.4ms +/- 1.2%    103.1ms +/- 0.8% 
    nbody:             ??                  73.8ms +/- 0.2%     73.9ms +/- 0.1%     not conclusive: might be *1.002x as slow*
    nsieve:            -                   37.8ms +/- 0.6%     37.5ms +/- 0.5% 

  bitops:              ??                 146.9ms +/- 0.1%    147.2ms +/- 0.2%     not conclusive: might be *1.002x as slow*
    3bit-bits-in-byte: -                   23.0ms +/- 0.0%     23.0ms +/- 0.0% 
    bits-in-byte:      -                   32.0ms +/- 0.1%     32.0ms +/- 0.0% 
    bitwise-and:       *1.027x as slow*    39.8ms +/- 0.4%     40.9ms +/- 0.5%     significant
    nsieve-bits:       1.016x as fast      52.1ms +/- 0.2%     51.3ms +/- 0.3%     significant

  controlflow:         -                   16.0ms +/- 0.0%     16.0ms +/- 0.0% 
    recursive:         -                   16.0ms +/- 0.0%     16.0ms +/- 0.0% 

  crypto:              1.004x as fast     152.1ms +/- 0.2%    151.6ms +/- 0.1%     significant
    aes:               ??                  86.4ms +/- 0.2%     86.5ms +/- 0.2%     not conclusive: might be *1.001x as slow*
    md5:               -                   39.0ms +/- 0.0%     39.0ms +/- 0.0% 
    sha1:              1.027x as fast      26.7ms +/- 0.5%     26.0ms +/- 0.2%     significant

  date:                1.003x as fast     251.1ms +/- 0.2%    250.3ms +/- 0.2%     significant
    format-tofte:      -                  124.5ms +/- 0.2%    124.3ms +/- 0.2% 
    format-xparb:      1.005x as fast     126.6ms +/- 0.3%    125.9ms +/- 0.3%     significant

  math:                1.083x as fast     194.5ms +/- 0.1%    179.5ms +/- 0.1%     significant
    cordic:            1.32x as fast       61.4ms +/- 0.2%     46.4ms +/- 0.3%     significant
    partial-sums:      -                   92.1ms +/- 0.1%     92.1ms +/- 0.1% 
    spectral-norm:     -                   41.0ms +/- 0.1%     41.1ms +/- 0.2% 

  regexp:              -                   88.8ms +/- 0.4%     88.7ms +/- 0.2% 
    dna:               -                   88.8ms +/- 0.4%     88.7ms +/- 0.2% 

  string:              *1.002x as slow*   634.4ms +/- 0.1%    635.5ms +/- 0.1%     significant
    base64:            *1.006x as slow*    79.0ms +/- 0.2%     79.5ms +/- 0.3%     significant
    fasta:             -                   76.1ms +/- 0.1%     76.1ms +/- 0.1% 
    tagcloud:          *1.003x as slow*   146.6ms +/- 0.2%    147.0ms +/- 0.2%     significant
    unpack-code:       ??                 240.3ms +/- 0.1%    240.6ms +/- 0.2%     not conclusive: might be *1.001x as slow*
    validate-input:    -                   92.3ms +/- 0.2%     92.3ms +/- 0.2%
Comment 3 Eric Seidel (no email) 2011-04-26 15:38:53 PDT
This should be easy for Gavin to review. But I don't speak ARM.
Comment 4 Geoffrey Garen 2011-04-26 16:49:16 PDT
Comment on attachment 83675 [details]
vcvtr.diff

r=me
Comment 5 Gavin Barraclough 2011-04-26 17:03:54 PDT
Xan,

This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it.  I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint).

r=me with this cleanup.

cheers,
G.
Comment 6 Gavin Barraclough 2011-04-26 17:05:19 PDT
Comment on attachment 83675 [details]
vcvtr.diff

Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!
Comment 7 Xan Lopez 2011-04-26 17:13:24 PDT
(In reply to comment #6)
> (From update of attachment 83675 [details])
> Okay, I'm going to re-set the r+ so I don't hold this up, but please please do remove the redundant functions!

Turns out I forgot the Mac build stuff in https://bugs.webkit.org/show_bug.cgi?id=55046 (which this patch needs), so I'll need to go through everything again before landing. Feel free to r- for now and I can apply your suggestions and upload again.
Comment 8 Xan Lopez 2011-04-28 16:27:44 PDT
(In reply to comment #5)
> Xan,
> 
> This is great! But, supportsFloatingPointTruncate() only existed because ARMv7 didn't support it.  I think you really should remove this method from all MacroAssemblers as a part of this patch, it would be a great cleanup (any calls to supportsFloatingPointTruncate should now be calls to supportsFloatingPoint).
> 

It seems MIPS considers a case where it can support FP in general but not FP truncate, since the methods are not identical, so perhaps we cannot completely remove it (or we should make it MIPS only somehow). CCing MIPS people according to the logs.
Comment 9 Gavin Barraclough 2011-04-29 00:11:00 PDT
Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint.

Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz.  If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant!

As such, it doesn't seem unreasonable to unify the MIPS floating point support on:
    #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2)
so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods.

Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.
Comment 10 Chao-ying Fu 2011-04-29 11:12:37 PDT
(In reply to comment #9)
> Gah, I thought I'd checked all platforms before making my comment, I'd missed the extra clause on the ifdef in MIPS's supportsFloatingPoint.
> 
> Based on a little wikipedia research, the only chips toting MIPS I architecture appear to be over 20 years old, and run at under 100MHz.  If this understanding is correct, I doubt there are many users out there running the JSC JIT on these chips, and as such the performance benefit for having (some) FP enabled is unlikely to be significant!

I know only Maciej W. Rozycki uses MIPS1.  But I don't know if he runs WebKit.  Other people should be ok.

> 
> As such, it doesn't seem unreasonable to unify the MIPS floating point support on:
>     #if WTF_MIPS_DOUBLE_FLOAT && WTF_MIPS_ISA_AT_LEAST(2)
> so that we can unify the supportsFloatingPoint & supportsFloatingPointTruncate methods.
> 
> Would be great to get some input from the mips folk to confirm that my understanding of the WTF_MIPS_ISA_AT_LEAST is correct, and that we would not be degrading the performance for real deployment uses.

I am fine to unify the floating-point support by adding WTF_MIPS_ISA_AT_LEAST(2).  Thanks!

Regards,
Chao-ying
Comment 11 Xan Lopez 2011-04-29 12:08:03 PDT
Created attachment 91715 [details]
fptruncate.diff

Thanks for the quick reply, here's the patch with the suggested cleanup. I have another patch that enables fp sqrt for ARMv7, I guess I can do the same cleanup there too. I'll push it to bugzilla ASAP.
Comment 12 Xan Lopez 2011-04-29 12:10:00 PDT
This builds on bug #55046, marking the dependency.
Comment 13 Darin Adler 2011-06-18 12:15:27 PDT
Comment on attachment 83675 [details]
vcvtr.diff

Clearing review on the pre-cleanup patch.
Comment 14 Xan Lopez 2011-07-08 04:41:02 PDT
Gaving: ping about this? We only need to land bug #55046 and then we can land this.
Comment 15 Gavin Barraclough 2011-07-12 23:17:02 PDT
Comment on attachment 91715 [details]
fptruncate.diff

View in context: https://bugs.webkit.org/attachment.cgi?id=91715&action=review

Looks great xan.

> Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:676
> +        m_assembler.sub_S(dataTempRegister, dest, ARMThumbImmediate::makeEncodedImm(0x80000000));

Perhaps we should change the signature of this method to take a JumpList, to avoid the branch-EQ to branch-EQ.  Still, this is a second order issue to getting branchTruncateDoubleToInt32 doing anything at all!
Comment 16 Gavin Barraclough 2011-07-13 11:30:23 PDT
Comment on attachment 91715 [details]
fptruncate.diff

r- re comments in 55046, lets not make supportsFloatingPoint() on ARMv7 perform a dynamic check.