Bug 15950 - Add evaluateToInt32 and evaluateToUInt32 methods to ExpressionNode subclasses
Summary: Add evaluateToInt32 and evaluateToUInt32 methods to ExpressionNode subclasses
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Macintosh OS X 10.4
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 15963
  Show dependency treegraph
 
Reported: 2007-11-12 02:29 PST by Eric Seidel (no email)
Modified: 2007-11-13 00:17 PST (History)
0 users

See Also:


Attachments
first attempt (sunspider says this is a 0.5% slowdown) (45.65 KB, patch)
2007-11-12 02:30 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
better fix (still 0.1% regression overall, large speedup for math, slowdown for few tests) (52.28 KB, patch)
2007-11-12 14:06 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
better fix (0.3% improvement on SunSpider) (54.92 KB, patch)
2007-11-12 15:12 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
better fix (1.4% speedup according to SunSpider) (58.36 KB, patch)
2007-11-12 22:32 PST, Eric Seidel (no email)
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-11-12 02:29:35 PST
Add evaluateToInt32 and evaluateToUInt32 methods to ExpressionNode subclasses

Hum... Unfortunately this isn't yet a speedup, but I think I still have some nodes which are missing the necessary overrides to take advantage of this opportunity (and to avoid the double-virtual call default implementation).
Comment 1 Eric Seidel (no email) 2007-11-12 02:30:07 PST
Created attachment 17199 [details]
first attempt (sunspider says this is a 0.5% slowdown)
Comment 2 Eric Seidel (no email) 2007-11-12 14:06:37 PST
Created attachment 17212 [details]
better fix (still 0.1% regression overall, large speedup for math, slowdown for few tests)

I'm not quite sure where to go next with this patch, w/o going off in a totally different direction to make it a speedup.  Shark "compare" seems to report that the extra time on the regressed tests is spent in LocalVarAccessNode::evaluateToInt32, but that time should just be moved from the callsites where the callers no longer have to call toInt32().  Maybe the compiler has more trouble with the toInt calls in such a small function since it can't reorder the instructions to avoid stalls.  That's a totally unfounded guess however.
Comment 3 Eric Seidel (no email) 2007-11-12 15:05:08 PST
Huzzah!  It's now 0.3% faster overall after ggaren's suggestion of optimizing ImmediateNumberNode
Sadly although it improves some tests dramatically, it worsens others.
It's not clear to me how to pick up the slack on the slower tests yet.  The slower tests seem to have less to gain from type inferencing, thus they don't call the faster evaluateTo* calls, and instead to up the evaluate() recursion tree.  I believe part of the slowdown from any of these evaluateTo* tests is caused by unnecessary flipping back and forth between the type specific recursive call trees (if you can really term it that).  The worse cases of each of those is when ExpressionNode::evaluateTo* default implementations are called, but I believe I have removed all of the important instances of those (at least none are appearing in the sample).  Those are slow due to turning one virtual call into two (thus doubling the stack impact), again, I think all such instances are gone, but I'm about to add some ASSERTS to check.

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           0.3% faster     4421.6ms +/- 0.1%   4406.2ms +/- 0.1%     significant

=============================================================================

  3d:                  -                502.8ms +/- 0.3%    503.0ms +/- 0.2% 
    cube:              ??               166.0ms +/- 0.5%    167.0ms +/- 0.5%     not conclusive: might be 0.6% *slower*
    morph:             0.7% faster      163.2ms +/- 0.3%    162.0ms +/- 0.9%     significant
    raytrace:          ??               173.6ms +/- 0.4%    174.0ms +/- 0.0%     not conclusive: might be 0.2% *slower*

  access:              1.3% *slower*    708.6ms +/- 0.1%    717.6ms +/- 0.3%     significant
    binary-trees:      0.7% *slower*    108.2ms +/- 0.5%    109.0ms +/- 0.8%     significant
    fannkuch:          3.7% *slower*    336.2ms +/- 0.2%    348.8ms +/- 0.2%     significant
    nbody:             2.0% faster      185.6ms +/- 0.4%    181.8ms +/- 0.3%     significant
    nsieve:            0.8% faster       78.6ms +/- 0.9%     78.0ms +/- 0.0%     significant

  bitops:              3.4% faster      644.4ms +/- 0.2%    622.6ms +/- 0.3%     significant
    3bit-bits-in-byte: 5.5% faster      122.6ms +/- 1.4%    115.8ms +/- 0.9%     significant
    bits-in-byte:      1.7% faster      160.2ms +/- 0.3%    157.4ms +/- 0.4%     significant
    bitwise-and:       0.7% faster      214.0ms +/- 0.0%    212.6ms +/- 0.3%     significant
    nsieve-bits:       7.3% faster      147.6ms +/- 0.5%    136.8ms +/- 0.8%     significant

  controlflow:         ??               155.6ms +/- 0.4%    156.4ms +/- 0.4%     not conclusive: might be 0.5% *slower*
    recursive:         ??               155.6ms +/- 0.4%    156.4ms +/- 0.4%     not conclusive: might be 0.5% *slower*

  crypto:              5.1% faster      364.2ms +/- 0.3%    345.6ms +/- 0.6%     significant
    aes:               -                 97.8ms +/- 0.6%     97.8ms +/- 0.6% 
    md5:               7.0% faster      134.0ms +/- 1.1%    124.6ms +/- 1.1%     significant
    sha1:              6.9% faster      132.4ms +/- 0.8%    123.2ms +/- 0.8%     significant

  date:                0.7% *slower*    362.2ms +/- 0.4%    364.6ms +/- 0.3%     significant
    format-tofte:      ??               160.8ms +/- 0.6%    161.8ms +/- 0.3%     not conclusive: might be 0.6% *slower*
    format-xparb:      0.7% *slower*    201.4ms +/- 0.6%    202.8ms +/- 0.7%     significant

  math:                1.8% *slower*    514.4ms +/- 0.1%    523.8ms +/- 0.3%     significant
    cordic:            4.8% *slower*    214.8ms +/- 0.5%    225.2ms +/- 0.5%     significant
    partial-sums:      1.1% faster      176.6ms +/- 0.4%    174.6ms +/- 0.4%     significant
    spectral-norm:     0.8% *slower*    123.0ms +/- 0.0%    124.0ms +/- 0.7%     significant

  regexp:              ??               297.4ms +/- 0.5%    298.6ms +/- 0.8%     not conclusive: might be 0.4% *slower*
    dna:               ??               297.4ms +/- 0.5%    298.6ms +/- 0.8%     not conclusive: might be 0.4% *slower*

  string:              ??               872.0ms +/- 0.3%    874.0ms +/- 0.1%     not conclusive: might be 0.2% *slower*
    base64:            1.3% *slower*    121.4ms +/- 0.6%    123.0ms +/- 0.0%     significant
    fasta:             1.5% *slower*    208.6ms +/- 0.3%    211.8ms +/- 0.5%     significant
    tagcloud:          ??               198.0ms +/- 0.9%    199.4ms +/- 0.7%     not conclusive: might be 0.7% *slower*
    unpack-code:       1.8% faster      204.4ms +/- 0.3%    200.8ms +/- 0.3%     significant
    validate-input:    -                139.6ms +/- 0.8%    139.0ms +/- 1.3% 

Comment 4 Eric Seidel (no email) 2007-11-12 15:12:42 PST
Created attachment 17214 [details]
better fix (0.3% improvement on SunSpider)

This could actually go up for review... but I think I'm gonna look at making it a little faster yet first.
Comment 5 Eric Seidel (no email) 2007-11-12 22:32:48 PST
Created attachment 17220 [details]
better fix (1.4% speedup according to SunSpider)

Possibly still some fat to be trimmed here, but this is good enough to get reviewed.  My machine is really noisy tonight for some reason (maybe we're just too fast), but this is at least 1.4% faster.
Comment 6 Eric Seidel (no email) 2007-11-13 00:17:13 PST
r27747