15607 – Add float/double specific versions of getUInt32() for a 0.5% speedup in SunSpider


Status:	RESOLVED WONTFIX

Alias:	None

Product:	WebKit
Classification:	Unclassified
Component:	JavaScriptCore
Version:	523.x (Safari 3)
Hardware:	Mac OS X 10.4

Importance:	P2 Normal
Assignee:	Eric Seidel (no email)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-10-22 00:41 PDT by Eric Seidel (no email)
Modified:	2008-01-13 14:52 PST
CC List:	2 users

See Also:

Attachments
patch (4.40 KB, patch) 2007-10-22 00:43 PDT, Eric Seidel (no email)	no flags
final patch (4.40 KB, patch) 2007-10-22 02:43 PDT, Eric Seidel (no email)	no flags


Description Eric Seidel (no email) 2007-10-22 00:41:52 PDT

Add float/double specific versions of getUInt32() for a 6% speedup in SunSpider

This patch pushes the getUInt32() logic down into JSImmediate (into the FPBitValues structs) to avoid unnecessary double/float conversions on 32-bit machines. This resulted in a 6% overall speedup on SunSpider, with a 22% speedup for the bitops tests.
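
To illustrate the idea of type-specific conversions, here is a minimal sketch, not the code from the attached patch. The names getUInt32FromFloat and getUInt32FromDouble are hypothetical and do not exist in JSImmediate. The assumption is that a value stored as a float can be range-checked and truncated entirely in float arithmetic, so it never has to be widened to double first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the JSImmediate API.

// Float version: every comparison and cast stays in single precision.
inline bool getUInt32FromFloat(float value, uint32_t& result)
{
    // 4294967296.0f is 2^32, which a float represents exactly.
    // NaN fails both comparisons, so it is rejected here as well.
    if (!(value >= 0.0f && value < 4294967296.0f))
        return false;
    uint32_t truncated = static_cast<uint32_t>(value);
    // A value with a fractional part does not survive the round trip.
    if (static_cast<float>(truncated) != value)
        return false;
    result = truncated;
    return true;
}

// Double version: the same checks in double precision.
inline bool getUInt32FromDouble(double value, uint32_t& result)
{
    if (!(value >= 0.0 && value < 4294967296.0))
        return false;
    uint32_t truncated = static_cast<uint32_t>(value);
    if (static_cast<double>(truncated) != value)
        return false;
    result = truncated;
    return true;
}
```

In this sketch a caller passes the stored value and an output variable; the function returns false and leaves the output untouched when the value is negative, fractional, NaN, or at least 2^32.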

Before this patch:
========================================
RESULTS (means and 95% confidence intervals)
----------------------------------------
Total:                  6546.6ms [ +/- 4.13ms | +/- 0.06% ]
----------------------------------------
  3d:                   1251.4ms [ +/- 1.97ms | +/- 0.16% ]
    cube:                430.6ms [ +/- 1.19ms | +/- 0.28% ]
    morph:               486.0ms [ +/- 1.36ms | +/- 0.28% ]
    raytrace:            334.8ms [ +/- 0.35ms | +/- 0.10% ]
  access:                490.2ms [ +/- 3.97ms | +/- 0.81% ]
    binary-trees:        141.2ms [ +/- 2.44ms | +/- 1.73% ]
    nsieve:              349.0ms [ +/- 3.14ms | +/- 0.90% ]
  bitops:               1682.2ms [ +/- 3.74ms | +/- 0.22% ]
    3bit-bits-in-byte:   347.8ms [ +/- 0.66ms | +/- 0.19% ]
    bits-in-byte:        449.8ms [ +/- 1.02ms | +/- 0.23% ]
    bitwise-and:         334.2ms [ +/- 4.31ms | +/- 1.29% ]
    nsieve-bits:         550.4ms [ +/- 1.19ms | +/- 0.22% ]
  crypto:                823.0ms [ +/- 3.28ms | +/- 0.40% ]
    aes:                 238.2ms [ +/- 0.35ms | +/- 0.15% ]
    md5:                 297.8ms [ +/- 1.16ms | +/- 0.39% ]
    sha1:                287.0ms [ +/- 1.92ms | +/- 0.67% ]
  math:                 1138.6ms [ +/- 0.70ms | +/- 0.06% ]
    cordic:              613.0ms [ +/- 0.78ms | +/- 0.13% ]
    partial-sums:        263.4ms [ +/- 0.70ms | +/- 0.27% ]
    spectral-norm:       262.2ms [ +/- 0.35ms | +/- 0.13% ]
  string:               1161.2ms [ +/- 1.02ms | +/- 0.09% ]
    base64:              322.2ms [ +/- 0.35ms | +/- 0.11% ]
    fasta:               338.2ms [ +/- 0.35ms | +/- 0.10% ]
    tagcloud:            277.4ms [ +/- 0.89ms | +/- 0.32% ]
    unpack-code:         223.4ms [ +/- 0.43ms | +/- 0.19% ]


After this patch:
========================================
RESULTS (means and 95% confidence intervals)
----------------------------------------
Total:                  6099.2ms [ +/- 3.20ms | +/- 0.05% ]
----------------------------------------
  3d:                   1240.4ms [ +/- 1.80ms | +/- 0.15% ]
    cube:                428.4ms [ +/- 0.43ms | +/- 0.10% ]
    morph:               479.8ms [ +/- 2.18ms | +/- 0.45% ]
    raytrace:            332.2ms [ +/- 1.29ms | +/- 0.39% ]
  access:                485.6ms [ +/- 1.19ms | +/- 0.24% ]
    binary-trees:        141.2ms [ +/- 1.02ms | +/- 0.72% ]
    nsieve:              344.4ms [ +/- 0.70ms | +/- 0.20% ]
  bitops:               1287.4ms [ +/- 2.33ms | +/- 0.18% ]
    3bit-bits-in-byte:   340.6ms [ +/- 1.19ms | +/- 0.35% ]
    bits-in-byte:        439.6ms [ +/- 0.70ms | +/- 0.16% ]
    bitwise-and:         331.4ms [ +/- 1.19ms | +/- 0.36% ]
    nsieve-bits:         175.8ms [ +/- 0.35ms | +/- 0.20% ]
  crypto:                763.8ms [ +/- 2.38ms | +/- 0.31% ]
    aes:                 232.2ms [ +/- 0.86ms | +/- 0.37% ]
    md5:                 271.0ms [ +/- 0.55ms | +/- 0.20% ]
    sha1:                260.6ms [ +/- 1.97ms | +/- 0.76% ]
  math:                 1141.6ms [ +/- 1.80ms | +/- 0.16% ]
    cordic:              609.8ms [ +/- 1.70ms | +/- 0.28% ]
    partial-sums:        267.6ms [ +/- 0.89ms | +/- 0.33% ]
    spectral-norm:       264.2ms [ +/- 0.86ms | +/- 0.33% ]
  string:               1180.4ms [ +/- 2.64ms | +/- 0.22% ]
    base64:              320.8ms [ +/- 0.66ms | +/- 0.20% ]
    fasta:               341.2ms [ +/- 1.87ms | +/- 0.55% ]
    tagcloud:            277.4ms [ +/- 0.43ms | +/- 0.15% ]
    unpack-code:         241.0ms [ +/- 0.00ms | +/- 0.00% ]

Comment 1 Eric Seidel (no email) 2007-10-22 00:43:34 PDT

Created attachment 16786 [details]
patch

Comment 2 Eric Seidel (no email) 2007-10-22 00:48:16 PDT

Holy crap!  I just realized this was a 68% speedup for nsieve-bits!  wooo hooo!
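
For reference, the percentages can be rechecked from the before/after tables in the description. The sketch below assumes "speedup" here means the fraction of run time saved; percentTimeSaved is a hypothetical helper name. These are the numbers from the first patch, which later comments attribute to a bug.

```cpp
// Hypothetical helper: percentage of run time saved going from beforeMs to afterMs.
constexpr double percentTimeSaved(double beforeMs, double afterMs)
{
    return (beforeMs - afterMs) / beforeMs * 100.0;
}

// nsieve-bits: 550.4ms -> 175.8ms, about 68%.
static_assert(percentTimeSaved(550.4, 175.8) > 68.0 && percentTimeSaved(550.4, 175.8) < 68.1,
              "nsieve-bits saves about 68% of its run time");

// bitops: 1682.2ms -> 1287.4ms, about 23.5%.
static_assert(percentTimeSaved(1682.2, 1287.4) > 23.4 && percentTimeSaved(1682.2, 1287.4) < 23.5,
              "bitops saves about 23.5% of its run time");

// Total: 6546.6ms -> 6099.2ms, about 6.8%.
static_assert(percentTimeSaved(6546.6, 6099.2) > 6.8 && percentTimeSaved(6546.6, 6099.2) < 6.9,
              "the total saves about 6.8% of its run time");
```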

Comment 3 Eric Seidel (no email) 2007-10-22 02:42:39 PDT

Bleh.  I'm not even sure this is worth it anymore.  Turns out Shark was sending me to the wrong source file.  I've learned my lesson.  This might still be worth landing.

Comment 4 Eric Seidel (no email) 2007-10-22 02:43:10 PDT

Created attachment 16789 [details]
final patch

Comment 5 Eric Seidel (no email) 2007-10-22 02:48:29 PDT

So the major speedup before was due to a bug in the initial patch.  That bug is now fixed, and this turns out to be a much smaller speedup.  I actually show more time being spent in this function (under Shark).

Comment 6 David Kilzer (:ddkilzer) 2007-10-22 09:10:33 PDT

See Bug 15617.

Comment 7 Darin Adler 2007-10-22 10:13:12 PDT

In bug 15617 I do this same optimization and a few others. We should probably take mine.

Comment 8 Eric Seidel (no email) 2008-01-13 14:52:37 PST

Oliver found another way to solve this.  Closing.