Bug 15950

Summary: Add evaluateToInt32 and evaluateToUInt32 methods to ExpressionNode subclasses
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: JavaScriptCoreAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 15963    
Attachments:
Description Flags
first attempt (sunspider says this is a 0.5% slowdown)
none
better fix (still 0.1% regression overall, large speedup for math, slowdown for few tests)
none
better fix (0.3% improvement on SunSpider)
none
better fix (1.4% speedup according to SunSpider) oliver: review+

Eric Seidel (no email)
Reported 2007-11-12 02:29:35 PST
Add evaluateToInt32 and evaluateToUInt32 methods to ExpressionNode subclasses Hum... Unfortunately this isn't yet a speedup, but I think I still have some nodes which are missing the necessary overrides to take advantage of this opportunity (and to avoid the double-virtual call default implementation).
Attachments
first attempt (sunspider says this is a 0.5% slowdown) (45.65 KB, patch)
2007-11-12 02:30 PST, Eric Seidel (no email)
no flags
better fix (still 0.1% regression overall, large speedup for math, slowdown for few tests) (52.28 KB, patch)
2007-11-12 14:06 PST, Eric Seidel (no email)
no flags
better fix (0.3% improvement on SunSpider) (54.92 KB, patch)
2007-11-12 15:12 PST, Eric Seidel (no email)
no flags
better fix (1.4% speedup according to SunSpider) (58.36 KB, patch)
2007-11-12 22:32 PST, Eric Seidel (no email)
oliver: review+
Eric Seidel (no email)
Comment 1 2007-11-12 02:30:07 PST
Created attachment 17199 [details] first attempt (sunspider says this is a 0.5% slowdown)
Eric Seidel (no email)
Comment 2 2007-11-12 14:06:37 PST
Created attachment 17212 [details] better fix (still 0.1% regression overall, large speedup for math, slowdown for few tests) I'm not quite sure where to go next with this patch, w/o going off in a totally different direction to make it a speedup. Shark "compare" seems to report that the extra time on the regressed tests is spent in LocalVarAccessNode::evaluateToInt32, but that time should just be moved from the callsites where the callers no longer have to call toInt32(). Maybe the compiler has more trouble with the toInt calls in such a small function since it can't reorder the instructions to avoid stalls. That's a totally unfounded guess however.
Eric Seidel (no email)
Comment 3 2007-11-12 15:05:08 PST
Huzzah! It's now 0.3% faster overall after ggaren's suggestion of optimizing ImmediateNumberNode Sadly although it improves some tests dramatically, it worsens others. It's not clear to me how to pick up the slack on the slower tests yet. The slower tests seem to have less to gain from type inferencing, thus they don't call the faster evaluateTo* calls, and instead to up the evaluate() recursion tree. I believe part of the slowdown from any of these evaluateTo* tests is caused by unnecessary flipping back and forth between the type specific recursive call trees (if you can really term it that). The worse cases of each of those is when ExpressionNode::evaluateTo* default implementations are called, but I believe I have removed all of the important instances of those (at least none are appearing in the sample). Those are slow due to turning one virtual call into two (thus doubling the stack impact), again, I think all such instances are gone, but I'm about to add some ASSERTS to check. TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 0.3% faster 4421.6ms +/- 0.1% 4406.2ms +/- 0.1% significant ============================================================================= 3d: - 502.8ms +/- 0.3% 503.0ms +/- 0.2% cube: ?? 166.0ms +/- 0.5% 167.0ms +/- 0.5% not conclusive: might be 0.6% *slower* morph: 0.7% faster 163.2ms +/- 0.3% 162.0ms +/- 0.9% significant raytrace: ?? 173.6ms +/- 0.4% 174.0ms +/- 0.0% not conclusive: might be 0.2% *slower* access: 1.3% *slower* 708.6ms +/- 0.1% 717.6ms +/- 0.3% significant binary-trees: 0.7% *slower* 108.2ms +/- 0.5% 109.0ms +/- 0.8% significant fannkuch: 3.7% *slower* 336.2ms +/- 0.2% 348.8ms +/- 0.2% significant nbody: 2.0% faster 185.6ms +/- 0.4% 181.8ms +/- 0.3% significant nsieve: 0.8% faster 78.6ms +/- 0.9% 78.0ms +/- 0.0% significant bitops: 3.4% faster 644.4ms +/- 0.2% 622.6ms +/- 0.3% significant 3bit-bits-in-byte: 5.5% faster 122.6ms +/- 1.4% 115.8ms +/- 0.9% significant bits-in-byte: 1.7% faster 160.2ms +/- 0.3% 157.4ms +/- 0.4% significant bitwise-and: 0.7% faster 214.0ms +/- 0.0% 212.6ms +/- 0.3% significant nsieve-bits: 7.3% faster 147.6ms +/- 0.5% 136.8ms +/- 0.8% significant controlflow: ?? 155.6ms +/- 0.4% 156.4ms +/- 0.4% not conclusive: might be 0.5% *slower* recursive: ?? 155.6ms +/- 0.4% 156.4ms +/- 0.4% not conclusive: might be 0.5% *slower* crypto: 5.1% faster 364.2ms +/- 0.3% 345.6ms +/- 0.6% significant aes: - 97.8ms +/- 0.6% 97.8ms +/- 0.6% md5: 7.0% faster 134.0ms +/- 1.1% 124.6ms +/- 1.1% significant sha1: 6.9% faster 132.4ms +/- 0.8% 123.2ms +/- 0.8% significant date: 0.7% *slower* 362.2ms +/- 0.4% 364.6ms +/- 0.3% significant format-tofte: ?? 160.8ms +/- 0.6% 161.8ms +/- 0.3% not conclusive: might be 0.6% *slower* format-xparb: 0.7% *slower* 201.4ms +/- 0.6% 202.8ms +/- 0.7% significant math: 1.8% *slower* 514.4ms +/- 0.1% 523.8ms +/- 0.3% significant cordic: 4.8% *slower* 214.8ms +/- 0.5% 225.2ms +/- 0.5% significant partial-sums: 1.1% faster 176.6ms +/- 0.4% 174.6ms +/- 0.4% significant spectral-norm: 0.8% *slower* 123.0ms +/- 0.0% 124.0ms +/- 0.7% significant regexp: ?? 297.4ms +/- 0.5% 298.6ms +/- 0.8% not conclusive: might be 0.4% *slower* dna: ?? 297.4ms +/- 0.5% 298.6ms +/- 0.8% not conclusive: might be 0.4% *slower* string: ?? 872.0ms +/- 0.3% 874.0ms +/- 0.1% not conclusive: might be 0.2% *slower* base64: 1.3% *slower* 121.4ms +/- 0.6% 123.0ms +/- 0.0% significant fasta: 1.5% *slower* 208.6ms +/- 0.3% 211.8ms +/- 0.5% significant tagcloud: ?? 198.0ms +/- 0.9% 199.4ms +/- 0.7% not conclusive: might be 0.7% *slower* unpack-code: 1.8% faster 204.4ms +/- 0.3% 200.8ms +/- 0.3% significant validate-input: - 139.6ms +/- 0.8% 139.0ms +/- 1.3%
Eric Seidel (no email)
Comment 4 2007-11-12 15:12:42 PST
Created attachment 17214 [details] better fix (0.3% improvement on SunSpider) This could actually go up for review... but I think I'm gonna look at making it a little faster yet first.
Eric Seidel (no email)
Comment 5 2007-11-12 22:32:48 PST
Created attachment 17220 [details] better fix (1.4% speedup according to SunSpider) Possibly still some fat to be trimmed here, but this is good enough to get reviewed. My machine is really noisy tonight for some reason (maybe we're just too fast), but this is at least 1.4% faster.
Eric Seidel (no email)
Comment 6 2007-11-13 00:17:13 PST
Note You need to log in before you can comment on or make changes to this bug.