Summary: | speed up SunSpider by optimizing immediate number cases | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Darin Adler <darin> | ||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||
Severity: | Normal | CC: | ap, barraclough, emacemac7, oliver | ||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||
Hardware: | Mac | ||||||||||||||||||||
OS: | OS X 10.5 | ||||||||||||||||||||
Attachments: |
|
Description
Darin Adler
2008-05-21 22:14:07 PDT
Created attachment 21285 [details]
patch for < operator
Comment on attachment 21285 [details]
patch for < operator
r=me
Comment on attachment 21285 [details]
patch for < operator
Clearing since I landed this.
Created attachment 21348 [details]
patch for &, |, and ^ operators
I'm going to see if jsAdd can be similarly optimized - about 60% of number additions in SunSpider are for immediates. (In reply to comment #5) > I'm going to see if jsAdd can be similarly optimized - about 60% of number > additions in SunSpider are for immediates. For what it's worth, I tried that and failed. Comment on attachment 21348 [details]
patch for &, |, and ^ operators
Clear flag since I landed this patch.
Created attachment 21363 [details]
Patch for + and -
Hmm, makes me wonder if I got my bit operations wrong. This passes JSC tests, still rebuilding to run layout tests.
Oliver has a similar patch for ++, which we discussed a little yesterday.
OK. My original patch did ++, --, + and -, but there was a net slowdown for all of them. Comment on attachment 21363 [details]
Patch for + and -
r=me
So, I tried to land my patch, and re-measured performance with ToT, just in case. And it has turned into a 6% regression! I've triple-checked the numbers, and it is just that: your patch is a 1% win, mine is 1.8% win, but together, they become a 6% regression. (In reply to comment #11) > I've triple-checked the numbers, and it is just that: your patch is a 1% win, > mine is 1.8% win, but together, they become a 6% regression. Which specific patch do you mean when you say "your patch"? I was talking about the latest one, which I didn't have in my tree when measuring originally, "patch for &, |, and ^ operators". Comment on attachment 21363 [details]
Patch for + and -
Landed this patch, it was a 2% improvement after a slight random change that should have been a slowdown - I've moved VM_CHECK_EXCEPTION() out of else block. Applying the same change to bitops regresses performance, though.
On disassembly, I saw a lot of changes everywhere in Machine::executePrivate, but I can't tell what gcc was thinking. There are also changes in completely unrelated places, such as base64 encoding and registerThread().
Created attachment 21384 [details]
Patch for ++ and --
Comment on attachment 21384 [details]
Patch for ++ and --
r=me
Comment on attachment 21384 [details]
Patch for ++ and --
Clearing since I landed this.
From the looks of it, right shift should also benefit from such optimization, but I cannot get it to work yet. I do not see any other cases to optimize in this manner. Created attachment 21395 [details]
Patch for >>, >>>, ==, ===, !=, !==
Comment on attachment 21395 [details]
Patch for >>, >>>, ==, ===, !=, !==
Oops, this is buggy...
Comment on attachment 21395 [details]
Patch for >>, >>>, ==, ===, !=, !==
124 return reinterpret_cast<JSValue*>((reinterpret_cast<intptr_t>(val) >> ((reinterpret_cast<uintptr_t>(shift) >> 2) & 0x1f)) | NumberType);
Does this do the correct thing on numbers that have low bits set? What about a 1 shifted right by 1?
Created attachment 21398 [details] Patch for >>, ==, ===, !=, !== Removed the fast path for >>> for now, as it was quite incorrect for negative numbers, and fixing this results in rather mixed performance changes. (In reply to comment #21) > Does this do the correct thing on numbers that have low bits set? As far as I can tell, it does. > What about a 1 shifted right by 1? That's (7 >> (7 >> 2)) | 3 == (7 >> 1) | 3 == 3, which is correct. Created attachment 21425 [details]
Patch for >>, ==, ===, !=, !==
Renamed v1/v2 to src1/src2, to make code match documentation.
I just realized that there is the same issue on PPC with << - I'll see if a fix for it can be combined with an optimization again, or will file a separate bug.
Comment on attachment 21425 [details]
Patch for >>, ==, ===, !=, !==
r=me, assuming layout and jsc tests pass. And assuming you tested perf on current ToT :D
Comment on attachment 21425 [details] Patch for >>, ==, ===, !=, !== Clearing, because I landed this. (In reply to comment #24) > And assuming you tested perf on current ToT :D It was a larger speedup today - because r34204 was a random regression, and this patch brings absolute numbers back to what I saw before. Created attachment 21430 [details]
Patch for << and >>>
Randomness strikes again, but a win overall.
Many of these above cases can be further improved by using inline assembly - it is faster to try an arithmetic operation and check carry and overflow flags than to check if it can be run.
Comment on attachment 21430 [details]
Patch for << and >>>
If you fixed a regression on PowerPC, was there a regression test indicating it?
+ (KJS::JSImmediate::toTruncatedUInt32): Added. Same as getTruncatedInt32, but casts the result
+ to unsigned.
I'm concerned about the "to" vs. "get" in the name here. This class seems we use "get" to mean "convert with no possibility of error" and "to" to mean "convert with a possible exception or error".
r=me
Comment on attachment 21430 [details] Patch for << and >>> Clearing review flag, since I landed this. (In reply to comment #27) > If you fixed a regression on PowerPC, was there a regression test indicating > it? Yes, it's one of the three Mozilla tests that I claimed to have fixed in the previous patch. Only two were fixed, in fact. > I'm concerned about the "to" vs. "get" in the name here. This class seems we > use "get" to mean "convert with no possibility of error" and "to" to mean > "convert with a possible exception or error". I'm rather confused about the naming in JSImmediate.h - some "get" functions also return a bool to indicate success. That said, this new function just implements "ToUInt32" algorithm from ECMA-232 for immediates as far as I can tell, and is not really a "get" in the sense that -1 turns into 4294967296. The "get" functions that we have do not cast the result like that. I wasn't sure what the word "truncated" meant here, but since this function does almost the same work as getTruncatedInt32, I kept it. (In reply to comment #28) > I'm rather confused about the naming in JSImmediate.h - some "get" functions > also return a bool to indicate success. That said, this new function just > implements "ToUInt32" algorithm from ECMA-232 for immediates as far as I can > tell, and is not really a "get" in the sense that -1 turns into 4294967296. The > "get" functions that we have do not cast the result like that. I wasn't sure > what the word "truncated" meant here, but since this function does almost the > same work as getTruncatedInt32, I kept it. I can live with this. It's worth redoing the names at some point. The "truncated" is left over from when immediate numbers could be non-integers. "Truncated" meant that the function was free to discard the fractional portion or values that didn't fit in a 32-bit integer. I think I'm done with this for now - do we need to keep the bug open? Fine to close this if there are no more obvious cases to optimize. But <= and >= haven't been done yet. Maybe there are others. The JITs generate immediate arithmetic ops & compares. |