Summary:  BinarySwitch should be faster on average  

Product:  WebKit  Reporter:  Filip Pizlo <fpizlo>  
Component:  JavaScriptCore  Assignee:  Filip Pizlo <fpizlo>  
Status:  RESOLVED FIXED  
Severity:  Normal  CC:  barraclough, fpizlo, ggaren, mark.lam, mhahnenb, msaboff, nrotem, oliver, sam, sbarati  
Priority:  P2  
Version:  528+ (Nightly build)  
Hardware:  All  
OS:  All  
Attachments: 

Description
Filip Pizlo
2015-01-29 12:56:49 PST
Created attachment 245643 [details]
average binary switch performance
This plots the average number of branches that will be taken to get to the code for some case, assuming that cases are sparse and each case is taken with equal probability. The X axis is the total number of cases.
The top blue line is our current algorithm, which special-cases 1 and 2 cases (it just branches on each of them in turn, which I call a linear branch cascade) and handles all other case counts by finding the median (choosing one of the two median candidates at random for even-sized inputs) and then emitting something like:
if (value < median)
    recurse left
else if (value == median)
    handle median
else
    recurse right
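To make the average on the blue line concrete, here is a sketch of a cost model for that algorithm. This is my own function, not the WebKit code; it just counts the total number of branches taken, summed over all n cases, assuming each case is equally likely:

```cpp
#include <cassert>

// Hypothetical cost model (mine, not from BinarySwitch): total branches
// taken, summed over all n equally likely cases, for the algorithm above.
// 1 and 2 cases use a linear cascade; larger counts bisect on the median
// with an explicit equality check.
long totalBranchesOld(long n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;                 // one equality branch
    if (n == 2)
        return 1 + 2;             // first case: 1 branch; second: 2
    long left = (n - 1) / 2;      // cases left of the chosen median
    long right = n - 1 - left;    // cases right of it
    // Left cases pay 1 branch (value < median) before recursing; the
    // median pays 2 branches; right cases pay 2 before recursing.
    return (left + totalBranchesOld(left))
        + 2
        + (2 * right + totalBranchesOld(right));
}
```

For 3 cases this gives 7 total branches, i.e. the 7/3 ≈ 2.33 average that the blue line shows.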
The red line just below it shows the performance if we also special-case 3 cases with a linear branch cascade. This is indeed faster. Here's an example of 3 cases with the old algorithm for case values (left, middle, right):
if (value < middle) {
    if (value == left)
        handle left
} else if (value == middle) {
    handle middle
} else {
    if (value == right)
        handle right
}
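A tiny simulator of that 3-case structure makes the per-case branch counts easy to check. The function name is mine, not from the patch; it walks the code shape above and counts the conditional branches that execute:

```cpp
#include <cassert>

// Hypothetical simulator (mine): counts conditional branches executed by
// the old 3-case structure above. Assumes left < middle < right and that
// value matches one of the three cases.
int branchesOld3(int value, int left, int middle, int right)
{
    int branches = 0;
    branches++;               // branch 1: value < middle?
    if (value < middle) {
        branches++;           // branch 2: value == left?
        assert(value == left);
        return branches;
    }
    branches++;               // branch 2: value == middle?
    if (value == middle)
        return branches;
    branches++;               // branch 3: value == right?
    assert(value == right);
    return branches;
}
```

With case values (10, 20, 30) this reports 2, 2, and 3 branches respectively, totaling 7.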
This requires 2 branches for left, 2 branches for middle, and 3 branches for right. If we instead do a linear branch cascade:
if (value == left)
    handle left
else if (value == middle)
    handle middle
else if (value == right)
    handle right
then we get 1 branch for left, 2 branches for middle, and 3 branches for right. So, on average, we take 2 branches instead of 2.33.
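The cascade's counterpart to the earlier simulator is even simpler. Again the function name is mine; it just counts how many equality tests run before the matching case is reached:

```cpp
#include <cassert>

// Hypothetical simulator (mine) for the linear cascade above: one
// equality test per case, in order. Assumes value is one of the cases.
int branchesCascade3(int value, int left, int middle, int right)
{
    if (value == left)
        return 1;             // matched on the first branch
    if (value == middle)
        return 2;             // second branch
    assert(value == right);
    return 3;                 // third branch
}
```

The totals are 1 + 2 + 3 = 6 branches across the three cases, versus 7 for the old shape, which is exactly the 2 vs. 2.33 average difference.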
The lowest line represents an additional optimization, where we handle even numbers of cases totally differently. Instead of picking one of the two median candidates and using the left-median-right approach, where the median is directly compared against, we just branch to determine whether we lie in the left half of the cases or the right half. Let rightMedian be the value in an even-sized array that is just to the right of the formal median (i.e. it's the larger of the two median candidates). Then for even numbers of cases we do:
if (value < rightMedian)
    recurse left
else
    recurse right
And for odd numbers of cases we still do as before:
if (value < median)
    recurse left
else if (value == median)
    handle median
else
    recurse right
This will sometimes save us a dramatic number of branches. I'm still exploring ways of improving this even further. The noisiness of this algorithm's performance suggests that maybe we can do even better.
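The combined strategy can be sketched as a cost model too. This is my own sketch under the assumptions above (branches are the only cost, all cases equally likely), not the patch itself; counts of 3 or fewer use a linear cascade, even counts split on rightMedian, and odd counts keep the left-median-right shape:

```cpp
#include <cassert>

// Hypothetical cost model (mine): total branches summed over all n
// equally likely cases for the hybrid strategy described above.
long totalBranchesHybrid(long n)
{
    if (n <= 3)
        return n * (n + 1) / 2;   // linear cascade: 1 + 2 + ... + n
    if (n % 2 == 0) {
        // Every case pays the single "value < rightMedian" branch,
        // then recurses into a half of size n/2.
        return n + 2 * totalBranchesHybrid(n / 2);
    }
    long half = (n - 1) / 2;
    // Left cases pay 1 branch, the median pays 2, right cases pay 2.
    return (half + totalBranchesHybrid(half))
        + 2
        + (2 * half + totalBranchesHybrid(half));
}
```

For n = 4 this gives 10 total branches (2.5 on average), versus 11 for the median-equality approach on the same input, which is the kind of saving the lowest line reflects.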
Created attachment 245663 [details]
average binary switch performance
As I suspected, you can get even better performance by simply never emitting a direct equality comparison against the median when bisecting. Of course this assumes that branches are the true cost, so that doing one compare and two branches is as bad as doing two compare/branches. I believe that this is the case in our simple JIT backend.
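Under those assumptions, the "never equality-compare while bisecting" variant can be modeled like this. Again a sketch with my own function name, not the landed code: interior nodes cost exactly one branch per case passing through them, and only leaves of 3 or fewer cases run equality tests:

```cpp
#include <cassert>

// Hypothetical cost model (mine): pure bisection with no equality test at
// interior nodes; leaves of <= 3 cases use a linear equality cascade.
long totalBranchesBisect(long n)
{
    if (n <= 3)
        return n * (n + 1) / 2;   // leaf: linear cascade
    long left = n / 2;
    long right = n - left;
    // Every case pays one "<" branch at this node, then recurses.
    return n + totalBranchesBisect(left) + totalBranchesBisect(right);
}
```

For n = 9 this gives 33 total branches (about 3.67 on average), beating the median-equality variants at the same size, consistent with the improved curve in the second plot.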
Something else to keep in mind: in odd-sized cases, where one side of the bisection will be one larger than the other, it totally doesn't matter if you allow that unbalancing to happen. One side will end up having one more element than the other; that side might itself be odd-sized, in which case it will also have two sides, and one of those sides will be bigger. Attempting to be left-right fair doesn't improve average throughput, assuming that all cases have equal probability. Still, the code I'm working on will attempt to preserve balance using randomness, to ensure that we don't end up with a deterministic pathology where the input always goes down the right side of the tree and that side is always deeper.
Created attachment 245798 [details]
the patch
Created attachment 245799 [details]
the patch
Just fixed a comment.
Created attachment 245882 [details]
the patch
Revised the comments a bit. I believe I came up with a good formal way of reasoning about the cost of taking the default case. We stand to shave off 1/6 of a branch on average if we optimize for default by reducing leafThreshold to 2, but then we'd regress by 1/6 of a branch on average whenever a case is taken.
Created attachment 245883 [details]
the patch
Fixed the ChangeLog to say that I've thought about the fall-through case.
Landed in http://trac.webkit.org/changeset/179490 