Bug 141046 - BinarySwitch should be faster on average
Summary: BinarySwitch should be faster on average
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-29 12:56 PST by Filip Pizlo
Modified: 2015-02-02 12:55 PST (History)
10 users (show)

See Also:


Attachments
average binary switch performance (96.98 KB, image/png)
2015-01-29 13:06 PST, Filip Pizlo
no flags Details
average binary switch performance (101.83 KB, image/png)
2015-01-29 15:28 PST, Filip Pizlo
no flags Details
the patch (17.66 KB, patch)
2015-01-31 13:33 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (17.42 KB, patch)
2015-01-31 13:36 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (18.43 KB, patch)
2015-02-02 09:27 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (18.28 KB, patch)
2015-02-02 09:31 PST, Filip Pizlo
andersca: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2015-01-29 12:56:49 PST
I've done some messing around in Mathematica with recurrence relations representing our BinarySwitch's average performance - that is, the average number of branches you will take to get to the case you want, for varying numbers of cases.  I found that we could improve on BinarySwitch's strategy to achieve better average-case performance in some situations.
Comment 1 Filip Pizlo 2015-01-29 13:06:54 PST
Created attachment 245643 [details]
average binary switch performance

This plots the average number of branches that will be taken to get to the code for some case, assuming that cases are sparse and each case is taken with equal probability.  The X axis is the total number of cases.

The top blue line is our current algorithm, which special-cases 1 and 2 cases (just branches on each of them in turn, which I say is a linear branch cascade) and handles all other cases by finding the median (choosing one of the two median candidates at random for odd inputs) and then saying something like:

if (value < median)
    recurse left
else if (value == median)
    handle median
else
    recurse right

The red line just below it shows the performance if we also special-cases 3 cases with a linear branch cascade.  This is indeed faster.  Here's an example of 3 cases with the old algorithm for case values (left, middle, right):

if (value < middle) {
    if (value == left)
        handle left
} else if (value == middle) {
    handle middle
} else {
    if (value == right)
        handle right
}

This requires 2 branches for left, 2 branches for middle, and three branches for right.  If we do a linear branch cascade:

if (value == left)
    handle left
else if (value == middle)
    handle middle
else if (value == right)
    handle right

then we get 1 branch for left, 2 branches for middle, and three branches for right.  So, on average, we have 2 branches instead of 2.33.

The lowest line represents an additional optimization, where we handle even numbers of cases totally differently.  Instead of picking one of the two median candidates and using the left-median-right approach where the median is directly compared against, we just branch to determine whether we lie in the left half of the cases or the right half.  Let rightMedian be the value in an even-sized array that is just to the right of the formal median (i.e. it's the larger of the two median candidates).  Then for even numbers of cases we do:

if (value < rightMedian)
    recurse left
else
    recurse right

And for odd numbers of cases we still do as before:

if (value < median)
    recurse left
else if (value == median)
    handle median
else
    recurse right

This will sometimes save us a dramatic number of branches.  I'm still exploring ways of improving this even further.  The noisiness of this algorithm's performance suggests that maybe we can do even better.
Comment 2 Filip Pizlo 2015-01-29 15:28:53 PST
Created attachment 245663 [details]
average binary switch performance

As I suspected, you can get even better performance by simply never having a direct equality comparison against the median when bisecting.  Of course this assumes that branches are the true cost and so doing one compare and two branches is as bad as doing two compare/branches.  I believe that this is the case in our simple JIT backend.
Comment 3 Filip Pizlo 2015-01-29 15:52:55 PST
Something else to keep in mind: in odd cases where one side of the bisection will be one larger than the other, it totally doesn't matter if you allow unbalancing to happen.  One side will end up having one more element than the other, and that side might still be odd, in which case it will also have two sides and one of those sides will be bigger.  Attempting to be left-right fair doesn't improve average throughput assuming that all cases have equal probability.  Still, the code I'm working on will attempt to preserve balance using randomness, to ensure that we don't end up with a deterministic pathology where the input always goes down the right side of the tree and that side is always deeper.
Comment 4 Filip Pizlo 2015-01-31 13:33:19 PST
Created attachment 245798 [details]
the patch
Comment 5 Filip Pizlo 2015-01-31 13:36:33 PST
Created attachment 245799 [details]
the patch

Just fixed a comment.
Comment 6 Filip Pizlo 2015-02-02 09:27:34 PST
Created attachment 245882 [details]
the patch

Revised the comments a bit.  I believe I came up with a good formal way of reasoning about the cost of taking default.  We stand to shave off 1/6 branch on average if we optimized for default by reducing leafThreshold to 2, but then we'd regress by 1/6 branch on average in the case that we take a case.
Comment 7 Filip Pizlo 2015-02-02 09:31:59 PST
Created attachment 245883 [details]
the patch

Fix the changelog to say that I've thought about the fall through case.
Comment 8 Filip Pizlo 2015-02-02 12:55:19 PST
Landed in http://trac.webkit.org/changeset/179490