Bug 152191 - Polymorphic operand types for DFG and FTL bit operators.
Summary: Polymorphic operand types for DFG and FTL bit operators.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Local Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Mark Lam
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2015-12-11 15:11 PST by Mark Lam
Modified: 2015-12-15 21:56 PST

See Also:


Attachments
work in progress for archive (63.49 KB, patch)
2015-12-11 17:14 PST, Mark Lam
no flags
proposed patch. (60.79 KB, patch)
2015-12-14 12:03 PST, Mark Lam
no flags
proposed patch with appropriate style fixes. (60.81 KB, patch)
2015-12-14 12:09 PST, Mark Lam
sbarati: review+
x86_64 benchmark result. (66.00 KB, text/plain)
2015-12-14 12:37 PST, Mark Lam
no flags
x86_64 benchmark result without FTL. (65.71 KB, text/plain)
2015-12-14 12:37 PST, Mark Lam
no flags
x86 benchmark result. (65.93 KB, text/plain)
2015-12-14 12:38 PST, Mark Lam
no flags
Patch for landing. (62.02 KB, patch)
2015-12-15 13:13 PST, Mark Lam
no flags

Description Mark Lam 2015-12-11 15:11:09 PST
This is for shifts and bitwise masking operators.
Comment 1 Radar WebKit Bug Importer 2015-12-11 15:12:36 PST
<rdar://problem/23866780>
Comment 2 Mark Lam 2015-12-11 17:14:32 PST
Created attachment 267207 [details]
work in progress for archive
Comment 3 WebKit Commit Bot 2015-12-11 17:17:21 PST
Attachment 267207 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:3061:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 2 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Mark Lam 2015-12-14 12:03:05 PST
Created attachment 267307 [details]
proposed patch.
Comment 5 WebKit Commit Bot 2015-12-14 12:05:24 PST
Attachment 267307 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:25:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:33:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:37:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:38:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:39:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:40:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:41:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:89:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:110:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:3050:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 11 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Mark Lam 2015-12-14 12:09:28 PST
Created attachment 267308 [details]
proposed patch with appropriate style fixes.
Comment 7 WebKit Commit Bot 2015-12-14 12:12:16 PST
Attachment 267308 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
Total errors found: 1 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Mark Lam 2015-12-14 12:37:09 PST
Created attachment 267310 [details]
x86_64 benchmark result.
Comment 9 Mark Lam 2015-12-14 12:37:31 PST
Created attachment 267311 [details]
x86_64 benchmark result without FTL.
Comment 10 Mark Lam 2015-12-14 12:38:04 PST
Created attachment 267312 [details]
x86 benchmark result.
Comment 11 Mark Lam 2015-12-14 12:49:29 PST
Perf is neutral in general except for some targeted cases tested in JSRegress.  From the x86_64 results:

   ftl-polymorphic-bitand                           594.9220+-8.1735     ^    329.4421+-2.4850        ^ definitely 1.8058x faster
   ftl-polymorphic-bitor                            595.7047+-11.7361    ^    304.7192+-24.4380       ^ definitely 1.9549x faster
   ftl-polymorphic-bitxor                           592.9283+-11.5598    ^    317.4730+-29.1609       ^ definitely 1.8676x faster
   ftl-polymorphic-lshift                           589.3531+-10.8520    ^    311.9728+-19.3596       ^ definitely 1.8891x faster
   ftl-polymorphic-rshift                           602.4149+-9.6393     ^    304.7771+-7.4704        ^ definitely 1.9766x faster
   ftl-polymorphic-urshift                          582.8847+-11.2889    ^    311.1393+-28.7932       ^ definitely 1.8734x faster
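
For readers checking the numbers: the "faster" figure in the last column appears to be the ratio of the two mean times on each row (assuming the first column is the baseline and the second is the patched build, which the listing itself does not label).  A minimal C++ sketch of that arithmetic, where `speedup` and its parameter names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of any benchmark tool: the speedup is
// taken to be the baseline mean time divided by the patched mean time.
double speedup(double baselineMean, double patchedMean)
{
    assert(patchedMean > 0);
    return baselineMean / patchedMean;
}
```

For example, `speedup(594.9220, 329.4421)` for ftl-polymorphic-bitand comes out at roughly 1.8058, which matches the reported figure.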

As with other snippets, the gains shown here are not due to the speed of the snippet itself, but rather to the fact that support for untyped operands now allows the test function to be DFG and FTL compiled (as opposed to having to fall back to the baseline JIT).  The test functions do other work that is better optimized by the DFG and FTL, and this is the reason for the gains here.
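
To illustrate what "untyped operands" means for a bit operator, here is a minimal C++ sketch.  It is not JSC code: the actual patch works at the JIT level, whereas this only models the idea of an int32 fast path with a generic fallback.  `Value`, `toNumber`, `toInt32` and `bitAnd` are hypothetical names, and the string-to-number conversion is deliberately simplified compared to JavaScript's.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <variant>

// Hypothetical stand-in for a JS value: an int32, a double, or a string.
using Value = std::variant<int32_t, double, std::string>;

// Simplified numeric conversion (assumption: anything strtod cannot
// consume completely is treated as NaN; real JS rules are more involved).
double toNumber(const Value& v)
{
    if (const auto* i = std::get_if<int32_t>(&v))
        return *i;
    if (const auto* d = std::get_if<double>(&v))
        return *d;
    const std::string& s = std::get<std::string>(v);
    char* end = nullptr;
    double parsed = std::strtod(s.c_str(), &end);
    return (end == s.c_str() + s.size()) ? parsed : std::nan("");
}

// Truncate toward zero, then wrap modulo 2^32; NaN and infinities give 0.
int32_t toInt32(double d)
{
    if (!std::isfinite(d))
        return 0;
    double m = std::fmod(std::trunc(d), 4294967296.0);
    if (m < 0)
        m += 4294967296.0;
    // Reinterpret the low 32 bits as signed (two's complement wrap).
    return static_cast<int32_t>(static_cast<uint32_t>(m));
}

int32_t bitAnd(const Value& a, const Value& b)
{
    const auto* x = std::get_if<int32_t>(&a);
    const auto* y = std::get_if<int32_t>(&b);
    if (x && y)
        return *x & *y; // Fast path: both operands are already int32.
    // Slow path: convert whatever we were given, then apply the operator.
    return toInt32(toNumber(a)) & toInt32(toNumber(b));
}
```

With this sketch, `bitAnd(Value{std::string("7")}, Value{int32_t{5}})` takes the slow path and still yields 5, which is the kind of mixed-type call the ftl-polymorphic-* tests exercise.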

We also see the following progression:

   string-repeat-arith                               34.7181+-1.2319     ^     28.4572+-0.7629        ^ definitely 1.2200x faster

string-repeat-arith also happens to be a test function that exercises 3 of the bitops on a string.  As a result, we are now able to DFG and FTL compile the test function and realize some additional gains.

On 32-bit x86, the following progression was consistently reproducible:

   Int16Array-to-Int32Array-set                      77.5821+-5.4359     ^     66.3774+-2.2837        ^ definitely 1.1688x faster

However, the test does not make use of any of the bitops, at least not in the test functions themselves.  Considering that this only manifests on x86 and not x86_64 (and I didn't do anything to optimize for x86 more than x86_64), the gains could be just due to cache line alignment effects.
Comment 12 Mark Lam 2015-12-14 12:50:46 PST
Comment on attachment 267308 [details]
proposed patch with appropriate style fixes.

Now ready for review.
Comment 13 Saam Barati 2015-12-15 11:27:11 PST
Comment on attachment 267308 [details]
proposed patch with appropriate style fixes.

View in context: https://bugs.webkit.org/attachment.cgi?id=267308&action=review

r=me with comments and suggestions

> Source/JavaScriptCore/ChangeLog:101
> +          sizes values later in another patch once all snippet ICs have been added.

We won't have to worry about this with B3.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:2884
> +    return;

not needed.

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:182
> +    auto numberOfBytesUsedToPreserveReusedRegisters =
> +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);

style: I think this is easier to read as "unsigned"

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);

Your ExtraStackSpace parameter here doesn't match the parameter to preserveReusedRegistersByPushing.
This also means that no tests are actually hitting the path where we actually spill anything. It might be worth writing such a test.

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:218
> +    auto numberOfBytesUsedToPreserveReusedRegisters =
> +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);

ditto.
(Bias disclosure: I almost never like the use of auto)
Comment 14 Mark Lam 2015-12-15 13:13:48 PST
Created attachment 267388 [details]
Patch for landing.

Thanks for the review.

(In reply to comment #13)
> > Source/JavaScriptCore/ChangeLog:101
> > +          sizes values later in another patch once all snippet ICs have been added.
> 
> We won't have to worry about this with B3.
> 
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:2884
> > +    return;
> 
> not needed.

Removed.

> > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:182
> > +    auto numberOfBytesUsedToPreserveReusedRegisters =
> > +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);
> 
> style: I think this is easier to read as "unsigned"

Fixed, and also in generateRightShiftFastPath() and generateBinaryArithOpFastPath().
 
> > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> > +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> > +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);
> 
> Your ExtraStackSpace parameter here doesn't match the parameter to
> preserveReusedRegistersByPushing.
> This also means that no tests are actually hitting the path where we
> actually spill anything. It might be worth writing such a test.

Fixed.  Should be using ExtraStackSpace::NoExtraSpace.  Also did the same in generateRightShiftFastPath() and generateBinaryArithOpFastPath().

As for the test you had in mind, it requires adequate register pressure to trigger the issue.  Instead, I'll write a separate patch later to ensure that we will always call restoreReusedRegistersByPopping() with the same parameters as the preserveReusedRegistersByPushing() call (perhaps with RAII).

Will land this patch with the above fixes.
Comment 15 Mark Lam 2015-12-15 13:21:18 PST
Landed in r194113: <http://trac.webkit.org/r194113>
Comment 16 Mark Lam 2015-12-15 14:45:04 PST
(In reply to comment #14)
> (In reply to comment #13)
> > > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> > > +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> > > +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);
> > 
> > Your ExtraStackSpace parameter here doesn't match the parameter to
> > preserveReusedRegistersByPushing.
> > This also means that no tests are actually hitting the path where we
> > actually spill anything. It might be worth writing such a test.
> 
> ...  Instead, I'll write a separate patch later to ensure
> that we will always call restoreReusedRegistersByPopping() with the same
> parameters as the preserveReusedRegistersByPushing() call (perhaps with
> RAII).

RAII won't work.  I'll have preserveReusedRegistersByPushing() return a token that restoreReusedRegistersByPopping() uses.  That way we ensure that restoreReusedRegistersByPopping() always gets the right values it needs.  See https://bugs.webkit.org/show_bug.cgi?id=152315.
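
A rough C++ sketch of the token idea, under stated assumptions: `ToyAllocator`, `PreservedState`, `extraBytes` and the byte counts are all hypothetical and only model the bookkeeping; this is not the ScratchRegisterAllocator API or the follow-up patch itself.  The point is that the restore call takes the token returned by the preserve call, so it cannot be handed a mismatched ExtraStackSpace value.

```cpp
#include <cassert>

enum class ExtraStackSpace { NoExtraSpace, SpaceForCCall };

// Hypothetical token: records everything the restore step needs to know.
struct PreservedState {
    unsigned spilledBytes;
    ExtraStackSpace extraStackSpace;
};

// Toy model that only tracks how many stack bytes are currently in use.
class ToyAllocator {
public:
    PreservedState preserve(unsigned registersToSpill, ExtraStackSpace extra)
    {
        unsigned spilledBytes = registersToSpill * 8; // assumes 8-byte registers
        m_stackBytes += spilledBytes + extraBytes(extra);
        return { spilledBytes, extra };
    }

    // Restore reads its parameters from the token, never from the caller.
    void restore(const PreservedState& state)
    {
        unsigned total = state.spilledBytes + extraBytes(state.extraStackSpace);
        assert(m_stackBytes >= total);
        m_stackBytes -= total;
    }

    unsigned stackBytes() const { return m_stackBytes; }

private:
    static unsigned extraBytes(ExtraStackSpace extra)
    {
        // 32 is an arbitrary illustrative value, not a real ABI constant.
        return extra == ExtraStackSpace::SpaceForCCall ? 32 : 0;
    }

    unsigned m_stackBytes { 0 };
};
```

Usage would be `auto state = allocator.preserve(2, ExtraStackSpace::SpaceForCCall);` followed later by `allocator.restore(state);`, after which the tracked stack usage returns to its previous value.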
Comment 17 Mark Lam 2015-12-15 21:56:30 PST
Also landed gardening fix for 32-bit JSC tests in r194131: <http://trac.webkit.org/r194131>.