Bug 152191

Summary: Polymorphic operand types for DFG and FTL bit operators.
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCore Assignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, fpizlo, ggaren, keith_miller, msaboff, sbarati, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description | Flags
work in progress for archive | none
proposed patch. | none
proposed patch with appropriate style fixes. | sbarati: review+
x86_64 benchmark result. | none
x86_64 benchmark result without FTL. | none
x86 benchmark result. | none
Patch for landing. | none

Description Mark Lam 2015-12-11 15:11:09 PST
This is for shifts and bitwise masking operators.
Comment 1 Radar WebKit Bug Importer 2015-12-11 15:12:36 PST
<rdar://problem/23866780>
Comment 2 Mark Lam 2015-12-11 17:14:32 PST
Created attachment 267207 [details]
work in progress for archive
Comment 3 WebKit Commit Bot 2015-12-11 17:17:21 PST
Attachment 267207 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:3061:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 2 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Mark Lam 2015-12-14 12:03:05 PST
Created attachment 267307 [details]
proposed patch.
Comment 5 WebKit Commit Bot 2015-12-14 12:05:24 PST
Attachment 267307 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/ChangeLog:25:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:33:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:37:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:38:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:39:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:40:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:41:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:89:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/ChangeLog:110:  Line contains tab character.  [whitespace/tab] [5]
ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:3050:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 11 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Mark Lam 2015-12-14 12:09:28 PST
Created attachment 267308 [details]
proposed patch with appropriate style fixes.
Comment 7 WebKit Commit Bot 2015-12-14 12:12:16 PST
Attachment 267308 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/jit/JITLeftShiftGenerator.h:39:  Wrong number of spaces before statement. (expected: 12)  [whitespace/indent] [4]
Total errors found: 1 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Mark Lam 2015-12-14 12:37:09 PST
Created attachment 267310 [details]
x86_64 benchmark result.
Comment 9 Mark Lam 2015-12-14 12:37:31 PST
Created attachment 267311 [details]
x86_64 benchmark result without FTL.
Comment 10 Mark Lam 2015-12-14 12:38:04 PST
Created attachment 267312 [details]
x86 benchmark result.
Comment 11 Mark Lam 2015-12-14 12:49:29 PST
Perf is neutral in general except for some targeted cases tested in JSRegress.  From the x86_64 results:

   ftl-polymorphic-bitand                           594.9220+-8.1735     ^    329.4421+-2.4850        ^ definitely 1.8058x faster
   ftl-polymorphic-bitor                            595.7047+-11.7361    ^    304.7192+-24.4380       ^ definitely 1.9549x faster
   ftl-polymorphic-bitxor                           592.9283+-11.5598    ^    317.4730+-29.1609       ^ definitely 1.8676x faster
   ftl-polymorphic-lshift                           589.3531+-10.8520    ^    311.9728+-19.3596       ^ definitely 1.8891x faster
   ftl-polymorphic-rshift                           602.4149+-9.6393     ^    304.7771+-7.4704        ^ definitely 1.9766x faster
   ftl-polymorphic-urshift                          582.8847+-11.2889    ^    311.1393+-28.7932       ^ definitely 1.8734x faster

As with other snippets, the gains shown here are not due to the speed of the snippet itself, but rather to the fact that support for untyped operands now allows the test function to be DFG and FTL compiled (as opposed to having to fall back to the baseline JIT).  The test functions do other work that is better optimized by the DFG and FTL, and this is the reason for the gains here.
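
To illustrate what "untyped operands" means for a bit operator, here is a minimal C++ sketch of a bitwise AND with an int32 fast path and a generic slow path. It is an illustration only, not the code in this patch: `Value`, `toInt32`, `slowToInt32`, and `bitAnd` are hypothetical names, `Value` is a stand-in for a real JS value, and string-to-number conversion is simplified to `strtod` (which is an assumption and does not match JavaScript in every case).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <variant>

// Hypothetical stand-in for a JS value; not JavaScriptCore's JSValue.
using Value = std::variant<int32_t, double, std::string>;

// ECMAScript-style ToInt32 on a double: NaN and infinities map to 0,
// otherwise truncate toward zero and wrap modulo 2^32.
inline int32_t toInt32(double number)
{
    if (!std::isfinite(number))
        return 0;
    double wrapped = std::fmod(std::trunc(number), 4294967296.0); // in (-2^32, 2^32)
    if (wrapped < 0)
        wrapped += 4294967296.0; // now in [0, 2^32)
    return static_cast<int32_t>(static_cast<uint32_t>(wrapped));
}

// Slow path: convert any operand type to int32.
inline int32_t slowToInt32(const Value& value)
{
    if (const auto* number = std::get_if<double>(&value))
        return toInt32(*number);
    if (const auto* text = std::get_if<std::string>(&value))
        return toInt32(std::strtod(text->c_str(), nullptr)); // simplified conversion (assumption)
    return std::get<int32_t>(value);
}

inline int32_t bitAnd(const Value& left, const Value& right)
{
    const auto* l = std::get_if<int32_t>(&left);
    const auto* r = std::get_if<int32_t>(&right);
    if (l && r)
        return *l & *r; // fast path: both operands are already int32
    return slowToInt32(left) & slowToInt32(right); // slow path: polymorphic operands
}
```

In this sketch, `bitAnd(Value{std::string("12")}, Value{10.9})` takes the slow path and computes `12 & 10`, which is 8, the same result the fast path gives for two int32 operands 12 and 10.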

We also see the following progression:

   string-repeat-arith                               34.7181+-1.2319     ^     28.4572+-0.7629        ^ definitely 1.2200x faster

string-repeat-arith also happens to be a test function that exercises 3 of the bitops on a string.  As a result, we are now able to DFG and FTL compile the test function and realize some additional gains.

On 32-bit x86, the following progression was consistently reproducible:

   Int16Array-to-Int32Array-set                      77.5821+-5.4359     ^     66.3774+-2.2837        ^ definitely 1.1688x faster

However, the test does not make use of any of the bitops, at least not in the test functions themselves.  Considering that this only manifests on x86 and not x86_64 (and I didn't do anything to optimize for x86 more than x86_64), the gains could be just due to cache line alignment effects.
Comment 12 Mark Lam 2015-12-14 12:50:46 PST
Comment on attachment 267308 [details]
proposed patch with appropriate style fixes.

Now ready for review.
Comment 13 Saam Barati 2015-12-15 11:27:11 PST
Comment on attachment 267308 [details]
proposed patch with appropriate style fixes.

View in context: https://bugs.webkit.org/attachment.cgi?id=267308&action=review

r=me with comments and suggestions

> Source/JavaScriptCore/ChangeLog:101
> +          sizes values later in another patch once all snippet ICs have been added.

We won't have to worry about this with B3.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:2884
> +    return;

not needed.

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:182
> +    auto numberOfBytesUsedToPreserveReusedRegisters =
> +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);

style: I think this is easier to read as "unsigned"

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);

Your ExtraStackSpace parameter here doesn't match the parameter to preserveReusedRegistersByPushing.
This also means that no tests are actually hitting the path where we actually spill anything. It might be worth writing such a test.

> Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:218
> +    auto numberOfBytesUsedToPreserveReusedRegisters =
> +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);

ditto.
(Bias disclosure: I almost never like the use of auto)
Comment 14 Mark Lam 2015-12-15 13:13:48 PST
Created attachment 267388 [details]
Patch for landing.

Thanks for the review.

(In reply to comment #13)
> > Source/JavaScriptCore/ChangeLog:101
> > +          sizes values later in another patch once all snippet ICs have been added.
> 
> We won't have to worry about this with B3.
> 
> > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:2884
> > +    return;
> 
> not needed.

Removed.

> > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:182
> > +    auto numberOfBytesUsedToPreserveReusedRegisters =
> > +        allocator.preserveReusedRegistersByPushing(jit, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);
> 
> style: I think this is easier to read as "unsigned"

Fixed, and also in generateRightShiftFastPath() and generateBinaryArithOpFastPath().
 
> > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> > +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> > +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);
> 
> Your ExtraStackSpace parameter here doesn't match the parameter to
> preserveReusedRegistersByPushing.
> This also means that no tests are actually hitting the path where we
> actually spill anything. It might be worth writing such a test.

Fixed.  Should be using ExtraStackSpace::NoExtraSpace.  Also did the same in generateRightShiftFastPath() and generateBinaryArithOpFastPath().

As for the test you had in mind, it requires enough register pressure to trigger the issue.  Instead, I'll write a separate patch later to ensure that we always call restoreReusedRegistersByPopping() with the same parameters as the preserveReusedRegistersByPushing() call (perhaps with RAII).

Will land this patch with the above fixes.
Comment 15 Mark Lam 2015-12-15 13:21:18 PST
Landed in r194113: <http://trac.webkit.org/r194113>
Comment 16 Mark Lam 2015-12-15 14:45:04 PST
(In reply to comment #14)
> (In reply to comment #13)
> > > Source/JavaScriptCore/ftl/FTLCompileBinaryOp.cpp:192
> > > +    allocator.restoreReusedRegistersByPopping(jit, numberOfBytesUsedToPreserveReusedRegisters,
> > > +        ScratchRegisterAllocator::ExtraStackSpace::SpaceForCCall);
> > 
> > Your ExtraStackSpace parameter here doesn't match the parameter to
> > > preserveReusedRegistersByPushing.
> > This also means that no tests are actually hitting the path where we
> > actually spill anything. It might be worth writing such a test.
> 
> ...  Instead, I'll write a separate patch later to ensure
> that we will always call restoreReusedRegistersByPopping() with the same
> parameters as the preserveReusedRegistersByPushing() call (perhaps with
> RAII).

RAII won't work.  I'll have preserveReusedRegistersByPushing() return a token that restoreReusedRegistersByPopping() uses.  That way we ensure that restoreReusedRegistersByPopping() always gets the right values it needs.  See https://bugs.webkit.org/show_bug.cgi?id=152315.
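
The token idea can be sketched roughly as follows. This is a toy model under stated assumptions, not the change made in bug 152315 and not the real ScratchRegisterAllocator: `ToyAllocator`, `PreservedState`, the `registerCount` parameter, and the 8-byte and 32-byte sizes are all hypothetical, and the real functions take a `jit` argument that is omitted here. The point is only that the restore call receives everything it needs from the value returned by the preserve call, so the two cannot disagree about ExtraStackSpace.

```cpp
#include <cassert>

enum class ExtraStackSpace { NoExtraSpace, SpaceForCCall };

// Hypothetical token: records what the preserve call actually did.
struct PreservedState {
    unsigned numberOfBytes;
    ExtraStackSpace extraSpace;
};

// Toy stand-in for an allocator that tracks a simulated stack depth.
class ToyAllocator {
public:
    PreservedState preserveReusedRegistersByPushing(unsigned registerCount, ExtraStackSpace extraSpace)
    {
        unsigned bytes = registerCount * 8; // assumed 8 bytes per pushed register
        if (extraSpace == ExtraStackSpace::SpaceForCCall)
            bytes += 32; // assumed padding for a C call; value is illustrative
        m_stackDepth += bytes;
        return PreservedState { bytes, extraSpace };
    }

    // The caller passes the token back; there is no second ExtraStackSpace
    // argument that could mismatch the one used when pushing.
    void restoreReusedRegistersByPopping(const PreservedState& state)
    {
        assert(state.numberOfBytes <= m_stackDepth);
        m_stackDepth -= state.numberOfBytes;
    }

    unsigned stackDepth() const { return m_stackDepth; }

private:
    unsigned m_stackDepth { 0 };
};
```

With this shape, a caller writes `auto state = allocator.preserveReusedRegistersByPushing(...)` and later `allocator.restoreReusedRegistersByPopping(state)`, and the simulated stack depth returns to its starting value regardless of which ExtraStackSpace was requested.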
Comment 17 Mark Lam 2015-12-15 21:56:30 PST
Also landed gardening fix for 32-bit JSC tests in r194131: <http://trac.webkit.org/r194131>.