Bug 150038 - Use JITSubGenerator to support UntypedUse operands for op_sub in the DFG
Summary: Use JITSubGenerator to support UntypedUse operands for op_sub in the DFG
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-12 11:14 PDT by Mark Lam
Modified: 2015-10-16 16:27 PDT (History)
8 users (show)

See Also:


Attachments
patch 1: almost there. (15.09 KB, patch)
2015-10-12 11:17 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
patch 2 (25.63 KB, patch)
2015-10-13 11:02 PDT, Mark Lam
ggaren: review-
Details | Formatted Diff | Diff
patch 2 32-bit benchmark results (63.12 KB, text/plain)
2015-10-13 11:13 PDT, Mark Lam
no flags Details
patch 2 64-bit benchmark results (64.16 KB, text/plain)
2015-10-13 11:13 PDT, Mark Lam
no flags Details
patch 3 (30.96 KB, patch)
2015-10-15 14:34 PDT, Mark Lam
ggaren: review+
Details | Formatted Diff | Diff
patch3: 32-bit benchmark results 1 (63.85 KB, text/plain)
2015-10-15 15:47 PDT, Mark Lam
no flags Details
patch3: 32-bit benchmark results 2 (63.81 KB, text/plain)
2015-10-15 15:48 PDT, Mark Lam
no flags Details
patch3: 64 -bit benchmark results 1 (64.32 KB, text/plain)
2015-10-15 15:48 PDT, Mark Lam
no flags Details
patch3: 64 -bit benchmark results 2 (64.17 KB, text/plain)
2015-10-15 15:48 PDT, Mark Lam
no flags Details
patch for landing. (30.72 KB, patch)
2015-10-16 16:03 PDT, Mark Lam
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2015-10-12 11:14:04 PDT
Patch coming.
Comment 1 Mark Lam 2015-10-12 11:17:00 PDT
Created attachment 262906 [details]
patch 1: almost there.

This patch works and passes tests so far.  Still need to reason a bit more about how I compute the ResultType.  Also need to run benchmarks.
Comment 2 Mark Lam 2015-10-13 11:02:44 PDT
Created attachment 262994 [details]
patch 2
Comment 3 Mark Lam 2015-10-13 11:13:19 PDT
Created attachment 262996 [details]
patch 2 32-bit benchmark results
Comment 4 Mark Lam 2015-10-13 11:13:48 PDT
Created attachment 262997 [details]
patch 2 64-bit benchmark results
Comment 5 Mark Lam 2015-10-13 11:16:13 PDT
In the benchmark results, string-sub (of JSRegress) shows a significant speed up.  On 32-bit, int-or-other-sub (of JSRegress) shows a significant slow down.  I'll re-run the benchmarks to make sure, and also look into int-or-other-sub to see what's happening here.
Comment 6 Geoffrey Garen 2015-10-13 11:59:00 PDT
Comment on attachment 262994 [details]
patch 2

View in context: https://bugs.webkit.org/attachment.cgi?id=262994&action=review

Code looks generally correct, with some comments below.

Please add a regression test that verifies we have a speedup in some case where subtraction previously caused the DFG problems. (Or does your current test do this? If so, please post its result.)

> Source/JavaScriptCore/dfg/DFGAbstractValue.cpp:497
> +ResultType AbstractValue::asResultType() const

I would call this "resultType" instead of "asResultType". We use "as" to mean a cast, but this is a just an access to a field, with a type conversion.

> Source/JavaScriptCore/dfg/DFGAbstractValue.cpp:512
> +    ResultType::Type bits = 0;
> +    if (isType(SpecInt32))
> +        bits |= (ResultType::TypeInt32 | ResultType::TypeMaybeNumber);
> +    if (isType(SpecBytecodeDouble))
> +        bits |= ResultType::TypeMaybeNumber;
> +    if (isType(SpecString))
> +        bits |= ResultType::TypeMaybeString;
> +    if (isType(SpecBoolean))
> +        bits |= ResultType::TypeMaybeBool;
> +    if (isType(SpecOther))
> +        bits |= ResultType::TypeMaybeNull;
> +    if (!bits)
> +        bits |= ResultType::TypeMaybeOther;
> +    return ResultType(bits);

This code duplicates the logic of the static functions in the ResultType class. This logic is involved, and somewhat surprising at first read. So, I think it's a liability to duplicate this stuff. You should use the ResultType helper functions instead.

For example, if I know that I'm an Int32, it's pretty surprising that I need to "ResultType::TypeInt32 | ResultType::TypeMaybeNumber". But it wouldn't be surprising at all to "ResultType::numberTypeIsInt32()".

Once you switch to helper functions, you'll notice that you have a small problem with undefined. SpecOther means "undefined or null". But your ResultType only says "null". So, you need to add a resultType helper function for what to do with undefined.

I think SpecBytecodeNumber misses some cases of number -- namely, Int52 and impure NaN. Don't you want SpecFullNumber?

You shouldn't use |= logic here. "isType" means "I am this and only this". So, you will never |= with anything else.

> Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp:1464
> +        case UntypedUse:
> +            compileForceOSRExit();
> +            break;

I think I suggested the force OSR exit here -- but I think it would be better to edit FT::canCompile instead so that we don't waste time FTL compiling such a function in the first place.

> Source/JavaScriptCore/jit/AssemblyHelpers.h:969
> +    void unboxDouble(JSValueRegs regs, FPRReg destFPR, GPRReg scratchGPR, FPRReg)

Instead of a comment, let's use naming to distinguish from the destructive function. (In future, as you've suggested, we can use a type, which is even better than a name or a comment.)

> Source/JavaScriptCore/jit/AssemblyHelpers.h:1018
> +    // This version is non-destructive to the incoming regs.
> +    void unboxDouble(JSValueRegs regs, FPRReg destFPR, GPRReg, FPRReg scratchFPR)

Ditto.
Comment 7 Geoffrey Garen 2015-10-13 12:06:54 PDT
> Please add a regression test that verifies we have a speedup in some case
> where subtraction previously caused the DFG problems.

It looks like string-sub does just enough to test this.
Comment 8 Mark Lam 2015-10-13 15:27:37 PDT
(In reply to comment #6)
> Please add a regression test that verifies we have a speedup in some case
> where subtraction previously caused the DFG problems. (Or does your current
> test do this? If so, please post its result.)

Yes, string-sub.js (in JSRegress) should exercise this code.  I'll take a look to see if I need to add something more.

> > Source/JavaScriptCore/dfg/DFGAbstractValue.cpp:497
> > +ResultType AbstractValue::asResultType() const
> 
> I would call this "resultType" instead of "asResultType". We use "as" to
> mean a cast, but this is a just an access to a field, with a type conversion.

Fixed.

> > Source/JavaScriptCore/dfg/DFGAbstractValue.cpp:512
> ...
> This code duplicates the logic of the static functions in the ResultType
> class. This logic is involved, and somewhat surprising at first read. So, I
> think it's a liability to duplicate this stuff. You should use the
> ResultType helper functions instead.

From re-reading the code, I see that the resultType values are only every set once during Node creation in the BytecodeGenerator.  I've changed the ResultType struct to reflect this by making its constructor private.  This ensures that I don't have to check for cases where the ResultType::Type bits are OR'ed together other than those which ResultType already provided factory methods for.  With that, I can now switch to using the ResultType factory methods only.

> Once you switch to helper functions, you'll notice that you have a small
> problem with undefined. SpecOther means "undefined or null". But your
> ResultType only says "null". So, you need to add a resultType helper
> function for what to do with undefined.

It appears that there is currently no way to test for is definitely TypeMaybeNull.  The snippets also currently don't need to know if the type is null.  I'll punt on this for now by lumping it into ResultType::unknownType() (which does include TypeMaybeNull).  If there's a need in the future, we can re-visit this.

> I think SpecBytecodeNumber misses some cases of number -- namely, Int52 and
> impure NaN. Don't you want SpecFullNumber?

No.  The ResultType is for JSValues, which do not include Int52 and impure NaNs.  Similarly, the snippet generators which work with ResultType will not be able to handle Int52 and impure NaNs (and the baseline JIT code which they come from don't either).  We could re-write the snippet generators to be based on SpeculatedType instead of ResultType, and have the BaselineJIT map ResultType to SpeculatedType instead.  However, to stay on task with the migration to snippets, I suggest holding off on adding supporting for Int52 and NaNs, and only do so in a subsequent pass if needed.

Back to the result type conversion here, we actually want SpecBytecodeNumber.  Patch 2 was erroneously using SpecBytecodeDouble.  Will fix it to use SpecBytecodeNumber instead.
 
> You shouldn't use |= logic here. "isType" means "I am this and only this".
> So, you will never |= with anything else.

Got it, and I'm now enforcing it by making the ResultType constructor private.

> > Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp:1464
> > +        case UntypedUse:
> > +            compileForceOSRExit();
> > +            break;
> 
> I think I suggested the force OSR exit here -- but I think it would be
> better to edit FT::canCompile instead so that we don't waste time FTL
> compiling such a function in the first place.

Fixed.

> > Source/JavaScriptCore/jit/AssemblyHelpers.h:969
> > +    void unboxDouble(JSValueRegs regs, FPRReg destFPR, GPRReg scratchGPR, FPRReg)
> 
> Instead of a comment, let's use naming to distinguish from the destructive
> function. (In future, as you've suggested, we can use a type, which is even
> better than a name or a comment.)
> 
> > Source/JavaScriptCore/jit/AssemblyHelpers.h:1018
> > +    // This version is non-destructive to the incoming regs.
> > +    void unboxDouble(JSValueRegs regs, FPRReg destFPR, GPRReg, FPRReg scratchFPR)
> 
> Ditto.

Fixed.  I renamed them to unboxDoubleNonDestructive().  I'm not crazy about that name, but it should do until I can introduce read-only registers.
Comment 9 Geoffrey Garen 2015-10-13 16:23:31 PDT
> > I think SpecBytecodeNumber misses some cases of number -- namely, Int52 and
> > impure NaN. Don't you want SpecFullNumber?
> 
> No.  The ResultType is for JSValues, which do not include Int52 and impure
> NaNs.  Similarly, the snippet generators which work with ResultType will not
> be able to handle Int52 and impure NaNs (and the baseline JIT code which
> they come from don't either).  We could re-write the snippet generators to
> be based on SpeculatedType instead of ResultType, and have the BaselineJIT
> map ResultType to SpeculatedType instead.  However, to stay on task with the
> migration to snippets, I suggest holding off on adding supporting for Int52
> and NaNs, and only do so in a subsequent pass if needed.

Yes, it's the right call to stay on task and hold off on adding to snippets.

But let's consider Int52 for a minute. Let's say my input is speculated to be Int52. Of course there's no reason to tell the snippet generator "Int52", since it doesn't know what Int52 means. But surely, since the snippet does not understand Int52, before I run the snippet, I must convert my input to JSValue, right? If so, is the snippet generator not guaranteed to see a JSValue that is a number?

The same goes for signaling NaN. If I have one, don't I need to convert to non-signaling NaN JSValue before running the snippet?
Comment 10 Mark Lam 2015-10-14 09:39:16 PDT
(In reply to comment #9)
> But let's consider Int52 for a minute. Let's say my input is speculated to
> be Int52. Of course there's no reason to tell the snippet generator "Int52",
> since it doesn't know what Int52 means. But surely, since the snippet does
> not understand Int52, before I run the snippet, I must convert my input to
> JSValue, right? If so, is the snippet generator not guaranteed to see a
> JSValue that is a number?
>
> The same goes for signaling NaN. If I have one, don't I need to convert to
> non-signaling NaN JSValue before running the snippet?

If I understand you correctly, you're telling me that I should add code to the DFG ArithSub generator to convert Int52s and impure NaNs to JSValues before calling the snippet.  I'll look into that.
Comment 11 Geoffrey Garen 2015-10-14 09:47:03 PDT
> If I understand you correctly, you're telling me that I should add code to
> the DFG ArithSub generator to convert Int52s and impure NaNs to JSValues
> before calling the snippet.  I'll look into that.

I think this is automatic by virtue of UntypedUse providing JSValues. (See DFGUseKind.h.)
Comment 12 Filip Pizlo 2015-10-14 10:13:57 PDT
(In reply to comment #11)
> > If I understand you correctly, you're telling me that I should add code to
> > the DFG ArithSub generator to convert Int52s and impure NaNs to JSValues
> > before calling the snippet.  I'll look into that.
> 
> I think this is automatic by virtue of UntypedUse providing JSValues. (See
> DFGUseKind.h.)

Exactly. 

There is no way for an UntypedUse edge to see either Int52 or those parts of FullNumber that aren't in BytecodeNunber. UntyprdUse means that you are dealing with a JSValye. 

There is no value in trying convert Int52 or non-BytecodeNunber numbers to JsVavlues as some kind of special case of this code. The compiler already converts int52s and non-bytecode numbers to JSValues if they get used by a UntypedUse.
Comment 13 Geoffrey Garen 2015-10-14 10:18:36 PDT
Thinking about this more, the ResultType accessor should probably ASSERT if your AbstractValue is not a JSValue type, since ResultType is a bytecode concept, and it implies being a JSValue. A ResultType that corresponds to Int52 is a contradiction.
Comment 14 Mark Lam 2015-10-14 10:45:59 PDT
(In reply to comment #12)
> There is no way for an UntypedUse edge to see either Int52 or those parts of
> FullNumber that aren't in BytecodeNunber. UntyprdUse means that you are
> dealing with a JSValye. 

OK, I think I see what you guys are getting at.  There's a detail here that I missed, and therefore my patch has a bug too.  In the fixup phase, I only fixed the subtract node to be of type NodeResultJS, but I neglected to fix up the child edges to be of UntypedUse.  That bug will be fixed in my next patch. 

(In reply to comment #13)
> Thinking about this more, the ResultType accessor should probably ASSERT if
> your AbstractValue is not a JSValue type, since ResultType is a bytecode
> concept, and it implies being a JSValue. A ResultType that corresponds to
> Int52 is a contradiction.

Sounds good.  Will do.
Comment 15 Filip Pizlo 2015-10-14 12:03:05 PDT
(In reply to comment #13)
> Thinking about this more, the ResultType accessor should probably ASSERT if
> your AbstractValue is not a JSValue type, since ResultType is a bytecode
> concept, and it implies being a JSValue. A ResultType that corresponds to
> Int52 is a contradiction.

Yeah, you could assert that the AbstractValue type is a subset of SpecBytecodeTop, or even stricter, SpecHeapTop. Like this:

ASSERT(isType(SpecHeapTop));
Comment 16 Mark Lam 2015-10-15 14:34:01 PDT
Created attachment 263191 [details]
patch 3

Patch 3 incorporates all the feedback, and have passed all the tests.  Preliminary benchmark results seem to be noisy.  I'm re-running the benchmarks now with a cleanly booted machine to see if I can more consistent results.
Comment 17 Mark Lam 2015-10-15 15:47:53 PDT
Created attachment 263206 [details]
patch3: 32-bit benchmark results 1
Comment 18 Mark Lam 2015-10-15 15:48:12 PDT
Created attachment 263207 [details]
patch3: 32-bit benchmark results 2
Comment 19 Mark Lam 2015-10-15 15:48:37 PDT
Created attachment 263208 [details]
patch3: 64 -bit benchmark results 1
Comment 20 Mark Lam 2015-10-15 15:48:57 PDT
Created attachment 263209 [details]
patch3: 64 -bit benchmark results 2
Comment 21 Mark Lam 2015-10-15 15:52:35 PDT
From the patch 3 benchmark results, the aggregate results all show no significant difference.  The only individual values that are "definitely" different are all from JSRegress as follows:

32-bit Run 1:
JSRegress
   arguments-strict-mode                            153.8041+-5.4704     ^    146.4424+-0.8439        ^ definitely 1.0503x faster
   fold-double-to-int                                43.5707+-2.4937     !     47.8533+-0.9335        ! definitely 1.0983x slower
   string-sub                                       186.9781+-7.3105     ^    149.8940+-4.3438        ^ definitely 1.2474x faster

32-bit Run 2:
JSRegress
   infer-closure-const-then-mov                      23.7246+-1.0659     !     25.1477+-0.2112        ! definitely 1.0600x slower
   string-sub                                       184.3235+-4.9505     ^    147.9168+-3.5522        ^ definitely 1.2461x faster

64-bit Run 1:
JSRegress
   create-lots-of-functions                          13.2767+-0.8298     !     14.8483+-0.3443        ! definitely 1.1184x slower
   get-by-id-chain-from-try-block                     3.0848+-0.0438     !      3.2379+-0.1021        ! definitely 1.0496x slower
   in-one-case-true                                  16.6160+-1.1027     ^     14.3440+-0.7501        ^ definitely 1.1584x faster
   string-repeat-arith                               35.5699+-1.0970     ^     34.0988+-0.3381        ^ definitely 1.0431x faster
   string-sub                                        74.8361+-5.4934     ^     47.2620+-2.7527        ^ definitely 1.5834x faster

64-bit Run 2:
JSRegress
   array-prototype-map                              110.7988+-1.9023     !    116.7427+-3.8079        ! definitely 1.0536x slower
   Float64Array-to-Int16Array-set                    82.3625+-2.4934     !     94.1563+-2.6538        ! definitely 1.1432x slower
   get_callee_polymorphic                             4.4044+-0.2873     !      5.2390+-0.5333        ! definitely 1.1895x slower
   repeat-multi-get-by-offset                        27.7042+-0.5242     !     30.7776+-0.5496        ! definitely 1.1109x slower
   richards-try-catch                               330.9247+-5.0731     ^    323.6898+-1.8041        ^ definitely 1.0224x faster
   sink-function                                     13.2079+-1.9366     ^     10.7634+-0.1808        ^ definitely 1.2271x faster
   string-sub                                        74.4352+-3.4954     ^     47.4148+-3.4284        ^ definitely 1.5699x faster
   substring-concat                                  51.1590+-1.2356     !     57.2363+-4.2831        ! definitely 1.1188x slower

There is no consistency in the results except for string-sub which hows a 24% speed up on 32-bit and about 57% speed up on 64-bit.  The string-sub code does exercise the code that was change in this patch.
Comment 22 Geoffrey Garen 2015-10-15 16:27:12 PDT
Comment on attachment 263191 [details]
patch 3

View in context: https://bugs.webkit.org/attachment.cgi?id=263191&action=review

r=me with some fixes below

> Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:188
> +            if (op == ArithSub
> +                && (node->child1()->shouldSpeculateUntypedForArithmetic()
> +                    || node->child2()->shouldSpeculateUntypedForArithmetic())

This test asks, "*can* either side be non-number?"

We want to ask "*is* either side known to be non-number?"

For values that can be number or not, we think it's optimal to try number first, and then recompile the generic path if we exit.

> Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:189
> +                && m_graph.hasExitSite(node->origin.semantic, BadType)) {

Let's change this to ||. In other words, if, for any reason, speculation has been causing enough OSR exits to recompile, then the generic code is best.

> Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:191
> +                fixEdge<UntypedUse>(node->child2());

Please fix child1 edge.

> Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:194
> +            }

So, to summarize, the test we want is:

if (op == ArithSub
    && (shouldSpeculateUntypedForArithmetic(node->child1(), node->child2() || m_graph.hasExitSite(node->origin.semantic, BadType))

and for shouldSpeculateUntypedForArithmetic:
    check both nodes

and for Node::shouldSpeculateUntypedForArithmetic
    return !(value & SpecFullNumber);

> Source/WTF/wtf/Platform.h:867
> +#define ENABLE_MASM_PROBE 1

Revert.
Comment 23 Mark Lam 2015-10-16 16:03:31 PDT
Created attachment 263344 [details]
patch for landing.

(In reply to comment #22)
> and for Node::shouldSpeculateUntypedForArithmetic
>     return !(value & SpecFullNumber);

Turns out we need to check for SpecBoolean as well:

     return !(value & (SpecFullNumber | SpecBoolean));

There were 2 JSRegress components (minus-boolean.js and minus-boolean-double.js) that showed significant degradations without this.  Adding the check for SpecBoolean makes them happy again.

I will land as soon as my test and full benchmark runs complete.
Comment 24 Mark Lam 2015-10-16 16:27:33 PDT
Landed in r191224: <http://trac.webkit.org/r191224>.