Bug 177281

Summary: Add Above/Below comparisons for UInt32 patterns
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: New BugsAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, commit-queue, darin, fpizlo, keith_miller, mark.lam, msaboff, saam, webkit-bug-importer, zan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 177507, 177720    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews116 for mac-elcapitan
none
WIP
none
WIP
none
Patch
none
Patch
none
Patch for landing
none
Patch none

Description Yusuke Suzuki 2017-09-20 16:47:07 PDT
Add Above/Below comparisons for UInt32 patterns
Comment 1 Yusuke Suzuki 2017-09-20 17:06:25 PDT
Created attachment 321391 [details]
Patch
Comment 2 Yusuke Suzuki 2017-09-20 17:07:32 PDT
Spawn from bug 177100. WIP, adding tests now. But ready for early review.
Comment 3 Yusuke Suzuki 2017-09-20 17:11:48 PDT
Comment on attachment 321391 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=321391&action=review

> Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1144
> +                        // TODO: Handle CompareAbove etc.

I'll do this.

> Source/JavaScriptCore/dfg/DFGMayExit.cpp:102
> +    case CompareAboveEq:

This is correct, but I don't think they are necessary. I'll drop this part.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:2543
> +    case CompareBelow:
> +        if (compare(node, JITCompiler::Below, JITCompiler::DoubleLessThan, operationCompareLess))
> +            return;
> +        break;
> +
> +    case CompareBelowEq:
> +        if (compare(node, JITCompiler::BelowOrEqual, JITCompiler::DoubleLessThanOrEqual, operationCompareLessEq))
> +            return;
> +        break;
> +
> +    case CompareAbove:
> +        if (compare(node, JITCompiler::Above, JITCompiler::DoubleGreaterThan, operationCompareGreater))
> +            return;
> +        break;
> +
> +    case CompareAboveEq:
> +        if (compare(node, JITCompiler::AboveOrEqual, JITCompiler::DoubleGreaterThanOrEqual, operationCompareGreaterEq))
> +            return;
> +        break;

I'll add `compareUnsigned`, which only accepts KnownInt32 binary uses and perform unsigned comparison.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2685
> +        break;

Ditto.
Comment 4 Build Bot 2017-09-20 18:04:22 PDT
Comment on attachment 321391 [details]
Patch

Attachment 321391 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4610048

Number of test failures exceeded the failure limit.
Comment 5 Build Bot 2017-09-20 18:04:23 PDT
Created attachment 321398 [details]
Archive of layout-test-results from ews116 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 6 Yusuke Suzuki 2017-09-21 06:53:47 PDT
Created attachment 321425 [details]
WIP
Comment 7 Yusuke Suzuki 2017-09-21 07:34:15 PDT
Created attachment 321427 [details]
WIP
Comment 8 Yusuke Suzuki 2017-09-21 09:33:11 PDT
Created attachment 321436 [details]
Patch
Comment 9 Yusuke Suzuki 2017-09-23 05:40:51 PDT
Created attachment 321629 [details]
Patch

Just updating ChangeLog
Comment 10 Darin Adler 2017-09-23 17:12:29 PDT
Comment on attachment 321629 [details]
Patch

To me, nothing about the names "above" and "below" say "treat negative numbers as if they were large numbers" or "treat as unsigned without checking". I would consider names that make that explicit, maybe a "u" added to the existing names?
Comment 11 Yusuke Suzuki 2017-09-23 17:24:35 PDT
(In reply to Darin Adler from comment #10)
> Comment on attachment 321629 [details]
> Patch
> 
> To me, nothing about the names "above" and "below" say "treat negative
> numbers as if they were large numbers" or "treat as unsigned without
> checking". I would consider names that make that explicit, maybe a "u" added
> to the existing names?

`above` / `below` is used in assembly lang as unsigned comparison, but using op_uless / op_ugreater is OK to me.
Comment 12 Saam Barati 2017-09-23 18:43:28 PDT
(In reply to Darin Adler from comment #10)
> Comment on attachment 321629 [details]
> Patch
> 
> To me, nothing about the names "above" and "below" say "treat negative
> numbers as if they were large numbers" or "treat as unsigned without
> checking". I would consider names that make that explicit, maybe a "u" added
> to the existing names?

I agree with your intuition and reasoning here, but I think it’s more important to be consistent with the rest of JSC here. We adopt various assembly language conventions and use Above to mean unsigned comparison and GreaterThan to mean signed comparison throughout the various compilers and MacroAssemblers.
Comment 13 Darin Adler 2017-09-23 18:53:25 PDT
(In reply to Saam Barati from comment #12)
> I agree with your intuition and reasoning here, but I think it’s more
> important to be consistent with the rest of JSC here. We adopt various
> assembly language conventions and use Above to mean unsigned comparison and
> GreaterThan to mean signed comparison throughout the various compilers and
> MacroAssemblers.

Good to know.

My knowledge is out of date. The instruction sets I know best, ones other than X86, call unsigned comparison "higher" and "lower" rather than "above" and "below". But I recall now that the signed comparisons were the only ones called "greater" and "less than" and I suppose I had forgotten that.

I see how this is following tradition, especially for people familiar with X86; sorry for the distraction.
Comment 14 Saam Barati 2017-09-23 19:16:53 PDT
(In reply to Darin Adler from comment #13)
> (In reply to Saam Barati from comment #12)
> > I agree with your intuition and reasoning here, but I think it’s more
> > important to be consistent with the rest of JSC here. We adopt various
> > assembly language conventions and use Above to mean unsigned comparison and
> > GreaterThan to mean signed comparison throughout the various compilers and
> > MacroAssemblers.
> 
> Good to know.
> 
> My knowledge is out of date. The instruction sets I know best, ones other
> than X86, call unsigned comparison "higher" and "lower" rather than "above"
> and "below". But I recall now that the signed comparisons were the only ones
> called "greater" and "less than" and I suppose I had forgotten that.
> 
> I see how this is following tradition, especially for people familiar with
> X86; sorry for the distraction.

FWIW, this distinction has caused cognitive overhead for me in the past. At some point in the last year or so I crossed over to these names intuitively meaning what they do, but before that, I had to keep looking it up. I’m not against us using better names than X86, but if we do it, I think we should pull the trigger and just do it everywhere. Perhaps there is enough inertia behind the names that it’s not actually worth changing. My main comment here is just that I think it’s best to be consistent within JSC.
Comment 15 Filip Pizlo 2017-09-23 19:34:04 PDT
(In reply to Saam Barati from comment #14)
> (In reply to Darin Adler from comment #13)
> > (In reply to Saam Barati from comment #12)
> > > I agree with your intuition and reasoning here, but I think it’s more
> > > important to be consistent with the rest of JSC here. We adopt various
> > > assembly language conventions and use Above to mean unsigned comparison and
> > > GreaterThan to mean signed comparison throughout the various compilers and
> > > MacroAssemblers.
> > 
> > Good to know.
> > 
> > My knowledge is out of date. The instruction sets I know best, ones other
> > than X86, call unsigned comparison "higher" and "lower" rather than "above"
> > and "below". But I recall now that the signed comparisons were the only ones
> > called "greater" and "less than" and I suppose I had forgotten that.
> > 
> > I see how this is following tradition, especially for people familiar with
> > X86; sorry for the distraction.
> 
> FWIW, this distinction has caused cognitive overhead for me in the past. At
> some point in the last year or so I crossed over to these names intuitively
> meaning what they do, but before that, I had to keep looking it up. I’m not
> against us using better names than X86, but if we do it, I think we should
> pull the trigger and just do it everywhere. Perhaps there is enough inertia
> behind the names that it’s not actually worth changing. My main comment here
> is just that I think it’s best to be consistent within JSC.

Yeah... in my case I was already fluent in this terminology before hacking on WebKit and I was super happy that WebKit used these terms. 

I’ve also hacked on systems that say ULessThan/UGreaterThan. I didn’t like that as much. If I had to guess why, I would say it’s because these are super common operations internally and having one-word names makes it easier to speak about it (below is two syllables, unsigned less than is four).
Comment 16 Yusuke Suzuki 2017-09-25 09:44:42 PDT
OK, for now, we use above/below :)
Comment 17 Yusuke Suzuki 2017-09-26 10:22:31 PDT
Ping?
Comment 18 Saam Barati 2017-09-26 11:32:58 PDT
Comment on attachment 321629 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=321629&action=review

r=me
I'm not entirely convinced we want all this machinery for a very specific type of program written in a very specific way at the JS source level. That said, I don't want to keep you from landing this.

I still wonder if a better approach would've been to teach OSR exit in a way similar to it undoing math operations that we eliminate Uint32ToNumber and just perform the operation upon exit.

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005
> +                static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true;

I'd assert here it's urshift

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007
> +                static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true;

ditto

> Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143
> +                        // FIXME: Handle CompareBelow and CompareBelowEq.

Please remove or link to a bug.

> Source/JavaScriptCore/parser/Nodes.h:1084
> +        bool m_notCastUnsigned { false };

Name nit: I think I'd flip the boolean state of this variable and maybe name it as:
bool m_shouldToUnsignedResult { true }
and the pattern matching code can set it to false
I think that'll be more clear from reading the code
Comment 19 Yusuke Suzuki 2017-09-26 11:59:30 PDT
Comment on attachment 321629 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=321629&action=review

Thanks. I think this patch is worth introducing because of the nature of JS.
JS does not support uint32 efficiently. But it is not a large problem since we can do int32 operations.
add, sub etc. are OK since we can perform int32 operations and recognize the resulted value as uint32 bit patterns.

But one problematic thing is comparison. It is frequently executed and it recognizes unsignedness.
So, performing >>> 0 operation when doing comparison is one of the frequently seen pattern in JS, and actually it is shown in Octane/zlib. So I think this optimization covers large part of JS programs using uint32 comparison.

In addition, we also introduced DFG strength reduction rule for unsigned comparison. So once our DFG MovHint eliminations's conservative analysis is fixed, these UInt32ToNumber will be removed, which is kept by MovHint. I think this could remove large part of UInt32ToNumber in the future.

After that, I think we could have UInt32ToNumber yet. They can be removed by using somewhat  PhantoUInt32ToNumber which will materialize UInt32ToNumber resulted value when performing OSR exit. But I think this requires OSR availability analysis for MovHint analysis, so it is only done in FTL.

So, I think this optimization is worth introducing since it can improve some patterns frequently seen in JS supported by this reasonable story.

>> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005
>> +                static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true;
> 
> I'd assert here it's urshift

Fixed.

>> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007
>> +                static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true;
> 
> ditto

Fixed.

>> Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143
>> +                        // FIXME: Handle CompareBelow and CompareBelowEq.
> 
> Please remove or link to a bug.

Added a bug link for this.

>> Source/JavaScriptCore/parser/Nodes.h:1084
>> +        bool m_notCastUnsigned { false };
> 
> Name nit: I think I'd flip the boolean state of this variable and maybe name it as:
> bool m_shouldToUnsignedResult { true }
> and the pattern matching code can set it to false
> I think that'll be more clear from reading the code

Sounds nice. Fixed.
Comment 20 Yusuke Suzuki 2017-09-26 13:18:24 PDT
Created attachment 321860 [details]
Patch for landing
Comment 21 Yusuke Suzuki 2017-09-26 13:32:26 PDT
Committed r222518: <http://trac.webkit.org/changeset/222518>
Comment 22 WebKit Commit Bot 2017-09-26 14:21:54 PDT
Re-opened since this is blocked by bug 177507
Comment 23 Yusuke Suzuki 2017-09-26 17:22:01 PDT
(In reply to WebKit Commit Bot from comment #22)
> Re-opened since this is blocked by bug 177507

I'll land it later with a fix.
BTW, this patch actually improved performance of Octane/zlib by 5%.
https://arewefastyet.com/#machine=29&view=single&suite=octane&subtest=zlib
Comment 24 Yusuke Suzuki 2017-09-27 11:51:21 PDT
Committed r222564: <http://trac.webkit.org/changeset/222564>
Comment 25 Radar WebKit Bug Importer 2017-09-27 12:19:27 PDT
<rdar://problem/34693039>
Comment 26 Saam Barati 2017-10-01 10:13:24 PDT
This patch is a 50% slowdown on JetStream/bigfig on iOS. This is leading to a 2% JetStream regression on iOS.
Comment 27 WebKit Commit Bot 2017-10-01 10:20:08 PDT
Re-opened since this is blocked by bug 177720
Comment 28 Yusuke Suzuki 2017-10-02 07:36:46 PDT
(In reply to WebKit Commit Bot from comment #27)
> Re-opened since this is blocked by bug 177720

I've measured the execution time of bigfib.cpp in several configurations.

In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch rather poses performance progression.

                         baseline                  patched                                      

bigfib.cpp          570.4888+-37.6582         563.6416+-35.9494         might be 1.0121x faster

In my MacBook Pro, without any CPU configuration change, this patch rather improves performance by 1.8%.
With CPU configuration (disabling hyperthreading and using only 2 cores), it does not show any differences.

Saam, could you ensure the following two things?

1. After reverting this patch, the regression is fixed in iOS bot.
2. MacBook Pro / MacPro bots do not show regression due to this patch.
Comment 29 Filip Pizlo 2017-10-02 08:04:20 PDT
(In reply to Saam Barati from comment #18)
> Comment on attachment 321629 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=321629&action=review
> 
> r=me
> I'm not entirely convinced we want all this machinery for a very specific
> type of program written in a very specific way at the JS source level. That
> said, I don't want to keep you from landing this.
> 
> I still wonder if a better approach would've been to teach OSR exit in a way
> similar to it undoing math operations that we eliminate Uint32ToNumber and
> just perform the operation upon exit.

Yes. This. 

-Filip

> 
> > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005
> > +                static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true;
> 
> I'd assert here it's urshift
> 
> > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007
> > +                static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true;
> 
> ditto
> 
> > Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143
> > +                        // FIXME: Handle CompareBelow and CompareBelowEq.
> 
> Please remove or link to a bug.
> 
> > Source/JavaScriptCore/parser/Nodes.h:1084
> > +        bool m_notCastUnsigned { false };
> 
> Name nit: I think I'd flip the boolean state of this variable and maybe name
> it as:
> bool m_shouldToUnsignedResult { true }
> and the pattern matching code can set it to false
> I think that'll be more clear from reading the code
Comment 30 Saam Barati 2017-10-02 13:08:42 PDT
(In reply to Yusuke Suzuki from comment #28)
> (In reply to WebKit Commit Bot from comment #27)
> > Re-opened since this is blocked by bug 177720
> 
> I've measured the execution time of bigfib.cpp in several configurations.
> 
> In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch
> rather poses performance progression.
> 
>                          baseline                  patched                  
> 
> 
> bigfib.cpp          570.4888+-37.6582         563.6416+-35.9494        
> might be 1.0121x faster
> 
> In my MacBook Pro, without any CPU configuration change, this patch rather
> improves performance by 1.8%.
> With CPU configuration (disabling hyperthreading and using only 2 cores), it
> does not show any differences.
> 
> Saam, could you ensure the following two things?
> 
> 1. After reverting this patch, the regression is fixed in iOS bot.
iOS perf bots recovered this 2%.

> 2. MacBook Pro / MacPro bots do not show regression due to this patch.
I don't have enough data to answer this question yet.
Comment 31 Yusuke Suzuki 2017-10-03 05:43:06 PDT
(In reply to Saam Barati from comment #30)
> (In reply to Yusuke Suzuki from comment #28)
> > (In reply to WebKit Commit Bot from comment #27)
> > > Re-opened since this is blocked by bug 177720
> > 
> > I've measured the execution time of bigfib.cpp in several configurations.
> > 
> > In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch
> > rather poses performance progression.
> > 
> >                          baseline                  patched                  
> > 
> > 
> > bigfib.cpp          570.4888+-37.6582         563.6416+-35.9494        
> > might be 1.0121x faster
> > 
> > In my MacBook Pro, without any CPU configuration change, this patch rather
> > improves performance by 1.8%.
> > With CPU configuration (disabling hyperthreading and using only 2 cores), it
> > does not show any differences.
> > 
> > Saam, could you ensure the following two things?
> > 
> > 1. After reverting this patch, the regression is fixed in iOS bot.
> iOS perf bots recovered this 2%.
> 
> > 2. MacBook Pro / MacPro bots do not show regression due to this patch.
> I don't have enough data to answer this question yet.

I've a bit hacked in bug 177806. I extended Object Allocation Sinking phase to handle wider range of materialization (including UInt32ToNumber). Converting UInt32ToNumber instead. With this patch's DFG part, almost all UInt32ToNumber becomes PhantomUInt32ToNumber, which is no-op in FTL.

Actually, we get 2% improvement in the above approach. But this patch offers 5% improvement, it is better. I think this is because of LLInt/Baseline/DFG optimization effect in this patch. In the above approach, only FTL gets performance boost. But this patch's simple pattern matching can make LLInt/Baseline/DFG faster.

And this pattern can be seen in the actual code. So I think having this patch is good. Yeah, after that, we would like to introdue PhantomUInt32ToNumber thing too (in bug 177806).
Comment 32 Yusuke Suzuki 2017-10-11 04:47:11 PDT
(In reply to Yusuke Suzuki from comment #31)
> (In reply to Saam Barati from comment #30)
> > (In reply to Yusuke Suzuki from comment #28)
> > > (In reply to WebKit Commit Bot from comment #27)
> > > > Re-opened since this is blocked by bug 177720
> > > 
> > > I've measured the execution time of bigfib.cpp in several configurations.
> > > 
> > > In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch
> > > rather poses performance progression.
> > > 
> > >                          baseline                  patched                  
> > > 
> > > 
> > > bigfib.cpp          570.4888+-37.6582         563.6416+-35.9494        
> > > might be 1.0121x faster
> > > 
> > > In my MacBook Pro, without any CPU configuration change, this patch rather
> > > improves performance by 1.8%.
> > > With CPU configuration (disabling hyperthreading and using only 2 cores), it
> > > does not show any differences.
> > > 
> > > Saam, could you ensure the following two things?
> > > 
> > > 1. After reverting this patch, the regression is fixed in iOS bot.
> > iOS perf bots recovered this 2%.
> > 
> > > 2. MacBook Pro / MacPro bots do not show regression due to this patch.
> > I don't have enough data to answer this question yet.
> 
> I've a bit hacked in bug 177806. I extended Object Allocation Sinking phase
> to handle wider range of materialization (including UInt32ToNumber).
> Converting UInt32ToNumber instead. With this patch's DFG part, almost all
> UInt32ToNumber becomes PhantomUInt32ToNumber, which is no-op in FTL.
> 
> Actually, we get 2% improvement in the above approach. But this patch offers
> 5% improvement, it is better. I think this is because of LLInt/Baseline/DFG
> optimization effect in this patch. In the above approach, only FTL gets
> performance boost. But this patch's simple pattern matching can make
> LLInt/Baseline/DFG faster.
> 
> And this pattern can be seen in the actual code. So I think having this
> patch is good. Yeah, after that, we would like to introdue
> PhantomUInt32ToNumber thing too (in bug 177806).

I would like to reland this patch without DFG strength reduction part.
This means that only `a >>> 0 <=> b >>>0` pattern will be handled.
In that case, integer range analysis does not offer any values for this since `a >>> 0` is only used for this part.
So even without this extension, it does not disable the existing optimization.
If that patch causes regression, it's a bit difficult to investigate without actual machine...
Maybe, due to accidental code generation in FTL.

Anyway, reducing the patch would offer a chance to bisect the culprit.
Comment 33 Yusuke Suzuki 2017-10-14 08:35:03 PDT
Created attachment 323810 [details]
Patch
Comment 34 Yusuke Suzuki 2017-10-14 08:35:59 PDT
Committed r223318: <https://trac.webkit.org/changeset/223318>
Comment 35 Zan Dobersek 2017-10-17 00:35:15 PDT
Comment on attachment 323810 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=323810&action=review

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1996
> +                int32_t value = static_cast<int32_t>(static_cast<IntegerNode*>(node)->value());

This conversion is causing a failure on Linux ARM64, likely due to the conversion instruction that GCC is generating. Opened bug #178379.