177281 – Add Above/Below comparisons for UInt32 patterns

Yusuke Suzuki

Reported 2017-09-20 16:47:07 PDT

Add Above/Below comparisons for UInt32 patterns

Yusuke Suzuki

Comment 1 2017-09-20 17:06:25 PDT

Created attachment 321391 [details] Patch

Yusuke Suzuki

Comment 2 2017-09-20 17:07:32 PDT

Spawn from bug 177100. WIP, adding tests now. But ready for early review.

Yusuke Suzuki

Comment 3 2017-09-20 17:11:48 PDT

Comment on attachment 321391 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321391&action=review > Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1144 > + // TODO: Handle CompareAbove etc. I'll do this. > Source/JavaScriptCore/dfg/DFGMayExit.cpp:102 > + case CompareAboveEq: This is correct, but I don't think they are necessary. I'll drop this part. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:2543 > + case CompareBelow: > + if (compare(node, JITCompiler::Below, JITCompiler::DoubleLessThan, operationCompareLess)) > + return; > + break; > + > + case CompareBelowEq: > + if (compare(node, JITCompiler::BelowOrEqual, JITCompiler::DoubleLessThanOrEqual, operationCompareLessEq)) > + return; > + break; > + > + case CompareAbove: > + if (compare(node, JITCompiler::Above, JITCompiler::DoubleGreaterThan, operationCompareGreater)) > + return; > + break; > + > + case CompareAboveEq: > + if (compare(node, JITCompiler::AboveOrEqual, JITCompiler::DoubleGreaterThanOrEqual, operationCompareGreaterEq)) > + return; > + break; I'll add `compareUnsigned`, which only accepts KnownInt32 binary uses and perform unsigned comparison. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:2685 > + break; Ditto.

Build Bot

Comment 4 2017-09-20 18:04:22 PDT

Comment on attachment 321391 [details] Patch Attachment 321391 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4610048 Number of test failures exceeded the failure limit.

Build Bot

Comment 5 2017-09-20 18:04:23 PDT

Created attachment 321398 [details] Archive of layout-test-results from ews116 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Yusuke Suzuki

Comment 6 2017-09-21 06:53:47 PDT

Created attachment 321425 [details] WIP

Yusuke Suzuki

Comment 7 2017-09-21 07:34:15 PDT

Created attachment 321427 [details] WIP

Yusuke Suzuki

Comment 8 2017-09-21 09:33:11 PDT

Created attachment 321436 [details] Patch

Yusuke Suzuki

Comment 9 2017-09-23 05:40:51 PDT

Created attachment 321629 [details] Patch Just updating ChangeLog

Darin Adler

Comment 10 2017-09-23 17:12:29 PDT

Comment on attachment 321629 [details] Patch To me, nothing about the names "above" and "below" say "treat negative numbers as if they were large numbers" or "treat as unsigned without checking". I would consider names that make that explicit, maybe a "u" added to the existing names?

Yusuke Suzuki

Comment 11 2017-09-23 17:24:35 PDT

(In reply to Darin Adler from comment #10) > Comment on attachment 321629 [details] > Patch > > To me, nothing about the names "above" and "below" say "treat negative > numbers as if they were large numbers" or "treat as unsigned without > checking". I would consider names that make that explicit, maybe a "u" added > to the existing names? `above` / `below` is used in assembly lang as unsigned comparison, but using op_uless / op_ugreater is OK to me.

Saam Barati

Comment 12 2017-09-23 18:43:28 PDT

(In reply to Darin Adler from comment #10) > Comment on attachment 321629 [details] > Patch > > To me, nothing about the names "above" and "below" say "treat negative > numbers as if they were large numbers" or "treat as unsigned without > checking". I would consider names that make that explicit, maybe a "u" added > to the existing names? I agree with your intuition and reasoning here, but I think it’s more important to be consistent with the rest of JSC here. We adopt various assembly language conventions and use Above to mean unsigned comparison and GreaterThan to mean signed comparison throughout the various compilers and MacroAssemblers.

Darin Adler

Comment 13 2017-09-23 18:53:25 PDT

(In reply to Saam Barati from comment #12) > I agree with your intuition and reasoning here, but I think it’s more > important to be consistent with the rest of JSC here. We adopt various > assembly language conventions and use Above to mean unsigned comparison and > GreaterThan to mean signed comparison throughout the various compilers and > MacroAssemblers. Good to know. My knowledge is out of date. The instruction sets I know best, ones other than X86, call unsigned comparison "higher" and "lower" rather than "above" and "below". But I recall now that the signed comparisons were the only ones called "greater" and "less than" and I suppose I had forgotten that. I see how this is following tradition, especially for people familiar with X86; sorry for the distraction.

Saam Barati

Comment 14 2017-09-23 19:16:53 PDT

(In reply to Darin Adler from comment #13) > (In reply to Saam Barati from comment #12) > > I agree with your intuition and reasoning here, but I think it’s more > > important to be consistent with the rest of JSC here. We adopt various > > assembly language conventions and use Above to mean unsigned comparison and > > GreaterThan to mean signed comparison throughout the various compilers and > > MacroAssemblers. > > Good to know. > > My knowledge is out of date. The instruction sets I know best, ones other > than X86, call unsigned comparison "higher" and "lower" rather than "above" > and "below". But I recall now that the signed comparisons were the only ones > called "greater" and "less than" and I suppose I had forgotten that. > > I see how this is following tradition, especially for people familiar with > X86; sorry for the distraction. FWIW, this distinction has caused cognitive overhead for me in the past. At some point in the last year or so I crossed over to these names intuitively meaning what they do, but before that, I had to keep looking it up. I’m not against us using better names than X86, but if we do it, I think we should pull the trigger and just do it everywhere. Perhaps there is enough inertia behind the names that it’s not actually worth changing. My main comment here is just that I think it’s best to be consistent within JSC.

Filip Pizlo

Comment 15 2017-09-23 19:34:04 PDT

(In reply to Saam Barati from comment #14) > (In reply to Darin Adler from comment #13) > > (In reply to Saam Barati from comment #12) > > > I agree with your intuition and reasoning here, but I think it’s more > > > important to be consistent with the rest of JSC here. We adopt various > > > assembly language conventions and use Above to mean unsigned comparison and > > > GreaterThan to mean signed comparison throughout the various compilers and > > > MacroAssemblers. > > > > Good to know. > > > > My knowledge is out of date. The instruction sets I know best, ones other > > than X86, call unsigned comparison "higher" and "lower" rather than "above" > > and "below". But I recall now that the signed comparisons were the only ones > > called "greater" and "less than" and I suppose I had forgotten that. > > > > I see how this is following tradition, especially for people familiar with > > X86; sorry for the distraction. > > FWIW, this distinction has caused cognitive overhead for me in the past. At > some point in the last year or so I crossed over to these names intuitively > meaning what they do, but before that, I had to keep looking it up. I’m not > against us using better names than X86, but if we do it, I think we should > pull the trigger and just do it everywhere. Perhaps there is enough inertia > behind the names that it’s not actually worth changing. My main comment here > is just that I think it’s best to be consistent within JSC. Yeah... in my case I was already fluent in this terminology before hacking on WebKit and I was super happy that WebKit used these terms. I’ve also hacked on systems that say ULessThan/UGreaterThan. I didn’t like that as much. If I had to guess why, I would say it’s because these are super common operations internally and having one-word names makes it easier to speak about it (below is two syllables, unsigned less than is four).

Yusuke Suzuki

Comment 16 2017-09-25 09:44:42 PDT

OK, for now, we use above/below :)

Yusuke Suzuki

Comment 17 2017-09-26 10:22:31 PDT

Ping?

Saam Barati

Comment 18 2017-09-26 11:32:58 PDT

Comment on attachment 321629 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321629&action=review r=me I'm not entirely convinced we want all this machinery for a very specific type of program written in a very specific way at the JS source level. That said, I don't want to keep you from landing this. I still wonder if a better approach would've been to teach OSR exit in a way similar to it undoing math operations that we eliminate Uint32ToNumber and just perform the operation upon exit. > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005 > + static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true; I'd assert here it's urshift > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007 > + static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true; ditto > Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143 > + // FIXME: Handle CompareBelow and CompareBelowEq. Please remove or link to a bug. > Source/JavaScriptCore/parser/Nodes.h:1084 > + bool m_notCastUnsigned { false }; Name nit: I think I'd flip the boolean state of this variable and maybe name it as: bool m_shouldToUnsignedResult { true } and the pattern matching code can set it to false I think that'll be more clear from reading the code

Yusuke Suzuki

Comment 19 2017-09-26 11:59:30 PDT

Comment on attachment 321629 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321629&action=review Thanks. I think this patch is worth introducing because of the nature of JS. JS does not support uint32 efficiently. But it is not a large problem since we can do int32 operations. add, sub etc. are OK since we can perform int32 operations and recognize the resulted value as uint32 bit patterns. But one problematic thing is comparison. It is frequently executed and it recognizes unsignedness. So, performing >>> 0 operation when doing comparison is one of the frequently seen pattern in JS, and actually it is shown in Octane/zlib. So I think this optimization covers large part of JS programs using uint32 comparison. In addition, we also introduced DFG strength reduction rule for unsigned comparison. So once our DFG MovHint eliminations's conservative analysis is fixed, these UInt32ToNumber will be removed, which is kept by MovHint. I think this could remove large part of UInt32ToNumber in the future. After that, I think we could have UInt32ToNumber yet. They can be removed by using somewhat PhantoUInt32ToNumber which will materialize UInt32ToNumber resulted value when performing OSR exit. But I think this requires OSR availability analysis for MovHint analysis, so it is only done in FTL. So, I think this optimization is worth introducing since it can improve some patterns frequently seen in JS supported by this reasonable story. >> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005 >> + static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true; > > I'd assert here it's urshift Fixed. >> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007 >> + static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true; > > ditto Fixed. >> Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143 >> + // FIXME: Handle CompareBelow and CompareBelowEq. > > Please remove or link to a bug. Added a bug link for this. >> Source/JavaScriptCore/parser/Nodes.h:1084 >> + bool m_notCastUnsigned { false }; > > Name nit: I think I'd flip the boolean state of this variable and maybe name it as: > bool m_shouldToUnsignedResult { true } > and the pattern matching code can set it to false > I think that'll be more clear from reading the code Sounds nice. Fixed.

Yusuke Suzuki

Comment 20 2017-09-26 13:18:24 PDT

Created attachment 321860 [details] Patch for landing

Yusuke Suzuki

Comment 21 2017-09-26 13:32:26 PDT

Committed r222518: <http://trac.webkit.org/changeset/222518>

WebKit Commit Bot

Comment 22 2017-09-26 14:21:54 PDT

Re-opened since this is blocked by bug 177507

Yusuke Suzuki

Comment 23 2017-09-26 17:22:01 PDT

(In reply to WebKit Commit Bot from comment #22) > Re-opened since this is blocked by bug 177507 I'll land it later with a fix. BTW, this patch actually improved performance of Octane/zlib by 5%. https://arewefastyet.com/#machine=29&view=single&suite=octane&subtest=zlib

Yusuke Suzuki

Comment 24 2017-09-27 11:51:21 PDT

Committed r222564: <http://trac.webkit.org/changeset/222564>

Radar WebKit Bug Importer

Comment 25 2017-09-27 12:19:27 PDT

<rdar://problem/34693039>

Saam Barati

Comment 26 2017-10-01 10:13:24 PDT

This patch is a 50% slowdown on JetStream/bigfig on iOS. This is leading to a 2% JetStream regression on iOS.

WebKit Commit Bot

Comment 27 2017-10-01 10:20:08 PDT

Re-opened since this is blocked by bug 177720

Yusuke Suzuki

Comment 28 2017-10-02 07:36:46 PDT

(In reply to WebKit Commit Bot from comment #27) > Re-opened since this is blocked by bug 177720 I've measured the execution time of bigfib.cpp in several configurations. In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch rather poses performance progression. baseline patched bigfib.cpp 570.4888+-37.6582 563.6416+-35.9494 might be 1.0121x faster In my MacBook Pro, without any CPU configuration change, this patch rather improves performance by 1.8%. With CPU configuration (disabling hyperthreading and using only 2 cores), it does not show any differences. Saam, could you ensure the following two things? 1. After reverting this patch, the regression is fixed in iOS bot. 2. MacBook Pro / MacPro bots do not show regression due to this patch.

Filip Pizlo

Comment 29 2017-10-02 08:04:20 PDT

(In reply to Saam Barati from comment #18) > Comment on attachment 321629 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=321629&action=review > > r=me > I'm not entirely convinced we want all this machinery for a very specific > type of program written in a very specific way at the JS source level. That > said, I don't want to keep you from landing this. > > I still wonder if a better approach would've been to teach OSR exit in a way > similar to it undoing math operations that we eliminate Uint32ToNumber and > just perform the operation upon exit. Yes. This. -Filip > > > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2005 > > + static_cast<BinaryOpNode*>(left)->m_notCastUnsigned = true; > > I'd assert here it's urshift > > > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:2007 > > + static_cast<BinaryOpNode*>(right)->m_notCastUnsigned = true; > > ditto > > > Source/JavaScriptCore/dfg/DFGIntegerRangeOptimizationPhase.cpp:1143 > > + // FIXME: Handle CompareBelow and CompareBelowEq. > > Please remove or link to a bug. > > > Source/JavaScriptCore/parser/Nodes.h:1084 > > + bool m_notCastUnsigned { false }; > > Name nit: I think I'd flip the boolean state of this variable and maybe name > it as: > bool m_shouldToUnsignedResult { true } > and the pattern matching code can set it to false > I think that'll be more clear from reading the code

Saam Barati

Comment 30 2017-10-02 13:08:42 PDT

(In reply to Yusuke Suzuki from comment #28) > (In reply to WebKit Commit Bot from comment #27) > > Re-opened since this is blocked by bug 177720 > > I've measured the execution time of bigfib.cpp in several configurations. > > In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch > rather poses performance progression. > > baseline patched > > > bigfib.cpp 570.4888+-37.6582 563.6416+-35.9494 > might be 1.0121x faster > > In my MacBook Pro, without any CPU configuration change, this patch rather > improves performance by 1.8%. > With CPU configuration (disabling hyperthreading and using only 2 cores), it > does not show any differences. > > Saam, could you ensure the following two things? > > 1. After reverting this patch, the regression is fixed in iOS bot. iOS perf bots recovered this 2%. > 2. MacBook Pro / MacPro bots do not show regression due to this patch. I don't have enough data to answer this question yet.

Yusuke Suzuki

Comment 31 2017-10-03 05:43:06 PDT

(In reply to Saam Barati from comment #30) > (In reply to Yusuke Suzuki from comment #28) > > (In reply to WebKit Commit Bot from comment #27) > > > Re-opened since this is blocked by bug 177720 > > > > I've measured the execution time of bigfib.cpp in several configurations. > > > > In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch > > rather poses performance progression. > > > > baseline patched > > > > > > bigfib.cpp 570.4888+-37.6582 563.6416+-35.9494 > > might be 1.0121x faster > > > > In my MacBook Pro, without any CPU configuration change, this patch rather > > improves performance by 1.8%. > > With CPU configuration (disabling hyperthreading and using only 2 cores), it > > does not show any differences. > > > > Saam, could you ensure the following two things? > > > > 1. After reverting this patch, the regression is fixed in iOS bot. > iOS perf bots recovered this 2%. > > > 2. MacBook Pro / MacPro bots do not show regression due to this patch. > I don't have enough data to answer this question yet. I've a bit hacked in bug 177806. I extended Object Allocation Sinking phase to handle wider range of materialization (including UInt32ToNumber). Converting UInt32ToNumber instead. With this patch's DFG part, almost all UInt32ToNumber becomes PhantomUInt32ToNumber, which is no-op in FTL. Actually, we get 2% improvement in the above approach. But this patch offers 5% improvement, it is better. I think this is because of LLInt/Baseline/DFG optimization effect in this patch. In the above approach, only FTL gets performance boost. But this patch's simple pattern matching can make LLInt/Baseline/DFG faster. And this pattern can be seen in the actual code. So I think having this patch is good. Yeah, after that, we would like to introdue PhantomUInt32ToNumber thing too (in bug 177806).

Yusuke Suzuki

Comment 32 2017-10-11 04:47:11 PDT

(In reply to Yusuke Suzuki from comment #31) > (In reply to Saam Barati from comment #30) > > (In reply to Yusuke Suzuki from comment #28) > > > (In reply to WebKit Commit Bot from comment #27) > > > > Re-opened since this is blocked by bug 177720 > > > > > > I've measured the execution time of bigfib.cpp in several configurations. > > > > > > In my Linux box (Xeon, 8 physical cores with hyperthreading), this patch > > > rather poses performance progression. > > > > > > baseline patched > > > > > > > > > bigfib.cpp 570.4888+-37.6582 563.6416+-35.9494 > > > might be 1.0121x faster > > > > > > In my MacBook Pro, without any CPU configuration change, this patch rather > > > improves performance by 1.8%. > > > With CPU configuration (disabling hyperthreading and using only 2 cores), it > > > does not show any differences. > > > > > > Saam, could you ensure the following two things? > > > > > > 1. After reverting this patch, the regression is fixed in iOS bot. > > iOS perf bots recovered this 2%. > > > > > 2. MacBook Pro / MacPro bots do not show regression due to this patch. > > I don't have enough data to answer this question yet. > > I've a bit hacked in bug 177806. I extended Object Allocation Sinking phase > to handle wider range of materialization (including UInt32ToNumber). > Converting UInt32ToNumber instead. With this patch's DFG part, almost all > UInt32ToNumber becomes PhantomUInt32ToNumber, which is no-op in FTL. > > Actually, we get 2% improvement in the above approach. But this patch offers > 5% improvement, it is better. I think this is because of LLInt/Baseline/DFG > optimization effect in this patch. In the above approach, only FTL gets > performance boost. But this patch's simple pattern matching can make > LLInt/Baseline/DFG faster. > > And this pattern can be seen in the actual code. So I think having this > patch is good. Yeah, after that, we would like to introdue > PhantomUInt32ToNumber thing too (in bug 177806). I would like to reland this patch without DFG strength reduction part. This means that only `a >>> 0 <=> b >>>0` pattern will be handled. In that case, integer range analysis does not offer any values for this since `a >>> 0` is only used for this part. So even without this extension, it does not disable the existing optimization. If that patch causes regression, it's a bit difficult to investigate without actual machine... Maybe, due to accidental code generation in FTL. Anyway, reducing the patch would offer a chance to bisect the culprit.

Yusuke Suzuki

Comment 33 2017-10-14 08:35:03 PDT

Created attachment 323810 [details] Patch

Yusuke Suzuki

Comment 34 2017-10-14 08:35:59 PDT

Committed r223318: <https://trac.webkit.org/changeset/223318>

Zan Dobersek

Comment 35 2017-10-17 00:35:15 PDT

Comment on attachment 323810 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=323810&action=review > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1996 > + int32_t value = static_cast<int32_t>(static_cast<IntegerNode*>(node)->value()); This conversion is causing a failure on Linux ARM64, likely due to the conversion instruction that GCC is generating. Opened bug #178379.

Attachments
Patch (64.55 KB, patch) 2017-09-20 17:06 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews116 for mac-elcapitan (666.87 KB, application/zip) 2017-09-20 18:04 PDT, Build Bot	no flags	Details
WIP (59.93 KB, patch) 2017-09-21 06:53 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
WIP (58.02 KB, patch) 2017-09-21 07:34 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (66.47 KB, patch) 2017-09-21 09:33 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (66.50 KB, patch) 2017-09-23 05:40 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch for landing (66.71 KB, patch) 2017-09-26 13:18 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (64.15 KB, patch) 2017-10-14 08:35 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.