Bug 153321 - B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question
Summary: B3 CSE should be able to match a full redundancy even if none of the matches ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks: 150507
  Show dependency treegraph
 
Reported: 2016-01-21 13:48 PST by Filip Pizlo
Modified: 2016-01-21 19:32 PST (History)
12 users (show)

See Also:


Attachments
wrote some code (12.99 KB, patch)
2016-01-21 13:50 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
makes FTL B3 faster than FTL LLVM on Octane/richards (19.62 KB, patch)
2016-01-21 15:28 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
passing so many tests (22.97 KB, patch)
2016-01-21 16:17 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
current performance standings (80.92 KB, text/plain)
2016-01-21 17:11 PST, Filip Pizlo
no flags Details
the patch (26.42 KB, patch)
2016-01-21 18:40 PST, Filip Pizlo
benjamin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2016-01-21 13:48:53 PST
Patch forthcoming.

This is important for Octane/richards.  And probably other things too.
Comment 1 Filip Pizlo 2016-01-21 13:50:05 PST
Created attachment 269488 [details]
wrote some code
Comment 2 Filip Pizlo 2016-01-21 15:28:15 PST
Created attachment 269504 [details]
makes FTL B3 faster than FTL LLVM on Octane/richards

I should probably run more than just richards before r?.
Comment 3 Geoffrey Garen 2016-01-21 15:45:25 PST
Comment on attachment 269504 [details]
makes FTL B3 faster than FTL LLVM on Octane/richards

View in context: https://bugs.webkit.org/attachment.cgi?id=269504&action=review

> Source/JavaScriptCore/dfg/DFGCommon.h:41
> -#define FTL_USES_B3 0
> +#define FTL_USES_B3 1

Revert.
Comment 4 Filip Pizlo 2016-01-21 15:46:46 PST
(In reply to comment #3)
> Comment on attachment 269504 [details]
> makes FTL B3 faster than FTL LLVM on Octane/richards
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269504&action=review
> 
> > Source/JavaScriptCore/dfg/DFGCommon.h:41
> > -#define FTL_USES_B3 0
> > +#define FTL_USES_B3 1
> 
> Revert.

I will this time!
Comment 5 Filip Pizlo 2016-01-21 16:17:02 PST
Created attachment 269513 [details]
passing so many tests
Comment 6 Filip Pizlo 2016-01-21 17:11:27 PST
Created attachment 269523 [details]
current performance standings
Comment 7 Benjamin Poulain 2016-01-21 18:18:46 PST
Comment on attachment 269513 [details]
passing so many tests

View in context: https://bugs.webkit.org/attachment.cgi?id=269513&action=review

I have a question for Load8Z and Load16Z.

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64
>  typedef Vector<MemoryValue*, 1> MemoryMatches;

Looks like this can be scoped inside the CSE now.

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211
>                      return candidate->offset() == offset
>                          && (candidate->opcode() == Load8Z || candidate->opcode() == Store8);

Say I have 3 blocks: A, B, C.

A:
@1 = Const32(0xffffffff)
@2 = Store8(@1, @mem)
@3 = Jump C.

B: 
@4 = Const32(0xff00ff00)
@5 = Store8(@1, @ mem)
@6 = Jump C.

C:
@7 = Load8Z(@mem)

-> The load-store would nuke the top bits of my constants.

Now after changed into StackSlots and back to SSA, the full values is taken. Am I missing something?

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466
>      typedef Vector<Value*, 1> Matches;

Looks like this redefines Matches from B3PureCSE. That's probably unnecessary.
Comment 8 Filip Pizlo 2016-01-21 18:27:05 PST
(In reply to comment #7)
> Comment on attachment 269513 [details]
> passing so many tests
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269513&action=review
> 
> I have a question for Load8Z and Load16Z.
> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64
> >  typedef Vector<MemoryValue*, 1> MemoryMatches;
> 
> Looks like this can be scoped inside the CSE now.
> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211
> >                      return candidate->offset() == offset
> >                          && (candidate->opcode() == Load8Z || candidate->opcode() == Store8);
> 
> Say I have 3 blocks: A, B, C.
> 
> A:
> @1 = Const32(0xffffffff)
> @2 = Store8(@1, @mem)
> @3 = Jump C.
> 
> B: 
> @4 = Const32(0xff00ff00)
> @5 = Store8(@1, @ mem)
> @6 = Jump C.
> 
> C:
> @7 = Load8Z(@mem)
> 
> -> The load-store would nuke the top bits of my constants.
> 
> Now after changed into StackSlots and back to SSA, the full values is taken.
> Am I missing something?

It's a bug!  I'll add a test and fix.

> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466
> >      typedef Vector<Value*, 1> Matches;
> 
> Looks like this redefines Matches from B3PureCSE. That's probably
> unnecessary.
Comment 9 Filip Pizlo 2016-01-21 18:40:58 PST
Created attachment 269537 [details]
the patch
Comment 10 Benjamin Poulain 2016-01-21 18:44:07 PST
Comment on attachment 269537 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269537&action=review

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:217
> +                        Value* sext = m_proc.add<Value>(

sext->zext?

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:257
> +                        Value* sext = m_proc.add<Value>(

sext->zext?

> Source/JavaScriptCore/b3/testb3.cpp:9400
> +void testStore8Load8Z(int32_t value)

I think it would be good to have a new test for the new feature too (with the value being stored in predecessors).
Comment 11 Filip Pizlo 2016-01-21 19:32:21 PST
Landed in http://trac.webkit.org/changeset/195434