Bug 153321

Summary: B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, benjamin, commit-queue, ggaren, keith_miller, mario, mark.lam, mhahnenb, msaboff, oliver, saam, sam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 150507    
Attachments:
Description Flags
wrote some code
none
makes FTL B3 faster than FTL LLVM on Octane/richards
none
passing so many tests
none
current performance standings
none
the patch benjamin: review+

Description Filip Pizlo 2016-01-21 13:48:53 PST
Patch forthcoming.

This is important for Octane/richards.  And probably other things too.
Comment 1 Filip Pizlo 2016-01-21 13:50:05 PST
Created attachment 269488 [details]
wrote some code
Comment 2 Filip Pizlo 2016-01-21 15:28:15 PST
Created attachment 269504 [details]
makes FTL B3 faster than FTL LLVM on Octane/richards

I should probably run more than just richards before r?.
Comment 3 Geoffrey Garen 2016-01-21 15:45:25 PST
Comment on attachment 269504 [details]
makes FTL B3 faster than FTL LLVM on Octane/richards

View in context: https://bugs.webkit.org/attachment.cgi?id=269504&action=review

> Source/JavaScriptCore/dfg/DFGCommon.h:41
> -#define FTL_USES_B3 0
> +#define FTL_USES_B3 1

Revert.
Comment 4 Filip Pizlo 2016-01-21 15:46:46 PST
(In reply to comment #3)
> Comment on attachment 269504 [details]
> makes FTL B3 faster than FTL LLVM on Octane/richards
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269504&action=review
> 
> > Source/JavaScriptCore/dfg/DFGCommon.h:41
> > -#define FTL_USES_B3 0
> > +#define FTL_USES_B3 1
> 
> Revert.

I will this time!
Comment 5 Filip Pizlo 2016-01-21 16:17:02 PST
Created attachment 269513 [details]
passing so many tests
Comment 6 Filip Pizlo 2016-01-21 17:11:27 PST
Created attachment 269523 [details]
current performance standings
Comment 7 Benjamin Poulain 2016-01-21 18:18:46 PST
Comment on attachment 269513 [details]
passing so many tests

View in context: https://bugs.webkit.org/attachment.cgi?id=269513&action=review

I have a question for Load8Z and Load16Z.

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64
>  typedef Vector<MemoryValue*, 1> MemoryMatches;

Looks like this can be scoped inside the CSE now.

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211
>                      return candidate->offset() == offset
>                          && (candidate->opcode() == Load8Z || candidate->opcode() == Store8);

Say I have 3 blocks: A, B, C.

A:
@1 = Const32(0xffffffff)
@2 = Store8(@1, @mem)
@3 = Jump C.

B: 
@4 = Const32(0xff00ff00)
@5 = Store8(@1, @ mem)
@6 = Jump C.

C:
@7 = Load8Z(@mem)

-> The load-store would nuke the top bits of my constants.

Now after changed into StackSlots and back to SSA, the full values is taken. Am I missing something?

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466
>      typedef Vector<Value*, 1> Matches;

Looks like this redefines Matches from B3PureCSE. That's probably unnecessary.
Comment 8 Filip Pizlo 2016-01-21 18:27:05 PST
(In reply to comment #7)
> Comment on attachment 269513 [details]
> passing so many tests
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269513&action=review
> 
> I have a question for Load8Z and Load16Z.
> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64
> >  typedef Vector<MemoryValue*, 1> MemoryMatches;
> 
> Looks like this can be scoped inside the CSE now.
> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211
> >                      return candidate->offset() == offset
> >                          && (candidate->opcode() == Load8Z || candidate->opcode() == Store8);
> 
> Say I have 3 blocks: A, B, C.
> 
> A:
> @1 = Const32(0xffffffff)
> @2 = Store8(@1, @mem)
> @3 = Jump C.
> 
> B: 
> @4 = Const32(0xff00ff00)
> @5 = Store8(@1, @ mem)
> @6 = Jump C.
> 
> C:
> @7 = Load8Z(@mem)
> 
> -> The load-store would nuke the top bits of my constants.
> 
> Now after changed into StackSlots and back to SSA, the full values is taken.
> Am I missing something?

It's a bug!  I'll add a test and fix.

> 
> > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466
> >      typedef Vector<Value*, 1> Matches;
> 
> Looks like this redefines Matches from B3PureCSE. That's probably
> unnecessary.
Comment 9 Filip Pizlo 2016-01-21 18:40:58 PST
Created attachment 269537 [details]
the patch
Comment 10 Benjamin Poulain 2016-01-21 18:44:07 PST
Comment on attachment 269537 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269537&action=review

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:217
> +                        Value* sext = m_proc.add<Value>(

sext->zext?

> Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:257
> +                        Value* sext = m_proc.add<Value>(

sext->zext?

> Source/JavaScriptCore/b3/testb3.cpp:9400
> +void testStore8Load8Z(int32_t value)

I think it would be good to have a new test for the new feature too (with the value being stored in predecessors).
Comment 11 Filip Pizlo 2016-01-21 19:32:21 PST
Landed in http://trac.webkit.org/changeset/195434