153321 – B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question

RESOLVED FIXED153321

B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question

https://bugs.webkit.org/show_bug.cgi?id=153321

Summary B3 CSE should be able to match a full redundancy even if none of the matches ...

Filip Pizlo

Reported 2016-01-21 13:48:53 PST

Patch forthcoming. This is important for Octane/richards. And probably other things too.

Attachments
wrote some code (12.99 KB, patch) 2016-01-21 13:50 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
makes FTL B3 faster than FTL LLVM on Octane/richards (19.62 KB, patch) 2016-01-21 15:28 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
passing so many tests (22.97 KB, patch) 2016-01-21 16:17 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
current performance standings (80.92 KB, text/plain) 2016-01-21 17:11 PST, Filip Pizlo	no flags	Details
the patch (26.42 KB, patch) 2016-01-21 18:40 PST, Filip Pizlo	benjamin: review+	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Filip Pizlo

Comment 1 2016-01-21 13:50:05 PST

Created attachment 269488 [details] wrote some code

Filip Pizlo

Comment 2 2016-01-21 15:28:15 PST

Created attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards I should probably run more than just richards before r?.

Geoffrey Garen

Comment 3 2016-01-21 15:45:25 PST

Comment on attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards View in context: https://bugs.webkit.org/attachment.cgi?id=269504&action=review > Source/JavaScriptCore/dfg/DFGCommon.h:41 > -#define FTL_USES_B3 0 > +#define FTL_USES_B3 1 Revert.

Filip Pizlo

Comment 4 2016-01-21 15:46:46 PST

(In reply to comment #3) > Comment on attachment 269504 [details] > makes FTL B3 faster than FTL LLVM on Octane/richards > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269504&action=review > > > Source/JavaScriptCore/dfg/DFGCommon.h:41 > > -#define FTL_USES_B3 0 > > +#define FTL_USES_B3 1 > > Revert. I will this time!

Filip Pizlo

Comment 5 2016-01-21 16:17:02 PST

Created attachment 269513 [details] passing so many tests

Filip Pizlo

Comment 6 2016-01-21 17:11:27 PST

Created attachment 269523 [details] current performance standings

Benjamin Poulain

Comment 7 2016-01-21 18:18:46 PST

Comment on attachment 269513 [details] passing so many tests View in context: https://bugs.webkit.org/attachment.cgi?id=269513&action=review I have a question for Load8Z and Load16Z. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > typedef Vector<MemoryValue*, 1> MemoryMatches; Looks like this can be scoped inside the CSE now. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > return candidate->offset() == offset > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); Say I have 3 blocks: A, B, C. A: @1 = Const32(0xffffffff) @2 = Store8(@1, @mem) @3 = Jump C. B: @4 = Const32(0xff00ff00) @5 = Store8(@1, @ mem) @6 = Jump C. C: @7 = Load8Z(@mem) -> The load-store would nuke the top bits of my constants. Now after changed into StackSlots and back to SSA, the full values is taken. Am I missing something? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > typedef Vector<Value*, 1> Matches; Looks like this redefines Matches from B3PureCSE. That's probably unnecessary.

Filip Pizlo

Comment 8 2016-01-21 18:27:05 PST

(In reply to comment #7) > Comment on attachment 269513 [details] > passing so many tests > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269513&action=review > > I have a question for Load8Z and Load16Z. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > > typedef Vector<MemoryValue*, 1> MemoryMatches; > > Looks like this can be scoped inside the CSE now. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > > return candidate->offset() == offset > > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); > > Say I have 3 blocks: A, B, C. > > A: > @1 = Const32(0xffffffff) > @2 = Store8(@1, @mem) > @3 = Jump C. > > B: > @4 = Const32(0xff00ff00) > @5 = Store8(@1, @ mem) > @6 = Jump C. > > C: > @7 = Load8Z(@mem) > > -> The load-store would nuke the top bits of my constants. > > Now after changed into StackSlots and back to SSA, the full values is taken. > Am I missing something? It's a bug! I'll add a test and fix. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > > typedef Vector<Value*, 1> Matches; > > Looks like this redefines Matches from B3PureCSE. That's probably > unnecessary.

Filip Pizlo

Comment 9 2016-01-21 18:40:58 PST

Created attachment 269537 [details] the patch

Benjamin Poulain

Comment 10 2016-01-21 18:44:07 PST

Comment on attachment 269537 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=269537&action=review > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:217 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:257 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/testb3.cpp:9400 > +void testStore8Load8Z(int32_t value) I think it would be good to have a new test for the new feature too (with the value being stored in predecessors).

Filip Pizlo

Comment 11 2016-01-21 19:32:21 PST

Landed in http://trac.webkit.org/changeset/195434

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee

Filip Pizlo

Reported

2016-01-21 13:48 PST

Modified

2016-01-21 19:32 PST History

CC List

12 users Show

URL

Keywords

Depends on

Blocks

150507

Dependencies

tree graph