Bug 153113

Summary: FTL B3 should be just as fast as FTL LLVM on Octane/crypto
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, benjamin, commit-queue, ggaren, keith_miller, mark.lam, mhahnenb, msaboff, oliver, sam, sbarati
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 150279, 153197    
Attachments:
Description Flags
it's a start
none
a bit more
none
starting to make sense
none
more
none
lots more
none
more
none
IRC appears to work
none
more
none
sort of getting it to work
none
trying more things
none
things
none
the patch
none
the patch
none
did more things
none
the patch
sbarati: review+
performance none

Description Filip Pizlo 2016-01-14 18:09:52 PST
Things.
Comment 1 Filip Pizlo 2016-01-14 18:19:25 PST
Created attachment 269023 [details]
it's a start
Comment 2 Filip Pizlo 2016-01-14 19:52:50 PST
Created attachment 269028 [details]
a bit more
Comment 3 Filip Pizlo 2016-01-14 21:18:49 PST
Here's my current thinking on how to proceed.  I think I'll get rid of ForwardLiveness and use a much more deliberate approach:


Our state mapping tells us what is live at any given moment. We use a separate
liveness pass to collect the kill lists for all instructions. The kill lists tell us
about the ways that things die at an Inst. There are four ways:

- An early def could die above an Inst.
- An early use could kill above an Inst.
- A late def could die below an Inst.
- A late use could kill below an Inst.

Once we are armed with our state mapping and the kill lists, we can process the Inst
in the order of its execution and update the state mapping based on defs and kill
lists.

We start above the Inst with a state that tells us what is live before it. This
includes things that will die at the top. We can handle early uses and defs by giving
them registers that are not live. We know which registers become live because our
mapping tells us this. Then we can apply the early deaths. Then we process late uses
and defs, and then apply the late deaths.

This algorithm must proceed in the order of the Inst's execution. We can use the
notion of locking at each stage - early and then late - to indicate which registers
have already been purposed. All early uses will hold the lock while we are processing
the early part of the instruction, and then release it before we start processing the
late stuff. In between early and late, after we have unlocked things used early, we
can perform early deaths. All early defs will hold the lock early and then release it
after we process late things. As a special case, UseDef will also release the lock
late.

Things get weird with UseDef. Lucky for us, Insts will have a small number of UseDefs,
so the algorithm doesn't have to be efficient and we can be sure that if we handle
UseDefs early, then the greedy approach should just work. The locking strategy ensures
that we won't try to reuse the register we already picked. The only additional logic
for UseDef is that when we pick a register for it, which happens early, we look ahead
to make sure that we didn't already lock the register late. The only registers that we
could have already locked late are the ones that are clobbered late or defined late.

One of the beautiful things about this approach is that we can process the Inst
in-place. We won't edit the uses until we have accounted for them. We won't edit the
defs until we have accounted for them. So, we can compute forward liveness while also
editing the instruction.
Comment 4 Filip Pizlo 2016-01-15 13:05:22 PST
Created attachment 269090 [details]
starting to make sense
Comment 5 Filip Pizlo 2016-01-15 22:04:17 PST
Created attachment 269150 [details]
more
Comment 6 Filip Pizlo 2016-01-16 13:02:28 PST
Created attachment 269178 [details]
lots more
Comment 7 Filip Pizlo 2016-01-16 16:40:36 PST
Created attachment 269182 [details]
more
Comment 8 Filip Pizlo 2016-01-16 17:25:55 PST
Created attachment 269183 [details]
IRC appears to work

It's not saying much, but the current status of the patch is that IRC still appears to work.  I haven't enabled register scavenging yet.
Comment 9 Filip Pizlo 2016-01-16 19:38:12 PST
Created attachment 269184 [details]
more
Comment 10 Filip Pizlo 2016-01-16 23:55:16 PST
Created attachment 269187 [details]
sort of getting it to work

I'm no longer sure if this is a great idea.
Comment 11 Filip Pizlo 2016-01-17 11:35:58 PST
Created attachment 269190 [details]
trying more things
Comment 12 Filip Pizlo 2016-01-17 12:27:36 PST
Created attachment 269194 [details]
things

This is a speed-up now that I've taken a different approach.
Comment 13 Filip Pizlo 2016-01-17 13:08:45 PST
Created attachment 269196 [details]
the patch
Comment 14 Filip Pizlo 2016-01-17 13:09:22 PST
Created attachment 269197 [details]
the patch
Comment 15 WebKit Commit Bot 2016-01-17 13:10:42 PST
Attachment 269197 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/air/AirCustom.cpp:49:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirCustom.h:74:  The parameter name "inst" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/air/AirValidate.cpp:88:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1204:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:60:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:69:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp:55:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:178:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:281:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3CheckSpecial.cpp:112:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:183:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:234:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 13 in 40 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Filip Pizlo 2016-01-17 14:08:50 PST
Created attachment 269198 [details]
did more things

We're now at parity with LLVM on Octane/crypto.
Comment 17 Filip Pizlo 2016-01-17 15:01:24 PST
Created attachment 269201 [details]
the patch
Comment 18 WebKit Commit Bot 2016-01-17 15:03:21 PST
Attachment 269201 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/air/AirCustom.cpp:49:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirValidate.cpp:88:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1204:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:60:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:69:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp:55:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirCustom.h:74:  The parameter name "inst" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:178:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:281:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3CheckSpecial.cpp:112:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:183:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:234:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 13 in 42 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 Filip Pizlo 2016-01-17 15:48:31 PST
Created attachment 269203 [details]
performance
Comment 20 Saam Barati 2016-01-18 23:18:25 PST
Comment on attachment 269201 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=269201&action=review

r=me with a couple comments

> Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:210
> +            block->insts().removeAllMatching(
> +                [&] (const Inst& inst) -> bool {
> +                    return !inst;
> +                });

Could/should this be done after looping over the block?

> Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:202
> +                case Arg::Width32:

Should this check for alias->mode != AllBits?

> Source/JavaScriptCore/runtime/Options.h:346
> +    v(bool, logAirRegisterPressure, false, nullptr) \

when do we use the "log" prefix versus the "dump" prefix?

> Source/WTF/wtf/CheckedArithmetic.h:822
> +    return Checked<T, RecordOverflow>(value) * checkedSum<T>(args...);

Shouldn't this be:
Checked<T, RecordOverflow>(value) * checkedProduct<T>(args...);
?
Comment 21 Filip Pizlo 2016-01-19 10:34:15 PST
(In reply to comment #20)
> Comment on attachment 269201 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269201&action=review
> 
> r=me with a couple comments
> 
> > Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:210
> > +            block->insts().removeAllMatching(
> > +                [&] (const Inst& inst) -> bool {
> > +                    return !inst;
> > +                });
> 
> Could/should this be done after looping over the block?

Wow, yeah.

> 
> > Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:202
> > +                case Arg::Width32:
> 
> Should this check for alias->mode != AllBits?

No.  The Width64 case has such a check because if you're observing all 64 bits (that's what Width64 means) then the alias better account for all bits (that's what AllBits means).  But if we only observe 32-bits (Width32) then any of the alias modes work.

> 
> > Source/JavaScriptCore/runtime/Options.h:346
> > +    v(bool, logAirRegisterPressure, false, nullptr) \
> 
> when do we use the "log" prefix versus the "dump" prefix?

This felt like it was like logB3PhaseTimes and logGC.

> 
> > Source/WTF/wtf/CheckedArithmetic.h:822
> > +    return Checked<T, RecordOverflow>(value) * checkedSum<T>(args...);
> 
> Shouldn't this be:
> Checked<T, RecordOverflow>(value) * checkedProduct<T>(args...);
> ?

Lol of course.
Comment 22 Filip Pizlo 2016-01-19 11:29:30 PST
Landed in http://trac.webkit.org/changeset/195298