Summary: | FTL B3 should be just as fast as FTL LLVM on Octane/crypto | ||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> | ||||||||||||||||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Filip Pizlo <fpizlo> | ||||||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||||||||
Severity: | Normal | CC: | barraclough, benjamin, commit-queue, ggaren, keith_miller, mark.lam, mhahnenb, msaboff, oliver, saam, sam | ||||||||||||||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||||||||||||||||
Bug Depends on: | |||||||||||||||||||||||||||||||||||||
Bug Blocks: | 150279, 153197 | ||||||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Filip Pizlo
2016-01-14 18:09:52 PST
Created attachment 269023 [details]
it's a start
Created attachment 269028 [details]
a bit more
Here's my current thinking on how to proceed. I think I'll get rid of ForwardLiveness and use a much more deliberate approach: Our state mapping tells us what is live at any given moment. We use a separate liveness pass to collect the kill lists for all instructions. The kill lists tell us about the ways that things die at an Inst. There are four ways: - An early def could die above an Inst. - An early use could kill above an Inst. - A late def could die below an Inst. - A late use could kill below an Inst. Once we are armed with our state mapping and the kill lists, we can process the Inst in the order of its execution and update the state mapping based on defs and kill lists. We start above the Inst with a state that tells us what is live before it. This includes things that will die at the top. We can handle early uses and defs by giving them registers that are not live. We know which registers become live because our mapping tells us this. Then we can apply the early deaths. Then we process late uses and defs, and then apply the late deaths. This algorithm must proceed in the order of the Inst's execution. We can use the notion of locking at each stage - early and then late - to indicate which registers have already been purposed. All early uses will hold the lock while we are processing the early part of the instruction, and then release it before we start processing the late stuff. In between early and late, after we have unlocked things used early, we can perform early deaths. All early defs will hold the lock early and then release it after we process late things. As a special case, UseDef will also release the lock late. Things get weird with UseDef. Lucky for us, Insts will have a small number of UseDefs, so the algorithm doesn't have to be efficient and we can be sure that if we handle UseDefs early, then the greedy approach should just work. The locking strategy ensures that we won't try to reuse the register we already picked. The only additional logic for UseDef is that when we pick a register for it, which happens early, we look ahead to make sure that we didn't already lock the register late. The only registers that we could have already locked late are the ones that are clobbered late or defined late. One of the beautiful things about this approach is that we can process the Inst in-place. We won't edit the uses until we have accounted for them. We won't edit the defs until we have accounted for them. So, we can compute forward liveness while also editing the instruction. Created attachment 269090 [details]
starting to make sense
Created attachment 269150 [details]
more
Created attachment 269178 [details]
lots more
Created attachment 269182 [details]
more
Created attachment 269183 [details]
IRC appears to work
It's not saying much, but the current status of the patch is that IRC still appears to work. I haven't enabled register scavenging yet.
Created attachment 269184 [details]
more
Created attachment 269187 [details]
sort of getting it to work
I'm no longer sure if this is a great idea.
Created attachment 269190 [details]
trying more things
Created attachment 269194 [details]
things
This is a speed-up now that I've taken a different approach.
Created attachment 269196 [details]
the patch
Created attachment 269197 [details]
the patch
Attachment 269197 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/b3/air/AirCustom.cpp:49: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirCustom.h:74: The parameter name "inst" adds no information, so it should be removed. [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/air/AirValidate.cpp:88: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1204: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:60: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp:55: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:178: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:281: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3CheckSpecial.cpp:112: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:117: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:183: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:234: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 13 in 40 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 269198 [details]
did more things
We're now at parity with LLVM on Octane/crypto.
Created attachment 269201 [details]
the patch
Attachment 269201 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/b3/air/AirCustom.cpp:49: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirValidate.cpp:88: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1204: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:60: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp:69: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp:55: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirCustom.h:74: The parameter name "inst" adds no information, so it should be removed. [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:178: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:281: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3CheckSpecial.cpp:112: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:117: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:183: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:234: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 13 in 42 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 269203 [details]
performance
Comment on attachment 269201 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=269201&action=review r=me with a couple comments > Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:210 > + block->insts().removeAllMatching( > + [&] (const Inst& inst) -> bool { > + return !inst; > + }); Could/should this be done after looping over the block? > Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:202 > + case Arg::Width32: Should this check for alias->mode != AllBits? > Source/JavaScriptCore/runtime/Options.h:346 > + v(bool, logAirRegisterPressure, false, nullptr) \ when do we use the "log" prefix versus the "dump" prefix? > Source/WTF/wtf/CheckedArithmetic.h:822 > + return Checked<T, RecordOverflow>(value) * checkedSum<T>(args...); Shouldn't this be: Checked<T, RecordOverflow>(value) * checkedProduct<T>(args...); ? (In reply to comment #20) > Comment on attachment 269201 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269201&action=review > > r=me with a couple comments > > > Source/JavaScriptCore/b3/air/AirAllocateStack.cpp:210 > > + block->insts().removeAllMatching( > > + [&] (const Inst& inst) -> bool { > > + return !inst; > > + }); > > Could/should this be done after looping over the block? Wow, yeah. > > > Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp:202 > > + case Arg::Width32: > > Should this check for alias->mode != AllBits? No. The Width64 case has such a check because if you're observing all 64 bits (that's what Width64 means) then the alias better account for all bits (that's what AllBits means). But if we only observe 32-bits (Width32) then any of the alias modes work. > > > Source/JavaScriptCore/runtime/Options.h:346 > > + v(bool, logAirRegisterPressure, false, nullptr) \ > > when do we use the "log" prefix versus the "dump" prefix? This felt like it was like logB3PhaseTimes and logGC. > > > Source/WTF/wtf/CheckedArithmetic.h:822 > > + return Checked<T, RecordOverflow>(value) * checkedSum<T>(args...); > > Shouldn't this be: > Checked<T, RecordOverflow>(value) * checkedProduct<T>(args...); > ? Lol of course. Landed in http://trac.webkit.org/changeset/195298 |