Bug 159391

Summary: B3 should support multiple entrypoints
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, saam, ysuzuki
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
See Also: https://bugs.webkit.org/show_bug.cgi?id=151141
Attachments:
Description Flags
work in progress
none
rebased and improved
none
it's starting to work!
none
works like a charm
none
the patch
saam: review+
the patch saam: review+

Filip Pizlo
Reported 2016-07-03 18:22:10 PDT
Patch forthcoming.
Attachments
work in progress (27.32 KB, patch)
2016-07-03 20:58 PDT, Filip Pizlo
no flags
rebased and improved (24.50 KB, patch)
2016-07-19 20:06 PDT, Filip Pizlo
no flags
it's starting to work! (50.37 KB, patch)
2016-07-19 22:12 PDT, Filip Pizlo
no flags
works like a charm (61.36 KB, patch)
2016-07-20 13:29 PDT, Filip Pizlo
no flags
the patch (64.39 KB, patch)
2016-07-21 14:33 PDT, Filip Pizlo
saam: review+
the patch (69.49 KB, patch)
2016-07-24 11:04 PDT, Filip Pizlo
saam: review+
Filip Pizlo
Comment 1 2016-07-03 19:40:02 PDT
This is going to grind really bad with PolyJump and the whole idea of non-cloneable code.
Filip Pizlo
Comment 2 2016-07-03 20:58:06 PDT
Created attachment 282674 [details] work in progress
Filip Pizlo
Comment 3 2016-07-19 20:06:20 PDT
Created attachment 284081 [details] rebased and improved
Filip Pizlo
Comment 4 2016-07-19 22:12:35 PDT
Created attachment 284084 [details] it's starting to work!
WebKit Commit Bot
Comment 5 2016-07-19 22:14:24 PDT
Attachment 284084 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp:87: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12434: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12435: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12436: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12437: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12438: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12439: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:73: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 8 in 31 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 6 2016-07-20 13:29:32 PDT
Created attachment 284143 [details] works like a charm
WebKit Commit Bot
Comment 7 2016-07-20 13:30:51 PDT
Attachment 284143 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp:87: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12444: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12445: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12446: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12447: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12448: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12449: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:73: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 8 2016-07-21 14:33:34 PDT
Created attachment 284262 [details] the patch
WebKit Commit Bot
Comment 9 2016-07-21 14:35:36 PDT
Attachment 284262 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp:87: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12444: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12445: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12446: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12447: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12448: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12449: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12725: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12726: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12727: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12729: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12730: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12731: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:73: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 14 in 33 files If any of these errors are false positives, please file a bug against check-webkit-style.
Saam Barati
Comment 10 2016-07-22 19:01:19 PDT
Comment on attachment 284262 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=284262&action=review Still looking at the code. Just a few quick comments. I'll finish reviewing later tonight or tomorrow. > Source/JavaScriptCore/ChangeLog:8 > + This teaches B3 how to compile procedures with multiple entrypoints in the best way ever. I agree that this is a very elegant way to think about this problem. > Source/JavaScriptCore/ChangeLog:40 > + Usually, you would use this by creating an EntrySwitch in the root block, but you don't > + have to do that. Just remember that the code before EntrySwitch gets cloned for each > + entrypoint. We allow cloning of an arbitrarily large amount of code because restricting it, > + and so restricing the placement of EntrySwitches, would be unelegant. It would be hard to > + preserve this invariant. For example we wouldn't be able to lower any value before an > + EntrySwitch to a control flow diamond. How does the algorithm react in cases where the cloned code preceding some arbitrary EntrySwitch has an EntrySwitch itself? Will this lead to exponential blow up as we recursively clone? What happens with a loop? I'm just curious about an explanation here. Still trying to grok the algorithm. > Source/JavaScriptCore/b3/B3MoveConstants.cpp:65 > + return key.opcode() == Const32 || key.opcode() == Const64 || key.opcode() == ArgumentReg; Why is this legal? Will this have effects on register allocation?
Filip Pizlo
Comment 11 2016-07-22 19:33:45 PDT
(In reply to comment #10) > Comment on attachment 284262 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > Still looking at the code. Just a few quick comments. I'll finish reviewing > later tonight or tomorrow. > > > Source/JavaScriptCore/ChangeLog:8 > > + This teaches B3 how to compile procedures with multiple entrypoints in the best way ever. > > I agree that this is a very elegant way to think about this problem. > > > Source/JavaScriptCore/ChangeLog:40 > > + Usually, you would use this by creating an EntrySwitch in the root block, but you don't > > + have to do that. Just remember that the code before EntrySwitch gets cloned for each > > + entrypoint. We allow cloning of an arbitrarily large amount of code because restricting it, > > + and so restricing the placement of EntrySwitches, would be unelegant. It would be hard to > > + preserve this invariant. For example we wouldn't be able to lower any value before an > > + EntrySwitch to a control flow diamond. > > How does the algorithm react in cases where the cloned code preceding some > arbitrary EntrySwitch has an EntrySwitch itself? I should add a test! This is meant to work gracefully and efficiently. Any block with an EntrySwitch is guaranteed to be cloned because it's in the backwards closure of itself. If there is some EntrySwitch in front of it, then this does not change its membership in the set of cloned blocks. Note that it's really best to reason about the algorithm in terms of a set of cloned blocks. If a block belongs to the set, it will be cloned for each entrypoint. If it doesn't, it won't be cloned. It's impossible for the algorithm to clone a block more than once per entrypoint. The number of EntrySwitches does not affect the number of clones, because the number of clones is just determined by the number of entrypoints. The cloning algorithm applies two separate rules to the successors of blocks after cloning them. It applies these rules in this order: 1) For each successor S: if S is a cloned block, then repoint it to its version in this clone. This means that the EntrySwitch will have all of its successors repointed into the clone. 2) If the terminal is an EntrySwitch, replace it with a Jump to the successor that corresponds to the current clone. So, each EntrySwitch turns into a Jump. In your test, this Jump could lead you to a block that has another EntrySwitch, which will also turn into a Jump. If that EntrySwitch really is the last one, then it will jump out of cloned code and into common code. Also, it's possible for one successor of EntrySwitch to lead to another EntrySwitch while another does not. So, depending on which clone we're talking about, that EntrySwitch could either jump to cloned code or common code. > Will this lead to exponential blow up as we recursively clone? No. We create clones per entrypoint, not per EntrySwitch. > What happens with a loop? In the loop test in testb3, every block gets cloned, because the only common code is a basic block that does a Return and that gets tail-duped. It turns out that for that test, cloning everything is optimal in space and time (space because we replace a jump to a ret with a jump, and time because fewer jumps). In general, the exit block(s) of the loop that contains EnrtySwitch would not get cloned if they were large enough to avoid tail duplication (which is a very low bar to meet) and if they did not lead to another EntrySwitch. I'm going to make a strong claim about this algorithm: it does minimal cloning under the precondition that all EntrySwitches must turn into unconditional jumps. A piece of code gets cloned only if the Entry variable would be live in that code, and we encode the value of the Entry variable as the index of the clone. > I'm just curious about an explanation here. Still trying to > grok the algorithm. > > > Source/JavaScriptCore/b3/B3MoveConstants.cpp:65 > > + return key.opcode() == Const32 || key.opcode() == Const64 || key.opcode() == ArgumentReg; > > Why is this legal? Will this have effects on register allocation? Each ArgumentReg for some register will always evaluate to the same value. For example: Block #0: @x = ArgumentReg(%rdi) ... Block #42: @y = ArgumentReg(%rdi) We are guaranteed that @x == @y. This is because ArgumentReg's semantics are: it gives you the value that the register would have had just before execution of the function's prologue. Therefore, you can hoist ArgumentReg as if it was a constant. That's because it is constant throughout any execution of the Procedure! It turns out that we need to do *something* to make ArgumentReg's work with EntrySwitch, since the instruction selector will place code in the prologue for each ArgumentReg Value in the Procedure. If there are duplicate ArgumentRegs, that means that the prologue gets really nasty. Air doesn't have the power to deduplicate them. When writing canonical code that uses EntrySwitch, you get into the habit of just creating an empty root block that ends in an EntrySwitch, and then creating the actual entrypoints as successors to that. When you do this, you invariably end up creating duplicate ArgumentReg's. Without this optimization, this would always lead to terrible code in the prologue. Because EntrySwitch works by duplicating everything before the EntrySwitch, that means that the terrible code in the prologue gets duplicated for each entrypoint, which then looks even nastier. Basically you get O(n^2) blow-up, where each entrypoint executes code that is meant to set up the arguments for every single other entrypoint. With this optimization, everything basically works: you get only one ArgumentReg per register in the prologue. This code does get duplicated, which might be moderately janky if one entrypoint uses argument registers and another doesn't - the one that doesn't would have to execute code to do things to argument registers that it isn't going to use. But I don't think this will be an issue for us. We will of course have argument registers in FTL soon. But our goal is to have one entrypoint that is legitimately hot (the entrypoint that corresponds to calling the function) and a bunch of additional entrypoints that will all be cold (even if you throw often, the time spent executing argument reg code in the prologue will be the least of your worries). If we do ever find this to be a problem, then we can fix it by moving the EntrySwitch lowering to before register allocation. I doubt we will need to do this anytime soon. It's just an interesting shortcoming of this approach.
Filip Pizlo
Comment 12 2016-07-22 19:46:28 PDT
(In reply to comment #11) > (In reply to comment #10) > > Comment on attachment 284262 [details] > > the patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > > > Still looking at the code. Just a few quick comments. I'll finish reviewing > > later tonight or tomorrow. > > > > > Source/JavaScriptCore/ChangeLog:8 > > > + This teaches B3 how to compile procedures with multiple entrypoints in the best way ever. > > > > I agree that this is a very elegant way to think about this problem. > > > > > Source/JavaScriptCore/ChangeLog:40 > > > + Usually, you would use this by creating an EntrySwitch in the root block, but you don't > > > + have to do that. Just remember that the code before EntrySwitch gets cloned for each > > > + entrypoint. We allow cloning of an arbitrarily large amount of code because restricting it, > > > + and so restricing the placement of EntrySwitches, would be unelegant. It would be hard to > > > + preserve this invariant. For example we wouldn't be able to lower any value before an > > > + EntrySwitch to a control flow diamond. > > > > How does the algorithm react in cases where the cloned code preceding some > > arbitrary EntrySwitch has an EntrySwitch itself? > > I should add a test! This is meant to work gracefully and efficiently. > > Any block with an EntrySwitch is guaranteed to be cloned because it's in the > backwards closure of itself. If there is some EntrySwitch in front of it, > then this does not change its membership in the set of cloned blocks. Note > that it's really best to reason about the algorithm in terms of a set of > cloned blocks. If a block belongs to the set, it will be cloned for each > entrypoint. If it doesn't, it won't be cloned. It's impossible for the > algorithm to clone a block more than once per entrypoint. The number of > EntrySwitches does not affect the number of clones, because the number of > clones is just determined by the number of entrypoints. > > The cloning algorithm applies two separate rules to the successors of blocks > after cloning them. It applies these rules in this order: > > 1) For each successor S: if S is a cloned block, then repoint it to its > version in this clone. This means that the EntrySwitch will have all of its > successors repointed into the clone. > > 2) If the terminal is an EntrySwitch, replace it with a Jump to the > successor that corresponds to the current clone. > > So, each EntrySwitch turns into a Jump. In your test, this Jump could lead > you to a block that has another EntrySwitch, which will also turn into a > Jump. If that EntrySwitch really is the last one, then it will jump out of > cloned code and into common code. Also, it's possible for one successor of > EntrySwitch to lead to another EntrySwitch while another does not. So, > depending on which clone we're talking about, that EntrySwitch could either > jump to cloned code or common code. > > > Will this lead to exponential blow up as we recursively clone? > > No. We create clones per entrypoint, not per EntrySwitch. > > > What happens with a loop? > > In the loop test in testb3, every block gets cloned, because the only common > code is a basic block that does a Return and that gets tail-duped. It turns > out that for that test, cloning everything is optimal in space and time > (space because we replace a jump to a ret with a jump, and time because > fewer jumps). > > In general, the exit block(s) of the loop that contains EnrtySwitch would > not get cloned if they were large enough to avoid tail duplication (which is > a very low bar to meet) and if they did not lead to another EntrySwitch. > > I'm going to make a strong claim about this algorithm: it does minimal > cloning under the precondition that all EntrySwitches must turn into > unconditional jumps. A piece of code gets cloned only if the Entry variable > would be live in that code, and we encode the value of the Entry variable as > the index of the clone. > > > I'm just curious about an explanation here. Still trying to > > grok the algorithm. > > > > > Source/JavaScriptCore/b3/B3MoveConstants.cpp:65 > > > + return key.opcode() == Const32 || key.opcode() == Const64 || key.opcode() == ArgumentReg; > > > > Why is this legal? Will this have effects on register allocation? > > Each ArgumentReg for some register will always evaluate to the same value. > For example: > > Block #0: > @x = ArgumentReg(%rdi) > ... > Block #42: > @y = ArgumentReg(%rdi) > > We are guaranteed that @x == @y. This is because ArgumentReg's semantics > are: it gives you the value that the register would have had just before > execution of the function's prologue. > > Therefore, you can hoist ArgumentReg as if it was a constant. That's > because it is constant throughout any execution of the Procedure! > > It turns out that we need to do *something* to make ArgumentReg's work with > EntrySwitch, since the instruction selector will place code in the prologue > for each ArgumentReg Value in the Procedure. If there are duplicate > ArgumentRegs, that means that the prologue gets really nasty. Air doesn't > have the power to deduplicate them. When writing canonical code that uses > EntrySwitch, you get into the habit of just creating an empty root block > that ends in an EntrySwitch, and then creating the actual entrypoints as > successors to that. When you do this, you invariably end up creating > duplicate ArgumentReg's. Without this optimization, this would always lead > to terrible code in the prologue. Because EntrySwitch works by duplicating > everything before the EntrySwitch, that means that the terrible code in the > prologue gets duplicated for each entrypoint, which then looks even nastier. > Basically you get O(n^2) blow-up, where each entrypoint executes code that > is meant to set up the arguments for every single other entrypoint. With > this optimization, everything basically works: you get only one ArgumentReg > per register in the prologue. This code does get duplicated, which might be > moderately janky if one entrypoint uses argument registers and another > doesn't - the one that doesn't would have to execute code to do things to > argument registers that it isn't going to use. > > But I don't think this will be an issue for us. We will of course have > argument registers in FTL soon. But our goal is to have one entrypoint that > is legitimately hot (the entrypoint that corresponds to calling the > function) and a bunch of additional entrypoints that will all be cold (even > if you throw often, the time spent executing argument reg code in the > prologue will be the least of your worries). > > If we do ever find this to be a problem, then we can fix it by moving the > EntrySwitch lowering to before register allocation. I doubt we will need to > do this anytime soon. It's just an interesting shortcoming of this approach. Actually, I thought through this some more. Yes, we will end up with some entrypoints having effectively dead instructions to shuffle argument registers that they don't use. But that will be the *only* bad thing that happens. As such, we could fix this by doing a really basic DCE after EntrySwitch lowering. We have the power to do DCE in Air even after regalloc. Here's a fun example of what will happen in FTL, once we have EntrySwitch and we are using it for catch, and we have register-based calling. Say that we have a function that has one catch and takes one argument register (maybe this). Initially our code might look like this. Block #0: EntrySwitch() Successors: #1, #2 Block #1: // this is the entrypoint for normal calls @arg = ArgumentReg(%rdi) // get this Jump() Successors: ... Block #2: // this is the catch block @arg2 = Load(things) // load the value of the argument from the OSR entry buffer Jump() Successors: ... Block #42: // this is a merge between paths from catch and other paths @arg3 = Phi(...) // this merges @arg with @arg2 Then ArgumentReg hoisting will do this: Block #0: @arg = ArgumentReg(%rdi) // get this EntrySwitch() Successors: #1, #2 Block #1: // this is the entrypoint for normal calls Jump() Successors: ... Block #2: // this is the catch block // Note that @arg is available here, but it's dead. We won't use it, since we use // @arg2 instead. @arg2 = Load(things) // load the value of the argument from the OSR entry buffer Jump() Successors: ... Block #42: // this is a merge between paths from catch and other paths @arg3 = Phi(...) // this merges @arg with @arg2 Then when we lower to Air and lower EntrySwitch we will get: Block #0: // This is now the real entrypoint for normal calls Move %rdi, %argreg // %argreg is some register we picked to hold the value of @arg ... Block #1: // This is the entrypoint for catch Move %rdi, %argreg // Dead move of the argument // %argreg is available for use in this code until Block #42. it's not live right now. Move (some address), %arg2reg // %arg2reg is the register for @arg2 It may be that we will coalesce %argreg and %arg2reg because they both merge at @arg3 in #42, or we may use different registers because that relieves register pressure. It may also be that in the catch entrypoint, we use %argreg for something unrelated to the argument. The move into %argreg in the catch entrypoint is just a dead goof in this case. On the other hand, if the catch entrypoint did use ArgumentReg(%rdi), then it would force the register allocator to use the same register for that argument along both entrypoints. That's not great (forcing the register allocator to do anything is not great) but it's probably harmless.
Saam Barati
Comment 13 2016-07-23 14:52:16 PDT
(In reply to comment #11) > (In reply to comment #10) > > Comment on attachment 284262 [details] > > the patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > > > Still looking at the code. Just a few quick comments. I'll finish reviewing > > later tonight or tomorrow. > > > > > Source/JavaScriptCore/ChangeLog:8 > > > + This teaches B3 how to compile procedures with multiple entrypoints in the best way ever. > > > > I agree that this is a very elegant way to think about this problem. > > > > > Source/JavaScriptCore/ChangeLog:40 > > > + Usually, you would use this by creating an EntrySwitch in the root block, but you don't > > > + have to do that. Just remember that the code before EntrySwitch gets cloned for each > > > + entrypoint. We allow cloning of an arbitrarily large amount of code because restricting it, > > > + and so restricing the placement of EntrySwitches, would be unelegant. It would be hard to > > > + preserve this invariant. For example we wouldn't be able to lower any value before an > > > + EntrySwitch to a control flow diamond. > > > > How does the algorithm react in cases where the cloned code preceding some > > arbitrary EntrySwitch has an EntrySwitch itself? > > I should add a test! This is meant to work gracefully and efficiently. > > Any block with an EntrySwitch is guaranteed to be cloned because it's in the > backwards closure of itself. If there is some EntrySwitch in front of it, > then this does not change its membership in the set of cloned blocks. Note > that it's really best to reason about the algorithm in terms of a set of > cloned blocks. If a block belongs to the set, it will be cloned for each > entrypoint. If it doesn't, it won't be cloned. It's impossible for the > algorithm to clone a block more than once per entrypoint. The number of > EntrySwitches does not affect the number of clones, because the number of > clones is just determined by the number of entry points. I realized last night that my mental model for this was completely off. I'm not quite sure what I was thinking before. I now realize that each EntrySwitch operates over the same EntryVariables. I used to think that each EntrySwitch operated over its own EntryVariable (which I'm not quite sure what that even means). So, if a graph has N blocks with K entry switches each with M entry variables, my old thinking is that we could end up with a graph of O(N^M) blocks. But in your model, I believed it's O(N*M). I now understand the idea better, and it sense. > > The cloning algorithm applies two separate rules to the successors of blocks > after cloning them. It applies these rules in this order: > > 1) For each successor S: if S is a cloned block, then repoint it to its > version in this clone. This means that the EntrySwitch will have all of its > successors repointed into the clone. > > 2) If the terminal is an EntrySwitch, replace it with a Jump to the > successor that corresponds to the current clone. > > So, each EntrySwitch turns into a Jump. In your test, this Jump could lead > you to a block that has another EntrySwitch, which will also turn into a > Jump. If that EntrySwitch really is the last one, then it will jump out of > cloned code and into common code. Also, it's possible for one successor of > EntrySwitch to lead to another EntrySwitch while another does not. So, > depending on which clone we're talking about, that EntrySwitch could either > jump to cloned code or common code. > > > Will this lead to exponential blow up as we recursively clone? > > No. We create clones per entrypoint, not per EntrySwitch. > > > What happens with a loop? > > In the loop test in testb3, every block gets cloned, because the only common > code is a basic block that does a Return and that gets tail-duped. It turns > out that for that test, cloning everything is optimal in space and time > (space because we replace a jump to a ret with a jump, and time because > fewer jumps). > > In general, the exit block(s) of the loop that contains EnrtySwitch would > not get cloned if they were large enough to avoid tail duplication (which is > a very low bar to meet) and if they did not lead to another EntrySwitch. > > I'm going to make a strong claim about this algorithm: it does minimal > cloning under the precondition that all EntrySwitches must turn into > unconditional jumps. A piece of code gets cloned only if the Entry variable > would be live in that code, and we encode the value of the Entry variable as > the index of the clone. > > > I'm just curious about an explanation here. Still trying to > > grok the algorithm. > > > > > Source/JavaScriptCore/b3/B3MoveConstants.cpp:65 > > > + return key.opcode() == Const32 || key.opcode() == Const64 || key.opcode() == ArgumentReg; > > > > Why is this legal? Will this have effects on register allocation? > > Each ArgumentReg for some register will always evaluate to the same value. > For example: > > Block #0: > @x = ArgumentReg(%rdi) > ... > Block #42: > @y = ArgumentReg(%rdi) > > We are guaranteed that @x == @y. This is because ArgumentReg's semantics > are: it gives you the value that the register would have had just before > execution of the function's prologue. > > Therefore, you can hoist ArgumentReg as if it was a constant. That's > because it is constant throughout any execution of the Procedure! > > It turns out that we need to do *something* to make ArgumentReg's work with > EntrySwitch, since the instruction selector will place code in the prologue > for each ArgumentReg Value in the Procedure. If there are duplicate > ArgumentRegs, that means that the prologue gets really nasty. Air doesn't > have the power to deduplicate them. When writing canonical code that uses > EntrySwitch, you get into the habit of just creating an empty root block > that ends in an EntrySwitch, and then creating the actual entrypoints as > successors to that. When you do this, you invariably end up creating > duplicate ArgumentReg's. Without this optimization, this would always lead > to terrible code in the prologue. Because EntrySwitch works by duplicating > everything before the EntrySwitch, that means that the terrible code in the > prologue gets duplicated for each entrypoint, which then looks even nastier. > Basically you get O(n^2) blow-up, where each entrypoint executes code that > is meant to set up the arguments for every single other entrypoint. With > this optimization, everything basically works: you get only one ArgumentReg > per register in the prologue. This code does get duplicated, which might be > moderately janky if one entrypoint uses argument registers and another > doesn't - the one that doesn't would have to execute code to do things to > argument registers that it isn't going to use. > > But I don't think this will be an issue for us. We will of course have > argument registers in FTL soon. But our goal is to have one entrypoint that > is legitimately hot (the entrypoint that corresponds to calling the > function) and a bunch of additional entrypoints that will all be cold (even > if you throw often, the time spent executing argument reg code in the > prologue will be the least of your worries). > > If we do ever find this to be a problem, then we can fix it by moving the > EntrySwitch lowering to before register allocation. I doubt we will need to > do this anytime soon. It's just an interesting shortcoming of this approach.
Saam Barati
Comment 14 2016-07-23 15:59:33 PDT
Comment on attachment 284262 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=284262&action=review Almost done reviewing, will finish a bit later, gotta go do something right now. > Source/JavaScriptCore/b3/air/AirCode.h:126 > + const Vector<FrequentedBlock> entrypoints() const { return m_entrypoints; } Do you mean to be copying the entry points here? Or did you mean "const Vector<FrequentedBlock>& entrypoints() ..."? Maybe we can get rid of this? Doesn't look like it's used. > Source/JavaScriptCore/b3/air/AirCode.h:130 > + // This is for use by lowerEntrySwitch(). "for use by" => "used by" or something similar. > Source/JavaScriptCore/b3/air/AirCode.h:146 > + m_entrypointLabels = std::forward<Vector>(vector); IMO, the contract here should be that we don't do forwarding here. It seems weird if somebody passed in an lvalue reference here and then we copied the vector. I also think this would be easier to read if you just wrote is as: void setEntrypointLabels(Vector<CCallHelpers::Label>&& vector) { m_entrypointLabels = WTFMove(vector); }
Filip Pizlo
Comment 15 2016-07-23 16:21:05 PDT
(In reply to comment #14) > Comment on attachment 284262 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > Almost done reviewing, will finish a bit later, gotta go do something right > now. > > > Source/JavaScriptCore/b3/air/AirCode.h:126 > > + const Vector<FrequentedBlock> entrypoints() const { return m_entrypoints; } > > Do you mean to be copying the entry points here? > Or did you mean "const Vector<FrequentedBlock>& entrypoints() ..."? > Maybe we can get rid of this? Doesn't look like it's used. Yeah, it was meant to be &. > > > Source/JavaScriptCore/b3/air/AirCode.h:130 > > + // This is for use by lowerEntrySwitch(). > > "for use by" => "used by" or something similar. I changed it. However, "for use by" to me meant that it's intended to be used by that phase, while "used by" is just a statement of fact that it's used by that phase. I want to make clear that it's intended only to be used by that phase. > > > Source/JavaScriptCore/b3/air/AirCode.h:146 > > + m_entrypointLabels = std::forward<Vector>(vector); > > IMO, the contract here should be that we don't do forwarding here. This is a setter. I've been adopting the philosophy that setters should do forwarding unless there is some really compelling reason not prohibit it. > It seems > weird if somebody passed in an lvalue reference here and then we copied the > vector. We wouldn't copy the vector in that case. This is a setter. If you want to move the vector, you call this with setEntrypoints(WTFMove(things)). The syntax is a small price to pay for giving the caller some flexibility. > I also think this would be easier to read if you just wrote is as: > void setEntrypointLabels(Vector<CCallHelpers::Label>&& vector) { > m_entrypointLabels = WTFMove(vector); } I guess I don't find the current version that hard to read. Throughout Air, we try to do perfect forwarding whenever we can, rather than requiring either move or copy semantics.
Filip Pizlo
Comment 16 2016-07-23 16:22:06 PDT
> > > Source/JavaScriptCore/b3/air/AirCode.h:146 > > > + m_entrypointLabels = std::forward<Vector>(vector); > > > > IMO, the contract here should be that we don't do forwarding here. > > This is a setter. I've been adopting the philosophy that setters should do > forwarding unless there is some really compelling reason not prohibit it. > > > It seems > > weird if somebody passed in an lvalue reference here and then we copied the > > vector. > > We wouldn't copy the vector in that case. This is a setter. If you want to > move the vector, you call this with setEntrypoints(WTFMove(things)). The > syntax is a small price to pay for giving the caller some flexibility. Sorry, we *would* copy the vector in that case, and that's fine, since that's consistent with how setters work.
Saam Barati
Comment 17 2016-07-23 23:42:58 PDT
Comment on attachment 284262 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=284262&action=review r=me > Source/JavaScriptCore/b3/B3Opcode.h:220 > + // Multiple entrypoint are supported via the EntrySwitch operation. Place this in the root Typo: entrypoint => entrypoints >>> Source/JavaScriptCore/b3/air/AirCode.h:130 >>> + // This is for use by lowerEntrySwitch(). >> >> "for use by" => "used by" or something similar. > > I changed it. However, "for use by" to me meant that it's intended to be used by that phase, while "used by" is just a statement of fact that it's used by that phase. I want to make clear that it's intended only to be used by that phase. Ok. You can keep that, I don't have a strong opinion either way. It just read a bit weirdly to me. >>> Source/JavaScriptCore/b3/air/AirCode.h:146 >>> + m_entrypointLabels = std::forward<Vector>(vector); >> >> IMO, the contract here should be that we don't do forwarding here. It seems weird if somebody passed in an lvalue reference here and then we copied the vector. >> I also think this would be easier to read if you just wrote is as: >> void setEntrypointLabels(Vector<CCallHelpers::Label>&& vector) { m_entrypointLabels = WTFMove(vector); } > > This is a setter. I've been adopting the philosophy that setters should do forwarding unless there is some really compelling reason not prohibit it. I was just thinking about this form the perspective that it'd be wrong for somebody to call this with an lvalue reference, and then start mutating the lvalue reference afterwards, and not the m_entrypountLabels vector. I don't think enforcing move semantics, like I suggested, prevents this from happening though. And this problem exists with many setter APIs. So it's not really a big deal implementing it either way. > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:52 > + Vector<FrequentedBlock> entrypoints(code.proc().numEntrypoints(), FrequentedBlock(code[0])); Should we assert that this is 1 here? > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:60 > + RELEASE_ASSERT(worklist.saw(code[0])); So we're guaranteed here to have removed any unreachable blocks before this phase runs? I'm not sure if you have a test for this, but can you add one? > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:77 > + block->last().opcode = Jump; Out of curiosity, is all that's needed in Air to change a node is to replace its opcode? Or is this just because both EntrySwitch and Jump are terminals? > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:83 > + Vector<FrequentedBlock> entrypoints; Is it worth having this pre-allocate code.proc().numEntrypoints() entries? > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:85 > + IndexMap<BasicBlock, BasicBlock*> map(code.size()); Suggestion: maybe call this "clonedBlocks" or something similar?
Filip Pizlo
Comment 18 2016-07-24 10:45:32 PDT
(In reply to comment #17) > Comment on attachment 284262 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > r=me > > > Source/JavaScriptCore/b3/B3Opcode.h:220 > > + // Multiple entrypoint are supported via the EntrySwitch operation. Place this in the root > > Typo: entrypoint => entrypoints Fixed. > > >>> Source/JavaScriptCore/b3/air/AirCode.h:130 > >>> + // This is for use by lowerEntrySwitch(). > >> > >> "for use by" => "used by" or something similar. > > > > I changed it. However, "for use by" to me meant that it's intended to be used by that phase, while "used by" is just a statement of fact that it's used by that phase. I want to make clear that it's intended only to be used by that phase. > > Ok. You can keep that, I don't have a strong opinion either way. It just > read a bit weirdly to me. I left it as "used by". I don't have a strong opinion either. > > >>> Source/JavaScriptCore/b3/air/AirCode.h:146 > >>> + m_entrypointLabels = std::forward<Vector>(vector); > >> > >> IMO, the contract here should be that we don't do forwarding here. It seems weird if somebody passed in an lvalue reference here and then we copied the vector. > >> I also think this would be easier to read if you just wrote is as: > >> void setEntrypointLabels(Vector<CCallHelpers::Label>&& vector) { m_entrypointLabels = WTFMove(vector); } > > > > This is a setter. I've been adopting the philosophy that setters should do forwarding unless there is some really compelling reason not prohibit it. > > I was just thinking about this form the perspective that it'd be wrong for > somebody to call this with an lvalue reference, and then start mutating the > lvalue reference afterwards, and not the m_entrypountLabels vector. I don't > think enforcing move semantics, like I suggested, prevents this from > happening though. And this problem exists with many setter APIs. So it's not > really a big deal implementing it either way. > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:52 > > + Vector<FrequentedBlock> entrypoints(code.proc().numEntrypoints(), FrequentedBlock(code[0])); > > Should we assert that this is 1 here? No. You could have code where the EntrySwitch was optimized out for some reason. For example, the code leading up to the EntrySwitch might become unreachable, or all of the successors of the EntrySwitch might be merged together. We merge identical successors in strength reduction. > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:60 > > + RELEASE_ASSERT(worklist.saw(code[0])); > > So we're guaranteed here to have removed any unreachable blocks before this > phase runs? Yes. > I'm not sure if you have a test for this, but can you add one? Not sure what to test for. :-) I don't want to create a test that always asserts, since that would just crash testb3. > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:77 > > + block->last().opcode = Jump; > > Out of curiosity, is all that's needed in Air to change a node is to replace > its opcode? Or is this just because both EntrySwitch and Jump are terminals? Just the opcode and the args. Since neither EntrySwitch nor Jump have any args, you can change one to the other by just changing the opcode. > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:83 > > + Vector<FrequentedBlock> entrypoints; > > Is it worth having this pre-allocate code.proc().numEntrypoints() entries? I usually don't end up adding code to preallocate unless the vector is hot. This one isn't. > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:85 > > + IndexMap<BasicBlock, BasicBlock*> map(code.size()); > > Suggestion: maybe call this "clonedBlocks" or something similar? I don't like that any better than "map". In code that does graph cloning I always look for a mapping from original to clone, which is necessary for rewiring the graph edges. So, I called this "map" so that I'll always be able to easily find that mapping.
Saam Barati
Comment 19 2016-07-24 10:57:52 PDT
(In reply to comment #18) > (In reply to comment #17) > > Comment on attachment 284262 [details] > > the patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=284262&action=review > > > > r=me > > > > > Source/JavaScriptCore/b3/B3Opcode.h:220 > > > + // Multiple entrypoint are supported via the EntrySwitch operation. Place this in the root > > > > Typo: entrypoint => entrypoints > > Fixed. > > > > > >>> Source/JavaScriptCore/b3/air/AirCode.h:130 > > >>> + // This is for use by lowerEntrySwitch(). > > >> > > >> "for use by" => "used by" or something similar. > > > > > > I changed it. However, "for use by" to me meant that it's intended to be used by that phase, while "used by" is just a statement of fact that it's used by that phase. I want to make clear that it's intended only to be used by that phase. > > > > Ok. You can keep that, I don't have a strong opinion either way. It just > > read a bit weirdly to me. > > I left it as "used by". I don't have a strong opinion either. > > > > > >>> Source/JavaScriptCore/b3/air/AirCode.h:146 > > >>> + m_entrypointLabels = std::forward<Vector>(vector); > > >> > > >> IMO, the contract here should be that we don't do forwarding here. It seems weird if somebody passed in an lvalue reference here and then we copied the vector. > > >> I also think this would be easier to read if you just wrote is as: > > >> void setEntrypointLabels(Vector<CCallHelpers::Label>&& vector) { m_entrypointLabels = WTFMove(vector); } > > > > > > This is a setter. I've been adopting the philosophy that setters should do forwarding unless there is some really compelling reason not prohibit it. > > > > I was just thinking about this form the perspective that it'd be wrong for > > somebody to call this with an lvalue reference, and then start mutating the > > lvalue reference afterwards, and not the m_entrypountLabels vector. I don't > > think enforcing move semantics, like I suggested, prevents this from > > happening though. And this problem exists with many setter APIs. So it's not > > really a big deal implementing it either way. > > > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:52 > > > + Vector<FrequentedBlock> entrypoints(code.proc().numEntrypoints(), FrequentedBlock(code[0])); > > > > Should we assert that this is 1 here? > > No. You could have code where the EntrySwitch was optimized out for some > reason. For example, the code leading up to the EntrySwitch might become > unreachable, or all of the successors of the EntrySwitch might be merged > together. We merge identical successors in strength reduction. Interesting! I wasn't thinking about that. If you don't already have a test that does this, maybe you can add one that asserts that all entry points are the same executable address? > > > > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:60 > > > + RELEASE_ASSERT(worklist.saw(code[0])); > > > > So we're guaranteed here to have removed any unreachable blocks before this > > phase runs? > > Yes. > > > I'm not sure if you have a test for this, but can you add one? > > Not sure what to test for. :-) I don't want to create a test that always > asserts, since that would just crash testb3. > > > > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:77 > > > + block->last().opcode = Jump; > > > > Out of curiosity, is all that's needed in Air to change a node is to replace > > its opcode? Or is this just because both EntrySwitch and Jump are terminals? > > Just the opcode and the args. Since neither EntrySwitch nor Jump have any > args, you can change one to the other by just changing the opcode. > > > > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:83 > > > + Vector<FrequentedBlock> entrypoints; > > > > Is it worth having this pre-allocate code.proc().numEntrypoints() entries? > > I usually don't end up adding code to preallocate unless the vector is hot. > This one isn't. > > > > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:85 > > > + IndexMap<BasicBlock, BasicBlock*> map(code.size()); > > > > Suggestion: maybe call this "clonedBlocks" or something similar? > > I don't like that any better than "map". In code that does graph cloning I > always look for a mapping from original to clone, which is necessary for > rewiring the graph edges. So, I called this "map" so that I'll always be > able to easily find that mapping.
Filip Pizlo
Comment 20 2016-07-24 11:04:51 PDT
Created attachment 284448 [details] the patch Addressed Saam's comments and also added changes to the B3 docs, which need review.
WebKit Commit Bot
Comment 21 2016-07-24 11:07:08 PDT
Attachment 284448 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp:87: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12464: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12465: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12466: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12467: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12468: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12469: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12745: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12746: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12747: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12749: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12750: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/testb3.cpp:12751: Consider using CHECK_EQ instead of CHECK(a == b) [readability/check] [2] ERROR: Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:73: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 14 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
Saam Barati
Comment 22 2016-07-24 11:10:35 PDT
Comment on attachment 284448 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=284448&action=review r=me on doc changes > Websites/webkit.org/docs/b3/intermediate-representation.html:580 > + <p>This is the canonical way to use EntrySwitch, however the semantics are flexible enough > + to allow its use anywhere in the control flow graph. This ensures that EntrySwitch works > + even when B3 does transformations that move code above the EntrySwitch.</p> Maybe it's also worth saying that b3 allows an arbitrary numbers of entry switches in the CFG.
Filip Pizlo
Comment 23 2016-07-24 11:16:43 PDT
> > > > Source/JavaScriptCore/b3/air/AirLowerEntrySwitch.cpp:52 > > > > + Vector<FrequentedBlock> entrypoints(code.proc().numEntrypoints(), FrequentedBlock(code[0])); > > > > > > Should we assert that this is 1 here? > > > > No. You could have code where the EntrySwitch was optimized out for some > > reason. For example, the code leading up to the EntrySwitch might become > > unreachable, or all of the successors of the EntrySwitch might be merged > > together. We merge identical successors in strength reduction. > Interesting! I wasn't thinking about that. If you don't already have a test > that does this, maybe you can add one that asserts that all entry points are > the same executable address? I added such a test. I didn't assert that everything points to the same executable address, since I'm not sure it's meaningful to actually guarantee that. I just test that the code does the same thing when called from any entrypoint.
Filip Pizlo
Comment 24 2016-07-24 11:17:38 PDT
(In reply to comment #22) > Comment on attachment 284448 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=284448&action=review > > r=me on doc changes > > > Websites/webkit.org/docs/b3/intermediate-representation.html:580 > > + <p>This is the canonical way to use EntrySwitch, however the semantics are flexible enough > > + to allow its use anywhere in the control flow graph. This ensures that EntrySwitch works > > + even when B3 does transformations that move code above the EntrySwitch.</p> > > Maybe it's also worth saying that b3 allows an arbitrary numbers of entry > switches in the CFG. Added!
Filip Pizlo
Comment 25 2016-07-24 13:35:36 PDT
Note You need to log in before you can comment on or make changes to this bug.