Bug 165979

Summary: Rename finallyActionRegister to completionTypeRegister and only store int JSValues in it.
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCoreAssignee: Mark Lam <mark.lam>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: commit-queue, fpizlo, ggaren, jfbastien, keith_miller, msaboff, saam, ysuzuki
Priority: P2    
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 160168, 166049    
Bug Blocks:    
Attachments:
Description Flags
proposed patch. none

Description Mark Lam 2016-12-16 16:55:45 PST
It is better for JIT type speculation if the type stays an int always.
Comment 1 Mark Lam 2016-12-18 09:49:07 PST
Created attachment 297438 [details]
proposed patch.
Comment 2 Saam Barati 2016-12-18 10:24:35 PST
Comment on attachment 297438 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=297438&action=review

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:4874
>              for (size_t i = 0; i < numberOfJumps; i++) {

I know this isn't part of this patch, but why not emit a switch() statement instead of a linear set of jumps? Maybe worth doing in another patch? Or at least if the number of jumps is large we can emit a switch
Comment 3 Mark Lam 2016-12-18 10:39:55 PST
(In reply to comment #2)
> Comment on attachment 297438 [details]
> proposed patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=297438&action=review
> 
> > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:4874
> >              for (size_t i = 0; i < numberOfJumps; i++) {
> 
> I know this isn't part of this patch, but why not emit a switch() statement
> instead of a linear set of jumps? Maybe worth doing in another patch? Or at
> least if the number of jumps is large we can emit a switch

Good point, but I'll look into doing that separately.  Thanks.
Comment 4 WebKit Commit Bot 2016-12-18 11:04:59 PST
Comment on attachment 297438 [details]
proposed patch.

Clearing flags on attachment: 297438

Committed r209974: <http://trac.webkit.org/changeset/209974>
Comment 5 WebKit Commit Bot 2016-12-18 11:05:03 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Mark Lam 2016-12-19 09:59:07 PST
(In reply to comment #2)
> > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:4874
> >              for (size_t i = 0; i < numberOfJumps; i++) {
> 
> I know this isn't part of this patch, but why not emit a switch() statement
> instead of a linear set of jumps? Maybe worth doing in another patch? Or at
> least if the number of jumps is large we can emit a switch

I took a look at this.  Here's an example of JS code that would resemble the code generated in emitFinallyCompletion():

function foo(x) {
    var scope1 = x; // Fake scope value
    var scope2 = x; // Fake scope value
    var scope3 = x; // Fake scope value
    var completionType = 0;
    switch(x) {
    case 0: break; // Normal completion.
    case 3: return x; // Return completion.

    // Simulate Break/Continue completions with non-contiguous jumpIDs.
    // each case needs to do 1. a move to restore scope, 2. set the completion type
    // back to Normal, and 3. jump to the target.
    case 8: x = scope1; completionType = 0; break; 
    case 21: x = scope2; completionType = 0; break; 
    case 55: x = scope3; completionType = 0; break; 

    default:
        throw x; // Throw completion.
    }
}

Here's the best bytecode that we can generate for this at present:

foo#A4jB2P:[0x10f5979e0->0x10f5a64c0, NoneFunctionCall, 102]: 102 m_instructions; 816 bytes; 2 parameter(s); 12 callee register(s); 10 variable(s)
[   0] enter             
[   1] get_scope         loc3
[   3] mov               loc4, loc3
[   6] mov               loc6, arg1
[   9] mov               loc5, arg1
[  12] mov               loc8, arg1
[  15] mov               loc7, Int32: 0(const0)

[  18] mov               loc10, Int32: 0(const0) // Check for Normal completion.
[  21] stricteq          loc10, loc10, arg1
[  25] jtrue             loc10, 45(->70)

[  28] mov               loc10, Int32: 3(const1) // Check for Return completion.
[  31] stricteq          loc10, loc10, arg1
[  35] jtrue             loc10, 37(->72)

[  38] mov               loc10, Int32: 8(const2) // Check for non-contiguous jumpID i.e. Break/Continue completion.
[  41] stricteq          loc10, loc10, arg1
[  45] jtrue             loc10, 29(->74)

[  48] mov               loc10, Int32: 21(const3) // Check for non-contiguous jumpID i.e. Break/Continue completion.
[  51] stricteq          loc10, loc10, arg1
[  55] jtrue             loc10, 27(->82)

[  58] mov               loc10, Int32: 55(const4) // Check for non-contiguous jumpID i.e. Break/Continue completion.
[  61] stricteq          loc10, loc10, arg1
[  65] jtrue             loc10, 25(->90)

[  68] jmp               30(->98) // Default Throw completion.

[  70] jmp               30(->100)
[  72] ret               arg1

[  74] mov               arg1, loc6 // Restore scope.
[  77] mov               loc7, Int32: 0(const0) // Restore Normal completion.
[  80] jmp               20(->100)

[  82] mov               arg1, loc5 // Restore scope.
[  85] mov               loc7, Int32: 0(const0) // Restore Normal completion.
[  88] jmp               12(->100)

[  90] mov               arg1, loc8 // Restore scope.
[  93] mov               loc7, Int32: 0(const0) // Restore Normal completion.
[  96] jmp               4(->100)

[  98] throw             arg1
[ 100] ret               Undefined(const5)
 

Hence, the best representation we have for the type of non-contiguous switch cases that emitFinallyCompletion() emits is still a series of test-and-jumps.  Hence, the bytecode I'm currently emitting is as efficient as it gets.  The one difference might be that the switch code layout may or may not yield better cache locality in JIT generated code eventually.  That may be more the case if the number of cases in the switch would be large.  However, I think the number of breaks and continues in for-of loops would tend to be few in number.  Hence, it is unlikely, we'll see much benefit in terms of cache locality. 

In summary, we can't really emit an op_switch (because the cases are not contiguous), and the current generated bytecode is already efficient.  I won't pursue this inquiry further.
Comment 7 Mark Lam 2016-12-19 11:36:09 PST
(In reply to comment #6)
> In summary, we can't really emit an op_switch (because the cases are not
> contiguous), and the current generated bytecode is already efficient.  I
> won't pursue this inquiry further.

Correction: I had talked with Saam offline.  We can make it so that the jumpIDs are contiguous (by changing our scheme for assigning jumpIDs) thereby enabling us to use a jump table switch.  We may look into this in the future if we see a perf need in this area.
Comment 8 Mark Lam 2016-12-19 17:42:34 PST
Rolling out r209974 and r209952.  They break some websites in mysterious ways.
Comment 9 Mark Lam 2016-12-19 18:10:28 PST
Part 1: r209974 is rolled out in r210007: <http://trac.webkit.org/r210007>.
Part 2: r209952 is rolled out in r210010: <http://trac.webkit.org/r210010>.
Comment 10 Mark Lam 2016-12-21 12:40:59 PST
(In reply to comment #9)
> Part 1: r209974 is rolled out in r210007: <http://trac.webkit.org/r210007>.
> Part 2: r209952 is rolled out in r210010: <http://trac.webkit.org/r210010>.

I will be posting a fixed patch for https://bugs.webkit.org/show_bug.cgi?id=160168.  So, I'll just roll this patch into that one.  Will close this one as a dupe.
Comment 11 Mark Lam 2016-12-21 12:41:21 PST

*** This bug has been marked as a duplicate of bug 160168 ***