Bug 163394 - B3 needs a special WasmAddress Opcode
Summary: B3 needs a special WasmAddress Opcode
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Keith Miller
URL:
Keywords:
Depends on:
Blocks: 163171
  Show dependency treegraph
 
Reported: 2016-10-13 10:58 PDT by Keith Miller
Modified: 2016-10-14 16:24 PDT (History)
2 users (show)

See Also:


Attachments
Working but looking into lea failures (17.63 KB, patch)
2016-10-13 11:00 PDT, Keith Miller
no flags Details | Formatted Diff | Diff
Patch (21.73 KB, patch)
2016-10-14 15:30 PDT, Keith Miller
no flags Details | Formatted Diff | Diff
Patch for landing (21.46 KB, patch)
2016-10-14 16:08 PDT, Keith Miller
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Keith Miller 2016-10-13 10:58:34 PDT
...
Comment 1 Keith Miller 2016-10-13 11:00:00 PDT
Created attachment 291495 [details]
Working but looking into lea failures
Comment 2 Keith Miller 2016-10-13 11:12:12 PDT
For some reason the test case in the patch produces the following disassembly on x86_64:

      0x21d1cec01000: push %rbp
      0x21d1cec01001: mov %rsp, %rbp
      0x21d1cec01004: cmp $0x0, %edi
      0x21d1cec01007: ja 0x21d1cec0100f
      0x21d1cec0100d: pop %rbp
      0x21d1cec0100e: ret
      0x21d1cec0100f: xor %rax, %rax
      0x21d1cec01012: mov %eax, %ecx
      0x21d1cec01014: shl $0x2, %ecx
      0x21d1cec01017: add %rsi, %rcx
      0x21d1cec0101a: mov $0x2a, (%rcx)
      0x21d1cec01020: inc %eax
      0x21d1cec01022: cmp %edi, %eax
      0x21d1cec01024: jb 0x21d1cec01012
      0x21d1cec0102a: jmp 0x21d1cec0100d

I want to figure out why the add and mov are not being compressed into one instruction.

It's also to bad that we can't fuse the shl, add, and mov since I expect that case to be common. Note, we can't fuse the shl with the others since it's 32-bit and the other values are 64-bit.
Comment 3 Filip Pizlo 2016-10-13 11:14:04 PDT
(In reply to comment #2)
> For some reason the test case in the patch produces the following
> disassembly on x86_64:
> 
>       0x21d1cec01000: push %rbp
>       0x21d1cec01001: mov %rsp, %rbp
>       0x21d1cec01004: cmp $0x0, %edi
>       0x21d1cec01007: ja 0x21d1cec0100f
>       0x21d1cec0100d: pop %rbp
>       0x21d1cec0100e: ret
>       0x21d1cec0100f: xor %rax, %rax
>       0x21d1cec01012: mov %eax, %ecx
>       0x21d1cec01014: shl $0x2, %ecx
>       0x21d1cec01017: add %rsi, %rcx
>       0x21d1cec0101a: mov $0x2a, (%rcx)
>       0x21d1cec01020: inc %eax
>       0x21d1cec01022: cmp %edi, %eax
>       0x21d1cec01024: jb 0x21d1cec01012
>       0x21d1cec0102a: jmp 0x21d1cec0100d
> 
> I want to figure out why the add and mov are not being compressed into one
> instruction.
> 
> It's also to bad that we can't fuse the shl, add, and mov since I expect
> that case to be common. Note, we can't fuse the shl with the others since
> it's 32-bit and the other values are 64-bit.

Can you please JSC_dumpGraphAtEachPhase=true?
Comment 4 Filip Pizlo 2016-10-13 11:20:57 PDT
(In reply to comment #2)
> For some reason the test case in the patch produces the following
> disassembly on x86_64:
> 
>       0x21d1cec01000: push %rbp
>       0x21d1cec01001: mov %rsp, %rbp
>       0x21d1cec01004: cmp $0x0, %edi
>       0x21d1cec01007: ja 0x21d1cec0100f
>       0x21d1cec0100d: pop %rbp
>       0x21d1cec0100e: ret
>       0x21d1cec0100f: xor %rax, %rax
>       0x21d1cec01012: mov %eax, %ecx
>       0x21d1cec01014: shl $0x2, %ecx

This is a 32-bit left-shift-by-2.  There is no lea for that.

Lea is only for Add(Shl(value, const), value).  A lone Shl is better off as a lone Shl.

>       0x21d1cec01017: add %rsi, %rcx

You're emitting this instruction yourself.  This add instruction is there because of the code you added to B3LowerToAir.

>       0x21d1cec0101a: mov $0x2a, (%rcx)
>       0x21d1cec01020: inc %eax
>       0x21d1cec01022: cmp %edi, %eax
>       0x21d1cec01024: jb 0x21d1cec01012
>       0x21d1cec0102a: jmp 0x21d1cec0100d
> 
> I want to figure out why the add and mov are not being compressed into one
> instruction.
> 
> It's also to bad that we can't fuse the shl, add, and mov since I expect
> that case to be common. Note, we can't fuse the shl with the others since
> it's 32-bit and the other values are 64-bit.
Comment 5 Keith Miller 2016-10-14 15:30:38 PDT
Created attachment 291672 [details]
Patch
Comment 6 Filip Pizlo 2016-10-14 15:36:42 PDT
Comment on attachment 291672 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=291672&action=review

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:521
> +            // We would further pattern match on the opcode of the pointer and
> +            // try to construct a more complete addressing mode. But since any
> +            // math on the pointer would be 32-bit there is not much we can do.

Remove this comment, since you already say specifically about the ARM thing.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:526
> +            // FIXME: We should support ARM64 LDR 32-bit addressing, which will
> +            // allow us to fuse a Shl ptr, 2 into the address. Additionally, and
> +            // perhaps more importantly, it would allow us to avoid a truncating
> +            // move.

File a bug and reference it here.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:528
> +            return Arg::index(Air::Tmp(wasmAddress->pinnedGPR()), tmp(pointer), 1, offset);

Do you have to say Air:: here?

> Source/JavaScriptCore/b3/testb3.cpp:13766
> +void testWasmAddress()
> +    {

Indentation.
Comment 7 Keith Miller 2016-10-14 16:08:06 PDT
Created attachment 291677 [details]
Patch for landing
Comment 8 Keith Miller 2016-10-14 16:24:48 PDT
Committed r207360: <http://trac.webkit.org/changeset/207360>