163394 – B3 needs a special WasmAddress Opcode

RESOLVED FIXED 163394

B3 needs a special WasmAddress Opcode

https://bugs.webkit.org/show_bug.cgi?id=163394

Summary B3 needs a special WasmAddress Opcode

Keith Miller

Reported 2016-10-13 10:58:34 PDT

...

Attachments
Working but looking into lea failures (17.63 KB, patch) 2016-10-13 11:00 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Patch (21.73 KB, patch) 2016-10-14 15:30 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Patch for landing (21.46 KB, patch) 2016-10-14 16:08 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Keith Miller

Comment 1 2016-10-13 11:00:00 PDT

Created attachment 291495 [details] Working but looking into lea failures

Keith Miller

Comment 2 2016-10-13 11:12:12 PDT

For some reason the test case in the patch produces the following disassembly on x86_64: 0x21d1cec01000: push %rbp 0x21d1cec01001: mov %rsp, %rbp 0x21d1cec01004: cmp $0x0, %edi 0x21d1cec01007: ja 0x21d1cec0100f 0x21d1cec0100d: pop %rbp 0x21d1cec0100e: ret 0x21d1cec0100f: xor %rax, %rax 0x21d1cec01012: mov %eax, %ecx 0x21d1cec01014: shl $0x2, %ecx 0x21d1cec01017: add %rsi, %rcx 0x21d1cec0101a: mov $0x2a, (%rcx) 0x21d1cec01020: inc %eax 0x21d1cec01022: cmp %edi, %eax 0x21d1cec01024: jb 0x21d1cec01012 0x21d1cec0102a: jmp 0x21d1cec0100d I want to figure out why the add and mov are not being compressed into one instruction. It's also to bad that we can't fuse the shl, add, and mov since I expect that case to be common. Note, we can't fuse the shl with the others since it's 32-bit and the other values are 64-bit.

Filip Pizlo

Comment 3 2016-10-13 11:14:04 PDT

(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx > 0x21d1cec01017: add %rsi, %rcx > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit. Can you please JSC_dumpGraphAtEachPhase=true?

Filip Pizlo

Comment 4 2016-10-13 11:20:57 PDT

(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx This is a 32-bit left-shift-by-2. There is no lea for that. Lea is only for Add(Shl(value, const), value). A lone Shl is better off as a lone Shl. > 0x21d1cec01017: add %rsi, %rcx You're emitting this instruction yourself. This add instruction is there because of the code you added to B3LowerToAir. > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit.

Keith Miller

Comment 5 2016-10-14 15:30:38 PDT

Created attachment 291672 [details] Patch

Filip Pizlo

Comment 6 2016-10-14 15:36:42 PDT

Comment on attachment 291672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291672&action=review > Source/JavaScriptCore/b3/B3LowerToAir.cpp:521 > + // We would further pattern match on the opcode of the pointer and > + // try to construct a more complete addressing mode. But since any > + // math on the pointer would be 32-bit there is not much we can do. Remove this comment, since you already say specifically about the ARM thing. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:526 > + // FIXME: We should support ARM64 LDR 32-bit addressing, which will > + // allow us to fuse a Shl ptr, 2 into the address. Additionally, and > + // perhaps more importantly, it would allow us to avoid a truncating > + // move. File a bug and reference it here. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:528 > + return Arg::index(Air::Tmp(wasmAddress->pinnedGPR()), tmp(pointer), 1, offset); Do you have to say Air:: here? > Source/JavaScriptCore/b3/testb3.cpp:13766 > +void testWasmAddress() > + { Indentation.

Keith Miller

Comment 7 2016-10-14 16:08:06 PDT

Created attachment 291677 [details] Patch for landing

Keith Miller

Comment 8 2016-10-14 16:24:48 PDT

Committed r207360: <http://trac.webkit.org/changeset/207360>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Keith Miller

Reported

2016-10-13 10:58 PDT

Modified

2016-10-14 16:24 PDT History

CC List

2 users Show

URL

Keywords

Depends on

Blocks

163171

Dependencies

tree graph