Bug 163394

Summary:

B3 needs a special WasmAddress Opcode

Product:

WebKit

Reporter:

Keith Miller <keith_miller>

Component:

JavaScriptCore

Assignee:

Keith Miller <keith_miller>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

commit-queue, fpizlo

Priority:

Version:

WebKit Nightly Build

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

Bug Blocks:

163171

Attachments:

Description	Flags
Working but looking into lea failures	none
Patch	none
Patch for landing	none

Keith Miller

Reported 2016-10-13 10:58:34 PDT

...

Attachments
Working but looking into lea failures (17.63 KB, patch) 2016-10-13 11:00 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Patch (21.73 KB, patch) 2016-10-14 15:30 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Patch for landing (21.46 KB, patch) 2016-10-14 16:08 PDT, Keith Miller	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Keith Miller

Comment 1 2016-10-13 11:00:00 PDT

Created attachment 291495 [details] Working but looking into lea failures

Keith Miller

Comment 2 2016-10-13 11:12:12 PDT

For some reason the test case in the patch produces the following disassembly on x86_64: 0x21d1cec01000: push %rbp 0x21d1cec01001: mov %rsp, %rbp 0x21d1cec01004: cmp $0x0, %edi 0x21d1cec01007: ja 0x21d1cec0100f 0x21d1cec0100d: pop %rbp 0x21d1cec0100e: ret 0x21d1cec0100f: xor %rax, %rax 0x21d1cec01012: mov %eax, %ecx 0x21d1cec01014: shl $0x2, %ecx 0x21d1cec01017: add %rsi, %rcx 0x21d1cec0101a: mov $0x2a, (%rcx) 0x21d1cec01020: inc %eax 0x21d1cec01022: cmp %edi, %eax 0x21d1cec01024: jb 0x21d1cec01012 0x21d1cec0102a: jmp 0x21d1cec0100d I want to figure out why the add and mov are not being compressed into one instruction. It's also to bad that we can't fuse the shl, add, and mov since I expect that case to be common. Note, we can't fuse the shl with the others since it's 32-bit and the other values are 64-bit.

Filip Pizlo

Comment 3 2016-10-13 11:14:04 PDT

(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx > 0x21d1cec01017: add %rsi, %rcx > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit. Can you please JSC_dumpGraphAtEachPhase=true?

Filip Pizlo

Comment 4 2016-10-13 11:20:57 PDT

(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx This is a 32-bit left-shift-by-2. There is no lea for that. Lea is only for Add(Shl(value, const), value). A lone Shl is better off as a lone Shl. > 0x21d1cec01017: add %rsi, %rcx You're emitting this instruction yourself. This add instruction is there because of the code you added to B3LowerToAir. > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit.

Keith Miller

Comment 5 2016-10-14 15:30:38 PDT

Created attachment 291672 [details] Patch

Filip Pizlo

Comment 6 2016-10-14 15:36:42 PDT

Comment on attachment 291672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291672&action=review > Source/JavaScriptCore/b3/B3LowerToAir.cpp:521 > + // We would further pattern match on the opcode of the pointer and > + // try to construct a more complete addressing mode. But since any > + // math on the pointer would be 32-bit there is not much we can do. Remove this comment, since you already say specifically about the ARM thing. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:526 > + // FIXME: We should support ARM64 LDR 32-bit addressing, which will > + // allow us to fuse a Shl ptr, 2 into the address. Additionally, and > + // perhaps more importantly, it would allow us to avoid a truncating > + // move. File a bug and reference it here. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:528 > + return Arg::index(Air::Tmp(wasmAddress->pinnedGPR()), tmp(pointer), 1, offset); Do you have to say Air:: here? > Source/JavaScriptCore/b3/testb3.cpp:13766 > +void testWasmAddress() > + { Indentation.

Keith Miller

Comment 7 2016-10-14 16:08:06 PDT

Created attachment 291677 [details] Patch for landing

Keith Miller

Comment 8 2016-10-14 16:24:48 PDT

Committed r207360: <http://trac.webkit.org/changeset/207360>

Note You need to log in before you can comment on or make changes to this bug.