Summary: | B3 needs a special WasmAddress Opcode | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Keith Miller <keith_miller> | ||||||||
Component: | JavaScriptCore | Assignee: | Keith Miller <keith_miller> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | commit-queue, fpizlo | ||||||||
Priority: | P2 | ||||||||||
Version: | WebKit Nightly Build | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Bug Depends on: | |||||||||||
Bug Blocks: | 163171 | ||||||||||
Attachments: |
|
Description
Keith Miller
2016-10-13 10:58:34 PDT
Created attachment 291495 [details]
Working but looking into lea failures
For some reason the test case in the patch produces the following disassembly on x86_64: 0x21d1cec01000: push %rbp 0x21d1cec01001: mov %rsp, %rbp 0x21d1cec01004: cmp $0x0, %edi 0x21d1cec01007: ja 0x21d1cec0100f 0x21d1cec0100d: pop %rbp 0x21d1cec0100e: ret 0x21d1cec0100f: xor %rax, %rax 0x21d1cec01012: mov %eax, %ecx 0x21d1cec01014: shl $0x2, %ecx 0x21d1cec01017: add %rsi, %rcx 0x21d1cec0101a: mov $0x2a, (%rcx) 0x21d1cec01020: inc %eax 0x21d1cec01022: cmp %edi, %eax 0x21d1cec01024: jb 0x21d1cec01012 0x21d1cec0102a: jmp 0x21d1cec0100d I want to figure out why the add and mov are not being compressed into one instruction. It's also to bad that we can't fuse the shl, add, and mov since I expect that case to be common. Note, we can't fuse the shl with the others since it's 32-bit and the other values are 64-bit. (In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx > 0x21d1cec01017: add %rsi, %rcx > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit. Can you please JSC_dumpGraphAtEachPhase=true? (In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx This is a 32-bit left-shift-by-2. There is no lea for that. Lea is only for Add(Shl(value, const), value). A lone Shl is better off as a lone Shl. > 0x21d1cec01017: add %rsi, %rcx You're emitting this instruction yourself. This add instruction is there because of the code you added to B3LowerToAir. > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit. Created attachment 291672 [details]
Patch
Comment on attachment 291672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291672&action=review > Source/JavaScriptCore/b3/B3LowerToAir.cpp:521 > + // We would further pattern match on the opcode of the pointer and > + // try to construct a more complete addressing mode. But since any > + // math on the pointer would be 32-bit there is not much we can do. Remove this comment, since you already say specifically about the ARM thing. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:526 > + // FIXME: We should support ARM64 LDR 32-bit addressing, which will > + // allow us to fuse a Shl ptr, 2 into the address. Additionally, and > + // perhaps more importantly, it would allow us to avoid a truncating > + // move. File a bug and reference it here. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:528 > + return Arg::index(Air::Tmp(wasmAddress->pinnedGPR()), tmp(pointer), 1, offset); Do you have to say Air:: here? > Source/JavaScriptCore/b3/testb3.cpp:13766 > +void testWasmAddress() > + { Indentation. Created attachment 291677 [details]
Patch for landing
Committed r207360: <http://trac.webkit.org/changeset/207360> |