Bug 163394

Summary: B3 needs a special WasmAddress Opcode
Product: WebKit Reporter: Keith Miller <keith_miller>
Component: JavaScriptCoreAssignee: Keith Miller <keith_miller>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, fpizlo
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 163171    
Attachments:
Description Flags
Working but looking into lea failures
none
Patch
none
Patch for landing none

Keith Miller
Reported 2016-10-13 10:58:34 PDT
...
Attachments
Working but looking into lea failures (17.63 KB, patch)
2016-10-13 11:00 PDT, Keith Miller
no flags
Patch (21.73 KB, patch)
2016-10-14 15:30 PDT, Keith Miller
no flags
Patch for landing (21.46 KB, patch)
2016-10-14 16:08 PDT, Keith Miller
no flags
Keith Miller
Comment 1 2016-10-13 11:00:00 PDT
Created attachment 291495 [details] Working but looking into lea failures
Keith Miller
Comment 2 2016-10-13 11:12:12 PDT
For some reason the test case in the patch produces the following disassembly on x86_64: 0x21d1cec01000: push %rbp 0x21d1cec01001: mov %rsp, %rbp 0x21d1cec01004: cmp $0x0, %edi 0x21d1cec01007: ja 0x21d1cec0100f 0x21d1cec0100d: pop %rbp 0x21d1cec0100e: ret 0x21d1cec0100f: xor %rax, %rax 0x21d1cec01012: mov %eax, %ecx 0x21d1cec01014: shl $0x2, %ecx 0x21d1cec01017: add %rsi, %rcx 0x21d1cec0101a: mov $0x2a, (%rcx) 0x21d1cec01020: inc %eax 0x21d1cec01022: cmp %edi, %eax 0x21d1cec01024: jb 0x21d1cec01012 0x21d1cec0102a: jmp 0x21d1cec0100d I want to figure out why the add and mov are not being compressed into one instruction. It's also to bad that we can't fuse the shl, add, and mov since I expect that case to be common. Note, we can't fuse the shl with the others since it's 32-bit and the other values are 64-bit.
Filip Pizlo
Comment 3 2016-10-13 11:14:04 PDT
(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx > 0x21d1cec01017: add %rsi, %rcx > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit. Can you please JSC_dumpGraphAtEachPhase=true?
Filip Pizlo
Comment 4 2016-10-13 11:20:57 PDT
(In reply to comment #2) > For some reason the test case in the patch produces the following > disassembly on x86_64: > > 0x21d1cec01000: push %rbp > 0x21d1cec01001: mov %rsp, %rbp > 0x21d1cec01004: cmp $0x0, %edi > 0x21d1cec01007: ja 0x21d1cec0100f > 0x21d1cec0100d: pop %rbp > 0x21d1cec0100e: ret > 0x21d1cec0100f: xor %rax, %rax > 0x21d1cec01012: mov %eax, %ecx > 0x21d1cec01014: shl $0x2, %ecx This is a 32-bit left-shift-by-2. There is no lea for that. Lea is only for Add(Shl(value, const), value). A lone Shl is better off as a lone Shl. > 0x21d1cec01017: add %rsi, %rcx You're emitting this instruction yourself. This add instruction is there because of the code you added to B3LowerToAir. > 0x21d1cec0101a: mov $0x2a, (%rcx) > 0x21d1cec01020: inc %eax > 0x21d1cec01022: cmp %edi, %eax > 0x21d1cec01024: jb 0x21d1cec01012 > 0x21d1cec0102a: jmp 0x21d1cec0100d > > I want to figure out why the add and mov are not being compressed into one > instruction. > > It's also to bad that we can't fuse the shl, add, and mov since I expect > that case to be common. Note, we can't fuse the shl with the others since > it's 32-bit and the other values are 64-bit.
Keith Miller
Comment 5 2016-10-14 15:30:38 PDT
Filip Pizlo
Comment 6 2016-10-14 15:36:42 PDT
Comment on attachment 291672 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291672&action=review > Source/JavaScriptCore/b3/B3LowerToAir.cpp:521 > + // We would further pattern match on the opcode of the pointer and > + // try to construct a more complete addressing mode. But since any > + // math on the pointer would be 32-bit there is not much we can do. Remove this comment, since you already say specifically about the ARM thing. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:526 > + // FIXME: We should support ARM64 LDR 32-bit addressing, which will > + // allow us to fuse a Shl ptr, 2 into the address. Additionally, and > + // perhaps more importantly, it would allow us to avoid a truncating > + // move. File a bug and reference it here. > Source/JavaScriptCore/b3/B3LowerToAir.cpp:528 > + return Arg::index(Air::Tmp(wasmAddress->pinnedGPR()), tmp(pointer), 1, offset); Do you have to say Air:: here? > Source/JavaScriptCore/b3/testb3.cpp:13766 > +void testWasmAddress() > + { Indentation.
Keith Miller
Comment 7 2016-10-14 16:08:06 PDT
Created attachment 291677 [details] Patch for landing
Keith Miller
Comment 8 2016-10-14 16:24:48 PDT
Note You need to log in before you can comment on or make changes to this bug.