I already wrote the code. I'm not sure if there was a good reason to not do this when first introducing MultiGetByOffset. But if it's not a regression, we might as well take it.
Created attachment 330870 [details] patch
Attachment 330870 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:5689: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 330870 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=330870&action=review > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:5618 > + case MultiGetByOffset: { Oops, this needs to be up in the switch statement
Created attachment 330945 [details] patch
Comment on attachment 330945 [details] patch Attachment 330945 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/6027133 New failing tests: http/tests/misc/slow-loading-animated-image.html
Created attachment 330996 [details] Archive of layout-test-results from ews103 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-sierra Platform: Mac OS X 10.12.6
(In reply to Build Bot from comment #5) > Comment on attachment 330945 [details] > patch > > Attachment 330945 [details] did not pass mac-ews (mac): > Output: http://webkit-queues.webkit.org/results/6027133 > > New failing tests: > http/tests/misc/slow-loading-animated-image.html I seriously doubt this.
Created attachment 331097 [details] patch Tighten up the code we generate a bit
Comment on attachment 331097 [details] patch Attachment 331097 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/6039918 New failing tests: imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4-a-bitrate.html
Created attachment 331114 [details] Archive of layout-test-results from ews102 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews102 Port: mac-sierra Platform: Mac OS X 10.12.6
Comment on attachment 331097 [details] patch Will revise this patch. Fil reminded me that we rely on profiling GetById in the DFG when it's inlined to allow for each inlined site to have separate profiling data (polyvariant inlining). We should probably only lower to MultiGetByOffset if we're not inlined. Otherwise, it's probably important enough to get that per-inlined-site data. There's a chance this poly variant profiling doesn't actually matter in practice, and perhaps the FTL has enough data to filter down the MultiGetByOffset, but we might as well do the more conservative thing. And we can re-evaluate the more aggressive thing later.
Created attachment 331125 [details] patch
Comment on attachment 331125 [details] patch Attachment 331125 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/6042029 New failing tests: fast/history/page-cache-suspended-audiocontext.html
Created attachment 331142 [details] Archive of layout-test-results from ews113 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews113 Port: mac-sierra Platform: Mac OS X 10.12.6
(In reply to Build Bot from comment #13) > Comment on attachment 331125 [details] > patch > > Attachment 331125 [details] did not pass mac-debug-ews (mac): > Output: http://webkit-queues.webkit.org/results/6042029 > > New failing tests: > fast/history/page-cache-suspended-audiocontext.html This is crashing in some audio code. Seems really unlikely it's this change.
Created attachment 331542 [details] patch Let's retest EWS
Comment on attachment 331542 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=331542&action=review r=me with some comments. > Source/JavaScriptCore/dfg/DFGGraph.h:980 > + // We want to ensure we get polyvariant profiling data from the GetById. This allows > + // the same get_by_id inlined into two separate functions to get independent profiling data. It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug? > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371 > + m_jit.load32(JITCompiler::Address(baseGPR, JSCell::structureIDOffset()), resultGPR); > + JITCompiler::JumpList match; > + for (size_t i = 0; i < structures.size() - 1; ++i) { > + match.append( > + m_jit.branchWeakStructure(JITCompiler::Equal, resultGPR, structures[i])); > + } > + next = m_jit.branchWeakStructure(JITCompiler::NotEqual, resultGPR, structures.last()); > + match.link(&m_jit); Couldn't this use BinarySwitch?
Comment on attachment 331542 [details] patch Attachment 331542 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/6111320 New failing tests: imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4-av-audio-bitrate.html
Created attachment 331545 [details] Archive of layout-test-results from ews100 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-sierra Platform: Mac OS X 10.12.6
Comment on attachment 331542 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=331542&action=review >> Source/JavaScriptCore/dfg/DFGGraph.h:980 >> + // the same get_by_id inlined into two separate functions to get independent profiling data. > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug? At this point, I don't think it'd really be worth building a second way to profile input. If we really want this, we should just leave this code as is, since GetById is good at doing profiling for us. >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371 >> + match.link(&m_jit); > > Couldn't this use BinarySwitch? It could. I'm not sure if it's any better though. It'll mean increasing the code size by 2x. This is the same pattern CheckStructure uses.
(In reply to Saam Barati from comment #20) > Comment on attachment 331542 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331542&action=review > > >> Source/JavaScriptCore/dfg/DFGGraph.h:980 > >> + // the same get_by_id inlined into two separate functions to get independent profiling data. > > > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug? > > At this point, I don't think it'd really be worth building a second way to > profile input. If we really want this, we should just leave this code as is, > since GetById is good at doing profiling for us. > > >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371 > >> + match.link(&m_jit); > > > > Couldn't this use BinarySwitch? > > It could. I'm not sure if it's any better though. It's designed to always be better. It'll use a linear set of jumps if that's most profitable, otherwise it will bisect. > It'll mean increasing the > code size by 2x. Where is this number from? > This is the same pattern CheckStructure uses. I've been wanting to convert CheckStructure to BinarySwitch!
(In reply to Filip Pizlo from comment #21) > (In reply to Saam Barati from comment #20) > > Comment on attachment 331542 [details] > > patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=331542&action=review > > > > >> Source/JavaScriptCore/dfg/DFGGraph.h:980 > > >> + // the same get_by_id inlined into two separate functions to get independent profiling data. > > > > > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug? > > > > At this point, I don't think it'd really be worth building a second way to > > profile input. If we really want this, we should just leave this code as is, > > since GetById is good at doing profiling for us. > > > > >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371 > > >> + match.link(&m_jit); > > > > > > Couldn't this use BinarySwitch? > > > > It could. I'm not sure if it's any better though. > > It's designed to always be better. It'll use a linear set of jumps if > that's most profitable, otherwise it will bisect. > > > It'll mean increasing the > > code size by 2x. > > Where is this number from? 2x is too high. But 20% is probably accurate. The number comes from the code we have now jumps to the success part when the structure is a match. Where as binary switch, it'll does an inverted compare to jump to the next case if it isn't a match. Then, the matching code must emit a jump. Hence, we end up with 1/3rd more instructions. So we go from this: 0x415f4a800260: mov (%rax), %esi 0x415f4a800262: cmp $0x120, %esi 0x415f4a800268: jz 0x415f4a80027a 0x415f4a80026e: cmp $0x123, %esi 0x415f4a800274: jnz 0x415f4a800283 0x415f4a80027a: mov 0x28(%rax), %rsi 0x415f4a80027e: jmp 0x415f4a8002ab 0x415f4a800283: mov (%rax), %esi 0x415f4a800285: cmp $0x11e, %esi 0x415f4a80028b: jz 0x415f4a80029d 0x415f4a800291: cmp $0x11c, %esi 0x415f4a800297: jnz 0x415f4a8002a6 0x415f4a80029d: mov 0x20(%rax), %rsi 0x415f4a8002a1: jmp 0x415f4a8002ab 0x415f4a8002a6: jmp 0x415f4a80043c To this, with a BinarySwitch that uses linear jumps: 0x377193a03380: mov (%rax), %esi 0x377193a03382: cmp $0x123, %esi 0x377193a03388: jnz 0x377193a03393 0x377193a0338e: jmp 0x377193a033a4 0x377193a03393: cmp $0x120, %esi 0x377193a03399: jnz 0x377193a033ad 0x377193a0339f: jmp 0x377193a033a4 0x377193a033a4: mov 0x28(%rax), %rsi 0x377193a033a8: jmp 0x377193a033df 0x377193a033ad: mov (%rax), %esi 0x377193a033af: cmp $0x11c, %esi 0x377193a033b5: jnz 0x377193a033c0 0x377193a033bb: jmp 0x377193a033d1 0x377193a033c0: cmp $0x11e, %esi 0x377193a033c6: jnz 0x377193a033da 0x377193a033cc: jmp 0x377193a033d1 0x377193a033d1: mov 0x20(%rax), %rsi 0x377193a033d5: jmp 0x377193a033df 0x377193a033da: jmp 0x377193a03570 Obviously we could make BinarySwitch be better in this situation. But that require some refactoring. > > > This is the same pattern CheckStructure uses. > > I've been wanting to convert CheckStructure to BinarySwitch!
(In reply to Saam Barati from comment #22) > (In reply to Filip Pizlo from comment #21) > > (In reply to Saam Barati from comment #20) > > > Comment on attachment 331542 [details] > > > patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=331542&action=review > > > > > > >> Source/JavaScriptCore/dfg/DFGGraph.h:980 > > > >> + // the same get_by_id inlined into two separate functions to get independent profiling data. > > > > > > > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug? > > > > > > At this point, I don't think it'd really be worth building a second way to > > > profile input. If we really want this, we should just leave this code as is, > > > since GetById is good at doing profiling for us. > > > > > > >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371 > > > >> + match.link(&m_jit); > > > > > > > > Couldn't this use BinarySwitch? > > > > > > It could. I'm not sure if it's any better though. > > > > It's designed to always be better. It'll use a linear set of jumps if > > that's most profitable, otherwise it will bisect. > > > > > It'll mean increasing the > > > code size by 2x. > > > > Where is this number from? > 2x is too high. But 20% is probably accurate. The number comes from the code > we have now jumps to the success part when the structure is a match. Where > as binary switch, it'll does an inverted compare to jump to the next case if > it isn't a match. Then, the matching code must emit a jump. Hence, we end up > with 1/3rd more instructions. > > So we go from this: > 0x415f4a800260: mov (%rax), %esi > 0x415f4a800262: cmp $0x120, %esi > 0x415f4a800268: jz 0x415f4a80027a > 0x415f4a80026e: cmp $0x123, %esi > 0x415f4a800274: jnz 0x415f4a800283 > 0x415f4a80027a: mov 0x28(%rax), %rsi > 0x415f4a80027e: jmp 0x415f4a8002ab > 0x415f4a800283: mov (%rax), %esi > 0x415f4a800285: cmp $0x11e, %esi > 0x415f4a80028b: jz 0x415f4a80029d > 0x415f4a800291: cmp $0x11c, %esi > 0x415f4a800297: jnz 0x415f4a8002a6 > 0x415f4a80029d: mov 0x20(%rax), %rsi > 0x415f4a8002a1: jmp 0x415f4a8002ab > 0x415f4a8002a6: jmp 0x415f4a80043c > > To this, with a BinarySwitch that uses linear jumps: > 0x377193a03380: mov (%rax), %esi > 0x377193a03382: cmp $0x123, %esi > 0x377193a03388: jnz 0x377193a03393 > 0x377193a0338e: jmp 0x377193a033a4 > 0x377193a03393: cmp $0x120, %esi > 0x377193a03399: jnz 0x377193a033ad > 0x377193a0339f: jmp 0x377193a033a4 > 0x377193a033a4: mov 0x28(%rax), %rsi > 0x377193a033a8: jmp 0x377193a033df > 0x377193a033ad: mov (%rax), %esi > 0x377193a033af: cmp $0x11c, %esi > 0x377193a033b5: jnz 0x377193a033c0 > 0x377193a033bb: jmp 0x377193a033d1 > 0x377193a033c0: cmp $0x11e, %esi > 0x377193a033c6: jnz 0x377193a033da > 0x377193a033cc: jmp 0x377193a033d1 > 0x377193a033d1: mov 0x20(%rax), %rsi > 0x377193a033d5: jmp 0x377193a033df > 0x377193a033da: jmp 0x377193a03570 > > Obviously we could make BinarySwitch be better in this situation. But that > require some refactoring. Here is a comparison when binary switch doesn't do linear search. my way: 0x4e2017603340: mov (%rax), %esi 0x4e2017603342: cmp $0x120, %esi 0x4e2017603348: jz 0x4e2017603372 0x4e201760334e: cmp $0x122, %esi 0x4e2017603354: jz 0x4e2017603372 0x4e201760335a: cmp $0x11c, %esi 0x4e2017603360: jz 0x4e2017603372 0x4e2017603366: cmp $0x11e, %esi 0x4e201760336c: jnz 0x4e201760337b 0x4e2017603372: mov 0x20(%rax), %rsi 0x4e2017603376: jmp 0x4e20176033bb 0x4e201760337b: mov (%rax), %esi 0x4e201760337d: cmp $0x124, %esi 0x4e2017603383: jz 0x4e20176033ad 0x4e2017603389: cmp $0x127, %esi 0x4e201760338f: jz 0x4e20176033ad 0x4e2017603395: cmp $0x12a, %esi 0x4e201760339b: jz 0x4e20176033ad 0x4e20176033a1: cmp $0x12d, %esi 0x4e20176033a7: jnz 0x4e20176033b6 0x4e20176033ad: mov 0x28(%rax), %rsi 0x4e20176033b1: jmp 0x4e20176033bb 0x4e20176033b6: jmp 0x4e201760354c vs binary switch: 0x4e952de001c0: mov (%rax), %esi 0x4e952de001c2: cmp $0x120, %esi 0x4e952de001c8: jl 0x4e952de001f0 0x4e952de001ce: cmp $0x122, %esi 0x4e952de001d4: jnz 0x4e952de001df 0x4e952de001da: jmp 0x4e952de00212 0x4e952de001df: cmp $0x120, %esi 0x4e952de001e5: jnz 0x4e952de0021b 0x4e952de001eb: jmp 0x4e952de00212 0x4e952de001f0: cmp $0x11c, %esi 0x4e952de001f6: jnz 0x4e952de00201 0x4e952de001fc: jmp 0x4e952de00212 0x4e952de00201: cmp $0x11e, %esi 0x4e952de00207: jnz 0x4e952de0021b 0x4e952de0020d: jmp 0x4e952de00212 0x4e952de00212: mov 0x20(%rax), %rsi 0x4e952de00216: jmp 0x4e952de0027b 0x4e952de0021b: mov (%rax), %esi 0x4e952de0021d: cmp $0x12a, %esi 0x4e952de00223: jl 0x4e952de0024b 0x4e952de00229: cmp $0x12a, %esi 0x4e952de0022f: jnz 0x4e952de0023a 0x4e952de00235: jmp 0x4e952de0026d 0x4e952de0023a: cmp $0x12d, %esi 0x4e952de00240: jnz 0x4e952de00276 0x4e952de00246: jmp 0x4e952de0026d 0x4e952de0024b: cmp $0x124, %esi 0x4e952de00251: jnz 0x4e952de0025c 0x4e952de00257: jmp 0x4e952de0026d 0x4e952de0025c: cmp $0x127, %esi 0x4e952de00262: jnz 0x4e952de00276 0x4e952de00268: jmp 0x4e952de0026d 0x4e952de0026d: mov 0x28(%rax), %rsi 0x4e952de00271: jmp 0x4e952de0027b 0x4e952de00276: jmp 0x4e952de0040c > > > > > > This is the same pattern CheckStructure uses. > > > > I've been wanting to convert CheckStructure to BinarySwitch!
I think what we really want is for BinarySwitch to have a notion of all cases (or a subset of cases) all emitting the same code. But, doing this sounds like a different patch.
(In reply to Build Bot from comment #18) > Comment on attachment 331542 [details] > patch > > Attachment 331542 [details] did not pass mac-ews (mac): > Output: http://webkit-queues.webkit.org/results/6111320 > > New failing tests: > imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4- > av-audio-bitrate.html I ran this 100 times locally. I can't reproduce.
Comment on attachment 331542 [details] patch Clearing flags on attachment: 331542 Committed r227096: <https://trac.webkit.org/changeset/227096>
All reviewed patches have been landed. Closing bug.
<rdar://problem/36600812>
It seems that this introduces 15% regression in Octane. https://arewefastyet.com/#machine=29&view=single&suite=octane&subtest=DeltaBlue
(In reply to Yusuke Suzuki from comment #29) > It seems that this introduces 15% regression in Octane. > https://arewefastyet.com/ > #machine=29&view=single&suite=octane&subtest=DeltaBlue And it causes 34.3% overhead in Octane/Raytrace.
(In reply to Yusuke Suzuki from comment #30) > (In reply to Yusuke Suzuki from comment #29) > > It seems that this introduces 15% regression in Octane. > > https://arewefastyet.com/ > > #machine=29&view=single&suite=octane&subtest=DeltaBlue > > And it causes 34.3% overhead in Octane/Raytrace. Crazy. Probably broke allocation sinking somehow. I'll roll it out for now.
Re-opened since this is blocked by bug 181788