Bug 181466 - Support MultiGetByOffset in the DFG
Summary: Support MultiGetByOffset in the DFG
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Saam Barati
URL:
Keywords: InRadar
Depends on: 181788
Blocks:
  Show dependency treegraph
 
Reported: 2018-01-09 18:22 PST by Saam Barati
Modified: 2018-01-18 00:25 PST (History)
15 users (show)

See Also:


Attachments
patch (7.96 KB, patch)
2018-01-09 19:13 PST, Saam Barati
no flags Details | Formatted Diff | Diff
patch (7.97 KB, patch)
2018-01-10 12:05 PST, Saam Barati
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews103 for mac-sierra (2.26 MB, application/zip)
2018-01-10 17:24 PST, EWS Watchlist
no flags Details
patch (8.49 KB, patch)
2018-01-11 12:39 PST, Saam Barati
saam: review-
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews102 for mac-sierra (2.38 MB, application/zip)
2018-01-11 14:09 PST, EWS Watchlist
no flags Details
patch (8.89 KB, patch)
2018-01-11 15:19 PST, Saam Barati
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews113 for mac-sierra (2.94 MB, application/zip)
2018-01-11 17:02 PST, EWS Watchlist
no flags Details
patch (8.88 KB, patch)
2018-01-17 14:21 PST, Saam Barati
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-sierra (2.20 MB, application/zip)
2018-01-17 15:14 PST, EWS Watchlist
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Saam Barati 2018-01-09 18:22:10 PST
I already wrote the code. I'm not sure if there was a good reason to not do this when first introducing MultiGetByOffset. But if it's not a regression, we might as well take it.
Comment 1 Saam Barati 2018-01-09 19:13:50 PST
Created attachment 330870 [details]
patch
Comment 2 EWS Watchlist 2018-01-09 19:15:58 PST
Attachment 330870 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:5689:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Saam Barati 2018-01-10 11:59:04 PST
Comment on attachment 330870 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330870&action=review

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:5618
> +    case MultiGetByOffset: {

Oops, this needs to be up in the switch statement
Comment 4 Saam Barati 2018-01-10 12:05:08 PST
Created attachment 330945 [details]
patch
Comment 5 EWS Watchlist 2018-01-10 17:24:17 PST
Comment on attachment 330945 [details]
patch

Attachment 330945 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/6027133

New failing tests:
http/tests/misc/slow-loading-animated-image.html
Comment 6 EWS Watchlist 2018-01-10 17:24:19 PST
Created attachment 330996 [details]
Archive of layout-test-results from ews103 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 7 Saam Barati 2018-01-10 18:13:38 PST
(In reply to Build Bot from comment #5)
> Comment on attachment 330945 [details]
> patch
> 
> Attachment 330945 [details] did not pass mac-ews (mac):
> Output: http://webkit-queues.webkit.org/results/6027133
> 
> New failing tests:
> http/tests/misc/slow-loading-animated-image.html

I seriously doubt this.
Comment 8 Saam Barati 2018-01-11 12:39:27 PST
Created attachment 331097 [details]
patch

Tighten up the code we generate a bit
Comment 9 EWS Watchlist 2018-01-11 14:09:45 PST
Comment on attachment 331097 [details]
patch

Attachment 331097 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/6039918

New failing tests:
imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4-a-bitrate.html
Comment 10 EWS Watchlist 2018-01-11 14:09:47 PST
Created attachment 331114 [details]
Archive of layout-test-results from ews102 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 11 Saam Barati 2018-01-11 14:57:51 PST
Comment on attachment 331097 [details]
patch

Will revise this patch. Fil reminded me that we rely on profiling GetById in the DFG when it's inlined to allow for each inlined site to have separate profiling data (polyvariant inlining). We should probably only lower to MultiGetByOffset if we're not inlined. Otherwise, it's probably important enough to get that per-inlined-site data.

There's a chance this poly variant profiling doesn't actually matter in practice, and perhaps the FTL has enough data to filter down the MultiGetByOffset, but we might as well do the more conservative thing. And we can re-evaluate the more aggressive thing later.
Comment 12 Saam Barati 2018-01-11 15:19:24 PST
Created attachment 331125 [details]
patch
Comment 13 EWS Watchlist 2018-01-11 17:02:12 PST
Comment on attachment 331125 [details]
patch

Attachment 331125 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/6042029

New failing tests:
fast/history/page-cache-suspended-audiocontext.html
Comment 14 EWS Watchlist 2018-01-11 17:02:14 PST
Created attachment 331142 [details]
Archive of layout-test-results from ews113 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 15 Saam Barati 2018-01-11 17:22:30 PST
(In reply to Build Bot from comment #13)
> Comment on attachment 331125 [details]
> patch
> 
> Attachment 331125 [details] did not pass mac-debug-ews (mac):
> Output: http://webkit-queues.webkit.org/results/6042029
> 
> New failing tests:
> fast/history/page-cache-suspended-audiocontext.html

This is crashing in some audio code. Seems really unlikely it's this change.
Comment 16 Saam Barati 2018-01-17 14:21:09 PST
Created attachment 331542 [details]
patch

Let's retest EWS
Comment 17 Keith Miller 2018-01-17 14:37:45 PST
Comment on attachment 331542 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=331542&action=review

r=me with some comments.

> Source/JavaScriptCore/dfg/DFGGraph.h:980
> +        // We want to ensure we get polyvariant profiling data from the GetById. This allows
> +        // the same get_by_id inlined into two separate functions to get independent profiling data.

It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug?

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371
> +                m_jit.load32(JITCompiler::Address(baseGPR, JSCell::structureIDOffset()), resultGPR);
> +                JITCompiler::JumpList match;
> +                for (size_t i = 0; i < structures.size() - 1; ++i) {
> +                    match.append(
> +                        m_jit.branchWeakStructure(JITCompiler::Equal, resultGPR, structures[i]));
> +                }
> +                next = m_jit.branchWeakStructure(JITCompiler::NotEqual, resultGPR, structures.last());
> +                match.link(&m_jit);

Couldn't this use BinarySwitch?
Comment 18 EWS Watchlist 2018-01-17 15:14:13 PST
Comment on attachment 331542 [details]
patch

Attachment 331542 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/6111320

New failing tests:
imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4-av-audio-bitrate.html
Comment 19 EWS Watchlist 2018-01-17 15:14:15 PST
Created attachment 331545 [details]
Archive of layout-test-results from ews100 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 20 Saam Barati 2018-01-17 15:26:55 PST
Comment on attachment 331542 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=331542&action=review

>> Source/JavaScriptCore/dfg/DFGGraph.h:980
>> +        // the same get_by_id inlined into two separate functions to get independent profiling data.
> 
> It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug?

At this point, I don't think it'd really be worth building a second way to profile input. If we really want this, we should just leave this code as is, since GetById is good at doing profiling for us.

>> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371
>> +                match.link(&m_jit);
> 
> Couldn't this use BinarySwitch?

It could. I'm not sure if it's any better though. It'll mean increasing the code size by 2x. This is the same pattern CheckStructure uses.
Comment 21 Filip Pizlo 2018-01-17 15:29:21 PST
(In reply to Saam Barati from comment #20)
> Comment on attachment 331542 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=331542&action=review
> 
> >> Source/JavaScriptCore/dfg/DFGGraph.h:980
> >> +        // the same get_by_id inlined into two separate functions to get independent profiling data.
> > 
> > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug?
> 
> At this point, I don't think it'd really be worth building a second way to
> profile input. If we really want this, we should just leave this code as is,
> since GetById is good at doing profiling for us.
> 
> >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371
> >> +                match.link(&m_jit);
> > 
> > Couldn't this use BinarySwitch?
> 
> It could. I'm not sure if it's any better though. 

It's designed to always be better.  It'll use a linear set of jumps if that's most profitable, otherwise it will bisect.

> It'll mean increasing the
> code size by 2x.

Where is this number from?

> This is the same pattern CheckStructure uses.

I've been wanting to convert CheckStructure to BinarySwitch!
Comment 22 Saam Barati 2018-01-17 15:54:49 PST
(In reply to Filip Pizlo from comment #21)
> (In reply to Saam Barati from comment #20)
> > Comment on attachment 331542 [details]
> > patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=331542&action=review
> > 
> > >> Source/JavaScriptCore/dfg/DFGGraph.h:980
> > >> +        // the same get_by_id inlined into two separate functions to get independent profiling data.
> > > 
> > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug?
> > 
> > At this point, I don't think it'd really be worth building a second way to
> > profile input. If we really want this, we should just leave this code as is,
> > since GetById is good at doing profiling for us.
> > 
> > >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371
> > >> +                match.link(&m_jit);
> > > 
> > > Couldn't this use BinarySwitch?
> > 
> > It could. I'm not sure if it's any better though. 
> 
> It's designed to always be better.  It'll use a linear set of jumps if
> that's most profitable, otherwise it will bisect.
> 
> > It'll mean increasing the
> > code size by 2x.
> 
> Where is this number from?
2x is too high. But 20% is probably accurate. The number comes from the code we have now jumps to the success part when the structure is a match. Where as binary switch, it'll does an inverted compare to jump to the next case if it isn't a match. Then, the matching code must emit a jump. Hence, we end up with 1/3rd more instructions.

So we go from this:
          0x415f4a800260: mov (%rax), %esi
          0x415f4a800262: cmp $0x120, %esi
          0x415f4a800268: jz 0x415f4a80027a
          0x415f4a80026e: cmp $0x123, %esi
          0x415f4a800274: jnz 0x415f4a800283
          0x415f4a80027a: mov 0x28(%rax), %rsi
          0x415f4a80027e: jmp 0x415f4a8002ab
          0x415f4a800283: mov (%rax), %esi
          0x415f4a800285: cmp $0x11e, %esi
          0x415f4a80028b: jz 0x415f4a80029d
          0x415f4a800291: cmp $0x11c, %esi
          0x415f4a800297: jnz 0x415f4a8002a6
          0x415f4a80029d: mov 0x20(%rax), %rsi
          0x415f4a8002a1: jmp 0x415f4a8002ab
          0x415f4a8002a6: jmp 0x415f4a80043c

To this, with a BinarySwitch that uses linear jumps:
          0x377193a03380: mov (%rax), %esi
          0x377193a03382: cmp $0x123, %esi
          0x377193a03388: jnz 0x377193a03393
          0x377193a0338e: jmp 0x377193a033a4
          0x377193a03393: cmp $0x120, %esi
          0x377193a03399: jnz 0x377193a033ad
          0x377193a0339f: jmp 0x377193a033a4
          0x377193a033a4: mov 0x28(%rax), %rsi
          0x377193a033a8: jmp 0x377193a033df
          0x377193a033ad: mov (%rax), %esi
          0x377193a033af: cmp $0x11c, %esi
          0x377193a033b5: jnz 0x377193a033c0
          0x377193a033bb: jmp 0x377193a033d1
          0x377193a033c0: cmp $0x11e, %esi
          0x377193a033c6: jnz 0x377193a033da
          0x377193a033cc: jmp 0x377193a033d1
          0x377193a033d1: mov 0x20(%rax), %rsi
          0x377193a033d5: jmp 0x377193a033df
          0x377193a033da: jmp 0x377193a03570

Obviously we could make BinarySwitch be better in this situation. But that require some refactoring.

> 
> > This is the same pattern CheckStructure uses.
> 
> I've been wanting to convert CheckStructure to BinarySwitch!
Comment 23 Saam Barati 2018-01-17 15:58:30 PST
(In reply to Saam Barati from comment #22)
> (In reply to Filip Pizlo from comment #21)
> > (In reply to Saam Barati from comment #20)
> > > Comment on attachment 331542 [details]
> > > patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=331542&action=review
> > > 
> > > >> Source/JavaScriptCore/dfg/DFGGraph.h:980
> > > >> +        // the same get_by_id inlined into two separate functions to get independent profiling data.
> > > > 
> > > > It seems like we could still profile in the MultiGetByOffset when the the origin is from an inlined source. Maybe file a bug?
> > > 
> > > At this point, I don't think it'd really be worth building a second way to
> > > profile input. If we really want this, we should just leave this code as is,
> > > since GetById is good at doing profiling for us.
> > > 
> > > >> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4371
> > > >> +                match.link(&m_jit);
> > > > 
> > > > Couldn't this use BinarySwitch?
> > > 
> > > It could. I'm not sure if it's any better though. 
> > 
> > It's designed to always be better.  It'll use a linear set of jumps if
> > that's most profitable, otherwise it will bisect.
> > 
> > > It'll mean increasing the
> > > code size by 2x.
> > 
> > Where is this number from?
> 2x is too high. But 20% is probably accurate. The number comes from the code
> we have now jumps to the success part when the structure is a match. Where
> as binary switch, it'll does an inverted compare to jump to the next case if
> it isn't a match. Then, the matching code must emit a jump. Hence, we end up
> with 1/3rd more instructions.
> 
> So we go from this:
>           0x415f4a800260: mov (%rax), %esi
>           0x415f4a800262: cmp $0x120, %esi
>           0x415f4a800268: jz 0x415f4a80027a
>           0x415f4a80026e: cmp $0x123, %esi
>           0x415f4a800274: jnz 0x415f4a800283
>           0x415f4a80027a: mov 0x28(%rax), %rsi
>           0x415f4a80027e: jmp 0x415f4a8002ab
>           0x415f4a800283: mov (%rax), %esi
>           0x415f4a800285: cmp $0x11e, %esi
>           0x415f4a80028b: jz 0x415f4a80029d
>           0x415f4a800291: cmp $0x11c, %esi
>           0x415f4a800297: jnz 0x415f4a8002a6
>           0x415f4a80029d: mov 0x20(%rax), %rsi
>           0x415f4a8002a1: jmp 0x415f4a8002ab
>           0x415f4a8002a6: jmp 0x415f4a80043c
> 
> To this, with a BinarySwitch that uses linear jumps:
>           0x377193a03380: mov (%rax), %esi
>           0x377193a03382: cmp $0x123, %esi
>           0x377193a03388: jnz 0x377193a03393
>           0x377193a0338e: jmp 0x377193a033a4
>           0x377193a03393: cmp $0x120, %esi
>           0x377193a03399: jnz 0x377193a033ad
>           0x377193a0339f: jmp 0x377193a033a4
>           0x377193a033a4: mov 0x28(%rax), %rsi
>           0x377193a033a8: jmp 0x377193a033df
>           0x377193a033ad: mov (%rax), %esi
>           0x377193a033af: cmp $0x11c, %esi
>           0x377193a033b5: jnz 0x377193a033c0
>           0x377193a033bb: jmp 0x377193a033d1
>           0x377193a033c0: cmp $0x11e, %esi
>           0x377193a033c6: jnz 0x377193a033da
>           0x377193a033cc: jmp 0x377193a033d1
>           0x377193a033d1: mov 0x20(%rax), %rsi
>           0x377193a033d5: jmp 0x377193a033df
>           0x377193a033da: jmp 0x377193a03570
> 
> Obviously we could make BinarySwitch be better in this situation. But that
> require some refactoring.

Here is a comparison when binary switch doesn't do linear search.

my way:
          0x4e2017603340: mov (%rax), %esi
          0x4e2017603342: cmp $0x120, %esi
          0x4e2017603348: jz 0x4e2017603372
          0x4e201760334e: cmp $0x122, %esi
          0x4e2017603354: jz 0x4e2017603372
          0x4e201760335a: cmp $0x11c, %esi
          0x4e2017603360: jz 0x4e2017603372
          0x4e2017603366: cmp $0x11e, %esi
          0x4e201760336c: jnz 0x4e201760337b
          0x4e2017603372: mov 0x20(%rax), %rsi
          0x4e2017603376: jmp 0x4e20176033bb
          0x4e201760337b: mov (%rax), %esi
          0x4e201760337d: cmp $0x124, %esi
          0x4e2017603383: jz 0x4e20176033ad
          0x4e2017603389: cmp $0x127, %esi
          0x4e201760338f: jz 0x4e20176033ad
          0x4e2017603395: cmp $0x12a, %esi
          0x4e201760339b: jz 0x4e20176033ad
          0x4e20176033a1: cmp $0x12d, %esi
          0x4e20176033a7: jnz 0x4e20176033b6
          0x4e20176033ad: mov 0x28(%rax), %rsi
          0x4e20176033b1: jmp 0x4e20176033bb
          0x4e20176033b6: jmp 0x4e201760354c

vs binary switch:
          0x4e952de001c0: mov (%rax), %esi
          0x4e952de001c2: cmp $0x120, %esi
          0x4e952de001c8: jl 0x4e952de001f0
          0x4e952de001ce: cmp $0x122, %esi
          0x4e952de001d4: jnz 0x4e952de001df
          0x4e952de001da: jmp 0x4e952de00212
          0x4e952de001df: cmp $0x120, %esi
          0x4e952de001e5: jnz 0x4e952de0021b
          0x4e952de001eb: jmp 0x4e952de00212
          0x4e952de001f0: cmp $0x11c, %esi
          0x4e952de001f6: jnz 0x4e952de00201
          0x4e952de001fc: jmp 0x4e952de00212
          0x4e952de00201: cmp $0x11e, %esi
          0x4e952de00207: jnz 0x4e952de0021b
          0x4e952de0020d: jmp 0x4e952de00212
          0x4e952de00212: mov 0x20(%rax), %rsi
          0x4e952de00216: jmp 0x4e952de0027b
          0x4e952de0021b: mov (%rax), %esi
          0x4e952de0021d: cmp $0x12a, %esi
          0x4e952de00223: jl 0x4e952de0024b
          0x4e952de00229: cmp $0x12a, %esi
          0x4e952de0022f: jnz 0x4e952de0023a
          0x4e952de00235: jmp 0x4e952de0026d
          0x4e952de0023a: cmp $0x12d, %esi
          0x4e952de00240: jnz 0x4e952de00276
          0x4e952de00246: jmp 0x4e952de0026d
          0x4e952de0024b: cmp $0x124, %esi
          0x4e952de00251: jnz 0x4e952de0025c
          0x4e952de00257: jmp 0x4e952de0026d
          0x4e952de0025c: cmp $0x127, %esi
          0x4e952de00262: jnz 0x4e952de00276
          0x4e952de00268: jmp 0x4e952de0026d
          0x4e952de0026d: mov 0x28(%rax), %rsi
          0x4e952de00271: jmp 0x4e952de0027b
          0x4e952de00276: jmp 0x4e952de0040c

> 
> > 
> > > This is the same pattern CheckStructure uses.
> > 
> > I've been wanting to convert CheckStructure to BinarySwitch!
Comment 24 Saam Barati 2018-01-17 16:13:06 PST
I think what we really want is for BinarySwitch to have a notion of all cases (or a subset of cases) all emitting the same code. But, doing this sounds like a different patch.
Comment 25 Saam Barati 2018-01-17 16:30:20 PST
(In reply to Build Bot from comment #18)
> Comment on attachment 331542 [details]
> patch
> 
> Attachment 331542 [details] did not pass mac-ews (mac):
> Output: http://webkit-queues.webkit.org/results/6111320
> 
> New failing tests:
> imported/w3c/web-platform-tests/media-source/mediasource-config-change-mp4-
> av-audio-bitrate.html

I ran this 100 times locally. I can't reproduce.
Comment 26 WebKit Commit Bot 2018-01-17 16:34:21 PST
Comment on attachment 331542 [details]
patch

Clearing flags on attachment: 331542

Committed r227096: <https://trac.webkit.org/changeset/227096>
Comment 27 WebKit Commit Bot 2018-01-17 16:34:23 PST
All reviewed patches have been landed.  Closing bug.
Comment 28 Radar WebKit Bug Importer 2018-01-17 16:35:40 PST
<rdar://problem/36600812>
Comment 29 Yusuke Suzuki 2018-01-17 20:26:11 PST
It seems that this introduces 15% regression in Octane.
https://arewefastyet.com/#machine=29&view=single&suite=octane&subtest=DeltaBlue
Comment 30 Yusuke Suzuki 2018-01-17 20:29:54 PST
(In reply to Yusuke Suzuki from comment #29)
> It seems that this introduces 15% regression in Octane.
> https://arewefastyet.com/
> #machine=29&view=single&suite=octane&subtest=DeltaBlue

And it causes 34.3% overhead in Octane/Raytrace.
Comment 31 Saam Barati 2018-01-18 00:24:17 PST
(In reply to Yusuke Suzuki from comment #30)
> (In reply to Yusuke Suzuki from comment #29)
> > It seems that this introduces 15% regression in Octane.
> > https://arewefastyet.com/
> > #machine=29&view=single&suite=octane&subtest=DeltaBlue
> 
> And it causes 34.3% overhead in Octane/Raytrace.

Crazy. Probably broke allocation sinking somehow.
I'll roll it out for now.
Comment 32 WebKit Commit Bot 2018-01-18 00:25:00 PST
Re-opened since this is blocked by bug 181788