Bug 150562 - Update FTL to support UntypedUse operands for op_sub.
Summary: Update FTL to support UntypedUse operands for op_sub.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on: 150687 150712
Blocks:
  Show dependency treegraph
 
Reported: 2015-10-26 09:57 PDT by Mark Lam
Modified: 2015-11-12 11:10 PST (History)
6 users (show)

See Also:


Attachments
the patch. (21.49 KB, patch)
2015-10-27 17:03 PDT, Mark Lam
ggaren: review+
ggaren: commit-queue-
Details | Formatted Diff | Diff
X86_64 benchmark result 1 (64.37 KB, text/plain)
2015-10-28 10:01 PDT, Mark Lam
no flags Details
X86_64 benchmark result 2 (63.92 KB, text/plain)
2015-10-28 10:02 PDT, Mark Lam
no flags Details
X86_64 benchmark result 3 (64.04 KB, text/plain)
2015-10-28 10:02 PDT, Mark Lam
no flags Details
Patch for landing: incorporated Geoff's feedback, and added a JSRegress benchmark. (26.04 KB, patch)
2015-10-28 11:25 PDT, Mark Lam
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2015-10-26 09:57:59 PDT
Patch coming.
Comment 1 Mark Lam 2015-10-26 15:09:22 PDT
The patch is almost ready, but I'm seeing failed test i.e. got a bug.  Will resolve that first before posting.
Comment 2 Mark Lam 2015-10-27 17:03:11 PDT
Created attachment 264176 [details]
the patch.
Comment 3 Mark Lam 2015-10-27 17:04:53 PDT
Comment on attachment 264176 [details]
the patch.

This patch has passed the JSC tests.  Still need to run benchmarks and layout tests.
Comment 4 Geoffrey Garen 2015-10-27 21:06:37 PDT
Comment on attachment 264176 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=264176&action=review

Code looks pretty good.

I think I see one bug.

> Source/JavaScriptCore/ftl/FTLCompile.cpp:308
> +class BinaryArithGenerationContext {

Let's call this BinarySnippetRegisterContext.

> Source/JavaScriptCore/ftl/FTLCompile.cpp:330
> +    void assignRegisters()

You can fold this function into the constructor.

> Source/JavaScriptCore/ftl/FTLCompile.cpp:345
> +#if CPU(X86_64)
> +        reservedRegisters.set(MacroAssembler::scratchRegister);
> +#elif CPU(ARM64)
> +        reservedRegisters.set(MacroAssembler::dataTempRegister);
> +        reservedRegisters.set(MacroAssembler::memoryTempRegister);
> +#endif

I think it would be nicer to give each MacroAssembler something like a static std::array of reserved registers, and have this code loop over that abstract array. That way, the definition of reserved registers lives with the assembler that reserves them. This could go in GPRInfo or MacroAssembler.

> Source/JavaScriptCore/ftl/FTLCompile.cpp:443
> +        auto numberOfBytesUsedToPreserveReusedRegisters =
> +            allocator.preserveReusedRegistersByPushing(fastPathJIT, ScratchRegisterAllocator::ExtraStackSpace::NoExtraSpace);
> +
> +        context.initializeRegisters(fastPathJIT);

This seems wrong. The code preserves all allocated scratch registers and then allocates new scratch registers. My understanding is that preservation is only correct if it happens after we have computed all the scratch registers we will use.

I think you need to initializeRegisters before preserveReused...
Comment 5 Mark Lam 2015-10-27 21:20:35 PDT
Comment on attachment 264176 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=264176&action=review

>> Source/JavaScriptCore/ftl/FTLCompile.cpp:308
>> +class BinaryArithGenerationContext {
> 
> Let's call this BinarySnippetRegisterContext.

OK, will do.

>> Source/JavaScriptCore/ftl/FTLCompile.cpp:330
>> +    void assignRegisters()
> 
> You can fold this function into the constructor.

Ok.  Will do.

>> Source/JavaScriptCore/ftl/FTLCompile.cpp:345
>> +#endif
> 
> I think it would be nicer to give each MacroAssembler something like a static std::array of reserved registers, and have this code loop over that abstract array. That way, the definition of reserved registers lives with the assembler that reserves them. This could go in GPRInfo or MacroAssembler.

OK, I'll look into it.

>> Source/JavaScriptCore/ftl/FTLCompile.cpp:443
>> +        context.initializeRegisters(fastPathJIT);
> 
> This seems wrong. The code preserves all allocated scratch registers and then allocates new scratch registers. My understanding is that preservation is only correct if it happens after we have computed all the scratch registers we will use.
> 
> I think you need to initializeRegisters before preserveReused...

initializeRegisters() does not allocate scratch registers.  It only initializes them.  assignRegister() above (which will be folded into the BinarySnippetRegisterContext) did all needed allocations, and waits till after we have preserved their original values before we start writing into the registers here.  Hence, there's no bug here.
Comment 6 Mark Lam 2015-10-28 10:01:44 PDT
Created attachment 264220 [details]
X86_64 benchmark result 1
Comment 7 Mark Lam 2015-10-28 10:02:04 PDT
Created attachment 264221 [details]
X86_64 benchmark result 2
Comment 8 Mark Lam 2015-10-28 10:02:47 PDT
Created attachment 264222 [details]
X86_64 benchmark result 3

Benchmark results show no significant change.
Comment 9 Geoffrey Garen 2015-10-28 10:13:32 PDT
Can you add a benchmark that shows a change?
Comment 10 Mark Lam 2015-10-28 11:25:41 PDT
Created attachment 264230 [details]
Patch for landing: incorporated Geoff's feedback, and added a JSRegress benchmark.
Comment 11 Mark Lam 2015-10-28 11:33:09 PDT
Here's the result of the newly added benchmark:

                           base                      new                                        

ftl-object-sub      379.0623+-19.9868    ^    236.6578+-3.0308        ^ definitely 1.6017x faster

<geometric>         379.0623+-19.9868    ^    236.6578+-3.0308        ^ definitely 1.6017x faster

The 1.6x speed up is deceiving.  Basically, this patch enables the FTL to compile functions that have ArithSub operating on UnusedType operands.  The implementation of this ArithSub in the FTL is effectively the same as the DFG i.e. they both use the JITSubGenerator.  In fact, the FTL is slightly slower because of the extra register setup and restoration work that it has too do to match the snippet's expectations.

The real benefit of this patch is that it allows the FTL to compile that function at all.  So, what I did in the benchmark was created a loop and some conditional logic that the FTL will optimize better than the DFG.  Hence, what the benchmark is actually measuring is the speed benefits of the FTL optimizing that other logic, not the ArithSub itself.  The difference is that with this patch, the FTL can now compile that function whereas before, the FTL cannot, and we had to run with the DFG version.

Will land the patch shortly.
Comment 12 Mark Lam 2015-10-28 11:38:09 PDT
Thanks for the review.  Landed in r191683: <http://trac.webkit.org/r191683>.
Comment 14 Mark Lam 2015-10-29 10:02:10 PDT
(In reply to comment #13)
> This looks like it regressed cdjs:
> https://build.webkit.org/builders/Apple%20Mavericks%2032-
> bit%20JSC%20%28BuildAndTest%29/builds/13753/steps/webkit-32bit-jsc-test/logs/
> stdio

I'm investigating, but I have not been able to reproduce this failure yet.  Not all the bots are able to reproduce it either.  Even the ones that do reproduce it does not do so consistently.  I'll continue investigating.
Comment 15 Geoffrey Garen 2015-10-29 11:19:21 PDT
If we can't resolve in the next day or two, we should roll out. We have too many regressions accumulating.
Comment 16 Mark Lam 2015-10-29 13:07:21 PDT
Re-opening to address issue documented in https://bugs.webkit.org/show_bug.cgi?id=150687.
Comment 17 Mark Lam 2015-10-29 17:17:54 PDT
Found the issue.  It was a dumb mistake.  In BinarySnippetRegisterContext, I shuffle registers to make room for the tag registers and other reserved registers that the snippet expects.  In BinarySnippetRegisterContext::restoreRegister(), I shuffled everything back ... except for the result register.

If the result was ever assigned a scratch (e.g. the incoming result register assigned by LLVM is one the tag registers), then I need to copy the result back from its assigned scratch.  Otherwise, we get the wrong result.

I have a fix but I want to think it through more carefully first.  Will upload a patch later tonight.
Comment 18 Mark Lam 2015-11-12 11:10:25 PST
With https://bugs.webkit.org/show_bug.cgi?id=150712, the remaining issues should be resolved.