Bug 167725 - When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock we can skip compiling a lot of the program
Summary: When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock w...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Saam Barati
URL:
Keywords: InRadar
Depends on:
Blocks: 167856
  Show dependency treegraph
 
Reported: 2017-02-02 00:42 PST by Saam Barati
Modified: 2017-02-05 11:04 PST (History)
12 users (show)

See Also:


Attachments
WIP (10.00 KB, patch)
2017-02-02 01:17 PST, Saam Barati
no flags Details | Formatted Diff | Diff
WIP (13.67 KB, patch)
2017-02-02 14:21 PST, Saam Barati
no flags Details | Formatted Diff | Diff
patch (16.18 KB, patch)
2017-02-02 16:11 PST, Saam Barati
no flags Details | Formatted Diff | Diff
patch (16.22 KB, patch)
2017-02-02 16:13 PST, Saam Barati
msaboff: review+
saam: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Saam Barati 2017-02-02 00:42:11 PST
For example, consider we're in a loop inside a ProgramCodeBlock and want to run in the baseline JIT, but we've already executed a bunch of this program prior to the loop,
and just want to compile code for things only reachable from the loop. It'd be really nice to not have to compile the stuff we've already executed.

Kraken/ai-astar would really benefit from this treatment. It has 50k LOC that just initializes a bunch of data in an array. We spend like 50-100ms compiling code in the baseline
JIT that we will never execute again. So we should just not compile it.

However, this could be dangerous. We probably need some kind of story for when a ProgramCodeBlock is executed again. I'm not sure off the top of my head what that story is.
It seems unlikely that re-executing a ProgramCodeBlock is common, but I don't think it's impossible.
Comment 1 Saam Barati 2017-02-02 01:17:55 PST
Created attachment 300395 [details]
WIP

It runs kraken faster. But it's still certainly broken for programs that run more than once.
Comment 2 Oliver Hunt 2017-02-02 08:29:17 PST
Program code /should/ be run-once.

By definition it's top level global code, so cannot be addressed directly. Any re-use would be via cached bytecode (so it would have to tier all the way up to a separate copy of jit code anyway)
Comment 3 Saam Barati 2017-02-02 10:31:17 PST
(In reply to comment #2)
> Program code /should/ be run-once.
> 
> By definition it's top level global code, so cannot be addressed directly.
> Any re-use would be via cached bytecode (so it would have to tier all the
> way up to a separate copy of jit code anyway)

Is this true even for the ObjC API?
Comment 4 Saam Barati 2017-02-02 12:09:05 PST
(In reply to comment #2)
> Program code /should/ be run-once.
> 
> By definition it's top level global code, so cannot be addressed directly.
> Any re-use would be via cached bytecode (so it would have to tier all the
> way up to a separate copy of jit code anyway)

Ok, I think I've convinced myself this is true.
There is only one caller of:

`Interpreter::execute(ProgramExecutable*, ...)`

and that caller creates the ProgramExecutable*. Therefore, I think this optimization is sound. I may go further and just remove the Interpreter::execute(ProgramExecutaable*) API and just have the caller pass in the SourceCode/|this| it wants to execute to the Interpreter, and have the Interpreter create the ProgramExecutable. That way, we ensure the safety of this optimization going into the future via the API boundary to Interpreter.
Comment 5 Filip Pizlo 2017-02-02 12:09:53 PST
(In reply to comment #4)
> (In reply to comment #2)
> > Program code /should/ be run-once.
> > 
> > By definition it's top level global code, so cannot be addressed directly.
> > Any re-use would be via cached bytecode (so it would have to tier all the
> > way up to a separate copy of jit code anyway)
> 
> Ok, I think I've convinced myself this is true.
> There is only one caller of:
> 
> `Interpreter::execute(ProgramExecutable*, ...)`
> 
> and that caller creates the ProgramExecutable*. Therefore, I think this
> optimization is sound. I may go further and just remove the
> Interpreter::execute(ProgramExecutaable*) API and just have the caller pass
> in the SourceCode/|this| it wants to execute to the Interpreter, and have
> the Interpreter create the ProgramExecutable. That way, we ensure the safety
> of this optimization going into the future via the API boundary to
> Interpreter.

Nice.
Comment 6 Saam Barati 2017-02-02 14:21:09 PST
Created attachment 300447 [details]
WIP

I wanna see if EWS likes the patch.
Comment 7 Radar WebKit Bug Importer 2017-02-02 15:57:00 PST
<rdar://problem/30339082>
Comment 8 Saam Barati 2017-02-02 16:11:32 PST
Created attachment 300464 [details]
patch
Comment 9 Saam Barati 2017-02-02 16:13:23 PST
Created attachment 300465 [details]
patch

Fix bug name.
Comment 10 Geoffrey Garen 2017-02-02 21:13:57 PST
I think it might be neat to do this optimization more broadly. For example, it would be a shame to miss out on this optimization just because you used an IIFE or a module.

I think it's valid to create a throw-away CodeBlock that can't be re-entered whenever you want to: You just need to avoid installing it in the Executable. The net result could be just like any CodeBlock that has been jettisoned: You're still running on the stack, and that's fine, but you'll never be entered again.
Comment 11 Saam Barati 2017-02-02 21:29:27 PST
(In reply to comment #10)
> I think it might be neat to do this optimization more broadly. For example,
> it would be a shame to miss out on this optimization just because you used
> an IIFE or a module.
> 
> I think it's valid to create a throw-away CodeBlock that can't be re-entered
> whenever you want to: You just need to avoid installing it in the
> Executable. The net result could be just like any CodeBlock that has been
> jettisoned: You're still running on the stack, and that's fine, but you'll
> never be entered again.

I hadn't thought about creating an anonymous CodeBlock. I think that'd work. The only downside is it could lead to us compiling the same code more than once. 

I was also thinking we could plant a jump at the beginning of the code to generate the beginning of the function on first execution. This would allow us to install the code, but would require synchronous compilation of the beginning on first execution. 

Fil has also talked to me about moving the baseline more to a snippet architecture, where we just compile small chunks of bytecode.

My goal of this patch is to keep it small, and then file follow up bugs to do this work for more things, and perhaps in a more comprehensive way. A trivial follow up bug could be to make this work for Module code. I don't think IIFE would be trivial, but we could go with your suggestion of not installing the code for all functions. I still think that perhaps the best architecture could be to move to something like a snippet baseline JIT. This way, just chunks of code would be compiled. If we did this, I think we'd probably want to do some other things. We'd probably want OSR exit to LLInt code, and perhaps some other things. Maybe more value profiling in the LLInt.
Comment 12 Geoffrey Garen 2017-02-02 21:32:43 PST
> My goal of this patch is to keep it small, and then file follow up bugs to
> do this work for more things, and perhaps in a more comprehensive way. A
> trivial follow up bug could be to make this work for Module code. I don't
> think IIFE would be trivial, but we could go with your suggestion of not
> installing the code for all functions. I still think that perhaps the best
> architecture could be to move to something like a snippet baseline JIT. This
> way, just chunks of code would be compiled. If we did this, I think we'd
> probably want to do some other things. We'd probably want OSR exit to LLInt
> code, and perhaps some other things. Maybe more value profiling in the LLInt.

I like all of these suggestions. They're probably not mutually exclusive with the "don't install" trick. Maybe the "don't install" trick can be one step along the way.
Comment 13 Michael Saboff 2017-02-03 09:21:26 PST
Comment on attachment 300465 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300465&action=review

r=me
I like this.

Since you do reachability analysis, do you think that there is much to gain by only generating for reachable basic blocks?  Or do you think it is the case that most code is reachable beyond the entry bytecode offset.

> Source/JavaScriptCore/jit/JIT.cpp:198
> +        if (m_loopOSREntryBytecodeOffset >= 1000) {

Did you reach this offset empirically?  My gut should say that we would be profitable at a much lower offset.
Comment 14 Saam Barati 2017-02-03 10:05:46 PST
(In reply to comment #13)
> Comment on attachment 300465 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=300465&action=review
> 
> r=me
> I like this.
> 
> Since you do reachability analysis, do you think that there is much to gain
> by only generating for reachable basic blocks?  Or do you think it is the
> case that most code is reachable beyond the entry bytecode offset.
Let me try to take data on this. I'll measure the size difference between my simplified approach and if we found precise boundaries that we compiled.

> 
> > Source/JavaScriptCore/jit/JIT.cpp:198
> > +        if (m_loopOSREntryBytecodeOffset >= 1000) {
> 
> Did you reach this offset empirically?  My gut should say that we would be
> profitable at a much lower offset.
I did not reach this empirically. Maybe we should just remove this unless we find it's useful. Or we can make it lower, like 100 or something.
Comment 15 Saam Barati 2017-02-03 11:56:52 PST
(In reply to comment #14)
> (In reply to comment #13)
> > Comment on attachment 300465 [details]
> > patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=300465&action=review
> > 
> > r=me
> > I like this.
> > 
> > Since you do reachability analysis, do you think that there is much to gain
> > by only generating for reachable basic blocks?  Or do you think it is the
> > case that most code is reachable beyond the entry bytecode offset.
> Let me try to take data on this. I'll measure the size difference between my
> simplified approach and if we found precise boundaries that we compiled.
Ok, running all JSC tests, I logged data on the percent of the codeblock we end up compiling both using my current simplified approach, and if we had a precise approach. The difference was almost never above 1 percent. So I'll stick with the simplified approach.

> 
> > 
> > > Source/JavaScriptCore/jit/JIT.cpp:198
> > > +        if (m_loopOSREntryBytecodeOffset >= 1000) {
> > 
> > Did you reach this offset empirically?  My gut should say that we would be
> > profitable at a much lower offset.
> I did not reach this empirically. Maybe we should just remove this unless we
> find it's useful. Or we can make it lower, like 100 or something.
I'm going to go with 200 here. If there are regressions, I'll increase it.
Comment 16 Saam Barati 2017-02-03 12:02:39 PST
landed in:
https://trac.webkit.org/changeset/211642