167725 – When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock we can skip compiling a lot of the program

RESOLVED FIXED 167725

When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock we can skip compiling a lot of the program

https://bugs.webkit.org/show_bug.cgi?id=167725

Summary When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock w...

Saam Barati

Reported 2017-02-02 00:42:11 PST

For example, consider we're in a loop inside a ProgramCodeBlock and want to run in the baseline JIT, but we've already executed a bunch of this program prior to the loop, and just want to compile code for things only reachable from the loop. It'd be really nice to not have to compile the stuff we've already executed. Kraken/ai-astar would really benefit from this treatment. It has 50k LOC that just initializes a bunch of data in an array. We spend like 50-100ms compiling code in the baseline JIT that we will never execute again. So we should just not compile it. However, this could be dangerous. We probably need some kind of story for when a ProgramCodeBlock is executed again. I'm not sure off the top of my head what that story is. It seems unlikely that re-executing a ProgramCodeBlock is common, but I don't think it's impossible.

Attachments
WIP (10.00 KB, patch) 2017-02-02 01:17 PST, Saam Barati	no flags	Details Formatted Diff Diff
WIP (13.67 KB, patch) 2017-02-02 14:21 PST, Saam Barati	no flags	Details Formatted Diff Diff
patch (16.18 KB, patch) 2017-02-02 16:11 PST, Saam Barati	no flags	Details Formatted Diff Diff
patch (16.22 KB, patch) 2017-02-02 16:13 PST, Saam Barati	msaboff: review+ saam: commit-queue-	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Saam Barati

Comment 1 2017-02-02 01:17:55 PST

Created attachment 300395 [details] WIP It runs kraken faster. But it's still certainly broken for programs that run more than once.

Oliver Hunt

Comment 2 2017-02-02 08:29:17 PST

Program code /should/ be run-once. By definition it's top level global code, so cannot be addressed directly. Any re-use would be via cached bytecode (so it would have to tier all the way up to a separate copy of jit code anyway)

Saam Barati

Comment 3 2017-02-02 10:31:17 PST

Saam Barati

Comment 4 2017-02-02 12:09:05 PST

(In reply to comment #2) > Program code /should/ be run-once. > > By definition it's top level global code, so cannot be addressed directly. > Any re-use would be via cached bytecode (so it would have to tier all the > way up to a separate copy of jit code anyway) Ok, I think I've convinced myself this is true. There is only one caller of: `Interpreter::execute(ProgramExecutable*, ...)` and that caller creates the ProgramExecutable*. Therefore, I think this optimization is sound. I may go further and just remove the Interpreter::execute(ProgramExecutaable*) API and just have the caller pass in the SourceCode/|this| it wants to execute to the Interpreter, and have the Interpreter create the ProgramExecutable. That way, we ensure the safety of this optimization going into the future via the API boundary to Interpreter.

Filip Pizlo

Comment 5 2017-02-02 12:09:53 PST

(In reply to comment #4) > (In reply to comment #2) > > Program code /should/ be run-once. > > > > By definition it's top level global code, so cannot be addressed directly. > > Any re-use would be via cached bytecode (so it would have to tier all the > > way up to a separate copy of jit code anyway) > > Ok, I think I've convinced myself this is true. > There is only one caller of: > > `Interpreter::execute(ProgramExecutable*, ...)` > > and that caller creates the ProgramExecutable*. Therefore, I think this > optimization is sound. I may go further and just remove the > Interpreter::execute(ProgramExecutaable*) API and just have the caller pass > in the SourceCode/|this| it wants to execute to the Interpreter, and have > the Interpreter create the ProgramExecutable. That way, we ensure the safety > of this optimization going into the future via the API boundary to > Interpreter. Nice.

Saam Barati

Comment 6 2017-02-02 14:21:09 PST

Created attachment 300447 [details] WIP I wanna see if EWS likes the patch.

Radar WebKit Bug Importer

Comment 7 2017-02-02 15:57:00 PST

<rdar://problem/30339082>

Saam Barati

Comment 8 2017-02-02 16:11:32 PST

Created attachment 300464 [details] patch

Saam Barati

Comment 9 2017-02-02 16:13:23 PST

Created attachment 300465 [details] patch Fix bug name.

Geoffrey Garen

Comment 10 2017-02-02 21:13:57 PST

I think it might be neat to do this optimization more broadly. For example, it would be a shame to miss out on this optimization just because you used an IIFE or a module. I think it's valid to create a throw-away CodeBlock that can't be re-entered whenever you want to: You just need to avoid installing it in the Executable. The net result could be just like any CodeBlock that has been jettisoned: You're still running on the stack, and that's fine, but you'll never be entered again.

Saam Barati

Comment 11 2017-02-02 21:29:27 PST

(In reply to comment #10) > I think it might be neat to do this optimization more broadly. For example, > it would be a shame to miss out on this optimization just because you used > an IIFE or a module. > > I think it's valid to create a throw-away CodeBlock that can't be re-entered > whenever you want to: You just need to avoid installing it in the > Executable. The net result could be just like any CodeBlock that has been > jettisoned: You're still running on the stack, and that's fine, but you'll > never be entered again. I hadn't thought about creating an anonymous CodeBlock. I think that'd work. The only downside is it could lead to us compiling the same code more than once. I was also thinking we could plant a jump at the beginning of the code to generate the beginning of the function on first execution. This would allow us to install the code, but would require synchronous compilation of the beginning on first execution. Fil has also talked to me about moving the baseline more to a snippet architecture, where we just compile small chunks of bytecode. My goal of this patch is to keep it small, and then file follow up bugs to do this work for more things, and perhaps in a more comprehensive way. A trivial follow up bug could be to make this work for Module code. I don't think IIFE would be trivial, but we could go with your suggestion of not installing the code for all functions. I still think that perhaps the best architecture could be to move to something like a snippet baseline JIT. This way, just chunks of code would be compiled. If we did this, I think we'd probably want to do some other things. We'd probably want OSR exit to LLInt code, and perhaps some other things. Maybe more value profiling in the LLInt.

Geoffrey Garen

Comment 12 2017-02-02 21:32:43 PST

> My goal of this patch is to keep it small, and then file follow up bugs to > do this work for more things, and perhaps in a more comprehensive way. A > trivial follow up bug could be to make this work for Module code. I don't > think IIFE would be trivial, but we could go with your suggestion of not > installing the code for all functions. I still think that perhaps the best > architecture could be to move to something like a snippet baseline JIT. This > way, just chunks of code would be compiled. If we did this, I think we'd > probably want to do some other things. We'd probably want OSR exit to LLInt > code, and perhaps some other things. Maybe more value profiling in the LLInt. I like all of these suggestions. They're probably not mutually exclusive with the "don't install" trick. Maybe the "don't install" trick can be one step along the way.

Michael Saboff

Comment 13 2017-02-03 09:21:26 PST

Comment on attachment 300465 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=300465&action=review r=me I like this. Since you do reachability analysis, do you think that there is much to gain by only generating for reachable basic blocks? Or do you think it is the case that most code is reachable beyond the entry bytecode offset. > Source/JavaScriptCore/jit/JIT.cpp:198 > + if (m_loopOSREntryBytecodeOffset >= 1000) { Did you reach this offset empirically? My gut should say that we would be profitable at a much lower offset.

Saam Barati

Comment 14 2017-02-03 10:05:46 PST

(In reply to comment #13) > Comment on attachment 300465 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=300465&action=review > > r=me > I like this. > > Since you do reachability analysis, do you think that there is much to gain > by only generating for reachable basic blocks? Or do you think it is the > case that most code is reachable beyond the entry bytecode offset. Let me try to take data on this. I'll measure the size difference between my simplified approach and if we found precise boundaries that we compiled. > > > Source/JavaScriptCore/jit/JIT.cpp:198 > > + if (m_loopOSREntryBytecodeOffset >= 1000) { > > Did you reach this offset empirically? My gut should say that we would be > profitable at a much lower offset. I did not reach this empirically. Maybe we should just remove this unless we find it's useful. Or we can make it lower, like 100 or something.

Saam Barati

Comment 15 2017-02-03 11:56:52 PST

(In reply to comment #14) > (In reply to comment #13) > > Comment on attachment 300465 [details] > > patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=300465&action=review > > > > r=me > > I like this. > > > > Since you do reachability analysis, do you think that there is much to gain > > by only generating for reachable basic blocks? Or do you think it is the > > case that most code is reachable beyond the entry bytecode offset. > Let me try to take data on this. I'll measure the size difference between my > simplified approach and if we found precise boundaries that we compiled. Ok, running all JSC tests, I logged data on the percent of the codeblock we end up compiling both using my current simplified approach, and if we had a precise approach. The difference was almost never above 1 percent. So I'll stick with the simplified approach. > > > > > > Source/JavaScriptCore/jit/JIT.cpp:198 > > > + if (m_loopOSREntryBytecodeOffset >= 1000) { > > > > Did you reach this offset empirically? My gut should say that we would be > > profitable at a much lower offset. > I did not reach this empirically. Maybe we should just remove this unless we > find it's useful. Or we can make it lower, like 100 or something. I'm going to go with 200 here. If there are regressions, I'll increase it.

Saam Barati

Comment 16 2017-02-03 12:02:39 PST

landed in: https://trac.webkit.org/changeset/211642

Saam Barati

Comment 17 2017-02-03 12:10:19 PST

Ok, some follow up bugs w.r.t Geoff's suggestions: - https://bugs.webkit.org/show_bug.cgi?id=167808 - https://bugs.webkit.org/show_bug.cgi?id=167807 - https://bugs.webkit.org/show_bug.cgi?id=167805

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Local Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Saam Barati

Reported

2017-02-02 00:42 PST

Modified

2017-02-05 11:04 PST History

CC List

12 users Show

URL

Keywords InRadar

Depends on

Blocks

167856

Dependencies

tree graph