Original plan: try to do this as a replacement for the baseline JIT.
Created attachment 387969 [details] WIP it begins
We're probably going to change direction here and invent a new optimizer that works over bytecode. The main optimizations we're planning to do are type speculation and register allocation. To get fast compile times, the goal will also be to not have to construct an entirely new IR. We'd also want to get to the point where each opcode can speculate on what its inputs are without doing some kind of prediction propagation phase. Many opcodes already have enough metadata to do this; however, not all do, so we'll have to invent ways for all of them to do it.

Our reasoning for this was based on this data. Here is the JS2 score in various configurations of JSC:

[A] Baseline only: 67
[B] DFG only: 137
[C] DFG only, no inlining, no access inlining, AI says each structure set is infinite, but each variable still has a flush format: 116
[D] DFG only, no inlining, no access inlining, AI says each structure set is infinite, each variable has JSValue flush format: 109
[E] DFG only, no inlining, no access inlining, AI says each structure set is infinite, each variable has JSValue flush format, no intrinsic inlining: 106

Here are our compile time numbers on a large subset of JS2 in [E]:

Baseline Compile Time: 171.968 ms
DFG Compile Time: 1396.220 ms
FTL (B3) Compile Time: 0.000 ms
FTL (DFG) Compile Time: 0.000 ms
FTL Compile Time: 0.000 ms
Total Compile Time: 1568.189 ms

[DFG] bytecode parser total ms: 380.117813
[DFG] live catch variable preservation phase total ms: 1.461701
[DFG] CPS rethreading total ms: 48.442496
[DFG] unification total ms: 11.940636
[DFG] prediction injection total ms: 2.781105
[DFG] static execution count estimation total ms: 43.239496
[DFG] backwards propagation total ms: 22.783780
[DFG] prediction propagation total ms: 51.928968
[DFG] fixup total ms: 55.983858
[DFG] invalidation point injection total ms: 24.982650
[DFG] control flow analysis total ms: 101.180000
[DFG] tier-up check injection total ms: 0.332139
[DFG] fast store barrier insertion total ms: 22.395290
[DFG] dead code elimination total ms: 42.648463
[DFG] phantom insertion total ms: 47.266917
[DFG] stack layout total ms: 12.130792
[DFG] virtual register allocation total ms: 10.887650
[DFG] watchpoint collection total ms: 4.037530
[DFG] machine code generation total ms: 259.451395

Just building the IR and generating machine code makes us more than 2x slower than baseline compile times. As seen in [E], we get a lot of benefit just from register allocation and type speculation. Having a dedicated "IR" for making this fast could lead to really good results. The IR here could likely just be bit vectors at block boundaries representing the state of types. Or perhaps we'd need just one bit vector that we update as we parse a block. The goal would be to be able to do this over the bytecode itself. This also has a nice OSR exit story: since we're compiling the bytecode, OSR exit is straightforward.
I think we're going to do this by building off of the baseline JIT. This allows us to incrementally improve the JIT without implementing all opcodes in optimized form.
Created attachment 402617 [details] WIP it begins round 2
Created attachment 402702 [details] WIP
Comment on attachment 402702 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=402702&action=review

> Source/JavaScriptCore/jit/JIT.cpp:63
> +void JIT::stepOverInstruction(const Instruction* instruction, const HashMap<ValueProfile*, SpeculatedType>& profiles)

Not sure you're going to want that to be a hashtable.
(In reply to Filip Pizlo from comment #6)
> Comment on attachment 402702 [details]
> WIP
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402702&action=review
>
> > Source/JavaScriptCore/jit/JIT.cpp:63
> > +void JIT::stepOverInstruction(const Instruction* instruction, const HashMap<ValueProfile*, SpeculatedType>& profiles)
>
> Not sure you're going to want that to be a hashtable.

Yeah, I kinda doubted this too, and questioned it when I first started. The other thing I convinced myself is OK to do is to just racily read out the prediction: since the prediction should only ever get wider, a racy read shouldn't make the algorithm unsound.
Created attachment 402811 [details] WIP
Created attachment 402832 [details] WIP
Created attachment 402886 [details] WIP
Created attachment 402928 [details] WIP Ok, I think the prediction propagator is good enough to start working on other parts of the system.
Created attachment 402930 [details] WIP
Created attachment 403159 [details] WIP
Created attachment 403342 [details] WIP

It can now speculate and register allocate this program:

```
function foo(x) {
    x += 2;
    let y;
    if (x > 5) {
        y = 42;
    } else {
        y = 77;
    }
    x = x + 48;
    x = x + 1337;
    x = x + 42;
    x += y; x += y; x += y; x += y; x += y; x += y; x += y; x += y;
    x += y; x += y; x += y; x += y; x += y; x += y; x += y; x += y;
    x += y; x += y; x += y; x += y; x += y; x += y;
    return x + y;
}
```

Unsurprisingly, it runs 2x faster than the old baseline code, and seems to compile in a similar amount of time.
Created attachment 403811 [details] WIP
Created attachment 403839 [details] WIP
Created attachment 403841 [details] WIP
Created attachment 403941 [details] WIP
Created attachment 404019 [details] WIP Starting to do some basic double math. It's ~2x faster also.
Created attachment 404037 [details] WIP
Created attachment 404517 [details] WIP
Created attachment 409399 [details] WIP rebased
Created attachment 409401 [details] WIP
Created attachment 409417 [details] WIP
Created attachment 409625 [details] WIP
Created attachment 410075 [details] WIP
Created attachment 410146 [details] WIP Getting silent refill working
Created attachment 410302 [details] WIP
Created attachment 410403 [details] WIP
Created attachment 410575 [details] WIP
Created attachment 410602 [details] WIP
Created attachment 410723 [details] WIP
Created attachment 410769 [details] WIP
Created attachment 410808 [details] WIP
Created attachment 411814 [details] WIP
Created attachment 411823 [details] WIP
Created attachment 411946 [details] WIP
<rdar://problem/69217617>
Created attachment 412345 [details] WIP
Created attachment 412378 [details] WIP
Created attachment 412481 [details] WIP
Created attachment 412484 [details] WIP
Created attachment 412990 [details] WIP
Created attachment 413624 [details] WIP
Created attachment 413747 [details] WIP
Created attachment 413758 [details] WIP
Created attachment 413979 [details] WIP
Created attachment 413980 [details] WIP
Created attachment 414002 [details] WIP
Created attachment 414089 [details] WIP
Created attachment 414115 [details] WIP
Created attachment 415161 [details] WIP
Created attachment 415270 [details] WIP
Created attachment 415356 [details] WIP
Created attachment 415380 [details] WIP
Created attachment 415471 [details] WIP
Created attachment 415476 [details] WIP Fixing a bunch of bugs in the baseline JIT when run without LOL
Created attachment 415479 [details] WIP
Created attachment 415481 [details] WIP
Created attachment 415596 [details] WIP it's starting to run some of our tests
Created attachment 415605 [details] WIP Fixing lots of bugs and implementing opcodes I forgot to rewrite.
Created attachment 415608 [details] WIP
Created attachment 415658 [details] WIP
Created attachment 415670 [details] WIP When running just LOL, without any tier up past it, I'm at about a 5% failure rate of tests. Working through the failures before I start to focus more on perf.
Created attachment 415676 [details] WIP
Created attachment 415690 [details] WIP Turns out negative zero is a thing.
Created attachment 415803 [details] WIP
Created attachment 415817 [details] WIP
Created attachment 415914 [details] WIP
Created attachment 415942 [details] WIP
Created attachment 416032 [details] WIP Seems to pass the tests now when running LOL instead of baseline, without tier-up to the DFG. Now I need to focus on perf; there's a lot of obvious work to be done.
Created attachment 416177 [details] WIP
Created attachment 416193 [details] WIP rebased.
Created attachment 416381 [details] WIP
Created attachment 416438 [details] WIP
Created attachment 416490 [details] WIP
Created attachment 416565 [details] WIP
Created attachment 416616 [details] WIP
Created attachment 416642 [details] WIP
Created attachment 417123 [details] WIP
Created attachment 417130 [details] WIP
Created attachment 417141 [details] WIP
Created attachment 417201 [details] WIP
Created attachment 417396 [details] WIP
Created attachment 417408 [details] WIP
Created attachment 417424 [details] WIP
Created attachment 417479 [details] WIP
Created attachment 417481 [details] WIP
Created attachment 417498 [details] WIP
Created attachment 417574 [details] WIP
Created attachment 417585 [details] WIP
Created attachment 417662 [details] WIP
Created attachment 417753 [details] WIP
Created attachment 417923 [details] WIP
Created attachment 417927 [details] WIP
Created attachment 417998 [details] WIP
Created attachment 418000 [details] WIP
Created attachment 418175 [details] WIP
Created attachment 418203 [details] WIP
Created attachment 418317 [details] WIP
Created attachment 419094 [details] WIP
Created attachment 419166 [details] WIP