Patch forthcoming.
Created attachment 305548 [details] it begins
Created attachment 305558 [details] so close
Created attachment 305561 [details] the latest I realized that I stumbled upon a super gnarly situation where linear scan has a bad time with Air's liberal use of precoloring. I think I found a megahack to work around this. I wrote it down in a comment. I'll get back to this tomorrow.
Created attachment 305612 [details] seems like it might work Haven't tested it yet.
Created attachment 305662 [details] it linear scanned marsaglia and didn't crash
Created attachment 305690 [details] seems like it's starting to work It linear scanned imaging-gaussian-blur
Created attachment 305694 [details] passes a few tests here and there
Created attachment 305697 [details] passing more tests!
Created attachment 305787 [details] more things Added a bunch of debug support, including a way to force linear scan to first spill all the things before proceeding further. This tests that linear scan's spiller is sound. So far it looks like the spiller is sound but the allocator itself is borked.
Created attachment 305814 [details] fixed another bug It's now passing a bunch of tests that it was previously failing. Rerunning all tests now.
Created attachment 305825 [details] changed the approach to clobbers Switching to using one of the approaches suggested in the paper.
Created attachment 305829 [details] more
Created attachment 305830 [details] not failing so many tests anymore But still failing many tests
Created attachment 305832 [details] another big fix
(In reply to Filip Pizlo from comment #14) > Created attachment 305832 [details] > another big fix With this it passes all JSC tests but I'm still able to get it to crash through fuzz tests.
Created attachment 305885 [details] the patch
Created attachment 305890 [details] the patch
Attachment 305890 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:34: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:420: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:420: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/WTF/wtf/Range.h:88: Missing spaces around | [whitespace/operators] [3] Total errors found: 4 in 65 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 305896 [details] the patch
Attachment 305896 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:34: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:420: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] ERROR: Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:420: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/WTF/wtf/Range.h:88: Missing spaces around | [whitespace/operators] [3] Total errors found: 4 in 65 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 305896 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=305896&action=review r=me. Please fix the various places you have OOPS > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:151 > + index += block->size() * 2; shouldn't this be block->size()*2+1 since block->size()*2 is the tail? Or is the head of one block OK to be the same as the tail? > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:407 > + if (m_active.size() != m_registers.size()) { > + bool didAssign = false; > + for (Reg reg : m_registers) { > + if (!m_activeRegs.contains(reg) && entry.possibleRegs.contains(reg)) { > + assign(tmp, reg); > + didAssign = true; > + break; > + } > + } > + if (didAssign) > + continue; > + } We could totally do biased coloring here to try to remove some extra moves. Should be cheap, and benefit quality of code in some programs. Code gen quality isn't a priority for this phase, but if we get a throughput progression without slowing this phase down, we should do it. Might be worth a FIXME. I'm happy to hack it up. > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:422 > + spill(tmp); What guarantees !entry.isUnspillable here?
Comment on attachment 305896 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=305896&action=review > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:61 > +bool verbose() { return Options::airLinearScanVerbose(); } nit: I'm not sure this really needs to be an option now that the phase works.
(In reply to Saam Barati from comment #21) > Comment on attachment 305896 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=305896&action=review > > r=me. Please fix the various places you have OOPS Right! I will! > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:151 > > + index += block->size() * 2; > > shouldn't this be block->size()*2+1 since block->size()*2 is the tail? > Or is the head of one block OK to be the same as the tail? Basic blocks already have "double padding" between them in the sense that we treat the point after the terminal to be distinct from the point before the first instruction in the successor(s). The indexing scheme in this patch means that if you have a block like: Foo Bar Baz Then it will be indexed as follows: 0 Foo 1 2 Bar 3 4 Baz 5 We don't need the extra +1, since every block is indexed with padding. > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:407 > > + if (m_active.size() != m_registers.size()) { > > + bool didAssign = false; > > + for (Reg reg : m_registers) { > > + if (!m_activeRegs.contains(reg) && entry.possibleRegs.contains(reg)) { > > + assign(tmp, reg); > > + didAssign = true; > > + break; > > + } > > + } > > + if (didAssign) > > + continue; > > + } > > We could totally do biased coloring here to try to remove some extra moves. > Should be cheap, and benefit quality of code in some programs. Code gen > quality isn't a priority for this phase, but if we get a throughput > progression without slowing this phase down, we should do it. Might be worth > a FIXME. I'm happy to hack it up. I'll add a FIXME. > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:422 > > + spill(tmp); > > What guarantees !entry.isUnspillable here? This means that a tmp introduced by the spiller was born at a time when none of the registers that it could legally use are available. Since spill tmps have minuscule live ranges - literally live either just at the end, or just the beginning, or both, of some instruction - it takes some serious effort to reach this point. Graph coloring fails on this case also. It's like if you created a patchpoint that returns SomeRegister but clobbers the entire register file.
(In reply to Saam Barati from comment #22) > Comment on attachment 305896 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=305896&action=review > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:61 > > +bool verbose() { return Options::airLinearScanVerbose(); } > > nit: I'm not sure this really needs to be an option now that the phase works. You're prolly gonna like it when you add biased coloring.
Landed in https://trac.webkit.org/changeset/214636/webkit
(In reply to Filip Pizlo from comment #23) > (In reply to Saam Barati from comment #21) > > Comment on attachment 305896 [details] > > the patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=305896&action=review > > > > r=me. Please fix the various places you have OOPS > > Right! I will! > > > > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:151 > > > + index += block->size() * 2; > > > > shouldn't this be block->size()*2+1 since block->size()*2 is the tail? > > Or is the head of one block OK to be the same as the tail? > > Basic blocks already have "double padding" between them in the sense that we > treat the point after the terminal to be distinct from the point before the > first instruction in the successor(s). The indexing scheme in this patch > means that if you have a block like: > > Foo > Bar > Baz > > Then it will be indexed as follows: > > 0 > Foo > 1 > 2 > Bar > 3 > 4 > Baz > 5 > > We don't need the extra +1, since every block is indexed with padding. > > > > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:407 > > > + if (m_active.size() != m_registers.size()) { > > > + bool didAssign = false; > > > + for (Reg reg : m_registers) { > > > + if (!m_activeRegs.contains(reg) && entry.possibleRegs.contains(reg)) { > > > + assign(tmp, reg); > > > + didAssign = true; > > > + break; > > > + } > > > + } > > > + if (didAssign) > > > + continue; > > > + } > > > > We could totally do biased coloring here to try to remove some extra moves. > > Should be cheap, and benefit quality of code in some programs. Code gen > > quality isn't a priority for this phase, but if we get a throughput > > progression without slowing this phase down, we should do it. Might be worth > > a FIXME. I'm happy to hack it up. > > I'll add a FIXME. > > > > > > Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp:422 > > > + spill(tmp); > > > > What guarantees !entry.isUnspillable here? > > This means that a tmp introduced by the spiller was born at a time when none > of the registers that it could legally use are available. Since spill tmps > have minuscule live ranges - literally live either just at the end, or just > the beginning, or both, of some instruction - it takes some serious effort > to reach this point. Right. I know that this is the case. I just realized the function you call into has a RELEASE_ASSERT. > > Graph coloring fails on this case also. > > It's like if you created a patchpoint that returns SomeRegister but clobbers > the entire register file.
Is there any chance this patch could've regressed ARM64? I'm seeing a 4% JetStream regression on iOS perf bots.
(In reply to Saam Barati from comment #27) > Is there any chance this patch could've regressed ARM64? I'm seeing a 4% > JetStream regression on iOS perf bots. Anything is possible but I don't see how this would have done it.