Summary: | make css jit work on arm64 | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alex Christensen <achristensen> | ||||||||||||||||
Component: | CSS | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||
Severity: | Normal | CC: | benjamin, ysuzuki | ||||||||||||||||
Priority: | P2 | ||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||
Attachments: |
|
Description
Alex Christensen
2014-05-21 12:23:55 PDT
Created attachment 231836 [details]
Patch
Comment on attachment 231836 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=231836&action=review > Source/WebCore/cssjit/SelectorCompiler.cpp:879 > + generatePrologue(); Why doesn't x86_64 need a prologue? Doesn't it need to save the frame pointer register (%rbp)? (In reply to comment #2) > Why doesn't x86_64 need a prologue? Doesn't it need to save the frame pointer register (%rbp)? Doesn't x86_64 do that with the CALL instruction? This will need to be optimized later to only push and pop the link register and frame pointer if it calls a function, but this is just to get it working so we can optimize from here. Comment on attachment 231836 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=231836&action=review > Source/WebCore/cssjit/SelectorCompiler.cpp:874 > +#if CPU(ARM64) > + m_assembler.pushPair(JSC::ARM64Registers::lr, JSC::ARM64Registers::fp); > +#endif > +} > + > +void SelectorCodeGenerator::generateEpilogue() > +{ > +#if CPU(ARM64) > + m_assembler.popPair(JSC::ARM64Registers::lr, JSC::ARM64Registers::fp); > +#endif I would prefer this to be done through the stack allocator. Here, if we have a mismatch of pushPair/popPair, we could run into security problems. > Source/WebCore/cssjit/SelectorCompiler.cpp:935 > + generateEpilogue(); > m_assembler.ret(); > > // Failure. > if (!failureCases.empty()) { > failureCases.link(&m_assembler); > m_assembler.move(Assembler::TrustedImm32(0), returnRegister); > + generateEpilogue(); > m_assembler.ret(); The duplicated epilogue may be slower than an unconditional jump. You may want to generalize reservedCalleeSavedRegisters. Created attachment 231910 [details]
Patch
Created attachment 231917 [details]
Patch
Created attachment 231918 [details]
Patch
Comment on attachment 231918 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=231918&action=review Great encapsulation for the ARM64 parts! Some comments bellow and I have a big doubt with the stack references. > Source/WebCore/cssjit/FunctionCall.h:168 > + m_savedRegisterStackReferences.appendVector(m_stackAllocator.push(m_registerAllocator.allocatedRegisters())); Can you split this on 3 lines? It is 3 important operations, I would prefer to have 3 statements. > Source/WebCore/cssjit/RegisterAllocator.h:-32 > -#include <StackAllocator.h> I am so happy about this :) > Source/WebCore/cssjit/RegisterAllocator.h:116 > + Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> calleeSavedRegistersToReserve(unsigned count) I think the name calleeSavedRegistersToReserve masks the side effects of this function. What about: "preAllocateCalleeSavedRegisters(count)", or even keep "reserveCalleeSavedRegisters()"? > Source/WebCore/cssjit/RegisterAllocator.h:118 > + Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> registers; Is this used? > Source/WebCore/cssjit/RegisterAllocator.h:120 > + ASSERT(!m_reservedCalleeSavedRegisters.size()); We can afford RELEASE_ASSERT here. > Source/WebCore/cssjit/RegisterAllocator.h:129 > + Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> calleeSavedRegistersToRestore() restoreCalleeSavedRegisters/returnPreAllocatedCalleeSavedRegisters? > Source/WebCore/cssjit/SelectorCompiler.cpp:550 > -#if CSS_SELECTOR_JIT_DEBUGGING && ASSERT_DISABLED > +#if CSS_SELECTOR_JIT_DEBUGGING IIRC, this was for certain debug builds, I can't remember why, that was during the initial proof-of-concept work. Did you try debug builds? > Source/WebCore/cssjit/SelectorCompiler.cpp:865 > + Spaaaaaaces > Source/WebCore/cssjit/SelectorCompiler.cpp:871 > +static inline bool willGenerateFunctionCalls(const SelectorFragmentList& selectorFragments) > +{ > + // FIXME: If this selector doesn't call any functions, then we don't need to push the link register or frame pointer. > + UNUSED_PARAM(selectorFragments); > + return true; > +} I would add this with the patch, not ahead of time. > Source/WebCore/cssjit/SelectorCompiler.cpp:872 > + Spaaaaaaces > Source/WebCore/cssjit/SelectorCompiler.cpp:883 > + prologueRegister.append(JSC::X86Registers::ebp); You can use GPRInfo::callFrameRegister to have a name for it. > Source/WebCore/cssjit/SelectorCompiler.cpp:915 > + bool reservedCalleeSavedRegisters = m_registerAllocator.availableRegisterCount() < minimumRegisterCountForAttributes; > + bool callsFunctions = willGenerateFunctionCalls(m_selectorFragments); > + generatePrologue(callsFunctions, reservedCalleeSavedRegisters, minimumRegisterCountForAttributes); I would just have: bool needsEpilogue = generatePrologue(callsFunctions, reservedCalleeSavedRegisters, minimumRegisterCountForAttributes) > Source/WebCore/cssjit/SelectorCompiler.cpp:952 > + if (!m_needsAdjacentBacktrackingStart && !reservedCalleeSavedRegisters && !callsFunctions) { && !needsEpilogue > Source/WebCore/cssjit/SelectorCompiler.cpp:955 > + // Epilogue not needed because it wouldn't write anything. Once you move the name to the condition, you could remove the comments. The comments could still be valuable but I would have it as a little explanation at the beginning of this branch. Or...you could make this little block into a method generateFunctionEndWithoutEpilogue(). > Source/WebCore/cssjit/StackAllocator.h:71 > + Vector<StackReference> push(Vector<JSC::MacroAssembler::RegisterID> registerIDs) const Vector<JSC::MacroAssembler::RegisterID>& > Source/WebCore/cssjit/StackAllocator.h:78 > + for (unsigned i = 0; i < registerCount - 1; i += 2) { I wonder how clang compile this :) > Source/WebCore/cssjit/StackAllocator.h:82 > + stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2)); > + stackReferences.append(StackReference(m_offsetFromTop)); Is this right? I would have thought stackReferences.append(StackReference(m_offsetFromTop)); should go before stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2)); Or maybe you should to push i + 1 before i? > Source/WebCore/cssjit/StackAllocator.h:116 > + ASSERT(stackReferences.size() == registerCount); RELEASE_ASSERT, if we run into this, something horribly wrong has happened. > Source/WebCore/cssjit/StackAllocator.h:126 > + RELEASE_ASSERT(stackReferences[i - 1] == m_offsetFromTop && stackReferences[i - 2] == m_offsetFromTop - stackUnitInBytes / 2); Let's split this into two release assert for clarity. > Source/WebCore/cssjit/StackAllocator.h:128 > + m_assembler.popPair(registerIDs[i - 2], registerIDs[i - 1]); Don't forget to update this if you change the order of push above. Comment on attachment 231918 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=231918&action=review Great! > Source/WebCore/cssjit/StackAllocator.h:105 > m_assembler.push(registerID); Since MacroAssembler provides `pushToSave` for all environments, we can remove this ifdef and simply use `pushToSave` for x86_64 environment :) > Source/WebCore/cssjit/StackAllocator.h:146 > + m_assembler.popToRestore(registerID); Likewise we can use `popToRestore` for `StackAllocator::pop` and remove ifdef :) (In reply to comment #8) > > Source/WebCore/cssjit/SelectorCompiler.cpp:550 > > -#if CSS_SELECTOR_JIT_DEBUGGING && ASSERT_DISABLED > > +#if CSS_SELECTOR_JIT_DEBUGGING > > IIRC, this was for certain debug builds, I can't remember why, that was during the initial proof-of-concept work. > > Did you try debug builds? Yes. It works fine. I think this should be changed to make debugging debug builds easier. > > Source/WebCore/cssjit/StackAllocator.h:82 > > + stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2)); > > + stackReferences.append(StackReference(m_offsetFromTop)); > > Is this right? Yes. The order is arbitrary, but it has to be consistent. This way the registers are pushed in the order they are in the vector and popped in the opposite order of the given vector, which I think makes the most sense and is most like x86_64. I fixed all other comments. Created attachment 231973 [details]
Patch
(In reply to comment #10) > > Did you try debug builds? > Yes. It works fine. I think this should be changed to make debugging debug builds easier. > > > Source/WebCore/cssjit/StackAllocator.h:82 > > > + stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2)); > > > + stackReferences.append(StackReference(m_offsetFromTop)); > > > > Is this right? > Yes. The order is arbitrary, but it has to be consistent. This way the registers are pushed in the order they are in the vector and popped in the opposite order of the given vector, which I think makes the most sense and is most like x86_64. The order in which you store registers on the stack can be arbitrary. But the StackReferences are used to access values by address, they need to match the order in which the register are stored on the stack. (In reply to comment #12) > The order in which you store registers on the stack can be arbitrary. But the StackReferences are used to access values by address, they need to match the order in which the register are stored on the stack. They do. Created attachment 231975 [details]
Patch
Created attachment 231989 [details]
Patch
Comment on attachment 231989 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=231989&action=review > Source/WebCore/cssjit/RegisterAllocator.h:125 > + return Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount>(m_reservedCalleeSavedRegisters); Do you need to explicit copy constructor here? > Source/WebCore/cssjit/StackAllocator.h:76 > + stackReferences.reserveCapacity(registerCount); reserveInitialCapacity() > Source/WebCore/cssjit/StackAllocator.h:113 > + ASSERT(m_offsetFromTop >= stackUnitInBytes * registerCount); Is this right on ARM64? You could have 2 registers by stack units. This is probably only good for x86_64. |