Bug 133156 - make css jit work on arm64
Summary: make css jit work on arm64
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-21 12:23 PDT by Alex Christensen
Modified: 2014-05-23 18:21 PDT (History)
2 users (show)

See Also:


Attachments
Patch (3.38 KB, patch)
2014-05-21 12:27 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (11.53 KB, patch)
2014-05-22 13:53 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (13.05 KB, patch)
2014-05-22 16:13 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (17.50 KB, patch)
2014-05-22 16:46 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (17.37 KB, patch)
2014-05-23 11:04 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (17.46 KB, patch)
2014-05-23 11:37 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (16.43 KB, patch)
2014-05-23 14:33 PDT, Alex Christensen
benjamin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2014-05-21 12:23:55 PDT
The css jit works on arm64.  It still needs to be tested and optimized so I won't enable it yet.
Comment 1 Alex Christensen 2014-05-21 12:27:38 PDT
Created attachment 231836 [details]
Patch
Comment 2 Geoffrey Garen 2014-05-21 12:45:30 PDT
Comment on attachment 231836 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231836&action=review

> Source/WebCore/cssjit/SelectorCompiler.cpp:879
> +    generatePrologue();

Why doesn't x86_64 need a prologue? Doesn't it need to save the frame pointer register (%rbp)?
Comment 3 Alex Christensen 2014-05-21 13:19:48 PDT
(In reply to comment #2)
> Why doesn't x86_64 need a prologue? Doesn't it need to save the frame pointer register (%rbp)?
Doesn't x86_64 do that with the CALL instruction?  This will need to be optimized later to only push and pop the link register and frame pointer if it calls a function, but this is just to get it working so we can optimize from here.
Comment 4 Benjamin Poulain 2014-05-21 13:35:26 PDT
Comment on attachment 231836 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231836&action=review

> Source/WebCore/cssjit/SelectorCompiler.cpp:874
> +#if CPU(ARM64)
> +    m_assembler.pushPair(JSC::ARM64Registers::lr, JSC::ARM64Registers::fp);
> +#endif
> +}
> +
> +void SelectorCodeGenerator::generateEpilogue()
> +{
> +#if CPU(ARM64)
> +    m_assembler.popPair(JSC::ARM64Registers::lr, JSC::ARM64Registers::fp);
> +#endif

I would prefer this to be done through the stack allocator.

Here, if we have a mismatch of pushPair/popPair, we could run into security problems.

> Source/WebCore/cssjit/SelectorCompiler.cpp:935
> +            generateEpilogue();
>              m_assembler.ret();
>  
>              // Failure.
>              if (!failureCases.empty()) {
>                  failureCases.link(&m_assembler);
>                  m_assembler.move(Assembler::TrustedImm32(0), returnRegister);
> +                generateEpilogue();
>                  m_assembler.ret();

The duplicated epilogue may be slower than an unconditional jump. You may want to generalize reservedCalleeSavedRegisters.
Comment 5 Alex Christensen 2014-05-22 13:53:17 PDT
Created attachment 231910 [details]
Patch
Comment 6 Alex Christensen 2014-05-22 16:13:55 PDT
Created attachment 231917 [details]
Patch
Comment 7 Alex Christensen 2014-05-22 16:46:32 PDT
Created attachment 231918 [details]
Patch
Comment 8 Benjamin Poulain 2014-05-22 17:53:23 PDT
Comment on attachment 231918 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231918&action=review

Great encapsulation for the ARM64 parts!

Some comments bellow and I have a big doubt with the stack references.

> Source/WebCore/cssjit/FunctionCall.h:168
> +        m_savedRegisterStackReferences.appendVector(m_stackAllocator.push(m_registerAllocator.allocatedRegisters()));

Can you split this on 3 lines?
It is 3 important operations, I would prefer to have 3 statements.

> Source/WebCore/cssjit/RegisterAllocator.h:-32
> -#include <StackAllocator.h>

I am so happy about this :)

> Source/WebCore/cssjit/RegisterAllocator.h:116
> +    Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> calleeSavedRegistersToReserve(unsigned count)

I think the name calleeSavedRegistersToReserve masks the side effects of this function.
What about: "preAllocateCalleeSavedRegisters(count)", or even keep "reserveCalleeSavedRegisters()"?

> Source/WebCore/cssjit/RegisterAllocator.h:118
> +        Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> registers;

Is this used?

> Source/WebCore/cssjit/RegisterAllocator.h:120
> +        ASSERT(!m_reservedCalleeSavedRegisters.size());

We can afford RELEASE_ASSERT here.

> Source/WebCore/cssjit/RegisterAllocator.h:129
> +    Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount> calleeSavedRegistersToRestore()

restoreCalleeSavedRegisters/returnPreAllocatedCalleeSavedRegisters?

> Source/WebCore/cssjit/SelectorCompiler.cpp:550
> -#if CSS_SELECTOR_JIT_DEBUGGING && ASSERT_DISABLED
> +#if CSS_SELECTOR_JIT_DEBUGGING

IIRC, this was for certain debug builds, I can't remember why, that was during the initial proof-of-concept work.

Did you try debug builds?

> Source/WebCore/cssjit/SelectorCompiler.cpp:865
> +    

Spaaaaaaces

> Source/WebCore/cssjit/SelectorCompiler.cpp:871
> +static inline bool willGenerateFunctionCalls(const SelectorFragmentList& selectorFragments)
> +{
> +    // FIXME: If this selector doesn't call any functions, then we don't need to push the link register or frame pointer.
> +    UNUSED_PARAM(selectorFragments);
> +    return true;
> +}

I would add this with the patch, not ahead of time.

> Source/WebCore/cssjit/SelectorCompiler.cpp:872
> +    

Spaaaaaaces

> Source/WebCore/cssjit/SelectorCompiler.cpp:883
> +        prologueRegister.append(JSC::X86Registers::ebp);

You can use GPRInfo::callFrameRegister to have a name for it.

> Source/WebCore/cssjit/SelectorCompiler.cpp:915
> +    bool reservedCalleeSavedRegisters = m_registerAllocator.availableRegisterCount() < minimumRegisterCountForAttributes;
> +    bool callsFunctions = willGenerateFunctionCalls(m_selectorFragments);
> +    generatePrologue(callsFunctions, reservedCalleeSavedRegisters, minimumRegisterCountForAttributes);

I would just have:
    bool needsEpilogue = generatePrologue(callsFunctions, reservedCalleeSavedRegisters, minimumRegisterCountForAttributes)

> Source/WebCore/cssjit/SelectorCompiler.cpp:952
> +        if (!m_needsAdjacentBacktrackingStart && !reservedCalleeSavedRegisters && !callsFunctions) {

&& !needsEpilogue

> Source/WebCore/cssjit/SelectorCompiler.cpp:955
> +            // Epilogue not needed because it wouldn't write anything.

Once you move the name to the condition, you could remove the comments.

The comments could still be valuable but I would have it as a little explanation at the beginning of this branch.
Or...you could make this little block into a method generateFunctionEndWithoutEpilogue().

> Source/WebCore/cssjit/StackAllocator.h:71
> +    Vector<StackReference> push(Vector<JSC::MacroAssembler::RegisterID> registerIDs)

const Vector<JSC::MacroAssembler::RegisterID>&

> Source/WebCore/cssjit/StackAllocator.h:78
> +        for (unsigned i = 0; i < registerCount - 1; i += 2) {

I wonder how clang compile this :)

> Source/WebCore/cssjit/StackAllocator.h:82
> +            stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2));
> +            stackReferences.append(StackReference(m_offsetFromTop));

Is this right?

I would have thought stackReferences.append(StackReference(m_offsetFromTop)); should go before stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2));

Or maybe you should to push i + 1 before i?

> Source/WebCore/cssjit/StackAllocator.h:116
> +        ASSERT(stackReferences.size() == registerCount);

RELEASE_ASSERT, if we run into this, something horribly wrong has happened.

> Source/WebCore/cssjit/StackAllocator.h:126
> +            RELEASE_ASSERT(stackReferences[i - 1] == m_offsetFromTop && stackReferences[i - 2] == m_offsetFromTop - stackUnitInBytes / 2);

Let's split this into two release assert for clarity.

> Source/WebCore/cssjit/StackAllocator.h:128
> +            m_assembler.popPair(registerIDs[i - 2], registerIDs[i - 1]);

Don't forget to update this if you change the order of push above.
Comment 9 Yusuke Suzuki 2014-05-22 21:51:29 PDT
Comment on attachment 231918 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231918&action=review

Great!

> Source/WebCore/cssjit/StackAllocator.h:105
>          m_assembler.push(registerID);

Since MacroAssembler provides `pushToSave` for all environments, we can remove this ifdef and simply use `pushToSave` for x86_64 environment :)

> Source/WebCore/cssjit/StackAllocator.h:146
> +        m_assembler.popToRestore(registerID);

Likewise we can use `popToRestore` for `StackAllocator::pop` and remove ifdef :)
Comment 10 Alex Christensen 2014-05-23 10:42:22 PDT
(In reply to comment #8)
> > Source/WebCore/cssjit/SelectorCompiler.cpp:550
> > -#if CSS_SELECTOR_JIT_DEBUGGING && ASSERT_DISABLED
> > +#if CSS_SELECTOR_JIT_DEBUGGING
> 
> IIRC, this was for certain debug builds, I can't remember why, that was during the initial proof-of-concept work.
> 
> Did you try debug builds?
Yes.  It works fine.  I think this should be changed to make debugging debug builds easier.
> > Source/WebCore/cssjit/StackAllocator.h:82
> > +            stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2));
> > +            stackReferences.append(StackReference(m_offsetFromTop));
> 
> Is this right?
Yes.  The order is arbitrary, but it has to be consistent.  This way the registers are pushed in the order they are in the vector and popped in the opposite order of the given vector, which I think makes the most sense and is most like x86_64.

I fixed all other comments.
Comment 11 Alex Christensen 2014-05-23 11:04:16 PDT
Created attachment 231973 [details]
Patch
Comment 12 Benjamin Poulain 2014-05-23 11:09:21 PDT
(In reply to comment #10)
> > Did you try debug builds?
> Yes.  It works fine.  I think this should be changed to make debugging debug builds easier.
> > > Source/WebCore/cssjit/StackAllocator.h:82
> > > +            stackReferences.append(StackReference(m_offsetFromTop - stackUnitInBytes / 2));
> > > +            stackReferences.append(StackReference(m_offsetFromTop));
> > 
> > Is this right?
> Yes.  The order is arbitrary, but it has to be consistent.  This way the registers are pushed in the order they are in the vector and popped in the opposite order of the given vector, which I think makes the most sense and is most like x86_64.

The order in which you store registers on the stack can be arbitrary. But the StackReferences are used to access values by address, they need to match the order in which the register are stored on the stack.
Comment 13 Alex Christensen 2014-05-23 11:11:48 PDT
(In reply to comment #12)
> The order in which you store registers on the stack can be arbitrary. But the StackReferences are used to access values by address, they need to match the order in which the register are stored on the stack.
They do.
Comment 14 Alex Christensen 2014-05-23 11:37:48 PDT
Created attachment 231975 [details]
Patch
Comment 15 Alex Christensen 2014-05-23 14:33:25 PDT
Created attachment 231989 [details]
Patch
Comment 16 Benjamin Poulain 2014-05-23 17:49:01 PDT
Comment on attachment 231989 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=231989&action=review

> Source/WebCore/cssjit/RegisterAllocator.h:125
> +        return Vector<JSC::MacroAssembler::RegisterID, calleeSavedRegisterCount>(m_reservedCalleeSavedRegisters);

Do you need to explicit copy constructor here?

> Source/WebCore/cssjit/StackAllocator.h:76
> +        stackReferences.reserveCapacity(registerCount);

reserveInitialCapacity()

> Source/WebCore/cssjit/StackAllocator.h:113
> +        ASSERT(m_offsetFromTop >= stackUnitInBytes * registerCount);

Is this right on ARM64? You could have 2 registers by stack units.

This is probably only good for x86_64.
Comment 17 Alex Christensen 2014-05-23 18:21:31 PDT
http://trac.webkit.org/changeset/169297