Bug 16685

Summary: eliminate List::empty() to cut down on PIC branches
Product: WebKit Reporter: Darin Adler <darin>
Component: JavaScriptCoreAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Macintosh   
OS: OS X 10.4   
Attachments:
Description Flags
patch ggaren: review+

Description Darin Adler 2007-12-31 02:53:46 PST
This will eliminate the PIC branch from FunctionBodyNode::execute, which will speed things up quite a bit.
Comment 1 Darin Adler 2007-12-31 03:04:02 PST
Created attachment 18214 [details]
patch
Comment 2 Geoffrey Garen 2008-01-01 10:34:11 PST
This looks a little stinky to me:

-    localStorage.reserveCapacity(m_varStack.size() + m_parameters.size() + m_functionStack.size());
+    size_t totalSize = m_varStack.size() + m_parameters.size() + m_functionStack.size();
+    if (totalSize > localStorage.capacity()) // Doing this check inline avoids function call overhead.
+        localStorage.reserveCapacity(totalSize);

If this is really a measurable win, I think it would be preferable to change Vector, along these lines:

    template<typename T, size_t inlineCapacity>
    void Vector<T, inlineCapacity>::reserveCapacity(size_t newCapacity)
    inlne void reserveCapacity(size_t newCapacity)
    {
        if (newCapacity <= capacity())
            return;
        reserveCapacitySlowCase(newCapacity);
    }

    template<typename T, size_t inlineCapacity>
    void Vector<T, inlineCapacity>::reserveCapacitySlowCase(size_t newCapacity)
    {
        ASSERT(newCapacity > capacity());
        T* oldBuffer = begin();
        T* oldEnd = end();
        m_impl.allocateBuffer(newCapacity);
        TypeOperations::move(oldBuffer, oldEnd, begin());
        m_impl.deallocateBuffer(oldBuffer);
    }

That way, in the fast case, everybody would get the win, and in the slow case, you would avoid an extra branch.

Anyway, r=me
Comment 3 Darin Adler 2008-01-01 10:50:52 PST
(In reply to comment #2)
> If this is really a measurable win

It is.

> I think it would be preferable to change Vector [...]
> 
> That way, in the fast case, everybody would get the win, and in the slow case,
> you would avoid an extra branch.

I considered exactly that.

But this is not necessarily a win for other callers of reserveCapacity. This call site is special in two ways: 1) the vector has an inline capacity of 32, which is almost always big enough, and 2) it's in a super-hot code path; so hot that we use the uncheckedAppend. I think it's unlikely that other callers of reserveCapacity qualify. So we'd have to consider having two versions of reserveCapacity.

The slow case is so uncommon I think the extra branch is no big deal at all.

Given that, do you still think there's room for improvement here?
Comment 4 Geoffrey Garen 2008-01-01 10:54:06 PST
OK, you persuaded me. Let's keep the patch as is.
Comment 5 Darin Adler 2008-01-01 12:02:57 PST
r29067