Bug 109335

Summary: ARM_NEON Inline Assembly for copyLCharsFromUCharSource() inefficient for aligned destinations
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCore Assignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cmarcelo, ojan.autocc, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: All   
Attachments:
Description Flags
Patch fpizlo: review+

Description Michael Saboff 2013-02-08 17:11:10 PST
The ARM_NEON-specific code for copyLCharsFromUCharSource() always tries to align the destination, even when it is already aligned.  This can be seen for moves > 15 characters in length.

The code in question is marked with asterisks:

    if (length >= (2 * memoryAccessSize) - 1) {
        // Prefix: align dst on 64 bits.
        const uintptr_t memoryAccessMask = memoryAccessSize - 1;
 *      do {
 *          *destination++ = static_cast<LChar>(*source++);
 *       } while (!isAlignedTo<memoryAccessMask>(destination));

        // Vector interleaved unpack, we only store the lower 8 bits.
        const uintptr_t lengthLeft = end - destination;
        const LChar* const simdEnd = end - (lengthLeft % memoryAccessSize);
        do {
            asm("vld2.8   { d0-d1 }, [%[SOURCE]] !\n\t"
                "vst1.8   { d0 }, [%[DESTINATION],:64] !\n\t"
                : [SOURCE]"+r" (source), [DESTINATION]"+r" (destination)
                :
                : "memory", "d0", "d1");
        } while (destination != simdEnd);
    }

The do { } while should be changed to a while.
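
To illustrate the difference, here is a minimal scalar sketch that only counts how many characters the alignment prefix would copy before the vector loop starts. It is not the WebKit code: isAlignedToSketch() is a hypothetical stand-in for isAlignedTo<>(), memoryAccessSize is assumed to be 8 (the prefix aligns dst on 64 bits), and addresses are modelled as plain integers.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 8-byte accesses, matching "align dst on 64 bits" above.
constexpr uintptr_t memoryAccessSize = 8;
constexpr uintptr_t memoryAccessMask = memoryAccessSize - 1;

// Hypothetical stand-in for WebKit's isAlignedTo<mask>() helper.
template<uintptr_t mask>
bool isAlignedToSketch(uintptr_t address)
{
    return !(address & mask);
}

// Prefix length with the current do { } while: the body always runs at
// least once, so an already aligned destination is pushed off alignment
// and the loop has to walk to the next 8-byte boundary.
size_t prefixCountDoWhile(uintptr_t destination)
{
    size_t copied = 0;
    do {
        ++destination;
        ++copied;
    } while (!isAlignedToSketch<memoryAccessMask>(destination));
    return copied;
}

// Prefix length with the proposed while: the alignment test happens
// first, so an already aligned destination copies nothing here.
size_t prefixCountWhile(uintptr_t destination)
{
    size_t copied = 0;
    while (!isAlignedToSketch<memoryAccessMask>(destination)) {
        ++destination;
        ++copied;
    }
    return copied;
}
```

Under these assumptions, an aligned destination costs a full memoryAccessSize of single-character copies with the do { } while and none with the while, whereas an unaligned destination behaves the same in both versions.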
Comment 1 Michael Saboff 2013-02-08 17:45:23 PST
Created attachment 187391
Patch

In a synthetic test harness, this is a speedup for small copies that are still longer than 15 characters.
Comment 2 Michael Saboff 2013-02-08 18:00:19 PST
Committed r142336: <http://trac.webkit.org/changeset/142336>