The ARM_NEON-specific code for copyLCharsFromUCharSource() always tries to align the destination, even when it is already aligned. This can be seen for copies of more than 15 characters. The code in question is marked with '*':

    if (length >= (2 * memoryAccessSize) - 1) {
        // Prefix: align dst on 64 bits.
        const uintptr_t memoryAccessMask = memoryAccessSize - 1;
*       do {
*           *destination++ = static_cast<LChar>(*source++);
*       } while (!isAlignedTo<memoryAccessMask>(destination));

        // Vector interleaved unpack, we only store the lower 8 bits.
        const uintptr_t lengthLeft = end - destination;
        const LChar* const simdEnd = end - (lengthLeft % memoryAccessSize);
        do {
            asm("vld2.8 { d0-d1 }, [%[SOURCE]] !\n\t"
                "vst1.8 { d0 }, [%[DESTINATION],:64] !\n\t"
                : [SOURCE]"+r" (source), [DESTINATION]"+r" (destination)
                :
                : "memory", "d0", "d1");
        } while (destination != simdEnd);
    }

Because the do { } while runs its body before testing alignment, an already aligned destination is copied one character at a time until the next 64-bit boundary. The do { } while should be changed to a while.
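The difference between the two loop shapes can be sketched in portable scalar C++, without the NEON assembly. This is only an illustration, not the committed patch: isAligned(), prefixWithDoWhile() and prefixWithWhile() are hypothetical names, the LChar/UChar typedefs stand in for the WTF types, and memoryAccessSize is assumed to be 8 because the comment says the destination is aligned on 64 bits.

```cpp
#include <cstddef>
#include <cstdint>

typedef unsigned char LChar;   // stand-in for WTF's LChar
typedef std::uint16_t UChar;   // stand-in for WTF's UChar

// Assumption: 8-byte (64-bit) stores, matching the "align dst on 64 bits" comment.
const std::uintptr_t memoryAccessSize = 8;
const std::uintptr_t memoryAccessMask = memoryAccessSize - 1;

// Hypothetical helper; the real code uses isAlignedTo<memoryAccessMask>().
inline bool isAligned(const LChar* pointer)
{
    return !(reinterpret_cast<std::uintptr_t>(pointer) & memoryAccessMask);
}

// Loop shape from the report: the body runs once before alignment is tested.
// Returns how many characters the scalar prefix copied.
std::size_t prefixWithDoWhile(LChar* destination, const UChar* source)
{
    LChar* const start = destination;
    do {
        *destination++ = static_cast<LChar>(*source++);
    } while (!isAligned(destination));
    return static_cast<std::size_t>(destination - start);
}

// Proposed loop shape: alignment is tested first, so an aligned
// destination skips the scalar prefix entirely.
std::size_t prefixWithWhile(LChar* destination, const UChar* source)
{
    LChar* const start = destination;
    while (!isAligned(destination))
        *destination++ = static_cast<LChar>(*source++);
    return static_cast<std::size_t>(destination - start);
}
```

With a destination that is already 8-byte aligned, the do { } while version copies 8 characters one at a time before the vector loop can start, while the while version copies none. For an unaligned destination both versions copy the same number of characters.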
Created attachment 187391
Patch

In a synthetic test harness, this is a speedup for short copies that are still longer than 15 characters.
Committed r142336: <http://trac.webkit.org/changeset/142336>