RESOLVED FIXED 109335
ARM_NEON Inline Assembly for copyLCharsFromUCharSource() inefficient for aligned destinations
https://bugs.webkit.org/show_bug.cgi?id=109335
Michael Saboff
Reported 2013-02-08 17:11:10 PST
The ARM_NEON specific code for copyLCharsFromUCharSource() always tries to align the destination, even when it is already aligned. This can be seen for moves of more than 15 characters. The code in question is marked with *:

        if (length >= (2 * memoryAccessSize) - 1) {
            // Prefix: align dst on 64 bits.
            const uintptr_t memoryAccessMask = memoryAccessSize - 1;
    *       do {
    *           *destination++ = static_cast<LChar>(*source++);
    *       } while (!isAlignedTo<memoryAccessMask>(destination));

            // Vector interleaved unpack, we only store the lower 8 bits.
            const uintptr_t lengthLeft = end - destination;
            const LChar* const simdEnd = end - (lengthLeft % memoryAccessSize);
            do {
                asm("vld2.8 { d0-d1 }, [%[SOURCE]] !\n\t"
                    "vst1.8 { d0 }, [%[DESTINATION],:64] !\n\t"
                    : [SOURCE]"+r" (source), [DESTINATION]"+r" (destination)
                    :
                    : "memory", "d0", "d1");
            } while (destination != simdEnd);
        }

The do { } while should be changed to a while loop. Because the do { } while always copies at least one character before checking alignment, an already aligned destination is first pushed out of alignment and then copied one character at a time until it is aligned again. With a while loop, the alignment check runs first and an aligned destination skips the prefix entirely.
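To make the difference concrete, here is a portable C++ sketch of the corrected routine. It is not the WebKit code: the LChar/UChar aliases, the isAlignedTo helper, the function name copyLCharsFromUCharSourceSketch and memoryAccessSize = 8 are assumptions for illustration, and the NEON vld2.8/vst1.8 pair is replaced by a scalar loop that keeps the low 8 bits of each UChar. The function returns how many characters the scalar prefix copied, so the effect of the fix is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the WTF types: LChar is an 8-bit Latin-1
// character, UChar a 16-bit UTF-16 code unit.
using LChar = unsigned char;
using UChar = char16_t;

// Assumption: 8-byte blocks, matching the "align dst on 64 bits" comment.
constexpr std::uintptr_t memoryAccessSize = 8;

// Hypothetical helper with the same meaning as the isAlignedTo<mask>() call
// in the bug: true when none of the mask bits are set in the address.
template<std::uintptr_t mask>
inline bool isAlignedTo(const void* pointer)
{
    return !(reinterpret_cast<std::uintptr_t>(pointer) & mask);
}

// Portable sketch of the fixed routine. Returns how many characters the
// scalar alignment prefix copied.
inline std::size_t copyLCharsFromUCharSourceSketch(LChar* destination, const UChar* source, std::size_t length)
{
    const LChar* const end = destination + length;
    std::size_t prefixCopies = 0;

    if (length >= (2 * memoryAccessSize) - 1) {
        constexpr std::uintptr_t memoryAccessMask = memoryAccessSize - 1;
        // The fix: test alignment before copying, so an already aligned
        // destination skips the prefix. A do/while would instead copy a
        // whole block of memoryAccessSize characters one at a time.
        while (!isAlignedTo<memoryAccessMask>(destination)) {
            *destination++ = static_cast<LChar>(*source++);
            ++prefixCopies;
        }

        // The prefix copied at most memoryAccessSize - 1 characters, so at
        // least one full block is left for the do/while block loop below.
        assert(end - destination >= static_cast<std::ptrdiff_t>(memoryAccessSize));

        const std::uintptr_t lengthLeft = static_cast<std::uintptr_t>(end - destination);
        const LChar* const simdEnd = end - (lengthLeft % memoryAccessSize);
        // Scalar stand-in for vld2.8/vst1.8: keep the low 8 bits of each of
        // memoryAccessSize UChars and store them to the aligned destination.
        do {
            for (std::uintptr_t i = 0; i < memoryAccessSize; ++i)
                destination[i] = static_cast<LChar>(source[i]);
            destination += memoryAccessSize;
            source += memoryAccessSize;
        } while (destination != simdEnd);
    }

    // Scalar tail: whatever the block loop did not cover (or the whole
    // string when it is too short for the block path).
    while (destination != end)
        *destination++ = static_cast<LChar>(*source++);
    return prefixCopies;
}
```

For instance, copying 20 characters into an 8-byte-aligned buffer reports 0 prefix copies, while passing buffer + 1 reports 7. With the original do { } while, the aligned case would have spent 8 scalar copies before reaching the vector loop.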
Attachments
Patch (1.53 KB, patch)
2013-02-08 17:45 PST, Michael Saboff
fpizlo: review+
Michael Saboff
Comment 1 2013-02-08 17:45:23 PST
Created attachment 187391: Patch. In a synthetic test harness, this is a speedup for small copies that are longer than 15 characters.
Michael Saboff
Comment 2 2013-02-08 18:00:19 PST