Some more SIMD :)
Created attachment 160303 [details] Patch
I could not use intrinsics here because:
- I need the explicit alignment for speed.
- I need the auto increment for speed.
- IIRC the interleaved load intrinsics are not always available anyway.
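For readers unfamiliar with the interleaved load, here is a rough portable sketch of the idea, not the patch itself. It assumes a little-endian target, where the low byte of each 16-bit unit sits at the even byte offset. The names `DeinterleavedBytes`, `loadDeinterleaved` and `narrowToLatin1Sketch` are made up for illustration; the scalar `loadDeinterleaved` only emulates what a NEON de-interleaving load does in one instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for two 64-bit NEON registers filled by a
// de-interleaving load: even-offset bytes in one, odd-offset bytes in the other.
struct DeinterleavedBytes {
    uint8_t even[8];
    uint8_t odd[8];
};

// Scalar emulation of a 2-way de-interleaving load of 16 bytes.
inline DeinterleavedBytes loadDeinterleaved(const uint8_t* bytes)
{
    DeinterleavedBytes result;
    for (size_t i = 0; i < 8; ++i) {
        result.even[i] = bytes[2 * i];
        result.odd[i] = bytes[2 * i + 1];
    }
    return result;
}

// Narrows 16-bit units to 8-bit units by keeping only the low byte.
// Assumption: little-endian, so the low bytes are the even-offset bytes.
inline void narrowToLatin1Sketch(uint8_t* destination, const uint16_t* source, size_t length)
{
    const size_t blockSize = 8;
    uint8_t* const end = destination + length;

    // Prefix: copy unit by unit until the destination is 8-byte aligned,
    // so the block stores below are aligned.
    while (destination != end && reinterpret_cast<uintptr_t>(destination) % blockSize)
        *destination++ = static_cast<uint8_t>(*source++);

    // Main loop: load 8 units de-interleaved, store only the low bytes,
    // then advance both pointers (the "auto increment" part).
    while (static_cast<size_t>(end - destination) >= blockSize) {
        DeinterleavedBytes block = loadDeinterleaved(reinterpret_cast<const uint8_t*>(source));
        std::memcpy(destination, block.even, blockSize);
        source += blockSize;
        destination += blockSize;
    }

    // Tail: whatever is left after the last full block.
    while (destination != end)
        *destination++ = static_cast<uint8_t>(*source++);
}
```

In a real NEON path the load, the aligned store and the pointer updates would each fold into a single instruction, which is where the speedup would come from.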
Comment on attachment 160303 [details] Patch Attachment 160303 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13562967
Comment on attachment 160303 [details] Patch Attachment 160303 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13570906
Created attachment 160310 [details] Patch
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13568917
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13559960
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13559975
Comment on attachment 160310 [details] Patch I'll fix that tomorrow.
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13566937
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13563945
Comment on attachment 160310 [details] Patch Attachment 160310 [details] did not pass cr-android-ews (chromium-android): Output: http://queues.webkit.org/results/13569880
Created attachment 160481 [details] Patch
This new version improves performance when we cannot use NEON, and is just as fast as the previous patch when the input is big enough.
Comment on attachment 160481 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=160481&action=review

This looks awesome, didn't know about the interleaved unpack, that's really handy.

> Source/WTF/wtf/text/ASCIIFastPath.h:134
> +#elif COMPILER(GCC) && CPU(ARM_NEON) && !(PLATFORM(BIG_ENDIAN) || PLATFORM(MIDDLE_ENDIAN))

Your optimized path skips an ASSERT to check the upper bits are zero; might be worth adding "&& defined(NDEBUG)" so that debug builds get the C-loop with the ASSERT.

> Source/WTF/wtf/text/ASCIIFastPath.h:141
> + do {

I think WebKit coding style is no parens here.
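To illustrate the NDEBUG suggestion, here is a minimal sketch of how the guard could select between a checked loop and the optimized path. It uses plain `assert` and the compiler's predefined `__GNUC__` and `__ARM_NEON` macros in place of WebKit's `ASSERT`, `COMPILER()` and `CPU()` macros, and omits the endianness check. `SKETCH_USE_FAST_PATH` and `narrowCheckedSketch` are hypothetical names, and the unchecked loop only stands in for the NEON code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Take the optimized path only in release builds, so that debug builds
// keep the loop that asserts on the upper bits.
#if defined(__GNUC__) && defined(__ARM_NEON) && defined(NDEBUG)
#define SKETCH_USE_FAST_PATH 1
#else
#define SKETCH_USE_FAST_PATH 0
#endif

inline void narrowCheckedSketch(uint8_t* destination, const uint16_t* source, size_t length)
{
#if SKETCH_USE_FAST_PATH
    // Stand-in for the optimized path: no per-unit check at all.
    for (size_t i = 0; i < length; ++i)
        destination[i] = static_cast<uint8_t>(source[i]);
#else
    for (size_t i = 0; i < length; ++i) {
        // Each unit must fit in 8 bits; compiled out when NDEBUG is defined.
        assert(!(source[i] & 0xff00));
        destination[i] = static_cast<uint8_t>(source[i]);
    }
#endif
}
```

With this arrangement a debug build on an ARM device still exercises the assertion, at the cost of not testing the optimized path in that configuration.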
Committed r133100: <http://trac.webkit.org/changeset/133100>