Reduce loop strength on ARM64 too
Created attachment 439504 [details] Patch
This is a WIP, but I wanted to save it here so I don't lose it while testing perf.
A mixed bag so far:

memcpy-wasm-small                  171.2750+-1.6420     !    181.5388+-0.9457     ! definitely 1.0599x slower
memcpy-wasm-large                  438.1443+-14.3229    ^     44.0324+-1.8863     ^ definitely 9.9505x faster
memcpy-wasm-medium                  20.1398+-0.6413     !     22.5029+-0.4836     ! definitely 1.1173x slower
memcpy-typed-loop-speculative     1304.0646+-23.7363        1283.0492+-10.2491      might be 1.0164x faster
memcpy-typed-loop                   59.6541+-0.3893     ?     59.7131+-0.6997     ?
memcpy-wasm                        339.8212+-1.6913     ^     46.7857+-0.6053     ^ definitely 7.2634x faster
memcpy-typed-loop-large          81118.9502+-150.6358      80774.3593+-894.3635
memcpy-loop                       4919.8907+-68.8522    ?   4960.3104+-48.1172    ?
memcpy-typed-loop-small             76.2613+-1.0061     !     78.2971+-0.7698     ! definitely 1.0267x slower
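For anyone double-checking the numbers: the "x faster" / "x slower" figures appear to be the ratio of the two means, larger over smaller, with a lower time counting as better. Below is a minimal sketch under that assumption. `speed_ratio` is a hypothetical helper written for this comment, not part of any benchmark tooling, and the means are copied from the results above.

```python
def speed_ratio(first_mean, second_mean):
    """Compare two mean times, assuming lower is better.

    Returns (ratio, direction), where direction describes the second
    configuration relative to the first.
    """
    if second_mean < first_mean:
        return first_mean / second_mean, "faster"
    return second_mean / first_mean, "slower"


# Means copied from the results above (first column, second column).
results = {
    "memcpy-wasm-large": (438.1443, 44.0324),
    "memcpy-wasm": (339.8212, 46.7857),
    "memcpy-wasm-small": (171.2750, 181.5388),
}

for name, (first, second) in results.items():
    ratio, direction = speed_ratio(first, second)
    print(f"{name}: {ratio:.4f}x {direction}")
```

This only reproduces the ratio; it says nothing about how the "definitely" / "might be" confidence labels are decided.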
Created attachment 439678 [details] Patch
<rdar://problem/83897351>