[PATCH v2] aarch64: Optimized implementation of strcpy

Tue Oct 22 17:54:00 GMT 2019

Hi Xuelei,

> Optimize the strcpy implementation by using vector loads and operations
> in main loop. Compared to aarch64/strcpy.S, it reduces latency of cases
> in bench-strlen by 5%~18% when the length of src is greater than 64
> bytes, with gains throughout the benchmark.

This is OK. I tried it on a few microarchitectures, and on long strings it is
either as fast as the existing implementation or faster.

Wilco