[PATCH v2] aarch64: Optimized implementation of strcpy
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Tue Oct 22 17:54:00 GMT 2019
Hi Xuelei,
> Optimize the strcpy implementation by using vector loads and operations
> in the main loop. Compared to aarch64/strcpy.S, it reduces latency of cases
> in bench-strlen by 5%~18% when the length of src is greater than 64
> bytes, with gains throughout the benchmark.
This is OK. I tried it on a few microarchitectures, and it's either as fast or
faster on long strings.
Wilco