[PATCH v2] aarch64: Optimized implementation of strcpy

Wilco Dijkstra Wilco.Dijkstra@arm.com
Tue Oct 22 17:54:00 GMT 2019


Hi Xuelei,

> Optimize the strcpy implementation by using vector loads and operations
> in main loop. Compared to aarch64/strcpy.S, it reduces latency of cases
> in bench-strlen by 5%~18% when the length of src is greater than 64
> bytes, with gains throughout the benchmark.

This is OK. I tried it on a few microarchitectures, and on long strings it's
either as fast as or faster than the existing implementation.

Wilco

