This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Ping: [PATCH v4] faster strlen on x64
Ping,
On Wed, Feb 13, 2013 at 12:38:40PM +0100, Ondřej Bílka wrote:
> Hello,
>
> I wrote about the previous version that an unaligned read of the first
> 16 bytes is a bad tradeoff. When I wrote a faster strcpy header I realized
> that this was because I was doing a separate check for a page crossing.
>
> When I only check that the next 64 bytes do not cross a page and then
> first do an unaligned 16-byte load, it causes only a small overhead for
> larger strings. This makes my implementation faster for a wider family of
> workloads. It speeds up the gcc benchmark and most other programs.
>
> On unit tests the revised version is somewhat slower than the previous
> version. This is because the first-16-bytes path is chosen only rarely,
> which causes branch misprediction.
>
> I made two additional small improvements. The first is squashing in the
> padding patch. The second is that the page-cross test can be done as
> x%4096 < 4096-48 instead of x%4096 <= 4096-64 because I align x to 16 bytes.
>
> I updated the benchmarks; the difference between the new and revised versions is at
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strlen_profile.html
> http://kam.mff.cuni.cz/~ondra/strlen_profile.tar.bz2
>
>
> Ondra
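
The page-cross test quoted above can be checked with a small sketch. It is not taken from the patch: the function names, the PAGE_BYTES constant and the reading of "align x to 16 bytes" as rounding x down to a multiple of 16 are my assumptions. Under that reading, testing the unaligned x against 4096-48 should give the same answer as testing the aligned address against 4096-64.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size, as in the x%4096 test */

/* Older form (assumed reading): 64 bytes starting at x rounded down
   to a 16-byte boundary must stay inside the page. */
static int fits_aligned_64(uintptr_t x)
{
    uintptr_t a = x & ~(uintptr_t)15;   /* round x down to 16 bytes */
    return a % PAGE_BYTES <= PAGE_BYTES - 64;
}

/* Revised form: the same condition tested directly on the unaligned x. */
static int fits_unaligned_48(uintptr_t x)
{
    return x % PAGE_BYTES < PAGE_BYTES - 48;
}

/* Number of addresses in two consecutive pages where the tests disagree. */
static int count_mismatches(void)
{
    int bad = 0;
    for (uintptr_t x = 0; x < 2 * PAGE_BYTES; x++)
        bad += fits_aligned_64(x) != fits_unaligned_48(x);
    return bad;
}

/* Hypothetical helper: aborts if the two forms ever differ. */
static void check_page_tests(void)
{
    assert(count_mismatches() == 0);
}
```

The last offset that passes is 4047. A 16-byte unaligned load from there ends at offset 4062, and the 64-byte block starting at the aligned offset 4032 ends exactly at the page boundary, so neither read touches the next page.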