This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PING] [PATCH] faster string operations for bulldozer (take 2)
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at systemhalted dot org>
- Cc: Roland McGrath <roland at hack dot frob dot com>, libc-alpha at sourceware dot org
- Date: Fri, 26 Apr 2013 18:56:14 +0200
- Subject: [PING] [PATCH] faster string operations for bulldozer (take 2)
- References: <20120926171541 dot GA12300 at domone dot kolej dot mff dot cuni dot cz> <20120926172758 dot 56BEC2C097 at topped-with-meat dot com> <20120926184013 dot GA13454 at domone dot kolej dot mff dot cuni dot cz> <20120926194423 dot 15F9B2C061 at topped-with-meat dot com> <20120926211433 dot GA17771 at domone dot kolej dot mff dot cuni dot cz> <CAE2sS1i2nfrn58PNwtOXYx9qt=bWX_C4_fJN=CnGuefBTvN-Bw at mail dot gmail dot com> <20120930103730 dot GA5682 at domone dot kolej dot mff dot cuni dot cz>
Hi,
I got sidetracked by other work.
I need this patch for the new memset, so I am pinging it.
On Sun, Sep 30, 2012 at 12:37:30PM +0200, Ondřej Bílka wrote:
> Here is an updated version of my patch. I also made two minor changes to my
> benchmark.
>
> The first is a lower bound that only loads data from memory into registers. I
> still do not know how to avoid the prefetch slowdown when the data is in the
> L1 cache without duplicating too much code.
>
> The second is that the benchmark now avoids writes caused by cache evictions.
>
> I updated the benchmark at
> http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test.html
> and http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test_r.html
>
> --
> * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
> Set bit_Prefer_PMINUB_for_stringop for AMD processors.
> Set bit_Fast_Unaligned_Load for AMD processors with AVX.
>
> ---
> sysdeps/x86_64/multiarch/init-arch.c | 8 ++++++++
> 1 files changed, 8 insertions(+), 0 deletions(-)
>
>
> diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
> index fb44dcf..46ed502 100644
> --- a/sysdeps/x86_64/multiarch/init-arch.c
> +++ b/sysdeps/x86_64/multiarch/init-arch.c
> @@ -131,6 +131,14 @@ __init_cpu_features (void)
> __cpu_features.feature[index_Prefer_SSE_for_memop]
> |= bit_Prefer_SSE_for_memop;
>
> + /* Assume unaligned loads are fast when avx is available. */
> + if ((ecx & bit_AVX) != 0)
> + __cpu_features.feature[index_Fast_Rep_String]
> + |= ( bit_Fast_Unaligned_Load);
> +
> + __cpu_features.feature[index_Fast_Rep_String]
> + |= bit_Prefer_PMINUB_for_stringop;
> +
> unsigned int eax;
> __cpuid (0x80000000, eax, ebx, ecx, edx);
> if (eax >= 0x80000001)
> --
> 1.7.4.4
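
For readers without the glibc tree at hand, the logic of the hunk above can
be sketched as a standalone function. This is only an illustration under
stated assumptions: the names BIT_FAST_UNALIGNED_LOAD, BIT_PREFER_PMINUB,
amd_feature_bits and prefers_pminub are hypothetical, and the bit positions
of the two feature flags are made up. The one hardware fact used is that
CPUID leaf 1 reports AVX in bit 28 of ECX.

```c
/* Hypothetical stand-ins for glibc's internal feature flags.
   The bit positions here are illustrative, not glibc's real values.  */
enum
{
  BIT_FAST_UNALIGNED_LOAD = 1 << 4,
  BIT_PREFER_PMINUB = 1 << 5
};

/* CPUID leaf 1 reports AVX support in bit 28 of ECX.  */
#define CPUID_ECX_AVX (1u << 28)

/* Mirror the patch: on an AMD processor, always prefer PMINUB for string
   operations, and assume unaligned loads are fast when AVX is present.
   cpuid1_ecx is the ECX value returned by CPUID leaf 1.  */
static unsigned int
amd_feature_bits (unsigned int cpuid1_ecx)
{
  unsigned int features = 0;

  if ((cpuid1_ecx & CPUID_ECX_AVX) != 0)
    features |= BIT_FAST_UNALIGNED_LOAD;

  features |= BIT_PREFER_PMINUB;
  return features;
}

/* How a selector might test the flag before picking an implementation.  */
static int
prefers_pminub (unsigned int features)
{
  return (features & BIT_PREFER_PMINUB) != 0;
}
```

In glibc itself the flags live in __cpu_features.feature[] and are read by
the multiarch selectors; the sketch only shows which inputs turn which bits
on.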