
[PATCH] faster string operations for bulldozer (take 2)


Here is an updated version of my patch. I also made two minor changes
to my benchmark.

The first is a lower bound that only loads the data from memory into
registers. I still do not know how to avoid the prefetch slowdown when
the data is already in the L1 cache without duplicating too much code.
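
Roughly, that lower bound does something like the following (a
simplified sketch, not the exact code from the benchmark):

#include <emmintrin.h>
#include <stddef.h>

/* Load-only lower bound: stream the buffer through XMM registers
   without computing a useful result, so the loop approximates pure
   load cost.  ORing into an accumulator and reading it back through a
   volatile keeps the compiler from dropping the loads.  */
static void
lower_bound_loads (const char *s, size_t len)
{
  __m128i acc = _mm_setzero_si128 ();
  for (size_t i = 0; i + 16 <= len; i += 16)
    acc = _mm_or_si128 (acc, _mm_loadu_si128 ((const __m128i *) (s + i)));
  volatile int sink = _mm_cvtsi128_si32 (acc);
  (void) sink;
}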

The second change is that the benchmark now avoids the writes caused by
cache evictions.
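
One way to arrange that is to write the test strings only during setup,
so the timed loop itself only reads; lines of the working set are then
clean when they get evicted and the evictions add no write-back traffic
to the measurement. A simplified sketch of such a setup step (not the
exact benchmark code):

#include <string.h>

/* Dirty the test buffer (len + 1 bytes) once, before timing.  Any
   write-backs for these lines happen at most once; since the timed
   loop never writes to the buffer, later evictions find the lines
   clean and cause no write traffic.  */
static void
prepare_buffer (char *buf, size_t len)
{
  memset (buf, 'a', len);
  buf[len] = '\0';
}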

I updated the benchmark results at
http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test.html
and http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test_r.html

--
	* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
	Set bit_Prefer_PMINUB_for_stringop for AMD processors.
	Set bit_Fast_Unaligned_Load for AMD processors with AVX.

---
 sysdeps/x86_64/multiarch/init-arch.c |    8 ++++++++
 1 file changed, 8 insertions(+), 0 deletions(-)


diff --git a/sysdeps/x86_64/multiarch/init-arch.c
b/sysdeps/x86_64/multiarch/init-arch.c
index fb44dcf..46ed502 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -131,6 +131,14 @@ __init_cpu_features (void)
	__cpu_features.feature[index_Prefer_SSE_for_memop]
	  |= bit_Prefer_SSE_for_memop;

+      /* Assume unaligned loads are fast when AVX is available.  */
+      if ((ecx & bit_AVX) != 0)
+	__cpu_features.feature[index_Fast_Rep_String]
+	  |= bit_Fast_Unaligned_Load;
+
+      __cpu_features.feature[index_Fast_Rep_String]
+	|= bit_Prefer_PMINUB_for_stringop;
+
       unsigned int eax;
       __cpuid (0x80000000, eax, ebx, ecx, edx);
       if (eax >= 0x80000001)
-- 
1.7.4.4
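
For reference (this illustration is not part of the patch): the point
of setting these bits is that the multiarch selectors test them when
choosing an implementation, roughly as below. The two strlen variant
names are placeholders, not real glibc symbols.

#include <string.h>
#include "init-arch.h"

/* Placeholder variants; a real selector would name actual multiarch
   implementations.  */
extern __typeof (strlen) __strlen_generic;
extern __typeof (strlen) __strlen_pminub;

/* Pick a strlen implementation based on the feature bits that
   __init_cpu_features sets above.  */
static __typeof (strlen) *
select_strlen (void)
{
  if (__get_cpu_features ()->feature[index_Fast_Rep_String]
      & bit_Prefer_PMINUB_for_stringop)
    return __strlen_pminub;
  return __strlen_generic;
}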

