
[PATCH] faster string operations for bulldozer (take 2)


Here is an updated version of my patch. I also made two minor changes
to my benchmark.

The first is a lower bound that only loads the data from memory into
registers. I still do not know how to avoid the prefetch slowdown when
the data is already in the L1 cache without duplicating too much code.
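
Roughly, that lower bound does something like the following (a
simplified sketch, not the exact code from the benchmark):

#include <emmintrin.h>
#include <stddef.h>

/* Load-only lower bound: stream the buffer through XMM registers
   without computing a useful result, so the loop approximates pure
   load cost.  ORing into an accumulator and reading it back through a
   volatile keeps the compiler from dropping the loads.  */
static void
lower_bound_loads (const char *s, size_t len)
{
  __m128i acc = _mm_setzero_si128 ();
  for (size_t i = 0; i + 16 <= len; i += 16)
    acc = _mm_or_si128 (acc, _mm_loadu_si128 ((const __m128i *) (s + i)));
  volatile int sink = _mm_cvtsi128_si32 (acc);
  (void) sink;
}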

The second change is that the benchmark now avoids the writes caused by
cache evictions.
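
One way to arrange that is to write the test strings only during setup,
so the timed loop itself only reads; lines of the working set are then
clean when they get evicted and the evictions add no write-back traffic
to the measurement. A simplified sketch of such a setup step (not the
exact benchmark code):

#include <string.h>

/* Dirty the test buffer (len + 1 bytes) once, before timing.  Any
   write-backs for these lines happen at most once; since the timed
   loop never writes to the buffer, later evictions find the lines
   clean and cause no write traffic.  */
static void
prepare_buffer (char *buf, size_t len)
{
  memset (buf, 'a', len);
  buf[len] = '\0';
}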

I updated the benchmark results at
http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test.html
and http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test_r.html

--
	* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
	Set bit_Prefer_PMINUB_for_stringop for AMD processors.
	Set bit_Fast_Unaligned_Load for AMD processors with AVX.

---
 sysdeps/x86_64/multiarch/init-arch.c |    8 ++++++++
 1 file changed, 8 insertions(+), 0 deletions(-)


diff --git a/sysdeps/x86_64/multiarch/init-arch.c
b/sysdeps/x86_64/multiarch/init-arch.c
index fb44dcf..46ed502 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -131,6 +131,14 @@ __init_cpu_features (void)
	__cpu_features.feature[index_Prefer_SSE_for_memop]
	  |= bit_Prefer_SSE_for_memop;

+      /* Assume unaligned loads are fast when AVX is available.  */
+      if ((ecx & bit_AVX) != 0)
+	__cpu_features.feature[index_Fast_Rep_String]
+	  |= bit_Fast_Unaligned_Load;
+
+      __cpu_features.feature[index_Fast_Rep_String]
+	|= bit_Prefer_PMINUB_for_stringop;
+
       unsigned int eax;
       __cpuid (0x80000000, eax, ebx, ecx, edx);
       if (eax >= 0x80000001)
-- 
1.7.4.4
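
For reference (this illustration is not part of the patch): the point
of setting these bits is that the multiarch selectors test them when
choosing an implementation, roughly as below. The two strlen variant
names are placeholders, not real glibc symbols.

#include <string.h>
#include "init-arch.h"

/* Placeholder variants; a real selector would name actual multiarch
   implementations.  */
extern __typeof (strlen) __strlen_generic;
extern __typeof (strlen) __strlen_pminub;

/* Pick a strlen implementation based on the feature bits that
   __init_cpu_features sets above.  */
static __typeof (strlen) *
select_strlen (void)
{
  if (__get_cpu_features ()->feature[index_Fast_Rep_String]
      & bit_Prefer_PMINUB_for_stringop)
    return __strlen_pminub;
  return __strlen_generic;
}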

