This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC V2] Improve 64bit memset for Corei7 with avx2 instruction


Andreas,

Our code is based on glibc 2.17. Could you please point us to where we
can get the latest version, so that we can update the patch ASAP?

Best Regards
Ling

2013/7/22, Andreas Jaeger <aj@suse.com>:
> On 07/16/2013 03:37 PM, ling.ma.program@gmail.com wrote:
>> From: Ma Ling <ling.ml@alibaba-inc.com>
>>
>> In this patch we use an approach similar to the one used for memcpy:
>> avoid branch instructions and force the destination to be aligned
>> for AVX instructions.
>> With the gcc.403 benchmark we find that memset takes 5 to 10 times
>> longer than memcpy.
>> The benchmark also indicates that this patch improves performance by
>> 30% to 100% compared with the original __memset_sse2.
>>
>> Ondra, I have also sent the gcc.403 test suite, the patch for glibc,
>> and readme.txt.
>>
>> Thanks
>> Ling
>> ---
>> In this version we clarify the use of the vzeroupper instruction to
>> avoid the SAVE & STORE penalty.
>> vpshufb needs only one cycle to fill the xmm0 register; thanks, Ondra.
>>
>>  sysdeps/x86_64/multiarch/Makefile          |   2 +-
>>  sysdeps/x86_64/multiarch/ifunc-impl-list.c |   2 +
>>  sysdeps/x86_64/multiarch/memset-avx2.S     | 202
>> +++++++++++++++++++++++++++++
>>  3 files changed, 205 insertions(+), 1 deletion(-)
>>  create mode 100644 sysdeps/x86_64/multiarch/memset-avx2.S
>>
>> diff --git a/sysdeps/x86_64/multiarch/Makefile
>> b/sysdeps/x86_64/multiarch/Makefile
>> index f92cf18..ae666bf 100644
>> --- a/sysdeps/x86_64/multiarch/Makefile
>> +++ b/sysdeps/x86_64/multiarch/Makefile
>> @@ -18,7 +18,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c
>> strcmp-ssse3 strncmp-ssse3 \
>>  		   strcat-sse2-unaligned strncat-sse2-unaligned \
>>  		   strcat-ssse3 strncat-ssse3 strlen-sse2-pminub \
>>  		   strnlen-sse2-no-bsf strrchr-sse2-no-bsf strchr-sse2-no-bsf \
>> -		   memcmp-ssse3
>> +		   memcmp-ssse3 memset-avx2
>>  ifeq (yes,$(config-cflags-sse4))
>>  sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c
>> varshift
>>  CFLAGS-varshift.c += -msse4
>> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> index 5639702..24d05d7 100644
>> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> @@ -67,12 +67,14 @@ __libc_ifunc_impl_list (const char *name, struct
>> libc_ifunc_impl *array,
>>
>>    /* Support sysdeps/x86_64/multiarch/memset_chk.S.  */
>>    IFUNC_IMPL (i, name, __memset_chk,
>> +	      IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_AVX2,
>> __memset_chk_avx2)
>>  	      IFUNC_IMPL_ADD (array, i, __memset_chk, 1, __memset_chk_sse2)
>>  	      IFUNC_IMPL_ADD (array, i, __memset_chk, 1,
>>  			      __memset_chk_x86_64))
>>
>>    /* Support sysdeps/x86_64/multiarch/memset.S.  */
>>    IFUNC_IMPL (i, name, memset,
>> +	      IFUNC_IMPL_ADD (array, i, memset, HAS_AVX2, __memset_avx2)
>>  	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_sse2)
>>  	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_x86_64))
>
> Against which version are you patching? The code above does not exist in
> my version.
>
> Note that we're in freeze for glibc 2.18, so this can only go in once we
> open development of new features for 2.19.
>
> Andreas
> --
>  Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
>   SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
>    GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
>     GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126
>
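
For readers following the thread, the idea described in the quoted patch
text (fewer size-dependent branches, aligned stores to the destination)
can be illustrated with a minimal portable C sketch. This is not the code
in memset-avx2.S, which is not shown in this message. The name
sketch_memset, the 16-byte BLOCK size, and the store_block helper are
assumptions made for illustration only; a real implementation would use
vector registers and wider stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block width; an AVX2 version would use 32 bytes. */
enum { BLOCK = 16 };

/* Hypothetical helper: one block-sized store. Compilers typically
   turn this fixed-size memcpy into a single vector store. */
static void store_block(unsigned char *p, const unsigned char *pattern)
{
    memcpy(p, pattern, BLOCK);
}

/* Illustrative only: fills n bytes at dst with the byte value c. */
void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char pattern[BLOCK];

    for (size_t i = 0; i < BLOCK; i++)
        pattern[i] = (unsigned char)c;

    /* Tiny sizes: plain byte loop, kept simple for the sketch. */
    if (n < BLOCK) {
        for (size_t i = 0; i < n; i++)
            d[i] = (unsigned char)c;
        return dst;
    }

    /* One unaligned store at the head and one at the tail. The two
       stores may overlap, so no branch on the exact length is needed. */
    store_block(d, pattern);
    store_block(d + n - BLOCK, pattern);

    /* Interior: start at the first BLOCK-aligned address after d and
       stop before the tail block, which is already written. */
    unsigned char *p = d + (BLOCK - (uintptr_t)d % BLOCK);
    unsigned char *end = d + n - BLOCK;
    for (; p < end; p += BLOCK)
        store_block(p, pattern);

    return dst;
}
```

The overlapping head and tail stores handle every length from BLOCK
upward with the same code path, and the interior loop only ever writes
to aligned addresses inside the buffer.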

