This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH RFC V2] Improve 64bit memset for Corei7 with avx2 instruction
- From: Ling Ma <ling dot ma dot program at gmail dot com>
- To: Andreas Jaeger <aj at suse dot com>
- Cc: libc-alpha at sourceware dot org, neleai at seznam dot cz, liubov dot dmitrieva at gmail dot com, Ma Ling <ling dot ml at alibaba-inc dot com>
- Date: Mon, 22 Jul 2013 15:41:51 +0800
- Subject: Re: [PATCH RFC V2] Improve 64bit memset for Corei7 with avx2 instruction
- References: <1373981861-3498-1-git-send-email-ling dot ma dot program at gmail dot com> <51ECDCEE dot 4060402 at suse dot com>
Andreas,
Our code is based on glibc 2.17. Could you please point us to where we
can get the latest version, so that we can update the patch ASAP?
Best Regards
Ling
2013/7/22, Andreas Jaeger <aj@suse.com>:
> On 07/16/2013 03:37 PM, ling.ma.program@gmail.com wrote:
>> From: Ma Ling <ling.ml@alibaba-inc.com>
>>
>> In this patch we use an approach similar to the one used for memcpy: we
>> avoid branch instructions and force the destination to be aligned for
>> AVX instructions.
>> With the gcc.403 benchmark we find that memset spends 5 to 10 times more
>> time than memcpy.
>> The benchmark also indicates that this patch improves performance by 30%
>> to 100% compared with the original __memset_sse2.
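
The idea of avoiding size-dependent branches while keeping the main stores aligned can be illustrated in portable C. The following is only a minimal sketch and is not taken from the patch: sketch_memset, broadcast_byte and store8 are hypothetical names, and 8-byte words stand in for the AVX registers the patch uses. The head and tail are written with two stores that may overlap, so no branch on the exact length is needed once n >= 8, and the middle is filled from the next aligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate one byte into all 8 bytes of a word.
   This is the scalar analogue of broadcasting the fill byte into a
   vector register. */
static uint64_t broadcast_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Store 8 bytes at p; memcpy keeps this valid for any alignment. */
static void store8(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Hypothetical illustration only, not the code from the patch. */
void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    uint64_t v = broadcast_byte((unsigned char)c);

    assert(dst != NULL || n == 0);

    if (n < 8) {
        /* Tiny sizes: a plain byte loop. */
        for (size_t i = 0; i < n; i++)
            p[i] = (unsigned char)c;
        return dst;
    }

    /* Head and tail: two stores that may overlap cover the first and
       the last 8 bytes for every n >= 8. */
    store8(p, v);
    store8(p + n - 8, v);

    /* Middle: start at the next 8-byte boundary after p (at most p + 8,
       so the head store leaves no gap) and stop where the tail store
       begins. Every store in this loop is aligned. */
    unsigned char *q = p + 8 - ((uintptr_t)p & 7);
    unsigned char *end = p + n - 8;
    for (; q < end; q += 8)
        store8(q, v);

    return dst;
}
```

The overlapping stores write a few bytes twice, which costs less than the chain of compares and jumps needed to handle each residual length separately.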
>>
>> Ondra, I have also sent the gcc.403 test suite, the patch for glibc, and
>> readme.txt.
>>
>> Thanks
>> Ling
>> ---
>> In this version we clarify the use of the vzeroupper instruction to avoid
>> the SAVE & STORE penalty.
>> vpshufb needs only one cycle to fill the xmm0 register; thanks, Ondra.
>>
>> sysdeps/x86_64/multiarch/Makefile | 2 +-
>> sysdeps/x86_64/multiarch/ifunc-impl-list.c | 2 +
>> sysdeps/x86_64/multiarch/memset-avx2.S | 202 +++++++++++++++++++++++++++++
>> 3 files changed, 205 insertions(+), 1 deletion(-)
>> create mode 100644 sysdeps/x86_64/multiarch/memset-avx2.S
>>
>> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
>> index f92cf18..ae666bf 100644
>> --- a/sysdeps/x86_64/multiarch/Makefile
>> +++ b/sysdeps/x86_64/multiarch/Makefile
>> @@ -18,7 +18,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 strncmp-ssse3 \
>> strcat-sse2-unaligned strncat-sse2-unaligned \
>> strcat-ssse3 strncat-ssse3 strlen-sse2-pminub \
>> strnlen-sse2-no-bsf strrchr-sse2-no-bsf strchr-sse2-no-bsf \
>> - memcmp-ssse3
>> + memcmp-ssse3 memset-avx2
>> ifeq (yes,$(config-cflags-sse4))
>> sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c varshift
>> CFLAGS-varshift.c += -msse4
>> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> index 5639702..24d05d7 100644
>> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
>> @@ -67,12 +67,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>>
>> /* Support sysdeps/x86_64/multiarch/memset_chk.S. */
>> IFUNC_IMPL (i, name, __memset_chk,
>> + IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_AVX2,
>> __memset_chk_avx2)
>> IFUNC_IMPL_ADD (array, i, __memset_chk, 1, __memset_chk_sse2)
>> IFUNC_IMPL_ADD (array, i, __memset_chk, 1,
>> __memset_chk_x86_64))
>>
>> /* Support sysdeps/x86_64/multiarch/memset.S. */
>> IFUNC_IMPL (i, name, memset,
>> + IFUNC_IMPL_ADD (array, i, memset, HAS_AVX2, __memset_avx2)
>> IFUNC_IMPL_ADD (array, i, memset, 1, __memset_sse2)
>> IFUNC_IMPL_ADD (array, i, memset, 1, __memset_x86_64))
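
Each IFUNC_IMPL_ADD entry above pairs an implementation with a condition saying whether it is usable on the running CPU (HAS_AVX2 for the new variant, 1 for the fallbacks). A simplified model of such a list can be sketched in C as below. Everything in it is hypothetical (struct impl_entry, list_memset_impls, pick_first_usable and the two stand-in functions), and picking the first usable entry is an assumption made for illustration; it does not describe how glibc's own resolver selects memset.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical entry, loosely modeled on the list entries in the diff:
   a name, a function pointer, and a "usable on this CPU" flag. */
struct impl_entry {
    const char *name;
    memset_fn fn;
    int usable;
};

/* Stand-in for an optimized variant that needs a CPU feature. */
static void *memset_fast_standin(void *d, int c, size_t n)
{
    return memset(d, c, n);
}

/* Stand-in for a baseline variant that is always usable. */
static void *memset_generic_standin(void *d, int c, size_t n)
{
    unsigned char *p = d;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)c;
    return d;
}

/* Fill array with the known variants, most specialized first.
   has_feature plays the role of a CPU feature test such as HAS_AVX2.
   Returns the number of entries written (at most max). */
size_t list_memset_impls(struct impl_entry *array, size_t max, int has_feature)
{
    size_t i = 0;
    if (i < max)
        array[i++] = (struct impl_entry){ "memset_fast_standin",
                                          memset_fast_standin, has_feature };
    if (i < max)
        array[i++] = (struct impl_entry){ "memset_generic_standin",
                                          memset_generic_standin, 1 };
    return i;
}

/* Return the first usable entry, or NULL if none is usable. */
const struct impl_entry *pick_first_usable(const struct impl_entry *array,
                                           size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (array[i].usable)
            return &array[i];
    return NULL;
}
```

Listing every variant together with its usability flag lets a test or benchmark harness exercise each implementation that the current machine can actually run.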
>
> Against which version are you patching? The code above does not exist in
> my version.
>
> Note that we're in freeze for glibc 2.18, so this can only go in once we
> open development of new features for 2.19.
>
> Andreas
> --
> Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
> GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
>