This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH RFC V2] Improve 64bit memcpy/memove for Corei7 with unaligned avx instruction
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: ling dot ma dot program at gmail dot com
- Cc: libc-alpha at sourceware dot org, liubov dot dmitrieva at gmail dot com, Ma Ling <ling dot ml at alibaba-inc dot com>
- Date: Fri, 12 Jul 2013 06:36:08 +0200
- Subject: Re: [PATCH RFC V2] Improve 64bit memcpy/memove for Corei7 with unaligned avx instruction
- References: <1373547096-8095-1-git-send-email-ling dot ma dot program at gmail dot com>
On Thu, Jul 11, 2013 at 08:51:36AM -0400, ling.ma.program@gmail.com wrote:
> From: Ma Ling <ling.ml@alibaba-inc.com>
>
> +L(gobble_big_data_bwd):
> + sub $0x80, %rdx
> +L(gobble_mem_bwd_loop):
> + prefetcht0 -0x1c0(%rsi)
> + prefetcht0 -0x280(%rsi)
> + vmovups -0x10(%rsi), %xmm0
> + vmovups -0x20(%rsi), %xmm1
> + vmovups -0x30(%rsi), %xmm2
> + vmovups -0x40(%rsi), %xmm3
> + vmovntdq %xmm0, -0x10(%rdi)
> + vmovntdq %xmm1, -0x20(%rdi)
> + vmovntdq %xmm2, -0x30(%rdi)
> + vmovntdq %xmm3, -0x40(%rdi)
> + vmovups -0x50(%rsi), %xmm0
> + vmovups -0x60(%rsi), %xmm1
> + vmovups -0x70(%rsi), %xmm2
> + vmovups -0x80(%rsi), %xmm3
> + lea -0x80(%rsi), %rsi
> + vmovntdq %xmm0, -0x50(%rdi)
> + vmovntdq %xmm1, -0x60(%rdi)
> + vmovntdq %xmm2, -0x70(%rdi)
> + vmovntdq %xmm3, -0x80(%rdi)
> + lea -0x80(%rdi), %rdi
> + sub $0x80, %rdx
> + jae L(gobble_mem_bwd_loop)
> + sfence
Wait, you are prefetching the memory on the read side while using nontemporal
stores? These aims are contradictory: if you want the best memcpy performance,
do not use nontemporal stores at all; and when we do not want to trash the
cache, we do not prefetch, and for the loads we use nontemporal loads instead.
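The cache-bypassing half of that point can be sketched in C with SSE2
intrinsics (a minimal illustration of streaming stores, not glibc's actual
code; the name memcpy_nt_store is hypothetical):

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */
#include <stddef.h>

/* Copy n bytes using non-temporal (streaming) stores, which write
 * around the cache -- the store side does not pollute it, so
 * prefetching for the destination would be wasted work.  The source
 * side here still uses ordinary cached loads.  dst must be 16-byte
 * aligned and n a multiple of 16. */
static void memcpy_nt_store(void *dst, const void *src, size_t n)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < n / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary cached load */
        _mm_stream_si128(d + i, v);         /* non-temporal store (movntdq) */
    }
    /* Streaming stores are weakly ordered; fence before anyone reads
     * the destination -- this is what the sfence in the patch is for. */
    _mm_sfence();
}
```

To also keep the *source* out of the cache one would use non-temporal loads
(movntdqa, `_mm_stream_load_si128`, SSE4.1) from write-combining memory rather
than prefetch, which is the reviewer's second point.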
Also, the following code does not use AVX. Is that intentional, or could AVX
improve performance?