This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memset for Corei7 with avx2 instruction


On Tue, Jul 16, 2013 at 09:35:39PM +0800, Ling Ma wrote:
> >> +	sub	$0x80, %rdx
> >> +L(gobble_128_loop):
> >> +	prefetcht0 0x1c0(%rdi)
> >> +	vmovaps	%ymm0, (%rdi)
> >> +	prefetcht0 0x280(%rdi)
> > you should not be so aggressive with prefetches when you know how much data
> > you use. This fetches unnecessary data, which can double cache usage and
> > generally slow us down.
> Haswell can issue 2 loads & 1 store in one cycle, so we can use that
> to prefetch our data if it is not in cache. Even when the data is
> already in L1 cache this does not hurt performance; our experiments
> also confirmed it.
> 
Those experiments proved nothing. The benchmark below shows that
prefetching data that you do not need can degrade performance by 30%. It
is simple to fix, so you should do it.
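
As a rough illustration of such a fix, here is a minimal C sketch that
prefetches ahead only while the prefetched address still lies inside the
buffer being set. It is not the patch's code: the function name
bounded_prefetch_memset and the 64-byte chunk size are hypothetical, the
inner memset call merely stands in for the vector stores, and
__builtin_prefetch is the GCC/Clang builtin rather than a hand-written
prefetcht0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative constants, not taken from glibc.  */
#define CHUNK 64            /* bytes written per loop iteration (assumption) */
#define PREFETCH_AHEAD 448  /* 0x1c0, one of the distances in the quoted patch */

/* Hypothetical helper: set n bytes at dst to c, prefetching ahead only
   while the prefetched address is still inside [dst, dst + n).  */
void *
bounded_prefetch_memset (void *dst, int c, size_t n)
{
  assert (dst != NULL || n == 0);

  unsigned char *p = dst;
  unsigned char *end = p + n;

  while ((size_t) (end - p) >= CHUNK)
    {
      /* Skip the prefetch once it would reach past the end of the buffer,
         so no cache line outside the destination is pulled in.  */
      if ((size_t) (end - p) > PREFETCH_AHEAD)
        __builtin_prefetch (p + PREFETCH_AHEAD, 1, 3);

      memset (p, c, CHUNK);   /* stands in for the vmovaps stores */
      p += CHUNK;
    }

  /* Tail of fewer than CHUNK bytes (possibly zero).  */
  memset (p, c, (size_t) (end - p));
  return dst;
}
```

The only point of the sketch is the bounds check before the prefetch; the
remaining length is already known in the loop, so the check costs one
compare per iteration.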

Otherwise you need to prove that the benefits of prefetching outweigh
the risk described above.

Whole-program benchmarking is the only way to do that; measuring only the
time spent in memset is not acceptable here because of scenarios like this:

1. Compute something that occupies 1/2 of the L1 cache.
2. Do a lot of memsets to initialize structures.
  nonprefetching: memset fills the second 1/2 of the L1 cache
  prefetching:    memset fills the whole L1 cache, evicting the data from 1.
3. Compute something with the data from 1.

The time spent in step 2 is nearly identical in both scenarios, yet
when we account for the time spent in steps 1 and 3, the prefetching
variant comes out worse than the nonprefetching one.
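
The scenario above can be sketched as a small C harness that times steps
1 to 3 together instead of timing memset alone. This is only an
illustration under stated assumptions: a 32 KiB L1 data cache (so 16 KiB
is half of it), hypothetical names run_scenario and memset_fn, and 256-byte
structures. It is not the benchmark that produced the numbers below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Assumed sizes: half of a 32 KiB L1 data cache each.  */
#define WORKING_SET (16 * 1024)
#define STRUCT_SIZE 256
#define N_STRUCTS 64            /* 64 * 256 bytes = 16 KiB */

typedef void *(*memset_fn) (void *, int, size_t);

static uint64_t
sum_bytes (const unsigned char *p, size_t n)
{
  uint64_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += p[i];
  return s;
}

/* Run steps 1-3 once with the given memset implementation.  Stores the
   CPU time of the whole sequence in *seconds and returns a checksum so
   the work cannot be optimized away.  */
uint64_t
run_scenario (memset_fn fill, double *seconds)
{
  static unsigned char work[WORKING_SET];
  static unsigned char structs[N_STRUCTS][STRUCT_SIZE];

  assert (fill != NULL && seconds != NULL);
  clock_t t0 = clock ();

  /* Step 1: compute something that occupies half of L1.  */
  for (size_t i = 0; i < WORKING_SET; i++)
    work[i] = (unsigned char) (i & 0xff);
  uint64_t sum = sum_bytes (work, WORKING_SET);

  /* Step 2: many small memsets initializing structures.  */
  for (size_t i = 0; i < N_STRUCTS; i++)
    fill (structs[i], 1, STRUCT_SIZE);
  sum += structs[N_STRUCTS - 1][STRUCT_SIZE - 1];

  /* Step 3: reuse the data from step 1; this is where evictions
     caused by step 2 would show up.  */
  sum += sum_bytes (work, WORKING_SET);

  *seconds = (double) (clock () - t0) / CLOCKS_PER_SEC;
  return sum;
}
```

One would call run_scenario many times with each memset variant, for
example run_scenario (memset, &t), and compare the accumulated totals.
A synthetic loop like this only demonstrates the effect; it does not
replace measurements on real programs.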

size: 32000
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
0.29	0.29
size: 256000
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
0.33	0.33
size: 1024000
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
0.35	0.36
size: 204800
0.34	0.35
0.34	0.35
0.34	0.35
0.34	0.35
0.35	0.35
0.34	0.35
0.34	0.35
0.34	0.35
0.34	0.35
0.34	0.35
size: 4048000
0.67	0.81
0.67	0.79
0.67	0.80
0.67	0.79
0.67	0.80
0.67	0.80
0.67	0.81
0.67	0.81
0.68	0.81
0.68	0.80
size: 8096000
1.00	1.33
1.00	1.33
0.99	1.33
0.99	1.33
0.99	1.33
0.99	1.33
1.00	1.34
0.99	1.33
0.99	1.33
0.99	1.33
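
To read the table as relative differences, a small C helper can compute
how much larger the second column is than the first, in percent. The
message does not label the columns or give units, so treating them as two
variants to compare is an assumption, and slowdown_percent is a
hypothetical name.

```c
#include <assert.h>

/* Relative difference of `second` over `first`, in percent.  */
double
slowdown_percent (double first, double second)
{
  assert (first > 0.0);
  return (second - first) / first * 100.0;
}
```

With the values printed above, slowdown_percent (1.00, 1.33) is about 33
for size 8096000, slowdown_percent (0.67, 0.80) is about 19 for size
4048000, and the two columns are equal for sizes 32000 and 256000.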

