This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memset for Corei7 with avx2 instruction


On Tue, Jul 30, 2013 at 08:49:58PM +0800, Ling Ma wrote:
> Ljuba could you please test our patch on haswell with gcc.403 we sent.
> I also will test it to compare among, without prfetch, or with
> prefetchw and prefetcht0,
> gcc.403 benchmark should be more reliable and stringency.
> 
Are you sure? In my test cases memset_big and memset_hash we saw a 30%
performance regression.

The memory access pattern of memset is nearly identical in both cases.

When we run them with your tool, it should find the regression. Otherwise
the data it reports does not reflect reality.
 
Ljuba, could you try testing them? First you need to compile the files:

gcc -O2 memset_big.c -o memset_big
gcc -O2 memset_hash.c -o memset_hash

Then at step 12 in readme.txt also run:

./memset_big
./memset_hash


We want to minimize the running time of programs. The best way to measure
that is to time how long a program takes to complete.
This has the major disadvantage that for deterministic programs you need to
run them for days to reduce noise and get statistically significant
results.
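
As a rough sketch of the first alternative, one could time complete runs
of a test program and summarize the samples. The helper names
time_command and sample_stats are made up for this illustration and are
not part of any benchmark mentioned in this thread; the sketch assumes a
POSIX system with clock_gettime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds for one complete run of cmd (hypothetical helper).
   Returns -1.0 if the clock fails or the command exits with an error. */
double time_command(const char *cmd)
{
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    int status = system(cmd);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0 || status != 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Mean and unbiased sample variance of n >= 2 timings. */
void sample_stats(const double *x, size_t n, double *mean, double *var)
{
    assert(n >= 2);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    double m = sum / (double)n;

    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);

    *mean = m;
    *var = ss / (double)(n - 1);
}
```

One would call time_command("./memset_big") many times for each memset
variant, feed the samples to sample_stats, and only trust a difference in
means that is large compared with the spread of the samples.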

A simplification that I made, and that you also make, is to measure only
the time spent in the function that changed instead of the entire running
time. When you can be sure that your modifications do not change the
running time of the rest of the program much, you can get results much
faster and for a much wider range of programs.
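
To make the simplification concrete, here is a minimal sketch that times
only the memset calls. The name time_memset_calls is invented for this
example, and it again assumes POSIX clock_gettime. Note that a loop like
this says nothing about how the calls affect the code that runs after
them.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Average seconds per memset call over reps calls (hypothetical helper).
   Returns -1.0 if reps is not positive or the clock fails. */
double time_memset_calls(void *buf, size_t len, int reps)
{
    /* Call through a volatile pointer so the compiler cannot inline
       or drop the calls we are trying to time. */
    void *(*volatile fill)(void *, int, size_t) = memset;
    struct timespec t0, t1;

    assert(buf != NULL);
    if (reps <= 0 || clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (int i = 0; i < reps; i++)
        fill(buf, i & 0xff, len);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / (double)reps;
}
```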

That is not the case here, as prefetching changes the memory layout, which
changes the running time of the rest of the program, so only the first
alternative is advisable.

