This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC 2/2 V3] Improve 64bit memset for Corei7 with avx2 instruction


On Tue, Jul 30, 2013 at 02:19:07PM +0200, Ondřej Bílka wrote:
> On Mon, Jul 29, 2013 at 05:42:02AM -0400, ling.ma.program@gmail.com wrote:
> > From: Ma Ling <ling.ml@alibaba-inc.com>
> > 
> > In this patch we use a similar approach to memcpy to
> > avoid branch instructions and force the destination to be aligned
> > for the AVX stores. In the 403.gcc benchmark we find that memset
> > spends 5 to 20 times more time than memcpy.
> > 
> Another issue is whether a big loop is really needed. I tested a variant
> with the big loop disabled on Ivy Bridge: for sizes up to 262144 the
> performance is about the same, but above that rep stosb becomes 20% faster.
> 
> Ljuba, could you also test this case?
> 
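The quoted patch description says the memcpy approach is reused to avoid branches and to align the destination. A minimal portable sketch of that general idea follows; it is not the code from the patch. It assumes 32-byte "vectors" modeled with memcpy from a pattern buffer instead of AVX2 intrinsics, and the name memset_aligned_sketch is hypothetical. One unaligned store covers the head, one covers the tail, and the main loop only issues aligned stores, so no branch is needed on the exact remainder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC 32 /* models one 32-byte AVX2 register (assumption) */

/* Hypothetical sketch, not the patch's code: fill n bytes with c using
   overlapping unaligned head/tail stores plus an aligned main loop.  */
static void *memset_aligned_sketch(void *dst, int c, size_t n)
{
  unsigned char v[VEC];
  unsigned char *p = dst;
  unsigned char *end = p + n;
  unsigned char *q;

  if (n < VEC) {                 /* small sizes: simple byte loop */
    while (n--)
      *p++ = (unsigned char) c;
    return dst;
  }
  memset(v, c, VEC);             /* broadcast the byte into a "vector" */
  memcpy(p, v, VEC);             /* unaligned store covering the head */
  memcpy(end - VEC, v, VEC);     /* unaligned store covering the tail */

  /* Advance to the next VEC boundary; the head store already wrote the
     1..VEC bytes that are skipped here.  */
  q = p + (VEC - ((uintptr_t) p & (VEC - 1)));
  while ((size_t) (end - q) >= VEC) {
    memcpy(q, v, VEC);           /* aligned store */
    q += VEC;
  }
  /* Any remainder shorter than VEC lies inside the tail store.  */
  return dst;
}
```

The overlapping head and tail stores may write some bytes twice; that is harmless for memset and is what lets the loop bounds ignore the remainder.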
Here is a third test case.

Another problem with non-temporal stores is that some applications assume
that memset leaves the data in cache, in effect prefetching it. From GCC 4.8
onward even loops like the following are replaced by a call to memset:

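#include <stdlib.h>

/* GCC 4.8 and later may turn this zeroing loop into a memset call.  */
int *foo(void)
{
  int *x = malloc(1000000 * sizeof *x);
  int i;
  if (x == NULL)
    return NULL;
  for (i = 0; i < 1000000; i++)
    x[i] = 0;
  return x;
}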
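In effect, foo then behaves as if it had been written with an explicit memset. The hand-written equivalent below is only a sketch: the name foo_as_memset is hypothetical, and the code GCC actually emits depends on its version and flags. It shows why memset's cache behaviour, including any non-temporal path, ends up governing code that never called memset itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hand-written equivalent of foo() after the compiler
   replaces the zeroing loop with a single memset call.  */
int *foo_as_memset(void)
{
  int *x = malloc(1000000 * sizeof *x);
  if (x == NULL)
    return NULL;
  /* Whether these lines stay in cache is now up to memset.  */
  memset(x, 0, 1000000 * sizeof *x);
  return x;
}
```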

By simulating the initialization of a hash table I could increase the
penalty caused by non-temporal stores to 50%. I used the following to test:

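   /* Fragment of the test: ary, seed, j, k and SIZE are presumably
      defined in the attached harness.  memset2 is the memset variant
      being measured.  ary needs at least 100000000 + SIZE bytes,
      since loc[k + 1] can read loc[SIZE].  */
   char *loc = ary + rand_r(&seed) % 100000000;
   memset2(loc, 0, SIZE);
   k = 0;
   for (j = 0; j < SIZE / 10; j++) {
     loc[k] = loc[k + 1] + 1;      /* scattered read-modify-write */
     k = (11 * k + 7) % SIZE;
   }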
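The fragment depends on names defined elsewhere. Below is a self-contained sketch of the same clear-then-touch pattern under stated assumptions: plain memset stands in for memset2, the random span and region size are parameters (the fragment uses 100000000 and SIZE), timing uses POSIX clock_gettime, and the name hash_init_bench is hypothetical rather than taken from the attachment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness: run `iters` rounds of "clear a random region,
   then update it like a hash table" and return elapsed seconds, or
   -1.0 if the buffer cannot be allocated.  span must be > 0.  */
static double hash_init_bench(size_t span, size_t size, int iters,
                              unsigned int seed)
{
  /* calloc keeps every byte initialized; the extra byte covers the
     read of loc[k + 1] when k == size - 1.  */
  char *ary = calloc(span + size + 1, 1);
  struct timespec t0, t1;
  int it;

  if (ary == NULL)
    return -1.0;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (it = 0; it < iters; it++) {
    char *loc = ary + (size_t) rand_r(&seed) % span;
    size_t j, k = 0;
    memset(loc, 0, size);               /* memset2 in the original test */
    for (j = 0; j < size / 10; j++) {   /* scattered hash-table writes */
      loc[k] = loc[k + 1] + 1;
      k = (11 * k + 7) % size;
    }
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);
  free(ary);
  return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling hash_init_bench(100000000, 524288, iters, seed) matches the fragment's parameters for the 524288 case. The iteration count behind the timings in this message is not stated, so absolute numbers from this sketch will not be directly comparable.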

On Ivy Bridge the regression shows up here ("small" uses only the
prefetching loop, with no rep stosb and no non-temporal stores).

original  stosb  small
0.31      0.32   0.35
0.31      0.30   0.35
0.30      0.30   0.35
0.30      0.32   0.36
0.30      0.31   0.35
0.29      0.31   0.35
0.31      0.32   0.36
0.31      0.31   0.36
0.30      0.31   0.34
0.32      0.31   0.36

size: 524288
original  stosb  small
0.57      0.33   0.37
0.56      0.33   0.37
0.55      0.33   0.36
0.55      0.32   0.36
0.55      0.33   0.36
0.54      0.33   0.38
0.55      0.34   0.37
0.54      0.33   0.37
0.57      0.32   0.36
0.56      0.33   0.38
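The "small" variant is described only as a prefetching loop without rep stosb or non-temporal stores. One plausible shape for such a loop is sketched below; this is a guess, not the variant that was measured. The prefetch distance PF_DIST and the name memset_prefetch_sketch are hypothetical, CHUNK is set to the 64-byte cache line, and __builtin_prefetch is the GCC/Clang builtin (write intent, high temporal locality).

```c
#include <stddef.h>
#include <string.h>

#define PF_DIST 512  /* hypothetical prefetch distance in bytes */
#define CHUNK   64   /* one cache line */

/* Hypothetical "prefetch + ordinary stores" memset: no rep stosb and
   no non-temporal stores, so the filled lines remain cached.  */
static void *memset_prefetch_sketch(void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  size_t i = 0;

  for (; i + CHUNK <= n; i += CHUNK) {
    if (i + PF_DIST < n)
      __builtin_prefetch(p + i + PF_DIST, 1, 3); /* write, keep cached */
    memset(p + i, c, CHUNK);  /* fixed-size, normally inlined stores */
  }
  memset(p + i, c, n - i);    /* tail shorter than CHUNK */
  return dst;
}
```

Keeping ordinary stores means the region is left in cache for the hash-table updates that follow, which is the behaviour the test above rewards.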

Attachment: memset_hash.tar.bz2
Description: Binary data

