This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, Matt Turner <mattst88 at gmail dot com>, Andreas Jaeger <aj at suse dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 12 Jul 2013 19:05:16 +0200
- Subject: Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
- References: <51DCE51F dot 7000001 at suse dot com> <CAMe9rOqb3_DnhSh0jPh9=suJo5c+WjegxfDh1+1go6pY+7+PLA at mail dot gmail dot com> <CAEdQ38Go4UY=k==nYT_6S86-tsOoxOO=Wn=8_pNk+LkkxSxU_Q at mail dot gmail dot com> <CAMe9rOpgaNgGSdoM5rXdhLT-TqVEJjGMyHgKRP=t+2LrSTpFAA at mail dot gmail dot com> <CAEdQ38FBeyuJpQ1eSHnM5w=8MHD3cfFjgWekkXnRFHO+Aathnw at mail dot gmail dot com> <CAMe9rOompuMMzQm+RX=ejoPMX0uWmXarvSZa_fp-Fi1p_-8o1Q at mail dot gmail dot com> <CAHjhQ91+RSKU=1F4vQ1XrJ=1j1wAv6HuQJh_s9BzcBOOTP8BDg at mail dot gmail dot com> <20130712030150 dot GA7461 at domone dot PAOCY> <CAHjhQ92CdsOemOAj+k_8gwxmJH5dsmdyNdDepWufrff4AuW1UQ at mail dot gmail dot com> <20130712162050 dot GA12414 at domone dot PAOCY>
On Fri, Jul 12, 2013 at 06:20:50PM +0200, Ondřej Bílka wrote:
> On Fri, Jul 12, 2013 at 10:12:34AM +0400, Liubov Dmitrieva wrote:
> > Do you mean AMD? For Intel there is no machine without SSE4_1 where
> > the sse2 unaligned version is faster than ssse3.
> >
> Good to know.
>
> I looked at the sources and found that memcmp is horribly misoptimized, as usual.
>
> As the difference is found within the first 16 bytes in 70% of cases
> and within the first 64 in 99%, the loop case is cold.
>
> This is not much of a problem when n > 48, since the initial unaligned
> comparison handles differences in the first 16 bytes effectively.
>
> However, otherwise there are lots of jumps to pick a code path based on
> size, which is inefficient.
>
> The code also answered what I thought was a roadblock, and why I did not
> try to optimize memcmp earlier: n is authoritative, so we may segfault
> when there is unallocated memory after the first difference within the
> range specified by n (see the sketch after the quote).
>
> I will prepare a patch with a faster memcmp.
>
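To illustrate the authoritative-n point above, a minimal sketch (my
example, not from the sources): the caller must guarantee that all n
bytes of both buffers are readable, so an implementation may load past
the first difference without checking.

#include <string.h>

int
example (void)
{
  /* The arrays already differ at byte 0, but both are fully
     readable, so memcmp may load all 16 bytes of each at once.
     Passing n = 16 with fewer readable bytes would be the
     caller's bug, not memcmp's.  */
  char a[16] = "XAAAAAAAAAAAAAA";
  char b[16] = "YAAAAAAAAAAAAAA";
  return memcmp (a, b, 16);
}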
For the first 16 bytes the best I can come up with is the following:
#include <stdint.h>
#include <emmintrin.h>

typedef __m128i tp_vector;
#define LOADU(x) _mm_loadu_si128 ((const tp_vector *) (x))
#define LT _mm_cmplt_epi8  /* note: signed byte compare */
#define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
/* All bits up to and including the lowest set bit of x.  */
#define first_bit(x) ((x) ^ ((x) - 1))

tp_vector va = LOADU (a);
tp_vector vb = LOADU (b);
uint64_t ltm = get_mask (LT (va, vb));
uint64_t gtm = get_mask (LT (vb, va));
/* Bit 16 is a sentinel so first_bit is defined even for an empty mask.  */
uint64_t lt = first_bit (ltm | (1 << 16));
uint64_t gt = first_bit (gtm | (1 << 16));
if (ltm | gtm)
  return lt - gt; // maybe swapped.
It finds the first byte that is smaller and the first byte that is
bigger. From each mask it then builds a bit pattern, and the difference
of the two patterns comes out positive or negative depending on which
of those bytes comes first.
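A worked example (made-up values, not from the patch): suppose a and b
first differ at byte 4, with a[4] < b[4].  Then:

lt_mask = 0x00010                              /* bit 4: a[4] < b[4] */
gt_mask = 0x00000                              /* no byte has a[i] > b[i] */
lt = first_bit (0x00010 | 0x10000) = 0x0001f   /* bits 0..4 set */
gt = first_bit (0x00000 | 0x10000) = 0x1ffff   /* bits 0..16 set */
lt - gt = 0x0001f - 0x1ffff < 0                /* memcmp result: a < b */

If a[4] > b[4] instead, the roles swap and the difference is positive.
Compiled, the sequence looks like this (comments mine):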
movdqu  (%rsi), %xmm0        # vb = 16 bytes of b
movdqu  (%rdi), %xmm1        # va = 16 bytes of a
movdqa  %xmm0, %xmm2
pcmpgtb %xmm1, %xmm2         # xmm2 = bytewise b > a
pcmpgtb %xmm0, %xmm1         # xmm1 = bytewise a > b
pmovmskb %xmm2, %edx         # edx = lt mask
pmovmskb %xmm1, %eax         # eax = gt mask
movl    %eax, %ecx
orl     %edx, %ecx           # any differing byte?
je      .L3                  # none: continue past the first 16 bytes
orl     $65536, %edx         # add sentinel bit 16 to lt mask
movl    %eax, %ecx
leal    -1(%rdx), %eax
orl     $65536, %ecx         # add sentinel bit 16 to gt mask
xorl    %edx, %eax           # eax = first_bit (lt)
leal    -1(%rcx), %edx
xorl    %ecx, %edx           # edx = first_bit (gt)
subl    %edx, %eax           # return first_bit (lt) - first_bit (gt)
ret
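For completeness, a self-contained version of the same idea that
compiles on its own (my wrapper and naming, not the actual patch).
One caveat: pcmpgtb/_mm_cmplt_epi8 compare signed bytes while memcmp
compares unsigned ones, so a real implementation would have to
compensate, e.g. by flipping the sign bit of every byte first.

#include <stdint.h>
#include <emmintrin.h>

/* Compare the first 16 bytes of a and b, memcmp-style; both
   buffers must have at least 16 readable bytes.  */
static int
memcmp16_sketch (const void *a, const void *b)
{
  __m128i va = _mm_loadu_si128 ((const __m128i *) a);
  __m128i vb = _mm_loadu_si128 ((const __m128i *) b);
  uint32_t ltm = (uint32_t) _mm_movemask_epi8 (_mm_cmplt_epi8 (va, vb));
  uint32_t gtm = (uint32_t) _mm_movemask_epi8 (_mm_cmplt_epi8 (vb, va));
  if (ltm | gtm)
    {
      /* Sentinel bit 16 keeps the lowest-bit trick well defined
         even when one of the masks is empty.  */
      uint32_t lt = (ltm | (1u << 16)) ^ ((ltm | (1u << 16)) - 1);
      uint32_t gt = (gtm | (1u << 16)) ^ ((gtm | (1u << 16)) - 1);
      return (int) lt - (int) gt;
    }
  return 0; /* first 16 bytes are equal */
}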
Comments?