This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.


I checked performance on a machine with SSE4_1 but without SSE4_2.
The SSE4_1 version is faster than the SSSE3 one on that machine because of
fast unaligned loads and stuff like that.
I agree that SSE 4.1 is not really needed: we can just replace ptest
with a "pmovmskb + test" pair, and performance will be nearly identical.
We can then name the implementation memcmp_sse2_unaligned.
Then it will look similar to the strcpy, memcpy, etc. dispatching.
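
To illustrate the "pmovmskb + test" idea, here is a rough C sketch using
SSE2 intrinsics. It is not the glibc assembly; the names block16_differs
and memcmp_sketch are made up for the example, and the scalar fallback is
only there so the code builds on targets without SSE2.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics only, no SSE4.1 needed */
#endif

/* Hypothetical helper: nonzero if the 16-byte blocks at a and b differ. */
static int block16_differs(const unsigned char *a, const unsigned char *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* movdqu, unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);  /* pcmpeqb: 0xff in each equal byte */
    int mask = _mm_movemask_epi8(eq);     /* pmovmskb: one bit per byte */
    return mask != 0xffff;                /* scalar test replaces ptest */
#else
    return memcmp(a, b, 16) != 0;         /* portable fallback */
#endif
}

/* Hypothetical memcmp-like routine: returns <0, 0 or >0. */
static int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    size_t i = 0;

    /* Skip over equal 16-byte blocks; stop at the first block that differs. */
    for (; i + 16 <= n; i += 16)
        if (block16_differs(a + i, b + i))
            break;

    /* Locate the differing byte (or compare the tail) one byte at a time. */
    for (; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```

Calling memcmp_sketch(p, q, n) should give the same sign as memcmp(p, q, n).
A real implementation would locate the differing byte from the mask instead
of rescanning the block.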

--
Liubov

On Thu, Jul 25, 2013 at 2:22 AM, Matt Turner <mattst88@gmail.com> wrote:
> On Thu, Jul 11, 2013 at 7:07 AM, Liubov Dmitrieva
> <liubov.dmitrieva@gmail.com> wrote:
>> My Silvermont patch in the latest edition doesn't touch memcmp and
>> wmemcmp at all because I didn't see a good boost from switching SSE42
>> off for these 2 functions.
>> Now I see why. There are no SSE42 instructions there. :)
>> The patch looks good. I will just check performance regressions for Penryn.
>
> Any performance numbers?

Attachment: bench-memcmp-ifunc.out
Description: Binary data

