This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Here is a memcmp implementation for the i386. Timing the test program shows it to be faster than both the GCC builtin memcmp (a simple "repe cmpsb") and the glibc implementation of memcmp (also a simple "repe cmpsb", but with additional call/return overhead). Though I have not timed it, it should also be a lot faster than sysdeps/generic/memcmp.c, because the register pressure of the generic code makes it impossible to run fast on the x86 architecture.

With the new memcmp.S:

    utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c memcmp.S
    utente@engineer:~/esperimenti$ time ./a.out
    real 0m0.088s
    user 0m0.090s
    sys 0m0.000s

With the glibc memcmp:

    utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c
    utente@engineer:~/esperimenti$ time ./a.out
    real 0m0.108s
    user 0m0.100s
    sys 0m0.010s

With the GCC builtin memcmp:

    utente@engineer:~/esperimenti$ gcc -g -O3 test.c
    utente@engineer:~/esperimenti$ time ./a.out
    real 0m0.102s
    user 0m0.100s
    sys 0m0.010s

Notes:

1) The ideas are the same as those behind memcmp.c, but the implementation was heavily simplified (no loop unrolling, use of the shrdl instruction even though it might be suboptimal on the i586) to decrease register pressure.

2) This is not meant to be optimized for a particular arch, so it should be good for sysdeps/i386/i686. With a more careful look at instruction pairing, it can easily be adapted to work well for sysdeps/i386/i586 too.

3) If it is accepted, I can send papers, add headers, and do whatever else is needed to turn this into a `real' patch.

4) If this is accepted, I would disable inlining of memcmp in GCC, because the inlined version is a lot slower. It would also be useful to remove the memcmp optimization in bits/string.h and bits/string2.h, or to redo it so that the first few bytes are compared inline and the real meat is still done with the faster algorithm.

5) The improvement (about 20%) is consistent across several runs, and it also matches the performance pitfall that I saw when using memcmp in the regex routines.

So the current implementation of memcmp for i386 architectures is decidedly suboptimal.

Paolo Bonzini
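For readers without the attachments, the general idea mentioned in note 1 (compare a word at a time instead of a byte at a time) can be sketched in portable C roughly as follows. This is only an illustration under stated assumptions, not the code in the attached memcmp.S: the name sketch_memcmp is hypothetical, and the shrdl-based handling of misaligned input is replaced here by plain memcpy loads, which a compiler normally turns into single 32-bit loads on x86 when builtins are enabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; an illustrative sketch, not the attached memcmp.S. */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    /* Fast path: compare one 32-bit word per iteration.
       memcpy is used for the loads so that unaligned pointers are safe. */
    while (n >= sizeof(uint32_t)) {
        uint32_t a, b;
        memcpy(&a, p, sizeof a);
        memcpy(&b, q, sizeof b);
        if (a != b)
            break;          /* the first differing byte is in this word */
        p += sizeof a;
        q += sizeof b;
        n -= sizeof a;
    }

    /* Slow path: a byte loop resolves the mismatching word or the tail,
       so the result does not depend on the machine's byte order. */
    while (n-- > 0) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++;
        q++;
    }
    return 0;
}
```

The byte loop after a mismatch costs at most four extra comparisons, which keeps the sketch simple and keeps few values live at once, in the spirit of the register-pressure concern above.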
Attachment:
memcmp.S
Description: Binary data
Attachment:
test.c
Description: Text document