This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
Re: faster memset
Aaron J. Grier <aaron <at> frye.com> writes:
>
> On Thu, May 22, 2008 at 04:56:54PM +0000, Eric Blake wrote:
> > My patched assembly is no longer sensitive to alignment, and always
> > gets the speed of 8-byte alignment. This clinches it - for memset,
> > x86 assembly is noticeably faster than C.
>
> have you done comparisons with the builtin memset() in recent versions
> of gcc?
>
I was testing with gcc 3.4.4, which does have __builtin_memset. But my
understanding is that __builtin_memset defers to the library function in cases
it cannot optimize at compile time. At any rate, my test app called the
library function via a function pointer, so the builtin should not have been
involved. Does __builtin_memset even have an address that can be used via a
function pointer?
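For illustration, here is a minimal sketch of calling memset through a function pointer so that the call reaches the library routine instead of being expanded inline. This is not the actual test app; the names memset_fn, fill and fill_via_pointer are made up for this example, and the volatile qualifier is an assumption about how to stop the compiler from seeing through the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: a pointer type matching memset's signature. */
typedef void *(*memset_fn)(void *, int, size_t);

/* volatile forces the pointer to be reloaded at every call, so the
   compiler cannot assume it still points at memset and substitute
   an inline expansion. */
static memset_fn volatile fill = memset;

/* Hypothetical helper: fill len bytes of buf with value by calling
   through the pointer; returns whatever the callee returns. */
static void *fill_via_pointer(unsigned char *buf, size_t len, int value)
{
    assert(buf != NULL);
    return fill(buf, value, len);
}
```

A timing loop would call fill_via_pointer repeatedly over different lengths and alignments. Assigning a different routine to fill allows two implementations to be compared with the same harness.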
If I understand it correctly, __builtin_memset(ptr,0,8) is a good example of
where the compiler optimization helps (it is faster to open-code two 32-bit
writes than to call a function), in which case that is faster than anything I
can code in assembly. But __builtin_memset(ptr,0,1000), even though 1000 is
constant, needs so many open-coded assignments that the compiler probably
falls back to the library routine anyway, trusting that the library knows
more architecture tricks for efficiency than what you can represent
generically in gcc's builtin definition table. Finally,
__builtin_memset(ptr,0,len) cannot be optimized, since len is not known at
compile time, so the compiler must fall back on the library.
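The three cases can be written out as a rough sketch. The function names are made up for this example, and what the compiler actually emits for each one depends on the gcc version, target and optimization flags, so the comments describe what I would expect rather than a guarantee. Compiling with gcc -O2 -S and reading the assembly shows which calls survive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Case 1: small constant length. The compiler may open-code this
   as a couple of stores rather than emit a call. */
static void clear_small(unsigned char *p)
{
    assert(p != NULL);
    memset(p, 0, 8);
}

/* Case 2: large constant length. Too many stores to open-code
   cheaply, so a call to the library memset is the likely result. */
static void clear_large(unsigned char *p)
{
    assert(p != NULL);
    memset(p, 0, 1000);
}

/* Case 3: length known only at run time. Nothing to unroll at
   compile time, so this is expected to become a library call. */
static void clear_runtime(unsigned char *p, size_t len)
{
    assert(p != NULL);
    memset(p, 0, len);
}
```

All three produce the same result in memory. The only difference is whether the work is done inline or by the library routine, which is why cases 2 and 3 would end up timing the library memset again.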
In other words, by comparing against __builtin_memset, wouldn't I merely be
comparing against my own implementation for most of the interesting cases?
--
Eric Blake