This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [WIKI] Optimization overview.
- From: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- To: David Miller <davem at davemloft dot net>
- Cc: Ondrej Bilka <neleai at seznam dot cz>, libc-alpha <libc-alpha at sourceware dot org>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Will Schmidt <will_schmidt at vnet dot ibm dot com>
- Date: Mon, 18 Feb 2013 14:22:25 -0600
- Subject: Re: [WIKI] Optimization overview.
- References: <20130218172258.GA31978@domone.kolej.mff.cuni.cz><20130218.151053.1513159742443477770.davem@davemloft.net>
On Mon, Feb 18, 2013 at 2:10 PM, David Miller <davem@davemloft.net> wrote:
> From: Ondřej Bílka <neleai@seznam.cz>
> Date: Mon, 18 Feb 2013 18:22:58 +0100
>
>> I wrote about common string optimizations at
>> http://sourceware.org/glibc/wiki/Optimizations/string_functions
>> I put there control flow that is fastest in practice among those I
>> tried (there are several possible tradeoffs which I will write about).
>> I plan to use it for most subsequent patches.
>>
>>
>> What can be covered relatively generally?
>> Comments?
>
> Thanks for writing this up, it is of great value.
>
> On Sparc v9 32-bit and 64-bit I do the unaligned initial word in
> parallel with the loop setup. This seemed to be the fastest approach.
>
> You simply align the pointer and do an aligned load, and use an 'or' to
> set every byte in the word that precedes the actual starting byte to
> all ones. If the initial pointer is aligned, the mask is zero and has
> no effect, so we pre-process one full word upon entry to the loop.
>
> PowerPC uses this method as well, as that's where I got the idea
> from.
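
For readers following along, the masking trick described above can be
sketched roughly in C as below. This is only an illustration under
stated assumptions, not the Sparc or PowerPC assembly and not glibc
code: the names sketch_strlen, lead_mask and has_zero are hypothetical,
the word size is fixed at 64 bits, and the zero-byte test is the usual
subtract-and-mask idiom, which the quoted text does not specify. The
mask is built through memory so that it covers the bytes before the
start of the string on either endianness. Note that the aligned load
reads bytes outside the string proper; that is safe in practice because
an aligned word never crosses a page boundary, but ISO C does not
promise it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Word whose first `offset` bytes in memory order are 0xff, rest 0x00.
   Building it via memcpy keeps it correct on big and little endian. */
static uint64_t lead_mask(size_t offset)
{
    unsigned char bytes[sizeof(uint64_t)] = {0};
    uint64_t mask;
    memset(bytes, 0xff, offset);
    memcpy(&mask, bytes, sizeof mask);
    return mask;
}

/* Nonzero if and only if some byte of w is zero. */
static int has_zero(uint64_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Hypothetical strlen sketch: aligned first load plus an 'or' mask. */
size_t sketch_strlen(const char *s)
{
    size_t offset = (size_t)((uintptr_t)s % sizeof(uint64_t));
    const char *p = s - offset;          /* round down to a word boundary */
    const char *q;
    uint64_t w;

    memcpy(&w, p, sizeof w);             /* aligned load of the first word */
    w |= lead_mask(offset);              /* bytes before s can never look like NUL;
                                            the mask is 0 when s is aligned */

    while (!has_zero(w)) {               /* main loop: one aligned word per step */
        p += sizeof w;
        memcpy(&w, p, sizeof w);
    }

    /* The terminator is in the word at p; find it bytewise, starting
       no earlier than s itself. */
    q = (p < s) ? s : p;
    while (*q != '\0')
        q++;
    return (size_t)(q - s);
}
```

Calling sketch_strlen on a pointer three bytes into an aligned buffer
exercises the masked first word; calling it on the aligned buffer
itself exercises the case where the mask is zero and the first word is
processed in full.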
Adhemerval Zanella and Will Schmidt should take a look at this and
comment as well. Both of them have worked extensively in this space.
Ryan