This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [ARM] Optimised strchr and strlen
On 12/21/2011 02:55 AM, David Gilbert wrote:
> That 'simple' one is showing the benefit at the short lengths,
> the 'smarter' one I have is doing 8 bytes/loop and is nice on the long
> strings - but as you can see worse at the short ones.
Having not seen your "smarter" strchr, it's hard to suggest anything
concrete. I'd have thought that there's enough slack in load delay
that one or two arithmetic operations could be done without penalty...
Something like performing a simple compare loop looking for "alignment plus":
...
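        bic     r3, r0, #7              @ r3 = r0 rounded down to 8 bytes
        and     r1, r1, #255            @ r1 = search character as a byte
        adds    r3, r3, #32             @ r3 = aligned limit for the byte loop
1:
        ldrb    r2, [r0],#1             @ r2 = next byte, post-increment r0
        cmp     r2, r1                  @ Z set if the byte matches
        cbz     r2, .Lfound_zero        @ NUL terminator (flags untouched)
        it      ne
        cmpne   r0, r3                  @ no match: Z set once r0 reaches r3
        bne     1b
        cmp     r2, r1                  @ exited on a match or on the limit?
        beq     .Lfound
        @ Here, r0 is aligned. Do something word-based.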
...
or even just
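        and     r3, r0, #7              @ r3 = misalignment of r0 within 8 bytes
        and     r1, r1, #255            @ r1 = search character as a byte
        rsb     r3, r3, #32             @ r3 = bytes to scan until r0 is aligned
1:
        ldrb    r2, [r0],#1             @ r2 = next byte, post-increment r0
        cmp     r2, r1
        beq     .Lfound
        subs    r3, r3, #1              @ count down; Z set when it hits zero
        cbz     r2, .Lfound_zero        @ NUL terminator (flags untouched)
        bne     1b                      @ loop while bytes remain
        @ Here, r0 is aligned. Do something word-based.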
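
For anyone who would rather read the control flow in C, the two byte-wise
prologues above can be modelled roughly as follows. This is only a sketch:
the names scan_to_limit, scan_by_count, scan_state and the SCAN_* values are
hypothetical stand-ins for the labels in the assembly, and it assumes a flat
address space in which converting a pointer to uintptr_t preserves its low
bits. What .Lfound and .Lfound_zero actually do is not shown here, so the
model just reports which exit was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; they stand in for the labels in the assembly. */
enum scan_result {
    SCAN_FOUND,       /* branch to .Lfound */
    SCAN_FOUND_ZERO,  /* branch to .Lfound_zero */
    SCAN_ALIGNED      /* fall through to the word-based code */
};

struct scan_state {
    enum scan_result result;
    const unsigned char *p;  /* models r0: one past the last byte read */
};

/* First variant: stop when the pointer reaches an aligned limit. */
static struct scan_state scan_to_limit(const char *s, int c)
{
    const unsigned char *p = (const unsigned char *)s;
    /* Assumes uintptr_t keeps the low address bits (flat address space). */
    uintptr_t limit = ((uintptr_t)p & ~(uintptr_t)7) + 32;  /* bic, adds */
    unsigned char ch = (unsigned char)c;                     /* and #255 */

    assert(s != NULL);
    for (;;) {
        unsigned char b = *p++;                  /* ldrb r2, [r0], #1 */
        if (b == 0)                              /* cbz r2 */
            return (struct scan_state){ SCAN_FOUND_ZERO, p };
        if (b == ch)                             /* Z set by cmp r2, r1 */
            return (struct scan_state){ SCAN_FOUND, p };
        if ((uintptr_t)p == limit)               /* cmpne r0, r3 */
            return (struct scan_state){ SCAN_ALIGNED, p };
    }
}

/* Second variant: count down the bytes left before alignment. */
static struct scan_state scan_by_count(const char *s, int c)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned count = 32u - (unsigned)((uintptr_t)p & 7);     /* and, rsb */
    unsigned char ch = (unsigned char)c;                     /* and #255 */

    assert(s != NULL);
    for (;;) {
        unsigned char b = *p++;                  /* ldrb r2, [r0], #1 */
        if (b == ch)                             /* beq .Lfound */
            return (struct scan_state){ SCAN_FOUND, p };
        count--;                                 /* subs r3, r3, #1 */
        if (b == 0)                              /* cbz r2 */
            return (struct scan_state){ SCAN_FOUND_ZERO, p };
        if (count == 0)                          /* bne 1b not taken */
            return (struct scan_state){ SCAN_ALIGNED, p };
    }
}
```

Both variants leave the pointer on the same 8-byte boundary when nothing is
found. They differ in one corner: when the search character is itself NUL,
the first variant leaves through the .Lfound_zero path and the second
through .Lfound, because the order of the match test and the zero test is
swapped.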
r~