This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.



Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen


Richard Henderson <rth@twiddle.net> writes:

> +ENTRY(strlen)
> +	@ r0 = start of string
> +	pld	[r0]
> +
> +	@ To cater to long strings, we want to search through a few
> +	@ characters until we reach an aligned pointer.  To cater to
> +	@ small strings, we don't want to start doing word operations
> +	@ immediately.  The compromise is a maximum of 16 bytes less
> +	@ whatever is required to end with an aligned pointer.
> +	@ r3 = number of characters to search in alignment loop
> +	and	r3, r0, #7
> +	s(mov)	r1, r0			@ Save the input pointer
> +	rsb	r3, r3, #16
> +
> +	@ Loop until we find ...
> +1:	ldrb	r2, [r0], #1
> +	subs	r3, r3, #1		@ ... the alignment point
> +	it	ne
> +	cmpne	r2, #0			@ ... or EOS
> +	bne	1b
> +
> +	@ Disambiguate the exit possibilities above
> +	cmp	r2, #0			@ Found EOS
> +	ittt	eq
> +	subeq	r0, r0, #1		@ Undo post-inc above
> +	subeq	r0, r0, r1		@ Subtract input to compute length
> +	bxeq	lr
> +
> +	@ So now we're aligned.
> +	ldrd	r2, r3, [r0], #8
> +	movw	ip, #0xfefe
> +	pld	[r0, #64]
> +	movt	ip, #0xfefe
> +	pld	[r0, #128]
> +	pld	[r0, #192]
> +
> +	@ Loop searching for EOS or C, 8 bytes at a time.

This comment seems to be for strchr().

> +	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
> +	@ that was originally zero and 0xff otherwise.  Therefore we consider
> +	@ the lsb of each byte the "found" bit, with 0 for a match.
> +	.balign	16
> +2:	uqadd8	r2, r2, ip		@ Find EOS
> +	uqadd8	r3, r3, ip
> +	pld	[r0, #256]		@ Prefetch 4 lines ahead
> +	s(and)	r3, r3, r2		@ Combine the two words
> +	mvns	r3, r3			@ Test for any found bit true
> +	it	eq
> +	ldrdeq	r2, r3, [r0], #8
> +	beq	2b

Subtracting the values from 1 (with UQSUB8) instead would give 0 for
any non-zero input byte and 1 for "found", i.e. the inverse of what
you have here.  Testing for a match anywhere in the double-word then
becomes a single ORRS instruction.  Unless I'm making some stupid mistake.
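
To illustrate the point, here is a rough C model of the two tests.  It
is only a sketch: uqadd8() and uqsub8() are hand-written per-byte
emulations of the instructions, and the names found_uqadd8,
found_uqsub8 and check_equivalence are made up for this example, not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte unsigned saturating add, modelling UQADD8. */
static inline uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8) {
        uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);
        if (s > 0xff)
            s = 0xff;           /* saturate at 0xff */
        r |= s << i;
    }
    return r;
}

/* Per-byte unsigned saturating subtract (a - b), modelling UQSUB8. */
static inline uint32_t uqsub8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8) {
        uint32_t x = (a >> i) & 0xff, y = (b >> i) & 0xff;
        r |= (x > y ? x - y : 0) << i;  /* saturate at 0 */
    }
    return r;
}

/* Test as in the patch: two UQADD8, AND, then MVNS.
   The result has 0x01 in every byte position that held a zero. */
static inline uint32_t found_uqadd8(uint32_t lo, uint32_t hi)
{
    const uint32_t k = 0xfefefefeu;
    return ~(uqadd8(lo, k) & uqadd8(hi, k));
}

/* Suggested test: two UQSUB8 from 0x01010101, then a single ORRS. */
static inline uint32_t found_uqsub8(uint32_t lo, uint32_t hi)
{
    const uint32_t k = 0x01010101u;
    return uqsub8(k, lo) | uqsub8(k, hi);
}

/* Both forms should yield the same "found" bits for any input pair. */
static inline void check_equivalence(uint32_t lo, uint32_t hi)
{
    assert(found_uqadd8(lo, hi) == found_uqsub8(lo, hi));
}
```

In this model the per-word UQSUB8 results are already the "found" bits,
so the later MVNS on the first word should not be needed either; a plain
test of that word for zero would do.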

> +	@ Found something.  Disambiguate between first and second words.
> +	@ Adjust r0 to point to the word containing the match.
> +	@ Adjust r2 to the found bits for the word containing the match.
> +	mvns	r2, r2
> +	itee	ne
> +	subne	r0, r0, #8
> +	moveq	r2, r3
> +	subeq	r0, r0, #4
> +
> +	@ Find the bit-offset of the match within the word.
> +#ifdef __ARMEL__
> +	rbit	r2, r2			@ For LE we need count-trailing-zeros
> +#endif
> +	clz	r2, r2
> +	add	r0, r0, r2, lsr #3	@ Adjust the pointer to the found byte
> +	s(sub)	r0, r0, r1		@ Subtract input to compute length
> +	bx	lr
> +
> +END(strlen)

This code could be made to work for any ARMv6 by (conditionally)
replacing the MOVW/MOVT with some equivalent and the RBIT by REV.  REV
works since only the lsb in each byte can be set, so the result of CLZ
will simply be 7 more than we want, and the 3 low-order bits are shifted
out anyway.
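
The REV argument can be checked with a small C model.  Again this is
only a sketch: rbit(), rev() and clz() are plain C stand-ins for the
instructions, and byte_index_rbit, byte_index_rev and check_same_index
are names invented for the example.  The input is assumed to be a
little-endian "found" word in which only the lsb of each byte may be
set.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits, modelling RBIT. */
static inline uint32_t rbit(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        r |= ((x >> i) & 1u) << (31 - i);
    return r;
}

/* Reverse the four bytes, modelling REV. */
static inline uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u)
         | ((x << 8) & 0xff0000u) | (x << 24);
}

/* Count leading zeros, modelling CLZ (32 for an input of 0). */
static inline unsigned clz(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (0x80000000u >> n)))
        n++;
    return n;
}

/* Byte offset of the first match as computed in the patch. */
static inline unsigned byte_index_rbit(uint32_t found)
{
    return clz(rbit(found)) >> 3;
}

/* Same, with REV in place of RBIT: CLZ comes out 7 larger,
   and the shift by 3 discards the difference. */
static inline unsigned byte_index_rev(uint32_t found)
{
    return clz(rev(found)) >> 3;
}

/* The two variants should agree for any valid non-zero "found" word. */
static inline void check_same_index(uint32_t found)
{
    assert(byte_index_rbit(found) == byte_index_rev(found));
}
```

For example, a "found" word of 0x00010000 gives a CLZ of 16 after RBIT
and 23 after REV; both shift down to a byte offset of 2.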

-- 
Måns Rullgård
mans@mansr.com

