This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Example of optimized strlen


I attach a fast strlen that I wrote, along with a commented copy of the
version in glibc's CVS repository.  The comments include cycle counts and
highlight three partial register stalls.

Here are the results for a Pentium.  Counting clocks for the P6 is
difficult, but take into account that on the P6 up to 12 clocks are lost
to partial register stalls in the finalization, and that *each*
iteration of the inner loop loses 6 clocks to the other stall.
I am considering neither cache misses nor branch mispredictions.

(clocks)                              my strlen    glibc strlen
---------------------------------------------------------------------
startup if aligned                        2              2
startup if misaligned (worst case)        7             12
---------------------------------------------------------------------
inner loop                                n           1.25*n
---------------------------------------------------------------------
finalization (worst case)                 9              9
---------------------------------------------------------------------
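
As a rough illustration of how the rows combine, here is a small C sketch
that adds up the worst-case Pentium figures from the table.  The function
names are hypothetical, n is whatever unit the table uses, and the sketch
assumes the three phases simply add with no overlap, which the table does
not itself state.

```c
/* Worst-case Pentium clock estimates, summed from the table.
   Hypothetical helpers; assumes startup, inner loop, and finalization
   costs are additive. */

/* 7 (misaligned startup) + n (inner loop) + 9 (finalization) */
double my_strlen_clocks(double n)
{
    return 7.0 + n + 9.0;
}

/* 12 (misaligned startup) + 1.25*n (inner loop) + 9 (finalization) */
double glibc_strlen_clocks(double n)
{
    return 12.0 + 1.25 * n + 9.0;
}
```

Under these assumptions the gap between the two grows as 5 + 0.25*n clocks
in the worst case.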

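My version has a lower worst-case startup cost when the string is
misaligned (7 clocks against 12) and a faster inner loop (n against
1.25*n).  The aligned startup and the worst-case finalization cost the
same in both.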
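
For readers without the attachments at hand, the three phases in the table
(startup, inner loop, finalization) can be sketched in portable C roughly
as follows.  This is not the attached strlen.S, nor glibc's code; the name
sketch_strlen and the particular zero-byte test are my own assumptions,
chosen as one common word-at-a-time approach.  Like most such routines it
may read up to three bytes past the terminator within an aligned word,
which ISO C does not formally permit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of a word-at-a-time strlen. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;
    uint32_t v;

    /* Startup: advance byte by byte until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Inner loop: examine four bytes per iteration. */
    for (;;) {
        memcpy(&v, p, sizeof v);   /* aligned 4-byte load */
        /* Nonzero exactly when at least one byte of v is zero. */
        if (((v - 0x01010101u) & ~v & 0x80808080u) != 0)
            break;
        p += 4;
    }

    /* Finalization: find the zero byte inside the last word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The misaligned startup runs at most three byte steps, and the finalization
scans at most three bytes before reaching the terminator, which is why
both have a bounded worst case.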

(My strlen has no support for bounded pointers yet.)

Paolo

strlen.S

glibc-strlen.S

