This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.

Re: rawmemchr

From: Eric Blake <ebb9 at byu dot net>
To: newlib at sources dot redhat dot com
Date: Wed, 21 May 2008 22:33:05 +0000 (UTC)
Subject: Re: rawmemchr
References: <483386B5.20507@byu.net> <20080521032228.GA21932@ednor.casa.cgf.cx> <48339D45.3040307@byu.net> <002201c8bb4b$3a5ec5d0$2708a8c0@CAM.ARTIMI.COM> <loom.20080521T153512-174@post.gmane.org>

Eric Blake <ebb9 <at> byu.net> writes:

> Here's my test program:
> 
>       int *ptr = (int *) str;
>       while (!DETECTNULL (*ptr) && !DETECTCHAR (*ptr, mask))
>         ptr++;
>       /* Found the end of string or word containing c.  */
>       str = (const char *) str;

FYI.  That was supposed to be
str = (const char *) ptr;

That one-character flaw in my test app's C code resulted in searching the
string twice (once fast, a word at a time, then discarding that progress and
searching again bytewise).  With the test app fixed, I get these even more
impressive numbers to demonstrate the benefits of my optimization:

> $ time ./foo 1000000 1 0 1000 2 1
> 
> real	0m1.594s
> user	0m1.608s
> sys	0m0.015s
> 
> # C is 3x slower than the current assembly on aligned ptr
> 
> $ time ./foo 1000000 1 0 1000 0 1
> 
> real	0m0.922s
> user	0m0.921s
> sys	0m0.030s
> 
> # But my special casing of strchr(ptr,0) shows > 40% improvement!

$ time ./foo 1000000 1 0 1000 2 1

real	0m0.656s
user	0m0.671s
sys	0m0.015s

Only 17% worse, not 3x worse, than the pre-patched assembly version on aligned 
searches.

$ time ./foo 1000000 1 0 1000 0 1

real	0m0.312s
user	0m0.327s
sys	0m0.015s

Better than pre-patch assembly on searching for 0, comparable to patched 
assembly.

> 
> $ time ./foo 1000000 1 1 1000 2 1
> 
> real	0m1.594s
> user	0m1.608s
> sys	0m0.015s
> 
> $ time ./foo 1000000 1 1 1000 0 1
> 
> real	0m0.921s
> user	0m0.936s
> sys	0m0.015s
> 
> # the C code does not slow down for unaligned str
> # And my C code for strchr(unaligned,0) BEATS the current assembly!
> 

$ time ./foo 1000000 1 1 1000 2 1

real	0m0.656s
user	0m0.671s
sys	0m0.015s

# Better than pre-patched assembly on unaligned data, and only 17% slower than 
patched assembly

$ time ./foo 1000000 1 3 1000 0 1

real	0m0.328s
user	0m0.343s
sys	0m0.000s

# Better than even pre-patched assembly on aligned data, and comparable to 
patched assembly

You have to admit it's pretty cool that the buggy version of the test app doing 
twice the necessary work on strchr(unaligned,0) was outperforming the pre-patch 
assembly.
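
For reference, the corrected word-at-a-time search can be sketched as a
self-contained function.  This is only a minimal sketch, not the actual test
program or the newlib patch: the name sketch_strchr, the fixed 32-bit word
size, and the DETECTNULL/DETECTCHAR definitions (the usual subtract-and-mask
bit tricks) are assumptions made for illustration.  Like the loop quoted
above, it may read up to three bytes past the terminator within an aligned
word, which strict ISO C does not promise is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of the 32-bit word X is zero (assumed definition). */
#define DETECTNULL(X) (((X) - 0x01010101u) & ~(X) & 0x80808080u)
/* Nonzero if any byte of X equals the byte replicated in MASK. */
#define DETECTCHAR(X, MASK) (DETECTNULL((X) ^ (MASK)))

/* Hypothetical name; behaves like strchr for NUL-terminated strings. */
char *sketch_strchr(const char *str, int c)
{
    assert(str != NULL);

    unsigned char ch = (unsigned char) c;
    uint32_t mask = ch * 0x01010101u;   /* ch copied into all four bytes */

    /* Search bytewise until str is word-aligned. */
    while ((uintptr_t) str % sizeof (uint32_t) != 0) {
        if ((unsigned char) *str == ch)
            return (char *) str;
        if (*str == '\0')
            return NULL;
        str++;
    }

    /* Skip whole words that contain neither the terminator nor c. */
    for (;;) {
        uint32_t word;
        memcpy(&word, str, sizeof word);  /* aligned load without aliasing issues */
        if (DETECTNULL(word) || DETECTCHAR(word, mask))
            break;
        str += sizeof word;
    }

    /* Continue from where the word loop stopped.  Restarting from the
       original pointer here is the flaw described above: the result is
       still correct, but the string gets searched twice. */
    for (;; str++) {
        if ((unsigned char) *str == ch)
            return (char *) str;
        if (*str == '\0')
            return NULL;
    }
}
```

Because ch is tested before the terminator in both bytewise loops, searching
for 0 returns a pointer to the terminator, as strchr does.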

-- 
Eric Blake

Follow-Ups:
- Re: rawmemchr
  - From: Christopher Faylor

References:
- rawmemchr
  - From: Eric Blake
- Re: rawmemchr
  - From: Eric Blake
- RE: rawmemchr
  - From: Dave Korn
- Re: rawmemchr
  - From: Eric Blake