This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: strstr, strcasestr speedup; add memmem


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Duane Ellis on 1/10/2008 6:08 PM:
| Eric Blake wrote:
|> OK to commit?
|>
| to be clear, i did not try these, just a quick 2 minute glance.
|
| Optimizations like this, for some are helpful, for others - painful.
| Newlib is still found on _tiny_ systems with very limited code space.
|
| Thank you for including _OPTIMIZE_SPEED_ in this.

Actually, I used this idiom, borrowed from strchr:

#if !defined(PREFER_SIZE_OVER_SPEED) && !defined(__OPTIMIZE_SIZE__)
~ lots of code but linear speed on all possible input
#else
~ compact code, but quadratic speed on worst-case input
#endif

but yes, I intentionally made my patch have the smallest code footprint
possible for platforms that are concerned about code size.  Quadratic
effects only come into play when the needle is long and its prefix is
repetitive; oftentimes, if you are on a memory-constrained system and want
the __OPTIMIZE_SIZE__ version, you are unlikely to be using arbitrary
needles in the first place, so you wouldn't notice much speed difference
between the two versions because you are never calling these functions
with problematic arguments.

On the flip side, newlib clients that don't care much about code size, but
do care about searching through arbitrary data not known at compile time
(for example, cygwin), get the benefit of the guaranteed linear execution
time; without this patch, you could launch a form of denial-of-service
attack by crafting strings that will lock the processor up in needless
searching loops.  For an example of sniffing whether your system's strstr
is quadratic (and note that even glibc 2.6.1 has this bug), you can do
things like:

$ time perl -e "print \"index(\" . 'a'x200000 . \"b, \" . 'a'x100000 .
\"b)\"" | m4
100000
real	0m14.436s
user	0m13.889s
sys	0m0.061s

$ time perl -e "print \"index(\" . 'a'x400000 . \"b, \" . 'a'x100000 .
\"b)\"" | m4
300000
real	0m43.363s
user	0m42.405s
sys	0m0.091s

Ouch - doubling the search space resulted in more than triple the search
time.  But with a linear strstr, the above example completes in less than
a second.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhsdI84KuGfSFAYARAvzxAKCHgrDoQfexkWh7Qy7+sjeSm/yY4QCgrMyF
2i3ynRY5RHLzot10WYbADwI=
=nwtN
-----END PGP SIGNATURE-----


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]