This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 1/4] Optimize first-character loop of strstr, strcasestrand memmem for short needle.


On 5/30/2012 5:08 AM, Maxim Kuvyrkov wrote:
> [PATCH 1/4] Optimize first-character loop of strstr, strcasestr and memmem for short needle.

Overall I learned a lot about glibc's two-way algorithm implementation. Sorry
for the slow review.

I have reviewed all four of your patches, and there are a couple of minor nits,
but overall it looks great.

*Gets down on knees, hands clasped*
Please please I implore you, can we we get some data up on the wiki?

http://sourceware.org/glibc/wiki/benchmarking/benchmarks

That way we have some data to show that these changes were at performance X,
on target Y, for this release. That way we can show we're meeting our goals
(even if those goals change with time). We can use these datapoints for
comparison with other proposed patches.

Does this make sense to you?
 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 
> 	[BZ #11607]
> 	* string/str-two-way.h (two_way_short_needle): Optimize matching of
> 	the first character.
> ---
>  string/str-two-way.h |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/string/str-two-way.h b/string/str-two-way.h
> index 1b2a8bd..3791404 100644
> --- a/string/str-two-way.h
> +++ b/string/str-two-way.h

PATCH 2/4 fixes the copyright here. So OK.

> @@ -261,14 +261,27 @@ two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
>      }
>    else
>      {
> +      /* The comparison always starts from needle[suffix], so cache it
> +	 and use an optimized first-character loop.  */
> +      unsigned char needle_suffix = CANON_ELEMENT (needle[suffix]);
> +

OK.

>        /* The two halves of needle are distinct; no extra memory is
>  	 required, and any mismatch results in a maximal shift.  */
>        period = MAX (suffix, needle_len - suffix) + 1;
>        j = 0;
>        while (AVAILABLE (haystack, haystack_len, j, needle_len))
>  	{
> +	  /* TODO: The first-character loop can be sped up by adapting
> +	     longword-at-a-time implementation of memchr/strchr.  */
> +	  if (needle_suffix
> +	      != CANON_ELEMENT (haystack[suffix + j]))
> +	    {
> +	      ++j;
> +	      continue;
> +	    }
> +

OK.

>  	  /* Scan for matches in right half.  */
> -	  i = suffix;
> +	  i = suffix + 1;

OK. Though I can't quite figure out why we went from suffix => suffix +1.

>  	  while (i < needle_len && (CANON_ELEMENT (needle[i])
>  				    == CANON_ELEMENT (haystack[i + j])))
>  	    ++i;
> 

This patch is OK. I can't tell if your mailer is killing the tab/spaces 
or not, please feel free to send as attachments or fixup as appropriate.

Cheers,
Carlos.
-- 
Carlos O'Donell
Mentor Graphics / CodeSourcery
carlos_odonell@mentor.com
carlos@codesourcery.com
+1 (613) 963 1026


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]