This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Testing 2.16 release candidate with gnulib - quadratic behaviour detected in SSE42 strstr?


Hi Carlos,

> (d) strstr
> 
> This is the one that worries me.
> 
> configure:121073: checking whether strstr works in linear time
> configure:121161: gcc -lm -Wl,-rpath=/scratch/carloso/build4-lucid-cs/install//lib64:/scratch/carloso/build4-lucid-cs/install//usr/lib64 -Wl,--dynamic-linker=/scratch/carloso/build4-lucid-cs/install//lib64/ld-linux-x86-64.so.2 -std=gnu99 -o conftest -g -O2 -nostdinc -I/usr/local/tools/gcc-4.3.3/lib/gcc/x86_64-unknown-linux-gnu/4.3.2/include-fixed -I/scratch/carloso/build4-lucid-cs/install//usr/include -I/usr/local/tools/gcc-4.3.3/lib/gcc/x86_64-unknown-linux-gnu/4.3.2/include -Wall  conftest.c  >&5
> configure:121165: $? = 0
> configure:121171: ./conftest
> configure:121175: $? = 142
> configure: program exited with status 142
> configure: failed program was:
> | /* confdefs.h.  */
> ...
> | /* end confdefs.h.  */
> |
> | #include <signal.h> /* for signal */
> | #include <string.h> /* for strstr */
> | #include <stdlib.h> /* for malloc */
> | #include <unistd.h> /* for alarm */
> | static void quit (int sig) { exit (sig + 128); }
> |
> | int
> | main ()
> | {
> |
> |     int result = 0;
> |     size_t m = 1000000;
> |     char *haystack = (char *) malloc (2 * m + 2);
> |     char *needle = (char *) malloc (m + 2);
> |     /* Failure to compile this test due to missing alarm is okay,
> |        since all such platforms (mingw) also have quadratic strstr.  */
> |     signal (SIGALRM, quit);
> |     alarm (5);
> |     /* Check for quadratic performance.  */
> |     if (haystack && needle)
> |       {
> |         memset (haystack, 'A', 2 * m);
> |         haystack[2 * m] = 'B';
> |         haystack[2 * m + 1] = 0;
> |         memset (needle, 'A', m);
> |         needle[m] = 'B';
> |         needle[m + 1] = 0;
> |         if (!strstr (haystack, needle))
> |           result |= 1;
> |       }
> |     return result;
> |
> |   ;
> |   return 0;
> | }
> configure:121193: result: no
> ...
> user    4m21.060s

Yes, if an strstr implementation takes more than 4 minutes to search for
a 100000 byte long string in a 2000000 byte long string, it's quadratic
behaviour.

The O(n) worst-case algorithm with O(1) intermediate storage was introduced
in string/strstr.c in 2008. But sysdeps/x86_64/multiarch/strstr.c was added
in 2009; it looks like it intends to use Knuth-Morris-Pratt only on small
substrings (the original Knuth-Morris-Pratt is O(n) worst-case and uses
O(n) intermediate storage) and therefore is O(n²).

Bruno


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]